• Title/Abstract/Keywords: Clustered data

Modeling clustered count data with discrete weibull regression model

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / Vol. 29, No. 4 / pp.413-420 / 2022
  • In this study, we adapt the discrete Weibull regression model to clustered count data. The discrete Weibull regression model has the attractive feature that it can handle both underdispersed and overdispersed data. We analyzed the eighth Korean National Health and Nutrition Examination Survey (KNHANES VIII) from 2019 to assess the factors influencing the one-month outpatient stay in 17 different regions. We compared the results of the clustered discrete Weibull regression model with those of the Poisson, negative binomial, generalized Poisson, and Conway-Maxwell Poisson regression models, which are widely used in count data analyses. The results show that the clustered discrete Weibull regression model with a random intercept gives the best fit. A simulation study is also conducted to investigate the performance of the clustered discrete Weibull model under various dispersion settings and zero-inflation probabilities. The paper shows that using a random effect with discrete Weibull regression can flexibly model count data with various degrees of dispersion without the risk of making wrong assumptions about the data dispersion.
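
As a rough illustration of why a discrete Weibull distribution can cover both under- and overdispersion, the sketch below assumes the common type I parameterization, P(X = x) = q^(x^beta) - q^((x+1)^beta) for x = 0, 1, 2, .... The abstract does not say which parameterization the paper uses, and the function names and parameter values here are hypothetical.

```python
def dw_pmf(x: int, q: float, beta: float) -> float:
    """P(X = x) for the type I discrete Weibull (0 < q < 1, beta > 0)."""
    return q ** (x ** beta) - q ** ((x + 1) ** beta)


def dw_mean_var(q: float, beta: float, upper: int = 2000) -> tuple[float, float]:
    """Mean and variance by direct summation, truncated at `upper` terms."""
    mean = sum(x * dw_pmf(x, q, beta) for x in range(upper))
    second = sum(x * x * dw_pmf(x, q, beta) for x in range(upper))
    return mean, second - mean ** 2


def dispersion_index(q: float, beta: float) -> float:
    """Variance-to-mean ratio: above 1 is overdispersed, below 1 underdispersed."""
    mean, var = dw_mean_var(q, beta)
    return var / mean


# Illustrative (assumed) parameter values only.
print(dispersion_index(0.7, 1.0))  # beta = 1 reduces to a geometric distribution
print(dispersion_index(0.7, 3.0))  # larger beta concentrates mass near the mode
```

With beta = 1 the variance exceeds the mean, while a larger beta pushes the ratio below 1, so a single shape parameter moves the model between the two regimes.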

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods / Vol. 25, No. 2 / pp.159-172 / 2018
  • In many epidemiological studies, missing values in the outcome arise due to censoring. Such censoring is what distinguishes survival analysis from other analytical methods. Many methods deal with censored data in survival analysis; however, few studies have dealt with missing covariates in survival data, and such studies are rarer still when the data are clustered. In this paper, we conducted a simulation study to compare several missing data methods when the data have a clustered, multi-structured form with missing covariates. We modeled the unknown baseline hazard and frailty with a Bayesian B-spline to obtain smoother and more accurate estimates, and we used prior information to achieve more accurate results. We assumed the missingness mechanism to be MAR (missing at random). We compared the performance of five different missing data techniques through simulation studies. We also present results from a multi-center study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods / Vol. 31, No. 1 / pp.55-64 / 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions required by each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data. We use Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with either dispersion type and to handle data with many zeros that appear in groups or clusters sharing a common source of variation. A simulation study shows that our proposed method provides accurate results and reveals that the sample size is affected by the distribution skewness, the covariance structure of the covariates, and the number of zeros. We apply our method to pancreas disorder length-of-stay data collected from Western Australia.

Assessment Analysis on Development Potential of the Clustered Settlements in the Released Green-Belt

  • 최임주;안준홍
    • 한국지리정보학회지 / Vol. 11, No. 4 / pp.112-121 / 2008
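  • This study aimed to derive the development potential of clustered settlements within the released development-restriction zone (green belt) of Gijang-gun, Busan Metropolitan City, by determining priorities from standardized scores that account for pure development indicators and future development conditions. For an objective and scientific analysis, the study used Busan Metropolitan City GIS data and selected individual indicators in four categories: natural, physical, developmental, and accessibility. The analysis showed that large settlements located along the coast had high individual indicator values and were assessed as having high development potential, whereas small inland settlements located west of National Route 14 had low indicator values and were assessed as having low development potential.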

Tests for homogeneity of proportions in clustered binomial data

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods / Vol. 23, No. 5 / pp.433-444 / 2016
  • When we observe binary responses in a cluster (such as laboratory rats), they are usually correlated with each other. In clustered binomial counts, the independence assumption is violated and extra variation arises. In the presence of extra variation, ordinary statistical analyses of binomial data are inappropriate. In testing the homogeneity of proportions between several treatment groups, the classical Pearson chi-squared test has a severe flaw in controlling Type I error rates. We focus on modifying the chi-squared statistic by incorporating variance inflation factors. We suggest a method to adjust the data using a dispersion estimate based on a quasi-likelihood model. We explain the testing procedure via an illustrative example and compare the performance of the modified chi-squared test with competing statistics through a Monte Carlo study.
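
The idea of deflating the Pearson statistic by a variance inflation factor can be sketched as follows. This is only one simple variant, assuming a single common dispersion estimate taken from the Pearson residuals of a quasi-binomial fit; the paper's actual statistic may use different (for example group-specific) factors. The data and function names are hypothetical.

```python
# Each group is a list of clusters given as (successes, cluster_size).
Group = list[tuple[int, int]]


def pooled_chi2(groups: list[Group]) -> float:
    """Classical Pearson chi-squared for homogeneity, ignoring clustering."""
    totals = [(sum(y for y, _ in g), sum(m for _, m in g)) for g in groups]
    p = sum(y for y, _ in totals) / sum(m for _, m in totals)
    return sum(m * (y / m - p) ** 2 for y, m in totals) / (p * (1 - p))


def dispersion(groups: list[Group]) -> float:
    """Pearson-type dispersion estimate: cluster-level X^2 / (clusters - groups)."""
    stat = 0.0
    for g in groups:
        p_g = sum(y for y, _ in g) / sum(m for _, m in g)
        stat += sum((y - m * p_g) ** 2 / (m * p_g * (1 - p_g)) for y, m in g)
    n_clusters = sum(len(g) for g in groups)
    return stat / (n_clusters - len(groups))


def adjusted_chi2(groups: list[Group]) -> float:
    """Deflate the statistic; the factor is floored at 1 (a design assumption)."""
    return pooled_chi2(groups) / max(dispersion(groups), 1.0)


# Made-up example: two treatment groups with four clusters each.
data = [
    [(1, 10), (4, 10), (0, 8), (5, 12)],
    [(6, 10), (2, 9), (8, 11), (3, 10)],
]
print(pooled_chi2(data), dispersion(data), adjusted_chi2(data))
```

When the clusters vary more than a binomial model allows, the dispersion estimate exceeds 1 and the adjusted statistic is smaller than the naive one, which is what protects the Type I error rate.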

Efficient striping policy of NOD data on clustered storage server

  • 정귀옥;박성호;김영주;정기동
    • 한국정보과학회: Conference Proceedings / 한국정보과학회 1998 Fall Conference Proceedings Vol.25 No.2 (3) / pp.89-91 / 1998
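  • The growing demand for information and the pursuit of convenience in modern society, together with advances in information and communication technology, have rapidly increased multimedia data services. Because NOD data meets these demands, it is expected to attract many users, and scalability, availability, and reliability therefore become important requirements in server implementation. Much research has sought to satisfy these requirements through storage methods that exploit the characteristics of multimedia data. However, research on NOD systems remains insufficient, and there is almost no research on news data in clustered environments. General storage methods known to suit VOD data are not necessarily suitable for NOD data. In this paper, we consider the characteristics of NOD data, such as small volume and skewed popularity distribution, and look among previously studied data storage methods for a striping policy that fits a clustered storage server environment.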
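
For readers unfamiliar with striping, the sketch below shows a generic round-robin placement with a staggered start node. It is not the policy proposed in the paper, which the abstract does not specify; the function name, the stagger rule, and all parameter values are assumptions made for illustration.

```python
def stripe(object_id: int, size: int, stripe_unit: int, n_nodes: int) -> list[tuple[int, int]]:
    """Return (node, block_index) placements for one stored object.

    Blocks are placed round-robin; the start node is staggered by object id
    so that small objects do not all begin on node 0.
    """
    n_blocks = -(-size // stripe_unit)  # ceiling division
    start = object_id % n_nodes
    return [((start + b) % n_nodes, b) for b in range(n_blocks)]


# A 10-unit object with stripe unit 4 on a 4-node cluster needs 3 blocks.
print(stripe(object_id=5, size=10, stripe_unit=4, n_nodes=4))
```

With small objects, a large stripe unit leaves each object on a single node, so load balance then depends mostly on how popular objects are spread across nodes; this is one reason policies tuned for large VOD files may not carry over.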

Testing Independence in Contingency Tables with Clustered Data

  • 정광모;이현영
    • 응용통계연구 / Vol. 17, No. 2 / pp.337-346 / 2004
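  • For two-way contingency tables from random samples, Pearson's chi-squared goodness-of-fit test and the likelihood ratio test are commonly used to test independence. However, for contingency tables from clustered data rather than random samples, these tests give misleading results. In such cases, heterogeneity between clusters and dependence within clusters can be reflected by considering a generalized linear mixed model that includes random effects for clusters in addition to the fixed effects of covariates. In this study, we introduce generalized linear mixed models for contingency tables of clustered data and discuss the fitting of these models through a real example.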

Genetic variation and relationship of Artemisia capillaris Thunb.(Compositae) by RAPD analysis

  • Kim, Jung-Hyun;Kim, Dong-Kap;Kim, Joo-Hwan
    • 한국자원식물학회지 / Vol. 22, No. 3 / pp.242-247 / 2009
  • Randomly Amplified Polymorphic DNA (RAPD) analysis was performed to define the genetic variation and relationships of Artemisia capillaris. Fifteen populations, selected by distribution and habitat, were collected for RAPD analysis. RAPD markers were observed mainly between 300 bp and 1600 bp. A total of 72 scorable markers from 7 primers were used to generate the genetic matrix; 69 bands were polymorphic and only 3 were monomorphic. A genetic dissimilarity matrix based on Nei's genetic distance (1972) and a UPGMA phenogram were produced from the data matrix. Populations of Artemisia capillaris clustered with high genetic affinities, and cluster patterns were correlated with distributional patterns. Two major groups were formed: a southern area group and a middle area group. The closest OTUs were GW2 and GG1 in the middle area group, and GB1 from the southern area group clustered with OTUs in the middle area group. RAPD data were useful for defining the genetic variation and relationships of A. capillaris.

The role of family types clustered based on the intra system dynamics elements in explaining housewives' managerial behavior

  • 이연숙
    • 대한가정학회지 / Vol. 34, No. 4 / pp.295-308 / 1996
  • The purpose of this study was to explore how family types, clustered on the basis of intra-system dynamics, explain housewives' managerial behavior. The data were collected by means of a questionnaire distributed to a stratified sample of 544 housewives in Seoul who lived with a husband and children. The questionnaire included FACES Ⅱ and Ⅲ, a Communication Scale, a Managerial Behavior Scale, and a Life Satisfaction Scale. Frequencies, percentiles, means, correlations, factor analysis, cluster analysis, one-way ANOVA with the Scheffé test, and multiple regression were used to analyze the data. The study produced three major findings. First, families were clustered into four types: structured-separated, flexible-connected, change-oriented enmeshed, and rigid-disengaged. Second, managerial behavior differed among the four family types: housewives whose families were more connected with each other and adapted more easily to changing situations showed better managerial behavior. Third, housewives' managerial behavior was better explained by family type than by socio-demographic variables. Recommendations for future research and better ways to foster effective managerial behavior were suggested.

Classification of basin characteristics related to inundation using clustering

  • 이한승;조재웅;강호선;황정근;문혜진
    • 한국수자원학회: Conference Proceedings / 한국수자원학회 2020 Conference / pp.96-96 / 2020
  • To establish risk criteria for inundation caused by typhoons or heavy rainfall, research is underway to predict the limit rainfall using basin characteristics, limit rainfall, and artificial intelligence algorithms. To improve model performance in estimating the limit rainfall, the learning data are pre-processed before use. When 50.0% of the entire data set was removed as outliers during pre-processing, the accuracy was confirmed to exceed 90%. However, the utilization rate of the learning data is then very low, which limits the range of characteristics that can be considered. Accordingly, to predict the limit rainfall in a way that reflects various watershed characteristics by increasing the utilization rate of the learning data, watersheds with similar characteristics were clustered. The algorithms used for clustering are K-Means, Agglomerative, DBSCAN, and Spectral Clustering. The K-Means, DBSCAN, and Agglomerative algorithms cluster the watersheds according to the impervious area ratio, whereas the Spectral Clustering algorithm produces clusters of various forms depending on its parameters. If the clustering results are applied to the limit rainfall prediction algorithm, various watershed characteristics can be considered and, at the same time, the performance of limit rainfall prediction is expected to improve.
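
To make the clustering step concrete, here is a minimal pure-Python k-means on a single feature. It stands in for only one of the four algorithms named in the abstract, and the impervious area ratios, the number of clusters, and the initialization rule are all assumptions made for illustration, not values from the study.

```python
def kmeans_1d(values: list[float], k: int, iters: int = 100) -> tuple[list[int], list[float]]:
    """Plain k-means on one feature; returns (labels, centers)."""
    pts = sorted(values)
    # Deterministic start: evenly spaced order statistics of the data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster (keep the old center if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical impervious area ratios for eight watersheds.
ratios = [0.05, 0.08, 0.10, 0.45, 0.50, 0.55, 0.85, 0.90]
labels, centers = kmeans_1d(ratios, k=3)
print(labels, centers)
```

A separate limit rainfall model could then be trained per cluster label, which is the kind of use the abstract describes for the clustering results.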
