• Title/Summary/Keyword: sampling bias

Adjusting sampling bias in case-control genetic association studies

  • Seo, Geum Chu;Park, Taesung
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1127-1135 / 2014
  • Genome-wide association studies (GWAS) are designed to discover genetic variants such as single nucleotide polymorphisms (SNPs) that are associated with human complex traits. Although there is increasing interest in applying GWAS methodologies to population-based cohorts, many published GWAS have adopted a case-control design, which raises the issue of sampling bias in both case and control samples. Because of unequal selection probabilities between cases and controls, the samples are not representative of the population that they are purported to represent. Therefore, non-random sampling in a case-control study can potentially lead to inconsistent and biased estimates of SNP-trait associations. In this paper, we propose inverse-probability-of-sampling weights based on disease prevalence to eliminate case-control sampling bias in estimation and testing for association between SNPs and quantitative traits. We apply the proposed method to data from the Korea Association Resource project and show that the standard estimators applied to the weighted data yield unbiased estimates.
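The abstract above does not spell out the weight formula, so the following is only a minimal sketch of one common prevalence-based form of inverse-probability-of-sampling weights: each case is weighted by the population prevalence divided by the sample case fraction, and each control by the complementary ratio. The function names `sampling_weights` and `weighted_mean`, and the toy numbers, are illustrative assumptions and are not taken from the paper.

```python
def sampling_weights(is_case, prevalence):
    """Return one inverse-probability-of-sampling weight per subject.

    Assumed form (not taken from the paper): a case gets
    prevalence / (sample case fraction), a control gets
    (1 - prevalence) / (sample control fraction).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    n = len(is_case)
    n_cases = sum(1 for c in is_case if c)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    w_case = prevalence / (n_cases / n)
    w_control = (1.0 - prevalence) / (n_controls / n)
    return [w_case if c else w_control for c in is_case]


def weighted_mean(values, weights):
    """Weighted mean of a quantitative trait."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Toy sample: cases are 50% of the sample but assumed 10% of the population.
    is_case = [True, True, False, False]
    trait = [10.0, 10.0, 20.0, 20.0]
    weights = sampling_weights(is_case, prevalence=0.1)
    print(weighted_mean(trait, weights))
```

In the toy sample the unweighted trait mean is 15, while the weighted mean moves toward the control value because controls are under-represented relative to the assumed population.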

Time-Balanced Quota Sampling for Telephone Survey (전화조사를 위한 시간균형할당표본추출)

  • Huh, Myung-Hoe;Hwang, Jin-Mo
    • Survey Research / v.7 no.2 / pp.39-52 / 2006
  • Most Korean survey institutions adopt quota sampling for telephone surveys based on region, gender, and age band. On weekdays, it is well known that daytime at-home rates differ substantially by individuals' socio-demographic attributes, so quota sampling may induce systematic respondent selection bias. To solve the problem, we propose "time-balanced quota sampling", in which the interviewer's call time band is added as a quota variable. Furthermore, we propose "time-balanced quasi-quota sampling", which is derived by partially relaxing the evening time quotas in time-balanced quota sampling. We compare the conventional and the newly proposed quota sampling schemes by drawing Monte Carlo samples from a hypothetical population for which the Korea 2004 time use survey data are assumed.

Efficient Markov Chain Monte Carlo for Bayesian Analysis of Neural Network Models

  • Paul E. Green;Changha Hwang;Lee, Sangbock
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.63-75 / 2002
  • Most attempts at Bayesian analysis of neural networks involve hierarchical modeling. We believe that similar results can be obtained with simpler models that require less computational effort, as long as appropriate restrictions are placed on parameters in order to ensure propriety of posterior distributions. In particular, we adopt a model first introduced by Lee (1999) that utilizes an improper prior for all parameters. Straightforward Gibbs sampling is possible, with the exception of the bias parameters, which are embedded in nonlinear sigmoidal functions. In addition to the problems posed by nonlinearity, direct sampling from the posterior distributions of the bias parameters is further complicated by the duplication of hidden nodes, which is a source of multimodality. In this regard, we focus on sampling from the marginal posterior distribution of the bias parameters with Markov chain Monte Carlo methods that combine traditional Metropolis sampling with a slice sampler described by Neal (1997, 2001). The methods are illustrated with data examples that are largely confined to the analysis of nonparametric regression models.
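For readers unfamiliar with slice sampling, here is a minimal sketch of a generic univariate slice sampler with stepping-out and shrinkage. It is not the paper's combined Metropolis and slice scheme for neural network bias parameters; the function name `slice_sample`, its parameters, and the standard normal target used in the example are assumptions made purely for illustration.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, max_steps=50, rng=None):
    """Draw n_samples from an unnormalized univariate log density."""
    rng = rng or random.Random()
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary height under the density, kept on the log scale.
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: place an interval of size `width` randomly around x
        # and extend each end until it leaves the slice (or the step cap hits).
        left = x - width * rng.random()
        right = left + width
        steps = max_steps
        while steps > 0 and log_density(left) > log_y:
            left -= width
            steps -= 1
        steps = max_steps
        while steps > 0 and log_density(right) > log_y:
            right += width
            steps -= 1
        # Shrinkage: propose uniformly in the interval, shrinking it toward
        # the current point after every rejected proposal.
        while True:
            x_new = left + (right - left) * rng.random()
            if log_density(x_new) >= log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Example target (an assumption for illustration): standard normal.
    draws = slice_sample(lambda x: -0.5 * x * x, 0.0, 2000, rng=random.Random(0))
    print(sum(draws) / len(draws))
```

The sampler needs only the log density up to an additive constant, which is why the approach suits parameters embedded in nonlinear functions where the conditional distribution has no standard form.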

Comparison of Latin Hypercube Sampling and Simple Random Sampling Applied to Neural Network Modeling of HfO2 Thin Film Fabrication

  • Lee, Jung-Hwan;Ko, Young-Don;Yun, Il-Gu;Han, Kyong-Hee
    • Transactions on Electrical and Electronic Materials / v.7 no.4 / pp.210-214 / 2006
  • In this paper, two sampling methods, Latin hypercube sampling (LHS) and simple random sampling, were compared with the aim of improving the modeling speed of a neural network model. The sampling methods were used to generate the initial weight and bias sets. Electrical characteristic data for $HfO_2$ thin film were used as the modeling data. Ten initial parameter sets, each consisting of initial weights and biases, were generated using LHS and simple random sampling, respectively. Modeling was performed with the generated initial parameters, and the number of epochs was measured. The other network parameters were fixed. The 20 minimum epoch numbers obtained iteratively for LHS and simple random sampling were analyzed by a nonparametric method because of their non-normality.
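As a rough illustration of how Latin hypercube sampling differs from simple random sampling when generating initial parameter sets, the sketch below draws points so that every dimension has exactly one value in each of `n_samples` equal-width strata. The function name `latin_hypercube`, the range of -1 to 1, and the choice of 5 parameters are assumptions; the abstract does not give the network size or the initialization range.

```python
import random


def latin_hypercube(n_samples, n_dims, low=-1.0, high=1.0, rng=None):
    """Return n_samples points in [low, high)^n_dims, one per stratum per dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        # Assign each sample to a distinct stratum, in random order.
        strata = list(range(n_samples))
        rng.shuffle(strata)
        # Draw one uniform value inside each assigned stratum of [0, 1).
        columns.append([(k + rng.random()) / n_samples for k in strata])
    # Rescale to the requested range and regroup by sample.
    return [
        [low + (high - low) * columns[d][i] for d in range(n_dims)]
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Ten initial parameter sets for a hypothetical network with 5 weights/biases.
    for params in latin_hypercube(10, 5, rng=random.Random(42)):
        print([round(p, 3) for p in params])
```

Simple random sampling would instead draw every coordinate independently, so some strata could be empty while others hold several points.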

The Weighting Adjustment for Unit Nonresponse in the Stratified Sampling (층화 표본에서 단위 무응답에 대한 가중치 조정 방법)

  • 염준근;손창균
    • Journal of Korean Society for Quality Management / v.26 no.3 / pp.82-99 / 1998
  • In sample surveys, nonresponse reduces the precision of the estimator because of the nonresponse bias of the estimator. Deville et al. (1993) considered the generalized raking procedure with auxiliary information under five distance measures for reducing the nonresponse bias of the estimator. This paper extends the classical weighting adjustment of Deville et al. (1993) to the stratified sampling case, using three of the five distance measures.

Sampling Bias of Discontinuity Orientation Measurements for Rock Slope Design in Linear Sampling Technique : A Case Study of Rock Slopes in Western North Carolina (선형 측정 기법에 의해 발생하는 불연속면 방향성의 왜곡 : 서부 North Carolina의 암반 사면에서의 예)

  • 박혁진
    • Journal of the Korean Geotechnical Society / v.16 no.1 / pp.145-155 / 2000
  • Orientation data of discontinuities are of paramount importance for rock slope stability studies because they control the possibility of unstable conditions or excessive deformation. Most orientation data are collected by using linear sampling techniques, such as borehole fracture mapping and the detailed scanline method (outcrop mapping). However, data acquired by these linear sampling techniques are subject to bias owing to the orientation of the sampling line. Even though a weighting factor is applied to orientation data in order to reduce this bias, the bias will not be significantly reduced when certain sampling orientations are involved. That is, if the linear sampling orientation is nearly parallel to the discontinuity orientation, most orientation data for discontinuities parallel to the sampling line will be excluded from the survey result. This phenomenon can cause serious misinterpretation of discontinuity orientation data because critical information is omitted. In the case study, orientation data collected by using the borehole fracture mapping method (vertical scanline) were compared to those from the detailed scanline method (horizontal scanline). Differences in results for the two procedures revealed a concern that a representative orientation of discontinuities was not obtained. Equal-area, polar stereo nets were used to determine the distribution of dip angles and to compare the data distribution for the borehole method versus that for the scanline method.
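The abstract does not name the weighting factor it refers to. One widely used choice is the Terzaghi correction, which weights each observation by the reciprocal of the cosine of the angle between the sampling line and the discontinuity normal. The sketch below assumes that correction; the function names, the north/east/down coordinate convention, and the cap `max_weight` are illustrative assumptions.

```python
import math


def unit_vector(trend_deg, plunge_deg):
    """Unit vector (north, east, down) of a line given its trend and plunge."""
    t = math.radians(trend_deg)
    p = math.radians(plunge_deg)
    return (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), math.sin(p))


def terzaghi_weight(dip_direction_deg, dip_deg, line_trend_deg, line_plunge_deg,
                    max_weight=10.0):
    """Weight 1/cos(delta), where delta is the angle between the sampling
    line and the discontinuity normal, capped at max_weight."""
    # The pole (normal) of a plane plunges opposite to its dip direction.
    normal = unit_vector(dip_direction_deg + 180.0, 90.0 - dip_deg)
    line = unit_vector(line_trend_deg, line_plunge_deg)
    cos_delta = abs(sum(a * b for a, b in zip(normal, line)))
    if cos_delta < 1.0 / max_weight:
        # Discontinuity nearly parallel to the line: the raw weight would blow up.
        return max_weight
    return 1.0 / cos_delta


if __name__ == "__main__":
    # Vertical borehole (plunge 90) against joints of increasing dip.
    for dip in (0, 30, 60, 85, 90):
        print(dip, round(terzaghi_weight(90.0, dip, 0.0, 90.0), 3))
```

The cap shows the limitation described above: a discontinuity nearly parallel to the sampling line is rarely intersected at all, so no finite weight can recover observations that were never recorded.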

Estimation of P(X > Y) when X and Y are dependent random variables using different bivariate sampling schemes

  • Samawi, Hani M.;Helu, Amal;Rochani, Haresh D.;Yin, Jingjing;Linder, Daniel
    • Communications for Statistical Applications and Methods / v.23 no.5 / pp.385-397 / 2016
  • Stress-strength models have been intensively investigated in the literature with regard to estimating the reliability $\theta$ = P(X > Y) using parametric and nonparametric approaches under different sampling schemes when X and Y are independent random variables. In this paper, we consider the problem of estimating $\theta$ when (X, Y) are dependent random variables with a bivariate underlying distribution. The empirical and kernel estimates of $\theta$ = P(X > Y) based on bivariate ranked set sampling (BVRSS) are considered when (X, Y) are paired dependent continuous random variables. The estimators obtained are compared to their counterparts based on bivariate simple random sampling (BVSRS) via the bias and mean square error (MSE). We demonstrate that the suggested estimators based on BVRSS are more efficient than those based on BVSRS. A simulation study is conducted to gain insight into the performance of the proposed estimators. A real data example is provided to illustrate the process.
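As a point of reference, the empirical estimate of P(X > Y) from paired observations under simple random sampling is just the fraction of pairs in which X exceeds Y. The sketch below shows only that baseline; it does not implement ranked set sampling or the kernel estimator, and the function name `empirical_theta` and the toy data are assumptions.

```python
def empirical_theta(pairs):
    """Empirical estimate of P(X > Y) from paired (x, y) observations."""
    if not pairs:
        raise ValueError("need at least one (x, y) pair")
    return sum(1 for x, y in pairs if x > y) / len(pairs)


if __name__ == "__main__":
    # Toy paired data; ties (x == y) are not counted as x > y.
    data = [(3.0, 1.0), (2.0, 5.0), (4.0, 4.0), (6.0, 2.0)]
    print(empirical_theta(data))
```

Because each pair is compared only with itself, the estimate stays valid when X and Y are dependent, which is the setting the abstract addresses.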

A Novel Simulation Architecture of Configurational-Bias Gibbs Ensemble Monte Carlo for the Conformation of Polyelectrolytes Partitioned in Confined Spaces

  • Chun, Myung-Suk
    • Macromolecular Research / v.11 no.5 / pp.393-397 / 2003
  • By applying a configurational-bias Gibbs ensemble Monte Carlo algorithm, priority simulation results regarding the conformation of non-dilute polyelectrolytes in solvents are obtained. Solutions of freely-jointed chains are considered, and a new method termed strandwise configurational-bias sampling is developed to effectively overcome a difficulty in the transfer of polymer chains. The structure factors of polyelectrolytes in the bulk as well as in the confined space are estimated with variations of the polymer charge density.

BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

  • Sanggeon Yun;Seungshik Kang;Hyeokman Kim
    • Journal of Information Processing Systems / v.19 no.5 / pp.641-651 / 2023
  • Malicious hate speech and gender bias comments are common in online communities, causing social problems in our society. Gender bias and hate speech detection has been investigated, but it remains difficult because there are diverse ways to express such content in words. To solve this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models utilizing hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our model in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an F1-score of 0.7711 was achieved using an ensemble of the Soongsil-BERT and KcELECTRA models. The general bias task included the gender bias task, and the ensemble model achieved the best F1-score of 0.7166.

Off-level Sampling Method for Bias Stabilization of an Electro-Optic Mach-Zehnder Modulator (전기 광학 광변조기의 바이어스 안정화를 위한 오프 레벨 샘플링 방법)

  • 양충열;홍현하;김해근
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.1B / pp.42-47 / 2000
  • A new method for stabilizing the bias of an electro-optic Mach-Zehnder modulator has been developed to maximize the switching extinction ratio in burst-mode packet traffic. By sampling and minimizing the off-level output power of the modulator, a high-extinction optical gate switch is obtained regardless of variations in the packet traffic density.
