• Title/Summary/Keyword: False Discovery Rate

Comparison of multiscale multiple change-points estimators (SMUCE와 FDR segmentation 방법에 의한 다중변화점 추정법 비교)

  • Kim, Jaehee
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.561-572 / 2019
  • We study false discovery rate segmentation (FDRSeg) and simultaneous multiscale change-point estimator (SMUCE) methods for multiscale multiple change-point estimation, and compare their empirical behavior via simulation. FDRSeg is based on the control of a false discovery rate, while SMUCE is based on multiscale local likelihood ratio tests. FDRSeg seems to work best if the number of change-points is large; however, FDRSeg and SMUCE can provide similar estimation results when there are only a small number of change-points. As a real data application, multiple change-point estimation is carried out on the well-log data.

Multivariate Process Control Chart for Controlling the False Discovery Rate

  • Park, Jang-Ho;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.11 no.4 / pp.385-389 / 2012
  • With the development of computer storage and the rapidly growing ability to process large amounts of data, multivariate control charts have received increasing attention. The existing univariate and multivariate control charts take a single hypothesis testing approach to the process mean or variance by plotting a single statistic. This paper proposes a multiple hypothesis approach to developing a new multivariate control scheme. Plotted Hotelling's $T^2$ statistics are used to compute the corresponding p-values, and the procedure for controlling the false discovery rate in multiple hypothesis testing is applied to the proposed control scheme. Numerical simulations were carried out to compare the performance of the proposed control scheme with the ordinary multivariate Shewhart chart in terms of the average run length. The results show that the proposed control scheme outperforms the existing multivariate Shewhart chart for all mean shifts.
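
The idea of turning plotted $T^2$ statistics into p-values and then applying an FDR rule can be sketched as follows. This is only an illustration under strong assumptions that are not taken from the paper: the data are bivariate, the in-control mean and covariance are known (so that $T^2$ follows a chi-square distribution with 2 degrees of freedom), and the FDR rule is the Benjamini-Hochberg step-up applied to a window of recent p-values. The function names are hypothetical.

```python
import math


def hotelling_t2_bivariate(x, mean, cov):
    """T^2 = (x - mean)' cov^{-1} (x - mean) for a 2-dimensional observation."""
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form written out with the closed-form 2x2 inverse.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def t2_pvalue(t2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-t2 / 2.0)


def fdr_signal(pvalues, alpha=0.05):
    """Signal if the Benjamini-Hochberg step-up rejects at least one hypothesis."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    return any(p <= k * alpha / m for k, p in enumerate(ranked, start=1))


# Assumed in-control parameters (hypothetical values for illustration).
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))

# A window of recent observations; the last one is far from the mean.
window = [(0.1, -0.2), (0.5, 0.3), (3.0, 4.0)]
pvals = [t2_pvalue(hotelling_t2_bivariate(x, mean, cov)) for x in window]
print(fdr_signal(pvals, alpha=0.05))
```

With estimated parameters the reference distribution would be an F or beta distribution rather than chi-square, so the p-value step would need to change accordingly.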

Comparison and analysis of multiple testing methods for microarray gene expression data (유전자 발현 데이터에 대한 다중검정법 비교 및 분석)

  • Seo, Sumin;Kim, Tae Houn;Kim, Jaehee
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.971-986 / 2014
  • When thousands of hypotheses are tested simultaneously, the probability of rejecting any true hypotheses increases, and large multiplicity problems arise. To address these problems, researchers have proposed different multiple testing methods, which use the family-wise error rate (FWER), the false discovery rate (FDR) or the false nondiscovery rate (FNR) as the error criterion together with various test statistics. In this article, we discuss the Bonferroni (1960), Holm (1979), Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001) procedures based on T statistics, modified T statistics or local-pooled-error (LPE) statistics. We also consider the Sun and Cai (2007) procedure based on Z statistics. These procedures are compared in a simulation and applied to Arabidopsis microarray gene expression data to identify differentially expressed genes.
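
Three of the procedures named in this abstract (Bonferroni, Holm, and Benjamini-Hochberg) can be compared on a small set of p-values. The sketch below is a minimal textbook-style implementation, not code from the article; the function names and the example p-values are made up for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank starts at 0
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up: find the largest k with p_(k) <= k * alpha / m, reject the k smallest."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Made-up p-values for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
for name, proc in [("Bonferroni", bonferroni), ("Holm", holm), ("BH", benjamini_hochberg)]:
    print(name, sum(proc(pvals)))
```

On this input the FDR-controlling BH procedure rejects more hypotheses than the two FWER-controlling procedures, which is the usual trade-off between the two error criteria.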

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.899-904 / 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively in genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, since a multivariate analysis of variance is not feasible. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure. The false discovery rate (FDR) is better suited than the FWER to such multiple testing problems. The data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with a proposed test statistic based on a pseudo-marginal approach with Hamming distance performs better.
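
For readers unfamiliar with the Hamming distance on categorical sequences, the following sketch shows the basic quantity and a simple per-position disagreement rate between two groups of sequences. It is not the pseudo-marginal test statistic proposed in the paper; the function names and the toy sequences are assumptions made for illustration.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def position_disagreement(group_a, group_b, pos):
    """Fraction of between-group pairs whose categories differ at position pos."""
    pairs = [(a[pos], b[pos]) for a in group_a for b in group_b]
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy nucleotide sequences (hypothetical data).
group_a = ["ACGT", "ACGA"]
group_b = ["ACCT", "TCCT"]

print(hamming_distance("ACGT", "ACCT"))
print([position_disagreement(group_a, group_b, k) for k in range(4)])
```

A per-position quantity of this kind yields one statistic per position, which is what makes a multiple testing procedure such as FDR control necessary afterwards.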

Estimation of Gini-Simpson index for SNP data

  • Kang, Joonsung
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1557-1564 / 2017
  • We consider high-dimensional low sample size (HDLSS) genomic sequences without ordering of the response categories. When constructing an appropriate test statistic in this model, the classical multivariate analysis of variance (MANOVA) approach might not be useful owing to the very large number of parameters and very small sample size. For these reasons, we present a pseudo-marginal model based upon the Gini-Simpson index estimated via a Bayesian approach. In view of the small sample size, we consider the permutation distribution generated by every possible $n!$ (equally likely) permutation of the joined sample observations across $G$ groups of sizes $n_1, \ldots, n_G$. We simulate data and apply the false discovery rate (FDR) and positive false discovery rate (pFDR) with the associated proposed test statistics to the data. We also analyze real SARS data and compute the FDR and pFDR. The FDR and pFDR procedures, along with the associated test statistics for each gene, control the FDR and pFDR respectively at any level $\alpha$ for the set of p-values by using the exact conditional permutation theory.
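
The Gini-Simpson index of a categorical sample is $1 - \sum_k p_k^2$. The sketch below computes the plain plug-in estimate (not the Bayesian estimate used in the paper) and a Monte Carlo permutation p-value for the difference between two groups. Sampling random permutations instead of enumerating all $n!$ of them, the choice of the absolute difference as the statistic, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def gini_simpson(categories):
    """Plug-in estimate 1 - sum_k p_k^2 from observed category labels."""
    n = len(categories)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def permutation_pvalue(group_a, group_b, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for |GS(a) - GS(b)| under label exchange."""
    rng = random.Random(seed)
    observed = abs(gini_simpson(group_a) - gini_simpson(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        stat = abs(gini_simpson(pooled[:n_a]) - gini_simpson(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + hits) / (1 + n_perm)


# Toy data at a single position (hypothetical).
print(gini_simpson(["A", "A", "G", "T"]))
print(permutation_pvalue(["A", "A", "A", "A", "A"], ["A", "C", "G", "T", "A"]))
```

Repeating this for each gene or position gives a set of p-values to which an FDR or pFDR procedure can then be applied.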

Robust inference with order constraint in microarray study

  • Kang, Joonsung
    • Communications for Statistical Applications and Methods / v.25 no.5 / pp.559-568 / 2018
  • Gene classification can involve complex order-restricted inference. Examining gene expression patterns across groups with order restrictions makes standard statistical inference ineffective and thus requires different methods. For this problem, Roy's union-intersection principle has some merit. The M-estimator adjusting for outlier arrays in a microarray study produces a robust test statistic with distribution-insensitive clustering of genes. The M-estimator in conjunction with the union-intersection principle provides a nonstandard robust procedure. By exact permutation distribution theory, a conditionally distribution-free test based on the proposed test statistic generates corresponding p-values in a small sample size setup. We apply a false discovery rate (FDR) procedure as a multiple testing procedure to the p-values in simulated data and real microarray data. The FDR procedure for the proposed test statistics controls the FDR at all levels of $\alpha$ and $\pi_0$ (the proportion of true nulls); however, the FDR procedure for test statistics based upon normal theory (ANOVA) fails to control the FDR.

Separating Signals and Noises Using Mixture Model and Multiple Testing (혼합모델 및 다중 가설 검정을 이용한 신호와 잡음의 분류)

  • Park, Hae-Sang;Yoo, Si-Won;Jun, Chi-Hyuck
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.759-770 / 2009
  • The problem of separating signals from noise is considered when the two are randomly mixed in the observations. It is assumed that the noise follows a Gaussian distribution and the signal follows a Gamma distribution, so the underlying distribution of an observation is a mixture of Gaussian and Gamma distributions. The parameters of the mixture model are estimated by the EM algorithm. The signals and noise are then classified by a fixed threshold approach based on multiple testing, using the positive false discovery rate and the Bayes error. The proposed method is applied to real optical emission spectroscopy data for the quantitative analysis of inclusions. A simulation is carried out to compare its performance with the existing method based on the 3-sigma rule.

Intrusion Detection: Supervised Machine Learning

  • Fares, Ahmed H.;Sharawy, Mohamed I.;Zayed, Hala H.
    • Journal of Computing Science and Engineering / v.5 no.4 / pp.305-313 / 2011
  • Due to the expansion of high-speed Internet access, the need for secure and reliable networks has become more critical. The sophistication of network attacks, as well as their severity, has also increased recently. As such, more and more organizations are becoming vulnerable to attack. The aim of this research is to classify network attacks using neural networks (NN), which leads to a higher detection rate and a lower false alarm rate in a shorter time. This paper focuses on two classification types: a single class (normal or attack) and a multi class (normal, DoS, PRB, R2L, U2R), where the category of attack is also detected by the NN. Extensive analysis is conducted in order to assess the translation of symbolic data, the partitioning of the training data and the complexity of the architecture. This paper investigates two engines: the first is the back-propagation neural network intrusion detection system (BPNNIDS) and the second is the radial basis function neural network intrusion detection system (RBFNNIDS). The two engines proposed in this paper are tested against traditional and other machine learning algorithms using a common dataset: the DARPA 98 KDD99 benchmark dataset from International Knowledge Discovery and Data Mining Tools. BPNNIDS shows a superior response compared to the other techniques reported in the literature, especially in terms of response time, detection rate and false positive rate.

Statistical Methods for Gene Expression Data

  • Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.11 no.1 / pp.59-77 / 2004
  • Since the introduction of the DNA microarray, a revolutionary high-throughput biological technology, many papers have been published on the analysis of gene expression data from microarrays. In this paper we review most papers relevant to cDNA microarray data, classify them from the point of view of statistical methods, and present some statistical methods deserving consideration and future study.

Hybrid Neural Networks for Intrusion Detection System

  • Jirapummin, Chaivat;Kanthamanon, Prasert
    • Proceedings of the IEEK Conference / 2002.07b / pp.928-931 / 2002
  • A network-based intrusion detection system is a computer network security tool. In this paper, we present an intrusion detection system based on Self-Organizing Maps (SOM) and Resilient Propagation Neural Network (RPROP) for visualizing and classifying intrusion and normal patterns. We introduce a cluster matching equation for finding principal associated components in component planes. We apply data from The Third International Knowledge Discovery and Data Mining Tools Competition (KDD cup'99) for training and testing our prototype. In our experiments with different network data, our scheme achieves a detection rate of more than 90 percent and a false alarm rate of less than 5 percent for one SYN flooding and two port scanning attack types.
