Controlling the false discovery rate in sparse VHAR models using knockoffs

  • Minsu Park (Department of Statistics, Sungkyunkwan University) ;
  • Jaewon Lee (Department of Statistics, Sungkyunkwan University) ;
  • Changryong Baek (Department of Statistics, Sungkyunkwan University)
  • Received : 2022.05.19
  • Accepted : 2022.07.22
  • Published : 2022.12.31

Abstract

The false discovery rate (FDR) is widely used for inference on high-dimensional data because it offers a more liberal criterion than the family-wise error rate (FWER), which is known to be very conservative in controlling Type I errors. This paper proposes a method for estimating the sparse vector heterogeneous autoregressive (VHAR) model, a high-dimensional long-memory time series model, while controlling the FDR at a given level by adapting the knockoff method introduced by Barber and Candès (2015). We also compare the knockoff method with the conventional adaptive Lasso (AL) through an extensive simulation study. We observe that AL shows sparsistency and decent forecasting performance, but it does not control the FDR satisfactorily. More specifically, AL tends to estimate zero coefficients as non-zero. The knockoff method, on the other hand, keeps the FDR under the desired level, but it selects too sparse a model when the sample size is small. Its performance improves dramatically, however, as the sample size increases and the model becomes sparser.

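As a rough illustration of the selection step behind the knockoff filter cited above, the sketch below applies the knockoff+ threshold of Barber and Candès (2015) to a vector of knockoff statistics W_j and then computes the false discovery proportion against a known support, as one would in a simulation. It assumes the W_j have already been computed (large positive values favor the original variable over its knockoff copy). The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def knockoff_plus_threshold(w, q):
    """Smallest t in {|w_j| : w_j != 0} such that
    (1 + #{j : w_j <= -t}) / max(1, #{j : w_j >= t}) <= q.
    Returns math.inf when no candidate satisfies the bound."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices j with w_j at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


def false_discovery_proportion(selected, true_support):
    """Share of selected indices outside the true support (0 if none selected)."""
    if not selected:
        return 0.0
    false_hits = sum(1 for j in selected if j not in true_support)
    return false_hits / len(selected)


if __name__ == "__main__":
    # Hypothetical statistics: indices 0-4 are true signals, the rest are nulls.
    w = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 0.1]
    chosen = knockoff_select(w, q=0.2)
    print(chosen, false_discovery_proportion(chosen, {0, 1, 2, 3, 4}))
```

When few statistics are clearly positive, the "+1" in the numerator can keep the ratio above q for every candidate, so nothing is selected. This is in line with the overly sparse selections the abstract reports for small samples.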

Acknowledgement

This work was carried out as a basic research project supported by the National Research Foundation of Korea (NRF-2022R1F1A1066209).

References

  1. Baek C and Park M (2021). Sparse vector heterogeneous autoregressive modeling for realized volatility, Journal of the Korean Statistical Society, 50, 495-510.  https://doi.org/10.1007/s42952-020-00090-5
  2. Barber RF and Candès EJ (2015). Controlling the false discovery rate via knockoffs, The Annals of Statistics, 43, 2055-2085.  https://doi.org/10.1214/15-AOS1337
  3. Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological), 57, 289-300.  https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
  4. Breaux HJ (1967). On stepwise multiple linear regression, Army Ballistic Research Lab Aberdeen Proving Ground, Maryland. 
  5. Candès E, Fan Y, Janson L, and Lv J (2018). Panning for gold: 'Model-X' knockoffs for high-dimensional controlled variable selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 551-577.  https://doi.org/10.1111/rssb.12265
  6. Corsi F (2009). A simple approximate long-memory model of realized volatility, Journal of Financial Econometrics, 7, 174-196.  https://doi.org/10.1093/jjfinec/nbp001
  7. Desboulets LDD (2018). A review on variable selection in regression analysis, Econometrics, 6, 45. 
  8. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 1348-1360.  https://doi.org/10.1198/016214501753382273
  9. Hochberg Y (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika, 75, 800-802.  https://doi.org/10.1093/biomet/75.4.800
  10. Patterson E and Sesia M (2020). knockoff: The Knockoff Filter for Controlled Variable Selection, R package version 0.3.3.
  11. Simes RJ (1986). An improved Bonferroni procedure for multiple tests of significance, Biometrika, 73, 751-754.  https://doi.org/10.1093/biomet/73.3.751
  12. Tibshirani R (1996). Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 267-288.  https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  13. Zhang CH (2010). Nearly unbiased variable selection under minimax concave penalty, The Annals of Statistics, 38, 894-942.  https://doi.org/10.1214/09-AOS729
  14. Zhang CH and Huang J (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression, The Annals of Statistics, 36, 1567-1594.  https://doi.org/10.1214/07-AOS520
  15. Zhao P and Yu B (2006). On model selection consistency of Lasso, The Journal of Machine Learning Research, 7, 2541-2563. 
  16. Zou H (2006). The adaptive Lasso and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429. https://doi.org/10.1198/016214506000000735