Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B. (School of Statistics, University of the Philippines Diliman)
  • Received : 2015.10.29
  • Accepted : 2015.11.20
  • Published : 2015.11.30

Abstract

We present a survey of contributions that have defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel, which focused on robust estimation of a location parameter, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. These extensions include linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also discuss the current interest in robust statistics and conclude with suggestions on possible future directions of this area of statistical science.

References

  1. Alhamzawi, R. (2015). Model selection in quantile regression models, Journal of Applied Statistics, 42, 445-458. https://doi.org/10.1080/02664763.2014.959905
  2. Atkinson, A. C. (1994). Fast very robust methods for the detection of multiple outliers, Journal of the American Statistical Association, 89, 1329-1339. https://doi.org/10.1080/01621459.1994.10476872
  3. Atkinson, A. C. (2009). Econometric applications of the forward search in regression: Robustness, diagnostics, and graphics, Econometric Reviews, 28, 21-39.
  4. Atkinson, A. C. and Cheng, T. C. (2000). On robust linear regression with incomplete data, Computational Statistics & Data Analysis, 33, 361-380. https://doi.org/10.1016/S0167-9473(99)00061-4
  5. Atkinson, A. C. and Riani, M. (2007a). Exploratory tools for clustering multivariate data, Computational Statistics & Data Analysis, 52, 272-285. https://doi.org/10.1016/j.csda.2006.12.034
  6. Atkinson, A. C. and Riani, M. (2007b). Building regression models with the forward search, Journal of Computing and Information Technology, 15, 287-294. https://doi.org/10.2498/cit.1001135
  7. Bastero, R. F. and Barrios, E. B. (2011). Robust estimation of a spatiotemporal model with structural change, Communications in Statistics-Simulation and Computation, 40, 448-468. https://doi.org/10.1080/03610918.2010.543298
  8. Beran, R. (1982). Robust estimation in models for independent non-identically distributed data, The Annals of Statistics, 10, 415-428. https://doi.org/10.1214/aos/1176345783
  9. Bertaccini, B. and Varriale, R. (2007). Robust analysis of variance: An approach based on the forward search, Computational Statistics & Data Analysis, 51, 5172-5183. https://doi.org/10.1016/j.csda.2006.08.010
  10. Campano, W. Q. and Barrios, E. B. (2011). Robust estimation of a time series model with structural change, Journal of Statistical Computation and Simulation, 81, 909-927. https://doi.org/10.1080/00949650903575211
  11. Cantoni, E. and Ronchetti, E. (2001). Robust inference for generalized linear models, Journal of the American Statistical Association, 96, 1022-1030. https://doi.org/10.1198/016214501753209004
  12. Cao, F., Ye, H. and Wang, D. (2015). A probabilistic learning algorithm for robust modeling using neural networks with random weights, Information Sciences, 313, 62-78. https://doi.org/10.1016/j.ins.2015.03.039
  13. Carroll, R. J. and Ruppert, D. (1982). Robust estimation in heteroscedastic linear models, The Annals of Statistics, 10, 429-441. https://doi.org/10.1214/aos/1176345784
  14. Chang, L., Hu, B., Chang, G. and Li, A. (2013). Robust derivative-free Kalman filter based on Huber's M-estimation, Journal of Process Control, 23, 1555-1561. https://doi.org/10.1016/j.jprocont.2013.05.004
  15. Cizek, P. (2008). Robust and efficient adaptive estimation of binary-choice regression models, Journal of the American Statistical Association, 103, 687-696. https://doi.org/10.1198/016214508000000175
  16. Cizek, P. (2012). Semiparametric robust estimation of truncated and censored regression models, Journal of Econometrics, 168, 347-366. https://doi.org/10.1016/j.jeconom.2012.02.002
  17. Cressie, N. and Hawkins, D. M. (1980). Robust estimation of the variogram: I, Mathematical Geology, 12, 115-125. https://doi.org/10.1007/BF01035243
  18. Dang, V. A., Kim, M. and Shin, Y. (2015). In search of robust methods for dynamic panel data models in empirical corporate finance, Journal of Banking & Finance, 53, 84-98. https://doi.org/10.1016/j.jbankfin.2014.12.009
  19. de Luna, X. and Genton, M. G. (2001). Robust simulation-based estimation of ARMA models, Journal of Computational and Graphical Statistics, 10, 370-387. https://doi.org/10.1198/10618600152628347
  20. Dogan, O. and Taspinar, S. (2014). Spatial autoregressive models with unknown heteroscedasticity: A comparison of Bayesian and robust GMM approach, Regional Science and Urban Economics, 45, 1-21. https://doi.org/10.1016/j.regsciurbeco.2013.12.003
  21. Field, C. A., Pang, Z. and Welsh, A. H. (2010). Bootstrapping robust estimates for clustered data, Journal of the American Statistical Association, 105, 1606-1616. https://doi.org/10.1198/jasa.2010.tm09541
  22. Furno, M. (2004). ARCH tests and quantile regressions, Journal of Statistical Computation and Simulation, 74, 277-292. https://doi.org/10.1080/0094965031000151178
  23. Gaglianone, W. P., Lima, L. R., Linton, O. and Smith, D. R. (2011). Evaluating value-at-risk models via quantile regression, Journal of Business & Economic Statistics, 29, 150-160. https://doi.org/10.1198/jbes.2010.07318
  24. Hampel, F. R. (1971). A general qualitative definition of robustness, The Annals of Mathematical Statistics, 42, 1887-1896. https://doi.org/10.1214/aoms/1177693054
  25. Hampel, F. R. (1973). Robust estimation: A condensed partial survey, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 27, 87-104. https://doi.org/10.1007/BF00536619
  26. Hampel, F. R. (1974). The influence curve and its role in robust estimation, Journal of the American Statistical Association, 69, 383-393. https://doi.org/10.1080/01621459.1974.10482962
  27. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New York.
  28. Hardle, W. (1984). Robust regression function estimation, Journal of Multivariate Analysis, 14, 169-180. https://doi.org/10.1016/0047-259X(84)90003-4
  29. Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Chapman and Hall, London.
  30. He, X. and Zhu, L. X. (2003). A lack-of-fit test for quantile regression, Journal of the American Statistical Association, 98, 1013-1022. https://doi.org/10.1198/016214503000000963
  31. He, X., Fung, W. Z. and Zhu, Z. (2005). Robust estimation in generalized partial linear models for clustered data, Journal of the American Statistical Association, 100, 1176-1184. https://doi.org/10.1198/016214505000000277
  32. Hettmansperger, T. P. and McKean, J. W. (1988). Robust Nonparametric Statistical Methods, Arnold, London.
  33. Hettmansperger, T. P., McKean, J. W. and Sheather, S. J. (2000). Robust nonparametric methods, Journal of the American Statistical Association, 95, 1308-1312. https://doi.org/10.1080/01621459.2000.10474337
  34. Hoshino, T. (2014). Quantile regression estimation of partially linear additive models, Journal of Nonparametric Statistics, 26, 509-536. https://doi.org/10.1080/10485252.2014.929675
  35. Huang, A. Y. H. (2012). Volatility forecasting by quantile regression, Applied Economics, 44, 423-433. https://doi.org/10.1080/00036846.2010.508727
  36. Huber, P. J. (1964). Robust estimation of a location parameter, The Annals of Mathematical Statistics, 35, 73-101. https://doi.org/10.1214/aoms/1177703732
  37. Huber, P. J. (1972). The 1972 Wald lecture. Robust statistics: A review, The Annals of Mathematical Statistics, 43, 1041-1067. https://doi.org/10.1214/aoms/1177692459
  38. Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo, The Annals of Statistics, 1, 799-821. https://doi.org/10.1214/aos/1176342503
  39. Huber, P. J. (2002). John W. Tukey's contributions to robust statistics, The Annals of Statistics, 30, 1640-1648. https://doi.org/10.1214/aos/1043351251
  40. Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics, 2nd ed., John Wiley and Sons, New York.
  41. Hubert, M. and Rousseeuw, P. J. (1997). Robust regression with both continuous and binary regressors, Journal of Statistical Planning and Inference, 57, 153-163. https://doi.org/10.1016/S0378-3758(96)00041-9
  42. Hung, K. W. and Siu, W. C. (2015). Learning-based image interpolation via robust k-NN searching for coherent AR parameters estimation, Journal of Visual Communication and Image Representation, 31, 305-311. https://doi.org/10.1016/j.jvcir.2015.07.006
  43. Karunamuni, R. J., Tang, Q. and Zhao, B. (2015). Robust and efficient estimation of effective dose, Computational Statistics & Data Analysis, 90, 47-60. https://doi.org/10.1016/j.csda.2015.04.001
  44. Kelly, G. E. and Lindsey, J. K. (2002). Robust estimation of the median lethal dose, Journal of Biopharmaceutical Statistics, 12, 137-147. https://doi.org/10.1081/BIP-120014416
  45. Kitromilidou, S. and Fokianos, K. (2015). Robust estimation methods for a class of log-linear count time series models, Journal of Statistical Computation and Simulation. https://doi.org/10.1080/00949655.2015.1035271
  46. Kim, M. O. and Yang, Y. (2011). Semiparametric approach to a random effects quantile regression, Journal of the American Statistical Association, 106, 1405-1417. https://doi.org/10.1198/jasa.2011.tm10470
  47. Li, Y. and Zhu, J. (2008). L1-norm quantile regression, Journal of Computational and Graphical Statistics, 17, 163-185. https://doi.org/10.1198/106186008X289155
  48. Lv, Z., Zhu, H. and Yu, K. (2014). Robust variable selection for nonlinear models with diverging number of parameters, Statistics & Probability Letters, 91, 90-97. https://doi.org/10.1016/j.spl.2014.04.013
  49. Mann, H. B. and Wald, A. (1942). On the choice of the number of class intervals in the application of the chi square test, The Annals of Mathematical Statistics, 13, 306-317. https://doi.org/10.1214/aoms/1177731569
  50. Maronna, R. A. and Zamar, R. H. (2002). Robust estimates of location and dispersion for high dimensional datasets, Technometrics, 44, 307-317. https://doi.org/10.1198/004017002188618509
  51. Mavridis, D. and Moustaki, I. (2009). The forward search algorithm for detecting response patterns in factor analysis for binary data, Journal of Computational and Graphical Statistics, 18, 1016-1034. https://doi.org/10.1198/jcgs.2009.08060
  52. Moscone, F. and Tosetti, E. (2015). Robust estimation under error cross section dependence, Economics Letters, 133, 100-104. https://doi.org/10.1016/j.econlet.2015.05.020
  53. Nassiri, V. and Loris, I. (2013). A generalized quantile regression model, Journal of Applied Statistics, 40, 1090-1105. https://doi.org/10.1080/02664763.2013.780158
  54. Perez, B., Molina, I. and Pena, D. (2014). Outlier detection and robust estimation in linear regression models with fixed group effects, Journal of Statistical Computation and Simulation, 84, 2652-2669. https://doi.org/10.1080/00949655.2013.811669
  55. Riani, M. (2004). Extensions of the forward search to time series, Studies in Nonlinear Dynamics & Econometrics, 8, Article 2.
  56. Rieder, H. (1996). Robust Statistics, Data Analysis, and Computer Intensive Methods, Springer-Verlag, New York.
  57. Sacks, J. and Ylvisaker, D. (1972). A note on Huber's robust estimation of a location parameter, The Annals of Mathematical Statistics, 43, 1068-1075. https://doi.org/10.1214/aoms/1177692460
  58. Santos, K. C. P. and Barrios, E. B. (2015). Improving predictive accuracy of logistic regression model using ranked set samples, Communications in Statistics-Simulation and Computation. https://doi.org/10.1080/03610918.2014.955113
  59. Shahriari, H. and Ahmadi, O. (2015). Robust estimation of the mean vector for high-dimensional data set using robust clustering, Journal of Applied Statistics, 42, 1183-1205. https://doi.org/10.1080/02664763.2014.999030
  60. Tukey, J. W. (1962). The future of data analysis, The Annals of Mathematical Statistics, 33, 1-67. https://doi.org/10.1214/aoms/1177704711
  61. Ursu, E. and Pereau, J. C. (2014). Robust modelling of periodic vector autoregressive time series, Journal of Statistical Planning and Inference, 155, 93-106. https://doi.org/10.1016/j.jspi.2014.07.005
  62. Vretos, N., Tefas, A. and Pitas, I. (2013). Using robust dispersion estimation in support vector machines, Pattern Recognition, 46, 3441-3451. https://doi.org/10.1016/j.patcog.2013.05.016
  63. Wang, Y., Fan, Y., Bhatt, P. and Davatzikos, C. (2010). High-dimensional pattern regression using machine learning: From medical images to continuous clinical variables, Neuroimage, 50, 1519-1535. https://doi.org/10.1016/j.neuroimage.2009.12.092
  64. Wei, Y. and Carroll, R. J. (2009). Quantile regression with measurement error, Journal of the American Statistical Association, 104, 1129-1143. https://doi.org/10.1198/jasa.2009.tm08420
  65. Wong, R. K. W., Yao, F. and Lee, T. C. M. (2014). Robust estimation for generalized additive models, Journal of Computational and Graphical Statistics, 23, 270-289. https://doi.org/10.1080/10618600.2012.756816
  66. Xiao, Z. (2012). Robust inference in nonstationary time series models, Journal of Econometrics, 169, 211-223. https://doi.org/10.1016/j.jeconom.2012.01.027
  67. Zhao, J. and Wang, J. (2009). Robust testing procedures in heteroscedastic linear models, Communications in Statistics-Simulation and Computation, 38, 244-256. https://doi.org/10.1080/03610910802468666