Comparison of Laplace and Double Pareto Penalty: LASSO and Elastic Net
Kyung, Minjung
Lasso (Tibshirani, 1996) and Elastic Net (Zou and Hastie, 2005) have been widely used in various fields for simultaneous variable selection and coefficient estimation. Bayesian versions of these methods, based on a conditional Laplace prior and a double Pareto prior, are discussed in the form of hierarchical specifications. The full conditional posterior distributions under each prior are derived. Using simulations, we compare the performance of Bayesian lassos under the Laplace prior with their performance under the double Pareto prior. We also apply the proposed Bayesian hierarchical models to real data sets to predict the collapse of governments in Asia.
Lasso; Elastic net; hierarchical models; scale mixture of normals; Laplace prior; double Pareto prior
Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions, Journal of the Royal Statistical Society, Series B, 36, 99-102.
Armagan, A., Dunson, D. B. and Lee, J. (2013). Generalized double Pareto shrinkage, Statistica Sinica, 23, 119-143.
Bae, K. and Mallick, B. K. (2004). Gene selection using a two-level hierarchical Bayesian model, Bioinformatics, 20, 3423-3430.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression, The Annals of Statistics, 32, 407-451.
Esty, D. C., Goldstone, J. A., Gurr, T. R., Harff, B., Levy, M., Dabelko, G. D., Surko, P. T. and Unger, A. N. (1999). State Failure Task Force Report: Phase II Findings, Environmental Change & Security Project Report, 5, Summer.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 1348-1360.
Figueiredo, M. A. T. (2003). Adaptive sparseness for supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 1150-1159.
Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools, Technometrics, 35, 109-135.
Hobert, J. P. and Geyer, C. J. (1998). Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model, Journal of Multivariate Analysis, 67, 414-430.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Applications to nonorthogonal problems, Technometrics, 12, 55-68.
Kim, Y., Kim, J. and Kim, Y. (2006). Blockwise sparse regression, Statistica Sinica, 16, 375-390.
Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators, The Annals of Statistics, 28, 1356-1378.
Kyung, M., Gill, J., Ghosh, M. and Casella, G. (2010). Penalized regression, standard errors, and Bayesian lassos, Bayesian Analysis, 5, 369-412.
Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes, Biometrika, 81, 27-40.
Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.
Park, T. and Casella, G. (2008). The Bayesian lasso, Journal of the American Statistical Association, 103, 681-686.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, 58, 267-288.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso, Journal of the Royal Statistical Society, Series B, 67, 91-108.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, Series B, 68, 49-67.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67, 301-320.