Subset selection in multiple linear regression: An improved Tabu search
Bae, Jaegug; Kim, Jung-Tae; Kim, Jae-Hwan
This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because the problem is NP-complete in nature, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: a tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of moves in the neighborhood and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms both metaheuristic methods in terms of computing time and solution quality.
Keywords: Metaheuristics; Improved tabu search; Subset selection problem
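For readers unfamiliar with the baseline technique, the following is a minimal sketch of a plain tabu search for subset selection in linear regression. It is not the improved algorithm proposed in the paper, whose move-reduction and move-search details are not given in the abstract. Several choices here are assumptions made only for illustration: the BIC-style selection criterion, the single-variable "flip" neighborhood, the tabu tenure, and all function names (`rss`, `bic`, `tabu_search`). The data are synthetic.

```python
import math
import random

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns `cols`."""
    n, k = len(y), len(cols) + 1
    A = [[1.0] + [X[i][j] for j in cols] for i in range(n)]  # design matrix
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(k)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        if abs(M[p][c]) < 1e-12:
            return float("inf")  # singular system: treat the subset as unusable
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[r][k] / M[r][r] for r in range(k)]
    return sum((y[i] - sum(b * a for b, a in zip(beta, A[i]))) ** 2 for i in range(n))

def bic(X, y, subset):
    """Assumed criterion (lower is better): n*ln(RSS/n) + (|subset| + 1)*ln(n)."""
    n = len(y)
    r = max(rss(X, y, sorted(subset)), 1e-12)  # guard against log(0)
    return n * math.log(r / n) + (len(subset) + 1) * math.log(n)

def tabu_search(X, y, iters=50, tenure=3, seed=0):
    """Plain tabu search: each move flips one variable in or out of the subset."""
    rng = random.Random(seed)
    p = len(X[0])
    current = {j for j in range(p) if rng.random() < 0.5}  # random starting subset
    best, best_score = set(current), bic(X, y, current)
    tabu_until = {}  # variable index -> last iteration at which flipping it is forbidden
    for it in range(iters):
        candidates = []
        for j in range(p):
            neighbor = current ^ {j}
            s = bic(X, y, neighbor)
            # Skip tabu moves unless they beat the best score so far (aspiration).
            if tabu_until.get(j, -1) >= it and s >= best_score:
                continue
            candidates.append((s, j, neighbor))
        if not candidates:
            continue
        # Take the best admissible move, even if it worsens the current score.
        s, j, current = min(candidates, key=lambda t: (t[0], t[1]))
        tabu_until[j] = it + tenure
        if s < best_score:
            best, best_score = set(current), s
    return sorted(best), best_score

# Synthetic data: y depends only on variables 0 and 2 (an illustrative assumption).
_rng = random.Random(1)
X = [[_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [3.0 * row[0] - 2.0 * row[2] + _rng.gauss(0, 0.1) for row in X]
selected, value = tabu_search(X, y)
print(selected, round(value, 2))
```

Accepting the best admissible move even when it is worse than the current subset, while forbidding the reversal of recent moves, is what lets tabu search leave local optima. Because every iteration scores all single-variable flips, the cost per iteration grows with the number of variables, which is consistent with the computing-time concern raised in the abstract.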