Subset selection in multiple linear regression: An improved Tabu search

- Journal title : Journal of the Korean Society of Marine Engineering
- Volume 40, Issue 2, 2016, pp.138-145
- Publisher : Korean Society of Marine Engineers
- DOI : 10.5916/jkosme.2016.40.2.138

Title & Authors

Subset selection in multiple linear regression: An improved Tabu search

Bae, Jaegug; Kim, Jung-Tae; Kim, Jae-Hwan

Abstract

This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics: selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time series prediction, and multi-class classification to noise detection. Because the problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of moves in the neighborhood and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of both computing time and solution quality.
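
For readers unfamiliar with the baseline technique, the following is a minimal sketch of a plain tabu search for subset selection. It is not the improved algorithm proposed in the paper, whose neighborhood reduction and move search strategy are not described in this abstract. Several choices are assumptions made only for illustration: the subset size `k` is fixed, the criterion is the residual sum of squares (RSS) of an ordinary least squares fit with intercept, a move swaps one selected variable for one unselected variable, and the names `solve`, `rss`, `tabu_search`, `tenure`, and `iters` are hypothetical. The selected columns are assumed to be linearly independent.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def rss(X, y, subset):
    """RSS of an OLS fit (with intercept) on the columns in `subset`."""
    cols = sorted(subset)
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    m = len(cols) + 1
    # Normal equations: (Z^T Z) beta = Z^T y
    A = [[sum(z[a] * z[b] for z in Z) for b in range(m)] for a in range(m)]
    g = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(m)]
    beta = solve(A, g)
    return sum(
        (yi - sum(bk * zk for bk, zk in zip(beta, z))) ** 2
        for z, yi in zip(Z, y)
    )


def tabu_search(X, y, k, iters=50, tenure=3, seed=0):
    """Generic swap-move tabu search for a size-k subset minimizing RSS."""
    rng = random.Random(seed)
    p = len(X[0])
    current = set(rng.sample(range(p), k))
    best, best_cost = set(current), rss(X, y, current)
    tabu = {}  # variable index -> last iteration at which it is still tabu
    for it in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(p)) - current:
                cand = (current - {out}) | {inn}
                cost = rss(X, y, cand)
                is_tabu = tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it
                # Aspiration: accept a tabu move if it beats the best so far.
                if not is_tabu or cost < best_cost:
                    candidates.append((cost, out, inn, cand))
        if not candidates:
            continue  # every move is tabu; wait for entries to expire
        cost, out, inn, cand = min(candidates, key=lambda c: c[0])
        current = cand  # move even if worse, to escape local optima
        tabu[out] = tabu[inn] = it + tenure
        if cost < best_cost:
            best, best_cost = set(cand), cost
    return sorted(best), best_cost
```

Here `X` is a list of rows and `y` a list of responses, so a call such as `tabu_search(X, y, k=2)` returns the sorted column indices of the best subset found together with its RSS. Each iteration in this sketch evaluates all `k * (p - k)` swaps, which illustrates why the computing time of a plain tabu search grows quickly with the number of variables.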

Keywords

Metaheuristics; Improved tabu search; Subset selection problem

Language

English

References

1.

I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

2.

G. M. Furnival and R. W. Wilson, "Regression by leaps and bounds," Technometrics, vol. 16, pp. 416-423, 1974.

3.

A. P. D. Silva, "Efficient variable screening for multivariate analysis," Journal of Multivariate Analysis, vol. 76, pp. 35-62, 2001.

4.

A. P. Duarte-Silva, "Discarding variables in a principal component analysis: algorithms for all-subsets comparisons," Computational Statistics, vol. 17, pp. 251-271, 2002.

5.

C. Gatu and E. J. Kontoghiorghes, "Branch-and-bound algorithms for computing the best-subset regression models," Journal of Computational and Graphical Statistics, vol. 15, no. 1, pp. 139-156, 2006.

6.

M. Hofmann, C. Gatu, and E. J. Kontoghiorghes, "Efficient algorithms for computing the best subset regression models for large-scale problems," Computational Statistics and Data Analysis, vol. 52, no. 1, pp. 16-29, 2007.

7.

M. J. Brusco, D. Steinley, and J. D. Cradit, "An exact algorithm for hierarchically well-formulated subsets in second-order polynomial regression," Technometrics, vol. 51, no. 3, pp. 306-315, 2009.

8.

J. Pacheco, S. Casado, and S. Porras, "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics and Data Analysis, vol. 57, no. 1, pp. 95-111, 2013.

9.

Z. Drezner and G. A. Marcoulides, "Tabu search model selection in multiple regression analysis," Communications in Statistics - Simulation and Computation, vol. 28, no. 9, pp. 349-367, 1999.

10.

H. Hasan, "Subset selection in multiple linear regression models: A hybrid of genetic and simulated annealing algorithms," Applied Mathematics and Computation, vol. 219, no. 23, pp. 11018-11028, 2013.

11.

N. R. Draper and H. Smith, Applied Regression Analysis, 3rd Edition, New York: Wiley, 1998.

12.

D. C. Montgomery and E. A. Peck, Introduction to Linear Regression Analysis, 2nd Edition, New York: Wiley, 1992.

13.

F. Glover, "Heuristics for integer programming using surrogate constraints," Decision Sciences, vol. 8, no. 1, pp. 156-166, 1977.

14.

F. Glover, "Future paths for integer programming and links to artificial intelligence," Computers and Operations Research, vol. 13, no. 5, pp. 533-549, 1986.

15.

S. Oliveira and G. Stroud, "A parallel version of tabu search and the assignment problem," Heuristics for Combinatorial Optimization, vol. 4, pp. 1-24, 1989.

16.

D. de Werra and A. Hertz, "Tabu search techniques: a tutorial and an application to neural networks," OR Spektrum, vol. 11, pp. 131-141, 1989.

17.

M. Laguna, J. W. Barnes, and F. Glover, "Tabu search methods for a single machine scheduling problem," Journal of Intelligent Manufacturing, vol. 2, no. 2, pp. 63-74, 1991.

18.

M. Laguna and J. L. G. Velarde, "A search heuristic for just-in-time scheduling in parallel machines," Journal of Intelligent Manufacturing, vol. 2, no. 4, pp. 253-260, 1991.

19.

J. A. Bland and G. P. Dawson, "Tabu search and design optimization," Computer Aided Design, vol. 23, no. 3, pp. 195-202, 1991.

20.

F. T. Lin, C. Y. Kao, and C. C. Hsu, "Applying the genetic approach to simulated annealing in solving some NP-hard problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 6, pp. 1752-1767, 1993.

21.

J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.

22.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.