WHEN CAN SUPPORT VECTOR MACHINE ACHIEVE FAST RATES OF CONVERGENCE?

  • Published : 2007.09.30

Abstract

Classification, as a tool for extracting information from data, plays an important role in science and engineering. Among the various classification methodologies, the support vector machine has recently seen significant developments. The central problem this paper addresses is the accuracy of the support vector machine. In particular, we are interested in situations where the support vector machine can achieve fast rates of convergence to the Bayes risk. Through learning examples, we illustrate that the support vector machine may yield fast rates if the space spanned by the adopted kernel is sufficiently large.
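
The quantities named in the abstract can be written out as follows. This is a sketch in standard notation from the classification literature, not notation taken from the paper itself: the symbols $f$, $\hat{f}_n$, $R$, $R^*$ and the exponent $\alpha$ are assumptions introduced here for illustration, with labels $Y \in \{-1, +1\}$ and sample size $n$.

```latex
% Misclassification risk of a decision function f, and the Bayes risk
% (the smallest risk attainable over all measurable f):
R(f) = P\bigl(Y \neq \operatorname{sign} f(X)\bigr),
\qquad
R^{*} = \inf_{f} R(f).

% Rate of convergence of a classifier \hat{f}_n trained on n observations:
E\bigl[R(\hat{f}_n)\bigr] - R^{*} = O\bigl(n^{-\alpha}\bigr).
```

Under this convention, a rate is commonly called "fast" when $\alpha > 1/2$, that is, when the excess risk decays faster than $n^{-1/2}$.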
