 Title & Authors
Severity-based Software Quality Prediction using Class Imbalanced Data
Hong, Euy-Seok; Park, Mi-Kyeong;
 Abstract
Most fault prediction models suffer from class imbalance because training data usually contain far more non-fault modules than fault modules. This imbalanced distribution makes it difficult for the models to learn the minority class. The imbalance is even greater in severity-based fault prediction, because high-severity fault modules are only a subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems, and the SpreadSubSample method does not improve the prediction performance of the models. Unlike these two methods, SMOTE shows good performance in terms of AUC and FNR values. In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
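The abstract names SMOTE as the sampling method and AUC and FNR as the evaluation measures. As a rough illustration of those ideas only, the following is a minimal pure-Python sketch. It is not the authors' experimental setup or any toolkit's implementation. The function names `smote`, `false_negative_rate`, and `auc`, the parameter `k`, and the toy data are all assumptions made for this example.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (illustrative SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP): share of fault modules predicted as non-fault."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp)


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four high-severity modules described by two metrics.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, n_new=6)

fnr = false_negative_rate([1, 1, 0, 0, 1], [1, 0, 0, 1, 0])
area = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

Because the synthetic points are interpolated between existing minority samples, they add new examples without copying existing ones exactly, which is the usual argument for why this style of oversampling is less prone to overfitting than plain resampling with replacement.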
 Keywords
Data imbalance; Fault prediction; Severity; Sampling
 Language
Korean