Design and Implementation of a Data Extraction Tool for Analyzing Software Changes

  • Lee, Yong-Hyeon (Dept. of Computer Engineering, Chungbuk National University) ;
  • Kim, Kisub (Dept. of Computer Engineering, Chungbuk National University) ;
  • Lee, Jaekwon (Dept. of Computer Engineering, Chungbuk National University) ;
  • Jung, Woosung (Dept. of Computer Engineering, Chungbuk National University)
  • Received : 2016.07.14
  • Accepted : 2016.08.10
  • Published : 2016.08.31

Abstract

In this paper, we present a novel approach that helps mining software repositories (MSR) researchers obtain the data they need, using a tool termed General Purpose Extractor for Source code (GPES). With a single function, GPES extracts high-quality data, e.g., the version history, abstract syntax tree (AST), changed code diff, and software quality metrics. Moreover, because GPES has a flexible data model and a component-based design, it can easily be extended with features such as ASTs for other languages or new software metrics. We conducted several case studies to evaluate the usefulness and effectiveness of our tool. The case studies show that researchers can reduce the overall cost of data analysis by transforming the data into the required formats.
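To illustrate what a component-based extractor with a shared data model might look like, here is a minimal Python sketch. It is not the GPES implementation or its API: the registry, the `register` decorator, the `extract` function, and the three example components are all hypothetical names chosen for illustration, and the AST component handles only Python sources. The idea shown is only that each kind of data (diff, AST information, a metric) is produced by an independent component, so a new metric can be added without touching the others.

```python
import ast
import difflib
from typing import Callable, Dict

# Hypothetical registry mapping a component name to an extractor function.
# This is an illustrative assumption, not the actual GPES design.
Extractor = Callable[[str, str], object]
EXTRACTORS: Dict[str, Extractor] = {}


def register(name: str):
    """Decorator that plugs a new extractor component into the registry."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[name] = fn
        return fn
    return wrap


@register("diff")
def changed_lines(old: str, new: str):
    # Unified diff between two revisions of the same file.
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


@register("ast_node_count")
def ast_node_count(old: str, new: str):
    # Number of AST nodes in the new revision (Python sources only in this sketch).
    return sum(1 for _ in ast.walk(ast.parse(new)))


@register("loc")
def lines_of_code(old: str, new: str):
    # Non-blank line count of the new revision, a very simple size metric.
    return sum(1 for line in new.splitlines() if line.strip())


def extract(old: str, new: str) -> Dict[str, object]:
    """Run every registered component and collect the results in one record."""
    return {name: fn(old, new) for name, fn in EXTRACTORS.items()}


# Two made-up revisions of one file, used only to exercise the sketch.
OLD = "def add(a, b):\n    return a + b\n"
NEW = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
record = extract(OLD, NEW)
```

Under these assumptions, adding a new software metric amounts to writing one more function decorated with `@register(...)`; the `extract` function and the existing components stay unchanged.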
