
A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning

  • 우덕채 (Department of Data Science, Kookmin University);
  • 문현실 (School of Management & AI Management Research Center, Kyung Hee University);
  • 권순범 (School of Business Administration, Kookmin University);
  • 조윤호 (School of Business Administration, Kookmin University)
  • Received : 2019.05.03
  • Accepted : 2019.05.27
  • Published : 2019.06.30

Abstract

Machine learning (ML) fits a mathematical model to given data in order to derive insights or make predictions. In the age of big data, where the amount of available data grows exponentially with the development of information technology and smart devices, ML achieves high predictive performance by detecting patterns without bias. Within the ML process, feature engineering, which generates the features that explain the problem to be solved, strongly influences performance, and its importance is continually emphasized. Despite this importance, it remains a difficult task because it requires a thorough understanding of the domain and the source data as well as an iterative trial-and-error procedure. We therefore propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. A key reason deep learning outperforms other techniques on complex unstructured data is that it can extract features from the source data itself. To bring this advantage to business problems, we propose deep learning based methods that automatically extract features from transaction data or directly predict and classify target variables. In particular, exploiting the structural similarity between transaction data and text data, we applied techniques that perform well in text processing, and we verified the suitability of each method according to the characteristics of the transaction data. Our study not only explores the possibility of automated feature extraction but also provides a benchmark model that achieves a certain level of performance before any manual feature extraction is performed. In addition, it is expected to provide guidelines for choosing a suitable deep learning model according to the business problem and the data characteristics.
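
The core idea is to treat transaction data as text. As a rough illustration (not the paper's actual preprocessing code), each customer's purchase history can be concatenated into a "sentence" of item tokens; the DataFrame and the column names customer_id and item_id below are hypothetical.

```python
import pandas as pd

# Hypothetical transaction log; the schema is illustrative only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "item_id":     ["A01", "B02", "A01", "C03", "B02", "A01"],
})

# Treat each customer's purchase history as a "sentence" of item tokens,
# so that text-processing techniques (BOW, Word2Vec, 1-D CNN, LSTM) can be applied.
sentences = (
    transactions
    .groupby("customer_id")["item_id"]
    .apply(" ".join)
)
print(sentences.tolist())
# ['A01 B02 A01', 'C03 B02', 'A01']
```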

Keywords

Figure 1. Handling Transactions as Text

Figure 2. Concept of Deep Learning-based Machine Learning Flow

Figure 3. An Example of Vector Representation for BOW
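
A minimal sketch of the bag-of-words (BOW) representation named in Figure 3, using scikit-learn; the item "sentences" and the resulting vocabulary are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Item "sentences" built from the transaction log (illustrative values).
sentences = ["A01 B02 A01", "C03 B02", "A01"]

# Bag-of-words: one column per item, each cell is the purchase count for that customer.
vectorizer = CountVectorizer(token_pattern=r"\S+")
bow = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # ['a01' 'b02' 'c03']
print(bow.toarray())
# [[2 1 0]
#  [0 1 1]
#  [1 0 0]]
```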

Figure 4. Input Data Example of Word2Vec-based Method
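
A minimal sketch of a Word2Vec-style treatment of the input shown in Figure 4, assuming gensim: item embeddings are learned from co-occurrence within each customer's history, and a fixed-length customer vector is obtained by averaging the item vectors. All hyperparameters are hypothetical.

```python
import numpy as np
from gensim.models import Word2Vec

# Tokenized item "sentences" per customer (illustrative).
sentences = [["A01", "B02", "A01"], ["C03", "B02"], ["A01"]]

# Learn item embeddings (skip-gram) from co-occurrence within each history.
model = Word2Vec(sentences, vector_size=8, window=2, min_count=1, sg=1, epochs=50)

# One fixed-length feature vector per customer: the mean of its item vectors.
customer_vectors = np.array([
    np.mean([model.wv[item] for item in s], axis=0) for s in sentences
])
print(customer_vectors.shape)  # (3, 8)
```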

Figure 5. Schematic Diagram for Deep Learning-based Methods

Figure 6. 1-D Convolution Example
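
A minimal Keras sketch of a 1-D convolutional model over padded item-ID sequences, in the spirit of Figure 6; the vocabulary size, sequence length, and layer sizes are hypothetical, not the configuration reported in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 1000   # number of distinct item IDs (hypothetical)
SEQ_LEN = 50        # padded length of each transaction sequence (hypothetical)

# Embedding -> 1-D convolution over the item sequence -> pooling -> binary prediction.
model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
model.summary()
```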

Figure 7. Bi-directional LSTM Example
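
The bi-directional LSTM variant differs only in the sequence layer; a comparable hedged sketch with the same hypothetical sizes as above:

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 1000   # hypothetical vocabulary of item IDs
SEQ_LEN = 50        # hypothetical padded sequence length

# Embedding -> LSTM reading the sequence in both directions -> binary prediction.
model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
model.summary()
```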

Figure 8. Experiment Data Shape Comparisons (Tibshirani, 2017)

Figure 9. Machine Learning Algorithms Comparison

Figure 10. AUC Results
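
AUC, the area under the ROC curve (Bradley, 1997; Hanley and McNeil, 1982), is the evaluation measure reported in Figure 10. A minimal scikit-learn sketch with made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score

# Illustrative labels and predicted probabilities; not the paper's results.
y_true  = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]

print(round(roc_auc_score(y_true, y_score), 3))  # 0.889
```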

Table 1. Data Description

Table 2. Model Summary Used in Experiments

References

  1. Ahn, S.M., "Deep Learning Architectures and Applications", Journal of Intelligence and Information Systems, Vol.22, No.2, 2016, 127-142. https://doi.org/10.13088/jiis.2016.22.2.127
  2. Alex, S., S.H. Seo, and Y. Kwon, "Development of Deep Learning Models for Multi-class Sentiment Analysis", Journal of Information Technology Services, Vol.16, No.4, 2017, 149-160. https://doi.org/10.9716/KITS.2017.16.4.149
  3. Babaee, M., D.T. Dinh, and G. Rigoll, "A deep convolutional neural network for video sequence background subtraction", Pattern Recognition, Vol.76, 2018, 635-649. https://doi.org/10.1016/j.patcog.2017.09.040
  4. Chollet, F., Deep Learning with Python, Manning Publications Company, New York, 2017.
  5. Balaji, A. and A. Allen, "Benchmarking Automatic Machine Learning Frameworks", arXiv preprint arXiv:1808.06492, 2018.
  6. Bansal, T., D. Belanger, and A. McCallum, "Ask the GRU: Multi-task learning for deep text recommendations", In Proceedings of the 10th ACM Conference on Recommender Systems, 2016, 107-114.
  7. Barkan, O. and N. Koenigstein, "Item2vec: Neural item embedding for collaborative filtering", In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing, 2016, 1-6.
  8. Barnaghi, P., A. Sheth, and C. Henson, "From data to actionable knowledge: Big data challenges in the web of things", IEEE Intelligent Systems, Vol.28, No.6, 2013, 6-11. https://doi.org/10.1109/MIS.2013.142
  9. Bradley, A.P., "The use of the area under the ROC curve in the evaluation of machine learning algorithms", Pattern Recognition, Vol.30, No.7, 1997, 1145-1159. https://doi.org/10.1016/S0031-3203(96)00142-2
  10. Chen, T. and C. Guestrin, "XGBoost: A scalable tree boosting system", In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, 785-794.
  11. Cho, K., B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation", arXiv preprint arXiv:1406.1078, 2014.
  12. Chung, J., C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling", arXiv preprint arXiv:1412.3555, 2014.
  13. Deng, L. and Y. Liu, Deep Learning in Natural Language Processing, Springer, Singapore, 2018.
  14. Dhingra, B., H. Liu, Z. Yang, W.W. Cohen, and R. Salakhutdinov, "Gated-attention readers for text comprehension", arXiv preprint arXiv:1606.01549, 2016.
  15. Domingos, P.M., "A few useful things to know about machine learning", Communications of the ACM, Vol.55, No.10, 2012, 78-87. https://doi.org/10.1145/2347736.2347755
  16. Faust, O., Y. Hagiwara, T.J. Hong, O.S. Lih, and U.R. Acharya, "Deep learning for healthcare applications based on physiological signals: A review", Computer Methods and Programs in Biomedicine, Vol.161, 2018, 1-13. https://doi.org/10.1016/j.cmpb.2018.04.005
  17. Ghosh, S. and M.S. Desarkar, "Class Specific TF-IDF Boosting for Short-text Classification: Application to Short-texts Generated During Disasters", In Companion Proceedings of The Web Conference 2018, 2018, 1629-1637.
  18. Hanley, J.A. and B.J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve", Radiology, Vol.143, No.1, 1982, 29-36. https://doi.org/10.1148/radiology.143.1.7063747
  19. He, K., X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification", In Proceedings of the IEEE International Conference on Computer Vision, 2015, 1026-1034.
  20. Hochreiter, S. and J. Schmidhuber, "Long short-term memory", Neural Computation, Vol.9, No.8, 1997, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
  21. Goodfellow, I., Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
  22. IBM, Extracting business value from the 4 V's of big data, 2017, Available at https://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data (Downloaded 28 February, 2019).
  23. Jaderberg, M., A. Vedaldi, and A. Zisserman, "Deep features for text spotting", In European conference on computer vision, 2014, 512-528.
  24. Johnson, R. and T. Zhang, "Effective use of word order for text categorization with convolutional neural networks", arXiv preprint arXiv:1412.1058, 2014.
  25. Jordan, M.I. and T.M. Mitchell, "Machine learning: Trends, perspectives, and prospects", Science, Vol.349, No.6245, 2015, 255-260. https://doi.org/10.1126/science.349.6245.225
  26. Joulin, A., E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient text classification", arXiv preprint arXiv:1607.01759, 2016.
  27. Jozefowicz, R., W. Zaremba, and I. Sutskever, "An empirical exploration of recurrent network architectures", In International Conference on Machine Learning, 2015, 2342-2350.
  28. Kanter, J.M. and K. Veeramachaneni, "Deep feature synthesis: Towards automating data science endeavors", In 2015 IEEE International Conference on Data Science and Advanced Analytics, 2015, 1-10.
  29. Katz, G., E.C.R. Shin, and D. Song, "ExploreKit: Automatic feature generation and selection", In 2016 IEEE 16th International Conference on Data Mining, 2016, 979-984.
  30. Kohavi, R., "A study of cross-validation and bootstrap for accuracy estimation and model selection", In the International Joint Conference on Artificial Intelligence, Vol.14, No.2, 1995, 1137-1145.
  31. Krizhevsky, A., I. Sutskever, and G.E. Hinton, "ImageNet classification with deep convolutional neural networks", In Advances in Neural Information Processing Systems, 2012, 1097-1105.
  32. Lam, H.T., J.M. Thiebaut, M. Sinn, B. Chen, T. Mai, and O. Alkan, "One button machine for automating feature engineering in relational databases", arXiv preprint arXiv:1706.00327, 2017.
  33. LaValle, S., E. Lesser, R. Shockley, M.S. Hopkins, and N. Kruschwitz, "Big data, analytics and the path from insights to value", MIT Sloan Management Review, Vol.52, No.2, 2011, 21-31.
  34. Lee, H., D. Lim, and H. Zo, "Personal Information Overload and User Resistance in the Big Data Age", Journal of Intelligence and Information Systems, Vol.19, No.1, 2013, 125-139. https://doi.org/10.13088/jiis.2013.19.1.125
  35. Lee, J.J., S.B. Kwon, and S.M. Ahn, "Semantic Analysis Using Deep Learning Model based on Phoneme-level Korean", Journal of Information Technology Services, Vol.17, No.1, 2018, 77-89.
  36. Mitchell, T.M., Machine Learning, McGraw-Hill, New York, 1997.
  37. Mikolov, T., I. Sutskever, K. Chen, G.S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality", In Advances in Neural Information Processing Systems, 2013, 3111-3119.
  38. Muller, A.C. and S. Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists, O'Reilly Media, Inc., California, 2016.
  39. Ng, A., Machine Learning and AI via brain simulations, 2013, Available at http://datascienceassn.org/sites/default/files/Machine%20Learning%20and%20AI%20via%20Brain%20Simulations.pdf (Downloaded 28 February, 2019).
  40. Ozsoy, M.G., "From word embeddings to item recommendation", arXiv preprint arXiv:1601.01356, 2016.
  41. Pal, N.R. and S.K. Pal, "A review on image segmentation techniques", Pattern Recognition, Vol.26, No.9, 1993, 1277-1294. https://doi.org/10.1016/0031-3203(93)90135-J
  42. Park, C.Y., I.H. Jang, and Z.K. Lee, "Authorship Attribution of Web Texts with Korean Language Applying Deep Learning Method", Journal of Information Technology Services, Vol.15, No.3, 2016, 147-155. https://doi.org/10.9716/KITS.2016.15.3.147
  43. Park, J. and Y. Cho, "Clickstream Big Data Mining for Demographics based Digital Marketing", Journal of Intelligence and Information Systems, Vol.22, No.3, 2016, 143-163. https://doi.org/10.13088/jiis.2016.22.3.143
  44. Rusinol, M. and J. Llados, "Logo spotting by a bag-of-words approach for document categorization", In 2009 10th international conference on document analysis and recognition, 2009, 111-115.
  45. Sarkar, D.J., Understanding Feature Engineering (Part 1)-Continuous Numeric Data, 2018, Available at https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b (Downloaded 28 February, 2019).
  46. Sharif Razavian, A., H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition", In Proceedings of the IEEE Conference On Computer Vision and Pattern Recognition Workshops, 2014, 806-813.
  47. Sikka, K., T. Wu, J. Susskind, and M. Bartlett, "Exploring bag of words architectures in the facial expression domain", In European Conference on Computer Vision, 2012, 250-259.
  48. Snoek, J., H. Larochelle, and R.P. Adams, "Practical bayesian optimization of machine learning algorithms", In Advances in Neural Information Processing Systems, 2012, 2951-2959.
  49. Sun, Z., J. Yang, J. Zhang, A. Bozzon, Y. Chen, and C. Xu, "MRLR: Multi-level Representation Learning for Personalized Ranking in Recommendation", In Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, 2807-2813.
  50. Tibshirani, R.J., "Statistical Learning with Big Data", In the Joint Statistical Meetings 2017, 2017.
  51. Thomas, R., An Introduction to Deep Learning for Tabular Data, 2018, Available at https://www.fast.ai/2018/04/29/categorical-embeddings (Downloaded 28 February, 2019).
  52. Pembeci, I., "Using word embeddings for ontology enrichment", International Journal of Intelligent Systems and Applications in Engineering, Vol.4, No.3, 2016, 49-56. https://doi.org/10.18201/ijisae.58806
  53. Wang, Y., L. Kung, and T.A. Byrd, "Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations", Technological Forecasting and Social Change, Vol.126, 2018, 3-13. https://doi.org/10.1016/j.techfore.2015.12.019
  54. Wang, Y. and X.J. Wang, "A new approach to feature selection in text classification", In 2005 International conference on machine learning and cybernetics, 2005, 3814-3819.
  55. Wallach, H.M., "Topic modeling: Beyond bag-of-words", In Proceedings of the 23rd International Conference on Machine Learning, 2006, 977-984.
  56. Wu, L., S.C. Hoi, and N. Yu, "Semantics-preserving bag-of-words models and applications", IEEE Transactions on Image Processing, Vol.19, No.7, 2010, 1908-1920. https://doi.org/10.1109/TIP.2010.2045169
  57. Zhang, D., H. Xu, Z. Su, and Y. Xu, "Chinese comments sentiment classification based on word2vec and SVMperf", Expert Systems with Applications, Vol.42, No.4, 2015, 1857-1863. https://doi.org/10.1016/j.eswa.2014.09.011
  58. Zhang, Y., R. Jin, and Z.H. Zhou, "Understanding bag-of-words model: A statistical framework", International Journal of Machine Learning and Cybernetics, Vol.1, No.1-4, 2010, 43-52. https://doi.org/10.1007/s13042-010-0001-0
  59. Zheng, A. and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, O'Reilly Media, Inc., California, 2018.
  60. Zhou, P., Z. Qi, S. Zheng, J. Xu, H. Bao, and B. Xu, "Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling", arXiv preprint arXiv:1611.06639, 2016.
  61. Zhou, Q., N. Yang, F. Wei, C. Tan, H. Bao, and M. Zhou, "Neural question generation from text: A preliminary study", In National CCF Conference on Natural Language Processing and Chinese Computing, 2017, 662-671.