Authorship Attribution of Web Texts with Korean Language Applying Deep Learning Method

Classifying the Sex and Age of Web Text Authors Using Deep Learning : Focusing on SNS Users

  • Received : 2016.05.11
  • Accepted : 2016.08.02
  • Published : 2016.09.30

Abstract

With the rapid development of technology, the volume of web text is growing explosively, and it is attracting attention in many fields as a substitute for surveys. Facebook reaches up to 113 million users per month, and Twitter is used by various institutions and companies as a behavioral analysis tool. However, most research has focused on the meaning of the text itself, and studies of who created the text are lacking. This research therefore classifies text by the sex and age of its writer, using 20,187 Facebook users' posts that reveal the sex and age of the writer. It applies Convolutional Neural Networks, a type of deep learning algorithm that recently came into the spotlight as an image classifier, to web text analysis. The results show 92% accuracy, which supports the method's potential as a text classifier. In addition, this research minimizes Korean morpheme analysis and applies authorship attribution to Korean web text. Based on these features, the approach can serve several uses, such as a web text management information resource for practitioners and a non-grammatical analysis system for researchers. Thus, this study proposes a new method for web text analysis.
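The abstract does not describe the network architecture, so the following is only a minimal sketch of the general idea of a convolutional text classifier that works on raw characters and therefore needs no morpheme analysis. Everything in it is an assumption made for illustration: the function names (`init_params`, `classify`, and so on), the layer sizes, the two output classes, and the random, untrained weights. It shows a forward pass only, not the training procedure or the model used in this study.

```python
import math
import random


def init_params(dim=8, width=3, n_filters=16, n_classes=2, seed=0):
    """Create random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = random.Random(seed)
    return {
        "dim": dim,
        "width": width,
        "rng": rng,
        "table": {},  # character -> embedding vector, filled lazily
        "filters": [[rng.uniform(-1, 1) for _ in range(width * dim)]
                    for _ in range(n_filters)],
        "out_w": [[rng.uniform(-1, 1) for _ in range(n_filters)]
                  for _ in range(n_classes)],
        "out_b": [0.0] * n_classes,
    }


def embed_chars(text, params):
    """Map each character to a vector; no tokenizer or morpheme analyzer is involved."""
    table, rng, dim = params["table"], params["rng"], params["dim"]
    vectors = []
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1, 1) for _ in range(dim)]
        vectors.append(table[ch])
    return vectors


def conv1d_max(seq, filters, width):
    """Slide each filter over character windows, apply ReLU, keep the maximum response."""
    pooled = []
    for f in filters:
        best = 0.0  # ReLU floor; also the value when the text is shorter than `width`
        for i in range(len(seq) - width + 1):
            window = [x for vec in seq[i:i + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        pooled.append(best)
    return pooled


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, params):
    """Forward pass: characters -> embeddings -> convolution + max pooling -> class probabilities."""
    seq = embed_chars(text, params)
    features = conv1d_max(seq, params["filters"], params["width"])
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(params["out_w"], params["out_b"])]
    return softmax(logits)


params = init_params()
probs = classify("오늘 날씨 정말 좋다", params)
print(probs)  # two class probabilities; meaningless until the weights are trained
```

Because the weights are random, the printed probabilities carry no meaning. The sketch is meant only to show how a post can go from raw characters to class scores without a morphological analysis step.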
