A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment

  • Lee, Hoo-Young (Dept. of Computer Engineering, Kongju National University) ;
  • Park, Koo-Rack (Dept. of Computer Science & Engineering, Kongju National University) ;
  • Kim, Dong-Hyun (Dept. of Computer Engineering, Kongju National University)
  • Received : 2019.05.01
  • Reviewed : 2019.06.20
  • Published : 2019.06.28

Abstract

In tax judgments, as with court precedents, it is very important to search for and identify prior decisions on similar cases. Existing search over judgment documents, however, relies on keywords entered by the user: it requires accurate keywords, and when the right keyword is not known, the needed document cannot be found. The retrieved documents may also turn out to differ in content from what was sought. This paper addresses a retrieval method that vectorizes documents in a three-dimensional space, computes cosine similarity, and returns the documents that are closest in distance. To improve the efficiency of that method, we analyze the similarity of the words used in judgment documents, extract the mode (the most frequent value), and insert it into the body text, thereby raising the cosine similarity of the documents to be retrieved. The proposed model is expected to give users who are looking for tax-related judgments a fast and accurate search.
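The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the figure captions refer to DM and DBOW document-embedding models, whereas this sketch swaps in a plain term-count vector so that it runs with the standard library alone. The function names, the sample query and document, and the choice to take the mode from the document itself and append it twice are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real morphological analysis)."""
    return text.lower().split()


def term_vector(text):
    """Represent a text as a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_frequent_word(tokens):
    """Return the mode of the token list (the first-seen token wins ties)."""
    return Counter(tokens).most_common(1)[0][0]


def insert_mode(text, copies=1):
    """Append the text's most frequent word to the text `copies` times (hypothetical helper)."""
    mode = most_frequent_word(tokenize(text))
    return text + (" " + mode) * copies


# Hypothetical query and judgment text, invented for this example.
query = "acquisition tax exemption"
document = "tax levied on farmland acquisition tax appeal"

before = cosine_similarity(term_vector(query), term_vector(document))
augmented = insert_mode(document, copies=2)
after = cosine_similarity(term_vector(query), term_vector(augmented))

print(f"mode: {most_frequent_word(tokenize(document))}")
print(f"cosine similarity before: {before:.3f}, after: {after:.3f}")
```

In this toy case the inserted word also appears in the query, so the document vector rotates toward the query and the similarity rises. Whether the same holds for learned document embeddings depends on the model and the corpus.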

Keywords

Fig. 1. Distributed Memory (DM) Model

Fig. 2. Distributed Bag of Words (DBOW) Model

Fig. 3. Flow of Natural Language Processing

Fig. 4. System Configuration

Fig. 5. Structure of Judgment Document

Fig. 6. Visualization of Analytical Data

Table 1. Top 10 Extracted Words

Table 2. Most Frequent Words and Highly Similar Word List

Table 3. Change in Cosine Similarity
