A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment

  • Lee, Hoo-Young (Dept. of Computer Engineering, Kongju National University) ;
  • Park, Koo-Rack (Dept. of Computer Science & Engineering, Kongju National University) ;
  • Kim, Dong-Hyun (Dept. of Computer Engineering, Kongju National University)
  • Received : 2019.05.01
  • Accepted : 2019.06.20
  • Published : 2019.06.28


When deciding a case, it is very important to find and obtain precedents of similar judgments. Existing judgment document search relies on keywords entered by the user. This requires the user to enter an accurate keyword; if the keyword is unknown, the necessary document cannot be found. In addition, the retrieved documents may differ in content from what the user was looking for. In this paper, we aim to improve the effectiveness of a retrieval method that vectorizes documents into a three-dimensional space, calculates cosine similarity, and retrieves nearby documents, so that accurate precedents can be found. To this end, we analyze the similarity of the words used in the precedents, extract the mode (the most frequent word), and insert it into the body of the document, thereby improving the cosine similarity of the document to be retrieved. We expect the proposed model to give users looking for tax-related precedents a fast and accurate search.
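The idea of raising cosine similarity by inserting the mode word can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain term-frequency vectors as a stand-in for learned document embeddings, and the function names (`term_vector`, `cosine_similarity`, `mode_word`, `insert_mode`) and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Count term frequencies (assumed stand-in for a learned document vector)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mode_word(tokens):
    """Return the most frequent token (the mode) of a token list."""
    return Counter(tokens).most_common(1)[0][0]


def insert_mode(tokens, word, times=1):
    """Return a copy of the document with the mode word appended `times` times."""
    return tokens + [word] * times


# Hypothetical toy data: a user query and one stored document, already tokenized.
query = "income tax deduction for business expense".split()
doc = "appeal on deduction of expense under tax tax law".split()

word = mode_word(doc)
before = cosine_similarity(term_vector(query), term_vector(doc))
after = cosine_similarity(term_vector(query), term_vector(insert_mode(doc, word)))

print(f"mode word: {word}")
print(f"cosine similarity before insertion: {before:.4f}")
print(f"cosine similarity after insertion:  {after:.4f}")
```

In this toy case the similarity rises because the query also contains the mode word; with a query that does not share that word, the same insertion would lower the score, so the effect depends on how well the inserted word matches what users search for.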


Convergence;Tax Cases;Similar Documents;NLP;Word Embedding

Fig. 1. Distributed Memory (DM) Model

Fig. 2. Distributed Bag of Words (DBOW) Model

Fig. 3. Flow of Natural Language Processing

Fig. 4. System Configuration

Fig. 5. Structure of Judgment Document

Fig. 6. Visualization of Analytical Data

Table 1. Top 10 Extracted Words

Table 2. Most Frequent Words and Their Highly Similar Words

Table 3. Change in Cosine Similarity

