Sentence Similarity Measurement Method Using a Set-based POI Data Search

Ko, EunByul; Lee, JongWoo

doi:10.5626/KTCP.2014.20.12.711

KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지)

Volume 20 Issue 12 / Pages 711-716 / 2014 / 2383-6318 (pISSN) / 2383-6326 (eISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)


Ko, EunByul (Department of Multimedia Science, Sookmyung Women's University) ;
Lee, JongWoo (Department of Multimedia Science, Sookmyung Women's University)

Received : 2014.09.30
Accepted : 2014.10.23
Published : 2014.12.15

https://doi.org/10.5626/KTCP.2014.20.12.711

Abstract

As interest in plagiarism detection and intelligent file content search grows, so does the demand for measuring the similarity between two sentences. Sentence similarity measurement has been studied from various directions, such as n-gram, edit distance, and LSA, but each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from another direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard matching method when the data contains inverted, omitted, inserted, or revised characters. Using this method, we are able to measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search, and we added a word operation algorithm and a similarity measure between two sentences expressed as a percentage. The experimental results show that our sentence similarity measurement method performs better than both n-gram and the original set-based POI data search.

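Recently, as interest in the controversy over paper plagiarism and in intelligent text search services has grown, the need for sentence similarity measurement has increased. Prior work has taken various directions such as n-gram, edit distance, and LSA, but each technique has its own strengths and weaknesses. This paper proposes a sentence similarity measurement technique that takes a new direction by using the set-based POI search technique. Compared with hard matching, the set-based POI search technique shows markedly better performance under inversion, omission, insertion, and modification of words. Using this technique enables more accurate and faster sentence similarity measurement. The proposed technique modifies the data loading algorithm and the text search algorithm of the existing set-based POI search technique and adds a word-unit operation algorithm to express the similarity of two sentences as a percentage. Experiments confirm that the proposed technique is superior in accuracy and speed to both n-gram and the existing set-based POI search technique.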
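
To make the contrast between an n-gram baseline and a set-based comparison concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, whose data loading, text search, and word operation steps are not described in the abstract. All function names are hypothetical, and the set-based scoring rule (Jaccard overlap of per-word character sets, averaged over the words of the first sentence) is an assumption chosen only to illustrate why an order-insensitive comparison tolerates character inversion.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of text, ignoring whitespace."""
    compact = "".join(text.split())
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the two character n-gram sets, as a percentage."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 100.0
    return 100.0 * len(grams_a & grams_b) / len(grams_a | grams_b)


def char_set_similarity(word_a, word_b):
    """Jaccard overlap of the character sets of two words (order-insensitive)."""
    set_a, set_b = set(word_a), set(word_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def set_based_sentence_similarity(a, b):
    """Average best per-word character-set match of a against b, as a percentage.

    Illustrative assumption only: this is not the paper's scoring rule.
    """
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return 100.0 if words_a == words_b else 0.0
    total = sum(
        max(char_set_similarity(wa, wb) for wb in words_b) for wa in words_a
    )
    return 100.0 * total / len(words_a)


if __name__ == "__main__":
    original = "form data"
    inverted = "from data"  # two characters swapped in the first word
    print(ngram_similarity(original, inverted))
    print(set_based_sentence_similarity(original, inverted))
```

In this sketch the swapped characters break several bigrams, so the n-gram score drops, while the character-set score is unaffected because each word still contains the same characters. The set-based function as written is asymmetric (it averages over the words of the first argument) and ignores repeated characters, so it should be read as an illustration of the idea, not as a usable metric.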

Acknowledgement

Supported by: National Research Foundation of Korea (한국연구재단)

