• Title/Summary/Keyword: Keyphrase Extraction

Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems / v.8 no.4 / pp.693-712 / 2012
  • The paper presents three machine learning based keyphrase extraction methods that use Decision Trees, Naïve Bayes, and Artificial Neural Networks, respectively. We consider keyphrases to be phrases that consist of one or more words and represent the important concepts in a text document. The three methods used in our experiments are compared with KEA, a publicly available keyphrase extraction system. The experimental results show that the Neural Network based method outperforms the other two methods, which use Decision Trees and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.
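KEA, the baseline named in this abstract, is known to score candidate phrases with features such as TF×IDF and the relative position of a phrase's first occurrence. The sketch below computes those two features for one candidate. It is an illustration under stated assumptions (a whitespace-tokenized document, a hypothetical `doc_freq` table of corpus document frequencies, and add-one smoothing of the IDF term), not the feature set used by the three classifiers in the paper.

```python
import math


def kea_features(doc_tokens, candidate, doc_freq, num_docs):
    """Compute (tf_idf, first_occurrence) for one candidate phrase.

    doc_tokens: the document as a list of lowercase tokens
    candidate:  the phrase as a tuple of tokens
    doc_freq:   hypothetical map: phrase -> number of corpus documents containing it
    num_docs:   number of documents in that corpus
    """
    n = len(candidate)
    # Start indices of every occurrence of the phrase in the document.
    positions = [i for i in range(len(doc_tokens) - n + 1)
                 if tuple(doc_tokens[i:i + n]) == candidate]
    if not positions:
        # Assumed sentinel: an absent phrase gets zero weight and the latest position.
        return 0.0, 1.0
    tf = len(positions) / len(doc_tokens)
    # Add-one smoothing (an assumption) keeps unseen phrases from dividing by zero.
    idf = -math.log2((doc_freq.get(candidate, 0) + 1) / (num_docs + 1))
    # Fraction of the document that precedes the first occurrence.
    first_occurrence = positions[0] / len(doc_tokens)
    return tf * idf, first_occurrence
```

Feature pairs like `(tf_idf, first_occurrence)` could then be passed to any of the three classifiers compared in the paper.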

Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.47-54 / 2021
  • Given a document, keyphrase extraction automatically extracts the words or phrases that topically represent its content. In unsupervised keyphrase extraction, candidate words or phrases are first extracted from the input document, a score is then calculated for each candidate, and the final keyphrases are selected based on these scores. For this scoring step, this study proposes a method that adjusts candidate scores according to the candidate type: word-type or phrase-type. To this end, the type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document and used to adjust the candidate scores. In experiments on four keyphrase extraction evaluation datasets built from full-text English articles, the proposed method outperformed a baseline method and the comparison methods on three of the datasets.
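The type-token ratio mentioned here is the number of distinct candidates divided by the total number of candidate occurrences. The sketch below computes it separately for word-type (single-token) and phrase-type (multi-token) candidates, assuming candidates have already been extracted as token tuples. How the paper combines these ratios with information content to adjust scores is not given in the abstract and is not reproduced here.

```python
def type_token_ratios(candidate_occurrences):
    """Type-token ratio of word-type and phrase-type candidates.

    candidate_occurrences: every candidate occurrence in the document,
    each a tuple of tokens (length 1 = word-type, length > 1 = phrase-type).
    """
    counts = {"word": 0, "phrase": 0}            # tokens: total occurrences
    distinct = {"word": set(), "phrase": set()}  # types: distinct candidates
    for cand in candidate_occurrences:
        kind = "word" if len(cand) == 1 else "phrase"
        counts[kind] += 1
        distinct[kind].add(cand)
    # Guard against documents with no candidates of a given type.
    return {kind: len(distinct[kind]) / counts[kind] if counts[kind] else 0.0
            for kind in counts}
```

A low phrase-type ratio, for instance, means that a few multi-word candidates recur often in the document.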

Latent Keyphrase Extraction Using Deep Belief Networks

  • Jo, Taemin;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.3 / pp.153-158 / 2015
  • Automatic keyphrase extraction is now considered an important task. Most previous studies focused only on selecting keyphrases that appear in the body of the input documents, overlooking latent keyphrases that do not appear in the documents. In addition, the few existing studies on latent keyphrase extraction have structural limitations. Although latent keyphrases do not appear in documents, they can still play an important role in text mining because they link meaningful concepts or content of documents, and they can be used for short texts such as social network service posts, which rarely contain explicit keyphrases. In this paper, we propose a new approach that uses deep belief networks in a supervised manner to select qualified latent keyphrases for input documents and to overcome some of these structural limitations. The main idea is to capture intrinsic representations of documents and use them to extract eligible latent keyphrases. Our experimental results showed that latent keyphrases were successfully extracted using the proposed method.

A Dependency Graph-Based Keyphrase Extraction Method Using Anti-patterns

  • Batsuren, Khuyagbaatar;Batbaatar, Erdenebileg;Munkhdalai, Tsendsuren;Li, Meijing;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.14 no.5 / pp.1254-1271 / 2018
  • Keyphrase extraction is one of the fundamental natural language processing (NLP) tools for improving text-mining applications such as document summarization and clustering. In this paper, we propose two novel techniques on top of state-of-the-art keyphrase extraction methods. The first is anti-patterns, which aim to recognize non-keyphrase candidates. State-of-the-art methods often use a rich feature set to identify keyphrases, but such a feature set covers only some keyphrases, because keyphrases share very few patterns and stylistic features, whereas non-keyphrase candidates often share many. The second is to use a dependency graph instead of a word co-occurrence graph: a co-occurrence graph cannot connect two words that are syntactically related but placed far from each other in a sentence, whereas a dependency graph can. In experiments, we compared performance across different graph settings (co-occurrence and dependency) and against the results of existing methods. We found that combining the dependency graph with anti-patterns outperforms the state-of-the-art methods.
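The limitation of the word co-occurrence graph described above can be seen in a small sketch. The code below builds an undirected co-occurrence graph with a fixed window and ranks words with a TextRank-style PageRank; this is a common graph-based baseline assumed here for illustration, not the authors' method. The `window` and `damping` defaults are assumptions. Words farther apart than the window are never linked, which is the gap a dependency graph (built with a syntactic parser, not shown) is meant to close.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` tokens of each other."""
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        # With window=2 only adjacent words are linked.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Unweighted PageRank over an undirected graph, TextRank style."""
    nodes = list(graph)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node receives an equal share of every neighbour's score.
        score = {n: (1 - damping) + damping * sum(score[m] / len(graph[m])
                                                  for m in graph[n])
                 for n in nodes}
    return score
```

On a chain of words, the middle words score highest because they have more neighbours, while the two ends are never connected unless the window spans the whole chain.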

Domain Specific Annotation of Digital Documents through Keyphrase Extraction (고정키어구 추출을 통한 디지털 문서의 도메인 특정 주석)

  • Fatima, Iram;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1389-1391 / 2011
  • In this paper, we propose a methodology for annotating digital documents through keyphrase extraction using a domain-specific taxonomy. A limitation of existing keyphrase extraction algorithms is that their output may contain irrelevant keyphrases along with relevant ones, so the quality of the keyphrases they generate does not meet the required level of accuracy. Our approach exploits the semantic relationships and hierarchical structure of the classification scheme to filter out irrelevant keyphrases suggested by the Keyphrase Extraction Algorithm (KEA++). Our experimental results demonstrated the accuracy of the proposed algorithm, with high precision but low recall.
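One simplified way to use a classification hierarchy as a filter is to keep only keyphrases that sit under a chosen domain node. The sketch below assumes the taxonomy is a tree stored as a hypothetical child-to-parent dictionary and that keyphrases match taxonomy terms exactly; it illustrates the idea of hierarchy-based filtering rather than the semantic-relationship rules used in the paper.

```python
def ancestors(term, parent):
    """Walk up a child -> parent map; assumes the taxonomy is a tree (no cycles)."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def filter_by_domain(keyphrases, parent, domain):
    """Keep keyphrases that are the domain node itself or fall under it."""
    return [k for k in keyphrases if k == domain or domain in ancestors(k, parent)]
```

Keyphrases missing from the taxonomy are dropped by this rule, which is one plausible reason such filtering trades recall for precision.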

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary for selecting research topics and exploring related work. Most researchers search for representative keywords of the domains or technologies they are interested in to understand research trends. However, some conferences in the artificial intelligence and data mining fields now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the trends in their domains of interest. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand the research trends of each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction methods based on pretrained language models extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We therefore propose a technology keyphrase extraction model that is robust on short texts and takes word importance into account.

Fine-tuning BERT Models for Keyphrase Extraction in Scientific Articles

  • Lim, Yeonsoo;Seo, Deokjin;Jung, Yuchul
    • Journal of Advanced Information Technology and Convergence / v.10 no.1 / pp.45-56 / 2020
  • Despite extensive research, improving the performance of keyphrase (KP) extraction remains a challenging problem in modern informatics. Recently, deep learning-based supervised approaches have achieved state-of-the-art accuracy on this problem, and several of the previously proposed methods use language models based on Bidirectional Encoder Representations from Transformers (BERT). However, few studies have investigated how to apply BERT-based fine-tuning effectively to KP extraction. In this paper, we study this problem in the context of scientific articles by investigating the fine-tuning characteristics of two distinct BERT models: BERT (the base BERT model by Google) and SciBERT (a BERT model trained on scientific text). Three datasets from the computer science domain (WWW, KDD, and Inspec) are used to compare the KP extraction results obtained by fine-tuning BERT and SciBERT.
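The abstract does not say how KP extraction is framed for fine-tuning. One common formulation, assumed here, is token classification: each token is tagged B (begins a keyphrase), I (inside a keyphrase), or O (outside). The sketch below converts gold keyphrases into such labels over word tokens; in practice the labels would also need to be aligned to BERT or SciBERT WordPiece subtokens, which is not shown.

```python
def bio_labels(tokens, keyphrases):
    """Tag each token B (begins a keyphrase), I (inside one) or O (outside)."""
    labels = ["O"] * len(tokens)
    for kp in keyphrases:
        if not kp:
            continue  # skip empty keyphrases
        n = len(kp)
        for i in range(len(tokens) - n + 1):
            # Label a match only if it does not overlap an already labelled span.
            if tokens[i:i + n] == kp and all(l == "O" for l in labels[i:i + n]):
                labels[i] = "B"
                for j in range(i + 1, i + n):
                    labels[j] = "I"
    return labels
```

Labelling longer keyphrases first would give them priority when spans overlap; the order here simply follows the input list.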

Keyphrase Extraction Using Active Learning and Clustering (Active Learning과 군집화를 이용한 고정키어구 추출)

  • Lee, Hyun-Woo;Cha, Jeong-Won
    • MALSORI / no.66 / pp.87-103 / 2008
  • We describe a new active learning method for keyphrase extraction in the conditional random fields (CRFs) framework. To reduce annotation effort, we use diversity and representativeness measures. We select highly diverse training candidates according to sentence confidence values, and we select highly representative candidates by clustering the part-of-speech patterns of their contexts. In experiments on a dialog corpus, our method achieves 86.80% while requiring 88% less training data than the supervised method. These results show that the proposed method improves on previous methods. Additionally, because its implementation is independent of the application, the proposed method can easily be applied to other applications.
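The selection step described above can be sketched as uncertainty sampling constrained by clusters: sentences the current model is least confident about are picked first, and at most one sentence is taken from each cluster of context patterns. The `confidence` and `cluster` inputs are hypothetical (for example, CRF sequence probabilities and cluster IDs from part-of-speech patterns), and the exact way the paper combines the two criteria is not stated in the abstract.

```python
def select_batch(confidence, cluster, k):
    """Pick up to k sentence ids: lowest confidence first, one per cluster.

    confidence: dict sentence_id -> model confidence (lower = more informative)
    cluster:    dict sentence_id -> cluster label of its context pattern
    """
    chosen, used_clusters = [], set()
    for sid in sorted(confidence, key=confidence.get):
        if cluster[sid] not in used_clusters:
            chosen.append(sid)
            used_clusters.add(cluster[sid])
        if len(chosen) == k:
            break
    return chosen
```

The selected sentences would then be annotated and added to the CRF training set before the next round.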

Multi-cue Integration for Automatic Annotation (자동 주석을 위한 멀티 큐 통합)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Proceedings of the Korean Society of Computer Information Conference / 2010.07a / pp.151-152 / 2010
  • Web images are located in structured, networked documents, so the importance of a word can be indicated by its location and frequency. There are two patterns for multi-cue integration annotation. The multi-cue integration algorithm shows initial promise as an indicator of the semantic keyphrases of web images. Latent semantic automatic keyphrase extraction, improved through the use of multiple cues, is expected to be preferable.

RoBERTa-catseqE: Neural keyphrase Extraction with Entity linking using RoBERTa (RoBERTa-catSeqE: 개체 연결을 이용한 RoBERTa기반 키워드 추출)

  • Lee, Jeong-Doo;Na, Seung-Hoon
    • Annual Conference on Human and Language Technology / 2020.10a / pp.486-490 / 2020
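  • Keyphrase extraction is the task of extracting, from each document, the key words or phrases that cover its content and topic. It plays a very important role in extracting important information from news articles and papers. In this paper, we apply a RoBERTa language model trained on Korean to the existing catSeq model and, using entity-linking information, propose a model with dual decoders: the existing keyphrase generation decoder and a decoder that classifies whether each entity-linked word is a keyphrase. We compare the performance of each model on a Korean keyphrase extraction dataset that we constructed ourselves.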
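catSeq-style models generate all keyphrases of a document as a single sequence, with the keyphrases concatenated by a separator token. The sketch below shows how a training target could be built and how a generated sequence could be split back into keyphrases. The token names `<sep>` and `<eos>` are assumptions, and the entity-linking classification decoder proposed in this paper is not shown.

```python
SEP, EOS = "<sep>", "<eos>"  # assumed special-token names


def to_catseq_target(keyphrases):
    """Join all gold keyphrases into one target sequence for the decoder."""
    return f" {SEP} ".join(keyphrases) + f" {EOS}"


def parse_catseq_output(text):
    """Split a generated sequence back into unique keyphrases, in order."""
    text = text.split(EOS)[0]  # ignore anything generated after end-of-sequence
    out = []
    for phrase in (p.strip() for p in text.split(SEP)):
        if phrase and phrase not in out:
            out.append(phrase)
    return out
```

Deduplicating the parsed output matters because sequence decoders can repeat a keyphrase they have already generated.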
