A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.123-136 / 2014
  • Recently, online shopping has further developed as the use of the Internet and a variety of smart mobile devices becomes more prevalent. The increase in the scale of such shopping has led to the creation of many Internet shopping malls. Consequently, competition among online retailers is increasingly fierce, and many Internet shopping malls are making significant attempts to attract online users to their sites. One such attempt is keyword marketing, whereby a retail site pays a fee to expose its link to potential customers when they enter a specific keyword on an Internet portal site. The price of each keyword is generally estimated from the keyword's frequency of appearance. However, it is widely accepted that the price of keywords cannot be based solely on their frequency because many keywords may appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people frequently use them. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords. Further, the demand for automating this extraction process is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that can automatically extract only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behaviors. In other words, only search keywords that direct the search results page to shopping-related pages are extracted from among the entire set of search keywords. A comparison is then made between the extracted keywords' rankings and the rankings of the entire set of search keywords. Two types of data are used in our study's experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information. The experimental dataset came from a website-ranking site and the largest portal site in Korea. The original sample dataset contains 150 million transaction logs. First, portal sites are selected, and search keywords in those sites are extracted. Search keywords can be easily extracted by simple parsing. The extracted keywords are ranked according to their frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site. As a result, a total of 344,822 search keywords were extracted. Next, by using web browsing history and site information, the shopping-related keywords were taken from the entire set of search keywords. As a result, we obtained 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all the search keywords with those of the shopping-related keywords. To achieve this, we extracted 80,298 search keywords from several Internet shopping malls and then chose the top 1,000 keywords as a set of true shopping keywords. We measured the precision, recall, and F-score of the entire set of keywords and of the shopping-related keywords. The F-score was calculated as the harmonic mean of precision and recall. The precision, recall, and F-score of the shopping-related keywords derived by the proposed methodology were higher than those of the entire set of keywords. This study proposes a scheme that is able to obtain shopping-related keywords in a relatively simple manner. We could easily extract shopping-related keywords simply by examining transactions whose next visit is a shopping mall. The resultant shopping-related keyword set is expected to be a useful asset for many shopping malls that participate in keyword marketing. Moreover, the proposed methodology can be easily applied to the construction of special area-related keywords as well as shopping-related ones.
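
The extraction and evaluation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the portal host, the query parameter name, the shopping-mall hosts, and the sample log are all hypothetical, and the abstract does not state how the real logs are parsed.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical inputs: the portal's search host and query parameter, and a
# set of shopping-mall hosts taken from some site-information table.
PORTAL_HOST = "search.example-portal.com"
QUERY_PARAM = "query"
SHOPPING_HOSTS = {"mall.example.com", "shop.example.org"}


def search_keyword(url):
    """Return the search keyword if url is a portal search page, else None."""
    parts = urlparse(url)
    if parts.netloc != PORTAL_HOST:
        return None
    values = parse_qs(parts.query).get(QUERY_PARAM)
    return values[0] if values else None


def shopping_keywords(log):
    """Collect keywords whose immediately following visit is a shopping host."""
    found = set()
    for current, following in zip(log, log[1:]):
        keyword = search_keyword(current)
        if keyword and urlparse(following).netloc in SHOPPING_HOSTS:
            found.add(keyword)
    return found


def precision_recall_f(candidates, truth):
    """Precision, recall, and their harmonic mean (F-score) for two sets."""
    hits = len(candidates & truth)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# A time-ordered browsing log for one (made-up) user.
log = [
    "https://search.example-portal.com/s?query=sneakers",
    "https://mall.example.com/item/1",
    "https://search.example-portal.com/s?query=weather",
    "https://news.example.net/today",
]
extracted = shopping_keywords(log)
p, r, f = precision_recall_f(extracted, {"sneakers", "jacket"})
```

In this toy log only "sneakers" is followed by a shopping-mall visit, so it is the only keyword kept; "weather" is dropped even though it is also a search keyword.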

Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services / v.18 no.4 / pp.135-149 / 2019
  • Recently, much research has addressed the extraction of meaningful information from large amounts of text data. Among the various approaches to extracting information from text, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several topic keywords ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects the similarity of the keywords. However, a topic quality evaluation method based only on keyword similarity has its limitations because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a method for reorganizing topic keywords to improve the identifiability of topics. To reorganize topic keywords, each document first needs to be labeled with one representative topic, which can be extracted through traditional topic modeling. After that, classification rules for classifying each document into its corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. From the experiment, we confirmed that the keywords extracted by our proposed method have better identifiability than traditional topic keywords.

A Comparative Analysis on Keywords of International and Korean Journals in Library and Information Science (국내외 문헌정보학 저널의 키워드 비교 분석)

  • Kim, Eungi
    • Journal of Korean Library and Information Science Society / v.48 no.1 / pp.207-225 / 2017
  • The aim of this study was to discover various Library and Information Science (LIS) research areas by examining similarities and differences between LIS journals in terms of keyword characteristics. To conduct this study, for the years from 2004 to 2016, the keywords of 6 international journals were downloaded from the Scopus database (http://www.scopus.com), and the keywords of 4 Korean journals were downloaded from the RISS database (http://www.riss.co.kr). The characteristics of the keywords were investigated by examining frequently used keywords and frequently used distinctive keywords pertaining to international and Korean journals. Distinctive keywords are keywords that appear in one domain but not in the other. The results of this study indicated the following: a) A frequency analysis of the keywords showed major research themes and unique traits concerning Korea. b) In general, the keywords used in Korean journals frequently reflected the library as a major subject area of research, while keywords used in international journals reflected bibliometrics and information retrieval as major subject areas of research. c) The overarching themes of each created dataset were clearly noticeable in frequently used distinctive keywords. d) Some keywords were bound to a nation or a region due to their scope of usage. The important implication of this study is that both the most frequently used keywords and the most frequently used distinctive keywords seemed to adequately represent the LIS subject areas.
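
The notion of a distinctive keyword defined in this abstract (a keyword that appears in one domain but not in the other) can be illustrated with a short sketch. The function name, the case-folding step, and the sample keyword lists are assumptions made for illustration; the abstract does not describe how keywords were normalized.

```python
from collections import Counter


def distinctive_keywords(own, other, top_n=3):
    """Most frequent keywords in `own` that never appear in `other`."""
    normalize = lambda k: k.strip().lower()  # assumed normalization step
    own_counts = Counter(normalize(k) for k in own)
    other_set = {normalize(k) for k in other}
    distinctive = Counter(
        {k: c for k, c in own_counts.items() if k not in other_set}
    )
    return distinctive.most_common(top_n)


# Made-up keyword lists standing in for the two journal groups.
korean = ["Public library", "public library", "school library", "bibliometrics"]
international = [
    "bibliometrics",
    "information retrieval",
    "information retrieval",
    "citation analysis",
]

korean_only = distinctive_keywords(korean, international)
international_only = distinctive_keywords(international, korean)
```

Here "bibliometrics" occurs in both lists, so it is excluded from both results even though it is frequent overall.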

A Study on the Research Trend of Elementary Environmental Education through an Analysis of the Network of Author Keywords (저자 키워드 네트워크 분석을 통한 초등 환경교육의 연구 동향 탐색)

  • Kim, Dong-Ryeul
    • Journal of Korean Elementary Science Education / v.36 no.2 / pp.113-128 / 2017
  • This study aims to investigate the research trend of elementary environmental education. To this end, author keywords were extracted from a total of 197 academic theses related to elementary environmental education from two periods, those in which detailed goals were applied to the 2007 and the 2009 revised curriculums respectively, and the network of author keywords was then analyzed. The results of this study can be summarized as follows. Firstly, an analysis of the frequency of author keywords from academic theses related to elementary environmental education found 369 author keywords from the period when detailed goals were applied to the 2009 revised curriculum. Among them, the keyword 'climate change education' showed the highest frequency, followed by 'environmental literacy' and 'environmental perception', excluding such central keywords as 'environmental education' and 'elementary school student'. From the period when detailed goals were applied to the 2007 revised curriculum, a total of 394 author keywords were found, and the keyword 'environmental literacy' showed the highest frequency, followed by 'environmental perception' and 'ESD (education for sustainable development)'. Secondly, an analysis of the network of author keywords found that, in the total number of network connections, average connection degree, density, and cliques, the period when detailed goals were applied to the 2007 revised curriculum was somewhat higher than the period when detailed goals were applied to the 2009 revised curriculum. An analysis of the centrality of author keywords found that, during both periods, 'environmental perception' and 'environmental literacy' were high in degree centrality and betweenness centrality, excluding such central keywords as 'environmental education' and 'elementary school student'. An analysis of the components of author keywords as sub-networks found 9 components from the period when detailed goals were applied to the 2009 revised curriculum and 6 components from the period when detailed goals were applied to the 2007 revised curriculum. During both periods, the largest component was composed of keywords high in degree centrality and betweenness centrality.
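
Several of the network measures named in this abstract (density, degree centrality, and components) can be computed on a keyword co-occurrence graph as in the sketch below. It assumes that two author keywords are linked when they appear in the same thesis, which the abstract does not state explicitly; the sample keyword lists are made up, and betweenness centrality and cliques are not covered.

```python
from itertools import combinations


def build_network(keyword_lists):
    """Undirected co-occurrence graph: keywords sharing a paper are linked."""
    graph = {}
    for keywords in keyword_lists:
        unique = set(keywords)
        for keyword in unique:
            graph.setdefault(keyword, set())
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {k: (len(nb) / (n - 1) if n > 1 else 0.0) for k, nb in graph.items()}


def density(graph):
    """Number of edges divided by the number of possible edges."""
    n = len(graph)
    edges = sum(len(nb) for nb in graph.values()) / 2
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0


def components(graph):
    """Connected components, found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


# Made-up author keyword lists, one list per thesis.
papers = [
    ["environmental education", "environmental literacy"],
    ["environmental literacy", "environmental perception"],
    ["climate change education", "ESD"],
]
network = build_network(papers)
centrality = degree_centrality(network)
parts = components(network)
```

With these three toy papers the graph has 5 keywords and 3 links, so the density is 0.3 and the keywords split into two components.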

Topic Modeling with Deep Learning-based Sentiment Filters (감정 딥러닝 필터를 활용한 토픽 모델링 방법론)

  • Choi, Byeong-Seol;Kim, Namgyu
    • The Journal of Information Systems / v.28 no.4 / pp.271-291 / 2019
  • Purpose: The purpose of this study is to propose a methodology to derive positive keywords and negative keywords through deep learning to classify reviews into positive reviews and negative ones, and then refine the results of topic modeling using these keywords. Design/methodology/approach: In this study, we extracted topic keywords by performing LDA-based topic modeling. At the same time, we performed attention-based deep learning to identify positive and negative keywords. Finally, we refined the topic keywords using these keywords as filters. Findings: We collected and analyzed about 6,000 English reviews of Gyeongbokgung, a representative tourist attraction in Korea, from Tripadvisor, a representative travel site. Experimental results show that the proposed methodology properly identifies positive and negative keywords describing major topics.
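
The final filtering step described here can be pictured with the following sketch. It covers only the filtering: the topic keywords and the positive and negative keyword sets are hand-written stand-ins for the outputs of LDA and of the attention-based model, and the exact filtering rule used in the paper is not given in the abstract, so simple set membership is assumed.

```python
def refine_topics(topic_keywords, positive, negative):
    """Split each topic's keywords into positive and negative subsets.

    Keywords that are in neither sentiment set are dropped (an assumption).
    """
    refined = {}
    for topic, keywords in topic_keywords.items():
        refined[topic] = {
            "positive": [k for k in keywords if k in positive],
            "negative": [k for k in keywords if k in negative],
        }
    return refined


# Hypothetical topic keywords, ordered by weight within each topic.
topics = {
    "topic_0": ["palace", "beautiful", "crowded", "guard"],
    "topic_1": ["ticket", "cheap", "queue", "long"],
}
# Hypothetical sentiment keyword sets.
positive = {"beautiful", "cheap"}
negative = {"crowded", "long"}

refined = refine_topics(topics, positive, negative)
```

The original keyword order within each topic is preserved, so the highest-weighted sentiment keyword still comes first.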

Coincidence Analysis of Keywords of the Journal of Korean Academy of Nursing with MeSH (대한간호학회지 게재 논문 주요어 분석(2003-2005년))

  • Jeong Geum-Hee;Ahn Young-Mee;Cho Dong-Sook
    • Journal of Korean Academy of Nursing / v.35 no.7 / pp.1420-1425 / 2005
  • Purpose: We sought to determine how closely the keywords of papers from the Journal of the Korean Academy of Nursing coincide with MeSH terminology and to understand the major subjects of recent nursing research in Korea from the keywords. Methods: Keywords of journal articles were extracted and compared with MeSH terms. The keywords were sorted by frequency of appearance in descending order. Results: The coincidence rate of 1,235 keywords with MeSH terms was 51.6%. Among them, depression, elderly, stress, self efficacy, quality of life, exercise, middle-aged women, and women appeared most frequently, in descending order. Conclusion: The coincidence rate of the keywords with MeSH terms was at an acceptable level; however, to improve it, submitters and editorial board members need to be educated, and the copy editor needs to take a role in checking keywords. Inferring the subjects of research from keywords may well represent the recent topics of research work.

A Study on the Keyword Extraction for ESG Controversies Through Association Rule Mining (연관규칙 분석을 통한 ESG 우려사안 키워드 도출에 관한 연구)

  • Ahn, Tae Wook;Lee, Hee Seung;Yi, June Suh
    • The Journal of Information Systems / v.30 no.1 / pp.123-149 / 2021
  • Purpose: The purpose of this study is to define the anti-ESG activities of companies as recognized by the media, reflecting the recent attention to ESG. This study extracts keywords for ESG controversies through association rule mining. Design/methodology/approach: A research framework is designed to extract keywords for ESG controversies as follows: 1) From the DeepSearch DB, we collect 23,837 articles, reported by 130 media outlets from 2013 to 2018, on the anti-ESG activities of 294 listed companies with ESG ratings. 2) We set keywords related to environment, social, and governance, and delete or merge them with other keywords based on the support, confidence, and lift derived from association rule mining. 3) We illustrate the importance of keywords and the relevance between keywords through density, degree centrality, and closeness centrality in network analysis. Findings: We identify a total of 26 keywords for ESG controversies. 'Gapjil' records the highest frequency, followed by 'corruption', 'bribery', and 'collusion'. Out of the 26 keywords, 16 are related to governance, 8 to social, and 2 to environment. The highly ranked keywords are mostly related to the responsibility of shareholders within corporate governance. ESG controversies associated with social issues are often related to unfair trade. Confidence analysis shows that the keywords related to social and governance form clusters, and the probability of co-occurrence between keywords is high within each group. In particular, "owner's arrest" follows "bribery" and "misappropriation" with 80% confidence. The result of network analysis shows that 'corruption' is located in the center, is the most likely to occur alone, and is highly related to 'breach of duty', 'embezzlement', and 'bribery'.
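
The three association rule measures named in step 2 (support, confidence, and lift) follow their standard definitions, which the sketch below computes for a single rule. Each article is treated as a set of keywords; the function name and the sample articles are invented for illustration and are not drawn from the study's data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    support    = P(A and C)
    confidence = P(A and C) / P(A)
    lift       = confidence / P(C)
    """
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n if n else 0.0
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Made-up articles, each reduced to the set of keywords it mentions.
articles = [
    {"bribery", "owner's arrest"},
    {"bribery", "owner's arrest", "corruption"},
    {"bribery", "corruption"},
    {"collusion"},
    {"bribery", "owner's arrest"},
]
support, confidence, lift = rule_metrics(
    articles, {"bribery"}, {"owner's arrest"}
)
```

In this toy data "bribery" appears in 4 of 5 articles and 3 of those also mention "owner's arrest", giving a support of 0.6, a confidence of 0.75, and a lift of 1.25. A lift above 1 indicates that the two keywords co-occur more often than they would if they were independent.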

An Analysis on Keywords in the Journal of Korean Safety Management Science from 2018 to 2021 (2018년부터 2021년까지 대한안전경영과학회지의 주제어에 관한 분석)

  • Byoung-Hak Yang
    • Journal of the Korea Safety Management & Science / v.25 no.1 / pp.1-6 / 2023
  • This study analyzed the keywords of papers published in the Journal of the Korea Safety Management & Science using social network analysis. To extract the keywords, information on journal articles published from 2018 to 2021 was retrieved from SCIENCE ON. Among the keywords extracted from a total of 129 papers, keywords with similar meanings were standardized. Keywords used in the same paper were connected and visualized as a network. Four centrality indicators from social network analysis were used to analyze the influence of each keyword. Safety, Safety management, Apartment, Fire hose, SMEs, Virtual reality, Machine learning, Waterproof time, R&D capability, and Job crafting were identified as the keywords with high influence across the four centrality indicators.

Coincidence analysis of keywords and MeSH terms in the Korean Journal of Emergency Medical Services (한국응급구조학회지 게재 논문의 중심 단어 분석(2005년-2011년))

  • Lee, Kyoung-Hee;Ham, Young-Lim
    • The Korean Journal of Emergency Medical Services / v.16 no.2 / pp.43-51 / 2012
  • Purpose: We sought to determine how closely the keywords of papers from the Korean Journal of Emergency Medical Services coincide with Medical Subject Headings (MeSH) terminology and to understand the major subjects of recent emergency medical technology research in Korea from the keywords. Methods: We analyzed keywords from 524 articles of the Korean Journal of Emergency Medical Services that were published between 2005 and 2011. We investigated frequently used keywords and the percentage of keywords that agree with MeSH terms using the MeSH browser. Results: There were on average 3.2 keywords per article. The most frequent keywords were, in order, AED, Attitude, Cardiopulmonary Resuscitation, CPR, EMT, EMT students, External Defibrillator, Job satisfaction, Knowledge, and 119 EMT. The number of terms in precise agreement with MeSH headings was 101 (19.3%); 327 terms (62.4%) were not found in the MeSH browser, and 96 terms (18.3%) partially matched MeSH terms. Conclusion: Many keywords used in the Korean Journal of Emergency Medical Services did not agree with MeSH terms. We conclude that the submission rules should require MeSH terms and that authors should be educated in the proper use of MeSH terms in their research and subsequent publication.

The Comparison of Keyword of Articles in Journal of the Korean Society of Physical Medicine with MeSH (대한물리의학회지 논문의 주제어와 MeSH용어의 비교)

  • Roh, Jung-Suk
    • Journal of the Korean Society of Physical Medicine / v.7 no.3 / pp.367-377 / 2012
  • Purpose: The purpose of this study was to investigate the coincidence between keywords of the Journal of the Korean Society of Physical Medicine (JKSPM) and MeSH terms, a controlled vocabulary used in MEDLINE. Methods: A total of 838 keywords used in 252 papers of JKSPM from Vol.1, No.1, 2006 to Vol.7, No.1, 2012 were compared with MeSH terms. All keywords were classified into three broad categories: complete coincidence, incomplete coincidence, and complete incoincidence. Results: There were 183 keywords (21.8%) in the complete coincidence category, 378 (45.1%) in the incomplete coincidence category, and 277 (33%) in the complete incoincidence category. The most used keyword in the complete coincidence category was 'stroke', and in the complete incoincidence category it was 'balance'. The most used keyword matching entry terms in the incomplete coincidence category was 'elderly'. Conclusion: The rate of complete coincidence of keywords with MeSH terms was not higher than the rates of incomplete coincidence and complete incoincidence. It is necessary to understand MeSH terms more accurately and specifically. The JKSPM should ask authors to use MeSH terms as keywords when they submit a paper.
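
The three-way classification used in this abstract can be sketched as below. The heading set and entry-term table are a tiny hand-written stand-in, not real MeSH data, and the rule that an entry-term match counts as incomplete coincidence is an assumption inferred from the abstract; the paper's full criteria for the incomplete category are not given here.

```python
from collections import Counter

# Hypothetical stand-in vocabulary (NOT real MeSH data).
HEADINGS = {"stroke", "aged"}
ENTRY_TERMS = {"elderly": "aged"}  # entry term -> preferred heading


def classify(keyword):
    """Assign a keyword to one of the three coincidence categories."""
    key = keyword.strip().lower()
    if key in HEADINGS:
        return "complete coincidence"
    if key in ENTRY_TERMS:
        return "incomplete coincidence"
    return "complete incoincidence"


def coincidence_rates(keywords):
    """Percentage of keywords in each category, rounded to one decimal."""
    if not keywords:
        return {}
    counts = Counter(classify(k) for k in keywords)
    return {cat: round(100 * n / len(keywords), 1) for cat, n in counts.items()}


rates = coincidence_rates(["Stroke", "elderly", "balance", "stroke"])
```

For the four sample keywords, two match a heading exactly, one matches only through an entry term, and one matches nothing, so the rates are 50.0%, 25.0%, and 25.0% respectively.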