• Title/Summary/Keyword: Text Repository

Implementation of Artificial Intelligence Speech Recognition Text Repository for Elementary Career Counseling (초등 진로 상담을 위한 인공지능 음성 인식 텍스트 레포지토리 구현)

  • Yu, Minjeong;Ma, Youngji;Koo, Dukhoi
    • 한국정보교육학회:학술대회논문집 / 2021.08a / pp.327-333 / 2021
  • Artificial intelligence (AI) technology is developing rapidly in the era of the Fourth Industrial Revolution, and the government is trying to improve AI education and cultivate human resources. However, there are very few cases where AI technology is actually used in public education classes. We therefore designed a text repository that uses AI speech recognition to support career counseling for elementary school students. Until now, there have been many difficulties in conducting the advance consultations required for students' career counseling. In this study, we propose AI speech recognition technology as a way to address this problem and plan various ways to make the program more educational. We expect the AI technology implemented in this study to provide an effective solution for career counseling.

XML Repository System Using DBMS and IRS

  • Kang, Hyung-Il;Yoo, Jae-Soo;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.3 / pp.6-14 / 2007
  • In this paper, we design and implement an XML Repository System (XRS) that exploits the advantages of DBMSs and IRSs. Our scheme uses BRS to support full-text indexing and content-based queries efficiently, and ORACLE to store XML documents, multimedia data, DTDs, and structure information. We design databases to manage XML documents that include audio, video, and images as well as text. We employ the non-composition model when storing XML documents in ORACLE. We represent structural information as ETID (Element Type Id), SORD (Sibling ORDer), and SSORD (Same Sibling ORDer). ETID is a unique value assigned to each element of the DTD. SORD represents the order among sibling nodes, and SSORD represents the order among sibling nodes with the same element. To demonstrate the superiority of our XRS, we perform various experiments measuring document loading time, document extraction time, and content retrieval time. The experiments show that our XRS outperforms existing XML document management systems. We also show through performance experiments that it supports various types of queries.
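
The ETID, SORD, and SSORD labels described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_elements` is hypothetical, and ETIDs are assigned in order of first appearance in the document rather than read from a DTD, which is an assumption made only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def label_elements(xml_text):
    """Return (tag, ETID, SORD, SSORD) tuples for every element, in document order."""
    root = ET.fromstring(xml_text)
    etids = {}  # element type name -> ETID (assumption: numbered on first sight, not from a DTD)
    rows = []

    def visit(elem, sord, ssord):
        # One ETID per element type, shared by all elements with that tag.
        etid = etids.setdefault(elem.tag, len(etids) + 1)
        rows.append((elem.tag, etid, sord, ssord))
        same_tag_count = {}
        for position, child in enumerate(elem, start=1):
            # SORD: position among all siblings.
            # SSORD: position among siblings that share the same tag.
            same_tag_count[child.tag] = same_tag_count.get(child.tag, 0) + 1
            visit(child, position, same_tag_count[child.tag])

    visit(root, 1, 1)
    return rows


SAMPLE = "<book><title>T</title><chapter>A</chapter><chapter>B</chapter></book>"
rows = label_elements(SAMPLE)
for row in rows:
    print(row)
```

In this sample, the second `chapter` element is the third child of `book` (SORD 3) but only the second `chapter` among its siblings (SSORD 2), which is the distinction the two labels are meant to capture.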

A Digital Library Prototype - Digital Repository and Diverse Collections (디지털도서관 프로토타입의 구축 -디지털 리포지토리와 컬렉션을 중심으로)

  • 최원태
    • Proceedings of the Korea Database Society Conference / 1998.09a / pp.383-394 / 1998
  • This article is an overview of the digital library project, indicating what roles Korea's diverse digital collections may play. Our digital library prototype has a simple architecture consisting of digital repositories, filters, indexing and searching, and clients. Digital repositories include various types of materials and databases. The role of filters is to recognize the format of a document collection and mark the structural components of each of its documents. We are using a database management system (ORACLE and ConText) that supports user-defined functions and access methods, which allows us to easily incorporate new object analysis, structuring, and indexing technology into a repository.

Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

  • Nam, Hee-Jo;Yamada, Ryota;Park, Hyun-Seok
    • Genomics & Informatics / v.18 no.2 / pp.13.1-13.6 / 2020
  • The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.

N-gram Feature Selection for Text Classification Based on Symmetrical Conditional Probability and TF-IDF (대칭 조건부 확률과 TF-IDF 기반 텍스트 분류를 위한 N-gram 특질 선택)

  • Choi, Woo-Sik;Kim, Seoung Bum
    • Journal of Korean Institute of Industrial Engineers / v.41 no.4 / pp.381-388 / 2015
  • The rapid growth of the World Wide Web and online information services has generated and made accessible a huge number of text documents. To analyze texts, selecting important keywords is an essential step. In this paper, we propose a feature selection method that combines the term frequency-inverse document frequency (TF-IDF) technique with symmetrical conditional probability. The proposed method can identify N-gram features, that is, sequential multiword units. The effectiveness of the proposed method is demonstrated on real text data from the machine learning repository of the University of California, Irvine.
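
The abstract above does not say how the two measures are combined, so the following Python sketch is only one plausible reading. It scores bigrams with symmetrical conditional probability, SCP(x, y) = p(x, y)^2 / (p(x) p(y)), estimated with a common normalizer so that it reduces to count(x, y)^2 / (count(x) count(y)), and multiplies that by a simplified corpus-level TF-IDF. The product rule, the function names, and the toy corpus are all assumptions, not the paper's method.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent token pairs of one document."""
    return list(zip(tokens, tokens[1:]))


def scp_scores(docs):
    """SCP per bigram, using count(x, y)^2 / (count(x) * count(y))."""
    unigram_counts = Counter(tok for doc in docs for tok in doc)
    bigram_counts = Counter(bg for doc in docs for bg in bigrams(doc))
    return {
        (x, y): count ** 2 / (unigram_counts[x] * unigram_counts[y])
        for (x, y), count in bigram_counts.items()
    }


def tfidf_scores(docs):
    """Simplified corpus-level TF-IDF per bigram: total frequency * log(N / df)."""
    n_docs = len(docs)
    tf = Counter(bg for doc in docs for bg in bigrams(doc))
    df = Counter(bg for doc in docs for bg in set(bigrams(doc)))
    return {bg: tf[bg] * math.log(n_docs / df[bg]) for bg in tf}


def rank_bigrams(docs):
    """Rank bigrams by SCP * TF-IDF (assumed combination rule)."""
    scp = scp_scores(docs)
    tfidf = tfidf_scores(docs)
    combined = {bg: scp[bg] * tfidf[bg] for bg in scp}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["machine", "learning", "repository"],
    ["machine", "learning", "model"],
    ["text", "repository"],
]
ranking = rank_bigrams(docs)
print(ranking[0])
```

On this toy corpus, "machine learning" ranks first because its two words always occur together, which gives it the highest possible SCP of 1.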

A Technique to Link Bug and Commit Report based on Commit History (커밋 히스토리에 기반한 버그 및 커밋 연결 기법)

  • Chae, Youngjae;Lee, Eunjoo
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.235-239 / 2016
  • The 'commit-bug link', the link between commit history and bug reports, is used for software maintenance and defect prediction in bug tracking systems. Previous studies have detected these links automatically based on text similarity, time interval, and keywords. Existing approaches depend on the quality of the commit history and could thus miss several links. In this paper, we propose a technique that links commits and bug reports using not only the messages in the commit history but also the similarity of files in the commit history coupled with bug reports. The experimental results demonstrate the applicability of the suggested approach.
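
As a rough illustration of combining message similarity with file similarity, the sketch below scores a candidate commit-bug pair with a weighted sum of two Jaccard similarities. The abstract does not give the paper's actual similarity measures or weighting, so the dictionary fields (`message`, `description`, `files`), the `text_weight` parameter, and the use of Jaccard similarity are all assumptions. Here `bug["files"]` stands for files touched by commits already linked to the bug report.

```python
import re


def jaccard(a, b):
    """Jaccard similarity of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tokenize(text):
    """Lowercase word tokens from free text."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def link_score(commit, bug, text_weight=0.5):
    """Weighted sum of message similarity and changed-file similarity (assumed rule)."""
    text_sim = jaccard(tokenize(commit["message"]), tokenize(bug["description"]))
    file_sim = jaccard(commit["files"], bug["files"])
    return text_weight * text_sim + (1 - text_weight) * file_sim


# Hypothetical data for illustration only.
bug = {"description": "Parser crashes with null pointer", "files": ["src/parser.c"]}
related = {"message": "Fix null pointer in parser", "files": ["src/parser.c", "src/lexer.c"]}
unrelated = {"message": "Update README", "files": ["README.md"]}

related_score = link_score(related, bug)
unrelated_score = link_score(unrelated, bug)
print(related_score > unrelated_score)
```

A threshold on the score would then decide whether a pair is reported as a link; choosing that threshold is left open here.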

A Study on Typology of Japanese Institutional Repositories and Features of Groups (일본 기관 레포지토리 유형화 및 군집의 특성 분석)

  • Cho, Jane
    • Journal of the Korean Society for Information Management / v.31 no.1 / pp.143-161 / 2014
  • While the dCollections of Korea were initiated by the government for metadata harvesting, the institutional repositories of Japan have been managed as each institution's independent tool not only for collecting, archiving, and distributing its intellectual assets but also for realizing open access. This study statistically analyzes the Japanese IRDB to understand the features of institutional repositories and clusters the repositories by content type to highlight their differences. According to the analysis, Japanese repositories contain diverse types of content, such as journal articles, scholarly papers, textbooks, and technical reports, and fall into five distinct groups with different content types.

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal / v.44 no.3 / pp.413-425 / 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to be interpreted even by humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

A Study on Analysis of Research Data Repository in Humanities and Social Sciences (re3data를 기반으로 한 인문사회 RDR 연구)

  • Cho, Jane;Park, Jong-Do
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.30 no.2 / pp.69-87 / 2019
  • As discussions on sharing research data have spread following the inauguration of the International Open Data Charter, research support organizations in the United States, the United Kingdom, and Japan are encouraging researchers to deposit their findings in a credible repository. The humanities and social sciences, in which the culture of sharing research data and the storage infrastructure are immature compared to the life sciences and natural sciences, also need to establish and operate a reliable storage infrastructure to guarantee continuous access to and use of data. This study analyzed the overall operational status of 305 subject repositories registered in re3data for the humanities and social sciences and clustered them by operational level using 5 indicators. As a result, 70% of the repositories were identified as belonging to the universal cluster, and the excellent cluster (20%) was found to contain the largest number of linguistics and German-operated repositories. In addition, this study confirmed through correspondence analysis that there is a relation between the sub-fields of the humanities and social sciences and the types of data archived. The history and art domains are related to images, and social studies are related to statistical data. Linguistics was also found to be related to audio, plain text, and code.

A Market Positioning Analysis using Mobile Shopping App Reviews (모바일 쇼핑 앱 리뷰를 이용한 시장 포지셔닝 분석)

  • Kim, Yong-Hwan;Park, Ji-hoon;Lee, Seung-Jun;Kim, Ja-Hee
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.157-160 / 2016
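  • The transaction volume of the mobile shopping market has recently been growing exponentially every year, and companies are interested in which characteristics of mobile applications can increase their sales. In this paper, we therefore use text mining to extract frequently used nouns from reviews of widely used mobile shopping applications and derive evaluation items through content analysis. We then apply the repertory grid technique to the derived evaluation items to evaluate the mobile shopping applications and perform market positioning. Through this, we analyze which characteristics of mobile shopping applications affect users' service preferences.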
