• Title/Summary/Keyword: Multilingual Search


Building and Analysis of Semantic Network on S&T Multilingual Terminology (과학기술 전문용어의 다국어 의미망 생성과 분석)

  • Jeong, Do-Heon;Choi, Hee-Yoon
    • Journal of Information Management / v.37 no.4 / pp.25-47 / 2006
  • A terminology system that provides interpretations and classification information for multilingual science and technology (S&T) terminology is essential for establishing an integrated search environment for multilingual S&T information systems. This paper aims to build a base system for managing an integrated information system for multilingual S&T terminology search. It introduces a method for building a search system for S&T terms that are internally linked through a multilingual semantic network, and a search technique that operates on multiple linked nodes. To provide a foundation for further analysis, it also suggests a basic approach to interpreting the terminology clusters generated by these two search methods.

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of Information and Communication Convergence Engineering / v.18 no.1 / pp.33-38 / 2020
  • Elasticsearch is an open-source search and analytics engine that can search petabytes of data in near real time. It is designed as a horizontally scalable, highly available distributed system, and its RESTful APIs make it programming-language agnostic. Full-text search of multilingual text requires language-specific analyzers and field mappings suited to indexing and searching that text. A language detector can also be used together with the analyzers to improve multilingual search. Elasticsearch provides more than 40 language analysis plugins that process text and extract language-specific tokens, as well as language detector plugins that determine the language of a given text. This study investigates three approaches to indexing and searching Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based) and identifies the advantages of the language detector-based approach over the other two.
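
The multi-fields and language detector-based approaches named in this abstract can be illustrated with a minimal sketch. It is not the authors' configuration: the field name `body`, the helper names, and the choice of analyzers are assumptions made for illustration. The sketch assumes the `analysis-smartcn`, `analysis-kuromoji`, and `analysis-nori` plugins are installed (they register the `smartcn`, `kuromoji`, and `nori` analyzers) and that a separate language detector supplies a two-letter language code. It only builds the JSON bodies and does not contact a cluster.

```python
import json

# Assumption: analyzers registered by the smartcn, kuromoji, and nori plugins.
# "cjk" is the built-in bigram analyzer, used here as the single-analyzer fallback.
LANGUAGE_ANALYZERS = {"zh": "smartcn", "ja": "kuromoji", "ko": "nori"}


def multi_field_mapping(field="body"):
    """Build an index body that analyzes one text field once per language."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer}
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    }
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "cjk", "fields": sub_fields}
            }
        }
    }


def target_fields(detected_lang, field="body"):
    """Choose which fields to query given a (hypothetical) detector's output."""
    if detected_lang in LANGUAGE_ANALYZERS:
        # Detector-based: query only the sub-field for the detected language.
        return [f"{field}.{detected_lang}"]
    # Unknown language: fall back to the root field plus every sub-field.
    return [field] + [f"{field}.{lang}" for lang in LANGUAGE_ANALYZERS]


def multi_match_query(text, detected_lang, field="body"):
    """Build a multi_match query body restricted to the chosen fields."""
    return {
        "query": {
            "multi_match": {"query": text, "fields": target_fields(detected_lang, field)}
        }
    }


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping(), indent=2))
    print(json.dumps(multi_match_query("검색 엔진", "ko"), ensure_ascii=False))
```

Restricting the query to one sub-field avoids scoring a Korean query against tokens produced by the Chinese or Japanese analyzers, which is one plausible reason a detector-based setup could outperform plain multi-fields.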

Judging Translated Web Document & Constructing Bilingual Corpus (웹 번역문서 판별과 병렬 말뭉치 구축)

  • Jee-hyung, Kim;Yill-byung, Lee
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.787-789 / 2004
  • People searching for information on the internet frequently need a general search tool that frees them from the language barrier. A multilingual parallel corpus is therefore necessary so that a search keyword can be matched with its corresponding word in another language. Such a corpus can be built and reused effectively through several processes: judgment of web documents, sentence alignment, and word alignment. To build a multilingual parallel corpus, a multilingual dictionary should be constructed for each language and the HTML should be simplified. Then, by using the meaning and statistics of document structure, translated web documents are identified and the retrieved web pages are aligned at the sentence level.


Subject Searching Using Controlled Vocabulary Versus Uncontrolled Vocabulary in Online Catalog System: Focusing on Multilingual Environment

  • Choi, Hee-Yoon
    • Journal of Information Management / v.26 no.2 / pp.61-79 / 1995
  • The purpose of this paper is to investigate the search efficiency of controlled versus uncontrolled vocabulary subject access in online catalog systems. The question of the effectiveness of controlled versus uncontrolled vocabulary in information retrieval has been raised repeatedly in the literature. Debate continues in the library and information science professions over the relative merit, appropriateness, and efficiency of uncontrolled vocabulary subject access in online catalog systems. In practice, users tend to combine uncontrolled vocabulary subject searching with controlled vocabulary subject searching, but the success of a subject search depends on the user's choice of search terms. In addition, technical developments that facilitate cooperation among information services make it increasingly possible for such cooperation to take place at an international level. This study describes and compares several common types of vocabularies in online catalog systems, with particular attention to how vocabularies are used in a multilingual environment.


Identifying Similar Overseas Patent Using Word2Vec-Based Semantic Text Analytics (Word2Vec 학습을 통한 의미 기반 해외 유사 특허 검색 방안)

  • Paek, Minji;Kim, Namgyu
    • Journal of Information Technology Services / v.17 no.2 / pp.129-142 / 2018
  • The number of patent applications has been increasing rapidly every year as protecting intellectual property rights becomes more important. Patents must be inventive and novel. Novelty, in particular, means that the invention is not the same as any previous invention. To confirm novelty, a prior art search must be conducted before and after the application. The prior art search should cover not only Korean patents but also foreign patents, and searching foreign patents requires multilingual search techniques. However, a naive dictionary-based approach is limited because some technical concepts are expressed with different terms in different countries. For example, a Korean term and a Japanese term may not be dictionary synonyms even though they represent the same technical concept. In this paper, we propose a new method for mapping the semantic similarity between technical terms in Korean patents and Japanese patents. To investigate how each country represents the same technical concept, we identified and analyzed pairs of patents that are linked by a priority claim. In an experiment with real-world data, we showed that our approach can successfully reveal semantically similar technical terms in another language.

Web Contents Mining System for Real-Time Monitoring of Opinion Information based on Web 2.0 (웹2.0에서 의견정보의 실시간 모니터링을 위한 웹 콘텐츠 마이닝 시스템)

  • Kim, Young-Choon;Joo, Hae-Jong;Choi, Hae-Gill;Cho, Moon-Taek;Kim, Young-Baek;Rhee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.1 / pp.68-79 / 2011
  • This paper presents a system that extracts and analyzes opinion information through Web mining, based on statistics collected from Web content. Users' opinions scattered across several websites can be extracted and analyzed automatically. The system provides an opinion search service that lets users search for positive and negative opinions in real time and check the associated statistics. Users can also search and monitor other opinion information in real time by entering keywords into the system. Comparative experiments against other techniques indicate that the proposed technique performs well in practice. Three performance evaluations are carried out: one of the function that extracts positive and negative opinion information, one of the dynamic window and tokenizer techniques applied to multilingual information retrieval, and one of the technique that extracts exact multilingual phonetic translations. As an application example, experiments are carried out on typical movie review sentences and Wikipedia experimental data, and the results are analyzed.

Modeling and Implementation of Multilingual Meta-search Service using Open APIs and Ajax (Open API와 Ajax를 이용한 다국어 메타검색 서비스의 모델링 및 구현)

  • Kim, Seon-Jin;Kang, Sin-Jae
    • Journal of Korea Society of Industrial Information Systems / v.14 no.5 / pp.11-18 / 2009
  • Ajax, which is based on JavaScript, is receiving attention as an alternative to ActiveX technology. Most portal sites in Korea tend to relaunch existing services using this technology because it is supported by most web browsers and offers a polished interface, high speed, and reduced traffic through asynchronous interaction. This paper models and implements a multilingual meta-search service using Ajax and open APIs provided by internationally well-known sites. First, a Korean query is translated into one of the languages of 54 countries around the world by the Google translation API, and the translated result is then used to search social web sites such as Flickr, YouTube, Daum, and Naver. Search results are displayed quickly by dynamically loading only a portion of the screen using Ajax. Our system can reduce server traffic and per-packet communication charges by preventing redundant transmission of unnecessary information.

Analysis on User Interface in Information Retrieval Systems (정보검색시스템에서의 이용자 인터페이스 기능에 관한 분석적 고찰)

  • 서은경
    • Journal of the Korean Society for Information Management / v.16 no.4 / pp.125-150 / 1999
  • This study reviews various aspects of user interface design in interactive information retrieval systems. Specifically, the study examines (1) search-related interfaces such as query processing, search strategies, and multilingual processing, and (2) browsing-related interfaces such as document browsing and search result browsing. The main goals of this review are to characterize user interface techniques in information retrieval systems and to suggest potential future research directions and challenges.


A Study on the Interchangeability between a Thesaurus and an Ontology (시소러스와 온톨로지의 상호 호환성에 관한 연구)

  • Cho, Hyun-Yang;Nam, Young-Joon
    • Journal of the Korean Society for Information Management / v.21 no.4 s.54 / pp.27-47 / 2004
  • In this study, an experiment was conducted to transform the relationships among terms in a thesaurus into an ontology language, for use as a search tool for multilingual text. The results show that the equivalence relationship in the thesaurus can be expressed in several ways in the ontology, such as equivalentClass, equivalentProperty, and sameAs. The associative relationship, on the other hand, can be represented by ObjectProperty, DatatypeProperty, and inverseOf. For this test, the descriptors assigned by AAT and the descriptors from the bilingual thesaurus by ICCD were first translated into Korean. Facets were then used to establish conceptual equivalence among terms from different languages. The study showed that using rdf:Property in the ontology was the most effective way of transforming a multilingual thesaurus into an ontology.

Effective Cross-Lingual Text Retrieval using a Fuzzy Knowledge Base (퍼지 지식베이스를 이용한 효과적인 다언어 문서 검색)

  • Choi, Myeong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.8 no.1 / pp.53-62 / 2008
  • Cross-lingual text retrieval (CLTR) is information retrieval in which a user searches a set of documents written in one language using a query in another language. This paper proposes a CLTR system based on a fuzzy multilingual thesaurus to handle partial matching between terms of two different languages. The proposed system uses a fuzzy term matrix, defined in the paper, to perform retrieval effectively. In this matrix, all relation degrees between terms are inferred using a transitive closure algorithm, so that all implicit links between terms are reflected in retrieval. With this framework, the proposed CLTR system enhances retrieval effectiveness because it can closely emulate a human expert's decision making in CLTR.
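
The abstract does not say which transitive closure the authors use, so the sketch below assumes the common max-min composition for fuzzy relations: the inferred degree between two terms is the strength of the strongest chain linking them, where a chain is as strong as its weakest link. The function name and the three-term matrix are hypothetical and serve only to show how an implicit link is filled in.

```python
def max_min_closure(relation):
    """Return the max-min transitive closure of a square fuzzy relation.

    relation[i][j] is the relation degree in [0, 1] between term i and term j.
    The input matrix is not modified.
    """
    n = len(relation)
    closure = [row[:] for row in relation]
    # Floyd-Warshall over the (max, min) semiring: allow term k as an
    # intermediate step and keep the stronger of the direct and indirect links.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(closure[i][k], closure[k][j])
                if via_k > closure[i][j]:
                    closure[i][j] = via_k
    return closure


if __name__ == "__main__":
    # Hypothetical terms: 0 and 1 are in one language, 2 is in another.
    # Terms 0 and 2 have no direct link, only an indirect one through term 1.
    fuzzy_term_matrix = [
        [1.0, 0.8, 0.0],
        [0.8, 1.0, 0.6],
        [0.0, 0.6, 1.0],
    ]
    for row in max_min_closure(fuzzy_term_matrix):
        print(row)
```

In this example the missing degree between terms 0 and 2 becomes 0.6, the weaker of the two links on the only chain connecting them, which is how an implicit cross-language link would surface as a partial match.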
