• Title/Summary/Keyword: text search

Search Result 534, Processing Time 0.029 seconds

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of information and communication convergence engineering
    • /
    • v.18 no.1
    • /
    • pp.33-38
    • /
    • 2020
  • Elasticsearch is an open source search and analytics engine that can search petabytes of data in near real time. It is designed as a distributed system horizontally scalable and highly available. It provides RESTful APIs, thereby making it programming-language agnostic. Full text search of multilingual text requires language-specific analyzers and field mappings appropriate for indexing and searching multilingual text. Additionally, a language detector can be used in conjunction with the analyzers to improve the multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens and language detector plugins that can determine the language of the given text. This study investigates three different approaches to index and search Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based), and identifies the advantages of the language detector-based approach compared to the other two.

Usability Evaluation of Text-based Search and Visual Search of a Multidisciplinary Library Database (상용 학술데이터베이스의 텍스트 기반 검색과 비주얼검색의 사용성에 관한 연구)

  • Kim, Jong-Ae
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.3
    • /
    • pp.111-129
    • /
    • 2009
  • This study examined the usability of text-based search and visual search of a large multidisciplinary library database to provide an empirical analysis of the acceptability of visual systems in the information retrieval environment. It also examined if there are differences in the usability assessment based on experimental order. The results indicated that the text-based search supported users' search behaviors more efficiently than the visual search. Also the text-based search was rated higher than the visual search in terms of user perceptions of four usability factors.

Consideration of a Robust Search Methodology that could be used in Full-Text Information Retrieval Systems (퍼지 논리를 이용한 사용자 중심적인 Full-Text 검색방법에 관한 연구)

  • Lee, Won-Bu
    • Asia pacific journal of information systems
    • /
    • v.1 no.1
    • /
    • pp.87-101
    • /
    • 1991
  • The primary purpose of this study was to investigate a robust search methodology that could be used in full-text information retrieval systems. A robust search methodology is one that can be easily used by a variety of users (particularly naive users) and it will give them comparable search performance regardless of their different expertise or interests In order to develop a possibly robust search methodology, a fully functional prototype of a fuzzy knowledge based information retrieval system was developed. Also, an experiment that used this prototype information retreival system was designed to investigate the performance of that search methodology over a small exploratory sample of user queries To probe the relatonships between the possibly robust search performance and the query organization using fuzzy inference logic, the search performance of a shallow query structure was analyzes. Consequently the following several noteworthy findings were obtained: 1) the hierachical(tree type) query structure might be a better query organization than the linear type query structure 2) comparing with the complex tree query structure, the simple tree query structure that has at most three levels of query might provide better search performance 3) the fuzzy search methodology that employs a proper levels of cut-off value might provide more efficient search performance than the boolean search methodology. Even though findings could not be statistically verified because the experiments were done using a single replication, it is worth noting however, that the research findings provided valuable information for developing a possibly robust search methodology in full-text information retrieval.

  • PDF

Implementation of Information Retrieval System for Full-Text (전문에 대한 검색시스템의 구현)

  • 김대규;정희택;강영만;한순희;조혁현
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2000.10a
    • /
    • pp.337-340
    • /
    • 2000
  • Using the Information Retrieval systems on the Internet, the demand of exact and specific information has also been popularized. To offer exact information, there k3 been generalized demand of searching from the keyword of the shortened text and also of the full-text. This study is to suggest a scheme for full-text searches. It is to compare the existing scheme of information search and full-text information search with interMedia text. We suggest search methods for the full-text.

  • PDF

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central
    • /
    • v.3 no.2
    • /
    • pp.7.1-7.6
    • /
    • 2011
  • Background: Published manuscripts are the main source of biological knowledge. Since the manual examination is almost impossible due to the huge volume of literature data (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most of current text mining tools have limited applicability because of i) providing abstract-based search rather than sentence-based search, ii) improper use or lack of ontology terms, iii) the design to be used for specific subjects, or iv) slow response time that hampers web services and real time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities of fuzzy search, wildcard search, proximity search, range search, and the Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships between genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy have been included in the current version of PubMine, which is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine would serve as a key bioinformatics utility due to its rapid response to enable web services for community and to the flexibility to accommodate general ontology.

A Study on the Retrieval Effectiveness Based on Image Query Types (이미지 인지 유형 및 검색질의 방식에 따른 검색 효율성에 관한 연구)

  • Kim, Seonghee;Yi, Keunyoung
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.47 no.3
    • /
    • pp.321-342
    • /
    • 2013
  • The purpose of this study was to compare and evaluate retrieval effectiveness of three types of image perception using different retrieval methods. Image types included specific, general, and abstract topics. The retrieval method included text only search, query by example (QBE) search, and a hybrid/hybrid search. Thirty-two college students were recruited for searching topics using Google image search system. The search results were compared with One-Way and Two-Way ANOVA. As a result, text search and hybrid search showed advantage when searching for specific and general topics. On the other hand, the QBE search performed better than both the text-only and hybrid search for abstract topics. The results have implications for the implementation of image retrieval systems.

Slab Region Localization for Text Extraction using SIFT Features (문자열 검출을 위한 슬라브 영역 추정)

  • Choi, Jong-Hyun;Choi, Sung-Hoo;Yun, Jong-Pil;Koo, Keun-Hwi;Kim, Sang-Woo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1025-1034
    • /
    • 2009
  • In steel making production line, steel slabs are given a unique identification number. This identification number, Slab management number(SMN), gives information about the use of the slab. Identification of SMN has been done by humans for several years, but this is expensive and not accurate and it has been a heavy burden on the workers. Consequently, to improve efficiency, automatic recognition system is desirable. Generally, a recognition system consists of text localization, text extraction, character segmentation, and character recognition. For exact SMN identification, all the stage of the recognition system must be successful. In particular, the text localization is great important stage and difficult to process. However, because of many text-like patterns in a complex background and high fuzziness between the slab and background, directly extracting text region is difficult to process. If the slab region including SMN can be detected precisely, text localization algorithm will be able to be developed on the more simple method and the processing time of the overall recognition system will be reduced. This paper describes about the slab region localization using SIFT(Scale Invariant Feature Transform) features in the image. First, SIFT algorithm is applied the captured background and slab image, then features of two images are matched by Nearest Neighbor(NN) algorithm. However, correct matching rate can be low when two images are matched. Thus, to remove incorrect match between the features of two images, geometric locations of the matched two feature points are used. Finally, search rectangle method is performed in correct matching features, and then the top boundary and side boundaries of the slab region are determined. For this processes, we can reduce search region for extraction of SMN from the slab image. Most cases, to extract text region, search region is heuristically fixed [1][2]. However, the proposed algorithm is more analytic than other algorithms, because the search region is not fixed and the slab region is searched in the whole image. Experimental results show that the proposed algorithm has a good performance.

The Prefix Array for Multimedia Information Retrieval in the Real-Time Stenograph (실시간 속기 자막 환경에서 멀티미디어 정보 검색을 위한 Prefix Array)

  • Kim, Dong-Joo;Kim, Han-Woo
    • Proceedings of the KIEE Conference
    • /
    • 2006.10c
    • /
    • pp.521-523
    • /
    • 2006
  • This paper proposes an algorithm and its data structure to support real-time full-text search for the streamed or broadcasted multimedia data containing real-time stenograph text. Since the traditional indexing method used at information retrieval area uses the linguistic information, there is a heavy cost. Therefore, we propose the algorithm and its data structure based on suffix array, which is a simple data structure and has low space complexity. Suffix array is useful frequently to search for huge text. However, subtitle text of multimedia data is to get longer by time. Therefore, suffix array must be reconstructed because subtitle text is continually changed. We propose the data structure called prefix array and search algorithm using it.

  • PDF

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

A Study on Developing a Metadata Search System Based on the Text Structure of Korean Studies Research Articles (한국학 연구 논문의 텍스트 구조 기반 메타데이터 검색 시스템 개발 연구)

  • Song, Min-Sun;Ko, Young Man;Lee, Seung-Jun
    • Journal of the Korean Society for information Management
    • /
    • v.33 no.3
    • /
    • pp.155-176
    • /
    • 2016
  • This study aims to develope a scholarly metadata information system based on conceptual elements of text structure of Korean studies research articles and to identify the applicability of text structure based metadata as compared with the existing similar system. For the study, we constructed a database(Korean Studies Metadata Database, KMD) with text structure based on metadata of Korean Studies journal articles selected from the Korea Citation Index(KCI). Then we verified differences between KCI system and KMD system through search results using same keywords. As a result, KMD system shows the search results which meet the users' intention of searching more efficiently in comparison with the KCI system. In other words, even if keyword combinations and conditional expressions of searching execution are same, KMD system can directly present the content of research purposes, research data, and spatial-temporal contexts of research et cetera as search results through the search procedure.