• Title/Summary/Keyword: Similar Sequence Matching

Search Result 47, Processing Time 0.029 seconds

Hybrid Lower-Dimensional Transformation for Similar Sequence Matching (유사 시퀀스 매칭을 위한 하이브리드 저차원 변환)

  • Moon, Yang-Sae;Kim, Jin-Ho
    • The KIPS Transactions:PartD
    • /
    • v.15D no.1
    • /
    • pp.31-40
    • /
    • 2008
  • We generally use lower-dimensional transformations to convert high-dimensional sequences into low-dimensional points in similar sequence matching. These traditional transformations, however, show different characteristics in indexing performance by the type of time-series data. It means that the selection of lower-dimensional transformations makes a significant influence on the indexing performance in similar sequence matching. To solve this problem, in this paper we propose a hybrid approach that integrates multiple transformations and uses them in a single multidimensional index. We first propose a new notion of hybrid lower-dimensional transformation that exploits different lower-dimensional transformations for a sequence. We next define the hybrid distance to compute the distance between the transformed sequences. We then formally prove that the hybrid approach performs the similar sequence matching correctly. We also present the index building and the similar sequence matching algorithms that use the hybrid approach. Experimental results for various time-series data sets show that our hybrid approach outperforms the single transformation-based approach. These results indicate that the hybrid approach can be widely used for various time-series data with different characteristics.

Efficient Stream Sequence Matching Algorithms for Handheld Devices over Time-Series Stream Data (시계열 스트림 데이터 상에서 핸드헬드 디바이스를 위한 효율적인 스트림 시퀀스 매칭 알고리즘)

  • Moon Yang-Sae;Loh Woong-Kee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.8B
    • /
    • pp.736-744
    • /
    • 2006
  • For the handhold devices, minimizing repetitive CPU operations such as multiplications is a major factor for their performances. In this paper, we propose efficient algorithms for finding similar sequences from streaming time-series data such as stock prices, network traffic data, and sensor network data. First, we formally define the problem of similar subsequence matching from streaming time-series data, which is called the stream sequence matching in this paper. Second, based on the window construction mechanism adopted by the previous subsequence matching algorithms, we present an efficient window-based approach that minimizes CPU operations required for stream sequence matching. Third, we propose a notion of window MBR and present two stream sequence matching algorithms based on the notion. Fourth, we formally prove correctness of the proposed algorithms. Finally, through a series of analyses and experiments, we show that our algorithms significantly outperform the naive algorithm. We believe that our window-based algorithms are excellent choices for embedded stream sequence matching in handhold devices.

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.11
    • /
    • pp.507-520
    • /
    • 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set grouping multiple data items. In the method, the similarity of two sets is represented as the size of intersection between them. However, there is a critical performances issue for the method in twofold: 1) calculating intersection size is a time consuming process, and 2) the number of set pairs that should be calculated the intersection size is quite large. In this paper, we propose an index-based search method for improving performance of set-based similar sequence matching in order to solve these performance issues. Our method consists of two parts. In the first part, we convert the set similarity problem into the intersection size comparison problem, and then, provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method which exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by 30 to 50 times then the existing methods. We also show that the proposed method has scalability since the performance gap becomes larger as the number of data sequences increases.

An Efficient Video Sequence Matching Algorithm (효율적인 비디오 시퀀스 정합 알고리즘)

  • 김상현;박래홍
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.5
    • /
    • pp.45-52
    • /
    • 2004
  • According tothe development of digital media technologies various algorithms for video sequence matching have been proposed to match the video sequences efficiently. A large number of video sequence matching methods have focused on frame-wise query, whereas a relatively few algorithms have been presented for video sequence matching or video shot matching. In this paper, we propose an efficientalgorithm to index the video sequences and to retrieve the sequences for video sequence query. To improve the accuracy and performance of video sequence matching, we employ the Cauchy function as a similarity measure between histograms of consecutive frames, which yields a high performance compared with conventional measures. The key frames extracted from segmented video shots can be used not only for video shot clustering but also for video sequence matching or browsing, where the key frame is defined by the frame that is significantly different from the previous fames. Several key frame extraction algorithms have been proposed, in which similar methods used for shot boundary detection were employed with proper similarity measures. In this paper, we propose the efficient algorithm to extract key frames using the cumulative Cauchy function measure and. compare its performance with that of conventional algorithms. Video sequence matching can be performed by evaluating the similarity between data sets of key frames. To improve the matching efficiency with the set of extracted key frames we employ the Cauchy function and the modified Hausdorff distance. Experimental results with several color video sequences show that the proposed method yields the high matching performance and accuracy with a low computational load compared with conventional algorithms.

Automatic Vowel Sequence Reproduction for a Talking Robot Based on PARCOR Coefficient Template Matching

  • Vo, Nhu Thanh;Sawada, Hideyuki
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.3
    • /
    • pp.215-221
    • /
    • 2016
  • This paper describes an automatic vowel sequence reproduction system for a talking robot built to reproduce the human voice based on the working behavior of the human articulatory system. A sound analysis system is developed to record a sentence spoken by a human (mainly vowel sequences in the Japanese language) and to then analyze that sentence to give the correct command packet so the talking robot can repeat it. An algorithm based on a short-time energy method is developed to separate and count sound phonemes. A matching template using partial correlation coefficients (PARCOR) is applied to detect a voice in the talking robot's database similar to the spoken voice. Combining the sound separation and counting the result with the detection of vowels in human speech, the talking robot can reproduce a vowel sequence similar to the one spoken by the human. Two tests to verify the working behavior of the robot are performed. The results of the tests indicate that the robot can repeat a sequence of vowels spoken by a human with an average success rate of more than 60%.

Physical Database Design for DFT-Based Multidimensional Indexes in Time-Series Databases (시계열 데이터베이스에서 DFT-기반 다차원 인덱스를 위한 물리적 데이터베이스 설계)

  • Kim, Sang-Wook;Kim, Jin-Ho;Han, Byung-ll
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.11
    • /
    • pp.1505-1514
    • /
    • 2004
  • Sequence matching in time-series databases is an operation that finds the data sequences whose changing patterns are similar to that of a query sequence. Typically, sequence matching hires a multi-dimensional index for its efficient processing. In order to alleviate the dimensionality curse problem of the multi-dimensional index in high-dimensional cases, the previous methods for sequence matching apply the Discrete Fourier Transform(DFT) to data sequences, and take only the first two or three DFT coefficients as organizing attributes of the multi-dimensional index. This paper first points out the problems in such simple methods taking the firs two or three coefficients, and proposes a novel solution to construct the optimal multi -dimensional index. The proposed method analyzes the characteristics of a target database, and identifies the organizing attributes having the best discrimination power based on the analysis. It also determines the optimal number of organizing attributes for efficient sequence matching by using a cost model. To show the effectiveness of the proposed method, we perform a series of experiments. The results show that the Proposed method outperforms the previous ones significantly.

  • PDF

An Adaption of Pattern Sequence-based Electricity Load Forecasting with Match Filtering

  • Chu, Fazheng;Jung, Sung-Hwan
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.5
    • /
    • pp.800-807
    • /
    • 2017
  • The Pattern Sequence-based Forecasting (PSF) is an approach to forecast the behavior of time series based on similar pattern sequences. The innovation of PSF method is to convert the load time series into a label sequence by clustering technique in order to lighten computational burden. However, it brings about a new problem in determining the number of clusters and it is subject to insufficient similar days occasionally. In this paper we proposed an adaption of the PSF method, which introduces a new clustering index to determine the number of clusters and imposes a threshold to solve the problem caused by insufficient similar days. Our experiments showed that the proposed method reduced the mean absolute percentage error (MAPE) about 15%, compared to the PSF method.

A Subsequence Matching Technique that Supports Time Warping Efficiently (타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법)

  • Park, Sang-Hyun;Kim, Sang-Wook;Cho, June-Suh;Lee, Hoen-Gil
    • Journal of Industrial Technology
    • /
    • v.21 no.A
    • /
    • pp.167-179
    • /
    • 2001
  • This paper discusses an index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, from data sequences. For filtering at feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance as well as satisfies the triangular inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multi-dimensional index using a feature vector as indexing attributes. For query precessing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even with a large volume of a database. We also prove that our approach does not incur false dismissal. To verily the superiority of our method, we perform extensive experiments. The results reseal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.

  • PDF

NBR-Safe Transform: Lower-Dimensional Transformation of High-Dimensional MBRs in Similar Sequence Matching (MBR-Safe 변환 : 유사 시퀀스 매칭에서 고차원 MBR의 저차원 변환)

  • Moon, Yang-Sae
    • Journal of KIISE:Databases
    • /
    • v.33 no.7
    • /
    • pp.693-707
    • /
    • 2006
  • To improve performance using a multidimensional index in similar sequence matching, we transform a high-dimensional sequence to a low-dimensional sequence, and then construct a low-dimensional MBR that contains multiple transformed sequences. In this paper we propose a formal method that transforms a high-dimensional MBR itself to a low-dimensional MBR, and show that this method significantly reduces the number of lower-dimensional transformations. To achieve this goal, we first formally define the new notion of MBR-safe. We say that a transform is MBR-safe if a low-dimensional MBR to which a high-dimensional MBR is transformed by the transform contains every individual low-dimensional sequence to which a high-dimensional sequence is transformed. We then propose two MBR-safe transforms based on DFT and DCT, the most representative lower-dimensional transformations. For this, we prove the traditional DFT and DCT are not MBR-safe, and define new transforms, called mbrDFT and mbrDCT, by extending DFT and DCT, respectively. We also formally prove these mbrDFT and mbrDCT are MBR-safe. Moreover, we show that mbrDFT(or mbrDCT) is optimal among the DFT-based(or DCT-based) MBR-safe transforms that directly convert a high-dimensional MBR itself into a low-dimensional MBR. Analytical and experimental results show that the proposed mbrDFT and mbrDCT reduce the number of lower-dimensional transformations drastically, and improve performance significantly compared with the $na\"{\i}ve$ transforms. These results indicate that our MBR- safe transforms provides a useful framework for a variety of applications that require the lower-dimensional transformation of high-dimensional MBRs.

Fast Self-Similar Network Traffic Generation Based on FGN and Daubechies Wavelets (FGN과 Daubechies Wavelets을 이용한 빠른 Self-Similar 네트워크 Traffic의 생성)

  • Jeong, Hae-Duck;Lee, Jong-Suk
    • The KIPS Transactions:PartC
    • /
    • v.11C no.5
    • /
    • pp.621-632
    • /
    • 2004
  • Recent measurement studies of real teletraffic data in modern telecommunication networks have shown that self-similar (or fractal) processes may provide better models of teletraffic in modern telecommunication networks than Poisson processes. If this is not taken into account, it can lead to inaccurate conclusions about performance of telecommunication networks. Thus, an important requirement for conducting simulation studies of telecommunication networks is the ability to generate long synthetic stochastic self-similar sequences. A new generator of pseu-do-random self-similar sequences, based on the fractional Gaussian nois and a wavelet transform, is proposed and analysed in this paper. Specifically, this generator uses Daubechies wavelets. The motivation behind this selection of wavelets is that Daubechies wavelets lead to more accurate results by better matching the self-similar structure of long range dependent processes, than other types of wavelets. The statistical accuracy and time required to produce sequences of a given (long) length are experimentally studied. This generator shows a high level of accuracy of the output data (in the sense of the Hurst parameter) and is fast. Its theoretical algorithmic complexity is 0(n).