• Title/Summary/Keyword: locality sensitive hash

Search Result 3, Processing Time 0.021 seconds

Improving the Lifetime of NAND Flash-based Storages by Min-hash Assisted Delta Compression Engine (MADE (Minhash-Assisted Delta Compression Engine) : 델타 압축 기반의 낸드 플래시 저장장치 내구성 향상 기법)

  • Kwon, Hyoukjun;Kim, Dohyun;Park, Jisung;Kim, Jihong
    • Journal of KIISE
    • /
    • v.42 no.9
    • /
    • pp.1078-1089
    • /
    • 2015
  • In this paper, we propose the Min-hash Assisted Delta-compression Engine(MADE) to improve the lifetime of NAND flash-based storages at the device level. MADE effectively reduces the write traffic to NAND flash through the use of a novel delta compression scheme. The delta compression performance was optimized by introducing min-hash based LSH(Locality Sensitive Hash) and efficiently combining it with our delta compression method. We also developed a delta encoding technique that has functionality equivalent to deduplication and lossless compression. The results of our experiment show that MADE reduces the amount of data written on NAND flash by up to 90%, which is better than a simple combination of deduplication and lossless compression schemes by 12% on average.

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.11
    • /
    • pp.681-688
    • /
    • 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large scale data sets. Thus, (Key, Value)-based distributed framework, MapReduce, is gaining increasingly widespread use in Locality Sensitive Hashing which is efficient for high-dimension and sparse data. Based on the two-stage strategy, we engage the locality sensitive hashing technique to divide users into small subsets, and then calculate similarity between pairs in the small subsets using a brute force method on MapReduce. Specifically, generating a candidate group stage is important since brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we proposed an efficient algorithm for approximate k-NN graph construction by regrouping candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.

Object Classification based on Weakly Supervised E2LSH and Saliency map Weighting

  • Zhao, Yongwei;Li, Bicheng;Liu, Xin;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.1
    • /
    • pp.364-380
    • /
    • 2016
  • The most popular approach in object classification is based on the bag of visual-words model, which has several fundamental problems that restricting the performance of this method, such as low time efficiency, the synonym and polysemy of visual words, and the lack of spatial information between visual words. In view of this, an object classification based on weakly supervised E2LSH and saliency map weighting is proposed. Firstly, E2LSH (Exact Euclidean Locality Sensitive Hashing) is employed to generate a group of weakly randomized visual dictionary by clustering SIFT features of the training dataset, and the selecting process of hash functions is effectively supervised inspired by the random forest ideas to reduce the randomcity of E2LSH. Secondly, graph-based visual saliency (GBVS) algorithm is applied to detect the saliency map of different images and weight the visual words according to the saliency prior. Finally, saliency map weighted visual language model is carried out to accomplish object classification. Experimental results datasets of Pascal 2007 and Caltech-256 indicate that the distinguishability of objects is effectively improved and our method is superior to the state-of-the-art object classification methods.