Grid-based Index Generation and k-nearest-neighbor Join Query-processing Algorithm using MapReduce
  • Journal title : Journal of KIISE
  • Volume 42, Issue 11,  2015, pp.1303-1313
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2015.42.11.1303
 Title & Authors
Grid-based Index Generation and k-nearest-neighbor Join Query-processing Algorithm using MapReduce
Jang, Miyoung; Chang, Jae Woo;
 
 Abstract
MapReduce provides high scalability and fault tolerance for large-scale data processing. A MapReduce-based k-nearest-neighbor (k-NN) join algorithm finds, for each point in one dataset, its k nearest neighbors in another dataset, and it is considered an important operation in big data analysis. However, the existing k-NN join query-processing algorithm incurs a high index-construction cost, which makes it unsuitable for processing big data. To solve this problem, we propose a new grid-based k-NN join query-processing algorithm. Our algorithm retrieves only the data neighboring a query cell and sends them to each MapReduce task, which reduces the data-transmission and computation overhead. Our performance analysis shows that our algorithm outperforms the existing scheme by up to seven-fold in terms of query-processing time while also achieving high query-result accuracy.
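The idea of restricting each task to the data around a query cell can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the uniform 2-D grid, the fixed 3x3 cell neighborhood, the Euclidean distance, and every function and parameter name (`cell_of`, `build_grid`, `map_phase`, `reduce_phase`, `grid_knn_join`, `cell_size`) are assumptions made for illustration. Because only adjacent cells are searched, the result is approximate whenever a true neighbor lies farther away than one cell.

```python
import heapq
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Return the integer (column, row) of the grid cell containing a 2-D point."""
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))


def build_grid(points, cell_size):
    """Hypothetical grid index: map each cell to the list of points inside it."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_size)].append(p)
    return grid


def map_phase(r_points, s_grid, cell_size):
    """For each query cell, emit its R points plus the S points from the
    3x3 block of cells around it (an assumed neighborhood size)."""
    r_grid = build_grid(r_points, cell_size)
    for (cx, cy), rs in r_grid.items():
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells in the defaultdict
                candidates.extend(s_grid.get((cx + dx, cy + dy), []))
        yield (cx, cy), rs, candidates


def reduce_phase(rs, candidates, k):
    """Pick the k closest candidates for every R point of one query cell."""
    return {
        r: heapq.nsmallest(k, candidates, key=lambda s: math.dist(r, s))
        for r in rs
    }


def grid_knn_join(r_points, s_points, k, cell_size):
    """Approximate k-NN join of R against S using only neighboring cells."""
    s_grid = build_grid(s_points, cell_size)
    result = {}
    for _cell, rs, candidates in map_phase(r_points, s_grid, cell_size):
        result.update(reduce_phase(rs, candidates, k))
    return result


if __name__ == "__main__":
    R = [(0.5, 0.5), (5.5, 5.5)]
    S = [(0.6, 0.5), (1.2, 0.5), (5.0, 5.0), (9.0, 9.0)]
    print(grid_knn_join(R, S, k=1, cell_size=1.0))
```

In this sketch the point (9.0, 9.0) is never shipped to any task, which is the kind of saving in transmitted data that the abstract describes. The choice of `cell_size` trades accuracy against the number of candidates each task must examine.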
 Keywords
distributed-data processing algorithm; MapReduce; k-NN join query-processing algorithm; grid index
 Language
Korean