An Efficient Bulk Loading for High Dimensional Index Structures

고차원 색인 구조를 위한 효율적인 벌크 로딩

  • Bok, Kyoung-Soo (Dept. of Information Communication Engineering, Graduate School of Chungbuk National University) ;
  • Lee, Seok-Hee (Dept. of Internet Broadcast, Dongah Broadcasting College) ;
  • Cho, Ki-Hyung (Dept. of Electrical Electronic Engineering, Chungbuk National University) ;
  • Yoo, Jae-Soo (Dept. of Information Communication Engineering, Chungbuk National University)
  • 복경수 (충북대학교 대학원 정보통신공학과) ;
  • 이석희 (동아방송대학 인터넷방송과) ;
  • 조기형 (충북대학교 전기전자공학부) ;
  • 유재수 (충북대학교 정보통신공학과)
  • Published : 2000.08.01

Abstract

Existing bulk loading algorithms for multi-dimensional index structures fail to achieve both short index construction time and good retrieval performance. In this paper, we propose an efficient bulk loading algorithm that constructs high-dimensional index structures for large data sets and overcomes this problem. Although several bulk loading algorithms have been proposed for this purpose, none of them improves both construction time and search performance. To reduce construction time, we do not sort the whole data set; instead, we use a bisection algorithm that divides the whole data set, or a subset of it, into two partitions according to a specific pivot value. We also improve search performance by selecting split positions according to the distribution properties of the data set. Through various experiments, we show that the proposed algorithm is superior to existing algorithms in terms of both construction time and search performance.

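Existing bulk loading algorithms for multi-dimensional index structures have the problem that they cannot improve both index construction time and search performance. This paper proposes a new bulk loading algorithm for index structures over large volumes of high-dimensional data that resolves this problem. To shorten index construction time, the proposed algorithm does not sort the entire data set; instead, it identifies the characteristics of the data and partitions it according to a pivot value. To improve search performance, it selects split positions according to the distribution characteristics of the data. Experiments show that the proposed algorithm outperforms existing algorithms in terms of index construction time and search performance.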
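
The abstract does not give the details of the bisection step, so the following Python sketch is only an illustration of the general idea: partition around a pivot value instead of sorting, and recurse until each partition fits in a leaf. The function names, the choice of split dimension (largest variance), the pivot (the mean along that dimension), and the `leaf_capacity` parameter are all assumptions made for this example and are not taken from the paper.

```python
from statistics import pvariance


def choose_split(points):
    """Assumed heuristic: split on the dimension with the largest
    variance, using the mean along that dimension as the pivot."""
    dims = len(points[0])
    dim = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    pivot = sum(p[dim] for p in points) / len(points)
    return dim, pivot


def partition(points, dim, pivot):
    """Divide points into two partitions by comparing one coordinate
    with the pivot. This is a single linear pass; nothing is sorted."""
    left = [p for p in points if p[dim] < pivot]
    right = [p for p in points if p[dim] >= pivot]
    return left, right


def bulk_load(points, leaf_capacity):
    """Recursively bisect the data set until every partition fits in
    one leaf. Returns the list of leaves (each a list of points)."""
    if len(points) <= leaf_capacity:
        return [points]
    dim, pivot = choose_split(points)
    left, right = partition(points, dim, pivot)
    if not left or not right:
        # Degenerate case (for example, all points identical):
        # fall back to cutting the list in half by position.
        mid = len(points) // 2
        left, right = points[:mid], points[mid:]
    return bulk_load(left, leaf_capacity) + bulk_load(right, leaf_capacity)
```

Calling `bulk_load(points, leaf_capacity)` on a list of equal-length coordinate tuples yields leaf-sized groups that a real loader would then pack into index pages and build upper levels over; that part is omitted here. Because the pivot follows the data distribution rather than a fixed position, the resulting leaves may hold different numbers of points.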
