Load Shedding via Predicting the Frequency of Tuple for Efficient Analysis over Data Streams

효율적 데이터 스트림 분석을 위한 발생빈도 예측 기법을 이용한 과부하 처리

  • 장중혁 (연세대학교 소프트웨어응용연구소)
  • Published : 2006.10.31

Abstract

Data streams are now generated in many application fields, such as ubiquitous computing and sensor networks, and a variety of algorithms have been proposed to process them efficiently. Most of these algorithms focus on restricting memory usage and minimizing the processing time per data element. However, when the data elements of a stream arrive at a rapid rate within a time unit, some of them cannot be processed in real time, which degrades the performance of the algorithm. An efficient load shedding technique is therefore required to process data streams efficiently. This paper proposes a load shedding technique for data streams based on frequency prediction: the future occurrence of a data element is predicted from its frequency up to the current time, so that overload in the stream can be handled efficiently when it arises. In addition, the threshold used to decide which tuples are kept alive is adjusted adaptively to changes in the data stream, which minimizes unnecessary load shedding.
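The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the prediction formula or the threshold update rule, so everything below is an assumption made for illustration. The class name `FrequencyLoadShedder`, the parameters `capacity_per_window` and `step`, the use of relative frequency so far as the predicted frequency, and the fixed-step threshold adjustment are all hypothetical.

```python
from collections import Counter


class FrequencyLoadShedder:
    """Illustrative sketch only; not the method from the paper.

    Assumptions: tuples arrive in windows, the system can process at most
    `capacity_per_window` tuples per window, and the predicted frequency of a
    tuple is simply its relative frequency observed so far.
    """

    def __init__(self, capacity_per_window, step=0.1):
        self.capacity = capacity_per_window
        self.step = step          # hypothetical fixed adjustment step
        self.threshold = 0.0      # tuples below this predicted frequency are shed
        self.counts = Counter()   # occurrences seen so far, per tuple value
        self.total = 0            # total tuples seen so far

    def predicted_frequency(self, item):
        # Assumed predictor: relative frequency up to the current time.
        return self.counts[item] / self.total if self.total else 0.0

    def process_window(self, window):
        overloaded = len(window) > self.capacity
        # Adapt the threshold: raise it under overload, relax it otherwise.
        if overloaded:
            self.threshold = min(1.0, self.threshold + self.step)
        else:
            self.threshold = max(0.0, self.threshold - self.step)

        kept = []
        for item in window:
            # Counting is cheap, so every tuple updates the statistics,
            # including tuples that are shed.
            self.counts[item] += 1
            self.total += 1
            # Shed only when overloaded, which avoids unnecessary shedding.
            if not overloaded or self.predicted_frequency(item) >= self.threshold:
                kept.append(item)
        return kept


shedder = FrequencyLoadShedder(capacity_per_window=4, step=0.4)
print(shedder.process_window(["a", "a", "b", "a"]))            # within capacity: nothing is shed
print(shedder.process_window(["a", "b", "a", "c", "a", "a"]))  # overloaded: infrequent tuples are shed
```

In this sketch a window that fits within capacity passes through untouched, while an overloaded window raises the threshold and drops tuples whose predicted frequency falls below it. A real system would presumably derive the threshold from the measured arrival and processing rates; the fixed step is used here only to keep the example short.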
