An Analysis of the Heavy-Tailed Distribution of Files in Web Servers Using the TTT Plot Technique

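  • 정성무 (Korea Education and Research Information Service) ;
  • 이상용 (Department of Industrial Engineering, Graduate School, Ajou University) ;
  • 장중순 (Division of Industrial and Information Systems Engineering, Ajou University) ;
  • 송재신 (Korea Education and Research Information Service) ;
  • 유해영 (Division of Information and Computer Science, Dankook University) ;
  • 최경희 (Division of Information and Computer Engineering, Ajou University)
  • Published: 2003.08.01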

Abstract

This paper proposes a method, based on the total time on test (TTT) plot, for determining whether the sizes of files served by a web server follow a heavy-tailed distribution. The TTT plot is a technique from reliability engineering that judges whether a sample follows an exponential distribution by the linearity of its plotted TTT statistics. We evaluated the proposed method with simulations and with data from web servers in actual operation. The results show that it identifies heavy-tailedness more accurately than the existing Hill estimation and LLCD plot methods, and that it is also more efficient at reaching a decision. In particular, the proposed method can reduce the decision errors that the existing methods may make when classifying web server file-size distributions or, in statistics, when testing for a Pareto distribution.
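The scaled TTT statistic that the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two deviation summaries, the sample sizes, and the shifted Pareto (Lomax) test sample are all hypothetical choices made here, and the abstract does not give the paper's actual decision rule. For ordered samples x_(1) ≤ ... ≤ x_(n) with x_(0) = 0, the sketch uses the standard definition T_i = Σ_{j≤i} (n − j + 1)(x_(j) − x_(j−1)) and plots (i/n, T_i/T_n).

```python
import random


def scaled_ttt(samples):
    """Return scaled TTT plot points (i/n, T_i/T_n) for non-negative samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0 or xs[0] < 0 or xs[-1] == 0:
        raise ValueError("samples must be non-negative with a positive maximum")
    totals = []
    running = 0.0
    previous = 0.0
    for j, x in enumerate(xs, start=1):
        # (n - j + 1) items are still "on test" between consecutive order statistics.
        running += (n - j + 1) * (x - previous)
        totals.append(running)
        previous = x
    grand_total = totals[-1]  # equals the sum of all samples
    return [(i / n, t / grand_total) for i, t in enumerate(totals, start=1)]


def max_abs_deviation(points):
    """Largest vertical distance between the TTT points and the diagonal."""
    return max(abs(u - p) for p, u in points)


def mean_signed_deviation(points):
    """Average of (TTT value - diagonal); negative means the curve sags below."""
    return sum(u - p for p, u in points) / len(points)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    exponential = [rng.expovariate(1.0) for _ in range(5000)]
    # Hypothetical heavy-tailed sample: Pareto shifted to start at zero (Lomax).
    heavy = [rng.paretovariate(1.2) - 1.0 for _ in range(5000)]
    for name, data in (("exponential", exponential), ("heavy-tailed", heavy)):
        pts = scaled_ttt(data)
        print(name, max_abs_deviation(pts), mean_signed_deviation(pts))
```

For an exponential sample the points should stay close to the diagonal, so both summaries stay near zero. For a distribution with a decreasing failure rate, such as the shifted Pareto sample above, the curve bends below the diagonal and the signed deviation becomes clearly negative. Any threshold for calling a sample heavy-tailed would still have to be chosen separately.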
