Anomaly Detection of Hadoop Log Data Using Moving Average and 3-Sigma
Son, Siwoon; Gil, Myeong-Seon; Moon, Yang-Sae; Won, Hee-Sun
In recent years, there have been many research efforts on Big Data, and many companies have developed a variety of related products. As a result, we can now store and analyze large volumes of log data that were difficult to handle in traditional computing environments. To handle the large volumes of log data generated rapidly on multiple servers, in this paper we design a new data storage architecture for efficiently analyzing such big log data through Apache Hive. We then design and implement anomaly detection methods, based on moving average and 3-sigma techniques, that identify abnormal server status from the log data. We also show the effectiveness of the proposed methods by demonstrating that they identify anomalies correctly. These results indicate that the proposed approach is effective for detecting anomalies in Hadoop log data.
Big Data; Apache Hadoop; Apache Hive; Log Data; Anomaly Detection
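The abstract does not give the paper's exact formulas, window size, or which log metric is monitored, so the following is only a minimal sketch of how a moving average can be combined with the 3-sigma rule. It assumes a per-interval numeric series already extracted from the logs (for example, error counts per minute) and a trailing window of `w` previous values. A point x_t is flagged when |x_t - mu_t| > k * sigma_t, where mu_t and sigma_t are the mean and population standard deviation of the previous `w` values and k = 3. The function name `detect_anomalies`, the default window of 10, and the sample data are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def detect_anomalies(values: Sequence[float], window: int = 10, k: float = 3.0) -> List[int]:
    """Hypothetical helper: return indices whose value falls outside
    moving average +/- k * sigma of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2")
    anomalies: List[int] = []
    # The first `window` points have no full history, so they are never flagged.
    for i in range(window, len(values)):
        past = values[i - window:i]   # trailing window, excludes the current point
        mu = fmean(past)              # simple moving average
        sigma = pstdev(past)          # population standard deviation of the window
        # 3-sigma rule (k = 3 by default): flag points far from the moving average.
        if abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical per-minute error counts with one spike at index 10.
    counts = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 40, 6, 5]
    print(detect_anomalies(counts, window=10))  # [10]
```

Because the trailing window still contains a detected spike, one large anomaly widens the band for the next `window` points; a variant could exclude already flagged points from the statistics.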
References
J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. Byers, "Big Data: The Next Frontier for Innovation, Competition, and Productivity," Technical Report, McKinsey Global Institute, 2011.

T. Rabl, M. Sadoghi, H.-A. Jacobsen, S. Gomez-Villamor, V. Muntes-Mulero, and S. Mankovskii, "Solving Big Data Challenges for Enterprise Application Performance Management," in Proc. of the VLDB Endowment, Vol.5, No.12, pp.1724-1735, Aug., 2012.

M. Saecker and V. Markl, "Big Data Analytics on Modern Hardware Architectures: A Technology Survey," Springer Lecture Notes in Business Information Processing, Vol.138, pp.125-149, 2013.

Hadoop [Internet],

C. Lam and J. Warren, "Hadoop in Action," Manning Publications, 2010.

T. White, "Hadoop: The Definitive Guide," O'Reilly Media, Yahoo! Press, June, 2009.

HDFS [Internet],

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. of the 26th IEEE Symp. on Mass Storage Systems and Technologies(MSST), Lake Tahoe, Nevada, pp.1-10, May, 2010.

D. Borthakur, "The Hadoop Distributed File System: Architecture and Design," Technical Report, pp.1-14, 2007.

J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, Vol.51, No.1, pp.107-113, Jan., 2008.

J. Dean and S. Ghemawat, "MapReduce: a Flexible Data Processing Tool," Communications of the ACM, Vol.53, No.1, pp.72-77, Jan., 2010.

S. Lee, J. Kim, Y.-S. Moon, and W.-K. Loh, "Iceberg Cube Parallel Computation using MapReduce," Korea Computer Congress, Vol.37, No.1(A), pp.25-26, June, 2010.

H. Lee, M. Kim, H. Lee, and H. Yoon, "Design and Implementation of an Analysis module based on MapReduce for Large-scalable Social Data," Korea Computer Congress, Vol.38, No.1(B), pp.357-360, June, 2011.

G. Kim, G. Nam, and U. Kim, "Analysis and Statistics of Domestic Dam Based on MapReduce," Korean Society for Internet Information, pp.131-132, Nov., 2013.

D.-S. Choi, G.-J. Mun, Y.-M. Kim, and B.-N. Noh, "An Analysis of Large-Scale Security Log using MapReduce," Korean Institute of Information Technology, Vol.9, No.8, pp.125-132, Aug., 2011.

Hive [Internet],

A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, "Hive: a Warehousing Solution over a Map-Reduce Framework," in Proc. of the VLDB Endowment, Vol.2, No.2, Aug., 2009.

J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, "Hive - a petabyte scale data warehouse using Hadoop," in Proc. of the 26th IEEE International Conference on Data Engineering, pp.996-1005, Mar., 2010.

Y.-S. Moon and J. Kim, "Efficient Moving Average Transform-Based Subsequence Matching Algorithms in Time-Series Databases," Information Sciences, Vol.177, No.23, pp.5415-5431, Dec., 2007.

J. M. Lucas and M. S. Saccucci, "Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements," Technometrics, Vol.32, No.1, 1990.

J. S. Hunter, "The Exponentially Weighted Moving Average," Journal of Quality Technology, Vol.18, No.4, Oct., 1986.

W. W. S. Wei, "Time Series Analysis: Univariate and Multivariate Methods," Addison-Wesley, 2005.

F. Pukelsheim, "The Three Sigma Rule," The American Statistician, Vol.48, No.2, pp.88-91, 1994.

H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, "LoOP: Local Outlier Probabilities," in Proc. of the 18th ACM Conference on Information and Knowledge Management, pp.1649-1652, Nov., 2009.

Ganglia Monitoring System [Internet],