A New Approach to Web Data Mining Based on Cloud Computing

  • Received : 2014.05.12
  • Accepted : 2014.08.18
  • Published : 2014.12.30

Abstract

Web data mining aims to discover useful knowledge from a variety of Web resources. Companies, organizations, and individuals increasingly gather information through Web data mining and use it to their advantage. Cloud computing is often used as a synonym for distributed computing over a network: it relies on sharing resources to achieve coherence and economies of scale, much like a utility, and it makes it possible to run a program or application on many connected computers at the same time. In this paper, we propose a new system framework, built on the Hadoop platform, for collecting useful information from Web resources. The framework is based on the Map/Reduce programming model of cloud computing. We also propose a new data mining algorithm for use within this framework. Finally, we demonstrate the feasibility of the approach through a simulation experiment.
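
For readers unfamiliar with the Map/Reduce programming model named in the abstract, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce), counting terms across a handful of Web page texts. It is an illustration only and is not the algorithm or framework proposed in the paper. The function names (`map_page`, `shuffle`, `reduce_term`, `run_job`) and the sample pages are hypothetical, and no Hadoop API is used; on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_page(url: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (term, 1) pair for each whitespace-separated term."""
    for term in text.lower().split():
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine the values of one key into a single total."""
    return term, sum(counts)


def run_job(pages: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for url, text in pages.items() for pair in map_page(url, text))
    grouped = shuffle(pairs)
    return dict(reduce_term(term, counts) for term, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical page texts standing in for crawled Web resources.
    sample_pages = {
        "page-a": "cloud data cloud",
        "page-b": "data mining",
    }
    print(run_job(sample_pages))
```

Because each map call depends only on its own page and each reduce call only on its own key, the work can be split across machines without coordination beyond the shuffle step, which is the property the abstract relies on when it refers to running a program on many connected computers at once.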
