An Automatic and Scalable Application Crawler for Large-Scale Mobile Internet Content Retrieval

  • Huang, Mingyi (Department of Computer Science and Technology, Tsinghua University)
  • Lyu, Yongqiang (Department of Computer Science and Technology, Tsinghua University)
  • Yin, Hao (Department of Computer Science and Technology, Tsinghua University)
  • Received : 2018.02.28
  • Accepted : 2018.05.31
  • Published : 2018.10.31

Abstract

The mobile internet has become ubiquitous worldwide with the widespread use of smart devices. However, the designs of modern mobile operating systems and their applications limit content retrieval from mobile applications. The mobile internet is less accessible than the traditional web: it has more man-made restrictions and lacks a unified approach to crawling and content retrieval. In this study, we propose an automatic and scalable mobile application content crawler that recognizes the interaction paths of mobile applications, represents them as interaction graphs, and automatically collects content according to the graphs in parallel. The crawler was verified by retrieving content from 50 non-game applications from the Google Play Store on the Android platform. The experiment showed the efficiency and scalability potential of our crawler for large-scale mobile internet content retrieval.
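
The general idea of an interaction graph that drives parallel collection can be illustrated with a minimal Python sketch. This is not the authors' implementation: the graph contents, the screen and action names, and the `collect` stand-in are all hypothetical, and a real crawler would replay each path on a device or emulator instead of returning a string.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical interaction graph: each node is an app screen, each edge is a
# (UI action, target screen) pair. All names are illustrative assumptions.
INTERACTION_GRAPH = {
    "home": [("tap_news", "news_list"), ("tap_profile", "profile")],
    "news_list": [("tap_item", "article")],
    "profile": [],
    "article": [],
}


def interaction_paths(graph, start):
    """Breadth-first search: shortest action sequence from start to each screen."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, target in graph.get(screen, []):
            if target not in paths:  # first visit gives the shortest path
                paths[target] = paths[screen] + [action]
                queue.append(target)
    return paths


def collect(screen, path):
    """Stand-in for replaying `path` on a device and saving the screen content."""
    return screen, " > ".join(path) or "(launch)"


def crawl(graph, start, workers=4):
    """Collect every reachable screen, running the collection tasks in parallel."""
    paths = interaction_paths(graph, start)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: collect(*item), paths.items())
        return dict(results)


if __name__ == "__main__":
    for screen, path in crawl(INTERACTION_GRAPH, "home").items():
        print(f"{screen}: {path}")
```

Because each path is independent once the graph is known, the collection tasks can be distributed across workers; a thread pool is used here only to keep the sketch short.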
