Scalable Prediction Models for Airbnb Listing in Spark Big Data Cluster using GPU-accelerated RAPIDS

  • Muralidharan, Samyuktha (Department of Information Systems, California State University) ;
  • Yadav, Savita (Department of Information Systems, California State University) ;
  • Huh, Jungwoo (Department of Electrical Engineering, Yonsei University) ;
  • Lee, Sanghoon (Department of Electrical Engineering, Yonsei University) ;
  • Woo, Jongwook (Department of Information Systems, California State University)
  • Received : 2021.12.07
  • Accepted : 2021.03.29
  • Published : 2022.06.30

Abstract

We aim to build predictive models for Airbnb listing prices using GPU-accelerated RAPIDS in a big data cluster. The Airbnb Listings datasets are used for the predictive analysis, and several machine-learning algorithms are adopted to build models that predict listing prices. We compare traditional and big data approaches to machine learning for price prediction and discuss the performance of the resulting models. The big data models were built on a Databricks Spark cluster, a distributed parallel computing system. Furthermore, we implemented models on multiple GPUs using RAPIDS in the Spark cluster. The GPU-accelerated model was developed using the XGBoost algorithm, whereas the other models were developed using traditional central processing unit (CPU)-based algorithms. We compared all models in terms of accuracy metrics and computing time, and observed that the XGBoost model with RAPIDS on GPUs had the highest accuracy and the best computing time.
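The comparison described in the abstract, scoring each model on accuracy and on computing time, can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the two toy models, the toy price data, and the choice of RMSE and R² as accuracy metrics are all assumptions made for illustration, and no Spark, RAPIDS, or XGBoost calls are involved.

```python
import math
import time


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (1.0 means a perfect fit)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(name, fit_predict, x_train, y_train, x_test, y_test):
    """Time one fit-and-predict call and score its predictions."""
    start = time.perf_counter()
    y_pred = fit_predict(x_train, y_train, x_test)
    elapsed = time.perf_counter() - start
    return {
        "model": name,
        "rmse": rmse(y_test, y_pred),
        "r2": r2(y_test, y_pred),
        "seconds": elapsed,
    }


def mean_baseline(x_train, y_train, x_test):
    """Hypothetical baseline: always predict the mean training price."""
    mean_y = sum(y_train) / len(y_train)
    return [mean_y for _ in x_test]


def simple_linear(x_train, y_train, x_test):
    """Hypothetical one-feature least-squares line."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train)) / sum(
        (x - mean_x) ** 2 for x in x_train
    )
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in x_test]


# Toy data (assumption): price = 50 + 30 * bedrooms.
x_train, y_train = [1, 2, 3, 4], [80, 110, 140, 170]
x_test, y_test = [2, 5], [110, 200]

results = [
    evaluate("mean baseline", mean_baseline, x_train, y_train, x_test, y_test),
    evaluate("simple linear", simple_linear, x_train, y_train, x_test, y_test),
]
for row in results:
    print(row)
```

In a real comparison, each `fit_predict` callable would wrap one trained model (for example a CPU-based regressor or a GPU-accelerated XGBoost run), so that every model is scored by the same metrics and timed by the same clock.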

Acknowledgement

The Databricks University Alliance supported this research. We appreciate the support of Rob Reed, Program Director at the Databricks University Alliance.
