A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data

Multi-Profile Ensemble Technique for Implementing Big Data-Based Recommender Systems (Korean title)

  • Received: 2015.11.25
  • Accepted: 2015.12.14
  • Published: 2015.12.30


A recommender system recommends products that customers are likely to be interested in. Various recommender systems have been developed on the basis of automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in many domains, such as recommending Web pages, books, movies, music and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors liked most. It therefore works properly only when customers have rated a sufficient number of products in common. When customer ratings are scarce, the neighborhood that CF forms is inaccurate, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies can now collect more data and use a wider variety of large-scale information. Many companies therefore regard the use of big data as very important, because it helps them improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data allow the preferences and behaviors of users to be identified more accurately. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to capture the preferences and behaviors of users from various viewpoints. We derive five user profiles from personal information: ratings, site preferences, demographics, Internet usage, and topics in text.
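The neighborhood-forming step of CF described above, and why it fails when ratings are sparse, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes user-based CF with Pearson correlation computed over co-rated items, and the names `pearson` and `neighbors`, the `min_overlap` threshold and the toy ratings are all hypothetical.

```python
from math import sqrt
from typing import Optional

# Hypothetical layout: {user: {item: rating}}.
Ratings = dict[str, dict[str, float]]


def pearson(r: Ratings, u: str, v: str, min_overlap: int = 2) -> Optional[float]:
    """Pearson correlation between users u and v over their co-rated items.

    Returns None when the users share fewer than `min_overlap` rated items:
    with so little overlap the similarity cannot be trusted, which is the
    sparse-rating problem of CF.
    """
    common = r[u].keys() & r[v].keys()
    if len(common) < min_overlap:
        return None
    mu = sum(r[u][i] for i in common) / len(common)
    mv = sum(r[v][i] for i in common) / len(common)
    num = sum((r[u][i] - mu) * (r[v][i] - mv) for i in common)
    den = sqrt(sum((r[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((r[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def neighbors(r: Ratings, target: str, k: int = 2) -> list[tuple[str, float]]:
    """The k users most similar to `target`; users with too little overlap are skipped."""
    scored = []
    for other in r:
        if other == target:
            continue
        s = pearson(r, target, other)
        if s is not None:
            scored.append((other, s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy ratings (hypothetical): dave shares no rated item with alice.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5},
    "carol": {"a": 1, "b": 5},
    "dave": {"d": 4},
}
# dave is skipped (no co-rated items), so only bob and carol are returned.
print(neighbors(ratings, "alice"))
```

With only a handful of ratings per user, many user pairs fall below the overlap threshold or get a similarity estimated from one or two items, so the neighborhood is either empty or unreliable; this is the gap that additional, non-rating profiles are meant to fill.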
Next, the similarity between users is calculated from these profiles, and each user's neighbors are identified from the results. The similarity is calculated using one of three ensemble approaches: the similarity of the combined profile, the average of the per-profile similarities, or the weighted average of the per-profile similarities. Finally, the products that the neighbors prefer most are recommended to the target user. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R was used to implement the proposed recommender system, and SAS E-miner was used to conduct the topic analysis based on keyword search. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross-validation was also conducted to enhance the reliability of the experiments. The F1 metric, a widely used combined metric that gives equal weight to recall and precision, was employed for evaluation. The evaluation results show that the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using the weighted average similarity showed the highest performance: F1 improved by 16.9 percent with the weighted average similarity and by 8.1 percent with the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
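The three ensemble similarity options, the neighbor-based recommendation step and the F1 measure can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: every profile is assumed to be a numeric vector compared with cosine similarity, only two of the five profile types are shown, and the user IDs, vectors, profile weights, function names and the parameters `k` and `n` are hypothetical.

```python
from collections import Counter
from collections.abc import Callable
from math import sqrt

# Hypothetical layout: profile name -> numeric feature vector for one user.
Profiles = dict[str, list[float]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sim_combined(u: Profiles, v: Profiles) -> float:
    """Approach 1: concatenate all profiles into one vector, then compare once."""
    keys = sorted(u)
    return cosine([x for k in keys for x in u[k]], [x for k in keys for x in v[k]])


def sim_average(u: Profiles, v: Profiles) -> float:
    """Approach 2: unweighted mean of the per-profile similarities."""
    return sum(cosine(u[k], v[k]) for k in u) / len(u)


def sim_weighted(u: Profiles, v: Profiles, w: dict[str, float]) -> float:
    """Approach 3: weighted mean of the per-profile similarities."""
    return sum(w[k] * cosine(u[k], v[k]) for k in u) / sum(w[k] for k in u)


def recommend(target: str, users: dict[str, Profiles], liked: dict[str, set[str]],
              sim: Callable[[Profiles, Profiles], float], k: int = 2, n: int = 2) -> list[str]:
    """Top-n items liked most often by the k most similar users, excluding
    items the target already likes."""
    others = sorted((o for o in users if o != target),
                    key=lambda o: sim(users[target], users[o]), reverse=True)
    # Iterate items in sorted order so ties in the counts break deterministically.
    counts = Counter(i for o in others[:k] for i in sorted(liked[o])
                     if i not in liked[target])
    return [item for item, _ in counts.most_common(n)]


def f1(recommended: list[str], relevant: set[str]) -> float:
    """F1: harmonic mean of precision and recall for one user's top-n list."""
    hits = len(set(recommended) & relevant)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(recommended), hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two of the five profile types, four users.
users = {
    "u1": {"rating": [5, 0, 3], "demographic": [1, 0]},
    "u2": {"rating": [4, 0, 4], "demographic": [1, 0]},
    "u3": {"rating": [0, 5, 0], "demographic": [0, 1]},
    "u4": {"rating": [5, 1, 2], "demographic": [0, 1]},
}
liked = {"u1": {"a"}, "u2": {"a", "b", "c"}, "u3": {"d"}, "u4": {"b", "e"}}
weights = {"rating": 0.7, "demographic": 0.3}  # arbitrary; would need tuning

for name, sim in [("combined", sim_combined), ("average", sim_average),
                  ("weighted", lambda a, b: sim_weighted(a, b, weights))]:
    recs = recommend("u1", users, liked, sim)
    print(name, recs, f1(recs, {"b", "e"}))
```

The combined-profile approach compares one long vector, so its result can be dominated by whichever profile has the most features or the largest values; the averaging approaches score each profile separately first, and the weighted variant adds per-profile weights whose values (arbitrary here) would have to be tuned on training data.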
However, our methodology requires further study before it can be applied in the real world. We need to apply the proposed method to different recommendation algorithms, compare the resulting differences in recommendation accuracy, and identify which combination performs best.


Big Data; Recommender System; Collaborative Filtering; Multimodal Profile; Ensemble Methodology

