Study on the Predictability of Personal Information from Online Service Usage Patterns

  • Youngran Kim (College of Business Administration, Inha University)
  • Wonchang Hur (College of Business Administration, Inha University)
  • Received: 2025.02.07
  • Reviewed: 2025.03.16
  • Published: 2025.05.31

Abstract

This study compares how well various demographic attributes and personality traits of users can be predicted from survey data on their online service usage behavior. Usage patterns for OTT and TV, smartphone apps, digital content, shopping, smart devices, and social media served as predictors, and four decision-tree-based ensemble methods were used to analyze the contribution of each predictor to each prediction target. The results show that marital status, age, gender, and higher-education status were highly predictable, followed by employment status and income. In contrast, political orientation, religious affiliation, and personality traits were relatively hard to predict. The contribution of each predictor varied across models, but in most models OTT and TV usage characteristics contributed substantially to the prediction of several types of personal information. By comprehensively analyzing the predictability of personal information and the factors that predict it, this study provides knowledge useful for both the effective protection and the effective use of personal information.
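To make the idea of per-predictor contribution concrete, the following is a minimal sketch and not the authors' procedure. The abstract does not name the four ensemble methods or the importance measure, so this example assumes a tiny bagged ensemble of decision stumps and a grouped permutation importance (the accuracy lost when all columns of one predictor group are shuffled together). The data are synthetic, and every name (`GROUPS`, `fit_stump`, `group_importance`, and so on) is hypothetical.

```python
import random

# Hypothetical predictor groups: column indices of a toy survey matrix.
GROUPS = {"ott_tv": [0, 1], "shopping": [2, 3]}

def fit_stump(X, y):
    """Return the (feature, threshold, left_label) split with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left in (0, 1):
                errors = sum((left if row[j] <= t else 1 - left) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left)
    return best[1:]

def fit_bagged(X, y, n_trees, rng):
    """Fit one stump per bootstrap resample of the rows (bagging)."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_one(stumps, row):
    """Majority vote over all stumps in the ensemble."""
    votes = sum(left if row[j] <= t else 1 - left for j, t, left in stumps)
    return 1 if 2 * votes > len(stumps) else 0

def accuracy(stumps, X, y):
    return sum(predict_one(stumps, r) == t for r, t in zip(X, y)) / len(y)

def group_importance(stumps, X, y, columns, rng):
    """Accuracy drop after jointly shuffling one group's columns across rows."""
    base = accuracy(stumps, X, y)
    perm = list(range(len(X)))
    rng.shuffle(perm)
    shuffled = [list(row) for row in X]
    for i, src in enumerate(perm):
        for c in columns:
            shuffled[i][c] = X[src][c]
    return base - accuracy(stumps, shuffled, y)

# Synthetic data (an assumption): the binary target depends only on column 0,
# which belongs to the "ott_tv" group; the "shopping" columns are pure noise.
rng = random.Random(0)
X = [[i % 10, (i // 10) % 10, rng.randrange(10), rng.randrange(10)]
     for i in range(200)]
y = [1 if row[0] >= 5 else 0 for row in X]

stumps = fit_bagged(X, y, n_trees=15, rng=rng)
base_accuracy = accuracy(stumps, X, y)
importances = {name: group_importance(stumps, X, y, cols, rng)
               for name, cols in GROUPS.items()}

print("accuracy:", base_accuracy)
for name, value in importances.items():
    print(name, "importance:", round(value, 3))
```

In this toy setup the group that actually drives the target loses accuracy when shuffled, while the noise group does not. A real analysis would use held-out data and full tree ensembles rather than stumps evaluated on the training rows.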

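Funding Information

This work was supported by Inha University and by the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) in 2022 (RS-2022-NR070854).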
