A Methodology for Realty Time-series Generation Using Generative Adversarial Network

(Original Korean title: 적대적 생성망을 이용한 부동산 시계열 데이터 생성 방안)

  • Received : 2021.07.07
  • Accepted : 2021.10.20
  • Published : 2021.10.28

Abstract

With advances in big data analysis, artificial intelligence, and machine learning, data analytics technology has developed to support optimal decision-making. In some domains, however, a lack of data limits the use of these techniques. Real estate is one example: many of its data series either began to be published only recently or, because real estate is an illiquid asset, are released on a long cycle. To overcome this limitation, we studied whether existing time series can be augmented with the TimeGAN model. We collected a total of 45 weekly real estate time series covering 2012 to 2021 and, considering the correlations among them, selected 15 final series. When these 15 series were augmented with TimeGAN, PCA and t-SNE visualizations showed that the statistical distributions of the real and augmented data were similar.

We hope that this paper will encourage a variety of experiments aimed at overcoming the limitations of overfitting or underfitting.
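The abstract says that 15 of the 45 candidate series were chosen "by considering the correlation between the time series" but does not state the exact rule. The sketch below shows one plausible reading, offered as an assumption rather than the authors' procedure: a greedy filter that keeps a series only if its absolute Pearson correlation with every already-kept series stays below a threshold. The function names `pearson` and `select_series`, the 0.9 threshold, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_series(series, threshold=0.9):
    """Greedy correlation filter (an assumed rule, not taken from the paper).

    series: dict mapping a series name to a list of floats.
    Insertion order sets priority: earlier series are kept first.
    A series is kept only if |correlation| with every kept series
    is below the threshold.
    """
    kept = []
    for name, values in series.items():
        if all(abs(pearson(values, series[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy weekly data, invented for illustration only.
weeks = list(range(12))
toy = {
    "price_index": [100 + 0.5 * t for t in weeks],
    # An affine copy of price_index, so it is perfectly correlated with it.
    "price_index_scaled": [2 * (100 + 0.5 * t) + 3 for t in weeks],
    "transactions": [50, 47, 55, 49, 60, 44, 52, 58, 46, 51, 57, 45],
}

selected = select_series(toy, threshold=0.9)
print(selected)
```

With this rule the redundant `price_index_scaled` series is dropped and the other two are kept. A greedy filter depends on the order in which series are visited, so a real study would need to justify that ordering or use a different selection criterion.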
