A review on the t-distributed stochastic neighbor embedding

  • Kipoong Kim (Department of Statistics, Seoul National University)
  • Choongrak Kim (Department of Statistics, Pusan National University)
  • Received: 2022.12.05
  • Revised: 2022.12.15
  • Published: 2023.04.30

Abstract

This paper reviews several methods for visualizing high-dimensional data in a low-dimensional space. First, principal component analysis and multidimensional scaling are briefly introduced as linear approaches; then kernel principal component analysis, the self-organizing map, locally linear embedding, Isomap, Laplacian eigenmaps, and local multidimensional scaling are introduced as nonlinear approaches. In particular, t-SNE, which is widely used but relatively unfamiliar in the field of statistics, is described in more detail. We also present a simple example for several methods, including t-SNE. Finally, we review several recent studies pointing out the limitations of t-SNE and discuss the open research problems they present.
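As a minimal sketch of the kind of comparison the abstract describes, the snippet below embeds scikit-learn's bundled handwritten-digits data into two dimensions with both PCA (linear) and t-SNE (nonlinear). It is an illustration, not code from the paper; the dataset choice and parameter values are assumptions. The PCA initialization follows the recommendation of Kobak and Linderman (2021) for preserving global structure.

```python
# Illustrative only: compare a linear (PCA) and a nonlinear (t-SNE)
# 2-D embedding of the same high-dimensional data. Requires scikit-learn.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Linear reduction: project onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear reduction: t-SNE with PCA initialization, which helps
# preserve global structure (Kobak and Linderman, 2021).
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # both (1797, 2)
```

Plotting the two embeddings colored by the digit label `y` typically shows t-SNE separating the ten digit classes into much more distinct clusters than PCA, at the cost of distorting between-cluster distances.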


Acknowledgments

This research was supported by a two-year research grant from Pusan National University.

References

  1. Amid E and Warmuth MK (2019). TriMap: Large-Scale dimensionality reduction using triplets, Available from: arXiv:1910.00204
  2. Belkin M and Niyogi P (2003). Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15, 1373-1396. https://doi.org/10.1162/089976603321780317
  3. Chen L and Buja A (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing and proximity analysis, Journal of the American Statistical Association, 104, 209-219. https://doi.org/10.1198/jasa.2009.0111
  4. Kobak D and Berens P (2019). The art of using t-SNE for single-cell transcriptomics, Nature Communications, 10, 5416.
  5. Kobak D and Linderman GC (2021). Initialization is critical for preserving global data structure in both t-SNE and UMAP, Nature Biotechnology, 39, 156-157. https://doi.org/10.1038/s41587-020-00809-z
  6. Kohonen T (1990). The self-organizing map, Proceedings of the IEEE, 78, 1464-1479. https://doi.org/10.1109/5.58325
  7. McInnes L, Healy J, and Melville J (2018). UMAP: Uniform manifold approximation and projection for dimension reduction, Available from: arXiv:1802.03426
  8. Roweis ST and Saul LK (2000). Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326. https://doi.org/10.1126/science.290.5500.2323
  9. Schölkopf B, Smola A, and Müller KR (1999). Kernel principal component analysis. In Schölkopf B, Burges C, and Smola A (Eds), Advances in Kernel Methods - Support Vector Learning (pp. 327-352), MIT Press, Cambridge.
  10. Tenenbaum JB, de Silva V, and Langford JC (2000). A global geometric framework for nonlinear dimensionality reduction, Science, 290, 2319-2323. https://doi.org/10.1126/science.290.5500.2319
  11. van der Maaten L and Hinton G (2008). Visualizing data using t-SNE, Journal of Machine Learning Research, 9, 2579-2605.
  12. Wang Y, Huang H, Rudin C, and Shaposhnik Y (2021). Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization, Journal of Machine Learning Research, 22, 1-73.