Design and Implementation of a Stereoscopic Image Control System based on User Hand Gesture Recognition

  • Received : 2022.01.05
  • Accepted : 2022.01.24
  • Published : 2022.03.31

Abstract

User interaction for video media is being developed in many forms, and interaction based on human gestures is an especially active research area. Hand gesture recognition in particular, built on a 3D hand model, is used as a human interface in the field of realistic media. An interface based on hand gesture recognition lets users access media more easily and conveniently. To be useful for viewing images, such interaction needs hand gesture recognition that is fast and accurate and that works regardless of the user's computer environment. This paper proposes a fast and accurate user hand gesture recognition algorithm that combines the open-source MediaPipe framework with the k-NN (k-nearest neighbor) machine learning method. In addition, to minimize constraints imposed by the computer environment, it designs and implements a stereoscopic image control system based on user hand gesture recognition, using a web service environment accessible over the Internet and Docker containers as the virtual environment.
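
The recognition step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the feature layout, the gesture set, or the value of k, so all of these are assumptions. The sketch relies only on the fact that MediaPipe Hands reports 21 three-dimensional landmarks per hand. The function toy_hand is a hypothetical stand-in that generates synthetic landmarks in place of a camera and detector, and the labels "open" and "fist" are placeholders.

```python
import math
from collections import Counter


def normalize(landmarks):
    """Flatten 21 (x, y, z) landmarks into a position- and scale-invariant vector.

    Assumption: the wrist is landmark 0, as in MediaPipe Hands.
    """
    wx, wy, wz = landmarks[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Scale by the farthest landmark from the wrist; fall back to 1.0 if degenerate.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted) or 1.0
    return [c / scale for point in shifted for c in point]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def toy_hand(curl, offset=(0.0, 0.0, 0.0), size=1.0):
    """Hypothetical synthetic hand: 1 wrist point plus 5 fingers x 4 joints.

    curl ranges from 0.0 (open hand) to 1.0 (closed fist).
    """
    ox, oy, oz = offset
    points = [(ox, oy, oz)]  # wrist
    for finger in range(5):
        angle = math.radians(50 + 20 * finger)
        for joint in range(1, 5):
            # Curling pulls the outer joints back toward the palm.
            reach = size * joint * (1.0 - curl * (joint - 1) / 4) / 4
            points.append((ox + reach * math.cos(angle),
                           oy + reach * math.sin(angle),
                           oz))
    return points


# Labeled examples (placeholder gesture names, synthetic data).
train = [(normalize(toy_hand(c)), "open") for c in (0.0, 0.1, 0.2)]
train += [(normalize(toy_hand(c)), "fist") for c in (0.8, 0.9, 1.0)]

# Queries at a different position and size than the training hands.
open_query = normalize(toy_hand(0.05, offset=(0.3, 0.2, 0.0), size=2.0))
fist_query = normalize(toy_hand(0.9, offset=(-0.1, 0.4, 0.0), size=0.5))

print(knn_predict(train, open_query))
print(knn_predict(train, fist_query))
```

Normalizing relative to the wrist means the classifier compares hand shape rather than where the hand appears in the frame or how large it is, which is one plausible way to keep a small k-NN training set usable across cameras. In a real pipeline the landmark lists would come from the hand tracker rather than from toy_hand.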

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [22ZH1200, The research of the basic media·contents technologies].
