Search | Korea Science

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

Dongryun Yoon;Hyeonjoong Cho
- The Transactions of the Korea Information Processing Society, v.13 no.4, pp.166-173, 2024
This study explores lip-sync detection, which checks whether lip movements and voices in a video are synchronized. Typical lip-sync detection techniques crop the facial area of a given video and use the lower half of the cropped box as input to a visual encoder that extracts visual features. To place more emphasis on the articulatory region of the lips and so detect lip sync more accurately, we propose using a pre-trained, visual-attention-based encoder. As the visual encoder we employ the Visual Transformer Pooling (VTP) module, originally designed for lip reading, the task of predicting the spoken script from visual information alone, without audio. Our experimental results show that, despite having fewer trainable parameters, the proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% with five context frames. Moreover, on Acappella, a dataset not used for training, our approach outperforms VocaList in lip-sync detection accuracy by approximately 8%.
https://doi.org/10.3745/TKIPS.2024.13.4.166
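The preprocessing this abstract describes (crop the face, keep the lower half of the box as visual-encoder input, and judge sync over short runs of context frames) can be sketched as follows. This is a minimal illustration under stated assumptions: the `(x, y, w, h)` box format, the helper names `lower_half_box`, `crop` and `context_windows`, and the stride of one frame are hypothetical choices, not taken from the paper, and the VTP encoder itself is not shown.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x, y, width, height), origin at the top-left, y pointing down.
Box = Tuple[int, int, int, int]


def lower_half_box(face: Box) -> Box:
    """Return the lower half of a face bounding box (the mouth-bearing region)."""
    x, y, w, h = face
    top = y + h // 2                 # start halfway down the face box
    return (x, top, w, h - h // 2)   # keep the remaining rows, including any odd row


def crop(frame: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Crop a 2-D grayscale frame (a list of pixel rows) to the given box."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]


def context_windows(num_frames: int, context: int = 5) -> List[range]:
    """Frame indices of every run of `context` consecutive frames (stride 1, assumed)."""
    return [range(i, i + context) for i in range(num_frames - context + 1)]


if __name__ == "__main__":
    print(lower_half_box((10, 20, 64, 64)))   # (10, 52, 64, 32)
    print(len(context_windows(25)))           # 21 windows of 5 frames
```

Each window of five lower-face crops would then be passed to the visual encoder together with the matching audio segment.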

Real-time Lip Region Detection for Lipreading in Mobile Device (모바일 장치에서의 립리딩을 위한 실시간 입술 영역 검출)

Kim, Young-Un;Kang, Sun-Kyung;Jung, Sung-Tae
- Journal of the Korea Society of Computer and Information, v.14 no.4, pp.39-46, 2009
Many lip region detection methods have been developed for the PC environment, but the existing methods are difficult to run in real time on resource-limited mobile devices. To solve this problem, this paper proposes a real-time lip region detection method for lipreading on mobile devices. It first detects the face region using adaptive face color information, and then detects the lip region using the geometric relationship between the eyes and the lips. The proposed method was implemented on a smartphone with an Intel PXA270 embedded processor and 386MB of memory. Experimental results show that the proposed method runs at 9.5 frames/sec with a correct detection rate of 98.8% on 574 images.
https://doi.org/10.9708/jksci.2009.14.4.039
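Locating the lips from the "geometrical relation between eyes and lips" can be sketched as a box placed below the eye midpoint and scaled by the inter-eye distance. This is a minimal sketch under loud assumptions: the paper does not give its ratios, so `down_ratio`, `width_ratio` and `height_ratio` are hypothetical placeholders, and the function name `lip_box_from_eyes` is ours. It also assumes an upright face (no head roll).

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def lip_box_from_eyes(left_eye: Point, right_eye: Point,
                      down_ratio: float = 1.0, width_ratio: float = 1.0,
                      height_ratio: float = 0.5) -> Tuple[float, float, float, float]:
    """Estimate a lip bounding box (x, y, w, h) from the two eye centres.

    All three ratios are HYPOTHETICAL placeholders, not values from the paper:
    the mouth centre is assumed to lie down_ratio * d below the eye midpoint,
    where d is the inter-eye distance.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5   # inter-eye distance
    mx, my = (lx + rx) / 2, (ly + ry) / 2          # eye midpoint
    cx, cy = mx, my + down_ratio * d               # estimated mouth centre
    w, h = width_ratio * d, height_ratio * d       # box size scales with d
    return (cx - w / 2, cy - h / 2, w, h)


print(lip_box_from_eyes((40, 50), (80, 50)))  # (40.0, 80.0, 40.0, 20.0)
```

Scaling by the inter-eye distance makes the estimate independent of how far the face is from the camera, which is what keeps this step cheap enough for an embedded processor.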

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
- Journal of the Korean Institute of Intelligent Systems, v.19 no.4, pp.478-484, 2009
Automatic speech recognition (ASR) is an attractive technology in today's pursuit of a more convenient life. Although many approaches to ASR have been proposed, its performance is still poor in noisy environments. State-of-the-art speech recognition therefore uses not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in a mobile environment. To apply visual information to speech recognition, the lip region must be extracted accurately. Because eyes are easier to detect than lips, we first detect the positions of the left and right eyes and then roughly locate the lip region. We then apply K-means clustering to divide that region into groups, and detect the two lip corners and the lip center by choosing the biggest of the clustered groups. Finally, we demonstrate the effectiveness of the proposed method through experiments on the Samsung AVSR database.
https://doi.org/10.5391/JKIIS.2009.19.4.478
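The clustering step (K-means over the rough lip region, then corners and center from the biggest cluster) can be sketched in plain Python. This is a minimal sketch, assuming each pixel carries a single scalar feature (for example a redness score) and that the corners are the leftmost and rightmost pixels of the largest cluster; the feature choice, the evenly spread initialisation and the names `kmeans_1d` and `lip_points` are hypothetical, not the paper's.

```python
from typing import Dict, List, Tuple

# (x, y, feature); the scalar feature (e.g. a redness score) is an ASSUMPTION.
Pixel = Tuple[int, int, float]


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[int]:
    """Plain 1-D K-means; returns a cluster label for each value."""
    lo, hi = min(values), max(values)
    # Deterministic init (assumed): spread the centroids evenly over the value range.
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to its cluster mean (kept if the cluster empties).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def lip_points(pixels: List[Pixel], k: int = 3):
    """Return (left_corner, right_corner, centre) taken from the largest cluster."""
    labels = kmeans_1d([p[2] for p in pixels], k)
    groups: Dict[int, List[Pixel]] = {}
    for p, lab in zip(pixels, labels):
        groups.setdefault(lab, []).append(p)
    biggest = max(groups.values(), key=len)
    left = min(biggest, key=lambda p: p[0])    # leftmost pixel -> left corner
    right = max(biggest, key=lambda p: p[0])   # rightmost pixel -> right corner
    cx = sum(p[0] for p in biggest) / len(biggest)
    cy = sum(p[1] for p in biggest) / len(biggest)
    return (left[0], left[1]), (right[0], right[1]), (cx, cy)
```

Real pixel features would usually be multi-dimensional (for example colour components); the 1-D version keeps the assignment and update steps easy to follow.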

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
- Journal of Digital Convergence, v.14 no.8, pp.233-243, 2016
Because speech recognition performs poorly in noisy environments, Audio-Visual Speech Recognition (AVSR) systems, which combine speech and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in these systems. This study aims to improve the recognition rate of uttered words using lip shape detection alone, for an efficient AVSR system. After preprocessing to detect the lip region, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. Utterance period detection achieves a 91% success rate, higher than that of general threshold methods. In lip reading recognition, the user-dependent experiment achieves an 88.5% recognition rate and the user-independent experiment 80.2%, both improvements over previous studies.
https://doi.org/10.14400/JDC.2016.14.8.233
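Recognition with HMMs typically trains one model per word and picks the word whose model assigns the observed feature sequence the highest likelihood. The sketch below shows that decision rule with the forward algorithm. It assumes discrete observation symbols (for example vector-quantised lip-shape features), which is a simplification. The paper's CNN features would more likely use continuous emission densities. The class and function names are hypothetical.

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """HMM with discrete emissions (a simplifying ASSUMPTION; see the lead-in)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        self.start = start   # start[i]    = P(state i at t = 0)
        self.trans = trans   # trans[i][j] = P(state j at t+1 | state i at t)
        self.emit = emit     # emit[i][o]  = P(symbol o | state i)

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: P(obs | model), summed over all state paths."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][o]
                     for j in range(n)]
        return sum(alpha)


def recognise(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Isolated-word decision: the word whose model best explains the sequence."""
    return max(models, key=lambda word: models[word].likelihood(obs))
```

For sequences longer than a few dozen frames the raw probabilities underflow, so a practical version would scale `alpha` at each step or work in log space.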

A Lip Detection Algorithm Using Color Clustering (색상 군집화를 이용한 입술탐지 알고리즘)

Jeong, Jongmyeon
- Journal of the Korea Society of Computer and Information, v.19 no.3, pp.37-43, 2014
In this paper, we propose a robust lip detection algorithm using color clustering. First, we apply the AdaBoost algorithm to extract the facial region and convert it into the Lab color space. Because the a and b components of the Lab color space are known to express lip color and its complementary color well, we use the a and b components as the features for color clustering. Nearest-neighbour clustering is applied to separate the skin region from the facial region, and K-means color clustering is applied to extract the lip-candidate region. Geometric characteristics are then used to extract the final lip region. Experimental results show that the proposed algorithm detects the lip region robustly.
https://doi.org/10.9708/jksci.2014.19.3.037
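The clustering features here are the a and b components of CIE Lab. The sketch below converts an 8-bit RGB pixel to Lab. It assumes sRGB input with a D65 white point, which the abstract does not specify. The function names are ours. In practice a library routine such as OpenCV's `cvtColor` would normally be used instead.

```python
from typing import Tuple

# sRGB -> XYZ matrix and D65 reference white (ASSUMED colour space; not stated in the paper).
_M = ((0.4124564, 0.3575761, 0.1804375),
      (0.2126729, 0.7151522, 0.0721750),
      (0.0193339, 0.1191920, 0.9503041))
_WHITE = (0.95047, 1.0, 1.08883)
_DELTA = 6 / 29


def _linearise(c: int) -> float:
    """Undo the sRGB transfer curve for an 8-bit channel value."""
    v = c / 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIE Lab companding function (cube root with a linear toe near zero)."""
    return t ** (1 / 3) if t > _DELTA ** 3 else t / (3 * _DELTA ** 2) + 4 / 29


def rgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to CIE Lab (L*, a*, b*)."""
    rgb = [_linearise(v) for v in (r, g, b)]
    x, y, z = (sum(m * v for m, v in zip(row, rgb)) for row in _M)
    fx, fy, fz = (_f(v / n) for v, n in zip((x, y, z), _WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def ab_features(r: int, g: int, b: int) -> Tuple[float, float]:
    """The (a, b) pair used as the colour-clustering feature; lightness is dropped."""
    _, a, bb = rgb_to_lab(r, g, b)
    return a, bb
```

Dropping L* makes the clustering feature less sensitive to overall brightness, which fits the abstract's choice of a and b alone.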

Lip Detection using Color Distribution and Support Vector Machine for Visual Feature Extraction of Bimodal Speech Recognition System (바이모달 음성인식기의 시각 특징 추출을 위한 색상 분석자 SVM을 이용한 입술 위치 검출)

정지년;양현승
- Journal of KIISE:Software and Applications, v.31 no.4, pp.403-410, 2004
Bimodal speech recognition systems have been proposed to improve the recognition rate of ASR in noisy environments. Visual feature extraction is crucial to developing these systems, and extracting visual features requires detecting the exact lip position. This paper proposes a method that detects the lip position using a color similarity model and an SVM. The face/lip color distribution is learned and used to find the initial lip position; the exact lip position is then detected by scanning the neighboring area with the SVM. Experiments show that this method detects the lip position accurately and quickly.
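The refinement step ("scanning neighbor area with SVM") can be sketched as a local window search that keeps the window with the highest SVM decision value. This is a minimal sketch under assumptions: it uses a linear SVM decision function `w . x + b` with weights assumed to be already trained. It also uses raw window pixels as features. The names `refine_lip_position`, `window_features` and `radius` are hypothetical. The paper's features and kernel are not given in the abstract.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def linear_svm_score(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """Decision value of a linear SVM, f(x) = w . x + b (w and b ASSUMED pre-trained)."""
    return sum(wi * xi for wi, xi in zip(w, features)) + b


def window_features(img: Image, x: int, y: int, size: int) -> List[float]:
    """Flatten the size x size window whose top-left corner is (x, y); raw pixels assumed."""
    return [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]


def refine_lip_position(img: Image, init: Tuple[int, int], size: int, radius: int,
                        w: Sequence[float], b: float) -> Tuple[int, int]:
    """Scan windows within `radius` pixels of the initial guess; return the best one."""
    x0, y0 = init
    best, best_pos = float("-inf"), init
    # Clamp the scan so every window stays inside the image.
    for y in range(max(0, y0 - radius), min(len(img) - size, y0 + radius) + 1):
        for x in range(max(0, x0 - radius), min(len(img[0]) - size, x0 + radius) + 1):
            s = linear_svm_score(window_features(img, x, y, size), w, b)
            if s > best:
                best, best_pos = s, (x, y)
    return best_pos
```

Restricting the scan to a neighbourhood of the colour-based guess is what keeps the SVM stage fast: only `(2 * radius + 1) ** 2` windows are scored instead of every position in the face.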

A Study on Enhancing the Performance of Detecting Lip Feature Points for Facial Expression Recognition Based on AAM (AAM 기반 얼굴 표정 인식을 위한 입술 특징점 검출 성능 향상 연구)

Han, Eun-Jung;Kang, Byung-Jun;Park, Kang-Ryoung
- The KIPS Transactions:PartB, v.16B no.4, pp.299-308, 2009
AAM (Active Appearance Model) is an algorithm that extracts facial feature points using statistical models of shape and texture information based on PCA (Principal Component Analysis). It is widely used for face recognition, face modeling and expression recognition. However, the detection performance of AAM is sensitive to the initial value, and its detection error increases when the input image differs considerably from the training data. In particular, the algorithm is highly accurate for closed lips, but its detection error increases for open lips and for lips deformed by the user's facial expression. To solve these problems, we propose an improved AAM algorithm that uses lip feature points extracted by a new lip detection algorithm. We first select a search region based on the facial feature points detected by AAM. Lip corner points are then extracted in the selected search region using Canny edge detection and histogram projection. Next, the lip region is accurately detected by combining the color and edge information of the lips in a search region adjusted to the positions of the detected lip corners. This improves both the accuracy and the processing speed of lip detection. Experimental results show that the RMS (Root Mean Square) error of the proposed method was 4.21 pixels lower than that of the AAM algorithm alone.
https://doi.org/10.3745/KIPSTB.2009.16-B.4.299
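Histogram projection on an edge map, together with the RMS error metric used to report the result, can be sketched as follows. This is a minimal sketch, assuming a binary edge map (for example the output of a Canny detector) cropped to the search region. It takes the corners to be the first and last columns whose edge count reaches a threshold. It places each corner at the mean row of that column's edge pixels. That corner rule and the names `vertical_projection`, `lip_corners` and `rms_error` are our assumptions, not the paper's exact procedure.

```python
from typing import List, Optional, Sequence, Tuple

EdgeMap = Sequence[Sequence[int]]  # binary edge map, 1 = edge pixel (e.g. from Canny)
Point = Tuple[float, float]


def vertical_projection(edges: EdgeMap) -> List[int]:
    """Column-wise histogram projection: number of edge pixels in each column."""
    return [sum(row[c] for row in edges) for c in range(len(edges[0]))]


def lip_corners(edges: EdgeMap, min_count: int = 1) -> Optional[Tuple[Point, Point]]:
    """Left/right corners from the outermost columns whose projection >= min_count."""
    threshold = max(1, min_count)   # a column needs at least one edge pixel
    cols = [c for c, n in enumerate(vertical_projection(edges)) if n >= threshold]
    if not cols:
        return None

    def corner(c: int) -> Point:
        rows = [r for r, row in enumerate(edges) if row[c]]
        return (float(c), sum(rows) / len(rows))   # mean row of edges in that column

    return corner(cols[0]), corner(cols[-1])


def rms_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Root-mean-square Euclidean distance between matched point lists."""
    sq = [(px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, truth)]
    return (sum(sq) / len(sq)) ** 0.5
```

Raising `min_count` makes the corner search ignore isolated edge pixels, at the cost of pulling the corners slightly inward.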

Real Time Lip Reading System Implementation in Embedded Environment (임베디드 환경에서의 실시간 립리딩 시스템 구현)

Kim, Young-Un;Kang, Sun-Kyung;Jung, Sung-Tae
- The KIPS Transactions:PartB, v.17B no.3, pp.227-232, 2010
This paper proposes a real-time lip reading method for the embedded environment. An embedded environment has more limited resources than the PC environment, so it is hard to run a lip reading system designed for PCs in real time on embedded hardware. To solve this problem, this paper proposes methods for lip region detection, lip feature extraction, and spoken-word recognition suited to the embedded environment. To find the accurate lip region, it first detects the face region using face color information, then finds the positions of both eyes in the detected face region and uses their geometric relationship to detect the exact lip region. To extract features robust to lighting variations caused by changing surroundings, histogram matching, lip folding, and a RASTA filter were applied, and features extracted using principal component analysis (PCA) were used for recognition. In tests in an embedded environment with an 806 MHz CPU and 128 MB of RAM, processing took between 1.15 and 2.35 sec depending on the utterance, and the recognition rate was 77% (139 of 180 words were recognized).
https://doi.org/10.3745/KIPSTB.2010.17B.3.227
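The PCA feature-extraction step can be sketched without a linear-algebra library by finding the leading principal axes with power iteration and deflation, then projecting each sample onto them. This is a minimal illustration, assuming each row is one flattened, preprocessed lip image. The function names and the iteration count are hypothetical. Power iteration also assumes the starting vector is not orthogonal to the wanted axis. On an embedded target one would more likely compute the axes offline and only run `project` on the device.

```python
from typing import List, Sequence

Vector = List[float]


def _mean(rows: Sequence[Sequence[float]]) -> Vector:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Sample covariance matrix of the rows (each row = one flattened lip image)."""
    mu = _mean(rows)
    centred = [[x - m for x, m in zip(r, mu)] for r in rows]
    d, n = len(mu), len(rows)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(rows: Sequence[Sequence[float]], k: int, iters: int = 200) -> List[Vector]:
    """Leading k principal axes by power iteration with deflation."""
    cov = covariance(rows)
    d = len(cov)
    comps: List[Vector] = []
    for _ in range(k):
        v = [1.0 / d ** 0.5] * d                     # unit-length starting vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the found axis.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(rows: Sequence[Sequence[float]], comps: List[Vector]) -> List[Vector]:
    """PCA features: centred rows projected onto the principal axes."""
    mu = _mean(rows)
    return [[sum((x - m) * c for x, m, c in zip(r, mu, comp)) for comp in comps]
            for r in rows]
```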

Upper lip tie wrapping into the hard palate and anterior premaxilla causing alveolar hypoplasia

Heo, Woong;Ahn, Hee Chang
- Archives of Craniofacial Surgery, v.19 no.1, pp.48-50, 2018
Bony anomalies caused by lip tie have rarely been reported. This was a case of upper lip tie wrapping into the anterior premaxilla. We present a case of severe upper lip tie with limited lip motion, an upper lip curling inward, and alveolar hypoplasia. The male patient was born on June 3, 2016. He had a deep philtral sulcus, a low vermilion border and a deep Cupid's bow of the upper lip due to the tension of a short, stout and very tight frenulum. His upper lip motion was severely restricted, particularly lip eversion. There was anterior alveolar hypoplasia with a deep sulcus in the anterior maxilla. Resection of the frenulum cord with Z-plasty was performed at the anterior premaxilla and the upper lip sulcus. The frenulum was tightly attached to the gingiva, passing through the gum into the hard palate. The frenulum cord was about 1 cm wide and about 3 cm long. After surgery he gained an upper lip contour including a Cupid's bow and a normal vermilion border. This case is a severe upper lip tie in a child showing premaxillary hypoplasia and abnormal lip motion and contour. Although feeding is only mildly limited in children with upper lip tie, early detection and treatment are needed to correct bony growth.
https://doi.org/10.7181/acfs.2018.19.1.48

Segmentation of the Lip Region by Color Gamut Compression and Feature Projection (색역 압축과 특징치 투영을 이용한 입술영역 분할)

Kim, Jeong Yeop
- Journal of Korea Multimedia Society, v.21 no.11, pp.1279-1287, 2018
In this paper, a new color coordinate conversion from RGB, a modified CIEXYZ, is proposed to compress the color gamut. The proposed segmentation uses principal component analysis to optimally project the feature vector onto a one-dimensional feature. The final step of lip segmentation applies Otsu's threshold to the resulting two-class problem. The proposed method performed better than conventional methods, especially for the chromatic feature.
https://doi.org/10.9717/kmms.2018.21.11.1279
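The final step, Otsu's threshold on the one-dimensional projected feature, can be sketched as follows. Otsu's method picks the threshold that maximises the between-class variance of the two resulting groups. This is a minimal sketch, assuming the continuous 1-D feature is first quantised to 256 levels. The names `quantise` and `otsu_threshold` are ours. The colour conversion and PCA projection that produce the feature are not shown.

```python
from typing import List, Sequence


def quantise(xs: Sequence[float], levels: int = 256) -> List[int]:
    """Map a continuous 1-D feature linearly onto integer levels 0..levels-1 (assumed step)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    return [int(round((x - lo) / (hi - lo) * (levels - 1))) for x in xs]


def otsu_threshold(values: Sequence[int], levels: int = 256) -> int:
    """Return t maximising between-class variance; class 0 is values <= t.

    Values must be integers in 0..levels-1.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]              # pixels in class 0 (<= t)
        sum0 += t * hist[t]
        w1 = total - w0            # pixels in class 1 (> t)
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Pixels whose quantised feature exceeds the returned threshold would form one class (lip or skin, depending on the sign of the projection axis) and the rest the other.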
