• Title/Abstract/Keyword: Multimodal fusion


Speech-Oriented Multimodal Usage Pattern Analysis for TV Guide Application Scenarios

  • 김지영;이경님;홍기형
    • 대한음성학회지:말소리 / No. 58 / pp.101-117 / 2006
  • The development of efficient multimodal interfaces and fusion algorithms requires knowledge of usage patterns that show how people use multiple modalities. We analyzed multimodal usage patterns for TV-guide application scenarios (or tasks). To collect usage patterns, we implemented a multimodal usage pattern collection system with two input modalities: speech and touch-gesture. Fifty-four subjects participated in our study. Analysis of the collected usage patterns shows a positive correlation between the task type and the multimodal usage patterns. In addition, we analyzed the timing between speech utterances and their corresponding touch-gestures, that is, the time interval at which the touch-gesture occurs relative to the duration of the speech utterance. We believe that, to develop efficient multimodal fusion algorithms for an application, a multimodal usage pattern analysis for that application, similar to our work on the TV guide application, has to be done in advance.
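
A minimal sketch of the kind of timing analysis described above, computing where a touch-gesture falls relative to its paired speech utterance. The field names and log structure are hypothetical; the paper's collection system is not specified here.

```python
# Minimal sketch: relative timing of a touch-gesture within its paired speech utterance.
# Field names (speech_start, speech_end, gesture_time) are hypothetical assumptions.

def relative_gesture_timing(speech_start: float, speech_end: float, gesture_time: float) -> float:
    """Return the gesture onset as a fraction of the utterance duration.

    0.0 = gesture at utterance start, 1.0 = at utterance end,
    values outside [0, 1] = gesture before/after the utterance.
    """
    duration = speech_end - speech_start
    if duration <= 0:
        raise ValueError("utterance must have positive duration")
    return (gesture_time - speech_start) / duration

# Example: aggregate timing over a set of logged interactions.
logs = [
    {"speech_start": 0.0, "speech_end": 1.8, "gesture_time": 1.2},
    {"speech_start": 5.0, "speech_end": 6.5, "gesture_time": 6.9},
]
ratios = [relative_gesture_timing(x["speech_start"], x["speech_end"], x["gesture_time"]) for x in logs]
print(sum(ratios) / len(ratios))  # mean relative gesture onset
```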


Multimodal Biometric Using a Hierarchical Fusion of a Person's Face, Voice, and Online Signature

  • Elmir, Youssef;Elberrichi, Zakaria;Adjoudj, Reda
    • Journal of Information Processing Systems / Vol. 10, No. 4 / pp.555-567 / 2014
  • Biometric performance improvement is a challenging task. In this paper, a hierarchical fusion strategy for a multimodal biometric system is presented. This strategy relies on a combination of several biometric traits using a multi-level biometric fusion hierarchy. The multi-level biometric fusion includes a pre-classification fusion with optimal feature selection and a post-classification fusion that is based on the similarity of the maximum of matching scores. The proposed solution enhances biometric recognition performance through suitable feature selection and reduction, such as principal component analysis (PCA) and linear discriminant analysis (LDA), since not all components of the feature vectors contribute to the degree of performance improvement.
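
The abstract outlines a two-level fusion: feature selection and reduction before classification, and a maximum-of-matching-scores rule after classification. The sketch below illustrates that general pattern with scikit-learn; the synthetic data, classifiers, and exact fusion rule are assumptions, not the authors' pipeline.

```python
# Sketch of a two-level fusion: pre-classification fusion = PCA/LDA reduction of
# concatenated features; post-classification fusion = maximum of per-modality scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 60
y = np.repeat(np.arange(6), 10)                   # 6 subjects, 10 samples each
face, voice, sig = rng.normal(size=(n, 40)), rng.normal(size=(n, 30)), rng.normal(size=(n, 20))

# Pre-classification fusion: concatenate modalities, then reduce with PCA followed by LDA.
fused = np.hstack([face, voice, sig])
fused = PCA(n_components=15, random_state=0).fit_transform(fused)
fused = LinearDiscriminantAnalysis(n_components=5).fit_transform(fused, y)
print(fused.shape)                                # reduced joint feature vector

# Post-classification fusion: per-modality matchers, fused by the maximum matching score.
scores = [SVC(probability=True, random_state=0).fit(m, y).predict_proba(m) for m in (face, voice, sig)]
fused_scores = np.maximum.reduce(scores)          # max rule over modality scores
print(fused_scores.argmax(axis=1)[:10])           # fused identity decisions
```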

Combining Feature Fusion and Decision Fusion in Multimodal Biometric Authentication

  • 이경희
    • 정보보호학회논문지 / Vol. 20, No. 5 / pp.133-138 / 2010
  • This paper proposes a multi-level fusion method for multimodal biometric authentication using face and voice information, which performs feature-level fusion and decision-level fusion at the same time. After constructing a Support Vector Machine (SVM) on the face-voice fused feature obtained by first-level fusion of the face and voice features, the decisions of this fused-feature SVM verifier, the face SVM verifier, and the voice SVM verifier are fused again at a second level to make the final authentication decision. Comparative experiments on the XM2VTS multimodal database with feature-level fusion, decision-level fusion, and multi-level fusion show that authentication based on the proposed multi-level fusion achieves the best performance.
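
A minimal sketch of the multi-level idea described above, assuming face and voice feature vectors are available: one SVM on the concatenated (feature-fused) vector, one SVM per modality, and a simple majority vote as the second-level decision fusion. The vote rule and the synthetic data are assumptions; the paper's second-level rule may differ.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)                             # genuine vs. impostor labels
face = rng.normal(loc=y[:, None], size=(100, 32))
voice = rng.normal(loc=y[:, None], size=(100, 24))

face_svm = SVC().fit(face, y)
voice_svm = SVC().fit(voice, y)
fused_svm = SVC().fit(np.hstack([face, voice]), y)    # feature-level (first) fusion

# Decision-level (second) fusion: majority vote of the three verifiers.
votes = np.stack([face_svm.predict(face),
                  voice_svm.predict(voice),
                  fused_svm.predict(np.hstack([face, voice]))])
final_decision = (votes.sum(axis=0) >= 2).astype(int)
print(final_decision[:10])
```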

Building Detection by Convolutional Neural Network with Infrared Image, LiDAR Data and Characteristic Information Fusion

  • 조은지;이동천
    • 한국측량학회지 / Vol. 38, No. 6 / pp.635-644 / 2020
  • Object recognition, detection, and segmentation using deep learning (DL) are applied in many fields, and images are typically used as the training data for DL models. The purpose of this paper, however, is to segment objects and detect buildings by training the Detectron2 model, an improved region-based convolutional neural network (R-CNN), with multimodal training data that includes not only images but also geospatial characteristics. To this end, several characteristics were extracted from infrared aerial images and LiDAR data, such as the contours of the embedded objects and Haralick features, a statistical texture measure. The training performance of a DL model depends not only on the quantity and characteristics of the data but also on the fusion method. By applying hybrid fusion, a combination of early fusion and late fusion, 33% more buildings could be detected. These experimental results demonstrate the complementary effect of jointly training on and fusing heterogeneous characteristic data.
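
Of the characteristics mentioned above, the Haralick texture features can be illustrated concretely. The sketch below computes a few GLCM statistics for a single band with scikit-image (≥ 0.19); the distances, angles, and quantization levels are illustrative choices, not the paper's settings.

```python
# Minimal sketch of extracting Haralick-style GLCM texture features from one image band.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def haralick_features(band: np.ndarray, levels: int = 32) -> np.ndarray:
    """Compute a small set of GLCM texture statistics for a single-band image."""
    # Quantize the band to `levels` gray levels for a compact co-occurrence matrix.
    q = np.digitize(band, np.linspace(band.min(), band.max(), levels)) - 1
    glcm = graycomatrix(q.astype(np.uint8), distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

infrared_band = np.random.default_rng(2).integers(0, 255, size=(64, 64)).astype(float)
print(haralick_features(infrared_band))  # [contrast, homogeneity, energy, correlation]
```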

Multimodal Image Fusion with Human Pose for Illumination-Robust Detection of Human Abnormal Behaviors

  • ;공성곤
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2023년도 추계학술발표대회 / pp.637-640 / 2023
  • This paper presents multimodal image fusion with human pose for detecting abnormal human behaviors in low-illumination conditions. Detecting human behaviors in low-illumination conditions is challenging due to the limited visibility of the objects of interest in the scene. Multimodal image fusion simultaneously combines visual information in the visible spectrum and thermal radiation information in the long-wave infrared spectrum. We propose an abnormal event detection scheme based on the multimodal fused image and human poses, using keypoints to characterize the action of the human body. Our method assumes that human behaviors are well correlated with body keypoints such as the shoulders, elbows, wrists, and hips. In detail, we extracted the human keypoint coordinates from human targets in multimodal fused videos. The coordinate values are used as inputs to train a multilayer perceptron network to classify human behaviors as normal or abnormal. Our experiment demonstrates a significant result on a multimodal imaging dataset. The proposed model can capture the complex distribution pattern of both normal and abnormal behaviors.
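
A minimal sketch of the classification step described above: flattened keypoint coordinates fed to a small multilayer perceptron that labels each frame as normal or abnormal. The keypoint count (17, COCO-style), network size, and dummy data are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n_frames, n_keypoints = 400, 17
X = rng.normal(size=(n_frames, n_keypoints * 2))   # (x, y) per keypoint, flattened
y = rng.integers(0, 2, size=n_frames)              # 0 = normal, 1 = abnormal (dummy labels)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))                           # per-frame behavior decisions
```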

Multimodal Attention-Based Fusion Model for Context-Aware Emotion Recognition

  • Vo, Minh-Cong;Lee, Guee-Sang
    • International Journal of Contents / Vol. 18, No. 3 / pp.11-20 / 2022
  • Human emotion recognition is an exciting topic that has attracted many researchers for a long time. In recent years, there has been increasing interest in exploiting contextual information for emotion recognition. Previous explorations in psychology show that emotional perception is affected by facial expressions as well as contextual information from the scene, such as human activities, interactions, and body poses. Those explorations initiated a trend in computer vision of exploring the critical role of context by considering it as a modality, alongside facial expressions, for inferring the predicted emotion. However, contextual information has not been fully exploited. The scene emotion created by the surrounding environment can shape how people perceive emotion. Besides, additive fusion in multimodal training is not practical, because the modalities do not contribute equally to the final prediction. The purpose of this paper is to contribute to this growing area of research by exploring the effectiveness of the emotional scene gist in the input image for inferring the emotional state of the primary target. The emotional scene gist includes emotion, emotional feelings, and actions or events that directly trigger emotional reactions in the input image. We also present an attention-based fusion network to combine multimodal features based on their impact on the target emotional state. We demonstrate the effectiveness of the method through a significant improvement on the EMOTIC dataset.
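
A minimal sketch of attention-based fusion in the spirit described above: each modality embedding receives a learned, softmax-normalized weight before the weighted sum is classified. The two modalities, the feature dimension, and the 26-class output (matching EMOTIC's categorical labels) are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim: int = 128, n_classes: int = 26):
        super().__init__()
        self.score = nn.Linear(dim, 1)              # scores each modality embedding
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, face_feat: torch.Tensor, context_feat: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([face_feat, context_feat], dim=1)   # (B, 2, dim)
        attn = torch.softmax(self.score(feats), dim=1)          # (B, 2, 1) modality weights
        fused = (attn * feats).sum(dim=1)                       # weighted sum over modalities
        return self.classifier(fused)

model = AttentionFusion()
logits = model(torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 26])
```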

Multimodal System by Data Fusion and Synergetic Neural Network

  • Son, Byung-Jun;Lee, Yill-Byung
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 5, No. 2 / pp.157-163 / 2005
  • In this paper, we present a multimodal system based on the fusion of two user-friendly biometric modalities: iris and face. In order to reach robust identification and verification, we combine two different biometric features. Specifically, we apply the 2-D discrete wavelet transform to extract feature sets of low dimensionality from the iris and the face. Then, to obtain a Reduced Joint Feature Vector (RJFV) from these feature sets, Direct Linear Discriminant Analysis (DLDA) is used in our multimodal system. In addition, a Synergetic Neural Network (SNN) is used to obtain the matching score of the preprocessed data. This system can operate in two modes: to identify a particular person or to verify a person's claimed identity. Our results for both cases show that the proposed method leads to a reliable person authentication system.
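
A minimal sketch of the wavelet feature-extraction step described above: keep the coarse approximation subband of a 2-D discrete wavelet transform as a low-dimensional feature vector for each modality, then concatenate into a joint vector. The wavelet and decomposition level are assumptions, using PyWavelets.

```python
import numpy as np
import pywt

def dwt_features(image: np.ndarray, level: int = 3, wavelet: str = "haar") -> np.ndarray:
    """Return the flattened approximation subband after `level` 2-D DWT decompositions."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    approx = coeffs[0]                            # coarse approximation (low-low) subband
    return approx.ravel()

face = np.random.default_rng(4).random((128, 128))
iris = np.random.default_rng(5).random((64, 256))
joint = np.concatenate([dwt_features(face), dwt_features(iris)])  # joint feature vector
print(joint.shape)                                # reduced dimensionality vs. raw pixels
```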

A multisource image fusion method for multimodal pig-body feature detection

  • Zhong, Zhen;Wang, Minjuan;Gao, Wanlin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 11 / pp.4395-4412 / 2020
  • Multisource image fusion has become an active topic in the last few years owing to its higher segmentation rate. To enhance the accuracy of multimodal pig-body feature segmentation, a multisource image fusion method was employed. However, conventional multisource image fusion methods cannot produce fused images with superior contrast and abundant detail. To better segment the shape feature and detect the temperature feature, a new multisource image fusion method, named NSST-GF-IPCNN, is presented. Firstly, the multisource images were decomposed into a range of multiscale and multidirectional subbands by the Nonsubsampled Shearlet Transform (NSST). Then, to better describe fine-scale texture and edge information, an even-symmetric Gabor filter and an Improved Pulse Coupled Neural Network (IPCNN) were used to fuse the low- and high-frequency subbands, respectively. Next, the fused coefficients were reconstructed into a fusion image using the inverse NSST. Finally, the shape feature was extracted using an automatic thresholding algorithm and refined using morphological operations, and the highest pig-body temperature was obtained from the segmentation results. Experiments revealed that the presented fusion algorithm achieved a 2.102-4.066% higher average accuracy rate than the traditional algorithms and also enhanced efficiency.
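
The NSST, Gabor-filter, and IPCNN fusion rules are specialized, but the overall multiscale pattern can be illustrated with a simplified stand-in: an ordinary 2-D wavelet decomposition, averaging for the low-frequency subband, and a max-absolute rule for the high-frequency subbands. This is only an illustration of the general scheme, not the paper's NSST-GF-IPCNN method.

```python
import numpy as np
import pywt

def fuse_multiscale(img_a: np.ndarray, img_b: np.ndarray, wavelet: str = "haar", level: int = 2) -> np.ndarray:
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2]                        # low-frequency subband: average
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)   # high-frequency: max-abs
                           for a, b in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)

rng = np.random.default_rng(6)
visible, thermal = rng.random((128, 128)), rng.random((128, 128))
print(fuse_multiscale(visible, thermal).shape)           # fused image, same size as inputs
```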

Emotion Recognition Method Based on Multimodal Sensor Fusion Algorithm

  • Moon, Byung-Hyun;Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 2 / pp.105-110 / 2008
  • Human beings recognize emotion by fusing information from another person's speech signal, facial expression, gesture, and bio-signals. Computers need technologies that recognize emotion as humans do, using such combined information. In this paper, we recognize five emotional states (neutral, happiness, anger, surprise, and sadness) from the speech signal and the facial image, and we propose a multimodal method that fuses the individual recognition results. Emotion recognition from both the speech signal and the facial image uses Principal Component Analysis (PCA); the multimodal fusion then combines the individual results using a fuzzy membership function. In our experiments, the average emotion recognition rate was 63% using speech signals and 53.4% using facial images; that is, the speech signal offers a better recognition rate than the facial image. To raise the recognition rate, we propose a decision fusion method using an S-type membership function. With the proposed method, the average recognition rate is 70.4%, showing that the decision fusion method offers a better emotion recognition rate than either the facial image or the speech signal alone.
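
A minimal sketch of the decision-fusion step described above: per-modality class scores are passed through a standard fuzzy S-type membership function and combined per emotion class. The S-function parameters and the averaging rule are illustrative assumptions, not the paper's exact fusion rule.

```python
import numpy as np

def s_membership(x: np.ndarray, a: float = 0.2, b: float = 0.8) -> np.ndarray:
    """Standard fuzzy S-function: 0 below a, 1 above b, smooth quadratic ramp between."""
    m = (a + b) / 2.0
    y = np.zeros_like(x, dtype=float)
    y[x >= b] = 1.0
    left = (x > a) & (x <= m)
    right = (x > m) & (x < b)
    y[left] = 2 * ((x[left] - a) / (b - a)) ** 2
    y[right] = 1 - 2 * ((x[right] - b) / (b - a)) ** 2
    return y

emotions = ["neutral", "happiness", "anger", "surprise", "sadness"]
speech_scores = np.array([0.10, 0.55, 0.20, 0.05, 0.10])   # per-class scores from speech
face_scores = np.array([0.15, 0.40, 0.30, 0.05, 0.10])     # per-class scores from face

fused = (s_membership(speech_scores) + s_membership(face_scores)) / 2.0
print(emotions[int(fused.argmax())])                        # fused emotion decision
```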

Multimodal Medical Image Fusion Based on Two-Scale Decomposer and Detail Preservation Model

  • 장영매;이효종
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2021년도 추계학술발표대회 / pp.655-658 / 2021
  • The purpose of multimodal medical image fusion (MMIF) is to integrate images of different modes, with different details, into a result image with rich information, which helps doctors accurately diagnose and treat the diseased tissues of patients. Motivated by this purpose, this paper proposes a novel method based on a two-scale decomposer and a detail preservation model. The first step is to use the two-scale decomposer to decompose the source image into energy layers and structure layers, which have the characteristic of detail preservation. Then, a structure tensor operator and the max-abs rule are combined to fuse the structure layers. The detail preservation model is proposed for the fusion of the energy layers, which greatly improves the image performance. The fused image is obtained by summing up the two fused sub-images produced by the above fusion rules. Experiments demonstrate that the proposed method has superior performance compared with state-of-the-art fusion methods.
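
A simplified sketch of the two-scale idea described above, assuming a plain smoothing filter as the decomposer: each source image is split into an energy (base) layer and a structure (detail) layer, the energy layers are averaged, and the structure layers are fused by a max-abs rule. The box filter and these simple rules are stand-ins, not the paper's decomposer or detail preservation model.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale(img: np.ndarray, size: int = 15):
    base = uniform_filter(img, size=size)    # energy (low-frequency) layer
    detail = img - base                      # structure (detail) layer
    return base, detail

rng = np.random.default_rng(7)
mri, ct = rng.random((128, 128)), rng.random((128, 128))

base_a, detail_a = two_scale(mri)
base_b, detail_b = two_scale(ct)
fused_base = (base_a + base_b) / 2.0                                                # energy-layer fusion
fused_detail = np.where(np.abs(detail_a) >= np.abs(detail_b), detail_a, detail_b)   # max-abs rule
fused = fused_base + fused_detail
print(fused.shape)
```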