• Title/Summary/Keyword: speech management

Development of Digital Endoscopic Data Management System (디지탈 내시경 데이터 management system의 개발)

  • Song, C.G.;Lee, S.M.;Lee, Y.M.;Kim, W.K.
    • Proceedings of the KOSOMBE Conference / v.1996 no.11 / pp.304-306 / 1996
  • Endoscopy has become a crucial diagnostic and therapeutic procedure in clinical practice. Over the past three years, we have developed a computerized system to record and store clinical data pertaining to endoscopic surgery (laparoscopic cholecystectomy, pelviscopic endometriosis, and surgical arthroscopy). In this study, we developed a computer system composed of a frame grabber, sound board, VCR control board, LAN card, and EDMS (endoscopic data management software). The computer system also controls peripheral instruments such as a color video printer and a video cassette recorder, as well as the endoscopic input/output signals (image and doctor's speech). In addition, we developed a one-body camera control unit that includes an endoscopic miniature camera and light source. Our system offers unsurpassed image quality in terms of resolution and color fidelity. The digital endoscopic data management system is based on an open architecture and a set of widely available industry standards, namely Windows 3.1 as the operating system, TCP/IP as the network protocol, and a time-sequence-based database that handles both the image and the doctor's speech synchronized with the endoscopic image.

DialogStudio: A Spoken Dialog System Workbench (음성대화시스템 워크벤취로서의 DialogStudio 개발)

  • Jung, Sang-Keun;Lee, Cheon-Jae;Lee, Geun-Bae
    • Proceedings of the KSPS conference / 2007.05a / pp.311-314 / 2007
  • Spoken dialog system development includes many laborious and inefficient tasks. Because a spoken dialog system has many components, such as a speech recognizer, language understanding, dialog management, and knowledge management, a developer must edit a corpus and train each model separately. To reduce the cost of editing corpora and training each model, we need a more systematic and efficient working environment. For this working environment, we propose DialogStudio as a spoken dialog system workbench.

A Study on 8kbps PC-MPC by Using Position Compensation Method of Multi-Pulse (멀티펄스의 위치보정 방법을 이용한 8kbps PC-MPC에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence / v.11 no.5 / pp.285-290 / 2013
  • In MPC (multi-pulse coding) that uses voiced and unvoiced excitation sources, the speech waveform can be distorted. This distortion is caused by normalizing the synthesized voiced speech waveform while restoring the multi-pulses of the representative section. To solve this problem, this paper presents a position compensation method (PC-MPC) that compensates the positions of the multi-pulses in each pitch interval in order to reduce distortion of the speech waveform. I confirmed that the method can synthesize a waveform close to the original speech waveform. I then evaluated MPC and PC-MPC, which uses the multi-pulse position compensation method. As a result, the $SNR_{seg}$ of PC-MPC improved by 0.4dB for female voice and 0.5dB for male voice. Because the $SNR_{seg}$ of PC-MPC improved over that of MPC, I was able to control the distortion of the speech waveform. I therefore expect that this method can be applied to cellular phones and smartphones that use a low-bit-rate excitation source.
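
The abstract above reports its gains as segmental SNR ($SNR_{seg}$). As a rough illustration of that metric only, the sketch below averages the per-frame SNR in dB between a reference waveform and a synthesized one. The function name `segmental_snr`, the 160-sample frame length, and the toy sine signals are assumptions made for this example; the paper's own frame settings and coder are not given in the abstract.

```python
import math

def segmental_snr(reference, synthesized, frame_len=160):
    """Mean of per-frame SNR values (dB) between two equally sampled signals.

    frame_len=160 (20 ms at 8 kHz) is an assumed value, not taken from the paper.
    """
    n = min(len(reference), len(synthesized))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        syn = synthesized[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, syn))
        if signal_energy == 0.0 or error_energy == 0.0:
            continue  # skip silent or perfectly reconstructed frames
        frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not frame_snrs:
        raise ValueError("no frame with nonzero signal and error energy")
    return sum(frame_snrs) / len(frame_snrs)

# Toy signals: a 100 Hz sine at 8 kHz and two scaled "reconstructions".
reference = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(800)]
coarse = [0.90 * x for x in reference]  # 10% amplitude error
fine = [0.95 * x for x in reference]    # 5% amplitude error

print(round(segmental_snr(reference, coarse), 2))
print(round(segmental_snr(reference, fine), 2))
```

A reconstruction that tracks the original more closely gives a higher value, which is the sense in which a gain of a few tenths of a dB indicates less waveform distortion.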

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2785-2799 / 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and unknown speech is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented with a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using the Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, the relative improvement over the conventional speaker-independent speech recognition model was as much as 12.2% for the conventional fully connected neural network and as much as 10.5% for the bidirectional long short-term memory.
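
The two steps the abstract above describes most concretely, measuring speaker similarity by cosine distance and choosing the closest cluster for unknown speech, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the three-dimensional vectors, and the two centroids are made up for the example, and the clustering algorithms and model retraining used in the paper are not shown.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def closest_cluster(ivector, centroids):
    """Index of the centroid with the smallest cosine distance to ivector."""
    return min(range(len(centroids)),
               key=lambda k: cosine_distance(ivector, centroids[k]))

# Hypothetical cluster centroids and test i-vector (real i-vectors are
# typically a few hundred dimensions; three are used here for readability).
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
test_ivector = [0.1, 0.9, 0.8]

cluster_id = closest_cluster(test_ivector, centroids)
print(cluster_id)  # index of the cluster whose adapted model would be used
```

Cosine distance depends only on the direction of the vectors, so two speakers whose i-vectors differ in magnitude but not in direction are treated as identical.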

The Diagnosis and Management of Velopharyngeal Insufficiency (연구개인두 폐쇄 부전 환자의 진단과 치료)

  • Lee, Yong-Kwon;Choi, Jae-Pyong;Choi, Jin-Young
    • Korean Journal of Cleft Lip And Palate / v.11 no.1 / pp.13-22 / 2008
  • Velopharyngeal insufficiency (VPI), characterized by hypernasal resonance and nasal air emission, is a speech disorder that can significantly compromise speech intelligibility. Cleft palate, previously repaired cleft palate, and submucous cleft palate are associated with VPI. Less commonly, patients may acquire it after adenoidectomy with or without tonsillectomy, or as a result of neuromuscular dysfunction. Comprehensive evaluation by a VPI team includes medical assessment focusing on airway obstructive symptoms, perceptual speech analysis, MRI, and instrumental assessment. Options for intervention include speech therapy, intraoral prosthetic devices, and surgery. Surgical methods can be categorized as palatal, palatopharyngeal, or pharyngeal procedures. Each surgical approach has its strengths and limitations. Oro-maxillofacial surgeons are increasingly involved in the referral, evaluation, and treatment of velopharyngeal function. Therefore, an understanding of the physiology, anatomic structures, and evaluation and treatment protocols in VPI is very important. This article presents a protocol for the evaluation of velopharyngeal function, with a focus on indications for surgical interventions.

Literature Analysis on PROMPT Treatment (1984-2020) (프롬프트(PROMPT) 치료기법에 관한 문헌 분석(1984-2020년))

  • Kim, Wha-soo;Lee, Rio;Lee, Ji-woo
    • Journal of Digital Convergence / v.19 no.2 / pp.447-456 / 2021
  • This study analyzed 28 domestic and foreign studies, published from 1984 to 2020, on the Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT) treatment technique, in order to prepare basic data for the development of PROMPT intervention programs and examination tools. According to the analysis, research has been conducted continuously since the first PROMPT study in 1984. Intervention studies were the most common research method (16 studies), speech disorders were the most common target condition, and children aged 3 to 5 years were the most frequently studied age group. Treatment most frequently consisted of 16 sessions, and the activities were based on the Motor Speech Hierarchy (MSH), except for subjects with non-verbal autism spectrum disorder. Among the dependent variables, 'speech production' was the most common, followed by 'speech motor control', 'articulation', and 'speech intelligibility'. Taken together, these studies suggest that PROMPT, which is directly useful for the motor production of spoken words, is being used effectively outside Korea and that a PROMPT program that can be applied domestically needs to be developed.

Linguistic Features of Spontaneous Speech Production in Normal Aging, Alzheimer's Disease (정상 노인과 알츠하이머성 치매 환자의 자발화 산출에서의 언어적 특징)

  • Kim, Jung Wan
    • 한국노년학 / v.32 no.3 / pp.747-758 / 2012
  • Detecting probable Alzheimer's disease (AD) at an early stage is crucial in slowing the progression of the disease and initiating drug therapy for more effective symptom management. Therefore, this study aimed to identify linguistic features that allow us to distinguish between patients with AD and normal controls. This paper reports on characteristics of spontaneous speech in subjects in three stages of AD (questionable, mild, moderate) compared with education- and age-matched normal controls. Four components of speech were measured in Korean native speakers with AD and normal aging: speech tempo, hesitation (measured in seconds), rate of articulation errors, and rate of grammatical errors. The results revealed significant differences in most of these speech components among the four groups, including significant differences between normal controls and the questionable AD group in the areas of speech tempo and rate of grammatical errors. Phonological-articulatory ability was preserved in questionable AD, and grammatical ability was preserved in questionable and mild AD. Subjects with moderate AD were severely impaired in grammatical ability. Prospective assessments of spontaneous speech skills using a dialogue and picture-description task are useful in detecting the subtle, spontaneous speech impairments that AD causes even in its early stage.

A Study on 8kbps FBD-MPC Method Considering Low Bit Rate (Low Bit Rate을 고려한 8kbps FBD-MPC 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence / v.12 no.6 / pp.271-276 / 2014
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced sounds and unvoiced consonants coexist in a frame. In this paper, I propose an 8kbps multi-pulse speech coding method (FBD-MPC: Frequency Band Division MPC) that uses TSIUVC (Transition Segment Including Unvoiced Consonant) searching, extraction, and approximation-synthesis in the frequency domain. I evaluated the 8kbps MPC and FBD-MPC. As a result, the SNRseg of FBD-MPC improved by 0.5dB for female voice and 0.2dB for male voice. Because the SNRseg of FBD-MPC improved over that of MPC, I was able to control the distortion of the speech waveform. I therefore expect that this method can be applied to cellular phones and smartphones that use a low-bit-rate excitation source.

Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
  • In speech recognition, performance degrades because of the mismatch between the model training and recognition environments. Silence feature normalization is used as a way to reduce this environmental mismatch. With existing silence feature normalization at low signal-to-noise ratios, however, the energy level of silence intervals rises, the accuracy of speech/non-speech classification falls, and recognition performance degrades. This paper proposes a speech detection method that is robust in noisy environments, using silence feature normalization and speech energy maximization. At high signal-to-noise ratios, the proposed method maximizes speech energy so that the features are less affected by noise. At low signal-to-noise ratios, it uses the cepstral feature distribution characteristics of speech and non-speech to improve recognition performance. In recognition experiments, recognition performance improved compared with the conventional method.
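
To make the low-SNR problem in the abstract above concrete, the sketch below implements a plain frame-energy threshold detector. It is not the method proposed in the paper (silence feature normalization and speech energy maximization are not implemented here). It only shows how a fixed energy threshold starts to misclassify silence as speech once the noise floor rises. The function names, frame length, threshold, and synthetic signals are all assumptions for illustration.

```python
import math

def frame_log_energies(samples, frame_len=160):
    """Mean-square energy of each non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(energy + 1e-12))  # avoid log(0)
    return energies

def detect_speech(samples, threshold_db, frame_len=160):
    """True for frames whose energy exceeds a fixed threshold (baseline only)."""
    return [e > threshold_db for e in frame_log_energies(samples, frame_len)]

def make_signal(noise_amp, frame_len=160):
    """Three frames: noise, noise plus a tone (the 'speech'), noise."""
    signal = []
    for t in range(3 * frame_len):
        noise = noise_amp if t % 2 == 0 else -noise_amp  # deterministic noise
        in_tone = frame_len <= t < 2 * frame_len
        tone = 0.5 * math.sin(2 * math.pi * 100 * t / 8000) if in_tone else 0.0
        signal.append(noise + tone)
    return signal

clean = make_signal(noise_amp=0.01)  # noise floor near -40 dB
noisy = make_signal(noise_amp=0.10)  # noise floor near -20 dB

print(detect_speech(clean, threshold_db=-25.0))
print(detect_speech(noisy, threshold_db=-25.0))
```

With the quiet noise floor only the middle frame is flagged. With the louder one every frame crosses the threshold, which is the kind of speech/non-speech confusion that motivates normalizing silence features.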

Design of a Mirror for Fragrance Recommendation based on Personal Emotion Analysis (개인의 감성 분석 기반 향 추천 미러 설계)

  • Hyeonji Kim;Yoosoo Oh
    • Journal of Korea Society of Industrial Information Systems / v.28 no.4 / pp.11-19 / 2023
  • This paper proposes a smart mirror system that recommends fragrances based on user emotion analysis. It builds models that combine natural language processing embedding techniques (CountVectorizer and TF-IDF) with machine learning classification models (DecisionTree, SVM, RandomForest, SGD Classifier) and compares the results. Based on this comparison, the paper constructs a personal emotion-based fragrance recommendation mirror model from the best-performing emotion classifier, a pipeline of SVM and word embedding. The proposed system implements a personalized fragrance recommendation mirror based on emotion analysis and provides web services using the Flask web framework. It uses the Google Speech Cloud API to recognize users' voices and speech-to-text (STT) to convert their speech into text data. The proposed system also provides users with information about weather, humidity, location, quotes, time, and schedule management.
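
The TF-IDF weighting named in the abstract above can be illustrated with a small standard-library sketch. It uses the textbook form, term frequency times log(N / document frequency). Libraries such as scikit-learn apply smoothed and normalized variants by default, so their numbers will differ. The function name `tfidf_vectors` and the example utterances are assumptions for illustration, and the classifier stage (for example, an SVM) is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (textbook TF-IDF).

    Assumes every document contains at least one whitespace-separated token.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical transcribed utterances, such as an STT step might produce.
utterances = [
    "I feel happy today",
    "I feel tired today",
    "I feel calm tonight",
]
vectors = tfidf_vectors(utterances)

for term, weight in sorted(vectors[0].items()):
    print(term, round(weight, 4))
```

Words that occur in every utterance ("i", "feel") receive zero weight, while the emotion word that distinguishes an utterance ("happy") receives the largest weight, which is why such features are a reasonable input to an emotion classifier.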