KR101092820B1 - Combined lip reading and voice recognition multimodal interface system - Google Patents
- Publication number
- KR101092820B1 (application number KR1020090089637A)
- Authority
- KR
- South Korea
- Prior art keywords
- lip
- recognition
- lip reading
- feature
- command
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/36—Input/output arrangements for on-board computers
- G01C21/3602—Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/36—Input/output arrangements for on-board computers
- G01C21/3605—Destination input or retrieval
- G01C21/3608—Destination input or retrieval using speech input, e.g. using speech recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Automation & Control Theory (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Claims (5)
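- Deleted
- Deleted
- A speaker-adaptive real-time lip reading learning system using a multimodal interface, comprising: a lip feature database that stores per-command patterns learned offline or online; a multimodal interface including a voice recognition module that performs voice recognition to recognize a command, and a lip reading module that performs lip reading recognition and provides an image; a lip reading feature detection unit that detects lip reading features from the image provided by the lip reading module; a voice recognition word estimation probability determination unit that, when the estimation probability of the command recognized by the voice recognition module is equal to or greater than a threshold, causes learning to be performed using the lip reading features detected by the lip reading feature detection unit as learning labels for the lip features; and a real-time lip reading learning unit that updates the lip feature database by performing k-NN (nearest neighbor) learning on the lip reading features detected by the lip reading feature detection unit, using the command provided by the voice recognition module as the label.
- The speaker-adaptive real-time lip reading learning system using a multimodal interface according to claim 3, further comprising a lip feature detection determination unit that determines whether the lip reading features were detected normally by the lip reading feature detection unit.
- An interactive service system using a multimodal interface, comprising: a service scenario database that predefines the list of commands that can be input for each service screen or stage, and provides the list of commands that can be input at each service screen or stage; a recognition target word list setting unit that, upon a state change, sets the list of words required at each service screen or stage based on the service scenario database; a multimodal interface that performs voice recognition or lip reading recognition with reference to the recognition target word list set by the recognition target word list setting unit, and that can output a voice recognition command or a lip reading command; a service screen; a recognition result determination unit that determines whether the voice recognition or lip reading recognition performed by the multimodal interface succeeded; a service execution unit that, when the recognition result determination unit determines that the voice recognition or lip reading recognition succeeded, performs screen switching, voice guidance, information registration, and other registered application services according to the voice recognition command or lip reading command; and a screen switching unit that, according to the command list defined in the service scenario database, performs screen switching in response to the voice recognition command or lip reading command and provides current service state information to the service screen.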
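The speaker-adaptive learning claim above (voice recognition results used as labels for k-NN learning of lip features) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the class `LipFeatureDatabase`, the function `online_update`, the threshold of 0.8, the two-dimensional feature vectors, and the example commands. Because k-NN is a lazy learner, "learning" in this sketch simply means storing a labeled sample; classification is a majority vote over the nearest stored samples.

```python
import math
from collections import Counter


class LipFeatureDatabase:
    """Hypothetical stand-in for the claimed lip feature database."""

    def __init__(self):
        # Each entry is a (feature vector, command label) pair.
        self.samples = []

    def add(self, features, command):
        self.samples.append((tuple(features), command))

    def classify(self, features, k=3):
        """Return the majority command among the k nearest stored samples."""
        if not self.samples:
            return None
        query = tuple(features)

        def distance(sample):
            stored, _ = sample
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(stored, query)))

        nearest = sorted(self.samples, key=distance)[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def online_update(db, lip_features, voice_command, voice_probability,
                  threshold=0.8, detection_ok=True):
    """Store a new labeled lip sample only when it is safe to do so.

    The sample is kept only if the lip features were detected normally
    and the voice recogniser's estimated probability reaches the
    threshold (0.8 is an assumed value). Returns True if the database
    was updated.
    """
    if not detection_ok or lip_features is None:
        return False
    if voice_probability < threshold:
        return False
    db.add(lip_features, voice_command)
    return True


db = LipFeatureDatabase()
online_update(db, [0.1, 0.2], "zoom in", 0.95)   # confident: stored
online_update(db, [0.9, 0.8], "zoom out", 0.40)  # not confident: skipped
online_update(db, [0.9, 0.8], "zoom out", 0.90)  # confident: stored
```

Gating on the voice recogniser's confidence keeps mislabeled samples out of the database, which matters because a k-NN classifier has no later training step that could average such errors away.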
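The interactive service claim above (a service scenario database that limits the recognition vocabulary for each screen and drives screen switching) can be sketched in the same hedged spirit. The screen names, commands, scores, and the `InteractiveService` class are all hypothetical; the patent text shown here does not specify a data layout, so a nested dictionary mapping each screen to its allowed commands and target screens is assumed.

```python
# Hypothetical scenario table: screen -> {allowed command: next screen}.
SCENARIO_DB = {
    "main": {"navigation": "destination", "music": "player"},
    "destination": {"home": "guidance", "back": "main"},
    "player": {"back": "main"},
    "guidance": {"back": "main"},
}


class InteractiveService:
    def __init__(self, scenario_db, start="main"):
        self.scenario_db = scenario_db
        self.screen = start

    def target_words(self):
        """Recognition target word list for the current screen."""
        return sorted(self.scenario_db[self.screen])

    def handle(self, candidates):
        """Process recogniser output given as (command, score) pairs.

        Candidates may come from voice or lip reading recognition.
        Commands not allowed on the current screen are discarded, the
        best remaining one triggers a screen switch, and the new screen
        name is returned. Returns None when recognition failed, in
        which case the screen is left unchanged.
        """
        allowed = self.scenario_db[self.screen]
        valid = [(cmd, score) for cmd, score in candidates if cmd in allowed]
        if not valid:
            return None
        best, _ = max(valid, key=lambda pair: pair[1])
        self.screen = allowed[best]
        return self.screen


service = InteractiveService(SCENARIO_DB)
# "home" scores higher but is not a valid command on the main screen.
service.handle([("home", 0.9), ("navigation", 0.6)])
```

Restricting candidates to the current screen's command list shrinks the vocabulary the recogniser has to separate, which is the usual reason for defining such a list per screen or stage.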
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020090089637A KR101092820B1 (ko) | 2009-09-22 | 2009-09-22 | Combined lip reading and voice recognition multimodal interface system |
US12/628,514 US8442820B2 (en) | 2009-09-22 | 2009-12-01 | Combined lip reading and voice recognition multimodal interface system |
CN200910246886.7A CN102023703B (zh) | 2009-09-22 | 2009-12-03 | Combined lip reading and voice recognition multimodal interface system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020090089637A KR101092820B1 (ko) | 2009-09-22 | 2009-09-22 | Combined lip reading and voice recognition multimodal interface system |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20110032244A (ko) | 2011-03-30 |
KR101092820B1 (ko) | 2011-12-12 |
Family
ID=43757401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020090089637A KR101092820B1 (ko) | 2009-09-22 | 2009-09-22 | 립리딩과 음성 인식 통합 멀티모달 인터페이스 시스템 |
Country Status (3)
Country | Link |
---|---|
US (1) | US8442820B2 (ko) |
KR (1) | KR101092820B1 (ko) |
CN (1) | CN102023703B (ko) |
-
2009
- 2009-09-22 KR KR1020090089637A patent/KR101092820B1/ko active IP Right Grant
- 2009-12-01 US US12/628,514 patent/US8442820B2/en active Active
- 2009-12-03 CN CN200910246886.7A patent/CN102023703B/zh active Active
Also Published As
Publication number | Publication date |
---|---|
CN102023703B (zh) | 2015-03-11 |
CN102023703A (zh) | 2011-04-20 |
KR20110032244A (ko) | 2011-03-30 |
US8442820B2 (en) | 2013-05-14 |
US20110071830A1 (en) | 2011-03-24 |
Legal Events
Code | Title | Description |
---|---|---|
A201 | Request for examination | |
PA0109 | Patent application | Patent event code: PA01091R01D; Comment text: Patent Application; Patent event date: 20090922 |
PA0201 | Request for examination | |
E902 | Notification of reason for refusal | |
PE0902 | Notice of grounds for rejection | Comment text: Notification of reason for refusal; Patent event date: 20110309; Patent event code: PE09021S01D |
PG1501 | Laying open of application | |
N231 | Notification of change of applicant | |
PN2301 | Change of applicant | Patent event date: 20110610; Comment text: Notification of Change of Applicant; Patent event code: PN23011R01D |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 20111104 |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 20111205; Patent event code: PR07011E01D |
PR1002 | Payment of registration fee | Payment date: 20111205; End annual number: 3; Start annual number: 1 |
PG1601 | Publication of registration | |
FPAY | Annual fee payment | Payment date: 20141128; Year of fee payment: 4 |
PR1001 | Payment of annual fee | Payment date: 20141128; Start annual number: 4; End annual number: 4 |
PR1001 | Payment of annual fee | Payment date: 20201126; Start annual number: 10; End annual number: 10 |
PR1001 | Payment of annual fee | Payment date: 20231120; Start annual number: 13; End annual number: 13 |