Huang et al., 2024 - Google Patents

Speech recognition and intelligent translation under multimodal human–computer interaction system

Huang et al., 2024

Document ID: 14086561293208570949
Author: Huang D; Xiang S
Publication year: 2024
Publication venue: Journal of Intelligent Systems

External Links

Cited by

Snippet

The traditional translation robot is limited to the translation of single-mode text images and text videos, which has the problem of low translation accuracy. Therefore, speech recognition and intelligent translation in multimodal human–computer interaction (HCI) …

Continue reading at www.degruyter.com (HTML) (other versions)

238000013519 translation 0 title abstract description 86

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2019	Spontaneous speech emotion recognition using multiscale deep convolutional LSTM
Amiri et al.	2024	Adventures in data analysis: A systematic review of Deep Learning techniques for pattern recognition in cyber-physical-social systems
Chiu et al.	2020	Enabling intelligent environment by the design of emotionally aware virtual assistant: A case of smart campus
Liu et al.	2021	Multi-modal fusion emotion recognition method of speech expression based on deep learning
Hussain et al.	2024	A tutorial on open-source large language models for behavioral science
Han et al.	2020	A review on sentiment discovery and analysis of educational big‐data
KR20190125153A (en)	2019-11-06	An apparatus for predicting the status of user's psychology and a method thereof
Gharavian et al.	2017	Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks
Wei et al.	2019	A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model
Fung et al.	2018	Empathetic dialog systems
CN113705238A (en)	2021-11-26	Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
Cabada et al.	2018	Mining of educational opinions with deep learning
Zhang	2021	Voice keyword retrieval method using attention mechanism and multimodal information fusion
Ming-Hao et al.	2019	Data fusion methods in multimodal human computer dialog
Chaudhary et al.	2022	Signnet ii: A transformer-based two-way sign language translation model
Wu et al.	2022	Machine translation of English speech: Comparison of multiple algorithms
Keren et al.	2018	Deep learning for multisensorial and multimodal interaction
CN117251057A (en)	2023-12-19	AIGC-based method and system for constructing AI number wisdom
Lai et al.	2022	Multimodal sentiment analysis with asymmetric window multi-attentions
Wang	2023	Recognition of English speech–using a deep learning algorithm
Al-Saadawi et al.	2024	A systematic review of trimodal affective computing approaches: Text, audio, and visual integration in emotion recognition and sentiment analysis
Ren et al.	2021	ABML: attention-based multi-task learning for jointly humor recognition and pun detection
Chen	2022	A hidden Markov optimization model for processing and recognition of English speech feature signals
CN113010662B (en)	2022-09-27	A hierarchical conversational machine reading comprehension system and method
Huang et al.	2024	Speech recognition and intelligent translation under multimodal human–computer interaction system