[go: up one dir, main page]

Vysotska et al., 2025 - Google Patents

Development and Testing of Voice User Interfaces Based on BERT Models for Speech Recognition in Distance Learning and Smart Home Systems

Vysotska et al., 2025

View PDF
Document ID
16644328169918763302
Author
Vysotska V
Hu Z
Mykytyn N
Nagachevska O
Hazdiuk K
Uhryn D
Publication year
Publication venue
International Journal of Computer Network and Information Security (IJCNIS)

External Links

Snippet

Voice User Interfaces (VUIs) focus on their application in IT and linguistics. Our research examines the capabilities and limitations of small and multilingual BERT models in the context of speech recognition and command conversion. We evaluate the performance of …
Continue reading at www.mecs-press.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/289Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2872Rule based translation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Taking into account non-speech caracteristics
    • G10L2015/228Taking into account non-speech caracteristics of application context
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033Voice editing, e.g. manipulating the voice of the synthesiser
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass

Similar Documents

Publication Publication Date Title
Triantafyllopoulos et al. An overview of affective speech synthesis and conversion in the deep learning era
US11915692B2 (en) Facilitating end-to-end communications with automated assistants in multiple languages
Ngueajio et al. Hey ASR system! Why aren’t you more inclusive? Automatic speech recognition systems’ bias and proposed bias mitigation techniques. A literature review
US20230229960A1 (en) Systems and methods for facilitating integrative, extensible, composable, and interpretable deep learning
Cao et al. [Retracted] Optimization of Intelligent English Pronunciation Training System Based on Android Platform
Li et al. Audio-llm: Activating the capabilities of large language models to comprehend audio data
Dyriv et al. The user's psychological state identification based on Big Data analysis for person's electronic diary
Chakraborty et al. Knowledge-based framework for intelligent emotion recognition in spontaneous speech
Ashraff Voice-based interaction with digital services
Yellamma et al. Automatic and multilingual speech recognition and translation by using Google Cloud API
López-Ludeña et al. LSESpeak: A spoken language generator for Deaf people
Vysotska et al. Development and Testing of Voice User Interfaces Based on BERT Models for Speech Recognition in Distance Learning and Smart Home Systems
CN112883350B (en) Data processing method, device, electronic equipment and storage medium
Kumar et al. Voice-based virtual assistant for windows (Ziva-AI companion)
Amoli et al. Chromium Navigator Extension: Voice-Activated Assist for Disabled
Zahariev et al. Intelligent voice assistant based on open semantic technology
Li et al. The analysis of transformer end-to-end model in Real-time interactive scene based on speech recognition technology
Lichouri et al. Toward building another arabic voice command dataset for multiple speech processing tasks
Mišković et al. Hybrid methodological approach to context-dependent speech recognition
Tripathi et al. Cyclegan-based speech mode transformation model for robust multilingual ASR
Ashihara et al. Unveiling the linguistic capabilities of a self-supervised speech model through cross-lingual benchmark and layer-wise similarity analysis
Šoić et al. Spoken notifications in smart environments using Croatian language
Bhattacharya et al. A conversational assistant for democratization of data visualization: A comparative study of two approaches of interaction
Arop Integration Of A Speech Recognition System Into Fulafia FMIS
Pathak et al. Designing a multilingual virtual agent capable of interacting with uneducated people for automated data collection