Vysotska et al., 2025 - Google Patents
Development and Testing of Voice User Interfaces Based on BERT Models for Speech Recognition in Distance Learning and Smart Home Systems
- Document ID
- 16644328169918763302
- Author
- Vysotska V
- Hu Z
- Mykytyn N
- Nagachevska O
- Hazdiuk K
- Uhryn D
- Publication year
- 2025
- Publication venue
- International Journal of Computer Network and Information Security (IJCNIS)
Snippet
This study of Voice User Interfaces (VUIs) focuses on their application in IT and linguistics. Our research examines the capabilities and limitations of small and multilingual BERT models in the context of speech recognition and command conversion. We evaluate the performance of …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Taking into account non-speech characteristics
- G10L2015/228—Taking into account non-speech characteristics of application context
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
| Publication | Title |
|---|---|
| Triantafyllopoulos et al. | An overview of affective speech synthesis and conversion in the deep learning era |
| US11915692B2 (en) | Facilitating end-to-end communications with automated assistants in multiple languages |
| Ngueajio et al. | Hey ASR system! Why aren't you more inclusive? Automatic speech recognition systems' bias and proposed bias mitigation techniques. A literature review |
| US20230229960A1 (en) | Systems and methods for facilitating integrative, extensible, composable, and interpretable deep learning |
| Cao et al. | [Retracted] Optimization of Intelligent English Pronunciation Training System Based on Android Platform |
| Li et al. | Audio-llm: Activating the capabilities of large language models to comprehend audio data |
| Dyriv et al. | The user's psychological state identification based on Big Data analysis for person's electronic diary |
| Chakraborty et al. | Knowledge-based framework for intelligent emotion recognition in spontaneous speech |
| Ashraff | Voice-based interaction with digital services |
| Yellamma et al. | Automatic and multilingual speech recognition and translation by using Google Cloud API |
| López-Ludeña et al. | LSESpeak: A spoken language generator for Deaf people |
| CN112883350B (en) | Data processing method, device, electronic equipment and storage medium |
| Kumar et al. | Voice-based virtual assistant for windows (Ziva-AI companion) |
| Amoli et al. | Chromium Navigator Extension: Voice-Activated Assist for Disabled |
| Zahariev et al. | Intelligent voice assistant based on open semantic technology |
| Li et al. | The analysis of transformer end-to-end model in Real-time interactive scene based on speech recognition technology |
| Lichouri et al. | Toward building another arabic voice command dataset for multiple speech processing tasks |
| Mišković et al. | Hybrid methodological approach to context-dependent speech recognition |
| Tripathi et al. | Cyclegan-based speech mode transformation model for robust multilingual ASR |
| Ashihara et al. | Unveiling the linguistic capabilities of a self-supervised speech model through cross-lingual benchmark and layer-wise similarity analysis |
| Šoić et al. | Spoken notifications in smart environments using Croatian language |
| Bhattacharya et al. | A conversational assistant for democratization of data visualization: A comparative study of two approaches of interaction |
| Arop | Integration Of A Speech Recognition System Into Fulafia FMIS |
| Pathak et al. | Designing a multilingual virtual agent capable of interacting with uneducated people for automated data collection |