Baas et al., 2022 - Google Patents

Transfusion: Transcribing speech with multinomial diffusion

Document ID: 1934448473787856302
Authors: Baas M; Eloff K; Kamper H
Publication year: 2022
Publication venue: Southern African Conference for Artificial Intelligence Research

Snippet

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to iteratively refine a sampled noise signal …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages

Similar Documents

Publication	Publication Date	Title
Liu et al.	2022	Audio self-supervised learning: A survey
Kürzinger et al.	2020	CTC-segmentation of large corpora for German end-to-end speech recognition
US12175202B2 (en)	2024-12-24	Enhanced attention mechanisms
Le et al.	2021	Deep shallow fusion for RNN-T personalization
US11568000B2 (en)	2023-01-31	System and method for automatic task-oriented dialog system
US20210390271A1 (en)	2021-12-16	Neural machine translation systems
US11423237B2 (en)	2022-08-23	Sequence transduction neural networks
Inaguma et al.	2021	Orthros: Non-autoregressive end-to-end speech translation with dual-decoder
Sullivan et al.	2022	Improving automatic speech recognition for non-native English with transfer learning and language model decoding
CN120077431A (en)	2025-05-30	End-to-end speech recognition for multi-speaker applications
Baas et al.	2022	Transfusion: Transcribing speech with multinomial diffusion
Radzikowski et al.	2019	Dual supervised learning for non-native speech recognition
Zheng et al.	2016	Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach
Sun et al.	2024	Graph neural networks for contextual ASR with the tree-constrained pointer generator
Bataev et al.	2018	Exploring end-to-end techniques for low-resource speech recognition
Dumyn et al.	2024	Review of automatic speech recognition systems for Ukrainian and English language
Andrusenko et al.	2020	Exploration of end-to-end ASR for OpenSTT–Russian open speech-to-text dataset
Naowarat et al.	2021	Reducing spelling inconsistencies in code-switching ASR using contextualized CTC loss
Kim et al.	2024	Accurate semi-supervised automatic speech recognition via multi-hypotheses-based curriculum learning
Hu et al.	2019	How question generation can help question answering over knowledge base
Fang et al.	2021	Multi-head attention with hint mechanisms for joint extraction of entity and relation
Altinok	2025	Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization
Zhang et al.	2021	Robust dialog state tracker with contextual-feature augmentation
Yolchuyeva	2021	Novel NLP Methods for Improved Text-To-Speech Synthesis
Bajec et al.	2020	Punctuation Restoration System for Slovene Language