Baas et al., 2022 - Google Patents
Transfusion: Transcribing speech with multinomial diffusionBaas et al., 2022
View PDF- Document ID
- 1934448473787856302
- Author
- Baas M
- Eloff K
- Kamper H
- Publication year
- Publication venue
- Southern African Conference for Artificial Intelligence Research
External Links
Snippet
Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to iteratively refine a sampled noise signal …
- 238000009792 diffusion process 0 title abstract description 117
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Liu et al. | Audio self-supervised learning: A survey | |
| Kürzinger et al. | Ctc-segmentation of large corpora for german end-to-end speech recognition | |
| US12175202B2 (en) | Enhanced attention mechanisms | |
| Le et al. | Deep shallow fusion for RNN-T personalization | |
| US11568000B2 (en) | System and method for automatic task-oriented dialog system | |
| US20210390271A1 (en) | Neural machine translation systems | |
| US11423237B2 (en) | Sequence transduction neural networks | |
| Inaguma et al. | Orthros: Non-autoregressive end-to-end speech translation with dual-decoder | |
| Sullivan et al. | Improving automatic speech recognition for non-native english with transfer learning and language model decoding | |
| CN120077431A (en) | End-to-end speech recognition for multi-speaker applications | |
| Baas et al. | Transfusion: Transcribing speech with multinomial diffusion | |
| Radzikowski et al. | Dual supervised learning for non-native speech recognition | |
| Zheng et al. | Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach. | |
| Sun et al. | Graph neural networks for contextual ASR with the tree-constrained pointer generator | |
| Bataev et al. | Exploring end-to-end techniques for low-resource speech recognition | |
| Dumyn et al. | Review of automatic speech recognition systems for Ukrainian and english language | |
| Andrusenko et al. | Exploration of end-to-end asr for openstt–russian open speech-to-text dataset | |
| Naowarat et al. | Reducing spelling inconsistencies in code-switching ASR using contextualized CTC loss | |
| Kim et al. | Accurate semi-supervised automatic speech recognition via multi-hypotheses-based curriculum learning | |
| Hu et al. | How question generation can help question answering over knowledge base | |
| Fang et al. | Multi-head attention with hint mechanisms for joint extraction of entity and relation | |
| Altinok | Boosting CTC-Based ASR Using LLM-Based Intermediate Loss Regularization | |
| Zhang et al. | Robust dialog state tracker with contextual-feature augmentation | |
| Yolchuyeva | Novel NLP Methods for Improved Text-To-Speech Synthesis | |
| Bajec et al. | Punctuation Restoration System for Slovene Language |