Wu et al., 2021 - Google Patents
Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN. Electronics 2021, 10, 228
- Document ID: 9282625478903747496
- Authors: Wu B; Liu C; Ishi C; Ishiguro H
- Publication year: 2021
Snippet
Co-speech gestures are a crucial, non-verbal modality for humans to communicate. Social agents also need this capability to be more human-like and comprehensive. This study aims to model the distribution of gestures conditioned on human speech features. Unlike previous …
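To make the modeling idea in the snippet concrete, the sketch below shows a minimal conditional GAN in PyTorch whose generator maps noise plus speech features to an upper-body pose sequence, and whose discriminator scores poses jointly with the same speech condition. All names, layer choices, and dimensions (NOISE_DIM, SPEECH_DIM, POSE_DIM, SEQ_LEN) are illustrative assumptions, not the authors' architecture; the unrolled-GAN discriminator updates used in the paper are omitted for brevity.

```python
# Minimal sketch of a speech-conditioned GAN for gesture generation.
# Sizes and the simple MLP layers are assumed for illustration only.
import torch
import torch.nn as nn

NOISE_DIM, SPEECH_DIM, POSE_DIM, SEQ_LEN = 32, 64, 30, 34  # assumed sizes

class Generator(nn.Module):
    """Maps (noise, speech features) to a short upper-body pose sequence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, SEQ_LEN * POSE_DIM),
        )
    def forward(self, z, speech):
        out = self.net(torch.cat([z, speech], dim=-1))
        return out.view(-1, SEQ_LEN, POSE_DIM)

class Discriminator(nn.Module):
    """Scores how plausible a pose sequence is given the same speech features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEQ_LEN * POSE_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, poses, speech):
        flat = poses.view(poses.size(0), -1)
        return self.net(torch.cat([flat, speech], dim=-1))

# One conditional-GAN training step with the standard BCE objective.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

speech = torch.randn(8, SPEECH_DIM)             # placeholder speech features
real_poses = torch.randn(8, SEQ_LEN, POSE_DIM)  # placeholder motion data

# Discriminator update: real vs. generated poses, both conditioned on speech.
z = torch.randn(8, NOISE_DIM)
fake_poses = G(z, speech).detach()
d_loss = bce(D(real_poses, speech), torch.ones(8, 1)) + \
         bce(D(fake_poses, speech), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator update: try to fool the conditioned discriminator.
z = torch.randn(8, NOISE_DIM)
g_loss = bce(D(G(z, speech), speech), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```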
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/004—Artificial life, i.e. computers simulating life
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
Similar Documents
| Publication | Title |
|---|---|
| Alexanderson et al. | Listen, denoise, action! Audio-driven motion synthesis with diffusion models |
| Nyatsanga et al. | A comprehensive review of data-driven co-speech gesture generation |
| Bhattacharya et al. | Text2Gestures: A transformer-based network for generating emotive body gestures for virtual agents |
| Ghorbani et al. | ZeroEGGS: Zero-shot example-based gesture generation from speech |
| Bhattacharya et al. | Speech2AffectiveGestures: Synthesizing co-speech gestures with generative adversarial affective expression learning |
| Chiu et al. | How to train your avatar: A data-driven approach to gesture generation |
| Gibet et al. | The SignCom system for data-driven animation of interactive virtual signers: Methodology and evaluation |
| Qi et al. | EmotionGesture: Audio-driven diverse emotional co-speech 3D gesture generation |
| Zhi et al. | LivelySpeaker: Towards semantic-aware co-speech gesture generation |
| US20250131631A1 (en) | Three-dimensional face animation from speech |
| Ondras et al. | Audio-driven robot upper-body motion synthesis |
| Yang et al. | Statistics-based motion synthesis for social conversations |
| US12456244B2 (en) | Autonomous animation in embodied agents |
| Rebol et al. | Passing a non-verbal Turing test: Evaluating gesture animations generated from speech |
| US20220215267A1 (en) | Processes and methods for enabling artificial general intelligence capable of flexible calculation, prediction, planning and problem solving with arbitrary and unstructured data inputs and outputs |
| WO2025025822A1 (en) | Method and apparatus for generating virtual object action, and computer device |
| Sadoughi et al. | Creating prosodic synchrony for a robot co-player in a speech-controlled game for children |
| Wang et al. | Integrated speech and gesture synthesis |
| CN113609301A (en) | Dialogue method, medium and system based on knowledge graph |
| Gao et al. | GesGPT: Speech gesture synthesis with text parsing from ChatGPT |
| Sun et al. | Beyond talking: Generating holistic 3D human dyadic motion for communication |
| Voß et al. | AQ-GT: A temporally aligned and quantized GRU-Transformer for co-speech gesture synthesis |
| Zhang et al. | Speech-driven personalized gesture synthetics: Harnessing automatic fuzzy feature inference |
| Oralbayeva et al. | Data-driven communicative behaviour generation: A survey |
| Lee et al. | Crossmodal clustered contrastive learning: Grounding of spoken language to gesture |