Wu et al., 2021 - Google Patents
Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN. Electronics 2021, 10, 228
- Document ID: 9282625478903747496
- Authors: Wu B; Liu C; Ishi C; Ishiguro H
- Publication year: 2021
Snippet
Co-speech gestures are a crucial, non-verbal modality for humans to communicate. Social agents also need this capability to be more human-like and comprehensive. This study aims to model the distribution of gestures conditioned on human speech features. Unlike previous …
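To make the modeling idea in the snippet concrete, the sketch below shows a minimal conditional GAN in PyTorch whose generator maps noise plus speech features to an upper-body pose sequence, and whose discriminator scores poses jointly with the same speech condition. All names, layer choices, and dimensions (NOISE_DIM, SPEECH_DIM, POSE_DIM, SEQ_LEN) are illustrative assumptions, not the authors' architecture; the unrolled-GAN discriminator updates used in the paper are omitted for brevity.

```python
# Minimal sketch of a speech-conditioned GAN for gesture generation.
# Sizes and the simple MLP layers are assumed for illustration only.
import torch
import torch.nn as nn

NOISE_DIM, SPEECH_DIM, POSE_DIM, SEQ_LEN = 32, 64, 30, 34  # assumed sizes

class Generator(nn.Module):
    """Maps (noise, speech features) to a short upper-body pose sequence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, SEQ_LEN * POSE_DIM),
        )
    def forward(self, z, speech):
        out = self.net(torch.cat([z, speech], dim=-1))
        return out.view(-1, SEQ_LEN, POSE_DIM)

class Discriminator(nn.Module):
    """Scores how plausible a pose sequence is given the same speech features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEQ_LEN * POSE_DIM + SPEECH_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, poses, speech):
        flat = poses.view(poses.size(0), -1)
        return self.net(torch.cat([flat, speech], dim=-1))

# One conditional-GAN training step with the standard BCE objective.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

speech = torch.randn(8, SPEECH_DIM)             # placeholder speech features
real_poses = torch.randn(8, SEQ_LEN, POSE_DIM)  # placeholder motion data

# Discriminator update: real vs. generated poses, both conditioned on speech.
z = torch.randn(8, NOISE_DIM)
fake_poses = G(z, speech).detach()
d_loss = bce(D(real_poses, speech), torch.ones(8, 1)) + \
         bce(D(fake_poses, speech), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator update: try to fool the conditioned discriminator.
z = torch.randn(8, NOISE_DIM)
g_loss = bce(D(G(z, speech), speech), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```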
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/004—Artificial life, i.e. computers simulating life
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
Similar Documents
| Publication | Title |
|---|---|
| Alexanderson et al. | Listen, denoise, action! Audio-driven motion synthesis with diffusion models |
| Nyatsanga et al. | A comprehensive review of data-driven co-speech gesture generation |
| Bhattacharya et al. | Text2Gestures: A transformer-based network for generating emotive body gestures for virtual agents |
| Ghorbani et al. | ZeroEGGS: Zero-shot example-based gesture generation from speech |
| Bhattacharya et al. | Speech2AffectiveGestures: Synthesizing co-speech gestures with generative adversarial affective expression learning |
| Chiu et al. | How to train your avatar: A data-driven approach to gesture generation |
| Gibet et al. | The SignCom system for data-driven animation of interactive virtual signers: Methodology and evaluation |
| Qi et al. | EmotionGesture: Audio-driven diverse emotional co-speech 3D gesture generation |
| Zhi et al. | LivelySpeaker: Towards semantic-aware co-speech gesture generation |
| US20250131631A1 (en) | Three-dimensional face animation from speech |
| Ondras et al. | Audio-driven robot upper-body motion synthesis |
| Yang et al. | Statistics-based motion synthesis for social conversations |
| US12456244B2 (en) | Autonomous animation in embodied agents |
| Rebol et al. | Passing a non-verbal Turing test: Evaluating gesture animations generated from speech |
| US20220215267A1 (en) | Processes and methods for enabling artificial general intelligence capable of flexible calculation, prediction, planning and problem solving with arbitrary and unstructured data inputs and outputs |
| WO2025025822A1 (en) | Method and apparatus for generating virtual object action, and computer device |
| Sadoughi et al. | Creating prosodic synchrony for a robot co-player in a speech-controlled game for children |
| Wang et al. | Integrated speech and gesture synthesis |
| CN113609301A (en) | Dialogue method, medium and system based on knowledge graph |
| Gao et al. | GesGPT: Speech gesture synthesis with text parsing from ChatGPT |
| Sun et al. | Beyond talking: Generating holistic 3D human dyadic motion for communication |
| Voß et al. | AQ-GT: A temporally aligned and quantized GRU-Transformer for co-speech gesture synthesis |
| Zhang et al. | Speech-driven personalized gesture synthetics: Harnessing automatic fuzzy feature inference |
| Oralbayeva et al. | Data-driven communicative behaviour generation: A survey |
| Lee et al. | Crossmodal clustered contrastive learning: Grounding of spoken language to gesture |