Research article
DOI: 10.1145/1924035.1924042

Photorealistic 2D audiovisual text-to-speech synthesis using active appearance models

Published: 21 October 2010

Abstract

Audiovisual text-to-speech (AVTTS) synthesizers generate a synthetic audiovisual speech signal from an input text. One approach is model-based synthesis, in which the talking head is a 3D model whose polygons are deformed in accordance with the target speech. In contrast to such model-based systems, data-driven synthesizers construct the target speech by reusing pre-recorded natural speech samples. The system we developed at the Vrije Universiteit Brussel is a data-driven 2D photorealistic synthesizer that produces a synthetic visual speech signal resembling standard 'newsreader-style' television recordings.
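The data-driven approach sketched above, selecting and concatenating pre-recorded speech units, can be illustrated with a toy unit-selection search. This is a hedged sketch, not the authors' system: `select_units`, the `@frame` unit naming, and the `tcost`/`jcost` functions are all hypothetical stand-ins for a real target cost and join cost.

```python
# Toy unit-selection sketch (illustration only, not the authors' actual
# synthesizer): for each target phoneme, pick one candidate unit from a
# recorded database so that the sum of a target cost (unit/phoneme
# mismatch) and a join cost (discontinuity between consecutive units)
# is minimal, via dynamic programming over the candidate lattice.

def select_units(targets, database, target_cost, join_cost):
    # stages[i] maps each candidate unit at position i to
    # (best cumulative cost, back-pointer to the previous unit).
    prev = {u: (target_cost(targets[0], u), None) for u in database[targets[0]]}
    stages = [prev]
    for t in targets[1:]:
        cur = {}
        for u in database[t]:
            best_p = min(prev, key=lambda q: prev[q][0] + join_cost(q, u))
            cur[u] = (prev[best_p][0] + join_cost(best_p, u) + target_cost(t, u),
                      best_p)
        stages.append(cur)
        prev = cur
    # Backtrack the cheapest path through the lattice.
    u = min(prev, key=lambda k: prev[k][0])
    path = [u]
    for i in range(len(stages) - 1, 0, -1):
        u = stages[i][u][1]
        path.append(u)
    return path[::-1]

# Hypothetical toy database: "h@3" = an /h/ unit cut at frame 3 of the corpus.
db = {"h": ["h@3", "h@7"], "e": ["e@4", "e@9"]}
tcost = lambda label, unit: 0.0  # toy: all candidates match equally well
jcost = lambda p, u: abs(int(p.split("@")[1]) - int(u.split("@")[1]))  # prefer nearby cuts

print(select_units(["h", "e"], db, tcost, jcost))  # ['h@3', 'e@4']
```

Real systems replace these toy costs with acoustic and visual distance measures, but the lattice search itself is the core of unit selection.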


Cited By

  • (2021) Iterative Text-Based Editing of Talking-Heads Using Neural Retargeting. ACM Transactions on Graphics 40(3), 1-14. DOI: 10.1145/3449063. Online publication date: Aug 2021.
  • (2019) Text-based editing of talking-head video. ACM Transactions on Graphics 38(4), 1-14. DOI: 10.1145/3306346.3323028. Online publication date: 12 Jul 2019.
  • (2018) Audiovisual speech synthesis. Speech Communication 66(C), 182-217. DOI: 10.1016/j.specom.2014.11.001. Online publication date: 30 Dec 2018.
  • (2013) Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis. Speech Communication 55(7-8), 857-876. DOI: 10.1016/j.specom.2013.02.005. Online publication date: Sep 2013.



Published In

FAA '10: Proceedings of the SSPNET 2nd International Symposium on Facial Analysis and Animation
October 2010
25 pages
ISBN:9781450303880
DOI:10.1145/1924035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • SSPNET: Social Signal Processing Network

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. active appearance models
  2. audiovisual text-to-speech synthesis
  3. unit selection

Qualifiers

  • Research-article

Conference

FAA '10
Sponsor:
  • SSPNET

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 07 Sep 2024
