Research article
DOI: 10.1145/1924035.1924042

Photorealistic 2D audiovisual text-to-speech synthesis using active appearance models

Published: 21 October 2010

Abstract

Audiovisual text-to-speech (AVTTS) synthesizers generate a synthetic audiovisual speech signal from an input text. One approach is model-based synthesis, in which the talking head is a 3D model whose polygons are deformed in accordance with the target speech. In contrast to such model-based systems, data-driven synthesizers construct the target speech by reusing pre-recorded natural speech samples. The system we developed at the Vrije Universiteit Brussel is a data-driven 2D photorealistic synthesizer that produces a synthetic visual speech signal resembling standard 'newsreader-style' television recordings.
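The data-driven approach sketched above, selecting and concatenating pre-recorded speech units, can be illustrated with a toy unit-selection search. This is a hedged sketch, not the authors' system: `select_units`, the `@frame` unit naming, and the `tcost`/`jcost` functions are all hypothetical stand-ins for a real target cost and join cost.

```python
# Toy unit-selection sketch (illustration only, not the authors' actual
# synthesizer): for each target phoneme, pick one candidate unit from a
# recorded database so that the sum of a target cost (unit/phoneme
# mismatch) and a join cost (discontinuity between consecutive units)
# is minimal, via dynamic programming over the candidate lattice.

def select_units(targets, database, target_cost, join_cost):
    # stages[i] maps each candidate unit at position i to
    # (best cumulative cost, back-pointer to the previous unit).
    prev = {u: (target_cost(targets[0], u), None) for u in database[targets[0]]}
    stages = [prev]
    for t in targets[1:]:
        cur = {}
        for u in database[t]:
            best_p = min(prev, key=lambda q: prev[q][0] + join_cost(q, u))
            cur[u] = (prev[best_p][0] + join_cost(best_p, u) + target_cost(t, u),
                      best_p)
        stages.append(cur)
        prev = cur
    # Backtrack the cheapest path through the lattice.
    u = min(prev, key=lambda k: prev[k][0])
    path = [u]
    for i in range(len(stages) - 1, 0, -1):
        u = stages[i][u][1]
        path.append(u)
    return path[::-1]

# Hypothetical toy database: "h@3" = an /h/ unit cut at frame 3 of the corpus.
db = {"h": ["h@3", "h@7"], "e": ["e@4", "e@9"]}
tcost = lambda label, unit: 0.0  # toy: all candidates match equally well
jcost = lambda p, u: abs(int(p.split("@")[1]) - int(u.split("@")[1]))  # prefer nearby cuts

print(select_units(["h", "e"], db, tcost, jcost))  # ['h@3', 'e@4']
```

Real systems replace these toy costs with acoustic and visual distance measures, but the lattice search itself is the core of unit selection.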


Cited By

  • (2021) Iterative Text-Based Editing of Talking-Heads Using Neural Retargeting. ACM Transactions on Graphics 40(3), 1-14. DOI: 10.1145/3449063. Online publication date: Aug 2021.
  • (2019) Text-based editing of talking-head video. ACM Transactions on Graphics 38(4), 1-14. DOI: 10.1145/3306346.3323028. Online publication date: 12 Jul 2019.
  • (2018) Audiovisual speech synthesis. Speech Communication 66(C), 182-217. DOI: 10.1016/j.specom.2014.11.001. Online publication date: 30 Dec 2018.
  • (2013) Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis. Speech Communication 55(7-8), 857-876. DOI: 10.1016/j.specom.2013.02.005. Online publication date: Sep 2013.



Published In

FAA '10: Proceedings of the SSPNET 2nd International Symposium on Facial Analysis and Animation
October 2010
25 pages
ISBN:9781450303880
DOI:10.1145/1924035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • SSPNET: Social Signal Processing Network

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. active appearance models
  2. audiovisual text-to-speech synthesis
  3. unit selection

Qualifiers

  • Research-article

Conference

FAA '10
Sponsor:
  • SSPNET

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 07 Sep 2024
