Automatic discovery of salient segments in imperfect speech transcripts

Authors:

Dulce Ponceleon,

Savitha Srinivasan

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management

Pages 490 - 497

https://doi.org/10.1145/502585.502668

Published: 05 October 2001

Abstract

This paper addresses the problem of automatically detecting salient video segments, based on the associated speech transcriptions, for real-world applications such as corporate training. We present a novel segmentation algorithm based on automatic speech recognition (ASR) applied to the audio track of the video. Our feature set consists of word n-grams extracted from the imperfect speech transcriptions. We use a two-pass algorithm that combines a boundary-based method with a content-based method. In the first pass, we analyze the temporal distribution and the rate of arrival of features to compute an initial segmentation. In the second pass, we detect changes in content-bearing words by using the content-bearing features as queries in an information retrieval system. The content-based second pass validates the initial segments and merges them as needed. Variations in the structure of the audio/video content and in the accuracy of ASR affect the feasibility of the segmentation task. For realistic data, we observe that we can identify content-rich segments of the audio. In the best scenario, a high-level table of contents is generated; in the worst scenario, a single salient segment is identified. We illustrate the algorithm in detail with some examples and validate the results against manual segmentation boundaries.
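The two-pass idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no parameters or formulas, so everything below is an assumption made for illustration. The function names (`first_pass`, `second_pass`, `cosine`), the `gap` and `threshold` parameters, and the input format (a list of `(time_in_seconds, feature)` pairs) are hypothetical. A simple silence-gap rule stands in for the analysis of temporal distribution and rate of arrival, and cosine similarity between feature counts stands in for querying an information retrieval system.

```python
from collections import Counter
from math import sqrt


def first_pass(arrivals, gap=30.0):
    """Boundary-based pass (assumed rule): start a new segment whenever
    more than `gap` seconds pass without a feature arriving."""
    segments, current, last_t = [], [], None
    for t, feat in sorted(arrivals):
        if last_t is not None and t - last_t > gap:
            segments.append(current)
            current = []
        current.append((t, feat))
        last_t = t
    if current:
        segments.append(current)
    return segments


def cosine(a, b):
    """Cosine similarity between two feature-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def second_pass(segments, threshold=0.5):
    """Content-based pass (assumed rule): merge a segment into the
    previous one when their feature counts are similar enough."""
    merged = []
    for seg in segments:
        if merged:
            prev = Counter(f for _, f in merged[-1])
            curr = Counter(f for _, f in seg)
            if cosine(prev, curr) >= threshold:
                merged[-1] = merged[-1] + list(seg)
                continue
        merged.append(list(seg))
    return merged
```

For example, with features arriving at 0, 5, 50, 55, 120 and 125 seconds and `gap=30.0`, the first pass produces three initial segments; if the first two share most of their n-grams and the third shares none, the second pass merges the first two and leaves the third as its own segment.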

Index Terms

Automatic discovery of salient segments in imperfect speech transcripts
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing
      1. Speech recognition

Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management

October 2001

616 pages

ISBN:1581134363

DOI:10.1145/502585

Editors: Henrique Paques (Georgia Institute of Technology), Ling Liu (Georgia Institute of Technology), David Grossman (Illinois Institute of Technology)

General Chair: Calton Pu (Georgia Institute of Technology)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM01: International Conference on Information and Knowledge Management

October 5 - 10, 2001

Atlanta, Georgia, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
