DOI: 10.1145/502585.502668
Article

Automatic discovery of salient segments in imperfect speech transcripts

Published: 05 October 2001

Abstract

This paper addresses the problem of automatically detecting salient video segments, based on the associated speech transcriptions, for real-world applications such as corporate training. We present a novel segmentation algorithm based on automatic speech recognition (ASR) applied to the audio track of the video. Our feature set consists of word n-grams extracted from the imperfect speech transcriptions. We use a two-pass algorithm that combines a boundary-based method with a content-based method. In the first pass, we analyze the temporal distribution and the rate of arrival of features to compute an initial segmentation. In the second pass, we detect changes in content-bearing words by using the content-bearing features as queries in an information retrieval system. The content-based second pass validates the initial segments and merges them as needed. Variations in the structure of the audio/video content and in the accuracy of ASR both affect the feasibility of the segmentation task. For realistic data, we observe that we can identify content-rich segments of the audio. In the best scenario, a high-level table of contents is generated; in the worst scenario, a single salient segment is identified. We illustrate the algorithm in detail with some examples and validate the results against manual segmentation boundaries.
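The two-pass idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every name and parameter here (`first_pass`, `second_pass`, `max_gap`, `min_sim`, the tiny stopword list) is a hypothetical stand-in. Single content words stand in for word n-grams, a fixed time-gap threshold stands in for the rate-of-arrival analysis, and cosine similarity between adjacent segments stands in for querying an information retrieval system.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "we", "now"}


def content_words(tokens):
    """Keep (time, word) pairs whose lowercased word is not a stopword."""
    return [(t, w.lower()) for t, w in tokens if w.lower() not in STOPWORDS]


def first_pass(features, max_gap):
    """Boundary-based pass (assumed form): start a new segment whenever the
    gap between consecutive feature arrivals exceeds max_gap seconds."""
    segments = []
    for t, w in features:
        if segments and t - segments[-1][-1][0] <= max_gap:
            segments[-1].append((t, w))
        else:
            segments.append([(t, w)])
    return segments


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def second_pass(segments, min_sim):
    """Content-based pass (assumed form): merge a segment into its
    predecessor when their word distributions are similar enough."""
    merged = []
    for seg in segments:
        if merged:
            prev = Counter(w for _, w in merged[-1])
            cur = Counter(w for _, w in seg)
            if cosine(prev, cur) >= min_sim:
                merged[-1] = merged[-1] + seg
                continue
        merged.append(list(seg))
    return merged


def segment(tokens, max_gap=5.0, min_sim=0.3):
    """Return (start_time, end_time) pairs for the final segments."""
    feats = content_words(tokens)
    final = second_pass(first_pass(feats, max_gap), min_sim)
    return [(seg[0][0], seg[-1][0]) for seg in final]


# Toy transcript: (time in seconds, recognized word).
tokens = [
    (0.0, "network"), (0.5, "the"), (1.0, "security"), (2.0, "network"),
    (10.0, "security"), (11.0, "network"),
    (30.0, "budget"), (31.0, "planning"),
]
print(segment(tokens))
```

In this toy input, the first pass splits the transcript at the two long pauses, and the second pass merges the first two pieces because they share the same vocabulary, leaving one segment about networking and one about budgeting.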


Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

CIKM01

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


