Abstract

Relatively low-cost access to large amounts of multimedia data, such as over the WWW, has resulted in an increasing demand for multimedia data management. Audio data, however, has received relatively little research attention, mainly because it poses unique problems. Specifically, the unstructured nature of current audio representations considerably complicates content-based retrieval and, especially, browsing. This paper attempts to address this oversight by developing a representation based on the inherent, perceptually congruent structure of audio data. A survey of the pertinent issues is presented, including some of the limitations of current unstructured audio representations and of the existing retrieval systems based on them. The benefits of a structured representation are discussed, as are the perceptual issues used to identify the underlying structure of an audio data stream. Finally, the structured representation is described and its possible applications to retrieval and browsing are outlined.

Author information

Authors and Affiliations

School of Information Technology, Griffith University, QLD, Australia
Kathy Melih & Ruben Gonzalez

Identifying perceptually congruent structures for audio retrieval
