Abstract
Relatively low-cost access to large amounts of multimedia data, such as over the WWW, has resulted in an increasing demand for multimedia data management. Audio data, however, has received relatively little research attention, mainly because it poses unique problems. Specifically, the unstructured nature of current audio representations considerably complicates content-based retrieval and, especially, browsing. This paper attempts to address this oversight by developing a representation based on the inherent, perceptually congruent structure of audio data. A survey of the pertinent issues is presented, including some of the limitations of current unstructured audio representations and of the existing retrieval systems based on them. The benefits of a structured representation are discussed, as are the relevant perceptual issues used to identify the underlying structure of an audio data stream. Finally, the structured representation is described and its possible applications to retrieval and browsing are outlined.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Melih, K., Gonzalez, R. (1998). Identifying perceptually congruent structures for audio retrieval. In: Plagemann, T., Goebel, V. (eds) Interactive Distributed Multimedia Systems and Telecommunication Services. IDMS 1998. Lecture Notes in Computer Science, vol 1483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0055311
DOI: https://doi.org/10.1007/BFb0055311
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64955-7
Online ISBN: 978-3-540-49914-5
eBook Packages: Springer Book Archive