US20070276668A1

US20070276668A1 - Method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Info

Publication number: US20070276668A1
Application number: US11/439,760
Authority: US
Inventors: Jun Xu; Huayun Zhang
Original assignee: Creative Technology Ltd
Current assignee: Creative Technology Ltd
Priority date: 2006-05-23
Filing date: 2006-05-23
Publication date: 2007-11-29
Also published as: TWI454942B; US8892565B2; CN101454778B; US20110238666A1; TW200813759A; WO2007136349A1; CN101454778A

Abstract

A method and apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a voice reception mode; converting the vocal input into a digital signal using a digital-analog converter; analysing the digital signal using frequency spectrum analysis into discrete portions; and comparing the discrete portions with the entries in the index. It is advantageous that the audio file is accessed when the discrete portions substantially match at least one of the information entries in the index. It is preferable that the discrete portions are either musical notes or waveforms.

Description

FIELD OF INVENTION

This invention relates to a method and apparatus for accessing an audio file from a collection of audio files, and particularly relates to the accessing of files using tonal matching.

BACKGROUND

The advent of the age of affordable digital entertainment has given rise to a sharp increase in the adoption of personal digital entertainment devices by consumers. Such personal digital entertainment devices are usually equipped with storage capacities of a range of sizes. Given the falling prices of storage devices like hard drives and flash memory, an increasing number of personal digital entertainment devices come with storage capacities exceeding 1 GB. Storage capacities of such sizes in personal digital entertainment devices used for audio files enable the storage of hundreds and even thousands of files.
While the audio files may be stored and categorisable according to their song titles, artistes, genre or the like, there may be instances where a user may forget the title or artiste of a song, rendering a search for the pertinent audio file akin to searching for a needle in a haystack. In many instances, the user may only be able to remember a portion of the song or its tune. At the present moment, this does not aid in the search for the pertinent audio file in any way. This is a problem when attempting to access audio files in a large collection of audio files where certain information like title or artiste of a song is unknown. This problem also arises when visually impaired users attempt to access audio files in a collection of audio files, since they are unable to select the audio files by sight.
It is also rather difficult to improve one's vocal prowess without engaging expensive vocal coaches. It is currently difficult to improve one's vocal prowess independently besides using karaoke machines with “scoring” functionalities incorporated in them. There are currently few devices available which are able to determine the quality of one's vocal prowess easily and conveniently.

SUMMARY OF INVENTION

In a preferred aspect of the present invention, there is provided a method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a voice reception mode; converting the vocal input into a digital signal using a digital-analog converter; analysing the digital signal using frequency spectrum analysis into discrete portions; and comparing the discrete portions with the entries in the index. It is advantageous that the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index. It is preferable that the discrete portions are either musical notes or waveforms. The at least one information entry may also be musical notes or waveforms.
The vocal input may preferably be speaker independent and may be in the form of singing, humming, or whistling. The form of vocal input may preferably be manually or automatically selectable.
It is preferable that the audio file is accessible from the electronic device itself, a device functionally connected to the electronic device or a connected computer network. The information entry may also preferably be received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network. It is preferable that the electronic device is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
It is also preferable that the method further includes selecting a facility to access the audio files by depressing a pre-determined button at least once, and filtering the vocal input.
There is also provided an apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus. It is preferable that the apparatus includes an indexer for generating an index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; a vocal reception means for receiving a vocal input during a vocal reception mode; a digital-analog converter for converting the vocal input into a digital signal; and a processor to analyse the digital signal using frequency spectrum analysis into discrete portions, the processor also being able to compare the discrete portions with the entries in the index. Advantageously, the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index. The apparatus may include a display and the vocal input may be filtered. The vocal reception mode may be activated by depressing at least one button at least once. It is preferable that the discrete portions are musical notes or waveforms.
It is preferable that the apparatus is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
It is preferable that the vocal input is either manually or automatically selected from the group comprising: singing, humming, and whistling. Advantageously, the vocal input is speaker independent. The at least one information entry may be selected from either musical notes or waveforms. Preferably, the at least one information entry is received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network. The audio file may be accessible from the electronic device itself, any device functionally connected to the electronic device or a connected computer network.
There is also provided a method of determining a level of quality for vocal input using the aforementioned apparatus.

DESCRIPTION OF DRAWINGS

In order that the present invention may be fully understood and readily put into practical effect, there shall now be described by way of non-limitative example only preferred embodiments of the present invention, the description being with reference to the accompanying illustrative drawings.

FIG. 1 shows a flow chart of a method of a preferred embodiment of the present invention.

FIG. 2 shows a schematic diagram of an apparatus of a preferred embodiment of the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

The following discussion is intended to provide a brief, general description of a suitable computing environment in which the present invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, characters, components, data structures and the like that perform particular tasks or implement particular abstract data types. As those skilled in the art will appreciate, the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Referring to FIG. 1, there is provided a flow chart of a method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The electronic device may be, for example, a vehicle audio system, a desktop computer, a notebook computer, a PDA, a portable media player or a mobile phone and the like. The method may include an enablement of a vocal reception mode (20) in the electronic device in a manner like, for example, depressing a pre-determined button on the electronic device at least once. The vocal reception mode may be enabled or disabled as it may prevent a power source in the electronic device from being continually drained by continual enablement of the vocal reception mode. The vocal reception mode may be for vocal input such as, for example, singing, humming, or whistling.
The enablement of the vocal reception mode in the electronic device may initialise an indexing system (24). Once the indexing system is initiated, the system then determines whether the composition of audio files in the collection has changed (26). The composition of audio files may include the number of audio files and the audio filenames. The index may comprise information entries obtained from each of the more than one audio file in the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network. Connection to the computer network may be via wired or wireless means. Each audio file in the collection may be linked to at least one information entry in the index. The at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files. The information entry may also be a MIDI component that is linked/attached to an audio file like file metadata. The information entry may also be obtainable from a pre-recorded vocal entry linked/attached to the audio file, or a connected computer network. There may be an online database on the connected computer network where information entries of musical notes or waveforms are downloadable for each audio file.
If the composition of audio files is found to be different, a search is conducted on the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network (28). This step is to determine whether audio files have been added to or removed from the collection. Subsequent to the search, information entries obtained from each audio file directly (25), information entries downloaded from the connected computer network for each audio file (29), or pre-recorded vocal entries linked to each audio file (23) may be combined into an index (30). The index is then loaded for use (32) in the electronic device.
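The change check and index reload described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the composition of the collection is summarised as a hash of the file count and filenames, and `extract_entry` is a hypothetical callback standing in for whichever source supplies an information entry (the audio file itself, a download from the connected network, or a pre-recorded vocal entry). The names `collection_signature` and `load_index` are likewise illustrative.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumption: an information entry is stored as a list of MIDI note numbers.
Index = Dict[str, List[int]]


def collection_signature(filenames: Iterable[str]) -> str:
    """Summarise the composition of the collection (file count and names)."""
    names = sorted(filenames)
    digest = hashlib.sha256()
    digest.update(str(len(names)).encode("utf-8"))
    for name in names:
        digest.update(b"\0" + name.encode("utf-8"))
    return digest.hexdigest()


def load_index(
    filenames: Iterable[str],
    saved_signature: Optional[str],
    saved_index: Optional[Index],
    extract_entry: Callable[[str], List[int]],
) -> Tuple[str, Index]:
    """Return (signature, index), rebuilding only if the composition changed."""
    names = list(filenames)
    signature = collection_signature(names)
    if signature == saved_signature and saved_index is not None:
        # Composition unchanged: reuse the last used index.
        return signature, saved_index
    # Composition changed: gather an entry for every file and combine them.
    index = {name: extract_entry(name) for name in names}
    return signature, index
```

A caller would persist the returned signature alongside the index and pass both back in the next time the vocal reception mode is enabled.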
If the composition of audio files is found to be unchanged, the last used index is then loaded for use (32) in the electronic device. With the enablement of the vocal reception mode, there may be vocal input into the device (34). The vocal input may be singing, humming, or whistling. In a particular instance, the vocal input need not be a song in its entirety. A portion of a song may be sufficient as a viable form of the vocal input. The vocal input may be filtered. A user may be able to manually select a specific vocal input (22) for the vocal reception mode. There may also be automatic detection of vocal input (22). Vocal reception by the electronic device may be speaker independent. The vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible). The electronic device may also be able to overcome the problem of an off tune vocal input during the vocal input mode by providing a selection of audio files that most closely approximates to the off tune vocal input based on the entries of the audio files in the index. The user may set the device to show the closest approximations up to a pre-determined number, such as, for example, the ten closest approximations.
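One simple way to picture the automatic volume correction mentioned above is peak normalisation, sketched below. The description does not say how the correction is performed, so the technique, the function name `correct_volume`, and the parameters `target_peak` and `floor` are all assumptions made for illustration. Samples are assumed to be floats in the range -1.0 to 1.0. Scaling after capture cannot undo distortion that has already occurred, so a real device would more likely adjust input gain before conversion.

```python
from typing import List, Sequence


def correct_volume(
    samples: Sequence[float],
    target_peak: float = 0.8,
    floor: float = 1e-4,
) -> List[float]:
    """Scale samples so that the loudest one sits at target_peak.

    Input whose peak is below `floor` is treated as silence and returned
    unchanged, to avoid amplifying pure noise by a huge factor.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < floor:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

With this sketch, an input that is too soft is boosted and an input that is too loud is attenuated to the same target level.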
Subsequently, the vocal input in analog form is converted into digital signals by a digital-analog converter (36). The converter may be an analog-MIDI converter. Thereafter, a processor in the electronic device may analyse the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals may be done using frequency spectrum analysis. The processor may then compare the discrete portions with entries in the index (40). Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity (42). The listing may show a number of audio files, a number that may be pre-determined by the user and may be shown on a display on the electronic device. The extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
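As a rough illustration of analysing a digital signal into musical notes by frequency spectrum analysis, the sketch below finds the strongest frequency in one short frame with a plain discrete Fourier transform and maps it to the nearest MIDI note number. The frame-by-frame approach, the 440 Hz reference for A4 (MIDI note 69), and the function names are assumptions; the description does not specify them. The naive transform is quadratic in the frame length and is meant for clarity only.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(frame: Sequence[float], sample_rate: int) -> float:
    """Return the frequency (Hz) of the strongest DFT bin in the frame."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    # Skip bin 0 (DC offset) and look only below the Nyquist frequency.
    for k in range(1, n // 2):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mag = abs(acc)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n


def frequency_to_midi(freq_hz: float) -> int:
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / 440.0))
```

Applying these two functions to successive frames of the input would yield a sequence of note numbers, which is one possible form for the discrete portions that are compared with the index.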
Referring to FIG. 2, there is provided an apparatus 50 for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus 50. The apparatus 50 may be for example, a vehicle audio system, desktop computer, notebook computer, PDA, portable media player or mobile phone. The components described in the following sections may be incorporated in the aforementioned different forms of the apparatus 50 in addition to components used for their primary functionalities.
The apparatus 50 may include a digital storage device 58 for the storage of the audio files that make up the collection of files. The digital storage device 58 may be non-volatile memory in the form of a hard disk drive or flash memory. The digital storage device 58 may have capacities of at least a few megabytes.
In addition, the apparatus 50 may also include an indexer 56 for generating an index comprising information entries obtained from each of the more than one audio file in the collection. The index may comprise information entries obtained from each of the more than one audio file in a collection of audio files stored in the digital storage device 58 of the apparatus 50, any device functionally connected to the apparatus 50 or a connected computer network. Each audio file in the collection may be linked to at least one information entry in the index. The at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files. The information entry may also be a MIDI component that is linked/attached to an audio file like file metadata. The information entry may also be obtainable from a pre-recorded vocal entry linked/attached to the audio file, or a connected computer network. There may be an online database on the connected computer network where information entries of musical notes or waveforms are downloadable for each audio file.
A vocal reception means 64 for receiving a vocal input during a vocal reception mode may also be included in the apparatus 50. The vocal reception means 64 may be a microphone. The vocal input may be singing, humming, or whistling. In a particular instance, the vocal input need not be a song in its entirety. A portion of a song may be sufficient as a viable form of the vocal input. The vocal input may also be filtered. There may be a selector to choose the type of vocal input, or detection of vocal input may be automatic. The vocal reception mode may be activated by pressing an activating button 63 incorporated with the apparatus 50 at least once. Vocal input into the vocal reception means 64 may be speaker independent. The vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible). The electronic device may also be able to overcome the problem of an off tune vocal input during the vocal input mode by providing a selection of audio files that most closely approximates to the off tune vocal input based on the entries of the audio files in the index. The user may set the device to show the closest approximations up to a pre-determined number, such as, for example, the ten closest approximations.
The vocal reception means 64 may be coupled to a digital-analog converter 62 which converts the vocal input through the vocal reception means 64 into digital signals. The converter 62 may be an analog-MIDI converter. The converted digital signals are then passed into a processor 60 for analysis of the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals by the processor 60 may be done using frequency spectrum analysis. The processor 60 may then be able to compare the discrete portions of the signals with the entries in the index generated by the indexer 56. Audio files may thereby be accessible when the discrete portions substantially coincide with at least one of the information entries in the index. Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity. The listing may show a number of audio files, a number that may be pre-determined by the user. A display 54 in the apparatus 50 allows for the listing of files to be shown clearly for selection by the user. The extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
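The comparison and ordered listing described above might be sketched as follows, assuming both the vocal input and the index entries are sequences of MIDI note numbers. The similarity measure is not specified in the description; this sketch swaps in an edit distance over pitch intervals, computed against any contiguous part of an entry, so that a sung portion of a song in a different key can still match. The function names and the default `limit` of ten (taken from the ten closest approximations mentioned earlier) are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def intervals(notes: Sequence[int]) -> List[int]:
    """Pitch steps between successive notes; unchanged by transposition."""
    return [b - a for a, b in zip(notes, notes[1:])]


def substring_distance(query: Sequence[int], entry: Sequence[int]) -> int:
    """Smallest edit distance between query and any contiguous part of entry."""
    # prev[j]: cost of matching the query so far against a part of the
    # entry ending at position j. An empty query matches anywhere for free.
    prev = [0] * (len(entry) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(entry)
        for j, e in enumerate(entry, start=1):
            cost = 0 if q == e else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return min(prev)


def rank_matches(
    query_notes: Sequence[int],
    index: Dict[str, List[int]],
    limit: int = 10,
) -> List[Tuple[str, int]]:
    """List up to `limit` files, closest match first, with their distances."""
    q = intervals(query_notes)
    scored = sorted(
        (substring_distance(q, intervals(notes)), name)
        for name, notes in index.items()
    )
    return [(name, dist) for dist, name in scored[:limit]]
```

A distance of zero corresponds to an exact match on intervals, and small non-zero distances correspond to substantial similarity, such as an off-tune input.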
The visually impaired may be able to use apparatus 50 to access files stored within or accessible with the apparatus 50 using tonal matching. While they are unable to select the files shown on the display 54, they may access the audio file which has been extracted from the collection at their convenience just from using vocal input.
An alternative application of the present invention makes use of the vocal reception mode of the electronic device to ascertain and improve vocal abilities of users. For example, if a user repeatedly fails to find a desired audio file through the use of vocal input into the electronic device, it is highly probable that the user's vocal input (prowess) is flawed. Thus the user is then inclined to continually practice vocal input into the electronic device until improvement is attained in terms of a higher incidence of finding a desired audio file. Thus, a device to conveniently ascertain a level of quality for vocal input is also disclosed.
Whilst there has been described in the foregoing description preferred embodiments of the present invention, it will be understood by those skilled in the technology concerned that many variations or modifications in details of design or construction may be made without departing from the present invention.

Claims

1. A method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device, including:

generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry;

receiving a vocal input during a voice reception mode;

converting the vocal input into a digital signal using a digital-analog converter;

analysing the digital signal using frequency spectrum analysis into discrete portions; and

comparing the discrete portions with the information entries in the index,

wherein the at least one audio file is accessed when the discrete portions substantially match at least one information entry in the index.

2. The method of claim 1, wherein the discrete portions are selected from the group consisting of: musical notes and waveforms.

3. The method of claim 1, wherein the vocal input is selected from the group consisting of: singing, humming, and whistling.

4. The method of claim 1, wherein the at least one information entry is selected from the group consisting of: musical notes and waveforms.

5. The method of claim 1, wherein the audio file is accessible from a source selected from the group consisting of: the electronic device, any device functionally connected to the electronic device and a connected computer network.

6. The method of claim 3, wherein the vocal input is set by means selected from the group consisting of: manual selection and automatic selection.

7. The method of claim 1, wherein the vocal input is speaker independent.

8. The method of claim 1, wherein the at least one information entry is received from a source selected from the group consisting of: the audio file, a pre-recorded vocal entry linked to the audio file, and a connected computer network.

9. The method of claim 1, wherein the electronic device is selected from the group consisting of: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.

10. The method of claim 1, further including selecting a facility to access the audio files by depressing a pre-determined button at least once.

11. The method of claim 1, further including filtering the vocal input.

12. An apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus, including:

an indexer configured to generate an index comprising information entries obtained from each of the more than one audio files in the collection, with each audio file in the collection being linked to at least one information entry;

a vocal receiver configured to receive a vocal input during a vocal reception mode;

a digital-analog converter configured to convert the vocal input into a digital signal; and

a processor configured to analyse the digital signal using frequency spectrum analysis into discrete portions and to compare the discrete portions with the information entries in the index,

13. The apparatus of claim 12, wherein the apparatus is selected from the group consisting of: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.

14. The apparatus of claim 12, wherein the vocal input is selected from the group consisting of: singing, humming, and whistling.

15. The apparatus of claim 14, wherein the vocal input is set by means selected from the group consisting of: manual selection and automatic selection.

16. The apparatus of claim 12, wherein the at least one information entry is selected from the group consisting of: musical notes and waveforms.

17. The apparatus of claim 12, wherein the vocal input is speaker independent.

18. The apparatus of claim 12, wherein the at least one information entry is received from a source selected from the group consisting of: the audio file, a pre-recorded vocal entry linked to the audio file, and a connected computer network.

19. The apparatus of claim 12, wherein the vocal reception mode is activated by depressing at least one button at least once.

20. The apparatus of claim 12, further including a display.

21. The apparatus of claim 12, wherein the vocal input is filtered.

22. The apparatus of claim 12, wherein the discrete portions are selected from the group consisting of: musical notes and waveforms.

23. The apparatus of claim 12, wherein the audio file is accessible from a source selected from the group consisting of: the electronic device, any device functionally connected to the electronic device and a connected computer network.

24. A method of determining a level of quality for vocal input using the apparatus of claim 12.

25. A method for accessing at least one audio file from a collection of audio files stored within or accessible with an electronic device, the method comprising:

generating an index comprising information entries obtained from audio files in the collection, each audio file in the collection having at least one corresponding information entry in the index;

analysing a digital signal into discrete portions, the digital signal being obtained from a converted vocal input received during a voice reception mode; and

comparing the discrete portions with the information entries in the index,

26. The method according to claim 25, wherein the digital signal is analysed into discrete portions using frequency spectrum analysis.

27. The method according to claim 25, wherein the vocal input is converted into the digital signal using a digital analog converter.