
US20070276668A1 - Method and apparatus for accessing an audio file from a collection of audio files using tonal matching - Google Patents

Info

Publication number
US20070276668A1
Authority
US
United States
Prior art keywords
audio file
vocal
collection
discrete portions
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/439,760
Inventor
Jun Xu
Huayun Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Creative Technology Ltd
Original Assignee
Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Creative Technology Ltd
Priority to US11/439,760 (US20070276668A1)
Assigned to CREATIVE TECHNOLOGY LTD. Assignment of assignors interest (see document for details). Assignors: XU, JUN; ZHANG, HUAYUN
Priority to PCT/SG2007/000140 (WO2007136349A1)
Priority to US12/301,878 (US8892565B2)
Priority to CN2007800190803A (CN101454778B)
Priority to TW096118334A (TWI454942B)
Publication of US20070276668A1
Legal status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63: Querying
    • G06F16/632: Query formulation
    • G06F16/634: Query by example, e.g. query by humming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • This invention relates to a method and apparatus for accessing an audio file from a collection of audio files, and particularly relates to the accessing of files using tonal matching.
  • While the audio files may be stored and categorised according to their song titles, artistes, genre or the like, there may be instances where a user forgets the title or artiste of a song, rendering a search for the pertinent audio file akin to searching for a needle in a haystack. In many instances, the user may only be able to remember a portion of the song or its tune. At present, such a partial recollection does not aid the search for the pertinent audio file in any way. This is a problem when attempting to access audio files in a large collection where certain information, like the title or artiste of a song, is unknown. The problem also arises when the visually impaired attempt to access audio files in a collection and are unable to select the audio files by sight.
  • a method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device.
  • the method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a vocal reception mode; converting the vocal input into a digital signal using a digital-analog converter; analysing the digital signal using frequency spectrum analysis into discrete portions; and comparing the discrete portions with the entries in the index.
  • the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index.
  • the discrete portions are either musical notes or waveforms.
  • the at least one information entry may also be musical notes or waveforms.
  • the vocal input may preferably be speaker independent and may be in the form of singing, humming, or whistling.
  • the form of vocal input may preferably be manually or automatically selectable.
  • the audio file is accessible from the electronic device itself, a device functionally connected to the electronic device or a connected computer network.
  • the information entry may also preferably be received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network.
  • the electronic device is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
  • the method further includes selecting a facility to access the audio files by depressing a pre-determined button at least once, and filtering the vocal input.
  • an apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus includes an indexer for generating an index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; a vocal reception means for receiving a vocal input during a vocal reception mode; a digital-analog converter for converting the vocal input into a digital signal; and a processor to analyse the digital signal using frequency spectrum analysis into discrete portions, the processor also being able to compare the discrete portions with the entries in the index.
  • the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index.
  • the apparatus may include a display and the vocal input may be filtered.
  • the vocal reception mode may be activated by depressing at least one button at least once. It is preferable that the discrete portions are musical notes or waveforms.
  • the apparatus is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
  • the vocal input is either manually or automatically selected from the group comprising: singing, humming, and whistling.
  • the vocal input is speaker independent.
  • the at least one information entry may be selected from either musical notes or waveforms.
  • the at least one information entry is received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network.
  • the audio file may be accessible from the electronic device itself, any device functionally connected to the electronic device or a connected computer network.
  • FIG. 1 shows a flow chart of a method of a preferred embodiment of the present invention.
  • FIG. 2 shows a schematic diagram of an apparatus of a preferred embodiment of the present invention.
  • program modules include routines, programs, characters, components and data structures that perform particular tasks or implement particular abstract data types.
  • the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote memory storage devices.
  • the electronic device may be, for example, a vehicle audio system, a desktop computer, a notebook computer, a PDA, a portable media player or a mobile phone and the like.
  • the method may include an enablement of a vocal reception mode ( 20 ) in the electronic device in a manner like, for example, depressing a pre-determined button on the electronic device at least once.
  • the vocal reception mode may be enabled or disabled as it may prevent a power source in the electronic device from being continually drained by continual enablement of the vocal reception mode.
  • the vocal reception mode may be for vocal input such as, for example, singing, humming, or whistling.
  • the enablement of the vocal reception mode in the electronic device may initialise an indexing system ( 24 ). Once the indexing system is initiated, the system then determines whether the composition of audio files in the collection has changed ( 26 ).
  • the composition of audio files may include the number of audio files and the audio filenames.
  • the index may comprise information entries obtained from each of the more than one audio file in the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network. Connection to the computer network may be via wired or wireless means.
  • Each audio file in the collection may be linked to at least one information entry in the index.
  • the at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files.
  • the information entry may also be a MIDI component that is linked/attached to an audio file like file metadata.
  • the information entry may also be obtainable from a pre-recorded vocal entry linked/attached to the audio file, or a connected computer network.
  • There may be an online database on the connected computer network where information entries of musical notes or waveforms are downloadable for each audio file.
  • a search is conducted on the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network ( 28 ). This step is to determine whether audio files have been added to or removed from the collection. Subsequent to the search, information entries obtained from each audio file directly ( 25 ), information entries downloaded from the connected computer network for each audio file ( 29 ), or pre-recorded vocal entries linked to each audio file ( 23 ) may be combined into an index ( 30 ). The index is then loaded for use ( 32 ) in the electronic device.
  • the last used index is then loaded for use ( 32 ) in the electronic device.
  • the vocal input may be singing, humming, or whistling. In a particular instance, the vocal input need not be a song in its entirety. A portion of a song may be sufficient as a viable form of the vocal input.
  • the vocal input may be filtered.
  • a user may be able to manually select a specific vocal input ( 22 ) for the vocal reception mode.
  • Vocal reception by the electronic device may be speaker independent.
  • the vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible).
  • the electronic device may also be able to overcome the problem of an off tune vocal input during the vocal input mode by providing a selection of audio files that most closely approximates to the off tune vocal input based on the entries of the audio files in the index.
  • the user may set the device to show the closest approximations up to a pre-determined number, such as, for example, the ten closest approximations.
  • the vocal input in analog form is converted into digital signals by a digital-analog converter ( 36 ).
  • the converter may be an analog-MIDI converter.
  • a processor in the electronic device may analyse the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals may be done using frequency spectrum analysis.
  • the processor may then compare the discrete portions with entries in the index ( 40 ). Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity ( 42 ).
  • the listing may show a number of audio files, a number that may be pre-determined by the user and may be shown on a display on the electronic device.
  • the extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
  • an apparatus 50 for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus 50 may be for example, a vehicle audio system, desktop computer, notebook computer, PDA, portable media player or mobile phone.
  • the components described in the following sections may be incorporated in the aforementioned different forms of the apparatus 50 in addition to components used for their primary functionalities.
  • the apparatus 50 may include a digital storage device 58 for the storage of the audio files that make up the collection of files.
  • the digital storage device 58 may be non-volatile memory in the form of a hard disk drive or flash memory.
  • the digital storage device 58 may have capacities of at least a few megabytes.
  • the apparatus 50 may also include an indexer 56 for generating an index comprising of information entries obtained from each of the more than one audio files in the collection.
  • the index may comprise information entries obtained from each of the more than one audio file in a collection of audio files stored in the digital storage device 58 of the apparatus 50 , any device functionally connected to the apparatus 50 or a connected computer network.
  • Each audio file in the collection may be linked to at least one information entry in the index.
  • the at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files.
  • the information entry may also be a MIDI component that is linked/attached to an audio file like file metadata.
  • the information entry may also be obtainable from a pre-recorded vocal entry linked/attached to the audio file, or a connected computer network.
  • a vocal reception means 64 for receiving a vocal input during a vocal reception mode may also be included in the apparatus 50 .
  • the vocal reception means 64 may be a microphone.
  • the vocal input may be singing, humming, or whistling. In a particular instance, the vocal input need not be a song in its entirety. A portion of a song may be sufficient as a viable form of the vocal input.
  • the vocal input may also be filtered. There may be a selector to choose the type of vocal input, or detection of vocal input may be automatic.
  • the vocal reception mode may be activated by pressing an activating button 63 incorporated with the apparatus 50 at least once. Vocal input into the vocal reception means 64 may be speaker independent.
  • the vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible).
  • the electronic device may also be able to overcome the problem of an off tune vocal input during the vocal input mode by providing a selection of audio files that most closely approximates to the off tune vocal input based on the entries of the audio files in the index.
  • the user may set the device to show the closest approximations up to a pre-determined number, such as, for example, the ten closest approximations.
  • the vocal reception means 64 may be coupled to a digital-analog converter 62 which converts the vocal input through the vocal reception means 64 into digital signals.
  • the converter 62 may be an analog-MIDI converter.
  • the converted digital signals are then passed into a processor 60 for analysis of the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals by the processor 60 may be done using frequency spectrum analysis.
  • the processor 60 may then be able to compare the discrete portions of the signals with the entries in the index generated by the indexer 56 . Audio files may thereby be accessible when the discrete portions substantially coincide with at least one of the information entries in the index.
  • Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity.
  • the listing may show a number of audio files, a number that may be pre-determined by the user.
  • a display 54 in the apparatus 50 allows for the listing of files to be shown clearly for selection by the user.
  • the extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
  • the visually impaired may be able to use apparatus 50 to access files stored within or accessible with the apparatus 50 using tonal matching. While they are unable to select the files shown on the display 54 , they may access the audio file which has been extracted from the collection at their convenience just from using vocal input.
  • An alternative application of the present invention makes use of the vocal reception mode of the electronic device to ascertain and improve vocal abilities of users. For example, if a user repeatedly fails to find a desired audio file through the use of vocal input into the electronic device, it is highly probable that the user's vocal input (prowess) is flawed. Thus the user is then inclined to continually practice vocal input into the electronic device until improvement is attained in terms of a higher incidence of finding a desired audio file. Thus, a device to conveniently ascertain a level of quality for vocal input is also disclosed.

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • General Physics & Mathematics
  • Databases & Information Systems
  • General Engineering & Computer Science
  • Data Mining & Analysis
  • Multimedia
  • Library & Information Science
  • Mathematical Physics
  • Auxiliary Devices For Music
  • Electrophonic Musical Instruments
  • Telephone Function

Abstract

A method and apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a vocal reception mode; converting the vocal input into a digital signal using a digital-analog converter; analysing the digital signal using frequency spectrum analysis into discrete portions; and comparing the discrete portions with the entries in the index. It is advantageous that the audio file is accessed when the discrete portions substantially match at least one of the information entries in the index. It is preferable that the discrete portions are either musical notes or waveforms.
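As a rough illustration of the analysis step named in the abstract, the following Python sketch turns a digitised signal into a sequence of musical notes. The function names, the 8 kHz sample rate, the 1024-sample frame, the 80 to 1000 Hz search band and the use of MIDI note numbers are all assumptions made for illustration; the application does not specify how the frequency spectrum analysis is carried out.

```python
import math


def dominant_frequency(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Return the frequency (Hz) of the strongest DFT bin between f_min and f_max."""
    n = len(samples)
    k_min = max(1, int(f_min * n / sample_rate))
    k_max = min(n // 2, int(f_max * n / sample_rate))
    best_k, best_power = k_min, -1.0
    for k in range(k_min, k_max + 1):
        # Direct evaluation of one DFT bin; slow but dependency-free.
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate / n


def frequency_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def notes_from_signal(samples, sample_rate, frame_size=1024):
    """Split the signal into frames and emit one note per change of pitch.

    Assumption: every frame contains a pitched sound; silence is not handled.
    """
    notes = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        note = frequency_to_midi(dominant_frequency(frame, sample_rate))
        if not notes or notes[-1] != note:
            notes.append(note)
    return notes


def sine(freq_hz, sample_rate, count):
    """Generate a test tone standing in for a hummed or whistled note."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]
```

Picking the single strongest bin is the simplest possible pitch estimate. A real voice has harmonics and noise, so a practical device would likely need windowing and a more robust pitch tracker.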

Description

  • FIELD OF INVENTION
  • This invention relates to a method and apparatus for accessing an audio file from a collection of audio files, and particularly relates to the accessing of files using tonal matching.
  • BACKGROUND
  • The advent of the age of affordable digital entertainment has given rise to a sharp increase in the adoption of personal digital entertainment devices by consumers. Such personal digital entertainment devices are usually equipped with storage capacities of a range of sizes. Given the falling prices of storage devices like hard drives and flash memory, an increasing number of personal digital entertainment devices come with storage capacities exceeding 1 GB. Storage capacities of such sizes in personal digital entertainment devices used for audio files enable the storage of hundreds and even thousands of files.
  • While the audio files may be stored and categorised according to their song titles, artistes, genre or the like, there may be instances where a user forgets the title or artiste of a song, rendering a search for the pertinent audio file akin to searching for a needle in a haystack. In many instances, the user may only be able to remember a portion of the song or its tune. At present, such a partial recollection does not aid the search for the pertinent audio file in any way. This is a problem when attempting to access audio files in a large collection where certain information, like the title or artiste of a song, is unknown. The problem also arises when the visually impaired attempt to access audio files in a collection and are unable to select the audio files by sight.
  • It is also rather difficult to improve one's vocal prowess without engaging expensive vocal coaches. It is currently difficult to improve one's vocal prowess independently besides using karaoke machines with “scoring” functionalities incorporated in them. There are currently few devices available which are able to determine the quality of one's vocal prowess easily and conveniently.
  • SUMMARY OF INVENTION
  • In a preferred aspect of the present invention, there is provided a method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a vocal reception mode; converting the vocal input into a digital signal using a digital-analog converter; analysing the digital signal using frequency spectrum analysis into discrete portions; and comparing the discrete portions with the entries in the index. It is advantageous that the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index. It is preferable that the discrete portions are either musical notes or waveforms. The at least one information entry may also be musical notes or waveforms.
  • The vocal input may preferably be speaker independent and may be in the form of singing, humming, or whistling. The form of vocal input may preferably be manually or automatically selectable.
  • It is preferable that the audio file is accessible from the electronic device itself, a device functionally connected to the electronic device or a connected computer network. The information entry may also preferably be received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network. It is preferable that the electronic device is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
  • It is also preferable that the method further includes selecting a facility to access the audio files by depressing a pre-determined button at least once, and filtering the vocal input.
  • There is also provided an apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus. It is preferable that the apparatus includes an indexer for generating an index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; a vocal reception means for receiving a vocal input during a vocal reception mode; a digital-analog converter for converting the vocal input into a digital signal; and a processor to analyse the digital signal using frequency spectrum analysis into discrete portions, the processor also being able to compare the discrete portions with the entries in the index. Advantageously, the audio file is accessed when the discrete portions substantially coincide with at least one of the information entries in the index. The apparatus may include a display and the vocal input may be filtered. The vocal reception mode may be activated by depressing at least one button at least once. It is preferable that the discrete portions are musical notes or waveforms.
  • It is preferable that the apparatus is selected from the group comprising: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
  • It is preferable that the vocal input is either manually or automatically selected from the group comprising: singing, humming, and whistling. Advantageously, the vocal input is speaker independent. The at least one information entry may be selected from either musical notes or waveforms. Preferably, the at least one information entry is received from the audio file, a pre-recorded vocal entry linked to the audio file, or a connected computer network. The audio file may be accessible from the electronic device itself, any device functionally connected to the electronic device or a connected computer network.
  • There is also provided a method of determining a level of quality for vocal input using the aforementioned apparatus.
  • DESCRIPTION OF DRAWINGS
  • In order that the present invention may be fully understood and readily put into practical effect, there shall now be described by way of non-limitative example only preferred embodiments of the present invention, the description being with reference to the accompanying illustrative drawings.
  • FIG. 1 shows a flow chart of a method of a preferred embodiment of the present invention.
  • FIG. 2 shows a schematic diagram of an apparatus of a preferred embodiment of the present invention.
  • DESCRIPTION OF PREFERRED EMBODIMENTS
  • The following discussion is intended to provide a brief, general description of a suitable computing environment in which the present invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, characters, components and data structures that perform particular tasks or implement particular abstract data types. As those skilled in the art will appreciate, the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
  • Referring to FIG. 1, there is provided a flow chart of a method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The electronic device may be, for example, a vehicle audio system, a desktop computer, a notebook computer, a PDA, a portable media player, a mobile phone or the like. The method may include an enablement of a vocal reception mode (20) in the electronic device, for example by depressing a pre-determined button on the electronic device at least once. The vocal reception mode may be enabled or disabled so that the power source of the electronic device is not continually drained by keeping the mode enabled. The vocal reception mode may be for vocal input such as, for example, singing, humming, or whistling.
  • The enablement of the vocal reception mode in the electronic device may initialise an indexing system (24). Once the indexing system is initiated, the system then determines whether the composition of audio files in the collection has changed (26). The composition of audio files may include the number of audio files and the audio filenames. The index may comprise information entries obtained from each of the more than one audio file in the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network. Connection to the computer network may be via wired or wireless means. Each audio file in the collection may be linked to at least one information entry in the index. The at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files. The information entry may also be a MIDI component that is linked/attached to an audio file like file metadata. The information entry may also be obtainable from a pre-recorded vocal entry linked/attached to the audio file, or a connected computer network. There may be an online database on the connected computer network where information entries of musical notes or waveforms are downloadable for each audio file.
  • If the composition of audio files is found to be different, a search is conducted on the collection of audio files stored in the electronic device, any device functionally connected to the electronic device or a connected computer network (28). This step is to determine whether audio files have been added to or removed from the collection. Subsequent to the search, information entries obtained from each audio file directly (25), information entries downloaded from the connected computer network for each audio file (29), or pre-recorded vocal entries linked to each audio file (23) may be combined into an index (30). The index is then loaded for use (32) in the electronic device.
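The branch between steps (26), (28), (30) and (32) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `composition` and `load_index`, the dictionary layout of the cached state, and the idea of passing the three entry sources as callables are hypothetical and do not come from the application.

```python
def composition(filenames):
    """Summarise the collection as (number of files, sorted filenames)."""
    return (len(filenames), tuple(sorted(filenames)))


def load_index(filenames, cached, sources):
    """Return (state, rebuilt) for the current collection.

    cached  -- state returned by an earlier call, or None (hypothetical layout)
    sources -- callables mapping a filename to a list of information entries,
               e.g. entries taken from the file itself, downloaded entries,
               or pre-recorded vocal entries
    """
    current = composition(filenames)
    if cached is not None and cached["composition"] == current:
        return cached, False  # composition unchanged: reuse the last used index
    index = {}
    for name in filenames:
        entries = []
        for source in sources:
            entries.extend(source(name))  # combine entries from every source
        index[name] = entries
    return {"composition": current, "index": index}, True
```

Comparing only the file count and filenames mirrors the description of "composition" in the text. It would not notice a file whose content changed under the same name.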
  • If the composition of audio files is found to be unchanged, the last used index is loaded for use (32) in the electronic device. With the vocal reception mode enabled, there may be vocal input into the device (34). The vocal input may be singing, humming, or whistling. The vocal input need not be a song in its entirety; a portion of a song may be sufficient. The vocal input may be filtered. A user may be able to manually select a specific type of vocal input (22) for the vocal reception mode. There may also be automatic detection of the type of vocal input (22). Vocal reception by the electronic device may be speaker independent. The vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible). The electronic device may also be able to overcome the problem of an off-tune vocal input during the vocal reception mode by providing a selection of audio files that most closely approximate the off-tune vocal input based on the entries of the audio files in the index. The user may set the device to show the closest approximations up to a pre-determined number, such as, for example, the ten closest approximations.
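The application does not say how an off-tune input is tolerated. One possible approach, shown here purely as an assumption, is to compare the steps between successive notes rather than the notes themselves, and to allow each step to be wrong by a semitone. The function names and the one-semitone tolerance are hypothetical, and notes are taken to be MIDI note numbers.

```python
def intervals(notes):
    """Return the semitone steps between successive notes (key independent)."""
    return [b - a for a, b in zip(notes, notes[1:])]


def best_window_mismatch(query_notes, entry_notes, tolerance=1):
    """Slide the query over the entry and return the fewest mismatching steps.

    A step counts as a mismatch when it differs from the entry's step by more
    than `tolerance` semitones. Returns None if the query has fewer than two
    notes or is longer than the entry.
    """
    q, e = intervals(query_notes), intervals(entry_notes)
    if not q or len(q) > len(e):
        return None
    best = len(q)
    for start in range(len(e) - len(q) + 1):
        misses = sum(
            1 for i, step in enumerate(q) if abs(step - e[start + i]) > tolerance
        )
        best = min(best, misses)
    return best
```

Because only the steps are compared, a tune sung in a different key from the recording produces the same intervals, and the sliding window lets a fragment from the middle of a song be matched.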
  • Subsequently, the vocal input in analog form is converted into digital signals by a digital-analog converter (36). The converter may be an analog-MIDI converter. Thereafter, a processor in the electronic device may analyse the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals may be done using frequency spectrum analysis. The processor may then compare the discrete portions with entries in the index (40). Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity (42). The listing, which may be shown on a display on the electronic device, may contain a number of audio files pre-determined by the user. The extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
  • Referring to FIG. 2, there is provided an apparatus 50 for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus 50. The apparatus 50 may be, for example, a vehicle audio system, desktop computer, notebook computer, PDA, portable media player or mobile phone. The components described in the following sections may be incorporated in the aforementioned different forms of the apparatus 50 in addition to components used for their primary functionalities.
  • The apparatus 50 may include a digital storage device 58 for the storage of the audio files that make up the collection of files. The digital storage device 58 may be non-volatile memory in the form of a hard disk drive or flash memory. The digital storage device 58 may have capacities of at least a few megabytes.
  • In addition, the apparatus 50 may also include an indexer 56 for generating an index comprising information entries obtained from each of the more than one audio file in the collection. The index may comprise information entries obtained from each of the more than one audio file in a collection of audio files stored in the digital storage device 58 of the apparatus 50, any device functionally connected to the apparatus 50 or a connected computer network. Each audio file in the collection may be linked to at least one information entry in the index. The at least one information entry may be musical notes or waveforms determined using semantic segmentation corresponding to a portion or the whole content stored in the audio files. The information entry may also be a MIDI component that is linked or attached to an audio file in the manner of file metadata. The information entry may also be obtainable from a pre-recorded vocal entry linked or attached to the audio file, or from a connected computer network. There may be an online database on the connected computer network where information entries of musical notes or waveforms are downloadable for each audio file.
  • A vocal reception means 64 for receiving a vocal input during a vocal reception mode may also be included in the apparatus 50. The vocal reception means 64 may be a microphone. The vocal input may be singing, humming, or whistling. In a particular instance, the vocal input need not be a song in its entirety. A portion of a song may be sufficient as a viable form of the vocal input. The vocal input may also be filtered. There may be a selector to choose the type of vocal input, or detection of vocal input may be automatic. The vocal reception mode may be activated by pressing an activating button 63 incorporated with the apparatus 50 at least once. Vocal input into the vocal reception means 64 may be speaker independent. The vocal reception mode may have automatic volume correction for the vocal input if the vocal input is either too loud (such that distortion of input occurs) or too soft (such that input is inaudible). The apparatus 50 may also be able to overcome the problem of an off-tune vocal input during the vocal reception mode by providing a selection of audio files that most closely approximate the off-tune vocal input based on the entries of the audio files in the index. The user may set the apparatus 50 to show the closest approximations up to a pre-determined number, for example, the ten closest approximations.
  • The vocal reception means 64 may be coupled to a digital-analog converter 62 which converts the vocal input through the vocal reception means 64 into digital signals. The converter 62 may be an analog-MIDI converter. The converted digital signals are then passed into a processor 60 for analysis of the digital signals into discrete portions, where the discrete portions may be either musical notes or waveforms. Processing of the digital signals by the processor 60 may be done using frequency spectrum analysis. The processor 60 may then be able to compare the discrete portions of the signals with the entries in the index generated by the indexer 56. Audio files may thereby be accessible when the discrete portions substantially coincide with at least one of the information entries in the index. Exact or substantial similarity between the discrete portions and entries in the index enables the generation of a listing of audio files in order of extent of similarity. The listing may show a number of audio files, a number that may be pre-determined by the user. A display 54 in the apparatus 50 allows for the listing of files to be shown clearly for selection by the user. The extent of similarity may be based on relative closeness in terms of either musical notes or waveforms.
  • The visually impaired may be able to use apparatus 50 to access files stored within or accessible with the apparatus 50 using tonal matching. While they are unable to select the files shown on the display 54, they may access the audio file which has been extracted from the collection at their convenience just from using vocal input.
  • An alternative application of the present invention makes use of the vocal reception mode of the electronic device to ascertain and improve vocal abilities of users. For example, if a user repeatedly fails to find a desired audio file through the use of vocal input into the electronic device, it is highly probable that the user's vocal input (prowess) is flawed. Thus the user is then inclined to continually practice vocal input into the electronic device until improvement is attained in terms of a higher incidence of finding a desired audio file. Thus, a device to conveniently ascertain a level of quality for vocal input is also disclosed.
  • Whilst there has been described in the foregoing description preferred embodiments of the present invention, it will be understood by those skilled in the technology concerned that many variations or modifications in details of design or construction may be made without departing from the present invention.
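By way of illustration only, the analysis of the digital signal into musical notes, the comparison with index entries (40) and the ranked listing (42) described above might be sketched as follows. This is a minimal sketch under stated assumptions and is not the disclosed implementation: the sample rate, the frame length, the function names, the file names and the in-memory `INDEX` are all hypothetical; one fixed-length frame is treated as one discrete portion; the spectrum is evaluated only at equal-tempered note frequencies; and entries are compared by semitone intervals so that a tune hummed in a different key can still match. A synthesized sine tone stands in for the converted vocal input.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed output rate of the converter
FRAME = 1024        # samples per discrete portion; assumed

def midi_to_hz(note):
    """Frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def strongest_note(frame, low=48, high=84):
    """Return the MIDI note (C3..C6) carrying the most spectral power."""
    best_note, best_power = low, -1.0
    for note in range(low, high + 1):
        w = 2 * math.pi * midi_to_hz(note) / SAMPLE_RATE
        # Single-frequency DFT: correlate the frame with cosine and sine.
        re = sum(s * math.cos(w * n) for n, s in enumerate(frame))
        im = sum(s * math.sin(w * n) for n, s in enumerate(frame))
        power = re * re + im * im
        if power > best_power:
            best_note, best_power = note, power
    return best_note

def to_notes(samples):
    """Split the digital signal into frames and map each frame to one note."""
    return [strongest_note(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def intervals(notes):
    """Semitone steps between consecutive notes; unchanged by transposition."""
    return [b - a for a, b in zip(notes, notes[1:])]

def distance(query, entry):
    """Smallest total semitone deviation of query against any window of entry."""
    if len(query) > len(entry):
        return math.inf
    return min(sum(abs(q - e) for q, e in zip(query, entry[i:i + len(query)]))
               for i in range(len(entry) - len(query) + 1))

def rank(query_notes, index, limit=10):
    """List file names in order of similarity, closest first, up to limit."""
    q = intervals(query_notes)
    scored = sorted((distance(q, intervals(notes)), name)
                    for name, notes in index.items())
    return [name for d, name in scored if d != math.inf][:limit]

def synthesize(notes):
    """Stand-in for a vocal input: one pure sine frame per note."""
    samples = []
    for note in notes:
        w = 2 * math.pi * midi_to_hz(note) / SAMPLE_RATE
        samples.extend(math.sin(w * n) for n in range(FRAME))
    return samples

# Hypothetical index: file name -> opening melody as MIDI note numbers.
INDEX = {
    "twinkle.mp3": [60, 60, 67, 67, 69, 69, 67],
    "ode_to_joy.mp3": [64, 64, 65, 67, 67, 65, 64, 62],
    "frere_jacques.mp3": [60, 62, 64, 60, 60, 62, 64, 60],
}

if __name__ == "__main__":
    # A portion of "Ode to Joy" hummed two semitones above the indexed key.
    hummed = synthesize([66, 67, 69, 69, 67])
    print(rank(to_notes(hummed), INDEX))
```

Comparing intervals rather than absolute notes is one possible reading of "relative closeness in terms of musical notes"; a real vocal input contains harmonics and noise, so a practical system would likely need windowing, filtering and a more robust pitch estimator than this sketch uses.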

Claims (27)

1. A method for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device, including:
generating one index comprising of information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry;
receiving a vocal input during a voice reception mode;
converting the vocal input into a digital signal using a digital-analog converter;
analysing the digital signal using frequency spectrum analysis into discrete portions; and
comparing the discrete portions with the information entries in the index,
wherein the at least one audio file is accessed when the discrete portions substantially match at least one information entry in the index.
2. The method of claim 1, wherein the discrete portions are selected from the group consisting of: musical notes and waveforms.
3. The method of claim 1, wherein the vocal input is selected from the group consisting of: singing, humming, and whistling.
4. The method of claim 1, wherein the at least one information entry is selected from the group consisting of: musical notes and waveforms.
5. The method of claim 1, wherein the audio file is accessible from a source selected from the group consisting of: the electronic device, any device functionally connected to the electronic device and a connected computer network.
6. The method of claim 3, wherein the vocal input is set by means selected from the group consisting of: manual selection and automatic selection.
7. The method of claim 1, wherein the vocal input is speaker independent.
8. The method of claim 1, wherein the at least one information entry is received from a source selected from the group consisting of: the audio file, a pre-recorded vocal entry linked to the audio file, and a connected computer network.
9. The method of claim 1, wherein the electronic device is selected from the group consisting of: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
10. The method of claim 1, further including selecting a facility to access the audio files by depressing a pre-determined button at least once.
11. The method of claim 1, further including filtering the vocal input.
12. An apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with the apparatus, including:
an indexer configured to generate an index comprising information entries obtained from each of the more than one audio files in the collection, with each audio file in the collection being linked to at least one information entry;
a vocal receiver configured to receive a vocal input during a vocal reception mode;
a digital-analog converter configured to convert the vocal input into a digital signal; and
a processor configured to analyse the digital signal using frequency spectrum analysis into discrete portions and to compare the discrete portions with the information entries in the index,
wherein the at least one audio file is accessed when the discrete portions substantially match at least one information entry in the index.
13. The apparatus of claim 12, wherein the apparatus is selected from the group consisting of: vehicle audio system, desktop computer, notebook computer, PDA, portable media player and mobile phone.
14. The apparatus of claim 12, wherein the vocal input is selected from the group consisting of: singing, humming, and whistling.
15. The apparatus of claim 14, wherein the vocal input is set by means selected from the group consisting of: manual selection and automatic selection.
16. The apparatus of claim 12, wherein the at least one information entry is selected from the group consisting of: musical notes and waveforms.
17. The apparatus of claim 12, wherein the vocal input is speaker independent.
18. The apparatus of claim 12, wherein the at least one information entry is received from a source selected from the group consisting of: the audio file, a pre-recorded vocal entry linked to the audio file, and a connected computer network.
19. The apparatus of claim 12, wherein the vocal reception mode is activated by depressing at least one button at least once.
20. The apparatus of claim 12, further including a display.
21. The apparatus of claim 12, wherein the vocal input is filtered.
22. The apparatus of claim 12, wherein the discrete portions are selected from the group consisting of: musical notes and waveforms.
23. The apparatus of claim 12, wherein the audio file is accessible from a source selected from the group consisting of: the electronic device, any device functionally connected to the electronic device and a connected computer network.
24. A method of determining a level of quality for vocal input using the apparatus of claim 12.
25. A method for accessing at least one audio file from a collection of audio files stored within or accessible with an electronic device, the method comprising:
generating an index comprising information entries obtained from audio files in the collection, each audio file in the collection having at least one corresponding information entry in the index;
analysing a digital signal into discrete portions, the digital signal being obtained from a converted vocal input received during a voice reception mode; and
comparing the discrete portions with the information entries in the index,
wherein the at least one audio file is accessed when the discrete portions substantially match at least one information entry in the index.
26. The method according to claim 25, wherein the digital signal is analysed into discrete portions using frequency spectrum analysis.
27. The method according to claim 25, wherein the vocal input is converted into the digital signal using a digital analog converter.
US11/439,760 2006-05-23 2006-05-23 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching Abandoned US20070276668A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US11/439,760 US20070276668A1 (en) 2006-05-23 2006-05-23 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching
PCT/SG2007/000140 WO2007136349A1 (en) 2006-05-23 2007-05-22 A method and apparatus for accessing an audio file from a collection of audio files using tonal matching
US12/301,878 US8892565B2 (en) 2006-05-23 2007-05-22 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching
CN2007800190803A CN101454778B (en) 2006-05-23 2007-05-22 A method and apparatus for accessing an audio file from a collection of audio files using tonal matching
TW096118334A TWI454942B (en) 2006-05-23 2007-05-23 A method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/439,760 US20070276668A1 (en) 2006-05-23 2006-05-23 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/301,878 Continuation US8892565B2 (en) 2006-05-23 2007-05-22 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Publications (1)

Publication Number Publication Date
US20070276668A1 true US20070276668A1 (en) 2007-11-29

Family

ID=38723575

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/439,760 Abandoned US20070276668A1 (en) 2006-05-23 2006-05-23 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching
US12/301,878 Active 2030-07-13 US8892565B2 (en) 2006-05-23 2007-05-22 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/301,878 Active 2030-07-13 US8892565B2 (en) 2006-05-23 2007-05-22 Method and apparatus for accessing an audio file from a collection of audio files using tonal matching

Country Status (4)

Country Link
US (2) US20070276668A1 (en)
CN (1) CN101454778B (en)
TW (1) TWI454942B (en)
WO (1) WO2007136349A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8478719B2 (en) 2011-03-17 2013-07-02 Remote Media LLC System and method for media file synchronization
US8688631B2 (en) 2011-03-17 2014-04-01 Alexander Savenok System and method for media file synchronization

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024388A1 (en) * 2007-06-11 2009-01-22 Pandiscio Jill A Method and apparatus for searching a music database
TWI383693B (en) * 2008-10-31 2013-01-21 Hon Hai Prec Ind Co Ltd Testing device capable of testing audio formats supported by an audio player device and method thereof
US8584198B2 (en) * 2010-11-12 2013-11-12 Google Inc. Syndication including melody recognition and opt out
US9183849B2 (en) 2012-12-21 2015-11-10 The Nielsen Company (Us), Llc Audio matching with semantic audio recognition and report generation
US9195649B2 (en) 2012-12-21 2015-11-24 The Nielsen Company (Us), Llc Audio processing techniques for semantic audio recognition and report generation
US9158760B2 (en) 2012-12-21 2015-10-13 The Nielsen Company (Us), Llc Audio decoding with supplemental semantic audio recognition and report generation
KR102161237B1 (en) * 2013-11-25 2020-09-29 삼성전자주식회사 Method for outputting sound and apparatus for the same
TWI579716B (en) * 2015-12-01 2017-04-21 Chunghwa Telecom Co Ltd Two - level phrase search system and method
CN106098058B (en) * 2016-06-23 2018-09-07 腾讯科技(深圳)有限公司 Tone line generation method and device
US9922631B2 (en) * 2016-06-24 2018-03-20 Panasonic Automotive Systems Company of America, a division of Panasonic Corporation of North America Car karaoke

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4915001A (en) * 1988-08-01 1990-04-10 Homer Dillard Voice to music converter
US6057502A (en) * 1999-03-30 2000-05-02 Yamaha Corporation Apparatus and method for recognizing musical chords
US6510410B1 (en) * 2000-07-28 2003-01-21 International Business Machines Corporation Method and apparatus for recognizing tone languages using pitch information
US20040060424A1 (en) * 2001-04-10 2004-04-01 Frank Klefenz Method for converting a music signal into a note-based description and for referencing a music signal in a data bank
US6938209B2 (en) * 2001-01-23 2005-08-30 Matsushita Electric Industrial Co., Ltd. Audio information provision system
US20080236364A1 (en) * 2007-01-09 2008-10-02 Yamaha Corporation Tone processing apparatus and method
US7488886B2 (en) * 2005-11-09 2009-02-10 Sony Deutschland Gmbh Music information retrieval using a 3D search algorithm
US20090064851A1 (en) * 2007-09-07 2009-03-12 Microsoft Corporation Automatic Accompaniment for Vocal Melodies
US7544881B2 (en) * 2005-10-28 2009-06-09 Victor Company Of Japan, Ltd. Music-piece classifying apparatus and method, and related computer program

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001069575A1 (en) * 2000-03-13 2001-09-20 Perception Digital Technology (Bvi) Limited Melody retrieval system
US6735563B1 (en) * 2000-07-13 2004-05-11 Qualcomm, Inc. Method and apparatus for constructing voice templates for a speaker-independent voice recognition system
US7031980B2 (en) * 2000-11-02 2006-04-18 Hewlett-Packard Development Company, L.P. Music similarity function based on signal analysis
CA2563478A1 (en) * 2004-04-16 2005-10-27 James A. Aman Automatic event videoing, tracking and content generation system
US20070195963A1 (en) * 2006-02-21 2007-08-23 Nokia Corporation Measuring ear biometrics for sound optimization
US8750484B2 (en) * 2007-03-19 2014-06-10 Avaya Inc. User-programmable call progress tone detection


Also Published As

Publication number Publication date
TWI454942B (en) 2014-10-01
US8892565B2 (en) 2014-11-18
CN101454778B (en) 2011-12-07
US20110238666A1 (en) 2011-09-29
TW200813759A (en) 2008-03-16
WO2007136349A1 (en) 2007-11-29
CN101454778A (en) 2009-06-10

Legal Events

Date Code Title Description
AS Assignment

Owner name: CREATIVE TECHNOLOGY LTD., SINGAPORE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:XU, JUN;ZHANG, HUAYUN;REEL/FRAME:018022/0931

Effective date: 20060718

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION