CN101271457A

CN101271457A - A melody-based music retrieval method and device

Info

Publication number: CN101271457A
Application number: CNA2007100646076A
Authority: CN
Inventors: 陈路佳; 胡包钢
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2007-03-21
Filing date: 2007-03-21
Publication date: 2008-09-24
Anticipated expiration: 2027-03-21
Also published as: CN101271457B

Abstract

The invention discloses a digital music retrieval method and device that use a melody as the search keyword to find music containing that melody. The invention offers users two ways of entering the melody: playing and humming. For humming input, a series of signal processing steps analyze the hummed audio signal and extract the melody information from it. For the music library, an inverted index is built to improve search efficiency. The device is divided into a server side and a client side. The server maintains the music database and its index and responds to query requests from the client; the client collects the user's melody input and displays the search results. Searching for music by melody makes up for the shortcomings of traditional text-based search, letting users find the music they want without knowing any text information about it; users can search with common equipment such as computers and mobile phones.

Description

A melody-based music retrieval method and device

Technical field

The invention belongs to the field of computer technology applications. Specifically, it relates to a retrieval method for digital music that uses a melody as the keyword, and to the computer hardware and communication equipment on which the method runs.

Background

As the amount of information on the Internet grows geometrically, finding the information we need quickly and accurately in massive information stores has become a major bottleneck in using the Internet. Content-based multimedia retrieval is an emerging research field that offers a new way of searching: using multimedia itself to search for multimedia information. Multimedia information takes many forms, such as audio, video, images, and animation, and audio accounts for a considerable share of it. Within audio, music is the most common form. Current music retrieval relies mainly on text keywords, such as the title, composer, singer, album, genre, or lyrics. Music itself, however, is fundamentally different from text keywords. Keyword search presupposes that the user already knows something about the target music and is familiar with the related text information. If the user is interested only in the melody and knows nothing about the title, lyrics, or other text information, existing music search methods are of no help.

Summary of the invention

Existing keyword-based music retrieval is of no help when the text keywords of the target music are unknown. To solve this problem of the prior art, the purpose of the present invention is to provide a melody-based digital music retrieval method and device.

To achieve this purpose, a first aspect of the present invention provides a melody-based music retrieval method with the following steps:

Step S1: Designate a segment of melody from the music being sought as the melody keyword for the search;

Step S2: Input the designated melody keyword into the query client device, which processes it into a digitized melody signal;

Step S3: Index the music in the music library so that the index reflects the melodic features of the music, forming an indexed music database;

Step S4: The search engine compares the digitized melody signal with the melodies in the indexed music database and selects from the database a group of music containing the designated keyword melody;

Step S5: Sort the selected music in descending order of similarity to the melody keyword.

The melody input methods include playing input and humming input.

The index is compiled from the melodic features of melody segments.

For the humming input method, the digitized melody signal is obtained through the following steps:

Step S21: Capture the user's humming with an audio capture device;

Step S22: Pre-filter the audio signal entered by the user, including DC removal, gain normalization, and low-pass filtering, to obtain a sequence of audio frames;

Step S23: Analyze the audio frame sequence in the time domain or frequency domain to extract the fundamental frequency sequence;

Step S24: Process the fundamental frequency sequence further, including linearization and differencing, to obtain the digitized melody signal.

To achieve this purpose, a second aspect of the present invention provides a melody-based music retrieval device, including:

at least one server that provides an online music melody retrieval service;

and at least one client terminal device that sends online music melody retrieval requests and receives the server's melody query results.

The client includes:

an input module, used to enter the melody information to be searched for and send it to the server; and a search result display module, through which the client obtains the search results from the server over a network or another transmission channel and presents them to the user.

The input module includes:

an audio capture unit, used to capture the user's humming audio signal; a note capture unit, used to capture the note melody signal played by the user; and an audio signal processing unit, which converts the audio signal captured by the audio capture unit into a music melody signal.

The server includes:

a music data source interface unit, which provides interfaces for accessing various data sources to obtain raw music data; a data acquisition and analysis unit, which collects the raw music data and analyzes it to extract the melody information; an indexing unit, which indexes the raw music data obtained by the data acquisition and analysis unit according to its melodic features; and a search unit, which receives query requests from the client's input module, searches the index generated by the indexing unit for music containing a melody identical or similar to the melody keyword supplied by the client's input module, sorts the result list in descending order of similarity, and returns it to the client's search result display module.

The music data source interface unit provides interfaces for one or more of the following data acquisition methods:

Web: crawl the Web, roaming the Internet automatically and fetching music files together with information related to them; File: fetch and analyze music files stored in a local or network file system; Database: extract and analyze music files recorded in a database.

The client is one or more of the following devices:

a personal computer; a smart mobile device, including mobile phones, personal digital assistants, in-vehicle intelligent terminals, and the like; a telephone; an audio/video entertainment device with media-on-demand functions, including karaoke song-selection equipment.

When the client is a personal computer, it downloads and installs a dedicated Web browser plug-in from the server. When the user visits the music retrieval Web site provided by the server, the plug-in provides the user interface for audio capture and note capture, collects the user's query input, and sends it to the server over the Internet.

When the client is a smart mobile device, it installs dedicated software that provides the user interface for audio capture and note capture, collects the user's query input, and sends it to the server over a wireless network.

When the client is a telephone, the server provides a dedicated telephone voice service line. The client dials the service number and uses the telephone's numeric keypad as the note capture device or the telephone handset as the audio capture device. The server and client exchange information over the public switched telephone network.

When the client is an audio/video entertainment device with media-on-demand functions, it is equipped with a digital piano keyboard, or has virtual piano keyboard software installed, to capture the user's piano keyboard note input, and uses the karaoke microphone to capture the user's humming input. The server is a dedicated local server, and the search scope is the karaoke system's local music database.

The server sorts the list of music selected by the search in descending order of similarity to the query melody and sends it back to the client for display.

The present invention gives users a new way of searching: searching for music by its melody. It makes up for the shortcomings of traditional text-based search and lets users find the music they want without knowing any text information about it. The invention also implements this search method on concrete hardware platforms, so that users can search for music with common devices such as computers and mobile phones.

Description of drawings

Fig. 1 is a schematic structural diagram of the present invention.

Detailed description

The present invention and its advantages are described in detail below with reference to the accompanying drawing. The described embodiments are intended only to aid understanding of the invention and do not limit it in any way.

The present invention is concerned with content-based music retrieval and provides a way of searching for music with music itself. Specifically, a short segment of melody serves as the search keyword, and the search engine returns a group of music containing that melody. A melody keyword differs from a text keyword: the user cannot type it directly on a keyboard, so a special melody input method is needed. The method closest to people's habits is humming input, in which the user simply hums the melody to be found into an audio capture device such as a microphone. The user can also enter the melody by playing it on a virtual piano keyboard.

The embodiments of the present invention provide a complete computer application system platform whose function is to offer a melody-based music search service. The platform implements raw music data acquisition, raw music data analysis, music database indexing, online query, audio signal processing, and result feedback. It supports melody input by humming and by piano keyboard playing on terminal devices such as ordinary personal computers, smart mobile devices, telephones, and karaoke song-selection equipment, and it can display or play back search results to the user on those same devices.

The present invention is built from several functional modules, each of which performs a specific function. The complete structure of the system is shown in Fig. 1. The melody-based music retrieval device includes at least one computer acting as server 2, which provides the online music retrieval service, and at least one client 1 terminal device, which sends online music retrieval requests and receives the query results from server 2. Server 2 acquires music from multiple data sources, stores a melody database containing a large number of melodic features, and builds an index over the database. On receiving a query request from client 1, server 2 compares the query melody segment entered by the user with the melodies in the database, filters out music unrelated to the query segment, sorts the remaining candidates by their similarity to the query segment, and returns the sorted music list to the client. Client 1 provides the user with two input interfaces, receives the user's melody input, and converts it into a digitized melody signal that can be used for querying.

In the structural diagram of Fig. 1, the components inside the dashed box on the left are the modules in the client 1 terminal device: input module 11, which collects the user's input and sends it to server 2, and search result display module 12, which presents the query results returned by server 2 to the user.

Input module 11 includes audio capture unit 111 and note capture unit 113, which capture the user's humming input and playing input respectively, and audio signal processing unit 112, which converts the audio signal captured by audio capture unit 111 into melody information.

Audio capture unit 111 captures the user's humming input. It consists of an audio capture device and a piece of recording software. On personal computers and karaoke terminals the capture device is usually a microphone; on communication terminals such as mobile phones it is usually the handset microphone. Driven by the recording software, it samples the analog audio waveform at the sampling frequency specified by the software and stores the resulting digital sample sequence in the memory of client 1. The fundamental frequency (first harmonic) of the human voice usually lies below 2000 Hz. According to the Nyquist sampling theorem, the sampling frequency must be greater than twice the highest frequency of interest to keep the captured digital signal free of aliasing. Because the invention also needs to analyze the harmonics of the voice, a sampling frequency of 8000 Hz or 11025 Hz is used. Each capture by audio capture unit 111 lasts 10 seconds by default, and the duration can be adjusted as needed.

Audio signal processing unit 112 converts the audio signal captured by audio capture unit 111 into the melody information of the music. It processes the audio signal as follows:

Step 1): The audio signal collected by audio capture unit 111 usually contains a DC component, which shifts the signal's equilibrium level and introduces errors into the analysis of its low-frequency spectrum. The DC component therefore has to be removed. Because a DC signal is time-invariant, subtracting the global equilibrium level of the sampled signal from the value of every sample removes the DC component. To remove errors caused by differences in signal strength, audio signal processing unit 112 also normalizes the signal strength: the maximum energy value within one captured signal is set to 1 and all other samples are scaled proportionally relative to it, so that the maximum energy of every capture is the same. In addition, passing the sampled signal through a low-pass filter suppresses high-frequency noise and improves the signal-to-noise ratio.
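The pre-filtering in step 1) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the equilibrium level is taken to be the mean of the capture, normalization is done on peak absolute amplitude, and a short moving average stands in for the low-pass filter, whose actual design the text does not specify.

```python
def remove_dc(samples):
    """Subtract the global mean (taken here as the DC level) from every sample."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def normalize_peak(samples):
    """Scale the signal so that its largest absolute amplitude becomes 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # an all-zero capture is left unchanged
    return [s / peak for s in samples]


def moving_average_lowpass(samples, width=3):
    """Crude low-pass filter (an assumption): average each sample with its neighbours."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def prefilter(samples):
    """DC removal, gain normalization, then low-pass filtering, in the order of step 1)."""
    return moving_average_lowpass(normalize_peak(remove_dc(samples)))
```

Calling `prefilter(raw_samples)` on one capture yields a signal ready for framing; a real system would likely replace the moving average with a properly designed filter.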

Step 2): The signal produced by step 1) is split into frames, with some overlap between adjacent frames. In speech signal processing each frame is usually kept within 200 milliseconds so that it can be treated as approximately stationary. A window function is applied to each frame. The Hann (Hanning) window formula is as follows:

$w_{Hn}(n) = 0.5\left[1 - \cos\left(\frac{2\pi n}{N - 1}\right)\right] R_N(n)$
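Framing and windowing can be sketched as below. The function names, frame length, and hop size are illustrative assumptions, and the rectangular-window factor in the formula is read as simply restricting n to the range 0 to N-1.

```python
import math


def hann_window(n_points):
    """Hann window coefficients 0.5 * (1 - cos(2*pi*n / (N - 1))) for n = 0..N-1."""
    if n_points == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_points - 1)))
            for n in range(n_points)]


def split_frames(samples, frame_len, hop):
    """Cut the signal into frames of frame_len samples; hop < frame_len gives overlap."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def windowed_frames(samples, frame_len, hop):
    """Apply the Hann window to every frame."""
    window = hann_window(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in split_frames(samples, frame_len, hop)]
```

For example, at a sampling frequency of 8000 Hz a frame length of 512 samples corresponds to 64 ms, well inside the 200 ms limit mentioned in step 2), and a hop of 256 samples gives 50% overlap.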

Step 3): The Fourier transform converts a time-domain signal into a frequency-domain signal, in which the distribution of the signal's energy over its frequency components can be seen clearly and directly. This step uses the fast Fourier transform (FFT) algorithm to transform each frame produced by step 2) into the complex frequency domain, giving one complex value per frequency component. Each complex value has a real part and an imaginary part; the sum of their squares is the energy value, which indicates how strong the frame is at that frequency component. The fast Fourier transform requires the number of input samples to be $2^N$; if a frame from step 2) has fewer than $2^N$ samples, the missing samples are padded with zeros.
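A minimal sketch of step 3) follows, using a plain recursive radix-2 FFT so that the example has no external dependencies. The function names are hypothetical, and a production system would normally call an optimized FFT library instead.

```python
import cmath


def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame):
    """Zero-pad the frame to a power of two and return real^2 + imag^2 per bin.

    Only bins 0..size/2 are returned, since the rest mirror them for real input.
    """
    size = next_power_of_two(len(frame))
    padded = list(frame) + [0.0] * (size - len(frame))
    spectrum = fft(padded)
    return [c.real ** 2 + c.imag ** 2 for c in spectrum[:size // 2 + 1]]
```

Bin k of the result corresponds to the frequency k × (sampling frequency) / (padded frame size).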

Step 4): In the frequency-domain distribution of each frame produced by step 3), if an energy peak can be found in the vocal frequency band and it significantly exceeds the energy of the background noise, the frequency of the first peak that meets this condition is taken as the fundamental frequency of the voice. The fundamental frequencies of adjacent frames are then compared: if the change is small, the frames are treated as the same note; if the change is large, it is treated as a transition between notes. Silent frames can also serve as note boundaries.
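The peak picking and note segmentation of step 4) might look like the sketch below. The text does not give numeric criteria, so the absolute power `threshold`, the band limits `lo_bin` and `hi_bin`, and the 3% pitch `tolerance` are all assumptions chosen for illustration, as are the function names.

```python
def first_peak_bin(power, lo_bin, hi_bin, threshold):
    """Return the first local maximum in [lo_bin, hi_bin] above threshold, else None.

    None marks a frame with no usable pitch (for example a silent frame).
    """
    hi_bin = min(hi_bin, len(power) - 2)
    for k in range(max(lo_bin, 1), hi_bin + 1):
        if (power[k] > threshold
                and power[k] >= power[k - 1]
                and power[k] >= power[k + 1]):
            return k
    return None


def bin_to_hz(k, sample_rate, fft_size):
    """Convert an FFT bin index to a frequency in Hz."""
    return k * sample_rate / fft_size


def group_into_notes(frame_pitches, tolerance=0.03):
    """Merge consecutive frames with nearly equal pitch into notes.

    frame_pitches holds one frequency in Hz per frame, or None for silence.
    A relative change above tolerance, or a silent frame, starts a new note.
    Returns the mean frequency of each note.
    """
    notes, current = [], []
    for f in frame_pitches:
        if f is not None and current and abs(f - current[-1]) / current[-1] <= tolerance:
            current.append(f)
            continue
        if current:
            notes.append(sum(current) / len(current))
        current = [f] if f is not None else []
    if current:
        notes.append(sum(current) / len(current))
    return notes
```

A tolerance of 3% is roughly half a semitone (a semitone is a frequency ratio of about 1.059), so small pitch wobble within a hummed note does not split it.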

Step 5): For each pair of adjacent notes, the difference of the logarithms of their frequencies is computed, giving the differential feature sequence of the melody. Frequency grows exponentially with position on the musical scale, so taking the logarithm linearizes it and makes the interval between two notes proportional to the difference of their log frequencies. Using the log-frequency difference between notes as the melodic feature removes the differences caused by different users humming in different keys.
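As a worked illustration of step 5), the sketch below expresses the log-frequency difference in semitones, i.e. 12 × log2(f2 / f1), and rounds it to the nearest integer. The base-2 logarithm, the factor 12, the rounding, and the function name are assumptions made here for readability; the text only requires a logarithmic difference.

```python
import math


def interval_sequence(note_freqs):
    """Log-frequency differences between adjacent notes, in rounded semitones."""
    return [round(12 * math.log2(b / a))
            for a, b in zip(note_freqs, note_freqs[1:])]


# The same three-note figure (C, D, E) hummed in two different octaves
low = interval_sequence([261.63, 293.66, 329.63])
high = interval_sequence([523.25, 587.33, 659.25])
```

Both `low` and `high` describe the same melody, two rising whole tones, even though the absolute frequencies differ, which is the key-independence the step describes.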

After these five steps, the hummed audio has been converted into melodic feature information that can be sent to the server as the key feature for the search. The fundamental frequency extraction above can equally be done with a time-domain method, such as the autocorrelation method.

Note capture unit 113 provides an interface for entering a melody by playing it on a piano keyboard. It displays a piano keyboard on the client 1 terminal device, and the user clicks the corresponding keys with a mouse or another pointing device, such as a touch screen or stylus, to enter the melody. Note capture unit 113 numbers the piano keys in order of pitch and uses the number as each key's ID. The difference between the IDs of two consecutively clicked keys is a note difference with the same meaning as the output of audio signal processing unit 112, and it is sent to server 2 as the melodic feature. Note information captured from the piano keyboard needs no signal processing, so melodies entered this way are free of extraction error and fast to obtain.
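The key-ID differencing is simple enough to show in a few lines. The function name and the concrete ID values are hypothetical; the only assumption is that IDs increase by one per key in pitch order, so that a difference of one ID corresponds to one semitone.

```python
def key_id_intervals(key_ids):
    """Differences between the IDs of consecutively clicked keys."""
    return [b - a for a, b in zip(key_ids, key_ids[1:])]


# Keys clicked in order: C, D, E, C (IDs chosen arbitrarily for illustration)
clicked = [40, 42, 44, 40]
intervals = key_id_intervals(clicked)
```

Here `intervals` is `[2, 2, -4]`, the same kind of interval sequence that the humming pipeline produces, so both input methods can feed the same search.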

Because ordinary telephones have no data processing capability, on telephone terminals audio signal processing unit 112 runs on server 2, and the client 1 telephone is responsible only for collecting the user's input. With humming input, the user uses the telephone handset as audio capture unit 111, and the audio signal is transmitted to the server over the public switched telephone network (PSTN). With piano keyboard input, the user enters the melody in numbered musical notation on the telephone's numeric keypad. Server 2 and client 1 exchange information over the PSTN; when server 2 receives the keypad signals, it converts them into the corresponding musical notes and plays them back to the user so that the user can correct them.

Through search result display module 12, client 1 obtains the results of the melody search from server 2 over a network or another transmission channel. The results are presented as a list in which each item gives the title of a piece of music together with information such as its composer and singer. The music in the list is sorted in descending order of similarity.

In the structural diagram of Fig. 1, the dashed box on the right is server 2, which includes music data source interface unit 21, data acquisition and analysis unit 22, indexing unit 23, and search unit 24. These units collect data, analyze data, and build the index in the background, and carry out search operations online.

Data acquisition and analysis unit 22 collects the raw music data files and analyzes them to extract the melody information. The music file format directly supported by the invention is MIDI, so data acquisition and analysis unit 22 mainly analyzes MIDI music files. The MIDI file format stores the elements of music, such as pitch, duration, timbre, and rhythm, as digital instructions. By parsing the sequence of musical instructions in a MIDI file, the parameters of the music can be extracted conveniently and precisely. A MIDI music file can be viewed as a layered structure. Common MIDI files come in two formats: the single-track format (Type 0) and the multi-track format (Type 1). In the single-track format, each file contains one track, each track has 16 channels, and each channel can hold one instrument. During playback the 16 channels play simultaneously. The single-track format allows at most 16 instruments to play at once, which meets the needs of ordinary digital music. In the multi-track format, each file contains multiple tracks, and each track also has 16 channels, but only one channel per track is active and the others are empty. The tracks also play simultaneously. The multi-track format can play more than 16 instruments at once, so expressive digital music often uses it. Data acquisition and analysis unit 22 unifies the two file formats into a four-level hierarchy of MIDI file, track, channel, and note, in which each upper-level element is a collection of lower-level elements. Every non-empty channel contains a sequence of notes. Data acquisition and analysis unit 22 converts each music file into an object with this layered structure and also stores the music's fingerprint, title, author, and other related information.
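The four-level hierarchy can be modelled with a few data classes, as in the sketch below. All class and field names are hypothetical, and the example does not parse real MIDI files; it only shows the shape of the object that the analysis unit is described as producing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    pitch: int        # MIDI note number, 0-127
    start: float      # onset time (unit left to the parser)
    duration: float


@dataclass
class Channel:
    index: int                                   # 0-15
    notes: List[Note] = field(default_factory=list)


@dataclass
class Track:
    channels: List[Channel] = field(default_factory=list)


@dataclass
class MusicObject:
    title: str
    author: str
    fingerprint: str
    tracks: List[Track] = field(default_factory=list)

    def melodies(self):
        """Yield the pitch sequence of every non-empty channel, ordered by onset."""
        for track in self.tracks:
            for channel in track.channels:
                if channel.notes:
                    ordered = sorted(channel.notes, key=lambda n: n.start)
                    yield [n.pitch for n in ordered]


song = MusicObject(
    title="Example", author="Unknown", fingerprint="abc123",
    tracks=[Track(channels=[
        Channel(index=0, notes=[Note(62, 1.0, 1.0), Note(60, 0.0, 1.0)]),
        Channel(index=1),  # empty channel, skipped by melodies()
    ])],
)
```

With this layout a Type 0 file becomes one track with up to 16 populated channels, and a Type 1 file becomes several tracks with one populated channel each, so later stages can treat both uniformly.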

The job of any search engine is to return, within an acceptable time, a list of information that matches the user's query. Three concepts here deserve attention:

1) Acceptable time. This refers to response time. For software that serves a large number of users on the Internet, this time cannot be long; it is usually on the order of seconds. It is a basic measure of a search engine's usability and one of the differences from traditional information retrieval systems. Moreover, the response time requirement must hold not just for a single user's query but for all users under the system's design load. In other words, the system should guarantee response times on the order of seconds at its rated throughput.

2) Match. For Web pages, a match means that the page contains the user's query keywords in some form, or contains content very close to them. In a melody-based music search engine, a match means that the main melody of the music contains the melody keyword entered by the user. The melody the user enters usually deviates somewhat from the target melody, so matching must handle not only exact matches but also a certain amount of error.

3) List. The search results returned to the user are usually a list of multiple items, each of which is similar or related to the user's keywords to some degree. Most users, however, look only at the items on the first page of the result list, so the items must be sorted by similarity or relevance. This sorting is called ranking. Different search engines currently use different ranking algorithms: Google, for example, uses the PageRank algorithm, which orders result pages by importance, while Baidu uses paid placement.

在搜索引擎系统中，索引算法的优劣，对以上三个性能指标有至关重要的影响。在目前的基于旋律的音乐搜索引擎中，多数采用的是线性匹配的算法。这种算法就是把用户的输入旋律和音乐文件中的旋律分别看作两个串，进行串的相似度对比。在基于内容的音乐搜索领域中，比较常用的有Suffix Tree，Suffix Array，Linear Alignment等方法。然而，线性搜索有一个共同的缺陷，在搜索过程中，需要对数据库中的每一个元素进行扫描，以确定是否匹配。这在原始数据库的数据量不大的时候是可以接受的，但是随着数据库的数据量的增大，在最理想的情况下，搜索的时间也会呈线性地增长，即搜索的时间复杂度至少为O(n)，例如，在Suffix Array算法中，其时间复杂度为O(nlogn)。现在大型搜索引擎的数据量，通常在10⁸至10⁹数量级，如果对如此庞大的数据库进行线性扫描，运算时间是用户无法接受的。因此，大型的搜索引擎，一般都采用倒排索引的算法。In the search engine system, the quality of the indexing algorithm has a crucial impact on the above three performance indicators. In current melody-based music search engines, most of them adopt linear matching algorithms. This algorithm is to regard the user's input melody and the melody in the music file as two strings respectively, and compare the similarity of the strings. In the field of content-based music search, methods such as Suffix Tree, Suffix Array, and Linear Alignment are commonly used. However, linear search has a common defect. During the search process, each element in the database needs to be scanned to determine whether it matches. This is acceptable when the amount of data in the original database is not large, but as the amount of data in the database increases, in the most ideal case, the search time will also increase linearly, that is, the time complexity of the search At least O(n), for example, in the Suffix Array algorithm, its time complexity is O(nlogn). Now the data volume of large search engines is usually on the order of 10 ⁸ to 10 ⁹ , if such a huge database is linearly scanned, the calculation time is unacceptable to users. Therefore, large search engines generally use the inverted index algorithm.

Among the many search algorithms, the inverted index has quickly come into wide use because it is flexible, efficient, and general-purpose. It is a word-based indexing algorithm that uses the keywords entered by the user to filter out irrelevant content in the database directly, ranks the relevant content by relevance, and tolerates errors well enough to recognize approximate matches.

In text in most languages, words are separated by natural delimiters such as spaces and punctuation marks. For languages without natural word boundaries, such as Chinese, fairly mature word segmentation techniques exist. An inverted index groups the occurrences of the same word across different articles, taking into account how often each word appears: the word serves as the primary key of the index, and the articles containing that word form its element list. When a query contains several specific words, the system looks up only the article lists under those words, and articles unrelated to the query are filtered out automatically. Because the irrelevant articles are never examined, this filtering costs no CPU time and is therefore very efficient. This efficient, automatic filtering of irrelevant information is the advantage of the inverted index as a data structure.
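The word-to-article mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular search engine; the function names `build_inverted_index` and `lookup`, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: words are separated by whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing at least one query word."""
    hits = set()
    for word in query.lower().split():
        # Only the lists stored under the query words are touched;
        # documents that share no word with the query are never visited.
        hits |= index.get(word, set())
    return hits


docs = {
    1: "melody search by humming",
    2: "text search by keyword",
    3: "piano keyboard input",
}
index = build_inverted_index(docs)
print(sorted(lookup(index, "humming melody")))
```

The lookup cost depends on the length of the lists stored under the query words rather than on the total number of documents, which is the property the paragraph above relies on.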

In a music search engine system, the object of the search is a musical melody rather than text. The text-based inverted index model therefore needs some modification to suit the indexing of melodies.

A musical melody consists of a continuous sequence of notes. Although bars divide a piece of music into short sections, the MIDI music format has no explicit bar delimiter. Rests resemble the spaces in text, but across different musical styles they occur quite randomly and follow no clearly identifiable pattern. The natural separators of music itself, such as bars and rests, are therefore unsuitable for segmenting a melody.

Because no good word segmentation mechanism has yet been found for musical melody, the present invention uses melody fragment segmentation: a continuous melody is cut into short fragments of 3 to 4 notes each, with some overlap between adjacent fragments. The invention treats these melody fragments as the words of the melody and builds the index with the inverted index algorithm. When a new music track is to be added to the index, the track only needs to be divided into melody fragments and added to the element set of each of those fragments.
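The fragment segmentation and incremental indexing described above might look like the following sketch. It assumes that a melody is given as a list of MIDI pitch numbers, that fragments are 3 notes long, and that consecutive fragments are shifted by one note; the text specifies only "3 to 4 notes" with "some overlap", so these values, and the names `melody_fragments` and `add_track`, are illustrative choices.

```python
from collections import defaultdict


def melody_fragments(notes, size=3, step=1):
    """Cut a note sequence into overlapping fragments of `size` notes.

    With step < size, neighbouring fragments share notes (the overlap).
    """
    return [tuple(notes[i:i + size])
            for i in range(0, len(notes) - size + 1, step)]


def add_track(index, track_id, notes, size=3, step=1):
    """Add one track to the index: one posting per fragment it contains."""
    for fragment in melody_fragments(notes, size, step):
        index[fragment].add(track_id)


# The index maps a melody fragment (the "word") to the set of tracks
# that contain it (the element list).
index = defaultdict(set)
add_track(index, "song_a", [60, 62, 64, 65, 67])
add_track(index, "song_b", [62, 64, 65, 69])

print(melody_fragments([60, 62, 64, 65, 67]))
print(sorted(index[(62, 64, 65)]))
```

Adding a track touches only the fragments that occur in that track, so the existing index does not have to be rebuilt.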

The indexing unit 23 uses the above method, treating melody fragments as the words of a melody, to build an index over the music data provided by the data acquisition and analysis unit 22.

The search unit 24 receives query requests from the client input module 11 and searches the index generated by the indexing unit 23, online, for music whose melody is identical or similar to the melody information submitted through the audio collection unit 111 or the note collection unit 113 of the client 1. It sorts the search result list in descending order of similarity and returns it to the search result display module 12 of the client.

As mentioned above, ranking search results by similarity is an important function of a search engine. The search unit 24 computes similarity from the number of notes that the client's query string and a melody string in the music library have in common: the more notes match, the more similar the two are.
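A ranked search over a fragment index can be sketched as follows. Note the substitution: the text measures similarity by the number of matching notes, whereas this sketch counts the number of distinct query fragments that a track shares with the query, which is a simple proxy that falls out of the index directly. The pitch-number representation, the fragment length of 3, and all function names are assumptions.

```python
from collections import Counter, defaultdict


def fragments(notes, size=3):
    """Overlapping fragments of `size` notes, shifted by one note."""
    return [tuple(notes[i:i + size]) for i in range(len(notes) - size + 1)]


def build_index(library, size=3):
    """Inverted index: fragment -> set of track ids containing it."""
    index = defaultdict(set)
    for track_id, notes in library.items():
        for fragment in fragments(notes, size):
            index[fragment].add(track_id)
    return index


def search(index, query, size=3):
    """Return (track id, score) pairs, most similar first.

    Score = number of distinct query fragments found in the track
    (a stand-in for the note-matching count described in the text).
    """
    scores = Counter()
    for fragment in set(fragments(query, size)):
        for track_id in index.get(fragment, ()):
            scores[track_id] += 1
    # Descending score; ties broken by track id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


library = {
    "song_a": [60, 62, 64, 65, 67, 69],
    "song_b": [62, 64, 65, 69, 71],
    "song_c": [50, 52, 53, 55],
}
index = build_index(library)
results = search(index, [62, 64, 65, 67])
print(results)
```

Tracks that share no fragment with the query, such as `song_c` here, never receive a score and so never appear in the result list.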

The search unit 24 uses different interaction modes for different client 1 devices.

When the client 1 is a personal computer, the client downloads and installs a specific Web browser plug-in from the server. The plug-in integrates the recording program of the audio collection unit 111 and the virtual piano keyboard program of the note collection unit 113. When the user visits the music retrieval Web site provided by the server, the plug-in provides a user interface for audio input and note input of the melody, collects the user's query input, and sends it to the server over the Internet.

When the client 1 is a smart mobile device, the client 1 installs specific software developed for the operating system platform of the user's device (for example Windows Mobile, Linux, Nokia S60, or Java). The software provides a user interface for audio input and note input of the melody, collects the user's query input, and sends it to the server over a wireless network.

When the client 1 is a telephone, the server 2 provides a dedicated telephone voice service line. The client 1 dials the number of that line and uses the telephone's numeric keypad, or uses the telephone handset as the audio input device; the server 2 and the client 1 exchange information over the public switched telephone network (PSTN).

When the client 1 is an audio-video entertainment device with a media-on-demand function, the client 1 is equipped with a hardware digital piano keyboard, or has virtual piano keyboard software installed, to collect the user's piano keyboard note input, and uses a karaoke microphone to collect the user's humming input. The server 2 is a dedicated local server, and the search covers the karaoke system's local music library.

For computers and smart mobile devices, the search results are presented to the user as a list, and the user may download or play the results provided that doing so does not infringe the intellectual property rights of the musical works. For a telephone client 1, the server 2 reads the search result list aloud as voice prompts, and the user selects an entry with the telephone keys. For a song-request (jukebox) client 1, the user can reserve the selected entry or play it on demand.

The music data source interface unit 21 provides several different data source access interfaces, so that the server can obtain original music data from different data sources and expand the music database according to specific uses and needs, for example:

1. Web crawling: automatically roaming the Internet and fetching music files and the information related to them; or

2. Crawling and analyzing files stored in a local or network file system; or

3. Extracting and analyzing music records in a database.

The present invention is not limited to the above three data sources; it provides an application programming interface (API) open to secondary development, through which further data sources can be added.
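One way such an extensible data source interface could be structured is sketched below. The document does not describe the API itself, so everything here is hypothetical: the class names `MusicDataSource`, `FileSystemSource`, and `InMemorySource`, the `tracks` method, the `.mid` suffix, and the `collect` helper are illustrative assumptions only.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Iterable, Iterator, Tuple


class MusicDataSource(ABC):
    """Hypothetical plug-in interface: anything that can yield tracks."""

    @abstractmethod
    def tracks(self) -> Iterator[Tuple[str, bytes]]:
        """Yield (track identifier, raw music data) pairs."""


class FileSystemSource(MusicDataSource):
    """Reads music files under a local or network-mounted directory."""

    def __init__(self, root: str, suffix: str = ".mid") -> None:
        self.root = Path(root)
        self.suffix = suffix

    def tracks(self) -> Iterator[Tuple[str, bytes]]:
        for path in sorted(self.root.rglob("*" + self.suffix)):
            yield str(path.relative_to(self.root)), path.read_bytes()


class InMemorySource(MusicDataSource):
    """Stand-in for a database or crawler source, useful for testing."""

    def __init__(self, records: Dict[str, bytes]) -> None:
        self.records = records

    def tracks(self) -> Iterator[Tuple[str, bytes]]:
        yield from sorted(self.records.items())


def collect(sources: Iterable[MusicDataSource]) -> Dict[str, bytes]:
    """Merge the tracks of all sources; later sources overwrite earlier ones."""
    library: Dict[str, bytes] = {}
    for source in sources:
        library.update(source.tracks())
    return library


library = collect([InMemorySource({"b.mid": b"\x01", "a.mid": b"\x00"})])
print(sorted(library))
```

Under this design, supporting a new kind of data source means writing one more subclass with a `tracks` method; the code that collects and indexes tracks does not change.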

The above describes embodiments for implementing the present invention. Those skilled in the art should understand that any modification or partial replacement that does not depart from the scope of the present invention falls within the scope defined by the claims of the present invention.

Claims

1. A melody-based music retrieval method, characterized by comprising:

Step S1: specifying a section of melody from the music to be found as the melody keyword for the search;

Step S2: inputting the specified melody keyword into a query client device and processing it to obtain a digitized melody signal;

Step S3: building an index over the music in a music library, the index reflecting the melody characteristics of the music, to form an indexed music database;

Step S4: comparing, by a search engine, the digitized melody signal with the melodies in the indexed music database, and selecting from the music database a group of music containing the specified melody keyword;

Step S5: sorting the selected music in descending order of similarity to the melody keyword.

2. The music retrieval method according to claim 1, characterized in that the music input modes comprise: playing input and humming input.

3. The music retrieval method according to claim 1, characterized in that the index is an index built on the melody characteristics of melody fragments.

4. The music retrieval method according to claim 2, characterized in that, for the humming input mode, the following steps are taken to obtain the digitized melody signal:

Step S21: using an audio collection device to capture the user's humming input;

Step S22: pre-filtering the audio signal input by the user, including DC removal, gain normalization, and low-pass filtering, to obtain an audio frame sequence signal;

Step S23: performing time-domain or frequency-domain analysis on the audio frame sequence signal to extract a fundamental frequency sequence;

Step S24: further processing the fundamental frequency sequence, including linearization and differencing, to obtain the digitized melody signal.

5. A melody-based music retrieval device, characterized by comprising:

At least one server (2), which provides an online music melody retrieval service; and

At least one client (1) terminal device, which sends online music melody retrieval requests and receives the results of the server's melody query.

6. The music retrieval device according to claim 5, characterized in that the client (1) comprises:

An input module (11), used to input the music melody information to be searched for and send it to the server (2);

A search result display module (12), through which the client (1) obtains the search results from the server (2) over a network or another transmission mode and presents them to the user.

7. The music retrieval device according to claim 6, characterized in that the input module (11) comprises:

An audio collection unit (111), used to capture the user's humming audio signal;

A note collection unit (113), used to capture the note melody signal played by the user;

An audio signal processing unit (112), which converts the audio signal captured by the audio collection unit (111) into a music melody signal.

8. The music retrieval device according to claim 5, characterized in that the server (2) comprises:

A music data source interface unit (21), used to provide interfaces for accessing various data sources to obtain original music data;

A data acquisition and analysis unit (22), used to collect original music data, analyze it, and extract music melody information from it;

An indexing unit (23), used to build an index, according to melody characteristics, over the original music data obtained by the data acquisition and analysis unit (22);

A search unit (24), used to receive query requests from the client input module (11), search the index generated by the indexing unit (23) for music whose melody is identical or similar to the melody keyword provided by the client input module (11), sort the search result list in descending order of similarity, and return it to the search result display module (12) of the client (1).

9. The music retrieval device according to claim 8, characterized in that the music data source interface unit (21) provides interfaces for one or more of the following data acquisition modes:

Web: Web crawling, which automatically roams the Internet and fetches music files and the information related to them;

File: crawling and analyzing music files stored in a local or network file system;

Database: extracting and analyzing music files recorded in a database.

10. The music retrieval device according to claim 5, characterized in that the client (1) is one or more of the following devices:

A personal computer, a smart mobile device, a telephone, or an audio-video entertainment device with a media-on-demand function.

11. The music retrieval device according to claim 5 or 10, characterized in that, when the client (1) is a personal computer, the personal computer client (1) downloads and installs a specific Web browser plug-in from the server (2); when the user visits the music retrieval Web site provided by the server (2), the plug-in provides the user with a user interface for audio input and note input of the melody, collects the user's query input, and sends it to the server (2) over the Internet.

12. The music retrieval device according to claim 5 or 10, characterized in that, when the client (1) is a smart mobile device, the client (1) installs specific software that provides the user with a user interface for audio input and note input, collects the user's query input, and sends it to the server over a wireless network.

13. The music retrieval device according to claim 5 or 10, characterized in that, when the client (1) is a telephone, the server (2) provides a dedicated telephone voice service line; the client (1) dials the number of that line and uses the telephone's numeric keypad and the telephone handset as the note input device and the audio input device respectively; and the server (2) exchanges information with the client (1) over the PSTN.

14. The music retrieval device according to claim 5 or 10, characterized in that, when the client (1) is an audio-video entertainment device with a media-on-demand function, the client (1) is equipped with a digital piano keyboard, or has virtual piano keyboard software installed, to collect the user's piano keyboard note input, and uses a karaoke microphone to collect the user's humming input; the server (2) is a dedicated local server, and the search covers the karaoke system's local music database.

15. The music retrieval device according to claim 5 or 10, characterized in that the server (2) sorts the music list selected as the search result in descending order of the similarity between each result and the query input melody, and sends it back to the client (1) for display.