JP3612272B2

JP3612272B2 - Music information search device, music information search method, and computer-readable recording medium storing music information search program

Info

Publication number: JP3612272B2
Application number: JP2000312774A
Authority: JP
Inventors: 尚子小杉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-10-13
Filing date: 2000-10-13
Publication date: 2005-01-19
Anticipated expiration: 2020-10-13
Also published as: JP2002123287A

Description

【０００１】
【発明の属する技術分野】
本発明は、音楽情報の持つ特徴量を用いて音楽情報を検索する音楽情報検索装置及びその方法と、その音楽情報検索方法の実現に用いられる音楽情報検索用プログラムを記録したコンピュータ読み取り可能な記録媒体とに関する。
【０００２】
【従来の技術】
マルチメディア・データベースを構築するにあたって、類似検索は欠かせない技術である。
【０００３】
例えば、リンゴの写っている画像を探す場合、検索キーとして与えたリンゴの絵や写真とまったく同じリンゴが写っている画像を探したいという要求はめったにない。ほとんどが、単にリンゴが写っている画像が欲しいという要求であり、このような場合には、「検索キーとして与えたリンゴの画像に似ている画像を探す」といった、類似検索技術が必要である。
【０００４】
このような内容検索の新たなメディアへの対応として、ハミングや口ずさんだメロディなどを使って、音楽情報データベースから曲名や歌手を検索する技術の研究が盛んに行われている。
【０００５】
このようなことを背景にして、本発明者らも、音楽から音の推移や音の分布に関する特徴量を抽出して、それを使って、音楽情報データベースを検索するシステムの研究（ＮａｏｋｏＫｏｓｕｇｉ，ＹｕｉｃｈｉＮｉｓｈｉｈａｒａ，ＳｅｉｉｃｈｉＫｏｎ’ｙａ，ＭａｓａｓｈｉＹａｍａｍｕｒｏ，ａｎｄＫａｚｕｈｉｋｏＫｕｓｈｉｍａ：ＭｕｓｉｃＲｅｔｒｉｅｖａｌｂｙＨｕｍｍｉｎｇ−ＵｓｉｎｇＳｉｍｉｌａｒｉｔｙＲｅｔｒｉｅｖａｌｏｖｅｒＨｉｇｈＤｉｍｅｎｓｉｏｎａｌＦｅａｔｕｒｅＶｅｃｔｏｒＳｐａｃｅ−，ＰＡＣＲＩＭ’９９ｐ４０４−４０７）を行っている。
【０００６】
この本発明者らが提案した音楽検索システムでは、人の歌唱を検索キーとして受け付けることで、曲の旋律の一部のみしか覚えていない人に、その曲の題名や歌手・演奏者などを捜し出すことができるようにすることを提供している。
【０００７】
人の歌唱によってその曲に関する情報を検索する場合、人は必ずしもサビや出だしを歌うとは限らない。そこで、本発明者らが提案した音楽検索システムでは、音楽データ（旋律データ）を細かく分割し、保持する音楽データの冗長性を増すことで、人が曲のどの部分を歌っても検索できるようにすることを実現している。
【０００８】
この構成の実現にあたって、本発明者らが提案した音楽検索システムでは、図２１に示すように、ある一定の区間ごとに音楽データを分割することで音楽片を切り出して、それらの音楽片毎に、音楽片の先頭を起点とする特徴量（特徴量ベクトル）を算出するという構成を採っている。
【０００９】
そして、そのようにして算出される特徴量を使って、音楽情報データベースを構築するとともに、そのようにして算出される特徴量を使って、人の歌唱の音楽データと音楽情報データベースに格納される音楽データとの類似度を評価するという構成を採っている。
【００１０】
このように、従来技術では、音楽片の先頭を起点として、音楽片全体から特徴量（特徴量ベクトル）を算出するという構成を採っている。
【００１１】
なお、この従来技術により算出される特徴量（特徴量ベクトル）を、音楽片全体から算出されることを考慮して、以下において「ＥｎｔｉｒｅＦｅａｔｕｒｅＶｅｃｔｏｒ」と称することがある。
【００１２】
【発明が解決しようとする課題】
しかしながら、このような従来技術に従っていると、検索者の歌唱部分と、分割された音楽データ（旋律データ）の先頭とが一致しない場合に、検索精度が著しく悪化するという問題点がある。
【００１３】
すなわち、従来技術では、音楽片の先頭を起点として、音楽片全体から特徴量を算出するという構成を採っていることから、検索者の歌唱部分と、分割された音楽データ（旋律データ）の先頭とが一致しない場合には、検索精度が著しく悪化するのである。
【００１４】
この問題点の解決を図るためには、音楽データをより細かく分割するという方法を用いることが考えられるが、この方法を用いると、データ量が爆発的に増大するという新たな問題点が出てくる。
【００１５】
本発明はかかる事情に鑑みてなされたものであって、人の歌唱によってその曲に関する情報を検索するシステムを構築する場合に、その検索精度の向上を実現可能にする新たな音楽情報検索技術の提供を目的とする。
【００２３】
【課題を解決するための手段】
この目的を達成するために、本発明の音楽情報検索装置は、入力された音楽情報から切り出される複数の音楽片のそれぞれについて、所定の条件を満たす音楽片上の時間的位置として定義される特徴点を検出して、それらの特徴点を起点とする特徴量を生成する生成手段と、検索対象となる楽曲の持つ特徴量（生成手段の生成対象となる特徴量）を記憶する記憶手段と、記憶手段の記憶データを検索することで、生成手段の生成した特徴量毎に、それに類似する特徴量を持つ複数の楽曲を検索する検索手段と、検索手段の検索した楽曲の中から、入力された音楽情報の指す楽曲を特定する特定手段とを備えるように構成する。
【００２４】
ここで、生成手段は、入力された音楽情報から切り出される１つの音楽片の持つ特徴点を検出して、その特徴点を起点とする特徴量を生成することがあり、このときには、特定手段については備えられずに、検索手段は、生成された特徴量に類似する特徴量を持つ１つ又は複数の楽曲を検索して、それを検索結果として出力することになる。
【００２５】
このように構成されるときにあって、入力された音楽情報を規定の音符形態に変換する変換手段（例えば、実験により検証された検索精度の向上を実現できる例えば８分音符といった規定の音符形態に変換する）を備えることがあり、この場合には、生成手段は、変換手段の変換した音楽情報から切り出される音楽片を処理対象として特徴量を生成する。
また、生成手段は、音楽片の規定の時間的位置に位置する規定長さの区間部分を特徴点の検出に用いる区間として特徴点を検出することがあり、このとき、生成手段は、特徴点に課される音の高さ又は長さについての条件を充足する音位置を検出することで、その区間に含まれる特徴点を検出することがある。
また、生成手段は、特徴点を起点とする規定長さの音楽片部分から特徴量を生成したり、特徴点を起点とし音楽片終端を終点とする音楽片部分から特徴量を生成する。
なお、本発明の音楽情報検索装置により算出される特徴量（特徴量ベクトル）を、音楽片の一部分から算出されることを考慮して、以下において「 Partial Feature Vector 」と称することがある。
このように構成される本発明の音楽情報検索装置では、生成手段は、入力された音楽情報から切り出される複数の音楽片の持つそれぞれの特徴点を検出して、それらの特徴点を起点とする特徴量を生成する。
【００２６】
この特徴量の生成を受けて、検索手段は、記憶手段の記憶データを検索することで、生成手段の生成した特徴量毎に、それに類似する特徴量を持つ複数の楽曲を検索し、これを受けて、特定手段は、例えば検索された楽曲の持つ類似度をＯＲ演算することで、入力された音楽情報の指す楽曲を特定する。
【００２７】
この構成を採るときにあって、検索精度の向上を図るために、生成手段は、例えば生成する時系列データ形式の特徴量を、例えばヒストグラム形式などのような別の表現形式に変換することで複数の特徴量を生成することがあり、これを受けて、データベース手段は、この表現形式の異なる複数の特徴量を管理することがある。
【００２８】
このようにして、本発明の音楽情報検索装置では、従来技術のように、音楽片全体から特徴量を算出するという構成を採るのではなくて、図１に示すように、音楽片の中に含まれる特徴点を検出する構成を採って、その特徴点を起点とするいわば音楽片の部分特徴量とも言うべき特徴量を算出するように処理する。
そして、本発明の音楽情報検索装置では、図２に示すように、検索対象となる楽曲を、それが持つそのような算出形態で算出される特徴量との対応をとりつつ管理する音楽情報データベースを構築するとともに、そのような算出形態に従って、入力された音楽情報の持つ特徴量を生成して、それを検索キーにして音楽情報データベースを検索することで、入力された音楽情報の指す楽曲を検索するように処理することから、入力された音楽情報を分割することで得た音楽片と音楽情報データベースに格納される分割された音楽片との先頭が一致しない場合にも、高い検索精度を実現できるようになる。
【００２９】
【発明の実施の形態】
以下、実施の形態に従って本発明を詳細に説明する。
【００３０】
図３に、本発明を具備する音楽情報検索装置１の一実施形態例を図示する。
【００３１】
本発明の音楽情報検索装置１は、人のハミングした曲に関する情報（曲名など）を検索する処理を行うものであって、この検索処理を実現するために、音楽情報データベース１０と、採譜部１１と、ノイズ除去部１２と、特徴量生成部１３と、検索部１４とを備える。
【００３２】
ここで、本発明の音楽情報検索装置１の備える採譜部１１やノイズ除去部１２や特徴量生成部１３や検索部１４は、具体的にはプログラムで実現されるものであり、これらのプログラムは、計算機が読み取り可能な半導体メモリなどの適当な記録媒体に格納することができる。
【００３３】
音楽情報データベース１０は、検索対象となる楽曲の音楽情報（曲名・旋律情報）と、その楽曲の持つ特徴量と、その楽曲の持つ属性情報（歌手・演奏者などの情報）とを管理する。この音楽情報データベース１０に管理される楽曲の持つ特徴量は、特徴量生成部１３により生成されて登録されるものである。
【００３４】
採譜部１１は、例えば市販の採譜ソフトにより構成されて、入力されたハミングの楽譜情報を作成する。
【００３５】
採譜部１１により作成される楽譜情報はＭＩＤＩ形式で作成されるものであって、楽譜情報を構成する相対時間にかかる情報と、その楽譜情報の演奏時間を構成する絶対時間にかかる情報とで構成されることになるが、採譜部１１は、音楽情報データベース１０の管理する特徴量（相対時間で記述されている）との時間の正規化を図るべく、この内の絶対時間に関係しない部分である楽譜情報部分をノイズ除去部１２に渡すように処理している。
【００３６】
ノイズ除去部１２は、採譜部１１から渡される音楽情報（楽譜情報）を入力として、中心となるメロディーラインを外れた音や、レコーディング開始のある区間内で、ある長さ以上の無音区間が継続するときには、その無音区間よりも前の音区間を削除することなどにより、入力するハミングに乗っているノイズを除去する処理を行う。例えば、図４に示すようなハミングを入力するときには、１番目と４番目と１４番目の音はノイズであるとして除去するように処理する。
【００３７】
特徴量生成部１３は、ノイズ除去部１２から渡される音楽情報を音楽片に分割する音楽情報分割部１３０と、分割された音楽片の持つ特徴量（旋律の特徴量）を算出する音楽特徴量算出部１３１と、算出された特徴量（旋律の特徴量）から音楽片の特徴点を選出するための区間を限定する特徴点選出区間限定部１３２と、限定された区間の中から特徴点を決定する特徴点決定部１３３と、決定された特徴点を起点とする特徴量を導出する特徴量導出部１３４とを備えることで、入力したハミングの持つ特徴量を生成して、それを検索部１４に渡す処理を実行する。
【００３８】
ここで、音楽特徴量算出部１３１の算出する音楽特徴量は旋律に１対１に対応付けられる特徴量であるのに対して、特徴量導出部１３４の導出する特徴量は本発明に特徴的な特徴量である。以下、説明の便宜上、特徴量導出部１３４の導出する特徴量を「特徴点起点音楽部分特徴ベクトル」と称することがある。
【００３９】
検索部１４は、特徴量生成部１３から渡されるハミングの持つ特徴量を検索キーにして音楽情報データベース１０を検索することで、入力したハミングの曲名などを検索する。
【００４０】
図５に、特徴量生成部１３の実行する処理フローの一実施形態例を図示する。次に、この処理フローに従って、特徴量生成部１３の実行する処理について説明する。
【００４１】
特徴量生成部１３は、ノイズ除去部１２からハミングの音楽情報が渡されることで起動されると、図５の処理フローに示すように、先ず最初に、ステップ１で、その音楽情報（検索対象となるもの）を入力する。
【００４２】
続いて、ステップ２で、その入力した音楽情報から、例えば、４拍ずつずらしながら１６拍の長さを持つ音楽片を切り出すことで、その入力した音楽情報を音楽片に分割する。この分割処理が音楽情報分割部１３０の処理に相当する。
【００４３】
続いて、ステップ３で、各音楽片の旋律の特徴量を算出する。例えば、図６に示すような音の高さを示すＭＩＤＩコードを使って音楽片の旋律の特徴量を算出する場合を具体例にして説明するならば、「６４，６４，６５，６５，６７，６７，６５，６５」という旋律の特徴量を算出したり、この６５をベーストーンとする「−１，−１，０，０，２，２，０，０」という旋律の特徴量を算出するのである。この算出処理が音楽特徴量算出部１３１の処理に相当する。
【００４４】
続いて、ステップ４で、生成した全ての音楽片を処理したのか否かを判断して、全ての音楽片を処理していないことを判断するときには、ステップ６に進んで、未処理の音楽片を１つ選択する。
【００４５】
続いて、ステップ７で、その選択した音楽片の持つ中央部分の区間を特徴点を選出するための区間として限定（抽出）する。例えば、１６拍の長さを持つ音楽片の中央の４拍を限定するのである。この限定処理が特徴点選出区間限定部１３２の処理に相当する。
【００４６】
続いて、ステップ８で、その限定した区間の中の最も高い音を特徴点として決定（選出）する。この決定処理が特徴点決定部１３３の処理に相当する。
【００４７】
続いて、ステップ９で、その決定した特徴点から音楽片の最後拍までをその音楽片の特徴量（特徴点起点音楽部分特徴ベクトル）として導出（生成）してから、次の音楽片を処理すべくステップ４に戻る。この導出処理が特徴量導出部１３４の処理に相当する。
【００４８】
そして、ステップ４で、生成した全ての音楽片を処理したことを判断するときには、ステップ５に進んで、生成した特徴量（音楽片毎に求まる）を検索部１４に渡して、処理を終了する。
【００４９】
このようにして、特徴量生成部１３は、図５の処理フローに従う場合には、入力した音楽情報から、４拍ずつずらしながら１６拍の長さを持つ音楽片を切り出して旋律の特徴量を算出すると、図７に示すように、先ず最初に、その音楽片の中央の４拍を限定し、続いて、その４拍の中の最も高い音を特徴点として決定し、続いて、その特徴点から最後拍までをその音楽片の特徴量（特徴点起点音楽部分特徴ベクトル）として導出するように処理するのである。
【００５０】
図８に、特徴量生成部１３の実行する処理フローの他の実施形態例を図示する。
【００５１】
特徴量生成部１３は、この処理フローに従う場合には、ノイズ除去部１２からハミングの音楽情報が渡されることで起動されると、先ず最初に、ステップ１で、その音楽情報を入力し、続くステップ２で、その入力した音楽情報から音楽片を切り出し、続くステップ３で、各音楽片の旋律の特徴量を算出する。
【００５２】
続いて、ステップ４で、生成した全ての音楽片を処理したのか否かを判断して、全ての音楽片を処理していないことを判断するときには、ステップ６に進んで、未処理の音楽片を１つ選択する。
【００５３】
続いて、ステップ７で、その選択した音楽片の持つ最初の４拍の区間を特徴点を選出するための区間として限定（抽出）し、続くステップ８で、最初の８拍内の最も長い音で、かつ鳴り出しがその限定した区間（最初の４拍）にある音を特徴点として決定（選出）し、続くステップ９で、その決定した特徴点から１２拍までをその音楽片の特徴量（特徴点起点音楽部分特徴ベクトル）として導出（生成）してから、次の音楽片を処理すべくステップ４に戻る。
【００５４】
そして、ステップ４で、生成した全ての音楽片を処理したことを判断するときには、ステップ５に進んで、生成した特徴量（音楽片毎に求まる）を検索部１４に渡して、処理を終了する。
【００５５】
このようにして、特徴量生成部１３は、図８の処理フローに従う場合には、入力した音楽情報から、４拍ずつずらしながら１６拍の長さを持つ音楽片を切り出して旋律の特徴量を算出すると、図９に示すように、先ず最初に、その音楽片の最初の４拍を限定し、続いて、最初の８拍内の最も長い音で、かつ鳴り出しがその最初の４拍にある音を特徴点として決定し、続いて、その特徴点から１２拍ををその音楽片の特徴量（特徴点起点音楽部分特徴ベクトル）として導出するように処理するのである。
【００５６】
このように、特徴量生成部１３は、従来技術のように、音楽片の全体から特徴量を算出するという構成を採るのではなくて、音楽片の中に含まれる特徴点を検出する構成を採って、その特徴点を起点とする特徴量（特徴点起点音楽部分特徴ベクトル）を導出するように処理している。
【００５７】
すなわち、図１０に示す音楽情報で説明するならば、１８番目の音符から切り出される音楽片について、従来技術では、その音楽片の先頭に位置する１８番目の音符を起点としつつ、その音楽片全体から「ＥｎｔｉｒｅＦｅａｔｕｒｅＶｅｃｔｏｒ」と表現すべき特徴量を生成するのに対して、特徴量生成部１３では、例えば、その音楽片の１９番目の音符を起点としつつ、その音楽片の一部分から「ＰａｒｔｉａｌＦｅａｔｕｒｅＶｅｃｔｏｒ」と表現すべき特徴量を生成するのである。
【００５８】
図３では図示しなかったが、特徴点選出区間限定部１３２や特徴点決定部１３３や特徴量導出部１３４の実行する処理については、色々な処理を用意しておいて、その中から検索精度の向上を実現する最適なものを選択できるようにしておくことが望まれる。
【００５９】
このような要求に応えるために、図１１（ａ）に示すように、異なるアルゴリズムに従って特徴点の選出区間を限定する複数のルールを用意しておき、その中からユーザの設定するルールを呼び出していくことで特徴点の選出区間を限定したり、図１１（ｂ）に示すように、異なるアルゴリズムに従って特徴点を決定する複数のルールを用意しておき、その中からユーザの設定するルールを呼び出していくことで特徴点を決定したり、図１１（ｃ）に示すように、異なるアルゴリズムに従って特徴量（特徴点起点音楽部分特徴ベクトル）を導出する複数のルールを用意しておき、その中からユーザの設定するルールを呼び出していくことで特徴量を導出するという構成を用いることが好ましい。
【００６０】
また、特徴量生成部１３は、図７や図９に示したような音の高さを示すＭＩＤＩコードの時系列データで定義される図１２（ｂ）に示すような特徴点起点音楽部分特徴ベクトルを生成するときに、そのＭＩＤＩコードのヒストグラムで定義される図１２（ｂ）に示すような特徴点起点音楽部分特徴ベクトルを生成することがある。
【００６１】
後述するように、このような異なる表現形式を持つ複数の特徴点起点音楽部分特徴ベクトルを使って音楽情報データベース１０を検索すると、検索精度を向上できるようになることから、特徴量生成部１３は、図１２に示すように、時系列データで定義される特徴点起点音楽部分特徴ベクトルを生成するときに、それから、それとは表現形式の異なる別の特徴点起点音楽部分特徴ベクトルを生成するように処理することがあるのである。
【００６２】
図１３及び図１４に、検索部１４の実行する処理フローの一実施形態例を図示する。ここで、図１３は、特徴量生成部１３が１種類の特徴量を生成するときに検索部１４が実行することになる処理を図示しており、図１４は、特徴量生成部１３が異なる表現形式を持つ複数の特徴量を生成するときに検索部１４が実行することになる処理を図示している。
【００６３】
次に、これらの処理フローに従って、特徴量生成部１３の生成する特徴量（特徴点起点音楽部分特徴ベクトル）を使って実行される検索処理について説明する。
【００６４】
検索部１４は、図１３の処理フローに従う場合には、ハミングから切り出された複数の音楽片の特徴量を特徴量生成部１３から渡されることで起動されると、先ず最初に、ステップ１で、特徴量生成部１３からそれらの各音楽片の持つ１種類の特徴量を受け取る。
【００６５】
続いて、ステップ２で、全ての音楽片を選択したのか否かを判断して、未処理の音楽片が残されていることを判断するときには、ステップ３に進んで、未処理の音楽片を１つ選択する。
【００６６】
続いて、ステップ４で、その選択した音楽片の特徴量を処理対象として設定し、続くステップ５で、音楽情報データベース１０を検索することで、その処理対象の特徴量に類似する特徴量を持つ例えば上位２０曲を検索して、次の音楽片を処理すべくステップ２に戻る。
【００６７】
一方、ステップ２で、全ての音楽片を選択したことを判断するときには、ステップ６に進んで、その検索した楽曲の持つ類似度（例えば距離を使って類似度を評価するときには、距離値が小さい値ほど高い類似を示す）にＯＲ演算を施すことで、ハミングに類似する例えば上位１０曲を特定する。
【００６８】
例えば、図１５に示すように、音楽片１の特徴量から、距離０．９の類似度を持つ楽曲Ｄ／距離１．５の類似度を持つ楽曲Ｂ／距離１．８の類似度を持つ楽曲Ｃ／距離５．８の類似度を持つ楽曲Ａが検索され、音楽片２の特徴量から、距離０．３の類似度を持つ楽曲Ａ／距離１．２の類似度を持つ楽曲Ｂ／距離２．０の類似度を持つ楽曲Ｃ／距離５．９の類似度を持つ楽曲Ｄが検索され、音楽片３の特徴量から、距離１．０の類似度を持つ楽曲Ｂ／距離１．２の類似度を持つ楽曲Ｃ／距離１．５の類似度を持つ楽曲Ｄ／距離６．０の類似度を持つ楽曲Ａが検索される場合には、楽曲Ａの最小距離が０．３で、楽曲Ｂの最小距離が１．０で、楽曲Ｃの最小距離が１．２で、楽曲Ｄの最小距離が０．９であるので、ＯＲ演算に従って、ハミングに最も類似する楽曲してＡ、次に類似する楽曲としてＤ、その次に類似する楽曲としてＢ、その次に類似する楽曲としてＣを特定するのである。
【００６９】
そして、最後に、ステップ７で、その特定した例えば上位１０曲を検索結果として出力して、処理を終了する。
【００７０】
また、検索部１４は、図１４の処理フローに従う場合には、ハミングから切り出された複数の音楽片の特徴量を特徴量生成部１３から渡されることで起動されると、先ず最初に、ステップ１で、特徴量生成部１３からそれらの各音楽片の持つ複数種類の特徴量を受け取る。
【００７１】
続いて、ステップ２で、全ての音楽片を選択したのか否かを判断して、未処理の音楽片が残されていることを判断するときには、ステップ３に進んで、未処理の音楽片を１つ選択する。
【００７２】
続いて、ステップ４で、その選択した音楽片の持つ特徴量を全て処理したのか否かを判断して、未処理のものが残されていることを判断するときには、ステップ５に進んで、その選択した音楽片の持つ特徴量の中から未処理のものを選択して処理対象とし、続くステップ６で、音楽情報データベース１０を検索することで、その処理対象の特徴量に類似する特徴量を検索して、次の特徴量を処理すべくステップ４に戻る。
【００７３】
このようにして、ステップ４〜ステップ６の処理を繰り返していくことで、ステップ４で、選択した音楽片の持つ特徴量を全て処理したことを判断するときには、ステップ７に進んで、ステップ６で検索した特徴量の持つ類似度の重み付線形和を算出し、その中から例えば上位２０個の特徴量を特定して、その特徴量を持つ例えば上位２０曲を検索する。そして、次の音楽片を処理すべくステップ２に戻る。
【００７４】
一方、ステップ２で、全ての音楽片を選択したことを判断するときには、ステップ８に進んで、その検索した楽曲の持つ類似度に上述のＯＲ演算を施すことで、ハミングに類似する例えば上位１０曲を特定し、続くステップ９で、その特定した例えば上位１０曲を検索結果として出力して、処理を終了する。
【００７５】
このようにして、検索部１４は、特徴量生成部１３から、ハミングから切り出された複数の音楽片の特徴量が渡されると、それを検索キーにして音楽情報データベース１０を検索することで、ハミングに類似する例えば上位１０曲を得て、それを出力するように処理するのである。
【００７６】
図１３及び図１４の処理フローでは、特徴量生成部１３がハミングの中から複数の音楽片を切り出してその特徴量を生成し、それを検索部１４に渡す処理を行うことで説明したが、特徴量生成部１３がハミングの中から例えば真ん中に位置する１つの音楽片のみの特徴量を生成し、それを検索部１４に渡す処理を行うということもある。
【００７７】
このときには、検索部１４は、図１３の処理フローに代えて図１６（ａ）の処理フローを実行し、また、図１４の処理フローに代えて図１６（ｂ）の処理フローを実行することになる。このときには、図１５で説明したＯＲ演算は行われない。
【００７８】
次に、本発明の有効性を検証するために行った実験の結果について説明する。
【００７９】
この実験は、音楽情報データベース１０が約１０，０００件の楽曲についての情報を管理する構成を採るときにあって、２５人のユーザが自分の選んだ任意の合計１８６曲をハミングするときに、その検索結果となる１０位までの曲名を得て、その曲名をハミングした曲と照らし合わせることで行った。
【００８０】
ここで、特徴点起点音楽部分特徴ベクトルは、音楽片の最初の４拍を限定して、その４拍の中の最も高い音を特徴点として決定し、その特徴点から１２．５拍を抽出することで得られるものを使った。
【００８１】
図１７に、本発明により得られる特徴点起点音楽部分特徴ベクトルを検索キーとする場合の実験結果と、従来技術により得られる「ＥｎｔｉｒｅＦｅａｔｕｒｅＶｅｃｔｏｒ」を検索キーとする場合の実験結果とを対比して示す。
【００８２】
この実験結果から分かるように、本発明により得られる特徴点起点音楽部分特徴ベクトルを検索キーとすると、ハミングした曲が１０位までに入る確率は約７５％であるのに対して、従来技術により得られる「ＥｎｔｉｒｅＦｅａｔｕｒｅＶｅｃｔｏｒ」を検索キーとすると、ハミングした曲が１０位までに入る確率は約６５％に止まり、これにより本発明の有効性を検証できた。
【００８３】
図１８に、ハミングから複数の音楽片を切り出して検索を行う場合の実験結果と、ハミングの真ん中に位置する１つの音楽片を切り出して検索を行う場合の実験結果とを対比して示す。前者の検索を行う場合には、図１５で説明したＯＲ演算を行うことで最終的な検索結果を得るように処理することになる。
【００８４】
この実験結果から分かるように、ハミングから複数の音楽片を切り出して検索を行う方が、１つの音楽片を切り出して検索を行うよりも検索精度を向上できることが検証できた。
【００８５】
図１９に、特徴点起点音楽部分特徴ベクトル（ＰａｒｔｉａｌＴｏｎｅＴｒａｎｓｉｔｉｏｎＦｅａｔｕｒｅＶｅｃｔｏｒ）を検索キーとする場合の実験結果と、特徴点起点音楽部分特徴ベクトルをヒストグラムの表現形式に変換した特徴ベクトル（ＴｏｎｅＤｉｓｔｒｉｂｕｔｉｏｎＦｅａｔｕｒｅＶｅｃｔｏｒ）を検索キーとする場合の実験結果と、その２つの特徴ベクトルを結合した形を検索キーとする場合の実験結果とを対比して示す。
【００８６】
この実験結果から分かるように、特徴点起点音楽部分特徴ベクトルとそれをヒストグラムの表現形式に変換した特徴ベクトルとの２つを検索キーとすると、特徴点起点音楽部分特徴ベクトルのみを検索キーとする場合よりも検索精度を向上できることが検証できた。
【００８７】
図２０に、音楽情報データベース１０の管理する特徴量を８分音符（ｑｕａｖｅｒ−ｎｏｔｅ）で記述する場合の実験結果と、４分音符（ｑｕａｔｅｒ−ｎｏｔｅ）で記述する場合の実験結果とを対比して示す。ここで、ハミングの音符形態と音楽情報データベースの音符形態とが異なる場合には、特徴量生成部１３は、ハミングの音符形態を音楽情報データベースの音符形態に変換してから、上述の処理を行う。
【００８８】
この実験結果から分かるように、本発明を８分音符の音符形態で実現する方が、４分音符の音符形態で実現するよりも検索精度を向上できることが検証できた。
【００８９】
これから、上述の実施形態例で説明しなかったが、音楽情報データベース１０の管理する特徴量を８分音符のような検索精度の向上を実現できる音符形態で構築するとともに、特徴量生成部１３がハミングの音符形態をその音楽情報データベース１０の音符形態に変換する機能を持つことが好ましい。
【００９０】
【発明の効果】
以上説明したように、本発明では、従来技術のように、音楽片全体から特徴量を算出するという構成を採るのではなくて、音楽片の中に含まれる特徴点を検出する構成を採って、その特徴点を起点とするいわば音楽片の部分特徴量とも言うべき特徴量を算出するように処理することから、入力された音楽情報に類似するものを音楽情報データベースから検索する場合に、入力された音楽情報を分割することで得た音楽片と音楽情報データベースに格納される分割された音楽片との先頭が一致しない場合にも、高い検索精度を実現できるようになる。
【図面の簡単な説明】
【図１】本発明の説明図である。
【図２】本発明の説明図である。
【図３】本発明の一実施形態例である。
【図４】ノイズ除去処理を説明するための楽譜の一例である。
【図５】特徴量生成部の実行する処理フローの一実施形態例である。
【図６】音楽片の旋律の特徴量の算出処理の説明図である。
【図７】特徴量生成部の実行する処理の説明図である。
【図８】特徴量生成部の実行する処理フローの他の実施形態例である。
【図９】特徴量生成部の実行する処理の説明図である。
【図１０】本発明を説明するための楽譜の一例である。
【図１１】特徴量生成部の構成例の説明図である。
【図１２】特徴量生成部の生成する特徴量の説明図である。
【図１３】検索部の実行する処理フローの一実施形態例である。
【図１４】検索部の実行する処理フローの一実施形態例である。
【図１５】検索結果を求めるためのＯＲ演算処理の説明図である。
【図１６】特徴量生成部の実行する処理フローの他の実施形態例である。
【図１７】本発明の有効性を検証するために行った実験結果の説明図である。
【図１８】本発明の有効性を検証するために行った実験結果の説明図である。
【図１９】本発明の有効性を検証するために行った実験結果の説明図である。
【図２０】本発明の有効性を検証するために行った実験結果の説明図である。
【図２１】従来技術の説明図である。
【符号の説明】
１音楽情報検索装置
１０音楽情報データベース
１１採譜部
１２ノイズ除去部
１３特徴量生成部
１４検索部
１３０音楽情報分割部
１３１音楽特徴量算出部
１３２特徴点選出区間限定部
１３３特徴点決定部
１３４特徴量導出部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a music information search device that searches music information using feature amounts of the music information, to a method thereof, and to a computer-readable recording medium storing a music information search program used to implement the music information search method.
[0002]
[Prior art]
Similarity search is an indispensable technology for building multimedia databases.
[0003]
For example, when searching for images of an apple, there is rarely a need to find an image showing exactly the same apple as the picture or photograph given as the search key. In most cases the user simply wants an image that shows an apple, and this calls for a similarity search technique along the lines of “find images similar to the apple image given as the search key.”
[0004]
As an extension of such content-based search to new media, there is active research on techniques for retrieving song titles and singers from a music information database using humming or a sung fragment of a melody.
[0005]
Against this background, the present inventors have also been studying a system that extracts feature amounts relating to tone transition and tone distribution from music and uses them to search a music information database (Naoko Kosugi, Yuichi Nishihara, Seiichi Kon'ya, Masashi Yamamuro, and Kazuhiko Kushima: Music Retrieval by Humming - Using Similarity Retrieval over High Dimensional Feature Vector Space -, PACRIM'99, pp. 404-407).
[0006]
The music search system proposed by the present inventors accepts a person's singing as the search key, so that a person who remembers only part of a song's melody can find the song's title, singer, performer, and so on.
[0007]
When searching for information about a song by singing, a person does not necessarily sing the chorus (sabi) or the opening of the song. The music search system proposed by the present inventors therefore divides the music data (melody data) finely, increasing the redundancy of the stored music data, so that a search succeeds no matter which part of the song the person sings.
[0008]
To realize this, the music search system proposed by the present inventors, as shown in FIG. 21, cuts out music pieces by dividing the music data at fixed intervals and, for each music piece, calculates a feature amount (feature amount vector) starting from the beginning of that music piece.
[0009]
The feature amounts calculated in this way are used to construct the music information database, and are also used to evaluate the similarity between the music data of the person's singing and the music data stored in the music information database.
[0010]
As described above, the conventional technology adopts a configuration in which the feature amount (feature amount vector) is calculated from the entire music piece starting from the beginning of the music piece.
[0011]
Note that the feature amount (feature amount vector) calculated by this conventional technique may be hereinafter referred to as “Entire Feature Vector” in consideration of being calculated from the entire music piece.
[0012]
[Problems to be solved by the invention]
However, according to such a conventional technique, there is a problem that the search accuracy is remarkably deteriorated when the singing part of the searcher does not coincide with the head of the divided music data (melody data).
[0013]
That is, because the prior art calculates the feature amount from the entire music piece starting from the beginning of the music piece, the search accuracy deteriorates significantly when the part sung by the searcher does not coincide with the beginning of a divided piece of music data (melody data).
[0014]
One conceivable way to solve this problem is to divide the music data more finely, but doing so raises a new problem: the amount of data increases explosively.
[0015]
The present invention has been made in view of these circumstances, and its object is to provide a new music information retrieval technique that makes it possible to improve search accuracy when building a system that searches for information about a song from a person's singing.
[0023]
[Means for Solving the Problems]
To achieve this object, the music information retrieval apparatus of the present invention comprises: generation means for detecting, for each of a plurality of music pieces cut out from input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and for generating feature amounts starting from those feature points; storage means for storing the feature amounts of the songs to be searched (feature amounts of the kind generated by the generation means); search means for searching the data stored in the storage means to retrieve, for each feature amount generated by the generation means, a plurality of songs having feature amounts similar to it; and specifying means for identifying, from among the songs retrieved by the search means, the song indicated by the input music information.
[0024]
Here, the generation means may detect a feature point of a single music piece cut out from the input music information and generate a feature amount starting from that feature point. In that case the specifying means is not provided, and the search means retrieves one or more songs having feature amounts similar to the generated feature amount and outputs them as the search result.
[0025]
In this configuration, conversion means may be provided for converting the input music information into a prescribed note form (for example, a note form such as eighth notes, which experiments have shown to improve search accuracy). In this case, the generation means generates feature amounts from music pieces cut out from the music information converted by the conversion means.
The generation means may also detect the feature point within a section of prescribed length located at a prescribed temporal position in the music piece. In that case, the generation means may detect the feature point contained in that section by finding the note position that satisfies a condition on pitch or duration imposed on feature points.
The generation means also generates a feature amount either from a portion of the music piece of prescribed length starting at the feature point, or from the portion of the music piece that starts at the feature point and ends at the end of the music piece.
Since the feature amount (feature amount vector) calculated by the music information retrieval apparatus of the present invention is calculated from a part of a music piece, it may hereinafter be referred to as a “Partial Feature Vector.”
In the music information retrieval apparatus of the present invention configured as described above, the generation means detects each feature point of a plurality of music pieces cut out from the input music information, and uses those feature points as starting points. Generate feature values.
[0026]
Once these feature amounts have been generated, the search means searches the data stored in the storage means and retrieves, for each feature amount generated by the generation means, a plurality of songs having feature amounts similar to it. The specifying means then identifies the song indicated by the input music information, for example by applying an OR operation to the similarities of the retrieved songs.
[0027]
To improve search accuracy in this configuration, the generation means may generate a plurality of feature amounts by converting, for example, a generated feature amount in time-series form into another representation such as a histogram. The database means may then manage these feature amounts of differing representations.
[0028]
In this way, the music information retrieval apparatus according to the present invention does not adopt a configuration in which the feature amount is calculated from the entire music piece as in the prior art, but as shown in FIG. By adopting a configuration for detecting included feature points, processing is performed so as to calculate a feature amount that can be called a partial feature amount of a musical piece starting from the feature point.
In the music information retrieval apparatus of the present invention, as shown in FIG. 2, a music information database is constructed that manages the songs to be searched in association with their feature amounts calculated in this manner. The feature amount of the input music information is generated in the same manner and used as a search key to search the music information database, thereby retrieving the song indicated by the input music information. As a result, high search accuracy can be achieved even when the beginnings of the music pieces obtained by dividing the input music information do not coincide with the beginnings of the divided music pieces stored in the music information database.
[0029]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail according to embodiments.
[0030]
FIG. 3 shows an embodiment of the music information retrieval apparatus 1 according to the present invention.
[0031]
The music information retrieval apparatus 1 of the present invention searches for information (such as the song title) about a song hummed by a person. To perform this search, it comprises a music information database 10, a transcription unit 11, a noise removal unit 12, a feature amount generation unit 13, and a search unit 14.
[0032]
Here, the transcription unit 11, the noise removal unit 12, the feature amount generation unit 13, and the search unit 14 of the music information retrieval apparatus 1 of the present invention are specifically implemented as programs, and these programs can be stored on a suitable computer-readable recording medium such as a semiconductor memory.
[0033]
The music information database 10 manages the music information (song name / melody information) of the music to be searched, the feature amount of the music, and the attribute information (information of singer / player, etc.) of the music. The feature quantity of the music managed in the music information database 10 is generated and registered by the feature quantity generation unit 13.
[0034]
The transcription unit 11 is implemented, for example, with commercially available transcription software, and creates score information for the input humming.
[0035]
The score information created by the transcription unit 11 is in MIDI format and consists of information on the relative time that makes up the score and information on the absolute time that makes up the performance time of that score. To normalize time against the feature amounts managed by the music information database 10 (which are described in relative time), the transcription unit 11 passes to the noise removal unit 12 only the score information portion, which does not depend on absolute time.
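The distinction between relative and absolute time can be illustrated with a small sketch. In a Standard MIDI File, event times are counted in ticks and the header gives the number of ticks per quarter note, so dividing by that resolution yields positions in beats that do not depend on the tempo at which the humming was performed. The function name, the note tuple layout, and the resolution of 480 are assumptions made for this illustration, not details taken from the patent.

```python
def ticks_to_beats(notes_in_ticks, ticks_per_beat=480):
    """Convert (onset_tick, duration_ticks, pitch) notes to beat units.

    ticks_per_beat is the file's resolution (ticks per quarter note);
    480 is only an assumed example value.
    """
    return [(on / ticks_per_beat, dur / ticks_per_beat, pitch)
            for on, dur, pitch in notes_in_ticks]


# Hypothetical hummed notes: a quarter note followed by two eighth notes.
hummed = [(0, 480, 64), (480, 240, 65), (720, 240, 67)]
in_beats = ticks_to_beats(hummed)
```

Working in beats rather than seconds is what allows a slow or fast rendition of the same melody to be compared against the same stored feature amounts.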
[0036]
The noise removal unit 12 takes the music information (score information) passed from the transcription unit 11 as input and removes noise carried by the input humming, for example by deleting notes that stray from the central melody line, or, when a silent section of at least a certain length occurs within a certain section after the start of recording, by deleting the sound section preceding that silent section. For example, when the humming shown in FIG. 4 is input, the 1st, 4th, and 14th notes are treated as noise and removed.
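One of the rules above, deleting whatever precedes a long silence near the start of the recording, might be sketched as follows. The description does not give the threshold values, so `head_len` and `min_silence` are hypothetical parameters, as are the function name and the (onset, duration, pitch) note layout in beats. The sketch also acts only on the first qualifying silence, which is a simplification.

```python
def drop_leading_noise(notes, head_len=8.0, min_silence=2.0):
    """Drop notes that precede a long silence near the start of the input.

    notes: list of (onset, duration, pitch) tuples in beats.
    head_len and min_silence are assumed thresholds, not patent values.
    """
    notes = sorted(notes)
    for prev, nxt in zip(notes, notes[1:]):
        gap_start = prev[0] + prev[1]      # where the previous note ends
        gap = nxt[0] - gap_start           # silence before the next note
        if gap_start < head_len and gap >= min_silence:
            # Keep only the notes that start after the silent section.
            return [n for n in notes if n[0] >= nxt[0]]
    return notes


# A stray sound at beat 0, three beats of silence, then the real melody.
raw = [(0, 1, 60), (4, 1, 64), (5, 1, 65), (6, 2, 67)]
cleaned = drop_leading_noise(raw)
```

Removing notes that stray from the melody line would need a separate rule (for example, a pitch-jump threshold), which is not sketched here.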
[0037]
The feature amount generation unit 13 comprises a music information division unit 130 that divides the music information passed from the noise removal unit 12 into music pieces; a music feature amount calculation unit 131 that calculates the feature amount (melody feature amount) of each divided music piece; a feature point selection section limiting unit 132 that limits, within the calculated feature amount (melody feature amount), the section from which the feature point of the music piece is selected; a feature point determination unit 133 that determines the feature point within the limited section; and a feature amount derivation unit 134 that derives the feature amount starting from the determined feature point. With these, it generates the feature amounts of the input humming and passes them to the search unit 14.
[0038]
Here, the music feature amount calculated by the music feature amount calculation unit 131 corresponds one-to-one to the melody, whereas the feature amount derived by the feature amount derivation unit 134 is the feature amount that characterizes the present invention. For convenience of explanation, the feature amount derived by the feature amount derivation unit 134 may be referred to below as the “feature point starting music partial feature vector.”
[0039]
The search unit 14 searches the music information database 10 using the feature amounts of the humming passed from the feature amount generation unit 13 as search keys, thereby retrieving the title and other information of the hummed song.
[0040]
FIG. 5 illustrates an example of a processing flow executed by the feature value generation unit 13. Next, processing executed by the feature value generation unit 13 will be described according to this processing flow.
[0041]
When started by receiving the humming music information from the noise removal unit 12, the feature amount generation unit 13 first, in step 1, inputs that music information (the subject of the search), as shown in the processing flow of FIG. 5.
[0042]
Subsequently, in step 2, the input music information is divided into music pieces, for example by cutting out music pieces 16 beats long while shifting the start by 4 beats each time. This division corresponds to the processing of the music information division unit 130.
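The division in step 2 amounts to a sliding window. A minimal sketch follows, assuming the melody is already available as a list with one pitch value per beat. That representation, the function name, and the choice to drop a trailing window shorter than 16 beats are assumptions for illustration. The 16-beat length and 4-beat shift are the example values given above.

```python
def split_into_pieces(melody, piece_len=16, hop=4):
    """Cut overlapping music pieces of piece_len beats, advancing hop beats each time.

    Windows that would run past the end of the melody are dropped (an assumption).
    """
    pieces = []
    for start in range(0, len(melody) - piece_len + 1, hop):
        pieces.append(melody[start:start + piece_len])
    return pieces


melody = list(range(60, 84))          # 24 beats of placeholder pitches
pieces = split_into_pieces(melody)    # windows starting at beats 0, 4 and 8
```

Because consecutive pieces overlap by 12 beats, most of the melody is covered by several pieces, which is the redundancy the description relies on.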
[0043]
Subsequently, in step 3, the melody feature amount of each music piece is calculated. Taking as a concrete example the case where the melody feature amount is calculated using MIDI codes indicating pitch, as shown in FIG. 6, the unit calculates a melody feature amount such as “64, 64, 65, 65, 67, 67, 65, 65”, or a melody feature amount such as “-1, -1, 0, 0, 2, 2, 0, 0” that takes 65 as the base tone. This calculation corresponds to the processing of the music feature amount calculation unit 131.
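The second form of the melody feature amount is simply each MIDI code minus the base tone. The sketch below reproduces the numbers from the example. How the base tone is chosen is not stated in this passage, so it is passed in as a parameter, and the function name is hypothetical.

```python
def relative_to_base(pitches, base_tone):
    """Express each MIDI pitch as a semitone offset from base_tone."""
    return [p - base_tone for p in pitches]


pitches = [64, 64, 65, 65, 67, 67, 65, 65]
relative = relative_to_base(pitches, 65)   # [-1, -1, 0, 0, 2, 2, 0, 0]
```

The relative form is unchanged when the whole melody is transposed together with its base tone, which is useful when the hummed key differs from the stored one.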
[0044]
Subsequently, in step 4, it is determined whether all the generated music pieces have been processed. If not all of them have been processed, the process proceeds to step 6 and selects one unprocessed music piece.
[0045]
Subsequently, in step 7, the section of the central portion of the selected music piece is limited (extracted) as a section for selecting feature points. For example, the central four beats of a musical piece having a length of 16 beats are limited. This limitation process corresponds to the process of the feature point selection section limitation unit 132.
[0046]
Subsequently, in step 8, the highest sound in the limited section is determined (selected) as a feature point. This determination process corresponds to the process of the feature point determination unit 133.
[0047]
Subsequently, in step 9, the span from the determined feature point to the last beat of the music piece is derived (generated) as the feature amount of that music piece (feature point starting music partial feature vector), and the process returns to step 4 to process the next music piece. This derivation corresponds to the processing of the feature amount derivation unit 134.
[0048]
When it is determined in step 4 that all the generated music pieces have been processed, the process proceeds to step 5, passes the generated feature amounts (one per music piece) to the search unit 14, and ends.
[0049]
In this way, when following the processing flow of FIG. 5, the feature amount generation unit 13 cuts out music pieces 16 beats long from the input music information while shifting by 4 beats and calculates their melody feature amounts. Then, as shown in FIG. 7, it first limits the central 4 beats of each music piece, next determines the highest note among those 4 beats as the feature point, and finally derives the span from that feature point to the last beat as the feature amount of the music piece (feature point starting music partial feature vector).
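The three steps of the FIG. 5 flow can be sketched as follows, again assuming a music piece is a list with one MIDI pitch per beat. The function name, the per-beat representation, and the tie-breaking rule (the earliest of several equally high notes wins) are assumptions made for illustration.

```python
def partial_vector_highest_in_center(piece, window=4):
    """Return (feature_point_index, partial_feature_vector) for one music piece.

    1. Limit the search to the central `window` beats.
    2. Take the highest pitch in that section as the feature point
       (the first one if there are ties, which is an assumption).
    3. Derive the vector from the feature point to the last beat.
    """
    center_start = (len(piece) - window) // 2
    section = piece[center_start:center_start + window]
    feature_point = center_start + section.index(max(section))
    return feature_point, piece[feature_point:]


piece = [60, 62, 64, 65, 67, 65, 64, 62, 69, 72, 71, 69, 67, 65, 64, 62]
fp, vector = partial_vector_highest_in_center(piece)
```

For this 16-beat example the central section covers indices 6 to 9, the highest pitch there is 72 at index 9, and the derived vector is the last seven beats.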
[0050]
FIG. 8 illustrates another embodiment of the processing flow executed by the feature quantity generation unit 13.
[0051]
When following this processing flow, the feature amount generation unit 13, once started by receiving the humming music information from the noise removal unit 12, first inputs the music information in step 1, then cuts out music pieces from the input music information in step 2, and then calculates the melody feature amount of each music piece in step 3.
[0052]
Subsequently, in step 4, it is determined whether all the generated music pieces have been processed. If not all of them have been processed, the process proceeds to step 6 and selects one unprocessed music piece.
[0053]
Subsequently, in step 7, the first 4 beats of the selected music piece are limited (extracted) as the section from which the feature point is selected. In step 8, the note that is the longest within the first 8 beats and whose onset lies in the limited section (the first 4 beats) is determined (selected) as the feature point. In step 9, the 12 beats starting from the determined feature point are derived (generated) as the feature amount of the music piece (feature point starting music partial feature vector), and the process returns to step 4 to process the next music piece.
[0054]
When it is determined in step 4 that all the generated music pieces have been processed, the process proceeds to step 5, where the generated feature quantities (obtained for each music piece) are passed to the search unit 14, and the process ends.
[0055]
In this way, when following the processing flow of FIG. 8, the feature quantity generation unit 13 cuts out music pieces 16 beats long from the input music information while shifting by 4 beats, and calculates the melody feature value of each music piece. Then, as shown in FIG. 9, it first limits the selection section to the first 4 beats of the music piece, next determines as the feature point the longest sound within the first 8 beats whose onset falls in the first 4 beats, and finally derives the 12 beats starting from the feature point as the feature quantity of the music piece (feature point starting music partial feature vector).
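The FIG. 8 variant can be sketched in the same way. This is an illustrative sketch only: the `(onset, duration, pitch)` tuple layout, the function name, measuring a sound's length only up to the 8-beat boundary, and the tie-break toward the earlier onset are assumptions that the description does not specify.

```python
from typing import List, Optional, Tuple

# A note is a hypothetical (onset, duration, pitch) tuple measured in beats.
Note = Tuple[float, float, int]


def longest_onset_span(notes: List[Note],
                       piece_start: float) -> Optional[Tuple[float, float]]:
    """Return the 12-beat range starting at the feature point, or None."""
    section_end = piece_start + 4  # feature points must start in the first 4 beats
    window_end = piece_start + 8   # lengths are compared within the first 8 beats

    def length_in_window(note: Note) -> float:
        onset, duration, _ = note
        # Assumption: only the part of the sound before beat 8 is counted.
        return min(onset + duration, window_end) - onset

    candidates = [n for n in notes if piece_start <= n[0] < section_end]
    if not candidates:
        return None  # no sound starts in the limited section
    # Longest sound wins; ties go to the earlier onset (assumption).
    onset = max(candidates, key=lambda n: (length_in_window(n), -n[0]))[0]
    return (onset, onset + 12)
```

Because the feature point starts within the first 4 beats, the 12-beat range always ends inside the 16-beat music piece.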
[0056]
As described above, the feature quantity generation unit 13 does not calculate the feature quantity from the entire music piece as in the prior art. Instead, it detects a feature point included in the music piece and derives a feature quantity (feature point starting music partial feature vector) that starts from that feature point.
[0057]
That is, in the case of the music information shown in FIG. 10, for a music piece cut out from the 18th note, the prior art generates a feature quantity from the entire music piece, starting from the 18th note located at the beginning of the music piece. In contrast, the feature quantity generation unit 13 generates a feature quantity from a part of the music piece, for example starting from the 19th note, which may be called a “Partial Feature Vector”.
[0058]
Although not shown in FIG. 3, it is desirable to prepare various alternatives for the processes executed by the feature point selection section limiting unit 132, the feature point determination unit 133, and the feature quantity deriving unit 134, so that the optimum ones for improving search accuracy can be selected.
[0059]
To meet this requirement, it is preferable to use the following configuration. As shown in FIG. 11(a), a plurality of rules that limit the feature point selection section according to different algorithms are prepared, and the section is limited by calling the rule set by the user. As shown in FIG. 11(b), a plurality of rules that determine the feature point according to different algorithms are prepared, and the feature point is determined by calling the rule set by the user. As shown in FIG. 11(c), a plurality of rules that derive the feature quantity (feature point starting music partial feature vector) according to different algorithms are prepared, and the feature quantity is derived by calling the rule set by the user.
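One way to picture the selectable rules of FIG. 11 is as three lookup tables of functions, one per stage. The sketch below is purely illustrative: the rule names, the `(onset, duration, pitch)` tuple layout, and the `derive_span` helper are hypothetical, and only the two rule combinations described in the embodiments are registered.

```python
from typing import Callable, Dict, List, Optional, Tuple

Note = Tuple[float, float, int]  # hypothetical (onset, duration, pitch) in beats
Span = Tuple[float, float]

# Rules for limiting the feature point selection section (cf. FIG. 11(a)).
SECTION_RULES: Dict[str, Callable[[float], Span]] = {
    "first4": lambda start: (start, start + 4),
    "central4": lambda start: (start + 6, start + 10),
}

# Rules for determining the feature point (cf. FIG. 11(b)).
POINT_RULES: Dict[str, Callable[[List[Note]], Note]] = {
    "highest": lambda notes: max(notes, key=lambda n: n[2]),
    "longest": lambda notes: max(notes, key=lambda n: n[1]),
}

# Rules for choosing the range the feature quantity is derived from (cf. FIG. 11(c)).
SPAN_RULES: Dict[str, Callable[[float, float], Span]] = {
    "to_end": lambda point, start: (point, start + 16),
    "12beats": lambda point, start: (point, point + 12),
}


def derive_span(notes: List[Note], piece_start: float, section_rule: str,
                point_rule: str, span_rule: str) -> Optional[Span]:
    """Apply the three user-selected rules in order to one 16-beat piece."""
    lo, hi = SECTION_RULES[section_rule](piece_start)
    candidates = [n for n in notes if lo <= n[0] < hi]
    if not candidates:
        return None
    feature_point = POINT_RULES[point_rule](candidates)
    return SPAN_RULES[span_rule](feature_point[0], piece_start)
```

Keeping the stages independent lets a user mix rules, for example the first-four-beat section with the highest-sound rule, without changing the surrounding code.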
[0060]
Also, when generating a feature point starting music partial feature vector defined by time-series data of MIDI codes indicating the pitch of the sound, as shown in FIG. 12, the feature quantity generation unit 13 may additionally generate a feature point starting music partial feature vector defined by the histogram of the MIDI codes, as shown in FIG. 12B.
[0061]
As will be described later, searching the music information database 10 using a plurality of feature point starting music partial feature vectors with such different representation formats can improve search accuracy. Therefore, as shown in FIG. 12, when a feature point starting music partial feature vector defined by time-series data is generated, another feature point starting music partial feature vector with a different representation format may then be generated from it.
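The conversion from a time-series representation to a histogram representation can be illustrated as follows. This is a sketch under assumptions: the function name is hypothetical, the time series is taken to be a list of MIDI note numbers with one entry per time step, and the histogram is normalized to relative frequencies, which the description does not state.

```python
from collections import Counter
from typing import Dict, List


def tone_distribution(series: List[int]) -> Dict[int, float]:
    """Convert a MIDI pitch time series into a normalized pitch histogram."""
    counts = Counter(series)
    total = len(series)
    # Relative frequency of each MIDI note number (normalization is an assumption).
    return {pitch: count / total for pitch, count in sorted(counts.items())}
```

The histogram discards the order of the sounds, so it complements the time-series vector rather than replacing it.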
[0062]
FIGS. 13 and 14 show an embodiment of the processing flow executed by the search unit 14. Here, FIG. 13 illustrates the processing that the search unit 14 executes when the feature quantity generation unit 13 generates one type of feature quantity, and FIG. 14 illustrates the processing that the search unit 14 executes when the feature quantity generation unit 13 generates a plurality of feature quantities with different representation formats.
[0063]
Next, the search processing executed in accordance with these processing flows, using the feature quantity (feature point starting music partial feature vector) generated by the feature quantity generation unit 13, will be described.
[0064]
When following the processing flow of FIG. 13, the search unit 14, on being started by receiving from the feature quantity generation unit 13 the feature quantities of a plurality of music pieces cut out from the humming, first receives, in step 1, one type of feature quantity for each music piece from the feature quantity generation unit 13.
[0065]
Subsequently, in step 2, it is determined whether all music pieces have been selected. When unprocessed music pieces remain, the process proceeds to step 3, where one unprocessed music piece is selected.
[0066]
Subsequently, in step 4, the feature quantity of the selected music piece is set as the processing target. In the subsequent step 5, the music information database 10 is searched for songs having feature quantities similar to the processing target, for example the top 20 songs, and the process returns to step 2 to process the next music piece.
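The per-piece search of step 5 can be pictured as a nearest-neighbor lookup. The sketch below is hypothetical in several respects: the description does not name a distance measure or an index structure, so a brute-force scan with Euclidean distance is assumed, and the `(song title, vector)` database layout and function name are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def top_k_similar(query: Sequence[float],
                  database: List[Tuple[str, Sequence[float]]],
                  k: int = 20) -> List[Tuple[str, float]]:
    """Return the k database entries closest to the query, nearest first."""
    # Assumption: similarity is Euclidean distance (smaller means more similar).
    scored = [(math.dist(query, vector), title) for title, vector in database]
    scored.sort()
    return [(title, distance) for distance, title in scored[:k]]
```

A real database of about 10,000 songs with several stored pieces per song would likely use an index rather than a full scan; the scan is used here only to keep the example short.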
[0067]
On the other hand, when it is determined in step 2 that all music pieces have been selected, the process proceeds to step 6, where an OR operation is performed on the similarity values of the retrieved songs (for example, when similarity is evaluated using distance, a smaller distance value means higher similarity), thereby identifying, for example, the top 10 songs similar to the humming.
[0068]
For example, as shown in FIG. 15, suppose that the following songs are retrieved. From the feature quantity of music piece 1: song D at distance 0.9, song B at distance 1.5, song C at distance 1.8, and song A at distance 5.8. From the feature quantity of music piece 2: song A at distance 0.3, song B at distance 1.2, song C at distance 2.0, and song D at distance 5.9. From the feature quantity of music piece 3: song B at distance 1.0, song C at distance 1.2, song D at distance 1.5, and song A at distance 6.0. In this case, the minimum distance of song A is 0.3, that of song B is 1.0, that of song C is 1.2, and that of song D is 0.9. Accordingly, song A is identified as the song most similar to the humming, followed by song D, then song B, and then song C.
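The OR operation of the FIG. 15 example amounts to keeping, for each song, the smallest distance found across all music pieces and ranking by that value. The following sketch reproduces the example numerically. The function name and data layout are hypothetical, and the song labels for the 1.8 entry of music piece 1 and the 6.0 entry of music piece 3 are assumed to be C and A respectively; those two entries do not affect the stated minimum distances.

```python
from typing import Dict, List, Tuple


def or_merge(results_per_piece: List[Dict[str, float]],
             top_n: int = 10) -> List[Tuple[str, float]]:
    """Keep each song's minimum distance over all pieces and rank ascending."""
    best: Dict[str, float] = {}
    for results in results_per_piece:
        for song, distance in results.items():
            if song not in best or distance < best[song]:
                best[song] = distance
    return sorted(best.items(), key=lambda item: item[1])[:top_n]


# Distances per music piece, following the FIG. 15 example.
fig15 = [
    {"D": 0.9, "B": 1.5, "C": 1.8, "A": 5.8},  # music piece 1 (label C assumed)
    {"A": 0.3, "B": 1.2, "C": 2.0, "D": 5.9},  # music piece 2
    {"B": 1.0, "C": 1.2, "D": 1.5, "A": 6.0},  # music piece 3 (label A assumed)
]
ranking = or_merge(fig15)
# ranking == [("A", 0.3), ("D", 0.9), ("B", 1.0), ("C", 1.2)]
```

Taking the minimum means a song ranks highly as soon as any one music piece of the humming matches it well, which suits humming that overlaps only part of a stored piece.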
[0069]
Finally, in step 7, the identified songs, for example the top 10, are output as the search result, and the process ends.
[0070]
When following the processing flow of FIG. 14, the search unit 14, on being started by receiving from the feature quantity generation unit 13 the feature quantities of a plurality of music pieces cut out from the humming, first receives, in step 1, a plurality of types of feature quantities for each music piece from the feature quantity generation unit 13.
[0071]
Subsequently, in step 2, it is determined whether all music pieces have been selected. When unprocessed music pieces remain, the process proceeds to step 3, where one unprocessed music piece is selected.
[0072]
Subsequently, in step 4, it is determined whether all the feature quantities of the selected music piece have been processed. When unprocessed ones remain, the process proceeds to step 5, where one unprocessed feature quantity of the selected music piece is selected as the processing target. In step 6, the music information database 10 is searched for feature quantities similar to the processing target, and the process returns to step 4 to process the next feature quantity.
[0073]
When it is determined in step 4, after repeating steps 4 to 6 in this way, that all the feature quantities of the selected music piece have been processed, the process proceeds to step 7. There, the weighted linear sum of the similarities of the retrieved feature quantities is calculated, for example the top 20 feature quantities are identified, and the songs having those feature quantities are retrieved as the top 20 songs. The process then returns to step 2 to process the next music piece.
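The weighted linear sum of step 7 can be sketched as below. The weights, the feature type names, and the data layout are hypothetical, since the description does not give them; the sketch also assumes that only candidates scored under every feature type are combined, which is one of several possible ways to handle candidates missing from one result list.

```python
from typing import Dict, List, Tuple


def combine_similarities(distances_by_type: Dict[str, Dict[str, float]],
                         weights: Dict[str, float],
                         top_n: int = 20) -> List[Tuple[str, float]]:
    """Rank candidates by the weighted linear sum of their per-type distances."""
    if not distances_by_type:
        return []
    # Assumption: keep only candidates that every feature type has scored.
    common = set.intersection(*(set(d) for d in distances_by_type.values()))
    combined = {
        candidate: sum(weights[t] * distances_by_type[t][candidate]
                       for t in distances_by_type)
        for candidate in common
    }
    # Smaller combined distance ranks first; ties are broken by candidate name.
    return sorted(combined.items(), key=lambda item: (item[1], item[0]))[:top_n]
```

For instance, with equal weights of 0.5 for a tone-transition distance and a tone-distribution distance, a candidate with distances 2.0 and 0.0 scores 1.0, the same as a candidate with distances 1.0 and 1.0.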
[0074]
On the other hand, when it is determined in step 2 that all music pieces have been selected, the process proceeds to step 8, where the above-mentioned OR operation is performed on the similarities of the retrieved songs to identify, for example, the top 10 songs similar to the humming. In the following step 9, the identified songs, for example the top 10, are output as the search result, and the process ends.
[0075]
In this way, when the feature quantities of a plurality of music pieces cut out from the humming are passed from the feature quantity generation unit 13, the search unit 14 searches the music information database 10 using those feature quantities as search keys, obtains, for example, the top 10 songs similar to the humming, and outputs them.
[0076]
In the processing flows of FIG. 13 and FIG. 14, it is assumed that the feature quantity generation unit 13 cuts out a plurality of music pieces from the humming, generates their feature quantities, and passes them to the search unit 14. Alternatively, the feature quantity generation unit 13 may, for example, generate the feature quantity of only the one music piece located in the middle of the humming and pass it to the search unit 14.
[0077]
In this case, the search unit 14 executes the processing flow of FIG. 16A instead of that of FIG. 13, and the processing flow of FIG. 16B instead of that of FIG. 14. The OR operation described with reference to FIG. 15 is not performed.
[0078]
Next, the results of experiments conducted to verify the effectiveness of the present invention will be described.
[0079]
In this experiment, the music information database 10 was configured to manage information on about 10,000 songs, and 25 users hummed a total of 186 songs of their own choosing. For each humming, the search results were obtained up to the 10th-ranked song title, and these titles were checked against the hummed song.
[0080]
Here, the feature point starting music partial feature vector was obtained by limiting the selection section to the first four beats of the music piece, determining the highest sound within those four beats as the feature point, and extracting 12.5 beats from the feature point.
[0081]
FIG. 17 compares the experimental results obtained when the feature point starting music partial feature vector obtained by the present invention is used as a search key with the experimental results obtained when the “Entire Feature Vector” obtained by the prior art is used as a search key.
[0082]
As can be seen from the experimental results, when the feature point starting music partial feature vector obtained by the present invention is used as a search key, the probability that the hummed song ranks within the top 10 is about 75%, whereas when the “Entire Feature Vector” obtained by the prior art is used as a search key, that probability is only about 65%. The effectiveness of the present invention is thus verified.
[0083]
FIG. 18 compares the experimental results when a search is performed by cutting out a plurality of music pieces from the humming with the experimental results when a search is performed by cutting out the single music piece located in the middle of the humming. In the former search, the final search result is obtained by performing the OR operation described with reference to FIG. 15.
[0084]
As can be seen from the experimental results, it was verified that searching with a plurality of music pieces cut out from the humming yields higher search accuracy than searching with a single music piece.
[0085]
FIG. 19 compares the experimental results for three search keys: the feature point starting music partial feature vector (Partial Tone Transition Feature Vector), the feature vector obtained by converting the feature point starting music partial feature vector into a histogram representation format (Tone Distribution Feature Vector), and a combination of the two feature vectors.
[0086]
As can be seen from the experimental results, it was verified that using both the feature point starting music partial feature vector and the feature vector obtained by converting it into a histogram representation format as search keys improves search accuracy over using only the feature point starting music partial feature vector as the search key.
[0087]
FIG. 20 compares the experimental results when the feature quantities managed by the music information database 10 are described in eighth-note form with the experimental results when they are described in quarter-note form. When the note form of the humming differs from the note form of the music information database, the feature quantity generation unit 13 converts the note form of the humming into the note form of the music information database and then performs the above-described processing.
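One plausible reading of "note form" is the time resolution at which a melody is sampled into a pitch series. Under that assumption, which the description does not confirm, the conversion can be sketched as resampling a note list at one step per beat (quarter-note form) or two steps per beat (eighth-note form). The function name and tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Note = Tuple[float, float, int]  # hypothetical (onset, duration, pitch) in beats


def to_time_series(notes: List[Note], total_beats: int,
                   steps_per_beat: int) -> List[Optional[int]]:
    """Sample the pitch sounding at the start of each time step."""
    series: List[Optional[int]] = []
    for step in range(total_beats * steps_per_beat):
        t = step / steps_per_beat
        pitch = None  # None marks a step where no sound is present
        for onset, duration, p in notes:
            if onset <= t < onset + duration:
                pitch = p
                break
        series.append(pitch)
    return series
```

With the notes `[(0, 1, 60), (1, 0.5, 62), (1.5, 0.5, 64), (2, 2, 65)]` over 4 beats, sampling at one step per beat skips the short sound at pitch 64, while sampling at two steps per beat keeps it, which illustrates why a finer note form can preserve more melodic detail.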
[0088]
As can be seen from the experimental results, it has been verified that the search accuracy can be improved by realizing the present invention in the form of an eighth note than in the form of a quarter note.
[0089]
From this, although not explained in the above-described embodiment, it is preferable that the feature quantities managed by the music information database 10 be constructed in a note form that can improve search accuracy, such as eighth notes, and that the feature quantity generation unit 13 have a function of converting the note form of the humming into the note form of the music information database 10.
[0090]
[Effect of the invention]
As described above, instead of calculating the feature quantity from the whole music piece as in the prior art, the present invention detects a feature point included in the music piece and calculates a feature quantity starting from that feature point, which can be called a partial feature quantity of the music piece. Therefore, when the music information database is searched for items similar to the input music information, high search accuracy can be achieved even when the beginning of a music piece obtained by dividing the input music information does not coincide with the beginning of a divided music piece stored in the music information database.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram of the present invention.
FIG. 2 is an explanatory diagram of the present invention.
FIG. 3 is an example of an embodiment of the present invention.
FIG. 4 is an example of a score for explaining noise removal processing.
FIG. 5 is an example of a processing flow executed by a feature amount generation unit.
FIG. 6 is an explanatory diagram of the process of calculating the melody feature value of a music piece.
FIG. 7 is an explanatory diagram of processing executed by a feature amount generation unit.
FIG. 8 is another embodiment of the processing flow executed by the feature quantity generation unit.
FIG. 9 is an explanatory diagram of processing executed by a feature amount generation unit.
FIG. 10 is an example of a score for explaining the present invention.
FIG. 11 is an explanatory diagram of a configuration example of a feature amount generation unit.
FIG. 12 is an explanatory diagram of a feature amount generated by a feature amount generation unit.
FIG. 13 is an example of a processing flow executed by a search unit.
FIG. 14 is an example of a processing flow executed by a search unit.
FIG. 15 is an explanatory diagram of an OR operation process for obtaining a search result.
FIG. 16 is another example of a processing flow executed by the search unit.
FIG. 17 is an explanatory diagram of the results of an experiment performed to verify the effectiveness of the present invention.
FIG. 18 is an explanatory diagram of the results of an experiment performed to verify the effectiveness of the present invention.
FIG. 19 is an explanatory diagram of the results of an experiment performed to verify the effectiveness of the present invention.
FIG. 20 is an explanatory diagram of the results of an experiment performed to verify the effectiveness of the present invention.
FIG. 21 is an explanatory diagram of a prior art.
[Explanation of symbols]
1 Music information retrieval device
10 Music information database
11 Transcription
12 Noise removal unit
13 Feature quantity generation unit
14 Search unit
130 Music Information Division
131 Music feature amount calculation unit
132 Feature point selection section limiting unit
133 Feature point determination unit
134 Feature quantity deriving unit

Claims

In a music information search device for searching for the title of a song indicated by input music information and/or attribute information of the song,
generation means for detecting, for one music piece cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating a feature amount starting from that feature point;
storage means for storing the feature amounts of the songs to be searched; and
search means for searching the stored data of the storage means to retrieve one or a plurality of songs having a feature amount similar to the feature amount generated by the generation means;
A music information search device characterized by comprising the above.

In a music information search device for searching for the title of a song indicated by input music information and/or attribute information of the song,
generation means for detecting, for each of a plurality of music pieces cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating feature amounts starting from those feature points;
storage means for storing the feature amounts of the songs to be searched;
search means for searching the stored data of the storage means to retrieve, for each of the feature amounts generated by the generation means, a plurality of songs having a similar feature amount; and
specifying means for specifying the song indicated by the input music information from among the songs retrieved by the search means;
A music information search device characterized by comprising the above.

The music information search device according to claim 2,
wherein the specifying means specifies the song indicated by the input music information by performing an OR operation on the similarities of the songs retrieved by the search means.
A music information search device characterized by the above.

The music information search device according to any one of claims 1 to 3,
further comprising conversion means for converting the input music information into a prescribed note form,
wherein the generation means generates the feature amount using, as the processing target, a music piece cut out from the music information converted by the conversion means.
A music information search device characterized by the above.

The music information search device according to any one of claims 1 to 3,
wherein the generation means detects the feature point using, as the section used for feature point detection, a section of a specified length located at a specified temporal position in the music piece.
A music information search device characterized by the above.

The music information search device according to claim 5,
wherein the generation means detects the feature point included in the section by detecting the position of a sound that satisfies a condition on pitch or length imposed on the feature point.
A music information search device characterized by the above.

The music information search device according to any one of claims 1 to 3,
wherein the generation means generates the feature amount from a portion of the music piece that has a specified length and starts from the feature point.
A music information search device characterized by the above.

The music information search device according to any one of claims 1 to 3,
wherein the generation means generates the feature amount from a portion of the music piece that starts from the feature point and ends at the end point of the music piece.
A music information search device characterized by the above.

The music information search device according to any one of claims 1 to 8,
wherein the generation means generates, in addition to the generated feature amount, another feature amount by converting the generated feature amount into another representation format.
A music information search device characterized by the above.

In a music information search method for searching for the title of a song indicated by input music information and/or attribute information of the song,
a process of detecting, for one music piece cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating a feature amount starting from that feature point; and
a process of searching the stored data of storage means that stores the feature amounts of the songs to be searched, to retrieve one or a plurality of songs having a feature amount similar to the generated feature amount;
A music information search method characterized by comprising the above.

In a music information search method for searching for the title of a song indicated by input music information and/or attribute information of the song,
a process of detecting, for each of a plurality of music pieces cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating feature amounts starting from those feature points;
a process of searching the stored data of storage means that stores the feature amounts of the songs to be searched, to retrieve, for each of the generated feature amounts, a plurality of songs having a similar feature amount; and
a process of specifying the song indicated by the input music information from among the retrieved songs;
A music information search method characterized by comprising the above.

A computer-readable recording medium storing a music information search program for searching for the title of a song indicated by input music information and/or attribute information of the song,
the program causing a computer to execute:
a process of detecting, for one music piece cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating a feature amount starting from that feature point; and
a process of searching the stored data of storage means that stores the feature amounts of the songs to be searched, to retrieve one or a plurality of songs having a feature amount similar to the generated feature amount.
A computer-readable recording medium on which the above music information search program is recorded.

A computer-readable recording medium storing a music information search program for searching for the title of a song indicated by input music information and/or attribute information of the song,
the program causing a computer to execute:
a process of detecting, for each of a plurality of music pieces cut out from the input music information, a feature point defined as a temporal position on the music piece that satisfies a predetermined condition, and generating feature amounts starting from those feature points;
a process of searching the stored data of storage means that stores the feature amounts of the songs to be searched, to retrieve, for each of the generated feature amounts, a plurality of songs having a similar feature amount; and
a process of specifying the song indicated by the input music information from among the retrieved songs.
A computer-readable recording medium on which the above music information search program is recorded.