JP2008216486A

JP2008216486A - Music reproduction system

Info

Publication number: JP2008216486A
Application number: JP2007051830A
Authority: JP
Inventors: Masaaki Yoda; 雅彰誉田; Kentaro Hikosaka; 健太郎彦坂; Toru Taniguchi; 徹谷口; Katsuhiko Shirai; 克彦白井; Yotaro Kubo; 陽太郎久保
Original assignee: Waseda University
Current assignee: Waseda University
Priority date: 2007-03-01
Filing date: 2007-03-01
Publication date: 2008-09-18

Abstract

PROBLEM TO BE SOLVED: To provide a music reproduction system that uses acoustic feature quantities to adaptively select songs suited to the user's mood, without requiring complicated operations from the user.

SOLUTION: As preprocessing, feature quantities are extracted from each song and a feature space is built from these song features (step S0). When playback starts, a song is selected from the song group and played (step S1). Songs are basically played in order of song score, but they are played at random from the start of playback until the song scores are first updated. When the user skips a song that he or she does not want to listen to (step S2), the song score, which is the priority of each song, is recalculated according to the song selection algorithm with reference to the feature space, carrying over the previous scores (step S3). The playlist is dynamically updated by repeating steps S1 to S3.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a music reproduction system that automatically selects and plays songs from a stored collection of songs.

In recent years, larger hard disks and faster networks have made it possible to use a PC (personal computer) as an audio player and to obtain, store, and play back large amounts of music. The spread of portable hard disk players has also made it easy to carry a large music collection anywhere. This shift in how music is played is expected to continue.
Non-Patent Document 1: Junichi Rekimoto, "A media playback mechanism that dynamically adapts to user preferences (UniversalPlaylist)", Interaction 2005, 2005.

Once a large music collection can be handled in this way, users are likely to want things such as "I do not want to pick songs myself, but I want some suitable music playing" or "I want to enjoy the surprise of random playback, but I want to leave out songs that do not fit my current mood." However, current music playback interfaces have changed little from earlier ones and cannot be said to meet these demands adequately. Current interfaces offer a "playlist function", in which the user specifies and stores in advance a group of songs to listen to, and a "random playback function", which plays songs in a completely random order. With the playlist function, however, the user must first register songs manually; the playlist never changes, so it offers no surprise, and once the user tires of it, another playlist has to be created by hand. The random playback function spares the user from specifying songs, but because songs on the hard disk are chosen at random regardless of genre or the artist's style, it tends to produce poorly flowing sequences, such as classical music right after hard rock, and there is no guarantee that the song the user wants to hear at the moment will be played.

As related research on automatic song selection, there is work on an algorithm that treats a user's multiple playlists as a set, dynamically changes the evaluation weight of each playlist according to the user's preferences, and computes the priority of each song from the playlist weights (Non-Patent Document 1). Song selection based on metadata allows fine-grained settings such as era, genre, artist, and experience, but it requires the user to think about many things and to perform complicated operations such as adding data. It also takes a long time before enough metadata has accumulated to be useful, and it cannot select songs in response to changes in mood or provide surprise. There are also playback methods in which the user ranks songs, but adding metadata is in itself a heavy burden on the user.

This song selection problem can be viewed as a kind of song classification and identification problem based on acoustic features. Related studies include genre classification of music using information extracted from the power spectrum, and evaluations comparing similarity based on acoustic information extracted from the power spectrum with subjective similarity. However, systems and playback interfaces that make effective use of acoustic feature quantities to select songs according to the user's changing mood are not yet well developed.

An object of the present invention is therefore to propose a music reproduction system that uses acoustic feature quantities to adaptively select songs suited to the user's mood, without requiring complicated operations from the user.

Claim 1 of the present invention is a music reproduction system that plays music according to a playlist indicating the playback order of songs, comprising: operation input means operable by the user; song data storage means for storing a plurality of pieces of song data; feature space generation means for extracting predetermined feature quantities from the song data and generating a feature space representing the similarity relationships among the songs; and playlist generation means for determining a playback priority for each piece of song data by obtaining the distance in the feature space between the song data designated by a predetermined operation input from the operation input means and each of the other pieces of song data, and for updating the playlist based on those priorities.

Because the user does not need to add any data, no complicated operation is required, and a playlist adapted to the user's mood can be generated dynamically and automatically while preserving the element of surprise in song selection.

The music reproduction system of claim 2 comprises a first information processing apparatus that includes the operation input means, the song data storage means, and the playlist generation means, and a second information processing apparatus that includes the feature space generation means.

This allows the system to be implemented on a relatively simple device such as a portable music player.

In the music reproduction system of claim 3, the feature space generation means generates the feature space using beat information of the songs as an element of the feature quantity.

In the music reproduction system of claim 4, the feature space generation means generates the feature space using timbre information of the songs as an element of the feature quantity.

In the music reproduction system of claim 5, the feature space generation means generates the feature space using power information of the songs as an element of the feature quantity.

This yields feature quantities that capture the subjective and objective characteristics of songs well and that are distributed in the feature space in separate clusters corresponding to the user's mood.

In the music reproduction system of claim 6, the predetermined operation input is a skip operation.

This lets the user designate an unwanted song through the operation the user naturally performs when such a song is played.

In the music reproduction system of claim 7, the playlist generation means sets a higher priority for songs that are farther away in the feature space.

Songs whose style is entirely different from the song the user does not want to hear are then played preferentially, so songs suited to the user's mood are more likely to be played.

In the music reproduction system of claim 8, the playlist generation means sets a higher priority for songs that are closer in the feature space.

Songs similar to the song designated by the user are then played, so a song recommendation system can be realized.

According to the present invention, it is possible to provide a music reproduction system that uses acoustic feature quantities to adaptively select songs suited to the user's mood, without requiring complicated operations from the user.

The present invention relates to a music reproduction system that automatically selects songs to match the user's preferences without requiring troublesome operations. It proposes a method that, based on the relationship between subjective preference and the acoustic features of musical sound, adaptively selects songs that fit the user's mood through online learning during playback.

Specifically, when the user judges that he or she does not want to listen to a certain song, the system recognizes the operation of skipping that song and applies an algorithm that calculates priorities, thereby building a playlist ordered by how likely the user is to want to hear each song. The algorithm selects songs that are far from the skipped songs in a feature space representing the similarity relationships among songs.

One advantage of the present invention is that, because it uses acoustic feature quantities, it requires no work from the user such as adding metadata. In addition, skipping a song one does not want to hear is a natural action for the user, so the system can be used in the same way as a conventional player. It can therefore be implemented even on small devices such as portable players. The user skips songs of his or her own will but does not know how the playlist is computed, so the element of surprise in song selection is preserved. The calculation is performed and the playlist is rebuilt dynamically each time the user skips a song, so there are as many possible playlists as there are combinations of songs. Furthermore, when the user adds songs, the feature space changes and the song selection changes accordingly.

The present invention assumes a target collection of about several thousand song files stored, for example, on the user's music player or on the HDD (hard disk drive) of a PC. On the premise that the songs on the HDD are ones the user wanted to listen to, it proposes a system that can select from among them the songs that fit the user's mood at playback time.

The user is assumed to be using an ordinary music player on which only standard operations such as "play" and "next song" are available. While playing a series of songs, the user makes the usual simple decision "I do not want to hear this song now, so I will skip it"; the system reflects that decision in subsequent selections and aims to avoid selecting songs that are highly similar to the skipped song. Because the user's choices are reflected in the selection one after another, changes in which songs the user does not want to hear, whether due to mood or circumstances, are also reflected. And because the system discriminates songs based on their acoustic features, it can select songs without relying on metadata attached to them, such as artist or genre.

A practical advantage of this method is that it can be realized without changing the conventional interface, even on a limited user interface such as that of a portable player. As applications of the present invention, it can be expected to be used for a song recommendation system based on acoustic signal analysis and for song retrieval.

Needless to say, the present invention is not limited to application under the general conditions described above and can be extended in various ways.

FIG. 1 schematically shows the flow of processing in this system. A more detailed flow is described later.

Song data is the general term for the data handled by this system and includes at least a song group, which is the set of songs (audio data) to be played; song features, used to judge the similarity between songs; and song scores, which are the priorities used in song selection. As preprocessing, feature quantities such as beat information, timbre information, and power composition information are extracted from the waveform signal of each song in the song group, and a feature space is built from these song features (step S0). When playback starts, a song is selected from the song group and played (step S1). Songs are basically played in order of song score, but, as described later, all songs start with the same initial score, so songs are selected at random from the start of playback until the scores are first updated. When the user skips a song that he or she does not want to hear (step S2), the song scores are recalculated according to the song selection algorithm with reference to the feature space, carrying over the previous scores (step S3). Repeating steps S1 to S3 dynamically changes the playlist.

As noted above, the preprocessing for this system first requires extracting the acoustic feature quantities needed by the song selection algorithm. Many candidate feature quantities exist for acoustic analysis, but because the song selection algorithm of the present invention computes with the distribution in the feature space, choosing the feature quantities appropriately is very important. Feature quantities that improve performance are considered to be those that capture the subjective and objective characteristics of songs well and that are distributed in the feature space in separate clusters corresponding to the user's mood.

Possible song characteristics include rhythm, tempo, chord progression, pitch information, and song structure information, but these are complex and very difficult to derive by computation. Using feature quantities related to these characteristics is nevertheless considered sufficient to represent the nature of a song appropriately.

The following three types of feature quantity are suitable for implementing the present invention: (1) beat information such as rhythm and tempo (beat spectrum), (2) timbre information (average MFCC), and (3) power information (power histogram).

Each feature quantity is described in detail below.

First, the extraction of beat information is described. One effective way to extract rhythm information is to compute a feature quantity called the beat spectrum from the autocorrelation of short-time features of the acoustic signal. This method derives the beat from self-similarity within the song and involves no band limiting, so it applies broadly, including to songs without rhythm instruments such as drums or bass and to songs with many quiet (or silent) passages. It has therefore been used in studies that clarify rhythmic similarity between songs, that search for music by rhythmic similarity, and that visualize the distribution of music based on acoustic features.

The beat spectrum is computed in the following steps: (1) parameterization of the acoustic signal, (2) computation of frame similarity, (3) construction of the distance matrix, and (4) derivation of the beat spectrum.

The procedure for computing the beat spectrum is described concretely below with reference to FIGS. 2 to 5.

(1) Parameterization of the acoustic signal
There are various ways to parameterize an acoustic signal; here the log power spectrum is used. As shown in FIG. 2, the acoustic signal is converted into a log power spectrum with a window length of 256 points and a shift of 128 points. The acoustic signal is quantized at 16 bits and sampled at 22 kHz, so the frame length is approximately 11 ms. The beat spectrum is extracted every 10 seconds using a 10-second time window.
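The parameterization step can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function names, the rectangular (unweighted) window, the naive DFT, and the `floor` guard against log(0) are all assumptions made to keep the example dependency-free. Only the window length of 256 and the shift of 128 come from the text.

```python
import cmath
import math

WINDOW = 256   # window length in samples (from the text)
SHIFT = 128    # shift size in samples (from the text)


def split_frames(signal, window=WINDOW, shift=SHIFT):
    """Cut a list of samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, shift)]


def log_power_spectrum(frame, floor=1e-10):
    """Return the log power of DFT bins 0..N/2 for one frame.

    A naive O(N^2) DFT keeps the sketch dependency-free; `floor` is an assumed
    guard against log(0) and is not part of the source description.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(math.log(abs(acc) ** 2 + floor))
    return spectrum
```

Each frame then becomes one feature vector for the similarity computation in step (2). A real implementation would more likely use an FFT routine and a tapered window.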

(2) Computing the frame similarity
The similarity is computed for every pair of frames in the signal. As shown in FIG. 3, the Euclidean distance in linear space is used as the distance measure.

Taking the cosine of the angle between the feature vectors reduces the dependence on their magnitudes.
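The formulas for these two measures are not reproduced in this text. Assuming that v_i and v_j denote the log power spectrum vectors of frames i and j, and writing D_E and D_C as illustrative names for the two measures, the standard definitions would read:

```latex
D_E(i,j) = \lVert v_i - v_j \rVert = \sqrt{\sum_k \bigl(v_i[k] - v_j[k]\bigr)^2},
\qquad
D_C(i,j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}
```

D_C depends only on the angle between the two vectors, so scaling either frame by a positive constant leaves it unchanged. This is the sense in which the dependence on magnitude is removed.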

(3) Constructing the distance matrix
As shown in FIG. 4, a distance matrix is constructed whose elements are the similarities between frames. The element in row i and column j is the similarity between v_i and v_j.
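As a sketch of this step, the matrix can be filled by evaluating a similarity measure on every pair of frame vectors. The cosine measure is used here as one of the two options the text mentions; the function names and the choice to return 0.0 for an all-zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity_matrix(frames):
    """Return S where S[i][j] is the similarity between frame vectors i and j."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = cosine_similarity(frames[i], frames[j])
            matrix[i][j] = matrix[j][i] = value  # the measure is symmetric
    return matrix
```

Because the measure is symmetric, only the upper triangle is computed and then mirrored, which roughly halves the work for a 10-second segment.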

(4) Deriving the beat spectrum
A simple method of computing the beat spectrum is as follows.

The beat spectrum is computed over the entire song, and its average is used as a one-dimensional feature quantity.
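The formula for this step is not reproduced in this text. A commonly used simple estimate sums the similarity matrix along its diagonals, B(l) = sum over k of S(k, k + l). The sketch below assumes that estimate. Reading "one-dimensional feature quantity" as a single scalar averaged over all segments and lags is also an interpretation, and both function names are hypothetical.

```python
def beat_spectrum(matrix, max_lag=None):
    """B(l) = sum over k of S[k][k + l], for lags l = 0 .. max_lag - 1."""
    n = len(matrix)
    max_lag = n if max_lag is None else min(max_lag, n)
    return [sum(matrix[k][k + lag] for k in range(n - lag)) for lag in range(max_lag)]


def mean_beat_feature(spectra):
    """Average all beat-spectrum values of a song into one scalar feature."""
    values = [v for spectrum in spectra for v in spectrum]
    return sum(values) / len(values) if values else 0.0
```

For a signal whose frames repeat every two frames, the diagonal sums peak at lags 0, 2, 4, and so on. This is the kind of repetitive shape the text describes for a rock song.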

As an example of the result, FIG. 5 shows the beat spectrum of a rock song at a certain time. The beat spectrum has a repetitive shape, which is considered to capture the relatively strong beat repetition characteristic of rock.

Second, the long-term MFCC is described. As a feature quantity representing the timbre of a song, the long-term MFCC is used: MFCCs (mel-frequency cepstral coefficients), which represent the spectral envelope, averaged over the whole song. It is computed as follows. First, the MFCCs of all frames are computed over the entire song with a window length of 4096 points and a shift of 1280 points. The MFCC computation uses 40 filter banks and yields 13 dimensions. The average of all computed MFCCs over the song is then taken. The 12 dimensions remaining after removing the first dimension, which is the DC component, form the long-term MFCC of the song.
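The averaging step can be sketched as below. The per-frame MFCC extraction itself (4096-point window, 1280-point shift, 40 filter banks) is assumed to be done elsewhere and passed in as a list of 13-dimensional vectors. The function name is hypothetical.

```python
def long_term_mfcc(frame_mfccs):
    """Average per-frame MFCC vectors over a song and drop the first (DC) coefficient."""
    if not frame_mfccs:
        raise ValueError("at least one MFCC frame is required")
    dims = len(frame_mfccs[0])
    count = len(frame_mfccs)
    mean = [sum(frame[d] for frame in frame_mfccs) / count for d in range(dims)]
    return mean[1:]  # 13 coefficients in, 12 out
```

Averaging over the whole song discards temporal order, so the resulting vector describes the overall spectral envelope of the song and not its evolution over time.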

Third, the power histogram is described with reference to FIGS. 6 to 12. A histogram of short-time power is used as a feature quantity representing the power composition of a song. It is a histogram of the power per time interval; it can express, for example, how intense the climaxes of a song are, and is considered to capture structural characteristics of the song. It is computed as follows.

First, for the waveform data W(t) of the song shown in FIG. 6 (t is time), the short-time power is computed at fixed time intervals over the entire song. That is, the power is obtained while moving a window of width size seconds in steps of shift seconds. Both size and shift may be set to suitable values, for example 0.8 seconds each (no overlap). The power of each window is obtained as shown in FIG. 7. The power of the i-th window is the sum of the squared values of W(t) within that window.

The power here is computed simply as a sum of squares, but to match human hearing it could also be computed on a logarithmic (decibel) scale.
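A minimal sketch of the short-time power computation follows. The text gives size and shift in seconds; here they are expressed in samples (seconds multiplied by the sampling rate), and the function names and the `floor` guard in the decibel variant are assumptions for illustration.

```python
import math


def short_time_power(samples, size, shift):
    """Sum of squared samples in each window of `size` samples, advanced by `shift` samples."""
    return [
        sum(x * x for x in samples[start:start + size])
        for start in range(0, len(samples) - size + 1, shift)
    ]


def to_decibel(power, floor=1e-12):
    """Optional log-scale variant; `floor` is an assumed guard against log10(0)."""
    return 10.0 * math.log10(max(power, floor))
```

With size equal to shift, as in the 0.8-second example, the windows tile the song without overlap and each sample contributes to exactly one power value.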

Next, as shown in FIG. 8, the Max value is defined as the largest short-time power, the range from the maximum (Max value) down to the minimum is divided into 10 equal parts, and a histogram over the whole song with 10 divisions by power value is created. The histogram boundary values are defined from the Max value: specifically, 0 is the first boundary value, giving 11 boundary values in total.

Two methods of creating the histogram are given below as examples.

The first method of creating the histogram is as follows. The number of frames whose power exceeds the i-th boundary value is the value of dimension i of the histogram. The first dimension is thus the number of frames with power greater than 0, which equals the total number of short-time power frames. Each of the resulting dimensions 1 to 11 is divided by the total number of frames, so the first dimension is always 1. Normalizing the frame count of a song to 1 in this way prevents the histogram shape from varying because songs differ in length (time normalization). Dimensions 2 to 11, that is, all but the first dimension, which is always 1, are used as a 10-dimensional feature quantity. As examples of results obtained with this first method, FIG. 9 shows the histogram for a classical song and FIG. 10 the histogram for a pop song.
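The first method can be sketched as below, following the description literally: 11 boundaries from 0 to Max, a strict "greater than" comparison, normalization by the frame count, and removal of the first dimension. The function name is hypothetical.

```python
def cumulative_power_histogram(powers, divisions=10):
    """Fraction of frames whose power exceeds each boundary, with the first boundary dropped."""
    if not powers:
        raise ValueError("at least one power value is required")
    top = max(powers)
    total = len(powers)
    # Boundaries 0, Max/10, ..., Max: 11 values for 10 divisions.
    boundaries = [top * k / divisions for k in range(divisions + 1)]
    fractions = [sum(1 for p in powers if p > b) / total for b in boundaries]
    return fractions[1:]  # drop dimension 1, which is 1 whenever every frame has power > 0
```

With a strict comparison, no frame exceeds the last boundary (Max itself), so the final element of this vector is 0 for every song. The other nine values form a non-increasing curve.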

The second method of creating the histogram is as follows. A total of 10 bins are created, with 0 to 1/10 × Max as the first bin, 1/10 × Max to 2/10 × Max as the second bin, and so on. The value of each bin is the number of frames in the song whose power falls within that bin's range. Each bin is divided by the total number of frames (time normalization). The values of bins 1 to 10 are used as a 10-dimensional feature quantity. As examples of results obtained with this second method, FIG. 11 shows the histogram for a classical song and FIG. 12 the histogram for a pop song.
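The second method corresponds to an ordinary equal-width histogram normalized by the frame count. The text does not say which bin a value lying exactly on a boundary belongs to. The sketch assumes half-open bins with Max placed in the last bin, and the function name is hypothetical.

```python
def binned_power_histogram(powers, bins=10):
    """Fraction of frames falling into each of `bins` equal-width bins between 0 and Max."""
    if not powers:
        raise ValueError("at least one power value is required")
    top = max(powers)
    counts = [0] * bins
    for p in powers:
        # Assumed edge rule: bin k covers [k*Max/bins, (k+1)*Max/bins); Max goes in the last bin.
        index = min(int(p / top * bins), bins - 1) if top > 0 else 0
        counts[index] += 1
    return [c / len(powers) for c in counts]
```

Unlike the first method, these values sum to 1, so the vector can be read directly as the proportion of the song spent at each power level.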

The 10-dimensional feature quantity obtained in this way is used. Note that the power values depend on the window length used to compute the power, so the feature quantity is expected to change with it as well.

The processing procedure of the song selection algorithm is described in detail below with reference to the flowcharts of FIGS. 13 and 14. Here x(i) is the feature vector of song i, and s(i, j) is the score of song i at the j-th song selection.

As preprocessing for the song selection algorithm (not shown in the figures), the three feature quantities described above are extracted for all songs, normalized, and stored for each song.

In FIG. 13, when the system starts, the same initial song score s(i, 1) (i = 1, 2, ..., n) is assigned to all n songs (step S10).

One of the n songs is selected at random and played (step S11). If the played song is one the user wants to hear now, the user performs no operation and simply listens to the randomly played song. This is repeated until the user input of step S13 occurs (steps S11, S12). When a song the user does not want to hear comes up at the j-th selection, the user performs a skip operation (steps S12, S13) and processing proceeds to step S14.

In step S14 the song scores are updated, and in step S15 the playlist is rebuilt from the new scores. This playlist reconstruction is described in detail with reference to FIG. 14. When the skip operation of step S13 occurs, the skipped song is marked as "do not want to hear" and its song index is denoted "dislike"; the index of the first of the N songs remaining in the playlist is denoted i (step S20). For each of the N songs, the distance d(i) in the feature space between the position vector x(dislike) of the song marked in step S13 and the position vector x(i) of the other song is computed, and its reciprocal is subtracted from that song's score (steps S21 to S24). From this description, Equation 4 corresponds to s(i, j) = s(i, j-1) - 1/d(i).

Steps S20 to S24 above correspond to step S14. Based on the scores computed in step S14, the playlist is updated by reordering it so that the songs not yet selected are played in descending order of score, starting from the song with the highest score, max s(i, j) (step S25). If several songs share the same score, their playback order is decided at random. Step S25 corresponds to step S15. The score calculation is not limited to Equation 4: with F(d) a function that increases monotonically with d(i), the score may instead be computed as s(i, j-1) - (1/F(d)). Various other modifications are possible; for example, d may be added to the initial score and the playlist reordered so that songs are played in ascending order of score, starting from the song with the lowest score, min s(i, j).
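Steps S20 to S25 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the distance is taken to be Euclidean, the skipped song's own score is left unchanged, a small `eps` guards against a zero distance, and the names `update_scores` and `rebuild_playlist` are hypothetical.

```python
import math
import random


def update_scores(scores, features, dislike, eps=1e-9):
    """Steps S20 to S24: subtract 1/d(i) from the score of every other song.

    Assumptions: Euclidean distance in the feature space, and eps to
    avoid division by zero when two songs have identical features.
    """
    x_dislike = features[dislike]
    new_scores = dict(scores)
    for i, x_i in features.items():
        if i == dislike:
            continue  # the skipped song is not compared with itself
        d = math.dist(x_i, x_dislike)
        new_scores[i] = scores[i] - 1.0 / max(d, eps)
    return new_scores


def rebuild_playlist(scores, unplayed, rng=random):
    """Step S25: order unplayed songs by descending score, ties at random."""
    songs = list(unplayed)
    rng.shuffle(songs)  # sorted() is stable, so tied songs keep this random order
    return sorted(songs, key=lambda i: scores[i], reverse=True)


features = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (4.0, 0.0)}
scores = update_scores({"a": 0.0, "b": 0.0, "c": 0.0}, features, dislike="a")
playlist = rebuild_playlist(scores, ["b", "c"])
```

Songs close to the skipped one receive a large penalty and drop toward the end of the playlist, while distant songs are penalized only slightly and are played sooner.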

Returning to FIG. 13, the next song is played in the order of the rebuilt playlist (step S16). If the song is one the user wants to hear now, the user simply listens to the songs played in playlist order without performing any operation. This repeats until the user input of step S18 occurs (steps S16 and S17). When a song the user does not want to hear comes up at the j-th selection, the user performs a skip operation (steps S17 and S18), which marks that song as "do not want to hear". Its song index is denoted "dislike", and steps S14 to S18 are then repeated.

Experiments conducted with this music selection algorithm are described next.

For the analysis data, each subject was asked to pick, from the music held on the subject's own PC or portable music player, 30 songs in each of three classes: songs to listen to when wanting to calm down (class 1), songs to listen to in a normal mood (class 2), and songs to listen to when wanting to lift the mood (class 3).

No genre or artist was specified for subjects A and B, who chose freely. As a result, subject A selected Western rock, pop, techno and similar music as well as instrumental pieces, and subject B selected Western and Japanese rock and pop as well as instrumental pieces including jazz and classical music. Subject C's data consists of 90 songs by a single artist in the rock genre.

All songs used here are monaural, quantized at 16 bits, and sampled at 22 kHz.

An analysis was carried out to examine the properties of each extracted feature. The data used for the analysis is that of a single subject, subject B.

First, the beat spectrum was analyzed. The beat spectrum obtained in the feature extraction described above is a high-dimensional feature, so the question is how to reduce it. Several reduction methods were tried. Because the beat spectrum is based on similarity within a single song, its values were taken to characterize the beat structure directly; on this basis, using the mean over the entire beat spectrum as the feature gave the best results and was adopted. FIG. 15 shows histograms of the feature value for each class. The feature is normalized to zero mean and unit variance. FIG. 15 shows that, although the classes partly overlap, the values are broadly separated by class.

Second, the long-term MFCC was analyzed. Of the long-term MFCC obtained, the first dimension, which is the DC component, was excluded, and the 12 dimensions from the 2nd to the 13th were used as the feature. FIG. 16 shows, for each class, a histogram of the first principal component obtained by principal component analysis of the 12-dimensional feature. Each dimension is normalized to zero mean and unit variance. The histograms show that class 1 and class 2 overlap substantially, while class 3 takes values distinct from the other classes.

Third, the power histogram was analyzed. Principal component analysis was applied to the 10-dimensional power histogram, and the distribution of the first principal component is shown as a histogram for each class (FIG. 17). Each dimension is normalized to zero mean and unit variance. The classes are not completely separated, but the overlap between the per-class histograms is small.

Finally, all the features were integrated. When applying the features to the system, the beat spectrum and the power histogram were normalized per dimension to zero mean and unit variance. For the long-term MFCC, however, such normalization is considered inappropriate because the importance of the characteristic represented by each dimension differs. A weight w was therefore determined experimentally through the experimental procedure. For subjects A, B, and C, the area between the reference line and the overall evaluation curve was computed for each subject while varying w, and w = 14, which maximized the sum of these areas, was adopted.

Principal component analysis was applied to subject B's feature set, and the distribution was examined with the first principal component on the horizontal axis and the second principal component on the vertical axis. FIG. 18 shows the result. Although the classes partly overlap, a distribution for each class is roughly formed. The feature used in this system thus has 23 dimensions in total: 1 dimension for the normalized beat spectrum, 12 dimensions for the long-term MFCC weighted by w, and 10 dimensions for the normalized power histogram. FIG. 19 illustrates how the music selection algorithm operates in the feature space of FIG. 18. Relative to a song the user does not want to hear (for example, the class 1 song in the lower right corner), songs at a large distance are songs the user may be willing to hear. The score of each song relative to the unwanted song is S - (1/d), where S is the initial score and d is the distance from the unwanted song. FIG. 19 is drawn in two dimensions by means of principal component analysis for ease of visualization; as noted above, the actual distance calculation is performed in the 23-dimensional space.
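The assembly of the 23-dimensional feature vector can be sketched as follows. The sketch assumes that the weight w is applied by multiplying each long-term MFCC dimension by w and that normalization uses the population standard deviation over the song collection; the text does not specify either point, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Normalize each column (dimension) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; divide by 1.0 to leave it at 0.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def build_feature_space(beat_means, mfcc, power_hists, w=14.0):
    """Concatenate 1 beat + 12 weighted MFCC + 10 power dimensions per song.

    Assumption: the weight w simply scales the unnormalized MFCC values.
    """
    beat_n = zscore_columns([[b] for b in beat_means])
    power_n = zscore_columns(power_hists)
    return [
        b + [w * c for c in m] + p
        for b, m, p in zip(beat_n, mfcc, power_n)
    ]


vectors = build_feature_space(
    beat_means=[1.0, 3.0],
    mfcc=[[0.1] * 12, [0.2] * 12],
    power_hists=[[0.1] * 10, [0.3] * 10],
)
```

Each resulting vector has 1 + 12 + 10 = 23 entries, and distances between songs are computed on these full vectors rather than on the two-dimensional projection used for display.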

An experiment was conducted to examine the performance obtained when the integrated feature set is applied to the music selection algorithm.

First, the data of one subject is prepared. In the music selection algorithm described above, playback starts at random in (1), and the selection is adapted step by step according to the songs the user marks as "do not want to hear" depending on mood. In this experiment, however, performance is evaluated using the class labels attached to the songs.

To reproduce real usage, the music selection algorithm was applied under the assumption that one of the three classes described above consists of songs the user "does not want to hear now".

For each subject and each class, 30 trials are run. Each trial starts from one song of the "do not want to hear" class and continues until all 90 songs the subject selected across the three classes have been played. Over these 30 trials × 3 classes, the cumulative number of unwanted songs that have appeared is averaged at each selection position, and this average is the overall evaluation of the algorithm over the three classes. It is compared against a reference line, namely the straight line obtained when the 30 unwanted songs appear uniformly at a constant rate, which corresponds to completely random selection. Results were also examined for class 1, class 2, and class 3 individually.
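The evaluation curve, the reference line, and the area between them (the quantity used above to choose w) can be sketched as follows. The sketch handles a single trial; averaging over trials and classes is omitted, the area is approximated by a simple sum over selection positions, and all function names are hypothetical.

```python
def cumulative_disliked(played_classes, disliked_class):
    """Cumulative count of unwanted-class songs at each selection position."""
    total = 0
    curve = []
    for c in played_classes:
        if c == disliked_class:
            total += 1
        curve.append(total)
    return curve


def reference_line(n_songs, n_disliked):
    """Expected cumulative count when unwanted songs appear at a uniform rate."""
    return [n_disliked * (k + 1) / n_songs for k in range(n_songs)]


def area_below_reference(curve, reference):
    """Sum of (reference - curve); larger means unwanted songs come later."""
    return sum(r - c for r, c in zip(reference, curve))


# Toy trial: 9 songs, class 1 is unwanted, playback starts on a class 1 song.
order = [1, 2, 3, 2, 3, 2, 3, 1, 1]
curve = cumulative_disliked(order, disliked_class=1)
area = area_below_reference(curve, reference_line(len(order), 3))
```

In this toy trial the remaining unwanted songs are pushed to the end of the playback order, so the curve stays below the reference line for most positions and the area is positive.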

FIGS. 20 to 22 show the experimental results for each subject. The horizontal axis is the number of selections made, and the vertical axis is the cumulative number of selected songs belonging to the class designated as unwanted. Each figure shows the overall evaluation (all) and the evaluations averaged over class 1 only (class1), class 2 only (class2), and class 3 only (class3). For reference, each figure also shows the reference line (base), where the 30 unwanted songs appear uniformly at a constant rate; the best case (best), where after the judgment on the first song no further song of the unwanted class appears until the end; and the worst case (worst), where all unwanted songs appear first and the other songs follow. The further the evaluation curve falls below the reference line, which corresponds to completely random selection, the better the evaluation. The results show that for all of subjects A, B, and C the curve is below the reference line over almost the entire range. Subject B scores best, and subjects A and C are about equal. For subject B, fairly high performance was obtained. Notably, for subject C the curve fell below the reference line even though the songs, all by the same artist, are very similar to one another, which supports the effectiveness of the method. Looking at the per-class evaluations, performance for class 3 appears slightly better, but there was no clear difference. For subject A, the music spans many types, and the mood in which one wants to hear a song of one type does not necessarily coincide with that for another type even when both are described in the same words, which may have made it difficult to form the feature space.

Preferred embodiments of the music playback system according to the present invention are described below with reference to FIGS. 23 to 27.

FIG. 23 is a block diagram showing the configuration of the music playback system according to the present invention. The system mainly consists of a playback function unit 1, which plays songs and generates the playlist, and a feature calculation function unit 2, which calculates the features of each song. Data such as music data and the feature space are exchanged between the playback function unit 1 and the feature calculation function unit 2.

The playback function unit 1 includes: an operation input unit 3, such as buttons or input keys, operable by the user; an output unit 4, such as a speaker, that outputs sound; a music data storage unit 5, such as a hard disk or memory, that stores a large number of pieces of music data; a playback unit 6, such as an amplifier, that reads music data from the music data storage unit 5, converts it into an electric signal of a predetermined format, and passes it to the output unit 4; a feature space storage unit 7 that stores the feature space received from the feature calculation function unit 2; a playlist generation unit 8 that calculates song scores with reference to the feature space and generates the playlist; and a control unit 9 that controls each of these units and executes the information processing required for music playback, such as the music selection algorithm described above. The feature calculation function unit 2 includes a feature space generation unit 10 that extracts features from each song stored in the music data storage unit 5 and generates the feature space.

FIG. 24 is a block diagram showing the configuration of the feature space generation unit 10. The feature space generation unit 10 includes: a short-time power histogram extraction unit 11 that extracts power information from the music data stored in the music data storage unit 5 according to the power histogram calculation procedure described above; a long-term MFCC extraction unit 12 that extracts timbre information from the music data according to the long-term MFCC calculation procedure described above; a beat information extraction unit 13 that extracts beat information from the music data according to the beat spectrum calculation procedure described above; and a feature integration unit 14 that integrates the three features extracted by the extraction units 11, 12, and 13 to generate the feature space described above.

FIG. 25 is a block diagram showing one example of an actual system configuration. Reference numeral 20 denotes an information processing apparatus such as a computer. By installing a music playback program that implements the music selection algorithm and related processing described above, the hardware resources of the apparatus work together so that the playback function unit 1 and the feature calculation function unit 2 are realized on the same apparatus.

FIG. 26 is a block diagram showing another example of an actual system configuration. Reference numeral 22 denotes an information processing apparatus such as a computer, like the information processing apparatus 20. By installing a feature calculation program that implements the power histogram, long-term MFCC, and beat spectrum calculation algorithms described above, the hardware resources of the apparatus work together so that only the feature calculation function unit 2 is realized on it. Reference numeral 21 denotes an information processing apparatus such as a portable music player. By installing a music playback program that implements the music selection algorithm and related processing described above, the hardware resources of the apparatus work together so that only the playback function unit 1 is realized on it.

The information processing apparatus 21 and the information processing apparatus 22 are configured to allow data communication between them and exchange data such as music data and the feature space. That is, if the information processing apparatus 21 is regarded as a client, the information processing apparatus 22 corresponds to a server.

FIG. 27 shows an example of the operation screen displayed on the display device of the information processing apparatus 20 or the information processing apparatus 21. In the operation screen 30 shown in the figure, 31 is a display area for the feature space as shown in FIG. 18; 32 is a display window for the playlist; 33 is a "Play" button for starting playback; 34 is a "Stop" button for stopping playback; 35 is a "Skip" button for skipping a song; 36 is a "dislike!" button for marking a song as "do not want to hear"; 37 is a "Reset" button for returning the current playlist to its initial state; 38 is a "shuffle" button for shuffling the playlist (random reordering); 39 is an "Sreset" button for a score reset, which returns all scores to their initial values while leaving the song order (playlist) unchanged; and 40 is a pull-down box for specifying the folder in which the music data is stored. On this screen, the "dislike!" button 36, which marks the song currently playing as "do not want to hear", is provided separately from the "Skip" button 35.

As described above, the present embodiment is a music playback system that plays music according to a playlist indicating the playback order of songs, and includes: the operation input unit 3 operable by the user; the music data storage unit 5 that stores a plurality of pieces of music data; the feature space generation unit 10 that extracts predetermined features from the music data and generates a feature space representing the similarity relationships among the songs; and the playlist generation unit 8 that determines a playback priority for each piece of music data by obtaining the distance in the feature space between the music data designated by a predetermined operation input from the operation input unit 3 and each other piece of music data, and updates the playlist based on that priority.

The music playback system of the present embodiment also consists of the first information processing apparatus 21, which includes the operation input unit 3, the music data storage unit 5, and the playlist generation unit 8, and the second information processing apparatus 22, which includes the feature space generation unit 10.

Further, in the music playback system of the present embodiment, the feature space generation unit 10 generates the feature space using the beat information, timbre information, and power information of each song as elements of the feature.

In the music playback system of the present embodiment, the predetermined operation input is a skip operation.

Further, in the music playback system of the present embodiment, the playlist generation unit 8 sets the priority higher for songs at a greater distance in the feature space.

Alternatively, in the music playback system of the present embodiment, the playlist generation unit 8 sets the priority higher for songs at a smaller distance in the feature space.

The present invention is not limited to the embodiments described above and may be modified without departing from the gist of the invention.

FIG. 1 is a flowchart schematically showing the processing flow of the music playback system according to the present invention.
FIG. 2 is a diagram showing a logarithmic power spectrum converted from an acoustic signal.
FIG. 3 is an explanatory diagram showing the procedure for obtaining frame similarity from the power spectrum of FIG. 2.
FIG. 4 is a diagram showing a distance matrix created from the frame similarity of FIG. 3.
FIG. 5 is a diagram showing the beat spectrum of a rock song at a certain time.
FIG. 6 is a waveform diagram showing the waveform data of a song.
FIG. 7 is a diagram showing the short-time power obtained from the waveform of FIG. 6.
FIG. 8 is an explanatory diagram showing the procedure for defining histogram boundary values on the diagram of FIG. 7.
FIG. 9 is a diagram showing the histogram of a classical piece created by the first histogram creation method.
FIG. 10 is a diagram showing the histogram of a pop piece created by the same method.
FIG. 11 is a diagram showing the histogram of a classical piece created by the second histogram creation method.
FIG. 12 is a diagram showing the histogram of a pop piece created by the same method.
FIG. 13 is a flowchart showing the music selection algorithm of the music playback system according to the present invention.
FIG. 14 is a flowchart showing the playlist generation processing of the music selection algorithm in detail.
FIG. 15 is a diagram showing the mean beat spectrum values obtained in the experiment based on the music selection algorithm.
FIG. 16 is a diagram showing a histogram of the first principal component of the 12-dimensional MFCC obtained in the experiment.
FIG. 17 is a diagram showing a histogram of the first principal component of the power histogram obtained in the experiment.
FIG. 18 is a diagram showing the distribution of songs in the feature space obtained in the experiment.
FIG. 19 is an explanatory diagram showing how the music selection algorithm operates in FIG. 18.
FIG. 20 is a diagram showing the experimental results of the music playback system for subject A.
FIG. 21 is a diagram showing the experimental results of the music playback system for subject B.
FIG. 22 is a diagram showing the experimental results of the music playback system for subject C.
FIG. 23 is a block diagram showing the configuration of the music playback system according to the present invention.
FIG. 24 is a block diagram showing the configuration of the feature space generation unit.
FIG. 25 is a block diagram showing one example of an actual system configuration.
FIG. 26 is a block diagram showing another example of an actual system configuration.
FIG. 27 is a diagram showing an example of the operation screen.

Explanation of symbols

3 Operation input unit
5 Music data storage unit
8 Playlist generation unit
10 Feature space generation unit
21, 22 Information processing apparatus

Claims

1. A music playback system for playing music according to a playlist indicating a playback order of songs, comprising:
operation input means operable by a user;
music data storage means for storing a plurality of pieces of music data;
feature space generation means for extracting a predetermined feature from the music data and generating a feature space representing similarity relationships among the songs; and
playlist generation means for determining a playback priority for each piece of music data by obtaining the distance in the feature space between the music data designated by a predetermined operation input from the operation input means and each other piece of music data, and for updating the playlist based on the priority.

2. The music playback system according to claim 1, comprising a first information processing apparatus including the operation input means, the music data storage means, and the playlist generation means, and a second information processing apparatus including the feature space generation means.

3. The music playback system according to claim 1, wherein the feature space generation means generates the feature space using beat information of the songs as an element of the feature.

4. The music playback system according to any one of claims 1 to 3, wherein the feature space generation means generates the feature space using timbre information of the songs as an element of the feature.

5. The music playback system according to any one of claims 1 to 4, wherein the feature space generation means generates the feature space using power information of the songs as an element of the feature.

6. The music playback system according to claim 1, wherein the predetermined operation input is a skip operation.

7. The music playback system according to any one of claims 1 to 6, wherein the playlist generation means sets the priority higher as the distance in the feature space is greater.

8. The music playback system according to any one of claims 1 to 7, wherein the playlist generation means sets the priority higher as the distance in the feature space is smaller.