JP7585681B2

JP7585681B2 - Performance information prediction device, performance model training device, performance information generation system, performance information prediction method, and performance model training method

Info

Publication number: JP7585681B2
Application number: JP2020158761A
Authority: JP
Inventors: 博毅佐藤
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2024-11-19
Anticipated expiration: 2040-09-23
Also published as: JP2022052389A

Description

The present disclosure relates to a performance information prediction device, a performance model training device, a performance information generation system, a performance information prediction method, and a performance model training method.

There is a class of electronic musical instrument called a guitar controller (or guitar synthesizer) that converts the string vibration waveform of an instrument such as a guitar into an electric signal using a magnetic or piezo pickup, analyzes its pitch and volume, and converts it into digital performance data such as MIDI (Musical Instrument Digital Interface) messages. Unlike dedicated guitar controllers built only to drive a sound source, this type of controller retains the shape and functions of an ordinary guitar while adding an independent pickup for each string to acquire performance information. This has the major advantage of also enabling MIDI performance, and it can be said to be the most common form.

Japanese Patent Application Publication No. H9-6339 (JP H09-6339 A)
Japanese Patent Application Publication No. 2000-105590 (JP 2000-105590 A)

However, one major problem that has remained unsolved for many years in such instruments is the so-called tracking error, in which a plucked note is converted into performance information the player did not intend. This occurs because the pitch is detected by observing the peaks of the input signal waveform or the period of its zero-crossing points, and the sound generation method is judged only from the amount of change in the input envelope. As a result, the detection is fooled by transient performance noise from picking or tapping at the moment of plucking and by the complex behavior of the string's harmonics.

For example, with picking noise, there are cases where the friction sound produced when the pick touches the string before plucking, or the noise from vibration of the very short length of string between the pick and the bridge, is recognized as a played sound. This can generate note information whose pitch is far higher than the pitch corresponding to the fret position at which the string is actually pressed.

As for harmonics, there are cases where harmonics are not produced by an intentional playing technique; rather, during normal playing the string vibration contains so much harmonic content that the fundamental cannot be distinguished from the overtones, and the pitch of a harmonic is recognized as the played note. The most common error is misidentifying the second harmonic, that is, a pitch one octave above the actual pitch, as the played note, but the third harmonic is also easily mistaken for the fundamental.
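The size of these harmonic errors in note-number terms follows from the standard equal-tempered relation between a frequency ratio and a semitone offset. The sketch below is illustrative only; the function name is hypothetical and does not appear in the disclosure.

```python
import math


def harmonic_offset_semitones(harmonic: int) -> float:
    """Semitone offset of the k-th harmonic above the fundamental (12-tone equal temperament)."""
    return 12.0 * math.log2(harmonic)


# Second harmonic: exactly 12 semitones (one octave) above the played note.
# Third harmonic: roughly 19 semitones (an octave plus a fifth) above it.
for k in (2, 3):
    print(k, round(harmonic_offset_semitones(k), 2))
```

A tracker that locks onto the second or third harmonic would therefore report a note number about 12 or 19 higher than the one actually played.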

Regarding misrecognition of picking and legato playing, when the sound production technique on a guitar string is judged and added to the performance information, it may be mistaken for a technique the player did not intend. For example, sound production techniques on a guitar string can be classified by the characteristics of the sound as follows.
A. Normal plucking with a pick or fingers.
B. Hammer-on, in which a string already held at some fret is fretted at a higher fret position by striking or touching it with another finger of the fretting hand to change the pitch, or tapping, in which the string is struck with a finger of the hand opposite the normal fretting hand.
C. Pull-off, in which the string is plucked by pulling it slightly with the fretting finger and releasing it, or by pulling and releasing with the finger used for the tapping described above.
D. Glissando or slide, in which the pitch is changed by sliding a finger along the currently sounding string while pressing it. In MIDI messages this is normally not interpreted as a new note produced by plucking but is expressed as a pitch change.

Of these, cases A, B, and C generally generate new sound generation information, whereas case D is judged to be legato playing and generates pitch bend information for the currently sounding note.

It is thought that these playing techniques could be judged by analyzing not only changes in the volume envelope at the onset of a sound but also how the levels of the harmonics of the various noises that occur during transients change. However, such analysis has been difficult with conventional methods.

Furthermore, the frequency components at plucking and the way they change vary greatly with the characteristics of each guitar, the player's habits, the shape and material of the pick, and, in fingerpicking, the hardness of the player's skin. Individual characteristics therefore need to be taken into account in the judgment, but in practice nothing takes such factors into account, and almost no instrument judges the playing technique at all.

In view of the above problems, an object of the present disclosure is to provide a technique for converting a performance on an electronic stringed instrument into performance information with high accuracy.

In order to solve the above problems, one aspect of the present disclosure relates to a performance information prediction device including: a preprocessing unit that generates spectral data frames from string vibration waveform data representing a stringed instrument performance and acquires, based on each spectral data frame, a spectral feature data frame composed of the frequencies of a predetermined number of top peaks contained in that spectral data frame; and a performance information prediction unit that uses a trained performance model to predict performance information of the stringed instrument performance from the spectral feature data frame at a reference time and the spectral feature data frames at times before and after the reference time, wherein the trained performance model acquires the spectral feature data frame at the reference time, a first number of spectral feature data frames before the reference time, and a second number of spectral feature data frames after the reference time, and outputs a playing style and a note number of the stringed instrument.

According to the present disclosure, a performance on an electronic stringed instrument can be converted into performance information with high accuracy.

FIG. 1 is a schematic diagram showing a guitar controller according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing the structure of performance information according to an embodiment of the present disclosure.
FIG. 3 is a diagram showing a TAB score according to an embodiment of the present disclosure.
FIG. 4 is a diagram showing the external appearance of a guitar controller according to an embodiment of the present disclosure.
FIG. 5 is a block diagram showing the hardware configuration of a guitar according to an embodiment of the present disclosure.
FIG. 6 is a block diagram showing the hardware configuration of a control device according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram showing the operation of a performance model training device according to an embodiment of the present disclosure.
FIG. 8 is a block diagram showing the functional configuration of a performance model training device according to an embodiment of the present disclosure.
FIG. 9 is a schematic diagram showing a spectral data frame according to an embodiment of the present disclosure.
FIG. 10 is a schematic diagram showing a spectral feature data frame according to an embodiment of the present disclosure.
FIG. 11 is a diagram showing the architecture of a performance model according to an embodiment of the present disclosure.
FIG. 12 is a diagram showing the architecture of a performance model according to another embodiment of the present disclosure.
FIG. 13 is a flowchart showing a performance model training process according to an embodiment of the present disclosure.
FIG. 14 is a schematic diagram showing the operation of a performance information prediction device according to an embodiment of the present disclosure.
FIG. 15 is a block diagram showing the functional configuration of a performance information prediction device according to an embodiment of the present disclosure.
FIG. 16 is a schematic diagram showing volume detection and pitch detection according to an embodiment of the present disclosure.
FIG. 17 is a flowchart showing a performance information prediction process according to an embodiment of the present disclosure.

In the following embodiments, a guitar controller is disclosed that generates performance information (for example, MIDI messages) from the string vibration waveforms produced by playing a guitar. Note that the present disclosure is not limited to a guitar controller and may be applied to any other performance information generation device that generates performance information from a performance on a stringed instrument equipped with a string vibration waveform extraction function.
[Summary of the Disclosure]
To outline the embodiments described below, as shown in FIG. 1, a guitar controller 10 according to an embodiment of the present disclosure includes a guitar 50 and a control device 100. The guitar controller 10 generates performance information from the string vibration waveforms produced by playing the guitar 50, using a performance model realized as a machine learning model such as a neural network.

As shown in FIG. 2, the performance information according to an embodiment of the present disclosure indicates a performance type: sound generation information, mute information, or pitch change information.

The sound generation information indicates the note number and playing style determined by the performance model acting as a classification model, and the plucking strength obtained by envelope detection. The playing style is classified into, for example, seven types: 0) picking with a pick, 1) fingerpicking, 2) hammer-on (tapping), 3) pull-off, 4) muted picking, 5) open harmonics, and 6) picking harmonics. When sound generation is expressed as MIDI messages, the playing style may be represented by Control Change: 0xBn, 0x46, vv, and the note number and plucking strength by Note On: 0x9n, kk, vv.
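The byte layout of these two messages can be sketched as follows. This is a minimal illustration, assuming that n is the MIDI channel in the low nibble of the status byte and that the playing style number 0 to 6 is sent directly as the value vv of controller 0x46; the function names are hypothetical and not taken from the disclosure.

```python
def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (status 0xBn)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not (0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("data bytes must be 0..127")
    return bytes([0xB0 | channel, controller, value])


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note On message (status 0x9n, kk = note, vv = velocity)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("data bytes must be 0..127")
    return bytes([0x90 | channel, note, velocity])


def sounding_messages(channel: int, style: int, note: int, velocity: int) -> bytes:
    """Playing style on controller 0x46, followed by the Note On itself."""
    # Assumption: the style number is used as the raw controller value.
    return control_change(channel, 0x46, style) + note_on(channel, note, velocity)


# Example: hammer-on (style 2) of note 64 with plucking strength 100 on channel 0.
print(sounding_messages(0, 2, 64, 100).hex(" "))
```

Sending the style before the Note On lets a receiver select the articulation before the note starts; the disclosure itself does not state an ordering, so this is a design assumption of the sketch.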

The mute information is detected by envelope detection and indicates 0) sound stop or 1) replacement. When muting is expressed as MIDI messages, it may be represented by Control Change: 0xBn, 0x46, vv and Note Off: 0x8n, kk, vv.

The pitch change information is detected by zero-cross counting and indicates, for example, a semitone bend up, a semitone bend down, a whole-tone bend up, a whole-tone bend down, a one-and-a-half-tone bend up, a one-and-a-half-tone bend down, a two-tone bend up, a two-tone bend down, or a slide. When a pitch change is expressed as a MIDI message, it may be represented by Pitch Bend: 0xEn, ll, mm.
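As a hedged illustration of the Pitch Bend message, the sketch below packs a bend amount into the 14-bit value that standard MIDI splits into ll (lower 7 bits) and mm (upper 7 bits), with 8192 meaning no bend. The bend range of 12 semitones is an assumption chosen so that a two-tone bend fits; the disclosure does not specify a range, and the function name is hypothetical.

```python
def pitch_bend(channel: int, semitones: float, bend_range: float = 12.0) -> bytes:
    """Build a 3-byte MIDI Pitch Bend message (status 0xEn, then LSB, then MSB)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    # 8192 is the centre (no bend); full scale spans +/- bend_range semitones.
    value = 8192 + round(semitones / bend_range * 8192)
    value = max(0, min(16383, value))  # clamp to the 14-bit range
    lsb = value & 0x7F          # ll: lower 7 bits
    msb = (value >> 7) & 0x7F   # mm: upper 7 bits
    return bytes([0xE0 | channel, lsb, msb])


# A whole-tone bend up (2 semitones) on channel 0 under the assumed range.
print(pitch_bend(0, 2.0).hex(" "))
```

The receiving sound source must be configured with the same bend range for the pitch to come out as intended.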

In the embodiment shown in FIG. 1, the guitar controller 10 has two operating modes: a training mode for training a performance model, and a performance mode for predicting performance information using the trained performance model. The control device 100 has a performance model training device 200 used in the training mode and a performance information prediction device 300 used in the performance mode.

First, in the training mode, the guitar controller 10 acquires training data from the training performance information database 80. The training data consists of, for example, pairs of score data (for example, a TAB score) and a MIDI file corresponding to that score data. The TAB score may be written in a well-known notation such as that shown in FIG. 3. When the user plays the guitar 50 according to the score of the acquired training score data, the performance model training device 200 inputs the string vibration information generated by the guitar 50 from the user's performance into the performance model being trained, compares the MIDI messages output from the performance model as performance information with the training MIDI file, and trains the performance model so that the error between them becomes smaller. In the present disclosure, the string vibration waveform data is converted into spectral data by a fast Fourier transform (FFT), and performance information is obtained from the performance model using spectral feature data characterized by a predetermined number of peaks in the spectral data. When training is complete, the performance model training device 200 provides the trained performance model to the performance information prediction device 300.

Next, in the performance mode, when the user plays the guitar 50, the performance information prediction device 300 inputs the string vibration information generated by the guitar 50 from the user's performance into the trained performance model and obtains performance information such as MIDI messages. The obtained performance information is transmitted to, for example, an external playback device or computer, so that the user can play back the performance through the playback device or use the performance information on the computer.

This reduces tracking errors when a performance on an electronic stringed instrument is converted into performance information and makes it possible to judge the playing style with high accuracy.

Note that the guitar controller 10 according to the embodiments described below includes the performance model training device 200, but the present disclosure is not limited to this. For example, the performance model may be trained by an external computer or server, and the trained performance model and/or update information for the performance model may be provided from the external computer or server to the performance information prediction device 300.
[Hardware configuration]
Next, the physical configuration of the guitar controller 10 will be described with reference to FIG. 4. FIG. 4 is a diagram showing the external appearance of the guitar controller 10 according to an embodiment of the present disclosure.

As shown in FIG. 4, the guitar controller 10 is a separate-type performance information generation system composed of a guitar 50 and a control device 100 connected to each other.

The guitar 50 is an ordinary electric guitar fitted with a hexa-divided pickup that picks up the vibration of each of the six strings independently, a MIDI volume control for adjusting the volume of the performance information, and an up/down switch for stepping the patch memory number of the control device 100 up or down. These signals and the output of the normal pickups are sent to the control device 100 over a multi-cable. Power is supplied from the control device 100 through the multi-cable. The hexa-divided pickup in this embodiment is a magnetic pickup, like the normal pickups.

The control device 100, for its part, receives the guitar's string vibrations as input and generates performance information in MIDI format. The destination of the performance information is not limited and may be, for example, a sound source unit or a computer. As shown in FIG. 1, the control device 100 has foot switches for changing the bank and number of the patch memory in which various settings are stored, and a CONTROL switch and a foot pedal to which arbitrary performance messages can be assigned and sent. The number of the currently selected patch memory is shown on the BANK/NUM display. An LCD serves as the main display device, and a touch panel is mounted on its screen. A rotary encoder for data entry is also provided on the panel. The terminals provided are GUITAR INPUT, the input for the multi-cable from the guitar 50; GUITAR OUT, the audio output of the normal pickups; MIDI OUT, the output for the MIDI performance signal; USB to HOST, for connection to a host computer; and AC POWER, the AC power input.

Next, the hardware configuration of the guitar 50 according to an embodiment of the present disclosure will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the hardware configuration of the guitar 50 according to an embodiment of the present disclosure.

As shown in FIG. 5, the guitar 50 sends the hexa-divided pickup signals that have passed through a buffer amplifier, the MIDI volume control, the patch memory up/down switch, and the normal pickup signal to the control device 100 over the multi-cable. The three normal pickups are selected with a pickup selector, and the selected signal passes through a tone control circuit, a volume control circuit, and a buffer amplifier before being sent to the control device 100.

Next, the hardware configuration of the control device 100 according to an embodiment of the present disclosure will be described with reference to FIG. 6. FIG. 6 is a block diagram showing the hardware configuration of the control device according to an embodiment of the present disclosure.

As shown in FIG. 6, the control device 100 is composed of a CPU (Central Processing Unit) and a DSP (Digital Signal Processor). The CPU manages the functions and processing of the control device 100 as a whole, and the DSP executes the waveform analysis processing that requires high speed. Connected to the CPU bus are the RAM and Flash ROM used by the CPU, an LCD controller that controls the LCD, an I/O interface connected to various I/O devices, the DSP, a USB interface, and a MIDI interface. Connected to the I/O interface are the foot switches, the rotary encoder, the LCD touch panel, an A/D converter for detecting the MIDI volume of the guitar 50 and the pedal position of the control device 100, and LEDs for displaying the patch memory number. Only one A/D converter is illustrated, but a multiplexer switches the input source in a time-division manner to read each value. The DSP, to which a dedicated RAM and Flash ROM are connected, is also connected to an independent A/D converter that digitizes the outputs of the six strings of the hexa-divided pickup at high speed, enabling high-speed analysis.

However, the guitar 50 and the control device 100 are not limited to the hardware configurations described above and may be realized by any other suitable hardware configuration.
[Performance model training device]
Next, the performance model training device 200 according to an embodiment of the present disclosure will be described with reference to FIGS. 7 to 12. FIG. 7 is a schematic diagram showing the operation of the performance model training device 200 according to an embodiment of the present disclosure.

As shown in FIG. 7, the performance model training device 200 trains the performance model using the training data stored in the training performance information database 80. Specifically, the training data consists of pairs of training score data and training performance information corresponding to that score data. A score (for example, a TAB score) based on the training score data is displayed to the player, who plays the guitar 50 under tempo control by a metronome. The string vibration waveform data representing the performance is provided to the performance model training device 200, which converts it into spectral feature data frames, described in detail below, and inputs the spectral feature data frames at the reference time and around the reference time into the performance model being trained. The performance model training device 200 then compares the output of the performance model with the sound generation information (for example, note number and playing style) of the training performance information and updates the parameters of the performance model according to the error. The performance model training device 200 repeats this process until a predetermined termination condition is satisfied, optimizing the performance model so that its output approaches the sound generation information of the training performance information.

FIG. 8 is a block diagram showing the functional configuration of the performance model training device 200 according to an embodiment of the present disclosure.

As shown in FIG. 8, the performance model training device 200 has a preprocessing unit 210 and a performance model training unit 220.

The preprocessing unit 210 generates spectral data frames from string vibration waveform data representing a stringed instrument performance played according to the training performance information, and converts the spectral data frames into spectral feature data frames.

Specifically, when the player plays the guitar 50, the guitar 50 acquires string vibration waveform data indicating time and the amplitude of each string, as shown in FIG. 9, and transmits it to the performance model training device 200. Because the guitar 50 has six strings, six sets of string vibration waveform data are generated. The preprocessing unit 210 performs a fast Fourier transform (FFT) on the string vibration waveform data of each string to obtain spectral data. More specifically, the preprocessing unit 210 may extract from the string vibration waveform data string vibration waveform frames of window width W (for example, W = 512, or 25.6 msec) that overlap on the time axis, perform an FFT every I samples (for example, I = 64, or 3.2 msec), and convert each string vibration waveform frame into a spectral data frame.
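The framing step can be sketched for a single string as follows. This is a minimal pure-Python illustration, not the disclosed implementation: the 20 kHz sampling rate is inferred from the quoted figures (512 samples in 25.6 msec, 64 samples in 3.2 msec), no window taper is applied because none is specified, and the function names `fft` and `spectral_frames` are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectral_frames(samples, window=512, hop=64):
    """Slide an overlapping window over one string's waveform and return one
    magnitude spectrum (bins 0 .. window/2) per hop."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        spectrum = fft(samples[start:start + window])
        frames.append([abs(c) for c in spectrum[:window // 2 + 1]])
    return frames


# Example: a 625 Hz sine at the inferred 20 kHz rate falls exactly on bin 16
# (bin spacing is 20000 / 512 = 39.0625 Hz).
SAMPLE_RATE = 20000
tone = [math.sin(2 * math.pi * 625 * t / SAMPLE_RATE) for t in range(1024)]
frames = spectral_frames(tone)
print(len(frames), frames[0].index(max(frames[0])))
```

With W = 512 and I = 64, consecutive frames share 448 samples, so each spectral data frame differs from its neighbor by only 3.2 msec of new signal. On real hardware this loop would run once per string on the DSP.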

After conversion into spectral data frames, the preprocessing unit 210 characterizes each spectral data frame by a predetermined number of its top peaks. For example, when the top four peaks are used, the preprocessing unit 210 characterizes the spectral data frame by the frequencies of the four highest peaks (local maxima) along the frequency axis, as shown in FIG. 10, and constructs a spectral feature data frame from the frequencies of those four peaks. This characterization compresses the data size and emphasizes the peak heights and the temporal change of the heights from the peaks, which are assumed to be related to the playing style and note number to be predicted. It is therefore expected to improve the accuracy and speed of the performance model training process.
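The peak-based characterization can be sketched as below. The disclosure states only that the frame consists of the frequencies of the top peaks; ordering the result strongest-first, padding with 0.0 when fewer peaks exist, and the 20 kHz rate (inferred from the window figures) are assumptions of this sketch, and the function name is hypothetical.

```python
def top_peak_frequencies(magnitudes, num_peaks=4, sample_rate=20000, window=512):
    """Return the frequencies (Hz) of the num_peaks highest local maxima
    of one magnitude spectrum, strongest first."""
    # A bin is a peak if it rises above its left neighbour and does not
    # fall below its right neighbour.
    peaks = [
        k for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]
    ]
    strongest = sorted(peaks, key=lambda k: magnitudes[k], reverse=True)[:num_peaks]
    freqs = [k * sample_rate / window for k in strongest]
    # Assumption: pad with 0.0 so every feature frame has the same length.
    return freqs + [0.0] * (num_peaks - len(freqs))


# Example: a synthetic spectrum with five peaks; only the four tallest survive.
spectrum = [0.0] * 257
for bin_index, height in ((16, 10.0), (32, 5.0), (48, 8.0), (64, 2.0), (80, 1.0)):
    spectrum[bin_index] = height
print(top_peak_frequencies(spectrum))
```

Each 257-bin spectrum is thereby reduced to four numbers, which is the data compression the paragraph above refers to.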

The preprocessing unit 210 generates spectral feature data consisting of the predetermined number of peaks extracted in this way and provides it to the performance model training unit 220.

The performance model training unit 220 uses the training performance information to train a performance model that predicts performance information of a stringed instrument performance from the spectral feature data frame at a reference time and the spectral feature data frames before and after it. When predicting the playing style and note number at the reference time, the performance model being trained takes as input not only the spectral feature data frame at the reference time but also the spectral feature data frames at times before and after the reference time, and outputs the playing style and note number at the reference time. For example, the performance model training unit 220 may input to the performance model the spectral feature data frame at the reference time, the p spectral feature data frames immediately before the reference time, and the n spectral feature data frames immediately after the reference time. The predetermined numbers p and n may be the same value or different values. For example, p and n are preferably set so that the time lag between the player plucking a string of the guitar 50 and the output of performance information by the performance information prediction device 300 is too small for the player to notice.
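Assembling the model input from p preceding and n following frames can be sketched as follows. Flattening the frames into one vector and the latency estimate (n hops of 64 samples at the inferred 20 kHz rate) are assumptions of this illustration rather than details given in the disclosure; the function names are hypothetical.

```python
def context_window(frames, t, p, n):
    """Concatenate frames t-p .. t+n into one flat input vector.
    Returns None when the window would run past either end of the data."""
    if t - p < 0 or t + n >= len(frames):
        return None
    vector = []
    for frame in frames[t - p:t + n + 1]:
        vector.extend(frame)
    return vector


def lookahead_latency_ms(n, hop=64, sample_rate=20000):
    """Minimum delay added by waiting for n future frames before predicting."""
    return n * hop / sample_rate * 1000.0


# Example: ten dummy 4-value feature frames, reference frame 5, p = 2, n = 3.
frames = [[float(i)] * 4 for i in range(10)]
x = context_window(frames, t=5, p=2, n=3)
print(len(x), lookahead_latency_ms(3))
```

Only n affects the delay, since the p past frames are already available at the reference time. Under these assumed figures each additional look-ahead frame costs 3.2 msec, which is the trade-off behind choosing n small enough that the player does not notice the lag.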

このように一定の時間範囲のスペクトル特徴化データフレームを利用することによって、フレーム間の前後関係を考慮して新たな撥弦が発生したか判断することができると共に、撥弦の時間変化を判断することが可能になる。 By using a spectral feature data frame for a certain time range in this way, it is possible to determine whether a new pluck has occurred while taking into account the context between frames, and to determine the change in the pluck over time.

なお、基準時刻において発音がなかった場合、すなわち、基準時刻が消音状態であった場合、演奏モデルは、検出不可を示す値を出力するように訓練されてもよい。 In addition, if there is no sound at the reference time, i.e., if the reference time is in a mute state, the performance model may be trained to output a value indicating that detection is not possible.

一実施例では、演奏モデルは、ニューラルネットワークによって実現されてもよい。例えば、演奏モデルは、図１１に示されるようなネットワークアーキテクチャを有するニューラルネットワークであってもよい。この場合、演奏モデル訓練部２２０は、ニューラルネットワークの入力層に基準時刻のスペクトル特徴化データフレームと、基準時刻前後の（ｐ＋ｎ）個のスペクトル特徴化データフレームとを入力し、中間層における演算を介し出力層から奏法番号Ｖａｒ及びノート番号Ｎｏｔｅを取得する。 In one embodiment, the performance model may be realized by a neural network. For example, the performance model may be a neural network having a network architecture as shown in FIG. 11. In this case, the performance model training unit 220 inputs the spectral feature data frame at the reference time and the (p+n) spectral feature data frames before and after the reference time to the input layer of the neural network, and obtains the playing style number Var and the note number Note from the output layer through calculations in the intermediate layer.
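A forward pass through such a network might look like the sketch below. FIG. 11 is not reproduced here, so the single hidden layer, the ReLU activation, the two softmax output heads, and all layer sizes are assumptions chosen for illustration only.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dense(x, w, b):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class PerformanceNet:
    """Hypothetical feedforward model: (p + 1 + n) frames in, (Var, Note) out."""

    def __init__(self, frame_dim, p, n, hidden, n_styles, n_notes, seed=0):
        rng = random.Random(seed)
        in_dim = frame_dim * (p + 1 + n)
        self.hidden = init_layer(hidden, in_dim, rng)
        self.style_head = init_layer(n_styles, hidden, rng)
        self.note_head = init_layer(n_notes, hidden, rng)

    def forward(self, window):
        x = [v for frame in window for v in frame]          # flatten the window
        h = [max(0.0, v) for v in dense(x, *self.hidden)]   # ReLU hidden layer
        return softmax(dense(h, *self.style_head)), softmax(dense(h, *self.note_head))

    def predict(self, window):
        """Return the most probable playing style number and note number."""
        style_probs, note_probs = self.forward(window)
        var = max(range(len(style_probs)), key=style_probs.__getitem__)
        note = max(range(len(note_probs)), key=note_probs.__getitem__)
        return var, note
```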

また、他の実施例では、演奏モデルは、図１２に示されるような再帰型ニューラルネットワークによって実現されてもよい。この場合、演奏モデル訓練部２２０は、上述したｐ，ｎによる時間範囲より広い時間範囲のスペクトル特徴化データフレームを利用してもよく、例えば、基準時刻ｔのスペクトル特徴化データフレーム、基準時刻直前のｂ個（ｂ＞ｐ）のスペクトル特徴化データフレーム、及び基準時刻直後のｆ個（ｆ＞ｎ）のスペクトル特徴化データフレームを再帰型ニューラルネットワークの入力層Ｘ_ｔ－ｂ，・・・Ｘ_ｔ－１，Ｘ_ｔ，Ｘ_ｔ＋１，・・・，Ｘ_ｔ＋ｆに入力し、中間層における演算を介し出力層から奏法番号Ｖａｒ及びノート番号Ｎｏｔｅを取得してもよい。再帰型ニューラルネットワークは、時系列データの処理に適しており、奏法番号Ｖａｒ及びノート番号Ｎｏｔｅを高精度に予測することができると考えられる。 In another embodiment, the performance model may be realized by a recurrent neural network as shown in FIG. 12. In this case, the performance model training unit 220 may use spectral feature data frames covering a wider time range than the range defined by p and n described above. For example, the spectral feature data frame at the reference time t, the b (b>p) spectral feature data frames immediately before the reference time, and the f (f>n) spectral feature data frames immediately after the reference time may be input to the input layer X_{t-b}, ..., X_{t-1}, X_t, X_{t+1}, ..., X_{t+f} of the recurrent neural network, and the playing style number Var and the note number Note may be obtained from the output layer through calculations in the intermediate layer. A recurrent neural network is suited to processing time-series data and is therefore expected to predict the playing style number Var and the note number Note with high accuracy.

また、訓練対象の演奏モデルは、事前訓練された機械学習モデルであってもよく、演奏モデル訓練部２２０は、上述した訓練処理によって、事前訓練された演奏モデルをファインチューニングするようにしてもよい。これにより、初期状態の機械学習モデルから演奏モデルを訓練するのと比較して、少ない訓練データにより高精度な演奏モデルを構築することが可能になる。 The performance model to be trained may be a pre-trained machine learning model, and the performance model training unit 220 may fine-tune the pre-trained performance model by the above-mentioned training process. This makes it possible to build a highly accurate performance model with less training data compared to training a performance model from an initial machine learning model.

演奏モデルから奏法及びノート番号を取得すると、演奏モデル訓練部２２０は、取得した奏法及びノート番号と、訓練用演奏情報の奏法及びノート番号とを比較し、これらが一致するように演奏モデルのパラメータを更新する。例えば、演奏モデルがニューラルネットワークにより実現される場合、演奏モデル訓練部２２０は、周知の誤差逆伝播法に従って比較結果に応じてニューラルネットワークのパラメータを更新してもよい。 Upon obtaining the playing style and note number from the performance model, the performance model training unit 220 compares them with the playing style and note number of the training performance information, and updates the parameters of the performance model so that they match. For example, if the performance model is realized by a neural network, the performance model training unit 220 may update the parameters of the neural network in accordance with the comparison result using the well-known backpropagation method.
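The compare-and-update step can be illustrated with a single gradient step. In this sketch a one-layer softmax classifier stands in for the network, and cross-entropy is assumed as the measure of mismatch; the text only states that the parameters are updated so that the outputs match the training labels. For a deeper network, backpropagation would carry the same output-layer gradient back through the intermediate layers.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy_step(w, b, x, target, lr=0.1):
    """Apply one gradient-descent step in place and return the loss before it.

    w: list of weight rows (one per class), b: list of biases,
    x: input features, target: index of the correct class
    (for example the note number from the training performance information).
    """
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    for c in range(len(w)):
        # Gradient of the loss with respect to logit c.
        grad = probs[c] - (1.0 if c == target else 0.0)
        for j in range(len(x)):
            w[c][j] -= lr * grad * x[j]
        b[c] -= lr * grad
    return loss
```

Calling the function repeatedly on the same example lowers the returned loss, which mirrors the repeated comparison and update described above.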

演奏モデル訓練部２２０は、所定の終了条件が充足されるまで、上述した処理を繰り返し、演奏モデルを訓練し、所定の終了条件が充足されると、当該時点における演奏モデルを訓練済み演奏モデルとして演奏情報予測装置３００にわたす。ここで、所定の終了条件は、準備された全ての訓練データを処理したことなどであってもよい。 The performance model training unit 220 repeats the above-mentioned process to train the performance model until a predetermined termination condition is satisfied, and when the predetermined termination condition is satisfied, passes the performance model at that time point as a trained performance model to the performance information prediction device 300. Here, the predetermined termination condition may be, for example, that all prepared training data has been processed.
［演奏モデル訓練処理］ [Performance Model Training Process]
次に、図１３を参照して、本開示の一実施例による演奏モデル訓練処理を説明する。当該演奏モデル訓練処理は、上述した演奏モデル訓練装置２００によって実現され、例えば、プロセッサがプログラム又は命令を実行することによって実現されてもよい。図１３は、本開示の一実施例による演奏モデル訓練処理を示すフローチャートである。 Next, a performance model training process according to an embodiment of the present disclosure will be described with reference to FIG. 13. The performance model training process is realized by the performance model training device 200 described above, and may be realized by, for example, a processor executing a program or an instruction. FIG. 13 is a flowchart showing the performance model training process according to an embodiment of the present disclosure.

図１３に示されるように、ステップＳ１０１において、演奏モデル訓練装置２００は、訓練用演奏情報データベース８０から訓練用演奏情報を選択する。具体的には、演奏モデル訓練装置２００は、ランダム、順次、ユーザ選択によって訓練用演奏情報を自動選択してもよい。 As shown in FIG. 13, in step S101, the performance model training device 200 selects training performance information from the training performance information database 80. Specifically, the performance model training device 200 may select the training performance information at random, in sequence, or according to a user's selection.

ステップＳ１０２において、演奏モデル訓練装置２００は、演奏情報をＴＡＢ譜の表示情報に変換する。 In step S102, the performance model training device 200 converts the performance information into tablature display information.

ステップＳ１０３において、演奏モデル訓練装置２００は、ＴＡＢ譜を制御装置１００のＬＣＤなどに表示する。 In step S103, the performance model training device 200 displays the TAB score on the LCD or other display of the control device 100.

ステップＳ１０４において、演奏モデル訓練装置２００は、演奏情報のテンポに合わせてＭＩＤＩプレーヤーをスタートする。 In step S104, the performance model training device 200 starts the MIDI player in sync with the tempo of the performance information.

ステップＳ１０５において、演奏モデル訓練装置２００は、テンポに合わせてメトロノームをスタートする。 In step S105, the performance model training device 200 starts the metronome in time with the tempo.

ステップＳ１０６において、ＭＩＤＩプレーヤーは、演奏情報を再生する。 In step S106, the MIDI player plays the performance information.

ステップＳ１０７において、メトロノームは、演奏情報を再生する。これにより、演奏者の演奏を取得するための準備が整い、演奏者は演奏を開始する。 In step S107, the metronome plays back the performance information. This completes the preparations for acquiring the performer's performance, and the performer begins playing.

ステップＳ１０８において、演奏モデル訓練装置２００は、弦番号ｓを０に初期化する。ギター５０は６弦からなるため、弦番号ｓは０～５の値をとりうる。 In step S108, the performance model training device 200 initializes the string number s to 0. Since the guitar 50 has six strings, the string number s can take a value from 0 to 5.

ステップＳ１０９において、演奏モデル訓練装置２００は、ＭＩＤＩプレーヤーから発生したｓチャネルの発音情報を発音情報メモリｐに格納する。 In step S109, the performance model training device 200 stores the sound production information of channel s generated by the MIDI player in the sound production information memory p.

ステップＳ１１０において、演奏モデル訓練装置２００は、演奏者による演奏を表す弦番号ｓの弦振動波形をバッファから取得し、スペクトル特徴化データフレームを生成し、リングバッファなどに格納する。 In step S110, the performance model training device 200 obtains the string vibration waveform of string number s, which represents the performance by the performer, from the buffer, generates a spectral feature data frame, and stores it in a ring buffer or the like.

ステップＳ１１１において、演奏モデル訓練装置２００は、基準時刻のスペクトル特徴化データフレーム、基準時刻直前のｐ個のスペクトル特徴化データフレーム、及び基準時刻直後のｎ個のスペクトル特徴化データフレームを訓練対象の演奏モデルに入力する。 In step S111, the performance model training device 200 inputs the spectral feature data frame at the reference time, p spectral feature data frames immediately before the reference time, and n spectral feature data frames immediately after the reference time to the performance model to be trained.

ステップＳ１１２において、演奏モデル訓練装置２００は、演奏モデルの出力結果をメモリｏに格納する。 In step S112, the performance model training device 200 stores the performance model output result in memory o.

ステップＳ１１３において、演奏モデル訓練装置２００は、メモリｐの発音情報（例えば、奏法番号及びノート番号など）と、メモリｏの演奏モデルの出力結果とを比較する。 In step S113, the performance model training device 200 compares the sound production information (e.g., playing style number and note number) in memory p with the output result of the performance model in memory o.

ステップＳ１１４において、演奏モデル訓練装置２００は、メモリｐの発音情報とメモリｏの出力結果との間に差分があるか判断する。 In step S114, the performance model training device 200 determines whether there is a difference between the sound production information in memory p and the output result in memory o.

有意な差分があった場合（Ｓ１１４：Ｙｅｓ）、演奏モデル訓練装置２００は、ステップＳ１１５において、当該差分から演奏モデルを更新するための最適化情報を演奏モデルに適用し、ステップＳ１１６に移行する。他方、有意な差分がなかった場合（Ｓ１１４：Ｎｏ）、演奏モデル訓練装置２００は、演奏モデルを更新することなく、ステップＳ１１６に移行する。 If there is a significant difference (S114: Yes), in step S115 the performance model training device 200 applies to the performance model optimization information, derived from the difference, for updating the performance model, and proceeds to step S116. On the other hand, if there is no significant difference (S114: No), the performance model training device 200 proceeds to step S116 without updating the performance model.

ステップＳ１１６において、演奏モデル訓練装置２００は、次の弦を処理するため、弦番号ｓを１だけインクリメントする。 In step S116, the performance model training device 200 increments the string number s by 1 to process the next string.

ステップＳ１１７において、演奏モデル訓練装置２００は、全ての弦について演奏モデルの更新処理を終了したか判断し、全ての弦について更新処理が終了していない場合（Ｓ１１７：Ｙｅｓ）、ステップＳ１０９に戻る。 In step S117, the performance model training device 200 determines whether the performance model update process has been completed for all strings, and if the update process has not been completed for all strings (S117: Yes), it returns to step S109.

ステップＳ１１８において、演奏モデル訓練装置２００は、演奏情報全体を処理したか判断し、演奏情報全体を処理していない場合（Ｓ１１８：Ｎｏ）、ステップＳ１０６に戻る。 In step S118, the performance model training device 200 determines whether the entire performance information has been processed, and if the entire performance information has not been processed (S118: No), the process returns to step S106.

ステップＳ１１９において、演奏モデル訓練装置２００は、メトロノーム及びＭＩＤＩプレーヤーを停止する。 In step S119, the performance model training device 200 stops the metronome and the MIDI player.

ステップＳ１２０において、演奏モデル訓練装置２００は、ユーザなどによる終了操作があったか判断し、終了操作がない場合（Ｓ１２０：Ｎｏ）、ステップＳ１０１に戻り、次の演奏情報を選択し、終了操作があった場合（Ｓ１２０：Ｙｅｓ）、当該処理を終了する。 In step S120, the performance model training device 200 determines whether an end operation has been performed by the user or the like. If no end operation has been performed (S120: No), the performance model training device 200 returns to step S101 and selects the next performance information. If an end operation has been performed (S120: Yes), the performance model training device 200 terminates the processing.
［演奏情報予測装置］ [Performance Information Prediction Device]
次に、図１４～１６を参照して、本開示の一実施例による演奏情報予測装置３００を説明する。図１４は、本開示の一実施例による演奏情報予測装置３００の動作を示す概略図である。 Next, the performance information prediction device 300 according to an embodiment of the present disclosure will be described with reference to FIGS. 14 to 16. FIG. 14 is a schematic diagram showing the operation of the performance information prediction device 300 according to an embodiment of the present disclosure.

演奏情報予測装置３００は、演奏モデル訓練装置２００によって訓練された演奏モデルを利用して、演奏者によるギター５０の演奏から演奏情報（例えば、ＭＩＤＩメッセージなど）を予測する。具体的には、図１４に示されるように、演奏情報予測装置３００は、ギター５０からギター演奏を表す弦振動波形データを取得すると、取得した弦振動波形データに対して高速フーリエ変換を実行し、時間軸に関して重複部分を有する所定の窓幅のスペクトルデータフレームを生成する。そして、演奏情報予測装置３００は、各スペクトルデータフレームにおける周波数に関する所定数個の上位のピークを特定し、特定したピークを抽出することによってスペクトル特徴化データフレームを生成する。例えば、これらの前処理は、ＤＳＰによって実現されてもよい。 The performance information prediction device 300 uses the performance model trained by the performance model training device 200 to predict performance information (e.g., MIDI messages, etc.) from the performer's performance of the guitar 50. Specifically, as shown in FIG. 14, when the performance information prediction device 300 acquires string vibration waveform data representing a guitar performance from the guitar 50, it performs a fast Fourier transform on the acquired string vibration waveform data to generate a spectral data frame of a predetermined window width having overlapping portions on the time axis. The performance information prediction device 300 then identifies a predetermined number of top peaks in terms of frequency in each spectral data frame, and generates a spectral feature data frame by extracting the identified peaks. For example, these preprocessing steps may be implemented by a DSP.
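The preprocessing chain described above (overlapping windows, spectrum, strongest peaks) can be sketched as follows. A naive DFT is used here only to keep the example self-contained; it yields the same magnitudes an FFT would. The function names, the local-maximum definition of a peak, and the absence of a noise floor are assumptions made for illustration.

```python
import cmath
import math


def frames(signal, width, hop):
    """Split the waveform into windows of `width` samples; hop < width overlaps them."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one window (naive O(N^2) DFT)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        out.append(abs(acc))
    return out


def top_peaks(spectrum, count):
    """Return up to `count` (bin, level) local maxima, strongest first."""
    peaks = [
        (k, spectrum[k])
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]
    peaks.sort(key=lambda kv: kv[1], reverse=True)
    return peaks[:count]


def spectral_feature_frames(signal, width, hop, count):
    """One list of (bin, level) peaks per overlapping window."""
    return [top_peaks(magnitude_spectrum(f), count) for f in frames(signal, width, hop)]
```

For example, a waveform that repeats every four samples concentrates its energy in bin 8 of a 32-sample window, so that bin is returned as the strongest peak of every frame.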

基準時刻の演奏情報を予測するため、演奏情報予測装置３００は、基準時刻前後の一定の時間範囲のスペクトル特徴化データフレーム、すなわち、基準時刻のスペクトル特徴化データフレーム、基準時刻直前のｐ個のスペクトル特徴化データフレーム及び基準時刻直後のｎ個のスペクトル特徴化データフレームを訓練済み演奏モデルに入力し、奏法及びノート番号を含む発音情報を取得する。また、演奏情報予測装置３００は、基準時刻のスペクトルに対して音量検出及びピッチ検出を実行し、演奏の音量及びピッチを検出する。演奏情報予測装置３００は、検出した音量及びピッチに基づきそれぞれ消音情報及びピッチ変更情報を生成すると共に、音量に基づき撥弦の強さを示すベロシティー情報を生成し、発音情報に付加する。このようにして、演奏情報予測装置３００は、各時刻のスペクトル特徴化データフレームから発音情報、消音情報及び／又はピッチ変更情報を含む各時刻の演奏情報（例えば、ＭＩＤＩメッセージなど）を生成し、外部機器（例えば、再生装置、コンピュータ等）に送信する。例えば、これらの演奏情報生成処理は、ＣＰＵによって実現されてもよい。 To predict the performance information at the reference time, the performance information prediction device 300 inputs the spectral feature data frames of a certain time range before and after the reference time, i.e., the spectral feature data frame at the reference time, the p spectral feature data frames immediately before the reference time, and the n spectral feature data frames immediately after the reference time, into the trained performance model to obtain sound production information including the playing style and note number. The performance information prediction device 300 also performs volume detection and pitch detection on the spectrum at the reference time to detect the volume and pitch of the performance. The performance information prediction device 300 generates mute information and pitch change information based on the detected volume and pitch, respectively, generates velocity information indicating the plucking strength based on the volume, and adds the velocity information to the sound production information. In this way, the performance information prediction device 300 generates, from the spectral feature data frames at each time, performance information for each time (e.g., MIDI messages) including sound production information, mute information, and/or pitch change information, and transmits it to an external device (e.g., a playback device or a computer). For example, this performance information generation processing may be realized by a CPU.
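When the performance information is emitted as MIDI messages, the three kinds of information map naturally onto the standard Note On, Note Off, and Pitch Bend channel messages. The sketch below builds the raw bytes for each. Using one MIDI channel per string is an assumption suggested by the per-string channels in the training flow, and how the playing style is carried in the MIDI stream is not specified in the text, so it is omitted here.

```python
def _check(value, upper, name):
    """Reject values outside 0..upper (inclusive)."""
    if not 0 <= value <= upper:
        raise ValueError(f"{name} must be in 0..{upper}")


def note_on(channel, note, velocity):
    """Sound production: status byte 0x9n, then note number and velocity."""
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    _check(velocity, 127, "velocity")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """Mute: status byte 0x8n, then note number and a release velocity of 0."""
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    return bytes([0x80 | channel, note, 0])


def pitch_bend(channel, value):
    """Pitch change: status byte 0xEn, then a 14-bit value as LSB and MSB.

    value ranges over 0..16383, with 8192 meaning no bend.
    """
    _check(channel, 15, "channel")
    _check(value, 16383, "value")
    return bytes([0xE0 | channel, value & 0x7F, (value >> 7) & 0x7F])
```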

図１５は、本開示の一実施例による演奏情報予測装置３００の機能構成を示すブロック図である。 Figure 15 is a block diagram showing the functional configuration of a performance information prediction device 300 according to one embodiment of the present disclosure.

図１５に示されるように、演奏情報予測装置３００は、前処理部３１０及び演奏情報予測部３２０を有する。 As shown in FIG. 15, the performance information prediction device 300 has a preprocessing unit 310 and a performance information prediction unit 320.

前処理部３１０は、弦楽器演奏を表す弦振動波形データからスペクトルデータフレームを生成し、スペクトルデータフレームをスペクトル特徴化データフレームに変換する。前処理部２１０と同様に、前処理部３１０は、演奏者によってギター５０が演奏されると、ギター５０から各弦の弦振動波形データを取得し、各弦の弦振動波形データに対して高速フーリエ変換（ＦＦＴ）を実行し、スペクトルデータを取得する。具体的には、前処理部２１０と同様の設定の下、前処理部３１０は、弦振動波形データから時間軸に関して重複する窓幅ｗの弦振動波形フレームを抽出し、サンプリング毎にＦＦＴを実行し、各弦振動波形フレームをスペクトルデータフレームに変換してもよい。スペクトルデータフレームへの変換後、前処理部３１０は、各スペクトルデータフレームの所定数個の上位のピークに基づきスペクトル特徴化データフレームを生成し、基準時刻前後の一定の時間範囲におけるスペクトル特徴化データフレームをリングバッファなどに格納する。 The preprocessing unit 310 generates a spectral data frame from the string vibration waveform data representing the performance of a stringed instrument, and converts the spectral data frame into a spectral feature data frame. As with the preprocessing unit 210, when the guitar 50 is played by the performer, the preprocessing unit 310 acquires string vibration waveform data of each string from the guitar 50, performs a fast Fourier transform (FFT) on the string vibration waveform data of each string, and acquires spectral data. Specifically, under the same settings as the preprocessing unit 210, the preprocessing unit 310 may extract string vibration waveform frames of a window width w that overlap with respect to the time axis from the string vibration waveform data, perform an FFT for each sampling, and convert each string vibration waveform frame into a spectral data frame. After conversion into a spectral data frame, the preprocessing unit 310 generates a spectral feature data frame based on a predetermined number of top peaks of each spectral data frame, and stores the spectral feature data frames in a certain time range around the reference time in a ring buffer or the like.
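The ring buffer mentioned above can be sketched with a fixed-length deque that always holds the most recent p + 1 + n frames. The class and method names are hypothetical, and treating the frame that is n positions older than the newest one as the reference frame is an assumption that follows from needing n look-ahead frames.

```python
from collections import deque


class FrameRingBuffer:
    """Keeps the latest (p + 1 + n) spectral feature data frames for one string."""

    def __init__(self, p, n):
        self.p = p
        self.n = n
        self._buf = deque(maxlen=p + 1 + n)  # the oldest frame drops out automatically

    def push(self, frame):
        """Store the newest frame."""
        self._buf.append(frame)

    def ready(self):
        """True once enough frames have arrived to form a full window."""
        return len(self._buf) == self._buf.maxlen

    def window(self):
        """Frames ordered oldest to newest; the reference frame sits at index p."""
        if not self.ready():
            raise ValueError("not enough frames buffered yet")
        return list(self._buf)

    def reference_frame(self):
        """The frame whose performance information is being predicted."""
        return self.window()[self.p]
```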

演奏情報予測部３２０は、訓練済み演奏モデルを利用して、基準時刻のスペクトル特徴化データフレームと、基準時刻のスペクトル特徴化データフレームの前後のスペクトル特徴化データフレームとから、弦楽器演奏の演奏情報を予測する。具体的には、演奏情報予測部３２０は、基準時刻のスペクトル特徴化データフレーム、基準時刻直前のｐ個のスペクトル特徴化データフレーム及び基準時刻直後のｎ個のスペクトル特徴化データフレームを訓練済み演奏モデルに入力し、基準時刻における奏法及びノート番号を取得する。 The performance information prediction unit 320 uses a trained performance model to predict performance information of a stringed instrument performance from the spectral feature data frame at the reference time and the spectral feature data frames before and after the spectral feature data frame at the reference time. Specifically, the performance information prediction unit 320 inputs the spectral feature data frame at the reference time, p spectral feature data frames immediately before the reference time, and n spectral feature data frames immediately after the reference time into the trained performance model, and obtains the playing style and note number at the reference time.

また、これと並行して、演奏情報予測部３２０は、基準時刻のスペクトル特徴化データフレームに対して音量検出及びピッチ検出を実行する。例えば、音量検出について、演奏情報予測部３２０は、図１６（ａ）に示されるように、スペクトル特徴化データフレームの所定数のピークの周波数レベルの合計を算出し、算出した合計の周波数レベルを当該基準時刻における音量として決定してもよい。当該基準時刻に対して訓練済み演奏モデルが発音を検出しなかった場合、あるいは、検出した音量が消音状態と認められる所定の閾値以下であった場合、演奏情報予測部３２０は、発音がなかったと判断し、演奏情報として消音情報を出力する。そうでない場合、演奏情報予測部３２０は、発音があったと判断し、検出した音量を当該発音のベロシティー値とし、演奏モデルから出力された奏法及びノート番号と共に当該ベロシティー値を発音情報に含める。 In parallel with this, the performance information prediction unit 320 performs volume detection and pitch detection on the spectral feature data frame at the reference time. For example, for volume detection, as shown in FIG. 16(a), the performance information prediction unit 320 may calculate the sum of the levels of the predetermined number of peaks in the spectral feature data frame and determine that sum as the volume at the reference time. If the trained performance model does not detect sound production at the reference time, or if the detected volume is equal to or less than a predetermined threshold regarded as a muted state, the performance information prediction unit 320 determines that no sound was produced and outputs mute information as the performance information. Otherwise, the performance information prediction unit 320 determines that a sound was produced, sets the detected volume as the velocity value of that sound, and includes the velocity value in the sound production information together with the playing style and note number output from the performance model.
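The volume rule can be sketched in a few lines. The peak representation as (bin, level) pairs, the `full_scale` normalisation, and the clamping of the velocity to 1..127 are assumptions; the text only says that the summed peak level is used as the volume and that the volume becomes the velocity value.

```python
def detect_volume(peaks):
    """Volume at the reference time: sum of the levels of the extracted peaks."""
    return sum(level for _, level in peaks)


def volume_to_velocity(volume, full_scale):
    """Map a volume onto the MIDI velocity range 1..127 (assumed linear mapping)."""
    scaled = round(127 * volume / full_scale)
    return max(1, min(127, scaled))


def sound_or_mute(peaks, model_detected_sound, threshold, full_scale):
    """Return ("mute", 0) or ("sound", velocity) for one reference time."""
    volume = detect_volume(peaks)
    if not model_detected_sound or volume <= threshold:
        return ("mute", 0)
    return ("sound", volume_to_velocity(volume, full_scale))
```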

また、ピッチ検出について、演奏情報予測部３２０は、図１６（ｂ）に示されるように、スペクトル特徴化データフレームの所定数のピークのうち最小の周波数レベルを撥弦のピッチと決定し、ピッチ情報を生成する。そして、演奏情報予測部３２０は、直近のピッチ情報又は発音情報と差異があった場合、ピッチ変更があったと判断し、ピッチ変更情報を出力する。 For pitch detection, as shown in FIG. 16(b), the performance information prediction unit 320 determines the lowest frequency among the predetermined number of peaks in the spectral feature data frame as the pitch of the plucked string and generates pitch information. If the pitch differs from the most recent pitch information or sound production information, the performance information prediction unit 320 determines that a pitch change has occurred and outputs pitch change information.
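A minimal sketch of this pitch rule follows, assuming the pitch is taken from the peak with the lowest frequency. The conversion from bin index to hertz, the equal-tempered conversion to a fractional MIDI note (A4 = 440 Hz = note 69), and the change tolerance are assumptions added for illustration.

```python
import math


def detect_pitch_hz(peaks, sample_rate, window_width):
    """Frequency in Hz of the lowest-frequency peak, or None when there are no peaks."""
    if not peaks:
        return None
    lowest_bin = min(k for k, _ in peaks)
    return lowest_bin * sample_rate / window_width


def hz_to_midi(freq):
    """Fractional MIDI note number for a frequency (equal temperament, A4 = 440 Hz)."""
    return 69 + 12 * math.log2(freq / 440.0)


def pitch_changed(current, previous, tolerance=0.05):
    """True when the pitch moved by more than `tolerance` semitones."""
    return previous is not None and abs(current - previous) > tolerance
```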

なお、前処理部３１０及び演奏情報予測部３２０は、全ての弦に対して上述した処理を並列に実行する。 The pre-processing unit 310 and the performance information prediction unit 320 execute the above-mentioned processes in parallel for all the strings.
［演奏情報予測処理］ [Performance Information Prediction Processing]
次に、図１７を参照して、本開示の一実施例による演奏情報予測処理を説明する。当該演奏情報予測処理は、上述した演奏情報予測装置３００によって実現され、例えば、プロセッサがプログラム又は命令を実行することによって実現されてもよい。図１７は、本開示の一実施例による演奏情報予測処理を示すフローチャートである。 Next, a performance information prediction process according to an embodiment of the present disclosure will be described with reference to FIG. 17. The performance information prediction process is realized by the above-mentioned performance information prediction device 300, and may be realized by, for example, a processor executing a program or an instruction. FIG. 17 is a flowchart showing the performance information prediction process according to an embodiment of the present disclosure.

図１７に示されるように、ステップＳ２０１において、演奏情報予測装置３００は、弦番号ｓを０に初期化する。ギター５０は６弦から構成されるため、弦番号ｓは０～５の値をとる。 As shown in FIG. 17, in step S201, the performance information prediction device 300 initializes the string number s to 0. Since the guitar 50 is made up of six strings, the string number s can take a value between 0 and 5.

ステップＳ２０２において、演奏情報予測装置３００は、弦振動波形データからスペクトル特徴化データフレームを生成し、基準時刻のスペクトル特徴化データフレームに対して音量検出を実行する。 In step S202, the performance information prediction device 300 generates a spectral feature data frame from the string vibration waveform data and performs volume detection on the spectral feature data frame at the reference time.

ステップＳ２０３において、演奏情報予測装置３００は、検出した音量Ｉが所定の閾値未満であるか判断する。音量Ｉが所定の閾値以上である場合（Ｓ２０３：Ｎｏ）、演奏情報予測装置３００は、発音中であると判断し、ステップＳ２０６に移行する。 In step S203, the performance information prediction device 300 determines whether the detected volume I is less than a predetermined threshold. If the volume I is equal to or greater than the predetermined threshold (S203: No), the performance information prediction device 300 determines that sound is being produced and proceeds to step S206.

他方、音量Ｉが所定の閾値未満である場合（Ｓ２０３：Ｙｅｓ）、演奏情報予測装置３００は、ステップＳ２０４において、当該基準時刻において発音中であるかを判断する。例えば、当該判断は、訓練済み演奏モデルから前回発音情報の出力があったか否かに基づき行われてもよい。発音中であった場合（Ｓ２０４：Ｙｅｓ）、演奏情報予測装置３００は、ステップＳ２０５において、演奏モデルから出力されたノート番号に対応した消音情報を生成する。発音中でない場合（Ｓ２０４：Ｎｏ）、演奏情報予測装置３００は、ステップＳ２０６に移行する。 On the other hand, if the volume I is less than the predetermined threshold (S203: Yes), the performance information prediction device 300 determines in step S204 whether the sound is being produced at the reference time. For example, this determination may be made based on whether sound production information was previously output from the trained performance model. If the sound is being produced (S204: Yes), the performance information prediction device 300 generates mute information corresponding to the note number output from the performance model in step S205. If the sound is not being produced (S204: No), the performance information prediction device 300 proceeds to step S206.

ステップＳ２０６において、演奏情報予測装置３００は、当該基準時刻のスペクトル特徴化データフレーム、基準時刻直前のｐ個のスペクトル特徴化データフレーム、及び基準時刻直後のｎ個のスペクトル特徴化データフレームをバッファから抽出する。 In step S206, the performance information prediction device 300 extracts from the buffer the spectral feature data frame for the reference time, p spectral feature data frames immediately before the reference time, and n spectral feature data frames immediately after the reference time.

ステップＳ２０７において、演奏情報予測装置３００は、抽出したスペクトル特徴化データフレームを演奏モデルに入力する。 In step S207, the performance information prediction device 300 inputs the extracted spectral feature data frame into the performance model.

ステップＳ２０８において、演奏情報予測装置３００は、演奏モデルから奏法及びノート番号を含む発音情報が出力されたか判断する。発音情報が出力された場合（Ｓ２０８：Ｙｅｓ）、演奏情報予測装置３００は、ステップＳ２０９において、出力された奏法及びノート番号をそれぞれ変数ｖ，ｋに代入する。他方、発音情報が出力されなかった場合（Ｓ２０８：Ｎｏ）、演奏情報予測装置３００は、ステップＳ２１５に移行する。 In step S208, the performance information prediction device 300 determines whether or not sound information including the playing style and note number has been output from the performance model. If sound information has been output (S208: Yes), the performance information prediction device 300 assigns the output playing style and note number to variables v and k, respectively, in step S209. On the other hand, if sound information has not been output (S208: No), the performance information prediction device 300 proceeds to step S215.

ステップＳ２１０において、演奏情報予測装置３００は、発音があったか判断する。発音があった場合（Ｓ２１０：Ｙｅｓ）、演奏情報予測装置３００は、ステップＳ２１１において、前回の発音イベントのノート番号Ｋ０の消音情報を生成する。他方、発音がない場合（Ｓ２１０：Ｎｏ）、演奏情報予測装置３００は、ステップＳ２１２に移行する。 In step S210, the performance information prediction device 300 determines whether a sound has been produced. If a sound has been produced (S210: Yes), the performance information prediction device 300 generates mute information for note number K0 of the previous sound production event in step S211. On the other hand, if no sound has been produced (S210: No), the performance information prediction device 300 proceeds to step S212.

ステップＳ２１２において、演奏情報予測装置３００は、奏法番号ｖ、ノート番号ｋ及び音量Ｉから変換されたベロシティーを含む発音情報を生成する。 In step S212, the performance information prediction device 300 generates sound production information including the playing style number v, the note number k, and a velocity converted from the volume I.

ステップＳ２１３において、演奏情報予測装置３００は、前回の発音イベントのノート番号Ｋ０にｋを代入する。 In step S213, the performance information prediction device 300 assigns k to the note number K0 of the previous sound production event.

ステップＳ２１４において、演奏情報予測装置３００は、前回発生したピッチＰ０＝ｋに対応するピッチを特定する。 In step S214, the performance information prediction device 300 sets the previously generated pitch P0 to the pitch corresponding to the note number k.

ステップＳ２１５において、演奏情報予測装置３００は、基準時刻のスペクトル特徴化データフレームに対してピッチ検出を実行し、検出したピッチをｐに格納する。 In step S215, the performance information prediction device 300 performs pitch detection on the spectral feature data frame at the reference time and stores the detected pitch in p.

ステップＳ２１６において、演奏情報予測装置３００は、ｐ＝Ｐ０であるか判断する。演奏情報予測装置３００は、ｐ＝Ｐ０である場合（Ｓ２１６：Ｙｅｓ）、ステップＳ２１８に移行し、ｐ＝Ｐ０でない場合（Ｓ２１６：Ｎｏ）、ステップＳ２１７において、ｐからの差分によってピッチベンド情報を生成する。 In step S216, the performance information prediction device 300 determines whether p = P0. If p = P0 (S216: Yes), the performance information prediction device 300 proceeds to step S218, and if p = P0 is not true (S216: No), in step S217, the performance information prediction device 300 generates pitch bend information based on the difference from p.

ステップＳ２１８において、演奏情報予測装置３００は、次の弦に対して上述した処理を実行するため、弦番号ｓを１だけインクリメントする。 In step S218, the performance information prediction device 300 increments the string number s by 1 to perform the above-described process on the next string.

ステップＳ２１９において、演奏情報予測装置３００は、全ての弦が処理されたか判断し、全ての弦が処理された場合（Ｓ２１９：Ｎｏ）、当該演奏情報予測処理を終了し、そうでない場合（Ｓ２１９：Ｙｅｓ）、次の弦に対して上述した処理を繰り返す。 In step S219, the performance information prediction device 300 determines whether all strings have been processed. If all strings have been processed (S219: No), the performance information prediction process ends. If not (S219: Yes), the above-described process is repeated for the next string.
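The per-string flow of steps S201 to S219 can be condensed into the sketch below. It is a simplified reading rather than the disclosed implementation: events are plain tuples, the volume is assumed to be already scaled to 0..127, pitches are expressed in MIDI note units, the comparison in S216 uses a small tolerance instead of exact equality, and returning immediately after the mute in S205 is an assumption because the text does not state where the flow continues.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StringState:
    last_note: Optional[int] = None     # K0: note number of the previous sound event
    last_pitch: Optional[float] = None  # P0: previously generated pitch


def process_string(s, state, volume, model_output, detected_pitch,
                   threshold, tolerance=0.01):
    """Return the events for string s at one reference time."""
    events: List[Tuple] = []
    if volume < threshold and state.last_note is not None:    # S203: Yes, S204: Yes
        events.append(("note_off", s, state.last_note))       # S205
        state.last_note = None
        state.last_pitch = None
        return events                                         # assumed exit point
    if model_output is not None:                              # S208: Yes
        style, note = model_output                            # S209
        if state.last_note is not None:                       # S210: Yes
            events.append(("note_off", s, state.last_note))   # S211
        velocity = max(1, min(127, round(volume)))            # S212 (assumed mapping)
        events.append(("note_on", s, style, note, velocity))
        state.last_note = note                                # S213
        state.last_pitch = float(note)                        # S214
    if state.last_pitch is not None and detected_pitch is not None:  # S215
        bend = detected_pitch - state.last_pitch
        if abs(bend) > tolerance:                             # S216: No
            events.append(("pitch_bend", s, bend))            # S217
    return events


def process_all_strings(states, volumes, model_outputs, pitches, threshold):
    """Loop over the strings, as in S201, S218, and S219."""
    events: List[Tuple] = []
    for s in range(len(states)):
        events += process_string(s, states[s], volumes[s], model_outputs[s],
                                 pitches[s], threshold)
    return events
```

Keeping K0 and P0 in a per-string state object lets the same function serve all six strings, whether they are processed in a loop as in the flowchart or in parallel.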

なお、上述した実施例では、ギター５０などの弦楽器の弦振動波形データから演奏情報を予測する演奏モデルを訓練し、訓練した演奏モデルを利用して演奏情報を予測する演奏情報予測システムを説明したが、本開示はこれに限定されず、管楽器に適用されてもよい。すなわち、本開示は、管楽器の空気振動波形データから演奏情報を予測する演奏モデルを訓練し、訓練した演奏モデルを利用して演奏情報を予測する演奏情報予測システムに適用されてもよい。 In the above-described embodiment, a performance information prediction system is described in which a performance model that predicts performance information from string vibration waveform data of a stringed instrument such as a guitar 50 is trained, and the performance information is predicted using the trained performance model. However, the present disclosure is not limited to this, and may be applied to a wind instrument. In other words, the present disclosure may be applied to a performance information prediction system in which a performance model that predicts performance information from air vibration waveform data of a wind instrument is trained, and the performance information is predicted using the trained performance model.

以上、本発明の実施例について詳述したが、本発明は上述した特定の実施形態に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形・変更が可能である。 Although the examples of the present invention have been described in detail above, the present invention is not limited to the specific embodiments described above, and various modifications and variations are possible within the scope of the gist of the present invention as described in the claims.

以下に、本願出願の当初の特許請求の範囲に記載された発明を付記する。 The invention as originally claimed in the present application is set forth below.
［付記］ [Additional Notes]
本開示の一態様では、 In one aspect of the present disclosure,
弦楽器演奏を表す弦振動波形データからスペクトルデータフレームを生成し、前記スペクトルデータフレームに基づいて、スペクトル特徴化データフレームを取得する前処理部と、 a pre-processing unit that generates a spectral data frame from string vibration waveform data representing a stringed instrument performance, and obtains a spectral feature data frame based on the spectral data frame;
訓練済み演奏モデルを利用して、基準時刻のスペクトル特徴化データフレームと、前記基準時刻のスペクトル特徴化データフレームの前後のスペクトル特徴化データフレームとから、前記弦楽器演奏の演奏情報を予測する演奏情報予測部と、 a performance information prediction unit that predicts performance information of the stringed instrument performance from a spectral feature data frame at a reference time and spectral feature data frames before and after the spectral feature data frame at the reference time by using a trained performance model;
を有する演奏情報予測装置が提供される。 A performance information prediction device having the above structure is provided.

一実施例では、前記演奏情報は、発音情報、消音情報及びピッチ変更情報から構成されてもよい。 In one embodiment, the performance information may consist of sound generation information, mute information, and pitch change information.

一実施例では前記発音情報は、奏法及びノート番号を含んでもよい。 In one embodiment, the sound information may include playing style and note number.

一実施例では、前記訓練済み演奏モデルは、前記奏法及び前記ノート番号を出力してもよい。 In one embodiment, the trained performance model may output the playing style and the note number.

一実施例では、前記訓練済み演奏モデルは、前記基準時刻のスペクトル特徴化データフレームと、前記基準時刻のスペクトル特徴化データフレームの前の第１の数のスペクトル特徴化データフレームと、前記基準時刻のスペクトル特徴化データフレームの後の第２の数のスペクトル特徴化データフレームとを取得し、前記奏法及び前記ノート番号を出力してもよい。 In one embodiment, the trained performance model may obtain a spectral feature data frame at the reference time, a first number of spectral feature data frames before the spectral feature data frame at the reference time, and a second number of spectral feature data frames after the spectral feature data frame at the reference time, and output the playing style and the note number.

一実施例では、前記訓練済み演奏モデルは、ニューラルネットワークにより実現されてもよい。 In one embodiment, the trained performance model may be realized by a neural network.

一実施例では、前記スペクトル特徴化データフレームは、前記スペクトルデータフレームに含まれる所定数個の上位のピークから構成されてもよい。 In one embodiment, the spectral characterization data frame may consist of a predetermined number of top peaks contained in the spectral data frame.

一実施例では、前記演奏情報は、ＭＩＤＩプロトコルに従って記述されてもよい。 In one embodiment, the performance information may be described according to the MIDI protocol.

本開示の他の態様では、 In another aspect of the present disclosure,
訓練用演奏情報に従って演奏された弦楽器演奏を表す弦振動波形データからスペクトルデータフレームを生成し、前記スペクトルデータフレームに基づいて、スペクトル特徴化データフレームを取得する前処理部と、 a pre-processing unit that generates a spectral data frame from string vibration waveform data representing a stringed instrument performance performed in accordance with training performance information, and obtains a spectral feature data frame based on the spectral data frame;
前記訓練用演奏情報を利用して、基準時刻のスペクトル特徴化データフレームと、前記基準時刻のスペクトル特徴化データフレームの前後のスペクトル特徴化データフレームとから前記弦楽器演奏の演奏情報を予測する演奏モデルを訓練する演奏モデル訓練部と、 a performance model training unit that uses the training performance information to train a performance model that predicts performance information of the stringed instrument performance from a spectral feature data frame at a reference time and spectral feature data frames before and after the spectral feature data frame at the reference time;
を有する演奏モデル訓練装置が提供される。 A performance model training device having the above structure is provided.

本開示の他の態様では、 In another aspect of the present disclosure,
電子弦楽器と、 an electronic stringed instrument;
上述した演奏情報予測装置と、 the performance information prediction device described above; and
上述した演奏モデル訓練装置と、 the performance model training device described above;
を有する演奏情報生成システムが提供される。 A performance information generation system having the above structure is provided.

In another aspect of the present disclosure, there is provided a performance information prediction method comprising:
a step in which one or more processors generate a spectral data frame from string vibration waveform data representing a stringed instrument performance, and obtain a spectral characterization data frame based on the spectral data frame; and
a step in which the one or more processors predict, using a trained performance model, performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames before and after the reference time.
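
The first step of this method, generating spectral data frames from the waveform, can be sketched as a short-time Fourier analysis. The code below is an illustrative assumption rather than the disclosed procedure: this section does not name the transform, window function, frame size, or hop size, so a Hann window and a direct (slow but dependency-free) DFT are used here, and the name `spectral_frames` is hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def spectral_frames(samples: Sequence[float], frame_size: int,
                    hop: int) -> List[List[float]]:
    """Split samples into overlapping frames and return magnitude spectra.

    Each returned frame holds frame_size // 2 + 1 magnitudes, covering
    bin 0 up to the Nyquist bin.
    """
    frames: List[List[float]] = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        # Hann window to reduce spectral leakage (assumed choice).
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
                    for n, s in enumerate(chunk)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(windowed))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


if __name__ == "__main__":
    # A sine with exactly 4 cycles per 32-sample frame.
    signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
    frames = spectral_frames(signal, frame_size=32, hop=16)
    strongest = max(range(len(frames[0])), key=frames[0].__getitem__)
    print(len(frames), strongest)
```

A real-time implementation would normally use a fast Fourier transform in place of the direct sum shown here.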

In another aspect of the present disclosure, there is provided a performance model training method comprising:
a step in which one or more processors generate a spectral data frame from string vibration waveform data representing a stringed instrument performance played in accordance with training performance information, and obtain a spectral characterization data frame based on the spectral data frame; and
a step in which the one or more processors use the training performance information to train a performance model that predicts performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames before and after the reference time.

10 Guitar controller
50 Guitar
100 Control device
200 Performance model training device
210 Pre-processing unit
220 Performance model training unit
300 Performance information prediction device
310 Pre-processing unit
320 Performance information prediction unit

Claims

1. A performance information prediction device comprising:
a pre-processing unit that generates a spectral data frame from string vibration waveform data representing a stringed instrument performance, and obtains a spectral characterization data frame based on the spectral data frame, the spectral characterization data frame being composed of the frequencies of a predetermined number of top peaks included in the spectral data frame; and
a performance information prediction unit that predicts performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames at times before and after the reference time by using a trained performance model,
wherein the trained performance model takes as input the spectral characterization data frame at the reference time, a first number of spectral characterization data frames before the reference time, and a second number of spectral characterization data frames after the reference time, and outputs a playing style and a note number of the stringed instrument.

2. The performance information prediction device according to claim 1, wherein the performance information is composed of sound generation information, sound suppression information, and pitch change information.

3. The performance information prediction device according to claim 2, wherein the sound generation information includes the playing style and the note number.

4. The performance information prediction device according to claim 1, wherein the trained performance model is implemented by a neural network.

5. The performance information prediction device according to claim 1, wherein the performance information is described in accordance with a MIDI protocol.

6. A performance model training device comprising:
a pre-processing unit that generates a spectral data frame from string vibration waveform data representing a stringed instrument performance played in accordance with training performance information, and obtains a spectral characterization data frame based on the spectral data frame, the spectral characterization data frame being composed of the frequencies of a predetermined number of top peaks included in the spectral data frame; and
a performance model training unit that uses the training performance information to train a performance model that predicts performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames at times before and after the reference time.

7. A performance information generation system comprising:
an electronic stringed instrument;
the performance information prediction device according to any one of claims 1 to 5; and
the performance model training device according to claim 6.

8. A performance information prediction method comprising:
a step in which one or more processors generate a spectral data frame from string vibration waveform data representing a stringed instrument performance, and obtain a spectral characterization data frame based on the spectral data frame, the spectral characterization data frame being composed of the frequencies of a predetermined number of top peaks included in the spectral data frame; and
a step in which the one or more processors predict, using a trained performance model, performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames at times before and after the reference time,
wherein the trained performance model takes as input the spectral characterization data frame at the reference time, a first number of spectral characterization data frames before the reference time, and a second number of spectral characterization data frames after the reference time, and outputs a playing style and a note number of the stringed instrument.

9. A performance model training method comprising:
a step in which one or more processors generate a spectral data frame from string vibration waveform data representing a stringed instrument performance played in accordance with training performance information, and obtain a spectral characterization data frame based on the spectral data frame, the spectral characterization data frame being composed of the frequencies of a predetermined number of top peaks included in the spectral data frame; and
a step in which the one or more processors use the training performance information to train a performance model that predicts performance information of the stringed instrument performance from the spectral characterization data frame at a reference time and the spectral characterization data frames at times before and after the reference time.