JP2910417B2 - Voice music discrimination device

Info

Publication number: JP2910417B2
Application number: JP4157717A
Authority: JP
Inventors: 武志則松; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-06-17
Filing date: 1992-06-17
Publication date: 1999-06-23
Anticipated expiration: 2014-06-23
Also published as: JPH064088A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a voice/music discrimination device, used as a preprocessing device for audio equipment and the like, that automatically determines whether a continuously input signal is voice or something else (music, etc.).

[0002]

[Prior Art] Recent stereo systems, televisions, and the like are equipped with functions such as surround sound that create acoustic effects. While these are highly effective for sources such as music, they reduce clarity for voice-centered material such as news programs. If it could be determined automatically whether a television or radio source is mainly voice or something else, the sound field and frequency characteristics could be controlled optimally according to the result, making the audio much easier to listen to.

[0003] Conventional voice/music discrimination methods exploit the fact that the input signal is a stereo signal. For a music source, the left (L) channel and right (R) channel signals are independent of each other, and the correlation between the two channels is low. Conversely, for a voice-centered source such as a news program, the L and R signals are almost identical, and the correlation between the two channels is high. It is therefore possible to calculate the difference between the amplitudes of the L and R signals and classify the input as music when the difference is large and as voice when it is small. Alternatively, the correlation value between the L and R signals can be calculated, and the input classified as voice when the correlation value is large and as music when it is small.
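The correlation-based conventional method can be illustrated with a minimal Python sketch. The function names, the use of the Pearson correlation coefficient, and the 0.9 threshold are assumptions made for illustration; the text only says that a large correlation value indicates voice and a small one indicates music.

```python
import math


def channel_correlation(left, right):
    """Pearson correlation between two equal-length channels.

    Returns 0.0 when either channel is constant (assumed convention).
    """
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0
    return cov / math.sqrt(var_l * var_r)


def classify_by_correlation(left, right, threshold=0.9):
    """High L/R correlation -> "voice", low correlation -> "music".

    The threshold value is a hypothetical choice, not taken from the text.
    """
    if channel_correlation(left, right) > threshold:
        return "voice"
    return "music"
```

A monaural source fed to both channels always yields a correlation of 1, so a sketch like this labels it voice whatever its content.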

[0004]

[Problem to be Solved by the Invention] However, although the above voice/music discrimination device is effective for stereo sources, it has the drawback that it cannot discriminate monaural sources, in which there is no difference between the L and R signals.

[0005] The present invention solves the above conventional problem. Its object is to provide a voice/music discrimination device that can accurately determine whether a source is voice or music, regardless of whether the source is a monaural signal or a stereo signal.

[0006]

[Means for Solving the Problem] To solve the above problem, the voice/music discrimination device of the present invention comprises a power calculation unit, a sound/silence determination unit, and a determination unit. The power calculation unit divides the input signal into frames of a fixed duration and calculates the acoustic power of each frame. The sound/silence determination unit determines a frame to be a sound section when the acoustic power calculated by the power calculation unit is greater than an acoustic power threshold, and a silent section when it is smaller. The determination unit performs a first determination and a second determination for every plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination. In the first determination, the plurality of frames is determined to be voice when the number of frames determined to be silent sections is greater than a voice threshold. In the second determination, the plurality of frames is determined to be music when the number of frames determined to be sound sections is greater than a music threshold.

In another aspect, the voice/music discrimination device of the present invention comprises a power calculation unit, a sound/silence determination unit, and a determination unit. The power calculation unit divides the input signal into frames of a fixed duration and calculates the acoustic power of each frame. The sound/silence determination unit determines a frame to be a sound section when the acoustic power calculated by the power calculation unit is greater than a first acoustic power threshold, and a silent section when it is smaller than a second acoustic power threshold. The determination unit performs a first determination and a second determination for every plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination. In the first determination, the plurality of frames is determined to be voice when the number of frames determined to be silent sections is greater than a voice threshold. In the second determination, the plurality of frames is determined to be music when the number of frames determined to be sound sections is greater than a music threshold.

[0007] In a further aspect, the voice/music discrimination device of the present invention comprises a power calculation unit, a sound/silence determination unit, and a determination unit. The power calculation unit divides the input signal into frames of a fixed duration and calculates the acoustic power of each frame. The sound/silence determination unit determines a frame to be a sound section when the acoustic power calculated by the power calculation unit is greater than an acoustic power threshold, and a silent section when it is smaller. The determination unit performs a first determination and a second determination for each of a consecutive first plurality of frames and second plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination. In the first determination, the second plurality of frames is determined to be voice when the number of frames determined to be silent sections in the first plurality of frames is greater than a voice threshold and the number of frames determined to be silent sections in the second plurality of frames is also greater than the voice threshold. In the second determination, the second plurality of frames is determined to be music when the number of frames determined to be sound sections in the first plurality of frames is greater than a music threshold and the number of frames determined to be sound sections in the second plurality of frames is also greater than the music threshold.

In yet another aspect, the voice/music discrimination device of the present invention comprises a power calculation unit, a sound/silence determination unit, and a determination unit. The power calculation unit divides the input signal into frames of a fixed duration and calculates the acoustic power of each frame. The sound/silence determination unit determines a frame to be a sound section when the acoustic power calculated by the power calculation unit is greater than a first acoustic power threshold, and a silent section when it is smaller than a second acoustic power threshold. The determination unit performs a first determination and a second determination for each of a consecutive first plurality of frames and second plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination. In the first determination, the second plurality of frames is determined to be voice when the number of frames determined to be silent sections in the first plurality of frames is greater than a voice threshold and the number of frames determined to be silent sections in the second plurality of frames is also greater than the voice threshold. In the second determination, the second plurality of frames is determined to be music when the number of frames determined to be sound sections in the first plurality of frames is greater than a music threshold and the number of frames determined to be sound sections in the second plurality of frames is also greater than the music threshold.

[0008]

[Operation] With the configuration described above, the present invention exploits the fact that silent sections always occur in continuously uttered speech but hardly ever occur in music. By determining voice or music from the proportion of sound and silence within a fixed number of frames of the input signal, voice and music can be discriminated with high accuracy. In addition, when discrimination is difficult, the previous determination result is held and results are output at fixed intervals, so the determination result settles on the side that is dominant in the overall flow. This provides a voice/music discrimination device with few erroneous determinations.

[0009] Furthermore, because the present invention outputs a voice/music determination result only when the same determination result is obtained consecutively, it can provide a voice/music discrimination device that achieves more accurate determination and switches smoothly between the voice and music determinations.

[0010]

[Embodiment] A voice/music discrimination device according to one embodiment of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram of a voice/music discrimination device according to one embodiment of the present invention. In FIG. 1, 1 is a power calculation unit that calculates the power of the input signal, and 2 is a sound/silence determination unit that determines whether the input signal of a frame is sound or silence by comparing its power with a power threshold. 3 is a determination unit that performs the voice/music determination over a plurality of frames and outputs the determination result for the current frame based on the previous determination result. FIG. 2 is a flowchart of the main part for explaining the operation of the determination unit 3.

[0012] Next, the operation of the voice/music discrimination device of the above embodiment is described in detail with reference to FIG. 1. Here the input signal is assumed to be a stereo signal, with audio equipment, televisions, and the like as the intended applications. The L and R signals of the input stereo signal are mixed and input to the power calculation unit 1 as L+R. At fixed time (frame) intervals, the power calculation unit 1 calculates the cumulative value or the average value of the amplitude over the interval as the power value of that frame. The sound/silence determination unit 2 uses the obtained power value to determine sound or silence for each frame. Letting P be the power value of the current frame and P_t the threshold for the sound/silence determination, the frame is determined to be sound when (Equation 1) is satisfied and silent when it is not.

[0013]

(Equation 1)  P > P_t
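As a rough illustration of the power calculation unit 1 and the single-threshold decision of the sound/silence determination unit 2, the following Python sketch mixes the two channels, computes the average absolute amplitude per frame, and compares it with P_t. The function names are hypothetical, the choice of the average (rather than the cumulative value) is one of the two options the text allows, and the strict comparison P > P_t is an assumption based on the wording of the claims.

```python
def frame_powers(left, right, frame_len):
    """Mix L and R into L+R and return the average |amplitude| per frame.

    Trailing samples that do not fill a whole frame are ignored
    (an assumption; the text does not discuss partial frames).
    """
    mixed = [l + r for l, r in zip(left, right)]
    powers = []
    for start in range(0, len(mixed) - frame_len + 1, frame_len):
        frame = mixed[start:start + frame_len]
        powers.append(sum(abs(x) for x in frame) / frame_len)
    return powers


def is_sound(power, p_t):
    """Assumed form of Equation 1: the frame is sound when P > P_t."""
    return power > p_t
```

For a monaural source the same samples can be passed as both `left` and `right`, which is why this approach does not depend on stereo separation.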

[0014] Alternatively, the sound/silence determination may use two thresholds, P_t1 and P_t2 (where P_t2 is greater than P_t1), with the frame determined to be silent when (Equation 2) is satisfied and sound when (Equation 3) is satisfied.

[0015]

(Equation 2)  P < P_t1

[0016]

(Equation 3)  P > P_t2
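The two-threshold variant can be sketched as follows. The forms P < P_t1 for silence and P > P_t2 for sound are assumptions inferred from the claims, and returning `None` for a frame that falls between the two thresholds (so that it is counted as neither sound nor silence) is also an assumption, since the text does not say how such frames are treated.

```python
def classify_frame(power, p_t1, p_t2):
    """Two-threshold sound/silence decision with P_t2 > P_t1.

    Returns "silent", "sound", or None for the undecided band
    between the thresholds (hypothetical handling).
    """
    if p_t2 <= p_t1:
        raise ValueError("p_t2 must be greater than p_t1")
    if power < p_t1:
        return "silent"
    if power > p_t2:
        return "sound"
    return None
```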

[0017] Using these per-frame sound/silence determination results, the determination unit 3 discriminates voice from music in units larger than a frame. The operation of the determination unit 3 is described in detail below, following the flowchart of FIG. 2.

[0018] In the following, the voice/music determination is performed every F frames. The determination interval F should be set to a value that contains, on average, three or four syllables of continuously uttered speech. In practice, setting it to a value between 1 and 2 seconds ensures that speech contains silent portions at a roughly constant proportion, which improves the accuracy of voice/music discrimination.
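The relation between the determination interval and F is simple arithmetic, sketched below. The frame duration is not given in the text, so the 20 ms used in the usage note is purely a hypothetical value, as is the function name.

```python
def frames_per_decision(interval_ms, frame_ms):
    """Number of whole frames F that fit in one determination interval."""
    if frame_ms <= 0 or interval_ms < frame_ms:
        raise ValueError("interval must cover at least one frame")
    return interval_ms // frame_ms
```

With a hypothetical 20 ms frame, the 1 to 2 second range corresponds to F between 50 and 100 frames; `frames_per_decision(1500, 20)` gives 75.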

[0019] First, based on the result from the sound/silence determination unit 2, steps 21 and 22 increment the silence counter if the frame is silent and the sound counter if the frame is sound. Next, the frame counter is incremented (step 23). If step 24 finds that the sound/silence determination for F frames is complete, the frame counter is reset (step 25). Here, let L_T be the threshold for the voice determination and H_T the threshold for the music determination.

[0020] In step 26, the silence counter value is compared with L_T. If the silence counter is larger, the input signal is determined to be voice at this point, and in step 27 the determination flag is turned on and this information is output externally. Continuous silence is also regarded as a kind of voice and is classified on the voice side. The determination flag indicates voice when it is "1" and music when it is "0". If the silence counter value is smaller in step 26, the sound counter value is compared with H_T in step 28. If the sound counter is larger, the input signal is determined to be music, and in step 29 the determination flag is turned off and this information is output externally. If it is smaller, the determination is difficult, so the flag keeps the state it already has. Processing then proceeds to the next frame, and the same processing is repeated.
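The flow of FIG. 2 (steps 21 to 29) can be sketched in Python as a small state machine. The class and method names are hypothetical. Resetting the silence and sound counters after every F frames, and starting with the flag set to voice, are assumptions: the text only mentions resetting the frame counter and does not state an initial flag value.

```python
class VoiceMusicJudge:
    """Sketch of the determination unit 3 following FIG. 2.

    flag == 1 means voice, flag == 0 means music.
    """

    def __init__(self, f, l_t, h_t, initial_flag=1):
        self.f = f            # frames per determination (F)
        self.l_t = l_t        # voice threshold L_T on the silence count
        self.h_t = h_t        # music threshold H_T on the sound count
        self.flag = initial_flag
        self.silent_count = 0
        self.sound_count = 0
        self.frame_count = 0

    def push(self, frame_is_sound):
        """Feed one per-frame sound/silence result; return the current flag."""
        if frame_is_sound:
            self.sound_count += 1       # step 22
        else:
            self.silent_count += 1      # step 21
        self.frame_count += 1           # step 23
        if self.frame_count < self.f:   # step 24: F frames not yet seen
            return self.flag
        self.frame_count = 0            # step 25
        if self.silent_count > self.l_t:    # step 26
            self.flag = 1                   # step 27: voice
        elif self.sound_count > self.h_t:   # step 28
            self.flag = 0                   # step 29: music
        # Otherwise the flag keeps its previous state.
        self.silent_count = 0           # assumed reset for the next block
        self.sound_count = 0
        return self.flag


def feed(judge, frames):
    """Push a sequence of per-frame results and return the final flag."""
    result = judge.flag
    for frame_is_sound in frames:
        result = judge.push(frame_is_sound)
    return result
```

Because a block that satisfies neither condition leaves the flag untouched, an ambiguous block inherits the preceding decision instead of toggling the output.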

[0021] The determination unit 3 can also be configured as follows. FIG. 3 is a flowchart of the main part for explaining the operation when the determination unit 3 is realized by another method. Steps 30 to 35 are the same as steps 20 to 25 of FIG. 2, respectively, so their description is omitted. It is assumed that the voice flag is on if the previous F-frame determination was voice, and that the music flag is on if it was music.

[0022] First, if the silence counter is greater than the threshold L_T in step 36, step 37 checks whether the previous determination result was voice. If it was, step 38 turns the determination flag on and the input is determined to be voice. If the previous determination result was not voice, only the voice flag is turned on, and the determination flag keeps its previous state. If the condition of step 36 is not satisfied, step 40 compares the sound counter value with the threshold H_T. When this condition is satisfied, step 41 checks whether the previous determination result was music. If it was, the determination flag is turned off and the input is determined to be music. If the previous determination result was not music, step 43 turns on only the music flag, and the determination flag is left unchanged. When the condition of step 40 is not satisfied, both the voice flag and the music flag are turned off and the determination flag is left unchanged. As a result of this processing, the determination flag changes only when the per-F-frame determination result is voice, or music, twice in a row.
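The FIG. 3 variant, in which the determination flag changes only after two consecutive matching block results, can be sketched as below. The class name and the single `previous` field (standing in for the separate voice flag and music flag) are illustrative choices, and the initial flag value is an assumption.

```python
class ConfirmedJudge:
    """Sketch of the determination unit 3 following FIG. 3.

    `previous` holds the tentative result of the last F-frame block:
    "voice", "music", or None. flag == 1 means voice, 0 means music.
    """

    def __init__(self, l_t, h_t, initial_flag=1):
        self.l_t = l_t
        self.h_t = h_t
        self.flag = initial_flag
        self.previous = None

    def update(self, silent_count, sound_count):
        """Process the counters of one F-frame block; return the flag."""
        if silent_count > self.l_t:        # step 36
            tentative = "voice"
        elif sound_count > self.h_t:       # step 40
            tentative = "music"
        else:
            tentative = None               # both flags turned off
        # Steps 37/38 and 41: change the flag only on a repeated result.
        if tentative is not None and tentative == self.previous:
            self.flag = 1 if tentative == "voice" else 0
        self.previous = tentative
        return self.flag
```

A single stray block, for example one music-like block in the middle of speech, therefore cannot flip the output on its own.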

[0023] As described above, in this embodiment the sound/silence determination unit 2 determines whether each frame is sound or silence from the per-frame power value obtained by the power calculation unit 1. The determination unit 3 then determines, every F frames, whether the input is voice or music from the respective proportions of sound and silence, based on how frequently silence appears in speech, and holds the result of the previous F frames when neither can be determined. With this configuration, voice and music can be determined accurately for a continuously input signal. Moreover, even when there are portions that are difficult to determine, the determination result settles on the side that is dominant in the overall flow, which prevents chattering, that is, the voice/music determination changing at short intervals.

[0024] In addition, by configuring the determination unit 3 to change the determination result only when the per-F-frame determination result is voice, or music, twice in a row, and otherwise to keep the previous determination result, the accuracy of the voice/music determination is further increased and the determination can be switched smoothly.

[0025]

[Effects of the Invention] As described above, the present invention comprises a power calculation unit that calculates the power of each frame; a sound/silence determination unit that determines whether a frame is sound or silence by comparing the per-frame power value with a preset power threshold; and a determination unit that, for every plurality of frames, determines whether the input is voice or music by comparing the proportions of sound and silence with their respective thresholds, and that holds the determination result of the previous plurality of frames when the determination is uncertain. This makes it possible to provide a voice/music discrimination device that can determine voice and music accurately and continuously with a short delay, and that can output a stable determination result even when portions that are difficult to determine are present.

[0026] Furthermore, by providing a determination unit that changes the determination result only when the per-plurality-of-frames determination is voice, or music, consecutively, and otherwise keeps the determination result from the previous determination, it is possible to provide a voice/music discrimination device that further improves discrimination accuracy and switches determinations smoothly.

[Brief description of the drawings]

FIG. 1 is a block diagram of a voice/music discrimination device according to one embodiment of the present invention.

FIG. 2 is a flowchart of the main part for explaining the operation of the determination unit of this embodiment.

FIG. 3 is a flowchart of the main part for explaining another configuration of the determination unit of this embodiment.

[Explanation of symbols]

1 power calculation unit
2 sound/silence determination unit
3 determination unit

Continuation of the front page

(56) References: JP-A-49-130607 (JP, A); JP-A-58-208796 (JP, A); JP-A-61-94098 (JP, A); JP-A-2-267599 (JP, A); JP-A-5-113797 (JP, A); JP-A-4-359298 (JP, A)

(58) Fields searched (Int. Cl.6, DB name): G10L 3/00; G10L 3/00 513; G10L 3/00 531; G10L 9/00

Claims

(57) [Claims]

1. A voice/music discrimination device comprising a power calculation unit (1), a sound/silence determination unit (2), and a determination unit (3), wherein: the power calculation unit (1) divides an input signal into frames of a fixed duration and calculates the acoustic power of each frame; the sound/silence determination unit (2) determines a frame to be a sound section when the acoustic power calculated by the power calculation unit (1) is greater than an acoustic power threshold, and a silent section when it is smaller; the determination unit (3) performs a first determination and a second determination for every plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination; in the first determination, the plurality of frames is determined to be voice when the number of frames determined to be silent sections is greater than a voice threshold; and in the second determination, the plurality of frames is determined to be music when the number of frames determined to be sound sections is greater than a music threshold.

2. A voice/music discrimination device comprising a power calculation unit (1), a sound/silence determination unit (2), and a determination unit (3), wherein: the power calculation unit (1) divides an input signal into frames of a fixed duration and calculates the acoustic power of each frame; the sound/silence determination unit (2) determines a frame to be a sound section when the acoustic power calculated by the power calculation unit (1) is greater than a first acoustic power threshold, and a silent section when it is smaller than a second acoustic power threshold; the determination unit (3) performs a first determination and a second determination for every plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination; in the first determination, the plurality of frames is determined to be voice when the number of frames determined to be silent sections is greater than a voice threshold; and in the second determination, the plurality of frames is determined to be music when the number of frames determined to be sound sections is greater than a music threshold.

3. A voice/music discrimination device comprising a power calculation unit (1), a sound/silence determination unit (2), and a determination unit (3), wherein: the power calculation unit (1) divides an input signal into frames of a fixed duration and calculates the acoustic power of each frame; the sound/silence determination unit (2) determines a frame to be a sound section when the acoustic power calculated by the power calculation unit (1) is greater than an acoustic power threshold, and a silent section when it is smaller; the determination unit (3) performs a first determination and a second determination for each of a consecutive first plurality of frames and second plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination; in the first determination, the second plurality of frames is determined to be voice when the number of frames determined to be silent sections in the first plurality of frames is greater than a voice threshold and the number of frames determined to be silent sections in the second plurality of frames is also greater than the voice threshold; and in the second determination, the second plurality of frames is determined to be music when the number of frames determined to be sound sections in the first plurality of frames is greater than a music threshold and the number of frames determined to be sound sections in the second plurality of frames is also greater than the music threshold.

4. A voice/music discrimination device comprising a power calculation unit (1), a sound/silence determination unit (2), and a determination unit (3), wherein: the power calculation unit (1) divides an input signal into frames of a fixed duration and calculates the acoustic power of each frame; the sound/silence determination unit (2) determines a frame to be a sound section when the acoustic power calculated by the power calculation unit (1) is greater than a first acoustic power threshold, and a silent section when it is smaller than a second acoustic power threshold; the determination unit (3) performs a first determination and a second determination for each of a consecutive first plurality of frames and second plurality of frames and, when neither the first nor the second determination applies, takes the previous determination as the current determination; in the first determination, the second plurality of frames is determined to be voice when the number of frames determined to be silent sections in the first plurality of frames is greater than a voice threshold and the number of frames determined to be silent sections in the second plurality of frames is also greater than the voice threshold; and in the second determination, the second plurality of frames is determined to be music when the number of frames determined to be sound sections in the first plurality of frames is greater than a music threshold and the number of frames determined to be sound sections in the second plurality of frames is also greater than the music threshold.