TWI397901B - Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith - Google Patents
- Publication number
- TWI397901B (TW94138593A)
- Authority
- TW
- Taiwan
- Prior art keywords
- loudness
- audio signal
- specific loudness
- audio
- target specific
Landscapes
- Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Control Of Amplification And Gain Control (AREA)
Description
The present invention relates to audio signal processing. More particularly, it relates to the measurement and control of the perceived sound loudness and/or the perceived spectral balance of an audio signal. The invention is useful, for example, in one or more of: loudness-compensating volume control, automatic gain control, dynamic range control (including, for example, limiters, compressors, expanders, etc.), dynamic equalization, and compensation for background noise interference in an audio playback environment. The invention includes not only methods but also corresponding computer programs and apparatus.
There have been many attempts to develop a satisfactory objective method of measuring loudness. Fletcher and Munson determined in 1933 that human hearing is less sensitive at low and high frequencies than at middle (or voice) frequencies. They also found that the relative change in sensitivity decreases as the level of the sound increases. An early loudness meter consisted of a microphone, an amplifier, a meter, and a combination of filters designed to roughly mimic the frequency response of hearing at low, medium, and high sound levels.
Although such a device provides a measurement of the loudness of a single, constant-level, isolated tone, its measurements of more complex sounds do not match subjective impressions of loudness very well. Sound level meters of this type have been standardized but are used only for specific tasks, such as the monitoring and control of industrial noise.
In the early 1950s, Zwicker and Stevens, among others, extended the work of Fletcher and Munson and developed a more realistic model of the loudness perception process. Stevens published a method for the "Calculation of the Loudness of Complex Noise" in the Journal of the Acoustical Society of America in 1956, and Zwicker published his article "Psychological and Methodical Basis of Loudness" in Acustica in 1958. In 1959 Zwicker published a graphical procedure for loudness calculation, followed shortly afterwards by several similar articles. The Stevens and Zwicker methods were standardized as ISO 532, parts A and B respectively. Both methods involve similar steps.
First, the time-varying distribution of energy along the basilar membrane of the inner ear, referred to as the excitation, is simulated by passing the audio through a bank of band-pass auditory filters whose center frequencies are spaced uniformly on a critical band rate scale. Each auditory filter is designed to simulate the frequency response at a particular location along the basilar membrane, with the filter's center frequency corresponding to that location. A critical band width is defined as the bandwidth of one such filter. Measured in hertz, the critical band width of these auditory filters increases with center frequency. It is therefore useful to define a warped frequency scale such that the critical band width of every auditory filter, measured on that warped scale, is constant. Such a warped scale is referred to as the critical band rate scale and is very useful in understanding and simulating a wide range of psychoacoustic phenomena. See, for example, Psychoacoustics: Facts and Models, E. Zwicker and H. Fastl, Springer-Verlag, Berlin, 1990. The methods of Stevens and Zwicker use a critical band rate scale referred to as the Bark scale, in which the critical band width is constant below 500 Hz and increases above 500 Hz.
More recently, Moore and Glasberg defined a critical band rate scale that they named the Equivalent Rectangular Bandwidth (ERB) scale (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997, pp. 224-240). Through psychoacoustic experiments using notched-noise maskers, Moore and Glasberg demonstrated that the critical band width continues to decrease below 500 Hz, in contrast to the Bark scale, where the critical band width remains constant.
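The two critical band rate scales described above can be compared with a short sketch. The formulas below are widely used closed-form approximations (the Glasberg and Moore ERB fit and the Zwicker and Terhardt Bark fit); they are assumptions chosen for illustration and are not taken from this document, and all function names are hypothetical.

```python
import math


def erb_bandwidth_hz(f_hz):
    """Approximate ERB (in Hz) of the auditory filter centred at f_hz.

    Assumed formula: Glasberg and Moore's fit, ERB = 24.7 * (4.37 * f_kHz + 1).
    """
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (assumed approximation)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def hz_to_bark(f_hz):
    """Map frequency in Hz to the Bark scale (assumed Zwicker-Terhardt fit)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def centre_frequencies(f_lo_hz, f_hi_hz, n_bands):
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    lo = hz_to_erb_rate(f_lo_hz)
    hi = hz_to_erb_rate(f_hi_hz)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]
```

Calling `centre_frequencies(50.0, 15000.0, 40)` would give 40 filter centre frequencies that are dense at low frequencies and sparse at high frequencies, which is the uniform spacing on a critical band rate scale that the text describes.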
The computation of excitation is followed by a non-linear compressive function that generates a quantity referred to as "specific loudness". Specific loudness is a measure of perceptual loudness as a function of frequency and time, and may be measured in units of perceptual loudness per unit frequency along a critical band rate scale, such as the Bark or ERB scale discussed above. Finally, the time-varying "total loudness" is computed by integrating specific loudness across frequency. When specific loudness is estimated from a finite set of auditory filters distributed uniformly along a critical band rate scale, total loudness may be computed by simply summing the specific loudness from each filter.
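The last two steps, compression of excitation into specific loudness and summation into total loudness, can be sketched as follows. The power-law compression and its default exponent are placeholder assumptions standing in for whatever compressive function a real loudness model uses; the function names are hypothetical.

```python
def specific_loudness(excitation, exponent=0.23, scale=1.0):
    """Compress per-band excitation (a power-like quantity) into specific loudness.

    A simple power law is assumed here purely for illustration.
    """
    return [scale * (e ** exponent) for e in excitation]


def total_loudness(specific, band_spacing=1.0):
    """Approximate the integral of specific loudness across frequency.

    With bands spaced uniformly on a critical band rate scale, the integral
    reduces to a sum times the band spacing (in Bark or ERB units).
    """
    return band_spacing * sum(specific)
```

For one time frame, `total_loudness(specific_loudness(excitation))` gives a single loudness value; repeating this per frame yields the time-varying total loudness.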
Loudness may be measured in units of phon. The loudness of a given sound in phon is the sound pressure level (SPL) of a 1 kHz tone having a subjective loudness equal to that of the sound. Conventionally, the reference 0 dB for SPL is a root mean square pressure of 2 × 10^-5 pascal, and this is therefore also the reference 0 phon. Using this definition to compare the loudness of tones at frequencies other than 1 kHz with the loudness of a 1 kHz tone, a contour of equal loudness can be determined for a given phon level. Figure 11 shows equal loudness contours for frequencies between 20 Hz and 12.5 kHz and for phon levels between 4.2 phon (considered to be the threshold of hearing) and 120 phon (ISO 226:1987 (E), "Acoustics - Normal equal-loudness level contours"). The phon measurement takes into account the varying sensitivity of human hearing with frequency, but its results do not allow the relative subjective loudness of sounds at different levels to be assessed, because no attempt is made to correct for the non-linear growth of loudness with SPL, that is, for the fact that the spacing of the contours varies.
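The SPL reference stated above can be made concrete with a small conversion sketch. The function names are hypothetical; only the 2 × 10^-5 Pa reference and the standard 20·log10 definition of a pressure level are used.

```python
import math

P_REF_PA = 2e-5  # reference RMS pressure for 0 dB SPL, as stated in the text


def spl_db(p_rms_pa):
    """Sound pressure level in dB SPL for an RMS pressure in pascal."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def pressure_pa(spl):
    """RMS pressure in pascal for a level in dB SPL (inverse of spl_db)."""
    return P_REF_PA * 10.0 ** (spl / 20.0)
```

By the definition of the phon, a 1 kHz tone at `spl_db(p)` dB SPL has a loudness level of the same number of phon; at other frequencies the equal loudness contours are needed to make the conversion.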
Loudness may also be measured in units of "sone". There is a one-to-one mapping between phon units and sone units, as indicated in Figure 11. One sone is defined as the loudness of a 40 dB (SPL) 1 kHz pure sine wave and is equivalent to 40 phon. The sone is defined such that a twofold increase in sone corresponds to a doubling of perceived loudness. For example, 4 sone is perceived as twice as loud as 2 sone. Expressing loudness levels in sone is therefore more informative. Given the definition of specific loudness as a measure of perceptual loudness as a function of frequency and time, specific loudness may be measured in units of sone per unit frequency. Thus, when the Bark scale is used, specific loudness has units of sone per Bark, and likewise when the ERB scale is used, the units are sone per ERB.
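As a rough illustration of the phon-to-sone mapping, the sketch below uses the commonly quoted rule that loudness doubles for every 10 phon above 40 phon. That rule is an assumption brought in for illustration; it is not read off Figure 11, and it is generally regarded as reasonable only at or above about 40 phon. The function names are hypothetical.

```python
import math


def phon_to_sone(phon):
    """Approximate loudness in sone for a loudness level in phon.

    Assumes 1 sone = 40 phon and a doubling of loudness per 10 phon.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone):
    """Inverse of phon_to_sone under the same assumption."""
    return 40.0 + 10.0 * math.log2(sone)
```

Under this assumption 40 phon maps to 1 sone, 50 phon to 2 sone, and 60 phon to 4 sone, so a sound at 60 phon would be judged about twice as loud as one at 50 phon.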
As mentioned above, the sensitivity of the human ear varies with both frequency and level, a fact well documented in the psychoacoustics literature. One result is that the perceived spectrum or timbre of a given sound varies with the acoustic level at which the sound is heard. For example, for a sound containing low, middle, and high frequencies, the perceived relative proportions of those frequency components change with the overall loudness of the sound: when the sound is quiet, the low- and high-frequency components sound quieter relative to the middle frequencies than they do when the sound is loud. This phenomenon is well known and has been mitigated in sound reproducing equipment by so-called loudness controls. A loudness control is a volume control that applies low-frequency boost, and sometimes also high-frequency boost, as the volume is turned down. The lower sensitivity of the ear at the frequency extremes is thus compensated by an artificial boost of those frequencies. Such controls are completely static: the degree of compensation applied is a function of the volume control setting or of some other user-operated control, not a function of the content of the audio signal.
In practice, changes in the perceived relative spectral balance among low, middle, and high frequencies depend on the signal, in particular on its actual spectrum and on whether it is intended to be loud or soft. Consider the recording of a symphony orchestra. Reproduced at the same level that a member of the audience would hear in the concert hall, the balance across the spectrum may be correct whether the orchestra is playing loudly or quietly. If the music is reproduced 10 dB quieter, for example, the perceived balance across the spectrum changes in one way for loud passages and in another way for quiet passages. A conventional static loudness control does not apply different compensation as a function of the music. International Patent Application No. PCT/US2004/016964, filed May 27, 2004 and published December 23, 2004 as WO 2004/111994 A2, by Seefeldt et al., discloses a system for measuring and adjusting the perceived loudness of an audio signal. That PCT application, which designates the United States, is hereby incorporated by reference in its entirety. In that application, a psychoacoustic model calculates the loudness of an audio signal in perceptual units. In addition, the application introduces techniques for computing a wideband multiplicative gain which, when applied to the audio, results in the loudness of the gain-modified audio being substantially the same as a reference loudness. Application of such a wideband gain, however, changes the perceived spectral balance of the audio.
In one aspect, the invention provides a method for deriving information usable for controlling the specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its specific loudness and a target specific loudness. Specific loudness is a measure of perceptual loudness as a function of frequency and time. In practical implementations, the specific loudness of the modified audio signal may be made to approximate the target specific loudness. The approximation may be affected not only by ordinary signal processing considerations but also by any time and/or frequency smoothing employed in the modifying, as described below.
Because specific loudness is a measure of the perceptual loudness of an audio signal as a function of frequency and time, the modifying may modify the audio signal as a function of frequency in order to reduce the difference between the specific loudness of the audio signal and the target specific loudness. Although in some cases the target specific loudness may be time-invariant and the audio signal itself may be a steady-state, time-invariant signal, the modifying typically also modifies the audio signal as a function of time.
Aspects of the invention may also be employed to compensate for background noise interference in an audio playback environment. When audio is heard in the presence of background noise, the noise may partially or completely mask the audio in a manner that depends on both the level and spectrum of the audio and the level and spectrum of the noise. The result is an alteration of the perceived spectrum of the audio. In accordance with psychoacoustic studies (see, for example, Moore, Glasberg, and Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997), the "partial specific loudness" of the audio may be defined as the perceptual loudness of the audio in the presence of a secondary interfering sound signal, such as the noise.
Thus, in another aspect, the invention provides a method for deriving information usable for controlling the partial specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its partial specific loudness and a target specific loudness. Doing so mitigates the effects of the noise in a perceptually accurate manner. In this and other aspects of the invention that take an interfering noise signal into account, it is assumed that the audio signal by itself and the secondary interfering signal by itself are both accessible.
In another aspect, the invention provides a method for controlling the specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its specific loudness and a target specific loudness.
In another aspect, the invention provides a method for controlling the partial specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its partial specific loudness and a target specific loudness.
When the target specific loudness is not a function of the audio signal, it may be a stored or received target specific loudness. When the target specific loudness is not a function of the audio signal, the modifying or the deriving may explicitly or implicitly calculate specific loudness or partial specific loudness. Examples of implicit calculation include a lookup table or a "closed-form" mathematical expression in which specific loudness and/or partial specific loudness is inherently determined (closed-form describes a mathematical expression that can be represented exactly using a finite number of standard mathematical operations and functions, such as exponentiation and cosine). Also when the target specific loudness is not a function of the audio signal, the target specific loudness may be both time-invariant and frequency-invariant, or it may be only frequency-invariant.
In another aspect, the invention provides for processing an audio signal by processing the audio signal or a measure of the audio signal in accordance with one or more processes and one or more process-controlling parameters to produce an audio signal having a target specific loudness. Although the target specific loudness may be time-invariant ("fixed"), it may advantageously be a function of the specific loudness of the audio signal. Although the audio signal may be a static, frequency-invariant and time-invariant signal, it is typically frequency-varying and time-varying, which causes the target specific loudness to be frequency-varying and time-varying when it is a function of the audio signal.
The audio and a target specific loudness, or a representation of a target specific loudness, may be received from a transmission or reproduced from a storage medium.
The representation of a target specific loudness may be one or more scale factors that scale the audio signal or a measure of the audio signal.
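The target specific loudness of any of the above aspects of the invention may be a function of the audio signal or of a measure of the audio signal. One suitable measure of the audio signal is its specific loudness. The function of the audio signal or of the measure of the audio signal may be a scaling of the audio signal or of the measure of the audio signal. For example, the scaling may be one or a combination of scalings: (a) a time- and frequency-varying scale factor Ξ[b,t] scaling of the specific loudness as in the relationship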
In case (a) of a time- and frequency-varying scale factor, the scaling may be determined at least in part by the ratio of a desired multiband loudness to the multiband loudness of the audio signal. Such a scaling may be used as a dynamic range control. Further details of employing aspects of the invention as a dynamic range control are set forth below.
Also in case (a) of a time- and frequency-varying scale factor, the specific loudness may be scaled by the ratio of a measure of a desired spectral shape to a measure of the spectral shape of the audio signal. Such a scaling may be employed to transform the perceived spectrum of the audio signal from a time-varying perceived spectrum to a substantially time-invariant perceived spectrum. When the specific loudness is scaled by this ratio, the scaling may be used as a dynamic equalizer. Further details of employing aspects of the invention as a dynamic equalizer are set forth below.
In case (b) of a time-varying, frequency-invariant scale factor, the scaling may be determined at least in part by the ratio of a desired wideband loudness to the wideband loudness of the audio signal. Such a scaling may be used as an automatic gain control or a dynamic range control. Further details of employing aspects of the invention as an automatic gain control or a dynamic range control are set forth below.
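The ratio-based scalings described for cases (a) and (b) can be sketched for a single time frame as follows. This is a minimal illustration under stated assumptions: the specific loudness is given as one value per band, wideband loudness is taken as the plain sum over bands, and the "desired" loudness values are simply supplied by the caller (in a real system they would come from, for example, a compression curve or a reference level). All names are hypothetical.

```python
def band_scale_factors(desired_band_loudness, measured_band_loudness, floor=1e-12):
    """Case (a) style: one scale factor per band for the current frame."""
    return [d / max(m, floor)
            for d, m in zip(desired_band_loudness, measured_band_loudness)]


def wideband_scale_factor(desired_total_loudness, measured_specific_loudness, floor=1e-12):
    """Case (b) style: one frequency-invariant scale factor for the current frame."""
    measured_total = sum(measured_specific_loudness)
    return desired_total_loudness / max(measured_total, floor)


def target_specific_loudness(specific_loudness, scale):
    """Scale the specific loudness by a single factor or by per-band factors."""
    if isinstance(scale, (int, float)):
        return [scale * n for n in specific_loudness]
    return [s * n for s, n in zip(scale, specific_loudness)]
```

The `floor` argument only guards against division by zero in silent bands; it is an implementation convenience assumed here, not something the text specifies.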
In case (a) (a time- and frequency-varying scale factor) or case (b) (a time-varying, frequency-invariant scale factor), the scale factor may be a function of the audio signal or of a measure of the audio signal.
In case (c) of a time-invariant, frequency-varying scale factor or case (d) of a time-invariant, frequency-invariant scale factor, the modifying or the deriving may include storing the scale factor, or the scale factor may be received from an external source.
In either of cases (c) and (d), the scale factor may not be a function of the audio signal or of a measure of the audio signal.
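Cases (a) through (d) above can be written compactly as follows. This is a sketch under stated assumptions: only the symbol Ξ[b,t] appears in the text. The notation N[b,t] for the specific loudness in band b at time t, N̂[b,t] for the target specific loudness, and the symbols Φ[t], Θ[b], and α for the remaining scale factors are assumed here for illustration.

```latex
\begin{align}
\text{(a) time- and frequency-varying:}\quad & \hat{N}[b,t] = \Xi[b,t]\, N[b,t] \\
\text{(b) time-varying, frequency-invariant:}\quad & \hat{N}[b,t] = \Phi[t]\, N[b,t] \\
\text{(c) time-invariant, frequency-varying:}\quad & \hat{N}[b,t] = \Theta[b]\, N[b,t] \\
\text{(d) time-invariant, frequency-invariant:}\quad & \hat{N}[b,t] = \alpha\, N[b,t]
\end{align}
```

In every case the target is a multiplicative scaling of the specific loudness; the cases differ only in whether the scale factor depends on the band index b, the time index t, both, or neither.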
In any of the aspects of the invention and their variations, the modifying, deriving, or producing may each explicitly or implicitly calculate (1) specific loudness, and/or (2) partial specific loudness, and/or (3) the target specific loudness. Implicit calculation may involve, for example, a lookup table or a closed-form mathematical expression.
The modification parameters may be smoothed over time. The modification parameters may be, for example, (1) a plurality of amplitude scale factors relating to frequency bands of the audio signal or (2) a plurality of filter coefficients for controlling one or more filters, such as a multi-tap FIR filter or a multi-pole IIR filter. The scale factors or filter coefficients (and the filters to which they are applied) may be time-varying.
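One simple way to smooth per-band amplitude scale factors over time is a first-order recursive (one-pole) smoother applied frame by frame. The text does not say which smoother is used, so the choice below, the time-constant parameterization, and all names are assumptions made for illustration.

```python
import math


def smoothing_coefficient(time_constant_s, frame_rate_hz):
    """One-pole coefficient for a given time constant and frame rate."""
    return math.exp(-1.0 / (time_constant_s * frame_rate_hz))


def smooth_gains(gain_frames, coeff):
    """Smooth a sequence of per-band gain lists over time.

    gain_frames: list of frames, each a list with one gain per band.
    coeff: value in [0, 1); 0 disables smoothing, values near 1 smooth heavily.
    """
    smoothed = []
    state = None
    for frame in gain_frames:
        if state is None:
            state = list(frame)  # initialise the smoother with the first frame
        else:
            state = [coeff * s + (1.0 - coeff) * g for s, g in zip(state, frame)]
        smoothed.append(list(state))
    return smoothed
```

A longer time constant gives a coefficient closer to 1, which trades responsiveness for fewer audible gain fluctuations.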
In calculating the function of the specific loudness of the audio signal that defines the target specific loudness, or the inverse of that function, the process or processes performing the calculation operate in what may be characterized as the perceptual (psychoacoustic) loudness domain: the input and output of the calculation are specific loudnesses. In contrast, in applying amplitude scale factors to frequency bands of the audio signal or applying filter coefficients to a controllable filtering of the audio signal, the modification parameters operate to modify the audio signal outside the perceptual (psychoacoustic) loudness domain, in what may be characterized as the electrical signal domain. Although the modifications to the audio signal are made in the electrical signal domain, they are derived from calculations in the perceptual (psychoacoustic) loudness domain so that the modified audio signal has a specific loudness that approximates the desired target specific loudness.
By deriving the modification parameters from calculations in the loudness domain, greater control over perceptual loudness and perceived spectral balance may be achieved than if the modification parameters were derived in the electrical signal domain. In addition, the use of a basilar-membrane-simulating psychoacoustic filterbank, or its equivalent, in performing the loudness domain calculations may provide more detailed control of the perceived spectrum than arrangements that derive the modification parameters in the electrical signal domain.
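The split between the loudness domain and the electrical signal domain can be illustrated with a toy single-band calculation: the target is specified as a specific loudness, and the result is an amplitude gain to apply to the signal. The power-law loudness model, its exponent, and the assumption that excitation scales with the square of the amplitude gain are all placeholders chosen so that the gain has a closed form; they are not the model used in this document, and the names are hypothetical.

```python
def specific_loudness(excitation, exponent=0.23):
    """Toy loudness model: specific loudness as a power law of excitation."""
    return excitation ** exponent


def band_gain_for_target(excitation, target, exponent=0.23):
    """Amplitude gain g so that specific_loudness(g * g * excitation) equals target.

    Excitation is treated as a power quantity, so an amplitude gain g scales
    it by g squared: (g^2 * E)^a = target  =>  g = (target / E^a)^(1 / (2a)).
    """
    if excitation <= 0.0 or target <= 0.0:
        return 0.0  # nothing sensible to scale; an illustrative convention
    current = specific_loudness(excitation, exponent)
    return (target / current) ** (1.0 / (2.0 * exponent))
```

Under these assumptions, halving the specific loudness of a band requires an amplitude gain of roughly 0.22 rather than 0.5, which shows why a gain chosen directly in the electrical signal domain would not produce the intended perceptual change.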
Various of the modifying, deriving, and producing may depend on one or more of: a measure of an interfering audio signal, a target specific loudness, an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal, the specific loudness of the unmodified audio signal, and an approximation to the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.
The modifying or deriving may derive modification parameters at least in part from one or more of: a measure of an interfering audio signal, a target specific loudness, an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal, the specific loudness of the unmodified audio signal, and an approximation to the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.
More particularly, the modifying or deriving may derive modification parameters at least in part from (1) one of: a target specific loudness, and an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness of the modified audio signal; and (2) one of: the specific loudness of the unmodified audio signal, and an approximation to the target specific loudness derived from the specific loudness of the modified audio signal. Or, when an interfering audio signal is to be taken into account, the modifying or deriving may derive modification parameters at least in part from (1) a measure of the interfering audio signal; (2) one of: a target specific loudness, and an estimate of the specific loudness of the unmodified audio signal derived from the partial specific loudness of the modified audio signal; and (3) one of: the specific loudness of the unmodified audio signal, and an approximation to the target specific loudness derived from the partial specific loudness of the modified audio signal.
A feed-forward arrangement may be employed in which the specific loudness is derived from the audio signal and the target specific loudness is received from a source external to the method or, when the modifying or deriving includes storing a target specific loudness, from that storage. Alternatively, a hybrid feed-forward/feedback arrangement may be employed in which an approximation to the target specific loudness is derived from the modified audio signal and the target specific loudness is received from a source external to the method or, when the modifying or deriving includes storing a target specific loudness, from that storage.
The modifying or deriving may include one or more processes for obtaining, explicitly or implicitly, the target specific loudness, one or more of which calculate, explicitly or implicitly, a function of the audio signal or of a measure of the audio signal. In one alternative, a feed-forward arrangement may be employed in which the specific loudness and the target specific loudness are derived from the audio signal, the derivation of the target specific loudness employing a function of the audio signal or of a measure of the audio signal. In another alternative, a hybrid feed-forward/feedback arrangement may be employed in which an approximation to the target specific loudness is derived from the modified audio signal and the target specific loudness is derived from the audio signal, the derivation of the target specific loudness employing a function of the audio signal or of a measure of the audio signal.
The modifying or deriving may include one or more processes for obtaining, explicitly or implicitly, an estimate of the specific loudness of the unmodified audio signal in response to the modified audio signal, one or more of which calculate, explicitly or implicitly, the inverse of a function of the audio signal or of a measure of the audio signal. In one alternative, a feedback arrangement is employed in which an estimate of the specific loudness of the unmodified audio signal and an approximation to the target specific loudness are derived from the modified audio signal, the estimate of the specific loudness being calculated using the inverse of a function of the audio signal or of a measure of the audio signal. In another alternative, a hybrid feed-forward/feedback arrangement is employed in which the specific loudness is derived from the audio signal and the estimate of the specific loudness of the unmodified audio signal is derived from the modified audio signal, the estimate being calculated using the inverse of that function of the audio signal or of a measure of the audio signal.
The modification parameters may be applied to the audio signal to produce a modified audio signal.
Another aspect of the invention is that the processes or devices may be separated in time and/or space, so that there is, in effect, an encoder or encoding process and also a decoder or decoding process. For example, there may be an encoding/decoding system in which the modifying or deriving either transmits and receives, or stores and also reproduces, the audio signal together with either (1) the modification parameters or (2) a target specific loudness or a representation of the target specific loudness. Alternatively, there may be, in effect, only an encoder or encoding process, in which the audio signal and either (1) the modification parameters or (2) a target specific loudness or a representation of the target specific loudness are transmitted or stored. Alternatively, there may be, in effect, only a decoder or decoding process, in which the audio signal and either (1) the modification parameters or (2) a target specific loudness or a representation of the target specific loudness are received and reproduced.
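The encoder/decoder split described above can be illustrated with a minimal sketch. It assumes the modification parameters are one broadband gain per audio block and that JSON is the container; both choices, and the names `encode` and `decode`, are hypothetical and not taken from the patent.

```python
import json

def encode(audio_blocks, gains):
    """Encoder side: bundle the unmodified audio with its modification parameters.

    ASSUMPTION: one gain per block stands in for the modification parameters M.
    """
    return json.dumps({"audio": audio_blocks, "gains": gains})

def decode(payload):
    """Decoder side: recover audio and parameters, then apply the gains."""
    data = json.loads(payload)
    return [[g * x for x in block]
            for block, g in zip(data["audio"], data["gains"])]

payload = encode([[1.0, -1.0], [0.5, 0.25]], [0.5, 2.0])
modified = decode(payload)
```

Because the parameters travel with the unmodified audio, the decoder does not need a loudness model of its own; it only applies what the encoder computed.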
Figure 1 is a functional block diagram showing an example of a feedforward implementation according to an aspect of the invention.
Figure 2 is a functional block diagram showing an example of a feedback implementation according to an aspect of the invention.
Figure 3 is a functional block diagram showing an example of a hybrid feedforward/feedback implementation according to an aspect of the invention.
Figure 4 is a functional block diagram showing an example of another hybrid feedforward/feedback implementation according to an aspect of the invention.
Figure 5 is a functional block diagram showing how the unmodified audio signal and the modification parameters, as determined by any one of the feedforward, feedback, and hybrid feedforward/feedback arrangements, may be stored or transmitted for use, for example, in a device or process that is separated in time or space.
Figure 6 is a functional block diagram showing how the unmodified audio signal and a target specific loudness or a representation thereof, as determined by any one of the feedforward, feedback, and hybrid feedforward/feedback arrangements, may be stored or transmitted for use, for example, in a device or process that is separated in time or space.
Figure 7 is a schematic functional block diagram or schematic flow chart showing an overview of an aspect of the invention.
Figure 8 is an idealized characteristic response of a linear filter P(z) suitable as a transmission filter in an embodiment of the invention, in which the vertical axis is attenuation in decibels (dB) and the horizontal axis is frequency in Hertz (Hz) on a base-10 logarithmic scale.
Figure 9 shows the relationship between the ERB frequency scale (vertical axis) and frequency in Hertz (horizontal axis).
Figure 10 shows a set of idealized auditory filter characteristic responses that approximate critical banding on the ERB scale. The horizontal scale is frequency in Hertz and the vertical scale is level in decibels.
Figure 11 shows the equal-loudness contours of ISO 226. The horizontal scale is frequency in Hertz (base-10 logarithmic scale) and the vertical scale is sound pressure level in decibels.
Figure 12 shows the equal-loudness contours of ISO 226 normalized by the transmission filter P(z). The horizontal scale is frequency in Hertz (base-10 logarithmic scale) and the vertical scale is sound pressure level in decibels.
Figure 13a is an idealized chart showing the wideband and multiband gains for a loudness scaling of 0.25 on a segment of female speech. The horizontal scale is ERB bands and the vertical scale is relative gain in decibels (dB).
Figure 13b is an idealized chart showing the specific loudness of the original signal, the wideband gain-modified signal, and the multiband gain-modified signal, respectively. The horizontal scale is ERB bands and the vertical scale is specific loudness (sone/ERB).
Figure 14a is an idealized chart showing L_o[t] as a function of L_i[t] for typical AGC. The horizontal scale is log(L_i[t]) and the vertical scale is log(L_o[t]).
Figure 14b is an idealized chart showing L_o[t] as a function of L_i[t] for typical DRC. The horizontal scale is log(L_i[t]) and the vertical scale is log(L_o[t]).
Figure 15 is an idealized chart showing a typical band-smoothing function for multiband DRC. The horizontal scale is band number and the vertical scale is the gain output for band b.
Figure 16 is a schematic functional block diagram or schematic flow chart showing an overview of an aspect of the invention.
Figure 17 is a schematic functional block diagram or schematic flow chart similar to Figure 1 that also includes compensation for noise in the playback environment.
Figures 1 through 4 show functional block diagrams of possible feedforward, feedback, and two hybrid feedforward/feedback implementation examples according to aspects of the invention.
Referring to the feedforward topology example of Figure 1, an audio signal is applied to two paths: (1) a signal path having a process or device 2 ("Modify Audio Signal") capable of modifying the audio in response to modification parameters, and (2) a control path having a process or device 4 ("Generate Modification Parameters") capable of generating such modification parameters. The Modify Audio Signal 2 in the Figure 1 feedforward topology example and in each of the Figures 2-4 examples may be a device or process that modifies the audio signal, for example its amplitude, in a frequency-varying and/or time-varying manner in accordance with modification parameters M received from the Generate Modification Parameters 4 (or from the counterpart processes or devices 4', 4'' and 4''' in the respective Figures 2-4 examples). The Generate Modification Parameters 4 and its counterparts in Figures 2-4 each operate at least partly in the perceptual loudness domain. In each of the Figures 1-4 examples, the Modify Audio Signal 2 operates in the electrical signal domain and produces a modified audio signal. Also in each of the Figures 1-4 examples, the Modify Audio Signal 2 and the Generate Modification Parameters 4 (or its counterparts) modify the audio signal to reduce the difference between its specific loudness and a target specific loudness.
In the Figure 1 feedforward example, process or device 4 may include several processes and/or devices: a "Calculate Target Specific Loudness" process or device 6 that calculates a target specific loudness in response to the audio signal or a measure of the audio signal, such as the specific loudness of the audio signal; a "Calculate Specific Loudness" process or device 8 that calculates the specific loudness of the audio signal in response to the audio signal or a measure of the audio signal, such as its excitation; and a "Calculate Modification Parameters" process or device 10 that calculates the modification parameters in response to the specific loudness and the target specific loudness. The Calculate Target Specific Loudness 6 may perform one or more functions "F", each of which may have function parameters. For example, it may calculate the specific loudness of the audio signal and then apply one or more functions F to it to provide a target specific loudness. This is indicated schematically in Figure 1 as a "Select Function(s) F and Function Parameter(s)" input to process or device 6. Instead of being calculated by device or process 6, the target specific loudness may be provided by a storing process or device included in or associated with the Generate Modification Parameters 4 (shown schematically as a "Stored" input to process or device 10), or by a source external to the overall process or device (shown schematically as an "External" input to process or device 10). Thus, the modification parameters are based at least in part on calculations in the perceptual (psychoacoustic) loudness domain (that is, at least the specific loudness calculation and, in some cases, the target specific loudness calculation).
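The feedforward chain of blocks 8, 6, 10 and 2 can be sketched on per-band values as follows. This is only an illustration under a loudly stated assumption: specific loudness is modelled here as a compressive power law of band power, `N = E ** ALPHA`, which is a stand-in and not the loudness model of the patent. The function names and the constant `ALPHA` are hypothetical.

```python
# Toy feedforward chain (Figure 1): blocks 8, 6, 10 and 2 on per-band values.
# ASSUMPTION: specific loudness is modelled as N = E ** ALPHA, where E is the
# band power (excitation). This is a stand-in, not the patent's loudness model.
ALPHA = 0.3

def specific_loudness(excitation):             # block 8
    return [e ** ALPHA for e in excitation]

def target_specific_loudness(n, scale):        # block 6, with F(N) = scale * N
    return [scale * v for v in n]

def modification_parameters(n, n_target):      # block 10
    # An amplitude gain g scales band power by g**2, so requiring
    # (g**2 * E) ** ALPHA == N_target gives g = (N_target / N) ** (1 / (2 * ALPHA)).
    return [(nt / v) ** (1.0 / (2.0 * ALPHA)) for v, nt in zip(n, n_target)]

def modify_excitation(excitation, gains):      # effect of block 2 on band power
    return [g * g * e for g, e in zip(gains, excitation)]

excitation = [1.0, 4.0, 0.25]
n = specific_loudness(excitation)
n_hat = target_specific_loudness(n, 0.5)
gains = modification_parameters(n, n_hat)
n_modified = specific_loudness(modify_excitation(excitation, gains))
```

Under this toy model the modified audio reaches the target exactly; with a realistic loudness model the gains would generally have to be found numerically.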
The calculations performed by processes or devices 6, 8 and 10 (and by processes or devices 12, 14, 10' in the Figure 2 example, 6, 14, 10'' in the Figure 3 example, and 8, 12, 10''' in the Figure 4 example) may be performed explicitly and/or implicitly. Examples of implicit performance include (1) a lookup table whose entries are based in whole or in part on specific loudness and/or target specific loudness and/or modification parameter calculations, and (2) a closed-form mathematical expression that is inherently based in whole or in part on specific loudness and/or target specific loudness and/or modification parameters.
Although the calculation processes or devices 6, 8 and 10 of the Figure 1 example (and the processes or devices 12, 14, 10' of the Figure 2 example, 6, 14, 10'' of the Figure 3 example, and 8, 12, 10''' of the Figure 4 example) are shown schematically and described as separate, this is for purposes of explanation only. It will be understood that one or all of these processes or devices may be combined in a single process or device, or combined variously in multiple processes or devices. For example, in the arrangement of Figure 9 below, a feedforward topology as in the Figure 1 example, the process or device calculates the modification parameters in response to the smoothed excitation derived from the audio signal and a target specific loudness. In the Figure 9 example, the device or process that calculates the modification parameters implicitly calculates the specific loudness of the audio signal.
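In accordance with an aspect of the invention, in the Figure 1 example and in the examples of the other embodiments of the invention, the target specific loudness $\hat{N}[b,t]$ may be calculated by scaling the specific loudness $N[b,t]$ with one or more scale factors. The scaling may be a time-varying and frequency-varying scale factor $\Xi[b,t]$ that scales the specific loudness according to the relationship:
$$\hat{N}[b,t] = \Xi[b,t]\,N[b,t]$$
Thus, the target specific loudness may be expressed as one or more functions F of the audio signal or of a measure of the audio signal (the specific loudness being one possible measure of the audio signal):
$$\hat{N}[b,t] = F\left(N[b,t]\right)$$
Provided that the function F is invertible, the specific loudness $N[b,t]$ of the unmodified audio signal may be calculated as the inverse function $F^{-1}$ of the target specific loudness $\hat{N}[b,t]$:
$$N[b,t] = F^{-1}\left(\hat{N}[b,t]\right)$$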
As will be explained below, the inverse function $F^{-1}$ is calculated in the feedback and hybrid feedforward/feedback examples of Figures 2 and 4.
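For the simple case in which F is a per-band scaling of the specific loudness, the function and its inverse can be sketched as below. The helper name `make_scaling` is hypothetical, and the sketch assumes every scale factor is nonzero so that F is invertible.

```python
def make_scaling(xi):
    """Return (F, F_inv) for per-band scale factors xi.

    ASSUMPTION: F is a pure scaling, F(N)[b] = xi[b] * N[b], with xi[b] != 0.
    """
    def F(n):
        return [x * v for x, v in zip(xi, n)]

    def F_inv(n_hat):
        return [v / x for x, v in zip(xi, n_hat)]

    return F, F_inv

F, F_inv = make_scaling([0.5, 2.0, 1.0])
n = [1.2, 0.4, 3.0]            # specific loudness per band (illustrative values)
n_hat = F(n)                   # target specific loudness
n_back = F_inv(n_hat)          # specific loudness recovered from the target
```

A scale factor of zero in any band would make F non-invertible, which is the constraint that the feedback arrangements have to respect.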
A "Select Function(s) and Function Parameter(s)" input to the Calculate Target Specific Loudness 6 is shown to indicate that device or process 6 may calculate the target specific loudness by applying one or more functions in accordance with one or more function parameters. For example, the Calculate Target Specific Loudness 6 may compute a function "F" of the specific loudness of the audio signal in order to define the target specific loudness. For example, the "Select Function(s) and Function Parameter(s)" input may select one or more particular functions that fall into one or more of the above types of scaling, along with one or more function parameters, such as constants (for example, scale factors) pertaining to the functions.
As indicated above, the scale factors associated with a scaling may serve as a representation of the target specific loudness, inasmuch as the target specific loudness may be computed as a scaling of the specific loudness. Thus, in the Figure 9 example, described below and mentioned above, the lookup table may be indexed by scale factors and excitations, such that the calculations of specific loudness and target specific loudness are inherent in the table.
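A lookup table indexed by scale factor and excitation might be built as in the following sketch. The loudness model `N(E) = (E + C)**ALPHA - C**ALPHA`, the grids `EXCITATIONS_DB` and `SCALES`, and all function names are assumptions made for illustration; the patent does not specify them. The point is only that, once the table is built, neither the specific loudness nor the target specific loudness is computed at lookup time.

```python
import math

# ASSUMED toy loudness model (not the patent's): N(E) = (E + C)**ALPHA - C**ALPHA
ALPHA, C = 0.3, 1.0

def loudness(e):
    return (e + C) ** ALPHA - C ** ALPHA

def gain_for(e, xi):
    # Solve loudness(g*g*e) == xi * loudness(e) for the amplitude gain g.
    e_target = (xi * loudness(e) + C ** ALPHA) ** (1.0 / ALPHA) - C
    return math.sqrt(e_target / e)

EXCITATIONS_DB = list(range(0, 101, 5))   # hypothetical excitation grid, in dB
SCALES = [0.25, 0.5, 1.0, 2.0]            # hypothetical scale-factor grid

# Loudness calculations happen only here, when the table is built.
TABLE = {
    (xi, e_db): gain_for(10.0 ** (e_db / 10.0), xi)
    for xi in SCALES
    for e_db in EXCITATIONS_DB
}

def lookup_gain(e, xi):
    """Nearest-neighbour lookup; no loudness is computed explicitly here."""
    e_db = 10.0 * math.log10(e)
    nearest_db = min(EXCITATIONS_DB, key=lambda d: abs(d - e_db))
    nearest_xi = min(SCALES, key=lambda s: abs(s - xi))
    return TABLE[(nearest_xi, nearest_db)]

g = lookup_gain(100.0, 0.5)
```

A practical table would interpolate between grid points rather than snap to the nearest entry; nearest-neighbour lookup is used here only to keep the sketch short.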
Whether a lookup table, a closed-form mathematical expression, or some other technique is employed, the operation of the Generate Modification Parameters 4 (and of its counterpart processes or devices 4', 4'' and 4''' in the respective Figures 2-4 examples) is such that the calculations are based in the perceptual (psychoacoustic) loudness domain, even though specific loudness and target specific loudness may not be explicitly calculated. Either there is an explicit specific loudness or there is a notional, implicit specific loudness. Similarly, either there is an explicit target specific loudness or there is a notional, implicit target specific loudness. In any case, the calculation seeks to generate modification parameters that modify the audio signal so as to reduce the difference between the specific loudness and the target specific loudness.
In a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10 (and its counterpart processes or devices 10', 10'' and 10''' in the respective Figures 2-4 examples) may also receive, as an optional input, a measure of such a secondary interfering audio signal or the secondary interfering signal itself. This optional input is shown with a dashed line in Figure 1 (and in Figures 2-4). The measure of the secondary interfering signal may be its excitation, as in the Figure 17 example described below. Applying a measure of the interfering signal, or the signal itself (assuming the interfering signal is separately available for processing), to the Calculate Modification Parameters process or device 10 of Figure 1 (and to its counterpart processes or devices 10', 10'' and 10''' in the respective Figures 2-4 examples) permits a suitably configured process or device of this kind to calculate modification parameters that take the interfering signal into account, as explained further in the "Noise Compensation" section. In the Figures 2-4 examples, the calculation of partial specific loudness assumes that a suitable measure of the interfering signal is applied not only to the respective Calculate Modification Parameters 10', 10'', or 10''', but also to a "Calculate Approximation of Specific Loudness of Unmodified Audio" process or device 12 and/or a "Calculate Approximation of Target Specific Loudness" process or device 14, so that the function or device can calculate the partial specific loudness. In the Figure 1 feedforward example, partial specific loudness is not explicitly calculated; the Calculate Modification Parameters 10 of Figure 1 calculates the appropriate modification parameters so that the partial specific loudness of the modified audio approximates the target specific loudness. This is explained further in the "Noise Compensation" section.
As mentioned above, in each of the Figures 1-4 examples, the modification parameters M, when applied to the audio signal by the Audio Signal Modifier 2, reduce the difference between the specific loudness or partial specific loudness of the resulting modified audio and the target specific loudness. Ideally, the specific loudness of the modified audio signal closely approximates or is the same as the target specific loudness. The modification parameters M may, for example, take the form of time-varying gain factors applied to the frequency bands derived from a filterbank, or to the coefficients of a time-varying filter. Accordingly, in all of the Figures 1-4 examples, the Modify Audio Signal 2 may be implemented as, for example, a plurality of amplitude scalers, each operating in a frequency band, or as a time-varying filter (for example, a multitapped FIR filter or a multipole IIR filter).
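The "plurality of amplitude scalers" form of the Modify Audio Signal 2 can be sketched as below. The sketch assumes the filterbank analysis has already produced one time series per band and that summing the scaled bands is an adequate synthesis; the function name `apply_band_gains` and the data layout are hypothetical.

```python
def apply_band_gains(band_signals, gains):
    """Scale each band by its time-varying gain and sum the bands.

    ASSUMPTION: band_signals[b][t] is the output of band b of some filterbank
    at time index t, and gains[b][t] is the modification parameter M for that
    band and time. The filterbank itself is not modelled here.
    """
    n_samples = len(band_signals[0])
    out = [0.0] * n_samples
    for band, band_gain in zip(band_signals, gains):
        for t in range(n_samples):
            out[t] += band_gain[t] * band[t]
    return out

bands = [[1.0, 2.0], [3.0, 4.0]]
gains = [[0.5, 0.5], [2.0, 1.0]]
modified = apply_band_gains(bands, gains)
```

Setting all gains in a given time index to the same value reduces this to a wideband gain, which is the single-scaler special case.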
Here and elsewhere in this document, the use of the same reference numeral indicates that the device or process may be substantially identical to another or others bearing the same reference numeral. A reference numeral bearing prime marks (for example, "10'") indicates that the device or process is similar in structure or function, but may be a modification of another or others bearing the same basic reference numeral or primed versions of it.
Under certain constraints, a nearly equivalent feedback arrangement of the Figure 1 feedforward example may be realized. Figure 2 shows such an example, in which the audio signal is also applied to a Modify Audio Signal process or device 2 in a signal path. The process or device 2 also receives the modification parameters M from a control path, in which a Generate Modification Parameters process or device 4' in a feedback arrangement receives as its input the modified audio signal from the output of the Modify Audio Signal 2. Thus, in the Figure 2 example, the modified audio rather than the unmodified audio is applied to the control path. The Modify Audio Signal process or device 2 and the Generate Modification Parameters process or device 4' modify the audio signal to reduce the difference between its specific loudness and a target specific loudness. The process or device 4' may include several functions and/or devices: a "Calculate Approximation of Specific Loudness of Unmodified Audio" process or device 12, a "Calculate Approximation of Target Specific Loudness" process or device 14, and a "Calculate Modification Parameters" process or device 10' that calculates the modification parameters.
With the constraint that the function F is invertible, the process or device 12 estimates the specific loudness of the unmodified audio signal by applying the inverse function $F^{-1}$ to the specific loudness or partial specific loudness of the modified audio signal. As described above, the device or process 12 may calculate an inverse function $F^{-1}$. This is indicated schematically in Figure 2 as a "Select Inverse Function(s) $F^{-1}$ and Function Parameter(s)" input to process or device 12. The "Calculate Approximation of Target Specific Loudness" 14 operates by calculating the specific loudness or partial specific loudness of the modified audio signal. This specific loudness or partial specific loudness is an approximation of the target specific loudness. The approximation of the specific loudness of the unmodified audio signal and the approximation of the target specific loudness are used by the Calculate Modification Parameters 10' to derive modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness or partial specific loudness of the modified audio signal and the target specific loudness. As mentioned above, these modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filterbank or to the coefficients of a time-varying filter. In practical embodiments of the Calculate Modification Parameters 10', the feedback loop may introduce a delay between the computation and the application of the modification parameters M.
As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10', the Calculate Approximation of Specific Loudness of Unmodified Audio 12, and the Calculate Approximation of Target Specific Loudness 14 may each also receive, as an optional input, a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and process or device 12 and process or device 14 may each calculate the partial specific loudness of the modified audio signal. These optional inputs are shown with dashed lines in Figure 2.
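The Figure 2 feedback loop can be sketched for a single band with constant excitation, as below. Several things are assumed and are not taken from the patent: the toy loudness model `N(E) = (E + C)**ALPHA - C**ALPHA`, a pure scaling `F(N) = xi * N`, and the particular way block 10' turns the two approximations into a gain. The one-iteration lag between computing and applying the gain mirrors the delay mentioned in the text.

```python
# ASSUMED toy loudness model (not the patent's): N(E) = (E + C)**ALPHA - C**ALPHA
ALPHA, C = 0.3, 1.0

def loudness(e):
    return (e + C) ** ALPHA - C ** ALPHA

def inverse_loudness(n):
    return (n + C ** ALPHA) ** (1.0 / ALPHA) - C

def feedback_gain(excitation, xi, n_blocks=40):
    """Iterate the Figure 2 loop on one band; only modified audio is observed."""
    gain = 1.0
    for _ in range(n_blocks):
        e_mod = gain * gain * excitation   # band power at the output of block 2
        n_mod = loudness(e_mod)            # block 14: approximation of the target
        n_est = n_mod / xi                 # block 12: F^-1 applied, F(N) = xi * N
        e_est = inverse_loudness(n_est)    # excitation implied by that estimate
        gain = (e_mod / e_est) ** 0.5      # block 10': gain applied in the next block
    return gain

g = feedback_gain(10.0, 0.5)
```

For the values used here the iteration settles on the gain at which the loudness of the modified band equals the scaled loudness of the unmodified band, even though the control path never sees the unmodified audio.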
As mentioned above, hybrid feedforward/feedback implementation examples of aspects of the invention are possible. Figures 3 and 4 show two examples of such implementations. In the Figures 3 and 4 examples, as in the Figures 1 and 2 examples, the audio signal is also applied to a Modify Audio Signal process or device 2 in a signal path, but the Generate Modification Parameters (4'' in Figure 3 and 4''' in Figure 4) in the respective control paths each receive both the unmodified audio signal and the modified audio signal. In both the Figures 3 and 4 examples, the Modify Audio Signal 2 and the Generate Modification Parameters (4'' and 4''', respectively) modify the audio signal to reduce the difference between its specific loudness, which may be implicit, and a target specific loudness, which may also be implicit.
In the Figure 3 example, the Generate Modification Parameters process or device 4'' may include several functions and/or devices: a Calculate Target Specific Loudness 6 as in the Figure 1 example, a Calculate Approximation of Target Specific Loudness 14 as in the Figure 2 feedback example, and a "Calculate Modification Parameters" process or device 10''. As in the Figure 1 example, in the feedforward portion of this hybrid feedforward/feedback example, the Calculate Target Specific Loudness 6 may perform one or more functions "F", each of which may have function parameters. This is indicated schematically in Figure 3 as a "Select Function(s) F and Function Parameter(s)" input to process or device 6. In the feedback portion of this hybrid feedforward/feedback example, the modified audio signal is applied to a Calculate Approximation of Target Specific Loudness 14, as in the Figure 2 feedback example. Process or device 14 operates in the Figure 3 example as it does in the Figure 2 example, by calculating the specific loudness or partial specific loudness of the modified audio signal. This specific loudness or partial specific loudness is an approximation of the target specific loudness. The target specific loudness (from process or device 6) and the approximation of the target specific loudness (from process or device 14) are applied to the Calculate Modification Parameters 10'' to derive modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness of the unmodified audio signal and the target specific loudness. As mentioned above, these modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filterbank or to the coefficients of a time-varying filter. In practical embodiments, the feedback loop may introduce a delay between the computation and the application of the modification parameters M. As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10'' and the Calculate Approximation of Target Specific Loudness 14 may each also receive, as an optional input, a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and process or device 14 may calculate the partial specific loudness of the modified audio signal. These optional inputs are shown with dashed lines in Figure 3.
The Calculate Modification Parameters 10'' may employ an error-detecting device or function, such that differences between its target specific loudness and target specific loudness approximation inputs adjust the modification parameters so as to reduce the differences between the approximation of the target specific loudness and the "actual" target specific loudness.
Such adjustments reduce the differences between the specific loudness of the unmodified audio signal and the target specific loudness, which may be implicit. Thus, the modification parameters M may be updated based on an error between the target specific loudness, computed in the feedforward path from the specific loudness of the original audio using the function F, and the target specific loudness approximation, computed in the feedback path from the specific loudness or partial specific loudness of the modified audio.
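The error-driven update described for the Figure 3 hybrid arrangement can be sketched for one band as below. The toy loudness model, the scaling `F(N) = xi * N`, the multiplicative update rule and the `step` value are all assumptions chosen for illustration; the patent does not prescribe a particular update law.

```python
# ASSUMED toy loudness model (not the patent's): N(E) = (E + C)**ALPHA - C**ALPHA
ALPHA, C = 0.3, 1.0

def loudness(e):
    return (e + C) ** ALPHA - C ** ALPHA

def hybrid_gain(excitation, xi, step=0.5, n_blocks=60):
    """Figure 3 style loop on one band with constant excitation."""
    # Feedforward path, block 6: target from the unmodified audio, F(N) = xi * N.
    n_target = xi * loudness(excitation)
    gain = 1.0
    for _ in range(n_blocks):
        # Feedback path, block 14: loudness of the modified audio.
        n_mod = loudness(gain * gain * excitation)
        # Block 10'': the ratio target/approximation is the error signal;
        # a ratio above 1 raises the gain, a ratio below 1 lowers it.
        gain *= (n_target / n_mod) ** step
    return gain

g = hybrid_gain(10.0, 0.5)
```

A smaller `step` makes the loop slower but more tolerant of a mismatch between the assumed and the actual loudness behaviour, which is the usual trade-off in an error-feedback design.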
An alternative hybrid feedforward/feedback example is shown in Figure 4. This alternative differs from the Figure 3 example in that the inverse function $F^{-1}$ is calculated in the feedback path, rather than the function F being calculated in the feedforward path. In the Figure 4 example, the Generate Modification Parameters process or device 4''' may include several functions and/or devices: a Calculate Specific Loudness 8 as in the Figure 1 feedforward example, a Calculate Approximation of Specific Loudness of Unmodified Audio 12 as in the Figure 2 feedback example, and a Calculate Modification Parameters 10'''. As in the Figure 1 feedforward example, the Calculate Specific Loudness 8 provides the specific loudness of the unmodified audio signal as an input to the Calculate Modification Parameters 10'''. As in the Figure 2 feedback example, with the constraint that the function F is invertible, the process or device 12 estimates the specific loudness of the unmodified audio signal by applying the inverse function $F^{-1}$ to the specific loudness or partial specific loudness of the modified audio signal. A "Select Inverse Function(s) and Inverse Function Parameter(s)" input to the Calculate Approximation of Specific Loudness of Unmodified Audio 12 is shown to indicate that device or process 12 may calculate an inverse function $F^{-1}$, as described above. This is indicated schematically in Figure 4 as a "Select Inverse Function(s) $F^{-1}$ and Function Parameter(s)" input to process or device 12. Thus, process or device 12 provides an approximation of the specific loudness of the unmodified audio signal as another input to the Calculate Modification Parameters 10'''.
As in the examples of FIGS. 1-3, the Calculate Modification Parameters 10''' derives modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness of the unmodified audio signal and the target specific loudness, which is implicit in this example. As mentioned above, the modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filterbank or of the coefficients of a time-varying filter. In practical embodiments the feedback loop may introduce a delay between the computation and the application of the modification parameters M. As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10''' and the Calculate Approximation of Specific Loudness of Unmodified Audio 12 may each optionally receive, as an input, a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and the process or device 12 may compute the partial specific loudness of the modified audio signal. Such optional inputs are shown in FIG. 4 with dashed lines.
The Calculate Modification Parameters 10''' may employ an error detecting device or function, such that differences between its specific loudness input and its specific loudness approximation input produce outputs that adjust the modification parameters so as to reduce the difference between the approximation of the specific loudness and the "actual" specific loudness. Because the approximation of the specific loudness is derived from the specific loudness or partial specific loudness of the modified audio, which can be viewed as an approximation of the target specific loudness, this adjustment reduces the difference between the specific loudness of the modified audio signal and the target specific loudness, which is inherent in the function F^{-1}. Thus, the modification parameters M may be adjusted according to the error between the specific loudness, computed in the feedforward path from the original audio, and the specific loudness approximation, computed in the feedback path from the specific loudness or partial specific loudness of the modified audio using the inverse function F^{-1}. Because of the feedback path, practical implementations may introduce a delay between the update and the application of the modification parameters.
Although the modification parameters M in the examples of FIGS. 1-4, when applied to a Modify Audio Signal process or device 2, reduce the difference between the specific loudness of the audio signal and the target specific loudness, in practical embodiments the corresponding modification parameters produced in response to the same audio signal may not be identical to each other.
Although not critical or essential to aspects of the present invention, the computation of the specific loudness of the audio signal or the modified audio signal may advantageously employ techniques set forth in International Patent Application PCT/US2004/016964, published as WO 2004/111964 A2, in which the computation selects, from a group of two or more specific loudness model functions, one or a combination of two or more of the specific loudness model functions, the selection being controlled by a measure of characteristics of the input audio signal. The description of Specific Loudness 104 of FIG. 7, below, describes such an arrangement.
According to further aspects of the invention, the unmodified audio signal and either (1) the modification parameters or (2) the target specific loudness or a representation of the target specific loudness (for example, scale factors usable in computing the target specific loudness, explicitly or implicitly) may be stored or transmitted for use, for example, in a temporally and/or spatially separated device or process. The modification parameters, target specific loudness, or representation of the target specific loudness may be determined in any suitable way, for example by one of the feedforward, feedback, and hybrid feedforward/feedback arrangement examples of FIGS. 1-4, as described above. In practice, a feedforward arrangement, such as the example of FIG. 1, is the least complex and fastest because it avoids computations based on the modified audio signal. FIG. 5 shows an example of transmitting or storing the unmodified audio and the modification parameters, and FIG. 6 shows an example of transmitting or storing the unmodified audio and the target specific loudness or a representation of the target specific loudness.
An arrangement such as the example of FIG. 5 may be used to separate, temporally and/or spatially, the application of the modification parameters to the audio signal from the generation of those modification parameters. An arrangement such as the example of FIG. 6 may be used to separate, temporally and/or spatially, both the generation and the application of the modification parameters from the generation of the target specific loudness or its representation. Both arrangements make possible a simple, low-cost playback or reception arrangement that avoids the complexity of producing the modification parameters or of producing the target specific loudness. Although a FIG. 5 type arrangement is simpler than a FIG. 6 type arrangement, the FIG. 6 arrangement has the advantage that less information needs to be stored or transmitted, particularly when a representation of the target specific loudness, such as one or more scale factors, is stored or transmitted. Such a reduction in stored or transmitted information is particularly useful in low-bit-rate audio environments.
Accordingly, further aspects of the present invention are the provision of a device or process (1) that receives or plays back, from a storage or transmission device or process, modification parameters M and applies them to an audio signal that is also received, or (2) that receives or plays back, from a storage or transmission device or process, a target specific loudness or a representation of a target specific loudness, generates modification parameters M by applying the target specific loudness or its representation to the audio signal that is also received (or to a measure of the audio signal, such as its specific loudness, which is derivable from the audio signal), and applies the modification parameters M to the received audio signal. Such devices or processes may be characterized as decoding processes or decoders, while the devices or processes required to produce the stored or transmitted information may be characterized as encoding processes or encoders. Such encoding processes or encoders are those portions of the arrangement examples of FIGS. 1-4 that are usable to produce the information required by the respective decoding processes or decoders. Such decoding processes or decoders may be associated or operative with virtually any type of process or device that processes and/or reproduces sound.
In one aspect of the invention, as in the example of FIG. 5, the unmodified audio signal and the modification parameters M, produced by, for example, a modification parameter generating process or generator such as Generate Modification Parameters 4 of FIG. 1, 4' of FIG. 2, 4'' of FIG. 3 or 4''' of FIG. 4, may be applied to any suitable storage or transmission device or function ("Storage or Transmit") 16. When the feedforward example of FIG. 1 is used as an encoding process or encoder, the Modify Audio Signal 2 is not required to generate the modified audio and may be omitted if there is no need to provide the modified audio at the temporal or spatial location of the encoder or encoding process. The Storage or Transmit 16 may include, for example, any suitable magnetic, optical or solid-state storage and playback devices or any suitable wired or wireless transmission and reception devices, the choice of which is not critical to the invention. The played-back or received modification parameters may then be applied to a Modify Audio Signal 2, of the type employed in the examples of FIGS. 1-4, in order to modify the played-back or received audio signal so that its specific loudness approximates the target specific loudness of, or inherent in, the arrangement in which the modification parameters were derived. The modification parameters may be stored or transmitted in any of various ways. For example, they may be stored or transmitted as metadata accompanying the audio signal, they may be sent in separate paths or channels, they may be steganographically encoded in the audio, they may be multiplexed, and so on. The use of the modification parameters to modify the audio signal may be optional and, if optional, their use may be selectable, for example, by a user. For example, the modification parameters, if applied to the audio signal, might reduce the dynamic range of the audio signal. Whether or not to employ such dynamic range reduction could be selectable by the user.
In another aspect of the invention, as in the example of FIG. 6, the unmodified audio signal and the target specific loudness or a representation of the target specific loudness may be applied to any suitable storage or transmission device or function ("Storage or Transmit") 16.
When a feedforward configuration such as the example of FIG. 1 is used as an encoding process or encoder, neither a Calculate Modification Parameters 10 type process or device nor a Modify Audio Signal 2 type process or device is required, and both may be omitted, if there is no need to provide either the modification parameters or the modified audio at the temporal or spatial location of the encoder or encoding process. As in the case of the FIG. 5 example, the Storage or Transmit 16 may include, for example, any suitable magnetic, optical or solid-state storage and playback devices or any suitable wired or wireless transmission and reception devices, the choice of which is not critical to the invention. The played-back or received target specific loudness or representation of the target specific loudness may then be applied, along with the unmodified audio, to a Calculate Modification Parameters 10, of the type employed in the example of FIG. 1, or to a Calculate Modification Parameters 10'', of the type employed in the example of FIG. 3, in order to provide modification parameters M that may then be applied to a Modify Audio Signal 2, of the type employed in the examples of FIGS. 1-4, in order to modify the played-back or received audio signal so that its specific loudness approximates the target specific loudness of, or inherent in, the arrangement in which the modification parameters were derived. Although the target specific loudness or its representation is most readily obtained from an encoding process or encoder of the FIG. 1 type, the target specific loudness or its representation, or an approximation of the target specific loudness or its representation, may be obtained from an encoding process or encoder of the type of the examples of FIGS. 2 through 4 (approximations are computed in processes or devices 14 of FIGS. 2 and 3 and in process or device 12 of FIG. 4). The target specific loudness or its representation may be stored or transmitted in any of various ways. For example, it may be stored or transmitted as metadata accompanying the audio signal, it may be sent in separate paths or channels, it may be steganographically encoded in the audio, it may be multiplexed, and so on. The use of modification parameters derived from the stored or transmitted target specific loudness or representation to modify the audio signal may be optional and, if optional, their use may be selectable, for example, by a user. For example, the modification parameters, if applied to the audio signal, might reduce the dynamic range of the audio signal. Whether or not to employ such dynamic range reduction could be selectable by the user.
When implementing the disclosed invention as a digital system, a feedforward configuration is the most practical, and examples of such configurations are therefore described in detail below, it being understood that the scope of the invention is not so limited.
Throughout this document, terms such as "filter" or "filterbank" are used to include essentially any form of recursive and non-recursive filtering, such as IIR filters or transforms, and "filtered" information is the result of applying such filters. The embodiments described below employ filterbanks implemented by transforms.
FIG. 7 shows in greater detail an exemplary embodiment of an aspect of the invention embodied in a feedforward arrangement. Audio first passes through an analysis filterbank function or device ("Analysis Filterbank") 100, which splits the audio signal into a plurality of frequency bands (hence, FIG. 7 shows multiple outputs from the Analysis Filterbank 100, each output representing a frequency band, and these outputs carry through the various functions or devices up to a synthesis filterbank, which sums the bands into a combined wideband signal, as described further below). The response of the filter associated with each frequency band in the Analysis Filterbank 100 is designed to simulate the response at a particular location of the basilar membrane in the inner ear. The output of each filter in the Analysis Filterbank 100 next passes into a transmission filter or transmission filter function ("Transmission Filter") 101 that simulates the filtering effect of the transmission of audio through the outer and middle ear. If only the loudness of the audio were to be measured, the transmission filter could be applied before the analysis filterbank, but because the analysis filterbank outputs are used to synthesize the modified audio, it is advantageous to apply the transmission filter after the filterbank. The outputs of the Transmission Filter 101 next pass into an excitation function or device ("Excitation") 102, the outputs of which simulate the distribution of energy along the basilar membrane. The excitation energy values may be smoothed across time by a smoothing function or device ("Smoothing") 103. The time constants of the smoothing function are set in accordance with the requirements of the desired application. The smoothed excitation signals are subsequently converted into specific loudness in a specific loudness function or device ("Specific Loudness (SL)") 104. Specific loudness is expressed in units of sone per unit frequency. The specific loudness component associated with each band is passed into a specific loudness modification function or device ("SL Modification") 105. The SL Modification 105 takes as its input the original specific loudness and outputs a desired or "target" specific loudness, which, according to an aspect of the invention, is preferably a function of the original specific loudness (see the section below entitled "Target Specific Loudness").

Depending on the desired effect, the SL Modification 105 may operate independently on each band, or there may be an interdependence among bands (a frequency smoothing, as suggested by the cross-connecting lines in FIG. 7). A gain solver function or device ("Gain Solver") 106 takes as its inputs the smoothed excitation band components from the Excitation 102 and the target specific loudness from the SL Modification 105, and determines the gain that needs to be applied to each band of the output of the Analysis Filterbank 100 in order to transform the measured specific loudness into the target specific loudness. The Gain Solver may be implemented in various ways. For example, it may include an iterative process, such as in the manner disclosed in International Patent Application PCT/US2004/016964, published as WO 2004/111964 A2, or, alternatively, a table lookup. Although the gains per band generated by the Gain Solver 106 may be smoothed further over time by an optional smoothing function or device ("Smoothing") 107 in order to minimize perceptual artifacts, it is preferred that temporal smoothing be applied elsewhere in the overall process or device, as described elsewhere. Finally, the gains are applied to the respective bands of the Analysis Filterbank 100 through a respective multiplicative combining function or combiner 108, and the processed or "modified" audio is synthesized from the gain-modified bands in a synthesis filterbank function or device ("Synthesis Filterbank") 110. In addition, the outputs from the analysis filterbank may be delayed by a delay function or device ("Delay") 109 prior to application of the gains in order to compensate for any latency associated with the gain computation.

Alternatively, instead of calculating gains for use in applying gain modifications in frequency bands, the Gain Solver 106 may calculate filter coefficients that control a time-varying filter, such as a multi-tap FIR filter or a multi-pole IIR filter. For simplicity of exposition, aspects of the invention are mainly described as employing gain factors applied to frequency bands, it being understood that filter coefficients and time-varying filters may also be employed in practical embodiments.
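In practical embodiments, the audio processing may be performed in the digital domain. Accordingly, the audio input signal is denoted by the discrete-time sequence x[n], which has been sampled from the audio source at some sampling frequency $f_s$. It is assumed that the sequence x[n] has been appropriately scaled so that the average power of x[n] in decibels, given by

$$\mathrm{RMS}_{dB} = 10 \log_{10}\left(\frac{1}{L}\sum_{n=0}^{L-1} x^{2}[n]\right),$$

where L is the number of samples, is equal to the sound pressure level in dB at which the audio is auditioned by a human listener. In addition, for simplicity of exposition, the audio signal is assumed to be monophonic.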
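The scaling assumption above can be illustrated with a short sketch. It computes the mean power of a block of samples in dB and rescales the block so that this figure matches a chosen playback level. The function names and the 70 dB level are hypothetical, chosen only for illustration; the source does not prescribe how the calibration is carried out.

```python
import math

def rms_db(x):
    """Mean power of the samples in x, in decibels: 10*log10(mean(x^2))."""
    mean_square = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(mean_square)

def scale_to_spl(x, target_spl_db):
    """Scale x so that rms_db(result) equals target_spl_db (assumed calibration)."""
    gain_db = target_spl_db - rms_db(x)
    gain = 10.0 ** (gain_db / 20.0)  # amplitude gain, hence the factor 20
    return [gain * v for v in x]

# One second of a full-scale 1 kHz sine at 44.1 kHz; its mean power is 0.5.
sine = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]
calibrated = scale_to_spl(sine, 70.0)  # hypothetical 70 dB SPL listening level
```

After this step, `rms_db(calibrated)` matches the assumed sound pressure level, which is the precondition the loudness model relies on.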
The Analysis Filterbank 100, Transmission Filter 101, Excitation 102, Specific Loudness 104, Specific Loudness Modification 105, Gain Solver 106, and Synthesis Filterbank 110 may be described in greater detail as follows.
Analysis Filterbank 100
The audio input signal is applied to an analysis filterbank or filterbank function ("Analysis Filterbank") 100. Each filter in the Analysis Filterbank 100 is designed to simulate the frequency response at a particular location along the basilar membrane in the inner ear. The Filterbank 100 may include a set of linear filters whose bandwidth and spacing are constant on the Equivalent Rectangular Bandwidth (ERB) frequency scale, as defined by Moore, Glasberg and Baer (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," supra).
Although the ERB frequency scale more closely matches human perception and shows improved performance in producing objective loudness measurements that match subjective loudness results, the Bark frequency scale may be employed with reduced performance.
For a center frequency f in hertz, the width of one ERB band in hertz may be approximated as:

$$\mathrm{ERB}(f) = 24.7\,(4.37 f / 1000 + 1) \qquad (1)$$
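From this relation a warped frequency scale is defined such that, at any point along the warped scale, the corresponding ERB in units of the warped scale is equal to one. The function for converting from linear frequency in hertz to this ERB frequency scale is obtained by integrating the reciprocal of equation 1:

$$\mathrm{HzToERB}(f) = \int \frac{1}{24.7\,(4.37 f / 1000 + 1)}\,df = 21.4 \log_{10}(4.37 f / 1000 + 1) \qquad (2a)$$

It is also useful to express the conversion from the ERB scale back to the linear frequency scale by solving equation 2a for f:

$$\mathrm{ERBToHz}(e) = f = \frac{1000}{4.37}\left(10^{e/21.4} - 1\right), \qquad (2b)$$

where e is in units of the ERB scale.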
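The two conversions can be sketched in a few lines. The closed form used here, 21.4 log10(4.37f/1000 + 1), is the commonly quoted approximation to the integral of the reciprocal of equation 1 and is taken as an assumption; the function names are illustrative only.

```python
import math

def erb_bandwidth(f_hz):
    """Equation 1: width in Hz of one ERB band centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb(f_hz):
    """Warp a frequency in Hz onto the ERB scale (assumed closed form)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_to_hz(e):
    """Inverse of hz_to_erb: ERB-scale value back to Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

# On the warped scale, a band one ERB wide spans roughly one unit:
step = hz_to_erb(1000.5) - hz_to_erb(999.5)   # ERB units per Hz near 1 kHz
units_per_erb = step * erb_bandwidth(1000.0)  # close to 1
```

The last two lines check the defining property of the warped scale numerically: near 1 kHz, one ERB of bandwidth corresponds to about one unit of the scale.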
The Analysis Filterbank 100 may include B auditory filters, referred to as bands, at center frequencies spaced uniformly along the ERB scale. More specifically,

$$f_c[1] = f_{min} \qquad (3a)$$
$$f_c[b] = \mathrm{ERBToHz}\left(\mathrm{HzToERB}(f_c[b-1]) + \Delta\right), \quad b = 2 \ldots B \qquad (3b)$$
$$f_c[B] < f_{max}, \qquad (3c)$$

where $\Delta$ is the desired ERB spacing of the Analysis Filterbank 100, and where $f_{min}$ and $f_{max}$ are the desired minimum and maximum center frequencies, respectively. One may choose $\Delta = 1$ and, taking into account the frequency range over which the human ear is sensitive, set $f_{min} = 50$ Hz and $f_{max} = 20{,}000$ Hz. With such parameters, for example, application of equations 3a-c yields B = 40 auditory filters.
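The band layout can be checked with a short sketch. It reads the recursion as a uniform step of Δ on the ERB scale, that is, each center frequency is the previous one moved up by Δ ERB units and converted back to hertz; this reading is an assumption, adopted because it reproduces the stated count of 40 bands. The ERB conversion formulas and all function names are likewise illustrative assumptions.

```python
import math

def hz_to_erb(f_hz):
    """Hz to ERB scale (assumed closed form 21.4*log10(4.37f/1000 + 1))."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_to_hz(e):
    """ERB scale back to Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_min=50.0, f_max=20000.0, delta=1.0):
    """Center frequencies spaced delta apart on the ERB scale, below f_max."""
    centers = [f_min]
    while True:
        nxt = erb_to_hz(hz_to_erb(centers[-1]) + delta)
        if nxt >= f_max:  # condition 3c: the last center stays below f_max
            break
        centers.append(nxt)
    return centers

fc = center_frequencies()  # parameters quoted in the text
```

With the quoted parameters the list `fc` starts at 50 Hz and its last entry lies a little above 18 kHz.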
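The magnitude frequency response of each auditory filter may be characterized by a rounded exponential function, as suggested by Moore and Glasberg. Specifically, the magnitude response of a filter with center frequency $f_c[b]$ may be computed as:

$$H_b(f) = (1 + pg)\,e^{-pg} \qquad (4a)$$

where

$$g = \left|\frac{f - f_c[b]}{f_c[b]}\right|, \qquad (4b)$$
$$p = \frac{4 f_c[b]}{\mathrm{ERB}(f_c[b])}. \qquad (4c)$$

The magnitude responses of these B auditory filters, which approximate critical banding on the ERB scale, are shown in FIG. 10.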
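A minimal sketch of the rounded exponential response follows. It assumes the usual Moore and Glasberg parameterization, with g the normalized deviation from the center frequency and p = 4 f_c / ERB(f_c); the function names are hypothetical.

```python
import math

def erb_bandwidth(f_hz):
    """Equation 1: width in Hz of one ERB band centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def roex_magnitude(f_hz, fc_hz):
    """Rounded exponential magnitude response (1 + p*g) * exp(-p*g)."""
    g = abs(f_hz - fc_hz) / fc_hz          # normalized deviation from centre
    p = 4.0 * fc_hz / erb_bandwidth(fc_hz) # slope parameter (assumed form)
    return (1.0 + p * g) * math.exp(-p * g)

peak = roex_magnitude(1000.0, 1000.0)   # response at the centre frequency
skirt = roex_magnitude(1500.0, 1000.0)  # response half an octave-ish above
```

The response equals one at the center frequency and falls off symmetrically in linear frequency on either side, more steeply for filters whose ERB is small relative to their center frequency.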
The filtering operations of the Analysis Filterbank 100 may be adequately approximated using a finite-length Discrete Fourier Transform, commonly referred to as the Short-Time Discrete Fourier Transform (STDFT), because an implementation running the filters at the sampling rate of the audio signal, referred to as a full-rate implementation, is believed to provide more temporal resolution than is necessary for accurate loudness measurements. By using the STDFT instead of a full-rate implementation, an improvement in efficiency and a reduction in computational complexity may be achieved.
The STDFT of the input audio signal x[n] is defined as:

$$X[k,t] = \sum_{n=0}^{N-1} w[n]\,x[n + tT]\,e^{-j 2\pi k n / N}, \qquad (5a)$$

where k is the frequency index, t is the time block index, N is the DFT size, T is the hop size, and w[n] is a length-N window normalized so that

$$\sum_{n=0}^{N-1} w^{2}[n] = 1. \qquad (5b)$$

Note that the variable t in equation 5a is a discrete index representing the time block of the STDFT, as opposed to a measure of time in seconds. Each increment in t represents a hop of T samples along the signal x[n]. Subsequent references to the index t assume this definition. While different parameter settings and window shapes may be used depending upon the details of implementation, for $f_s = 44100$ Hz, choosing N = 2048, T = 1024, and w[n] to be a Hanning window provides an adequate balance of time and frequency resolution. The STDFT described above may be made more efficient using the Fast Fourier Transform (FFT).
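The block indexing described above can be made concrete with a small, deliberately slow sketch that evaluates the windowed DFT directly. The window normalization (unit energy) and the function names are assumptions made for illustration, and a much smaller N than 2048 is used so the direct sum stays cheap; a practical implementation would use an FFT.

```python
import cmath
import math

def hann_window(N):
    """Hann window scaled so that the sum of w[n]^2 equals 1 (assumed scaling)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def stdft(x, N, T):
    """Return X[t][k] for every full block of N samples, hopping T samples."""
    w = hann_window(N)
    blocks = []
    t = 0
    while t * T + N <= len(x):
        frame = [w[n] * x[n + t * T] for n in range(N)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)
        ]
        blocks.append(spectrum)
        t += 1
    return blocks

N, T = 64, 32  # toy sizes; the text suggests N = 2048, T = 1024 at 44.1 kHz
x = [math.cos(2 * math.pi * 8 * n / N) for n in range(4 * N)]
X = stdft(x, N, T)
peak_bin = max(range(N // 2 + 1), key=lambda k: abs(X[0][k]))
```

Here t counts blocks, not seconds: block t starts at sample tT, so with T equal to N/2 consecutive blocks overlap by half.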
Instead of the STDFT, the Modified Discrete Cosine Transform (MDCT) may be employed to implement the analysis filterbank. The MDCT is a transform commonly used in perceptual audio coders, such as Dolby AC-3. If the disclosed system is implemented with such perceptually coded audio, the disclosed loudness measurement and modification may be implemented more efficiently by processing the existing MDCT coefficients of the coded audio, thereby eliminating the need to perform the analysis filterbank transform. The MDCT of the input audio signal x[n] is given by:

$$X_{MDCT}[k,t] = \sum_{n=0}^{N-1} w[n]\,x[n + tT]\cos\left(\frac{2\pi}{N}\left(k + \tfrac{1}{2}\right)(n + n_0)\right), \quad \text{where } n_0 = \frac{N/2 + 1}{2} \qquad (6)$$
Generally, the hop size T is chosen to be exactly one-half the transform length N so that perfect reconstruction of the signal x[n] is possible.
Transmission Filter 101
The outputs of the Analysis Filterbank 100 are applied to a transmission filter or transmission filter function ("Transmission Filter") 101, which filters each band of the filterbank in accordance with the transmission of audio through the outer and middle ear. FIG. 8 shows one suitable magnitude frequency response of the transmission filter, P(f), across the audible frequency range. The response is unity below 1 kHz and, above 1 kHz, follows the inverse of the threshold of hearing as specified in the ISO 226 standard, with the threshold normalized to equal unity at 1 kHz.
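Excitation 102
In order to compute the loudness of the input audio signal, a measure is needed of the short-time energy of the audio signal in each filter of the Analysis Filterbank 100 after application of the Transmission Filter 101. This time- and frequency-varying measure is referred to as the excitation. The short-time energy output of each filter in the Analysis Filterbank 100 may be approximated in the Excitation function 102 by multiplying the filter responses in the frequency domain with the power spectrum of the input signal:

$$E[b,t] = \frac{1}{N}\sum_{k=0}^{N-1} \left|H_b[k]\right|^{2}\left|P[k]\right|^{2}\left|X[k,t]\right|^{2}, \qquad (7)$$

where b is the band number, t is the block number, and $H_b[k]$ and $P[k]$ are the frequency responses of the auditory filter and the transmission filter, respectively, sampled at the frequency corresponding to STDFT or MDCT bin index k.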
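The multiplication of the filter responses with the power spectrum can be sketched as below. The 1/N normalization, the argument layout and the function name are assumptions for illustration; the toy responses are not real auditory or transmission filters.

```python
def excitation(X_t, H, P):
    """Per-band excitation for one block.

    X_t  : list of complex spectrum values X[k, t] for a single block t
    H    : H[b][k], magnitude response of auditory filter b at bin k
    P    : P[k], magnitude response of the transmission filter at bin k
    """
    N = len(X_t)
    power = [abs(v) ** 2 for v in X_t]  # power spectrum of the block
    return [
        sum((H_b[k] ** 2) * (P[k] ** 2) * power[k] for k in range(N)) / N
        for H_b in H
    ]

# Toy data: four bins, two non-overlapping "bands".
X_t = [1 + 0j, 0 + 2j, 0j, 3 + 0j]
H = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
P = [1.0, 1.0, 1.0, 0.5]
E = excitation(X_t, H, P)
```

For this toy input the first band collects the power of bins 0 and 1, and the second band collects bin 3 attenuated by the transmission filter.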
In summary, the output of the Excitation function 102 is a frequency-domain representation of the energy E in each ERB band b for each time block t.
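Time Averaging ("Smoothing") 103
For certain applications of the disclosed invention, as described below, it may be desirable to smooth the excitation before its transformation to specific loudness. For example, smoothing may be performed recursively in the Smoothing function 103 according to the equation:

$$\bar{E}[b,t] = \lambda_b\,\bar{E}[b,t-1] + (1 - \lambda_b)\,E[b,t], \qquad (8)$$

where the time constant $\lambda_b$ for each band b is selected in accordance with the desired application.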
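A recursive smoother of this kind can be sketched as a one-pole filter per band. The specific form (a convex combination of the previous smoothed value and the new excitation), the initialization with the first block and the names used are assumptions; the source only states that the smoothing is recursive and that its constants depend on the application.

```python
def smooth_excitation(E_blocks, lam):
    """One-pole smoothing per band.

    E_blocks : E_blocks[t][b], excitation of band b in block t
    lam      : lam[b] in [0, 1); larger values smooth more heavily
    """
    smoothed = []
    prev = list(E_blocks[0])  # assumed initial state: the first block itself
    for E_t in E_blocks:
        prev = [lam[b] * prev[b] + (1.0 - lam[b]) * E_t[b]
                for b in range(len(E_t))]
        smoothed.append(prev)
    return smoothed

# An impulse of excitation in a single band decays geometrically.
out = smooth_excitation([[1.0], [0.0], [0.0]], lam=[0.5])
```

With a coefficient of 0.5 the impulse decays to 0.5 and then 0.25 in the following blocks, which shows how the coefficient sets the effective integration time.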
Specific Loudness 104
In the specific loudness converter or conversion function ("Specific Loudness") 104, each frequency band of the excitation is converted into a component value of the specific loudness, which is measured in sone per ERB.
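To begin the computation of specific loudness, the excitation level in each band of $\bar{E}[b,t]$ may be transformed to an equivalent excitation level at 1 kHz, as specified by the equal-loudness contours of ISO 226 (FIG. 11), normalized by the transmission filter P(z) (FIG. 12):

$$\bar{E}_{1kHz}[b,t] = T_{1kHz}\left(\bar{E}[b,t],\, f_c[b]\right), \qquad (9)$$

where $T_{1kHz}(E, f)$ is a function that generates the level at 1 kHz that is equally loud to level E at frequency f.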
Next, the specific loudness in each band may be computed as:

$$N[b,t] = \alpha[b,t]\,N_{NB}[b,t] + (1 - \alpha[b,t])\,N_{WB}[b,t], \qquad (10)$$

where $N_{NB}[b,t]$ and $N_{WB}[b,t]$ are specific loudness values based on a narrowband and a wideband signal model, respectively. The value $\alpha[b,t]$ is an interpolation factor lying between 0 and 1 that is computed from the audio signal. International Patent Application PCT/US2004/016964, published as WO 2004/111964 A2, describes a technique for computing $\alpha[b,t]$ from the spectral flatness of the excitation; it also describes the "narrowband" and "wideband" signal models in greater detail.
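The narrowband and wideband specific loudness values $N_{NB}[b,t]$ and $N_{WB}[b,t]$ may be estimated from the transformed excitation using the exponential functions:

$$N_{NB}[b,t] = \begin{cases} G_{NB}\left(\left(\dfrac{\bar{E}_{1kHz}[b,t]}{TQ_{1kHz}}\right)^{\beta_{NB}} - 1\right), & \bar{E}_{1kHz}[b,t] > TQ_{1kHz} \\ 0, & \text{otherwise} \end{cases} \qquad (11a)$$

$$N_{WB}[b,t] = \begin{cases} G_{WB}\left(\left(\dfrac{\bar{E}_{1kHz}[b,t]}{TQ_{1kHz}}\right)^{\beta_{WB}} - 1\right), & \bar{E}_{1kHz}[b,t] > TQ_{1kHz} \\ 0, & \text{otherwise} \end{cases} \qquad (11b)$$

where $TQ_{1kHz}$ is the excitation level at the threshold in quiet for a 1 kHz tone.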
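Moore and Glasberg suggest that the specific loudness should be equal to some small value instead of zero when the excitation is at the threshold of hearing. Specific loudness should then decrease monotonically to zero as the excitation decreases to zero. The justification is that the threshold of hearing is a probabilistic threshold (the point at which a tone is detected 50% of the time), and that a number of tones, each at threshold, presented together may sum to a sound that is more audible than any of the individual tones. In the disclosed application, augmenting the specific loudness functions with this property has the added benefit of making the gain solver, discussed below, behave more appropriately when the excitation is near threshold. If the specific loudness is defined to be zero when the excitation is at or below threshold, then a unique solution for the gain solver does not exist for excitations at or below threshold. If, on the other hand, specific loudness is defined to be monotonically increasing for all values of excitation greater than or equal to zero, as suggested by Moore and Glasberg, then a unique solution does exist. A loudness scale factor greater than unity will then result in a gain greater than unity, and vice versa. The specific loudness functions of equations 11a and 11b may be modified to have these desired properties.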
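From the specific loudness, the overall or "total" loudness L[t] is given by the sum of the specific loudness across all bands b:

$$L[t] = \sum_{b} N[b,t] \qquad (12)$$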
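The path from transformed excitation to total loudness can be sketched as follows. The per-band model used here, a compressive power law above threshold and zero at or below it, is an assumed form, and the threshold, gains and exponents are hypothetical placeholders rather than values from the source. Only the interpolation between the narrowband and wideband values (equation 10) and the sum over bands follow the text directly.

```python
def band_model(e_1khz, tq, gain, beta):
    """Assumed model: gain * ((E/TQ)^beta - 1) above threshold, else 0."""
    if e_1khz <= tq:
        return 0.0
    return gain * ((e_1khz / tq) ** beta - 1.0)

def specific_loudness(e_1khz, alpha, tq=1.0, nb=(1.0, 0.3), wb=(0.8, 0.25)):
    """Interpolate narrowband and wideband values with factor alpha in [0, 1]."""
    n_nb = band_model(e_1khz, tq, *nb)  # hypothetical (gain, exponent) pair
    n_wb = band_model(e_1khz, tq, *wb)  # hypothetical (gain, exponent) pair
    return alpha * n_nb + (1.0 - alpha) * n_wb  # equation 10

def total_loudness(e_bands, alpha_bands):
    """Sum of specific loudness over all bands for one block."""
    return sum(specific_loudness(e, a) for e, a in zip(e_bands, alpha_bands))

# Two bands: one well above threshold and purely narrowband, one below threshold.
L_t = total_loudness([16.0, 0.5], [1.0, 0.0])
```

In this toy case only the first band contributes, since the second lies below the assumed threshold and maps to zero specific loudness.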
Specific Loudness Modification 105
In the specific loudness modification function ("Specific Loudness Modification") 105, the target specific loudness, referred to as $\hat{N}[b,t]$, may be calculated from the specific loudness of SL 104 (FIG. 7) in various ways depending on the desired application of the overall device or process. As described in greater detail below, a target specific loudness may be calculated using a scale factor α, for example, in the case of a volume control. See equation 16 and its associated description. In the case of automatic gain control (AGC) and dynamic range control (DRC), a target specific loudness may be calculated using a ratio of the desired output loudness to the input loudness. See equations 17 and 18 and their associated descriptions. In the case of dynamic equalization, a target specific loudness may be calculated using the relationship set forth in equation 23 and its associated description.
Gain Solver 106
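In this example, for each band $b$ and every time interval $t$, the gain solver 106 takes as its inputs the smoothed excitation $\bar{E}[b,t]$ and the target specific loudness $\hat{N}[b,t]$ and generates the gains $G[b,t]$ that are subsequently used to modify the audio. Let the function $\Psi\{\cdot\}$ represent the nonlinear transformation from excitation to specific loudness, such that

$$N[b,t] = \Psi\{\bar{E}[b,t]\}. \qquad (13)$$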
The gain solver finds $G[b,t]$ such that

$$\hat{N}[b,t] = \Psi\{G^{2}[b,t]\,\bar{E}[b,t]\}. \qquad (14a)$$
The gain solver 106 determines frequency-varying and time-varying gains which, when applied to the original excitation, yield a specific loudness that is ideally equal to the desired target specific loudness.
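In practice, the gain solver determines frequency-varying and time-varying gains which, when applied to the frequency-domain version of the audio signal, modify the audio signal so as to reduce the difference between its specific loudness and the target specific loudness. Ideally, the modification is such that the modified audio signal has a specific loudness that closely approximates the target specific loudness. Equation 14a can be solved in several ways. For example, if a closed-form mathematical expression exists for the inverse of the specific loudness function, denoted $\Psi^{-1}\{\cdot\}$, the gains can be computed directly by rearranging Equation 14a:

$$G[b,t] = \sqrt{\frac{\Psi^{-1}\{\hat{N}[b,t]\}}{\bar{E}[b,t]}}. \qquad (14b)$$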
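Alternatively, if no closed-form solution for $\Psi^{-1}\{\cdot\}$ exists, an iterative approach can be used in which, at each iteration, Equation 14a is evaluated using the current estimate of the gains. The resulting specific loudness is compared with the desired target and the gains are updated according to the error. If the gains are updated properly, they converge to the desired solution. Another method involves precomputing the function $\Psi\{\cdot\}$ over a range of excitation values in each band to create a lookup table. From this lookup table an approximation to the inverse function is obtained, and the gains can then be computed from Equation 14b. As mentioned earlier, the target specific loudness can be represented as a scaling of the specific loudness:

$$\hat{N}[b,t] = \Xi[b,t]\,N[b,t]. \qquad (14c)$$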
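Substituting Equation 13 into 14c, and then 14c into 14b, yields an alternative expression for the gains:

$$G[b,t] = \sqrt{\frac{\Psi^{-1}\{\Xi[b,t]\,\Psi\{\bar{E}[b,t]\}\}}{\bar{E}[b,t]}}. \qquad (14d)$$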
The gains can thus be expressed purely as a function of the excitation $\bar{E}[b,t]$ and the specific loudness scaling $\Xi[b,t]$. The gains can therefore be computed by evaluating Equation 14d, or an equivalent lookup table, without ever explicitly computing the specific loudness or the target specific loudness as intermediate values. These values are nevertheless computed implicitly through the use of Equation 14d. Other equivalent methods of computing the modification parameters, through either explicit or implicit computation of the specific loudness and the target specific loudness, may be devised, and the invention is intended to cover all such methods.
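The two solution strategies described above, direct inversion and iteration, can be compared in a short sketch. The function `psi` below is a hypothetical stand-in for the excitation-to-specific-loudness mapping, chosen only because it is monotonic and has a closed-form inverse; it is not the loudness model of this document. The sketch also assumes that the excitation is a power-like quantity, so an amplitude gain $G$ scales it by $G^2$.

```python
import math

BETA = 0.23  # hypothetical compression exponent, for illustration only


def psi(excitation):
    """Stand-in excitation-to-specific-loudness mapping (monotonic, psi(0) = 0)."""
    return (1.0 + excitation) ** BETA - 1.0


def psi_inverse(loudness):
    """Closed-form inverse of the stand-in psi."""
    return (1.0 + loudness) ** (1.0 / BETA) - 1.0


def gain_closed_form(excitation, target):
    """Direct solve: G = sqrt(psi^-1(target) / excitation)."""
    return math.sqrt(psi_inverse(target) / excitation)


def gain_iterative(excitation, target, iterations=60):
    """Bisection on ln(G); needs only that psi is monotonically increasing."""
    lo, hi = -10.0, 10.0  # assumed search range for ln(G)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        achieved = psi(math.exp(2.0 * mid) * excitation)  # G^2 scales the excitation
        if achieved < target:
            lo = mid   # gain too small: raise the lower bound
        else:
            hi = mid   # gain too large: lower the upper bound
    return math.exp(0.5 * (lo + hi))


excitation = 1000.0                  # hypothetical smoothed excitation in one band
target = 0.5 * psi(excitation)       # ask for half the specific loudness
print(gain_closed_form(excitation, target), gain_iterative(excitation, target))
```

Bisection is used here only because it is simple and robust; any update rule that drives the loudness error toward zero would serve the same purpose.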
Synthesis Filterbank 110
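As described above, the analysis filterbank 100 can be implemented efficiently using the short-time discrete Fourier transform (STDFT) or the modified discrete cosine transform (MDCT), and the STDFT or MDCT can similarly be used to implement the synthesis filterbank 110. Specifically, letting $X[k,t]$ represent the STDFT or MDCT of the input audio, as defined earlier, the STDFT or MDCT of the processed (modified) audio in the synthesis filterbank 110 can be computed as: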
Target Specific Loudness

The behavior of arrangements embodying aspects of the invention, such as the examples of Figures 1-7, is dictated largely by the manner in which the target specific loudness $\hat{N}[b,t]$ is calculated. Although the invention is not limited to any particular function or inverse function for calculating the target specific loudness, several such functions and suitable applications for them are now described.
Time-Invariant and Frequency-Invariant Function Suitable for Volume Control
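A standard volume control adjusts the loudness of an audio signal by applying a wideband gain to the audio. Generally, the gain is coupled to a knob or slider that the user adjusts until the loudness of the audio reaches the desired level. An aspect of the invention allows such a control to be implemented in a more psychoacoustically consistent way. According to this aspect, rather than coupling a wideband gain to the volume control, which changes the gain by the same amount in all frequency bands and may therefore change the perceived spectrum, a specific loudness scaling factor is associated with the volume control adjustment, so that the gain in each of the multiple frequency bands is changed by an amount that takes the human hearing model into account and, ideally, leaves the perceived spectrum unchanged. In the context of this aspect of the invention and its exemplary applications, "constant" or "time-invariant" is intended to allow the setting of the volume control scale factor to be changed from time to time, for example by a user. Such "time-invariance" is sometimes referred to as "quasi time-invariant," "quasi-stationary," "piecewise time-invariant," "piecewise stationary," "step-wise time-invariant," and "step-wise stationary." Given such a scale factor $\alpha$, the target specific loudness can be calculated as the measured specific loudness multiplied by $\alpha$:

$$\hat{N}[b,t] = \alpha\,N[b,t]. \qquad (16)$$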
Because the total loudness $L[t]$ is the sum of the specific loudness $N[b,t]$ across all bands $b$, this modification also scales the total loudness by the factor $\alpha$, but it does so in a way that preserves the same perceived spectrum at a particular time for changes in the volume control setting. In other words, at any particular time, a change in the volume control setting changes the perceived loudness but does not change the perceived spectrum of the modified audio relative to that of the unmodified audio. Figure 13a shows the resulting multi-band gains $G[b,t]$ across the bands $b$ at a particular time $t$ when $\alpha = 0.25$, for an audio signal consisting of female speech. For comparison, the wideband gain required to scale the original total loudness by 0.25 (the horizontal line), as in a standard volume control, is also plotted. The multi-band gain $G[b,t]$ increases at the low and high frequency bands in comparison with the middle frequency bands. This is consistent with equal-loudness contours, which indicate that the human ear is less sensitive at low and high frequencies.
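A loudness-domain volume control of this kind can be sketched by scaling each band's specific loudness by $\alpha$ and solving for the gain in each band. The mapping `psi` below is a hypothetical compressive stand-in, not the loudness model of this document, and the excitation values are invented. The sketch again assumes that an amplitude gain $G$ scales a power-like excitation by $G^2$.

```python
import math

BETA = 0.23  # hypothetical compression exponent, for illustration only


def psi(excitation):
    """Stand-in excitation-to-specific-loudness mapping."""
    return (1.0 + excitation) ** BETA - 1.0


def psi_inverse(loudness):
    """Closed-form inverse of the stand-in psi."""
    return (1.0 + loudness) ** (1.0 / BETA) - 1.0


def volume_control_gains(excitation, alpha):
    """Per-band gains that scale every band's specific loudness by alpha."""
    gains = []
    for e in excitation:
        target = alpha * psi(e)                       # scaled specific loudness
        gains.append(math.sqrt(psi_inverse(target) / e))
    return gains


# Hypothetical excitation: weak in the outer bands, strong in the middle band.
excitation = [5.0, 500.0, 5000.0, 500.0, 5.0]
for e, g in zip(excitation, volume_control_gains(excitation, 0.25)):
    print(f"excitation {e:8.1f} -> gain {20.0 * math.log10(g):6.1f} dB")
```

With this stand-in model the weakly excited bands receive less attenuation than the strongly excited band, which is qualitatively the behavior a single wideband gain cannot reproduce.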
Figure 13b shows the specific loudness of the original audio signal, of the wideband gain-modified signal as modified by a prior-art volume control, and of the multi-band gain-modified signal as modified in accordance with this aspect of the invention. The specific loudness of the multi-band gain-modified signal is the original specific loudness scaled by 0.25. The specific loudness of the wideband gain-modified signal has a changed spectral shape with respect to that of the original unmodified signal. In this case, the specific loudness has, in a relative sense, lost loudness at both low and high frequencies. This is perceived as a dulling of the audio as its volume is turned down, a problem that does not occur with the multi-band modified signal, whose loudness is controlled by gains derived in the perceptual loudness domain.
Along with the distortion of the perceived spectral balance associated with a traditional volume control, there is a second problem. A property of loudness perception, which is reflected in the loudness model of Equations 11a-11d, is that the loudness of a signal at any frequency decreases more rapidly as the signal level approaches the threshold of hearing. As a result, the electrical attenuation required to impart the same loudness attenuation to a softer signal is less than that required for a louder signal. A traditional volume control imparts a constant attenuation regardless of signal level, and therefore softer signals become "too soft" relative to louder signals as the volume is turned down. In many cases this results in a loss of detail in the audio. Consider a recording of a castanet in a reverberant room. In such a recording the main "hit" of the castanet is quite loud in comparison with the reverberant echoes, but it is the reverberant echoes that convey the size of the room. As the volume is turned down with a traditional volume control, the reverberant echoes become softer relative to the main hit and eventually disappear below the threshold of hearing, leaving a "dry"-sounding castanet. A loudness-based volume control prevents the softer portions of the recording from disappearing by boosting the softer reverberant portion relative to the louder main hit, so that the relative loudness between these portions remains constant. To achieve this effect, the multi-band gains $G[b,t]$ must vary over time at a rate commensurate with the temporal resolution of human loudness perception. Because the multi-band gains $G[b,t]$ are computed as a function of the smoothed excitation $\bar{E}[b,t]$, the choice of the time constants $\lambda_b$ in Equation 8 dictates how quickly the gains may vary across time in each band $b$. As mentioned earlier, these time constants may be selected to be proportional to the integration time of human loudness perception within band $b$ and thus yield an appropriate variation of $G[b,t]$ over time. Note that if the time constants are chosen inappropriately (either too fast or too slow), perceptually objectionable artifacts may be introduced into the processed audio.
Time-Invariant and Frequency-Variant Function Suitable for Fixed Equalization
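In some applications, it may be desirable to apply a fixed perceptual equalization to the audio, in which case the target specific loudness can be computed by applying a time-invariant but frequency-variant scale factor $\Theta[b]$ as follows:

$$\hat{N}[b,t] = \Theta[b]\,N[b,t].$$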
Frequency-Invariant and Time-Variant Function Suitable for Automatic Gain and Dynamic Range Control

The techniques of automatic gain control and dynamic range control (AGC and DRC) are well known in the audio processing field. In an abstract sense, both techniques measure the level of an audio signal in some manner and then gain-modify the signal by an amount that is a function of the measured level. In the case of AGC, the signal is gain-modified so that its measured level is closer to a user-selected reference level. In the case of DRC, the signal is gain-modified so that the range of the signal's measured level is transformed into some desired range. For example, one may wish to make the quiet portions of the audio louder and the loud portions quieter. Such a system is described by Robinson and Gundry (Charles Robinson and Kenneth Gundry, "Dynamic Range Control via Metadata," 107th Convention of the AES, Preprint 5028, September 24-27, 1999, New York). Traditional implementations of AGC and DRC generally use a simple measurement of the audio signal level, such as a smoothed peak or root-mean-square (rms) amplitude, to drive the gain modification. Such simple measurements correlate to some degree with the perceived loudness of the audio, but aspects of the invention allow for more perceptually relevant AGC and DRC by driving the gain modification with a measurement of loudness based on a psychoacoustic model. In addition, many traditional AGC and DRC systems apply the gain modification as a wideband gain, thereby incurring the timbral (spectral) distortions of the processed audio described above. Aspects of the invention, on the other hand, use a multi-band gain to shape the specific loudness in a manner that reduces or minimizes such distortions.
Both the AGC and DRC applications employing aspects of the invention are characterized by a function that transforms or maps an input wideband loudness $L_i[t]$ into a desired output wideband loudness $L_o[t]$, where the loudness is measured in perceptual loudness units such as sone. The input wideband loudness $L_i[t]$ is a function of the specific loudness $N[b,t]$ of the input audio signal. Although it may be the same as the total loudness of the input audio signal, it may instead be a temporally smoothed version of that total loudness.
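Figures 14a and 14b show examples of typical mapping functions for an AGC and a DRC, respectively. Given such a mapping, in which $L_o[t]$ is a function of $L_i[t]$, the target specific loudness can be calculated as

$$\hat{N}[b,t] = \frac{L_o[t]}{L_i[t]}\,N[b,t]. \qquad (17)$$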
The original specific loudness $N[b,t]$ of the audio signal is simply scaled by the ratio of the desired output wideband loudness to the input wideband loudness to yield the output specific loudness $\hat{N}[b,t]$. In an AGC system, the input wideband loudness $L_i[t]$ is generally a measure of the long-term total loudness of the audio. This can be achieved by smoothing the total loudness $L[t]$ across time to generate $L_i[t]$.
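The AGC case can be sketched as three steps: smooth the total loudness over time, map it through an AGC curve, and scale every band by the resulting ratio. The one-pole smoother, its coefficient, and the curve `agc_map` are hypothetical choices made for illustration; the text does not prescribe them.

```python
def smooth_loudness(total, coeff=0.9):
    """One-pole smoother: L_i[t] = coeff * L_i[t-1] + (1 - coeff) * L[t]."""
    smoothed = []
    state = total[0]  # initialise with the first frame to avoid a start-up ramp
    for value in total:
        state = coeff * state + (1.0 - coeff) * value
        smoothed.append(state)
    return smoothed


def agc_map(l_in, reference=1.0, strength=0.5):
    """Hypothetical AGC curve: moves the loudness part of the way toward
    the reference on a logarithmic scale. Requires l_in > 0."""
    return reference ** strength * l_in ** (1.0 - strength)


def agc_targets(specific, coeff=0.9):
    """specific[t][b] -> target specific loudness for each frame and band."""
    total = [sum(frame) for frame in specific]       # total loudness per frame
    l_in = smooth_loudness(total, coeff)             # long-term input loudness
    targets = []
    for frame, li in zip(specific, l_in):
        ratio = agc_map(li) / li                     # output / input loudness
        targets.append([ratio * n for n in frame])   # same ratio in every band
    return targets


# Three identical frames of two bands each (hypothetical sone values).
print(agc_targets([[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]))
```

Because a single ratio is applied to all bands in a frame, the perceived spectrum within that frame is left unchanged; only the overall loudness moves toward the reference.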
In contrast to AGC, a DRC system reacts to shorter-term changes in a signal's loudness, and therefore $L_i[t]$ can simply be made equal to $L[t]$. As a result, the scaling of the specific loudness, given by $L_o[t]/L_i[t]$, may fluctuate rapidly, leading to unwanted artifacts in the processed audio. One typical artifact is the audible modulation of one portion of the frequency spectrum by some other, relatively unrelated portion of the spectrum. For example, a classical music selection might contain high frequencies dominated by a sustained string note, while the low frequencies contain a loud, booming timpani. Whenever the timpani is struck, the overall loudness $L_i[t]$ increases, and the DRC system applies attenuation to the entire specific loudness. The strings are then heard to drop and rise in loudness along with the timpani. Traditional wideband DRC systems suffer from the same cross-modulation of the spectrum, and a typical solution involves applying DRC independently to different frequency bands. The system disclosed here is inherently multi-band, because of its filterbank and its calculation of specific loudness using a perceptual loudness model, so modifying the DRC system to operate in a multi-band fashion in accordance with aspects of the invention is relatively straightforward and is described next.
Frequency-Variant and Time-Variant Function Suitable for Dynamic Range Control
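The DRC system can be extended to operate in a multi-band or frequency-variant fashion by allowing the input and output loudness to vary independently with band $b$. These multi-band loudness values are denoted $L_i[b,t]$ and $L_o[b,t]$, and the target specific loudness is then given by

$$\hat{N}[b,t] = \frac{L_o[b,t]}{L_i[b,t]}\,N[b,t]. \qquad (18)$$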
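The most straightforward way of calculating $L_i[b,t]$ is to set it equal to the specific loudness $N[b,t]$. In this case, DRC is performed independently on every band of the auditory filterbank of the perceptual loudness model, rather than according to the same input-to-output loudness ratio for all bands as described above under the heading "Frequency-Invariant and Time-Variant Function Suitable for Automatic Gain and Dynamic Range Control." In a practical embodiment employing 40 bands, the spacing of these bands along the frequency axis is relatively fine in order to provide an accurate measure of loudness. However, applying a DRC scale factor independently to each band may cause the processed audio to sound "torn apart." To avoid this problem, one may choose to calculate $L_i[b,t]$ by smoothing the specific loudness $N[b,t]$ across bands, so that the amount of DRC applied does not vary excessively from one band to the next. This can be achieved by defining a band-smoothing filter $Q(b)$ and then smoothing the specific loudness across all bands $c$ according to the standard convolution sum:

$$L_i[b,t] = \sum_{c} Q(b-c)\,N[c,t]. \qquad (19)$$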
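The band smoothing described here is an ordinary convolution across the band index. A minimal sketch follows, assuming a short symmetric kernel that is zero outside its stored offsets; the triangular kernel values are hypothetical and stand in for whatever $Q(b)$ an implementation would choose.

```python
def band_smooth(specific, kernel):
    """Return L_i[b] = sum over c of Q(b - c) * N[c].

    `kernel` holds Q for offsets -h..h (odd length, centre at index h);
    Q is treated as zero outside that range.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel must have odd length")
    half = len(kernel) // 2
    bands = len(specific)
    out = []
    for b in range(bands):
        acc = 0.0
        for c in range(bands):
            offset = b - c
            if -half <= offset <= half:
                acc += kernel[offset + half] * specific[c]
        out.append(acc)
    return out


# Hypothetical triangular smoothing kernel spanning three bands.
Q = [0.25, 0.5, 0.25]
print(band_smooth([0.0, 0.0, 1.0, 0.0, 0.0], Q))
```

Bands at the edges of the filterbank see a truncated kernel in this sketch, so their smoothed values are slightly reduced; a practical implementation might renormalise the kernel there.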
If the DRC function that calculates $L_o[b,t]$ as a function of $L_i[b,t]$ is fixed for every band $b$, the type of change applied to each band of the specific loudness $N[b,t]$ depends on the spectrum of the audio being processed, even if the overall loudness of the signal remains the same. For example, an audio signal with loud bass and quiet treble may have its bass cut and its treble boosted. A signal with quiet bass and loud treble may have the opposite occur. The net effect is a change in the timbre, or perceived spectrum, of the audio, which may be desirable in certain applications.
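However, it may be desirable to perform multi-band DRC without modifying the average perceived spectrum of the audio. One may want the average modification in each band to be roughly the same while still allowing the short-term variations of the modification to operate independently from band to band. The desired effect can be achieved by forcing the average behavior of the DRC in each band to be the same as that of some reference behavior. One may choose this reference behavior to be the desired DRC for the wideband input loudness $L_i[t]$. Let the function $L_o[t] = \mathrm{DRC}\{L_i[t]\}$ represent the desired DRC mapping for the wideband loudness. Then let $\bar{L}_i[t]$ represent a time-averaged version of the wideband input loudness, and let $\bar{L}_i[b,t]$ represent a time-averaged version of the multi-band input loudness $L_i[b,t]$. The multi-band output loudness can then be calculated as

$$L_o[b,t] = \frac{\bar{L}_i[b,t]}{\bar{L}_i[t]}\,\mathrm{DRC}\left\{\frac{\bar{L}_i[t]}{\bar{L}_i[b,t]}\,L_i[b,t]\right\}. \qquad (20)$$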
Note that the multi-band input loudness is first scaled to lie in the same average range as the wideband input loudness. The DRC function designed for the wideband loudness is then applied. Finally, the result is scaled back to the average range of the multi-band loudness. This form of multi-band DRC retains the benefit of reduced spectral pumping while preserving the average perceived spectrum of the audio.
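The three steps just described (scale into the wideband range, apply the wideband DRC curve, scale back) can be sketched as follows. The compressor curve `drc` with its threshold and ratio is a hypothetical example, and the time-averaged loudness values are supplied directly rather than computed.

```python
def drc(loudness, threshold=1.0, ratio=2.0):
    """Hypothetical wideband DRC curve: compress loudness above the threshold."""
    if loudness <= threshold:
        return loudness
    return threshold * (loudness / threshold) ** (1.0 / ratio)


def multiband_output_loudness(l_in_band, avg_band, avg_wide):
    """Per-band output loudness that follows the wideband DRC on average.

    l_in_band: multi-band input loudness per band
    avg_band:  time-averaged multi-band input loudness per band
    avg_wide:  time-averaged wideband input loudness (a scalar)
    """
    out = []
    for li, lb in zip(l_in_band, avg_band):
        scaled = (avg_wide / lb) * li               # move band into wideband range
        out.append((lb / avg_wide) * drc(scaled))   # compress, then scale back
    return out


avg_band = [0.5, 2.0, 1.5]   # hypothetical per-band averages
avg_wide = 4.0               # hypothetical wideband average
# A frame that sits exactly at its long-term average in every band:
print(multiband_output_loudness(avg_band, avg_band, avg_wide))
```

When a frame sits at its long-term average, every band receives the same output-to-input ratio, which is why the average perceived spectrum is preserved; deviations from the average are still compressed independently in each band.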
Frequency-Variant and Time-Variant Function Suitable for Dynamic Equalization
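Another application of aspects of the invention is the intentional transformation of the time-varying perceived spectrum of the audio toward a target time-invariant perceived spectrum while still preserving the original dynamic range of the audio. This processing may be referred to as dynamic equalization (DEQ). In traditional static equalization, a simple fixed filter is applied to the audio in order to change its spectrum. For example, a fixed bass or treble boost may be applied. Such processing does not take into account the current spectrum of the audio and may therefore be inappropriate for some signals, namely signals that already contain a relatively large amount of bass or treble. With DEQ, the spectrum of the signal is measured and the signal is then dynamically modified so as to transform the measured spectrum into an essentially static desired shape. In aspects of the invention, this desired shape is specified across the bands of the filterbank and is referred to as $EQ[b]$. In a practical embodiment, the measured spectrum should represent the average spectral shape of the audio, which can be generated by smoothing the specific loudness $N[b,t]$ across time. The result may be referred to as the smoothed specific loudness $\bar{N}[b,t]$. As with multi-band DRC, one may not want the DEQ modification to vary drastically from one band to the next, so a band-smoothing function can be applied to generate a band-smoothed spectrum $\bar{L}[b,t]$:

$$\bar{L}[b,t] = \sum_{c} Q(b-c)\,\bar{N}[c,t]. \qquad (21)$$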
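To preserve the original dynamic range of the audio, the desired spectrum $EQ[b]$ should be normalized to have the same overall loudness as the measured spectral shape given by $\bar{L}[b,t]$. This normalized spectral shape may be written $\bar{L}_{EQ}[b,t]$:

$$\bar{L}_{EQ}[b,t] = \left(\frac{\sum_{c} \bar{L}[c,t]}{\sum_{c} EQ[c]}\right) EQ[b]. \qquad (22)$$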
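Finally, the target specific loudness is calculated as

$$\hat{N}[b,t] = \frac{\bar{L}_{EQ}[b,t]}{\bar{L}[b,t]}\,N[b,t]. \qquad (23)$$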
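The normalization and target computation for dynamic equalization can be sketched as below. The sketch assumes that the target is obtained by scaling each band by the ratio of the normalized desired shape to the measured shape, which is an inference from the surrounding description rather than a quoted formula; the function names and numbers are hypothetical.

```python
def normalize_eq(eq, measured):
    """Scale the desired shape EQ[b] so that its sum across bands matches
    the sum of the measured (smoothed) spectral shape."""
    scale = sum(measured) / sum(eq)
    return [scale * value for value in eq]


def deq_targets(specific, measured, eq):
    """Target specific loudness: each band is scaled by the ratio of the
    normalized desired shape to the measured shape (assumed form)."""
    l_eq = normalize_eq(eq, measured)
    return [(le / lm) * n for le, lm, n in zip(l_eq, measured, specific)]


measured = [4.0, 2.0, 2.0]   # hypothetical measured shape: bass-heavy
eq = [1.0, 1.0, 1.0]         # hypothetical desired shape: flat
specific = [4.4, 1.8, 2.1]   # hypothetical instantaneous specific loudness

print(normalize_eq(eq, measured))
print(deq_targets(specific, measured, eq))
```

Because the desired shape is rescaled to the measured overall loudness, the scaling redistributes loudness between bands without changing the long-term overall level, which is how the original dynamic range is retained.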
One convenient way of generating the desired spectral shape $EQ[b]$ is for the user to set it equal to $\bar{L}[b,t]$ as measured for some piece of audio whose spectral balance the user finds pleasing. In a practical embodiment, for example as shown in Figure 16, the user may be provided with a button or other suitable actuator 507 that, when actuated, causes a measurement of the current spectral shape of the audio to be captured and then stored as a preset (in the target specific loudness preset capture and store 506), which may later be loaded into $EQ[b]$ when DEQ is enabled (as by preset select 508). Figure 16 is a simplified version of Figure 7 in which only a single line is shown to represent the multiple bands from the analysis filterbank 100 to the synthesis filterbank 110. The Figure 16 example also provides a dynamic EQ specific loudness (SL) modification 505, which modifies the specific loudness measured by function or device 104 in accordance with dynamic equalization as described above.
Combined Processing
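A user may wish to combine all of the previously described processes, including volume control (VC), AGC, DRC, and DEQ, into a single system. Because each of these processes can be expressed as a scaling of the specific loudness, all of them are easily combined as follows:

In some cases, the scalings of one or a combination of the loudness modification processes may fluctuate too rapidly over time and produce artifacts in the resulting processed audio. It may therefore be desirable to smooth some subset of these scalings. In general, the scalings from VC and DEQ vary smoothly over time, but smoothing the combination of the AGC and DRC scalings may be required. Let the combination of these scalings be represented as:

The basic idea behind the smoothing is that the combined scaling should react quickly when the specific loudness is increasing, and that the scaling should be smoothed more heavily when the specific loudness is decreasing. This notion corresponds to the well-known practice of using a fast attack and a slow release in the design of audio compressors. Appropriate time constants for smoothing the scalings can be calculated by smoothing across time a band-smoothed version of the specific loudness. First, a band-smoothed version of the specific loudness is computed:

A time-smoothed version of this band-smoothed specific loudness is then computed as:

The smoothed combined scaling is then computed as: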
Band smoothing of the smoothing coefficient prevents the time-smoothed scalings from changing excessively across bands. The time and band smoothing of the scalings described above results in processed audio that contains fewer objectionable perceptual artifacts.
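The combination of scalings and the fast-attack, slow-release smoothing can be sketched as follows. This is a simplified illustration, not the procedure of the text: the attack or release decision here compares each frame's loudness with the previous frame directly, whereas the text derives the smoothing coefficient from a band-smoothed and time-smoothed loudness. The function names and the coefficients `fast` and `slow` are hypothetical.

```python
def combine(scalings):
    """Multiply per-band scalings from several processes into one scaling."""
    bands = len(scalings[0])
    combined = [1.0] * bands
    for scaling in scalings:
        for b in range(bands):
            combined[b] *= scaling[b]
    return combined


def smooth_scaling(scaling_frames, loudness_frames, fast=0.2, slow=0.95):
    """One-pole smoothing per band: light smoothing (fast reaction) while the
    loudness rises, heavy smoothing (slow reaction) while it falls."""
    smoothed = [list(scaling_frames[0])]
    prev_loud = list(loudness_frames[0])
    for scaling, loud in zip(scaling_frames[1:], loudness_frames[1:]):
        frame = []
        for b, (s, loud_b) in enumerate(zip(scaling, loud)):
            coeff = fast if loud_b > prev_loud[b] else slow
            frame.append(coeff * smoothed[-1][b] + (1.0 - coeff) * s)
        smoothed.append(frame)
        prev_loud = list(loud)
    return smoothed


# Two hypothetical per-band scalings (for example AGC and DRC) in two bands.
print(combine([[0.5, 2.0], [2.0, 0.5]]))

# One band, three frames: loudness jumps up, then falls back.
print(smooth_scaling([[1.0], [0.0], [0.0]], [[1.0], [2.0], [1.0]]))
```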
Noise Compensation

In many audio playback environments there is background noise that interferes with the audio the listener wishes to hear. For example, a listener in a moving automobile may be playing music over the installed stereo system, and noise from the engine and the road may significantly alter the perception of the music. In particular, for those parts of the spectrum in which the energy of the noise is significant relative to the energy of the music, the perceived loudness of the music is reduced. If the level of the noise is large enough, the music is completely masked.
According to an aspect of the invention, it is desirable to choose the gains $G[b,t]$ so that the specific loudness of the processed audio in the presence of the interfering noise equals the target specific loudness $\hat{N}[b,t]$.
To achieve this effect, the concept of partial loudness, as defined by Moore and Glasberg, cited above, can be used. Assume that a measurement of the noise by itself and a measurement of the audio by itself can be obtained. Let $E_N[b,t]$ represent the excitation from the noise and let $E_A[b,t]$ represent the excitation from the audio. The combined specific loudness of the audio and the noise is then given by

$$N_{TOT}[b,t] = \Psi\{E_A[b,t] + E_N[b,t]\}, \qquad (31)$$

where, again, $\Psi\{\cdot\}$ represents the nonlinear transformation from excitation to specific loudness. It may be assumed that a listener's hearing partitions the combined specific loudness between the partial specific loudness of the audio and the partial specific loudness of the noise in a way that preserves the combined specific loudness:

$$N_{TOT}[b,t] = N_A[b,t] + N_N[b,t]. \qquad (32)$$
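The partial specific loudness of the audio, $N_A[b,t]$, is the value to be controlled, and therefore this value must be found. The partial specific loudness of the noise can be approximated as: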
Note that when the excitation of the audio equals the masked threshold of the noise ($E_A[b,t] = E_{TN}[b,t]$), the partial specific loudness of the audio equals the loudness of a signal at the threshold in quiet, which is the desired outcome. When the excitation of the audio is much greater than that of the noise, the second term in Equation 34 vanishes, and the specific loudness of the audio is approximately equal to what it would be if the noise were not present. In other words, as the audio becomes much louder than the noise, the noise is masked by the audio. The exponent $K$ is chosen empirically to give a good fit to data on the loudness of a tone in noise as a function of the signal-to-noise ratio. Moore and Glasberg found that a value of $K = 0.3$ is appropriate. The masked threshold of the noise can be approximated as a function of the noise excitation itself:

$$E_{TN}[b,t] = K[b]\,E_N[b,t] + E_{TQ}[b], \qquad (35)$$

where $K[b]$ is a constant that increases at lower frequency bands. Thus, the partial specific loudness of the audio given by Equation 34 can be represented abstractly as a function of the excitation of the audio and the excitation of the noise:

$$N_A[b,t] = \Phi\{E_A[b,t], E_N[b,t]\}. \qquad (36)$$
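The masked-threshold approximation of Equation 35 is a per-band linear function of the noise excitation and can be sketched directly. The band constants and excitation values below are hypothetical and chosen only to show the shape of the computation.

```python
def masked_threshold(noise_excitation, k, threshold_quiet):
    """Per-band masked threshold: E_TN[b] = K[b] * E_N[b] + E_TQ[b]."""
    if not (len(noise_excitation) == len(k) == len(threshold_quiet)):
        raise ValueError("all inputs must have one entry per band")
    return [kb * en + etq
            for kb, en, etq in zip(k, noise_excitation, threshold_quiet)]


# Hypothetical three-band example; K[b] is larger in the lowest band.
k = [2.0, 1.0, 1.0]
e_tq = [0.5, 0.1, 0.2]
print(masked_threshold([10.0, 10.0, 10.0], k, e_tq))   # noisy environment
print(masked_threshold([0.0, 0.0, 0.0], k, e_tq))      # no noise present
```

With no noise present the masked threshold reduces to the per-band constant $E_{TQ}[b]$, which is consistent with the behavior described in the text for audio at the threshold in quiet.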
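A modified gain solver can then be used to calculate the gains $G[b,t]$ such that the partial specific loudness of the processed audio in the presence of the noise equals the target specific loudness:

$$\hat{N}[b,t] = \Phi\{G^{2}[b,t]\,E_A[b,t], E_N[b,t]\}. \qquad (37)$$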
Figure 17 shows the system of Figure 7 with the original gain solver 106 replaced by the noise-compensating gain solver 206 described above. (Note that the multiple vertical lines between blocks, which represent the multiple bands of the filterbank, have been replaced by a single line to simplify the diagram.) In addition, the figure shows the measurement of the noise excitation (by analysis filterbank 200, transmission filter 201, excitation 202, and smoothing 203, in a manner corresponding to the operation of blocks 100, 101, 102, and 103), which is fed into the new gain solver 206 along with the excitation of the audio (from smoothing 103) and the target specific loudness (from SL modification 105).
In its most basic mode of operation, the SL modification 105 in Figure 17 may simply set the target specific loudness $\hat{N}[b,t]$ equal to the original specific loudness $N[b,t]$ of the audio. In other words, the SL modification provides a frequency-invariant scaling $\alpha$ of the specific loudness of the audio signal, with $\alpha = 1$. With an arrangement such as that of Figure 17, the gains are calculated so that the perceived loudness spectrum of the processed audio in the presence of the noise equals the loudness spectrum of the audio in the absence of the noise. Alternatively, any one or combination of the previously described techniques for computing the target specific loudness as a function of the original, including VC, AGC, DRC, and DEQ, can be used in conjunction with the noise-compensating loudness modification system.
In a practical embodiment, the measurement of the noise can be obtained from a microphone placed in or near the environment in which the audio is played. Alternatively, a predetermined set of template noise excitations that approximate the anticipated noise spectrum under various conditions can be used. For example, the noise in an automobile cabin can be pre-analyzed at various driving speeds and then stored as a lookup table of noise excitation versus speed. The noise excitation fed into the gain solver 206 in Figure 17 can then be approximated from this lookup table as the speed of the vehicle varies.
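A lookup table of noise excitation versus speed could be used along the following lines. Linear interpolation between table rows and clamping at the ends are assumptions made for this sketch; the text only states that the excitation is approximated from the table. The table contents and the function name are hypothetical.

```python
from bisect import bisect_right


def noise_excitation_for_speed(table, speed):
    """Interpolate per-band noise excitation from a speed-indexed table.

    table: list of (speed, [excitation per band]) rows sorted by speed.
    Speeds outside the table range are clamped to the nearest row.
    """
    speeds = [row[0] for row in table]
    if speed <= speeds[0]:
        return list(table[0][1])
    if speed >= speeds[-1]:
        return list(table[-1][1])
    hi = bisect_right(speeds, speed)   # first row with a speed above `speed`
    lo = hi - 1
    s0, e0 = table[lo]
    s1, e1 = table[hi]
    w = (speed - s0) / (s1 - s0)       # interpolation weight in [0, 1)
    return [(1.0 - w) * a + w * b for a, b in zip(e0, e1)]


# Hypothetical pre-analysed cabin noise: (speed in km/h, excitation in 3 bands).
TABLE = [
    (0.0, [1.0, 0.5, 0.1]),
    (50.0, [4.0, 2.0, 0.4]),
    (100.0, [9.0, 5.0, 1.0]),
]
print(noise_excitation_for_speed(TABLE, 75.0))
```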
Implementation

The invention may be implemented in hardware or software, or a combination of both (for example, programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (for example, integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems, each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices in known fashion.
Each such program may be implemented in any desired computer language (including machine, assembly, or high-level procedural, logical, or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or an interpreted language.
Each such computer program is preferably stored on or downloaded to a storage medium or device (for example, solid-state memory or media, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described above may be order-independent, and thus can be performed in an order different from that described.
2...Modify audio signal
4, 4', 4", 4'''...Generate modification parameters
6...Calculate target specific loudness
8...Calculate specific loudness
10, 10', 10", 10'''...Calculate modification parameters
12...Calculate approximation of specific loudness of unmodified audio
14...Calculate approximation of target specific loudness
16...Store or transmit
100...Analysis filterbank
101...Transmission filter
102...Excitation
103...Smoothing
104...Specific loudness (SL)
105...SL modification
106...Gain solver
107...Optional smoothing
108...Combiner
109...Delay
110...Synthesis filterbank
200...Analysis filterbank
201...Transmission filter
202...Excitation
203...Smoothing
206...Noise-compensating gain solver
505...Dynamic EQ specific loudness modification
506...Target specific loudness preset capture and store
507...User capture selection
508...User preset selection
第1圖展示依據本發明之一論點的前授製作範例的一組功能方塊圖。Figure 1 shows a set of functional block diagrams of a pre-production paradigm in accordance with one aspect of the present invention.
第2圖展示依據本發明之一論點的回授製作範例的一組功能方塊圖。Figure 2 shows a set of functional block diagrams of a feedback production paradigm in accordance with one aspect of the present invention.
第3圖展示依據本發明之一論點的混合前授/回授製作範例之一組功能方塊圖。Figure 3 is a block diagram showing a functional group of a hybrid pre-commissioning/responsibility production example in accordance with one aspect of the present invention.
第4圖展示依據本發明之一論點的另一混合前授/回授製作範例的一組功能方塊圖。Figure 4 shows a set of functional block diagrams of another hybrid pre-/re-sales production paradigm in accordance with one aspect of the present invention.
第5圖是一組功能方塊圖,其展示被任何一組前授、回授、與混合前授回授配置所決定的未被修改音訊信號和修改參數可被儲存或被發送的方式,例如,以供在時間上或空間上分隔的裝置或處理程序中的使用。Figure 5 is a set of functional block diagrams showing the manner in which unmodified audio signals and modified parameters determined by any set of pre-grant, feedback, and hybrid pre-request configurations can be stored or transmitted, such as For use in devices or handlers that are separated in time or space.
第6圖是一組功能方塊圖,其展示被任何一組前授、回授、與混合前授回授配置所決定的未被修改音訊信號和目標特定響度或其表示可被儲存或被發送的方式,例如,以供在時間上或空間上分隔的裝置或處理程序中的使用。Figure 6 is a set of functional block diagrams showing that unmodified audio signals and target specific loudness or their representations determined by any set of pre-grant, feedback, and hybrid pre-requested configurations can be stored or transmitted. The manner, for example, is for use in a device or process that is separated in time or space.
第7圖展示本發明之一論點的縱觀之分解功能方塊圖或分解流程圖。Figure 7 is a block diagram showing an exploded view or an exploded flow chart of an overview of one of the arguments of the present invention.
Figure 8 is an idealized characteristic response of a linear filter P(z) suitable as a transmission filter in an embodiment of the present invention, in which the vertical axis is attenuation in decibels (dB) and the horizontal axis is frequency in Hertz (Hz) on a logarithmic base-10 scale.
Figure 9 shows the relationship between the ERB frequency scale (vertical axis) and frequency in Hertz (horizontal axis).
Figure 10 shows a set of idealized auditory filter characteristic responses that approximate critical banding on the ERB scale. The horizontal scale is frequency in Hertz and the vertical scale is level in decibels.
Figure 11 shows the equal-loudness contours of ISO 226. The horizontal scale is frequency in Hertz (logarithmic base-10 scale) and the vertical scale is sound pressure level in decibels.
Figure 12 shows the equal-loudness contours of ISO 226 normalized by the transmission filter P(z). The horizontal scale is frequency in Hertz (logarithmic base-10 scale) and the vertical scale is sound pressure level in decibels.
Figure 13a is an idealized chart showing the wideband and multiband gains for a loudness scaling of 0.25 applied to a segment of female speech. The horizontal scale is ERB bands and the vertical scale is relative gain in decibels (dB).
Figure 13b is an idealized chart showing the specific loudness of, respectively, the original signal, the wideband gain-modified signal, and the multiband gain-modified signal. The horizontal scale is ERB bands and the vertical scale is specific loudness (sone/ERB).
Figure 14a is an idealized chart showing $L_o[t]$ as a function of $L_i[t]$ for typical AGC. The horizontal scale is $\log(L_i[t])$ and the vertical scale is $\log(L_o[t])$.
Figure 14b is an idealized chart showing $L_o[t]$ as a function of $L_i[t]$ for typical DRC. The horizontal scale is $\log(L_i[t])$ and the vertical scale is $\log(L_o[t])$.
Figure 15 is an idealized chart showing a typical band-smoothing function for multiband DRC. The horizontal scale is band number and the vertical scale is the gain output for band b.
Figure 16 is a schematic functional block diagram or schematic flow chart showing an overview of an aspect of the present invention.
Figure 17 is a schematic functional block diagram or schematic flow chart, similar to Figure 1, that also includes compensation for noise in the playback environment.
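The relationship between the ERB scale and frequency in Hertz plotted in Figure 9 can be illustrated with a short sketch. The mapping below is an assumption: it uses the widely cited Glasberg and Moore approximation of the ERB-rate scale, which may differ in detail from the curve actually drawn in the figure, and the function names `hz_to_erb` and `erb_to_hz` are hypothetical.

```python
import math


def hz_to_erb(f_hz: float) -> float:
    """Map frequency in Hz to ERB-rate units.

    Assumption: Glasberg and Moore approximation,
    ERB = 21.4 * log10(4.37 * f / 1000 + 1).
    """
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_to_hz(erb: float) -> float:
    """Inverse of hz_to_erb: map ERB-rate units back to frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


if __name__ == "__main__":
    # Print a few points of the curve; the mapping is roughly linear at low
    # frequencies and roughly logarithmic at high frequencies.
    for f in (100.0, 500.0, 1000.0, 4000.0, 16000.0):
        print(f"{f:8.0f} Hz -> {hz_to_erb(f):6.2f} ERB")
```

Under this assumed mapping, bands spaced uniformly in ERB units are narrow at low frequencies and progressively wider at high frequencies, which is the behavior the ERB-band axes of Figures 13a and 13b rely on.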
2 ... Modify audio signal
4 ... Generate modification parameters
6 ... Calculate target specific loudness
8 ... Calculate specific loudness
10 ... Calculate modification parameters
Claims (64)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63860704P | 2004-12-21 | 2004-12-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200623024A (en) | 2006-07-01 |
TWI397901B (en) | 2013-06-01 |
Family
ID=49030112
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW94138593A TWI397901B (en) | 2004-12-21 | 2005-11-03 | Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith |
Country Status (2)
Country | Link |
---|---|
MY (1) | MY154344A (en) |
TW (1) | TWI397901B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10109293B2 (en) | 2016-12-09 | 2018-10-23 | Acer Incorporated | Voice signal processing apparatus and voice signal processing method |
TWI754687B (en) * | 2016-10-24 | 2022-02-11 | 美商艾孚諾亞公司 | Signal processor and method for headphone off-ear detection |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7593535B2 (en) * | 2006-08-01 | 2009-09-22 | Dts, Inc. | Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer |
EP2168122B1 (en) * | 2007-07-13 | 2011-11-30 | Dolby Laboratories Licensing Corporation | Audio processing using auditory scene analysis and spectral skewness |
TWI491277B (en) * | 2008-11-14 | 2015-07-01 | That Corp | Dynamic volume control and multi-spatial processing protection |
TWI503816B (en) * | 2009-05-06 | 2015-10-11 | Dolby Lab Licensing Corp | Adjusting the loudness of an audio signal with perceived spectral balance preservation |
WO2010138311A1 (en) * | 2009-05-26 | 2010-12-02 | Dolby Laboratories Licensing Corporation | Equalization profiles for dynamic equalization of audio data |
TWI505724B (en) * | 2013-06-10 | 2015-10-21 | Princeton Technology Corp | Gain controlling system, sound playback system, and gain controlling method thereof |
CN112614507B (en) * | 2020-12-09 | 2024-06-11 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and device for detecting noise |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW380246B (en) * | 1996-10-23 | 2000-01-21 | Sony Corp | Speech encoding method and apparatus and audio signal encoding method and apparatus |
US6041295A (en) * | 1995-04-10 | 2000-03-21 | Corporate Computer Systems | Comparing CODEC input/output to adjust psycho-acoustic parameters |
US6088461A (en) * | 1997-09-26 | 2000-07-11 | Crystal Semiconductor Corporation | Dynamic volume control system |
US6108431A (en) * | 1996-05-01 | 2000-08-22 | Phonak Ag | Loudness limiter |
US6148085A (en) * | 1997-08-29 | 2000-11-14 | Samsung Electronics Co., Ltd. | Audio signal output apparatus for simultaneously outputting a plurality of different audio signals contained in multiplexed audio signal via loudspeaker and headphone |
US6240388B1 (en) * | 1996-07-09 | 2001-05-29 | Hiroyuki Fukuchi | Audio data decoding device and audio data coding/decoding system |
US6263371B1 (en) * | 1999-06-10 | 2001-07-17 | Cacheflow, Inc. | Method and apparatus for seaming of streaming content |
US6301555B2 (en) * | 1995-04-10 | 2001-10-09 | Corporate Computer Systems | Adjustable psycho-acoustic parameters |
TW200404222A (en) * | 2002-08-07 | 2004-03-16 | Dolby Lab Licensing Corp | Audio channel spatial translation |
- 2005
  - 2005-11-03 TW TW94138593A patent/TWI397901B/en active
  - 2005-11-08 MY MYPI20055232A patent/MY154344A/en unknown
Also Published As
Publication number | Publication date |
---|---|
TW200623024A (en) | 2006-07-01 |
MY154344A (en) | 2015-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11296668B2 (en) | | Methods and apparatus for adjusting a level of an audio signal |
TWI471856B (en) | | Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program to perform the same |
US8199933B2 (en) | | Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal |
TWI397901B (en) | | Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith |