TWI397901B

TWI397901B - Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith

Info

Publication number: TWI397901B
Application number: TW94138593A
Authority: TW
Inventors: Alan Jeffrey Seefeldt
Original assignee: Dolby Lab Licensing Corp
Priority date: 2004-12-21
Filing date: 2005-11-03
Publication date: 2013-06-01
Also published as: TW200623024A; MY154344A

Description

Method for controlling audio signal specific loudness characteristics and related devices and computer programs

Field of invention

The present invention relates to audio signal processing. More particularly, it relates to the measurement and control of the perceived loudness and/or the perceived spectral balance of an audio signal. The invention is useful, for example, in one or more of the following in an audio playback environment: loudness-compensating volume control, automatic gain control, dynamic range control (including, for example, limiters, compressors, expanders, and the like), dynamic equalization, and compensation for background noise interference. The invention includes not only methods but also corresponding computer programs and apparatus.

Background of the invention

There have been many attempts to develop a satisfactory objective method of measuring loudness. Fletcher and Munson determined in 1933 that human hearing is less sensitive at low and high frequencies than at middle (or voice) frequencies. They also found that the relative change in sensitivity decreases as the sound level increases. An early loudness meter consisted of a microphone, an amplifier, a meter, and a combination of filters designed to roughly mimic the frequency response of hearing at low, medium, and high sound levels.

Although such devices provided a measurement of the loudness of a single, constant-level, isolated tone, measurements of more complex sounds did not match subjective impressions of loudness very well. Sound level meters of this type have been standardized but are used only for specific tasks, such as the monitoring and control of industrial noise.

In the early 1950s, Zwicker and Stevens, among others, extended the work of Fletcher and Munson and developed a more realistic model of the loudness perception process. Stevens published a method for the "Calculation of the Loudness of Complex Noise" in the Journal of the Acoustical Society of America in 1956, and Zwicker published his article "Psychological and Methodical Basis of Loudness" in Acoustica in 1958. In 1959 Zwicker published a graphical procedure for loudness calculation, followed shortly by several similar articles. The Stevens and Zwicker methods were standardized as ISO 532, parts A and B respectively. Both methods involve similar steps.

First, the time-varying distribution of energy along the basilar membrane of the inner ear, referred to as the excitation, is simulated by passing the audio through a bank of band-pass auditory filters whose center frequencies are spaced uniformly on a critical band rate scale. Each auditory filter is designed to simulate the frequency response at a particular location along the basilar membrane, with the filter's center frequency corresponding to that location. A critical bandwidth is defined as the bandwidth of one such filter. Measured in Hertz, the critical bandwidth of these auditory filters increases with center frequency. It is therefore useful to define a warped frequency scale on which the critical bandwidth of every auditory filter is constant. Such a warped scale is referred to as the critical band rate scale and is very useful for understanding and simulating a wide range of psychoacoustic phenomena. See, for example, Psychoacoustics - Facts and Models by E. Zwicker and H. Fastl, Springer-Verlag, Berlin, 1990. The methods of Stevens and Zwicker use a critical band rate scale referred to as the Bark scale, on which the critical bandwidth is constant below 500 Hz and increases above 500 Hz. More recently, Moore and Glasberg defined a critical band rate scale that they named the Equivalent Rectangular Bandwidth (ERB) scale (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997, pp. 224-240). Through psychoacoustic experiments using notched-noise maskers, Moore and Glasberg showed that the critical bandwidth continues to decrease below 500 Hz, in contrast to the Bark scale, on which the critical bandwidth remains constant.
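The relationship between filter bandwidth in Hertz and uniform spacing on a critical band rate scale can be sketched in a few lines of Python. The sketch assumes the commonly cited Glasberg and Moore approximations for the ERB and the ERB-rate scale; the text above does not state these formulas, and the function names and the 50 Hz to 15 kHz range are illustrative choices only.

```python
import math

def erb_bandwidth_hz(fc_hz):
    """Approximate equivalent rectangular bandwidth in Hz at center frequency fc_hz.
    ASSUMPTION: Glasberg-Moore approximation, not given in the text."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the warped ERB-rate scale (same assumption)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def uniform_erb_centres(f_lo_hz, f_hi_hz, n_bands):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_lo_hz), hz_to_erb_rate(f_hi_hz)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]

if __name__ == "__main__":
    # Bandwidth in Hz grows with center frequency, while the spacing
    # on the warped scale stays constant.
    for fc in uniform_erb_centres(50.0, 15000.0, 8):
        print(round(fc, 1), "Hz center,", round(erb_bandwidth_hz(fc), 1), "Hz wide")
```

Under this approximation the bandwidth keeps shrinking below 500 Hz, which is the qualitative difference from the Bark scale noted above.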

The excitation calculation is followed by a non-linear compressive function that produces a quantity referred to as "specific loudness." Specific loudness is a measure of perceptual loudness as a function of frequency and time, and may be measured in units of perceptual loudness per unit frequency along a critical band rate scale, such as the Bark or ERB scale discussed above. Finally, the time-varying "total loudness" is computed by integrating the specific loudness across frequency. When the specific loudness is estimated from a finite set of auditory filters distributed uniformly along a critical band rate scale, the total loudness may be computed simply by summing the specific loudness from each filter.
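As a rough illustration of the last two steps, the following Python sketch maps per-band excitation to specific loudness and sums across bands. The text does not specify the compressive function, so a plain power law is used here purely as an assumed stand-in; the exponent, the scale constant, and the function names are hypothetical.

```python
def specific_loudness(excitation, exponent=0.23, scale=1.0):
    """Map per-band excitation (linear power units) to specific loudness.

    ASSUMPTION: a simple power law stands in for the non-linear
    compressive function; published loudness models are more elaborate.
    """
    return [scale * (e ** exponent) for e in excitation]

def total_loudness(specific):
    """Total loudness for one time frame: the sum of the specific
    loudness over bands spaced uniformly on a critical band rate scale."""
    return sum(specific)

if __name__ == "__main__":
    frame_excitation = [1.0, 16.0, 81.0]  # hypothetical excitation per band
    spec = specific_loudness(frame_excitation)
    print(spec, total_loudness(spec))
```

Because the exponent is below one, doubling the excitation in a band less than doubles its specific loudness, which is the compressive behaviour the paragraph describes.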

Loudness may be measured in units of phon. The loudness of a given sound in phon is the sound pressure level (SPL) of a 1 kHz tone having a subjective loudness equal to that of the sound. Conventionally, the reference 0 dB for SPL is a root-mean-square pressure of 2 × 10^-5 Pascal, and this is therefore also the reference 0 phon. Using this definition to compare the loudness of tones at frequencies other than 1 kHz with the loudness at 1 kHz, a contour of equal loudness can be determined for a given phon level. Figure 11 shows equal-loudness contours for frequencies between 20 Hz and 12.5 kHz, and for phon levels between 4.2 phon (considered to be the threshold of hearing) and 120 phon (ISO 226:1987 (E), "Acoustics - Normal equal-loudness level contours"). The phon measurement takes into account the varying sensitivity of human hearing with frequency, but it does not allow the relative subjective loudness of sounds at different levels to be assessed, because no attempt is made to correct for the non-linear growth of loudness with SPL, that is, for the fact that the spacing of the contours varies.

Loudness may also be measured in units of "sone." There is a one-to-one mapping between phon units and sone units, as indicated in Figure 11. One sone is defined as the loudness of a 40 dB (SPL) 1 kHz pure sine wave and is equivalent to 40 phon. The sone unit is defined such that a twofold increase in sone corresponds to a doubling of perceived loudness. For example, 4 sone is perceived as twice as loud as 2 sone. Expressing loudness levels in sone is therefore more informative. Given that specific loudness is a measure of perceptual loudness as a function of frequency and time, specific loudness may be measured in units of sone per unit frequency. Thus, when the Bark scale is used, specific loudness has units of sone per Bark, and likewise when the ERB scale is used, the units are sone per ERB.
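The doubling property of the sone can be made concrete with a small Python sketch. It assumes the widely used rule of thumb that loudness in sone doubles for every 10 phon above 40 phon; this rule is not stated in the text, which refers to Figure 11 for the mapping, and it is only a reasonable approximation at or above 40 phon. The function names are illustrative.

```python
import math

def phon_to_sone(phon):
    """Convert loudness level in phon to loudness in sone.
    ASSUMPTION: 10 phon per doubling, valid only for phon >= 40."""
    if phon < 40.0:
        raise ValueError("this approximation is only used at or above 40 phon")
    return 2.0 ** ((phon - 40.0) / 10.0)

def sone_to_phon(sone):
    """Inverse mapping, valid for sone >= 1 under the same assumption."""
    if sone < 1.0:
        raise ValueError("this approximation is only used at or above 1 sone")
    return 40.0 + 10.0 * math.log2(sone)

if __name__ == "__main__":
    for level in (40.0, 50.0, 60.0, 80.0):
        print(level, "phon is about", phon_to_sone(level), "sone")
```

Under this rule 40 phon maps to 1 sone, and a sound at 60 phon is twice as loud as one at 50 phon.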

As mentioned above, the sensitivity of the human ear varies with both frequency and level, a fact well documented in the psychoacoustics literature. One result is that the perceived spectrum or timbre of a given sound varies with the acoustic level at which the sound is heard. For example, for a sound containing low, middle, and high frequencies, the perceived relative proportions of those frequency components change with the overall loudness of the sound: when the sound is quiet, the low- and high-frequency components sound quieter relative to the middle frequencies than they do when the sound is loud. This phenomenon is well known and has been mitigated in sound reproduction equipment by so-called loudness controls. A loudness control is a volume control that applies a low-frequency boost, and sometimes also a high-frequency boost, as the volume is turned down. The lower sensitivity of the ear at the frequency extremes is thus compensated by an artificial boost of those frequencies. Such controls are entirely static: the degree of compensation applied is a function of the volume control setting or of some other user-operated control, not a function of the content of the audio signal.

In practice, the change in perceived relative spectral balance among low, middle, and high frequencies depends on the signal, in particular on its actual spectrum and on whether it is intended to be loud or soft. Consider a recording of a symphony orchestra. Reproduced at the same level that a member of the audience would hear in the concert hall, the balance across the spectrum may be correct whether the orchestra is playing loudly or quietly. If the music is reproduced 10 dB quieter, for example, the perceived balance across the spectrum changes in one manner for loud passages and in another manner for quiet passages. A conventional static loudness control does not apply different compensation as a function of the music. International Patent Application No. PCT/US2004/016964, filed May 27, 2004 and published December 23, 2004 as WO 2004/111994 A2, by Seefeldt et al., discloses a system for measuring and adjusting the perceived loudness of an audio signal. That PCT application, which designates the United States, is hereby incorporated by reference in its entirety. In that application, a psychoacoustic model calculates the loudness of an audio signal in perceptual units. In addition, the application introduces techniques for computing a wideband multiplicative gain which, when applied to the audio, makes the loudness of the gain-modified audio substantially the same as a reference loudness. Applying such a wideband gain, however, changes the perceived spectral balance of the audio.

Summary of invention

In one aspect, the invention provides for deriving information usable for controlling the specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its specific loudness and a target specific loudness. Specific loudness is a measure of perceptual loudness as a function of frequency and time. In practical implementations, the specific loudness of the modified audio signal may be made to approximate the target specific loudness. The approximation may be affected not only by ordinary signal processing considerations but also by any time and/or frequency smoothing employed in the modifying, as described below.

Because specific loudness is a measure of the perceptual loudness of an audio signal as a function of frequency and time, the modifying may modify the audio signal as a function of frequency in order to reduce the difference between the specific loudness of the audio signal and the target specific loudness. Although in some cases the target specific loudness may be time-invariant and the audio signal itself may be a steady-state, time-invariant signal, the modifying typically also modifies the audio signal as a function of time.

Aspects of the invention may also be employed to compensate for background noise interference in an audio playback environment. When audio is heard in the presence of background noise, the noise may partially or completely mask the audio in a manner that depends on both the level and spectrum of the audio and the level and spectrum of the noise. The result is an alteration in the perceived spectrum of the audio. In accordance with psychoacoustic studies (see, for example, Moore, Glasberg, and Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," Journal of the Audio Engineering Society, Vol. 45, No. 4, April 1997), the "partial specific loudness" of the audio may be defined as the perceptual loudness of the audio in the presence of a secondary interfering sound signal, such as the noise.

Thus, in another aspect, the invention provides for deriving information usable for controlling the partial specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its partial specific loudness and a target specific loudness. Doing so mitigates the effects of the noise in a perceptually accurate manner. In this and other aspects of the invention that take an interfering noise signal into account, it is assumed that the audio signal by itself and the secondary interfering signal by itself are both accessible.

In another aspect, the invention provides for controlling the specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its specific loudness and a target specific loudness.

In another aspect, the invention provides for controlling the partial specific loudness of an audio signal by modifying the audio signal in order to reduce the difference between its partial specific loudness and a target specific loudness.

When the target specific loudness is not a function of the audio signal, it may be a stored or received target specific loudness. When the target specific loudness is not a function of the audio signal, the modifying or deriving may calculate specific loudness or partial specific loudness either explicitly or implicitly. Examples of implicit calculation include a lookup table or a "closed-form" mathematical expression in which specific loudness and/or partial specific loudness is inherently determined (the term closed-form describes a mathematical expression that can be represented exactly using a finite number of standard mathematical operations and functions, such as exponentiation and cosine). Also when the target specific loudness is not a function of the audio signal, the target specific loudness may be both time-invariant and frequency-invariant, or it may be only frequency-invariant.

In another aspect, the invention provides for processing an audio signal, or a measure of the audio signal, in accordance with one or more processes and one or more process-controlling parameters to produce an audio signal having a target specific loudness. Although the target specific loudness may be time-invariant ("fixed"), it may advantageously be a function of the specific loudness of the audio signal. Although the audio signal could be a static, frequency-invariant and time-invariant signal, it is typically both frequency-varying and time-varying, so that the target specific loudness is also frequency-varying and time-varying when it is a function of the audio signal.

The audio and a target specific loudness, or a representation of a target specific loudness, may be received from a transmission or reproduced from a storage medium.

The representation of a target specific loudness may be one or more scale factors that scale the audio signal or a measure of the audio signal.

In any of the above aspects of the invention, the target specific loudness may be a function of the audio signal or of a measure of the audio signal. One suitable measure of the audio signal is its specific loudness. The function of the audio signal or measure of the audio signal may be a scaling of the audio signal or measure of the audio signal. For example, the scaling may be one or a combination of the following:
(a) a time-varying and frequency-varying scale factor $\Xi[b,t]$ scaling the specific loudness as in the relationship $\hat{N}[b,t] = \Xi[b,t]\,N[b,t]$;
(b) a time-varying, frequency-invariant scale factor $\Phi[t]$ scaling the specific loudness as in the relationship $\hat{N}[b,t] = \Phi[t]\,N[b,t]$;
(c) a time-invariant, frequency-varying scale factor $\Theta[b]$ scaling the specific loudness as in the relationship $\hat{N}[b,t] = \Theta[b]\,N[b,t]$; and
(d) a time-invariant, frequency-invariant scale factor $\alpha$ scaling the specific loudness of the audio signal as in the relationship $\hat{N}[b,t] = \alpha\,N[b,t]$,
where $\hat{N}[b,t]$ is the target specific loudness, $N[b,t]$ is the specific loudness of the audio signal, $b$ is a measure of frequency, and $t$ is a measure of time.
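The four scaling cases (a) through (d) differ only in which indices the scale factor depends on. The Python sketch below shows this under simple assumptions: the specific loudness is held as a nested list indexed `N[b][t]` (band, then time frame), and the function names and sample values are hypothetical.

```python
def scale_specific_loudness(N, factor):
    """Return the target specific loudness factor(b, t) * N[b][t]."""
    return [[factor(b, t) * N[b][t] for t in range(len(N[b]))]
            for b in range(len(N))]

def case_a(N, xi):      # (a) factor varies with band and time: xi[b][t]
    return scale_specific_loudness(N, lambda b, t: xi[b][t])

def case_b(N, phi):     # (b) factor varies with time only: phi[t]
    return scale_specific_loudness(N, lambda b, t: phi[t])

def case_c(N, theta):   # (c) factor varies with band only: theta[b]
    return scale_specific_loudness(N, lambda b, t: theta[b])

def case_d(N, alpha):   # (d) a single constant factor
    return scale_specific_loudness(N, lambda b, t: alpha)

if __name__ == "__main__":
    N = [[1.0, 2.0], [3.0, 4.0]]   # two bands, two time frames (made-up values)
    print(case_d(N, 0.5))          # every entry halved
    print(case_b(N, [2.0, 0.5]))   # each frame scaled as a whole
```

Writing all four cases through one helper makes the point that (b), (c), and (d) are special cases of (a).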

In case (a), a time-varying and frequency-varying scale factor, the scaling may be determined at least in part by the ratio of a desired multiband loudness to the multiband loudness of the audio signal. Such a scaling may be used as a dynamic range control. Further details of employing aspects of the invention as a dynamic range control are set forth below.

Also in case (a), a time-varying and frequency-varying scale factor, the specific loudness may be scaled by the ratio of a measure of a desired spectral shape to a measure of the spectral shape of the audio signal. Such a scaling may be employed to transform the perceived spectrum of the audio signal from a time-varying perceived spectrum to a substantially time-invariant perceived spectrum. When the specific loudness is scaled by the ratio of a measure of a desired spectral shape to a measure of the spectral shape of the audio signal, the scaling may be used as a dynamic equalizer. Further details of employing aspects of the invention as a dynamic equalizer are set forth below.
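One way to picture the dynamic-equalizer scaling is sketched below in Python for a single time frame. It assumes, purely for illustration, that the "measure of spectral shape" is the specific loudness normalized to sum to one across bands; the text does not define the measure, and practical systems would typically smooth it. All names and values are hypothetical.

```python
def spectral_shape(frame):
    """Normalized per-band specific loudness for one frame (assumed measure)."""
    total = sum(frame)
    return [v / total for v in frame]

def dynamic_eq_scale(frame, desired_shape, floor=1e-9):
    """Per-band scale factor: desired shape divided by the frame's own shape."""
    total = max(sum(frame), floor)
    return [desired_shape[b] * total / max(frame[b], floor)
            for b in range(len(frame))]

def apply_scale(frame, scale):
    """Target specific loudness for the frame."""
    return [s * v for s, v in zip(scale, frame)]

if __name__ == "__main__":
    frame = [1.0, 3.0]     # specific loudness in two bands (made-up values)
    desired = [0.5, 0.5]   # desired shape, summing to one
    target = apply_scale(frame, dynamic_eq_scale(frame, desired))
    print(spectral_shape(target))
```

With a desired shape that sums to one, the target takes on the desired shape in every frame while the frame's total loudness is left unchanged, which corresponds to a substantially time-invariant perceived spectrum.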

In case (b), a time-varying, frequency-invariant scale factor, the scaling may be determined at least in part by the ratio of a desired wideband loudness to the wideband loudness of the audio signal. Such a scaling may be used as an automatic gain control or a dynamic range control. Further details of employing aspects of the invention as an automatic gain control or a dynamic range control are set forth below.
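The wideband ratio of case (b) can be sketched as follows. This minimal Python example assumes the wideband loudness of a frame is the sum of its specific loudness over bands, and that the desired wideband loudness is a fixed constant; a practical automatic gain control would normally smooth the ratio over time and limit its range. The names, the `floor` guard, and the sample values are illustrative.

```python
def wideband_loudness(N):
    """Total loudness per time frame for specific loudness N[b][t]."""
    n_frames = len(N[0])
    return [sum(band[t] for band in N) for t in range(n_frames)]

def agc_scale_factors(N, desired_loudness, floor=1e-9):
    """Per-frame scale factor: desired wideband loudness over actual."""
    return [desired_loudness / max(total, floor)
            for total in wideband_loudness(N)]

def apply_time_varying_scale(N, phi):
    """Scale every band of frame t by the same factor phi[t]."""
    return [[phi[t] * band[t] for t in range(len(band))] for band in N]

if __name__ == "__main__":
    N = [[1.0, 4.0], [3.0, 4.0]]   # two bands, two frames (made-up values)
    phi = agc_scale_factors(N, desired_loudness=4.0)
    print(phi, wideband_loudness(apply_time_varying_scale(N, phi)))
```

Because the same factor is applied to every band within a frame, the relative spectral shape of each frame is preserved while its total loudness is driven to the desired value.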

In case (a) (a time-varying and frequency-varying scale factor) or case (b) (a time-varying, frequency-invariant scale factor), the scale factor may be a function of the audio signal or of a measure of the audio signal.

In case (c), a time-invariant, frequency-varying scale factor, or case (d), a time-invariant, frequency-invariant scale factor, the modifying or deriving may include storing the scale factor, or the scale factor may be received from an external source.

In either case (c) or case (d), the scale factor need not be a function of the audio signal or of a measure of the audio signal.

In any of the aspects of the invention and variations thereof, the modifying, deriving, or producing may each calculate, explicitly or implicitly, (1) the specific loudness, and/or (2) the partial specific loudness, and/or (3) the target specific loudness. Implicit calculation may involve, for example, a lookup table or a closed-form mathematical expression.

The modification parameters may be smoothed over time. The modification parameters may be, for example, (1) a plurality of amplitude scale factors relating to frequency bands of the audio signal, or (2) a plurality of filter coefficients for controlling one or more filters, such as a multi-tap FIR filter or a multi-pole IIR filter. The scale factors or filter coefficients (and the filters to which they are applied) may be time-varying.
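Temporal smoothing of a modification parameter can take many forms. The Python sketch below shows one common choice, a one-pole (exponential) smoother applied to the sequence of gains for a single band; the text does not prescribe a smoother, so the method, the `coeff` parameter, and the starting state are assumptions made for illustration.

```python
def smooth_over_time(values, coeff=0.9, initial=None):
    """One-pole smoother: y[t] = coeff * y[t-1] + (1 - coeff) * x[t].

    ASSUMPTION: coeff is in [0, 1); a larger coeff gives slower, smoother
    changes. The state starts at `initial`, or at the first input value.
    """
    if not values:
        return []
    state = values[0] if initial is None else initial
    smoothed = []
    for x in values:
        state = coeff * state + (1.0 - coeff) * x
        smoothed.append(state)
    return smoothed

if __name__ == "__main__":
    band_gain = [1.0, 1.0, 0.2, 0.2, 0.2, 1.0]  # hypothetical per-frame gains
    print(smooth_over_time(band_gain, coeff=0.5))
```

Smoothing of this kind trades tracking speed for fewer audible artifacts, which is one reason the specific loudness of the modified signal only approximates the target.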

In calculating the function of the specific loudness of the audio signal that defines the target specific loudness, or the inverse of that function, the process or processes performing such calculations operate in what may be characterized as the perceptual (psychoacoustic) loudness domain: the inputs and outputs of the calculation are specific loudnesses. In contrast, in applying amplitude scale factors to frequency bands of the audio signal, or applying filter coefficients to a controllable filtering of the audio signal, the modification parameters operate to modify the audio signal outside the perceptual (psychoacoustic) loudness domain, in what may be characterized as the electrical signal domain. Although the audio signal is modified in the electrical signal domain, those modifications are derived from calculations in the perceptual (psychoacoustic) loudness domain, such that the modified audio signal has a specific loudness that approximates the desired target specific loudness.
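The split between the two domains can be illustrated with a deliberately simplified Python sketch. It assumes specific loudness is a plain power law of per-band excitation, so that the amplitude gain needed to reach a target can be solved in closed form; this model, the exponent, and the function names are assumptions for illustration only, and realistic loudness models generally need a lookup table or an iterative solver instead.

```python
def band_gains_for_target(excitation, target_specific, exponent=0.23):
    """Amplitude gain per band so that the modified specific loudness
    matches the target, under the ASSUMED model N = E ** exponent.

    An amplitude gain g scales the band's power (excitation) by g * g, so
    (g * g * E) ** exponent = target  gives
    g = (target / N) ** (1 / (2 * exponent)).
    """
    gains = []
    for e, n_target in zip(excitation, target_specific):
        n = e ** exponent           # loudness-domain quantity
        if n <= 0.0:
            gains.append(1.0)       # silent band: leave it untouched
            continue
        gains.append((n_target / n) ** (1.0 / (2.0 * exponent)))
    return gains

def modified_specific_loudness(excitation, gains, exponent=0.23):
    """Specific loudness after applying the gains in the signal domain."""
    return [((g * g) * e) ** exponent for e, g in zip(excitation, gains)]

if __name__ == "__main__":
    E = [16.0, 81.0]                       # hypothetical band excitations
    target = [0.5 * e ** 0.23 for e in E]  # halve the specific loudness
    g = band_gains_for_target(E, target)
    print(g, modified_specific_loudness(E, g))
```

The target is computed on loudness quantities, while the resulting gains are ordinary amplitude multipliers applied to the signal bands, mirroring the distinction drawn in the paragraph above.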

Deriving the modification parameters from calculations in the loudness domain allows greater control over perceptual loudness and perceived spectral balance than deriving them in the electrical signal domain. In addition, using a basilar-membrane-simulating psychoacoustic filterbank, or its equivalent, to perform the loudness domain calculations may provide more detailed control of the perceived spectrum than arrangements that derive the modification parameters in the electrical signal domain.

The various modifying, deriving, and producing operations may depend on one or more of: a measure of an interfering audio signal; a target specific loudness; an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal; the specific loudness of the unmodified audio signal; and an approximation to the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.

The modifying or deriving may derive the modification parameters at least in part from one or more of: a measure of an interfering audio signal; a target specific loudness; an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal; the specific loudness of the unmodified audio signal; and an approximation to the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.

More particularly, the modifying or deriving may derive the modification parameters at least in part from (1) one of a target specific loudness and an estimate of the specific loudness of the unmodified audio signal obtained from the specific loudness of the modified audio signal, and (2) one of the specific loudness of the unmodified audio signal and an approximation to the target specific loudness derived from the specific loudness of the modified audio signal. Alternatively, when an interfering audio signal is to be taken into account, the modifying or deriving may derive the modification parameters at least in part from (1) a measure of the interfering audio signal, (2) one of a target specific loudness and an estimate of the specific loudness of the unmodified audio signal derived from the partial specific loudness of the modified audio signal, and (3) one of the specific loudness of the unmodified audio signal and an approximation to the target specific loudness derived from the partial specific loudness of the modified audio signal.

A feed-forward arrangement may be employed in which the specific loudness is derived from the audio signal and the target specific loudness is received from a source external to the method or, when the modifying or deriving includes storing a target specific loudness, from that storage. Alternatively, a hybrid feed-forward/feedback arrangement may be employed in which an approximation to the target specific loudness is derived from the modified audio signal and the target specific loudness is received from a source external to the method or, when the modifying or deriving includes storing a target specific loudness, from that storage.

The modifying or deriving may include one or more processes for obtaining the target specific loudness, explicitly or implicitly, one or more of which calculate, explicitly or implicitly, a function of the audio signal or of a measure of the audio signal. In one alternative, a feed-forward arrangement may be employed in which both the specific loudness and the target specific loudness are derived from the audio signal, the derivation of the target specific loudness employing a function of the audio signal or of a measure of the audio signal. In another alternative, a hybrid feed-forward/feedback arrangement may be employed in which an approximation to the target specific loudness is derived from the modified audio signal and the target specific loudness is derived from the audio signal, the derivation of the target specific loudness employing a function of the audio signal or of a measure of the audio signal.

The modifying or deriving may include one or more processes for obtaining, explicitly or implicitly, an estimate of the specific loudness of the unmodified audio signal in response to the modified audio signal, one or more of which calculate, explicitly or implicitly, the inverse of a function of the audio signal or of a measure of the audio signal. In one alternative, a feedback arrangement is employed in which both the estimate of the specific loudness of the unmodified audio signal and an approximation to the target specific loudness are derived from the modified audio signal, the estimate of the specific loudness being calculated using the inverse of a function of the audio signal or of a measure of the audio signal. In another alternative, a hybrid feed-forward/feedback arrangement is employed in which the specific loudness is derived from the audio signal and the estimate of the specific loudness of the unmodified audio signal is derived from the modified audio signal, the estimate being calculated using the inverse of the function of the audio signal or of a measure of the audio signal.

The modification parameters may be applied to the audio signal to produce a modified audio signal.

Another aspect of the invention is that the processes or devices may be separated in time and/or space, so that there is, in effect, an encoder or encoding process and also a decoder or decoding process. For example, there may be an encoding/decoding system in which the modification or derivation may either transmit and receive, or store and also reproduce, the audio signal together with either (1) the modification parameters or (2) a target specific loudness or a representation of a target specific loudness. Alternatively, there may be, in effect, only an encoder or encoding process, in which the audio signal and either (1) the modification parameters or (2) a target specific loudness or a representation of a target specific loudness are transmitted or stored. Alternatively, there may be, in effect, only a decoder or decoding process, which operates on a transmitted or stored audio signal and either (1) the modification parameters or (2) a target specific loudness or a representation of a target specific loudness.

Brief description of the drawings

Fig. 1 is a functional block diagram illustrating an example of a feed-forward implementation according to aspects of the invention.

Fig. 2 is a functional block diagram illustrating an example of a feedback implementation according to aspects of the invention.

Fig. 3 is a functional block diagram illustrating an example of a hybrid feed-forward/feedback implementation according to aspects of the invention.

Fig. 4 is a functional block diagram illustrating an example of another hybrid feed-forward/feedback implementation according to aspects of the invention.

Fig. 5 is a functional block diagram showing the manner in which the unmodified audio signal and the modification parameters, as determined by any one of the feed-forward, feedback, and hybrid feed-forward/feedback arrangements, may be stored or transmitted for use, for example, in a temporally or spatially separated device or process.

Fig. 6 is a functional block diagram showing the manner in which the unmodified audio signal and a target specific loudness or a representation of it, as determined by any one of the feed-forward, feedback, and hybrid feed-forward/feedback arrangements, may be stored or transmitted for use, for example, in a temporally or spatially separated device or process.

Fig. 7 is a schematic functional block diagram or schematic flow chart showing an overview of an aspect of the invention.

Fig. 8 is an idealized characteristic response of a linear filter $P(z)$ suitable as a transmission filter in an embodiment of the invention, in which the vertical axis is attenuation in decibels (dB) and the horizontal axis is frequency in hertz (Hz) on a base-10 logarithmic scale.

Fig. 9 shows the relationship between the ERB frequency scale (vertical axis) and frequency in hertz (horizontal axis).

Fig. 10 shows a set of idealized auditory filter characteristic responses that approximate critical banding on the ERB scale. The horizontal scale is frequency in hertz and the vertical scale is level in decibels.

Fig. 11 shows the equal-loudness contours of ISO 226. The horizontal scale is frequency in hertz (base-10 logarithmic scale) and the vertical scale is sound pressure level in decibels.

Fig. 12 shows the equal-loudness contours of ISO 226 normalized by the transmission filter $P(z)$. The horizontal scale is frequency in hertz (base-10 logarithmic scale) and the vertical scale is sound pressure level in decibels.

Fig. 13a is an idealized plot showing wideband and multiband gains for a loudness scaling of 0.25 on a segment of female speech. The horizontal scale is ERB bands and the vertical scale is relative gain in decibels (dB).

Fig. 13b is an idealized plot showing the specific loudness of the original signal, of the wideband gain-modified signal, and of the multiband gain-modified signal, respectively. The horizontal scale is ERB bands and the vertical scale is specific loudness (sone/ERB).

Fig. 14a is an idealized plot showing $L_o[t]$ as a function of $L_i[t]$ for typical AGC. The horizontal scale is $\log(L_i[t])$ and the vertical scale is $\log(L_o[t])$.

Fig. 14b is an idealized plot showing $L_o[t]$ as a function of $L_i[t]$ for typical DRC. The horizontal scale is $\log(L_i[t])$ and the vertical scale is $\log(L_o[t])$.

Fig. 15 is an idealized plot showing a typical band-smoothing function for multiband DRC. The horizontal scale is band number and the vertical scale is the gain output for band $b$.

Fig. 16 is a schematic functional block diagram or schematic flow chart showing an overview of an aspect of the invention.

Fig. 17 is a schematic functional block diagram or schematic flow chart similar to Fig. 1 that also includes compensation for noise in a playback environment.

Best mode for carrying out the invention

Figs. 1 through 4 show functional block diagrams illustrating possible feed-forward, feedback, and two versions of hybrid feed-forward/feedback implementation examples according to aspects of the invention.

Referring to the example of a feed-forward topology in Fig. 1, an audio signal is applied to two paths: (1) a signal path having a process or device 2 ("Modify Audio Signal") capable of modifying the audio in response to modification parameters, and (2) a control path having a process or device 4 ("Generate Modification Parameters") capable of generating such modification parameters. The Modify Audio Signal 2 in the Fig. 1 feed-forward topology example and in each of the Figs. 2-4 examples may be a device or process that modifies the audio signal, for example its amplitude, in a frequency-varying and/or time-varying manner in accordance with modification parameters M received from the Generate Modification Parameters 4 (or from the counterpart processes or devices 4', 4" and 4''' in the Figs. 2-4 examples, respectively). The Generate Modification Parameters 4 and its counterparts in Figs. 2-4 each operate at least partly in the perceptual loudness domain. In each of the Figs. 1-4 examples, the Modify Audio Signal 2 operates in the electrical signal domain and produces a modified audio signal. Also in each of the Figs. 1-4 examples, the Modify Audio Signal 2 and the Generate Modification Parameters 4 (or its counterparts) modify the audio signal to reduce the difference between its specific loudness and a target specific loudness.

In the Fig. 1 feed-forward example, the process or device 4 may include several processes and/or devices: a "Calculate Target Specific Loudness" process or device 6 that calculates a target specific loudness in response to the audio signal or a measure of the audio signal, such as the specific loudness of the audio signal; a "Calculate Specific Loudness" process or device 8 that calculates the specific loudness of the audio signal in response to the audio signal or a measure of the audio signal, such as its excitation; and a "Calculate Modification Parameters" process or device 10 that calculates the modification parameters in response to the specific loudness and the target specific loudness. The Calculate Target Specific Loudness 6 may perform one or more functions "F", each of which may have function parameters. For example, it may calculate the specific loudness of the audio signal and then apply one or more functions F to that specific loudness to provide a target specific loudness. This is indicated schematically in Fig. 1 as a "Select Function(s) F and Function Parameter(s)" input to process or device 6. Instead of being calculated by device or process 6, the target specific loudness may be provided by a storing process or device included in or associated with the Generate Modification Parameters 4 (shown schematically as a "Stored" input to process or device 10), or by a source external to the overall process or device (shown schematically as an "External" input to process or device 10). Thus, the modification parameters are based at least in part on calculations in the perceptual (psychoacoustic) loudness domain (i.e., at least the specific loudness and, in some cases, the target specific loudness calculations).
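The division of labour among processes 6, 8 and 10 in the feed-forward arrangement can be sketched as follows. This is only an illustrative sketch under stated assumptions: `specific_loudness` is a hypothetical stand-in (a simple compressive curve) and not the psychoacoustic model of the invention, the function F is assumed to be a single time- and frequency-invariant scale factor `alpha`, and the bisection solver is one possible way of finding a per-band gain; none of these names come from the specification.

```python
import math


def specific_loudness(excitation):
    """Hypothetical stand-in for a loudness model: a compressive,
    monotonically increasing map from band excitation to specific loudness."""
    return math.log1p(excitation)


def solve_band_gain(excitation, target, lo=0.0, hi=100.0, iters=60):
    """Bisection for a gain g such that the specific loudness of the
    gain-modified band, specific_loudness(g*g*excitation), matches target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if specific_loudness(mid * mid * excitation) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def feed_forward_gains(excitations, alpha):
    """One gain per band for one block, computed from the unmodified audio only."""
    gains = []
    for e in excitations:
        n = specific_loudness(e)                 # role of process 8
        n_hat = alpha * n                        # role of process 6 (assumed F)
        gains.append(solve_band_gain(e, n_hat))  # role of process 10
    return gains
```

With this toy model, halving the specific loudness in every band requires a different gain in each band, which is why the modification parameters are in general frequency-varying rather than a single wideband gain.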

The calculations performed by processes or devices 6, 8 and 10 (and by processes or devices 12, 14, 10' in the Fig. 2 example, 6, 14, 10" in the Fig. 3 example, and 8, 12, 10''' in the Fig. 4 example) may be performed explicitly and/or implicitly. Examples of implicit performance include (1) a lookup table whose entries are based in whole or in part on specific loudness and/or target specific loudness and/or modification parameter calculations, and (2) a closed-form mathematical expression that is inherently based in whole or in part on specific loudness and/or target specific loudness and/or modification parameters.
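Both kinds of implicit calculation can be illustrated with a toy example. The sketch below assumes a hypothetical loudness curve, specific loudness = log(1 + excitation), and a target that is a constant fraction `alpha` of the specific loudness; neither assumption comes from the specification. Under those assumptions the gain g satisfies log(1 + g²E) = α·log(1 + E), which gives the closed form g = sqrt(((1 + E)^α − 1) / E). The same values can be tabulated in advance, so that at run time neither specific loudness nor target specific loudness is ever computed explicitly.

```python
import math


def closed_form_gain(excitation, alpha):
    """Gain derived from the assumed toy model log1p(g*g*E) = alpha*log1p(E).
    Specific loudness and target specific loudness are implicit in the formula."""
    return math.sqrt(((1.0 + excitation) ** alpha - 1.0) / excitation)


def build_gain_table(alphas, excitation_levels_db):
    """Lookup table indexed by (scale factor, excitation level in dB)."""
    return {
        (alpha, e_db): closed_form_gain(10.0 ** (e_db / 10.0), alpha)
        for alpha in alphas
        for e_db in excitation_levels_db
    }


# Hypothetical grid: three scale factors, excitation levels every 10 dB.
table = build_gain_table((0.25, 0.5, 1.0), range(0, 101, 10))
```

A lookup such as `table[(0.5, 40)]` then returns a gain directly from a scale factor and a quantized excitation level.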

Although the calculation processes or devices 6, 8 and 10 of the Fig. 1 example (and the processes or devices 12, 14, 10' of the Fig. 2 example, 6, 14, 10" of the Fig. 3 example, and 8, 12, 10''' of the Fig. 4 example) are shown schematically and described as separate, this is for purposes of explanation only. It will be understood that one or all of these processes or devices may be combined in a single process or device or combined variously in multiple processes or devices. For example, in the arrangement of Fig. 9 below, a feed-forward topology as in the Fig. 1 example, the process or device that calculates the modification parameters does so in response to the smoothed excitation derived from the audio signal and a target specific loudness. In the Fig. 9 example, the device or process that calculates the modification parameters implicitly calculates the specific loudness of the audio signal.

As an aspect of the invention, in the Fig. 1 example and in other examples of embodiments of the invention, the target specific loudness $\hat{N}[b,t]$ may be calculated by scaling the specific loudness $N[b,t]$ with one or more scale factors. The scaling may be a time- and frequency-varying scale factor $\Xi[b,t]$ scaling of the specific loudness, as in the relationship

$$\hat{N}[b,t] = \Xi[b,t]\,N[b,t],$$

a time-varying, frequency-invariant scale factor $\Phi[t]$ scaling of the specific loudness, as in the relationship

$$\hat{N}[b,t] = \Phi[t]\,N[b,t],$$

a time-invariant, frequency-varying scale factor $\Theta[b]$ scaling of the specific loudness, as in the relationship

$$\hat{N}[b,t] = \Theta[b]\,N[b,t],$$

or a time-invariant, frequency-invariant scale factor $\alpha$ scaling of the specific loudness of the audio signal, as in the relationship

$$\hat{N}[b,t] = \alpha\,N[b,t],$$

where $b$ is a measure of frequency (e.g., the band number) and $t$ is a measure of time (e.g., the block number). Multiple scalings may also be employed, using multiple instances of a particular scaling and/or combinations of particular scalings. Examples of such multiple scalings are given below. In some cases, as explained further below, the scaling may be a function of the audio signal or of a measure of the audio signal. In other cases, also explained further below, when the scaling is not a function of a measure of the audio signal, the scaling may be otherwise determined or supplied. For example, a user could select or apply a time- and frequency-invariant scale factor $\alpha$ scaling or a time-invariant, frequency-varying scale factor $\Theta[b]$ scaling.
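As a rough illustration of the four kinds of scaling, the sketch below applies whichever scale factors are supplied to a specific loudness array indexed by band and block. The function name, the nested-list layout `N[b][t]`, and the choice to combine several scalings by multiplying them together are assumptions made for this example; the text only states that multiple scalings and combinations may be employed.

```python
def target_specific_loudness(N, xi=None, phi=None, theta=None, alpha=1.0):
    """Return N_hat[b][t] from N[b][t].

    xi[b][t] : time- and frequency-varying scale factor
    phi[t]   : time-varying, frequency-invariant scale factor
    theta[b] : time-invariant, frequency-varying scale factor
    alpha    : time- and frequency-invariant scale factor
    Supplied factors are multiplied together (an assumed way of combining them).
    """
    n_hat = []
    for b, row in enumerate(N):
        out_row = []
        for t, n in enumerate(row):
            scale = alpha
            if xi is not None:
                scale *= xi[b][t]
            if phi is not None:
                scale *= phi[t]
            if theta is not None:
                scale *= theta[b]
            out_row.append(scale * n)
        n_hat.append(out_row)
    return n_hat
```

For instance, calling it with only `alpha=0.25` corresponds to a uniform loudness scaling, while supplying only `theta` corresponds to a fixed, frequency-dependent reshaping of the loudness spectrum.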

Thus, the target specific loudness may be expressed as one or more functions F of the audio signal or of a measure of the audio signal (the specific loudness being one possible measure of the audio signal):

$$\hat{N}[b,t] = F(N[b,t]).$$

Provided that the function or functions F are invertible, the specific loudness $N[b,t]$ of the unmodified audio signal may be calculated as the inverse function $F^{-1}$ of the target specific loudness $\hat{N}[b,t]$:

$$N[b,t] = F^{-1}(\hat{N}[b,t]).$$

As will be explained below, the inverse function $F^{-1}$ is calculated in the feedback and hybrid feed-forward/feedback examples of Figs. 2 and 4.
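The invertibility requirement can be made concrete with the simplest case, a constant scale factor. The helper below is a hypothetical illustration only (the name `make_scaling` is not from the specification): it returns a function F and its inverse, and refuses a non-positive scale factor, for which no useful inverse exists.

```python
def make_scaling(alpha):
    """Return (F, F_inv) for the scaling N_hat = alpha * N."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive for F to be invertible")

    def F(n):
        # Target specific loudness from specific loudness.
        return alpha * n

    def F_inv(n_hat):
        # Estimate of the unmodified specific loudness from the target.
        return n_hat / alpha

    return F, F_inv
```

In a feedback arrangement only the modified audio is observed, so an estimate of the unmodified specific loudness has to be recovered through `F_inv`; a function F that discards information could not be used in that way.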

A "Select Function(s) and Function Parameter(s)" input for the Calculate Target Specific Loudness 6 is shown to indicate that the device or process 6 may calculate the target specific loudness by applying one or more functions in accordance with one or more function parameters. For example, the Calculate Target Specific Loudness 6 may compute the function or functions "F" of the specific loudness of the audio signal in order to define the target specific loudness. For example, the "Select Function(s) and Function Parameter(s)" input may select one or more particular functions that fall into one or more of the above types of scaling, along with one or more function parameters, such as constants (e.g., scale factors) pertaining to the functions.

As indicated above, the scale factors associated with a scaling may serve as a representation of the target specific loudness, inasmuch as the target specific loudness may be computed as a scaling of the specific loudness. Thus, in the Fig. 9 example, described below and mentioned above, the lookup table may be indexed by scale factors and excitations, such that the calculation of specific loudness and target specific loudness is inherent in the table.

Whether employing a lookup table, a closed-form mathematical expression, or some other technique, the operation of the Generate Modification Parameters 4 (and its counterpart processes or devices 4', 4" and 4''' in each of the Figs. 2-4 examples) is such that the calculations are based in the perceptual (psychoacoustic) loudness domain, even though the specific loudness and the target specific loudness may not be explicitly calculated. Either there is an explicit specific loudness or there is a notional, implicit specific loudness. Similarly, either there is an explicit target specific loudness or there is a notional, implicit target specific loudness. In any case, the calculation of modification parameters seeks to generate modification parameters that modify the audio signal so as to reduce the difference between the specific loudness and a target specific loudness.

In a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10 (and its counterpart processes or devices 10', 10" and 10''' in each of the Figs. 2-4 examples, respectively) may also receive as an optional input a measure of such a secondary interfering audio signal or the secondary interfering signal itself. Such an optional input is shown in Fig. 1 (and in Figs. 2-4) with a dashed line. The measure of the secondary interfering signal may be its excitation, as in the Fig. 17 example described below. Applying a measure of the interfering signal, or the signal itself (assuming that the interfering signal is separately available for processing), to the Calculate Modification Parameters process or device 10 of Fig. 1 (and its counterpart processes or devices 10', 10" and 10''' in each of the Figs. 2-4 examples, respectively) permits a suitably configured such process or device to calculate modification parameters that take the interfering signal into account, as explained further below under the heading "Noise Compensation". In the examples of Figs. 2-4, the calculation of partial specific loudness assumes that a suitable measure of the interfering signal is applied not only to the respective Calculate Modification Parameters 10', 10", or 10''', but also to a "Calculate Approximation of Specific Loudness of Unmodified Audio" process or device 12 and/or a "Calculate Approximation of Target Specific Loudness" process or device 14, in order to facilitate the calculation of partial specific loudness by that process or device. In the Fig. 1 feed-forward example, partial specific loudness is not explicitly calculated; the Calculate Modification Parameters 10 of Fig. 1 calculates the appropriate modification parameters to make the partial specific loudness of the modified audio approximate the target specific loudness. This is explained further below under the heading "Noise Compensation".

As mentioned above, in each of the Figs. 1-4 examples, the modification parameters M, when applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness or partial specific loudness of the resulting modified audio and the target specific loudness. Ideally, the specific loudness of the modified audio signal closely approximates or is the same as the target specific loudness. The modification parameters M may, for example, take the form of time-varying gain factors applied to the frequency bands derived from a filterbank or to the coefficients of a time-varying filter. Accordingly, in all of the Figs. 1-4 examples, the Modify Audio Signal 2 may be implemented as, for example, a plurality of amplitude scalers, each operating in a frequency band, or as a time-varying filter (e.g., a multitapped FIR filter or a multipole IIR filter).
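The "plurality of amplitude scalers" form of the Modify Audio Signal 2 can be sketched in a few lines. The sketch assumes that a filterbank has already split one block of audio into per-band sample lists and that the bands sum back to the full-band signal; the filterbank itself, and the function names, are hypothetical and not taken from the specification.

```python
def apply_band_gains(band_signals, gains):
    """Scale each band's samples by that band's gain for the current block."""
    if len(band_signals) != len(gains):
        raise ValueError("one gain per band is required")
    return [[g * x for x in band] for band, g in zip(band_signals, gains)]


def sum_bands(band_signals):
    """Recombine the bands sample by sample (assumes the bands sum to the input)."""
    return [sum(samples) for samples in zip(*band_signals)]
```

Because the gains are recomputed for every block, the same structure covers both the frequency-varying and the time-varying character of the modification parameters M.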

Here and elsewhere in this document, the use of the same reference numeral indicates that the device or process may be substantially identical to another or others bearing the same reference numeral. Reference numerals bearing prime marks (e.g., "10'") indicate that the device or process is similar in structure or function but may be a modification of another or others bearing the same basic reference numeral or primed versions of it.

Under certain constraints, a nearly equivalent feedback arrangement of the Fig. 1 feed-forward example may be realized. Fig. 2 depicts such an example, in which the audio signal is also applied to a Modify Audio Signal process or device 2 in a signal path. The process or device 2 also receives the modification parameters M from a control path in which a Generate Modification Parameters process or device 4', in a feedback arrangement, receives as its input the modified audio signal from the output of the Modify Audio Signal 2. Thus, in the Fig. 2 example, the modified audio rather than the unmodified audio is applied to the control path. The Modify Audio Signal process or device 2 and the Generate Modification Parameters process or device 4' modify the audio signal to reduce the difference between its specific loudness and a target specific loudness. The process or device 4' may include several functions and/or devices: a "Calculate Approximation of Specific Loudness of Unmodified Audio" process or device 12, a "Calculate Approximation of Target Specific Loudness" process or device 14, and a "Calculate Modification Parameters" process or device 10' that calculates the modification parameters.

With the constraint that the function F is invertible, the process or device 12 estimates the specific loudness of the unmodified audio signal by applying the inverse function $F^{-1}$ to the specific loudness or partial specific loudness of the modified audio signal. As mentioned above, the device or process 12 may calculate an inverse function $F^{-1}$. This is indicated schematically in Fig. 2 as a "Select Inverse Function(s) $F^{-1}$ and Function Parameter(s)" input to process or device 12. The "Calculate Approximation of Target Specific Loudness" process or device 14 operates by calculating the specific loudness or partial specific loudness of the modified audio signal. Such specific loudness or partial specific loudness is an approximation of the target specific loudness. The approximation of the specific loudness of the unmodified audio signal and the approximation of the target specific loudness are used by the Calculate Modification Parameters 10' to derive modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness or partial specific loudness of the modified audio signal and the target specific loudness. As mentioned above, these modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filterbank or to the coefficients of a time-varying filter. In practical embodiments, the feedback loop through the Calculate Modification Parameters 10' may introduce a delay between the calculation and the application of the modification parameters M.

As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10', the Calculate Approximation of Specific Loudness of Unmodified Audio 12, and the Calculate Approximation of Target Specific Loudness 14 may each also receive as an optional input a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and process or device 12 and process or device 14 may each calculate the partial specific loudness of the modified audio signal. Such optional inputs are shown in Fig. 2 using dashed lines.

As mentioned above, hybrid feed-forward/feedback implementation examples of aspects of the invention are possible. Figs. 3 and 4 show two examples of such implementations. In the Figs. 3 and 4 examples, as in the Figs. 1 and 2 examples, the audio signal is also applied to a Modify Audio Signal process or device 2 in a signal path, but the Generate Modification Parameters in the respective control paths (4" in Fig. 3 and 4''' in Fig. 4) each receive both the unmodified audio signal and the modified audio signal. In both the Figs. 3 and 4 examples, the Modify Audio Signal 2 and the Generate Modification Parameters (4" and 4''', respectively) modify the audio signal to reduce the difference between its specific loudness, which may be implicit, and a target specific loudness, which may also be implicit.

In the Fig. 3 example, the Generate Modification Parameters process or device 4" may include several functions and/or devices: a Calculate Target Specific Loudness 6 as in the Fig. 1 example, a Calculate Approximation of Target Specific Loudness 14 as in the Fig. 2 feedback example, and a "Calculate Modification Parameters" process or device 10". As in the Fig. 1 example, in the feed-forward portion of this hybrid feed-forward/feedback example, the Calculate Target Specific Loudness 6 may perform one or more functions "F", each of which may have function parameters. This is indicated schematically in Fig. 3 as a "Select Function(s) F and Function Parameter(s)" input to process or device 6. In the feedback portion of this hybrid feed-forward/feedback example, the modified audio signal is applied to a Calculate Approximation of Target Specific Loudness 14, as in the Fig. 2 feedback example. Process or device 14 operates in the Fig. 3 example as it does in the Fig. 2 example, by calculating the specific loudness or partial specific loudness of the modified audio signal. Such specific loudness or partial specific loudness is an approximation of the target specific loudness. The target specific loudness (from process or device 6) and the approximation of the target specific loudness (from process or device 14) are applied to the Calculate Modification Parameters 10" to derive modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness of the unmodified audio signal and the target specific loudness. As mentioned above, these modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filterbank or to the coefficients of a time-varying filter. In practical embodiments, the feedback loop may introduce a delay between the calculation and the application of the modification parameters M. As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10" and the Calculate Approximation of Target Specific Loudness 14 may each also receive as an optional input a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and process or device 14 may calculate the partial specific loudness of the modified audio signal. Such optional inputs are shown in Fig. 3 using dashed lines.

The Calculate Modification Parameters 10" may employ an error detecting device or function, such that differences between its target specific loudness input and its target specific loudness approximation input adjust the modification parameters so as to reduce the differences between the approximation of the target specific loudness and the "actual" target specific loudness.

Such adjustments reduce the difference between the specific loudness of the unmodified audio signal and the target specific loudness, which may be implicit. Thus, the modification parameters M may be updated based on an error between the target specific loudness, computed in the feed-forward path from the specific loudness of the original audio using the function F, and the target specific loudness approximation, computed in the feedback path from the specific loudness or partial specific loudness of the modified audio.
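The error-driven adjustment described above can be sketched for a single band as follows. This is only an illustration under stated assumptions: `toy_specific_loudness` is a hypothetical power-law stand-in for a real specific loudness model, and the update rule and the `step_db` constant are illustrative choices, not something the text specifies.

```python
import math


def toy_specific_loudness(excitation, exponent=0.23):
    """Hypothetical stand-in for a specific loudness model (assumption)."""
    return excitation ** exponent


def update_gain_db(gain_db, target_sl, approx_sl, step_db=20.0):
    """One feedback step: move the gain by the log-ratio of target to achieved loudness."""
    error = math.log10(target_sl / approx_sl)
    return gain_db + step_db * error


def run_feedback(excitation, target_sl, iterations=50):
    """Iterate the loop: modify, measure the approximation, compare, adjust."""
    gain_db = 0.0  # modification parameter M for this band, as a power gain in dB
    for _ in range(iterations):
        modified_excitation = excitation * 10.0 ** (gain_db / 10.0)
        approx_sl = toy_specific_loudness(modified_excitation)  # feedback path
        gain_db = update_gain_db(gain_db, target_sl, approx_sl)
    return gain_db
```

With this toy model the error shrinks geometrically on each pass, which also illustrates why a practical feedback loop lags behind the signal: the parameters applied at any instant were computed from earlier measurements.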

An alternative feed-forward/feedback example is shown in FIG. 4. This alternative differs from the example of FIG. 3 in that the inverse function F^-1 is calculated in the feedback path rather than the function F being calculated in the feed-forward path. As in the example of FIG. 4, the Generate Modification Parameters process or device 4''' may include several functions and/or devices: a Calculate Specific Loudness 8 as in the feed-forward example of FIG. 1, a Calculate Approximation of Specific Loudness of Unmodified Audio 12 as in the feedback example of FIG. 2, and a Calculate Modification Parameters 10'''. As in the feed-forward example of FIG. 1, the Calculate Specific Loudness 8 provides the specific loudness of the unmodified audio signal as an input to the Calculate Modification Parameters 10'''. As in the feedback example of FIG. 2, subject to the constraint that the function F is invertible, process or device 12 estimates the specific loudness of the unmodified audio signal by applying the inverse function F^-1 to the specific loudness or partial specific loudness of the modified audio signal. A "Select Inverse Function and Inverse Function Parameters" input to the Calculate Approximation of Specific Loudness of Unmodified Audio 12 is shown to indicate that process or device 12 may calculate an inverse function F^-1, as described above. This is indicated schematically in FIG. 4 as a "Select Inverse Function F^-1 and Function Parameters" input to process or device 12. Thus, process or device 12 provides an approximation of the specific loudness of the unmodified audio signal as another input to the Calculate Modification Parameters 10'''.

As in the examples of FIGS. 1-3, the Calculate Modification Parameters 10''' derives modification parameters M which, if applied to the audio signal by the Modify Audio Signal 2, reduce the difference between the specific loudness of the unmodified audio signal and the target specific loudness, which is implicit in this example. As mentioned above, these modification parameters M may, for example, take the form of time-varying gains applied to the frequency bands of a filter bank, or of the coefficients of a time-varying filter. In practical embodiments the feedback loop may introduce a delay between the computation and the application of the modification parameters M. As mentioned above, in a playback environment having a secondary interfering audio signal, such as noise, the Calculate Modification Parameters 10''' and the Calculate Approximation of Specific Loudness of Unmodified Audio 12 may each optionally receive, as one of their inputs, a measure of such a secondary interfering audio signal or the secondary interfering signal itself, and process or device 12 may calculate the partial specific loudness of the modified audio signal. Such optional inputs are shown in FIG. 4 with dashed lines.

The Calculate Modification Parameters 10''' may employ an error detecting device or function, such that the difference between its specific loudness input and its specific loudness approximation input produces an output that adjusts the modification parameters so as to reduce the difference between the approximation of the specific loudness and the "actual" specific loudness. Because the approximation of the specific loudness is derived from the specific loudness or partial specific loudness of the modified audio, which can be regarded as an approximation of the target specific loudness, such adjustments reduce the difference between the specific loudness of the modified audio signal and the target specific loudness inherent in the function F^-1. Thus, the modification parameters M may be updated based on an error between the specific loudness computed in the feed-forward path from the original audio, and the specific loudness approximation computed in the feedback path, using the inverse function F^-1, from the specific loudness or partial specific loudness of the modified audio. Because of the feedback path, practical implementations may introduce a delay between the update of the modification parameters and their application.

Although the modification parameters M in the examples of FIGS. 1-4, when applied to a Modify Audio Signal process or device 2, reduce the difference between the specific loudness of the audio signal and the target specific loudness, in practical embodiments the corresponding modification parameters produced in response to the same audio signal may not be identical to one another.

Although not critical or essential to aspects of the present invention, the calculation of the specific loudness of the audio signal or of the modified audio signal may advantageously employ the techniques set forth in the earlier International Patent Application No. PCT/US2004/016964, published as WO 2004/111964 A2, in which the calculation selects, from a group of two or more specific loudness model functions, one or a combination of two or more of those functions, the selection being controlled by a measure of characteristics of the input audio signal. The description of Specific Loudness 104 of FIG. 7, below, describes such an arrangement.

In accordance with further aspects of the invention, the unmodified audio signal and either (1) the modification parameters or (2) the target specific loudness or a representation of the target specific loudness (for example, scale factors usable in calculating, explicitly or implicitly, the target specific loudness) may be stored or transmitted for use, for example, in a temporally and/or spatially separated device or process. The modification parameters, target specific loudness, or representation of the target specific loudness may be determined in any suitable way, for example by one of the feed-forward, feedback, and hybrid feed-forward/feedback arrangement examples of FIGS. 1-4 described above. In practice, a feed-forward arrangement such as the example of FIG. 1 is the least complex and the fastest, because it avoids calculations based on the modified audio signal. FIG. 5 shows an example of transmitting or storing the unmodified audio and the modification parameters; FIG. 6 shows an example of transmitting or storing the unmodified audio and the target specific loudness or a representation of the target specific loudness.

An arrangement such as the example of FIG. 5 may be used to separate, temporally and/or spatially, the application of the modification parameters to the audio signal from the generation of those modification parameters. An arrangement such as the example of FIG. 6 may be used to separate, temporally and/or spatially, both the generation and the application of the modification parameters from the generation of the target specific loudness or its representation. Both types of arrangement make possible a simple, low-cost playback or reception arrangement that avoids the complexity of generating the modification parameters or of generating the target specific loudness. Although an arrangement of the FIG. 5 type is simpler than an arrangement of the FIG. 6 type, the FIG. 6 arrangement has the advantage that less information may need to be stored or transmitted, particularly when a representation of the target specific loudness, such as one or more scale factors, is stored or transmitted. Such a reduction in stored or transmitted information is particularly useful in low-bit-rate audio environments.

Accordingly, further aspects of the present invention are the provision of a device or process that (1) receives or plays back, from a store or transmit device or process, modification parameters M and applies them to an audio signal that is also received, or (2) receives or plays back, from a store or transmit device or process, a target specific loudness or a representation of the target specific loudness, generates modification parameters M by applying the target specific loudness or its representation to the audio signal that is also received (or to a measure of the audio signal, such as its specific loudness, which is derivable from the audio signal), and applies the modification parameters M to the received audio signal. Such devices or processes may be characterized as decoding processes or decoders, while the devices or processes required to produce the stored or transmitted information may be characterized as encoding processes or encoders. Such encoding processes or encoders are those portions of the FIG. 1-4 arrangement examples that are usable to produce the information required by the respective decoding processes or decoders. Such decoding processes or decoders may be associated or operative with virtually any type of process or device that processes and/or reproduces sound.

In one aspect of the invention, as in the example of FIG. 5, the unmodified audio signal and the modification parameters M produced by, for example, a modification parameter generating process or generator such as the Generate Modification Parameters 4 of FIG. 1, 4' of FIG. 2, 4'' of FIG. 3 or 4''' of FIG. 4 may be applied to any suitable storage or transmission device or function ("Store or Transmit") 16. In the case of using the feed-forward example of FIG. 1 as an encoding process or encoder, the Modify Audio Signal 2 is not required to generate the modified audio and may be omitted if there is no need to provide the modified audio at the temporal or spatial location of the encoder or encoding process. The Store or Transmit 16 may include, for example, any suitable magnetic, optical or solid-state storage and playback device, or any suitable wired or wireless transmission and reception device, the choice of which is not critical to the invention. The played-back or received modification parameters may then be applied to a Modify Audio Signal 2, of the type employed in the examples of FIGS. 1-4, in order to modify the played-back or received audio signal so that its specific loudness approximates the target specific loudness of, or inherent in, the arrangement in which the modification parameters were derived.

The modification parameters may be stored or transmitted in any of various ways. For example, they may be stored or transmitted as metadata accompanying the audio signal, they may be sent in separate paths or channels, they may be steganographically encoded in the audio, they may be multiplexed, and so on. The use of the modification parameters to modify the audio signal may be optional and, if optional, their use may be selectable, for example, by a user. For example, the modification parameters, if applied to the audio signal, might reduce the dynamic range of the audio signal. The user may select whether or not to apply such dynamic range reduction.
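The FIG. 5 style split between generating and applying the modification parameters can be sketched as a small data structure plus a decoder function. The `Frame` layout and the `apply_parameters` switch are hypothetical names introduced purely for illustration; the text does not prescribe any metadata format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One stored or transmitted block (hypothetical layout, for illustration only)."""
    band_signals: List[float]  # unmodified audio, one value per frequency band
    band_gains: List[float]    # modification parameters M carried as metadata


def decode(frame: Frame, apply_parameters: bool = True) -> List[float]:
    """Decoder side: optionally apply the received per-band gains."""
    if not apply_parameters:
        # The user has opted out, so the unmodified audio passes through.
        return list(frame.band_signals)
    return [g * s for g, s in zip(frame.band_gains, frame.band_signals)]
```

Because the decoder only multiplies, all of the loudness modeling cost stays on the encoder side, which is the appeal of this arrangement for low-cost playback devices.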

In another aspect of the invention, as in the example of FIG. 6, the unmodified audio signal and the target specific loudness or a representation of the target specific loudness may be applied to any suitable storage or transmission device or function ("Store or Transmit") 16.

In the case of using a feed-forward configuration such as the example of FIG. 1 as an encoding process or encoder, neither a process or device of the Calculate Modification Parameters 10 type nor one of the Modify Audio Signal 2 type is required, and both may be omitted, if there is no need to provide either the modification parameters or the modified audio at the temporal or spatial location of the encoder or encoding process. As in the case of the FIG. 5 example, the Store or Transmit 16 may include, for example, any suitable magnetic, optical or solid-state storage and playback device, or any suitable wired or wireless transmission and reception device, the choice of which is not critical to the invention. The played-back or received target specific loudness or representation of the target specific loudness may then be applied, together with the unmodified audio, to a Calculate Modification Parameters 10 of the type employed in the example of FIG. 1, or to a Calculate Modification Parameters 10'' of the type employed in the example of FIG. 3, in order to provide modification parameters M. These may then be applied to a Modify Audio Signal 2, of the type employed in the examples of FIGS. 1-4, in order to modify the played-back or received audio signal so that its specific loudness approximates the target specific loudness of, or inherent in, the arrangement in which the modification parameters were derived.

Although the target specific loudness or its representation is most readily obtained from an encoding process or encoder of the FIG. 1 type, the target specific loudness or its representation, or an approximation of the target specific loudness or its representation, may also be obtained from an encoding process or encoder of the type of the examples of FIGS. 2 to 4 (the approximations being calculated in process or device 14 of FIGS. 2 and 3 and in process or device 12 of FIG. 4). The target specific loudness or its representation may be stored or transmitted in any of various ways. For example, it may be stored or transmitted as metadata accompanying the audio signal, it may be sent in separate paths or channels, it may be steganographically encoded in the audio, it may be multiplexed, and so on. The use of the modification parameters derived from the stored or transmitted target specific loudness or representation to modify the audio signal may be optional and, if optional, their use may be selectable, for example, by a user. For example, the modification parameters, if applied to the audio signal, might reduce the dynamic range of the audio signal. The user may select whether or not to apply such dynamic range reduction.

When the disclosed invention is implemented as a digital system, a feed-forward configuration is the most practical, and examples of such configurations are therefore described in detail below, it being understood that the scope of the invention is not so limited.

Throughout this document, terms such as "filter" or "filter bank" are used to include essentially any form of recursive or non-recursive filtering, such as IIR filters or transforms, and "filtered" information is the result of applying such filters. The embodiments described below employ filter banks implemented by transforms.

FIG. 7 shows greater detail of an exemplary embodiment of an aspect of the invention embodied in a feed-forward arrangement. Audio first passes through an analysis filter bank function or device ("Analysis Filter Bank") 100, which splits the audio signal into a plurality of frequency bands (hence, FIG. 7 shows multiple outputs from Analysis Filter Bank 100, each output representing a frequency band, and these outputs carry through the various functions or devices up to a synthesis filter bank, which sums the bands into a combined wideband signal, as described further below). The response of the filter associated with each frequency band in the Analysis Filter Bank 100 is designed to simulate the response at a particular location of the basilar membrane in the inner ear. The output of each filter in the Analysis Filter Bank 100 next passes into a transmission filter or transmission filter function ("Transmission Filter") 101 that simulates the filtering effect of the transmission of audio through the outer and middle ear. If only the loudness of the audio were to be measured, the transmission filter could be applied before the analysis filter bank, but because the analysis filter bank outputs are used to synthesize the modified audio, it is advantageous to apply the transmission filter after the filter bank.

The outputs of Transmission Filter 101 next pass into an excitation function or device ("Excitation") 102, the outputs of which simulate the distribution of energy along the basilar membrane. The excitation energy values may be smoothed across time by a smoothing function or device ("Smoothing") 103. The time constants of the smoothing function are set in accordance with the requirements of the desired application. The smoothed excitation signals are subsequently converted into specific loudness in a specific loudness function or device ("Specific Loudness (SL)") 104. Specific loudness is expressed in units of sone per unit frequency. The specific loudness component associated with each band is passed into a specific loudness modification function or device ("SL Modification") 105. SL Modification 105 takes as its input the original specific loudness and outputs a desired or "target" specific loudness which, according to aspects of the present invention, is preferably a function of the original specific loudness (see the next heading below, entitled "Target Specific Loudness"). Depending on the desired effect, SL Modification 105 may operate independently on each band, or there may be an interdependence among bands (a frequency smoothing, as suggested by the cross-connecting lines in FIG. 7).

A gain solver function or device ("Gain Solver") 106 takes as its inputs the smoothed excitation band components derived from Excitation 102 and the target specific loudness from SL Modification 105, and determines the gain that needs to be applied to each band of the output of Analysis Filter Bank 100 in order to transform the measured specific loudness into the target specific loudness. The Gain Solver may be implemented in various ways. For example, the Gain Solver may include an iterative process such as that disclosed in International Patent Application No. PCT/US2004/016964, published as WO 2004/111964 A2, or alternatively a table lookup. Although the gains per band generated by Gain Solver 106 may be smoothed further over time by an optional smoothing function or device ("Smoothing") 107 in order to minimize perceptual artifacts, it is preferred that temporal smoothing be applied elsewhere in the overall process or device, as described elsewhere. Finally, the gains are applied to the respective bands of Analysis Filter Bank 100 through a respective multiplicative combining function or combiner 108, and the processed or "modified" audio is synthesized from the gain-modified bands in a synthesis filter bank function or device ("Synthesis Filter Bank") 110. In addition, the outputs of the analysis filter bank may be delayed by a delay function or device ("Delay") 109 before the gains are applied, in order to compensate for any latency associated with the gain computation. Alternatively, instead of calculating gains for use in applying gain modifications in frequency bands, Gain Solver 106 may calculate filter coefficients that control a time-varying filter, such as a multi-tapped FIR filter or a multi-pole IIR filter. For simplicity of exposition, aspects of the invention are mainly described as employing gain factors applied to frequency bands, it being understood that filter coefficients and time-varying filters may also be employed in practical embodiments.
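An iterative gain solver of the kind mentioned for Gain Solver 106 can be sketched with a bisection search. This is a minimal illustration, not the method of the cited application: `toy_specific_loudness` is a hypothetical monotonic stand-in for the real excitation-to-specific-loudness conversion, and the search bounds and iteration count are arbitrary assumptions.

```python
import math


def toy_specific_loudness(excitation, exponent=0.23):
    """Hypothetical monotonic loudness model (assumption, not the patent's)."""
    return excitation ** exponent


def solve_gain(excitation, target_sl, lo=1e-4, hi=1e4, iterations=60):
    """Bisect (in the log domain) for the amplitude gain g such that
    toy_specific_loudness(g**2 * excitation) equals target_sl."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the current bracket
        if toy_specific_loudness(mid * mid * excitation) < target_sl:
            lo = mid  # not loud enough yet: the gain must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)


def solve_band_gains(excitations, targets):
    """Solve each band independently, one gain per band."""
    return [solve_gain(e, t) for e, t in zip(excitations, targets)]
```

Bisection only needs the loudness model to be monotonic in the excitation, so the same loop would work with a more realistic model swapped in for the toy function.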

In practical embodiments, the audio processing may be performed in the digital domain. Accordingly, the audio input signal is represented by a discrete-time sequence x[n], which has been sampled from the audio source at some sampling frequency f_s. It is assumed that the sequence x[n] has been appropriately scaled so that the average power of x[n] in decibels, computed over L samples as

$$10\log_{10}\!\left(\frac{1}{L}\sum_{n=0}^{L-1} x^{2}[n]\right)$$

is equal to the sound pressure level in dB at which the audio is heard by a human listener. In addition, the audio signal is assumed to be monophonic for simplicity of exposition.
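The scaling assumption on x[n] can be checked with a short helper that measures the average power of a block in decibels. This is a sketch assuming the conventional definition, 10·log10 of the mean of the squared samples; the function name is illustrative.

```python
import math


def average_power_db(samples):
    """Mean-square level of a block of samples, in decibels.

    Raises ValueError for an all-zero block, because log10(0) is undefined.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)
```

Under the stated assumption, a signal auditioned at a given sound pressure level would be scaled so that this value equals that level in dB.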

The Analysis Filter Bank 100, Transmission Filter 101, Excitation 102, Specific Loudness 104, Specific Loudness Modification 105, Gain Solver 106, and Synthesis Filter Bank 110 are described in greater detail as follows.

Analysis Filter Bank 100

The audio input signal is applied to an analysis filter bank or filter bank function ("Analysis Filter Bank") 100. Each filter in Analysis Filter Bank 100 is designed to simulate the frequency response at a particular location along the basilar membrane in the inner ear. The filter bank 100 may include a set of linear filters whose bandwidth and spacing are constant on the Equivalent Rectangular Bandwidth (ERB) frequency scale, as defined by Moore, Glasberg and Baer (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," supra).

Although the ERB frequency scale more closely matches human perception and shows improved performance in producing objective loudness measurements that match subjective loudness results, the Bark frequency scale may be employed with reduced performance.

For a center frequency f in hertz, the width of one ERB band in hertz may be approximated as:

$$\mathrm{ERB}(f) = 24.7\left(4.37f/1000 + 1\right) \tag{1}$$

From this relation a warped frequency scale is defined such that, at any point along the warped scale, the corresponding ERB in units of the warped scale is equal to one. The function for converting from linear frequency in hertz to this ERB frequency scale is obtained by integrating the reciprocal of Equation 1:

$$\mathrm{HzToERB}(f) = \int \frac{1}{24.7\left(4.37f/1000 + 1\right)}\,df = 21.4\log_{10}\!\left(4.37f/1000 + 1\right) \tag{2a}$$

It is also useful to express the transformation from the ERB scale back to the linear frequency scale by solving Equation 2a for f:

$$\mathrm{ERBToHz}(e) = f = \frac{1000}{4.37}\left(10^{e/21.4} - 1\right) \tag{2b}$$

where e is in units of the ERB scale. FIG. 9 shows the relationship between the ERB scale and frequency in hertz.
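The ERB relations can be written as three small functions. This sketch assumes the commonly used closed form 21.4·log10(4.37f/1000 + 1) for the integral of the reciprocal of Equation 1, with the inverse mapping obtained by solving that expression for f; the function names are illustrative.

```python
import math


def erb_bandwidth_hz(f_hz):
    """Width in hertz of one ERB band centered at f_hz (Equation 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb(f_hz):
    """Map linear frequency in hertz to the warped ERB scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_to_hz(e):
    """Inverse mapping: ERB-scale value back to frequency in hertz."""
    return (1000.0 / 4.37) * (10.0 ** (e / 21.4) - 1.0)
```

For example, `erb_bandwidth_hz(1000.0)` evaluates 24.7 × 5.37, so an auditory filter centered at 1 kHz is roughly 133 Hz wide.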

The Analysis Filter Bank 100 may include B auditory filters, referred to as bands, at center frequencies f_c[1] ... f_c[B] spaced uniformly along the ERB scale. More specifically,

$$f_c[1] = f_{min} \tag{3a}$$

$$f_c[b] = \mathrm{ERBToHz}\!\left(\mathrm{HzToERB}(f_c[b-1]) + \Delta\right), \quad b = 2 \ldots B \tag{3b}$$

$$f_c[B] < f_{max} \tag{3c}$$

where Δ is the desired ERB spacing of the Analysis Filter Bank 100, and where f_min and f_max are the desired minimum and maximum center frequencies, respectively. One may choose Δ = 1 and, taking into account the frequency range over which the human ear is sensitive, set f_min = 50 Hz and f_max = 20,000 Hz. With such parameters, for example, application of Equations 3a-c yields B = 40 auditory filters.
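The band layout of Equations 3a-c can be reproduced with a short loop. The sketch below assumes the closed-form ERB conversions 21.4·log10(4.37f/1000 + 1) and its inverse; the helper names are illustrative.

```python
import math


def hz_to_erb(f_hz):
    """Map frequency in hertz to the ERB scale (assumed closed form)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_to_hz(e):
    """Map an ERB-scale value back to frequency in hertz."""
    return (1000.0 / 4.37) * (10.0 ** (e / 21.4) - 1.0)


def center_frequencies(f_min=50.0, f_max=20000.0, delta=1.0):
    """Center frequencies spaced delta ERB apart, starting at f_min
    and stopping before f_max (Equations 3a-c)."""
    centers = [f_min]
    while True:
        candidate = erb_to_hz(hz_to_erb(centers[-1]) + delta)
        if candidate >= f_max:
            break
        centers.append(candidate)
    return centers
```

With the default parameters the loop stops after 40 bands, matching the count given in the text; the highest center frequency lands a little above 18 kHz.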

The magnitude frequency response of each auditory filter may be characterized by a rounded exponential function, as suggested by Moore and Glasberg. Specifically, the magnitude response of a filter with center frequency f_c[b] may be computed as:

$$H_b(f) = (1 + pg)\,e^{-pg} \tag{4a}$$

where

$$g = \left|\frac{f - f_c[b]}{f_c[b]}\right| \tag{4b}$$

$$p = \frac{4 f_c[b]}{\mathrm{ERB}(f_c[b])} \tag{4c}$$

The magnitude responses of these B auditory filters, which approximate critical banding on the ERB scale, are shown in FIG. 10.
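The rounded exponential response of Equation 4a can be evaluated as follows. The sketch assumes the standard definitions from the rounded exponential filter literature, namely g as the normalized deviation |f - f_c| / f_c and p = 4·f_c / ERB(f_c); the function names are illustrative.

```python
import math


def erb_bandwidth_hz(f_hz):
    """Width in hertz of one ERB band centered at f_hz (Equation 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def roex_magnitude(f_hz, fc_hz):
    """Magnitude response (1 + p*g) * exp(-p*g) of the auditory filter
    centered at fc_hz, evaluated at frequency f_hz."""
    g = abs(f_hz - fc_hz) / fc_hz               # normalized deviation from center
    p = 4.0 * fc_hz / erb_bandwidth_hz(fc_hz)   # sharpness of the filter skirt
    return (1.0 + p * g) * math.exp(-p * g)
```

The response is unity at the center frequency and falls off symmetrically in linear frequency on either side of it.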

The filtering operations of Analysis Filter Bank 100 may be adequately approximated using a finite-length discrete Fourier transform, commonly referred to as the short-time discrete Fourier transform (STDFT), because an implementation that runs the filters at the sampling rate of the audio signal, referred to as a full-rate implementation, is believed to provide more temporal resolution than is necessary for accurate loudness measurements. By using the STDFT instead of a full-rate implementation, an improvement in efficiency and a reduction in computational complexity may be achieved.

The STDFT of the input audio signal x[n] is defined as:

$$X[k,t] = \sum_{n=0}^{N-1} w[n]\,x[n + tT]\,e^{-j\frac{2\pi k n}{N}} \tag{5a}$$

where k is the frequency index, t is the time block index, N is the DFT size, T is the hop size, and w[n] is a length-N window normalized so that

$$\sum_{n=0}^{N-1} w^{2}[n] = 1 \tag{5b}$$

Note that the variable t in Equation 5a is a discrete index representing the time block of the STDFT, as opposed to a measure of time in seconds. Each increment in t represents a hop of T samples along the signal x[n]. Subsequent references to the index t assume this definition. Although different parameter settings and window shapes may be used depending on the details of the implementation, for f_s = 44100 Hz, choosing N = 2048, T = 1024, and w[n] as a Hanning window provides an adequate balance of time and frequency resolution. The STDFT described above may be made more efficient by using the Fast Fourier Transform (FFT).
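One block of the STDFT can be computed directly from its definition, as sketched below. This illustration assumes the conventional windowed DFT form, X[k,t] = Σ w[n]·x[n + tT]·e^(-j2πkn/N), and a window normalized to unit sum of squares; it uses a plain O(N²) DFT for clarity, whereas a practical implementation would use an FFT routine. The function names are illustrative.

```python
import cmath
import math


def hann_window(n_fft):
    """Periodic Hann window scaled so that the sum of its squares is 1
    (assumed normalization)."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def stdft_block(x, t, n_fft, hop, window):
    """Spectrum X[k, t] for k = 0..N-1 of block t, which starts at sample t * hop."""
    start = t * hop
    frame = [window[n] * x[start + n] for n in range(n_fft)]
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]
```

The block index `t` advances by `hop` samples per step, so with N = 2048 and T = 1024 consecutive blocks overlap by half their length.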

Instead of the STDFT, the modified discrete cosine transform (MDCT) may be employed to implement the analysis filter bank. The MDCT is a transform commonly used in perceptual audio coders, such as Dolby AC-3. If the disclosed system is implemented with such perceptually coded audio, the disclosed loudness measurement and modification may be implemented more efficiently by processing the existing MDCT coefficients of the coded audio, thereby eliminating the need to perform the analysis filter bank transform. The MDCT of the input audio signal x[n] is given by:

$$X[k,t] = \sum_{n=0}^{N-1} w[n]\,x[n + tT]\cos\!\left(\frac{2\pi}{N}\left(k + \tfrac{1}{2}\right)(n + n_0)\right), \quad \text{where } n_0 = \frac{(N/2) + 1}{2} \tag{6}$$

Generally, the hop size $T$ is chosen to be exactly one half of the transform length $N$ so that perfect reconstruction of the signal $x[n]$ is possible.

Transmission Filter 101

The outputs of analysis filterbank 100 are applied to a transmission filter or transmission filter function ("Transmission Filter") 101, which filters each band of the filterbank in accordance with the transmission of audio through the outer and middle ear. FIG. 8 shows one suitable magnitude frequency response of the transmission filter, $P(f)$, across the audible frequency range. The response is unity below 1 kHz and, above 1 kHz, follows the inverse of the threshold of hearing as specified in the ISO 226 standard, with the threshold normalized to equal unity at 1 kHz.

Excitation 102

In order to compute the loudness of the input audio signal, a measure of the short-time energy of the audio signal in each filter of analysis filterbank 100, after application of transmission filter 101, is needed. This time- and frequency-varying measure is referred to as the excitation. The short-time energy output of each filter in analysis filterbank 100 may be approximated in excitation function 102 by multiplying the filter responses in the frequency domain with the power spectrum of the input signal:

$$E[b,t] = \frac{1}{N} \sum_{k=0}^{N-1} \left|H_b[k]\right|^{2} \left|P[k]\right|^{2} \left|X[k,t]\right|^{2} \tag{7}$$

where $b$ is the band number, $t$ is the block number, and $H_b[k]$ and $P[k]$ are the frequency responses of the auditory filter and the transmission filter, respectively, sampled at the frequency corresponding to STDFT or MDCT index $k$. It should be noted that forms for the magnitude response of the auditory filters other than that specified in Equations 4a-c may be used in Equation 7 to achieve similar results. For example, International Patent Application No. PCT/US2004/016964, published as WO 2004/111964 A2, describes two alternatives: an auditory filter characterized by a 12th-order IIR transfer function, and a low-cost "brick-wall" band-pass approximation.

In summary, the output of excitation function 102 is a frequency-domain representation of the energy $E$ in the respective ERB bands $b$ per time period $t$.
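The excitation computation may be sketched as follows for a single block. The sketch assumes that the excitation is the band-weighted sum of the power spectrum, scaled by $1/N$, and that the filters are supplied as magnitude responses; the function name `excitation` and the example data are hypothetical.

```python
def excitation(block_spectrum, band_filters, transmission):
    """E[b] for one block: (1/N) * sum_k |H_b[k]|^2 * |P[k]|^2 * |X[k]|^2.

    block_spectrum: complex X[k] for the block.
    band_filters:   one magnitude response |H_b[k]| per band.
    transmission:   magnitude response |P[k]|.
    """
    n = len(block_spectrum)
    power = [abs(v) ** 2 for v in block_spectrum]
    return [
        sum(h[k] ** 2 * transmission[k] ** 2 * power[k] for k in range(n)) / n
        for h in band_filters
    ]


# Hypothetical 8-bin example: two brick-wall bands and a flat transmission filter.
X = [1 + 0j, 2 + 0j, 0 + 2j, 1 + 1j, 0j, 0j, 0j, 0j]
H = [[1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0]]
P = [1.0] * 8
E = excitation(X, H, P)
```

The brick-wall bands are used here only because they make the arithmetic easy to follow; any auditory filter shape sampled at the bin frequencies could be substituted.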

Time Averaging ("Smoothing") 103

For certain applications of the disclosed invention, as described below, it may be desirable to smooth the excitation before it is transformed to specific loudness. For example, smoothing may be performed recursively in smoothing function 103 according to the equation:

$$\bar{E}[b,t] = \lambda_b\, \bar{E}[b,t-1] + (1-\lambda_b)\, E[b,t] \tag{8}$$

where the time constant $\lambda_b$ for each band $b$ is selected in accordance with the desired application. In most cases the time constants may advantageously be chosen to be proportional to the integration time of human loudness perception within band $b$. Watson and Gengel performed experiments demonstrating that this integration time is within the range of 150-175 ms at low frequencies (125-200 Hz) and 40-60 ms at high frequencies (Charles S. Watson and Roy W. Gengel, "Signal Duration and Signal Frequency in Relation to Auditory Sensitivity," Journal of the Acoustical Society of America, Vol. 46, No. 4 (Part 2), 1969, pp. 989-997).
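A minimal sketch of the recursive smoothing follows. The mapping from a time constant in seconds to the per-block coefficient $\lambda_b$ (`smoothing_coefficient`) is an assumption based on a common one-pole convention and is not stated in the text; the 160 ms and 50 ms values are merely representative of the quoted ranges.

```python
import math


def smoothing_coefficient(time_constant_s, hop, sample_rate):
    """Assumed convention: lambda = exp(-hop / (tau * fs)) for blocks spaced hop samples apart."""
    return math.exp(-hop / (time_constant_s * sample_rate))


def smooth_excitation(e_blocks, lam, initial=0.0):
    """One-pole recursion: smoothed[t] = lam * smoothed[t-1] + (1 - lam) * e[t]."""
    out = []
    prev = initial
    for e in e_blocks:
        prev = lam * prev + (1.0 - lam) * e
        out.append(prev)
    return out


lam_low = smoothing_coefficient(0.160, 1024, 44100)   # low-frequency band, slower
lam_high = smoothing_coefficient(0.050, 1024, 44100)  # high-frequency band, faster
step = smooth_excitation([1.0, 1.0, 1.0], 0.5)
```

A larger coefficient gives slower tracking, so under this convention the low-frequency bands receive coefficients closer to one than the high-frequency bands.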

Specific Loudness 104

In the specific loudness converter or conversion function ("Specific Loudness") 104, each frequency band of the excitation is converted into a component value of the specific loudness, which is measured in sone per ERB.

Initially, in computing specific loudness, the excitation level in each band of $\bar{E}[b,t]$ may be transformed to an equivalent excitation level at 1 kHz, as specified by the equal loudness contours of ISO 226 (FIG. 11) normalized by the transmission filter $P(z)$ (FIG. 12):

$$\bar{E}_{1\mathrm{kHz}}[b,t] = T_{1\mathrm{kHz}}\!\left(\bar{E}[b,t],\, f_b\right) \tag{9}$$

where $f_b$ denotes the frequency of band $b$ and $T_{1\mathrm{kHz}}(E,f)$ is a function that generates the level at 1 kHz that is equally loud to level $E$ at frequency $f$. In practice, $T_{1\mathrm{kHz}}(E,f)$ is implemented as an interpolation of a look-up table of the equal loudness contours, normalized by the transmission filter. Transformation to equivalent levels at 1 kHz simplifies the specific loudness calculation that follows.

Next, the specific loudness in each band may be computed as:

$$N[b,t] = \alpha[b,t]\, N_{NB}[b,t] + (1-\alpha[b,t])\, N_{WB}[b,t], \tag{10}$$

where $N_{NB}[b,t]$ and $N_{WB}[b,t]$ are specific loudness values based on a narrowband and a wideband signal model, respectively. The value $\alpha[b,t]$ is an interpolation factor lying between 0 and 1 that is computed from the audio signal. International Patent Application No. PCT/US2004/016964, published as WO 2004/111964 A2, describes a technique for calculating $\alpha[b,t]$ from the spectral flatness of the excitation; it also describes the "narrowband" and "wideband" signal models in greater detail.

The narrowband and wideband specific loudness values $N_{NB}[b,t]$ and $N_{WB}[b,t]$ may be estimated from the transformed excitation using the exponential functions:

$$N_{NB}[b,t] = \begin{cases} G_{NB}\left(\left(\dfrac{\bar{E}_{1\mathrm{kHz}}[b,t]}{TQ_{1\mathrm{kHz}}}\right)^{\beta_{NB}} - 1\right), & \bar{E}_{1\mathrm{kHz}}[b,t] > TQ_{1\mathrm{kHz}} \\ 0, & \text{otherwise} \end{cases} \tag{11a}$$

$$N_{WB}[b,t] = \begin{cases} G_{WB}\left(\left(\dfrac{\bar{E}_{1\mathrm{kHz}}[b,t]}{TQ_{1\mathrm{kHz}}}\right)^{\beta_{WB}} - 1\right), & \bar{E}_{1\mathrm{kHz}}[b,t] > TQ_{1\mathrm{kHz}} \\ 0, & \text{otherwise} \end{cases} \tag{11b}$$

where $TQ_{1\mathrm{kHz}}$ is the excitation level at threshold in quiet for a 1 kHz tone. From the equal loudness contours (FIGS. 11 and 12), $TQ_{1\mathrm{kHz}}$ equals 4.2 dB. Note that both of these specific loudness functions are equal to zero when the excitation is equal to the threshold in quiet. For excitations greater than the threshold in quiet, both functions grow monotonically with a power law in accordance with Stevens' law of intensity sensation. The exponent for the narrowband function is chosen to be larger than that of the wideband function, so that the narrowband function increases more rapidly than the wideband function. The specific exponents $\beta$ and gains $G$ for the narrowband and wideband cases are chosen to match experimental data on the growth of loudness for tones and noise.
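The behavior described above (zero at the threshold in quiet, power-law growth above it, and interpolation between the narrowband and wideband models) may be sketched as follows. The gains and exponents are placeholders chosen only for illustration and are not the values used in the disclosure. Expressing $TQ_{1\mathrm{kHz}}$ as a linear power ratio of 4.2 dB is likewise an assumption about units.

```python
TQ_1KHZ = 10.0 ** (4.2 / 10.0)  # assumption: 4.2 dB interpreted as a power ratio


def power_law_loudness(e_1khz, tq_1khz, gain, beta):
    """Zero at or below threshold; gain * ((E / TQ)^beta - 1) above it."""
    if e_1khz <= tq_1khz:
        return 0.0
    return gain * ((e_1khz / tq_1khz) ** beta - 1.0)


def specific_loudness(e_1khz, alpha, nb=(0.1, 0.30), wb=(0.1, 0.20), tq_1khz=TQ_1KHZ):
    """Interpolate narrowband and wideband models; nb and wb are (gain, beta) placeholders."""
    n_nb = power_law_loudness(e_1khz, tq_1khz, *nb)
    n_wb = power_law_loudness(e_1khz, tq_1khz, *wb)
    return alpha * n_nb + (1.0 - alpha) * n_wb


at_threshold = specific_loudness(TQ_1KHZ, 0.5)
quiet = specific_loudness(TQ_1KHZ * 10.0, 0.5)
loud = specific_loudness(TQ_1KHZ * 1000.0, 0.5)
```

With the narrowband exponent larger than the wideband exponent, the narrowband value exceeds the wideband value for any excitation above threshold, as the text describes.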

Moore and Glasberg suggest that the specific loudness should be equal to some small value, rather than zero, when the excitation is at the threshold of hearing. The specific loudness should then decrease monotonically to zero as the excitation decreases to zero. The justification is that the threshold of hearing is a probabilistic threshold (the point at which a tone is detected 50% of the time), and that a number of tones, each at threshold, presented together may sum to a sound that is more audible than any of the individual tones. In the disclosed application, augmenting the specific loudness functions with this property has the added benefit of making the gain solver, discussed below, behave more appropriately when the excitation is near threshold. If the specific loudness is defined to be zero when the excitation is at or below threshold, then a unique solution for the gain solver does not exist for excitations at or below threshold. If, on the other hand, the specific loudness is defined to be monotonically increasing for all values of excitation greater than or equal to zero, as suggested by Moore and Glasberg, then a unique solution does exist. A loudness scaling greater than unity will always result in a gain greater than unity, and vice versa. The specific loudness functions of Equations 11a and 11b may be altered to have the desired property (Equations 11c and 11d), in which the constant $\lambda$ is greater than one, the exponent $\eta$ is less than one, and the constants $K$ and $C$ are chosen so that the specific loudness function and its first derivative are continuous at the point $\bar{E}_{1\mathrm{kHz}}[b,t] = \lambda\, TQ_{1\mathrm{kHz}}$.

From the specific loudness, the overall or "total" loudness $L[t]$ is given by the sum of the specific loudness across all bands $b$:

$$L[t] = \sum_{b} N[b,t] \tag{12}$$

Specific Loudness Modification 105

In the specific loudness modification function ("Specific Loudness Modification") 105, the target specific loudness, referred to as $\hat{N}[b,t]$, may be calculated from the specific loudness of SL 104 (FIG. 7) in various ways, depending on the desired application of the overall device or process. As described in greater detail below, a target specific loudness may be calculated using a scale factor $\alpha$, for example, in the case of a volume control. See Equation 16 and its associated description. In the case of automatic gain control (AGC) and dynamic range control (DRC), a target specific loudness may be calculated using the ratio of a desired output loudness to the input loudness. See Equations 17 and 18 and their associated descriptions. In the case of dynamic equalization, a target specific loudness may be calculated using the relationship set forth in Equation 23 and its associated description.

Gain Solver 106

In this example, for each band $b$ and every time interval $t$, gain solver 106 takes as its inputs the smoothed excitation $\bar{E}[b,t]$ and the target specific loudness $\hat{N}[b,t]$ and generates gains $G[b,t]$ that are subsequently used to modify the audio. Let the function $\Psi\{\cdot\}$ represent the non-linear transformation from excitation to specific loudness, such that

$$N[b,t] = \Psi\{\bar{E}[b,t]\}. \tag{13}$$

The gain solver finds $G[b,t]$ such that

$$\hat{N}[b,t] = \Psi\{G^{2}[b,t]\, \bar{E}[b,t]\}. \tag{14a}$$

Gain solver 106 determines frequency- and time-varying gains which, when applied to the original excitation, result in a specific loudness that, ideally, is equal to the desired target specific loudness.

In practice, the gain solver determines frequency- and time-varying gains which, when applied to the frequency-domain version of the audio signal, modify the audio signal so as to reduce the difference between its specific loudness and the target specific loudness. Ideally, the modification is such that the modified audio signal has a specific loudness that is a close approximation of the target specific loudness. The solution to Equation 14a may be implemented in a variety of ways. For example, if a closed-form mathematical expression for the inverse of the specific loudness, represented by $\Psi^{-1}\{\cdot\}$, exists, then the gains may be computed directly by rearranging Equation 14a:

$$G[b,t] = \sqrt{\frac{\Psi^{-1}\{\hat{N}[b,t]\}}{\bar{E}[b,t]}} \tag{14b}$$
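As an illustration of the closed-form case, the sketch below assumes a simple power-law form for $\Psi$ whose inverse can be written down directly. The constants `G_PSI` and `BETA` are hypothetical, the inverse is valid only for non-negative targets, and excitations are assumed to lie above the threshold `TQ`.

```python
import math

TQ = 10.0 ** (4.2 / 10.0)  # assumption: threshold in quiet as a linear power ratio
G_PSI, BETA = 0.1, 0.25    # hypothetical model constants


def psi(e):
    """Assumed excitation-to-specific-loudness mapping (power law above threshold)."""
    return G_PSI * ((e / TQ) ** BETA - 1.0) if e > TQ else 0.0


def psi_inverse(n):
    """Closed-form inverse of psi for n >= 0."""
    return TQ * (n / G_PSI + 1.0) ** (1.0 / BETA)


def gain_closed_form(e_smoothed, n_target):
    """Gain g such that psi(g^2 * e_smoothed) equals n_target."""
    return math.sqrt(psi_inverse(n_target) / e_smoothed)


e = 1.0e5
target = 0.25 * psi(e)          # ask for one quarter of the current specific loudness
g = gain_closed_form(e, target)
achieved = psi(g * g * e)       # specific loudness after applying the gain
```

Applying the computed gain to the excitation and re-evaluating `psi` recovers the target, which is the consistency condition expressed by Equation 14a.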

Alternatively, if a closed-form solution for $\Psi^{-1}\{\cdot\}$ does not exist, an iterative approach may be employed in which, for each iteration, Equation 14a is evaluated using a current estimate of the gains. The resulting specific loudness is compared with the desired target and the gains are updated based on the error. If the gains are updated properly, they will converge to the desired solution. Another method involves pre-computing the function $\Psi\{\cdot\}$ for a range of excitation values in each band to create a look-up table. From this look-up table an approximation of the inverse function is obtained, and the gains may then be computed from Equation 14b. As mentioned above, the target specific loudness may be represented as a scaling of the specific loudness:

$$\hat{N}[b,t] = \Xi[b,t]\, N[b,t] \tag{14c}$$

Substituting Equation 13 into 14c, and then 14c into 14b, yields an alternative expression for the gains:

$$G[b,t] = \sqrt{\frac{\Psi^{-1}\{\Xi[b,t]\, \Psi\{\bar{E}[b,t]\}\}}{\bar{E}[b,t]}} \tag{14d}$$

The gains may thus be expressed purely as a function of the excitation $\bar{E}[b,t]$ and the specific loudness scaling $\Xi[b,t]$. Therefore, the gains may be computed through evaluation of Equation 14d, or an equivalent look-up table, without ever explicitly computing the specific loudness or the target specific loudness as intermediate values. These values are, however, implicitly computed through the use of Equation 14d. Other equivalent methods for computing the modification parameters through either explicit or implicit computation of the specific loudness and target specific loudness may be devised, and the invention is intended to cover all such methods.
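Where no closed-form inverse is available, the iterative approach may be sketched with a bisection search on the logarithm of the gain. Bisection is only one possible update rule and is chosen here for simplicity; it assumes $\Psi$ is monotonically increasing. The function `example_psi` is a hypothetical stand-in for the loudness model, and the search limits are arbitrary.

```python
import math


def solve_gain(psi, e_smoothed, n_target, lo=1e-4, hi=1e4, iterations=60):
    """Find g with psi(g^2 * e_smoothed) = n_target by bisection on log(g).

    Requires psi to be increasing and the solution to lie within [lo, hi].
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iterations):
        mid = 0.5 * (log_lo + log_hi)
        g = math.exp(mid)
        if psi(g * g * e_smoothed) < n_target:
            log_lo = mid  # current gain yields too little loudness: raise it
        else:
            log_hi = mid  # current gain yields too much loudness: lower it
    return math.exp(0.5 * (log_lo + log_hi))


def example_psi(e):
    """Hypothetical compressive, strictly increasing loudness curve."""
    return 0.1 * e ** 0.25


e_bar = 5000.0
scaling = 0.25  # specific loudness scaling factor
g = solve_gain(example_psi, e_bar, scaling * example_psi(e_bar))
```

Because the example curve is strictly increasing for all excitations, a unique gain exists for any positive target, which is the property the modified loudness functions are intended to provide near threshold.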

Synthesis Filterbank 110

As described above, analysis filterbank 100 may be implemented efficiently using the short-time discrete Fourier transform (STDFT) or the modified discrete cosine transform (MDCT), and the STDFT or MDCT may be used similarly to implement synthesis filterbank 110. Specifically, letting $X[k,t]$ represent the STDFT or MDCT of the input audio, as defined earlier, the STDFT or MDCT of the processed (modified) audio in synthesis filterbank 110 may be calculated as

$$\hat{X}[k,t] = \sum_{b} G[b,t]\, S_b[k]\, X[k,t-d] \tag{15}$$

where $S_b[k]$ is the response of the synthesis filter associated with band $b$, and $d$ is the delay associated with delay block 109 in FIG. 7. The shape of the synthesis filters $S_b[k]$ may be chosen to be the same as that of the filters used in the analysis filterbank, $H_b[k]$, or they may be modified to provide perfect reconstruction in the absence of any gain modification (that is, when $G[b,t] = 1$). As one skilled in the art will appreciate, the final processed audio may then be generated through the inverse Fourier or modified cosine transform of $\hat{X}[k,t]$ and overlap-add synthesis.
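The per-block application of band gains in the synthesis filterbank may be sketched as below. The sketch assumes the modified spectrum is the sum over bands of gain times synthesis-filter response times input spectrum, and that the caller supplies the spectrum already delayed by $d$ blocks. The name `apply_band_gains` and the two-band partition are hypothetical.

```python
def apply_band_gains(block_spectrum, band_gains, synthesis_filters):
    """Modified spectrum for one block: sum over b of G[b] * S_b[k] * X[k]."""
    n = len(block_spectrum)
    return [
        sum(g * s[k] for g, s in zip(band_gains, synthesis_filters)) * block_spectrum[k]
        for k in range(n)
    ]


# Hypothetical two-band partition of a 4-bin spectrum; the filters sum to one at every bin.
S = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
X = [1 + 1j, 2 + 0j, 0 + 3j, 4 + 0j]
unchanged = apply_band_gains(X, [1.0, 1.0], S)
shaped = apply_band_gains(X, [0.5, 2.0], S)
```

With unity gains and synthesis filters that sum to one at every bin, the spectrum passes through unchanged, which corresponds to the perfect-reconstruction condition mentioned above.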

Target Specific Loudness

The behavior of arrangements embodying aspects of the invention, such as the examples of FIGS. 1-7, is dictated mainly by the manner in which the target specific loudness $\hat{N}[b,t]$ is calculated. Although the invention is not limited to any particular function or inverse function for calculating target specific loudness, several such functions and suitable applications for them are now described.

Time-Invariant and Frequency-Invariant Function Suitable for Volume Control

A standard volume control adjusts the loudness of an audio signal by applying a wideband gain to the audio. Generally, the gain is coupled to a knob or slider that is adjusted by a user until the loudness of the audio is at the desired level. An aspect of the present invention allows for a more psychoacoustically consistent way of implementing such a control. According to this aspect of the invention, rather than coupling a wideband gain to the volume control, which changes the gain by the same amount across all frequency bands and may therefore change the perceived spectrum, a specific loudness scaling factor is associated with the volume control adjustment. The gain in each of multiple frequency bands is then changed by an amount that takes the human hearing model into account so that, ideally, there is no change in the perceived spectrum. In the context of this aspect of the invention and an exemplary application thereof, "constant" or "time-invariant" is intended to allow the setting of the volume control scale factor to be changed from time to time, for example, by a user. Such "time-invariance" is sometimes referred to as "quasi time-invariant," "quasi-stationary," "piecewise time-invariant," "piecewise stationary," "step-wise time-invariant," and "step-wise stationary." Given such a scale factor, $\alpha$, the target specific loudness may be calculated as the measured specific loudness multiplied by $\alpha$:

$$\hat{N}[b,t] = \alpha\, N[b,t] \tag{16}$$

Because the total loudness $L[t]$ is the sum of the specific loudness $N[b,t]$ across all bands $b$, the above modification also scales the total loudness by a factor of $\alpha$, but it does so in a way that preserves the same perceived spectrum at a particular time for changes in the volume control adjustment. In other words, at any particular time, a change in the volume control adjustment results in a change in perceived loudness but no change in the perceived spectrum of the modified audio relative to that of the unmodified audio. FIG. 13a shows the resulting multiband gains $G[b,t]$ across the bands $b$ at a particular time $t$ when $\alpha = 0.25$ for an audio signal consisting of female speech. For comparison, the wideband gain required to scale the original total loudness by 0.25 (the horizontal line), as in a standard volume control, is also plotted. The multiband gain $G[b,t]$ increases at the low and high frequency bands in comparison with the middle frequency bands. This is consistent with equal-loudness contours, which indicate that the human ear is less sensitive at low and high frequencies.

FIG. 13b shows the specific loudness of the original audio signal, of the wideband gain-modified signal as modified in accordance with a prior-art volume control, and of the multiband gain-modified signal as modified in accordance with this aspect of the invention. The specific loudness of the multiband gain-modified signal is that of the original scaled by 0.25. The specific loudness of the wideband gain-modified signal has a changed spectral shape with respect to that of the original unmodified signal. In this case the specific loudness has, in a relative sense, lost loudness at both the low and the high frequencies. This is perceived as a dulling of the audio as its volume is turned down, a problem that does not occur with the multiband gain-modified signal, whose loudness is controlled by gains derived in the perceptual loudness domain.

In addition to the distortion of the perceived spectral balance associated with a traditional volume control, there is a second problem. A property of loudness perception, which is reflected in the loudness model of Equations 11a-11d, is that the loudness of a signal at any frequency decreases more rapidly as the signal level approaches the threshold of hearing. As a result, the electrical attenuation required to impart the same loudness attenuation to a softer signal is less than that required for a louder signal. A traditional volume control imparts a constant attenuation regardless of signal level, and therefore soft signals become "too soft" relative to louder signals as the volume is turned down. In many cases this results in a loss of detail in the audio. Consider a recording of a castanet in a reverberant room. In such a recording the main "hit" of the castanet is quite loud in comparison with the reverberant echoes, but it is the reverberant echoes that convey the size of the room. As the volume is turned down with a traditional volume control, the reverberant echoes become softer relative to the main hit and eventually disappear below the threshold of hearing, leaving a "dry" sounding castanet.

The loudness-based volume control prevents the disappearance of the softer portions of the recording by boosting the softer reverberant portion relative to the louder main hit, so that the relative loudness between these portions remains constant. To achieve this effect, the multiband gains $G[b,t]$ must vary over time at a rate commensurate with the temporal resolution of human loudness perception. Because the multiband gains $G[b,t]$ are computed as a function of the smoothed excitation $\bar{E}[b,t]$, the selection of the time constants $\lambda_b$ in Equation 8 dictates how quickly the gains may vary across time in each band $b$. As mentioned earlier, these time constants may be selected to be proportional to the integration time of human loudness perception within band $b$, and thus yield the appropriate variation of $G[b,t]$ over time. It should be noted that if the time constants are chosen inappropriately (either too fast or too slow), perceptually objectionable artifacts may be introduced into the processed audio.
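The level dependence of the required attenuation may be illustrated numerically. The sketch below assumes a power-law loudness model with hypothetical constants `G_PSI` and `BETA`, and computes the gain in decibels needed to scale specific loudness by a factor `alpha` for a loud and a soft excitation; both excitations are assumed to lie above the threshold `TQ`.

```python
import math

TQ = 10.0 ** (4.2 / 10.0)  # assumption: threshold in quiet as a linear power ratio
G_PSI, BETA = 0.1, 0.25    # hypothetical model constants


def gain_db_for_scaling(e, alpha):
    """Gain in dB that scales the specific loudness of excitation e (> TQ) by alpha."""
    n = G_PSI * ((e / TQ) ** BETA - 1.0)                       # current specific loudness
    e_target = TQ * (alpha * n / G_PSI + 1.0) ** (1.0 / BETA)  # excitation giving alpha * n
    return 10.0 * math.log10(e_target / e)


loud_db = gain_db_for_scaling(TQ * 1.0e6, 0.25)  # 60 dB above threshold
soft_db = gain_db_for_scaling(TQ * 1.0e1, 0.25)  # 10 dB above threshold
```

Under these assumptions the soft excitation needs noticeably less attenuation than the loud one for the same loudness scaling, which is why a single wideband attenuation makes soft passages "too soft."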

Time-Invariant and Frequency-Variant Function Suitable for Fixed Equalization

In some applications, it may be desirable to apply a fixed perceptual equalization to the audio, in which case the target specific loudness may be calculated by applying a time-invariant but frequency-variant scale factor $\Theta[b]$ as follows:

$$\hat{N}[b,t] = \Theta[b]\, N[b,t]$$

where $\hat{N}[b,t]$ is the target specific loudness, $N[b,t]$ is the specific loudness of the audio signal, $b$ is a measure of frequency, and $t$ is a measure of time. In this case the scaling may vary from band to band. Such an application may be useful, for example, for emphasizing the portion of the spectrum dominated by speech frequencies in order to improve intelligibility.

Frequency-Invariant and Time-Variant Function Suitable for Automatic Gain and Dynamic Range Control

The techniques of automatic gain control and dynamic range control (AGC and DRC) are well known in the audio processing field. In an abstract sense, both techniques measure the level of an audio signal in some manner and then gain-modify the signal by an amount that is a function of the measured level. In the case of AGC, the signal is gain-modified so that its measured level is closer to a user-selected reference level. In the case of DRC, the signal is gain-modified so that the range of the signal's measured level is transformed into some desired range. For example, a user may wish to make the quiet portions of the audio louder and the loud portions quieter. Such a system is described by Robinson and Gundry (Charles Robinson and Kenneth Gundry, "Dynamic Range Control via Metadata," 107th Convention of the AES, Preprint 5028, September 24-27, 1999, New York). Traditional implementations of AGC and DRC generally use a simple measurement of audio signal level, such as smoothed peak or root mean square (rms) amplitude, to drive the gain modification. Such simple measurements correlate to some degree with the perceived loudness of the audio, but aspects of the present invention allow for more perceptually relevant AGC and DRC by driving the gain modifications with a measure of loudness based on a psychoacoustic model. In addition, many traditional AGC and DRC systems apply the gain modification with a wideband gain, thereby incurring the timbral (spectral) distortions of the processed audio mentioned above. Aspects of the present invention, on the other hand, use a multiband gain to shape the specific loudness in a manner that reduces or minimizes such distortions.

Both the AGC and DRC applications employing aspects of the present invention are characterized by a function that transforms or maps an input wideband loudness $L_i[t]$ into a desired output wideband loudness $L_o[t]$, where loudness is measured in perceptual loudness units, such as sone. The input wideband loudness $L_i[t]$ is a function of the specific loudness $N[b,t]$ of the input audio signal. Although it may be the same as the total loudness of the input audio signal, it may also be a temporally smoothed version of that total loudness.

FIGS. 14a and 14b show examples of mapping functions typical for an AGC and a DRC, respectively. Given such a mapping, in which $L_o[t]$ is a function of $L_i[t]$, the target specific loudness may be calculated as

$$\hat{N}[b,t] = \frac{L_o[t]}{L_i[t]}\, N[b,t]. \tag{17}$$

The original specific loudness $N[b,t]$ of the audio signal is simply scaled by the ratio of the desired output wideband loudness to the input wideband loudness to yield an output specific loudness $\hat{N}[b,t]$. For an AGC system, the input wideband loudness $L_i[t]$ should generally be a measure of the long-term total loudness of the audio. This can be achieved by smoothing the total loudness $L[t]$ across time to generate $L_i[t]$.
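The scaling by the ratio of output to input wideband loudness may be sketched as follows. The mapping `agc_mapping`, which moves the loudness part of the way toward a reference on a logarithmic scale, is purely hypothetical and stands in for a curve such as that of FIG. 14a; the input loudness is assumed to be positive.

```python
def agc_mapping(l_in, reference=10.0, strength=0.5):
    """Hypothetical AGC curve: pull l_in toward the reference (strength 1 reaches it)."""
    return l_in * (reference / l_in) ** strength


def target_specific_loudness(n_bands, l_in, mapping):
    """Scale every band by L_o / L_i, where L_o = mapping(L_i) and L_i > 0."""
    ratio = mapping(l_in) / l_in
    return [ratio * n for n in n_bands]


n_bands = [1.0, 2.0, 1.0]  # specific loudness per band for one block
l_in = sum(n_bands)        # here L_i is the unsmoothed total loudness
n_hat = target_specific_loudness(n_bands, l_in, agc_mapping)
```

Because every band is scaled by the same ratio, the relative shape of the specific loudness across bands is preserved while the total moves to the mapped output loudness. For an AGC, `l_in` would typically be a temporally smoothed total loudness rather than the instantaneous sum used here.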

In comparison with an AGC, a DRC system reacts to shorter-term changes in the loudness of a signal, and therefore $L_i[t]$ can simply be made equal to $L[t]$. As a result, the scaling of specific loudness, given by $L_o[t]/L_i[t]$, may fluctuate rapidly, leading to unwanted artifacts in the processed audio. One typical artifact is the audible modulation of a portion of the frequency spectrum by some other, relatively unrelated portion of the spectrum. For example, a classical music selection might contain high frequencies dominated by a sustained string note, while the low frequencies contain a loud, booming timpani. Whenever the timpani hits, the overall loudness $L_i[t]$ increases and the DRC system applies attenuation to the entire specific loudness. The strings are then heard to "pump" down and up in loudness with the timpani. Such cross-pumping in the spectrum is also a problem with traditional wideband DRC systems, and a typical solution involves applying DRC independently to different frequency bands. The system disclosed here is inherently multiband, owing to the filterbank and to the calculation of specific loudness using a perceptual loudness model. Modifying a DRC system to operate in a multiband fashion in accordance with aspects of the present invention is therefore relatively straightforward, and is described next.

Frequency-Variant and Time-Variant Function Suitable for Dynamic Range Control

The DRC system may be extended to operate in a multiband or frequency-variant fashion by allowing the input and output loudness to vary independently with band $b$. These multiband loudness values are denoted $L_i[b,t]$ and $L_o[b,t]$, and the target specific loudness may then be given by

$$\hat{N}[b,t] = \frac{L_o[b,t]}{L_i[b,t]}\, N[b,t], \tag{18}$$

where $L_o[b,t]$ is calculated or mapped from $L_i[b,t]$, as illustrated in FIG. 14b, but independently for each band $b$. The input multiband loudness $L_i[b,t]$ is a function of the specific loudness $N[b,t]$ of the input audio signal. Although it may be the same as the specific loudness of the input audio signal, it may also be a temporally smoothed and/or frequency-smoothed version of that specific loudness.

The most straightforward way of calculating $L_i[b,t]$ is to set it equal to the specific loudness $N[b,t]$. In this case, DRC is performed independently on every band in the auditory filterbank of the perceptual loudness model, rather than in accordance with the same ratio of input to output loudness for all bands, as described above under the heading "Frequency-Invariant and Time-Variant Function Suitable for Automatic Gain and Dynamic Range Control." In a practical embodiment employing 40 bands, the spacing of these bands along the frequency axis is relatively fine in order to provide an accurate measure of loudness. However, applying a DRC scale factor independently to each band may cause the processed audio to sound "torn apart." To avoid this problem, one may choose to calculate $L_i[b,t]$ by smoothing the specific loudness $N[b,t]$ across bands, so that the amount of DRC applied does not vary excessively from one band to the next. This may be achieved by defining a band-smoothing filter $Q(b)$ and then smoothing the specific loudness across all bands $c$ according to the standard convolution sum:

$$L_i[b,t] = \sum_{c} Q(b-c)\, N[c,t]$$

where $N[c,t]$ is the specific loudness of the audio signal and $Q(b-c)$ is the band-shifted response of the smoothing filter. FIG. 15 shows one example of such a band-smoothing filter.
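The convolution across bands may be sketched as follows. The smoothing filter is represented as a mapping from band offset to weight, and the triangular kernel shown is a hypothetical example, not the filter of FIG. 15.

```python
def smooth_across_bands(n_bands, kernel):
    """L_i[b] = sum over c of Q(b - c) * N[c]; Q is zero for offsets absent from kernel."""
    num = len(n_bands)
    return [
        sum(kernel.get(b - c, 0.0) * n_bands[c] for c in range(num))
        for b in range(num)
    ]


Q = {-1: 0.25, 0: 0.5, 1: 0.25}  # hypothetical three-band triangular smoother
N = [0.0, 0.0, 4.0, 0.0, 0.0]    # specific loudness concentrated in one band
L_i = smooth_across_bands(N, Q)
```

Loudness concentrated in a single band is spread to its neighbors, so the DRC ratio applied to adjacent bands changes more gradually than it would without smoothing.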

If the DRC function that calculates $L_o[b,t]$ as a function of $L_i[b,t]$ is fixed for every band $b$, then the change applied to each band of the specific loudness $N[b,t]$ will depend on the spectrum of the audio being processed, even if the overall loudness of the signal remains the same. For example, an audio signal with loud bass and quiet treble may have its bass cut and its treble boosted. A signal with quiet bass and loud treble may have the opposite occur. The net effect is a change in the timbre, or perceived spectrum, of the audio, which may be desirable in certain applications.

However, it may be desirable to perform multiband DRC without modifying the average perceived spectrum of the audio. One may want the average modification in each band to be roughly the same while still allowing the short-term variations of the modification to operate independently from band to band. The desired effect can be achieved by forcing the average behavior of the DRC in each band to be the same as that of some reference behavior. This reference behavior may be chosen as the desired DRC for the wideband input loudness $L_i[t]$. Let the function $L_o[t] = \mathrm{DRC}\{L_i[t]\}$ represent the desired DRC mapping for the wideband loudness. Then let $\bar{L}_i[t]$ represent a time-averaged version of the wideband input loudness, and let $\bar{L}_i[b,t]$ represent a time-averaged version of the multiband input loudness $L_i[b,t]$. The multiband output loudness may then be calculated as

$$L_o[b,t] = \frac{\bar{L}_i[b,t]}{\bar{L}_i[t]}\, \mathrm{DRC}\!\left\{ \frac{\bar{L}_i[t]}{\bar{L}_i[b,t]}\, L_i[b,t] \right\}. \qquad (20)$$

Note that the multiband input loudness is first scaled so that it lies in the same average range as the wideband input loudness. The DRC function designed for the wideband loudness is then applied. Finally, the result is scaled back to the average range of the multiband loudness. This form of multiband DRC retains the benefit of reduced spectral pumping while preserving the average perceived spectrum of the audio.
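The three steps just described (scale up to the wideband range, apply the wideband DRC, scale back) can be sketched as follows. The compressor curve `wideband_drc`, its threshold and ratio, and all function names are hypothetical stand-ins; the text does not fix a particular DRC function.

```python
def wideband_drc(loudness, threshold=1.0, ratio=2.0):
    """Hypothetical DRC curve: above the threshold, loudness grows as a 1/ratio power."""
    if loudness <= threshold:
        return loudness
    return threshold * (loudness / threshold) ** (1.0 / ratio)


def multiband_output_loudness(l_in_band, avg_in_band, avg_in_wide, drc=wideband_drc):
    """Map one band's input loudness to its output loudness.

    l_in_band   -- multiband input loudness for this band and frame
    avg_in_band -- time-averaged multiband input loudness for this band
    avg_in_wide -- time-averaged wideband input loudness
    """
    scale_up = avg_in_wide / avg_in_band      # bring the band into the wideband range
    compressed = drc(scale_up * l_in_band)    # apply the wideband DRC curve
    return compressed / scale_up              # return to the band's own average range


# Two bands with very different average levels, each sitting at its own average.
out_quiet_band = multiband_output_loudness(0.5, 0.5, 4.0)
out_loud_band = multiband_output_loudness(2.0, 2.0, 4.0)
```

When each band sits at its own long-term average, both bands receive the same loudness ratio (here 0.5), so the average spectrum is left unchanged, while short-term deviations in a band are still compressed individually.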

Frequency-Variant and Time-Variant Function Suitable for Dynamic Equalization

Another application of aspects of the present invention is the intentional transformation of the time-varying perceived spectrum of the audio into a target time-invariant perceived spectrum while still preserving the original dynamic range of the audio. This processing may be referred to as dynamic equalization (DEQ). In traditional static equalization, a simple fixed filter is applied to the audio in order to change its spectrum. For example, a fixed bass or treble boost might be applied. Such processing does not take the current spectrum of the audio into account and may therefore be inappropriate for some signals, namely signals that already contain a relatively large amount of bass or treble. With DEQ, the spectrum of the signal is measured and the signal is then dynamically modified in order to transform the measured spectrum into an essentially static desired shape. For aspects of the present invention, this desired shape is specified across the bands of the filter bank and is referred to as $EQ[b]$. In a practical embodiment, the measured spectrum should represent the average spectral shape of the audio, which can be generated by smoothing the specific loudness $N[b,t]$ across time. The smoothed specific loudness may be referred to as $\bar{N}[b,t]$. As with multiband DRC, it may be undesirable for the DEQ modification to vary drastically from one band to the next, and therefore a band-smoothing function can be applied to generate a band-smoothed spectrum $\bar{L}[b,t]$:

$$\bar{L}[b,t] = \sum_{c} Q(b-c)\, \bar{N}[c,t]. \qquad (21)$$

In order to preserve the original dynamic range of the audio, the desired spectrum $EQ[b]$ should be normalized to have the same overall loudness as the measured spectral shape given by $\bar{L}[b,t]$. This normalized spectral shape may be written as $\bar{L}_{EQ}[b,t]$:

$$\bar{L}_{EQ}[b,t] = \left( \frac{\sum_{c} \bar{L}[c,t]}{\sum_{c} EQ[c]} \right) EQ[b]. \qquad (22)$$

Finally, the target specific loudness is calculated as

$$\hat{N}[b,t] = \left( \frac{\bar{L}[b,t]}{\bar{L}_{EQ}[b,t]} \right)^{1-\beta} \frac{\bar{L}_{EQ}[b,t]}{\bar{L}[b,t]}\, N[b,t], \qquad (23)$$

where $\beta$ is a user-specified parameter ranging from zero to one that indicates the degree of DEQ to be applied. Referring to Equation 23, when $\beta = 0$ the original specific loudness is unmodified, and when $\beta = 1$ the specific loudness is scaled by the ratio of the desired spectral shape to the measured spectral shape.
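A minimal sketch of the DEQ target calculation for a single time frame follows, assuming the smoothed spectrum is already available as a list of per-band values. The function and argument names are illustrative only. The two-factor scaling in the text reduces algebraically to the ratio of normalized desired shape to measured shape, raised to the power β, and the code uses that simplified form.

```python
def deq_target(n, l_bar, eq, beta):
    """Target specific loudness for one frame under dynamic equalization.

    n     -- specific loudness N[b,t] per band
    l_bar -- time- and band-smoothed measured spectrum per band
    eq    -- desired spectral shape EQ[b]
    beta  -- degree of DEQ, from 0 (no change) to 1 (full equalization)
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    # Normalize EQ[b] so it has the same overall loudness as the measured spectrum.
    norm = sum(l_bar) / sum(eq)
    l_eq = [norm * e for e in eq]
    target = []
    for nb, lb, le in zip(n, l_bar, l_eq):
        # (lb/le)**(1 - beta) * (le/lb) simplifies to (le/lb)**beta.
        target.append((le / lb) ** beta * nb)
    return target


# Hypothetical 3-band example: bass-heavy audio, treble-heavy desired shape.
measured = [4.0, 2.0, 2.0]
desired_shape = [1.0, 1.0, 2.0]
unchanged = deq_target(measured, measured, desired_shape, 0.0)
equalized = deq_target(measured, measured, desired_shape, 1.0)
```

In this example the fully equalized frame takes the desired shape while the sum over bands stays at 8, which illustrates how the normalization step preserves overall loudness.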

A convenient way to generate the desired spectral shape $EQ[b]$ is for the user to set it equal to $\bar{L}[b,t]$ as measured for some piece of audio whose spectral balance the user finds pleasing. In a practical embodiment, such as the one shown in Figure 16, the user may be provided with a button or other suitable actuator 507 that, when actuated, captures the current measure of the spectral shape of the audio and stores it as a preset (in Target Specific Loudness Preset Capture and Store 506). The preset can later be loaded into $EQ[b]$ when DEQ is enabled (as by Preset Select 508). Figure 16 is a simplified version of Figure 7 in which only a single line is shown to represent the multiple bands from analysis filter bank 100 to synthesis filter bank 110. The Figure 16 example also provides a Dynamic EQ Specific Loudness (SL) Modification 505, which modifies the specific loudness measured by function or device 104 in accordance with dynamic equalization as explained above.

Combined Processing

One may wish to combine all of the previously described processing, including volume control (VC), AGC, DRC, and DEQ, into a single system. Because each of these processes can be expressed as a scaling of the specific loudness, they are all easily combined as follows:

$$\hat{N}[b,t] = \left( \Xi_{VC}[b,t]\, \Xi_{AGC}[b,t]\, \Xi_{DRC}[b,t]\, \Xi_{DEQ}[b,t] \right) N[b,t], \qquad (24)$$

where $\Xi_{*}[b,t]$ represents the scale factors associated with process "*". A single set of gains $G[b,t]$ may then be calculated for the target specific loudness that represents the combined processing.

In some cases, the scale factors of one loudness modification process, or of a combination of them, may fluctuate too rapidly over time and produce artifacts in the resulting processed audio. It may therefore be desirable to smooth some subset of these scale factors. In general, the scale factors from VC and DEQ vary smoothly over time, but smoothing of the combination of the AGC and DRC scale factors may be required. Let the combination of these scale factors be represented by

$$\Xi_{C}[b,t] = \Xi_{AGC}[b,t]\, \Xi_{DRC}[b,t]. \qquad (25)$$

The basic idea behind the smoothing is that the combined scale factors should react quickly when the specific loudness is increasing, and should be smoothed more heavily when the specific loudness is decreasing. This idea corresponds to the well-known practice of using a fast attack and a slow release in the design of audio compressors. The appropriate time constants for smoothing the scale factors can be calculated by smoothing, across time, a band-smoothed version of the specific loudness. First, a band-smoothed version of the specific loudness is computed:

$$L[b,t] = \sum_{c} Q(b-c)\, N[c,t], \qquad (26)$$

where $N[c,t]$ is the specific loudness of the audio signal and $Q(b-c)$ is the band-shifted response of the smoothing filter, as in Equation 19 above.

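The time-smoothed version of this band-smoothed specific loudness is then calculated as

$$\bar{L}[b,t] = \lambda[b,t]\, L[b,t] + \left(1 - \lambda[b,t]\right) \bar{L}[b,t-1], \qquad (27)$$

where the band-dependent smoothing coefficient $\lambda[b,t]$ is given by

$$\lambda[b,t] = \begin{cases} \lambda_{fast}, & L[b,t] > \bar{L}[b,t-1] \\ \lambda_{slow}, & L[b,t] \le \bar{L}[b,t-1]. \end{cases} \qquad (28)$$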

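The smoothed combined scale factors are then calculated as

$$\bar{\Xi}_{C}[b,t] = \lambda_{M}[b,t]\, \Xi_{C}[b,t] + \left(1 - \lambda_{M}[b,t]\right) \bar{\Xi}_{C}[b,t-1], \qquad (29)$$

where $\lambda_{M}[b,t]$ is a band-smoothed version of $\lambda[b,t]$:

$$\lambda_{M}[b,t] = \frac{1}{\sum_{c} Q(c)} \sum_{c} Q(b-c)\, \lambda[c,t]. \qquad (30)$$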

Band smoothing of the smoothing coefficients prevents the time-smoothed scale factors from changing too drastically across bands. The time and band smoothing of the scale factors described above results in processed audio that contains fewer objectionable perceptual artifacts.
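The fast-attack, slow-release smoothing of the combined scale factors can be sketched as below. Several details are assumptions made for illustration rather than values from the patent: the coefficient values `lam_fast` and `lam_slow`, the comparison of the current loudness against the previous smoothed value, the initialization from the first frame, and the treatment of bands beyond the filter bank edges as absent.

```python
def smooth_scale_factors(loudness_frames, scale_frames, q_taps,
                         lam_fast=0.5, lam_slow=0.05):
    """Smooth combined scale factors over time with attack/release behaviour.

    loudness_frames[t][b] -- band-smoothed specific loudness L[b,t]
    scale_frames[t][b]    -- combined scale factors (for example AGC times DRC)
    q_taps                -- odd-length band-smoothing kernel Q(-h)..Q(+h)
    lam_fast, lam_slow    -- hypothetical attack and release coefficients
    """
    num_bands = len(loudness_frames[0])
    half = len(q_taps) // 2
    q_sum = sum(q_taps)
    l_bar = list(loudness_frames[0])   # assumption: initialize from the first frame
    xi_bar = list(scale_frames[0])
    out = [list(xi_bar)]
    for l_now, xi_now in zip(loudness_frames[1:], scale_frames[1:]):
        # Fast coefficient when loudness rises above its smoothed history.
        lam = [lam_fast if l_now[b] > l_bar[b] else lam_slow
               for b in range(num_bands)]
        l_bar = [lam[b] * l_now[b] + (1.0 - lam[b]) * l_bar[b]
                 for b in range(num_bands)]
        # Band-smooth the coefficients so neighbouring bands react alike.
        lam_m = []
        for b in range(num_bands):
            acc = 0.0
            for c in range(num_bands):
                k = b - c
                if -half <= k <= half:
                    acc += q_taps[k + half] * lam[c]
            lam_m.append(acc / q_sum)
        xi_bar = [lam_m[b] * xi_now[b] + (1.0 - lam_m[b]) * xi_bar[b]
                  for b in range(num_bands)]
        out.append(list(xi_bar))
    return out


# Single-band illustration: the same scale-factor step, with loudness rising or falling.
rising = smooth_scale_factors([[1.0], [2.0]], [[1.0], [0.5]], [1.0])
falling = smooth_scale_factors([[2.0], [1.0]], [[1.0], [0.5]], [1.0])
```

In the rising case the smoothed scale factor moves halfway to its new value in one frame, whereas in the falling case it moves only 5% of the way, which mirrors the attack and release behaviour described in the text.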

Noise Compensation

In many audio playback environments there is background noise that interferes with the audio the listener wishes to hear. For example, a listener in a moving vehicle may be playing music over the installed stereo system, and noise from the engine and the road may significantly alter the perception of the music. In particular, in those parts of the spectrum where the energy of the noise is significant relative to the energy of the music, the perceived loudness of the music is reduced. If the noise level is high enough, the music is completely masked.

In accordance with aspects of the present invention, it is desirable to choose the gains $G[b,t]$ such that the specific loudness of the processed audio in the presence of the interfering noise is equal to the target specific loudness $\hat{N}[b,t]$.

To achieve this effect, the concept of partial loudness, as defined by Moore and Glasberg, may be used. Assume that a measurement of the noise by itself and a measurement of the audio by itself can be obtained. Let $E_N[b,t]$ represent the excitation from the noise and let $E_A[b,t]$ represent the excitation from the audio. The combined specific loudness of the audio and the noise is then given by

$$N_{TOT}[b,t] = \Psi\{E_A[b,t] + E_N[b,t]\}, \qquad (31)$$

where, again, $\Psi\{\cdot\}$ represents the nonlinear transformation from excitation to specific loudness. It is assumed that the listener's hearing partitions the combined specific loudness between the partial specific loudness of the audio and the partial specific loudness of the noise in a way that preserves the combined specific loudness:

$$N_{TOT}[b,t] = N_A[b,t] + N_N[b,t]. \qquad (32)$$

The partial specific loudness of the audio, $N_A[b,t]$, is the quantity to be controlled, and it must therefore be solved for. The partial specific loudness of the noise may be approximated as

$$N_N[b,t] = \left( \frac{E_{TN}[b,t]}{E_A[b,t]} \right)^{\kappa} \left( \Psi\{E_N[b,t] + E_{TN}[b,t]\} - \Psi\{E_{TQ}[b]\} \right), \qquad (33)$$

where $E_{TN}[b,t]$ is the masked threshold in the presence of the noise, $E_{TQ}[b]$ is the threshold of hearing in quiet at band $b$, and $\kappa$ is an exponent between zero and one. Combining Equations 31-33 yields an expression for the partial specific loudness of the audio:

$$N_A[b,t] = \Psi\{E_A[b,t] + E_N[b,t]\} - \left( \frac{E_{TN}[b,t]}{E_A[b,t]} \right)^{\kappa} \left( \Psi\{E_N[b,t] + E_{TN}[b,t]\} - \Psi\{E_{TQ}[b]\} \right). \qquad (34)$$

Note that when the excitation of the audio is equal to the masked threshold of the noise ($E_A[b,t] = E_{TN}[b,t]$), the partial specific loudness of the audio is equal to the loudness of a signal at the threshold in quiet, which is the desired result. When the excitation of the audio is much greater than that of the noise, the second term of Equation 34 vanishes, and the specific loudness of the audio is approximately equal to what it would be if the noise were absent. In other words, when the audio is much louder than the noise, the noise is masked by the audio. The exponent $\kappa$ is chosen empirically to give a good fit to data on the loudness of a tone in noise as a function of the signal-to-noise ratio. Moore and Glasberg found that a value of $\kappa = 0.3$ is appropriate. The masked threshold of the noise may be approximated as a function of the noise excitation itself:

$$E_{TN}[b,t] = K[b]\, E_N[b,t] + E_{TQ}[b], \qquad (35)$$

where $K[b]$ is a constant that increases at lower frequency bands. Thus, the partial specific loudness of the audio given by Equation 34 may be represented abstractly as a function of the excitation of the audio and the excitation of the noise:

$$N_A[b,t] = \Phi\{E_A[b,t], E_N[b,t]\}. \qquad (36)$$
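The partial specific loudness expression for a single band can be sketched as follows. The power-law `psi` is only a stand-in for the excitation-to-specific-loudness transformation, which the surrounding text does not spell out, and the numeric values for the band constant and the threshold in quiet are hypothetical.

```python
def psi(excitation, exponent=0.23):
    """Stand-in (assumption) for the nonlinear excitation-to-specific-loudness map."""
    return excitation ** exponent


def masked_threshold(e_noise, k_band, e_quiet):
    """Masked threshold approximated from the noise excitation (Equation 35)."""
    return k_band * e_noise + e_quiet


def partial_specific_loudness(e_audio, e_noise, k_band=1.0, e_quiet=1.0, kappa=0.3):
    """Partial specific loudness of the audio in noise (Equation 34), one band."""
    e_tn = masked_threshold(e_noise, k_band, e_quiet)
    total = psi(e_audio + e_noise)
    noise_part = (e_tn / e_audio) ** kappa * (psi(e_noise + e_tn) - psi(e_quiet))
    return total - noise_part


# Case 1: audio excitation exactly at the masked threshold of the noise.
e_n = 10.0
e_tn = masked_threshold(e_n, 1.0, 1.0)
at_threshold = partial_specific_loudness(e_tn, e_n)

# Case 2: audio excitation far above the noise.
far_above = partial_specific_loudness(1e9, e_n)
```

In the first case the result matches the loudness of a signal at the threshold in quiet, and in the second case it is nearly indistinguishable from the combined specific loudness, in line with the two limiting cases noted in the text.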

A modified gain solver may then be used to calculate the gains $G[b,t]$ such that the partial specific loudness of the processed audio in the presence of the noise is equal to the target specific loudness:

$$\Phi\{G^2[b,t]\, E_A[b,t],\; E_N[b,t]\} = \hat{N}[b,t]. \qquad (37)$$
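One way such a gain solver could be realized is a bisection search on the logarithm of the gain. This is an assumption: the text does not say how the solver finds the gain. The sketch relies on the partial specific loudness increasing with the audio excitation, and `phi_example` reuses a hypothetical power-law loudness model purely for demonstration.

```python
import math


def solve_gain(phi, e_audio, e_noise, target,
               g_min=1e-3, g_max=1e3, iterations=60):
    """Find G with phi(G**2 * e_audio, e_noise) == target by bisection on log G.

    Assumes phi increases monotonically with its first argument and that the
    solution lies between g_min and g_max (both hypothetical search limits).
    """
    lo, hi = math.log(g_min), math.log(g_max)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g = math.exp(mid)
        if phi(g * g * e_audio, e_noise) < target:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def phi_example(e_audio, e_noise, k_band=1.0, e_quiet=1.0, kappa=0.3, p=0.23):
    """Hypothetical partial specific loudness with a power-law loudness model."""
    e_tn = k_band * e_noise + e_quiet
    return (e_audio + e_noise) ** p - (e_tn / e_audio) ** kappa * (
        (e_noise + e_tn) ** p - e_quiet ** p)


# Basic noise compensation: the target is the loudness the audio has without noise.
e_a, e_n = 1000.0, 100.0
target = phi_example(e_a, 0.0)
gain = solve_gain(phi_example, e_a, e_n, target)
```

The resulting gain exceeds one, so the audio in this band is boosted until its partial specific loudness in the noise matches the loudness it would have had in quiet.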

Figure 17 shows the system of Figure 7 with the original gain solver 106 replaced by the noise compensation gain solver 206 described above (note that the multiple vertical lines between blocks, which represent the multiple bands of the filter bank, have been replaced by single lines to simplify the drawing). In addition, the figure shows a measurement of the noise excitation (obtained by analysis filter bank 200, transmission filter 201, excitation 202, and smoothing 203, operating in a manner corresponding to blocks 100, 101, 102, and 103) being fed into the new gain solver 206 together with the audio excitation (from smoothing 103) and the target specific loudness (from SL modification 105).

In its most basic mode of operation, SL modification 105 in Figure 17 may simply set the target specific loudness $\hat{N}[b,t]$ equal to the original specific loudness of the audio, $N[b,t]$. In other words, the SL modification applies a frequency-invariant scaling of the specific loudness of the audio signal by a scale factor $\alpha$, where $\alpha = 1$. With the arrangement of Figure 17, the gains are calculated so that the perceived loudness spectrum of the processed audio in the presence of the noise is equal to the loudness spectrum of the audio in the absence of the noise. Alternatively, any one or a combination of the previously described techniques for computing the target specific loudness as a function of the original specific loudness, including VC, AGC, DRC, and DEQ, may be used in conjunction with the noise-compensating loudness modification system.

In a practical embodiment, the noise measurement may be obtained from a microphone placed in or near the environment in which the audio will be played. Alternatively, a predetermined set of template noise excitations that approximate the expected noise spectrum under various conditions may be used. For example, the noise in a vehicle may be analyzed in advance at various driving speeds and then stored as a lookup table of noise excitation versus speed. The noise excitation fed into gain solver 206 in Figure 17 may then be approximated from this lookup table as the speed of the vehicle varies.
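A speed-indexed lookup table of noise excitations could be queried as sketched below. Linear interpolation between stored speeds is an assumption made here, since the text only says the excitation is approximated from the table, and the table contents are invented for illustration.

```python
from bisect import bisect_right


def noise_excitation_for_speed(table, speed):
    """Approximate the per-band noise excitation at a given vehicle speed.

    table -- list of (speed, [excitation per band]) pairs sorted by speed
    Speeds outside the table are clamped to the nearest stored entry.
    """
    speeds = [s for s, _ in table]
    if speed <= speeds[0]:
        return list(table[0][1])
    if speed >= speeds[-1]:
        return list(table[-1][1])
    hi = bisect_right(speeds, speed)
    lo = hi - 1
    s0, e0 = table[lo]
    s1, e1 = table[hi]
    w = (speed - s0) / (s1 - s0)
    # Assumption: interpolate linearly between the two neighbouring entries.
    return [(1.0 - w) * a + w * b for a, b in zip(e0, e1)]


# Hypothetical two-band table measured at three speeds (arbitrary units).
noise_table = [
    (0.0, [1.0, 1.0]),
    (50.0, [10.0, 4.0]),
    (100.0, [30.0, 8.0]),
]
excitation_at_75 = noise_excitation_for_speed(noise_table, 75.0)
```

The interpolated excitation would then take the place of a live microphone measurement as the noise input to the gain solver.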

Implementation

The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems, each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high-level procedural, logical, or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid-state memory or media, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described above may be order-independent, and thus can be performed in an order different from that described.

2 ... Modify audio signal
4, 4', 4'', 4''' ... Generate modification parameters
6 ... Calculate target specific loudness
8 ... Calculate specific loudness
10, 10', 10'', 10''' ... Calculate modification parameters
12 ... Calculate approximation of the specific loudness of the unmodified audio
14 ... Calculate approximation of the target specific loudness
16 ... Store or transmit
100 ... Analysis filter bank
101 ... Transmission filter
102 ... Excitation
103 ... Smoothing
104 ... Specific loudness (SL)
105 ... SL modification
106 ... Gain solver
107 ... Optional smoothing
108 ... Combiner
109 ... Delay
110 ... Synthesis filter bank
200 ... Analysis filter bank
201 ... Transmission filter
202 ... Excitation
203 ... Smoothing
206 ... Noise compensation gain solver
505 ... Dynamic EQ specific loudness modification
506 ... Target specific loudness preset capture and store
507 ... User capture selection
508 ... User preset selection


Claims

A method for controlling a particular loudness characteristic of an audio signal, wherein the particular loudness characteristic is either a specific loudness, which is a measure of perceptual loudness as a function of frequency and time, or a partial specific loudness, which is a measure of the perceptual loudness of the signal in the presence of a secondary interfering signal as a function of frequency and time, the method comprising the steps of: a) calculating a target specific loudness as a function of the audio signal; b) deriving frequency-varying and time-varying modification parameters usable for modifying the audio signal in order to reduce the difference between its particular loudness characteristic and the target specific loudness; and c) applying the modification parameters to the audio signal in order to reduce the difference between its particular loudness characteristic and the target specific loudness.

The method of claim 1, wherein the step of calculating a target specific loudness as a function of the audio signal, or the step of deriving frequency-varying and time-varying modification parameters, includes a step of explicitly calculating the specific loudness and/or the partial specific loudness.

The method of claim 1, wherein the step of calculating a target specific loudness as a function of the audio signal, or the step of deriving frequency-varying and time-varying modification parameters, includes a processing step that implicitly calculates the specific loudness and/or the partial specific loudness.

The method of claim 3, wherein the processing step employs a lookup table such that the specific loudness and/or the partial specific loudness is inherently determined by the processing step.

The method of claim 3, wherein the specific loudness and/or the partial specific loudness is inherently determined by a closed-form mathematical expression employed in the processing step.

The method of any one of claims 1 to 5, wherein the function of the audio signal used to calculate the target specific loudness comprises one or more scalings of the audio signal.

The method of claim 6, wherein the one or more scalings include a time-varying and frequency-varying scale factor $\Xi[b,t]$ that scales the specific loudness according to the relationship $$\hat{N}[b,t] = \Xi[b,t]\, N[b,t],$$ where $\hat{N}[b,t]$ is the target specific loudness, $N[b,t]$ is the specific loudness of the audio signal, $b$ is a measure of frequency, and $t$ is a measure of time.

The method of claim 7, wherein the one or more scalings are determined at least in part by a ratio of a desired multiband loudness to the multiband loudness of the audio signal.

The method of claim 8, wherein the one or more scalings can be expressed as $L_o[b,t]/L_i[b,t]$ in the relationship $$\hat{N}[b,t] = \frac{L_o[b,t]}{L_i[b,t]}\, N[b,t],$$ where $N[b,t]$ is the specific loudness of the audio signal, $L_o[b,t]$ is the desired multiband loudness, $L_i[b,t]$ is the multiband loudness of the audio signal, and $\hat{N}[b,t]$ is the target specific loudness.

The method of claim 9, wherein $L_o[b,t]$ is a function of $L_i[b,t]$.

The method of claim 10, wherein $L_o[b,t]$ as a function of $L_i[b,t]$ can be expressed as $L_o[b,t] = \mathrm{DRC}\{L_i[b,t]\}$, where $\mathrm{DRC}\{\cdot\}$ denotes a dynamic range function that maps $L_i[b,t]$ to $L_o[b,t]$.

The method of claim 9, wherein $L_i[b,t]$ is a time-smoothed and/or frequency-smoothed version of the specific loudness of the audio signal.

The method of any one of claims 8 to 12, wherein the method is usable as a dynamic range control in which the modifying or the applying of the modification parameters produces an audio signal, or the target specific loudness corresponds to an audio signal, in which the perceived audio spectrum, or the perceived audio spectrum in the presence of an interfering signal, may vary for different values of specific loudness or partial specific loudness scaling.

The method of claim 13, wherein the dynamic range function controls the loudness in each frequency band such that the short-term changes applied to the respective bands vary independently from band to band, while the average change applied to each band is substantially the same for all bands.

The method of claim 14, wherein $L_o[b,t]$ as a function of $L_i[b,t]$ can be expressed as $$L_o[b,t] = \frac{\bar{L}_i[b,t]}{\bar{L}_i[t]}\, \mathrm{DRC}\!\left\{ \frac{\bar{L}_i[t]}{\bar{L}_i[b,t]}\, L_i[b,t] \right\},$$ where $L_o[t] = \mathrm{DRC}\{L_i[t]\}$ represents a mapping of the total loudness of the audio signal to a desired total loudness, $\bar{L}_i[t]$ represents a time-averaged version of the wideband loudness $L_i[t]$ of the audio signal, and $\bar{L}_i[b,t]$ represents a time-averaged version of the multiband loudness $L_i[b,t]$ of the audio signal.

The method of claim 14 or 15, wherein the method is usable as a dynamic range control in which the modifying or the applying of the modification parameters produces an audio signal, or the target specific loudness corresponds to an audio signal, in which the perceived audio spectrum, or the perceived audio spectrum in the presence of an interfering signal, remains substantially the same as the perceived audio spectrum of the audio signal for different values of specific loudness or partial specific loudness scaling.

The method of claim 7, wherein the specific loudness is scaled by a ratio of a measure of a desired spectral shape to a measure of a spectral shape of the audio signal.

The method of claim 17, wherein the method converts the perceived spectrum of the audio signal from a time-varying perceived spectrum to a substantially time-invariant perceived spectrum.

The method of claim 17 or 18, wherein the one or more scalings can be expressed by the relationship $$\hat{N}[b,t] = \left( \frac{\bar{L}[b,t]}{\bar{L}_{EQ}[b,t]} \right)^{1-\beta} \frac{\bar{L}_{EQ}[b,t]}{\bar{L}[b,t]}\, N[b,t],$$ where $\bar{L}[b,t]$ is a time-smoothed multiband loudness of the audio signal and $\bar{L}_{EQ}[b,t]$ is the desired spectrum $EQ[b]$ normalized to have the same wideband loudness as the multiband loudness $\bar{L}[b,t]$, so that $\bar{L}_{EQ}[b,t]$ can be expressed as $$\bar{L}_{EQ}[b,t] = \left( \frac{\sum_{c} \bar{L}[c,t]}{\sum_{c} EQ[c]} \right) EQ[b],$$ where $N[b,t]$ is the specific loudness of the audio signal, $\hat{N}[b,t]$ is the target specific loudness, and $\beta$ is a parameter with a range bounded by and including zero and one, the parameter controlling the degree of scaling.

The method of claim 19, wherein the parameter $\beta$ is selected or controlled by a source external to the method.

The method of claim 20, wherein the source is a user of the method.

The method of any one of claims 17 to 21, wherein the method is usable as a dynamic equalizer in which the modifying or the applying of the modification parameters produces an audio signal, or the target specific loudness corresponds to an audio signal, in which the perceived audio spectrum, or the perceived audio spectrum in the presence of an interfering signal, may vary for different values of specific loudness or partial specific loudness scaling.

The method of any one of claims 7 to 22, wherein the multiband loudness of the audio signal is approximated by dividing the audio signal into a plurality of primary frequency bands and performing frequency smoothing across a plurality of those bands.

The method of claim 23, wherein a band-smoothed version $L[b,t]$ of the multiband loudness at a particular band $b$ can be expressed as a convolution sum across all bands $c$: $$L[b,t] = \sum_{c} Q(b-c)\, N[c,t],$$ where $N[c,t]$ is the specific loudness of the audio signal and $Q(b-c)$ is the band-shifted response of the smoothing filter.

The method of claim 6, wherein the one or more scalings include a time-varying, frequency-invariant scale factor $\Phi[t]$ that scales the specific loudness according to the relationship $$\hat{N}[b,t] = \Phi[t]\, N[b,t],$$ where $\hat{N}[b,t]$ is the target specific loudness, $N[b,t]$ is the specific loudness of the audio signal, $b$ is a measure of frequency, and $t$ is a measure of time.

The method of claim 25, wherein the one or more scalings are determined at least in part by a ratio of a desired wideband loudness to the wideband loudness of the audio signal.

The method of claim 25 or 26, wherein the scaling, as a function of the specific loudness of the audio signal, can be expressed as $L_o[t]/L_i[t]$ in the relationship $$\hat{N}[b,t] = \frac{L_o[t]}{L_i[t]}\, N[b,t],$$ where $N[b,t]$ is the specific loudness of the audio signal, $L_o[t]$ is the desired wideband loudness, $L_i[t]$ is the wideband loudness of the audio signal, and $\hat{N}[b,t]$ is the target specific loudness.

The method of claim 27, wherein $L_o[t]$ is a function of $L_i[t]$.

The method of claim 28, wherein $L_o[t]$ as a function of $L_i[t]$ can be expressed as $L_o[t] = \mathrm{DRC}\{L_i[t]\}$, where $\mathrm{DRC}\{\cdot\}$ denotes a dynamic range function that maps $L_i[t]$ to $L_o[t]$.

The method of claim 27, wherein $L_i[t]$ is a time-smoothed version of the total loudness of the audio signal.

The method of claim 27, wherein $L_i[t]$ is a measure of the long-term loudness of the audio signal.

The method of claim 27, wherein $L_i[t]$ is a measure of the short-term loudness of the audio signal.

The method of any one of claims 25 to 32, wherein the method is usable as an automatic gain control or a dynamic range control in which the modifying or the applying of the modification parameters produces an audio signal, or the target specific loudness corresponds to an audio signal, in which the perceived audio spectrum, or the perceived audio spectrum in the presence of an interfering signal, remains substantially the same as the perceived audio spectrum of the audio signal for different values of specific loudness or partial specific loudness scaling.

The method of any one of claims 7 to 33, wherein the time-varying and frequency-varying scale factor is a function of the audio signal or of a measure of the audio signal.

According to the method of claim 6, wherein the one or more scale adjustments comprise a non-time variable frequency scaling coefficient Θ[ b ] adjusting the specific loudness by the following relationship scale: among them [ b , t ] is the target specific loudness, N [ b , t ] is the specific loudness of the audio signal, b is the frequency measurement, and t is the time measurement.

The method of claim 35, wherein the modifying or the deriving step comprises storing the scale factor Θ[ b ].

According to the method of claim 35, wherein the scale factor Θ[ b ] is a source received from outside the method.

The method of claim 6, wherein the one or more scalings comprise scaling of the specific loudness of the audio signal by a time-invariant, frequency-invariant scale factor $\alpha$, as in the relationship $\hat{N}[b,t] = \alpha\,N[b,t]$, where $\hat{N}[b,t]$ is the target specific loudness, $N[b,t]$ is the specific loudness of the audio signal, $b$ is a measure of frequency, and $t$ is a measure of time.
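The two time-invariant scalings recited above, a per-band factor Θ[b] and a single factor α, can be contrasted in a short sketch. The function names and the list representation of one frame are assumptions for illustration.

```python
def scale_by_band(n_bands, theta):
    """Target specific loudness with a fixed, per-band factor theta[b]."""
    if len(n_bands) != len(theta):
        raise ValueError("need one scale factor per band")
    return [th * n for th, n in zip(theta, n_bands)]


def scale_uniform(n_bands, alpha):
    """Target specific loudness with one fixed factor alpha for all bands."""
    return [alpha * n for n in n_bands]


frame = [1.0, 2.0, 4.0]                               # N[b, t] for one frame
shaped = scale_by_band(frame, [2.0, 1.0, 0.5])        # changes spectral shape
quieter = scale_uniform(frame, 0.5)                   # keeps spectral shape
```

The per-band factor reshapes the loudness across frequency, whereas the single factor changes the overall level while leaving the relative loudness of the bands unchanged.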

The method of claim 38, wherein the modifying or the deriving step comprises storing the scale factor $\alpha$.

The method of any one of claims 35 to 38, wherein the method is usable as a volume control, in which the modifying or the application of the modification parameters produces, or the target specific loudness corresponds to, an audio signal in which the perceived audio spectrum, or the perceived audio spectrum in the presence of an interfering signal, remains substantially the same as the perceived audio spectrum of the audio signal for different values of the specific loudness scaling or partial specific loudness scaling.

The method of any one of claims 1 to 40, wherein the modifying step, the application of the modification parameters, or the deriving step explicitly calculates (a) the specific loudness, and/or (b) the partial specific loudness, and/or (c) the target specific loudness.

The method of any one of claims 1 to 40, wherein the modifying step, the application of the modification parameters, or the deriving step implicitly calculates (a) the specific loudness, and/or (b) the partial specific loudness, and/or (c) the target specific loudness.

The method of claim 42, wherein the modifying step, the application of the modification parameters, the deriving step of the modification parameters, or the generating step employs a lookup table that inherently determines (a) the specific loudness, and/or (b) the partial specific loudness, and/or (c) the target specific loudness.
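A lookup table can stand in for an explicit loudness calculation: the table entries are prepared in advance, so no loudness value is computed at run time. The sketch below assumes a table from band level in dB to gain in dB with linear interpolation; every table value and name is invented for illustration and does not come from the claims.

```python
from bisect import bisect_right

# Hypothetical table, assumed to have been precomputed offline.
LEVELS_DB = [0.0, 20.0, 40.0, 60.0, 80.0]   # band level in dB (made up)
GAINS_DB = [12.0, 9.0, 6.0, 3.0, 0.0]       # gain in dB (made up)


def lookup_gain_db(level_db):
    """Return the interpolated gain for a band level, clamped at the ends."""
    if level_db <= LEVELS_DB[0]:
        return GAINS_DB[0]
    if level_db >= LEVELS_DB[-1]:
        return GAINS_DB[-1]
    hi = bisect_right(LEVELS_DB, level_db)   # first entry above level_db
    lo = hi - 1
    frac = (level_db - LEVELS_DB[lo]) / (LEVELS_DB[hi] - LEVELS_DB[lo])
    return GAINS_DB[lo] + frac * (GAINS_DB[hi] - GAINS_DB[lo])


gain = lookup_gain_db(30.0)
```

In this sketch the loudness model is present only implicitly, in however the table entries were originally chosen.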

The method of claim 43, wherein the modifying step, the application of the modification parameters, the deriving step of the modification parameters, or the generating step employs a closed-form mathematical expression that inherently determines (a) the specific loudness, and/or (b) the partial specific loudness, and/or (c) the target specific loudness.

The method of any one of claims 1 to 44, wherein the modification parameters are temporally smoothed.

The method of claim 45, wherein the modification parameters comprise a plurality of amplitude scale factors relating to frequency bands of the audio signal.
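As a sketch of per-band amplitude scaling with temporal smoothing, the following smooths each band's gain across frames and then applies the smoothed gains to band amplitudes. The one-pole smoother, the coefficient, and the function names are assumptions; the claims do not prescribe a particular smoothing method.

```python
def smooth_band_gains(gain_frames, coeff=0.5):
    """Smooth gains over time; gain_frames[t][b] is the gain of band b at frame t."""
    smoothed = []
    state = None
    for frame in gain_frames:
        if state is None:
            state = list(frame)  # start from the first frame's gains
        else:
            state = [coeff * s + (1.0 - coeff) * g for s, g in zip(state, frame)]
        smoothed.append(list(state))
    return smoothed


def apply_band_gains(band_amplitudes, gains):
    """Multiply each band amplitude by its gain."""
    return [a * g for a, g in zip(band_amplitudes, gains)]


gains = smooth_band_gains([[1.0, 1.0], [0.5, 2.0]])
out = apply_band_gains([2.0, 4.0], gains[1])
```

Smoothing the gains rather than switching them abruptly between frames limits frame-to-frame jumps in the applied scaling.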

The method of claim 46, wherein at least some of the plurality of amplitude scale factors are time-varying.

The method of claim 45, wherein the modification parameters comprise a plurality of filter coefficients for controlling one or more filters.

The method of claim 48, wherein at least some of the one or more filters are time-varying, and at least some of the filter coefficients are time-varying.

The method of any one of claims 1 to 49, wherein the modifying step, the application of the modification parameters, the deriving step of the modification parameters, or the generating step is based on one or more of the following: a measure of an interfering audio signal; a target specific loudness; an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal; the specific loudness of the unmodified audio signal; and an approximation of the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.

The method of any one of claims 1 to 49, wherein the modifying step, the application of the modification parameters, or the deriving step of the modification parameters derives the modification parameters at least in part from one or more of the following: a measure of an interfering audio signal; a target specific loudness; an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness or partial specific loudness of the modified audio signal; the specific loudness of the unmodified audio signal; and an approximation of the target specific loudness derived from the specific loudness or partial specific loudness of the modified audio signal.

The method of claim 51, wherein the modifying step, the application of the modification parameters, or the deriving step of the modification parameters derives the modification parameters at least in part from: (1) one of the following: a target specific loudness, and an estimate of the specific loudness of the unmodified audio signal derived from the specific loudness of the modified audio signal; and (2) one of the following: the specific loudness of the unmodified audio signal, and an approximation of the target specific loudness derived from the specific loudness of the modified audio signal.

The method of claim 51, wherein the modifying step, the application of the modification parameters, or the deriving step of the modification parameters derives the modification parameters at least in part from: (1) a measure of an interfering audio signal; (2) one of the following: a target specific loudness, and an estimate of the specific loudness of the unmodified audio signal derived from the partial specific loudness of the modified audio signal; and (3) one of the following: the specific loudness of the unmodified audio signal, and an approximation of the target specific loudness derived from the partial specific loudness of the modified audio signal.

The method of claim 52, wherein the method employs a feed-forward configuration in which the specific loudness is derived from the audio signal, and wherein the target specific loudness is received from a source external to the method or, when the modifying step, the application of the modification parameters, or the deriving step of the modification parameters includes storing a target specific loudness, from storage.

The method of claim 52, wherein the method employs a hybrid feed-forward/feedback configuration in which an approximation of the target specific loudness is derived from the modified audio signal, and wherein the target specific loudness is received from a source external to the method or, when the modifying step, the application of the modification parameters, or the deriving step of the modification parameters includes storing a target specific loudness, from storage.

The method of claim 52, wherein the modifying step, the application of the modification parameters, or the deriving step of the modification parameters comprises one or more processes for obtaining, explicitly or implicitly, the target specific loudness, wherein the one or more processes calculate, explicitly or implicitly, a function of the audio signal or of a measure of the audio signal.

The method of claim 56, wherein the method employs a feed-forward configuration in which the specific loudness and the target specific loudness are derived from the audio signal, the derivation of the target specific loudness employing the function of the audio signal or of the measure of the audio signal.
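A feed-forward derivation can be sketched for a single band under strong simplifying assumptions. Here a toy power-law stand-in replaces the psychoacoustic loudness model, and the target is taken to be a fixed multiple α of the specific loudness; the exponent, the names, and the closed-form gain are all hypothetical and serve only to show the data flow from the unmodified signal to a gain.

```python
EXPONENT = 0.3  # hypothetical exponent of the toy loudness model


def toy_specific_loudness(energy):
    """Toy stand-in for a loudness model: loudness = energy ** EXPONENT."""
    return energy ** EXPONENT


def feed_forward_gain(energy, alpha):
    """Derive an amplitude gain from the unmodified band energy only."""
    n = toy_specific_loudness(energy)   # specific loudness from the audio
    if n == 0.0:
        return 1.0                      # silent band: leave unchanged
    target = alpha * n                  # target as a function of the audio
    # Solve (g**2 * energy) ** EXPONENT == target for the gain g.
    return (target / n) ** (1.0 / (2.0 * EXPONENT))


g = feed_forward_gain(4.0, alpha=0.5)
modified_loudness = toy_specific_loudness(g * g * 4.0)
```

With a realistic loudness model the gain would generally not have such a simple closed form and might instead be found iteratively or from a table.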

The method of claim 56, wherein the method employs a hybrid feed-forward/feedback configuration in which an approximation of the target specific loudness is derived from the modified audio signal and the target specific loudness is derived from the audio signal, the derivation of the target specific loudness employing the function of the audio signal or of the measure of the audio signal.

The method of claim 52, wherein the modifying step, the application of the modification parameters, or the deriving step of the modification parameters comprises one or more processes for obtaining, explicitly or implicitly, an estimate of the specific loudness of the unmodified audio signal in response to the modified audio signal, wherein the one or more processes calculate, explicitly or implicitly, the inverse of a function of the audio signal or of a measure of the audio signal.

The method of claim 59, wherein the method employs a feedback configuration in which an estimate of the specific loudness of the unmodified audio signal and an approximation of the target specific loudness are derived from the modified audio signal, the estimate of the specific loudness being calculated using the inverse of the function of the audio signal or of the measure of the audio signal.
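The role of the inverse function in a feedback configuration can be shown with the simplest possible case. The sketch assumes that the target is the specific loudness multiplied by a fixed factor α, so the inverse is a division by α; the function names and this choice of function are assumptions, not the claimed method.

```python
def target_from_specific(n_bands, alpha):
    """Forward function F: target specific loudness from specific loudness."""
    return [alpha * n for n in n_bands]


def estimate_unmodified(n_modified, alpha):
    """Inverse function F^-1: estimate the unmodified specific loudness.

    n_modified is the specific loudness measured on the modified audio,
    which in a feedback configuration approximates the target.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive to be invertible")
    return [n / alpha for n in n_modified]


measured = [0.5, 1.0]                        # loudness of the modified audio
estimate = estimate_unmodified(measured, alpha=0.5)
round_trip = target_from_specific(estimate, alpha=0.5)
```

The estimate is only as good as the assumption that the modified audio already matches the target, which is why the loop is described as producing an estimate and an approximation rather than exact values.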

The method of claim 59, wherein the method employs a hybrid feed-forward/feedback configuration in which the specific loudness is derived from the audio signal and the estimate of the specific loudness of the unmodified audio signal is derived from the modified audio signal, the derivation of the estimate being calculated using the inverse of the function of the audio signal or of the measure of the audio signal.

The method of any one of claims 1 to 61, wherein step a) further comprises transmitting or storing the audio signal and the calculated target specific loudness or a representation thereof, and step b) further comprises, before the frequency- and time-varying modification parameters are derived, receiving the transmitted, or reproducing the stored, audio signal and target specific loudness or representation thereof, the execution of step a) being temporally and/or spatially separated from steps b) and c).

A device for performing all the steps of the method of any one of claims 1 to 61, the device comprising: a computing means for calculating a target specific loudness as a function of the audio signal; a deriving means for deriving frequency- and time-varying modification parameters usable for modifying the audio signal so as to reduce the difference between its specific loudness characteristic and the target specific loudness; and an applying means for applying the modification parameters to the audio signal to reduce the difference between its specific loudness characteristic and the target specific loudness.

A computer program stored on a computer readable medium, adapted to cause a computer to perform all of the steps of the method of any one of claims 1 to 61.