KR100460159B1

KR100460159B1 - Audio signal encoding method and apparatus

Info

Publication number: KR100460159B1
Application number: KR1019970010242A
Authority: KR
Inventors: James David Johnston; Deepen Sinha
Original assignee: Lucent Technologies Inc.
Priority date: 1996-03-19
Filing date: 1997-03-19
Publication date: 2005-02-23
Anticipated expiration: 2017-03-19
Also published as: KR970067255A

Abstract

The audio coding technique of the present invention uses a signal adaptive switched filter bank having a first filter bank and a wavelet filter bank. The switched filter bank switches between the first filter bank and the wavelet filter bank to filter the input signal as a function of the stationarity of the input signal. The first filter bank is used to filter stationary signal components, and the wavelet filter bank is used to filter non-stationary signal components (e.g., attacks).

Description

Audio signal encoding method and apparatus

This application claims priority of U.S. Provisional Application No. 60/014,725, filed March 19, 1996.

The present invention relates to the processing of signals and, in particular, to the encoding of audio signals using subband coding schemes such as perceptual audio coding.

Consumers, producers, studios, and laboratories have a strong demand for products that store, process, and communicate high-quality audio signals. Compression of audio signals to very low bit rates is highly desirable for many modern digital audio applications, such as digital audio tape, compact discs, and multimedia applications. The compression techniques used in these applications can handle high-quality signals; however, that performance is often achieved at the cost of large data storage capacity and transmission bandwidth.

Much work in the field of compression has sought to reduce the data storage and transmission bandwidth requirements of coded digital audio. Such compression techniques remove irrelevant information from source signals by using a model of the human perceptual system. One such perceptual audio coding ("PAC") technique is disclosed, for example, in U.S. Pat. No. 5,285,498 to J. D. Johnston, entitled "Method and Apparatus for Coding Audio Signals Based on Perceptual Model," issued February 8, 1994, which is incorporated herein by reference (hereinafter the "Johnston" patent).

Perceptual audio coding, as described for example in the Johnston patent, is a technique for lowering the bit rate (or the total number of bits) required to represent audio signals. PAC uses the short-term energy distribution of the signal as a function of frequency. It is known that a set of thresholds representing just-perceptible noise levels can be computed from this energy distribution. The coarseness of the quantization used to represent the components of the desired signal is then chosen, among other things, so that the quantization noise introduced by the coding itself does not rise above these noise thresholds. The introduced noise is therefore masked in the perceptual process. Masking occurs because the human perceptual mechanism cannot distinguish between two signal components (one belonging to the signal and one belonging to the noise) that occupy the same spectral, temporal, or spatial locality.
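The threshold rule above can be made concrete with the textbook model of a uniform quantizer, whose noise power is roughly step²/12 for a busy signal. The sketch below is an illustration under that assumption, not the Johnston patent's quantizer/rate loop: it picks the largest step whose noise power equals a given masking threshold. The helper names and the test data are hypothetical.

```python
import numpy as np

def step_for_threshold(threshold_power: float) -> float:
    """Largest uniform quantizer step whose noise power equals the threshold.

    Assumes the textbook model of uniform quantization noise: power = step**2 / 12.
    """
    return float(np.sqrt(12.0 * threshold_power))

def quantize(x: np.ndarray, step: float) -> np.ndarray:
    """Mid-tread uniform quantizer: round to the nearest multiple of step."""
    return step * np.round(x / step)

rng = np.random.default_rng(0)
coeffs = rng.normal(scale=10.0, size=100_000)   # hypothetical band coefficients
threshold = 0.5                                  # hypothetical masking threshold (power)
step = step_for_threshold(threshold)
noise = quantize(coeffs, step) - coeffs
print(f"step={step:.3f}, noise power={noise.var():.3f}, threshold={threshold}")
```

A practical coder would normally leave some margin below the threshold; the sketch sets the noise exactly at it to keep the relationship visible.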

Recently, a number of perceptual audio coders have been developed that aim to provide compression to bit rates in the range of 128 to 256 kbps (i.e., compression factors of 6 to 12). Typically, such coders use analysis filter banks that divide the input signal into its frequency components. These components are then quantized using a perceptual model based on the masking properties of human hearing, as described above. The Johnston patent, for example, describes a PAC approach that uses a high-frequency-resolution filter bank, known as a Modified Discrete Cosine Transform ("MDCT") filter bank, to split the signal into frequency components. Such a high-frequency-resolution MDCT filter bank (e.g., one with 1024 subbands or frequency lines) yields a very compact representation of so-called stationary signals (e.g., instrumental music and most vocal music). However, non-stationary audio signals containing so-called transients or sharp attacks (e.g., castanets or triangles) cannot be represented compactly with a high-frequency-resolution MDCT filter bank, because a compact representation of such signals requires higher time resolution at the higher frequencies. In addition, using the MDCT for non-stationary signal components leads to poor-quality coded signals.
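As a quick check on the quoted compression factors, the arithmetic below assumes 48 kHz, 16-bit stereo PCM. The sample rate and word length are the ones used for FIG. 1; the stereo assumption is ours. Under those assumptions the raw rate is 1536 kbps.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Raw PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

raw = pcm_bit_rate_kbps(48_000, 16, 2)   # assumed: 48 kHz, 16-bit, stereo
for coded in (256, 128):
    print(f"{coded} kbps -> compression factor {raw / coded:.0f}")
# prints: 256 kbps -> compression factor 6
#         128 kbps -> compression factor 12
```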

Other techniques have been developed to address the filtering problems encountered when coding non-stationary signals. One such technique, described in the Johnston patent, uses a so-called "window switching" scheme. This PAC scheme uses so-called "long" and "short" MDCT windows to handle the sharp attacks of non-stationary signals. In window switching, the stationarity of the signal is monitored at two levels: long MDCT windows (e.g., windows with 1024 subbands) are used for stationary signal components and, when necessary, short windows (e.g., windows with 128 subbands) are used during periods of non-stationarity. A drawback of this approach, however, is that short MDCT windows increase the time resolution uniformly at all frequencies. In other words, to increase the time resolution to the desired degree at the higher frequencies, this technique must also increase the time resolution at the lower frequencies.
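The uniform-resolution drawback can be seen from simple arithmetic. The helper below is a hypothetical illustration assuming a 48 kHz sample rate and an MDCT that advances N new samples per block and spaces its N lines fs/(2N) apart. Switching from 1024 to 128 lines shortens the hop eightfold and widens the line spacing eightfold, identically at every frequency.

```python
SAMPLE_RATE = 48_000  # Hz, the typical rate mentioned for FIG. 1

def mdct_resolution(num_lines: int, fs: int = SAMPLE_RATE) -> tuple[float, float]:
    """Return (line spacing in Hz, block hop in ms) for an MDCT with num_lines outputs.

    An MDCT producing N lines advances N new samples per block, and its lines
    are spaced fs / (2N) apart.
    """
    return fs / (2 * num_lines), 1000 * num_lines / fs

for n in (1024, 128):
    df, hop = mdct_resolution(n)
    print(f"{n:5d} lines: {df:7.2f} Hz spacing, {hop:5.2f} ms hop")
```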

A more desirable filter bank for filtering sharp attacks has a non-uniform structure whose subbands match the critical band partition of the frequency axis (i.e., the subbands are uniform on the Bark scale). It is further desirable that the high-frequency filters of the filter bank be proportionally shorter. One coding scheme that meets these objectives uses a hybrid or cascade structure (see, e.g., K. Brandenburg et al., "The ISO-MPEG-Audio Codec: A Generic Standard for Coding of High Quality Digital Audio," Journal of the Audio Engineering Society, Vol. 42, No. 10, October 1994; and J. Princen and J. D. Johnston, "Audio Coding with Signal Adaptive Filterbanks," Proceedings of IEEE ICASSP, Detroit, 1995). Such a scheme consists of a first stage having a uniform or non-uniform filter bank, in which each subband may be further split using uniform filter banks. Compared with MDCT filter banks, however, this approach has the drawback that the hybrid/cascade structure must be used for both stationary and non-stationary signals, which leads to poorer filter frequency responses as well as increased implementation cost.
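To see what "uniform on the Bark scale" means in hertz, the sketch below uses the Zwicker-Terhardt approximation of the Bark scale (an assumption; the text does not specify a formula) and splits 0 to 20 kHz into eight bands of equal Bark width. The resulting bands are a few hundred hertz wide at the bottom and several kilohertz wide at the top, which is the kind of non-uniform structure described above.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt (1980) approximation of the Bark (critical-band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_to_hz(z: float, lo: float = 0.0, hi: float = 24_000.0) -> float:
    """Invert hz_to_bark by bisection (the mapping is monotonically increasing)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Split 0..20 kHz into bands of equal width on the Bark scale.
num_bands = 8
z_max = hz_to_bark(20_000.0)
edges = [bark_to_hz(z_max * k / num_bands) for k in range(num_bands + 1)]
widths = [b - a for a, b in zip(edges, edges[1:])]
for a, b in zip(edges, edges[1:]):
    print(f"{a:8.0f} - {b:8.0f} Hz (width {b - a:7.0f} Hz)")
```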

There is therefore a need in the art for a filter bank that overcomes the drawbacks of prior-art filtering arrangements for handling non-stationary signals in subband coding.

A signal compression technique employing the principles of the present invention switches between a first filter bank and a wavelet filter bank to code audio signals using perceptual audio coding or similar subband-type coding.

In a preferred embodiment, switching between the two filter banks is based on time-varying characteristics of the signal, preferably its perceptual entropy level. Also in preferred embodiments, the first filter bank is a high-frequency-resolution MDCT filter bank. In general, the high-frequency-resolution MDCT filter bank is used to filter the input signal, but during periods of non-stationarity the wavelet filter bank is used. Advantageously, the present invention achieves a more compact representation of the signal when it contains non-stationary components. According to a preferred embodiment, the wavelet filter bank is a non-uniform, tree-structured filter bank.
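The switching rule can be sketched as a per-block decision. The code below is a hypothetical illustration: `pe_proxy` is a simplified rate-distortion style estimate standing in for perceptual entropy (it is not the PE formula of the Johnston patent), and the switching threshold `pe_switch` is arbitrary. A real coder would also have to manage the transition between the two transforms, which this sketch ignores.

```python
import numpy as np

def pe_proxy(band_energy: np.ndarray, band_threshold: np.ndarray) -> float:
    """Simplified perceptual-entropy proxy in bits: sum over bands of 0.5*log2(1 + E/T).

    Illustrative stand-in only, not the PE formula of the Johnston patent.
    """
    return float(np.sum(0.5 * np.log2(1.0 + band_energy / band_threshold)))

def choose_filterbank(pe_values, pe_switch: float) -> list[str]:
    """Pick the MDCT for low-PE (stationary) blocks and the wavelet bank otherwise."""
    return ["wavelet" if pe > pe_switch else "mdct" for pe in pe_values]

thresholds = np.ones(8)
blocks = [np.full(8, 3.0), np.full(8, 255.0), np.full(8, 3.0)]  # hypothetical: attack in block 2
pes = [pe_proxy(e, thresholds) for e in blocks]
print(pes)                                      # [8.0, 32.0, 8.0]
print(choose_filterbank(pes, pe_switch=20.0))   # ['mdct', 'wavelet', 'mdct']
```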

The present invention relates to an audio signal compression technique that uses a signal adaptive switched filter bank, switching between a first filter bank (preferably a high-frequency-resolution MDCT filter bank) and a wavelet filter bank, to handle non-stationary signals coded using perceptual audio coding or similar subband-type coding.

For clarity of explanation, the illustrative embodiments of the present invention are presented as functional blocks. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software. Some embodiments may comprise digital signal processor ("DSP") hardware, such as the AT&T DSP16 or DSP32, and software performing the operations discussed below. Very large scale integration ("VLSI") hardware embodiments of the present invention, as well as hybrid DSP/VLSI embodiments, may also be provided.

FIG. 1 is an overall block diagram of an illustrative system in which the present invention is implemented. In FIG. 1, an analog audio signal 101 is fed into a preprocessor 102, where it is sampled (typically at 48 kHz) and converted in a conventional manner into a 16-bit-per-sample digital pulse code modulation ("PCM") signal on lead 103. The PCM signal is fed to perceptual audio coder 200, which compresses it and outputs the compressed PAC signal on lead 105 to either a communication channel or a storage medium. The latter may be, for example, a magnetic tape, a compact disc, or another storage medium. From the communication channel or storage medium, the compressed PAC-encoded signal on lead 107 is fed to perceptual audio decoder 108, which decompresses it and outputs on lead 109 a PCM signal that is a digital representation of the original audio signal 101. From the perceptual audio decoder, the PCM signal on lead 109 is fed to a post-processor, which creates an analog representation of the signal.

An illustrative embodiment of perceptual audio coder 200 is shown in block diagram form in FIG. 2. As can be seen, perceptual audio coder 200 includes signal adaptive switched filter bank 202, perceptual model processor 210, quantizer/rate loop processor 212, and entropy coder 214. The structure and operation of perceptual model processor 210, quantizer/rate loop processor 212, and entropy coder 214 are generally similar to those of the corresponding audio-processing components identified in the Johnston patent. Signal adaptive switched filter bank 202, however, is described in detail below with respect to switching between the first filter bank (preferably a high-frequency-resolution MDCT filter bank) and the wavelet filter bank. The properties of switched filter bank 202, in combination with the other elements of FIG. 2, provide the advantages of the present invention.

Returning to FIG. 2, signal adaptive switched filter bank 202 includes high-frequency-resolution MDCT filter bank 204, wavelet filter bank 208, and switch 206 for switching between the two filter banks in a predetermined manner during encoding of the signal, as described above. As noted above, using a high-frequency-resolution MDCT (e.g., 1024 subbands or frequency lines in PAC) in the encoding process is useful because the MDCT yields a very compact representation of stationary signals. For the purposes of PAC, the MDCT has the following properties: (i) it is critically sampled (i.e., for every n samples into the filter bank, n samples are obtained); (ii) it typically provides half-overlap (i.e., the transform length is exactly twice the number n of samples shifted into the filter bank), which gives a good way of controlling the noise injected independently into the filter bank; and (iii) it provides exact reconstruction of the input samples, subject only to a delay of an integral number of samples. The MDCT is well known and is described, for example, in J. P. Princen and A. B. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Trans. ASSP, Vol. 34, No. 5, October 1986. The known application of the MDCT to PAC, and the functions performed here by high-frequency-resolution MDCT filter bank 204, are fully described in, for example, the Johnston patent.
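The three MDCT properties listed above can be checked numerically. The following is a minimal sketch, not the Johnston patent's implementation: it assumes a sine window (which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1), a direct matrix form of the MDCT, and a 2/N scaling on the inverse. Each 2N-sample block advances by N samples (half overlap) and produces N coefficients (critical sampling); overlap-adding the inverse transforms cancels the time-domain aliasing, so every sample covered by two blocks is reconstructed exactly. The first and last N samples are covered by only one block and are excluded from the check.

```python
import numpy as np

def mdct_matrix(n_lines: int) -> np.ndarray:
    """N x 2N MDCT basis: cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n = np.arange(2 * n_lines)
    k = np.arange(n_lines)[:, None]
    return np.cos(np.pi / n_lines * (n + 0.5 + n_lines / 2) * (k + 0.5))

def sine_window(n_lines: int) -> np.ndarray:
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return np.sin(np.pi * (np.arange(2 * n_lines) + 0.5) / (2 * n_lines))

def mdct_analysis(x: np.ndarray, n_lines: int) -> np.ndarray:
    """Windowed 2N-sample blocks, hop N (half overlap); N coefficients per hop."""
    c, w = mdct_matrix(n_lines), sine_window(n_lines)
    num_blocks = len(x) // n_lines - 1
    return np.array([c @ (w * x[b * n_lines : b * n_lines + 2 * n_lines])
                     for b in range(num_blocks)])

def mdct_synthesis(coeffs: np.ndarray, n_lines: int) -> np.ndarray:
    """Inverse MDCT plus overlap-add; time-domain aliasing cancels between blocks."""
    c, w = mdct_matrix(n_lines), sine_window(n_lines)
    out = np.zeros((len(coeffs) + 1) * n_lines)
    for b, X in enumerate(coeffs):
        out[b * n_lines : b * n_lines + 2 * n_lines] += (2.0 / n_lines) * w * (c.T @ X)
    return out

N = 16
rng = np.random.default_rng(1)
x = rng.standard_normal(10 * N)
X = mdct_analysis(x, N)
y = mdct_synthesis(X, N)
print(X.shape)                                     # (9, 16)
print(np.max(np.abs(y[N:-N] - x[N:-N])) < 1e-10)   # True
```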

As described above, although high-frequency-resolution MDCT filter bank 204 is very effective for representing stationary signals, the MDCT does not provide a compact representation of non-stationary signals (i.e., signals containing transients or sharp attacks). We have, however, realized a technique that retains the advantages of using high-frequency-resolution MDCT filter bank 204 while improving the audio compression performance of audio coder 200.

Thus, in accordance with the present invention, signal adaptive switched filter bank 202 uses both high-frequency-resolution MDCT filter bank 204 and wavelet filter bank 208 to encode, for example, audio signal 101. In the preferred embodiment, high-frequency-resolution MDCT filter bank 204 uses only the high-frequency-resolution MDCT for encoding. That is, when a non-stationary signal is present, filter bank 204 still uses only so-called long windows (i.e., 1024 subbands) and does not switch to so-called short windows (i.e., 128 subbands as opposed to 1024). Such switching is the conventional window-switching technique of the Johnston patent mentioned previously. Instead, in accordance with the present invention, switched filter bank 202 uses wavelet filter bank 208 during the non-stationary periods rather than switching to short MDCT windows.

In particular, wavelet filter bank 208 uses a wavelet transform to filter input signals having non-stationary components effectively. Wavelets are functions whose translations and dilations provide a complete orthonormal basis for the space of finite-energy signals. Coding of audio signals entirely with an optimized (adapted) wavelet transform is described, for example, in D. Sinha and A. H. Tewfik, "Low Bit Rate Transparent Audio Compression Using Adapted Wavelets," IEEE Transactions on Signal Processing, Vol. 41, No. 12, pp. 3463-3479, Dec. 1993. In accordance with an embodiment of the present invention, we have adapted the wavelet transform for use with the PAC psychoacoustic model, exploiting particular frequency- and time-domain properties of wavelets according to the key design criteria illustrated herein.
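The smallest example of an orthonormal wavelet filter bank is the two-band Haar transform. The sketch below uses Haar filters purely for illustration (the filters of wavelet filter bank 208 are much longer, as described for FIG. 3). It shows perfect reconstruction, energy preservation, and how an isolated attack stays confined to a single high-band coefficient.

```python
import numpy as np

def haar_analysis(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One level of the orthonormal Haar wavelet transform (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_synthesis(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Invert haar_analysis exactly."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.zeros(16)
x[9] = 1.0                      # a single sharp "attack"
low, high = haar_analysis(x)
print(np.nonzero(high)[0])      # [4]: the transient stays confined to one coefficient
```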

It is known that the time-frequency resolution of psychoacoustic analysis should be matched to the time-frequency resolution of the auditory system. This resolution characteristic is reflected in the critical band scale, which dictates that the frequency resolution of the psychoacoustic model vary from about 100 Hz at low frequencies to about 4 kHz at high frequencies (i.e., a 40:1 change in resolution). Correspondingly, the temporal resolution in a PAC coder can increase by a ratio of about 40:1 from low to high frequencies. It is known, however, that most psychoacoustic models use a uniform and rather low temporal resolution. The lack of temporal resolution at high frequencies has little effect on the thresholds computed for stationary signals, but the thresholds computed for non-stationary signals can be inaccurate and cause audible distortions. This behavior can be improved by using the signal adaptive switched filter bank of the present invention.
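The 100 Hz to 4 kHz range can be reproduced from a standard analytic approximation of critical bandwidth. The sketch below assumes the Zwicker-Terhardt formula, which the text does not name, and evaluates it at 0 Hz and 16 kHz.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker-Terhardt (1980) approximation of the critical bandwidth at centre frequency f."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

low, high = critical_bandwidth_hz(0.0), critical_bandwidth_hz(16_000.0)
print(f"{low:.0f} Hz -> {high:.0f} Hz, ratio {high / low:.0f}:1")
# Time resolution of a filter scales roughly as 1 / bandwidth, so a similar ratio
# applies to the temporal resolution the auditory system achieves.
```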

Using the signal adaptive switched filter bank of the present invention provides several advantages over the prior art for coding non-stationary signal segments, or transients. For example, it yields a more compact representation of non-stationary signal components. It also yields more accurate psychoacoustic modeling of the non-stationary segments of a signal. These features translate into significant savings in the overall bit rate needed to represent transients. Moreover, the switched filter bank preserves the known performance advantages of the high-frequency-resolution MDCT filter bank for compressing stationary signal segments.

In particular, a tree-structured wavelet filter bank is used in the preferred embodiment of the present invention. As noted above, for the psychoacoustic model to be accurate, it is important that the frequency split used closely match the critical band partition of the frequency axis. The wavelet filter bank provides good frequency selectivity (i.e., small overlap between adjacent subbands). It also provides good temporal properties (also known as good localization), in that the impulse responses of the higher-frequency subbands decay quickly. Well-localized higher-frequency subbands lead to an efficient representation of non-stationary signal segments. The tree structure used in the preferred embodiment helps provide these desired wavelet filter bank properties. Because critical bands are wider at higher frequencies, the tree structure has the advantage that the filters for the higher-frequency subbands are relatively short, and the desired frequency resolution can be achieved with fewer stages in the overall tree. In addition, the temporal properties of the tree-structured filter bank are controlled by means of a moment condition. To match the tree structure closely to the critical band partition, the tree-structured wavelet filter bank of the preferred embodiment uses three sets of filter banks: one set provides a four-band split, while the other two sets each provide a two-band split, as described below.

FIG. 3 illustrates a decomposition tree 300 for the tree-structured wavelet filter bank used in switched filter bank 202. In accordance with the preferred embodiment, the three sets of filter banks used in the illustrative tree structure of wavelet filter bank 208 provide enough design flexibility for the tree to approximate the critical band partition closely. In particular, the first filter bank set 310 provides a four-band split of the signal (i.e., subbands 311-314). Illustratively, the four bands increase in frequency from filter 311 to filter 314, and each filter has a support (length) of 64. Also illustratively, the second filter bank 320 provides a two-band split (i.e., 321 and 322) with a support of 40, while the third filter bank 330 provides a two-band split (i.e., 331 and 332) with a support of 20. As will be apparent to those skilled in the art, each application of filter bank 310 at any node of decomposition tree 300 results in decimation by a factor of 4. Similarly, each application of filter bank 320 or 330 results in decimation by a factor of 2. Illustratively, for an input block of N samples, subband 331 has N/64 filtered samples, while subband 322 has N/4 filtered samples. The three filter banks used by wavelet filter bank 208 are optimized, for example, by using a known parameterization of paraunitary filter banks together with standard optimization tools. The optimization criterion used for wavelet filter bank 208 is based on a known weighted stopband energy criterion (see, e.g., P. Vaidyanathan, "Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial," Proceedings of the IEEE, Vol. 78, No. 1, pp. 56-92, January 1990). The optimization provided by the tree-structured filter bank described above allows each of the three filter banks, as well as the overall filter bank, to provide excellent frequency selectivity on its own.
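The decimation bookkeeping of decomposition tree 300 can be sketched as a recursive walk. Because the exact topology of FIG. 3 is not given in the text, the tree below is hypothetical: its root is a four-band split (like filter bank 310) and some branches are split again by four- or two-band stages (like 320 and 330). Each leaf's decimation factor is the product of the split sizes on its path, and the sample counts N/d over all leaves add up to N, i.e., the tree is critically sampled.

```python
from fractions import Fraction

# A node is (num_bands, [child_or_None, ...]); None marks a leaf subband.
# HYPOTHETICAL tree: it illustrates the bookkeeping only and is not FIG. 3.
TREE = (4, [                                   # like filter bank 310: four-band split
    (4, [                                      # lowest quarter split again by four
        (2, [(2, [None, None]), None]),        # two-band stages (like 320 / 330)
        None, None, None,
    ]),
    (2, [None, None]),                         # second quarter split in two
    None,                                      # third quarter left as a leaf
    None,                                      # top quarter left as a leaf
])

def leaves(node, lo=Fraction(0), hi=Fraction(1), decimation=1):
    """Yield (band_lo, band_hi, decimation) for each leaf, as fractions of Nyquist.

    Bands are split uniformly at each node; the frequency reversal that decimating
    a high-pass branch causes in a real tree-structured filter bank is ignored here.
    """
    if node is None:
        yield lo, hi, decimation
        return
    m, children = node
    width = (hi - lo) / m
    for i, child in enumerate(children):
        yield from leaves(child, lo + i * width, lo + (i + 1) * width, decimation * m)

N = 1024
for lo, hi, d in leaves(TREE):
    print(f"{float(lo):.4f}-{float(hi):.4f} x Nyquist: N/{d} = {N // d} samples")
```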

In the preferred embodiment, the moment condition plays an important role in achieving the desired temporal properties of the high-frequency filters (i.e., the filters corresponding to the higher-frequency subbands in decomposition tree 300). The moment condition determines the flatness (order of differentiability) of the higher subband frequency responses near the center frequency. As will be seen below, greater flatness near the center frequency leads to a correspondingly better-localized impulse response. In particular, an M-band paraunitary filter bank with subband filters $\{H_i\}_{i=1}^{M}$ satisfies a $P$th-order moment condition if $H_i(e^{j\omega})$, for $i = 2, 3, \ldots, M$, has a zero of order $P$ at $\omega = 0$. The filters are then said to have $P$ vanishing moments. In the illustrative configuration of wavelet filter bank 208, for filters of a given support $K$, choosing $P > 1$ yields filters whose effective support decreases as $P$ increases; that is, most of the energy is concentrated in an interval shorter than the full filter length, and this interval shrinks for higher $P$.
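The equivalence between vanishing moments and a zero of order P at ω = 0 gives a simple time-domain test: the first P moments Σ n^p g[n] of a high-pass filter must vanish. The sketch below applies that test to the well-known four-tap Daubechies high-pass filter, which has exactly two vanishing moments. It is a textbook example chosen for illustration, not one of the filters of wavelet filter bank 208.

```python
import numpy as np

SQ3, NORM = np.sqrt(3.0), 4.0 * np.sqrt(2.0)
# Daubechies-4 low-pass filter (support 4); a textbook example, not a filter from this patent.
H = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / NORM
# Matching high-pass (quadrature mirror) filter: g[n] = (-1)**n * h[3 - n].
G = np.array([(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))])

def vanishing_moments(g: np.ndarray, tol: float = 1e-10) -> int:
    """Largest P such that sum_n n**p * g[n] == 0 for p = 0..P-1.

    Vanishing time-domain moments are equivalent to a zero of order P of
    G(e^{jw}) at w = 0, because the p-th derivative there is sum_n (-j n)**p g[n].
    """
    n = np.arange(len(g), dtype=float)
    p = 0
    while p < len(g) and abs(np.sum(n ** p * g)) < tol:
        p += 1
    return p

print(vanishing_moments(G))   # 2
```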

The improvement in the temporal response of a filter generally comes at the cost of a wider transition band in its magnitude frequency response (see, e.g., P. Vaidyanathan, "Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial," Proceedings of the IEEE, Vol. 78, No. 1, pp. 56-92, January 1990). To obtain the desired localization of the filters' temporal characteristics, each of the three filter banks in the tree-structured filter bank has two vanishing moments (i.e., P = 2). By way of example, the impulse response 410 of the highest-frequency subband of wavelet filter bank 208 (e.g., subband 314 shown in FIG. 3) is shown side by side with the response 420 of a filter from a cosine-modulated filter bank having the same frequency characteristics. As shown, the response 410 from the wavelet filter bank constructed in accordance with the preferred embodiment provides far better localization in time: high-frequency wavelet filter 314 concentrates most of its energy between n = 10 and n = 40.
In comparison, the response 420 of the cosine-modulated filter bank has its energy spread over the entire range n = 1 to n = 64.
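One way to quantify a statement such as "most of its energy between n = 10 and n = 40" is to find the shortest run of taps holding a given fraction of the total energy. The Python sketch below does this by brute force. The 90 percent default is an assumption, since the text does not define "most".

```python
def shortest_energy_interval(h, fraction=0.9):
    """Return (start, end), inclusive, of the shortest run of taps holding `fraction` of the energy of h."""
    energy = [x * x for x in h]
    total = sum(energy)
    if total == 0:
        raise ValueError("impulse response has zero energy")
    best = (0, len(h) - 1)
    for start in range(len(h)):
        acc = 0.0
        for end in range(start, len(h)):
            acc += energy[end]
            if acc >= fraction * total:
                # Keep the earliest interval among those of minimal length.
                if end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return best

# Toy response whose energy sits in taps 10..14.
h = [0.0] * 10 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 10
print(shortest_energy_interval(h, 1.0))
```

Applied to two impulse responses such as 410 and 420, a shorter interval at the same fraction indicates better time localization.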

In accordance with the principles of the present invention, high frequency resolution MDCT filter bank 204 is used to code stationary signals, and wavelet filter bank 208 is used to code non-stationary signals. The efficiency of using two filter banks hinges on a mechanism for switching between them based on the particular signal content (i.e., stationary versus non-stationary). In this regard, note that the MDCT is an overlapped orthogonal transform; that is, unlike a conventional block transform, adjacent blocks overlap by 50 percent. Switching between high frequency resolution MDCT filter bank 204 and wavelet filter bank 208 therefore requires orthogonalization in the overlap region between an MDCT block and a wavelet block. General methods for designing such orthogonalization are known (see, e.g., C. Herley et al., "Tiling of the Time-Frequency Plane: Construction of Arbitrary Orthogonal Bases and Fast Tiling Algorithm," IEEE Transactions on Signal Processing, Vol. 41, No. 12, December 1993), but they have the drawback that the resulting transform matrix is inefficient from an implementation standpoint.
That is, the arbitrary structure of the resulting filters makes fast computation of the wavelet transform very difficult.
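The role of the 50 percent overlap can be checked numerically. In the Python sketch below (direct O(N^2) formulas, with the common 1/N inverse scaling taken as an assumption), inverting a single MDCT block does not return the input; it returns a time-aliased version $(a - b_R, b - a_R, c + d_R, d + c_R)/2$ of the quarters $(a, b, c, d)$, and the aliasing cancels only when the overlapping halves of two consecutive MDCT blocks are added. This is why the overlap region needs special treatment when the neighbouring block uses a different transform.

```python
import math

def mdct(x):
    """Direct MDCT: 2N real samples -> N coefficients."""
    two_n = len(x)
    n = two_n // 2
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT with 1/N scaling: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for i in range(2 * n)]

def expected_alias(x):
    """Aliased output of one block (a, b, c, d): (a - b_R, b - a_R, c + d_R, d + c_R) / 2."""
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    first = [(u - v) / 2 for u, v in zip(a, b[::-1])] + [(u - v) / 2 for u, v in zip(b, a[::-1])]
    second = [(u + v) / 2 for u, v in zip(c, d[::-1])] + [(u + v) / 2 for u, v in zip(d, c[::-1])]
    return first + second

N = 8
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(3 * N)]
block1, block2 = signal[:2 * N], signal[N:3 * N]    # adjacent blocks overlap by 50 percent
y1, y2 = imdct(mdct(block1)), imdct(mdct(block2))
middle = [u + v for u, v in zip(y1[N:], y2[:N])]    # overlap-add of the shared half
```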

A simplification can be achieved by noting that the MDCT of a block of 2N samples is equivalent to a symmetric folding operation on the windowed data (i.e., the outer N/2 samples at each end of the window are folded onto the inner N/2 samples), followed by an N-point orthogonal block transform Q of the resulting N samples. Perfect reconstruction of the signal does not depend on the particular orthogonal block transform Q. Thus, Q may be the MDCT for one block and the wavelet transform for the next. The matrix Q corresponding to the MDCT is well known and is not described further; the matrix Q used by the wavelet filter bank is described below. When the wavelet transform is used, the orthogonal matrix Q (hereinafter $Q^{\mathrm{WFB}}$) is an N×N matrix based on the three filter banks of the tree-structured wavelet filter bank described above. $Q^{\mathrm{WFB}}$ consists of a plurality of blocks, each corresponding to a leaf node (i.e., a subband) of decomposition tree 300 of FIG. 3.
As will be apparent to one of ordinary skill in the art, the matrix for decomposition tree 300 is completely specified by the filters in the three filter banks 310, 320, 330 and by the technique for handling the finite block size (i.e., the boundary conditions). For clarity, the handling of the boundary conditions for the four-band split 310 of decomposition tree 300 shown in FIG. 3 is described below for the preferred embodiment. Extension to the full tree structure will be apparent to those skilled in the art.
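The observation that perfect reconstruction does not depend on Q can be demonstrated with a small lapped transform. The Python sketch below windows each 2N-sample frame (hop N) with a sine window, which is symmetric and satisfies $w[n]^2 + w[n+N]^2 = 1$, folds it to N samples as described above, and then applies an orthogonal Q chosen per frame. The orthonormal DCT-IV plays the role of the MDCT's Q; a multilevel Haar transform is a hypothetical stand-in for $Q^{\mathrm{WFB}}$, whose actual filters and boundary filters are not reproduced here. Synthesis applies $Q^{-1} = Q^{T}$, unfolds, windows again, and overlap-adds.

```python
import math

def sine_window(two_n):
    """Symmetric window satisfying w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi / two_n * (n + 0.5)) for n in range(two_n)]

def fold(x):
    """Fold 2N samples (a, b, c, d), each N/2 long, into N samples (-c_R - d, a - b_R)."""
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    return [-u - v for u, v in zip(c[::-1], d)] + [u - v for u, v in zip(a, b[::-1])]

def unfold(u):
    """Transpose of fold: (u1, u2) -> (u2, -u2_R, -u1_R, -u1)."""
    q = len(u) // 2
    u1, u2 = u[:q], u[q:]
    return u2 + [-v for v in u2[::-1]] + [-v for v in u1[::-1]] + [-v for v in u1]

def dct_iv(u):
    """Orthonormal DCT-IV; its matrix is symmetric and orthogonal, hence self-inverse."""
    n = len(u)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(u[m] * math.cos(math.pi / n * (m + 0.5) * (k + 0.5)) for m in range(n))
            for k in range(n)]

def haar_forward(u):
    """Orthonormal multilevel Haar transform (length must be a power of two)."""
    out, n = list(u), len(u)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def haar_inverse(v):
    """Inverse of haar_forward."""
    out, n = list(v), 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

# Per-frame choice of the orthogonal block transform Q: (forward, inverse).
TRANSFORMS = {"mdct": (dct_iv, dct_iv), "wavelet": (haar_forward, haar_inverse)}

def analyze(x, n, modes):
    """Window, fold and transform each 2N frame (hop N); needs len(modes) == len(x) // n + 1."""
    w = sine_window(2 * n)
    padded = [0.0] * n + list(x) + [0.0] * n
    frames = []
    for i, mode in enumerate(modes):
        seg = padded[i * n:i * n + 2 * n]
        forward, _ = TRANSFORMS[mode]
        frames.append(forward(fold([wi * si for wi, si in zip(w, seg)])))
    return frames

def synthesize(frames, n, modes, length):
    """Invert Q, unfold, window and overlap-add."""
    w = sine_window(2 * n)
    out = [0.0] * (length + 2 * n)
    for i, (coeffs, mode) in enumerate(zip(frames, modes)):
        _, inverse = TRANSFORMS[mode]
        seg = unfold(inverse(coeffs))
        for k in range(2 * n):
            out[i * n + k] += w[k] * seg[k]
    return out[n:n + length]

N = 8
x = [math.sin(0.37 * i) + 0.05 * i for i in range(4 * N)]
modes = ["mdct", "mdct", "wavelet", "wavelet", "mdct"]  # Q switches from frame to frame
y = synthesize(analyze(x, N, modes), N, modes, len(x))
```

The reconstruction is exact for any mix of modes because each frame's Q is undone by its own inverse before unfolding, so the time-domain aliasing cancellation between neighbouring frames is untouched.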

For the four-band split shown in FIG. 3, the corresponding transform matrix Q consists of four sub-blocks of size N/4×N, one for each of filters 311, 312, 313, 314. Illustratively, let K denote the length of the filters, and define the constant $K_1 = K/4 - 1$. In each of the four sub-blocks, N/4 − K1 rows consist of the corresponding subband filter and its (N/4 − K1 − 1) shifted copies. To avoid circular wrap-around, the remaining K1 rows of each sub-block are modified filters designed to operate near the ends of the block. In particular, Q1, Q2, Q3, Q4 are defined as K1×N matrices corresponding to these four sets of remaining rows. Q1 through Q4 are chosen so that, collectively, their rows form an orthonormal basis for the subspace orthogonal to the 4×(N/4 − K1) predefined rows of Q. In addition, Q1 through Q4 are chosen to maximize a cost function of the form:
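The row count can be checked with a short Python sketch. It assumes, as is usual for a four-band decimated filter bank, that consecutive rows of a sub-block are the filter shifted by four samples. Under that assumption exactly N/4 − K1 shifts fit inside the block without wrap-around, leaving K1 rows per sub-block (4K1 in total) for the boundary filters.

```python
def interior_rows(h, n, m=4):
    """Rows of one N/M x N sub-block built from filter h and its shifted copies.

    Assumes shifts of M samples (one row per decimated output sample) and keeps
    only shifts that fit entirely inside the N-sample block (no wrap-around).
    """
    k = len(h)
    rows = []
    shift = 0
    while shift + k <= n:
        row = [0.0] * n
        row[shift:shift + k] = h
        rows.append(row)
        shift += m
    return rows

# Example: K = 8 (so K1 = 1) and N = 32 give N/4 - K1 = 7 interior rows.
rows = interior_rows([1.0] * 8, 32)
print(len(rows), 32 // 4 - len(rows))  # interior rows, boundary rows left (= K1)
```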

$$\mathrm{Cost} = \operatorname{Trace}\left(Q_1 W^{T} D_1 W Q_1^{T} + Q_2 W^{T} D_2 W Q_2^{T} + Q_3 W^{T} D_3 W Q_3^{T} + Q_4 W^{T} D_4 W Q_4^{T}\right),$$

where W is the N×N Fourier transform matrix and D1 through D4 are diagonal matrices in which N/4 of the N diagonal elements are nonzero (equal to 1). The N/4 nonzero elements for a particular subband correspond to the position of that subband on the frequency axis. As will be apparent to those skilled in the art, this is a subspace-constrained optimization problem that can be solved using, for example, standard optimization tools. For each subband, the modified filters are placed in $Q^{\mathrm{WFB}}$ in order of increasing group delay, so that the subband coefficients have a correct temporal interpretation.
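The two ingredients of this problem, the orthogonality constraint and the cost, can be made concrete. The Python sketch below builds an orthonormal basis of the subspace left free by the predefined rows (modified Gram-Schmidt against unit vectors) and evaluates the band-energy cost for candidate boundary rows. It reads each term $Q_i W^{T} D_i W Q_i^{T}$ as the energy of the rows of $Q_i$ in the DFT bins selected by $D_i$, i.e. it uses the conjugate transpose for the complex Fourier matrix; that reading is an assumption. It does not perform the optimization itself.

```python
import cmath
import math

def orthonormal_complement(rows, n, tol=1e-10):
    """Orthonormal basis of the subspace orthogonal to the (orthonormal) `rows`."""
    basis = [list(r) for r in rows]
    complement = []
    for j in range(n):
        v = [1.0 if i == j else 0.0 for i in range(n)]  # try each unit vector in turn
        for b in basis + complement:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:
            complement.append([vi / norm for vi in v])
    return complement

def band_energy(row, bins):
    """Energy of `row` in the given DFT bins: the part selected by a 0/1 diagonal D_i."""
    n = len(row)
    total = 0.0
    for k in bins:
        coeff = sum(row[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        total += abs(coeff) ** 2
    return total

def cost(boundary_rows_per_band, bins_per_band):
    """Sum over subbands of the in-band energy of that subband's boundary rows."""
    return sum(band_energy(r, bins)
               for rows, bins in zip(boundary_rows_per_band, bins_per_band)
               for r in rows)

# Toy example: four orthonormal "interior" rows of length 8 leave a 4-dimensional complement.
s = 1 / math.sqrt(2)
interior = [[s if m in (2 * i, 2 * i + 1) else 0.0 for m in range(8)] for i in range(4)]
comp = orthonormal_complement(interior, 8)
```

An optimizer would search over rotations of `comp` for the boundary rows that maximize `cost`; that search is left out here.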

It can also be seen that the orthogonalization described above has the effect of extending the wavelet filters in time and/or introducing discontinuities into the wavelet filters themselves. Any such degradation of wavelet filter bank 208 can be mitigated as follows: (i) transition START and STOP windows (e.g., as described in the Johnston patent) are used at transitions between high frequency resolution MDCT filter bank 204 and wavelet filter bank 208; and (ii) a family of so-called flat windows is used to reduce the effective overlap between the transition windows and the wavelet windows. An exemplary switching sequence between high frequency resolution MDCT filter bank 204 and wavelet filter bank 208 using the techniques described above is shown in FIG. 5. As shown in FIG. 5, START window 502 is used for the transition from high frequency resolution MDCT filter bank window 501 to wavelet filter bank window 503. Likewise, STOP window 504 is used for the transition from wavelet filter bank window 503 to high frequency resolution MDCT filter bank window 505.
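The switching sequence of FIG. 5 can be expressed as a small rule set. The Python sketch below maps per-frame attack flags to window types. The rules, and in particular keeping the wavelet mode for a single unflagged frame squeezed between two flagged frames, are assumptions for illustration rather than rules stated in the patent.

```python
def window_sequence(transient):
    """Map per-frame attack flags to window types around a filter-bank switch.

    Assumed rules: flagged frames use WAVELET windows; the frame before a run of
    them uses START and the frame after uses STOP; an unflagged frame between
    two flagged frames stays WAVELET.
    """
    seq = []
    n = len(transient)
    for i, flagged in enumerate(transient):
        prev_wavelet = i > 0 and transient[i - 1]
        next_wavelet = i + 1 < n and transient[i + 1]
        if flagged or (prev_wavelet and next_wavelet):
            seq.append("WAVELET")
        elif prev_wavelet:
            seq.append("STOP")
        elif next_wavelet:
            seq.append("START")
        else:
            seq.append("LONG")   # high frequency resolution MDCT window
    return seq

print(window_sequence([False, False, True, True, False, False]))
```

For the input above, the output follows the pattern of FIG. 5: LONG, START, WAVELET, WAVELET, STOP, LONG.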

So-called flat windows are used in the overlap region between START window 502 and wavelet window 503, and again in the overlap region between wavelet window 503 and STOP window 504. These flat windows are useful as baseband filters and are compactly localized in time (i.e., most of the energy in the window is concentrated around its center). A flat window is generated by sampling a continuous function $h(t)$: $h(n) = h(t)\big|_{t=(n+1/2)/N}$, $n = 0, 1, \ldots, N-1$, where $h(t)$ is nonzero on the interval $[0, 1]$ and zero elsewhere.
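The sampling rule above can be written directly. The prototype $h(t)$ is not specified in the text, so the Python sketch below uses $\sin^2(\pi t)$ on $[0, 1]$ as a hypothetical example. Midpoint sampling at $t = (n + 1/2)/N$ makes the sampled window symmetric whenever $h(t)$ is symmetric about $t = 1/2$.

```python
import math

def sample_window(h_t, n):
    """Sample a continuous prototype h(t), supported on [0, 1], at t = (k + 1/2) / n."""
    return [h_t((k + 0.5) / n) for k in range(n)]

def example_prototype(t):
    """Hypothetical prototype; the patent does not specify h(t)."""
    return math.sin(math.pi * t) ** 2 if 0.0 <= t <= 1.0 else 0.0

w = sample_window(example_prototype, 16)
energy = [v * v for v in w]
center_fraction = sum(energy[4:12]) / sum(energy)  # share of energy in the middle half
```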

Returning to FIG. 2, perceptual model processor 210 uses psychoacoustic analysis to estimate the perceptual importance of the various signal components in switched analysis filter bank 202 and to compute their noise-masking properties. The psychoacoustic analysis performed in processor 210 is well known and is described, for example, in the Johnston patent and in J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 319-323, February 1988. While the thresholds for quantizing the coefficients in MDCT blocks are obtained directly from the psychoacoustic analysis in a known manner, the thresholds used for wavelet blocks require additional processing.

The thresholds for quantizing the wavelet coefficients are based on an estimate of the time-varying spread energy and on a tonality measure estimated as in PAC. The spread energy is computed by taking into account the spread of masking across frequency as well as across time; that is, both an inter-frequency spreading function and a temporal spreading function are used. The shape of the spreading function is derived from a cochlear filter, as described, for example, in J. B. Allen, "The ASA Edition of Speech and Hearing in Communication," Acoustical Society of America, New York, 1995. The temporal spread of masking is frequency dependent and is roughly determined by the inverse of the bandwidth of the cochlear filter at a particular frequency. A fixed temporal spreading function is preferably used for a range of frequencies or subbands; thus, the spreading function becomes narrower at higher frequencies. The coefficients within a subband are grouped into coder bands, and one threshold per coder band is used during quantization. Illustratively, the coder bands range from 10 ms in the lowest-frequency subband to about 2.5 ms in the highest-frequency subband.
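The temporal part of this computation, followed by coder-band grouping, can be sketched in Python. Frequency spreading is omitted, the kernel stands in for the cochlear-derived spreading function, and the rule "threshold = offset × mean spread energy of the coder band" is a hypothetical placeholder for the PAC threshold computation. One way to read "a fixed temporal spreading function ... becomes narrower at higher frequencies" is that a kernel fixed in subband samples spans less time in the less-decimated high-frequency subbands.

```python
def temporal_spread(energy, kernel):
    """Spread one subband's energy track over time with a symmetric kernel (centered, same length)."""
    half = len(kernel) // 2
    spread = []
    for t in range(len(energy)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            src = t + j - half
            if 0 <= src < len(energy):
                acc += weight * energy[src]
        spread.append(acc)
    return spread

def coder_band_thresholds(spread, band_len, offset=0.1):
    """One threshold per coder band of `band_len` coefficients (hypothetical rule: offset x mean)."""
    thresholds = []
    for i in range(0, len(spread), band_len):
        band = spread[i:i + band_len]
        thresholds.append(offset * sum(band) / len(band))
    return thresholds

spread = temporal_spread([0, 0, 0, 8, 0, 0, 0, 0], [0.5, 1.0, 0.5])
thresholds = coder_band_thresholds(spread, 4)
```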

The quantization/rate loop processor 212, also described in the Johnston patent, takes the outputs of switched analysis filter bank 202 and perceptual model processor 210, allocates bits and noise, and controls other system parameters so as to meet the bit rate required for a given application. Entropy coder 214 is used in combination with loop processor 212 to achieve further noiseless compression. As described, for example, in the Johnston patent, entropy coder 214 receives the quantized audio signal output by quantization/rate loop processor 212. Entropy coder 214 may then perform lossless encoding of the quantized audio signal using, for example, the well-known minimum-redundancy Huffman coding technique. Huffman codes are described, e.g., in D. A. Huffman, "A Method for the Construction of Minimum Redundancy Codes," Proc. IRE, 40:1090-1101, 1952, and in T. M. Cover and J. A. Thomas, "Elements of Information Theory," pp. 92-101, 1991. The Johnston patent also describes the use of Huffman coding in entropy coder 214 in the PAC context.
Those skilled in the art will readily recognize how to implement other embodiments of entropy coder 214 using other noiseless data compression techniques, including the well-known Lempel-Ziv compression method.
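As an illustration of the minimum-redundancy coding step, the Python sketch below computes Huffman code lengths for a stream of quantized values using a priority queue. It shows the textbook construction only; the actual codebooks used in PAC are not reproduced here.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    counter = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:          # every symbol under the merged node gets one bit deeper
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, counter, s1 + s2))
        counter += 1
    return lengths

quantized = [0] * 5 + [1] * 2 + [2, 3]
print(huffman_code_lengths(quantized))  # {0: 1, 1: 2, 2: 3, 3: 3}
```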

Finally, switching criterion 206 is used to further effect efficient switching between high frequency resolution MDCT filter bank 204 and wavelet filter bank 208. To be effective, the criterion must detect attacks accurately, without false alarms or missed attacks. For example, undetected attacks that are encoded using high frequency resolution MDCT filter bank 204 cause noticeable distortion, particularly at low bit rates. Conversely, coding a relatively stationary signal with wavelet filter bank 208 wastes a significant amount of output bits and processing power.

Accordingly, in the preferred embodiment, a perceptual entropy criterion is used. As described above, perceptual entropy is a measure, for a particular transformed segment of the signal, of the theoretical lower bound on the number of bits per sample needed to code that segment transparently. A large increase in perceptual entropy from one segment to the next is a reliable indication of strong non-stationarity (e.g., an attack) in the signal. In the embodiment of FIG. 2, this kind of change in perceptual entropy is used by coder 202 to trigger switching from high frequency resolution MDCT filter bank 204 to wavelet filter bank 208. Illustratively, the decision whether to switch between high frequency resolution MDCT filter bank 204 and wavelet filter bank 208 is made by coder 202 every 25 ms. The foregoing merely illustrates the principles of the invention. Those skilled in the art will be able to devise various other arrangements which, although not explicitly shown or described herein, embody these principles and are within the scope of the invention as defined by the appended claims.
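The perceptual-entropy switching rule can be sketched in Python. The per-segment PE below is a simplified proxy (bits needed per coefficient relative to its masking threshold), not the exact PE formula of the Johnston reference, and the jump ratio of 2 is a hypothetical tuning value; the text states only that a large increase in PE from one segment to the next triggers the switch to the wavelet filter bank.

```python
import math

def perceptual_entropy(coeffs, thresholds):
    """Simplified PE proxy in bits per coefficient (assumed form, not Johnston's exact formula)."""
    bits = sum(math.log2(1.0 + abs(c) / math.sqrt(t)) for c, t in zip(coeffs, thresholds))
    return bits / len(coeffs)

def switch_decisions(pe_values, ratio=2.0):
    """Flag segment i for the wavelet filter bank when its PE exceeds `ratio` times that of segment i - 1."""
    return [i > 0 and pe_values[i] > ratio * pe_values[i - 1] for i in range(len(pe_values))]

pe_track = [1.0, 1.1, 3.0, 2.9]      # e.g. one value per 25 ms decision interval
print(switch_decisions(pe_track))    # the attack shows up as the jump at index 2
```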

The audio signal compression technique of the present invention can handle non-stationary signals by using a signal adaptive switched filter bank that switches between a high frequency resolution MDCT filter bank and a wavelet filter bank, in perceptual audio coding or similar subband-type coding.

FIG. 1 is a block diagram of a system in which the present invention is illustratively implemented.

FIG. 2 is a block diagram of an exemplary perceptual audio coder, used in the system of FIG. 1, that employs the signal adaptive switched filter bank of the present invention.

FIG. 3 illustrates the tree-structured wavelet filter bank used in the signal adaptive switched filter bank of FIG. 2.

FIG. 4 illustrates a comparison between a cosine-modulated filter and a wavelet filter used in the signal adaptive switched filter bank of FIG. 2.

FIG. 5 illustrates an exemplary filter bank switching sequence generated using the signal adaptive switched filter bank of FIG. 2.

*Description of reference numerals for the main parts of the drawings*

101: analog audio    102: preprocessor

103: PCM signal    106: communication channel/storage medium

108: perceptual audio decoder    110: postprocessor

202: signal adaptive switched filter bank    300: decomposition tree

Claims

A method for encoding an audio signal, comprising:

Sampling the audio signal;

Selectively filtering the sampled audio signal by switching between a first filter bank and a wavelet filter bank to produce a filtered signal, wherein the wavelet filter bank is a tree-structured non-uniform filter bank, the first filter bank is independent of the wavelet filter bank, and the switching occurs in response to a stationarity of the audio signal; and

Encoding the filtered signal to provide a compressed output signal.

The method of claim 1,

wherein the first filter bank is a high frequency resolution MDCT filter bank.

The method of claim 2,

wherein the wavelet filter bank uses a plurality of moment conditions to differentiate frequency responses within the non-uniform filter bank.

The method of claim 2,

wherein, in said filtering step, said high frequency resolution MDCT filter bank is used to filter stationary components of said audio signal, and said wavelet filter bank is used to filter non-stationary components of said audio signal.

The method of claim 1,

And wherein said encoding step comprises perceptual audio coding.

A method for encoding an audio signal, comprising:

Generating a plurality of noise thresholds as a function of frequency characteristics of the audio signal;

Selectively filtering the audio signal by switching between a first filter bank and a wavelet filter bank to produce a filtered signal, wherein the wavelet filter bank is a tree-structured non-uniform filter bank, the first filter bank is independent of the wavelet filter bank, and the switching occurs in response to the stationarity of the audio signal;

Quantizing the filtered signal, wherein the coarseness of the quantization is determined by the noise thresholds;

And perceptually encoding the quantized signal.

The method of claim 6,

wherein the first filter bank is a high frequency resolution MDCT filter bank.

The method of claim 7, wherein

The method of claim 8,

wherein said stationarity of said audio signal is determined using perceptual entropy.

The method of claim 10,

wherein a first of the non-uniform filter banks provides a four-band split of the audio signal, and a second of the non-uniform filter banks provides a two-band split of the signal.

A method for encoding a digital audio signal to generate a compressed output signal, the method comprising:

Generating a plurality of noise thresholds as a function of the frequency characteristic of the digital signal;

Selectively filtering the digital signal by switching between a first filter bank and a wavelet filter bank to produce a filtered signal, wherein the wavelet filter bank is a tree-structured non-uniform filter bank, the first filter bank is independent of the wavelet filter bank, and the switching occurs in response to the stationarity of the audio signal;

And perceptually encoding the filtered signal to provide the compressed output signal.

The method of claim 12,

wherein the first filter bank is a high frequency resolution MDCT filter bank.

An apparatus for encoding an audio signal, the apparatus comprising:

Means for sampling the audio signal;

Means for selectively filtering the sampled audio signal by switching between a first filter bank and a wavelet filter bank to produce a filtered signal;

Means for encoding the filtered signal to produce a compressed output signal, wherein the wavelet filter bank is a tree-structured non-uniform filter bank, the first filter bank is independent of the wavelet filter bank, and the switching occurs in response to the stationarity of the audio signal.

The method of claim 14,

wherein the first filter bank is a high frequency resolution MDCT filter bank.

The method of claim 15,

wherein said stationarity is determined as a function of the perceptual entropy of said audio signal.

An apparatus for encoding an audio signal, the apparatus comprising:

Means for generating a plurality of noise thresholds as a function of frequency characteristics of the audio signal;

Means for sampling the audio signal;

Means for selectively filtering the sampled audio signal by switching between a first filter bank and a wavelet filter bank to produce a filtered signal, wherein the wavelet filter bank is a tree-structured non-uniform filter bank, the first filter bank is independent of the wavelet filter bank, and the switching occurs in response to the stationarity of the audio signal;

Means for quantizing the filtered signal, wherein the coarseness of the quantization is controlled by the noise thresholds;

Means for perceptually encoding the quantized signal.