CN103000179B

CN103000179B - Multichannel audio coding/decoding system and method

Info

Publication number: CN103000179B
Application number: CN201210257019.5A
Authority: CN
Inventors: 张玲玲; 叶青华; 蔡志博; 李晓东
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2011-09-16
Filing date: 2012-07-23
Publication date: 2014-11-12
Anticipated expiration: 2032-07-23
Also published as: CN103000179A

Abstract

The invention relates to a multi-channel audio encoding/decoding system and method. The multi-channel signal is converted into a front and a rear pair of stereo signals, and dynamic information simulating slight rotation of the human head is added by modulating the two stereo pairs accordingly, without affecting the listening quality of the down-mixed ordinary stereo signal. The receiving end can recover the dynamic information, distinguish the front and rear stereo pairs, and distribute them to a multi-channel system for playback. This achieves good surround sound reproduction, can be used in broadcasting systems, and suits private spaces such as homes and cars. Because the dynamic information simulating slight head rotation is embedded during encoding, the listening quality of the down-mixed stereo signal is unaffected, while the decoder can recover the dynamic information when restoring the multi-channel signal, distinguish front from rear sound sources, and localize the sources clearly.

Description

A multi-channel audio codec system and method thereof

Technical field

The invention relates to the technical field of acoustic signal processing, and in particular to a multi-channel audio encoding/decoding system and method.

Background art

With the wide application of multi-channel digital audio codec technology and the growing number of multi-channel surround sound programs, how to enjoy high-quality surround sound programs over traditional transmission channels (such as broadcasting) in private spaces such as homes and cars is receiving increasing attention.

Since the 1970s, matrix surround (Matrix Surround) encoding and decoding has been proposed to address compatibility between multi-channel signals and two-channel stereo. The encoding matrix makes the relationship between the left and right channels of the down-mixed stereo correspond to the spatial information of the sound source, and the decoding matrix restores, as far as possible, a multi-channel signal with the same spatial localization as the original surround sound. A series of commercial systems were developed on this basis, such as "Dolby Pro Logic", "Lexicon Logic7", "SRS Circle Surround" and "DTS Neo 6".

MPEG Surround (ISO/IEC 23003-1), which has appeared in recent years, also uses matrix surround encoding and decoding as one of its working modes. Based on tests with a large number of audio samples, it restores the multi-channel audio signal by estimating the level differences and correlations between channels, but it still cannot avoid the defects of matrix surround encoding and decoding. After years of development, matrix surround sound systems take many forms, but their encoding methods are similar. A 5.1 system is used below as an example to illustrate the principle.

$d = Ms \qquad (1)$

where

$M = \begin{pmatrix} 1 & 0 & \frac{1}{\sqrt{2}} & j\frac{\sqrt{3}}{2} & j\frac{1}{2} \\ 0 & 1 & \frac{1}{\sqrt{2}} & -j\frac{1}{2} & -j\frac{\sqrt{3}}{2} \end{pmatrix}, \quad s = \begin{bmatrix} l(t) \\ r(t) \\ c(t) \\ l_s(t) \\ r_s(t) \end{bmatrix}, \quad d = \begin{bmatrix} l_t(t) \\ r_t(t) \end{bmatrix} \qquad (2)$

M is the encoding matrix, which converts the multi-channel signal into a two-channel stereo signal; s is the multi-channel signal, where l(t), r(t), c(t), l_s(t) and r_s(t) denote the left front, right front, center front, left rear and right rear audio channel signals respectively; d is the down-mixed two-channel stereo signal, where l_t(t) and r_t(t) denote the left and right stereo channel signals.

Assume that a virtual sound image at a given azimuth is played by the two nearest loudspeakers, with the signal distributed according to the stereo signal distribution principle, and then compute the left/right channel amplitude ratio of the stereo signal after down-mixing. As shown in Figure 1, the abscissa is the azimuth angle of the virtual sound image and the ordinate is the amplitude ratio of the left and right channels.

As Figure 1 shows, the left/right amplitude ratio after down-mixing with the encoding matrix corresponds one-to-one with the position of the virtual sound source, so the spatial information of the original source is preserved, and the decoder can essentially localize the source by comparing the two channels. This holds only for a single sound source, however. When multiple sources and background sound are present, as in practice, the stereo signal down-mixed by matrix encoding often suffers severe crosstalk and loss of localization information. In particular, the sound quality and localization information of the original surround channels are damaged, and excessive crosstalk between the decoded output channels causes poor source localization, sound coloration, and background surround sound with no clear localization. In addition, if the sound image is in the rear half-plane, the left and right channels of the down-mixed stereo signal are in opposite phase, which impairs normal listening for ordinary stereo users.
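The matrix encoding of equations (1) and (2), and the phase reversal for rear sources, can be checked numerically. The following is a minimal sketch, not part of the patent: the function name `matrix_downmix` is hypothetical, and each channel is represented by a single complex phasor rather than a time signal, so that the factor j (a 90-degree phase shift) can be applied directly.

```python
import math

# Rows of the encoding matrix M from equation (2); 1j marks the 90-degree
# phase shift applied to the surround channels.
M = [
    [1, 0, 1 / math.sqrt(2), 1j * math.sqrt(3) / 2, 1j / 2],
    [0, 1, 1 / math.sqrt(2), -1j / 2, -1j * math.sqrt(3) / 2],
]

def matrix_downmix(s):
    """Return d = M s as [l_t, r_t] for s = [l, r, c, l_s, r_s] (complex phasors)."""
    return [sum(m * x for m, x in zip(row, s)) for row in M]

centre = matrix_downmix([0, 0, 1, 0, 0])      # source in the center channel only
rear_left = matrix_downmix([0, 0, 0, 1, 0])   # source in the left rear channel only

# Ratio l_t / r_t: a positive real value means the two channels are in phase,
# a negative real value means they are in opposite phase.
centre_ratio = centre[0] / centre[1]
rear_left_ratio = rear_left[0] / rear_left[1]
```

In this sketch the center source gives a ratio of 1 (equal, in-phase channels), while the left rear source gives a negative real ratio, which is the opposite-phase condition described above for sound images in the rear half-plane.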

Summary of the invention

The purpose of the invention is to provide a multi-channel audio encoding/decoding system and method that converts the multi-channel signal into a front and a rear pair of stereo signals, modulates the two pairs according to dynamic information simulating slight rotation of the human head, recovers the dynamic information at the receiving end to distinguish the front and rear stereo pairs, and then distributes them to a multi-channel system for playback.

In general, sound source localization is determined mainly by the following factors: interaural time difference (ITD), interaural level difference (ILD), spectral cues, dynamic cues, and so on. Dynamic cues localize a source through the changes in ITD and ILD caused by head rotation. If the changes follow the pattern expected of a frontal source, the listener perceives a frontal sound image, and likewise for a rear source.

While listening, a person's head unconsciously makes small rotations, which change the ITD and ILD. This dynamic information allows front and rear sources to be distinguished without affecting the listening experience. Likewise, to distinguish front from rear during encoding without affecting listening quality, the stereo signal corresponding to a sound source can be modulated to simulate slight head rotation, giving more accurate source localization in multi-channel encoding and decoding. This application simulates slight head rotation by modulating the left and right channel amplitudes of the front and rear stereo pairs.

For converting stereo to multi-channel audio, Faller (Christof Faller, Matrix Surround Revisited, the AES 30th International Conference, Saariselka, Finland, 2007) proposed first separating the direct sound from the background sound to improve the signal-to-noise ratio of the direct sound, and then performing the stereo-to-multi-channel conversion on the direct sound.

The main steps are as follows:

⑴ Encoding end:

· Divide the multi-channel audio signal into a front signal and a rear signal. Then, following the signal distribution principle, assign the front signal to a front pair of virtual stereo channels and the rear signal to a rear pair of virtual stereo channels, giving a front and a rear stereo pair.

· Set the head rotation mode (direction, sequence), with the front and rear stereo pairs each given their own rotation mode. Modulate the left and right channel amplitudes of the front and rear stereo signals to simulate the head turning slightly back and forth. If only two consecutive frames are used for discrimination, the amplitude changes of the left and right channels must differ between the front and rear pairs. If several consecutive frames are used, the trends of the left and right channel amplitudes over those frames must differ between the two pairs and cannot be identical.

· Down-mix the front and rear stereo pairs into a single stereo pair by adding the two left channels and adding the two right channels.

⑵ Decoding end:

· Transform the stereo signal to the time-frequency domain and separate the direct sound and background sound in the down-mixed two-channel signal (each yielding a pair of stereo signals), improving the signal-to-noise ratio of the direct sound.

· In parallel, divide the left channel by the right channel of the original two-channel stereo signal at each time-frequency point and observe how this ratio changes over two (or more) consecutive frames. Compare the change with the preset changes in left/right ratio for the front and rear stereo pairs to decide whether the point belongs to the front or the rear. Replace each time-frequency point of the original stereo signal with the corresponding point of the direct sound, and finally inverse-transform to the time domain to obtain the front and rear stereo pairs of the direct sound.

· Signal distribution: assign the two direct-sound pairs, according to their azimuths, to the two nearest channels using the stereo signal distribution principle; add the background sound directly to the left front and right front channels; and delay the background sound, pass it through a low-pass filter, and assign it to the left rear and right rear channels.

On this basis, the invention provides a multi-channel audio encoding/decoding system comprising: a multi-channel audio source, loudspeakers, an acoustic signal AD/DA converter, a power amplifier, a multi-channel audio encoder, a multi-channel audio decoder and an acoustic signal transmission device, characterized in that:

the multi-channel audio source is used to generate a multi-channel audio signal, which is output to the input of the multi-channel audio encoder;

the multi-channel audio encoder is used to divide the multi-channel audio signal into a front signal and a rear signal, assigning the front signal to a front pair of virtual stereo signals and the rear signal to a rear pair of virtual stereo signals, to obtain a front and a rear stereo pair; then, simulating slight rotation of the human head, it modulates the left and right channel amplitudes of the front and rear stereo pairs according to the preset front and rear head rotation modes, where the amplitude trends of the left and right channels must differ between the two pairs; finally, it down-mixes the front and rear stereo pairs into a single stereo pair and outputs it to the acoustic signal transmission device;

the acoustic signal transmission device is used to transmit the stereo pair output by the multi-channel audio encoder to the input of the multi-channel audio decoder;

the multi-channel audio decoder, according to the front and rear rotation modes set in the multi-channel audio encoder, decomposes the stereo pair into a front and a rear stereo pair and then, according to the loudspeaker configuration and using the signal distribution principle, restores from the two pairs a multi-channel audio signal with the same spatial localization as the original surround sound;

the power amplifier is used to amplify the multi-channel audio signal output by the decoder and the acoustic signal DA converter, and to output the signal to the loudspeakers.

As an improvement of the above technical solution, the multi-channel audio encoder and the multi-channel audio decoder are implemented on an ordinary PC or a digital signal processor.

As an improvement of the above technical solution, the loudspeakers comprise an array of two or more loudspeakers for playing the decoded multi-channel audio signal.

The invention also provides a multi-channel audio encoding/decoding method. At the encoding end, the multi-channel signal is first converted into a front and a rear pair of stereo signals; dynamic information simulating slight rotation of the human head is then added by modulating the two stereo pairs accordingly, and the result is down-mixed into a single stereo pair for output. At the decoding end, the dynamic information is recovered from the stereo signal and the front and rear stereo pairs are distinguished. The method comprises the following steps:

(1) Output the multi-channel audio source signal to the input of the multi-channel audio encoder;

(2) In the multi-channel audio encoder, divide the multi-channel audio signal into a front signal and a rear signal, then, following the signal distribution principle, assign the front signal to a front pair of virtual stereo signals and the rear signal to a rear pair of virtual stereo signals, to obtain a front and a rear stereo pair;

(3) Simulating slight rotation of the human head, modulate the left and right channel amplitudes of the front and rear stereo pairs according to the preset front and rear head rotation modes, at a given frequency and rotation pattern; the amplitude changes of the left and right channels must differ between the front and rear pairs;

(4) Down-mix the front and rear stereo pairs into a single stereo pair by adding the two left channels and adding the two right channels, and transmit it to the multi-channel audio decoder through the acoustic signal transmission device;

(5) In the multi-channel audio decoder, according to the front and rear rotation modes set in the multi-channel audio encoder, decompose the stereo pair into a front and a rear stereo pair, then, according to the loudspeaker configuration and using the signal distribution principle, restore from the two pairs a multi-channel audio signal with the same spatial localization as the original surround sound;

(6) Output the result through the acoustic signal DA converter and the power amplifier to the loudspeakers for playback.

As an improvement of the above technical solution, the simulated head rotation frequency in step (3) is generally above 50 Hz, so that the listener does not perceive movement of the sound image.

As an improvement of the above technical solution, in step (5) the multi-channel audio decoder applies a time-frequency transform to the two-channel stereo signal and processes it in the time-frequency domain. At each time-frequency point it divides the left channel by the right channel and observes how this ratio changes over two or more consecutive frames. It then compares the change with the preset changes in left/right ratio for the front and rear stereo pairs to decide whether the point belongs to the front or the rear. Finally, it inverse-transforms the time-frequency signals to the time domain to obtain the front and rear stereo pairs, and converts the two pairs into a multi-channel audio signal according to the loudspeaker configuration.
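The front/rear decision described above can be sketched for a single time-frequency point. This is a minimal sketch under stated assumptions, not the patented decoder: the gain patterns, the function names, and the nearest-match rule in the log domain are all hypothetical. It also assumes that the point is dominated by one stereo pair, that the underlying left/right ratio of the source is unchanged across the two frames, and that the decoder knows which frame is the first of the pattern.

```python
import math

# Hypothetical per-frame (left, right) gain patterns agreed between encoder
# and decoder; the front and rear patterns move in opposite directions.
FRONT_GAINS = [(1.05, 0.95), (0.95, 1.05)]
REAR_GAINS = [(0.95, 1.05), (1.05, 0.95)]

def expected_log_change(gains):
    """Log of the change in left/right gain ratio from frame 0 to frame 1."""
    (l0, r0), (l1, r1) = gains
    return math.log((l1 / r1) / (l0 / r0))

def classify_bin(left_mags, right_mags, eps=1e-12):
    """Label one time-frequency point 'front' or 'rear' from two frames.

    left_mags, right_mags: magnitudes of the point in frame 0 and frame 1.
    """
    ratio0 = (left_mags[0] + eps) / (right_mags[0] + eps)
    ratio1 = (left_mags[1] + eps) / (right_mags[1] + eps)
    observed = math.log(ratio1 / ratio0)
    # Pick the preset pattern whose expected ratio change is closest.
    d_front = abs(observed - expected_log_change(FRONT_GAINS))
    d_rear = abs(observed - expected_log_change(REAR_GAINS))
    return "front" if d_front <= d_rear else "rear"

# A source with base magnitudes 0.8 (left) and 0.5 (right), modulated with
# the front pattern and then with the rear pattern.
front_bin = classify_bin([0.8 * 1.05, 0.8 * 0.95], [0.5 * 0.95, 0.5 * 1.05])
rear_bin = classify_bin([0.8 * 0.95, 0.8 * 1.05], [0.5 * 1.05, 0.5 * 0.95])
```

Working with the change in the ratio rather than the ratio itself removes the dependence on where the source is panned, which is why the two frames need different gains for the front and rear pairs.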

As an improvement of the above technical solution, in step (5) the multi-channel audio decoder transforms the two-channel stereo signal to the time-frequency domain and uses signal processing to separate the direct sound components and background sound components of the down-mixed two-channel signal, each yielding a pair of stereo signals, which improves the signal-to-noise ratio of the direct sound. In parallel, it divides the left channel by the right channel of the original two-channel stereo signal at each time-frequency point and observes how this ratio changes over two or more consecutive frames. It then compares the change with the preset changes in left/right ratio for the front and rear stereo pairs to decide whether the point belongs to the front or the rear, replaces each time-frequency point of the original stereo signal with the corresponding point of the direct sound, and finally inverse-transforms to the time domain to obtain the front and rear stereo pairs of the direct sound. According to the loudspeaker configuration, the two direct-sound stereo pairs and the one background-sound stereo pair are converted into a multi-channel audio signal.

As an improvement of the above technical solution, in step (5) the two direct-sound pairs are assigned, according to their azimuths, to the two nearest channels using the stereo signal distribution principle; the background sound is added directly to the left front and right front channels; and the background sound is delayed, passed through a low-pass filter, and assigned to the left rear and right rear channels.
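The rear-channel treatment of the background sound (delay followed by low-pass filtering) can be sketched as follows. The patent gives neither the delay nor the filter design, so the function name, the delay of 441 samples, the one-pole filter, and its coefficient `alpha` are all illustrative assumptions.

```python
def ambience_to_rear(ambience, delay_samples=441, alpha=0.2):
    """Delay a background-sound channel, then smooth it with a one-pole low-pass.

    delay_samples and alpha are illustrative values, not taken from the patent.
    """
    # Prepend zeros for the delay and keep the original length.
    delayed = ([0.0] * delay_samples + list(ambience))[:len(ambience)]
    out, state = [], 0.0
    for x in delayed:
        state += alpha * (x - state)   # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out.append(state)
    return out

# A short constant input with a 2-sample delay shows the delay followed by
# the gradual rise of the low-pass output.
rear = ambience_to_rear([1.0, 1.0, 1.0, 1.0], delay_samples=2, alpha=0.5)
```

In this sketch the same function would be applied separately to the left and right background-sound channels before they are sent to the left rear and right rear channels.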

As an improvement of the above technical solution, the method comprises the following steps:

Step 301: The digital signal processor 202 of the multi-channel audio encoder reads the 5.1-format multi-channel audio source 201. The left front, right front, center front, low-frequency, left rear and right rear channel signals are L, R, C, LFE, Ls and Rs respectively, and the sampling frequency is 44100 Hz;

Step 302: Assume that there are four virtual loudspeakers 401, 402, 403 and 404 at azimuths of ±45 degrees and ±135 degrees. Using a simple signal distribution principle, the 5.1-channel signal is mapped onto these two assumed pairs of virtual loudspeakers, giving the front and rear stereo pairs:

$L_f = L \cos\left((30 - 45)\,\pi / 180\right) + \frac{\sqrt{2}}{2}\,(C + LFE) + R \cos\left((30 + 45)\,\pi / 180\right);$

$R_f = L \cos\left((30 + 45)\,\pi / 180\right) + \frac{\sqrt{2}}{2}\,(C + LFE) + R \cos\left((30 - 45)\,\pi / 180\right);$

L_b = L_s;

R_b = R_s;

Step 303: Set the front and rear head rotation modes to simulate slight rotation of the human head. When the head rotates counterclockwise, this is equivalent to the four virtual loudspeakers 401, 402, 403 and 404 rotating clockwise together, and the signals of the corresponding channels change as follows:

for the front half-plane stereo pair, L_f′ = a₁*L_f, R_f′ = a₂*R_f,

for the rear half-plane stereo pair, L_b′ = a₃*L_b, R_b′ = a₄*R_b;

The values of a₁, a₂, a₃ and a₄ must be chosen so that the azimuth of the sound image is essentially unchanged after modulation and only its width becomes slightly wider, with no great perceptual difference from the unmodulated signal. So that the loudness and character of the sound do not change much, the values lie roughly between 0.9 and 1.1. One of (a₁, a₂) is greater than 1 and the other less than 1, and the same holds for (a₃, a₄). Which is larger depends on the direction in which the loudspeakers rotate;

When the head rotates clockwise, the changes to the left and right channels of the front and rear stereo pairs are reversed:

for the front half-plane stereo pair, L_f′ = a₂*L_f, R_f′ = a₁*R_f,

for the rear half-plane stereo pair, L_b′ = a₄*L_b, R_b′ = a₃*R_b;

Step 304: Down-mix the two stereo pairs into one stereo pair: L = L_f′ + L_b′, R = R_f′ + R_b′;

Step 305: The broadcast transmitter 203 converts the stereo pair (L, R) into a wireless signal and transmits it;

Step 306: The user receives the stereo pair (L, R) with the broadcast receiver 205, which demodulates it into a digital signal and feeds it to the multi-channel audio decoder;

Step 307: The multi-channel audio decoder uses the second decoding mode: the digital signal processor 206 applies a time-frequency transform to the stereo signal and separates the direct sound components (L_d, R_d) from the background sound components (L_a, R_a), improving the signal-to-noise ratio of the direct sound;

Step 308: In parallel, observe how the left/right channel ratio of the down-mixed stereo signal (L, R) changes over two or more consecutive frames, and compare it with the preset changes in left/right ratio for the front and rear stereo pairs to decide whether each time-frequency point belongs to the front or the rear. Replace each time-frequency point of the down-mixed stereo signal (L, R) with the corresponding point of the direct sound (L_d, R_d), and finally inverse-transform to the time domain, giving the front and rear stereo pairs of the direct sound components;

Step 309: According to the loudspeaker playback system 209, convert the two direct-sound stereo pairs and the one background-sound stereo pair into a 5.1 multi-channel audio signal;

Each channel signal of the direct sound is assigned to the two nearest loudspeaker channels using the stereo signal distribution principle; the background sound is added directly to the left front L and right front R channels; the background sound is delayed and assigned to the left rear Ls and right rear Rs channels; and the front direct-sound pair is low-pass filtered, summed, and assigned to the low-frequency channel LFE.
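The encoder side of the embodiment (Steps 302 to 304) can be sketched per sample as follows. This is a minimal illustration, not the patented implementation: the function names and the gain values in `GAINS` are hypothetical, chosen only to lie in the 0.9 to 1.1 range given in Step 303.

```python
import math

# Hypothetical gains (a1, a2, a3, a4), chosen within the 0.9 to 1.1 range of Step 303.
GAINS = (1.05, 0.95, 0.95, 1.05)

def virtual_pairs(L, R, C, LFE, Ls, Rs):
    """Step 302: map one 5.1 sample onto virtual speakers at +/-45 and +/-135 degrees."""
    near = math.cos(math.radians(30 - 45))   # weight toward the same-side virtual speaker
    far = math.cos(math.radians(30 + 45))    # weight toward the opposite-side virtual speaker
    mid = math.sqrt(2) / 2 * (C + LFE)
    Lf = L * near + mid + R * far
    Rf = L * far + mid + R * near
    return (Lf, Rf), (Ls, Rs)                # front pair, rear pair (L_b = L_s, R_b = R_s)

def encode_frame(front, rear, a, counterclockwise=True):
    """Steps 303 and 304: scale each pair by its gains, then sum left and right."""
    a1, a2, a3, a4 = a
    if not counterclockwise:                 # clockwise rotation swaps the gains within each pair
        a1, a2, a3, a4 = a2, a1, a4, a3
    (Lf, Rf), (Lb, Rb) = front, rear
    return a1 * Lf + a3 * Lb, a2 * Rf + a4 * Rb

# A sample present only in the left front channel, then the two rotation directions.
front, rear = virtual_pairs(1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
ccw = encode_frame((1.0, 1.0), (0.0, 0.0), GAINS)
cw = encode_frame((1.0, 1.0), (0.0, 0.0), GAINS, counterclockwise=False)
```

To encode a whole signal under these assumptions, `encode_frame` would be applied to every sample of a frame, with the rotation direction alternating from frame to frame according to the preset rotation mode.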

The method and system of this application embed dynamic information simulating slight head rotation during encoding. On the one hand, this does not affect the listening quality of the down-mixed stereo signal; on the other hand, when the multi-channel signal is restored, the dynamic information can be recovered to distinguish front from rear sound sources and localize the sources clearly.

The invention converts between multi-channel audio signals by simulating slight rotation of the human head. Compared with the prior art, its advantages are:

1. Compared with existing matrix encoding/decoding techniques, the invention localizes rear sound sources better and improves the sense of spatial localization of the sound source.

2. The invention is well compatible with stereo playback and solves the problems caused by, for example, the phase distortion of matrix encoding and decoding.

In summary, the invention addresses the shortcomings of traditional matrix surround (Matrix Surround) encoding and decoding. It converts the multi-channel signal into a front and a rear pair of stereo signals and modulates the two pairs according to dynamic information simulating slight head rotation. This does not affect the listening quality of the down-mixed ordinary stereo signal, while the receiving end can recover the dynamic information, distinguish the front and rear stereo pairs, and distribute them to a multi-channel system for playback. The result is good surround sound reproduction that can be used in broadcasting systems and in private spaces such as homes and cars.

Brief description of the drawings

Fig. 1 is a schematic diagram of the left/right channel amplitude ratio after down-mixing with existing matrix surround encoding;

Fig. 2 is a schematic diagram of an embodiment of the encoding/decoding system of the invention;

Fig. 3 is a flowchart of an embodiment of the encoding/decoding method of the invention;

Fig. 4 is a schematic diagram of converting a 5.1 multi-channel signal into a front and a rear stereo pair;

Fig. 5 is the head rotation model;

Fig. 6 is a schematic diagram of the left/right channel amplitude ratio after down-mixing when the invention uses the ITU-recommended matrix encoding;

Fig. 7 is the down-mixed stereo signal in the simulation example;

Fig. 8 is the background sound separated from the stereo signal in the simulation example;

Fig. 9 is the front and rear sound sources separated from the direct sound in the simulation example.

具体实施方式 Detailed ways

下面结合附图对本发明进行进一步说明。本发明的通过下面参照附图对本发明实施例的详细描述，本发明的上述和其他特点和优点将会变得更清楚，其中：The present invention will be further described below in conjunction with the accompanying drawings. The above-mentioned and other features and advantages of the present invention will become clearer by referring to the detailed description of the embodiments of the present invention below with reference to the accompanying drawings, wherein:

To achieve one object of the present invention, a practical multi-channel audio codec system is provided, comprising: a multi-channel audio sound source, loudspeakers, an acoustic signal AD/DA converter, a power amplifier, a multi-channel audio encoder, a multi-channel audio decoder, and an acoustic signal transmission device.

The multi-channel audio sound source may be captured live by a microphone array or produced in post-production. It is output to the input of the multi-channel audio encoder.

The multi-channel audio encoder first divides the multi-channel audio signal into a front signal and a rear signal and converts each into a corresponding stereo pair. It then simulates slight rotation of the human head: a head rotation pattern (order and direction) is set for the front and the rear, and the left- and right-channel amplitudes of the front and rear stereo pairs are modulated accordingly. Finally, for compatibility with stereo transmission and playback, the two stereo pairs are down-mixed into a single stereo pair.

The acoustic signal transmission device may be the transmit and receive path of a radio broadcast, a network, or another transmission circuit. It carries the stereo signal output by the multi-channel audio encoder to the input of the multi-channel audio decoder.

The multi-channel audio decoder uses the front and rear rotation pattern set in the encoder to decompose the single stereo pair into front and rear stereo pairs, then converts the two pairs into a multi-channel audio signal according to the loudspeaker configuration.

The power amplifier amplifies the multi-channel audio signal output by the decoder and the acoustic signal DA converter and feeds it to the loudspeakers.

The loudspeakers form an array of two or more loudspeakers that plays the decoded multi-channel audio signal.

In the above technical solution, the multi-channel audio encoder and the multi-channel audio decoder are implemented on an ordinary PC or a digital signal processor.

The new multi-channel audio encoding and decoding method of the present invention simulates, at the encoding end, the dynamic information of slight head rotation, modulates the multi-channel signal accordingly, and down-mixes it into a single stereo pair for output. At the decoding end, the dynamic information is recovered from the stereo signal and used to distinguish the front and rear stereo pairs. The method comprises the following steps:

(1) Output the multi-channel audio source signal to the input of the multi-channel audio encoder.

(2) Encoder: first divide the multi-channel audio signal into a front signal and a rear signal, then convert each into a corresponding stereo pair using the signal distribution principle.

(3) Simulate slight rotation of the human head: set the head rotation pattern (order and direction) for the front and the rear; the patterns for the front and rear stereo pairs may be the same or different. So that the listener does not perceive sound-image movement, the rotation frequency is generally above 50 Hz. Modulate the left- and right-channel amplitudes of the front and rear stereo pairs according to the chosen frequency and rotation pattern.

(4) Down-mix the front and rear stereo pairs into a single stereo pair and transmit it to the multi-channel audio decoder through the acoustic signal transmission device.

(5) Decoder: two modes are available and can be selected as required:

1. Transform the two-channel stereo signal into the time-frequency domain. At each time-frequency point, divide the left channel by the right channel and observe how this ratio changes across two (or more) consecutive frames. Compare the change with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether the point belongs to the front or the rear. Finally, inverse-transform the time-frequency signals to the time domain to recover the front and rear stereo pairs, and convert the two pairs into a multi-channel audio signal according to the loudspeaker configuration.

2. Transform the two-channel stereo signal into the time-frequency domain and use a signal processing method to separate the direct-sound and background-sound components of the down-mixed signal (each yielding a stereo pair), which improves the signal-to-noise ratio of the direct sound. In parallel, at each time-frequency point of the original two-channel signal, divide the left channel by the right channel, observe how the ratio changes across two (or more) consecutive frames, and compare the change with the preset changes for the front and rear stereo pairs to decide whether the point belongs to the front or the rear. Replace each time-frequency point of the original stereo signal with the corresponding point of the direct sound, then inverse-transform to the time domain to obtain the front and rear stereo pairs of the direct sound. Finally, convert the two direct-sound stereo pairs and the one background-sound stereo pair into a multi-channel audio signal according to the loudspeaker configuration.

(6) Pass the signal through the acoustic signal DA converter and the power amplifier to the loudspeakers for playback.

The specific embodiment is as follows:

As shown in Fig. 2, a system according to one embodiment of the invention comprises a 5.1 multi-channel audio sound source 201, a 5.1-channel loudspeaker playback system 209, a power amplifier 208, and an acoustic signal AD/DA converter 207, which here is an ordinary computer's built-in or external sound card. It further comprises a broadcast transmitter 203, a broadcast receiver 205, and an acoustic signal transmission device 204 that uses wireless transmission (medium wave, short wave, or FM). The multi-channel audio encoder 202 and decoder 206 use digital signal processors within the broadcasting station.

As shown in Fig. 3, the specific steps of the multi-channel audio codec system in this example are as follows:

Step 301: Encoder. The digital signal processor 202 reads the 5.1-format multi-channel audio sound source 201. The front-left, front-right, front-centre, bass, rear-left, and rear-right channel signals are L, R, C, LFE, Ls, and Rs, respectively. The sampling frequency in this embodiment is 44100 Hz.

Step 302: As shown in Fig. 4, assume four virtual loudspeakers 401, 402, 403, and 404 at azimuths of ±45° and ±135°. Using a simple signal distribution principle, the 5.1-channel signal is converted onto these two assumed pairs of virtual loudspeakers. In this embodiment the rear channels Ls and Rs already form a stereo pair, so they need no conversion and are used directly.

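The front and rear stereo pairs are:

$L_f = L \cos((30 - 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 + 45)\pi/180)$;

$R_f = L \cos((30 + 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 - 45)\pi/180)$;

$L_b = L_s$;

$R_b = R_s$.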
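The conversion in step 302 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it uses the front-pair formulas stated in claim 1 of this document, and the function name `to_virtual_pairs` is hypothetical.

```python
import math

def to_virtual_pairs(L, R, C, LFE, Ls, Rs):
    """Map one 5.1 sample (or array of samples) onto the front and rear
    virtual stereo pairs at +/-45 and +/-135 degrees (hypothetical helper)."""
    near = math.cos(math.radians(30 - 45))  # gain toward the nearer virtual speaker
    far = math.cos(math.radians(30 + 45))   # gain toward the farther virtual speaker
    centre = math.sqrt(2) / 2 * (C + LFE)   # centre and bass shared equally
    Lf = L * near + centre + R * far
    Rf = L * far + centre + R * near
    # The rear channels already form a stereo pair and pass through unchanged.
    return (Lf, Rf), (Ls, Rs)
```

Because only arithmetic is used, the same function works on scalars or on NumPy arrays of samples.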

Step 303: Set the rotation pattern (direction and order) of the head for the front and the rear to simulate slight head rotation. As shown in Fig. 5, a counterclockwise head rotation is equivalent to loudspeakers 401, 402, 403, and 404 all rotating clockwise, and the signal of each channel changes accordingly:

For the front half-plane stereo pair, let $L_f' = a_1 L_f$, $R_f' = a_2 R_f$; for the rear half-plane stereo pair, let $L_b' = a_3 L_b$, $R_b' = a_4 R_b$.

Here $a_1$, $a_2$, $a_3$, and $a_4$ must be chosen so that the sound-image direction stays essentially unchanged after modulation and only the image width increases slightly, so that the listener perceives little difference from the unmodulated signal. To avoid large changes in loudness and effect, the values are generally between about 0.9 and 1.1. One of ($a_1$, $a_2$) is greater than 1 and the other less than 1; the same holds for ($a_3$, $a_4$). Which one is larger depends on the direction of loudspeaker rotation: in this embodiment, when the loudspeakers rotate clockwise, $a_1 < a_2$ and $a_3 > a_4$. This embodiment uses $a_1 = 0.95$, $a_2 = 1.05$, $a_3 = 1.05$, $a_4 = 0.95$.

When the head rotates clockwise, the front and rear channels change in the opposite way:

For the front half-plane stereo pair, $L_f' = a_2 L_f$, $R_f' = a_1 R_f$; for the rear half-plane stereo pair, $L_b' = a_4 L_b$, $R_b' = a_3 R_b$.

Step 304: Down-mix the two stereo pairs into one: $L = L_f' + L_b'$, $R = R_f' + R_b'$. The down-mixing process is equivalent to using the ITU-recommended matrix $M = \begin{pmatrix} 1 & 0 & 1/\sqrt{2} & a & 0 \\ 0 & 1 & 1/\sqrt{2} & 0 & a \end{pmatrix}$, where $a = 1$.
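As a small illustration of the matrix above, the sketch below applies $M$ with $a = 1$ to one sample of each channel. The column order [L, R, C, Ls, Rs] is an assumption inferred from the matrix layout (the text does not state it), and `itu_downmix` is a hypothetical name.

```python
import numpy as np

a = 1.0  # surround gain used in this document
M = np.array([
    [1.0, 0.0, 1.0 / np.sqrt(2.0), a, 0.0],   # contributions to the left output
    [0.0, 1.0, 1.0 / np.sqrt(2.0), 0.0, a],   # contributions to the right output
])

def itu_downmix(L, R, C, Ls, Rs):
    """Down-mix one 5-channel sample to stereo; assumed order [L, R, C, Ls, Rs]."""
    return M @ np.array([L, R, C, Ls, Rs], dtype=float)
```

A centre-only input lands equally in both outputs at $1/\sqrt{2}$, while a rear-left input appears only in the left output at unit gain.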

Fig. 6 shows the relationship between the left/right amplitude ratio and the source angle when $a = 1$. From this stereo signal alone the front/back position of a source cannot be determined, but the problem is solved once dynamic information is added at the encoder and recovered at the decoder. The method uses the head dynamic information to first separate front and rear sources; then, as Fig. 6 shows, within the front and within the rear the source angle maps one-to-one to the left/right channel ratio, which fixes the specific source position.
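As a worked check of why the modulation makes front and rear separable, take the embodiment's values $a_1 = a_4 = 0.95$ and $a_2 = a_3 = 1.05$, together with the rear-pair modulation given in claim 1 ($L_b' = a_3 L_b$, $R_b' = a_4 R_b$ for a counterclockwise head turn, with the gains swapped for a clockwise turn). Assume, for illustration only, that a time-frequency point is dominated by a single source whose unmodulated left/right ratio is the same in both frames. Switching from a clockwise (cw) to a counterclockwise (ccw) head turn then scales the ratio by:

$$\frac{(L_f'/R_f')_{\mathrm{ccw}}}{(L_f'/R_f')_{\mathrm{cw}}} = \frac{a_1/a_2}{a_2/a_1} = \left(\frac{0.95}{1.05}\right)^2 \approx 0.819, \qquad \frac{(L_b'/R_b')_{\mathrm{ccw}}}{(L_b'/R_b')_{\mathrm{cw}}} = \frac{a_3/a_4}{a_4/a_3} = \left(\frac{1.05}{0.95}\right)^2 \approx 1.222$$

Under that assumption the front ratio falls by about 18% while the rear ratio rises by about 22%, so the sign of the change indicates whether the point belongs to the front or the rear.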

In this embodiment the head is set to turn clockwise first, then counterclockwise, alternating back and forth. The same head rotation pattern is used for the front and rear stereo pairs. When the rotation frequency exceeds 50 Hz the ear can hardly perceive sound-image movement; in this embodiment, with a sampling frequency of 44100 Hz, the head rotates once every 448 samples in the time domain.
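The modulation and down-mix of steps 303 and 304 can be sketched as follows. This is an illustrative sketch under stated assumptions: the rear-pair gains and the down-mix $L = L_f' + L_b'$, $R = R_f' + R_b'$ are taken from claim 1 of this document, the first block is assumed to be a clockwise head turn, and the names `encode`, `BLOCK`, and `switch_rate_hz` are hypothetical.

```python
import numpy as np

FS = 44100                                # sampling frequency of the embodiment (Hz)
BLOCK = 448                               # samples between rotations in the embodiment
A1, A2, A3, A4 = 0.95, 1.05, 1.05, 0.95   # modulation gains of the embodiment

def encode(Lf, Rf, Lb, Rb):
    """Modulate the front and rear pairs with an alternating head rotation
    (clockwise first, assumed) and down-mix them to one stereo pair."""
    n = len(Lf)
    # Even-numbered blocks: clockwise head turn; odd-numbered: counterclockwise.
    clockwise = (np.arange(n) // BLOCK) % 2 == 0
    gain_fl = np.where(clockwise, A2, A1)  # front left
    gain_fr = np.where(clockwise, A1, A2)  # front right
    gain_bl = np.where(clockwise, A4, A3)  # rear left
    gain_br = np.where(clockwise, A3, A4)  # rear right
    L = gain_fl * Lf + gain_bl * Lb
    R = gain_fr * Rf + gain_br * Rb
    return L, R

# Number of rotations per second implied by the embodiment's block length.
switch_rate_hz = FS / BLOCK
```

With these values `switch_rate_hz` is about 98.4, which is above the 50 Hz threshold mentioned in the text.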

Step 305: The broadcast transmitter 203 converts the stereo pair (L, R) into a radio signal and transmits it.

Step 306: The user receives the stereo pair (L, R) with the broadcast receiver 205, which demodulates it into a digital signal and feeds it to the multi-channel audio decoder.

Step 307: Decoder (using the second decoding mode). The digital signal processor 206 applies a time-frequency transform to the stereo signal and separates the direct-sound component ($L_d$, $R_d$) from the background-sound component ($L_a$, $R_a$), which improves the signal-to-noise ratio of the direct sound.

Step 308: At the same time, observe how the left/right channel ratio of the down-mixed stereo signal (L, R) changes across two (or more) consecutive frames, and compare the change with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether each time-frequency point belongs to the front or the rear. Replace the time-frequency points of (L, R) with the corresponding points of the direct sound ($L_d$, $R_d$), then inverse-transform the time-frequency signals to the time domain. This yields the front and rear stereo pairs of the direct-sound component.
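The front/rear decision can be sketched as below. This is a simplified illustration, not the decoder of the embodiment: it omits the direct/background separation (whose method the text does not specify), assumes non-overlapping rectangular frames that are aligned with the encoder's 448-sample rotation blocks, and assumes the first block is a clockwise head turn. The name `split_front_back` is hypothetical.

```python
import numpy as np

BLOCK = 448  # frame length, assumed equal to and aligned with the rotation block

def split_front_back(L, R, first_block_clockwise=True, eps=1e-12):
    """Label each time-frequency point front or rear from the frame-to-frame
    change of |L|/|R|, then resynthesise the two stereo pairs."""
    n_frames = len(L) // BLOCK
    if n_frames < 2:
        raise ValueError("need at least two frames to observe a ratio change")
    SL = np.fft.rfft(np.reshape(L[:n_frames * BLOCK], (n_frames, BLOCK)), axis=1)
    SR = np.fft.rfft(np.reshape(R[:n_frames * BLOCK], (n_frames, BLOCK)), axis=1)
    log_ratio = np.log(np.abs(SL) + eps) - np.log(np.abs(SR) + eps)
    change = np.diff(log_ratio, axis=0)      # frame k minus frame k-1, k >= 1
    k = np.arange(1, n_frames)
    clockwise = (k % 2 == 0) == first_block_clockwise
    # Entering a clockwise block raises the front ratio and lowers the rear one;
    # entering a counterclockwise block does the opposite.
    front = np.where(clockwise[:, None], change > 0, change < 0)
    front = np.vstack([front[:1], front])    # frame 0 reuses the decision of frame 1

    def synth(spec):
        return np.fft.irfft(spec, n=BLOCK, axis=1).ravel()

    front_pair = (synth(SL * front), synth(SR * front))
    back_pair = (synth(SL * ~front), synth(SR * ~front))
    return front_pair, back_pair, front
```

Because the two masks are complementary, the front and rear outputs sum back to the input. A practical decoder would also need frame synchronisation and overlapping windows, which this sketch leaves out.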

Step 309: According to the loudspeaker playback system 209, convert the two direct-sound stereo pairs and the one background-sound stereo pair into a 5.1 multi-channel audio signal.

Each channel signal of the direct sound is assigned to the two nearest loudspeaker channels by the stereo signal distribution principle. The background sound is added directly to the front-left L and front-right R channels. The background sound is also delayed and assigned to the rear-left Ls and rear-right Rs channels. The front pair of direct sounds is low-pass filtered, summed, and assigned to the bass channel LFE.
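Part of this routing can be sketched as follows. The sketch covers only the background-sound and bass paths; the panning of direct sound to the two nearest loudspeakers is left out because the text does not give the panning law. The delay of 441 samples (10 ms at 44100 Hz) and the moving-average low-pass filter are assumptions chosen for illustration, and `route_background_and_bass` is a hypothetical name.

```python
import numpy as np

def route_background_and_bass(front_direct, background, delay=441, taps=64):
    """Return the background and LFE contributions to a 5.1 layout.
    delay and taps are illustrative assumptions, not values from the patent."""
    Ld, Rd = front_direct
    La, Ra = background

    def delayed(x):
        # Shift right by `delay` samples, keeping the original length.
        return np.concatenate([np.zeros(delay), x])[:len(x)]

    kernel = np.ones(taps) / taps  # crude causal moving-average low-pass
    lfe = np.convolve(Ld, kernel)[:len(Ld)] + np.convolve(Rd, kernel)[:len(Rd)]
    return {"L": La, "R": Ra, "Ls": delayed(La), "Rs": delayed(Ra), "LFE": lfe}
```

The returned front contributions would be added to the panned direct sound to form the final L and R feeds.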

This embodiment simulates a 5.1 source signal. The invention applies equally to other multi-channel audio sources, for which the above steps can likewise be used for encoding and decoding.

The head rotation pattern in this embodiment is the same for the front and the rear: clockwise first, then counterclockwise, alternating back and forth. Different patterns can also be set for the front and the rear, for example by choosing the direction and order of rotation from the values of a pseudo-random sequence (M-sequence, Gold code, and so on). Whatever pattern is chosen, it must allow front and rear sources to be distinguished, that is, the left/right channels of the front and rear stereo pairs must change differently.
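A pseudo-random rotation order of the kind mentioned above could be generated with a linear-feedback shift register. The sketch below is one possible choice, not taken from the patent: a 4-bit register with recurrence $s_{n+4} = s_{n+1} \oplus s_n$ (primitive polynomial $x^4 + x + 1$), which yields an M-sequence of period 15. Mapping bit 1 to a clockwise turn and bit 0 to a counterclockwise turn is likewise an assumption, and `m_sequence` is a hypothetical name.

```python
def m_sequence(length=15, seed=0b0001):
    """Generate bits of a period-15 M-sequence from a 4-bit Fibonacci LFSR."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        bits.append(state & 1)                  # output the lowest bit
        feedback = (state ^ (state >> 1)) & 1   # s[n+4] = s[n+1] XOR s[n]
        state = (state >> 1) | (feedback << 3)  # shift and insert the new bit
    return bits

# Assumed mapping: 1 -> clockwise head turn, 0 -> counterclockwise head turn.
directions = ["cw" if b else "ccw" for b in m_sequence()]
```

The decoder would need to know the same sequence and seed. Where two consecutive bits are equal, the left/right ratio does not change between those blocks, so the comparison would have to use blocks in which the direction differs.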

Simulation experiment

The experiment was simulated in Matlab with two sound sources: the front stereo pair carries a female voice at 30° azimuth, the rear stereo pair carries a male voice at 40° azimuth, and white noise serves as the background sound.

Encoding end: the left and right channels of the front and rear stereo pairs are modulated separately and then down-mixed into a single stereo pair, as shown in Fig. 7.

Decoding end: the direct sound and the background sound are first separated (the background sound is shown in Fig. 8). At the same time, the direct-sound pair is split into front and rear stereo pairs according to the head dynamic information used at the encoding end, as shown in Fig. 9.

A subjective listening test gave the following results. The background sound was separated from the direct sound to a certain extent, which improved the signal-to-noise ratio of the direct sound. When the front and rear stereo pairs were auditioned separately, the male and female sources were clearly distinguishable, although slight crosstalk remained. When the signal was replayed on a multi-channel playback system, the crosstalk was almost inaudible because of masking, and the front and rear stereo pairs were well separated.

In summary, the surround sound codec of the present invention, which simulates head dynamic information, achieves high-quality, compatible conversion between multi-channel audio and two-channel stereo. After the dynamic information is added, matrix surround encoding can also be used so that the signal remains compatible with matrix surround decoders. The principle and methods of both working modes have been preliminarily verified and implemented.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of its technical solutions that do not depart from their spirit and scope all fall within the scope of the claims of the present invention.

Claims

1. A multi-channel audio codec system, comprising: a multi-channel audio sound source, loudspeakers, an acoustic signal AD/DA converter, a power amplifier, a multi-channel audio encoder, a multi-channel audio decoder, and an acoustic signal transmission device, characterized in that:

the multi-channel audio sound source generates a multi-channel audio signal and outputs it to the input of the multi-channel audio encoder;

the multi-channel audio encoder divides the multi-channel audio signal into a front signal and a rear signal, assigns the front signal to a front virtual stereo pair and the rear signal to a rear virtual stereo pair, thereby obtaining front and rear stereo pairs; it then simulates slight rotation of the human head by modulating the left- and right-channel amplitudes of the front and rear stereo pairs according to a preset front and rear head rotation pattern, the amplitude trends of the left and right channels differing between the front and rear pairs; finally, it down-mixes the front and rear stereo pairs into a single stereo pair and outputs it to the acoustic signal transmission device;

the acoustic signal transmission device transmits the stereo pair output by the multi-channel audio encoder to the input of the multi-channel audio decoder;

the multi-channel audio decoder decomposes the stereo pair into front and rear stereo pairs according to the front and rear rotation pattern set in the multi-channel audio encoder and then, according to the loudspeaker configuration and using the signal distribution principle, recovers from the two stereo pairs a multi-channel audio signal with the same spatial localization as the original surround sound;

the power amplifier amplifies the multi-channel audio signal output by the decoder and the acoustic signal DA converter and outputs it to the loudspeakers;

the simulation of slight head rotation by the multi-channel audio encoder comprises the following steps:

Step 301: the multi-channel audio encoder, a digital signal processor (202), reads the 5.1-format multi-channel audio sound source (201); the front-left, front-right, front-centre, bass, rear-left, and rear-right channel signals are L, R, C, LFE, Ls, and Rs, respectively;

Step 302: assuming four virtual loudspeakers (401, 402, 403, and 404) at azimuths of ±45° and ±135°, the 5.1-channel signal is converted onto the two assumed pairs of virtual loudspeakers according to a simple signal distribution principle, the front and rear stereo pairs being:

$L_f = L \cos((30 - 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 + 45)\pi/180)$;

$R_f = L \cos((30 + 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 - 45)\pi/180)$;

$L_b = L_s$;

$R_b = R_s$;

Step 303: the rotation pattern of the head for the front and the rear is set to simulate slight head rotation; when the head rotates counterclockwise, which is equivalent to the four virtual loudspeakers (401, 402, 403, 404) all rotating clockwise, the signal of each channel changes to:

for the front half-plane stereo pair, $L_f' = a_1 L_f$, $R_f' = a_2 R_f$;

for the rear half-plane stereo pair, $L_b' = a_3 L_b$, $R_b' = a_4 R_b$;

wherein $a_1$, $a_2$, $a_3$, and $a_4$ are chosen so that the sound-image direction stays essentially unchanged after modulation and only the image width increases slightly, with little perceptual difference from the unmodulated signal; so that the loudness and effect of the sound do not change greatly, the values lie between 0.9 and 1.1; one of $a_1$ and $a_2$ is greater than 1 and the other less than 1, which one being larger depending on the direction of loudspeaker rotation; one of $a_3$ and $a_4$ is greater than 1 and the other less than 1, which one being larger depending on the direction of loudspeaker rotation;

when the head rotates clockwise, the front and rear channels change in the opposite way:

for the front half-plane stereo pair, $L_f' = a_2 L_f$, $R_f' = a_1 R_f$;

for the rear half-plane stereo pair, $L_b' = a_4 L_b$, $R_b' = a_3 R_b$;

Step 304: the two stereo pairs are down-mixed into a single stereo pair: $L = L_f' + L_b'$, $R = R_f' + R_b'$;

Step 305: the broadcast transmitter (203) converts the stereo pair L and R into a radio signal and transmits it;

Step 306: the user receives the stereo pair L and R with the broadcast receiver (205), which demodulates it into a digital signal and feeds it to the multi-channel audio decoder;

Step 307: the multi-channel audio decoder decodes: a digital signal processor (206) applies a time-frequency transform to the stereo signal and separates the direct-sound components $L_d$ and $R_d$ from the background-sound components $L_a$ and $R_a$, which improves the signal-to-noise ratio of the direct sound;

Step 308: at the same time, the change in the left/right channel ratio of the down-mixed stereo signal L and R across two or more consecutive frames is observed and compared with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether each time-frequency point belongs to the front or the rear; the time-frequency points of the down-mixed stereo signal L and R are replaced with the corresponding time-frequency points of the direct sound $L_d$ and $R_d$; finally, the time-frequency signals are inverse-transformed to the time domain, which yields the front and rear stereo pairs of the direct-sound component;

Step 309: the two direct-sound stereo pairs and the one background-sound stereo pair are converted into a 5.1 multi-channel audio signal according to the loudspeaker playback system (209);

each channel signal of the direct sound is assigned to the two nearest loudspeaker channels by the stereo signal distribution principle; the background sound is added directly to the front-left L and front-right R channels; the background sound is delayed and assigned to the rear-left Ls and rear-right Rs channels; the front pair of direct sounds is low-pass filtered, summed, and assigned to the bass channel LFE.

2. The multi-channel audio codec system according to claim 1, characterized in that the multi-channel audio encoder and the multi-channel audio decoder are implemented on an ordinary PC or a digital signal processor.

3. The multi-channel audio codec system according to claim 1, characterized in that the loudspeakers form an array of two or more loudspeakers for playing the decoded multi-channel audio signal.

4. A multi-channel audio encoding and decoding method, in which the encoding end first converts the multi-channel signal into front and rear stereo pairs, then adds dynamic information simulating slight head rotation by modulating the two stereo pairs accordingly, and down-mixes them into a single stereo pair for output, and the decoding end recovers the dynamic information from the stereo signal and distinguishes the front and rear stereo pairs, the method comprising the following steps:

(1) outputting the multi-channel audio source signal to the input of the multi-channel audio encoder;

(2) dividing, by the multi-channel audio encoder, the multi-channel audio signal into a front signal and a rear signal, then, according to the signal distribution principle, assigning the front signal to a front virtual stereo pair and the rear signal to a rear virtual stereo pair, thereby obtaining front and rear stereo pairs;

(3) simulating slight rotation of the human head by modulating, at a given frequency and according to the preset front and rear head rotation pattern, the left- and right-channel amplitudes of the front and rear stereo pairs, the amplitude trends of the left and right channels differing between the front and rear pairs;

(4) adding the left channels of the front and rear stereo pairs together and the right channels together to down-mix them into a single stereo pair, and transmitting it to the multi-channel audio decoder through the acoustic signal transmission device;

(5) decomposing, by the multi-channel audio decoder, the stereo pair into front and rear stereo pairs according to the front and rear rotation pattern set in the multi-channel audio encoder, then, according to the loudspeaker configuration and using the signal distribution principle, recovering from the two stereo pairs a multi-channel audio signal with the same spatial localization as the original surround sound;

(6) outputting the signal through the acoustic signal DA converter and the power amplifier to the loudspeakers for playback;

wherein step (3) comprises:

$L_f = L \cos((30 - 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 + 45)\pi/180)$;

$R_f = L \cos((30 + 45)\pi/180) + \frac{\sqrt{2}}{2}(C + LFE) + R \cos((30 - 45)\pi/180)$;

$L_b = L_s$;

$R_b = R_s$;

for the front half-plane stereo pair, $L_f' = a_1 L_f$, $R_f' = a_2 R_f$;

for the rear half-plane stereo pair, $L_b' = a_3 L_b$, $R_b' = a_4 R_b$;

for the front half-plane stereo pair, $L_f' = a_2 L_f$, $R_f' = a_1 R_f$;

for the rear half-plane stereo pair, $L_b' = a_4 L_b$, $R_b' = a_3 R_b$;

Step 308: at the same time, the change in the left/right channel ratio of the down-mixed stereo signal L and R across two or more consecutive frames is observed and compared with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether each time-frequency point belongs to the front or the rear; the time-frequency points of the down-mixed stereo signal L and R are replaced with the corresponding time-frequency points of the direct sound $L_d$ and $R_d$; finally, the time-frequency signals are inverse-transformed to the time domain, which yields the front and rear stereo pairs of the direct-sound component.

5. The multi-channel audio encoding and decoding method according to claim 4, characterized in that, in step (3), the rotation frequency of the simulated head is above 50 Hz, so that a listener using ordinary stereo playback does not perceive sound-image movement.

6. The multi-channel audio encoding and decoding method according to claim 4, characterized in that, in step (5), the multi-channel audio decoder transforms the two-channel stereo signal into the time-frequency domain, divides the left channel by the right channel at each time-frequency point, observes the change in this ratio across two or more consecutive frames, compares it with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether the time-frequency point belongs to the front or the rear, inverse-transforms the time-frequency signals to the time domain to recover the front and rear stereo pairs, and converts the two stereo pairs into a multi-channel audio signal according to the loudspeaker configuration.

7. The multi-channel audio encoding and decoding method according to claim 4, characterized in that, in step (5), the multi-channel audio decoder transforms the two-channel stereo signal into the time-frequency domain and uses a signal processing method to separate the direct-sound and background-sound components of the down-mixed two-channel signal, each yielding a stereo pair, which improves the signal-to-noise ratio of the direct sound; at the same time, the left channel of the original two-channel stereo signal is divided by the right channel at each time-frequency point, the change in this ratio across two or more consecutive frames is observed and compared with the preset changes in the left/right ratio of the front and rear stereo pairs to decide whether the time-frequency point belongs to the front or the rear; the time-frequency points of the original two-channel stereo signal are replaced with the corresponding time-frequency points of the direct sound; the time-frequency signals are inverse-transformed to the time domain to recover the front and rear stereo pairs of the direct sound; and the two direct-sound stereo pairs and the one background-sound stereo pair are converted into a multi-channel audio signal according to the loudspeaker configuration.

8. The multi-channel audio encoding and decoding method according to claim 7, characterized in that, in step (5), the two direct-sound pairs are assigned, according to their azimuths, to the two nearest channels by the stereo signal distribution principle; the background sound is added directly to the front-left and front-right channels; and the background sound is delayed, passed through a low-pass filter, and assigned to the rear-left and rear-right channels.