CN101414463B - A kind of sound mixing coding method, device and system - Google Patents
- Publication number
- CN101414463B (application number CN200710181376A / CN2007101813767A)
- Authority
- CN
- China
- Prior art keywords
- audio
- audio mixing
- stream
- mixing
- codes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Telephonic Communication Services (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of multimedia communication technology, and in particular to a sound mixing coding method, device and system.
Background Art
Real-time multimedia communication services, such as multimedia conference systems, are increasingly widely used to meet growing service demands, so the technologies underlying multimedia conference systems are of great importance.
In a multimedia conference, audio interaction is the most basic element. In a centralized conference, each terminal establishes a unicast connection with a multipoint control unit (Multi-point Controlling Unit, MCU), and sends audio streams to and receives audio streams from the MCU in real time. The inputs of the MCU are therefore audio streams encoded under various coding schemes, and its output is the audio stream obtained by mixing according to a synthesis strategy.
Figure 1 is a schematic diagram of a multimedia conference system, in which the dashed box can be regarded as an MCU. The audio streams input from terminal position 1, terminal position 2 and so on are decoded separately, the decoded audio is mixed in the mixing unit, and the mixed audio is then encoded separately and output to the corresponding terminals. In the system of Figure 1, M terminals participate in mixing. At a given time t, each terminal sends audio data to the MCU; the MCU first decodes the audio data, calculates mixing parameters for each signal, and finally mixes the decoded signals. The usual mixing algorithm sums the decoded data of all channels, encodes the sum with an encoder, and transmits the result to each terminal.
This time-domain additive mixing scheme often introduces noise. The audio signal that each terminal sends to the MCU lies within a range [min, max], where min is the lower limit and max is the upper limit. When the signals of all channels are summed directly, the sum can easily fall outside [min, max]. Because a digital audio signal has upper and lower quantization limits, the summation is likely to overflow. The usual remedy is overflow detection followed by saturation: a result above the upper limit is set to the upper limit, and a result below the lower limit is set to the lower limit. This operation destroys the original time-domain characteristics of the speech signal and thereby introduces noise, which is why pops and discontinuous speech occur in some systems.
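The saturation behaviour described above can be illustrated with a minimal sketch. It assumes 16-bit signed samples, so that [min, max] = [-32768, 32767]; the text itself does not fix the range, and the function name is hypothetical.

```python
# Assumed sample range: 16-bit signed PCM (the source only says [min, max]).
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def mix_with_saturation(frames):
    """Sum equal-length sample lists and clamp every sum to the sample range.

    Returns the mixed samples and the number of samples that had to be clamped.
    """
    mixed = []
    clipped = 0
    for samples in zip(*frames):
        total = sum(samples)
        if total > SAMPLE_MAX:
            total = SAMPLE_MAX
            clipped += 1
        elif total < SAMPLE_MIN:
            total = SAMPLE_MIN
            clipped += 1
        mixed.append(total)
    return mixed, clipped


# Three loud talkers: the second and third sample positions overflow.
a = [1000, 20000, -15000]
b = [2000, 15000, -15000]
c = [-500, 10000, -5000]
mixed, clipped = mix_with_saturation([a, b, c])
```

Every clamped sample replaces the true sum with a constant, which is the waveform distortion the paragraph attributes to saturation.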
As the number of terminals participating in mixing increases, overflow occurs more and more often, so this kind of time-domain additive mixing has an upper limit on the number of terminals, and that limit is low. Experiments show that in many cases, with as few as 4 terminals participating in mixing, the result contains so much noise and so many interruptions that the speech can no longer be understood.
Summary of the Invention
In view of this, an embodiment of the present invention provides a sound mixing coding method that can overcome the noise problem of time-domain mixing and coding in the prior art. The method comprises the following steps:
setting a mixing flag for sound information according to a mixing policy, and encoding the sound information according to the mixing flag to obtain core coded data;
if the mixing flag indicates that mixing is required, calculating dynamic side information, and generating and outputting an audio coded stream that contains the mixing flag, the core coded data and the dynamic side information; if the mixing flag indicates that mixing is not required, generating and outputting an audio coded stream that contains the mixing flag and the core coded data;
the network side receiving the audio coded streams from the terminals, determining from the mixing flag in each stream whether that stream needs to be mixed, selecting N audio coded streams from the M' audio coded streams that need to be mixed according to their dynamic side information, mixing the core coded data of the selected N streams, and outputting the mixed audio coded stream, where N is less than or equal to M'.
An embodiment of the present invention further provides a terminal-side coding method, comprising the following steps:
setting a mixing identifier for sound information according to a mixing policy, and encoding the sound information according to the mixing identifier to obtain core coded data;
if the mixing identifier indicates that mixing is required, calculating dynamic side information, and generating and outputting an audio coded stream that contains the mixing identifier, the core coded data and the dynamic side information; if the mixing identifier indicates that mixing is not required, the terminal generating and outputting an audio coded stream that contains the mixing identifier and the core coded data.
An embodiment of the present invention further provides a network-side sound mixing coding method, comprising the following steps:
receiving M audio coded streams, determining from the mixing identifier in each stream whether that stream needs to be mixed, selecting N audio coded streams from the M' audio coded streams that need to be mixed according to their dynamic side information, mixing the core coded data of the selected N streams, and outputting the mixed audio coded stream, where M, M' and N are positive integers, N is less than or equal to M', and M' is less than or equal to M.
An embodiment of the present invention provides a multimedia conference system comprising M terminals and a multipoint control unit, characterized in that:
the terminal is configured to set a mixing flag for the collected sound information according to a local mixing policy, and to encode the sound information according to the mixing flag to obtain core coded data; and, depending on the mixing flag set under the local mixing policy, either to generate and output an audio coded stream containing the core coded data, a mixing flag indicating that mixing is required, and dynamic side information, or to generate and output an audio coded stream containing the core coded data and a mixing flag indicating that mixing is not required;
the multipoint control unit is configured to receive the audio coded streams from the terminals, determine from the value of the mixing flag in each stream whether that stream needs to be mixed, select N audio streams from the M' audio streams that need to be mixed according to their dynamic side information, mix the core coded data of the selected N streams, and output the mixed audio coded stream, where M, M' and N are positive integers, N is less than or equal to M', and M' is less than or equal to M.
An embodiment of the present invention provides a multimedia conference terminal, comprising:
a sound collection module, configured to collect sound information;
a mixing policy module, configured to set a mixing flag for the sound information collected by the sound collection module according to a preset mixing policy;
a core encoding module, configured to encode the sound information and output core coded data;
a framing module, configured to calculate dynamic side information according to the mixing flag set by the mixing policy module and, depending on the value of the mixing flag, to generate either an audio coded data frame containing the core coded data, the mixing flag and the dynamic side information, or an audio coded data frame containing the core coded data and the mixing flag;
an output module, configured to output the audio coded data frames generated by the framing module as an audio coded stream.
An embodiment of the present invention provides a multipoint control unit, comprising:
a selection unit, configured to receive audio coded streams from M terminals, determine from the value of the mixing flag in each stream whether that stream needs to be mixed, and select N audio coded streams from the M' audio coded streams that need to be mixed according to their dynamic side information;
a mixing unit, configured to mix the core coded data of the N audio coded streams selected by the selection unit to obtain M' mixed audio coded streams;
a sending unit, configured to send the audio coded streams from the mixing unit to the corresponding destination terminals.
As can be seen from the above technical solutions, on the terminal side the mixing flag is set in the coded stream and the corresponding dynamic side information is added; on the network side, the audio coded streams to be mixed are selected according to the mixing flag and the dynamic side information and then mixed, which can solve the noise problem in mixing and coding.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a multimedia conference system in the prior art;
Figure 2 is a schematic diagram of a multimedia conference system according to an embodiment of the present invention;
Figure 3 is a structural diagram of a coded data frame in the audio coded stream output by a terminal encoder unit according to an embodiment of the present invention;
Figure 4 is a flowchart of terminal-side coding according to an embodiment of the present invention;
Figure 5 is a flowchart of MCU-side mixing and coding according to an embodiment of the present invention;
Figure 6 is a block diagram of a multimedia conference terminal according to an embodiment of the present invention;
Figure 7 is a block diagram of a multipoint control unit according to an embodiment of the present invention.
Detailed Description of the Embodiments
An embodiment of the present invention provides a sound mixing coding method based on a mixing flag. In addition to the core coded stream that carries the speech, the data stream output by a terminal includes a mixing flag and dynamic side information, where the dynamic side information carries the information needed for mixing and coding. If the mixing flag is set to indicate that mixing is required, the dynamic side information is set; if the mixing flag is set to indicate that mixing is not required, no dynamic side information is set. The MCU selects, according to the mixing flag, the core coded streams that need mixing and mixes them.
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings.
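Figure 2 shows a schematic diagram of a multimedia conference system according to an embodiment of the present invention. The system includes M terminals, namely terminal 1, terminal 2, ..., terminal M, and one MCU.
Taking terminal 1 as an example, the terminal includes an encoder unit 201, which encodes the sound collected by the sound collection device of terminal 1, such as a microphone, and generates a core coded stream carrying the sound information. The encoder unit 201 also sets the mixing flag according to a locally configured mixing policy. The mixing policy determines whether the coded sound output by this terminal needs to be mixed, and different policies can be configured as required. For example, different terminals can be given different priorities, and the audio streams from higher-priority terminals are mixed preferentially; alternatively, a sound energy threshold can be set, and the terminal's audio stream is mixed when the energy of the sound it collects exceeds the threshold. Several mixing policies can be used at the same time.
If the mixing flag indicates that mixing is required, the encoder unit 201 also generates dynamic side information and writes it into the audio stream; if the mixing flag indicates that mixing is not required, the audio stream output by the encoder unit 201 contains only the core coded data and the mixing flag.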
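As a rough illustration of how a terminal might combine the two example policies (terminal priority and energy threshold), consider the sketch below. The function name, the default values and the choice to require both conditions are assumptions made for illustration; the text only says that several policies can be used together.

```python
def set_mixing_flag(frame_energy, priority,
                    energy_threshold=1_000_000, min_priority=1):
    """Return True (mixing required) only if both illustrative policies agree.

    frame_energy     -- energy of the collected sound for this frame
    priority         -- priority assigned to this terminal (higher is more important)
    energy_threshold -- hypothetical energy threshold
    min_priority     -- hypothetical lowest priority that is still mixed
    """
    high_enough_priority = priority >= min_priority
    loud_enough = frame_energy > energy_threshold
    return high_enough_priority and loud_enough


flag_loud = set_mixing_flag(2_000_000, priority=2)   # loud, prioritized terminal
flag_quiet = set_mixing_flag(500_000, priority=2)    # below the energy threshold
flag_low = set_mixing_flag(2_000_000, priority=0)    # priority too low
```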
Figure 3 shows the structure of a coded data frame in the audio coded stream output by the terminal encoder unit according to an embodiment of the present invention. Let the total length of a data frame be n bits. When the mixing flag indicates that mixing is required, the coded data frame is as shown in the upper part of Figure 3: it contains a t-bit mixing flag, m bits of dynamic side information, and n-m-t bits of core coded data. The mixing flag is placed in the frame header so that the MCU can identify it easily. When the mixing flag indicates that mixing is not required, the coded data frame is as shown in the lower part of Figure 3: it contains a t-bit mixing flag and n-t bits of core coded data.
For G.711 narrowband enhancement layer (Low Band Enhance, LBE) coding, the fields in Figure 3 can take the following values: t=1, n=80, m=9.
The side information includes the frame energy (Frame Energy) and the voicing score (Voicing score). If the side information is 9 bits long, 6 bits carry the quantized frame energy and 3 bits carry the quantized voicing score.
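The frame layout can be sketched as follows, using the example values t=1, n=80, m=9. Frames are represented as strings of '0'/'1' characters purely for readability. The flag value 1 meaning "mixing required", the order of the energy and voicing fields inside the side information, and the function names are all assumptions; only the position of the flag at the head of the frame comes from the text.

```python
FRAME_BITS = 80   # n
FLAG_BITS = 1     # t
SIDE_BITS = 9     # m: 6 bits quantized frame energy + 3 bits quantized voicing score


def pack_frame(mix_flag, core_bits, energy_q=0, voicing_q=0):
    """Build one frame as a bit string: flag first, then side info (if any), then core data."""
    if mix_flag:
        if not (0 <= energy_q < 64 and 0 <= voicing_q < 8):
            raise ValueError("side information out of range")
        header = "1" + format(energy_q, "06b") + format(voicing_q, "03b")
    else:
        header = "0"
    if len(header) + len(core_bits) != FRAME_BITS:
        raise ValueError("core payload does not fill the frame")
    return header + core_bits


def unpack_frame(frame):
    """Return (mix_flag, energy_q, voicing_q, core_bits); side fields are None if absent."""
    if len(frame) != FRAME_BITS:
        raise ValueError("unexpected frame length")
    if frame[0] == "1":
        energy_q = int(frame[1:7], 2)
        voicing_q = int(frame[7:10], 2)
        return True, energy_q, voicing_q, frame[10:]
    return False, None, None, frame[1:]


mixed_frame = pack_frame(True, "1" * 70, energy_q=37, voicing_q=5)   # 70 core bits
plain_frame = pack_frame(False, "0" * 79)                            # 79 core bits
```

Because the flag is the first bit, a receiver can decide how to parse the rest of the frame after reading a single bit.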
The frame energy is calculated by formula (1):
where Frame_Length is the frame length, S(i) is the low-band signal output by the quadrature mirror filter (Quadrature Mirror Filter, QMF), and i is the index of the sample within the frame.
The voicing score is calculated by formula (2):
where the zero-crossing rate (Zero_Crossing_Rate) is the number of times the time-domain waveform crosses zero within 10 ms, and the scale factor (Scale_Factor) is a preset scaling constant with a value in [0, 1].
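Formulas (1) and (2) are not reproduced in this text. As a rough illustration only, the sketch below computes a conventional sum-of-squares frame energy and a zero-crossing count over one frame. Both definitions are assumptions and may differ from the patent's formulas, and the voicing score itself is deliberately not reconstructed.

```python
def frame_energy(samples):
    """Sum of squared samples over the frame (a conventional energy definition, assumed here)."""
    return sum(s * s for s in samples)


def zero_crossings(samples):
    """Count sign changes between consecutive samples; zero is treated as non-negative."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


energy = frame_energy([1, -2, 3])
crossings = zero_crossings([1, -1, 1, 1, -1])
```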
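Depending on the actual situation, the dynamic side information can also be set to other quantities that can serve as a basis for the mixing decision, for example a voice activity detection (VAD) result.
After the audio stream output by a terminal is sent to the MCU, it is first input to the selection unit 202. The selection unit 202 first identifies the mixing flag in the received audio coded stream and determines from its value whether the stream needs to be mixed. If mixing is not required, the selection unit 202 outputs the stream to the corresponding destination terminal. For all M' (M' less than or equal to M) audio coded streams that need mixing, the selection unit 202 selects N (N less than or equal to M') streams according to their dynamic side information and sends them to the corresponding decoders; after decoding, they are sent to the mixing unit 203 for mixing, which yields M' mixed audio streams. These M' audio streams are then each encoded by an encoder and sent to the corresponding terminals.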
The terminal-side coding process of an embodiment of the present invention is shown in Figure 4 and includes the following steps:
Step 401: Set the mixing flag for the collected sound information according to the local mixing policy, then encode the sound information; the result of the encoding is the core coded data.
Step 402: If the mixing flag is set to indicate that mixing is required, calculate the dynamic side information; the frame energy and the voicing score can be calculated according to formulas (1) and (2) above and used as the dynamic side information.
Step 403: Generate and output the audio coded stream. Generating the audio coded stream specifically includes: if the mixing flag is valid (mixing required), generating an audio coded data frame that includes the mixing flag, the core coded data and the dynamic side information; if the mixing flag is invalid (mixing not required), generating an audio coded data frame that includes the mixing flag and the core coded data. The mixing flag is placed at the very beginning of the data frame and is preferably 1 bit long.
The MCU-side mixing and coding process of an embodiment of the present invention is shown in Figure 5 and includes the following steps:
Step 501: The MCU receives an audio coded stream from a terminal and determines from the value of the mixing flag in it whether the stream needs to be mixed. If so, step 503 is performed; otherwise, step 502 is performed.
Step 502: Send the audio coded stream directly to the corresponding destination terminal, and end the processing of this stream.
Step 503: For the audio coded streams received at the same time from M' terminals whose mixing flags all indicate that mixing is required, the MCU selects N audio coded streams according to the dynamic side information in these streams and discards the remaining M'-N streams, where N is less than or equal to M'.
For example, the selection can be based on the energy value in the side information: a stream whose energy is greater than a threshold T is mixed, and a stream whose energy is below the threshold is not.
Step 504: Decode the core coded data of each of the selected N audio coded streams, and mix the decoded data to obtain M' mixed audio streams.
Step 505: Encode each of the M' mixed audio streams, and send the M' encoded, mixed audio coded streams to the M' destination terminals respectively.
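The routing and selection part of the MCU-side steps can be sketched as follows. The sketch assumes that the streams with the highest quantized energy are preferred and that a threshold T is applied first; the text only requires that N streams be chosen from the dynamic side information, so this ranking rule, the data layout and the function name are illustrative assumptions. Decoding, mixing and re-encoding are omitted.

```python
def select_streams(frames, n_max, energy_threshold=0):
    """Split one time slot of frames into pass-through, selected and dropped streams.

    frames -- dict: terminal id -> (mix_flag, quantized_energy or None)
    n_max  -- N, the largest number of streams that may be mixed
    """
    # Streams whose flag says "no mixing" are forwarded unchanged.
    passthrough = [tid for tid, (flag, _) in frames.items() if not flag]
    # Flagged streams above the threshold compete for the N mixing slots.
    candidates = [(energy, tid) for tid, (flag, energy) in frames.items()
                  if flag and energy > energy_threshold]
    candidates.sort(reverse=True)                      # loudest first
    selected = [tid for _, tid in candidates[:n_max]]
    # The remaining flagged streams are discarded for this time slot.
    dropped = [tid for tid, (flag, _) in frames.items()
               if flag and tid not in selected]
    return passthrough, selected, dropped


slot = {
    "A": (True, 40),
    "B": (False, None),
    "C": (True, 12),
    "D": (True, 55),
    "E": (True, 3),
}
passthrough, selected, dropped = select_streams(slot, n_max=2, energy_threshold=5)
```

Here M' = 4 streams are flagged for mixing and N = 2 of them are kept, so only two streams would need to be decoded and summed.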
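Figure 6 shows a multimedia conference terminal according to an embodiment of the present invention, comprising:
a sound collection module 601, configured to collect sound information;
a mixing policy module 602, configured to set a mixing flag for the sound information collected by the sound collection module 601 according to a preset mixing policy;
a core encoding module 603, configured to encode the sound information and output core coded data. If the mixing policy module 602 sets the mixing flag to indicate that mixing is not required, the core encoding module 603 does not need to reserve bits for the dynamic side information when encoding; if the mixing flag is set to indicate that mixing is required, the core encoding module 603 must reserve bits for the dynamic side information when encoding. For example, if a coded data frame has n bits in total, the mixing flag has t bits and the dynamic side information has m bits, then the core coded data produced by the core encoding module 603 is n-t bits long when no bits are reserved for the dynamic side information, and n-m-t bits long when bits are reserved for it;
a framing module 604, configured to calculate dynamic side information according to the mixing flag set by the mixing policy module 602 and, depending on the value of the mixing flag, to generate either an audio data frame containing the core coded data, the mixing flag and the dynamic side information, or an audio data frame containing the core coded data and the mixing flag;
an output module 605, configured to output the audio data frames generated by the framing module 604 as an audio coded stream.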
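The bit budget described for the core encoding module 603 is simple arithmetic. A minimal sketch, using the G.711 LBE example values given earlier (n=80, t=1, m=9); the function name is hypothetical:

```python
def core_budget(n, t, m, mix_flag):
    """Bits available to the core encoder in an n-bit frame.

    n -- total frame length, t -- mixing flag length, m -- side information length
    """
    return n - t - m if mix_flag else n - t


bits_when_mixing = core_budget(80, 1, 9, mix_flag=True)
bits_without_mixing = core_budget(80, 1, 9, mix_flag=False)
```

With these values the core encoder gets 70 bits per frame when side information is carried and 79 bits when it is not.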
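Figure 7 shows a multipoint control unit according to an embodiment of the present invention, comprising:
a selection unit 701, configured to receive audio coded streams from M terminals, determine from the value of the mixing flag in each stream whether that stream needs to be mixed, and select N audio coded streams from the M' audio coded streams that need to be mixed according to their dynamic side information;
a mixing unit 702, configured to mix the core coded data of the N audio coded streams selected by the selection unit to obtain M' mixed audio streams;
a sending unit 703, configured to send the audio streams from the mixing unit to the corresponding destination terminals.
The selection unit 701 sends audio coded streams that do not need mixing to the sending unit 703, and the sending unit 703 sends the audio coded streams from the selection unit to the corresponding destination terminals.
The multipoint control unit further comprises: a decoder 704, configured to decode the core coded data of the audio coded streams selected by the selection unit 701 and send the decoded data to the mixing unit 702;
an encoder 705, configured to encode the mixed audio streams from the mixing unit 702 and send the resulting audio coded streams to the sending unit 703.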
In the solution of the embodiments of the present invention, the mixing flag is set in the coded stream and the corresponding dynamic side information is added, and bits are allocated to the side information dynamically according to the mixing flag. The MCU selects the audio coded streams to be mixed according to the mixing flag and the dynamic side information, which can solve the problems of signal overflow and of errors introduced when mixing large signals, and reduces the computational complexity of the MCU. When no mixing is performed, the bit allocation of the stream can be fully used to improve the core coding quality. The solution can be used in mixing systems and can also be applied to the codecs of common coding and decoding systems, which facilitates intelligent control of the coded stream and enhances the interactivity of the MCU.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (13)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2007101813767A CN101414463B (en) | 2007-10-19 | 2007-10-19 | A kind of sound mixing coding method, device and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2007101813767A CN101414463B (en) | 2007-10-19 | 2007-10-19 | A kind of sound mixing coding method, device and system |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201110205093A Division CN102324235A (en) | 2007-10-19 | 2007-10-19 | Sound mixing encoding method, device and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101414463A CN101414463A (en) | 2009-04-22 |
| CN101414463B true CN101414463B (en) | 2011-08-10 |
Family
ID=40594963
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2007101813767A Expired - Fee Related CN101414463B (en) | 2007-10-19 | 2007-10-19 | A kind of sound mixing coding method, device and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101414463B (en) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102045461B (en) * | 2009-10-09 | 2013-07-24 | 杭州华三通信技术有限公司 | Sound mixing method and device for voice signal |
| CN102222503B (en) * | 2010-04-14 | 2013-08-28 | 华为终端有限公司 | Mixed sound processing method, device and system of audio signal |
| CN103151046B (en) * | 2012-10-30 | 2015-12-09 | 贵阳朗玛信息技术股份有限公司 | Voice server and method of speech processing thereof |
| CN108766448B (en) * | 2018-06-19 | 2020-05-01 | 苏州科达科技股份有限公司 | Mixing testing system, method, device and storage medium |
| CN109901811B (en) * | 2019-02-26 | 2022-09-06 | 北京华夏电通科技股份有限公司 | Sound mixing method and device applied to digital court trial |
| CN110070878B (en) * | 2019-03-26 | 2021-05-04 | 苏州科达科技股份有限公司 | Decoding method of audio code stream and electronic equipment |
| CN111741177B (en) * | 2020-06-12 | 2021-07-27 | 浙江齐聚科技有限公司 | Audio mixing method, device, equipment and medium for online conference |
| CN111951813A (en) * | 2020-07-20 | 2020-11-17 | 腾讯科技(深圳)有限公司 | Voice coding control method, device and storage medium |
| CN112951252B (en) * | 2021-05-13 | 2021-08-03 | 北京百瑞互联技术有限公司 | LC3 audio code stream sound mixing method, device, medium and equipment |
| CN114937456A (en) * | 2022-04-24 | 2022-08-23 | 海宁奕斯伟集成电路设计有限公司 | External playing device, method, program and system |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09305464A (en) | 1996-05-17 | 1997-11-28 | Olympus Optical Co Ltd | Audio information recording and reproducing device |
| EP1724774A2 (en) * | 2005-05-20 | 2006-11-22 | Enter Tech Co., Ltd | Digital audio player and playing method thereof |
| WO2007091842A1 (en) * | 2006-02-07 | 2007-08-16 | Lg Electronics Inc. | Apparatus and method for encoding/decoding signal |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09305464A (en) | 1996-05-17 | 1997-11-28 | Olympus Optical Co Ltd | Audio information recording and reproducing device |
| EP1724774A2 (en) * | 2005-05-20 | 2006-11-22 | Enter Tech Co., Ltd | Digital audio player and playing method thereof |
| CN1866393A (en) * | 2005-05-20 | 2006-11-22 | 株式会社Enter技术 | Digital audio player and playing method thereof |
| WO2007091842A1 (en) * | 2006-02-07 | 2007-08-16 | Lg Electronics Inc. | Apparatus and method for encoding/decoding signal |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101414463A (en) | 2009-04-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101414463B (en) | A kind of sound mixing coding method, device and system | |
| CN1954367B (en) | Support for switching between audio encoder modes | |
| KR20200050940A (en) | Method and apparatus for frame erasure concealment for a multi-rate speech and audio codec | |
| EP2786552B1 (en) | Method to select active channels in audio mixing for multi-party teleconferencing | |
| EP2359365B1 (en) | Apparatus and method for encoding at least one parameter associated with a signal source | |
| WO2007140724A1 (en) | A method and apparatus for transmitting and receiving background noise and a silence compressing system | |
| KR20030076646A (en) | Method and apparatus for interoperability between voice transmission systems during speech inactivity | |
| CN101414462A (en) | Audio encoding method and multi-point audio signal mixing control method and corresponding equipment | |
| WO2008148321A1 (en) | An encoding or decoding apparatus and method for background noise, and a communication device using the same | |
| JP2010170142A (en) | Method and device for generating bit rate scalable audio data stream | |
| CN101488344A (en) | Quantitative noise leakage control method and apparatus | |
| CN103413553A (en) | Audio coding method, audio decoding method, coding terminal, decoding terminal and system | |
| CN102915736B (en) | Mixed audio processing method and stereo process system | |
| CN103915097B (en) | Voice signal processing method, device and system | |
| CN103680509A (en) | Method for discontinuous transmission of voice signals and generation of background noise | |
| TW561451B (en) | Audio mixing method and its device | |
| CN102324235A (en) | Sound mixing encoding method, device and system | |
| Hiwasaki et al. | A G. 711 embedded wideband speech coding for VoIP conferences | |
| WO2008049311A1 (en) | A method, system and apparatus for transmitting the encoded code stream of the background noise | |
| US7536298B2 (en) | Method of comfort noise generation for speech communication | |
| CN101950562A (en) | Hierarchical coding method and system based on audio attention | |
| CN112995425A (en) | Equal loudness sound mixing method and device | |
| JP4437011B2 (en) | Speech encoding device | |
| CN101399041A (en) | Encoding/decoding method and device for noise background | |
| CN1845573A (en) | Simultaneous interpretation video conference system and method for supporting high capacity mixed sound |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110810 Termination date: 20161019 |

