CN115966216A

CN115966216A - Audio stream processing method and device

Info

Publication number: CN115966216A
Application number: CN202211648646.1A
Authority: CN
Inventors: 于雷; 张皓羽; 何钧
Original assignee: Shanghai Bilibili Technology Co Ltd
Current assignee: Shanghai Bilibili Technology Co Ltd
Priority date: 2022-12-21
Filing date: 2022-12-21
Publication date: 2023-04-14

Abstract

The present application provides an audio stream processing method and device. The audio stream processing method includes: obtaining an audio stream to be processed, and identifying the encoding format of the audio stream to be processed; when the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtaining a set audio loudness parameter; transcoding the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter; and determining, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed. By using the obtained set audio loudness parameter to transcode the audio stream to be processed into an audio stream in a stereo encoding format, and then obtaining the target video stream, the method effectively solves the problem of excessively low volume that arises when panoramic sound is converted to stereo. Audio quality is improved while the audio processing procedure is simplified and processing efficiency is increased, and the method remains compatible with the normal production of stereo audio.

Description

Audio stream processing method and device

Technical field

The present application relates to the field of computer technology, and in particular to an audio stream processing method. The present application also relates to an audio stream processing device, a computing device, and a computer-readable storage medium.

Background

With the rapid development of computer technology, signal processing technology has also advanced rapidly, audio processing technology most notably. In current on-demand streaming media, the most widely used audio types are stereo audio and panoramic sound audio. To make audio playable on all devices, audio is produced in both panoramic sound and stereo specifications.

In the prior art, panoramic sound audio is usually transcoded directly into two audio streams, one in a panoramic sound encoding format and one in a stereo encoding format, which are provided to different hardware devices for playback. However, because panoramic sound audio is mixed very differently from stereo audio, transcoding panoramic sound audio into stereo audio lowers the volume and degrades the audio quality, leaving the stereo audio almost unplayable. An effective solution to this problem is therefore urgently needed.

Summary of the invention

In view of this, the embodiments of the present application provide an audio stream processing method. The present application also relates to an audio stream processing device, a computing device, and a computer-readable storage medium, so as to overcome the technical defect in the prior art that audio quality is low after transcoding.

According to a first aspect of the embodiments of the present application, an audio stream processing method is provided, including:

Obtaining an audio stream to be processed, and identifying the encoding format of the audio stream to be processed;

When the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtaining a set audio loudness parameter;

Transcoding the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter;

Determining, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

Optionally, transcoding the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter includes:

Decoding the audio stream to be processed to obtain audio sample data;

Encoding the audio sample data according to the set audio loudness parameter and a set stereo encoding strategy, to obtain the audio stream in the stereo encoding format.

Optionally, encoding the audio sample data according to the set audio loudness parameter and the set stereo encoding strategy to obtain the audio stream in the stereo encoding format includes:

Encoding the audio sample data according to the set stereo encoding strategy, and adjusting the sound level of the audio sample data within a set duration according to the set audio loudness parameter, to obtain the audio stream in the stereo encoding format.

Optionally, the audio stream to be processed includes a plurality of audio data packets;

Decoding the audio stream to be processed to obtain audio sample data includes:

Decoding each of the audio data packets to obtain sub-audio sample data for each audio data packet;

Encoding the audio sample data according to the set audio loudness parameter and the set stereo encoding strategy to obtain the audio stream in the stereo encoding format includes:

For each piece of sub-audio sample data, encoding the sub-audio sample data according to the set audio loudness parameter and the set stereo encoding strategy, to obtain a sub-audio stream in the stereo encoding format;

Splicing the sub-audio streams in the stereo encoding format to obtain the audio stream in the stereo encoding format.

Optionally, obtaining the audio stream to be processed includes:

Obtaining a video stream to be processed, wherein the video stream to be processed includes the audio stream to be processed and an image sequence to be processed;

After obtaining the multimedia stream to be processed, the method further includes:

Transcoding the image sequence to be processed to obtain a target image sequence;

Aligning the target audio stream with the target image sequence to determine a target video stream.

Optionally, before obtaining the set audio loudness parameter, the method further includes:

Obtaining audio track information of the audio stream to be processed;

When the audio track information indicates a single audio track, identifying the encoding format of the audio stream to be processed.

Optionally, after obtaining the audio track information of the audio stream to be processed, the method further includes:

When the audio track information indicates dual audio tracks, identifying the encoding format of the audio stream to be processed corresponding to each audio track;

Performing file format transcoding on the audio stream to be processed whose encoding format is the stereo encoding format, to obtain a target audio stream in the stereo encoding format; and/or performing file format transcoding on the audio stream to be processed whose encoding format is the panoramic sound encoding format, to obtain a target audio stream in the panoramic sound encoding format.
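The single-track and dual-track branches described above can be sketched as a small planning function. This is only an illustration: the names `Track`, `plan_jobs`, the encoding labels, and the action strings are hypothetical, and the handling of a single track that is already stereo is an assumption, since this passage does not specify it.

```python
from dataclasses import dataclass

# Hypothetical labels for the two encoding families discussed in the text.
PANORAMIC = "panoramic"  # e.g. E-AC-3 JOC
STEREO = "stereo"        # e.g. AAC-LC


@dataclass
class Track:
    index: int
    encoding: str  # PANORAMIC or STEREO


def plan_jobs(tracks):
    """Return (track index, action) pairs for one input, following the optional steps."""
    if len(tracks) == 1:
        track = tracks[0]
        if track.encoding == PANORAMIC:
            # Single panoramic track: derive stereo using the set loudness parameter.
            return [(track.index, "transcode_to_stereo_with_loudness")]
        # Assumption: a single stereo track only needs file format transcoding.
        return [(track.index, "file_format_transcode")]
    # Dual tracks: each track keeps its encoding; only the file format changes.
    return [(t.index, "file_format_transcode") for t in tracks]
```

For example, `plan_jobs([Track(0, STEREO), Track(1, PANORAMIC)])` schedules file format transcoding for both tracks and no loudness-corrected downmix.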

According to a second aspect of the embodiments of the present application, an audio stream processing device is provided, including:

A first identification module, configured to obtain an audio stream to be processed and identify the encoding format of the audio stream to be processed;

An acquisition module, configured to obtain a set audio loudness parameter when the encoding format of the audio stream to be processed is a panoramic sound encoding format;

A first transcoding module, configured to transcode the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter;

A determining module, configured to determine, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

According to a third aspect of the embodiments of the present application, a computing device is provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the audio stream processing method when executing the computer instructions.

According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, which stores computer instructions that, when executed by a processor, implement the steps of the audio stream processing method.

The audio stream processing method provided in the present application obtains an audio stream to be processed and identifies the encoding format of the audio stream to be processed; when the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtains a set audio loudness parameter; transcodes the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed. By using the obtained set audio loudness parameter to transcode the audio stream to be processed into an audio stream in a stereo encoding format, and then obtaining the target video stream, the method effectively solves the problem of excessively low volume that arises when panoramic sound is converted to stereo. Audio quality is improved while the audio processing procedure is simplified and processing efficiency is increased, and the method remains compatible with the normal production of stereo audio.

Description of the drawings

FIG. 1 is a processing flowchart of an audio stream processing method provided by the prior art;

FIG. 2 is a schematic structural diagram of an audio stream processing system provided according to an embodiment of the present application;

FIG. 3 is a flowchart of an audio stream processing method provided by an embodiment of the present application;

FIG. 4A is a processing flowchart of an audio stream processing method provided by an embodiment of the present application;

FIG. 4B is a processing flowchart of an audio stream processing method applied to a film scenario, provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an audio stream processing device provided by an embodiment of the present application;

FIG. 6 is a structural block diagram of a computing device provided by an embodiment of the present application.

Detailed description

Many specific details are set forth in the following description to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the substance of the present application. The present application is therefore not limited by the specific implementations disclosed below.

The terms used in one or more embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a", "said", and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of the present application refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, and so on may be used in one or more embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present application, a first may also be referred to as a second, and similarly, a second may also be referred to as a first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".

First, the terms involved in one or more embodiments of the present application are explained.

Stereo: sound reproduced through two or more independent audio channels on a pair of symmetrically arranged loudspeakers. Sound produced in this way remains natural and pleasant from different directions. Stereo with more than two channels is also called surround sound, for example the common 5.1 and 7.1 surround sound.

Panoramic sound: also known as Dolby Atmos, a three-dimensional surround sound technology. It extends surround sound by adding overhead (height) channels, can render content for up to 64 independent loudspeakers, and can carry up to 128 channels or objects simultaneously, making it more detailed than 7.1 surround sound. Dolby Atmos is one implementation of spatial audio.

LUFS (Loudness Units relative to Full Scale): loudness units measured relative to full scale, where full scale is the maximum level the system can handle. It is a standardized measure of loudness that takes both human perception and electrical signal strength into account.
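As background for the LUFS definition, loudness in these units is commonly computed with the formula of ITU-R BS.1770. The present application does not state which measurement it relies on, so the formula below is offered only as context; the symbols $y_i$, $z_i$, $G_i$, and $T$ are introduced here and do not appear in the original text, and the gating stage of the full integrated measurement is omitted.

```latex
z_i = \frac{1}{T}\int_{0}^{T} y_i^{2}(t)\,dt,
\qquad
L_K = -0.691 + 10\log_{10}\!\Bigl(\sum_{i} G_i\, z_i\Bigr)\ \text{LUFS}
```

Here $y_i$ is the K-weighted signal of channel $i$, $z_i$ is its mean square over the measurement interval $T$, and $G_i$ is the channel weight (1.0 for the left, right, and center channels, 1.41 for the surround channels).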

With the rapid development of computer technology, signal processing technology has also advanced rapidly, audio processing technology most notably. In current on-demand streaming media, the most widely used audio types are stereo audio and panoramic sound audio. Traditional stereo presents sound on a horizontal plane, so that sound localization has two dimensions, front-back and left-right; it can be called two-dimensional (2D) audio. When audio has an up-down dimension in addition to the front-back and left-right dimensions, it can be called three-dimensional (3D) audio, that is, spatial audio.

In current on-demand streaming media, the most widely used audio is audio in a stereo encoding format; because it is simple and highly compatible, many Internet companies have adopted it. With the development of hardware technology, more and more user devices support spatial audio playback. Therefore, to make audio playable on all devices, audio is produced in both panoramic sound and stereo specifications.

In the prior art, panoramic sound audio is usually transcoded directly into two audio streams, one in a panoramic sound encoding format and one in a stereo encoding format, which are provided to different hardware devices for playback. Referring to FIG. 1, which shows a schematic flowchart of an audio stream processing method provided by the prior art: the producer of a video or audio, that is, of a media resource, generates a panoramic sound media resource, which is then transcoded directly to obtain a panoramic sound target media resource and a stereo target media resource for listeners to play on their playback devices.

However, panoramic sound audio is mixed very differently from stereo audio. Ordinary stereo generally has only two fixed channels, so during playback sound comes only from sources on the left and right. The sound sources of panoramic sound include not only the left and right but also the front, rear, above, and below, forming 360-degree surround sound. Consequently, after panoramic sound audio is transcoded into stereo audio, the volume drops and the audio quality degrades, leaving the stereo audio almost unplayable.

Therefore, this specification provides an audio stream processing method that obtains an audio stream to be processed and identifies the encoding format of the audio stream to be processed; when the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtains a set audio loudness parameter; transcodes the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed. By using the obtained set audio loudness parameter to transcode the audio stream to be processed into an audio stream in a stereo encoding format, and then obtaining the target video stream, the method effectively solves the problem of excessively low volume that arises when panoramic sound is converted to stereo. Audio quality is improved while the audio processing procedure is simplified and processing efficiency is increased, and the method remains compatible with the normal production of stereo audio.

The present application provides an audio stream processing method, and also relates to an audio stream processing device, a computing device, and a computer-readable storage medium, each of which is described in detail in the following embodiments.

The audio stream processing method provided in the embodiments of the present application may be executed by a terminal, by a server, or by a terminal and a server working together, which is not limited in the embodiments of the present application. The terminal may be any electronic product capable of human-computer interaction with a user, such as a PC (personal computer), a mobile phone, a pocket PC (PPC), or a tablet computer. The server may be a single server, a server cluster composed of multiple servers, or a cloud computing service center, which is not limited in the embodiments of the present application.

Taking the case where a terminal and a server work together as an example, refer to FIG. 2, which is a schematic structural diagram of an audio stream processing system provided according to an embodiment of the present application:

The terminal uploads the audio stream to be processed to the server.

The server identifies the encoding format of the audio stream to be processed; when the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtains a set audio loudness parameter; transcodes the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

Correspondingly, the server may also send the target audio stream to the listener's playback device for playback.

With the solution of this embodiment, the obtained set audio loudness parameter is used to transcode the audio stream to be processed into an audio stream in a stereo encoding format, and the target video stream is then obtained. This effectively solves the problem of excessively low volume that arises when panoramic sound is converted to stereo; audio quality is improved while the audio processing procedure is simplified and processing efficiency is increased, and the solution remains compatible with the normal production of stereo audio.

FIG. 3 shows a flowchart of an audio stream processing method provided according to an embodiment of the present application, which specifically includes the following steps:

Step 302: Obtain an audio stream to be processed, and identify the encoding format of the audio stream to be processed.

Specifically, audio refers to a file that stores sound content. An audio stream can control the output quality of audio of the "data stream" synchronization type. The audio stream to be processed is an audio stream that needs to be processed or transcoded.

In practical applications, the audio stream to be processed can be obtained in many ways. For example, a user may send the execution subject an instruction to obtain the audio stream to be processed, or an audio stream processing instruction, and the execution subject starts obtaining the audio stream to be processed after receiving that instruction. Alternatively, the execution subject may obtain the audio stream to be processed automatically at preset intervals; for example, after each preset interval, a server with an audio stream processing function automatically obtains the audio streams to be processed in a designated storage area. This specification places no limitation on the way the audio stream to be processed is obtained.

After the audio stream to be processed is obtained, its encoding format can be identified from the data attribute information of the audio stream, or a media file information viewing tool can be used to inspect the audio stream to be processed and identify its encoding format.

For example, the mediainfo command can be invoked to identify the encoding format of the audio stream to be processed. MediaInfo is a practical media parameter inspection tool: besides analyzing and querying the encoding of video, it can also inspect the encoding and other information of audio files.
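A minimal sketch of this identification step, assuming the MediaInfo command-line tool is installed and invoked with its JSON output option. The helper names `probe`, `audio_formats`, and `is_panoramic` are hypothetical, and the rule used to recognize panoramic sound (format `E-AC-3` with the `JOC` additional feature) is an assumption about how MediaInfo reports such streams, not something stated in the present application.

```python
import json
import subprocess


def probe(path):
    """Run the mediainfo CLI on a file and return its JSON report as a dict."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def audio_formats(report):
    """List (Format, Format_AdditionalFeatures) for every audio track in a report."""
    tracks = (report.get("media") or {}).get("track", [])
    return [
        (t.get("Format", ""), t.get("Format_AdditionalFeatures", ""))
        for t in tracks
        if t.get("@type") == "Audio"
    ]


def is_panoramic(fmt, features):
    # Assumption: E-AC-3 carrying the JOC extension marks a panoramic sound stream.
    return fmt == "E-AC-3" and "JOC" in features
```

Keeping the parsing separate from the subprocess call means the classification rule can be checked against a saved report without running the tool; `len(audio_formats(report))` also gives the number of audio tracks.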

Step 304: When the encoding format of the audio stream to be processed is a panoramic sound encoding format, obtain a set audio loudness parameter.

Specifically, encoding converts text, numbers, or other objects into digital codes by a predefined method, or converts information and data into specified electrical pulse signals. The panoramic sound encoding format is an encoding format corresponding to panoramic sound, such as the E-AC-3 JOC format. The set audio loudness parameter is a parameter used to adjust the loudness of audio data, expressed for example in LUFS (loudness units relative to full scale) or decibels.

In practical applications, if the encoding format of the audio stream to be processed is identified as a panoramic sound encoding format, the set audio loudness parameter is obtained. The set audio loudness parameter may be preset; alternatively, after the encoding format of the audio stream to be processed is identified as a panoramic sound encoding format, the panoramic sound encoding format may be shown on a display and the user may set the audio loudness parameter in real time, which yields the set audio loudness parameter.

Step 306: Transcode the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter.

Specifically, the stereo encoding format is an encoding format corresponding to stereo, such as the AAC-LC format.

In practical applications, the audio stream to be processed can be transcoded into an audio stream in a stereo encoding format according to a set transcoding strategy and based on the set audio loudness parameter.

In one or more optional embodiments of this specification, the audio stream to be processed may be decoded and then encoded based on the set audio loudness parameter, so as to obtain the audio stream in the stereo encoding format. That is, transcoding the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter may be implemented as follows:

Decoding the audio stream to be processed to obtain audio sample data;

Encoding the audio sample data according to the set audio loudness parameter and a set stereo encoding strategy, to obtain the audio stream in the stereo encoding format.

Specifically, audio sample data is audio data in PCM (Pulse Code Modulation) format, the format closest to the original audio, also called raw audio. The set stereo encoding strategy is the encoding method or scheme used to convert panoramic sound into stereo.

In practical applications, a decoder can first be used to decode the audio stream to be processed to obtain the original audio data, that is, the audio sample data. An encoder is then used to encode the audio sample data according to the set audio loudness parameter and the set stereo encoding strategy, yielding an audio stream in the stereo encoding format whose loudness differs from the set audio loudness parameter by no more than a set range. This improves audio transcoding efficiency and the quality of the audio stream in the stereo encoding format, thereby improving user satisfaction.
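One way to picture the decode, loudness-adjust, and encode chain is as a single FFmpeg invocation. The present application does not name a transcoding tool, so FFmpeg, the function name `build_transcode_command`, and the default target of -16 LUFS are all assumptions made for illustration. The sketch only builds the argument list and does not run it.

```python
def build_transcode_command(src, dst, target_lufs=-16.0):
    """Build an ffmpeg argument list that downmixes to stereo at a target loudness.

    Assumption: ffmpeg decodes the input to PCM, the filter chain first converts
    the channel layout to stereo and then normalizes loudness with `loudnorm`,
    and the result is encoded with ffmpeg's AAC encoder.
    """
    filters = (
        "aformat=channel_layouts=stereo,"            # downmix before measuring loudness
        f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11"   # I is the integrated loudness target
    )
    return [
        "ffmpeg", "-i", src,
        "-vn",              # audio only
        "-af", filters,
        "-ar", "48000",     # loudnorm resamples internally; pin the output rate
        "-c:a", "aac",
        dst,
    ]
```

The downmix is placed ahead of `loudnorm` in the filter chain so that the loudness target applies to the stereo signal itself; normalizing the multichannel signal and downmixing afterwards could change the level again.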

In one or more optional embodiments of this specification, the loudness of the audio stream may be adjusted based on the set audio loudness parameter while encoding. That is, encoding the audio sample data according to the set audio loudness parameter and the set stereo encoding strategy to obtain the audio stream in the stereo encoding format may be implemented as follows:

Encoding the audio sample data according to the set stereo encoding strategy, and adjusting the sound level of the audio sample data within a set duration according to the set audio loudness parameter, to obtain the audio stream in the stereo encoding format.

Specifically, the sound level within the set duration is the average level of the sound over a period of time.

In practical applications, the encoder is configured with the set stereo encoding strategy, and the configured encoder then encodes the audio sample data. At the same time, the set audio loudness parameter is input to the encoder so that the encoder adjusts the sound level of the audio sample data within the set duration, that is, the loudness level, yielding the audio stream in the stereo encoding format. This improves the accuracy and quality of the transcoded audio stream in the stereo encoding format, and in turn the quality of the target audio stream.
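The idea of adjusting the average sound level over a set duration can be illustrated on raw PCM samples. The sketch below uses a plain root-mean-square level in dB relative to full scale as a stand-in for a perceptual loudness measure; a real LUFS measurement also applies frequency weighting and gating, which are omitted here. The function names and the float sample representation are assumptions.

```python
import math


def mean_level_db(samples):
    """Mean (RMS) level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def adjust_to_target(samples, target_db):
    """Scale one window of samples so that its mean level matches target_db."""
    level = mean_level_db(samples)
    if math.isinf(level):
        return list(samples)  # silence: there is nothing to scale
    gain = 10.0 ** ((target_db - level) / 20.0)
    # Clip to the valid range so that a large gain cannot overflow full scale.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Applying `adjust_to_target` to each window of the set duration raises a quiet downmix toward the target level; a production implementation would typically smooth the gain between windows to avoid audible steps.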

In one or more optional embodiments of this specification, the audio stream to be processed comprises a plurality of audio data packets, and each audio data packet is decoded and encoded separately. That is, decoding the audio stream to be processed to obtain audio sample data may be implemented as follows:

Correspondingly, encoding the audio sample data according to the set audio loudness parameter and the set stereo encoding strategy to obtain an audio stream in the stereo encoding format comprises:

Specifically, an audio data packet (audio packet) is the unit from which an audio stream is built: every audio stream consists of N audio packets, and encoding and decoding can operate on individual audio packets.

In practice, for each audio data packet in the audio stream to be processed, a decoder first decodes the current packet to obtain the raw sub-audio data, that is, the sub-audio sample data. An encoder then encodes the sub-audio sample data according to the set audio loudness parameter and the set stereo encoding strategy, producing a sub-audio stream in the stereo encoding format whose loudness differs from the set audio loudness parameter by no more than a set range. After all audio data packets have been traversed, the sub-audio streams in the stereo encoding format corresponding to the packets are spliced to obtain the audio stream in the stereo encoding format. This further improves transcoding efficiency and the quality of the audio stream in the stereo encoding format, and in turn user satisfaction.

Note that the audio information in each audio packet, such as loudness and bit rate, differs. During transcoding, each audio packet is processed individually with fixed encoding parameters, namely the set loudness parameter. The encoded audio packets still differ in their actual data, because each is processed according to the stream information of the packet itself.

Optionally, the step of encoding each piece of sub-audio sample data according to the set audio loudness parameter and the set stereo encoding strategy to obtain a sub-audio stream in the stereo encoding format, and splicing the sub-audio streams to obtain the audio stream in the stereo encoding format, may instead be: for each piece of sub-audio sample data, encode it according to the set stereo encoding strategy and adjust its sound level within the set duration according to the set audio loudness parameter, obtaining a sub-audio stream in the stereo encoding format; then splice the sub-audio streams to obtain the audio stream in the stereo encoding format.
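The per-packet flow above (decode each packet, encode it with the fixed loudness parameter, splice the results) can be sketched as follows. The `decode` and `encode` callables are placeholders supplied by the caller, not a real codec API, and `transcode_packets` is a hypothetical name chosen for illustration.

```python
from typing import Callable, Iterable, List


def transcode_packets(
    packets: Iterable[bytes],
    decode: Callable[[bytes], List[float]],
    encode: Callable[[List[float], float], bytes],
    target_loudness: float = -18.0,
) -> bytes:
    """Decode, re-encode and splice every packet of a stream, in order."""
    sub_streams = []
    for packet in packets:
        # Each packet yields its own sub-audio sample data.
        samples = decode(packet)
        # The same fixed loudness parameter is used for every packet.
        sub_streams.append(encode(samples, target_loudness))
    # Splicing: concatenate the encoded sub-streams in packet order.
    return b"".join(sub_streams)
```

Passing the codec functions in as arguments keeps the sketch independent of any particular decoder or encoder.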

In one implementable embodiment of this specification, once the audio stream to be processed has been acquired, its encoding format can be identified directly, which simplifies the processing flow and improves processing efficiency.

In another implementable embodiment of this specification, once the audio stream to be processed has been acquired, its audio track information may be obtained first, and the encoding format then identified on the basis of that track information. That is, before the set audio loudness parameter is acquired, the method further comprises:

Specifically, an audio track is a track of sound, and each track corresponds to one audio stream. A single audio track means that the audio stream to be processed corresponds to one audio stream.

In practice, the audio track information can be identified from the data attribute information of the audio stream, or by inspecting the audio stream with a media file information viewing tool.

For example, the mediainfo command can be invoked to identify the audio track information of the audio stream to be processed.
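As an illustration of the mediainfo-based detection, the following Python sketch runs the mediainfo command-line tool with JSON output and lists the audio tracks. The JSON layout assumed here (`media.track[]` entries with `@type`, `Format` and `Format_AdditionalFeatures` keys) reflects typical mediainfo reports and should be checked against the installed version; the function names are hypothetical.

```python
import json
import subprocess


def run_mediainfo(path):
    """Run the mediainfo CLI on a file and return its JSON report as a dict."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def audio_tracks(report):
    """Return (format, additional_features) for every audio track in a report."""
    tracks = (report.get("media") or {}).get("track", [])
    return [
        (t.get("Format", ""), t.get("Format_AdditionalFeatures", ""))
        for t in tracks
        if t.get("@type") == "Audio"
    ]
```

The number of tuples returned gives the track count (single or dual), and each tuple gives the encoding format of that track.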

Identifying the track information first, and identifying the encoding format only in the single-track case, avoids the recognition errors that could arise in the dual-track case, where the audio streams to be processed have two different encoding formats, and thus improves recognition efficiency.

Optionally, the audio track information may also indicate dual audio tracks, in which case the audio stream to be processed for each track must be transcoded. That is, after the audio track information of the audio stream to be processed is obtained, the method further comprises:

Specifically, dual audio tracks mean that the audio stream to be processed corresponds to two audio streams, that is, there are two audio streams to be processed.

In practice, when the track information indicates dual audio tracks, a media file information viewing tool can be invoked to identify the encoding format of the audio stream to be processed for each track. An audio stream to be processed in the stereo encoding format is transcoded into a target audio stream in the stereo encoding format, for example by converting a stereo audio stream in the AAC file format into a stereo audio stream in the MP3 file format. An audio stream to be processed in the panoramic sound encoding format is transcoded into a target audio stream in the panoramic sound encoding format.

Because there are two tracks, the stereo stream only needs to be transcoded into a stereo target stream and the panoramic sound stream into a panoramic sound target stream. No conversion between the stereo and panoramic sound encoding formats takes place and the loudness barely changes across transcoding, so no set audio loudness parameter needs to be acquired. This preserves transcoding quality while simplifying the processing flow and improving processing efficiency.

In addition, after the encoding format of the audio stream to be processed is identified, the method further comprises: when the encoding format of the audio stream to be processed is the panoramic sound encoding format, transcoding the audio stream to be processed into a target audio stream in the panoramic sound encoding format.

Note that the process is similar whether a stereo stream is transcoded into a stereo target stream or a panoramic sound stream into a panoramic sound target stream: the audio stream to be processed in the specified encoding format is decoded to obtain audio sample data, and the audio sample data is then encoded according to the target file format to obtain the target audio stream in the specified encoding format. Here the specified encoding format is the stereo or the panoramic sound encoding format, and the target file format is a common file format, so that all kinds of playback devices can recognize and play the target audio stream.

Step 308: Determine, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

In practice, once the audio stream in the stereo encoding format has been obtained, its file format can be converted to obtain the target audio stream in the stereo encoding format.

In one or more optional embodiments of this specification, a video stream to be processed may be acquired, and the audio stream to be processed that it contains may be processed. That is, acquiring the audio stream to be processed comprises:

Correspondingly, after the multimedia stream to be processed is acquired, the method further comprises:

In practice, the object to be processed is a video stream to be processed, which contains an audio stream to be processed and an image sequence to be processed, where the image sequence to be processed is a set of images arranged in a certain order. For the audio stream to be processed, the encoding format is identified; when the encoding format is the panoramic sound encoding format, the set audio loudness parameter is acquired; the audio stream to be processed is transcoded into an audio stream in the stereo encoding format according to the set audio loudness parameter; and the target audio stream is determined from the audio stream in the stereo encoding format. The image sequence to be processed is likewise transcoded to obtain a target image sequence in a target image format, where the target image format is a common image format, that is, one that any display device can display. The target audio stream and the target image sequence are then aligned according to their timestamps to obtain the target video stream.
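The source does not specify how the timestamp alignment is carried out. One possible reading, sketched below in Python, is to interleave the audio packets and the images into a single sequence ordered by timestamp; the function name and the `(timestamp, payload)` tuple layout are assumptions made for this example.

```python
import heapq


def interleave_by_timestamp(audio_packets, video_frames):
    """Merge two timestamp-sorted sequences of (timestamp, payload) pairs.

    Returns one list of (timestamp, kind, payload) tuples in timestamp order.
    On equal timestamps, audio entries come before video entries.
    """
    tagged_audio = ((ts, "audio", payload) for ts, payload in audio_packets)
    tagged_video = ((ts, "video", payload) for ts, payload in video_frames)
    return list(heapq.merge(tagged_audio, tagged_video, key=lambda item: item[0]))
```

Both inputs must already be sorted by timestamp, which holds for streams that were transcoded in order.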

Referring to FIG. 4A, which shows a processing flowchart of an audio stream processing method provided by an embodiment of the present application:

In the video production stage, the meta-information of the original audio uploaded by the user, that is, the audio stream to be processed, is inspected: mediainfo is used to pre-check the audio stream (meta-information detection) and obtain its encoding format and the corresponding track information. An encoding format of E-AC-3 JOC (a panoramic sound encoding format) indicates that the audio stream to be processed is panoramic sound; an encoding format of AAC LC (a stereo encoding format) indicates that it is stereo. With multiple tracks, the encoding formats of the different audio streams to be processed can be obtained at the same time.

If the audio stream to be processed is detected to be a single track encoded as E-AC-3 JOC, that is, a single Dolby Atmos track, the set audio loudness parameter of -18 LUFS is acquired, and the E-AC-3 JOC stream is transcoded into an AAC LC target audio stream in a common file format with a loudness of -18 LUFS (panoramic sound -> stereo). In other words, when the original audio stream is a single-track Dolby Atmos stream, the transcoding step first applies a loudness parameter that restores the volume to a normal level. The loudness parameter actually used is a fixed value of -18 LUFS, which suits audio with a wide dynamic range: in practice, the bass is preserved and the treble is clearer. In addition, the E-AC-3 JOC stream also undergoes file format transcoding to obtain an E-AC-3 JOC target audio stream in a common file format (panoramic sound -> panoramic sound).
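The source does not name the transcoding tool. As one possible realization, the sketch below builds an ffmpeg command line that downmixes to stereo AAC while normalizing to the -18 LUFS target with ffmpeg's `loudnorm` filter. The choice of ffmpeg, the 48 kHz output rate and the function name are assumptions, not details taken from the source.

```python
def build_stereo_command(src, dst, target_lufs=-18.0):
    """Build an ffmpeg argument list for a loudness-normalized stereo AAC transcode."""
    return [
        "ffmpeg",
        "-i", src,                             # input containing the audio track
        "-vn",                                 # drop video; only audio is handled here
        "-af", f"loudnorm=I={target_lufs}",    # integrated loudness target in LUFS
        "-ac", "2",                            # downmix to two channels
        "-ar", "48000",                        # assumed output sample rate
        "-c:a", "aac",                         # encode with ffmpeg's AAC encoder
        dst,
    ]
```

The list can be passed to `subprocess.run(...)` as is; building it as a list rather than a string avoids shell quoting problems with file names.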

If the audio stream to be processed is detected to have dual tracks, with the first track encoded as E-AC-3 JOC and the second as AAC LC, that is, one Dolby Atmos track plus one stereo track, the appropriate track is selected automatically for each expected output during transcoding. The stereo track of the original audio is transcoded to produce the stereo audio, and the panoramic sound track is transcoded to produce the panoramic sound audio. That is, the stream for the first track only needs file format transcoding to obtain an E-AC-3 JOC target audio stream in a common file format (panoramic sound -> panoramic sound), and the stream for the second track only needs file format transcoding to obtain an AAC LC target audio stream in a common file format (stereo -> stereo).
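The single-track and dual-track branches of FIG. 4A can be summarized as a small decision function. This is a sketch of the routing logic only; the constant names, the function name and the tuple layout are hypothetical.

```python
ATMOS = "E-AC-3 JOC"   # panoramic sound encoding format
STEREO = "AAC LC"      # stereo encoding format


def plan_transcode(track_formats, target_loudness=-18.0):
    """Return (source_format, target_format, loudness) jobs for the given tracks.

    loudness is None when the format is kept and only the file format changes.
    """
    if len(track_formats) == 1 and track_formats[0] == ATMOS:
        # Single Atmos track: derive stereo with the set loudness parameter,
        # and also keep an Atmos output via file format transcoding.
        return [(ATMOS, STEREO, target_loudness), (ATMOS, ATMOS, None)]
    # Dual tracks (or a single stereo track): each track keeps its format,
    # so no loudness parameter is needed.
    return [(fmt, fmt, None) for fmt in track_formats]
```

Only the panoramic sound -> stereo job carries a loudness value, which matches the observation that same-format transcoding barely changes loudness.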

The audio stream processing method is further described below with reference to FIG. 4B, taking its application in a film scenario as an example. FIG. 4B shows a processing flowchart of an audio stream processing method applied to a film scenario, provided by an embodiment of the present application, which comprises the following steps:

Step 402: Acquire a film video stream to be processed, where the film video stream to be processed contains an audio stream to be processed and an image sequence to be processed.

Step 404: Transcode the image sequence to be processed to obtain a target image sequence.

Step 406: Obtain the audio track information of the audio stream to be processed.

Step 408: When the audio track information indicates a single audio track, identify the encoding format of the audio stream to be processed.

Step 410: When the encoding format of the audio stream to be processed is the panoramic sound encoding format, acquire the set audio loudness parameter.

Step 412: Decode the audio stream to be processed to obtain audio sample data.

Optionally, the audio stream to be processed comprises a plurality of audio data packets;

decoding the audio stream to be processed to obtain audio sample data comprises:

decoding each audio data packet to obtain the sub-audio sample data corresponding to each audio data packet.

Step 414: Encode the audio sample data according to the set stereo encoding strategy, and adjust the sound level of the audio sample data within the set duration according to the set audio loudness parameter, to obtain an audio stream in the stereo encoding format.

Encoding the audio sample data according to the set stereo encoding strategy, and adjusting the sound level of the audio sample data within the set duration according to the set audio loudness parameter, to obtain an audio stream in the stereo encoding format comprises:

for each piece of sub-audio sample data, encoding the sub-audio sample data according to the set stereo encoding strategy, and adjusting its sound level within the set duration according to the set audio loudness parameter, to obtain a sub-audio stream in the stereo encoding format;

splicing the sub-audio streams in the stereo encoding format to obtain the audio stream in the stereo encoding format.

Step 416: Determine, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

Step 418: When the audio track information indicates dual audio tracks, identify the encoding format of the audio stream to be processed for each track.

Step 420: Perform file format transcoding on the audio stream to be processed whose encoding format is the stereo encoding format, to obtain a target audio stream in the stereo encoding format.

Step 422: Perform file format transcoding on the audio stream to be processed whose encoding format is the panoramic sound encoding format, to obtain a target audio stream in the panoramic sound encoding format.

Step 424: Align the target audio stream with the target image sequence to determine the target film video stream.

The audio stream processing method provided by the present application transcodes the audio stream to be processed into an audio stream in the stereo encoding format using the acquired set audio loudness parameter, and from it obtains the target video stream. This effectively solves the problem of low volume that arises when panoramic sound is converted to stereo. It improves audio quality while simplifying the audio processing flow and improving processing efficiency, and it remains compatible with the normal production of stereo audio.

Corresponding to the above method embodiments, the present application also provides an embodiment of an audio stream processing device. FIG. 5 shows a schematic structural diagram of an audio stream processing device provided by an embodiment of the present application. As shown in FIG. 5, the device comprises:

a first identification module 502, configured to acquire an audio stream to be processed and identify the encoding format of the audio stream to be processed;

an acquisition module 504, configured to acquire a set audio loudness parameter when the encoding format of the audio stream to be processed is the panoramic sound encoding format;

a first transcoding module 506, configured to transcode the audio stream to be processed into an audio stream in the stereo encoding format according to the set audio loudness parameter;

a determination module 508, configured to determine, according to the audio stream in the stereo encoding format, the target audio stream obtained by transcoding the audio stream to be processed.

Optionally, the first transcoding module 506 is further configured to:

The first transcoding module 506 is further configured to:

Optionally, the first identification module 502 is further configured to:

The device further comprises a second transcoding module, configured to:

Optionally, the device further comprises a second identification module, configured to:

Optionally, the device further comprises a third identification module, configured to:

The audio stream processing device provided by the present application transcodes the audio stream to be processed into an audio stream in the stereo encoding format using the acquired set audio loudness parameter, and from it obtains the target video stream. This effectively solves the problem of low volume that arises when panoramic sound is converted to stereo. It improves audio quality while simplifying the audio processing flow and improving processing efficiency, and it remains compatible with the normal production of stereo audio.

The above is a schematic solution of an audio stream processing device according to this embodiment. Note that the technical solution of the audio stream processing device is based on the same concept as that of the audio stream processing method described above; for details not described for the device, refer to the description of the method.

FIG. 6 shows a structural block diagram of a computing device provided according to an embodiment of the present application. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is connected to the memory 610 through a bus 630, and a database 650 is used to store data.

The computing device 600 also includes an access device 640 that enables the computing device 600 to communicate via one or more networks 660. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more network interfaces of any type, wired or wireless (for example, a network interface controller (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.

In an embodiment of the present application, the above components of the computing device 600 and other components not shown in FIG. 6 may also be connected to each other, for example through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 6 is for illustration only and does not limit the scope of the present application. Those skilled in the art may add or replace components as needed.

The computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses), or another type of mobile device, or a stationary computing device such as a desktop computer or PC. The computing device 600 may also be a mobile or stationary server.

The processor 620 implements the steps of the audio stream processing method when executing the computer instructions.

The above is a schematic solution of a computing device according to this embodiment. Note that the technical solution of the computing device is based on the same concept as that of the audio stream processing method described above; for details not described for the computing device, refer to the description of the method.

An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the audio stream processing method described above.

The above is a schematic solution of a computer-readable storage medium according to this embodiment. Note that the technical solution of the storage medium is based on the same concept as that of the audio stream processing method described above; for details not described for the storage medium, refer to the description of the method.

Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.

Note that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the order of actions described, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the present application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

The preferred embodiments of the present application disclosed above are intended only to help explain the present application. The optional embodiments do not set out every detail, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations are possible in light of the content of the present application. These embodiments were selected and described in detail to better explain the principles and practical applications of the present application, so that those skilled in the art can understand and use it well. The present application is limited only by the claims and their full scope and equivalents.

Claims

1. An audio stream processing method, comprising:

acquiring an audio stream to be processed, and identifying the coding format of the audio stream to be processed;

acquiring a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format;

transcoding the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;

and determining, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed.

2. The method of claim 1, wherein the transcoding the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter comprises:

decoding the audio stream to be processed to obtain audio sampling data;

and encoding the audio sampling data according to the set audio loudness parameter and a set stereo coding strategy to obtain the audio stream in the stereo coding format.

3. The method according to claim 2, wherein the encoding the audio sampling data according to the set audio loudness parameter and the set stereo coding strategy to obtain the audio stream in the stereo coding format comprises:

encoding the audio sampling data according to the set stereo coding strategy, and adjusting the sound level of the audio sampling data within a set time length according to the set audio loudness parameter, to obtain the audio stream in the stereo coding format.

4. The method according to claim 2 or 3, wherein the audio stream to be processed comprises a plurality of audio data packets;

the decoding the audio stream to be processed to obtain audio sampling data comprises:

decoding each audio data packet to obtain sub-audio sampling data corresponding to each audio data packet;

the encoding the audio sampling data according to the set audio loudness parameter and the set stereo coding strategy to obtain the audio stream in the stereo coding format comprises:

for each piece of sub-audio sampling data, encoding the sub-audio sampling data according to the set audio loudness parameter and the set stereo coding strategy to obtain a sub-audio stream in the stereo coding format;

and splicing the sub-audio streams in the stereo coding format to obtain the audio stream in the stereo coding format.

5. The method of claim 1, wherein the acquiring an audio stream to be processed comprises:

acquiring a video stream to be processed, wherein the video stream to be processed comprises an audio stream to be processed and an image sequence to be processed;

after the acquiring of the video stream to be processed, the method further comprises:

transcoding the image sequence to be processed to obtain a target image sequence;

and aligning the target audio stream and the target image sequence to determine a target video stream.

6. The method of claim 1, wherein, before the acquiring a set audio loudness parameter, the method further comprises:

acquiring audio track information of the audio stream to be processed;

and identifying the coding format of the audio stream to be processed in a case where the audio track information indicates a single audio track.

7. The method according to claim 6, wherein, after the acquiring audio track information of the audio stream to be processed, the method further comprises:

in a case where the audio track information indicates two audio tracks, identifying the coding format of the audio stream to be processed corresponding to each audio track;

and/or performing file format transcoding on the audio stream to be processed whose coding format is the panoramic sound coding format, to obtain a target audio stream in the panoramic sound coding format.

8. An audio stream processing apparatus, comprising:

a first identification module configured to acquire an audio stream to be processed and identify the coding format of the audio stream to be processed;

an acquisition module configured to acquire a set audio loudness parameter in a case where the coding format of the audio stream to be processed is a panoramic sound coding format;

a first transcoding module configured to transcode the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;

and a determining module configured to determine, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed.

9. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 when executing the computer instructions.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
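
By way of illustration only, and not as part of the claims, the flow recited in claims 1 to 4 may be sketched as follows. The sketch is a simplified assumption rather than the applicant's implementation: the function names `downmix`, `apply_loudness` and `transcode`, the format label `"panoramic"`, the default target of -16 dBFS, and the even/odd channel downmix are all hypothetical, and a plain RMS level stands in for a real loudness measurement. Each already-decoded packet is treated as one "set time length" window, reduced to stereo, adjusted toward the set loudness parameter, and then spliced into the output.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, ...]          # one sample per input channel, in [-1.0, 1.0]
StereoFrame = Tuple[float, float]  # (left, right)


def downmix(frames: List[Frame]) -> List[StereoFrame]:
    """Toy stereo strategy: even-indexed channels feed left, odd-indexed feed right."""
    out = []
    for frame in frames:
        left = frame[0::2]
        right = frame[1::2] or left  # mono input: duplicate into both sides
        out.append((sum(left) / len(left), sum(right) / len(right)))
    return out


def apply_loudness(frames: List[StereoFrame], target_dbfs: float) -> List[StereoFrame]:
    """Scale one window so its RMS level moves to target_dbfs, capped to avoid clipping."""
    samples = [s for frame in frames for s in frame]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    if rms == 0.0:
        return list(frames)  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / rms
    peak = max(abs(s) for s in samples)
    gain = min(gain, 1.0 / peak)  # keep every sample within [-1.0, 1.0]
    return [(left * gain, right * gain) for left, right in frames]


def transcode(packets: List[List[Frame]], coding_format: str,
              target_dbfs: float = -16.0) -> List[StereoFrame]:
    """Return the target stream as one spliced list of stereo frames."""
    if coding_format != "panoramic":
        # Assumed normal path for other formats: no loudness step in this sketch.
        return [frame for packet in packets for frame in downmix(packet)]
    stereo: List[StereoFrame] = []
    for packet in packets:  # per-packet processing followed by splicing
        stereo.extend(apply_loudness(downmix(packet), target_dbfs))
    return stereo
```

In this sketch the loudness step runs only when the identified format is the panoramic one, which mirrors the condition in claim 1; a production system would decode and encode real codec packets and would typically measure loudness with a perceptual model rather than RMS.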