CN114363792A - Transmission audio track format serial metadata generation method, device, equipment and medium

Info

Publication number: CN114363792A
Application number: CN202111425590.9A
Authority: CN
Inventors: 吴健
Original assignee: Saiyinxin Micro Beijing Electronic Technology Co ltd
Current assignee: Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date: 2021-11-26
Filing date: 2021-11-26
Publication date: 2022-04-15

Abstract

The present disclosure relates to a transmission audio track format serial metadata generation method, apparatus, device, and medium. The method comprises: acquiring attributes and sub-elements of the transmission audio track format serial metadata; and generating the transmission audio track format serial metadata according to the attributes and the sub-elements; wherein the attributes describe interface information for the transmitted audio signal, and the sub-elements describe the transmission audio track and the format type of the transmitted audio signal. Model elements in an audio model are converted into corresponding serial audio metadata, so that an existing audio file can be sliced into frames for real-time production and streaming audio applications. The transmission audio track format serial metadata describes the relationship between a physical audio track and an audio track unique identification element in the serial audio metadata, so that the frames can be transmitted in real time through a transmission interface.

Description

Transmission audio track format serial metadata generation method, device, equipment and medium

Technical Field

The present disclosure relates to the field of audio processing technologies, and in particular, to a method, an apparatus, a device, and a medium for generating serial metadata in a transmission audio track format.

Background

With the development of technology, audio has become increasingly complex. Early single-channel (mono) audio gave way to stereo, and the focus of the work shifted to handling the left and right channels correctly. Processing became more complex once surround sound appeared. The surround 5.1 speaker system imposes an ordering constraint on multiple channels, and the surround 6.1 speaker system, the surround 7.1 speaker system and the like further diversify audio processing: the correct signal must be delivered to the proper speaker so that the speakers work together to form the intended effect. Thus, as sound becomes more immersive and interactive, the complexity of audio processing also increases greatly.

Sound channels (also called audio channels) are mutually independent audio signals that are captured or played back at different spatial locations when sound is recorded or played. The number of channels is the number of sound sources during recording or the number of corresponding speakers during playback. For example, a surround 5.1 speaker system comprises audio signals at 6 different spatial positions, and each separate audio signal drives a speaker at the corresponding spatial position; a surround 7.1 speaker system comprises audio signals at 8 different spatial positions, and each separate audio signal drives a speaker at the corresponding spatial position.

Therefore, the effect achieved by current loudspeaker systems depends on the number and spatial positions of the loudspeakers. For example, a two-channel (stereo) speaker system cannot achieve the effect of a surround 5.1 speaker system.

Disclosure of Invention

The invention aims to provide a method, a device, equipment and a medium for generating serial metadata in a transmission audio track format, so as to describe the relation between a physical audio track and a unique audio track identification element in the serial audio metadata, and to transmit a serial audio metadata frame in real time through a transmission interface.

A first aspect of the present disclosure provides a transmission audio track format serial metadata generation method, including:

acquiring attributes and sub-elements of the transmission audio track format serial metadata;

generating the transmission audio track format serial metadata according to the attributes and the sub-elements;

wherein the attribute is used for describing interface information of the transmission audio signal, and the sub-element is used for describing the transmission audio track and the format type of the transmitted audio signal.

A second aspect of the present disclosure provides a transmission track format serial metadata generation apparatus, including:

the acquisition module is used for acquiring the attribute and the sub-element of the transmission audio track format serial metadata;

a generation module for generating the transmission audio track format serial metadata according to the attributes and the sub-elements.

A third aspect of the present disclosure provides an electronic device, comprising: a memory and one or more processors;

the memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the transmission audio track format serial metadata generation method provided by any of the embodiments.

A fourth aspect of the present disclosure provides a storage medium containing computer-executable instructions that, when executed by a computer processor, implement the transmission audio track format serial metadata generation method provided by any of the embodiments.

As can be seen from the above, the disclosed transmission audio track format serial metadata generation method, apparatus, device, and medium convert elements in the audio model metadata into corresponding serial audio metadata, slice existing audio files into frames for real-time production and streaming audio applications, and describe the relationship between a physical audio track and the audio track unique identification element in the serial audio metadata, so that frames of serial audio metadata can be transmitted in real time through a transmission interface.

Drawings

FIG. 1 is a schematic diagram of a three-dimensional acoustic audio model provided in an embodiment of the present disclosure;

FIG. 2 is a flowchart of a transmission track format serial metadata generation method in an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a transmission track format serial metadata generation apparatus according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Examples

As shown in fig. 1, a three-dimensional acoustic audio model is composed of a set of elements, each element describing one stage of audio production, the three-dimensional acoustic audio model including a content part and a format part.

Wherein the content part comprises: an audio program element, an audio content element, an audio object element, and an audio track unique identification element; the format part includes: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element;

the audio program element references at least one audio content element; the audio content element references at least one audio object element; the audio object element references the corresponding audio packet format element and the corresponding audio track unique identification element; the audio track unique identification element references the corresponding audio track format element and the corresponding audio packet format element;

the audio packet format element references at least one audio channel format element; the audio stream format element references the corresponding audio channel format element and the corresponding audio packet format element; the audio track format element and the corresponding audio stream format element reference each other. The reference relationships between elements are indicated by arrows in fig. 1.
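The reference relationships listed above can be pictured as a small directed graph. The following Python sketch is illustrative only: the names REFERENCES and reachable are hypothetical, and the keys are the descriptive element names used in this text, not identifiers taken from any schema.

```python
# Illustrative sketch: each key references the elements in its list,
# following the relationships described in the text.
REFERENCES = {
    "audio program": ["audio content"],
    "audio content": ["audio object"],
    "audio object": ["audio packet format", "audio track unique identification"],
    "audio track unique identification": ["audio track format", "audio packet format"],
    "audio packet format": ["audio channel format"],
    "audio stream format": ["audio channel format", "audio packet format", "audio track format"],
    "audio track format": ["audio stream format"],
    "audio channel format": [],
}


def reachable(start):
    """Return every element reachable from `start` by following references."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for target in REFERENCES[node]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Calling reachable("audio program") walks from the content part into the format part, which mirrors how a program is ultimately tied to concrete track and channel formats.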

The audio program may include, but is not limited to, narration, sound effects, and background music. The audio program element is used to describe a program that includes at least one content, and each audio content element is used to describe one corresponding content of the program. An audio program element may reference one or more audio content elements, which are grouped together to construct a complete audio program.

The audio content element describes the content of one component of an audio program, such as background music, and relates that content to its format by referencing one or more audio object elements.

The audio object element is used to link the content, the format, and other useful information, and to determine the audio track unique identification of the actual audio tracks.

The format part includes: an audio packet format element, an audio channel format element, an audio stream format element, an audio track format element.

The audio packet format element may be configured to describe a format adopted when the audio object element and the original audio data are packed according to channel packets.

The audio channel format element may be used to represent a single sequence of audio samples and preset operations performed on it, such as movement of rendering objects in a scene. The audio channel format element may comprise at least one audio block format element. The audio block format elements may be considered to be sub-elements of the audio channel format elements, and therefore there is an inclusion relationship between the audio channel format elements and the audio block format elements.

An audio stream is a combination of the audio tracks needed to render a channel, an object, a higher-order ambisonics component, or a packet. The audio stream format element is used to establish the relationship between a set of audio track format elements and a set of audio channel format elements, or between a set of audio track format elements and an audio packet format element.

The audio track format elements correspond to a set of samples or data in a single audio track, and are used to describe the format of the original audio data, and the decoded signals of the renderer, and also to identify the combination of audio tracks required to successfully decode the audio track data.

After the original audio data is produced through the three-dimensional acoustic audio model, synthesized audio data containing metadata is generated.

Metadata is information that describes the characteristics of data; the functions it supports include indicating a storage location, recording history, resource lookup, and file recording.

After the synthesized audio data is transmitted to the far end by way of communication, the far end renders the synthesized audio data based on the metadata to restore the original sound scene.

The division between the content part, the format part, and the BW64 (Broadcast Wave 64 bit) file is shown in fig. 1. The content part and the format part together constitute metadata in XML format, which is typically contained in one block (the "axml" block) of the BW64 file. The BW64 file part at the bottom contains a "channel allocation (chna)" block, which is a look-up table used to link the metadata to the audio tracks in the file.

The content part describes the technical content of the audio, e.g. whether it contains dialogue or a specific language, together with loudness metadata. The format part describes the channel types of the audio tracks and how they are combined, e.g. the left and right channels of a stereo pair. The metadata of the content part is typically unique to the audio and the program, while the elements of the format part may be reused.
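As a rough illustration of what such a look-up table does, the sketch below models the "chna" block as an in-memory list. It is only a sketch under stated assumptions: the real block is a binary structure that is not parsed here, the field names of ChnaEntry are hypothetical, and the identifier values are example data.

```python
from typing import NamedTuple, Optional


class ChnaEntry(NamedTuple):
    """One row of the look-up table (hypothetical field names)."""
    track_index: int        # number of the audio track inside the file
    track_uid: str          # audio track unique identification
    track_format_ref: str   # referenced audio track format
    pack_format_ref: str    # referenced audio packet format


# Example data only; not taken from any real file.
CHNA_TABLE = [
    ChnaEntry(1, "ATU_00000001", "AT_00010001_01", "AP_00010002"),
    ChnaEntry(2, "ATU_00000002", "AT_00010002_01", "AP_00010002"),
]


def entries_for_track(table, track_index):
    """Return all rows that describe the given file track."""
    return [entry for entry in table if entry.track_index == track_index]


def track_for_uid(table, track_uid) -> Optional[int]:
    """Return the file track carrying the given unique identification, if any."""
    for entry in table:
        if entry.track_uid == track_uid:
            return entry.track_index
    return None
```

Looking up "ATU_00000002" in this example table yields track 2, which is the link between the XML metadata and the audio data that the text describes.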

The present disclosure provides a transmission track format serial metadata generation method, as shown in fig. 2, the method including:

S210, acquiring attributes and sub-elements of the transmission audio track format serial metadata;

S220, generating the transmission audio track format serial metadata according to the attributes and the sub-elements.

Optionally, the obtaining the attribute of the transmission track format serial metadata includes:

at least one of a transmission interface identification, a transmission interface name, the number of interface tracks, and the number of unique identification sets of interface tracks is obtained.

Optionally, obtaining sub-elements of the transmission track format serial metadata includes:

obtaining attributes of an audio track sub-element, wherein the attributes of the audio track sub-element comprise at least one of a transmission track identification, an audio sample format tag, and an audio sample format definition.

Optionally, the obtaining of the sub-elements of the transmission track format serial metadata further includes:

a soundtrack unique identification parameter sub-element of an audio track sub-element is obtained.
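Steps S210 and S220, together with the optional attributes and sub-elements above, can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not the claimed implementation. The function names acquire and generate are hypothetical; the attribute name "transportID", the tag name "audioTrackUIDRef", and the sample values "TP_0001", "0001" and "PCM" are assumptions made for illustration; and counting numIDs as the total number of unique identifications is one possible reading of the text.

```python
import xml.etree.ElementTree as ET


def acquire(interface_id, interface_name, tracks):
    """S210: collect the attributes and the audio track sub-elements."""
    attributes = {
        "transportID": interface_id,  # assumed attribute name
        "transportName": interface_name,
        "numTracks": str(len(tracks)),
        # Assumption: numIDs counts every unique identification on the interface.
        "numIDs": str(sum(len(track["uids"]) for track in tracks)),
    }
    return attributes, tracks


def generate(attributes, tracks):
    """S220: build the transportTrackFormat element from the acquired data."""
    root = ET.Element("transportTrackFormat", attributes)
    for track in tracks:
        track_el = ET.SubElement(root, "audioTrack", {
            "trackID": str(track["trackID"]),
            "formatLabel": track["formatLabel"],
            "formatDefinition": track["formatDefinition"],
        })
        for uid in track["uids"]:
            # "audioTrackUIDRef" is an assumed tag name for the
            # unique identification parameter sub-element.
            ET.SubElement(track_el, "audioTrackUIDRef").text = uid
    return root


# Example input (illustrative values only).
tracks = [
    {"trackID": 1, "formatLabel": "0001", "formatDefinition": "PCM",
     "uids": ["ATU_00000001"]},
    {"trackID": 2, "formatLabel": "0001", "formatDefinition": "PCM",
     "uids": ["ATU_00000002"]},
]
attributes, sub_elements = acquire("TP_0001", "device-A", tracks)
element = generate(attributes, sub_elements)
xml_text = ET.tostring(element, encoding="unicode")
```

Keeping acquisition and generation as separate functions mirrors the two steps of the method and the two modules of the apparatus described later.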

The audio model is an open, compatible, generic metadata model. However, audio model metadata is suited to local file storage rather than to real-time production and streaming audio applications. When metadata must be transmitted remotely in real time together with digital audio, a serial audio metadata schema is required so that existing audio and its associated audio model metadata files can be sliced into frames and streamed.

A frame of serial audio metadata contains a set of audio model metadata describing the audio frames within a certain time period associated with the frame. The serial audio metadata has the same structure, attributes and elements as the audio model metadata, as well as additional attributes for specifying the frame format. The frames of serial audio metadata do not overlap and are linked to a specified start time and duration. Metadata contained in a frame of serial audio metadata is likely to be used to describe the audio itself over the duration of the frame.
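The framing rule (each frame has a start time and a duration, and frames do not overlap) can be illustrated as follows. This is a sketch under stated assumptions: time is measured in integer samples, a fixed frame length is used, and the function names are hypothetical.

```python
def slice_into_frames(total_samples, frame_samples):
    """Split a signal of `total_samples` into (start, duration) frames."""
    if frame_samples <= 0:
        raise ValueError("frame_samples must be positive")
    frames = []
    start = 0
    while start < total_samples:
        # The last frame may be shorter than the nominal frame length.
        duration = min(frame_samples, total_samples - start)
        frames.append((start, duration))
        start += duration
    return frames


def frames_do_not_overlap(frames):
    """Check that no frame begins before the previous frame has ended."""
    ordered = sorted(frames)
    return all(
        start0 + duration0 <= start1
        for (start0, duration0), (start1, _) in zip(ordered, ordered[1:])
    )
```

For example, slicing 10 samples into frames of 4 samples gives three frames, the last of which is shortened so that it ends exactly at the end of the audio.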

The parent element of the serial audio metadata is the frame (frame), which includes two sub-elements: the frame header (frameHeader) and the audio format extension (audioFormatExtended). The frame header in turn includes 2 sub-elements: the frame format (frameFormat) and the transport track format (transportTrackFormat).

The audio format extension includes 8 sub-elements: audio program (audioProgramme), audio content (audioContent), audio object (audioObject), audio track unique identification (audioTrackUID), audio packet format (audioPackFormat), audio channel format (audioChannelFormat), audio stream format (audioStreamFormat), and audio track format (audioTrackFormat).
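The nesting described above can be sketched as an empty XML skeleton built with Python's standard xml.etree.ElementTree module. This is illustrative only: the function name build_empty_frame is hypothetical, the camel-case tag spellings are assumptions based on the names given in the text, and a real frame would carry attributes and content that are omitted here.

```python
import xml.etree.ElementTree as ET

# Tag spellings are assumptions based on the element names in the text.
HEADER_CHILDREN = ["frameFormat", "transportTrackFormat"]
EXTENDED_CHILDREN = [
    "audioProgramme",
    "audioContent",
    "audioObject",
    "audioTrackUID",
    "audioPackFormat",
    "audioChannelFormat",
    "audioStreamFormat",
    "audioTrackFormat",
]


def build_empty_frame():
    """Build a frame element containing only the structural skeleton."""
    frame = ET.Element("frame")
    header = ET.SubElement(frame, "frameHeader")
    for name in HEADER_CHILDREN:
        ET.SubElement(header, name)
    extended = ET.SubElement(frame, "audioFormatExtended")
    for name in EXTENDED_CHILDREN:
        ET.SubElement(extended, name)
    return frame
```

Serializing the result with ET.tostring(build_empty_frame(), encoding="unicode") shows the two-level structure at a glance.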

Transmission audio track format serial metadata elements

The transport track format (transportTrackFormat) represents the relationship between a physical audio track (e.g., channel 1 of an AES3 interface) and an audio track unique identification (audioTrackUID, e.g., "ATU_00000001") in the audio model. For the audio model, this information is described in the "channel allocation (chna)" block of the BW64 file. The transmission track format is the serial audio metadata equivalent of the BW64 "channel allocation (chna)" block.

Attributes of the transmission track format

The transport interface name (transportName) is the name of the interface that transports the relevant audio content. The user is free to give the interface any name. When multiple interfaces are used, they may be labeled, for example, device-A, device-B, and device-C. The number of interface tracks (numTracks) is the number of associated tracks in each interface. The number of sets of interface track unique identifiers (numIDs) is the number of associated sets of audio track unique identifiers (audioTrackUIDs) in each interface. The attributes of the transmission track format are shown in table 1:

TABLE 1

Sub-elements of the transmission track format

The transport track identification (trackID), an attribute of the audio track (audioTrack) sub-element, is the index of the transport audio track within each interface. This index corresponds to the track number in the BW64 file. The audio sample format tag (formatLabel) and the audio sample format definition (formatDefinition) specify the format type of the audio signal. The sub-elements of the transmission track format are shown in table 2:

TABLE 2

Neither the audio track format identification parameter (audioTrackFormatIDRef) nor the audio packet format identification parameter (audioPackFormatIDRef) is included in the transmission track format; both are referenced through the audio track unique identification. For PCM audio, both the audio track format and the audio stream format may be omitted, and the audio track unique identification may refer directly to the audio channel format rather than to the audio track format. In that case, the same numbers are used for the identification of the audio track format and of the audio channel format.
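The shared numbering for PCM audio can be illustrated with a short helper. This sketch assumes an identifier layout of "AT_" followed by eight hexadecimal digits and a two-digit suffix for audio track formats, and "AC_" followed by the same eight digits for audio channel formats; that layout and the function name channel_format_id_for are assumptions made for the example and are not stated in this text.

```python
import re

# Assumed layout: AT_<8 hex digits>_<2 hex digits>
_TRACK_FORMAT_ID = re.compile(r"^AT_([0-9A-Fa-f]{8})_([0-9A-Fa-f]{2})$")


def channel_format_id_for(track_format_id):
    """Derive the channel format identification that shares the same number."""
    match = _TRACK_FORMAT_ID.match(track_format_id)
    if match is None:
        raise ValueError(f"unexpected track format id: {track_format_id!r}")
    # Reuse the numeric part and drop the track-specific suffix.
    return "AC_" + match.group(1)
```

Under these assumptions, a unique identification that would otherwise point to "AT_00010001_01" can point to "AC_00010001" instead when the track and stream formats are omitted.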

The sub-elements of the audio track sub-element are shown in table 3:

TABLE 3

Fig. 3 is a schematic structural diagram of an apparatus for generating serial metadata in a transmission track format according to an embodiment of the present disclosure, where the apparatus includes:

an obtaining module 310, configured to obtain attributes and sub-elements of the transmission track format serial metadata;

a generating module 320, configured to generate the transmission audio track format serial metadata according to the attributes and the sub-elements.

The transmission track format serial metadata generation device provided by the embodiment of the invention can execute the transmission track format serial metadata generation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device includes: a processor 410, a memory 420, an input device 430, and an output device 440. The number of processors 410 in the electronic device may be one or more, and one processor 410 is taken as an example in fig. 4. The number of memories 420 in the electronic device may be one or more, and one memory 420 is taken as an example in fig. 4. The processor 410, the memory 420, the input device 430, and the output device 440 of the electronic device may be connected by a bus or by other means, and fig. 4 takes connection by a bus as an example. The electronic device may be a computer, a server, or the like. The embodiment of the present disclosure is described in detail by taking a server as the electronic device; the server may be an independent server or a cluster server.

The memory 420 serves as a computer-readable storage medium and may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules of the transmission audio track format serial metadata generation apparatus according to any embodiment of the present disclosure. The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device; it may also be a camera for acquiring images or a sound pickup device for acquiring audio data. The output device 440 may include an audio device such as a speaker. It should be noted that the specific composition of the input device 430 and the output device 440 can be set according to actual situations.

The processor 410 executes various functional applications of the device and data processing, i.e., implements a transmission track format serial metadata generation method, by executing software programs, instructions, and modules stored in the memory 420.

The disclosed embodiments also provide a storage medium containing computer-executable instructions that, when executed by a computer processor, implement the transmission audio track format serial metadata generation method provided by any of the embodiments.

Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present disclosure are not limited to the operations of the method described above; they may also perform related operations in the method provided by any embodiment of the present disclosure, with the corresponding functions and advantages.

From the above description of the embodiments, it is clear to a person skilled in the art that the present disclosure can be implemented by software plus the necessary general-purpose hardware, and can certainly also be implemented by hardware alone, but in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the method according to any embodiment of the present disclosure.

It should be noted that, in the electronic device, the units and modules included in the electronic device are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present disclosure.

It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "in an embodiment," "in yet another embodiment," "exemplary" or "in a particular embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the disclosure. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although the present disclosure has been described in detail hereinabove with respect to general description, specific embodiments and experiments, it will be apparent to those skilled in the art that some modifications or improvements may be made based on the present disclosure. Accordingly, such modifications and improvements are intended to be within the scope of this disclosure, as claimed.

Claims

1. A method for generating serial metadata in a transmission track format, comprising:

2. The method of claim 1, wherein obtaining attributes of the transport track format serial metadata comprises:

3. The method of claim 2, wherein obtaining sub-elements of the transport track format serial metadata comprises:

4. The method of claim 3, wherein obtaining sub-elements of the transport track format serial metadata further comprises:

5. A transmission track format serial metadata generation apparatus, comprising:

6. An electronic device, comprising: a memory and one or more processors;

the memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.

7. A storage medium containing computer-executable instructions for implementing the method of any one of claims 1-4 when executed by a computer processor.