CN118055243B - Audio and video coding processing method, device and equipment for digital television - Google Patents
- Publication number
- CN118055243B (application CN202410449268.7A)
- Authority
- CN
- China
- Prior art keywords
- audio
- video
- video data
- transmission
- coding
- Prior art date
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/647—Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
- H04N21/64784—Data processing by the network
Abstract
The application relates to the technical field of audio and video coding, and discloses an audio and video coding processing method, device, and equipment for a digital television. The method comprises the following steps: performing denoising and signal enhancement on the original audio and video signal to obtain a first audio and video signal, then performing content extraction and adaptive layered coding to obtain a first audio and video data stream; performing feature decomposition and data encapsulation to obtain a plurality of second audio and video data streams; creating an initial transmission parameter set through a multi-path transmission model and transmitting the data accordingly; receiving the plurality of second audio and video data streams through the digital television, and performing data stream decoding and depth signal fusion to obtain a second audio and video signal; playing the second audio and video signal through the digital television and monitoring its playing state to obtain playing state data. The application realizes dynamic audio and video coding for digital television, improves the transmission stability of audio and video data, and thereby improves the playing quality of the digital television.
Description
Technical Field
The present application relates to the field of audio and video coding technologies, and in particular, to an audio and video coding processing method, apparatus, and device for a digital television.
Background
In today's digital television field, continual technical progress and growing consumer demand for high-quality audio and video content make audio and video coding a key factor in the viewing experience. Conventional coding methods face several challenges: noise in the original signal, data loss during transmission, and audio-video desynchronization during playback, all of which seriously degrade the viewing experience.
First, original audio and video signals often contain background noise and interference, which reduce signal quality and increase coding difficulty, in turn hurting transmission efficiency and final playback quality; efficient denoising and signal enhancement of the original signal is therefore essential. Second, given the diversity and instability of network conditions, ensuring efficient, reliable transmission of audio and video data, especially under limited bandwidth or network congestion, is another challenge. A traditional single-path transmission model copes poorly with network fluctuation and is prone to data loss or playback stalls, seriously affecting user experience. Third, audio-video desynchronization is a common problem affecting the digital-television viewing experience: it disorients the viewer and reduces viewing comfort.
Disclosure of Invention
The application provides an audio and video coding processing method, device and equipment for a digital television, which realize audio and video dynamic coding of the digital television, improve the transmission stability of audio and video data and further improve the playing effect of the digital television.
In a first aspect, the present application provides an audio/video coding processing method of a digital television, where the audio/video coding processing method of the digital television includes:
denoising and enhancing the original audio and video signal to obtain a first audio and video signal, and performing content extraction and adaptive layered coding on the first audio and video signal to obtain a first audio and video data stream;
performing feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams;
creating an initial transmission parameter set for the plurality of second audio/video data streams through a multi-path transmission model, and transmitting the plurality of second audio/video data streams according to the initial transmission parameter set;
receiving the plurality of second audio and video data streams through a digital television, and performing data stream decoding and depth signal fusion on the plurality of second audio and video data streams to obtain a second audio and video signal;
playing the second audio and video signal through the digital television, and performing audio and video synchronization correction and state monitoring on the second audio and video signal to obtain playing state data;
and performing state feedback optimization on the initial transmission parameter set according to the playing state data to generate a target transmission parameter set.
In a second aspect, the present application provides an audio/video coding processing apparatus of a digital television, where the audio/video coding processing apparatus of the digital television includes:
the coding module is used for performing denoising and signal enhancement on the original audio and video signal to obtain a first audio and video signal, and performing content extraction and adaptive layered coding on the first audio and video signal to obtain a first audio and video data stream;
the packaging module is used for performing feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams;
the transmission module is used for creating an initial transmission parameter set for the plurality of second audio/video data streams through a multi-path transmission model and transmitting the plurality of second audio/video data streams according to the initial transmission parameter set;
the decoding module is used for receiving the plurality of second audio and video data streams through the digital television, and performing data stream decoding and depth signal fusion on the plurality of second audio and video data streams to obtain a second audio and video signal;
the correction module is used for playing the second audio and video signal through the digital television, and performing audio and video synchronization correction and state monitoring on the second audio and video signal to obtain playing state data;
and the optimization module is used for performing state feedback optimization on the initial transmission parameter set according to the playing state data to generate a target transmission parameter set.
A third aspect of the present application provides a computer device, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the computer device to execute the above audio and video coding processing method of a digital television.
A fourth aspect of the present application provides a computer-readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the above-described audio/video encoding processing method of a digital television.
According to the technical scheme provided by the application, a wavelet transform algorithm denoises the original audio and video signal, and the denoised signal is given dynamic range compression and local contrast enhancement, significantly improving the clarity and viewing quality of the audio and video content. The wavelet transform effectively removes the noise components of the signal, while dynamic range compression and local contrast enhancement preserve and highlight signal detail and contrast, making the final audio and video content more vivid and lifelike. Transmission of the content is optimized through the multi-path transmission model and content-feature-based adaptive layered coding. The multi-path transmission model allows data streams to travel over different network paths, effectively reducing the risk of transmission interruption from a single-path failure and improving the reliability of data transmission. Adaptive layered coding dynamically adjusts the coding strategy according to content characteristics such as motion speed and color change, so that high-quality audio and video content can be delivered even under limited bandwidth. Depth signal fusion and audio-video synchronization correction on the received audio-video data streams significantly improve playback stability and synchronization: the fusion step combines the signals arriving on different transmission paths, compensating for possible data loss and ensuring signal integrity, while synchronization correction keeps the audio and video playback times strictly aligned, avoiding audio-visual mismatch during viewing.
The overall viewing experience is further improved by monitoring the playing state in real time and dynamically feeding the results back into the playback and transmission parameters. State monitoring promptly detects problems during playback, such as buffering delay and frozen frames, and the dynamic feedback mechanism adjusts parameters such as transmission rate and coding quality accordingly, optimizing the playback effect in real time so that users enjoy smooth, high-quality audio and video content.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an embodiment of an audio/video coding method of a digital television according to an embodiment of the present application;
fig. 2 is a schematic diagram of an embodiment of an audio/video encoding processing device of a digital television according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an audio and video coding processing method, device and equipment of a digital television. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present application, referring to fig. 1, and an embodiment of an audio/video encoding processing method of a digital television in the embodiment of the present application includes:
Step 101, denoising and signal enhancement are carried out on an original audio/video signal to obtain a first audio/video signal, and content extraction and self-adaptive layered coding are carried out on the first audio/video signal to obtain a first audio/video data stream;
It can be understood that the execution subject of the present application may be an audio/video coding processing device of a digital television, or a terminal or a server, which is not limited herein. The embodiments of the application are described taking a server as the execution subject as an example.
Specifically, a wavelet transform algorithm denoises the original audio and video signal; the algorithm effectively separates noise from useful information to obtain a denoised audio and video signal. Dynamic range compression is then applied to the denoised signal, narrowing its dynamic range to obtain a compressed audio and video signal, which reduces signal complexity and improves processing efficiency. Local contrast enhancement and formatting are applied to the compressed signal to improve its visual quality, yielding a clearer first audio and video signal. Content is extracted from the first audio and video signal through a convolutional neural network (CNN) model, which can effectively extract the important content information. The content features of this information are then analyzed to obtain a set reflecting the characteristics of the video content, which helps in understanding the structure and properties of the audio and video content. The first audio and video signal is adaptively layered based on this content feature set: the signal is decomposed into multiple target coding layers according to the different characteristics of the content, each layer targeting a specific part of it, so that the coding process can be optimized according to the importance and characteristics of the content. Each target coding layer receives its own coding strategy, formulated according to the characteristics of the content to ensure coding efficiency and quality.
According to the configured target coding strategies, each target coding layer is encoded, converting the first audio and video signal into a first audio and video data stream.
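As an illustration of the denoising step, the sketch below applies a single-level Haar wavelet decomposition with soft thresholding to a 1-D signal. The patent does not specify the wavelet family, decomposition depth, or threshold rule, so Haar, one level, and soft thresholding are assumptions made here for brevity:

```python
def haar_denoise(signal, threshold):
    """Single-level Haar wavelet soft-threshold denoising.
    signal: list of floats with even length (assumption for this sketch)."""
    # Decompose: pairwise averages (approximation) and differences (detail).
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

    def soft(x, t):
        # Soft threshold: shrink toward zero, zero out small coefficients.
        return (abs(x) - t) * (1 if x > 0 else -1) if abs(x) > t else 0.0

    # Most noise energy lives in the detail coefficients; threshold them.
    detail = [soft(d, threshold) for d in detail]
    # Reconstruct the signal from the cleaned coefficients.
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out
```

A production encoder would use a multi-level transform over 2-D frames and a data-driven threshold, but the decompose-threshold-reconstruct shape is the same.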
102, Performing feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams;
Specifically, inter-frame difference calculation is performed on the video data in the first audio/video data stream: comparing adjacent frames yields an inter-frame difference value that helps identify moments when the video content changes, for example at scene transitions or important actions. Color histogram difference calculation is also performed on the video data, providing detailed information about color change in the video in the form of a color histogram difference value. With these two difference values, a plurality of first feature decomposition points of the first audio/video data stream can be accurately determined; these points mark the positions of important changes in the video content. Audio spectrum feature decomposition is performed on the audio data in the first audio/video data stream, generating a plurality of second feature decomposition points by analyzing the spectral characteristics of the audio signal. The spectral decomposition reveals the main components and characteristics of the audio signal, such as changes in its high- and low-frequency content, and thus provides a basis for detecting changes in the audio content. The first (video) and second (audio) feature decomposition points are analyzed jointly, taking into account the overall characteristics of the audio and video data, to determine a plurality of target feature decomposition points. Based on these target feature decomposition points, the first audio/video data stream is decomposed into a plurality of decomposed data streams.
Each decomposed data stream represents a segment of the original data stream, partitioned according to the changing characteristics of the content, ensuring that the decomposed streams reflect the structure and characteristics of the original content. Each decomposed data stream is then encapsulated into an independent second audio/video data stream. The encapsulation process includes not only packaging of the data but also encoding and formatting, to ensure that each second audio/video data stream can be efficiently transmitted and played in the digital television system.
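The two video-side metrics of this step can be sketched as follows; frames are modeled as flat lists of grayscale pixel values, and the thresholds are illustrative placeholders rather than values taken from the patent:

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two frames (flat pixel lists)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def histogram(frame, bins=4, max_val=256):
    """Coarse intensity histogram of one frame."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def histogram_difference(frame_a, frame_b, bins=4):
    """L1 distance between the two frames' histograms, per pixel."""
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / len(frame_a)

def decomposition_points(frames, diff_thresh=30.0, hist_thresh=0.5):
    """Frame indices where either metric signals a content change."""
    points = []
    for i in range(1, len(frames)):
        if (frame_difference(frames[i - 1], frames[i]) > diff_thresh
                or histogram_difference(frames[i - 1], frames[i]) > hist_thresh):
            points.append(i)
    return points
```

The audio-side spectrum decomposition and the joint video/audio merge would follow the same pattern: compute a per-segment feature, threshold it, and collect the crossing indices as candidate decomposition points.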
Step 103, creating an initial transmission parameter set of a plurality of second audio/video data streams through a multi-path transmission model, and carrying out data transmission on the plurality of second audio/video data streams according to the initial transmission parameter set;
Specifically, a plurality of candidate transmission paths are initialized to ensure flexible, efficient data transmission; the paths are selected by analyzing the network topology and bandwidth resources. A transmission model is built over the candidate transmission paths through a preset graph neural network, which predicts the transmission characteristics of each path from the real-time state and historical data of the network, yielding a multi-path transmission model that reflects the current network state while taking historical data into account. The plurality of second audio/video data streams are matched to transmission paths through this model, ensuring that each data stream is assigned the path best suited to its characteristics. The matching process considers the size of the data streams, their transmission requirements, and the current state of the network, and finds for each data stream the path with the highest transmission efficiency and lowest delay. Network condition parameters, including congestion, bandwidth variation, and delay, are acquired, and from these parameters an optimal transmission rate and retransmission policy are calculated for each target transmission path. An initial transmission parameter set for the plurality of second audio/video data streams is then created from the transmission rate and retransmission policy of each target transmission path. This set contains all the necessary parameter settings, such as transmission rate, number of retransmissions, and retransmission interval, providing an initial, optimized parameter configuration for each data stream.
The second audio and video data streams are distributed to their corresponding target transmission paths and transmitted over multiple paths according to the created initial transmission parameter set. Each data stream is transmitted on its selected path with its specific parameter settings, which not only improves transmission efficiency but also enhances the stability and reliability of the data transmission process.
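A minimal sketch of the path-matching and initial-parameter-set step, substituting a greedy spare-bandwidth heuristic for the graph-neural-network model described above; all field names and constants here are invented for illustration:

```python
def match_paths(streams, paths):
    """Greedy matching: each stream goes to the candidate path with the most
    spare bandwidth, and gets an initial parameter set derived from that path.
    streams: {name: bitrate_kbps}; paths: {name: {"bandwidth": kbps, "rtt_ms": ms}}."""
    load = {p: 0 for p in paths}
    params = {}
    # Place the heaviest streams first so they get the widest paths.
    for stream, bitrate in sorted(streams.items(), key=lambda kv: -kv[1]):
        best = max(paths, key=lambda p: paths[p]["bandwidth"] - load[p])
        load[best] += bitrate
        spare_before = paths[best]["bandwidth"] - (load[best] - bitrate)
        params[stream] = {
            "path": best,
            # Never schedule more than the path had spare when chosen.
            "rate_kbps": min(bitrate, spare_before),
            # Longer round trips get a longer retransmission interval.
            "retx_interval_ms": 2 * paths[best]["rtt_ms"],
            "max_retx": 3,
        }
    return params
```

A real implementation would re-run the matching as the network condition parameters change; this sketch shows only the one-shot creation of the initial transmission parameter set.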
Step 104, receiving a plurality of second audio/video data streams through the digital television, and performing data stream decoding and depth signal fusion on the plurality of second audio/video data streams to obtain a second audio/video signal;
Specifically, the plurality of second audio/video data streams are received through the digital television, each data stream containing part of the audio/video content or information about a specific aspect of it. The data streams are decoded, converting the encoded signals into playable audio/video signals. Depth signal features are then extracted from each decoded audio/video signal through a preset RNN model. The RNN model is suited to sequence data such as audio/video signals and can capture their time-varying features; it extracts from each decoded signal a depth signal feature set reflecting the key content and attributes of that signal. Signal fusion is performed on the decoded audio/video signals according to the depth signal feature sets, synthesizing them into a single high-quality second audio/video signal. The fusion process considers the relationships and complementarity between the different signals, achieving seamless integration of the content by adjusting and combining their features, and dynamically adapts the fusion strategy to the extracted depth signal features to ensure that the resulting second audio/video signal has the best possible visual and auditory quality.
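The loss-compensating fusion idea can be sketched as a confidence-weighted average, with `None` standing in for samples lost on one path; the per-stream weights here play the role of the RNN-derived depth signal features, which are assumed rather than modeled:

```python
def fuse_signals(decoded, weights):
    """Confidence-weighted sample-wise fusion of decoded streams.
    decoded: list of equal-length sample lists (None = sample lost on that path);
    weights: one confidence value per stream."""
    fused = []
    for i in range(len(decoded[0])):
        # Missing samples model data lost on one transmission path; the
        # surviving streams compensate, re-normalizing the weights over
        # whatever actually arrived.
        vals = [(s[i], w) for s, w in zip(decoded, weights) if s[i] is not None]
        wsum = sum(w for _, w in vals)
        fused.append(sum(v * w for v, w in vals) / wsum)
    return fused
```

When every path delivers, this is a plain weighted average; when one path drops a sample, the output degrades gracefully instead of leaving a gap, which is the "compensates possible data loss" behavior described above.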
Step 105, playing a second audio and video signal through the digital television, and performing audio and video synchronous correction and state monitoring on the second audio and video signal to obtain playing state data;
Specifically, the player in the digital television is configured according to the depth signal feature set. The depth signal feature set contains key information that affects playback quality and viewing experience, such as the dynamic range, contrast, and color depth of the signal. By adjusting the player parameters to match these features, the second audio and video signal can achieve optimal visual and auditory effects when played. Audio and video synchronization correction is performed while the second audio and video signal is played through the player. The synchronization correction relies on algorithms that analyze and adjust the playback rate of the audio and video data streams to ensure that they are perfectly aligned at the correct points in time. Meanwhile, the state of the player is monitored, including its operating state, data buffering condition, and network connection quality, so that any problem that might affect the playback experience is captured in real time. The performance of the player and the stability of the network environment are then analyzed from the initial state data. These initial state data may contain noise or non-standardized information, and therefore require data cleansing and data normalization. Data cleansing includes removing erroneous, duplicate, or extraneous data records to ensure the accuracy and relevance of the data. Data normalization converts the data into a format that is easier to analyze and understand; the resulting play state data reflect the performance of the player and its various states during playback.
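As a minimal illustration of the synchronization correction described above, the sketch below derives an audio playback-rate factor that pulls the audio timeline back toward the video timeline; the 40 ms tolerance and the ±5% clamp are illustrative assumptions, not values from this application.

```python
def sync_correction(video_pts_ms, audio_pts_ms, tolerance_ms=40):
    """Return an audio playback-rate factor that converges the audio
    timeline toward the video timeline (simplified correction rule)."""
    drift = audio_pts_ms - video_pts_ms   # positive: audio runs ahead
    if abs(drift) <= tolerance_ms:
        return 1.0                        # within lip-sync tolerance: no change
    # Slow the audio down when it leads, speed it up when it lags,
    # clamped to +/-5% so the pitch shift stays unobtrusive.
    return max(0.95, min(1.05, 1.0 - drift / 1000.0))
```

A rate of 1.0 means no adjustment; repeated application over successive frames gradually removes the measured drift.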
Step 106, performing state feedback optimization on the initial transmission parameter set according to the play state data to generate a target transmission parameter set.
Specifically, state feature encoding is performed on the play state data, which includes information about player performance, network conditions, user experience, and the like. State feature encoding converts the data into state feature encoding vectors, which reflect the key state features of the playback process in a structured form. The state feature encoding vector is input into a preset decision tree model. The decision tree model is a machine learning model for classification and regression tasks; by analyzing the input feature vectors, it generates an initial parameter adjustment execution strategy for each target transmission path. Reward feedback parameters corresponding to each target transmission path are then acquired; these parameters reflect the effects of the previous transmission strategy, including transmission delay, packet loss rate, user satisfaction, and the like. Strategy gradient analysis is performed on each target transmission path using the reward feedback parameters; strategy gradient analysis is an optimization method based on gradient descent that indicates how to adjust the transmission strategy to improve the reward feedback parameters, i.e., to improve the transmission effect. Strategy parameter feedback analysis is then performed on the reward feedback parameters according to the strategy gradients, and a strategy parameter feedback value is calculated for each target transmission path. These feedback values provide a well-defined direction and magnitude for parameter adjustment, guiding how the initial parameter adjustment execution strategy should be optimized for better transmission performance. The initial parameter adjustment execution strategy is optimized according to the strategy parameter feedback values to generate a target parameter adjustment execution strategy for each target transmission path.
The target strategy considers the latest network condition and user experience feedback, and can more accurately meet the transmission requirement. And performing comprehensive parameter optimization on the initial transmission parameter set according to the target parameter adjustment execution strategy to generate a target transmission parameter set. The set contains key parameters such as optimized transmission rate, coding quality, retransmission strategy and the like, ensures that the transmission process of audio and video data is more efficient and stable, can adapt to the change of network environment, and finally improves the watching experience of users.
In the embodiment of the application, the wavelet transformation algorithm is adopted to denoise the original audio and video signals, and dynamic range compression and local contrast enhancement are applied to the denoised signals, so that the clarity and viewing quality of the audio and video signals are significantly improved. The wavelet transformation algorithm effectively removes the noise components in the signals, while dynamic range compression and local contrast enhancement ensure that the details and contrast of the signals are retained and highlighted, making the final audio and video content more vivid and lifelike. The transmission process is optimized through the multipath transmission model and adaptive hierarchical coding based on the characteristics of the content. The multipath transmission model allows the data streams to be transmitted over different network paths, effectively reducing the risk of transmission interruption caused by single-path failures and improving the reliability of data transmission. Adaptive hierarchical coding dynamically adjusts the coding strategy according to the different characteristics of the content, such as movement speed and color change, so that high-quality audio and video content can be transmitted even under limited bandwidth. By performing depth signal fusion and audio and video synchronization correction on the received audio and video data streams, playback stability and synchronization are significantly improved. Depth signal fusion uses advanced algorithms to combine the signals from the different transmission paths, compensating for possible data loss and ensuring signal integrity. Audio and video synchronization correction keeps the playback timing of the audio and video strictly consistent, avoiding audio-visual mismatch during viewing.
The overall viewing experience is further improved by monitoring the playback state in real time and performing dynamic feedback optimization of the playback and transmission parameters. State monitoring can promptly detect problems during playback, such as buffering delay and picture freezing, and the dynamic feedback mechanism adjusts parameters such as the transmission rate and coding quality accordingly, optimizing the playback effect in real time so that the user enjoys smooth, high-quality audio and video content.
In a specific embodiment, the process of executing step 101 may specifically include the following steps:
(1) Denoising the original audio and video signals by adopting a wavelet transformation algorithm to obtain denoised video signals;
(2) Dynamic range compression is carried out on the denoising video signal to obtain a compressed audio/video signal;
(3) Local contrast enhancement and formatting processing are carried out on the compressed audio and video signals, so that first audio and video signals are obtained;
(4) Extracting the content of the first audio and video signal through the CNN model to obtain audio and video signal content information;
(5) Analyzing the content characteristics of the content information of the audio and video signals to obtain a content characteristic set;
(6) Performing self-adaptive layering on the first audio and video signals according to the content characteristic set to obtain a plurality of target coding layers;
(7) Setting a coding strategy for a plurality of target coding levels to obtain a target coding strategy of each target coding level;
(8) And carrying out signal coding on the first audio and video signals according to the target coding strategy to obtain a first audio and video data stream.
Specifically, a wavelet transformation algorithm is adopted to denoise the original audio and video signal, separating the useful information in the signal from the noise. The wavelet transform algorithm effectively distinguishes detailed portions (e.g., edges) from smooth portions by analyzing the signal over multiple scales, so that the denoised audio and video signal retains more valuable information. For example, when processing an audio signal containing background noise and dialog, the wavelet transform can distinguish the waveform of the dialog from the waveform of the background noise, thereby effectively removing the background noise while preserving clear dialog content. Dynamic range compression is then applied to the denoised audio and video signal, reducing the dynamic range of the signal so that it maintains a good audio-visual effect in different playback environments. Dynamic range compression adjusts the extremely high- and low-amplitude parts of the signal to reduce their difference from the average level, making details more apparent while avoiding excessive or insufficient volume. Local contrast enhancement and formatting are then applied to the compressed audio and video signal. Local contrast enhancement focuses on improving the visual effect of images, particularly in areas rich in detail, such as scenes with interleaved light and shadow; by increasing local contrast, details become clearer and the viewing experience improves. The formatting process ensures that the signal format is compatible with subsequent processing and playback devices. For example, when processing a night scene video, local contrast enhancement can make dark details more visible, while formatting ensures that the video can be played on different digital televisions.
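The wavelet denoising step can be sketched with a single-level Haar decomposition plus soft thresholding of the detail coefficients; this is a simplified stand-in for the multi-scale wavelet transform described above, and the threshold value is an illustrative assumption.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """One-level Haar wavelet denoise: split the signal into approximation
    and detail coefficients, soft-threshold the details, reconstruct."""
    n = len(signal) - len(signal) % 2          # work on an even-length prefix
    x = np.asarray(signal[:n], dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-frequency (smooth) part
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-frequency (noise-prone) part
    # Soft thresholding shrinks small detail coefficients toward zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty(n)
    out[0::2] = (approx + detail) / np.sqrt(2)  # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

clean = np.sin(np.linspace(0, 2 * np.pi, 64))
noisy = clean + 0.05 * np.random.default_rng(0).standard_normal(64)
denoised = haar_denoise(noisy, threshold=0.05)
```

Because the Haar basis is orthonormal, a threshold of zero reproduces the input exactly; a small positive threshold suppresses the high-frequency noise while leaving the smooth component intact. A production system would use a multi-level decomposition with a richer wavelet family.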
Content is extracted from the first audio and video signal using a convolutional neural network (CNN) model, which is capable of identifying important content information, such as scene features, objects, and voice commands, from the audio and video signal. For example, by processing a video of a city street view through the CNN model, key elements such as vehicles and pedestrians, as well as background noise and dialogue content, can be identified. Content feature analysis is performed on the extracted audio and video signal content information to obtain a feature set of the content, such as movement speed, color change, and rhythm of sound. Based on the content feature set, the first audio and video signal is adaptively layered, decomposing the signal into a plurality of target coding layers according to the importance and characteristics of the different contents. The layering strategy allows different encoding strategies to be adopted for different levels of content, optimizing storage and transmission efficiency. For example, in a scene containing layered dialogue and background music, adaptive layering can process the dialogue and the background music separately to ensure the clarity of the dialogue and the richness of the background music. A coding strategy is set for each target coding level, and the first audio and video signal is encoded according to these strategies, which consider the characteristics and transmission requirements of the content at each level, such as high-bit-rate coding for highly dynamic scenes and low-bit-rate coding for static scenes. Finally, the first audio and video data stream is obtained.
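A per-layer coding strategy of the kind described above can be sketched as a simple rule table keyed on a content feature; the motion thresholds, bit rates, and GOP lengths below are illustrative placeholders, not values from this application.

```python
def assign_coding_strategy(features):
    """Map content features to a layer-specific coding strategy.
    Threshold/bitrate/GOP values are illustrative placeholders."""
    motion = features["motion_speed"]      # assumed normalised 0..1 motion score
    if motion > 0.7:                       # fast action: spend more bits, short GOP
        return {"layer": "high_dynamic", "bitrate_kbps": 8000, "gop": 15}
    if motion > 0.3:                       # moderate motion
        return {"layer": "moderate", "bitrate_kbps": 4000, "gop": 30}
    return {"layer": "static", "bitrate_kbps": 1500, "gop": 60}
```

In practice the feature set would include color change and audio rhythm as well, and the mapping would be tuned per channel rather than hard-coded.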
In a specific embodiment, the process of executing step 102 may specifically include the following steps:
(1) Carrying out inter-frame difference calculation on video data in the first audio/video data stream to obtain an inter-frame difference value;
(2) Performing color histogram difference calculation on video data in the first audio and video data stream to obtain a color histogram difference value;
(3) Determining a plurality of first feature decomposition points of the first audio/video data stream according to the inter-frame difference value and the color histogram difference value;
(4) Performing audio frequency spectrum feature decomposition on the audio data in the first audio-video data stream to generate a plurality of second feature decomposition points;
(5) Comprehensively analyzing the feature demarcation points of the plurality of first feature decomposition points and the plurality of second feature decomposition points to obtain a plurality of target feature decomposition points;
(6) Carrying out data stream decomposition on the first audio and video data stream according to the target feature decomposition points to obtain a plurality of decomposition data streams;
(7) And carrying out data encapsulation on the plurality of decomposed data streams to obtain a plurality of second audio/video data streams.
Specifically, inter-frame difference calculation is performed on the video data in the first audio and video data stream, comparing the difference between adjacent frames to evaluate the degree of motion or scene change in the video. By calculating these differences, inter-frame difference values are obtained that closely reflect the dynamic changes in the video content. Color histogram difference calculation is then performed to evaluate changes in color distribution. The color histogram represents the frequencies of the different colors in an image; comparing the color histograms of adjacent frames yields a color histogram difference value. This value can reveal details of scene color change, such as the gradual change of sky color at sunset or color differences caused by abrupt lighting changes, providing a quantized measure of the color variation of the video content. The inter-frame difference values and the color histogram difference values are combined to determine a plurality of first feature decomposition points of the first audio and video data stream. A feature decomposition point marks an important change in the video content, such as the moment of a jump from one scene to another, or a transition from static to dynamic. At the same time, audio spectrum feature decomposition is performed on the audio data in the first audio and video data stream, generating a plurality of second feature decomposition points by analyzing the frequency spectrum of the audio signal. Audio spectrum feature decomposition can reveal the dominant frequency components in the audio, forming the second feature decomposition points. A comprehensive feature demarcation-point analysis is then performed on the plurality of first feature decomposition points and the plurality of second feature decomposition points.
The key feature points of the video and the audio are fused to determine a plurality of target feature decomposition points, which represent the key change nodes of the audio and video content and form the basis of the data stream decomposition. The first audio and video data stream is decomposed into a plurality of decomposed data streams based on the target feature decomposition points. The data stream is split into smaller segments according to content variations, each segment focusing on particular content or a particular scene. Data encapsulation is then performed on the decomposed data streams to generate a plurality of second audio and video data streams. Data encapsulation means packaging each decomposed data stream according to a certain format and protocol so that it is suitable for different playback and transmission requirements. Through data encapsulation, each decomposed data stream is given an independent identification and the necessary metadata, which facilitates management and retrieval in the digital television system.
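The combined inter-frame and color-histogram test for video feature decomposition points can be sketched as follows; the thresholds and the 16-bin grayscale histogram are illustrative assumptions (the audio spectral decomposition is omitted for brevity).

```python
import numpy as np

def split_points(frames, diff_thresh, hist_thresh, bins=16):
    """Mark frame indices where the mean inter-frame difference or the
    histogram distance jumps, i.e. candidate decomposition points."""
    points = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1].astype(float), frames[i].astype(float)
        # Mean absolute pixel difference: proxy for motion/scene change
        inter_diff = np.mean(np.abs(cur - prev))
        # Normalised intensity histograms (each sums to 1)
        h_prev = np.histogram(prev, bins=bins, range=(0, 256))[0] / prev.size
        h_cur = np.histogram(cur, bins=bins, range=(0, 256))[0] / cur.size
        hist_diff = 0.5 * np.sum(np.abs(h_cur - h_prev))   # L1 distance in [0, 1]
        if inter_diff > diff_thresh or hist_diff > hist_thresh:
            points.append(i)
    return points
```

Real video would use per-channel color histograms rather than a single intensity histogram, and adaptive rather than fixed thresholds.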
In a specific embodiment, the process of executing step 103 may specifically include the following steps:
(1) Initializing a plurality of candidate transmission paths, and constructing a transmission model of the plurality of candidate transmission paths through a preset graph neural network to obtain a multipath transmission model;
(2) Carrying out transmission path matching on a plurality of second audio and video data streams through a multipath transmission model to obtain a target transmission path of each second audio and video data stream;
(3) Acquiring network condition parameters, and respectively calculating the transmission rate and retransmission strategy of each target transmission path according to the network condition parameters;
(4) Creating a plurality of initial transmission parameter sets of the second audio and video data streams according to the transmission rate and the retransmission strategy of each target transmission path;
(5) And distributing the second audio and video data streams to corresponding target transmission paths, and carrying out multipath data transmission on the second audio and video data streams according to the initial transmission parameter set.
Specifically, a plurality of candidate transmission paths are initialized. These paths are selected based on the physical topology of the network, historical performance data, or expected network load. For example, in a cross-regional digital television transmission system, the candidate paths may include network backbones through different regions, each path having its own delay, bandwidth, and packet-loss characteristics. A transmission model of the candidate transmission paths is constructed through a preset graph neural network. A graph neural network (GNN) is a machine learning model suited to processing graph-structured data, such as a network topology. A GNN can learn the characteristics of the nodes (e.g., routers, switches) and edges (i.e., connections) in the network and how they affect overall transmission performance. Through training, the GNN model can predict the performance of each path under specific network conditions, thereby constructing a multipath transmission model that reflects the real network state. Transmission path matching is then performed on the plurality of second audio and video data streams: the model selects the most suitable target transmission path for each data stream, taking into account the characteristics of each stream (e.g., size, real-time requirements) and the performance of each path (e.g., expected delay, bandwidth). Network condition parameters, such as the current congestion level and the real-time bandwidth and delay of each path, are acquired. Based on these parameters, a transmission rate and a retransmission policy are calculated for each target transmission path, with the network performance model taking into account factors such as packet size, transmission window, and congestion control algorithm.
Through this calculation, an initial transmission parameter set is created for each data stream, aimed at achieving optimal transmission efficiency and reliability under the current network conditions. The second audio and video data streams are distributed to their respective target transmission paths, and multipath data transmission is carried out according to the respective initial transmission parameter sets. Each data flow is transmitted on its selected path according to its particular parameters so as to accommodate possible network variations and congestion. For example, if a path suddenly becomes congested, the associated data stream can adapt to this change by dynamically adjusting its transmission parameters (e.g., reducing the sending rate or switching to a backup path), thereby ensuring the continuity and quality of the transmission.
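Path matching and initial parameter creation can be sketched as a simple scoring rule; the score weights and the 20% bandwidth headroom below are illustrative assumptions standing in for the trained GNN model described above.

```python
def match_path(stream, paths):
    """Score candidate paths for one stream and build an initial
    transmission parameter set. Weights are illustrative; a GNN would
    learn this scoring from topology and performance data."""
    def score(p):
        # Higher bandwidth relative to the stream's bitrate helps;
        # delay and loss penalise the path.
        return (p["bandwidth_mbps"] / stream["bitrate_mbps"]
                - 0.05 * p["delay_ms"] - 50 * p["loss_rate"])
    best = max(paths, key=score)
    return {
        "path": best["name"],
        # Leave ~20% headroom below path bandwidth, capped at the stream bitrate
        "send_rate_mbps": min(stream["bitrate_mbps"], 0.8 * best["bandwidth_mbps"]),
        "retransmit": "arq" if best["loss_rate"] > 0.01 else "none",
    }
```

Each second data stream would be matched independently, so two streams with different bitrates or latency needs may end up on different paths.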
In a specific embodiment, the process of executing step 104 may specifically include the following steps:
(1) Receiving a plurality of second audio/video data streams through a digital television;
(2) Performing data stream decoding on the plurality of second audio/video data streams to obtain a plurality of decoded audio/video signals;
(3) Respectively extracting depth signal characteristics of a plurality of decoded audio and video signals through a preset RNN model to obtain a depth signal characteristic set of each decoded audio and video signal;
(4) And carrying out signal fusion on the plurality of decoded audio and video signals according to the depth signal characteristic set to obtain a second audio and video signal.
Specifically, the digital television system obtains a plurality of second audio and video data streams through its receiving module. These data streams may arrive via different transmission paths, each containing a portion of the content of a television program; for example, one data stream may carry the main video signal, while another may contain different camera angles, additional audio tracks, or special effects. The received second audio and video data streams are decoded. Decoding is the process of restoring a compressed data stream to a playable audio and video signal, and requires the system to handle various encoding formats. After decoding, a plurality of decoded audio and video signals are obtained, each representing a different aspect of the original program. Depth signal features are extracted from the decoded audio and video signals through a preset recurrent neural network (RNN) model. The RNN model is suited to processing sequence data, such as a sequence of video frames or audio samples, and can take into account the time dependence in the data. A set of depth signal features is extracted from each decoded signal by the RNN model; these features reflect key properties of the signal, such as motion pattern, scene change, pitch, and cadence. Signal fusion is then performed on the plurality of decoded audio and video signals based on the extracted depth signal feature sets, with a deep learning model dynamically determining how best to combine the different signals, finally generating the fused second audio and video signal.
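The fusion step can be sketched as a softmax-weighted blend of the decoded signals, with per-signal quality scores standing in for the RNN depth-feature sets; this weighting scheme is an illustrative assumption, not the patent's actual fusion model.

```python
import numpy as np

def fuse_signals(signals, quality_scores):
    """Blend decoded signals with softmax weights over per-signal quality
    scores (stand-ins for the RNN depth-feature sets)."""
    scores = np.asarray(quality_scores, dtype=float)
    w = np.exp(scores - np.max(scores))   # numerically stable softmax
    w = w / w.sum()
    stacked = np.stack(signals)           # shape: (n_signals, n_samples)
    return np.tensordot(w, stacked, axes=1)
```

A signal with a much higher quality score dominates the blend, so a corrupted path contributes almost nothing, which mirrors the loss-compensation behavior described above.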
In a specific embodiment, the process of executing step 105 may specifically include the following steps:
(1) Performing parameter configuration on a player in the digital television according to the depth signal feature set, and playing a second audio/video signal through the player;
(2) Performing audio and video synchronization correction on the second audio and video signal, and performing state monitoring on the player to obtain initial state data;
(3) And performing data cleaning and data standardization processing on the initial state data to obtain play state data.
Specifically, the player in the digital television is configured according to the depth signal feature set. The depth signal feature set includes information such as the resolution, frame rate, and dynamic range of the video, and the sampling rate and channel count of the audio. These features provide the player with the information it needs to configure itself accordingly, ensuring that it can take full advantage of the signal characteristics to provide optimal playback quality. For example, if the feature set indicates that the video signal has high dynamic range (HDR) content, the player is configured in HDR mode to exhibit richer colors and deeper contrast. Likewise, if the audio signal contains multiple channels, the player is configured for multi-channel output to provide a more immersive listening experience. Audio and video synchronization correction is performed while the second audio and video signal is played. Audio-video desynchronization may be caused by delay differences in the encoding, transmission, or decoding process. The player monitors the time offset between audio and video through a built-in synchronization correction mechanism and dynamically adjusts playback to eliminate these offsets. For example, if the video signal runs ahead of the audio signal, the player may temporarily delay video playback, or speed up audio playback, to restore synchronization. Meanwhile, the state of the player is monitored, such as the buffer status, playback errors, and network connection status, yielding initial state data for the playback process. Data cleaning and data normalization are then applied to the initial state data.
Data cleansing includes removing extraneous or erroneous data records and identifying and correcting outliers, while data normalization converts the data into a more consistent format for ease of analysis and understanding. The result is standardized play state data that can be used to analyze playback performance and to identify and solve problems during playback. For example, an increase in playback delay may indicate network congestion, in which case actions such as reducing playback quality or switching to a better network connection can be taken to improve the playback experience.
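The cleaning and normalization of the initial state data can be sketched as missing-value removal, 3-sigma outlier filtering, and z-scoring; the record/field names and the 3-sigma rule are illustrative assumptions.

```python
import numpy as np

def clean_and_normalise(records, field):
    """Drop missing values and 3-sigma outliers, then z-score the rest.
    `records`/`field` are hypothetical names, not from the patent text."""
    vals = np.array([r[field] for r in records if r.get(field) is not None],
                    dtype=float)
    mu, sigma = vals.mean(), vals.std()
    # Keep only values within 3 standard deviations of the mean
    kept = vals if sigma == 0 else vals[np.abs(vals - mu) <= 3 * sigma]
    std = kept.std()
    # Z-score: zero mean, unit variance (constant data maps to zeros)
    return np.zeros_like(kept) if std == 0 else (kept - kept.mean()) / std
```

The same routine would be applied per metric (delay, buffer count, loss rate), giving the comparable standardized features that step 106 encodes into state vectors.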
In a specific embodiment, the process of executing step 106 may specifically include the following steps:
(1) Performing state feature coding on the play state data to obtain a state feature coding vector;
(2) Inputting the state feature coding vector into a preset decision tree model for strategy generation to obtain an initial parameter adjustment execution strategy of each target transmission path;
(3) Acquiring rewarding feedback parameters corresponding to each target transmission path, and carrying out strategy gradient analysis on each target transmission path to obtain strategy gradient of each target transmission path;
(4) Performing strategy parameter feedback analysis on the reward feedback parameters according to the strategy gradient to obtain strategy parameter feedback values of each target transmission path;
(5) Performing parameter adjustment execution policy optimization on the initial parameter adjustment execution policy according to the policy parameter feedback value, and generating a target parameter adjustment execution policy of each target transmission path;
(6) And carrying out parameter optimization on the initial transmission parameter set according to the target parameter adjustment execution strategy to generate a target transmission parameter set.
Specifically, the play state data is encoded into state features, converting the data into a form better suited to machine processing. The play state data includes playback delay, buffering count, packet loss rate, and so on, and these are encoded into a state feature encoding vector. For example, if frequent buffering occurs during video playback, this information is encoded into the vector, reflecting a degradation in playback quality. The state feature encoding vector is input into a preset decision tree model. The decision tree model is a machine learning method that can generate decision rules from input feature vectors, and thereby generate policies. The model produces an initial parameter adjustment execution strategy for each target transmission path from the state feature encoding vector. These strategies include preliminary adjustment suggestions for parameters such as the transmission rate and the retransmission strategy. For example, if the feature encoding vector indicates that the packet loss rate of a path is high, the decision tree model may recommend reducing the transmission rate on that path. Reward feedback parameters corresponding to each target transmission path are then acquired. These parameters reflect the effects of previous strategies, such as the success rate and efficiency of the transmission. Strategy gradient analysis is performed on each target transmission path based on this feedback; strategy gradient analysis is an optimization technique based on gradient descent that computes the best estimate of the direction and magnitude of each strategy adjustment. The result of the strategy gradient analysis is a strategy gradient for each target transmission path, indicating how to adjust the strategy to maximize the reward.
Strategy parameter feedback analysis is then performed on the reward feedback parameters according to the strategy gradients, yielding strategy parameter feedback values for each target transmission path; these values specify the adjustment amounts of the various parameters (such as the transmission rate and the number of retransmissions). The initial parameter adjustment execution strategy is optimized according to the strategy parameter feedback values to generate a target parameter adjustment execution strategy for each target transmission path. For example, if the reward feedback for a path indicates that data loss can be significantly reduced by increasing the number of retransmissions, the execution strategy for that path is adjusted accordingly. Finally, the initial transmission parameter set is comprehensively optimized according to the target parameter adjustment execution strategies to generate the target transmission parameter set.
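The feedback-driven parameter update can be sketched as a gradient-ascent step scaled by the reward; the learning rate and parameter names are illustrative assumptions, and a real system would re-estimate the gradients from fresh feedback each round.

```python
def update_parameters(params, gradient, reward, lr=0.1):
    """Gradient-ascent style update: move each transmission parameter in
    the direction its estimated gradient says improves the reward."""
    return {name: value + lr * reward * gradient.get(name, 0.0)
            for name, value in params.items()}
```

Parameters absent from the gradient estimate are left unchanged, so each path's strategy is adjusted only along the dimensions its feedback actually informs.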
The audio and video coding processing method for the digital television in the embodiment of the present application is described above. The following describes the audio and video coding processing device for the digital television in the embodiment of the present application. Referring to fig. 2, one embodiment of the audio and video coding processing device for the digital television in the embodiment of the present application includes:
the encoding module 201 is configured to perform denoising processing and signal enhancement on an original audio/video signal to obtain a first audio/video signal, and perform content extraction and adaptive hierarchical encoding on the first audio/video signal to obtain a first audio/video data stream;
The packaging module 202 is configured to perform feature decomposition and data packaging on the first audio/video data stream to obtain a plurality of second audio/video data streams;
The transmission module 203 is configured to create an initial transmission parameter set of the plurality of second audio/video data streams through a multi-path transmission model, and perform data transmission on the plurality of second audio/video data streams according to the initial transmission parameter set;
The decoding module 204 is configured to receive a plurality of second audio/video data streams through the digital television, and perform data stream decoding and depth signal fusion on the plurality of second audio/video data streams to obtain a second audio/video signal;
the correction module 205 is configured to play the second audio and video signal through the digital television, and perform audio and video synchronization correction and status monitoring on the second audio and video signal to obtain play status data;
The optimizing module 206 is configured to perform state feedback optimization on the initial transmission parameter set according to the play state data, and generate a target transmission parameter set.
Through the cooperation of the components, the method and the device have the advantages that the wavelet transformation algorithm is adopted to carry out denoising processing on the original audio and video signals, and the dynamic range compression and the local contrast enhancement are carried out on the denoised signals, so that the definition and the ornamental value of the audio and video signals are obviously improved. The wavelet transformation algorithm can effectively remove noise components in the signals, and the dynamic range compression and the local contrast enhancement further ensure that details and contrast of the signals are reserved and highlighted, so that the final audio and video content is more vivid and lifelike. The transmission process of the content is optimized through a multipath transmission model and self-adaptive hierarchical coding based on the characteristics of the content. The multipath transmission model allows the data stream to be transmitted through different network paths, so that the transmission interruption risk caused by single path faults is effectively reduced, and the reliability of data transmission is improved. The self-adaptive hierarchical coding dynamically adjusts the coding strategy according to different characteristics of the content, such as movement speed, color change and the like, so that high-quality audio and video content can be transmitted under the condition of limited bandwidth. By carrying out depth signal fusion and audio-video synchronization correction on the received audio-video data stream, the playing stability and the playing synchronism are obviously improved. The depth signal fusion utilizes advanced algorithm to combine signals on different transmission paths, compensates possible data loss, and ensures signal integrity. 
Audio/video synchronization correction keeps the audio and video playback times strictly aligned, avoiding audio-visual mismatch during viewing. Real-time monitoring of the play state, combined with dynamic feedback optimization of the playback and transmission parameters, further improves the overall viewing experience: state monitoring promptly detects problems in the playback process, such as buffering delay or frame freezing, and the dynamic feedback mechanism adjusts parameters such as the transmission rate and coding quality accordingly, optimizing the playback effect in real time so that the user enjoys smooth, high-quality audio/video content.
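The dynamic feedback mechanism can be illustrated with a minimal rate controller; the thresholds, back-off factors, and metric names below are assumptions for illustration, not values from the patent:

```python
def adjust_bitrate(current_kbps, buffer_ms, stall_events,
                   min_kbps=500, max_kbps=8000):
    """Adjust the transmission bitrate from monitored play-state data.

    Hypothetical policy: a shrinking buffer or any stall lowers the
    rate aggressively; a comfortably full buffer allows a gradual
    increase; otherwise the rate is held steady.
    """
    if stall_events > 0 or buffer_ms < 1000:
        new_kbps = current_kbps * 0.7          # back off aggressively
    elif buffer_ms > 5000:
        new_kbps = current_kbps * 1.1          # probe for more quality
    else:
        new_kbps = current_kbps                # hold steady
    return max(min_kbps, min(max_kbps, int(new_kbps)))

print(adjust_bitrate(4000, buffer_ms=600, stall_events=1))   # → 2800
print(adjust_bitrate(4000, buffer_ms=6000, stall_events=0))  # → 4400
```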
The present application further provides a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the audio/video coding processing method of the digital television in the above embodiments.
The present application further provides a computer-readable storage medium, which may be non-volatile or volatile. The storage medium stores instructions which, when run on a computer, cause the computer to perform the steps of the audio/video coding processing method of the digital television.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, systems and units may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are provided only to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features replaced by equivalents, without departing from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. An audio/video coding processing method for a digital television, characterized by comprising the following steps:
denoising and enhancing an original audio/video signal to obtain a first audio/video signal, and performing content extraction and adaptive layered coding on the first audio/video signal to obtain a first audio/video data stream;
performing feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams;
creating an initial transmission parameter set of the plurality of second audio/video data streams through a multipath transmission model, and transmitting the plurality of second audio/video data streams according to the initial transmission parameter set;
receiving the plurality of second audio/video data streams through a digital television, and performing data stream decoding and depth signal fusion on the plurality of second audio/video data streams to obtain a second audio/video signal;
playing the second audio/video signal through the digital television, and performing audio/video synchronization correction and state monitoring on the second audio/video signal to obtain play state data;
and performing state feedback optimization on the initial transmission parameter set according to the play state data to generate a target transmission parameter set.
2. The audio/video coding processing method of claim 1, wherein the denoising and signal enhancement are performed on the original audio/video signal to obtain a first audio/video signal, and the content extraction and adaptive layered coding are performed on the first audio/video signal to obtain a first audio/video data stream, comprising:
denoising the original audio/video signal by a wavelet transformation algorithm to obtain a denoised video signal;
performing dynamic range compression on the denoised video signal to obtain a compressed audio/video signal;
performing local contrast enhancement and formatting on the compressed audio/video signal to obtain the first audio/video signal;
performing content extraction on the first audio/video signal through a CNN model to obtain audio/video signal content information;
performing content characteristic analysis on the audio/video signal content information to obtain a content characteristic set;
performing adaptive layering on the first audio/video signal according to the content characteristic set to obtain a plurality of target coding layers;
setting a coding strategy for the plurality of target coding layers to obtain a target coding strategy of each target coding layer;
and performing signal coding on the first audio/video signal according to the target coding strategies to obtain the first audio/video data stream.
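The claim leaves the dynamic range compressor unspecified; one common choice, shown here purely as an illustrative assumption, is mu-law style companding of a signal normalized to [-1, 1], which attenuates loud passages more than quiet ones:

```python
import numpy as np

def compress_dynamic_range(signal, mu=255.0):
    """Mu-law style dynamic range compression.

    Input is expected in [-1, 1]; mu controls compression strength.
    Quiet values are boosted relative to loud ones, narrowing the
    overall dynamic range while preserving sign and the endpoints.
    """
    x = np.clip(np.asarray(signal, dtype=float), -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

x = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
y = compress_dynamic_range(x)   # endpoints map to ±1, small values grow
```

A production encoder would more likely use a windowed compressor with attack/release times; the static mu-law curve keeps the sketch short.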
3. The audio/video coding processing method of claim 2, wherein the performing feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams includes:
performing inter-frame difference calculation on video data in the first audio/video data stream to obtain an inter-frame difference value;
performing color histogram difference calculation on the video data in the first audio/video data stream to obtain a color histogram difference value;
determining a plurality of first feature decomposition points of the first audio/video data stream according to the inter-frame difference value and the color histogram difference value;
performing audio spectrum feature decomposition on audio data in the first audio/video data stream to generate a plurality of second feature decomposition points;
performing comprehensive feature demarcation point analysis on the plurality of first feature decomposition points and the plurality of second feature decomposition points to obtain a plurality of target feature decomposition points;
decomposing the first audio/video data stream according to the target feature decomposition points to obtain a plurality of decomposed data streams;
and encapsulating the plurality of decomposed data streams to obtain the plurality of second audio/video data streams.
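A minimal sketch of the inter-frame difference and color histogram difference calculations named in the claim above (grayscale frames, 16 histogram bins, and both thresholds are illustrative assumptions; the patent does not fix them):

```python
import numpy as np

def interframe_difference(prev, curr):
    """Mean absolute pixel difference between consecutive frames."""
    return float(np.mean(np.abs(curr.astype(float) - prev.astype(float))))

def histogram_difference(prev, curr, bins=16):
    """L1 distance between normalized intensity histograms of two frames."""
    h1, _ = np.histogram(prev, bins=bins, range=(0, 256))
    h2, _ = np.histogram(curr, bins=bins, range=(0, 256))
    return float(np.abs(h1 / h1.sum() - h2 / h2.sum()).sum())

def is_decomposition_point(prev, curr, diff_thresh=30.0, hist_thresh=0.5):
    """Flag a candidate feature decomposition point (e.g. a scene cut)
    when both difference measures exceed their thresholds."""
    return (interframe_difference(prev, curr) > diff_thresh and
            histogram_difference(prev, curr) > hist_thresh)

dark = np.full((8, 8), 10, dtype=np.uint8)     # toy 8x8 grayscale frames
bright = np.full((8, 8), 200, dtype=np.uint8)
```

Requiring both measures to fire makes the detector robust: fast motion alone raises the pixel difference but barely moves the histogram, so it is not mistaken for a cut.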
4. The audio/video coding processing method of claim 1, wherein creating an initial transmission parameter set of the plurality of second audio/video data streams by a multi-path transmission model, and performing data transmission on the plurality of second audio/video data streams according to the initial transmission parameter set, comprises:
initializing a plurality of candidate transmission paths, and constructing a transmission model of the plurality of candidate transmission paths through a preset graph neural network to obtain the multipath transmission model;
performing transmission path matching on the plurality of second audio/video data streams through the multipath transmission model to obtain a target transmission path of each second audio/video data stream;
acquiring network condition parameters, and respectively calculating a transmission rate and a retransmission strategy of each target transmission path according to the network condition parameters;
creating the initial transmission parameter set of the plurality of second audio/video data streams according to the transmission rate and the retransmission strategy of each target transmission path;
and distributing the plurality of second audio/video data streams to the corresponding target transmission paths, and performing multipath data transmission on the plurality of second audio/video data streams according to the initial transmission parameter set.
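The claim does not fix the matching or strategy-selection logic (the patent uses a graph neural network for the path model); as a simplified, assumed sketch, a greedy bandwidth-based path matcher and a threshold-based retransmission chooser might look like:

```python
def match_streams_to_paths(streams, paths):
    """Greedy transmission-path matching: assign each stream (largest
    bandwidth need first) to the path with the most remaining capacity.

    `streams` maps stream id -> required kbps; `paths` maps path id ->
    available kbps. Returns {stream id: path id}.
    """
    remaining = dict(paths)
    assignment = {}
    for sid, need in sorted(streams.items(), key=lambda kv: -kv[1]):
        best = max(remaining, key=remaining.get)   # most headroom
        assignment[sid] = best
        remaining[best] -= need
    return assignment

def retransmission_strategy(loss_rate, rtt_ms):
    """Pick a retransmission strategy from network-condition parameters
    (the thresholds are illustrative assumptions)."""
    if loss_rate > 0.05:
        return "fec"       # heavy loss: forward error correction
    if rtt_ms < 100:
        return "arq"       # low latency: simple retransmit requests
    return "hybrid"

streams = {"base": 2000, "enh1": 1500, "enh2": 800}
paths = {"wifi": 6000, "eth": 5000}
print(match_streams_to_paths(streams, paths))  # base and enh2 on wifi, enh1 on eth
```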
5. The audio/video coding processing method of claim 1, wherein the receiving the plurality of second audio/video data streams by the digital television, and performing data stream decoding and depth signal fusion on the plurality of second audio/video data streams, to obtain a second audio/video signal, includes:
receiving the plurality of second audio/video data streams through the digital television;
decoding the plurality of second audio/video data streams to obtain a plurality of decoded audio/video signals;
respectively extracting depth signal features from the plurality of decoded audio/video signals through a preset RNN model to obtain a depth signal feature set of each decoded audio/video signal;
and performing signal fusion on the plurality of decoded audio/video signals according to the depth signal feature sets to obtain the second audio/video signal.
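One plausible reading of the depth-signal-fusion step, sketched with confidence-weighted averaging standing in for the unspecified RNN-derived features (the per-path weights and the NaN convention for lost samples are assumptions):

```python
import numpy as np

def fuse_signals(decoded, weights):
    """Fuse decoded signals from several transmission paths into one
    signal by confidence-weighted averaging, skipping samples a path
    lost (encoded here as NaN)."""
    stacked = np.stack([np.asarray(d, dtype=float) for d in decoded])
    w = np.asarray(weights, dtype=float)[:, None] * np.ones_like(stacked)
    w[np.isnan(stacked)] = 0.0        # lost samples carry no weight
    stacked = np.nan_to_num(stacked)
    return (w * stacked).sum(axis=0) / w.sum(axis=0)

path_a = [1.0, 2.0, np.nan, 4.0]   # path A dropped its third sample
path_b = [1.2, 2.2, 3.0, 4.2]
fused = fuse_signals([path_a, path_b], weights=[0.75, 0.25])
```

Where one path loses a sample, its weight drops to zero and the fused output falls back to the surviving path, which is exactly the loss-compensation behavior the description attributes to depth signal fusion.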
6. The audio/video coding processing method of claim 5, wherein playing the second audio/video signal by the digital television, and performing audio/video synchronization correction and status monitoring on the second audio/video signal to obtain playing status data, comprises:
performing parameter configuration on a player in the digital television according to the depth signal feature set, and playing the second audio and video signals through the player;
performing audio and video synchronization correction on the second audio and video signals, and performing state monitoring on the player to obtain initial state data;
and performing data cleaning and data standardization processing on the initial state data to obtain play state data.
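The data cleaning and standardization step of the claim above can be sketched as dropping incomplete play-state samples and z-scoring each metric so downstream feedback optimization sees comparable scales (the field names are illustrative, not from the patent):

```python
import numpy as np

def clean_and_standardize(samples):
    """Clean raw play-state samples (drop rows with missing readings)
    and z-score standardize each metric column."""
    rows = [s for s in samples if None not in s.values()]
    keys = sorted(rows[0])
    mat = np.array([[r[k] for k in keys] for r in rows], dtype=float)
    mean = mat.mean(axis=0)
    std = mat.std(axis=0)
    std[std == 0] = 1.0               # constant metrics map to zero
    return keys, (mat - mean) / std

raw = [
    {"buffer_ms": 4000, "bitrate": 3000, "stalls": 0},
    {"buffer_ms": None, "bitrate": 2800, "stalls": 1},   # dropped reading
    {"buffer_ms": 2000, "bitrate": 2500, "stalls": 2},
]
keys, z = clean_and_standardize(raw)   # 2 clean rows, zero-mean columns
```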
7. The audio/video coding processing method of claim 1, wherein the performing state feedback optimization on the initial transmission parameter set according to the play state data to generate a target transmission parameter set includes:
performing state feature coding on the play state data to obtain a state feature coding vector;
inputting the state feature coding vector into a preset decision tree model for strategy generation to obtain an initial parameter adjustment execution strategy of each target transmission path;
acquiring reward feedback parameters corresponding to each target transmission path, and performing strategy gradient analysis on each target transmission path to obtain a strategy gradient of each target transmission path;
performing strategy parameter feedback analysis on the reward feedback parameters according to the strategy gradients to obtain a strategy parameter feedback value of each target transmission path;
optimizing the initial parameter adjustment execution strategy according to the strategy parameter feedback values to generate a target parameter adjustment execution strategy of each target transmission path;
and optimizing the initial transmission parameter set according to the target parameter adjustment execution strategies to generate the target transmission parameter set.
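The strategy-gradient analysis in the claim is left abstract; a textbook REINFORCE-with-baseline update for a two-action rate-adjustment policy (softmax over linear scores; all dimensions, rewards, and the learning rate are illustrative) gives the flavor:

```python
import numpy as np

def policy_gradient_step(theta, state, action, reward, baseline, lr=0.1):
    """One REINFORCE-style update for a two-action policy
    (0 = lower rate, 1 = raise rate) with a softmax over linear scores.

    Reward feedback above the baseline reinforces the taken action;
    below it, the action is discouraged.
    """
    logits = theta @ state                 # per-action scores, shape (2,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # grad of log pi(action | state) w.r.t. theta: (1[k==a] - pi_k) * state
    grad = -np.outer(probs, state)
    grad[action] += state
    return theta + lr * (reward - baseline) * grad

theta = np.zeros((2, 3))
state = np.array([1.0, 0.5, -0.2])         # encoded play-state features
# Raising the rate (action 1) earned reward 2.0 against baseline 1.0,
# so the updated policy should favor action 1 in this state.
theta = policy_gradient_step(theta, state, action=1, reward=2.0, baseline=1.0)
```

The patent pairs this with a decision tree for initial strategy generation; the gradient step above corresponds only to the feedback-refinement stage.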
8. An audio/video coding processing device for a digital television, characterized in that the device comprises:
a coding module, configured to perform denoising and signal enhancement on an original audio/video signal to obtain a first audio/video signal, and to perform content extraction and adaptive layered coding on the first audio/video signal to obtain a first audio/video data stream;
an encapsulation module, configured to perform feature decomposition and data encapsulation on the first audio/video data stream to obtain a plurality of second audio/video data streams;
a transmission module, configured to create an initial transmission parameter set of the plurality of second audio/video data streams through a multipath transmission model, and to transmit the plurality of second audio/video data streams according to the initial transmission parameter set;
a decoding module, configured to receive the plurality of second audio/video data streams through the digital television, and to perform data stream decoding and depth signal fusion on the plurality of second audio/video data streams to obtain a second audio/video signal;
a correction module, configured to play the second audio/video signal through the digital television, and to perform audio/video synchronization correction and state monitoring on the second audio/video signal to obtain play state data;
and an optimization module, configured to perform state feedback optimization on the initial transmission parameter set according to the play state data to generate a target transmission parameter set.
9. A computer device, comprising: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the computer device to perform the audio/video coding processing method of a digital television according to any one of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the audio/video coding processing method of a digital television according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410449268.7A CN118055243B (en) | 2024-04-15 | 2024-04-15 | Audio and video coding processing method, device and equipment for digital television |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118055243A CN118055243A (en) | 2024-05-17 |
CN118055243B true CN118055243B (en) | 2024-06-11 |
Family
ID=91046924
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410449268.7A Active CN118055243B (en) | 2024-04-15 | 2024-04-15 | Audio and video coding processing method, device and equipment for digital television |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118055243B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118573959B (en) * | 2024-05-28 | 2025-06-17 | 重庆平可杰信息技术有限公司 | A method and system for collecting audio and video data based on 5G terminal equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103220543A (en) * | 2013-04-25 | 2013-07-24 | 同济大学 | Real-time 3D video communication system and its realization method based on Kinect |
CN115757037A (en) * | 2022-11-23 | 2023-03-07 | 中铁工程服务有限公司 | Application management system based on data shelter |
CN116489473A (en) * | 2023-05-05 | 2023-07-25 | 深圳市云屋科技有限公司 | Dynamic compensation system and method for audio and video transmission optimization |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013086027A1 (en) * | 2011-12-06 | 2013-06-13 | Doug Carson & Associates, Inc. | Audio-video frame synchronization in a multimedia stream |
CN102821323B (en) * | 2012-08-01 | 2014-12-17 | 成都理想境界科技有限公司 | Video playing method, video playing system and mobile terminal based on augmented reality technique |
- 2024-04-15: application CN202410449268.7A granted as patent CN118055243B (status: active)
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |