Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be appreciated that in embodiments of the present disclosure, the character "/" generally indicates that the associated objects are in an "or" relationship. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
Live video broadcast has gradually become a mainstream form of expression on the internet. Live video is affected by factors such as network quality and live broadcast equipment quality, so stuttering, playback pauses, and similar situations can occur. In order to ensure smooth transmission of live video, a live video receiving end may increase the live video buffer (i.e., pre-buffer some video frame sequences) to cope with stuttering, playback pauses, and the like, where a live video is composed of a plurality of video frame sequences. When the buffer at the receiving end is increased, the delay at the receiving end accumulates, which is called accumulated delay.
Illustratively, in a live video scenario, if the data transmission protocol is based on the transmission control protocol (TCP), for example, the real-time messaging protocol (RTMP) or HTTP-FLV, where HTTP-FLV means that streaming media data is encapsulated into the FLV (flash video) format and then transmitted over the hypertext transfer protocol (HTTP), then, due to the reliable-transmission characteristics of TCP, the delay at the viewing end continuously accumulates when stuttering, playback pauses, and similar situations occur in the live video. This is called accumulated delay.
For example, when a live video is playing in the foreground and is then switched to the background, the player is paused but audio and video data continue to be cached; when the live video returns to the foreground, playback resumes from the video stream data at the moment of exiting, while the actual live broadcast has already moved past that point in time. This foreground-background switching accumulates a delay. As another example, in a poor network or under network jitter, the live broadcast will stall if there is no buffering policy. To solve the stalling problem, the playing end adds a buffering strategy, and once buffering occurs, the buffering introduces a delay from the streaming end (i.e., the sending end) to the playing end (i.e., the receiving end). Such delays accumulate when stalling occurs multiple times.
Current methods of reducing the accumulated delay include directly discarding all of the video data buffered at the receiving end, or discarding one or two frames of the video data buffered at the receiving end.
However, directly discarding all of the video data buffered at the receiving end impairs video playback continuity, while discarding only one or two frames of the buffered video data is inefficient.
Illustratively, when all video data cached at the receiving end is discarded directly, the impact on video continuity is large, the user clearly perceives it, and the negative impact on the playback experience is large. In the manner of discarding one or two frames of the video data buffered at the receiving end, the amount dropped at a time is limited, the accumulated delay is eliminated slowly, and the efficiency of eliminating the accumulated delay is low.
Against this background, the present disclosure provides a video data processing method, which can reduce the influence of frame dropping on video continuity while improving frame dropping efficiency and the user experience.
The video data processing method provided by the embodiment of the disclosure can be applied to a video data processing system, and the video data processing system can comprise a sending end, a server end and a receiving end, wherein the sending end is connected with the server end, and the server end is connected with the receiving end.
The execution subject of the video data processing method may be, for example, a receiving end of the video data processing system, where the receiving end may be a device with data processing capability, such as a mobile phone, a computer, etc. The subject of execution of the method is not limited herein.
In some embodiments, the server may be a single server, or may be a server cluster formed by a plurality of servers. In some implementations, the server cluster may also be a distributed cluster. The present disclosure is not limited to a specific implementation of the server.
Fig. 1 is a flowchart illustrating a video data processing method according to an embodiment of the disclosure. As shown in fig. 1, the method may include S101-S103.
S101, obtaining video data cached in a cache area.
For example, an auxiliary information frame may be inserted between each group of two adjacent video frames in the first video data to be transmitted, either at the transmitting end of the video data or at a relay server (e.g., a source station) of the video data, the auxiliary information frame including picture change information of the two adjacent video frames. The video data may include at least two video frames and an auxiliary information frame between each group of two adjacent video frames. The receiving end of the video data may receive the video data transmitted from the transmitting end. After receiving the video data sent by the sending end, the receiving end caches the video data in the cache area and plays it.
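As an informal illustration only, the buffered video data at the receiving end can be pictured as video frames interleaved with auxiliary information frames. The record layout and field names below are assumptions for the sketch, not a format prescribed by the disclosure:

```python
# Illustrative layout of the receiving-end buffer (field names are assumptions).
# Each auxiliary information frame carries the picture change information of the
# two video frames that surround it.
buffered_video_data = [
    {"frame_type": "video", "index": 1, "payload": b"..."},
    {"frame_type": "aux",   "between": (1, 2), "picture_change": 0.8},
    {"frame_type": "video", "index": 2, "payload": b"..."},
    {"frame_type": "aux",   "between": (2, 3), "picture_change": 1.5},
    {"frame_type": "video", "index": 3, "payload": b"..."},
]
```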
S102, determining a video frame from the 1st video frame to the Kth video frame as a target video frame according to the target auxiliary information frame and the Nth video frame.
For example, one or more video frames whose picture difference with the Nth video frame meets a preset condition may be determined from the 1st video frame to the Kth video frame, and then one video frame is selected from the video frames meeting the preset condition as the target video frame, where K is an integer greater than 1 and N is an integer greater than K. After the video data buffered in the buffer area is obtained, a target video frame whose picture difference with the Nth video frame meets the preset condition can be determined from the 1st video frame to the Kth video frame according to all the auxiliary information frames between the 1st video frame and the Kth video frame in the video data, namely the target auxiliary information frame.
For example, when K is 5 and N is 6, after the video data buffered in the buffer is obtained, a video frame meeting the preset condition is determined from the 1st video frame to the 5th video frame according to all the auxiliary information frames between the 1st video frame and the 5th video frame in the video data, and then one video frame is selected from the video frames meeting the preset condition as the target video frame. That is, when it is determined, according to all the auxiliary information frames between the 1st video frame and the 5th video frame in the video data, that the picture difference value between the 3rd video frame and the 6th video frame meets the preset condition, the 3rd video frame may be taken as the target video frame.
S103, discarding video frames from the target video frame to the Nth video frame.
For example, after the target video frame is determined, the video frames between the target video frame and the Nth video frame may be discarded.
For example, when K is 5 and N is 6, if it is determined that the 3rd video frame in the video data is the target video frame, the 4th video frame and the 5th video frame may be discarded.
In this method, the video data cached in the cache area is obtained, a target video frame is determined from the 1st video frame to the Kth video frame according to the target auxiliary information frames between the 1st video frame and the Kth video frame in the video data and the Nth video frame, and finally the video frames from the target video frame to the Nth video frame are discarded. This can reduce the influence of frame dropping on video continuity, improve frame dropping efficiency, and improve the user experience.
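A minimal sketch of the discarding step in S103, under the assumption that the buffered video frames are identified by 1-based frame numbers and that the target video frame has already been selected; the helper name drop_between is hypothetical, and the auxiliary information frames lying between the dropped video frames would presumably be discarded along with them:

```python
def drop_between(frames, target_idx, n_idx):
    """frames: 1-based numbers of the video frames currently buffered.
    Discards the frames strictly between the target frame and the Nth frame,
    matching the worked example where target=3 and N=6 drop frames 4 and 5."""
    return [i for i in frames if i <= target_idx or i >= n_idx]

# Usage matching the example above (K=5, N=6, target frame 3):
print(drop_between([1, 2, 3, 4, 5, 6], 3, 6))  # -> [1, 2, 3, 6]
```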
In some embodiments, before determining a video frame from the 1st video frame to the Kth video frame as the target video frame according to the target auxiliary information frame and the Nth video frame, the method may further include: determining that the data occupation length of the video data is greater than a preset threshold.
The data occupation length of the video data may be determined by the duration of the video data, or by the space occupied by the video data, for example. When the data occupation length of the video data is determined by the duration of the video data, if the preset threshold is 10 seconds, when the duration of the video data exceeds 10 seconds, the data occupation length of the video data is considered to be greater than the preset threshold. When the data occupation length of the video data is determined by the space occupied by the video data, if the preset threshold is 5 Gigabytes (GB), the data occupation length of the video data is considered to be greater than the preset threshold when the space occupied by the video data exceeds 5 GB.
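As one possible reading of the duration-based check, the following sketch assumes that per-frame presentation timestamps in milliseconds are available for the buffered video frames; the function name and threshold value mirror the 10-second example and are illustrative only:

```python
def exceeds_threshold_by_duration(frame_timestamps_ms, threshold_ms=10_000):
    """Returns True when the buffered duration (last minus first timestamp)
    exceeds the preset threshold (10 seconds in the example above)."""
    if len(frame_timestamps_ms) < 2:
        return False
    return (frame_timestamps_ms[-1] - frame_timestamps_ms[0]) > threshold_ms
```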
In this embodiment, it is determined that the data occupation length of the video data is greater than the preset threshold, so that frame dropping is performed on the video data only when its data occupation length exceeds the preset threshold. This improves the accuracy of frame dropping for the video data and further improves the user experience.
In some embodiments, the data occupation length of the 1st video frame to the Kth video frame and the target auxiliary information frame is smaller than or equal to the preset threshold, and the data occupation length of the 1st video frame to the Nth video frame and the auxiliary information frames between the 1st video frame and the Nth video frame is larger than the preset threshold.
For example, when K is 5 and N is 6, the data occupation length of the 1st video frame to the 5th video frame and the target auxiliary information frames between the 1st video frame and the 5th video frame in the video data is smaller than or equal to the preset threshold, and the data occupation length of the 1st video frame to the 6th video frame and the auxiliary information frames between the 1st video frame and the 6th video frame is larger than the preset threshold.
In this embodiment, by defining that the data occupation length of the 1st video frame to the Kth video frame and the target auxiliary information frame is smaller than or equal to the preset threshold, and that the data occupation length of the 1st video frame to the Nth video frame and the auxiliary information frames between them is larger than the preset threshold, it can be ensured that the discarded video frames lie within the cached video data, which improves the accuracy of frame dropping.
In some embodiments, the data occupation length of the 1st video frame to the (K+1)th video frame and the auxiliary information frames between the 1st video frame and the (K+1)th video frame is greater than the preset threshold.
For example, when K is 5, it may be determined that the data occupation length of the 1st video frame to the 6th video frame and the auxiliary information frames between the 1st video frame and the 6th video frame is greater than the preset threshold, while the data occupation length of the 1st video frame to the 5th video frame and the auxiliary information frames between the 1st video frame and the 5th video frame is smaller than or equal to the preset threshold. At this time, K can be understood as a critical value.
In this embodiment, the data occupation length of the 1st video frame to the (K+1)th video frame and the auxiliary information frames between the 1st video frame and the (K+1)th video frame is greater than the preset threshold. This ensures that the discardable video frames are selected from the video frames of video data whose data occupation length is larger than the preset threshold, further improving frame dropping efficiency.
In some embodiments, the data occupation length of the 1st video frame to the (N-1)th video frame and the auxiliary information frames between the 1st video frame and the (N-1)th video frame is less than or equal to the preset threshold.
Based on the above embodiment, for example, when N is 6 and N-1 is 5, the data occupation length of the 1st video frame to the 5th video frame and the auxiliary information frames between the 1st video frame and the 5th video frame is smaller than or equal to the preset threshold, while the data occupation length of the 1st video frame to the 6th video frame and the auxiliary information frames between the 1st video frame and the 6th video frame is larger than the preset threshold, so it may be determined that the 6th video frame is the first video frame beyond the cached video data, i.e., the first video frame whose inclusion makes the data occupation length exceed the preset threshold.
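One possible way to locate K and N from the cumulative data occupation length is sketched below, under the assumption that the cumulative length of frames 1..i plus the auxiliary information frames between them has been precomputed for each i; the function name is hypothetical:

```python
def find_k_and_n(cumulative_lengths, threshold):
    """cumulative_lengths[i-1] is the assumed precomputed data occupation length
    of video frames 1..i plus the auxiliary information frames between them.
    K is the last frame count whose cumulative length is still <= threshold;
    N = K + 1 is the first frame that pushes the length over the threshold.
    (The method is only invoked when the total length already exceeds threshold.)"""
    k = 0
    for i, length in enumerate(cumulative_lengths, start=1):
        if length <= threshold:
            k = i
        else:
            break
    return k, k + 1

# Example: with threshold 100, lengths 20, 45, 60, 80, 95, 130 give K=5, N=6.
print(find_k_and_n([20, 45, 60, 80, 95, 130], 100))  # -> (5, 6)
```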
In this embodiment, by limiting the data occupation length of the 1st video frame to the (N-1)th video frame and the auxiliary information frames between the 1st video frame and the (N-1)th video frame to be smaller than or equal to the preset threshold, the Nth video frame can be determined to be the first video frame beyond the cached video data, which makes it more convenient to determine the discardable video frames.
In some embodiments, the preset conditions may include: the picture difference value between the target video frame and the Nth video frame is the smallest.
For example, a picture difference value between each video frame in the video data buffered in the buffer area and the Nth video frame may be calculated, and then the video frame having the smallest picture difference value with the Nth video frame may be determined as the target video frame, where the target video frame is a discardable video frame.
In this embodiment, the preset condition is defined such that the picture difference value between the target video frame and the Nth video frame is the smallest. The video frame with the smallest picture difference value with the Nth video frame can thus be determined as the target video frame, and the frames between it and the Nth video frame discarded, which further reduces the influence of frame dropping on video continuity and improves playback fluency after frame dropping.
Fig. 2 is a schematic flowchart of an implementation of S102 in fig. 1 according to an embodiment of the disclosure. As shown in FIG. 2, S102 in FIG. 1 may include S201-S204.
S201, according to the auxiliary information frame between each group of two adjacent video frames between the 1st video frame and the Nth video frame, determining the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame.
For example, the auxiliary information frame inserted between each group of two adjacent video frames includes the picture change information of the two video frames. A picture change calculation function may be denoted by f. Where p1 is one picture image, p2 is another picture image, and p3 is a third picture image, for any three such picture images f needs to satisfy: f(p1, p2) = f(p1, p3) + f(p3, p2).
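The matrix difference used later in S301 (f(p1, p2) = M1 − M2) satisfies this additivity by construction. The NumPy check below is a sketch for illustration only, with small made-up matrices standing in for grayscale frames:

```python
import numpy as np

# Three tiny grayscale "frames" represented as p x q matrices.
p1 = np.array([[10, 20], [30, 40]], dtype=np.int32)
p2 = np.array([[12, 18], [33, 41]], dtype=np.int32)
p3 = np.array([[11, 19], [31, 40]], dtype=np.int32)

def f(a, b):
    # Picture change of two frames expressed as an element-wise matrix difference.
    return a - b

# Additivity required of the picture change function: f(p1, p2) = f(p1, p3) + f(p3, p2).
assert np.array_equal(f(p1, p2), f(p1, p3) + f(p3, p2))
```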
S202, determining the picture difference value between the 1st video frame and the Nth video frame according to the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame.
For example, the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame may be determined according to the auxiliary information frame between each group of two adjacent video frames between the 1st video frame and the Nth video frame. Then, the sum of the picture difference values between each group of two adjacent video frames between the 1st video frame and the Nth video frame is calculated, so as to obtain the picture difference value between the 1st video frame and the Nth video frame.
For example, the picture difference between two video frames may be represented by diff, and the picture difference value between the two video frames by diff_score. When N is 6, the picture difference value between the 1st video frame and the Nth video frame is the sum of the picture difference values between each group of two adjacent video frames between the 1st video frame and the 6th video frame, and the picture difference value between the 1st video frame and the 6th video frame can be expressed as: diff_score_16 = diff_score_12 + diff_score_23 + diff_score_34 + diff_score_45 + diff_score_56. Likewise, the picture difference value between the 2nd video frame and the 6th video frame can be expressed as: diff_score_26 = diff_score_23 + diff_score_34 + diff_score_45 + diff_score_56; the picture difference value between the 3rd video frame and the 6th video frame as: diff_score_36 = diff_score_34 + diff_score_45 + diff_score_56; the picture difference value between the 4th video frame and the 6th video frame as: diff_score_46 = diff_score_45 + diff_score_56; and the picture difference value between the 5th video frame and the 6th video frame as: diff_score_56.
S203, determining the picture difference value between the Kth video frame and the Nth video frame according to the picture difference value between each group of two adjacent video frames between the Kth video frame and the Nth video frame.
For example, the specific method for determining the picture difference value between the Kth video frame and the Nth video frame may refer to the specific method for determining the picture difference value between the 1st video frame and the Nth video frame, which is not repeated here.
S204, determining, among the 1st video frame to the Kth video frame, the video frame with the smallest picture difference value with the Nth video frame as the target video frame.
As can be seen from the above embodiments, K is an integer greater than 1 and N is an integer greater than K. After the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame is determined, the picture difference value between each video frame from the 1st video frame to the Kth video frame and the Nth video frame can be obtained. Then, among the 1st video frame to the Kth video frame, the video frame with the smallest picture difference value with the Nth video frame is determined as the target video frame.
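Putting S202-S204 together, the following is a hedged sketch that assumes the adjacent picture difference values have already been read from the auxiliary information frames; the function name select_target_frame is hypothetical. The picture difference of frame i to frame N is the suffix sum of the adjacent differences, and the target is the frame among 1..K with the smallest such sum:

```python
def select_target_frame(adjacent_diffs, k):
    """adjacent_diffs[i] is the picture difference value between video frame
    i+1 and frame i+2 (taken from the auxiliary information frames), covering
    frames 1..N. Returns the 1-based number of the target video frame among
    frames 1..k, i.e. the one with the smallest difference to frame N."""
    n = len(adjacent_diffs) + 1          # total number of video frames (frame N)
    diff_to_n = [0.0] * n                # diff_to_n[i]: difference of frame i+1 to frame N
    for i in range(n - 2, -1, -1):       # suffix sums of the adjacent differences
        diff_to_n[i] = diff_to_n[i + 1] + adjacent_diffs[i]
    candidates = diff_to_n[:k]           # only frames 1..K are eligible
    return candidates.index(min(candidates)) + 1

# Example with N=6, K=5: adjacent differences between frames (1,2) ... (5,6).
print(select_target_frame([3.0, 2.5, 0.4, 1.2, 0.9], k=5))  # prints 5 for these values
```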
In this embodiment, the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame is determined according to the auxiliary information frame between each group of two adjacent video frames; the picture difference value between the 1st video frame and the Nth video frame is determined from the picture difference values between each group of two adjacent video frames between the 1st video frame and the Nth video frame; and the picture difference value between the Kth video frame and the Nth video frame is determined from the picture difference values between each group of two adjacent video frames between the Kth video frame and the Nth video frame. In this way, the video frame with the smallest picture difference value with the Nth video frame can be determined from the 1st video frame to the Kth video frame and taken as the target video frame, which further reduces the influence of the frames discarded between the target video frame and the Nth video frame on video continuity and improves playback fluency after frame dropping.
Fig. 3 is a schematic flowchart of an implementation of S201 in fig. 2 provided in an embodiment of the disclosure. As shown in fig. 3, S201 in fig. 2 may include S301-S304.
S301, determining a first matrix and a second matrix corresponding to each group of two adjacent video frames between the 1st video frame and the Nth video frame.
Illustratively, the first matrix and the second matrix are each p×q matrices.
Taking two adjacent video frames that are grayscale images p1 and p2 as an example, a grayscale image is composed of a plurality of pixel points. If the sizes of the images p1 and p2 are p×q pixels, then p1 can be expressed as a p×q matrix M1, the first matrix, and p2 can be expressed as a p×q matrix M2, the second matrix, where M1(i, j) is the gray value in the ith row and jth column of the first matrix. For the images p1 and p2, the diff of p1 and p2 can be defined as: diff_12 = f(p1, p2) = M1 − M2.
S302, calculating the squares of the differences between the elements in the 0th row to the (p-1)th row and the 0th column to the (q-1)th column of the first matrix and the corresponding elements of the second matrix, obtaining p×q squared results.
S303, calculating the sum of the p×q squared results to obtain a summation result.
S304, calculating the ratio of the summation result to p×q to obtain the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame.
For example, the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame may be calculated according to a preset formula. When N is 2, the picture difference value between the 1st video frame and the 2nd video frame may be calculated according to formula (1).
g(diff_12) = [ Σ_{i=0..p-1} Σ_{j=0..q-1} (M1(i, j) − M2(i, j))² ] / (p × q)    (1)

Wherein g is the picture difference value calculation function, (M1(i, j) − M2(i, j))² represents one of the p×q squared results, the double summation represents the summation result, g(diff_12) represents the picture difference value between the 1st video frame and the 2nd video frame, and p and q represent the pixel dimensions of the 1st video frame and the 2nd video frame.
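A direct NumPy reading of formula (1) is sketched below; the grayscale conversion of the frames and their matrix shapes are assumptions, and the function name is illustrative:

```python
import numpy as np

def picture_difference(m1: np.ndarray, m2: np.ndarray) -> float:
    """Formula (1): mean of the squared element-wise differences of two
    p x q grayscale matrices, i.e. sum((M1(i,j) - M2(i,j))**2) / (p * q)."""
    assert m1.shape == m2.shape, "both frames must be p x q"
    diff = m1.astype(np.float64) - m2.astype(np.float64)
    return float(np.sum(diff ** 2) / diff.size)

# Usage with two tiny 2 x 3 grayscale frames:
m1 = np.array([[10, 20, 30], [40, 50, 60]])
m2 = np.array([[12, 20, 27], [40, 55, 60]])
print(picture_difference(m1, m2))  # (4 + 0 + 9 + 0 + 25 + 0) / 6 = 6.333...
```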
In this embodiment, a first matrix and a second matrix corresponding to each group of two adjacent video frames between the 1st video frame and the Nth video frame are determined; the squares of the differences between the elements in the 0th row to the (p-1)th row and the 0th column to the (q-1)th column of the first matrix and the corresponding elements of the second matrix are calculated, obtaining p×q squared results; the sum of the p×q squared results is calculated to obtain a summation result; and finally the ratio of the summation result to p×q is calculated to obtain the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame. Calculating the picture difference value between each group of two adjacent video frames in this way provides data support for subsequently calculating the picture difference value between each video frame from the 1st video frame to the Nth video frame and the Nth video frame.
The embodiment of the disclosure also provides a video data processing method, which is applied to the transmitting end or the server of the video frame processing system, wherein the transmitting end is used for transmitting the second video data to the receiving end, and the server is used for receiving the second video data and forwarding the second video data to the receiving end.
Fig. 4 is another flow chart of a video data processing method according to an embodiment of the disclosure. As shown in fig. 4, the method may include S401-S403.
S401, acquiring first video data to be transmitted.
Wherein the first video data comprises at least two video frames.
The first video data is illustratively original video data to be transmitted by the transmitting end.
S402, inserting an auxiliary information frame between every two adjacent video frames in the first video data to obtain second video data.
S403, sending the second video data to the receiving end.
The second video data illustratively includes the first video data and the auxiliary information frames inserted between each group of two adjacent video frames in the first video data. The auxiliary information frame inserted between each group of two adjacent video frames includes the picture change information of the two video frames. The picture change information of the two video frames may include: displacement or motion information, pixel value differences, image texture, optical flow, object segmentation information, and the like. Displacement or motion information is the change in the position of an object between the two video frames and can be represented as translation, rotation, scaling, or warping of pixels. The pixel value difference is the change in pixel values between the two video frames, i.e., the difference in color or brightness. Image texture is the variation in texture between the two video frames, such as edges, texture direction, texture density, etc. Optical flow is the direction and speed information of pixel motion between the two video frames. The object segmentation information is the segmentation information of objects between the two video frames, i.e., the separation of foreground and background in the video frames. An auxiliary information frame comprising the picture change information of the two video frames is inserted between each group of two adjacent video frames in the first video data to obtain the second video data, and the second video data is then sent to the receiving end. When the receiving end receives the second video data, a part of the second video data may be buffered in the buffer area, namely, the video data buffered in the buffer area in the above embodiments.
For example, when the first video data includes video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and video frame 6, the auxiliary information frame 1 may be inserted between the video frame 1 and the video frame 2, the auxiliary information frame 2 may be inserted between the video frame 2 and the video frame 3, the auxiliary information frame 3 may be inserted between the video frame 3 and the video frame 4, the auxiliary information frame 4 may be inserted between the video frame 4 and the video frame 5, and the auxiliary information frame 5 may be inserted between the video frame 5 and the video frame 6, so as to obtain the second video data, that is, the second video data includes the video frame 1, the auxiliary information frame 1, the video frame 2, the auxiliary information frame 2, the video frame 3, the auxiliary information frame 3, the video frame 4, the auxiliary information frame 4, the video frame 5, the auxiliary information frame 5, and the video frame 6, and then the second video data is transmitted to the receiving end.
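A minimal sketch of the insertion step in S402 follows, with compute_picture_change standing in as a hypothetical caller-supplied helper that derives the picture change information of two adjacent frames; the record layout is an assumption for illustration:

```python
def insert_aux_frames(video_frames, compute_picture_change):
    """Builds the second video data by inserting an auxiliary information frame
    between each pair of adjacent video frames of the first video data."""
    second_video_data = []
    for i, frame in enumerate(video_frames):
        second_video_data.append(frame)
        if i + 1 < len(video_frames):
            aux = {
                "frame_type": "aux",
                "picture_change": compute_picture_change(frame, video_frames[i + 1]),
            }
            second_video_data.append(aux)
    return second_video_data

# Usage with toy frames and a trivial stand-in change metric:
second = insert_aux_frames(["frame1", "frame2", "frame3"], lambda a, b: (a, b))
# -> frame1, aux(1,2), frame2, aux(2,3), frame3
```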
In this embodiment, the second video data is obtained by obtaining the first video data to be sent and inserting an auxiliary information frame between each group of two adjacent video frames in the first video data, and then the second video data is sent to the receiving end. The influence of frame loss on video continuity can be reduced, the frame loss efficiency is improved, and the user experience is improved.
In some embodiments, the method is applied to a transmitting device or a server, the transmitting device is configured to transmit the second video data to the receiving device, and the server is configured to receive the second video data and forward the second video data to the receiving end.
For example, the auxiliary information frame may be inserted between each set of two adjacent video frames in the first video data by the encoder of the transmitting apparatus of the first video data, or the auxiliary information frame may be inserted between each set of two adjacent video frames in the first video data by the server. The sending device is used for sending the second video data to the receiving device, and the server is used for receiving the second video data and forwarding the second video data to the receiving end.
In this embodiment, the method is applied to a transmitting device or a server, where the transmitting device is used for transmitting the second video data to the receiving end, and the server is used for receiving the second video data and forwarding it to the receiving end. This increases the variety of ways in which auxiliary information frames can be inserted between each group of two adjacent video frames in the first video data, so that auxiliary information frames are present between each group of two adjacent video frames in the video data received by the receiving end, providing data support for subsequent frame dropping.
In an exemplary embodiment, the embodiment of the present disclosure further provides a video frame processing apparatus, which may be used to implement a video data processing method implemented by a receiving end in the foregoing embodiment. Fig. 5 is a schematic diagram of a video frame processing apparatus according to an embodiment of the disclosure. As shown in fig. 5, the apparatus may include: an acquisition unit 501, a determination unit 502, a discarding unit 503.
The obtaining unit 501 is configured to obtain video data buffered in the buffer, where the video data includes at least two video frames and an auxiliary information frame between each group of two adjacent video frames, and the auxiliary information frame includes picture change information of the two adjacent video frames.
The determining unit 502 is configured to determine, according to the target auxiliary information frame and the Nth video frame, one video frame from the 1st video frame to the Kth video frame as the target video frame, where K is an integer greater than 1, N is an integer greater than K, the target auxiliary information frame includes all auxiliary information frames between the 1st video frame and the Kth video frame, and the picture difference between the target video frame and the Nth video frame meets the preset condition.
The discarding unit 503 is configured to discard the video frames from the target video frame to the Nth video frame.
Optionally, the determining unit 502 is further configured to determine that the data occupation length of the video data is greater than a preset threshold.
Optionally, the data occupation length of the 1st video frame to the Kth video frame and the target auxiliary information frame is smaller than or equal to the preset threshold, and the data occupation length of the 1st video frame to the Nth video frame and the auxiliary information frames between the 1st video frame and the Nth video frame is larger than the preset threshold.
Optionally, the data occupation length of the 1st video frame to the (K+1)th video frame and the auxiliary information frames between the 1st video frame and the (K+1)th video frame is greater than the preset threshold.
Optionally, the data occupation length of the 1st video frame to the (N-1)th video frame and the auxiliary information frames between the 1st video frame and the (N-1)th video frame is less than or equal to the preset threshold.
Optionally, the preset conditions include: the picture difference value between the target video frame and the Nth video frame is the smallest.
Optionally, the determining unit 502 is specifically configured to determine the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame according to the auxiliary information frame between each group of two adjacent video frames between the 1st video frame and the Nth video frame; determine the picture difference value between the 1st video frame and the Nth video frame according to the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame; determine the picture difference value between the Kth video frame and the Nth video frame according to the picture difference value between each group of two adjacent video frames between the Kth video frame and the Nth video frame; and determine, among the 1st video frame to the Kth video frame, the video frame with the smallest picture difference value with the Nth video frame as the target video frame.
Optionally, the determining unit 502 is specifically configured to determine a first matrix and a second matrix corresponding to each group of two adjacent video frames between the 1st video frame and the Nth video frame, where the first matrix and the second matrix are p×q matrices; calculate the squares of the differences between the elements in the 0th row to the (p-1)th row and the 0th column to the (q-1)th column of the first matrix and the corresponding elements of the second matrix to obtain p×q squared results; calculate the sum of the p×q squared results to obtain a summation result; and calculate the ratio of the summation result to p×q to obtain the picture difference value between each group of two adjacent video frames between the 1st video frame and the Nth video frame.
In an exemplary embodiment, the embodiment of the present disclosure further provides a video frame processing apparatus, which may be used to implement a video data processing method implemented by a transmitting end or a server in the foregoing embodiment. Fig. 6 is another schematic diagram of a video frame processing apparatus according to an embodiment of the disclosure. As shown in fig. 6, the apparatus may include: acquisition unit 601, insertion unit 602, transmission unit 603.
The acquiring unit 601 is configured to acquire first video data to be transmitted, where the first video data includes at least two video frames.
An inserting unit 602, configured to insert an auxiliary information frame between each group of two adjacent video frames in the first video data, so as to obtain second video data, where the auxiliary information frame includes picture change information of the two adjacent video frames.
A transmitting unit 603, configured to transmit the second video data to a receiving end.
Optionally, the apparatus is applied to a transmitting device or to a server, the transmitting device being configured to transmit the second video data to the receiving end, and the server being configured to receive the second video data and forward it to the receiving end.
In the technical solution of the present disclosure, the acquisition, storage, application, and the like of the user personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the above embodiments.
In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the above embodiment.
In an exemplary embodiment, the computer program product comprises a computer program which, when executed by a processor, implements the method according to the above embodiments.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a video data processing method. For example, in some embodiments, the video data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When a computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the video data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the video data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.