CN116546238A

CN116546238A - Video data transmission method and device and electronic equipment

Info

Publication number: CN116546238A
Application number: CN202310288533.3A
Authority: CN
Inventors: 于迅博; 阮志睿; 高鑫; 邢树军; 陈硕; 李相锟; 童奕翔; 桑新柱; 颜玢玢
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2023-03-22
Filing date: 2023-03-22
Publication date: 2023-08-04

Abstract

The application provides a video data transmission method, a video data transmission device and electronic equipment, and relates to the technical field of communication. The method comprises the following steps: acquiring a background image of a preset background and a plurality of video frame images which are sequentially acquired under the preset background; extracting foreground images of each video frame image in the plurality of video frame images; transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of each video frame image to obtain a plurality of video frame images, so that when video data is transmitted in a scene with unchanged video background, the background images can be transmitted only once, the transmission quantity of the video data is effectively reduced, and the transmission efficiency of the video data is effectively improved.

Description

Video data transmission method and device and electronic equipment

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to a video data transmission method, apparatus, and electronic device.

Background

Video data transmission is increasingly common in daily life and is widely used in fields such as video conferencing and live webcasting.

Taking live webcasting as an example, a host uses a camera to capture footage of himself or herself, or directly captures the computer screen, and transmits the resulting video data to a server, so that a user can pull the video data from the server, restore the pictures after local processing, and watch the live broadcast. However, because the volume of video data is large, the data transmission efficiency is low.

Therefore, how to improve the transmission efficiency of video data is a problem to be solved by those skilled in the art.

Disclosure of Invention

The video data transmission method, the video data transmission device and the electronic equipment provided by the present application can effectively reduce the amount of video data transmitted, thereby effectively improving the transmission efficiency of the video data.

The application provides a video data transmission method, which can comprise the following steps:

Acquiring a background image of a preset background and a plurality of video frame images which are sequentially acquired under the preset background.

And extracting the foreground image of each video frame image in the plurality of video frame images.

Transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of the video frame images to obtain the plurality of video frame images.

According to the video data transmission method provided by the application, the extracting the foreground image of each video frame image in the plurality of video frame images includes:

For each video frame image, detecting a foreground region in the video frame image.

And dividing the video frame image according to the foreground region to obtain the foreground image in the video frame image.

According to the video data transmission method provided by the application, the sending the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to a server includes:

Sending the background image to the server.

And under the condition that the background image is transmitted to the server, sequentially transmitting the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to the server according to the acquisition sequence of each video frame image.

According to the video data transmission method provided by the application, according to the acquisition sequence of each video frame image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image are sequentially sent to the server, and the method comprises the following steps:

And according to the acquisition sequence of each video frame image, sequentially carrying out compression coding on the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to obtain a video stream corresponding to each foreground image.

And sequentially sending video streams corresponding to the foreground images to the server according to the acquisition sequence of the video frame images.

The application also provides a video data transmission method, which may include:

receiving a background image of a preset background sent by a server, a foreground image of each video frame image in a plurality of video frame images sequentially acquired under the preset background, and the position of the foreground image in the corresponding video frame image.

And splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images.

According to the video data transmission method provided by the application, the splicing is performed on the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image, so as to obtain the plurality of video frame images, and the video data transmission method comprises the following steps:

And based on the receiving sequence of the foreground images of the video frame images, splicing the background images and the foreground images of the video frame images according to the positions of the foreground images in the corresponding video frame images in sequence to obtain the plurality of video frame images.

The application also provides a video data transmission device, which is applied to video acquisition equipment, and comprises:

the acquisition unit is used for acquiring a background image of a preset background and a plurality of video frame images acquired in sequence under the preset background.

And the processing unit is used for extracting the foreground image of each video frame image in the plurality of video frame images.

A sending unit, configured to send the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of the video frame images to obtain the plurality of video frame images.

According to the video data transmission device provided by the application, the processing unit is specifically configured to:

detecting a foreground region in the video frame image for each video frame image; and dividing the video frame image according to the foreground region to obtain the foreground image in the video frame image.

According to the video data transmission device provided by the application, the sending unit is specifically configured to:

transmitting the background image to the server; and under the condition that the background image is transmitted to the server, sequentially transmitting the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to the server according to the acquisition sequence of each video frame image.

according to the acquisition sequence of each video frame image, sequentially carrying out compression coding on the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to obtain a video stream corresponding to each foreground image; and sequentially sending video streams corresponding to the foreground images to the server according to the acquisition sequence of the video frame images.

The application also provides another video data transmission device, which is applied to video playing equipment, and the device comprises:

the receiving unit is used for receiving a background image of a preset background sent by the server, a foreground image of each video frame image in a plurality of video frame images sequentially acquired under the preset background and the position of the foreground image in the corresponding video frame image;

And the splicing unit is used for splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images.

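According to another video data transmission device provided in the present application, the splicing unit is specifically configured to:

splice, based on the receiving sequence of the foreground images of the video frame images, the background image and the foreground image of each video frame image in sequence according to the position of the foreground image in the corresponding video frame image, to obtain the plurality of video frame images.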

The application also provides a video playing system, comprising: video acquisition equipment, server and video playback equipment.

The video acquisition equipment is used for acquiring a background image of a preset background and a plurality of video frame images acquired in sequence under the preset background; extracting foreground images of each video frame image in the plurality of video frame images; transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to the server;

The video playing device is used for receiving a background image of a preset background sent by the server, a foreground image of each video frame image in a plurality of video frame images which are sequentially collected under the preset background, and the position of the foreground image in the corresponding video frame image; and splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images.

The application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any one of the video data transmission methods or the other video data transmission method as described above when executing the program.

The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a video data transmission method as described in any one of the above or the other video data transmission method.

The present application also provides a computer program product comprising a computer program which when executed by a processor implements a video data transmission method as described in any one of the above or the other video data transmission method.

According to the video data transmission method, the video data transmission device and the electronic equipment, the background image of the preset background and the plurality of video frame images which are sequentially collected under the preset background are obtained; extracting foreground images of each video frame image in the plurality of video frame images; transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of each video frame image to obtain a plurality of video frame images, so that when video data is transmitted in a scene with unchanged video background, the background images can be transmitted only once, the transmission quantity of the video data is effectively reduced, and the transmission efficiency of the video data is effectively improved.

Drawings

To describe the technical solutions of the present application or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 is a flow chart of a video data transmission method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a frame for transmitting a background image and each foreground image according to an embodiment of the present application;

Fig. 3 is a flowchart of another video data transmission method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a video data transmission device according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of another video data transmission device according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a video playing system according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application clearer, the technical solutions in the present application are described clearly and completely below with reference to the drawings in the present application. The described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the embodiments of the present application without inventive effort fall within the scope of protection of the present application.

In embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the objects before and after it.

The technical solution provided by the embodiments of the present application can be applied to a live webcast scene or a video conference scene. Taking the live webcast scene as an example, a host uses a camera to capture footage of himself or herself, or directly captures the computer screen, and transmits the resulting video data to a server, so that a user can pull the video data from the server, restore the pictures after local processing, and watch the live broadcast. However, because the volume of video data is large, the data transmission efficiency is low, especially for higher-definition video data.

In the prior art, when the acquired video data is transmitted to a server, the video data generally needs to be compression encoded first, and the compression-encoded video data is then transmitted to the server. Compression encoding removes information in the image that is unnecessary for transmission or for human vision, such as spatial redundancy, temporal redundancy and visual redundancy. However, all such compression encoding operates on the pixel values of the image, so the more images there are and the clearer they need to be, the more complicated the processing and the slower the compression encoding tends to be, which lowers the transmission efficiency of the video data.

Therefore, in order to improve the transmission efficiency of video data, for a scene with unchanged video background, only one background image can be transmitted when video data is transmitted, and when other video frame images are transmitted, only the foreground image of each video frame image and the position of the foreground image in the video frame image are required to be transmitted, so that the background image and the foreground image of each video frame image can be spliced according to the position of the foreground image in the corresponding video frame image to obtain a plurality of video frame images, and the transmission quantity of the video data can be effectively reduced, thereby effectively improving the transmission efficiency of the video data.

Before describing in detail the video data transmission method provided in the embodiments of the present application, the basic concepts related to the present application will be described.

Resolution: also referred to as the size of an image; it refers to the number of pixels contained in one frame of the image, for example 1280×720 or 1920×1080. The resolution affects the data volume of the image, which is proportional to it.

Compression coding: processing and encoding image data so that the amount of data to be transferred is reduced as much as possible.

Target detection: a very important task in the fields of computer vision and computer image processing. Target detection finds all targets or objects of interest in an image and determines their class and location.

Hereinafter, the video data transmission method provided in the present application is described in detail through several specific embodiments. It is to be understood that the following embodiments may be combined with each other, and that the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 1 is a flowchart of a video data transmission method according to an embodiment of the present application, where the video data transmission method may be performed by a software and/or hardware device. For example, referring to fig. 1, the video data transmission method may include:

S101, acquiring a background image of a preset background and a plurality of video frame images acquired in sequence under the preset background.

A video is a series of acquired images displayed at a given frequency; each of these images is referred to as an image frame.

For example, when a background image of a preset background and a plurality of video frame images sequentially acquired under the preset background are acquired, a multi-view camera device may be used to acquire the background image of the preset background and the plurality of video frame images under the preset background, respectively. In general, a background image of a preset background may be acquired first, and then, a plurality of video frame images may be sequentially acquired under the preset background.

For example, when acquiring a plurality of video frame images under the preset background, an initial video frame image corresponding to each video frame image may be acquired first, and it is then judged whether the size of the initial video frame image matches a preset size. If the size of the initial video frame image is smaller than the preset size, the initial video frame image is enlarged; that is, a suitable interpolation algorithm is used to insert new elements between the pixels of the initial video frame image, yielding an enlarged video frame image. If the size of the initial video frame image is larger than the preset size, the initial video frame image may be reduced; that is, while the characteristics of the initial video frame image are preserved, a suitable downsampling algorithm selects a subset of its pixels according to a certain rule to form a new image, which is the reduced video frame image. In this way, the plurality of video frame images under the preset background are obtained.
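The size check described above can be illustrated with a minimal sketch. The application does not name a specific interpolation or downsampling algorithm, so nearest-neighbour sampling is used here purely as an assumption; the function names `resize_nearest` and `normalize_frame` are hypothetical, and an image is modelled as a list of pixel rows.

```python
def resize_nearest(image, target_h, target_w):
    """Resize an image (list of rows) with nearest-neighbour sampling.

    Enlarging repeats source pixels; reducing keeps a regular subset of them.
    Nearest-neighbour is an assumed choice, not one named by the application.
    """
    src_h, src_w = len(image), len(image[0])
    resized = []
    for y in range(target_h):
        src_y = min(src_h - 1, y * src_h // target_h)
        row = [
            image[src_y][min(src_w - 1, x * src_w // target_w)]
            for x in range(target_w)
        ]
        resized.append(row)
    return resized


def normalize_frame(image, preset_h, preset_w):
    """Return the frame unchanged if it already has the preset size,
    otherwise enlarge or reduce it to the preset size."""
    if len(image) == preset_h and len(image[0]) == preset_w:
        return image
    return resize_nearest(image, preset_h, preset_w)


small = [[1, 2],
         [3, 4]]
enlarged = normalize_frame(small, 4, 4)     # 2x2 -> 4x4
reduced = normalize_frame(enlarged, 2, 2)   # 4x4 -> 2x2
```

In practice a smoother method such as bilinear or bicubic interpolation would usually be preferred for enlargement; the sketch only shows where the size decision is made.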

S102, extracting foreground images of all video frame images in the plurality of video frame images.

In the embodiment of the application, because the background is the same in all of the plurality of video frame images, the background image needs to be transmitted only once, and for the plurality of video frame images only the foreground image of each video frame image needs to be transmitted. This effectively reduces the amount of video data transmitted and thereby effectively improves the transmission efficiency of the video data.

Illustratively, in the embodiment of the present application, when extracting a foreground image of each video frame image in a plurality of video frame images, a foreground region in the video frame image may be detected for each video frame image; and dividing the video frame image according to the foreground region to obtain a foreground image in the video frame image.

The foreground image may be understood as a subject target image, that is, a target image in the video frame image that the user is interested in or needs, and the foreground region may be understood as the region of the video frame image that contains the subject target image.

For example, in the embodiment of the present application, when detecting a foreground area in a video frame image, a conventional target detection algorithm may be used to detect the foreground area in the video frame image, or a target detection model obtained by pre-training may be used to detect the foreground area in the video frame image, and specifically may be set according to actual needs.

For example, in the embodiment of the present application, when a conventional target detection algorithm is used to detect a foreground region in a video frame image, a sliding window mode may be used to extract a candidate region in the video frame image; extracting the characteristics of the local information of the video frame image in each window; classifying and judging the extracted features to obtain a plurality of candidate foreground areas; a foreground region in the video frame image is then determined from the plurality of candidate foreground regions using a non-maximum suppression algorithm.
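The sliding-window and non-maximum suppression steps above can be sketched as follows. This is an illustrative sketch only: the feature extraction and classification stages are omitted, the per-box scores are assumed to come from some classifier, and the box format `(x, y, w, h)`, the function names and the IoU threshold of 0.5 are all assumptions rather than details from the application.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride):
    """Yield candidate regions as (x, y, w, h) boxes covering the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop any box that overlaps
    an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


windows = list(sliding_windows(4, 4, 2, 2, 2))
candidates = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
candidate_scores = [0.9, 0.8, 0.7]  # hypothetical classifier outputs
foreground_regions = non_max_suppression(candidates, candidate_scores)
```

Here the second candidate overlaps the first heavily and is suppressed, while the distant third candidate is kept as a separate foreground region.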

For example, in the embodiment of the present application, when the foreground region in the video frame image is detected by using a target detection model obtained by pre-training, a deep learning method may be used, following the principles of machine learning, to train on a very large data set containing foreground images so as to obtain the target detection model, for example a region-based convolutional neural network model (Region-based Convolutional Neural Networks, R-CNN); the foreground region in the video frame image is then detected at the pixel level by the target detection model.

In combination with the above description, after the foreground region in the video frame image is detected, the video frame image may be segmented according to the foreground region to obtain the foreground image in the video frame image. For example, the video frame image is usually cropped according to the foreground region, which segments the video frame image and yields the foreground image; the background image and each foreground image are then sent to the server. For example, as shown in fig. 2, which is a schematic frame diagram of sending the background image and each foreground image provided in the embodiment of the present application, for a scene with an unchanged video background only one background image is transmitted when video data is transmitted, and for the other video frame images only the foreground image of each video frame image and the position of the foreground image in the video frame image need to be transmitted.
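Cropping a frame to its foreground region while recording where the crop came from might look like the sketch below. The box format `(x, y, w, h)`, the use of the top-left corner as the "position", and the name `crop_foreground` are assumptions made for illustration; the application does not specify how the position is encoded.

```python
def crop_foreground(frame, box):
    """Cut the foreground region out of a frame (list of pixel rows).

    Returns the cropped foreground image and its position, taken here
    to be the top-left corner (x, y) of the region within the frame.
    """
    x, y, w, h = box
    patch = [row[x:x + w] for row in frame[y:y + h]]
    return patch, (x, y)


frame = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
foreground, position = crop_foreground(frame, (1, 1, 2, 2))
```

Only `foreground` and `position` would then need to be sent for this frame, rather than all sixteen pixels.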

After extracting the foreground image of each of the plurality of video frame images, the following S103 is performed:

S103, transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of the video frame images to obtain a plurality of video frame images.

For example, in the embodiment of the present application, considering that each subsequent video frame image needs to be stitched with a background image, in order not to affect stitching, the background image may be sent to the server first; and then, under the condition that the background image is determined to be sent to the server, sequentially sending the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to the server according to the acquisition sequence of each video frame image, so that the background image and the foreground image of each video frame image can be spliced according to the position of the foreground image in the corresponding video frame image to obtain a plurality of video frame images.
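The splicing that the transmitted position makes possible can be sketched as pasting the foreground image onto a copy of the background image. This is a minimal sketch under the assumption that the position is the top-left corner `(x, y)` of the foreground within the frame; the function name `stitch` is hypothetical.

```python
def stitch(background, foreground, position):
    """Rebuild a video frame by pasting the foreground image onto a copy
    of the background image at the given (x, y) top-left position."""
    x, y = position
    frame = [row[:] for row in background]  # leave the cached background intact
    for dy, fg_row in enumerate(foreground):
        frame[y + dy][x:x + len(fg_row)] = fg_row
    return frame


background = [[0, 0, 0] for _ in range(3)]
rebuilt = stitch(background, [[9]], (1, 2))
```

Copying the background before pasting matters because the same background image is reused for every subsequent frame.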

For example, when the background image is sent to the server, considering that the background image contains a relatively large amount of redundant data and that users can usually tolerate some image distortion, the background image may first be compression encoded to obtain a video stream corresponding to the background image, and the video stream corresponding to the background image is then sent to the server.

Similarly, when the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image are sequentially sent to the server according to the acquisition sequence of the video frame images, considering that users can usually tolerate some image distortion in the video frame images, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image may be compression encoded in sequence according to the acquisition sequence of the video frame images to obtain a video stream corresponding to each foreground image. The video streams corresponding to the foreground images are then sent to the server in sequence according to the acquisition sequence of the video frame images, which reduces redundant data and the amount of data transmitted.

For example, the redundant data may include at least one of coding redundancy, pixel redundancy and visual redundancy. Coding redundancy arises, for example, when the gray scale coding of a video frame image uses more coding symbols than are actually needed. Pixel redundancy arises because any given pixel value can in principle be predicted from its neighbors, so a single pixel carries relatively little information and the visual contribution of many individual pixels to a video frame image is redundant. Visual redundancy arises because some information is relatively less important than other information in ordinary visual processing.

For example, when compression encoding the background image or the foreground image of any video frame image, consider that the RGB format requires three independent video signals to be transmitted simultaneously, whereas the YUV format occupies very little bandwidth, keeps the distortion perceived by the human eye to a minimum at the maximum compression rate, and allows much of the redundant data in the image to be removed. The original RGB color image may therefore be converted into a YUV format image, and the YUV format image is then encoded according to an existing rule, such as the H.264 or H.265 video coding standard, to obtain a compression-encoded image.
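The RGB to YUV conversion step can be sketched per pixel. The application does not state which conversion matrix is used, so the sketch below assumes the widely used full-range BT.601 coefficients (as in JFIF), with chrominance offset to 128 for 8-bit storage; the function name `rgb_to_yuv` is hypothetical.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, U (Cb), V (Cr).

    Assumes full-range BT.601 coefficients; other matrices (e.g. BT.709)
    are equally possible and are not ruled out by the application.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp8(y), _clamp8(u), _clamp8(v)


white = rgb_to_yuv(255, 255, 255)
black = rgb_to_yuv(0, 0, 0)
red = rgb_to_yuv(255, 0, 0)
```

Gray pixels such as white and black land on the neutral chrominance value 128, which is why the U and V planes of typical images are smooth and compress well.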

In the YUV format, "Y" represents luminance, i.e. the gray scale value, while "U" and "V" represent chrominance, which describes the hue and saturation of the image and specifies the color of a pixel. The human eye is far more sensitive to luminance information than to chrominance information, so compressing the values of the two UV channels is barely perceptible. Therefore, the values of the Y channel are compressed somewhat less and the values of the two UV channels somewhat more, so as to balance image quality and compression rate.
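One common way to compress the UV channels more than the Y channel is chroma subsampling. The sketch below assumes the 4:2:0 scheme, in which each 2×2 block of a chrominance plane is replaced by its average while the luminance plane is kept at full resolution; the application does not say which scheme is used, and the function names are hypothetical.

```python
def subsample_420(plane):
    """Halve a chrominance plane in both directions by averaging 2x2 blocks.
    Edge blocks of odd-sized planes are averaged over the pixels that exist."""
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [
                plane[yy][xx]
                for yy in range(y, min(y + 2, h))
                for xx in range(x, min(x + 2, w))
            ]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out


def sample_count_444(w, h):
    """Samples per frame when Y, U and V are all stored at full resolution."""
    return 3 * w * h


def sample_count_420(w, h):
    """Samples per frame with full-resolution Y and 2x2-subsampled U and V."""
    return w * h + 2 * ((w + 1) // 2) * ((h + 1) // 2)


u_plane = [[10, 20, 30, 40],
           [10, 20, 30, 40]]
u_small = subsample_420(u_plane)
```

For a 1920×1080 frame this halves the number of stored samples (from three per pixel to one and a half) before any entropy coding is applied.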

For example, when the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image are transmitted to the server, FFmpeg (Fast Forward Moving Picture Experts Group), a software tool for rapidly processing pictures, video and audio, may be used for the transmission. During transmission, a connection between the video acquisition device and the server can be established first, and the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image are then transmitted to the server over the established connection. The positions of the foreground images in the corresponding video frame images are used for splicing the background image and the foreground images of the video frame images.

For example, assume that the background image, foreground image 1 and its position in the corresponding video frame image, foreground image 2 and its position in the corresponding video frame image, foreground image 3 and its position in the corresponding video frame image, and foreground image 4 and its position in the corresponding video frame image are acquired in sequence. When transmitting to the server, the background image may be sent first. Then, once foreground image 1 and its position in the corresponding video frame image have been acquired, they are sent to the server; once foreground image 2 and its position in the corresponding video frame image have been acquired, they are sent to the server; once foreground image 3 and its position in the corresponding video frame image have been acquired, they are sent to the server; and once foreground image 4 and its position in the corresponding video frame image have been acquired, they are sent to the server. In this way, the background image, followed by foreground images 1 to 4 together with their respective positions in the corresponding video frame images, are transmitted to the server in sequence.
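The sending order described above may be sketched as follows. This is only an illustrative sketch: the message dictionaries, the `send` callback, and the `(x, y)` position format are assumptions introduced for the example and are not specified by the application.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical types: an image is an opaque bytes payload, a position is (x, y).
Position = Tuple[int, int]
Foreground = Tuple[bytes, Position]


def send_in_order(background: bytes,
                  foregrounds: Iterable[Foreground],
                  send: Callable[[dict], None]) -> None:
    """Send the background once, then each foreground with its position."""
    send({"type": "background", "image": background})
    for index, (image, position) in enumerate(foregrounds, start=1):
        # Each foreground is sent in acquisition order, together with
        # its position in the corresponding video frame image.
        send({"type": "foreground", "index": index,
              "image": image, "position": position})


# Usage: collect the messages in a list instead of sending them to a server.
sent: List[dict] = []
send_in_order(b"BG",
              [(b"F1", (10, 20)), (b"F2", (12, 20)),
               (b"F3", (14, 21)), (b"F4", (16, 22))],
              sent.append)
```

Because `foregrounds` is consumed lazily, a generator that yields each foreground as soon as it is extracted would be sent without waiting for later frames.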

It can be seen that, in the embodiment of the present application, a background image of a preset background and a plurality of video frame images sequentially acquired under the preset background are acquired, a foreground image is extracted from each of the plurality of video frame images, and the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image are then sent to a server. Thus, for a scene in which the video background does not change, the background image needs to be transmitted only once; for each video frame image, only its foreground image and the position of that foreground image in the video frame image need to be transmitted. The background image and the foreground image of each video frame image can then be spliced according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images, which effectively reduces the amount of video data transmitted and improves the transmission efficiency of the video data.
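The reduction in transmitted data can be illustrated with simple arithmetic on uncompressed sizes. The frame size, foreground size, frame count, and the 8 bytes assumed per position (two 32-bit integers) are all hypothetical values chosen for the example; actual savings depend on the codec used and on how much of each frame the foreground occupies.

```python
from typing import List, Tuple


def raw_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size of an image with 8 bits per channel."""
    return width * height * channels


def transmission_sizes(frame_size: Tuple[int, int],
                       foreground_sizes: List[Tuple[int, int]],
                       position_bytes: int = 8) -> Tuple[int, int]:
    """Return (full, split) byte counts.

    full:  every frame is sent whole.
    split: the background is sent once, then one foreground plus its
           position per frame.
    """
    fw, fh = frame_size
    full = len(foreground_sizes) * raw_bytes(fw, fh)
    split = raw_bytes(fw, fh) + sum(
        raw_bytes(w, h) + position_bytes for w, h in foreground_sizes)
    return full, split


# Hypothetical case: 100 frames of 1920x1080, each with a 400x600 foreground.
full, split = transmission_sizes((1920, 1080), [(400, 600)] * 100)
```

Under these assumed numbers, `full` is 622,080,000 bytes and `split` is 78,221,600 bytes.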

After the video acquisition device sends the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to the server, the server can cache the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image; the video playing device can then pull the video stream from the server, splice the images, and play them. For example, referring to fig. 3, fig. 3 is a schematic flow chart of another video data transmission method provided in an embodiment of the present application, where the video data transmission method may be performed by a software and/or hardware device. The video data transmission method may include:

S301, receiving a background image of a preset background sent by a server, a foreground image of each video frame image in a plurality of video frame images sequentially collected under the preset background, and the position of the foreground image in the corresponding video frame image.

To receive the background image of the preset background sent by the server, the foreground image of each video frame image in the plurality of video frame images sequentially collected under the preset background, and the position of the foreground image in the corresponding video frame image, the video playing device may send a pull request to the server and receive the video stream corresponding to the background image and the video streams corresponding to the foreground images sent by the server. Following the same rule as that used in compression encoding, the video stream corresponding to the background image is converted into a video frame image in YUV format, and the corresponding color RGB image is obtained according to the conversion relationship, thereby obtaining the background image. Similarly, the video stream corresponding to each foreground image is converted into a video frame image in YUV format according to the same rule as that used in compression encoding, and the corresponding color RGB image is obtained according to the conversion relationship, thereby obtaining the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image.
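The YUV-to-RGB conversion step may be sketched as follows. This passage does not state which conversion relationship is used, so the full-range BT.601 coefficients below (as used in JFIF) are an assumption; the decoder must use whichever relationship matches the one applied during compression encoding.

```python
from typing import Tuple


def clamp8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Convert one full-range 8-bit YUV (YCbCr) pixel to RGB.

    Assumes full-range BT.601 coefficients; other standards (for example
    BT.709 or limited-range variants) use different constants.
    """
    d = u - 128  # blue-difference chroma, centered on zero
    e = v - 128  # red-difference chroma, centered on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


# With neutral chroma (u = v = 128), the pixel is a gray equal to its luma.
mid_gray = yuv_to_rgb(128, 128, 128)
```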

S302, splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain a plurality of video frame images.

In the embodiment of the application, the background image and the foreground image of each video frame image may be spliced one by one, in the order in which the foreground images of the video frame images are received, according to the position of each foreground image in its corresponding video frame image, so as to obtain the plurality of video frame images.

For example, assume that the background image, foreground image 1 and its position in the corresponding video frame image, foreground image 2 and its position in the corresponding video frame image, foreground image 3 and its position in the corresponding video frame image, and foreground image 4 and its position in the corresponding video frame image are received in sequence. Once foreground image 1 and its position in the corresponding video frame image have been received, the background image and foreground image 1 are spliced according to that position to obtain video frame image 1; once foreground image 2 and its position in the corresponding video frame image have been received, the background image and foreground image 2 are spliced according to that position to obtain video frame image 2; once foreground image 3 and its position in the corresponding video frame image have been received, the background image and foreground image 3 are spliced according to that position to obtain video frame image 3; and once foreground image 4 and its position in the corresponding video frame image have been received, the background image and foreground image 4 are spliced according to that position to obtain video frame image 4. Video frame image 1, video frame image 2, video frame image 3, and video frame image 4 are then played in sequence.
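The splicing step may be sketched as follows. The representation of an image as rows of RGB tuples, and the interpretation of the position as the `(x, y)` top-left corner of the foreground within the frame, are assumptions made for the example; this passage does not fix the position format.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def stitch(background: Image, foreground: Image,
           position: Tuple[int, int]) -> Image:
    """Paste the foreground onto a copy of the background.

    position is assumed to be the (x, y) top-left corner of the foreground.
    The background itself is not modified, so it can be reused per frame.
    """
    x, y = position
    frame = [row[:] for row in background]
    for dy, fg_row in enumerate(foreground):
        for dx, pixel in enumerate(fg_row):
            # Ignore foreground pixels that would fall outside the frame.
            if 0 <= y + dy < len(frame) and 0 <= x + dx < len(frame[0]):
                frame[y + dy][x + dx] = pixel
    return frame


# Usage: one 4x3 black background, two foregrounds received in order.
black, white = (0, 0, 0), (255, 255, 255)
background = [[black] * 4 for _ in range(3)]
received = [([[white, white]], (1, 1)), ([[white, white]], (2, 2))]
frames = [stitch(background, fg, pos) for fg, pos in received]
```

Copying the background for each frame keeps the single received background intact, which is what allows it to be transmitted only once.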

It can be seen that, in the embodiment of the present application, a background image of a preset background sent by a server, the foreground image of each video frame image in a plurality of video frame images sequentially collected under the preset background, and the position of each foreground image in its corresponding video frame image are received, and the background image and the foreground image of each video frame image are spliced according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images. Therefore, when video data is received in a scene with an unchanged video background, the background image needs to be received only once, which effectively reduces the amount of video data transmitted and improves the transmission efficiency of the video data.

The following describes a video data transmission device provided in an embodiment of the present application, and the video data transmission device described below and the video data transmission method described above may be referred to correspondingly to each other.

Fig. 4 is a schematic structural diagram of a video data transmission apparatus 40 according to an embodiment of the present application, which is applied to a video capturing device. For example, referring to fig. 4, the video data transmission apparatus 40 may include:

the acquiring unit 401 is configured to acquire a background image of a preset background and a plurality of video frame images sequentially acquired under the preset background.

The processing unit 402 is configured to extract a foreground image of each video frame image in the plurality of video frame images.

A transmitting unit 403, configured to transmit the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to the server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of the video frame images to obtain a plurality of video frame images.

Illustratively, in the embodiment of the present application, the processing unit 402 is specifically configured to detect, for each video frame image, a foreground area in the video frame image; and dividing the video frame image according to the foreground region to obtain a foreground image in the video frame image.
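The detect-then-divide behavior of the processing unit may be sketched as follows. The detection technique is not given in this passage; the sketch assumes simple differencing against the background image with a fixed threshold on grayscale values, and the function names and threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 8-bit grayscale values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def detect_foreground_box(frame: Gray, background: Gray,
                          threshold: int = 30) -> Optional[Box]:
    """Bound all pixels that differ from the background by more than threshold."""
    xs, ys = [], []
    for y, (f_row, b_row) in enumerate(zip(frame, background)):
        for x, (f, b) in enumerate(zip(f_row, b_row)):
            if abs(f - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no foreground detected in this frame
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def crop(frame: Gray, box: Box) -> Gray:
    """Divide the frame along the box to obtain the foreground image."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


# Usage: a 5x4 frame that differs from the background at two pixels.
background = [[0] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = 200
frame[2][3] = 200
box = detect_foreground_box(frame, background)
foreground = crop(frame, box)
```

In this sketch the `(x, y)` part of the box would serve as the position of the foreground image in the corresponding video frame image.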

Illustratively, in the embodiment of the present application, the sending unit 403 is specifically configured to send the background image to the server; and under the condition that the background image is determined to be sent to the server, sequentially sending the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to the server according to the acquisition sequence of each video frame image.

Illustratively, in the embodiment of the present application, the sending unit 403 is specifically configured to sequentially perform compression encoding on the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image according to the acquisition sequence of each video frame image, so as to obtain a video stream corresponding to each foreground image; and sequentially sending video streams corresponding to the foreground images to a server according to the acquisition sequence of the video frame images.
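One way to carry a foreground image together with its position in a single encoded unit may be sketched as follows. The packet layout (a fixed header holding x, y, width, and height) is hypothetical, and `zlib` is used only as a lossless stand-in for the compression encoding; it is not the video codec of the application.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header: x, y, width, height as big-endian unsigned 32-bit ints.
HEADER = struct.Struct("!IIII")


def encode_foreground(pixels: bytes, x: int, y: int,
                      width: int, height: int) -> bytes:
    """Pack the position and size, followed by the compressed pixel data."""
    return HEADER.pack(x, y, width, height) + zlib.compress(pixels)


def decode_foreground(packet: bytes) -> Tuple[bytes, Tuple[int, int], Tuple[int, int]]:
    """Recover (pixels, (x, y), (width, height)) from a packet."""
    x, y, width, height = HEADER.unpack_from(packet)
    pixels = zlib.decompress(packet[HEADER.size:])
    return pixels, (x, y), (width, height)


# Usage: a 2x2 RGB foreground (12 bytes) located at (5, 7) in its frame.
pixels = bytes(range(12))
packet = encode_foreground(pixels, 5, 7, 2, 2)
decoded = decode_foreground(packet)
```

Keeping the position in the same unit as the image data means the two cannot arrive separately or be paired with the wrong frame.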

The video data transmission apparatus 40 provided in this embodiment of the present application may execute the technical scheme of the video data transmission method on the video capturing device side in any of the above embodiments. Its implementation principle and beneficial effects are similar to those of the video data transmission method on the video capturing device side, to which reference may be made; they are not described again herein.

Fig. 5 is a schematic structural diagram of another video data transmission apparatus 50 according to an embodiment of the present application, which is applied to a video playing device. For example, referring to fig. 5, the video data transmission apparatus 50 may include:

the receiving unit 501 is configured to receive a background image of a preset background sent by a server, a foreground image of each video frame image in a plurality of video frame images sequentially collected under the preset background, and a position of the foreground image in the corresponding video frame image.

And the stitching unit 502 is configured to stitch the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image, so as to obtain a plurality of video frame images.

Illustratively, in the embodiment of the present application, the stitching unit 502 is specifically configured to stitch the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image in sequence based on the receiving sequence of the foreground image of each video frame image, so as to obtain a plurality of video frame images.

The video data transmission apparatus 50 provided in this embodiment may execute the technical scheme of the video data transmission method on the video playing device side in any of the above embodiments. Its implementation principle and beneficial effects are similar to those of the video data transmission method on the video playing device side, to which reference may be made; they are not described again herein.

Fig. 6 illustrates a schematic structure of a video playing system. As shown in fig. 6, the video playing system 60 may include: a video acquisition device, a server, and a video playing device.

The video acquisition device is used for acquiring a background image of a preset background and a plurality of video frame images acquired in sequence under the preset background; extracting the foreground image of each video frame image in the plurality of video frame images; and transmitting the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to the server.

The video playing device is used for receiving the background image of the preset background sent by the server, the foreground image of each video frame image in the plurality of video frame images sequentially collected under the preset background, and the position of the foreground image in the corresponding video frame image; and splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images.

The video playing system 60 provided in this embodiment may execute the technical scheme of the video data transmission method in any of the above embodiments. Its implementation principle and beneficial effects are similar to those of the video data transmission method, to which reference may be made; they are not described again herein.

Fig. 7 illustrates a physical schematic diagram of an electronic device. As shown in fig. 7, the electronic device may include: a processor 710, a communication interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communication interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform one video data transmission method, or another video data transmission method, provided by embodiments of the present application.

The video data transmission method comprises the following steps: acquiring a background image of a preset background and a plurality of video frame images which are sequentially acquired under the preset background; extracting foreground images of each video frame image in the plurality of video frame images; transmitting the background image, the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to a server; the positions of the foreground images in the corresponding video frame images are used for splicing the background images and the foreground images of the video frame images to obtain a plurality of video frame images.

Another video data transmission method includes: receiving a background image of a preset background sent by a server, a foreground image of each video frame image in a plurality of video frame images sequentially acquired under the preset background, and the position of the foreground image in the corresponding video frame image; and splicing the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain a plurality of video frame images.

Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.

In another aspect, the present application also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing one or the other of the video data transmission methods provided by the embodiments described above.

In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs one or the other of the video data transmission methods provided by the embodiments described above.

The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution in essence, or the part of it contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A video data transmission method, applied to a video acquisition device, comprising:

acquiring a background image of a preset background and a plurality of video frame images which are sequentially acquired under the preset background;

extracting foreground images of each video frame image in the plurality of video frame images;

2. The method of claim 1, wherein the extracting the foreground image of each of the plurality of video frame images comprises:

detecting a foreground region in the video frame image for each video frame image;

3. The method according to claim 1 or 2, wherein said sending the background image, the foreground image of each video frame image, and the position of the foreground image in the corresponding video frame image to a server comprises:

transmitting the background image to the server;

4. A method according to claim 3, wherein sequentially transmitting the foreground images of the video frame images and the positions of the foreground images in the corresponding video frame images to the server according to the acquisition order of the video frame images comprises:

according to the acquisition sequence of each video frame image, sequentially carrying out compression coding on the foreground image of each video frame image and the position of the foreground image in the corresponding video frame image to obtain a video stream corresponding to each foreground image;

5. A video data transmission method, applied to a video playing device, comprising:

receiving a background image of a preset background sent by a server, a foreground image of each video frame image in a plurality of video frame images sequentially acquired under the preset background, and the position of the foreground image in the corresponding video frame image;

6. The method according to claim 5, wherein the stitching the background image and the foreground image of each video frame image according to the position of the foreground image in the corresponding video frame image to obtain the plurality of video frame images includes:

7. A video data transmission apparatus for use with a video acquisition device, the apparatus comprising:

the acquisition unit is used for acquiring a background image of a preset background and a plurality of video frame images which are sequentially acquired under the preset background;

the processing unit is used for extracting foreground images of each video frame image in the plurality of video frame images;

8. A video data transmission apparatus for use with a video playback device, the apparatus comprising:

9. A video playback system, comprising: video acquisition equipment, a server and video playing equipment;

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the video data transmission method of any one of claims 1 to 4 or the video data transmission method of any one of claims 5 to 6 when the program is executed by the processor.