CN114401451B - Video editing method, device, electronic device and readable storage medium - Google Patents
- Publication number
- Publication number: CN114401451B (application CN202111629147.3A)
- Authority
- CN
- China
- Prior art keywords
- original
- video
- dimensional
- data
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The embodiments of the application provide a video editing method, a device, an electronic device and a readable storage medium. The method comprises: obtaining original video data and original three-dimensional scene information, wherein the original three-dimensional scene information is information representing the real scene during collection of the original video data; reconstructing a three-dimensional video scene from the original video data and the original three-dimensional scene information; and receiving an editing operation of a user on the three-dimensional video scene and generating target video data in response to the editing operation. The method addresses the poor universality and inconvenience of existing video editing processes, so that a user can conveniently edit video within a three-dimensional scene.
Description
Technical Field
Embodiments of the present disclosure relate to the technical field of video, and in particular to a video editing method, a video editing device, an electronic device and a computer-readable storage medium.
Background
In recent years, with the rise of short videos, various video acquisition tools and video editing tools have emerged one after another to meet people's demand for producing short videos; based on these tools, users can create rich and varied short-video content.
Existing video editing methods usually operate on two-dimensional video scenes: two-dimensional video data is acquired with a camera, loaded into a video editing tool, and edited by adding virtual elements to the two-dimensional video data.
Because prior-art video editing methods target two-dimensional video scenes, they are generally inapplicable when a user needs to add a virtual three-dimensional element to the video data; even where such an element can be added, the process is inconvenient and heavily constrained.
Disclosure of Invention
An object of the present disclosure is to provide a video editing method that solves the problems of poor universality and inconvenience that may exist when video editing processing is performed in the prior art.
In a first aspect of the present disclosure, there is provided a video editing method, the method comprising:
acquiring original video data and original three-dimensional scene information, wherein the original three-dimensional scene information is information representing a real scene in the process of acquiring the original video data;
reconstructing a three-dimensional video scene according to the original video data and the original three-dimensional scene information;
and receiving editing operation of a user for the three-dimensional video scene, and generating target video data in response to the editing operation.
In a second aspect of the present disclosure, there is also provided a video acquisition method, the method including:
receiving input operation of a user;
and responding to the input operation, and acquiring original video data and original three-dimensional scene information based on a preset augmented reality technology, wherein the original three-dimensional scene information is information representing a real scene in the process of acquiring the original video data.
In a third aspect of the present disclosure, there is also provided a video editing apparatus, the apparatus including:
an original data acquisition module, configured to acquire original video data and original three-dimensional scene information, the original three-dimensional scene information being information representing the real scene during collection of the original video data;
a three-dimensional video scene reconstruction module, configured to reconstruct a three-dimensional video scene from the original video data and the original three-dimensional scene information; and
a target video data generation module, configured to receive an editing operation of a user on the three-dimensional video scene and generate target video data in response to the editing operation.
In a fourth aspect of the present disclosure, there is also provided a video capture device, the device comprising:
a receiving module, configured to receive an input operation of a user; and
an acquisition module, configured to acquire, in response to the input operation, original video data and original three-dimensional scene information based on a preset augmented reality technology, the original three-dimensional scene information being information representing the real scene during collection of the original video data.
In a fifth aspect of the present disclosure, there is also provided an electronic device, including:
a memory for storing executable instructions;
a processor for executing the method of the first or second aspect of the present disclosure under the control of the instructions.
In a sixth aspect of the present disclosure, there is also provided a computer-readable storage medium storing a computer program which, when read and executed by a computer, performs the method according to the first or second aspect of the present disclosure.
The embodiments of the present disclosure are advantageous in that the video editing device obtains original video data together with original three-dimensional scene information representing the real scene during collection of that data, reconstructs a three-dimensional video scene from the two, and thereby provides the user with a three-dimensional video scene that can be edited conveniently and directly; on this basis, the device generates target video data by receiving the user's editing operation and responding to it.
Other features of the present specification and its advantages will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a video editing method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of triggering data acquisition provided by an embodiment of the present disclosure.
Fig. 3 is a flow chart of a three-dimensional scene reconstruction process provided by an embodiment of the present disclosure.
Fig. 4 is a flowchart of a video capturing method according to an embodiment of the present disclosure.
Fig. 5 is a schematic block diagram of a video editing apparatus provided in an embodiment of the present disclosure.
Fig. 6 is a schematic block diagram of a video capture device provided in an embodiment of the present disclosure.
Fig. 7 is a schematic hardware structure of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
< Method example one >
In recent years, with the continuous development of augmented reality technology, conventional methods that edit video in two-dimensional video scenes have become increasingly inadequate in universality and convenience. To address this, the related art provides a video editing method that adds virtual three-dimensional elements in real time based on augmented reality technology, so that video data containing the virtual three-dimensional elements is recorded while the video data is being collected; however, this method must perform the editing in real time and therefore remains quite limited.
To solve the above problems, the embodiments of the present disclosure provide a highly universal video editing method that allows video editing to be performed conveniently in an offline scenario. Please refer to fig. 1, which is a schematic flow chart of the video editing method provided in the embodiments of the present disclosure. The method may be implemented in a video editing device; the video editing device may be a terminal device, for example a mobile phone, a tablet computer or a personal computer, and is not specifically limited herein. For ease of explanation, unless otherwise specified, the embodiments of the present disclosure illustrate the video editing device implementing the method as a mobile phone.
As shown in fig. 1, the method of the present embodiment may include the following steps S1100-S1300, which are described in detail below.
Step S1100, obtaining original video data and original three-dimensional scene information, where the original three-dimensional scene information is information representing a real scene in the process of collecting the original video data.
The raw video data is video data acquired using an image acquisition device, such as a camera.
The original three-dimensional scene information may comprise per-frame camera pose data corresponding to each video frame of the original video data, together with three-dimensional scene data representing the real scene. The three-dimensional scene data may be at least one of scene model data, object shielding information, scene identification plane information and scene anchor point information: the scene model data may be scene mesh data; the object shielding information may represent the occlusion relations among objects in the real scene; the scene identification plane information may represent the position and size of planes in the real scene; and the scene anchor point information may represent key points of the real scene, typically its geometric center.
Specifically, to overcome the poor universality of existing video editing methods and the difficulty of adding virtual three-dimensional elements when editing a two-dimensional video scene, the embodiments of the present disclosure reconstruct a three-dimensional video scene from the original video data collected by a video collecting device and from three-dimensional scene information, collected alongside it, that represents the real scene during that collection, so that a user can conveniently perform full-scene video editing based on the three-dimensional video scene.
In a specific implementation, the original video data and the three-dimensional scene information may be acquired by a video acquisition device corresponding to the video editing device. The two devices may be located in the same terminal device or in different terminal devices; no special limitation is made here. When they are located in different terminal devices, the original video data and the original three-dimensional scene information acquired by the video acquisition device may be provided to the video editing device in the form of a video file, an encoded data packet, a communication data frame, or the like. The following first describes how the video acquisition device acquires the original video data and the three-dimensional scene information.
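The structure described above can be sketched as a small data model. This is a non-authoritative illustration: all class and field names are invented for the sketch, and the patent does not specify any concrete on-device representation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CameraPose:
    """One camera pose per video frame: a 4x4 world-from-camera transform."""
    frame_index: int
    transform: list  # 16 floats, row-major 4x4 matrix

@dataclass
class SceneData:
    """Scene-level data; every field is optional since only 'at least one' is required."""
    mesh: Optional[list] = None       # scene model data, e.g. scene mesh vertices
    shielding: Optional[dict] = None  # object shielding (occlusion) relations
    planes: Optional[list] = None     # identified planes: position and size
    anchors: Optional[list] = None    # key points, e.g. the scene's geometric center

@dataclass
class OriginalSceneInfo:
    poses: list = field(default_factory=list)  # one CameraPose per video frame
    scene: SceneData = field(default_factory=SceneData)

info = OriginalSceneInfo()
info.poses.append(CameraPose(0, [1.0, 0, 0, 0, 0, 1.0, 0, 0,
                                 0, 0, 1.0, 0, 0, 0, 0, 1.0]))
info.scene.anchors = [{"id": "scene_center", "position": [0.0, 0.0, 0.0]}]
print(len(info.poses))
```

The one-pose-per-frame pairing is what later allows the editor to re-project each video frame into the reconstructed scene.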
In one embodiment, the video capture device may receive an input operation from a user and, in response to the input operation, collect the original video data and the original three-dimensional scene information based on a preset augmented reality technology.
Please refer to fig. 2, which is a schematic diagram of triggering data acquisition provided in an embodiment of the present disclosure. As shown in fig. 2, the video capturing device may be a terminal device, for example the camera device in a mobile phone. Specifically, after the user opens the camera device and enters its initial interface, clicking the "AR" option shown at 201 in fig. 2 and then clicking the recording component shown at 202 triggers the camera device to capture the original video data based on the preset augmented reality technology while simultaneously collecting the original three-dimensional scene information.
It should be noted that, in the embodiments of the present disclosure, the preset augmented reality technology may be determined according to the technology type corresponding to the video capturing device. For example, where the technology type is a first preset type, the original video data and the original three-dimensional scene information may be acquired based on the ARKit (Augmented Reality Kit) technology; where the technology type is a second preset type, they may be acquired based on the ARCore (Augmented Reality Core) technology; and where the technology type is a third preset type, they may be acquired based on technologies other than ARKit and ARCore, such as simultaneous localization and mapping (SLAM) technology.
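This three-way dispatch can be expressed as a simple lookup. The type labels and the helper name below are illustrative placeholders, not terms from the patent:

```python
def pick_ar_backend(tech_type: str) -> str:
    # Map the device's technology type to the capture framework named in
    # the text; the string keys here are invented for the sketch.
    table = {
        "first_preset_type": "ARKit",
        "second_preset_type": "ARCore",
        "third_preset_type": "SLAM",
    }
    if tech_type not in table:
        raise ValueError(f"unknown technology type: {tech_type}")
    return table[tech_type]

print(pick_ar_backend("second_preset_type"))
```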
After the video acquisition device acquires the original video data and the three-dimensional scene information, the two types of information can be provided to the video editing device so that the video editing device can reconstruct the three-dimensional video scene based on the two types of information for video editing processing of a user.
In a specific implementation, the video acquisition device can provide the original video data and the original three-dimensional scene information to the video editing device in any one of the following ways: first, encapsulating the original video data and the original three-dimensional scene information based on a first preset video file encapsulation format to obtain a first original video file, and sending the first original video file to the video editing device; second, encoding the original video data and the original three-dimensional scene information based on a preset video encoding protocol to obtain a target encoded packet, and sending the target encoded packet to the video editing device; third, encapsulating the original video data based on a second preset video file encapsulation format to obtain a second original video file, storing the three-dimensional scene information in an original three-dimensional information file, and sending the second original video file and the original three-dimensional information file to the video editing device; and fourth, encoding the original video data and the original three-dimensional scene information based on a preset communication protocol to obtain a target data frame, and sending the target data frame to the video editing device.
Specifically, after capturing the original video data and the original three-dimensional scene information, the video capture device may encapsulate the original video data in a video file, for example in the MP4 file encapsulation format.
For example, the audio/video data in the original video data may be encoded according to a video coding standard such as H.264 or H.265, and the resulting code stream encapsulated into boxes according to the standard box protocol of the MP4 encapsulation format. The original three-dimensional scene information may be encapsulated into separate boxes using a custom protocol: for example, each frame of camera pose data may be encapsulated into one box, and the three-dimensional scene data into one or more further boxes. All the boxes are then packaged as an MP4 file, which is provided to the video editing device; the video editing device obtains the original video data and the original three-dimensional scene information by parsing the MP4 file.
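The custom-box idea can be sketched minimally as follows. An ISO BMFF (MP4) box is a 32-bit big-endian size covering the 8-byte header, followed by a 4-character type and the payload; the `cpos` box type and the pose payload layout below are invented for illustration and are not part of any standard or of the patent:

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    # size (4 bytes, big-endian, includes the 8-byte header) + type + payload
    assert len(box_type) == 4
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# Hypothetical custom box carrying one frame's camera pose as a
# row-major 4x4 matrix of 16 big-endian floats.
pose = [1.0, 0, 0, 0, 0, 1.0, 0, 0,
        0, 0, 1.0, 0, 0, 0, 0, 1.0]
box = make_box(b"cpos", struct.pack(">16f", *pose))
print(len(box))  # 8-byte header + 64-byte payload
```

The editing side would walk the file's boxes, reading the standard ones as ordinary MP4 content and decoding boxes of the custom type back into pose data.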
Of course, in a specific implementation, the original video data and the original three-dimensional scene information may also be provided to the video editing device by directly transmitting the target encoded packet. For example, the original video data may be encoded based on the H.264 coding standard to obtain corresponding code stream data; the target encoded packet is then formed by filling each frame of camera pose data into the supplemental enhancement information (SEI) field and carrying the three-dimensional scene data in the metadata when encapsulating the code stream, and the target encoded packet is provided to the video editing device.
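A minimal sketch of the SEI route, assuming an H.264 "user data unregistered" SEI message (payload type 5), which carries a 16-byte identifying UUID followed by free-form data — here the 16 pose floats. The all-zero UUID is a placeholder, and a real encoder must additionally insert emulation-prevention bytes, which this sketch omits:

```python
import struct

def make_pose_sei(pose16):
    # NAL unit type 6 = SEI; payload type 5 = user_data_unregistered.
    payload = bytes(16) + struct.pack(">16f", *pose16)  # placeholder UUID + pose
    nal = bytes([0x06, 5])
    size = len(payload)
    while size >= 255:          # payload sizes are coded in 255-byte chunks
        nal += bytes([255])
        size -= 255
    nal += bytes([size]) + payload
    return nal + bytes([0x80])  # RBSP stop bit

sei = make_pose_sei([0.0] * 16)
print(len(sei))
```

Because decoders are required to skip SEI messages they do not understand, the stream stays playable by ordinary players while the editing device recovers the pose data.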
In addition, the original video data and the original three-dimensional scene information may also be provided to the video editing device by storing the original three-dimensional scene information in an external file, as in the third item above; in this case the second preset video file encapsulation format may be a video file encapsulation format based on a standard protocol, for example the standard MP4 file encapsulation format.
After obtaining these two types of information, the video editing device carries out three-dimensional video scene reconstruction processing based on them, so as to provide a video editing scene for the user.
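The external-file option can be sketched as a standard MP4 plus a separate structured sidecar; every field name in this JSON layout is invented for the illustration:

```python
import json

# Hypothetical sidecar content written next to the standard MP4 file.
scene_info = {
    "poses": [{"frame": 0,
               "transform": [1, 0, 0, 0, 0, 1, 0, 0,
                             0, 0, 1, 0, 0, 0, 0, 1]}],
    "planes": [{"center": [0, 0, 0], "size": [2.0, 1.5]}],
    "anchors": [{"id": "scene_center", "position": [0, 0, 0]}],
}

# The capture side serializes the scene information ...
sidecar = json.dumps(scene_info)
# ... and the editing side parses both files to rebuild the scene.
restored = json.loads(sidecar)
print(restored["poses"][0]["frame"])
```

Keeping the MP4 fully standard is the appeal of this option: any player can show the footage, and only scene-aware editors need to read the sidecar.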
That is, after step S1100, step S1200 is performed to reconstruct a three-dimensional video scene from the original video data and the original three-dimensional scene information.
Please refer to fig. 3, which is a schematic flow chart of the three-dimensional scene reconstruction process provided in an embodiment of the present disclosure. As shown in fig. 3, reconstructing a three-dimensional video scene from the original video data and the original three-dimensional scene information comprises: step S1210, obtaining, from the original three-dimensional scene information, the per-frame camera pose data and the three-dimensional scene data corresponding to the original video data; and step S1220, performing three-dimensional scene reconstruction processing according to the original video data, the per-frame camera pose data and the three-dimensional scene data to obtain the three-dimensional video scene.
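The two steps can be sketched as one function. The data shapes are assumed; a real implementation would hand each frame and its pose to a rendering engine rather than return plain lists:

```python
def reconstruct_scene(video_frames, scene_info):
    # Step S1210: pull the per-frame poses and the scene data out of
    # the original three-dimensional scene information.
    poses = scene_info["poses"]
    scene_data = scene_info.get("scene", {})

    # Step S1220: pair every video frame with its camera pose so the
    # editor can place the footage inside the reconstructed scene.
    if len(video_frames) != len(poses):
        raise ValueError("each video frame needs exactly one camera pose")
    return {
        "frames": list(zip(video_frames, poses)),
        "mesh": scene_data.get("mesh"),
        "planes": scene_data.get("planes", []),
    }

scene = reconstruct_scene(["frame0", "frame1"],
                          {"poses": ["pose0", "pose1"], "scene": {"planes": []}})
print(len(scene["frames"]))
```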
Step S1300, receiving an editing operation of a user for the three-dimensional video scene, and generating target video data in response to the editing operation.
After reconstructing the three-dimensional video scene in step S1200, the user may perform three-dimensional video editing processing based on the three-dimensional video scene.
In one embodiment, the editing operation includes an operation of adding a virtual three-dimensional element, and the generating the target video data in response to the editing operation includes re-performing a video data rendering process based on the added virtual three-dimensional element to generate the target video data including the virtual three-dimensional element.
In this embodiment, the virtual three-dimensional element may be a virtual element such as an advertisement tile, a three-dimensional virtual object, a three-dimensional animation, a special effect, or the like.
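A toy sketch of the re-rendering step described above. The per-frame shielding set stands in for the real object-shielding information, and all names are invented; actual rendering would composite the element per pixel with depth testing rather than per frame:

```python
def render_with_element(frames, element, shielded_frames):
    # Composite the added virtual three-dimensional element into every
    # frame; leave it out where the real scene occludes it entirely.
    out = []
    for frame, pose in frames:
        visible = frame not in shielded_frames
        out.append((frame, element if visible else None))
    return out

target = render_with_element([("f0", "p0"), ("f1", "p1")], "billboard", {"f1"})
print(target)
```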
In summary, in the method provided by the embodiments of the present disclosure, the video editing device obtains original video data together with original three-dimensional scene information representing the real scene during collection of that data, reconstructs a three-dimensional video scene from them, and thus provides the user with a three-dimensional video scene in which video data can be edited conveniently and directly; on this basis, target video data is generated by receiving the user's editing operation and responding to it.
< Method example two >
Corresponding to the first method embodiment, the embodiments of the present disclosure further provide a video capturing method; fig. 4 is a schematic flow chart of the video capturing method provided in an embodiment of the present disclosure. The method may be implemented by a video capturing device, which may specifically be a terminal device, for example a mobile phone, a tablet computer or a personal computer, and is not specifically limited herein.
As shown in fig. 4, the method of the present embodiment may include the following steps S4100-S4200, which are described in detail below.
In step S4100, an input operation by the user is received.
In step S4200, in response to the input operation, original video data and original three-dimensional scene information are acquired based on a preset augmented reality technology, wherein the original three-dimensional scene information is information representing a real scene in the process of acquiring the original video data.
In one embodiment, after obtaining the original video data and the original three-dimensional scene information, the method provides them to a video editing device in any one of the following ways: first, encapsulating the original video data and the original three-dimensional scene information based on a first preset video file encapsulation format to obtain a first original video file, and transmitting the first original video file to the video editing device; second, encoding the original video data and the original three-dimensional scene information based on a preset video encoding protocol to obtain a target encoded packet, and transmitting the target encoded packet to the video editing device; third, encapsulating the original video data based on a second preset video file encapsulation format to obtain a second original video file, storing the three-dimensional scene information in an original three-dimensional information file, and transmitting the second original video file and the original three-dimensional information file to the video editing device; and fourth, encoding the original video data and the original three-dimensional scene information based on a preset communication protocol to obtain a target data frame, and transmitting the target data frame to the video editing device.
< Device example one >
In this embodiment, a video editing apparatus is provided; as shown in fig. 5, the apparatus 500 may include an original data acquisition module 510, a three-dimensional video scene reconstruction module 520, and a target video data generation module 530.
The original data acquisition module 510 is configured to acquire original video data and original three-dimensional scene information, the original three-dimensional scene information being information representing the real scene during collection of the original video data; the three-dimensional video scene reconstruction module 520 is configured to reconstruct a three-dimensional video scene from the original video data and the original three-dimensional scene information; and the target video data generation module 530 is configured to receive an editing operation of a user on the three-dimensional video scene and generate target video data in response to the editing operation.
< Device example two >
In this embodiment, a video capturing apparatus is further provided, and as illustrated in fig. 6, the apparatus 600 may include a receiving module 610 and a capturing module 620.
The receiving module 610 is configured to receive an input operation of a user; the acquisition module 620 is configured to acquire, in response to the input operation, original video data and original three-dimensional scene information based on a preset augmented reality technology, the original three-dimensional scene information being information representing the real scene during collection of the original video data.
< Device example >
In this embodiment, there is also provided an electronic device 700; as illustrated in fig. 7, it may include a processor 720 and a memory 710, the memory 710 being configured to store executable instructions and the processor 720 being configured to operate the electronic device under control of the instructions to perform a method according to any embodiment of the present disclosure.
< Computer-readable storage Medium embodiment >
The present embodiment provides a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, perform the method described in any of the method embodiments of the present specification.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present description. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present specification have been described above, and the above description is illustrative, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the application is defined by the appended claims.
Claims (9)
1. A video editing method, comprising:
acquiring original video data and original three-dimensional scene information, wherein the original three-dimensional scene information is information representing a real scene in the process of acquiring the original video data, and the original video data and the original three-dimensional scene information are acquired by a video acquisition device based on a preset augmented reality technology;
acquiring each frame of camera pose data and three-dimensional scene data corresponding to the original video data according to the original three-dimensional scene information, wherein the three-dimensional scene data comprises at least one of scene model data, object shielding information, scene identification plane information and scene anchor point information;
performing three-dimensional scene reconstruction processing according to the original video data, the camera pose data of each frame and the three-dimensional scene data to obtain a three-dimensional video scene;
and receiving editing operation of a user for the three-dimensional video scene, and generating target video data in response to the editing operation.
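The four claimed steps can be sketched as a plain data pipeline. All class and field names below (`ScenePose`, `SceneInfo`, `Scene3D`, and so on) are hypothetical illustrations of the claim's data flow, not the patented implementation; a real system would render through a 3D engine rather than build dictionaries.

```python
from dataclasses import dataclass, field

@dataclass
class ScenePose:
    frame_index: int
    camera_pose: tuple  # illustrative: (tx, ty, tz, qx, qy, qz, qw)

@dataclass
class SceneInfo:
    poses: list        # per-frame camera pose data
    scene_data: dict   # model data, occlusion, planes, anchors

@dataclass
class Scene3D:
    frames: list       # (video frame, pose) pairs
    scene_data: dict
    virtual_elements: list = field(default_factory=list)

def reconstruct_scene(video_frames, scene_info):
    """Step 3: pair each video frame with its camera pose."""
    frames = list(zip(video_frames, scene_info.poses))
    return Scene3D(frames=frames, scene_data=scene_info.scene_data)

def apply_edit(scene, element):
    """Step 4 (and claim 2): add a virtual 3D element to the scene."""
    scene.virtual_elements.append(element)
    return scene

def render_target_video(scene):
    """Re-render each frame with the added elements composited in."""
    return [
        {"frame": f, "pose": p.camera_pose, "overlays": list(scene.virtual_elements)}
        for f, p in scene.frames
    ]

video = ["f0", "f1"]
info = SceneInfo(
    poses=[ScenePose(0, (0, 0, 0, 0, 0, 0, 1)), ScenePose(1, (0.1, 0, 0, 0, 0, 0, 1))],
    scene_data={"planes": ["floor"], "anchors": []},
)
scene = apply_edit(reconstruct_scene(video, info), {"type": "cube"})
target = render_target_video(scene)
print(len(target), target[0]["overlays"])
```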
2. The method of claim 1, wherein the editing operation comprises an operation of adding a virtual three-dimensional element; and
the generating target video data in response to the editing operation comprises:
re-rendering the video data based on the added virtual three-dimensional element to generate target video data containing the virtual three-dimensional element.
3. The method of claim 1, wherein the acquiring original video data and original three-dimensional scene information comprises at least one of:
receiving a first original video file obtained by the video acquisition device by packaging the original video data and the original three-dimensional scene information based on a first preset video file packaging format, and obtaining the original video data and the original three-dimensional scene information by parsing the first original video file;
receiving a target encoded packet obtained by the video acquisition device by encoding the original video data and the original three-dimensional scene information based on a preset video encoding protocol, and obtaining the original video data and the original three-dimensional scene information by parsing the target encoded packet;
receiving a second original video file and an original three-dimensional information file provided by the video acquisition device, and obtaining the original video data and the original three-dimensional scene information by parsing the second original video file and the original three-dimensional information file, wherein the second original video file is obtained by an acquisition terminal by packaging the original video data based on a second preset video file packaging format, and the original three-dimensional information file stores the original three-dimensional scene information; and
obtaining a target data frame provided by the video acquisition device, the target data frame being obtained by encoding the original video data and the original three-dimensional scene information based on a preset communication protocol, and obtaining the original video data and the original three-dimensional scene information by parsing the target data frame.
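The third variant above (a video file plus a separate three-dimensional information file) can be sketched as a simple round trip. The file layout here is purely illustrative; a real implementation would use a standard container such as MP4 for the video file and a defined schema for the sidecar file.

```python
import io
import json

# Hypothetical sketch of claim 3's third variant: the "second original
# video file" and the "original three-dimensional information file" are
# produced separately, then parsed back into the original video data
# and the original 3D scene information.
def pack(video_bytes, scene_info):
    video_file = io.BytesIO(video_bytes)                     # second original video file
    info_file = io.BytesIO(json.dumps(scene_info).encode())  # original 3D information file
    return video_file, info_file

def parse(video_file, info_file):
    return video_file.read(), json.loads(info_file.read().decode())

v, i = pack(b"\x00\x01", {"poses": [[0, 0, 0]], "anchors": []})
video, info = parse(v, i)
print(video, info["anchors"])
```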
4. A video acquisition method, the method being applied to a video acquisition device, the method comprising:
receiving input operation of a user;
in response to the input operation, acquiring original video data and original three-dimensional scene information based on a preset augmented reality technology, wherein the original three-dimensional scene information is information representing the real scene during acquisition of the original video data, and comprises per-frame camera pose data corresponding to each video frame of the original video data and three-dimensional scene data representing the real scene, the three-dimensional scene data comprising at least one of scene model data, object occlusion information, recognized scene plane information, and scene anchor point information; wherein the original video data, the per-frame camera pose data, and the three-dimensional scene data are used for three-dimensional scene reconstruction to obtain a three-dimensional video scene, and the three-dimensional video scene is used for receiving an editing operation of a user on the three-dimensional video scene and generating target video data in response to the editing operation.
5. The method according to claim 4, wherein, after the original video data and the original three-dimensional scene information are obtained, the method provides the original video data and the original three-dimensional scene information to a video editing apparatus in any one of the following ways:
packaging the original video data and the original three-dimensional scene information based on a first preset video file packaging format to obtain a first original video file, and sending the first original video file to the video editing apparatus;
encoding the original video data and the original three-dimensional scene information based on a preset video encoding protocol to obtain a target encoded packet, and sending the target encoded packet to the video editing apparatus;
packaging the original video data based on a second preset video file packaging format to obtain a second original video file, storing the original three-dimensional scene information in an original three-dimensional information file, and sending the second original video file and the original three-dimensional information file to the video editing apparatus; and
encoding the original video data and the original three-dimensional scene information based on a preset communication protocol to obtain a target data frame, and sending the target data frame to the video editing apparatus.
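The first packaging option above (one combined "first original video file") can be sketched with a length-prefixed layout. The format is a hypothetical stand-in: a real implementation would mux the 3D scene information into a metadata track of a standard container rather than concatenate bytes.

```python
import json
import struct

# Hypothetical sketch: package the original video data and 3D scene
# info into one blob (4-byte big-endian video length, then video bytes,
# then JSON-encoded scene info), and unpack it on the editing side.
def package(video_bytes, scene_info):
    info_bytes = json.dumps(scene_info).encode()
    return struct.pack(">I", len(video_bytes)) + video_bytes + info_bytes

def unpack(blob):
    (n,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + n], json.loads(blob[4 + n:].decode())

blob = package(b"frames", {"poses": 2})
video, info = unpack(blob)
print(video, info)
```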
6. A video editing apparatus, comprising:
an original data acquisition module, configured to acquire original video data and original three-dimensional scene information, wherein the original three-dimensional scene information is information representing the real scene during acquisition of the original video data, and the original video data and the original three-dimensional scene information are acquired by a video acquisition device based on a preset augmented reality technology;
a three-dimensional video scene reconstruction module, configured to obtain, from the original three-dimensional scene information, per-frame camera pose data and three-dimensional scene data corresponding to the original video data, wherein the three-dimensional scene data comprises at least one of scene model data, object occlusion information, recognized scene plane information, and scene anchor point information; and
a target video data generation module, configured to receive an editing operation of a user on the three-dimensional video scene and generate target video data in response to the editing operation.
7. A video capture device, comprising:
a receiving module, configured to receive an input operation of a user; and
an acquisition module, configured to, in response to the input operation, acquire original video data and original three-dimensional scene information based on a preset augmented reality technology, wherein the original three-dimensional scene information is information representing the real scene during acquisition of the original video data, and comprises per-frame camera pose data corresponding to each video frame of the original video data and three-dimensional scene data representing the real scene, the three-dimensional scene data comprising at least one of scene model data, object occlusion information, recognized scene plane information, and scene anchor point information; wherein the original video data, the per-frame camera pose data, and the three-dimensional scene data are used for three-dimensional scene reconstruction to obtain a three-dimensional video scene, and the three-dimensional video scene is used for receiving an editing operation of a user on the three-dimensional video scene and generating target video data in response to the editing operation.
8. An electronic device, comprising:
a memory for storing executable instructions;
a processor, configured to execute the executable instructions to control the electronic device to perform the method according to any one of claims 1-5.
9. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when read and executed by a computer, causes the computer to perform the method according to any one of claims 1-5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111629147.3A CN114401451B (en) | 2021-12-28 | 2021-12-28 | Video editing method, device, electronic device and readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114401451A CN114401451A (en) | 2022-04-26 |
| CN114401451B true CN114401451B (en) | 2025-04-04 |
Family
ID=81229771
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111629147.3A Active CN114401451B (en) | 2021-12-28 | 2021-12-28 | Video editing method, device, electronic device and readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114401451B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114581611B (en) * | 2022-04-28 | 2022-09-20 | 阿里巴巴(中国)有限公司 | Virtual scene construction method and device |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109803094A (en) * | 2018-12-18 | 2019-05-24 | 北京美吉克科技发展有限公司 | A kind of virtual three-dimensional scene editing system, method and device |
| CN110322542A (en) * | 2018-03-28 | 2019-10-11 | 苹果公司 | Rebuild the view of real world 3D scene |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070268406A1 (en) * | 2006-05-22 | 2007-11-22 | Broadcom Corporation, A California Corporation | Video processing system that generates sub-frame metadata |
| CN110049266A (en) * | 2019-04-10 | 2019-07-23 | 北京字节跳动网络技术有限公司 | Video data issues method, apparatus, electronic equipment and storage medium |
| CN111583348B (en) * | 2020-05-09 | 2024-03-29 | 维沃移动通信有限公司 | Image data encoding method and device, image data displaying method and device and electronic equipment |
| CN112070901A (en) * | 2020-07-21 | 2020-12-11 | 马小淞 | A garden AR scene construction method, device, storage medium and terminal |
| CN112135091A (en) * | 2020-08-27 | 2020-12-25 | 杭州张量科技有限公司 | Monitoring scene marking method and device, computer equipment and storage medium |
| CN113253842A (en) * | 2021-05-20 | 2021-08-13 | 深圳市商汤科技有限公司 | Scene editing method and related device and equipment |
2021-12-28: application CN202111629147.3A filed (patent CN114401451B, status Active)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110322542A (en) * | 2018-03-28 | 2019-10-11 | 苹果公司 | Rebuild the view of real world 3D scene |
| CN109803094A (en) * | 2018-12-18 | 2019-05-24 | 北京美吉克科技发展有限公司 | A kind of virtual three-dimensional scene editing system, method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114401451A (en) | 2022-04-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11200701B2 (en) | Method and apparatus for storage and signaling of static point cloud data | |
| KR102450781B1 (en) | Method and apparatus for encoding media data comprising generated content | |
| CN110192392A (en) | Method and apparatus for deriving composite rail | |
| US12401847B2 (en) | Mapping architecture of immersive technologies media format (ITMF) specification with rendering engines | |
| US11922561B2 (en) | Methods and systems for implementing scene descriptions using derived visual tracks | |
| CN114401451B (en) | Video editing method, device, electronic device and readable storage medium | |
| TWI802204B (en) | Methods and systems for derived immersive tracks | |
| CN112771878B (en) | Method, client and server for processing media data | |
| CN119586120A (en) | Information processing device, information processing method and computer program | |
| KR20240124965A (en) | Method and device for defining frames and time reference network abstraction layer (NALS) structure in haptic signals | |
| HK40070809B (en) | Processing method of immersive media, device, equipment and storage medium | |
| WO2025219865A1 (en) | Method and apparatus for storage of generative artificial intelligence based dynamic content | |
| CN118381972A (en) | Data processing method, device, equipment and storage medium | |
| CN117939252A (en) | Data processing method, device, equipment and storage medium | |
| WO2024201180A1 (en) | Encapsulation of volumetric video with static and dynamic type components | |
| JP2005045400A (en) | Object generation apparatus and method | |
| JP2024006878A (en) | Information processing device, imaging device, information processing method, computer program | |
| CN117827416A (en) | Multi-process processing method and device for video, computer equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||