
CN114255164A - Image processing method and device


Info

Publication number: CN114255164A (this application); CN114255164B (granted publication)
Application number: CN202011003309.8A
Authority: CN (China)
Prior art keywords: image, target, time, moment, result
Legal status: Granted; Active
Inventors: 高占宁, 林宪晖, 任沛然, 杨涛, 谢宣松, 張磊
Assignee (original and current): Alibaba Group Holding Ltd
Other languages: Chinese (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present specification provide an image processing method and apparatus. The image processing method includes: acquiring a first time image and a second time image, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image; determining a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determining a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image; obtaining an image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map; and determining a target image at the target time based on the image weight value, the first result, and the second result, thereby effectively enhancing the image processing effect.

Description

Image processing method and device
Technical Field
Embodiments of the present specification relate to the field of image processing technology, and in particular to an image processing method. One or more embodiments of the present specification also relate to an image processing apparatus, a computing device, and a computer-readable storage medium.
Background
With the development of high-definition digital television and high-end multimedia information systems, people's requirements on the visual quality of video sources keep rising, so the frame rate of existing video program sources needs to be increased to achieve a better visual effect. Existing methods for increasing the frame rate usually rely on simple video frame duplication, or on motion estimation and synthesis based on traditional motion estimation algorithms. These methods cannot effectively handle complex motion scenes, shot transitions, and similar conditions, and therefore cannot produce a smooth frame interpolation result that improves the visual effect of the video.
Therefore, it is desirable to provide an image processing method that can solve the above-described problems.
Disclosure of Invention
In view of this, the present specification provides an image processing method. One or more embodiments of the present specification also relate to an image processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
According to a first aspect of the embodiments herein, there is provided an image processing method, including:
acquiring a first time image and a second time image, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image;
determining a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determining a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
obtaining an image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map; and
determining a target image at the target time based on the image weight value, the first result, and the second result.
According to a second aspect of embodiments of the present specification, there is provided an image processing method including:
displaying an image input interface to a user based on an image processing request from the user;
acquiring a first time image and a second time image input by the user through the image input interface, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image;
determining a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determining a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
obtaining an image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map; and
determining a target image at the target time based on the image weight value, the first result, and the second result, and returning the target image at the target time to the user.
According to a third aspect of embodiments of the present specification, there is provided an image processing method including:
receiving an image processing request sent by a user, the request carrying a first time image and a second time image;
acquiring the first time image and the second time image, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image;
determining a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determining a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
obtaining an image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map; and
determining a target image at the target time based on the image weight value, the first result, and the second result, and returning the target image at the target time to the user.
According to a fourth aspect of embodiments herein, there is provided an image processing apparatus comprising:
an acquisition module configured to acquire a first time image and a second time image, and to determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determination module configured to determine a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and to determine a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value at the target time from the first result, the first optical flow map, the second result, and the second optical flow map; and
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result.
According to a fifth aspect of embodiments herein, there is provided an image processing apparatus comprising:
a presentation module configured to present an image input interface to a user based on an image processing request from the user;
an acquisition module configured to acquire a first time image and a second time image input by the user through the image input interface, and to determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determination module configured to determine a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and to determine a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value at the target time from the first result, the first optical flow map, the second result, and the second optical flow map; and
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result, and to return the target image at the target time to the user.
According to a sixth aspect of embodiments herein, there is provided an image processing apparatus comprising:
a receiving module configured to receive an image processing request sent by a user, the request carrying a first time image and a second time image;
an acquisition module configured to acquire the first time image and the second time image, and to determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determination module configured to determine a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and to determine a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value at the target time from the first result, the first optical flow map, the second result, and the second optical flow map; and
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result, and to return the target image at the target time to the user.
According to a seventh aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, wherein the processor implements the steps of the image processing method when executing the computer-executable instructions.
According to an eighth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the image processing methods.
One embodiment of the present specification implements an image processing method and an image processing apparatus. The image processing method includes: acquiring a first time image and a second time image, and determining a first time sub-image and a second time sub-image based on them; determining a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determining a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image; obtaining an image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map; and determining the target image at the target time based on the image weight value, the first result, and the second result, thereby greatly improving the image processing effect.
In other words, the image processing method obtains the first time image, the second time image, the first time sub-image, and the second time sub-image; determines an image weight value at the target time; determines a target image at the target time based on that weight value; and inserts the target image between the first time image and the second time image. This increases the frame rate of the video, improves the resolution of the video images, and achieves a smoother visual effect even in complex motion scenes and shot transitions.
Drawings
Fig. 1 is a diagram of a specific example of the image processing method applied to a video frame interpolation scene, according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a first image processing method provided in an embodiment of the present description;
Fig. 3A is a flowchart of determining the first result and the first optical flow map at a target time in a video frame interpolation scene, according to an embodiment of the present disclosure;
Fig. 3B is a flowchart of determining the second result and the second optical flow map at a target time in a video frame interpolation scene, according to an embodiment of the present disclosure;
Fig. 3C is a flowchart of determining the image weight value in a video frame interpolation scene, according to an embodiment of the present disclosure;
Fig. 4A is a flowchart of defect detection in a video frame interpolation scene, according to an embodiment of the present disclosure;
Fig. 4B is a flowchart of the image processing method in a video frame interpolation scene, according to an embodiment of the present specification;
Fig. 5 is a flowchart of a second image processing method provided in an embodiment of the present description;
Fig. 6 is a flowchart of a third image processing method provided in an embodiment of the present description;
Fig. 7 is a schematic structural diagram of a first image processing apparatus provided in an embodiment of the present specification;
Fig. 8 is a schematic structural diagram of a second image processing apparatus provided in an embodiment of the present specification;
Fig. 9 is a schematic structural diagram of a third image processing apparatus provided in an embodiment of the present specification;
Fig. 10 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of the present description. The description may, however, be embodied in many forms other than those set forth herein, and those skilled in the art can make similar generalizations without departing from its spirit; the description is therefore not limited to the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present description, "first" may also be referred to as "second" and, similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Video frame interpolation: generating a new video frame from the information of the previous frame and the next frame, thereby increasing the video frame rate and the smoothness of playback.
Flaw detection: inspecting the frame interpolation result and identifying video frames with obvious interpolation flaws.
Optical flow: the amount by which a pixel belonging to the same object in the current frame of a video moves to reach the next frame.
Attention model: any module that predicts a weight map; it can be implemented through deep network learning or in other ways.
In the present specification, an image processing method is provided, and the present specification simultaneously relates to an image processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
In a video frame interpolation scene, the video images from a video acquisition terminal may have a low resolution in some complex motion scenes and transition situations, giving the user a poor visual experience.
For ease of understanding, the following describes in detail how the image processing method is applied to a video frame interpolation scene.
Referring to fig. 1, fig. 1 is a diagram illustrating a specific example of the image processing method applied to a video frame interpolation scene, according to an embodiment of the present disclosure.
The application scene of fig. 1 includes an image acquisition terminal 102 and a server 104. Specifically, the image acquisition terminal 102 acquires a first time image and a second time image of a video, where the first time image is the i-th frame of the video and the second time image is the (i+1)-th frame, and sends them to the server 104. The server 104 receives the two images and performs down-sampling on them to obtain a first time sub-image and a second time sub-image, where the first time sub-image is a small-scale image obtained by down-sampling the first time image and the second time sub-image is a small-scale image obtained by down-sampling the second time image. The server then inputs the first time image and the second time image into an image interpolation model to obtain the first result and the first optical flow map at the target time, and inputs the first time sub-image and the second time sub-image into the same model to obtain the second result and the second optical flow map at the target time, where the second result is the small-scale interpolation result at the target time and the second optical flow map is the optical flow map of the small-scale images.
Next, the server 104 inputs the obtained first result, first optical flow map, second result, and second optical flow map into a pre-trained attention model to obtain an attention weight value at the target time. The server 104 then performs multi-scale fusion based on the attention weight value, the first result, and the second result at the target time to obtain the target image at the target time, where the target image lies at a time between the i-th frame and the (i+1)-th frame.
In summary, two consecutive frames are first down-sampled to obtain small-scale images below the original scale; the large-scale frame pair and the small-scale frame pair are then each fed into the image interpolation model to obtain a large-scale interpolation result and a small-scale interpolation result at the target time; the two results are fused using the attention weight value produced by the attention model; and the fused target image is inserted between the two consecutive frames. This increases the number of frames in the video, making playback smoother and improving the visual effect.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first image processing method provided according to an embodiment of the present specification, which includes the following steps:
Step 202: Acquire a first time image and a second time image, and determine a first time sub-image and a second time sub-image based on the first time image and the second time image.
The first time image and the second time image may be any two consecutive frames in any video, such as an entertainment video, a movie, or a news video. For example, in a piece of entertainment video, the acquired first time image and second time image are two consecutive frames of that video.
In a specific implementation, the determining of the first time sub-image and the second time sub-image based on the first time image and the second time image includes:
performing image scale processing on the first time image and the second time image to obtain the first time sub-image and the second time sub-image.
In practical applications, image scale processing is performed on the first time image and the second time image to reduce their scale, yielding the first time sub-image and the second time sub-image. The scale reduction can be implemented by scaling the images with down-sampling, or by bilinear interpolation; either way, the first time image and the second time image are sampled down to a smaller scale. In other words, the first time image and the second time image acquired from the video acquisition terminal are consecutive images at the original scale of the video, while the down-sampled first time sub-image and second time sub-image are consecutive small-scale images; relative to the small-scale images, the original-scale images may also be called large-scale images.
In a specific implementation, of the two consecutive frames, the first time image is the i-th frame, the second time image is the (i+1)-th frame, and the target image at the target time is an image between the i-th frame and the (i+1)-th frame.
In practical applications, the i-th frame image and the (i+1)-th frame image are acquired and down-sampled to obtain the i-th frame sub-image and the (i+1)-th frame sub-image, which are the small-scale counterparts of the two frames. For example, the 0th frame and the 1st frame are acquired and each is down-sampled to obtain a small-scale 0th frame sub-image and a small-scale 1st frame sub-image. The down-sampling may scale the length and width of the image, for instance halving a 1080P image to obtain a 540P image, or it may apply bilinear interpolation to the original image to obtain the down-sampled image.
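For illustration, the down-sampling step can be sketched as follows. This is a minimal example assuming OpenCV is available; the file names and the halving factor are assumptions for illustration, not part of the disclosure.

```python
import cv2

def downsample(img, factor=2):
    # Bilinear down-sampling: a 1920x1080 frame becomes 960x540 when factor=2.
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_LINEAR)

# Hypothetical file names for two consecutive frames (frame i and frame i+1).
frame_i = cv2.imread("frame_i.png")
frame_i1 = cv2.imread("frame_i_plus_1.png")

sub_i = downsample(frame_i)    # first time sub-image (small scale)
sub_i1 = downsample(frame_i1)  # second time sub-image (small scale)
```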
The image processing method provided in this embodiment down-samples two consecutive frames of the original video to obtain small-scale images. This makes it convenient to subsequently feed both the original frames and the small-scale images into the image interpolation model to obtain the first result, the first optical flow map, the second result, and the second optical flow map, and processing the down-sampled small-scale images greatly speeds up the interpolation.
Step 204: Determine a first result at a target time and a first optical flow map at the target time based on the first time image and the second time image, and determine a second result at the target time and a second optical flow map at the target time based on the first time sub-image and the second time sub-image.
Here the target image at the target time is an image between the i-th frame and the (i+1)-th frame; the first result at the target time is the first result for that intermediate image, and the first optical flow map at the target time is the first optical flow map for that intermediate image.
Specifically, the determining of the first result at the target time and the first optical flow map at the target time based on the first time image and the second time image includes:
inputting the first time image and the second time image into an image processing model to obtain a first-scale frame interpolation result at the target time and a first-scale optical flow map at the target time.
In a specific implementation, the image processing model may be any of a variety of models, as long as it implements a frame interpolation algorithm based on optical flow estimation. Inputting the first time image and the second time image into the image processing model for interpolation yields the first-scale interpolation result and the first-scale optical flow map for an interpolated image at any time between the two inputs; the first-scale interpolation result is the large-scale interpolation result based on the first time image and the second time image, and the first-scale optical flow map is the large-scale optical flow estimation result based on the same two images.
In this way, the first-scale frame interpolation result and the first-scale optical flow map at the target time are obtained, where the first-scale optical flow map is a motion estimation map of the large-scale image in a complex motion scene. Based on these, subsequent computation produces a more accurate target image for frame interpolation, increasing the number of frames so that the video plays more smoothly and exhibits a better visual effect.
In addition, specifically, the determining of the second result at the target time and the second optical flow map at the target time based on the first time sub-image and the second time sub-image includes:
inputting the first time sub-image and the second time sub-image into the image processing model to obtain a second-scale frame interpolation result at the target time and a second-scale optical flow map at the target time.
In a specific implementation, as above, any model implementing an optical-flow-based frame interpolation algorithm can be used. Inputting the first time sub-image and the second time sub-image into the image processing model for interpolation yields the second-scale interpolation result and the second-scale optical flow map for an interpolated image at any time between the two sub-images; the second-scale interpolation result is the small-scale interpolation result based on the two sub-images, and the second-scale optical flow map is the small-scale optical flow estimation result based on them.
In practical applications, continuing the above example, the i-th frame and the (i+1)-th frame are input into the image interpolation model to obtain the large-scale interpolation result and the large-scale optical flow map for the interpolated image at any time between them, where that interpolated image is the target image at the target time; likewise, the i-th frame sub-image and the (i+1)-th frame sub-image are input into the model to obtain the small-scale interpolation result and the small-scale optical flow map for the interpolated image at any time between the sub-images.
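The two branches amount to two calls to the same interpolation model, once at each scale. In the sketch below, `interp_model(a, b, t)` is a hypothetical interface standing in for any optical-flow-based frame interpolation model that returns an interpolation result and an optical flow map; the specification does not fix a particular model.

```python
# interp_model is a hypothetical stand-in for an optical-flow-based
# frame interpolation model: interp_model(a, b, t) -> (result, flow).
# t in (0, 1) is the target time between frame i (t = 0) and frame i+1 (t = 1).
t = 0.5

# Large-scale branch: first result and first optical flow map at the target time.
result_large, flow_large = interp_model(frame_i, frame_i1, t)

# Small-scale branch: second result and second optical flow map at the target time.
result_small, flow_small = interp_model(sub_i, sub_i1, t)
```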
In this way, the second-scale frame interpolation result and the second-scale optical flow map at the target time are also obtained, where the small-scale optical flow map is a motion estimation map of the small-scale image in a complex motion scene. Based on these results together with their large-scale counterparts, subsequent computation produces a more accurate target image for frame interpolation, increasing the frame rate of the video so that playback is smoother and exhibits a better visual effect.
Step 206: Obtain the image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map.
Here the first result is the large-scale frame interpolation result at the target time obtained from the first time image and the second time image, the first optical flow map is the large-scale optical-flow-estimated motion map of the target-time image, the second result is the small-scale frame interpolation result at the target time obtained from the first time sub-image and the second time sub-image, and the second optical flow map is the small-scale optical-flow-estimated motion map of the target-time image.
The image weight value at the target time allows the specific content of the interpolated image at the target time to be determined more accurately.
Specifically, the obtaining of the image weight value at the target time according to the first result, the first optical flow map, the second result, and the second optical flow map includes:
inputting the first-scale frame interpolation result, the first-scale optical flow map, the second-scale frame interpolation result, and the second-scale optical flow map into an attention model to obtain the image weight value at the target time.
In a specific implementation, the first result, the first optical flow map, the second result, and the second optical flow map are input into a pre-trained attention model. The attention model may be implemented through deep network learning or in other ways; any module that predicts a weight map can be regarded as an attention model. From these four parameters the attention model produces the image weight value at the target time, which is the value of the predicted weight.
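As one possible realization (the specification only requires a module that predicts a weight map), the following sketch uses a small convolutional network in PyTorch that takes the two interpolation results and the two optical flow maps and predicts a single-channel weight map in [0, 1]. The architecture, channel counts, and layer sizes are assumptions for illustration, not the patented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionNet(nn.Module):
    """Minimal sketch: predicts a per-pixel weight map A in [0, 1]."""

    def __init__(self, in_channels=10):
        # 3 (large result) + 3 (small result) + 2 + 2 (two optical flow maps)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, result_large, result_small, flow_large, flow_small):
        h, w = result_large.shape[-2:]
        # Bring the small-scale inputs up to the large scale before concatenation.
        result_small = F.interpolate(result_small, size=(h, w),
                                     mode="bilinear", align_corners=False)
        flow_small = F.interpolate(flow_small, size=(h, w),
                                   mode="bilinear", align_corners=False)
        x = torch.cat([result_large, result_small, flow_large, flow_small], dim=1)
        return torch.sigmoid(self.net(x))  # weight map A
```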
In practical applications, the large-scale frame interpolation result and large-scale optical flow map for the image interpolated between the i-th and (i+1)-th frames, together with the small-scale frame interpolation result and small-scale optical flow map for the image interpolated between the i-th and (i+1)-th frame sub-images, are input into the pre-trained attention model, and the attention weight value is obtained by inference. The attention weight value can represent the relative proportion of the target image at the target time with respect to the i-th frame and the (i+1)-th frame. In more complex motion scenes and transitions, there may be no correspondence between the i-th frame and the (i+1)-th frame, and two consecutive frames may differ greatly. In the prior art, copying a frame for interpolation, or interpolating at shot switches and transitions where the preceding and following frames do not correspond (places where interpolation is generally not needed at all), cannot be detected by conventional algorithms, which can leave the image very blurred and the visual experience very poor.
Step 208: determining a target image at the target time based on the image weight value, the first result, and the second result.
Here the image weight value is the weight value for the image to be interpolated at the target time; the first result is the large-scale frame interpolation result at the target time obtained from the first time image and the second time image; the second result is the small-scale frame interpolation result at the target time obtained from the first time sub-image and the second time sub-image; and the determined target image at the target time is the frame to be inserted at any time between the first time and the second time.
In a specific implementation, the first result is the large-scale interpolation result for the (i+t)-th frame image at target time t, obtained from the i-th and (i+1)-th frames; the second result is the small-scale interpolation result for the (i+t)-th frame sub-image at target time t, obtained from the i-th and (i+1)-th frame sub-images; and the image weight value is the weight value of the (i+t)-th frame image, which may be represented by a value A, for example a vector-matrix value over the target image. The small-scale interpolation result is first up-sampled so that its resolution matches that of the large-scale interpolation result; the up-sampled result is multiplied by (1 - A), the large-scale interpolation result is multiplied by A, and the two products are added to obtain the final interpolation result, i.e., the target image at the target time.
In practical applications, following the above example, the formula for fusing interpolation results of different scales with the image weight value can be written as

I_t = R_t * A + UP(RS_t) * (1 - A)

where A is the image weight value, R_t is the first result (the large-scale interpolation result), RS_t is the second result (the small-scale interpolation result), UP(RS_t) denotes up-sampling the second result to the large scale, and I_t is the final interpolation result, i.e., the interpolated frame at the target time t between the first time image and the second time image.
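A minimal sketch of this fusion, assuming NumPy arrays and OpenCV for the up-sampling (the variable names and shapes are assumptions):

```python
import cv2

def fuse(result_large, result_small, A):
    """I_t = R_t * A + UP(RS_t) * (1 - A), applied element-wise.

    result_large: large-scale result R_t, shape (h, w, 3)
    result_small: small-scale result RS_t, shape (h//2, w//2, 3)
    A:            weight map from the attention model, shape (h, w, 1)
    """
    h, w = result_large.shape[:2]
    # UP(RS_t): up-sample the small-scale result to the large scale.
    up_small = cv2.resize(result_small, (w, h), interpolation=cv2.INTER_LINEAR)
    return result_large * A + up_small * (1.0 - A)
```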
The image processing method provided in this embodiment fuses interpolation results of different scales through the attention model to obtain the target image at the target time, inserts it between two consecutive frames, increases the frame rate of the video, and effectively improves the interpolation effect under large-scale motion and under conditions such as shot switching and transitions.
Further, the determining of the target image at the target time based on the image weight value, the first result, and the second result includes:
judging whether the image weight value at the target time meets a first preset threshold;
if so, determining an initial image at the target time based on the image weight value, the first result, and the second result at the target time, and taking the initial image as the target image at the target time;
if not, determining the initial image at the target time in the same way, and then determining the target image at the target time based on the initial image, the first time image, and the second time image.
Specifically, the image weight value may be a weight value computed from a vector matrix, and the first preset threshold may be set in advance according to repeated experiments. Whether the image weight value at the target time meets the first preset threshold is judged. If it does, the initial image at the target time is determined from the image weight value, the first result, and the second result, and is used as the finally output target image at the target time, i.e., the final interpolated frame of the image processing. If it does not, then after the initial image is determined, the target image at the target time is determined from the initial image, the first time image, and the second time image, and that image is used as the final interpolated frame.
By judging the image weight value, the image processing method provided in this embodiment decides whether the interpolation result is flawed, which effectively improves the interpolation effect under large-scale motion and filters out interpolation failures caused by shot switching.
Further, the determining of the initial image at the target time based on the image weight value, the first result, and the second result at the target time includes:
performing image scale processing on the second result at the target time to obtain an image processing result at the target time;
determining the initial image at the target time based on the image weight value at the target time, the first result at the target time, and the image processing result at the target time.
Specifically, the second result at the target time is the interpolation result of the down-sampled small-scale image; up-sampling it yields the image processing result at the target time, i.e., an interpolation result at the large scale of the original video image. The initial image at the target time is then determined by the fusion formula above from the image weight value, the first result, and the image processing result.
The image processing method provided in this embodiment fuses interpolation results of different scales based on the image weight value, obtains an accurate target image at the target time, and inserts it between two consecutive frames, increasing the video frame rate and effectively improving the frame interpolation effect in complex motion scenes.
Further, the determining of the target image at the target time based on the initial image at the target time, the first time image, and the second time image includes:
judging whether the target time meets a second preset threshold;
if so, determining the second time image to be the target image at the target time;
if not, determining the first time image to be the target image at the target time.
When the obtained target image at the target time is inserted between two frames, the insertion will fail if the two frames are discontinuous frames across a transition shot. To further reduce such failures, a flaw detection module is added after frame interpolation. After the target image at the target time is inserted between the two frames, the flaw detection module checks whether the resulting sequence of frames is smooth: if it is smooth, the target image at the target time can be output; if it is blurred, one of the two original frames is selected as the target image at the target time instead. The flaw detection module thus effectively filters out interpolation failures caused by shot switching. The image weight value at the target time may be a value computed from a vector matrix and is compared with the preset threshold, for example by taking the mean of the weight map as the image weight value.
Specifically, the method judges whether the fused target image at the target time has a flaw. If it has none, the target image at the target time is inserted between the first time image and the second time image as the final interpolated frame. If it is judged to be flawed, i.e., inserting it would blur or distort the picture, then the frame inserted at the target time between the first time image and the second time image is instead the first time image or the second time image itself.
In a specific implementation, suppose the first time image is the 0th frame and the second time image is the 1st frame, and the output is the interpolated frame at some time t between them, with image weight value A. Whether A is greater than the first preset threshold is judged: if it is, the fused target image is output as the interpolated frame at time t. If A is less than or equal to the first preset threshold, the fused target image is flawed and unusable, so whether t is greater than the second preset threshold is judged next: if the intermediate time t is greater than the second preset threshold, the second time image is output as the final interpolated frame; if t is less than or equal to the second preset threshold, the first time image is output as the final interpolated frame.
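The two-threshold decision in this example can be sketched as below, taking the mean of the weight map as the scalar image weight value (one of the options mentioned above). The threshold values are placeholders, since the specification leaves them to experimental tuning.

```python
import numpy as np

def select_output(fused, frame_first, frame_second, A, t,
                  weight_threshold=0.5, time_threshold=0.5):
    # Scalar image weight value: here, the mean of the weight map A.
    weight_value = float(np.mean(A))

    if weight_value > weight_threshold:
        return fused          # fused target image is flawless: output it
    # Flawed fusion: fall back to the temporally nearer input frame.
    if t > time_threshold:
        return frame_second   # output the second time image
    return frame_first        # output the first time image
```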
In the image processing method provided by this embodiment, the optical-flow-based interpolation model fuses interpolation results of different scales through the attention model according to the pixel correspondence between the preceding and following frames, effectively improving the interpolation effect under large-scale motion; and to keep the video sharp, the flaw detection module is introduced to effectively filter out interpolation failures caused by shot switching, thereby raising the frame rate of high-definition or standard-definition video.
Referring to fig. 3A, fig. 3A is a flowchart of determining the first result and the first optical flow map at a target time in a video frame interpolation scene, according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 302: Acquire a first time image and a second time image.
Specifically, the first time image is the i-th frame and the second time image is the (i+1)-th frame.
Step 304: Input the acquired first time image and second time image into the image processing model.
Specifically, the image processing model may be any of a variety of models, as long as it implements a frame interpolation algorithm based on optical flow estimation; the acquired first time image and second time image are input into it for interpolation calculation.
Step 306: Compute and output the first result and the first optical flow map at the target time.
Specifically, the first-scale frame interpolation result and the first optical flow map at the target time between the first time image and the second time image are computed by the optical-flow-based interpolation algorithm.
In this flow, the first time image and the second time image are input into the image processing model for interpolation to obtain the first-scale interpolation result and the first optical flow map, so that they can subsequently be input into the attention model to obtain the image weight value; the target image at the target time is then determined from the weight value, and a smoother interpolated frame can be output.
Referring to fig. 3B, fig. 3B is a flowchart of determining the second result and the second optical flow map at a target time in a video frame interpolation scene, according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 312: Acquire a first time image and a second time image.
Specifically, the first time image is the i-th frame and the second time image is the (i+1)-th frame.
Step 314: Perform image scale processing on the first time image and the second time image to obtain a first time sub-image and a second time sub-image.
Specifically, the image scale processing may use down-sampling to scale the images, or a bilinear interpolation method; down-sampling the first time image and the second time image yields the corresponding small-scale first time sub-image and second time sub-image.
Step 316: Input the first time sub-image and the second time sub-image into the image processing model.
Specifically, the first time sub-image and the second time sub-image are the small-scale counterparts of the first time image and the second time image; the image processing model can be any model that implements an optical-flow-based frame interpolation algorithm.
Step 318: Compute and output the second result and the second optical flow map at the target time.
Specifically, the second result at the target time is the small-scale frame interpolation result at the target time, and the second optical flow map is the small-scale optical flow map at the target time.
Obtaining the small-scale interpolation result and the small-scale optical flow map from the image processing model makes it possible to subsequently obtain the attention weight value at the target time, determine the final target image at the target time from it, and improve the frame interpolation effect of the video.
Referring to fig. 3C, fig. 3C is a flowchart of determining the image weight value in a video frame interpolation scene, according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 322: Acquire the first result, the first optical flow map, the second result, and the second optical flow map at the target time.
Step 324: Input the first result, the first optical flow map, the second result, and the second optical flow map into the attention model.
Specifically, the four inputs are fed into a pre-trained attention model, which may be implemented through deep network learning or in other ways; any module that predicts a weight map can be regarded as an attention model.
Step 326: Obtain the image weight value through inference.
Specifically, the image weight value at the target time is obtained in the attention model from the four parameters; it is the value of the predicted weight.
Obtaining the attention weight value through the attention model makes it possible to further determine the interpolated target image at the target time and effectively improve the frame interpolation effect in complex motion scenes.
Referring to fig. 4A, fig. 4A shows a flowchart of flaw detection in a video frame interpolation scene, according to an embodiment of the present specification, which specifically includes the following steps:
Step 402: Obtain the image weight value.
Specifically, the image weight value at the target time is obtained by inputting the first-scale frame interpolation result, the first-scale optical flow map, the second-scale frame interpolation result, and the second-scale optical flow map into the attention model.
Step 404: Judge whether the image weight value exceeds the first preset threshold.
Specifically, the image weight value is the weight value of the image to be interpolated at the target time; it may be computed from a vector matrix, and the first preset threshold may be set in advance according to repeated experiments.
Step 406: If yes, the target image is flawless, and the target image at the target time is output.
Specifically, if the threshold is met, the initial image at the target time is determined from the image weight value, the first result, and the second result, and is output as the final target image at the target time, i.e., the final interpolated frame.
Step 408: If not, the target image is flawed; judge whether the target time meets the second preset threshold.
Specifically, if the weight threshold is not met, then after the initial image at the target time is determined, the target image at the target time is determined from the initial image, the first time image, and the second time image, and is used as the final interpolated frame.
Step 410: Output the second time image.
Specifically, if the target image is judged to be flawed, i.e., inserting it would blur or distort the picture, the frame inserted at the target time between the first time image and the second time image is instead the first time image or the second time image itself.
In a specific implementation, if the intermediate target time t is greater than the second preset threshold, the second time image is output as the final interpolated frame.
Step 412: Output the first time image.
Specifically, if the intermediate target time t is less than or equal to the second preset threshold, the first time image is output as the final interpolated frame.
In the image processing method provided by this embodiment, the optical-flow-based interpolation model fuses interpolation results of different scales through the attention model according to the pixel correspondence between the preceding and following frames, and the fused frame is inserted between two consecutive frames, effectively improving the interpolation effect under large-scale motion; by introducing the flaw detection module, interpolation failures caused by shot switching are effectively filtered out, raising the frame rate of high-definition or standard-definition video.
Referring to fig. 4B, fig. 4B shows a flowchart of an image processing method in a video frame-inserted scene according to an embodiment of the present specification, and specifically includes the following steps:
step 412: and inputting video frames, wherein the video frames are two continuous frames of images in the video images.
Specifically, the input video frame corresponds to a first time image and a second time image obtained in this embodiment of the present specification.
Step 414: and performing down-sampling processing on the video frame.
Specifically, the downsampling process is performed on the acquired first time image and the acquired second time image in the embodiment of the present specification correspondingly, or the downsampling process is performed through bilinear interpolation, so that the small-scale first time sub-image and the small-scale second time sub-image are obtained.
Step 416: and inputting the sub-image at the first moment and the sub-image at the second moment after down sampling into a video frame interpolation model.
Specifically, the video frame interpolation model is an image processing model in the embodiment of the present specification, and the sub-image at the first time and the sub-image at the second time are input into the image processing model to obtain a second scale frame interpolation result and a second light flow diagram, where the light flow diagram is the second light flow diagram in the embodiment of the present specification, and the small scale frame interpolation result is the second scale frame interpolation result in the embodiment of the present specification.
Step 418: and inputting the video frame to a video frame interpolation model.
Specifically, the video frame interpolation model corresponds to the image processing model of the embodiment, and the first time image and the second time image are input into the image processing model to obtain a first scale frame interpolation result and a first light flow diagram, where the light flow diagram is the first light flow diagram of the embodiment, and the large scale frame interpolation result is the first scale frame interpolation result of the embodiment.
Step 420: inputting the small-scale optical flow diagram, the small-scale frame interpolation result, the large-scale optical flow diagram and the large-scale frame interpolation result into the attention model to obtain an attention weight map.
Step 422: performing multi-scale fusion on the attention weight map, the large-scale frame interpolation result and the small-scale frame interpolation result to obtain a fused frame at the target time.
Specifically, the attention weight map corresponds to the image weight value of the embodiments of this specification, the large-scale frame interpolation result corresponds to the first result, and the small-scale frame interpolation result corresponds to the second result.
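One way to read the multi-scale fusion of step 422 is as a per-pixel weighted sum; the sketch below assumes the attention weight map is normalized to [0, 1] and that the small-scale result is first brought back to full resolution by image scale processing:

```python
import cv2
import numpy as np

def fuse_multi_scale(weight_map: np.ndarray,
                     large_result: np.ndarray,
                     small_result: np.ndarray) -> np.ndarray:
    """Fuse large-scale and small-scale interpolation results with an attention map."""
    h, w = large_result.shape[:2]
    # Image scale processing: upsample the small-scale result to full resolution.
    small_up = cv2.resize(small_result, (w, h), interpolation=cv2.INTER_LINEAR)
    if weight_map.ndim == 2:
        weight_map = weight_map[..., None]  # broadcast one channel over RGB
    # Per-pixel convex combination of the two frame interpolation results.
    return weight_map * large_result + (1.0 - weight_map) * small_up
```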
Step 424: performing flaw detection according to the attention weight map.
Specifically, the attention weight map corresponds to the image weight value in the embodiments of this specification, and whether the fused target image has a defect is determined according to the distribution of the attention weight values.
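The embodiment does not spell out the defect criterion; one plausible reading of "the distribution of the attention weight values" is to flag a fused frame when the weights are heavily concentrated at the extremes, as in the assumed heuristic below (all thresholds are illustrative, not values from the embodiment):

```python
import numpy as np

def has_defect(weight_map: np.ndarray,
               low: float = 0.05,
               high: float = 0.95,
               max_extreme_ratio: float = 0.4) -> bool:
    """Flag a fused frame whose attention weights cluster at the extremes,
    which can indicate that neither scale produced a usable interpolation
    (for example, after a shot switch)."""
    extreme = (weight_map < low) | (weight_map > high)
    return float(extreme.mean()) > max_extreme_ratio
```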
Step 426: if the obtained fused frame at the target time is flawless, outputting the fused frame at the target time.
Specifically, the fused frame at the target time corresponds to the target image at the target time in the embodiments of this specification.
Step 428: if the obtained fused frame at the target time has a defect, outputting the current frame.
Specifically, the current frame corresponds to the first time image or the second time image in the embodiments of this specification.
According to the image processing method provided by the embodiments of this specification, the optical-flow-based frame interpolation model uses the attention model to fuse interpolation results of different scales according to the pixel correspondence between the previous frame and the next frame, and the obtained fused frame is inserted between two consecutive frame images, which effectively improves the interpolation effect under large-amplitude motion; in addition, the introduced flaw detection module effectively filters out interpolation failures caused by shot switching, so that the frame rate of stored or standard-definition video can be effectively increased.
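Putting steps 412 through 428 together, the whole scenario can be sketched as a single function; interp_model and attention_model are hypothetical stand-ins for the two models, and the helpers are the assumed sketches given above:

```python
def interpolate_pair(interp_model, attention_model,
                     first_frame, second_frame, target_time=0.5):
    """Produce the frame to insert between two consecutive video frames."""
    first_sub, second_sub = downsample(first_frame), downsample(second_frame)
    large_result, large_flow, small_result, small_flow = run_two_scales(
        interp_model, first_frame, second_frame, first_sub, second_sub)
    # The attention model yields the image weight value (attention weight map).
    weight_map = attention_model(large_result, large_flow,
                                 small_result, small_flow)
    fused = fuse_multi_scale(weight_map, large_result, small_result)
    if has_defect(weight_map):
        # Interpolation judged unreliable: fall back to one of the input frames.
        return select_fallback_frame(first_frame, second_frame, target_time)
    return fused
```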
Referring to fig. 5, fig. 5 is a flowchart illustrating a second image processing method according to an embodiment of the present disclosure, including the following steps:
Step 502: presenting an image input interface for a user based on an image processing request of the user.
Step 504: acquiring a first time image and a second time image input by the user based on the image input interface, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image.
Step 506: determining a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determining a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image.
Step 508: obtaining an image weight value of the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram.
Step 510: determining a target image at the target time based on the image weight value, the first result and the second result, and returning the target image at the target time to the user.
It should be noted that, for the portions of the second image processing method provided in the embodiments of this specification that correspond to the embodiment of the first image processing method, reference may be made to the detailed description in the embodiment of the first image processing method; details are not repeated here.
According to the image processing method provided by the embodiments of this specification, the optical-flow-based frame interpolation model uses the attention model to fuse interpolation results of different scales according to the pixel correspondence between the previous frame and the next frame, which effectively improves the interpolation effect under large-amplitude motion; in addition, the introduced flaw detection module effectively filters out interpolation failures caused by shot switching, so that the frame rate of stored or standard-definition video can be effectively increased.
Referring to fig. 6, fig. 6 is a flowchart illustrating a third image processing method according to an embodiment of the present disclosure, including the following steps:
Step 602: receiving an image processing request which is sent by a user and carries a first time image and a second time image.
Step 604: acquiring the first time image and the second time image, and determining a first time sub-image and a second time sub-image based on the first time image and the second time image.
Step 606: determining a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determining a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image.
Step 608: obtaining an image weight value of the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram.
Step 610: determining a target image at the target time based on the image weight value, the first result and the second result, and returning the target image at the target time to the user.
It should be noted that, for the portions of the third image processing method provided in the embodiments of this specification that correspond to the embodiment of the first image processing method, reference may be made to the detailed description in the embodiment of the first image processing method; details are not repeated here.
According to the image processing method provided by the embodiments of this specification, the optical-flow-based frame interpolation model uses the attention model to fuse interpolation results of different scales according to the pixel correspondence between the previous frame and the next frame, which effectively improves the interpolation effect under large-amplitude motion; in addition, the introduced flaw detection module effectively filters out interpolation failures caused by shot switching, so that the frame rate of stored or standard-definition video can be effectively increased.
Corresponding to the above method embodiment, the present specification further provides an embodiment of an image processing apparatus, and fig. 7 shows a schematic structural diagram of a first image processing apparatus provided in an embodiment of the present specification. As shown in fig. 7, the apparatus includes:
an obtaining module 702 configured to obtain a first time image and a second time image, and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module 704 configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module 706 configured to obtain an image weight value at the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determination module 708 configured to determine a target image at the target time based on the image weight value, the first result, and the second result.
Optionally, the obtaining module 702 is further configured to:
carry out image scale processing on the first time image and the second time image to obtain a first time sub-image and a second time sub-image.
Optionally, the first determining module 704 is further configured to:
input the first time image and the second time image into an image processing model to obtain a first scale frame interpolation result of a target time and a first scale optical flow diagram of the target time.
Optionally, the first determining module 704 is further configured to:
input the first time sub-image and the second time sub-image into the image processing model to obtain a second scale frame interpolation result of the target time and a second scale optical flow diagram of the target time.
Optionally, the obtaining module 706 is further configured to:
input the first scale frame interpolation result, the first scale optical flow diagram, the second scale frame interpolation result and the second scale optical flow diagram into an attention model to obtain an image weight value of the target time.
Optionally, the second determining module 708 is further configured to:
judge whether the image weight value of the target time meets a first preset threshold;
if so, determine an initial image of the target time based on the image weight value of the target time, the first result of the target time and the second result of the target time, and take the initial image as the target image of the target time;
if not, determine the initial image of the target time based on the image weight value of the target time, the first result of the target time and the second result of the target time, and determine the target image of the target time based on the initial image of the target time, the first time image and the second time image.
Optionally, the second determining module 708 is further configured to:
carry out image scale processing on the second result of the target time to obtain an image processing result of the target time;
determine an initial image of the target time based on the image weight value of the target time, the first result of the target time and the image processing result of the target time.
Optionally, the second determining module 708 is further configured to:
carry out image scale processing on the second result of the target time to obtain an image processing result of the target time;
determine an initial image of the target time based on the image weight value of the target time, the first result of the target time and the image processing result of the target time.
Optionally, the apparatus further includes:
a judging module configured to judge whether the target time of the initial image meets a second preset threshold;
if so, determine that the target image at the target time is the second time image;
if not, determine that the target image at the target time is the first time image.
Optionally, the first time image is an i-th frame image, the second time image is an (i+1)-th frame image, and the target image at the target time is an image between the i-th frame image and the (i+1)-th frame image.
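Since the target image lies between the i-th and the (i+1)-th frame, applying the method to every consecutive pair doubles the video frame rate; a sketch reusing the assumed interpolate_pair helper from above:

```python
def double_frame_rate(frames, interp_model, attention_model):
    """Insert one interpolated frame between every pair of consecutive frames."""
    out = []
    for i in range(len(frames) - 1):
        out.append(frames[i])  # the i-th frame image
        out.append(interpolate_pair(interp_model, attention_model,
                                    frames[i], frames[i + 1]))  # target image
    out.append(frames[-1])  # keep the final frame
    return out
```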
The above is a schematic configuration of an image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as the technical solution of the first image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the first image processing method.
Corresponding to the above method embodiment, the present specification further provides another image processing apparatus embodiment, and fig. 8 shows a schematic structural diagram of a second image processing apparatus provided in an embodiment of the present specification. As shown in fig. 8, the apparatus includes:
a presentation module 802 configured to present an image input interface for a user based on an image processing request of the user;
an obtaining module 804, configured to obtain a first time image and a second time image input by the user based on the image input interface, and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module 806 configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module 808 configured to obtain an image weight value at the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determining module 810 configured to determine a target image of the target time based on the image weight value, the first result, and the second result, and return the target image of the target time to the user.
The above is a schematic configuration of an image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as the technical solution of the second image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the second image processing method.
Corresponding to the above method embodiment, the present specification further provides an image processing apparatus embodiment, and fig. 9 shows a schematic structural diagram of a third image processing apparatus provided in an embodiment of the present specification. As shown in fig. 9, the apparatus includes:
a receiving module 902 configured to receive an image processing request carrying a first time image and a second time image sent by a user;
an obtaining module 904 configured to obtain the first time image and the second time image, and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module 906 configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module 908 configured to obtain an image weight value at the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determining module 910 configured to determine a target image at the target time based on the image weight value, the first result, and the second result, and return the target image at the target time to the user.
The above is a schematic configuration of an image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as that of the third image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the third image processing method.
FIG. 10 illustrates a block diagram of a computing device 1000 provided in accordance with one embodiment of the present description. The components of the computing device 1000 include, but are not limited to, memory 1010 and a processor 1020. The processor 1020 is coupled to the memory 1010 via a bus 1030 and the database 1050 is used to store data.
Computing device 1000 also includes access device 1040, access device 1040 enabling computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 1040 may include one or more of any type of network interface, e.g., a Network Interface Card (NIC), wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 1000 and other components not shown in FIG. 10 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 10 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1000 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1000 may also be a mobile or stationary server.
The processor 1020 is configured to execute computer-executable instructions, and the steps of the image processing method are implemented when the processor executes the computer-executable instructions.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the image processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the image processing method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the image processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the image processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (17)

1. An image processing method comprising:
acquiring a first moment image and a second moment image, and determining a first moment sub-image and a second moment sub-image based on the first moment image and the second moment image;
determining a first result of a target moment and a first optical flow diagram of the target moment based on the first moment image and the second moment image, and determining a second result of the target moment and a second optical flow diagram of the target moment based on the first moment sub-image and the second moment sub-image;
obtaining an image weight value of the target moment according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
determining a target image at the target time based on the image weight value, the first result, and the second result.
2. The image processing method of claim 1, the determining a first time instant sub-image and a second time instant sub-image based on the first time instant image and the second time instant image comprising:
carrying out image scale processing on the first moment image and the second moment image to obtain a first moment sub-image and a second moment sub-image.
3. The image processing method of claim 1, the determining a first result of a target moment and a first optical flow diagram of the target moment based on the first moment image and the second moment image comprising:
inputting the first moment image and the second moment image into an image processing model to obtain a first scale frame interpolation result of the target moment and a first scale optical flow diagram of the target moment.
4. The image processing method of claim 3, the determining a second result of the target moment and a second optical flow diagram of the target moment based on the first moment sub-image and the second moment sub-image comprising:
inputting the first moment sub-image and the second moment sub-image into the image processing model to obtain a second scale frame interpolation result of the target moment and a second scale optical flow diagram of the target moment.
5. The image processing method of claim 4, the obtaining an image weight value of the target moment according to the first result, the first optical flow diagram, the second result and the second optical flow diagram comprising:
inputting the first scale frame interpolation result, the first scale optical flow diagram, the second scale frame interpolation result and the second scale optical flow diagram into an attention model to obtain an image weight value of the target moment.
6. The image processing method of claim 1, the determining a target image of the target moment based on the image weight value, the first result and the second result comprising:
judging whether the image weight value of the target moment meets a first preset threshold;
if so, determining an initial image of the target moment based on the image weight value of the target moment, the first result of the target moment and the second result of the target moment, and taking the initial image as a target image of the target moment;
if not, determining the initial image of the target moment based on the image weight value of the target moment, the first result of the target moment and the second result of the target moment, and determining the target image of the target moment based on the initial image of the target moment, the first moment image and the second moment image.
7. The image processing method of claim 6, the determining the initial image at the target time based on the image weight value at the target time, the first result at the target time, and the second result at the target time comprising:
carrying out image scale processing on the second result of the target moment to obtain an image processing result of the target moment;
determining an initial image of the target time based on the image weight value of the target time, the first result of the target time, and the image processing result of the target time.
8. The image processing method of claim 6, the determining the initial image at the target time based on the image weight value at the target time, the first result at the target time, and the second result at the target time comprising:
carrying out image scale processing on the second result of the target moment to obtain an image processing result of the target moment;
determining an initial image of the target time based on the image weight value of the target time, the first result of the target time, and the image processing result of the target time.
9. The image processing method of claim 8, the determining the target image of the target moment based on the initial image of the target moment, the first moment image and the second moment image comprising:
judging whether the target moment of the initial image meets a second preset threshold;
if so, determining that the second moment image is the target image of the target moment;
and if not, determining that the first moment image is the target image of the target moment.
10. The image processing method according to any one of claims 1 to 9, wherein the first moment image is an i-th frame image, the second moment image is an (i+1)-th frame image, and the target image of the target moment is an image between the i-th frame image and the (i+1)-th frame image.
11. An image processing method comprising:
displaying an image input interface for a user based on an image processing request of the user;
acquiring a first moment image and a second moment image input by the user based on the image input interface, and determining a first moment sub-image and a second moment sub-image based on the first moment image and the second moment image;
determining a first result of a target moment and a first optical flow diagram of the target moment based on the first moment image and the second moment image, and determining a second result of the target moment and a second optical flow diagram of the target moment based on the first moment sub-image and the second moment sub-image;
obtaining an image weight value of the target moment according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
determining a target image of the target moment based on the image weight value, the first result and the second result, and returning the target image of the target moment to the user.
12. An image processing method comprising:
receiving an image processing request which is sent by a user and carries a first moment image and a second moment image;
acquiring the first moment image and the second moment image, and determining a first moment sub-image and a second moment sub-image based on the first moment image and the second moment image;
determining a first result of a target moment and a first optical flow diagram of the target moment based on the first moment image and the second moment image, and determining a second result of the target moment and a second optical flow diagram of the target moment based on the first moment sub-image and the second moment sub-image;
obtaining an image weight value of the target moment according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
determining a target image of the target moment based on the image weight value, the first result and the second result, and returning the target image of the target moment to the user.
13. An image processing apparatus comprising:
the acquisition module is configured to acquire a first time image and a second time image and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value of the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result.
14. An image processing apparatus comprising:
a presentation module configured to present an image input interface for a user based on an image processing request of the user;
the acquisition module is configured to acquire a first time image and a second time image input by the user based on the image input interface and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value of the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result, and return the target image at the target time to the user.
15. An image processing apparatus comprising:
the receiving module is configured to receive an image processing request which is sent by a user and carries a first time image and a second time image;
an acquisition module configured to acquire the first time image and the second time image and determine a first time sub-image and a second time sub-image based on the first time image and the second time image;
a first determining module configured to determine a first result of a target time and a first optical flow diagram of the target time based on the first time image and the second time image, and determine a second result of the target time and a second optical flow diagram of the target time based on the first time sub-image and the second time sub-image;
an obtaining module configured to obtain an image weight value of the target time according to the first result, the first optical flow diagram, the second result and the second optical flow diagram;
a second determination module configured to determine a target image at the target time based on the image weight value, the first result, and the second result, and return the target image at the target time to the user.
16. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, wherein the processor implements the steps of the image processing method according to any one of claims 1 to 10 or 11 or 12 when executing the computer-executable instructions.
17. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the image processing method of any of claims 1 to 10 or 11 or 12.
CN202011003309.8A 2020-09-22 2020-09-22 Image processing method and device Active CN114255164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011003309.8A CN114255164B (en) 2020-09-22 2020-09-22 Image processing method and device

Publications (2)

Publication Number Publication Date
CN114255164A (en) 2022-03-29
CN114255164B CN114255164B (en) 2025-07-11

Family

ID=80788471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011003309.8A Active CN114255164B (en) 2020-09-22 2020-09-22 Image processing method and device

Country Status (1)

Country Link
CN (1) CN114255164B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305271A (en) * 2018-01-25 2018-07-20 腾讯科技(深圳)有限公司 A kind of video frame images treating method and apparatus
CN109379550A (en) * 2018-09-12 2019-02-22 上海交通大学 Video frame rate up-conversion method and system based on convolutional neural network
CN109756690A (en) * 2018-12-21 2019-05-14 西北工业大学 A Lightweight Video Interpolation Method Based on Feature-Level Optical Flow
CN111327926A (en) * 2020-02-12 2020-06-23 北京百度网讯科技有限公司 Video frame insertion method and device, electronic equipment and storage medium
CN111372087A (en) * 2020-05-26 2020-07-03 深圳看到科技有限公司 Panoramic video frame insertion method and device and corresponding storage medium
CN111405316A (en) * 2020-03-12 2020-07-10 北京奇艺世纪科技有限公司 Frame insertion method, electronic device and readable storage medium
WO2020177108A1 (en) * 2019-03-01 2020-09-10 北京大学深圳研究生院 Video frame interpolation method, apparatus and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAN XIAO et al.: "Multi-Scale Attention Generative Adversarial Networks for Video Frame Interpolation", IEEE Access, 20 May 2020 (2020-05-20) *
ZHANG Zhifeng: "Research on Video Frame Interpolation Technology Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), 15 June 2020 (2020-06-15) *

Similar Documents

Publication Publication Date Title
CN110324664B (en) A neural network-based video frame supplementation method and its model training method
US11017586B2 (en) 3D motion effect from a 2D image
WO2022141819A1 (en) Video frame insertion method and apparatus, and computer device and storage medium
US20210279888A1 (en) Foreground data generation method and method for applying same, related apparatus, and system
CN111898701A (en) Model training, frame image generation, frame interpolation method, device, equipment and medium
CN108694705A (en) A kind of method multiple image registration and merge denoising
CN114339409B (en) Video processing method, device, computer equipment and storage medium
CN116033183B (en) Video frame insertion method and device
CN112200817B (en) Image-based sky area segmentation and special effects processing method, device and equipment
CN113487618B (en) Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium
CN113724136B (en) Video restoration method, device and medium
CN116170650B (en) Video frame inserting method and device
CN101247489A (en) A method for real-time reproduction of digital TV details
US12205249B2 (en) Intelligent portrait photography enhancement system
CN114299088A (en) Image processing method and device
CN115334335A (en) Video frame insertion method and device
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
CN116208807A (en) Video frame processing method and device, and video frame denoising method and device
CN114372932A (en) An image processing method and computer program product
CN114255164B (en) Image processing method and device
CN114286126A (en) Video processing method and device
CN117058043A (en) Event-image deblurring method based on LSTM
CN115761065A (en) Intermediate frame generation method, device, equipment and medium
US20240394852A1 (en) Selective image blurring using machine learning
CN115035169A (en) Texture mapping processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant