
CN114708261A - Motion estimation method and device of image acquisition equipment, terminal and storage medium - Google Patents

Motion estimation method and device of image acquisition equipment, terminal and storage medium

Info

Publication number
CN114708261A
Authority
CN
China
Prior art keywords
video frame, processed, information, image, variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210612011.XA
Other languages
Chinese (zh)
Inventor
毛礼建
张鎏锟
熊剑平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202210612011.XA
Publication of CN114708261A
Legal status: Pending (Current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10068 Endoscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a motion estimation method and device of image acquisition equipment, a terminal and a storage medium. In the motion estimation method, a video frame to be processed is acquired for a target part of an object to be detected through the image acquisition equipment; position variable information between the video frame to be processed and a historical video frame is determined based on the position information of the same pixel points in the two frames; and the motion information of the image acquisition equipment when the video frame to be processed was acquired is determined according to the position variable information. Because the position variable information is determined from the video frame to be processed and a historical video frame preceding it, and the motion information is then derived from that position variable information, the motion information of the image acquisition equipment at acquisition time can be determined through image transformation alone.

Description

Motion estimation method and device of image acquisition equipment, terminal and storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a motion estimation method and apparatus for an image capture device, a terminal, and a computer-readable storage medium.
Background
The medical endoscope is a detection instrument composed of an image sensor, an illumination light source, an optical lens and other physical components. It can enter the body through orifices such as the nostrils to photograph tissues and organs inside the human body, so a doctor can use it to record lesions within the body. Medical endoscopes therefore play an important role in current pathological diagnosis.
When a medical endoscope enters the human body for an examination, the device itself causes a certain amount of irritation to the body. During an endoscopic operation, the speed at which the doctor moves the lens irritates the examined person to different degrees, and beyond a certain speed it may injure the examined person.
Disclosure of Invention
The invention mainly solves the technical problem of providing a motion estimation method and device of image acquisition equipment, a terminal and a computer-readable storage medium, addressing the problem that the motion speed of the image acquisition equipment cannot be determined in the prior art.
In order to solve the above technical problems, a first technical solution provided by the present invention is: there is provided a motion estimation method of an image pickup apparatus, the motion estimation method of the image pickup apparatus including: acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment; determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel points in the video frame to be processed and the historical video frame; the historical video frame is an image collected aiming at a target part of an object to be detected, and the collection time of the historical video frame is earlier than that of the video frame to be processed; and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information.
In a possible implementation manner, determining motion information of the image capturing device when capturing a video frame to be processed according to the position variable information includes: establishing a coordinate system based on the video frame to be processed and the historical video frame, wherein the origin of the coordinate system coincides with the central points of the video frame to be processed and the historical video frame; determining displacement variables corresponding to the same pixel points in the video frame to be processed and the historical video frame, wherein each displacement variable represents the displacement between a first position point and a second position point of the corresponding pixel point, the first position point being the coordinate point of that pixel point in the video frame to be processed, and the second position point being its coordinate point in the historical video frame; and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired based on the displacement variable corresponding to each same pixel point.
In one possible implementation, the coordinate system includes four quadrants, and determining the motion information of the image acquisition equipment when acquiring the video frame to be processed according to the position variable information further includes: statistically obtaining the horizontal displacement variable and the vertical displacement variable corresponding to the position variable information in each of the four quadrants; and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the horizontal displacement variables and the vertical displacement variables.
In a possible implementation manner, before determining the motion information of the image capturing device when capturing a video frame to be processed according to the horizontal displacement variable and the vertical displacement variable, the method includes: smoothing the horizontal displacement variable and the vertical displacement variable with a Kalman filter. Determining the motion information according to the horizontal and vertical displacement variables then comprises: determining the motion information of the image acquisition equipment when acquiring the video frame to be processed according to the smoothed horizontal displacement variable and the smoothed vertical displacement variable.
In one possible implementation, the motion information includes at least one of a speed of motion and a direction of motion.
In one possible implementation, the motion information includes a motion direction, and the four quadrants include a first quadrant, a second quadrant, a third quadrant and a fourth quadrant; the first quadrant is located above and to the right of the origin of the coordinate system, the second quadrant above and to the left, the third quadrant below and to the left, and the fourth quadrant below and to the right. Determining the motion information of the image acquisition equipment when acquiring the video frame to be processed according to the position variable information includes: in response to the horizontal and vertical displacement variables of the same pixel points in the first quadrant both being positive, the horizontal displacement variable in the second quadrant being negative and the vertical displacement variable positive, the horizontal and vertical displacement variables in the third quadrant both being negative, and the horizontal displacement variable in the fourth quadrant being positive and the vertical displacement variable negative, determining that the running direction of the image acquisition equipment is forward.
In a possible implementation manner, determining motion information of the image capturing device when capturing a video frame to be processed according to the position variable information includes: determining motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information by adopting a regression network model; in one possible implementation manner, the training method of the regression network model includes: acquiring a first training data set, wherein the first training data set comprises a plurality of groups of training data containing horizontal direction displacement variables and vertical direction displacement variables; each group of training data is marked with real motion information; inputting each training data into a regression network model to obtain predicted motion information corresponding to each training data; and performing iterative training on the regression network model based on the error value between the real motion information and the predicted motion information corresponding to the same training data.
In a possible implementation manner, before determining position variable information between a video frame to be processed and a historical video frame based on the position information of the same pixel points in the two frames, the method further includes: respectively performing feature matching between each pixel point in the video frame to be processed and each pixel point in the historical video frame to obtain the position information of the same pixel points in the video frame to be processed and in the historical video frame.
In a possible implementation manner, determining position variable information between a video frame to be processed and a historical video frame based on position information of the same pixel point in the video frame to be processed and the historical video frame includes: determining position variable information between the video frame to be processed and the historical video frame by adopting an image registration network model based on the position information of the same pixel points in the video frame to be processed and the historical video frame; in one possible implementation, the training method of the image registration network model includes: acquiring a second training data set, wherein the second training data set comprises a plurality of first sample images and second sample images; a preset number of frames are spaced between the first sample image and the second sample image, and real position variables are correspondingly marked between the first sample image and the second sample image spaced by the preset number of frames; inputting a first sample image and a second sample image which are separated by a preset number of frames into an image registration network model to obtain a corresponding predicted position variable between the first sample image and the second sample image; and performing iterative training on the image registration network model based on the error values between the corresponding real position variables and the corresponding predicted position variables between the first sample image and the second sample image of the same interval preset number of frames.
In order to solve the above technical problems, the second technical solution provided by the present invention is: there is provided a motion estimation apparatus of an image pickup device, the motion estimation apparatus of the image pickup device including: the acquisition module is used for acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment; the analysis module is used for determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame; the historical video frame is an image collected aiming at a target part of an object to be detected, and the collection time of the historical video frame is earlier than that of the video frame to be processed; and the estimation module is used for determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information.
In order to solve the above technical problems, a third technical solution provided by the present invention is: there is provided a terminal comprising a memory, a processor and a computer program stored in the memory and running on the processor, the processor being adapted to execute the computer program to implement the steps in the motion estimation method of an image acquisition device as described above.
In order to solve the above technical problems, a fourth technical solution provided by the present invention is: there is provided a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of motion estimation of an image acquisition device as described above.
The invention has the beneficial effects that: different from the prior art, a motion estimation method and device of image acquisition equipment, a terminal and a computer-readable storage medium are provided. In the motion estimation method, a video frame to be processed is acquired for a target part of an object to be detected through the image acquisition equipment; position variable information between the video frame to be processed and a historical video frame is determined based on the position information of the same pixel points in the two frames, the historical video frame being an image acquired for the target part of the object to be detected at an earlier time than the video frame to be processed; and the motion information of the image acquisition equipment when the video frame to be processed was acquired is determined according to the position variable information. Because the corresponding position variable information is determined from the video frame to be processed and a historical video frame preceding it, and the motion information is then derived from the obtained position variable information, the motion information of the image acquisition equipment at acquisition time can be determined through image transformation alone.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a motion estimation method of an image acquisition device provided by the invention;
fig. 2 is a schematic flowchart of an embodiment of a motion estimation method for an image capturing device according to the present invention;
FIG. 3 is a flowchart illustrating an embodiment of step S21 of the motion estimation method of the image capturing apparatus shown in FIG. 2;
FIG. 4 is a flowchart illustrating an embodiment of step S22 in the motion estimation method of the image capturing device shown in FIG. 2;
FIG. 5 is a flowchart illustrating a method for motion estimation of an image capturing device according to an embodiment of the present invention;
fig. 6 is a schematic block diagram of a motion estimation apparatus of an image capturing device provided by the present invention;
FIG. 7 is a schematic block diagram of one embodiment of a terminal provided by the present invention;
FIG. 8 is a schematic block diagram of one embodiment of a computer-readable storage medium provided by the present invention.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, "plurality" herein means two or more than two.
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes in detail a motion estimation method of an image capturing device provided by the present invention with reference to the accompanying drawings and the detailed description.
Referring to fig. 1, fig. 1 is a schematic flow chart of a motion estimation method of an image capturing device according to the present invention. In this embodiment, a method for estimating motion of an image capturing apparatus is provided, and the method for estimating motion of an image capturing apparatus includes the following steps.
S11: and acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment.
In particular, the image capturing device is a medical device that can enter the living body, but may also be another device capable of capturing images and/or video. For example, the image capturing device may be an endoscope or another device that can enter the living body. In this embodiment, taking an endoscope as an example, the endoscope is inserted into the body through the mouth, nose or a similar part, photographs a target part of the object to be detected, and acquires in-vivo images in real time as video frames to be processed. The endoscope in this embodiment performs detection inside the human body, i.e., acquires an image inside the human body as the video frame to be processed. In another embodiment, detection can also be performed inside an animal body through an endoscope, i.e., an image inside the animal body is acquired as the video frame to be processed.
S12: and determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame.
Specifically, the historical video frame is an image acquired for a target portion of the object to be detected, and the acquisition time of the historical video frame is earlier than the acquisition time of the video frame to be processed.
N frames of images are spaced between the video frame to be processed and the historical video frame, where n is a positive integer. Feature matching is performed between each pixel point in the video frame to be processed and each pixel point in the historical video frame to obtain the position information of the same pixel points in both frames. The same pixel points are pixel points present in both the video frame to be processed and the historical video frame; there may be one or more of them, determined by the actual situation. Based on the position information of the same pixel points in the two frames, an image registration network model is used to determine the position variable information between the video frame to be processed and the historical video frame.
The position variable information is the displacement of each same pixel point between its position in the video frame to be processed and its position in the historical video frame. Specifically, the position variable information may be a transform field: the position information of any pixel point in the historical video frame is converted through the corresponding transform field to obtain the position to which that pixel point maps in the video frame to be processed.
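As a minimal illustration of the transform-field idea (a sketch under the assumption that the field stores a per-pixel horizontal and vertical offset; the array layout and names are hypothetical, not part of the invention):

```python
import numpy as np

# A hypothetical transform field phi of shape (H, W, 2):
# phi[y, x] = (dx, dy) is the horizontal/vertical offset that maps
# pixel (x, y) of the historical frame to the video frame to be processed.
H, W = 600, 800
phi = np.zeros((H, W, 2), dtype=np.float32)

def map_position(phi, x, y):
    """Map pixel (x, y) of the historical frame into the later frame."""
    dx, dy = phi[y, x]
    return x + dx, y + dy
```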
S13: and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information.
Specifically, a coordinate system is established based on the video frame to be processed and the historical video frame; the origin of the coordinate system coincides with the central points of the video frame to be processed and the historical video frame. Displacement variables corresponding to the same pixel points in the video frame to be processed and the historical video frame are determined, where each displacement variable represents the displacement between a first position point and a second position point of the corresponding pixel point: the first position point is the coordinate point of that pixel point in the video frame to be processed, and the second position point is its coordinate point in the historical video frame. The motion information of the image acquisition equipment when the video frame to be processed is acquired is then determined based on the displacement variable corresponding to each same pixel point.
In one embodiment, the coordinate system includes four quadrants, and the motion information comprises a running speed and a running direction. The horizontal displacement variable and the vertical displacement variable corresponding to the position variable information in each of the four quadrants are obtained by statistics, and the motion information of the image acquisition equipment when the video frame to be processed is acquired is determined according to the horizontal displacement variables and the vertical displacement variables.
In a preferred embodiment, a Kalman filter is used to smooth the horizontal displacement variables and the vertical displacement variables respectively, and the motion information of the image acquisition equipment when acquiring the video frame to be processed is determined according to the smoothed horizontal and vertical displacement variables.
In an embodiment, the motion information comprises at least one of a speed of motion and a direction of motion. And determining the movement speed and the movement direction of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information between the video frame to be processed and the historical video frame.
In a preferred embodiment, the motion information includes a motion direction, and the four quadrants include a first quadrant, a second quadrant, a third quadrant and a fourth quadrant; the first quadrant is located above and to the right of the origin of the coordinate system, the second quadrant above and to the left, the third quadrant below and to the left, and the fourth quadrant below and to the right. In response to the horizontal and vertical displacement variables of the same pixel points in the first quadrant both being positive, the horizontal displacement variable in the second quadrant being negative and the vertical displacement variable positive, the horizontal and vertical displacement variables in the third quadrant both being negative, and the horizontal displacement variable in the fourth quadrant being positive and the vertical displacement variable negative, the running direction of the image acquisition equipment is determined to be forward.
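The forward-direction test described above can be sketched as follows (the function name and the quadrant ordering of the inputs are illustrative, not part of the invention):

```python
def is_forward(quadrant_disp):
    """quadrant_disp: [(dx_s, dy_s) for s in quadrants 1..4].
    Returns True for the sign pattern that indicates forward motion
    of the image acquisition equipment (the field diverges outward)."""
    (dx1, dy1), (dx2, dy2), (dx3, dy3), (dx4, dy4) = quadrant_disp
    return (dx1 > 0 and dy1 > 0 and   # first quadrant: both positive
            dx2 < 0 and dy2 > 0 and   # second quadrant: dx negative, dy positive
            dx3 < 0 and dy3 < 0 and   # third quadrant: both negative
            dx4 > 0 and dy4 < 0)      # fourth quadrant: dx positive, dy negative
```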
In one embodiment, a regression network model is used to determine motion information of the image capture device when capturing a video frame to be processed according to the position variable information.
In the motion estimation method of the image capturing device provided by this embodiment, a video frame to be processed is captured for a target portion of an object to be detected by the image capturing device; position variable information between the video frame to be processed and a historical video frame is determined based on the position information of the same pixel points in the two frames, the historical video frame being an image acquired for the target portion of the object to be detected at an earlier time than the video frame to be processed; and the motion information of the image capturing device when the video frame to be processed was acquired is determined according to the position variable information. Because the corresponding position variable information is determined from the video frame to be processed and a historical video frame preceding it, and the motion information is then derived from the obtained position variable information, the motion information of the image capturing device at acquisition time can be determined through image transformation alone.
Referring to fig. 2, fig. 2 is a flowchart illustrating a motion estimation method of an image capturing device according to an embodiment of the present invention. In this embodiment, a method for estimating motion of an image capturing apparatus is provided, and the method for estimating motion of an image capturing apparatus includes the following steps.
S21: and training to obtain a regression network model.
Specifically, training the regression network model specifically includes the following steps.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S21 in the motion estimation method of the image capturing apparatus provided in fig. 2.
S211: a first training data set is acquired.
Specifically, the first training data set includes a plurality of sets of training data including a horizontal direction displacement variable and a vertical direction displacement variable; each set of training data is labeled with a true running speed and a true running direction. Wherein the real running speed and the real running direction are calculated based on the running distance and the running time of the running target.
In this embodiment, each set of training data includes 8 displacement values corresponding to the four quadrants of a coordinate system: a horizontal displacement variable and a vertical displacement variable for each of the four quadrants. Each set of training data thus corresponds to a vector $x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)$.
In this embodiment, the actual running speed corresponding to each set of training data is the running speed of the running target when the running target has a position change of 8 displacement values in four quadrants.
The four quadrants include a first quadrant, a second quadrant, a third quadrant and a fourth quadrant; the first quadrant is located above and to the right of the origin of the coordinate system, the second quadrant above and to the left, the third quadrant below and to the left, and the fourth quadrant below and to the right.
If the horizontal displacement variable $dx_s$ and the vertical displacement variable $dy_s$ of the first quadrant are both positive, $dx_s$ of the second quadrant is negative and $dy_s$ positive, $dx_s$ and $dy_s$ of the third quadrant are both negative, and $dx_s$ of the fourth quadrant is positive and $dy_s$ negative, the real running direction corresponding to the training data is labeled as forward. Specifically, the labeled data corresponding to the training data is a positive value, for example +5 cm/s or 5 cm/s, meaning a real running speed of 5 cm/s and a real running direction of forward.
If $dx_s$ and $dy_s$ of the first quadrant are both negative, $dx_s$ of the second quadrant is positive and $dy_s$ negative, $dx_s$ and $dy_s$ of the third quadrant are both positive, and $dx_s$ of the fourth quadrant is negative and $dy_s$ positive, the real running direction corresponding to the training data is labeled as backward. Specifically, the labeled real running speed is a negative value, for example -3 cm/s, meaning a real running speed of 3 cm/s and a real running direction of backward.
If $dx_s$ is positive and $dy_s$ is zero in all four quadrants, the real running direction corresponding to the training data is labeled as a right shift.
If $dx_s$ is negative and $dy_s$ is zero in all four quadrants, the real running direction corresponding to the training data is labeled as a left shift.
If $dx_s$ is zero and $dy_s$ is positive in all four quadrants, the real running direction corresponding to the training data is labeled as an upward shift.
If $dx_s$ is zero and $dy_s$ is negative in all four quadrants, the real running direction corresponding to the training data is labeled as a downward shift.
Specifically, when the real running direction corresponding to the training data is a left, right, upward or downward shift, the real running speed corresponding to the training data is labeled as zero, since the running target does not move forward or backward.
S212: and inputting the training data into a regression network model to obtain the predicted operation speed and the predicted operation direction corresponding to the training data.
Specifically, each set of training data is input into a regression network model, and the regression network model determines the prediction data corresponding to each set of training data based on the 8 displacement values corresponding to each set of training data. Wherein the predicted data includes a predicted operating speed and a predicted operating direction.
In one embodiment, the regression network model takes 8 displacement values as input, and calculates the predicted operation speed and the predicted operation direction corresponding to the 8 displacement values based on formula 1.
$$f(x) = w^{T} x + b \qquad \text{(formula 1)}$$
In formula 1: $f(x)$ is the prediction data corresponding to the training data; $x$ is the vector of 8 displacement values; $w = (w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8)$ is the weight vector of the 8 components; $b$ is the bias; and $T$ denotes transposition.
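A minimal numpy sketch of formula 1 (the function name is illustrative; the signed output encodes speed as magnitude and direction as sign, per the labeling scheme above):

```python
import numpy as np

def predict_speed(x, w, b):
    """Formula 1: f(x) = w^T x + b.
    x: the 8 displacement values (horizontal and vertical averages of the
    four quadrants); w: weight vector of length 8; b: bias.
    The signed result encodes running speed (magnitude) and direction (sign)."""
    return float(np.dot(w, x) + b)
```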
S213: and performing iterative training on the regression network model based on the error value between the real running speed and the predicted running speed and the error value between the real running direction and the predicted running direction corresponding to the same training data.
Specifically, the regression network model is iteratively trained according to the error value between the labeling data and the prediction data corresponding to the same set of training data, and the weight value in the formula 1 is optimized.
In a specific embodiment, the error value between the real running speed and the predicted running speed corresponding to the same set of training data is calculated, as is the error value between the real running direction and the predicted running direction; the regression network model is then iteratively trained based on these two error values, optimizing the weight values in formula 1.
In an alternative embodiment, the result of the regression network model is backpropagated, and the weight values of the regression network model are corrected according to the fed-back loss value. That is, the weights w and b of the linear regression function in the regression network model are corrected according to the fed-back loss value, thereby training the regression network model.
The training data are input into the regression network model, which predicts the running speed and running direction corresponding to the training data. When the error value between the labeled data and the predicted data corresponding to the same training data falls below a preset threshold, which may be set as desired (for example 1% or 5%), training of the regression network model is stopped.
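A sketch of such an iterative training loop, assuming a mean-squared-error loss and plain gradient descent (learning rate, epoch count and stopping tolerance are illustrative choices, not specified by the source):

```python
import numpy as np

def train_regression(X, y, lr=1e-3, tol=0.01, epochs=10000):
    """X: (N, 8) training displacement vectors; y: (N,) signed real speeds.
    Gradient descent on the mean squared error; stops when the mean
    relative error falls below the preset threshold (e.g. 1%)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = X @ w + b - y                  # prediction error
        w -= lr * 2.0 * X.T @ err / len(y)   # gradient w.r.t. w
        b -= lr * 2.0 * err.mean()           # gradient w.r.t. b
        if np.mean(np.abs(err) / (np.abs(y) + 1e-8)) < tol:
            break
    return w, b
```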
Training the regression network model in this way improves its detection accuracy and makes it easier to capture finer local motion.
S22: and training to obtain an image registration network model.
Specifically, training the image registration network model specifically includes the following steps.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of step S22 in the motion estimation method of the image capturing apparatus provided in fig. 2.
S221: a second training data set is acquired.
In particular, the second training data set comprises first and second sample images of a plurality of endoscopic scenes; the first sample image and the second sample image are separated by a preset number of frames, and corresponding real position variables are marked between the first sample image and the second sample image separated by the preset number of frames.
In an embodiment, images of different video frames are taken from a video stream as the first sample image and the second sample image: the earlier frame is taken as the first sample image, and the frame a preset number of frames later is taken as the second sample image. That is, the first sample image and the second sample image come from the same scene. In this embodiment, both are images of in-vivo endoscope scenes, which typically contain little texture.
In this embodiment, the first sample image and the second sample image of the same group are time-series images, and the first sample image and the corresponding second sample image have the same size. For example, the first sample image and the second sample image are each 600 × 800 in size.
S222: and inputting the first sample image and the second sample image which are separated by a preset number of frames into an image registration network model to obtain a corresponding predicted position variable between the first sample image and the second sample image.
Specifically, a set of training samples in the second training data set is simultaneously input to the image registration network model. That is, a set of training samples includes a first sample image and a corresponding second sample image. The first sample image and a second sample image spaced from the first sample image by a preset number of frames are input to an image registration network model.
And the image registration network model respectively performs feature matching on each pixel point in the first sample image and each pixel point in the second sample image to obtain the position information of the same pixel point in the first sample image and the position information of the same pixel point in the second sample image. That is to say, the position information of the same pixel point corresponding to the first sample image and the position information of the same pixel point corresponding to the second sample image are mapped to determine the corresponding predicted position variable between the first sample image and the second sample image.
In a specific embodiment, the predicted position variable between the first sample image and the second sample image may be a transform field $\phi \in \mathbb{R}^{W \times H \times 2}$, where $\phi$ denotes the predicted position variable between the first sample image and the second sample image, $W$ is the width of the image, $H$ is the height of the image, and the 2 channels are the horizontal and vertical displacements of the pixels mapped to each other in the first sample image and the second sample image.
Specifically, the second sample image $I_{n+k}$ is obtained from the first sample image $I_k$ based on formula 2:
$$I_{n+k} = I_k \circ \phi \qquad \text{(formula 2)}$$
where the pixel coordinates in the first sample image are offset by the horizontal displacement $dx$ and the vertical displacement $dy$ of the transform field to obtain the transformed pixel coordinates, and bilinear interpolation is then performed at each transformed pixel to obtain the second sample image $I_{n+k}$.
In an embodiment, a pixel point $P = (x, y)$ is selected in the first sample image, and bilinear interpolation based on formula 3 gives the pixel value of the point with coordinates $(x, y)$ in the second sample image $I_{n+k}$:
$$f(x,y) \approx \frac{f(Q_{11})(x_2-x)(y_2-y) + f(Q_{21})(x-x_1)(y_2-y) + f(Q_{12})(x_2-x)(y-y_1) + f(Q_{22})(x-x_1)(y-y_1)}{(x_2-x_1)(y_2-y_1)} \qquad \text{(formula 3)}$$
In the formula: $f(x, y)$ denotes the pixel value of the point with coordinates $(x, y)$ in the second sample image $I_{n+k}$; $Q_{11} = (x_1, y_1)$, $Q_{12} = (x_1, y_2)$, $Q_{21} = (x_2, y_1)$, $Q_{22} = (x_2, y_2)$ are the four integer interpolation points.
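A sketch of formula 3 and the warping of formula 2, written in the usual backward-sampling form (numpy; the (H, W, 2) field layout follows the earlier sketch and is an assumption):

```python
import numpy as np

def bilinear(img, x, y):
    """Formula 3: interpolate img at real-valued (x, y) from the four
    integer neighbours Q11, Q21, Q12, Q22."""
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2, y2 = min(x1 + 1, img.shape[1] - 1), min(y1 + 1, img.shape[0] - 1)
    fx, fy = x - x1, y - y1
    return ((1 - fx) * (1 - fy) * img[y1, x1] + fx * (1 - fy) * img[y1, x2] +
            (1 - fx) * fy * img[y2, x1] + fx * fy * img[y2, x2])

def warp(img_k, phi):
    """Formula 2: resample the first sample image I_k through the
    transform field phi to approximate the second sample image I_{n+k}."""
    H, W = img_k.shape[:2]
    out = np.zeros_like(img_k, dtype=np.float32)
    for y in range(H):
        for x in range(W):
            dx, dy = phi[y, x]
            out[y, x] = bilinear(img_k,
                                 float(np.clip(x + dx, 0, W - 1)),
                                 float(np.clip(y + dy, 0, H - 1)))
    return out
```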
S223: and iteratively training the image registration network model based on the error values between the real position variable and the predicted position variable between the first sample image and the second sample image of the same interval preset number of frames.
Specifically, an error value between a real position variable and a predicted position variable corresponding to a first sample image and a second sample image in the same training sample group is obtained through calculation, and iterative training is performed on the image registration network model based on the error value.
In a specific embodiment, an error value between a real transform field and a predicted transform field corresponding to a first sample image and a second sample image in the same training sample set is calculated, and the image registration network model is iteratively trained based on the error value.
In an optional embodiment, the result of the image registration network model is backpropagated, and the weight values of the image registration network model are corrected according to the fed-back loss value.
The first sample image and the second sample image of a training sample are input into the image registration network model simultaneously, and the model predicts the transform field between them. When the error value between the real and predicted transform fields for the same training sample falls below a preset threshold, which may be set as desired (for example 1% or 5%), training of the image registration network model is stopped.
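A sketch of the stopping test described above, under the assumption that the error between the real and predicted transform fields is measured as a relative mean absolute error (the metric is illustrative):

```python
import numpy as np

def field_error(phi_true, phi_pred):
    """Relative mean absolute error between the real and predicted
    transform fields of one (first image, second image) training pair."""
    return np.mean(np.abs(phi_pred - phi_true)) / (np.mean(np.abs(phi_true)) + 1e-8)

def should_stop(phi_true, phi_pred, threshold=0.05):
    """Stop training once the error is below the preset threshold
    (e.g. 1% or 5%); otherwise the loss keeps being backpropagated."""
    return field_error(phi_true, phi_pred) < threshold
```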
S23: and acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment.
In particular, the image acquisition device is a medical device that can enter the living body, such as an endoscope or another device capable of entering the body. In this embodiment, taking an endoscope as an example, the endoscope is inserted into the body through the mouth, nose or a similar part, photographs a target part of the object to be detected, and acquires in-vivo images in real time as video frames to be processed. The endoscope in this embodiment performs detection inside the human body, i.e., acquires an image inside the human body as the video frame to be processed. In another embodiment, detection can also be performed inside an animal body through an endoscope, i.e., an image inside the animal body is acquired as the video frame to be processed.
In one embodiment, from all historical video frames before the to-be-processed video frame $I_{n+k}$, a historical video frame $I_k$ spaced $n$ frames from $I_{n+k}$ is obtained. That is, the acquired historical video frame $I_k$ and the to-be-processed video frame $I_{n+k}$ are a preset number of frames apart. Since the interval between adjacent video frames is a preset time $t$, the time interval between the historical video frame $I_k$ and the to-be-processed video frame $I_{n+k}$ is $nt$. The historical video frame $I_k$ and the to-be-processed video frame $I_{n+k}$ are two time-ordered images of the same size.
S24: and determining corresponding position variable information between the video frame to be processed and the historical video frame by adopting an image registration network model according to the video frame to be processed and the historical video frame before the video frame to be processed.
Specifically, the acquired historical video frame $I_k$ and the to-be-processed video frame $I_{n+k}$ are simultaneously input into the image registration network model trained in step S22, and the trained model performs feature matching between each pixel point in the to-be-processed video frame $I_{n+k}$ and each pixel point in the historical video frame $I_k$. According to the position information of the same pixel points in the to-be-processed video frame $I_{n+k}$ and in the historical video frame $I_k$, the corresponding position variable information between $I_{n+k}$ and $I_k$ is determined. The same pixel points are pixel points present in both frames; there may be one or more of them, determined by the actual situation.
In one embodiment, the captured historical video frame $I_k$ and the to-be-processed video frame $I_{n+k}$ are simultaneously input into the image registration network model trained in step S22, and the corresponding transform field $\phi$ between $I_k$ and $I_{n+k}$ is calculated. That is, the acquired historical frame image $I_k$ can be transformed through the transform field and bilinear interpolation to obtain the to-be-processed video frame $I_{n+k}$.
S25: and establishing a coordinate system based on the video frame to be processed and the historical video frame.
Specifically, in order to facilitate calculating the position variation of each pixel point between the to-be-processed video frame and the historical video frame, the same coordinate system is established in both frames. The origin of the coordinate system is the center point of the to-be-processed video frame and of the historical video frame, and these two center points coincide.
The coordinate system comprises four quadrants, wherein the four quadrants comprise a first quadrant, a second quadrant, a third quadrant and a fourth quadrant; the first quadrant is located above and to the right of the origin of the coordinate system, the second quadrant is located above and to the left of the origin of the coordinate system, the third quadrant is located below and to the left of the origin of the coordinate system, and the fourth quadrant is located below and to the right of the origin of the coordinate system.
S26: and carrying out statistics to obtain horizontal direction displacement variables and vertical direction displacement variables corresponding to four quadrants in the coordinate system respectively.
Specifically, the position variable information obtained in step S24 is divided into four quadrants. That is, the transform field between the historical frame image and the current frame image is divided into the four quadrants, and for each quadrant $s \in \{1, 2, 3, 4\}$ the average horizontal displacement $dx_s$ and the average vertical displacement $dy_s$ are obtained.
Through the steps, 8 displacement values corresponding to four quadrants in the coordinate system are obtained.
In this embodiment, the position variable information obtained in step S24 is divided into four quadrants, so that the motion scene of the endoscope in the human body can be better modeled, and the motion transformation fields of the four quadrants can better reflect the current motion state of the endoscope.
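A sketch of this quadrant statistic (numpy; the (H, W, 2) field layout and the output ordering of the 8 values are assumptions):

```python
import numpy as np

def quadrant_averages(phi):
    """Split a transform field phi of shape (H, W, 2) into four quadrants
    about the image centre and return the 8 displacement values ordered
    (dx_1, dy_1, dx_2, dy_2, dx_3, dy_3, dx_4, dy_4)."""
    H, W = phi.shape[:2]
    cy, cx = H // 2, W // 2
    # Image rows grow downward, so the upper half is rows [:cy].
    quads = [phi[:cy, cx:],   # first quadrant: upper right
             phi[:cy, :cx],   # second quadrant: upper left
             phi[cy:, :cx],   # third quadrant: lower left
             phi[cy:, cx:]]   # fourth quadrant: lower right
    vals = []
    for q in quads:
        vals.extend([q[..., 0].mean(), q[..., 1].mean()])
    return np.array(vals)
```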
S27: and respectively smoothing horizontal direction displacement variables and vertical direction displacement variables which respectively correspond to the video frame to be processed and the historical video frame in four quadrants.
Specifically, a kalman filter is adopted to smooth horizontal displacement variables and vertical displacement variables corresponding to the video frame to be processed and the historical video frame in four quadrants respectively.
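A minimal scalar Kalman filter of the kind that could perform this smoothing, one instance per displacement value (the constant-state model and noise parameters are illustrative assumptions):

```python
class ScalarKalman:
    """Smooths one displacement value across successive video frames."""
    def __init__(self, q=1e-3, r=1e-1):
        self.q, self.r = q, r       # process / measurement noise
        self.x, self.p = 0.0, 1.0   # state estimate and its variance

    def update(self, z):
        self.p += self.q                # predict step
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # correct with measurement z
        self.p *= 1.0 - k
        return self.x

# One filter per displacement value, fed once per processed frame.
filters = [ScalarKalman() for _ in range(8)]
```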
S28: and determining the running speed and the running direction of the image acquisition equipment when the video frame to be processed is acquired by adopting a regression network model according to the horizontal direction displacement variable quantity and the vertical direction displacement variable quantity which are respectively corresponding to the video frame to be processed and the historical video frame in four quadrants.
Specifically, the 8 smoothed displacement values corresponding to the to-be-processed video frame and the historical video frame are input into the regression network model trained in step S21; the regression network model substitutes the 8 input displacement values into formula 1 and directly calculates the running speed of the endoscope when the current frame image was collected. The running speed of the endoscope is a signed quantity whose sign encodes the direction.
In one embodiment, in response to the horizontal and vertical displacement variables of the same pixel points in the first quadrant both being positive, the horizontal displacement variable in the second quadrant being negative and the vertical displacement variable positive, the horizontal and vertical displacement variables in the third quadrant both being negative, and the horizontal displacement variable in the fourth quadrant being positive and the vertical displacement variable negative, the regression network model outputs a positive endoscope running speed, indicating that the endoscope moved forward between the historical video frame and the to-be-processed video frame. When the endoscope runs forward, the transform field diverges outward in all four quadrants, each in its own direction.
In this embodiment, a regression network model is used to model the relation between the running speed and running direction of the endoscope and the 8 displacement values corresponding to the four quadrants, so that the running speed of the endoscope can be accurately fitted from the values of the transformation field, improving the detection accuracy of the running speed.
Referring to fig. 5, fig. 5 is a flowchart illustrating a motion estimation method of an image capturing device according to an embodiment of the present invention.
In a specific embodiment, after the regression network model has been trained, two time-sequence frames captured by the endoscope n frames apart are read in, preprocessed, and input into the image registration network model, which outputs the transformation field corresponding to the two frames. The transformation field is divided into four quadrants, and the average displacements in the horizontal direction and the vertical direction are computed for each quadrant, yielding 8 displacement values. A Kalman filter performs displacement parameter adjustment on these 8 values, thereby smoothing them. The 8 smoothed displacement values are then input into the trained regression network model to obtain the running speed and running direction of the endoscope.
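Putting the pieces together, the flow of fig. 5 might be sketched as follows. Here `preprocess` and `registration_net` are hypothetical placeholders for the preprocessing step and the trained image registration network, and the helper functions are the sketches given earlier in this description:

```python
import torch

def estimate_motion(frame_prev, frame_cur, registration_net, regressor):
    """End-to-end sketch of fig. 5: registration -> quadrant averaging ->
    Kalman smoothing -> regression. Returns (speed, direction)."""
    # dense H x W x 2 transformation field between the two frames
    flow = registration_net(preprocess(frame_prev), preprocess(frame_cur))
    values = quadrant_displacements(flow)  # 8 quadrant averages
    values = smooth(values)                # Kalman smoothing
    x = torch.tensor(values, dtype=torch.float32).unsqueeze(0)
    speed = regressor(x).item()            # signed running speed
    direction = "forward" if is_forward(values) else "other"
    return speed, direction
```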
In the motion estimation method of the image acquisition device provided by this embodiment, a video frame to be processed is acquired for a target portion of an object to be detected by the image acquisition device; position variable information between the video frame to be processed and a historical video frame is determined based on the position information of the same pixel points in the two frames, the historical video frame being an image acquired for the target portion of the object to be detected at a time earlier than the video frame to be processed; and the motion information of the image acquisition device when the video frame to be processed was acquired is determined according to the position variable information. In this way, the corresponding position variable information is determined from the video frame to be processed and a historical video frame preceding it, and the motion information of the image acquisition device at acquisition time is then determined from that position variable information, so that the motion information can be obtained through image transformation alone.
Referring to fig. 6, fig. 6 is a schematic block diagram of a motion estimation apparatus of an image capturing device provided by the present invention. The present embodiment provides a motion estimation apparatus 60 of an image capturing device, and the motion estimation apparatus 60 of the image capturing device includes an acquisition module 61, an analysis module 62, and an estimation module 63.
The acquiring module 61 is configured to acquire a to-be-processed video frame for a target portion of an object to be detected by an image acquiring device.
In particular, the image acquisition device is a medical device capable of entering the interior of a living body. The image acquisition device may be an endoscope or another device that can enter the living body.
The analysis module 62 determines position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame; the historical video frame is an image collected aiming at a target part of an object to be detected, and the collection time of the historical video frame is earlier than that of the video frame to be processed.
Specifically, the analysis module 62 is configured to perform feature matching on each pixel point in the video frame to be processed and each pixel point in the historical video frame, respectively, to obtain position information of the same pixel point in the video frame to be processed and position information of the same pixel point in the historical video frame.
In a specific embodiment, the analysis module 62 is configured to determine position variable information between the to-be-processed video frame and the historical video frame based on the position information of the same pixel point in the to-be-processed video frame and the historical video frame by using the image registration network model.
The estimation module 63 is configured to determine motion information of the image capturing device when capturing a video frame to be processed according to the position variable information.
In a specific embodiment, the estimation module 63 is configured to establish a coordinate system based on the video frame to be processed and the historical video frame, the origin of the coordinate system coinciding with the center points of the video frame to be processed and the historical video frame; determine the displacement variable corresponding to each same pixel point in the video frame to be processed and the historical video frame, wherein the displacement variable represents the displacement between a first position point and a second position point of the corresponding same pixel point, the first position point being the coordinate point, in the coordinate system, of the corresponding same pixel point in the video frame to be processed, and the second position point being the coordinate point, in the coordinate system, of the corresponding same pixel point in the historical video frame; and determine the motion information of the image acquisition device when the video frame to be processed is acquired based on the displacement variable corresponding to each same pixel point.
In one embodiment, the coordinate system includes four quadrants; the estimation module 63 is configured to count to obtain a horizontal displacement variable and a vertical displacement variable corresponding to the position variable information in each of the four quadrants; and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the displacement variable in the horizontal direction and the displacement variable in the vertical direction.
In a specific embodiment, the estimation module 63 is configured to respectively smooth horizontal direction displacement variables and vertical direction displacement variables corresponding to the video frame to be processed and the historical video frame in four quadrants by using a kalman filter; and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the smooth horizontal displacement variable and the smooth vertical displacement variable.
In a particular embodiment, the movement information comprises at least one of a speed of movement and a direction of movement. The estimation module 63 is configured to determine a motion speed and a motion direction of the image capturing device when the video frame to be processed is acquired according to position variable information between the video frame to be processed and the historical video frame.
In a specific embodiment, the motion information comprises a motion direction, and the four quadrants comprise a first quadrant, a second quadrant, a third quadrant, and a fourth quadrant; the first quadrant is located at the upper right of the origin of the coordinate system, the second quadrant at the upper left, the third quadrant at the lower left, and the fourth quadrant at the lower right. The estimation module 63 is configured to determine that the running direction of the image acquisition device is the forward direction if, in the first quadrant, the horizontal displacement variable and the vertical displacement variable of the same pixel points corresponding to the video frame to be processed and the historical video frame are both positive; in the second quadrant, the horizontal displacement variable is negative and the vertical displacement variable is positive; in the third quadrant, both displacement variables are negative; and in the fourth quadrant, the horizontal displacement variable is positive and the vertical displacement variable is negative.
Specifically, the estimation module 63 is configured to determine, according to the position variable information, motion information of the image acquisition device when acquiring a video frame to be processed by using a regression network model.
According to the above method and device, the position variable information between the video frame to be processed and a historical video frame preceding it is determined based on the two frames, the motion information of the image acquisition device when the video frame to be processed was acquired is determined according to the obtained position variable information, and the motion information can thus be determined through image transformation alone.
Referring to fig. 7, fig. 7 is a schematic diagram of a framework of an embodiment of the terminal of the present application. The terminal 80 comprises a memory 81 and a processor 82 coupled to each other, the processor 82 being configured to execute program instructions stored in the memory 81 to implement the steps of any of the embodiments of the motion estimation method of the image acquisition device described above. In one particular implementation scenario, the terminal 80 may include, but is not limited to, a microcomputer and a server; in addition, the terminal 80 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above-described embodiments of the motion estimation method of the image acquisition device. The processor 82 may also be referred to as a CPU (Central Processing Unit). The processor 82 may be an integrated circuit chip having signal processing capabilities. The processor 82 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 82 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 8, fig. 8 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 90 stores program instructions 901 executable by the processor, the program instructions 901 being for implementing the steps of any one of the above-described embodiments of the motion estimation method for the image acquisition device.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (12)

1. A motion estimation method of an image capturing apparatus, the motion estimation method comprising:
acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment;
determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame; the historical video frame is an image collected aiming at the target part of the object to be detected, and the collection time of the historical video frame is earlier than that of the video frame to be processed;
and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information.
2. The method according to claim 1, wherein the determining, according to the position variable information, the motion information of the image capturing device when capturing the video frame to be processed comprises:
establishing a coordinate system based on the video frame to be processed and the historical video frame; the origin of the coordinate system is superposed with the central points of the video frame to be processed and the historical video frame;
determining a displacement variable corresponding to each same pixel point in the video frame to be processed and the historical video frame, wherein the displacement variable represents the displacement between a first position point and a second position point of the corresponding same pixel point, the first position point is a coordinate point of the corresponding same pixel point in the video frame to be processed in the coordinate system, and the second position point is a coordinate point of the corresponding same pixel point in the historical video frame in the coordinate system;
and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired based on the displacement variable corresponding to each same pixel point.
3. The motion estimation method according to claim 2, characterized in that the coordinate system comprises four quadrants; the determining, according to the position variable information, motion information of the image capturing device when the video frame to be processed is captured further includes:
counting to obtain a horizontal displacement variable and a vertical displacement variable corresponding to the position variable information in each of the four quadrants;
and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the horizontal displacement variable and the vertical displacement variable.
4. The motion estimation method according to claim 3, wherein the determining motion information of the image capturing device when capturing the video frame to be processed according to the horizontal direction displacement variable and the vertical direction displacement variable further comprises:
smoothing the displacement variable in the horizontal direction and the displacement variable in the vertical direction by adopting a Kalman filter;
the determining the motion information of the image acquisition equipment when acquiring the video frame to be processed according to the displacement variable in the horizontal direction and the displacement variable in the vertical direction comprises the following steps:
and determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the smooth horizontal direction displacement variable and the smooth vertical direction displacement variable.
5. A motion estimation method according to any one of claims 1 to 4, characterized in that the motion information comprises at least one of a motion speed and a motion direction.
6. The method according to claim 5, wherein the motion information comprises a motion direction, and the four quadrants comprise a first quadrant, a second quadrant, a third quadrant, and a fourth quadrant; the first quadrant is located above and to the right of the origin of the coordinate system, the second quadrant is located above and to the left of the origin of the coordinate system, the third quadrant is located below and to the left of the origin of the coordinate system, and the fourth quadrant is located below and to the right of the origin of the coordinate system;
the determining the motion information of the image acquisition device when acquiring the video frame to be processed according to the position variable information includes:
in response to the horizontal direction displacement variable and the vertical direction displacement variable of the same pixel points corresponding to the video frame to be processed and the historical video frame in the first quadrant both being positive values, the horizontal direction displacement variable of the corresponding same pixel points in the second quadrant being a negative value and the vertical direction displacement variable being a positive value, the horizontal direction displacement variable and the vertical direction displacement variable of the corresponding same pixel points in the third quadrant both being negative values, and the horizontal direction displacement variable of the corresponding same pixel points in the fourth quadrant being a positive value and the vertical direction displacement variable being a negative value, determining that the running direction of the image acquisition device is a forward direction.
7. The motion estimation method according to any one of claims 1 to 4,
the determining, according to the position variable information, motion information of the image capturing device when the video frame to be processed is captured includes:
determining motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information by adopting a regression network model;
the training method of the regression network model comprises the following steps:
acquiring a first training data set, wherein the first training data set comprises a plurality of groups of training data containing horizontal direction displacement variables and vertical direction displacement variables; each group of training data is marked with real motion information;
inputting the training data into the regression network model to obtain predicted motion information corresponding to the training data;
and performing iterative training on the regression network model based on the error value between the real motion information and the predicted motion information corresponding to the same training data.
8. The motion estimation method according to any one of claims 1 to 4,
before the determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame, the method further comprises:
performing feature matching on each pixel point in the video frame to be processed and each pixel point in the historical video frame, respectively, to obtain position information of the same pixel points in the video frame to be processed and in the historical video frame.
9. The motion estimation method according to any one of claims 1 to 4,
the determining the position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame comprises:
determining position variable information between the video frame to be processed and the historical video frame by adopting an image registration network model, based on the position information of the same pixel points in the video frame to be processed and the historical video frame;
the training method of the image registration network model comprises the following steps:
acquiring a second training data set, wherein the second training data set comprises a plurality of first sample images and second sample images; a preset number of frames are spaced between the first sample image and the second sample image, and a real position variable is correspondingly marked between the first sample image and the second sample image spaced by the preset number of frames;
inputting the first sample image and the second sample image which are separated by a preset number of frames into the image registration network model to obtain a corresponding predicted position variable between the first sample image and the second sample image;
iteratively training the image registration network model based on error values between the real position variables and the predicted position variables corresponding to the same first sample image and second sample image spaced by the preset number of frames.
10. A motion estimation apparatus of an image capturing device, characterized in that the motion estimation apparatus of the image capturing device comprises:
the acquisition module is used for acquiring a video frame to be processed aiming at a target part of an object to be detected through image acquisition equipment;
the analysis module is used for determining position variable information between the video frame to be processed and the historical video frame based on the position information of the same pixel point in the video frame to be processed and the historical video frame; the historical video frame is an image collected aiming at the target part of the object to be detected, and the collection time of the historical video frame is earlier than that of the video frame to be processed;
and the estimation module is used for determining the motion information of the image acquisition equipment when the video frame to be processed is acquired according to the position variable information.
11. A terminal, characterized in that the terminal comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor, the processor being configured to execute the computer program to implement the steps in the motion estimation method of an image acquisition device according to any one of claims 1-9.
12. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for motion estimation of an image acquisition device according to any one of claims 1 to 9.
CN202210612011.XA 2022-06-01 2022-06-01 Motion estimation method and device of image acquisition equipment, terminal and storage medium Pending CN114708261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210612011.XA CN114708261A (en) 2022-06-01 2022-06-01 Motion estimation method and device of image acquisition equipment, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114708261A (en) 2022-07-05

Family

ID=82177091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210612011.XA Pending CN114708261A (en) 2022-06-01 2022-06-01 Motion estimation method and device of image acquisition equipment, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114708261A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5351095A (en) * 1989-08-29 1994-09-27 Thomson Consumer Electronics Method and device for estimating and hierarchically coding the motion of sequences of images
US20210256703A1 (en) * 2018-09-13 2021-08-19 Kyoto University Machine learning device, estimation device, non-transitory computer readable medium, and learned model
CN112766416A (en) * 2021-02-10 2021-05-07 中国科学院深圳先进技术研究院 Digestive endoscopy navigation method and system
CN112633261A (en) * 2021-03-09 2021-04-09 北京世纪好未来教育科技有限公司 Image detection method, device, equipment and storage medium
CN114531549A (en) * 2022-04-22 2022-05-24 浙江大华技术股份有限公司 Image acquisition method, electronic device, and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUO JUN ET AL.: "Video Stabilization Based on Bit-Plane Matching and Kalman Filtering", INFRARED AND LASER ENGINEERING *

Similar Documents

Publication Publication Date Title
US10860930B2 (en) Learning method, image recognition device, and computer-readable storage medium
CN110599421B (en) Model training method, video fuzzy frame conversion method, device and storage medium
KR20190050724A (en) System and Method of Generating Blood Pressure Estimation Model, and System and Method of Blood Pressure Estimation
KR20190028422A (en) Systems and methods for automatic detection, localization, and semantic segmentation of anatomical objects
BRPI0919448B1 (en) method for tracking a follicular unit and system for tracking a follicular unit.
US9183634B2 (en) Image processing apparatus and image processing method
JPH08322033A (en) Method for forming a color table in a computer unit for classifying pixels in an image
CN111448589B (en) Devices, systems and methods for detecting body movements of patients
JP2022527007A (en) Auxiliary imaging device, control method and device for analysis of movement disorder disease
CN106846372B (en) Human motion quality visual analysis and evaluation system and method thereof
CN115063447B (en) Target animal motion tracking method based on video sequence and related equipment
Bao et al. Modeling of the movement of the endoscopy capsule inside gi tract based on the captured endoscopic images
WO2023138619A1 (en) Endoscope image processing method and apparatus, readable medium, and electronic device
CN111598927A (en) Positioning reconstruction method and device
WO2025026230A1 (en) Tracking method and apparatus, medical assistance system, medium, and computing device
CN114708261A (en) Motion estimation method and device of image acquisition equipment, terminal and storage medium
CN110430416B (en) Free viewpoint image generation method and device
CN115546876B (en) Pupil tracking method and device
CN117635695A (en) A walking trajectory center of gravity analysis method and system suitable for home care scenarios
Chiu et al. 3D baseball pitcher pose reconstruction using joint-wise volumetric triangulation and baseball customized filter system
CN111539988B (en) Visual odometer implementation method and device and electronic equipment
Pinheiro et al. Deep homography based localization on videos of endoscopic capsules
Lu et al. S2P-Matching: Self-supervised patch-based matching using transformer for capsule endoscopic images stitching
CN119722775B (en) Enteroscopy polyp size assessment method and system based on multi-frame matching
CN113243932A (en) Oral health detection system, related method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220705