CN112184757A - Method and device for determining motion trail, storage medium and electronic device - Google Patents
- Publication number: CN112184757A (application number CN202011044996.8A)
- Authority: CN (China)
- Prior art keywords: processed, images, target object, frames, target
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T7/20 (Image analysis: analysis of motion), including G06T7/207 (motion estimation over a hierarchy of resolutions) and G06T7/246 (analysis of motion using feature-based methods, e.g. the tracking of corners or segments)
- G06T7/73 (Determining position or orientation of objects or cameras using feature-based methods)
- G06T2207/10016 (Image acquisition modality: video; image sequence)
- G06T2207/10028 (Image acquisition modality: range image; depth image; 3D point clouds)
Abstract
The embodiments of the invention provide a method and a device for determining a motion trail, a storage medium and an electronic device, wherein the method comprises the following steps: performing image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed; adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow; tracking the target object by using the processing parameters and the target scene flow to obtain a first motion track of the target object; and determining a second motion track of the target object based on the processing parameters and the first motion track. The method and the device solve the problem of inaccurate determination of the motion trail in the related art and achieve the effect of accurately determining the motion trail.
Description
Technical Field
The embodiment of the invention relates to the field of images, in particular to a method and a device for determining a motion trail, a storage medium and an electronic device.
Background
With the continuous development of computer vision and computer network technology, video monitoring is in great demand in many fields such as security, finance and education. In these application areas, estimating the motion of human targets is the main content of the monitoring task. In a smart finance scenario, monitoring the people in an Automatic Teller Machine (ATM) lobby can improve personal safety. In a smart education scenario, monitoring student behavior is of great importance to teaching quality and teaching safety. The most important part of monitoring a human target is calculating the motion information of the target in real time, especially the three-dimensional motion information that represents the real motion of the target.
At present, three-dimensional motion estimation schemes include motion estimation based on the Iterative Closest Point (ICP) algorithm, motion estimation based on multi-view vision theory, motion estimation based on scene flow, and the like. These determination methods suffer from inaccuracy.
In view of the above technical problems, no effective solution has been proposed in the related art.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a motion trail, a storage medium and an electronic device, which are used for at least solving the problem of inaccurate determination of the motion trail in the related art.
According to an embodiment of the present invention, there is provided a method for determining a motion trajectory, including: performing image processing on N acquired images to be processed to obtain processing parameters of the N images to be processed, wherein the processing parameters include a disparity map of the N images to be processed, an example segmentation mask of the N images to be processed, and an optical flow map of the N images to be processed, the N images to be processed are obtained by shooting the same target object at different angles, and N is a natural number greater than or equal to 1; adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow; tracking the target object by using the processing parameters and the target scene stream to obtain a first motion track of the target object; and determining a second motion trail of the target object based on the processing parameters and the first motion trail.
According to another embodiment of the present invention, there is provided a motion trajectory determination apparatus including: a first determining module, configured to perform image processing on N acquired images to be processed to obtain processing parameters of the N images to be processed, where the processing parameters include a disparity map of the N images to be processed, an example segmentation mask of the N images to be processed, and an optical flow map of the N images to be processed, the N images to be processed are obtained by shooting the same target object at different angles, and N is a natural number greater than or equal to 1; the second determining module is used for adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow; a third determining module, configured to track the target object by using the processing parameter and the target scene stream, and obtain a first motion trajectory of the target object; and the fourth determining module is used for determining a second motion trail of the target object based on the processing parameters and the first motion trail.
In an exemplary embodiment, the first determining module includes: a first determining unit, configured to input the N frames of images to be processed into a determined disparity map network model, and obtain a disparity map of the images to be processed output by the disparity map network model, where the disparity map network model is determined based on a deep learning stereo matching network; a second determining unit, configured to perform instance segmentation and labeling on the same target object in the N frames of images to be processed by using an instance segmentation network model, so as to obtain an instance segmentation mask of the target object; and the third determining unit is used for calculating the optical flow of the N frames of images to be processed by utilizing the optical flow estimation network model to obtain an optical flow graph of the N frames of images to be processed.
In an exemplary embodiment, the second determining module includes: a fourth determining unit, configured to construct an RT-based energy function using the disparity map, the example segmentation mask, and the optical flow map to obtain an RT energy function; a fifth determining unit, configured to adjust the RT energy function to obtain a target RT; and a sixth determining unit, configured to adjust the scene stream of the N frames of images to be processed by using the target RT, so as to obtain a target scene stream.
In an exemplary embodiment, the fourth determining unit includes: the first determining subunit is used for determining an initialization RT matrix; a second determining subunit, configured to determine a constraint term of the RT energy function, where the constraint term includes a photometric error constraint term, a rigid fitting constraint term, and an optical flow consistency constraint term, the photometric error constraint term is used to constrain a gray value of the target object, the rigid fitting constraint term is used to constrain a three-dimensional point cloud of the target object, and the optical flow consistency constraint term is used to constrain an optical flow value of the target object; a third determining subunit, configured to determine the RT energy function based on the initialized RT matrix and the constraint term.
In an exemplary embodiment, the fifth determining unit includes: the fourth determining subunit is used for optimizing the RT energy function by using a Gauss-Newton iteration method to obtain an optimized RT energy function; a fifth determining subunit, configured to minimize the optimized RT energy function to obtain the target RT, where minimizing the optimized RT energy function includes: minimizing the photometric error constraint term, the rigid fit constraint term, and the optical flow consistency constraint term.
In an exemplary embodiment, the third determining module includes: a seventh determining unit, configured to determine an instance segmentation mask of the target object from the instance segmentation masks of the N frames of images to be processed; a first calculation unit, configured to calculate the maximum external frame of the target object in the image to be processed based on the instance segmentation mask of the target object; an eighth determining unit, configured to determine a cost incidence matrix of the target object by using the maximum external frame, where the cost incidence matrix is used to represent the incidence cost of the target object in the Kth frame of image to be processed and the (K-1)th frame of image to be processed, and K is a natural number less than or equal to N; and a first association unit, configured to associate data of the target object according to the cost incidence matrix, so as to output the first motion trajectory from the data.
In an exemplary embodiment, the fourth determining module includes: a ninth determining unit, configured to determine a two-dimensional coordinate value of the target object from the first motion trajectory; a tenth determining unit, configured to determine a first three-dimensional space coordinate value of the target object according to the two-dimensional coordinate value, the disparity map of the N frames of images to be processed, and a calibration parameter of the image capturing device; an eleventh determining unit, configured to predict, by using the first three-dimensional spatial coordinate value and a target rotation-translation matrix RT of the target object determined in the target scene stream, a second three-dimensional spatial coordinate value of the target object in a next frame of to-be-processed image; a twelfth determining unit, configured to determine a three-dimensional space barycentric position of the target object by using the second three-dimensional space coordinate value; and a thirteenth determining unit, configured to determine the second motion trajectory from the three-dimensional spatial gravity center position.
According to a further embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, the acquired N frames of images to be processed are subjected to image processing to obtain the processing parameters of the N frames of images to be processed, wherein the processing parameters comprise a parallax image of the N frames of images to be processed, an example segmentation mask of the N frames of images to be processed and an optical flow image of the N frames of images to be processed, the N frames of images to be processed are shot at different angles aiming at the same target object, and N is a natural number greater than or equal to 1; adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow; tracking the target object by using the processing parameters and the target scene flow to obtain a first motion track of the target object; a second motion trajectory of the target object is determined based on the processing parameter and the first motion trajectory. Motion estimation may be performed based on a plurality of motion information. Therefore, the problem that the determination of the motion trail is inaccurate in the related art can be solved, and the effect of accurately determining the motion trail is achieved.
Drawings
Fig. 1 is a block diagram of a hardware structure of a mobile terminal of a method for determining a motion trajectory according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of determining a motion trajectory according to an embodiment of the present invention;
FIG. 3 is an overall flow diagram according to an embodiment of the invention;
fig. 4 is a block diagram of a motion trajectory determination apparatus according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the operation on the mobile terminal as an example, fig. 1 is a hardware structure block diagram of the mobile terminal of the method for determining a motion trajectory according to the embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, wherein the mobile terminal may further include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the determination method of the motion trajectory in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a method for determining a motion trajectory is provided, and fig. 2 is a flowchart of a method for determining a motion trajectory according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, image processing is carried out on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed, wherein the processing parameters comprise a parallax image of the N frames of images to be processed, an example segmentation mask of the N frames of images to be processed and an optical flow image of the N frames of images to be processed, the N frames of images to be processed are obtained by shooting the same target object at different angles, and N is a natural number greater than or equal to 1;
step S204, adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow;
step S206, tracking the target object by using the processing parameters and the target scene flow to obtain a first motion track of the target object;
in step S208, a second motion trajectory of the target object is determined based on the processing parameter and the first motion trajectory.
The execution subject of the above steps may be a terminal, but is not limited thereto.
Optionally, the present embodiments include, but are not limited to, applications in scenarios where the motion trajectory of a target object is estimated, for example, in smart financial applications, self-service ATM lobby personnel monitoring may improve personal and property safety. Under the intelligent education application scene, student behavior monitoring is of great importance to teaching quality and teaching safety.
Optionally, in this embodiment, the N frames of images to be processed may be acquired by a binocular camera, and the target object includes, but is not limited to, a person. The binocular camera is mounted obliquely so that it can capture the head regions of all people in the scene, and the head region can be regarded as a rigidly moving target.
Optionally, continuous frames of left-eye and right-eye images are acquired by an obliquely mounted binocular camera, so that the monitoring picture can cover the head region of every moving target in the scene; the left and right images are subjected to stereo calibration and stereo rectification to obtain the intrinsic and extrinsic parameters of the cameras and the rectified images. The method comprises the steps of computing a disparity map with a stereo matching network based on deep learning, computing an instance segmentation mask with an instance segmentation network, computing an optical flow map with an optical flow estimation network, and optimizing the scene flow with three constraint terms. Multi-target tracking is then performed according to the optimized scene flow and the instance segmentation to obtain a two-dimensional motion trajectory. Finally, a three-dimensional motion trajectory is calculated from the optimized scene flow. Fig. 3 is the overall flowchart of this embodiment, which mainly includes scene flow optimization based on disparity, optical flow and instance segmentation, multi-target tracking based on instance segmentation and scene flow, and three-dimensional motion estimation of the target.
Through the steps, the acquired N frames of images to be processed are subjected to image processing to obtain processing parameters of the N frames of images to be processed, wherein the processing parameters comprise a parallax image of the N frames of images to be processed, an example segmentation mask of the N frames of images to be processed and an optical flow image of the N frames of images to be processed, the N frames of images to be processed are shot at different angles aiming at the same target object, and N is a natural number greater than or equal to 1; adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow; tracking the target object by using the processing parameters and the target scene flow to obtain a first motion track of the target object; a second motion trajectory of the target object is determined based on the processing parameter and the first motion trajectory. Motion estimation may be performed based on a plurality of motion information. Therefore, the problem that the determination of the motion trail is inaccurate in the related art can be solved, and the effect of accurately determining the motion trail is achieved.
In an exemplary embodiment, the image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed includes:
s1, inputting the N frames of images to be processed into the determined disparity map network model to obtain a disparity map of the images to be processed output by the disparity map network model, wherein the disparity map network model is determined by a stereo matching network based on deep learning;
s2, carrying out instance segmentation and labeling on the same target object in the N frames of images to be processed by using the instance segmentation network model to obtain an instance segmentation mask of the target object;
and S3, calculating the optical flow of the N frames of images to be processed by using the optical flow estimation network model to obtain an optical flow graph of the N frames of images to be processed.
Optionally, in this embodiment, the scene flow represents the dense three-dimensional motion field of each point of the dynamic scene or of a three-dimensional object in the scene. The scene flow can be calculated from binocular stereo images of successive frames. The three-dimensional motion field is typically recovered by disparity estimation and optical flow estimation, and the scene flow represents the instantaneous motion vectors of the three-dimensional scene. The optical flow map can be calculated from multiple frames of binocular images captured by the calibrated binocular camera, and the disparity map can be calculated by stereo matching. Meanwhile, since in this embodiment the scene flow is optimized for each target instance in the scene, instance segmentation is also required.
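Concretely, one common formulation, used here only as an illustration and not as the patent's exact definition, is the following: if P_0(p) is the three-dimensional point reconstructed from the frame-0 disparity at pixel p, and P_1(p + F(p)) is the point reconstructed at the flow-matched pixel in frame 1, then the per-pixel scene flow is approximately

S(p) = P_1(p + F(p)) - P_0(p)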
Optionally, in order to obtain a dense disparity map, this embodiment employs the deep-learning-based stereo matching network AANet. Supervised pre-training is first carried out on the open-source disparity dataset DrivingStereo, and then unlabeled data collected by the binocular camera in the application scene are used for unsupervised training. Finally, a network model that can predict the disparity map in real time is obtained.
Optionally, the optical flow calculation adopts the deep-learning-based optical flow estimation network FlowNet. Supervised pre-training is performed first on the open-source optical flow dataset Sceneflow, and then unsupervised training is performed on the collected dataset using the pre-trained network model.
Optionally, the instance segmentation network employs the CenterMask network. Instance segmentation labeling is first carried out on the data collected in the application scene, and the network is then trained in a supervised manner. In the scene flow optimization process, each head region can be regarded as a rigid body undergoing rigid motion, so the instance labeling of the dataset is mainly the labeling of the head region of each moving target.
In an exemplary embodiment, adjusting a scene stream of N frames of images to be processed using processing parameters to obtain a target scene stream includes:
s1, constructing an energy function based on the RT by using the parallax map, the example segmentation mask and the optical flow map to obtain an RT energy function;
s2, adjusting the RT energy function to obtain a target RT;
and S3, adjusting the scene flow of the N frames of images to be processed by using the target RT to obtain a target scene flow.
Optionally, in this embodiment, the head of each target object may be regarded as an instance undergoing rigid motion. Each rigidly moving instance in the scene has its own rotation-translation matrix (RT matrix), which differs between instances because their motion speeds and directions differ. An RT matrix is computed for each instance to predict the three-dimensional motion of that instance.
Optionally, a mask for each instance (the head of a target object) is calculated by the instance segmentation network, and the disparity and the optical flow of the current frame are then calculated using the disparity estimation network and the optical flow estimation network. An energy function based on the RT matrix is constructed from the instance segmentation result, the disparity map and the optical flow map, and is minimized by an optimization method to obtain the optimal RT matrix. Minimizing the energy function amounts to solving for the optimal RT matrix that makes the energy function smallest.
In an exemplary embodiment, constructing an RT matrix-based energy function of a target object using a disparity map, an instance segmentation mask, and an optical flow map, resulting in an RT energy function, comprises:
s1, determining an initialization RT matrix;
s2, determining a constraint term of the RT energy function, wherein the constraint term comprises a luminosity error constraint term, a rigid fitting constraint term and an optical flow consistency constraint term, the luminosity error constraint term is used for constraining the gray value of the target object, the rigid fitting constraint term is used for constraining the three-dimensional point cloud of the target object, and the optical flow consistency constraint term is used for constraining the optical flow value of the target object;
s3, an RT energy function is determined based on the initialized RT matrix and the constraint terms.
Optionally, in this embodiment, a reasonable initial RT matrix can be calculated using the disparity, the optical flow and the instance segmentation results, and then optimized using Gauss-Newton iteration.
In an exemplary embodiment, adjusting the RT energy function to obtain the target RT comprises:
s1, optimizing the RT energy function by using a Gauss-Newton iteration method to obtain an optimized RT energy function;
and S2, minimizing the optimized RT energy function to obtain the target RT, wherein the minimizing the optimized RT energy function comprises: a photometric error constraint term, a rigid fit constraint term, and an optical flow consistency constraint term are minimized.
Optionally, in the present embodiment, L_0, R_0, L_1, R_1 denote two successive frames of binocular (left/right) image pairs, D_0 and D_1 denote the disparity maps of frames t0 and t1 respectively, F_L and F_R denote the optical flow maps of the left-eye and right-eye images between the two successive frames, an instance segmentation result is computed for the left-eye image of frame t0, and RT denotes the rotation-translation matrix of the target between the previous frame and the next frame.
Optionally, initializing the RT comprises: from the disparity and the instance segmentation result, the three-dimensional coordinates XYZ_t0 and XYZ_t1 of each instance at times t0 and t1 are calculated. The three-dimensional coordinates XYZ_t1 are shifted according to the optical flow to obtain new three-dimensional coordinates XYZ_t1To0_flow. A random subset of XYZ_t0 and XYZ_t1To0_flow is then selected to obtain an initial pose RT; this pose is applied to all points in the instance as an affine transformation to obtain XYZ_t1To0_RT, and the deviation between XYZ_t1To0_flow and XYZ_t1To0_RT is calculated. If the proportion of points in the instance whose deviation is below a set threshold exceeds a set value, the pose estimate is returned; otherwise the initial RT is recalculated.
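A minimal numpy sketch of this initialization, assuming (N, 3) point arrays, a Kabsch/SVD rigid fit, and illustrative values for the subset size, deviation threshold and inlier ratio (none of which are specified in the text):

import numpy as np

def fit_rigid_transform(src, dst):
    # Kabsch/SVD fit: find R, t minimizing ||src @ R.T + t - dst|| for (N, 3) point sets
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def init_rt(xyz_t0, xyz_t1to0_flow, n_samples=8, thresh=0.05, min_inlier_ratio=0.8, max_iters=50):
    # RANSAC-style initialization: fit RT on a random subset, accept it if enough points agree
    n = xyz_t0.shape[0]
    R, t = np.eye(3), np.zeros(3)
    for _ in range(max_iters):
        idx = np.random.choice(n, size=min(n_samples, n), replace=False)
        R, t = fit_rigid_transform(xyz_t0[idx], xyz_t1to0_flow[idx])
        xyz_t1to0_rt = xyz_t0 @ R.T + t
        deviation = np.linalg.norm(xyz_t1to0_rt - xyz_t1to0_flow, axis=1)
        if (deviation < thresh).mean() >= min_inlier_ratio:
            break
    return R, t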
Optionally, determining the energy function constraint terms comprises: for each instance in L_0, the corresponding instance in L_1 should have a similar grey-level distribution, so the grey-value difference of the same instance between L_0 and L_1 should be small. This grey-value difference of each instance can be used as a constraint term, namely the photometric error constraint term, denoted E_photo. For each instance in L_0, the three-dimensional point cloud reconstructed in frame 0 should, after the rotation-translation is applied, fit the three-dimensional point cloud reconstructed in frame 1; from this principle a rigid fitting constraint term can be constructed, denoted E_rigid. The three-dimensional point of a pixel p in an instance of frame 0 is re-projected, through the rotation-translation matrix RT, to the corresponding point on the frame-1 image, denoted p'. The coordinate difference between p' and p is the optical flow value at p; comparing (p' - p) with the optical flow F_L computed by the optical flow estimation network at each instance pixel, an optical flow consistency constraint term can be constructed, denoted E_flow.
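Written out, one formulation consistent with the above description (the exact norms are not given in the text, so these forms are illustrative only) is, for an instance with pixel set Ω, three-dimensional points P_0(p) and P_1(q) reconstructed from the disparity maps of frames 0 and 1, camera projection π(·), and left-eye optical flow F_L:

E_photo = Σ_{p∈Ω} ( I_1(p') - I_0(p) )², with p' = π( RT · P_0(p) )
E_rigid = Σ_{p∈Ω} || RT · P_0(p) - P_1(p + F_L(p)) ||²
E_flow  = Σ_{p∈Ω} || (p' - p) - F_L(p) ||²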
Optionally, optimizing the energy function and the scene flow comprises: the three energy constraint terms, which take the rotation-translation matrix RT as their argument, and the initial RT have been constructed above. The photometric error constraint term, the rigid fitting constraint term and the optical flow consistency constraint term respectively constrain the grey values, the three-dimensional point cloud and the optical flow of each instance in the scene. Taking the three constraint terms together as the energy function and minimizing it by an optimization method yields the motion estimate RT of the corresponding instance. For each instance, the final energy function is as follows:
E = λ_photo · E_photo + λ_rigid · E_rigid + λ_flow · E_flow        formula (1);
where λ_photo, λ_rigid and λ_flow are the weight coefficients of the corresponding constraint terms. The optimization method adopts Gauss-Newton iteration. By minimizing the energy function E_i of each instance i, the optimal rotation-translation matrix RT of the corresponding target instance is calculated.
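A compact sketch of this per-instance optimization, using SciPy's Levenberg-Marquardt solver as a stand-in for the Gauss-Newton iteration described above; the intrinsics matrix K, the variable names and the nearest-neighbour grey-value sampling are illustrative assumptions rather than the patent's implementation:

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def solve_rt(P0, P1_flow, flow_obs, px0, I0_vals, I1, K, rt0,
             w_photo=1.0, w_rigid=1.0, w_flow=1.0):
    # P0, P1_flow : (N, 3) instance points in frame 0 and their flow-matched points in frame 1
    # flow_obs    : (N, 2) optical flow measured at the instance pixels
    # px0         : (N, 2) pixel coordinates of the instance in frame 0
    # I0_vals     : (N,) grey values of those pixels in frame 0; I1: (H, W) grey image of frame 1
    # K           : (3, 3) camera intrinsics; rt0: initial (rotation vector, translation) pair
    H, W = I1.shape

    def project(P):
        q = (K @ P.T).T
        return q[:, :2] / q[:, 2:3]

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        t = x[3:]
        P0_rt = P0 @ R.T + t
        p1 = project(P0_rt)                                      # reprojected pixels in frame 1
        r_rigid = np.sqrt(w_rigid) * (P0_rt - P1_flow).ravel()
        r_flow = np.sqrt(w_flow) * ((p1 - px0) - flow_obs).ravel()
        u = np.clip(np.round(p1[:, 0]).astype(int), 0, W - 1)   # nearest-neighbour sampling
        v = np.clip(np.round(p1[:, 1]).astype(int), 0, H - 1)
        r_photo = np.sqrt(w_photo) * (I1[v, u] - I0_vals)
        return np.concatenate([r_photo, r_rigid, r_flow])

    x0 = np.concatenate(rt0)
    res = least_squares(residuals, x0, method="lm")              # Levenberg-Marquardt step
    return Rotation.from_rotvec(res.x[:3]).as_matrix(), res.x[3:]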
According to the optimized rotation-translation matrix RT, the instance segmentation result and the disparity maps D_0 and D_1, the optimized instance optical flow map F_rigid of each instance is obtained.
In an exemplary embodiment, tracking the target object using the processing parameters and the target scene stream to obtain a first motion trajectory of the target object includes:
s1, determining an example segmentation mask of the target object from the example segmentation masks of the N frames of images to be processed;
s2, calculating the maximum external frame of the target object in the image to be processed based on the example segmentation mask of the target object;
s3, determining a cost incidence matrix of the target object by using the maximum external frame, wherein the cost incidence matrix is used for representing the incidence cost of the target object in the Kth frame of image to be processed and the Kth-1 frame of image to be processed, and K is a natural number less than or equal to N;
and S4, correlating the data of the target object according to the cost incidence matrix to output a first motion trail from the data.
Optionally, in this embodiment, a multi-target tracking idea based on detection is utilized, the example segmentation result is calculated first, then the maximum external frame is calculated according to the mask of each example, then a cost incidence matrix is constructed, data association is performed according to the cost incidence matrix, and finally a target track is output according to the data association result.
Optionally, the association cost matrix is constructed from the head instance masks in the instance segmentation result and the optimized instance optical flow. The circumscribed bounding box and centre coordinate of each head instance are calculated, and then the pairwise target association cost matrix cost_mat between the m targets of frame k-1 and the n targets of frame k is computed. The association cost is composed of the head instance mask intersection-over-union IOU_ins, the bounding box intersection-over-union IOU_box, and the centre-point distance. The smaller the association cost of two targets, the more similar they are.
Optionally, the calculation of the instance mask intersection-over-union IOU_ins uses the optimized instance optical flow F_rigid. First, F_rigid is used to map each instance mask of frame k-1 into frame k, obtaining the warped instance mask instance_warp. Then the intersection-over-union between instance_warp and the instance masks of the n targets of frame k, i.e. IOU_ins, is calculated. The IOU_ins calculated in this way not only uses the optical flow information to restrict the search range of the associated target, but also reduces to some extent the influence of an excessively large IOU_box value caused by occlusion or by targets that are too close together.
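A minimal numpy sketch of this step, assuming boolean instance masks and a dense optical flow field of shape (H, W, 2); the forward warping by rounding each pixel's flow vector is a simplification used only for illustration:

import numpy as np

def warp_mask_with_flow(mask_prev, flow):
    # Move every foreground pixel of the frame k-1 mask along its optical flow vector
    H, W = mask_prev.shape
    warped = np.zeros_like(mask_prev, dtype=bool)
    ys, xs = np.nonzero(mask_prev)
    xs2 = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    ys2 = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    warped[ys2, xs2] = True
    return warped

def mask_iou(a, b):
    # Intersection-over-union of two boolean masks
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0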
Optionally, for two targets to be associated in the previous and current frames, it is first determined whether the distance between their centre points is smaller than T (T is set to 40 pixels, on the assumption that the maximum distance a target moves between two consecutive frames in the image is smaller than 40 pixels); if the centre-point distance is smaller than T, the IOU_ins and IOU_box of the two targets are then computed. In order to improve the robustness of the algorithm, the association cost is calculated by combining IOU_ins and IOU_box together with this centre-distance gate; a sketch of one possible combination is given below.
the smaller the cost, the higher the probability that two objects are the same object. And when the cost is less than the preset threshold value T _ iou, the cost and the threshold value T _ iou are considered as the same target. The threshold value T _ iou is an adjustable parameter and is set to 0.6, so that a good tracking effect can be obtained. And finally outputting a two-dimensional tracking track of the target by multi-target tracking.
In one exemplary embodiment, determining the second motion profile of the target object based on the processing parameter and the first motion profile includes:
s1, determining a two-dimensional coordinate value of the target object from the first motion track;
s2, determining a first three-dimensional space coordinate value of the target object according to the two-dimensional coordinate value, the disparity map of the N frames of images to be processed and the calibration parameters of the camera device;
s3, predicting a second three-dimensional space coordinate value of the target object in the next frame of image to be processed by using the first three-dimensional space coordinate value and a target rotation translation matrix RT of the target object determined in the target scene stream;
s4, determining the three-dimensional space gravity center position of the target object by using the second three-dimensional space coordinate value;
and S5, determining a second motion track from the three-dimensional space gravity center position.
Optionally, the disparity map, the example segmentation map, the rotation-translation matrix RT, and the multi-target tracking two-dimensional tracking trajectory are calculated respectively in the above embodiments. In order to calculate the three-dimensional motion information of the target, the three-dimensional point cloud information of each target instance needs to be calculated, then the three-dimensional position information of the target at the next moment is predicted by using a rotation and translation matrix RT, and the three-dimensional motion of each target can be obtained by combining the two-dimensional motion track tracked by multiple targets.
Optionally, the instance segmentation mask of each target is obtained from the multi-target tracking trajectory, the two-dimensional coordinates [u, v] of the instance are calculated, and the three-dimensional space coordinates [X, Y, Z] of the instance are then calculated from the disparity map and the binocular calibration parameters of the camera. The calculation formula is as follows:
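One standard rectified-stereo triangulation consistent with the symbols defined below (it additionally assumes the focal length f of the rectified left camera, which is not listed among the symbols) is:

Z = f * Baseline / d
X = (u - u0) * Z / f
Y = (v - v0) * Z / f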
where Baseline denotes the baseline distance of the binocular camera, u and v denote the abscissa and ordinate of a pixel in the instance, d denotes the disparity value of the corresponding point, and u0 and v0 denote the abscissa and ordinate of the centre point of the left-eye image. Then the three-dimensional space coordinates [X, Y, Z] of the instance are combined with its rotation-translation matrix RT to predict its three-dimensional space coordinates [X_, Y_, Z_] at the next moment. The calculation formula is as follows:
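Writing the rotation part of RT as R and the translation part as T, this is the usual rigid-body update:

[X_, Y_, Z_]^T = R * [X, Y, Z]^T + T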
The three-dimensional space coordinates [X_, Y_, Z_] of the instance at the next moment are predicted in this way, and finally the three-dimensional space barycentre of the instance is calculated as the tracking trajectory point of the three-dimensional motion; the three-dimensional coordinates of the barycentre are denoted [x, y, z]. The calculation formula is as follows:
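The barycentre is the mean of the instance's predicted three-dimensional points; denoting the predicted coordinates [X_, Y_, Z_] of the i-th point as [X_i, Y_i, Z_i]:

x = (1/N) Σ_{i=1..N} X_i,   y = (1/N) Σ_{i=1..N} Y_i,   z = (1/N) Σ_{i=1..N} Z_i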
where N denotes the total number of three-dimensional points of the instance and i denotes the i-th three-dimensional point. In this way, both the two-dimensional motion trajectory and the three-dimensional motion trajectory of the head region can be obtained through multi-target tracking that combines instance segmentation, the disparity map and the scene flow.
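Putting the three formulas together, a minimal numpy sketch (the focal length f is again an assumption of the rectified pinhole model):

import numpy as np

def predict_3d_track_point(uv, disp, R, T, f, baseline, u0, v0):
    # uv: (N, 2) pixel coordinates of one instance; disp: (N,) disparity values at those pixels
    u, v, d = uv[:, 0], uv[:, 1], np.maximum(disp, 1e-6)   # avoid division by zero
    Z = f * baseline / d
    X = (u - u0) * Z / f
    Y = (v - v0) * Z / f
    P = np.stack([X, Y, Z], axis=1)        # (N, 3) instance points at the current moment
    P_next = P @ R.T + T                   # rigid prediction with the instance's RT
    return P_next.mean(axis=0)             # three-dimensional barycentre = 3-D track point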
In summary, the present embodiment constructs an energy function from three constraint terms, performs a reasonable initialization of RT using the optical flow information, and can then stably output an accurate rotation-translation matrix RT for each instance by minimizing the energy function of that instance. With image acquisition by an obliquely mounted binocular camera, the head regions of all target persons in the monitored area can be captured, and each target head region can be regarded as an instance undergoing rigid motion. The effective depth measurement range of the Kinect used in Patent 1 is limited, so targets at a long distance are difficult to detect effectively; the binocular camera has a longer effective detection distance than the Kinect. Detecting the target head region with an instance-segmentation-based approach can effectively distinguish targets that are close to each other or side by side, and is not prone to false detections for targets at close range.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a motion trajectory is further provided, where the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 4 is a block diagram of a device for determining a motion trajectory according to an embodiment of the present invention, as shown in fig. 4, the device includes:
the first determining module 42 is configured to perform image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed, where the processing parameters include a disparity map of the N frames of images to be processed, an example segmentation mask of the N frames of images to be processed, and an optical flow map of the N frames of images to be processed, the N frames of images to be processed are obtained by shooting the same target object at different angles, and N is a natural number greater than or equal to 1;
the second determining module 44 is configured to adjust the scene stream of the N frames of images to be processed by using the processing parameter, so as to obtain a target scene stream;
a third determining module 46, configured to track the target object by using the processing parameters and the target scene stream, to obtain a first motion trajectory of the target object;
a fourth determining module 48, configured to determine a second motion trajectory of the target object based on the processing parameter and the first motion trajectory.
In an exemplary embodiment, the first determining module includes:
a first determining unit, configured to input the N frames of images to be processed into a determined disparity map network model, and obtain a disparity map of the images to be processed output by the disparity map network model, where the disparity map network model is determined based on a deep learning stereo matching network;
a second determining unit, configured to perform instance segmentation and labeling on the same target object in the N frames of images to be processed by using an instance segmentation network model, so as to obtain an instance segmentation mask of the target object;
and the third determining unit is used for calculating the optical flow of the N frames of images to be processed by utilizing the optical flow estimation network model to obtain an optical flow graph of the N frames of images to be processed.
In an exemplary embodiment, the second determining module includes:
a fourth determining unit, configured to construct an RT-based energy function using the disparity map, the example segmentation mask, and the optical flow map to obtain an RT energy function;
a fifth determining unit, configured to adjust the RT energy function to obtain a target RT;
and a sixth determining unit, configured to adjust the scene stream of the N frames of images to be processed by using the target RT, so as to obtain a target scene stream.
In an exemplary embodiment, the fourth determining unit includes:
the first determining subunit is used for determining an initialization RT matrix;
a second determining subunit, configured to determine a constraint term of the RT energy function, where the constraint term includes a photometric error constraint term, a rigid fitting constraint term, and an optical flow consistency constraint term, the photometric error constraint term is used to constrain a gray value of the target object, the rigid fitting constraint term is used to constrain a three-dimensional point cloud of the target object, and the optical flow consistency constraint term is used to constrain an optical flow value of the target object;
a third determining subunit, configured to determine the RT energy function based on the initialized RT matrix and the constraint term.
In an exemplary embodiment, the fifth determining unit includes:
the fourth determining subunit is used for optimizing the RT energy function by using a Gauss-Newton iteration method to obtain an optimized RT energy function;
a fifth determining subunit, configured to minimize the optimized RT energy function to obtain the target RT, where minimizing the optimized RT energy function includes: minimizing the photometric error constraint term, the rigid fit constraint term, and the optical flow consistency constraint term.
In an exemplary embodiment, the third determining module includes:
a seventh determining unit, configured to determine an instance segmentation mask of the target object from the instance segmentation masks of the N frames of images to be processed;
the first calculation unit is used for calculating the maximum external frame of the target object in the image to be processed based on the example segmentation mask of the target object;
an eighth determining unit, configured to determine a cost incidence matrix of the target object by using the maximum external frame, where the cost incidence matrix is used to represent the incidence cost of the target object in the Kth frame of to-be-processed image and the (K-1)th frame of to-be-processed image, and K is a natural number less than or equal to N;
and a first association unit, configured to associate data of the target object according to the cost association matrix, so as to output the first motion trajectory from the data.
In an exemplary embodiment, the fourth determining module includes:
a ninth determining unit, configured to determine a two-dimensional coordinate value of the target object from the first motion trajectory;
a tenth determining unit, configured to determine a first three-dimensional space coordinate value of the target object according to the two-dimensional coordinate value, the disparity map of the N frames of images to be processed, and a calibration parameter of the image capturing device;
an eleventh determining unit, configured to predict, by using the first three-dimensional spatial coordinate value and a target rotation-translation matrix RT of the target object determined in the target scene stream, a second three-dimensional spatial coordinate value of the target object in a next frame of to-be-processed image;
a twelfth determining unit configured to determine a three-dimensional space barycentric position of the target object by using the second three-dimensional space coordinate value;
a thirteenth determining unit, configured to determine the second motion trajectory from the three-dimensional spatial gravity center position.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for determining a motion trajectory is characterized by comprising the following steps:
performing image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed, wherein the processing parameters comprise a disparity map of the N frames of images to be processed, an example segmentation mask of the N frames of images to be processed and an optical flow map of the N frames of images to be processed, the N frames of images to be processed are obtained by shooting the same target object at different angles, and N is a natural number greater than or equal to 1;
adjusting the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow;
tracking the target object by using the processing parameters and the target scene stream to obtain a first motion track of the target object;
determining a second motion trajectory of the target object based on the processing parameter and the first motion trajectory.
2. The method according to claim 1, wherein performing image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed comprises:
inputting the N frames of images to be processed into a determined disparity map network model to obtain a disparity map of the images to be processed output by the disparity map network model, wherein the disparity map network model is determined based on a deep learning stereo matching network;
carrying out instance segmentation and labeling on the same target object in the N frames of images to be processed by using an instance segmentation network model to obtain an instance segmentation mask of the target object;
and calculating the optical flow of the N frames of images to be processed by using an optical flow estimation network model to obtain an optical flow graph of the N frames of images to be processed.
3. The method of claim 1, wherein adjusting the scene stream of the N frames of images to be processed using the processing parameters to obtain a target scene stream comprises:
constructing an energy function based on a rotational translation matrix RT by using the disparity map, the example segmentation mask and the optical flow map to obtain an RT energy function;
adjusting the RT energy function to obtain a target RT;
and adjusting the scene flow of the N frames of images to be processed by using the target RT to obtain the target scene flow.
4. The method of claim 3, wherein constructing an energy function of the target object based on an RT matrix using the disparity map, the instance segmentation mask, and the light flow map, resulting in an RT energy function, comprises:
determining an initialization RT matrix;
determining a constraint term of the RT energy function, wherein the constraint term comprises a luminosity error constraint term, a rigid fitting constraint term and an optical flow consistency constraint term, the luminosity error constraint term is used for constraining the gray value of the target object, the rigid fitting constraint term is used for constraining the three-dimensional point cloud of the target object, and the optical flow consistency constraint term is used for constraining the optical flow value of the target object;
determining the RT energy function based on the initialized RT matrix and the constraint term.
5. The method of claim 4, wherein adjusting the RT energy function to obtain a target RT comprises:
optimizing the RT energy function by using a Gauss-Newton iteration method to obtain an optimized RT energy function;
minimizing the optimized RT energy function to obtain the target RT, wherein minimizing the optimized RT energy function comprises: minimizing the photometric error constraint term, the rigid fit constraint term, and the optical flow consistency constraint term.
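Claim 5 minimizes the RT energy function with Gauss-Newton iteration. The sketch below is a simplified stand-in that optimizes only a rigid-fitting residual with SciPy's Levenberg-Marquardt solver (a damped Gauss-Newton method); the function names and the 6-vector RT parameterization are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def rt_residuals(rt6, pts_prev, pts_curr):
    """Rigid-fitting residuals only, as a simplified stand-in for the full
    energy of claim 4. rt6 = (rotation vector, translation) as a 6-vector."""
    R = Rotation.from_rotvec(rt6[:3]).as_matrix()
    t = rt6[3:]
    return ((pts_prev @ R.T + t) - pts_curr).ravel()

def estimate_target_rt(pts_prev, pts_curr):
    """Minimize the residuals with a damped Gauss-Newton (Levenberg-Marquardt) solver."""
    rt0 = np.zeros(6)  # initialization RT (claim 4: "determining an initialization RT matrix")
    result = least_squares(rt_residuals, rt0, args=(pts_prev, pts_curr), method="lm")
    return result.x    # optimized rotation vector and translation, i.e. the target RT
```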
6. The method of claim 1, wherein tracking the target object using the processing parameters and the target scene flow to obtain a first motion trajectory of the target object comprises:
determining an instance segmentation mask of the target object from the instance segmentation masks of the N frames of images to be processed;
calculating the maximum bounding box of the target object in the image to be processed based on the instance segmentation mask of the target object;
determining a cost association matrix of the target object by using the maximum bounding box, wherein the cost association matrix is used for representing the association cost of the target object between the Kth frame of image to be processed and the (K-1)th frame of image to be processed, and K is a natural number less than or equal to N;
and performing data association on the target object according to the cost association matrix to output the first motion trajectory.
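Claim 6 associates the target object across frames through a cost association matrix built from the maximum bounding boxes. The claim fixes neither the cost metric nor the assignment solver; the sketch below assumes an IoU-based cost and the Hungarian algorithm (scipy.optimize.linear_sum_assignment) purely for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bbox_from_mask(mask):
    """Maximum bounding box (x1, y1, x2, y2) of one instance segmentation mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def iou(a, b):
    """Intersection-over-union of two boxes; an assumed choice of association cost."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(prev_boxes, curr_boxes):
    """Build the cost association matrix between frame K-1 and frame K and
    solve the assignment with the Hungarian algorithm."""
    cost = np.array([[1.0 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))  # matched (previous index, current index) pairs
```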
7. The method of claim 1, wherein determining a second motion trajectory of the target object based on the processing parameters and the first motion trajectory comprises:
determining a two-dimensional coordinate value of the target object from the first motion track;
determining a first three-dimensional space coordinate value of the target object according to the two-dimensional coordinate value, the disparity map of the N frames of images to be processed and the calibration parameters of the camera equipment;
predicting a second three-dimensional space coordinate value of the target object in the next frame of image to be processed by using the first three-dimensional space coordinate value and the target rotation-translation matrix RT of the target object determined from the target scene flow;
determining the three-dimensional space gravity center position of the target object by using the second three-dimensional space coordinate value;
determining the second motion trajectory from the three-dimensional spatial center of gravity location.
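Claim 7 lifts the 2D trajectory to 3D using the disparity map, the camera calibration parameters and the target RT. A minimal sketch of that geometry is given below, assuming a rectified stereo pinhole model with focal lengths fx, fy, principal point (cx, cy) and baseline; these parameter names and function signatures are assumptions, not the patent's own notation.

```python
import numpy as np

def backproject(u, v, disparity, fx, fy, cx, cy, baseline):
    """First 3D coordinate of the target from its 2D position, the disparity
    map and the stereo calibration parameters (assumes a valid, non-zero disparity)."""
    d = disparity[int(v), int(u)]
    z = fx * baseline / d          # depth from stereo disparity
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def predict_next_position(p3d, R, t):
    """Second 3D coordinate predicted for the next frame using the target RT."""
    return R @ p3d + t

def center_of_gravity(points_3d):
    """Three-dimensional centre of gravity of the target's predicted points;
    the sequence of these centres forms the second motion trajectory."""
    return np.mean(points_3d, axis=0)
```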
8. An apparatus for determining a motion trajectory, comprising:
a first determination module, configured to perform image processing on the acquired N frames of images to be processed to obtain processing parameters of the N frames of images to be processed, wherein the processing parameters comprise a disparity map of the N frames of images to be processed, an instance segmentation mask of the N frames of images to be processed and an optical flow map of the N frames of images to be processed, the N frames of images to be processed are obtained by shooting the same target object from different angles, and N is a natural number greater than or equal to 1;
a second determination module, configured to adjust the scene flow of the N frames of images to be processed by using the processing parameters to obtain a target scene flow;
a third determination module, configured to track the target object by using the processing parameters and the target scene flow, to obtain a first motion trajectory of the target object;
a fourth determination module, configured to determine a second motion trajectory of the target object based on the processing parameters and the first motion trajectory.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any one of claims 1 to 7 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011044996.8A CN112184757B (en) | 2020-09-28 | 2020-09-28 | Method and device for determining motion trail, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112184757A (en) | 2021-01-05
CN112184757B CN112184757B (en) | 2024-09-06 |
Family
ID=73945701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011044996.8A Active CN112184757B (en) | 2020-09-28 | 2020-09-28 | Method and device for determining motion trail, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112184757B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130127988A1 (en) * | 2011-11-17 | 2013-05-23 | Sen Wang | Modifying the viewpoint of a digital image |
CN110533687A (en) * | 2018-05-11 | 2019-12-03 | 深眸科技(深圳)有限公司 | Multiple target three-dimensional track tracking and device |
US20200160537A1 (en) * | 2018-11-16 | 2020-05-21 | Uber Technologies, Inc. | Deep Structured Scene Flow for Autonomous Devices |
CN110047093A (en) * | 2019-04-23 | 2019-07-23 | 南昌航空大学 | Edge-protected type RGBD scene flows estimation method in high precision |
CN111247557A (en) * | 2019-04-23 | 2020-06-05 | 深圳市大疆创新科技有限公司 | Method and system for detecting moving target object and movable platform |
CN110796686A (en) * | 2019-10-29 | 2020-02-14 | 浙江大华技术股份有限公司 | Target tracking method and device and storage device |
CN111369629A (en) * | 2019-12-27 | 2020-07-03 | 浙江万里学院 | A ball return trajectory prediction method based on binocular visual perception of swinging and hitting action |
CN111652900A (en) * | 2020-05-29 | 2020-09-11 | 浙江大华技术股份有限公司 | Scene flow-based passenger flow counting method, system, equipment and storage device |
Non-Patent Citations (2)
Title |
---|
REN, WQ et al.: "Video Deblurring via Semantic Segmentation and Pixel-Wise Non-Linear Kernel", 2017 IEEE International Conference on Computer Vision (ICCV), 1 January 2017 (2017-01-01) *
GE Liyue; ZHU Lingling; ZHANG Congxuan; CHEN Zhen: "3D Scene Flow Estimation Based on Hierarchical Segmentation with Depth Image Optimization", Journal of Nanchang Hangkong University (Natural Sciences), no. 02, 15 June 2018 (2018-06-15) *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112614573A (en) * | 2021-01-27 | 2021-04-06 | 北京小白世纪网络科技有限公司 | Deep learning model training method and device based on pathological image labeling tool |
CN112927127A (en) * | 2021-03-11 | 2021-06-08 | 华南理工大学 | Video privacy data fuzzification method running on edge device |
CN113158904A (en) * | 2021-04-23 | 2021-07-23 | 天津大学 | Twin network target tracking method and device based on double-mask template updating |
CN113469137A (en) * | 2021-07-28 | 2021-10-01 | 浙江大华技术股份有限公司 | Abnormal behavior recognition method and device, storage medium and electronic device |
CN113807185B (en) * | 2021-08-18 | 2024-02-27 | 苏州涟漪信息科技有限公司 | Data processing method and device |
CN113807185A (en) * | 2021-08-18 | 2021-12-17 | 苏州涟漪信息科技有限公司 | Data processing method and device |
CN114049396A (en) * | 2021-11-05 | 2022-02-15 | 北京百度网讯科技有限公司 | Method and device for marking training image and tracking target, electronic equipment and medium |
CN114202562A (en) * | 2021-12-07 | 2022-03-18 | 北京市商汤科技开发有限公司 | Video processing method and device, electronic equipment and storage medium |
WO2023103294A1 (en) * | 2021-12-07 | 2023-06-15 | 上海商汤智能科技有限公司 | Video processing method and apparatus, electronic device, storage medium, and computer program product |
CN114693702A (en) * | 2022-03-24 | 2022-07-01 | 小米汽车科技有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN118334098A (en) * | 2024-04-19 | 2024-07-12 | 烟台大学 | Multi-target tracking method, system, device and storage medium based on deep association |
CN118334098B (en) * | 2024-04-19 | 2025-01-28 | 烟台大学 | Multi-target tracking method, system, device and storage medium based on deep association |
CN118537929A (en) * | 2024-07-25 | 2024-08-23 | 浙江大华技术股份有限公司 | Object behavior analysis method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112184757B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184757B (en) | Method and device for determining motion trail, storage medium and electronic device | |
US9959455B2 (en) | System and method for face recognition using three dimensions | |
CN110163953B (en) | Three-dimensional face reconstruction method and device, storage medium and electronic device | |
US10334168B2 (en) | Threshold determination in a RANSAC algorithm | |
US11210804B2 (en) | Methods, devices and computer program products for global bundle adjustment of 3D images | |
CN110782483B (en) | Multi-view and multi-target tracking method and system based on distributed camera network | |
CN107045631B (en) | Method, device and equipment for detecting human face characteristic points | |
CN109102547A (en) | Robot based on object identification deep learning model grabs position and orientation estimation method | |
Ait Jellal et al. | LS-ELAS: Line segment based efficient large scale stereo matching | |
JP7272024B2 (en) | Object tracking device, monitoring system and object tracking method | |
JP5833507B2 (en) | Image processing device | |
CN114170290A (en) | Image processing method and related equipment | |
CN107560592A (en) | A kind of precision ranging method for optronic tracker linkage target | |
CN109636828A (en) | Object tracking methods and device based on video image | |
JP2020149641A (en) | Object tracking device and object tracking method | |
US20230419507A1 (en) | Systems and methods for motion estimation and view prediction | |
WO2021084972A1 (en) | Object tracking device and object tracking method | |
Führ et al. | Camera self-calibration based on nonlinear optimization and applications in surveillance systems | |
WO2019045722A1 (en) | Methods, devices and computer program products for 3d mapping and pose estimation of 3d images | |
EP3185212A1 (en) | Dynamic particle filter parameterization | |
CN109740659A (en) | A kind of image matching method and device, electronic equipment, storage medium | |
CN117132649A (en) | Ship video positioning method and device for artificial intelligent Beidou satellite navigation fusion | |
Roy et al. | On triangulation as a form of self-supervision for 3D human pose estimation | |
Zhou et al. | Information-efficient 3-D visual SLAM for unstructured domains | |
CN113847907B (en) | Positioning method and device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||