CN116912310B - Camera pose estimation method, device, computer equipment and medium
- Publication number: CN116912310B
- Application number: CN202310861071.XA
- Authority: CN (China)
- Prior art keywords: pose information, predicted, preset time points
- Legal status: Active
Classifications
- G - Physics; G06 - Computing, calculating or counting; G06T - Image data processing or generation, in general
  - G06T7/00 - Image analysis; G06T7/70 - Determining position or orientation of objects or cameras
  - G06T2207/00 - Indexing scheme for image analysis or image enhancement; G06T2207/10 - Image acquisition modality; G06T2207/10004 - Still image, photographic image; G06T2207/10012 - Stereo images
- Y - General tagging of new technological developments; Y02 - Technologies or applications for mitigation or adaptation against climate change; Y02T - Climate change mitigation technologies related to transportation
  - Y02T10/00 - Road transport of goods or passengers; Y02T10/10 - Internal combustion engine [ICE] based vehicles; Y02T10/40 - Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Length Measuring Devices By Optical Means (AREA)
Abstract
The invention relates to the technical field of visual positioning, and in particular to a camera pose estimation method, apparatus, computer device and medium. The method includes: calculating predicted pose information of a target camera at N preset time points from the images it acquires at those time points; calculating three-dimensional coordinate information of the road mark point from all of the acquired images and the predicted pose information; for the positioning data acquired at any one of the N time points, mapping the number of signal sources of that positioning data to a confidence; determining reprojection factors and ranging factors from the three-dimensional coordinate information, the N pieces of predicted pose information, and the N pieces of positioning data with their confidences; and constructing and solving a factor graph to obtain estimated pose information. The scale information of the visual odometer is determined from the positioning data and the confidence of high-noise positioning data is modeled, which effectively compensates for positioning errors and improves the accuracy of camera pose estimation by the visual odometer.
Description
Technical Field
The present invention relates to the field of visual positioning technologies, and in particular, to a method and apparatus for estimating a pose of a camera, a computer device, and a medium.
Background
Currently, visual simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) technology has been widely applied to tasks such as robot control, automatic driving, Virtual Reality (VR) and Augmented Reality (AR): it localizes the robot itself in real time in an unknown environment while building a three-dimensional map of that environment.
In order to better estimate the pose information of a robot, existing methods generally use a visual odometer, i.e., a technology that acquires scene information with a camera and computes the pose change of the camera between consecutive frames from that scene information. However, a monocular camera cannot recover motion information at its real scale, and even if the scene scale is supplemented by a depth or binocular camera, accumulated errors in long-term operation are unavoidable. The stability of visual odometry also depends heavily on reliable feature points in the scene; in particular, for mobile device positioning in indoor unstructured environments and outdoor complex environments, the accuracy and density of feature points cannot be fully guaranteed. In such cases, a reliable incremental positioning method, such as the Global Positioning System (GPS), can well compensate for the deficiencies of the visual odometer.
Because the Global Positioning System has high global consistency, using it for positioning can effectively remove the drift and error of the visual odometry algorithm and improve positioning precision, accuracy and stability. However, the accuracy of the GPS data itself cannot be guaranteed, and once erroneous data are used, the calculation of the whole visual odometer is likely to fail. Therefore, how to use high-noise positioning data to assist in improving the accuracy of visual odometer calculation is a problem to be solved.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a camera pose estimation method, apparatus, computer device and medium, so as to solve the problem of low accuracy when high-noise positioning data is used to assist the calculation of the visual odometer.
In a first aspect, an embodiment of the present invention provides a method for estimating a pose of a camera, where the method for estimating a pose of a camera includes:
acquiring images obtained by the target camera acquiring the road mark point at N preset time points respectively during its motion, and positioning data acquired by the target camera through a positioning sensor at the N preset time points respectively;
calculating predicted pose information of the target camera at the N preset time points according to all the acquired images, and calculating three-dimensional coordinate information of the road mark point according to all the acquired images and the predicted pose information;
for any positioning data, acquiring the number of signal sources of the positioning data, and mapping the number of signal sources to the confidence corresponding to the positioning data by using a mapping table, wherein the mapping table comprises a mapping relation between the number of signal sources and the confidence;
calculating a reprojection factor corresponding to each piece of predicted pose information according to the three-dimensional coordinate information and the N pieces of predicted pose information, and calculating a ranging factor of the target camera between every two adjacent preset time points according to all the positioning data and their confidences;
and constructing a factor graph by taking the three-dimensional coordinate information and the N pieces of predicted pose information as state variables and the N-1 ranging factors and N reprojection factors as observation variables, and solving the state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information.
In a second aspect, an embodiment of the present invention provides a camera pose estimation apparatus, including:
The data acquisition module is used for acquiring images obtained by the target camera acquiring the road mark point at N preset time points respectively during its motion, and positioning data acquired by the target camera through a positioning sensor at the N preset time points respectively;
The pose calculation module is used for calculating and obtaining predicted pose information of the target camera at N preset time points according to all acquired images, and calculating and obtaining three-dimensional coordinate information of the road mark points according to all acquired images and the predicted pose information;
The confidence level mapping module is used for acquiring the number of signal sources of the positioning data aiming at any positioning data, mapping the number of the signal sources into the confidence level corresponding to the positioning data by using a mapping table, wherein the mapping table comprises the mapping relation between the number of the signal sources and the confidence level;
the factor calculation module is used for calculating a re-projection factor corresponding to the predicted pose information according to the three-dimensional coordinate information and the N predicted pose information, and calculating a ranging factor of the target camera between every two adjacent preset time points according to all positioning data and the confidence degrees of the positioning data;
and the factor graph calculation module is used for constructing a factor graph by taking the three-dimensional coordinate information and N pieces of predicted pose information as state variables and taking N-1 ranging factors and N reprojection factors as observation variables, and calculating the state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information.
In a third aspect, an embodiment of the present invention provides a computer device, the computer device including a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the camera pose estimation method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program, which when executed by a processor implements the camera pose estimation method according to the first aspect.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
Images obtained by the target camera acquiring the road mark point at N preset time points during its motion, and positioning data acquired by the target camera through a positioning sensor at the N preset time points, are acquired. Predicted pose information of the target camera at the N preset time points is calculated from all the acquired images, and three-dimensional coordinate information of the road mark point is calculated from all the acquired images and the predicted pose information. For any positioning data, the number of signal sources of the positioning data is acquired and mapped to the confidence of the corresponding positioning data by using a mapping table, the mapping table comprising a mapping relation between the number of signal sources and the confidence. A reprojection factor corresponding to each piece of predicted pose information is calculated from the three-dimensional coordinate information and the N pieces of predicted pose information, and a ranging factor of the target camera between every two adjacent preset time points is calculated from all the positioning data and their confidences. A factor graph is then constructed by taking the three-dimensional coordinate information and the N pieces of predicted pose information as state variables and the N-1 ranging factors and N reprojection factors as observation variables, and the state variables in the factor graph are solved to obtain estimated pose information corresponding to the N pieces of predicted pose information. The scale information of the visual odometer is determined through the positioning data, and the displacement of the target camera between adjacent preset time points is constrained so as to compensate positioning errors and reduce odometer drift. When the positioning data contain high noise, the confidence of the positioning data is modeled, which avoids adverse effects of erroneous positioning data on the visual odometer solving process and thereby improves the accuracy of the visual odometer's estimation of the camera pose.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application environment of a camera pose estimation method according to a first embodiment of the present invention;
fig. 2 is a flow chart of a camera pose estimation method according to a first embodiment of the present invention;
fig. 3 is a schematic structural diagram of a camera pose estimation device according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The camera pose estimation method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1, in which a client communicates with a server. The client includes, but is not limited to, a palmtop computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud terminal device, a personal digital assistant (PDA) and other computer devices. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
The client and the server may be deployed in a visual positioning scene. In general, a visual positioning task is carried out by a mobile robot equipped with a camera; the client may be deployed inside the mobile robot, and the server and the client communicate wirelessly. The mobile robot collects scene information in consecutive frames through the camera and sends it to the server, and the server estimates the pose of the mobile robot based on the collected scene information. Since the transformation between the pose of the mobile robot and the pose of the camera it carries is known, this amounts to estimating the camera pose from the collected scene information, thereby realizing the function of a visual odometer, which can be applied to tasks such as robot control, automatic driving, Virtual Reality (VR) and Augmented Reality (AR). Taking the robot control scene as an example, it may include functions such as intelligent navigation and intelligent addressing of the robot: the current position of the mobile robot is located in time so as to guide the robot in subsequent route planning.
Referring to fig. 2, which is a schematic flow chart of a camera pose estimation method according to an embodiment of the present invention, the camera pose estimation method may be applied to a server in fig. 1, where the server is connected to a client to obtain an acquired image, positioning data and the number of signal sources thereof respectively acquired by the client in consecutive frames, and the server has a computing capability and is capable of performing camera pose estimation according to the acquired image, constructing a factor graph and resolving, and as shown in fig. 2, the camera pose estimation method may include the following steps:
step S201, acquiring acquisition images obtained by respectively acquiring the road mark points at N preset time points by the target camera in the movement process, and positioning data acquired by the target camera through the positioning sensors at N preset time points.
The target camera may be a camera that needs pose estimation, and the target camera may be mounted on a movable robot or a movable track, that is, the target camera is in a moving process, and the preset time point may be a plurality of time points meeting continuous sampling conditions.
The road mark point may refer to a distinctive and salient marker point in the scene; for example, in an automatic driving scene it may be a corner point of a lane line, and in a robot control scene it may be a corner point of a specific object. The acquired image refers to an image containing the road mark point collected by the target camera at the corresponding preset time point. The positioning sensor may refer to a global positioning system, and the positioning data may generally include the acquisition time, latitude, longitude, azimuth, magnetic declination, etc. It should be noted that the specific content of the positioning data is not limited here; an implementer may select any one or more attributes that can be obtained directly from the sensor to form the positioning data, which is within the scope of the present invention.
Specifically, the target camera may be a monocular camera, and since pose estimation needs to be performed on the target camera, the target camera needs to be in a moving process, so that pose estimation of the target camera can be performed according to a triangulation method according to collected images collected by the target cameras with different poses, meanwhile, in order to ensure accuracy of pose estimation performed by the target camera, sampling moments of the target camera on the landmark points and positioning data should be continuous and compact, for example, a time difference between any two adjacent preset time points may be constrained to be Δt, Δt should be smaller than a preset time difference threshold, and the time difference threshold may be set to be 1 second.
The step of acquiring the acquired images acquired by the target camera at N preset time points respectively from the landmark points in the motion process and the step of acquiring the positioning data acquired by the target camera at N preset time points respectively through the positioning sensor provides the acquisition information of the target camera on the scene and the positioning information of the sensor, so that the basic scene information and the positioning information of the sensor are provided for the subsequent pose estimation of the target camera according to the acquisition information of the scene, and the accuracy of the pose estimation of the camera is effectively improved.
Step S202, calculating to obtain predicted pose information of the target camera at N preset time points according to all the acquired images, and calculating to obtain three-dimensional coordinate information of the road mark points according to all the acquired images and the predicted pose information.
The predicted pose information may refer to an initial result of pose estimation performed by the target camera at a corresponding preset time point, and the three-dimensional coordinate information may refer to a result of position estimation performed by the road mark point in a preset three-dimensional space.
Specifically, each preset time point corresponds to one piece of predicted pose information, so there are N pieces of predicted pose information in total. The three-dimensional coordinate information of the road mark point in a preset three-dimensional space can be calculated from the N acquired images and the N pieces of predicted pose information. In this embodiment, the preset three-dimensional space may adopt the world coordinate system, that is, the three-dimensional coordinate information may refer to the estimated position of the road mark point in the world coordinate system; an implementer may adjust the modeling of the preset three-dimensional space according to the actual situation.
Optionally, the number of landmark points is at least five;
calculating predicted pose information of the target camera at the N preset time points according to all the acquired images comprises:
for the two acquired images corresponding to any two adjacent preset time points, determining the image position of any road mark point in each of the two acquired images, and forming a matching pair from the image positions of that road mark point in the two acquired images, thereby obtaining the matching pair corresponding to the road mark point;
According to the epipolar constraint and all matching pairs, calculating to obtain a rotation matrix and a translation vector of the target camera between two adjacent preset time points;
And acquiring initial pose information of the target camera, and acquiring predicted pose information of the target camera corresponding to each of N preset time points according to the initial pose information and the rotation matrix and the translation vector of the target camera between every two adjacent preset time points.
The image position may refer to an image coordinate of a landmark point in the corresponding acquired image, the image coordinate may refer to a position in the image coordinate system of the corresponding acquired image, and one matching pair may include image positions of a single landmark point in two acquired images respectively.
The epipolar constraint can be used to estimate camera pose change information from multiple pairs of matched pixel points between the two acquired images whose pixel coordinates in the pixel coordinate system are known; the rotation matrix describes the rotation component of the camera pose change, and the translation vector describes the translation component.
The initial pose information may refer to the pose information of the target camera before it starts moving, which is treated as a known quantity.
Specifically, since the conversion relationship between the image coordinate system and the pixel coordinate system is known by default, obtaining the image position amounts to obtaining the pixel coordinate of the road mark point in the pixel coordinate system of the acquired image, so that the pose change information of the camera can be estimated through the epipolar constraint.
Suppose the preset time points are t1, t2, ..., tN; the initial pose information then refers to the pose information of the target camera at the preset time point t1. Taking the acquired image I1 corresponding to the preset time point t1 and the acquired image I2 corresponding to the preset time point t2 as an example, let the pixel point corresponding to the road mark point P in the acquired image I1 be p1, and the pixel point corresponding to the road mark point P in the acquired image I2 be p2.
The coordinates of the road mark point P in the preset three-dimensional space are written as [X, Y, Z]^T, and the transformation relationship between the pixel point p1 and the road mark point P can be expressed as p1 = KP, where K denotes the intrinsic (internal reference) matrix of the camera, which is generally known and can be obtained through camera calibration.
The transformation relationship between the pixel point p2 and the road mark point P can be expressed as p2 = K(RP + t), where R denotes the rotation matrix from the pose of the target camera at the preset time point t1 to its pose at the preset time point t2, and t denotes the corresponding translation vector.
Let x1 = K^(-1) p1 and x2 = K^(-1) p2, where x1 and x2 are the coordinates of the pixel points p1 and p2 on the normalized plane. Combining this with p1 = KP and p2 = K(RP + t) gives x2 = R x1 + t. Taking the outer product of both sides with t (noting that t^ t = 0) gives t^ x2 = t^ R x1, where t^ denotes the antisymmetric (skew-symmetric) matrix of t. Multiplying both sides on the left by x2^T gives x2^T t^ x2 = x2^T t^ R x1. Since t^ x2 is perpendicular to both t and x2, the left-hand side x2^T t^ x2 equals 0, and therefore x2^T t^ R x1 = 0. Substituting x1 = K^(-1) p1 and x2 = K^(-1) p2 yields p2^T K^(-T) t^ R K^(-1) p1 = 0, which is the expression of the epipolar constraint.
Accordingly, the essential matrix E can be expressed as E = t^ R and the fundamental matrix F as F = K^(-T) t^ R K^(-1), so that x2^T E x1 = p2^T F p1 = 0. For simplicity of calculation, the essential matrix E is usually estimated from the matching pairs, and R and t are then recovered from E.
Because the translation and the rotation each have three degrees of freedom, the essential matrix nominally has six degrees of freedom; however, since the essential matrix has scale equivalence, that is, it still satisfies the epipolar constraint when multiplied by any non-zero constant, it actually has five degrees of freedom. The computation of the essential matrix therefore requires at least 5 matching pairs, which is why the number of road mark points is limited to at least 5. It should be noted that some road mark points may not be captured during the motion of the target camera, so an implementer may reasonably increase the number of road mark points in practice to ensure that at least five matching pairs exist between the acquired images of any two adjacent preset time points. In general, eight matching pairs can be used to estimate the essential matrix E (the eight-point algorithm): each matching pair yields a linear equation in the entries of E, and the estimation can be carried out by solving the resulting system of linear equations.
After the essential matrix E is obtained, the rotation matrix R and the translation vector t can be obtained by adopting a singular value decomposition mode, the calculation modes of the rotation matrix and the translation vector are not limited, and an operator selects any calculation mode to calculate the rotation matrix and the translation vector, so that the calculation is within the protection scope of the invention.
Because the initial pose information of the target camera is known, the predicted pose information of the target camera at the preset time point t2 can be calculated by combining the initial pose information with the rotation matrix and translation vector between its poses at the preset time points t1 and t2; proceeding in the same way, the predicted pose information of the target camera at each of the N preset time points is obtained.
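As an illustration of this embodiment, the following is a minimal sketch of how the pairwise rotation matrices and translation vectors could be estimated from the matching pairs and chained into predicted pose information, assuming the OpenCV Python bindings (cv2) are available; the function and variable names (predict_poses, matched_points, etc.) are illustrative and do not come from the original disclosure, and for a monocular camera the recovered translation is only defined up to scale.

```python
import numpy as np
import cv2

def predict_poses(matched_points, K, initial_pose):
    """Chain predicted pose information over the N preset time points.

    matched_points: list of N-1 tuples (pts_prev, pts_curr), each an (M, 2)
        array of image positions of road mark points matched between the
        acquired images of two adjacent preset time points (M >= 5).
    K: 3x3 camera intrinsic (internal reference) matrix.
    initial_pose: 4x4 camera-to-world pose at the first preset time point.
    """
    poses = [initial_pose]
    for pts_prev, pts_curr in matched_points:
        # Estimate the essential matrix E from the matching pairs via the
        # epipolar constraint (RANSAC rejects mismatched pairs).
        E, _ = cv2.findEssentialMat(pts_prev, pts_curr, K, method=cv2.RANSAC)
        # Decompose E (internally by SVD) into the rotation matrix R and the
        # translation vector t between the two adjacent preset time points.
        # For a monocular camera, t is recovered only up to scale.
        _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K)
        # Relative transform taking points from the previous frame to the
        # current frame (x2 = R x1 + t), composed onto the previous pose.
        T_rel = np.eye(4)
        T_rel[:3, :3] = R
        T_rel[:3, 3] = t.ravel()
        poses.append(poses[-1] @ np.linalg.inv(T_rel))
    return poses
```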
In this embodiment, pose estimation of the target camera is performed at N preset time points based on the acquired images in combination with initial pose information of the target camera, so as to obtain predicted pose information corresponding to the target camera at the N preset time points, and provide a basis for pose information calculation corresponding to the subsequent target camera at the N preset time points, so that the duration of pose information estimation is effectively shortened, and the efficiency of camera pose estimation is improved.
Optionally, calculating three-dimensional coordinate information of the road mark point according to all acquired images and predicted pose information, including:
For the two acquired images corresponding to any two adjacent preset time points, the three-dimensional coordinate information of the road mark point is calculated by triangulation from the rotation matrix and translation vector of the target camera between the two adjacent preset time points and the image positions of the road mark point in the two acquired images.
The triangulation method can be used for calculating the position information of the road marking point in a preset three-dimensional space, namely the three-dimensional coordinate information of the road marking point.
Specifically, following the above example, x1 and x2 denote the coordinates of the pixel points p1 and p2 on the normalized plane and satisfy s1 x1 = s2 R x2 + t, where s1 represents the depth information of the pixel point p1 and s2 represents the depth information of the pixel point p2.
Multiplying both sides of this formula on the left by x1^ gives s1 x1^ x1 = 0 = s2 x1^ R x2 + x1^ t, from which s2, and then s1, can be calculated. However, because of noise, the estimated R and t generally do not make the right-hand side exactly zero, so in practice the equation s1 x1^ x1 = s2 x1^ R x2 + x1^ t is usually solved by the least square method.
From the obtained depth information, the three-dimensional coordinate information of the road mark point can be obtained. Since this embodiment calculates one piece of three-dimensional coordinate information from the two acquired images of each pair of adjacent preset time points, N-1 pieces of three-dimensional coordinate information are obtained in total; an implementer may determine the final three-dimensional coordinate information from these N-1 pieces by taking the mean, the mode, the median or the like, which is not specifically limited here.
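As a hedged illustration of the triangulation step, the sketch below uses OpenCV's cv2.triangulatePoints, which performs the linear (least-squares) triangulation described above internally; the names are illustrative only.

```python
import numpy as np
import cv2

def triangulate_landmark(K, R, t, uv1, uv2):
    """Triangulate the road mark point from two adjacent acquired images.

    K: 3x3 intrinsic matrix; R, t: rotation matrix and translation vector of
    the target camera between the two adjacent preset time points;
    uv1, uv2: image positions (pixels) of the road mark point in the two images.
    """
    # Projection matrices of the two views: P1 = K [I | 0], P2 = K [R | t].
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, np.asarray(t).reshape(3, 1)])
    pts1 = np.asarray(uv1, dtype=float).reshape(2, 1)
    pts2 = np.asarray(uv2, dtype=float).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4x1 homogeneous point
    return (X_h[:3] / X_h[3]).ravel()                 # three-dimensional coordinate information
```

The N-1 coordinates obtained from all adjacent image pairs can then be combined by the mean, mode or median as described above.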
In the embodiment, three-dimensional coordinate information of the landmark points is estimated by a triangulation method to obtain more accurate space position description of the landmark points, and reference information can be provided for a subsequent pose estimation process, so that the accuracy of pose estimation is improved.
Step S203, for any positioning data, the number of signal sources of the positioning data is obtained, and the mapping table is used to map the number of signal sources to the confidence of the corresponding positioning data.
The mapping table comprises the mapping relation between the number of signal sources and the confidence. The number of signal sources may be the number of searched satellites, that is, the number of positioning satellites searched for the given positioning data, and the confidence may be used to represent the credibility of the positioning data.
Specifically, the more signal sources there are, the more reliable the corresponding positioning data is, that is, the higher its confidence. In this embodiment, the mapping table specifies that if the number of signal sources is greater than or equal to 0 and less than 4, the confidence of the corresponding positioning data is set to 0; if the number of signal sources is greater than or equal to 7, the confidence is set to 1; otherwise, that is, if the number of signal sources is greater than or equal to 4 and less than 7, the confidence is set to 0.7.
In one embodiment, the relationship between the number of signal sources and the confidence level may be determined by a mapping function, e.g., the mapping function may be set to:
wherein x can represent the number of signal sources of positioning data, y can represent the confidence coefficient of the positioning data, and the value range of x is an integer greater than or equal to zero, then through the mapping function, when the number of signal sources x is greater, the confidence coefficient y is closer to 1, and when the number of signal sources x is smaller, the confidence coefficient y is closer to 0.
It should be noted that the choice of the mapping function is not limited here; any other mapping function selected by the implementer that achieves the same effect as the mapping function in the above example is within the scope of the present invention.
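For illustration, the sketch below implements the mapping table of this embodiment (fewer than 4 signal sources maps to 0, 4 to 6 maps to 0.7, 7 or more maps to 1) together with one hypothetical smooth mapping function; the specific mapping function of the example above is not reproduced here, so the smooth variant is only an assumed function with the stated behaviour (confidence approaching 1 as the number of signal sources grows and approaching 0 as it shrinks).

```python
import math

def confidence_from_table(num_sources: int) -> float:
    """Mapping table of this embodiment: number of signal sources -> confidence."""
    if num_sources < 4:
        return 0.0
    if num_sources >= 7:
        return 1.0
    return 0.7

def confidence_from_function(num_sources: int) -> float:
    """Hypothetical smooth mapping with the stated behaviour; any monotonic
    function with the same effect could be chosen by the implementer."""
    return 1.0 - math.exp(-0.4 * num_sources)
```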
Through the steps of acquiring, for any positioning data, the number of signal sources of the positioning data and mapping that number to the confidence of the corresponding positioning data using the mapping table, the confidence of the positioning data can be modeled when the positioning data is highly noisy. This avoids the adverse effect of erroneous positioning data on the whole pose estimation process and the resulting loss of pose estimation accuracy, and thereby improves the accuracy of the subsequent pose estimation.
Step S204, according to the three-dimensional coordinate information and the N pieces of predicted pose information, a re-projection factor corresponding to the predicted pose information is calculated, and according to all positioning data and the confidence degrees thereof, a distance measurement factor of the target camera between every two adjacent preset time points is calculated.
The re-projection factor can be used for describing the relation between the three-dimensional coordinate information and the predicted pose, and the ranging factor can be used for describing the relation between the predicted poses corresponding to the target cameras between two adjacent preset time points.
Through the steps of calculating the reprojection factor corresponding to each piece of predicted pose information according to the three-dimensional coordinate information and the N pieces of predicted pose information, and calculating the ranging factor of the target camera between every two adjacent preset time points according to all the positioning data and their confidences, the camera motion problem is modeled in factor form. This facilitates the solving of the subsequently constructed factor graph, so that this probabilistic modeling approach can be applied in practice to estimating state quantities from observed quantities.
In step S205, three-dimensional coordinate information and N pieces of predicted pose information are used as state variables, N-1 ranging factors and N pieces of reprojection factors are used as observation variables, a factor graph is constructed, and the state variables in the factor graph are solved to obtain estimated pose information corresponding to the N pieces of predicted pose information.
The state variables may include pose variables and landmark variables, the pose variables may refer to N pieces of predicted pose information to be updated, the landmark variables may refer to three-dimensional coordinate information, the observation variables may refer to observation results, the factor graph may refer to a graph representation of a conditional probability product of the observation variables, and the estimated pose information may refer to pose information of the updated target camera.
Optionally, the three-dimensional coordinate information and the N predicted pose information are used as state variables, the N-1 ranging factors and the N re-projection factors are used as observation variables, and the factor graph is constructed, including:
The prior factors are obtained, three-dimensional coordinate information and N pieces of predicted pose information are used as state variables, and N-1 ranging factors and N pieces of reprojection factors are used as observation variables;
And combining the prior factor, the state variable and the observation variable to construct a factor graph.
The pose estimation process of the target camera generally fixes the first frame as the world origin, so the prior factor is added in the factor graph.
In this embodiment, adding the prior factor fixes the solution of the whole factor-graph solving process and avoids the situation in which the state variables to be updated admit multiple numerical solutions, so that the problem becomes numerically solvable and the feasibility of the factor graph calculation is improved.
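As one possible illustration (not the mandated implementation), the sketch below constructs such a factor graph with the GTSAM library, assuming its Python wrapper exposes PriorFactorPose3, GenericProjectionFactorCal3_S2 and RangeFactorPose3; all names and noise values are illustrative.

```python
import gtsam
import numpy as np

def build_factor_graph(predicted_poses, landmark_xyz, image_positions,
                       ranges, range_sigmas, K):
    """Build the factor graph of this embodiment.

    predicted_poses: N 4x4 predicted camera poses (pose state variables).
    landmark_xyz: 3-vector, three-dimensional coordinate information (landmark state variable).
    image_positions: N (u, v) image positions of the road mark point.
    ranges: N-1 reference movement amounts derived from the positioning data.
    range_sigmas: N-1 noise amplitudes derived from the confidences.
    K: gtsam.Cal3_S2 camera intrinsics.
    """
    graph = gtsam.NonlinearFactorGraph()
    initial = gtsam.Values()
    X = lambda i: gtsam.symbol('x', i)     # pose variables
    L0 = gtsam.symbol('l', 0)              # landmark variable

    # Prior factor: fix the first frame as the world origin of the solution.
    prior_noise = gtsam.noiseModel.Isotropic.Sigma(6, 1e-6)
    graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(predicted_poses[0]), prior_noise))

    # N reprojection factors (observation variables).
    pixel_noise = gtsam.noiseModel.Isotropic.Sigma(2, 1.0)
    for i, (u, v) in enumerate(image_positions):
        graph.add(gtsam.GenericProjectionFactorCal3_S2(
            gtsam.Point2(u, v), pixel_noise, X(i), L0, K))

    # N-1 ranging factors between adjacent preset time points (observation variables).
    for i, (r, sigma) in enumerate(zip(ranges, range_sigmas)):
        graph.add(gtsam.RangeFactorPose3(
            X(i), X(i + 1), r, gtsam.noiseModel.Isotropic.Sigma(1, sigma)))

    # Initial values of the state variables (the predicted poses and the landmark).
    for i, T in enumerate(predicted_poses):
        initial.insert(X(i), gtsam.Pose3(T))
    initial.insert(L0, gtsam.Point3(*landmark_xyz))
    return graph, initial
```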
Optionally, the three-dimensional coordinate information and the N predicted pose information are used as state variables, the N-1 ranging factors and the N re-projection factors are used as observation variables, a factor graph is constructed, the state variables in the factor graph are solved, and estimated pose information corresponding to the N predicted pose information is obtained, including:
calculating, according to the three-dimensional coordinate information and the N pieces of predicted pose information, the projection position corresponding to each piece of predicted pose information, and, for any piece of predicted pose information, calculating the distance between the projection position corresponding to that predicted pose information and the image position of the road mark point in the acquired image corresponding to that predicted pose information, to obtain the position distance corresponding to the predicted pose information;
and taking the sum of all the position distances as the reprojection error, and solving the state variables in the factor graph according to the reprojection error to obtain estimated pose information corresponding to the N pieces of predicted pose information.
The projection position may refer to a predicted observation result of a road mark point of known three-dimensional coordinate information under predicted pose information, and the position distance may refer to a pixel point distance of the projection position and the real observation position under an image coordinate system.
The reprojection error can represent the difference between the predicted position of the road marking point and the corresponding real observation position which are respectively calculated based on the pose information, and provide constraint for state variable calculation, so that the state variable calculation result can meet the condition that the reprojection error is as small as possible, namely the difference between the predicted position of the road marking point and the corresponding real observation position which are respectively calculated based on the pose information is as small as possible.
In this embodiment, constraint is provided to the calculation process of the state variable by the reprojection error of the road sign point, so that the projection result of the calculated estimated pose information on the road sign point can be consistent with the image information acquired truly as much as possible, that is, with the actual observation position, thereby ensuring the accuracy of the estimated pose information.
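As a small worked illustration (NumPy only, names are illustrative), the position distance for one piece of predicted pose information can be computed by projecting the road mark point with the pinhole model p = K(RP + t) used above and measuring the pixel distance to the observed image position; the world-to-camera convention for R and t is an assumption of the sketch.

```python
import numpy as np

def position_distance(K, R, t, landmark_xyz, observed_uv):
    """Pixel distance between the projection of the road mark point under one
    predicted pose (R, t: world-to-camera rotation and translation) and its
    observed image position in the corresponding acquired image."""
    p_cam = R @ np.asarray(landmark_xyz) + np.asarray(t)   # point in the camera frame
    p_img = K @ p_cam                                       # pinhole projection
    projected_uv = p_img[:2] / p_img[2]                     # normalize by depth
    return float(np.linalg.norm(projected_uv - np.asarray(observed_uv)))

# Reprojection error: the sum of the position distances over all N predicted poses, e.g.
# reprojection_error = sum(position_distance(K, R_i, t_i, X, uv_i) for i in range(N))
```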
Optionally, after taking the sum of all the position distances as the reprojection error, the method further comprises:
Aiming at any two adjacent preset time points, calculating to obtain a predicted movement amount according to the predicted pose information respectively corresponding to the two adjacent preset time points;
Calculating a reference movement amount according to the positioning data and the confidence coefficient of the positioning data corresponding to the two adjacent preset time points respectively, and calculating a difference value between the predicted movement amount and the reference movement amount to obtain a movement amount difference value between the two adjacent preset time points;
Taking the sum of all the moving amount difference values as a moving error;
Correspondingly, according to the reprojection error, calculating the state variables in the factor graph to obtain estimated pose information corresponding to the N predicted pose information includes:
And according to the reprojection error and the movement error, calculating state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information.
The predicted movement amount may refer to predicted movement information of the target camera calculated according to the predicted pose information corresponding to each of the two adjacent preset time points, the reference movement amount may refer to real movement information of the target camera calculated according to the positioning data corresponding to each of the two adjacent preset time points, and the movement amount difference may refer to a difference between the predicted movement amount and the reference movement amount.
Specifically, in this embodiment, the movement error may represent a difference between the predicted movement information of the target camera calculated based on the respective pose information and the real movement information calculated based on the corresponding positioning data, and the noise information of the reference movement amount is modeled by the confidence degrees of the positioning data corresponding to the two adjacent preset time points, respectively.
For example, if the product of the confidences of the positioning data corresponding to the two adjacent preset time points is 1, the noise amplitude is set to 0.001.
If the product of the confidences of the positioning data corresponding to the two adjacent preset time points is 0.7, the noise amplitude is set to 0.1; if the product is 0.49, the noise amplitude is set to 1.0; and if the product is 0, the noise amplitude is set to 1000. The noise may take a conventional form such as Gaussian noise or salt-and-pepper noise; the form of the noise is not limited in this embodiment, and any noise form selected by an implementer for the noise modeling is within the scope of the present invention.
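The sketch below reproduces the noise-amplitude schedule of this example (confidence product 1 -> 0.001, 0.7 -> 0.1, 0.49 -> 1.0, 0 -> 1000) and shows how one movement amount difference between two adjacent preset time points could be computed, assuming the positioning data have already been converted to metric coordinates; the function names are illustrative.

```python
import numpy as np

def noise_amplitude(conf_a: float, conf_b: float) -> float:
    """Map the product of the confidences of the positioning data at two
    adjacent preset time points to the noise amplitude of the constraint."""
    product = conf_a * conf_b
    if product >= 1.0:
        return 0.001
    if product >= 0.7:
        return 0.1
    if product >= 0.49:
        return 1.0
    return 1000.0

def movement_difference(pose_a, pose_b, loc_a, loc_b):
    """Difference between the predicted movement amount (from two 4x4 predicted
    poses) and the reference movement amount (from the two positioning data,
    given here as metric 3D positions)."""
    predicted = np.linalg.norm(pose_b[:3, 3] - pose_a[:3, 3])
    reference = np.linalg.norm(np.asarray(loc_b) - np.asarray(loc_a))
    return predicted - reference

# Movement error: the sum of the movement amount differences over the N-1 pairs of
# adjacent preset time points; the noise amplitude determines how strongly each
# term constrains the factor graph.
```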
In this embodiment, constraint is provided to the calculation process of the state variable by the movement error of the target camera, so that the movement distance calculated by the calculated estimated pose information can be as consistent as possible with the movement distance calculated based on the positioning data, and the movement distance calculated based on the positioning data is used as the real movement distance, thereby ensuring the accuracy of the estimated pose information.
Optionally, the calculating the state variable in the factor graph to obtain estimated pose information corresponding to the N predicted pose information includes:
And calculating the state variables in the factor graph by adopting a least square method to obtain estimated pose information corresponding to the N pieces of predicted pose information respectively.
In this embodiment, the objective function of the factor graph calculation can be expressed as the product of the factors corresponding to all the observation variables. Since these factors all take a negative exponential form, taking the negative logarithm of the objective function converts the maximization problem into a nonlinear least squares problem, which can then be solved by the least square method.
The least squares solution may be obtained by an iterative method, such as the Gauss-Newton iteration, which is not limited here; any solution method selected by the implementer for the least squares problem is within the scope of the present invention.
It should be noted that the process of solving the state variables is iterative: while the iteration has not terminated, the state variables are updated with the result of each iteration round and the state variables in the factor graph are solved again, until the iteration terminates. The termination condition may be convergence of the reprojection error and/or convergence of the movement error, or a limit on the number of iteration rounds; an implementer may flexibly adjust the termination condition according to the actual situation.
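Continuing the earlier GTSAM sketch purely as an illustration, solving the state variables then reduces to one call to an iterative nonlinear least-squares optimizer (Gauss-Newton or Levenberg-Marquardt), with the iteration limit as one possible termination condition:

```python
import gtsam

def solve_factor_graph(graph: gtsam.NonlinearFactorGraph,
                       initial: gtsam.Values) -> gtsam.Values:
    """Solve the state variables in the factor graph by iterative least squares."""
    params = gtsam.LevenbergMarquardtParams()
    params.setMaxIterations(100)      # iteration-round limit as a termination condition
    optimizer = gtsam.LevenbergMarquardtOptimizer(graph, initial, params)
    # gtsam.GaussNewtonOptimizer could be used in the same way.
    return optimizer.optimize()       # estimated pose information and landmark coordinates
```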
In this embodiment, the factor graph calculation is converted into a least squares solving process, which simplifies the computation and improves the efficiency of the factor graph calculation, that is, the efficiency of camera pose estimation.
The method comprises the steps of taking three-dimensional coordinate information and N pieces of predicted pose information as state variables, taking N-1 ranging factors and N pieces of reprojection factors as observation variables, constructing a factor graph, resolving the state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information, and converting a camera pose estimation task into a factor graph calculation task, so that large-scale optimal estimation is performed in real time, and the efficiency of camera pose estimation is improved.
In the embodiment, the scale information of the visual odometer is determined through the positioning data, the displacement of the target camera between adjacent preset time points is restrained to compensate positioning errors and reduce drift of the odometer, and under the condition that the positioning data contains higher noise, the confidence level of the positioning data is modeled, so that adverse effects of the erroneous positioning data on the visual odometer calculation process are avoided, and the accuracy of estimating the pose of the camera by the visual odometer is improved.
Fig. 3 shows a block diagram of a camera pose estimation device according to a second embodiment of the present invention, where the camera pose estimation device is applied to a server, and the server is connected with a client to obtain an acquisition image, positioning data and the number of signal sources thereof respectively acquired by the client in consecutive frames, and the server has a computing capability and is capable of performing camera pose estimation according to the acquisition image, constructing a factor graph, and resolving, so that only a portion related to the embodiment of the present invention is shown for convenience of explanation.
Referring to fig. 3, the camera pose estimation apparatus includes:
The data acquisition module 31 is configured to acquire images obtained by the target camera acquiring the road mark point at N preset time points respectively during its motion, and positioning data acquired by the target camera through a positioning sensor at the N preset time points respectively;
The pose calculation module 32 is configured to calculate predicted pose information of the target camera at N preset time points according to all the acquired images, and calculate three-dimensional coordinate information of the road mark point according to all the acquired images and the predicted pose information;
The confidence level mapping module 33 is configured to obtain, for any positioning data, a number of signal sources of the positioning data, map the number of signal sources to a confidence level of the corresponding positioning data using a mapping table, where the mapping table includes a mapping relationship between the number of signal sources and the confidence level;
the factor calculation module 34 is configured to calculate a re-projection factor corresponding to the predicted pose information according to the three-dimensional coordinate information and the N predicted pose information, and calculate a ranging factor of the target camera between every two adjacent preset time points according to all positioning data and the confidence degrees thereof;
The factor graph calculation module 35 is configured to construct a factor graph from three-dimensional coordinate information and N predicted pose information as state variables, and from N-1 ranging factors and N reprojection factors as observation variables, and to calculate the state variables in the factor graph, thereby obtaining estimated pose information corresponding to the N predicted pose information.
Optionally, the number of landmark points is at least five;
The above-described pose calculation module 32 includes:
The position matching unit is used for determining the image positions of any road mark point in the two acquired images corresponding to any two adjacent preset time points respectively, and forming a matching pair by the image positions of the road mark point in the two acquired images corresponding to the road mark point respectively to obtain a matching pair of the corresponding road mark point;
the parameter calculation unit is used for calculating and obtaining a rotation matrix and a translation vector of the target camera between two adjacent preset time points according to the epipolar constraint and all matching pairs;
The pose estimation unit is used for acquiring initial pose information of the target camera, and obtaining predicted pose information of the target camera corresponding to N preset time points respectively according to the initial pose information and the rotation matrix and the translation vector of the target camera between every two adjacent preset time points.
Optionally, the pose calculating module 32 includes:
the coordinate calculation unit is used for calculating three-dimensional coordinate information of the road mark point by adopting a triangulation method according to a rotation matrix and a translation vector of the target camera between any two adjacent preset time points and the image positions of the road mark point in the two acquired images, wherein the two acquired images correspond to any two adjacent preset time points respectively.
Optionally, the factor graph calculation module 35 includes:
The prior factor unit is used for acquiring prior factors, taking three-dimensional coordinate information and N pieces of predicted pose information as state variables, and taking N-1 ranging factors and N pieces of re-projection factors as observation variables;
And the joint construction unit is used for combining the prior factors, the state variables and the observed variables to construct a factor graph.
Optionally, the factor graph calculation module 35 further includes:
a distance calculating unit, configured to calculate, according to the three-dimensional coordinate information and the N pieces of predicted pose information, a projection position corresponding to the predicted pose information, and calculate, for any piece of predicted pose information, a distance between the projection position corresponding to the predicted pose information and an image position of a landmark point in an acquired image corresponding to the predicted pose information, to obtain a position distance corresponding to the predicted pose information;
And the first error determining unit is used for calculating the state variables in the factor graph according to the reprojection errors by taking the sum of all the position distances as the reprojection errors to obtain estimated pose information corresponding to the N pieces of predicted pose information.
Optionally, the factor graph calculation module 35 further includes:
The movement amount calculation unit is used for calculating and obtaining a predicted movement amount according to the predicted pose information corresponding to any two adjacent preset time points respectively;
the difference value calculation unit is used for calculating a reference movement amount according to the positioning data and the confidence coefficient thereof corresponding to the two adjacent preset time points respectively, calculating the difference value between the predicted movement amount and the reference movement amount, and obtaining a movement amount difference value between the two corresponding adjacent preset time points;
a second error determination module, configured to take the sum of all the movement amount differences as a movement error;
accordingly, the first error determination unit includes:
and the joint estimation subunit is used for calculating the state variables in the factor graph according to the reprojection error and the movement error to obtain estimated pose information corresponding to the N pieces of predicted pose information.
Optionally, the factor graph calculation module 35 includes:
And the pose estimation unit is used for calculating the state variables in the factor graph by adopting a least square method to obtain estimated pose information corresponding to the N pieces of predicted pose information respectively.
It should be noted that, because the information interaction between the above modules, units and sub-units and their execution processes are based on the same concept as the method embodiments of the present invention, reference may be made to the method embodiment section for their specific functions and technical effects, which are not repeated here.
Fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 4, the computer device of this embodiment includes: at least one processor (only one is shown in fig. 4), a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor, when executing the computer program, implements the steps of any of the camera pose estimation method embodiments described above.
The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 4 is merely an example of a computer device and is not intended to be limiting; a computer device may include more or fewer components than those illustrated, combine some components, or include different components, such as a network interface, a display screen, an input device, etc. In this embodiment, the computer device is communicatively coupled to an external acquisition device that includes an image acquisition device and a sensor; the acquisition device may be deployed on a mobile robot to provide acquired images and sensor data to the computer device.
The processor may be a CPU, or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory includes a readable storage medium, an internal memory, etc. The internal memory may be the memory of the computer device and provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device; in other embodiments it may be an external storage device of the computer device, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the memory may include both an internal storage unit and an external storage device of the computer device. The memory is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that the division into the above functional units and modules is merely illustrated for convenience and brevity of description. In practical applications, the above functions may be assigned to different functional units or modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not used to limit the protection scope of the present invention. For the specific working process of the units and modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, and the computer program may be stored in a computer-readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a U-disk, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
The present invention may also be implemented as a computer program product which, when run on a computer device, causes the computer device to execute the steps of the method embodiments described above, thereby implementing all or part of those steps.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.
Claims (10)
1. A camera pose estimation method, characterized by comprising the following steps:
acquiring acquired images obtained by a target camera capturing road mark points at N preset time points during motion, and positioning data acquired by a positioning sensor at the N preset time points respectively;
calculating to obtain predicted pose information of the target camera at N preset time points according to all acquired images, and calculating to obtain three-dimensional coordinate information of the landmark points according to all acquired images and the predicted pose information;
for any positioning data, acquiring the number of signal sources of the positioning data, and mapping the number of signal sources into a confidence corresponding to the positioning data by using a mapping table, wherein the mapping table comprises a mapping relation between the number of signal sources and the confidence;
According to the three-dimensional coordinate information and N pieces of predicted pose information, calculating to obtain a re-projection factor corresponding to the predicted pose information, and according to all positioning data and the confidence degrees thereof, calculating to obtain a ranging factor of the target camera between every two adjacent preset time points;
and constructing a factor graph by taking the three-dimensional coordinate information and the N pieces of predicted pose information as state variables and taking N-1 ranging factors and N re-projection factors as observation variables, and solving the state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information, wherein the state variables comprise pose variables and landmark point variables, the pose variables refer to the N pieces of predicted pose information, the landmark point variables refer to the three-dimensional coordinate information, the re-projection factors are observation variables used for describing the relationship between the three-dimensional coordinate information and the predicted poses, the ranging factors are observation variables used for describing the relationship between the predicted poses of the target camera respectively corresponding to two adjacent preset time points, the factor graph refers to a graph representation of the product of the conditional probabilities of the observation variables, and the estimated pose information refers to updated pose information of the target camera.
2. The camera pose estimation method according to claim 1, wherein the number of road mark points is at least five;
the calculating to obtain the predicted pose information of the target camera at N preset time points according to all the acquired images includes:
For two acquired images respectively corresponding to any two adjacent preset time points, determining the image positions of any road mark point in the two acquired images respectively, and forming a matching pair from the image positions of the road mark point in the two acquired images, to obtain the matching pair corresponding to the road mark point;
According to the epipolar constraint and all matching pairs, calculating to obtain a rotation matrix and a translation vector of the target camera between the two adjacent preset time points;
and acquiring initial pose information of the target camera, and acquiring predicted pose information of the target camera corresponding to N preset time points respectively according to the initial pose information and a rotation matrix and a translation vector of the target camera between every two adjacent preset time points.
3. The method according to claim 2, wherein the calculating three-dimensional coordinate information of the landmark point according to all acquired images and the predicted pose information includes:
and for two acquired images respectively corresponding to any two adjacent preset time points, calculating the three-dimensional coordinate information of the road mark points by adopting a triangulation method according to the rotation matrix and the translation vector of the target camera between the two adjacent preset time points and the image positions of the road mark points in the two acquired images respectively.
4. The camera pose estimation method according to claim 1, wherein constructing a factor graph from the three-dimensional coordinate information, N pieces of predicted pose information as state variables, and N-1 pieces of ranging factors and N pieces of re-projection factors as observation variables, comprises:
Acquiring prior factors, wherein the three-dimensional coordinate information and N pieces of predicted pose information are used as the state variables, and N-1 ranging factors and N re-projection factors are used as the observation variables;
and combining the prior factor, the state variable and the observed variable to construct the factor graph.
5. The camera pose estimation method according to claim 1, wherein the constructing a factor graph from the three-dimensional coordinate information and N pieces of predicted pose information as state variables and from N-1 pieces of ranging factors and N pieces of re-projection factors as observation variables, and resolving the state variables in the factor graph to obtain N pieces of estimated pose information corresponding to the predicted pose information includes:
Calculating projection positions corresponding to the predicted pose information according to the three-dimensional coordinate information and the N pieces of predicted pose information, and, for any piece of predicted pose information, calculating a distance between the projection position corresponding to the predicted pose information and the image position of the landmark point in the acquired image corresponding to the predicted pose information, to obtain a position distance corresponding to the predicted pose information;
and taking the sum of all the position distances as a reprojection error, and calculating the state variables in the factor graph according to the reprojection error to obtain estimated pose information corresponding to the N pieces of predicted pose information.
6. The camera pose estimation method according to claim 5, further comprising, after said taking the sum of all the position distances as the re-projection error:
For any two adjacent preset time points, calculating a predicted movement amount according to the predicted pose information respectively corresponding to the two adjacent preset time points;
Calculating a reference movement amount according to the positioning data and the confidence coefficient thereof respectively corresponding to the two adjacent preset time points, and calculating a difference value between the predicted movement amount and the reference movement amount to obtain a movement amount difference value corresponding to the two adjacent preset time points;
Taking the sum of all the moving amount difference values as a moving error;
correspondingly, the calculating the state variables in the factor graph according to the reprojection error to obtain estimated pose information corresponding to the N pieces of predicted pose information includes:
And calculating the state variables in the factor graph according to the reprojection error and the movement error to obtain estimated pose information corresponding to the N pieces of predicted pose information.
7. The method according to any one of claims 1 to 6, wherein the calculating the state variables in the factor graph to obtain estimated pose information corresponding to N pieces of predicted pose information includes:
and solving the state variables in the factor graph by adopting a least square method to obtain estimated pose information corresponding to the N pieces of predicted pose information respectively.
8. A camera pose estimation device, characterized in that the camera pose estimation device comprises:
The data acquisition module is used for acquiring acquired images obtained by the target camera capturing the road mark points at N preset time points during motion, and positioning data acquired by a positioning sensor at the N preset time points respectively;
The pose calculation module is used for calculating and obtaining predicted pose information of the target camera at N preset time points according to all acquired images, and calculating and obtaining three-dimensional coordinate information of the road mark points according to all acquired images and the predicted pose information;
The confidence level mapping module is used for acquiring the number of signal sources of the positioning data aiming at any positioning data, mapping the number of the signal sources into the confidence level corresponding to the positioning data by using a mapping table, wherein the mapping table comprises the mapping relation between the number of the signal sources and the confidence level;
the factor calculation module is used for calculating a re-projection factor corresponding to the predicted pose information according to the three-dimensional coordinate information and the N predicted pose information, and calculating a ranging factor of the target camera between every two adjacent preset time points according to all positioning data and the confidence degrees of the positioning data;
The factor graph calculation module is configured to construct a factor graph by taking the three-dimensional coordinate information and the N pieces of predicted pose information as state variables and taking N-1 ranging factors and N re-projection factors as observation variables, and to solve the state variables in the factor graph to obtain estimated pose information corresponding to the N pieces of predicted pose information, where the state variables include pose variables and landmark point variables, the pose variables refer to the N pieces of predicted pose information, the landmark point variables refer to the three-dimensional coordinate information, the re-projection factor is an observation variable used for describing the relationship between the three-dimensional coordinate information and the predicted pose, the ranging factor is an observation variable used for describing the relationship between the predicted poses of the target camera respectively corresponding to two adjacent preset time points, the factor graph refers to a graph representation of the product of the conditional probabilities of the observation variables, and the estimated pose information refers to updated pose information of the target camera.
9. A computer device, characterized in that it comprises a processor, a memory and a computer program stored in the memory and executable on the processor, which processor implements the camera pose estimation method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the camera pose estimation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310861071.XA CN116912310B (en) | 2023-07-13 | 2023-07-13 | Camera pose estimation method, device, computer equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116912310A (en) | 2023-10-20
CN116912310B (en) | 2024-10-25
Family
ID=88355836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310861071.XA Active CN116912310B (en) | 2023-07-13 | 2023-07-13 | Camera pose estimation method, device, computer equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912310B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115060268A (en) * | 2022-06-01 | 2022-09-16 | 广州铁路职业技术学院(广州铁路机械学校) | A computer room fusion positioning method, system, equipment and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014048475A1 (en) * | 2012-09-27 | 2014-04-03 | Metaio Gmbh | Method of determining a position and orientation of a device associated with a capturing device for capturing at least one image |
WO2016118499A1 (en) * | 2015-01-19 | 2016-07-28 | The Regents Of The University Of Michigan | Visual localization within lidar maps |
CN110243358B (en) * | 2019-04-29 | 2023-01-03 | 武汉理工大学 | Multi-source fusion unmanned vehicle indoor and outdoor positioning method and system |
CN111780755B (en) * | 2020-06-30 | 2023-05-05 | 南京理工大学 | Multi-source fusion navigation method based on factor graph and observability analysis |
CN113418527B (en) * | 2021-06-15 | 2022-11-29 | 西安微电子技术研究所 | Strong real-time double-structure continuous scene fusion matching navigation positioning method and system |
CN114708293A (en) * | 2022-03-22 | 2022-07-05 | 广东工业大学 | Robot motion estimation method based on deep learning point-line feature and IMU tight coupling |
CN114840703B (en) * | 2022-03-30 | 2024-10-01 | 高德软件有限公司 | Pose information acquisition method, device, equipment, medium and product |
CN115265523B (en) * | 2022-09-27 | 2023-01-03 | 泉州装备制造研究所 | Robot simultaneous positioning and mapping method, device and readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN116912310A (en) | 2023-10-20 |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant