CN118505756A - Pose generation method and device, electronic equipment, storage medium, product and vehicle - Google Patents
- Publication number
- CN118505756A (application number CN202410964372.XA)
- Authority
- CN
- China
- Prior art keywords
- pose
- moving object
- acquisition data
- target
- target pose
- Prior art date
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The embodiments of the present application disclose a pose generation method and device, electronic equipment, a storage medium, a product and a vehicle. The method comprises the following steps: generating a first target pose of a moving object according to image acquisition data of the moving object, and generating a second target pose of the moving object according to wheel speed acquisition data and motion acquisition data of the moving object; and performing weighted fusion on the first target pose and the second target pose to obtain a final target pose of the moving object. In this way, the pose of the moving object can be estimated accurately in different environments, the accuracy of pose estimation is improved, and the application range of the visual odometer is expanded.
Description
Technical Field
The application relates to the technical field of perception, in particular to a pose generation method, a pose generation device, electronic equipment, a storage medium, a product and a vehicle.
Background
Visual odometry (VO) is a technique that estimates the pose change of a camera between successive frames by analyzing an image sequence and thereby computes the camera's motion trajectory in three-dimensional space. It is mainly applied in fields such as autonomous driving, mobile robots, augmented reality (AR) and virtual reality (VR). However, because visual odometry relies on a single modality, its pose estimation accuracy is poor when it is applied to images with different texture levels.
Disclosure of Invention
To this end, the present application provides a pose generation method and device, electronic equipment, a storage medium, a product and a vehicle, which can improve the accuracy of pose estimation in various texture environments and expand the application range of the visual odometer.
In a first aspect, an embodiment of the present application provides a pose generation method, including:
generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object;
And carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
In a second aspect, an embodiment of the present application further provides a pose generating device, including:
the generating unit is used for generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object;
And the fusion unit is used for carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory storing a plurality of instructions and a processor; the processor loads the instructions from the memory to perform the pose generation method of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where a plurality of instructions are stored, where the instructions are adapted to be loaded by a processor, to perform the pose generation method of the first aspect.
In a fifth aspect, embodiments of the present application further provide a computer program product comprising a computer program or instructions which, when executed by a processor, implement the pose generation method of the first aspect.
In a sixth aspect, the present application further provides a vehicle, which includes the pose generation device provided in the second aspect, or which, in operation, implements the pose generation method provided in the first aspect.
In the pose generation method provided by the application, a first target pose of a moving object is generated from the image acquisition data of the moving object, a second target pose of the moving object is generated from the wheel speed acquisition data and the motion acquisition data of the moving object, and the first target pose and the second target pose are weighted and fused to obtain the final target pose of the moving object. In this way, the pose of the moving object can be estimated accurately in different environments, the accuracy of pose estimation of the moving object is improved, and the application range of the visual odometer is expanded.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without inventive effort.
Fig. 1 is a diagram of an application scenario of the pose generation method provided by an embodiment of the present application;
Fig. 2 is a first schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 3 is a second schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 4 is a third schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 5 is a fourth schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 6 is a fifth schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 7 is a sixth schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 8 is a seventh schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 9 is an eighth schematic flowchart of the pose generation method provided by an embodiment of the present application;
Fig. 10 is a schematic diagram of a pose generation device provided by an embodiment of the present application;
Fig. 11 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is an application scenario diagram of a pose generation method according to an embodiment of the present application. As shown in fig. 1, the pose generation method may be applied to estimate the pose of a moving vehicle, which includes a perception system 100, the perception system 100 including a visual odometer 110, a wheel speed odometer 120, and an inertial measurement unit 130.
Odometers are a method of estimating the change in position of an object over time using data obtained from a motion sensor. The odometer may be used to estimate the pose of a moving object. The odometer can be classified into a wheel speed odometer, a visual odometer, a laser radar odometer, and the like according to the difference of application sensors.
In practical applications, a wheel speed odometer cannot measure accurately because of tire slip, wear and similar effects, so its positioning accuracy is limited. A visual odometer has high positioning accuracy, but the quality of its positioning depends on image quality: when the light is too strong or too weak its accuracy suffers and positioning may even fail. The result of a lidar odometer is related to the reflectivity of objects in the environment, and the lidar cannot detect an object when it encounters a totally reflective surface. It is therefore difficult to guarantee the positioning accuracy and reliability of a moving object with a single sensor, and multi-sensor fused odometry has become the mainstream direction for positioning.
The pose generation method provided by the application can be applied to estimating the pose of the vehicle, and can also be applied to estimating the pose of a mobile robot (such as a sweeping robot) with a wheeled structure, and the pose estimation can be obtained through multi-sensor fusion odometry.
It should be noted that, the application scenario described in the foregoing embodiment of the present application is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided by the embodiment of the present application, and as a person of ordinary skill in the art can know, with the appearance of a new application scenario, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
The pose generation method is described in detail below.
As shown in fig. 2, the method includes the following steps 210 and 220.
Step 210, generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object.
In this embodiment, the moving object may be a moving vehicle. The image acquisition data is image data acquired by the sensor corresponding to the visual odometer. The wheel speed acquisition data may be used to calculate the wheel speed of the moving object and may be at least one of chassis data information, motor rotation speed information, wheel speed data and similar information; chassis data information is preferred. If the moving object is a moving vehicle, the chassis data information may include motor rotation speed or wheel speed information, pedal depth information and, where available, steering angle information; if the moving object is a mobile robot, the chassis data information may be motor rotation speed information. The motion acquisition data may be data acquired by an inertial measurement unit, which measures the linear acceleration, angular velocity and orientation of an object; by measuring linear acceleration and angular velocity along three axes and applying algorithmic processing, the inertial measurement unit can provide information such as the attitude, velocity and displacement of the carrier.
Specifically, after the image acquisition data of the moving object has been acquired, the first target pose of the moving object can be generated from it. However, when the moving object is in an extreme environment the visual odometer may fail, so that neither the direct method nor the feature point method of visual odometry can estimate the pose of the moving object.
Therefore, the application uses the visual odometer to estimate one pose of the moving object (the first target pose) and additionally uses a wheel speed odometer and an inertial measurement unit, i.e. a wheel-speed inertial odometer, to estimate another pose (the second target pose). Specifically, the second target pose of the moving object is generated from the wheel speed acquisition data and the motion acquisition data of the moving object, and the first target pose and the second target pose are then fused to obtain the final pose of the moving object, i.e. the final target pose. This avoids adding extra equipment such as lidar or GPS, reduces the cost of pose estimation, improves the accuracy of pose estimation and improves the robustness of the visual odometer.
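As a minimal illustration of this two-branch idea, the sketch below fuses a pose from the visual branch with a pose from the wheel-speed inertial branch by a weighted sum; the fixed weights and the planar (x, y, yaw) pose representation are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def fuse_poses(pose_visual, pose_wheel_imu, w_visual=0.6, w_wheel=0.4):
    """Weighted fusion of two pose estimates, each given as (x, y, yaw).

    The yaw angles are assumed to be unwrapped and close to each other,
    so a plain weighted average is acceptable for this sketch."""
    return w_visual * np.asarray(pose_visual, float) + w_wheel * np.asarray(pose_wheel_imu, float)

# first target pose from the visual branch, second from the wheel-inertial branch
first_target_pose = np.array([1.02, 0.48, 0.031])    # hypothetical VO output
second_target_pose = np.array([0.98, 0.52, 0.029])   # hypothetical wheel+IMU output
final_target_pose = fuse_poses(first_target_pose, second_target_pose)
```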
In some embodiments, as shown in fig. 3, step 210 includes step 211, step 212, and step 213.
Step 211, determining environmental information of an image corresponding to the image acquisition data according to the image acquisition data;
Step 212, if the environmental information is preset environmental information, processing the image acquisition data by adopting a direct method to obtain a first target pose of the moving object;
And 213, if the environmental information is not the preset environmental information, processing the image acquisition data by adopting a characteristic point method to obtain a first target pose of the moving object.
The implementation of the visual odometer falls mainly into the feature point method and the direct method. The feature point method estimates camera motion by extracting, describing and matching salient feature points in the image; its specific steps are feature point extraction, descriptor computation and feature matching. The feature point method runs stably, can reach high positioning accuracy and is insensitive to motion and illumination conditions, but in low-texture environments insufficient feature points can cause positioning to fail. The direct method estimates camera motion from the intensity information of the pixels, without computing feature points and descriptors; it is more robust in low-texture environments but places strict requirements on motion and illumination conditions.
Therefore, when the first target pose of the moving object is generated, the environment information of the image corresponding to the image acquisition data can be determined through the image acquisition data of the moving object, and then the image acquisition data can be processed by adopting a corresponding algorithm so as to estimate the pose of the moving object.
The environment information is the texture information of the image corresponding to the image acquisition data, and the preset environment information indicates that this image is a low-texture image. If the environment information shows that the image corresponding to the image acquisition data is a low-texture image, the direct method can be used to process the image acquisition data and estimate the pose of the moving object; if it shows that the image is not a low-texture image, the feature point method can be used instead. This addresses the poor stability of the visual odometer's positioning in various texture environments, improves the positioning accuracy of the visual odometer, and improves the positioning effect of the fused odometer.
Specifically, if the environment information indicates a low-texture image, the image acquisition data can be processed by the direct method to obtain the camera pose corresponding to the image acquisition data, and this camera pose is then converted into the coordinate frame of the moving object to obtain its pose, namely the first target pose. If the environment information indicates that the image is not a low-texture image, the feature point method can be applied to the image acquisition data, with steps such as feature extraction, feature matching, camera pose solving, pose optimization and adjustment, and coordinate conversion, to obtain the pose of the moving object, namely the first target pose.
When the pose of the moving object is estimated by the direct method, the image acquisition data can be processed either with the conventional direct method or with the semi-direct method to estimate the pose of the moving object.
In some embodiments, as shown in fig. 4, step 211 includes step 2111, step 2112, and step 2113.
Step 2111, acquiring image acquisition data of a moving object;
Step 2112, inputting the image acquisition data into a direct method network and a feature point method network of a preset deep learning model, respectively, to obtain a first camera pose and a second camera pose;
step 2113, determining environmental information of the image corresponding to the image acquisition data according to the first camera pose and the second camera pose.
Specifically, when determining the environment information of the image corresponding to the image acquisition data, the image acquisition data can be input into a pre-trained deep learning model, which judges the current texture environment of the moving object; the corresponding algorithm can then be used for pose estimation.
In this embodiment, the deep learning model includes a direct method network and a feature point method network. The direct method network processes the image acquisition data with a direct method and outputs a first camera pose; the feature point method network processes the image acquisition data with a feature point method and outputs a second camera pose. The deep learning model can compute a score for the image acquisition data from the two camera poses: the score of the first camera pose is denoted T1, the score of the second camera pose is denoted T2, and the score of the image corresponding to the image acquisition data is T, with T = T1 - T2. If the score T exceeds a preset score, the image corresponding to the image acquisition data can be judged to have low texture; if it does not exceed the preset score, the image can be judged not to have low texture. The first camera pose and the second camera pose are the camera poses corresponding to the image acquisition data, and the preset score may be 0.
During the training of the deep learning model, a collected training sample image can be input into the deep learning model and solved both by the direct method and by the feature point method to obtain a first camera pose and a second camera pose. The actual pose corresponding to the sample image is taken as the ground truth; the deviation between the first camera pose and the actual camera pose and the deviation between the second camera pose and the actual camera pose are then compared, and the deep learning model is adjusted on the basis of these deviations.
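A hedged sketch of the texture decision described in steps 2111-2113 follows. The two networks are represented by callables supplied by the caller; how the scores T1 and T2 are actually derived from the two camera poses is not spelled out in the text, so the scoring function here (agreement with a reference pose, e.g. from the wheel-inertial odometer) is purely an illustrative assumption.

```python
import numpy as np

def score_pose(camera_pose, reference_pose):
    # Illustrative score: higher when the predicted camera pose agrees
    # with an external reference prediction (an assumption, not from the patent).
    return -np.linalg.norm(np.asarray(camera_pose, float) - np.asarray(reference_pose, float))

def is_low_texture(image, direct_net, feature_net, reference_pose, preset_score=0.0):
    t1 = score_pose(direct_net(image), reference_pose)    # score of the first camera pose
    t2 = score_pose(feature_net(image), reference_pose)   # score of the second camera pose
    t = t1 - t2                                           # T = T1 - T2, as in the text
    return t > preset_score                               # low texture if the direct branch wins
```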
Further, in some embodiments, as shown in fig. 5, step 212 includes step 2121, step 2122, and step 2123.
Step 2121, determining characteristic points of a reference frame image corresponding to the image acquisition data;
Step 2122, re-projecting the feature points of the reference frame image into at least one frame image corresponding to the image acquisition data to obtain an initial pose of the moving object;
Step 2123, updating the initial pose of the moving object based on the deviation between the gray value of the feature point of the reference frame image and the gray value of at least one frame image corresponding to the image acquisition data, so as to obtain the first target pose of the moving object.
Specifically, this embodiment processes the image acquisition data with the semi-direct method to estimate the pose of the moving object. The semi-direct method is an algorithm that combines the advantages of the feature point method and of the direct method: it uses sparse feature points in the image for initial matching and combines this with photometric error optimization over dense pixels, finding a balance between computational efficiency and estimation accuracy.
Specifically, the semi-direct method obtains the camera pose by directly matching image patches around feature points extracted from the image, rather than matching the whole image as the conventional direct method does. The method is fast and can reach sub-pixel accuracy at high frame rates. It adopts a probabilistic mapping technique that removes the need for the expensive feature extraction and matching steps in motion estimation and operates directly at the pixel level, which improves the overall robustness and accuracy.
In this embodiment, the feature points of the reference frame image corresponding to the image acquisition data can be determined with the help of the wheel-speed inertial odometer. After the feature points of the reference frame image have been determined, their depth information and their positions on the reference frame image are used for reprojection to obtain the initial pose of the moving object. The deviation between the gray value of each feature point of the reference frame image and the gray value at the corresponding location in at least one frame image of the image acquisition data is then computed, and the initial pose of the moving object is updated according to this deviation to obtain the pose of the moving object, i.e. the first target pose. The reprojection of the feature points can be carried out in the camera coordinate system of the reference frame image to obtain an initial camera pose, which is then converted into the coordinate frame of the moving object to obtain the initial pose of the moving object.
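The following is a simplified sketch of the photometric refinement just described, under several assumptions not stated in the patent: a pinhole camera, feature points back-projected from the reference frame with known depth, bilinear image sampling, and SciPy's generic least-squares solver in place of the dedicated optimizer a real semi-direct system would use.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def bilinear_sample(img, uv):
    """Bilinearly interpolate a grayscale image at sub-pixel positions uv (N, 2)."""
    h, w = img.shape
    u = np.clip(uv[:, 0], 0.0, w - 1.001)
    v = np.clip(uv[:, 1], 0.0, h - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * img[v0, u0] + du * (1 - dv) * img[v0, u0 + 1]
            + (1 - du) * dv * img[v0 + 1, u0] + du * dv * img[v0 + 1, u0 + 1])

def photometric_residuals(pose6, pts_ref_cam, ref_gray, cur_img, fx, fy, cx, cy):
    """Residuals I_cur(project(R p + t)) - I_ref for each reference feature point."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    t = pose6[3:]
    pts_cur = pts_ref_cam @ R.T + t                # transform points into the current frame
    u = fx * pts_cur[:, 0] / pts_cur[:, 2] + cx    # pinhole projection
    v = fy * pts_cur[:, 1] / pts_cur[:, 2] + cy
    return bilinear_sample(cur_img, np.stack([u, v], axis=1)) - ref_gray

def refine_pose_semi_direct(init_pose6, pts_ref_cam, ref_gray, cur_img, fx, fy, cx, cy):
    """Update the initial pose (e.g. from the wheel-inertial odometer) by minimising photometric error."""
    res = least_squares(photometric_residuals, init_pose6,
                        args=(pts_ref_cam, ref_gray, cur_img, fx, fy, cx, cy))
    return res.x   # refined 6-DoF pose (rotation vector + translation)
```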
In some embodiments, as shown in fig. 6, step 213 includes step 2131, step 2132, and step 2133.
Step 2131, extracting feature points of multi-frame images in the image acquisition data;
Step 2132, performing feature matching on the feature points of the multi-frame images to obtain at least one group of matched feature point pairs;
Step 2133, predicting the pose of the moving object according to the at least one group of matched feature point pairs to obtain the first target pose of the moving object.
In this embodiment, the feature points of the multi-frame images in the image acquisition data include key points and descriptors. The key points are salient points in the images, such as corner points and edge points; the descriptors are specific descriptions of the key points and are mainly used in the subsequent matching. Feature point extraction can be implemented with image feature detection algorithms such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features) or ORB (Oriented FAST and Rotated BRIEF).
After the feature points of the multi-frame images have been extracted, feature matching can be performed with the FLANN (Fast Library for Approximate Nearest Neighbors) algorithm, a brute-force matching algorithm or similar, to obtain at least one group of matched feature point pairs; camera pose estimation can then be performed from the matched feature point pairs, followed by coordinate conversion, to obtain the first target pose of the moving object.
In the camera pose estimation, epipolar geometry can be used to apply 2D-2D epipolar constraints, compute the essential matrix and the homography matrix of the pose transformation, and perform triangulation. The PnP (Perspective-n-Point) algorithm can then be applied via direct linear transformation; specifically, the 3D-2D camera pose can be estimated with the P3P (Perspective-3-Point) algorithm, the BA (Bundle Adjustment) algorithm and similar methods. Finally, 3D-3D pose estimation is performed with the ICP (Iterative Closest Point) algorithm, so that the camera pose corresponding to the image acquisition data is obtained.
In addition, after the camera pose has been estimated, the estimation result is usually optimized and filtered to improve its accuracy and robustness; steps such as outlier rejection can be used to minimize the reprojection error and thereby refine the camera pose.
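A minimal monocular sketch of this feature-point pipeline with OpenCV is shown below; it assumes the intrinsic matrix K is known, stops at essential-matrix pose recovery (the translation is only up to scale), and leaves out the PnP/bundle-adjustment/ICP refinements and the camera-to-vehicle coordinate conversion mentioned above.

```python
import cv2
import numpy as np

def relative_camera_pose(img_prev, img_cur, K):
    """Estimate the rotation R and unit-scale translation t between two grayscale frames."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)     # key points + descriptors
    kp2, des2 = orb.detectAndCompute(img_cur, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # brute-force matching
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 2D-2D epipolar constraint: essential matrix with RANSAC outlier rejection,
    # then decomposition of E into the camera pose change.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```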
In some embodiments, as shown in fig. 7, step 210 further includes step 310, step 320, and step 330.
Step 310, determining first path change information and first angle change information of a moving object according to wheel speed acquisition data;
Step 320, integrating the angular velocity information and the acceleration information in the motion acquisition data to obtain second path change information and second angle change information of the moving object;
Step 330, determining a second target pose of the moving object according to the first path change information, the second path change information, the first angle change information and the second angle change information.
Specifically, in estimating the second target pose of the moving object from the wheel speed acquisition data and the motion acquisition data, the wheel speed acquisition data can be processed with a motion model to determine the first path change information and the first angle change information of the moving object. The angular velocity data and the acceleration data of the moving object are extracted from the motion acquisition data and integrated to obtain the second path change information and the second angle change information of the moving object. The first path change information and the second path change information can then be fused, the first angle change information and the second angle change information can be fused, and the second target pose of the moving object can be estimated from the fused path change information and angle change information. The motion model can be a two-wheel differential model, a four-wheel differential model, an Ackermann model, an omnidirectional model or similar; it is selected according to the structure and motion mode of the moving object, and the application does not specifically limit it.
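As an illustration of steps 310 and 320, the sketch below computes one path/angle increment from the wheel speeds with a two-wheel differential model and one from the IMU by single-step integration; the choice of model and the single-step (rather than pre-integrated) treatment are simplifying assumptions.

```python
def wheel_increment(v_left, v_right, wheel_base, dt):
    """First path change and first angle change from a two-wheel differential model."""
    ds = 0.5 * (v_left + v_right) * dt             # distance travelled in this step
    dtheta = (v_right - v_left) / wheel_base * dt  # heading change in this step
    return ds, dtheta

def imu_increment(acc_forward, gyro_yaw, v0, dt):
    """Second path change and second angle change by integrating IMU measurements."""
    ds = v0 * dt + 0.5 * acc_forward * dt * dt     # integrate forward acceleration once
    dtheta = gyro_yaw * dt                         # integrate yaw angular velocity once
    return ds, dtheta
```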
In some embodiments, step 330 includes the steps of: performing first weighted fusion on the first path change information and the second path change information to obtain target path change information of the moving object; performing second weighted fusion on the first angle change information and the second angle change information to obtain target angle change information of the moving object; and performing motion decomposition on the target path change information and the target angle change information to obtain a second target pose of the moving object.
Specifically, when fusing the first path change information with the second path change information and the first angle change information with the second angle change information, the four pieces of information can be input into a machine learning model trained in advance; the first and second path change information are weighted and fused, and the first and second angle change information are weighted and fused, yielding the target path change information and the target angle change information of the moving object. Finally, motion decomposition of the target path change information and the target angle change information gives the second target pose of the moving object.
When the machine learning model is trained, multiple sets of path change information (each set can be denoted L1 and L2) and multiple sets of angle change information (each set can be denoted Q1 and Q2) can be input into the machine learning model for fusion training until the loss function Ls1 and the loss function Lθ1 reach their minimum, at which point the training of the machine learning model can end. Here Ls1 = Σ_{i=1..n} |Li - ΔLi| and Lθ1 = Σ_{i=1..n} |Qi - ΔQi|, where n is the total number of output poses, Li is the fused path change value of the moving object, Qi is the fused angle change value, ΔLi is the output path change information and ΔQi is the output angle change information.
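The sketch below makes the fusion and the two losses concrete, with fixed scalar weights standing in for the trained machine learning model; the planar motion decomposition at the end is likewise an illustrative assumption.

```python
import numpy as np

def fuse_increments(l1, l2, q1, q2, w_l=0.5, w_q=0.5):
    """First weighted fusion of path changes and second weighted fusion of angle changes."""
    target_path = w_l * l1 + (1.0 - w_l) * l2      # target path change information
    target_angle = w_q * q1 + (1.0 - w_q) * q2     # target angle change information
    return target_path, target_angle

def fusion_losses(fused_paths, out_paths, fused_angles, out_angles):
    """Ls1 = sum_i |L_i - dL_i| and Ltheta1 = sum_i |Q_i - dQ_i| over the n output poses."""
    ls1 = np.sum(np.abs(np.asarray(fused_paths) - np.asarray(out_paths)))
    ltheta1 = np.sum(np.abs(np.asarray(fused_angles) - np.asarray(out_angles)))
    return ls1, ltheta1

def decompose_motion(x, y, theta, ds, dtheta):
    """Motion decomposition of the fused increments into a planar pose update."""
    x_new = x + ds * np.cos(theta + 0.5 * dtheta)
    y_new = y + ds * np.sin(theta + 0.5 * dtheta)
    return x_new, y_new, theta + dtheta
```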
Step 220, performing weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
Specifically, after the first target pose of the moving object has been determined by the visual odometer and the second target pose by the wheel-speed inertial odometer, the two target poses can be weighted and fused to obtain the final pose of the moving object, i.e. the final target pose. This avoids adding extra equipment such as lidar, reduces the cost of pose estimation of the moving object, and at the same time improves the robustness of the visual odometer.
Before the weighted fusion of the first target pose and the second target pose, their weights can be preset or can be set dynamically according to the current environment of the moving object. Preferably, in order to obtain the final target pose of the moving object more accurately, the weight of the first target pose and the weight of the second target pose are set dynamically according to the current environment of the moving object.
In some embodiments, as shown in fig. 8, step 220 includes step 221 and step 222.
Step 221, determining a weight relation between the first target pose and the second target pose according to the environment information of the image corresponding to the image acquisition data;
And step 222, carrying out weighted fusion on the first target pose and the second target pose according to the weight relation to obtain the final target pose of the moving object.
Specifically, when the first target pose and the second target pose are weighted and fused, the weights corresponding to them are not fixed and can be determined from the environment information. When the visual odometer is unreliable, for example in an environment with large changes in illumination and low or even no texture, the weight of the first target pose generated by the visual odometer can be reduced or even set to zero, so that the final target pose of the moving object is determined by the second target pose estimated by the wheel-speed inertial odometer; in this way, the accuracy of pose estimation of the moving object can be guaranteed in various texture environments. The weighted fusion of the first target pose and the second target pose can be implemented with a pre-trained neural network model, which avoids manual parameter tuning and at the same time makes the output result more robust. The loss function L_total of the neural network model can be L_total = Σ_{i=1..n} ||Ppi - Pri||, where n is the total number of output poses, Pp is the fused pose and Pr is the ground-truth value.
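A sketch of the environment-dependent weighting and of the loss L_total follows. The rule mapping environment information to a visual weight is a hand-written placeholder for the pre-trained neural network mentioned above, so the thresholds and weight values are assumptions.

```python
import numpy as np

def visual_weight(low_texture, large_illumination_change):
    """Illustrative rule: down-weight or zero the visual branch when it is unreliable."""
    if low_texture and large_illumination_change:
        return 0.0     # visual pose emptied; the wheel-inertial pose decides the result
    if low_texture or large_illumination_change:
        return 0.3
    return 0.7

def fuse_final_pose(first_pose, second_pose, low_texture, large_illumination_change):
    w = visual_weight(low_texture, large_illumination_change)
    return w * np.asarray(first_pose, float) + (1.0 - w) * np.asarray(second_pose, float)

def total_loss(fused_poses, true_poses):
    """L_total = sum_i ||Pp_i - Pr_i|| over the n output poses."""
    diff = np.asarray(fused_poses, float) - np.asarray(true_poses, float)
    return float(np.sum(np.linalg.norm(diff, axis=1)))
```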
In some embodiments, as shown in fig. 9, the pose generation method further includes step 410, step 420, and step 430.
Step 410, acquiring initial image acquisition data, initial wheel speed acquisition data and initial motion acquisition data of a moving object;
Step 420, preprocessing the image initial acquisition data, the wheel speed initial acquisition data and the motion initial acquisition data respectively to obtain image intermediate acquisition data, wheel speed intermediate acquisition data and motion intermediate acquisition data;
Step 430, acquiring a time stamp of the moving object, and performing time synchronization processing on the image middle acquisition data, the wheel speed middle acquisition data and the movement middle acquisition data according to the time stamp to obtain the image acquisition data, the wheel speed acquisition data and the movement acquisition data of the moving object.
Specifically, when the visual odometer and the wheel-speed inertial odometer are used to estimate the pose of a moving object, the image initial acquisition data corresponding to the visual odometer and the wheel speed initial acquisition data and motion initial acquisition data corresponding to the wheel-speed inertial odometer need to be preprocessed first. The image initial acquisition data is preprocessed by denoising, distortion correction and similar operations; the wheel speed initial acquisition data and the motion initial acquisition data are preprocessed by removing abnormal data and similar operations. After preprocessing, because the sensors run at different frequencies, the data they acquire are not synchronized in time, so the resulting image intermediate acquisition data, wheel speed intermediate acquisition data and motion intermediate acquisition data must undergo time synchronization to guarantee the accuracy of the pose estimation performed by the visual odometer and the wheel-speed inertial odometer. Time synchronization can be achieved with a timestamp of the moving object or with hardware on the moving object.
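As a rough sketch of the time synchronization step, the code below associates each image timestamp with the nearest wheel-speed and IMU samples; real systems may instead interpolate between samples or rely on hardware triggering, as noted above, and the data layout (sorted timestamp arrays) is an assumption.

```python
import numpy as np

def nearest_index(timestamps, t):
    """Index of the sample whose timestamp is closest to t (timestamps sorted ascending)."""
    i = int(np.searchsorted(timestamps, t))
    if i == 0:
        return 0
    if i >= len(timestamps):
        return len(timestamps) - 1
    return i if abs(timestamps[i] - t) < abs(timestamps[i - 1] - t) else i - 1

def synchronize(image_ts, wheel_ts, imu_ts):
    """For every image frame, pick the wheel and IMU samples nearest in time."""
    wheel_ts = np.asarray(wheel_ts, float)
    imu_ts = np.asarray(imu_ts, float)
    return [(t, nearest_index(wheel_ts, t), nearest_index(imu_ts, t)) for t in image_ts]
```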
In the pose generation method provided by the application, a first target pose of a moving object is generated according to image acquisition data of the moving object, and a second target pose of the moving object is generated according to wheel speed acquisition data and motion acquisition data of the moving object; and carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object. The application can accurately estimate the pose of the moving object in different environments, improves the precision of the pose estimation of the moving object, and can expand the application range of the visual odometer.
The embodiment of the application also provides a pose generation device which is used for executing any embodiment of the pose generation method.
Specifically, referring to fig. 10, fig. 10 is a schematic block diagram of a pose generation device according to an embodiment of the present application.
As shown in fig. 10, the pose generation device includes: the generation unit 510 and the fusion unit 520.
A generating unit 510, configured to generate a first target pose of the moving object according to the image acquisition data of the moving object, and to generate a second target pose of the moving object according to the wheel speed acquisition data and the motion acquisition data of the moving object; and a fusion unit 520, configured to perform weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
In some embodiments, the generating unit 510 is further specifically configured to determine, according to the image acquisition data, environmental information of an image corresponding to the image acquisition data; if the environment information is preset environment information, processing the image acquisition data by adopting a direct method to obtain a first target pose of the moving object; and if the environmental information is not the preset environmental information, processing the image acquisition data by adopting a characteristic point method to obtain a first target pose of the moving object.
In some embodiments, the generating unit 510 is further specifically configured to acquire image acquisition data of the moving object; respectively inputting the image acquisition data into a direct method network and a characteristic point method network of a preset deep learning model to obtain a first camera pose and a second camera pose; and determining the environmental information of the image corresponding to the image acquisition data according to the first camera pose and the second camera pose.
In some embodiments, the generating unit 510 is further specifically configured to determine a feature point of the reference frame image corresponding to the image acquisition data; re-projecting the characteristic points of the reference frame image into at least one frame image corresponding to the image acquisition data to obtain the initial pose of the moving object; and updating the initial pose of the moving object based on the deviation between the gray value of the characteristic point of the reference frame image and the gray value of at least one frame image corresponding to the image acquisition data, so as to obtain the first target pose of the moving object.
In some embodiments, the generating unit 510 is further specifically configured to extract feature points of multiple frames of images in the image acquisition data; performing feature matching on the feature points of the multi-frame images to obtain at least one group of matched feature point pairs; and predicting the pose of the moving object according to at least one group of matched characteristic point pairs to obtain a first target pose of the moving object.
In some embodiments, the generating unit 510 is further configured to determine first path change information and first angle change information of the moving object according to the wheel speed acquisition data; integrating the angular velocity information and the acceleration information in the motion acquisition data to obtain second path change information and second angle change information of the moving object; and determining a second target pose of the moving object according to the first path change information, the second path change information, the first angle change information and the second angle change information.
In some embodiments, the generating unit 510 is further configured to perform a first weighted fusion on the first path change information and the second path change information to obtain target path change information of the moving object; performing second weighted fusion on the first angle change information and the second angle change information to obtain target angle change information of the moving object; and performing motion decomposition on the target path change information and the target angle change information to obtain a second target pose of the moving object.
In some embodiments, the fusion unit 520 is further configured to determine a weight relationship between the first target pose and the second target pose according to the environmental information of the image corresponding to the image acquisition data; and according to the weight relation, carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
In some embodiments, the pose generation device is further configured to acquire image initial acquisition data, wheel speed initial acquisition data, and motion initial acquisition data of the moving object; preprocessing the image initial acquisition data, the wheel speed initial acquisition data and the motion initial acquisition data respectively to obtain image intermediate acquisition data, wheel speed intermediate acquisition data and motion intermediate acquisition data; and acquiring a time stamp of the moving object, and performing time synchronization processing on the image middle acquisition data, the wheel speed middle acquisition data and the movement middle acquisition data according to the time stamp to obtain the image acquisition data, the wheel speed acquisition data and the movement acquisition data of the moving object.
The pose generation device provided by the embodiment of the application is used to generate a first target pose of a moving object according to the image acquisition data of the moving object, generate a second target pose of the moving object according to the wheel speed acquisition data and the motion acquisition data of the moving object, and perform weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
It should be noted that, as those skilled in the art can clearly understand the specific implementation process of the pose generation device and each unit, reference may be made to the corresponding description in the foregoing method embodiments, and for convenience and brevity of description, details are not repeated here.
In some embodiments, the present application further provides a vehicle, in which a sensing system is provided, where the sensing system includes the pose generating device in the above embodiments. Wherein the pose generation device may perform the pose generation method.
The above-described pose generation apparatus may be implemented in the form of a computer program that can be run on an electronic device as shown in fig. 11.
Referring to fig. 11, fig. 11 is a schematic block diagram of an electronic device according to an embodiment of the present application. The electronic device 600 may be a terminal, where the terminal may be a cloud, a vehicle-mounted terminal device, a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, and the like.
Referring to fig. 11, the electronic device 600 includes a processor 602, a memory and a network interface 605 connected by a system bus 601, wherein the memory may include a non-volatile storage medium 603 and an internal memory 604.
The non-volatile storage medium 603 may store an operating system 6031 and a computer program 6032. The computer program 6032 comprises program instructions that, when executed, cause the processor 602 to perform a pose generation method.
The processor 602 is used to provide computing and control capabilities to support the operation of the overall electronic device 600.
The internal memory 604 provides an environment for the execution of a computer program 6032 in the non-volatile storage medium 603, which computer program 6032, when executed by the processor 602, causes the processor 602 to perform a pose generation method.
The network interface 605 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the electronic device 600 to which the present inventive arrangements are applied, and that a particular electronic device 600 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The processor 602 is configured to execute a computer program 6032 stored in the memory to implement the steps of: generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object; and carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
It should be appreciated that, in embodiments of the present application, the processor 602 may be a central processing unit (CPU); the processor 602 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
According to one aspect of the present application, there is also provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the electronic device to implement the steps of: generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object; and carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
Those skilled in the art will appreciate that all or part of the flow in a method embodying the above described embodiments may be accomplished by computer programs instructing the relevant hardware. The computer program comprises program instructions, and the computer program can be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present application also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program, wherein the computer program includes program instructions. The program instructions, when executed by the processor, cause the processor to perform the steps of: generating a first target pose of the moving object according to the image acquisition data of the moving object, and generating a second target pose of the moving object according to the wheel speed acquisition data and the movement acquisition data of the moving object; and carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs. In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit may be stored in a storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing an electronic device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (14)
1. A pose generation method, characterized by comprising the following steps:
Generating a first target pose of a moving object according to image acquisition data of the moving object, and generating a second target pose of the moving object according to wheel speed acquisition data and motion acquisition data of the moving object;
and carrying out weighted fusion on the first target pose and the second target pose to obtain the final target pose of the moving object.
2. The pose generation method according to claim 1, wherein the generating the first target pose of the moving object from the image acquisition data of the moving object comprises:
Determining environmental information of an image corresponding to the image acquisition data according to the image acquisition data;
If the environment information is preset environment information, processing the image acquisition data by adopting a direct method to obtain a first target pose of the moving object;
And if the environmental information is not preset environmental information, processing the image acquisition data by adopting a characteristic point method to obtain a first target pose of the moving object.
3. The pose generation method according to claim 2, wherein the determining environmental information of an image corresponding to the image acquisition data according to the image acquisition data includes:
acquiring image acquisition data of the moving object;
Respectively inputting the image acquisition data into a direct method network and a characteristic point method network of a preset deep learning model to obtain a first camera pose and a second camera pose;
and determining the environment information of the image corresponding to the image acquisition data according to the first camera pose and the second camera pose.
4. The pose generation method according to claim 2, wherein the processing the image acquisition data by using a direct method to obtain the first target pose of the moving object comprises:
determining characteristic points of a reference frame image corresponding to the image acquisition data;
Re-projecting the characteristic points of the reference frame image into at least one frame image corresponding to the image acquisition data to obtain the initial pose of the moving object;
And updating the initial pose of the moving object based on the deviation between the gray value of the characteristic point of the reference frame image and the gray value of at least one frame image corresponding to the image acquisition data, so as to obtain a first target pose of the moving object.
5. The pose generation method according to claim 2, wherein the processing the image acquisition data by using a feature point method to obtain the first target pose of the moving object comprises:
Extracting characteristic points of multi-frame images in the image acquisition data;
performing feature matching on the feature points of the multi-frame images to obtain at least one group of matched feature point pairs;
And predicting the pose of the moving object according to at least one group of matched characteristic point pairs to obtain a first target pose of the moving object.
6. The pose generation method according to claim 1, wherein the generating a second target pose of the moving object according to the wheel speed acquisition data and the motion acquisition data of the moving object comprises:
determining first path change information and first angle change information of the moving object according to the wheel speed acquisition data;
integrating angular velocity information and acceleration information in the motion acquisition data to obtain second path change information and second angle change information of the moving object;
and determining the second target pose of the moving object according to the first path change information, the second path change information, the first angle change information and the second angle change information.
7. The pose generation method according to claim 6, wherein the determining the second target pose of the moving object according to the first path change information, the second path change information, the first angle change information and the second angle change information comprises:
performing first weighted fusion on the first path change information and the second path change information to obtain target path change information of the moving object;
performing second weighted fusion on the first angle change information and the second angle change information to obtain target angle change information of the moving object;
and performing motion decomposition on the target path change information and the target angle change information to obtain the second target pose of the moving object.
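(Illustrative sketch, not part of the claims.) Claims 6 and 7 fuse path and angle increments from the wheel speed and IMU streams and then decompose them into a pose update; the 0.7/0.3 weights and the mid-arc decomposition below are assumptions for illustration.

```python
import math

def fuse_and_decompose(ds_wheel, ds_imu, dth_wheel, dth_imu, pose, w_wheel=0.7):
    """Fuse wheel-speed and IMU increments, then decompose into a pose update.

    ds_*:  path (distance) change from wheel speed / from IMU integration
    dth_*: heading change from wheel speed / from IMU integration
    pose:  current (x, y, yaw); the weights are illustrative assumptions
    """
    w_imu = 1.0 - w_wheel
    ds = w_wheel * ds_wheel + w_imu * ds_imu        # target path change
    dth = w_wheel * dth_wheel + w_imu * dth_imu     # target angle change

    x, y, yaw = pose
    # Motion decomposition: advance along the mid-arc heading.
    x += ds * math.cos(yaw + dth / 2.0)
    y += ds * math.sin(yaw + dth / 2.0)
    yaw += dth
    return (x, y, yaw)

print(fuse_and_decompose(0.50, 0.48, 0.02, 0.025, (0.0, 0.0, 0.0)))
```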
8. The pose generation method according to claim 1, wherein the performing weighted fusion on the first target pose and the second target pose to obtain a final target pose of the moving object comprises:
determining a weight relation between the first target pose and the second target pose according to environment information of an image corresponding to the image acquisition data;
and carrying out weighted fusion on the first target pose and the second target pose according to the weight relation to obtain the final target pose of the moving object.
9. The pose generation method according to any one of claims 1 to 8, characterized in that the method further comprises:
acquiring initial image acquisition data, initial wheel speed acquisition data and initial motion acquisition data of the moving object;
preprocessing the initial image acquisition data, the initial wheel speed acquisition data and the initial motion acquisition data respectively to obtain intermediate image acquisition data, intermediate wheel speed acquisition data and intermediate motion acquisition data;
and acquiring a timestamp of the moving object, and performing time synchronization processing on the intermediate image acquisition data, the intermediate wheel speed acquisition data and the intermediate motion acquisition data according to the timestamp to obtain the image acquisition data, the wheel speed acquisition data and the motion acquisition data of the moving object.
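(Illustrative sketch, not part of the claims.) The time synchronization in claim 9 can be pictured as nearest-timestamp matching across the three preprocessed streams; the data layout, the 20 ms tolerance and the function name are assumptions.

```python
import bisect

def synchronize(image_samples, wheel_samples, imu_samples, tolerance=0.02):
    """Align three sensor streams by timestamp (nearest-neighbour matching).

    Each *_samples list holds (timestamp_seconds, data) tuples sorted by time;
    the 20 ms tolerance is an assumed value for illustration.
    """
    def nearest(samples, ts):
        times = [t for t, _ in samples]
        i = bisect.bisect_left(times, ts)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(samples)]
        best = min(candidates, key=lambda j: abs(times[j] - ts))
        return samples[best] if abs(times[best] - ts) <= tolerance else None

    synced = []
    for ts, img in image_samples:
        wheel = nearest(wheel_samples, ts)
        imu = nearest(imu_samples, ts)
        if wheel is not None and imu is not None:
            synced.append((ts, img, wheel[1], imu[1]))
    return synced
```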
10. A pose generation device, characterized by comprising:
a generation unit, configured to generate a first target pose of a moving object according to image acquisition data of the moving object, and to generate a second target pose of the moving object according to wheel speed acquisition data and motion acquisition data of the moving object;
and a fusion unit, configured to carry out weighted fusion on the first target pose and the second target pose to obtain a final target pose of the moving object.
11. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the pose generation method according to any of claims 1 to 9.
12. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the pose generation method according to any of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the pose generation method according to any of claims 1 to 9.
14. A vehicle, comprising the pose generation device of claim 10, or configured to implement the pose generation method according to any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410964372.XA CN118505756A (en) | 2024-07-18 | 2024-07-18 | Pose generation method and device, electronic equipment, storage medium, product and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118505756A (en) | 2024-08-16
Family
ID=92239106
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410964372.XA (pending) | Pose generation method and device, electronic equipment, storage medium, product and vehicle | 2024-07-18 | 2024-07-18
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118505756A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110207714A (en) * | 2019-06-28 | 2019-09-06 | 广州小鹏汽车科技有限公司 | A kind of method, onboard system and the vehicle of determining vehicle pose |
CN111220154A (en) * | 2020-01-22 | 2020-06-02 | 北京百度网讯科技有限公司 | Vehicle positioning method, device, equipment and medium |
CN113390408A (en) * | 2021-06-30 | 2021-09-14 | 深圳市优必选科技股份有限公司 | Robot positioning method and device, robot and storage medium |
CN116030340A (en) * | 2021-10-26 | 2023-04-28 | 深圳市普渡科技有限公司 | Robot, positioning information determining method, device and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118752491A (en) * | 2024-08-29 | 2024-10-11 | 北京小米机器人技术有限公司 | Motion control method, device, robot and storage medium |
CN118752491B (en) * | 2024-08-29 | 2024-12-17 | 北京小米机器人技术有限公司 | Motion control method, motion control device, robot and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |