
CN118999577A - Pose estimation method, pose estimation device, robot and storage medium - Google Patents


Info

Publication number
CN118999577A
Authority
CN
China
Prior art keywords
pose
robot
map
search
accuracy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411188154.8A
Other languages
Chinese (zh)
Inventor
李道胜
周波
仓玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Gebote Technology Co ltd
Original Assignee
Suzhou Gebote Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Gebote Technology Co., Ltd.
Priority to CN202411188154.8A
Publication of CN118999577A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20: Instruments for performing navigational calculations
    • G01C21/26: Navigation; Navigational instruments specially adapted for navigation in a road network
    • G01C21/28: Navigation with correlation of data from several navigational instruments
    • G01C21/30: Map- or contour-matching
    • G01C21/34: Route searching; Route guidance
    • G01C21/3407: Route searching; Route guidance specially adapted for specific applications
    • G01C21/343: Calculating itineraries

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application discloses a pose estimation method, a pose estimation device, a robot, and a storage medium. The method comprises: acquiring a multi-layer map of the environment in which the robot is located, the multi-layer map comprising a grid map and a tag map aligned in a common coordinate system; when a tag in the environment is detected, determining a predicted pose based on the tag map and constructing a candidate pose set from the predicted pose; determining an accuracy for each pose in the candidate pose set based on the grid map; and generating an estimated pose of the robot at the current moment based on each pose in the candidate pose set and its corresponding accuracy.

Description

Pose estimation method, pose estimation device, robot and storage medium
Technical Field
The present application relates to the field of robots, and in particular, to a pose estimation method, a pose estimation device, a robot, and a storage medium.
Background
With the rapid development of robot technology, robots are increasingly used in fields such as daily life, industrial production, medical care, and military exploration. In these applications, accurate positioning of the robot is the basis for autonomous navigation, task execution, and the like. However, the environments in which robots operate are often complex and variable, which poses a serious challenge to robot pose estimation.
Disclosure of Invention
In view of this, embodiments of the present application provide at least a pose estimation method, a pose estimation device, a robot, a storage medium, and a program product.
The technical solutions of the embodiments of the present application are realized as follows:
In one aspect, an embodiment of the present application provides a pose estimation method, the method including: acquiring a multi-layer map of the environment in which the robot is located, the multi-layer map including a grid map and a tag map aligned in a common coordinate system; when a tag in the environment is detected, determining a predicted pose based on the tag map and constructing a candidate pose set from the predicted pose; determining an accuracy for each pose in the candidate pose set based on the grid map; and generating an estimated pose of the robot at the current moment based on each pose in the candidate pose set and its corresponding accuracy.
In another aspect, an embodiment of the present application provides a pose estimation apparatus, including: an acquisition module configured to acquire a multi-layer map of the environment in which the robot is located, the multi-layer map including a grid map and a tag map aligned in a common coordinate system; a first determining module configured to determine a predicted pose based on the tag map when a tag in the environment is detected, and to construct a candidate pose set from the predicted pose; and a second determining module configured to determine an accuracy for each pose in the candidate pose set based on the grid map, and to generate an estimated pose of the robot at the current moment based on each pose in the candidate pose set and its corresponding accuracy.
In yet another aspect, an embodiment of the present application provides a robot, including a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing some or all of the steps of the above method when executing the program.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method.
In yet another aspect, embodiments of the present application provide a computer program product comprising a computer program or instructions which, when executed by a processor, implement some or all of the steps of the above method.
In the embodiments of the present application, by acquiring a grid map and a tag map aligned in a common coordinate system, the robot can obtain accurate position information of environmental obstacles and determine the poses of known tags in the environment. The known tag poses in the tag map are used to initialize the predicted pose of the robot and to construct a candidate pose set, which provides an initial localization for the robot; the candidate pose set accounts for the diversity and uncertainty of the pose and supplies rich data for the subsequent accuracy evaluation and pose estimation. In addition, evaluating the accuracy of each candidate pose against the grid map screens out more reliable and reasonable poses, improving the accuracy and rationality of the estimated pose. Meanwhile, the estimated pose generated by the embodiments of the present application provides clear guidance for the robot's subsequent movement and decision-making, enabling efficient and stable autonomous navigation in complex environments.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a first schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 2 is a second schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 3 is a third schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 4 is a fourth schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 5 is a fifth schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 6 is a sixth schematic flow chart of an implementation of a pose estimation method according to an embodiment of the present application;
Fig. 7 is a flow chart of a global positioning method in a large-scale indoor scene according to an embodiment of the present application;
Fig. 8 is a schematic diagram of the composition structure of a pose estimation device according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a hardware entity of a robot according to an embodiment of the present application.
Detailed Description
The technical solutions of the present application will be further elaborated below with reference to the accompanying drawings and embodiments, which should not be construed as limiting the application; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it should be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and that these may be combined with one another where no conflict arises. The terms "first/second/third" merely distinguish similar objects and do not imply a particular ordering of objects; where permitted, the order may be interchanged so that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described.
Before explaining the embodiments of the present disclosure in further detail, terms and terminology involved in the embodiments of the present disclosure are explained, and the terms and terminology involved in the embodiments of the present disclosure are applicable to the following explanation.
(1) Simultaneous Localization and Mapping (SLAM): SLAM allows a robot to determine its own position in real time in an unknown environment while constructing a map of its surroundings. Two core tasks, localization and mapping, are achieved by integrating data collected by various sensors (such as lidar, cameras, and inertial measurement units) and processing the data with complex algorithms. In the SLAM process, the robot first acquires information about the surrounding environment through its sensors; the sensor data may include laser point clouds, image frames, inertial measurements, and the like. The SLAM algorithm then extracts useful feature points from the raw data; these represent key parts of the environment that can be used for localization and mapping. Next, the algorithm performs data association, i.e., matching the features of the current frame against features of previous maps or other frames. This step is critical for understanding how the robot moves and how the environment changes: through data association, the SLAM algorithm can infer the motion trajectory of the robot, including translation and rotation. After determining the motion trajectory, the SLAM algorithm uses this information to construct a map of the environment. Maps can be constructed in various forms; common ones include occupancy grid maps and point cloud maps.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the application only and is not intended to be limiting of the application.
Nowadays, autonomous navigation mobile robots are increasingly popular, including floor-sweeping robots, inspection robots, unmanned delivery vehicles, and the like. A key precondition for autonomous navigation is that the robot knows its real-time, accurate position and attitude information, i.e., its pose. Traditional mobile navigation robots based on the Global Positioning System (GPS) are accurate and convenient to use, but cannot work in GPS-denied scenes, and GPS positioning is insensitive to orientation; these factors limit their application in certain scenarios. On the other hand, schemes based on visual or laser simultaneous localization and mapping can realize both robot localization and scene map construction, but localization and mapping must be performed anew in every navigation run, which wastes computational resources and imposes higher hardware costs. Robot localization methods based on scan matching usually require the initial pose of the robot to be specified manually before continuous scan matching yields subsequent poses; specifying the initial pose places technical demands on the robot's user and thus raises the threshold for the use and popularization of fully automatic robots. In addition, although localization methods based on artificial tags can localize the robot quickly, they require a large number of tags, incur high labor costs, and may require the tag positions to be readjusted later, making them inconvenient in application and maintenance.
To address these problems, an embodiment of the present application provides a fast and accurate global positioning method that uses the lidar and vision sensors carried by the robot in scenes containing only a small number of dedicated tags. The method applies a multi-layer map and executes a coarse-to-fine matching strategy to obtain real-time, accurate positioning information for the robot.
The method can realize global autonomous localization of the robot without any prior pose information: using only a small number of dedicated tags and the robot's own sensing capability, it provides real-time, accurate pose information in large-scale scenes and has the ability to recover its position (relocalization). The method can effectively improve the positioning accuracy and efficiency of autonomous navigation robots and is of practical significance for their application and popularization in various scenarios.
The embodiment of the application provides a pose estimation method which can be executed by a processor of a robot. The robot may be a device with autonomous movement capabilities, sensor integration, and data processing capabilities. These devices are capable of performing complex tasks including, but not limited to, navigation, path planning, object recognition and manipulation, environmental awareness, and the like.
In some embodiments, the robot may be a device that not only has basic data processing capabilities, but also integrates sensors (e.g., cameras, lidar, ultrasonic sensors, gyroscopes, accelerometers, etc.), actuators (e.g., motors, wheels, robotic arms, etc.), and algorithms and software for controlling and optimizing these components. By way of example, the robot may be a service robot (e.g., a home sweeping robot, a restaurant meal delivery robot, a hospital care robot, etc.), an industrial robot (a robot that performs tasks of machining, assembly, handling, etc. on a production line with high repeatability and strict accuracy requirements), an exploration robot (for exploration and investigation tasks in extreme environments or hazardous areas), a mobile robot (an automated guided vehicle, a drone, etc.).
Fig. 1 is a schematic implementation flow diagram of a pose estimation method according to an embodiment of the present application, as shown in fig. 1, the method includes steps S101 to S104 as follows:
step S101, acquiring a multi-layer map of an environment where a robot is located; the multi-layer map includes a grid map and a label map aligned with a coordinate system.
Here, a multi-layer map is a map representation that integrates various types of information to provide a comprehensive view of the environment in which the robot is located. In some embodiments, the multi-layer map includes multiple layers, each representing a different aspect of the environment, and the layers are aligned at the coordinate-system level so that they can reference one another and be used in combination.
In an embodiment of the present application, the multi-layer map includes a grid map and a tag map. A grid map is a map representation that divides the environment into many small cells (grids); each grid represents a small area of the environment and is assigned a value indicating whether the area is occupied by an obstacle. Illustratively, a grid may have three states: unknown, occupied, and free.
In an embodiment of the present application, the tag map stores the pose of each tag in the environment and the identification information of each tag. A tag may be, for example, specific text, an image, a two-dimensional code, or any other visual marker that the robot can recognize and use for localization or navigation; correspondingly, the identification information of a tag may be semantic information or ID information.
In some embodiments, the pose to which the tag corresponds includes a coordinate parameter of the tag relative to the world coordinate system and an orientation angle parameter. It can be understood that the tag in the embodiment of the present application is a visual tag disposed in an environment, the coordinate parameter represents a position of the visual tag in the current environment, and the orientation angle parameter represents an orientation of the visual tag in the current environment.
It will be appreciated that the grid map and the tag map in the multi-layer map are aligned in a common coordinate system. That is, the different layers (e.g., the grid map and the tag map) are expressed in the same coordinate system (i.e., the world coordinate system in subsequent embodiments), so that they can be superimposed and matched with each other, thereby providing more complete environmental information.
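As an illustration, the two aligned layers might be sketched as follows (a minimal Python sketch; the `MultiLayerMap` and `Tag` names, the dictionary-based grid, and the state values are hypothetical choices, not from the application):

```python
import math
from dataclasses import dataclass, field

# Cell states for the grid-map layer.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

@dataclass
class Tag:
    tag_id: str    # identification info (e.g. decoded two-dimensional code)
    x: float       # tag position in the world frame
    y: float
    theta: float   # tag orientation angle in the world frame

@dataclass
class MultiLayerMap:
    resolution: float                         # metres per grid cell
    grid: dict = field(default_factory=dict)  # (ix, iy) -> cell state
    tags: dict = field(default_factory=dict)  # tag_id -> Tag

    def world_to_cell(self, x, y):
        # Both layers share the world frame, so one conversion serves both.
        return (int(math.floor(x / self.resolution)),
                int(math.floor(y / self.resolution)))

    def cell_state(self, x, y):
        return self.grid.get(self.world_to_cell(x, y), UNKNOWN)

m = MultiLayerMap(resolution=0.05)
m.grid[m.world_to_cell(1.0, 2.0)] = OCCUPIED         # obstacle layer
m.tags["dock"] = Tag("dock", 1.0, 2.0, math.pi / 2)  # tag layer, same frame
```

Because both layers use the world frame, a tag pose looked up in `tags` and an obstacle cell looked up in `grid` can be superimposed directly, which is what the alignment requirement buys.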
Step S102, when a tag in the environment is detected, determining a predicted pose based on the tag map, and constructing a candidate pose set from the predicted pose.
In an embodiment of the present application, the pose information and corresponding identification information of all tags are stored in the tag map; when the robot is in the environment, it collects environmental data through its sensors to determine whether any tag from the tag map is present in the environment.
In some embodiments, the robot first receives raw data from its sensors, which may be image data captured by a camera, and then performs data preprocessing steps on the captured image data, such as image filtering, enhancement, and binarization, so that features helpful for identifying the tag (e.g., text, images, or two-dimensional codes) can be extracted more easily. The extracted features are matched against the tags stored in the tag map to obtain the pose information corresponding to the tag. Here, the pose information of a tag refers to its coordinates and orientation in the world coordinate system, which determine the tag's exact position and direction in the global environment.
In some embodiments, the robot first observes the tag in the environment through its sensors and obtains the pose of the tag in the robot coordinate system (a robot-centric coordinate system used to describe the relative position and orientation between the robot and its surroundings). Meanwhile, the robot obtains the pose of the tag in the world coordinate system by matching against the tag map. The predicted pose of the robot in the world coordinate system can then be estimated from the difference between these two poses.
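The "difference between the two poses" is standard rigid-transform algebra: if T_world_tag comes from the tag map and T_robot_tag from the robot's observation, then T_world_robot = T_world_tag · (T_robot_tag)⁻¹. A minimal 2-D (SE(2)) sketch of this computation, with hypothetical function names:

```python
import math

def se2_inv(p):
    # Inverse of a 2-D rigid transform p = (x, y, theta).
    x, y, th = p
    c, s = math.cos(th), math.sin(th)
    return (-c * x - s * y, s * x - c * y, -th)

def se2_mul(a, b):
    # Compose two 2-D rigid transforms: apply b, then a.
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def predict_robot_pose(tag_in_world, tag_in_robot):
    # T_world_robot = T_world_tag * inverse(T_robot_tag)
    return se2_mul(tag_in_world, se2_inv(tag_in_robot))

# Tag mapped at (2, 3) facing theta = 0; the robot sees it 1 m straight
# ahead, so the robot must be at (1, 3) facing theta = 0.
pose = predict_robot_pose((2.0, 3.0, 0.0), (1.0, 0.0, 0.0))
```

The application does not prescribe this exact parameterization; in 3-D the same relation holds with homogeneous matrices or quaternions.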
It can be understood that the pose of the tag under the robot coordinate system is obtained by analyzing the acquired image including the tag by the robot through an image processing and recognition algorithm. In consideration of the fact that accuracy of visual algorithm recognition is greatly affected by environmental factors, after the predicted pose is obtained, a pose set is built based on the predicted pose, so that the most accurate pose can be found in the pose set later.
In some embodiments, a series of possible poses may be generated to construct a set of poses, taking into account the range of motion of the robot, based on the predicted poses. For example, this may be achieved by uniformly distributing a plurality of points within a certain radius around the predicted pose.
Illustratively, suppose the predicted pose of the robot is (x1, y1, θ1), where (x1, y1) are the coordinates and θ1 is the orientation. A series of points is generated around (x1, y1); assume 5 additional points are generated (6 points in total), with coordinates (x2, y2), (x3, y3), …, (x6, y6). Different orientations, say θ1, θ2, …, θ6, are then assigned to each coordinate point. In this way, a pose set containing 36 poses is obtained. In some embodiments, the generated pose set may also be screened against the robot's range of motion to remove poses that do not conform to it.
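The sampling above might be sketched as follows (a hypothetical parameterization, one ring of points plus orientation offsets, chosen so that 6 coordinate points × 6 orientations gives 36 candidates as in the example; the application does not prescribe this exact scheme):

```python
import math

def build_candidate_poses(pred, n_points=5, radius=0.3, n_theta=6, dtheta=0.1):
    """Construct a candidate pose set around a predicted pose (x, y, theta):
    the predicted point plus n_points on a circle of `radius`, each paired
    with n_theta orientation offsets around the predicted heading."""
    x0, y0, th0 = pred
    points = [(x0, y0)]
    for i in range(n_points):
        a = 2.0 * math.pi * i / n_points
        points.append((x0 + radius * math.cos(a), y0 + radius * math.sin(a)))
    return [(x, y, th0 + (j - n_theta // 2) * dtheta)
            for (x, y) in points for j in range(n_theta)]

# 6 coordinate points x 6 orientations = 36 candidate poses.
poses = build_candidate_poses((1.0, 2.0, 0.0))
```

Screening against the robot's range of motion would then simply filter this list before scoring.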
Step S103, determining the accuracy of each pose in the candidate pose set based on the grid map.
In some embodiments, the set of candidate poses includes a plurality of poses, i.e., including the predicted pose and a plurality of other poses located near the predicted pose. It is understood that the plurality of candidate poses are poses that the robot may be in during the performance of the task.
In the embodiment of the application, the accuracy represents the matching degree of each pose in the pose set and the actual environment, namely the reliability of the pose and the degree of approaching to the actual position.
In some embodiments, the robot scans the current environment with its sensors to obtain information about obstacles, the ground, and so on. For each pose in the pose set, the distribution of the environmental data under that pose is simulated, i.e., the environmental data collected by the sensors is transformed into the world coordinate system according to that pose. The simulated distribution under each pose is then compared with the grid map to check whether it is consistent with the state of each grid in the grid map, and an accuracy score is assigned to each pose. The higher the score, the better the pose matches the actual environment and the more reliable it is.
For example, assume the robot is in a warehouse environment and obstacle data of the current environment is acquired by lidar. After the robot generates a series of poses (the predicted pose and other nearby poses), for each pose: the robot transforms the acquired obstacle data according to that pose, simulating the environmental data distribution under the pose; it compares the simulated distribution with a pre-constructed grid map to check whether the positions and sizes of obstacles are consistent with the information in the map; and, according to the comparison result, it assigns an accuracy score to the pose. If the simulated environmental data is fully consistent with the obstacle locations in the grid map, the accuracy score of the pose will be high; if there is a large deviation, the score will be low. In this way, the robot can evaluate the accuracy of each pose by comparing the simulated environment data with the grid map.
Step S104, based on each pose in the candidate pose set and the corresponding accuracy, generating an estimated pose of the robot at the current moment.
In some embodiments, from the set of candidate poses, the pose with the highest accuracy may be selected as the estimated pose. In other embodiments, at least two poses with the highest accuracy can be selected, and an average value of the at least two poses with the highest accuracy is calculated as an estimated pose, so that errors of a single pose are reduced, and stability of the pose is improved.
Illustratively, assume that in the candidate pose set, pose A has an accuracy of 95%, pose B 85%, pose C 90%, and pose D 92%. In some embodiments, pose A may be selected directly as the estimated pose; in other embodiments, poses A, C, and D are selected and their average is taken as the estimated pose: estimated pose = (pose A + pose C + pose D) / 3.
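One caveat when averaging poses: orientations are angles, so a naive arithmetic mean can fail near the ±180° wrap-around. A hedged sketch of the top-k averaging with a circular mean for the heading (function names are hypothetical; the circular mean is a standard technique, not one the application specifies):

```python
import math

def estimate_pose(scored, k=3):
    """scored: list of ((x, y, theta), accuracy). Average the k highest-
    accuracy poses; the heading uses a circular mean so that headings near
    the +/- pi wrap-around average correctly."""
    top = sorted(scored, key=lambda item: item[1], reverse=True)[:k]
    n = len(top)
    x = sum(p[0] for p, _ in top) / n
    y = sum(p[1] for p, _ in top) / n
    theta = math.atan2(sum(math.sin(p[2]) for p, _ in top),
                       sum(math.cos(p[2]) for p, _ in top))
    return (x, y, theta)

# Poses A..D with the accuracies from the example above.
scored = [((1.0, 1.0, 0.10), 0.95), ((1.2, 1.0, 0.00), 0.85),
          ((1.1, 1.1, 0.05), 0.90), ((1.0, 1.2, 0.15), 0.92)]
best = estimate_pose(scored, k=3)   # averages A, C and D
```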
In the embodiments of the present application, by acquiring a grid map and a tag map aligned in a common coordinate system, the robot can obtain accurate position information of environmental obstacles and determine the poses of known tags in the environment. The known tag poses in the tag map are used to initialize the predicted pose of the robot and to construct a candidate pose set, which provides an initial localization for the robot; the candidate pose set accounts for the diversity and uncertainty of the pose and supplies rich data for the subsequent accuracy evaluation and pose estimation. In addition, evaluating the accuracy of each candidate pose against the grid map screens out more reliable and reasonable poses, improving the accuracy and rationality of the estimated pose. Meanwhile, the estimated pose generated by the embodiments of the present application provides clear guidance for the robot's subsequent movement and decision-making, enabling efficient and stable autonomous navigation in complex environments.
Fig. 2 is a schematic diagram of a second implementation flow of a pose estimation method according to an embodiment of the present application, where the method may be executed by a processor of a robot. Based on fig. 1, S103 in fig. 1 may be updated to S201 to S202, and the description will be made in connection with the steps shown in fig. 2.
Step S201, acquiring the scan data of the environment collected by the robot at the current moment.
In the embodiment of the application, the robot is provided with a sensor, such as a laser radar, a camera and the like, and the sensor can be used for collecting the scanning data of the environment in real time.
For example, in the case where the sensor is a lidar, the lidar may emit a laser beam at a preset frequency, and the laser beam is reflected back after encountering an obstacle; the laser radar converts the collected reflected signals into distance information and generates a point cloud data set (i.e., scan data) that includes information on the location and shape of all obstacles in the current field of view of the robot.
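As an illustration, converting such beam-wise range measurements into 2-D Cartesian points in the sensor frame might look like this (a minimal sketch; the parameter names are hypothetical, loosely following common lidar-driver conventions):

```python
import math

def scan_to_points(ranges, angle_min, angle_step, range_max=10.0):
    """Convert lidar beams (range plus bearing in the sensor frame) into
    2-D Cartesian points, discarding invalid or out-of-range returns."""
    points = []
    for i, r in enumerate(ranges):
        if 0.0 < r < range_max:
            a = angle_min + i * angle_step
            points.append((r * math.cos(a), r * math.sin(a)))
    return points

# Three beams a quarter-turn apart; the zero return (no echo) is dropped.
pts = scan_to_points([1.0, 2.0, 0.0], angle_min=0.0, angle_step=math.pi / 2)
```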
Step S202, determining the accuracy of each pose based on the grid map and the scanning data under each pose.
The accuracy characterizes the matching degree of the scanning data under the pose and the grid map.
In some embodiments, the grid map has been constructed, that is, the status (occupied, free, unknown) of the grid has been marked explicitly.
In some embodiments, in determining the accuracy of each pose, one pose is taken from the candidate pose set in turn, and the acquired scan data is transformed into the coordinate system of the grid map using the current pose as the reference. Each grid in the grid map is then traversed: for grids in the occupied state, it is checked whether a corresponding obstacle matches; for grids in the free state, it is checked that no obstacle collides with them. From the matching and collision results, the matching degree between the scan data under the current pose and the grid map is computed, and the accuracy of the current pose is evaluated accordingly: the higher the matching degree, the better the consistency between the scan data and the grid map, and the higher the accuracy of the current pose.
Taking point cloud data as an example: the robot acquires lidar scan data under the current pose to form a point cloud, and loads the pre-built grid map, in which each grid is marked as "occupied" (contains an obstacle), "free" (no obstacle), or "unknown" (undetected area). To compute the accuracy of a pose, the point cloud is projected onto the grid map according to that pose, i.e., each point is assigned to its nearest grid. The number of first points falling within grids marked "occupied" is then counted; these points should correspond to obstacles in the grid map, so a high match means that many points coincide with known obstacle locations. Meanwhile, the number of second points falling within grids marked "free" is counted; the larger this number, the less accurate the pose. An accuracy can then be generated for the current pose based on the first and second point counts.
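The counting scheme described above might be sketched as follows (a simplified illustration in which occupied-cell hits raise the score and free-cell hits lower it; the application does not fix this exact scoring formula):

```python
import math

OCCUPIED, FREE, UNKNOWN = 1, 0, -1

def score_pose(pose, points, grid, resolution):
    """Transform sensor-frame points by the candidate pose, bin them into
    grid cells, and score the pose: points landing in occupied cells raise
    the score, points landing in free cells lower it, unknown is neutral."""
    x0, y0, th = pose
    c, s = math.cos(th), math.sin(th)
    hits = misses = 0
    for px, py in points:
        wx = x0 + c * px - s * py        # point in the world frame
        wy = y0 + s * px + c * py
        cell = (int(math.floor(wx / resolution)),
                int(math.floor(wy / resolution)))
        state = grid.get(cell, UNKNOWN)
        if state == OCCUPIED:
            hits += 1
        elif state == FREE:
            misses += 1
    return (hits - misses) / max(len(points), 1)

grid = {(20, 0): OCCUPIED, (0, 20): FREE}
points = [(1.0, 0.0), (0.0, 1.0)]
# Under the identity pose, one point hits an occupied cell, one a free cell.
score = score_pose((0.0, 0.0, 0.0), points, grid, resolution=0.05)
```

A production variant would typically use a likelihood field or distance transform rather than this hard hit/miss count, but the structure of the evaluation is the same.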
In the embodiment of the application, the accuracy assessment of the robot pose is further refined and improved by introducing the matching process of the real-time scanning data and the grid map; furthermore, the perception capability of the robot in a complex environment is enhanced, and more reliable and accurate basis is provided for autonomous navigation and decision making thereof.
Fig. 3 is a schematic diagram of a third implementation flow of a pose estimation method according to an embodiment of the present application, where the method may be executed by a processor of a robot. Based on fig. 1, S104 in fig. 1 may be updated to S301, and the description will be made in connection with the steps shown in fig. 3.
Step S301, performing a pose search process of at least one depth on each pose in the candidate pose set until a termination condition is met, and determining the estimated pose based on the pose with the highest accuracy.
In the embodiment of the application, the pose searching process with at least one depth is mainly used for finding the best estimated pose by gradually refining the searching range and improving the searching precision. The process is implemented through pose search iteration of multiple depths, each depth is based on the result of the previous depth, and the precision and accuracy of the pose are gradually improved.
In order to facilitate understanding of the embodiments of the present application, an exemplary description will be given below in terms of a one-depth pose search process. It is understood that the current depth in the following embodiments may be the first depth of the at least one depth, or any other depth.
The pose searching process corresponding to the current depth includes step S3011.
Step S3011, for each input search pose set corresponding to the current depth, determining the screened poses based on an accuracy threshold and the accuracy of each pose in the search pose set; constructing the intermediate search pose set corresponding to each screened pose based on the pose precision corresponding to the current depth; and screening the poses in the intermediate search pose set based on a pruning strategy to obtain the search pose set output at the current depth.
In the embodiment of the application, the search pose set output by the pose search process of the first depth is the search pose set input to the pose search process of the second depth, and the search pose set input to the pose search process of the first depth is the candidate pose set. That is, the input search pose set corresponding to the current depth is derived from the output of the previous depth or, for the first depth, from the initial candidate pose set.
In the embodiment of the application, for pose precision of the same type, the pose precision corresponding to the first depth is lower than the pose precision corresponding to the second depth, where the pose search process of the first depth is the immediately preceding pose search process of the second depth.
In some embodiments, the pose comprises pose parameters of at least one of: a position parameter in a first direction, a position parameter in a second direction, and an orientation angle parameter; the pose accuracy includes at least one of: a pose search step length and a pose search range, wherein the pose search step length is used for determining the difference of pose parameters between adjacent poses in a constructed pose set; the pose search range is used for determining a parameter range of the pose parameters in the constructed pose set.
If the pose search step lengths are the same, the larger the pose search range, the lower the pose precision; if the pose search ranges are the same, the smaller the pose search step length, the higher the pose precision.
The first direction and the second direction are two directions perpendicular to each other, and in some embodiments, the first direction and the second direction are two directions of a world coordinate system corresponding to the multi-layer map. Illustratively, in the two-dimensional space, the first direction may be an X-axis direction and the second direction may be a Y-axis direction. Correspondingly, the position parameter in the first direction is the coordinate (such as the X-axis coordinate) of the robot in the first direction, and the position parameter in the second direction is the coordinate (such as the Y-axis coordinate) of the robot in the second direction; the orientation angle parameter is used to indicate the orientation of the robot at its location. Illustratively, the heading angle parameter may represent an amount of rotation of the robot relative to some fixed direction (e.g., the positive X-axis direction).
The pose searching step length is the difference of pose parameters of adjacent poses when the pose set is constructed. Illustratively, if the pose search step in the first direction (X-axis) is set to 0.1 meters, then the difference in position of adjacent poses in the X-axis direction in the constructed pose set will be 0.1 meters. The smaller the step size, the higher the search fineness, but the larger the calculation amount. The pose search range is a parameter range of pose parameters in the constructed pose set. Illustratively, in the first direction (X-axis), the search range may be-1 meter to +1 meter, which means that all possible positions between-1 meter and +1 meter (discretized according to step size) will be considered when constructing the pose set. The larger the search range, the larger the search space, but the same increases the calculation amount.
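The relationship between step length, range and the size of the constructed pose set can be sketched as follows; enumerating a full X-Y-θ window is an illustrative assumption, and the function name and signature are not from the original:

```python
import numpy as np

def build_pose_set(center, xy_range, xy_step, ang_range, ang_step):
    """Enumerate candidate poses around `center` = (x, y, theta).

    xy_range / ang_range define the search window on each side of the
    centre; xy_step / ang_step are the pose-search step lengths.  A
    smaller step gives a finer search; a larger range gives a larger
    search space -- both increase the number of poses to evaluate.
    """
    x0, y0, th0 = center
    eps = 1e-9  # make the upper bound inclusive despite float rounding
    xs = np.arange(x0 - xy_range, x0 + xy_range + eps, xy_step)
    ys = np.arange(y0 - xy_range, y0 + xy_range + eps, xy_step)
    ths = np.arange(th0 - ang_range, th0 + ang_range + eps, ang_step)
    return [(x, y, th) for x in xs for y in ys for th in ths]
```

For example, with a ±1 m range and a 0.1 m step the X axis alone contributes 21 sample positions, which illustrates why halving the step size roughly quadruples the size of a planar position grid.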
In the embodiment of the application, the pose with high quality is screened through the accuracy threshold value so as to reduce unnecessary calculation; meanwhile, a new search pose set is constructed based on pose precision of the current depth, and the number of poses in the set is further reduced through pruning strategies, so that search efficiency is improved.
In step S3011, each pose in the input search pose set corresponding to the current depth is traversed; its accuracy is calculated and compared with a preset accuracy threshold. If the accuracy of the pose is greater than or equal to the threshold, the pose is retained as a screened pose; if it is less than the threshold, the pose is removed from the set. For each screened pose, a surrounding pose set is constructed according to the pose precision (step length and range) of the current depth, yielding the intermediate search pose set corresponding to that screened pose. The poses in the intermediate search pose set are then screened with the pruning strategy, and the remaining poses form the search pose set output at the current depth. This set serves as the input to the pose search process of the next depth or, if the current depth is the last one, is used for determining the final estimated pose.
In some embodiments, the above process of constructing the set of intermediate search poses includes: and for each screened pose, taking the pose as a reference point, and constructing a surrounding pose set according to the pose precision (step length and search range) of the current depth. For example, for poses (including position coordinates and orientation angles) in a two-dimensional space, multiple new poses may be generated to both sides within the search range at intervals of step sizes along the X-axis and the Y-axis (or any two orthogonal directions), respectively. These newly generated poses together with the reference poses constitute the set of intermediate search poses corresponding to the screened poses.
In some embodiments, for each side (e.g., the positive X-axis direction, the negative X-axis direction, the positive Y-axis direction, the negative Y-axis direction, etc.), the pruning strategy may decide whether to preserve all the poses on that side based on the evaluation of a representative pose of that side (e.g., the most distal pose, or a pose at a specific location).
For example, suppose the pose precision corresponding to the current depth is a step length of 0.1 m and a search range of ±0.5 m in each direction. For one pose obtained after screening, that pose is used as the reference point, and new poses are generated along the X-axis and the Y-axis with a step of 0.1 m within the ±0.5 m range (the orientation component of the pose is ignored in this example for ease of understanding). In this way, 5 new poses (including the boundary point) are generated in each of the positive X, negative X, positive Y and negative Y directions, which together with the reference pose form an intermediate search pose set of 21 poses. If the pruning strategy judges by the most distal pose in each direction, the poses 0.5 m away from the reference pose in the 4 directions are evaluated as representative poses; if the estimated accuracy of the representative pose in a certain direction is lower than a certain threshold, or lower than that of the representative pose in the opposite direction, all the poses in that direction are deleted.
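A hedged sketch of the axis-wise intermediate-set construction and the direction-wise pruning described above; the `score` callback and all names are assumed, and the pruning rule (drop a direction whose most-distal representative falls below the threshold or below the opposite direction's representative) follows the worked example:

```python
def expand_with_pruning(ref, step, rng, score, threshold):
    """Build the axis-aligned intermediate set around `ref` = (x, y, theta)
    and prune whole directions based on their most-distal representative.

    score(pose) -> accuracy estimate; `threshold` is the pruning cutoff.
    """
    x0, y0, th0 = ref
    n = int(round(rng / step))  # poses per direction, boundary included
    dirs = {
        "+x": [(x0 + i * step, y0, th0) for i in range(1, n + 1)],
        "-x": [(x0 - i * step, y0, th0) for i in range(1, n + 1)],
        "+y": [(x0, y0 + i * step, th0) for i in range(1, n + 1)],
        "-y": [(x0, y0 - i * step, th0) for i in range(1, n + 1)],
    }
    opposite = {"+x": "-x", "-x": "+x", "+y": "-y", "-y": "+y"}
    rep = {d: score(poses[-1]) for d, poses in dirs.items()}  # most-distal pose
    kept = [ref]
    for d, poses in dirs.items():
        if rep[d] < threshold or rep[d] < rep[opposite[d]]:
            continue  # delete every pose in this direction
        kept.extend(poses)
    return kept
```

With a step of 0.1 and a range of 0.5, the unpruned set would hold 4 × 5 + 1 = 21 poses; each pruned direction removes 5 of them before any accuracy is computed, which is where the search-efficiency gain comes from.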
In some embodiments, the termination condition includes at least one of: reaching a preset maximum depth; the highest accuracy in the search pose set meets the requirements.
In some embodiments, the method further comprises: constructing a multi-resolution map based on the grid map; the multi-resolution map is used for determining the accuracy of each pose in the search pose set in the pose search process of each depth.
Here, since the accuracy of each pose needs to be determined in the pose search process of each depth, in order to improve the efficiency and reliability of this evaluation, a multi-resolution map may be constructed in advance based on the grid map. One implementation is to generate a plurality of grid maps of different resolutions by downsampling (reducing resolution) or upsampling (increasing resolution) the original grid map, so that each resolution level holds environmental information at a different granularity, from a rough overall overview to detailed local features. The resolution of a grid map can also be intuitively understood as the size of the individual grids in the grid map.
In the embodiment of the application, after generating the grid maps with a plurality of different resolutions, the accuracy of each pose can be determined by selecting the grid map with the resolution corresponding to the current depth in the process of determining the accuracy of each pose at the current depth along with the increase of depth (hierarchy). It will be appreciated that the higher the depth, the higher the pose accuracy and, correspondingly, the higher the resolution of the grid map used.
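One possible way to build such a multi-resolution map by downsampling is sketched below; marking a coarse cell occupied whenever any of its 2×2 child cells is occupied is an assumption (it makes a coarse-level match an optimistic bound for the finer levels, which suits a coarse-to-fine search):

```python
import numpy as np

def build_pyramid(grid, levels):
    """Build a multi-resolution pyramid from a boolean occupancy grid.

    pyramid[0] is the finest (original) map; each subsequent level halves
    the resolution, a coarse cell being occupied if any child cell is.
    """
    pyramid = [grid]
    g = grid
    for _ in range(levels - 1):
        h, w = g.shape
        # Pad to even dimensions so 2x2 blocks tile the map exactly.
        g2 = np.zeros(((h + 1) // 2 * 2, (w + 1) // 2 * 2), dtype=bool)
        g2[:h, :w] = g
        # Block-wise max over each 2x2 tile.
        g = g2.reshape(g2.shape[0] // 2, 2, g2.shape[1] // 2, 2).max(axis=(1, 3))
        pyramid.append(g)
    return pyramid
```

During the hierarchical search, a low depth would then evaluate poses against a coarse level of the pyramid and a high depth against a fine one, matching the statement that higher depth uses a higher-resolution map.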
In some embodiments, after the above pose search process of at least one depth is performed, the pose with the highest accuracy can be determined in the search pose set output by the last depth; in order to further improve the accuracy of the estimated pose, the pose search process may be performed once more based on this pose with the highest accuracy. In practice, the above determination of the estimated pose based on the pose with the highest accuracy may be achieved through steps S3012 to S3014.
Step S3012, constructing a target search pose set corresponding to the pose with the highest accuracy based on the target pose precision, wherein, for pose precision of the same type, the target pose precision is higher than the pose precision corresponding to each depth.
In some embodiments, the target pose accuracy is an accuracy requirement that is higher than the pose accuracy for all depth (or hierarchy) correspondences. In general, target pose accuracy defines the degree of pose accuracy that is desired to be achieved during a search.
Here, the process of "constructing the target search pose set corresponding to the pose with the highest accuracy based on the target pose accuracy" in the above step S3012 corresponds to "constructing the intermediate search pose set corresponding to the screened pose based on the pose accuracy corresponding to the current depth" in the above step S3011, and may refer to the specific embodiment of the above step S3011 when implemented.
Step S3013, using the target search pose set as the input search pose set in the pose search process, to obtain a corresponding output search pose set.
Here, for the target search pose set, screening can be performed based on the accuracy of each pose in the target search pose set, for example, at least two screened poses with higher accuracy are selected; constructing an intermediate search pose set corresponding to the screened poses based on the final pose precision; and screening the poses in the middle search pose set based on pruning strategies to obtain a final output search pose set. Here, the final pose accuracy is higher than the target pose accuracy.
Step S3014, determining, as the estimated pose, the pose with the highest accuracy in the output search pose set corresponding to the target search pose set.
In the embodiment of the application, after the pose searching process of at least one depth is executed, the pose searching process is carried out again through the pose with the highest accuracy, and the pose with the highest accuracy is determined as the estimated pose in the finally output searching pose set, so that the accuracy of estimating the pose can be further improved.
Fig. 4 is a schematic diagram of a fourth implementation flow of a pose estimation method according to an embodiment of the present application, where the method may be executed by a processor of a robot. Based on fig. 1, "constructing a candidate pose set using the predicted poses" may include step S401, which will be described in connection with the steps shown in fig. 4.
Step S401, under the condition that labels in the environment are acquired, a candidate pose set corresponding to the predicted pose is constructed based on pose precision corresponding to the labeled state.
Wherein the candidate pose set comprises the predicted pose, and the pose range and/or the pose difference in the candidate pose set is related to the pose precision.
Here, the process of "constructing the candidate pose set corresponding to the predicted pose based on the pose accuracy corresponding to the tagged state" in the above step S401 corresponds to "constructing the intermediate search pose set corresponding to the screened pose based on the pose accuracy corresponding to the current depth" in the above step S3011, and may refer to the specific embodiment of the above step S3011 when implemented.
It should be noted that the pose precision corresponding to the tagged state is lower than the pose precision corresponding to each depth in the above embodiment.
In some embodiments, the method may further include step S402:
Step S402, under the condition that no historical pose of the robot exists and the tags in the environment cannot be acquired, constructing the candidate pose set based on the pose precision corresponding to the tag-free state.
For pose precision of the same type, the pose precision corresponding to the tag-free state is lower than the pose precision corresponding to the tagged state.
In the embodiment of the application, when the tags in the environment cannot be acquired, unlike the previous embodiment where tags are available, the predicted pose of the robot cannot be obtained in the current scene, so a corresponding candidate pose set needs to be constructed based on the whole map.
The historical pose refers to the pose of the robot at a historical moment; the robot can estimate the current pose based on the historical pose and the collected pose change between the historical moment and the current moment. In some implementation scenarios, the above step S402 may be applied to the starting point in time of the robot navigation process, when no historical pose of the robot exists yet. Step S402 may also be applied to a time node after the robot starts navigation, for example (but not limited to) after a network interruption, an interruption of the calculation process, or a loss of historical data; in these cases there is likewise no historical pose of the robot, and the predicted pose of the robot can only be estimated anew.
In some embodiments, the pose precision corresponding to the tag-free state is lower than the pose precision corresponding to the tagged state. The pose precision corresponding to the tag-free state likewise comprises a step length and a range: the range is the whole map, and the step length corresponding to the tag-free state may be larger than the step length corresponding to the tagged state. The candidate pose set may be constructed as follows: a plurality of position points are generated over the whole map according to the set step length, and these position points form the position part of the candidate pose set; then, at each generated position point, a plurality of orientation angles are generated according to the orientation precision of the robot (which is also a part of the pose precision). In general, the number of orientation angles may be adjusted as desired, but should be sufficient to cover all directions that the robot may face. Each position point is combined with all the generated orientation angles to form a plurality of candidate poses, and these candidate poses together form the candidate pose set.
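A minimal sketch of the tag-free candidate-set construction over the whole map, with the step length, map size and the number of orientation samples as assumed parameters:

```python
import numpy as np

def global_candidates(map_size, xy_step, n_angles):
    """Enumerate candidate poses over the whole map (tag-free state).

    map_size -- (width, height) of the map in metres
    xy_step  -- coarse position step used in the tag-free state
    n_angles -- number of orientation samples evenly covering 2*pi
    """
    w, h = map_size
    eps = 1e-9  # include the far map border despite float rounding
    xs = np.arange(0.0, w + eps, xy_step)
    ys = np.arange(0.0, h + eps, xy_step)
    ths = np.arange(n_angles) * (2 * np.pi / n_angles)
    # Every position point is combined with every orientation angle.
    return [(x, y, th) for x in xs for y in ys for th in ths]
```

Because the step is coarse and the range is the whole map, this set is large but tractable; the subsequent depth-wise search then refines only the promising candidates.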
According to the embodiment of the application, even without tag information, the robot can construct a set containing a plurality of candidate poses by utilizing the information of the whole map and the pose precision of the tag-free state. This set provides rich candidate solutions for subsequent pose search, path planning or positioning correction; meanwhile, because the candidate position points are generated with a larger step size and range, the computational cost and resource consumption remain relatively low.
Fig. 5 is a schematic diagram of a fifth implementation flow of a pose estimation method according to an embodiment of the present application, where the method may be executed by a processor of a robot. Based on fig. 1, the method further comprises step S501 and step S502, which will be described in connection with the steps shown in fig. 5.
Step S501, during the movement of the robot, in the case that the tags in the environment cannot be acquired, acquiring the pose increment of the robot from the historical moment to the current moment.
The above-mentioned pose increment refers to the amount of change in the position and direction of the robot between two different time points (the historical time and the current time) due to movement. Generally, the amount of translation along the X, Y axes and the amount of change in orientation angle may be included.
In some embodiments, the above pose increment may be obtained from pose sensors installed in the robot, such as wheel odometers (wheel speed meters) and gyroscopes. For example, the sensor data may be continuously collected during the movement of the robot, including but not limited to the rotation speed of the wheels, the steering angle and the acceleration, and the collected data may be processed to calculate the relative movement of the robot between two time points. Illustratively, the translation of the robot along the X and Y axes can be estimated from the wheel odometer data, and the change of the orientation angle can be estimated from the gyroscope data; this yields the pose increment of the robot from the historical moment to the current moment. Assuming that the pose of the robot at the historical time t0 is (x0, y0, θ0), the wheel odometer data show that the robot moves by Δx on the X axis and Δy on the Y axis during the period, and the gyroscope data show that the orientation angle changes by Δθ, then the pose increment of the robot at the current time t1 is (Δx, Δy, Δθ).
In another embodiment, the successive laser frames from the moment the tag was last seen to the current moment can also be aligned during the robot movement by point cloud registration methods such as Iterative Closest Point (ICP) or the Normal Distribution Transform (NDT). The ICP algorithm estimates the transformation (including rotation and translation) between two point clouds by iteratively finding corresponding points and minimizing the distance between them, thereby yielding the pose increment. The NDT algorithm divides the target point cloud into a plurality of small grid cells, computes the probability distribution (typically a normal distribution) of the points within each cell, and then estimates the transformation (including rotation and translation) by maximizing the probability of the source-point-cloud points under these cell distributions.
Step S502, generating the pose corresponding to the current moment based on the pose corresponding to the historical moment and the pose increment.
Wherein the pose increment of the robot from the historical moment to the current moment can be expressed as ΔP_bt; the final estimated robot pose P_b then satisfies the following formula (1):

P_b = P_bc + ΔP_bt    formula (1);

Wherein P_bc is the pose of the robot at the moment the tag was last seen.
In the embodiment of the application, the current pose is generated based on the historical pose and the pose increment, so that the robot can continuously track and update the position and direction information of the robot without the assistance of an environment tag.
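The dead-reckoning update of formula (1) can be sketched as follows; the angle wrapping is an added assumption (the formula itself is plain component-wise addition), as is the assumption that the increment is already expressed in the map frame:

```python
import math

def update_pose(last_tag_pose, delta):
    """Dead-reckoning update P_b = P_bc + dP_bt (formula (1)).

    last_tag_pose -- (x, y, theta): pose when the tag was last seen
    delta         -- (dx, dy, dtheta): accumulated increment since then,
                     assumed to be expressed in the map frame already
    """
    x, y, th = last_tag_pose
    dx, dy, dth = delta
    # Wrap the orientation back into (-pi, pi] after adding the increment.
    th_new = math.atan2(math.sin(th + dth), math.cos(th + dth))
    return (x + dx, y + dy, th_new)
```

In practice the increment would be the integrated odometry or the composed ICP/NDT transforms since the last tag observation.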
Fig. 6 is a schematic diagram of a sixth implementation flow of a pose estimation method according to an embodiment of the present application, where the method may be executed by a processor of a robot. Based on fig. 1, the method for constructing a multi-layer map includes steps S601 to S604, which will be described in connection with the steps shown in fig. 6.
Step S601, acquiring a robot pose of the robot under a map coordinate system and a tag pose of the tag under the robot coordinate system in the process of constructing the grid map by the robot.
In some embodiments, the robot may construct the grid map described above through a simultaneous vision and laser localization and mapping scheme. Of course, the application can also adopt other grid map construction schemes.
In the process of constructing the grid map, a small number of labels can be preset in the current environment, and then the pose of the robot and the pose of the labels at any moment can be acquired simultaneously in the process of collecting environmental data through the sensor by the robot.
The map coordinate system is a coordinate system for describing positions of points on a map, and the robot coordinate system is a coordinate system centered on a robot and used for describing relative positions of the robot and objects detected by sensors of the robot.
In some embodiments, the robot pose of the robot in the map coordinate system is the position and orientation of the robot in the map coordinate system; illustratively, it may include the X and Y coordinates of the robot (representing its horizontal and vertical positions on the map) and an orientation angle (representing its heading). In the embodiment of the present application, the orientation angle can also be understood as the orientation angle of the camera, i.e., the camera faces the same direction as the advancing direction of the robot. For example, assuming that the robot operates in the map coordinate system of an application scenario such as a mall, its pose may be represented as P_b(x1, y1, θ1), where (x1, y1) are the coordinates of the robot relative to a reference position such as the mall entrance (or another designated position such as the lower-left corner of the mall), and θ1 is the orientation angle of the robot. The reference position may be set at an arbitrary position in the two-dimensional plane of the current scene (such as a mall). In some embodiments, to facilitate subsequent pose calculation, the reference position may be set at one of the four corners of the two-dimensional plane of the current scene, so that all coordinates of the robot in the scene lie in the same quadrant of the coordinate system. In other embodiments, to improve the readability of the data, the reference position may be set at a functional-area demarcation point of the current scene based on its functional-area division, e.g., at the mall entrance, the mall exit, or a functional partition line of the mall, so that a developer can intuitively infer the approximate position of the robot from the coordinate values.
In some embodiments, the tag pose in the robot coordinate system is the position and orientation of the tag relative to the current position and orientation of the robot. This typically includes the X and Y coordinates of the tag (representing its fore-aft and side-to-side position relative to the robot) and an orientation angle (representing its orientation relative to the robot). Illustratively, assuming that the robot recognizes one tag through the camera, the pose of the tag in the robot coordinate system may be denoted as P_c(x2, y2, θ2), where (x2, y2) are the coordinates of the tag relative to the current position of the robot, and θ2 is the orientation angle of the tag relative to the advancing direction of the robot.
In some embodiments, the robot first needs to capture an image of its field of view through the camera and then identify the tag in the image using image processing and identification algorithms. Once the tag is identified, the robot extracts the pixel coordinates of the tag in the image and converts them to coordinates in the robot coordinate system by the camera's internal and external parameters. In addition, the robot may determine an orientation angle of the tag by analyzing a geometric feature of the tag or using depth information. Thus, the tag pose under the robot coordinate system can be obtained.
Step S602, determining the tag pose of the tag under the map coordinate system based on the robot pose of the robot under the map coordinate system and the tag pose of the tag under the robot coordinate system.
In the embodiment of the present application, the step S602 is aimed at converting the tag from the pose in the robot coordinate system to the pose in the map coordinate system. Illustratively, it may be achieved by formulas (2) to (4):
x3 = x1 + x2*cos(θ1) - y2*sin(θ1)    formula (2);

y3 = y1 + x2*sin(θ1) + y2*cos(θ1)    formula (3);

θ3 = θ1 + θ2    formula (4);

Here, the pose of the tag with respect to the map coordinate system may be denoted as P_t(x3, y3, θ3).
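A sketch of the transform of formulas (2) to (4), assuming standard 2-D rigid-body composition (note that the y-component uses x2·sin(θ1); the printed formula (3) appears to garble this term as y2·sin(θ1)):

```python
import math

def tag_to_map(robot_pose, tag_in_robot):
    """Transform a tag pose from the robot frame into the map frame:
    a 2-D rotation by theta1 followed by the robot's translation."""
    x1, y1, th1 = robot_pose      # P_b: robot pose in the map frame
    x2, y2, th2 = tag_in_robot    # P_c: tag pose in the robot frame
    x3 = x1 + x2 * math.cos(th1) - y2 * math.sin(th1)   # formula (2)
    y3 = y1 + x2 * math.sin(th1) + y2 * math.cos(th1)   # formula (3)
    th3 = th1 + th2                                     # formula (4)
    return (x3, y3, th3)  # P_t: tag pose in the map frame
```

For instance, a robot at (1, 1) facing the positive Y axis (θ1 = π/2) that sees a tag 1 m straight ahead places that tag at (1, 2) in the map frame.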
Step S603, constructing the tag map using the tag pose, in the map coordinate system, of each tag in the environment where the robot is located.
In some embodiments, the tag map stores the pose of each tag in the environment as well as the identification information of each tag. Storing the detected tag pose of each tag in the map coordinate system together with the identification information corresponding to each tag yields the tag map.
Step S604, aligning the tag map and the grid map to obtain the multi-layer map.
In the embodiment of the application, the label map and the grid map are aligned, which means that the coordinate systems of the two maps are unified, and the coordinate systems of the two maps are ensured to be under the same reference frame.
In some embodiments, if the coordinate systems of the two maps are not uniform, it is necessary to perform a coordinate transformation operation on the coordinate system of one of the maps, where the coordinate transformation operation may include at least one of the following: translation operation, rotation operation.
In some embodiments, after unifying the coordinate systems of the tag map and the grid map, the pose of each tag stored in the tag map, the identification information of each tag, and the grid state of each grid in the grid map need to be stored in the unified coordinate system, so as to form the multi-layer map. Therefore, the obtained multi-layer map not only contains the position features of the barriers in the grid map, but also marks the pose and the identification information of the labels in the environment, thereby providing a comprehensive environment view for the robot and being beneficial to more accurate navigation and task planning.
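A minimal sketch of a data structure holding both layers in the unified coordinate system; the class and field names are assumptions for illustration:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MultiLayerMap:
    """Grid layer plus tag layer sharing one map coordinate system."""
    grid: np.ndarray       # cell states: occupied / free / unknown
    resolution: float      # metres per grid cell
    tags: dict = field(default_factory=dict)  # tag ID -> (x, y, theta) in map frame

    def add_tag(self, tag_id, pose):
        """Register a tag's identification and its pose in the map frame."""
        self.tags[tag_id] = pose
```

Navigation code can then query obstacles from `grid` and positioning references from `tags` against one shared frame, which is the point of the alignment step.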
In the embodiment of the application, the label map is aligned and fused with the grid map constructed based on the vision and the laser sensor to form the multi-layer map containing rich environment information. The multi-layer map not only provides the physical structure (such as walls, barriers and the like) of the environment, but also comprises important positioning reference points (such as labels), and provides the robot with omnibearing environment sensing capability. The multi-layer map is constructed, so that the robot can fully utilize various information sources in the navigation process, the accuracy and the efficiency of path planning are improved, and meanwhile, the robot can react and adjust more quickly when facing complex or dynamically-changing environments, and the smooth completion of navigation tasks is ensured.
The application of the pose estimation method provided by the embodiment of the application in an actual scene is mainly related to a global positioning scheme in a large-scale indoor scene.
Aiming at large-scale indoor scenes, the application provides a global positioning scheme which can provide accurate pose information for a mobile robot in complex environments, so that it can effectively cope with navigation challenges in various complex scenes. The application adopts a tag-enhanced positioning strategy: by carrying a lidar and a vision sensor and combining them with a small number of tags, the position of the robot is identified rapidly and accurately, improving the stability and reliability of positioning. In addition, an important characteristic of the application is that global positioning can be realized without any prior pose information, giving the robot a relocalization (position-recovery) capability. This characteristic prevents positioning failure caused by loss of position tracking and ensures the navigation performance of the robot in complex environments.
Referring to fig. 7, a flow chart of a global positioning method in a large-scale indoor scene is shown. As shown in fig. 7, the method of the present application has three inputs (including a laser radar 71, a camera 72 and a tag 73) and one output (robot pose 77), and the method can be roughly divided into two parts of map building and global pose estimation, wherein the map building part firstly uses the laser radar 71 and the camera (sensor) 72 carried by the robot to generate a grid map 74 and a tag map 75 respectively, then aligns the grid map 74 and the tag map 75 to form a multi-layer map 76, and then the global pose estimation part estimates the pose of the robot relative to the grid map, namely the pose 77 of the robot in real time when the robot needs navigation.
In some embodiments, the lidar 71 and the camera 72 are mounted on the robot platform: the camera 72 is mounted horizontally and faces forward, the lidar 71 is placed horizontally and positioned to avoid occlusion as far as possible, and the tags 73 are sparsely arranged by personnel in the working scene of the robot. Environmental information around the robot is acquired by the lidar 71 for SLAM mapping (via SLAM 78), and the camera 72 detects in real time whether tag information is available in a certain direction (i.e., under a certain pose) of the robot. In addition, the finally estimated robot pose 77 relative to the grid map contains three degrees of freedom: the displacement in the X-Y plane and the orientation angle θ of the robot.
SLAM 78 receives successive laser frames and obtains successive estimates of the robot pose 77 by aligning the laser point cloud frames. During the operation of SLAM 78, the camera continuously observes whether available tags exist around the robot. When an available tag is detected, the pose P_b of the current robot relative to the map coordinate system obtained by SLAM 78 and the pose P_c of the tag observed by the camera are used to determine the pose P_t of the tag relative to the map coordinate system, and the tag map 75 is generated. It can be understood that the robot pose P_b(x1, y1, θ1), the tag pose P_c(x2, y2, θ2) observed by the camera and the tag pose P_t(x3, y3, θ3) relative to the map coordinate system satisfy formulas (2) to (4):

θ3 = θ1 + θ2    formula (2);

x3 = x1 + x2*cos(θ1) - y2*sin(θ1)    formula (3);

y3 = y1 + x2*sin(θ1) + y2*cos(θ1)    formula (4);
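The frame composition described above — mapping the camera's tag observation into the map coordinate system through the robot pose — can be sketched as follows. This is a minimal illustration assuming the standard SE(2) composition convention; the function and variable names are not from the patent.

```python
import math

def compose_se2(p_a, p_b):
    """SE(2) composition: express a pose p_b, given in frame A,
    in the frame that p_a itself is expressed in."""
    xa, ya, ta = p_a
    xb, yb, tb = p_b
    return (xa + xb * math.cos(ta) - yb * math.sin(ta),
            ya + xb * math.sin(ta) + yb * math.cos(ta),
            ta + tb)

# Robot at the map origin facing +X; the camera sees a tag 2 m ahead,
# facing back toward the robot.
p_b = (0.0, 0.0, 0.0)            # robot pose in the map frame
p_c = (2.0, 0.0, math.pi)        # tag pose in the robot/camera frame
p_t = compose_se2(p_b, p_c)      # tag pose in the map frame
print(p_t)
```

In this convention, each detected tag's map-frame pose is obtained from a single composition, after which it can be stored in the tag map together with its ID.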
The robot traverses the environment for one full loop to finish the SLAM process and generate the corresponding grid map and label map. The grid map describes the environment with grids of equal size; each small grid has one of three states, namely unknown, occupied and free. The grid map can be directly used for robot navigation, and only the pose of the robot relative to the grid map needs to be determined during navigation. The label map records the identification (ID) information and the pose information carried by each tag in the environment. After the grid map and label map are constructed, a multi-layer map 76 is generated by aligning the coordinate systems of the two maps.
When the robot just begins to navigate, the coarse correlation scan matching module 771 does not know the initial pose of the robot, so pose initialization is needed.
The following description will be made on the case where the camera can see the tag at the time of pose initialization and the case where the camera cannot see the tag, respectively.
If the camera can see a tag when the pose is initialized, the pose of the tag seen by the camera is used to initialize the robot pose. Assuming that the pose of the tag is P_t(x4, y4, θ4) and the pose of the tag observed by the camera is P_c(x5, y5, θ5), then the initial pose of the robot is P_b(x6, y6, θ6), and the three satisfy the following formulas (5) to (7):
θ6 = θ4 - θ5    formula (5);

x6 = x4 - (x5*cos(θ6) - y5*sin(θ6))    formula (6);

y6 = y4 - (x5*sin(θ6) + y5*cos(θ6))    formula (7);
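A minimal sketch of this tag-based pose initialization, solving the frame-composition relation P_t = P_b ⊕ P_c for the robot pose P_b. The code follows the standard SE(2) convention; the names are illustrative and the exact sign convention of the patent's printed formulas may differ, since they appear garbled in the source text.

```python
import math

def robot_pose_from_tag(p_t, p_c):
    """Recover the robot pose P_b in the map frame from the tag's
    map-frame pose P_t and the camera observation P_c of the tag."""
    xt, yt, tt = p_t
    xc, yc, tc = p_c
    tb = tt - tc                                        # robot heading
    xb = xt - (xc * math.cos(tb) - yc * math.sin(tb))   # robot x
    yb = yt - (xc * math.sin(tb) + yc * math.cos(tb))   # robot y
    return (xb, yb, tb)

# Round trip: a robot at (1, 2, 0.5) seeing a tag at (3, -1, 0.2) in its
# own frame should be recovered exactly from the tag's map-frame pose.
p_b = (1.0, 2.0, 0.5)
p_c = (3.0, -1.0, 0.2)
p_t = (p_b[0] + p_c[0] * math.cos(p_b[2]) - p_c[1] * math.sin(p_b[2]),
       p_b[1] + p_c[0] * math.sin(p_b[2]) + p_c[1] * math.cos(p_b[2]),
       p_b[2] + p_c[2])
print(robot_pose_from_tag(p_t, p_c))
```

The round trip demonstrates that composing a pose with an observation and then inverting the relation returns the original robot pose.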
In some embodiments, the coarse correlation scan matching module generates a set of candidate poses around the initial pose P_b observed via the camera, using a smaller search range and a smaller search step in each of the three dimensions: X-axis, Y-axis and angle.
The search range is the span of values in a given dimension across the candidate pose set, such as the difference between the maximum and minimum values on the X-axis, on the Y-axis, and in angle. The search step is the difference in value between two numerically adjacent poses in the candidate pose set, and likewise has components in the X-axis, Y-axis and angle dimensions.
If the camera observes no tag during pose initialization, a candidate pose set is generated using a larger search range and a larger search step, and initialization is completed with it.
Taking the X-axis dimension as an example: when no tag is observed during pose initialization, the search range can be 10 meters with a search step of 1 meter; when the camera can observe a tag during pose initialization, the search range can be 1 meter with a search step of 0.1 meter.
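The two regimes above — a coarse grid when no tag is visible and a fine grid when a tag is visible — can be sketched as a simple candidate pose generator. The function name and parameter layout are assumptions for illustration; the range and step values echo the example figures in the text.

```python
import itertools

def candidate_poses(center, rng, step):
    """Generate a grid of candidate poses around `center`.
    `rng` and `step` are (x, y, theta) search ranges and step sizes."""
    axes = []
    for c, r, s in zip(center, rng, step):
        n = int(round(r / s))               # number of steps per axis
        axes.append([c + (i - n // 2) * s for i in range(n + 1)])
    return list(itertools.product(*axes))

# Tag visible: small range, fine step (e.g. 1 m range at 0.1 m per axis).
fine = candidate_poses((0.0, 0.0, 0.0), (1.0, 1.0, 0.2), (0.1, 0.1, 0.05))
# No tag: large range, coarse step (e.g. 10 m range at 1 m per axis).
coarse = candidate_poses((0.0, 0.0, 0.0), (10.0, 10.0, 6.28), (1.0, 1.0, 0.5))
print(len(fine), len(coarse))
```

Note that halving the step in each of three dimensions multiplies the candidate count by roughly eight, which is why the coarse stage is followed by branch-and-bound acceleration rather than exhaustive fine search.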
After the coarse correlation scan matching module 771 generates the candidate pose set, each pose element in the set is scored according to its degree of matching with the grid map 74 in the multi-layer map 76, and the candidate pose set together with the score of each pose element is output to the branch-and-bound acceleration module 772 to accelerate the search matching. If pose loss occurs during subsequent robot movement, this pose initialization function can be used to quickly determine the position area of the robot again.
The branch-and-bound acceleration module 772 obtains the candidate pose set from the output of the coarse correlation scan matching module 771 as the initial candidate pose set and builds a multi-resolution map with the grid map 74.
The branch-and-bound acceleration module 772 first screens out the candidate poses whose scores fall below a minimum score threshold, then branches and expands the remaining candidate poses to obtain higher-resolution candidate pose subsets; these subsets form the search space of candidate poses and represent finer, more accurate pose estimates of the robot relative to the map. A pruning strategy is then applied in this search space to reduce the complexity of the search, and the branching and pruning steps are applied recursively to each subset until a termination condition is met; the termination condition of the recursive search can be a judgment on the size of the search space, the search depth, or an error threshold. Finally, after all searches are finished, the pose with the highest score is selected as the output pose of the coarse correlation scan matching.
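The bound-branch-prune recursion described above can be sketched in simplified form. For the pruning to be safe, the score at a coarse depth must be an upper bound on the scores reachable in that cell at finer depths. All names, and the toy one-dimensional example that stands in for pose matching, are illustrative assumptions.

```python
def branch_and_bound(candidates, score, expand, min_score, max_depth):
    """Simplified sketch of the accelerated search: drop poses whose
    (upper-bound) score cannot beat `min_score` or the best found so far,
    branch survivors into finer sub-poses, and recurse until `max_depth`
    (the termination condition). Returns the best leaf score and pose."""
    best = [float("-inf"), None]

    def recurse(poses, depth):
        # Bound step: prune candidates that cannot beat the current best.
        kept = [p for p in poses if score(p, depth) >= max(min_score, best[0])]
        if depth == max_depth:                      # score remaining leaves
            for p in kept:
                s = score(p, depth)
                if s > best[0]:
                    best[0], best[1] = s, p
            return
        for p in sorted(kept, key=lambda q: -score(q, depth)):
            recurse(expand(p, depth), depth + 1)    # branch step

    recurse(candidates, 0)
    return best[0], best[1]

# Toy 1-D example: find x maximizing -|x - 3.37| over nested intervals.
TARGET = 3.37
def score(x, depth):
    half = 1.0 / (2 ** depth)                       # cell half-width
    return -max(0.0, abs(x - TARGET) - half)        # upper bound within cell
def expand(x, depth):
    half = 1.0 / (2 ** depth)
    return [x - half / 2, x + half / 2]             # split cell in two

best_score, best_x = branch_and_bound([1.0, 3.0, 5.0, 7.0],
                                      score, expand, -10.0, 5)
print(best_score, best_x)
```

Exploring the highest-bound branch first lets the best score rise quickly, so most low-scoring branches are pruned one level down without being expanded.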
In some embodiments, after the branch-and-bound acceleration module 772 finds a highest-scoring robot pose, the fine registration module 774 uses this pose as the best candidate pose and regenerates the set of candidate poses using finer search ranges and search steps and re-executes the branch-and-bound acceleration module 772 once to find a highest-scoring candidate pose, taking this pose as the final estimated robot pose.
If the camera does not observe a tag during the movement of the robot, the point cloud registration module 773 aligns the continuous laser frames from the last time a tag was seen to the current time, using a point cloud registration method such as ICP (Iterative Closest Point) or NDT (Normal Distributions Transform), so as to estimate the pose of the robot at the current time. Assuming that the pose of the robot estimated when the tag was last seen is P_bc (the pose output by the fine registration module 774 at that time), and the pose increment of the robot estimated by the point cloud registration module over the period when no tag is seen is ΔP_bt, the finally estimated robot pose P_b satisfies formula (1), i.e. P_b is the composition of P_bc with ΔP_bt.
The point cloud registration module 773 obtains the robot pose estimate at the current time by registering the point cloud until the robot returns to the tagged location area again.
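Applying the registration-estimated increment to the last tag-based pose — the relation of formula (1) — can be sketched as follows, assuming the increment is expressed in the robot's frame at the last tag sighting (names are illustrative):

```python
import math

def apply_increment(p_bc, delta):
    """Compose the last tag-based pose P_bc with the pose increment
    ΔP_bt accumulated by point cloud registration, giving the
    current pose P_b."""
    x, y, t = p_bc
    dx, dy, dt = delta
    return (x + dx * math.cos(t) - dy * math.sin(t),
            y + dx * math.sin(t) + dy * math.cos(t),
            t + dt)

# Last tag-based pose: (2, 1) facing +Y; the robot then drives 1 m forward.
p_b = apply_increment((2.0, 1.0, math.pi / 2), (1.0, 0.0, 0.0))
print(p_b)   # approximately (2.0, 2.0, pi/2)
```

In practice the increment would itself be the running composition of frame-to-frame ICP or NDT results since the tag was last seen.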
Based on the above embodiments, a prior pose can be obtained from the special tags, and the correct pose of the robot can then be searched for in the vicinity of this prior pose. Because a prior pose is available, the search time and accuracy are hardly affected as the scale of the map increases. In addition, the application accelerates the search through branch and bound and searches by matching single laser frames against the multi-layer map; since a local map is formed from a plurality of laser frames, matching single frames involves a lower data volume and lower computational cost than matching a local map. Meanwhile, compared with related-art schemes that require a large number of uniformly placed tags in the scene, the application only needs a small number of tags and tolerates periods in which the robot cannot see any tag; it places fewer restrictions on the number and placement of the tags and is therefore more flexible and stable.
The pose estimation method provided by the application is applicable to common mobile robot working scenarios, such as mobile service robots in hotels and shops and AGV (Automated Guided Vehicle) carts in factories.
The implementation flow of the method mainly comprises the following steps. A small number of special tags are sparsely arranged in the robot working scene, and the robot, carrying a laser radar and a camera, traverses the environment for one full loop; during this process the robot is positioned and mapped in real time by SLAM, and the poses of the special tags are identified during mapping. At the end of mapping, SLAM generates an occupancy grid map and a label map recording the tag poses, and the two are superposed to form a multi-layer map. When the robot needs to navigate, rough matching is performed first: a tag near the robot is identified through the camera, the tag layer of the multi-layer map is searched to find the best candidate pose area, and then, with larger position and angle steps, the best candidate pose is found through correlation scan matching and branch-and-bound accelerated search, completing the rough matching. The precise matching builds on the rough matching: the branch-and-bound accelerated search is executed once more with smaller position and angle steps, yielding a more accurate robot pose estimate. In addition, when no tag is visible, the point cloud registration module continuously matches the laser frames between the last time a tag was seen and the current time to obtain the current pose estimate, and continues to do so until a tag is seen again. By this method, the pose of the robot relative to the prior map can be estimated quickly and accurately, providing real-time accurate pose information for autonomous navigation.
Based on the foregoing embodiments, the embodiments of the present application provide a pose estimation device; the units included in the device, and the modules included in those units, may be implemented by a processor in a robot, or of course by a specific logic circuit. In an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
Fig. 8 is a schematic structural diagram of a pose estimation device according to an embodiment of the present application, and as shown in fig. 8, the pose estimation device 800 includes: an acquisition module 810, a first determination module 820, a second determination module 830, a generation module 840, wherein:
An obtaining module 810, configured to obtain a multi-layer map of an environment in which the robot is located; the multi-layer map comprises a grid map and a label map which are aligned with a coordinate system; a first determining module 820, configured to determine a predicted pose based on the tag map and construct a candidate pose set using the predicted pose in the case where a tag in an environment is acquired; a second determining module 830, configured to determine, based on the grid map, accuracy of each pose in the candidate pose set; a generating module 840, configured to generate an estimated pose of the robot at the current moment based on each pose in the candidate pose set and the corresponding accuracy.
In some embodiments, the second determining module 830 is further configured to: acquiring scanning data of the robot on the environment at the current moment; determining the accuracy of each pose based on the grid map and the scanning data under each pose; the accuracy characterizes the degree of matching of the pose scan data with the grid map.
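One common way to realize such a matching score — counting the fraction of transformed scan endpoints that land on occupied grid cells — might look like the sketch below. The hit-fraction metric, the names, and the toy map are assumptions for illustration, not the patent's exact scoring function.

```python
import math

def score_pose(pose, scan, occupied, resolution=0.25):
    """Score a candidate pose by the fraction of laser endpoints that
    fall on occupied cells of the grid map (higher = better match).
    `scan` holds (range, bearing) returns in the robot frame and
    `occupied` is a set of (ix, iy) occupied grid-cell indices."""
    x, y, theta = pose
    hits = 0
    for r, b in scan:
        ex = x + r * math.cos(theta + b)   # beam endpoint in map frame
        ey = y + r * math.sin(theta + b)
        cell = (int(ex // resolution), int(ey // resolution))
        hits += cell in occupied
    return hits / len(scan)

# Toy map: a wall one cell thick along y in [1.0, 1.25).
wall = {(ix, 4) for ix in range(-4, 5)}
# Robot at the origin facing +Y; three beams hit the wall, one falls short.
scan = [(1.1, 0.0), (1.1, 0.3), (1.1, -0.3), (0.5, 0.0)]
print(score_pose((0.0, 0.0, math.pi / 2), scan, wall))
```

The same scan scored at a wrong candidate pose would place its endpoints off the wall cells and receive a lower fraction, which is what lets the search rank the candidate pose set.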
In some embodiments, the generating module 840 is further configured to: perform a pose search process of at least one depth on each pose in the candidate pose set until a termination condition is met, and determine the estimated pose based on the pose with the highest accuracy; the pose search process corresponding to the current depth comprises: for each input search pose set corresponding to the current depth, determining screened poses based on an accuracy threshold and the accuracy of each pose in the search pose set; constructing an intermediate search pose set corresponding to the screened poses based on the pose precision corresponding to the current depth; and screening the poses in the intermediate search pose set based on a pruning strategy to obtain the search pose set output at the current depth. The search pose set output by the pose search process of a first depth is the search pose set input to the pose search process of a second depth, and the search pose set input to the pose search process of the first depth is the candidate pose set; for the same type of pose precision, the pose precision corresponding to the first depth is lower than that corresponding to the second depth; the pose search process of the first depth is the immediately preceding pose search process of the second depth.
In some embodiments, the pose comprises pose parameters of at least one of: a position parameter in a first direction, a position parameter in a second direction, and an orientation angle parameter; the pose accuracy includes at least one of: a pose search step length and a pose search range, wherein the pose search step length is used for determining the difference of pose parameters between adjacent poses in a constructed pose set; the pose search range is used for determining a parameter range of the pose parameters in the constructed pose set.
In some embodiments, the generating module 840 is further configured to: constructing a target search pose set corresponding to the pose with the highest accuracy based on the target pose accuracy; aiming at the pose precision of the same type, the target pose precision is higher than the pose precision corresponding to each depth; taking the target search pose set as an input search pose set in the pose search process to obtain a corresponding output search pose set; and determining the pose with highest accuracy as the estimated pose in the output search pose set corresponding to the target search pose set.
In some embodiments, the first determining module 820 is further configured to: constructing a candidate pose set corresponding to the predicted pose based on the pose precision corresponding to the tagged state; the candidate pose set includes the predicted pose, and a pose range and/or a pose difference in the candidate pose set is related to pose accuracy.
In some embodiments, the first determining module 820 is further configured to construct the candidate pose set based on pose accuracy corresponding to a label-free state in a case where there is no historical pose of the robot and a label in an environment cannot be acquired; and aiming at the pose precision of the same type, the pose precision corresponding to the unlabeled state is lower than the pose precision corresponding to the labeled state.
In some embodiments, the generating module 840 is further configured to: in the moving process of the robot, under the condition that labels in the environment cannot be acquired, acquiring pose increment of the robot from the historical moment to the current moment; and generating the pose corresponding to the current moment based on the pose corresponding to the historical moment and the pose increment.
In some embodiments, the pose estimation device 800 further comprises a building module. The construction module is used for: in the process of constructing a grid map by a robot, acquiring the pose of the robot under a map coordinate system and the pose of a label under the robot coordinate system; determining a tag pose of the tag under a map coordinate system based on the robot pose of the robot under the map coordinate system and the tag pose of the tag under the robot coordinate system; constructing the tag map by using the tag pose of each tag in the environment where the robot is located under the map coordinate system; and aligning the label map and the grid map to obtain the multi-layer map.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and for technical details that are not disclosed in the embodiments of the apparatus of the present application, reference should be made to the description of the embodiments of the method of the present application.
It should be noted that, in the embodiment of the present application, if the above-mentioned pose estimation method is implemented in the form of a software functional module, and sold or used as a separate product, the pose estimation method may also be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, comprising several instructions for causing a robot (which may be a service robot, an industrial robot, an exploration robot, a mobile robot, etc.) to perform all or part of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes. Thus, embodiments of the application are not limited to any specific hardware, software, or firmware, or any combination of hardware, software, and firmware.
The embodiment of the application provides a robot, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes part or all of the steps in the method when executing the program.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, in case of running in a robot, performs part or all of the steps for implementing the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, and in other embodiments, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, the storage medium, the computer program and the computer program product of the present application, reference should be made to the description of the embodiments of the method of the present application.
Fig. 9 is a schematic diagram of a hardware entity of a robot according to an embodiment of the present application, as shown in fig. 9, the hardware entity of the robot 900 includes: a processor 901 and a memory 902, wherein the memory 902 stores a computer program executable on the processor 901, the processor 901 implementing the steps in the method of any of the embodiments described above when the program is executed.
The memory 902 stores a computer program executable on the processor, and the memory 902 is configured to store instructions and applications executable by the processor 901, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by each module in the processor 901 and the robot 900, which may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM).
The processor 901 implements the steps of any one of the above-described pose estimation methods when executing a program. The processor 901 generally controls the overall operation of the robot 900.
An embodiment of the present application provides a computer storage medium storing one or more programs executable by one or more processors to implement the steps of the pose estimation method of any of the embodiments above.
It should be noted here that: the description of the storage medium and apparatus embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus of the present application, please refer to the description of the method embodiments of the present application.
The Processor may be at least one of an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, and a microprocessor. It will be appreciated that the electronic device implementing the above-mentioned processor function may be another device, and embodiments of the present application are not specifically limited thereto.
The computer storage medium/Memory may be a read only memory (Read Only Memory, ROM), a programmable read only memory (Programmable Read-Only Memory, PROM), an erasable programmable read only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disk, or a compact disc read only memory (Compact Disc Read-Only Memory, CD-ROM); it may also be any terminal that includes one or any combination of the above-mentioned memories, such as a mobile phone, computer, tablet device, or personal digital assistant.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the steps/processes described above do not imply an order of execution; the execution order of each step/process should be determined by its function and inherent logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application. The foregoing embodiment numbers are merely for description and do not represent the relative merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections between the components shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units. Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims (10)

1. A pose estimation method, characterized in that it is applied to a robot, said method comprising:
acquiring a multi-layer map of the environment where the robot is located; the multi-layer map comprises a grid map and a label map which are aligned with a coordinate system;
under the condition that a label in the environment is acquired, determining a predicted pose based on the label map, and constructing a candidate pose set by utilizing the predicted pose;
Determining the accuracy of each pose in the candidate pose set based on the grid map;
and generating an estimated pose of the robot at the current moment based on each pose in the candidate pose set and the corresponding accuracy.
2. The method of claim 1, wherein determining the accuracy of each pose in the set of candidate poses based on the grid map comprises:
acquiring scanning data of the robot on the environment at the current moment;
Determining the accuracy of each pose based on the grid map and the scanning data under each pose; the accuracy characterizes the degree of matching of the pose scan data with the grid map.
3. The method of claim 1 or 2, wherein the generating the estimated pose of the robot based on each pose in the set of candidate poses and the corresponding accuracy comprises:
Performing a pose search process of at least one depth on each pose in the candidate pose set until a termination condition is met, and determining the estimated pose based on the pose with the highest accuracy;
the pose search process corresponding to the current depth comprises: for each input search pose set corresponding to the current depth, determining screened poses based on an accuracy threshold and the accuracy of each pose in the search pose set; constructing an intermediate search pose set corresponding to the screened poses based on the pose precision corresponding to the current depth; and screening the poses in the intermediate search pose set based on a pruning strategy to obtain the search pose set output at the current depth;
wherein the search pose set output by the pose search process of a first depth is the search pose set input to the pose search process of a second depth, and the search pose set input to the pose search process of the first depth is the candidate pose set; for the same type of pose precision, the pose precision corresponding to the first depth is lower than the pose precision corresponding to the second depth; the pose search process of the first depth is the immediately preceding pose search process of the second depth;
The pose includes pose parameters of at least one of: a position parameter in a first direction, a position parameter in a second direction, and an orientation angle parameter; the pose accuracy includes at least one of: a pose search step length and a pose search range, wherein the pose search step length is used for determining the difference of pose parameters between adjacent poses in a constructed pose set; the pose search range is used for determining a parameter range of the pose parameters in the constructed pose set.
4. A method according to claim 3, wherein said determining said estimated pose based on the pose with the highest accuracy comprises:
constructing a target search pose set corresponding to the pose with the highest accuracy based on the target pose accuracy; aiming at the pose precision of the same type, the target pose precision is higher than the pose precision corresponding to each depth;
taking the target search pose set as an input search pose set in the pose search process to obtain a corresponding output search pose set;
And determining the pose with highest accuracy as the estimated pose in the output search pose set corresponding to the target search pose set.
5. The method of any one of claims 1 to 4, wherein constructing a set of candidate poses using the predicted poses comprises:
Constructing a candidate pose set corresponding to the predicted pose based on the pose precision corresponding to the tagged state; the candidate pose set comprises the predicted pose, and the pose range and/or the pose difference in the candidate pose set are related to the pose precision;
Under the condition that the historical pose of the robot does not exist and the labels in the environment cannot be acquired, constructing the candidate pose set based on the pose precision corresponding to the label-free state; and aiming at the pose precision of the same type, the pose precision corresponding to the unlabeled state is lower than the pose precision corresponding to the labeled state.
6. The method according to any one of claims 1 to 5, further comprising:
during movement of the robot, under the condition that no tag in the environment can be acquired, acquiring a pose increment of the robot from a historical moment to the current moment; and
generating the pose corresponding to the current moment based on the pose corresponding to the historical moment and the pose increment.
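Claim 6 describes plain dead reckoning: when no tag is visible, an odometric increment is accumulated onto the last known pose. Assuming 2D poses (x, y, θ) and an increment expressed in the robot frame at the historical moment (an assumption; the claims do not fix the frame), the composition is:

```python
import math

def compose_pose(hist_pose, delta):
    """Apply a pose increment (dx, dy, dtheta), measured in the robot frame
    at the historical moment, to the historical pose (x, y, theta) in the
    map frame, yielding the pose at the current moment."""
    x, y, th = hist_pose
    dx, dy, dth = delta
    return (
        x + dx * math.cos(th) - dy * math.sin(th),
        y + dx * math.sin(th) + dy * math.cos(th),
        (th + dth + math.pi) % (2 * math.pi) - math.pi,  # wrap to [-pi, pi)
    )
```

For example, a robot at the origin facing +y (θ = π/2) that moves 1 m forward ends up at (0, 1) with an unchanged heading.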
7. The method according to any one of claims 1 to 6, wherein the method of constructing a multi-layer map comprises:
in the process of constructing a grid map by the robot, acquiring the robot pose of the robot in a map coordinate system and the tag pose of a tag in the robot coordinate system;
determining the tag pose of the tag in the map coordinate system based on the robot pose of the robot in the map coordinate system and the tag pose of the tag in the robot coordinate system;
constructing the tag map using the tag pose, in the map coordinate system, of each tag in the environment where the robot is located; and
aligning the tag map and the grid map to obtain the multi-layer map.
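The key step of claim 7 is a frame change: the tag pose in the map frame is the composition T_map_tag = T_map_robot · T_robot_tag. A minimal 2D homogeneous-transform sketch of that chain (pose = (x, y, θ); function names are illustrative and no external libraries are assumed):

```python
import math

def to_matrix(x, y, th):
    """Build the 3x3 homogeneous transform for a 2D pose."""
    c, s = math.cos(th), math.sin(th)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def tag_pose_in_map(robot_in_map, tag_in_robot):
    """Chain the robot-in-map transform with the tag-in-robot transform:
    T_map_tag = T_map_robot @ T_robot_tag, returned as a (x, y, theta) pose."""
    m = matmul3(to_matrix(*robot_in_map), to_matrix(*tag_in_robot))
    return (m[0][2], m[1][2], math.atan2(m[1][0], m[0][0]))
```

For example, a robot at (1, 1) facing +y that sees a tag 2 m straight ahead places that tag at (1, 3) in the map frame.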
8. A pose estimation device, the device comprising:
an acquisition module, used for acquiring a multi-layer map of the environment where the robot is located, wherein the multi-layer map comprises a grid map and a tag map aligned in a common coordinate system;
a first determining module, used for determining a predicted pose based on the tag map under the condition that a tag in the environment is acquired, and constructing a candidate pose set using the predicted pose;
a second determining module, used for determining the accuracy of each pose in the candidate pose set based on the grid map; and
a generation module, used for generating the estimated pose of the robot at the current moment based on each pose in the candidate pose set and the corresponding accuracy.
9. A robot, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202411188154.8A 2024-08-27 2024-08-27 Pose estimation method, pose estimation device, robot and storage medium Pending CN118999577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411188154.8A CN118999577A (en) 2024-08-27 2024-08-27 Pose estimation method, pose estimation device, robot and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411188154.8A CN118999577A (en) 2024-08-27 2024-08-27 Pose estimation method, pose estimation device, robot and storage medium

Publications (1)

Publication Number Publication Date
CN118999577A true CN118999577A (en) 2024-11-22

Family

ID=93475745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411188154.8A Pending CN118999577A (en) 2024-08-27 2024-08-27 Pose estimation method, pose estimation device, robot and storage medium

Country Status (1)

Country Link
CN (1) CN118999577A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119964145A (en) * 2025-04-10 2025-05-09 华南理工大学 Target object recognition and pose estimation method based on 3D vision
CN121007562A (en) * 2025-10-24 2025-11-25 苏州盖博特科技有限公司 Robot positioning methods, devices, systems, and robots


Similar Documents

Publication Publication Date Title
Huang Review on LiDAR-based SLAM techniques
Nüchter et al. 6D SLAM—3D mapping outdoor environments
Chen et al. Active vision in robotic systems: A survey of recent developments
Paya et al. A state‐of‐the‐art review on mapping and localization of mobile robots using omnidirectional vision sensors
CN112734852A (en) Robot mapping method and device and computing equipment
CN111895989A (en) Robot positioning method and device and electronic equipment
CN114549738A (en) Unmanned vehicle indoor real-time dense point cloud reconstruction method, system, equipment and medium
WO2017155970A1 (en) Laser scanner with real-time, online ego-motion estimation
CN118999577A (en) Pose estimation method, pose estimation device, robot and storage medium
CN108388244A (en) Mobile-robot system, parking scheme based on artificial landmark and storage medium
Wulf et al. Benchmarking urban six‐degree‐of‐freedom simultaneous localization and mapping
CN117495968B (en) Mobile robot pose tracking method and device based on 3D laser radar
Skrzypczyński Mobile robot localization: Where we are and what are the challenges?
KR102624644B1 (en) Method of estimating the location of a moving object using vector map
Castellanos et al. Sensor influence in the performance of simultaneous mobile robot localization and map building
Vega-Torres et al. SLAM2REF: Advancing long-term mapping with 3D LiDAR and reference map integration for precise 6-DoF trajectory estimation and map extension
Wei et al. Novel robust simultaneous localization and mapping for long-term autonomous robots
Birk et al. Simultaneous localization and mapping (SLAM)
Jensen et al. Laser range imaging using mobile robots: From pose estimation to 3D-models
CN118565457A (en) Grid map construction method and device based on observation direction and intelligent mobile device
Forsman Three-dimensional localization and mapping of static environments by means of mobile perception
Hu et al. Accurate fiducial mapping for pose estimation using manifold optimization
Sun et al. Indoor LiDAR 3D mapping algorithm with semantic-based registration and optimization
Hu et al. 3D indoor modeling using a hand-held embedded system with multiple laser range scanners
CN120252745B (en) Indoor AR navigation system and method with integration of sparse space map and dynamic path

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination