
CN113034584B - Mobile robot visual positioning method based on object semantic road sign - Google Patents

Mobile robot visual positioning method based on object semantic road sign

Info

Publication number
CN113034584B
Authority
CN
China
Prior art keywords
map
coordinate system
landmark
graph
road sign
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110411557.4A
Other languages
Chinese (zh)
Other versions
CN113034584A (en)
Inventor
Lin Xubin
He Li
Yang Yinen
Guan Yisheng
Zhang Hong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202110411557.4A
Publication of CN113034584A
Application granted
Publication of CN113034584B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual positioning method for mobile robots based on object semantic landmarks. The method first constructs an offline global map of object landmarks and then an online local map of object landmarks. Both maps are expressed mathematically with a graph structure as the mathematical model, the map-matching problem is formulated as a subgraph-matching equation, and the subgraph-matching solution registers the local map within the global map, finally realizing the visual positioning of the mobile robot. Because the invention localizes the robot from object semantic landmarks and exploits the viewpoint invariance of three-dimensional geometric structure, it removes the positioning system's restrictions on the robot's motion attitude, improves the robustness of the positioning system to external environmental conditions, and ultimately improves the autonomy and flexibility of the mobile robot.

Description

A visual localization method for mobile robots based on object semantic landmarks

Technical Field

The invention relates to the technical field of mobile robots, and in particular to a visual positioning method for mobile robots based on object semantic landmarks.

Background

Autonomous navigation is a core technical problem in the field of mobile robotics. An autonomous navigation system generally comprises three main modules: positioning, path planning, and control. The positioning module supplies the information consumed by the real-time control module and the local path-planning module, and as this feedback link it plays a vital role in the autonomous navigation system. Its real-time performance, accuracy, and robustness are decisive for the performance of the whole system.

Existing positioning systems based on visual sensors rely mainly on matching visual feature keypoints: the feature keypoints of the current frame are matched and associated with keypoints in a feature-map database. When matching succeeds, the three-dimensional coordinates of the keypoints in the map database, together with the camera's projective geometric model, are used to estimate the three-dimensional pose of the camera coordinate system for the current frame, which is equivalent to estimating the pose of the sensor platform and thereby realizes the positioning function. In this process, keypoint matching is the prerequisite for accurate pose estimation. It is mainly based on descriptor-vector extraction algorithms, which in essence depend on the local pixel-intensity distribution of the image. Even when the texture and illumination of the environment are strictly identical, the local pixel-intensity distribution is affected by the viewing angle from which the camera captures the scene. In the same scene, a viewing angle inconsistent with the one used during map construction can therefore cause feature matching to fail, so the robot positioning system cannot work reliably.
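To make this prior-art pipeline concrete, the following minimal Python sketch matches ORB descriptors between the current frame and a map keyframe and recovers the camera pose with a RANSAC PnP solve; the images, the 3D map-point array, and the intrinsic matrix K are assumed inputs for illustration, not something prescribed by the patent.

```python
import cv2
import numpy as np

def keypoint_localization(query_gray, map_gray, map_points_3d, K):
    """Classical keypoint-based localization (prior-art baseline sketch).

    query_gray    : current camera frame, grayscale
    map_gray      : a map keyframe, grayscale
    map_points_3d : (N, 3) world coordinates assumed to be indexed like the
                    ORB keypoints detected on map_gray
    K             : 3x3 camera intrinsic matrix
    """
    orb = cv2.ORB_create(nfeatures=1000)
    kp_q, des_q = orb.detectAndCompute(query_gray, None)
    kp_m, des_m = orb.detectAndCompute(map_gray, None)

    # Descriptor matching: the step that degrades under viewpoint change,
    # which is exactly what motivates the object-landmark approach below.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_q, des_m)

    pts_3d = np.float32([map_points_3d[m.trainIdx] for m in matches])
    pts_2d = np.float32([kp_q[m.queryIdx].pt for m in matches])

    # Projective pose estimation from the 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
    return ok, rvec, tvec
```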

Summary of the Invention

Aimed at the insufficient illumination invariance and viewpoint invariance of the visual feature-point extraction and matching algorithms used in existing vision-based global positioning techniques for mobile robots, the invention proposes a visual positioning method for mobile robots based on object semantic landmarks. It aims to remove the positioning system's restrictions on the robot's motion attitude, to improve the robustness of the positioning system to external environmental conditions, and ultimately to improve the autonomy and flexibility of the mobile robot.

To achieve the above object, the technical scheme provided by the invention is:

A visual positioning method for mobile robots based on object semantic landmarks, comprising the following steps:

S1. Construct an offline global map of object landmarks;

S2. Construct an online local map of object landmarks;

S3. Express the local and global object-landmark maps mathematically, with a graph structure as the mathematical model; formulate the sub-map matching problem as the map-matching mathematical equation; use the sub-map matching solution to realize the registration estimation of the local object-landmark map within the global object-landmark map; and finally realize the visual positioning of the mobile robot.

Further, the specific process of constructing the offline global map of object landmarks in step S1 is as follows:

S1-1. Acquire an input image and preprocess it.

S1-2. Use a trained convolutional neural network to detect multiple target objects and output each detection as a 2D rectangular bounding box: for every target object in the image, output its class vector c_i together with the position x_i, y_i and the size w_i, h_i of its bounding box in the pixel coordinate system, i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height.
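For illustration, one detection record matching this output tuple could be kept as the small structure below; the field types are assumptions of this sketch, since the patent only fixes the tuple {c_i, x_i, y_i, w_i, h_i}.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    c: int      # object class predicted by the CNN detector
    x: float    # bounding-box position in the pixel coordinate system
    y: float
    w: float    # bounding-box width, pixels
    h: float    # bounding-box height, pixels

# e.g. Detection(c=3, x=412.0, y=218.5, w=96.0, h=64.0) for one detected box
```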

S1-3. Use the optical flow method to register the current image against the previous frame at pixel level, and estimate the camera motion pose by minimizing a cost function over the gray values of the pixels registered between the two frames:

ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,1}) ‖²    (1)

In Eq. (1), the (·)^ symbol denotes transforming a 6-dimensional vector ξ = (ρ, φ) of the Lie algebra se(3) into its 4×4 matrix form, ξ^ = [ φ^  ρ ; 0ᵀ  0 ], where φ^ is the 3×3 skew-symmetric matrix of φ. exp(·) is the exponential map on se(3): exp(ξ^) ∈ SE(3) is the rigid-body motion pose between adjacent frames, SE(3) being the special Euclidean group. P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the corresponding three-dimensional points expressed in the respective camera coordinate systems, and I_1(·) and I_2(·) are the gray values of the corresponding pixels.

S1-4. Envelop each object with a dual ellipsoid surface as the object-landmark map representation. First fit an inscribed ellipse to the object detection box in the image and obtain the dual matrix form of the ellipse. For view i, with its camera pose (R_i, t_i) and the camera intrinsic matrix K already known, the projected contour C_i* of the dual ellipsoid surface Q* in three-dimensional space satisfies, through the projection matrix P_i = K·[R_i  t_i], the following relation:

C_i* = λ_i P_i Q* P_iᵀ    (2)

In Eq. (2), the scalar λ_i indicates that the equation holds up to a scale factor.

S1-5. In the homogeneous coordinate system, the dual form Q* of an ellipsoid is a 4×4 symmetric matrix and the dual form C* of an ellipse is a 3×3 symmetric matrix, so Q* and C* have 10 and 6 independent elements respectively. Equation (2) is a quadratic form; expressing P_i through its quadratic-form operator, denoted B_i, linearizes the original equation into:

B_i q = λ_i c_i    (3)

where q ∈ R^10 stacks the independent elements of Q* and c_i ∈ R^6 stacks those of C_i*.

S1-6. For multiple views, stack several instances of Eq. (3) into one linear system. Solving this over-constrained system is a linear least-squares problem, solved via SVD for the variable q̂; restoring q̂ to a symmetric matrix recovers the dual form of the ellipsoid:

Q̂* = mat_sym(q̂)    (4)

This completes the reconstruction of the objects' enveloping ellipsoids and forms the global map of object landmarks.
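The multi-view least-squares reconstruction of Q* can be sketched as follows; building each B_i column by column from a basis of symmetric matrices and eliminating the per-view scale λ_i by orthogonal projection are implementation choices of this sketch, not prescriptions of the patent.

```python
import numpy as np

def sym_basis_4():
    """The 10 basis matrices spanning 4x4 symmetric matrices."""
    basis = []
    for a in range(4):
        for b in range(a, 4):
            E = np.zeros((4, 4))
            E[a, b] = E[b, a] = 1.0
            basis.append(E)
    return basis

def vec_sym3(M):
    """Stack the 6 independent entries of a 3x3 symmetric matrix."""
    i, j = np.triu_indices(3)
    return M[i, j]

def reconstruct_dual_quadric(Ps, Cs):
    """Least-squares dual ellipsoid Q* from per-view dual conics Ci*.

    Ps : list of 3x4 projection matrices Pi = K [Ri | ti]
    Cs : list of 3x3 dual-conic matrices fitted inside the detection boxes
    Each view contributes the up-to-scale constraint Ci* ~ Pi Q* Pi^T; the
    scale is removed by projecting onto the complement of ci, so the stacked
    system has q in its null space and SVD recovers it (cf. Eqs. (3)-(4)).
    """
    basis = sym_basis_4()
    rows = []
    for P, C in zip(Ps, Cs):
        B = np.stack([vec_sym3(P @ E @ P.T) for E in basis], axis=1)  # 6x10
        c = vec_sym3(C)
        c = c / np.linalg.norm(c)
        rows.append(B - np.outer(c, c @ B))     # (I - c c^T) B kills the scale
    M = np.vstack(rows)
    _, _, Vt = np.linalg.svd(M)
    q = Vt[-1]                                  # 10-vector solution
    Q = np.zeros((4, 4))
    i, j = np.triu_indices(4)
    Q[i, j] = q
    Q[j, i] = q                                 # restore the symmetric matrix
    return Q / np.linalg.norm(Q)
```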

Further, in step S1-1 the preprocessing comprises filtering and denoising, grayscale conversion, and size scaling to fit the input dimensions of the neural network.

Further, the process of constructing the online local map of object landmarks in step S2 is as follows:

After preprocessing the visual image, perform object detection and data association. Then use the optical flow method for pose tracking, realizing a visual odometry function; the visual odometry fuses the optimized pose of the previous instant so as to eliminate accumulated error as far as possible. Then construct the local object-landmark map from multiple views, adopt a sliding-window strategy to control the scale of the local map, and apply a rigid-body transformation to the coordinate system of each object-landmark envelope surface so that it is expressed in the camera coordinate system, finally forming a scale-controlled local map of object landmarks tied to the camera coordinate system, as sketched below.
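A minimal sliding-window container for such a camera-frame local map might look like the following sketch; the window size and the per-keyframe storage layout are assumptions, while landmark transport uses the dual-quadric rule Q*′ = T Q* Tᵀ for a rigid transform T.

```python
from collections import deque
import numpy as np

class LocalObjectMap:
    """Sliding-window local map of object landmarks (a sketch)."""

    def __init__(self, max_keyframes=10):
        self.window = deque(maxlen=max_keyframes)   # oldest keyframe drops out

    def add_keyframe(self, T_wc, landmarks):
        """T_wc: 4x4 pose of this keyframe's camera in the odometry frame.
        landmarks: list of (label, Q_star) dual ellipsoids expressed in
        this keyframe's camera coordinate system."""
        self.window.append((T_wc, landmarks))

    def landmarks_in_camera(self, T_wc_now):
        """Rigidly re-express every windowed landmark in the current frame."""
        out = []
        for T_wc, lms in self.window:
            T_rel = np.linalg.inv(T_wc_now) @ T_wc  # old camera -> current camera
            for label, Q in lms:
                out.append((label, T_rel @ Q @ T_rel.T))  # dual-quadric transport
        return out
```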

Further, the specific process of step S3 is as follows:

Match the local object-landmark map against the global object-landmark map. Mathematically, this is a subgraph-matching problem. Represent the global map as a graph structure G = (V, E), where each node in V carries the three-dimensional coordinates and the label of a landmark i, and each edge in E represents the mutual three-dimensional relation between two landmarks. Likewise, represent the local map as a graph structure H = (V′, E′), with nodes V′ and edges E′.

The map-matching problem is then defined as follows: given a graph G = (V, E) with n nodes and a graph H = (V′, E′) with m nodes, seek a node correspondence between G and H:

X ∈ {0,1}^{n×m}

that maximizes the consistency of graph attributes and structure:

max_X  Σ_{i1∈V, i2∈V′} v(i1, i2) X_{i1 i2}  +  Σ_{(i1,j1)∈E, (i2,j2)∈E′} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2},   s.t.  X·1 ≤ 1,  Xᵀ·1 ≤ 1

Here v(i1, i2) is the consistency measure between node i1 of graph G and node i2 of graph H, derived from the three-dimensional scale consistency of the landmarks and the consistency of their semantic labels; d(i1 j1, i2 j2) is the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, derived from the consistency of the three-dimensional distances between landmarks within each map. Semantic information is used so that only nodes with the same label are matched across the two graphs, which reduces the number of optimization iterations; finally the optimal matching solution is obtained, determining the one-to-one correspondence of landmarks between the two maps. A numerical sketch of this matching follows.
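One standard way to solve an objective of this form is spectral relaxation of a pairwise affinity matrix followed by a greedy one-to-one discretization; the sketch below takes that route, where the Gaussian edge kernel, its bandwidth sigma, and the unit node score are assumptions (the patent additionally scores nodes by three-dimensional scale consistency).

```python
import numpy as np
from itertools import product

def match_landmark_maps(G_labels, G_pos, H_labels, H_pos, sigma=0.5):
    """Spectral relaxation of the subgraph-matching objective (a sketch).

    G_labels/H_labels : semantic labels of global / local map landmarks
    G_pos/H_pos       : (n, 3) / (m, 3) landmark centres
    """
    # Candidate assignments: semantic gating, only same-label pairs survive.
    cand = [(i1, i2)
            for i1, i2 in product(range(len(G_labels)), range(len(H_labels)))
            if G_labels[i1] == H_labels[i2]]
    k = len(cand)
    W = np.zeros((k, k))
    for a, (i1, i2) in enumerate(cand):
        W[a, a] = 1.0                                   # node term v(i1, i2)
        for b, (j1, j2) in enumerate(cand):
            if a == b or i1 == j1 or i2 == j2:
                continue                                # forbid conflicting pairs
            dG = np.linalg.norm(G_pos[i1] - G_pos[j1])
            dH = np.linalg.norm(H_pos[i2] - H_pos[j2])
            W[a, b] = np.exp(-(dG - dH) ** 2 / sigma ** 2)  # edge term d
    # The principal eigenvector of W scores every candidate assignment.
    _, vecs = np.linalg.eigh(W)
    score = np.abs(vecs[:, -1])
    # Greedy one-to-one discretization of X.
    matches, used_g, used_h = [], set(), set()
    for a in np.argsort(-score):
        i1, i2 = cand[a]
        if i1 not in used_g and i2 not in used_h:
            matches.append((i1, i2))
            used_g.add(i1)
            used_h.add(i2)
    return matches
```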

After the local object-landmark map has been matched to the global object-landmark map, pose propagation is carried out with the three-dimensional landmark information of the global map as the reference, namely: matched global-map landmark 3D information → local-map landmark 3D information → the local map's reference coordinate system (the camera coordinate system), giving the transform T_wc ∈ SE(3) of the camera coordinate system in the world coordinate system. Finally the pose of the robot relative to the map's world coordinate system at the current instant is obtained:

T_wr = T_wc T_cr ∈ SE(3)

Here, T_cr ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system, and T_wc ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system. By the chain rule of transformation matrices, multiplying the two yields the transformation matrix T_wr ∈ SE(3) of the robot relative to the world coordinate system, i.e. the robot's localization pose, as in the snippet below.
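As a worked example of this chain rule, with illustrative numbers rather than values from the patent:

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous SE(3) matrix from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# T_wc: camera pose in the world frame, obtained from the map matching above;
# T_cr: robot base expressed in the camera frame, a fixed extrinsic calibration.
T_wc = se3(np.eye(3), np.array([1.0, 2.0, 0.5]))
T_cr = se3(np.eye(3), np.array([0.0, 0.0, -0.1]))

T_wr = T_wc @ T_cr          # robot pose in the world frame
print(T_wr[:3, 3])          # -> [1.  2.  0.4], the robot's world position
```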

Compared with the prior art, the principle and advantages of this scheme are as follows:

The scheme first constructs an offline global map of object landmarks and then an online local map of object landmarks. Both maps are expressed mathematically with a graph structure as the mathematical model, the map-matching problem is formulated as a subgraph-matching equation, and the subgraph-matching solution registers the local map within the global map, finally realizing the visual positioning of the mobile robot. Because the scheme localizes the robot from object semantic landmarks and exploits the viewpoint invariance of three-dimensional geometric structure, it removes the positioning system's restrictions on the robot's motion attitude, improves the robustness of the positioning system to external environmental conditions, and ultimately improves the autonomy and flexibility of the mobile robot.

Brief Description of the Drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the flow chart of the visual positioning method for mobile robots based on object semantic landmarks according to the invention;

Fig. 2 is the global map of object landmarks represented by enveloping ellipsoid surfaces in the embodiment;

Fig. 3 is the local map of object landmarks under the sliding-window strategy in the embodiment.

Detailed Description

The invention is further described below with reference to a specific embodiment:

As shown in Fig. 1, the visual positioning method for mobile robots based on object semantic landmarks described in this embodiment comprises the following steps:

S1. Construct the offline global map of object landmarks. The process is as follows:

S1-1. Acquire an input image and preprocess it, specifically: filtering and denoising, grayscale conversion, and size scaling to fit the input dimensions of the neural network.

S1-2. Use a trained convolutional neural network to detect multiple target objects and output each detection as a 2D rectangular bounding box: for every target object in the image, output its class vector c_i together with the position x_i, y_i and the size w_i, h_i of its bounding box in the pixel coordinate system, i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height.

S1-3. Use the optical flow method to register the current image against the previous frame at pixel level, and estimate the camera motion pose by minimizing a cost function over the gray values of the pixels registered between the two frames:

ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,1}) ‖²    (1)

In Eq. (1), the (·)^ symbol denotes transforming a 6-dimensional vector ξ = (ρ, φ) of the Lie algebra se(3) into its 4×4 matrix form, ξ^ = [ φ^  ρ ; 0ᵀ  0 ], where φ^ is the 3×3 skew-symmetric matrix of φ. exp(·) is the exponential map on se(3): exp(ξ^) ∈ SE(3) is the rigid-body motion pose between adjacent frames, SE(3) being the special Euclidean group. P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the corresponding three-dimensional points expressed in the respective camera coordinate systems, and I_1(·) and I_2(·) are the gray values of the corresponding pixels.

S1-4. Envelop each object with a dual ellipsoid surface as the object-landmark map representation. First fit an inscribed ellipse to the object detection box in the image and obtain the dual matrix form of the ellipse. For view i, with its camera pose (R_i, t_i) and the camera intrinsic matrix K already known, the projected contour C_i* of the dual ellipsoid surface Q* in three-dimensional space satisfies, through the projection matrix P_i = K·[R_i  t_i], the following relation:

C_i* = λ_i P_i Q* P_iᵀ    (2)

In Eq. (2), the scalar λ_i indicates that the equation holds up to a scale factor.

S1-5. In the homogeneous coordinate system, the dual form Q* of an ellipsoid is a 4×4 symmetric matrix and the dual form C* of an ellipse is a 3×3 symmetric matrix, so Q* and C* have 10 and 6 independent elements respectively. Equation (2) is a quadratic form; expressing P_i through its quadratic-form operator, denoted B_i, linearizes the original equation into:

B_i q = λ_i c_i    (3)

where q ∈ R^10 stacks the independent elements of Q* and c_i ∈ R^6 stacks those of C_i*.

S1-6. For multiple views, stack several instances of Eq. (3) into one linear system. Solving this over-constrained system is a linear least-squares problem, solved via SVD for the variable q̂; restoring q̂ to a symmetric matrix recovers the dual form of the ellipsoid:

Q̂* = mat_sym(q̂)    (4)

This completes the reconstruction of the objects' enveloping ellipsoids and forms the global map of object landmarks, as shown in Fig. 2.

S2. Construct the online local map of object landmarks.

This step proceeds as in step S1: after preprocessing the visual image, perform object detection and data association. Then use the optical flow method for pose tracking, realizing a visual odometry function; the visual odometry fuses the optimized pose of the previous instant so as to eliminate accumulated error as far as possible. Then construct the local object-landmark map from multiple views, adopt a sliding-window strategy to control the scale of the local map, and apply a rigid-body transformation to the coordinate system of each object-landmark envelope surface so that it is expressed in the camera coordinate system, finally forming a scale-controlled local map of object landmarks tied to the camera coordinate system, as shown in Fig. 3.

S3. Match the local object-landmark map against the global object-landmark map, finally realizing the visual positioning of the mobile robot.

The specific process of this step is as follows:

Match the local object-landmark map against the global object-landmark map. Mathematically, this is a subgraph-matching problem. Represent the global map as a graph structure G = (V, E), where each node in V carries the three-dimensional coordinates and the label of a landmark i, and each edge in E represents the mutual three-dimensional relation between two landmarks. Likewise, represent the local map as a graph structure H = (V′, E′), with nodes V′ and edges E′.

The map-matching problem is then defined as follows: given a graph G = (V, E) with n nodes and a graph H = (V′, E′) with m nodes, seek a node correspondence between G and H:

X ∈ {0,1}^{n×m}

that maximizes the consistency of graph attributes and structure:

max_X  Σ_{i1∈V, i2∈V′} v(i1, i2) X_{i1 i2}  +  Σ_{(i1,j1)∈E, (i2,j2)∈E′} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2},   s.t.  X·1 ≤ 1,  Xᵀ·1 ≤ 1

Here v(i1, i2) is the consistency measure between node i1 of graph G and node i2 of graph H, derived from the three-dimensional scale consistency of the landmarks and the consistency of their semantic labels; d(i1 j1, i2 j2) is the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, derived from the consistency of the three-dimensional distances between landmarks within each map. Semantic information is used so that only nodes with the same label are matched across the two graphs, which reduces the number of optimization iterations; finally the optimal matching solution is obtained, determining the one-to-one correspondence of landmarks between the two maps.

After the local object-landmark map has been matched to the global object-landmark map, pose propagation is carried out with the three-dimensional landmark information of the global map as the reference, namely: matched global-map landmark 3D information → local-map landmark 3D information → the local map's reference coordinate system (the camera coordinate system), giving the transform T_wc ∈ SE(3) of the camera coordinate system in the world coordinate system. Finally the pose of the robot relative to the map's world coordinate system at the current instant is obtained:

T_wr = T_wc T_cr ∈ SE(3)

Here, T_cr ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system, and T_wc ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system. By the chain rule of transformation matrices, multiplying the two yields the transformation matrix T_wr ∈ SE(3) of the robot relative to the world coordinate system, i.e. the robot's localization pose.

This embodiment localizes the robot visually from object semantic landmarks, exploiting the viewpoint invariance of three-dimensional geometric structure; it removes the positioning system's restrictions on the robot's motion attitude, improves the robustness of the positioning system to external environmental conditions, and ultimately improves the autonomy and flexibility of the mobile robot.

The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; any change made according to the shape and principle of the invention shall fall within the protection scope of the invention.

Claims (4)

1. A mobile robot visual positioning method based on object semantic road signs is characterized by comprising the following steps:
s1, constructing an offline object landmark global map;
s2, constructing an online object road sign local map;
s3, performing mathematical expression on the object landmark local map and the object landmark global map by taking the graph structure as a mathematical model, taking a sub-map matching problem as a map matching mathematical equation, realizing registration estimation of the object landmark local map in the object landmark global map by using a sub-map matching solution result, and finally realizing visual positioning of the mobile robot;
the specific process of constructing the offline global map of object landmarks in step S1 is as follows:
s1-1, acquiring an input image, and preprocessing the input image;
s1-2, realizing multi-target object detection by using the trained convolutional neural network, finally outputting in the form of a 2D rectangular envelope frame: for all target objects in the image, outputting the category vector c_i and the position x_i, y_i and size information w_i, h_i of its envelope frame in the pixel coordinate system, i.e. {c_i, x_i, y_i, w_i, h_i, i ∈ N}, where w_i is the width and h_i is the height;
s1-3, performing pixel level registration on the current image and the previous frame image by using an optical flow method, and estimating the motion pose of the camera by minimizing the gray value cost function of the registered pixels between two frames:
ξ* = argmin_ξ Σ_i ‖ I_1(P_1 X_{i,1}) − I_2(P_2 exp(ξ^) X_{i,1}) ‖²    (1)
in equation (1), the (·)^ symbol transforms the 6-dimensional vector ξ = (ρ, φ) in the Lie algebra se(3) into the 4×4 matrix form ξ^ = [ φ^  ρ ; 0ᵀ  0 ], where φ^ is the 3×3 skew-symmetric matrix of φ; exp(·) is the exponential map of the Lie algebra se(3), i.e. exp(ξ^) ∈ SE(3) is the rigid motion pose of adjacent frames, and SE(3) is the special Euclidean group; P_1 and P_2 are the projection matrices of the corresponding cameras, X_{i,1} and X_{i,2} are the representations of the corresponding three-dimensional points in space in the corresponding camera coordinate systems, and I_1(·) and I_2(·) are the pixel gray values of the corresponding pixel points;
s1-4, enveloping the object with a dual ellipsoid surface as the object road sign map expression: first performing inscribed-ellipse fitting on the image object detection frame to obtain the dual matrix form of the ellipse; for view i, with its camera pose (R_i, t_i) and the camera intrinsic matrix K already obtained, the projected contour C_i* of the dual ellipsoid surface Q* in three-dimensional space satisfies, through the projection matrix P_i = K·[R_i  t_i], the following relationship:

C_i* = λ_i P_i Q* P_iᵀ    (2)
in formula (2), the scalar λ_i shows that the equation holds up to a scale factor;
s1-5, in the homogeneous coordinate system, the dual form Q* of the ellipsoid is a 4×4 symmetric matrix and the dual form C* of the ellipse is a 3×3 symmetric matrix, i.e. Q* and C* have 10 and 6 independent elements respectively; equation (2) is a quadratic-form equation; expressing P_i through its quadratic-form operator, denoted B_i, the original equation is linearly expressed as:

B_i q = λ_i c_i    (3)

where q ∈ R^10 stacks the independent elements of Q* and c_i ∈ R^6 stacks those of C_i*;
s1-6, for multiple views, stacking multiple equations (3) to construct a linear system; solving the over-constrained system is a linear least-squares problem, solved through SVD for the variable q̂; restoring q̂ into a symmetric matrix yields the dual form of the ellipsoid:

Q̂* = mat_sym(q̂)    (4)
and reconstructing an object envelope ellipsoid to form an object landmark global map.
2. The mobile robot visual positioning method based on object semantic road signs as recited in claim 1, wherein the preprocessing in step S1-1 comprises: filtering and denoising, grayscale conversion, and size scaling to adapt to the input dimension of the neural network.
3. The mobile robot visual positioning method based on object semantic road signs as recited in claim 1, wherein the step S2 comprises the following steps:
after the visual image is preprocessed, object detection and data association are carried out; then pose tracking is carried out by using an optical flow method, realizing a visual odometry function, the visual odometry fusing the optimized pose of the previous instant to eliminate the accumulated error to the maximum extent; then a local object road sign map is constructed by using multiple views, a sliding-window local-map scale-control strategy is adopted, and a rigid-body transformation is applied to the object road sign envelope-surface coordinate system so that it is expressed under the camera coordinate system, finally forming the scale-controllable local map of object road signs related to the camera coordinate system.
4. The mobile robot visual positioning method based on object semantic road signs as recited in claim 1, wherein the specific process of step S3 is as follows:
matching the object landmark local map and the object landmark global map, wherein the matching problem is mathematically a subgraph-matching problem; the object landmark global map is represented as a graph structure G = (V, E), where V are the nodes, each comprising the three-dimensional coordinates and the label of a landmark i, and E are the edges, each representing the mutual three-dimensional relationship between two landmarks; in the same way, the local map is represented as a graph structure H = (V′, E′), where V′ are the nodes and E′ the edges;
the map matching problem is specifically defined as: given a graph G = (V, E) with n nodes and a graph H = (V′, E′) with m nodes, node correspondences are sought between graph G and graph H:
X ∈ {0,1}^{n×m}
to maximize consistency of graph attributes and structure:
max_X  Σ_{i1∈V, i2∈V′} v(i1, i2) X_{i1 i2}  +  Σ_{(i1,j1)∈E, (i2,j2)∈E′} d(i1 j1, i2 j2) X_{i1 i2} X_{j1 j2},   s.t.  X·1 ≤ 1,  Xᵀ·1 ≤ 1
wherein v(i1, i2) represents the consistency measure between node i1 of graph G and node i2 of graph H, the measure derived from the three-dimensional scale consistency of the landmarks and the consistency of the semantic tags; d(i1 j1, i2 j2) represents the consistency measure between edge i1-j1 of graph G and edge i2-j2 of graph H, the measure derived from the three-dimensional positional distance consistency between different landmarks in the map; matching nodes with the same label in the two graphs by utilizing semantic information, reducing the number of optimization iterations, finally obtaining an optimal matching solution, and determining the one-to-one correspondence of the landmarks between the two maps;
after the matching of the object landmark local map to the object landmark global map is finished, pose propagation is carried out with the landmark three-dimensional information in the object landmark global map as the reference, namely: matched global-map landmark three-dimensional information → local-map landmark three-dimensional information → the local map's reference coordinate system, i.e. the transformation T_WC ∈ SE(3) of the camera coordinate system in the world coordinate system; finally the pose of the robot at the current moment relative to the map world coordinate system is obtained:
T_WR = T_WC T_CR ∈ SE(3)
in the formula, T_CR ∈ SE(3) is the transformation matrix of the robot relative to the camera coordinate system, and T_WC ∈ SE(3) is the transformation matrix of the camera coordinate system relative to the world coordinate system; according to the transformation-matrix chain rule, multiplying the two yields the transformation matrix T_WR ∈ SE(3) of the robot relative to the world coordinate system, i.e. the positioning attitude information of the robot.
CN202110411557.4A 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign Active CN113034584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411557.4A CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110411557.4A CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Publications (2)

Publication Number Publication Date
CN113034584A CN113034584A (en) 2021-06-25
CN113034584B true CN113034584B (en) 2022-08-30

Family

ID=76457968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110411557.4A Active CN113034584B (en) 2021-04-16 2021-04-16 Mobile robot visual positioning method based on object semantic road sign

Country Status (1)

Country Link
CN (1) CN113034584B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429487A (en) * 2021-12-29 2022-05-03 浙江大学 Robot visual feature matching and positioning method and device based on plane motion
CN115655262B (en) * 2022-12-26 2023-03-21 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device
CN115683129B (en) * 2023-01-04 2023-03-14 苏州尚同墨方智能科技有限公司 Long-term repositioning method and device based on high-definition map

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN106441319A (en) * 2016-09-23 2017-02-22 中国科学院合肥物质科学研究院 A system and method for generating a lane-level navigation map of an unmanned vehicle
CN111780771A (en) * 2020-05-12 2020-10-16 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462135B (en) * 2020-03-31 2023-04-21 华东理工大学 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN106441319A (en) * 2016-09-23 2017-02-22 中国科学院合肥物质科学研究院 A system and method for generating a lane-level navigation map of an unmanned vehicle
CN111780771A (en) * 2020-05-12 2020-10-16 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Local Feature Descriptor for Image Matching: A Survey; Li He et al.; IEEE Access; 2018-12-20; pp. 1-11 *
SLAM Method Based on Local Subgraph Matching; Ding Shuaihua et al.; Robot; 2009-07-31; vol. 31, no. 4; pp. 1-8 *

Also Published As

Publication number Publication date
CN113034584A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN113034584B (en) Mobile robot visual positioning method based on object semantic road sign
Banerjee et al. Online camera lidar fusion and object detection on hybrid data for autonomous driving
CN112258618A (en) Semantic mapping and localization method based on fusion of prior laser point cloud and depth map
AU2012314082B2 (en) Localising transportable apparatus
Chen et al. Recent advances in simultaneous localization and map-building using computer vision
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
CN111179344A (en) Efficient mobile robot SLAM system for repairing semantic information
Malis et al. A unified approach to visual tracking and servoing
CN118274849B (en) A method and system for positioning an intelligent handling robot based on multi-feature fusion
CN113570662A (en) System and method for 3D localization of landmarks from real world images
Yu et al. Accurate and robust visual localization system in large-scale appearance-changing environments
Canovas et al. Onboard dynamic RGB‐D simultaneous localization and mapping for mobile robot navigation
Liu et al. Hybrid metric-feature mapping based on camera and Lidar sensor fusion
CN119126147A (en) A real-time relative pose estimation method in a UAV-UAV collaborative system
Oishi et al. C ${}^{*} $: Cross-modal simultaneous tracking and rendering for 6-DOF monocular camera localization beyond modalities
CN113012191B (en) Laser mileage calculation method based on point cloud multi-view projection graph
Jaenal et al. Improving visual SLAM in car-navigated urban environments with appearance maps
CN118196205A (en) On-line self-calibration method and system for external parameters of vehicle-mounted camera
Kragic et al. 3-D vision for navigation and grasping
CN113673462A (en) Logistics AGV positioning method based on lane line
Qian et al. An improved ORB-SLAM2 in dynamic scene with instance segmentation
Liu et al. Map-less long-term localization in complex industrial environments
Zhang et al. Encode: a deep point cloud odometry network
Gans et al. Image based state estimation
Jordan et al. Kinematic model based visual odometry for differential drive vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Lin Xubin

Inventor after: He Li

Inventor after: Yang Yinen

Inventor after: Guan Yisheng

Inventor after: Zhang Hong

Inventor before: He Li

Inventor before: Lin Xubin

Inventor before: Yang Yinen

Inventor before: Guan Yisheng

Inventor before: Zhang Hong

GR01 Patent grant