CN114550491A - Airport underground parking space positioning and navigation method based on mobile phone - Google Patents
Airport underground parking space positioning and navigation method based on mobile phone
- Publication number: CN114550491A
- Application number: CN202210207709.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G08G1/144 — Traffic control systems for road vehicles: indication of available parking spaces on portable or mobile units, e.g. personal digital assistant [PDA]
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/048 — Neural networks: activation functions
- G06N3/08 — Neural networks: learning methods
- H04M1/72454 — Mobile-telephone user interfaces adapting functionality according to context-related or environment-related conditions
- H04M1/72457 — Mobile-telephone user interfaces adapting functionality according to geographic location
- Y02T10/40 — Climate-change mitigation in transportation: engine management systems
Abstract
A mobile-phone-based method for locating and navigating to a parking space in an airport underground car park uses only the phone's own camera and inertial sensors. When the car is parked, a visual positioning method based on scene coordinate regression localizes the parking space from an image taken by the phone camera; the algorithm is an end-to-end neural network that takes an RGB image as input and outputs the camera pose. When the user later returns to look for the car, a second photograph of the current surroundings yields the user's initial position, after which the inertial sensors dead-reckon the position in real time for navigation, with the camera pose used as a signal of opportunity to compensate the inertial sensors' accumulated error, finally guiding the user to the parking space. By fusing high-accuracy visual positioning based on scene coordinate regression with high-frequency inertial pose estimation, the invention achieves real-time, high-accuracy positioning and navigation.
Description
Technical Field
The invention belongs to the field of indoor positioning, and in particular relates to a mobile-phone-based method for locating and navigating underground parking spaces in airports.
Background
In recent years, with rapid economic development, the number of cars has grown greatly and car travel has become increasingly popular, making parking a basic need. To cope with the surging demand for parking spaces, underground car parks are continually being built and enlarged. When a driver parks, leaves for other activities, and later returns to a very large car park, finding the vehicle again can be very difficult, so positioning and navigation services are essential for large-scale car parks. This is especially true for airport underground car parks: drivers typically park a private car, fly away, and return only after a long time, by which point relocating the parking spot is even harder.
Most platforms currently on the market target outdoor positioning and navigation, and most of them are mature. Above-ground car parks are generally navigated with GPS: the system provides the position of the vehicle as well as that of the driver, and from the relative positions of the parking space and the driver it plans the best route back to the vehicle. In an underground car park, however, the satellite signal is usually very weak, so GPS positioning performs poorly or is entirely unavailable. Indoor navigation technology for underground car parks is still scarce in China, and most existing indoor positioning systems either require expensive equipment or deliver low accuracy. Applying phone-based indoor positioning to parking-space navigation in airport underground car parks is therefore highly necessary, and building the positioning and navigation on the ubiquitous smartphone makes it both widely deployable and feasible.
Summary of the Invention
In view of the above problems, the present invention aims to provide a mobile-phone-based method for locating and navigating to parking spaces in airport underground car parks, realizing indoor positioning and navigation in the underground car park with the low cost and accurate performance achievable on an ordinary smartphone.
The invention provides a mobile-phone-based method for locating and navigating underground parking spaces in an airport, characterized by the following specific steps.
S11: The image-acquisition unit of the phone captures an image, and coordinate regression is applied to the acquired RGB image with a differentiable RANSAC method to obtain a pose prediction for the image's centre point.
The differentiable RANSAC method comprises the following steps:
S11-1 Scene coordinate regression. A ResNet network predicts, for each pixel i of the 2-D RGB image, the scene coordinate y_i(ω) of the corresponding 3-D point in the car park, where ω are the parameters of the ResNet model. On top of these predictions the method recovers the camera pose (R, T), where R is a spatial rotation with 3 degrees of freedom and T a displacement with 3 degrees of freedom; f(ω) denotes the resulting mapping from the 2-D image to the 6-D camera pose. The learnable parameters ω are optimized by minimizing the expected pose loss l of the final estimate over the training set:

L(ω) = E_j [ l( f_j(ω), f* ) ]

where the expectation is taken over the hypothesis-selection distribution described in step S11-3.
where f* denotes the ground-truth pose of image I and f_j(ω) the selected pose hypothesis. To train the network by gradient descent, the loss is differentiated with respect to ω; since the hypothesis index j is drawn from the selection distribution P(j; ω, α), the partial derivative of the expectation above combines a score-function term with a direct term:

∂/∂ω E_j[ l ] = E_j [ l( f_j(ω), f* ) · ∂/∂ω log P(j; ω, α) + ∂/∂ω l( f_j(ω), f* ) ]
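The expected-loss objective and its score-function gradient can be checked numerically. The following is a minimal sketch, not the patent's implementation: it assumes a toy categorical selection distribution whose logits depend linearly on a single scalar parameter `w`, and verifies the analytic gradient against a central finite difference. All names and values are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_loss(w, losses, alpha=1.0):
    """E_{j ~ P(j; w)}[loss_j] with P = softmax(alpha * s_j(w)) and toy scores s_j(w) = w * j."""
    scores = w * np.arange(len(losses))
    return float(softmax(alpha * scores) @ losses)

def score_function_grad(w, losses, alpha=1.0):
    """d/dw E[loss] = E_j[ loss_j * d/dw log P(j; w) ]: the score-function term;
    here the per-hypothesis losses do not depend on w, so this is the whole gradient."""
    j = np.arange(len(losses))
    P = softmax(alpha * w * j)
    dlogP = alpha * j - P @ (alpha * j)      # d/dw log P_j for logits alpha * w * j
    return float(P @ (losses * dlogP))

losses = np.array([2.0, 0.5, 1.0, 3.0])      # toy per-hypothesis pose losses
w = 0.7
g = score_function_grad(w, losses)
eps = 1e-6
g_fd = (expected_loss(w + eps, losses) - expected_loss(w - eps, losses)) / (2 * eps)
```

The agreement of `g` and `g_fd` is the point: the expectation over a discrete selection is differentiable through the selection probabilities even though the selection itself is discrete.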
S11-2 Pose-hypothesis sampling. Each hypothesis f(ω) is generated from a subset of the image's 2-D-to-3-D correspondences, the size of which is the minimum number of mappings needed to compute a unique solution. This minimal subset is M_J with J = {j_1, …, j_n}, where n is the minimal subset size. Recovering a pose from regressed scene coordinates is a PnP problem, and four scene coordinates suffice to define a unique camera pose, so n = 4. Because any single random draw of four points may yield a wrong pose, 4-point image-to-scene subsets are sampled repeatedly, finally producing a pool of pose hypotheses f_j(ω); every generated hypothesis depends on the parameters ω.
S11-3 Selecting the best pose hypothesis. Since the hypotheses f(ω) differ in scene-coordinate-regression quality, an evaluation mechanism is needed to select the best one; this evaluation governs both the choice of the camera-pose hypothesis and the effect of the final refinement that produces the final estimate, as follows.
First, the reprojection error of pixel i under hypothesis f(ω) is defined as

e_i(f, ω) = || C f⁻¹ y_i(ω) − p_i ||

where p_i is the image coordinate (x_i, y_i) of pixel i, C is the camera projection matrix, and pixel i is counted as an inlier k_i if e_i < τ, with τ the inlier error threshold.
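The reprojection error above can be sketched in a few lines of NumPy. This is an illustration only: the pinhole intrinsic matrix, the identity pose, and the scene points are invented values, not from the patent.

```python
import numpy as np

def reprojection_error(C, f, y, p):
    """e_i(f, w) = || C f^{-1} y_i - p_i || in the document's notation.
    C: 3x3 intrinsics, f: 4x4 camera pose, y: Nx3 scene coordinates, p: Nx2 pixels."""
    f_inv = np.linalg.inv(f)                       # scene -> camera transform
    y_h = np.hstack([y, np.ones((len(y), 1))])     # homogeneous scene points
    cam = (f_inv @ y_h.T).T[:, :3]                 # points in the camera frame
    proj = (C @ cam.T).T
    px = proj[:, :2] / proj[:, 2:3]                # perspective divide
    return np.linalg.norm(px - p, axis=1)

# hypothetical pinhole camera, identity pose, two scene points in front of the camera
C = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
f = np.eye(4)
y = np.array([[0.0, 0.0, 2.0], [0.1, -0.1, 3.0]])
proj = (C @ y.T).T
p_exact = proj[:, :2] / proj[:, 2:3]               # pixels that reproject exactly
errors = reprojection_error(C, f, y, p_exact)
```

With exactly reprojecting pixels the error is zero; shifting the pixels raises it accordingly, which is the quantity the inlier threshold τ is applied to.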
Scene-coordinate-regression pipelines score a hypothesis by counting its inliers k_i. To enable end-to-end training of the neural network, the hard count is replaced by a differentiable function built from sigmoids:

s(f, ω) = Σ_i sig( β (τ − e_i(f, ω)) )

where e_i is the reprojection error and the hyper-parameter β controls the softness of the sigmoid.
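The soft inlier score can be sketched as follows; the error values and the settings of `tau` and `beta` are illustrative assumptions, chosen so the soft score visibly approximates the hard count.

```python
import numpy as np

def soft_inlier_score(errors, tau, beta):
    """s(f, w) = sum_i sigmoid(beta * (tau - e_i)): a differentiable stand-in
    for the hard inlier count |{i : e_i < tau}|."""
    return float(np.sum(1.0 / (1.0 + np.exp(-beta * (tau - errors)))))

errors = np.array([1.0, 2.0, 50.0, 3.0, 100.0])   # toy reprojection errors in pixels
tau, beta = 10.0, 0.5
hard = int(np.sum(errors < tau))                   # hard inlier count
soft = soft_inlier_score(errors, tau, beta)        # differentiable approximation
```

Increasing β sharpens the sigmoid, pushing the soft score toward the hard count; small β keeps the score smooth, which is what makes gradients through it useful.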
The evaluation function scores each hypothesis by its consistency with all predicted scene coordinates. The hypothesis f_j(ω) with index j is selected according to a probability distribution P(j; ω, α) derived from the scores, so hypotheses with high scores are more likely to be chosen; the final hypothesis is drawn from the softmax distribution

P(j; ω, α) = exp( α s(f_j, ω) ) / Σ_k exp( α s(f_k, ω) )

where s(f_j, ω) is the soft inlier score of hypothesis f_j.
The magnitude of the inlier-count scores varies greatly with the difficulty of the scene, often by orders of magnitude between environments. Keeping the scores in a reasonable range is important for obtaining a broad distribution P(j; ω, α), which in turn is important for stable end-to-end training. Manually setting the hyper-parameter α for every scene is a tedious task, so α is adapted automatically during end-to-end training by targeting the entropy of the selection distribution:

S(α) = − Σ_j P(j; ω, α) log P(j; ω, α)

During end-to-end training, α is determined by gradient descent on argmin_α | S(α) − S* |; the target entropy S* is established in the first end-to-end training iterations and then held stable throughout the process.
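Softmax hypothesis selection and the entropy-based choice of α can be illustrated with a toy score vector. The patent determines α by gradient descent on |S(α) − S*|; this sketch substitutes a coarse grid scan, which is enough to show the mechanism. All scores and the entropy target are invented values.

```python
import numpy as np

def hypothesis_distribution(scores, alpha):
    """P(j; alpha) = softmax(alpha * s_j) over the hypothesis scores s_j."""
    z = alpha * np.asarray(scores, dtype=float)
    z = z - z.max()                     # numerical stabilisation
    e = np.exp(z)
    return e / e.sum()

def entropy(P):
    return float(-np.sum(P * np.log(P)))

def fit_alpha(scores, target_entropy):
    """Pick alpha so the selection entropy S(alpha) is closest to the target S*
    (a grid scan standing in for the patent's gradient descent)."""
    grid = np.linspace(0.01, 5.0, 500)
    return float(min(grid, key=lambda a: abs(entropy(hypothesis_distribution(scores, a)) - target_entropy)))

scores = np.array([12.0, 30.0, 25.0, 5.0])   # toy soft inlier scores, one per hypothesis
S_target = 1.0                                # desired selection entropy in nats
alpha = fit_alpha(scores, S_target)
P = hypothesis_distribution(scores, alpha)
```

Smaller α flattens P toward uniform (entropy ln 4 here); larger α concentrates it on the best-scoring hypothesis. Targeting a fixed entropy keeps the distribution usefully broad regardless of the absolute score scale.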
S11-4 Refining the best pose hypothesis. The refinement function R is an iterative procedure that alternates between determining the inlier pixels with the current pose estimate and re-optimizing the pose estimate on those inliers. To use this iterative procedure within a neural network, end-to-end training must be possible. To improve the generalization and accuracy of the trained model, training proceeds in two steps: the first applies the MAML method, splitting training into an inner loop and an outer loop so that the model starts from good initial parameters; the second makes the refinement procedure differentiable. The neural network model then outputs the final pose R(y_j(ω)).
S21: The current user's pose is obtained by scene coordinate regression on an image taken with the phone. The differentiable-RANSAC coordinate-regression method is highly accurate, with pose errors within 5 cm and 5°; the position and attitude obtained from the image serve as the current initial position.
S22: The data of the phone's built-in IMU are then read, and the pose at the current instant is obtained by propagating from the initial position and attitude, i.e. the attitude data of the previous instant.
S23: While the IMU updates the pose during navigation, the pose obtained by scene coordinate regression on a new image is used as a fresh initial coordinate to re-initialize the IMU's iterative update, so that the high-accuracy visual positioning eliminates the inertial sensor's accumulated error.
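The fusion idea of steps S21–S23, drifting dead reckoning periodically reset by a visual fix, can be shown with a one-dimensional toy model. The velocity bias, step rate, and the 2 cm "visual fix" accuracy (loosely inspired by the sub-5-cm figure in S21) are invented for illustration; a real system would integrate full accelerometer and gyroscope data.

```python
def dead_reckon(pos, vel, dt):
    """One propagation step of 1-D dead reckoning (a stand-in for IMU integration;
    the point is only how a constant bias accumulates into position error)."""
    return pos + vel * dt

dt, steps = 0.01, 1000
true_pos = est_no_fix = est_with_fix = 0.0
for k in range(steps):
    true_pos = dead_reckon(true_pos, 1.00, dt)        # true velocity 1.00 m/s
    est_no_fix = dead_reckon(est_no_fix, 1.05, dt)    # IMU velocity biased by +0.05 m/s
    est_with_fix = dead_reckon(est_with_fix, 1.05, dt)
    if (k + 1) % 100 == 0:
        # once per simulated second: a visual fix (assumed ~2 cm accurate)
        # re-initialises the dead-reckoned position, as in step S23
        est_with_fix = true_pos + 0.02

err_no_fix = abs(est_no_fix - true_pos)       # drift grows without bound
err_with_fix = abs(est_with_fix - true_pos)   # bounded by fix accuracy + 1 s of drift
```

The uncorrected estimate drifts by bias × time, while the corrected one stays bounded by the visual-fix accuracy plus one interval of drift, which is exactly why the camera pose is used as a signal of opportunity.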
As a further improvement of the invention, the first step of the training process in step S11-4 uses the MAML method to divide training into an inner loop and an outer loop, specifically as follows.
For model initialization, MAML is introduced to initialize the model parameters so that the model starts with good initial performance. The training process of the neural network is divided into an inner loop, which trains the basic model function, and an outer loop, which improves the model's generalization; the training set is likewise split into two parts, one for each loop.
When a 3-D model of the scene is available, the inner-loop training targets are initialized with rendered ground-truth scene coordinates; otherwise approximate ground truth is initialized heuristically. The inner-loop optimization minimizes the distance between predicted and target scene coordinates:

L(f_ω) = Σ_i || y_i(ω) − y_i* ||

where y_i* is the rendered or heuristic target scene coordinate of pixel i.
The parameter iteration of the inner loop is expressed as

ω' = ω − μ ∇_ω L(f_ω)

where ω' are the optimal parameters after the inner-loop iteration, ω are the initialization parameters, μ is the learning rate of the inner-loop training process, and ∇_ω L(f_ω) is the gradient of the inner loop.
The outer loop uses the reprojection error computed with the ground-truth pose; when the heuristic initialization is used in the inner-loop step, this effectively recovers the correct depth of the scene points. The outer-loop optimization is therefore:

L(f_ω') = Σ_i e_i( f*, ω' )

where f* is the ground-truth pose.
As for the transfer of parameters between the inner and outer loops: before sampling the next batch of tasks, a meta-update (meta-optimization) is performed. The inner-loop training finds the optimal parameters ω'; the gradient with respect to each inner loop's ω' is then computed and used, via a gradient update, to update the shared initialization ω. This drives the initialization toward parameters already close to the target task, so that training the next batch of tasks does not require many gradient steps.
The whole process is expressed as

ω ← ω − η ∇_ω Σ_t L( f_{ω'_t} )

where ω are the initialization parameters, η is the learning-rate hyper-parameter of the outer loop, and ∇_ω Σ_t L(f_{ω'_t}) is the outer-loop gradient taken through the per-task inner-loop parameters ω'_t.
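The inner/outer-loop structure can be sketched on toy quadratic task losses, where the meta-gradient through one inner step has a closed form via the chain rule. The tasks, learning rates, and iteration count are all illustrative choices, not the patent's values.

```python
def inner_update(w, c, mu):
    """One inner-loop gradient step on a toy task loss L_c(w) = (w - c)^2:
    w' = w - mu * dL/dw."""
    return w - mu * 2.0 * (w - c)

def meta_gradient(w, c, mu):
    """Gradient of the post-adaptation loss L_c(inner_update(w)) with respect to
    the shared initialisation w, differentiating through the inner step."""
    w_prime = inner_update(w, c, mu)
    return 2.0 * (w_prime - c) * (1.0 - 2.0 * mu)   # chain rule: dw'/dw = 1 - 2*mu

tasks = [-1.0, 0.5, 2.0, 3.5]    # per-task optima c_t (toy values)
w, mu, eta = 10.0, 0.1, 0.05     # initialisation, inner and outer learning rates
for _ in range(200):
    g = sum(meta_gradient(w, c, mu) for c in tasks)
    w -= eta * g                 # meta-update of the shared initialisation
```

For these quadratic tasks the meta-objective is minimized at the mean of the task optima (1.25 here), so the outer loop drives the initialisation to a point from which every task is reachable in few inner steps, which is the stated purpose of the MAML step.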
For the second training step, the model is trained end to end, which requires every stage, including pose optimization, to be differentiable. For the hypothesis-refinement stage, the pose refinement is

R(y) = argmin_f Σ_{i ∈ inliers} e_i( f, ω )
Since the argmin function is not differentiable, the Gauss–Newton method is used to obtain the derivative of the refinement function:

∂R/∂y ≈ − ( J_eᵀ J_e )⁻¹ J_eᵀ ∂e/∂y

where J_e is the Jacobian matrix containing the partial derivatives ∂e_i/∂f, evaluated at the converged refinement output h_o = R_{t=∞}(y); this finally enables end-to-end training.
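The Gauss–Newton update underlying the refinement derivative can be illustrated on a small nonlinear least-squares problem. The exponential model, data, and starting point below are toy choices; the point is only the update h ← h − (J_eᵀJ_e)⁻¹ J_eᵀ r itself.

```python
import numpy as np

def gauss_newton(residual_fn, jac_fn, h0, iters=30):
    """Gauss-Newton iteration h <- h - (J^T J)^{-1} J^T r: the linearised update
    whose fixed point is differentiated in the refinement-function derivative."""
    h = np.array(h0, dtype=float)
    for _ in range(iters):
        r = residual_fn(h)
        J = jac_fn(h)
        h = h - np.linalg.solve(J.T @ J, J.T @ r)
    return h

# toy problem: fit y = exp(a * x) to noiseless data generated with a = 0.5
x = np.linspace(0.0, 2.0, 20)
y = np.exp(0.5 * x)
residual = lambda h: np.exp(h[0] * x) - y                     # r_i(h)
jacobian = lambda h: (x * np.exp(h[0] * x)).reshape(-1, 1)    # dr_i/dh
h_star = gauss_newton(residual, jacobian, [0.3])
```

On a zero-residual problem like this, Gauss–Newton converges to the true parameter; in the patent the same linearisation is what makes the converged refinement output differentiable with respect to the scene coordinates.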
As a further improvement of the present invention, each hypothesis y(ω) depends on the parameters of the corresponding scene coordinates.
As a further improvement of the present invention, the hyper-parameter α controls the width of the distribution; choosing it appropriately aids end-to-end learning, since it reduces the differences between hypothesis scores so that the gradient values do not drift too far apart.
Aiming at the low accuracy or high cost of current positioning and navigation systems in the indoor environment of underground car parks, the invention proposes a mobile-phone-based positioning and navigation method for airport underground parking spaces. The method is applied in two stages. In the first, when the driver parks, the image captured with the phone is localized by the differentiable-RANSAC visual positioning method, and the resulting vehicle position becomes the destination coordinate for subsequent navigation. In the second, when the driver re-enters the underground car park to find the vehicle, a new phone image is first visually localized to give the initial navigation coordinate, which also serves as the initial coordinate of the phone's inertial sensor (IMU); the IMU then continuously propagates the position to navigate to the destination coordinate.
For the indoor positioning and navigation problem, the invention proposes a method fusing differentiable-RANSAC visual positioning with inertial positioning, combining high-accuracy visual fixes with high-frequency inertial updates. It effectively solves the low accuracy of GPS-denied indoor positioning, runs on an ordinary phone, and can be widely applied to parking-space navigation services in airport underground car parks.
Description of the Drawings
Figure 1 is the overall flow chart of the mobile-phone-based airport underground parking-space positioning and navigation method of the invention;
Figure 2 is a schematic diagram of the differentiable-RANSAC scene-regression visual localization algorithm.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific embodiments.
The mobile-phone-based airport underground parking-space positioning and navigation method of the invention, as shown in Figure 1, is an indoor positioning method fusing vision and inertia, and comprises the following steps.
S11: The image-acquisition unit of the phone captures an image, and coordinate regression is applied to the acquired RGB image with the differentiable RANSAC method to obtain a pose prediction for the image's centre point.
Figure 2 shows the differentiable RANSAC method; its concrete implementation comprises the following steps.
S11-1 Scene coordinate regression. A ResNet network predicts, for each pixel i of the 2-D RGB image, the scene coordinate y_i(ω) of the corresponding 3-D point in the car park, where ω are the parameters of the ResNet model. The model's main role is to take the 2-D pixels (x_i, y_i) to the pose (R, T) in 3-D scene coordinates, where R is a spatial rotation with 3 degrees of freedom and T a displacement with 3 degrees of freedom; f(ω) denotes the mapping from the 2-D image to the 6-D camera pose. The learnable parameters ω are optimized by minimizing the expected pose loss l of the final estimate over the training set:

L(ω) = E_j [ l( f_j(ω), f* ) ]

where f* denotes the ground-truth pose of image I. To train the network by gradient descent, we differentiate the loss with respect to ω; since the hypothesis index j is drawn from the selection distribution P(j; ω, α), the partial derivative is:

∂/∂ω E_j[ l ] = E_j [ l( f_j(ω), f* ) · ∂/∂ω log P(j; ω, α) + ∂/∂ω l( f_j(ω), f* ) ]
S11-2 Pose-hypothesis sampling. Each hypothesis f(ω) is generated from a subset of the image's correspondences, the size of which is the minimum number of mappings needed to compute a unique solution; this minimal subset is M_J with J = {j_1, …, j_n}, where n is the minimal subset size. Recovering a pose from the regressed scene coordinates is a PnP problem, and four scene coordinates suffice to define a unique camera pose, so n = 4. Since any single random draw of four points may yield a wrong pose, 4-point image-to-scene subsets are sampled repeatedly, finally producing a pool of pose hypotheses f_j(ω), each of which depends on the parameters ω.
S11-3 Selecting the best pose hypothesis. Since the hypotheses f(ω) differ in scene-coordinate-regression quality, an evaluation mechanism is needed to select the best one; the evaluation governs both the choice of the camera-pose hypothesis and the effect of the final refinement that produces the final estimate. The invention first defines the reprojection error of pixel i under hypothesis f(ω) as

e_i(f, ω) = || C f⁻¹ y_i(ω) − p_i ||

where p_i is the image coordinate (x_i, y_i) of pixel i, C is the camera projection matrix, and pixel i is counted as an inlier k_i if e_i < τ, with τ the inlier error threshold.
The scene-coordinate-regression work of the invention scores a hypothesis by counting its inliers k_i. To achieve end-to-end training of the neural network, a differentiable function is constructed from sigmoids:

s(f, ω) = Σ_i sig( β (τ − e_i(f, ω)) )

where e_i is the reprojection error and the hyper-parameter β controls the softness of the sigmoid.
The evaluation function scores each hypothesis by its consistency with all predicted scene coordinates. The hypothesis f_j(ω) with index j is selected according to a probability distribution P(j; ω, α) derived from the scores; hypotheses with high scores are more likely to be chosen, and the final hypothesis is drawn from the softmax distribution

P(j; ω, α) = exp( α s(f_j, ω) ) / Σ_k exp( α s(f_k, ω) )

where s(f_j, ω) is the soft inlier score of hypothesis f_j.
内点计数分数的大小可以根据场景的难度而有很大的不同,通常在不同环境下有很大数量级的不同。将评价分数保持在合理的范围内对于拥有广泛的分布P(j;ω,α)很重要,对稳定端到端训练很重要。手动设置每个场景的超参数α是一项乏味的任务,因此我们在端到端训练中自动适应α,通过熵来对超参数α大小进行选择:The size of the inlier count scores can vary widely depending on the difficulty of the scene, often by a large order of magnitude across different environments. Keeping the evaluation scores within a reasonable range is important for having a wide distribution P(j;ω,α), which is important for stable end-to-end training. Manually setting the hyperparameter α for each scene is a tedious task, so we automatically adapt α in end-to-end training, using entropy to choose the size of the hyperparameter α:
During end-to-end training, α is determined by gradient descent on argmin_α |S(α) − S*|; the target entropy S* is established in the first end-to-end training iterations and kept stable throughout the process.
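A toy sketch of this adaptation: gradient descent on |S(α) − S*| drives the entropy of the softmax distribution toward the target S*. The finite-difference gradient and all numeric values here are illustrative assumptions; an actual implementation would backpropagate through the softmax:

```python
import numpy as np

def entropy_of_softmax(scores, alpha):
    """Shannon entropy S(alpha) of the softmax over alpha-scaled scores."""
    z = alpha * scores
    p = np.exp(z - z.max())
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def adapt_alpha(scores, target_entropy, alpha=1.0, lr=0.002, steps=3000):
    """Minimize |S(alpha) - S*| by gradient descent (finite differences)."""
    eps = 1e-4
    obj = lambda a: abs(entropy_of_softmax(scores, a) - target_entropy)
    for _ in range(steps):
        grad = (obj(alpha + eps) - obj(alpha - eps)) / (2 * eps)
        alpha = max(alpha - lr * grad, 1e-6)   # keep the scale positive
    return alpha

scores = np.array([30.0, 28.0, 10.0, 5.0])     # hypothetical hypothesis scores
target = 1.0                                   # target entropy S* (in nats)
alpha_star = adapt_alpha(scores, target)
```

A smaller α flattens the distribution (higher entropy); a larger α sharpens it, so there is a unique α matching any achievable target entropy.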
S11-4 Refine the optimal pose hypothesis. The refinement function R is an iterative process that alternates between determining inlier pixels using the current pose estimate and optimizing the pose estimate on those inlier pixels. If the iterative process is to use a neural network, end-to-end training must be realized. To improve the generalization and accuracy of the trained model, the training process is divided into two steps: the first step uses the MAML method to divide training into an inner loop and an outer loop so that the model starts from good initial parameters; the second step makes the refinement process differentiable, so that the neural network model can finally output the pose R(h_j(ω)).
The neural network of the present invention is trained end-to-end on pairs of RGB images and ground-truth poses, but doing so from scratch fails because the system quickly falls into a local minimum. We propose a new two-step training scheme in which each step has a different objective function. Depending on whether a 3D scene model is available, the first step initializes the network with rendered or approximated scene coordinates. The second training step improves the accuracy of the system, which is critical when no 3D model is provided for initialization.
In the first step of the training process, MAML is introduced to initialize the model parameters and give the model good initial performance. The training of the neural network is divided into an inner loop and an outer loop: the inner loop trains the basic model function, and the outer loop trains for better generalization. The training set is likewise split into two parts, one for each of the two loops.
When a 3D model is available, the training set of the inner loop is initialized with rendered ground-truth scene coordinates; otherwise, approximate ground truth is used for heuristic initialization. The inner-loop optimization is:
The specific parameter-update iteration of the inner loop can be expressed as:
ω' = ω − μ∇_ω L(f_ω)
where ω' is the optimal parameter of the inner-loop iteration, ω is the initialization parameter, μ is the learning rate of the inner-loop training process, and ∇_ω L(f_ω) denotes the gradient of the inner loop.
For the outer loop, we optimize the reprojection error computed with the ground-truth pose. If the heuristic initialization was used in the inner-loop step, this effectively recovers the correct depth of the scene points. The optimization process of the outer loop is therefore:
Regarding the transfer of parameters between the inner and outer loops: before sampling the next batch of tasks, we perform a meta-update (meta-optimization). That is, the optimal parameters ω' found by training the inner loop in the previous step are used to compute, for each inner loop, the gradient with respect to ω, and the randomly initialized parameters ω are updated by gradient descent. This moves the random initialization toward initial parameters closer to the target tasks, so that few gradient steps are needed when training the next batch of tasks. The whole process can be expressed as follows:
where ω is the initialization parameter, η is the learning-rate hyperparameter of the outer loop, and the gradient term is the outer-loop gradient with respect to the adapted parameters ω'.
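The inner-loop update ω' = ω − μ∇_ω L(f_ω) followed by the meta-update of ω can be sketched on a toy per-task quadratic loss. This is a first-order MAML simplification for illustration only, not the patent's network training:

```python
import numpy as np

def loss_grad(w, target):
    # toy task loss L(w) = 0.5 * ||w - target||^2, whose gradient is w - target
    return w - target

def maml_step(w, task_targets, mu=0.5, eta=0.1):
    """One meta-iteration: inner-loop adaptation per task, then a meta-update."""
    meta_grad = np.zeros_like(w)
    for t in task_targets:
        w_prime = w - mu * loss_grad(w, t)      # inner loop: w' = w - mu * grad
        # first-order MAML: evaluate the gradient at the adapted parameters w'
        meta_grad += loss_grad(w_prime, t)
    return w - eta * meta_grad / len(task_targets)  # outer loop: update w

w = np.zeros(2)                                 # random/zero initialization
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
for _ in range(200):
    w = maml_step(w, tasks)
```

For these quadratic tasks the meta-update converges to the mean of the task optima, i.e. an initialization from which every task is reachable in few gradient steps, which is exactly the role MAML plays in the first training step above.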
In the second training step, we want to train the model in an end-to-end fashion. This requires every stage to be differentiable, including the pose optimization. For refining the hypothesis, the pose-refinement process is:
Since the argmin function is not differentiable, the Gauss-Newton method is used to obtain the derivative of the refinement function:
where J_e is the Jacobian matrix of the reprojection residuals, containing their partial derivatives, with h_O = R_{t=∞}(y); with this, end-to-end training can finally be achieved.
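The role of the Jacobian J_e in the Gauss-Newton iteration can be illustrated on a small least-squares problem, where each step solves the normal equations (J_e^T J_e) Δh = J_e^T e. This is a generic sketch of the Gauss-Newton machinery, not the patent's pose-refinement code:

```python
import numpy as np

def gauss_newton(residual, jacobian, h0, iters=20):
    """Minimize ||e(h)||^2 by Gauss-Newton: h <- h - (Je^T Je)^-1 Je^T e(h)."""
    h = h0.astype(float)
    for _ in range(iters):
        e = residual(h)
        Je = jacobian(h)                   # Jacobian of the residuals w.r.t. h
        step = np.linalg.solve(Je.T @ Je, Je.T @ e)
        h = h - step
    return h

# toy residuals e_i(h) = h[0]*x_i + h[1] - y_i (a line fit), exactly solvable
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
residual = lambda h: h[0] * x + h[1] - y
jacobian = lambda h: np.stack([x, np.ones_like(x)], axis=1)
h = gauss_newton(residual, jacobian, np.zeros(2))
```

Because the residuals here are linear in h, a single Gauss-Newton step already reaches the exact minimizer (2, 1); for the nonlinear reprojection residuals of the pose, the iteration converges to the refined pose instead.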
S21 obtains the pose of the current user by performing scene-coordinate regression on an image captured by the mobile phone. This differentiable-RANSAC coordinate-regression method is highly accurate, with pose error within 5 cm and 5°, and the position and attitude obtained from the image serve as the current initial pose.
S22 reads the data of the phone's built-in IMU at the current moment and performs a dead-reckoning update from the obtained initial pose, i.e. the pose of the previous moment, to obtain the pose of the current moment.
S23, during pose updating and navigation with the IMU, takes the pose obtained by scene-coordinate regression of a new image as a new initial coordinate, re-initializing the IMU's iterative update so that high-accuracy visual positioning eliminates the accumulated error of the inertial sensor.
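Steps S21 to S23 can be illustrated with a 1-D simulation: the IMU dead-reckons between camera fixes, and each visual pose re-initializes the integration so the accumulated drift stays bounded. The bias, rates, and walking scenario are hypothetical, and real poses are of course 6-DoF:

```python
import numpy as np

def dead_reckon(pos, vel, accel_meas, dt):
    """One IMU dead-reckoning step: integrate the measured acceleration."""
    vel = vel + accel_meas * dt
    pos = pos + vel * dt
    return pos, vel

dt, bias = 0.01, 0.05          # 100 Hz IMU, constant accelerometer bias (m/s^2)
true_vel = 1.0                 # the person walks at a steady 1 m/s

drifts = {}
for use_visual_reset in (False, True):
    pos, vel, true_pos = 0.0, true_vel, 0.0
    for step in range(1, 2251):            # 22.5 s of walking
        true_pos += true_vel * dt
        # measured acceleration = true acceleration (0) + sensor bias
        pos, vel = dead_reckon(pos, vel, bias, dt)
        if use_visual_reset and step % 500 == 0:
            # S23: a camera fix (scene-coordinate regression) resets the state
            pos, vel = true_pos, true_vel
    drifts[use_visual_reset] = abs(pos - true_pos)

drift_without_reset = drifts[False]
drift_with_reset = drifts[True]
```

Without resets the bias-driven position error grows quadratically with time (over 10 m here), while a visual fix every 5 s keeps it below half a meter, which is the point of fusing the two sensors.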
The above is merely a preferred embodiment of the present invention and does not limit the present invention in any other form; any modification or equivalent change made according to the technical essence of the present invention still falls within the scope claimed by the present invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210207709.3A CN114550491A (en) | 2022-03-03 | 2022-03-03 | Airport underground parking space positioning and navigation method based on mobile phone |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114550491A true CN114550491A (en) | 2022-05-27 |
Family
ID=81662081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210207709.3A Pending CN114550491A (en) | 2022-03-03 | 2022-03-03 | Airport underground parking space positioning and navigation method based on mobile phone |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114550491A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115615446A (en) * | 2022-07-04 | 2023-01-17 | 东南大学 | A visual localization method for auxiliary airport unmanned vehicles based on meta-learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1361430A (en) * | 2000-12-23 | 2002-07-31 | 林清芳 | Enhanced motion body pisition and navigation method and system |
CN104680522A (en) * | 2015-02-09 | 2015-06-03 | 浙江大学 | Visual positioning method based on synchronous working of front and back cameras of smart phone |
CN106169247A (en) * | 2016-08-04 | 2016-11-30 | 上海交通大学 | The garage parking indoor positioning of view-based access control model and map and micro navigation system and method |
CN109029448A (en) * | 2018-06-28 | 2018-12-18 | 东南大学 | The IMU of monocular vision inertial positioning assists trace model |
CN110223348A (en) * | 2019-02-25 | 2019-09-10 | 湖南大学 | Robot scene adaptive bit orientation estimation method based on RGB-D camera |
Non-Patent Citations (1)
Title |
---|
ERIC BRACHMANN et al.: "DSAC - Differentiable RANSAC for Camera Localization", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109509230B (en) | A SLAM method applied to a multi-lens combined panoramic camera | |
CN112734765B (en) | Mobile robot positioning method, system and medium based on fusion of instance segmentation and multiple sensors | |
CN114005021B (en) | Laser vision fusion based unmanned inspection system and method for aquaculture workshop | |
CN111263960B (en) | Apparatus and method for updating high definition maps | |
CN102519481B (en) | Implementation method of binocular vision speedometer | |
CN111190981A (en) | Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium | |
CN107680133A (en) | A kind of mobile robot visual SLAM methods based on improvement closed loop detection algorithm | |
CN105809687A (en) | Monocular vision ranging method based on edge point information in image | |
CN103413352A (en) | 3D scene reconstruction method based on RGBD multi-sensor fusion | |
CN111368759B (en) | Monocular vision-based mobile robot semantic map construction system | |
CN110260866A (en) | A kind of robot localization and barrier-avoiding method of view-based access control model sensor | |
CN109883433B (en) | Vehicle localization method in structured environment based on 360-degree panoramic view | |
CN112857373B (en) | Energy-saving unmanned vehicle path navigation method capable of minimizing useless actions | |
CN118760191B (en) | A path optimization method and system for a mobile robot | |
WO2022228391A1 (en) | Terminal device positioning method and related device therefor | |
CN111460866B (en) | Lane line detection and driving control method and device and electronic equipment | |
EP4455875A1 (en) | Feature map generation method and apparatus, storage medium, and computer device | |
CN111161334A (en) | A deep learning-based semantic map construction method | |
CN117611809A (en) | Point cloud dynamic object filtering method based on camera laser radar fusion | |
CN114550491A (en) | Airport underground parking space positioning and navigation method based on mobile phone | |
CN113190564A (en) | Map updating system, method and device | |
CN119511332A (en) | Multimodal visual positioning method and system for UAV in GNSS-denied environment | |
CN115615446B (en) | Visual positioning method for assisting airport unmanned vehicle based on meta learning | |
CN116704032A (en) | Outdoor visual SLAM method based on monocular depth estimation network and GPS | |
CN115659836A (en) | A visual self-localization method for unmanned systems based on an end-to-end feature optimization model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20220527 |