CN115439743A - Method for accurately extracting visual SLAM static characteristics in parking scene - Google Patents
Method for accurately extracting visual SLAM static characteristics in a parking scene
- Publication number
- CN115439743A CN115439743A CN202211028947.4A CN202211028947A CN115439743A CN 115439743 A CN115439743 A CN 115439743A CN 202211028947 A CN202211028947 A CN 202211028947A CN 115439743 A CN115439743 A CN 115439743A
- Authority
- CN
- China
- Prior art keywords
- network
- image
- feature
- points
- feature points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/10 - Scenes; scene-specific elements: terrestrial scenes
- G06N3/08 - Neural networks: learning methods
- G06T7/73 - Image analysis: determining position or orientation of objects or cameras using feature-based methods
- G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; context analysis; selection of dictionaries
- G06V10/764 - Recognition or understanding using pattern recognition or machine learning: classification, e.g. of video objects
- G06V10/82 - Recognition or understanding using pattern recognition or machine learning: neural networks
- G06T2207/20081 - Special algorithmic details: training; learning
- G06T2207/20084 - Special algorithmic details: artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of visual SLAM and deep learning, and in particular relates to a method in which visual SLAM in a parking scene uses deep-learning object detection to remove dynamic feature points, so that static visual SLAM features can be extracted accurately in the parking scene and mapping can be completed reliably.
Background
Simultaneous localization and mapping (SLAM) uses a robot's own sensors to capture and process information about the surrounding environment, without any prior knowledge of that environment, and then builds a map while localizing the robot within it. When camera sensors are used to perceive the surroundings, the approach is called visual SLAM; thanks to their low cost and the rich information they collect, cameras have become one of the sensors most commonly used in modern SLAM research.
As visual SLAM technology has matured and developed, a number of excellent open-source frameworks such as ORB-SLAM2 and OpenVSLAM have emerged. A classic visual SLAM system consists of sensor-data input, front-end visual odometry, back-end optimization, loop-closure detection, and mapping modules. ORB-SLAM2 runs tracking, local mapping, and loop-closure detection in three parallel threads and uses the traditional ORB algorithm for feature extraction, which has been shown to be poorly robust to illumination changes. In recent years, feature-extraction algorithms based on deep learning have also appeared. Feature extraction plays a pivotal role in the whole visual SLAM system, because the extracted feature points must represent the scene well. Most current visual SLAM systems simply extract all feature points present in the current view, but feature points lying on dynamic objects such as vehicles and pedestrians may no longer be matchable the next time localization is attempted, causing localization to fail.
Summary of the Invention
In view of the shortcomings of traditional visual SLAM algorithms with respect to dynamic feature points and to the accuracy and robustness of feature points, the purpose of the present invention is to provide a method for accurately extracting static visual SLAM features in a parking scene. The scheme improves the accuracy of feature-point extraction, effectively weakens the influence of dynamic-object feature points on visual SLAM mapping and localization, and improves the reliability of visual SLAM.
To solve the above technical problems, the present invention provides a method for accurately extracting static visual SLAM features in a parking scene, comprising the following steps:
Step 1: capture an image of the parking-lot scene in front of the vehicle, preprocess the image, and feed it into an object-detection network to obtain the detection boxes of the target objects.
Step 2: filter the dynamic-object detection boxes output in Step 1 and turn them into masks, which are used together with the feature points extracted by SuperPoint to remove the feature points inside the dynamic-object boxes and to obtain keypoints and descriptors. The SuperPoint network consists of a shared encoder for keypoints and descriptors, a keypoint decoder, and a descriptor decoder: the shared encoder encodes the image into a feature map, the keypoint decoder obtains the coordinates of the keypoints in the image, and the descriptor decoder obtains the descriptor vector of each keypoint. The improvement to the SuperPoint network consists in replacing all convolutions in the encoder with depthwise separable convolutions. Object detection and feature extraction use multi-threading, so that object detection runs in parallel with feature extraction.
Step 3: if a mask represents a pedestrian, the SuperPoint network removes all feature points inside the mask; if it represents a car, the car detection regions of adjacent frames are compared: feature points in the non-overlapping parts of the two adjacent detection regions are retained, while feature points in the overlapping part are removed, yielding the filtered static feature points.
Step 4: use the SuperPoint network to extract features and discard the keypoints and descriptors inside the masks, use the remaining feature points for feature matching, continue with the tracking module of visual SLAM, compute the camera pose, and build the map, completing the whole SLAM pipeline.
In Step 1, the object-detection network is YOLOv5, and the YOLOv5 detection pipeline is as follows. A 608×608×3 RGB image is input and scaled to the input size of the network, and Mosaic data augmentation is applied: Mosaic randomly selects four images and scales, rotates, and arranges them into one new image, which not only greatly increases the number of images but also speeds up training. The Backbone module uses the CSPDarknet53 structure and the Focus structure to extract general features; these general features are fed into the Neck network to extract more diverse and robust features, passed through the CSP2_X and CBL structures, upsampled, and concatenated with the features output by the backbone, strengthening feature fusion. Finally, the output head uses CIoU_LOSS instead of the earlier GIoU_LOSS as the bounding-box loss function. The CIoU formula is
CIoU(B_pre, B_GT) = IoU(B_pre, B_GT) − ρ²(B_pre, B_GT)/c² − αv,
where CIoU takes the size ratio of the ground-truth box and the predicted box into account, ρ is the distance between the centers of the predicted and ground-truth boxes, c is the diagonal length of the smallest box enclosing both, v ∈ [0, 1] is a normalized measure of the difference between the aspect ratio of the predicted box and that of the corresponding ground-truth box, and α is a loss balance factor; the bounding-box loss is CIoU_LOSS = 1 − CIoU.
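The CIoU term can be computed directly from two axis-aligned boxes. Below is a minimal NumPy sketch of the computation described above; the (x1, y1, x2, y2) box format and the function name are illustrative assumptions, not details given in the patent.

```python
import numpy as np

def ciou(box_pred, box_gt, eps=1e-7):
    """CIoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection area
    ix1, iy1 = max(box_pred[0], box_gt[0]), max(box_pred[1], box_gt[1])
    ix2, iy2 = min(box_pred[2], box_gt[2]), min(box_pred[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    w_p, h_p = box_pred[2] - box_pred[0], box_pred[3] - box_pred[1]
    w_g, h_g = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    union = w_p * h_p + w_g * h_g - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers
    cx_p, cy_p = (box_pred[0] + box_pred[2]) / 2, (box_pred[1] + box_pred[3]) / 2
    cx_g, cy_g = (box_gt[0] + box_gt[2]) / 2, (box_gt[1] + box_gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2

    # Squared diagonal of the smallest enclosing box
    cw = max(box_pred[2], box_gt[2]) - min(box_pred[0], box_gt[0])
    ch = max(box_pred[3], box_gt[3]) - min(box_pred[1], box_gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and balance factor alpha
    v = (4 / np.pi ** 2) * (np.arctan(w_g / (h_g + eps)) - np.arctan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v  # the bounding-box loss is 1 - CIoU
```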
In Step 2, the SuperPoint network is trained before use in a self-supervised manner. First, a fully convolutional network, the Base Detector, is trained on a dataset of regular geometric shapes. The detection results of the Base Detector on unlabeled real images are then used as pseudo ground-truth keypoints. To make the pseudo ground-truth keypoints more robust and accurate, Homographic Adaptation is used: features are extracted from the unlabeled real images at different scales to generate pseudo-labels. Once the pseudo-labels have been generated, the real unlabeled images can be fed into the SuperPoint network for training. At the image-input stage, data-augmentation techniques such as flipping are applied.
The SuperPoint network consists of three parts: a shared encoder for keypoints and descriptors, a keypoint decoder, and a descriptor decoder. Further, in Step 2, SuperPoint detects keypoints and descriptors as follows:
An H×W×3 image frame is input and converted to a grayscale H×W×1 image; the image is then fed into the improved, more lightweight shared encoder, after which the spatial size is reduced to Hc = H/8, Wc = W/8.
The keypoint decoder performs a sub-pixel convolution: through a depth-to-space operation the input tensor is converted from H/8 × W/8 × 65 to H×W, and the final output is the probability that each pixel is a keypoint.
The descriptor decoder uses a convolutional network to obtain a semi-dense descriptor map, then uses bicubic interpolation to obtain the remaining descriptors, and finally applies L2 normalization to obtain descriptors of uniform length (H×W×D).
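As a reference for the two decoders, the sketch below shows how the keypoint head and the descriptor head described above could be decoded in PyTorch; the tensor shapes follow the text, while the function name and the use of `pixel_shuffle` for the depth-to-space step are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def decode_superpoint_heads(semi, coarse_desc):
    """semi: (B, 65, H/8, W/8) keypoint logits; coarse_desc: (B, D, H/8, W/8) descriptors."""
    # Keypoint head: softmax over the 65 channels, drop the "dustbin" channel,
    # then depth-to-space over 8x8 cells to get a full-resolution probability map.
    prob = F.softmax(semi, dim=1)[:, :-1]      # (B, 64, H/8, W/8)
    heatmap = F.pixel_shuffle(prob, 8)         # (B, 1, H, W): per-pixel keypoint probability

    # Descriptor head: bicubic upsampling of the semi-dense descriptors,
    # then L2 normalization so every pixel carries a unit-length D-dim descriptor.
    desc = F.interpolate(coarse_desc, scale_factor=8, mode="bicubic", align_corners=False)
    desc = F.normalize(desc, p=2, dim=1)       # (B, D, H, W)
    return heatmap, desc
```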
In Step 2, the improved SuperPoint shared encoder is used. The original SuperPoint encoder uses a VGG-style convolutional network, but its computation and number of trainable parameters are large, so the present invention replaces all convolutions in the encoder with depthwise separable convolutions. The ordinary convolution process is shown in Fig. 3: for an input image of size H×W×3 and an output feature map with m channels, an ordinary convolution kernel has 3·f·f·m parameters. Depthwise separable convolution (Fig. 4) consists of two consecutive stages: depthwise (channel-by-channel) convolution and pointwise convolution. In the depthwise stage each channel is convolved with its own kernel, so the convolution is carried out within a two-dimensional plane, producing an intermediate feature map; the kernel parameters of this stage amount to f·f·3. The intermediate feature map is then convolved pointwise with 1×1×3 kernels, which fuses the channel information and outputs the final m-channel feature map; this stage has 1·3·m parameters. The total number of parameters of the depthwise and pointwise kernels is therefore 3·(f·f + m), an order of magnitude fewer than the 3·f·f·m of a direct convolution, so time efficiency improves greatly; although fewer parameters are learned than with an ordinary convolution, the accuracy does not drop much.
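A depthwise separable block of the kind described above can be written in PyTorch as follows; this is a generic sketch, and the BatchNorm/ReLU after the pointwise convolution are common practice rather than something the patent specifies.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """f*f depthwise convolution per channel followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.depthwise(x)   # per-channel spatial filtering: in_ch * f * f weights
        x = self.pointwise(x)   # cross-channel mixing: in_ch * out_ch weights
        return self.act(self.bn(x))
```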
In Step 2, the loss function of the improved SuperPoint consists of two parts, a keypoint-extraction loss and a descriptor-detection loss:
L = L_p + λ·L_d,
where L_p is the keypoint loss, computed with a cross-entropy loss function, L_d is the descriptor loss, and λ is a balance factor.
In Step 2, object detection and feature extraction run as follows: using multi-threading, object detection is performed in parallel with feature extraction, so the tracking thread does not have to wait for the YOLOv5 detection results, which raises CPU utilization and improves runtime efficiency.
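A minimal sketch of running detection and extraction in parallel threads is shown below; `detector.detect` and `extractor.extract` are hypothetical interfaces standing in for the YOLOv5 and SuperPoint calls.

```python
from concurrent.futures import ThreadPoolExecutor

def process_frame(frame, detector, extractor):
    """Run object detection and feature extraction on the same frame in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        det_future = pool.submit(detector.detect, frame)       # dynamic-object boxes
        feat_future = pool.submit(extractor.extract, frame)    # keypoints + descriptors
        boxes = det_future.result()
        keypoints, descriptors = feat_future.result()
    return boxes, keypoints, descriptors
```

In a real system the executor (or two persistent worker threads) would be created once rather than per frame; the per-frame version above is only meant to show the parallel structure.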
In Step 3, the feature points inside the dynamic-object (vehicle and pedestrian) detection boxes from Steps 1 and 2 are removed. If all feature points inside every dynamic-object box were removed directly, matching could become difficult because too few feature points would remain. As shown in Fig. 7, let the feature-point sets of the dynamic-object boxes detected in two adjacent frames be D_n = {P_i^n, i = 1, ..., p} (region A in Fig. 7) and D_{n+1} = {P_i^{n+1}, i = 1, ..., q} (region B in Fig. 7), where P_i^n denotes the i-th feature point in frame n. The intersection of the boxes of the same dynamic object detected in the two adjacent frames is taken as the final dynamic-target region, i.e. D = D_n ∩ D_{n+1} (region C in Fig. 7), and the feature points in D form the final set of dynamic feature points. This reduces the probability of mistakenly removing static feature points and also retains some points that are only suspected to be dynamic; the remaining parts of regions A and B are retained, which increases the reliability of the tracking thread.
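The removal rule of this step can be sketched as follows; the (x, y) point format, the (x1, y1, x2, y2) box format, and the function names are assumptions for illustration.

```python
def box_intersection(b1, b2):
    """Intersection of two (x1, y1, x2, y2) boxes, or None if they do not overlap."""
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def filter_dynamic_points(keypoints, person_boxes, car_boxes_prev, car_boxes_curr):
    """Drop keypoints inside pedestrian boxes and inside the intersection D of
    car boxes detected in two adjacent frames; keep everything else."""
    def inside(pt, box):
        return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]

    removal_regions = list(person_boxes)          # pedestrian masks are removed outright
    for bp in car_boxes_prev:                     # cars: only the overlap of adjacent frames
        for bc in car_boxes_curr:
            inter = box_intersection(bp, bc)
            if inter is not None:
                removal_regions.append(inter)

    return [pt for pt in keypoints
            if not any(inside(pt, box) for box in removal_regions)]
```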
In Step 4, the camera pose is computed as follows. Image matching is performed with the feature points and descriptors filtered in the preceding steps, the RANSAC random sample consensus algorithm is used to reject mismatched feature points, and the matching relations are turned into a 2D-to-2D epipolar-geometry problem. Let x₁ and x₂ be the normalized coordinates of a pair of matched points in the two images, R the camera rotation matrix, and t the translation vector; then
x₂ = R·x₁ + t.
Left-multiplying both sides by x₂ᵀ·t^ (t^ being the skew-symmetric matrix of t) makes the left-hand side zero, because t^·x₂ is perpendicular to x₂ and t^·t = 0, which yields the epipolar constraint
x₂ᵀ · t^ · R · x₁ = 0.
The camera pose can then be obtained by minimizing the reprojection error. Let the essential matrix be E = t^R and the fundamental matrix F = K⁻ᵀ·E·K⁻¹, where K is the camera intrinsic matrix. Solving for the camera pose thus reduces to two steps: estimate the essential matrix E (or the fundamental matrix F); then recover R and t from E (or F).
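In practice the two steps above (robustly estimating E with RANSAC and recovering R and t) can be carried out with OpenCV, for example as in the sketch below; parameter values such as the RANSAC threshold are illustrative.

```python
import cv2
import numpy as np

def estimate_pose(pts1, pts2, K):
    """Estimate the relative camera pose from matched points in two images.
    pts1, pts2: (N, 2) float arrays of matched pixel coordinates; K: 3x3 intrinsics."""
    # RANSAC rejects mismatched pairs while fitting the essential matrix E = t^R
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K,
                                          method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose E and keep the (R, t) combination with points in front of both cameras
    n_inliers, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
    return R, t, n_inliers
```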
In Step 4, keyframe selection has a considerable influence on information redundancy and on the release of computing resources. If the system is in localization-only mode, local mapping is busy, or relocalization has just finished, no keyframe is inserted.
Compared with the prior art, the present invention can achieve at least the following beneficial effects:
The present invention uses the deep-learning-based SuperPoint network together with an object-detection network (such as YOLOv5) to perform keypoint extraction, descriptor extraction, and dynamic-object detection. Traditional schemes mostly use ORB or SURF for feature extraction, but their performance degrades when the parking-lot scene or the illumination changes markedly. The improved SuperPoint used in the present invention first makes the network model more lightweight, and ultimately makes feature extraction more robust to scene changes, with the extracted feature points distributed more evenly and reasonably.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the method for accurately extracting static visual SLAM features in a parking scene provided by an embodiment of the present invention.
Fig. 2 is the core flowchart of the improved lightweight SuperPoint provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of ordinary convolution.
Fig. 4 is a schematic diagram of depthwise separable convolution, in which (a) shows channel-by-channel (depthwise) convolution and (b) shows pointwise convolution.
Fig. 5 shows the detection results of YOLOv5 in a parking-lot scene provided by an embodiment of the present invention.
Fig. 6 shows the SuperPoint feature-extraction results provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of dynamic-feature filtering provided by an embodiment of the present invention (region C is the removed area).
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions are described clearly and completely below in conjunction with the present application.
Traditional visual SLAM feature extraction is based on the assumption of a static environment, but in a parking-lot environment a car is not always parked in every parking space, so it is necessary to remove the feature points of dynamic objects in the parking spaces to ensure that these dynamic features are not retained in the final map. Prior-art approaches to dynamic feature points generally rely on traditional feature-extraction algorithms such as ORB or SURF, but the performance of these features varies greatly across scene changes and is not very robust. The present invention uses the deep-learning-based SuperPoint network combined with the YOLOv5 algorithm to perform keypoint extraction, descriptor extraction, and dynamic-object detection, and improves SuperPoint to make the network model more lightweight, which ultimately makes feature extraction more robust to scene changes and the extracted feature points more evenly and reasonably distributed. The method provided by the present invention is described in detail below.
As shown in Figs. 1 to 6, the present invention discloses a method for accurately extracting static visual SLAM features in a parking scene, comprising the following steps:
Step 1: capture an image of the parking-lot scene in front of the vehicle, preprocess the image, and feed it into the object-detection network for object detection to obtain the detection boxes of the target objects.
In some embodiments of the present invention, the object-detection network is the YOLOv5 network model. It can be understood that other object-detection networks may also be used in other embodiments.
A monocular camera captures images of the parking-lot scene in front of the vehicle in real time. After image preprocessing (such as filtering and image enhancement), an H×W×3 RGB image (H and W are the numbers of pixels along the height and width of the image) is fed into the YOLOv5 network model. The input image is scaled to the input size of the network and Mosaic data augmentation is applied: Mosaic randomly selects four images and scales, rotates, and arranges them into one new image, which not only greatly increases the number of images but also speeds up training, serving as data augmentation. The backbone network extracts image features and generates feature maps; the Backbone module uses the CSPDarknet53 and Focus structures to extract general features. These general features are fed into the Neck network to extract more diverse and robust features, passed through the CSP2_X and CBL structures, upsampled, and concatenated with the features output by the backbone, strengthening feature fusion. Finally, the output head uses CIoU_LOSS instead of the earlier GIoU_LOSS as the bounding-box loss function. The CIoU formula is
CIoU(B_pre, B_GT) = IoU(B_pre, B_GT) − ρ²(B_pre, B_GT)/c² − αv,
v = (4/π²)·(arctan(w_GT/h_GT) − arctan(w/h))²,  α = v/((1 − IoU(B_pre, B_GT)) + v),
where CIoU takes the size ratio of the ground-truth box and the predicted box into account; CIoU(B_pre, B_GT) is the CIoU between the predicted box and the ground-truth box; IoU(B_pre, B_GT) is the intersection-over-union between the predicted box and the ground-truth box; B_pre is the predicted box; B_GT is the ground-truth detection box; ρ(B_pre, B_GT) is the distance between the centers of the predicted and ground-truth boxes; c is the diagonal length of the smallest box enclosing both; v ∈ [0, 1] measures the aspect-ratio similarity of the predicted box and the ground-truth box; w_GT and h_GT are the width and height of the ground-truth box; GT denotes the ground-truth box information; w and h are the width and height of the predicted box; and α is the loss balance factor.
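A simplified sketch of the Mosaic composition step mentioned above is given below; real YOLOv5 Mosaic also remaps the bounding-box labels and applies further random scaling, which is omitted here, and the gray padding value and region layout are assumptions.

```python
import random
import cv2
import numpy as np

def mosaic4(images, out_size=608):
    """Simplified Mosaic: tile four images around a random center to form one
    training image (bounding-box label remapping omitted for brevity)."""
    assert len(images) == 4
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray padding
    cx = random.randint(out_size // 4, 3 * out_size // 4)           # random mosaic center
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    # Target regions: top-left, top-right, bottom-left, bottom-right of the center
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        if h > 0 and w > 0:
            canvas[y1:y2, x1:x2] = cv2.resize(img, (w, h))  # resize each image into its cell
    return canvas
```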
Step 2: filter the dynamic-object detection boxes output in Step 1 and form masks, which are used together with the feature points extracted by SuperPoint to remove the feature points inside the dynamic-object detection boxes.
In some embodiments of the present invention, the dynamic objects in the detection boxes include vehicles, pedestrians, and a small number of animals.
At the same time, another thread carries out the feature-extraction process. The captured H×W×3 RGB image is converted to grayscale and fed into the lightweight SuperPoint network. SuperPoint is trained in a self-supervised manner: first, a fully convolutional network, the Base Detector, is trained on a dataset of regular geometric shapes; the detection results of the Base Detector on unlabeled real images are then used as pseudo ground-truth keypoints. To make the pseudo ground-truth keypoints more robust and accurate, Homographic Adaptation is used to extract features from the unlabeled real images at different scales and generate pseudo-labels. Once the pseudo-labels have been generated, the real unlabeled images can be put into the SuperPoint network for training. At the image-input stage, data-augmentation techniques such as flipping are applied.
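The Homographic Adaptation step can be sketched as follows: the Base Detector is run on several random warps of an unlabeled image, and its responses are warped back and averaged to form the pseudo-labels. Here `base_detector` and `sample_random_homography` are hypothetical placeholders, and the number of homographies and the threshold are illustrative.

```python
import cv2
import numpy as np

def homographic_adaptation(image, base_detector, num_homographies=50, threshold=0.015):
    """Aggregate Base Detector responses under random homographies to build a
    pseudo ground-truth keypoint map for an unlabeled image (simplified sketch)."""
    h, w = image.shape[:2]
    accumulated = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for _ in range(num_homographies):
        H = sample_random_homography(h, w)                  # hypothetical helper
        warped = cv2.warpPerspective(image, H, (w, h))
        heat = base_detector(warped).astype(np.float32)     # (h, w) keypoint probabilities
        # Warp the response (and a validity mask) back to the original frame and accumulate
        accumulated += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))
        counts += cv2.warpPerspective(np.ones_like(heat), np.linalg.inv(H), (w, h))
    mean_heat = accumulated / np.maximum(counts, 1e-6)
    return mean_heat > threshold                            # pseudo-label keypoint mask
```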
The SuperPoint network consists of a shared encoder for keypoints and descriptors, a keypoint decoder, and a descriptor decoder. The shared encoder encodes the image into a feature map, the keypoint decoder obtains the coordinates of the keypoints in the image, and the descriptor decoder obtains the descriptor vector of each keypoint.
Specifically, an H×W×3 image frame is input, converted to a grayscale H×W×1 image, and then fed into the improved, more lightweight shared encoder, in which all ordinary convolutions have been replaced with depthwise separable convolutions. For an ordinary convolution producing an output feature map with m channels, the kernel has 3·f·f·m parameters (f is the kernel size and m is the number of output channels); after switching to depthwise separable convolution the parameter count becomes 3·(f·f + m), an order of magnitude fewer than the 3·f·f·m of a direct convolution, so time efficiency improves greatly. Although fewer parameters are learned than with an ordinary convolution, the accuracy does not drop much.
In some embodiments of the present invention, the depthwise separable convolution comprises two consecutive stages: depthwise (channel-by-channel) convolution and pointwise convolution. In the depthwise stage each channel is convolved with its own kernel, so the convolution is carried out within a two-dimensional plane and an intermediate feature map is produced; the kernel parameters of this stage amount to f·f·3. The intermediate feature map is then convolved pointwise with 1×1×3 kernels, which fuses the channel information and finally outputs the m-channel feature map; this stage has 1·3·m parameters. The total number of parameters of the depthwise and pointwise kernels is therefore 3·(f·f + m), an order of magnitude fewer than the 3·f·f·m of a direct convolution, greatly improving time efficiency; although fewer parameters are learned than with an ordinary convolution, the accuracy does not drop much.
After the encoder, the input image size is reduced to Hc = H/8, Wc = W/8. The keypoint decoder performs a sub-pixel convolution: through a depth-to-space operation (moving data from the depth dimension to the spatial dimensions) the input tensor is converted from H/8 × W/8 × 65 to an H×W map; after a softmax operation and a reshape for dimension conversion, the final output is, for each pixel, the probability that it is a keypoint, expressed in vector form, and the locations whose probability passes the threshold are taken as keypoint coordinates. The descriptor decoder uses a convolutional network to obtain a semi-dense descriptor map, then uses bicubic interpolation to obtain the remaining descriptors, and finally applies L2 normalization to obtain descriptors of uniform length (H×W×D).
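The probability-threshold selection mentioned above can be sketched as follows, together with a simple non-maximum suppression so that nearby responses do not all become keypoints; the threshold value, the NMS radius, and the use of NMS itself are assumptions rather than details given in the patent.

```python
import numpy as np

def select_keypoints(heatmap, prob_threshold=0.015, nms_radius=4):
    """Pick pixels whose keypoint probability exceeds the threshold and keep only
    the strongest response inside each nms_radius neighborhood (greedy NMS)."""
    ys, xs = np.where(heatmap > prob_threshold)
    scores = heatmap[ys, xs]
    order = np.argsort(-scores)                 # strongest responses first
    kept, occupied = [], np.zeros_like(heatmap, dtype=bool)
    for idx in order:
        y, x = ys[idx], xs[idx]
        if not occupied[y, x]:
            kept.append((x, y, scores[idx]))
            y0, y1 = max(0, y - nms_radius), y + nms_radius + 1
            x0, x1 = max(0, x - nms_radius), x + nms_radius + 1
            occupied[y0:y1, x0:x1] = True       # suppress weaker neighbors
    return kept                                 # list of (x, y, probability)
```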
The loss function of the improved SuperPoint consists of two parts, a keypoint-extraction loss and a descriptor-detection loss:
L(X, X′, D, D′; Y, Y′, S) = L_p(X, Y) + L_p(X′, Y′) + λ·L_d(D, D′, S),
where L_p(X, Y) is the keypoint loss, computed with a cross-entropy loss function, and L_p(X′, Y′) is the keypoint loss of the flipped image; L_d is the descriptor loss; X denotes the response of the encoding network to the keypoints of the image, of size Hc × Wc × 65; X′ denotes the corresponding keypoint response of the flipped image; D denotes the descriptor response of the image after the encoding network; D′ denotes the descriptor response of the flipped image; Y denotes the keypoint coordinate labels; Y′ denotes the keypoint coordinate labels of the flipped image; S denotes the image pair formed by the original image and the flipped image; and λ is a balance factor.
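Under the assumption that the keypoint labels are given as the index of the active cell (0 to 64, with 64 as the "no keypoint" dustbin) for every 8×8 region, the total loss above can be sketched in PyTorch as follows; `descriptor_hinge_loss` is a hypothetical helper standing in for L_d, and the default λ is illustrative.

```python
import torch.nn.functional as F

def superpoint_loss(semi, semi_flipped, desc, desc_flipped,
                    labels, labels_flipped, correspondence, lam=0.0001):
    """Total loss = keypoint loss on the image, keypoint loss on its flipped copy,
    and a weighted descriptor loss relating the two views (simplified sketch)."""
    # Keypoint detector loss: cross-entropy over the 65 classes of each 8x8 cell.
    # semi: (B, 65, H/8, W/8) logits; labels: (B, H/8, W/8) LongTensor of cell indices.
    lp = F.cross_entropy(semi, labels)
    lp_flipped = F.cross_entropy(semi_flipped, labels_flipped)
    # Descriptor loss between corresponding cells of the two views (hypothetical helper).
    ld = descriptor_hinge_loss(desc, desc_flipped, correspondence)
    return lp + lp_flipped + lam * ld
```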
Step 3: if a mask represents a pedestrian, the SuperPoint network removes the feature points inside the mask; if it represents a car, the car detection regions of adjacent frames are compared, the feature points in the non-overlapping parts of the two adjacent detection regions are retained, and the feature points in the overlapping part are removed, yielding the filtered static feature points.
Object detection and deep feature extraction use multi-threading and run in parallel, so that object detection is performed at the same time as feature extraction; the tracking thread does not have to wait for the detection results of the object-detection network, which raises CPU utilization and improves runtime efficiency.
The feature points inside the dynamic-object (e.g. vehicle and pedestrian) detection boxes from Steps 1 and 2 are then removed. If the feature points inside every dynamic-object box were removed directly, matching could become difficult because too few feature points would remain. Therefore, in some embodiments of the present invention, the feature-point sets of the dynamic-object boxes detected in two adjacent frames are D_n = {P_i^n, i = 1, ..., p} (region A in Fig. 7) and D_{n+1} = {P_i^{n+1}, i = 1, ..., q} (region B in Fig. 7), where P_i^n denotes the i-th feature point in frame n, p is the total number of such feature points in frame n, and q is the total number in frame n+1. The intersection of the boxes of the same dynamic object detected in the two adjacent frames is taken as the final dynamic-target region, i.e. D = D_n ∩ D_{n+1} (region C in Fig. 7); the feature points in the set D are taken as the final dynamic feature points and are removed. This reduces the probability of mistakenly removing static feature points and also retains some points that are only suspected to be dynamic; the remaining parts of regions A and B are retained, which increases the reliability of the tracking thread. The filtered static feature points are saved for subsequent feature matching and pose computation.
Step 4: use the SuperPoint network to extract features and discard the keypoints and descriptors inside the masks, use the remaining feature points for feature matching, continue with the Tracking module of visual SLAM, compute the camera pose by minimizing the reprojection error, and build the map, completing the whole SLAM pipeline.
Image matching is performed with the feature points and descriptors filtered in the above process. In some embodiments of the present invention, the RANSAC random sample consensus algorithm is used to reject mismatched feature points, and the matching relations are turned into a 2D-to-2D epipolar-geometry problem. Let x₁ and x₂ be the normalized coordinates of a pair of matched points in the two images, R the camera rotation matrix, and t the translation vector; then
x₂ = R·x₁ + t.
Left-multiplying both sides by x₂ᵀ·t^ (t^ being the skew-symmetric matrix of t) makes the left-hand side zero, since t^·x₂ is perpendicular to x₂ and t^·t = 0, which gives the epipolar constraint
x₂ᵀ · t^ · R · x₁ = 0.
The camera pose can then be obtained by minimizing the reprojection error. Let the essential matrix be E = t^R and the fundamental matrix F = K⁻ᵀ·E·K⁻¹, where K is the camera intrinsic matrix, the superscript T denotes matrix transposition, t is the translation, and R is the rotation matrix. Solving for the camera pose can thus be divided into two steps: estimate the essential matrix E (or the fundamental matrix F); then recover R and t from E (or F). If the system is in localization-only mode, the local map is busy, or relocalization has just finished, no keyframe is inserted. If none of these three conditions holds and the number of inliers matched by the current frame exceeds the set threshold, the current frame is set as a keyframe and the tracking thread finishes; mapping and loop-closure detection then continue, finally completing the construction of the whole map.
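The keyframe rule described above can be captured in a small helper; the flag and threshold names are assumptions used only for illustration.

```python
def should_insert_keyframe(n_inliers, inlier_threshold,
                           localization_only, local_mapping_busy, just_relocalized):
    """Keyframe rule sketched from the text: never insert while in pure localization
    mode, while local mapping is busy, or right after relocalization; otherwise
    insert when the current frame has enough matched inliers."""
    if localization_only or local_mapping_busy or just_relocalized:
        return False
    return n_inliers > inlier_threshold
```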
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211028947.4A CN115439743A (en) | 2022-08-23 | 2022-08-23 | Method for accurately extracting visual SLAM static characteristics in parking scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211028947.4A CN115439743A (en) | 2022-08-23 | 2022-08-23 | Method for accurately extracting visual SLAM static characteristics in parking scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115439743A true CN115439743A (en) | 2022-12-06 |
Family
ID=84244129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211028947.4A Pending CN115439743A (en) | 2022-08-23 | 2022-08-23 | Method for accurately extracting visual SLAM static characteristics in parking scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115439743A (en) |
- 2022-08-23: Application CN202211028947.4A filed in China (CN); publication CN115439743A, status pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115049731A (en) * | 2022-06-17 | 2022-09-13 | 感知信息科技(浙江)有限责任公司 | Visual mapping and positioning method based on binocular camera |
CN116311082A (en) * | 2023-05-15 | 2023-06-23 | 广东电网有限责任公司湛江供电局 | Wearing detection method and system based on matching of key parts and images |
CN116630901A (en) * | 2023-07-24 | 2023-08-22 | 南京师范大学 | A Visual Odometry Method Based on Latent Graph Predictive Unsupervised Learning Framework |
CN116630901B (en) * | 2023-07-24 | 2023-09-22 | 南京师范大学 | Visual odometer method based on potential diagram prediction non-supervision learning framework |
CN117893693A (en) * | 2024-03-15 | 2024-04-16 | 南昌航空大学 | Dense SLAM three-dimensional scene reconstruction method and device |
CN117893693B (en) * | 2024-03-15 | 2024-05-28 | 南昌航空大学 | Dense SLAM three-dimensional scene reconstruction method and device |
CN118366130A (en) * | 2024-06-19 | 2024-07-19 | 深圳拜波赫技术有限公司 | Pedestrian glare protection and intelligent shadow area generation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115439743A (en) | Method for accurately extracting visual SLAM static characteristics in parking scene | |
CN113052210B (en) | Rapid low-light target detection method based on convolutional neural network | |
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN110544251A (en) | Dam crack detection method based on multi-transfer learning model fusion | |
CN108460403A (en) | The object detection method and system of multi-scale feature fusion in a kind of image | |
CN108427924A (en) | A kind of text recurrence detection method based on rotational sensitive feature | |
CN104463241A (en) | Vehicle type recognition method in intelligent transportation monitoring system | |
WO2023082784A1 (en) | Person re-identification method and apparatus based on local feature attention | |
CN113901961B (en) | Parking space detection method, device, equipment and storage medium | |
CN110008900B (en) | Method for extracting candidate target from visible light remote sensing image from region to target | |
CN112801182B (en) | A RGBT Target Tracking Method Based on Difficult Sample Perception | |
CN105225281B (en) | A kind of vehicle checking method | |
CN116279592A (en) | Method for dividing travelable area of unmanned logistics vehicle | |
CN107066916A (en) | Scene Semantics dividing method based on deconvolution neutral net | |
CN110334709A (en) | End-to-end multi-task deep learning-based license plate detection method | |
CN112560852A (en) | Single-stage target detection method with rotation adaptive capacity based on YOLOv3 network | |
CN112070174A (en) | Text detection method in natural scene based on deep learning | |
CN111680580A (en) | A recognition method, device, electronic device and storage medium for running a red light | |
CN113095371A (en) | Feature point matching method and system for three-dimensional reconstruction | |
CN112766056A (en) | Method and device for detecting lane line in low-light environment based on deep neural network | |
CN117058641A (en) | Panoramic driving perception method based on deep learning | |
CN118262090A (en) | A lightweight open set remote sensing target detection method based on LMSFA-YOLO | |
CN116935249A (en) | A small target detection method with three-dimensional feature enhancement in drone scenes | |
CN116883868A (en) | UAV intelligent cruise detection method based on adaptive image defogging | |
CN112446292A (en) | 2D image salient target detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |