
CN116168088B - Monocular side angle imaging vehicle scale estimation and speed estimation method - Google Patents

Monocular side angle imaging vehicle scale estimation and speed estimation method

Info

Publication number
CN116168088B
CN116168088B (application CN202310034513.3A)
Authority
CN
China
Prior art keywords
vehicle
point
pixel
target
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310034513.3A
Other languages
Chinese (zh)
Other versions
CN116168088A (en)
Inventor
李远玥
周智
翁黎竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Welltrans O&e Co ltd
Original Assignee
Welltrans O&e Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Welltrans O&e Co ltd filed Critical Welltrans O&e Co ltd
Priority to CN202310034513.3A priority Critical patent/CN116168088B/en
Publication of CN116168088A publication Critical patent/CN116168088A/en
Application granted granted Critical
Publication of CN116168088B publication Critical patent/CN116168088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30256Lane; Road marking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A vehicle scale estimation and speed estimation method for monocular side-angle imaging, with the following steps: S1, build a target detection model that detects vehicle targets, wheels, key points and lane lines; S2, perform target tracking to obtain the tracking sequence, detection box and classification result of each vehicle; S3, compute the coordinates of each wheel centre point projected onto the road surface to obtain axle information, and segment the lane lines into multiple lanes; S4, based on the tracking sequences of same-class targets from step S2 and their axle information from step S3, self-calibrate the corresponding targets and obtain the mapping between the pixel coordinate system and the real-world coordinate system for that class of target; S5, determine the target vehicle, obtain its tracking sequence, axle information at different positions and coordinate information, and, based on the mapping from step S4, obtain the final length estimate and speed estimate of the target vehicle. The invention calibrates automatically and reduces the complexity of manual operation.

Description

A vehicle scale estimation and speed estimation method based on monocular side-angle imaging

Technical Field

The invention belongs to the field of traffic technology, and in particular relates to a vehicle scale estimation and speed estimation method based on monocular side-angle imaging.

Background Art

Estimating vehicle length, width and travelling speed is an important topic in the transportation field. Commonly used approaches include 3D solutions, combined 3D/2D solutions and pure 2D solutions. Solutions that introduce 3D sensing usually incur high hardware costs. Pure 2D solutions usually require manual calibration: every time a new scene is installed or the camera pose is adjusted, recalibration is needed, which is a large amount of work.

Summary of the Invention

In view of the problems in the background art above, the purpose of the present invention is to provide a vehicle scale estimation and speed estimation method for monocular side-angle imaging that calibrates automatically, achieves high accuracy, and reduces the complexity of manual operation.

The technical solution adopted by the present invention is:

A vehicle scale estimation and speed estimation method for monocular side-angle imaging, with the following specific steps:

S1, build a target detection model that detects vehicle targets, wheels, key points and lane lines;

S2, perform target tracking to obtain the tracking sequence, detection box and classification result of each vehicle;

S3, compute the coordinates of each wheel centre point projected onto the road surface to obtain axle information, and segment the lane lines into multiple lanes;

S4, based on the tracking sequences of same-class targets from step S2 and their axle information from step S3, self-calibrate the corresponding targets and obtain the mapping between the pixel coordinate system and the real-world coordinate system for that class of target;

S5, determine the target vehicle, obtain its tracking sequence, axle information at different positions and coordinate information, and, based on the mapping from step S4, obtain the final length estimate and speed estimate of the target vehicle.

Further, the target detection model in step S1 is a hybrid-head model obtained by extending the yolov5 target detector. It comprises a backbone network followed by four output branches, each corresponding to an upsampling pyramid structure. Output branch one outputs axis-aligned target boxes and vehicle classes for vehicle target detection; output branch two outputs ellipses for wheel detection; output branch three outputs key points, used to locate key point 1, key point 2 and key point 3 of the vehicle's bounding rectangle projected onto the ground, where the line through key points 1 and 2 runs along the lane-line direction and the line through key points 2 and 3 is perpendicular to it; output branch four outputs a semantic segmentation map for detecting lane lines.

Further, output branches one, two and three are predicted on every frame, while output branch four is predicted at a set interval.

Further, the acquired data set is divided into a lane-line data set and a vehicle data set, which are labelled separately.

Further, the tracking sequence in step S2 is obtained using IoU-based tracking on the output of output branch one.

Further, the specific steps for calculating the coordinates of a wheel centre point projected onto the road surface in step S3 are:

S31, obtain straight line A through key point 1 and key point 2;

S32, compute the foot of the perpendicular from the wheel centre point to line A, i.e. the projected coordinates of the wheel centre.

Further, the specific steps of the self-calibration in step S4 are as follows:

S41, coordinate system setup. Each lane is bounded by two lane lines. The direction along the outer lane line is axis one; the direction perpendicular to the outer lane line is axis two; the intersection of the outer lane line with the right image border is the origin.

The mapping between camera coordinates and real-world coordinates is as follows (Formula 1):

where L denotes the distance from the camera focal point to the actual measurement point; L_real denotes the actual length at a given point in space; L_pixel denotes the pixel length of that point after imaging; f denotes the camera focal length; cellSize denotes the pixel (cell) size; f and cellSize are fixed values;

L is computed as follows (Formula 2):

where H0 denotes the perpendicular distance from the camera focal point to axis one, H1 the axis-two coordinate of the measurement point, R0 the perpendicular distance from the camera focal point to axis two, and R1 the axis-one coordinate of the measurement point; once the camera is fixed in place, H0 and R0 are fixed values.

Substituting Formula 2 into Formula 1 gives a new expression (Formula 3):

Let Ratio denote the conversion ratio from pixel coordinates to actual coordinates. From Formula 3 it can be seen that:

Ratio = f(H1, R1) (Formula 4)

Since H0, R0, f and cellSize are fixed values, Ratio can be expressed as a function of the variables H1 and R1;

To simplify the calculation, f(H1, R1) is approximated by a planar function; the simplified formula is:

Ratio = a*H1 + b*R1 + c (Formula 5)

where a, b and c are the parameters to be solved for;

S42, least-squares fitting requires introducing multiple sets of known points

{(H1_0, R1_0, Ratio_0), (H1_1, R1_1, Ratio_1), ......,

(H1_n, R1_n, Ratio_n)}, referred to as the fitting point set; each point is a triplet consisting of the point's axis-two pixel coordinate, its axis-one pixel coordinate, and its scale factor;

S43, determine the triplets. Through model detection and post-processing logic, the pixel coordinates of each vehicle target's axle centre can be obtained;

According to Formula 3, the actual wheelbase can be expressed as follows:

where L_real wheelbase is the vehicle's actual wheelbase, L_pixel wheelbase is the vehicle's pixel wheelbase, and dx is the differential element;

Formula 7 is approximated as follows:

Since Formula 7 and Formula 8 are very close, the error is negligible; Formula 8 is therefore decomposed to obtain the formula for computing the triplets:

where the pixel-level coordinate data H1, R1 and L_pixel wheelbase can be computed from the model output plus post-processing; L_real wheelbase is taken as 2500 mm;

S44, according to the triplet calculation of step S43 and based on the tracking sequences of the 200 reference passenger cars used for calibration, multiple triplets are computed, yielding the fitting point set of step S42;

S45, perform the fitting:

a. First fit: based on all triplets, fit the plane and compute one set of (a, b, c); assume the calibrated parameters are (a, b, c) = (a1, b1, c1); substituting into Formula 5 gives:

Ratio = a1*H1 + b1*R1 + c1 (Formula 11)

b. Reject triplets with large deviation

Based on Formula 11 and Formula 9, the deviation is computed as (Formula 12):

where pia denotes the deviation between the scale computed by Formula 11 and the scale from Formula 9. The deviations of all triplets are computed with Formula 12; the 20% of triplets with the largest deviation are deleted and the remaining 80% are kept;

c. Second fit: for the triplets remaining after step b, fit the plane and compute one set of (a, b, c); assume the calibrated parameters are (a, b, c) = (a2, b2, c2); substituting into Formula 5 gives:

Ratio = a2*H1 + b2*R1 + c2 (Formula 13)

At this point, self-calibration is complete.

Further, the final length estimate of the target vehicle in step S5 is computed as follows:

S51, obtain the tracking sequence of the target vehicle and compute the pixel-level length from key point 1 and key point 2 of the vehicle at different positions in the sequence;

S52, convert the pixel-level length into a spatial length based on Formula 13;

S53, average the multiple spatial lengths in the sequence to obtain the final length estimate of the vehicle.

Further, the target vehicle speed in step S5 is computed as follows:

S54, obtain the tracking sequence of the target vehicle and, from the axle centre points of the vehicle at different positions in the sequence, compute the projected ground coordinates of the centre of the foremost axle;

S55, compute the pixel-level distance between the ground-projected foremost-axle centre at the first position and at the last position;

S56, convert coordinates based on Formula 13 to obtain the actual distance between the first and last positions;

S57, compute the time difference between the imaging moments of the frames containing the first and last positions;

S58, estimate the vehicle speed as the actual first-to-last distance divided by the imaging time difference.

Further, the vehicle width and the number of vehicle axles can also be computed in step S5;

The vehicle width is computed as follows:

Compute the vehicle pixel width from key point 2 and key point 3;

Compute the midpoint of the segment between key point 2 and key point 3, then compute the intersections of the perpendicular from this midpoint to the left lane line with both lane lines; the pixel distance between the two intersection points represents the lane pixel width;

The vehicle width is then (vehicle pixel width / lane pixel width) * preconfigured lane width;

The number of vehicle axles is computed as follows:

Obtain the tracking sequence of the target vehicle and the number of axles detected for the vehicle at different positions in the sequence;

Using multi-frame voting, an axle count is returned as the final result only if it occurs in more than half of the tracking sequence and is greater than or equal to 2; otherwise, no axle detection result is returned.

Compared with the prior art, the significant advantages of the present invention include automatic calibration, high accuracy, and reduced complexity of manual operation.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of the hybrid-head model of the present invention;

FIG. 2 is a schematic diagram of the vehicle target box results of the present invention;

FIG. 3 is a schematic diagram of the wheel detection results of the present invention;

FIG. 4 is a schematic diagram of the key point results of the present invention;

FIG. 5 is a schematic diagram of the lane lines/lanes of the present invention;

FIG. 6 is a schematic diagram of the tracking sequence of a single vehicle of the present invention;

FIG. 7 is a schematic diagram of the wheel centre projection point of the present invention;

FIG. 8 is a schematic diagram of the lane lines/lanes of the present invention;

FIG. 9 is a schematic diagram of the coordinate system established by the present invention;

FIG. 10 is a schematic decomposition diagram of the length L of the present invention;

FIG. 11 is a schematic diagram of the scale at different coordinates of the present invention;

FIG. 12 is a schematic diagram of the sequential imaging of a single vehicle of the present invention;

FIG. 13 is a schematic diagram of the distance between the vehicle and the focal point of the present invention;

FIG. 14 is a schematic diagram of the centre points of the front and rear wheels of the present invention;

FIG. 15 is a schematic diagram of the triplet point set of the present invention.

Detailed Description

The present invention is further described below in conjunction with specific embodiments, but is not limited to these specific embodiments. Those skilled in the art will recognise that the present invention covers all alternatives, improvements and equivalents that may fall within the scope of the claims.

In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "centre", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "clockwise" and "counter-clockwise", are based on the orientations or positional relationships shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be understood as limiting the present invention. In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, unless otherwise specified, "plurality" means two or more.

In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled" and "fixed" are to be understood broadly; for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct or indirect through an intermediate medium; or internal communication between two elements. For those of ordinary skill in the art, the specific meaning of the above terms in the present invention can be understood according to the specific circumstances.

In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may include the first and second features being in direct contact, or being in contact through an additional feature between them rather than in direct contact. Moreover, the first feature being "on", "above" or "over" the second feature includes the first feature being directly above or obliquely above the second feature, or simply means that the first feature is at a higher level than the second feature. The first feature being "under", "below" or "beneath" the second feature includes the first feature being directly below or obliquely below the second feature, or simply means that the first feature is at a lower level than the second feature.

Explanation of professional terms:

yolov5 is a single-stage target detection algorithm. It adds a number of improvements on top of YOLOv4, greatly improving both speed and accuracy. The main improvements are as follows. Input side: in the training stage, Mosaic data augmentation, adaptive anchor box calculation and adaptive image scaling are introduced. Backbone network: new ideas from other detection algorithms are integrated, mainly the Focus structure and the CSP structure. Neck network: detection networks often insert additional layers between the backbone and the final head output layer; yolov5 adds an FPN+PAN structure. Head output layer: the anchor box mechanism of the output layer is the same as in YOLOv4; the main improvements are the GIOU_Loss training loss function and DIOU_nms for prediction box filtering.

IoU tracking is a target tracking method that associates detections across frames by their bounding-box overlap (intersection over union).

This embodiment provides a vehicle scale estimation and speed estimation method for monocular side-angle imaging, with the following specific steps:

S1, build a target detection model that detects vehicle targets, wheels, key points and lane lines;

The target detection model is a hybrid-head model obtained by extending the yolov5 target detector. It comprises a backbone network followed by four output branches, each corresponding to an upsampling pyramid structure, as shown in FIG. 1.

Output branch one outputs axis-aligned target boxes and vehicle classes and is used for vehicle target detection. It uses the original upsampling pyramid structure with 3 layers, whose scaling ratios relative to the input image are 1/32, 1/16 and 1/8; all 3 pyramid layers serve as output layers of branch one. Vehicle classes include: sedan, SUV, minivan, MPV, bus, light truck, heavy truck, and other. Post-processing directly applies the non-maximum suppression commonly used in detection models to obtain the final output boxes, as shown in FIG. 2.

Output branch two outputs ellipses and is used for wheel detection. Its upsampling pyramid has 3 layers (1/32, 1/16, 1/8), of which only the 1/8 layer serves as the output layer. Post-processing directly applies non-maximum suppression, analogous to NMS for rectangular boxes, to obtain the final output ellipses, as shown in FIG. 3.

Output branch three outputs key points, used to locate key point 1, key point 2 and key point 3 of the vehicle's bounding rectangle projected onto the ground, where the line through key points 1 and 2 runs along the lane-line direction and the line through key points 2 and 3 is perpendicular to it. Its upsampling pyramid has 4 layers (1/32, 1/16, 1/8, 1/4), of which only the 1/4 layer is used as a heatmap. Post-processing applies heatmap-style peak filtering for non-maximum suppression to obtain the final output key points, as shown in FIG. 4.

Output branch four outputs a semantic segmentation map and is used to detect lane lines. Its upsampling pyramid has 4 layers (1/32, 1/16, 1/8, 1/4), of which only the 1/4 layer serves as the semantic segmentation output layer. Post-processing proceeds as follows: first, threshold the output layer to obtain a binary map; then extract the skeleton of the binary map to obtain a skeleton map; finally, fit the skeleton with a Hough transform to obtain the straight-line coordinates of the lane lines, as shown in FIG. 5.
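The following sketch illustrates one possible realisation of this post-processing chain; it is an assumption for illustration only (the patent does not provide code), and the threshold value and the OpenCV/scikit-image helpers are the editor's choices.

```python
# Hedged sketch: threshold the branch-four probability map, skeletonize it,
# and fit straight lane lines with a probabilistic Hough transform.
import cv2
import numpy as np
from skimage.morphology import skeletonize

def lane_lines_from_segmentation(prob_map, thresh=0.5):
    """prob_map: HxW float array in [0, 1] from output branch four."""
    binary = prob_map > thresh                              # threshold -> binary map
    skeleton = skeletonize(binary).astype(np.uint8) * 255   # one-pixel-wide skeleton
    lines = cv2.HoughLinesP(skeleton, rho=1, theta=np.pi / 180,
                            threshold=50, minLineLength=100, maxLineGap=20)
    # each entry is (x1, y1, x2, y2) in pixel coordinates
    return [] if lines is None else [tuple(l[0]) for l in lines]
```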

Backbone network: inference runs on every frame. Output branches one, two and three are predicted on every frame to obtain the per-frame vehicle detection results, key points and wheel detection results, maintaining the continuity of vehicle results. Output branch four is predicted at a set interval (for example once every 50 frames) to improve overall inference speed; lane lines are mainly used to assist calibration and do not need to be detected in every frame.

When training the model, note that the acquired data set is divided into a lane-line data set and a vehicle data set, which are labelled separately. The lane-line data set has only lane-line labels and no vehicle labels; the vehicle data set has only vehicle labels and no lane-line labels. Lane-line samples supervise only branch four, and branches one/two/three do not receive gradients from them; vehicle samples supervise only branches one/two/three and do not supervise branch four.
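A minimal sketch of this supervision scheme follows; the function and dictionary keys are assumptions for illustration, not the patent's code.

```python
# Hedged sketch: lane-line samples back-propagate only through branch four,
# vehicle samples only through branches one to three.
def mixed_head_loss(outputs, targets, sample_type,
                    det_loss, wheel_loss, kpt_loss, seg_loss):
    """sample_type: 'vehicle' or 'lane'; *_loss are per-branch loss functions."""
    if sample_type == "vehicle":
        return (det_loss(outputs["branch1"], targets["boxes"])
                + wheel_loss(outputs["branch2"], targets["wheels"])
                + kpt_loss(outputs["branch3"], targets["keypoints"]))
    # lane-line sample: only the segmentation branch receives gradient
    return seg_loss(outputs["branch4"], targets["lane_mask"])
```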

S2, perform target tracking to obtain the tracking sequence, detection box and classification result of each vehicle. The tracking sequence is obtained using IoU-based tracking on the output of output branch one, which contains the vehicle detection box coordinates and the classification result, as shown in FIG. 6.
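The IoU association step can be sketched as follows; this is an assumed minimal implementation, since the patent only names IoU tracking as the association method.

```python
# Hedged sketch of IoU-based track association; each track accumulates the
# per-frame detections, forming the tracking sequence of one vehicle.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def update_tracks(tracks, detections, frame_idx, iou_thresh=0.3):
    """tracks: list of dicts {'boxes': [...], 'frames': [...], 'cls': ...}."""
    unmatched = list(range(len(detections)))
    for tr in tracks:
        best, best_iou = None, iou_thresh
        for i in unmatched:
            v = iou(tr["boxes"][-1], detections[i]["box"])
            if v > best_iou:
                best, best_iou = i, v
        if best is not None:
            tr["boxes"].append(detections[best]["box"])
            tr["frames"].append(frame_idx)
            unmatched.remove(best)
    for i in unmatched:  # start a new track for every unmatched detection
        tracks.append({"boxes": [detections[i]["box"]],
                       "frames": [frame_idx], "cls": detections[i]["cls"]})
    return tracks
```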

S3, compute the coordinates of each wheel centre point projected onto the road surface to obtain axle information, and segment the lane lines into multiple lanes;

The specific steps for calculating the coordinates of a wheel centre point projected onto the road surface are:

S31, obtain straight line A through key point 1 and key point 2;

S32, compute the foot of the perpendicular from the wheel centre point to line A, i.e. the projected coordinates of the wheel centre, as shown in FIG. 7.
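A minimal geometry sketch for steps S31/S32 follows; the function and variable names are illustrative assumptions.

```python
# Hedged sketch: project the wheel centre onto line A through key point 1 and
# key point 2 by dropping a perpendicular (foot of the perpendicular).
import numpy as np

def project_point_onto_line(p, a, b):
    """Foot of the perpendicular from point p onto the line through a and b (2D)."""
    p, a, b = map(np.asarray, (p, a, b))
    d = b - a                               # direction of line A
    t = np.dot(p - a, d) / np.dot(d, d)     # scalar position along the line
    return a + t * d                        # projected (ground-contact) coordinates
```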

Based on the lane-line detection results from step S1, multiple lanes are segmented. Every two lane lines delimit one lane region. For clarity, the two lane lines of the same lane are referred to as the near-side lane line and the far-side lane line, as shown in FIG. 8. The lanes are illustrated individually because the subsequent calibration and calculations are all performed per lane; each lane corresponds to one set of calibration parameters.

S4, based on the tracking sequences of same-class targets from step S2 and their axle information from step S3, self-calibrate the corresponding targets and obtain the mapping between the pixel coordinate system and the real-world coordinate system for that class of target;

The specific steps of the self-calibration in step S4 are as follows:

S41, coordinate system setup, as shown in FIG. 9. Each lane is bounded by two lane lines. The direction along the outer lane line is axis one; the direction perpendicular to the outer lane line is axis two; the intersection of the outer lane line with the right image border is the origin;

The mapping between camera coordinates and real-world coordinates is as follows (Formula 1):

where L denotes the distance from the camera focal point to the actual measurement point; L_real denotes the actual length at a given point in space; L_pixel denotes the pixel length of that point after imaging; f denotes the camera focal length; cellSize denotes the pixel (cell) size; f and cellSize are fixed values;

L is computed as follows (Formula 2):

where H0 denotes the perpendicular distance from the camera focal point to axis one, H1 the axis-two coordinate of the measurement point, R0 the perpendicular distance from the camera focal point to axis two, and R1 the axis-one coordinate of the measurement point; once the camera is fixed in place, H0 and R0 are fixed values, as shown in FIG. 10.

Substituting Formula 2 into Formula 1 gives a new expression (Formula 3):

Let Ratio denote the conversion ratio from pixel coordinates to actual coordinates. From Formula 3 it can be seen that:

Ratio = f(H1, R1) (Formula 4)

Since H0, R0, f and cellSize are fixed values, Ratio can be expressed as a function of the variables H1 and R1;

As can be seen from FIG. 11, Formula 4 approximates a plane. To simplify the calculation, f(H1, R1) is approximated by a planar function; the simplified formula is:

Ratio = a*H1 + b*R1 + c (Formula 5)

where a, b and c are the parameters to be solved for;

S42, least-squares fitting requires introducing multiple sets of known points

{(H1_0, R1_0, Ratio_0), (H1_1, R1_1, Ratio_1), ......,

(H1_n, R1_n, Ratio_n)}, referred to as the fitting point set; each point is a triplet consisting of the point's axis-two pixel coordinate, its axis-one pixel coordinate, and its scale factor;

S43, determine the triplets, as shown in FIG. 14. Through model detection and post-processing logic, the pixel coordinates of each vehicle target's axle centre can be obtained;

According to Formula 3, the actual wheelbase can be expressed as follows:

where L_real wheelbase is the vehicle's actual wheelbase, L_pixel wheelbase is the vehicle's pixel wheelbase, and dx is the differential element, as shown in FIG. 13;

Formula 7 is approximated as follows:

Since Formula 7 and Formula 8 are very close, the error is negligible; Formula 8 is therefore decomposed to obtain the formula for computing the triplets:

where the pixel-level coordinate data H1, R1 and L_pixel wheelbase can be computed from the model output plus post-processing; L_real wheelbase is taken as 2500 mm;

S44, according to the triplet calculation of step S43 and based on the tracking sequences of the 200 reference passenger cars used for calibration, multiple triplets are computed, yielding the fitting point set of step S42, as shown in FIG. 15;

The 200 reference passenger-car tracking sequences used for calibration are obtained as follows:

a. For each lane, 200 tracking sequences are selected as calibration references, as shown in FIG. 12.

b. The tracking sequences are obtained using target detection and target tracking.

c. Only the passenger-car (sedan) class is used as a reference.

d. The pixel-level wheelbase and the actual wheelbase of the passenger cars are used for the calibration calculation.

e. Passenger-car wheelbases range from 2400 mm to 2550 mm; here, 2500 mm is used as the reference value for the actual wheelbase of all passenger cars.

S45, perform the fitting:

a. First fit: based on all triplets, fit the plane and compute one set of (a, b, c); assume the calibrated parameters are (a, b, c) = (a1, b1, c1); substituting into Formula 5 gives:

Ratio = a1*H1 + b1*R1 + c1 (Formula 11)

b. Reject triplets with large deviation

Based on Formula 11 and Formula 9, the deviation is computed as (Formula 12):

where pia denotes the deviation between the scale computed by Formula 11 and the scale from Formula 9;

The deviations of all triplets are computed with Formula 12; the 20% of triplets with the largest deviation are deleted and the remaining 80% are kept;

c. Second fit: for the triplets remaining after step b, fit the plane and compute one set of (a, b, c); assume the calibrated parameters are (a, b, c) = (a2, b2, c2); substituting into Formula 5 gives:

Ratio = a2*H1 + b2*R1 + c2 (Formula 13)

At this point, self-calibration is complete.
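The self-calibration of steps S42 to S45 can be sketched as follows. This is a hedged illustration under stated assumptions: each triplet's scale is taken as Ratio_i = 2500 mm / pixel wheelbase, (H1_i, R1_i) is taken at the point where the wheelbase is measured, the plane of Formula 5 is fitted by least squares, and a second fit is performed after discarding the 20% of triplets with the largest deviation.

```python
# Hedged sketch of the two-pass plane fit; function names are illustrative.
import numpy as np

REAL_WHEELBASE_MM = 2500.0  # reference wheelbase assumed for passenger cars

def build_triplets(samples):
    """samples: iterable of (H1, R1, pixel_wheelbase) per reference-car observation."""
    return np.array([(h1, r1, REAL_WHEELBASE_MM / lp) for h1, r1, lp in samples])

def fit_plane(triplets):
    """Least-squares fit of Ratio = a*H1 + b*R1 + c."""
    A = np.column_stack([triplets[:, 0], triplets[:, 1], np.ones(len(triplets))])
    coeffs, *_ = np.linalg.lstsq(A, triplets[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def self_calibrate(samples, keep=0.8):
    t = build_triplets(samples)
    a1, b1, c1 = fit_plane(t)                         # first fit
    pred = a1 * t[:, 0] + b1 * t[:, 1] + c1
    dev = np.abs(pred - t[:, 2])                      # deviation per triplet (Formula 12)
    kept = t[np.argsort(dev)[: int(keep * len(t))]]   # drop the worst 20%
    return fit_plane(kept)                            # second fit -> (a2, b2, c2)
```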

S5, determine the target vehicle, obtain its tracking sequence, axle information at different positions and coordinate information, and, based on the mapping from step S4, obtain the final length estimate and speed estimate of the target vehicle.

The final length estimate of the target vehicle is computed as follows:

S51, obtain the tracking sequence of the target vehicle and compute the pixel-level length from key point 1 and key point 2 of the vehicle at different positions in the sequence;

S52, convert the pixel-level length into a spatial length based on Formula 13;

S53, average the multiple spatial lengths in the sequence to obtain the final length estimate of the vehicle.
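Steps S51 to S53 can be sketched as follows; the field names are assumptions, and the scale at each position is evaluated with the fitted plane of Formula 13.

```python
# Hedged sketch of the length estimate: per-frame pixel length * local scale, averaged.
import numpy as np

def estimate_length(track, a2, b2, c2):
    """track: list of dicts with 'kp1', 'kp2' (pixel points) and 'H1', 'R1'
    (coordinates of the measurement in the lane coordinate system)."""
    lengths = []
    for obs in track:
        pixel_len = np.linalg.norm(np.subtract(obs["kp1"], obs["kp2"]))
        ratio = a2 * obs["H1"] + b2 * obs["R1"] + c2   # pixel -> real scale (Formula 13)
        lengths.append(pixel_len * ratio)
    return float(np.mean(lengths))                     # final length estimate (mm)
```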

The target vehicle speed is computed as follows:

S54, obtain the tracking sequence of the target vehicle and, from the axle centre points of the vehicle at different positions in the sequence, compute the projected ground coordinates of the centre of the foremost axle;

S55, compute the pixel-level distance between the ground-projected foremost-axle centre at the first position and at the last position;

S56, convert coordinates based on Formula 13 to obtain the actual distance between the first and last positions;

S57, compute the time difference between the imaging moments of the frames containing the first and last positions;

S58, estimate the vehicle speed as the actual first-to-last distance divided by the imaging time difference.
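Steps S54 to S58 can be sketched as follows; the field names are assumptions, and evaluating the scale at the midpoint between the first and last positions is an illustrative simplification.

```python
# Hedged sketch of the speed estimate: pixel distance between the ground-projected
# foremost-axle centres at the first and last positions, converted with Formula 13
# and divided by the time difference between the two frames.
import numpy as np

def estimate_speed(track, a2, b2, c2):
    first, last = track[0], track[-1]
    pixel_dist = np.linalg.norm(np.subtract(first["axle_proj"], last["axle_proj"]))
    h1 = 0.5 * (first["H1"] + last["H1"])              # scale evaluated near the middle
    r1 = 0.5 * (first["R1"] + last["R1"])
    real_dist_mm = pixel_dist * (a2 * h1 + b2 * r1 + c2)
    dt = last["timestamp"] - first["timestamp"]        # seconds between the two frames
    return (real_dist_mm / 1000.0) / dt                # metres per second
```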

In this embodiment, the vehicle width and the number of vehicle axles can also be computed in step S5;

The vehicle width is computed as follows:

Compute the vehicle pixel width from key point 2 and key point 3;

Compute the midpoint of the segment between key point 2 and key point 3, then compute the intersections of the perpendicular from this midpoint to the left lane line with both lane lines; the pixel distance between the two intersection points represents the lane pixel width;

The vehicle width is then (vehicle pixel width / lane pixel width) * preconfigured lane width;
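The width computation can be sketched as follows; this is a hedged illustration in which the lane lines are assumed to be given as point pairs from the branch-four post-processing, and the perpendicular is taken normal to the lane direction through the key-point midpoint.

```python
# Hedged sketch of the vehicle width estimate.
import numpy as np

def line_intersection(p1, d1, p2, d2):
    """Intersection of lines p1 + t*d1 and p2 + s*d2 (2D)."""
    A = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]], dtype=float)
    t, _ = np.linalg.solve(A, np.subtract(p2, p1).astype(float))
    return np.asarray(p1, dtype=float) + t * np.asarray(d1, dtype=float)

def estimate_width(kp2, kp3, near_lane, far_lane, lane_real_width_mm):
    """near_lane / far_lane: each a pair of points defining one lane line."""
    kp2, kp3 = np.asarray(kp2, float), np.asarray(kp3, float)
    vehicle_px = np.linalg.norm(kp2 - kp3)              # vehicle pixel width
    mid = 0.5 * (kp2 + kp3)                              # midpoint of key points 2 and 3
    lane_dir = np.subtract(near_lane[1], near_lane[0]).astype(float)
    perp = np.array([-lane_dir[1], lane_dir[0]])         # perpendicular to lane direction
    i_near = line_intersection(mid, perp, near_lane[0], lane_dir)
    far_dir = np.subtract(far_lane[1], far_lane[0]).astype(float)
    i_far = line_intersection(mid, perp, far_lane[0], far_dir)
    lane_px = np.linalg.norm(i_near - i_far)              # lane pixel width
    return vehicle_px / lane_px * lane_real_width_mm      # preconfigured lane width scaled
```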

The number of vehicle axles is computed as follows:

Obtain the tracking sequence of the target vehicle and the number of axles detected for the vehicle at different positions in the sequence;

Using multi-frame voting, an axle count is returned as the final result only if it occurs in more than half of the tracking sequence and is greater than or equal to 2; otherwise, no axle detection result is returned.
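The multi-frame voting rule stated above can be sketched as follows; this is a minimal illustration, not the patent's code.

```python
# Hedged sketch of axle-count voting across the frames of a tracking sequence.
from collections import Counter

def vote_axle_count(per_frame_axle_counts):
    if not per_frame_axle_counts:
        return None
    count, votes = Counter(per_frame_axle_counts).most_common(1)[0]
    if votes > len(per_frame_axle_counts) / 2 and count >= 2:
        return count
    return None  # no reliable axle result for this sequence
```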

If only speed estimation, length estimation and axle-count calculation are required, no manual annotation is needed at all; if the vehicle width is also required in addition to these, only the width of each lane needs to be entered manually. The present invention calibrates automatically, achieves high accuracy, and reduces the complexity of manual operation.

Claims (9)

1. A monocular side angle imaging vehicle scale estimation and speed estimation method, comprising the following specific steps:
S1, a target detection model is established, and vehicle targets, wheels, key points and lane lines are detected;
S2, target tracking is carried out, and a tracking sequence of each vehicle, a vehicle detection frame and a vehicle classification result are obtained;
S3, coordinates of a wheel centre point projected onto a road surface are calculated to obtain axle information, and lane lines are divided to obtain a plurality of lanes;
S4, based on the tracking sequences of same-class targets in step S2 and the axle information of same-class targets in step S3, self-calibration is performed on the corresponding targets to acquire the mapping relation between the pixel coordinate system and the actual coordinate system of that class of target; the specific steps of the self-calibration are as follows:
S41, setting a coordinate system, wherein each lane is defined by two lane lines, the direction along the outer lane line is taken as axis one, the direction perpendicular to the outer lane line as axis two, and the intersection point of the outer lane line and the right boundary of the image as the origin;
the mapping formula of the camera coordinates and the actual space coordinates is as follows:
wherein L represents the distance from the camera focal point to the actual measurement point; L_real represents the actual length at a given point in space; L_pixel represents the pixel length of that point after imaging; f represents the camera focal length; cellSize represents the pixel size; f and cellSize are fixed values;
the calculation formula of L is as follows:
wherein H0 represents the perpendicular distance between the camera focal point and axis one, H1 represents the axis-two coordinate value of the measuring point, R0 represents the perpendicular distance between the camera focal point and axis two, and R1 represents the axis-one coordinate value of the measuring point; once the camera is fixedly installed, H0 and R0 become fixed values;
substituting Formula 2 into Formula 1 yields a new expression:
expressed as Ratio, the conversion ratio of pixel coordinates to actual coordinates, it can be seen from Formula 3 that:
Ratio = f(H1, R1) (Formula 4)
since H0, R0, f and cellSize are fixed values, Ratio can be expressed as a function controlled by the variables H1 and R1;
to simplify the calculation, f(H1, R1) is converted into a planar function, and the simplified formula is as follows:
Ratio = a*H1 + b*R1 + c (Formula 5), wherein a, b and c are the parameters to be solved;
S42, least-squares fitting requires the introduction of multiple sets of known points {(H1_0, R1_0, Ratio_0), (H1_1, R1_1, Ratio_1), ......,
(H1_n, R1_n, Ratio_n)}, abbreviated as the fitting point set; each point is formed by a triplet comprising the point's axis-two pixel coordinate, the point's axis-one pixel coordinate and the point's scale;
S43, determining the triplets, and obtaining the pixel coordinates of each vehicle target's axle centre through model detection and post-processing logic;
from Formula 3, the expression for the actual wheelbase can be found as follows:
wherein L_real wheelbase is the actual wheelbase of the vehicle; L_pixel wheelbase is the pixel wheelbase of the vehicle; dx is a differential element;
an approximation is performed for Formula 7:
since Formula 7 and Formula 8 are very close, the error is negligible, so Formula 8 is subsequently decomposed, from which the triplet calculation formula is derived:
wherein the pixel-level coordinate data H1, R1 and L_pixel wheelbase can be calculated based on the model output plus post-processing; L_real wheelbase is 2500 mm;
S44, calculating a plurality of triplets based on the tracking sequences of the 200 calibration-reference passenger cars according to the triplet calculation method of step S43, so as to obtain the fitting point set of step S42;
S45, fitting is carried out:
a. first fit: based on all triplets, fit the plane and calculate one group (a, b, c); assume the calibrated parameters are (a, b, c) = (a1, b1, c1); substituting into Formula 5 gives:
Ratio = a1*H1 + b1*R1 + c1 (Formula 11)
b. reject triplets with larger deviation:
based on Formula 11 and Formula 9, the deviation is calculated as:
wherein pia represents the deviation of the scale calculated by Formula 11 from the scale in Formula 9;
the deviations of all triplets are calculated based on Formula 12; the 20% of triplets with the largest deviation are deleted and 80% of the triplets remain;
c. second fit: for the remaining triplets from step b, fit the plane and calculate one group (a, b, c); assume the calibrated parameters are (a, b, c) = (a2, b2, c2); substituting into Formula 5 gives:
Ratio = a2*H1 + b2*R1 + c2 (Formula 13); at this point, the self-calibration is completed;
S5, determining the target vehicle, acquiring tracking sequences, axle information and coordinate information of the target vehicle at different positions, and obtaining a final length estimated value of the target vehicle and an estimated value of the vehicle speed based on the mapping relation in the step S4.
2. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 1, wherein: the target detection model in step S1 is obtained by expanding a hybrid head model based on the yolov5 target detector, and comprises a backbone network, wherein four output branches are arranged behind the backbone network, and each output branch corresponds to an up-sampling pyramid structure.
3. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 2, wherein: each frame of the first output branch, the second output branch and the third output branch is predicted, and the fourth output branch is predicted according to a set interval.
4. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 2, wherein: the acquired data set is divided into a lane line data set and a vehicle data set, and is marked separately.
5. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 2, wherein: the tracking sequence in step S2 is obtained based on the iou tracking mode and the output result of the output branch one.
6. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 2, wherein: the specific step of calculating coordinates of the wheel center point "projected onto" the road surface in step S3 includes:
S31, acquiring a straight line A based on the connection line of key point 1 and key point 2;
s32, calculating the vertical point coordinates from the wheel center point to the straight line A, namely, the projection point coordinates of the wheel center.
7. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 1, wherein: the calculation step of the final length estimation value of the target vehicle in step S5 is as follows:
S51, acquiring a tracking sequence of the target vehicle, and calculating the pixel-level length based on key points 1 and 2 of the vehicle at different positions in the sequence;
S52, converting the pixel-level length into a spatial length based on Formula 13;
And S53, averaging a plurality of space lengths in the sequence to obtain a final length estimation value of the vehicle.
8. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 1, wherein: the calculation step of the target vehicle speed calculation in step S5 is as follows:
S54, acquiring a tracking sequence of a target vehicle, and calculating projection coordinates from the center point of the forefront axle of the vehicle to the ground based on axle center point results of the vehicle in different positions in the sequence;
S55, calculating the pixel-level distance based on the ground projection coordinates of the centre point of the foremost axle of the vehicle at the "head position" and at the "tail position";
s56, performing coordinate transformation based on the formula 13 to obtain the actual distance between the 'head position' and the 'tail position';
s57, calculating the time difference between the imaging moment of the frame where the head position is located and the imaging moment of the frame where the tail position is located;
s58, calculating an estimated value of the vehicle speed based on the head-to-tail actual distance/imaging time difference.
9. A monocular side angle imaging vehicle dimension estimation and speed estimation method according to claim 1, wherein: in the step S5, the vehicle width and the vehicle axle number can be calculated;
The vehicle width is calculated as follows:
calculating the vehicle pixel width by using the key point 2/key point 3;
calculating the midpoint coordinates of the connecting line of the key point 2 and the key point 3, and further calculating to obtain the intersection point of the perpendicular line from the center point to the left lane line and the two lane lines; the pixel distance between the two intersection points represents the lane pixel width;
The vehicle width is calculated by the following formula: (vehicle pixel width/lane pixel width)*preconfigured lane line width;
The calculation steps of the vehicle axle number are as follows:
Acquiring a tracking sequence of the target vehicle and the number of axles of the vehicle at different positions in the sequence;
Using a multi-frame voting mode, wherein the occurrence number of a certain 'number of axles' is more than half of the total number of tracking sequences, and the 'number of axles' is more than or equal to 2, and the final result is obtained; otherwise, the axle detection result is not returned.
CN202310034513.3A 2023-01-10 2023-01-10 Monocular side angle imaging vehicle scale estimation and speed estimation method Active CN116168088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310034513.3A CN116168088B (en) 2023-01-10 2023-01-10 Monocular side angle imaging vehicle scale estimation and speed estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310034513.3A CN116168088B (en) 2023-01-10 2023-01-10 Monocular side angle imaging vehicle scale estimation and speed estimation method

Publications (2)

Publication Number Publication Date
CN116168088A CN116168088A (en) 2023-05-26
CN116168088B (en) 2024-11-05

Family

ID=86415797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310034513.3A Active CN116168088B (en) 2023-01-10 2023-01-10 Monocular side angle imaging vehicle scale estimation and speed estimation method

Country Status (1)

Country Link
CN (1) CN116168088B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457780A (en) * 2022-09-06 2022-12-09 北京航空航天大学 Vehicle flow and flow speed automatic measuring and calculating method and system based on priori knowledge set

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5977270B2 (en) * 2014-01-14 2016-08-24 株式会社デンソー Vehicle control apparatus and program
CN109859278B (en) * 2019-01-24 2023-09-01 惠州市德赛西威汽车电子股份有限公司 Calibration method and calibration system for camera external parameters of vehicle-mounted camera system
CN110322702B (en) * 2019-07-08 2020-08-14 中原工学院 Intelligent vehicle speed measuring method based on binocular stereo vision system
AU2019100914A4 (en) * 2019-08-16 2019-09-26 Weifang University Method for identifying an intersection violation video based on camera cooperative relay
CN113781562B (en) * 2021-09-13 2023-08-04 山东大学 A Road Model-Based Method for Virtual and Real Registration of Lane Lines and Self-Vehicle Location
CN114724110A (en) * 2022-04-08 2022-07-08 天津天瞳威势电子科技有限公司 Target detection method and equipment
CN114937263B (en) * 2022-05-05 2023-02-10 北京容联易通信息技术有限公司 Vehicle axle number counting method and system based on video intelligent analysis
CN115240111B (en) * 2022-07-22 2025-06-13 东南大学 A real-time back-calculation method for traffic load on small and medium span bridges
CN115439514A (en) * 2022-09-23 2022-12-06 中国公路工程咨询集团有限公司 Wide bridge vehicle load identification method, wide bridge vehicle load identification system, wide bridge vehicle load identification equipment and wide bridge vehicle load storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457780A (en) * 2022-09-06 2022-12-09 北京航空航天大学 Vehicle flow and flow speed automatic measuring and calculating method and system based on priori knowledge set

Also Published As

Publication number Publication date
CN116168088A (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN110942449B (en) A Vehicle Detection Method Based on Fusion of Laser and Vision
CN111738314B (en) Deep learning method for multimodal image visibility detection model based on shallow fusion
CN112836633A (en) Parking space detection method and parking space detection system
CN111274976A (en) Lane detection method and system based on multi-level fusion of vision and lidar
CN111768332B (en) Method for splicing vehicle-mounted panoramic real-time 3D panoramic images and image acquisition device
CN116071283B (en) Three-dimensional point cloud image fusion method based on computer vision
CN111678518B (en) Visual positioning method for correcting automatic parking path
CN113327296A (en) Laser radar and camera online combined calibration method based on depth weighting
CN112731436B (en) Multi-mode data fusion travelable region detection method based on point cloud up-sampling
CN101419667A (en) Method and apparatus for identifying obstacle in image
CN102298779A (en) Image registering method for panoramic assisted parking system
CN114119749B (en) A monocular 3D vehicle detection method based on dense association
CN115330653A (en) A Multi-source Image Fusion Method Based on Side Window Filtering
JP2025504144A (en) Image detection method and apparatus
CN114842430A (en) Vehicle information identification method and system for road monitoring
CN110733416A (en) lane departure early warning method based on inverse perspective transformation
Vajak et al. A rethinking of real-time computer vision-based lane detection
CN116168088B (en) Monocular side angle imaging vehicle scale estimation and speed estimation method
Zhang et al. Depth monocular estimation with attention-based encoder-decoder network from single image
CN116012817A (en) Real-time panoramic parking space detection method and device based on double-network deep learning
US11138448B2 (en) Identifying a curb based on 3-D sensor data
CN117710204B (en) Method and system for splicing vehicle images shot by fisheye camera
CN111402593B (en) Video traffic parameter acquisition method based on polynomial fitting
CN118262076A (en) Real-time generation method and device for simplified point cloud model
CN117192507A (en) Lightning target correlation method based on adaptive coupling of measurement calibration model and cost matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant