CN103177248B

CN103177248B - A vision-based fast pedestrian detection method

Info

Publication number: CN103177248B
Application number: CN201310132965.1A
Authority: CN
Inventors: 周泓; 陈益如; 杨思思; 程添; 蔡宇
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-04-16
Filing date: 2013-04-16
Publication date: 2016-03-23
Anticipated expiration: 2033-04-16
Also published as: CN103177248A

Abstract

The invention discloses a vision-based fast pedestrian detection method. The method first obtains video images of the road ahead of a vehicle through a camera mounted on the vehicle. It then uses Haar-like features as pedestrian description features, builds a multi-scale cascade classifier as the pedestrian detector, and adopts a crosstalk cascade strategy to classify pedestrians and non-pedestrians quickly and in real time. Finally, a non-maximum suppression algorithm determines the sliding window that best matches the pedestrian features, and thereby the position of the pedestrian. If no sliding window matches the pedestrian features after these steps, the input image is judged to contain no pedestrian. The method brings pedestrian detection closer to practical use in real engineering applications, with broad prospects in fields such as security video surveillance and automotive active safety.

Description

A Fast Pedestrian Detection Method Based on Vision

Technical field

The invention relates to computer vision and to advanced driver assistance for automobiles, and in particular to a vision-based fast pedestrian detection method.

Background

With the rapid growth in the number of cars over the last decade, road traffic safety has become an important issue worldwide. A World Health Organization report shows that traffic accidents are one of the main causes of casualties: about 10 million people worldwide are killed or injured in traffic accidents every year, of whom 2 to 3 million are serious casualties. Vulnerable road users (such as pedestrians, cyclists and occupants of other small vehicles) account for the vast majority of victims. According to traffic accident statistics reported in the United States in 2003, 5,000 of the 35,000 road traffic casualty accidents in the United States involved collisions between pedestrians and vehicles; in the European Union, collisions between vehicles and pedestrians caused 150,000 injuries and 7,000 deaths. In the face of frequent and increasingly serious road traffic accidents, domestic and foreign research institutions have therefore proposed Advanced Driver Assistance Systems (ADAS), which approach the problem from the perspective of vehicle self-protection, to prevent traffic accidents or reduce their severity and so improve road traffic safety. One important component of ADAS is the Pedestrian Detection System (PDS), an active vehicle protection system designed to protect pedestrians on the road.

A pedestrian detection system obtains road information in the vehicle's direction of travel through sensors installed on the vehicle (optical cameras, infrared cameras, radar, etc.), uses an intelligent detection algorithm to identify pedestrians in the driving environment and to determine the spatial relationship between pedestrians and the vehicle, and then warns the driver of potentially dangerous situations or brakes the vehicle automatically. A vision-based on-board pedestrian detection system uses an optical camera as its main sensor. On the one hand, it extends the driver's field of view, reduces the blind spots caused by the vehicle structure, and gives early warning of pedestrians in those blind spots, helping to avoid collisions with pedestrians or vehicles that suddenly emerge from them. This is of particular engineering significance for large construction vehicles, which have large blind spots. On the other hand, it helps inexperienced drivers judge the distance between the vehicle and pedestrians, improving driving safety and reducing road traffic accidents.

At present, vision-based pedestrian detection systems generally adapt poorly to different roads and detect slowly, with processing speeds generally below 1 frame per second (fps). Their detection accuracy is also generally low. To address the low computation speed and poor road adaptability of vision-based pedestrian detection systems, the present invention proposes a vision-based fast pedestrian detection method. The method detects pedestrians at a real-time rate while maintaining detection accuracy, adapts well to the diversity of roads and pedestrians, and has prospects for engineering application.

Summary of the invention

The purpose of the present invention is to overcome the shortcomings of existing vision-based pedestrian detection technology and to provide a vision-based fast pedestrian detection method.

This purpose is achieved through the following technical solution: a vision-based fast pedestrian detection method comprising the following steps:

(1) Obtain video images of the road ahead of the vehicle through a camera installed on the vehicle;

(2) Process the video images obtained in step 1 frame by frame: for each input image, compute the color-invariant parameter feature channel images, the HOG feature channel image and the gradient magnitude feature channel image, obtaining the S_H, S_x, S_y, S_Histo and S_M feature channel images, where S_H, S_x and S_y are the color-invariant parameter feature channel images, S_Histo is the HOG feature channel image, and S_M is the gradient magnitude feature channel image;

(3) Compute the integral image representation of each of the S_H, S_x, S_y, S_Histo and S_M feature channel images from step 2, obtaining the corresponding integral feature channel images IS_H, IS_x, IS_y, IS_Histo and IS_M;

(4) Traverse the integral feature channel images obtained in step 3 with sliding windows of different scales, and compute the Haar-like features in each sliding window as pedestrian description features;

(5) Use the pedestrian detector to examine the pedestrian description features computed in step 4 and judge whether the input features are pedestrian-related features;

(6) Adopt a crosstalk cascade strategy to improve the speed and efficiency with which the pedestrian detector in step 5 examines input features;

(7) Use a non-maximum suppression algorithm to determine the sliding window that best matches the pedestrian features, and thereby the position of the pedestrian; if no sliding window matches the pedestrian features after the above steps, judge that the input image contains no pedestrian.

The beneficial effect of the present invention is that, while maintaining detection accuracy, it increases the speed at which pedestrians on the road are detected, so that detection reaches real-time rates. The method also adapts well to the diversity of roads and pedestrians. These technical improvements move pedestrian detection further toward practical use and give the method engineering application value.

Description of the drawings

Fig. 1 is a schematic diagram of image acquisition;

Fig. 2 is a schematic diagram of a gradient histogram based on a 4*4 neighborhood;

Fig. 3 is a schematic diagram of the Laplacian operator;

Fig. 4 is a schematic diagram of Haar-like features.

Detailed description

The invention is described in detail below with reference to the accompanying drawings, from which its purpose and effects will become more apparent. The method of the present invention comprises the following steps:

Step 1: Obtain video images of the road ahead of the vehicle through a camera installed on the vehicle.

The image acquisition setup is shown in Fig. 1. The camera uses the PAL format, and each frame has a resolution of 352*288.

Step 2: For the input image, compute the color-invariant parameter feature channel images, the HOG feature channel image and the gradient magnitude feature channel image, obtaining the S_H, S_x, S_y, S_Histo and S_M feature channel images, where S_H, S_x and S_y are the color-invariant parameter feature channel images, S_Histo is the HOG feature channel image, and S_M is the gradient magnitude feature channel image.

A feature channel image is the image obtained by computing a feature over the input image. The color-invariant parameter, HOG and gradient magnitude feature channel images are the feature channel images obtained by computing the color-invariant parameter features, the HOG feature and the gradient magnitude feature of the input image, respectively. They are computed as follows:

2.1) Color-invariant parameter feature channel images

The color-invariant parameters are feature parameters computed by combining the spectral (color) information and the spatial structure information in the image. Within a local neighborhood of the image they are invariant to translation, scale and color, discriminate colors very strongly, and adapt well to changes in lighting. Computing them first requires a physical model of the image, given by the following formula:

(1)

where the terms of the model denote, in order, the physical model of the image, the position in the image, the wavelength of the light, the illumination spectrum, the Fresnel reflection at that position, and the emissivity of the material.

In this physical model, three feature parameters, H and two others, are color invariant. They are defined as follows:

(2)

(3)

(4)

where the derivative terms denote, in order, a first-order partial derivative, a second-order partial derivative, the first-order partial derivative of formula (1) in the x direction, and the first-order partial derivative of formula (1) in the y direction.

Using formulas (1), (2), (3) and (4), the color-invariant parameter feature channel images are computed for the input image, giving the feature channel images S_H, S_x and S_y that correspond to the three color-invariant parameters.

2.2) HOG feature channel image

First compute the gradient image of the input image. Then, for each pixel in turn, compute the gradient histogram of the pixels in the 8*8 neighborhood centered on that pixel. The histogram is built as follows: the gradient magnitude of each pixel in the 8*8 neighborhood is that pixel's weight, and the gradient direction (0-180°) is divided into 6 bins. Each pixel falls into the bin that contains its gradient direction, and the gradient magnitudes of the pixels in each bin are summed to give the gradient histogram, as shown in Fig. 2.
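
The histogram rule above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the unnormalised central-difference gradient, the border handling and the function names `gradient` and `orientation_histogram` are assumptions made only for illustration. Only the 8*8 neighborhood, the 6 bins over 0-180° and the magnitude weighting come from the text.

```python
import math

def gradient(img):
    """Return (magnitude, orientation) images; orientation is folded into [0, 180)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Unnormalised central differences, with border pixels replicated (assumption).
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def orientation_histogram(mag, ang, cy, cx, size=8, bins=6):
    """Magnitude-weighted orientation histogram over the size*size block around (cy, cx)."""
    h, w = len(mag), len(mag[0])
    hist = [0.0] * bins
    half = size // 2
    bin_width = 180.0 / bins
    for y in range(cy - half, cy + half):
        for x in range(cx - half, cx + half):
            if 0 <= y < h and 0 <= x < w:
                b = min(int(ang[y][x] / bin_width), bins - 1)
                hist[b] += mag[y][x]  # each pixel votes with its gradient magnitude
    return hist
```

For a purely horizontal intensity ramp, every pixel has direction 0°, so all of the weight lands in the first bin.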

2.3) Gradient magnitude feature channel image

A second-order differential operator, the Laplacian, is used to compute the gradient magnitude of the image. The second-order partial derivatives of the image are defined as follows:

(5)

(6)

where the function denotes the input image and its arguments give the position of the pixel in the image. The gradient of the two-dimensional image is then obtained as follows:

that is,

(8)

The gradient magnitude of the image is then the modulus of this gradient.

In practice, the Laplacian operator shown in Fig. 3 is applied as a filter at every pixel of the image, and the modulus of the result is taken to obtain the gradient magnitude feature channel image.
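
The filtering step can be sketched as follows. The kernel of Fig. 3 is not reproduced in the text, so the common 4-neighbor Laplacian kernel is assumed here; the border replication and the name `laplacian_magnitude` are likewise illustrative assumptions.

```python
# Assumed 4-neighbor Laplacian kernel; the kernel actually shown in Fig. 3 may differ.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def laplacian_magnitude(img, kernel=LAPLACIAN):
    """Filter every pixel with the 3*3 kernel and take the modulus of the response."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border pixels
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = abs(acc)
    return out
```

A single bright pixel on a dark background gives a response of 4 at the pixel itself and 1 at each of its four direct neighbors.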

Step 3: Compute the integral image representation of each of the color-invariant parameter, HOG and gradient magnitude feature channel images from step 2, obtaining the integral feature channel image corresponding to each feature channel image.

The integral image representation is computed as follows:

(9)

The formula relates the integral representation of the image to the pixel values of the original feature channel image at each pixel position. The integral image of each feature channel image obtained in step 2 is computed in turn, and these integral images are denoted IS_H, IS_x, IS_y, IS_Histo and IS_M.
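
Since formula (9) is not reproduced here, the following Python sketch uses the standard definition of an integral image, in which each entry holds the sum of all pixels above and to the left of it. The zero-padded table layout and the names `integral_image` and `rect_sum` are illustrative assumptions, not the notation of the invention.

```python
def integral_image(channel):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of channel[0:y][0:x]."""
    h, w = len(channel), len(channel[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += channel[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four table lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Once the table is built, every rectangle sum costs four lookups, which is what makes evaluating many rectangle-based features per sliding window affordable.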

Step 4: Traverse the integral feature channel images obtained in step 3 with sliding windows of different scales, and compute the Haar-like features in each sliding window as pedestrian description features.

The method detects pedestrians of different sizes in the input image by using sliding windows of different scales. In this embodiment, a sliding window of size 100*160 can serve as the standard-scale window, which is then enlarged and reduced by a set scale step.

Within the sliding window, a 4*4 matrix is used and three kinds of Haar-like features are computed: features based on 2 adjacent rectangles, on 3 adjacent rectangles, and on 4 adjacent rectangles. As shown in Fig. 4, A and B are Haar-like features based on 2 adjacent rectangles; such a feature is the difference between the sums of the values in the two adjacent rectangles, namely:

$$H_2 = \left| \sum \mathrm{rec1}_i - \sum \mathrm{rec2}_j \right| \tag{10}$$

where $\sum \mathrm{rec1}_i$ is the sum of the pixel values in one rectangle and $\sum \mathrm{rec2}_j$ is the sum of the pixel values in the other. The Haar-like features based on 3 and on 4 adjacent rectangles are expressed analogously:

$$H_3 = \left| \sum \mathrm{rec1}_i + \sum \mathrm{rec3}_j - \sum \mathrm{rec2}_j \right| \tag{11}$$

$$H_4 = \left| \sum \mathrm{rec1}_i - \sum \mathrm{rec2}_j + \sum \mathrm{rec3}_i - \sum \mathrm{rec4}_j \right| \tag{12}$$
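
A minimal Python sketch of the three features, following formulas (10) to (12) as stated in the claims. The rectangle layouts are assumptions, because Fig. 4 is not reproduced: the rectangles are placed side by side for the 2- and 3-rectangle features and in a 2*2 grid (rec1 top left, rec2 top right, rec3 bottom right, rec4 bottom left) for the 4-rectangle feature. For brevity the rectangle sums are computed directly; in the method they would be read from the integral feature channel images.

```python
def rect_sum(img, top, left, height, width):
    """Direct sum of a rectangle (an integral image would give this in four lookups)."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))

def haar2(img, top, left, size=4):
    """Formula (10): |sum(rec1) - sum(rec2)| for two side-by-side rectangles."""
    r1 = rect_sum(img, top, left, size, size)
    r2 = rect_sum(img, top, left + size, size, size)
    return abs(r1 - r2)

def haar3(img, top, left, size=4):
    """Formula (11): |sum(rec1) + sum(rec3) - sum(rec2)|, with rec2 in the middle."""
    r1 = rect_sum(img, top, left, size, size)
    r2 = rect_sum(img, top, left + size, size, size)
    r3 = rect_sum(img, top, left + 2 * size, size, size)
    return abs(r1 + r3 - r2)

def haar4(img, top, left, size=4):
    """Formula (12): |sum(rec1) - sum(rec2) + sum(rec3) - sum(rec4)| on a 2*2 grid."""
    r1 = rect_sum(img, top, left, size, size)                # top left
    r2 = rect_sum(img, top, left + size, size, size)         # top right
    r3 = rect_sum(img, top + size, left + size, size, size)  # bottom right
    r4 = rect_sum(img, top + size, left, size, size)         # bottom left
    return abs(r1 - r2 + r3 - r4)
```

On an image with a vertical edge, the 2-rectangle feature responds strongly while the checkerboard-style 4-rectangle feature cancels to zero.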

Step 5: Use the pedestrian detector to examine the pedestrian description features computed in step 4 and judge whether the input features are pedestrian-related features.

The pedestrian detector in this step is a pre-trained pedestrian detection classifier. It is trained as follows:

5.1) The INRIA pedestrian image database is used as the image set from which the classifier's training sample data are computed.

5.2) The pedestrian description feature set of the images in the INRIA pedestrian image database is computed according to steps 2 to 4. The feature set is expressed in set notation as follows:

(13)

(14)

(15)

These formulas define the training samples of the classifier, the pedestrian feature set and the non-pedestrian feature set. Each element of the pedestrian feature set pairs a feature value with the label 1, which marks the element as a pedestrian feature; each element of the non-pedestrian feature set pairs a feature value with the label -1, which marks it as a non-pedestrian feature.

5.3) The classifier is a cascade built from a group of 2-level decision trees. The cascade classifier is expressed as:

(16)

where the formula gives the learned classifier in terms of its weak classifiers h_i(x), namely the 2-level decision trees; i = 1, ..., K is the decision tree index; K is the number of decision trees in the classifier, with K = 12 in the method of the invention; and a_i is the weight corresponding to h_i(x).

5.4) The classifier defined in step 5.3) is trained on the training sample data computed in step 5.2). The AdaBoost algorithm is used for training; it determines the parameters of each decision tree and its corresponding weight.
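
The training loop can be illustrated with a minimal discrete AdaBoost sketch in Python. To keep it short it boosts single-threshold decision stumps rather than the 2-level decision trees of step 5.3), which is a deliberate simplification; the function names and the exhaustive threshold search are also assumptions made for illustration. Labels are 1 for pedestrian and -1 for non-pedestrian, as in step 5.2).

```python
import math

def train_stump(X, y, w):
    """Exhaustively pick the (error, feature, threshold, polarity) with least weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def stump_predict(stump, row):
    _, f, t, pol = stump
    return pol if row[f] >= t else -pol

def adaboost(X, y, rounds):
    """Return a list of (weight a_i, weak classifier h_i) pairs, one per boosting round."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # clamp to avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this weak classifier
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Weighted sum of the weak classifier outputs."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)
```

In the terms of step 5.3), `rounds` plays the role of K (12 in the method of the invention) and each `alpha` plays the role of a weight a_i.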

5.5) Five standard-scale classifiers are trained with the method of step 5.4). The scale of a classifier is determined by the size of the sliding window from which its training sample data were computed. The method uses windows of five sizes, 25*15, 50*30, 100*60, 200*120 and 250*150, as standard-size sliding windows, and takes the pedestrian description features produced by traversing the images with these five window sizes as training sample data, yielding five standard-scale classifiers.

5.6) Starting from the five standard-scale classifiers trained in step 5.5), scale estimation is used to construct a set of classifiers covering the full range of scales. The construction is as follows:

(17)

(18)

(19)

The formulas relate the parameters of a standard-scale classifier to the parameters of the classifier at the scale to be estimated, through the ratio of a feature value at scale 1 to its value at the scale being estimated. They involve the scale value and separate parameters for upsampling and downsampling, whose values must be determined through extensive experiments. In the method of the invention, one set of parameter values is used for the HOG and gradient magnitude features and another for the color-invariant parameter features.

5.7) Starting from the five standard-scale classifiers trained in step 5.5), the method of step 5.6) is used to construct a complete set of classifiers at 50 scales, which constitutes the pedestrian detector.

Step 6: Adopt the Crosstalk Cascade strategy to improve the speed and efficiency with which the pedestrian detector in step 5 examines input features.

The Crosstalk Cascade strategy is the key to fast pedestrian detection. It is carried out as follows:

6.1) The pedestrian description feature parameters in each sliding window are filtered by the pedestrian detection classifier according to the soft cascade rule, in order to screen for feature parameters that potentially belong to pedestrian features. The soft cascade rule is:

$$H_k(x) < \theta_k^R, \quad k < K \tag{20}$$

$$H_k(x) = \sum_{i=1}^{k} a_i h_i(x) \tag{21}$$

where x is the pedestrian description feature parameter input to the pedestrian detector for classification; K is the number of decision trees that make up the pedestrian detector; i = 1, ..., k and k = 1, ..., K are decision tree indices; h_i(x) is the i-th decision tree and a_i is its weight; H_k(x) is the weighted sum of the outputs of the 1st to k-th decision trees; and $\theta_k^R$ is the decision threshold. If formula (20) holds, the decision process ends and x is judged to be a non-pedestrian feature, that is, the sliding window containing x contains no pedestrian.
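
The early-rejection logic of formulas (20) and (21) can be sketched in Python as follows. The decision trees are represented as plain callables returning +1 or -1, and the names `soft_cascade`, `trees`, `weights` and `thresholds` are illustrative assumptions; the threshold values themselves are not given in the text.

```python
def soft_cascade(x, trees, weights, thresholds):
    """Evaluate trees in order and reject as soon as the partial sum drops below its threshold.

    Returns (is_candidate, partial_sum, trees_evaluated).
    thresholds[k] is the rejection threshold applied after tree k (0-based), for k < K - 1.
    """
    K = len(trees)
    partial = 0.0
    for k in range(K):
        partial += weights[k] * trees[k](x)           # formula (21): running weighted sum
        if k < K - 1 and partial < thresholds[k]:     # formula (20): reject early
            return False, partial, k + 1
    return True, partial, K
```

A window that fails an early threshold is discarded after only a few tree evaluations, which is where the speed-up comes from; windows that survive all checks remain potential pedestrian features for the later stages.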

6.2) If the pedestrian description feature x is judged in step 6.1) to be a potential pedestrian description feature, then, working in units of sliding windows and taking the sliding window containing x as the center, the pedestrian description feature parameters in 7*7*3 sliding windows are selected and input to the pedestrian detector. Here 7*7*3 corresponds to w*h*d, where w is the number of sliding windows in the horizontal direction, h is the number of sliding windows in the vertical direction, and d is the number of adjacent-scale sliding windows at a given position in the image. The features in the 7*7*3 sliding windows are denoted:

(22)
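
As an illustration of how such a w*h*d neighborhood might be enumerated, the following Python sketch lists the window indices around a candidate. Indexing windows by a grid position and a scale index, clipping at the grid borders, and the name `neighbourhood` are assumptions; the text only fixes the 7*7*3 extent.

```python
def neighbourhood(ix, iy, iscale, nx, ny, nscales, w=7, h=7, d=3):
    """Indices (x, y, scale) of the w*h*d windows centered on window (ix, iy, iscale).

    nx, ny and nscales give the extent of the window grid; indices outside it are dropped.
    """
    out = []
    for s in range(iscale - d // 2, iscale + d // 2 + 1):
        for y in range(iy - h // 2, iy + h // 2 + 1):
            for x in range(ix - w // 2, ix + w // 2 + 1):
                if 0 <= s < nscales and 0 <= y < ny and 0 <= x < nx:
                    out.append((x, y, s))
    return out
```

Away from the borders this returns 7 * 7 * 3 = 147 windows, including the candidate window itself.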

6.3) The excitation cascade rule is applied to screen the pedestrian description feature parameters obtained in step 6.2). The excitation cascade rule is:

(23)

(24)

where the input denotes the pedestrian description features obtained in step 6.2); K is the number of decision trees that make up the pedestrian detector; i and k are decision tree indices; each decision tree has a corresponding weight; the partial sum is the sum of the outputs of the 1st to k-th decision trees; and the remaining term is the decision threshold. When formula (23) holds, the feature is judged to be a non-pedestrian feature, that is, the sliding window containing it contains no pedestrian.

6.4) The inhibitory cascade rule is applied to screen the feature parameter sets obtained in steps 6.3) and 6.1). The inhibitory cascade rule is:

(25)

where the two partial sums are those defined by formulas (21) and (24), and the remaining term is the decision threshold. When formula (25) holds, the feature parameter is judged to be a non-pedestrian feature.

6.5) Pedestrian description feature parameters that have not been screened out by the above steps are judged to be pedestrian features, that is, the windows corresponding to those features contain pedestrians.

Step 7: Use a non-maximum suppression algorithm to determine the sliding window that best matches the pedestrian features, and thereby the position of the pedestrian. If no sliding window matches the pedestrian features after the above steps, judge that the input image contains no pedestrian.

After detection by the pedestrian detector, several windows may all match the pedestrian features, so the best-matching window must be selected from among them. The non-maximum suppression algorithm selects this window quickly and effectively.
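
The text does not specify which variant of non-maximum suppression is used. The sketch below shows one common choice, greedy suppression based on intersection over union (IoU), as an assumption; the box format (x, y, width, height), the 0.5 overlap threshold and the function names are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs: keep the best window, drop windows overlapping it."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

An empty result corresponds to the case in step 7 where no sliding window matches the pedestrian features, so the image is judged to contain no pedestrian.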

To address the time-consuming computation of traditional pedestrian detection methods, the invention proposes a vision-based fast pedestrian detection method that raises the detection rate while maintaining detection accuracy. The method speeds up pedestrian detection in three main ways: 1) it defines pedestrian description feature channels that are both effective and cheap to compute: the color-invariant parameter feature channels, the histogram of gradient (HOG) feature channel and the gradient magnitude feature channel; 2) by constructing multi-scale classifiers, it moves the time-consuming part of detection into the training stage of the pedestrian detector, which also improves detection accuracy; 3) it uses the Crosstalk Cascade strategy so that the trained classifiers examine pedestrian features quickly and efficiently during real-time detection. The method brings pedestrian detection technology closer to practical use in real engineering applications, with broad prospects in fields such as security video surveillance and automotive active safety.

Claims

1. A vision-based fast pedestrian detection method, characterized in that the method comprises the following steps:

(1) Obtain video images of the road ahead of the vehicle through a camera installed on the vehicle;

(2) Process the video images obtained in step (1) frame by frame: for each input image, compute the color-invariant parameter feature channel images, the HOG feature channel image and the gradient magnitude feature channel image, obtaining the S_H, S_x, S_y, S_Histo and S_M feature channel images, where S_H, S_x and S_y are the color-invariant parameter feature channel images, S_Histo is the HOG feature channel image, and S_M is the gradient magnitude feature channel image;

(3) Compute the integral image representation of each of the S_H, S_x, S_y, S_Histo and S_M feature channel images from step (2), obtaining the corresponding integral feature channel images IS_H, IS_x, IS_y, IS_Histo and IS_M;

(4) Traverse the integral feature channel images obtained in step (3) with sliding windows of different scales, and compute the Haar-like features in each sliding window as pedestrian description features;

(5) Use the pedestrian detector to examine the pedestrian description features computed in step (4) and judge whether the input features are pedestrian-related features;

(6) Adopt a crosstalk cascade strategy to improve the speed and efficiency with which the pedestrian detector in step (5) examines input features;

(7) Use a non-maximum suppression algorithm to determine the sliding window that best matches the pedestrian features, and thereby the position of the pedestrian; if no sliding window matches the pedestrian features after the above steps, judge that the input image contains no pedestrian;

In the step (2), the HOG feature channel is defined as follows: calculate its gradient image for the input image, and then calculate each pixel in the 8*8 neighborhood centered on each pixel in turn in the 8*8 neighborhood centered on the pixel The gradient histogram distribution of the gradient; the statistical rules of the histogram are as follows: the gradient amplitude of each pixel in the 8*8 neighborhood is the weight of the pixel, and the histogram is divided into 6 intervals based on the direction of the gradient (0-180°). Each pixel falls into the corresponding interval according to the direction of its own gradient, and then adds the corresponding gradient amplitudes of the pixels existing in each interval to finally obtain the gradient histogram;

In the step (4), the Haar-like feature is defined as follows: a matrix of 4*4 size is used in the sliding window, and 3 Haar-like features are calculated, which are respectively based on 2 adjacent rectangle-like Haar-like features, based on 3 The Haar-like feature of two adjacent rectangles and the Haar-like feature based on 4 adjacent rectangles; the Haar-like feature based on 2 adjacent rectangles, which is the difference between the sum of the median values of two adjacent rectangles, which is:

H_2 = \left| \sum rec1_i - \sum rec2_j \right| \qquad (10)

where ∑rec1_i denotes the sum of the pixel values in one rectangle and ∑rec2_j denotes the sum of the pixel values in the other rectangle; the Haar-like features based on 3 and on 4 adjacent rectangles are expressed as follows:

H_3 = \left| \sum rec1_i + \sum rec3_j - \sum rec2_j \right| \qquad (11)

H_4 = \left| \sum rec1_i - \sum rec2_j + \sum rec3_i - \sum rec4_j \right| \qquad (12);
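Formulas (10) to (12) can be sketched as three small functions over rectangle sums. The claim does not state how the 4*4 matrix is partitioned into rectangles, so the layout used in the test data (left and right halves for H_2, four quadrants for H_4) and the helper `naive_rect_sum` are assumptions made only for illustration.

```python
def naive_rect_sum(image, x, y, w, h):
    """Sum of pixel values in the w*h rectangle with top-left corner (x, y)."""
    return sum(image[r][c] for r in range(y, y + h) for c in range(x, x + w))


def haar2(s1, s2):
    """Formula (10): two adjacent rectangles."""
    return abs(s1 - s2)


def haar3(s1, s2, s3):
    """Formula (11): three adjacent rectangles, the middle one subtracted."""
    return abs(s1 + s3 - s2)


def haar4(s1, s2, s3, s4):
    """Formula (12): four adjacent rectangles with alternating signs."""
    return abs(s1 - s2 + s3 - s4)
```

In practice the rectangle sums would be read from the integral feature channel images rather than recomputed pixel by pixel.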

In step (6), the series cascading strategy is defined as follows:

(6.1) According to the loose cascade rule, the pedestrian detection classifier screens the pedestrian description feature parameters in each sliding window for those that potentially belong to pedestrian features; the loose cascade rule is:

H_k(x) < \theta_k^{R}, \quad k < K \qquad (20)

H_k(x) = \sum_{i=1}^{k} a_i h_i(x) \qquad (21)

where x is the pedestrian description feature parameter input to the pedestrian detection classifier, K is the number of decision trees constituting the pedestrian detector, i = 1, ..., k and k = 1, ..., K are decision tree indices, h_i(x) is the i-th decision tree, a_i is the weight corresponding to h_i(x), H_k(x) is the weighted sum of the outputs of the 1st to k-th decision trees, and θ_k^R is the decision threshold; if formula (20) holds, the judgment process ends and x is judged to be a non-pedestrian feature, that is, the sliding window containing x is judged to contain no pedestrian;

(6.2) If the pedestrian description feature x is judged to be a potential pedestrian description feature in step (6.1), then, taking the sliding window as the unit and the sliding window containing x as the center, select the pedestrian description feature parameters in the 7*7*3 neighboring sliding windows and input them into the pedestrian detector; here 7*7*3 corresponds to w*h*d, where w is the number of sliding windows in the horizontal direction, h is the number of sliding windows in the vertical direction, and d is the number of adjacent scales of sliding windows at a given position; the features in these 7*7*3 sliding windows are recorded as:

N(x) = \{ x_i \mid x_i \in N(x) \} \qquad (22)

(6.3) Use the incentive cascade rule to filter the pedestrian description feature parameters in N(x) obtained in step (6.2); the incentive cascade rule is as follows:

where x' denotes a pedestrian description feature obtained in step (6.2), K is the number of decision trees constituting the pedestrian detector, i = 1, ..., k and k = 1, ..., K are decision tree indices, h_i(x) is the i-th decision tree, a_i is the weight corresponding to h_i(x), H_k(x) is the weighted sum of the outputs of the 1st to k-th decision trees, and the threshold appearing in formula (23) is the decision threshold; when formula (23) holds, x is judged to be a non-pedestrian feature, that is, the sliding window containing x is judged to contain no pedestrian;

(6.4) Apply the cut-off cascade rule to filter the feature parameter sets obtained in steps (6.1) and (6.3); the cut-off cascade rule is as follows:

\frac{H_k(x)}{H_k(x')} < \theta_k^{I}, \quad k < K \qquad (25)

where H_k(x) and H_k(x') are defined by formulas (21) and (24), and θ_k^I is the decision threshold; when formula (25) holds, the feature parameter x is judged to be a non-pedestrian feature;

(6.5) The pedestrian description feature parameters that have not been screened out by the above steps are determined to be pedestrian features, that is, the window corresponding to feature x contains a pedestrian.
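The loose rule of formula (20) and the cut-off rule of formula (25) can be sketched as below. This is a simplified illustration only: the incentive rule of step (6.3) is left out, H_k(x') is assumed to have the same form as formula (21), and a zero denominator in the cut-off ratio is assumed to mean "do not reject". The helper `make_stump` stands in for a trained 2-level decision tree and is purely hypothetical.

```python
def make_stump(threshold):
    """Hypothetical weak classifier: +1 above the threshold, otherwise -1."""
    return lambda x: 1.0 if x > threshold else -1.0


def partial_score(trees, weights, x, k):
    """H_k(x): weighted sum of the first k weak classifiers, formula (21)."""
    return sum(a * h(x) for a, h in zip(weights[:k], trees[:k]))


def passes_loose_rule(trees, weights, x, loose_thresholds):
    """Return False as soon as formula (20) holds for some k < K.

    loose_thresholds[k - 1] plays the role of the threshold for stage k.
    """
    score = 0.0
    for k in range(1, len(trees)):  # k = 1 .. K - 1
        score += weights[k - 1] * trees[k - 1](x)
        if score < loose_thresholds[k - 1]:
            return False  # early exit: x is a non-pedestrian feature
    return True


def rejected_by_cutoff(trees, weights, x, x_neighbor, k, cutoff_threshold):
    """Formula (25): reject x when H_k(x) / H_k(x') is below the threshold."""
    denominator = partial_score(trees, weights, x_neighbor, k)
    if denominator == 0:
        return False  # assumption: an undefined ratio does not reject x
    return partial_score(trees, weights, x, k) / denominator < cutoff_threshold
```

The early exit in `passes_loose_rule` is what saves computation: most windows are discarded after a few decision trees instead of all K.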

2. The vision-based fast pedestrian detection method according to claim 1, characterized in that, in step (2), the color invariant parameter feature channel is defined as follows:

The color invariant parameters are feature parameters calculated by combining the spectral information and the spatial structure information of the colors in the image; they are translation-invariant, scale-invariant, and color-invariant within a local neighborhood of the image, discriminate strongly between colors, and adapt well to illumination changes; calculating the color invariant parameters first requires a physical model of the image according to the following formula,

where E is the physical model of the image, λ is the wavelength of the light, and the remaining terms denote the position in the image, the spectrum of the light, the Fresnel reflection at that position, and the emissivity of the material;

In the above physical model, the feature parameters H, W_x, and W_y are color-invariant and are defined as follows:

H = \frac{E_{\lambda}}{E_{\lambda\lambda}} \qquad (2)

W_x = \frac{E_x}{E} \qquad (3)

W_y = \frac{E_y}{E} \qquad (4)

where E_λ is the first-order partial derivative of E with respect to λ, E_λλ is the second-order partial derivative of E with respect to λ, E_x is the first-order partial derivative of formula (1) in the x direction, and E_y is the first-order partial derivative of formula (1) in the y direction;

According to formulas (1), (2), (3), and (4), calculate the color invariant parameter feature channel images of the input image, obtaining the feature channel images S_H, S_x, and S_y that correspond to the color invariant parameters H, W_x, and W_y, respectively.

3. The vision-based fast pedestrian detection method according to claim 1, characterized in that, in step (2), the gradient magnitude feature channel is defined as follows:

The Laplacian operator, a second-order differential operator, is used to calculate the gradient magnitude of the image; the second-order partial derivatives of the image are defined as follows:

\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y) \qquad (5)

\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y) \qquad (6)

where f denotes the input image and x, y denote the position in the image; the Laplacian ∇²f of the 2D image is then obtained as follows:

\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (7)

that is,

\nabla^2 f = [f(x+1, y) + f(x-1, y) + f(x, y-1) + f(x, y+1)] - 4f(x, y) \qquad (8)
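Formulas (5) to (8) can be sketched as a discrete Laplacian over a list-of-rows image. Treating border pixels as zero is an assumption made here for brevity; the claim does not say how image borders are handled, and the function names are illustrative.

```python
def laplacian(image, x, y):
    """Discrete Laplacian at an interior pixel; image is indexed image[y][x]."""
    d2x = image[y][x + 1] + image[y][x - 1] - 2 * image[y][x]  # formula (5)
    d2y = image[y + 1][x] + image[y - 1][x] - 2 * image[y][x]  # formula (6)
    return d2x + d2y  # formula (7), which expands to formula (8)


def laplacian_image(image):
    """Apply the Laplacian to every interior pixel; borders stay 0 (assumed)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = laplacian(image, x, y)
    return out
```

A single bright pixel on a dark background gives -4 at that pixel, which matches the -4f(x, y) term of formula (8).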

Then the gradient magnitude of the image is

4. The vision-based fast pedestrian detection method according to claim 1, characterized in that, in step (3), the integral image representation is calculated as follows:

ii(x, y) = ∑ _{x'≤x, y'≤y} i(x', y') (9)

In the formula, ii(x, y) is the integral representation of the image, i(x', y') is the pixel value in the original feature channel image, and x, y denote the position of the pixel in the image; in step (3), the integral images of the feature channel images are calculated in turn and are denoted IS_H, IS_x, IS_y, IS_Histo, and IS_M.
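Formula (9) and the reason it speeds up the Haar-like features can be sketched as follows: once the integral image is built in a single pass, the sum over any rectangle needs at most four lookups. The function names and the `(x, y, w, h)` rectangle convention are assumptions for illustration.

```python
def integral_image(channel):
    """ii[y][x] = sum of channel[y'][x'] for all x' <= x, y' <= y (formula (9))."""
    height, width = len(channel), len(channel[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += channel[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum over the w*h rectangle with top-left corner (x, y), via four lookups."""
    x2, y2 = x + w - 1, y + h - 1
    total = ii[y2][x2]
    if x > 0:
        total -= ii[y2][x - 1]
    if y > 0:
        total -= ii[y - 1][x2]
    if x > 0 and y > 0:
        total += ii[y - 1][x - 1]
    return total
```

The cost of `rect_sum` does not depend on the rectangle size, which is why sliding windows of many scales can share the same integral images.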

5. The vision-based fast pedestrian detection method according to claim 1, characterized in that, in step (5), the pedestrian detector is constructed and trained as follows:

(5.1) Use the INRIA pedestrian image database as the image set from which the classifier training sample data are calculated;

(5.2) According to steps (2) to (4), calculate the set of pedestrian description features for the images in the INRIA pedestrian image database, and represent the feature set in the following set notation:

S = P ∪ N (13)

P = {p_i ∈ P: p_i = (v_P, 1)} (14)

N = {n_i ∈ N: n_i = (v_N, -1)} (15)

where S denotes the training samples of the classifier, P denotes the pedestrian feature set, N denotes the non-pedestrian feature set, p_i denotes an element of the pedestrian feature set, v_P denotes the feature value corresponding to that element, and 1 indicates that the element is classified as a pedestrian feature; n_i denotes an element of the non-pedestrian feature set, v_N denotes the feature value corresponding to that element, and -1 indicates that the element is classified as a non-pedestrian feature;

(5.3) A classifier with a cascade structure composed of 2-level decision trees is adopted; the cascade classifier is expressed as:

H(x) = H_K(x) = \sum_{i=1}^{K} a_i h_i(x) \qquad (16)

where H(x) denotes the learned classifier, h_i(x) denotes a weak classifier constituting the classifier, that is, a 2-level decision tree, i = 1, ..., K is the decision tree index, K is the number of decision trees in the classifier, with K = 12, and a_i is the weight corresponding to h_i(x);

(5.4) Use the training sample data defined in step (5.2) to train the classifier defined in step (5.3); the AdaBoost algorithm is used to train the classifier and to determine the parameters of each decision tree and its corresponding weight;

(5.5) Train 5 standard-scale classifiers with the method of step (5.4); the scale of a classifier depends on the size of the sliding window from which its training sample data are taken; windows of the 5 sizes 25*15, 50*30, 100*60, 200*120, and 250*150 are used as the standard-size sliding windows, and the pedestrian description features generated by traversing the images with windows of these five sizes are used as training sample data to obtain the 5 standard-scale classifiers;

(5.6) Based on the five standard-scale classifiers trained in step (5.5), use scale estimation to construct a set of classifiers covering the complete range of scales; the construction method is as follows:

v_{rec}' = v_{rec} \cdot r(s) \qquad (17)

\tau' = \tau \cdot r(s) \qquad (18)

r(s) = \begin{cases} a_u \cdot s^{b_u} & \text{if } s > 1 \\ a_d \cdot s^{b_d} & \text{otherwise} \end{cases} \qquad (19)

where v_rec and τ are the parameters of a standard-scale classifier, v_rec' and τ' are the classifier parameters at the scale to be estimated, r(s) is the ratio of the feature value at scale 1 to that at scale s, s is the scale value, and a_u, b_u, a_d, b_d are the up-sampling and down-sampling parameters, whose values must be determined experimentally; for the HOG and gradient magnitude features, a_u = 1, b_u = 0, a_d = 0.89, b_d = 1.586; for the color invariant parameter features, a_u = 1, b_u = 2, a_d = 1, b_d = 2;

(5.7) Based on the 5 standard-scale classifiers trained in step (5.5), use the method of step (5.6) to construct a set of 50 classifiers covering the complete range of scales, which together form the pedestrian detector.
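The scale estimation of formulas (17) to (19) can be sketched as below, using the parameter values stated in step (5.6). The function and dictionary names are hypothetical, and how the scale s is chosen for each of the 50 classifiers is not specified in the claim, so it is left to the caller.

```python
# Parameter values as stated in step (5.6).
HOG_AND_GRADIENT = dict(a_u=1.0, b_u=0.0, a_d=0.89, b_d=1.586)
COLOR_INVARIANT = dict(a_u=1.0, b_u=2.0, a_d=1.0, b_d=2.0)


def scale_ratio(s, a_u, b_u, a_d, b_d):
    """r(s) from formula (19): power law with separate up/down parameters."""
    if s > 1:
        return a_u * s ** b_u
    return a_d * s ** b_d


def rescale_classifier(v_rec, tau, s, params):
    """Formulas (17) and (18): scale both classifier parameters by r(s)."""
    r = scale_ratio(s, **params)
    return v_rec * r, tau * r
```

For example, at s = 2 the color invariant features scale by 2² = 4, while the HOG and gradient magnitude features are left unchanged because b_u = 0. Estimating classifier parameters this way avoids retraining a classifier for every scale.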