Real-Time Pedestrian Detection and Tracking at Nighttime For Driver-Assistance Systems
Abstract—Pedestrian detection is one of the most important components in driver-assistance systems. In this paper, we propose a monocular vision system for real-time pedestrian detection and tracking during nighttime driving with a near-infrared (NIR) camera. Three modules (region-of-interest (ROI) generation, object classification, and tracking) are integrated in a cascade, and each utilizes complementary visual features to distinguish the objects from the cluttered background in the range of 20–80 m. Based on the common fact that objects appear brighter than the nearby background in nighttime NIR images, efficient ROI generation is done with a dual-threshold segmentation algorithm. As there is large intraclass variability in the pedestrian class, a tree-structured, two-stage detector is proposed to tackle the problem by training separate classifiers on disjoint subsets of different image sizes and arranging the classifiers, which are based on Haar-like and histogram-of-oriented-gradients (HOG) features, in a coarse-to-fine manner. To suppress false alarms and fill detection gaps, template-matching-based tracking is adopted, and multiframe validation is used to obtain the final results. Results from extensive tests on both urban and suburban videos indicate that the algorithm can produce a detection rate of more than 90% at the cost of about 10 false alarms/h and run at the frame rate (30 frames/s) on a Pentium IV 3.0-GHz personal computer, which also demonstrates that the proposed system is feasible for practical applications and has a low implementation cost.

Index Terms—AdaBoost, histogram of oriented gradients (HOG), Kalman filter, near-infrared camera, pedestrian detection, template matching.

I. INTRODUCTION

ate in different environments and directly output accurate depth information, it is difficult for them to distinguish pedestrians from other obstacles. Thus, video cameras are more suitable for detecting pedestrians, as they are similar to the human visual perception system and provide rich information for applying discriminative pattern recognition techniques.

However, robust and efficient vision-based pedestrian detection is a challenging task in real traffic and cluttered environments, due to the movement of cameras, the variable illumination conditions, the wide range of possible human appearances and poses, the strict performance criteria, and the hard real-time constraints.

Recently, many interesting approaches for vision-based pedestrian detection have been proposed. Most of them tend to utilize expensive far-infrared (FIR) cameras or stereo vision to facilitate the extraction of regions of interest (ROIs), because conventional background subtraction, as frequently used in surveillance applications, completely fails with moving cameras, and the sliding-window approach, which searches all possible sizes at all locations over the images, is currently too computationally intensive for real-time application. On the contrary, considering the low-cost benefit of monocular vision and its potential practical value, increasingly more research takes account of detecting pedestrians based on a monocular normal or near-infrared (NIR) camera, with rough appearance cues (e.g., symmetry, intensity, and texture) for ROI generation.

In this paper, we present a real-time pedestrian detection and
TABLE I
OVERVIEW OF CURRENT PEDESTRIAN DETECTION SYSTEMS. DR IS IN TERMS OF THE NUMBER OF PEDESTRIANS. THE FA PER FRAME IS ESTIMATED FROM THE ORIGINAL DATA IN THE RESPECTIVE PAPERS
about the pedestrian-detection system in Section III. Then, from Section IV to Section VI, we discuss further details about each module. Experimental results that validate the proposed system are presented in Section VII. Finally, conclusions are given in Section VIII.

II. RELATED WORK

Generally, a vision-based pedestrian detection system can be divided into three modules, as described in [2] and [10]: ROI generation, object classification, and tracking. ROI generation is the first and most important step, since it provides candidates for the following processes and directly affects the system performance; thus, both efficiency and effectiveness are mandatory. Fortunately, the ROI generation module can be facilitated by a special hardware configuration.

Nighttime pedestrian detection is usually carried out with FIR cameras, as they provide special cues for candidate selection. In [6], the system starts with a search for hot spots assumed to be parts of the human body, using a dynamic threshold for each frame, and then selects upper-body and full-body candidates after obtaining the road information. However, pedestrians are not always brighter than the background in FIR images during the summer or when they wear well-insulated clothing. With the help of stereo FIR vision, Bertozzi et al. [7] exploit three different approaches for candidate generation: warm-area detection for hot spots, and vertical-edge detection and disparity computation for detecting cold objects that can potentially be pedestrians.

Stereo vision is also widely used in daytime pedestrian detection. Zhao and Thorpe [3] obtain the foreground regions by clustering in the disparity space. Gavrila and Munder [4] scan the depth map with windows of appropriate sizes and take the ground-plane constraints into account to obtain the ROIs where pedestrians are likely. Alonso et al. [5] propose a candidate-selection method based on the direct computation of the 3-D coordinates of relevant points in the scene.

Considering the monocular daylight or NIR camera, which is cheaper than stereo vision and has a better signal-to-noise ratio and resolution than the FIR camera, although it loses the depth information, there are still efficient approaches for candidate selection. Broggi et al. [11], [12] use vertical symmetry derived from gray levels and vertical gradient magnitude to select candidate regions around each relevant symmetry axis. Shashua et al. [8] obtain 75 ROIs/frame by filtering out windows based on the lack of distinctive texture properties and noncompliance with perspective constraints on the range and size of the candidates. Cao et al. [9] generate candidates by performing an exhaustive scan over the particular rectangular region in which pedestrians might cause a collision with the vehicle. For night videos from the NIR camera, Tian et al. [13] extract the target regions from the raw data using an adaptive-thresholding-based image segmentation algorithm.

Once the ROIs have been obtained, different combinations of features and pattern classifiers can be applied to distinguish pedestrians from nonpedestrian candidates. Shape, appearance, and motion features are the most important cues for pedestrian detection, such as raw image intensity [6], gradient magnitude [3], edges [4], binary head-model subimages [7], [11], Haar wavelets [5], [14], Gabor filter outputs [15], local receptive fields [4], [16], motion filter outputs [9], [17], and HOGs [18], [19].

Regarding the classification techniques, typical approaches such as template matching [4], [20], artificial neural networks (ANNs) [3], [4], support vector machines (SVMs) [5], [6], [14], [18], [19], and AdaBoost [8], [9], [17] have been adopted to find pedestrians in conjunction with the above features. SVM is the most popular learning method and often produces accurate classifications with most of the features, but it requires heavy computation when there are many candidates. Conversely, AdaBoost, which combines several weak classifiers into a strong classifier using a weighted majority vote, can quickly identify the candidates with simple features but with
GE et al.: REAL-TIME PEDESTRIAN DETECTION AND TRACKING AT NIGHTTIME 285
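The weak-to-strong combination that AdaBoost performs — a weighted majority vote over simple classifiers — can be sketched in a few lines. This is a toy discrete-AdaBoost example on 1-D threshold stumps for illustration, not the paper's implementation:

```python
import math

def train_adaboost(xs, ys, rounds=10):
    """Toy discrete AdaBoost with 1-D threshold stumps."""
    m = len(xs)
    w = [1.0 / m] * m                      # sample weight distribution
    ensemble = []                          # list of (alpha, thresh, sign)
    for _ in range(rounds):
        # pick the stump h(x) = sign * (+1 if x > thresh else -1) with lowest weighted error
        best = None
        for thresh in xs:
            for sign in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if sign * (1 if x > thresh else -1) != y)
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        err, thresh, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this weak classifier
        ensemble.append((alpha, thresh, sign))
        # reweight: emphasize the samples this stump got wrong
        w = [wi * math.exp(-alpha * y * sign * (1 if x > thresh else -1))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the weighted majority vote."""
    vote = sum(a * s * (1 if x > t else -1) for a, t, s in ensemble)
    return 1 if vote >= 0 else -1
```

Each round, the stump with the lowest weighted error is added with vote weight alpha, and the misclassified samples are up-weighted so that the next stump concentrates on them.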
Fig. 5. False segmentation caused by a large T_H, when the pixel values of the pedestrian area are near the dark background. (a) Original image. (b) Intensity of the scan line, T_H, T_L, and the optimized T'_H. (c) False segmentation of T_H and T_L. (d) Acceptable result of the optimized T'_H and T'_L.

Fig. 6. If both the object region and background are bright in scan lines, the original T_H will become too high for correct segmentation. (a) Original image. (b) Pixel values along the scan line, T_H, T_L, and the optimized T'_H. (c) Bad result of T_H and T_L. (d) Result of T'_H and T'_L.

where I(i, j) indicates the intensity of the pixel (i, j). w is initially set to 12 according to the width distribution of pedestrians; α = 2, and β = 8.

The two thresholds are employed like the nonmaximum
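As a rough illustration of the dual-threshold idea — a high threshold T_H seeds bright-object regions on a scan line, while a lower threshold T_L extends them — here is a hysteresis-style sketch. The paper's exact segmentation algorithm, including the optimized thresholds, is not reproduced here:

```python
def dual_threshold_segments(scanline, t_high, t_low):
    """Hysteresis-style dual-threshold segmentation of a 1-D scan line:
    a run of pixels above t_low is kept only if it contains at least one
    pixel above t_high (the bright-object seed)."""
    segments, start, has_seed = [], None, False
    for i, v in enumerate(scanline):
        if v >= t_low:
            if start is None:
                start, has_seed = i, False
            if v >= t_high:
                has_seed = True
        else:
            if start is not None and has_seed:
                segments.append((start, i - 1))
            start = None
    if start is not None and has_seed:
        segments.append((start, len(scanline) - 1))
    return segments
```

Runs that never exceed the high threshold are discarded, which is what suppresses moderately bright background clutter while keeping the full extent of a bright pedestrian region.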
Fig. 10. Procedure for calculating a HOG descriptor. (a) Original image. (b) Gradient orientation and magnitude of each pixel are illustrated by the arrows’
direction and length. (c) Splitting the detection window into 6 × 15 cells. (d) HOG computation in each cell. (e) Normalizing all the histograms inside a block of
2 × 2 cells. (f) Concatenating all the normalized histograms into an HOG descriptor.
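The procedure in Fig. 10 can be sketched with NumPy as follows. The 9 orientation bins and 8-pixel cell size are assumptions, chosen so that 6 × 15 cells and 2 × 2-cell blocks with a one-cell stride yield the 2520-D (36 × 5 × 14) descriptor described in the text:

```python
import numpy as np

def hog_descriptor(img, cell=8, cells_x=6, cells_y=15, bins=9):
    """Simplified HOG: per-cell orientation histograms, L2-normalized
    over 2x2-cell blocks (stride of one cell), concatenated into one vector."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]       # central-difference gradients
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    # magnitude-weighted orientation histogram in each cell
    hists = np.zeros((cells_y, cells_x, bins))
    for cy in range(cells_y):
        for cx in range(cells_x):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            idx = np.minimum((a / np.pi * bins).astype(int), bins - 1)
            for b in range(bins):
                hists[cy, cx, b] = m[idx == b].sum()
    # L2-normalize each 2x2 block of cells and concatenate
    blocks = []
    for by in range(cells_y - 1):
        for bx in range(cells_x - 1):
            v = hists[by:by+2, bx:bx+2].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)   # (cells_y-1) * (cells_x-1) * 4 * bins dims
```

For a 48 × 120 detection window this gives 14 × 5 block positions of 36 values each, i.e., the 2520-D vector.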
5) Normalize the histograms within a block of 2 × 2 cells.
6) Group all the normalized histograms into a single vector.

The normalization is used to reduce the illumination variability. The final HOG descriptor is obtained by concatenating all the normalized histograms into a single 2520-D (36 × 5 × 14) vector.

B. AdaBoost Learning

Given the feature set and the training set of positive and negative samples, many machine learning approaches can be used to learn the classification function. In our system, a variant of the AdaBoost algorithm, i.e., Gentle AdaBoost [32], is used to select the relevant features and train the classifiers.

Compared with other statistical learning approaches (e.g., SVM and ANN), which try to learn a single powerful discriminant function from all the specified features extracted from the training samples, the AdaBoost algorithm combines a collection of simple weak classifiers on a small set of critical features to form a strong classifier using a weighted majority vote. Meanwhile, as a kind of large-margin classifier [33], AdaBoost provides strong bounds on generalization and guarantees performance comparable with SVM. Thus, AdaBoost is an effective learning algorithm for real-time application.

Moreover, Gentle AdaBoost [32] has been proven to be superior to Real AdaBoost [34] in most cases; both adopt weak learners (WLs) with real-valued outputs to overcome the disadvantage of binary-valued stump functions in discriminating the complex distribution of the positive and negative samples, and both yield superior performance compared with the original discrete AdaBoost.

As the HOG descriptor and simple features are high-dimensional vectors, we regard each dimension of the vector as a 1-D feature. Thus, all the features can be treated in the same way. In the AdaBoost learning procedure, we employ classification and regression trees (CARTs) [35] as the real-valued WLs that divide the instance space into a set of disjoint subregions with the lowest error in each round. The real-valued output in each subregion can be calculated from the sample weights falling into it. The WL output then depends only on the subregion that the instance belongs to. The detailed training procedure is shown in Algorithm 1.

Algorithm 1: Training procedure of Gentle AdaBoost.
• Given: a data set S = {(x_1, y_1), ..., (x_m, y_m)}, where (x_i, y_i) ∈ X × {−1, +1}, and the number of weak classifiers to be selected, T.
• Initialize the sample weight distribution D_1(i) = 1/m.
• For t = 1, ..., T:
  1) Learn a CART with the lowest error as the best weak classifier, partitioning X into several disjoint subregions X_1, ..., X_n.
  2) Under the weight distribution D_t, calculate

     W_l^j = P(x_i ∈ X_j, y_i = l) = Σ_{i: x_i ∈ X_j ∧ y_i = l} D_t(i),   l = ±1.   (6)

  3) Set the output of h on each X_j as

     ∀x ∈ X_j,   h(x) = (W_{+1}^j − W_{−1}^j) / (W_{+1}^j + W_{−1}^j).   (7)

  4) Update the sample weight distribution

     D_{t+1}(i) = D_t(i) exp[−y_i h_t(x_i)] / Z_t   (8)

     where Z_t is a normalization factor.
• The final strong classifier H is

     H(x) = sign( Σ_{t=1}^T h_t(x) − b )   (9)

  where b is a threshold whose default value is zero. The confidence of H is defined as conf_H(x) = |R(x)|, with R(x) = Σ_t h_t(x) − b.

C. Tree-Structured Two-Stage Detector

The structure of the pedestrian detector is shown in Fig. 11. Unlike the cascaded framework proposed by Viola and
Jones [29] for face detection, which consists of multiple stages and rejects many nonface candidates in the early stages to reduce the computational cost, our tree-structured two-stage architecture aims to improve the classification performance by integrating classifiers trained on different features and on sample sets of different image sizes.

The proposed classification framework arranges two stages in a coarse-to-fine manner according to the issues mentioned at the beginning of this section, instead of learning a cascaded detector of multiple stages based on a hybrid of features [36] for efficiency reasons.

The classifier based on Haar-like features is selected for its higher performance than that based on other simple features. It is used for rough classification, focusing on rejecting the nonpedestrian candidates and selecting the well-bounded pedestrian ROIs. The classifiers based on HOG features, on the other hand, are utilized to give a precise determination.

As the pedestrian class holds high intraclass variability due to different sizes, illuminations, poses, and clothes, the instance space is partitioned into subregions of reduced variability according to the distribution of the width and height of the pedestrians. Three HOG-based classifiers are trained on three separate sets, which are denoted by AdaBoost-S, AdaBoost-M, and AdaBoost-L in Fig. 11. The results from the experiments indicate that this strategy not only reduces the complexity of the classifiers but also improves the detection performance.

The parameters that should be optimized in the detector are the thresholds of the first and second stages, denoted T_1 and T_2, for combination. Furthermore, the optimal parameter settings can be obtained through the sequential parameter-optimization method described in [4].

VI. TRACKING

Once the candidates are validated by the AdaBoost detector, a tracking stage takes place. The tracking module fills the detection gaps between frames and discards spurious detections, thus helping to minimize both false-positive (FP) and false-negative detections based on temporal coherence.

The tracking algorithm relies on the detection results to initiate the tracks and on the candidates to provide possible observations (or measurements). If the tracked object fails to obtain an associated observation from the selected candidates, template matching is utilized to give the complementary measurement. The details of the tracking process are presented in Fig. 12.

For a pedestrian in the video, the state variables of concern are its centroid position (X, Y), width (W), and height (H), as well as their differentials between two successive frames. Thus, the state vector for Kalman tracking is x = (X, dX, Y, dY, W, dW, H, dH)^T. If we consider that the movement of the camera is straight with constant speed, which is reasonable in most cases, the state transition matrix is simple, and because we can directly observe all the state variables of
292 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 10, NO. 2, JUNE 2009
the tracked object in each frame, the measurement matrix will be an identity matrix. Taking the state variables X and dX, for example, the process equation and measurement equation can be described as follows:

    [ X_{k+1}  ]   [ 1  1 ] [ X_k  ]
    [ dX_{k+1} ] = [ 0  1 ] [ dX_k ] + w_k                    (10)

    [ X_k^m  ]   [ 1  0 ] [ X_k  ]
    [ dX_k^m ] = [ 0  1 ] [ dX_k ] + v_k                      (11)

where w_k is the process noise, and v_k is the measurement noise, both of which are assumed to be additive, white, and Gaussian, with zero mean. If we obtain the estimated covariance matrices of w_k and v_k, the update of the Kalman filter is straightforward.

In addition to the update of the Kalman filter, another important issue of object tracking is the data association, which tries to find the associated observation of the tracked object to correct the filter's prediction.

Since the object is represented by its position and size information, the predicted object's nearest neighbor can be regarded as the corresponding measurement of the tracked object. We define the distance criterion as follows:

    D(x_1, x_2) = sqrt( (X_1 − X_2)^2 + (Y_1 − Y_2)^2 ) + ( |W_1 − W_2| + |H_1 − H_2| ).   (12)

A candidate will not be considered as the observation of the associated track unless it is the prediction's nearest neighbor and the overlap ratio between them is greater than 0.5.

However, the nearest-neighbor method may fail to find the object's observation due to bad segmentation or the nonlinear movement of the camera. Template matching is then adopted for improvement. The template is initiated at the beginning of the track and updated whenever the nearest-neighbor method works. Once the nearest neighbor cannot be found among the candidates in the current frame, template matching is used to search for the observation. If the best matching confidence is greater than 0.85, the resulting region is accepted as the measurement; otherwise, the track's corresponding observation is missing. The matching confidence is defined as follows:

    d(T, Φ) = Σ_{x,y} (T(x, y) − t̄)(Φ(x, y) − φ̄)
              / sqrt( Σ_{x,y} (T(x, y) − t̄)^2 · Σ_{x,y} (Φ(x, y) − φ̄)^2 )   (13)

where T(x, y) is the template, and Φ(x, y) indicates the image signal of the same size. t̄ and φ̄ are the mean values of T(x, y) and Φ(x, y), respectively.

An active track will be terminated for one of three reasons: 1) the tracked object is out of the frame; 2) the matching confidence in the template-matching step is less than 0.7; or 3) the corresponding observation has been missed in three successive frames.

To improve the system performance, the tracking process is divided into two stages: pretracking and tracking. The pretracking stage is applied to reconfirm the existence of pedestrians. A pretracked object will be validated as a pedestrian and moved to the tracking stage only if it has been detected more than N_p times and its average confidence in ten consecutive frames is above a threshold TH_p. The pedestrian confidence in each frame is obtained from the detection confidence (which is mapped from R(x) by f(x) = 1/(1 + exp(−x))) or by template matching. Only the objects in the tracking stage are shown as output alarms. The parameters N_p and TH_p can also be optimized by the sequential parameter-optimization method [4].

Thus, in the tracking procedure, the pretracking and multiframe validation reject the spurious detections, while the tracking stage tends to fill the detection gaps of the corresponding objects.

VII. EXPERIMENTAL RESULTS

To evaluate the integrated pedestrian-detection and tracking system, compare different approaches, and optimize the parameter settings, we generate both the training and testing sets from the raw video data captured by the NIR camera during about 10 h of suburban and urban driving at speeds of no more than 70 km/h.

The positive samples, including pedestrians and bicyclists, cover a wide variety of sizes, poses, illuminations, and backgrounds and are extracted not only from the manually labeled results but also from the candidates generated by the ROI generation module. These roughly bounded samples make the trained classifiers insensitive to the bounding-box accuracy. Accordingly, the negative samples are initially randomly selected from the nonpedestrian candidates but are further augmented by following the bootstrap strategy in [18] and [37] to obtain more representative samples, because easy negative samples are useless for improving or validating the discriminating capability of the trained classifiers.

Finally, the training set consists of 4754 positive samples and 5929 negative samples, while there are 1977 positive and 2116 negative instances in the testing set. All the samples can be divided into three groups (small, middle, and large) according to their size for the three HOG-based classifiers. Moreover, three video sequences at 720 × 480 resolution containing hundreds of video clips are prepared for parameter optimization and performance evaluation. Both suburban and urban scenes (indicated by V1-SU, V2-S, and V3-U) are considered in the testing videos; in addition, the pedestrians' positions in each frame are manually labeled with bounding boxes for comparison with the segmentation and detection results.

To inspect how the proposed system works on pedestrians in different poses and views, all the pedestrians in the image sets and videos are divided into three groups, i.e., along the road, across the road, and bicyclist, for statistics. The details of the data set are presented in Table II.

A. Evaluation Criteria

Commonly, there are distinct differences between the classifiers' classification performance and the system's detection performance. The true-positive (TP) rate (also called the hit
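The process and measurement model of (10) and (11) corresponds to a standard constant-velocity Kalman filter for each pair of state variables. A minimal sketch for (X, dX), with illustrative (assumed) noise covariances:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # process matrix of (10): X_{k+1} = X_k + dX_k
H = np.eye(2)                             # measurement matrix of (11): identity
Q = 0.01 * np.eye(2)                      # process-noise covariance (illustrative)
R = 1.0 * np.eye(2)                       # measurement-noise covariance (illustrative)

def kalman_step(x, P, z):
    """One predict/correct cycle for the state x = (X, dX)."""
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # correct with measurement z = (X^m, dX^m)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new
```

The same two-by-two blocks repeat for (Y, dY), (W, dW), and (H, dH), giving the full eight-dimensional state of the tracker.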
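The matching confidence of (13) is a zero-mean normalized cross-correlation; a minimal sketch:

```python
import numpy as np

def matching_confidence(T, Phi):
    """Zero-mean normalized cross-correlation between a template T and an
    equally sized image patch Phi, as in (13); the result lies in [-1, 1]."""
    T = T.astype(np.float64) - T.mean()
    Phi = Phi.astype(np.float64) - Phi.mean()
    denom = np.sqrt((T ** 2).sum() * (Phi ** 2).sum())
    return float((T * Phi).sum() / denom) if denom > 0 else 0.0
```

A best match above 0.85 is accepted as the measurement, while a track whose confidence falls below 0.7 is terminated, following the tracking rules above.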
Fig. 13. Performance of four steps in the ROI generation module using diverse settings of w, i.e., w = 9, 10, ..., 15.

Fig. 14. Number of weak classifiers in each strong classifier based on different features. HOG-N indicates computing HOG features from normalized samples. Small, middle, and large refer to the individual classifiers of the size-sensitive classifier.

Fig. 16. Performance comparison of two-stage classifiers using different features.

Fig. 17. Detection performance of each module on V2-S and V3-U.

TABLE III
EFFECTS OF PARAMETER OPTIMIZATION ON THE SYSTEM PERFORMANCE, WHICH IS TESTED USING THE VIDEO V1-SU

TABLE IV
SYSTEM PERFORMANCE EVALUATED ON THE VIDEOS V2-S AND V3-U

Fig. 19. False alarms and missed pedestrians in testing videos. (a)–(c) Three pedestrians are missed. (d)–(f) Three false alarms.
displayed. All the pedestrians and bicyclists that are brighter than the nearby background can be detected by the system.

Concerning the TP rate and FPR (estimated by FPPF/CNPF) of the detector on videos, both are strikingly smaller than those previously reported on the testing set. The difference may arise from the increase of T_2 on one hand; more exactly, though, the degraded DR is caused by rough segmentation or poor bounding-box accuracy, as shown in Fig. 18, while the large number of extracted easy negative candidates tends to keep the FPR at a much lower level.

Therefore, the specific values of the instance-based DR and FPPF, or of the TP rate and FPR, are influenced by many factors (e.g., training samples, test criteria, and testing data) and cannot directly be used for performance comparison between different systems.

From the perspective of the application, we are more interested in the number of detected or missed pedestrians, the number of FAs in a time interval, and the running speed. All these parameters directly reflect the performance perceived by users; thus, they are reasonable measures for evaluating different systems.

Table IV presents the detailed experimental results on V2-S and V3-U. All the experiments are carried out on a Pentium IV 3.0-GHz computer with 1-GB RAM. The average running time per frame reveals that the proposed system is fast enough for real-time constraints, and this advantage also provides room for further improvement.

In the urban video, three FAs occur in about 350 frames, and six pedestrians (two bicyclists, three across the road, and one along the road) are missed for their low average detection confidence. Although there are no false detections in the suburban testing, there are still two pedestrians (one bicyclist and one across the road) missed by the system. These results indicate that pedestrians walking across the road are not only the most dangerous candidates but also the hardest targets to detect.

Fig. 19 shows several examples of the FAs and missed pedestrians. Most of the undetected objects suffer from nonuniform brightness or a bright background, which leads to bad segmentation and inaccurate bounding boxes. Furthermore, all the FAs are caused by shapes similar to pedestrians: traffic signs, lights, and trees are the three main sources of incorrect detection.
[13] Q. Tian, H. Sun, Y. Luo, and D. Hu, "Nighttime pedestrian detection with a normal camera using SVM classifier," in Proc. 2nd ISNN, May 2005, vol. 3497, pp. 189–194.
[14] C. Papageorgiou and T. Poggio, "A trainable system for object detection," Int. J. Comput. Vis., vol. 38, no. 1, pp. 15–33, Jun. 2000.
[15] H. Cheng, N. Zheng, and J. Qin, "Pedestrian detection using sparse Gabor filter and support vector machine," in Proc. IEEE Intell. Vehicles Symp., Jun. 2005, pp. 583–587.
[16] S. Munder and D. M. Gavrila, "An experimental study on pedestrian classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1863–1868, Nov. 2006.
[17] P. Viola, M. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance," Int. J. Comput. Vis., vol. 63, no. 2, pp. 153–161, Jul. 2005.
[18] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2005, pp. 886–893.
[19] M. Bertozzi, A. Broggi, M. Rose, M. Felisa, A. Rakotomamonjy, and F. Suard, "A pedestrian detector using histograms of oriented gradients and a support vector machine classifier," in Proc. IEEE Conf. Intell. Transp. Syst., Sep. 2007, pp. 143–148.
[20] H. Nanda and L. Davis, "Probabilistic template based pedestrian detection in infrared videos," in Proc. IEEE Intell. Vehicles Symp., Jun. 2002, pp. 15–20.
[21] B. Zhang, Q. Tian, and Y. Luo, "An improved pedestrian detection approach for cluttered background in nighttime," in Proc. IEEE Int. Conf. Veh. Electron. Safety, Oct. 2005, pp. 143–148.
[22] C. Hou, H. Ai, and S. Lao, "Multiview pedestrian detection based on vector boosting," in Proc. ACCV, Nov. 2007, vol. 4843, pp. 210–219.
[23] M. Bertozzi, A. Broggi, S. Ghidoni, and M. Meinecke, "A night vision module for the detection of distant pedestrians," in Proc. IEEE Intell. Vehicles Symp., Jun. 2007, pp. 25–30.
[24] A. Broggi, R. Fedriga, A. Tagliati, T. Graf, and M. Meinecke, "Pedestrian detection on a moving vehicle: An investigation about near infra-red images," in Proc. IEEE Intell. Vehicles Symp., Sep. 2006, pp. 431–436.
[25] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative performance evaluation," J. Electron. Imag., vol. 13, no. 1, pp. 146–165, Jan. 2004.
[26] J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[27] M. Bertozzi, A. Broggi, A. Fascioli, T. Graf, and M. Meinecke, "Pedestrian detection for driver assistance using multiresolution infrared vision," IEEE Trans. Veh. Technol., vol. 53, no. 6, pp. 1666–1678, Nov. 2004.
[28] Y. Fang, K. Yamada, Y. Ninomiya, B. Horn, and I. Masaki, "A shape-independent method for pedestrian detection with far-infrared images," IEEE Trans. Veh. Technol., vol. 53, no. 6, pp. 1679–1697, Nov. 2004.
[29] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Dec. 2001, pp. 511–518.
[30] C. Papageorgiou, M. Oren, and T. Poggio, "A general framework for object detection," in Proc. IEEE Int. Conf. Comput. Vis., Jan. 1998, pp. 555–562.
[31] Q. Zhu, C. Yeh, T. Cheng, and S. Avidan, "Fast human detection using a cascade of histograms of oriented gradients," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2006, pp. 1491–1498.
[32] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: A statistical view of boosting," Ann. Stat., vol. 28, no. 2, pp. 337–374, 2000.
[33] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: A new explanation for the effectiveness of voting methods," Ann. Stat., vol. 26, no. 5, pp. 1651–1686, 1998.
[34] R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Mach. Learn., vol. 37, no. 3, pp. 297–336, Dec. 1999.
[35] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley-Interscience, Nov. 2000.
[36] Y. Chen and C. Chen, "Fast human detection using a novel boosted cascading structure with meta stages," IEEE Trans. Image Process., vol. 17, no. 8, pp. 1452–1464, Aug. 2008.
[37] K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 1, pp. 39–51, Jan. 1998.
[38] N. Dalal, "Finding people in images and videos," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, Jul. 2006.

Junfeng Ge (S'09) received the B.Sc. degree in measuring and control technology and instrumentations from the Huazhong University of Science and Technology, Wuhan, China, in 2003. He is currently working toward the Ph.D. degree in control science and engineering with the Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China. His research interests include machine learning and computer vision.

Yupin Luo received the B.Sc. degree from Hunan University, Hunan, China, in 1982 and the M.Sc. and Ph.D. degrees from the Nagoya Institute of Technology, Nagoya, Japan, in 1987 and 1990, respectively. He is currently a Professor with the Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China. His research interests include image processing, computer vision, and pattern recognition.

Gyomei Tei received the B.Sc. and M.Sc. degrees from the Toyohashi University of Technology, Toyohashi, Japan, in 1988 and 1990, respectively. He is currently the President of INF Technologies Ltd., Beijing, China. His research interests include pattern recognition and artificial intelligence.