PhD: McGill University, Montreal, Canada (1989). MS: Technical University of Szczecin, Szczecin, Poland (1980).
We propose two supervised methods for people counting using an overhead fisheye camera. As opposed to standard cameras, fisheye cameras offer a large field of view and, when mounted overhead, reduce occlusions. However, methods developed for standard cameras perform poorly on fisheye images since they do not account for the radial image geometry. Furthermore, no large-scale fisheye-image datasets with radially-aligned bounding box annotations are available for training. We adapt YOLOv3 trained on standard images for people counting in fisheye images. In one method, YOLOv3 is applied to 24 rotated, overlapping windows and the results are post-processed to produce a people count. In another method, YOLOv3 is applied to windows of interest extracted by background subtraction. For evaluation, we collected and annotated an indoor fisheye-image dataset that we make public. Experiments on this dataset show that our methods reduce the people counting MAE of two natural benchmarks by over 60%.
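The first method's windowing can be illustrated with a short sketch. This is not the paper's implementation: it only shows the geometry of splitting a fisheye frame into 24 rotated, overlapping windows so that a standard detector sees upright people. The function names (`window_angles`, `rotate_points`) are illustrative.

```python
import numpy as np

def window_angles(num_windows=24):
    # evenly spaced rotation angles covering the full circle
    # (24 windows -> 15-degree spacing)
    return np.arange(num_windows) * (360.0 / num_windows)

def rotate_points(points, angle_deg, center):
    # rotate 2D points about the fisheye image center so that a
    # radially oriented window becomes upright for a standard detector;
    # detections are mapped back with the inverse rotation
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return (np.asarray(points, float) - center) @ R.T + center
```

After running the detector on each rotated window, the per-window boxes would be rotated back and merged (e.g., by non-maximum suppression) to produce the final count.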
The estimation of human head pose is of interest in some surveillance and human-computer interaction scenarios. Traditionally, this is not a difficult task if high- or even standard-definition video cameras are used. However, such cameras cannot be used in scenarios requiring privacy protection. In this paper, we propose a non-linear regression method for the estimation of human head pose from extremely low resolution images captured by a monocular RGB camera. We evaluate the common histogram of oriented gradients (HoG) feature, propose a new gradient-based feature, and use Support Vector Regression (SVR) to estimate head pose. We evaluate our algorithm on the Biwi Kinect Head Pose Dataset by re-sizing full-resolution RGB images to extremely low resolutions. The results are promising. At 10×10-pixel resolution, we achieve 6.95, 9.92 and 12.88 degree mean-absolute errors (MAE) for roll, yaw and pitch angles, respectively. These errors are very close to state-of-the-art results for full-resolution images.
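A gradient-orientation histogram of the kind used as input to the regressor can be sketched in a few lines. This is a generic HoG-style feature for a single low-resolution patch, not the paper's exact feature; the regression step (SVR) is omitted.

```python
import numpy as np

def grad_orientation_histogram(img, n_bins=9):
    # gradient-based feature for an extremely low-resolution grayscale
    # patch: histogram of unsigned gradient orientations, weighted by
    # gradient magnitude and L1-normalised
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(),
                       minlength=n_bins)
    return hist / (hist.sum() + 1e-8)
```

For head-pose regression, one such feature vector per 10×10 image would be fed to three SVRs, one each for roll, yaw and pitch.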
Unobtrusive monitoring of distances between people indoors is a useful tool in the fight against pandemics. A natural resource for accomplishing this is surveillance cameras. Unlike previous distance estimation methods, we use a single, overhead, fisheye camera with wide area coverage and propose two approaches. One method leverages a geometric model of the fisheye lens, whereas the other uses a neural network to predict the 3D-world distance from people-locations in a fisheye image. For evaluation, we collected a first-of-its-kind dataset, Distance Estimation between People from Overhead Fisheye cameras (DEPOF), using a single fisheye camera, which comprises a wide range of distances between people (1-58 ft) and is publicly available. The algorithms achieve a 20-inch average distance error and 95% accuracy in detecting social-distance violations.
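The geometric approach can be sketched under a simplifying assumption: an equidistant fisheye model (r = f·θ), which is not necessarily the paper's lens model. Each person's image location is back-projected to the floor plane using the known camera height, and the 3D distance is measured on the floor.

```python
import numpy as np

def floor_position(pix, center, f, cam_height):
    # back-project an image point to the floor assuming an equidistant
    # fisheye model r = f * theta (an assumption for this sketch)
    d = np.asarray(pix, float) - np.asarray(center, float)
    r = np.hypot(d[0], d[1])
    theta = r / f                       # angle from the optical axis
    rho = cam_height * np.tan(theta)    # radial floor distance from nadir
    return d / (r + 1e-12) * rho

def person_distance(p1, p2, center, f, cam_height):
    # Euclidean distance between two people on the floor plane
    return float(np.linalg.norm(
        floor_position(p1, center, f, cam_height)
        - floor_position(p2, center, f, cam_height)))
```

The neural-network variant in the paper instead learns this mapping directly from pairs of image locations, avoiding an explicit lens model.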
Person re-identification using rectilinear cameras has been thoroughly researched to date. However, the topic has received little attention for fisheye cameras and the few developed methods are appearance-based. We propose a geometry-based approach to re-identification for overhead fisheye cameras with overlapping fields of view. The main idea is that a person visible in two camera views is uniquely located in the view of one camera given their height and location in the other camera’s view. We develop a height-dependent mathematical relationship between these locations using the unified spherical model for omnidirectional cameras. We also propose a new fisheye-camera calibration method and a novel automated approach to calibration-data collection. Finally, we propose four re-identification algorithms that leverage geometric constraints and demonstrate their excellent accuracy, which vastly exceeds that of a state-of-the-art appearance-based method, on a fisheye-camera dataset we collected.
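The core geometric idea — that a person's height and location in one overhead view pin down their location in another — can be sketched with a simplified equidistant lens model (r = f·θ) instead of the paper's unified spherical model. All function names and parameters here are illustrative.

```python
import numpy as np

def image_to_world(pix, center, f, cam_pos, point_height):
    # back-project an image point to a 3D point at a known height
    # (e.g., the head of a person of known stature), assuming an
    # equidistant model r = f * theta; cam_pos = (x, y, z) overhead
    d = np.asarray(pix, float) - np.asarray(center, float)
    r = np.hypot(d[0], d[1])
    theta = r / f
    rho = (cam_pos[2] - point_height) * np.tan(theta)
    xy = np.asarray(cam_pos[:2], float) + d / (r + 1e-12) * rho
    return np.array([xy[0], xy[1], point_height])

def world_to_image(pt, center, f, cam_pos):
    # forward-project the 3D point into a second overhead camera,
    # giving the predicted location for re-identification
    v = np.asarray(pt, float) - np.asarray(cam_pos, float)
    rho = np.hypot(v[0], v[1])
    theta = np.arctan2(rho, -v[2])      # v[2] < 0 below the camera
    return np.asarray(center, float) + v[:2] / (rho + 1e-12) * (f * theta)
```

Chaining the two functions across two calibrated cameras yields the height-dependent location constraint the re-identification algorithms exploit.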
Depth sensors, such as the Kinect, have predominantly been used as gesture recognition devices. Recent works, however, have proposed using these sensors for user authentication using biometric modalities such as face, speech, gait and gesture. The last of these modalities - gestures, used in the context of full-body and hand-based gestures - is relatively new but has shown promising authentication performance. In this paper, we focus on hand-based gestures that are performed in-air. We present a novel approach to user authentication from such gestures by leveraging a temporal hierarchy of depth-aware silhouette covariances. Further, we investigate the usefulness of shape and depth information in this modality, as well as the importance of hand movement when performing a gesture. By exploiting both shape and depth information, our method attains an average 1.92% Equal Error Rate (EER) on a dataset of 21 users across 4 predefined hand gestures. Our method consistently outperforms related methods on this dataset.
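A minimal sketch of a temporal hierarchy of silhouette covariances follows, assuming a simplified per-pixel feature (x, y, depth) rather than the paper's exact depth-aware features: covariance descriptors are computed over the whole gesture and over progressively finer temporal segments, then concatenated.

```python
import numpy as np

def frame_features(depth, mask):
    # per-pixel features inside the hand silhouette: (x, y, depth) —
    # a simplification of the paper's depth-aware silhouette features
    ys, xs = np.nonzero(mask)
    return np.stack([xs, ys, depth[ys, xs]], axis=1).astype(float)

def hierarchy_descriptor(depths, masks, levels=2):
    # temporal hierarchy: one covariance over the whole gesture, then
    # one per half, etc.; upper triangles are concatenated into a
    # fixed-length descriptor for an authentication classifier
    n = len(depths)
    descs = []
    for lvl in range(levels):
        bounds = np.linspace(0, n, 2 ** lvl + 1).astype(int)
        for a, b in zip(bounds[:-1], bounds[1:]):
            feats = np.vstack([frame_features(depths[i], masks[i])
                               for i in range(a, b)])
            C = np.cov(feats.T)                 # 3x3 covariance
            descs.append(C[np.triu_indices(3)])
    return np.concatenate(descs)
```

Descriptors from an enrollment set and a probe gesture would then be compared with a matrix-aware distance (e.g., log-Euclidean) or a standard classifier.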
Most research on deep learning algorithms for image denoising has focused on signal-independent additive noise. Focused ion beam (FIB) microscopy with direct secondary electron detection has an unusual Neyman Type A (compound Poisson) measurement model, and sample damage poses fundamental challenges in obtaining training data. Model-based estimation is difficult and ineffective because of the nonconvexity of the negative log likelihood. In this paper, we develop deep learning-based denoising methods for FIB micrographs using synthetic training data generated from natural images. To the best of our knowledge, this is the first attempt in the literature to solve this problem with deep learning. Our results show that the proposed methods slightly outperform a total variation-regularized model-based method that requires time-resolved measurements that are not conventionally available. Improvements over methods using conventional measurements and less accurate noise modeling are dramatic - around 10 dB in peak signal-to-noise ratio.
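The synthetic-training-data step rests on sampling the compound Poisson measurement model, which can be sketched as follows. The parameterisation (per-pixel ion dose, secondary-electron yield proportional to clean intensity) is an assumption for illustration; the paper's exact model may differ.

```python
import numpy as np

def neyman_type_a(clean, dose=20.0, eta_max=4.0, rng=None):
    # Neyman Type A (compound Poisson) synthesis per pixel:
    # ion count M ~ Poisson(dose); given M, the detected
    # secondary-electron count is Poisson(M * eta), where the SE
    # yield eta is proportional to the clean intensity in [0, 1]
    rng = np.random.default_rng(rng)
    m = rng.poisson(dose, size=np.shape(clean))
    return rng.poisson(m * eta_max * np.clip(clean, 0.0, 1.0))
```

Applying this to natural images yields (noisy, clean) training pairs without exposing real samples to beam damage; note the variance exceeds the mean, i.e., the noise is overdispersed relative to plain Poisson noise.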
We propose a system for indoor localization using intensity-controllable LED light fixtures and light sensors mounted on the ceiling. While providing accurate location estimates, our approach preserves user privacy and is robust to ambient light conditions. We develop a LASSO algorithm and a localized ridge regression algorithm for locating a single object. In synthetic experiments, our localized ridge regression algorithm achieves an average localization error ranging from 0.24in to 1.39in, for different object sizes, in a 7×12-foot room. The localized ridge regression algorithm also shows the ability to locate multiple objects in experiments with a real-world occupancy scenario.
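The regression step can be sketched with plain ridge regression (a simplification of the paper's localized variant): sensor readings are modeled as linear in a discretised floor-occupancy map, the map is recovered by regularised least squares, and the strongest cell gives the object location.

```python
import numpy as np

def ridge_locate(A, y, lam=1e-2):
    # light-transport model y = A @ x: column j of A holds the change
    # in the ceiling-sensor readings when floor cell j is occupied;
    # recover the occupancy map x by ridge regression and report the
    # strongest cell as the object's location
    n = A.shape[1]
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    return int(np.argmax(x)), x
```

The LASSO variant replaces the L2 penalty with an L1 penalty, encouraging a sparse occupancy map, which suits scenes with few objects.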
This paper presents the outcomes of the PETS2021 challenge held in conjunction with AVSS2021 and sponsored by the EU FOLDOUT project. The challenge comprises the publication of a novel video surveillance dataset on through-foliage detection, the defined challenges addressing person detection and tracking in fragmented occlusion scenarios, and quantitative and qualitative performance evaluation of challenge results submitted by six worldwide participants. The results show that while several detection and tracking methods achieve overall good results, through-foliage detection and tracking remains a challenging task for surveillance systems especially as it serves as the input to behaviour (threat) recognition.
Various coding schemes based on a lifting implementation of the discrete wavelet transform applied along motion trajectories have recently gained a lot of interest in the video processing community as strong candidates to succeed current state-of-the-art hybrid coders. Still, there are a number of very important issues, including the choice of the particular wavelet transform and motion model, that have a significant impact on the overall coding performance and will determine the usefulness of this class of coders. In this paper, we classify and discuss different motion/transform configurations that are used in motion-compensated lifting-based wavelet transforms. Our results show that coder performance changes significantly for different combinations of the motion models and transforms used.
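The basic construction — a temporal Haar wavelet implemented as lifting steps along motion trajectories — can be sketched with a global integer motion vector standing in for a full block-based motion model. Lifting guarantees perfect invertibility regardless of the motion used.

```python
import numpy as np

def shift(img, dy, dx):
    # circular integer translation — a toy stand-in for block-based
    # motion compensation in this sketch
    return np.roll(img, (dy, dx), axis=(0, 1))

def mc_haar_lift(f0, f1, mv):
    # one temporal Haar lifting step along a (global) motion
    # trajectory mv = (dy, dx): predict f1 from motion-compensated f0
    # (high-pass), then update f0 with the residual (low-pass)
    dy, dx = mv
    h = f1 - shift(f0, dy, dx)                # predict step
    l = f0 + 0.5 * shift(h, -dy, -dx)         # update step
    return l, h

def mc_haar_inverse(l, h, mv):
    # undo the lifting steps in reverse order — exact reconstruction
    dy, dx = mv
    f0 = l - 0.5 * shift(h, -dy, -dx)
    f1 = h + shift(f0, dy, dx)
    return f0, f1
```

The configurations compared in the paper vary this template along two axes: the temporal wavelet (e.g., Haar vs. 5/3) and the motion model used inside the predict/update steps.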
This paper presents a highly automated, more accurate approach to High Definition Imaging (HDI) using low signal-to-noise digital videos recorded at ground-based telescopes. The HDI approach involves the acquisition of a video sequence (10^3-10^5 fields) taken through a turbulent atmosphere followed by three-step post-processing. The specific goal is to be able to reproduce expert results, while limiting human interaction, to study both surface features and the atmospheres of planets and moons. The telescopes used here are ...
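The spirit of the post-processing can be sketched with a minimal "lucky imaging" pipeline: score each field for sharpness, keep the best fraction, and stack them. This is an illustration only; the paper's actual three steps include registration, which is omitted here, and the function names are hypothetical.

```python
import numpy as np

def sharpness(frame):
    # simple sharpness score: mean squared discrete Laplacian response
    lap = (-4 * frame
           + np.roll(frame, 1, 0) + np.roll(frame, -1, 0)
           + np.roll(frame, 1, 1) + np.roll(frame, -1, 1))
    return float(np.mean(lap ** 2))

def select_and_stack(frames, keep=0.1):
    # keep the sharpest fraction of fields (least blurred by seeing)
    # and average them to suppress noise
    scores = np.array([sharpness(f) for f in frames])
    k = max(1, int(len(frames) * keep))
    best = np.argsort(scores)[-k:]
    return np.mean([frames[i] for i in best], axis=0)
```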
Visual surveillance applications such as object identification, object tracking, and anomaly detection require reliable motion detection as an initial processing step. Such detection is often accomplished by means of background subtraction, which can be as simple as thresholding of the intensity difference between a movement-free background and the current frame. However, more effective background subtraction methods employ probabilistic modeling of the background followed by probability thresholding. In this case, ...
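The probabilistic variant described above can be sketched with a per-pixel Gaussian background model: each pixel's mean and standard deviation are estimated from movement-free frames, and a pixel is flagged as moving when its standardised deviation from the model exceeds a threshold.

```python
import numpy as np

def fit_background(frames):
    # per-pixel Gaussian background model from movement-free frames
    stack = np.stack(frames).astype(float)
    return stack.mean(axis=0), stack.std(axis=0) + 1e-6

def detect_motion(frame, mu, sigma, thresh=3.0):
    # flag pixels whose likelihood under the background model is low,
    # i.e., whose standardised deviation exceeds the threshold —
    # probability thresholding for a Gaussian model
    return np.abs(frame - mu) / sigma > thresh
```

More elaborate models (e.g., mixtures of Gaussians per pixel) follow the same fit-then-threshold pattern while coping with multimodal backgrounds.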
