Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2020
The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways ...
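The core idea of converting video into virtual IMU streams can be illustrated by double numerical differentiation of a tracked 3-D joint trajectory. The sketch below is a simplified assumption of that step only; the full IMUTube pipeline additionally handles pose estimation, 3-D lifting, sensor orientation, calibration, and noise modeling, none of which are shown here. The function name and the gravity-handling shortcut are illustrative, not taken from the paper.

```python
import numpy as np

def virtual_accelerometry(positions, fs=30.0, gravity=(0.0, -9.81, 0.0)):
    """Turn a 3-D joint trajectory (T x 3, in metres) into a virtual
    accelerometer stream by differentiating position twice.

    Simplified illustration only: a real video-to-IMU pipeline also
    rotates readings into the sensor frame and filters tracking noise.
    """
    dt = 1.0 / fs
    # First derivative gives velocity (m/s), second gives acceleration (m/s^2).
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    # Approximation: add gravity in the world frame instead of the sensor frame.
    return acceleration + np.asarray(gravity)

# Example: a wrist oscillating sinusoidally along x at 1 Hz, amplitude 0.1 m,
# sampled at 30 Hz for 2 seconds.
t = np.arange(0, 2, 1 / 30.0)
pos = np.stack([0.1 * np.sin(2 * np.pi * t),
                np.zeros_like(t),
                np.zeros_like(t)], axis=1)
acc = virtual_accelerometry(pos)
print(acc.shape)  # (60, 3)
```

For this motion the peak virtual acceleration along x is roughly 0.1 · (2π)² ≈ 3.9 m/s², which matches the analytic second derivative of the sine trajectory.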
and quantization. In this paper, we present a data-driven approach for refining degraded RAW depth maps that are coupled with an RGB image. The key idea of our approach is to take advantage of a training set of high-quality depth data and transfer its information to the RAW depth map through multi-scale dictionary learning. Utilizing a sparse representation, our method learns a dictionary of geometric primitives that captures the correlation between high-quality mesh data, RAW depth maps, and RGB images. The dictionary is learned and applied in a manner that accounts for various practical issues that arise in dictionary-based depth refinement. Compared to previous approaches that only utilize the correlation between RAW depth maps and RGB images, our method produces improved depth maps without over-smoothing. Since our approach is data-driven, the refinement can be targeted to a specific class of objects by employing a corresponding training set. In our experiments, we show that this leads to additional improvements in recovering depth maps of human faces.
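The joint sparse-coding step behind this style of refinement can be sketched as follows. A dictionary is learned over concatenated (clean, degraded) patch pairs; at test time a degraded patch is sparse-coded against the degraded half of the dictionary, and a refined patch is synthesized from the clean half using the same codes. This toy sketch uses synthetic random patches and omits the multi-scale structure and RGB guidance of the actual method; all data and parameter choices here are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import orthogonal_mp

# Synthetic stand-in data: 200 flattened 4x4 depth patches, with the
# "degraded" version produced by additive noise.
rng = np.random.default_rng(0)
n_patches, patch_dim = 200, 16
clean = rng.normal(size=(n_patches, patch_dim))
degraded = clean + 0.3 * rng.normal(size=clean.shape)

# Learn one joint dictionary over paired (clean, degraded) patches so the
# two halves of each atom stay in correspondence.
joint = np.hstack([clean, degraded])
dl = DictionaryLearning(n_components=32, transform_algorithm="omp",
                        transform_n_nonzero_coefs=4, random_state=0)
dl.fit(joint)
D_clean = dl.components_[:, :patch_dim]      # clean half of each atom
D_degraded = dl.components_[:, patch_dim:]   # degraded half of each atom

# Refinement: sparse-code degraded patches with the degraded dictionary half,
# then reconstruct with the clean half using the same sparse codes.
codes = orthogonal_mp(D_degraded.T, degraded.T, n_nonzero_coefs=4).T
refined = codes @ D_clean
print(refined.shape)  # (200, 16)
```

The design point illustrated here is why the dictionary is learned jointly: because each atom contains a matched clean/degraded pair, codes inferred from degraded observations directly index the corresponding high-quality geometry.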