
2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)

Interpretable Detection of Distribution Shifts in Learning Enabled Cyber-Physical Systems

Yahan Yang (yangy96@seas.upenn.edu), University of Pennsylvania, Philadelphia, Pennsylvania, USA
Ramneet Kaur (ramneetk@seas.upenn.edu), University of Pennsylvania, Philadelphia, Pennsylvania, USA
Souradeep Dutta (duttaso@seas.upenn.edu), University of Pennsylvania, Philadelphia, Pennsylvania, USA
Insup Lee (lee@cis.upenn.edu), University of Pennsylvania, Philadelphia, Pennsylvania, USA

ABSTRACT
The use of learning based components in cyber-physical systems (CPS) has created a gamut of possible avenues to use high dimensional real world signals generated from sensors like camera and LiDAR. The ability to process such signals can be largely attributed to the adoption of high-capacity function approximators like deep neural networks. However, this does not come without its potential perils. The pitfalls arise from possible over-fitting, and subsequent unsafe behavior when exposed to unknown environments. One challenge is that, in high dimensional input spaces, it is almost impossible to experience enough training data in the design phase. What is required here is an efficient way to flag out-of-distribution (OOD) samples that is precise enough to not raise too many false alarms. In addition, the system needs to be able to detect these in a computationally efficient manner at runtime. In this paper, our proposal is to build good representations for in-distribution data. We introduce the idea of a memory bank to store prototypical samples from the input space. We use these memories to compute probability density estimates using kernel density estimation techniques. We evaluate our technique on two challenging scenarios: a self-driving car setting implemented inside the simulator CARLA with image inputs, and an autonomous racing car navigation setting with LiDAR inputs. In both settings, we observed that a deviation from the in-distribution setting can potentially lead to a deviation from safe behavior. An added benefit of using training samples as memories to detect out-of-distribution inputs is that the system is interpretable to a human operator. Explanations of this nature are generally hard to obtain from pure deep learning based alternatives. Our code for reproducing the experiments is available at https://github.com/yangy96/interpretable_ood_detection.git

KEYWORDS
anomalous inputs, autonomy, safety, vision, perception systems

1 INTRODUCTION
Learning-enabled components (LEC) are being increasingly used in modern cyber-physical systems (CPS), especially self-driving cars. Designing such systems typically relies on data-driven techniques to achieve the desired performance. An LEC learns to operate by having access to a large corpus of human-labelled input-output data during the design phase. For instance, a learning-enabled component in a modern car [2] could be performing a simple lane keeping function from the video feed it receives from the camera. The other end of the spectrum would be a fully self-driving car equipped with an automated driving system (ADS). Here, the software is in full control of the car, and is capable of making all decisions concerning navigation and maneuvering of the vehicle. This has the potential to dramatically reduce accidents and improve general vehicle safety. Additionally, such technologies have the potential to bolster the independence and mobility of seniors and those who cannot drive.

Deep Neural Networks (DNNs) perform most of the heavy lifting in LECs with rich sensors like camera and LiDAR. DNNs are high-capacity function approximators, and form a fundamental building block for most machine learning applications. The downside is that DNNs do not learn to interpret an image input the way humans do. Learning largely happens through an effort to fit a high-capacity function to the training data by reducing its classification error rate. DNNs are complex computation graphs and can potentially have millions of nodes and parameters. This makes them particularly difficult for a human expert to analyze. What this translates to is that computer vision systems are prone to errors in a way that people are not. For instance, it is fairly easy to come up with imperceptible changes to an input image that can fool the ADS. It can be shown that it is fairly simple to change the number on a speed limit sign, or even change a stop sign to a speed limit sign [24].

The challenge with most LECs is that the input space is necessarily insufficiently sampled. That is, during the design phase the system does not see all the training samples in the vicinity of the operating region of the system. Hence it is almost impossible to provide rigorous guarantees on the operating limits at design time. This is where efficient runtime monitoring techniques can play a significant role in ensuring safe operation of the system. Neural networks, due to the nature of their training algorithms, do not perform well when pushed outside of their training zone. Even though statistical machine learning tools can provide upper bounds on the test error rate of a learnt model, they are often too conservative and do not capture a realistic error rate.

Detecting out-of-distribution (OOD) [16, 23] samples has gained widespread popularity in recent years to counter the fragility of DNNs when operating in unknown environments. However, such detection cannot be extended easily to a CPS setting, and can have high false alarm rates. The main contribution of this paper is a novel technique to detect anomalous inputs in real time. The idea is to build a representation system which captures the essence of in-distribution data, using only a few samples from the training set. These prototype samples are referred to as memories in our paper. We use well established tools from the computer vision literature and combine them with kernel density estimation techniques to compute probability density estimates. In order to improve the robustness of detecting distribution shifts, we implement a sliding window based technique to flag an alarm.

Additionally, our technique can provide interpretability using samples from the data set. A widely accepted mode of explanation for machine learning systems is in the form of comparisons drawn between a test sample and a witness from the data set [10, 30]. In a similar fashion, we can provide explanations when a sample is detected either as in-distribution or out-of-distribution. In our experiments we consider two broad sets of case studies, one involving video inputs from a camera, and the other involving LiDAR inputs. For video inputs the application is an advanced emergency braking system, and an end-to-end self-driving system introduced in [9]. We consider different sources of distribution shift such as a shift from the training weather (low precipitation), lighting conditions (day), leading obstacles (car), and clean (or non-adversarial) images. In the context of LiDAR inputs, such anomalous inputs come in the form of light rays getting reflected by different surfaces in a real setting. This pushes the network outside the trusted zone, causing deviation from safe behavior. In both cases, we were able to achieve good OOD detection, which could save the system from a crash.

2 RELATED WORK
OOD detection has been extensively studied in classification problem settings for standalone learning enabled components [16, 19, 20, 23, 32, 36]. These approaches use differences in either the geometrical or the statistical properties of the in-distribution and OOD data for detecting a shift in the model behavior. OOD detection through envelopes, in CPS with low-dimensional input space sensors such as GPS, has been studied in the past [33]. Recently, there has been growing interest in the detection of OODs in closed-loop CPS using high-dimensional sensors like camera [9, 13, 27, 31].

Cai and Koutsoukos [9] propose using the reconstruction error of a variational autoencoder (VAE) on the input image (or frame) as a non-conformity score in the inductive conformal anomaly detection (ICAD) framework [21] for detection of OOD frames. They further apply the martingale test [34] along with the cumulative sum procedure (CUSUM) [6] over a window of past and present predictions for robust detection of OOD traces. Ramakrishna et al. [27] use the KL-divergence between the disentangled feature space of a β-VAE and the normal distribution as the non-conformity score in ICAD for OOD detection of a single frame. They also use the martingale test along with CUSUM for detecting OOD traces. Similarly, Sundar et al. [31] also propose using the KL-divergence in the latent space of a β-VAE for detection of OOD frames. Feng et al. [13] propose using the KL-divergence in the horizontal and vertical latent sub-spaces of a 3D convolutional VAE from the specified prior for detection of OOD traces. The input to the 3D convolutional VAE is a sequence of frames (or the trace to be detected).

To the best of our knowledge, all the existing approaches for OOD detection in CPS with LECs are tied to VAEs. Either the reconstruction error of the VAE on the input image or the KL-divergence in the latent space of the VAE is used for OOD detection in these approaches. Training VAEs often requires careful manual tuning [5], and the quality of the training decides the efficacy of the downstream processes. Here we set ourselves apart by not having to depend on a well functioning VAE. Also, unlike our approach, none of the existing approaches except for Ramakrishna et al. [27]'s provides interpretability on the source of OOD-ness of the input. We show that our approach can be extended to the case of LiDAR inputs as well without any conceptual modification.
3 MOTIVATION AND PROBLEM STATEMENT

Figure 1: Deviation from training data leads to a crash with a biker as the front object. Training data only had cars as front objects. Our proposed method could detect deviations from in-distribution data for detecting such OODs. (a) Car doesn't detect the biker, leading to a crash. (b) Frame detected as OOD; the region in red shows the pixels responsible for the deviation.

As mentioned before, learning enabled components have the ability to bolster the level of autonomy a cyber-physical system has to offer. Detection of OOD inputs is one of the ways we can safeguard systems from unwarranted behavior. In Figure 1a we show an example of a setting where the car is running an advanced emergency braking system controller. The controller uses the system states and the video feed from the camera to sense the position of the closest leading object on the road. The controller's job is to automatically brake the car if it crosses a certain distance threshold from the leading vehicle. What we observe is that because during training the DNN experienced just cars, it never learnt to react to bikes on the road. What happens next is that the DNN completely misjudges a bike in the video, and ends up causing an accident. What we would like to target here is a method to detect such a shift in distribution.

Problem Formulation: In this paper, we would like to solve the problem of being able to alarm the system about distribution shifts in real-time. It is extremely challenging to sample a high-dimensional input space in an exhaustive fashion. This means careful analysis of the training time in-distribution data is required to come up with an effective detector which can act in real time. Additionally, it is desirable that such an alarm system produces interpretable behavior. It is often the case that DNNs, due to their black-box nature, do not offer an explanation for their decisions. Here, we would like to take up the challenge of being able to point to an explanation when samples are in-distribution or out-of-distribution. We demonstrate this in Figure 1b: the system not only flags the image with the biker ahead as OOD, but also selects a set of pixels demarcating the biker to communicate why it decided to label it as an OOD.

4 PRELIMINARIES

4.1 Clustering with Medoids
Similar to k-means clustering, we wish to form partitions of the data into distinct groups or clusters. Clustering with k-means is a well known tool, but it has its challenges when used in the context of images. An issue with k-means is that it can potentially produce virtual cluster centers which are absent in the original data set. This is essentially because a simple mean of two (or more) images might not correspond to a real image. The other issue is that it is often susceptible to outliers in the data. Hence, we restrict ourselves to partitioning around points which are present in the data. The algorithm which achieves this is PAM [3], which is short for Partitioning Around Medoids. Intuitively, the algorithm searches for centrally located objects called medoids, which are used to define the cluster boundaries in a nearest-medoid sense.

Let us assume that the set S is equipped with a distance metric D : (s_1, s_2) -> R, for s_i in S and n = |S|. Given a data set S, PAM tries to select a set of r medoids M_r : {m_1, m_2, ..., m_r} such that the following cost is minimized,

Cost(M_r) = \sum_{i=1}^{n} \min_{m_j \in M_r} D(m_j, s_i)    (1)

We assume that the inner minimization is always possible, and that we are able to break ties arbitrarily among distinct members of the set S.

Algorithms: The challenge with PAM is that the naive implementation has a runtime complexity of O(n^2 r^2) [28]. Even though faster variants exist, they are still largely inaccessible for applications at the scale of image data sets generated from autonomous driving scenarios. In order to circumvent this challenge we introduce a variant of the Clustering Large Applications based upon Randomized Search (CLARANS) [25] algorithm in Section 5.2. It combines randomized global search with a local cluster improvement method to improve the quality of clustering. The medoids identified by minimizing the objective in Equation 1 are referred to as memories from here on.
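For concreteness, the partitioning cost of Equation 1 can be evaluated in a few lines. The sketch below is our own illustration rather than the paper's released code, and it assumes an arbitrary distance function dist, for example the SSIM based metric introduced later in Section 4.3.

def compute_cost(data, medoids, dist):
    # PAM objective of Equation 1: each point is charged the distance to its
    # closest medoid, and the charges are summed over the data set.
    return sum(min(dist(m, s) for m in medoids) for s in data)

def assign_to_medoids(data, medoids, dist):
    # For each point, the index of its nearest medoid (ties broken by taking
    # the first minimizer), i.e. the cluster assignment induced by the medoids.
    return [min(range(len(medoids)), key=lambda j: dist(medoids[j], s))
            for s in data]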
4.2 Kernel Density Estimation
Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Let us assume that the set S = (s_1, s_2, ..., s_n) is independently and identically drawn from a fixed but unknown distribution f, and that we wish to estimate the probability density at an element x. According to kernel density estimation methods, it is given by the following equation,

\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(|x - s_i|)    (2)

where K is the kernel function and h is a smoothing term. The kernel function measures the influence of sample s_i on the query point x. The choice of kernel, smoothing parameter, and distance function influences the quality of the estimates that one obtains from this method. We refer the reader to [29] for further details on KDE.

4.2.1 Abstract Kernel Centers. Standard KDE is often difficult to evaluate at runtime, placing demands on both memory requirements and computational efficiency. To handle this, we use the concept of abstract kernel centers introduced in [26]. Instead of using the full data set, the idea is to use a smaller representative set to estimate the probability density. In order to obtain useful density estimates, the first step is to identify centrally located data points called kernel centers, which are able to capture the distribution closely. Let us denote these kernel centers by the set M_r : (m_1, m_2, ..., m_r). The density estimates are computed using the k-nearest-neighbors (kNN) of the test point x in the set M_r. Since the kernel centers are identified using the partitioning algorithm mentioned above, we can alternately call them the k nearest memories as well. Suppose this set of k nearest memories is M_k, a subset of M_r. The weighted kernel density estimate [15] is given by,

\hat{f}_A(x) = \sum_{j=1}^{k} w_j K_h(|x - m_j|)    (3)

w_j = (# points with closest memory m_j) / (# points whose closest memory is in M_k)

Thus, the data partitions built around the memories can now be used to compute the density estimation function \hat{f}_A. In this paper we use the Epanechnikov kernel, since this kernel permits us to demarcate a boundary around the center beyond which the distance function stops being meaningful. The bandwidth parameter h can be chosen depending on the application. Note that for us the kernel centers are not abstract, as compared to [26], but we keep the terminology to be coherent with the original idea.
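To make Equations 2 and 3 concrete, here is a minimal sketch of the weighted kernel density estimate with an Epanechnikov kernel. It assumes dist is the SSIM based metric of Section 4.3 and that counts[j] holds the number of training points whose closest memory is memories[j] (the per-memory counts tracked by the memory system in Section 5); all names are our own illustrative choices rather than the released implementation.

import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel: non-zero only inside the unit ball, so memories
    # farther than the bandwidth contribute nothing to the estimate.
    return 0.75 * (1.0 - u**2) if abs(u) < 1.0 else 0.0

def weighted_kde(x, memories, counts, dist, h, k=5):
    # Weighted KDE of Equation 3 over the k nearest memories M_k.
    d = np.array([dist(x, m) for m in memories])
    nearest = np.argsort(d)[:k]                              # indices of M_k
    weights = np.array([counts[j] for j in nearest], dtype=float)
    weights /= weights.sum()                                 # w_j of Equation 3
    return float(sum(w * epanechnikov(d[j] / h) / h
                     for w, j in zip(weights, nearest)))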
4.3 Structural Similarity Index Metric
A fundamental challenge in dealing with images is to capture human perceptual similarity with a mathematically meaningful distance function. Techniques like KDE necessitate the use of a distance metric to compute the probability density. To the best of our knowledge, the right candidate for this purpose is the structural similarity index metric (SSIM). This was first introduced in [35], and has gained widespread popularity. It computes the degree to which two images are similar to a human eye, and was originally used to quantify the degradation in image quality. The SSIM metric is designed to capture statistical similarity between images. This makes our system more robust to random noise in comparison to vanilla DNNs. It has also been used to capture image similarity for adversarial sticker attacks [22].

We state the original SSIM formulation next. Assume we have two images A_1 in R^N and A_2 in R^N. This allows us to compute three terms: a luminance distortion term, a contrast distortion term, and a correlation term.

l(A_1, A_2) = \frac{2 \bar{A}_1 \bar{A}_2 + c_1}{\bar{A}_1^2 + \bar{A}_2^2 + c_1}    (4)

c(A_1, A_2) = \frac{2 s_{A_1} s_{A_2} + c_2}{s_{A_1}^2 + s_{A_2}^2 + c_2}    (5)

s(A_1, A_2) = \frac{s_{A_1,A_2} + c_3}{s_{A_1} s_{A_2} + c_3}    (6)

where \bar{A}_1, \bar{A}_2, s_{A_1}^2, s_{A_2}^2 and s_{A_1,A_2} are the local means, local variances, and local covariance of A_1 and A_2. The scalar terms c_1, c_2, c_3 aim to capture the saturation effects of the visual system, and provide numerical stability. The terms computed above capture the local difference in some chosen window of the image. The combination across all such local windows gives the SSIM index. With c_3 = c_2 / 2, the SSIM index can be written in the following form:

SSIM(A_1, A_2) = S_1(A_1, A_2) S_2(A_1, A_2)
S_1(A_1, A_2) = l(A_1, A_2)    (7)
S_2(A_1, A_2) = c(A_1, A_2) s(A_1, A_2)

SSIM can be implemented efficiently in tools like PyTorch [1] and accelerated using a GPU. This permits a scalable and efficient implementation inside our OOD detection framework. On the downside, SSIM does not have the mathematical properties of a distance metric. But with some modifications it can be turned into one. The details of this modification and the associated proof can be found in [8]. We use the modified SSIM to define a distance metric D(A_1, A_2) in this paper, and the hyperparameters (e.g. c_1, c_2, c_3) are set the same as in [1]. The use of a proper distance metric for images allows us to compute the probability density function, and subsequently distribution shifts, in a more meaningful way.
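The snippet below sketches one way to obtain such a distance with the pytorch-msssim package cited as [1]. The square-root transformation shown here is a common way to turn a similarity score into a dissimilarity; the paper relies on the metric modification of [8], which may differ in detail, so treat this as an illustrative stand-in.

import torch
from pytorch_msssim import ssim  # package cited as [1]

def ssim_distance(img_a: torch.Tensor, img_b: torch.Tensor) -> float:
    # Distance between two image tensors of shape (1, C, H, W) with values in
    # [0, 1]. sqrt(1 - SSIM) is one common dissimilarity; the paper uses the
    # modification of [8].
    score = ssim(img_a, img_b, data_range=1.0, size_average=True)
    return float(torch.sqrt(torch.clamp(1.0 - score, min=0.0)))

This distance is what plugs into the density estimate of Section 4.2.1 and the memory selection algorithms of Section 5.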
next is a combination of 𝐺𝑙𝑜𝑏𝑎𝑙 resets and 𝐿𝑜𝑐𝑎𝑙 minimization to
5 METHODOLOGY approximate the optimal choice.
5.1 Initializing the Memory Set
Algorithm 2 Generate Memories
Input: S : {𝑠 1, 𝑠 2, . . . , 𝑠𝑛 }
Algorithm 1 Generate Initial Memories
Output: Memories 𝑀 : {𝑚 1, 𝑚 2, 𝑚 3, . . . , 𝑚𝑟 }
Input: Data set S : {𝑠 1, 𝑠 2, . . . , 𝑠𝑛 } Parameter : (Max Global Steps : 𝑍𝑔 , Max Local Steps : 𝑍𝑙 ,
Output: Memories 𝑀 : {𝑚 1, 𝑚 2, 𝑚 3, . . . , 𝑚𝑟 } Distance Threshold 𝑑)
Parameter : Distance Threshold 𝑑 1: BestCost = ∞
1: 𝑀 = 𝜙 2: for 1 ≤ 𝑔 ≤ 𝑍𝑔 do
2: RejectedSet = S 3: Memory Set 𝑀 = GenerateInitialMemories(𝑆, 𝑑)
3: while RejectedSet ≠ 𝜙 do 4: 𝑣 = FindNode(𝑀, G)
4: 𝑠𝑚 = pickRandomPoint(RejectedSet) 5: G = CreateGraph(S, |𝑀 |) ⊲ The memory search graph
5: for 𝑠𝑖 ∈ RejectedSet do 6: CurrentCost = ComputeCost(𝑣)
6: if D (𝑠𝑚 , 𝑠𝑖 ) < 𝑑 then 7: for 1 ≤ 𝑙 ≤ 𝑍𝑙 do
7: RejectedSet = RejectedSet \𝑠𝑖 8: 𝑣 ′ = PickNeighbor(𝑣, G)
8: 𝑀 = 𝑀 ∪ {𝑠𝑚 } 9: NewCost = ComputeCost(𝑣 ′ )
9: return 𝑀 10: if NewCost < CurrentCost then
11: 𝑣 ← 𝑣′
12: CurrentCost ← NewCost
The intuition here is that high dimensional data like images, and 13: if CurrentCost < BestCost then
LiDAR scans generated from a real world setting, cluster well in 14: BestCost = CurrentCost
practice. Hence, the first step is to identify these broad categories in
15: return 𝑀
a quick and efficient fashion. One of the questions however is that
the number of partitions to be made is often not known apriori. But
drawing on the intuitions from an image distance metric, only small Algorithm 2 picks the eventual memories used in OOD detection.
enough distances have perceptual meaning. Thus, the intuition here Similar to standard CLARANS algorithm each node in G has 𝑟 (𝑛 −𝑟 )
is to populate the input space densely enough with memories such neighbors, where 𝑟 is the number of memories. Which can be quite
that, every training point is within a threshold distance 𝑑 of some large given the scale of modern machine learning data sets with
memory. Algorithm 1 summarizes our approach. We pick a data large 𝑛. What we do here is that start with a reasonable choice
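A minimal Python rendering of Algorithm 1 is shown below, assuming dist is the SSIM based metric D; the function and variable names are ours rather than those of the released implementation.

import random

def generate_initial_memories(data, dist, d):
    # Algorithm 1: greedily cover the data set so that every point ends up
    # within distance d of some memory.
    memories = []
    rejected = list(data)
    while rejected:
        seed = random.choice(rejected)
        # one linear pass: everything within d of the seed is now covered
        rejected = [s for s in rejected if dist(seed, s) >= d]
        memories.append(seed)
    return memories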

228
5.2 Learning Memories
To restate, we are given a dataset S with n elements, and we wish to compute an r-size memory set M = {m_1, m_2, ..., m_r} with certain desirable properties. The search for memories can be simplified by viewing it as a search through a graph G [25] with subsets S_r of S as its nodes. Each subset of size r defines a choice for the memory set M.

Definition 5.1 (Memory Search Graph G). The undirected graph G is represented by an ordered pair (V, E). The set of nodes V is the collection of subsets S_r of the original dataset S. An edge e in E exists between two nodes S_r^1 and S_r^2 iff |S_r^1 intersect S_r^2| = r - 1. That is, they differ in exactly one memory.

Each node of the graph has an associated cost given by Equation 1. Hence, starting from some node, it is possible to visit neighboring nodes with decreasing costs in the search process. What we present next is a combination of global resets and local minimization to approximate the optimal choice.

Algorithm 2 Generate Memories
Input: S : {s_1, s_2, ..., s_n}
Output: Memories M : {m_1, m_2, m_3, ..., m_r}
Parameter: (Max Global Steps: Z_g, Max Local Steps: Z_l, Distance Threshold d)
1: BestCost = infinity
2: for 1 <= g <= Z_g do
3:   Memory Set M = GenerateInitialMemories(S, d)
4:   v = FindNode(M, G)
5:   G = CreateGraph(S, |M|)   # the memory search graph
6:   CurrentCost = ComputeCost(v)
7:   for 1 <= l <= Z_l do
8:     v' = PickNeighbor(v, G)
9:     NewCost = ComputeCost(v')
10:    if NewCost < CurrentCost then
11:      v <- v'
12:      CurrentCost <- NewCost
13:  if CurrentCost < BestCost then
14:    BestCost = CurrentCost
15: return M

Figure 2: This figure summarizes our approach. The memorization phase of the algorithm picks prototypical samples as memories. At run-time the algorithm computes kernel density estimates to assess the likelihood of a new data point being in-distribution.

Algorithm 2 picks the eventual memories used in OOD detection. Similar to the standard CLARANS algorithm, each node in G has r(n - r) neighbors, where r is the number of memories. This can be quite large given the scale of modern machine learning data sets with large n. What we do here is start with a reasonable choice for the initial node in G, and greedily look for local improvements for a fixed number of iterations. The global search starts by using Algorithm 1 to generate the initial set of memories as node v in G. Notice that we do not choose the number of memories apriori; it instead gets picked as a consequence of the distance score d. The partitioning cost for the choice of memories is computed by the function ComputeCost, which evaluates Equation 1. Note that this can be expensive, since it needs a total of r x n distance computations. The local search (Lines 7-12) implements a greedy strategy to pick the neighboring node which produces a descent. The outer loop of the algorithm keeps track of the node with the minimum cost for each such reset produced in line 3. Algorithm 2 trivially terminates, since each search proceeds for a fixed number of steps.
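The following sketch shows this global-reset plus local-descent loop. A neighbor is obtained by swapping a single memory for a non-memory point, which is the CLARANS-style adjacency of Definition 5.1, so the memory search graph is never materialized explicitly. It reuses generate_initial_memories from the sketch in Section 5.1; everything else is our own illustration, not the released code.

import random

def generate_memories(data, dist, d, max_global=5, max_local=50):
    # Algorithm 2 (sketch): random restarts via Algorithm 1, each followed by a
    # greedy local descent on the partitioning cost of Equation 1. The search
    # works on indices into `data`, so samples themselves need not be hashable.
    def cost(mem_idx):
        return sum(min(dist(data[j], s) for j in mem_idx) for s in data)

    best_cost, best_idx = float("inf"), None
    for _ in range(max_global):
        # global reset: warm start from Algorithm 1
        mem_idx = generate_initial_memories(list(range(len(data))),
                                            lambda i, j: dist(data[i], data[j]), d)
        current = cost(mem_idx)
        for _ in range(max_local):
            non_members = [i for i in range(len(data)) if i not in mem_idx]
            if not non_members:
                break
            # neighbor in the memory search graph: swap one memory for a non-memory
            candidate = list(mem_idx)
            candidate[random.randrange(len(candidate))] = random.choice(non_members)
            new_cost = cost(candidate)
            if new_cost < current:
                mem_idx, current = candidate, new_cost
        if current < best_cost:
            best_cost, best_idx = current, mem_idx
    return [data[i] for i in best_idx]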
Definition 5.2 (Memory System M_S). A memory system is a collection of pairs M_S := {(m_1, q_1), (m_2, q_2), ..., (m_r, q_r)}, where

q_i = |Q_{m_i}|,   Q_{m_i} = { s \in S : m_i = \arg\min_{m_j \in M} D(s, m_j) }    (8)

Thus the memory system M_S keeps track, for every memory, of the number of points which belong to the cluster defined by it. The OOD detection algorithm which follows uses these memories as abstract kernel centers in Equation 3 to compute the probability density at a test point.
5.3 Scaling Memory Search
In order to compute the probability density estimates given by Equation 3, we need to do a linear time search through the current set of memories in M_S. Even though the number of memories produced in Algorithm 2 might be small compared to the full data set S, a search through the list of memories might still be challenging. To remedy this potential drawback we deploy a simple hashing technique first introduced in [14]. The distance metric D discussed in Section 4.3 is a proper distance metric, which implies that the distance function respects the triangle inequality. In what follows, we describe a possible avenue to speed up the search for the k nearest memories. The intuition is that, for sufficiently different memories, computing a single distance pair can be used to reject other memories from further consideration.

We are interested in computing the nearest neighbor, that is k = 1, in the set M_S for a test point x_t. Assume that we wish to compute the distance between the test point x_t and some memory m_j, and that the distance D(x_t, m_i) to another memory m_i is already known. Then, in the triangle formed by the triplet (m_i, x_t, m_j), the following two inequalities hold:

D(m_i, x_t) - D(m_i, m_j) <= D(m_j, x_t)

and

D(m_i, m_j) - D(m_i, x_t) <= D(m_j, x_t)

This means that D(m_j, x_t) is lower bounded by |D(m_i, m_j) - D(m_i, x_t)|. Hence, if we are only interested in memories that are within a certain threshold (say h) of x_t, we do not actually need to compute the distance D(m_j, x_t) if the following condition holds:

|D(m_i, m_j) - D(m_i, x_t)| > h    (9)

For each memory m_i, we can pre-compute a look-up table of the inter-memory distances Q : {(m_j, D(m_i, m_j)) | 1 <= j <= r, j != i}. This can lead to a reduction of the search space in practice, by pruning out memories from further consideration each time the distance of a memory from x_t gets measured. For k > 1, similar reasoning holds. The only difference is that in this case the search algorithm tracks the distance of the k-th nearest memory found so far, i.e., the furthest of the current k candidates from the test point.
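A sketch of this pruning for the k = 1 case is given below. The precomputed pairwise table and the early-rejection test of Equation 9 are the essential ingredients; the data layout and the names are illustrative assumptions of ours.

def nearest_memory_pruned(x, memories, pairwise, dist, h):
    # Find the closest memory to x, skipping memories that Equation 9 proves
    # are farther than h from x. pairwise[i][j] holds the precomputed D(m_i, m_j).
    known = {}                      # distances D(m_i, x) computed so far
    best_j, best_d = None, float("inf")
    for j in range(len(memories)):
        # reject memory j if any already-computed distance gives a bound > h
        if any(abs(pairwise[i][j] - d_i) > h for i, d_i in known.items()):
            continue
        d_j = dist(memories[j], x)
        known[j] = d_j
        if d_j < best_d:
            best_j, best_d = j, d_j
    return best_j, best_d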

5.4 Detecting Distribution Shifts
To summarize, we know how to go from the set of training data S to the set of memories M_S. This happens through a smart initialization of the set of memories (Algorithm 1), followed by a further refinement using the medoid based partitioning technique of Algorithm 2. Additionally, to handle any potential slow downs, we briefly discussed how one can use the inter-memory distances to prune out large parts of the search space, thus allowing the system to scale to larger memory systems. We are now at a stage to discuss our runtime algorithm for detecting distribution shifts.

Algorithm 3 Detect Distribution Shifts
Input: Time Series Data x_t, Memory System M_S
Output: Distribution Shift Flags F_t
Parameter: Window Threshold tau, Window W, Probability Threshold alpha
1: Frame = {}
2: for 1 <= t <= infinity do
3:   Flag_OOD = DetectOOD(M_S, x_t, alpha)
4:   Frame <- UpdateFrame(Flag_OOD, W, Frame)
5:   F_t <- CountOOD(Frame) >= tau
6: return F_t

In practical scenarios, detecting a shift in distribution needs a robust mechanism. We achieve this using a sliding window based implementation that tracks the number of out-of-distribution samples the system sees. Algorithm 3 summarizes our approach. For each input sample at runtime, the function DetectOOD simply computes the probability estimate for the test point x_t from the memory system M_S using Equation 3. Additionally, it compares this probability density with a threshold alpha to flag a sample as OOD. The function UpdateFrame keeps track of the OOD flags in the last W frames. The algorithm outputs a distribution shift once this count crosses the threshold tau.
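A self-contained sketch of Algorithm 3 is given below. The per-frame density estimate is supplied as a callable (for instance the weighted KDE of Equation 3), and the class name and interface are our own choices rather than the paper's released API.

from collections import deque

class DistributionShiftDetector:
    # Sketch of Algorithm 3: per-frame OOD flags from a density estimate,
    # aggregated over a sliding window of the last W frames.
    def __init__(self, density_fn, alpha, window=5, tau=5):
        self.density_fn = density_fn    # e.g. the weighted KDE of Equation 3
        self.alpha = alpha              # probability threshold
        self.tau = tau                  # window threshold
        self.frame = deque(maxlen=window)

    def step(self, x_t):
        # DetectOOD: flag the frame if its density falls below alpha
        self.frame.append(self.density_fn(x_t) < self.alpha)
        # F_t: raise a distribution-shift flag once enough recent frames are OOD
        return sum(self.frame) >= self.tau

Called once per incoming frame, step returns True as soon as at least tau of the last W frames have been individually flagged as OOD, which is how the false positives, false negatives, and detection delays are counted in the experiments that follow.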
6 CASE STUDY 1 - SIMULATED AUTONOMOUS DRIVING SCENARIO USING CARLA

System Description: Here we consider an advanced emergency braking system (AEBS) from [9]. The system overview is shown in Figure 11 of A.2. It is a closed loop system composed of a perception based LEC, which estimates the distance to the object ahead of the ego vehicle. This distance, combined with the velocity, is the input to the braking controller. The objective of the AEBS is to brake the ego vehicle to avoid a collision. The controller is trained using standard reinforcement learning on in-distribution data. The distance estimating LEC is trained using supervised learning techniques. For details about the model architecture and training hyperparameters, please refer to [9].

In-distribution data: We evaluate our approach to OOD detection with the dataset provided in [9]. The dataset is generated using CARLA [11], an open-source simulator for autonomous driving. The in-distribution traces consist of daytime frames with slight rain (i.e. precipitation level in {0, 1, ..., 10}), and with cars as the front object. The sampling rate is 20 Hz [9].

Types of OODs: We evaluate our approach on the following four different sources of OOD-ness in the traces.
(1) OOD-ness due to weather change from slight rain in the in-distribution traces to heavy rain (precipitation level greater than or equal to 20) and fog in the OOD traces.
(2) OOD-ness due to change in the lighting conditions from day in the in-distribution traces to night in the OOD traces.
(3) OOD-ness due to change in the front obstacles from cars in the in-distribution traces to bikes in the OOD traces.
(4) OOD-ness due to perturbation of in-distribution frames with an adversarial attack.

Evaluation metrics: We refer to OOD traces as positive and in-distribution traces as negative. We report false positives (FP) as the number of in-distribution traces that were falsely detected as OOD. False negatives (FN) are the number of OOD traces falsely detected as in-distribution. We also report the average delay in OOD detection as the number of windows required to detect the start of OOD-ness in the traces, averaged over the total number of detected OOD traces. We conduct all our experiments in this case study on a single GPU (NVIDIA GeForce RTX 2080 Ti).

6.1 OOD-ness due to change in weather and lighting
Here we generate OOD traces in which OOD-ness gradually increases with time. We increase the precipitation (or fog) parameter gradually in sequential frames of a trace to generate a heavy rain (or foggy) OOD trace. Similarly, we gradually increase the darkness parameter to generate night time OOD traces. Examples of these OOD traces are shown in the Appendix.

6.1.1 Results on OOD detection for heavy rain. We define frames with precipitation level greater than 20 as OODs due to heavy rain. There are 4488 in-distribution images with precipitation parameter from 0 to 10. The test dataset contains 26 in-distribution and 74 out-of-distribution traces. Our approach involves a few hyperparameters like window length W, threshold count tau, probability threshold alpha, and distance threshold d. In the current experiments, we choose the hyperparameters empirically. In practice, the desired sensitivity of the system would dictate the parameter choice. We report some of the top performers in Table 1. A more exhaustive study is reported in Figure 10 of the Appendix.

(W, tau, alpha, d)   Mem   FP     FN     Avg Delay   Exec Time (ms)
(5,5,0.92,0.2)       145   0/26   2/74   0.42        21.39
(10,5,0.92,0.2)      145   0/26   2/74   0.04        19.80
(5,5,0.78,0.3)       36    1/26   1/74   0           6.26
(10,5,0.78,0.3)      36    1/26   0/74   0           5.88
Table 1: Results on heavy rain traces as OODs. Mem: total number of memories; Exec Time: the time for OOD detection on each frame (calculated on 3 random traces).

The experimental results show that Algorithm 2 efficiently compressed the training data (4488 -> 145/36) into memories and can perform OOD detection in an autonomous driving setting. With the parameters shown in the table, the results in Table 1 indicate that our OOD detector could successfully detect traces with heavy rain precipitation and without delay. The performance of the VAE based OOD detector [9] is comparable to ours in terms of false positive and false negative rates, but their average detection delay (as reported) is higher than ours. We also report the average execution times for detecting an OOD in Table 1. We observe that they are well within the sampling period of the system, implying that Algorithm 3 is amenable to real-time OOD detection. Also, as expected, reducing d in Algorithm 1 results in more memories, which leads to slower execution times but better false positive rates.
                     Night traces          Foggy traces
(W, tau, alpha, d)   FN     Avg Delay      FN     Avg Delay
(5,5,0.92,0.2)       0/27   0.15           0/27   5
(10,5,0.92,0.2)      0/27   0.15           0/27   10
(5,5,0.78,0.3)       0/27   1.89           0/27   9
(10,5,0.78,0.3)      0/27   0.15           0/27   11.15
Table 2: Results on night and foggy OOD traces.

6.1.2 Results on detection for foggy and night OODs. Table 2 shows the results of OOD detection on foggy and night OODs. Here we consider 27 OOD traces for both settings. With the same hyperparameters as in the heavy rain OOD traces, our detector is able to detect all OOD traces.

6.2 OOD-ness due to change in front obstacles
One of the motivations for building an OOD framework is that data unobserved during training may lead to a crash. The perception LEC only saw cars as front obstacles during its training. At test time, the ego vehicle is able to stop at a safe distance from the front obstacle if the obstacle is a car (Figure 3a). But if we change the front obstacle from a car to a bike, then it leads to a crash (Figure 3b). We generated 27 OOD traces with different positions and types of bikes as front obstacles, and all of these traces lead to a crash with the biker.

Figure 3: Illustration of safety hazard, i.e. collision due to shift in the training distribution. (a) Ego vehicle stopping at a safe distance from the lead car at test time. (b) Shift from the training distribution with a biker as the front obstacle leads to a crash at test time.

6.2.1 Results. We define the OOD frames as starting from time-step 20 in the traces (when the biker becomes visible to a human). We use the same set of hyperparameters as in Section 6.1. As shown in Table 3, our OOD detector could successfully alarm the system before a collision happens for all the 27 OOD traces for two hyperparameter settings. For the other two settings, we could not detect 2 out of 27 OOD traces.

(W, tau, alpha, d)   FP     FN     Avg Delay
(5,5,0.92,0.2)       0/26   0/27   0
(10,5,0.92,0.2)      0/26   0/27   0
(5,5,0.78,0.3)       1/26   2/27   0.96
(10,5,0.78,0.3)      1/26   2/27   0
Table 3: Results on OOD traces with bikers.

6.3 OOD-ness due to perturbations by adversarial attack
In these experiments, we evaluate our approach for adversarial attack detection. Again, we consider the same attack of painting lines on the road as considered in [9]. This attack was introduced by Boloor et al. [7], who showed that it causes the car to follow the painted lines, leading to a crash.

Figure 4: OOD-ness due to adversarial road perturbations [9]. (a) Input image (clean). (b) Test image (adversarial sticker on the road).

We use the same attacked dataset from [9], which focuses on the Right Corner Driving case. We run our OOD detector to check whether it could predict the crash beforehand. There are in total 105 test traces, and 69 of them end with a crash. Note that an attack prediction is only useful as long as it happens before the actual crash. We forecast a crash when a shift in distribution occurs. Let us call this t_pc, the time when the crash prediction is set to True. Also, let us denote the time of the actual crash by t_ac. A crash is successfully predicted when t_pc < t_ac. We report our performance on the following metrics in the context of a crash:

True Prediction Rate (TPR) = (# crashes predicted successfully) / (# crashes that happen)
False Prediction Rate (FPR) = (# crashes predicted where no crash happens) / (# crashes predicted)    (10)
Missed Prediction Rate (MPR) = (# crashes that happen without a forecast) / (# crashes that happen)

In addition, we record the average forecast time as the average value of t_ac - t_pc over the correctly predicted cases, reported in number of frames. We report these numbers in Table 4.
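The snippet below shows how these trace-level metrics could be computed from per-trace records. The record format, a (t_pc, t_ac) pair per trace with None for a missing prediction or crash, and the treatment of edge cases are our own assumptions for illustration; they are not taken from the paper's evaluation scripts.

def crash_prediction_metrics(traces):
    # traces: list of (t_pc, t_ac) pairs, one per trace, where t_pc is the first
    # time a shift was flagged (None if never) and t_ac is the crash time
    # (None if the trace ends without a crash). Assumes each denominator below
    # is non-empty. Follows our reading of Equation 10.
    crashes = [(p, c) for p, c in traces if c is not None]
    predicted = [(p, c) for p, c in traces if p is not None]
    successful = [(p, c) for p, c in crashes if p is not None and p < c]

    tpr = len(successful) / len(crashes)
    fpr = sum(1 for p, c in predicted if c is None) / len(predicted)
    mpr = sum(1 for p, c in crashes if p is None) / len(crashes)
    avg_forecast = sum(c - p for p, c in successful) / len(successful)
    return tpr, fpr, mpr, avg_forecast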
Here we also report the top performance in Table 4, using hyperparameters selected according to Figure 5. These results show that our methodology is also successful in adversarial trace detection, at least 5 frames before the crash.

Figure 5: Out-of-distribution trace detection results for detecting an adversarial attack on the road with different hyperparameters. (a) True prediction rate. (b) False prediction rate.

(W, tau, alpha, d)   Mem   TPR     FPR    MPR   Avg Forecast
(5,5,0.05,0.5)       243   100.0   7.2    0.0   5.08
(5,5,0.1,0.5)        243   100.0   8.6    0.0   14.78
(5,5,0.2,0.6)        114   100.0   7.2    0.0   4.89
(5,5,0.25,0.6)       114   100.0   12.3   0.0   16.2
Table 4: Results on adversarial sticker detection.

6.3.1 OOD detection reasoning using SSIM. As mentioned before, an advantage of our framework is that it is interpretable to a human. For in-distribution data, the explanation is simply the closest memory that the test frame matches to. This happens by design, due to the choice of the distance metric D. A more interesting case arises when a test frame is recognized as an OOD. Note that a simple way to frame the reason for an OOD would be to say: it is not similar enough to anything in the memory system. But here we go a step further and try to provide pixel level reasoning. This can be mined from the closest memory to an OOD sample, using the modified D to attribute the pixels responsible for the dissimilarity. Additionally, in case of scattered highlighted pixels (indicating that the test input is drawn from a distribution that is very different from the training distribution), our framework refrains from pixel attribution, and simply raises an alarm.
Heatmap generation: Notice that in Section 4.3, the SSIM value is a mean of the dissimilarity scores over all pixels. For some pairs of images it is possible that the difference is high due to a high concentration of dissimilarity scores on a few pixels. Thus it is possible to filter out these pixels if the distance is above a certain threshold in their window. We attribute these pixels as responsible for the higher SSIM distance and highlight them in red to provide reasoning about the OOD detection. The details of the heatmap generation algorithm are provided in A.4.

Figure 6: OOD detection reasoning for sticker detection (the test image is Figure 4b). (a) Match the input test image with memories in the training data. (b) Highlight the least similar part compared to the memory.

We demonstrate this in Figure 1b and Figure 6b. In Figure 1b, the unrecognized biker is highlighted in the OOD frame. In the adversarial sticker experiment (Figure 6b), we can also notice that the highlighted area contains the adversarial stickers on the road.

7 CASE STUDY 2 - DRIVING WITH LIDAR

7.1 System Description
LiDAR forms a fundamental component for a large section of self-driving car hardware, and is a reliable fallback option when it comes to situations where a camera is not enough. LiDAR simply computes the distance to the closest obstacle at specific angles for a fixed range. Even though the nature of the input is much simpler compared to a camera, NNs with LiDAR inputs can suffer from similar behavior when exposed to OOD scenarios. In this section we introduce the case study involving an autonomous car, discussed in [17]. Figure 8 illustrates the arrangement of the functional blocks. The system involves a car from the F1/10 Autonomous Racing Competition [4], navigating square tracks using only LiDAR measurements to judge obstacles and make general orientation decisions. The LiDAR measurements are sent to a neural network (NN) controller, which issues steering controls. It operates under a constant throttle setting for reasons discussed in [17]. The system state, such as position and orientation, along with the surrounding environment, determines the nature of the scan that the LiDAR receives. The NN controller is trained using standard deep reinforcement learning techniques like deep deterministic policy gradient (DDPG) and Twin Delayed DDPG (TD3). The LiDAR scan obtained from the system has 1081 rays ranging from -135 degrees to 135 degrees, with 0 degrees being the heading of the car. Most of the controllers trained in [17] acted on a sub-sampled set of 21 LiDAR rays, which produced satisfactory performance in simulation. This sets the number of LiDAR rays for the experiments in this paper as well.

Figure 8: Functional blocks in the F1/10 Autonomous Car.
Figure 7: Left: we show a setting where the car should take a right turn on an L-shaped track. Middle: the dots show the distance estimates as provided by the sensor; they match well with the positions of the obstacles. Right: due to reflection from the left wall, the sensor gives a false impression of no obstacle to the left of the car when deployed in the real world.

7.2 Simulation vs Reality
As we saw before, one of the challenges when it comes to deploying learning-enabled cyber-physical systems in the real world is the unexpected behavior caused by the sim2real gap. Even though the recent literature [12, 18] has seen an explosion of interest in verifying closed-loop systems with NN controllers, verification results make sense only as long as the assumptions on the environment hold. The NN controllers for this benchmark were trained in a virtual environment with exactly the same racing track and obstacle setting. Simulations are a useful and rich source of training data when it comes to deep reinforcement learning approaches. However, the downside is that aberrations arising in the real world can cause the system to go berserk. In the current setting this aberration comes from the presence of reflective surfaces, as shown in Figure 7. This introduces a large source of uncertainty. A LiDAR ray reflected away by a highly reflective surface takes a longer time to return to the on-board detectors, which ends up giving a false impression of no obstacle at that angle. This is hard to model since surface reflectivity is largely unknown. In Figure 7, this happens at the left corner of the L-shaped track. This creates a false impression of no obstacle to the left of the car. The ground truth is hard to guess just from the LiDAR inputs. But a crash could have been avoided if the car had switched to a safe mode, or raised an alarm ahead of time. In this case study, we focus on the ability to detect such OOD scenarios. The right course of action after detecting such a shift is context dependent and is beyond the scope of this work.

7.3 Predicting Crash
The authors in [17] report the presence of reflections as being correlated with an actual crash. Additionally, they show that getting rid of the reflections artificially can lead to safer outcomes. Hence, our hypothesis here is that a crash could be due to the potential deviation from in-distribution data, which comes from simulations and does not contain any reflections. The intuition is that if an NN controller had experienced reflections during training time, then it would have known the right course of action. The LiDAR scans with reflected rays could therefore be treated as out-of-distribution data. Notice that in this case study there is no clear distinction as to when things start becoming OOD. What we have instead is real crash data. We have access to the time-stamped LiDAR scan log for each such run of the system along the L-shaped trajectory in Figure 7. The data set S here is a set of trajectories {T_1, T_2, ..., T_n}. Each trajectory is a list of time-stamped LiDAR scans, T_i = {(x_1, p_1), (x_2, p_2), ...}, where x_i in R^q is the LiDAR scan at time i, and p_i is a flag variable for a crash. What we wish to test here is whether a detector for distribution shift is a good predictor of a future crash.

Results: In order to simulate a crash prediction setting, we run our OOD detector for each LiDAR scan in the trace T_i starting from i = 0. The in-distribution data here is obtained by running the simulator for the 12 different controllers. These include LiDAR scans over the length of 70 time steps, which is approximately the number of time-steps the system takes to reach from one end of the track to the other. Note that the controllers were trained well enough during simulation that none of the traces show a crash. In this experiment, we create a 2-dimensional data array by repeating the 1-dimensional measurement. The distance metric for OOD detection is the same SSIM metric applied to a LiDAR scan. The detection of distribution shift is implemented using Algorithm 3. We report our performance on the same metrics as in Equation 10, in Table 5, for different choices of the parameters. In the best case we were able to predict 82.1% of the crashes with a 22.7% false positive rate, and approximately 9 time steps ahead. The missed prediction rate is approximately 10%.

(W, tau, alpha, d)    TPR     FPR     MPR     Avg Forecast
(40,15,0.05,0.3)      80.36   19.35   10.71   9.69
(40,17,0.1,0.3)       82.14   22.73   8.93    9.8
(40,11,0.05,0.2)      80.36   22.22   12.5    12.67
(40,15,0.1,0.2)       80.36   21.88   10.71   10.4
Table 5: Results for LiDAR data.

Figure 9: OOD detection results for detecting LiDAR crash with different hyperparameters. (a) True prediction rate vs threshold (W = 40). (b) False prediction rate vs threshold (W = 40).
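As a concrete illustration of how the 1-dimensional scan described above can be fed to the same SSIM based pipeline, the sketch below tiles the sub-sampled measurement into a 2-dimensional array before the image distance of Section 4.3 is applied. The tile height and the normalization constant are illustrative assumptions of ours rather than values taken from the paper.

import numpy as np
import torch

def lidar_scan_to_image(scan, height=21, max_range=10.0):
    # Tile a 1-D LiDAR scan (e.g. 21 sub-sampled rays) into a 2-D array so the
    # SSIM-based distance can be applied unchanged. Ranges are normalized to [0, 1].
    scan = np.clip(np.asarray(scan, dtype=np.float32) / max_range, 0.0, 1.0)
    tiled = np.tile(scan, (height, 1))                        # shape (height, len(scan))
    return torch.from_numpy(tiled).unsqueeze(0).unsqueeze(0)  # shape (1, 1, H, W)

The resulting tensors can then be passed to the SSIM based distance and on to the sliding-window detector of Algorithm 3.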

8 CONCLUSION
OOD detection can be of utmost importance in ensuring the safety of cyber-physical systems equipped with learning enabled components. What we have demonstrated in this paper is that state of the art results in OOD detection for self-driving car applications can go hand in hand with overall interpretability, without compromising on execution times. In the future, we would like to extend this technique to applications beyond self-driving cars where anomalous inputs are challenging to handle.
9 ACKNOWLEDGEMENT
This work was supported in part by ARO W911NF-20-1-0080, AFRL and DARPA FA8750-18-C-0090, NSF-1915398, NSF-2125561 and SRC Task 2894.001. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Air Force Research Laboratory (AFRL), the Army Research Office (ARO), the Defense Advanced Research Projects Agency (DARPA), the Department of Defense, or the United States Government.

Additionally, we would like to thank Radoslav Ivanov, postdoctoral scholar from the University of Pennsylvania, for discussions on the LiDAR experiments and for sharing the data. We also especially appreciate Feiyang Cai, PhD candidate from Vanderbilt University, for sharing the CARLA dataset and simulation code.

REFERENCES
[1] pytorch-msssim. https://pypi.org/project/pytorch-msssim/
[2] Toyota Safety Sense. https://www.toyota.com/safety-sense/
[3] 1990. Partitioning Around Medoids (Program PAM). Chapter 2, 68-125. https://doi.org/10.1002/9780470316801.ch2
[4] 2021. F1TENTH. https://f1tenth.org
[5] Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, and Kevin Murphy. 2017. An Information-Theoretic Analysis of Deep Latent-Variable Models. CoRR abs/1711.00464 (2017). http://arxiv.org/abs/1711.00464
[6] Michele Basseville, Igor V. Nikiforov, et al. 1993. Detection of Abrupt Changes: Theory and Application. Vol. 104. Prentice Hall, Englewood Cliffs.
[7] Adith Boloor, Karthik Garimella, Xin He, Christopher Gill, Yevgeniy Vorobeychik, and Xuan Zhang. 2020. Attacking vision-based perception in end-to-end autonomous driving models. Journal of Systems Architecture 110 (2020), 101766. https://doi.org/10.1016/j.sysarc.2020.101766
[8] Dominique Brunet, Edward R. Vrscay, and Zhou Wang. 2012. On the Mathematical Properties of the Structural Similarity Index. IEEE Transactions on Image Processing 21, 4 (2012), 1488-1499. https://doi.org/10.1109/TIP.2011.2173206
[9] Feiyang Cai and Xenofon Koutsoukos. 2020. Real-time out-of-distribution detection in learning-enabled cyber-physical systems. In 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 174-183.
[10] Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan Su. 2019. This Looks Like That: Deep Learning for Interpretable Image Recognition. In NeurIPS 2019, Vancouver, BC, Canada. 8928-8939. http://papers.nips.cc/paper/9095-this-looks-like-that-deep-learning-for-interpretable-image-recognition
[11] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. 2017. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning. 1-16.
[12] Souradeep Dutta, Xin Chen, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. 2019. Sherlock - A Tool for Verification of Neural Network Feedback Systems: Demo Abstract. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC '19). ACM, New York, NY, USA, 262-263. https://doi.org/10.1145/3302504.3313351
[13] Yeli Feng, Daniel Jun Xian Ng, and Arvind Easwaran. 2021. Improving Variational Autoencoder based Out-of-Distribution Detection for Embedded Real-time Applications. ACM Transactions on Embedded Computing Systems (TECS) 20, 5s (2021), 1-26.
[14] K. Fukunaga and P. M. Narendra. 1975. A Branch and Bound Algorithm for Computing k-Nearest Neighbors. IEEE Trans. Comput. C-24, 7 (1975), 750-753. https://doi.org/10.1109/T-C.1975.224297
[15] Francisco José Gisbert. 2003. Weighted samples, kernel density estimators and convergence. Empirical Economics 28 (2003), 335-351. https://doi.org/10.1007/s001810200134
[16] Dan Hendrycks and Kevin Gimpel. 2016. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016).
[17] Radoslav Ivanov, Taylor J. Carpenter, James Weimer, Rajeev Alur, George J. Pappas, and Insup Lee. 2020. Case Study: Verifying the Safety of an Autonomous Racing Car with a Neural Network Controller. In Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control (HSCC '20). ACM, New York, NY, USA, Article 28, 7 pages. https://doi.org/10.1145/3365365.3382216
[18] Radoslav Ivanov, James Weimer, Rajeev Alur, George J. Pappas, and Insup Lee. 2019. Verisig: Verifying Safety Properties of Hybrid Systems with Neural Network Controllers. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC '19). ACM, New York, NY, USA, 169-178. https://doi.org/10.1145/3302504.3311806
[19] Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Edgar Dobriban, Oleg Sokolsky, and Insup Lee. 2022. iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection. Association for the Advancement of Artificial Intelligence. arXiv:2201.02331 [cs.LG]
[20] Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Oleg Sokolsky, and Insup Lee. 2021. Detecting OODs as datapoints with High Uncertainty. arXiv preprint arXiv:2108.06380 (2021).
[21] Rikard Laxhammar and Göran Falkman. 2015. Inductive conformal anomaly detection for sequential detection of anomalous sub-trajectories. Annals of Mathematics and Artificial Intelligence 74, 1 (2015), 67-94.
[22] Juncheng Li, Frank R. Schmidt, and J. Zico Kolter. 2019. Adversarial camera stickers: A physical camera-based attack on deep learning systems. CoRR abs/1904.00759 (2019). http://arxiv.org/abs/1904.00759
[23] David Macêdo, Tsang Ing Ren, Cleber Zanchettin, Adriano L. I. Oliveira, and Teresa Ludermir. 2021. Entropic out-of-distribution detection. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1-8.
[24] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2015. DeepFool: a simple and accurate method to fool deep neural networks. CoRR abs/1511.04599 (2015). http://arxiv.org/abs/1511.04599
[25] R. T. Ng and Jiawei Han. 2002. CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering 14, 5 (2002), 1003-1016. https://doi.org/10.1109/TKDE.2002.1033770
[26] Xiao Qin, Lei Cao, Elke A. Rundensteiner, and Samuel Madden. 2019. Scalable Kernel Density Estimation-based Local Outlier Detection over Large Data Streams. In Advances in Database Technology - 22nd International Conference on Extending Database Technology (EDBT 2019), Lisbon, Portugal, March 26-29, 2019. 421-432.
[27] Shreyas Ramakrishna, Zahra Rahiminasab, Gabor Karsai, Arvind Easwaran, and Abhishek Dubey. 2021. Efficient Out-of-Distribution Detection Using Latent Space of β-VAE for Cyber-Physical Systems. arXiv preprint arXiv:2108.11800 (2021).
[28] Erich Schubert and Peter J. Rousseeuw. 2018. Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. CoRR abs/1810.05691 (2018). http://arxiv.org/abs/1810.05691
[29] B. W. Silverman. 2018. Density Estimation for Statistics and Data Analysis. 1-175 pages. https://doi.org/10.1201/9781315140919
[30] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. CoRR abs/1312.6034 (2014).
[31] Vijaya Kumar Sundar, Shreyas Ramakrishna, Zahra Rahiminasab, Arvind Easwaran, and Abhishek Dubey. 2020. Out-of-distribution detection in multi-label datasets using latent space of β-VAE. In 2020 IEEE Security and Privacy Workshops (SPW). IEEE, 250-255.
[32] Jihoon Tack, Sangwoo Mo, Jongheon Jeong, and Jinwoo Shin. 2020. CSI: Novelty detection via contrastive learning on distributionally shifted instances. arXiv preprint arXiv:2007.08176 (2020).
[33] Ashish Tiwari, Bruno Dutertre, Dejan Jovanović, Thomas de Candia, Patrick D. Lincoln, John Rushby, Dorsa Sadigh, and Sanjit Seshia. 2014. Safety envelope for security. In Proceedings of the 3rd International Conference on High Confidence Networked Systems. 85-94.
[34] Vladimir Vovk, Ilia Nouretdinov, and Alexander Gammerman. 2003. Testing exchangeability on-line. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 768-775.
[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing 13, 4 (April 2004), 600-612. https://doi.org/10.1109/TIP.2003.819861
[36] Ev Zisselman and Aviv Tamar. 2020. Deep residual flow for out of distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13994-14003.
A APPENDIX

A.1 Hyperparameter experiments
Figure 10: Out-of-distribution episode detection results for detecting OODs due to heavy rain. (a) False Positive Rate. (b) False Negative Rate.

A.2 Overview of the AEBS system
Figure 11: Closed loop of the AEBS from [9].

A.3 OODs Data Set Case Study 1
Fog dataset: the OOD frames start from the 1st frame. The average length of foggy episodes is 123 frames.

Figure 12: Example sequence in the Fog Dataset (we gradually increase the level of fog).

Night dataset: the OOD frames start from the 10th frame. The average length of night episodes is 123 frames.

Figure 13: Example sequence in the Night Dataset (we gradually increase the darkness parameter).

A.4 Heatmap Generation Algorithm
As mentioned in Section 6.3.1, in addition to using the SSIM value to indicate whether the test frame is similar to each memory, we can also compute the contribution of each corresponding pixel to the SSIM distance using ComputeFullSSIM. ComputeFullSSIM computes and returns the local differences for individual pixels between two images. By highlighting the pixels with high contributions in the heatmap, we can visualize the most dissimilar parts between the test frame and its closest memory.

Algorithm 4 Heatmap Generation
Input: Time Series Data x_t in R^{m x n}, Closest Memory m_c
Output: Heatmap x'_t in R^{m x n}
Parameter: Color Distance Threshold d_color
1: Instantiate x'_t <- x_t
2: D_x in R^{m x n} <- ComputeFullSSIM(x_t, m_c)
3: for 1 <= m' <= m do
4:   for 1 <= n' <= n do
5:     if D_x[m', n'] > d_color then
6:       x'_t[m', n'] <- PaintPixel(x_t[m', n'])
7: return x'_t
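A small NumPy sketch of Algorithm 4 is shown below. The per-pixel dissimilarity map is assumed to come from a full-resolution SSIM computation (the ComputeFullSSIM step), and painting a pixel pure red is one possible choice for PaintPixel; both of these are our own illustrative assumptions.

import numpy as np

def generate_heatmap(x_t, ssim_map, d_color=0.5):
    # Algorithm 4 (sketch): copy the test frame and paint red every pixel whose
    # per-pixel dissimilarity exceeds d_color.
    # x_t      : RGB frame as a float array of shape (H, W, 3), values in [0, 1]
    # ssim_map : per-pixel dissimilarity of shape (H, W), e.g. 1 - local SSIM
    heatmap = x_t.copy()
    mask = ssim_map > d_color
    heatmap[mask] = np.array([1.0, 0.0, 0.0])   # PaintPixel: highlight in red
    return heatmap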
