When does poor governance presage biosecurity risk?
Stephen E. Lane¹, Tony Arthur², Christina Aston², Sam Zhao², and Andrew P. Robinson¹
arXiv:1702.04052v1 [stat.AP] 14 Feb 2017
¹ Centre of Excellence for Biosecurity Risk Analysis, University of Melbourne, Parkville, Victoria 3010, Australia, lane.s@unimelb.edu.au
² Department of Agriculture and Water Resources, Canberra, Australian Capital Territory 2601, Australia
February 15, 2017
Abstract
Border inspection, and the challenge of deciding which of the tens of millions of consignments that arrive
should be inspected, is a perennial problem for regulatory authorities. The objective of these inspections is
to minimise the risk of contraband entering the country. As an example, for regulatory authorities in charge
of biosecurity material, consignments of goods are classified before arrival according to their economic tariff
number (Department of Immigration and Border Protection, 2016). This classification, perhaps along with
other information, is used as a screening step to determine whether further biosecurity intervention, such
as inspection, is necessary. Other information associated with consignments includes details such as the
country of origin, supplier, and importer.
The choice of which consignments to inspect has typically been informed by historical records of intercepted material. Fortunately for regulators, interception is a rare event; however, this sparsity undermines
the utility of historical records for deciding which containers to inspect.
In this paper we report on an analysis that uses more detailed information to inform inspection. Using
quarantine biosecurity as a case study, we create statistical profiles using generalised linear mixed models
and compare different model specifications with historical information alone, demonstrating the utility
of a statistical modelling approach. We also demonstrate some graphical model summaries that provide
managers with insight into pathway governance.
1 Introduction
Efficient and effective border biosecurity strategies are essential for protecting ecosystems and economies
from invasive pests. The annual cost of invasive species is estimated to exceed US$200bn in the United States
(Pimentel, 2011) and to be at least US$4bn in Australia (Sinden et al., 2005). In Australia, the Department
of Agriculture and Water Resources (the department) is both the regulatory authority and the inspectorate
for biosecurity protection, carrying out both pre-border and border intervention on a range of imported
goods, based on the risk profile of the goods and international agreements. The objective of these
interventions is to minimise the chance of biosecurity risk material (BRM) entering the country.
Here, we focus on border inspection and the challenge of deciding which of the tens of millions of
consignments that arrive should be inspected. Before arrival, consignments of goods are classified according to their economic tariff number (Department of Immigration and Border Protection, 2016), and this
classification is used, with other information, as a screening step to determine whether further biosecurity
intervention, such as inspection, is necessary. Other information associated with consignments includes
details such as the country of origin, supplier, and importer.
Border inspection for quarantine biosecurity is carried out for a number of reasons, namely (i) to verify the
effectiveness of mandated pre-arrival treatments; (ii) to detect and intercept BRM; (iii) to provide information
about the intrinsic contamination rate of the activity; and (iv) to deter potential malefactors. As noted above,
tens of millions of consignments arrive every year, so the challenge is to determine which should be inspected.
We define a pathway as a collection of activities that culminates in the arrival in Australia of a set of
similar consignments. Pathways are hierarchical, so we may consider a pathway of all consignments of
a commodity, or all consignments of that commodity from a specific country, or even from a specific supplier.
For example, the plant product pathway, which is the focus of this article, includes goods such as kiwi fruit
and cashew nuts, which can themselves be considered pathways. Statistically, pathways can be thought of
as processes.
Pathways can be classified as either high risk or low risk, based on the probability that a consignment
contains BRM, called the approach rate. For example, in Australia, kiwi fruit is a high-risk plant product
pathway, with an approach rate of 55.8%, whereas cashew nuts are a low-risk plant product pathway, with
an approach rate of 1.3%. Importantly, the severity of the BRM detected in cashews has been assessed as
very low. The risk classification is important because the department may apply different interventions
to low-risk and high-risk pathways, as discussed below. The identification of pathways as high or low risk
is called profiling, and is an essential step in the efficient management of biosecurity intervention.
Traditionally, profiling has been based on records of interceptions of regulated pests on the pathway.
This approach assumes that future biosecurity compliance can be predicted by past biosecurity compliance,
at least over some periods in the past and the future. However, interception of regulated pests is a rare
event, which is good news from the point of view of biosecurity protection, but makes profiling more
difficult, especially in sparse pathways, because reliable estimates of pathway risk are hard to obtain.
This observation motivates the following question: can future biosecurity compliance be predicted by
other characteristics as well as by past biosecurity compliance?
Historically, all consignments on imported plant product pathways were subject to mandatory inspection. As part of a comprehensive review of Australia’s biosecurity system, Beale et al. (2008) recommended
establishing a science-based system for managing biosecurity issues, noting that zero risk is both unattainable
and undesirable. Under the full inspection strategy, pathways with lower approach rates require considerably
more inspection effort per interception of BRM. For example, 4623 consignments were inspected on the
cashew pathway over four years, of which BRM was detected in 59, so the average number of inspections per
detection (IPD) was about 78, compared with about 2 for the kiwi fruit pathway.
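As a quick arithmetic check of the figures above, the sketch below computes the observed IPD for cashews and the IPD implied by an approach rate. The implied value assumes that every inspection of a contaminated consignment detects the BRM, so that the expected IPD under full inspection is simply the reciprocal of the approach rate; the function names are illustrative only.

```python
# Inspections per detection (IPD): the ratio of inspection effort to BRM detections.


def inspections_per_detection(inspections: int, detections: int) -> float:
    """Observed IPD: inspections carried out per BRM detection."""
    if detections <= 0:
        raise ValueError("IPD is undefined when there are no detections")
    return inspections / detections


def expected_ipd(approach_rate: float) -> float:
    """IPD implied by an approach rate under full inspection, assuming
    (as a simplification) that inspection always detects BRM when present."""
    if not 0 < approach_rate <= 1:
        raise ValueError("approach rate must lie in (0, 1]")
    return 1.0 / approach_rate


# Cashew pathway figures quoted in the text: 4623 inspections, 59 detections.
cashew_ipd = inspections_per_detection(4623, 59)
# Approach rates quoted in the text.
kiwi_expected = expected_ipd(0.558)
cashew_expected = expected_ipd(0.013)

print(f"cashew observed IPD: {cashew_ipd:.1f}")
print(f"kiwi fruit implied IPD: {kiwi_expected:.1f}")
print(f"cashew implied IPD: {cashew_expected:.1f}")
```

The implied value for cashews (about 77 at a 1.3% approach rate) is close to the observed 78, and the 55.8% approach rate for kiwi fruit implies roughly 2 inspections per detection, in line with the figures quoted above.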
We now introduce the case study that motivates this research. The inspection workflow for imported plant
product pathways comprises three components: suppliers that export plant products, importers that import
the products from suppliers, and border inspections that attempt to detect as much BRM as possible.
Inspections at the border can be stratified by supplier or importer; that is, unique inspection regimes
may be applied to individual importers or suppliers.
The department currently uses the continuous sampling plan (CSP) algorithm, specifically, CSP-3, to
manage the biosecurity risk of low-risk pathways (Dodge, 1943; Dodge and Torrey, 1951). The CSP family
of algorithms allocates intervention effort within pathways according to recent inspection history. The
department has implemented CSP-3 for the inspection of a range of low-risk pathways, including dried
apricots, green coffee beans, raisins, cashews and some nuts. This particular approach to profiling has been
shown to result in reductions of both leakage (how much BRM is missed in the inspection process) and IPD
relative to random sampling plans (Robinson et al., 2012, 2013; Arthur et al., 2013).
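The way a CSP allocates effort from recent inspection history can be illustrated with the simplest member of the family, CSP-1 (Dodge, 1943). This is a sketch of the textbook CSP-1 rule, not of the CSP-3 plan the department uses: CSP-3 adds further rules governing what happens after a failure found during sampling, which are not shown here. The clearance number, sampling fraction and simulated 1% approach rate below are hypothetical.

```python
import random


class ContinuousSamplingPlan1:
    """Minimal sketch of Dodge's CSP-1 continuous sampling plan.

    Screening phase: inspect every consignment until `clearance_number`
    consecutive inspected consignments pass. Sampling phase: inspect each
    consignment with probability `sampling_fraction`. Any failure returns
    the plan to screening and resets the run of passes.
    """

    def __init__(self, clearance_number, sampling_fraction, rng=None):
        if clearance_number < 1:
            raise ValueError("clearance_number must be at least 1")
        if not 0 < sampling_fraction <= 1:
            raise ValueError("sampling_fraction must lie in (0, 1]")
        self.clearance_number = clearance_number
        self.sampling_fraction = sampling_fraction
        self.rng = rng if rng is not None else random.Random()
        self.screening = True  # start with 100% inspection
        self.consecutive_passes = 0

    def should_inspect(self):
        """Decide whether the next arriving consignment is inspected."""
        if self.screening:
            return True
        return self.rng.random() < self.sampling_fraction

    def record(self, failed):
        """Update the plan with the outcome of an inspection that took place."""
        if failed:
            self.screening = True
            self.consecutive_passes = 0
            return
        if self.screening:
            self.consecutive_passes += 1
            if self.consecutive_passes >= self.clearance_number:
                self.screening = False  # cleared: switch to sampling


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical parameters and a hypothetical 1% approach rate.
    plan = ContinuousSamplingPlan1(clearance_number=10, sampling_fraction=0.2, rng=rng)
    inspected = 0
    for _ in range(1000):
        if plan.should_inspect():
            inspected += 1
            plan.record(failed=rng.random() < 0.01)
    print(f"inspected {inspected} of 1000 consignments")
```

Passing a seeded `random.Random` makes runs reproducible, which is convenient when comparing leakage and IPD across parameter settings in simulation.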
A wrinkle in the application of the CSP algorithm is that, although the analysis of inspection history is
implicitly expected to consider only the kinds of contamination that are of specific regulatory interest, in
fact any aspect of the inspection history can be used as an indicator of future risk. That is, although the
department may be specifically concerned with intercepting regulated pests, the inspection history provides
a much richer view of the pathway because it includes information about other incidents, such as the
interception of non-regulated pests, failures of documentation, and so on, which may arguably, and testably,
be related to the chances of the failure types that are of regulatory concern. The question that motivated
this study is therefore: which data provide the most useful information about the pathway? Is it the
relatively sparse history of interception of regulated pests, the more complete picture of relative
performance on the pathway, or some combination? Furthermore, can information about historical
performance in other, possibly related, pathways provide insight into future performance in a given pathway?
This paper reports an analysis of the use of auxiliary information to try to improve profiling. The objective
is to distinguish high-risk and low-risk pathways, where risk refers to the interception rate of regulated pests,
based on a range of characteristics of the pathway, including the interception rates of regulated pests,
non-regulated pests and administrative failures, and supplier and tariff information. We aim to form a
picture of the governance of each pathway and use that picture as a basis for assessing its relative
biosecurity risk. The balance of the paper is organised as follows. In the next section we introduce the
dataset and the models used to test our conjectures. We then present the results, followed by a discussion
and conclusion.
2 Materials and methods
We used a number of analytical approaches to assess the conjecture that governance-related failures are
informative about regulated-pest risk. First, we tested the association between non-regulated pest and
administrative failures (more frequent, low severity) and regulated pest failures (less frequent, high
severity). If such an association was found, then we reasoned that historical governance-related variables
may be used to predict future high-severity biosecurity failures. Second, we used historical failure rates
to create profiles, and assessed their performance using Receiver Operating Characteristic (ROC) curves.
Lastly, we constructed statistical models to predict future regulated pest interception probabilities as a
function of previous regulated pest interception rates and other, governance-related, predictor variables.
All data preparation and modelling were performed using R version 3.3.0 (R Core Team, 2016), with the
generalised additive mixed models of Section 2.4 fitted using the R package rstanarm (Gabry and Goodrich, 2016).
2.1 Data
The data for the analysis comprise the inspection history for all consignments classified as fruit (Chapter 8,
Department of Immigration and Border Protection, 2016) that arrived between January 2007 and December
2011, a period of five years. The pathway is a complex one, comprising 80 different tariff codes, 3150 unique
importers, and 3655 unique suppliers from 127 countries. For the purposes of this study we will assume
that all significant biosecurity contamination has been captured by the regulatory border inspection. There
were approximately 48300 inspections of more than 75000 goods. Approximately 5300 inspections resulted
in interception of a regulated pest, 8500 inspections resulted in interception of a non-regulated pest, and
5900 inspections recorded some administrative failure.
For modelling (see Section 2.4), we aggregated the data by year, tariff and supplier. This aggregation was
done for two reasons: first, it allowed us to create models that account for both supplier and tariff effects;
and second, aggregating by year limits the effects of any seasonality. Throughout the study, we use
"interceptions" to refer both to interceptions of pests and to administrative failures. A dataset formatted
for modelling was constructed as follows:
For each year y from 2008 to 2011:
• Compute interception (failure) rates for year y − 1 by tariff and supplier for:
  – Administrative interceptions
  – Non-regulated pest interceptions
  – Regulated pest interceptions
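The construction above amounts to counting inspections by (year, tariff, supplier) and attaching the previous year's rates for the same supplier-tariff pair. The following is a minimal sketch under assumed field names (`year`, `tariff`, `supplier` and one boolean per interception kind are hypothetical); it returns raw rates, whereas the analysis replaces these with the empirically smoothed rates described in Section 2.1.

```python
from collections import defaultdict

KINDS = ("regulated", "nonregulated", "administrative")


def aggregate(inspections):
    """Count inspections and interceptions of each kind by (year, tariff, supplier).

    `inspections` is an iterable of dicts with keys 'year', 'tariff',
    'supplier' and one boolean per kind in KINDS (hypothetical field names).
    """
    counts = defaultdict(lambda: {"n": 0, **{k: 0 for k in KINDS}})
    for rec in inspections:
        cell = counts[(rec["year"], rec["tariff"], rec["supplier"])]
        cell["n"] += 1
        for kind in KINDS:
            cell[kind] += int(rec[kind])
    return counts


def modelling_rows(inspections, years=range(2008, 2012)):
    """One row per (year, tariff, supplier): this year's regulated-pest counts
    (X, n) plus the previous year's raw interception rate for each kind, or
    None when the supplier-tariff pair was not inspected in year - 1."""
    counts = aggregate(inspections)
    rows = []
    for (year, tariff, supplier), cell in sorted(counts.items()):
        if year not in years:
            continue
        prev = counts.get((year - 1, tariff, supplier))  # .get adds no entry
        row = {"year": year, "tariff": tariff, "supplier": supplier,
               "X": cell["regulated"], "n": cell["n"]}
        for kind in KINDS:
            row[f"prev_{kind}_rate"] = prev[kind] / prev["n"] if prev else None
        rows.append(row)
    return rows
```

Rows with no previous-year history have missing lagged rates; how such rows were handled in the original analysis is not stated here, so any imputation would be a further assumption.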
We denote by $X_{sty}$ the number of interceptions out of $n_{sty}$ inspections of tariff $t$ from supplier $s$
performed in year $y$. Correspondingly, each inspection has a probability $p_{sty}$ of resulting in an
interception of one of the kinds listed above. Then $X_{sty}$ was modelled as

$$X_{sty} \sim \mathrm{Binomial}(p_{sty}, n_{sty}).$$
Computing interception rates by tariff, supplier and year sometimes resulted in very small binomial
denominators, owing to the sparse history of inspections and interceptions. For this reason, rather than
using raw interception rates, we calculated smoothed interception rates using parametric empirical Bayes
(Robinson et al., 2015). In particular, we used the Beta-binomial model to smooth interception rates for
suppliers within tariffs and years; full details are given in Appendix A.
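To show what the smoothing does, here is a minimal sketch of Beta-binomial empirical Bayes using a crude method-of-moments fit of the Beta prior. This estimator is an assumption for illustration and is not necessarily the one in Appendix A; it matches the mean and variance of the raw rates to those of a Beta distribution and ignores the extra binomial noise in small-denominator rates. In the setting above, it would be applied separately to the suppliers within each tariff and year.

```python
from statistics import fmean, pvariance


def fit_beta_prior(successes, trials):
    """Crude method-of-moments fit of a Beta(alpha, beta) prior to raw rates x/n."""
    rates = [x / n for x, n in zip(successes, trials)]
    mean = fmean(rates)
    var = pvariance(rates, mu=mean)
    if var <= 0 or not 0 < mean < 1:
        raise ValueError("raw rates have no usable spread for a Beta fit")
    total = mean * (1 - mean) / var - 1  # alpha + beta
    if total <= 0:
        raise ValueError("raw rates are too dispersed for a Beta prior")
    return mean * total, (1 - mean) * total


def smooth_rates(successes, trials):
    """Posterior-mean rates (x + alpha) / (n + alpha + beta).

    Each smoothed rate is a weighted average of the raw rate x/n and the prior
    mean alpha/(alpha + beta), with weight n/(n + alpha + beta) on the raw
    rate, so suppliers with few inspections are pulled hardest towards the
    prior mean.
    """
    alpha, beta = fit_beta_prior(successes, trials)
    return [(x + alpha) / (n + alpha + beta) for x, n in zip(successes, trials)]
```

For example, a supplier with 0 interceptions from a single inspection receives a smoothed rate well above 0, whereas one with 20 interceptions from 40 inspections stays close to its raw rate of 0.5.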
2.2 Association between low-severity and high-severity interceptions
To investigate the association between low-severity and high-severity interceptions, we calculated log-odds
ratios and 95% confidence intervals (using a normal approximation on the log-odds scale) comparing the odds
of a regulated-pest interception for consignments with and without non-regulated pest or administrative interceptions.
For each inspected consignment, let $Y$ denote the outcome of inspection, so that $Y = 1$ indicates that a
regulated pest was intercepted. Further, let $X$ denote whether the consignment contained a non-regulated
pest (or had an administrative failure), so that $X = 1$ indicates that the consignment contained a
non-regulated pest (or had an administrative failure). The log-odds ratio is

$$\log \mathrm{OR} = \log \frac{\Pr(Y = 1 \mid X = 1) / \Pr(Y = 0 \mid X = 1)}{\Pr(Y = 1 \mid X = 0) / \Pr(Y = 0 \mid X = 0)}.$$
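From a 2×2 table of consignment counts, the log-odds ratio above and its normal-approximation interval can be computed as in the sketch below. It uses Woolf's standard error, $\sqrt{1/a + 1/b + 1/c + 1/d}$, which is the usual choice for a normal approximation on the log-odds scale; the text does not state which standard error was used, so this is an assumption. The cell labels are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_odds_ratio(a, b, c, d, z=Z_95):
    """Log-odds ratio and Wald (Woolf) confidence interval from a 2x2 table.

    a: X=1, Y=1    b: X=1, Y=0
    c: X=0, Y=1    d: X=0, Y=0
    where Y=1 is a regulated-pest interception and X=1 a non-regulated pest
    (or administrative) interception on the same consignment.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (a 0.5 correction is common)")
    estimate = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, estimate - z * se, estimate + z * se
```

With a table of a = 10, b = 20, c = 5, d = 40, for instance, the odds ratio is 4 and the interval is centred on log 4 on the log-odds scale.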
2.3 Profiling using annual inspection data
We created profiles using annual inspection data, and compared their performance using ROC curves. For
each year y from 2007 to 2010 and each kind of interception rate (regulated pest, non-regulated pest and
administrative), we computed the ROC curve of the year-y profile against the year y + 1 inspection outcomes
for regulated pests, and calculated the area under the curve (AUC).
We also computed ROC curves within tariffs, because we suspected that tariff-to-tariff variation in
interception rates, rather than variation among the importers within tariffs, would dominate the ROC signal.
That is, if we only ran the profiles across tariffs, a naive assessment of performance would look very good,
because we would expect the differences in risk between tariffs to be reasonably stable from year to year.
Hence, assessing the profiles within tariffs provides a more realistic assessment.
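The AUC of a profile can be computed without drawing the ROC curve, as the probability that a randomly chosen inspection with a regulated-pest interception has a higher profile score than a randomly chosen clean inspection (ties counting one half). The sketch below assumes the score is the previous year's (smoothed) interception rate of the relevant entity and the outcome is the year y + 1 regulated-pest result; it also shows the within-tariff variant, which computes one AUC per tariff so that stable between-tariff differences cannot inflate the result.

```python
from collections import defaultdict


def auc(scores, outcomes):
    """AUC via the pairwise (Mann-Whitney) form; ties count one half.

    O(n_pos * n_neg), which is fine for a sketch but slow for large data.
    Returns None when only one outcome class is present.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def within_tariff_auc(tariffs, scores, outcomes):
    """One AUC per tariff, ranking only the entities inside that tariff."""
    groups = defaultdict(lambda: ([], []))
    for t, s, y in zip(tariffs, scores, outcomes):
        groups[t][0].append(s)
        groups[t][1].append(y)
    return {t: auc(s, y) for t, (s, y) in groups.items()}
```

In a toy example with two tariffs, where the profile ranks perfectly inside one tariff and perfectly backwards inside the other, the pooled AUC is 0.75 while the within-tariff AUCs are 1.0 and 0.0, which is the kind of masking the within-tariff assessment is meant to expose.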
2.4 Profiling using statistical modelling
We chose to construct models using generalised additive mixed model formulations, with the linear predictors
for the logit probability, $\log\{p_{sty}/(1 - p_{sty})\}$, specified as:

$$
\begin{aligned}
\text{Base}&: \beta_0 + \gamma_s + \tau_t\\
\text{M1}&: \beta_0 + \gamma_s + \tau_t + \alpha_y\\
\text{M2}&: \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st}\\
\text{M3}&: \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy}\\
\text{M4}&: \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{R,st(y-1)})\\
\text{M5}&: \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{N,st(y-1)})\\
\text{M6}&: \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{A,st(y-1)})
\end{aligned}
$$

where: $\beta_0$ is a fixed process constant (intercept) to be estimated; $\gamma_s$ is a supplier-level effect;
$\tau_t$ is a tariff-level effect; $\alpha_y$ is an effect for the year of interception; $\phi_{st}$ is an effect for
the supplier-tariff cross-classification; $\kappa_{sy}$ is an effect for the supplier-year cross-classification;
and $b_3(\cdot)$ represents a cubic regression spline of the previous year's regulated pest interception rate
$p_{R,st(y-1)}$, non-regulated pest interception rate $p_{N,st(y-1)}$, or administrative interception rate
$p_{A,st(y-1)}$. Bayesian logistic regression models were fitted using rstanarm (Gabry and Goodrich, 2016).
We used Student-t priors for all coefficients, setting the scale for the intercept prior at 10 and for all
other coefficients at 2.5.
To be more specific, M4 tests whether historical regulated pest interception rates can be used to
predict future regulated pest interception probability, whilst M5 and M6 test the effect of historical non-regulated pest and administrative interception rates, respectively, on the probability of future regulated pest interception.
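To make the cross-classified structure concrete, the following sketch evaluates the M3 linear predictor for one supplier, tariff and year and maps it to a probability with the inverse logit. The dictionary layout, the zero default for levels with no estimated effect, and plugging in point estimates are all assumptions for illustration; in a Bayesian fit one would normally average the probability over posterior draws rather than transform posterior means, since the two generally differ.

```python
import math


def inverse_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def m3_probability(effects, supplier, tariff, year):
    """Probability p_sty implied by the M3 linear predictor
    beta0 + gamma_s + tau_t + alpha_y + phi_st + kappa_sy.

    `effects` holds point values in dicts keyed by level (hypothetical layout);
    a missing level contributes 0, the prior mean of a random effect.
    """
    eta = (effects["beta0"]
           + effects["gamma"].get(supplier, 0.0)
           + effects["tau"].get(tariff, 0.0)
           + effects["alpha"].get(year, 0.0)
           + effects["phi"].get((supplier, tariff), 0.0)
           + effects["kappa"].get((supplier, year), 0.0))
    return inverse_logit(eta)
```

The supplier-tariff term `phi` and supplier-year term `kappa` are what allow the same supplier to carry different risk on different tariffs and in different years.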
Comparison of the statistical profiling results was made via a combination of LOOIC comparisons
(Vehtari et al., 2016) and predictive log-likelihood under repeated five-fold cross-validation. LOOIC is similar
to AIC in that it estimates out-of-sample prediction accuracy; however, LOOIC integrates over uncertainty
in the parameters, and does not rely on the multivariate normality that AIC assumes. We used 20 repeats,
resulting in 100 training/testing datasets for comparison. To ensure balance across the datasets, sampling
was performed within years. All models from Section 2.4 were fitted to each training dataset, and predictions
made on the corresponding testing dataset.
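The resampling scheme described above (five folds, 20 repeats, sampling within years) can be sketched as follows. Dealing each year's shuffled rows round-robin into folds is our assumption about how "sampling was performed within years"; it guarantees that every fold contains roughly the same share of each year as the full data.

```python
import random
from collections import defaultdict


def repeated_kfold_within_years(years, k=5, repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation,
    assigning folds separately within each year.

    `years` gives the year of each row; k=5 and repeats=20 yield 100 splits.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for i, y in enumerate(years):
        by_year[y].append(i)
    for _ in range(repeats):
        fold_of = {}
        for y in sorted(by_year):
            idx = by_year[y][:]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                fold_of[i] = pos % k  # deal this year's rows round-robin into folds
        for f in range(k):
            test = sorted(i for i, g in fold_of.items() if g == f)
            train = sorted(i for i, g in fold_of.items() if g != f)
            yield train, test
```

Within each repeat the five test sets partition the rows, so every row is predicted exactly once per repeat.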
3 Results
We present the results in three sections: the association between low-severity and high-severity interceptions;
the operational AUC tests and the statistical modelling results. We finish this section with an in-depth look
at the information gained from the modelling procedures.
3.1 Association between low-severity and high-severity interceptions
Figure 1 shows the log-odds ratios, with 95% confidence intervals, for the association between regulated
pest (high-risk) interceptions and non-regulated pest and administrative (low-risk) interceptions, both overall
and by year. All estimates, and the lower bounds of all confidence intervals, are well above 0, indicating a
strong positive association between low- and high-risk interceptions.
[Figure 1 plot: log-odds (y-axis) by level of aggregation (2007, 2008, 2009, 2010, 2011, overall) for Administrative and Non-regulated pest interceptions]
Figure 1: Estimates (95% confidence intervals) of the log-odds ratios between low- and high-risk interceptions
overall and by year (2007–2011). The log-odds ratios are calculated between regulated pest (high-risk) interceptions, and non-regulated pest and administrative (low-risk) interceptions.
3.2 Comparison of profiles using annual inspection data
3.2.1 Across tariffs
Figure 2 presents ROC curves that compare how well the different profiles perform. As described in Section 2.3,
the profiles are generated from the previous year’s interception rates. All profiling approaches are substantially
better than random, and the administrative profile performed worst in every year. Figure 2 also shows the
performance of a combined profile; this is simply the profile using interception rates calculated from a variable
indicating whether any of the interception types occurred. The combined interception profile does not improve
on the regulated-pest profile.
The profiles derived from non-regulated pest and administrative interception rates were consistently worse
than those based on regulated pest interception rates: only slightly so for the non-regulated pest profile, but
markedly so for the administrative profile. Table 1 presents the AUC values for each of the curves in Figure 2.
The values are consistently high, ranging from 0.90 to 0.94 for the regulated pest profile, which suggests that
the relative interception rates are very stable from year to year, and that the interception rates for each year y
are a very good indicator for year y + 1. We also derived profiles using data without empirical Bayes smoothing
(see Section 2.1); however, these profiles underperformed compared with the profiles using empirical Bayes
smoothing. The results without empirical Bayes smoothing are shown in Appendix B, Table B1.
[Figure 2 plot: ROC curves, true positives (y-axis) against false positives (x-axis), in panels for 2008, 2009, 2010 and 2011; profiles: Regulated pest, Non-regulated pest, Administrative and Combined]
Figure 2: ROC curves showing the performance of four profiling strategies in each of four years (2008–2011),
using data from 2007–2011. The profiles are constructed by tariff and importer using the previous year’s
inspection data. A line is added at x = y to facilitate comparison.
3.2.2 Within tariffs
As noted in Section 2.3, we suspected that tariff-to-tariff variation would dominate the ROC signal, a
suspicion supported by the modelling (Table 2). Figure 3 plots the within-tariff AUCs arising from the
regulated pest profile against those arising from the non-regulated pest, administrative and combined
profiles, respectively. Each point represents an ROC curve applied to a single tariff, where the entities
being profiled within the tariff are the suppliers. The size of each point indicates the number of regulated
pest interceptions in the tariff, giving a sense of that tariff’s importance.
A relationship between the number of regulated pest interceptions and AUC is not apparent in Figure 3;
if there were one, we would expect larger points in the top right corner. However, within-tariff variation
is considerable, which tempers the strong performance of the across-tariff comparisons shown in Table 1.
This suggests that any profiling undertaken would need to take account of the tariff being profiled.

Table 1: Summary of AUC values for profiling strategies, by year. The profiles are as follows: Regulated pest refers
to using the previous year’s regulated pest interception rate; Non-regulated pest refers to using the previous year’s
non-regulated pest interception rate; Administrative refers to using the previous year’s administrative interception
rate; and Combined refers to using the previous year’s combined interception rate. Each AUC is computed using
the data from the following year’s inspections.

Profile               2008    2009    2010    2011
Regulated pest        0.902   0.929   0.935   0.939
Non-regulated pest    0.870   0.909   0.892   0.920
Administrative        0.743   0.815   0.803   0.813
Combined              0.859   0.896   0.890   0.905
Correlation between the regulated pest profile and the other profiling strategies appears strong, especially
for the non-regulated pest and combined profiles. The administrative profile results, however, show that
many of the regulated pest AUCs lie above the y = x line. This suggests that the administrative profiles are
likely to perform worse than the non-regulated pest profiles. This observation is supported by the findings
of the statistical modelling (Table 2), where the model that included non-regulated pest interception rates
performed better than the model including administrative interception rates.
[Figure 3 plot: within-tariff AUCs in panels by year (2008, 2009, 2010, 2011); y-axis: AUC (Regulated pest profile); x-axis: AUC for the Non-regulated pest, Administration and Combined profiles; point size: number of fails (100 to 400)]
Figure 3: AUCs computed for each tariff code in the data to assess within-tariff profiling operationally, by year.
The y-axis is the AUC using the previous year’s regulated pest interception rate to set the profile. The x-axis in
each panel is the AUC using the previous year’s non-regulated pest, administrative, or combined interception
rate to set the profile, assessed on the same inspection data as are used for the y-axis. The size of each point is
related to the number of fails within the profile, and a line has been added at x = y to facilitate comparison.
3.3 Comparison of profiles using statistical modelling
Comparison of the models from Section 2.4 is reported in Table 2. Model M3 has the lowest LOOIC, and
the difference in LOOIC between M3 and the next-best model, M4 (98.3), is much larger than the standard
error of that difference (17.0). These results show that supplier and tariff information are important for
predicting regulated pest interception probability. The models are greatly improved by the addition of
interaction terms between suppliers and tariffs, and between suppliers and years. After allowing for the
effects of suppliers, tariffs and years, adding the previous year’s regulated pest interception rate (M4 vs M3),
non-regulated pest interception rate (M5 vs M3) or administrative interception rate (M6 vs M3) does not
improve the model.
Table 2: LOOIC-based comparison of statistical profiling models. The model with the smallest LOOIC (M3)
is shown first, with subsequent rows ordered by increasing LOOIC. ∆LOOIC shows the difference in LOOIC
between each model and model M3; se(∆LOOIC) shows the estimated standard error of the difference. Eff. P
gives the estimated effective number of parameters; se(Eff. P) shows its standard error.

Model   LOOIC   se(LOOIC)   Eff. P   se(Eff. P)   ∆LOOIC   se(∆LOOIC)
M3      2788    111         458      23.3         -        -
M4      2886    117         470      25.0         98.3     17.0
M2      2980    127         419      23.6         192.2    29.4
M1      3093    133         377      22.5         304.9    36.6
M5      3141    146         449      26.6         353.7    49.6
M6      3154    145         446      26.9         366.2    49.6
Base    3250    148         392      23.7         462.7    52.3
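The ∆LOOIC and se(∆LOOIC) columns can be understood through the pointwise leave-one-out values: LOOIC is −2 times the sum of the pointwise expected log predictive densities, and the standard error of a difference between two models is taken from the spread of the pointwise differences, $\sqrt{n \cdot \mathrm{var}(d_i)}$ on the elpd scale. The sketch below mirrors that calculation, as done in the loo R package that accompanies Vehtari et al. (2016); it is an illustration rather than the code used for Table 2, and in practice the pointwise values would come from the fitted models.

```python
import math
from statistics import variance


def compare_looic(elpd_a, elpd_b):
    """LOOIC of model B minus LOOIC of model A, and its standard error,
    from pointwise leave-one-out expected log predictive densities.

    LOOIC = -2 * sum(elpd_i), so the difference is 2 * sum(elpd_a_i - elpd_b_i);
    its standard error is 2 * sqrt(n * var(d_i)) with sample variance.
    A positive difference means model B has the larger (worse) LOOIC.
    """
    if len(elpd_a) != len(elpd_b) or len(elpd_a) < 2:
        raise ValueError("need matching pointwise values for at least 2 points")
    diffs = [a - b for a, b in zip(elpd_a, elpd_b)]
    n = len(diffs)
    delta_looic = 2 * sum(diffs)
    se = 2 * math.sqrt(n * variance(diffs))
    return delta_looic, se
```

Read this way, the M4 row of Table 2 says that M4's LOOIC exceeds M3's by about 5.8 standard errors (98.3 / 17.0), which is why the difference is treated as clear.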
Figure 4 shows the out-of-sample mean log predictive density and AUCs (Section 2.3) for all statistical
profiling methods; Figure 4b also shows the AUCs from the regulated pest profile. Models Base to M3
perform best in terms of predictive log-likelihood (larger values are better), with no clear demarcation
between them. On AUC, Models M1 and M2, as well as the Base model and the empirical Bayes profile,
perform best.
3.4 Model examination
In this section we investigate the effects from Model M3. We chose Model M3 for further investigation
because of its superior LOOIC performance (Table 2), and because its interaction terms allow within-supplier
examination. Figure 5 shows the marginal log-odds for tariffs from Model M3, ordered left-to-right by
decreasing posterior probability of their marginal log-odds being greater than 0; bars in the figure show 90%
posterior credible intervals. The inset shows the top 10 tariffs and, as expected, kiwi fruit is the tariff that
contributes the highest risk.
Figure 6 shows the marginal log-odds for suppliers from the model, ordered left-to-right by decreasing
posterior probability of their marginal log-odds being greater than 0; bars in the figure show 90% posterior
credible intervals. Supplier labels have been masked for privacy reasons. The suppliers to the left of the
figure are those predicted to have a large increase in probability of regulated pest interception, all else being
equal. It is these suppliers that would naturally be the first targets in an operational capacity.
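The ordering used in Figures 5 and 6 can be reproduced from posterior draws of each entity's marginal log-odds: estimate the posterior probability that the effect exceeds 0 as the share of positive draws, take the 5th and 95th percentiles as the 90% credible interval, and sort. The sketch below assumes the draws are already available as plain lists keyed by (masked) entity ID; extracting them from a fitted rstanarm model is not shown.

```python
from statistics import quantiles


def rank_by_posterior_positive(draws):
    """Rank entities by the posterior probability that their effect exceeds 0.

    `draws` maps each entity (e.g. a masked supplier ID) to a list of
    posterior draws of its marginal log-odds. Returns tuples of
    (entity, Pr(effect > 0), 5th percentile, 95th percentile), i.e. a 90%
    central credible interval, sorted by decreasing probability.
    """
    summary = []
    for entity, values in draws.items():
        prob_positive = sum(v > 0 for v in values) / len(values)
        cuts = quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
        summary.append((entity, prob_positive, cuts[0], cuts[-1]))
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary
```

Ranking by Pr(effect > 0) rather than by the posterior mean favours entities whose elevated risk is well supported by data over entities with large but highly uncertain effects.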
Figure 7 provides a closer examination of the risky suppliers. We selected the top 25 suppliers (by the
posterior probability of their marginal log-odds being greater than 0) and calculated, from Model M3, their
posterior probability of a regulated pest being present in a consignment, averaged over all years. The left
panel shows this posterior probability for each tariff that the supplier imports, whilst the right panel shows
the observed proportion (averaged over years) of consignments with regulated pest interceptions, by tariff.
The figure shows that these highest-risk suppliers import a range of tariffs; that is, their poor performance is
not necessarily due to importing one or two of the highest-risk tariffs (as shown in Figure 5). Further, in the
tariffs they do import, they have consistently high levels of consignments with regulated pest contamination
(right panel, Figure 7).
[Figure 4 plot: (a) Mean log predictive density by model (Base, M1 to M6); (b) AUCs by model (Base, M1 to M6, E. Bayes)]
Figure 4: Mean log predictive density and AUCs for statistical profiling. Also shown are the AUCs from the
regulated pest profile (E. Bayes).
[Figure 5 plot: marginal log-odds (y-axis) by tariff (x-axis), with an inset showing the top 10 tariffs]
Figure 5: Marginal log-odds of the tariff effect in Model M3. Tariffs are ordered left-to-right by decreasing
probability of their marginal log-odds being greater than 0, with bars showing 90% posterior credible intervals.
The inset shows the top 10 tariffs.
[Figure 6 plot: marginal log-odds (y-axis) by supplier (x-axis)]
Figure 6: Marginal log-odds of the supplier effect in Model M3. Suppliers are ordered left-to-right by decreasing
posterior probability of their marginal log-odds being greater than 0, with bars showing 90% posterior credible
intervals. The inset shows the top 10 tariffs.
[Figure 7 panels, one sub-panel per supplier labelled by anonymised supplier ID: (a) Posterior probability of a regulated pest interception (y-axis: posterior probability of regulated pest in consignment; x-axis: tariff); (b) Observed proportion of consignments with regulated pests (y-axis: proportion of consignments with regulated pests; x-axis: tariff).]
Figure 7: Posterior probability of a regulated pest interception in the top 25 suppliers, along with the observed proportion of consignments with regulated pests. Panels are ordered left-to-right, top-to-bottom by the
probability of their marginal log-odds being greater than 0; bars in the left panel show 90% posterior credible
intervals.
Figure 8 provides a closer examination of suppliers who pose minimal risk. We selected the bottom 25 suppliers
(ranked by the probability of their marginal log-odds being greater than 0) and calculated, from Model M3, their
posterior probability of a regulated pest being present in a consignment, averaged over all years. The left panel
shows this posterior probability for each tariff that the supplier imports, whilst the right panel shows, by tariff,
the observed proportion (averaged over years) of consignments that did not contain regulated pests. As in Figure 7,
these suppliers import a range of tariffs, i.e. their good performance is not simply due to importing lower-risk
tariffs. However, in contrast to the risky suppliers, they have consistently high proportions of consignments free
of regulated pest contamination in the tariffs they do import (right panel, Figure 8).
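The year-averaged posterior probabilities summarised in Figures 7 and 8 can be sketched in a few lines. This is a minimal illustration only, assuming posterior draws of the log-odds of a regulated pest for one supplier-tariff combination are already available for each year (in the paper these would come from Model M3, whose exact structure is not reproduced here); the function names, input layout and toy draws are hypothetical. Each draw is mapped to a probability, averaged over years, and the resulting distribution is summarised by its mean and a 90% equal-tailed credible interval, analogous to the bars in the left panels.

```python
import math
import statistics
from typing import Dict, List


def inv_logit(eta: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def summarise_cell(draws_by_year: List[List[float]], level: float = 0.90) -> Dict[str, float]:
    """Year-averaged posterior probability of a regulated pest for one cell.

    draws_by_year[d][y] holds the log-odds for posterior draw d in year y
    (hypothetical layout). Returns the posterior mean and the bounds of an
    equal-tailed credible interval at the requested level.
    """
    # One year-averaged probability per posterior draw, sorted for quantiles.
    averaged = sorted(
        statistics.fmean(inv_logit(eta) for eta in draw) for draw in draws_by_year
    )
    last = len(averaged) - 1
    tail = (1.0 - level) / 2.0
    # Empirical quantiles by index, without interpolation.
    lower = averaged[math.floor(tail * last)]
    upper = averaged[math.ceil((1.0 - tail) * last)]
    return {"mean": statistics.fmean(averaged), "lower": lower, "upper": upper}


# Toy example: 200 hypothetical draws for a cell observed in 3 years.
toy_draws = [[-2.0 + 0.01 * d, -1.5 + 0.01 * d, -1.0 + 0.01 * d] for d in range(200)]
print(summarise_cell(toy_draws))
```

Averaging on the probability scale within each draw, rather than averaging log-odds, keeps the summary on the same scale as the plotted quantity.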
4 Discussion and Conclusion
There was a strong association between regulated pest interceptions and the lower-risk administrative and
non-regulated pest interceptions (Section 3.1). This association was also observed when administrative
interceptions were used as a predictor for operational profiling (Figure 2), demonstrating the utility of the
operational profiling approaches. However, this association did not carry over to the statistical models
(Section 3.3): including historical rates as predictor variables did not improve the model fits.
[Figure 8 panels, one sub-panel per supplier labelled by anonymised supplier ID: (a) Posterior probability of a regulated pest interception (y-axis: posterior probability of regulated pest in consignment; x-axis: tariff); (b) Observed proportion of consignments without regulated pests (y-axis: proportion of consignments without regulated pests; x-axis: tariff).]
Figure 8: Posterior probability of a regulated pest interception in the bottom 25 suppliers, along with the
observed proportion of consignments without regulated pests. Panels are ordered left-to-right, top-to-bottom
by the probability of their marginal log-odds being less than 0; bars in the left panel show 90% posterior credible
intervals.
The statistical profiles still performed well on both the cross-validated AUCs and the mean log predictive
density (Figure 4). Thus, in answer to our motivating question of which data provide the most useful information
about the pathway, we conclude that knowledge of particular suppliers, tariffs and their combination is the most
informative. The previous year's regulated pest profile performed well on AUC; however, adding it to the
statistical profiles gave no benefit (Table 2). Furthermore, such a profile offers limited scope for investigating
why a particular supplier may be problematic. Statistical profiling, in comparison, allows decisions to be based
on posterior probabilities. For example, we could calculate a supplier's (marginal) probability of having a
regulated pest interception. Intervention could then be planned either for the top-ranked suppliers (if funds are
limited) or for all suppliers that meet a threshold.
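The mean log predictive density used to compare the statistical profiles averages, over consignments, the log of each consignment's posterior-averaged likelihood. A minimal sketch for a binary interception outcome follows, assuming held-out predicted probabilities from each posterior draw are already available; the function names and data layout are hypothetical, and the sketch does not reproduce the paper's cross-validation scheme or the approximations discussed by Vehtari et al. (2016).

```python
import math
from typing import List


def log_mean_exp(values: List[float]) -> float:
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def mean_log_predictive_density(y: List[int], p_draws: List[List[float]]) -> float:
    """Mean log predictive density for binary outcomes.

    y[i] is 1 if held-out consignment i had a regulated pest interception,
    else 0; p_draws[i][s] is its predicted probability under posterior draw s
    (hypothetical layout). Probabilities must lie strictly between 0 and 1.
    """
    total = 0.0
    for yi, probs in zip(y, p_draws):
        # Bernoulli log-likelihood of the observed outcome under each draw.
        loglik = [math.log(p) if yi == 1 else math.log1p(-p) for p in probs]
        # Log of the posterior-averaged likelihood for this consignment.
        total += log_mean_exp(loglik)
    return total / len(y)


# Two hypothetical held-out consignments, three posterior draws each.
print(mean_log_predictive_density([1, 0], [[0.6, 0.7, 0.8], [0.1, 0.2, 0.1]]))
```

Values closer to zero indicate better out-of-sample predictions. The likelihood is averaged over draws before taking the log, which is what distinguishes this quantity from the mean of per-draw log-likelihoods.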
A benefit of using a statistical modelling approach to profiling is the additional interrogation that the fitted
model makes possible. In Section 3.4 we demonstrated how this gives a clearer insight into the governance of this
pathway. Firstly, by studying the marginal effects of suppliers and tariffs, we can build up a picture of risk
without relying on observed rates, which are noisy due to sampling and process error. We can pinpoint which tariffs
and suppliers contribute to excessive risk, essentially by ordering them by their marginal log-odds, and then choose
to investigate those whose posterior probability of a positive marginal log-odds exceeds a pre-defined cutoff set
by management.
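The ordering and cutoff rule described above can be sketched as follows, assuming posterior draws of each supplier's (or tariff's) marginal log-odds have already been extracted from the fitted model. The IDs, draws and function names are hypothetical; `top_k` stands in for a fixed inspection budget and `cutoff` for a management threshold on the posterior probability that the marginal log-odds exceed 0.

```python
from typing import Dict, List, Optional, Tuple


def prob_positive(draws: List[float]) -> float:
    """Posterior probability that a marginal log-odds effect exceeds 0."""
    return sum(1 for d in draws if d > 0) / len(draws)


def rank_effects(effect_draws: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Order IDs by decreasing posterior probability of a positive effect."""
    scored = [(name, prob_positive(draws)) for name, draws in effect_draws.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_for_intervention(
    ranked: List[Tuple[str, float]],
    top_k: Optional[int] = None,
    cutoff: Optional[float] = None,
) -> List[str]:
    """Choose the top_k IDs (fixed budget) or every ID above a cutoff."""
    if top_k is not None:
        return [name for name, _ in ranked[:top_k]]
    if cutoff is not None:
        return [name for name, prob in ranked if prob > cutoff]
    raise ValueError("supply either top_k or cutoff")


# Hypothetical posterior draws of marginal log-odds for three suppliers.
draws = {
    "supplier_a": [-1.0, -0.5, 0.2, 0.3],
    "supplier_b": [0.1, 0.2, 0.3, 0.4],
    "supplier_c": [-1.0, -2.0, -3.0, 0.5],
}
ranked = rank_effects(draws)
print(ranked)  # [('supplier_b', 1.0), ('supplier_a', 0.5), ('supplier_c', 0.25)]
print(select_for_intervention(ranked, top_k=1))  # ['supplier_b']
print(select_for_intervention(ranked, cutoff=0.4))  # ['supplier_b', 'supplier_a']
```

Ranking on a posterior probability, rather than on a point estimate of the log-odds, penalises suppliers whose apparent risk rests on very few inspections.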
With a list of potentially risky suppliers, we further demonstrated how a manager could gain insight into the
governance of those suppliers by examining the posterior predicted probabilities of regulated pest interceptions
(Figure 7). This information could be used as a first step in examining why a particular supplier may be having
trouble with contaminated consignments, and to help that supplier improve its processes. Likewise, looking at the
less risky suppliers (Figure 8) may reveal good practices that could be shared with the riskier suppliers.
To summarise, statistical models constructed using inspection data provide sufficient information for
profiling purposes, and can be used to further interrogate the governance of multiple pathways and to help
identify the processes underlying poor performance on these pathways.
5 Acknowledgments
The authors are grateful to Lindsay Penrose (ABARES, Australian federal Department of Agriculture and
Water Resources) for statistical analysis on an early draft not reported here and Brendan Woolcott (Plant
Division, Australian federal Department of Agriculture and Water Resources) for providing the data for the
analysis.
References
Arthur, T., Zhao, S., Robinson, A., Woolcott, B., Perotti, E., and Aston, C. (2013). Statistical Modelling and
Risk Return Improvements for the Plant Quarantine Pathway. Technical Report 1206F 1, Australian Centre
of Excellence for Risk Analysis.
Beale, R., Fairbrother, J., Inglis, A., and Trebeck, D. (2008). One Biosecurity: a Working Partnership. Commonwealth of Australia.
Department of Immigration and Border Protection (2016). Customs Tariff Act 1995.
Dodge, H. (1943). A sampling inspection plan for continuous production. The Annals of Mathematical Statistics,
14(3):264–279.
Dodge, H. F. and Torrey, M. N. (1951). Additional continuous sampling plans. Industrial Quality Control,
7(5):7–12.
Gabry, J. and Goodrich, B. (2016). rstanarm: Bayesian Applied Regression Modeling via Stan. R package version
2.12.1.
Pimentel, D. (2011). Biological Invasions: Economic and Environmental Costs of Alien Plant, Animal, and Microbe
Species. CRC Press, Hoboken, second edition.
R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria.
Robinson, A., Bell, J., Woolcott, B., and Perotti, E. (2012). DAFF Biosecurity: Plant-Product Pathways.
Technical Report 1001J 1, Australian Centre of Excellence for Risk Analysis.
Robinson, A., Woolcott, B., Holmes, P., Dawes, A., Sibley, J., Porter, L., and Kirkham, J. (2013). Plant Quarantine Inspection and Auditing across the Biosecurity Continuum. Technical Report 1101C 1, Australian
Centre of Excellence for Risk Analysis.
Robinson, A. P., Chisholm, M., Mudford, R., and Maillardet, R. (2015). Ad hoc solutions to estimating
pathway non-compliance rates using imperfect and incomplete information. In Jarrad, F., Low-Choy, S.,
and Mengersen, K., editors, Biosecurity Surveillance: Quantitative Approaches, CABI invasive series, pages
167–180. CABI, Boston.
Sinden, J., Jones, R., Hester, S., Odom, D., Kalisch, C., James, R., Cacho, O., and Griffith, G. (2005). The
Economic Impact of Weeds in Australia. Plant Protection Quarterly, 20(1):25–32.
Vehtari, A., Gelman, A., and Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out
cross-validation and WAIC. Statistics and Computing, pages 1–20.
Appendix A Calculation of rates using empirical Bayes
In this appendix, we detail the procedure used for calculating the smoothed rates in Section 2.1 via empirical
Bayes. Recall that we have $X_{sty}$, the number of failures out of $n_{sty}$ inspections from tariff $t$ performed
in year $y$ from supplier $s$; we assume that $X_{sty} \sim \mathrm{Binomial}(p_{sty}, n_{sty})$.
To find the empirical Bayes estimate of $p_{sty}$ for supplier $s$, in tariff $t$ and year $y$, assume that the
binomial proportions $p_{sty}$ have a prior Beta distribution: $p_{sty} \sim \mathrm{Beta}(\alpha_{ty}, \beta_{ty})$.
Then $X_{sty}$ has a Beta-binomial distribution, with probability mass function
$$\Pr(X_{sty} = k) = \binom{n_{sty}}{k} \frac{\Gamma(k + \alpha_{ty})\,\Gamma(n_{sty} - k + \beta_{ty})}{\Gamma(n_{sty} + \alpha_{ty} + \beta_{ty})} \, \frac{\Gamma(\alpha_{ty} + \beta_{ty})}{\Gamma(\alpha_{ty})\,\Gamma(\beta_{ty})}, \qquad \binom{n_{sty}}{k} = \frac{\Gamma(n_{sty} + 1)}{\Gamma(k + 1)\,\Gamma(n_{sty} - k + 1)}.$$
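Evaluating this mass function on the log scale with Python's `math.lgamma` avoids overflow in the Gamma functions when $n_{sty}$ is large. The sketch below (function names hypothetical) mirrors the expression above term by term; with $\alpha_{ty} = \beta_{ty} = 1$ it reduces to the discrete uniform distribution on $0, \dots, n_{sty}$, which gives a quick check.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Log of the binomial coefficient, written with Gamma functions."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log Pr(X = k) when X is Beta-binomial with n trials and a Beta(alpha, beta) prior."""
    return (
        log_choose(n, k)
        + math.lgamma(k + alpha)
        + math.lgamma(n - k + beta)
        - math.lgamma(n + alpha + beta)
        + math.lgamma(alpha + beta)
        - math.lgamma(alpha)
        - math.lgamma(beta)
    )


# The probabilities over k = 0, ..., n should sum to 1.
print(sum(math.exp(beta_binomial_logpmf(k, 20, 0.5, 8.0)) for k in range(21)))
```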
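The parameters $\alpha_{ty}$ and $\beta_{ty}$ are found using maximum likelihood:
$$\left(\hat{\alpha}_{ty}, \hat{\beta}_{ty}\right) = \arg\max_{\alpha_{ty},\, \beta_{ty}} \left\{ \sum_{s=1}^{S} \log \Pr\left(X_{sty} = x_{sty}\right) \right\}$$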
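The maximisation has no closed form. The standard-library-only sketch below estimates $(\hat\alpha_{ty}, \hat\beta_{ty})$ for a single tariff-year by minimising the negative log-likelihood over $(\log\alpha_{ty}, \log\beta_{ty})$ with a small hand-written Nelder-Mead search; the optimiser, function names and counts are assumptions for illustration, not necessarily what was used in the analysis. A finite maximiser need not exist: if no supplier records a failure, or the counts show no more variation than a binomial model implies, the estimates can drift towards the boundary, so a practical implementation would need to detect and handle those cases.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beta_binomial_logpmf(k: int, n: int, a: float, b: float) -> float:
    """log Pr(X = k) for a Beta-binomial(n, a, b) variable."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def neg_loglik(theta: Sequence[float], xs: List[int], ns: List[int]) -> float:
    """Negative log-likelihood over suppliers; theta = (log a, log b) keeps a, b > 0."""
    a, b = math.exp(theta[0]), math.exp(theta[1])
    return -sum(beta_binomial_logpmf(x, n, a, b) for x, n in zip(xs, ns))


def nelder_mead(f: Callable[[List[float]], float], x0: List[float],
                step: float = 0.5, tol: float = 1e-10, max_iter: int = 5000) -> List[float]:
    """Minimal two-parameter Nelder-Mead minimiser (inside contraction only)."""
    simplex = [list(x0), [x0[0] + step, x0[1]], [x0[0], x0[1] + step]]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, mid, worst = simplex
        if f(worst) - f(best) < tol:
            break
        centroid = [(best[i] + mid[i]) / 2 for i in range(2)]
        refl = [2 * centroid[i] - worst[i] for i in range(2)]
        if f(refl) < f(best):
            expd = [3 * centroid[i] - 2 * worst[i] for i in range(2)]
            simplex[2] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(mid):
            simplex[2] = refl
        else:
            contr = [(centroid[i] + worst[i]) / 2 for i in range(2)]
            if f(contr) < f(worst):
                simplex[2] = contr
            else:  # shrink the other vertices towards the best one
                simplex = [best] + [[(best[i] + v[i]) / 2 for i in range(2)] for v in (mid, worst)]
    return min(simplex, key=f)


def fit_beta_prior(xs: List[int], ns: List[int]) -> Tuple[float, float]:
    """Maximum-likelihood (alpha, beta) for one tariff-year, pooled over suppliers."""
    theta = nelder_mead(lambda t: neg_loglik(t, xs, ns), [0.0, 0.0])
    return math.exp(theta[0]), math.exp(theta[1])


# Hypothetical failures x out of n inspections for eight suppliers.
xs, ns = [0, 0, 1, 5, 9, 2, 0, 7], [10] * 8
print(fit_beta_prior(xs, ns))
```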
where $x_{sty}$ is the observed value of $X_{sty}$, and $S$ is the number of suppliers. To complete the calculation,
the rate for supplier $s$ in tariff $t$ and year $y$ is updated using the following formula:
$$\tilde{p}_{sty} = \frac{x_{sty} + \hat{\alpha}_{ty}}{n_{sty} + \hat{\alpha}_{ty} + \hat{\beta}_{ty}}.$$
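Because the Beta prior is conjugate to the binomial likelihood, $\tilde p_{sty}$ is the posterior mean of $p_{sty}$: a weighted compromise between the raw rate $x_{sty}/n_{sty}$, with weight $n_{sty}/(n_{sty}+\hat\alpha_{ty}+\hat\beta_{ty})$, and the prior mean $\hat\alpha_{ty}/(\hat\alpha_{ty}+\hat\beta_{ty})$. The short sketch below uses hypothetical prior values and counts to show how sparsely inspected suppliers are pulled towards the prior mean.

```python
def smoothed_rate(x: int, n: int, alpha_hat: float, beta_hat: float) -> float:
    """Empirical Bayes (posterior mean) failure rate for one supplier-tariff-year."""
    return (x + alpha_hat) / (n + alpha_hat + beta_hat)


# Hypothetical fitted prior with mean 1 / (1 + 19) = 0.05.
alpha_hat, beta_hat = 1.0, 19.0
for x, n in [(0, 1), (1, 1), (0, 200), (10, 200)]:
    raw = x / n
    print(f"{x}/{n}: raw {raw:.3f}, smoothed {smoothed_rate(x, n, alpha_hat, beta_hat):.3f}")
```

A supplier with a single inspection is pulled strongly towards the prior mean of 0.05 whatever its outcome, whereas a supplier with 200 inspections keeps a rate close to its observed one.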
Appendix B Further results
Table B1: Summary of AUC values for profiling strategies, by year. The profiles are as follows: Tariff and
Supplier refers to profiles constructed for the interaction of tariff and supplier, Supplier within Tariff refers to
averaging the supplier interception rates within tariffs, and Supplier within Tariff, Smoothed refers to using
the empirical Bayes estimate of the suppliers within tariffs and years. Regulated pest refers to using the previous
year's regulated pest interception rate; Non-regulated pest refers to using the previous year's non-regulated pest
interception rate; Administrative refers to using the previous year's administrative interception rate; and Combined
refers to using the previous year's combined interception rate. Each AUC is computed using the data from the
following year's inspections.
Profile                                2008    2009    2010    2011
Tariff and Supplier
    Regulated pest                     0.881   0.899   0.902   0.917
    Non-regulated pest                 0.849   0.878   0.859   0.890
    Administrative                     0.728   0.767   0.759   0.776
    Combined                           0.833   0.861   0.854   0.867
Supplier within Tariff
    Regulated pest                     0.861   0.897   0.903   0.906
    Non-regulated pest                 0.839   0.880   0.864   0.907
    Administrative                     0.795   0.841   0.824   0.842
    Combined                           0.836   0.881   0.871   0.901
Supplier within Tariff, Smoothed
    Regulated pest                     0.902   0.929   0.935   0.939
    Non-regulated pest                 0.870   0.909   0.892   0.920
    Administrative                     0.743   0.815   0.803   0.813
    Combined                           0.859   0.896   0.890   0.905
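Each AUC in Table B1 scores the following year's inspections by a profile's previous-year rate. A minimal, dependency-free sketch of that calculation uses the equivalence between the AUC and the Mann-Whitney rank-sum statistic; the function name and toy data are hypothetical, and how consignments from profiles without a previous-year rate were handled is not specified here.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic.

    scores: risk score per consignment (e.g. the previous year's interception
    rate for its profile); labels: 1 if a regulated pest was found, else 0.
    Tied scores receive their average rank, so ties count as half a win.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative consignment")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical scores from year y, outcomes from year y + 1.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank-based form runs in O(n log n), which matters when a year contains many thousands of inspected consignments; averaging ranks over ties matters here because many consignments share the same profile rate.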