When does poor governance presage biosecurity risk?

Stephen E. Lane (1), Tony Arthur (2), Christina Aston (2), Sam Zhao (2), and Andrew P. Robinson (1)

(1) Centre of Excellence for Biosecurity Risk Analysis, University of Melbourne, Parkville, Victoria 3010, Australia, lane.s@unimelb.edu.au
(2) Department of Agriculture and Water Resources, Canberra, Australian Capital Territory 2601, Australia

arXiv:1702.04052v1 [stat.AP] 14 Feb 2017

February 15, 2017

Abstract

Border inspection, and the challenge of deciding which of the tens of millions of consignments that arrive should be inspected, is a perennial problem for regulatory authorities. The objective of these inspections is to minimise the risk of contraband entering the country. As an example, for regulatory authorities in charge of biosecurity material, consignments of goods are classified before arrival according to their economic tariff number (Department of Immigration and Border Protection, 2016). This classification, perhaps along with other information, is used as a screening step to determine whether further biosecurity intervention, such as inspection, is necessary. Other information associated with consignments includes details such as the country of origin, supplier, and importer. The choice of which consignments to inspect has typically been informed by historical records of intercepted material. Fortunately for regulators, interception is a rare event; however, this sparsity undermines the utility of historical records for deciding which containers to inspect. In this paper we report on an analysis that uses more detailed information to inform inspection. Using quarantine biosecurity as a case study, we create statistical profiles using generalised linear mixed models and compare different model specifications with historical information alone, demonstrating the utility of a statistical modelling approach.
We also demonstrate some graphical model summaries that provide managers with insight into pathway governance.

1 Introduction

Efficient and effective border biosecurity strategies are essential for protecting ecosystems and economies from invasive pests. The annual cost of invasive species is estimated to be over USD 200bn in the United States (Pimentel, 2011), and at least USD 4bn in Australia (Sinden et al., 2005). In Australia, the Department of Agriculture and Water Resources (the department) is both the regulatory authority and the inspectorate for biosecurity protection, carrying out both pre-border and border intervention on a range of imported goods, based on the risk profile of the goods and international agreements. The objective of these interventions is to minimise the risk of biosecurity risk material (BRM) entering the country.

Here, we focus on border inspection and the challenge of deciding which of the tens of millions of consignments that arrive should be inspected. Before arrival, consignments of goods are classified according to their economic tariff number (Department of Immigration and Border Protection, 2016), and this classification is used, with other information, as a screening step to determine whether further biosecurity intervention, such as inspection, is necessary. Other information associated with consignments includes details such as the country of origin, supplier, and importer.

Border inspection for quarantine biosecurity is carried out for a number of reasons, namely (i) to verify the effectiveness of mandated pre-arrival treatments; (ii) to detect and intercept BRM; (iii) to provide information about the intrinsic contamination rate of the activity; and (iv) to deter potential malefactors. As noted above, tens of millions of consignments arrive every year, so the challenge is to determine which should be inspected.
We define a pathway as a collection of activities that culminates in the arrival to Australia of a set of alike consignments. The pathways are hierarchical, so we may consider a pathway of all consignments of a commodity, or all consignments of that commodity for a specific country, or even for a specific supplier. For example, the plant product pathway, which is the focus of this article, includes goods such as kiwi fruit and cashew nuts, which can themselves be considered pathways. Statistically, pathways can be thought of as processes.

Pathways can be classified as either high risk or low risk, based on the probability that a consignment contains BRM, called the approach rate. For example, in Australia, kiwi fruit is a high-risk plant product pathway, with an approach rate of 55.8%, whereas cashew nuts is a low-risk plant product pathway, with an approach rate of 1.3%. Importantly, the degree of severity of the detected BRM in cashews has been identified as very low. The risk severity classification is important because the department may apply different interventions to low-risk than to high-risk pathways, as discussed below.

The identification of pathways as high or low risk is called profiling, and is an essential step in the efficient management of biosecurity intervention. Traditionally, profiling has been applied by using records of interception of regulated pests on the pathway. This application is based on the assumption that future biosecurity compliance can be predicted by past biosecurity compliance, at least for some periods in the past and the future. However, interception of regulated pests is a rare event, which is good news from the point of view of biosecurity protection, but makes profiling more difficult, especially in sparse pathways, because reliable estimates of pathway risk are hard to obtain.
This observation motivates the following question: can future biosecurity compliance be predicted by other characteristics as well as by past biosecurity compliance?

Historically, all consignments of imported plant product pathways were subjected to mandatory inspection. As part of a comprehensive review of Australia’s biosecurity system, Beale et al. (2008) recommended establishing a science-based system for managing biosecurity issues, noting that zero risk is both unattainable and undesirable. With the full inspection strategy, pathways that have a lower approach rate require considerably more inspection effort for each interception of BRM. For example, 4623 consignments were inspected in the cashew pathway over four years, of which BRM was detected in 59, so the average number of inspections per detection (IPD) was about 78 (4623/59), compared to about 2 for the kiwi fruit pathway.

We now introduce the case study that motivates the research. The inspection work flow of imported plant product pathways comprises three components, namely: suppliers that export plant products, importers that import the products from suppliers, and border inspections that attempt to detect as much BRM as possible. Inspections at the border can be stratified by supplier or importer, that is, unique inspection regimes may be applied to individual importers or suppliers.

The department currently uses the continuous sampling plan (CSP) algorithm, specifically, CSP-3, to manage the biosecurity risk of low-risk pathways (Dodge, 1943; Dodge and Torrey, 1951). The CSP family of algorithms allocates intervention effort within pathways according to recent inspection history. The department has implemented CSP-3 for the inspection of a range of low-risk pathways, including dried apricots, green coffee beans, raisins, cashews and some nuts.
This particular approach to profiling has been shown to result in reductions of both leakage (how much BRM is missed in the inspection process) and IPD relative to random sampling plans (Robinson et al., 2012, 2013; Arthur et al., 2013).

A wrinkle in the application of the CSP algorithm is that although it is implicit that the analysis of inspection history would take account of only the kinds of contamination that are of specific regulatory interest, in fact, any aspect of the inspection history can be used as an indicator of future risk. That is, although the department may be specifically concerned about intercepting regulated pests, the inspection history provides a much richer view of the pathway because it includes information about other incidents, such as the interception of non-regulated pests, failures of documentation, and so on, which may arguably and testably be related to the chances of failure types that are of regulatory concern.

The question that motivated this study is: what data provide the most useful information about the pathway: the relatively sparse history of interception of regulated pests, or the more complete picture of the relative performance on the pathway, or some combination? Furthermore, can insight into the future performance in a given pathway be provided by information about historical performance in other, possibly related pathways?

This paper reports an analysis of the use of auxiliary information to try to improve profiling. The objective is to distinguish high-risk and low-risk pathways, where risk refers to the interception rate of regulated pests, based on a range of characteristics of the pathway, including the interception rate of regulated pests, non-regulated pests, administrative failures, and supplier and tariff information. We aim to form a picture of the governance of each pathway and use that picture as a basis for assessing its relative biosecurity risk.

The balance of the paper is organized as follows.
In the next section we introduce the dataset and the models used to test our conjectures. We then present the results and a discussion and conclusion.

2 Materials and methods

We used a number of analytical approaches to assess the conjecture. First, we tested the association between non-regulated pest and administrative (more frequent, low severity) failures and regulated pest inspection (less frequent, high severity) failures. If such an association was found, then we reasoned that historical governance-related variables may be used to predict future high-severity biosecurity failures. Second, we used historical failure rates to create profiles, and investigated performance using Receiver Operating Characteristic (ROC) curves. Lastly, we constructed statistical models that would predict future regulated pest interception probabilities as a function of previous regulated pest interception probabilities and other, governance-related predictor variables.

All data preparation and modelling were performed using R Version 3.3.0 (R Core Team, 2016) with the generalised additive mixed models of Section 2.4 using R package rstanarm (Gabry and Goodrich, 2016).

2.1 Data

The data for the analysis comprise the inspection history for all consignments classified as fruit (Chapter 8, Department of Immigration and Border Protection, 2016) that arrived between January 2007 and December 2011, a period of five years. The pathway is a complex one, comprising 80 different tariff codes, 3150 unique importers, and 3655 unique suppliers from 127 countries. For the purposes of this study we will assume that all significant biosecurity contamination has been captured by the regulatory border inspection. There were approximately 48300 inspections of more than 75000 goods. Approximately 5300 inspections resulted in interception of a regulated pest, 8500 inspections resulted in interception of a non-regulated pest, and 5900 inspections recorded some administrative failure.
For modelling (see Section 2.4), we aggregated the data by year, tariff and supplier. This aggregation was done for two reasons: first, it allowed us to create models that account for both supplier and tariff effects, and second, aggregating by year limits the effects of any seasonality. We use interceptions to refer to both interceptions of pests and administrative failures throughout the study. An appropriately formatted dataset for modelling was constructed as follows:

For each year y within 2008 to 2011:
• Compute interception/fail rates for year y − 1 by tariff, supplier, and year for:
  – Administrative interceptions
  – Non-regulated pest interceptions
  – Regulated pest interceptions

We denote by $X_{sty}$ the number of interceptions out of $n_{sty}$ inspections from tariff $t$ performed in year $y$ from supplier $s$. Correspondingly, each inspection has a probability $p_{sty}$ of being intercepted in one of the ways listed above. Then $X_{sty}$ was modelled as

$$X_{sty} \overset{d}{=} \mathrm{Binomial}(p_{sty}, n_{sty}).$$

Computing interception rates by tariff, supplier and year sometimes resulted in very small binomial denominators, due to the sparse history of inspection and interceptions produced. For this reason, rather than raw interception rates, we calculated smoothed interception rates using parametric empirical Bayes (Robinson et al., 2015). In particular, we used the Beta-binomial model to smooth interception rates for suppliers within tariffs and years; we provide the full details in Appendix A.

2.2 Association between low-severity and high-severity interceptions

To investigate the association between low-severity and high-severity interceptions, we calculated log-odds ratios and 95% confidence intervals (using a normal approximation for the log-odds) for the odds of a regulated-pest interception for consignments with or without non-regulated pest or administrative interceptions.
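As a minimal sketch of the log-odds ratio calculation, the snippet below works from a 2×2 table of inspection counts and uses the usual Wald (Woolf) standard error for the normal approximation. The choice of that particular standard error is an assumption, the function name `log_odds_ratio` is hypothetical, and the counts are made up for illustration; Python is used only for convenience, since the paper's own analysis was carried out in R.

```python
import math


def log_odds_ratio(a, b, c, d, z=1.96):
    """Log-odds ratio and normal-approximation interval from a 2x2 table.

    a: low-severity interception present, regulated pest found
    b: low-severity interception present, no regulated pest
    c: low-severity interception absent, regulated pest found
    d: low-severity interception absent, no regulated pest
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a / b) / (c / d))
    # Wald (Woolf) standard error of the log-odds ratio (assumed here)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se


# Hypothetical counts, not taken from the paper's data
est, lower, upper = log_odds_ratio(300, 700, 200, 3800)
print(f"log OR = {est:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval whose lower bound is above 0 indicates a positive association between the low-severity and high-severity outcomes.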
For each inspected consignment, suppose $Y$ denotes the outcome of inspection, so that $Y = 1$ indicates a regulated pest was intercepted. Further, let $X$ denote whether the consignment contained a non-regulated pest (or had an administrative failure), so that $X = 1$ indicates the consignment contains a non-regulated pest (or had an administrative failure). The log-odds ratio is

$$\log \mathrm{OR} = \log\left(\frac{\Pr(Y = 1 \mid X = 1)/\Pr(Y = 0 \mid X = 1)}{\Pr(Y = 1 \mid X = 0)/\Pr(Y = 0 \mid X = 0)}\right).$$

2.3 Profiling using annual inspection data

We created profiles using annual inspection data, and compared performance using ROC curves. For each year y in 2007 to 2010 and each kind of interception rate (regulated pest, non-regulated pest and administrative) we computed ROC curves against year y + 1 biosecurity inspection outcomes for regulated pests, and calculated the area under the curve (AUC).

We also computed ROC curves within tariff, because we suspected that tariff-to-tariff variation would dominate the ROC signal: interception rates differ more between tariffs than between the importers within a tariff. That is, if we only ran the profiles across the tariffs then a naive assessment of the performance would look very good, because we would expect the differences between the risks of the tariffs to be reasonably stable from year to year. Hence, assessing the model within tariffs provides a more reasonable assessment.
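The AUC used to score a profile can be illustrated with a short sketch. It treats the previous year's interception rate as a score and the following year's regulated pest outcome as a binary label, and computes the AUC as the proportion of (positive, negative) pairs that the score ranks correctly, counting ties as one half. The function name `auc` and the toy data are hypothetical, and this pairwise method is only one of several equivalent ways to obtain the area under an ROC curve; it is not necessarily how the paper computed it.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores:   profile scores, e.g. previous-year interception rates
    outcomes: 1 if a regulated pest was intercepted the next year, else 0
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): five suppliers within one tariff
prev_year_rate = [0.9, 0.8, 0.3, 0.1, 0.3]
next_year_fail = [1, 1, 0, 0, 1]
toy_auc = auc(prev_year_rate, next_year_fail)
print(round(toy_auc, 3))
```

Running the same function separately on the rows of each tariff gives the within-tariff assessment described above.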
2.4 Profiling using statistical modelling

We chose to construct models using generalised additive mixed model formulations, with the linear predictors for the logit probability, $\log\frac{p_{sty}}{1 - p_{sty}}$, specified as:

$$
\begin{aligned}
\text{Base}: &\quad \beta_0 + \gamma_s + \tau_t \\
\text{M1}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y \\
\text{M2}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} \\
\text{M3}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} \\
\text{M4}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{R,st(y-1)}) \\
\text{M5}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{N,st(y-1)}) \\
\text{M6}: &\quad \beta_0 + \gamma_s + \tau_t + \alpha_y + \phi_{st} + \kappa_{sy} + b_3(p_{A,st(y-1)})
\end{aligned}
$$

where: $\beta_0$ is a fixed process constant to be estimated; $\gamma_s$ is a supplier-level effect; $\tau_t$ is a tariff-level effect; $\alpha_y$ is an effect for year of interception; $\phi_{st}$ is an effect for the supplier-tariff cross-classification; $\kappa_{sy}$ is an effect for the supplier-year cross-classification; and $b_3(\cdot)$ represents cubic regression splines for the previous year’s regulated pest interception rate $p_{R,st(y-1)}$, non-regulated pest interception rate $p_{N,st(y-1)}$, and administrative interception rate $p_{A,st(y-1)}$.

Bayesian logistic regression models were fit using rstanarm (Gabry and Goodrich, 2016). We used student-t priors for all coefficients, setting the scale for the intercept prior at 10, and for all other coefficients at 2.5.

To be more descriptive, (M4) tests whether historical regulated pest interception rates can be used to predict future regulated pest interception probability, whilst (M5) and (M6) test the effect of historical non-regulated pest and administrative interception rates on the probability of future regulated pest interception.

Comparison of the statistical profiling results was made via a combination of LOOIC comparisons (Vehtari et al., 2016) and predictive log-likelihood via repeated five-fold cross-validation. LOOIC is similar to AIC in that it estimates out-of-sample prediction accuracy; however, LOOIC integrates over uncertainty in the parameters, and does not assume multivariate normality as the AIC does.
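The repeated five-fold cross-validation can be sketched as follows. The sketch only builds the training/testing index splits; fold assignment is done separately within each year so that every fold contains rows from every year, which is one way to read the requirement that sampling be performed within years. The function name `repeated_kfold_within_year`, the seed, and the toy `years` vector are hypothetical, and the paper's actual splitting code (in R) may differ.

```python
import random
from collections import defaultdict


def repeated_kfold_within_year(years, n_folds=5, n_repeats=20, seed=1):
    """Return a list of (train_idx, test_idx) pairs, stratified by year."""
    rng = random.Random(seed)
    rows_by_year = defaultdict(list)
    for i, y in enumerate(years):
        rows_by_year[y].append(i)

    splits = []
    for _ in range(n_repeats):
        fold_of = {}
        for y in sorted(rows_by_year):
            idx = rows_by_year[y][:]
            rng.shuffle(idx)
            # Deal shuffled rows round-robin so folds are balanced per year
            for pos, i in enumerate(idx):
                fold_of[i] = pos % n_folds
        for k in range(n_folds):
            test = sorted(i for i, f in fold_of.items() if f == k)
            train = sorted(i for i, f in fold_of.items() if f != k)
            splits.append((train, test))
    return splits


# Toy data: 200 aggregated rows spread evenly over 2008 to 2011
years = [2008 + (i % 4) for i in range(200)]
splits = repeated_kfold_within_year(years)
print(len(splits))
```

Each model would then be fit to the rows in `train` and scored on the rows in `test`, for example by the log predictive density of the held-out counts.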
We used 20 repeats, resulting in 100 training/testing datasets for comparison. To ensure balance across the datasets, sampling was performed within years. All models from Section 2.4 were fit to each training dataset, and predictions made on the testing datasets.

3 Results

We present the results in three sections: the association between low-severity and high-severity interceptions; the operational AUC tests; and the statistical modelling results. We finish this section with an in-depth look at the information gained from the modelling procedures.

3.1 Association between low-severity and high-severity interceptions

Figure 1 shows the log-odds ratios, along with 95% confidence intervals, for the association between regulated pest (high-risk) interceptions and non-regulated pest and administrative (low-risk) interceptions, both overall and by year. All estimates and lower bounds of the confidence intervals are well above 0, showing there is a strong association between low- and high-risk interceptions.

[Figure 1 image not reproduced. y-axis: Log-odds; x-axis: Level of aggregation (2007, 2008, 2009, 2010, 2011, overall); series: Administrative, Non-regulated pest.]

Figure 1: Estimates (95% confidence intervals) of the log-odds ratios between low- and high-risk interceptions overall and by year (2007–2011). The log-odds ratios are calculated between regulated pest (high-risk) interceptions, and non-regulated pest and administrative (low-risk) interceptions.

3.2 Comparison of profiles using annual inspection data

3.2.1 Across tariffs

Figure 2 presents ROC curves that compare how well the different profiles perform. As per Section 2.3, the profiles are generated from the previous year’s interception rates. All profiling approaches are substantially better than random, and the administrative profile consistently led to the weakest performance across each year.
We have also shown the performance from a combined profile in Figure 2; this is simply the profile using interception rates calculated from a variable indicating if any of the interception types occur. Clearly, the combined interception profile offers no improvement over the regulated-pest profile. The profiles derived from non-regulated pest and administrative interception rates were consistently slightly worse than those based on regulated pest interception rates.

Table 1 presents the AUC values for each of the curves presented in Figure 2. The values are consistently close to 1, which suggests that the relative interception rates are very stable from year to year, and that the interception rates for each year y are a very good indicator for year y + 1. We also derived profiles using data without empirical Bayes smoothing (see Section 2.1); however, these profiles underperformed compared to the profiles using empirical Bayes smoothing. The results without empirical Bayes smoothing are shown in Appendix B, Table B1.

[Figure 2 image not reproduced. Panels: 2008, 2009, 2010, 2011; y-axis: True positives; x-axis: False positives; series: Regulated pest profile, Non-regulated pest profile, Administrative profile, Combined profile.]

Figure 2: ROC curves showing the performance of four profiling strategies for four years of data (2007–2011). The profiles are constructed by tariff and importer using the previous year’s inspection data. A line is added at x = y to facilitate comparison.

3.2.2 Within tariffs

As noted in Section 2.3, our suspicion was that tariff-to-tariff variation would dominate the ROC signal, a suspicion that was supported by the modelling (Table 2). Figure 3 plots the AUCs arising from the regulated pest profile vs. the AUCs arising from the non-regulated pest, administrative and combined profiles respectively, within tariff.
Each point represents an ROC curve applied to a single tariff, where the entities within the tariff that are being profiled are the suppliers. The size of each point indicates the number of regulated pest interceptions in the tariff, providing a sense of importance of that tariff. A relationship between the number of regulated pest interceptions and AUC is not apparent in Figure 3; we would expect larger points in the top right corner if this were the case. However, within-tariff variation is considerable, providing a measure of conservatism against the strong performance of the across-tariff comparisons shown in Table 1. This suggests that any profiling undertaken would need to take account of the tariff being profiled.

Table 1: Summary of AUC values for profiling strategies, by year. The profiles are as follows: Regulated pest refers to using the previous year’s regulated pest interception rate; Non-regulated pest refers to using the previous year’s non-regulated pest interception rate; Administrative refers to using the previous year’s administrative interception rate; and Combined refers to using the previous year’s combined interception rate. Each AUC is computed using the data from the following year’s inspections.

Profile               2008    2009    2010    2011
Regulated pest        0.902   0.929   0.935   0.939
Non-regulated pest    0.870   0.909   0.892   0.920
Administrative        0.743   0.815   0.803   0.813
Combined              0.859   0.896   0.890   0.905

Correlation between the regulated pest profile and the other profiling strategies appears strong, especially between the non-regulated pest and combined profiles. The administrative profile results, however, show that many of the regulated pest AUCs lie above the y = x line. This suggests that the administrative profiles are likely to perform worse than the non-regulated pest profiles.
This observation is supported by the findings of the statistical modelling (Table 2), where the model that included non-regulated pest interception rates performed better than the model including administrative interception rates.

[Figure 3 image not reproduced. Panel columns: Non-regulated pest profile, Administration profile, Combined profile; panel rows: 2008, 2009, 2010, 2011; y-axis: AUC (Regulated pest profile); x-axis: AUC; point size: Fails (0 to 400).]

Figure 3: AUCs computed for each tariff code in the data to assess within-tariff profiling operationally, by year. The y-axis is the AUC using the previous year’s regulated pest interception rate to set the profile. The x-axis in each panel is the AUC using the previous year’s non-regulated pest, administration, and combined interception rates to set the profile, assessed on the same inspection data that are used for the y-axis. The size of the point is related to the number of fails within the profile, and a line has been added at x = y to facilitate comparison.

3.3 Comparison of profiles using statistical modelling

Comparison of the models from Section 2.4 is reported in Table 2. Model M3 has the lowest LOOIC, and the difference in LOOIC between M3 and M4 (98.3) is much larger than the standard error of its difference (17).
These results show that supplier and tariff information are important for predicting regulated pest interception probability. The models are greatly improved with the addition of interaction terms between suppliers and tariffs, and suppliers and years. After allowing for the effects of suppliers, tariffs and years, the addition of the previous year’s regulated pest interception rate (M4 vs M3), the previous year’s non-regulated pest interception rate (M5 vs M3) or the previous year’s administrative interception rate (M6 vs M3) does not improve the model.

Table 2: LOOIC-based comparison of statistical profiling models. The model with the smallest LOOIC (M3) is shown first, with subsequent rows ordered by increasing LOOIC. ∆LOOIC shows the difference in LOOIC between all models and model M3; se(∆LOOIC) shows the estimated standard error of the difference. Eff. P gives the estimated effective number of parameters; se(Eff. P) shows its standard error.

Model   LOOIC   se(LOOIC)   Eff. P   se(Eff. P)   ∆LOOIC   se(∆LOOIC)
M3      2788    111         458      23.3
M4      2886    117         470      25.0         98.3     17.0
M2      2980    127         419      23.6         192.2    29.4
M1      3093    133         377      22.5         304.9    36.6
M5      3141    146         449      26.6         353.7    49.6
M6      3154    145         446      26.9         366.2    49.6
Base    3250    148         392      23.7         462.7    52.3

Figure 4 shows the out-of-sample mean log predictive density and AUCs (Section 2.3) for all statistical profiling methods. Also shown in Figure 4b are the AUCs from the regulated pest profile. Models Base–M3 perform the best in terms of predictive log-likelihood (larger values are better), with no clear demarcation between them. In comparison, Models M1 and M2, as well as the Base model and the empirical Bayes profile, perform best on AUC.

3.4 Model examination

In this section we present an investigation of the effects from Model M3. We decided to investigate Model M3 further due to its superior performance in LOOIC (Table 2), as well as the within-supplier examinations that would be available due to the interaction term.
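To make the phrase "much larger than the standard error" concrete for every row of Table 2, the short sketch below divides each ∆LOOIC by its standard error. The values are copied from Table 2; the ratio is offered only as a rough, informal scale for the differences and is not a test reported in the paper.

```python
# (delta LOOIC, se of delta LOOIC) relative to M3, copied from Table 2
delta_looic = {
    "M4": (98.3, 17.0),
    "M2": (192.2, 29.4),
    "M1": (304.9, 36.6),
    "M5": (353.7, 49.6),
    "M6": (366.2, 49.6),
    "Base": (462.7, 52.3),
}

# Difference expressed in units of its own standard error
ratios = {model: d / se for model, (d, se) in delta_looic.items()}

for model, r in ratios.items():
    print(f"{model}: delta LOOIC is {r:.1f} standard errors above M3")
```

Every comparison model sits more than five standard errors above M3, with M4 the closest.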
Figure 5 shows the marginal log-odds for tariffs from Model M3, ordered left-to-right by decreasing probability of their marginal log-odds being greater than 0; bars in the figure show 90% posterior credible intervals. The inset shows the top 10 tariffs and, as expected, kiwi fruit is the tariff that contributes the highest risk.

Figure 6 shows the marginal log-odds for suppliers from the model, ordered left-to-right by decreasing posterior probability of their marginal log-odds being greater than 0; bars in the figure show 90% posterior credible intervals. Supplier labels have been masked for privacy reasons. The suppliers to the left of the figure are those predicted to have a large increase in probability of regulated pest interception, all else being equal. It is these suppliers that would naturally be the first targets in an operational capacity.

Figure 7 provides a closer examination of the risky suppliers. We have selected the top 25 suppliers (by the probability of their marginal log-odds being greater than 0) and calculated their posterior probability of a regulated pest being present in a consignment, averaged over all years from Model M3. The panel on the left shows their posterior probability for each tariff that the supplier imports, whilst the panel on the right shows the observed proportion (averaged over years) of regulated pest interceptions by tariff. This figure shows that these highest risk suppliers import a range of tariffs; that is, their poor performance is not necessarily due to importing one or two of the highest risk tariffs (as shown in Figure 5). Further, in the tariffs they do import, they have consistently high levels of consignments with regulated pest contamination (right panel, Figure 7).
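The ordering used in Figures 5 to 7 can be sketched from posterior draws of each marginal effect. The snippet below takes a set of draws per supplier, computes the posterior probability that the log-odds exceed 0 and a 90% interval from the 5% and 95% sample quantiles, and sorts by decreasing probability. The function name `summarise_effects`, the supplier labels, and the draws themselves are synthetic stand-ins, not output from the paper's fitted model.

```python
import random
import statistics


def summarise_effects(draws):
    """Rank effects by posterior probability of being greater than 0.

    draws: dict mapping a label to a list of posterior draws of the
           marginal log-odds for that label.
    Returns rows of (label, P(effect > 0), 5% quantile, 95% quantile).
    """
    rows = []
    for label, d in draws.items():
        p_positive = sum(x > 0 for x in d) / len(d)
        cuts = statistics.quantiles(d, n=20)  # 19 cut points: 5%, ..., 95%
        rows.append((label, p_positive, cuts[0], cuts[-1]))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


# Synthetic draws standing in for MCMC output (illustration only)
rng = random.Random(0)
fake_draws = {
    "supplier_b": [rng.gauss(0.0, 0.5) for _ in range(1000)],
    "supplier_c": [rng.gauss(-2.0, 0.5) for _ in range(1000)],
    "supplier_a": [rng.gauss(2.0, 0.5) for _ in range(1000)],
}
ranked = summarise_effects(fake_draws)
for label, p, lo, hi in ranked:
    print(f"{label}: P(>0) = {p:.3f}, 90% interval = ({lo:.2f}, {hi:.2f})")
```

Labels at the top of the resulting list correspond to the left-hand side of the figures, that is, the suppliers or tariffs most likely to raise the interception probability.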
[Figure 4 image not reproduced. Panel (a): y-axis: Mean log predictive density; x-axis: Model (Base, M1, M2, M3, M4, M5, M6). Panel (b): y-axis: AUC; x-axis: Model (Base, M1, M2, M3, M4, M5, M6, E. Bayes).]

(a) Mean log predictive density. (b) AUCs.

Figure 4: Mean log predictive density and AUCs for statistical profiling. Also shown are the AUCs from the regulated pest profile (E. Bayes).

[Figure 5 image not reproduced. y-axis: Log-odds; x-axis: Tariff; inset: top 10 tariffs.]

Figure 5: Marginal log-odds of the tariff effect in Model M3. Tariffs are ordered left-to-right by decreasing probability of their marginal log-odds being greater than 0, with bars showing 90% posterior credible intervals. The inset shows the top 10 tariffs.
[Figure 6 image not reproduced. y-axis: Log-odds; x-axis: Supplier.]

Figure 6: Marginal log-odds of the supplier effect in Model M3. Suppliers are ordered left-to-right by decreasing posterior probability of their marginal log-odds being greater than 0, with bars showing 90% posterior credible intervals. The inset shows the top 10 tariffs.
[Figure 7: two grids of 25 panels, one panel per supplier (anonymised identifiers), with tariff on the horizontal axis. (a) Posterior probability of a regulated pest interception. (b) Observed proportion of consignments with regulated pests.]

Figure 7: Posterior probability of a regulated pest interception in the top 25 suppliers, along with the observed proportion of consignments with regulated pests. Panels are ordered left-to-right, top-to-bottom by the probability of their marginal log-odds being greater than 0; bars in the left panel show 90% posterior credible intervals.

Figure 8 provides a closer examination of suppliers who pose minimal risk. We have selected the bottom 25 suppliers (by the probability of their marginal log-odds being greater than 0) and calculated their posterior probability of a regulated pest being present in a consignment, averaged over all years from Model M3. The panel on the left shows their posterior probability for each tariff that the supplier imports, whilst the panel on the right shows the observed proportion (averaged over years) of consignments that did not contain regulated pests by tariff.
Similar to Figure 7, these suppliers import a range of tariffs; that is, their good performance is not necessarily due to importing lower-risk tariffs. However, in comparison to the risky suppliers, they have consistently high proportions of consignments without regulated pest contamination in the tariffs they do import (right panel, Figure 8).

4 Discussion and Conclusion

There was a strong association between regulated pest interceptions and the lower-risk administrative and non-regulated pest interceptions (Section 3.1). This association was also observed when using administrative interceptions as a predictor for operational profiling (Figure 2), demonstrating the utility of the operational profiling approaches. However, we note that this does not carry over into the statistical models (Section 3.3), for which including historical rates as predictor variables did not improve model fits. The statistical profiles still performed well using the cross-validated AUCs (Figure 4) as well as the predictive log-likelihood. Thus, in answer to our motivating question of which data provide the most useful information about the pathway, we would conclude that it is the knowledge of particular suppliers, tariffs and their combination that is most informative.

The previous year's regulated pest profile performed well based on AUC; however, adding this to the statistical profiles gave no benefit (Table 2). Furthermore, a profile based on historical rates offers limited scope for investigating why a particular supplier may be problematic. Statistical profiling, in comparison, allows decisions to be based on posterior probabilities. For example, we could calculate a supplier's (marginal) probability of having a regulated pest interception.
Intervention could then be planned on either the top ranked suppliers (if funds are limited), or all suppliers that meet a threshold.

[Figure 8: two grids of 25 panels, one panel per supplier (anonymised identifiers), with tariff on the horizontal axis. (a) Posterior probability of a regulated pest interception. (b) Observed proportion of consignments without regulated pests.]

Figure 8: Posterior probability of a regulated pest interception in the bottom 25 suppliers, along with the observed proportion of consignments without regulated pests. Panels are ordered left-to-right, top-to-bottom by the probability of their marginal log-odds being less than 0; bars in the left panel show 90% posterior credible intervals.

A benefit of using a statistical modelling approach to investigate profiling is the added level of interrogation possible from the fitted model. In Section 3.4 we demonstrated how we can gain a clearer insight into the governance of this pathway.
Firstly, by studying the marginal effects of suppliers and tariffs, we can build up a picture of risk without relying on observed rates, which are noisy due to sampling and process error. We can pinpoint which tariffs and suppliers contribute to excessive risk, essentially by an ordering of the marginal log-odds, and then choose to investigate those that have a posterior probability higher than a pre-defined cutoff set by management. With a list of potentially risky suppliers, we further demonstrated how a manager could gain insight into the governance of those suppliers by investigating the posterior predicted probabilities of regulated pest interceptions (Figure 7). This information could be used to initially examine why a particular supplier may be having trouble with contaminated consignments, and to help improve their processes. Likewise, looking at the less risky importers (Figure 8) may provide information on good process that can be shared with the riskier importers.

To summarise, statistical models constructed using inspection data provide sufficient information for profiling purposes, and can be used to further interrogate the governance of multiple pathways and to help identify the processes underlying poor performance on these pathways.

5 Acknowledgments

The authors are grateful to Lindsay Penrose (ABARES, Australian federal Department of Agriculture and Water Resources) for statistical analysis on an early draft not reported here and Brendan Woolcott (Plant Division, Australian federal Department of Agriculture and Water Resources) for providing the data for the analysis.

References

Arthur, T., Zhao, S., Robinson, A., Woolcott, B., Perotti, E., and Aston, C. (2013). Statistical Modelling and Risk Return Improvements for the Plant Quarantine Pathway. Technical Report 1206F 1, Australian Centre of Excellence for Risk Analysis.

Beale, R., Fairbrother, J., Inglis, A., and Trebeck, D. (2008). One Biosecurity: a Working Partnership. Commonwealth of Australia.

Department of Immigration and Border Protection (2016). Customs Tariff Act 1995.

Dodge, H. (1943). A sampling inspection plan for continuous production. The Annals of Mathematical Statistics, 14(3):264–279.

Dodge, H. F. and Torrey, M. N. (1951). Additional continuous sampling plans. Industrial Quality Control, 7(5):7–12.

Gabry, J. and Goodrich, B. (2016). rstanarm: Bayesian Applied Regression Modeling via Stan. R package version 2.12.1.

Pimentel, D. (2011). Biological Invasions: Economic and Environmental Costs of Alien Plant, Animal, and Microbe Species. CRC Press, Hoboken, second edition.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Robinson, A., Bell, J., Woolcott, B., and Perotti, E. (2012). DAFF Biosecurity: Plant-Product Pathways. Technical Report 1001J 1, Australian Centre of Excellence for Risk Analysis.

Robinson, A., Woolcott, B., Holmes, P., Dawes, A., Sibley, J., Porter, L., and Kirkham, J. (2013). Plant Quarantine Inspection and Auditing across the Biosecurity Continuum. Technical Report 1101C 1, Australian Centre of Excellence for Risk Analysis.

Robinson, A. P., Chisholm, M., Mudford, R., and Maillardet, R. (2015). Ad hoc solutions to estimating pathway non-compliance rates using imperfect and incomplete information. In Jarrad, F., Low-Choy, S., and Mengersen, K., editors, Biosecurity Surveillance: Quantitative Approaches, CABI invasive series, pages 167–180. CABI, Boston.

Sinden, J., Jones, R., Hester, S., Odom, D., Kalisch, C., James, R., Cacho, O., and Griffith, G. (2005). The Economic Impact of Weeds in Australia. Plant Protection Quarterly, 20(1):25–32.

Vehtari, A., Gelman, A., and Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, pages 1–20.
Appendix A Calculation of rates using empirical Bayes

In this appendix, we detail the procedure used for calculating the smoothed rates in Section 2.1 via empirical Bayes. Recall that we have $X_{sty}$, the number of failures out of $n_{sty}$ inspections from tariff $t$ performed in year $y$ from supplier $s$; we assume that $X_{sty} \sim \mathrm{Binomial}(p_{sty}, n_{sty})$.

To find the empirical Bayes estimate of $p_{sty}$ for supplier $s$, in tariff $t$ and year $y$, assume that the binomial proportions $p_{sty}$ have a prior Beta distribution: $p_{sty} \sim \mathrm{Beta}(\alpha_{ty}, \beta_{ty})$. Then $X_{sty}$ has a Beta-binomial distribution, with probability mass function

$$
\Pr\left(X_{sty} = k\right) = \frac{\Gamma\left(n_{sty} + 1\right)\,\Gamma\left(k + \alpha_{ty}\right)\,\Gamma\left(n_{sty} - k + \beta_{ty}\right)\,\Gamma\left(\alpha_{ty} + \beta_{ty}\right)}{\Gamma\left(k + 1\right)\,\Gamma\left(n_{sty} - k + 1\right)\,\Gamma\left(n_{sty} + \alpha_{ty} + \beta_{ty}\right)\,\Gamma\left(\alpha_{ty}\right)\,\Gamma\left(\beta_{ty}\right)}.
$$

The parameters $\alpha_{ty}$ and $\beta_{ty}$ are found using maximum likelihood:

$$
\left(\hat{\alpha}_{ty}, \hat{\beta}_{ty}\right) = \underset{\alpha_{ty},\,\beta_{ty}}{\arg\max} \left\{ \sum_{s=1}^{S} \log \Pr\left(X_{sty} = x_{sty}\right) \right\},
$$

where $x_{sty}$ is the observed value of $X_{sty}$, and $S$ is the number of suppliers. To complete the calculation, the rates for supplier $s$ in tariff $t$ and year $y$ are updated using the following formula:

$$
\tilde{p}_{sty} = \frac{x_{sty} + \hat{\alpha}_{ty}}{n_{sty} + \hat{\alpha}_{ty} + \hat{\beta}_{ty}}.
$$

Appendix B Further results

Table B1: Summary of AUC values for profiling strategies, by year. The profiles are as follows: Tariff and Supplier refers to profiles constructed for the interaction of tariff and supplier, Supplier within Tariff refers to averaging the supplier interception rates within tariffs, and Supplier within Tariff, Smoothed refers to using the empirical Bayes estimate of the suppliers within tariffs and years. Regulated pest refers to using the previous year's regulated pest interception rate; Non-regulated pest refers to using the previous year's non-regulated pest interception rate; Administrative refers to using the previous year's administrative interception rate; and Combined refers to using the previous year's combined interception rate.
Each AUC is computed using the data from the following year's inspections.

| Profile | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|
| Tariff and Supplier | | | | |
| Regulated pest | 0.881 | 0.899 | 0.902 | 0.917 |
| Non-regulated pest | 0.849 | 0.878 | 0.859 | 0.890 |
| Administrative | 0.728 | 0.767 | 0.759 | 0.776 |
| Combined | 0.833 | 0.861 | 0.854 | 0.867 |
| Supplier within Tariff | | | | |
| Regulated pest | 0.861 | 0.897 | 0.903 | 0.906 |
| Non-regulated pest | 0.839 | 0.880 | 0.864 | 0.907 |
| Administrative | 0.795 | 0.841 | 0.824 | 0.842 |
| Combined | 0.836 | 0.881 | 0.871 | 0.901 |
| Supplier within Tariff, Smoothed | | | | |
| Regulated pest | 0.902 | 0.929 | 0.935 | 0.939 |
| Non-regulated pest | 0.870 | 0.909 | 0.892 | 0.920 |
| Administrative | 0.743 | 0.815 | 0.803 | 0.813 |
| Combined | 0.859 | 0.896 | 0.890 | 0.905 |
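The empirical Bayes smoothing described in Appendix A can be sketched as follows for a single tariff and year. This is an illustration under stated assumptions rather than the procedure used for the reported results: the (failures, inspections) counts are hypothetical, and the likelihood is maximised with a simple compass search over (log α, log β), chosen only to keep the example free of external libraries; any numerical optimiser could be substituted.

```python
from math import exp, lgamma


def log_pmf(k, n, a, b):
    """Log of the Beta-binomial probability mass function Pr(X = k)."""
    return (lgamma(n + 1) + lgamma(k + a) + lgamma(n - k + b) + lgamma(a + b)
            - lgamma(k + 1) - lgamma(n - k + 1) - lgamma(n + a + b)
            - lgamma(a) - lgamma(b))


def log_lik(data, a, b):
    """Log-likelihood summed over suppliers; data holds (failures, inspections)."""
    return sum(log_pmf(x, n, a, b) for x, n in data)


def fit_beta_binomial(data, step=1.0, tol=1e-6, max_iter=10000):
    """Maximise the log-likelihood over (log a, log b) by compass search."""
    u, v = 0.0, 0.0  # start at a = b = 1
    best = log_lik(data, exp(u), exp(v))
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            # Keep the search inside a bounded box to avoid overflow.
            if abs(u + du) > 20 or abs(v + dv) > 20:
                continue
            candidate = log_lik(data, exp(u + du), exp(v + dv))
            if candidate > best:
                u, v, best = u + du, v + dv, candidate
                improved = True
                break
        if not improved:
            step /= 2  # no axis move helps, so refine the step size
    return exp(u), exp(v)


# Hypothetical (failures, inspections) counts for suppliers in one tariff and year.
data = [(0, 20), (1, 50), (5, 10), (0, 5), (2, 40), (8, 12), (0, 30), (1, 8)]

alpha_hat, beta_hat = fit_beta_binomial(data)
smoothed = [(x + alpha_hat) / (n + alpha_hat + beta_hat) for x, n in data]

for (x, n), p in zip(data, smoothed):
    print(f"x={x:2d} n={n:2d} raw={x / n:.3f} smoothed={p:.3f}")
```

Each smoothed rate is a weighted average of the supplier's raw rate and the prior mean α̂/(α̂ + β̂), so suppliers with few inspections are pulled most strongly towards the tariff-level rate.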