Abstract
Sample average approximation (SAA) is a widely used approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, termed Robust SAA, that retains SAA’s tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is a connection among SAA, distributionally robust optimization, and goodness-of-fit hypothesis testing. Beyond Robust SAA, this connection provides a unified perspective that enables us to characterize the finite-sample and asymptotic guarantees of various other data-driven procedures based upon distributionally robust optimization. This analysis provides insight into the practical performance of these methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.










Notes
Indeed, nonparametric comparison tests focus on other location parameters, such as the median.
Such a hypothesis is called simple. By contrast, a composite hypothesis is not defined by a single distribution, but rather by a family of distributions, and asserts that the data-generating distribution F is some member of this family. An example of a composite hypothesis is that F is normally distributed (with some unknown mean and variance). We do not consider composite hypotheses in this work.
Note that the standard definitions of these statistics have the form of \(X_N^2\) and \(G_N^2\) with thresholds given by the \(\chi ^2\) distribution. Our nonstandard but equivalent definition ensures that the thresholds have the rate \(O(1/\sqrt{N})\), matching the other tests presented.
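As an illustration of this note, the following Python sketch computes the classical \(X^2\) and \(G^2\) statistics together with one rescaling consistent with the note, namely \(X_N=\sqrt{X^2/N}\) and \(G_N=\sqrt{G^2/N}\) compared against \(\sqrt{Q/N}\). This particular normalization is an assumption on our part, chosen so that the accept/reject decision coincides exactly with the classical tests; the function name is illustrative.

```python
import numpy as np

def rescaled_gof(counts, p0, Q):
    # Classical Pearson X^2 and likelihood-ratio G^2 statistics, plus the
    # rescaling X_N = sqrt(X^2/N), G_N = sqrt(G^2/N) with threshold
    # sqrt(Q/N), where Q is the (1 - alpha) quantile of chi-square with
    # n - 1 degrees of freedom. Rejecting when X_N > sqrt(Q/N) is
    # equivalent to the classical rule X^2 > Q.
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    p_hat = counts / N
    X2 = N * np.sum((p_hat - p0) ** 2 / p0)
    mask = counts > 0
    G2 = 2.0 * np.sum(counts[mask] * np.log(p_hat[mask] / p0[mask]))
    return np.sqrt(X2 / N), np.sqrt(G2 / N), np.sqrt(Q / N)

# Q = 7.815 is (approximately) the 95th percentile of chi-square with 3 df
X_N, G_N, thresh = rescaled_gof([25, 30, 22, 23], np.full(4, 0.25), Q=7.815)
print(X_N <= thresh, G_N <= thresh)  # both tests accept here
```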
The only nuance is that Proposition 3.4 of [46] requires a generalized Slater point. We use the empirical distribution function, \(\hat{F}_N\), as the generalized Slater point in the space of distributions.
This is most easily seen by noting:
$$\begin{aligned} \mathbb {E}\left[ \hat{z}_{{\text {SAA}}}\right]&= \mathbb {E}\left[ \min _{x \in X}\frac{1}{N} \sum _{j=1}^N c(x; \xi ^j)\right] \le \min _{x \in X}\frac{1}{N}\sum _{j=1}^N\mathbb {E}\left[ c(x; \xi ^j)\right] = z_{{\text {stoch}}} \le \mathbb {E}\left[ c(x_{{\text {SAA}}};\xi )\right] . \end{aligned}$$
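This chain of inequalities can be checked by a quick Monte Carlo experiment; a minimal sketch using the hypothetical cost \(c(x;\xi )=(x-\xi )^2\) with \(\xi \sim N(5,1)\), for which \(x_{{\text {SAA}}}\) is the sample mean and \(z_{{\text {stoch}}}=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 5000
mu, sigma2 = 5.0, 1.0   # xi ~ N(5, 1); z_stoch = min_x E[(x - xi)^2] = 1 at x = 5

saa_vals, oos_vals = [], []
for _ in range(reps):
    xi = rng.normal(mu, np.sqrt(sigma2), size=N)
    x_saa = xi.mean()                                  # minimizer of the SAA objective
    saa_vals.append(np.mean((x_saa - xi) ** 2))        # hat z_SAA (in-sample value)
    oos_vals.append((x_saa - mu) ** 2 + sigma2)        # E[c(x_SAA; xi)] (out-of-sample)

# Analytically: E[hat z_SAA] = (N-1)/N = 0.9 <= z_stoch = 1 <= E[c(x_SAA; xi)] = 1 + 1/N
print(np.mean(saa_vals), np.mean(oos_vals))
```

The in-sample SAA value is biased downward and the out-of-sample cost of the SAA solution exceeds \(z_{{\text {stoch}}}\), exactly as the displayed inequalities predict.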
References
Bassamboo, A., Zeevi, A.: On a data-driven method for staffing large call centers. Oper. Res. 57(3), 714–726 (2009)
Bayraksan, G., Love, D.K.: Data-driven stochastic programming using phi-divergences. In: Tutorials in Operations Research, pp. 1–19 (2015)
Ben-Tal, A., den Hertog, D., De Waegenaere, A., Melenberg, B., Rennen, G.: Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 59(2), 341–357 (2013)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia (2001)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.P.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35(3), 580–602 (2010)
Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Preprint arXiv:1401.0212 (2013)
Bertsimas, D., Popescu, I.: Optimal inequalities in probability theory: a convex optimization approach. SIAM J. Optim. 15(3), 780–804 (2005)
Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1999)
Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (2011)
Birge, J.R., Wets, R.J.B.: Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse. Math. Program. Study 27, 54–102 (1986)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Calafiore, G.C., El Ghaoui, L.: On distributionally robust chance-constrained linear programs. J. Optim. Theory Appl. 130(1), 1–22 (2006)
D’Agostino, R.B., Stephens, M.A.: Goodness-of-Fit Techniques. Dekker, New York (1986)
Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)
DeMiguel, V., Garlappi, L., Nogales, F.J., Uppal, R.: A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Manag. Sci. 55(5), 798–812 (2009)
Dodge, Y.: The Oxford Dictionary of Statistical Terms. Oxford University Press, Oxford (2006)
Dudley, R.M.: Real Analysis and Probability, vol. 74. Cambridge University Press, Cambridge (2002)
Dupačová, J.: The minimax approach to stochastic programming and an illustrative application. Stoch. Int. J. Probab. Stoch. Process. 20(1), 73–88 (1987)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, New York (2001)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Springer, New York (1993)
Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2013)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Homem-de-Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19(1), 56–85 (2014)
Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program. Technical report, University of Florida. Available at Optimization Online: http://www.optimization-online.org (2013)
King, A.J., Wets, R.J.B.: Epiconsistency of convex stochastic programs. Stoch. Stoch. Rep. 34(1–2), 83–92 (1991)
Klabjan, D., Simchi-Levi, D., Song, M.: Robust stochastic lot-sizing by means of histograms. Prod. Oper. Manag. 22(3), 691–710 (2013)
Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)
Kullback, S.: A lower bound for discrimination information in terms of variation. IEEE Trans. Inf. Theory 13(1), 126–127 (1967)
Levi, R., Perakis, G., Uichanco, J.: The data-driven newsvendor problem: new bounds and insights. Oper. Res. (2015). doi:10.1287/opre.2015.1422
Lim, A.E., Shanthikumar, J.G., Vahn, G.Y.: Conditional value-at-risk in portfolio optimization: coherent but fragile. Oper. Res. Lett. 39(3), 163–171 (2011)
Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284(1), 193–228 (1998)
Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24(1), 47–56 (1999)
Noether, G.E.: Note on the Kolmogorov statistic in the discrete case. Metrika 7(1), 115–116 (1963)
Popescu, I.: Robust mean-covariance solutions for stochastic optimization. Oper. Res. 55(1), 98–112 (2007)
Prékopa, A.: Stochastic Programming. Kluwer Academic Publishers, Dordrecht (1995)
Reed, M., Simon, B.: Methods of Modern Mathematical Physics, Vol. 1: Functional Analysis. Academic Press, New York (1981)
Rice, J.: Mathematical Statistics and Data Analysis. Thomson/Brooks/Cole, Belmont (2007)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Rohlf, F.J., Sokal, R.R.: Statistical Tables, 4th edn. Macmillan, New York (2012)
Scarf, H.: A min–max solution of an inventory problem. In: Arrow, K.J., Karlin, S., Scarf, H. (eds.) Studies in the Mathematical Theory of Inventory and Production, pp. 201–209. Stanford University Press, Stanford (1958)
Scarsini, M.: Multivariate convex orderings, dependence, and stochastic equality. J. Appl. Probab. 35(1), 93–103 (1998)
Shaked, M., Shanthikumar, J.G.: Stochastic Orders. Springer, New York (2007)
Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.A., López, M.A. (eds.) Semi-Infinite Programming: Recent Advances, pp. 135–165. Kluwer Academic Publishers, Dordrecht (2001)
Shapiro, A., Ruszczyński, A. (eds.): Handbooks in Operations Research and Management Science: Vol. 10. Stochastic Programming. Elsevier, Amsterdam (2003)
Shawe-Taylor, J., Cristianini, N.: Estimating the moments of a random vector with applications (2003). http://eprints.soton.ac.uk/260372/1/EstimatingTheMomentsOfARandomVectorWithApplications.pdf
Stephens, M.A.: Use of the Kolmogorov–Smirnov, Cramér–Von Mises and related statistics without extensive tables. J. R. Stat. Soc. B 32(1), 115–122 (1970)
Thas, O.: Comparing Distributions. Springer, New York (2009)
Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
Vapnik, V.: Principles of risk minimization for learning theory. In: NIPS, pp. 831–838 (1991)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Wächter, A., Biegler, L.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Wang, Z., Glynn, P., Ye, Y.: Likelihood robust optimization for data-driven problems. Preprint arXiv:1307.6279 (2013)
Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)
Žáčková, J.: On minimax solutions of stochastic linear programming problems. Časopis pro pěstování matematiky 91(4), 423–430 (1966)
Acknowledgements
The authors would like to thank the anonymous reviewers and associate editor for their extremely helpful suggestions and very thorough review of the paper. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.
Appendix
1.1 Proof of Theorem 1
Proof
We will show that \(\mathcal {C}(x;\mathcal {F})\) is continuous in x and that when \(c(x;\xi )\) satisfies the coerciveness conditions, \(\mathcal {C}(x;\mathcal {F})\) is also coercive. The result will then follow from the usual Weierstrass extreme value theorem for deterministic optimization [5].
Let \(S=\{x\in X:\mathcal {C}\left( x;\mathcal {F}\right) <\infty \}\). By assumption \(x_0\in S\), so \(S\ne \varnothing \). Consequently, we can restrict to minimizing over S instead of over X.
Fix any \(x\in X\). Let \(\epsilon >0\) be given. By equicontinuity of the cost at x there is a \(\delta >0\) such that any \(y\in X\) with \(\left| \left| x-y\right| \right| \le \delta \) has \(\left| c\left( x;\xi \right) -c\left( y;\xi \right) \right| \le \epsilon \) for all \(\xi \in \varXi \). Fix any such y. Then
Note that (48) implies that S is closed relative to X, which is itself closed. Hence, S is closed. Furthermore, (47) and (48) imply that \(\mathcal {C}\left( x;\mathcal {F}\right) \) is continuous in x on S.
If S is compact, the usual Weierstrass extreme value theorem provides that the continuous \(\mathcal {C}(x;\mathcal {F})\) attains its minimal (finite) value at an \(x\in S\subseteq X\).
Suppose S is not compact. Since \(S\subseteq X\) is closed, this must mean S is unbounded and then X is unbounded and hence not compact. Then, by assumption, \(c(x;\xi )\) satisfies the coerciveness assumption. Because S is unbounded, there exists a sequence \(x_i\in S\) such that \(\lim _{i\rightarrow \infty }\left| \left| x_0-x_i\right| \right| =\infty \). Then by the coerciveness assumption, \(c_i(\xi )=c(x_i;\xi )\) diverges pointwise to infinity. Fix any \(F_0\in \mathcal {F}\). Let \(c'_i(\xi )=\inf _{j\ge i}c_j(\xi )\), which is then pointwise monotone nondecreasing and pointwise divergent to infinity. Then, by Lebesgue’s monotone convergence theorem, \(\lim _{i\rightarrow \infty }\mathbb {E}_{F_0}[c'_i(\xi )]=\infty \). Since \(c'_i\le c_j\) pointwise for any \(j\ge i\), we have \(\mathbb {E}_{F_0}[c'_i(\xi )]\le \inf _{j\ge i}\mathbb {E}_{F_0}[c_j(\xi )]\) and therefore
Thus \(\mathcal {C}(x;\mathcal {F})\ge \mathbb {E}_{F_0}[c(x;\xi )]\) is also coercive in x over S. Then, the usual Weierstrass extreme value theorem provides that the continuous \(\mathcal {C}(x;\mathcal {F})\) attains its minimal (finite) value at an \(x\in S\subseteq X\).\(\square \)
1.2 Proof of Proposition 2
Proof
Suppose that \(c(x;\xi )\rightarrow \infty \) as \(\xi \rightarrow \infty \). The case of unboundedness in the negative direction is similar. Choose \(\rho >0\) small so that \(\xi ^{(i)}-\xi ^{(i-1)}>2\rho \) for all i. For \(\delta >0\) and \(\xi '\ge \xi ^{(N)}+\rho \), let \(F_{\delta ,\xi '}\) be the measure with density function
In words, this density equally distributes mass on a \(\rho \)-neighborhood of every point \(\xi ^{(i)}\), except \(\xi ^{(N)}\). For \(\xi ^{(N)}\), we “steal” \(\delta \) of the mass to place around \(\xi '\). Notice that for any \(\xi '\), \(F_{0,\xi '}\) (i.e., take \(\delta =0\)) minimizes \(S_N(F_0)\) over distributions \(F_0\). Since \(\alpha >0\), \(Q_{S_N}(\alpha )\) is strictly greater than this minimum. Since \(S_N(F_{\delta ,\xi '})\) increases continuously with \(\delta \) independently of \(\xi '\), there must exist \(\delta >0\) small enough so that \(F_{\delta ,\xi '}\in \mathcal {F}_{S_N}^\alpha \) for any \(\xi '>\xi ^{(N)}+\rho \).
Let any \(M>0\) be given. Since the cost function diverges as \(\xi \rightarrow \infty \), there exists \(\xi '>\xi ^{(N)}+\rho \) sufficiently large such that \(c(x;\xi )\ge M N/\delta \) for all \(\xi \ge \xi '\). Then, we have \(\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) \ge \mathbb {E}_{F_{\delta ,\xi '}} \left[ c(x;\xi )\right] \ge \mathbb {P}\left( \xi \ge \xi '\right) M N/\delta =M\).
Since we have shown this for every \(M>0\), we have \(\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) =\infty \).\(\square \)
1.3 Computing a threshold \(Q_{C_N}(\alpha )\)
We provide two ways to compute \(Q_{C_N}(\alpha )\) for use with the LCX-based GoF test. One is an exact, closed form formula, but which may be loose for moderate N. Another uses the bootstrap to compute a tighter, but approximate threshold.
The theorem below employs a bound on \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \) to provide a valid threshold. This bound could either stem from known support bounds or from changing (15) to a two-sided hypothesis with two-sided confidence interval, using the lower bound as in (17) and the upper bound in (49) given below.
Theorem 16
Let \(N\ge 2\). Suppose that with probability at least \(1-\alpha _2\), \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \). Let \(\alpha _1\in (0,1)\) be given and suppose \(F_0\preceq _\text {LCX}F\). Then, with probability at least \(1-\alpha _1-\alpha _2\),
where
Hence, defining \(Q_{C_N}(\alpha _1)\) equal to the right-hand side of (49), we get a valid threshold for \(C_N\) in testing \(F_0 \preceq _{\text {LCX}} F\) at level \(\alpha _1\).
Proof
Observe
where the first inequality follows by distributing the sup and the second inequality follows because \(F_0 \preceq _{\text {LCX}} F\). Next, we provide a probabilistic bound on the last sup, which is the maximal difference over \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\) between the true expectation of the hinge function \(\max \left\{ a^T\xi -b,0\right\} \) and its empirical expectation.
The class of level sets of such functions, i.e., \(\mathcal {S} =\left\{ \left\{ \xi \in \varXi :\max \left\{ a^T\xi -b,0\right\} \le t\right\} :\right. \left. \left| \left| a\right| \right| _1+\left| b\right| \le 1,\,t\in \mathbb {R}\right\} \), is contained in the class of the empty set and all halfspaces. Therefore, it has Vapnik–Chervonenkis dimension at most \(d+1\) (cf. [53]). Hence, Theorem 5.2 and equation (5.12) of [53] provide that for any \(\epsilon >0\) and \(p\in (1,2]\),
where \(D_p(a,b)=\int _0^\infty \left( \mathbb {P}_F\left( \max \left\{ a^T\xi -b,0\right\} >t\right) \right) ^{1/p}dt\).
Notice that for any \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\), we have \(0\le \max \left\{ a^T\xi -b,0\right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _\infty \right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _2\right\} \) and hence \(\mathbb {E}_F\left[ \max \left\{ a^T\xi -b,0\right\} ^2\right] \le \max \left\{ 1^2,\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right\} \le 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \). These observations combined with Markov’s inequality yield that, for any \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\) and \(p\in (1,2)\), we have
This yields a bound on \(D_p(a,b)\) that is independent of (a, b). Using this bound takes \(D_p(a,b)\) out of the sup in (52) and by bounding \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \) (which holds with probability \(1-\alpha _2\)), we conclude from (52) that (49) holds with probability \(1-\alpha _1-\alpha _2\) for any \(p\in (1,2)\). The p given in (50) optimizes the bound for \(N\ge 2\).
\(\square \)
Next we show how to bootstrap an approximate threshold \(Q_{C_N}(\alpha )\). Recall that we seek a threshold \(Q_{C_N}(\alpha )\) such that \(\mathbb {P}\left( C_N(F_0)>Q_{C_N}(\alpha )\right) \le \alpha \) whenever \(F_0\preceq _{{\text {LCX}}}F\). Employing (51), we see that a sufficient threshold is the \((1-\alpha )\)th quantile of
where \(\xi ^i\) are drawn IID from F. The bootstrap [20] approximates this by replacing F with the empirical distribution \(\hat{F}_N\). In particular, given an iteration count B, for \(t=1,\dots , B\) it sets
where \(\tilde{\xi }^{t,i}\) are drawn IID from \(\hat{F}_N\), i.e., as IID random choices from \(\{\xi ^1,\dots ,\xi ^N\}\). Then the bootstrap approximates \(Q_{C_N}(\alpha )\) by the \((1-\alpha )\)th quantile of \(\{Q^1,\dots ,Q^B\}\). However, it may be difficult to compute (53), as the problem is non-convex. Fortunately, (53) can be solved with a standard MILP formulation or by discretizing the space and enumerating (the objective is Lipschitz).
In particular, our bootstrap algorithm for computing \(Q_{C_N}(\alpha )\) is as follows:
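A minimal Python sketch of this bootstrap for the univariate case (d = 1), solving the inner problem by discretization: since \(\max \{a\xi -b,0\}\) is positively homogeneous in (a, b), the sup over \(\left| a\right| +\left| b\right| \le 1\) is attained on the boundary \(\left| a\right| +\left| b\right| =1\) (or at the origin, where the objective is 0, a value the boundary grid also achieves at \((a,b)=(0,1)\)). Function names and grid sizes are illustrative, not from the paper.

```python
import numpy as np

def hinge_means(a, b, xi):
    # mean of max(a*xi - b, 0) over the sample, for each (a, b) pair
    return np.maximum(np.outer(a, xi) - b[:, None], 0.0).mean(axis=1)

def bootstrap_threshold(xi, alpha=0.05, B=1000, grid=101, seed=0):
    # Approximate Q_{C_N}(alpha) by the bootstrap: resample from the
    # empirical distribution and take the (1 - alpha) quantile of the
    # resampled suprema, enumerating (a, b) on the boundary |a| + |b| = 1.
    rng = np.random.default_rng(seed)
    N = len(xi)
    t = np.linspace(-1.0, 1.0, grid)
    a = np.concatenate([t, t])
    b = np.concatenate([1.0 - np.abs(t), np.abs(t) - 1.0])
    base = hinge_means(a, b, xi)               # hinge expectations under hat F_N
    Q = np.empty(B)
    for k in range(B):
        resample = rng.choice(xi, size=N, replace=True)
        Q[k] = np.max(base - hinge_means(a, b, resample))
    return np.quantile(Q, 1.0 - alpha)

q = bootstrap_threshold(np.random.default_rng(1).normal(size=50), B=200)
print(q)
```

A denser grid (or the MILP formulation mentioned above) tightens the inner maximization; the Lipschitz property of the objective bounds the discretization error.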

1.4 Proof of Proposition 4
Proof
We first prove that a uniformly consistent test is consistent. Let \(G_0 \ne F\) be given. Denote by d the Lévy-Prokhorov metric, which metrizes weak convergence (cf. [9]), and observe that \(d(G_0, F) > 0\).
Define \(R_N=\sup _{F_0\in \mathcal {F}_N}d(F_0,F)\). We claim that if the test is uniformly consistent, then \(\mathbb {P}\left( R_N \rightarrow 0\right) = 1\). Suppose that for some sample path, \(R_N \not \rightarrow 0\). By the definition of the supremum, there must exist \(\delta > 0\) and a sequence \(F_N \in \mathcal {F}_N\) such that \(d(F_N, F) \ge \delta \) i.o. Since d metrizes weak convergence, this means that \(F_N\) does not converge to F. However, \(F_N \in \mathcal {F}_N\) for all N, i.e., it is never rejected. Therefore, by uniform consistency of the test, the event that \(R_N \not \rightarrow 0\) must have probability 0. That is, \(R_N \rightarrow 0\) almost surely.
Since a.s. convergence implies convergence in probability, we have that \(\mathbb {P}\left( R_N<\epsilon \right) \rightarrow 1\) for every \(\epsilon > 0\), and, in particular, for \(\epsilon = d(G_0, F)>0\). Hence we have,
This proves the first part of the proposition.
For the second part, we describe a test which is consistent but not uniformly consistent. Consider testing a continuous distribution F with the following univariate GoF test:
Notice that under the null-hypothesis, the probability of rejection is
where we have used that \(\xi ^1\) is independent of the rest of the sample and that \(F_0(\xi ^1)\) is uniformly distributed for \(F_0\) continuous. Consequently, the test is a valid GoF test and has significance \(\alpha \).
We claim this test is also consistent. Specifically, consider any \(F_0 \ne F\). By continuity of \(F_0\) and consistency of the KS test,
However, the test is not uniformly consistent. Fix any continuous \(F_0\ne F\) and let
Observe that \(0\le F_0(\xi ^1)\le 1\) and \([0,1]=\bigcup _{k=0}^{2^\ell -1}\left[ \frac{k}{2^\ell },\,\frac{k+1}{2^\ell }\right] \). That is, for every \(\ell \), \(F_N=F_0\) at least once for \(N\in \{2^\ell ,\dots ,2^{\ell +1}-1\}\). Therefore \(F_N = F_0\) i.o., so it does not converge weakly to F. However, as constructed, \(F_N\) is never rejected by the above test. This holds for every sample path, so the test cannot be uniformly consistent. \(\square \)
1.5 Proofs of Theorems 2 and 3
We first establish two useful results.
Proposition 8
Suppose \(\mathcal {F}_N\) is the confidence region of a uniformly consistent test and that Assumptions 1 and 3 hold. Then, almost surely, \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_F[c(x;\xi )]\) for any \(x\in X\) and any sequence \(F_N\in \mathcal {F}_N\).
Proof
Restrict to the a.s. event that \(\left( F_N\not \rightarrow F\implies F_N\notin \mathcal {F}_N\text { i.o.}\right) \). Fix \(F_N\in \mathcal {F}_N\). Then the contrapositive gives \(F_N\rightarrow F\). Fix x. If \(\varXi \) is bounded (Assumption 3a) then the result follows from the portmanteau lemma (see for example Theorem 2.1 of [9]). Suppose otherwise (Assumption 3b). Then \(\mathbb {E}_{F_N}[\phi (\xi )]\rightarrow \mathbb {E}_F[\phi (\xi )]\). By Theorem 3.6 of [9], \(\phi (\xi )\) is uniformly integrable over \(\{F_1,F_2,\dots \}\). Since \(c(x;\xi )=O(\phi (\xi ))\), it is also uniformly integrable over \(\{F_1,F_2,\dots \}\). Then the result follows by Theorem 3.5 of [9].\(\square \)
The following is a restatement of the equivalence between local uniform convergence of continuous functions and convergence of evaluations along a convergent path. We include the proof for completeness.
Proposition 9
Suppose Assumption 1 holds and \(\mathcal {C}(x_N;\mathcal {F}_N)\rightarrow \mathbb {E}_F[c(x;\xi )]\) for any convergent sequence \(x_N\rightarrow x\). Then (18) holds.
Proof
Let \(E\subseteq X\) compact be given and suppose to the contrary that \(\sup _{x\in E}\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \not \rightarrow 0\). Then \(\exists \epsilon >0\) and \(x_N\in E\) such that \(\left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x_N;\xi )]\right| \ge \epsilon \) i.o. This, combined with compactness, means that there exists a subsequence \(N_1<N_2<\cdots <N_k\rightarrow \infty \) such that \(x_{N_k}\rightarrow x\in E\) and \(\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x_{N_k};\xi )]\right| \ge \epsilon \) \(\forall k\). Then,
By assumption, \(\exists k_1\) such that \(\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon /4\) \(\forall k\ge k_1\). By equicontinuity and \(x_{N_k}\rightarrow x\), \(\exists k_2\) such that \(\left| c(x;\xi )-c(x_{N_k};\xi )\right| \le \epsilon /4\) \(\forall \xi ,\,k\ge k_2\). Then,
Combining and considering \(k=\max \left\{ k_1,k_2\right\} \), we get the contradiction \(\epsilon \le \epsilon /2\) for strictly positive \(\epsilon \).\(\square \)
We prove the “if” and “only if” sides of Theorem 2 separately.
Proof
(Proofs of Theorem 3 and the “only if” side of Theorem 2) For either theorem restrict to the a.s. event that
(using Proposition 8 for Theorem 2 or by assumption of c-consistency for Theorem 3).
Let any convergent sequence \(x_N\rightarrow x\) and \(\epsilon >0\) be given. By equicontinuity and \(x_N\rightarrow x\), \(\exists N_1\) such that \(\left| c(x_N;\xi )-c(x;\xi )\right| \le \epsilon /2\) \(\forall \xi ,\,N\ge N_1\). Then, \(\forall N\ge N_1\),
By definition of supremum, \(\exists F_N\in \mathcal {F}_N\) such that \(\mathcal {C}(x;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x;\xi )]+\epsilon /4\). By (54), \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F}[c(x;\xi )]\). Hence, \(\exists N_2\) such that \(\left| \mathbb {E}_{F_N}[c(x;\xi )]-\mathbb {E}_{F}[c(x;\xi )]\right| \le \epsilon /4\) \(\forall N\ge N_2\). Combining these with the triangle inequality
we get
Thus, by Proposition 9, we get that (18) holds.
Let \(A_N={\text {arg}}\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\). We now show that \(\bigcup _N A_N\) is bounded. If X is compact (Assumption 2a) then this is trivial. Suppose X is not compact (Assumption 2b). Using the same arguments as in the proof of Theorem 1, we have in particular that \(\lim _{\left| \left| x\right| \right| \rightarrow \infty }\mathbb {E}_F[c(x;\xi )]=\infty \), \(z_\text {stoch}=\min _{x\in X}\mathbb {E}_F[c(x;\xi )]<\infty \), that \(A={\text {arg}}\min _{x\in X}\mathbb {E}_F[c(x;\xi )]\) is compact, and each \(A_N\) is compact. Let \(x^*\in A\). Fix \(\epsilon >0\). By definition of supremum \(\exists F_N\in \mathcal {F}_N\) such that \(\mathcal {C}(x^*;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x^*;\xi )]+\epsilon \). By (54), \(\mathbb {E}_{F_N}[c(x^*;\xi )]\rightarrow \mathbb {E}_{F}[c(x^*;\xi )]=z_\text {stoch}\). As this is true for any \(\epsilon \) and since \(\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le \mathcal {C}(x^*;\mathcal {F}_N)\), we have \(\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le z_\text {stoch}\). Now, suppose for contradiction that \(\bigcup _N A_N\) is unbounded, i.e. there is a subsequence \(N_1<N_2<\cdots <N_k\rightarrow \infty \) and \(x_{N_k}\in A_{N_k}\) such that \(\left| \left| x_{N_k}\right| \right| \rightarrow \infty \). Let \(\delta '=\limsup _{k\rightarrow \infty }\inf _{\xi \notin D}c(x_{N_k};\xi )\ge \liminf _{N\rightarrow \infty }\inf _{\xi \notin D}c(x_{N};\xi )>-\infty \) and \(\delta =\min \left\{ 0,\delta '\right\} \). By D-uniform coerciveness, \(\exists k_0\) such that \(c(x_{N_k};\xi )\ge (z_\text {stoch}+1-\delta )/F(D)\) \(\forall \xi \in D,\,k\ge k_0\). In the case of Theorem 2, let \(F_N\) be any \(F_N\in \mathcal {F}_N\). In the case of Theorem 3, let \(F_N\) be the empirical distribution \(F_N=\hat{F}_N\in \mathcal {F}_N\). In either case, we get \(F_N\rightarrow F\) weakly. In particular, \(F_N(D)\rightarrow F(D)\). 
Then \(\mathbb {E}_{F_N}[c(x_{N_k};\xi )]\ge F_N(D)\times (z_\text {stoch}+1-\delta )/F(D)+\min \left\{ 0,\inf _{\xi \notin D}c(x_{N_k};\xi )\right\} \) \(\forall k\ge k_0\). Thus \(\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N})\ge \limsup _{k\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})\ge z_\text {stoch}+1-\delta +\delta =z_\text {stoch}+1\), yielding the contradiction \(z_\text {stoch}+1\le z_\text {stoch}\).
Thus \(\exists A_\infty \) compact such that \(A\subseteq A_\infty \), \(A_N\subseteq A_\infty \). Then, by (18),
yielding (19). Let \(x_N\in A_N\). Since \(A_\infty \) is compact, \(x_N\) has at least one convergent subsequence. Let \(x_{N_k}\rightarrow x\) be any convergent subsequence. Suppose for contradiction \(x\notin A\), i.e., \(\epsilon =\mathbb {E}_F[c(x;\xi )]-z_\text {stoch}>0\). Since \(x_{N_k}\rightarrow x\) and by equicontinuity, \(\exists k_1\) such that \(\left| c(x_{N_k};\xi )-c(x;\xi )\right| \le \epsilon /4\) \(\forall \xi ,\,k\ge k_1\). Then, \(\left| \mathbb {E}_F[c(x_{N_k};\xi )]-\mathbb {E}_F[c(x;\xi )]\right| \le \mathbb {E}_F[\left| c(x_{N_k};\xi )-c(x;\xi )\right| ]\le \epsilon /4\) \(\forall k\ge k_1\). Also \(\exists k_2\) such that \(\delta _{N_k}\le \epsilon /4\) \(\forall k\ge k_2\). Then, for \(k\ge \max \left\{ k_1,k_2\right\} \),
Taking limits, we contradict (19).\(\square \)
Proof
(Proof of the “if” side of Theorem 2) Consider any \(\varXi \) bounded (\(R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| <\infty \)). Let \(X=\mathbb {R}^{d}\), and
Since \(\left| c_i(x;\xi )\right| \le 3\left| \left| x\right| \right| \), expectations exist. The gradient of each \(c_i\) at x has magnitude bounded by \(R\left| \left| x\right| \right| +3\) uniformly over \(\xi \), so Assumption 1 is satisfied. Also, \(\lim _{\left| \left| x\right| \right| \rightarrow \infty }c_i(x;\xi )\ge \lim _{\left| \left| x\right| \right| \rightarrow \infty }\left| \left| x\right| \right| =\infty \) uniformly over all \(\xi \in \varXi \) and \(c_i(x;\xi )\ge 0\), so Assumption 2 is satisfied.
Restrict to the a.s. event that (18) applies simultaneously for \(c_1,c_2,c_3,c_4\). Let any sequence \(F_N\not \rightarrow F\) be given. Let I denote the imaginary unit. Then, by the Lévy continuity theorem and Cramér-Wold device, there exists x such that \(\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \). On the other hand, by (18),
The first two limits imply that \(\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {cos}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {cos}}\left( x^T\xi \right) \right] \right| \rightarrow 0\) and the second two imply that \(\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {sin}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {sin}}\left( x^T\xi \right) \right] \right| \rightarrow 0\). Together, recalling that \(e^{It}={\text {cos}}(t)+I{\text {sin}}(t)\), this implies that
However, since, \(\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \), it must be that \(F_N\notin \mathcal {F}_N\) i.o.\(\square \)
1.6 Proof of Theorem 4
Proof
In the case of finite support \(\varXi =\{\hat{\xi }^1,\dots ,\hat{\xi }^n\}\), total variation metrizes weak convergence:
Restrict to the almost sure event \(d_\text {TV}(\hat{p}_N,p)\rightarrow 0\) (see Theorem 11.4.1 of [18]). We need only show that now \(\sup _{p_0\in \mathcal {F}_N}d_\text {TV}(\hat{p}_N,p_0)\rightarrow 0\), yielding the contrapositive of the uniform consistency condition.
By an application of the Cauchy–Schwarz inequality (cf. [22]),
By Pinsker’s inequality (cf. [31]),
Since both the \(\chi ^2\) and G-tests use a rejection threshold equal to \(\sqrt{Q/N}\) where Q is the \((1-\alpha )\)th quantile of a \(\chi ^2\) distribution with \(n-1\) degrees of freedom (Q is independent of N), we have that \(d_\text {TV}(\hat{p}_N,p_0)\) is uniformly bounded over \(p_0\in \mathcal {F}_N\) by a quantity diminishing with N.\(\square \)
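The two bounds invoked in this proof are easy to verify numerically; a small sketch with arbitrary illustrative pmfs (the specific distributions are not from the paper):

```python
import numpy as np

p0 = np.array([0.2, 0.3, 0.5])   # hypothesized pmf
p  = np.array([0.3, 0.3, 0.4])   # empirical pmf

tv  = 0.5 * np.abs(p - p0).sum()       # total variation distance
chi = np.sum((p - p0) ** 2 / p0)       # chi-square-type discrepancy
kl  = np.sum(p * np.log(p / p0))       # KL divergence (G-statistic analogue)

assert tv <= 0.5 * np.sqrt(chi)   # Cauchy-Schwarz bound
assert tv <= np.sqrt(kl / 2.0)    # Pinsker's inequality
print(tv, 0.5 * np.sqrt(chi), np.sqrt(kl / 2.0))
```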
1.7 Proof of Theorem 5
Proof
In the case of univariate support, the Lévy metric metrizes weak convergence:
Restrict to the almost sure event \(d_{\text {Lévy}}(\hat{F}_N,F)\rightarrow 0\) (see Theorem 11.4.1 of [18]). We need only show that now \(\sup _{F_0\in \mathcal {F}_N}d_{\text {Lévy}}(\hat{F}_N,F_0)\rightarrow 0\), yielding the contrapositive of the uniform consistency condition.
Fix \(F_0\) and let \(0\le \epsilon <d_{\text {Lévy}}(\hat{F}_N,F_0)\). Then \(\exists \xi _0\) such that either (1) \(\hat{F}_N(\xi _0-\epsilon )-\epsilon >F_0(\xi _0)\) or (2) \(\hat{F}_N(\xi _0+\epsilon )+\epsilon <F_0(\xi _0)\). Since \(F_0\) is monotonically non-decreasing, (1) implies \(D_N(F_0)\ge \hat{F}_N(\xi _0-\epsilon )-F_0(\xi _0-\epsilon )>\epsilon \) and (2) implies \(D_N(F_0)\ge F_0(\xi _0+\epsilon )-\hat{F}_N(\xi _0+\epsilon )>\epsilon \). Hence \(d_{\text {Lévy}}(\hat{F}_N,F_0)\le D_N(F_0)\). Moreover, \(D_N\le V_N\) by definition. Since \(\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S_N(F_0)=Q_{S_N}(\alpha )=O(N^{-1/2})\) for either statistic, both the KS and Kuiper tests are uniformly consistent.
Consider \(D'_N(F_0)=\max _{i=1,\,\dots ,\,N}\left| F_0(\xi ^{(i)})-\frac{2i-1}{2N}\right| =\sigma \left( F_0(\xi ^{(j)})-\frac{2j-1}{2N}\right) \), where j and \(\sigma \) are the maximizing index and sign, respectively. Suppose \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\). If \(\sigma =+1\), this necessarily means that \(1-\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N\) and therefore \(N-j\ge \lceil \sqrt{N}\rceil +1\). By monotonicity of \(F_0\) we have for \(0\le k\le \lceil \sqrt{N}\rceil \) that \(j+k\le N\) and
If instead \(\sigma =-1\), this necessarily means that \(\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N\) and therefore \(j\ge \lceil \sqrt{N}\rceil +1\). By monotonicity of \(F_0\) we have for \(0\le k\le \lceil \sqrt{N}\rceil \) that \(j-k\ge 1\) and
In either case we have that
using \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\) and \(\left| D'_N(F_0)-D_N(F_0)\right| \le 1/(2N)\) in the last inequality. Therefore,
Since \(F_0(\xi )(1-F_0(\xi ))\le 1\), using the integral formulations of the CvM and AD statistics (see [50]), the same bound holds with \(W^2_N\) replaced by \(A^2_N\). For either the CvM or the AD statistic, \(Q_{S_N}(\alpha )=O(N^{-1/2})\) and hence \(\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S^2_N(F_0)=O(N^{-1})\). Therefore, both the CvM and AD tests are uniformly consistent.
Letting \(M=\lfloor \frac{1}{2}+N(1-D'_N(F_0))\rfloor \) we have \(\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} =\frac{M^2}{2N}+MD'_N(F_0)+N-M\) so that in the case of \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\), \(\left( \frac{1}{N}\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} -\frac{1}{2}\right) ^2=O(1/N)\). Thus, the Watson test is also uniformly consistent.\(\square \)
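Two elementary facts used in this proof — that the one-sample KS statistic \(D_N\) and the midpoint statistic \(D'_N\) differ by at most \(1/(2N)\) (in fact by exactly \(1/(2N)\)), and the closed form for \(\sum _i\min \{1,(2i-1)/(2N)+D'_N\}\) — can be sanity-checked numerically. A minimal sketch, using a hypothetical uniform sample and \(F_0=\) Uniform(0,1); none of these variable names come from the paper:

```python
import random

random.seed(0)
N = 200
xs = sorted(random.random() for _ in range(N))  # ordered sample xi^(1) <= ... <= xi^(N)
F0 = lambda t: min(max(t, 0.0), 1.0)            # candidate CDF: Uniform(0,1)

# One-sample Kolmogorov-Smirnov statistic D_N(F0)
D = max(max(i / N - F0(x), F0(x) - (i - 1) / N) for i, x in enumerate(xs, 1))

# "Midpoint" variant D'_N(F0) from the proof of Theorem 5
Dp = max(abs(F0(x) - (2 * i - 1) / (2 * N)) for i, x in enumerate(xs, 1))

# The proof uses |D'_N - D_N| <= 1/(2N); term by term, D_N = D'_N + 1/(2N) exactly.
assert abs(D - Dp - 1 / (2 * N)) < 1e-12

# Closed form for the Watson-test step: with M = floor(1/2 + N(1 - D'_N)),
# sum_i min{1, (2i-1)/(2N) + D'_N} = M^2/(2N) + M*D'_N + N - M.
M = int(0.5 + N * (1 - Dp))  # floor, since the argument is nonnegative
lhs = sum(min(1.0, (2 * i - 1) / (2 * N) + Dp) for i in range(1, N + 1))
rhs = M * M / (2 * N) + M * Dp + N - M
assert abs(lhs - rhs) < 1e-9
```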
1.8 Proof of Proposition 5
Proof
Apply Theorem 2 to each i and restrict to the almost sure event that (18) holds for all i. Fix \(F_N\) such that \(F_N\in \mathcal {F}_N\) eventually. Then, (18) yields \(\mathbb {E}_{F_N}[c_i(x;\xi _i)]\rightarrow \mathbb {E}_{F}[c_i(x;\xi _i)]\) for every \(x\in X\). Summing over i yields the contrapositive of the c-consistency condition.\(\square \)
1.9 Proof of Proposition 6
Proof
Restrict to a sample path in the almost sure event \(\mathbb {E}_{\hat{F}_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]\), \(\mathbb {E}_{\hat{F}_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]\) for all i, j. Consider any \(F_N\) such that \(F_N\in \mathcal {F}^\alpha _{\text {CEG},N}\) eventually. Then clearly \(\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]\), \(\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]\).
Consider any \(F_N\) such that \(F_N\in \mathcal {F}^\alpha _{\text {DY},N}\) eventually. Because covariances exist, we may restrict to N large enough so that \(\left| \left| \hat{\varSigma }_N\right| \right| _2\le M\) (operator norm) and \(F_N\in \mathcal {F}^\alpha _{\text {DY},N}\). Then we get
and
which gives \(\left| \left| \mathbb {E}_{F_0}[\left( \xi -\hat{\mu }_N\right) \left( \xi -\hat{\mu }_N\right) ^T]-\hat{\varSigma }_N\right| \right| _2\le M\max \left\{ \gamma _{2,N}(\alpha )-1,\,1-\gamma _{3,N}(\alpha )\right\} \rightarrow 0\). Thus, again we have \(\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]\), \(\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]\).
In either case we get \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F}[c(x;\xi )]\) for any x due to factorability as in (26). This yields the contrapositive of the c-consistency condition.\(\square \)
1.10 Proof of Theorem 6
Proof
If \(F_0\ne F\), then Theorem 1 of [44] yields that either \(F_0\npreceq _\text {LCX}F\) or there is some \(j=1,\dots ,d\) such that \(\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]\). If \(F_0\npreceq _\text {LCX}F\), then the probability of rejection approaches one since \(C_N>0\) but \(Q_{C_N}(\alpha _1)\rightarrow 0\). Otherwise, \(F_0\preceq _\text {LCX}F\) yields \(\mathbb {E}_{F_0} [\xi _i^2]\le \mathbb {E}_{F} [\xi _i^2]\) for all i via (12) using \(a=e_i\) and \(\phi (\zeta )=\zeta ^2\). Then \(\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]\) must mean that \(\mathbb {E}_{F_0}\left[ \left| \left| \xi \right| \right| _2^2\right] <\mathbb {E}_{F}\left[ \left| \left| \xi \right| \right| _2^2\right] \), and the probability of rejection again goes to one.\(\square \)
1.11 Proof of Theorem 7
Proof
Let \(R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| _2<\infty \). Restrict to the almost sure event that \(\hat{F}_N\rightarrow F\). Consider \(F_N\) such that \(F_N\in \mathcal {F}_N\) eventually, and let N be large enough that this holds. Fix \(\left| \left| a\right| \right| _2=1\). Let \(a_1=a\) and complete it to an orthonormal basis \(a_1,a_2,\dots ,a_d\) of \(\mathbb {R}^{d}\). On the one hand we have \(Q_{R_N}(\alpha _2)\ge \mathbb {E}_{\hat{F}_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] \). On the other hand, for each i,
Therefore, \(q_N=Q_{R_N}(\alpha _2)+(d-1)p_N\ge \mathbb {E}_{\hat{F}_N}\left[ (a^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ (a^T\xi )^2\right] \) and \(Q_{R_N}(\alpha _2),Q_{C_N}(\alpha _1),p_N,q_N\rightarrow 0\). Let \(G_N(t)=F_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]\) and \(\hat{G}_N(t)=\hat{F}_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]\) be the CDFs of \(a^T\xi \) under \(F_N\) and \(\hat{F}_N\), respectively. Then,
Because \(\hat{F}_N\rightarrow F\), we get \(\hat{G}_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )\) and therefore \(G_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )\) at every continuity point t. Because this is true for every a, the Cramér–Wold device yields \(F_N\rightarrow F\). This is the contrapositive of the uniform consistency condition.\(\square \)
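The proof completes \(a\) to an orthonormal basis and sums \((a_i^T\xi )^2\) over the basis; the identity behind that step, \(\sum _{i=1}^d(a_i^T\xi )^2=\left| \left| \xi \right| \right| _2^2\), is easy to check numerically. A pure-Python sketch with a small Gram–Schmidt completion (the helper name `complete_orthonormal` and all data are illustrative, not from the paper):

```python
import math
import random

def complete_orthonormal(a, d, tol=1e-9):
    """Return an orthonormal basis of R^d whose first vector is a/||a||."""
    basis = []
    # Candidates: a itself, then the standard basis; skip near-dependent vectors.
    for v in [a] + [[float(i == j) for j in range(d)] for i in range(d)]:
        w = v[:]
        for b in basis:  # Gram-Schmidt: remove components along current basis
            dot = sum(x * y for x, y in zip(w, b))
            w = [x - dot * y for x, y in zip(w, b)]
        nrm = math.sqrt(sum(x * x for x in w))
        if nrm > tol:
            basis.append([x / nrm for x in w])
        if len(basis) == d:
            break
    return basis

random.seed(1)
d = 4
a = [random.gauss(0, 1) for _ in range(d)]
basis = complete_orthonormal(a, d)

xi = [random.gauss(0, 1) for _ in range(d)]
lhs = sum(sum(x * y for x, y in zip(b, xi)) ** 2 for b in basis)
rhs = sum(x * x for x in xi)
assert abs(lhs - rhs) < 1e-8  # sum_i (a_i^T xi)^2 = ||xi||_2^2
```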
1.12 Proof of Theorem 8
Proof
The proof amounts to dualizing a finite-support \(\phi \)-divergence as in Theorem 1 of [3] and is included here only to keep the paper self-contained.
Let \(\phi _{X}(t)=(t-1)^2/t\) and \(\phi _{G}(t)=-\log (t)+t-1\) (corresponding to “\(\chi ^2\)-distance” and “Burg entropy” in Table 2 of [3], respectively). Then we can write
Therefore, letting \(c_j=c(x;\hat{\xi }^j)\) and fixing \(\phi \) and Q appropriately, the inner problem (4) is equal to
By Fenchel duality, the above is equal to
where \(\phi ^*(\tau )=\sup _{\rho \ge 0}(\tau \rho -\phi (\rho ))\) is the convex conjugate. Therefore, the inner problem (4) is equal to
It is easy to verify that
(see e.g. Table 4 of [3]). In the case of \(X_N\), since \(s\ge 0\), we get
In the case of \(G_N\), since \(s\ge 0\), we get
\(\square \)
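The closed-form conjugates invoked above — per Table 4 of [3], \(\phi _X^*(\tau )=2-2\sqrt{1-\tau }\) and \(\phi _G^*(\tau )=-\log (1-\tau )\), both for \(\tau <1\) — can be checked against a direct grid maximization of \(\tau \rho -\phi (\rho )\). A rough numeric sketch (grid bounds and tolerances are ad hoc):

```python
import math

phi_X = lambda t: (t - 1) ** 2 / t        # "chi^2-distance"
phi_G = lambda t: -math.log(t) + t - 1    # "Burg entropy"

def conj_grid(phi, tau, hi=20.0, n=200000):
    """Crude grid approximation of phi*(tau) = sup_{rho >= 0} (tau*rho - phi(rho))."""
    return max(tau * r - phi(r) for r in (hi * (k + 1) / n for k in range(n)))

for tau in (-1.0, 0.0, 0.5, 0.9):
    cf_X = 2 - 2 * math.sqrt(1 - tau)  # closed form, valid for tau < 1
    cf_G = -math.log(1 - tau)          # closed form, valid for tau < 1
    assert abs(conj_grid(phi_X, tau) - cf_X) < 1e-3
    assert abs(conj_grid(phi_G, tau) - cf_G) < 1e-3
```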
1.13 Proof of Theorem 9
Proof
Problem (3) is equal to the optimization problems of Theorem 8 augmented with the variable \(x\in X\), and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except \(x\in X\) and (28) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for \(x\in X\). We polynomially reduce separation over \(c_j\ge \max _kc_{jk}(x)\) at fixed \(c'_j,x'\) to the oracles. We first call the evaluation oracle for each k to check violation; if there is a violation and \(k^*\in \arg \max _kc_{jk}(x')\), then we call the subgradient oracle to get \(s\in \partial c_{jk^*}(x')\) with \(\left| \left| s\right| \right| _\infty \le 1\) and produce the separating hyperplane \(0\ge c_{jk^*}(x')-c_j+s^T(x-x')\).
\(\square \)
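The epigraph-separation step above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the helper `separate_epigraph` and the scalar pieces are hypothetical, and the subgradient oracles are supplied as plain functions.

```python
def separate_epigraph(pieces, subgrads, c_j, x):
    """Separation for the constraint c_j >= max_k c_jk(x).

    pieces:   list of piece functions c_jk; subgrads: matching subgradient oracles.
    Returns None if (c_j, x) is feasible; otherwise a cut (beta, s) such that
    c_j >= beta + s*(x - x') holds for all feasible points and fails at (c_j, x).
    """
    vals = [f(x) for f in pieces]
    k_star = max(range(len(pieces)), key=lambda k: vals[k])
    if c_j >= vals[k_star]:
        return None              # no piece is violated: the point is feasible
    s = subgrads[k_star](x)      # subgradient of the maximizing piece at x
    return vals[k_star], s       # hyperplane c_j >= vals[k*] + s*(x - x')

# Toy check with c_j1(x) = x, c_j2(x) = -x, so max_k c_jk(x) = |x| (x scalar):
pieces = [lambda x: x, lambda x: -x]
subgrads = [lambda x: 1.0, lambda x: -1.0]
assert separate_epigraph(pieces, subgrads, c_j=2.0, x=1.5) is None  # 2.0 >= |1.5|
beta, s = separate_epigraph(pieces, subgrads, c_j=0.0, x=1.5)       # 0.0 <  |1.5|
assert (beta, s) == (1.5, 1.0)  # cut: c_j >= 1.5 + 1.0*(x - 1.5), i.e. c_j >= x
```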
1.14 Proof of Theorem 10
Proof
Substituting the given formulas for \(K_{S_N},\,A_{S_N},\,b_{S_N,\alpha }\) for each \(S_N\in \{D_N,V_N,W_N,U_N,A_N\}\) in \(A_{S_N}\zeta -b_{S_N,\alpha }\in K_{S_N}\), we obtain exactly \(S_N(\zeta _1,\dots ,\zeta _N)\le Q_{S_N}(\alpha )\) for \(S_N\) as defined in (8). We omit the detailed arithmetic.\(\square \)
1.15 Proof of Theorem 13
Proof
Under these assumptions, (3) is equal to the optimization problems of Theorem 11 or 12 augmented with the variable x, and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except \(x\in X\) and (30) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for \(x\in X\). By continuity and the given structure of \(c(x;\xi )\), we may rewrite (30) as
We polynomially reduce weak \(\delta \)-separation over the kth constraint at fixed \(c'_{i},x'\) to the oracles. We call the \(\delta \)-optimization oracle to find \(\xi '\in [\xi ^{(i-1)},\xi ^{(i)}]\) such that \(c_k(x';\xi ')\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x';\xi )-\delta \). If \(c_i'\ge c_k(x';\xi ')\), then \((c_i'+\delta ,x')\) satisfies the constraint and is within \(\delta \) of \((c_i',x')\). If \(c_i'< c_k(x';\xi ')\), then we call the subgradient oracle to get \(s\in \partial _x c_{k}(x',\xi ')\) with \(\left| \left| s\right| \right| _\infty \le 1\) and produce the hyperplane \(c_i\ge c_k(x';\xi ')+s^T(x-x')\), which is violated by \((c_i',x')\); for any \((c_i,x)\) satisfying (55) (in particular, in its \(\delta \)-interior) we have \(c_i\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )\ge c_k(x;\xi ')\ge c_k(x';\xi ')+s^T(x-x')\) since s is a subgradient. The case for constraints (31) is similar.
\(\square \)
1.16 Proof of Proposition 7
Proof
By Theorem 11 and the observations in Example 2, after renaming variables, the DRO (3) is given by
Applying linear optimization duality we get that its dual is
It can be directly verified that the following primal and dual solutions are respectively feasible
and that both have objective cost in their respective programs of
This proves optimality of x. Adding \(0=(c-b)\theta x-(r-c)(1-\theta )x\) to the above yields the form of the optimal objective given in the statement of the result. \(\square \)
1.17 Proof of Theorem 14
Proof
Fix x. Let \(S=\{(a,b)\in \mathbb {R}^{d+1}:\left| \left| a\right| \right| _1+\left| b\right| \le 1\}\). Using the notation of [46], letting C be the cone of nonnegative measures on \(\varXi \) and \(C'\) the cone of nonnegative measures on S, we write the inner problem as
Invoking Proposition 2.8 of [46] (with the generalized Slater point equal to the empirical distribution), we have that the strongly dual minimization problem is
If (40) is true, we can infer from constraint (57) that
This shows that \(\tau =0\) is the only feasible choice.
In general, \(\tau =0\) is a feasible choice, and fixing it so provides an upper bound on (56) and hence on the original inner problem by weak duality. Moreover, plugging \(\xi _0\) into (57), we conclude that
Hence, setting \(\tau =0\) in (57) and replacing \(\tau \) by the above bound in the objective provides a lower bound on (56) and hence on the original inner problem.
In order to study both cases, and both the upper and lower bounds in the latter case, we consider for the rest of the proof the following general problem given \(\eta \) and \(\nu \),
We first rewrite (59) using the representation (37):
Next, invoking [46], and employing the concave conjugate, we rewrite the kth of these constraints as follows:
Introducing the variables \(H_k\) and this equivalent constraint into the problem (58) and invoking [46] we find that (58) is equal to the dual problem:
Since \(c_k(x;\xi )\) is closed concave in \(\xi \), \(c_{k**}(x;\xi )=c_k(x;\xi )\). Moreover, recognizing that \(\psi _k=\max \left\{ a^Tq_k-p_kb,0\right\} \) is optimal, we can rewrite the above problem as:
Next, we rewrite (61) by splitting the sum of K maxima over its different branches, noting that the all-zeros branch is trivial:
Next, we invoke linear optimization duality to rewrite the \(\gamma \)th constraint as follows:

Introducing the variables \(\mu _\gamma ,\rho _\gamma ,u_\gamma ,v_\gamma ,u'_\gamma ,v'_\gamma \) and this equivalent constraint into (60) and invoking linear optimization duality yields \(\mathcal {C}'(x;\nu ,\eta )\) and the result.\(\square \)
1.18 Proof of Theorem 15
Proof
Under these assumptions, weak optimization is polynomially reducible to weak separation (see [23]), and separation is given for all constraints but (39). We polynomially reduce weak \(\delta \)-separation over the kth constraint at fixed \(x',h'_k,g'_k\). First, we call the concave conjugate oracle to find \(\xi '\) such that \( {h_k'}^T\xi '-c_k(x';\xi ')\le c_{k*}(x';h_k')+\delta . \) If \(g_k'\le {h_k'}^T\xi '-c_k(x';\xi ')\), then \((x',h'_k,g'_k-\delta )\) satisfies the constraint and is within \(\delta \) of the given point. Otherwise, we call the subgradient oracle to get \(s\in \partial _x c_{k}(x',\xi ')\) with \(\left| \left| s\right| \right| _\infty \le 1\) and produce the hyperplane \(g_k\le h_k^T\xi '-c_k(x';\xi ')-s^T(x-x')\), which is violated by the given point and is valid whenever the original constraint is (in particular, in its \(\delta \)-interior).\(\square \)
Bertsimas, D., Gupta, V. & Kallus, N. Robust sample average approximation. Math. Program. 171, 217–282 (2018). https://doi.org/10.1007/s10107-017-1174-z
Keywords
- Sample average approximation of stochastic optimization
- Data-driven optimization
- Goodness-of-fit testing
- Distributionally robust optimization
- Conic programming
- Inventory management
- Portfolio allocation