Journal of Monetary Economics 69 (2015) 114–120
Four types of ignorance: Discussion

Harald Uhlig
Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, USA
NBER, USA
CEPR, UK
Article history:
Received 12 December 2014
Accepted 12 December 2014
Available online 19 December 2014

Keywords: Risk; Robustness

Abstract: This is a comment on Hansen and Sargent (2015). Using a generalized one-period setup, I provide some intuition and perspective.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Let me try to give my own “bird's-eye” perspective of this beautiful paper. The authors consider a version of a problem posed by Friedman (1953). A policy maker seeks to minimize a discounted sum of E[X_{t+1}^2], where

X_{t+1} = κ X_t + β U_t + α W_{t+1}   (1)

and where, as a benchmark, κ, β, α are known parameters, U_t is chosen by the policy maker, and W_{t+1} is noise: as a benchmark, it is normally distributed with mean zero and a given variance. The authors consider four cases of ignorance or distrust of the original model (1). The first follows Friedman (1953), in that the policy maker does not know β, but has a Bayesian prior over β. The other three examine the problem of a robust decision maker, who is concerned that the outcome X_{t+1} is particularly bad, due to

2. … a distribution for β nearby the prior.
3. … a distribution for W_{t+1} nearby the benchmark normal.
4. … a distribution for W_{t+1} of a particular form and nearby the benchmark normal.

After solving the Friedman case, the authors proceed to solve the next three, new cases. They show that concerns about nearby distributions for β give rise to even more caution than the Friedman case, compared to knowing β for sure. Here, the authors examine “multiplier preferences” and “constraint preferences” to constrain how far the decision maker will go in contemplating alternative distributions. The authors provide a lucid and useful discussion of a numerical example illustrating this case. The authors show that concerns about nearby distributions for W_{t+1} do not give rise to extra
☆ This research has been supported by the NSF Grant SES-1227280.
Correspondence address: Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, USA. E-mail address: huhlig@uchicago.edu
¹ I have an ongoing consulting relationship with a Federal Reserve Bank, the Bundesbank and the ECB.
http://dx.doi.org/10.1016/j.jmoneco.2014.12.006
caution, and leaves the decision for U_t unaltered, compared to the case when β is known. However, specific misspecifications as in the last case reintroduce caution.
This is a beautiful and insightful paper indeed. One can perhaps view it as an introduction to what is possible with the tools provided in the book treatment by Hansen and Sargent (2008), whetting the reader's appetite. One can also view it as a paper about the scope and the need for caution for policy makers, when the “true” model is not known: a situation certainly reasonable and appealing to many policy makers out there.
Generally, I found the calculations and discussions about the second case of ignorance (concerns about misspecifying the
distribution for β) the most interesting: I can well imagine that many readers will focus on that case for reading and further
research. By contrast, the last case of ignorance is the hardest to digest, and it may well be that future readers will tend to
skip that part. That, obviously, would be a shame. The paper is not exactly for the faint of heart, and there occasionally is
quite a bit of algebra and notation to digest, before one gets to see the key insights. The key insights are quite a reward for
this effort, though, so this is a price well worth paying. In my discussion, I shall try to abbreviate the path to these insights a
bit: precision gets lost and the insights get a bit distorted as a result. But every good game needs a cheat sheet, so here it is.
Or perhaps, this perspective is even more daunting than the original? In that case, hopefully it is a useful complement.
2. A generalized perspective
Let me strip the problem down a bit. There is a quadratic objective function h(·). There are some univariate and independent normal probability distributions F and G. There are some other univariate probability distributions F̃ and G̃ under consideration for the policy maker. There is a notion of distance d(·,·) between distributions. Here, Hansen and Sargent (2015) use relative entropy (and let us not dwell much here on what one needs to do to make that a proper “distance” and whether and how their results might change as a result).
With these pieces on the table, the policy maker seeks to solve

V(x) = max_u min_{F̃,G̃} { E[h(y)] + e^{−δ} V(y) | d(F, F̃) ≤ η, d(G, G̃) ≤ γ, y = x + βu + αw, β ∼ F̃, w ∼ G̃ }
Friedman considered the case γ = η = 0. There are two places to introduce concerns for robustness:

1. “Robust-A” sets γ = 0, focusing on robustness concerns and alternative distributions F̃ regarding the feedback coefficient β.
2. “Robust-B” sets η = 0, focusing on robustness concerns and alternative distributions G̃ regarding the disturbance term w.
The use of relative entropy for calculating “distance” has a number of advantages. There is a natural interpretation as a
likelihood. One obtains analytical and closed-form results. There is some really beautiful algebra, resulting from it all. But
there is also a price to be paid. The algebra can get quite dense. The plots require numerics, anyhow. And one may wonder
which results depend specifically on using relative entropy and which results obtain more generally, when thinking about
“nearby” probability distributions.
So let me strip down the problem some more, by eliminating the dynamics. Now the policy maker solves

V(x) = max_u min_{F̃,G̃} { E[h(y)] | d(F, F̃) ≤ η, d(G, G̃) ≤ γ, y = x + βu + αw, β ∼ F̃, w ∼ G̃ }
It then is quite natural to write down the Lagrangian

V(x) = max_u min_{F̃,G̃} { E[h(y)] + θ d(F, F̃) + λ d(G, G̃) | y = x + βu + αw, β ∼ F̃, w ∼ G̃ }
From this, one can understand the two ways in which Hansen and Sargent approach solving for the robust solutions:
1. For “constraint preferences” or the operator C²: fix η, γ. Then θ, λ depend on (x, u).
2. For “multiplier preferences” or the operator T²: fix θ, λ. Then η, γ depend on (x, u).
For their last case, they also consider what I shall call “Robust-C”, where λ = λ(x, u) for some specific function of x, u.
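For readers who want to experiment, the inner minimization under multiplier preferences with relative entropy has a well-known closed form: min over p̃ of Σᵢ p̃ᵢ hᵢ + θ·KL(p̃‖p) equals −θ log Σᵢ pᵢ e^{−hᵢ/θ}, attained at the exponentially tilted p̃ᵢ ∝ pᵢ e^{−hᵢ/θ}. The following Python sketch (my own illustration, not from the paper; all numbers are arbitrary) verifies this against a brute-force search for a two-point distribution:

```python
import math

def kl(q, p):
    """Relative entropy KL(q||p) for two-point distributions (q, 1-q) vs (p, 1-p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def inner_value_closed_form(h, p, theta):
    """min over p-tilde of E_ptilde[h] + theta*KL(ptilde||p), via exponential tilting."""
    z = p * math.exp(-h[0] / theta) + (1 - p) * math.exp(-h[1] / theta)
    return -theta * math.log(z)

def inner_value_brute_force(h, p, theta, steps=200_000):
    """Grid search over the tilted probability q placed on the first support point."""
    best = float("inf")
    for i in range(1, steps):
        q = i / steps
        best = min(best, q * h[0] + (1 - q) * h[1] + theta * kl(q, p))
    return best

h = (1.0, 3.0)       # losses at the two support points (illustrative)
p, theta = 0.7, 2.0  # benchmark probability and entropy multiplier (illustrative)
v_closed = inner_value_closed_form(h, p, theta)
v_brute = inner_value_brute_force(h, p, theta)
print(v_closed, v_brute)  # the two values should agree to several decimals
```

The closed form is what makes the “risk-sensitive” recursions in Hansen and Sargent tractable; the brute-force check is only there to confirm the algebra.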
Let me analyze this a bit further. Let me first substitute out y and state the Lagrangian problem as

max_u min_{F̃,G̃} E_{F̃,G̃}[h(x + βu + αw)] + θ d(F, F̃) + λ d(G, G̃)   (2)

where β ∼ F̃ and w ∼ G̃. Given x, suppose the solution is ū, F̄, Ḡ. What does one typically do, when faced with a (differentiable) maximization problem? Yes, let us calculate first-order conditions.
Doing so for probability distributions may feel a bit strange, but it is not that hard. To warm up, consider the case where the F's are probability distributions over finitely many given points β₁, …, βₙ. Then, some F can be represented as a vector in Rⁿ with the property that each vector entry is between 0 and 1 and that the vector entries sum to unity. Let us consider a function f(F) from such probability vectors to the real line. Taking the derivative now just means taking the derivative of a function defined² on Rⁿ. The derivative, evaluated at some F̄, is a gradient or “row” vector v = f′(F̄) in Rⁿ. One can think of that as a linear mapping, telling us what it will do to some other vector z ∈ Rⁿ, per calculating the inner product v(z) = v·z. So, for some other distribution F̂ and with z = F̂ − F̄, this is the term that shows up in the usual first-order linear approximation

f(F̂) ≈ f(F̄) + v·(F̂ − F̄)

This is basic calculus, of course. And it is also basic calculus that this approximation is exact, if f(·) happens to be a linear or affine-linear function. And finally, the first-order condition of setting that derivative to zero is the same³ as stating that the inner product v·(F̂ − F̄) is zero, for all probability vectors F̂.
For more general probability distributions, these derivatives turn into Fréchet derivatives: once again, they are linear mappings. The first-order conditions now say that, for all F̂, Ĝ,

∂/∂u:  0 = E_{F̄,Ḡ}[β h′(x + βū + αw)]   (3)
∂/∂F̃:  0 = E_{F̂−F̄,Ḡ}[h(x + βū + αw)] + θ d₂(F, F̄)(F̂ − F̄)   (4)
∂/∂G̃:  0 = E_{F̄,Ĝ−Ḡ}[h(x + βū + αw)] + λ d₂(G, Ḡ)(Ĝ − Ḡ)   (5)

where I have exploited the linearity of the expectation operator, as a function of the probability distribution, and where, say, E_{F̂−F̄,Ḡ}[h(y)] stands for the difference between the expectation of h(y) once calculated using the distribution F̂ and once calculated using the distribution F̄.
3. Robust-A: intuition
One can use this generalized perspective to gain some intuition for the “Robust-A” case and the results in Figs. 1 and 4 in Hansen and Sargent. Start with the constraint preferences case. A given amount of distance in the distribution for β is now permitted. Graphically, this is presented in Fig. 1 as some variation for the outcome x + βu, indicated by the dotted vertical lines. Drawn this way, it is hard to distinguish between Bayesian uncertainty regarding β or robustness concerns regarding the distribution for β: perhaps then it is not surprising that both deliver rather similar decision rules indeed, as Fig. 4 in Hansen and Sargent shows. Examine now the resulting variation in the objective function h(x + βu + αw). It is larger for larger x: this is obvious for u = 0, and it still holds generally, if the decision rule u(x) does not overcompensate for x, as is plausible. The larger resulting variation in the objective function h translates into a larger potential loss: this is the “fear” expressed in the robust-control max–min calculus of always having to play against the worst distribution. The larger potential loss means a larger shadow value θ on constraining the original variation in β, or a larger Lagrange multiplier for that constraint, for larger x. This is the result in Fig. 4 in Hansen–Sargent, and I suspect that it carries over to this more general perspective.
One can now turn this perspective on its head and enforce a multiplier θ of the same size, or the same utility costs for misspecifications regarding the β-distribution, as is done in the multiplier preference case. In Fig. 2, this is shown as imposing the same range of variation for the objective function h, which now translates into a much narrower range for x + βu, if x is larger. Unless the original distribution F is a point mass, this now translates into a value for u closer to zero. This is the result in Fig. 1 of Hansen and Sargent for larger x. It is offset for small values of x by the optimal linear reaction to x, when β is known: there, the objective function h is sufficiently flat, so that the force of narrowing the range for x + βu in Fig. 2 is too small by comparison. Hansen and Sargent in their Fig. 1 show a hump-shaped decision rule for u as a function of x > 0 (and the same for x < 0): approximately linear in x for small x, but converging to zero for larger x. The arguments here suggest that once again this insight may be considerably more general (possibly allowing for multiple humps), but I have not tried to work out the algebra. I suspect that one can make considerable progress here by investigating the implications of the first-order conditions stated above or, potentially, considering second-order considerations.
² Strictly speaking, it may only be defined on the simplex of probability vectors.
³ There is a subtle point here, in that we only insist on the derivative being zero in the “direction” of other probability distributions. Indeed, we might have been “lazy” and defined f only on the simplex of probability vectors.
[Figure omitted: the objective function h plotted against y, with the ranges x₁ + βu and x₂ + βu marked by dotted vertical lines.]
Fig. 1. Intuition for Fig. 4 in Hansen–Sargent and the Robust-A constraint-preferences case.
[Figure omitted: the objective function h plotted against y, with the ranges x₁ + βu and x₂ + βu marked.]
Fig. 2. Intuition for Fig. 1 in Hansen–Sargent and the Robust-A multiplier-preferences case.
4. Robust-B: the no-increase-in-caution result
Consider now the “Robust-B” problem of focusing on robustness concerns and alternative distributions G̃ regarding the disturbance term w. Hansen and Sargent show the surprising and negative “no-increase-in-caution” result in Section 5, that such robustness concerns do not alter the policy choice u. Can we get some insight into that result, using the generalized perspective above?
To that end, examine the first-order condition (5) with respect to G̃. Suppose that Ĝ differs from Ḡ by a small shift ε in w-space. Indeed, the shift in the mean of the normal distributions considered by Hansen and Sargent is exactly of this kind. Calculate

E_{F̄,Ĝ}[h(x + βū + αw)] = E_{F̄,Ḡ}[h(x + βū + α(w + ε))] ≈ E_{F̄,Ḡ}[h(x + βū + αw)] + εα E_{F̄,Ḡ}[h′(x + βū + αw)]

Recall the first FOC (3). Let us additionally assume that F̄ is a point mass, i.e., that β is known with certainty, and additionally, that it is different from zero. Then, (3) implies E_{F̄,Ḡ}[h′(x + βū + αw)] = 0. Together with the calculation above, this implies

E_{F̄,Ĝ−Ḡ}[h(x + βū + αw)] = 0

Plug this into the third first-order condition (5): the first term drops out and what remains is

0 = λ d₂(G, Ḡ)(Ĝ − Ḡ)

The only way to satisfy this equation for a proper distance measure is per Ĝ − Ḡ = 0. Put differently, adding some extra shift ε only “costs distance” and it is better avoided. Allowing that extra flexibility of picking an ε-shifted Ĝ does not yield any
advantages to the “evil” probability-distribution-picking agent. And therefore, the original solution ū is still the solution in this “Robust-B” problem.
It seems to me that this may be what is truly at heart and behind the “no-increase-in-caution” result in Hansen–Sargent.
This somewhat heuristic reasoning shows that one may obtain this result rather quickly from examining the basic first-order
conditions, with little additional algebra.
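The heuristic argument can be checked in a toy version of the problem. In the Python sketch below (again my own construction, with arbitrary illustrative numbers), h(y) = −y², β is known, and the evil agent may only shift the mean of w ∼ N(0,1) by ε, at an entropy cost of ε²/2 scaled by a multiplier λ. The worst-case shift at the robust optimum turns out to be zero, and the robust u coincides with the non-robust choice u = −x/β:

```python
# Toy Robust-B check: h(y) = -y^2, known beta, evil agent shifts the mean of w.
# For w ~ N(0,1): E[h(x + beta*u + w + eps)] = -((x + beta*u + eps)^2 + 1),
# and the entropy of an eps-shifted standard normal is eps^2 / 2.

LAM = 5.0   # entropy multiplier (needs LAM > 2 for a well-posed inner problem here)
BETA = 1.0  # known feedback coefficient

def worst_eps(u, x):
    """Closed-form inner minimizer: d/deps[-(s + eps)^2 + LAM*eps^2/2] = 0, s = x + BETA*u."""
    s = x + BETA * u
    return 2.0 * s / (LAM - 2.0)

def robust_value(u, x):
    s = x + BETA * u
    eps = worst_eps(u, x)
    return -((s + eps) ** 2 + 1.0) + LAM * eps ** 2 / 2.0

def u_star(x, lo=-4.0, hi=4.0, steps=8001):
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda u: robust_value(u, x))

x = 1.5
u = u_star(x)
print(u, worst_eps(u, x))  # u = -1.5, the non-robust choice, and a zero worst-case shift
```

Setting s = x + βu, the robust value is −(5s/3)² − 1 + (5/2)(2s/3)² = −(5/3)s² − 1, maximized at s = 0: exactly the no-increase-in-caution logic of the text.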
5. Caution versus petrification
The algebra for the “Robust-B” case also shows the limits of the “no-increase-in-caution” result. If the Ĝ alternatives were something other than shifts of the entire distribution by some ε, it does not seem likely that the “no-increase-in-caution” result still obtains. In fact, the “no-increase-in-caution” result might be something like a benchmark in the middle. Depending on the type of Ĝ alternatives, one might get an increase in caution: this is, in essence, what Hansen and Sargent strive for in their analysis of the Robust-C case. But one might just as well get the opposite, and a decrease in caution – or better, a greater nervousness by the policy maker to do something in order to avoid disaster. Sometimes, the worst thing is to do nothing at all, and the most cautious course of action is to act decisively. When a truck is coming straight at you on a narrow road, veer to the left or veer to the right. Do not keep heading straight for the truck.
It is not hard to come up with real-life policy scenarios where there was a strong desire by policy makers to do something: whether rightfully so would require a full analysis. Examples are the 2008 financial crisis, the 2010 European
debt crisis, the Ebola outbreak in 2014 and so forth. One fascinating possibility with the framework provided to us by
Hansen and Sargent is to shed light on when caution implies decisive action. It seems to me that this possibility already
arises in the simple case discussed here.
6. A Bayesian interpretation
Notice that the first-order conditions above are the same, no matter whether the problem is stated as max_u min_{F̃,G̃} or as min_{F̃,G̃} max_u. This is familiar from standard calculus. But while the first way of stating the problem is a robust control decision problem, the second way of stating the problem becomes a Bayesian decision problem for u, given the selected distributions of the “outer” min_{F̃,G̃} step. So, the key for re-interpreting the problem in a Bayesian fashion may be whether it is indeed legitimate to characterize the optimum using first-order conditions only, or whether the sequence of optimization matters. Standard calculus reasoning may go a long way to resolve this in any specific formulation.
7. Entropy, catastrophes and policy objectives
In my discussion, I have used a general distance measure to evaluate “nearby” distributions. Hansen and Sargent argue
that relative entropy is particularly attractive here. The key argument is that policy makers presumably wish to guard
against alternatives that are hard to detect econometrically. It is important to recognize, however, that there is no a priori
reason for this principle. It may be fine for some applications. It may not be fine for others.
It is plausible that policy makers do not care so much about fine-tuning for some small misspecification of their model at
hand, but really care about guarding against catastrophes. Central bankers like to avoid the specter of deflation. Policy
makers generally like to avoid the meltdown of economies. I buy into the Hansen–Sargent perspective that policy makers do
not care so much about getting it exactly right as they care about not getting it exactly wrong. But “not getting it exactly
wrong” may be about some catastrophic scenario, whether easy to detect or not. Moreover, it is fine to ascribe some
particular objective function to policy makers for the purpose of analysis: in practice, they might be hard-pressed to be
specific enough about it, and that may matter too. In any case, the objective of the policy maker ought to matter for
calculating “distance”. Let me illustrate these points with two abstract examples, in the language of the Hansen–Sargent
formulation.
For the first example, suppose that F and the alternative F̃ both are distributions on β ∈ {1, 1000}, assigning probability weights {1 − ε, ε} and {1 − ε̃, ε̃} respectively, where 0 < ε < ε̃ < 1. Indeed, suppose that both are small, but that ε is vastly smaller than ε̃: ε = e^{−10,000} and ε̃ = e^{−6} ≈ 1/400 will do. The likelihood ratio m(β) is given by the two numbers {(1 − ε̃)/(1 − ε), ε̃/ε}, or approximately {1, e^{9994}} for the specific numbers given. The relative entropy ent(F̃, F) is

ent(F̃, F) = (1 − ε̃) log((1 − ε̃)/(1 − ε)) + ε̃ log(ε̃/ε) ≈ 25

for the specific example. For relative entropies, that is pretty far. For comparison: for normal densities, the mean would need to be shifted by about seven standard deviations to create that kind of an entropy-distance. Pretty far indeed.
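These two numbers are easy to reproduce. Since ε = e^{−10,000} underflows ordinary floating point, the Python sketch below (my own illustration) works directly with log ε = −10,000 and log ε̃ = −6, and then backs out the equivalent mean shift for unit-variance normals, using ent(N(μ,1), N(0,1)) = μ²/2:

```python
import math

log_eps = -10_000.0   # log of the tiny benchmark tail probability
log_eps_tilde = -6.0  # log of the alternative tail probability

eps_tilde = math.exp(log_eps_tilde)
# ent(F_tilde, F) = (1 - eps~) * log((1 - eps~)/(1 - eps)) + eps~ * (log eps~ - log eps);
# log(1 - eps) is numerically zero for eps = e^{-10,000}.
ent = (1 - eps_tilde) * math.log1p(-eps_tilde) + eps_tilde * (log_eps_tilde - log_eps)
print(ent)  # roughly 25

# Equivalent mean shift for N(mu, 1) versus N(0, 1): ent = mu^2 / 2
mu = math.sqrt(2 * ent)
print(mu)   # about seven standard deviations
```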
So, a Hansen–Sargent policy maker should not be concerned about the alternative F̃. Essentially, if β = 1000 gets realized in a short sample, the policy maker can be quite sure that it is really the alternative distribution describing the uncertainty regarding β: it is an easily distinguishable probability distribution.

But consider such a policy maker. Perhaps, the policy maker typically takes some action U_t rather different from zero, which then is justified by a “normal times” draw of β = 1: perhaps this has happened a hundred times in a row and it should happen many more times under F, and there is no good reason to worry about β = 1000 under F. But then suddenly
[Figure omitted: two panels plotting the objective functions h(y) = −y² and h(y) = 1 − exp(y²) against y, over a narrow range and a wide range of y.]
Fig. 3. Comparing two objective functions.
β = 1000 gets drawn. The policy maker now knows that, very likely, F̃ and not F is at work. But by now, it is too late: if some action U_t reasonably far from zero has already been taken, this creates a huge loss per X_{t+1} = X_t + 1000·U_t + α W_{t+1}. It seems to me that the policy maker would have been well advised to be cautious against the possibility F̃, even though it is far away in an entropy sense.
It may well be that this is a small sample problem. If the policy maker had a reasonably long sample of β-draws before embarking on policy choices U_t, there would have been no problem in distinguishing F from F̃. But is not that often precisely the problem policy makers face? Their samples are small, and there is a rich amount of data to consider. Tail probabilities for rare events and catastrophes matter, and policy makers do not have the luxury to wait for a long time to decide whether these tail probabilities are small or truly tiny: “small” may be too large.
For the second example, and regarding the point that the objective function matters for calculating distance, suppose the objective function is E[1 − e^{y²}] rather than E[−y²]. Fig. 3 compares these two objective functions: they nearly coincide for moderate values of y, but with the new objective function, large values for |y| are far worse than for the quadratic objective. Now, if W_{t+1} and thus X_{t+1} are normally distributed, one may go ahead and calculate the optimal choice for U_t, just as before. But suppose that W_{t+1} and thus X_{t+1} are t-distributed with 1,000,000 degrees of freedom, as the alternative density. As a result, V = −∞, no matter which U_t is chosen: it is pointless now for the policy maker to put much effort into picking a best U_t. More generally, it would not be hard to modify this example a bit and construct “intermediate” cases, where the predictions regarding the optimal U_t under the alternative become pretty wild. So, should not robustness in light of the objective be on the table? What if the policy maker actually is unsure about the objective function? Should perhaps decision rules be sought that are robust to misspecifications of the objective function?
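The knife edge behind this second example is integrability: for y ∼ N(0, σ²), E[e^{y²}] = 1/√(1 − 2σ²) is finite only if σ² < 1/2, and under any t-distribution it is infinite, so E[1 − e^{y²}] = −∞. The Python sketch below (my own illustration, with arbitrary values of σ) makes this visible with a truncated Riemann sum: for σ = 0.3 the truncated integral settles at the closed-form value, while for σ = 0.8 it keeps exploding as the truncation widens:

```python
import math

def truncated_mean_exp_y2(sigma, half_width, n=200_000):
    """Midpoint Riemann sum for E[exp(y^2)], y ~ N(0, sigma^2), truncated at +/- half_width."""
    dy = 2 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * dy
        density = math.exp(-y * y / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += math.exp(y * y) * density * dy
    return total

# sigma^2 = 0.09 < 1/2: converges to the closed form 1/sqrt(1 - 2*sigma^2)
print(truncated_mean_exp_y2(0.3, 10.0), 1 / math.sqrt(1 - 2 * 0.09))

# sigma^2 = 0.64 > 1/2: the truncated integral grows without bound as the truncation widens
print(truncated_mean_exp_y2(0.8, 5.0), truncated_mean_exp_y2(0.8, 10.0))
```

The t-distribution case in the text is even starker, since its tails are heavier than any normal's; the normal case with σ² > 1/2 already suffices to break the expectation.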
8. Conclusion
Hansen and Sargent keep paving the way for thinking about the very practical issue of model misspecification and the search for robust decision rules, see also Uhlig (2012). This one here is a nifty little paper, exposing key elements in a small-scale model. There are lots of insights, lots of food for thought, lots of details to chew on.
I do buy into their perspective that policy makers typically seek to find decision rules that do robustly well, even if the
underlying model has been somewhat misspecified. It is very attractive and plausible from a practical perspective to adopt
the Hansen–Sargent perspective that policy makers do not care so much about getting it exactly right as they care about not
getting it too wrong. Hansen and Sargent push us to take this very practical problem, and think about it using the theoretical
tools of research economists. They discipline their exercise by considering alternatives that are “nearby” in an econometric
sense: essentially, they wish to be robust against alternatives that are hard to distinguish, when using likelihood ratio tests.
Hansen and Sargent reach the conclusion for their particular set-up that this translates into reacting even less to
circumstances.
I believe, though, that it is important to recognize the limitations of this particular conclusion. It may well be that the
cautious thing is to act decisively: indeed this may already arise in their framework, with small modifications. And it may
well be that the misspecification alternatives, which policy makers need to worry about, are very clearly visible and easy to
distinguish econometrically, once they manifest themselves. It is not knowing whether and when they occur, which may be
the important challenge to policy makers. But all these are matters of debate that merit closer attention in future research.
Overall, this is a powerful perspective and paradigm, and a fascinating research agenda. I wish more researchers would
take it up. Hansen and Sargent already have done deep and powerful work exploring big stretches of the coastal areas of this
continent. But a large continent it is. A large terra incognita still awaits us. For those who dare venture into this unknown, I
recommend taking the Hansen–Sargent charts and this particular paper along the way: it will serve them well.
References
Friedman, Milton, 1953. The effects of full employment policy on economic stability: a formal analysis. In: Friedman, Milton (Ed.), Essays in Positive Economics. University of Chicago Press, Chicago, IL, pp. 117–132.
Hansen, Lars Peter, Sargent, Thomas J., 2008. Robustness. Princeton University Press, Princeton, NJ.
Hansen, Lars Peter, Sargent, Thomas J., 2015. Four types of ignorance. J. Monet. Econ. 69, 97–113. http://dx.doi.org/10.1016/j.jmoneco.2014.12.008.
Uhlig, Harald, 2012. Agents as empirical macroeconomists: Thomas J. Sargent's contribution to economics. Scand. J. Econ. 114 (4), 1055–1081.