CS412: Lecture #15
Mridul Aanjaneya
March 12, 2015
We shall turn our attention to solving linear equations
Ax = b
where A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m. We already saw examples of methods that
required the solution of a linear system as part of the overall algorithm, e.g., the
Vandermonde system for interpolation, which was a square system (m = n).
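As a quick refresher, here is a minimal sketch (assuming NumPy; the data points are hypothetical) of setting up and solving such a square Vandermonde system:

    import numpy as np

    # Interpolate 3 hypothetical data points with a quadratic:
    # a square (m = n) Vandermonde system.
    x_data = np.array([0.0, 1.0, 2.0])
    y_data = np.array([1.0, 3.0, 7.0])

    A = np.vander(x_data)                 # 3x3 Vandermonde matrix
    coeffs = np.linalg.solve(A, y_data)   # highest-degree coefficient first
    print(coeffs)                         # [1. 1. 1.], i.e., p(x) = x^2 + x + 1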
Another category of methods that leads to rectangular systems with m > n
is least squares methods. They answer questions of the form:
• What is the best degree-n polynomial we can use to approximate (not interpolate) m + 1 data points (where m > n)?
• More generally, find the solution that most closely satisfies m equations in n (n < m) unknowns (see the sketch after this list).
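For instance, here is a minimal least-squares sketch (assuming NumPy; the data is hypothetical): fitting a line (two unknowns) to five data points gives an overdetermined system, which numpy.linalg.lstsq solves in the least-squares sense.

    import numpy as np

    # Five equations, two unknowns: fit y ≈ slope*x + intercept.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    A = np.vstack([x, np.ones_like(x)]).T          # 5x2 system matrix
    coeffs, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
    slope, intercept = coeffs
    print(slope, intercept)  # no line passes through all 5 points exactly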
All these algorithms must be mindful of error, and there are at least
three sources for it.
• Some algorithms are “imperfect” in the sense that they require several
iterations to generate a good quality approximation. Thus, intermediate
results are subject to error.
• Sometimes, it is not possible to find an “ideal” solution, e.g. because we
have more equations than unknowns. In this case, not all equations will
be satisfied exactly, and we need a notion of the “error” incurred in not
satisfying certain equations fully.
• Inputs to an algorithm are often contaminated by noise, roundoff error, etc.
For example, instead of solving an “intended” system Ax = b we may be
solving A*x = b*, where the entries of A* and b* have been subject to noise
and inaccuracy. It is important to know how those inaccuracies translate to
errors in determining x (illustrated in the sketch below).
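To make the third point concrete, here is a small sketch (assuming NumPy; the matrix is a contrived, nearly singular example): tiny noise in b can produce a much larger change in the computed x.

    import numpy as np

    rng = np.random.default_rng(0)

    # A nearly singular 2x2 system: small input noise, large output change.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])
    b = np.array([2.0, 2.0001])
    x_exact = np.linalg.solve(A, b)          # [1, 1]

    noise = 1e-5 * rng.standard_normal(2)    # tiny perturbation of b
    x_noisy = np.linalg.solve(A, b + noise)

    print(x_exact, x_noisy)  # the solutions can differ in the first decimal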
Vector and Matrix Norms
Norms are valuable tools in arguing about the extent and magnitude of error.
We will introduce some concepts that we will use broadly later on.
Definition A vector norm is a function from R^n to R satisfying certain
properties. If x ∈ R^n, we denote its norm by ||x||. The defining properties
of a norm are:
1. ||x|| ≥ 0 for all x ∈ R^n; also, ||x|| = 0 if and only if x = 0.
2. ||αx|| = |α| · ||x|| for all α ∈ R, x ∈ R^n.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.
Note that the properties above do not determine a unique form of a “norm”
function. In fact, many different valid norms exist. Typically, we will use
subscripts (|| · ||a , || · ||b ) to denote different types of norms.
Vector norms - why are they needed?
When dealing (for example) with the solution of a nonlinear equation f(x) = 0,
the error e = x_approx − x_exact is a single number. Thus, the absolute value
|e| gives us a good idea of the “extent” of the error.
When solving a system of linear equations Ax = b, the exact solution x_exact
as well as any approximation x_approx are vectors, and the error e = x_approx − x_exact
is a vector too! It is not as straightforward to assess the “magnitude” of such a
vector-valued error. For example, consider e1, e2 ∈ R^1000, with

e1 = (0.1, 0.1, 0.1, ..., 0.1)^T,    e2 = (100, 0, 0, ..., 0)^T
Which one is worse? e1 has a modest amount of error, distributed over all
components. In e2 , all but one component are exact, but one of them has a
huge discrepancy!
Exactly how we quantify and assess the extent of error is application-dependent.
Vector norms are alternative ways to measure this magnitude, and different
norms are appropriate for different tasks. Some norms that satisfy the
defining properties of vector norms are listed below (here, x = (x_1, x_2, ..., x_n));
a numerical comparison of e1 and e2 follows the list:
1. The L1-norm (or 1-norm)

   ||x||_1 = \sum_{i=1}^{n} |x_i|
2. The L2-norm (or 2-norm, or Euclidean norm)

   ||x||_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
3. The infinity norm (or max-norm)

   ||x||_∞ = \max_{1 ≤ i ≤ n} |x_i|
4. (Less common) The Lp-norm

   ||x||_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}
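As a numerical aside (a sketch assuming NumPy), evaluating these norms on the vectors e1 and e2 from the earlier example shows how differently they rank the two errors:

    import numpy as np

    e1 = np.full(1000, 0.1)      # 0.1 in every component
    e2 = np.zeros(1000)
    e2[0] = 100.0                # a single huge component

    for p in [1, 2, np.inf]:
        print(p, np.linalg.norm(e1, p), np.linalg.norm(e2, p))

    # 1-norm:   ||e1||_1   = 100,    ||e2||_1   = 100  (a tie!)
    # 2-norm:   ||e1||_2   ≈ 3.16,   ||e2||_2   = 100
    # max-norm: ||e1||_inf = 0.1,    ||e2||_inf = 100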
It is relatively easy to show that these satisfy the defining properties of a norm,
e.g., for ||·||_1:

• ||x||_1 = \sum_{i=1}^{n} |x_i| ≥ 0

• if ||x||_1 = 0 ⇒ \sum_{i=1}^{n} |x_i| = 0 ⇒ |x_i| = 0, ∀i ⇒ x = 0

• ||αx||_1 = \sum_{i=1}^{n} |αx_i| = |α| \sum_{i=1}^{n} |x_i| = |α| · ||x||_1

• ||x + y||_1 = \sum_{i=1}^{n} |x_i + y_i| ≤ \sum_{i=1}^{n} |x_i| + \sum_{i=1}^{n} |y_i| = ||x||_1 + ||y||_1
Similar proofs can be given for || · ||∞ (just as easy), || · ||2 (a bit more difficult)
and || · ||p (rather complicated).
We can actually define norms for (square) matrices as well. A matrix norm
is a function ||·|| : R^{n×n} → R which satisfies:
1. ||M|| ≥ 0, ∀M ∈ R^{n×n}, and ||M|| = 0 if and only if M = O.
2. ||αM || = |α| · ||M ||
3. ||M + N || ≤ ||M || + ||N ||
4. ||M · N || ≤ ||M || · ||N ||
(Property (4) is the one that has a slightly different flavor than the vector-norm properties.)
Although more types of matrix norms exist, one common category is that of
matrix norms induced by vector norms.
Definition: If ||·||_⋆ is a valid vector norm, its induced matrix norm is defined
as

||M||_⋆ = \max_{x ∈ R^n, x ≠ 0} ||Mx||_⋆ / ||x||_⋆

or equivalently,

||M||_⋆ = \max_{x ∈ R^n, ||x||_⋆ = 1} ||Mx||_⋆
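The definition itself suggests a crude way to approximate an induced norm numerically: sample many unit vectors x and record the largest ||Mx||. A sketch (assuming NumPy; the matrix is arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    M = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    # Estimate the induced 2-norm: max ||Mx||_2 over random unit vectors.
    best = 0.0
    for _ in range(10000):
        x = rng.standard_normal(2)
        x /= np.linalg.norm(x, 2)             # normalize so ||x||_2 = 1
        best = max(best, np.linalg.norm(M @ x, 2))

    print(best)                  # approaches the true value from below
    print(np.linalg.norm(M, 2))  # exact induced 2-norm, about 5.4650

Note that the sampled maximum can only underestimate the true norm, since the true maximizer may never be sampled exactly.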
Note, again, that not all valid matrix norms are induced by vector norms. One
notable example is the very commonly used Frobenius norm:
||M||_F = \sqrt{\sum_{i,j=1}^{n} M_{ij}^2}
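One way to see that ||·||_F is not induced by any vector norm: every induced norm gives ||I|| = \max_{x ≠ 0} ||Ix||/||x|| = 1, whereas ||I||_F = \sqrt{n}, which exceeds 1 for n > 1. A quick check (a sketch assuming NumPy):

    import numpy as np

    n = 4
    I = np.eye(n)

    print(np.linalg.norm(I, 'fro'))   # 2.0 = sqrt(4): not an induced norm
    print(np.linalg.norm(I, 1))       # 1.0: induced 1-norm of the identity
    print(np.linalg.norm(I, np.inf))  # 1.0: induced max-norm of the identity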
We can easily show that induced norms satisfy properties (1) through (4).
Properties (1)-(3) are rather trivial, e.g.:

||M + N|| = \max_{x ≠ 0} ||(M + N)x|| / ||x|| ≤ \max_{x ≠ 0} (||Mx|| + ||Nx||) / ||x||
          ≤ \max_{x ≠ 0} ||Mx|| / ||x|| + \max_{x ≠ 0} ||Nx|| / ||x|| = ||M|| + ||N||
Property (4) is slightly trickier to show. First, a lemma:

Lemma 1. If ||·|| is a matrix norm induced by a vector norm ||·||, then:

||Ax|| ≤ ||A|| · ||x||

Proof. Since ||A|| = \max_{x ≠ 0} ||Ax||/||x||, we have that for an arbitrary y ∈ R^n
(y ≠ 0)

||A|| = \max_{x ≠ 0} ||Ax|| / ||x|| ≥ ||Ay|| / ||y|| ⇒ ||Ay|| ≤ ||A|| · ||y||

This holds for y ≠ 0, but we can see it is also true for y = 0.
Property (4):

||MN|| = \max_{||x|| = 1} ||MNx|| ≤ \max_{||x|| = 1} ||M|| · ||Nx||   (by Lemma 1)
       = ||M|| · \max_{||x|| = 1} ||Nx|| = ||M|| · ||N||

⇒ ||MN|| ≤ ||M|| · ||N||
Although the definition of an induced norm allowed us to prove certain prop-
erties, it does not necessarily provide a convenient formula for evaluating the
matrix norm.
Fortunately, such formulas do exist for the L1 and L∞ induced matrix norms.
Given here (without proof):
||A||_1 = \max_j \sum_{i=1}^{n} |A_{ij}|   (maximum absolute column sum)

||A||_∞ = \max_i \sum_{j=1}^{n} |A_{ij}|   (maximum absolute row sum)
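These formulas are easy to verify numerically (a sketch assuming NumPy, whose matrix-norm routines implement exactly these definitions):

    import numpy as np

    A = np.array([[ 1.0, -2.0],
                  [-3.0,  4.0]])

    # Maximum absolute column sum vs. NumPy's induced 1-norm.
    print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))       # 6.0 6.0

    # Maximum absolute row sum vs. NumPy's induced infinity-norm.
    print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))  # 7.0 7.0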
(The induced ||·||_2 norm is much more complicated: it turns out to equal the
largest singular value of A.) Where do these vector/matrix norms come in handy?