Starting with Two Matrices
Gilbert Strang, Massachusetts Institute of Technology
Imagine that you have never seen matrices. On the principle that examples are amazingly powerful, we
study two matrices A and C . The reader is requested to be exceptionally patient, suspending all prior
experience—and suspending also any hunger for precision and proof. Please allow a partial understanding
to be established first.
The first sections of this paper represent an imaginary lecture, very near the beginning of a linear
algebra course. That lecture shows by example where the course is going. The key ideas of linear algebra
(and the key words) come very early, to point the way. My own course now includes this lecture, and
Notes 1-6 below are addressed to teachers.
A first example   Linear algebra can begin with three specific vectors a1, a2, a3:

\[
a_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}
\qquad
a_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
\qquad
a_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
The fundamental operation on vectors is to take linear combinations. Multiply these vectors a1, a2, a3 by numbers x1, x2, x3 and add. This produces the linear combination x1 a1 + x2 a2 + x3 a3 = b:

\[
x_1 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
+ x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{1}
\]
Step 2 is to rewrite that vector equation as a matrix equation Ax = b. Put a1, a2, a3 into the columns of a matrix and put x1, x2, x3 into a vector:

\[
\text{Matrix}\quad
A = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\qquad
\text{Vector}\quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]
Key point   A times x is exactly x1 a1 + x2 a2 + x3 a3, a combination of the columns. This definition of Ax brings a crucial change in viewpoint. At first, the xs were multiplying the as. Now, the matrix A is multiplying x. The matrix acts on the vector x to produce a vector b:

\[
Ax = b \qquad
Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{2}
\]

When the xs are known, the matrix A takes their differences. We could imagine an unwritten x0 = 0, and put in x1 - x0 to complete the pattern. A is a difference matrix.
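To make the two descriptions of Ax concrete, here is a small numerical check (a NumPy sketch added for illustration, not part of the original lecture). It verifies that "dot product with rows" and "combination of columns" give the same differences:

```python
import numpy as np

# The difference matrix A, with a1, a2, a3 in its columns
A = np.array([[ 1,  0, 0],
              [-1,  1, 0],
              [ 0, -1, 1]])

x = np.array([1, 4, 9])

# Row picture: each component of Ax is a row of A times x
print(A @ x)                                        # [1 3 5]

# Column picture: Ax is the combination x1*a1 + x2*a2 + x3*a3
print(x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2])   # [1 3 5]
```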
Note 1 Multiplying a matrix times a vector is the crucial step. If students have seen Ax before, it was
row times column. In examples they are free to compute that way (as I do). “Dot product with rows” gives
the same answer as “combination of columns”. When the combination x1 a1 + x2 a2 + x3 a3 is computed one component at a time, we are using the rows.
The example illustrates how the same Ax arrives both ways. Differences like x2 - x1 come from row
times column. Combining the columns of A is probably new to the class: good. The relation of the rows
to the columns is truly at the heart of linear algebra.
Note 2 Three basic questions in linear algebra, and their answers, show why the column description of
Ax is so essential:
When does a linear system Ax = b have a solution?
Ax = b asks us to express b as a combination of the columns of A. So there is a solution exactly when b is in the column space of A.
When are vectors a1, ..., an linearly independent?
The combinations of a1, ..., an are the vectors Ax. For independence, Ax = 0 must have only the zero solution. The nullspace of A must contain only the vector x = 0.
How do you express b as a combination of basis vectors?
Put those basis vectors into the columns of A. Solve Ax = b.
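These three questions can be checked mechanically for the example above. The following sketch (NumPy, added for illustration and not part of the paper) answers them for the difference matrix A: b is in the column space because Ax = b has a solution, and the columns are independent because the rank equals the number of columns.

```python
import numpy as np

A = np.array([[ 1,  0, 0],
              [-1,  1, 0],
              [ 0, -1, 1]])
b = np.array([1, 3, 5])

# Questions 1 and 3: express b as a combination of the columns (solve Ax = b)
x = np.linalg.solve(A, b)
print(x)                            # [1. 4. 9.]

# Question 2: the columns are independent exactly when Ax = 0 forces x = 0,
# i.e. when the rank of A equals the number of columns
print(np.linalg.matrix_rank(A))     # 3
```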
Note 3 The reader may object that we have only answered questions by introducing new words. My
response is, those ideas of column space and nullspace and basis are crucial definitions in this subject.
The student moves to a higher level—a subspace level—by understanding these words. We are constantly
putting vectors into the columns of a matrix, and then working with that matrix.
I don’t accept that inevitably “The fog rolls in” when linear independence is defined [1]. The concrete
way to dependence vs. independence is through Ax = 0: many solutions or only the solution x = 0. This comes immediately in returning to the example of specific a1, a2, a3.
Suppose the numbers x1, x2, x3 are not known but b1, b2, b3 are known. Then Ax = b becomes an equation for x, not an equation for b. We start with the differences (the bs) and ask which xs have those differences. This is a new viewpoint of Ax = b, and linear algebra is always interested first in b = 0:

\[
Ax = 0 \qquad
Ax = \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad \text{Then } x_1 = 0,\; x_2 = 0,\; x_3 = 0. \tag{3}
\]
For this matrix, the only solution to Ax = 0 is x = 0. That may seem automatic but it's not. A key word in linear algebra (we are foreshadowing its importance) describes this situation. These column vectors a1, a2, a3 are independent. Their combination x1 a1 + x2 a2 + x3 a3 is Ax = 0 only when all the xs are zero.
Move now to nonzero differences b1 = 1, b2 = 3, b3 = 5. Is there a choice of x1, x2, x3 that produces those differences 1, 3, 5? Solving the three equations in forward order, the xs are 1, 4, 9:

\[
Ax = b \qquad
\begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}
\quad \text{leads to} \quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}. \tag{4}
\]
This case x = 1, 4, 9 has special interest. When the bs are the odd numbers in order, the xs are the perfect squares in order. But linear algebra is not number theory—forget that special case! For any b1, b2, b3 there is a neat formula for x1, x2, x3:

\[
\begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\quad \text{leads to} \quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_1 + b_2 \\ b_1 + b_2 + b_3 \end{bmatrix}. \tag{5}
\]
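Computationally, equation (5) says that x is the running sum of the bs. A one-line NumPy check (added for illustration, not part of the lecture):

```python
import numpy as np

b = np.array([1, 3, 5])
x = np.cumsum(b)      # x1 = b1, x2 = b1 + b2, x3 = b1 + b2 + b3
print(x)              # [1 4 9]
```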
This general solution includes the examples with b = 0, 0, 0 (when x = 0, 0, 0) and b = 1, 3, 5 (when x = 1, 4, 9). One more insight will complete the example.
We started with a linear combination of a1, a2, a3 to get b. Now b is given and equation (5) goes backward to find x. Write that solution with three new vectors whose combination gives x:

\[
x = b_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
+ b_2 \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
+ b_3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
= Sb. \tag{6}
\]

This is beautiful, to see a sum matrix S in the formula for x. The equation Ax = b is solved by x = Sb. The matrix S is the "inverse" of the matrix A. The difference matrix is inverted by the sum matrix. Where A took differences of x1, x2, x3, the new matrix S takes sums of b1, b2, b3.
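As a quick numerical confirmation (a NumPy sketch, added here and not in the original), the sum matrix really does invert the difference matrix:

```python
import numpy as np

A = np.array([[ 1,  0, 0],      # difference matrix
              [-1,  1, 0],
              [ 0, -1, 1]])
S = np.array([[ 1,  0, 0],      # sum matrix
              [ 1,  1, 0],
              [ 1,  1, 1]])

print(S @ A)                                 # the identity matrix
print(np.allclose(np.linalg.inv(A), S))      # True
```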
Note 4   I believe there is value in naming these matrices. The words "difference matrix" and "sum matrix" tell how they act. It is the action of matrices, when we form Ax and Cx and Sb, that makes linear algebra such a dynamic and beautiful subject.
The linear algebra symbol for the inverse matrix is A^{-1} (not 1/A). Thus S = A^{-1} finds x from b. This example shows how linear algebra goes in parallel with calculus. Sums are the inverse of differences, and integration is the inverse of differentiation:

\[
S = A^{-1} \qquad
Ax = \frac{dx}{dt} = b(t)
\ \text{ is solved by } \
x(t) = Sb = \int_0^t b\, dt. \tag{7}
\]

The integral starts at x(0) = 0, exactly as the sum started at x0 = 0.
The second example   This example begins with almost the same three vectors—only one component is changed:

\[
c_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}
\qquad
c_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
\qquad
c_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
\]

The combination x1 c1 + x2 c2 + x3 c3 is again a matrix multiplication Cx:

\[
Cx = \begin{bmatrix} c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{8}
\]
With the new vector in the third column, C is a "cyclic" difference matrix. Instead of x1 - 0 we have x1 - x3. The differences of xs "wrap around" to give the new bs. The inverse direction begins with b1, b2, b3 and asks for x1, x2, x3.
We always start with 0, 0, 0 as the bs. You will see the change: nonzero xs can have zero differences. As long as the xs are equal, all their differences will be zero:

\[
Cx = 0 \qquad
\begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\quad \text{is solved by} \quad
x = \begin{bmatrix} x_1 \\ x_1 \\ x_1 \end{bmatrix}
= x_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \tag{9}
\]
The zero solution x = 0 is included (when x1 = 0). But 1, 1, 1 and 2, 2, 2 and π, π, π are also solutions—all these constant vectors have zero differences and solve Cx = 0. The columns c1, c2, c3 are dependent and not independent.
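A short numerical check of this dependence (a NumPy sketch, added for illustration): the constant vector (1, 1, 1) is in the nullspace of C, and the rank of C is only 2.

```python
import numpy as np

C = np.array([[ 1,  0, -1],     # cyclic difference matrix
              [-1,  1,  0],
              [ 0, -1,  1]])

print(C @ np.array([1, 1, 1]))      # [0 0 0]: constant vectors solve Cx = 0
print(np.linalg.matrix_rank(C))     # 2: the three columns are dependent
```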
In the row-column description of Cx, we have found a vector x = (1, 1, 1) that is perpendicular to every row of C. The columns combine to give Cx = 0 when x is perpendicular to every row.
This misfortune produces a new difficulty, when we try to solve Cx = b:

\[
\begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
\quad \text{cannot be solved unless} \quad b_1 + b_2 + b_3 = 0.
\]

The three left sides add to zero, because x3 is now cancelled by -x3. So the bs on the right side must add to zero. There is no solution like equation (5) for every b1, b2, b3. There is no inverse matrix like S to give x = Sb. The cyclic matrix C is not invertible.
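The solvability condition can be seen numerically as well. In this sketch (NumPy, added for illustration), a least-squares solve of Cx = b reproduces b exactly only when b1 + b2 + b3 = 0:

```python
import numpy as np

C = np.array([[ 1,  0, -1],
              [-1,  1,  0],
              [ 0, -1,  1]])

def solvable(b):
    # Least-squares x; Cx equals b exactly only if b lies in the column space of C
    x = np.linalg.lstsq(C, b, rcond=None)[0]
    return np.allclose(C @ x, b)

print(solvable(np.array([1, 3,  5])))   # False: 1 + 3 + 5 != 0
print(solvable(np.array([1, 3, -4])))   # True:  1 + 3 - 4 == 0
```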
Summary   Both examples began by putting vectors into the columns of a matrix. Combinations of the columns (with multipliers x) became Ax and Cx. Difference matrices A and C (noncyclic and cyclic) multiplied x—that was an important switch in thinking. The details of those column vectors made Ax = b solvable for all b, while Cx = b is not always solvable. The words that express the contrast between A and C are a crucial part of the language of linear algebra:
The vectors a1, a2, a3 are independent.
The nullspace for Ax = 0 contains only x = 0.
The equation Ax = b is solved by x = Sb.
The square matrix A has the inverse matrix S = A^{-1}.
The vectors c1, c2, c3 are dependent.
The nullspace for Cx = 0 contains every "constant vector" x1, x1, x1.
The equation Cx = b cannot be solved unless b1 + b2 + b3 = 0.
C has no inverse matrix.
A picture of the three vectors, a1, a2, a3 on the left and c1, c2, c3 on the right, explains the difference in a useful way. On the left, the three directions are independent. The three arrows don't lie in a plane. The combinations x1 a1 + x2 a2 + x3 a3 produce every three-dimensional vector b. The good multipliers x1, x2, x3 are given by x = Sb.
On the right, the three arrows do lie in a plane. The vectors c1, c2, c3 are dependent. Each vector has components adding to 1 - 1 = 0, so all combinations of these vectors will have b1 + b2 + b3 = 0 (this is the equation for the plane). The differences x1 - x3 and x2 - x1 and x3 - x2 can never be 1, 1, 1 because those differences add to zero.
[Figure: the three vectors a1, a2, a3 (left) point in independent directions and do not lie in a plane; the three vectors c1, c2, c3 (right) lie in a plane.]
Note 5 Almost unconsciously, one way of teaching a new subject is illustrated by these examples. The
ideas and the words are used before they are fully defined. I believe we learn our own language this way—
by hearing words, trying to use them, making mistakes, and eventually getting it right. A proper definition is certainly needed; it is not at all an afterthought. But maybe it is an afterword.
Note 6   Allow me to close these lecture ideas by returning to Note 1: Ax is a combination of the columns of A. Extend that matrix-vector multiplication to matrix-matrix: If the columns of B are b1, b2, b3 then the columns of AB are Ab1, Ab2, Ab3.
The crucial fact about matrix multiplication is that (AB)C = A(BC). By the previous sentence we may prove this fact by considering one column vector c.
\[
\text{Left side} \quad
(AB)c = \begin{bmatrix} Ab_1 & Ab_2 & Ab_3 \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
= c_1 Ab_1 + c_2 Ab_2 + c_3 Ab_3 \tag{10}
\]
\[
\text{Right side} \quad
A(Bc) = A(c_1 b_1 + c_2 b_2 + c_3 b_3). \tag{11}
\]
In this way, (AB)C = A(BC) brings out the even more fundamental fact that matrix multiplication is linear: (10) = (11).
Expressed differently, the multiplication AB has been defined to produce the composition rule: AB acting on c is equal to A acting on B acting on c.
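Both statements are easy to confirm numerically. This sketch (NumPy, added for illustration) checks that the columns of AB are A times the columns of B, and that (AB)c = A(Bc) for a random c:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
c = rng.standard_normal(3)

# Columns of AB are A acting on the columns of B
print(np.allclose((A @ B)[:, 0], A @ B[:, 0]))   # True

# The associative law: (AB)c equals A(Bc)
print(np.allclose((A @ B) @ c, A @ (B @ c)))     # True
```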
Time after time, this associative law is the heart of short proofs. I will admit that these “proofs by
parenthesis” are almost the only ones I present in class. Here are examples of (AB)C = A(BC) at
three key points in the course. (I don’t always use the ominous word proof in the video lectures [2] on
ocw.mit.edu, but the reader will see through this loss of courage.)
If AB = I and BC = I then C = A.
Right inverse = Left inverse        C = (AB)C = A(BC) = A
If y^T A = 0 then y is perpendicular to every Ax in the column space.
Nullspace of A^T ⊥ column space of A        y^T (Ax) = (y^T A)x = 0
If an invertible B contains eigenvectors b1, b2, b3 of A, then B^{-1}AB is diagonal.
Multiply AB by columns        A[b1 b2 b3] = [Ab1 Ab2 Ab3] = [λ1 b1  λ2 b2  λ3 b3]
Then separate this AB into B times the eigenvalue matrix Λ:

\[
AB = \begin{bmatrix} \lambda_1 b_1 & \lambda_2 b_2 & \lambda_3 b_3 \end{bmatrix}
= \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}
\begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix}
\quad \text{(again by columns!)}
\]
AB = BΛ gives the diagonalization B^{-1}AB = Λ. Equivalently it produces the factorization A = BΛB^{-1}. Parentheses are not necessary in any of these triple factorizations:
Spectral theorem for a symmetric matrix        A = QΛQ^T
Elimination on a symmetric matrix        A = LDL^T
Singular Value Decomposition of any matrix        A = UΣV^T
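The diagonalization AB = BΛ is easy to see numerically. In this sketch (NumPy, with a small symmetric matrix chosen only as an example), the eigenvector matrix B returned by np.linalg.eig makes B^{-1}AB diagonal:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])              # a small symmetric example

eigvals, B = np.linalg.eig(A)             # columns of B are eigenvectors of A
Lam = np.linalg.inv(B) @ A @ B            # B^{-1} A B

print(np.allclose(Lam, np.diag(eigvals)))   # True: B^{-1} A B is the diagonal Λ
```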
One final comment: Factorizations express the central ideas of linear algebra in a very effective way. The eigenvectors of a symmetric matrix can be chosen orthonormal: Q^T Q = I in the spectral theorem A = QΛQ^T. For all matrices, eigenvectors of AA^T and A^T A are the columns of U and V in the Singular Value Decomposition. And our favorite rule (AA^T)A = A(A^T A) is the key step in establishing that SVD, long after this early lecture...
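That connection can be verified directly for the difference matrix A of this lecture. The sketch below (NumPy, added for illustration) computes the SVD and checks that the columns of U and V are eigenvectors of AA^T and A^T A, with the squared singular values as the shared eigenvalues:

```python
import numpy as np

A = np.array([[ 1., 0., 0.],
              [-1., 1., 0.],
              [ 0., -1., 1.]])

U, s, Vt = np.linalg.svd(A)
V = Vt.T

# A A^T U = U diag(s^2)  and  A^T A V = V diag(s^2)
print(np.allclose(A @ A.T @ U, U @ np.diag(s**2)))   # True
print(np.allclose(A.T @ A @ V, V @ np.diag(s**2)))   # True
```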
These orthonormal vectors u1, ..., um and v1, ..., vn are perfect bases for the Four Fundamental Subspaces: the column space and nullspace of A and A^T. Those subspaces become the organizing principle of the course [2]. The Fundamental Theorem connects their dimensions to the rank of A.
The flow of ideas is from numbers to vectors to subspaces. Each level comes naturally, and everyone
can get it—by seeing examples.
References
[1] David Carlson, Teaching Linear Algebra: Must the Fog Always Roll In?, College Mathematics Journal, 24 (1993) 29-40.
[2] Gilbert Strang, Introduction to Linear Algebra, Fourth edition, Wellesley-Cambridge Press (2009).
Video lectures on web.mit.edu/18.06 and on ocw.mit.edu.