Paul M Cohn-Further Algebra
Further Algebra
and Applications
With 27 Figures
Springer
P.M. Cohn, MA, PhD, FRS
Department of Mathematics, University College London,
Gower Street, London WC1E 6BT
  QA154.3.C64 2003
  512-dc21                                            2002026862
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms of licences issued by the
Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the
publishers.
ISBN 978-1-4471-1120-7
http://www.springer.co.uk
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
that may be made.
Typesetting by BC Typesetting, Bristol BS31 1NZ
                                    Contents
  1. Universal algebra
     1.1   Algebras and homomorphisms
     1.2   Congruences and the isomorphism theorems
     1.3   Free algebras and varieties
     1.4   The diamond lemma
     1.5   Ultraproducts
     1.6   The natural numbers
  2. Homological algebra
     2.1   Additive and abelian categories
     2.2   Functors on abelian categories
     2.3   The category Mod_R
     2.4   Homological dimension
     2.5   Derived functors
     2.6   Ext, Tor and global dimension
     2.7   Tensor algebras, universal derivations and syzygies
  4. Algebras
     4.1   The Krull-Schmidt theorem
     4.2   The projective cover of a module
     4.3   Semiperfect rings
  9. Skew fields
     9.1   Generalities
     9.2   The Dieudonné determinant
This volume follows on the subject matter treated in Basic Algebra and together with
that volume represents the contents of volumes 2 and 3 of my book on algebra, now
out of print; the topics have been rearranged a little, with most of the applications in
the present volume, while the basic theories (groups, rings, fields) are pursued
further in the earlier book. In any case all parts of volumes 2 and 3 are represented.
The whole text has been revised, some exercises have been added and of course errors
have been corrected; I am grateful to a number of readers for bringing such errors to
my attention.
   Chapter 1 presents the basic notions of universal algebra: the isomorphism
theorems, free algebras and varieties, with the natural numbers, viewed as an algebra
with a unary operator, as an application, as well as the ultraproduct theorem and
the diamond lemma. The introduction to homological algebra in Chapter 2 goes
as far as derived functors and global dimension, with the case of polynomial rings
and free algebras as an application. Chapter 3, on group theory, discusses some
items of general interest and importance (group extensions, Hall subgroups, trans-
fer), but also topics which find an echo elsewhere in the book, such as free groups
and linear groups. Chapter 4, on algebras, deals with the Krull-Schmidt theorem,
projective covers, Morita equivalence and related matters, but stops short of the
representation theory of algebras, which would have required more space than was
available. This is followed by an account of central simple algebras (Chapter 5),
introducing the Brauer group and crossed products. The representation theory of
finite groups in Chapter 6 presents the standard facts on representations and
characters and illustrates this work by the symmetric group. The next two chapters
return to rings; Chapter 7 presents topics on Noetherian rings such as Goldie's
theory, as well as polynomial identities and central polynomials, while Chapter 8
deals with the general density theorem, the various radicals and non-unital algebras.
Chapter 9, on skew fields, gives a simplified treatment of the Dieudonné determinant
and establishes the existence of 'free fields'. Its proof is based on the specialization
lemma, which is of independent interest.
   The final two chapters are applications of a different kind. Chapter 10 is an intro-
duction to block codes, in particular linear codes and cyclic codes, as well as some
other kinds. Chapter 11 deals with algebraic language theory and the related
topics of variable-length codes, automata and power series rings. In both chapters
it is only possible to take the first steps in the subject, but we go far enough to
show how techniques from coding theory are used in the study of free algebras.
   The text assumes an acquaintance with much of Basic Algebra, to which reference
is made in the form 'BA' followed by the section number. Definitions and key
properties are usually recalled in some detail, but not necessarily on their first occur-
rence; the reader can easily trace explanations through the index. As before, there are
occasional historical references and numerous exercises, often with hints, though no
solutions.
   A number of colleagues and friends have made comments on the earlier edition
and I would like to express my thanks to them here. My thanks also go to the
staff of Springer-Verlag London and to Mrs Lyn Imeson for the efficient way they
have carried out their task.
University College London                                                    P.M. Cohn
October 2002
                        Conventions on
                        Terminology and notes to
                        the reader
References to Basic Algebra are in the form BA, followed by the section number.
    A property is said to hold for almost all members of a set if it holds for all but
a finite number. The complement of a subset Y in a set X is written X\Y. As a
rule mappings are written on the right; in particular this is done when mappings
have to be composed, so that αβ means: first α, then β. If α is a mapping from a
set X and Y is a subset of X, then the restriction of α to Y is written α|Y.
    All rings and monoids have a unit element or one, which acts as neutral element
for multiplication, usually denoted by 1; by contrast an algebra (over a coefficient
ring) need not have a one. A ring is trivial or the zero ring if it consists of 0
alone; this happens just when 1 = 0. An element a of a ring is called a zero-divisor
if a ≠ 0 and ab = 0 or ba = 0 for some b ≠ 0; if a is neither 0 nor a zero-divisor,
it is said to be regular (see Section 7.1). A non-trivial ring without zero-divisors is
called an integral domain; this term is not taken to imply commutativity. A ring
in which the non-zero elements form a group under multiplication is called a skew
field; in the commutative case this reduces to a field, but sometimes (in Chapter 9)
this term is also used in the general case. In any ring R, the set of all non-zero
elements is denoted by R^×; this notation is mainly used for integral domains,
where R^× is a monoid. A skew field finite-dimensional over its centre is called a
division algebra, but the term 'algebra' by itself is not taken to imply finite
dimensionality. A ring is said to have invariant basis number (IBN) if any two bases of a free
module have the same number of elements, or equivalently, if any matrix with a
two-sided inverse is square (see BA, Section 4.6).
    References to the bibliography are by name of author and date in round brackets
for books and square brackets for papers. As in BA, all results in a section are
numbered consecutively; further we abbreviate 'if and only if' by iff (except in
enunciations) and use • to indicate the end (or absence) of a proof.
    The chapters are to a large extent independent, so no interdependence chart has
been given, but the reader may have to turn back for the occasional result; this is
usually clearly indicated.
                         Universal algebra
Most algebraic systems such as groups, vector spaces, rings, lattices etc. can be
regarded from a common point of view as sets with operations defined on them, sub-
ject to certain laws. This is done in Section 1.1 and it allows many basic results, such
as the isomorphism theorems, to be stated and proved quite generally, as we shall see
in Section 1.2. Of the general theory of universal algebra (by now quite extensive), we
shall need very little, and this forms the subject of Section 1.3; in addition to the basic
concepts we define the notion of an algebraic variety, i.e. a class of algebraic systems
defined by identical relations, or laws. But there are one or two other topics, not
strictly part of the subject, that are needed: the diamond lemma forms the subject
of Section 1.4, while dependence relations have already been discussed in BA
(Section 11.1). There is also the ultraproduct theorem in Section 1.5, a result from
logic with many uses (see Chapter 7). The chapter ends in Section 1.6 with an axio-
matic development of the natural numbers, regarded as an algebraic system, in an
account following Leon Henkin [1960] (see also Cohn (1981)).
The set A is called the carrier of the algebra. Strictly speaking we should denote the
algebra by (A, Ω, φ), where φ is the family of mappings φ_n : Ω(n) → Map(A^n, A)
defined by (1.1.1), but usually we shall not distinguish notationally between an
algebra and its carrier. The set Ω is called the operator domain, or also the signature
of the algebra. We give some examples.
1. Groups. A group (G, ·, ⁻¹, 1) is given by a set with a binary operation
   (multiplication), a unary operation (inversion) and a constant operation (the
   neutral element), satisfying certain laws which are familiar to the reader (see
   Section 1.3 below).
2. Rings. A ring (R, +, -, ×, 0, 1) is given by a set with two binary operations +, ×,
   two constant operations 0, 1 and a unary operation -, again satisfying
   well-known laws.
3. Lattices. A lattice may be defined as a partially ordered set in which each pair of
   elements has a supremum and an infimum, or as an algebra (L, ∨, ∧) with two
   binary operations satisfying certain laws (see BA, Section 3.1). For Boolean
   algebras we require in addition a constant operation 0 and a unary operation ′,
   which leads to another constant operation 1 = 0′, an instance of a derived
   operation (see Section 1.3).
4. Vector spaces. Let k be a field. A vector space over k is an algebra (V, +, 0, k) with
   a binary operation +, a constant operation 0 and a family of unary operations
   indexed by k, ω_α : u ↦ αu (u ∈ V, α ∈ k), satisfying the laws familiar from linear
   algebra. For an infinite field k this is an example of an algebra with an infinite
   signature.
5. A 1-element set has a unique Ω-algebra structure for any Ω. This is called the
   trivial Ω-algebra.
6. The empty set is an Ω-algebra precisely when Ω has no constant operators.
Given an Ω-algebra A and ω ∈ Ω(n), we can apply ω to any n-tuple a_1, ..., a_n ∈ A
and obtain another element of A which is written a_1 ... a_n ω. In the case n = 0 this
just singles out an element of A, denoted by ω; the zero element in a ring is an
example.
   Many algebraic concepts can be formulated for general Ω-algebras. Thus given an
Ω-algebra A, an Ω-subalgebra is an Ω-algebra B whose carrier is a subset of that of A
and which is closed under the operations of Ω, as defined in A. It is clear from the
definition that a given subset of A can be defined as a subalgebra of A in at most one
way. To give an example, the ring Z of integers has no proper subrings, because the
constant operation 1 already generates the whole of Z. The subset {0} is again a ring,
but it is not a subring because the operation 1 has different values on Z and on {0}.
   It is not hard to see that the intersection of any family of subalgebras of a given
algebra A is again a subalgebra of A. Hence for any subset X of A we can form
the intersection of all subalgebras containing X. This is called the subalgebra of A
generated by X; it may also be obtained by applying the operations of Ω to the
elements of X and repeating this operation a finite number of times. If the subalgebra
generated by X is the whole of A, then X is called a generating set of A. Clearly every
algebra A has a generating set, e.g. A itself.
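
The second description of the generated subalgebra (apply the operations repeatedly until nothing new appears) can be sketched in Python for a finite carrier. This is only an illustration under stated assumptions: the function name `generated_subalgebra`, the encoding of the signature as `(arity, function)` pairs and the choice of Z/6 as a finite stand-in for Z are all hypothetical and not part of the text.

```python
from itertools import product

def generated_subalgebra(generators, operations):
    """Close `generators` under `operations`.

    `operations` is a list of (arity, function) pairs; a constant
    operation has arity 0 and its function takes no arguments.
    The loop stops only if the generated subalgebra is finite.
    """
    closed = set(generators)
    while True:
        new = set()
        for arity, op in operations:
            # product(..., repeat=0) yields one empty tuple, so constants
            # are produced even when `closed` is still empty.
            for args in product(closed, repeat=arity):
                value = op(*args)
                if value not in closed:
                    new.add(value)
        if not new:
            return closed
        closed |= new

# The ring Z/6 with signature (+, -, x, 0, 1), all taken mod 6.
M = 6
ring_ops = [
    (2, lambda a, b: (a + b) % M),
    (1, lambda a: (-a) % M),
    (2, lambda a, b: (a * b) % M),
    (0, lambda: 0),
    (0, lambda: 1),
]
# The same signature without the constant 1.
ops_without_one = ring_ops[:4]

print(sorted(generated_subalgebra(set(), ring_ops)))        # [0, 1, 2, 3, 4, 5]
print(sorted(generated_subalgebra({2}, ops_without_one)))   # [0, 2, 4]
```

As in the example of Z above, the constant 1 alone already generates the whole ring, whereas without it the set {0, 2, 4} is closed under the remaining operations.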
   A mapping f : A → B between Ω-algebras A, B is said to be compatible with
ω ∈ Ω(n) if for all a_1, ..., a_n ∈ A,
                       (a_1 ... a_n ω)f = (a_1 f) ... (a_n f)ω.                      (1.1.2)
   From any family (A_i), i ∈ I, of Ω-algebras we can form the direct product P = ∏ A_i;
its carrier is the Cartesian product of the A_i, and the operations are carried out
componentwise. Thus if π_i : P → A_i are the projections from the Cartesian product
to the factors, then any ω ∈ Ω(n) is defined on P by the equation
                 (p_1 ... p_n ω)π_i = (p_1 π_i) ... (p_n π_i)ω   (p_1, ..., p_n ∈ P).   (1.1.3)
It is easily checked that this defines an Ω-algebra structure on P and the form of
(1.1.3) shows that the projection π_i is a homomorphism from P to A_i.
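
A small Python sketch may make the componentwise definition concrete. It assumes that points of the product are tuples indexed by position; the helper names `product_operation` and `projection` are hypothetical, and the factors Z/2 and Z/3 are chosen only for illustration.

```python
def product_operation(factor_ops):
    """Given the interpretation of one n-ary operator in each factor A_i,
    return its interpretation on the direct product, acting componentwise."""
    def lifted(*points):
        # The i-th coordinate of the result uses the i-th factor's operation
        # on the i-th coordinates of the arguments.
        return tuple(op(*(p[i] for p in points))
                     for i, op in enumerate(factor_ops))
    return lifted

def projection(i):
    """The projection from the product onto its i-th factor."""
    return lambda point: point[i]

add2 = lambda a, b: (a + b) % 2
add3 = lambda a, b: (a + b) % 3
add = product_operation([add2, add3])   # addition on Z/2 x Z/3

x, y = (1, 2), (1, 2)
print(add(x, y))                        # (0, 1)

# The projection onto the second factor is compatible with addition.
pi1 = projection(1)
print(pi1(add(x, y)) == add3(pi1(x), pi1(y)))   # True
```

The final check is one instance of the statement that each projection is a homomorphism.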
   Of course the A_i need not be all distinct. If for example A_i = A for all i ∈ I, we
obtain the direct power of A indexed by I, which is denoted by A^I. Its members
may be regarded as functions f : I → A and the operations are defined
componentwise; e.g. if an addition is defined on A, then in A^I we have
(f + g)(i) = f(i) + g(i) for all i ∈ I.
Exercises
1. Show that the set of all subalgebras of an Ω-algebra is a complete lattice (i.e. a
   lattice in which every subset has a sup and an inf, see BA, Section 3.1).
2. Verify the equivalence of the two definitions of subalgebra generated by X, given
   in the text, i.e. show that the set obtained from X by repeatedly applying Ω is the
   least subalgebra containing X.
3. Show that if Ω is finite, then there are only finitely many Ω-algebras on a given
   finite set as carrier. Is there a bound in terms of the size of the carrier alone?
4. Show that every homomorphism which is bijective is an isomorphism.
5. Let A be an Ω-algebra with a carrier of n elements. Show that A has at most n!
   automorphisms and at most n^n endomorphisms. Find bounds on the number
   of automorphisms and endomorphisms if Ω includes a constant operator. Find
   bounds if A has an r-element generating set.
6. Let A be an Ω-algebra. Show that the set Map(A) of all mappings of A into itself
   may be regarded as an Ω-algebra. Further show that End(A), the set of all
   endomorphisms, is a subalgebra of Map(A) provided the following condition is
   satisfied by A: Given θ ∈ Ω(m), ω ∈ Ω(n) and any m × n matrix over A, the element
   obtained by applying θ to each column and ω to the result is the same as the
   element obtained by applying ω to each row and θ to the result.
  Let S, T be any sets and f : S → T a mapping between them. Then the image of f is
defined as SΓ_f, also written im f or Sf; the kernel of f is defined as the correspondence
( 1.2.3)
Conditions (1.2.3) and (1.2.4) are immediate from the definitions. If (1.2.3) is
applied to (1.2.4), we get X* ⊇ X*** and (1.2.4) applied with X* in place of X
gives X* ⊆ X***. Hence X*** = X* and similarly for Y*. This proves (1.2.5) as a
consequence of (1.2.3) and (1.2.4) alone.
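
The identity X*** = X* can be checked on a small example. The sketch below assumes the usual construction, in which the two mappings come from a correspondence Γ between S and T and X* is the set of all y related to every x in X; that definition, the function names `upper` and `lower` and the divisibility relation are assumptions made for illustration, since the text's own definition (1.2.2) is not reproduced here.

```python
def upper(X, gamma, T):
    """X*: all y in T that are related to every x in X."""
    return {y for y in T if all((x, y) in gamma for x in X)}

def lower(Y, gamma, S):
    """Y*: all x in S that are related to every y in Y."""
    return {x for x in S if all((x, y) in gamma for y in Y)}

S = T = set(range(1, 7))
# Hypothetical correspondence: x is related to y when x divides y.
gamma = {(x, y) for x in S for y in T if y % x == 0}

X = {2}
X1 = upper(X, gamma, T)     # multiples of 2 in T
X2 = lower(X1, gamma, S)    # common divisors of those multiples
X3 = upper(X2, gamma, T)

print(sorted(X1), sorted(X2), sorted(X3))   # [2, 4, 6] [1, 2] [2, 4, 6]
```

Here X is properly contained in X**, while X*** coincides with X*, as the argument above predicts.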
    A pair of mappings (1.2.2) between 𝒫(S) and 𝒫(T) satisfying (1.2.3), (1.2.4) and
hence (1.2.5) is called a Galois connexion. An obvious example, which also accounts
for the name, is the situation in field theory. If F is a field and G the group of all its
automorphisms, then the pairs (x, α) ∈ F × G such that x^α = x form a
correspondence which establishes a Galois connexion between certain subfields of F and
subgroups of G. If G is a finite group of automorphisms of F and k is the subfield
of elements left fixed by G, then there is a correspondence between all subgroups
of G and all fields between k and F (see BA, Section 7.6 and Section 11.8).
    Let us define a congruence on an Ω-algebra A as an equivalence on A which is also
a subalgebra of A^2. For example, 1_A and A^2 are congruences on A, and every other
congruence q lies between these two: 1_A ⊆ q ⊆ A^2. The congruence on Z (as ring)
determined by a given positive integer m has as its classes the residue classes mod m,
i.e. the sets of numbers leaving a given remainder after division by m. As in this
example, we shall sometimes, for any congruence q on A, write a ≡ b (mod q) to
mean (a, b) ∈ q.
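
For a finite algebra the two requirements on a congruence (an equivalence that is closed under the operations applied coordinatewise to pairs) can be tested by brute force. The following is a sketch under stated assumptions: the name `is_congruence`, the encoding of operations as `(arity, function)` pairs and the ring Z/12 are hypothetical choices for illustration.

```python
from itertools import product

def is_congruence(carrier, operations, related):
    """Check that `related` is an equivalence on `carrier` and a subalgebra
    of the square, i.e. related arguments give related results."""
    elems = list(carrier)
    pairs = [(a, b) for a, b in product(elems, repeat=2) if related(a, b)]
    reflexive = all(related(a, a) for a in elems)
    symmetric = all(related(b, a) for a, b in pairs)
    transitive = all(related(a, c)
                     for a, b in pairs for c in elems if related(b, c))
    closed = all(
        related(op(*xs), op(*ys))
        for arity, op in operations
        for xs in product(elems, repeat=arity)
        for ys in product(elems, repeat=arity)
        if all(related(x, y) for x, y in zip(xs, ys))
    )
    return reflexive and symmetric and transitive and closed

M = 12
ring_ops = [
    (2, lambda a, b: (a + b) % M),
    (1, lambda a: (-a) % M),
    (2, lambda a, b: (a * b) % M),
    (0, lambda: 0),
    (0, lambda: 1),
]
mod4 = lambda a, b: (a - b) % 4 == 0       # residue classes mod 4
halves = lambda a, b: (a < 6) == (b < 6)   # an equivalence, not a congruence

print(is_congruence(range(M), ring_ops, mod4))     # True
print(is_congruence(range(M), ring_ops, halves))   # False
```

The second relation fails because 5 and 0 are related while 5 + 1 and 0 + 1 are not.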
    The next two results explain the significance of congruences for algebras.
   Given a group G with a normal subgroup N, we can put a group structure on the
set G/N such that the natural mapping G → G/N is a homomorphism. In the same
way we can, for any congruence q on an Ω-algebra A, define an algebra structure on
the set of q-classes A/q such that the natural mapping A → A/q is a homomorphism
with kernel q. This is the content of
1.2 Congruences and the isomorphism theorems                                                  7
                                   A^n ----> (A/q)^n
                                    |            :
                                  ω |            : ω'
                                    v            v
                                    A  ---->   A/q
to a commutative square; thus we have to find a map ω' : (A/q)^n → A/q such that
                       [a_1] ... [a_n]ω' = [a_1 ... a_n ω].                          (1.2.6)
This equation defines ω' uniquely if we can show that the right-hand side is
independent of the choice of a_i in its q-class. Let (a_i, a'_i) ∈ q; since q is a subalgebra, we have
(a_1 ... a_n ω, a'_1 ... a'_n ω) ∈ q, i.e.
                              [a_1 ... a_n ω] = [a'_1 ... a'_n ω],
and this is what we had to show.
                                                                                            •
   The algebra so defined on A/q is again denoted by A/q and is called the quotient
algebra of A by q, with the natural homomorphism ν : A → A/q. For example, as
we have seen, A always has the congruences 1_A, A^2; the corresponding quotients
are A and the trivial Ω-algebra consisting of a single element. An Ω-algebra is said
to be simple if it is non-trivial and has no quotients other than itself and the trivial
algebra. It follows that an algebra A is simple iff it is non-trivial and has no
congruences other than 1_A or A^2.
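
The construction of the quotient algebra can be sketched for a finite carrier: form the q-classes, then define each operation on classes by applying it to representatives. The function name `quotient`, the use of the least element of a class as representative and the example Z/12 modulo 4 are assumptions for illustration; the induced operations are well defined only when the relation really is a congruence.

```python
def quotient(carrier, operations, related):
    """Return (nu, induced) for the quotient by the congruence `related`.

    nu maps each element to its class (a frozenset); `induced` lists the
    operations on classes, computed on representatives."""
    classes = []
    for a in carrier:
        for cls in classes:
            if related(a, cls[0]):
                cls.append(a)
                break
        else:
            classes.append([a])
    nu = {a: frozenset(cls) for cls in classes for a in cls}

    def induce(op):
        # Any representative gives the same class because q is a congruence;
        # the least element is used here.
        return lambda *cs: nu[op(*(min(c) for c in cs))]

    return nu, [(arity, induce(op)) for arity, op in operations]

M = 12
add = lambda a, b: (a + b) % M
mul = lambda a, b: (a * b) % M
nu, induced = quotient(range(M), [(2, add), (2, mul)],
                       lambda a, b: (a - b) % 4 == 0)
(_, q_add), (_, q_mul) = induced

print(sorted(q_add(nu[3], nu[2])))   # [1, 5, 9]
```

The mapping `nu` plays the role of the natural homomorphism: the class of a sum is the sum of the classes.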
   The isomorphism theorems for groups have precise analogues for Ω-algebras. We
begin with the factor theorem, which is also familiar in the case of groups (BA,
Theorem 2.3.1).
Thus there can be at most one such mapping. To show that there is one we have to
verify that the right-hand side of (1.2.7) depends only on the q-class containing a and
not on a itself. Let (a, a') ∈ q; then (a, a') ∈ ker f, hence af = a'f as claimed. Thus
there is a unique well-defined mapping f' to satisfy (1.2.7) and it only remains to
show that f' is a homomorphism. Given a_i ∈ A, ω ∈ Ω(n), we have (a_1 ... a_n ω)f =
(a_1 f) ... (a_n f)ω, hence by (1.2.7),
                             [a_1 ... a_n ω]f' = [a_1]f' ... [a_n]f' ω,
and this shows f' to be a homomorphism. It is injective iff no two distinct q-classes
are identified by f' and this is just the condition q = ker f.                     •
Proof. Let ν : A → A/q be the natural homomorphism and ν_1 its restriction to A_1.
Then ν_1 is a homomorphism of A_1 into A/q; its image is the set of q-classes meeting
A_1, namely A_1^q/q, and its kernel is q ∩ A_1^2 = q_1. Applying Theorem 1.2.5 we obtain
(1.2.9).                                                                         •
 Similarly, by applying the factor theorem with B = A/t and the natural
homomorphism ν_t : A → A/t for f, we obtain
Theorem 1.2.9. Let A be an Ω-algebra, A' a subalgebra and S a subset of A. Then there
exists a subalgebra C of A which is maximal subject to the conditions C ⊇ A',
C ∩ S = A' ∩ S.
Proof. The family of all subalgebras C such that C ⊇ A', C ∩ S = A' ∩ S is easily seen
to be inductive; hence by Zorn's lemma there is a subalgebra which is maximal
subject to these conditions.                                                       •
    We conclude this section with a construction which is often used, the subdirect
product. Let us again take a direct product of Ω-algebras: P = ∏ A_i, with projections
π_i : P → A_i. It is easily seen that P may be characterized by the properties:
(i) for any x, y ∈ P, if xπ_i = yπ_i for all i, then x = y,
(ii) given any family (a_i), where a_i ∈ A_i, there exists x ∈ P such that xπ_i = a_i.
    Often one encounters situations where only (i) holds. This means that we are
dealing essentially with a certain subalgebra of the direct product ∏ A_i, with projections
π_i : P → A_i. An algebra A is called a subdirect product of the A_i if there is an
embedding of A in the direct product P such that the image is mapped by π_i onto A_i, for
all i. We remark that any subalgebra A of ∏ A_i is a subdirect product of the family A'_i,
where A'_i is the image of the restriction map π_i|A. Subdirect products usually arise as
follows.
Proof. Since the trivial algebra may be written as the empty product, we may take A
to be non-trivial. If a, b ∈ A, a ≠ b, then by Corollary 1.2.10, there is a maximal
congruence q_0 not containing (a, b). Thus (a, b) ∉ q_0 but (a, b) ∈ q' for all q' ⊃ q_0;
hence A/q_0 is subdirectly irreducible. Moreover, if (q_j) is the family of congruences
formed for all such pairs a, b, then ∩ q_j = 1_A because any pair a ≠ b is separated
by some q_j. Thus by Proposition 1.2.11, A is a subdirect product of subdirectly
irreducible algebras A/q_j, each a homomorphic image of A.                               •
Exercises
1. Write down the conditions for a correspondence from sets S to T to be a bijection.
2. Describe a partial ordering on a set S in terms of the correspondence
     {(a, b) ∈ S^2 | a ≤ b}.
3. Fill in the details in the proof of Lemma 1.2.1.
4. Verify that the kernel of a ring homomorphism (in the sense defined in the text) is
   the equivalence whose classes are the cosets of an ideal. Consider the isomorphism
   theorems of the text in the case of rings.
5. Verify that sets (without structure) can be regarded as the special case of
   Ω-algebras with Ω = ∅. Interpret the factor theorem and the isomorphism theorems
   for sets in this way.
6. Show that Z as a ring is a subdirect product of the fields F_p, where p runs over all
   primes. Do the same for F_q where q = p^n for a fixed prime and n = 1, 2, ....
1.3 Free algebras and varieties                                                              11
Clearly X is a subset of W(Ω; X); the subalgebra generated by X is called the Ω-word
algebra on X and is denoted by W_Ω(X). Its elements are Ω-words in the alphabet X.
For example, if there is one binary operation α, then x_1 x_2 x_3 α x_4 α α is an Ω-word,
while x_1 α α x_2 α x_3 is an Ω-row which is not an Ω-word.
   We shall need a simple test for finding which Ω-rows are words. For this purpose
we associate two integers with each Ω-row. The length of w ∈ W(Ω; X), written |w|,
is the number of terms in w; thus if w = c_1 ... c_N, where c_i ∈ Ω ∪ X, then |w| = N.
Secondly we define the valency of w as v(w) = Σ_i v(c_i), where
                              v(c_i) = 1         if c_i ∈ X,
                              v(c_i) = 1 - n     if c_i ∈ Ω(n).
and
                                               v(w) = 1.                                (1.3.2)
Moreover, each word can be obtained in just one way from its constituents.
Proof. We shall show more generally, by induction on the length |w|, that w is a
sequence of r words if (1.3.1) holds and
                                               v(w) = r.                                (1.3.3)
This includes the assertion of the theorem for r = 1. When |w| = 1, (1.3.1) implies
that v(w) = 1, so w ∈ X ∪ Ω(0), and conversely, so the result holds in this case; we
may therefore take |w| > 1.
   Suppose first that w is an Ω-word, say w = u_1 ... u_n ω, where u_i ∈ W_Ω(X) and
ω ∈ Ω(n). By the induction hypothesis, v(u_i) = 1 and v(ω) = 1 - n, so
v(w) = n + 1 - n = 1. Moreover, every prefix of each u_i has positive valency,
hence the same is true of w. When w is a sequence of r words, (1.3.1) again holds
and (1.3.3) follows by addition.
   Conversely, let w be an Ω-row satisfying (1.3.1) and |w| > 1, v(w) = r > 0. We
write w = w'c, where c ∈ Ω ∪ X and v(w') = r' > 0 by (1.3.1). By induction on
the length, w' is then a sequence of r' Ω-words. Now either c ∈ X ∪ Ω(0), and
then w is a sequence of r' + 1 words and v(w) = r' + 1; or c ∈ Ω(n), where
n > 0, and then, since v(w) = r > 0, we have r' + 1 - n = r > 0, hence
n = r' + 1 - r and c is applied to the last n words of w' to produce a single word,
so w is a sequence of r' - (n - 1) = r Ω-words, as we had to show. This analysis
of w also shows that it is built up from its constituents in just one way.      •
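
The test used in this proof (every prefix has positive valency, and the total valency counts the words) translates directly into a single left-to-right scan. The sketch below assumes that an Ω-row is given as a list of symbols and that the signature is a dictionary `arity` from operator symbols to arities, every other symbol being treated as an element of X; these names and the encoding are hypothetical.

```python
def valency(symbol, arity):
    """v(c) = 1 for a variable, 1 - n for an n-ary operator."""
    return 1 - arity[symbol] if symbol in arity else 1

def count_words(row, arity):
    """Return r if `row` is a sequence of r words, otherwise None."""
    total = 0
    for symbol in row:
        total += valency(symbol, arity)
        if total <= 0:      # some prefix has non-positive valency
            return None
    return total

def is_word(row, arity):
    """A row is a single word iff all prefixes are positive and v(w) = 1."""
    return count_words(row, arity) == 1

arity = {"a": 2}   # one binary operation, written a here
print(is_word(["x1", "x2", "x3", "a", "x4", "a", "a"], arity))   # True
print(is_word(["x1", "a", "a", "x2", "a", "x3"], arity))         # False
print(count_words(["x1", "x2", "a", "x3"], arity))               # 2
```

The first two rows are the examples given earlier for one binary operation; the third is a sequence of two words, x1 x2 a followed by x3.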
If A is a second binary operation, then the familiar distributive laws take the form
It is essential to write the operation symbols on one side of the variables, say on the
right, as has been done here. Equivalently the operation symbols can all be written on
the left (the Łukasiewicz prefix notation). But with the usual infix notation x_1 + x_2
an ambiguity arises as soon as we form x_1 + x_2 + x_3.
   Let A be an Ω-algebra. If in an element w of W = W_Ω(X) we replace each element
of X by an element of A we obtain a unique element of A. For |w| = 1 this is clear, so
assume that |w| > 1 and use induction. We have w = u_1 ... u_n ω (u_i ∈ W, ω ∈ Ω(n)),
where the u_i are uniquely determined once w is given, by Proposition 1.3.1. By
induction each u_i becomes some a_i ∈ A when we replace the elements of X by
elements of A; hence w becomes a_1 ... a_n ω, another element of A. This remark
can be used to establish the universal property of the Ω-word algebra.
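
Because operators are written on the right, substituting elements of A into a word can be carried out with a stack, in the manner of a postfix calculator. This is a sketch under assumptions: the name `evaluate`, the dictionary `operations` from symbols to `(arity, function)` pairs and the choice of subtraction on the integers as the binary operation are all hypothetical.

```python
def evaluate(word, assignment, operations):
    """Evaluate a word (operators on the right) in an algebra.

    `assignment` maps each variable to an element of the carrier;
    `operations` maps each operator symbol to (arity, function)."""
    stack = []
    for symbol in word:
        if symbol in operations:
            n, op = operations[symbol]
            if len(stack) < n:
                raise ValueError("row is not a word")
            split = len(stack) - n
            args = stack[split:]      # the last n values, in order
            del stack[split:]
            stack.append(op(*args))
        else:
            stack.append(assignment[symbol])
    if len(stack) != 1:
        raise ValueError("row is not a single word")
    return stack[0]

# Integers with one binary operation a, interpreted as subtraction.
operations = {"a": (2, lambda x, y: x - y)}
word = ["x1", "x2", "x3", "a", "x4", "a", "a"]
values = {"x1": 10, "x2": 7, "x3": 2, "x4": 1}
print(evaluate(word, values, operations))   # 6, i.e. 10 - ((7 - 2) - 1)
```

The stack always holds the values of the words completed so far, which mirrors the inductive step above: an n-ary operator consumes the last n values and leaves one.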
Thus wθ* is just the unique element of A obtained by replacing each x ∈ X by xθ.
The remark preceding the theorem shows that θ* is well-defined, and it is easily
seen to be a homomorphism extending θ, which is unique by Proposition 1.1.1. •
  The content of this theorem is also expressed by saying that W(X) is the free
Ω-algebra on X as free generating set. Soon we shall meet free algebras in varieties
of algebras; the free groups encountered in BA Section 3.3, free modules of BA
Section 4.6 and the free associative algebras of BA Section 5.1 are examples.
  Given any Ω-algebra A, we can take a generating set X of A and apply the
construction of Theorem 1.3.2. This yields
   The Ω-words may also be thought of as operations. Any word in x_1, ..., x_m ∈ X
(and elements of Ω) may be regarded as an m-ary operation, called a derived
operation. For example, in groups the commutator (x, y) = x⁻¹y⁻¹xy is a derived
operation. The derived operations include the original operations ω ∈ Ω, in the
form x_1 ... x_n ω, as well as m operations x_i (i = 1, ..., m). They are the projection
operators
                                                                                (1.3.4)
  The distinction between binary and higher operations is much less precise, for as
the next result, due to Wacław Sierpiński [1945], shows, every finitary operation can
be composed from binary ones.
Theorem 1.3.4. Let A be a finite set. Then every finitary operation on A can be obtained
by composition of binary operations on A.
Proof. Suppose that |A| = n; we may without loss of generality regard A as the ring
of integers mod n, Z/n. The ring operations on Z/n are at most binary, so it will be
enough to show that every operation can be expressed in terms of the ring operations
and the δ-function
                                 δ_r(x) = 1   if x = r,
                                 δ_r(x) = 0   if x ≠ r.                              (1.3.5)
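For any k-ary operation f on A we then have
          f(x_1, ..., x_k) = Σ f(a_1, ..., a_k) δ_{a_1}(x_1) ... δ_{a_k}(x_k),          (1.3.6)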
where the summation is over all k-tuples (a_1, ..., a_k). It is of course important to
realize that the a_j on the right of (1.3.6) are parameters, not variables; thus ax,
for any a ∈ Z/n, can be built up from x by repeated addition and so is (for any
given a) a unary operation.                                                        •
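
The idea of the proof can be tried out numerically. The sketch below assumes that (1.3.6) is the usual interpolation formula, in which f(x_1, ..., x_k) is the sum over all k-tuples (a_1, ..., a_k) of f(a_1, ..., a_k) δ_{a_1}(x_1) ... δ_{a_k}(x_k), computed in Z/n; the formula as written here, the function names and the sample operation are assumptions for illustration.

```python
from itertools import product

def delta(r, x):
    """The delta-function of (1.3.5): 1 if x = r, 0 otherwise."""
    return 1 if x == r else 0

def interpolate(f, k, n):
    """Rebuild a k-ary operation f on Z/n from ring operations and deltas."""
    def g(*xs):
        total = 0
        for a in product(range(n), repeat=k):
            term = f(*a)                       # a parameter, not a variable
            for a_j, x_j in zip(a, xs):
                term = (term * delta(a_j, x_j)) % n
            total = (total + term) % n
        return total
    return g

n, k = 4, 3
f = lambda x, y, z: max(x, y, z)   # an arbitrary ternary operation on Z/4
g = interpolate(f, k, n)

agree = all(f(*xs) == g(*xs) for xs in product(range(n), repeat=k))
print(agree)   # True
```

Only additions, multiplications and the unary δ-functions are used inside `g`, so the ternary operation has been composed from operations that are at most binary.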
   The theorem still holds when A is infinite, but the proof in that case is quite
different and is based on the fact that there is then a bijection from A^2 to A,
which can be used to reduce n-ary operations to binary ones (see Cohn (1981) and
Exercise 3).
   When we come to define a concrete class of algebras such as groups, we do so by
specifying its operations: μ binary, ν unary and ε 0-ary. The axioms for groups in
terms of these operations take the form:
     (associativity )                                                                  (1.3.7)
Actually these laws as stated are redundant: parts of (1.3.8) and (1.3.9) follow from
the rest. This point is well known and does not concern us here.
   We see that the axioms take the form of equations holding identically for all values
of the variables. Generally, by an identity or law over Ω in X we understand a pair
(u, v) ∈ W^2, or sometimes the equation formed from the pair:
u = v. (1.3.10)
We shall say that the law (1.3.10) holds in the Ω-algebra A or that A satisfies (1.3.10)
if every homomorphism W → A maps u and v to the same element of A, in other
words, if u and v define the same derived operation on A.
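
For a finite algebra this definition can be checked mechanically: a homomorphism from the word algebra is determined by the values it gives to the variables, so it is enough to run through all assignments. The following sketch makes that assumption explicit; the names `evaluate` and `satisfies`, the symbol `mu` for a binary operation and the two interpretations on Z/5 are hypothetical.

```python
from itertools import product

def evaluate(word, assignment, operations):
    """Evaluate a word written with operators on the right, using a stack."""
    stack = []
    for symbol in word:
        if symbol in operations:
            n, op = operations[symbol]
            split = len(stack) - n
            args = stack[split:]
            del stack[split:]
            stack.append(op(*args))
        else:
            stack.append(assignment[symbol])
    return stack[0]

def satisfies(carrier, operations, u, v):
    """True if the law u = v holds under every assignment of the variables."""
    variables = sorted({s for s in u + v if s not in operations})
    for values in product(carrier, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if evaluate(u, assignment, operations) != evaluate(v, assignment, operations):
            return False
    return True

# Associativity with the operator on the right: (xy)z = x(yz).
u = ["x", "y", "mu", "z", "mu"]
v = ["x", "y", "z", "mu", "mu"]

addition = {"mu": (2, lambda a, b: (a + b) % 5)}
subtraction = {"mu": (2, lambda a, b: (a - b) % 5)}

print(satisfies(range(5), addition, u, v))      # True
print(satisfies(range(5), subtraction, u, v))   # False
```

Addition mod 5 satisfies the associative law, while subtraction does not: with x = y = 0 and z = 1 the two words take the values 4 and 1.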
   The relation between laws and algebras establishes a Galois connexion between the
set of all sets of laws in the given alphabet X and the class of all sets of Ω-algebras.
Given any set L of laws, we can form 𝒱_Ω(L), the class of all Ω-algebras satisfying
all the laws in L. This class 𝒱_Ω(L) is called the variety generated by L. For
example, groups form the variety of (μ, ν, ε)-algebras generated by (1.3.7)-(1.3.9).
Likewise rings form a variety, but fields do not. In the other direction we can
from any set 𝒞 of Ω-algebras form the set q(𝒞) of all laws holding in all algebras
of 𝒞. Now our Galois connexion relates each variety of Ω-algebras to a correspondence
on W_Ω(X) of the form q(𝒞).
   For any class 𝒞 of Ω-algebras its members will be called 𝒞-algebras. Our next task
will be to determine the precise form of the set q(𝒞). A subalgebra of A is called fully
invariant in A if it is mapped into itself by all endomorphisms of A; this definition
also extends to congruences on A, as subalgebras of A^2.
This will follow if we can show that all the laws corresponding to the elements of q
hold in W/q. Let (u, v) ∈ q, let α : W → W/q be any homomorphism and denote
the natural homomorphism W → W/q by ν. We shall define an endomorphism
α' of W such that
                              wα'ν = wα     for all   w ∈ W;                      (1.3.14)
μf' = f. (1.3.15)
Remarks
1. If 𝒞 contains non-trivial algebras, then μ is an embedding. For, given a, b ∈ X,
   a ≠ b, we can map X to a 𝒞-algebra by a mapping f such that af ≠ bf; hence
   by (1.3.15), aμ ≠ bμ.
2. If 𝒞 admits subalgebras, then the free 𝒞-algebra F is generated by the image Xμ.
   For otherwise we could replace F by the subalgebra generated by Xμ; since F is
   unique up to isomorphism, it must itself be generated by Xμ. Thus Xμ generates
   F; it is called a free generating set.
3. If 𝒞 admits subalgebras, F is a free 𝒞-algebra on X and X' is a subset of X, then
   the subalgebra of F generated by X' is the free 𝒞-algebra on X'. For this subalgebra
   is easily seen to possess the universal property.
Not every class has free algebras, but they exist in varieties, by our next result.
Proposition 1.3.6. Let 𝒱 be any variety of Ω-algebras and q the congruence on
W = W_Ω(X) consisting of all the laws holding in 𝒱. Then W/q is the free 𝒱-algebra
on X.
Proof. By (1.3.13), W/q is a 𝒱-algebra, so it only remains to verify the universal
property. Let us write ν : W → W/q for the natural mapping. Given any mapping
f : X → A to a 𝒱-algebra, by Theorem 1.3.2 this extends to a homomorphism
f̄ : W → A. Given u, v ∈ W, if u ≡ v (mod q), then (u, v) is a law in 𝒱 and so
holds in A, hence uf̄ = vf̄. Thus q ⊆ ker f̄ and by the factor theorem there is a
homomorphism f' : W/q → A such that f̄ = νf'. If μ : X → W is the injection,
we have f = μf̄ = μνf', and f' is unique, since it is given on a generating set of
W/q. Thus W/q satisfies all the conditions for a free 𝒱-algebra.               •
  There is another way of forming free algebras, which leads to a useful criterion,
due to Garrett Birkhoff, for a class of algebras to be a variety.
Proof. The necessity of the conditions is easy to check; given any Ω-algebra A, it is
clear that any subalgebra and any homomorphic image of A satisfy all the laws holding
in A. Moreover, if a law holds in every member of a family of Ω-algebras, then it
also holds in their direct product. This shows that every variety satisfies the given
conditions.
   Conversely, let 𝒞 be a class of Ω-algebras closed under subalgebras, homomorphic
images and direct products. Then 𝒞 contains the trivial algebra (as the direct product
of the empty family). If there are no other algebras in 𝒞, then we have the variety
defined by the law x_1 = x_2. So we may now assume that 𝒞 contains a non-trivial
algebra. We can form a free 𝒞-algebra on a given set X as follows. Consider the
set of all 𝒞-algebras with a generating set of cardinal not exceeding that of X.
Take all mappings f_α : X → A_α, where A_α is a 𝒞-algebra and Xf_α a generating set
of A_α, and in the direct product P = ∏ A_α consider the subalgebra F generated by
the elements (xf_α), x ∈ X. As a subalgebra of the direct product, F is again in 𝒞.
We claim that F satisfies the universal property relative to the mapping
μ : x ↦ (xf_α). For if f : X → A is any mapping to a 𝒞-algebra A and A' is the
subalgebra generated by Xf, then the restriction f|A' coincides with some f_α and so A' is
a homomorphic image of F, the mapping F → A' being the projection on the appropriate
factor. Hence we have a homomorphism f' : F → A such that f = μf' and f'
is unique since it is prescribed on a generating set of F. Thus F is the free 𝒞-algebra
on X.
   Clearly we have
                                       𝒞 ⊆ 𝒞**,                            (1.3.16)
and it remains to prove equality. Let q = 𝒞* be the set of all laws holding in 𝒞. By
Proposition 1.3.6, the free 𝒞-algebra is W/q. If A ∈ 𝒞**, we can write A as a
homomorphic image of W, for an appropriate X, say f : W → A. By the definition of
𝒞** = q*, A satisfies all the laws of q, hence f can be factored by q; thus A is a
homomorphic image of W/q, the free 𝒞-algebra on X, and A is therefore itself a 𝒞-algebra.
Hence equality in (1.3.16) is established.                                            •
   We have already remarked that rings and groups are examples of varieties. We
now see that fields (commutative or not) do not form a variety, since they do not
admit direct products; for if E, F are any fields, their direct product E × F as a
ring has zero-divisors and so cannot be a field.
Exercises
 1. Show that if some operation symbols are written on the left and others on the
    right of their arguments, then ambiguities can arise.
 2. Let w = c_1 ... c_N be an Ω-word of the form u_1 ... u_n ω (u_i ∈ W, ω ∈ Ω(n)).
    Show that any proper subsequence c_i c_{i+1} ... c_j, where j - i < N - 1, which is
    itself an Ω-word, can only occur within a single factor u_k.
 3. Assuming a bijection μ : A^2 → A between a set A and its Cartesian square A^2,
    show that every n-ary operation ω on A can be expressed in terms of the
    binary operation μ and a suitable unary operation.
 4. Verify that the set of all essentially unary operations on a set is a clone. Deduce
    that any operation derived from essentially unary operations is again unary.
 5. Let A be any Ω-algebra, X a set and for each x ∈ X, let δ_x : A^X → A be the
    projection on the x-th factor. Show that the subalgebra of the direct power
    A^{A^X} generated by all the δ_x (x ∈ X) is the free algebra on X for the variety
    generated by the algebra A (i.e. the least variety containing A).
 6. Show that modular lattices form a variety. Similarly for distributive lattices, and
    Boolean algebras.
 7. Show that groups may be defined in terms of the operation xyα = xy^{-1} as non-empty
    algebras satisfying xzαyzαα = xyα, xxαyyαyαα = y. Show that abelian
    groups may be defined by xxyαα = y, xyαzα = xzαyα.
 8. Show that any variety of groups defined by a finite set of laws can also be defined
    by a single law.
 9. Show that the automorphism group of W_Ω(X) is isomorphic to the group of all
    permutations of X.
10. Let 𝒱 be a variety of Ω-algebras and F the free 𝒱-algebra on a set X. Given a
    homomorphism f : A → B between 𝒱-algebras which is surjective, and a
    homomorphism e : F → B, find a homomorphism e' : F → A such that
    e = e'f. (Hint. See the proof of Theorem 1.3.5.)
admits no direct moves and the main result of this section, the diamond lemma, gives
conditions under which each equivalence class contains a single reduced expression.
The conditions are of a form that frequently applies, and it leads to a simple solution
of our problem: To test if two expressions are equivalent we apply direct moves until
each is in reduced form; if these reduced forms are equal, then and only then are the
two expressions equivalent.
   A typical application is the existence proof of a normal form for the elements of a
free group (see Exercise 4 and Chapter 3). For a discussion of the applications to
rings, with many illuminating examples, see Bergman [1978].
Exercises
1. In Lemma 1.4.1(ii) assume that if u is transformed to x by one direct move and to
   y by another, where x ≠ y, then there exists v ∈ A which can be reached from each
   of x, y by just one direct move. Show that all reduction chains from a given
   element to a reduced element have the same length. Show that the extra condition
   cannot be omitted.
2. (M. H. A. Newman) Show that the conclusion of Lemma 1.4.1 still applies if (ii)
   holds but instead of (i) we have merely the minimum condition: no element
   admits an infinite succession of direct moves. (Hint. Repeat the construction in
   the proof of Lemma 1.4.1 and use the minimum condition.)
3. Let A be a ring with an endomorphism α. Show that in the ring R generated by A
   and a symbol x satisfying ax = xa^α for all a ∈ A, every element can be uniquely
     show that S can be embedded in a semigroup in which (1.4.2) has a solution for
     all a, b. (Hint. Adjoin a new symbol p to S and consider all words in S ∪ {p} with
     direct move pa → b. Verify the conditions of Lemma 1.4.1 and show that distinct
     elements of S cannot be equivalent. Now show that the resulting semigroup again
     satisfies (1.4.1) and repeat the process (see Cohn [1956]).)
1.5 Ultraproducts
Let us again consider the direct product construction. Given a direct product
P = ∏ A_i of Ω-algebras, we have seen that if a law u = v holds in each factor A_i
then it holds in the product. On the other hand, consider the statement occurring
in the definition of a field:
                    for all a ≠ 0 there exists a' such that aa' = 1.            (1.5.1)
This may well hold in each factor A_i and yet fail to hold in the direct product, as we
see by taking the direct product of two fields; the element (1,0) is different from 0
but does not have an inverse.
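The failure of (1.5.1) in a direct product of two fields is easy to exhibit in a small case. The sketch below takes the two fields Z/2 and Z/3 (our choice, purely for illustration) and lists the non-zero elements of their direct product that have no inverse.

```python
from itertools import product

p, q = 2, 3                                   # two prime fields Z/2 and Z/3
P = list(product(range(p), range(q)))         # carrier of the direct product
zero, one = (0, 0), (1, 1)

def mul(a, b):
    """Componentwise multiplication in Z/p x Z/q."""
    return ((a[0] * b[0]) % p, (a[1] * b[1]) % q)

# Elements possessing an inverse in the product ring.
units = [a for a in P if any(mul(a, b) == one for b in P)]

# Non-zero elements for which (1.5.1) fails.
non_invertible = [a for a in P if a != zero and a not in units]
print(non_invertible)
```

The element (1, 0) appears in the list, in agreement with the remark above.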
   In order to remedy the situation we introduce certain homomorphic images of
direct products, called ultraproducts, which have the property that every sentence
of first-order logic which holds in all the factors, also holds in the product. For a
complete proof we would need a detailed description of what constitutes a sentence
in first-order logic, i.e. a sentence without free variables, and in which all quantifica-
tions are over object variables (an 'elementary sentence'), and this would take us
rather far afield. However, the construction itself is easily explained and has many
uses in algebra. We describe it below and refer for further details to Bell and Slomson
(1971), Barwise (1977) and Cohn (1981).
   We shall need the concept of an ultrafilter. Let I be a non-empty set. By a filter on
I one understands a collection ℱ of subsets of I such that
F.1 I ∈ ℱ, ∅ ∉ ℱ,
F.2 if X, Y ∈ ℱ, then X ∩ Y ∈ ℱ,
F.3 if X ∈ ℱ and X ⊆ X' ⊆ I, then X' ∈ ℱ.
For example, given a subset A of I, if A ≠ ∅, then the set of all subsets of I containing
A is a filter, called the principal filter generated by A. More generally, if (A_λ) is any
family of subsets of I, then the collection of all subsets containing a set of the form
                              A_{λ_1} ∩ ... ∩ A_{λ_n}                              (1.5.2)
forms a filter, provided that none of the sets (1.5.2) is empty. This condition on the
A_λ is called the finite intersection property. Thus any family of subsets of I with the
finite intersection property is contained in a filter on I. Such a family is also called
a filter base.
   An ultrafilter on I is a filter which is maximal among all the filters on I. An alter-
native characterization is given by
Lemma 1.5.1. A filter ℱ on I is an ultrafilter if and only if for each subset A of I, either
A or its complement A' belongs to ℱ.
  Of course A, A' cannot both belong to ℱ, because then ℱ would contain
∅ = A ∩ A'.
Proof. Let ℱ be an ultrafilter. If A ∉ ℱ, then by F.3, no member of ℱ can be contained
in A and so each member of ℱ meets A'. It follows that the family of all sets
containing some F ∩ A' (F ∈ ℱ) is a filter containing ℱ, but then it must equal ℱ by
maximality of the latter, so A' ∈ ℱ. Conversely, if for each A ⊆ I, either A or A'
belongs to ℱ, consider a filter ℱ_1 ⊃ ℱ and B ∈ ℱ_1\ℱ. By assumption, B' ∈ ℱ
and so ∅ = B ∩ B' ∈ ℱ_1, which is a contradiction.                                •
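On a small finite set the lemma can be verified by exhaustive search. The sketch below enumerates all collections of subsets of a 3-element set, keeps those satisfying F.1-F.3, and compares the maximal filters with the filters that contain exactly one of A, A' for every subset A. The set I = {0, 1, 2} and all function names are illustrative assumptions of ours.

```python
from itertools import combinations

I = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(I) + 1)
           for c in combinations(sorted(I), r)]

def is_filter(F):
    """Check the axioms F.1-F.3 for a collection F of subsets of I."""
    if I not in F or frozenset() in F:                          # F.1
        return False
    if any((X & Y) not in F for X in F for Y in F):             # F.2
        return False
    return all(Y in F for X in F for Y in subsets if X <= Y)    # F.3

# All 2^8 collections of subsets of I.
collections = [frozenset(c) for r in range(len(subsets) + 1)
               for c in combinations(subsets, r)]
filters = [F for F in collections if is_filter(F)]

# Ultrafilters: filters not properly contained in another filter.
maximal = [F for F in filters if not any(F < G for G in filters)]

# Filters containing exactly one of A and its complement, for every A.
decides = [F for F in filters
           if all((A in F) != ((I - A) in F) for A in subsets)]

print(len(filters), len(maximal), set(maximal) == set(decides))
```

In this finite case every filter is principal, and the ultrafilters are the principal filters generated by the one-element subsets.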
Proof. Let ℱ be a filter on I and consider the set of all filters containing ℱ. This set
is easily seen to be inductive, hence it has a maximal member, which is the required
ultrafilter.                                                                           •
                                         ∏ A_i/ℱ                                      (1.5.3)
is the homomorphic image of the direct product P = ∏ A_i, defined by the rule:
                  for any x, y ∈ P, x ≡ y ⇔ {i ∈ I | xπ_i = yπ_i} ∈ ℱ,              (1.5.4)
where π_i is the projection on A_i. Let us call a subset of I ℱ-large if it belongs to ℱ.
Then the definition states that x ≡ y iff x and y agree on an ℱ-large set. We have to
verify that we obtain an Ω-algebra in this way, i.e. that the correspondence defined on P
by (1.5.4) is a congruence. Reflexivity and symmetry are clear and transitivity follows
by F.2. Now take ω ∈ Ω(n) and let x_ν ≡ y_ν (ν = 1, ..., n), say x_ν and y_ν agree on
A_ν ∈ ℱ. Then A_1 ∩ ... ∩ A_n ∈ ℱ and on this set x_1 ... x_n ω and y_1 ... y_n ω agree.
Thus we have
                                                  if i ∈ J.
Let us denote the image of (b_i) in K by b. Since a_i b_i = b_i a_i = 1 for i ∈ J and J ∈ ℱ,
we find that ab = ba = 1. Hence every non-zero element of K has an inverse and so
K is a skew field, as claimed.                                                        •
   It is instructive to take a reduced product and see where the proof fails; it was for
(1.5.5) that we needed the property of ultrafilters singled out in Lemma 1.5.1.
   To illustrate Theorem 1.5.4 and the ultraproduct theorem mentioned earlier, let
us take a sentence Ψ in the language of fields and suppose that we can find fields
of arbitrarily large characteristic for which Ψ holds. Then Ψ also holds in their
ultraproduct, and this will be of characteristic 0, if it was formed with a non-principal
ultrafilter. For suppose that k_i is a field of characteristic p_i, where p_1 ≤ p_2 ≤ ... and
p_i → ∞ as i → ∞. Then the sentence
\[ \varphi_n : \underbrace{1 + 1 + \cdots + 1}_{n} = 0 \]
holds in only finitely many of the k_i for each n, and hence its negation ¬φ_n holds in
their ultraproduct; this shows the latter to be of characteristic 0. Since Ψ holds in
each k_i, it also holds in the ultraproduct. Thus we have
   For example, consider the statement: every non-degenerate binary quadratic form
is universal. This may be stated as
It holds for all finite fields of characteristic not two (see BA, Section 8.2); hence it
also holds in certain fields of characteristic O.
   As a second illustration we observe that a field of characteristic p may be defined
by the sentence '¬φ_1 ∧ φ_p'. Hence a field of finite characteristic is defined by the
'infinite disjunction'
This is not an elementary sentence as it stands. But we can assert further that it is not
equivalent to any set of elementary sentences. For if it were, it would hold in all fields
of finite characteristic and hence also in some fields of characteristic 0, which is
clearly not the case.
Exercises
1. Show that any ultrafilter which includes a finite set must be principal.
2. Let A be an infinite set. Show that for every non-empty subset B of A there is an
   ultrafilter ℱ_B including B. What is the condition on B for ℱ_B to include all cofinite
   subsets?
3. Let I be a set and 𝒫(I) the Boolean algebra of all subsets of I (see BA, Section 3.4).
   Defining ideals of Boolean algebras as inverse images of 0 in homomorphisms,
   show that a filter on I is just the complement of a non-zero ideal in 𝒫(I).
   Which ideals correspond to ultrafilters?
4. Show that any formula φ(x) holds in an ultraproduct ∏ A_i/ℱ iff it holds in
   all the factors A_i for an ℱ-large set of indices. (Hint. Verify that the formulae
   for which this is true include all atomic formulae and are closed under
   ∨, ∧, ¬, ∀, ∃. Hence the result holds for sentences, i.e. formulae without free
   variables.)
5. (Compactness theorem of model theory) Let 𝒯 be a set of elementary sentences
   about Ω-algebras. Show that if each finite subset P of 𝒯 has a model (i.e. there is
   an algebra in which each sentence of P holds), then 𝒯 has a model. (Hint. For
   each P ⊆ 𝒯 take a model A_P and form a suitable ultraproduct of the A_P.)
6. Show that an integral domain R which is embeddable in a direct product of skew
   fields is embeddable in a skew field. (Hint. Let K_i (i ∈ I) be the family of skew
   fields and for c ∈ R^× let I_c be the set of indices i for which c is inverted in K_i.
   Verify that the I_c form a filter base.)
   For example, if we take A = N, b = 1 and remember N.5, we see from the lemma
that every number different from 1 is the successor of a number. Thus if n ≠ 1, then
there is a number which we shall denote by n - 1 such that (n - 1)' = n. By N.4,
n - 1 is uniquely determined by n; it is called the predecessor of n.
   For any n ∈ N we denote by [n) the subalgebra generated by n.
n' ≠ n. (1.6.1)
For n = 1 this holds by N.3; if it holds for any n ≠ 1, then it holds for n' by N.4,
hence it holds for all n, by induction (i.e. N.5).
   Now by Lemma 1.6.1, [1') consists entirely of successors of numbers, whereas 1 is
not a successor, hence 1 ∉ [1'). Suppose now that n ∉ [n') but that n' ∈ [n''). By
(1.6.1), n' ≠ n'', so n' must be the successor of an element in [n''); but this can
only be n (by N.4), so n ∈ [n'') and [n'') ⊂ [n'), therefore n ∈ [n'), which contradicts
the hypothesis. By induction we conclude that n ∉ [n') for all n.                 •
   Let us write (n] for the complement of [n') in N. By Lemma 1.6.2, n ∈ (n]; the
elements of (n] other than n will be called the antecedents of n. When n ≠ 1, they
clearly include the predecessor n - 1 of n. With these preparations we can prove a
result on which the box principle is based.
Theorem 1.6.3. Let m, n ∈ N. There is an injective mapping from (m] to (n] if and only
if (m] ⊆ (n]. Further there is a bijection between (m] and (n] if and only if m = n.
Proof. If (m] ⊆ (n], then the inclusion mapping is the required injection, and for
m = n this is a bijection. Conversely, assume that f : (m] → (n] is an injective mapping;
we must show that (m] ⊆ (n]. When m = 1, then since 1 ∉ [n'), we have
1 ∈ (n] and so (1] ⊆ (n]. We may therefore assume that m ≠ 1 and use induction
on m. Since m ≠ 1, there is a predecessor m - 1; we define a mapping
g : (m - 1] → (n] by the rule
\[ kg = \begin{cases} kf & \text{if } kf \neq n, \\ mf & \text{if } kf = n. \end{cases} \]
To check that this is a well-defined mapping we note that there is at most one
number k such that kf = n, because f is injective. Denote this number by k_0; if
k_0 = m or k_0 is not defined (because the image of f does not include n), then g is
just f restricted to (m - 1]. Otherwise g differs from f only at k_0 and there it has
the value mf which it assumes nowhere else, for the domain of g does not include
m. Thus g is well-defined; moreover g is injective and it does not assume the value
n. It follows that n ≠ 1 and that g is an injective mapping from (m - 1] to
   A set S is called finite if there is a bijection between S and (n], for some n ∈ N.
By what has been said, there can be at most one such n and this is called the
cardinal of S. Thus for any finite set there is a natural number which is its cardinal.
   Theorem 1.6.3 leads to the familiar ordering of the natural numbers: we write
m ≤ n to mean that m is n or an antecedent of n, or equivalently, (m] ⊆ (n]. It is clear
that this relation is reflexive and transitive, and by the last part of Theorem 1.6.3
we see that m ≤ n, n ≤ m implies m = n. Thus we have a partial ordering. From
the definition it is clear that m ≤ n implies m' ≤ n' and it is easy to show that
'≤' is a total ordering. Given m, n ∈ N, if m = 1, then clearly m ≤ n; similarly if
n = 1, then n ≤ m. Now if m, n ≠ 1, we can form m - 1, n - 1 and by induction
either m - 1 ≤ n - 1 or n - 1 ≤ m - 1. Taking successors we find that m ≤ n or
n ≤ m. We shall also adopt the usual notation of writing m < n or n > m to
mean 'm ≤ n but m ≠ n' and m ≥ n to mean n ≤ m.
   In contrapositive form Theorem 1.6.3 shows that if m ∉ (n], so that m > n, then
there can be no injective mapping from (m] to (n]. In particular, taking
m = n' (= n + 1), we see that when n' objects are distributed over n boxes, at
least one box contains more than one element. This is just Dirichlet's Box Principle,
already encountered in BA, p.2, where it was stated without formal proof.
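For small values the box principle can be confirmed by listing all mappings. The sketch below models the set of n and its antecedents by the Python range 1, ..., n (an assumption of the illustration, not a construction from the axioms) and counts injective mappings by brute force; the name `injections` is ours.

```python
from itertools import product

def injections(m, n):
    """Count the injective mappings from {1, ..., m} to {1, ..., n}."""
    return sum(1 for images in product(range(1, n + 1), repeat=m)
               if len(set(images)) == m)      # all m images are distinct

print(injections(3, 3))   # the bijections of a 3-element set
print(injections(4, 3))   # four objects in three boxes
```

The second count is zero: every distribution of four objects over three boxes puts two objects in the same box.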
   The natural numbers have another property not shared by all ordered sets; they are
well-ordered, i.e. every non-empty subset of N has a least element. Given ∅ ⊂ S ⊆ N,
let T be the set of numbers m such that m ≤ n for all n ∈ S. Clearly 1 ∈ T; we claim
that there is a number a ∈ T such that a' ∉ T. For if a' ∈ T for all a ∈ T, then by
induction T = N and S must be empty, a contradiction. Hence there exists a ∈ T
such that a' ∉ T and it follows that a is the least number in S, since a ≤ n for all
n ∈ S, and a ∈ S since otherwise a' ∈ T. This proves
Theorem 1.6.5. N is the free induction algebra on 1. Thus if A is any induction algebra
and a ∈ A, there is a unique homomorphism α : N → A such that 1α = a.
Proof. In detail the assertion states that A is a set with a single unary operation
x ↦ x', and given a ∈ A, there is a unique mapping x ↦ xα from N to A such that
By Proposition 1.1.1 there can be at most one such mapping. To prove its existence
we form the direct product N × A; this is again an induction algebra, with the operation
(x, y)' = (x', y'). Let H be the subalgebra of N × A generated by (1, a); further
  Functions on N are frequently defined recursively, for example the sum of the
squares of the first n natural numbers may be defined as the function g : N --+ N
such that
Theorem 1.6.6. Given a ∈ N and any function f from N^2 to N, there exists a unique function
φ from N to itself, satisfying the equations:
Proof. Suppose first that f is independent of its first argument. Then we have to find
φ : N → N to satisfy
In this case the result follows immediately from Theorem 1.6.5, taking A there to be
the set N with f as its successor function.
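Definition by recursion is easily imitated in a program. The sketch below assumes that the defining equations have the usual shape φ(1) = a, φ(n') = f(n, φ(n)); the helper `define_by_recursion` and the particular choice of f for the sum of squares are ours.

```python
def define_by_recursion(a, f):
    """Return phi with phi(1) = a and phi(n + 1) = f(n, phi(n)).

    The shape of the two equations is an assumption of this sketch.
    """
    def phi(n):
        value = a                     # phi(1)
        for k in range(1, n):         # climb from phi(k) to phi(k + 1)
            value = f(k, value)
        return value
    return phi

# Sum of the squares of the first n natural numbers:
# g(1) = 1, g(n + 1) = g(n) + (n + 1)^2.
sum_of_squares = define_by_recursion(1, lambda n, prev: prev + (n + 1) ** 2)
print([sum_of_squares(n) for n in range(1, 6)])   # [1, 5, 14, 30, 55]
```

When f ignores its first argument, `phi` simply iterates f starting from a, which is the special case treated first in the proof.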
From this definition it is easy to prove the associative and commutative laws of
addition:
                                   (a   + b) + c = a + (b + c),                       (1.6.7)
We shall prove (1.6.7) as an example and leave (1.6.8) to the reader. For c = 1,
(1.6.7) reduces to (a + b) + 1 = a + (b + 1), i.e. (a + b)' = a + b', which is true
by the definition (1.6.6). If we assume that (1.6.7) holds for c = n, then
(a + b) + n' = [(a + b) + n]' = [a + (b + n)]' = a + (b + n)' = a + (b + n'),     by
(1.6.6) and the case c = n. Hence (1.6.7) holds for c = n', and so by induction it
holds for all c.
    Similarly we can define multiplication by constructing for each a ∈ N, a mapping
μ_a : N → N such that 1μ_a = a, x'μ_a = xμ_a + a. The existence and uniqueness
follow again from Theorem 1.6.6, and we write as usual xμ_a = ax, so that the definition
takes the form
a1 = a, a(x + 1) = ax + a. (1.6.9)
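Both definitions can be carried out mechanically, using nothing but the successor function. In the sketch below addition is assumed to be given by a + 1 = a', a + b' = (a + b)', as used in the proof of (1.6.7), and multiplication by (1.6.9); Python integers starting at 1 stand in for N, and the names `succ`, `add`, `mul` are ours.

```python
def succ(x):
    """The successor function x -> x'."""
    return x + 1

def add(a, b):
    """Addition from a + 1 = a' and a + b' = (a + b)' (assumed form)."""
    result = succ(a)                  # a + 1
    for _ in range(1, b):             # pass from a + k to a + k'
        result = succ(result)
    return result

def mul(a, x):
    """Multiplication from (1.6.9): a1 = a, a(x + 1) = ax + a."""
    result = a                        # a1
    for _ in range(1, x):             # pass from ak to a(k + 1)
        result = add(result, a)
    return result

N6 = range(1, 7)
associative = all(add(add(a, b), c) == add(a, add(b, c))
                  for a in N6 for b in N6 for c in N6)
print(add(2, 3), mul(3, 4), associative)
```

Such a check on an initial segment is of course no substitute for the inductive proof; it merely illustrates what the laws assert.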
The associative and commutative laws can again be proved without difficulty, as well
as the distributive law. If we adjoin a new element, denoted by 0, to N, to satisfy
a + 0 = a, aO = 0, we have a monoid under addition and the usual procedure for
Exercises
1. Prove the commutative law of addition in N.
2. Prove the associative and commutative laws of multiplication, the distributive law
   and the cancellation law in N.
3. Give a direct proof that an induction algebra generated by a single element 1 satisfies
   either N.3 or N.4.
4. Show that in any induction algebra the union of two subalgebras is again a subalgebra.
5. Let f be the function on the non-negative integers, defined by f(0) = 0, f(x') = x.
   Describe the function g(x, y) defined by g(x, 0) = x, g(x, y') = f(g(x, y)).
6. Define the natural ordering on Z in terms of the ordering on N and prove its
   compatibility with addition and multiplication by positive numbers.
7. Show that there is no total ordering on Z/m, the set of integers mod m, preserving
   addition and multiplication.
8. Use Theorem 1.6.3 to give a proof that N is not finite.
9. Give a direct proof by induction that there exists no surjective mapping from (m]
   to (n] if m < n.
P. Cohn, Further Algebra and Applications
© Professor P.M. Cohn, 2003
  Let 𝒜 be a category and (X_i) any family of 𝒜-objects. Given 𝒜-objects P, Y, each
family of maps π_i : P → X_i gives rise to a natural mapping
where g ↦ (gπ_i) and ∏ is the usual Cartesian product. When (2.1.1) is bijective for
each Y we call P a product of the X_i with natural projections π_i. The product P
with its maps π_i can also be described as the solution of a universal problem, for
it is the final object in the category whose objects (A, f_i) are families of maps
f_i : A → X_i and whose morphisms λ : (A, f_i) → (B, g_i) are families of commutative
triangles, thus λ : A → B satisfies f_i = λg_i. It follows that the product, when it
exists, is unique up to isomorphism. We shall denote it by ∏ X_i; thus we have an
isomorphism, natural in Y:
(2.1.2)
Here ∏ on the left denotes the product just defined, and on the right the usual
Cartesian product of sets.
Examples
1. In Ens, ∏ reduces to the Cartesian product. Likewise in Ab, the category of
   abelian groups, or more generally, in Mod_R, the category of right R-modules,
   ∏ is the direct product introduced in BA, Section 4.2.
2. In the category of all abelian torsion groups, ∏ X_i is the torsion subgroup of the
   ordinary direct product of the X_i.
3. In the category of all finite abelian groups the product does not exist; this is easily
   seen by taking an infinite family of non-trivial groups.
4. The product of the empty family (in any category where it exists) is the final
   object in the category. For here we have on the right of (2.1.1) the empty product,
   which by convention is a 1-element set.
There is a dual construction, the coproduct: given any family (X_i) of 𝒜-objects, their
coproduct, also called sum, is an 𝒜-object S with maps μ_i : X_i → S, called the natural
injections, such that (S, μ_i) is a product of the X_i in the dual category 𝒜°. For an
explicit definition we need only reverse the arrows in the definition of the product.
Thus (S, μ_i) is a coproduct if for any 𝒜-object Y the natural mapping
Theorem 2.1.1. In an additive category 𝒜 let (X_i) be a finite family of objects. Given
an 𝒜-object B and maps p_i : B → X_i, q_i : X_i → B, such that q_i p_j = δ_ij, the following
conditions are equivalent:
Proof. (a) ⇒ (c). Let (B, p_i) be a product and write φ = Σ p_i q_i; then φp_i : B → X_i
satisfies φp_i = p_i, for φp_i = Σ_j p_j q_j p_i = p_i. By uniqueness, φ = 1_B and so B is a
biproduct.
   (c) ⇒ (a). Given f_i : A → X_i, we can define f : A → B by f = Σ f_i q_i. Then
f p_j = Σ_i f_i q_i p_j = f_j, therefore (B, p_i) is a product. Thus we have shown that (a) ⇔
(c) and by duality, (b) ⇔ (c).                                                       •
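The computations in this proof can be followed in a concrete additive category. The sketch below uses free abelian groups Z^n with maps given by integer matrices acting on row vectors from the right, so that a composite fg (first f, then g) corresponds to the matrix product; the particular choice B = Z^3, X_1 = Z, X_2 = Z^2 and all names are assumptions made for illustration.

```python
def matmul(a, b):
    """Matrix product; represents the composite 'first a, then b'."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Entrywise sum, i.e. addition in the hom group."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

# Projections p_i : B -> X_i and injections q_i : X_i -> B.
p1 = [[1], [0], [0]]
p2 = [[0, 0], [1, 0], [0, 1]]
q1 = [[1, 0, 0]]
q2 = [[0, 1, 0], [0, 0, 1]]

# q_i p_j = delta_ij and the biproduct condition sum p_i q_i = 1_B.
identity_B = matadd(matmul(p1, q1), matmul(p2, q2))

# (c) => (a): given f_i : A -> X_i with A = Z^2, put f = sum f_i q_i.
f1 = [[2], [5]]
f2 = [[1, -1], [0, 3]]
f = matadd(matmul(f1, q1), matmul(f2, q2))

print(identity_B)
print(matmul(f, p1) == f1, matmul(f, p2) == f2)
```

The last line confirms that f p_j = f_j, so that f is the map into the product determined by the family (f_i).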
  We observe that any finite product (or coproduct) in any category satisfying Ad.1
and Ad.2 can be completed to a biproduct in a unique way. Given a product (B, p_i)
of X_1, ..., X_n, fix j and define f_i : X_j → X_i by f_i = δ_ij. Then there exists q_j : X_j → B
such that q_j p_i = δ_ij. This holds for j = 1, ..., n, and now (B, p_i, q_i) is a biproduct, by
Theorem 2.1.1. Thus every finite product can be completed to a biproduct in a
unique way, and by duality the same holds for finite coproducts. So we find
Corollary 2.1.2. Let 𝒜 be a category satisfying Ad.1 and Ad.2. Then any finite family
of 𝒜-objects has a coproduct if and only if it has a product, and the two are isomorphic.
In particular, the product and coproduct of any finite family are isomorphic in any
additive category.                                                                     •
   Taking the empty family, we see that the initial and final object in any additive
category are isomorphic. An object that is both initial and final is called a zero
object. Thus every additive category has a zero object. By a zero morphism we under-
stand a morphism which can be factored via a zero object. With this definition it is
easily seen that in an additive category the neutral element in each hom group is the
zero morphism.
   We also note that on writing p = (p_1, ..., p_n), q = (q_1, ..., q_n)^T, we can express
(2.1.4) in the form
\[ pq = 1_B, \qquad qp = \begin{pmatrix} 1_{X_1} & & 0 \\ & \ddots & \\ 0 & & 1_{X_n} \end{pmatrix}. \]
Our next task is a categorical description of kernels; we begin with monomorphisms
and subobjects. In any category (not necessarily additive) a map α : X → Y is said to
be monic or a monomorphism if whenever λα, μα are both defined, then
                               λα = μα   implies    λ = μ.
In an additive category this condition can of course be simplified to
λα = 0 implies λ = 0,
whenever λα is defined. In Ens the monomorphisms are just the injective mappings.
More generally, in any concrete category injective morphisms are monic; the converse
Here coim α is the largest quotient of X killing ker α, hence there is a map
κ : coim α → Y such that α = (coim α)κ. It follows that (coim α)κ(coker α) = 0;
but coim α is epic, so κ(coker α) = 0, and since im α is the largest subobject of Y
killed by coker α, there is a unique map α' : coim α → im α to make the diagram
38                                                                    Homological algebra
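commute. If we had proceeded in dual fashion, starting from im $\alpha$ and going via a
map $X \to \operatorname{im}\alpha$, we would have obtained another map $\alpha'' : \operatorname{coim}\alpha \to \operatorname{im}\alpha$ to make
the square commute. Now $\alpha = (\operatorname{coim}\alpha)\alpha'(\operatorname{im}\alpha) = (\operatorname{coim}\alpha)\alpha''(\operatorname{im}\alpha)$; since coim $\alpha$
is epic and im $\alpha$ is monic, it follows that $\alpha' = \alpha''$, so the maps coincide and there is
complete symmetry. In important cases $\alpha'$ is an isomorphism and this suggests the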
Proposition 2.1.3. In any abelian category a map $\alpha$ is monic if and only if ker $\alpha = 0$
and epic if and only if coker $\alpha = 0$. If ker $\alpha$ = coker $\alpha = 0$, then $\alpha$ is an isomorphism.
Proof. By definition of ker $\alpha$ we have $\lambda\alpha = 0$ iff $\lambda = \lambda'(\ker\alpha)$ for some $\lambda'$. Hence
$\lambda = 0$ holds for all such maps iff ker $\alpha = 0$. This proves the first assertion; the second
follows by duality. If $\alpha : X \to Y$ is such that ker $\alpha$ = coker $\alpha = 0$, then im $\alpha = Y$,
coim $\alpha = X$ and $\alpha = \alpha'$ is an isomorphism.                                             •
  We observe that this result often fails to hold in more general categories, e.g. in Rg
the inclusion map $\mathbf{Z} \to \mathbf{Q}$ is both epic and monic but is clearly not an isomorphism.
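The epic part of this remark can be checked by a short calculation. The following is a sketch only, with maps written on the left for readability (unlike the convention of the text); the ring R and the homomorphisms g, h are illustrative names. Suppose $g, h : \mathbf{Q} \to R$ are ring homomorphisms that agree on $\mathbf{Z}$. For a non-zero integer b we have $h(b)h(1/b) = 1$ and $g(1/b)g(b) = 1$, so

```latex
g\!\left(\tfrac{1}{b}\right)
  = g\!\left(\tfrac{1}{b}\right) h(b)\, h\!\left(\tfrac{1}{b}\right)
  = g\!\left(\tfrac{1}{b}\right) g(b)\, h\!\left(\tfrac{1}{b}\right)
  = h\!\left(\tfrac{1}{b}\right),
\qquad\text{hence}\qquad
g\!\left(\tfrac{a}{b}\right) = g(a)\, g\!\left(\tfrac{1}{b}\right)
  = h(a)\, h\!\left(\tfrac{1}{b}\right) = h\!\left(\tfrac{a}{b}\right).
```

Thus g = h, which is the cancellation property required of an epimorphism. The inclusion is monic because it is injective, and it is not an isomorphism because it is not surjective.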
  A sequence of objects and maps in an abelian category
Corollary 2.1.5. For a short exact sequence (2.1.6) in an abelian category the following
conditions are equivalent:
(a) $\lambda$ is a section,
(b) $\mu$ is a retraction,
(c) $A \cong A' \oplus A''$ for suitable maps $\alpha : A \to A'$, $\beta : A'' \to A$.
Proof. (a) $\Rightarrow$ (c). By hypothesis there is a map $\alpha$ such that $\lambda\alpha = 1$. Put $\nu : \ker\alpha \to A$
for the canonical inclusion. Since $\mu = \operatorname{coker}\lambda$, it follows from Proposition 2.1.4 that
$\nu\mu$ is an isomorphism and on writing $\beta = (\nu\mu)^{-1}\nu : A'' \to A$, we find that $\beta\mu = 1$.
We claim that A is a biproduct of $A'$, $A''$ relative to the maps $\lambda, \beta$; $\alpha, \mu$. Clearly
$\lambda\mu = 0$, $\beta\alpha = 0$, so it only remains to show that $\alpha\lambda + \mu\beta = 1$. Write
$f = \alpha\lambda + \mu\beta - 1_A$; then $f\mu = \mu\beta\mu - \mu = 0$, hence $f = f'\lambda$ where $f' = f'\lambda\alpha =
f\alpha = \alpha\lambda\alpha + \mu\beta\alpha - \alpha = \alpha - \alpha = 0$; it follows that $f = 0$ as claimed. Thus
$A \cong A' \oplus A''$; the converse is clear, hence (a) $\Leftrightarrow$ (c), and now (b) $\Leftrightarrow$ (c) follows by
duality.                                                                               •
  A short exact sequence satisfying the equivalent conditions of this corollary is said
to be split exact.
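As a small illustration of the splitting condition, the following sketch searches by brute force for a map $\alpha$ with $\lambda\alpha = 1$ in two sequences of cyclic groups. The helper names `homs` and `retractions` are hypothetical, introduced only for this example, and a homomorphism between cyclic groups is encoded by the image of 1.

```python
def homs(m, n):
    """All group homomorphisms Z/m -> Z/n, each encoded by the image c of 1.

    The image c must satisfy m*c = 0 in Z/n.
    """
    return [c for c in range(n) if (m * c) % n == 0]


def retractions(k, n, lam):
    """Maps alpha: Z/n -> Z/k with alpha(lam * x) = x for all x in Z/k.

    lam is the image of 1 under the injection Z/k -> Z/n.
    """
    return [c for c in homs(n, k)
            if all((c * lam * x) % k == x for x in range(k))]


# 0 -> Z/2 -> Z/6 -> Z/3 -> 0 with injection x |-> 3x: a retraction exists.
split_case = retractions(2, 6, 3)
# 0 -> Z/2 -> Z/4 -> Z/2 -> 0 with injection x |-> 2x: no retraction exists.
nonsplit_case = retractions(2, 4, 2)
print(split_case, nonsplit_case)
```

The first sequence is split exact, in line with $\mathbf{Z}/6 \cong \mathbf{Z}/2 \oplus \mathbf{Z}/3$, while the second is exact but not split.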
   We recall from BA, Section 4.2 that in a category of modules, for any pair of maps
with a common target, $\alpha : A \to C$, $\beta : B \to C$, there is a 'least common left multiple'
P with maps $\alpha' : P \to B$, $\beta' : P \to A$ such that $\alpha'\beta = \beta'\alpha$ and for any pair $\alpha''$, $\beta''$
such that $\alpha''\beta = \beta''\alpha$ there exists $\gamma$ such that $\alpha'' = \gamma\alpha'$, $\beta'' = \gamma\beta'$. It is called the
pullback of the triple $(\alpha, \beta, C)$. This pullback exists in any abelian category, for,
given $\alpha : A \to C$, $\beta : B \to C$, form the product $A \sqcap B$ with projections p, q on A,
B respectively; now it is easily verified that $\ker(p\alpha - q\beta)$ is a pullback of $\alpha$, $\beta$.
A dual construction can be carried out for the pushout of a triple $(C, \alpha, \beta)$ as
$\operatorname{coker}(\alpha i, \beta j)$, where i, j are the injections of A, B (the targets of $\alpha$, $\beta$) into the
coproduct $A \sqcup B$.
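For modules the kernel description of the pullback can be computed directly. The sketch below is an illustration with finite cyclic groups; the function name `pullback` and the particular maps are assumptions chosen for the example. It lists the pairs (a, b) on which the two maps agree, which is the kernel of $(a, b) \mapsto a\alpha - b\beta$.

```python
from itertools import product


def pullback(A, B, alpha, beta):
    """Pairs (a, b) in A x B with alpha(a) == beta(b).

    This is the kernel of the map (a, b) |-> alpha(a) - beta(b).
    """
    return [(a, b) for a, b in product(A, B) if alpha(a) == beta(b)]


A, B = range(4), range(6)      # the groups Z/4 and Z/6


def alpha(a):
    return a % 2               # reduction Z/4 -> Z/2


def beta(b):
    return b % 2               # reduction Z/6 -> Z/2


P = pullback(A, B, alpha, beta)
print(len(P))
```

Here P consists of the pairs whose components have the same parity, a subgroup of order 12 in the product of order 24.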
  The following property of pullbacks was proved in BA for the module case
(Proposition 4.2.1); we now see that it holds quite generally:
Proof. Write $\ker\alpha' = (K', \nu')$; we shall show that $(K', \nu'\beta')$ is a kernel of $\alpha$. In the
first place $\nu'\beta'\alpha = \nu'\alpha'\beta = 0$; secondly, if $\nu : K \to A$ is such that $\nu\alpha = 0$, then the
triangle ABC can be completed by K to a commutative square, with $0 : K \to B$,
hence there is a unique map $\lambda : K \to P$ such that $\lambda\beta' = \nu$, $\lambda\alpha' = 0$. Since
$\nu' = \ker\alpha'$, there is a unique map $\mu : K \to K'$ such that $\mu\nu' = \lambda$, hence
$\mu\nu'\beta' = \nu$ and this shows that $\nu$ can be factored uniquely by $\nu'\beta'$; therefore
$\ker\alpha = (K', \nu'\beta')$, as claimed.                                                          •
$$0 \to P \xrightarrow{\lambda} A \oplus B \xrightarrow{\mu} C \to 0, \tag{2.1.7}$$
where $\lambda = (\beta' i, \alpha' j)$, $\mu = p\alpha - q\beta$ and i, j, p, q are the natural injections and
projections of the biproduct $A \oplus B$. The square is a pullback iff $P = \ker(p\alpha - q\beta)$, i.e.
(2.1.7) is exact at P and $A \oplus B$; it is a pushout iff $C = \operatorname{coker}(\beta' i, \alpha' j)$, i.e. (2.1.7)
is exact at $A \oplus B$ and C. It follows that a pullback is also a pushout whenever $\mu$
is epic. Suppose now that $\alpha$ is epic and let $\nu$ be such that $\mu\nu = 0$. Then
$\alpha\nu = ip\alpha\nu = i(p\alpha - q\beta)\nu = i\mu\nu = 0$, and hence $\nu = 0$; this means that $\mu$ is epic.
Thus if in a pullback $\alpha$ is epic, then we have a pushout and so by Proposition
2.1.6, $\alpha'$ is also epic. This proves
Exercises
 1. Show that Ens has an initial and a final object, but no zero object.
 2. Show that in Rg the inclusion $\mathbf{Z} \to \mathbf{Q}$ is monic and epic but not an
    isomorphism. Is the inclusion $\mathbf{Z} \to \mathbf{R}$ an epimorphism?
 3. Show that in a concrete category every monomorphism is injective.
2.2 Functors on abelian categories                                                    41
Proposition 2.2.1. Any additive functor acting on an abelian category preserves finite
products, coproducts and biproducts.                                               •
   Clearly any functor takes zero maps to zero maps and hence transforms a complex
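into a complex. We shall be particularly interested in functors that preserve
exactness. A functor T is said to be exact if it transforms each exact sequence
$$A \xrightarrow{\lambda} B \xrightarrow{\mu} C \tag{2.2.1}$$
into an exact sequence
$$AT \xrightarrow{\lambda T} BT \xrightarrow{\mu T} CT. \tag{2.2.2}$$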
For example, an equivalence between categories is an exact functor. We recall from
BA, Section 3.3 that two categories $\mathcal{A}$, $\mathcal{B}$ are equivalent if there are two functors
$T : \mathcal{A} \to \mathcal{B}$, $S : \mathcal{B} \to \mathcal{A}$ such that TS is naturally isomorphic to the identity functor
on $\mathcal{A}$, and similarly ST is naturally isomorphic to the identity on $\mathcal{B}$. Any functor
$T : \mathcal{A} \to \mathcal{B}$ defines for each pair X, Y of $\mathcal{A}$-objects a mapping
$$\mathcal{A}(X, Y) \to \mathcal{B}(XT, YT). \tag{2.2.3}$$
The functor T is called faithful if (2.2.3) is injective and full if (2.2.3) is surjective. For
an equivalence functor T, (2.2.3) is a bijection, so in this case T is full and faithful.
Moreover, an equivalence functor T is dense in the sense that every $\mathcal{B}$-object is
isomorphic to one of the form XT, for some $\mathcal{A}$-object X. As we saw in BA, Proposition
3.3.1, a functor T is an equivalence iff it is full, faithful and dense.
   All this holds in quite arbitrary categories; when $\mathcal{A}$, $\mathcal{B}$ are additive (and by
assumption T is an additive functor), (2.2.3) is clearly a group homomorphism,
and it follows easily from this that any equivalence functor is again exact.
   However, exact functors are rare; most functors only satisfy a weaker condition.
We define a functor to be left exact if it preserves kernels and right exact if it
preserves cokernels. First we have a restatement of this condition.
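$$0 \to A \xrightarrow{\lambda} B \xrightarrow{\mu} C \tag{2.2.4}$$
implies the exactness of
$$0 \to AT \xrightarrow{\lambda T} BT \xrightarrow{\mu T} CT. \tag{2.2.5}$$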
Similarly T is right exact if and only if it preserves exactness when the 0 in (2.2.4) is at
the other end (i.e. when coker $\mu = 0$).
Proof. The exactness of (2.2.4) is expressed by the equation $\lambda = \ker\mu$. If T preserves
kernels, it follows that $\lambda T = \ker\mu T$ and so (2.2.5) is exact. Conversely, if (2.2.5) is
exact, then by applying T to the exact sequence $0 \to 0 \to A \to B$, we find that the
sequence $0 \to AT \to BT$ is exact, as well as (2.2.5), so $\lambda T = \ker\mu T$, and this
shows T to be left exact. Similarly for right exactness.                              •
Corollary 2.2.3. A functor between abelian categories is exact if and only if it is left and
right exact.
Proof. Clearly an exact functor is left and right exact; conversely, if a functor T is left
and right exact, it preserves kernels and cokernels, hence images and coimages. By
hypothesis im $\lambda$ = ker $\mu$ in (2.2.1), hence $\operatorname{im}\lambda T = (\operatorname{im}\lambda)T = (\ker\mu)T = \ker\mu T$,
so T is indeed exact.                                                                   •
Corollary 2.2.4. A functor between abelian categories is exact if and only if it preserves
the exactness of short exact sequences.                                                   •
   So far all functors were tacitly assumed to be covariant. If $T : \mathcal{A} \to \mathcal{B}$ is a
contravariant functor, we shall call T left exact if the covariant functor $\mathrm{op}.T : \mathcal{A}^{\circ} \to \mathcal{B}$ is
left exact. Right exact contravariant functors are defined correspondingly, by the
right exactness of op.T. The reason for this form of the definition (rather than
using $T.\mathrm{op} : \mathcal{A} \to \mathcal{B}^{\circ}$) is to be found in
Theorem 2.2.5. For any abelian category $\mathcal{A}$, the bifunctor $\mathcal{A}(X, Y)$ is left exact in each
argument, i.e. $h^X$, $h_Y$ are each left exact.
Proof. For any $\mu : Y \to Y''$ in $\mathcal{A}$ the kernel of the induced mapping
$\mathcal{A}(X, \mu) : \mathcal{A}(X, Y) \to \mathcal{A}(X, Y'')$ is the set of morphisms killed by $\mu$, i.e. the
maps that factor uniquely through ker $\mu$. Thus
isomorphic to $h^P$ and one also says that F is represented by P. When the category $\mathcal{A}$ is
abelian, we can similarly define the representability of a functor from $\mathcal{A}$ to abelian
groups. A contravariant functor G from $\mathcal{A}$ is called representable if there is an
$\mathcal{A}$-object Q such that $YG = \mathcal{A}(Y, Q)$, thus G is naturally isomorphic to $h_Q$. For
example, the dual of a vector space is representable, almost by definition:
$V^* = \operatorname{Hom}_k(V, k)$. To give another example, consider U(R), the group of units of
a ring R. It can be shown that this functor is representable by the infinite cyclic
group $\mathbf{Z}$, thus $U(R) \cong \mathrm{Mon}(\mathbf{Z}, R)$, where Mon is the category of monoids and R is
considered as multiplicative monoid.
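The correspondence between units and monoid homomorphisms from $\mathbf{Z}$ can be illustrated for the ring $\mathbf{Z}/12$. This is a sketch only; the function names `units` and `hom_from_unit` are hypothetical, and the modular inverse via `pow` with a negative exponent assumes Python 3.8 or later.

```python
from math import gcd


def units(n):
    """Units of the ring Z/n."""
    return [u for u in range(n) if gcd(u, n) == 1]


def hom_from_unit(u, n):
    """Monoid homomorphism (Z, +) -> (Z/n, *) sending 1 to the unit u.

    It maps k to u**k; for negative k the modular inverse of u is used.
    """
    return lambda k: pow(u, k, n)


n = 12
for u in units(n):
    phi = hom_from_unit(u, n)
    # phi turns addition in Z into multiplication in Z/n
    assert all(phi(a + b) == (phi(a) * phi(b)) % n
               for a in range(-3, 4) for b in range(-3, 4))
print(units(n))
```

Each unit u determines the homomorphism $k \mapsto u^k$, and conversely a monoid homomorphism from $\mathbf{Z}$ is determined by the image of 1, which must be invertible because 1 has the additive inverse $-1$ in $\mathbf{Z}$.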
   Sometimes we shall need a criterion for a functor to preserve inexact sequences;
a sequence $\cdot \xrightarrow{\lambda} \cdot \xrightarrow{\mu} \cdot$ is called inexact if it is not exact, i.e. if im $\lambda \ne$ ker $\mu$.
Proof. Suppose first T preserves inexact sequences; we must show that T is faithful,
i.e. $\alpha \ne 0$ implies $\alpha T \ne 0$. Given $\alpha : A \to B$, where $\alpha \ne 0$, the sequence
$A \xrightarrow{1} A \xrightarrow{\alpha} B$ is inexact, hence it remains so on applying T, i.e. $\ker\alpha T \ne AT$ and
so $\alpha T \ne 0$.
    Conversely, assume that T is faithful and consider the sequence (2.2.2). If this is
exact, then $(\lambda\mu)T = \lambda T \mu T = 0$, hence $\lambda\mu = 0$. Now let $\ker\mu = (B', i)$ and consider
the composition $B' \to B \to C$. This is zero, hence so is the result of applying T
and it gives rise to a map $(\ker\mu)T \to \ker\mu T$. Likewise there is a map
$\operatorname{coker}\lambda T \to (\operatorname{coker}\lambda)T$, and the sequence
is exact at BT; hence the composition $(\ker\mu)T \to BT \to (\operatorname{coker}\lambda)T$ is zero, and it
follows that the sequence (2.2.1) is exact at B.                                      •
  There is a useful test for exactness in the case of adjoint functors. Given two
functors $T : \mathcal{A} \to \mathcal{B}$, $S : \mathcal{B} \to \mathcal{A}$, we call S, T an adjoint pair, or more precisely,
S a left adjoint and T a right adjoint if for any $\mathcal{A}$-object X and $\mathcal{B}$-object Y,
$$\mathcal{A}(YS, X) \cong \mathcal{B}(Y, XT). \tag{2.2.6}$$
$$\mathrm{Gp}(F_X, G) \cong \mathrm{Ens}(X, G^U),$$
where $F_X$ is the free group on X. Generally nearly every universal construction arises
as the left adjoint of a forgetful functor. To give another example, if $i : \mathrm{Ab} \to \mathrm{Gp}$ is
The typical construction described by a right adjoint singles out a subset by some
closure operation; see for example Exercise 8.
   Returning to the general case of an adjoint pair (2.2.6), we observe that each of S, T
determines the other up to natural isomorphism, for if we had
(2.2.7)
let us first take Y = XT and denote by $\alpha : XT \to XT'$ the map on the right of (2.2.7)
corresponding to the identity map on the left; next take $Y = XT'$ and let
$\beta : XT' \to XT$ be the map on the left corresponding to the identity map on the
right. Then $\alpha\beta = 1_{XT}$, $\beta\alpha = 1_{XT'}$, so $\alpha$ is a natural isomorphism.
   We also note that the hom functor as a bifunctor is faithful. Taking for example,
$h^A : X \mapsto \mathcal{A}(A, X)$, we have for $\alpha : X \to Y$, $h^{\alpha} : \lambda \mapsto \lambda\alpha$, thus $h^{\alpha}$ is right
multiplication by $\alpha$, and choosing $\lambda = 1_X$ we find that $\lambda\alpha = 0$ for all $\lambda$ implies $\alpha = 0$;
similarly for $h_{\beta}$. With these preparations we have
Theorem 2.2.7. Let S and T be a pair of adjoint functors between abelian categories $\mathcal{A}$
and $\mathcal{B}$. Then the left adjoint S is right exact and the right adjoint T is left exact.
Moreover, if $\mathcal{A}$, $\mathcal{B}$ have arbitrary products and coproducts, then T preserves products and S
preserves coproducts.
Proof. Let us apply (2.2.6) to a short exact sequence
$$0 \to X' \xrightarrow{\lambda} X \xrightarrow{\mu} X''.$$
By Theorem 2.2.5 the top row is exact, hence so is the bottom row, and this arises by
applying the functor $h^Y$ to the sequence
(2.2.9)
But $h^Y$, when Y is allowed to vary, is faithful and so preserves inexact sequences; since
the bottom row in (2.2.8) is exact, so is (2.2.9). This proves T to be left exact, and
it preserves products, by (2.2.6). A dual argument shows S to be right exact and
to preserve coproducts.                                                               •
  Let $\mathcal{C}$ be an abelian category and I a partially ordered set, regarded as a small
category. We denote by $\mathcal{C}^I$ the functor category whose objects are functors from I to $\mathcal{C}$
with natural transformations as morphisms; thus the objects are families of $\mathcal{C}$-objects
Conditions (2.2.10) are called the coherence conditions and a family satisfying them is
said to be coherent. A morphism $f : (A_i, \alpha_{ij}) \to (B_i, \beta_{ij})$ is a family of maps
$f_i : A_i \to B_i$ such that $\alpha_{ij} f_j = f_i \beta_{ij}$ for $i \le j$.
    We have the diagonal functor
$$\Delta : \mathcal{C} \to \mathcal{C}^I, \tag{2.2.11}$$
which with each $\mathcal{C}$-object A associates the constant family $(A_i, \alpha_{ij})$ with $A_i = A$,
$\alpha_{ij} = 1$. The adjoint functors of $\Delta$ play an important role. The direct limit (also
called the inductive limit or colimit) $\varinjlim$ is defined as the left adjoint of $\Delta$:
(2.2.12)
If $L = \varinjlim (A_i)$ exists, there are maps $u_i : A_i \to L$ satisfying $u_i = \alpha_{ij} u_j$ for $i \le j$ such
that any map $(f_i)$ from $(A_i, \alpha_{ij})$ to $\Delta(B)$ can be factored uniquely by $(u_i)$, thus
$f_i = u_i \lambda$ for all $i \in I$, for a unique $\lambda : L \to B$.
     For example, when I is totally unordered, the direct limit reduces to the
coproduct. If I consists of three points i, j, k with i < j, i < k, we obtain the pushout.
Let us describe direct limits for modules. The construction is simplified if we assume
I to be a directed partially ordered set (i.e. given $i, j \in I$, there exists $k \ge i, j$); in that
case the family $(A_i, \alpha_{ij})$ is also called a direct family. The direct limit L is the direct
sum of the $A_i$ modulo the submodule generated by the elements $x - x\alpha_{ij}$ $(x \in A_i)$ for
$i \le j$. For an example from field theory, let F be a field and $(E_i)$ the family of all
extensions of F of finite degree, with inclusions as mappings; then it is clear that
we have a direct family. Its direct limit is a field $\Omega$ containing F, which is algebraic
over F and algebraically closed, and hence is the algebraic closure of F (see BA,
Section 7.3 and Section 11.8).
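The module construction can be made concrete for one simple direct family. The sketch below is an illustration under stated assumptions: the family is $\mathbf{Z} \to \mathbf{Z} \to \cdots$ with every step multiplication by 2, elements are pairs (i, x) with x in the i-th copy, and the names `push`, `same_in_limit` and `to_fraction` are hypothetical. Because the transition maps here are injective, two pairs represent the same element of the limit exactly when they agree at the later of their two stages; in general one has to allow an arbitrary later stage.

```python
from fractions import Fraction


def push(i, x, j):
    """Transition map alpha_ij : A_i -> A_j for i <= j (multiply by 2 per step)."""
    assert i <= j
    return x * 2 ** (j - i)


def same_in_limit(a, b):
    """Test whether (i, x) and (j, y) define the same element of the direct limit."""
    (i, x), (j, y) = a, b
    k = max(i, j)
    return push(i, x, k) == push(j, y, k)


def to_fraction(a):
    """Identify the class of (i, x) with the fraction x / 2**i."""
    i, x = a
    return Fraction(x, 2 ** i)


print(same_in_limit((0, 1), (1, 2)), to_fraction((3, 4)))
```

Under the identification in `to_fraction` the direct limit of this family is the group of rationals whose denominator is a power of 2.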
     Similarly the inverse limit (also the projective limit or simply limit) $\varprojlim$ is defined
as the right adjoint of $\Delta$:
The inverse limit C has maps $v_i : C \to A_i$ such that $v_i \alpha_{ij} = v_j$ and any map $(f_i)$ from
$\Delta(B)$ to $(A_i, \alpha_{ij})$ can be factored uniquely by $(v_i)$, thus $f_i = \mu v_i$ for all $i \in I$, for a
unique $\mu : B \to C$.
   When I is totally unordered, this reduces to the product of the $A_i$. For a triple i, j, k
with i < k, j < k it becomes the pullback. To describe the construction for modules
we take I inversely directed and refer to $(A_i, \alpha_{ij})$ as an inverse family. The inverse
limit of such a family of modules is obtained by forming the product $\prod A_i$ and
taking the submodule of all elements $(x_i)$ such that $x_i \alpha_{ij} = x_j$.
   To illustrate the notion of inverse limit consider a free group F of rank > 1. Let us
write $(N_i)$ for the family of all normal subgroups of finite index in F. Then $G_i = F/N_i$
is a finite group and for $N_i \subseteq N_j$ we have a natural homomorphism $\varphi_{ij} : G_i \to G_j$,
and these homomorphisms are coherent. Since the intersection of any two of the
$N_i$ is again of finite index, we have an inverse family and we can form the inverse
limit $G = \varprojlim (G_i)$. This group G is called a profinite group (as projective limit
of finite groups). Since the natural homomorphisms $F \to G_i$ are compatible with
the $\varphi_{ij}$, we have a canonical homomorphism $\gamma : F \to G$. As we shall see in
Section 3.4, $\bigcap N_i = 1$, and it follows easily from this fact that $\gamma$ is injective. However,
$\gamma$ is not surjective, for G, as inverse limit, is uncountable, whereas F is countable
whenever its rank is at most countable. A similar construction is possible for abelian
groups, thus for example $\mathbf{Z}$ can be embedded in a profinite group, or even in a
pro-p-group (see Exercise 10).
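The abelian analogue can be explored numerically. The following sketch works with finite truncations of the inverse family $\mathbf{Z}/p, \mathbf{Z}/p^2, \ldots$ under the reduction maps; the function names and the choice p = 2 are assumptions made for the illustration. It checks the coherence condition for the image of an integer and for the sequence of inverses of 3, which is coherent at every level although no integer m satisfies 3m = 1.

```python
def image_of_integer(m, p, depth):
    """Image of the integer m in the levels Z/p, Z/p^2, ..., Z/p^depth."""
    return [m % p ** n for n in range(1, depth + 1)]


def is_coherent(xs, p):
    """Check that each entry reduces to the previous one.

    xs[n] lives in Z/p^(n+1) and must reduce modulo p^n to xs[n-1].
    """
    return all(xs[n] % p ** n == xs[n - 1] for n in range(1, len(xs)))


p, depth = 2, 10
# Inverse of 3 at each level (modular inverse via pow needs Python 3.8+).
third = [pow(3, -1, p ** n) for n in range(1, depth + 1)]

print(is_coherent(image_of_integer(-7, p, depth), p), is_coherent(third, p))
```

A non-zero integer m has a non-zero entry at every level n with $p^n > |m|$, which is the injectivity asserted in Exercise 10.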
   The last example illustrates another important point, namely the lack of duality in
general module categories. As we have seen, the notion of an abelian category can be
developed in an entirely self-dual manner. However, the category of all modules over
a given ring is not self-dual, except for very special rings, so in order to describe
module categories axiomatically one will need axioms whose duals may not hold.
We shall not carry out the full axiomatization (which can be found in most books
on category theory) but merely list one axiom holding in all module categories
but not always for their duals. This is Grothendieck's
AB5 Axiom. Given a chain of subobjects $(A_i)$ and any subobject B of an object, we have
Since $\varinjlim$ is in any case right exact, as left adjoint, this requires that for any families
of exact sequences $0 \to A_i \to B_i$ the sequence
should again be exact. Like (2.2.13) this condition is easily verified in any category of
modules. The dual states
(a) P is projective,
(b) every short exact sequence
(2.2.14)
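[Diagram: a map $P \to B''$ and an exact row $B \to B'' \to 0$, with a dotted arrow $P \to B$ completing the triangle.]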
Condition (c) may be expressed by saying: every map from P to a quotient of B may
be lifted to B. We note that the statement of this theorem is quite similar to that of
Theorem 4.7.4 of BA for modules, but we shall not be able to use the proof given
there, which depended on the existence of free modules. On the other hand, the
proof given below provides another proof of Theorem 4.7.4 of BA.
Proof. (a) $\Rightarrow$ (b). Given a short exact sequence (2.2.14), we find by (a) that the
sequence of abelian groups
$$0 \to \mathcal{A}(P, A) \to \mathcal{A}(P, B) \to \mathcal{A}(P, P) \to 0$$
is exact. Now $1_P \in \mathcal{A}(P, P)$ and by exactness there exists $\beta \in \mathcal{A}(P, B)$ such that
$\beta\mu = 1_P$, hence (2.2.14) splits.
    (b) $\Rightarrow$ (c). By forming the pullback of the given diagram we obtain
$$\begin{array}{ccccccc}
\ker\alpha & \to & C & \xrightarrow{\ \alpha\ } & P & & \\
 & & \downarrow & & \downarrow & & \\
 & & B & \to & B'' & \to & 0
\end{array}$$
By Corollary 2.1.7, $\alpha$ is epic, hence by (b) the top row splits, so there is a map
$f : P \to C$ and $f\beta : P \to B$ is the required map.
  (c) $\Rightarrow$ (a). Given a short exact sequence, we apply $\mathcal{A}(P, -)$ and obtain
$$0 \to \mathcal{A}(P, B') \to \mathcal{A}(P, B) \to \mathcal{A}(P, B'') \to 0. \tag{2.2.15}$$
By the left exactness of hom this can fail to be exact only at $\mathcal{A}(P, B'')$. But by (c)
every map $P \to B''$ lifts to a map $P \to B$, and this means that (2.2.15) is also
exact at $\mathcal{A}(P, B'')$.                                                               •
Theorem 2.2.9. Let I be an object in an abelian category $\mathcal{A}$. Then the following
conditions are equivalent:
(a) I is injective,
(b) every short exact sequence
[Diagram: an exact row $0 \to A' \to A$ and a map $A' \to I$, with a dotted arrow $A \to I$ completing the triangle.]
 Here (c) may be expressed by saying: every map from a subobject of A to I can be
extended to A.
  The proof is dual to that of Theorem 2.2.8 and so may be left to the reader. •
   Although the notions of projective and injective module are dual, they can have
very different appearance in actual categories and we shall return to this question
for module categories in Section 2.3 and Section 4.6.
Exercises
 1. Show that a functor between additive categories is additive iff it preserves finite
    products.
 2. Use Exercise 1 to show that a functor between additive categories forming part
    of an adjoint pair is necessarily additive.
 3. Show that a subcategory of an abelian category is abelian iff the inclusion functor
    is exact.
 4. Show that for a faithful functor T in an abelian category, $C \ne 0$ implies $CT \ne 0$.
    Show that for an exact functor this condition is sufficient as well as necessary.
 5. Let $T : \mathcal{A} \to \mathcal{B}$, $S : \mathcal{B} \to \mathcal{A}$ be a pair of functors giving an equivalence of
    categories. Show that S, T is an adjoint pair as well as T, S.
 6. Show that for any abelian category the following are equivalent: (a) every object
    is projective, (b) every object is injective, (c) every short exact sequence splits.
 7. Let S, T be a pair of adjoint functors between abelian categories. Show that if S
    is left exact, then T preserves injectives; if T is right exact, then S preserves
    projectives.
 8. For any group G denote by $\mathbf{Z}G$ the group ring of G over $\mathbf{Z}$. Show that the
    correspondence $G \mapsto \mathbf{Z}G$ is a functor from Gp to Rg whose right adjoint is
    the functor $R \mapsto U(R)$, where U(R) is the group of units of R.
 9. Show that a functor $\mathcal{A} \to \mathcal{B}$ is full and faithful iff $\mathcal{A}$ is equivalent to a full
    subcategory of $\mathcal{B}$.
10. Let p be a prime number. Verify that $\bigcap p^n\mathbf{Z} = 0$ and deduce that there is a
    natural injection $\mathbf{Z} \to \varprojlim (\mathbf{Z}/p^n)$.
As a rule K will be an arbitrary commutative ring, fixed in any given context, and all
rings will be K-algebras. The case of abstract rings is included by taking K = Z.
   We recall that a right R-module structure on M can be described by saying that we
have a homomorphism
                                                                                (2.3.1)
this is a K-linear mapping f such that (xy)f = yf.xf. This remark is often used to
avoid having to pass to the opposite ring. Thus if we have a homomorphism
$R^{\circ} \to \operatorname{End}_K(M)$, we shall regard M as a left R-module rather than a right $R^{\circ}$-module.
   Let R, T be any rings and ${}_T\mathrm{Mod}_R$ the category of (T, R)-bimodules; clearly this is a
subcategory of $\mathrm{Mod}_R$. The following lemma on the transport of ring action is often
useful.
Lemma 2.3.1. Let R, S, T be any rings (or K-algebras) and $F : \mathrm{Mod}_R \to \mathrm{Mod}_S$ a
covariant functor. Then F induces a functor $F' : {}_T\mathrm{Mod}_R \to {}_T\mathrm{Mod}_S$. Similarly a
contravariant functor $G : \mathrm{Mod}_R \to {}_S\mathrm{Mod}$ induces a functor $G' : {}_T\mathrm{Mod}_R \to {}_S\mathrm{Mod}_T$.
Proof. Given a (T, R)-bimodule M, we know that MF is an S-module; further, for
any $t \in T$ we can define the action of t on MF as tF. Since t defines an
endomorphism of $M_R$, tF defines an endomorphism of $(MF)_S$, i.e. an element of $\operatorname{End}_S(MF)$,
and so
                         (xa)l   = (xl)a   for any x EMF, a E S.
   As an example, important for what follows, consider the hom functor. Let M be an
(S, R)-bimodule and N a (T, R)-bimodule; we shall express this briefly by saying that
we are in the situation $({}_SM_R, {}_TN_R)$. Consider $H = \operatorname{Hom}_R(M, N)$; when we regard M,
N as right R-modules, H is just an abelian group (or a K-module). But the left
T-module structure on N induces a left T-module structure on H, while the left
S-module structure on M induces a right S-module structure on H; here the side is
reversed because Hom(M, N) is contravariant in M. Thus we see that H is a left
T-, right S-module; in fact it is a (T, S)-bimodule. To show this let us write (f, x) for
the effect of $f \in H$ on $x \in M$. Then by definition we have for any $r \in R$, $s \in S$, $t \in T$,
(f, x)r = (f, xr), (fs, x) = (f, sx), (tf, x) = t(f, x). Hence ((tf)s, x) = (tf, sx) =
t(f, sx) = t(fs, x) = (t(fs), x), i.e. (tf)s = t(fs), as claimed.
   A second functor of great importance is the tensor product. We recall from BA,
Section 4.8 that for a K-algebra R and modules $(U_R, {}_RV)$ there is a K-module
$U \otimes_R V$ with a mapping
$$\lambda : U \times V \to U \otimes_R V,$$
which is universal for K-bilinear mappings f from $U \times V$ to K-modules that are
R-balanced, i.e. such that
$$(xr, y)f = (x, ry)f \quad\text{for all } x \in U,\ y \in V,\ r \in R.$$
We remark that (assuming the tensor product over the commutative ring K as
known), $U \otimes_R V$ may also be obtained as the homomorphic image of $U \otimes_K V$ by
adding the relations $xr \otimes y = x \otimes ry$ $(r \in R)$. Further we recall the equations of
adjoint associativity which follow from the definition of the tensor product. For
the situation $({}_QU_R, {}_RV_S, {}_TW_S)$ we have the natural isomorphism of
(T, Q)-bimodules (adjoint associativity)
$$\operatorname{Hom}_S(U \otimes_R V, W) \cong \operatorname{Hom}_R(U, \operatorname{Hom}_S(V, W)). \tag{2.3.2}$$
This may be expressed by saying that $- \otimes_R V$ is the left adjoint of the functor
$h^V = \operatorname{Hom}_S(V, -)$. By symmetry the same holds for $U \otimes_R -$ in the situation
$({}_SU_R, {}_RV_T, {}_SW_Q)$, using the isomorphism
$$\operatorname{Hom}_S(U \otimes_R V, W) \cong \operatorname{Hom}_R(V, \operatorname{Hom}_S(U, W)). \tag{2.3.3}$$
Proposition 2.3.2. For a left R-module V over any ring R, the tensor product functor
$- \otimes_R V$ is right exact and preserves direct sums; similarly for right R-modules.   •
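That the tensor product is right exact but in general not left exact can be seen on the sequence $0 \to \mathbf{Z} \to \mathbf{Z} \to \mathbf{Z}/2 \to 0$, where the first map is multiplication by 2. The sketch below uses the standard identification $\mathbf{Z} \otimes \mathbf{Z}/n \cong \mathbf{Z}/n$ as an assumption, so that the induced map becomes multiplication by 2 on $\mathbf{Z}/n$; the function names are hypothetical.

```python
def induced_map(k, n):
    """Multiplication by k on Z after tensoring with Z/n.

    Using Z (x) Z/n = Z/n, this is x |-> k*x on Z/n, given as a value table.
    """
    return [(k * x) % n for x in range(n)]


def is_injective(table):
    """A map given by its value table is injective iff no value repeats."""
    return len(set(table)) == len(table)


def cokernel_size(table, n):
    """Order of (Z/n) / image; the image is a subgroup, so its order divides n."""
    return n // len(set(table))


# Tensor 0 -> Z --2--> Z -> Z/2 -> 0 with Z/2.
t = induced_map(2, 2)
print(is_injective(t), cokernel_size(t, 2))
```

After tensoring with $\mathbf{Z}/2$ the first map becomes zero, so injectivity is lost, while the cokernel still has order 2, matching $\mathbf{Z}/2 \otimes \mathbf{Z}/2 \cong \mathbf{Z}/2$.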
  Further we recall the associative law for tensor products; in the situation
$(U_R, {}_RV_S, {}_SW)$ we have
$$U \otimes_R (V \otimes_S W) \cong (U \otimes_R V) \otimes_S W. \tag{2.3.4}$$
(2.3.6)
We have already met projective and injective objects in the category of modules in
BA, Section 4.7. In particular, we see from the characterization given there that
the projective R-modules are precisely the direct summands of free R-modules.
There is no such explicit description of injective modules (but see Section 4.6
below); for the moment we note that by Theorem 2.2.9 a module M is injective iff
every short exact sequence with M as first term splits, i.e. iff M is a direct summand
in every module containing it as a submodule. This leads to the following criterion.
An extension of modules $M \subseteq N$ is called essential and M is said to be a large
submodule of N if M has a non-zero intersection with every non-zero submodule
of N.
  We have already met Reinhold Baer's injectivity criterion in BA, Theorem 4.7.7;
here is another very short proof, due to Peter Freyd.
Theorem 2.3.4 (Baer's criterion). For any ring R, a left R-module M is injective if and
only if every homomorphism from a left ideal of R into M can be extended to a
homomorphism from R to M.
Proof. The necessity is clear; to prove the sufficiency of the condition we show that
when it holds, M has no proper essential extension. Let $M \subset L$ be a proper extension,
fix $u \in L \setminus M$ and consider the pullback diagram shown, where $R \to L$ is the map
$r \mapsto ru$.
   With every homomorphism of rings there are several transfer functors associated,
which are often useful. Given any rings R, S and a homomorphism $f : R \to S$, any
right S-module U may be defined as an R-module by putting
$$x.a = x(af) \quad\text{for } x \in U,\ a \in R.$$
This R-action on U is said to be defined by pullback along f (not to be confused with
the pullback diagram in Section 2.1), and the resulting R-module is written fU. The
correspondence $U \mapsto fU$ is a functor from $\mathrm{Mod}_S$ to $\mathrm{Mod}_R$ rather like the forgetful
functor. We shall want to go in the opposite direction and construct an adjoint;
thus we are given an R-module A and we ask for an associated S-module. There
are two constructions, arising as the left adjoint and the right adjoint of the functor
fU; they are known as the change-of-rings constructions.
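The pullback action can be checked on a small case. This is a sketch under stated assumptions: f is the natural homomorphism $\mathbf{Z} \to \mathbf{Z}/6$, U is $\mathbf{Z}/6$ regarded as a module over itself, and the function name `act` is hypothetical.

```python
n = 6


def f(a):
    """The ring homomorphism Z -> Z/6."""
    return a % n


def act(x, a):
    """R-action on the S-module U = Z/6 defined by pullback along f: x.a = x(af)."""
    return (x * f(a)) % n


U = range(n)
sample = range(-4, 5)
# Module axioms for the pulled-back action, checked on a sample of integers.
assoc_ok = all(act(act(x, a), b) == act(x, a * b) for x in U for a in sample for b in sample)
additive_ok = all(act(x, a + b) == (act(x, a) + act(x, b)) % n for x in U for a in sample for b in sample)
unit_ok = all(act(x, 1) == x for x in U)
print(assoc_ok, additive_ok, unit_ok)
```

The axioms hold because f is a ring homomorphism, which is all the construction uses.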
   The module At is called the induced and At the coinduced extension of A along f
Here the variance refers to S, not A; in fact, both are covariant in A. They are some-
times called relatively projective and relatively injective, on account of the following
property:
   An abelian category is said to possess enough projectives if every object can be
written as a quotient of a projective object. For example, the category $\mathrm{Mod}_R$ of right
modules over any ring R has enough projectives, because every module is a
homomorphic image of a free (hence projective) module, by BA, Theorem 4.6.3. Dually, an
abelian category is said to have enough injectives if every object can be embedded as a
subobject of an injective object. For example, $\mathbf{Z}$ as $\mathbf{Z}$-module is contained in $\mathbf{Q}$ which
is injective. Let us show that $\mathrm{Mod}_R$ has enough injectives.
Proposition 2.3.7. Let R be any ring. Then $\mathrm{Mod}_R$ (as well as ${}_R\mathrm{Mod}$) has enough
injectives, i.e. every R-module can be embedded in an injective R-module.
Proof. We first take the special case $R = \mathbf{Z}$. Every abelian group A can be written as a
quotient of a free abelian group: $A \cong F/N$. Now F is a direct sum of copies of $\mathbf{Z}$ and
by embedding $\mathbf{Z}$ in $\mathbf{Q}$ we can embed the abelian group F in a vector space over $\mathbf{Q}$, G
say. Clearly G is divisible as $\mathbf{Z}$-module and hence so is G/N, and it contains $F/N \cong A$
as a submodule. But for a $\mathbf{Z}$-module 'divisible' is the same as 'injective' (by BA,
Proposition 4.7.8), so the $\mathbf{Z}$-module A has been embedded in an injective $\mathbf{Z}$-module.
2.3 The category Mod R                                                                55
   Consider now the general case. Given any ring R, there is a natural
homomorphism $f : \mathbf{Z} \to R$, obtained by mapping $n \mapsto n.1$, and we can consider any
R-module M as $\mathbf{Z}$-module by pullback along f. By what has been proved, M can be
embedded in an injective Z-module I, hence the coinduced extension
Mf = Homz(R, M) is a submodule of If, by the left exactness of Hom. By Proposi-
tion 2.3.5, M is a direct summand of Mf, hence it is an R-submodule of If, and If is
injective as R-module by Corollary 2.3.6.                                        •
Theorem 2.3.8. Let R be any ring. Given R-modules M, E, where M      ~   E, the following
conditions are equivalent:
(a) E is a maximal essential extension of M,
(b) E is a minimal injective module containing M.
Such an extension E exists for any R-module M, and if E' is another extension of M
satisfying (a) and (b), then there is an isomorphism from E to E' leaving M elementwise
fixed.
Proof. (a) =} (b). If E is a maximal essential extension of M, then any essential exten-
sion F of E is an essential extension of M, for any non-zero submodule of F meets E
and hence M non-trivially. By maximality we have F = E, so E has no proper essen-
tial extensions and is therefore injective, by Proposition 2.3.3, but any submodule
of E containing M has E as essential extension and so cannot be injective unless it
is the whole of E, again by Proposition 2.3.3. Hence E is a minimal injective
module containing M.
   (b) =} (a). Assume that E is a minimal injective module containing M and let F
be any essential extension of M; we claim that F can be embedded in E. For the
inclusion of M in E extends to a homomorphism f : F -+ E because E is injective,
and M n ker f = 0, hence ker f = 0, because F is an essential extension of M.
Thus F is embedded in E. If F is a maximal essential extension of M, then as we
have just seen, we can take F to be a submodule of E and by the first part of the
proof F is injective. It follows that F is a direct summand of E, and so, by the mini-
mality of E, we have E = F, as we had to show.
   We can always construct such an E by taking an injective module I containing
M (Proposition 2.3.7) and inside I taking a maximal essential extension of M,
using Zorn's lemma. Finally, if E, E' are two modules both satisfying (a) and (b),
then the identity mapping on M extends to a homomorphism $\alpha : E \to E'$ by the
injectivity of E'. The kernel of $\alpha$ meets M in 0, hence $\ker \alpha = 0$ by (a), so $\operatorname{im} \alpha$ is
an injective submodule of E' and hence $\operatorname{im} \alpha = E'$ by (b). This shows $\alpha$ to be an
isomorphism.                                                                          •
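As an illustration of Theorem 2.3.8, the following sketch checks the two conditions in one familiar case. The choice $R = \mathbf{Z}$, $M = \mathbf{Z}$, $E = \mathbf{Q}$ is an assumption made here for the example and is not taken from the text; it uses the fact that over $\mathbf{Z}$ the injective modules are the divisible groups.

```latex
% Worked example (assumption: R = \mathbf{Z}, M = \mathbf{Z}, E = \mathbf{Q}).
\begin{align*}
&\text{Essential: if } 0 \ne N \subseteq \mathbf{Q} \text{ and } 0 \ne a/b \in N \ (a, b \in \mathbf{Z},\ b \ne 0),
  \text{ then } 0 \ne a = b \cdot (a/b) \in N \cap \mathbf{Z}. \\
&\text{Injective: } \mathbf{Q} \text{ is divisible, hence injective as a } \mathbf{Z}\text{-module.} \\
&\text{Maximal: an injective module has no proper essential extension (Proposition 2.3.3).}
\end{align*}
```

So in this case $\mathbf{Q}$ is a maximal essential extension of $\mathbf{Z}$, and by the theorem it is also a minimal injective module containing $\mathbf{Z}$.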
functor. Later, in Chapter 4, we shall find that over certain rings there is a dual
notion of projective cover for every finitely generated module.
Exercises
 1. For any ring R show that the finitely generated right R-modules and homomorphisms
    form a full subcategory of $\mathrm{Mod}_R$. Is it abelian? What about the
    full subcategory of cyclic right R-modules?
 2. Prove the rules (2.3.4)-(2.3.6) in detail.
 3. Let $(A_i)$ be a family of objects in an abelian category. Show that if $\prod A_i$ exists
    then it is injective iff each $A_i$ is injective; likewise, if $\coprod A_i$ exists, then it is
    projective iff each $A_i$ is projective. (Warning. Exact sequences need not be
    preserved under products or coproducts in general abelian categories.)
 4. An object P in a category $\mathcal{A}$ is called a generator of $\mathcal{A}$ if $h^P = \mathcal{A}(P, -)$ is faithful.
    Show that a generator in $\mathrm{Mod}_R$ is faithful as R-module, i.e. any non-zero
    element of R defines a non-zero action.
 5. Show that in an abelian category with arbitrary coproducts an object P is a
    generator iff every object is a quotient of a copower of P. Deduce that an abelian
    category with arbitrary coproducts and a projective generator has enough
    projectives.
 6. Dualize Exercises 4 and 5 to show that an abelian category with arbitrary
    products and an injective cogenerator (i.e. $h_I = \mathcal{A}(-, I)$ is exact and faithful)
    has enough injectives.
 7. Show that R is a generator of $\mathrm{Mod}_R$. More generally, show that M is a generator
    of $\mathrm{Mod}_R$ iff R is a direct summand of ${}^nM$ for some $n \ge 1$.
 8. Let $f : R \to S$ be a ring homomorphism. Show that for any modules $U_R$, ${}_SV$ we
    have $U^f \otimes_S V \cong U \otimes_R {}^fV$.
 9. Show that for f : R ~ S as in Exercise 8 and modules UR , R V we have
    f( Uf) ®R V ~ U ®R f (Vf). Show further that when R, S are commutative,
    then Uf ®s Vf ~ (U ®R V)f·
10. Show that if M is a finitely generated module over a Noetherian ring R, then
    M* = HomR(M, R) is again finitely generated.
11. Show that in Baer's criterion (Theorem 2.3.4) it is enough to test all large left
    ideals.
terminates, i.e. that K is in some sense closer to being projective than A. Our first
objective is to assign a numerical value to this lack of projectivity.
   Given any R-module A, we take a projective module $P_0$ mapping onto A. The
kernel $K_0$ need not be projective, but we can again take a projective $P_1$ mapping
onto $K_0$. This map has kernel $K_1$ and we can continue the process, giving rise to a
commutative diagram with exact row as follows:
As a rule one omits the kernels and just writes the exact sequence
$$\cdots \to P_n \to \cdots \to P_1 \to P_0 \to A \to 0. \tag{2.4.1}$$
This is called a projective resolution of A. For example, let $R = k[x, y]$ be the polynomial
ring in x, y over a field k and consider k as R-module, by pullback along
the natural homomorphism $R \to R/(x, y) \cong k$. We resolve k by mapping R to
$R/(x, y)$; the kernel is the ideal $(x, y)$ and we next take the map $R^2 \to (x, y)$ defined
by $(a, b) \mapsto ax - by$. Its kernel is the set of all $(cy, cx)$, $c \in R$, and this is isomorphic to R.
Thus we have obtained a resolution
$$0 \to R \to R^2 \to R \to k \to 0.$$
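The exactness claimed in this example can be checked directly. The sketch below writes maps on the right, as elsewhere in the text; the names $\sigma$ and $\tau$ are labels introduced here for illustration and do not appear in the source.

```latex
% Sketch: exactness of 0 -> R -> R^2 -> R -> k -> 0 for R = k[x, y].
% \sigma and \tau are hypothetical names for the two middle maps.
\begin{align*}
\sigma &: R \to R^2, & c &\mapsto (cy, cx), \\
\tau &: R^2 \to R, & (a, b) &\mapsto ax - by. \\
\intertext{The composite vanishes:}
c\sigma\tau &= (cy)x - (cx)y = 0. \\
\intertext{Kernel of $\tau$: if $ax = by$, then $y \mid a$ because $R$ is a unique factorization
domain and $y$ does not divide $x$; writing $a = cy$ and cancelling $y$ gives $b = cx$, so $(a, b) = c\sigma$.}
\operatorname{im} \tau &= (x, y), & \ker \sigma &= 0 \quad (\text{$R$ is an integral domain}).
\end{align*}
```

Since R and $R^2$ are free, this gives a projective resolution of k of length 2.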
It is clear that a projective resolution, possibly infinite, exists for any module, because
$\mathrm{Mod}_R$ has enough projectives, and the resolution terminates when we reach a projective
kernel. In order to compare different resolutions of a given module we
need Schanuel's lemma. It is useful to have this in an extended form.
Proposition 2.4.1. Let R be any ring and M an R-module. Given two short exact
sequences $0 \to A \to P \to M \to 0$ and $0 \to B \to Q \to M \to 0$, where P is projective,
we have an exact sequence
Proof. If we form the pullback of $P \to M$, $Q \to M$ and recall Proposition 2.1.6 and
Corollary 2.1.7, we obtain an exact commutative diagram, where $A' \cong A$, $B' \cong B$:
58                                                                              Homological algebra
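$$\begin{array}{ccccccccc}
 & & & & 0 & & 0 & & \\
 & & & & \downarrow & & \downarrow & & \\
 & & & & A' & \longrightarrow & A & & \\
 & & & & \downarrow & & \downarrow & & \\
0 & \longrightarrow & B' & \longrightarrow & C & \longrightarrow & P & \longrightarrow & 0
\end{array}$$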
Lemma 2.4.2 (Schanuel's lemma). Given two short exact sequences as in Proposition
2.4.1, if P and Q are projective, then $P \oplus B \cong Q \oplus A$.                        •
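A sketch of how Schanuel's lemma can be read off from the pullback used in the proof of Proposition 2.4.1. Here C denotes that pullback; the explicit description of C as a submodule of $P \oplus Q$ is the standard one for modules and is an assumption in so far as the text refers to Proposition 2.1.6 for it.

```latex
% Sketch; C is the pullback of P -> M and Q -> M.
\begin{align*}
C &= \{(p, q) \in P \oplus Q : p \text{ and } q \text{ have the same image in } M\}, \\
0 &\to B \to C \to P \to 0 \quad (\text{exact; splits when } P \text{ is projective, so } C \cong P \oplus B), \\
0 &\to A \to C \to Q \to 0 \quad (\text{exact; splits when } Q \text{ is projective, so } C \cong Q \oplus A), \\
P \oplus B &\cong C \cong Q \oplus A .
\end{align*}
```

When only P is assumed projective, the first splitting still holds, and substituting $C \cong P \oplus B$ in the second sequence gives an exact sequence $0 \to A \to P \oplus B \to Q \to 0$.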
     This result suggests the following definition. Two modules M, N are called projec-
tively equivalent if there exist projectives P, Q such that
$$P \oplus M \cong Q \oplus N.$$
It is clear that this is in fact an equivalence relation; we shall denote the class of M by
[M] and note that [M] = 0 iff M is projective.
    On the set of all equivalence classes we can define an operation as follows. Given
M, we resolve it by a projective:
$$0 \to A \to P \to M \to 0,$$
and write $\pi(M) = [A]$. By Schanuel's lemma the class [A] depends only on M, not
on the resolution chosen. If we replace M by $M \oplus Q$, where Q is projective, we have a
resolution
$$0 \to A \to P \oplus Q \to M \oplus Q \to 0,$$
and this shows that $\pi(M)$ depends in fact only on the class [M] and not on M itself;
$\pi$ is sometimes called the loop functor.
  We can now define the homological (or projective) dimension of a module M as
If necessary, we shall distinguish the right and left global dimensions, formed from
right or left modules. In general these two numbers may be distinct, although they
coincide for Noetherian rings (see Section 2.6 below). The rings of global dimension
0 are just the semisimple rings, for they are the rings for which every module is
projective (BA, Theorem 5.2.7). As an example of a ring of infinite global dimension
we have the ring Z/4, as the resolution (2.4.3) shows.
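The resolution (2.4.3) is not reproduced in this part of the text. The following sketch shows a standard resolution of the $\mathbf{Z}/4$-module $\mathbf{Z}/2$, which is presumably of the kind referred to; the choice of module and the display are assumptions made here.

```latex
% Assumption: R = \mathbf{Z}/4 and M = \mathbf{Z}/2 = R/2R; this need not be (2.4.3) verbatim.
\begin{gather*}
\cdots \xrightarrow{\;2\;} \mathbf{Z}/4 \xrightarrow{\;2\;} \mathbf{Z}/4 \xrightarrow{\;2\;} \mathbf{Z}/4
  \longrightarrow \mathbf{Z}/2 \longrightarrow 0, \\
\ker(2) = \operatorname{im}(2) = 2\mathbf{Z}/4 \cong \mathbf{Z}/2 \quad \text{at every stage.}
\end{gather*}
```

Every kernel is again isomorphic to $\mathbf{Z}/2$, so in terms of the loop functor $\pi(\mathbf{Z}/2) = [\mathbf{Z}/2]$. This class is non-zero: the sequence $0 \to 2\mathbf{Z}/4 \to \mathbf{Z}/4 \to \mathbf{Z}/2 \to 0$ does not split, because $\mathbf{Z}/4$ has an element of order 4 while $\mathbf{Z}/2 \oplus \mathbf{Z}/2$ has none, so $\mathbf{Z}/2$ is not projective. By Schanuel's lemma no projective resolution of $\mathbf{Z}/2$ can terminate.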
   There is an analogous development using injective resolutions, based on the fact
that $\mathrm{Mod}_R$ also has enough injectives (Proposition 2.3.7). Given an R-module M,
we form an injective resolution
$$0 \to M \to I_0 \to I_1 \to I_2 \to \cdots, \tag{2.4.5}$$
[Diagram over the exact row $0 \to A \to I \to C \to 0$.]
Since $\beta$ is monic, the pushout is a pullback, by the dual of the argument following
Proposition 2.1.6. Now by hypothesis there is a map $\theta : P \to I$ such that $\beta' = \theta\alpha'$,
therefore by the pullback property there is a map $\lambda : P \to B$ such that $\lambda\alpha = 1$,
$\lambda\beta = \theta$. Thus the given sequence splits and this shows P to be projective.       •
Theorem 2.4.4. For any ring R the following conditions are equivalent:
                                                                                      •
   A ring satisfying the conditions of this theorem is said to be right hereditary. For
example, any principal ideal domain (commutative or not) is right (and also left)
hereditary. From (b) it is clear that the right hereditary rings are just the rings of
global dimension at most 1, and (a) shows that the same class is obtained by com-
puting the global dimension from injective resolutions.
   In the commutative case hereditary integral domains have another more illumi-
nating description. We recall from BA, Section 10.5 that an ideal a in a commutative
integral domain R is called invertible if there is an R-submodule b of its field of frac-
tions K such that ab = R. In BA, Proposition 10.5.1 we saw that an ideal is invertible
iff it is non-zero projective; in particular such an ideal must be finitely generated. In
fact a commutative hereditary domain is precisely a Dedekind domain, as the
description of the latter in BA, Section 10.5 shows.
Exercises
 1. Given two projective resolutions $(P_i)$, $(P'_i)$ of finite length of a module M, show
    that $P_0 \oplus P'_1 \oplus P_2 \oplus \cdots \cong P'_0 \oplus P_1 \oplus P'_2 \oplus \cdots$ (extended Schanuel lemma).
 2. Find the global dimension of $\mathbf{Z}/n$. (Hint. Take first the case of a prime power.)
 3. Let R be a right hereditary Noetherian ring and M a finitely generated submodule
    of $R^I$, as right R-module, for some set I. Show that M is a direct sum
    of modules isomorphic to right ideals of R, and hence is projective. (Hint.
    Among the direct summands of M isomorphic to a direct sum of right ideals,
    pick a maximal one.)
2.5 Derived functors                                                                      61
and apply F:
$$\cdots \to FX_n \to \cdots \to FX_1 \to FX_0 \to FA \to 0.$$
In general this will no longer be exact, but it still is a complex. From any such complex
one can form homology groups $H_n(A)$ described below, which measure the lack
of exactness of F; taking the $X_i$ projective ensures that these groups depend only on
A and F, but not on the choice of the resolution X.
   Before we enter on the actual construction, we need some properties of commu-
tative diagrams. These are true in any abelian category, but we shall only consider the
case of modules, where they can be verified by diagram-chasing. As a matter of fact,
most of the results then follow for general abelian categories, because every small
abelian category has an exact embedding into a module category (see Mitchell
(1965) and Further Exercise 12 of Chapter 4), but we shall not make use of this fact.
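  Given any commutative square I:
$$\begin{array}{ccc}
\cdot & \xrightarrow{\;\alpha\;} & \cdot \\
{\scriptstyle\beta}\downarrow & & \downarrow{\scriptstyle\gamma} \\
\cdot & \xrightarrow{\;\delta\;} & \cdot
\end{array}$$
we define the image ratio of I as $i(I) = (\operatorname{im}\gamma \cap \operatorname{im}\delta)/\operatorname{im}(\beta\delta)$ and the kernel ratio of I
as $k(I) = \ker(\beta\delta)/(\ker\alpha + \ker\beta)$. We note that if $\alpha$ or $\beta$ is epic, then $i(I) = 0$
(for then $\operatorname{im}(\beta\delta) = \operatorname{im}(\alpha\gamma)$ contains $\operatorname{im}\gamma \cap \operatorname{im}\delta$), and if $\gamma$ or $\delta$ is monic, then $k(I) = 0$
(for then $\ker(\beta\delta) = \ker(\alpha\gamma)$ equals $\ker\alpha$ or $\ker\beta$). The ratios of two adjacent squares are related by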
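Lemma 2.5.1 (Two-square lemma). Given a commutative diagram with exact rows:
$$\begin{array}{ccccc}
A & \xrightarrow{\;\lambda\;} & B & \xrightarrow{\;\mu\;} & C \\
{\scriptstyle\alpha}\downarrow & & {\scriptstyle\beta}\downarrow & & \downarrow{\scriptstyle\gamma} \\
A' & \xrightarrow{\;\lambda'\;} & B' & \xrightarrow{\;\mu'\;} & C'
\end{array}$$
the image ratio of the left-hand square is isomorphic to the kernel ratio of the right-hand square:
$$\frac{\operatorname{im}\lambda' \cap \operatorname{im}\beta}{\operatorname{im}\lambda\beta} \cong \frac{\ker(\beta\mu')}{\ker\beta + \ker\mu}.$$
                                                                                                             •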
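The two-square lemma is stated without proof. A possible verification is sketched below, for a diagram with rows $A \to B \to C$ and $A' \to B' \to C'$ exact at B and B', horizontal maps $\lambda, \mu$ and $\lambda', \mu'$, and vertical maps $\alpha, \beta, \gamma$. Maps are written on the right, as in the text; the name $\phi$ is introduced here for illustration.

```latex
% Sketch of a proof; \phi is a hypothetical name for the comparison map.
\begin{align*}
\phi &: \ker(\beta\mu') \to \frac{\operatorname{im}\lambda' \cap \operatorname{im}\beta}{\operatorname{im}\lambda\beta},
  \qquad b \mapsto b\beta + \operatorname{im}\lambda\beta . \\
\text{Well defined:}\quad & b\beta\mu' = 0 \ \Rightarrow\ b\beta \in \ker\mu' = \operatorname{im}\lambda' . \\
\text{Onto:}\quad & x = b\beta \in \operatorname{im}\lambda' = \ker\mu' \ \Rightarrow\ b \in \ker(\beta\mu') . \\
\text{Kernel:}\quad & b\beta = a\lambda\beta \text{ for some } a \iff b \in \ker\beta + \operatorname{im}\lambda = \ker\beta + \ker\mu .
\end{align*}
```

Exactness of the bottom row at B' is used in the first two steps and exactness of the top row at B in the last, so $\phi$ induces the stated isomorphism.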
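Lemma 2.5.2. Given the commutative diagram with exact rows,
$$\begin{array}{ccccccccc}
 & & A & \xrightarrow{\;\lambda\;} & B & \xrightarrow{\;\mu\;} & C & \to & 0 \\
 & & {\scriptstyle\alpha}\downarrow & & {\scriptstyle\beta}\downarrow & & \downarrow{\scriptstyle\gamma} & & \\
0 & \to & A' & \xrightarrow{\;\lambda'\;} & B' & \xrightarrow{\;\mu'\;} & C' & &
\end{array}$$
the induced sequences of kernels and of cokernels,
$$\ker\alpha \xrightarrow{\;\lambda_*\;} \ker\beta \xrightarrow{\;\mu_*\;} \ker\gamma, \qquad
\operatorname{coker}\alpha \xrightarrow{\;\lambda'_*\;} \operatorname{coker}\beta \xrightarrow{\;\mu'_*\;} \operatorname{coker}\gamma,$$
are exact and fit into a commutative diagram with the given rows.
Here $\lambda_*$, $\mu_*$ are the maps induced between the kernels and $\lambda'_*$, $\mu'_*$ are the maps induced
between the cokernels. Moreover, if $\lambda$ is monic, then so is $\lambda_*$ and if $\mu'$ is epic, so is $\mu'_*$.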
  The proof, by diagram chasing, is straightforward and may be left to the reader. •
Lemma 2.5.3 (Snake lemma). Given the diagram in the hypothesis of Lemma 2.5.2,
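there exists a homomorphism $\Delta : \ker\gamma \to \operatorname{coker}\alpha$ such that the sequence
$$\ker\alpha \xrightarrow{\;\lambda_*\;} \ker\beta \xrightarrow{\;\mu_*\;} \ker\gamma \xrightarrow{\;\Delta\;} \operatorname{coker}\alpha \xrightarrow{\;\lambda'_*\;} \operatorname{coker}\beta \xrightarrow{\;\mu'_*\;} \operatorname{coker}\gamma$$
is exact.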
Proof. (J. Lambek) We have to prove exactness at $\ker\gamma$ and at $\operatorname{coker}\alpha$, for a suitable
$\Delta$, and for this it is enough to show that $\operatorname{coker}\mu_* \cong \ker\lambda'_*$. Writing $X = \operatorname{coker}\mu_*$,
$Y = \ker\lambda'_*$, we have the following commutative diagram with exact rows and
columns:
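[Diagram: a staircase of eight commutative squares, numbered 1 to 8, built from the rows $\ker\beta \to \ker\gamma \to X \to 0$, $A \to B \to C \to 0$, $0 \to A' \to B' \to C'$ and $Y \to \operatorname{coker}\alpha \to \operatorname{coker}\beta$.]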
By the 2-square lemma, $X \cong i(1) \cong k(2) \cong i(3) \cong k(4) \cong i(5) \cong k(6) \cong i(7) \cong k(8) \cong Y$.     •
These maps are traditionally called chain-maps. The condition $d^2 = 0$ means that
$\operatorname{im} d \subseteq \ker d$. We shall write $\operatorname{im} d = B$, $\ker d = C$ and call the elements of B boundaries
and those of C cycles. Finally $H = C/B$ is called the homology group of X. (For an
excellent concise indication of the geometrical background, see Mac Lane (1963).)
   In general B and C and hence $H = H(X)$ are merely abelian groups, but when R is
a K-algebra, they are K-modules. We have
Theorem 2.5.4. For any K-algebra R, $H : \mathrm{Diff}_R \to \mathrm{Mod}_K$ is a covariant K-linear functor
from differential modules to K-modules.
     The proof is a straightforward verification, which may be left to the reader.                               •
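The definitions of boundaries, cycles and homology above can be illustrated by a small computation. The differential module chosen below is an assumption made for the example and does not come from the text.

```latex
% Example (assumption): X = \mathbf{Z}/8 as a \mathbf{Z}-module, with d = multiplication by 4.
\begin{align*}
d^2 &= \text{multiplication by } 16 = 0 \ \text{on } \mathbf{Z}/8, \\
B &= \operatorname{im} d = \{0, 4\}, \\
C &= \ker d = \{0, 2, 4, 6\}, \\
H &= C/B \cong \mathbf{Z}/2 .
\end{align*}
```

By contrast, for $X = \mathbf{Z}/4$ with d equal to multiplication by 2 one finds $B = C = \{0, 2\}$, so that $H = 0$.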
(2.5.1)
we can regard this as a graded differential module and the homology group is again a
graded module; in detail we have
Here $d_0$ is taken to be the zero map, thus $H_0(X) = X_0/\operatorname{im} d_1$. It is clear that the
complex given by (2.5.1) is exact precisely when $H(X) = 0$, so that $H(X)$ may be
taken as a measure of the lack of exactness of X. We note further that if in (2.5.1)
each $X_n$ for $n \ge 1$ is projective, then (2.5.1) is a projective resolution of the
R-module A iff
$$H_n(X) = \begin{cases} A & \text{for } n = 0, \\ 0 & \text{for } n \ne 0. \end{cases} \tag{2.5.2}$$
It is clear that this relation between chain maps is an equivalence; moreover it has
the desired property:
Proposition 2.5.5. Homotopic chain maps induce the same homology map, i.e. if
$f \simeq g$, then $H(f) = H(g)$.
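$$X:\quad 0 \to X' \xrightarrow{\;\alpha\;} X \xrightarrow{\;\beta\;} X'' \to 0,$$
there exists a homomorphism $\Delta : H(X'') \to H(X')$ natural in X, such that the
triangle
$$\begin{array}{ccc}
 & H(X) & \\
\nearrow & & \searrow \\
H(X') & \xleftarrow{\;\Delta\;} & H(X'')
\end{array}$$
is exact.
     $\Delta$ is known as the connecting homomorphism.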
where C and X/B are the kernel and cokernel of d respectively. By Lemma 2.5.2 the
whole diagram is commutative, with exact rows and columns. Now the map
$d : X \to X$ induces a map $X/B \to C$, because $Xd \subseteq C$ and $Bd = 0$, and this map
has both kernel and cokernel equal to $H = C/B$. Hence we have a commutative
diagram
$$\begin{array}{ccccccc}
 & H' & \to & H & \to & H'' & \\
 & \downarrow & & \downarrow & & \downarrow & \\
 & X'/B' & \to & X/B & \to & X''/B'' & \to 0 \\
 & \downarrow & & \downarrow & & \downarrow & \\
0 \to & C' & \to & C & \to & C'' & \\
 & \downarrow & & \downarrow & & \downarrow & \\
 & H' & \to & H & \to & H'' &
\end{array}$$
which has exact rows and columns, by Lemma 2.5.2. Further, by the snake lemma,
there is a homomorphism $\Delta : H'' \to H'$ which makes the homology triangle
exact, and which from its derivation is natural in X.                       •
   The important case of this theorem is that where $X'$, $X$, $X''$ are in fact graded
modules, usually chain or cochain complexes, with maps of degree zero between
them, and with d of degree $-1$ or 1 respectively. In the case of chain complexes,
say, the exact triangle takes on the form of an infinite sequence
which is called the exact homology sequence associated with the short exact
sequence X.
   Any R-module may be trivially regarded as a chain complex concentrated in
degree 0, i.e. $M_0 = M$, $M_n = 0$ for $n \ne 0$, and $d = 0$. In the sequel all chain complexes
will be zero in negative dimension, i.e. $X_n = 0$ for $n < 0$. The complex X is
said to be over M if there is an exact sequence $X \to M \to 0$ (regarding M as a trivial
chain complex in the way described above). In full this sequence reads
(2.5.4)
If this is exact, it is called a resolution, or also an acyclic complex; then $H_n(X) = 0$
for $n > 0$, $H_0(X) \cong M$. If each $X_n$ is projective, we have a projective resolution. An
important property of projective resolutions is that they are universal among resolutions
of M. This follows from the more general
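$$\begin{array}{ccccc}
X & \xrightarrow{\;\varepsilon\;} & M & & \\
 & & \downarrow{\scriptstyle\varphi} & & \\
X' & \xrightarrow{\;\varepsilon'\;} & M' & \to & 0
\end{array}$$
where X is projective, X' is a resolution of M' and $\varphi$ is a homomorphism, then there
exists a chain map $f : X \to X'$ such that the resulting diagram commutes, and f is
unique up to homotopy.
   We shall also say that f is over $\varphi$ or that f lifts $\varphi$.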
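Proof. We have to construct $f_n : X_n \to X'_n$ such that $f_n d' = d f_{n-1}$ ($n \ge 1$) and
$f_0 \varepsilon' = \varepsilon\varphi$. We construct these maps recursively, using the fact that $X_n$ is projective
and $\operatorname{im}(d f_{n-1}) \subseteq \operatorname{im} d'$ (by the exactness of X'). At the n-th stage (when $f_{n-1}$ has
been constructed) we have the diagram
$$\begin{array}{ccccc}
 & & X_n & & \\
 & & \downarrow{\scriptstyle d f_{n-1}} & & \\
X'_n & \xrightarrow{\;d'\;} & \operatorname{im} d' & \to & 0
\end{array}$$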
The construction needed is quite similar to the one just carried out and may be left
to the reader.                                                                   •
Corollary 2.5.9. Any two projective resolutions of a module M are chain equivalent and
hence give rise to isomorphic homology groups, for any functor.                      •
We now have all the means at our disposal for constructing derived functors.
Theorem 2.5.10. Let F be a covariant right exact functor on $\mathrm{Mod}_R$. Then there exist
functors $F_n$ ($n = 0, 1, \ldots$) such that
$$A:\quad 0 \to A' \to A \to A'' \to 0.$$
$$0 \to K \to P \to A \to 0, \tag{2.5.6}$$
where P is projective, we have for any $F_n$ satisfying (i)-(iii), the exact sequence
$$0 \to F_1A \to FK \to FP \to FA \to 0;$$
thus any exact sequence (2.5.6) with P projective determines $F_1A$. If $F_1A$ is obtained
from (2.5.6) and $F'_1A$ from
$$0 \to K' \to P' \to A \to 0,$$
where P' is again projective, we form the pullback Q of $P \to A$ and $P' \to A$ and
apply F. We thus obtain the commutative diagram:
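[Diagram: a staircase of five commutative squares, numbered 1 to 5, with $F'_1A$ at the top, the rows $0 \to FK' \to FK' \to 0$ and $0 \to FK \to FQ \to FP' \to 0$, and the bottom row $0 \to F_1A \to FK \to FP \to FA \to 0$.]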
  The row and column meeting in FQ are split exact, because they arose by applying
F to split exact rows and columns (split because P and P' are projective). The
remaining rows and columns are also exact, and by the 2-square lemma we have
$F'_1A \cong i(1) \cong k(2) \cong i(3) \cong k(4) \cong i(5) \cong F_1A$. This shows that $F_1A$ is determined
up to isomorphism by (i)-(iii). For n > 1 the short exact sequence (2.5.6) yields
the long exact sequence
and $F_nP = F_{n-1}P = 0$ by (ii). Thus in terms of the loop operator $\pi$ introduced in
Section 2.4 we have the formula
(2.5.7)
this makes sense since $F_n$ is constant on projective equivalence classes for n > 0, by
(ii). Now the uniqueness follows by induction on n.
   It remains to prove the existence of $F_n$; here we may use any projective resolution
for A, say $X \to A \to 0$. Applying F, we get a complex $FX \to FA \to 0$. In detail this
reads
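$$\begin{array}{ccccccccc}
0 & \to & X'_n & \to & X_n & \to & X''_n & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & K'_n & \to & K_n & \to & K''_n & \to & 0
\end{array}$$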
     where $X_n = X'_n \oplus X''_n$. Since $X''_n$ is projective, we have a map $X''_n \to K_n$, while
     the map $X'_n \to K_n$ arises by composition (via $K'_n$). By definition of $X_n$ as a
     direct sum (i.e. product) we obtain a map $X_n \to K_n$ to make the squares
     commute, and a simple diagram chase shows that this map is epic (this also
     follows from the 5-lemma). By induction on n we thus have a resolution X
     of A. Now the row
$$0 \to X' \to X \to X'' \to 0$$
     is split exact, by definition; applying F, we obtain the exact sequence of
     complexes
$$0 \to FX' \to FX \to FX'' \to 0,$$
     and Theorem 2.5.7 provides us with the exact homology sequence, which is the
     required exact sequence.                                                  •
  The functors Fn constructed in Theorem 2.5.10 are called the (left) derived functors
of F. The same result, appropriately modified, gives a construction of right derived
functors of a left exact covariant functor, using injective resolutions. Any given
module A can be embedded in an injective module $I_0$ by Proposition 2.3.7; by
embedding the cokernel in an injective module $I_1$ and continuing in this fashion,
we obtain an injective resolution
(2.5.8)
For any left exact covariant functor F we have a series of functors $F^n$ such that
   The proof is exactly analogous to that of Theorem 2.5.10. We note that here the
index appears as a superscript, and its value increases along the sequence, whereas in
(2.5.5) the index is a subscript, which decreases as we go along the sequence.
   For a contravariant functor the roles are reversed; if F is left exact, we define $F^n$ by
means of a projective resolution and obtain a long exact sequence (2.5.9), while for a
right exact contravariant functor F we define $F_n$ by an injective resolution and obtain
the long exact sequence (2.5.5). To sum up, a projective resolution is needed for a
right exact covariant and a left exact contravariant functor, and an injective resolu-
tion for a left exact covariant or a right exact contravariant functor.
   Of course the construction of Theorem 2.5.10 can be carried out for any functor,
not necessarily right (or left) exact. In general we obtain in this way the left derived
functor of F; similarly we can form the right derived functor (using an injective reso-
lution) and together they form a long exact sequence extending in both directions
(see Exercise 9).
Exercises
 1. Show that if I is a pullback or pushout square, then $i(I) = 0$ and $k(I) = 0$.
 2. Show that the category of chain complexes and chain maps is an abelian
    category.
 3. Let M be a finitely presented R-module, i.e. with a resolution
Theorem 2.6.1. Let F be a bifunctor, covariant right exact in each argument, and
denote by $F'_n$, $F''_n$ the derived functors obtained by resolving the first and second argument
of F respectively. If F is balanced, then
$$F'_n(A, B) \cong F''_n(A, B), \tag{2.6.1}$$
     and obtain
$$F(X, B) \to F(A, B) \to 0.$$
2.6 Ext, Tor and global dimension                                                        73
To account for the name we shall briefly indicate an interpretation of Ext. Given two
R-modules A, B over a ring R, an extension of A by B is a module E together with a
short exact sequence
$$0 \to A \to E \to B \to 0. \tag{2.6.4}$$
We can form the category Ex, whose objects are short exact sequences, with the
obvious maps between them: a morphism is a triple of homomorphisms making
the diagram
Consider the image in $\mathrm{Ext}^1(B, A)$ of the identity map 1 on B: $1\Delta$. This is called the
obstruction or the characteristic class of the extension. Clearly it depends only on the
isomorphism type of the extension (2.6.4). Moreover, it is zero iff (2.6.4) splits, for
$1\Delta = 0$ iff 1 is induced by a homomorphism $B \to E$, which is just the condition for
(2.6.4) to split, by Corollary 2.1.5.
    We could also apply Hom( -, A) to the sequence (2.6.4) and get
$$0 \to \mathrm{Hom}(B, A) \to \mathrm{Hom}(E, A) \to \mathrm{Hom}(A, A) \xrightarrow{\;\Delta\;} \mathrm{Ext}^1(B, A) \to \cdots$$
This will give the same obstruction; in fact we have a bijection from the set of
isomorphism classes of extensions of A by B to $\mathrm{Ext}^1(B, A)$. We shall return to this
topic in Section 3.1.
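The correspondence between extensions and elements of Ext can be seen in a small case. The choice $R = \mathbf{Z}$, $A = \mathbf{Z}$, $B = \mathbf{Z}/2$ below is an assumption made for illustration, in the notation of (2.6.4).

```latex
% Example (assumption): R = \mathbf{Z}, A = \mathbf{Z}, B = \mathbf{Z}/2.
\begin{gather*}
0 \to \mathbf{Z} \xrightarrow{\;2\;} \mathbf{Z} \to \mathbf{Z}/2 \to 0
  \quad (\text{does not split: } \mathbf{Z} \text{ has no element of order } 2), \\
0 \to \mathbf{Z} \to \mathbf{Z} \oplus \mathbf{Z}/2 \to \mathbf{Z}/2 \to 0 \quad (\text{split}), \\
\mathrm{Ext}^1(\mathbf{Z}/2, \mathbf{Z}) \cong
  \operatorname{coker}\bigl(\mathrm{Hom}(\mathbf{Z}, \mathbf{Z}) \xrightarrow{\;2\;} \mathrm{Hom}(\mathbf{Z}, \mathbf{Z})\bigr)
  \cong \mathbf{Z}/2 .
\end{gather*}
```

The last line is obtained by applying $\mathrm{Hom}(-, \mathbf{Z})$ to the projective resolution $0 \to \mathbf{Z} \to \mathbf{Z} \to \mathbf{Z}/2 \to 0$ and using $\mathrm{Hom}(\mathbf{Z}/2, \mathbf{Z}) = 0$. The two elements of $\mathrm{Ext}^1(\mathbf{Z}/2, \mathbf{Z})$ then match the two extensions displayed: the zero class corresponds to the split one and the non-zero class to the non-split one.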
   The homological dimension of a module was defined in Section 2.4 in terms of the
loop functor $\pi$; we now show how to express it in terms of Ext. For simplicity we
shall not distinguish between the class $[\pi A]$ and a module in it.
Proposition 2.6.2. For any R-module A over any ring R the following conditions are
equivalent:
(a) $\operatorname{hd} A \le n$,
(b) $\mathrm{Ext}^k(A, -) = 0$ for all $k > n$,
(c) $\mathrm{Ext}^{n+1}(A, -) = 0$.
Proof. For any k > 0 and any R-module B we have, by (2.5.7) and its dual,
$$\mathrm{Ext}^k(\pi A, B) = \mathrm{Ext}^{k+1}(A, B) = \mathrm{Ext}^k(A, \iota B) \quad \text{for } k > 0. \tag{2.6.5}$$
Now (a) states that $\pi^n A$ is projective; so in that case we have for any $k > n$,
Proposition 2.6.3. For any R-module B over any ring R the following are equivalent:
(a) $\operatorname{cd} B \le n$,
(b) $\mathrm{Ext}^k(-, B) = 0$ for all $k > n$,
(c) $\mathrm{Ext}^{n+1}(-, B) = 0$,
(d) $\mathrm{Ext}^{n+1}(C, B) = 0$ for all cyclic modules C.
The proof that (a)-(c) are equivalent is entirely analogous to that of Proposition
2.6.2 and it is clear that (c) implies (d). Conversely, assume (d) for right modules
say; then for any right ideal a of R,
   From Proposition 2.6.2 we see that the global homological dimension of R may be
defined as $\sup\{n \mid \mathrm{Ext}^n \ne 0\}$, while Proposition 2.6.3 shows that this determines the
global cohomological dimension. Hence we have
Corollary 2.6.4. For any ring R, the (right) global homological and cohomological
dimensions are equal, and may be defined as
$$\operatorname{r.gl.dim}(R) = \sup\{n \mid \mathrm{Ext}^n(C, B) \ne 0 \text{ for cyclic } C \text{ and any } B\}$$
i.e. sup(hd C) for cyclic right R-modules C.
                                                                                           •
   Of course it must be borne in mind that the global dimension defined here refers
to right R-modules, and in general it will be necessary to distinguish this from the
left global dimension, l.gl.dim(R). As we shall soon see, for Noetherian rings these
numbers coincide, but we shall also meet more general examples where they differ
(see Exercise 8 of Section 2.4).
   We now turn to the tensor product. The functor $A \otimes B$ is covariant right exact in
each argument and is balanced. We therefore have a unique derived functor, sometimes
called the torsion product, written
$$\mathrm{Tor}_n^R(A, B) \quad \text{or} \quad \mathrm{Tor}_n(A, B).$$
As an example consider the case $R = \mathbf{Z}$. Here $\mathrm{Tor}_n = 0$ for n > 1, because $\mathbf{Z}$ is
hereditary and so all projective resolutions have length at most 1. Writing $C_k$ for
the cyclic group of order k, we have an exact sequence
$$0 \to \mathbf{Z} \xrightarrow{\;k\;} \mathbf{Z} \to C_k \to 0,$$
where k indicates multiplication by k. Tensoring up with $C_k$ we get
$$0 \to \mathrm{Tor}_1(C_k, C_k) \to C_k \otimes \mathbf{Z} \to C_k \otimes \mathbf{Z} \to C_k \otimes C_k \to 0.$$
Denote a generator of $C_k$ by c; then under the induced map $1 \otimes k$ we have
$c \otimes 1 \mapsto c \otimes k = ck \otimes 1 = 0$. Therefore $\ker(1 \otimes k) = C_k$ and we find
$$\mathrm{Tor}_1^{\mathbf{Z}}(C_k, C_k) \cong C_k. \tag{2.6.6}$$
Since Tor, like $\otimes$, preserves direct sums, (2.6.6) together with the equation
$\mathrm{Tor}_1(C_h, C_k) = 0$ for coprime h, k is enough to determine Tor for any finitely
generated abelian group.
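The same calculation can be carried out for two cyclic groups of arbitrary orders. The sketch below is an extension of the example rather than something stated in the text; it assumes $h, k \ge 1$ and writes $g = \gcd(h, k)$.

```latex
% Sketch (assumption: h, k >= 1 arbitrary, g = gcd(h, k)).
\begin{gather*}
0 \to \mathrm{Tor}_1(C_h, C_k) \to C_h \otimes \mathbf{Z} \xrightarrow{\;1 \otimes k\;} C_h \otimes \mathbf{Z}
  \to C_h \otimes C_k \to 0, \\
C_h \otimes \mathbf{Z} \cong C_h, \qquad 1 \otimes k \ \text{becomes multiplication by } k \text{ on } C_h, \\
\mathrm{Tor}_1(C_h, C_k) \cong \{x \in C_h : kx = 0\} \cong C_g, \qquad
C_h \otimes C_k \cong C_h / kC_h \cong C_g .
\end{gather*}
```

For h = k this recovers (2.6.6), and for coprime h, k it gives $\mathrm{Tor}_1(C_h, C_k) = 0$.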
In terms of the loop functor $\pi$ we can also define wd A as the least integer k such that
$\pi^k A$ is flat. Now the weak global dimension of R is defined as
This map is clearly R-balanced and biadditive, and hence defines a homomorphism
(2.6.7), which is easily seen to be natural. Moreover, for A = R it is an isomorphism,
hence it is an isomorphism for any finitely generated projective R-module A. We
now fix C to be Z-injective, i.e. divisible and consider the two sides of (2.6.7) as a
functor in A. Both sides are covariant right exact; if we apply them to a projective
resolution (2.6.2) of A we obtain a natural transformation
Suppose now that R is right Noetherian and A is finitely generated. Then each term X
in the resolution (2.6.2) may be taken to be finitely generated and so (2.6.8) will be
an isomorphism. Thus we have proved
Proposition 2.6.5. Let R be a right Noetherian ring and C a divisible abelian group.
Then for any right R-modules A, B such that A is finitely generated, we have
because every projective module is flat. It follows that for any ring R,
$$\operatorname{w.gl.dim}(R) \le \operatorname{r.gl.dim}(R),\ \operatorname{l.gl.dim}(R). \tag{2.6.11}$$
Now assume that R is right Noetherian and A is finitely generated over R. Choose n
such that $n \le \operatorname{hd} A$; then $\mathrm{Ext}^n_R(A, B) \ne 0$ for some R-module B, and moreover, we
can find a divisible group C into which $\mathrm{Ext}^n_R(A, B)$ has a non-zero homomorphism
(indeed an embedding, by Proposition 2.3.7). Thus for suitable B, C the right-hand
side of (2.6.9) is non-zero; looking at the left-hand side, we deduce that $\operatorname{wd} A \ge n$.
Hence equality must hold in (2.6.10) and we have proved
Theorem 2.6.6. If R is a right Noetherian ring, then for any finitely generated right
R-module A, wd A = hd A.                                                           •
Proof. By Corollary 2.6.4 the right global dimension is the supremum of the projective
dimensions of the cyclic right R-modules, and by Theorem 2.6.6 this cannot
exceed the weak global dimension of R; by (2.6.10) it cannot be less, and so the
equality in (2.6.11) follows. The inequality follows similarly from (2.6.10), bearing
in mind that the weak global dimension is left-right symmetric.                    •
U^r ⊗ U^s ≅ U^{r+s}   (2.7.3)
which follows from the associative law for tensor products. If A is commutative and
the left and right actions on U agree, the ring defined by (2.7.2) is an A-algebra, but
for general A the ring T_A(U) so obtained is an A-ring, i.e. a ring with a homomorphism
A → T_A(U). If U is the free K-module on a set X, we write T_K(U) as
K⟨X⟩; this is just the free K-algebra on X (see BA, Section 6.2). The ring T_A(U)
has the special property that any A-linear mapping of U into an A-ring R can be
extended to a homomorphism of T_A(U) into R:
Theorem 2.7.1. Let A be any ring and U an A-bimodule. Then T_A(U) is the universal
A-ring for A-linear mappings of U into A-rings: there is a homomorphism
λ : U → T_A(U) such that for every A-linear map f : U → R into an A-ring R there
is a homomorphism f* : T_A(U) → R such that
f = λf*.   (2.7.4)
Proof. The map λ may be taken as the embedding which identifies U with U^1. Given
an A-linear mapping f : U → R, we extend f to T_A(U) by defining
By the properties of the tensor product this defines a mapping f* from U to R, which
is easily seen to be a homomorphism; f* is unique since it is determined on the
generating set U and (2.7.4) holds, almost by definition.                          •
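For the free algebra K⟨X⟩ the extension f* can be made quite concrete. The following sketch takes K = Z and X = {x, y}, stores an element of K⟨X⟩ as a dictionary from words to coefficients, and extends a map of the generators into the ring of 2 × 2 integer matrices multiplicatively and linearly. The representation, the target ring and all names are illustrative assumptions, not notation from the text.

```python
IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))

def mat_mul(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def mat_add(m, n):
    return tuple(tuple(m[i][j] + n[i][j] for j in range(2)) for i in range(2))

def mat_scale(c, m):
    return tuple(tuple(c * m[i][j] for j in range(2)) for i in range(2))

def poly_mul(p, q):
    """Product in the free algebra: words (tuples of generators) multiply by concatenation."""
    out = {}
    for u, a in p.items():
        for v, b in q.items():
            out[u + v] = out.get(u + v, 0) + a * b
    return {w: c for w, c in out.items() if c != 0}

def extend(f):
    """Given images f[x] of the generators, return the induced map f* on the free algebra."""
    def f_star(p):
        total = ZERO
        for word, coeff in p.items():
            image = IDENTITY          # the empty word maps to 1
            for x in word:
                image = mat_mul(image, f[x])
            total = mat_add(total, mat_scale(coeff, image))
        return total
    return f_star

f = {"x": ((0, 1), (0, 0)), "y": ((0, 0), (1, 0))}   # hypothetical images of x and y
f_star = extend(f)
p = {("x",): 1, ("y", "x"): 2}    # x + 2yx
q = {(): 3, ("x", "y"): -1}       # 3 - xy
print(f_star(poly_mul(p, q)) == mat_mul(f_star(p), f_star(q)))
```

Since the images of x and y do not commute, the example also shows that no commutativity is imposed in K⟨X⟩.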
   It turns out that derivations have the same property. We recall that for any ring
homomorphisms α : C → A, β : C → B and an (A, B)-bimodule M, a mapping
δ : C → M is called an (α, β)-derivation if δ is linear and satisfies
(xy)^δ = x^α y^δ + x^δ y^β.   (2.7.5)
                                 x ↦ ( x^α   x^δ )
                                     (  0    x^β ),    x ∈ C,                    (2.7.6)
                  =x.l +x 8 .y.
(2.7.8)
                                 Ω_A(R) = R ⊗ U ⊗ R                               (2.7.9)
and the exact sequence
0 → R ⊗ U ⊗ R → R ⊗ R → R → 0   (2.7.10)
Theorem 2.7.5. To any ring R there corresponds a commutative ring R^ab with a homomorphism
ν : R → R^ab which is universal for homomorphisms of R into commutative
rings.
Proof. Let c be the commutator ideal of R, i.e. the ideal generated by all the commutators
xy - yx, where x, y ∈ R, and write R^ab = R/c, with the natural homomorphism
ν : R → R^ab. Then any homomorphism f from R to a commutative
ring maps xy - yx to 0, for all x, y ∈ R, hence ker f ⊇ c, and so by the factor theorem
(Theorem 1.2.4) f can be factored uniquely by ν.                                      •
(2.7.11)
By what has just been said, we see that S(U) can be obtained from T(U) by imposing
the relations
                               xy = yx   for all x, y ∈ U,                      (2.7.12)
Theorem 2.7.6. Let K be a commutative ring. For any K-module U there is a commutative
K-algebra S(U) with a K-linear mapping μ : U → S(U) which is universal
for K-linear mappings from U to commutative K-algebras. S(U) can be obtained
from the tensor algebra by imposing the relations (2.7.12).                    •
Now a simple verification shows that uf and vf commute, for any u, v E U, so by the
universal property of S( U) there is a unique K-algebra homomorphism from S( U)
extending f. This proves
Proposition 2.7.7. Let U be a K-module (over a commutative ring K) and S(U) its
symmetric algebra. Then any K-linear mapping δ : U → S(U) extends to a unique
derivation of S(U).                                                         •
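When U is free on x_1, ..., x_n, the symmetric algebra S(U) is the polynomial ring K[x_1, ..., x_n], and the extension in Proposition 2.7.7 can be computed monomial by monomial. The sketch below takes K = Z and n = 2, stores a polynomial as a dictionary from exponent tuples to coefficients, and checks the Leibniz rule on one pair of polynomials. The representation and the particular values of δ are illustrative assumptions.

```python
def p_add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def p_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def derivation(delta):
    """Extend the values delta[i] (the image of x_i) to a derivation D of the polynomial ring."""
    def D(p):
        result = {}
        for e, c in p.items():
            for i, k in enumerate(e):
                if k == 0:
                    continue
                # c * x^e contributes c * k * x^(e - e_i) * delta[i]
                lowered = tuple(a - (j == i) for j, a in enumerate(e))
                result = p_add(result, p_mul({lowered: c * k}, delta[i]))
        return result
    return D

# Hypothetical choice: delta(x) = y, delta(y) = x^2, with exponents written as (deg_x, deg_y).
delta = [{(0, 1): 1}, {(2, 0): 1}]
D = derivation(delta)
f = {(2, 1): 1, (0, 1): 3}     # x^2 y + 3y
g = {(1, 0): 1, (0, 2): -1}    # x - y^2
print(D(p_mul(f, g)) == p_add(p_mul(D(f), g), p_mul(f, D(g))))
```

Taking δ(x_i) = 1 and δ(x_j) = 0 for j ≠ i in the same code gives the partial derivative with respect to x_i.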
  We also note the following test for algebraic dependence in fields. If E/k is an
extension field generated by x_1, ..., x_n then it is easily verified that the universal
derivation module Ω_k(E) is spanned by the dx_i. Let us write D_i or ∂/∂x_i for the derivation
of the polynomial ring k[x_1, ..., x_n] with respect to x_i; this is the derivation
over k which maps x_i to 1 and x_j for j ≠ i to 0.
Theorem 2.7.8. Let E/k be a field extension in characteristic 0, and let (x_i) be any
family of elements of E. Then
(i)   the x_i are algebraically independent if and only if the dx_i are linearly independent
      over E,
(ii)  E/k(x_i) is algebraic if and only if the dx_i span Ω_k(E) as E-space,
(iii) (x_i) is a transcendence basis for E if and only if the dx_i form a basis for Ω_k(E)
      over E.
Proof. This follows from the fact that any polynomial relation f(x_1, ..., x_n) = 0
corresponds to a relation Σ D_i f(x).dx_i = 0.                                        •
   For our last result in this section we shall need a change-of-rings theorem which is
also generally useful. Let f : R → S be a homomorphism of rings; we saw in Section
2.3 that every S-module U can be considered as an R-module _fU by pullback along f; in
particular S itself becomes an R-bimodule in this way. Further, every R-module A
gives rise to an induced extension A_f = A ⊗_R S and a coinduced extension
A^f = Hom_R(S, A).
(2.7.13)
(2.7.14)
and
                                                                                (2.7.15)
and here the terms I~ are injective, as coinduced modules, so this is an injective
resolution of Af. If we now apply the hom functor to this resolution and bear in
mind that by (2.3.3) and (2.3.4),
Hom_S((P_n)_f, U) ≅ Hom_R(P_n, _fU)
to obtain (2.7.14); in the same way we apply the tensor product to the isomorphism
(P_n)_f ⊗_S U ≅ P_n ⊗_R _fU
to obtain (2.7.15).
                                                                                     •
  We conclude this section by finding an estimate for the global dimension of a
tensor ring:
for any right R-module N. Since _CU is flat, so is _CR and so by Theorem 2.7.9 this
simplifies to
... → Ext^n_R(M, N) → Ext^n_C(_fM, _fN) → Ext^n_C(_fM ⊗ U, _fN) → Ext^{n+1}_R(M, N) → ...
                                                                                        (2.7.18)
It follows that r.gl.dim(R) ≤ n + 1. Moreover, by the definition of n,
Ext^{n+1}_C(_fM, _fN) = 0, so we have a surjection
                       Ext^n_C(_fM ⊗ U, _fN) → Ext^{n+1}_R(M, N) → 0.                  (2.7.19)
We next show that r.gl.dim(R) ≥ n. Choose right C-modules A, B such that
Ext^n_C(A, B) ≠ 0 and consider A, B as right R-modules with trivial U-action, i.e.
AU = BU = 0. The C-module structure is then recovered by pullback along f.
Taking M = A, N = B in (2.7.18) and observing that Hom_R(A, B) → Hom_C(A, B)
is then an isomorphism, we conclude by exactness that Hom_C(A, B) →
Hom_C(A ⊗ U, B) is then the zero map. The same applies if we resolve B, hence by
(2.7.18) we have the exact sequence
         0 → Ext^{n-1}_C(A ⊗ U, B) → Ext^n_R(A, B) → Ext^n_C(A, B) → 0.                (2.7.20)
It follows that Ext^n_R(A, B) ≠ 0, and so r.gl.dim(R) ≥ n; this proves (2.7.16). Now if
hd(A ⊗ U) = n for some A_C, then by (2.7.20) with n replaced by n + 1, we have
r.gl.dim(R) = n + 1, while if hd(A ⊗ U) < n for all A_C, then hd(_fM ⊗ U) < n
and by (2.7.19), r.gl.dim(R) ≤ n.
   The proof of (ii) is similar. For any right R-module M we have
We see again that r.gl.dim(R) ≤ n + 1, and the surjection (2.7.19) is replaced by
                    Ext^n_C(Hom_C(U, M), N) → Ext^{n+1}_R(M, N) → 0.
The argument as before gives the analogue of (2.7.20):
          0 → Ext^{n-1}_C(Hom_C(U, A), B) → Ext^n_R(A, B) → Ext^n_C(A, B) → 0
It follows that Tor^R_m(A, B) ≠ 0, so w.gl.dim(R) ≥ m. If wd(A ⊗ U) = m for some
A_C, then replacing m by m + 1 in (2.7.21) we find w.gl.dim(R) = m + 1; if
wd(A ⊗ U) < m for all A_C, then wd(_fM ⊗ U) < m and by (2.7.22) we have
w.gl.dim(R) ≤ m.                                                                •
   Thus for any free algebra k⟨X⟩ over a field k, every right ideal (and every left ideal)
is projective. In fact these ideals are free, of unique rank (see Cohn (1985)); for the
case of finitely generated right (or left) ideals this will be proved in Section 8.7 and
the full result is in Section 11.5 below.
Exercises
 1. Verify that U ↦ S(U) is a functor.
 2. Show that S(U ⊕ V) ≅ S(U) ⊗ S(V).
 3. Given a surjective homomorphism μ : U → V of K-modules (where K is a commutative
    ring), show that S(μ) : S(U) → S(V) is surjective.
 4. Writing [x, y] = xy - yx, verify the identity [xy, z] = x[y, z] + [x, z]y, which
    expresses the fact that the mapping u ↦ [u, z] is a derivation. Use this result
    to give another proof of the remark following Theorem 2.7.5, that the commutator
    ideal of a ring R is generated by all [x, y], where x, y range over a generating
    set of R.
 5. Find extensions of Proposition 2.7.3 and Proposition 2.7.4 to (α, β)-derivations.
 6. Given a situation (A_R, _RB_S), where A_R and B_S are flat, show that (A ⊗ B)_S is flat.
    Deduce that if a C-bimodule U is flat as right C-module, then so is T_C(U). Do
    the same for 'projective' in place of 'flat'.
 7. Find an extension of Theorem 2.7.8 to the case of prime characteristic.
 8. Apply Theorem 2.7.8 to prove that the transcendence degree (in characteristic 0)
    is an invariant of the dimension. What can be said in prime characteristic?
 9. A finitely generated R-module M is called stably free if integers r, s exist such that
    M ⊕ R^r ≅ R^s. Given that every finitely generated projective over a polynomial
    ring is stably free, derive Hilbert's form of the syzygy theorem from Corollary
    2.7.11.
10. Let k be a field and R the k-algebra generated by (disjoint) finite sets X_1, ..., X_r
    with the defining relations xy = yx precisely when x, y lie in different sets. Show
    that gl.dim(R) = r. Find the global dimension of R when the relation xy = yx
    holds precisely for x, y in the same set.
    Given f : A → B and g : B → A such that fg = 1, show that f is monic and g is
    epic. By applying the windmill lemma (Exercise 10 of Section 2.1) to the
    sequences obtained from (2.8.1) for f, g, deduce that A is isomorphic to a
    summand in a coproduct representation of B.
 4. Prove Yoneda's lemma for additive categories: If F : 𝒜 → Ab is given and for
    p ∈ X^F a natural transformation p* : h^X → F is defined by the rule that for
    α ∈ 𝒜(X, Y), α ↦ pα^F maps Y^{h^X} to Y^F, verify the naturality and show that
    the resulting map X^F → Nat(h^X, F) to the set of natural transformations is an
    isomorphism.
 5. If G : 𝒜 → Ab is a contravariant functor, show that X^G ≅ Nat(h_X, G), where
    h_X : Y ↦ 𝒜(X, Y) is defined by the rule
 6. Use Yoneda's lemma to show that the left adjoint of a functor, if it exists, is
    unique up to isomorphism.
 7. Show that a functor is exact iff it is right exact and preserves monics or left exact
    and preserves epics.
 8. Show that any left exact functor preserves pullbacks.
 9. Show that in any category 𝒜 the product of two 𝒜-objects X, Y is the object
    representing the functor A ↦ 𝒜(A, X) × 𝒜(A, Y) (if it exists). Similarly their
    coproduct is the object representing A ↦ 𝒜(X, A) × 𝒜(Y, A).
10. Let X be an object in an abelian category. Assuming that the equivalence classes
    of subobjects of X form a set, partially ordered by inclusion, show that this set is
    a modular lattice.
11. In an additive category consider the sequence
                                   P --λ--> A ⊓ B --μ--> Q.
      Show that if i, j, p, q are the natural injections and projections of the biproduct
      A ⊓ B, then the square formed by the maps λp, -λq, iμ, jμ commutes if
      λμ = 0, is a pullback if λ = ker μ and is a pushout if μ = coker λ.
12.   Given a pullback in an additive category (with notation as in Section 2.1) show
      that α' is monic iff α is (for abelian categories this follows from Proposition
      2.1.6, but not in general).
13.   Let P be a pullback of α : A → C, β : B → C in the category of rings. Show that
      if P is an integral domain, then one of α, β, say α, is injective, and P is isomorphic
      to a subring of B.
14.   Show that in an abelian category the intersection of two subobjects of a given
      object can be defined as a pullback and describe the dual concept.
15.   Given a 3 × 3 'matrix' of short exact sequences between modules U_ij
      (i, j = 1, 2, 3) forming a commutative diagram as in the 3 × 3 lemma, show
      that the kernel of the composite map U_22 → U_33 is im(U_12) + im(U_21).
      Deduce that for any two short exact sequences of modules U_i, V_i the kernel of
      the map U_2 ⊗ V_2 → U_3 ⊗ V_3 is im(U_1 ⊗ V_2) + im(U_2 ⊗ V_1).
16.   Let 𝒜, ℬ be abelian categories with direct sums and F, G two right exact
      functors from 𝒜 to ℬ, which preserve coproducts. Show that if there is a natural
23. Show that the triangular matrix ring     (~ ~) over a field k is hereditary. Is it a
    principal ideal ring?
24. Given a short exact sequence of modules
26. A ring is said to be right semihereditary if every finitely generated right ideal is
    projective. Show that in a right semihereditary ring the finitely generated right
    ideals form a sublattice of the lattice of all right ideals.
27. A ring R is said to be weakly semihereditary if, given finitely generated projective
    modules P, P_0, P_1 and maps α : P_0 → P, β : P → P_1 such that αβ = 0, there is a
    decomposition P = P' ⊕ P'' such that im α ⊆ P' ⊆ ker β. Show that R is weakly
    semihereditary iff for any matrices A ∈ ^rR^n, B ∈ ^nR^s such that AB = 0 there
    exists an idempotent n × n matrix E over R such that AE = A, EB = 0.
    Deduce that the condition is left-right symmetric. Show also that every right
    (or left) semihereditary ring is weakly semihereditary.
28. Show that over a right semihereditary ring R every finitely generated submodule
    of a projective module is projective and that every finitely generated projective
    module is isomorphic to a direct sum of a finite number of finitely generated
    right ideals.
29. Show that an injective R-module E is a cogenerator iff Hom_R(S, E) ≠ 0 for every
    simple R-module S. Verify that Q/Z is an injective cogenerator for Z (see also
    Section 4.6).
30. Show that for any commutative ring K, Ext^n_K(A, B) and Tor^K_n(A, B) have a
    natural K-module structure. Show that if, moreover, K is Noetherian and A, B
    are finitely generated, then Ext^n_K(A, B) and Tor^K_n(A, B) are finitely generated as
    K-modules.
                         Further group theory
Group theory has developed so much in recent years that a separate volume would
be needed even for an introduction to all the main areas of research. The most a
chapter can do is to give the reader a taste by a selection of topics; our choice was
made on the basis of general importance or interest, and relevance in later applica-
tions. Thus ideas from extension theory (Section 3.1) are used in the study of simple
algebras, while the notion of transfer (Section 3.3) has its counterpart in rings in the
form of determinants. Hall subgroups (Section 3.2) are basic in the deeper study of
finite groups, the ideas of universal algebra are exemplified by free groups
(Section 3.4) and linear groups (Section 3.5) lead to an important class of simple
groups, as do symplectic groups (Section 3.6) and orthogonal groups (Section 3.7).
We recall some standard notations from BA. If a group G is generated by a set X, we
write G = gp{X}, and we put gp{X | R} for a group with generating set X and set of
defining relations R. For subsets X, Y of G, XY denotes the set of all products xy,
where x ∈ X, y ∈ Y. We write N ◁ G to indicate that N is a normal subgroup in
G, i.e. mapped into itself by all inner automorphisms of G. If H, K are subgroups
of G, then HK is a subgroup precisely when HK = KH; in particular this holds
when H or K is normal in G. We also recall the modular law: given subgroups K,
L, M of G, if K ⊆ M, then K(L ∩ M) = KL ∩ M.
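Both of the facts just recalled are easy to confirm by brute force in a small group. The sketch below works in the symmetric group of degree 3, with permutations stored as tuples and multiplied from left to right; the helper names are ours and serve only for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of permutations of {0, ..., n-1}: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def generate(gens, n):
    """Subgroup of S_n generated by gens (closure under right multiplication)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return frozenset(group)

def product_set(H, K):
    """The set HK of all products hk."""
    return frozenset(compose(h, k) for h in H for k in K)

def is_subgroup(S):
    """A non-empty finite subset closed under products is a subgroup."""
    return all(compose(a, b) in S for a in S for b in S)

S3 = list(permutations(range(3)))
subgroups = {generate([a, b], 3) for a in S3 for b in S3}

H = generate([(1, 0, 2)], 3)   # generated by the transposition (0 1)
K = generate([(0, 2, 1)], 3)   # generated by the transposition (1 2)
N = generate([(1, 2, 0)], 3)   # the normal subgroup of order 3

print(is_subgroup(product_set(H, K)), product_set(H, K) == product_set(K, H))
print(is_subgroup(product_set(H, N)), product_set(H, N) == product_set(N, H))
```

Here HK has four elements and so cannot be a subgroup, while HN is the whole group because N is normal.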
                                1 → A --λ--> E --μ--> G → 1.                    (3.1.1)
Given any two groups A, G, their direct product G × A is such an extension, but in
general there will be many others. Two extensions of A by G, say E_1 and E_2, are said to
be isomorphic if there is an isomorphism f : E_1 → E_2 such that the diagram
This just means that E = G_1A_1. Next we observe that G_1 ∩ A_1 = 1, for the restriction
of μ to G_1 is injective and its kernel is G_1 ∩ A_1. It follows that the expression (3.1.2)
for x is unique: if x = ga = g'a', where g, g' ∈ G_1, a, a' ∈ A_1, then a'a^{-1} =
g'^{-1}g ∈ G_1 ∩ A_1 = 1, hence a' = a, g' = g. Let us identify G with G_1 and A with A_1,
so that we have E = G × A, as sets. By hypothesis A is normal in E; if G is also normal
in E, then E is just the direct product of G and A, as is easily checked, but in general G
need not be normal in E; each element of G then defines an inner automorphism of
E, which induces an automorphism of the normal subgroup A. If we write α_g for the
automorphism of A induced by g ∈ G, then we have the commutation rule
(3.1.4)
where θ_1 = 1. It is no longer true that θ has to be a homomorphism, i.e. θ_α θ_β will not
in general equal θ_{αβ}, but will differ from it by an inner automorphism; by (3.1.6) we
have
                                                                                    (3.1.9)
where for any x ∈ A, ι(x) : u ↦ x^{-1}ux is the inner automorphism defined by x. As is
easily verified, the set Inn A of all these inner automorphisms is a normal subgroup
of Aut A, the group of all automorphisms of A. The quotient Aut(A)/Inn(A) is called
the automorphism class group and (3.1.9) shows that we have a homomorphism
                                 θ̄ : G → Aut(A)/Inn(A).                            (3.1.10)
The set {m_{α,β}} is called a factor set of the extension, normalized because it satisfies
(3.1.7). In E we have g_α(g_β g_γ) = g_α g_{βγ} m_{β,γ} = g_{αβγ} m_{α,βγ} m_{β,γ} and (g_α g_β)g_γ =
g_{αβ} m_{α,β} g_γ = g_{αβ} g_γ (m_{α,β}θ_γ) = g_{αβγ} m_{αβ,γ}(m_{α,β}θ_γ), hence by the associative law we
obtain the factor set condition
                     m_{α,βγ} m_{β,γ} = m_{αβ,γ}(m_{α,β}θ_γ).                                (3.1.11)
Conversely, given any groups G and A and mappings θ : G → Aut A and
m : G^2 → A such that θ_1 = 1 and (3.1.7), (3.1.9) and (3.1.11) hold, we can define
a multiplication on the set G × A by putting
                            g'_α = g_α c_α,     where c_α ∈ A,   c_1 = 1,
and the new factor set {m'_{α,β}} obtained from the g'_α is related to the old by the equations
g_{αβ} c_{αβ} m'_{α,β} = g'_{αβ} m'_{α,β} = g'_α g'_β = g_α c_α g_β c_β = g_α g_β (c_α θ_β)c_β = g_{αβ} m_{α,β}(c_α θ_β)c_β.
Hence
                     m'_{α,β} = c_{αβ}^{-1} m_{α,β}(c_α θ_β)c_β;                             (3.1.13)
we shall express (3.1.13) by saying that {m_{α,β}} and {m'_{α,β}} are associated. Similarly,
we obtain from (3.1.8)
                                                                                              (3.1.14)
Conversely, if m'_{α,β} and θ'_α are defined by (3.1.13) and (3.1.14), they lead to an extension
isomorphic to the given one, as we see by retracing our steps. We sum up these
results in
Theorem 3.1.2 (Schreier's extension theorem). Given two groups G and A, a homomorphism
θ̄ : G → Aut(A)/Inn(A) and a map m : G^2 → A satisfying the factor set
condition (3.1.11), we can define a multiplication on the set G × A by (3.1.12) and
so obtain a group which is an extension of A by G. All extensions of A by G are obtained
in this way and two extensions are isomorphic if and only if their factor sets are
associated.                                                                           •
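The smallest non-split example may help to fix ideas. The sketch below takes G = A = C_2, both written additively as {0, 1}, with trivial action, and the factor set m with m(1, 1) = 1 and m = 0 otherwise. The multiplication rule (α, a)(β, b) = (α + β, m(α, β) + a + b) is our assumed additive form of (3.1.12) for abelian A with trivial action; it is not quoted from the text.

```python
from itertools import product

G = [0, 1]   # the quotient C_2, written additively
A = [0, 1]   # the kernel C_2, written additively
E = list(product(G, A))

def m(alpha, beta):
    """Hypothetical normalized factor set: the 'carry' when adding 1 + 1."""
    return 1 if alpha == 1 and beta == 1 else 0

def factor_set_condition(m):
    """Additive form of the factor set condition with trivial action:
    m(a, b + c) + m(b, c) = m(a + b, c) + m(a, b)."""
    return all(
        (m(a, (b + c) % 2) + m(b, c)) % 2 == (m((a + b) % 2, c) + m(a, b)) % 2
        for a, b, c in product(G, repeat=3)
    )

def multiply(x, y):
    (alpha, a), (beta, b) = x, y
    return ((alpha + beta) % 2, (m(alpha, beta) + a + b) % 2)

def order(x):
    """Order of x in the group defined by multiply; (0, 0) is the neutral element."""
    n, y = 1, x
    while y != (0, 0):
        y = multiply(y, x)
        n += 1
    return n

print(factor_set_condition(m), order((1, 0)))
```

The element (1, 0) has order 4, so the extension obtained is cyclic of order 4 rather than the direct product of two groups of order 2.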
  To prove this result in full, some verifications are needed, which are straight-
forward (though tedious) and will therefore be left to the reader. Instead we shall
examine a special case, important for the applications, in more detail.
  Let us assume that A is abelian; in that case Inn A is trivial and so the map
θ : G → Aut A is a homomorphism. Moreover, as (3.1.14) shows, the automorphism
θ_α depends only on α and not on the choice of g_α. So in this case we have an action
of G on A (by automorphisms) and in place of xθ_α we shall simply write xα. This
action is trivial precisely when A is contained in the centre of E; we call this a central
extension.
   In what follows we shall write A as an additive group, but G will still be multi-
plicative. The action of G then turns A into a G-module and the factor set condition
(3.1.11) now reads
                                                                                  (3.1.15)
(3.1.16)
Our aim is to obtain a homological description of all extensions with abelian kernel.
We recall that G-modules may equally well be regarded as ZG-modules, where ZG is
the group algebra of G over Z. Moreover, any left G-module A can be regarded as a
right G-module by using the canonical antiautomorphism of ZG. Explicitly we put
                          a.g = g^{-1}a   for all a ∈ A, g ∈ G.
Let A, B be any right G-modules and consider Hom(A, B), the group of all (abelian
group) homomorphisms from A to B. We can define a G-module structure on this
group as follows. If f ∈ Hom(A, B), we put
                           f^s : a ↦ (f(as^{-1}))s     for s ∈ G.                 (3.1.17)
f^s(as) = f(a)s,
functor of A^G is written H^n(G, A) and is called the n-th cohomology group of the
G-module A. By (3.1.19) we see that
where for the subscript on the right we have written G rather than ZG. We note that
for any coinduced module A we have H^n(G, A) = 0. For ZG is free as Z-module,
hence by the change-of-rings formula (2.7.13), we have
This holds for any coinduced module B^f = Hom_Z(ZG, B), where f : Z → ZG.
  Similarly the n-th homology group of the G-module A is defined as
                                   H_n(G, A) = Tor^G_n(Z, A).
If A is induced, say A = B_f = ZG ⊗_Z B, then Tor^G_n(Z, B_f) ≅ Tor^Z_n(Z, B) = 0 for all
n ≥ 1, hence we have H_n(G, A) = 0 for any induced G-module A.
   Let ε : ZG → Z be the augmentation map, defined as ε : Σ a_s s ↦ Σ a_s. Its kernel
IG is called the augmentation ideal of ZG, and from the split exact sequence
                               0 → IG → ZG → Z → 0                              (3.1.20)
Hence Z ⊗ A ≅ A/(IG)A and we see that H_n(G, A) is the left derived functor of
A/(IG)A. We note that A/(IG)A, sometimes written A_G, is the largest quotient of
A with trivial G-action, as is easily seen. Hence we find that
From the exact sequence (3.1.20) and the definitions of H_n, H^n we obtain by shifting
dimensions,
(3.1.22)
(3.1.23)
where the caret on s_i means that this term is to be omitted. The augmentation map
ε : X_0 → Z is defined by s_0ε = 1. With these definitions we have d^2 = 0, as we see by
counting how often the term (s_0, ..., ŝ_p, ..., ŝ_q, ..., s_n), where p < q, occurs in
(s_0, ..., s_n)d^2.
   To prove exactness, we define a homotopy h : X_n → X_{n+1} by the rule:
(s_0, ..., s_n)h = (1, s_0, ..., s_n) for n ≥ 0, 1.h = 1 for n = -1. Then it is easily verified
that hd + dh = 1. This resolution is called the standard (or bar) resolution of Z over
ZG, in homogeneous form. The elements of ker d, im d are called cycles and boundaries
respectively. Sometimes the inhomogeneous form is more convenient to use;
this is defined as
(3.1.25)
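The identities d^2 = 0 and hd + dh = 1 for the homogeneous resolution can be verified mechanically on basis elements. The sketch below assumes the usual alternating-sum formula for d, (s_0, ..., s_n)d = Σ (-1)^i (s_0, ..., ŝ_i, ..., s_n), which is consistent with the description of the caret above but is not quoted from the text. A chain is stored as a dictionary from tuples to integer coefficients, and the empty tuple stands for the generator 1 of Z in degree -1; these conventions and all names are ours.

```python
from itertools import product

def add_chains(x, y):
    out = dict(x)
    for key, c in y.items():
        out[key] = out.get(key, 0) + c
    return {key: c for key, c in out.items() if c != 0}

def d(chain):
    """Alternating sum of faces; on a 1-tuple this is the augmentation onto the empty tuple."""
    out = {}
    for simplex, coeff in chain.items():
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]
            out[face] = out.get(face, 0) + (-1) ** i * coeff
    return {key: c for key, c in out.items() if c != 0}

def h(chain, e=0):
    """The homotopy: prefix the neutral element e of the group."""
    return {(e,) + simplex: coeff for simplex, coeff in chain.items()}

GROUP = range(3)   # elements of a group of order 3; only the neutral element 0 is used

def basis(n):
    """Basis elements of X_n; for n = -1 this is the single generator of Z."""
    return [{s: 1} for s in product(GROUP, repeat=n + 1)]

print(all(d(d(x)) == {} for n in range(4) for x in basis(n)))
print(all(add_chains(d(h(x)), h(d(x))) == x for n in range(-1, 3) for x in basis(n)))
```

Since maps are written on the right, hd means h followed by d, which is `d(h(x))` in the code.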
As an illustration let us calculate H_1(G, Z). We take the resolution X of Z and tensor
with Z:
Since g is arbitrary in G, f is completely determined by its values when its last argument
is 1. Let us write
(φd)(t_1, ..., t_{n+1}) = φ(t_2, ..., t_{n+1}) + Σ_{i=1}^{n} (-1)^i φ(t_1, ..., t_i t_{i+1}, ..., t_{n+1})
                                      + (-1)^{n+1} φ(t_1, ..., t_n)t_{n+1}.
(3.1.27)
It is a coboundary iff
                                φ(g) = c - cg for some fixed c ∈ A.                               (3.1.29)
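For a small module the 1-cocycles and 1-coboundaries can simply be listed. The sketch below takes G = C_2 and A = Z/4 as a right G-module and assumes that the cocycle condition (3.1.28) has the form φ(gh) = φ(g)h + φ(h); this form is our reading, consistent with the coboundary formula (3.1.29), and the function names are hypothetical.

```python
from itertools import product

K_MOD = 4      # A = Z/4, written additively
G = [0, 1]     # C_2 written additively: 0 is the neutral element, 1 the generator s

def negate(a, g):
    """Right action a.g in which s acts by a -> -a."""
    return a % K_MOD if g == 0 else (-a) % K_MOD

def trivial(a, g):
    """Trivial action."""
    return a % K_MOD

def derivations(act):
    """All phi: G -> A with phi(gh) = phi(g)h + phi(h); phi is a tuple indexed by g."""
    return [phi for phi in product(range(K_MOD), repeat=len(G))
            if all(phi[(g + h) % 2] == (act(phi[g], h) + phi[h]) % K_MOD
                   for g in G for h in G)]

def inner_derivations(act):
    """All phi of the form phi(g) = c - cg, as in (3.1.29)."""
    return {tuple((c - act(c, g)) % K_MOD for g in G) for c in range(K_MOD)}

def h1_order(act):
    """Order of the quotient of the derivations by the inner derivations."""
    return len(derivations(act)) // len(inner_derivations(act))

print(h1_order(negate), h1_order(trivial))
```

With the trivial action the inner derivations vanish and the derivations are exactly the two homomorphisms from C_2 to Z/4.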
A function φ satisfying (3.1.28) is sometimes called a derivation (it is an (ε, 1)-
derivation in the sense of BA, Section 6.2) or a crossed homomorphism. If (3.1.29)
holds, it is said to be inner or a principal crossed homomorphism. Writing
Der(G, A) for the group of derivations and IDer(G, A) for the subgroup of inner
derivations, we have
If G acts trivially on A, the derivations are just the homomorphisms and the inner
derivations are 0, so in this case H^1(G, A) ≅ Hom(G, A). Since A is abelian, the
homomorphisms G → A correspond to the homomorphisms G^ab → A, and so we
obtain
Proposition 3.1.4. For any group G and any module A with trivial G-action ag = a for
all a ∈ A, g ∈ G, we have
Theorem 3.1.5. Let G be any group and A a G-module. Then there is a natural bijection
between the group H^2(G, A) and the set of isomorphism classes of extensions of A by
G with the given G-action on A.                                                       •
Some simple calculations are suggested in the exercises; we add a general result which
is often useful.
Proposition 3.1.6. Let G be a finite group of order r and A any G-module. Then for any
n > 0, each element of H^n(G, A) has order dividing r.
Proof. We define a 'homotopy mod r' by the equation
Hence we have
                                      dh + hd = r.1.
For finite A the order of A annihilates H^n(G, A), so Proposition 3.1.6 yields
Corollary 3.1.7. If G, A are both finite, of coprime orders, then H^n(G, A) = 0 for any
n > 0.                                                                              •
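Both statements can be illustrated for cyclic G. The sketch below assumes the standard description of the cohomology of a cyclic group C_m with generator s, namely H^{2n}(G, A) = A^G/AN and H^{2n-1}(G, A) = _NA/AD with N = 1 + s + ... + s^{m-1} and D = s - 1 (this is the content of Exercise 10 below), and takes A = Z/k with s acting as multiplication by a unit u. The function and parameter names are ours.

```python
def cyclic_cohomology_orders(m, k, u):
    """Orders of (H^even, H^odd) in positive degrees for G = C_m acting on A = Z/k,
    the generator s acting as multiplication by u, where u^m = 1 (mod k)."""
    assert pow(u, m, k) == 1 % k
    A = range(k)
    norm = sum(pow(u, i, k) for i in range(m)) % k        # N = 1 + s + ... + s^(m-1)
    diff = (u - 1) % k                                     # D = s - 1
    fixed = [a for a in A if (a * diff) % k == 0]         # A^G
    norm_kernel = [a for a in A if (a * norm) % k == 0]   # _N A
    image_norm = {(a * norm) % k for a in A}              # AN
    image_diff = {(a * diff) % k for a in A}              # AD
    return len(fixed) // len(image_norm), len(norm_kernel) // len(image_diff)

print(cyclic_cohomology_orders(3, 7, 2))   # |G| = 3 and |A| = 7 are coprime
print(cyclic_cohomology_orders(2, 4, 3))   # C_2 acting on Z/4 by negation
print(cyclic_cohomology_orders(4, 8, 1))   # C_4 acting trivially on Z/8
```

In each case the orders returned divide m, and they are both 1 when m and k are coprime.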
Proposition 3.1.8. Let G be any finite group and K = Q/Z the group of rational
numbers mod 1, as trivial G-module. Then
Proof. The exact sequence 0 → Z → Q → K → 0 leads to the derived sequence
Since Q is uniquely divisible, the extreme terms are 0 and so the other two are isomorphic:
H^2(G, Z) ≅ H^1(G, K). Since the G-action on K is trivial, H^1(G, K) ≅
Hom(G, K) and the result follows.                                                 •
Nevertheless a similar result can be obtained on going via a short exact sequence of
L-modules. It is given by
Theorem 3.1.9. Given an exact sequence (3.1.31) describing a group extension G, there
is a 5-term exact sequence for any L-module A:
  0 → H^1(L, A) --μ*--> H^1(G, A) --λ*--> Hom_L(N^ab, A) --t--> H^2(L, A) --μ*--> H^2(G, A).
                                                                                   (3.1.32)
The map μ*, arising from μ, is called the inflation map and λ*, arising from the
inclusion λ, is called the restriction map. The connecting map t is called the transgression.
Proof. By tensoring (3.1.20) with ZL over G we obtain the exact sequence
Here the last two terms just represent the augmentation of ZL, so the kernel in the
third term is IL. For the first term we have, by Theorem 2.7.9 and Proposition 3.1.3,
Tor^G_1(ZL, Z) ≅ Tor^N_1(Z, Z) ≅ N^ab. Hence we obtain the exact sequence of
L-modules
From the associativity of the tensor product we know that for any L-module A,
                          Hom_L(ZL ⊗_G IG, A) ≅ Hom_G(IG, A).
Let us write the two sides of this formula as P(A) ≅ Q(A) and write P^n, Q^n for the
left derived functors. We take a short resolution of A with I injective:
                                  0 → A → I → C → 0,
It follows that there is an injection from P(A) to Q(A), i.e. from Ext(ZL ⊗ IG, A) to
Ext(IG, A). If we now apply Hom(-, A) to (3.1.33), we find
This reduces to (3.1.32) if we use (3.1.23) on the first two terms, replace the last term
by Ext^1_G(IG, A) and again use (3.1.23).                                               •
   Occasionally the cohomology groups are needed for a more general coefficient
ring. If K is any commutative ring and KG is the group algebra of G over K, then
the standard resolution X → Z → 0 on tensoring with K over Z becomes
                                  X ⊗_Z K → K → 0,
and this is still a projective resolution, because the original resolution was Z-split, by
the homotopy found earlier. Thus we obtain, for any KG-module A,
                                                                                 (3.1.34)
Similarly we have
(3.1.35)
We recall that the terms in (3.1.34) vanish for n ≥ 1 if A is coinduced, and those in
(3.1.35) vanish if A is induced. But for G-modules over a finite group G, induced and
coinduced mean the same thing, for if f : K → KG is the inclusion map, then there is
an isomorphism of KG-modules:
                              Hom_K(KG, U) ≅ U ⊗ KG,
Exercises
 1. Supply the details for the proof of Theorem 3.1.2.
 2. Examine the case of unnormalized factor sets.
 3. (O. Hölder) Let E be an extension of A by B, where A, B are cyclic groups of
    orders m, n respectively (such a group E is called metacyclic). Show that E has
    a presentation gp{a, b | a^m = 1, b^n = a^r, b^{-1}ab = a^s}, where s^n ≡ 1 (mod m)
    and r(s - 1) ≡ 0 (mod m). Conversely, given m, n, r, s satisfying these relations,
    show that there is a metacyclic group with this presentation.
 4. Show that an extension of A by G with factor set {m_{α,β}} splits iff there exist
    c_α ∈ A such that m_{α,β} = c_{αβ}^{-1}(c_α θ_β)c_β. Let E be an extension of an abelian
    group A by G. By adjoining free abelian generators c_α to E and using the
    above relations to define c_α θ_β show that E can be embedded in a group E*
    which is a semidirect product of A* and G, where A* ⊇ A.
 5. Show that the group of isometries of Euclidean n-space is a split extension of the
    normal subgroup of translations by the orthogonal group.
 6. Show that H^1(G, A) is in natural bijection with the set of isomorphism classes of
    A-torsors, i.e. left A-sets with regular A-action such that (a.x)^α = a^α.x^α
    (α ∈ G, a ∈ A).
 7. Show that the augmentation ideal, defined as in (3.1.20), is Z-free on the s - 1
    (1 ≠ s ∈ G). Verify that Hom_G(IG, A) is isomorphic to the group of 1-cocycles.
 8. Using the mapping s ↦ s - 1 (mod (IG)^2) of G into IG/(IG)^2 show that
    G^ab ≅ IG/(IG)^2. Deduce another proof of Proposition 3.1.3.
 9. Let G be a finite group of order r. Show that every cocycle in H^n(G, C^×) is cohomologous
    to a cocycle whose values are r-th roots of 1. Deduce that the multiplicator
    of G has order at most r^{r^2}.
10. Let G = C_m, the cyclic group of order m, with generator s and write D = s - 1,
    N = 1 + s + ... + s^{m-1} in ZG. Verify that there is a free resolution
    W → Z → 0, where W_n = ZG and d_{2n} : W_{2n} → W_{2n-1} is d_{2n} = N, while
    d_{2n-1} = D. With the notation _NA = {a ∈ A | aN = 0} for any G-module A
    show that H^{2n}(G, A) = A^G/AN, H^{2n-1}(G, A) = _NA/AD and H_n(G, A) =
    H^{n-1}(G, A) for all n > 1.
For any finite group G, $\pi(G)$ denotes the set of primes dividing $|G|$. If $\pi(G) \subseteq \pi$, the
group G is called a $\pi$-group. A Hall subgroup is a subgroup of G whose order is prime
to its index. A $\pi$-subgroup of G whose index is prime to $\pi$ (i.e. not divisible by any
prime in $\pi$) is called a Hall $\pi$-subgroup; e.g. when $\pi = \{p\}$, a Hall p-subgroup is just a
Sylow p-subgroup. A Hall $p'$-subgroup is also called a p-complement. To establish the
existence of Hall subgroups in soluble groups we shall need some preliminary results.
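The arithmetic behind these definitions can be made concrete with a small sketch. The function names `prime_factorization`, `pi_part` and `is_hall_order` are illustrative choices, not notation from the text; the code only computes the order a Hall $\pi$-subgroup would have to have, and says nothing about whether such a subgroup exists.

```python
from math import gcd

def prime_factorization(n):
    """Return {prime: exponent} for a positive integer n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def pi_part(n, pi):
    """Largest divisor of n whose prime factors all lie in the set pi."""
    part = 1
    for p, e in prime_factorization(n).items():
        if p in pi:
            part *= p ** e
    return part

def is_hall_order(group_order, d):
    """True if a subgroup of order d would have order prime to its index."""
    return group_order % d == 0 and gcd(d, group_order // d) == 1
```

For a group of order 60, for instance, `pi_part(60, {2, 3})` gives the order that a Hall {2, 3}-subgroup would need, and `is_hall_order(60, 6)` is false because 6 is not prime to the index 10.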
(3.2.2)
   We note that in the special case where H or K is normal in G, the lemma follows
from the second isomorphism theorem (Theorem 1.2.6).
Lemma 3.2.2. If H, K are subgroups of a finite group G whose indices in G are coprime,
then $HK = G$ and $(G : H \cap K) = (G : H)(G : K)$.
Proof. Put $(G : H) = m$, $(G : K) = n$; by Lemma 3.2.1 we have
$$(G : H \cap K) = (G : K)(K : H \cap K) = (G : K)(HK : H).$$
Clearly $(HK : H)$ is a factor of $m = (G : H)$, hence
$$(G : H \cap K) = nm_1, \quad \text{where } m_1 \mid m.$$
Similarly,
$$(G : H \cap K) = mn_1, \quad \text{where } n_1 \mid n.$$
Hence $m_1 n = n_1 m$, so $n/n_1 = m/m_1 = 1$, because m and n are coprime, and it
follows that
$$(G : H \cap K) = mn.$$
Moreover, $(G : H) = m = m_1 = (HK : H)$, hence $HK = G$, by Lemma 3.2.1.               •
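As a sanity check, Lemma 3.2.2 can be verified by brute force in a small case. The sketch below takes G to be the symmetric group on four points, H a point stabilizer (index 4) and K a Sylow 2-subgroup (index 3); the choice of example and the helper names `compose` and `generate` are assumptions made for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of permutations given as tuples of images: apply p, then q."""
    return tuple(q[i] for i in p)

def generate(gens):
    """Subgroup generated by gens; in a finite group it suffices to close
    the identity under right multiplication by the generators."""
    identity = tuple(range(len(gens[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

G = set(permutations(range(4)))               # symmetric group on 4 points
H = generate([(1, 0, 2, 3), (1, 2, 0, 3)])    # stabilizer of the point 3
K = generate([(1, 2, 3, 0), (0, 3, 2, 1)])    # dihedral Sylow 2-subgroup

def index(subgroup):
    """Index in G of a subgroup given as a set of permutations."""
    return len(G) // len(subgroup)

HK = {compose(h, k) for h in H for k in K}
```

Here the indices 4 and 3 are coprime, and one finds that `HK` is all of G while `index(H & K)` equals the product of `index(H)` and `index(K)`.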
  We shall need the result that a normal Hall subgroup always has a complement.
When the normal subgroup is abelian, this follows from Corollary 3.1.7; the general
case can be deduced from this by an induction on the order:
[Figure: subgroup lattice diagram; legible labels: H, N, H ∩ N, K, P, L, Z, M, 1]
Lemma 3.2.4. Let G be a finite soluble group. Then any minimal normal subgroup is
elementary abelian.
Proof. Any minimal normal subgroup H of G is abelian, because its derived group is
a proper subgroup and so must be the trivial group. Now H can have no proper
non-trivial characteristic subgroups and so must be a p-group for some prime p, and
moreover all elements satisfy $x^p = 1$, i.e. it is elementary abelian, as claimed.           •
Theorem 3.2.5 (P. Hall [1928]). Let G be a finite soluble group and $\pi$ a finite set
of primes. Then any $\pi$-subgroup of G is contained in a Hall $\pi$-subgroup, hence Hall
$\pi$-subgroups exist for any $\pi$, and any two Hall $\pi$-subgroups are conjugate in G.
Proof. We shall use induction on $|G|$. Let G be a finite soluble group and M a
minimal normal subgroup of G. By Lemma 3.2.4, M is an elementary abelian p-group, for
some prime p. Write $\bar{G} = G/M$; by the induction hypothesis $\bar{G}$ contains a Hall
$\pi$-subgroup $\bar{H}$, where $H \supseteq M$. Moreover, if A is a $\pi$-subgroup of G, its image $\bar{A}$ in
$\bar{G}$ is a $\pi$-subgroup and so is contained in some conjugate of $\bar{H}$, say $\bar{A} \subseteq \bar{H}^{\bar{x}}$; it
follows that $A \subseteq H^x$. If $p \in \pi$, then $(G : H)_\pi = 1$ and H is a $\pi$-subgroup, hence a
Hall $\pi$-subgroup of G, so is $H^x$ and A has been embedded in a Hall $\pi$-subgroup.
Since 1 is always a $\pi$-subgroup, this shows that Hall $\pi$-subgroups exist. Moreover,
all Hall $\pi$-subgroups are conjugate, for if A is a Hall $\pi$-subgroup and $A \subseteq H^x$,
then $A = H^x$ because $|A| = |H|$.
   There remains the case $p \notin \pi$. By Theorem 3.2.3 there is a complement K of M in H
and all complements are conjugate. Since $|K| = |\bar{H}| = |G|_\pi$, K is a Hall $\pi$-subgroup
of G, and all the Hall $\pi$-subgroups are conjugate. If A is any $\pi$-subgroup of G, we
have as before $A \subseteq H^x$ for some $x \in G$. Either $H \subset G$, then by induction, A is
contained in a Hall $\pi$-subgroup of H, which is also a Hall $\pi$-subgroup of G; or $H = G$.
Then $A \subseteq G = KM$ and hence
$$AM = AM \cap KM = (AM \cap K)M.$$
 For the converse we shall use an interesting solubility criterion due to Helmut
Wielandt:
Theorem 3.2.6. Let G be a finite group. If G has three soluble subgroups whose indices
are pairwise coprime, then G is soluble.
Proof. Let the subgroups be $H_1, H_2, H_3$; if $H_1 = 1$, then $|G| = (G : H_1)$ is prime to
$(G : H_2)$, so the latter is 1 and $H_2 = G$ is soluble. Hence we may assume that $H_1 \neq 1$.
Let M be a minimal normal subgroup of $H_1$; since $H_1$ is soluble, M is an elementary
abelian p-group, for some prime p. Now p cannot divide the indices of both $H_2$ and
$H_3$, say it is prime to $(G : H_2)$. Then $p \mid |H_2|$, hence $H_2$ contains a non-trivial Sylow
p-subgroup P, which is also a Sylow p-subgroup of G. Let $P_1$ be a Sylow p-subgroup
of $H_1$; then $P_1 \subseteq P^x$ for some $x \in G$. We may replace $H_2$ by $H_2^x$ without affecting the
hypothesis; then $P_1 \subseteq P$ and $M \subseteq P$, because $M \lhd H_1$. Thus we have $M \subseteq H_1 \cap H_2$.
Now by Lemma 3.2.2, $G = H_1H_2$; hence any $x \in G$ has the form $x = x_1x_2$, $x_i \in H_i$;
therefore $M^x = M^{x_2} \subseteq H_2$. Hence $H_2$ contains the normal closure of M in G:
$K = M^G = \mathrm{gp}\{M^y \mid y \in G\} \subseteq H_2$. Since $H_2$ is soluble, so is K. Further, the subgroups
$KH_i/K$ of $G/K$ satisfy the hypothesis, so by induction on $|G|$, $G/K$ is soluble, hence
so is G.             •
   We can now complete the proof of Hall's criterion; we recall that a p-complement
is a subgroup of p-power index and order prime to p; in particular, if p does not
divide the order of G, then G itself is the only p-complement.
Theorem 3.2.7 (P. Hall [1937]). Any finite group is soluble if and only if it contains a
p-complement for each prime p.
Proof. For soluble groups the result follows by Theorem 3.2.5. Now assume that
$|G| = p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, where the $p_i$ are distinct primes, and that G has a $p_i$-complement
$H_i$ for $i = 1, \ldots, r$. If r = 1 or 2, G is soluble by Burnside's $p^\alpha q^\beta$-theorem (Theorem
6.8.3 below), hence we may assume $r \geq 3$. We claim that each $H_i$ is soluble. Clearly
$(G : H_i) = p_i^{\alpha_i}$, hence by Lemma 3.2.2, $H_1 \cap H_i$ is a Hall $\pi_i$-subgroup of G, where
$\pi_i = \pi(G) \setminus \{p_1, p_i\}$, and it follows that $H_1 \cap H_i$ is a $p_i$-complement of $H_1$. By
induction $H_1$ is soluble, similarly for $H_2, \ldots, H_r$; now we apply Theorem 3.2.6 to complete
the proof.                                                                             •
Exercises
1. By a Hall system in a finite group G is meant a family $\Sigma$ of Hall subgroups of G,
   one of order d for each divisor d of $|G|$ prime to $|G|/d$, such that each $H \in \Sigma$ is
   the intersection of all subgroups in $\Sigma$ whose orders are divisible by $|H|$. Verify
   that every family of p-complements, where p runs over the prime divisors of
   $|G|$, gives rise to a Hall system, and that for any $H, K \in \Sigma$, $HK = KH \in \Sigma$.
3.3 The transfer                                                                     107
$$\mu_{ij}(a) = \begin{cases} s_i a s_{i\sigma_a}^{-1} & \text{if } j = i\sigma_a, \\ 0 & \text{otherwise.} \end{cases}$$
Thus $\mu(a)$ has a single non-zero entry in each row and one in each column; it is
called a monomial matrix over H. To show that we have indeed a representation,
take $a, b \in G$; we have $\sigma_{ab} = \sigma_a\sigma_b$ because $\sigma$ is a permutation representation of G
$$a^V = \prod_{i=1}^{n} (s_i a s_{i\sigma_a}^{-1})^f \tag{3.3.2}$$
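$$\prod_{i=1}^{n} (h_i s_i a s_{i\sigma_a}^{-1} h_{i\sigma_a}^{-1})^f = \prod_{i=1}^{n} (s_i a s_{i\sigma_a}^{-1})^f \Big(\prod_{i=1}^{n} h_i^f\Big)\Big(\prod_{i=1}^{n} h_i^f\Big)^{-1} = a^V,$$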
$$a^V = \prod_{i=1}^{r} (t_i a^{n_i} t_i^{-1})^f, \quad \text{where } t_i a^{n_i} t_i^{-1} \in H. \tag{3.3.3}$$
We shall apply the transfer to prove the existence of normal p-complements under
suitable conditions. Here we need
Lemma 3.3.1. Let G be a finite group, P a Sylow p-subgroup of G and X, Y two subsets
of G normalized by P and conjugate in G. Then X and Y are conjugate in $N_G(P)$.
Proof. By hypothesis $Y = X^b$ for some $b \in G$, and X, Y are normalized by P, hence
$Y = X^b$ is normalized by $P^b$. Thus $N = N_G(Y)$ contains P and $P^b$; clearly they are
Sylow subgroups of N and by Sylow's theorem they are conjugate in N, say
$P^{bc} = P$ for $c \in N$. Writing $a = bc$, we have $a \in N_G(P)$, hence $X^a = X^{bc} =
Y^c = Y$, because $c \in N$.                                                       •
3.4 Free groups                                                                       109
Theorem 3.3.2 (Burnside). Let G be a finite group and suppose that the Sylow
p-subgroup P of G is contained in the centre of its normalizer. Then P has a normal
complement in G.
Proof. By hypothesis P is abelian, so we can take the transfer $V : G \to P$. For any
$u \in P$ and $n_i$, $t_i$ as in (3.3.3) (for H = P), $u^{n_i}$ and $t_i u^{n_i} t_i^{-1}$ lie in P, are normalized
by P and are conjugate in G. By the above lemma they are conjugate in $N_G(P)$,
hence equal (because P lies in the centre of $N_G(P)$), so we obtain
Corollary 3.3.3. Let G be a finite non-abelian group with a non-trivial cyclic Sylow
2-subgroup. Then G has a normal 2-complement and so cannot be simple.
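Proof. Let P, of order $2^a$, be a Sylow 2-subgroup of G. Since it is cyclic, its
automorphism group has order $\varphi(2^a) = 2^{a-1}$. Now $N_G(P)/C_G(P)$ is isomorphic to a
subgroup of Aut(P), therefore $(N_G(P) : C_G(P))$ is a factor of $2^{a-1}$; but it also divides
$(G : P)$, which is odd, because $P \subseteq C_G(P)$. Hence $N_G(P) = C_G(P)$, i.e. P lies in the
centre of its normalizer, so the hypothesis of Theorem 3.3.2 is satisfied and we obtain
a normal complement of P.                 •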
Exercises
1. In the definition (3.3.2) the sign of the determinant was ignored. Show that taking
     the sign into account only amounts to taking the sign of the permutation
     representation of G on $G/H$.
2.   Show that the corestriction mapping $H_1(G, \mathbf{Z}) \to H_1(H, \mathbf{Z})$ (induced by the
     restriction from G to H) is just the transfer (see also Section 5.6).
3.   Show that if G has an abelian Sylow p-subgroup P with normalizer N, then
     $P \cap G' = P \cap N'$ and $P = (P \cap N') \times (P \cap Z(N))$. Show also that the maximal
     p-factor group of G (i.e. the maximal quotient group which is a p-group) is
     $\cong P \cap Z(N)$.
4.   Show that if a Sylow p-subgroup P of G has trivial intersection with any of its
     distinct conjugates, then any two elements of P which are conjugate in G are
     conjugate in $N_G(P)$.
5.   Let P, Q be two distinct Sylow p-subgroups of G, chosen so that (for fixed p)
     $P \cap Q$ has maximal order. Show that the only conjugates of $P \cap Q$ in G that
     are contained in P are conjugate to $P \cap Q$ in $N_G(P)$.
groups allows the elements of a free group to be written in an easily recognized form,
that we shall now describe.
  Let X be any non-empty set. By a group word in X we understand an expression
(3.4.1)
Theorem 3.4.1. Let X be a set and F the free group on X. Then every element of F is
represented by exactly one reduced group word in X and two group words represent the
same element of F if and only if they are equivalent.
Proof. If $X = \emptyset$, F is the trivial group consisting of 1 alone, so we may assume that
X is not empty. It is clear from the definitions that every element of F is represented
by a group word in X, and that equivalent group words represent the same element.
The multiplication of group words by juxtaposition is associative, and it is easily
checked that the equivalence class of a product depends only on those of the factors,
not on the factors themselves. Further, the empty word 1 acts as neutral under
multiplication. Hence the set of equivalence classes forms a monoid under multiplication,
and this is in fact a group, since $u_1u_2 \cdots u_n$ has the inverse $u_n^{-1} \cdots u_1^{-1}$, where
$u_i^{-1} = x^{-1}$ if $u_i = x$ and $u_i^{-1} = x$ if $u_i = x^{-1}$: the product
$u_1 \cdots u_n u_n^{-1} \cdots u_1^{-1}$ can be brought to 1
by elementary reductions. It only remains to show that each group word is equivalent
to exactly one reduced group word. Given a group word (3.4.1), we apply
elementary reductions as often as possible; each such reduction reduces the length,
so we arrive at a reduced form after a finite number of steps. In order to show
that this form is independent of the order in which the reductions are made, we
shall use the diamond lemma (Lemma 1.4.1). We have to show that if a word f is
reduced to $g_1$, $g_2$ by different elementary reductions, then there is a word h which
can be obtained by reduction from $g_1$ as well as $g_2$. There are two cases: (i) The
terms being reduced do not overlap, say $u_iu_{i+1} = xx^{-1}$, $u_ju_{j+1} = yy^{-1}$, where
$j > i + 1$ and $x, y \in X \cup X^{-1}$, and of course $(x^{-1})^{-1} = x$. Clearly these reductions
can be performed in either order and the outcome will be the same. (ii) The terms
overlap; then a subword $uu^{-1}u$ or $u^{-1}uu^{-1}$ occurs. Taking $uu^{-1}u$, we can either
reduce $uu^{-1}$ to 1 or $u^{-1}u$ to 1, and we are left with u in each case, so again the
outcome is the same, and similar reasoning applies to $u^{-1}uu^{-1}$. Now it follows
by Lemma 1.4.1 that we have a unique reduced form.                                   •
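Since the reduced form does not depend on the order of the reductions, it can be computed by a single left-to-right pass with a stack. The following is a minimal sketch; the encoding of letters as pairs (generator, ±1) and the helper `parse`, which reads an upper-case letter as the inverse of the corresponding lower-case generator, are conventions chosen here for illustration.

```python
def parse(s):
    """Turn a string such as 'xYx' into a group word; an upper-case letter
    stands for the inverse of the corresponding lower-case generator."""
    return [(c.lower(), 1 if c.islower() else -1) for c in s]

def reduce_word(word):
    """Apply elementary reductions until none is possible."""
    stack = []
    for gen, eps in word:
        if stack and stack[-1] == (gen, -eps):
            stack.pop()                 # cancel a factor x x^-1 or x^-1 x
        else:
            stack.append((gen, eps))
    return stack

def inverse(word):
    """Inverse word: reverse the letters and invert each of them."""
    return [(gen, -eps) for gen, eps in reversed(word)]

def same_element(u, v):
    """Word problem in a free group: compare reduced forms."""
    return reduce_word(u) == reduce_word(v)
```

For example, `reduce_word(parse("xXx"))` leaves the single letter x, whichever of the two overlapping cancellations one has in mind, and a word followed by its inverse reduces to the empty word.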
  Here is another proof, not using the diamond lemma. The essential step is to show
that distinct reduced words represent distinct group elements. We define F as a
permutation group on the set W of all reduced group words as follows. Given
$w = u_1 \cdots u_n \in W$ and $x \in X$, we put
$$w\alpha_x = \begin{cases} u_1 \cdots u_n x & \text{if } u_n \neq x^{-1} \text{ or } n = 0, \\ u_1 \cdots u_{n-1} & \text{if } u_n = x^{-1}, \end{cases}$$
$$w\beta_x = \begin{cases} u_1 \cdots u_n x^{-1} & \text{if } u_n \neq x \text{ or } n = 0, \\ u_1 \cdots u_{n-1} & \text{if } u_n = x. \end{cases}$$
It is easily checked that $\alpha_x$ is a permutation of W with inverse $\beta_x$. Thus F has been
defined as a permutation group on W. Now, given any reduced word $u_1 \cdots u_n$, if we
apply $\alpha_{u_1} \cdots \alpha_{u_n}$ (where $\alpha_{x^{-1}} = \beta_x$) to the empty word, we find $u_1 \cdots u_n$, hence
distinct words define distinct permutations of W and so represent different elements of
F.                                                                                  •
   A free generating set in a free group will also be called a basis. Let F be the free
group with basis $X = \{x_1, \ldots, x_d\}$ and denote by $F^2$ the subgroup generated by all
squares. Then $F/F^2$ is the elementary abelian 2-group generated by the images of
$x_1, \ldots, x_d$, and hence is of order $2^d$. In particular this shows that the number d of
elements in a basis is independent of the choice of basis. It is called the rank of F,
written rk F, and the above argument shows that free groups of different ranks
cannot be isomorphic (see also Further Exercise 1 of Chapter 1).
   We remark that by the results of Section 1.3 every group G can be written as a
homomorphic image of a free group; more precisely, if G can be generated by d
elements, then it can be expressed as a quotient of a free group of rank d.
   We note that Theorem 3.4.1 solves the word problem for free groups, for it pro-
vides a means of deciding when two group words (in a given free generating set)
represent the same group element. Let us now look at the conjugacy problem and
show that in free groups this can be solved in a similar manner.
   Two group words f, g are said to be cyclic conjugates if $f = uv$, $g = vu$ for suitable
words u, v. Thus $xy^{-1}z^{-1}xy$ and $xyxy^{-1}z^{-1}$ are cyclic conjugates. We remark that
even when a word is in reduced form, it may have a cyclic conjugate which is not
reduced, e.g. $x^{-1}yx$. A group word is said to be cyclically reduced if all its cyclic
conjugates are reduced. Now we have
Proposition 3.4.2. Let F be a free group on a set X. Then every element of F has a
cyclically reduced conjugate and two elements of F are conjugate if and only if their
cyclically reduced forms are cyclic conjugates.
  It is clear that one can check in a finite number of steps whether two reduced
words are cyclic conjugates; for words of length n we need only compare n different
words.
Proof. Let $f, g \in F$; if their reduced forms are cyclic conjugates, then $f = uv$, $g = vu$
and hence $g = u^{-1}fu$. Conversely, suppose that f, g are conjugate. By passing to
appropriate conjugates we may suppose that both f and g are in cyclically reduced
form. Let
$$g = u^{-1}fu, \tag{3.4.2}$$
where u is in reduced form. Since g is cyclically reduced, the right-hand side cannot
be in reduced form, say u and f begin with the same letter x. Then f cannot end in
$x^{-1}$ (because f is cyclically reduced), so no cancellation can take place between f and
u, and the only way to reduce the right-hand side of (3.4.2) is by cancelling an initial
portion of u against that of f. So we have either $u = u_1u_2$, $f = u_1f_1$, hence
$u^{-1}fu = u_2^{-1}f_1u_1u_2$; this is reduced and so must equal g; but the latter is cyclically
reduced, so $u_2 = 1$, $u_1 = u$ and $f = uf_1$, $g = f_1u$, showing f, g to be cyclic conjugates.
Or we have $u = fu_2$; then $u^{-1}fu = u_2^{-1}fu_2$ and the argument shows as before that
$u_2 = 1$; now it follows again that f, g are cyclically conjugate.                      •
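Proposition 3.4.2 translates directly into a decision procedure for conjugacy in a free group: cyclically reduce both words and compare all cyclic rotations. The sketch below assumes the same illustrative encoding as before (letters as pairs, upper case for inverses in `parse`); none of these names come from the text.

```python
def parse(s):
    """'xYx' -> group word; upper case denotes the inverse generator."""
    return [(c.lower(), 1 if c.islower() else -1) for c in s]

def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    stack = []
    for gen, eps in word:
        if stack and stack[-1] == (gen, -eps):
            stack.pop()
        else:
            stack.append((gen, eps))
    return stack

def cyclically_reduce(word):
    """Reduce, then strip matching first and last letters x ... x^-1."""
    w = reduce_word(word)
    while len(w) >= 2 and w[0] == (w[-1][0], -w[-1][1]):
        w = w[1:-1]
    return w

def are_conjugate(u, v):
    """Conjugacy test: the cyclically reduced forms must be cyclic conjugates."""
    a, b = cyclically_reduce(u), cyclically_reduce(v)
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(a[i:] + a[:i] == b for i in range(len(a)))
```

For words of length n the final comparison inspects at most n rotations, in line with the remark preceding the proof.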
   We conclude this section by proving the Schreier subgroup theorem, first proved
by Otto Schreier in 1927; this is an important result in its own right, while the proof
illustrates some of the methods used in the study of free groups.
   Given any group G with generating set X and a G-set M, we define its diagram
(G, X, M) as the graph whose vertices are the points of M, with an edge from p to q
whenever q = px for some x E X. More precisely, the diagram may be regarded as
a directed graph (digraph), in which each x E X is represented by a directed edge
and $x^{-1}$ by its opposite.
   For example, the symmetric group $\mathrm{Sym}_3$ with generating set {(12), (23)} acting on
the set {1, 2, 3} has the diagram shown below; the second diagram represents $\mathrm{Sym}_3$
acting on the set of all arrangements of 1, 2, 3. In both cases continuous lines
represent (12) and broken lines (23).
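[Figure: the diagram of $\mathrm{Sym}_3$ acting on {1, 2, 3} and the diagram of $\mathrm{Sym}_3$ acting on the six arrangements of 1, 2, 3 (vertices 123, 231, 312, ...)]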
   We note that the second diagram may be regarded as the diagram of the regular
representation (G, X, G); such a diagram contains all the information contained in
the multiplication table, but in a more accessible form. For example, we can from
the diagram read off the relation $[(12)(23)]^3 = 1$, corresponding to a circuit
based at 123.
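The second diagram is small enough to generate by machine. In the sketch below the generators act on an arrangement by swapping positions; this convention is an assumption (the figure may use the other one, acting on values), but either choice gives a regular action and the same circuit of length 6.

```python
from itertools import permutations

def swap(arrangement, i, j):
    """Exchange the entries in positions i and j of an arrangement."""
    a = list(arrangement)
    a[i], a[j] = a[j], a[i]
    return tuple(a)

# Assumed convention: (12) swaps the first two positions, (23) the last two.
GENERATORS = {
    "(12)": lambda a: swap(a, 0, 1),
    "(23)": lambda a: swap(a, 1, 2),
}

def diagram(points):
    """All labelled edges (p, generator, q) of the diagram."""
    return [(p, name, act(p)) for p in points for name, act in GENERATORS.items()]

def follow(start, path):
    """Vertices visited when the generators in path are applied in turn."""
    visited = [start]
    for name in path:
        visited.append(GENERATORS[name](visited[-1]))
    return visited

points = list(permutations((1, 2, 3)))
edges = diagram(points)
circuit = follow((1, 2, 3), ["(12)", "(23)"] * 3)
```

The list `circuit` starts and ends at 123 and passes through all six arrangements, which is the relation $[(12)(23)]^3 = 1$ read off as a closed path.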
   It is easily seen that the diagram of (G, X, M) has a connected graph iff G acts
transitively on M. When this is so, we can take M to consist of the cosets in G of
a stabilizer of a point, and from this point of view a diagram may be called a coset
diagram. Any path in such a coset diagram $(G, X, G/H)$ represents an element of
G applied to a certain coset; in particular any element of H may be represented by
a circuit beginning and ending at the coset H. Since the graph is connected, it
contains a subgraph on all the vertices which is a tree (BA, Theorem 1.3.3), i.e. a
connected graph without circuits. Such a tree including all the vertices of our graph is
also called a spanning tree for the graph. Let us express this in terms of our group.
Given any coset Hu, there is a path in the coset diagram from H to Hu (where
the orientation in forming the path is disregarded). Each such path corresponds
to a group word w on X such that $Hw = Hu$. Now the choice of spanning tree in
our graph amounts to the choice of a particular representative $w = w_1 \cdots w_r$
($w_i \in X \cup X^{-1}$) for our coset. It has the property that any left factor $w_1 \cdots w_i$ of
w is the chosen representative of its coset. A transversal with this property is
called a Schreier transversal. So the choice of a spanning tree amounts to choosing
a Schreier transversal, and it is not hard to verify that any Schreier transversal in
turn leads to a spanning tree. Therefore the existence of a spanning tree assures us
that every subgroup of G has a Schreier transversal, relative to the given generating
set of G. We note that neither X nor $(G : H)$ need be finite for the spanning tree to
exist.
   In any group G with generating set X and a subgroup H consider a coset diagram
$(G, X, G/H)$ and choose a spanning tree $\Gamma$. Any edge $\alpha$ in our diagram, from p to q
say, gives rise to a circuit as follows. Denote by $p_0$ the vertex corresponding to the
coset H. There is a unique path $w_1$ within $\Gamma$ from $p_0$ to p, and a unique path $w_2$
from q to $p_0$; hence $w_1\alpha w_2$ is a circuit on $p_0$. Here we have $w_2 = (w_1\alpha)^{-1}$ precisely
if $\alpha \in \Gamma$, so our circuit is trivial (reduced to $p_0$) precisely if $\alpha \in \Gamma$. If $\alpha \notin \Gamma$, so that
the circuit is non-trivial, it will be called a basic circuit, corresponding to $\alpha$. As we
saw earlier, any circuit on $p_0$ corresponds to an element of H. We now observe
that any circuit on $p_0$ can be written as a product of basic circuits. For, given
edges $\alpha_1, \ldots, \alpha_r$ forming a circuit on $p_0$, suppose that $\alpha_i$ goes from $p_{i-1}$ to $p_i$,
where $p_r = p_0$, and let $w_i$ be the path within $\Gamma$ from $p_0$ to $p_i$; in particular, $w_0$ is
the empty path.
   Then we have
$$\alpha_1\alpha_2 \cdots \alpha_r = (w_0\alpha_1w_1^{-1})(w_1\alpha_2w_2^{-1}) \cdots (w_{r-1}\alpha_rw_r^{-1}). \tag{3.4.3}$$
Here $w_{i-1}\alpha_iw_i^{-1}$ either is trivial and so can be omitted, or it is basic. This expresses
our circuit as a product of basic circuits, and it shows that H is generated by the
elements corresponding to basic circuits. Thus we have
Proposition 3.4.3. Let G be a finitely generated group. Then any subgroup of finite
index is finitely generated; more precisely, if G has a generating set of d elements and
(G: H) = m, then H can be generated by m(d - 1) + 1 elements.
  It only remains to prove the last part. The coset diagram $(G, X, G/H)$ has m
vertices and dm edges; by BA, Theorem 1.3.3, a spanning tree has $m - 1$ edges.
We saw that H has a generating set whose elements correspond to the non-tree
edges, and their number is $dm - (m - 1) = m(d - 1) + 1$.                                •
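The construction can be carried out mechanically when the action on the cosets is given by permutations. The sketch below is an illustration under stated assumptions: the cosets are labelled 0, ..., m − 1 with 0 standing for H, each generator is a tuple of images, and the spanning tree is grown by breadth-first search along forward edges only (which suffices for a finite transitive action). The names `schreier_data`, `act` and `inverse_perm` are hypothetical.

```python
from collections import deque

def inverse_perm(perm):
    """Inverse of a permutation given as a tuple of images."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

def act(point, word, gens):
    """Image of a coset under a word; letters are (generator, +1 or -1)."""
    for x, eps in word:
        perm = gens[x] if eps == 1 else inverse_perm(gens[x])
        point = perm[point]
    return point

def schreier_data(gens, base=0):
    """Return (Schreier transversal, generators of the stabilizer of base),
    one generator for each non-tree edge of the coset diagram."""
    m = len(next(iter(gens.values())))
    rep = {base: []}              # coset -> word along the tree from base
    tree = set()                  # tree edges, stored as (source, generator)
    queue = deque([base])
    while queue:
        p = queue.popleft()
        for x, perm in gens.items():
            q = perm[p]
            if q not in rep:
                rep[q] = rep[p] + [(x, 1)]
                tree.add((p, x))
                queue.append(q)
    if len(rep) != m:
        raise ValueError("the action is not transitive")
    basic = []
    for p in range(m):
        for x, perm in gens.items():
            if (p, x) not in tree:
                back = [(g, -e) for g, e in reversed(rep[perm[p]])]
                basic.append(rep[p] + [(x, 1)] + back)
    return rep, basic

# Example: two generators acting on 3 cosets as the transpositions (01), (12).
gens = {"x": (1, 0, 2), "y": (0, 2, 1)}
transversal, basic = schreier_data(gens)
```

With d = 2 and m = 3 the sketch returns 3(2 − 1) + 1 = 4 words, each of which fixes the coset 0, and every left factor of a transversal word is again a transversal word.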
Theorem 3.4.4. Any subgroup of a free group is free. If F is free of finite rank d and H is
a subgroup of finite index m in F, then H has rank given by
$$\mathrm{rk}\,H = (F : H)(\mathrm{rk}\,F - 1) + 1. \tag{3.4.4}$$
Equation (3.4.4) is known as Schreier's formula (see also Section 11.5 below).
Proof. We take the coset diagram of $(F, X, F/H)$ and choose a spanning tree $\Gamma$. As
we saw, H is generated by the elements corresponding to basic circuits; our aim is to
show that these elements generate H freely. For each non-tree edge $\alpha$ we have a basic
circuit $\bar{\alpha}$ and it will be enough to show that if $\alpha_1 \cdots \alpha_r$ is a reduced word $\neq 1$ in the
non-tree edges, then
$$\bar{\alpha}_1 \cdots \bar{\alpha}_r \neq 1. \tag{3.4.5}$$
   Finally we shall prove that free groups are residually finite p-groups, for any
prime p. We recall from Section 1.2 that this means: given any element $c \neq 1$ in a
free group F, there exists a normal subgroup N of F not containing c, such that
$F/N$ is a finite p-group. This will also show that for any prime p, F can be expressed as
a subdirect product of finite p-groups.
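   Let F be the free group on $x_1, \ldots, x_d$ and consider a non-trivial word
$$x_{i_1}^{\alpha_1} x_{i_2}^{\alpha_2} \cdots x_{i_r}^{\alpha_r} \quad (i_k \neq i_{k+1},\ \alpha_k \neq 0). \tag{3.4.6}$$
Let m be a positive integer so large that $p^m$ does not divide $\alpha_1 \cdots \alpha_r$ and take G to be
the group of all upper unitriangular matrices of order $r + 1$ over $\mathbf{Z}/p^m$, i.e. matrices
of the form $I + N$, where $N = (n_{ij})$, $n_{ij} = 0$ for $i \geq j$. It follows that $N^{r+1} = 0$;
moreover, the $r(r + 1)/2$ entries above the main diagonal may be chosen freely in $\mathbf{Z}/p^m$,
so G has order $p^{mr(r+1)/2}$; this shows G to be a finite p-group. Our
object will be to find a homomorphism $F \to G$ such that (3.4.6) is not mapped
to 1. We map $x_j$ to $A_j$, where
$$A_j = \prod_k (I + e_{k,k+1}),$$
where the product is taken over all k such that $i_k = j$. Then we have $e_kA_{i_k} =
e_k + e_{k+1}$, where the e are unit row vectors; hence
$$e_1 A_{i_1}^{\alpha_1} \cdots A_{i_r}^{\alpha_r} = e_1 + \alpha_1 \cdots \alpha_r e_{r+1} + \cdots,$$
where the final dots indicate terms in $e_2, \ldots, e_r$. Since $p^m$ does not divide
$\alpha_1 \cdots \alpha_r$, the image of (3.4.6) is not I. We have thus found a
homomorphism of the required form; we remark that G does not depend on the rank d
of F, and this may be taken to be infinite. We thus obtain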
Theorem 3.4.5. The free group (of any rank) is residually a finite p-group, for any
prime p.                                                                               •
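The separating homomorphism can be computed explicitly for a given word. The sketch below is a minimal illustration, assuming the word is supplied as a list of pairs (generator index, exponent) with adjacent indices distinct and exponents non-zero; the function name `separating_image` is hypothetical. It uses the fact that, under this assumption, $A_j = I + N_j$ with $N_j^2 = 0$, so that $A_j^\alpha = I + \alpha N_j$.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

def mat_mul(A, B, mod):
    """Matrix product modulo mod."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % mod for j in range(n)]
            for i in range(n)]

def separating_image(word, p):
    """Return (p**m, image of the word) in the unitriangular group of
    order r + 1 over Z/p**m, with m minimal such that p**m does not
    divide the product of the exponents."""
    r = len(word)
    exponent_product = 1
    for _, alpha in word:
        exponent_product *= alpha
    m = 1
    while exponent_product % p ** m == 0:
        m += 1
    mod, n = p ** m, r + 1
    # N[j] = sum of the matrix units e_{k,k+1} over positions k where x_j occurs
    N = {}
    for k, (j, _) in enumerate(word):
        N.setdefault(j, [[0] * n for _ in range(n)])[k][k + 1] = 1
    image = identity(n)
    for j, alpha in word:
        # N[j]**2 = 0 because adjacent positions carry different generators
        power = [[(int(a == b) + alpha * N[j][a][b]) % mod for b in range(n)]
                 for a in range(n)]
        image = mat_mul(image, power, mod)
    return mod, image

# The word x_1^2 x_2^-3 x_1 and the prime p = 2
mod, image = separating_image([(1, 2), (2, -3), (1, 1)], 2)
```

In this example the product of the exponents is −6, so m = 2 and the top right-hand entry of `image` is −6 reduced mod 4, which is non-zero; hence the word is not mapped to the unit matrix.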
Proof. Suppose that $\bigcap_n \gamma_n(F) \neq 1$ and take a word $c \neq 1$ in the intersection. By
Theorem 3.4.5 there is a normal subgroup N not containing c such that $F/N$ is a
finite p-group (for some p). Since $F/N$ is nilpotent, of class h say, we have
$\gamma_{h+1}(F) \subseteq N$ and so $c \notin \gamma_{h+1}(F)$, which is a contradiction.                    •
Exercises
 1. Give a direct proof that every abelian subgroup of a free group is cyclic.
 2. In a free group, if $w = u^n$, where $n \geq 1$, u is called a root of w, primitive if n is
    maximal for a given w. Show that every element of a free group has a unique
    primitive root. Show that two elements $u, v \neq 1$ have a common root or inverse
    roots iff they commute.
 3. Let F be a free group. Show that every subgroup of finite index meets every
    subgroup $\neq 1$ non-trivially.
 4. Define a group G to be projective (for this exercise only) if any homomorphism
    $G \to H_1$, where $H_1$ is a quotient of a group H, can be lifted to a homomorphism
    $G \to H$. Show that a group is projective iff it is free. (Note that this allows one to
    define free groups without reference to a basis.)
 5. Let G be any group and X a subset such that $X \cap X^{-1} = \emptyset$ and any non-empty
    product of elements of $X \cup X^{-1}$ with no factors $xx^{-1}$ or $x^{-1}x$ is $\neq 1$. Show that
    X is a free generating set of the subgroup generated by it.
 6. Show that a free group of rank d cannot be generated by fewer than d elements.
 7. A group is called Hopfian (after Heinz Hopf) if every surjective endomorphism
    of G is an automorphism. Show that every free group of finite rank is Hopfian.
 8. Show that in a free group of rank 3 any subgroup of finite index has odd rank.
 9. In the free group on x, y, z find a basis for the subgroup generated by all the
    squares.
10. Show that in the free group on x, y, the elements $y^{-n}xy^n$ form a free generating
    set. Deduce that any non-cyclic free group contains a free subgroup of countable
    rank.
11. Show that in any free group F of rank > 1, the derived group $F'$ has countable
    rank.
12. Let G be a group with generating set X and H a subgroup. Show how to
    construct a Schreier transversal and verify that it corresponds to a spanning tree
    of the graph $(G, X, G/H)$.
It is clear that this matrix lies in $SL_n(k)$, with inverse $B_{ij}(-a)$. A diagonal matrix
$\sum a_iE_{ii}$ lies in $GL_n(k)$ iff $\prod a_i \neq 0$, while $\prod a_i = 1$ is the condition for it to belong
to $SL_n(k)$.
   Taking V again as our vector space over k, let $u \neq 0$ be a vector in V. By a
transvection along u we understand a linear mapping T of V keeping u fixed and such that
$xT - x$ is in the direction of u. Thus we can express T as
$$xT = x + \lambda(x)u,$$
where $\lambda$ is a linear functional on V such that $\lambda(u) = 0$. For example, with the
standard basis $e_1, \ldots, e_n$ in $k^n$, $B_{1n}(a)$ maps $x = \sum x_ie_i$ to $x + ax_1e_n$ and so is a
transvection along $e_n$ (for n > 1). Given a transvection T along u, let us choose a basis
$u_1, \ldots, u_n$ of V such that $u_1 = u$ while $u_1, \ldots, u_{n-1}$ forms a basis for the kernel of
$T - I$. Then T takes the form $xT = x + \lambda(x)u_1$, so T is represented by an elementary
matrix relative to this basis.
   For our first result we take our ring to be a Euclidean domain; we recall (from BA,
Section 10.2) that a Euclidean domain is an integral domain in which the Euclidean
algorithm holds (relative to a norm function).
Theorem 3.5.1. For any (commutative) Euclidean domain R and any $n \geq 1$, the
special linear group $SL_n(R)$ is generated by all elementary matrices and the general
linear group $GL_n(R)$ by all elementary and diagonal matrices with units along the
main diagonal. In particular, this holds for any field.
Proof. The result is clear for n = 1, so assume n > 1. Let $A \in SL_n(R)$; it is clear that
right multiplication by $B_{ij}(c)$ corresponds to the operation of adding the i-th
column, multiplied by c, to the j-th column. Now the Euclidean algorithm shows
that multiplication on the right alternately by $B_{12}(p)$ and $B_{21}(q)$ for suitable
$p, q \in R$ reduces $a_{11}$, $a_{12}$ to d, 0 respectively, where d is an HCF of $a_{11}$ and $a_{12}$. By
operating similarly on other columns we reduce all elements of the first row after
the first one to zero. We next repeat this process on the rows, to reduce all elements
in the first column after the first one to zero, by multiplying by suitable elementary
matrices on the left. This either leaves the top row unchanged or it replaces the
(1, 1)-entry by a proper factor. By an induction on the number of factors we
reduce A to the form $a \oplus A_1$, where $A_1$ is $(n - 1) \times (n - 1)$, and another induction,
this time on n, reduces A to diagonal form. Using the factorization
$$\begin{pmatrix} c & 0 \\ 0 & c^{-1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ c^{-1} & 1 \end{pmatrix} \begin{pmatrix} 1 & -c \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ c^{-1} & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix},$$
we can express $\mathrm{diag}(a_1, a_2, \ldots, a_n)$ by $\mathrm{diag}(1, a_1a_2, a_3, \ldots, a_n)$ and a product of
elementary matrices, and hence by another induction express A as a product of
elementary matrices.
   The same process applied to a matrix of $GL_n(R)$ reduces it to a diagonal matrix
and this proves the second assertion.                                              •
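For $R = \mathbf{Z}$ and n = 2 the reduction can be followed step by step. The sketch below is a simplified illustration rather than the general procedure of the proof: it clears the (1, 2)-entry by column operations, and where the proof would appeal to the displayed factorization it corrects a leading entry −1 by three further elementary column operations. The names `B12`, `B21`, `factor_sl2z` and `product` are chosen here for illustration.

```python
def B12(q):
    """Elementary matrix adding q times column 1 to column 2 (on the right)."""
    return [[1, q], [0, 1]]

def B21(q):
    """Elementary matrix adding q times column 2 to column 1 (on the right)."""
    return [[1, 0], [q, 1]]

def mul(A, B):
    """Product of two 2 x 2 integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def factor_sl2z(matrix):
    """Write an integer matrix of determinant 1 as a product of elementary
    matrices; returns a list of pairs ('12' or '21', q) in order."""
    M = [row[:] for row in matrix]
    if M[0][0] * M[1][1] - M[0][1] * M[1][0] != 1:
        raise ValueError("matrix must have determinant 1")
    ops = []                      # column operations applied on the right

    def column_op(kind, q):
        nonlocal M
        M = mul(M, B12(q) if kind == "12" else B21(q))
        ops.append((kind, q))

    while M[0][1] != 0:           # Euclidean algorithm on the first row
        if M[0][0] == 0:
            column_op("21", 1)
        column_op("12", -(M[0][1] // M[0][0]))
        if M[0][1] != 0:
            column_op("21", -(M[0][0] // M[0][1]))
    if M[0][0] == -1:             # turn the first row (-1, 0) into (1, 0)
        for kind, q in (("12", 1), ("21", -2), ("12", 1)):
            column_op(kind, q)
    # Now M = B21(M[1][0]); undo the column operations in reverse order.
    return [("21", M[1][0])] + [(kind, -q) for kind, q in reversed(ops)]

def product(factors):
    """Multiply out a list of elementary factors."""
    P = [[1, 0], [0, 1]]
    for kind, q in factors:
        P = mul(P, B12(q) if kind == "12" else B21(q))
    return P
```

Multiplying out the factors returned for a matrix such as [[7, 5], [4, 3]] recovers the matrix; some factors may be trivial (q = 0), which does no harm in a sketch of this kind.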
(3.5.1)
Let us recall that a group G is called perfect if it coincides with its derived group $G'$.
Since R is commutative, it follows that $GL_n(R)' \subseteq SL_n(R)$; for fields we have the
following more precise relation:
except when n = 2 and k consists of 2 or 3 elements. Thus SLn(k) is perfect (with the
exceptions listed).
Proof. As we have remarked, we have $SL_n(k)' \subseteq GL_n(k)' \subseteq SL_n(k)$. To establish
equality it is enough, by Theorem 3.5.1, to show that every elementary matrix is a
product of commutators. It is easily checked that for any distinct indices i, j, k,
(3.5.3)
hence if k contains an element b such that $b \neq 0$, $b^2 \neq 1$, then we can express any
elementary matrix $B_{12}(a)$ as a commutator by taking $c = (1 - b^2)^{-1}a$ in (3.5.3),
and similarly for $B_{21}(a)$. This shows that $SL_n(k)' = SL_n(k)$ except when n = 2 and
$b^3 = b$ for all $b \in k$, and this happens only when $|k| = 2$ or 3.                     •
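For n = 2 and small prime fields the statement can be checked by brute force. The sketch below enumerates $SL_2(\mathbf{F}_p)$ and computes its derived group. It also checks one commutator identity, $(\mathrm{diag}(b^{-1}, b), B_{12}(c)) = B_{12}((1 - b^2)c)$ with $(x, y) = x^{-1}y^{-1}xy$; this particular form is an assumption chosen to match the substitution $c = (1 - b^2)^{-1}a$ and need not be the exact identity displayed in (3.5.3). Matrices are stored as tuples (a, b, c, d), and `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) over F_p with ad - bc = 1."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]

def mul(x, y, p):
    """Product of two 2 x 2 matrices over F_p."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

def inv(x, p):
    """Inverse of a matrix of determinant 1."""
    a, b, c, d = x
    return (d % p, -b % p, -c % p, a % p)

def commutator(x, y, p):
    """(x, y) = x^-1 y^-1 x y."""
    return mul(mul(inv(x, p), inv(y, p), p), mul(x, y, p), p)

def derived_subgroup(G, p):
    """Subgroup generated by all commutators of elements of G."""
    comms = {commutator(x, y, p) for x in G for y in G}
    elements, frontier = set(comms), list(comms)
    while frontier:
        g = frontier.pop()
        for c in comms:
            h = mul(g, c, p)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

# The assumed identity over F_5 with b = 2 (so b^2 = 4 is not 1) and a = 1
p, b, a = 5, 2, 1
c = pow((1 - b * b) % p, -1, p) * a % p
x = (pow(b, -1, p), 0, 0, b)          # diag(b^-1, b)
y = (1, c, 0, 1)                      # B_12(c)
```

For p = 2 and p = 3 the derived group turns out to be a proper subgroup, while for p = 5 it is the whole group, in agreement with the exceptions listed in the proposition.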
   We shall see that $SL_2(\mathbf{F}_2)$ and $SL_2(\mathbf{F}_3)$ are soluble (see Exercise 1), so the
exceptions made in Proposition 3.5.2 do in fact occur.
   Our next aim is to show that $SL_n(k)$ modulo its centre is simple (except when
n = 2 and $|k| \leq 3$). By taking k to be finite we thus obtain a family of finite
simple groups. The centre of $GL_n(k)$ or $SL_n(k)$ clearly consists of all scalar matrices
and the quotients are known as the projective linear groups, written
$$PGL_n(k) = GL_n(k)/Z, \qquad PSL_n(k) = SL_n(k)/(SL_n(k) \cap Z),$$
where Z is the centre of $GL_n(k)$. Let us define projective n-space $P^n(k)$ as the set of all
(n + 1)-tuples $x = (x_0, x_1, \ldots, x_n) \in k^{n+1}$, where the $x_i$ are not all 0 and $x = y$ iff
$x_i = \lambda y_i$ for some $\lambda \in k^\times$. Thus the points of $P^n(k)$ are given by the ratios of
n + 1 coordinates. It is clear that $PGL_{n+1}(k)$ and $PSL_{n+1}(k)$ act in a natural way
on the points of $P^n(k)$.
   To prove the simplicity of $PSL_n$ (following Iwasawa) we shall need to recall the
notion of primitivity and derive some of its properties. We recall that a
permutation group G acting on a set S is transitive if for any $p, q \in S$ there exists $g \in G$
3.5 Linear groups                                                                   119
Lemma 3.5.3. Let $G$ be a permutation group acting on a set $S$ and denote the stabilizer
of $p \in S$ by $St_p$. If $G$ is transitive, then it is primitive if and only if $St_p$ is a maximal
proper subgroup of $G$.
Proof. Assume $G$ to be imprimitive and let $f : S \to T$ be a compatible mapping onto
another G-set which is neither injective nor a constant mapping. Take distinct $p, q \in S$
such that $pf = qf$ and denote the stabilizer of $pf$ by $L$. If $pg = p$, then
$(pf)g = (pg)f = pf$, hence $St_p \subseteq L \subseteq G$. By transitivity there exists $h \in G$ such
that $ph = q$, hence $(pf)h = (ph)f = qf = pf$. This shows that $h \in L$, but $h \notin St_p$,
so $St_p \subset L$ and $L$ is a proper subgroup of $G$, because $f$ is non-constant. Thus $St_p$ is
not maximal.
  Conversely, if $St_p \subset K \subset G$ for some subgroup $K$ of $G$, we may regard $S$ as the
coset space $\bigcup St_p x$ and the mapping $St_p x \mapsto Kx$ is compatible and neither constant
nor injective, hence $G$ cannot be primitive.                                    •
Lemma 3.5.4. Let $G$ be a primitive permutation group acting on a set $S$. Then any non-trivial
normal subgroup $N$ acts transitively and $G = St_p \cdot N$ for the stabilizer $St_p$ of any
point $p$ of $S$.
Proof. Consider the orbits under the action of $N$. For any $p \in S$, $g \in G$, we have
$pNg = pgN$, and this equals $pN$ if $pg = p$ and otherwise is disjoint. By primitivity
it follows that $pN = S$. Now fix $p \in S$ and take any $g \in G$; then $pg = pu$ for some
$u \in N$, hence $gu^{-1} \in St_p$ and so $g \in St_p \cdot N$ as claimed.                            •
  We shall use these results to prove that $PSL_n(k)$ is simple, taking the action of
$PSL_n(k)$ on $P^{n-1}(k)$.
Theorem 3.5.6. Let $k$ be a field and $n \geq 2$. Then $PSL_n(k)$ is simple except when $n = 2$
and $|k| \leq 3$.
Proof. As we have seen, the action of $G = PSL_n(k)$ on $P^{n-1}(k)$ is primitive; moreover,
$G$ is perfect, for if $H = SL_n(k)$ and $Z$ denotes the centre of $H$, then
$G = H/Z$, hence by Proposition 3.5.2, $G' = H'Z/Z = H/Z = G$. To complete the
proof we shall verify the conditions of Proposition 3.5.5. Let $p$ be the point with
coordinates $(1, 0, \ldots, 0)$. This is left fixed by any matrix
$$\begin{pmatrix} a & 0 \\ b & A \end{pmatrix} \quad \text{with inverse} \quad \begin{pmatrix} a^{-1} & 0 \\ -A^{-1}ba^{-1} & A^{-1} \end{pmatrix},$$
  Let us determine the order of $PSL_n(k)$ when $k$ is finite, with $q$ elements say. We
begin with $GL_n(k)$; this group acts on $V = k^n$, a vector space with $q^n$ elements.
Any element of $GL_n(k)$ is completely determined by the image of the standard
basis $e_1, e_2, \ldots, e_n$ of $V$, and this image can be any basis of $V$. Thus $e_1$ can map to
any non-zero vector, giving $q^n - 1$ choices, $e_2$ can map to any vector independent of
the image of $e_1$, giving $q^n - q$ choices, $e_3$ can map to any vector not a linear combination
of the images of $e_1, e_2$, giving $q^n - q^2$ choices, and so on. In this way we find
that
$$|GL_n(k)| = (q^n - 1)(q^n - q) \cdots (q^n - q^{n-1}).$$
Since $SL_n(k)$ is the kernel of the determinant map of $GL_n(k)$ onto $k^\times$, its order is
$|GL_n(k)|/(q - 1)$.
To find the order of $PSL_n(k)$ we need to calculate the order of the centre $Z$ of $SL_n(k)$. We recall
that $Z$ consists of all matrices $cI$ such that $c^n = 1$, so we need to find the number of
solutions of $c^n = 1$ in $k$. But $k^\times$ is a cyclic group of order $q - 1$, so the number we
want is $d = (n, q - 1)$. Thus
$$|PSL_n(k)| = (q^n - 1)(q^n - q) \cdots (q^n - q^{n-2})q^{n-1}/d, \quad \text{where } d = (n, q - 1).$$
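The counting argument translates directly into a few lines of code. The following is a minimal sketch, not part of the text: the function names are illustrative, $q$ is assumed to be a prime power (this is not checked), and the brute-force count is restricted to $2 \times 2$ matrices over a prime field.

```python
from itertools import product
from math import gcd

def order_gl(n, q):
    """|GL_n(F_q)| = (q^n - 1)(q^n - q)...(q^n - q^(n-1))."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result

def order_sl(n, q):
    """|SL_n(F_q)|: the determinant maps GL_n onto a group of order q - 1."""
    return order_gl(n, q) // (q - 1)

def order_psl(n, q):
    """|PSL_n(F_q)|: divide by the number d = (n, q - 1) of scalars c with c^n = 1."""
    return order_sl(n, q) // gcd(n, q - 1)

def count_gl2(p):
    """Brute-force count of invertible 2x2 matrices over F_p (p prime)."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)
```

For instance, `order_psl(3, 4)` and `order_psl(4, 2)` agree, which is the coincidence of orders asked about in Exercise 6 below.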
Exercises
1. By examining the action of $PSL_2(k)$ on $P^1(k)$ show that $PSL_2(\mathbf{F}_2) \cong Sym_3$,
   $PSL_2(\mathbf{F}_3) \cong Alt_4$. Show also that $GL_2(\mathbf{F}_2) \cong Sym_3$.
2. Show that $PSL_2(\mathbf{F}_4) \cong PSL_2(\mathbf{F}_5) \cong Alt_5$.
3. Show that $PSL_2(\mathbf{F}_9) \cong Alt_6$, $PSL_4(\mathbf{F}_2) \cong Alt_8$.
4. Apply Proposition 3.5.5 to show that $Alt_5$ is simple.
5. How much remains true of Theorem 3.5.1 when commutativity is dropped?
6. Show that $PSL_3(\mathbf{F}_4)$ and $PSL_4(\mathbf{F}_2)$ have the same order but are not isomorphic.
   (Hint. Compare the Sylow 2-subgroups.)
                                                                                     (3.6.1)
Now the symplectic transformations may be described by the matrices P such that
                                                                                   (3.6.2)
To establish the existence of a symplectic basis we recall a result from BA. By a hyperbolic
pair of vectors we understand a pair $u, v \in V$ such that $b(u, v) = 1$; clearly a
two-dimensional symplectic space has a basis consisting of a hyperbolic pair; such
a space is called a hyperbolic plane. We recall that a subspace $N$ is called totally isotropic
if the form restricted to $N$ is identically zero.
Lemma 3.6.1. Let $V$ be a symplectic space, $U$ any subspace and $U_0$ a maximal totally
isotropic subspace of $U$. Then $\dim U \leq 2 \dim U_0$ and any basis $u_1, \ldots, u_r$ of $U_0$ can
after renumbering form part of a basis $u_1, \ldots, u_r, v_1, \ldots, v_s$ of $U$ such that the
$(u_i, v_i)$ ($i = 1, \ldots, s \leq r$) are mutually orthogonal hyperbolic pairs. Moreover, this
basis of $U$ can be completed to a symplectic basis of $V$.
Proof. Given $U$ and $U_0$ as stated, if $U_0 \neq U$, then no vector of $U \setminus U_0$ is orthogonal to
all of $U_0$, so there is $v_1 \in U$ such that $b(u_i, v_1) = 0$ for all $i$ except one, which may be
taken as 1 by renumbering the $u$'s; thus $b(u_1, v_1) \neq 0$ and replacing $v_1$ by $v_1/b(u_1, v_1)$
we have $b(u_1, v_1) = 1$. If $\langle U_0, v_1 \rangle \neq U$, we can repeat the process and after a finite
number of steps we reach a basis of $U$ of the required form. By the maximality of
$U_0$ we have $s \leq r$, hence $\dim U = r + s \leq 2r$.
   Now if $s < r$, we can in $V$ find $v_r$ such that $b(u_i, v_r) = \delta_{ir}$ (because $V$ is regular);
continuing in this way we find $u_1, \ldots, u_r, v_1, \ldots, v_r$, a symplectic basis for a subspace
$W$ of $V$. If $W \neq V$, we have an orthogonal sum $V = W \perp W^\perp$; by induction
on $\dim V$ we can find a symplectic basis for $W^\perp$ which together with the basis found
for $W$ forms a symplectic basis for $V$.                                                •
Lemma 3.6.2. Let $V$ be a symplectic space of dimension $2m$ and $u$ any non-zero vector
in $V$. Then any transvection which is symplectic, with kernel $u^\perp$, has the form $\tau_{u,c}$ for
some $c \in k$. Moreover,
(i)   for fixed $u$ the mapping $c \mapsto \tau_{u,c}$ is an injective homomorphism of the additive
      group of $k$ into $Sp_{2m}(k)$,
(ii)  for any $\sigma \in Sp_{2m}(k)$, $\sigma^{-1}\tau_{u,c}\sigma = \tau_{u\sigma,c}$ and
(iii) for $a \in k^\times$, $\tau_{au,c} = \tau_{u,a^2c}$.
Proof. Any linear functional on $V$ with kernel $u^\perp$ is a multiple of $b(-, u)$, hence any
symplectic transvection with kernel $u^\perp$ has the form $\tau_{u,c}$ for some $c \in k$. The other
properties are verified without difficulty.                                         •
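The properties "verified without difficulty" can be spot-checked by machine. The sketch below assumes the formula $x\tau_{u,c} = x + c\,b(x, u)u$, which is how $\tau_{u,c}$ is used in the proof of Theorem 3.6.3; it also assumes a prime field $\mathbf{F}_p$ and the standard alternating form pairing coordinates $(0,1), (2,3), \ldots$. The names `form` and `transvection` are illustrative.

```python
from itertools import product

def form(x, y, p):
    """Standard alternating form on F_p^(2m), pairing coordinates (0,1), (2,3), ..."""
    return sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(0, len(x), 2)) % p

def transvection(u, c, p):
    """Return the map x -> x + c*b(x, u)*u (assumed definition of tau_{u,c})."""
    def tau(x):
        s = c * form(x, u, p)
        return tuple((xi + s * ui) % p for xi, ui in zip(x, u))
    return tau

def is_symplectic(f, dim, p):
    """Check b(xf, yf) = b(x, y) for all x, y in F_p^dim by exhaustion."""
    vecs = list(product(range(p), repeat=dim))
    return all(form(f(x), f(y), p) == form(x, y, p) for x in vecs for y in vecs)
```

With right actions, property (ii) reads: applying $\sigma^{-1}$, then $\tau_{u,c}$, then $\sigma$ gives $\tau_{u\sigma,c}$; taking $\sigma$ itself to be a transvection gives a cheap test case.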
Theorem 3.6.3. For any field $k$, and any $m \geq 1$, $Sp_{2m}(k)$ is transitive on the hyperbolic
pairs, and is generated by the set of all symplectic transvections.
Proof. We denote by $T$ the subgroup generated by all symplectic transvections and
divide the proof into three parts:
(i)    $T$ is transitive on the non-zero vectors of $V$. For if $u_1, u_2 \neq 0$ and $b(u_1, u_2) \neq 0$,
       let $c \in k$ be such that $cb(u_1, u_2) = 1$ and put $u = u_1 - u_2$. Then $\tau_{u,c}$ maps $u_1$ to
       $u_1 + cb(u_1, u_1 - u_2)(u_1 - u_2) = u_1 - (u_1 - u_2) = u_2$. If $b(u_1, u_2) = 0$ and
       $v \in V$ is such that $b(u_i, v) \neq 0$ ($i = 1, 2$), then the result just proved yields
       transvections to map $u_1$ to $v$ and $v$ to $u_2$, so it only remains to find $v$. If
       $u_1, u_2$ are linearly dependent, we can take any $v$ not in $u_1^\perp$. Otherwise $u_1, u_2$
       are linearly independent and such that $b(u_1, u_2) = 0$; then we can by Lemma
       3.6.1, find $v_1, v_2$ such that $(u_1, v_1)$ and $(u_2, v_2)$ are orthogonal hyperbolic
       pairs, and now $v = v_1 + v_2$ is the required vector. Thus $T$ has been shown to
       be transitive on the non-zero vectors of $V$.
(ii)   Next we show that $T$ is transitive on the hyperbolic pairs in $V$. Let $(u_i, v_i)$
       ($i = 1, 2$) be two such pairs. By what has been proved we may take
       $u_1 = u_2 = u$. If $b(v_1, v_2) \neq 0$, then (as we saw in (i)) there is a symplectic transvection
       $\tau$ along $v_2 - v_1$ with $v_1\tau = v_2$; since $b(u, v_1 - v_2) = b(u, v_1) -
       b(u, v_2) = 0$, we have $u\tau = u$, so $\tau$ maps $(u, v_1)$ to $(u, v_2)$. If $b(v_1, v_2) = 0$, we use
       the hyperbolic pair $(u, v_1 + u)$; since $-b(v_1, v_1 + u) = b(v_1 + u, v_1) = 1$, we can
       find symplectic transvections to map $(u, v_1)$ to $(u, v_1 + u)$ and $(u, v_1 + u)$ to
       $(u, v_2)$.
(iii)  We now use induction on $m$ to show that $T = Sp_{2m}(k)$. Suppose that $m = 1$,
       and that $u, v$ is a symplectic basis. Any linear transformation has the form
                                   $u \mapsto u' = au + bv$,
                                   $v \mapsto v' = cu + dv$,
       and this will be symplectic iff $b(u', v') = 1$, i.e. $ad - bc = 1$. Thus $Sp_2(k)$ consists
       precisely of all matrices with determinant 1. By Theorem 3.5.1 this group is
       generated by all elementary matrices, and these are easily seen to be transvections.
       Now assume that $m > 1$, let $\sigma \in Sp_{2m}(k)$ and take any hyperbolic
       pair $(u, v)$ in $V$. By (ii) there exists $\tau \in T$ such that $(u, v)\sigma = (u, v)\tau$, hence
       $\sigma\tau^{-1}$ leaves $u, v$ fixed. Therefore it maps $W = \langle u, v \rangle^\perp$ into itself and defines
       an isometry there. By the induction hypothesis $\sigma\tau^{-1}|W = \prod \tau_i'$, where $\tau_i'$ is
       a symplectic transvection on $W$. We can extend $\tau_i'$ to a symplectic transvection
       $\tau_i$ on $V$ by defining it as the identity on $\langle u, v \rangle$. Then $\sigma = \prod \tau_i \cdot \tau$ and this shows
       that $\sigma \in T$.                                                                     •
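Part (i) of the proof is constructive, and the following sketch follows it step by step over a prime field $\mathbf{F}_p$. It is an illustration only: the intermediate vector $v$ is found by exhaustive search rather than through Lemma 3.6.1, the form is the standard alternating form on $\mathbf{F}_p^{2m}$, and the names `move_vector` and `apply_steps` are hypothetical.

```python
from itertools import product

def form(x, y, p):
    """Standard alternating form on F_p^(2m), pairing coordinates (0,1), (2,3), ..."""
    return sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(0, len(x), 2)) % p

def move_vector(u1, u2, p):
    """Parameters (u, c) of at most two transvections x -> x + c*b(x,u)*u
    whose product, applied left to right, maps u1 to u2 (both non-zero)."""
    if u1 == u2:
        return []
    beta = form(u1, u2, p)
    if beta != 0:
        # One transvection along u1 - u2 with c*b(u1, u2) = 1.
        u = tuple((a - b) % p for a, b in zip(u1, u2))
        return [(u, pow(beta, -1, p))]
    # Otherwise pass through some v with b(u1, v) != 0 and b(v, u2) != 0.
    for v in product(range(p), repeat=len(u1)):
        if form(u1, v, p) != 0 and form(v, u2, p) != 0:
            return move_vector(u1, v, p) + move_vector(v, u2, p)
    return None  # not reached for non-zero u1, u2, by the argument in (i)

def apply_steps(x, steps, p):
    """Apply the listed transvections to x one after another."""
    for u, c in steps:
        s = c * form(x, u, p)
        x = tuple((xi + s * ui) % p for xi, ui in zip(x, u))
    return x
```

The exhaustive search is only practical for very small spaces; it stands in for the explicit choice $v = v_1 + v_2$ made in the proof.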
Corollary 3.6.5. The centre of $Sp_{2m}(k)$ consists of the transformations 1 and $-1$.
Proof. If $\tau$ is a symplectic transvection along $u$, then $x\tau - x$ is proportional to $u$,
hence $x\tau\sigma - x\sigma$ is proportional to $u\sigma$, for any $\sigma \in Sp_{2m}(k)$. Now $x\sigma\tau - x\sigma$ is proportional
to $u$ and not always 0, so if $\sigma$ and $\tau$ commute, then $u\sigma$ must be proportional
to $u$. For a given $\sigma$ this can happen for all $u$ only if $\sigma$ is a scalar, say
$\sigma = c \cdot 1$. Now the condition (3.6.2) shows that $c^2 = 1$, hence $c = \pm 1$.              •
Theorem 3.6.6. $Sp_{2m}(k)$ is perfect except when $m = 1$ and $|k| \leq 3$ or $m = 2$ and
$|k| = 2$.
Proof. In the proof of Theorem 3.6.3 we saw that $Sp_2(k) \cong SL_2(k)$ and by Proposition
3.5.2 this is perfect when $|k| > 3$, so we may assume that $m \geq 2$. Suppose first
that $|k| > 3$; we shall show that every transvection $\tau_{u,a}$ is a commutator. In $k$ there
exists $c \neq 0$ such that $c^2 \neq 1$. Put $b = (1 - c^2)^{-1}a$, $d = -c^2b$; then $b + d = a$, hence
$\tau_{u,a} = \tau_{u,b}\tau_{u,d}$. If $\sigma$ is any symplectic mapping such that $u\sigma = cu$ (Theorem 3.6.3),
then
$$\sigma^{-1}\tau_{u,b}^{-1}\sigma = \sigma^{-1}\tau_{u,-b}\sigma = \tau_{u\sigma,-b} = \tau_{cu,-b} = \tau_{u,-c^2b} = \tau_{u,d}.$$
Hence $\tau_{u,a} = \tau_{u,b} \cdot \sigma^{-1}\tau_{u,b}^{-1}\sigma$ is a commutator.
(3.6.4)
It is easily verified that $B - A^{-1}B(A^T)^{-1} = E_{11}$, so we have again a symplectic transformation
which is a commutator; for $m > 3$ we again take $A, B$ to be $I, 0$ respectively
on the new coordinates. So we have in all cases expressed a transvection as
a commutator, and the result follows.                                                  •
  Finally we come to the simplicity proof, which runs along similar lines to that for
the general linear group (see Section 3.5).
Theorem 3.6.7. The projective symplectic group $PSp_{2m}(k)$ is simple for all fields $k$ and
all integers $m \geq 1$, except when $m = 1$ and $|k| \leq 3$ or $m = 2$ and $|k| = 2$.
Proof. Since $Sp_2(k) \cong SL_2(k)$, we may assume that $m > 1$. We consider the action of
$G = PSp_{2m}(k)$ on the space $P = P^{2m-1}(k)$ and begin by showing that $G$ is primitive.
Let $Q$ be a set in a partition of $P$ compatible with the G-action and containing more
than one point. Suppose first that $Q$ contains a pair of points $\langle x \rangle, \langle y \rangle$ defined by
a hyperbolic pair of vectors $(x, y)$. Given any other point $\langle z \rangle$ in $P$, if $b(x, z) \neq 0$,
we may assume that $b(x, z) = 1$; since $G$ is transitive on the hyperbolic pairs, by
Theorem 3.6.3, there exists $\sigma \in G$ mapping $(x, y)$ to $(x, z)$, hence $Q\sigma \cap Q \neq \emptyset$, so
$Q\sigma = Q$, but $\langle z \rangle \in Q\sigma = Q$, so $Q = P$. If $b(x, z) = 0$, we may assume that
$\langle z \rangle \neq \langle x \rangle$, because otherwise $\langle z \rangle = \langle x \rangle \in Q$. Then there exists $w \in V$ such that
$b(x, w) = b(z, w) = 1$, and by what has been shown, $\langle w \rangle \in Q$. Further there exists
$\sigma \in G$ mapping $(x, w)$ to $(z, w)$, hence $Q\sigma \cap Q \neq \emptyset$, so again $\langle z \rangle \in Q\sigma = Q$ and
it follows again that $Q = P$. The alternative is that the subspace defining $Q$ is totally
isotropic. Let $\langle x, y \rangle$ be a plane in $Q$ and choose $w \in V$ such that $b(x, w) = 1$,
$b(y, w) = 0$. Writing $H = \langle x, w \rangle$, we have $V = H \perp H^\perp$. Further, $y \in H^\perp$ and for
$0 \neq z \in H^\perp$ there exists a symplectic transformation $\sigma$ leaving $H$ fixed and mapping
$y$ to $z$. Since $\langle x \rangle \in Q$, it follows that $Q\sigma = Q$ and since $\langle y \rangle \in Q$, we have
$\langle z \rangle \in Q\sigma = Q$. Hence $Q$ contains all points defined by vectors in $H^\perp$. Since
$m > 1$, $H^\perp$ contains a hyperbolic pair, so the first part of the argument can be
applied to show that $Q = P$, and this shows $G$ to be primitive.
   In order to apply Proposition 3.5.5, we need to find a normal abelian subgroup of
a stabilizer whose conjugates generate $G$. Take a point $\langle x \rangle$, let $S$ be its stabilizer and
let $A$ be the group of transvections $\tau_{x,a}$ ($a \in k$). Then (3.6.4) shows that $A \lhd S$ and by
Theorem 3.6.3, $A$ and its conjugates generate $G$. Hence $G$ is indeed simple, with the
exceptions listed.                                                                   •
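$$|Sp_{2m}(k)| = q^{2m-1}(q^{2m} - 1)\,|Sp_{2m-2}(k)|, \qquad (3.6.6)$$
hence
$$|Sp_{2m}(k)| = q^{m^2}(q^{2m} - 1)(q^{2m-2} - 1) \cdots (q^2 - 1). \qquad (3.6.7)$$
For $PSp_{2m}(k)$ the order is the same when $q$ is even and has one half this value when $q$
is odd.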
Proof. Let $V$ be a 2m-dimensional symplectic space over $k$; we first determine the number of
hyperbolic pairs in $V$. For a hyperbolic pair $(x, y)$, $x$ may be any non-zero vector
in $V$, so there are $q^{2m} - 1$ choices. Given a particular hyperbolic pair $(x, y_0)$, any
other hyperbolic pair with first vector $x$ has the form $(x, y)$, where $y = y_0 + z$ for
a vector $z \in x^\perp$, and here we have $q^{2m-1}$ choices for $z$. Hence there are
$q^{2m-1}(q^{2m} - 1)$ pairs in all. By Theorem 3.6.3, $Sp_{2m}(k)$ is transitive on the set of
these pairs, and the stabilizer of a particular pair $(x, y)$ is isomorphic to the symplectic
group of $\langle x, y \rangle^\perp$, which is $Sp_{2m-2}(k)$. Hence we obtain (3.6.6) and now
(3.6.7) is an easy consequence. The final remark follows because the centre is $\pm 1$,
as we saw in Corollary 3.6.5 and $-1 = 1$ when $q$ is even.                            •
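The count in this proof can be reproduced as follows. This is a sketch under stated assumptions: the recursion multiplies the number $q^{2m-1}(q^{2m} - 1)$ of hyperbolic pairs by the order of the stabilizer, the closed form is the product obtained by unwinding that recursion, and the brute-force check uses the standard alternating form over a prime field. All function names are illustrative.

```python
from itertools import product

def order_sp(m, q):
    """|Sp_2m(F_q)| via the recursion: (number of hyperbolic pairs) * |Sp_(2m-2)(F_q)|."""
    if m == 0:
        return 1
    return q**(2 * m - 1) * (q**(2 * m) - 1) * order_sp(m - 1, q)

def order_sp_closed(m, q):
    """Closed form q^(m^2) * (q^(2m) - 1)(q^(2m-2) - 1)...(q^2 - 1)."""
    result = q**(m * m)
    for i in range(1, m + 1):
        result *= q**(2 * i) - 1
    return result

def order_psp(m, q):
    """Divide by the order of the centre {1, -1}, which is trivial when q is even."""
    return order_sp(m, q) // (1 if q % 2 == 0 else 2)

def count_hyperbolic_pairs(m, p):
    """Brute-force count of pairs (x, y) in F_p^(2m) with b(x, y) = 1 (p prime)."""
    def form(x, y):
        return sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(0, 2 * m, 2)) % p
    vecs = list(product(range(p), repeat=2 * m))
    return sum(1 for x in vecs for y in vecs if form(x, y) == 1)
```

As a sanity check, `order_sp(2, 2)` equals 720, the order of $Sym_6$, in line with Exercise 3 below.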
Exercises
1. Show that in the action of $PSp_{2m}(k)$ on $P^{2m-1}(k)$, the stabilizer of a point has
   exactly three orbits.
2. Verify that $PSp_2(\mathbf{F}_2)$ and $PSp_2(\mathbf{F}_3)$ are both soluble, of orders 6 and 12 respectively,
   and express them as permutation groups.
3. Show that $Sp_4(\mathbf{F}_2) \cong Sym_6$ by considering its action on quintuples of vectors
   $u_1, \ldots, u_5$ such that $b(u_i, u_j) = 1$ for $i \neq j$ in a four-dimensional symplectic
   space over $\mathbf{F}_2$. (Hint. Find the number of hyperbolic pairs in a quintuple and
   the number of quintuples containing a given hyperbolic pair.)
$\dim V \geq 3$; the case $\dim V = 1$ or 2 is easily dealt with separately (see Exercises 4
and 5). We begin by determining the centre of $O(V)$. This turns out to be the
same as that of the symplectic group, but for the proof we use symmetries instead
of transvections.
   The symmetry $\sigma_u$ is defined only for anisotropic vectors $u$; in the isotropic case
one has the following replacement, going back to Carl Ludwig Siegel.
   Let $u$ be an isotropic vector, choose $v$ so that $u, v$ is a hyperbolic pair and take
$0 \neq w \in \langle u, v \rangle^\perp$. We have $V = \langle v \rangle \oplus u^\perp$, hence the equations
$$\begin{cases} x\rho_{u,w} = x + b(x, w)u & (x \in u^\perp), \\ v\rho_{u,w} = v - q(w)u - w, \end{cases} \qquad (3.7.1)$$
define the linear transformation $\rho_{u,w}$ completely and it is easily checked that $\rho_{u,w}$
is orthogonal. Moreover, it is proper (i.e. of determinant 1); in fact it is unipotent,
i.e. $1 - \rho_{u,w}$ is nilpotent.
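The claims that $\rho_{u,w}$ is orthogonal and unipotent can be tested on a small example. The sketch below is an illustration under assumptions made here, not in the text: the space is $\mathbf{F}_p^n$ with $q(x) = x_0x_1 + \sum_i a_i x_{i+2}^2$, so that $u = e_0$, $v = e_1$ is a hyperbolic pair, and $b(x, y) = q(x + y) - q(x) - q(y)$. Writing $x = x_1 v + x'$ with $x' \in u^\perp$, the two equations (3.7.1) combine into $x\rho_{u,w} = x + (b(x, w) - x_1 q(w))u - x_1 w$.

```python
from itertools import product

def q(x, coeffs, p):
    """Quadratic form x0*x1 + sum(coeffs[i] * x[i+2]^2) on F_p^n (an assumed model)."""
    return (x[0] * x[1] + sum(a * t * t for a, t in zip(coeffs, x[2:]))) % p

def b(x, y, coeffs, p):
    """Polar form b(x, y) = q(x + y) - q(x) - q(y)."""
    s = tuple((xi + yi) % p for xi, yi in zip(x, y))
    return (q(s, coeffs, p) - q(x, coeffs, p) - q(y, coeffs, p)) % p

def siegel(w, coeffs, p):
    """rho_{u,w} for u = e_0, v = e_1 and w in <u, v>^perp (so w[0] == w[1] == 0)."""
    n = len(w)
    u = (1,) + (0,) * (n - 1)
    qw = q(w, coeffs, p)
    def rho(x):
        x1 = x[1]                                   # coefficient of v, equal to b(x, u)
        s = (b(x, w, coeffs, p) - x1 * qw) % p      # coefficient of u that gets added
        return tuple((xi + s * ui - x1 * wi) % p for xi, ui, wi in zip(x, u, w))
    return rho

def minus_identity(rho, p):
    """Return the map rho - 1, whose nilpotency expresses that rho is unipotent."""
    return lambda x: tuple((a - c) % p for a, c in zip(rho(x), x))
```

In this model a short calculation gives $x(\rho_{u,w} - 1)^2 = -2x_1q(w)u$, so the cube of $\rho_{u,w} - 1$ always vanishes, while the square vanishes exactly when $2q(w) = 0$.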
   We remark that $\rho_{u,w}$ is uniquely determined as the orthogonal transformation
which maps $x$ to $x + b(x, w)u$ for all $x \in u^\perp$. For this mapping can be extended to
an orthogonal transformation, by Witt's theorem (BA, Theorem 8.5.5) and if
there were two such mappings, their quotient $\alpha$ would leave $u^\perp$ fixed. Now $v\alpha$ has
the form $\lambda u + \mu v + z$, where $z \in \langle u, v \rangle^\perp$. For any $x \in \langle u, v \rangle^\perp$ we have $0 = b(x, v) =
b(x\alpha, v\alpha) = b(x, \lambda u + \mu v + z) = b(x, z)$; hence $z \in \langle u, v \rangle^\perp \cap \langle u, v \rangle^{\perp\perp} = 0$, i.e. $z = 0$. Further,
$1 = b(u, v) = b(u\alpha, v\alpha) = b(u, \lambda u + \mu v) = \mu$, thus $\mu = 1$ and finally $0 = q(v) =
q(v\alpha) = q(\lambda u + v) = \lambda$, hence $\lambda = 0$. So $v\alpha = v$ and therefore $\alpha = 1$. This shows
$\rho_{u,w}$ to be uniquely determined by its effect on $u^\perp$.
   We shall use the transformation $\rho_{u,w}$ to construct an abelian normal subgroup of
the stabilizer of $u$.
is a subgroup of $O(V)$ which is abelian and normal in the stabilizer of $u$. Moreover, the
mapping
                                        $w \mapsto \rho_{u,w}$                               (3.7.2)
   Our next aim will be to show that under some mild restrictions on $V$, the derived
group $O(V)'$ is generated by all the $A_u$. That some restriction is needed is clear since
there are no $A_u$ unless the Witt index of $V$ is positive. It will be convenient to write $\Omega$
for the subgroup of $O(V)$ generated by all the $A_u$, where $u$ ranges over all isotropic
vectors. We begin by establishing some transitivity properties.
Proof. (i) Let $x, y$ be isotropic vectors not orthogonal to $u$; we may assume that
$b(u, x) = b(u, y) = 1$. Write $y = \lambda u + \mu x + z$, where $z \in \langle u, x \rangle^\perp$; since $b(u, y) = 1$,
we have $\mu = 1$ and the relation $q(y) = 0$ shows that $\lambda + q(z) = 0$. Hence
$y = x - q(z)u + z = x\rho_{u,-z}$ and (i) follows.
   (ii) Let $u_1, u_2$ be as stated; if $b(u_1, u_2) \neq 0$, we may assume that $b(u_1, u_2) = 1$.
The space $\langle u_1, u_2 \rangle^\perp$ contains an anisotropic vector $w$, by the regularity of $V$. We
put $v = u_1 - q(w)u_2 + w$; then $q(v) = -q(w) + q(w) = 0$, $b(u_1, v) = -q(w) \neq 0$,
$b(u_2, v) = 1$, so with $\lambda_1 = -1/q(w)$, $\lambda_2 = 1$, $v$ satisfies the required conditions. If
$b(u_1, u_2) = 0$, then since $u_1, u_2$ are linearly independent, there is a linear functional
equal to 1 on $u_1, u_2$, hence by the regularity of $b$ there exists $v$ such that $b(u_i, v) = 1$,
and by subtracting suitable multiples of $u_1$ from $v$ we can ensure that $q(v) = 0$. Then
$(u_i, v)$ are again two hyperbolic pairs.
    (iii) Let $(u_i, v_i)$ ($i = 1, 2$) be any two hyperbolic pairs. If $u_1, u_2$ are linearly independent,
we can by (ii) find $v$ and $\lambda_i$ such that $(\lambda_i u_i, v)$ are hyperbolic pairs; by
(i) there is then an element of $A_v$ mapping $\langle u_1 \rangle$ to $\langle u_2 \rangle$. Thus for some $\alpha \in \Omega$,
$u_1\alpha = cu_2$. We now have the hyperbolic pairs $(u_1\alpha, v_1\alpha) = (cu_2, v_1\alpha)$ and $(u_2, v_2)$;
applying (i) again, we find $\beta \in A_{u_2}$ such that $v_1\alpha\beta = v_2$, $u_2\beta = u_2$, so $\alpha\beta$ maps
$(u_1, v_1)$ to $(cu_2, v_2)$, but $1 = b(u_1, v_1) = b(cu_2, v_2) = c$, hence $c = 1$. If $u_1, u_2$ are
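$$\lambda_a = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} \quad \text{or} \quad \lambda_a\tau = \begin{pmatrix} 0 & a \\ a^{-1} & 0 \end{pmatrix}, \quad \text{where } \tau = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad (3.7.3)$$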
and we have seen that $\sigma_i = \nu_i^{-1}\tau_i\nu_i$, where $\tau_i$ is a symmetry in $O_1$ and $\nu_i \in \Omega$. Thus
we have
Since $\Omega$ is normal in $O(V)$, this relation can be written $\rho = \mu\tau_1 \cdots \tau_{2n}$ where $\mu \in \Omega$,
so we have $SO(V) \subseteq \Omega \cdot SO_1$. But as we saw, $\Omega \subseteq SO(V)$, and we conclude that
                                          $SO(V) = \Omega \cdot SO_1$.                                   (3.7.5)
To complete the proof we show that $\Omega$ is perfect, for then $\Omega = \Omega' = SO(V)'$. It will
be enough to show that $\rho_{u,w} \in \Omega'$ for all isotropic $u$ and all $w$ orthogonal to a hyperbolic
plane containing $u$. So we may take $u, v, W, O_1$ as before. Let $\lambda_a, \tau$ be the transformations
in $O_1$ defined as in (3.7.3), so that $\lambda_{a^2} \in \Omega$ by (3.7.4) and (3.7.6). For any
$w \in W$ we have
$$\lambda_{a^2}^{-1}\rho_{u,w}^{-1}\lambda_{a^2}\rho_{u,w} = \rho_{a^2u,-w}\rho_{u,w} = \rho_{u,-a^2w}\rho_{u,w} = \rho_{u,(1-a^2)w}.$$
When $|k| > 3$, we can choose $a \in k^\times$ such that $a^2 \neq 1$ and replacing $w$ by
$(1 - a^2)^{-1}w$ we see that $\rho_{u,w} \in \Omega'$, hence $\Omega = \Omega' = O(V)'$ when $|k| > 3$.
   Suppose now that $|k| = 3$; then $n \geq 4$ and for $n = 4$, $\nu = 1$. Thus $\dim W \geq 2$ and
we have an orthogonal basis $w_1 = w, w_2, \ldots, w_r$ for $W$. If $q(w_1) = q(w_2)$, then
the map $\alpha: w_1, w_2, \ldots, w_r \mapsto -w_2, w_1, w_3, \ldots, w_r$ is an isometry, hence
$\alpha^2 \in O(V)' \subseteq \Omega$ and $\alpha^2$ maps $w$ to $-w$. Thus
$$\rho_{u,w}^{-1}\alpha^{-2}\rho_{u,w}\alpha^2 = \rho_{u,-w}\rho_{u,-w} = \rho_{u,-2w} = \rho_{u,w}$$
and this shows that $\rho_{u,w} \in \Omega'$.
   It remains to justify the assumption $q(w_1) = q(w_2)$. Since $q(w_1) \in k^\times$ and $|k| = 3$,
$q(w_1)$ is 1 or $-1$. For $n = 4$, $W$ is anisotropic, so $q(w_1), q(w_2)$ have the same sign and
hence must be equal. When $n \geq 5$, $W \cap w^\perp$ is regular and at least two-dimensional,
so $q$ restricted to this subspace is universal (see BA, Theorem 8.2.7), and it follows
that $W$ contains $w'$ orthogonal to $w$ such that $q(w') = q(w)$. Thus we can in all
cases find a basis of the required form.                                            •
  We now have the means at our disposal to prove the main structure theorem for
orthogonal groups. The result was first established, with a restriction on the index, by
Dickson (1901), and in its full generality by Dieudonné in 1940.
Theorem 3.7.5. Let $V$ be a regular quadratic space of dimension $n \geq 3$ and of positive
Witt index $\nu$. Denote by $C$ the centre of $SO(V)$. Then $SO(V)/C$ is simple, except when
$n = 4$, $\nu = 2$ and $n = 3$, $|k| = 3$.
Proof. We remark that $C$ has order 2 when $n$ is even and is trivial when $n$ is odd. For
by Proposition 3.7.1 the centre can only contain 1 and $-1$ and $\det(-1) = (-1)^n$.
   Consider the quadric cone $Q$ defined as the set of points $\langle x \rangle$ in $P^{n-1}(k)$ satisfying
$q(x) = 0$. By Lemma 3.7.3(i), $\Omega$ acts transitively on $Q$. We first show that the action
is primitive except when $n = 4$, $\nu = 2$. If $b(x, y) \neq 0$ for any two linearly independent
isotropic vectors $x, y$, then by Lemma 3.7.3(iii), $\Omega$ is 2-fold transitive, and
hence primitive. In particular, this always holds for $\nu = 1$, so we may assume henceforth
that $\nu \geq 2$ and hence $n \geq 5$. Let $S$ be a subset with more than one point in a
partition of $Q$ stable under $\Omega$; we have to show that $S = Q$. Given $\langle x_1 \rangle, \langle x_2 \rangle \in S$, if
$b(x_1, x_2) = 0$, then there exist $y_1, y_2 \in Q$ such that $(x_1, y_1)$, $(x_2, y_2)$ are orthogonal
hyperbolic pairs. By Lemma 3.7.3(i) applied to $\langle x_2, y_2 \rangle^\perp$ there exists $\alpha \in \Omega$ mapping
$x_1$ to $y_1$ and leaving $x_2, y_2$ fixed. Since $\langle x_2 \rangle \in S$, we have $S\alpha = S$ and $\langle x_1 \rangle \in S$, therefore
$\langle y_1 \rangle \in S$. Now if $\langle z \rangle$ is any point of $Q$ different from $\langle x_1 \rangle$, we can by Lemma
3.7.3(ii) find $v$ such that $(x_1, v)$, $(z, v)$ are hyperbolic pairs. By Lemma 3.7.3(iii)
there exists $\beta \in \Omega$ mapping $(x_1, y_1)$ to $(x_1, v)$, hence $S\beta = S$ and $\langle v \rangle \in S\beta = S$.
Similarly there is $\gamma \in \Omega$ mapping $(x_1, v)$ to $(z, v)$, hence $\langle v \rangle \in S\gamma = S$ and $z = x_1\gamma$,
so $\langle z \rangle \in S$. Since $z$ was arbitrary, this means that $S = Q$.
   We may therefore assume that for any distinct points $\langle x_1 \rangle, \langle x_2 \rangle$ in $S$, $b(x_1, x_2) \neq 0$.
Given $\langle z \rangle \neq \langle x_1 \rangle$ in $Q$ as before, we can find $v$ such that $(x_1, v)$, $(z, v)$ are hyperbolic
pairs, and $\beta \in \Omega$ maps $(x_1, x_2)$ to $(x_1, v)$. It follows that $S\beta = S$ and $\langle v \rangle \in S$. If now
$\gamma \in \Omega$ is chosen so as to map $(x_1, v)$ to $(z, v)$, then since $v\gamma = v$, we have $S\gamma = S$ and
$z = x_1\gamma$, hence $\langle z \rangle \in S$ and we again find that $S = Q$.
   We thus see that $\Omega$ acts primitively on $Q$ and $\Omega$ is perfect, by Theorem 3.7.4.
Moreover, if $u$ is isotropic, then $A_u$ is a normal abelian subgroup of the stabilizer
of $\langle u \rangle$ and the conjugates of $A_u$ generate $\Omega$, by the proof of Theorem 3.7.4. Hence
by Proposition 3.5.5, $\Omega$ is simple and now the result follows by Theorem 3.7.4. •
   Some of the exceptions of Theorem 3.7.5 will be considered in the exercises. Let us
now take up some particular cases to show that the hypotheses on the Witt index
cannot be omitted, without striving for full generality. We take a quadratic form
over $\mathbf{R}$; if its index is 0, the form must be definite, say positive definite, and in
suitable coordinates it will have an orthonormal basis. For simplicity consider the
case $n = 3$, thus $SO(V)$ is the group of rotations in 3-space. We claim that
$SO(V)$ acts primitively on the unit sphere $S$. For let $T$ be a subset of $S$ with more
than one point, stable under all rotations. Given $p, q \in T$, $T$ must include all
points of the circle through $q$ about $p$ as axis. If the points at opposite ends of a
diameter of this circle are a spherical distance $d$ apart, then $T$ will include points
at any distance $\leq d$ from $q$, and by repetition, points at any finite (spherical) distance,
hence $T = S$ and the action is primitive. The rotations about a point form
an abelian subgroup whose conjugates generate $SO(V)$, and this shows
$SO(V)$ to be simple. The same argument applies for any odd dimension $\geq 3$,
while for even dimensions $\geq 6$, $PSO(V)$ is simple. When $\dim V = 4$, we have
$PO \cong G \times G$, where $G = PSL_2(k)$ when $V$ has index 2, and $G = SO(\mathbf{R}^3)$ for a Euclidean
3-space when $V = \mathbf{R}^4$ is Euclidean. When $\mathbf{R}^4$ has index 1 (e.g. the Lorentz
metric of relativity theory), then $PO \cong PSL_2(\mathbf{C})$; in this case $\Omega$ consists of all rotations
which do not reverse the time direction.
   The argument just used to show that for a definite quadratic form $PSO(\mathbf{R}^n)$ is
simple depended essentially on the fact that $\mathbf{R}$ is Archimedean ordered. For an
ordered field $K$ which is non-Archimedean (i.e. there are elements greater than any
integer) it can be shown that $PSO(K^n)$ is not simple: the infinitesimal rotations
generate a proper normal subgroup (see Exercise 6). This happens, for example,
for the field of formal Laurent series $\mathbf{R}((x))$, ordered by the sign of its first coefficient.
   For a finite field it is again possible to calculate the order of the orthogonal group,
but this depends on the quadratic character of the determinant as well as the parity
of the dimension (see Exercises 7-9).
Exercises
 1. Give the details of the proof that for a real Euclidean space $V$ of dimension $\geq 5$,
    $PSO(V)$ is simple.
 2. Verify that $\rho_{u,w}$ defined by (3.7.1) satisfies $(1 - \rho_{u,w})^3 = 0$. When is
    $(1 - \rho_{u,w})^2 = 0$?
 3. Let $V$ be an n-dimensional quadratic space. Given two anisotropic vectors $x, y$,
    show that $\sigma_x\sigma_y$ is a rotation in the plane $\langle x, y \rangle$ leaving $\langle x, y \rangle^\perp$ fixed. Verify that
    for a Euclidean $V$ the angle of rotation is twice the angle between $x$ and $y$.
 4. Use the method of proof of Proposition 3.7.1 to find the centre of $O_1(k)$, $O_2(k)$.
 5. Examine the form Lemmas 3.7.2 and 3.7.3 take when $\dim V = 1$ or 2.
 6. Let $V$ be a Euclidean space over an ordered field $K$ which is non-Archimedean.
    Show that the rotation through an infinitesimal angle generates a proper normal
    subgroup ($\alpha$ is infinitesimal if $n\alpha < 1$ for all $n \in \mathbf{Z}$).
 7. Show that over a finite field of odd characteristic every regular quadratic form
    has the form $\langle 1^{n-1}, d \rangle$ where $d$ is the determinant and that $|k^\times/k^{\times 2}| = 2$, so
    that there are just two classes of forms in each dimension. (Hint. Recall from
    BA, Section 8.2 that a quadratic form of rank $\geq 2$ over a finite field is universal.)
 8. Let $|k| = q$ be odd. Show that the number of solutions of $\sum_1^m (x_i^2 - y_i^2) = b$
    is $q^{2m-1} - q^{m-1} + \delta_{0b}q^m$, the number of solutions of $\sum_1^m (x_i^2 - y_i^2) -
    (d - 1)y_1^2 = b$ is $q^{2m-1} + q^{m-1} - \delta_{0b}q^m$, and the number of solutions of
    $\sum_1^m (x_i^2 - y_i^2) - z^2 = b$ is $q^{2m} + (-b/q)q^m$, where $(-b/q)$ is a Legendre
    symbol, i.e. 0, 1 or $-1$ according as $-b$ is 0, a non-zero square or not a
    square in $k$.
 9. Show that for a regular form over a finite field $\mathbf{F}_q$ ($q$ odd), $|O_{2m}(\mathbf{F}_q)| =
    (q^{2m-1} - q^{m-1})|O_{2m-1}(\mathbf{F}_q)|$, $|O_{2m+1}(\mathbf{F}_q)| = (q^{2m} + (-d/q)q^m)|O_{2m}(\mathbf{F}_q)|$,
    where $d$ is the determinant of the form. Hence calculate the order of the orthogonal
    group.
10. What form do the equations (3.7.1) take when $u$ is replaced by $-u$? Show that
    $\rho_{-u,-w} = \rho_{u,w}$ and that the normalizer of $A_u$ includes any $\alpha$ mapping $u$ to $-u$.
                      $0 \to {}^X(kF \otimes_k kF) \to kF \otimes_k kF \to kF \to 0.$
    By applying $k \otimes_{kF}$ deduce that the augmentation ideal $IF$ is free on $x - 1$ ($x \in X$)
    as F-module. Hence show that $H_n(F, A) = H^n(F, A) = 0$ for $n > 1$.
 4. Show that in a finite soluble group $G$, with a Hall subgroup of order $r$, the
    number $h_r$ of subgroups of order $r$ in $G$ has the form $h_r = c_1 \cdots c_m$ where
    each $c_i$ is a prime power dividing the order of a chief factor of $G$ and $c_i \equiv 1$
    modulo a prime factor of $r$. (Hint. Put $|G| = rs$, where $(r, s) = 1$ and first
    treat the case when $G$ has a normal subgroup of index $r's'$, where $r' \mid r$, $s' \mid s$
    and $s' > 1$.)
 5. Let $E/k$ be a finite Galois extension of degree $m = q_1 \cdots q_r$, where the $q_i$ are
    powers of distinct primes. Show that if $E$ contains a subfield $E_i$ of degree $q_i$
    over $k$, for $i = 1, \ldots, r$, then
                            $E = E_1 \otimes \cdots \otimes E_r, \quad [E_i : k] = q_i.$
    Use Hall's theorem to show that such a decomposition of $E$ exists iff
    $G = Gal(E/k)$ is soluble and that any two such decompositions of $E$ are conjugate
    by an element of $G$.
 6. Show that for any finite Galois extension $E/k$, a decomposition as in Exercise 5,
    where the $E_i/k$ are all Galois extensions, exists iff $Gal(E/k)$ is nilpotent.
 7. Show that the order of a finite simple group is divisible either by 12 or by the
    cube of the least prime dividing its order. (Hint. If a Sylow p-subgroup $P$ has
    order $p$ or $p^2$, it is abelian; now use Theorem 3.3.2 to describe the action of
    $N_G(P)$ on $P$.)
 8. Show that $PSL_3(\mathbf{F}_2) \cong PSL_2(\mathbf{F}_7)$ and find its order. (Hint. This is the subgroup
    of $Sym_7$ in the action on the columns of the array
                                 1 2 3 4 5 6 7
                                 2 3 4 5 6 7 1
                                 4 5 6 7 1 2 3
    which map each column into another. The columns may be interpreted as lines
    in the projective plane over $\mathbf{F}_2$, $P^2(\mathbf{F}_2)$.)
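The hint in Exercise 8 can be explored by direct enumeration. The sketch below is only an illustration: it reads the columns of the array as the seven 3-element sets $\{i, i + 1, i + 3\}$ taken cyclically on $1, \ldots, 7$, and counts the permutations in $Sym_7$ that send every column to a column. The names `COLUMNS` and `count_column_preserving` are hypothetical.

```python
from itertools import permutations

# Columns of the array: {i, i+1, i+3} read cyclically on the symbols 1..7.
COLUMNS = [frozenset({i % 7 + 1, (i + 1) % 7 + 1, (i + 3) % 7 + 1}) for i in range(7)]

def count_column_preserving():
    """Count the permutations of {1,...,7} that map each column onto a column."""
    column_set = set(COLUMNS)
    count = 0
    for perm in permutations(range(1, 8)):
        # perm[x - 1] is the image of the symbol x.
        if all(frozenset(perm[x - 1] for x in col) in column_set for col in COLUMNS):
            count += 1
    return count
```

The count agrees with the value $(8 - 1)(8 - 2) \cdot 4 = 168$ given by the order formula for $PSL_3(\mathbf{F}_2)$, and with $|PSL_2(\mathbf{F}_7)| = 48 \cdot 7/2 = 168$.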
 9. Describe $O(V)^{ab}$ for a two-dimensional anisotropic space $V$.
10. Show that in the action of $PSp_{2m}(k)$ on $P^{2m-1}(k)$, the stabilizer of a point has
    exactly three orbits.
11. Let $X$ be a subset of a free group $F$ and define an elementary transformation of $X$
    as one of the following: (i) replace $x$ by $xy$ ($x, y \in X$, $x \neq y$), (ii) replace $x$ by $x^{-1}$
    ($x \in X$), (iii) omit 1. A Nielsen transformation is a series of elementary trans-
In Section 5.2 of BA we saw that semisimple Artinian rings can be described quite
explicitly as direct products of full matrix rings over skew fields (Wedderburn's
theorem). Later in Chapters 7 and 8 we shall see what can be said when the Artinian
hypothesis is dropped, but in many cases, such as the study of group algebras in
finite characteristic, it is important to find out more about the non-semisimple
(but Artinian) case. There is now a substantial theory of such algebras which is still
developing, and a full treatment is beyond the framework of this book, but some of
the basic properties are described here.
   One of the main results, the Krull-Schmidt theorem, asserts the uniqueness of
decompositions of a module as a direct sum of indecomposables. This is established
in Section 4.1 for finitely generated modules over Artinian rings, but as we shall see
in Section 4.3, for projective modules it holds over the somewhat larger class of semi-
perfect rings. These rings are also of interest because they allow the construction of a
projective cover for each finitely generated module (Section 4.2).
   The rest of the chapter is concerned with conditions for two rings to have equiva-
lent module categories (Section 4.4), leading in Section 4.5 to Morita equivalence;
Section 4.6 deals with flat modules and their relation to projective and injective
modules, while Section 4.7 studies the homology of algebras and in particular separ-
able algebras.
since the latter is nilpotent in any Artinian ring (BA, Theorem 5.3.5), it follows that
an Artinian ring is local iff every element is either nilpotent or a unit. Such rings arise
naturally as endomorphism rings of indecomposable modules, as we shall now see.
In what follows we shall write our modules as left modules and put module homo-
morphisms on the right, except when otherwise stated.
Lemma 4.1.1 (Fitting's lemma). Let R be any ring and M an R-module of finite
composition length. Given any endomorphism $\alpha$ of M, there exists a direct decomposition
                                 $M = M_0 \oplus M_1$
such that $M_0$, $M_1$ both admit $\alpha$, and $\alpha$ restricted to $M_0$ is nilpotent, while its restriction
to $M_1$ is an automorphism.
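As a small illustration of Fitting's lemma, the following Python sketch takes M = Z/n as a Z-module (which has finite composition length) and lets $\alpha$ be multiplication by an integer a. The function name `fitting_decomposition` and the choice of example are assumptions made here for illustration; the sketch relies on the standard fact that the kernels and images of the powers $\alpha^k$ stabilize, and takes $M_0 = \ker \alpha^k$, $M_1 = \operatorname{im} \alpha^k$ for k large.

```python
def fitting_decomposition(n, a):
    """Fitting decomposition of M = Z/n under alpha: x -> a*x.

    Returns (M0, M1) with M0 = ker(alpha^k) and M1 = im(alpha^k) for k = n,
    which is at least the composition length of Z/n, so both chains
    ker(alpha^k) and im(alpha^k) have stabilized by then.
    """
    ak = pow(a, n, n)  # alpha^k is multiplication by a^k mod n
    M0 = {x for x in range(n) if (ak * x) % n == 0}
    M1 = {(ak * x) % n for x in range(n)}
    return M0, M1


M0, M1 = fitting_decomposition(12, 2)
print(sorted(M0), sorted(M1))
```

For n = 12 and a = 2 this gives $M_0 = \{0, 3, 6, 9\}$, on which doubling is nilpotent, and $M_1 = \{0, 4, 8\}$, which doubling permutes.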
Lemma 4.1.3. Let R be any ring and V an indecomposable R-module such that
EndR(V) = E is a local ring with maximal ideal m. Given any R-module M and homo-
4.1 The Krull-Schmidt theorem                                                       137
                           $0 \to V \xrightarrow{\alpha} M \to \operatorname{coker}\alpha \to 0.$
The sequence is split by $\beta\gamma$, hence $M \cong V \oplus \operatorname{coker}\alpha$.
                                                                                    •
   In what follows we shall fix a left R-module V with local endomorphism ring
$E = \operatorname{End}_R(V)$. The maximal ideal of E will be denoted by $\mathfrak{m}$ and we put $K = E/\mathfrak{m}$
and write $x \mapsto [x]$ for the natural homomorphism $E \to K$. Given any left
R-module M, we can consider $\operatorname{Hom}_R(V, M)$ as left E-module in a natural way. We
shall write $[V, M] = \operatorname{Hom}_R(V, M)/\mathfrak{m}\operatorname{Hom}_R(V, M)$; this is a left E-module which is
annihilated by $\mathfrak{m}$, so it can be defined as a left vector space over K in a natural way.
Similarly we can consider $\operatorname{Hom}_R(M, V)$ as a right E-module and hence define
$[M, V] = \operatorname{Hom}_R(M, V)/\operatorname{Hom}_R(M, V)\mathfrak{m}$ as a right K-space. Next we define a
bilinear mapping
                                $b : [V, M] \times [M, V] \to K$
by the following rule: Given $\alpha \in [V, M]$, $\beta \in [M, V]$, take homomorphisms f, g
such that $[f] = \alpha$, $[g] = \beta$ and define
                                $b(\alpha, \beta) = [fg].$                                  (4.1.2)
This is a well-defined operation on $\alpha$, $\beta$, for if $[f] = [f']$, say $f' = f + \sum \lambda_i h_i$,
where $\lambda_i \in \mathfrak{m}$, $h_i \in \operatorname{Hom}(V, M)$, then $[f'g] = [fg] + \sum [\lambda_i][h_i g] = [fg]$, hence $[fg]$
depends only on $[f]$, not on f, and similarly for g. In this way (4.1.2) defines a pair-
ing of the spaces [V, M], [M, V]. We define the rank of b in the usual way as the
rank of the matrix obtained by taking bases. Thus if $(u_i)$ is a left K-basis of
[V, M] and $(v_j)$ a right K-basis of [M, V], then the rank of b is given by the rank
of the matrix $(b(u_i, v_j))$. Clearly this is independent of the choice of bases; we
shall denote it by $\mu_V(M)$, thus
                                $\mu_V(M) = \operatorname{rk}\,(b(u_i, v_j)).$
Suppose now that M is expressed as a direct sum: $M = \bigoplus M_i$. It is clear that this
gives rise to a direct sum decomposition for both [V, M] and [M, V]:
Mi is indecomposable and not isomorphic to V. All these arguments still apply when
Mi is an infinite direct sum, and they allow us to draw the following conclusion:
Proposition 4.1.4. Let R be a ring and V an R-module with local endomorphism ring.
Given an R-module M which is expressed as a direct sum of indecomposable modules,
the multiplicity of V in this direct sum is independent of the decomposition chosen for M
and is equal to $\mu_V(M)$.
Proof. Let the given decomposition of M be
                                 $M = \bigoplus_I M_\lambda$                                      (4.1.4)
and write $\mu_{M_\lambda}(M) = \mu_\lambda(M)$ for short. Then [V, M][M, V] is a direct sum of terms
$[V, M_\lambda][M_\lambda, V]$, by the above remarks, and since $M_\lambda$ is indecomposable, we have
                                 $M = \bigoplus_I M_\lambda = \bigoplus_J N_\mu$
be two direct decompositions of M into indecomposable modules. If the components in at
least one of these decompositions have local endomorphism rings, then there is a bijection
$\lambda \mapsto \lambda'$ from I to J such that $M_\lambda \cong N_{\lambda'}$.
Proof. Suppose that $\operatorname{End}_R(M_\lambda)$ is local for all $\lambda$. By Proposition 4.1.4, the multipli-
city of each $M_\lambda$ is the same in both decompositions and this allows us to construct
the desired bijection.                                                             •
Theorem 4.1.6 (Krull-Schmidt theorem). Let R be any Artinian ring. Any finitely
generated R-module M has a finite direct decomposition
                                 $M = M_1 \oplus \dots \oplus M_r,$                              (4.1.5)
where the terms $M_i$ are indecomposable, and this decomposition is unique up to iso-
morphism and the order of the terms; thus if also $M = N_1 \oplus \dots \oplus N_s$, where each
$N_i$ is indecomposable, then s = r and there is a permutation $i \mapsto i'$ of 1, ..., r such
that $M_i \cong N_{i'}$.
Proof. Any finitely generated R-module over an Artinian ring R is Artinian (BA,
Theorem 4.2.3) and so has finite length (BA, Theorem 5.3.9). Hence we can form
a direct decomposition (4.1.5) with a maximum number of terms, and all terms
are then indecomposable. By Corollary 4.1.2 each endomorphism ring $\operatorname{End}_R(M_i)$ is
local, so we can apply Corollary 4.1.5 to conclude that the decompositions have
isomorphic terms, possibly after reordering.                                 •
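The uniqueness statement can be checked by hand in a very simple case. The sketch below works over the Artinian ring Z/n and uses the standard fact (assumed here, not proved in this section) that the indecomposable finitely generated modules over Z/n are the cyclic groups of prime-power order. The function names are hypothetical; the code only compares the multisets of indecomposable summands obtained from two different direct decompositions.

```python
def prime_power_parts(n):
    """Split n into its prime-power factors, e.g. 12 -> [4, 3]."""
    parts = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            parts.append(q)
        p += 1
    if n > 1:
        parts.append(n)
    return parts


def indecomposable_summands(orders):
    """Orders of the indecomposable summands of Z/n1 + Z/n2 + ..., sorted."""
    return sorted(q for n in orders for q in prime_power_parts(n))


# Z/12 + Z/2 and Z/4 + Z/6 are isomorphic, so both refine to the same terms.
print(indecomposable_summands([12, 2]), indecomposable_summands([4, 6]))
```

Both calls return the list [2, 3, 4], in line with the theorem: the terms agree up to order.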
4.2 The projective cover of a module                                                 139
   Theorem 4.1.6 was stated for finite groups by Joseph H. M. Wedderburn in 1909,
and first completely proved by Robert Remak in 1911. It was later extended to
abelian groups with operators by Wolfgang Krull in 1928 and to general groups
with operators by Otto Yu. Schmidt in 1928. In 1950 Goro Azumaya noted that the
finiteness condition could be replaced by the condition that the endomorphism
rings of the components be local (Corollary 4.1.5). The above presentation is
based on a lecture by Sandy Green.
Exercises
1. Show that a ring R with Jacobson radical J is local iff R/J is a skew field.
2. Show that in any ring an idempotent with a one-sided inverse equals 1. Deduce that a
   ring R in which for each $a \in R$ either a or 1 - a has a one-sided inverse, must be
   local. (Here 'a has a one-sided inverse' is taken to mean: either ax = 1 or xa = 1
   has a solution.)
3. Give an example of a non-Artinian non-local ring in which every element is either
   nilpotent or a unit. (Hint. Try the commutative case.)
4. Show that a ring R is indecomposable as module over itself iff R contains no
   idempotent $\neq$ 0, 1.
5. Use Exercise 4 to give an example of an indecomposable module whose endo-
   morphism ring is not local.
6. Let V be an indecomposable module which is not injective and let I be its injective
   hull. Show that $[V, I] \neq 0$ but $[V, I] \cdot [I, V] = 0$.
7. Let $M = M_1 \oplus \dots \oplus M_r$ and suppose that V is an indecomposable module which
   is a direct summand of M. Show that V is a direct summand of $M_i$ for some
   i, $1 \le i \le r$.
Lemma 4.2.1. Let R be a ring with Jacobson radical J and let M be a finitely generated
R-module. Then
                                 $JM \subseteq \bigcap M_\lambda,$                              (4.2.1)
140                                                                           Algebras
where $M_\lambda$ ranges over all maximal submodules of M. Moreover, if R/J is semisimple,
then equality holds in (4.2.1) and we have
                                                                               (4.2.2)
Lemma 4.2.2. Let R be any ring. Given an R-module M, a finitely generated projective
R-module P and a surjective homomorphism $\alpha : P \to M$, if $\ker\alpha \subseteq JP$, then $\alpha$ is essen-
tial. If R/J is semisimple, this sufficient condition is also necessary.
Proof. Let P' be any submodule of P such that $P'\alpha = M$. Then for any $x \in P$
there exists $x' \in P'$ such that $x\alpha = x'\alpha$, i.e. $x \in P' + \ker\alpha$. Thus $P' + \ker\alpha = P$
and by Nakayama's lemma, P' = P; this shows $\alpha$ to be essential. Suppose now
that R/J is semisimple and $\alpha$ is essential. Let $P_1$ be any maximal submodule of
P; if $\ker\alpha \not\subseteq P_1$, then $P_1 + \ker\alpha = P$, hence $P_1\alpha = P\alpha = M$, contradicting the
fact that $\alpha$ is essential. Therefore $\ker\alpha \subseteq P_1$ and now $\ker\alpha \subseteq \bigcap P_\lambda = JP$, by
Lemma 4.2.1.                                                                        •
Our first task is to prove the existence of projective covers in the Artinian case.
Proposition 4.2.3. Let R be a left Artinian ring. Then any finitely generated left
R-module has a projective cover.
Proof. Let M be a finitely generated R-module; we can find a finitely
generated projective module P mapping onto M. We choose P of shortest length
and claim that in this case the homomorphism $\pi : P \to M$ is essential. For take a
minimal submodule N of P such that $\pi|N$ is surjective and let $i : N \to P$ be the
inclusion map and put $f = i\pi : N \to M$. Since P is projective, there is a map
$g : P \to N$ such that $gf = \pi$; now $\pi$ maps N onto M, so f maps im(g|N) onto M
and by the minimality of N it follows that im(g|N) = N. Thus g|N is a surjective
  When a projective cover exists, it must be unique; this can be proved quite
generally.
Proposition 4.2.4. If P, Q are projective covers of a module M (over any ring R), with
essential homomorphisms $\alpha : P \to M$, $\beta : Q \to M$, then there is an isomorphism
$\theta : P \to Q$ such that $\alpha = \theta\beta$.
Proof. Since P is a projective module and $\beta$ is surjective, there exists $\theta : P \to Q$ such
that $\alpha = \theta\beta$. This map $\theta$ must be surjective, because $\beta$ is essential, and so P splits over
$\ker\theta$, say $P = Q_1 \oplus \ker\theta$, where $Q_1 \cong Q$. But then $Q_1\alpha = Q_1\theta\beta = M$, hence $Q_1 = P$
and so $\ker\theta = 0$. Thus $\theta$ is an isomorphism, as claimed.                                  •
where J = J(R) is the Jacobson radical. If M is finitely generated and T(M) = 0, then
M = 0; this is just the content of Nakayama's lemma. When M is finitely generated,
then the natural homomorphism $T : M \to T(M)$ is essential, for it is clearly surjec-
tive, and if $N \subset M$, then N is contained in a maximal submodule $N_0$ of M. We have
$N_0 \supseteq JM$, hence the natural map $\nu : M \to M/N_0$ can be factored as
$M \to T(M) \to M/N_0$, where $\nu|N_0 = 0$, but this contradicts the fact that $T|N_0$ is surjective. This
shows T to be essential. When R is Artinian, or more generally, when R/J is semi-
simple, then T(M) is semisimple by Lemma 4.2.1.
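To make the top concrete in the simplest commutative case, the sketch below computes the Jacobson radical of R = Z/n by brute force and then the order of T(R) = R/J. It assumes the usual description of the top as M/JM and the standard characterization that x lies in J(R) iff 1 - xy is a unit for every y; the function names are hypothetical.

```python
from math import gcd


def jacobson_radical(n):
    """J(Z/n) by brute force: x is in J iff 1 - x*y is a unit for every y."""
    def is_unit(u):
        return gcd(u % n, n) == 1

    return {x for x in range(n) if all(is_unit(1 - x * y) for y in range(n))}


def top_order(n):
    """Order of T(R) = R/J for R = Z/n regarded as a module over itself."""
    return n // len(jacobson_radical(n))


print(sorted(jacobson_radical(12)), top_order(12))
```

For n = 12 the radical is {0, 6}, so the top has order 6 and is the semisimple module Z/2 + Z/3; for n = 8 the radical is the set of even residues and the top is the simple module Z/2.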
   We note that the projective cover, when it exists, may be obtained as the projective
cover of its top:
Proposition 4.2.5. For any ring R and any R-module M with a projective cover we
have $P(M) \cong P(T(M))$.
Proof. The composition of two essential maps is clearly essential and so we have the
essential map
                                 P(M) ---+ M ---+ T(M),                              (4.2.4)
Proposition 4.2.6. Let R be a ring such that R/J is semisimple, and let M be a finitely
generated R-module with a projective cover. Then
                                    $T(P(M)) \cong T(M).$                        (4.2.5)
Proof. Write P = P(M) and let N be the kernel of the essential map $\alpha : P \to M$. By
Lemma 4.2.1, $N \subseteq JP$ and since $\alpha$ is surjective, it maps JP onto JM. Therefore
$T(P) = P/JP \cong (P/N)/(JP/N) \cong M/JM \cong T(M)$.
  In particular this shows that the projective cover of a simple module has a simple
top.
Exercises
1. Verify that T is a functor. Under what circumstances is P a functor?
2. Show that the only finitely generated Z-modules with a projective cover are the
   free modules.
3. Show that T(T(M)) = T(M), P(P(M)) = P(M).
4. Show that if P(M) is indecomposable, then so is M. Does the converse hold?
5. Show that if $\alpha : P \to M$ is a projective cover and $\beta : Q \to M$ is surjective, where
   Q is a projective module, then $Q = P_0 \oplus P_1$, where $P_1 \cong P$ and $\beta|P_1$ corresponds
   to $\alpha$, while $\beta|P_0 = 0$. Use the result to give another proof of Proposition 4.2.4.
Proposition 4.3.1. Let R be any ring. Any decomposition of R as a direct sum of a finite
number of left ideals
                                     $R = \mathfrak{a}_1 \oplus \dots \oplus \mathfrak{a}_r$                   (4.3.1)
   In the Artinian case one can construct complete decompositions (4.3.1) (i.e. into
indecomposable terms) by writing the semisimple ring R/J as a direct sum of
simple left ideals and then 'lifting' this decomposition to R. The essential step is
to lift an idempotent from R/J to R; below we shall see how to do this for Artinian
rings and then go on to describe a somewhat larger class of rings for which this is
possible.
   Let R be any ring and $\mathfrak{N}$ an ideal of R; an element $u \in R$ such that $u^2 \equiv u \pmod{\mathfrak{N}}$
is called an idempotent mod $\mathfrak{N}$, and we say that u can be lifted to R if there exists
$e \in R$ such that $e^2 = e$ and $e \equiv u \pmod{\mathfrak{N}}$. For such a lifting to be possible one
usually has to assume that $\mathfrak{N} \subseteq J(R)$, but this by itself is not enough. For example,
in the ring R of rational numbers with denominators prime to 6, J(R) = 6R and 3, 4
are idempotents mod 6R, which however cannot be lifted to R. The next result gives a
sufficient condition. We recall that a nil ideal is an ideal consisting of nilpotent
elements.
Lemma 4.3.2. Let R be a ring and $\mathfrak{N}$ a nil ideal in R. Then idempotents mod $\mathfrak{N}$ can be
lifted to R.
Proof. Let u be an idempotent mod $\mathfrak{N}$; then $(u - u^2)^m = 0$ for some $m \ge 1$. We have
On the right the first m terms are divisible by $u^m$, while each term after the first m is
divisible by $(1 - u)^m$, so on denoting the sum of the first m terms by e, we can write
$1 = e + (1 - u)^m g$, where g is a polynomial in u. Now $u(1 - u) \in \mathfrak{N}$, so
  This lemma together with the Krull-Schmidt theorem (Theorem 4.1.6) shows that
in an Artinian ring isomorphic idempotents are conjugate, so in this case iso-
morphism and conjugacy for idempotents mean the same thing.
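The lifting procedure in the proof of Lemma 4.3.2 can be tried out numerically. The sketch below works in Z/n, whose nilradical is a nil ideal, and lifts an idempotent modulo that ideal. As an assumption made here, it expands $1 = (u + (1 - u))^{2m-1}$ and takes e to be the sum of the first m terms; this is one expansion for which the first m terms are divisible by $u^m$ and the remaining ones by $(1 - u)^m$, and it need not be the exact display used in the proof. The function name is hypothetical.

```python
from math import comb


def lift_idempotent(u, n):
    """Lift an idempotent u modulo the nilradical of Z/n to an idempotent of Z/n."""
    d = (u - u * u) % n
    m, power = 1, d
    while power != 0:  # find m with (u - u^2)^m = 0 in Z/n
        m += 1
        power = (power * d) % n
        if m > n:
            raise ValueError("u - u^2 is not nilpotent modulo n")
    N = 2 * m - 1
    # Sum of the first m terms of the binomial expansion of (u + (1 - u))^N.
    e = sum(comb(N, k) * u ** (N - k) * (1 - u) ** k for k in range(m))
    return e % n


print(lift_idempotent(3, 36), lift_idempotent(4, 36))
```

In Z/36 the nilradical consists of the multiples of 6, and the idempotents 3 and 4 mod 6 lift to 9 and 28, which are idempotent mod 36. This contrasts with the example of the rationals with denominators prime to 6, where J(R) = 6R is not nil and no lifting exists.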
4.3 Semiperfect rings                                                                145
Proposition 4.3.5. Let R be any ring and e, f idempotents in R. Write J = J(R) and
denote the natural homomorphism $R \to R/J$ by $x \mapsto [x]$. Then
(i) e = 0 or 1 if and only if [e] = 0 or 1 respectively;
(ii) e is isomorphic to f if and only if [e] is isomorphic to [f]; in particular, if
      [e] = [f], then e is conjugate to f;
(iii) if $ef \equiv fe \equiv 0 \pmod{J}$, then there exists an idempotent $f_1$ such that $f_1 \equiv f$
      (mod J) and $ef_1 = f_1e = 0$.
Proof. (i) If $e \in J$, then 1 - e is a unit and since e(1 - e) = 0, we have e = 0. Simi-
larly if $1 - e \in J$ and the converse is clear.
   (ii) Let $a, b \in R$ be such that $a \equiv eaf$, $b \equiv fbe$, $ab \equiv e$, $ba \equiv f \pmod{J}$ and put
$a_1 = eaf$, $b_1 = fbe$. Then $a_1b_1 = e - z$, where $z \in eJe$. Let z' be the quasi-inverse of
z (i.e. z + z' = zz' = z'z) and put z'' = ez'e; then z'' is a quasi-inverse of z in eRe,
and it follows that $a_1b_1(e - z'') = e$. Putting $a_2 = a_1$, $b_2 = b_1(e - z'')$, we have
$a_2 \equiv a$, $b_2 \equiv b \pmod{J}$ and $a_2b_2 = e$. Next write $b_2a_2 = f - y$; then $y \in fJf$ and
since $(b_2a_2)^2 = b_2ea_2 = b_2a_2$, it follows that
$f - y = (f - y)^2 = f^2 - fy - yf + y^2 = f - 2y + y^2$. Thus y(y - 1) = 0, and since $y \in J$, we find that y = 0 and so
$b_2a_2 = f$. Thus e is isomorphic to f, the rest is clear.
   (iii) If $ef \equiv fe \equiv 0 \pmod{J}$, then 1 - fe is invertible. Put
  We shall use this result to lift decompositions of the form (4.3.2), again without
further hypothesis:
Proposition 4.3.6. In any ring R, let $e_1, \dots, e_r$ be a set of idempotents such that
$e_ie_j \equiv 0 \pmod{J}$ for $i \neq j$. Then there exist idempotents $e_i'$ such that $e_i' \equiv e_i \pmod{J}$
and $e_i'e_j' = 0$ for $i \neq j$. If moreover,
                        $1 = e_1 + \dots + e_r, \quad e_ie_j \equiv 0 \pmod{J}, \quad i \neq j,$    (4.3.3)
then there exist idempotents $e_i'$ such that $e_i' \equiv e_i \pmod{J}$, $1 = \sum e_i'$ and $e_i'e_j' = 0$ for
$i \neq j$.
Proof. For r = 1 there is nothing to prove, so we assume that r > 1 and use induc-
tion on r. This means that we may assume $e_ie_j = 0$ for $i \neq j$, i, j > 1. Put
$e = e_2 + \dots + e_r$; then e is again idempotent and $e_1e \equiv ee_1 \equiv 0 \pmod{J}$. By Propo-
sition 4.3.5 there exists an idempotent $e_1'$ such that $e_1' \equiv e_1 \pmod{J}$ and
$ee_1' = e_1'e = 0$. It follows that $e_1', e_2, \dots, e_r$ are pairwise orthogonal idempotents.
   The first part remains true (with the same proof) for a countable set of idem-
potents, but it ceases to hold for uncountable sets (Zelinsky, 1954).
   Proposition 4.3.6 shows that every semiperfect ring R can be written as $R = \sum Re_i$,
where the $e_i$ are primitive and so the $Re_i$ are indecomposable. To establish the
uniqueness of such decompositions we shall want to apply the Krull-Schmidt
theorem and we need to check that the endomorphism ring of an indecomposable
left ideal Re is local. Here we shall need a couple of elementary lemmas:
Lemma 4.3.8. Let R be any ring, e an idempotent in R and M a left R-module. Then
there is an isomorphism of left eRe-modules:
                                 $\operatorname{Hom}_R(Re, M) \cong eM.$                               (4.3.4)
  Over a semiperfect ring R, the top of any finitely generated R-module M can be
written
                                     $T(M) = \bigoplus S_\mu,$
where each $S_\mu$ is a simple quotient of M, by Lemma 4.2.1. We shall use this remark to
show that projective covers exist over a semiperfect ring.
Theorem 4.3.9. Any finitely generated module over a semiperfect ring R has a projective
cover. More precisely, the projective module P is a projective cover for M if and only if
$T(P) \cong T(M)$.
Proof. Let P be a projective cover for M, say $P/N \cong M$, where $N \subseteq JP$, by Lemma
4.2.1. Then $JM \cong JP/N$, hence $T(M) = M/JM \cong (P/N)/(JP/N) \cong P/JP = T(P)$, so
the condition is necessary. Now let M be any finitely generated left R-module and
put $\bar R = R/J$. As we have just seen, T(M) is a finite direct sum of simple R-modules,
which may also be regarded as simple $\bar R$-modules; further, any simple $\bar R$-module
has the form $\bar R\bar e$, where $\bar e$ is a primitive idempotent in $\bar R$. By definition of R, $\bar e$ lifts
to an idempotent e in R, which is again primitive, by Proposition 4.3.5. Write
$P_e = Re$ for the corresponding indecomposable projective and put $P = \bigoplus P_e$. Then
$T(P) \cong \bigoplus \bar R\bar e \cong T(M)$. More generally, given any projective module P and an iso-
morphism $\theta : T(P) \to T(M)$, we have a diagram
$$\begin{array}{ccccc}
P & \longrightarrow & P/JP & \longrightarrow & 0 \\
\downarrow{\scriptstyle\alpha} & & \downarrow{\scriptstyle\theta} & & \\
M & \longrightarrow & M/JM & \longrightarrow & 0
\end{array}$$
   This result and its proof allow us to view finitely generated modules over a semi-
perfect ring in a new light. With every such module M we associate on the one hand
its top T and on the other its projective cover P. We have essential mappings
$P \to M$, $P \to T$,
and P, T are the largest resp. smallest modules for which such essential mappings
exist, for a given M. We conclude with an analogue of the Krull-Schmidt theorem
for projective modules over semiperfect rings.
Theorem 4.3.10. Let R be a semiperfect ring. Every finitely generated projective left
R-module P can be written as a direct sum
(4.3.6)
   It can be shown that semiperfect rings form the precise class of rings for which
every finitely generated module has a projective cover. A ring over which every
left module has a projective cover is said to be left perfect. Such a ring R is character-
ized by the fact that R/J is semisimple and J is right vanishing (or also left T-
nilpotent), i.e. for any infinite sequence $\{a_\nu\}$ in J there exists n such that $a_1 \cdots a_n = 0$
(see Bass [1960]).
Exercises
 1. Show that an idempotent e in a ring R is primitive iff eRe is non-trivial and has
    no idempotents $\neq$ 0, 1.
 2. Let R be a ring and $\mathfrak{a}$ a minimal left ideal. Show that either $\mathfrak{a}^2 = 0$ and $\mathfrak{a}R$ is a
    nilpotent two-sided ideal in R, or $\mathfrak{a} = Re$ for some idempotent e, and hence $\mathfrak{a}$ is
    a direct summand in R.
 3. Find conditions on idempotents e, f for Re = Rf to hold.
 4. Show that if $Re \cong Rf$ for idempotents e, f where e is central, then f = ef = fe. If
    f is also central, deduce that e = f.
 5. Let R be a local ring and P a finitely generated projective left R-module. Show by
    lifting a basis of T(P) that P is free.
 6. Let R be a semiperfect ring such that R/J is simple. Show that R is a full matrix
    ring over a local ring. If further, R is an integral domain, deduce that it must be a
    local ring.
 7. Let e be a central idempotent in a ring R. Show that if $e_1$ is an idempotent such
    that $e_1 \equiv e \pmod{J}$, then $e_1 = e$.
 8. Show that if $1 = \sum e_i = \sum f_i$ are two decompositions of 1 into orthogonal
    families of idempotents such that $e_i \equiv f_i \pmod{J}$, then $u = \sum e_if_i$ is a unit
    and $f_i = u^{-1}e_iu$.
 9. Show that if R is semiperfect, then so is $R_n$ for all $n \ge 1$.
10. Show that a commutative Artinian ring is a direct product of completely primary
    rings (i.e. every non-unit is nilpotent). Give a counter-example in the non-
    commutative case.
11. Show that a commutative ring is semiperfect iff it is a direct product of finitely
    many local rings. Show that this ring is (left and right) perfect iff the maximal
    ideal of each local factor is vanishing.
where $P_f \cong P$ and f runs over $\mathcal{A}(P, X)$, with natural injection $i_f : P_f \to S$; then
the family of maps $f : P_f \to X$ gives rise to a map $F : S \to X$ such that $f = i_fF$.
To establish that X is a quotient of S we show that F is epic. Let $g : X \to \operatorname{coker} F$
be the natural map. Then Fg = 0, hence fg = 0 for all $f \in \mathcal{A}(P, X)$ and since P is
a generator, it follows that g = 0. Thus coker F = 0 and F is epic, as claimed.
  Conversely, assume that X is a quotient of $S = \coprod P_\lambda$, where $P_\lambda \cong P$, $F : S \to X$
and write $i_\lambda : P_\lambda \to S$ for the natural injection. Given $f : X \to Y$, $f \neq 0$, we have
$Ff \neq 0$ because F is epic, so $i_\lambda Ff \neq 0$ for some $\lambda$, but $i_\lambda F \in \mathcal{A}(P, X)$. This shows
P to be a generator.
  It is clear that in the category $\mathrm{Mod}_R$ of all right R-modules, R is a generator. The
existence of a generator gives rise to a useful criterion for a natural isomorphism of
functors.
Theorem 4.4.1. Let $\mathcal{A}$, $\mathcal{B}$ be abelian categories with coproducts and let F, G be functors
from $\mathcal{A}$ to $\mathcal{B}$ which are right exact and preserve coproducts. If there is a natural trans-
formation
                                            $t : F \to G,$                                (4.4.1)
$S_1^G \to S_2^G \to X^G \to 0$
By hypothesis $S_2^F = (\coprod P_\lambda)^F = \coprod P_\lambda^F$, hence $t_2 = t_{S_2}$ is an isomorphism, and likewise
$t_1$. By the 5-lemma, $t_X$ is an isomorphism, as asserted.                     •
  Let us now consider right exact functors $\mathrm{Mod}_A \to \mathrm{Mod}_B$ which preserve direct
sums (coproducts). An example of such a functor is $\otimes_A M$, where M is an (A, B)-
bimodule. The next result shows that this is essentially the only case:
                                 $Y \mapsto \operatorname{Hom}_B(P, Y).$                              (4.4.3)
   If one (and hence all) of (a)-(c) holds, we shall call $\mathrm{Mod}_A$ a quotient category of
$\mathrm{Mod}_B$ and write $A \prec B$. The functor S is called the section functor and T the retraction
functor.
Proof. (a) $\Rightarrow$ (b) follows by Lemma 4.4.2 and (b) $\Rightarrow$ (a) will follow if we show T to
be right adjoint to S. This follows because we have the natural transformation
Proposition 4.4.4. Let A, B be rings such that $A \prec B$, with modules ${}_AP_B$, ${}_BQ_A$ satisfying
$P \otimes Q \cong A$. Then
(i)     $Q \cong \operatorname{Hom}_B(P, B)$, $P \cong \operatorname{Hom}_B(Q, B)$;
(ii)    $A \cong \operatorname{End}_B(P) \cong \operatorname{End}_B(Q)$;
(iii)   $P_B$ and ${}_BQ$ are projective;
(iv)    ${}_AP$ and $Q_A$ are generators;
(v)     we have the following lattice homomorphisms with right inverses ('retractions'):
        $\operatorname{Lat}(A_A) \to \operatorname{Lat}(P_B)$ with 2-sided ideals of A corresponding to (A, B)-submodules
        of P,
        $\operatorname{Lat}({}_AA) \to \operatorname{Lat}({}_BQ)$ with 2-sided ideals of A corresponding to (B, A)-submodules
        of Q.
Proof. In each case it is enough to prove the first part; the second then follows by
symmetry.
(i)     Write $S = \otimes P$, $T = \otimes Q$; we have
                   $\operatorname{Hom}_B(P, B) \cong \operatorname{Hom}_B(A^S, B) \cong \operatorname{Hom}_A(A, B^T) \cong Q.$
(ii) We have the bimodule homomorphisms
(iii) We have
                         $\operatorname{Hom}_B(P, Y) \cong \operatorname{Hom}_A(A, Y^T) \cong Y^T,$
     and by hypothesis the functor $T : Y \mapsto Y^T$ is right exact. Hence $\operatorname{Hom}_B(P, -)$ is
     right exact and so $P_B$ is projective.
(iv) We have $P \oplus P' \cong {}^IB$, where ${}^IB$ stands for a direct sum of |I| copies of B. Hence
   Later, in Theorem 4.5.4, we shall find that the modules P and Q are actually
finitely generated. For the moment we note that Proposition 4.4.3 leads to a criterion
for Morita equivalence:
Theorem 4.4.5. For any rings A, B the following conditions are equivalent:
(a) $\mathrm{Mod}_A \simeq \mathrm{Mod}_B$,
(a°) ${}_A\mathrm{Mod} \simeq {}_B\mathrm{Mod}$,
(b) there exist bimodules ${}_AP_B$, ${}_BQ_A$ with bimodule isomorphisms
Proof. The equivalence of (a), (b) is clear by the proof of Proposition 4.4.3, and now
the equivalence of (a°), (b) follows by the evident symmetry of (b). Let us pick iso-
morphisms $f : P \otimes Q \to A$, $g : Q \otimes P \to B$; then all the arrows in the diagrams are
isomorphisms. Consider the first diagram. If we take $p \in P$ and move it (anticlock-
wise) round the square we obtain $\theta : P \to P$, where $\theta$ is an (A, B)-automorphism of P.
Now $\operatorname{End}_B(P) \cong \operatorname{End}_A(A) \cong A^\circ$, so $\theta$ is left multiplication by a unit u in A; since $\theta$
is also an A-automorphism, u lies in the centre of A. If we replace f by uf, then
the first square becomes commutative. We complete the proof by showing that
with this choice of f, g the second square also commutes. For brevity write
$f(p \otimes q) = (p, q)$, $g(q \otimes p) = [q, p]$; we have adjusted f so that
                                  $(p, q)p' = p[q, p'],$                    (4.4.4)
and we must show that
                                      $[q, p]q' = q(p, q').$                      (4.4.5)
4.4 Equivalence of module categories                                                153
   To illustrate the result, we take $B = A_n$ for some n > 1. Then we may choose
$P = A^n$, $Q = {}^nA$ and it is clear that $P \otimes Q \cong A$, $Q \otimes P \cong A_n$.
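The two pairings behind this example can be written out with ordinary integer rows and columns. In the sketch below, which takes A = Z purely for illustration, `pair` sends a row p and a column q to the scalar (p, q) and `bracket` sends a column q and a row p to the n x n matrix [q, p]; the function names are hypothetical. The code checks the compatibility condition (4.4.4), $(p, q)p' = p[q, p']$, and that the matrices $[e_i, e_i]$ built from the unit vectors sum to the identity.

```python
def pair(p, q):
    """(p, q) in A: a row times a column."""
    return sum(a * b for a, b in zip(p, q))


def bracket(q, p):
    """[q, p] in A_n: a column times a row."""
    return [[a * b for b in p] for a in q]


def row_times_matrix(p, X):
    """Right action of the matrix X on the row p."""
    return [sum(p[i] * X[i][j] for i in range(len(p))) for j in range(len(X[0]))]


def unit(i, n):
    """The i-th unit vector of length n."""
    return [1 if k == i else 0 for k in range(n)]


p, q, p2 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
lhs = [pair(p, q) * x for x in p2]          # (p, q) p'
rhs = row_times_matrix(p, bracket(q, p2))   # p [q, p']
print(lhs == rhs)
```

Both sides equal 32 times the row p', since (p, q) = 32 for these sample values.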
   As a first consequence we see how to sharpen Proposition 4.4.4 for Morita equiva-
lent rings.
X~Y
In the particular case where $\mathcal{A} = \mathrm{Mod}_A$, Nat(I) consists of all A-endomorphisms
which commute with all A-homomorphisms. Writing C for the centre of the
ring A, we have for each $c \in C$ an element $\mu_c$ of Nat(I), defined as
                            $x\mu_c = xc, \quad \text{for } x \in X,\ X \in \mathrm{Mod}_A.$
Theorem 4.4.7. The centre of a ring A is isomorphic to the centre of the category $\mathrm{Mod}_A$.
Hence Morita equivalent rings have isomorphic centres.                               •
   This result shows for example that two commutative rings are equivalent iff they
are isomorphic; more generally, a ring R is equivalent to a commutative ring iff it is
equivalent to its centre.
   The property of being finitely generated can be expressed categorically: M is
finitely generated iff M cannot be expressed as the union of a chain of proper sub-
modules. This means that any module corresponding to a finitely generated module
under a category equivalence is again finitely generated. However, the cardinal of a
minimal generating set may well be different for the two modules, and this fact can
be utilized to turn any problem on finitely generated modules into a problem on
cyclic modules. In order to show this clearly we examine the equivalence $A \sim A_n$
in greater detail.
   Fix $n \ge 1$ and write $P = A^n$, $Q = {}^nA$. We have the functors
                         $M \mapsto M^n = M \otimes P \quad (M \in \mathrm{Mod}_A)$                        (4.4.6)
and
                         $N \mapsto \tilde N = N \otimes Q \quad (N \in \mathrm{Mod}_{A_n}).$                       (4.4.7)
Here $\tilde N$ may also be defined as $\operatorname{Hom}_{A_n}(P, N)$. It is easily checked that
and this provides an explicit form for the equivalence $A \sim A_n$. Given any finitely
generated right A-module M, with generating set $u_1, \dots, u_n$ say, we can apply
(4.4.6) and pass to the right $A_n$-module $M^n$, which is generated by the single element
$(u_1, \dots, u_n)$. We state the result as
Theorem 4.4.8. For any ring A, any finitely generated A-module M corresponds to a
cyclic $A_n$-module under the category-equivalence (4.4.6), for suitable n. In fact it is
enough to take n equal to the cardinal of a generating set of M.                   •
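The passage from a generating set to a single generator can be made explicit in a toy case. The sketch below takes A = Z and M = Z^2 with the generating set $u_1 = (2, 1)$, $u_2 = (1, 1)$ (an assumption chosen for illustration, as are all function names), and shows that any row $(m_1, m_2)$ of elements of M is obtained from the single row $(u_1, u_2)$ by the right action of a 2 x 2 integer matrix.

```python
U = [(2, 1), (1, 1)]  # generating set u_1, u_2 of M = Z^2


def coords(m):
    """Integers (x1, x2) with m = x1*u_1 + x2*u_2, for the specific U above."""
    return (m[0] - m[1], 2 * m[1] - m[0])


def act(row, X):
    """Right action of a matrix X on a row of elements of M: m_j = sum_i row_i * X[i][j]."""
    n = len(row)
    return [tuple(sum(row[i][c] * X[i][j] for i in range(n)) for c in range(2))
            for j in range(n)]


def matrix_for(target):
    """A matrix X with act(U, X) == target; column j holds the coordinates of target[j]."""
    cols = [coords(m) for m in target]
    return [[cols[j][i] for j in range(len(target))] for i in range(2)]


target = [(5, -3), (0, 7)]
X = matrix_for(target)
print(X, act(U, X) == target)
```

Here the matrix found is [[8, -7], [-11, 14]], so the arbitrary row ((5, -3), (0, 7)) lies in the cyclic submodule generated by $(u_1, u_2)$.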
Exercises
1. Show that a skew field K is Morita equivalent only to $K_n$, n = 1, 2, ....
2. Verify directly that the centre of A is $\operatorname{Hom}_{A\text{-}A}(A, A)$.
3. Verify that for a ring to be Noetherian or Artinian is a Morita invariant.
4. Show that $A \prec B \Leftrightarrow A^\circ \prec B^\circ$.
5. Show that any non-trivial ring without IBN has a simple homomorphic image
   without IBN. Verify that a simple ring without IBN is Morita equivalent only
   to a finite number of rings (up to isomorphism).
This is a two-sided ideal in A, called the trace ideal of M. For example, if F is a non-
zero free A-module, then $\tau(F) = A$. The modules for which $\tau = A$ are of particular
interest; as we see from the next result, they are just the generators of the category
$\mathrm{Mod}_A$.
Lemma 4.5.1. Let A be any ring. For any right A-module M the following are
equivalent:
(a) M is a generator,
(b) $\tau_A(M) = A$,
(c) $M^n \cong A \oplus N$ for some integer n and some $N_A$.
Proof. (a) $\Rightarrow$ (b). Assume that $\tau(M) = \mathfrak{a} \neq A$; then the natural homomorphism
$\pi : A \to A/\mathfrak{a}$ is non-zero, hence by (a) the induced map
                        $\operatorname{Hom}_A(M, A) \to \operatorname{Hom}_A(M, A/\mathfrak{a})$                    (4.5.1)
is non-zero. But every $f \in M^*$ maps M into $\mathfrak{a}$ by assumption, and this means that
(4.5.1) is zero, a contradiction. Hence $\tau(M) = A$ and (b) holds.
  (b) $\Rightarrow$ (c). By hypothesis $\tau(M) = A$, hence there exist $f_1, \dots, f_n \in M^*$,
$u_1, \dots, u_n \in M$ such that $\sum (f_i, u_i) = 1$. We define a homomorphism $\varphi : M^n \to A$ by the
rule $(x_1, \dots, x_n) \mapsto \sum (f_i, x_i)$. Its image in A is a right ideal containing
$1 = \sum (f_i, u_i)$, hence $\varphi$ is surjective and if $\ker\varphi = N$, we have the exact sequence
                             $0 \to N \to M^n \to A \to 0.$
Since A is projective, this sequence splits and (c) follows.
   (c) $\Rightarrow$ (a). Given a map $f : X \to Y$ of A-modules, if the induced map
$\operatorname{Hom}(M, X) \to \operatorname{Hom}(M, Y)$ is zero, then this also holds for the map
$\operatorname{Hom}(M^n, X) \to \operatorname{Hom}(M^n, Y)$, i.e. $\operatorname{Hom}(A \oplus N, X) \to \operatorname{Hom}(A \oplus N, Y)$. But the
restriction to the first summand is just the original map $f : X \to Y$ (because
$\operatorname{Hom}(A, X) \cong X$), so f = 0. This shows $h_M$ to be faithful, so (a) holds.        •
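Condition (b) is easy to test in a small commutative example. The sketch below takes A = Z/n and M = Z/m with m dividing n, and computes the trace ideal by brute force, using the usual description of the trace ideal as the sum of the images of all homomorphisms from M to A (assumed here as the definition). A homomorphism Z/m -> Z/n is determined by the image x of 1, subject to mx = 0; the function names are hypothetical.

```python
from math import gcd


def trace_ideal(n, m):
    """Trace ideal of M = Z/m as a module over A = Z/n (m must divide n)."""
    if n % m != 0:
        raise ValueError("Z/m is a Z/n-module only when m divides n")
    # Possible images of 1 under homomorphisms Z/m -> Z/n.
    images = [x for x in range(n) if (m * x) % n == 0]
    g = n
    for x in images:
        g = gcd(g, x)  # the ideal generated by all images is g * (Z/n)
    return {(g * k) % n for k in range(n)}


def is_generator(n, m):
    """Lemma 4.5.1 (b): M is a generator iff its trace ideal is all of A."""
    return trace_ideal(n, m) == set(range(n))


print(sorted(trace_ideal(12, 4)), is_generator(12, 4), is_generator(12, 12))
```

For n = 12 the module Z/4 has trace ideal {0, 3, 6, 9} and so is not a generator, while Z/12 itself has trace ideal A.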
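  We now consider the following situation. Given two rings A, B and bimodules P,
Q, assume that we have two bimodule homomorphisms:
                          $\tau : P \otimes Q \to A, \quad f \otimes x \mapsto (f, x),$                   (4.5.2)
                          $\mu : Q \otimes P \to B, \quad x \otimes f \mapsto [x, f],$                   (4.5.3)
such that
                                  $[x, f]y = x(f, y),$                                      (4.5.4)
                                  $f[x, g] = (f, x)g,$                                      (4.5.5)
$f, g \in P$, $x, y \in Q$,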
the usual matrix multiplication. This sums up the module laws and (4.5.2), (4.5.3),
while (4.5.4), (4.5.5) are instances of the associative law. The 6-tuple
$(A, B, P, Q, \tau, \mu)$ is called a Morita context. We remark that im $\tau$ is an ideal in A
and im $\mu$ is an ideal in B.
  Starting from any module $E_A$ we obtain a Morita context as follows. We put
and regard E as a (B, A)-bimodule and E* as an (A, B)-bimodule in the natural way.
Further, we have a natural map $\tau : E^* \otimes E \to A$ given by evaluation, as in (4.5.2).
To find $\mu$, we use (4.5.4): for any $x \in E$, $f \in E^*$ we define [x, f] by its effect on E:
   We thus have a Morita context $(A, B, E^*, E, \tau, \mu)$ starting from $E_A$; this is called
the Morita context derived from $E_A$. To give an example, if A is a simple Artinian
ring, say $A \cong K_n$, where K is a skew field (by Wedderburn's theorem, BA, Theorem
5.2.2) and $E = K^n$ is a simple right A-module, then $E^* = {}^nK$ and the derived Morita
context has the form $(K_n, K, {}^nK, K^n, \tau, \mu)$.
   For general Morita contexts we shall be interested in the case where $\tau$, $\mu$ are iso-
morphisms. In that case we have a Morita equivalence between A and B, by Theorem
4.4.5, with functors
                                             $= \sum [y_\lambda, g_\lambda]x_i \otimes f_i = 0.$
This shows $\mu$ to be injective, and hence an isomorphism. The same argument applies
to $\tau$.                                                                           •
   In the special case of a derived Morita context we can give an explicit criterion for
$\mu$ to be surjective. We recall the dual basis lemma from BA, Lemma 4.7.5. In the
finitely generated case (BA, Corollary 4.7.6) this states that ${}_AP$ is a direct summand
of $A^n$ iff there exist $u_1, \dots, u_n \in P$, $f_1, \dots, f_n \in P^*$ (the 'projective coordinate
system') such that
Similarly for a right module this equation takes the form $x = \sum u_i(f_i, x)$.
Lemma 4.5.3. Given any module Q_A, let (A, B, P, Q, τ, μ) be the derived Morita
context. Then μ : Q ⊗ P → B is an isomorphism if and only if Q_A is finitely generated
projective.
Proof. By the dual basis lemma just quoted, Q_A is finitely generated projective iff
there is a finite projective coordinate system u_1, …, u_n ∈ Q, f_1, …, f_n ∈ Q* such that
                          x = Σ u_i(f_i, x)   for all x ∈ Q.
Bearing in mind that Q* = P, we can by (4.5.4) write this as x = Σ [u_i, f_i]x for all
x ∈ Q, i.e. Σ [u_i, f_i] = 1. But this is just the condition for μ to be surjective, and by
Lemma 4.5.2, for μ to be an isomorphism.                                               •
  We can now state a condition on any module QA for its derived Morita context to
define an equivalence.
Theorem 4.5.4. Let A be a ring, Q a right A-module and (A, B, P, Q, τ, μ) the Morita
context derived from Q. Then this context defines a Morita equivalence between A and B
if and only if Q is a finitely generated projective generator.
   A finitely generated projective generator is also called a progenerator.
   This result shows that every Morita equivalence can be obtained from a particular
Morita context. Given A ∼ B, to find Q we need a finitely generated projective
A-module; this is a direct summand of A^n for some n ≥ 1, and it may be specified
by an idempotent e in A_n ≅ End_A(A^n). Suppose first that n = 1; this means that
Q_A = eA, where e is an idempotent in A. By Lemma 4.3.8 (or rather, its left-right
dual) we have P = Hom_A(eA, A) = Ae, B = End_A(eA) = eAe. Now the condition
for Q to be a generator reads: the natural map P ⊗ Q → A is surjective, i.e.
AeA = A. The translation to A_n is now clear. We choose n ≥ 1 and take an
idempotent e in A_n such that A_n e A_n = A_n. Then we have B = eA_ne, and all rings
Morita equivalent to A are obtained in this way, with the appropriate Morita context
(A, eA_ne, A^n e, e ^nA, τ, μ).
   An important particular case is obtained by starting from a commutative ring K
say. If Q is any finitely generated projective K-module, then A = End_K(Q) is a
K-algebra and for any K-algebra R we have
K-algebra into one Morita equivalent to it. Such algebras are called Brauer equivalent;
in the special case when K is a field, we have A = K_n and R ⊗ K_n ≅ R_n, while the
general case leads to the study of Azumaya algebras (Azumaya [1950], Auslander
and Goldman [1960]; see also Section 8.6 below).
   We conclude this section by describing another important Morita invariant, the
trace group. Let K be any commutative ring and consider a K-algebra A. With A
we associate a K-module, its trace group, defined as
                          T(A) = A/C,
where C is the K-submodule of A generated by all commutators xy − yx (x, y ∈ A).
The natural map A → T(A) is called the trace function and is written tr(x). Clearly it
has the following properties:
T.1 tr : A → T(A) is K-linear,
T.2 tr(xy) = tr(yx) for all x, y ∈ A.
Moreover, any linear map α : A → M into a K-module such that α(xy) = α(yx) can
be written as α(x) = α'(tr(x)) for a unique α' : T(A) → M. Thus tr is the universal
mapping satisfying T.1-T.2. Let us show that T is a Morita invariant:
To show that this is well-defined we must check that the map f, x ↦ tr([x, f]) is
bilinear and B-balanced. The bilinearity is clear, and we have for any b ∈ B,
                 tr([x, fb]) = tr([x, f]b) = tr(b[x, f]) = tr([bx, f]),
by T.2, hence the result. Moreover, α(aa') = α(a'a), because α(Σ (f, x)a) =
tr(Σ [xa, f]) = tr(Σ [x, af]) = α(Σ a(f, x)); hence α induces a map
α' : T(A) → T(B). By symmetry there is a map β' : T(B) → T(A) and these two
maps are easily seen to be mutually inverse.                                 •
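   For the context (K_n, K, ^nK, K^n, τ, μ) with K commutative, the map used in this proof sends the trace of the matrix (f, x) to the element [x, f] of K. A minimal sketch, assuming integer entries, n = 2 and the ordinary matrix trace as a model of tr (the names matmul and trace are illustrative), checks T.2 and this correspondence on one example:

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

a = [[1, 2], [3, 4]]
b = [[0, 5], [-1, 2]]
t_ab = trace(matmul(a, b))   # tr(ab)
t_ba = trace(matmul(b, a))   # tr(ba), equal to tr(ab) by T.2

f = [[2], [7]]    # a column, an element of ^nK
x = [[3, -1]]     # a row, an element of K^n
trace_in_A = trace(matmul(f, x))   # tr((f, x)), computed in A = K_n
trace_in_B = trace(matmul(x, f))   # tr([x, f]), computed in B = K
print(t_ab, t_ba, trace_in_A, trace_in_B)
```

   The equality of the last two numbers is the special case of T.2 for rectangular factors, and illustrates why the trace groups of K_n and K agree when K is commutative.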
Exercises
1. Given a Morita context (A, B, P, Q, τ, μ), if P ⊗ Q ≅ A and A ≅ End(Q), show
   that Q is finitely generated projective.
2. Let K be a commutative ring and P a finitely generated faithful projective module
   (P is faithful if Pa = 0 ⇒ a = 0). Show by using the dual basis lemma that
   τ(P) = K and so P is a generator.
3. If (A, B, P, Q, τ, μ) is a Morita context defining a Morita equivalence and
(4.6.2)
In general F, G will not be finitely generated and A may have infinitely many rows
and columns, but each row has only finitely many non-zero entries; we say that A
is row-finite. We note that M is determined up to isomorphism by A as the cokernel
of the corresponding map α, by (4.6.1). Moreover, every row-finite matrix A defines
a module in this way. If F, G have finite ranks m, n respectively, then the presentation
matrix is m × n.
   It is clear that F can be taken to be of finite rank iff M is finitely generated. If G can
be taken to be of finite rank, M is said to be finitely related and the module M is
called finitely presented if F, G can both be taken to be of finite rank. Given two
presentations
                          0 → K_i → F_i → M → 0   (i = 1, 2)                   (4.6.3)
where F_1, F_2 are free (but K_1, K_2 need not be free), we have by Schanuel's lemma
(Lemma 2.4.2), F_1 ⊕ K_2 ≅ F_2 ⊕ K_1; hence if M has a presentation with F_1 of finite
rank and another with K_2 finitely generated, then F_2, K_1 are also finitely generated
and M is finitely presented.
  Let us examine the presentation matrix of a projective module.
Proposition 4.6.1. Let M be an R-module with the presentation (4.6.1). Then M is
projective if and only if there exists a mapping α' : F → G such that αα'α = α. Thus a
matrix A represents a projective module if and only if there is a matrix A' such that
AA'A = A.
Proof. Assume that M is projective; then (4.6.1) splits and so F ≅ M ⊕ ker β ≅
M ⊕ im α, so im α is also projective. Hence the projection F → im α can be lifted to
a map α' : F → G, whose composition with α is the projection onto im α, i.e.
αα'α = α.
   Conversely, if α' satisfies αα'α = α, then αα' is an idempotent endomorphism of
G, hence G = G_1 ⊕ G_2, where G_1 = im αα', G_2 = ker αα', and writing α_1 = α|G_1, we
have the exact sequence (4.6.1) with G, α replaced by G_1, α_1. Since α_1α' = 1 on G_1, the
sequence splits and so M is projective. The final assertion follows by rewriting the
result in terms of matrices.                                                      •
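   The matrix condition AA'A = A is easy to test on small examples. The sketch below is an illustration under stated assumptions: it works over Q (via Python's Fraction), takes the rank-one matrix A with rows (1, 2) and (2, 4), and tries the transpose of A divided by 25 (the sum of the squares of its entries) as the candidate A'; this choice of A' is ours and is not taken from the text. It also records that over Z the 1 × 1 matrix (2) admits no such A', in line with the fact that its cokernel Z/2 is not projective.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(4)]]
# Candidate A': the transpose of A divided by 25, the sum of squares of its entries.
A_prime = [[A[j][i] / 25 for j in range(2)] for i in range(2)]

E = matmul(A, A_prime)                       # AA', expected to be idempotent
is_idempotent = matmul(E, E) == E
satisfies_identity = matmul(E, A) == A       # AA'A = A

# Over Z, the matrix (2) has no A': 2*t*2 = 4t is a multiple of 4, never 2.
# The finite search below only illustrates this parity argument.
no_integer_solution = not any(2 * t * 2 == 2 for t in range(-100, 101))
print(is_idempotent, satisfies_identity, no_integer_solution)
```

   Over a field every matrix has such an A', which matches the fact that every module over a field is projective; the integer example shows that the condition is a genuine restriction in general.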
Theorem 4.6.2. For any right R-module U (over any ring R) the following conditions
are equivalent:
(a) Tor_1^R(U, −) = 0,
(b) U is flat,
(c) for any free left R-module F and any submodule G of F, the map U ⊗ G → U ⊗ F
    induced by the inclusion G ⊆ F is injective,
(d) given uc = 0, where u ∈ U^n, c ∈ ^nR, there exist A ∈ ^mR^n and v ∈ U^m such that
    u = vA and Ac = 0,
(e) for any finitely generated left ideal 𝔞 of R the map U ⊗ 𝔞 → U induced by the
    inclusion 𝔞 ⊆ R is injective, so that U ⊗ 𝔞 ≅ U𝔞,
(f) as (e), but for any left ideal.
Condition (d) may be expressed loosely by saying that any relation in U is a
consequence of relations in R; this explains the name 'flat', if one thinks of relations in a
module as a kind of torsion.
Proof. It is clear from the definition that (a) and (b) are equivalent. When U is flat,
the induced sequence
is injective, so (b) ⇒ (c). To show that (c) ⇒ (d), consider the exact sequence
                          0 → K -α→ R^n -β→ R,
where β : (x_i) ↦ Σ x_i c_i and K = ker β. By (c) the induced sequence
                          0 → U ⊗ K → U^n → U
hence Tor_1^R(U, C) = 0 for any cyclic left R-module C. Hence Tor_1^R(U, A) = 0 for
any left R-module A, by induction on the number of generators of A, using the
exact homology sequence, and then taking the limit over finitely generated left
R-modules A.                                                                      •
because the f_λ are free. Moreover, since f_λγ ∈ K, we have f_iβ = Σ (f_hβ)a_hi, so Theorem
4.6.2(d) is satisfied, and U is flat.                                                    •
   This shows that for example, over a Noetherian ring, every finitely generated flat
module is projective. We can now also characterize rings over which every (left or
right) module is flat (M. Auslander, 1957):
Theorem 4.6.5. For any ring R the following conditions are equivalent:
(a) every right R-module is flat,
(a°) every left R-module is flat,
(b) given a ∈ R, there exists a' ∈ R such that aa'a = a.
Proof. We shall prove the implications (a°) ⇒ (b) ⇒ (a); the theorem then follows
by symmetry.
  Let us apply Proposition 4.6.3 to the exact sequence of left R-modules
                                0 → Ra → R → R/Ra → 0.                           (4.6.5)
  A ring satisfying condition (b) of this theorem is called (von Neumann) regular or
sometimes absolutely flat. Clearly every semisimple ring is regular, but of course the
converse does not hold. For example, if K is any field and I a set, then the direct
power K^I is a regular ring, but it is not semisimple, unless I is finite.
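   Condition (b) can be tested by brute force in a finite ring. As a small illustration (the choice of the rings Z/nZ and the function name is_von_neumann_regular are ours, not the text's), the following sketch lists those n between 2 and 12 for which every a in Z/nZ has some a' with aa'a = a:

```python
def is_von_neumann_regular(n):
    """True if every a in Z/nZ has some b in Z/nZ with a*b*a == a (mod n)."""
    return all(any((a * b * a - a) % n == 0 for b in range(n))
               for a in range(n))

regular_moduli = [n for n in range(2, 13) if is_von_neumann_regular(n)]
print(regular_moduli)
```

   In Z/4Z the element a = 2 fails, since 2·b·2 = 4b is always 0 there; the moduli that pass in this range are exactly the squarefree ones, for which Z/nZ is a direct product of fields.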
  A link between flat and injective modules is provided by
Proposition 4.6.6. A left R-module V (over any ring R) is flat if and only if
V̂ = Hom_Z(V, K) is injective, as right R-module, where K = Q/Z.
Proof. For any right R-module U we have, by adjoint associativity, the natural
isomorphism
                     Hom_R(U, Hom_Z(V, K)) ≅ Hom_Z(U ⊗ V, K).                            (4.6.6)
Now assume that V is flat; then − ⊗ V is an exact functor, hence the right-hand side
of (4.6.6) is exact as a functor of U, hence so is the left and this means that V̂ is
injective (by definition). Conversely, when V̂ is injective, the two sides of (4.6.6) are exact
as functors of U. But the functor Hom_Z(−, K) is faithful, and so preserves inexact
sequences, by Proposition 2.2.6, hence U ⊗ V must be exact in U, so V is flat, as
claimed.                                                                                 •
    For example, R̂ is always injective; this module has another property that is
sometimes useful. We recall that a cogenerator U is defined dually to the term 'generator'
by the condition that h_U is faithful; explicitly, for U ∈ _RMod, this means that for any
_RM and 0 ≠ x ∈ M there is a homomorphism φ : M → U such that xφ ≠ 0. For
example, K = Q/Z is a cogenerator for Z. For, given any abelian group A and
0 ≠ x ∈ A, let p be a prime factor of the order of x (or any prime if x has infinite
order). Then xZ/pxZ is of order p and can be embedded in K; since K is injective
(i.e. divisible in this case, see Section 2.3), this embedding can be extended to a
homomorphism of A/pxZ into K; combined with the natural mapping
A → A/pxZ this gives a homomorphism A → K which does not kill x. In the general
case we obtain a cogenerator by forming a coinduced extension, as follows.
   From the definition it is clear that the class of projective modules admits direct
sums, while that of injective modules admits direct products. In the Noetherian
case one can say a little more (E. Matlis, Z. Papp, 1958-59). A module will be
called uniform if it is non-zero and any two non-zero submodules have a non-
zero intersection.
Theorem 4.6.8. Let R be a ring. Then the direct sum of any family of injective left
R-modules is injective if and only if R is left Noetherian. When this is so, every
finitely generated injective left R-module is a direct sum of uniform injectives.
Proof. Suppose that R is left Noetherian, let {E_λ} be any family of injective left
R-modules and put E = ⊕_I E_λ. We have to show that any homomorphism f : 𝔞 → E
from a left ideal 𝔞 of R into E extends to a homomorphism of R into E (by Theorem
2.3.4). Since R is left Noetherian, 𝔞 is finitely generated, by u_1, …, u_r say, and the
images u_1f, …, u_rf have only finitely many non-zero components and so lie in a
submodule E' = ⊕_{I'} E_λ, where I' is a finite subset of the index set I. Thus f maps 𝔞 into
E'; as a finite direct sum of injective modules, E' is again injective, so f extends to a
homomorphism of R into E' and combining this with the inclusion E' ⊆ E we obtain
the desired extension of f.
  Conversely, assume that any direct sum of injective modules is injective and
consider an ascending chain of left ideals in R:
                          𝔞_1 ⊆ 𝔞_2 ⊆ ⋯                                           (4.6.7)
Write 𝔞 = ∪ 𝔞_n, let I_n be the injective hull of R/𝔞_n, put I = ⊕ I_n and define f : 𝔞 → I
as xf = Σ xf_n, where f_n : 𝔞 → I_n is the homomorphism induced by the natural
mapping 𝔞 → 𝔞/𝔞_n. If x ∈ 𝔞, we have x ∈ 𝔞_r for some r = r(x) and so xf_n = 0 for all
n ≥ r; hence only finitely many of the xf_n are non-zero and f is well-defined. By
hypothesis I is injective, so there is a homomorphism f' : R → I extending f. Let
1f' = c ∈ I; then c ∈ I_1 + ⋯ + I_s for some s and so f' followed by the projection
on I_n is zero for n > s. Hence the same is true of f, so any x in 𝔞 must lie in 𝔞_s,
i.e. 𝔞 = 𝔞_s. Thus (4.6.7) breaks off and this shows R to be left Noetherian.
    Clearly every indecomposable injective is uniform, so the last part follows in the
Noetherian case.                                                                      •
   The corresponding problems for projective and flat modules have been solved by
Stephen Chase [1960]. A ring R is said to be left coherent if every finitely generated
left ideal in R is finitely related. Now Chase proves (i) the direct product of any
family of flat left R-modules is flat iff R is left coherent, and (ii) the direct product
of any family of projective left R-modules is projective iff R is right perfect and left
coherent.
   For commutative Noetherian rings we can describe the indecomposable injective
modules in terms of prime ideals of the ring. We recall from BA, Section 10.8 that a
prime ideal p of a commutative ring R is meet-irreducible (this is easily verified
directly), hence E(R/p), the injective hull of R/p, is indecomposable. Further we
recall that a maximal annihilator of a module M is a prime ideal; the set of all
prime ideals which form annihilators of elements of M is just Ass(M), and this
consists of a single element p precisely when 0 is primary in M.
  Although this result has been extended for certain non-commutative rings, there is
Let R be any ring and M a right R-module such that for any u ∈ M and any a ∈ R,
the equation u = va has a solution v in M whenever (4.6.8) holds. In that case M is
said to be 1-divisible. Over an integral domain this reduces to the notion of a
divisible module and the proof of Proposition 4.7.8 of BA can be adapted to obtain
Proposition 4.6.10. Over any ring R, an injective module is 1-divisible; over a principal
ideal domain the converse holds too.
Proof. This will follow from the more general result in Theorem 4.6.11 below.           •
   In general, being 1-divisible is of course not sufficient for injectivity, but a
necessary and sufficient condition is now easily obtained. We recall that for any index set
I, M^I denotes the direct product of copies of M indexed by I, while ^IM denotes the
direct sum of I copies. We shall visualize the elements of M^I and ^IM as rows and
columns respectively; thus for any u ∈ M^I, x ∈ ^IR we can form ux = Σ u_i x_i, because
almost all components of x are 0.
   A right R-module M is called fully divisible if it satisfies the following
generalization of (4.6.8):
   Given any set I, if u ∈ M^I, a ∈ R^I are such that
                          ax = 0 ⇒ ux = 0   for all x ∈ ^IR,                     (4.6.9)
then there exists v ∈ M such that u = va.
Theorem 4.6.11. A module M over a ring R is injective if and only if it is fully divisible.
Proof. The necessity follows easily: given u ∈ M^I, a ∈ R^I satisfying (4.6.9), let 𝔞 be
the right ideal generated by the components of a = (a_i) and define a homomorphism
𝔞 → M by
                                   Σ a_i x_i ↦ Σ u_i x_i.
generating set of 𝔞. Suppose we have a homomorphism f : 𝔞 → M and let a_i f = u_i.
If the x_i ∈ R are such that Σ a_i x_i = 0, then Σ u_i x_i = Σ (a_i f)x_i =
(Σ a_i x_i)f = 0, hence (4.6.9) holds and by full divisibility there exists v ∈ M such
that u_i = va_i. Now the map x ↦ vx of R into M extends f, because a_i ↦ va_i = u_i,
hence Σ a_i x_i ↦ Σ u_i x_i for any family (x_i) ∈ ^IR. By Baer's criterion it follows that
M is injective.                                                                      •
   Let us call M finitely divisible if for any integer n ≥ 1 and any u ∈ M^n, A ∈ R_n
satisfying the condition
                            Ax = 0 ⇒ ux = 0 for all x ∈ ^nR,                     (4.6.10)
there exists v ∈ M^n such that u = vA. Essentially this states that M^n is 1-divisible, as
R_n-module, for all n. Then we have
Corollary 4.6.12. A right R-module M over a right Noetherian ring R is injective if and
only if it is finitely divisible.
Proof. The necessity is clear; to prove the sufficiency, we observe that by Morita
equivalence we have for all n ≥ 1,
the bottom row is exact; hence so is the top row. Now the result follows again by
Baer's criterion.                                                              •
Exercises
 1. Show that a module is flat whenever every finitely generated submodule is
    contained in a flat submodule.
 2. Show that a flat module over an integral domain is torsion-free, and that the
    converse holds over a principal ideal domain.
 3. Show that a direct sum of modules is flat iff each term is flat.
 4. Use Proposition 4.6.6 to show that M_R is flat iff the canonical map M ⊗ 𝔞 → M
    is injective for every finitely generated left ideal 𝔞 of R. Hence obtain another
    proof that (b) ⇔ (d) in Theorem 4.6.2.
 5. Show that every finitely related module is a direct sum of a free module and a
    finitely presented module.
(x ⊗ y)(a ⊗ b) = ax ⊗ yb.
Similarly A itself is a right A^e-module with multiplication rule x(a ⊗ b) = axb, and
the multiplication mapping μ : A ⊗ A → A defined by (x ⊗ y)μ = xy is an
A^e-module homomorphism. Its kernel is again denoted by Ω, as in Section 2.7, so
that we have an exact sequence
                          0 → Ω → A ⊗ A -μ→ A → 0.                       (4.7.1)
Proposition 4.7.2. Let e ∈ A^e be such that eμ = 1. Then e is a separator for A if and
only if (ker μ)e = 0.
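   For the full matrix ring the two defining properties of a separator, eμ = 1 and ae = ea, can be checked directly for the element e = Σ e_i1 ⊗ e_1i built from matrix units. The following sketch assumes n = 2 and integer entries; it stores an element Σ p ⊗ q of A ⊗ A as a list of pairs (p, q) and compares elements through their coordinates in the basis e_ij ⊗ e_kl. The helper names unit, tensor_coords and mu are illustrative only.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def unit(n, i, j):
    """The matrix unit e_ij (0-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def tensor_coords(pairs, n):
    """Coordinates of sum p (x) q with respect to the basis e_ij (x) e_kl."""
    return [sum(p[i][j] * q[k][l] for p, q in pairs)
            for i in range(n) for j in range(n)
            for k in range(n) for l in range(n)]

def mu(pairs, n):
    """The multiplication map: sum p (x) q  ->  sum pq."""
    total = [[0] * n for _ in range(n)]
    for p, q in pairs:
        pq = matmul(p, q)
        total = [[total[r][c] + pq[r][c] for c in range(n)] for r in range(n)]
    return total

n = 2
e = [(unit(n, i, 0), unit(n, 0, i)) for i in range(n)]   # sum of e_i1 (x) e_1i
identity = [[1, 0], [0, 1]]
a = [[1, 2], [3, 4]]

e_mu_is_one = mu(e, n) == identity
a_e = tensor_coords([(matmul(a, p), q) for p, q in e], n)   # ae = sum ap (x) q
e_a = tensor_coords([(p, matmul(q, a)) for p, q in e], n)   # ea = sum p (x) qa
print(e_mu_is_one, a_e == e_a)
```

   Since both ae and ea are linear in a, checking the identity on the matrix units (or, as here, on one sample matrix together with the general index computation) is enough to see why this e works for every n.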
(4.7.2)
where M is regarded as left A^e-module. Secondly we define the n-th cohomology group
of M as the derived functor of M ↦ Hom_{A^e}(A, M):
where M is regarded as right A^e-module. These groups were first introduced (when K
is a field) by Gerhard Hochschild in 1945 and are called the Hochschild groups of M
and A.
As one might expect, f satisfies a factor set condition, derived from the associative
law. Write (4.7.5) as (ab)γ = (aγ)(bγ) + f(a, b); then (ab.c)γ = (ab)γ.cγ +
f(ab, c) = aγ.bγ.cγ + f(a, b)c + f(ab, c). Similarly, (a.bc)γ = aγ.bγ.cγ + af(b, c)
+ f(a, bc), hence
                       f(a, b)c + f(ab, c) = af(b, c) + f(a, bc);
this shows f to be a cocycle. The extension splits iff there is a section γ for β which is
a homomorphism. This means that there is a mapping g : A → N such that γ,
defined by aγ = (a, −g(a)) is a homomorphism, i.e. by (4.7.6),
   In a similar way the first cohomology group describes the module extensions.
Given two right A-modules U, V, we can form HomK(U, V), which inherits the
right A-module structure from V and a left A-module structure from U and a veri-
fication, as in Section 2.3, shows that we have an A-bimodule structure. We now
have the following result:
Theorem 4.7.4. Let A be a K-algebra and U, V right A-modules, where A and U are
projective as K-modules. Then
                          H^n(A, Hom_K(U, V)) ≅ Ext_A^n(U, V).                      (4.7.7)
Proof. We start from the situation (_AA_A, _KU_A, _KV_A). As we have just seen, there is a
natural A-bimodule structure on Hom_K(U, V) and as is easily verified, we have a
natural isomorphism (essentially by adjoint associativity)
(4.7.8)
form A ⊗_K F_n ⊗_K A, where F_n is a free K-module. Now the tensor product (over K)
of free K-modules is free, hence the tensor product of projective K-modules
is projective, so X_n = (A ⊗_K F_n) ⊗_K A is right A-projective and U ⊗_A X_n =
(U ⊗_K F_n) ⊗_K A is right A-projective. By (4.7.8) we obtain an isomorphism of
complexes
                  Hom_{A^e}(X, Hom_K(U, V)) ≅ Hom_A(U ⊗_A X, V).
The complex on the left has cohomology groups Ext^n_{A^e}(A, Hom_K(U, V)) ≅
H^n(A, Hom(U, V)). On the right, since clearly Tor_n^A(U, A) = 0 for n ≥ 1, U ⊗_A X
is a resolution for U. As we saw, it is A-projective, and so we obtain for its
cohomology group Ext_A^n(U, V) and (4.7.7) follows.                                       •
   We observe that the hypotheses of Theorem 4.7.4 on A and U are satisfied when K
is a field. The results of Theorem 4.7.4 and Proposition 4.7.3 can now be combined
to establish one of the main theorems on the splitting of algebra extensions:
B = C + I, C ∩ I = I².
Now C/I² = C/(I ∩ C) ≅ (C + I)/I = B/I satisfies the same hypothesis; since I is
nilpotent, I² ⊂ I, hence C ⊂ B and applying induction again, we find a subalgebra
A of C such that C = A ⊕ I². Now B = C + I = A + I² + I = A + I and
A ∩ I = A ∩ C ∩ I = A ∩ I² = 0, therefore (4.7.9) holds.                       •
   In particular, taking I to be the Jacobson radical J(B) of B, we see that J(B) is
complemented whenever B/J(B) is separable. In that case the result can be proved more
explicitly as follows. Let e = Σ p_i ⊗ q_i be a separator for B/I. Given the cocycle
f(a, b) arising from a section, we put g(a) = Σ f(a, p_i)q_i. Then
The last two terms cancel because be = eb and the first reduces to f(a, b) because
eμ = 1; hence the right-hand side is just f(a, b) and this shows f to be a coboundary.
   It remains to determine the separable algebras. In the first place we note that every
separable algebra over a field is finite-dimensional. This follows from
The mapping (x, y) ↦ (β_λ, x)y from A° × A to A is bilinear, hence there exists
φ_λ : A^e → A such that
and for any w, z the two sides of (4.7.12) vanish for almost all λ, because this is true
of the β_λ. Thus φ_λ is a right A-module homomorphism. We claim that there exists a
family w_λ ∈ A^e (λ ∈ Λ) such that for any w ∈ A^e, Λ(w) = {λ ∈ Λ | (φ_λ, w) ≠ 0} is
finite and
(4.7.13)
This is an analogue of the dual basis lemma (BA, Lemma 4.7.5). We define
w_λ = n(u_λ) ⊗ 1_A; then for any x ∈ A°, y ∈ A,
= Σ (w_λμ)q_i(β_λ, yp_i).
Hence A is generated by the family (w_λμ)q_i, where λ ranges over the finite set Λ(ey),
but Λ(ey) ⊆ Λ(e), which is finite and independent of y. Thus we have found a finite
generating set for A.                                                                 •
  For the rest of this section we shall confine ourselves to algebras over a field. Our
aim will be to show that an algebra over a field k is separable iff it is semisimple and
remains so under all extensions of k. We begin with some generalities.
Proposition 4.7.7. (i) If A, B are separable algebras over a field k, then so are A ⊕ B
and A ⊗ B.
  (ii) Given a k-algebra A and a field extension F of k, the algebra A_F = A ⊗_k F is
separable if and only if A is.
   (iii) For any field k and any n ≥ 1, the full matrix ring k_n is separable.
Proof. (i) The separability may be described by the existence of a section λ for the
multiplication μ. If λ_A, λ_B are sections for the multiplication in A and B respectively,
then λ_A + λ_B : A ⊕ B → (A ⊗ A) ⊕ (A ⊗ B) ⊕ (B ⊗ A) ⊕ (B ⊗ B) is a section for
the multiplication in A ⊕ B, while λ_A ⊗ λ_B : A ⊗ B → A ⊗ B ⊗ A ⊗ B ≅
A ⊗ A ⊗ B ⊗ B is a section for A ⊗ B.
   (ii) Let e = Σ p_i ⊗ q_i be a separator for A; then e is still a separator for A_F, for
eμ = 1 and ae = ea continue to hold for a ∈ A_F. Conversely, if A_F is separable,
choose a basis u_i for F over k such that u_0 = 1 and write the separator for A_F as
Theorem 4.7.8. Let A be an algebra over a field k. Then A is separable if and only if A_F
is semisimple for all extension fields F of k. Moreover, when this holds, then A is
finite-dimensional over k.
Proof. A is finite-dimensional whenever it is separable or semisimple, by Proposition
4.7.6 and Wedderburn's theorem (BA, Theorem 5.2.4); so we may assume A to be
finite-dimensional in what follows. Assume A separable; then by Theorem 4.7.4 all
module extensions split, i.e. every A-module is semisimple, so A is semisimple.
Exercises
 1. Verify that a separating idempotent for A is indeed idempotent. (Hint. Use
    Proposition 4.7.2 and the fact that (1 − e)μ = 0.)
 2. Show that for any commutative ring K and any n ≥ 1, K_n is separable by
    verifying that Σ e_i1 ⊗ e_1i (where the e_ij are the usual matrix units) is a separating
    idempotent.
 3. Let A be a separable algebra over a field k. Show that any right A-module M is a
    direct summand of M ⊗_k A and hence is projective. Deduce that A is
    semisimple.
 4. Show that an algebra over a field k is separable iff it is semisimple and its centre
    is a direct product of separable field extensions of k.
 5. Show that a K-algebra A (over a commutative ring K) is separable iff the functor
    M ↦ M^A for any A-bimodule M is exact.
 6. Let k be a field of prime characteristic p and F = k(α) a p-radical extension of
    degree p, where α^p = a ∈ k. Let A be the k-algebra generated by an element u
    with the defining relation (u^p − a)² = 0. Show that J = J(A) is spanned by
    v = u^p − a and that Ā = A/J is semisimple, but A is not. Verify that A contains
    no subalgebra ≅ Ā.
 7. Show that if E/k is a separable field extension and A is a commutative k-algebra,
    then A_E is separable over A.
 8. Let K be a commutative ring and G a finite group whose order n is a unit in K.
    Show that the group algebra KG is separable, by verifying that (1/n) Σ g^{-1} ⊗ g
    is a separating idempotent.
 9. Let E be a commutative separable K-algebra. Show that for any K-algebra A,
    gl.dim(A ⊗ E) = gl.dim(A). (Hint. Use the separating idempotent of E over K
    to define an averaging operator as in Exercise 8.)
10. Use Exercise 9 to show that a K-algebra A is K-projective, i.e. an exact sequence
    of A-modules which is K-split is A-split.
14. Show that a short exact sequence of R-modules with first two terms M', M is
    pure (M' is pure in M) iff uA = p, where u ∈ M^m, p ∈ M'^n, A ∈ ^mR^n implies
    that u'A = p for some u' ∈ M'^m.
15. Show that for any R-module U (over any ring R) the correspondence
    U ↦ Û = Hom_Z(U, K) is a faithful contravariant functor. Show also that there is
    a natural transformation from U to the dual of Û which is an embedding.
16. Show that a module M is flat iff for every homomorphism α : P → M, where P
    is finitely generated projective and for every x ∈ ker α there is a factorization
    α = βγ, where β : P → Q, γ : Q → M, Q finitely generated projective, such
    that xβ = 0.
17. Let A be a K-algebra and M an A-bimodule. Show that the sequence
Skew fields are more complicated than fields and much less is known about them.
However, in the case of division algebras (the case of finite dimension over the
centre) the situation is rather better. It is convenient to include full matrix rings
over division algebras, thus our topic in Section 5.1 is essentially the class of
simple Artinian rings. Although some of our results are proved in this generality
we shall soon specialize to the finite-dimensional case over a field.
   There is no space to enter into such interesting questions as the discussion of
division algebras over number fields, but we shall in Section 5.2 introduce an impor-
tant field invariant, the Brauer group, show its significance for division algebras and
in Section 5.3 describe some of their invariants. Then, after a look at quaternion
algebras (Section 5.4), we introduce crossed products (Section 5.5). In Section 5.6
we study the effect of changing the base field, and in Section 5.7 illustrate them
on cyclic algebras.
Theorem 5.1.1 (Density theorem). The centre of any simple ring is a field. If A is a
simple ring with centre k, then A^e = A° ⊗ A is dense in the ring of k-linear
transformations of A. In particular, when [A : k] = n is finite, then
                          A^e = A° ⊗ A ≅ k_n.                                      (5.1.1)
  This result will be proved again in a more general context in Section 8.1 below.
That (5.1.1) is an isomorphism also follows from the fact, soon to be proved (in
Corollary 5.1.3) that A^e is simple, for any simple ring with centre k.
  Let A be a k-algebra, where k is a field. Then for any α ∈ k, a, b ∈ A we have
                          𝔟 ↦ A ⊗ 𝔟   (𝔟 an ideal of B)                          (5.1.2)
                          (A ⊗ 𝔟) ∩ B = 𝔟.                                        (5.1.4)
This holds even for left ideals, hence (5.1.2) is injective for any one-sided ideal 𝔟.
   Next let 𝔄 be an ideal in A ⊗ B and put 𝔟 = 𝔄 ∩ B; then A ⊗ 𝔟 ⊆ 𝔄 and we have to
establish equality here. Any c ∈ 𝔄 can be written in terms of a basis u_i of A as
c = Σ u_i ⊗ z_i, where z_i ∈ B. Only finitely many of the z_i are non-zero, say z_i ≠ 0
for i = 1, …, r. By Theorem 5.1.1, A^e acts densely on A, hence there exist x_j,
y_j ∈ A (j = 1, …, s) such that Σ_j x_j u_i y_j = δ_i1 (i = 1, …, r). It follows that
Σ x_j c y_j = Σ_ij x_j u_i y_j ⊗ z_i = 1 ⊗ z_1 ∈ 𝔄; so z_1 ∈ 𝔟 and similarly z_i ∈ 𝔟 for
i = 2, …, r. Therefore 𝔄 = A ⊗ 𝔟 and this is the desired equality. Thus (5.1.2),
(5.1.3) are mutually inverse and the lattice isomorphism follows.                   •
  We observe that no finiteness assumptions are needed here. The theorem has a
number of important consequences.
   Sometimes we shall want to know when a given division algebra over k can be
embedded in a matrix ring over a skew field D containing k in its centre. There is
a simple answer when the centre of D is a regular extension of k; it is given by the
following result, taken from Schofield (1985) (a field extension F/k is regular if
E ⊗ F is an integral domain for any field extension E/k).
Proposition 5.1.4. Let D be a skew field whose centre F is a regular extension of k, and
let A be a simple Artinian k-algebra. Then A° ⊗_k D is a simple Artinian ring, with a
unique simple module S which is finite-dimensional over D, say [S : D] = s, and A
can be embedded in 𝔐_n(D) if and only if s | n.
and A is central simple over C; hence by Theorem 5.1.2, the ideals of A° ⊗ D
correspond to those of C ⊗_k D. Next we have
  We note that in this result the regularity assumption can be omitted when A is a
central k-algebra.
  We shall want to know when a given k-algebra can be written as a tensor product.
First we shall look at the general case; in the finite-dimensional case we can then
easily obtain a complete answer.
Proof. By hypothesis, A and A' commute elementwise, so the mapping (x, y)             1-+
xy(xE A, YEA') gives rise to a homomorphism
A®A' -+ P. (5.1.5)
Its kernel is an ideal in A ® A', which by Theorem 5.1.2 is of the form A ® a, where a
is the kernel of the restriction of (5.1.5) to A'. But this is the inclusion mapping,
5.1 Simple Artinian rings                                                              183
which is injective, so $\mathfrak{a} = 0$ and (5.1.5) is injective. Clearly its image is the subalgebra
generated by A and A'.
   Suppose now that $[A : k] = n$ is finite. We can regard P as $A^e$-module, i.e. by
Theorem 5.1.1 as $k_n$-module. This module is semisimple, hence $P = \bigoplus P_\lambda$, where
$P_\lambda$ is simple, isomorphic to A. Let $u_\lambda \in P_\lambda$ correspond to 1 in this isomorphism;
then $u_\lambda a = a u_\lambda$ for all $a \in A$, hence $u_\lambda \in A'$ and so $P = AA'$. Therefore (5.1.5) is
surjective, hence an isomorphism in this case. Now the assertion about the centre
follows by Corollary 5.1.3.                                                              •
(5.1.6)
which is also easily verified directly. We remark that in general $A \otimes A'$ will be a
proper subalgebra of P; for example, if $P = k\langle x, y, z\rangle$, the free algebra on x, y, z
and A is the subalgebra generated by x, y, then P and A both have centre k and
$A' = k$, so $AA' \ne P$.
  We recall from field theory the theorem of the primitive element: A finite separ-
able extension F/k can be generated by a single element over k (see BA, Theorem
7.9.2). Of course a noncommutative algebra cannot be generated by a single element,
but as we shall now see, in many cases two elements suffice:
Proposition 5.1.6. Let D be a central division algebra over a field k and let F be a
maximal separable subfield. Then there is an element u in D such that $D = FuF$; in
particular, D can be generated by two elements over k.
Proof. Let M(D) be the multiplication algebra of D, generated by the left and right
multiplications $\lambda_a$ and $\rho_a$ respectively, for $a \in D$. Writing $D^\circ$ for the opposite ring of D,
we have a homomorphism $D^\circ \otimes D \to M(D)$ mapping $a \otimes b$ to $\lambda_a\rho_b$; it is surjective
by definition and since $D^\circ \otimes D$ is simple, it is an isomorphism. Restricting a and b
to F, we obtain a faithful action of $F \otimes F$ on D. Now F is separable, so (by BA,
Corollary 5.7.4) we have $F \otimes F \cong E_1 \times \dots \times E_n$, where $n = [F : k]$ and the $E_i$ are
fields (composites of F with itself, hence isomorphic to F). Let $e_i$ be the element
of M(D) corresponding to the unit element of $E_i$ and choose $u_i \in D$ such that
$u_i e_i \ne 0$. If we now write $u = \sum u_i e_i$, then the map $\sum a_r \otimes b_r \mapsto \sum u\lambda_{a_r}\rho_{b_r}$ is
injective, because u is not annihilated by any $E_i$. A comparison of dimensions shows that
it is also surjective, and so $D = FuF$. As a separable extension, F/k is generated by a
single element, c say; hence D is generated by u and c over k.                        •
  We next come to a basic result in the theory of central simple algebras, the
Skolem-Noether theorem, which asserts that every automorphism of a finite-
dimensional central simple algebra is inner. It is useful to have a slightly more
general form of this result:
Theorem 5.1.7. Let A be a simple Artinian ring with centre k and B any finite-
dimensional simple k-algebra. Given any homomorphisms $f_1, f_2$ from B into A, there
184                                                                  Central simple algebras
Corollary 5.1.8. In any simple Artinian ring with centre k, isomorphic finite-
dimensional simple k-subalgebras are conjugate and hence have conjugate centralizers. •
  We remark that Corollary 5.1.8 as it stands does not extend to the case of semi-
simple subalgebras (see Exercise 5).
  We next look at the relation between the dimension of a simple subalgebra and
that of its centralizer.
Theorem 5.1.10 (R. Brauer, 1932). Let A be a simple Artinian ring with centre k and
B a finite-dimensional simple subalgebra with centre P. Then the centralizer B' of B in A
is again simple with centre P and the centralizer B'' of B' equals B, while the centralizer
P' of P is given by
    $P' = B \otimes B'$.    (5.1.10)
Moreover,
    $[A : B'] = [B : k]$,    (5.1.11)
and if $[B : k] = r$, then
                                                                                  (5.1.12)
Since $B' \otimes k_r \cong B'_r$, this proves (5.1.12), and comparing dimensions over B', we find
that $r^2 = [A : B'][B : k]$, from which (5.1.11) follows on dividing by r.
  Now $A \otimes B^\circ$ is simple, hence by (5.1.12), so is B' (using Corollary 5.1.3 twice).
Clearly $B'' \supseteq B$, $B''' = B'$, and replacing B by B'' in (5.1.11), we find that
$[B'' : k] = r$, hence $B'' = B$.
  Finally, if the centre of B' is E, then $E \supseteq P$ and since $B'' = B$, we also have $P \supseteq E$,
hence E = P. Thus B, B' are central simple P-algebras, both subalgebras of P', and
now (5.1.10) follows from Proposition 5.1.5.                                         •
  We observe that $B \otimes B'$ is in general distinct from A, for the two sides have centres
P and k respectively. Only when P = k do we have P' = A and the above result then
reduces to part of Proposition 5.1.5.
Corollary 5.1.11. Let A be a simple Artinian ring with centre k and let P be a subfield of
A such that $P \supseteq k$ and $[P : k] = r$ is finite. Then
    $A \otimes_k P \cong P' \otimes_k k_r$.
If moreover, A has finite dimension n over k, then $r^2 \mid n$ and writing B = P', we have
$A \otimes_k B^\circ \cong P_{n/r}$.
Proof. This is just the case B = P of Theorem 5.1.10; here $[P' : P] = n/r^2$.        •
   We remark that a central simple algebra A may have no subfield F satisfying the
conditions of Corollary 5.1.12, e.g. $A = k_n$, where k is algebraically closed and
n > 1. Nevertheless, the dimension of a central simple algebra is always a perfect
square, for by Wedderburn's theorem, $A \cong D_n$ where D is a skew field, again with
centre k. If F is a maximal subfield of D, then by Corollary 5.1.12 applied to D we
find that $[D : k] = [F : k]^2$, hence $[A : k] = n^2[F : k]^2$. In the next section we shall
meet another proof of this important fact.
   As another application of Theorem 5.1.10 we have Wedderburn's theorem on
finite fields. This was proved in BA, Theorem 7.8.6; below is another proof. We
shall need the following remark about finite groups.
Lemma 5.1.13. Let G be a finite group and H a proper subgroup. Then G cannot be
written as the union of all the conjugates of H.
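As an illustration (not a proof), the lemma can be checked by brute force in a small case. The following Python sketch takes G to be the symmetric group on three letters, realized as permutations of {0, 1, 2}; the helper names are our own, and subgroups are found by closing pairs of elements under multiplication, which is enough for this particular group.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, n):
    """Close the generators under multiplication; in a finite group this
    already yields the subgroup they generate."""
    identity = tuple(range(n))
    elems = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return frozenset(elems)

def union_of_conjugates(G, H):
    """Return the set of all g h g^{-1} with g in G and h in H."""
    return {compose(compose(g, h), inverse(g)) for g in G for h in H}

n = 3
G = frozenset(permutations(range(n)))
# Every subgroup of this group is generated by at most two elements.
subgroups = {generated_subgroup((a, b), n) for a in G for b in G}
proper = [H for H in subgroups if len(H) < len(G)]
covers = [len(union_of_conjugates(G, H)) for H in proper]

for H, c in zip(proper, covers):
    print(f"|H| = {len(H)}: conjugates cover {c} of {len(G)} elements")
```

In each case the conjugates of H cover strictly fewer than |G| elements, as the lemma predicts; the identity is shared by all conjugates, which is what keeps the union small.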
Theorem 5.1.14 (Wedderburn's theorem on finite fields). Any finite skew field is
commutative.
Proof. Suppose that D is a finite skew field; let k denote its centre and let F be a
maximal subfield. Then F is a finite field; all maximal subfields of D have the
same degree r, say, over k (Corollary 5.1.12) and hence are isomorphic, as minimal
splitting fields of $x^{q^r} - x$ where $q = |k|$. By Corollary 5.1.8 they are conjugate to F.
Now each element of D lies in some maximal commutative subfield of D, so D is the
union of conjugates of F. It follows that the multiplicative group $D^\times$ is a finite group,
equal to the union of the conjugates of $F^\times$. By Lemma 5.1.13 this is impossible if
$F^\times$ is a proper subgroup; hence F = D and D must be commutative.                    •
5.2 The Brauer group                                                                  187
Exercises
 1. Show that every finite-dimensional central simple algebra over a finite field F has
    the form $\mathfrak{M}_n(F)$, for some $n \ge 1$.
 2. Let A be a finite-dimensional k-algebra. Show that if $A_E \cong E_n$ for some extension
    field E/k and some $n \ge 1$, then A is central simple over k.
 3. Let D be a skew field with centre k and let E be a finite-dimensional subalgebra
    (necessarily a skew field). Show that if E' is the centralizer of E, then the
    centralizer of E' is E and $[D : E'] = [E : k]$.
 4. Let C be a finite-dimensional k-algebra and A a central simple subalgebra. Show
    that $[A : k]$ divides $[C : k]$.
 5. Show that Corollary 5.1.8 no longer holds for semisimple subalgebras. (Hint.
    Take appropriate diagonal subalgebras isomorphic to $k^2$ of $k_3$.)
 6. Let R, S be k-algebras, where k is a field, and let ${}_RU$, $V_S$ be modules as indicated,
    where R acts densely on U with centralizer k. Show that for $0 \ne u_0 \in U$ the
    mapping $v \mapsto u_0 \otimes v$ embeds V in $U \otimes_k V$. Construct a lattice-isomorphism between
    S-submodules of V and (R, S)-subbimodules of $U \otimes V$.
 7. Let D be a skew field with centre k and let F be a maximal subfield of D. Show
    that $D_F$ is a dense ring of linear transformations on D as F-space. Show that if
    either $[D : F]$ or $[F : k]$ is finite, then so is the other and $D_F$ is then a full matrix
    ring over F. (Hint. Use the regular representation of D to get a homomorphism
    $D \otimes_k F \to F_n$, where n is the dimension of D as right F-space.)
 8. Let $\delta$ be a derivation of a central simple algebra A. By representing $\delta$ as an
    isomorphism of (triangular) subalgebras of $\mathfrak{M}_2(A)$ show that $\delta$ is an inner
    derivation.
 9. (Wedderburn, 1921) Let D be a skew field with centre k. Show that any two
    elements of D with the same minimal equation over k are conjugate.
10. (A. Kupferoth) Let D be a skew field with centre C and K a subfield with centre
    F. Show that $[K : F] \le [D : C]$ and when both are finite and equal then $F \subseteq C$
    and $D \cong K \otimes_F C$.
11. Show that if A is a finite-dimensional central simple k-algebra and B is any
    k-algebra, then $A \otimes B$ is semisimple iff B is.
12. Let D be a skew field with centre k and E a skew subfield with centre C. Show
    that E and the subfield generated by C and k are linearly disjoint over C.
This shows that the class (k) is the neutral element for multiplication and $(A^\circ)$ is the
inverse of (A), and it proves
Theorem 5.2.1. For any field k, the similarity classes of finite-dimensional central
simple k-algebras form an abelian group with respect to the multiplication induced by
the tensor product.
  We still need to check that the collection of all classes is actually a set; this is easily
seen if we observe that the central simple algebras are finite-dimensional over k. •
   The group so obtained is called the Brauer group of k and is written $B_k$; its
elements are the Brauer classes of central simple k-algebras. The Brauer group is
an invariant of the field k which provides information about the central division
algebras over k. Later, in Section 5.5, we shall meet another description of $B_k$ as a
cohomology group.
   As an example take an algebraically closed field F. If D is a finite-dimensional
division algebra over F, then for any $a \in D$, F(a) is a finite extension field, hence
F(a) = F because F is algebraically closed, and so $a \in F$, i.e. D = F. Thus there are
no finite-dimensional division algebras over F apart from F itself, and we conclude
that $B_F = 1$, i.e. the Brauer group of an algebraically closed field is trivial. Of course
once we drop the restriction on the dimension, we can find skew fields with centre F,
see Section 7.3 and Section 9.1.
   For a closer study of Bk we need to examine the behaviour of algebras under
ground field extension. Let A be a (finite-dimensional) k-algebra and F an extension
field of k (not necessarily finite-dimensional over k). Then the F-algebra defined by
(5.2.2)
by the above remark. It remains to show that we can replace E by a finite extension.
Let $u_1, \dots, u_n$ be a basis of A and $U_1, \dots, U_n$ the matrices over E which correspond
to the u's under the mapping (5.2.2). Then we can express the matrix units in $E_r$ as
$e_{ij} = \sum_\nu \alpha_{ij\nu}U_\nu$ for some $\alpha_{ij\nu} \in E$. Denote the finite set of entries of the $U_\nu$ and the
$\alpha_{ij\nu}$ by X. Since E is algebraic over k, X generates a finite extension F of k and it is
clear that $A_F \cong F_r$. Since A is embedded in $A_F$, it can also be embedded in $F_r$. •
   We note that this result could not be deduced from Proposition 5.1.4, because
here F/k is never regular. Proposition 5.2.2 provides no explicit bound on [F: k],
but we shall soon meet such a bound, in Corollary 5.2.7.
   For any central simple k-algebra A the integer $\sqrt{[A : k]}$ is called the degree of A,
written Deg A. If $A \cong D \otimes k_m$, then Deg A = m(Deg D); clearly the degree of D is
also an invariant of A, called the Schur index, or simply the index of A. It is a measure
of how far A deviates from being a full matrix algebra over k. We note that the index
is an invariant of the Brauer class of A, while the degree determines A up to
isomorphism within its Brauer class.
   If F is an extension field of k, then the mapping $A \mapsto A_F$ induces a mapping of
Brauer classes, for if $A \cong D \otimes k_m$, then $A_F \cong D \otimes k_m \otimes F \cong D \otimes F_m$, so that
$(A_F) = (D_F)$. The mapping is a homomorphism, for $(A \otimes_k B)_F \cong A \otimes_k B \otimes_k F \cong
A_F \otimes_F B_F$, so we have a group homomorphism
Its kernel $B(F/k)$, the relative Brauer group, consists of those classes (A) over k for
which $A_F \cong F_m$ for some m. Such a class is said to be split by F and F is called a
splitting field for this class, or for the algebra A. If $A_F = A \otimes F \cong F_m$, then on
taking anti-isomorphisms, we find that $A^\circ \otimes F \cong F_m$, hence F splits A iff it splits
$A^\circ$. Any central simple k-algebra has a splitting field, which may be taken of finite
degree over k, by Proposition 5.2.2.
   Next we examine more closely which fields split a given Brauer class, but first we
establish a relation between the indices of the extensions.
Proposition 5.2.3 (Index reduction lemma, A. A. Albert). Let F/k be a finite field
extension, say $[F : k] = r$, and A a central simple k-algebra. If A, $A_F$ have indices m,
$\mu$, then $\mu \mid m$ and $m \mid \mu r$.
(5.2.3)
where G is the skew field component of B (in fact $G \cong D$ by uniqueness). Comparing
dimensions, we find that r = qs, thus $m/\mu = q \mid r$, i.e. $m \mid \mu r$.                    •
  The factor q in $m = \mu q$ is called the index reduction factor; we note that q = r
whenever F is isomorphic to a subfield of D.
  The next corollaries are immediate consequences of Proposition 5.2.3.
Corollary 5.2.4. Let A be a central simple k-algebra and F/k a field extension. If $[F : k]$
is prime to the index of A, then A and $A_F$ have the same index. In particular, if A is a
division algebra of degree prime to $[F : k]$, then $A_F$ is again a division algebra.    •
Corollary 5.2.5. The degree of a splitting field (over k) of a central simple k-algebra A is
divisible by the index of A.                                                             •
  We now obtain a criterion for a given finite extension field of k to split a given
Brauer class over k.
Theorem 5.2.6. Let $w \in B_k$ and let F be an extension field of k, of degree r. Then F splits
w if and only if some algebra in w contains F as a maximal subfield; this algebra neces-
sarily has degree r.
Proof. Let D be the division algebra in the class w and let n be the least integer such
that F can be embedded in $D_n$. Then the centralizer F' of F in $D_n$ is a skew field, by
the minimality of n, and F is the centre of F'. By Corollary 5.1.11,
(5.2.4)
Since F' has centre F, this shows that F splits D iff F' = F, i.e. iff F is a maximal
subfield of $D_n$.                                                                            •
Corollary 5.2.7. Let D be a central division algebra over k. Then any maximal subfield
of D is a splitting field for D.
Proof. By Corollary 5.1.12 any maximal subfield F satisfies $[D : k] = [F : k]^2$, so the
theorem may be applied.                                                            •
   The next step is to show that the splitting field of a central simple algebra can
always be taken to be a separable extension of the ground field. We prove more
than this, namely that we can actually find a separable splitting field in the skew
field component. More precisely, the proof below shows that every algebraic skew
field extension contains a separable element.
Theorem 5.2.8 (Köthe, 1932). Every central division algebra D over k contains a
maximal commutative subfield (hence splitting D) which is separable over k.
Proof. (Herstein) Clearly we may assume that char $k = p \ne 0$. Our first task will be
to find a separable extension F of k in D. Since $[D : k]$ is finite, each element of D is
algebraic over k; if some $a \notin k$ is separable over k, then k(a) is the required extension.
Otherwise there are no separable extensions of k, so each element of D is p-radical
over k, say $x^{p^r} \in k$ for some r = r(x). Hence we can find $a \notin k$ such that $a^p \in k$.
Denote by $\delta$ the inner derivation induced by a, $\delta : x \mapsto xa - ax$. Then $x\delta^p =
xa^p - a^px = 0$, but of course $\delta \ne 0$, because $a \notin k$. Choose $b \in D$ such that
$c = b\delta \ne 0$, $b\delta^2 = 0$. Then $c\delta = 0 = a\delta$, so if $u = bc^{-1}a$, then $u\delta = cc^{-1}a = a$.
Writing this out, we have $ua - au = a$, i.e. $u = 1 + aua^{-1}$, but $u^q \in k$ for some
$q = p^s$, hence $u^q = 1 + (aua^{-1})^q = 1 + u^q$ (because $u^q \in k$). We obtain 1 = 0, a
contradiction. It follows that D contains a proper separable extension.
   Taking a separable extension of maximal degree in D, we obtain a maximal separ-
able extension F. By Theorem 5.1.10, its centralizer F' is simple with centre F, but as
centralizer in a division algebra F' is itself a division algebra, so if F' =1= F, then F has
a proper separable extension E, by the first part. But then E is separable over k, which
contradicts the maximality of F. Hence F' = F, i.e. F is a maximal subfield and by
Corollary 5.2.7, a splitting field of D, separable by construction.                        •
   We remark that the result holds for any skew fields that are algebraic over k but
not necessarily finite-dimensional. For we can use Zorn's lemma instead of a dimen-
sion argument to obtain a maximal separable extension.
Corollary 5.2.9. Every Brauer class of k has a splitting field which is a finite Galois
extension of k.
Proposition 5.2.10. If C, D are two central division k-algebras of coprime degrees, then
$C \otimes D$ is again a division algebra.
  Our next result provides a description of the index of a tensor product, even when
only one of the factors has finite degree. We remark that if R is a simple Artinian ring
and $R_r \cong D_n$, where D is a skew field, then $r \mid n$, say n = rs and $R \cong D_s$. For by
Wedderburn's theorem, R has the form $R \cong K_s$ where K is a skew field. It follows
that $K_{rs} \cong D_n$ and by uniqueness, n = rs and $K \cong D$, and so $R \cong D_s$.
Theorem 5.2.11. Let F/k be a field extension of degree d, and let C, D be skew fields
with centres k, F respectively. If either C or D is of finite degree, then
                                                                                      (5.2.5)
      In particular, in case (i) $q \mid r^2$ and $C \otimes D$ is a skew field if and only if $q = r^2$, while
      in case (ii) $n \mid s^2d$ and $C \otimes D$ is a skew field if and only if $n = s^2d$.
(iii) When both C, D have finite degrees r, s respectively, and n, q are as in (i), (ii), then
      n, q are related by the equation
(5.2.8)
From (i), (ii) it is clear that r, s, d are independent, while n, q are related as in (5.2.8),
and m, t are determined by (5.2.9) in terms of r, s, d, n, q.
Proof. The algebra $C \otimes D$ is simple with centre F, by Theorem 5.1.2. If C has finite
degree, then $C \otimes D$ is finite-dimensional over D; if D has finite degree, $C \otimes D$ is
finite-dimensional over C. In either case it is Artinian and by Wedderburn's theorem
it has the form $G_m$ for some m, where G is a skew field with centre F.
   (i) Suppose now that Deg C = r; then C and hence $C^\circ$ can be embedded in $k_{r^2}$
and hence in $D_{r^2}$. Let q be the least integer such that $C^\circ$ can be embedded in $D_q$ as
k-algebra. Then by Proposition 5.1.5, $D_q \cong C^\circ \otimes_k E$, where E is a simple algebra
with centre F, by Theorem 5.1.2. Moreover, E is Artinian, for if $\mathfrak{a}$ is a left ideal,
then $C^\circ \otimes \mathfrak{a}$ is a left D-space, hence the length of chains of left ideals is bounded
by q. Thus E is a matrix ring over a skew field. Taking Brauer classes we have
(E) = (C)(D) = (G), therefore $E \cong G_h$ for some h, but if h > 1, we can replace q
by q/h, which contradicts the minimality of q. Hence h = 1 and $D_q \cong C^\circ \otimes G$. It
follows that $D_{qm} \cong C^\circ \otimes G_m \cong C^\circ \otimes C \otimes D \cong k_{r^2} \otimes D \cong D_{r^2}$ and so $qm = r^2$;
thus (5.2.6) is established.
   (ii) Next assume that Deg D = s; then D and with it $D^\circ$ can be embedded in $F_{s^2}$
and hence in $k_{s^2d}$. Let n be the least integer for which $D^\circ$ can be embedded in $C_n$ and
denote by H the centralizer of $D^\circ$ in $C_n$. Then by Theorem 5.1.10, $F' \cong D^\circ \otimes_F H$,
where F' is the centralizer of F, and H is simple with centre F, while
$C_n \otimes_k D \cong H_{s^2d}$, by (5.1.12). Comparing this relation with (5.2.5), we see that
$s^2d = mn$ and $H \cong G$. Finally when both C and D have finite degrees, then by
combining (5.2.6) and (5.2.7) we see that $m = r^2/q = s^2d/n$, therefore $nr^2 = qs^2d$,
and if Deg G = t, then a comparison of degrees in (5.2.5), (5.2.6) yields
$t = sq/r = rn/sd$.                                                                 •
Corollary 5.2.12. Let C be a skew field with centre k and F a finite extension of k. Then
$C_F \cong G_m$ for some skew field G, where m divides $[F : k]$, with equality if and only if F
can be embedded in C.                                                               •
   3 and 4 above (see Weil (1967) or Reiner (1976)). More precisely, we have an exact
   sequence (Hasse reciprocity)
Exercises
1. Define Brauer classes for any central simple algebra, not necessarily finite-
   dimensional, and show that these classes form a monoid whose group of units
   is $B_k$.
2. In Proposition 5.2.3 show that q is the degree of the largest subfield common to F
   and D.
3. Show that any skew field p-radical over its centre is commutative.
4. Prove Theorem 5.2.8 in detail for skew fields algebraic over k.
5. Let D be a central division k-algebra. For any automorphism $\alpha$ of D as a skew field
   define the inner order as the least r such that $\alpha^r$ is inner. Show that the inner order
   of any such $\alpha$ divides the order of the restriction $\alpha|k$.
6. Show that a central simple algebra of degree n is split iff it has a left ideal of
   dimension n.
Here $\rho : A \to \mathfrak{M}_n(k)$ is the regular representation and the norm and trace are
defined in terms of it by
Let $e_{ij}$ be the standard basis of matrix units for $F_n$ and write $a \in A$ as $a = \sum a_{ij}e_{ij}$.
Then the equation (5.3.1) takes the form
    $e_{rs}a = \sum_j a_{sj}e_{rj}$;
5.3 The reduced norm and trace                                                       195
hence the matrix $\rho(a)$ has as (rs, uv)-entry $(a_{sv}\delta_{ur})$ and the equations (5.3.2) become
in this case
We note that whereas $\lambda$ is the canonical mapping $a \mapsto a \otimes 1$, $\mu$ is not uniquely
determined, but the definition (5.3.5) is independent of the choice of $\mu$, for two
isomorphisms of $A_F$ with $F_n$ differ by an automorphism of $F_n$ which must be inner,
by the Skolem-Noether theorem, and so leave N, T unaffected. From the definition
N(a), T(a) lie in F, but if $\sigma$ is a k-automorphism of F, then $\sigma$ induces a
k-automorphism of $F_n$ which gives another representation $a \mapsto (a\lambda\mu)^\sigma$. Since A is a
k-algebra, $\sigma$ leaves $a \in A$ fixed and so $N(a)^\sigma = N(a)$, $T(a)^\sigma = T(a)$. This holds for
all $\sigma \in \mathrm{Gal}(F/k)$, hence $N(a), T(a) \in k$. Further, if F' is another separable splitting
field of A, we can find a Galois extension E to contain both F and F', and it follows
that F and F' give rise to the same N and T. The following familiar properties of
norm and trace are easily verified; here $[A : k] = n^2$.
R.1 $N(ab) = N(a)N(b)$, $N(\alpha a) = \alpha^n N(a)$, $N(1) = 1$, where $\alpha \in k$,
R.2 $T(a + b) = T(a) + T(b)$, $T(\alpha a) = \alpha T(a)$, $T(1) = n$,
R.3 $T(ab) = T(ba)$,
R.4 $\mathrm{Nm}(a) = N(a)^n$, $\mathrm{Tr}(a) = n \cdot T(a)$.
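For the split algebra of n x n matrices over k, where the reduced norm and trace are the ordinary determinant and trace, the last rule can be checked numerically. The Python sketch below is only an illustration under that assumption: it builds the matrix of the regular representation from the entry formula $(a_{sv}\delta_{ur})$ quoted earlier, working over the rationals, and the function names are our own.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def regular_representation(a):
    """Matrix of x -> x a on the basis e_rs, from e_rs a = sum_j a_sj e_rj:
    the (rs, uv)-entry is a_sv if u == r and 0 otherwise."""
    n = len(a)
    index = [(r, s) for r in range(n) for s in range(n)]
    return [[a[s][v] if u == r else 0 for (u, v) in index] for (r, s) in index]

a = [[2, 1], [1, 3]]            # a sample 2 x 2 rational matrix
n = len(a)
rho = regular_representation(a)

norm_regular = det(rho)          # plays the role of Nm(a)
trace_regular = trace(rho)       # plays the role of Tr(a)
print(norm_regular == det(a) ** n)       # Nm(a) = N(a)^n
print(trace_regular == n * trace(a))     # Tr(a) = n T(a)
```

The regular representation here is block diagonal with n copies of a, which is why the determinant is raised to the n-th power and the trace is multiplied by n.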
We also have a product formula for the reduced norm and trace. For a field exten-
sion F/k we shall write NF/k and TF/k for the usual norm and trace.
                                                                                 (5.3.6)
Proof. Let E be a Galois splitting field of B which also splits A. Then $A_E \cong E_n$ and
under the mapping $A \to A_E$, B becomes
    $B \otimes_k E = (B \otimes_k F) \otimes_F E \cong (B \otimes_F E) \otimes_k F \cong (E \otimes_k F)_r$.    (5.3.7)
We thus have an embedding of $(E \otimes_k F)_r$ in $E_n$, so $E_n$ is an r × r matrix ring,
$E_n \cong C_r$ where C is simple Artinian, hence $C \cong E_{n/r}$ by uniqueness.
   Since $A_E \cong E_n$, there is a unique simple right $A_E$-module $V \cong E^n$. By (5.3.7),
$B \otimes E$ is faithfully represented by endomorphisms of $U = (E \otimes F)^r$. Now $[U : E] =
r \cdot [E \otimes F : E] = rt$ and $B \otimes E$ acts on V, hence $V \cong U^s$ for some s, and a comparison
of dimensions shows that n = rst. For any $b \in B$ we have $b \otimes 1 \in B \otimes_F E \cong E_r$, so
$b \otimes 1$ is represented by an r × r matrix and $N_{B/F}(b) = \det(b \otimes 1)$, $T_{B/F}(b) =
\mathrm{Tr}(b \otimes 1)$. If we now tensor with F and consider $b \otimes 1 \otimes 1$ in $(B \otimes_F E) \otimes_k F =
(E \otimes_k F)_r$, we have
    $\det(b \otimes 1 \otimes 1) = N_{F/k}(N_{B/F}(b))$,    $\mathrm{Tr}(b \otimes 1 \otimes 1) = T_{F/k}(T_{B/F}(b))$,
and since $V \cong U^s$, we obtain (5.3.6).                                                    •
  We note the special case B = F:
Corollary 5.3.2. Let A be a central simple k-algebra of degree n and F a subfield of A, of
degree t over k. Then $t \mid n$, say n = st, and for any $a \in F$,
    $N_{A/k}(a) = N_{F/k}(a)^s$,    $T_{A/k}(a) = s \cdot T_{F/k}(a)$.                       •
  Let us denote the group of units of A by U(A). The reduced norm defines a
mapping $U(A) \to k^\times$ which by R.1 above is a homomorphism. Since k is commutative,
the commutator subgroup U(A)' is mapped to 1 in this homomorphism. Let us
define the Whitehead group of A as
(5.3.8)
  The exceptions in Lemma 5.3.4 are treated in Exercises 1 and 2. We next describe
the diagram resulting from an algebra homomorphism.
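Theorem 5.3.5. Let F/k be a field extension and let A, B be simple algebras with centres
k, F respectively. If there is a k-algebra homomorphism $\theta : A \to B$, then Deg B =
d·Deg A for some $d \ge 1$, and there are homomorphisms such that the diagram
$$\begin{array}{ccccccccccc}
1 & \to & SK_1(A) & \to & K_1(A) & \to & k^\times & \to & \operatorname{coker}\nu_{A/k} & \to & 1\\
 & & \downarrow & & \downarrow{\scriptstyle K_1(\theta)} & & \downarrow & & \downarrow & & \\
1 & \to & SK_1(B) & \to & K_1(B) & \to & F^\times & \to & \operatorname{coker}\nu_{B/F} & \to & 1
\end{array} \qquad (5.3.10)$$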
   We note the special case when A is any central simple k-algebra and $B = A_F$. Then
we obtain an exact commutative diagram of the form (5.3.10) with $B = A_F$, where
the mapping $k^\times \to F^\times$ is the inclusion mapping (because now d = 1). In particular,
taking F to be a splitting field of A, we have an isomorphism $K_1(A_F) \cong F^\times$, by
Lemma 5.3.4 (with the exceptions listed), hence $SK_1(A_F)$ and coker $\nu_{A_F/F}$ are then
trivial. We remark that any $\alpha \in k$ satisfies $N(\alpha) = \alpha^n$; it follows that coker $\nu$ is a
group of exponent dividing n, the index of A. It can be shown that $SK_1(A)$ has
finite exponent dividing $\prod p_i^{r_i - 1}$ where $\prod p_i^{r_i}$ is the index of A (see Draxl
(1983)). In fact for many ground fields, e.g. all algebraic number fields, it can be
shown that $SK_1(A) = 1$ and it was an open problem for many years whether
(apart from the trivial exceptions of Exercises 1 and 2) algebras with non-trivial
reduced Whitehead group exist (Tannaka-Artin problem). In 1975 Vladimir Platonov
gave examples of algebras with non-trivial reduced Whitehead group; we shall meet
some simple examples due to Peter Draxl later, in Section 7.3.
   The reduced norm can be used to show that $B_k$ is trivial for certain fields k. In any
central division algebra A of degree r we have $N(x) \ne 0$ for $x \ne 0$, and taking a basis
$u_1, \dots, u_n$ $(n = r^2)$ of A, we can write the general element of A as $x = \sum \xi_iu_i$. Now
N(x) becomes a form, i.e. a homogeneous polynomial of degree r in the $r^2$ variables $\xi_i$.
A field k is said to be quasi-algebraically closed or a $C_1$-field if every form of degree d
in n > d variables has a non-trivial zero. With this definition we have
Proof. Let k be a $C_1$-field; we have to show that there are no central division algebras
other than k. Let D be a central division algebra of degree r over k. The reduced norm
N(x) is a form of degree r in $r^2$ variables, and N(x) = 0 has no non-trivial solutions,
hence $r^2 \le r$, so r = 1 and D = k, as claimed.                                      •
 An obvious example of $C_1$-fields is provided by the algebraically closed fields; we shall
soon meet other examples. For the moment we note a reduction that is sometimes useful:
Proof. Let F/k be an extension of degree r and take a basis $v_1, \dots, v_r$ of F over k.
If $f(x_1, \dots, x_n)$ is a form of degree d < n with coefficients in F, let us write
$x_\lambda = \sum \xi_{\lambda i}v_i$ and consider
   Let us show that every finite field is $C_1$. By Proposition 5.3.7 we can limit ourselves
to $\mathbf{F}_p$, but that is no easier. We shall need a formula for power sums in $\mathbf{F}_q$:
$$S_m = \sum_{x \in k} x^m = \begin{cases} -1 & \text{if } q - 1 \mid m, \\ 0 & \text{otherwise.} \end{cases}$$
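The power-sum formula is easy to test numerically. The following Python sketch is a check rather than a proof, and makes two simplifying assumptions of our own: it treats only prime fields (q = p, so that arithmetic is just reduction mod p) and only exponents m ≥ 1.

```python
def power_sum(m, p):
    """Return S_m = sum of x^m over x in F_p, reduced mod p (m >= 1)."""
    return sum(pow(x, m, p) for x in range(p)) % p

def predicted(m, p):
    """Value given by the formula: -1 (that is, p - 1) if (p - 1) | m, else 0."""
    return (p - 1) if m % (p - 1) == 0 else 0

for p in (3, 5, 7, 11):
    for m in range(1, 3 * (p - 1) + 1):
        assert power_sum(m, p) == predicted(m, p)
print("power-sum formula confirmed for p = 3, 5, 7, 11")
```

The underlying reason is that the multiplicative group of the field is cyclic of order q − 1, so the sum of x^m over the non-zero x is a geometric sum that vanishes unless x^m is identically 1.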
In particular, if f has zero constant term, then it has a non-trivial zero.
Proof. For each $x \in k^n$ we have
$$1 - f(x)^{q-1} = \begin{cases} 1 & \text{if } x \in V(f), \\ 0 & \text{if } x \notin V(f). \end{cases}$$
Thus $1 - f(x)^{q-1}$ is the characteristic function of V(f), and summing over all
points x of V(f), we find
(5.3.12)
If $\nu_i = 0$ for some i, then $S_{\nu_i} \equiv 0 \pmod q$ and we get zero, so we may assume that
$\nu_i > 0$ for $i = 1, \dots, n$. But by Lemma 5.3.8, $S_m = 0$ unless $q - 1 \mid m$, and since
$\sum \nu_i \le d(q - 1) < n(q - 1)$, it follows that some $\nu_i$ is not divisible by q − 1. So in
any case the sum in (5.3.12) is zero and (5.3.11) follows.
   Moreover, if f(0) = 0, then the number of non-zero roots of f = 0 is
$\equiv -1 \pmod p$, hence it is non-zero.                                              •
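The conclusion of this proof can be observed directly in small cases. The display (5.3.11) is not reproduced here, so the sketch below assumes it is the usual congruence: the number of zeros of a polynomial of degree d < n in n variables over a finite field of characteristic p is divisible by p. The two sample forms and the function names are our own choices, and only prime fields are treated.

```python
from itertools import product

def count_zeros(f, n, p):
    """Number of points x in (F_p)^n with f(x) = 0."""
    return sum(1 for x in product(range(p), repeat=n) if f(*x) % p == 0)

def quadratic_form(x, y, z):
    """A form of degree 2 in 3 variables."""
    return x * x + y * y + z * z

def cubic_form(w, x, y, z):
    """A form of degree 3 in 4 variables."""
    return w**3 + 2 * x**3 + 3 * y**3 + x * y * z

results = {}
for p in (2, 3, 5, 7):
    results[p] = (count_zeros(quadratic_form, 3, p),
                  count_zeros(cubic_form, 4, p))
    print(p, results[p])
```

Since each form vanishes at the origin and the total count is a multiple of p, there must be at least p − 1 further zeros, which is the non-trivial zero required of a $C_1$-field.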
Theorem 5.3.11 (Tsen's theorem). Let k be an algebraically closed field and F a field
of functions in one variable over k. Then $B_F = 0$.
Proof. F is a finite algebraic extension of the rational function field k(t). By
Proposition 5.3.7 it will be enough to show that k(t) is a $C_1$-field. Let $f(x_1, \dots, x_n)$ be a
polynomial over k(t), homogeneous of degree d < n. We shall show that f(x) = 0 has a
solution when the $x_i$ are polynomials in t. Write
    $x_i = \xi_{i0} + \xi_{i1}t + \dots + \xi_{ir}t^r$.
The coefficients of f are rational functions of t and on multiplying f by an element
of k(t) we may take them to be polynomials in t, of degree $\le k$, say. Then
Exercises
1. Show that for $A = \mathfrak{M}_2(\mathbf{F}_2)$, $SK_1(A) = K_1(A) = C_2$, the cyclic group of order 2.
2. Show that for $A = \mathfrak{M}_2(\mathbf{F}_3)$, $SK_1(A) = C_3$, $K_1(A) = C_6$. (Hint. Verify that the
                                       1     i     j     k
                                 1     1     i     j     k
                                 i     i    -1     k    -j                       (5.4.1)
                                 j     j    -k    -1     i
                                 k     k     j    -i    -1
$u^2 = a$, $v^2 = b$, $vu = -uv$. (5.4.2)
$u^2 = a$, $v^2 + v = b$, $vu = uv + u$. (5.4.3)
It is easily checked that in each case the quaternion algebra is central simple; hence by
Wedderburn's theorem, it is either a division algebra or it is split, i.e. a full 2 x 2
matrix ring over k.
    Let A be a quaternion algebra. Then any element $\alpha$ not in k is quadratic over k;
its equation may be written
    $\alpha^2 - t(\alpha)\alpha + n(\alpha) = 0$,    (5.4.4)
where $t(\alpha)$ and $n(\alpha)$ are the trace and norm respectively. Explicitly, if $\alpha = t + xu +
yv + zuv$, then for char $k \ne 2$,
    $t(\alpha) = 2t$,  $n(\alpha) = t^2 - x^2a - y^2b + z^2ab$.
This is most easily seen by observing that A has an involution, i.e. an anti-
automorphism whose square is 1, $\alpha \mapsto \bar\alpha$, such that $t(\alpha) = \alpha + \bar\alpha$, $n(\alpha) = \alpha\bar\alpha$.
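The trace and norm can be checked by direct computation in characteristic not 2. The following Python sketch is an illustration with names of our own: it implements multiplication in (a, b; k) over the rationals from the relations $u^2 = a$, $v^2 = b$, $vu = -uv$ (so that $(uv)^2 = -ab$), and verifies both the quadratic equation and the identity $n(\alpha) = \alpha\bar\alpha$ for one sample element.

```python
from fractions import Fraction

class Quaternion:
    """t + x*u + y*v + z*uv in (a, b; k), with u^2 = a, v^2 = b, vu = -uv."""

    def __init__(self, a, b, t, x, y, z):
        self.a, self.b = Fraction(a), Fraction(b)
        self.coeffs = tuple(Fraction(c) for c in (t, x, y, z))

    def __mul__(self, other):
        a, b = self.a, self.b
        t1, x1, y1, z1 = self.coeffs
        t2, x2, y2, z2 = other.coeffs
        # Uses u*uv = a v, uv*u = -a v, v*uv = -b u, uv*v = b u, (uv)^2 = -ab.
        return Quaternion(
            a, b,
            t1 * t2 + a * x1 * x2 + b * y1 * y2 - a * b * z1 * z2,
            t1 * x2 + x1 * t2 - b * y1 * z2 + b * z1 * y2,
            t1 * y2 + y1 * t2 + a * x1 * z2 - a * z1 * x2,
            t1 * z2 + z1 * t2 + x1 * y2 - y1 * x2,
        )

    def conjugate(self):
        """The involution: negate the coefficients of u, v and uv."""
        t, x, y, z = self.coeffs
        return Quaternion(self.a, self.b, t, -x, -y, -z)

    def trace(self):
        return 2 * self.coeffs[0]

    def norm(self):
        t, x, y, z = self.coeffs
        a, b = self.a, self.b
        return t * t - a * x * x - b * y * y + a * b * z * z

q = Quaternion(2, 3, 1, 2, 3, 4)        # a sample element of (2, 3; Q)
square = (q * q).coeffs
# q^2 - t(q) q + n(q) should vanish coefficient by coefficient.
residual = tuple(
    s - q.trace() * c + (q.norm() if i == 0 else 0)
    for i, (s, c) in enumerate(zip(square, q.coeffs))
)
print(residual == (0, 0, 0, 0))
print((q * q.conjugate()).coeffs == (q.norm(), 0, 0, 0))
```

Note that the coefficient of $z^2$ in the norm is $+ab$, reflecting $(uv)^2 = -ab$; the check of the quadratic equation fails if that sign is reversed.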
  Our first result shows that the quaternion algebras effectively include all four-
dimensional division algebras.
Theorem 5.4.1. Let k be any field. Then any central simple k-algebra A possessing a
two-dimensional splitting field is either split or a quaternion algebra.
As a consequence we have
Corollary 5.4.2 (Frobenius' theorem, 1886). The only division algebras over the real
field are R, C and H, the Hamilton quaternions.
Proof. Let D be a division algebra over R. Since C is the only proper algebraic field
extension of R, if $D \ne$ R, C, then it must be non-commutative. Let F be a maximal
subfield of D; F is a proper extension of R, hence $F \cong$ C and by Corollary 5.2.7, F is
a splitting field of D. Thus D is a quaternion algebra, (a, b; R) say, by Theorem 5.4.1.
If a or b is positive, it is easily seen to split, hence a, b < 0 and on dividing the basis
elements u by $\sqrt{-a}$ and v by $\sqrt{-b}$, we reach the form (−1, −1; R).                  •
In general (a, b; k) may be split; conditions for this to happen are given by
where $u^2 = 1$, $v^2 = -1$. It is easily checked that this map preserves the defining relations
of H and so defines a homomorphism from H to $k_2$. It is clearly surjective, and
so it is an isomorphism by a comparison of dimensions.
   (b) ⇒ (c) is clear, as is (c) ⇒ (d), for if n(x) ≠ 0, then x has an inverse, as we see
from (5.4.4).
5.5 Crossed products                                                                   203
  (d) ⇒ (e). If b is a square in k, then $k(\sqrt{b}) = k$ and the conclusion follows. Otherwise
take q ≠ 0 with n(q) = 0; on writing q = t + xu + yv + zuv, we have
$0 = n(q) = t^2 - ax^2 - by^2 + abz^2$, hence
                                 ~ + b( 1:.. )2 = (~ )2;
                                 a       ax        ax
changing variables, we have $z^2 - by^2 = a^{-1}$. Taking the basis of H to be 1, i, j, k, we
put u = zi + yk; then $u^2 = az^2 - aby^2 = a(z^2 - by^2) = 1$. Thus $u^2 = 1$; further, we
have ju = −uj, so if $v = [(1 - b) + (1 + b)u]j/2b$, then uv = −vu and $v^2 = -1$.
This shows that H = (1, −1; k).                                                      •
Exercises
1. Verify directly that (1, 1; k) is split.
2. Show that in the Hamilton quaternions the equation x 2 + 1 = 0 has infinitely
   many solutions, all conjugate.
3. Show that two elements of a quaternion algebra satisfying the same irreducible
   equation either commute or are conjugate. Deduce that every quaternion of
   norm 1 is a commutator, i.e. SKI (H) = 1.
4. Show that in characteristic 2, (a, b; k] splits iff a = N(α) for some $\alpha \in k(\wp^{-1}(b))$,
   where $\wp(x) = x^p - x$.
5. Show that (a, b; k) is multiplicative in each factor, i.e. (a, b; k) ⊗
   (a', b; k) ∼ (aa', b; k) and similarly for the other factor. Likewise for (a, b; k]
   when char k = 2.
6. Show that if H ≅ (a, b; k) is not split but is split by $k(\sqrt{a'})$, then H ≅ (a', b'; k)
   for suitable b' ∈ k.
7. (A. A. Albert, P. K. Draxl) Show that if (a', b'; k) ⊗ (a'', b''; k) is similar to a
   quaternion algebra (a, b; k), then there exist e', e'', d ∈ k such that (a', b'; k) ≅
   (e', d; k) and (a'', b''; k) ≅ (e'', d; k). Deduce that a tensor product of two
   quaternion algebras H, K is split iff H, K have a common splitting field which
   is separable quadratic over k.
   It is easy to see that every Brauer class contains a crossed product: if D is a central
division algebra, then D has a separable splitting field F, by Theorem 5.2.8, and the
normal closure E of F/k is a Galois extension of k. Let [F : k] = r, [E : F] = n; then
$E \subseteq D_n$ and [E : k] = nr, $[D_n : k] = n^2r^2$, hence E is a maximal subfield of $D_n$, by
Corollary 5.1.12. Thus $D_n$ is a crossed product, though D itself need not be (because
the maximal subfield may not be Galois over k). The situation was first studied in the
1930s by Helmut Hasse, Adrian Albert and others, and it was found that every
central division algebra over Q is a crossed product, but it was only much later
that Shimshon Amitsur [1972] gave examples of central division algebras that are
not crossed products.
   Crossed products have an explicit description which is of importance (and which
accounts for the name). Let A be a crossed product, with Galois splitting field F over
k as subfield. Denote by U the group of units of A and by N the normalizer of $F^\times$
in U:
$N = \{u \in U \mid u^{-1}Fu \subseteq F\}.$
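For a transversal $\{u_\sigma\}$ ($\sigma \in \Gamma = \mathrm{Gal}(F/k)$) of $F^\times$ in N we have
$a u_\sigma = u_\sigma a^\sigma \quad (a \in F,\ \sigma \in \Gamma).$ (5.5.2)
Further, we have
$u_\sigma u_\tau = u_{\sigma\tau} c_{\sigma,\tau},$ (5.5.3)
where the $c_{\sigma,\tau}$ satisfy the factor set condition (by the associativity of N):
$c_{\sigma\tau,\rho}\, c_{\sigma,\tau}^{\rho} = c_{\sigma,\tau\rho}\, c_{\tau,\rho}.$ (5.5.4)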
We assert that A is determined completely as right F-space with basis $u_\sigma$ ($\sigma \in \Gamma$) and
the multiplication rules (5.5.2), (5.5.3). We know that $[A : k] = n^2 = [A : F][F : k]$
and [F : k] = n, hence [A : F] = n, so the dimension is correct, since there are
$n = |\Gamma|$ basis elements. It only remains to show that the $u_\sigma$ are right linearly
independent over F. If there is a non-trivial relation
$\sum_\sigma u_\sigma a_\sigma = 0,$ (5.5.5)
let us take such a relation with the fewest non-zero coefficients. Pick $\rho \in \Gamma$ such that
$a_\rho \ne 0$ and multiply on the left by $u_\rho^{-1}$ so as to obtain a relation (5.5.5) with $a_1 \ne 0$.
The left-hand side of (5.5.5) cannot consist of a single term, hence $a_\tau \ne 0$ for some
$\tau \ne 1$. Let b ∈ F be such that $b^\tau \ne b$ and take the commutator of (5.5.5) with b:
$\sum_\sigma u_\sigma a_\sigma (b^\sigma - b) = 0.$
The coefficient of $u_\tau$ is $a_\tau(b^\tau - b) \ne 0$, so this relation is non-trivial, but it has fewer
terms than (5.5.5), because the coefficient of $u_1$ is $a_1(b - b) = 0$. This contradicts
the minimality of (5.5.5) and it shows that the $u_\sigma$ are right F-linearly independent.
We note that this is essentially the argument of Dedekind's lemma (BA,
Lemma 7.5.1).
   Suppose now that we are given a (finite) Galois extension F/k with group Γ, and a
group extension N of $F^\times$ by Γ, where Γ acts on F by automorphisms. Let us take a
transversal $\{u_\sigma\}$ of Γ in N; this determines a factor set for which (5.5.3) holds. We
define an algebra A by taking the right F-space on the $u_\sigma$ as basis, with multiplication
defined by (5.5.2), (5.5.3). Then we claim that A is a crossed product.
   In the first place, A is simple, for if $\bar A$ is a non-zero quotient, it is spanned by the
$\bar u_\sigma$ ($\sigma \in \Gamma$) over F and $\bar u_\sigma \ne 0$ because $u_\sigma$ is a unit in A and so cannot map to 0. Now
the same argument as before shows that the $\bar u_\sigma$ are linearly independent over F, hence
the mapping $\sum u_\sigma a_\sigma \mapsto \sum \bar u_\sigma a_\sigma$ is injective and A is simple.
   Next we note that A has centre k. For if $x = \sum u_\sigma a_\sigma$ lies in the centre, then
xb = bx for all b ∈ F, so $\sum u_\sigma a_\sigma(b^\sigma - b) = 0$. Hence $a_\sigma(b^\sigma - b) = 0$ for all b ∈ F
and σ ∈ Γ, therefore $a_\sigma = 0$ for σ ≠ 1, and $x = u_1a_1 \in F$. Now $u_\sigma x = xu_\sigma = u_\sigma x^\sigma$,
hence $x^\sigma = x$ for all σ ∈ Γ, and so x ∈ k. Thus k is the centre of A. Finally
[A : F] = [F : k] by construction, so F is a splitting field of A. This proves
Theorem 5.5.1. Any crossed product A over k with Galois splitting field F contained in
A is defined up to isomorphism by an extension N of $F^\times$ by Gal(F/k) and conversely,
any such extension N defines a crossed product.                                    •
   We now examine when two factor sets define isomorphic crossed products.
Identifying our two isomorphic algebras, we have to compare two transversals
$\{u_\sigma\}$, $\{u'_\sigma\}$ in our crossed product A; two such transversals are related by equations
hence the factor sets c, c' are associated (see (3.1.13)). The factor sets form a group
C under multiplication, the group of 2-cocycles, in which the bounding cocycles
form a subgroup B. These are the cocycles associated to 1:
The quotient C/B is just $H^2(\Gamma, F^\times)$, the second cohomology group of Γ with coefficients
in $F^\times$, and by Theorem 5.5.1 we have a mapping
$H^2(\Gamma, F^\times) \to B_k.$ (5.5.6)
The above remarks show this mapping to be injective and its image is the relative
Brauer group B(F /k), the subgroup of Brauer classes split by F, already encountered
in Section 5.2. Since each central simple k-algebra has a separable splitting field,
contained in a Galois extension of k, it follows that Bk is a union of the B(F /k),
as F ranges over the finite Galois extensions of k.
   It remains to show that (5.5.6) is a homomorphism. To establish this fact we need
to verify that the tensor product of algebras corresponds to the Baer product of the
extensions. Take $w \in B_k$ with Galois splitting field F and let B ∈ w. Put [F : k] = n,
$[B : k] = r^2$ and let V be an F-space of dimension r; then
$B^\circ \otimes F \cong B^\circ_F \cong F_r \cong \mathrm{End}_F(V).$
Denote the respective centralizers of $B^\circ$, $B'^\circ$, $(B \otimes B')^\circ$ by A, A', A''; then A ∼ B,
A' ∼ B', A'' ∼ B ⊗ B' and so (A)(A') = (A''). Let N, N', N'' be the normalizers of
$F^\times$ in A, A', A'' respectively; to show that (5.5.6) is a homomorphism we must
prove that N'' is just the Baer product of N and N'. To find this Baer product,
let N ∘ N' be the pullback of the mappings N → Γ, N' → Γ, i.e. the subgroup of
N × N' of elements of the form $(u_\sigma\alpha, u'_\sigma\beta)$, where α, β ∈ F, $u_\sigma$, $u'_\sigma$ are transversals
of Γ in N, N' respectively and $u_1 = u'_1 = 1$ for simplicity. The set
Theorem 5.5.2. Let F/k be a finite Galois extension with group Γ. Then
                                                                     •
Once we have the homomorphism property, the injectivity also follows from the
                                                                                  (5.5.7)
Theorem 5.5.4. For any field k the Brauer group $B_k$ is a torsion group. More precisely, if
$w \in B_k$ has index r, then $w^r = 1$. Hence for an extension F/k of degree n we have
n·B(F/k) = 0.
Proof. Let $w \in B_k$ have index r and take a Galois splitting field F of w, where
[F : k] = n, say. By Theorem 5.5.2, w corresponds to an element c of $H^2(\Gamma, F^\times)$
and it follows from Proposition 3.1.6 that the order of c and hence of w divides n,
but we want to get the sharper bound r.
   Let A ∈ w be a crossed product with F as maximal subfield and let V be a minimal
right ideal of A; then [V : F] = r and we can represent A by F-linear transformations
of V. With a right F-basis $v_1, \dots, v_r$ of V we have $v_ja = \sum_i v_i\alpha_{ij}$ for any a ∈ A, or in
matrix form
$(v)a = (v)\alpha,$
where $(v) = (v_1, \dots, v_r)$ and $\alpha = (\alpha_{ij})$. In particular, if $u_\sigma \mapsto U_\sigma$, we have
hence
                                                                                   (5.5.8)
where the $U_\sigma$ are r × r matrices over F. Now write $d_\sigma = \det U_\sigma$ and take determinants
in (5.5.8):
Proposition 5.5.5. For any w E Bk the index and the exponent have the same prime
factors.
Proof. Let w have index m and exponent t, so that t | m, by Theorem 5.5.4. We have
to show that any prime factor p of m also divides t. Take a Galois splitting field F
of w, with group Γ and let S be a Sylow p-subgroup of Γ with fixed field E.
Then [E : k] = (Γ : S) = ν is prime to p, while $|S| = p^\alpha$. For any A ∈ w, the index
reduction factor from A to $A_E$ divides ν (Proposition 5.2.3) and so is prime to p,
hence the index of $A_E$ is still divisible by p and it is enough to show that p also divides
the exponent. Now $A_E$ is a central simple E-algebra which does not split but which is
split by F. Since $[F : E] = p^\alpha$, its exponent is a positive power of p, as we had to
show.
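The two constraints just established (the exponent divides the index, and both have the same prime factors) cut down the possible exponents considerably. The sketch below merely enumerates the integers compatible with these two constraints for a given index m; it does not claim that each of them is realized by some Brauer class. The function names are our own.

```python
def prime_factors(n):
    """Return the set of prime factors of a positive integer n."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def admissible_exponents(m):
    """Divisors t of m that have exactly the same prime factors as m."""
    target = prime_factors(m)
    return [t for t in range(1, m + 1)
            if m % t == 0 and prime_factors(t) == target]
```

For index 12, for instance, only 6 and 12 survive, while for a prime power index every non-trivial divisor is admissible.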
  This result leads to a remarkable decomposition formula:
Theorem 5.5.6. Any central division algebra D of degree $m = q_1 \cdots q_r$, where the $q_i$ are
powers of distinct primes, has the decomposition
(5.5.9)
Proof. The class (D) has exponent $n = q'_1 \cdots q'_r$, where $q'_i \mid q_i$ and by Proposition
5.5.5, $q'_i > 1$. By the basis theorem for abelian groups, (D) can be written as a
product of classes which are powers of (D) with prime power exponent. Let D(i)
be a division algebra similar to a power of D with exponent $q'_i$; then
By Proposition 5.5.5 the D(i) have coprime degrees and by Proposition 5.2.10 we
have a division algebra on the left, hence the two sides are isomorphic.     •
Proposition 5.5.7. Every central simple algebra is a tensor product of primary algebras.
The primary k-algebras are $\mathfrak{M}_p(k)$, where p is a prime, and certain division algebras of
prime power degree.                                                                  •
  A division algebra of prime power degree is not necessarily primary, though this
does hold over an algebraic number field.
Exercises
1. Let F/k be a finite Galois extension with group Γ of order n. Show that
   $F \otimes_k k\Gamma \cong k_n$. (Hint. Use a normal basis for F/k.)
2. Let k be a perfect field of prime characteristic p and D a central division algebra.
   Show that Deg D is prime to p. (Hint. Use Theorem 5.2.8 to show that D contains
   no proper extension of degree p over k.) Deduce that Bk has trivial p-component.
3. Show that every division k-algebra has a splitting field which is a tensor product of
   extensions of k with prime power degrees.
4. Let G be a group whose centre Z is free abelian and of finite index n in G. By
   constructing a suitable crossed product with group G/Z, show that G can be
   embedded in a division algebra of degree n.
5. Let A, B be central division k-algebras that are crossed products with groups G, H.
   Show that if A ⊗ B is a division algebra, then it is a crossed product with group
   G × H.
Proposition 5.6.1. Let F/k be a Galois extension and E any field extension of k, where
E, F are both contained in the same field. Then EF/E is Galois, with group isomorphic to
the subgroup of Gal(F/k) corresponding to E ∩ F.
Proof. This is essentially a translation of the parallelogram rule applied to the Galois
groups. The isomorphism is obtained by taking an automorphism of EF/E and
restricting it to F; this provides an isomorphism with Gal(F/E ∩ F).                •
   In the above situation let us write G = Gal(F/k) and denote by H the subgroup
leaving E ∩ F fixed, so that H ≅ Gal(EF/E). Any factor set $\{c\} : G \times G \to F^\times$ when
restricted to H yields a factor set $\{c'\} : H \times H \to (EF)^\times$. It is clear that a split factor
set has a split restriction, hence the inclusion H ⊆ G gives rise to a homomorphism,
the restriction
$\mathrm{res} : H^2(G, F^\times) \to H^2(H, (EF)^\times).$ (5.6.1)
In what follows we shall write (F/k, c) for the crossed product over F/k with factor
set {c}.
Theorem 5.6.2 (Restriction theorem). Let F/k be a finite Galois extension with group
G = Gal(F/k), let E/k be any extension (within a field containing F) and let H be the
subgroup corresponding to K = E ∩ F, so that H ≅ Gal(EF/E). Then
$(F/k, c)_E \sim (EF/E, c'),$                        (5.6.2)
                     $H^2(G, F^\times)$  ⟶  $H^2(H, F^\times)$
                           ↓                    ↓
                        B(F/k)      ⟶       B(F/K)
$K' \otimes k_r \cong A \otimes_k K,$
and this will establish (5.6.3) if we can show that K' ≅ (F/K, c'). In A take
$x = \sum u_\sigma a_\sigma$ ($a_\sigma \in F$); we have x ∈ K' iff xy = yx for all y ∈ K, i.e.
$\sum u_\sigma a_\sigma y = \sum yu_\sigma a_\sigma = \sum u_\sigma y^\sigma a_\sigma$. Thus we must have $a_\sigma = 0$ whenever $y^\sigma \ne y$ for some y, i.e.
when σ ∉ Gal(F/K) = H. This shows that $K' = \{\sum u_\sigma a_\sigma \mid a_\sigma \in F, \sigma \in H\}$ and
(5.6.3) follows.
5.6 Change of base field                                                            211
Theorem 5.6.3 (Inflation theorem). Let k ⊆ K ⊆ F, where F/k, K/k are Galois extensions,
G = Gal(F/k) and N is the (normal) subgroup of G corresponding to K.
   Given any factor set {c} on G/N and the corresponding factor set {c} on G derived by
the inflation rule (5.6.5), then
$(F/k, c) \cong (K/k, c) \otimes k_r$, where r = [F : K].        (5.6.6)
$e_ix = \sum_j t_{ij}(x)e_j$ for x ∈ F.
On writing $e = (e_1, \dots, e_r)^T$, we can express this equation in matrix form as
$ex = T(x)e$, where $T(x) \in K_r$.             (5.6.7)
where we have replaced T by r in the action on P because the latter has entries in K.
Applying σ to (5.6.7), we find $e^\sigma x^\sigma = T(x)^\sigma e^\sigma$, i.e. $P_\sigma T(x^\sigma) = T(x)^\sigma P_\sigma$ and again
T(x) has entries in K, so that
                                                                              (5.6.lO)
$\sum v_\sigma a_\sigma \mapsto \sum u_{\bar\sigma} P_\sigma T(a_\sigma).$
For the proof we need only verify the conditions on $v_\sigma$; using (5.6.10), we have
  We note that the natural homomorphism G → G/N induces the inflation homomorphism
$H^2(G/N, K^\times) \to H^2(G, F^\times)$ and Theorem 5.6.3 can be expressed as a
commutative square, which together with the previous ones gives the commutative
diagram
        0 →  $H^2(G/N, K^\times)$  --inf-->  $H^2(G, F^\times)$  --res-->  $H^2(N, F^\times)$
                    ↓                          ↓                        ↓
        0 →      B(K/k)         --inc-->     B(F/k)        ----->     B(F/K)
The bottom row is easily seen to be exact: a central simple k-algebra split by F will
split as K-algebra iff it is split by K. Hence the top row is also exact.
   As a third operation we have the corestriction (or transfer), which for k ⊆ F
provides a homomorphism $B_F \to B_k$.
   Let B be an F-algebra with a finite group G of automorphisms such that each
element of G other than 1 restricts to a non-trivial automorphism of F. As usual
we write $B^G$ and $F^G$ for the subset fixed by G. Given a k-algebra A, if $B = A_F$,
where $k = F^G$, then $B^G = A$, as is easily verified.
  We begin by showing that $B^G$ can be expressed in terms of the trace, where for
b ∈ B we define $\operatorname{tr} b = \sum_{\sigma \in G} b^\sigma$ and tr B = {tr b | b ∈ B}.
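To make the trace concrete, here is a minimal sketch in the simplest case B = F = Q(√2), with G = {1, σ} and σ the conjugation a + b√2 ↦ a − b√2; the choice of field and the function names are ours, for illustration only. An element a + b√2 is stored as the pair (a, b), and the trace is the sum of the two conjugates.

```python
from fractions import Fraction


def conj(z):
    """The non-trivial automorphism a + b*sqrt(2) -> a - b*sqrt(2)."""
    a, b = z
    return (a, -b)


def trace(z):
    """tr z = z + z^sigma, the sum over G = {1, sigma}."""
    a, b = z
    c, d = conj(z)
    return (a + c, b + d)


def is_fixed(z):
    """True if z lies in the fixed field, i.e. z^sigma = z."""
    return conj(z) == z
```

Every trace is fixed by G, and conversely each rational a is the trace of a/2, so that here tr B is the whole fixed field Q.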
Lemma 5.6.5. Let B be an F-algebra with a finite group of automorphisms which induce
distinct automorphisms on F. Then tr B coincides with the fixed algebra $B^G$ and if
$F^G = k$, then
                                                                                (5.6.12)
Proof. Clearly tr B ⊆ $B^G$ and both tr B, $B^G$ are vector spaces over k. Let C be the
F-space spanned by tr B; if C ⊂ B, then there is an F-linear functional φ : B → F such
that φ(C) = 0 but φ ≠ 0, say φ(u) ≠ 0. For any a ∈ F we have
Given any separable extension F/k, let E/k be a Galois extension containing F/k, with
group G, and denote by H the subgroup corresponding to F. Put n = [F : k] =
(G : H) and let $\sigma_1, \dots, \sigma_n$ be a transversal of H in G: $G = \bigcup H\sigma_i$. Then for any
F-algebra B the corestriction from F to k is defined as follows. Put
and define a G-action on $B^{(G:H)}$ by writing, for any σ ∈ G, $\sigma_i\sigma = t_i(\sigma)\sigma_{i'}$, where
$t_i(\sigma) \in H$ and i ↦ i' is a permutation of 1, ..., n determined by σ. Now put
(5.6.14)
To check that this indeed defines a G-action, let $\sigma_i\sigma = t_i(\sigma)\sigma_{i'}$, $\sigma_{i'}\tau = t_{i'}(\tau)\sigma_{i^*}$. Then
$\sigma_i\sigma\tau = t_i(\sigma)t_{i'}(\tau)\sigma_{i^*}$ and
 oi=1
        ((b i ® ait)' =    0
                           i=1
                                 (hi' ® a:i,(a)), =   0
                                                      i=1
                                                               (hi' ® a:~(a)t.,(,))=            0
                                                                                                i=1
                                                                                                      (hi ® ait'.
(5.6.15)
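The bookkeeping behind the G-action can be tested on a small example. The sketch below takes G = Sym₃ and H of order 2 (an arbitrary choice of ours), picks a transversal σ₁, σ₂, σ₃ of the right cosets Hσᵢ, and computes for each σ the factorization σᵢσ = tᵢ(σ)σᵢ′ with tᵢ(σ) ∈ H. All function names are hypothetical helpers, and permutations are composed in the right-action convention (apply the left factor first).

```python
from itertools import permutations


def mul(p, q):
    """Product pq in the right-action convention: apply p first, then q."""
    return tuple(q[p[x]] for x in range(len(p)))


def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for x, y in enumerate(p):
        r[y] = x
    return tuple(r)


def transversal(G, H):
    """One representative from each right coset H*g."""
    reps, seen = [], set()
    for g in G:
        if g not in seen:
            reps.append(g)
            seen.update(mul(h, g) for h in H)
    return reps


def split(i, s, reps, H):
    """Return (t, j) with reps[i] * s = t * reps[j] and t in H."""
    g = mul(reps[i], s)
    for j, r in enumerate(reps):
        t = mul(g, inv(r))
        if t in H:
            return t, j
    raise ValueError("reps is not a transversal of H")


G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]
```

Applying `split` first with σ and then with τ gives the same index and the same element of H as applying it once with στ, which is the compatibility needed for a group action.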
Proposition 5.6.6. Let F/k be a separable extension of degree n and B a central simple
F-algebra. Then $\mathrm{cor}_{F/k}B$ is a central simple k-algebra which depends only on F/k, not on
the Galois extension or the choice of transversal. Moreover, the correspondence $\mathrm{cor}_{F/k}$ is
a homomorphism from $B_F$ to $B_k$.
Proof. If B is any central simple F-algebra, then $C = B^{(G:H)}$ is simple with centre E,
by Corollary 5.1.3. Hence $C^G$ has centre k; now any ideal of $C^G$ gives rise to an ideal
of C, and the simplicity of the latter shows $C^G$ to be simple.
   Now let N be a normal subgroup of G contained in H and let L be the corresponding
subfield; thus F ⊆ L ⊆ E and L/k is Galois, with group G/N. The transversal
$\sigma_1, \dots, \sigma_n$ of H in G is still a transversal of $\bar H = H/N$ in $\bar G = G/N$. Suppose that
$A = \mathrm{cor}_{F/k}(B)$ is formed as above, going via E; going via L we obtain $\bar C = B^{(\bar G:\bar H)}$,
and it is clear that $\bar C = (B^{(G:H)})^N$. Therefore
so we reach the same algebra going via L. Given any two extensions $L_1$, $L_2$ of F, we
can find a Galois extension E/k containing both, and the above argument shows that
using $L_1$ or $L_2$ we obtain the same algebra $\mathrm{cor}_{F/k}(B)$ as if we had used E, hence
all three cases give the same result. Further, suppose that B = B' ⊗ B'', write
$C = B^{(G:H)}$ and define C', C'' similarly in terms of B', B''. Then by the associativity
and commutativity of the tensor product, C = C' ⊗ C'', hence $C^G = C'^G \otimes C''^G$ and
this shows the corestriction to be a homomorphism.                                •
   For our next result we note that if K is any commutative ring and E is a commutative
K-algebra, then for any K-modules U, V, writing again $U_E = U \otimes_K E$ etc., we
have the K-module isomorphism
(5.6.17)
Proposition 5.6.7. Let F/k be a separable extension of degree n and let A be a central
simple k-algebra. Then
                                                                                    (5.6.18)
5.7 Cyclic algebras                                                                  215
Proof. Take a Galois extension E/k containing F, with group G and subgroup H corresponding
to F. We have $A_F \otimes_F E \cong A \otimes_k E = A_E$ and $A_E^\sigma = A \otimes E^\sigma$, for any σ ∈ G,
hence
Exercises
1. Let F/k be a finite Galois extension. Show that for any k-algebra A, tr $A_F$ = A.
2. Given a field F of characteristic p ≠ 0, an F-algebra B and a group G of automorphisms
   of B inducing distinct automorphisms of F, show that the order of
   G is prime to p.
3. Show that for a central simple F-algebra B, if [F : k] = n, then [cor B : k] =
   $[B : F]^n$.
4. Let A be a central simple k-algebra of degree $p^r$, where p is prime. Show that A is
   similar to a crossed product of degree $p^s$, for some s. (Hint. Take a Galois splitting
   field and a subfield corresponding to a Sylow p-subgroup.) What can be said
   about the relations of r and s?
$$u^iu^j = \begin{cases} u^{i+j} & \text{if } i + j < n, \\ \alpha u^{i+j-n} & \text{if } i + j \ge n. \end{cases}$$ (5.7.1)
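The multiplication rule (5.7.1) amounts to a factor set on the cyclic group of order n taking only the values 1 and α. As a sanity check, the sketch below verifies by brute force that this function satisfies a factor set condition of the form c(i+j, l)·c(i, j) = c(i, j+l)·c(j, l), with indices mod n. The form of the condition used here is our assumption (stated for the trivial action, which applies because α lies in k), and the function names are hypothetical.

```python
from itertools import product


def cyclic_factor_set(n, alpha):
    """c(i, j) = 1 if i + j < n, alpha otherwise (0 <= i, j < n)."""
    def c(i, j):
        return alpha if i + j >= n else 1
    return c


def satisfies_factor_set_condition(n, alpha):
    """Check c(i+j, l) c(i, j) == c(i, j+l) c(j, l) for all i, j, l mod n."""
    c = cyclic_factor_set(n, alpha)
    return all(
        c((i + j) % n, l) * c(i, j) == c(i, (j + l) % n) * c(j, l)
        for i, j, l in product(range(n), repeat=3)
    )
```

Both sides equal α raised to the number of times i + j + l wraps past n, which is why the check succeeds for every n and α.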
Proposition 5.7.1. Let F/k be a cyclic Galois extension of degree n. Two cyclic algebras
(F/k, σ, α) and (F/k, σ, β) are isomorphic if and only if $\beta/\alpha = N_{F/k}(c)$, where c ∈ F.
In particular, (F/k, σ, α) splits precisely when $\alpha = N_{F/k}(c)$ (c ∈ F).
   This condition is more briefly expressed by saying that α (resp. β/α) is a norm
from F to k.
Proof. Assume that (F/k, σ, α) ≅ (F/k, σ, β); then the canonical generators u, v are
related by an equation v = uc, where c ∈ F. Hence $v^i = (uc)^i = u^icc^\sigma \cdots c^{\sigma^{i-1}}$; for
i = n we find $v^n = (uc)^n = u^nN(c)$, i.e. N(c) = β/α. Conversely, if β/α = N(c),
then the same calculation shows that $(uc)^n = \beta$, so that uc is a canonical generator
for the isomorphic algebra (F/k, σ, β).                                            •
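The norm criterion can be tried out on a small example of our own choosing: k = F₃ and F = F₉ = F₃(i) with i² = −1. Here the generator of Gal(F/k) is a + bi ↦ a − bi, so the norm of a + bi is a² + b². The sketch lists all norms of non-zero elements; the constant and function names are hypothetical.

```python
P = 3  # k = F_3; F = F_9 = F_3(i) with i^2 = -1 (x^2 + 1 is irreducible mod 3)


def norm(a, b):
    """Norm of a + b*i from F_9 to F_3: (a + b*i)(a - b*i) = a^2 + b^2."""
    return (a * a + b * b) % P


def norm_group():
    """The set of norms of all non-zero elements of F_9."""
    return {norm(a, b)
            for a in range(P) for b in range(P)
            if (a, b) != (0, 0)}
```

Both non-zero elements of F₃ occur as norms, so in this example every cyclic algebra (F/k, σ, α) with α ≠ 0 satisfies the splitting condition of Proposition 5.7.1.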
   For example, (F/k, σ, 1) ≅ $k_n$ (by Proposition 5.5.3); this algebra can be realized
as endomorphism ring of $F \cong k^n$, e.g. by taking a normal basis in F. Then σ acts by
cyclic permutation of the coordinates.
   The above presentation of a cyclic k-algebra A only provides a basis over F, but
frequently one needs to have a basis over k. Such a basis takes a simple form if k contains
a primitive n-th root of 1, say ω. Then a k-basis for A can be formed as follows.
The splitting field F is of the form F = k(v), where $v^n = \beta \in k$. Taking u ∈ A such
that $u^{-1}vu = \omega v$, we have $u^n = \alpha \in k$ and so A has the k-basis $u^iv^j$ (i, j = 0, 1, ...,
n − 1) with the defining relations
$u^n = \alpha, \quad v^n = \beta, \quad vu = \omega uv.$ (5.7.2)
By a symbol $(\alpha, \beta; k)_n$ or $(\alpha, \beta)_n$ one understands a cyclic algebra over k with the
presentation (5.7.2). We note the following consequence of (5.7.2):
Corollary 5.7.2. Let k be a field containing a primitive n-th root of 1. Then a symbol
$(\alpha, \beta; k)_n$ splits if and only if α is a norm from $k(\beta^{1/n})$ to k.                       •
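A split symbol can be exhibited by explicit matrices. The sketch below works over k = F₇, which contains the primitive cube root of unity ω = 2, and checks that the "shift" matrix u and the "clock" matrix v satisfy u³ = 1, v³ = 1 and vu = ωuv, the relations one expects for the symbol (1, 1; k)₃ (derived from u⁻¹vu = ωv). The field, the matrices and all names are our own illustrative choices.

```python
P, N, OMEGA = 7, 3, 2  # k = F_7; 2 is a primitive cube root of 1 mod 7


def matmul(a, b):
    """Product of two square matrices over F_P."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) % P for j in range(n)]
            for i in range(n)]


def scale(c, a):
    """Scalar multiple c*a over F_P."""
    return [[(c * x) % P for x in row] for row in a]


def matpow(a, e):
    """e-th power of a square matrix over F_P."""
    n = len(a)
    r = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        r = matmul(r, a)
    return r


IDENTITY = [[int(i == j) for j in range(N)] for i in range(N)]
# v is the "clock" matrix diag(1, w, w^2); u is the "shift" matrix with u[i][i-1] = 1.
V = [[pow(OMEGA, i, P) if i == j else 0 for j in range(N)] for i in range(N)]
U = [[int(j == (i - 1) % N) for j in range(N)] for i in range(N)]
```

This is consistent with Corollary 5.7.2, since α = 1 is trivially a norm.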
   There remains the case when char k divides the degree of the algebra. We shall
only consider the case of a cyclic algebra A of prime degree p over a field k of
characteristic p. A Galois splitting field F of degree p over k contains an element v
such that $v^p - v = \beta \in k$ and there is u ∈ A such that $u^{-1}vu = v + 1$. Hence $u^p = \alpha \in k$ and
so A has the basis $u^iv^j$ (i, j = 0, 1, ..., p − 1) with the defining relations
$\prod (x - i) = x^p - x;$
Exercises
1. Show that $(\alpha, \beta; k)_n$ splits if it contains an element x such that $1, x, \dots, x^{n-1}$ are
   linearly independent and $x^n$ is an n-th power in k.
2. Show that if (F/k, σ, α) is cyclic of degree n, then $x^n - \alpha$ is irreducible over k and
   D contains a maximal subfield generated by a root of $x^n = \alpha$.
3. Prove that (F/k, σ, α) ⊗ (F/k, σ, β) ∼ (F/k, σ, αβ).
4. Show that if (F/k, σ, α) has degree n, then for any r prime to n,
   $(F/k, \sigma^r, \alpha^r) \cong (F/k, \sigma, \alpha)$.
5. Show that (F/k, σ, α) has exponent e, where e is the least number for which $\alpha^e$ is a
   norm from F.
6. (Wedderburn) Show that a cyclic algebra (F/k, σ, α) of degree n is a division
   algebra if $\alpha^n$ is the least power of α which is a norm.
7. Let k ⊆ F ⊆ E, where E/k is cyclic of degree n and [E : F] = d. Show that if
   Gal(E/k) is generated by σ and $\sigma|F = \bar\sigma$, then $(F/k, \bar\sigma, \alpha) \sim (E/k, \sigma, \alpha^d)$.
8. With the notation of Theorem 5.6.2, show that if E/k is cyclic, then
   $(E/k, \sigma, \alpha)_F \sim (EF/F, \sigma^r, \alpha)$, where r = [E ∩ F : k].
9. Let k be a field with a primitive n-th root of 1. Show that if A = (F/k, σ, α) is a
   cyclic division algebra of degree $n^2$, then A can also be represented as a crossed
   product with group $C_n \times C_n$.
 3. (A. A. Albert) Let D be a skew field which is totally ordered (BA, Section 8.8),
    and suppose that D is algebraic over its centre k. Show that the conjugates of
    any positive element are again positive. Deduce that the sum of the conjugates
    of any non-zero element cannot be zero, and hence prove that D must be com-
    mutative. (Hint. Use Exercise 9 of Section 5.1.)
 4. (Kharchenko) Let A be a central simple k-algebra of finite degree. By regarding
    A as a right N-module show that every k-linear mapping of A into itself has the
    form $f : x \mapsto \sum a_ixb_i$ ($a_i, b_i \in A$). Deduce the existence of a non-constant
    central polynomial, i.e. a polynomial with values in k (see Section 7.7 below).
 5. Let F/k be a field extension. Show that there is an exact sequence
    $0 \to B(F/k) \to B_k \xrightarrow{f} B_F.$
    Identify coker f in case F/k is a Galois extension.
 6. (J.-P. Serre) Suppose that in a central division k-algebra D of degree n every
    extension is p-radical. Show that $x^n \in k$ for all x ∈ D. By extension to a splitting
    field obtain a contradiction, and hence give another proof of Theorem 5.2.8.
 7. Let A, B be crossed products with factor sets (a), (b) respectively. Given a Galois
    splitting field F for both A and B, write F = k(θ), put P = A ⊗ B and
    $e = \prod (\theta \otimes 1 - 1 \otimes \theta^\sigma)/((\theta - \theta^\sigma) \otimes 1),$
      where the product is taken over all σ ≠ 1 in Gal(F/k). Verify that for the minimum
      polynomial f of θ over k, f(θ ⊗ 1) = f(θ) ⊗ 1 = 0 and (c ⊗ 1)e = (1 ⊗ c)e
      for all c ∈ F. Hence show that e is idempotent and that ePe is a crossed product
      with factor set (a)(b).
 8.   Let $F = k(\sqrt{c})$ (char k ≠ 2). Show that there is a central division k-algebra with
      F as maximal subfield iff the form $x^2 + cy^2$ is not universal over k (i.e. it does not
      represent every $a \in k^\times$). Similarly in characteristic 2, if F is generated over k by a
      root of $x^2 + x + c = 0$, the same holds iff $x^2 + xy + cy^2$ is not universal.
 9.   Show that $SL_2(\mathbf{F}_3)$ has order 24 but is not isomorphic to $\mathrm{Sym}_4$. (Hint. Show that
      its derived group is the quaternion group. It is known as the binary tetrahedral
      group.)
10.   The Hilbert norm residue symbol $(a, b)_p$ is defined to be 1 or −1 according as
      $(a, b; \mathbf{Q}_p)$ does or does not split, where $\mathbf{Q}_p$ is the p-adic field when p is prime and
      R when p = ∞. Using Proposition 5.4.3(e), show that for fixed b and p the a
      with $(a, b)_p = 1$ form a group under multiplication. (Note that the law of quadratic
      reciprocity, BA, Chapter 7, Further Exercise 23, may be expressed as
      $\prod_p (a, b)_p = 1$, where the product is taken over all primes and over p = ∞.)
11.   Let k be a field with a primitive n-th root of 1 and F ⊇ k. Show that for any
      a ∈ k, β ∈ F, $\mathrm{cor}_{F/k}(a, \beta; F)_n \sim (a, N_{F/k}(\beta); k)_n$.
12.   Let A be a central simple k-algebra of index $p^\alpha m$, where p is a prime not dividing
      m and α ≥ 1. Show that there is a separable extension F of degree prime to p
      over k such that $A_F$ has index $p^\alpha$.
13.   Show that the Brauer classes split by a cyclic extension F/k of degree n form a
      group $H^2(C_n, F^\times) \cong k^\times/N_{F/k}(F^\times)$.
14. Show that if x, y are regular elements over a field of prime characteristic p,
    satisfying xy − yx = 1, then $(xy)^p = x^py^p + xy$; deduce further that
    $(xy)^{p-1} = y^{p-1}x^{p-1} + 1$.
15. Let D be a central division k-algebra of prime degree p. If D has a maximal
    subfield E not its own normalizer in $D^\times$, show that E/k is Galois and deduce
    that D is a crossed product.
16. (L. E. Dickson) Let D be a central division k-algebra of degree 3 and suppose
    $u_1 \in D \setminus k$ has the minimal polynomial $(x - u_1)(x - u_2)(x - u_3)$. Find v ∈ D
    such that $u_iv = vu_{i+1}$ (i mod 3) and show that either k(v) or $k(u_1v)$ is not its
    own normalizer. Deduce that D is a crossed product. (Hint. Try a quadratic
    polynomial in the u's for v.)
17. Show that every central division algebra of degree 6 is cyclic.
                         Representation theory of
                         finite groups
Although much of the theory of finite-dimensional algebras had its origins in the
theory of group representations, it seems simpler nowadays to develop the theory
of algebras first and then use it to give an account of group representations. This
theory has been a powerful tool in the study of groups, especially the modular
theory (representations over a field of finite characteristic), which has played a key
role in the classification of finite simple groups. The theory also has important appli-
cations to physics: quantum mechanics describes physical systems by means of states
which are represented by vectors in Hilbert space (infinite-dimensional complete
unitary space). Any group which may act on the system, such as the rotation
group or a permutation group of the constituent particles, acts by unitary trans-
formations on this Hilbert space and any finite-dimensional subspace admitting
the group leads to a representation of the group. If we know the irreducible representations
of our group, this will often allow us to classify these spaces.
   Of course an introductory chapter like the present one is not the place to develop
modular representations, nor the applications to physics. The plan of the chapter is
as follows. The first four sections give a concise account of the theory based on the
Wedderburn theorems (Chapter 5 of BA), including the basic results on ortho-
gonality and completeness (Section 6.3) and in Section 6.4 we explain the role of
characters. Some simplifications can be made over the complex numbers and they
are described in Section 6.5. The rest of the chapter deals with representations and
characters of the symmetric group in Section 6.6, and in Section 6.7 describes
induced representations, an important technique which is illustrated in Section 6.8
by the theorems of Burnside and Frobenius.
$\rho : G \to GL_d(k),$ (6.1.1)
where $GL_d(k)$ is the general linear group of degree d over k, i.e. the group of all invertible
d × d matrices over k. Thus we have a mapping x ↦ ρ(x) such that
$\rho(xy) = \rho(x)\rho(y)$ for all x, y ∈ G. (6.1.2)
Since each matrix ρ(x) is invertible, we have ρ(1) = I, where 1 is the neutral element
of G, and $\rho(x^{-1}) = \rho(x)^{-1}$. The integer d is called the degree of the representation.
  For example, to find a representation of the cyclic group $C_3 = \{1, t, t^2\}$ over R of
(6.1.3)
Every group has the trivial representation, obtained by mapping each element of G
to 1.
  At the other extreme we have the faithful representations, defined as
homomorphisms with trivial kernel; e.g. (6.1.3) is a faithful representation of $C_3$.
   Two representations $\rho$, $\sigma$ of a group $G$ are said to be equivalent, if they have the
same degree, $d$ say, and there exists $P \in GL_d(k)$ such that
\[ \sigma(x) = P^{-1}\rho(x)P \quad \text{for all } x \in G. \]
It is clear that this is indeed an equivalence relation on the set of all representations
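of $G$. For example, if $\omega$ is a primitive cube root of 1, then
\[ \begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -\omega & -\omega^2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ -\omega & -\omega^2 \end{pmatrix} \begin{pmatrix} \omega & 0 \\ 0 & \omega^2 \end{pmatrix}. \]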
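The matrix identity just displayed can be checked numerically. The following is a minimal sketch, assuming the generator $t$ of $C_3$ is sent to the matrix with rows $(0, -1)$ and $(1, -1)$ that appears in the display; the helper names `matmul` and `close` are illustrative only.

```python
import cmath

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
A = [[0, -1], [1, -1]]            # assumed image of the generator t
w = cmath.exp(2j * cmath.pi / 3)  # a primitive cube root of 1
P = [[1, 1], [-w, -w * w]]        # transforming matrix from the display
D = [[w, 0], [0, w * w]]          # diagonal representation t -> diag(w, w^2)

A2 = matmul(A, A)
A3 = matmul(A2, A)
order_three = (A3 == I2) and (A != I2) and (A2 != I2)   # A has order exactly 3
intertwines = close(matmul(A, P), matmul(P, D))         # A P = P D
print(order_three, intertwines)
```

Since $P$ is invertible, the relation $AP = PD$ says that the real representation becomes equivalent over $\mathbf{C}$ to a diagonal one.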
where $\rho_{ij}(x) \in k$, and it is easily checked that the matrices $\rho(x) = (\rho_{ij}(x))$ form a
representation of $G$; we shall say that the $G$-module $V$ with the basis $v_1, \ldots, v_d$
affords the representation $\rho$. Conversely, given a representation $\rho = (\rho_{ij})$ of $G$ of
degree $d$ and a $d$-dimensional vector space $V$ over $k$, we can turn $V$ into a
$G$-module by defining the action of $x \in G$ on a basis $v_1, \ldots, v_d$ by (6.1.7) and generally
putting
\[ \Bigl(\sum_i \alpha_i v_i\Bigr)x = \sum_i \alpha_i (v_i x) \quad (\alpha_i \in k). \]
The verification that this provides a G-module is straightforward and may be left to
the reader.
   To see that the operations of passing between representations and modules are
mutually inverse we need to examine the effect of a change of basis on the
representation. Let $V$ be a $G$-module affording the representation $\rho$ relative to a basis
$v_1, \ldots, v_d$. Thus the equations (6.1.7) hold, which may be written concisely as
\[ vx = \rho(x)v, \tag{6.1.8} \]
where $v = (v_1, \ldots, v_d)^T$ stands for the column of basis vectors $v_1, \ldots, v_d$. Suppose
that $u = (u_1, \ldots, u_d)^T$ is a second basis of $V$, affording the representation $\sigma$, so that
\[ ux = \sigma(x)u. \tag{6.1.9} \]
thus $\rho$ and $\sigma$ are equivalent, and what we have shown is that different bases of a
G-module afford equivalent representations. Moreover, since P may be any invertible
matrix, we see that representations of G that are equivalent are afforded by the same
$G$-module, for suitable bases. Further, if two $G$-modules afford the same
representation, they must be isomorphic. For take the modules to be $V$, $W$ with bases
$v = (v_1, \ldots, v_d)^T$, $w = (w_1, \ldots, w_d)^T$ and let
\[ vx = \rho(x)v, \qquad wx = \rho(x)w. \]
Proposition 6.1.1. For any group G there is a natural bijection between the sets of
equivalence classes of representations and isomorphism classes of G-modules.    •
\[ \rho(x) = \begin{pmatrix} \rho'(x) & 0 \\ \theta(x) & \rho''(x) \end{pmatrix}. \tag{6.1.12} \]
We note that $\rho'$ is a representation afforded by $V'$, while $\rho''$ is afforded by the
quotient module $V/V'$ relative to the basis $\bar{v}_{t+1}, \ldots, \bar{v}_d$, where $\bar{u}$ denotes the residue
class of $u$. Both $\rho'$ and $\rho''$ are sometimes called subrepresentations of $\rho$.
   We note that if the basis of $V$ is chosen so that the last $t$ members form a basis of
$V'$ (instead of the first $t$), then $\rho$ takes the form
\[ \rho(x) = \begin{pmatrix} \rho'(x) & \theta(x) \\ 0 & \rho''(x) \end{pmatrix}. \]
\[ \rho(x) = \begin{pmatrix} \rho'(x) & 0 \\ 0 & \rho''(x) \end{pmatrix}, \]
                                      *
If $\rho$ is completely reducible, i.e. we can find an equivalent representation of the form
(6.1.13) in which $* = 0$, this means that the corresponding $G$-module is a direct sum
of simple modules, i.e. semisimple.
  Just as for modules over a ring we can define left $G$-modules; they are vector spaces
$V$ with a $G$-action $v \mapsto xv$ such that
\[ x(yv) = (xy)v, \qquad 1v = v. \]
However, any such left $G$-module may be regarded as a right $G$-module by defining
$v.x = x^{-1}v$ $(x \in G)$. For we have $v.(xy) = (xy)^{-1}v = (y^{-1}x^{-1})v = y^{-1}(x^{-1}v) =
y^{-1}(v.x) = (v.x).y$.
  In terms of the group algebra this may be expressed as follows: the group algebra
$kG$ has an anti-automorphism $+$, i.e. a linear mapping satisfying $(ab)^+ = b^+a^+$,
given by
\[ \Bigl(\sum_x a(x)x\Bigr)^+ = \sum_x a(x)x^{-1}. \tag{6.1.14} \]
Now any left $kG$-module becomes a right $kG$-module on putting $v.a = a^+v$.
  We remark that the mapping defined by (6.1.14) has the property $a^{++} = a$.
An anti-automorphism of order two is called an involution, thus $kG$ is an algebra
with an involution.
Exercises
1. Let $G$ be a group and consider the regular representation of $kG$ (defined by right
   multiplication). Show that this representation always has the trivial representation
   $x \mapsto 1$ as a subrepresentation.
2. Let $F$ be a field containing a primitive $n$-th root of 1, say $\omega$ (hence of characteristic
   0 or prime to $n$) and let $C_n$ be the cyclic group of order $n$, with generator $t$. Show
   that $\rho_k : t^r \mapsto \omega^{kr}$ is a representation of degree 1 of $C_n$ for $k = 0, 1, \ldots, n - 1$.
226                                                            Representation theory of finite groups
Theorem 6.2.1 (Maschke's theorem, 1899). Let $G$ be a finite group and $k$ a field of
characteristic 0 or prime to the order of $G$. Then every representation of $G$ over $k$ is
completely reducible.
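Proof. Let $\rho$ be a representation of $G$ and suppose that $\rho$ is reduced:
\[ \rho(x) = \begin{pmatrix} \rho'(x) & 0 \\ \theta(x) & \rho''(x) \end{pmatrix}, \tag{6.2.1} \]
where $\rho'$, $\rho''$ are subrepresentations of degrees $d'$, $d''$ respectively. To establish
complete reducibility it will be enough to find a $d'' \times d'$ matrix $\mu$ such that
\[ \begin{pmatrix} \rho'(x) & 0 \\ \theta(x) & \rho''(x) \end{pmatrix} \begin{pmatrix} I & 0 \\ \mu & I \end{pmatrix} = \begin{pmatrix} I & 0 \\ \mu & I \end{pmatrix} \begin{pmatrix} \rho'(x) & 0 \\ 0 & \rho''(x) \end{pmatrix}. \]
When we multiply out, only the (2, 1)-block gives anything new:
\[ \theta(x) = \mu\rho'(x) - \rho''(x)\mu, \tag{6.2.2} \]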
6.2 The averaging lemma and Maschke's theorem                                           227
and we shall complete the proof by finding a matrix $\mu$ to satisfy this equation. By
substituting from (6.2.1) in the relation $\rho(xy) = \rho(x)\rho(y)$ we obtain the following
equation for $\theta(x)$:
\[ \theta(xy) = \theta(x)\rho'(y) + \rho''(x)\theta(y). \tag{6.2.3} \]
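Writing $m = |G|$ and noting that $\rho'(y)\rho'(y^{-1}) = I$, we have
\[ m\theta(x) = \sum_y \theta(x)\rho'(y)\rho'(y^{-1}) = \sum_y [\theta(xy) - \rho''(x)\theta(y)]\rho'(y^{-1}). \]
Put $z = xy$; then $y^{-1} = z^{-1}x$, and as $y$ runs over $G$, so does $z$, for fixed $x$. Hence we
can rewrite this last sum as
\[ \sum_z \theta(z)\rho'(z^{-1}x) - \sum_y \rho''(x)\theta(y)\rho'(y^{-1}), \]
so
\[ m\theta(x) = \Bigl(\sum_z \theta(z)\rho'(z^{-1})\Bigr)\rho'(x) - \rho''(x)\Bigl(\sum_y \theta(y)\rho'(y^{-1})\Bigr). \]
Since $m$ is invertible in $k$, this is (6.2.2) with $\mu = m^{-1}\sum_y \theta(y)\rho'(y^{-1})$.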
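The averaging step in this proof can be tried on a small case. The sketch below is an illustration under stated assumptions, not part of the text: it takes $G = C_2 = \{1, t\}$ over the rationals, with the hypothetical reduced representation $t \mapsto \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}$, so that $\rho'$ is trivial, $\rho''$ is the sign representation and the corner block is $\theta(t) = 1$. It forms $\mu = m^{-1}\sum_y \theta(y)\rho'(y^{-1})$ and checks (6.2.3) and (6.2.2).

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Scalar multiple of a matrix."""
    return [[c * x for x in row] for row in a]

# Hypothetical example: G = C_2 = {1, t}, coded as 0 and 1 with addition mod 2.
G = [0, 1]
inv = {0: 0, 1: 1}                                  # every element is its own inverse
rho1 = {0: [[Fraction(1)]], 1: [[Fraction(1)]]}     # rho'  (trivial)
rho2 = {0: [[Fraction(1)]], 1: [[Fraction(-1)]]}    # rho'' (sign)
theta = {0: [[Fraction(0)]], 1: [[Fraction(1)]]}    # corner block of the reduced form

# Relation (6.2.3): theta(xy) = theta(x) rho'(y) + rho''(x) theta(y).
cocycle_ok = all(
    theta[(x + y) % 2]
    == matadd(matmul(theta[x], rho1[y]), matmul(rho2[x], theta[y]))
    for x in G for y in G)

# mu = m^{-1} * sum over y of theta(y) rho'(y^{-1}), with m = |G|.
m = len(G)
total = [[Fraction(0)]]
for y in G:
    total = matadd(total, matmul(theta[y], rho1[inv[y]]))
mu = scale(Fraction(1, m), total)

# Equation (6.2.2): theta(x) = mu rho'(x) - rho''(x) mu for every x in G.
splits = all(
    theta[x] == matadd(matmul(mu, rho1[x]), scale(-1, matmul(rho2[x], mu)))
    for x in G)
print(mu, cocycle_ok, splits)
```

The division by $m = 2$ is exactly where the hypothesis on the characteristic enters; over a field of characteristic 2 this $\mu$ would not be defined.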
   In view of its importance we shall give a second proof of this result, or rather,
restate the same proof in module terms. The essential step is a lemma which is
also used elsewhere, but first we shall need to introduce some notation. If U, V
are $G$-modules over $k$ and $\alpha$ is a mapping from $U$ to $V$, we shall write
$\alpha : U \to_k V$, $\alpha : U \to_G V$ to indicate that $\alpha$ is $k$-linear or a $G$-homomorphism
respectively. The space of all $k$-linear mappings from $U$ to $V$ is denoted by
$\mathrm{Hom}_k(U, V)$ and the subspace of $G$-homomorphisms by $\mathrm{Hom}_G(U, V)$.
   In the next lemma we shall (exceptionally) write mappings between right
$G$-modules on the right, so that for $\alpha : U \to_k V$ the condition for a $G$-homomorphism
is that
\[ (ux)\alpha = (u\alpha)x \quad \text{for all } u \in U,\ x \in G. \]
Lemma 6.2.2 (Averaging lemma). Let $G$ be a finite group and $k$ a field of characteristic
0 or prime to $|G|$. Given any two $G$-modules $U$, $V$ and $\alpha : U \to_k V$, the mapping
Proof. Let us fix $a \in G$ and write $y = xa$, $x = ya^{-1}$. Then as one of $x$, $y$ runs over $G$,
so does the other. Now for $\alpha : U \to_k V$ we have
Theorem 6.2.3 (Maschke's theorem, form 2). Let $G$ be a finite group and $k$ a field of
characteristic 0 or prime to $|G|$. Then $kG$ is semisimple.
Proof. We shall show that every (finite-dimensional) $G$-module is semisimple, or
equivalently, that every short exact sequence of $G$-modules
\[ 0 \to V' \xrightarrow{\ \alpha\ } V \xrightarrow{\ \beta\ } V'' \to 0 \tag{6.2.6} \]
splits. Such a sequence certainly splits as a sequence of $k$-spaces, for this just means
that $V'$ as $k$-subspace of $V$ has a vector space complement. Thus we have a $k$-linear
splitting map $\gamma : V \to V'$. We have $\alpha\gamma = 1_{V'}$; therefore $1 = 1^* = (\alpha\gamma)^* = \alpha\gamma^*$, and
so $\gamma^*$ is the desired $G$-homomorphism splitting the sequence (6.2.6).                  •
Exercises
1. Let $G$ be a finite group and $V$ a finite-dimensional $G$-module over a field of
   characteristic prime to $|G|$. Show that if $G$ acts trivially on every simple composition
   factor of $V$ then the $G$-action on $V$ is trivial.
2. Show that for any (finite-dimensional) left $G$-modules $U$, $V$, $\mathrm{Hom}_k(U, k) \otimes_k V \cong
   \mathrm{Hom}_k(U, V)$.
3. For $G = C_p = \mathrm{gp}\{t \mid t^p = 1\}$ and $k = \mathbf{F}_p$ define a two-dimensional space $V$ with
   basis $v_1$, $v_2$ as $G$-module by $v_1t = v_1 + v_2$, $v_2t = v_2$. Verify that $V$ is not
   semisimple; calculate the corresponding representation.
4. Show that the infinite cyclic group has, over a field of characteristic 0, a faithful
   two-dimensional representation which is not completely reducible.
5. Let $G$ be a finite group and $k$ a field of characteristic dividing $|G|$. Show that the
   element $z = \sum_x x$ is central and nilpotent in $kG$. Deduce that $kG$ is not
   semisimple.
6.3 Orthogonality and completeness                                                     229
6. Let k be a field of characteristic p and G a finite group. Show that for any element
   g of p-power order, g - 1 is nilpotent in kG. Deduce that for a finite p-group G
   the radical of kG is the augmentation ideal. (Hint. Find a basis of nilpotent
   elements for the radical and use Theorem 5.5.4.) Deduce further that kG is
   completely primary.
Lemma 6.3.1 (Schur's lemma). Let $R$ be any ring and $U$, $V$ two simple $R$-modules.
Then
(i) $\mathrm{Hom}_R(U, V) = 0$ unless $U \cong V$,
(ii) $\mathrm{End}_R(U)$ is a skew field.
Proof. (i) If $f : U \to V$ is a non-zero homomorphism, then $\ker f$ is a proper
submodule of $U$, and hence is 0, while $\mathrm{im}\, f$ is a non-zero submodule of $V$ and so is
equal to $V$. Thus $f$ is an isomorphism, as claimed.
   (ii) When $V = U$, this argument shows that every non-zero endomorphism of $U$
is an automorphism, and (ii) follows.                                        •
  When the ground field $k$ is algebraically closed, every matrix over $k$ has an
eigenvalue, so for each automorphism $f$ of $U$ there exists $\lambda \in k$ such that $f - \lambda.1$ is
non-invertible, and hence zero, i.e. $f = \lambda.1$. This proves the following sharper form of
Lemma 6.3.1:
Lemma 6.3.2 (Schur's lemma for algebraically closed fields). Let $k$ be an algebraically
closed field, $A$ a $k$-algebra and $U$, $V$ any two simple $A$-modules, finite-dimensional
over $k$. Then
\[ \mathrm{Hom}_A(U, V) = \begin{cases} k & \text{if } U \cong V, \\ 0 & \text{otherwise.} \end{cases} \tag{6.3.1} \]
                                                                             •
Let G be a finite group; we shall define an inner product on the group algebra kG by
the rule
(6.3.2)
It is clear that this product is bilinear; it is not symmetric, but satisfies the equation
where the involution is that defined by (6.1.14). The product is regular, i.e.
non-singular: if $(f, x) = 0$ for all $x \in G$, then $f(x^{-1}) = 0$ for all $x \in G$ and so $f = 0$. Of
course from the point of view of the inner product (6.3.2) the multiplication on kG is
immaterial, and kG may be thought of as the space of all k-valued functions on G.
   Our next aim is to show that the different representation coefficients, regarded as
functions on G, are orthogonal.
Thus different representation coefficients are orthogonal. We note that the
alternatives on the right of (6.3.4) are not exhaustive: the representations $\rho$, $\sigma$ may be
equivalent but distinct. In that case (6.3.4) will not apply (but of course we can
use (6.3.4) even then, after transforming one of $\rho$, $\sigma$ into the other).
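Proof. Take spaces $U$, $V$ affording $\rho$, $\sigma$ with bases $u_1, u_2, \ldots$ and $v_1, \ldots, v_d$ and let
$\alpha_{jp} : U \to_k V$ be the linear mapping defined by $u_h \mapsto \delta_{hj}v_p$. Its average $\alpha_{jp}^*$
in the sense of Lemma 6.2.2 has the matrix entries given by
\[ |G| \cdot (\alpha_{jp}^*)_{iq} = \sum_{h,r,x} \rho_{ih}(x^{-1})\delta_{hj}\delta_{pr}\sigma_{rq}(x) = \sum_x \rho_{ij}(x^{-1})\sigma_{pq}(x). \]
If $\rho$, $\sigma$ are inequivalent, then $\alpha_{jp}^* = 0$ by Lemma 6.3.2, and this proves the first line
of (6.3.4).
   Next we take $\rho = \sigma$. By Lemma 6.3.2, $\alpha_{jp}^* = \lambda_{jp} \in k$, hence we have
\[ \sum_x \rho_{ij}(x^{-1})\rho_{pq}(x) = |G|\lambda_{jp}\delta_{iq}. \tag{6.3.6} \]
Putting $q = i$ and summing over $i$, we obtain $|G|\delta_{jp} = |G| d \lambda_{jp}$.
By the hypothesis on $k$ we can divide by $|G|$, hence $d \neq 0$ in $k$ and $\lambda_{jp} = d^{-1}\delta_{jp}$.
Inserting this value in (6.3.6) we obtain the second line of (6.3.4).           •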
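The orthogonality of representation coefficients can be checked numerically on a small group. The following sketch makes several assumptions that are not taken from this proof: it uses the dihedral group of order 6 (isomorphic to $\mathrm{Sym}_3$) with the degree 2 representation $a \mapsto \mathrm{diag}(\omega, \omega^{-1})$, $b \mapsto \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and tests the relation in the form $\sum_x \rho_{ij}(x^{-1})\rho_{pq}(x) = (|G|/d)\,\delta_{iq}\delta_{jp}$, together with orthogonality to the trivial representation.

```python
import cmath

m = 3                                   # dihedral group of order 2m = 6
w = cmath.exp(2j * cmath.pi / m)        # primitive m-th root of 1
d = 2                                   # degree of rho

def rho(alpha, beta):
    """Matrix of a^alpha b^beta, where a -> diag(w, 1/w), b -> [[0, 1], [1, 0]]."""
    if beta == 0:
        return [[w ** alpha, 0], [0, w ** (-alpha)]]
    return [[0, w ** alpha], [w ** (-alpha), 0]]

def inverse(alpha, beta):
    """Inverse of a^alpha b^beta: rotations invert, reflections are involutions."""
    return ((-alpha) % m, 0) if beta == 0 else (alpha, 1)

G = [(alpha, beta) for alpha in range(m) for beta in range(2)]

def coeff_sum(i, j, p, q):
    """Sum over x in G of rho_ij(x^{-1}) * rho_pq(x)."""
    return sum(rho(*inverse(*x))[i][j] * rho(*x)[p][q] for x in G)

idx = range(d)
# Same representation: the sum is |G|/d when i = q and j = p, and 0 otherwise.
schur_ok = all(
    abs(coeff_sum(i, j, p, q) - (len(G) / d) * (i == q) * (j == p)) < 1e-9
    for i in idx for j in idx for p in idx for q in idx)

# Against the trivial representation every coefficient of rho sums to zero.
trivial_ok = all(abs(sum(rho(*inverse(*x))[i][j] for x in G)) < 1e-9
                 for i in idx for j in idx)
print(schur_ok, trivial_ok)
```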
   To illustrate this result, let us take the trivial representation for $\sigma$. Then $\sigma(x) = 1$
for all $x \in G$ and we find that every non-trivial irreducible representation $\rho$ of $G$
satisfies
\[ \sum_x \rho_{ij}(x) = 0 \quad \text{for all } i, j. \]
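this shows the $\rho_{ij}$ to be a linearly independent set of functions; in particular their
number cannot exceed $\dim kG = |G|$. Hence we have
\[ \sum_{i=1}^t d_i^2 \le |G|, \tag{6.3.8} \]
where $d_1, \ldots, d_t$ are the degrees of pairwise inequivalent irreducible representations of $G$.  •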
Our next task is to show that equality holds in (6.3.8) if we take enough represen-
tations. This means that every k-valued function on G can be written as a linear
combination of irreducible representation coefficients; this is expressed by saying
that these coefficients form a complete system of functions on G.
   To see this, let us go back to the group algebra kG. We have seen that this is semi-
simple, hence a direct product of full matrix rings over skew fields. But as we saw, the
only skew field finite-dimensional over k is k itself, because k is algebraically closed.
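Thus $kG$ is a direct product of full matrix rings over $k$:
\[ kG \cong \prod_{i=1}^s M_{d_i}(k). \tag{6.3.9} \]
Counting dimensions on both sides, we obtain
\[ \sum_{i=1}^s d_i^2 = |G|. \tag{6.3.10} \]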
Moreover, a comparison with (6.3.8) shows that the set of representations provided
by (6.3.9) is complete, so s = t. Of course we can also take the regular representation
of G, i.e. we take kG as G-module under right multiplication by G. Each irreducible
representation $\rho_i$ occurs $d_i$ times, representing the $d_i$ rows of the corresponding
matrix. Thus we again obtain the equation (6.3.10).
\[ (u \otimes v)g = ug \otimes vg. \]
Since the right-hand side is bilinear in $u$ and $v$, this defines an action and $U \otimes V$ is
easily verified to be a $G$-module. If the representations afforded by $U$, $V$ are $\rho$, $\sigma$
relative to bases $u_1, \ldots, u_m$, $v_1, \ldots, v_n$ respectively, then
\[ \rho_i \otimes \rho_j = \sum_k g_{ijk}\rho_k, \tag{6.3.11} \]
where the $g_{ijk}$ are non-negative integers, indicating how often a given representation
$\rho_k$ occurs in the tensor product.
  For each representation $\rho$ of $G$ we define its kernel as
\[ K = \{x \in G \mid \rho(x) = I\}. \]
Thus K is a normal subgroup of G and G has a faithful irreducible representation iff
some irreducible representation of G has a trivial kernel. This need not be the case,
but at any rate we have
Theorem 6.3.5. For any finite group $G$ the intersection of the kernels of all the
irreducible representations over a field of characteristic zero or prime to $|G|$ is trivial.
Proof. The regular representation of G is faithful; since it is a direct sum of
irreducible representations, the conclusion follows.                         •
Exercises
1. Find all irreducible representations of $\mathrm{Sym}_3$ by reducing the regular representation.
2. Show that for any representation $\rho$ of a finite group $G$ the set $N =
   \{x \in G \mid \det \rho(x) = 1\}$ is a normal subgroup of $G$ with cyclic quotient.
3. Let $G$ be a finite group, $k$ an algebraically closed field of characteristic 0 and $\rho$, $\sigma$
   inequivalent irreducible representations of $G$ of degrees $c$, $d$. Show that for any
   $c \times d$ matrix $T$ we have $\sum_x \rho(x^{-1})T\sigma(x) = 0$. Further show that for a $d \times d$
   matrix $P$ we have $\sum_x \sigma(x^{-1})P\sigma(x) = d^{-1} \cdot |G| \cdot \mathrm{Tr}(P) \cdot I$.
4. Show that if $\rho_1, \ldots, \rho_r$ are irreducible pairwise inequivalent representations of a
   group and $\rho = \bigoplus c_i\rho_i$, then the centralizer of $\rho$ has dimension $\sum c_i^2$. Use this fact
   to obtain another proof of (6.3.10).
6.4 Characters
A one-dimensional representation is also called a linear character or simply a
character if we are dealing with an abelian group. Such characters have already
been discussed in Section 4.9 of BA. We recall that an irreducible representation
of an abelian group over $\mathbf{C}$ (an algebraically closed field) is necessarily
one-
dimensional, by Schur's lemma. For a non-abelian group there will always be
irreducible representations of degrees greater than 1 (see Proposition 6.4.2 below)
and the definition then runs as follows. Given any representation $\rho$ of a group $G$
over $\mathbf{C}$, its character is defined as
\[ \chi(x) = \mathrm{tr}\,\rho(x), \quad x \in G, \tag{6.4.1} \]
where tr denotes the trace of the matrix $\rho(x)$; thus if $\rho(x) = (\rho_{ij}(x))$, then
$\mathrm{tr}\,\rho(x) = \sum_i \rho_{ii}(x)$. When $\chi$ and $\rho$ are related as in (6.4.1), $\rho$ is said to afford the
character $\chi$. For example, any representation of degree 1 is its own character; in
particular, the function $\chi_1(x) = 1$ for all $x \in G$ is the character afforded by the trivial
representation, and is called the trivial or also the principal character.
   Some obvious properties of characters are collected in
The degree of $\chi$ is $\chi(1)$ and for any $x \in G$ of order $n$, $\chi(x)$ is a sum of $n$-th roots of 1.
   If $\rho_1$, $\rho_2$ are any representations with the characters $\chi_1$, $\chi_2$ then the characters
afforded by $\rho_1 \oplus \rho_2$ and $\rho_1 \otimes \rho_2$ are $\chi_1 + \chi_2$ and $\chi_1\chi_2$ respectively.
Proof. Let $\chi$ be the character of $\rho$; any equivalent representation has the form
$T^{-1}\rho(x)T$ and since $\mathrm{tr}(BA) = \mathrm{tr}(AB)$ for any square matrices of the same size, we
have
\[ \mathrm{tr}(T^{-1}\rho(x)T) = \mathrm{tr}(\rho(x)TT^{-1}) = \mathrm{tr}\,\rho(x), \]
so both $\rho$ and $T^{-1}\rho T$ afford the same character. For the same reason,
$\mathrm{tr}\,\rho(y^{-1}xy) = \mathrm{tr}(\rho(y)^{-1}\rho(x)\rho(y)) = \mathrm{tr}\,\rho(x)$, and (6.4.2) follows. $\chi(1)$ equals the
degree because char $\mathbf{C} = 0$, while $A = \rho(x)$ satisfies $A^n = I$ if $x^n = 1$. Thus $A$ satisfies
an equation with distinct roots and so can be transformed to diagonal form over $\mathbf{C}$;
its diagonal elements $\lambda$ again satisfy $\lambda^n = 1$, so they are $n$-th roots of 1, and $\chi(x)$ is
the sum of these diagonal elements.
The final assertion follows because $\mathrm{tr}(A \oplus B) = \mathrm{tr}\,A + \mathrm{tr}\,B$ and $\mathrm{tr}(A \otimes B) =
\mathrm{tr}\,A \cdot \mathrm{tr}\,B$.                                                                          •
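The two trace identities used in the final step are easy to confirm on explicit matrices. Here is a small sketch; the matrices `A` and `B` are arbitrary illustrative choices and the helper names are not from the text.

```python
def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def direct_sum(a, b):
    """Block diagonal matrix with blocks a and b."""
    n, m = len(a), len(b)
    top = [row + [0] * m for row in a]
    bottom = [[0] * n + row for row in b]
    return top + bottom

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

A = [[0, -1], [1, -1]]   # trace -1
B = [[2, 1], [0, 3]]     # trace 5

sum_ok = trace(direct_sum(A, B)) == trace(A) + trace(B)
product_ok = trace(kron(A, B)) == trace(A) * trace(B)
print(sum_ok, product_ok)
```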
  The next result may be regarded as a generalization of the fact that a finite abelian
group is isomorphic to its dual.
Proposition 6.4.2. For any finite group $G$ the number of linear characters is $(G : G')$,
where $G'$ is the derived group. Hence every non-abelian group has irreducible
representations of degree greater than 1.
Proof. Every homomorphism $\alpha : G \to \mathbf{C}^\times$ corresponds to a homomorphism from
$G/G'$ to $\mathbf{C}^\times$ and conversely. But we know from Theorem 4.9.1 of BA that the
number of such homomorphisms is $|G/G'| = (G : G')$, so the result follows by
(6.3.10).                                                                •
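For small groups the index $(G : G')$ can be found by brute force. The sketch below is illustrative only: it represents permutations as tuples, generates the derived group of $\mathrm{Sym}_n$ from all commutators, and returns the index, which by the proposition is the number of linear characters.

```python
from itertools import permutations

def compose(p, q):
    """The permutation 'first p, then q' on {0, ..., n-1}."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated(gens, n):
    """Subgroup generated by gens; closure under products suffices in a finite group."""
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

def derived_index(n):
    """Index of the derived group in Sym_n."""
    G = list(permutations(range(n)))
    commutators = {compose(compose(inverse(x), inverse(y)), compose(x, y))
                   for x in G for y in G}
    return len(G) // len(generated(commutators, n))

print(derived_index(3), derived_index(4))
```

For $n = 3$ and $n = 4$ the index is 2, matching the two linear characters (trivial and sign) of the symmetric group.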
  In Section 6.3 we defined an inner product on kG; we shall now see how in the case
of k = C we can define a hermitian inner product on CG. Let us put
Since every character $\alpha$ is a sum of roots of 1, we have $\bar{\alpha}(x) = \alpha(x^{-1})$. Hence for
characters the formula (6.4.3) can also be written
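so in this case it agrees with the inner product introduced in Section 6.3. From the
orthogonality relations in Theorem 6.3.3 we obtain the following orthogonality
relations for irreducible characters, by putting $j = i$, $q = p$ in (6.3.4) and summing
over $i$ and $p$: for irreducible characters $\chi$, $\psi$,
\[ (\chi, \psi) = \begin{cases} 1 & \text{if } \chi = \psi, \\ 0 & \text{otherwise.} \end{cases} \tag{6.4.5} \]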
The inner product (6.4.4) can also be expressed directly in terms of the modules
affording the representation:
Proposition 6.4.3. Let $G$ be a finite group and $U$, $V$ any $G$-modules over an algebraically
closed field $k$ of characteristic 0, affording representations with characters $\alpha$, $\beta$
respectively. Then
\[ (\alpha, \beta) = \dim_k \mathrm{Hom}_G(U, V). \tag{6.4.6} \]
Proof. Suppose first that U, V are simple. Then by Lemma 6.3.2 the right-hand side
of (6.4.6) is 1 or 0 according as U, V are or are not isomorphic, and this is just the
value of the left, by (6.4.5). Now the general case follows because every G-module is a
direct sum of simple modules.                                                       •
   The number on the right of (6.4.6) is also called the intertwining number of U
and V.
   Above we have found the character of the regular representation, and it is not
difficult to obtain an explicit expression for it. Sometimes we shall want an expres-
sion for the character of the representation afforded by a given right ideal of the
group algebra. Since the latter is semisimple, each right ideal is generated by an
idempotent, and the next result expresses the character in terms of this idempotent.
Proposition 6.4.4. Let $G$ be a finite group, $A = kG$ its group algebra and $I = eA$ a right
ideal in $A$, with idempotent generator $e = \sum e(x)x$. Then the character afforded by $I$ is
\[ \chi(g) = \sum_x e(xg^{-1}x^{-1}); \]
Proof. Consider the operation $\rho(b) : a \mapsto eab$, representing the projection on $I$
followed by the right regular representation. We have
hence
\[ y^{-1}ay = a \ \text{for all } y \in G \iff a(yzy^{-1}) = a(z) \ \text{for all } y, z \in G. \]
Thus an element $a = \sum a(x)x$ lies in the centre of $kG$ iff $a(x)$ is constant on
conjugacy classes. This just means that we can write $a = \sum a_\lambda c_\lambda$, where $c_\lambda$ is the sum
of all elements in a given conjugacy class $C_\lambda$. It follows that these class sums form
a basis for the centre of $kG$. This proves the first part of our next result:
Theorem 6.4.5. Let $G$ be a finite group and $k$ an algebraically closed field of
characteristic 0 or prime to $|G|$. An element $a = \sum a(x)x$ of $kG$ lies in the centre if and only if
$a(x)$ is a class function. Moreover, the class sums $c_\lambda$ form a basis of the centre of $kG$ and
the irreducible characters over $k$ form a basis for the class functions; thus if $\chi_1, \ldots, \chi_r$
are the different irreducible characters, then any class function $\alpha$ on $G$ may be written in
the form
\[ \alpha = \sum_{i=1}^r (\chi_i, \alpha)\chi_i. \tag{6.4.8} \]
Hence the number of irreducible characters equals the number of conjugacy classes
of $G$.
  To complete the proof we denote the number of irreducible characters by $r$ and
the number of conjugacy classes by s; as we have seen, s is the dimension of the
centre of kG. Now kG is the direct product of r full matrix rings over k. Clearly each
matrix ring has a one-dimensional centre and the centre of the direct product is easily
seen to be the direct product of the centres. Hence the centre of kG is
r-dimensional over k, and it follows that r = s. The characters are independent, by
the orthogonality relation (6.4.5), hence they form a basis, which is orthonormal
by (6.4.5), and we therefore have (6.4.8).                                          •
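The two facts used here, that class sums are central and that the number of conjugacy classes equals the number of irreducible characters, can be observed directly for small symmetric groups. The following sketch is a brute-force illustration; it identifies a class sum with the set of its elements, which is legitimate because all its coefficients are 1.

```python
from itertools import permutations

def compose(p, q):
    """The permutation 'first p, then q' on {0, ..., n-1}."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_classes(G):
    """Partition the group G into conjugacy classes."""
    classes, seen = [], set()
    for x in G:
        if x not in seen:
            cls = frozenset(compose(compose(inverse(y), x), y) for y in G)
            classes.append(cls)
            seen |= cls
    return classes

def class_sum_is_central(cls, G):
    """Check x * c = c * x for the class sum c and every x in G."""
    return all({compose(x, c) for c in cls} == {compose(c, x) for c in cls}
               for x in G)

sym3 = list(permutations(range(3)))
sym4 = list(permutations(range(4)))
classes3 = conjugacy_classes(sym3)
classes4 = conjugacy_classes(sym4)
central = all(class_sum_is_central(c, sym4) for c in classes4)
print(len(classes3), len(classes4), central)
```

The counts 3 and 5 are the numbers of partitions of 3 and 4, and hence the numbers of irreducible characters of $\mathrm{Sym}_3$ and $\mathrm{Sym}_4$.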
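  Let us consider the multiplication table for the basis $c_1, \ldots, c_r$ of the centre of $kG$.
Any product $c_\lambda c_\mu$ of class sums is a sum of class sums with non-negative integer
multiplicities, hence we have
\[ c_\lambda c_\mu = \sum_\nu \gamma_{\lambda\mu\nu}c_\nu. \tag{6.4.9} \]
Let $\rho$ be an irreducible representation of degree $d$ with character $\chi$, and write $\chi^{(\lambda)}$ for
the value of $\chi$ on $C_\lambda$ and $h_\lambda = |C_\lambda|$. Since $c_\lambda$ is central, $\rho(c_\lambda) = \eta_\lambda I$ for some $\eta_\lambda \in k$, by
Schur's lemma, and on taking traces we find
\[ \eta_\lambda = h_\lambda\chi^{(\lambda)}/d. \tag{6.4.10} \]
If we apply $\rho$ to (6.4.9) we obtain $\eta_\lambda\eta_\mu = \sum_\nu \gamma_{\lambda\mu\nu}\eta_\nu$ and it follows that $\eta_\mu$ is a root of
the equation
\[ \det(x\delta_{\lambda\nu} - \gamma_{\lambda\mu\nu}) = 0. \tag{6.4.11} \]
This shows $\eta_\mu$ to be an algebraic integer. Further, (6.4.10) shows that for each
irreducible character $\chi$, $h_\mu\chi^{(\mu)}/d$ is a root of (6.4.11); since (6.4.11) is of degree $r$,
its roots are the values $h_\mu\chi_i^{(\mu)}/d_i$ for the different irreducible characters of $G$.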
   As a consequence of this development we can show that the degrees of the irredu-
cible representations divide the group order. We recall from Section 9.4 of BA that
the sum and product of algebraic integers are again integral.
Proposition 6.4.6 (Frobenius). For any finite group $G$ the degree of each irreducible
representation over $\mathbf{C}$ divides $|G|$.
Proof. Let $\chi$ be an irreducible character and $d$ its degree. As a sum of roots of 1, $\chi$ is an
algebraic integer, and so is $h_\lambda\chi^{(\lambda)}/d$, as we saw above. By the orthogonality relations
(6.4.5) we have
\[ \frac{|G|}{d} = \sum_\lambda \frac{h_\lambda\chi^{(\lambda)}}{d}\,\overline{\chi^{(\lambda)}}, \]
and since sums and products of algebraic integers are integral, it follows that $|G|/d$
is integral, as we had to show.                                                   •
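When $\lambda \neq \mu$, we can omit $h_\lambda$, and so we obtain the second orthogonality relation for
characters:
Proposition 6.4.7. For any finite group $G$, if $\chi_i^{(\lambda)}$ is the value of the $i$-th irreducible
character on the conjugacy class $C_\lambda$ and $|C_\lambda| = h_\lambda$, then
\[ \sum_i \chi_i^{(\lambda)}\,\overline{\chi_i^{(\mu)}} = \delta_{\lambda\mu}\,|G|/h_\lambda. \]
                                                                                                   •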
The character of a representation may also be used to describe its kernel.
Corollary 6.4.9. Let $G$ be a finite group. Given $x \in G$, if $\chi(x) = \chi(1)$ for all irreducible
characters $\chi$ of $G$ over $\mathbf{C}$, then $x = 1$.                                                               •
point of $B$ is fixed by $|H|$ elements, for its stabilizer is conjugate to $H$. Recalling the
orbit formula from BA, Section 2.1: $|B| = (G : H)$, we therefore have
For general permutation modules we can apply the result to each orbit and obtain
Proposition 6.4.10. Let $G$ be a finite permutation group acting on a set $B$ and denote
the character afforded by the corresponding permutation module by $\chi$. Then
\[ \sum_{x \in G} \chi(x) = n \cdot |G|, \tag{6.4.12} \]
where $n$ is the number of orbits of $G$ on $B$.
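Formula (6.4.12) can be tested on small permutation groups, reading $n$ as the number of orbits and using the fact that the permutation character counts fixed points. The two groups below are illustrative choices, not examples from the text.

```python
from itertools import permutations

def fixed_points(p):
    """Value of the permutation character: the number of points fixed by p."""
    return sum(1 for i, image in enumerate(p) if i == image)

def orbit_count(G, n):
    """Number of orbits of the permutation group G on {0, ..., n-1}."""
    remaining, count = set(range(n)), 0
    while remaining:
        point = min(remaining)
        remaining -= {g[point] for g in G}   # remove the whole orbit of point
        count += 1
    return count

def character_sum(G):
    """Sum of the permutation character over the group."""
    return sum(fixed_points(g) for g in G)

sym3 = list(permutations(range(3)))   # transitive on 3 points
swap = [(0, 1, 2), (1, 0, 2)]         # group of order 2 with orbits {0, 1} and {2}

for G in (sym3, swap):
    print(character_sum(G), orbit_count(G, 3) * len(G))
```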
Examples
We end this section with some examples of representations and characters; through-
out, k is algebraically closed of characteristic O.
   is a representation. We get distinct characters for $n_j$ different values of $\nu_j$, and the
   $n_1 \cdots n_m$ characters so obtained are all different and constitute all the irreducible
   characters of $A$. This corresponds to the fact that the dual of $A$, i.e. its group of
   characters, is isomorphic to $A$ itself (see Theorem 4.9.1 of BA).
2. Consider $D_m$, the dihedral group of order $2m$, with generators $a$, $b$ and defining
   relations $a^m = 1$, $b^2 = 1$, $b^{-1}ab = a^{-1}$. Every element can be uniquely expressed
   in the form $a^\alpha b^\beta$, where $0 \le \alpha < m$, $0 \le \beta < 2$. It is easily verified that the
   conjugacy classes are for odd $m$: $\{a^r, a^{-r}\}$ $(r = 1, \ldots, (m - 1)/2)$, $\{1\}$, $\{a^\alpha b\}$; and for
   even $m$: $\{a^r, a^{-r}\}$ $(r = 1, \ldots, m/2 - 1)$, $\{1\}$, $\{a^{m/2}\}$, $\{a^{2\alpha}b\}$, $\{a^{2\alpha+1}b\}$. Further it
   may be checked that the index of the derived group in $D_m$ is 2 when $m$ is odd
   and 4 when $m$ is even.
      To find the representations of $D_m$ we have a homomorphism $D_m \to C_2$
   obtained by mapping $a \mapsto 1$, which gives rise to two representations, the trivial
   representation and $a \mapsto 1$, $b \mapsto -1$. Further representations are obtained by
   taking a primitive $m$-th root of 1, say $\omega$, and writing
   \[ a \mapsto \begin{pmatrix} \omega^i & 0 \\ 0 & \omega^{-i} \end{pmatrix}, \quad b \mapsto \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad (i = 1, \ldots, [m/2]). \tag{6.4.13} \]
   \[ \Bigl(\frac{m}{2} - 1\Bigr) \cdot 2^2 + 4 \cdot 1^2 = 2m. \]
3. Character tables for the symmetric groups $\mathrm{Sym}_3$, $\mathrm{Sym}_4$. In the tables below the
   rows indicate the different characters, while the columns indicate the conjugacy
   classes, headed by a typical element and the order of each class. As is well
   known and easily checked, the conjugacy class of each permutation is determined
   by its cycle structure, hence the number of conjugacy classes of $\mathrm{Sym}_n$ is the
   number of partitions of $n$ into positive integers. Moreover, the derived group is
   the alternating group $\mathrm{Alt}_n$ and its index in $\mathrm{Sym}_n$ is 2; hence there are just two
   linear characters, the trivial character and the sign character.

   Sym_3
      class order      1      3      2
      class            1    (12)   (123)
      $\chi_1$         1      1      1
      $\chi_2$         1     -1      1
      $\chi_3$         2      0     -1

   Sym_4
      class order      1      6      8       6         3
      class            1    (12)   (123)   (1234)   (12)(34)
      $\chi_1$         1      1      1       1         1
      $\chi_2$         1     -1      1      -1         1
      $\chi_3$         3      1      0      -1        -1
      $\chi_4$         3     -1      0       1        -1
      $\chi_5$         2      0     -1       0         2
6.5 Complex representations                                                            241
   In each table the first row is the trivial character and the second row the sign
   character. The degrees in the first column are found by solving the degree
   equation $\sum d_i^2 = n!$. Each character $\chi$ gives rise to a 'conjugate' character $\chi\chi_2$,
   corresponding to the tensor product $\rho \otimes \rho_2$ of the representations. Thus $\chi_3$
   and $\chi_4$ in the second table are conjugate and $\chi_3$ in the first table and $\chi_5$ in the
   second table are self-conjugate, hence they vanish on odd classes. Now the
   remaining values are found by orthogonality, using Proposition 6.4.7.
      We note that the characters for $\mathrm{Sym}_3$, $\mathrm{Sym}_4$ are rational. This is a general
   feature of symmetric groups; in fact, as we shall see in Section 6.6, all the
   irreducible representations of $\mathrm{Sym}_n$ can be expressed over $\mathbf{Q}$.
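The character table of $\mathrm{Sym}_4$ can be checked against the orthogonality relations and the degree equation. The sketch below simply encodes the table (rows $\chi_1, \ldots, \chi_5$, columns in the order $1$, $(12)$, $(123)$, $(1234)$, $(12)(34)$) and tests the row relations (6.4.5), the column relations of Proposition 6.4.7 and $\sum d_i^2 = 4!$; since all values are real integers, no complex conjugation is needed.

```python
sizes = [1, 6, 8, 6, 3]          # class orders h for 1, (12), (123), (1234), (12)(34)
table = [
    [1, 1, 1, 1, 1],             # chi_1, trivial character
    [1, -1, 1, -1, 1],           # chi_2, sign character
    [3, 1, 0, -1, -1],           # chi_3
    [3, -1, 0, 1, -1],           # chi_4 = chi_3 * chi_2
    [2, 0, -1, 0, 2],            # chi_5
]
order = sum(sizes)               # |Sym_4| = 24
r = len(table)

# Row orthogonality: sum over classes of h * chi_i * chi_j = |G| * delta_ij.
rows_ok = all(
    sum(h * a * b for h, a, b in zip(sizes, table[i], table[j]))
    == (order if i == j else 0)
    for i in range(r) for j in range(r))

# Column orthogonality: h_l * sum over i of chi_i(l) * chi_i(mu) = |G| * delta.
cols_ok = all(
    sizes[l] * sum(row[l] * row[mu] for row in table)
    == (order if l == mu else 0)
    for l in range(r) for mu in range(r))

# Degree equation: the squares of the degrees add up to the group order.
degrees_ok = sum(row[0] ** 2 for row in table) == order
print(rows_ok, cols_ok, degrees_ok)
```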
Exercises
1. Verify the calculations in the above examples, and make a character table for the
   quaternion group of order 8.
2. Show that any two elements $x$, $y$ of a group $G$ are conjugate iff $\chi(x) = \chi(y)$ for all
   irreducible characters $\chi$ of $G$.
3. Show that any character which is zero on all elements $\ne 1$ of $G$ is a multiple of the
   character of the regular representation.
4. Show that if $S = \{x_1, \ldots, x_n\}$ is a transitive $G$-set, then the vector space spanned
   by the $x_i - x_j$ is a $G$-module affording a representation which does not contain the
   trivial representation.
5. Let $\chi$ be a character of a finite group over a field of characteristic 0. Show that
   $(\chi, \chi)$ is a positive integer.
6. Let $\rho$ be an irreducible representation of degree $d$ of $G$, with character $\chi$. Show
   that the simple factor of the group algebra corresponding to $\rho$ has the unit
   element $e$ given by $|G|\,e = d \sum_x \chi(x^{-1})x$.
7. If $g_{ijk}$ is defined as in (6.3.11), show that $g_{ijk} = (\chi_i\chi_j, \chi_k)$. Use Proposition 6.4.7 to
   evaluate the sum of all the $g_{ijk}$ and deduce the formula $\sum_{ijk} g_{ijk} = |G| \sum_\lambda h_\lambda^{-1}$.
   Show that the value of this sum for $\mathrm{Sym}_3$ is 11 and for $\mathrm{Sym}_4$ is 43, and verify
   the formula in these cases.
8. (Theodor E. Molien, 1897) Let $G$ be a finite group with irreducible characters
   $\chi_1, \ldots, \chi_r$ and $U$ a $G$-module with basis $u_1, \ldots, u_n$. Show that the character
   afforded by the $G$-module $U \otimes U \otimes \cdots \otimes U$ ($n$ factors) is $\sum n_i\chi_i$, where $n_i$ is
   the coefficient of $t^n$ in $|G|^{-1} \sum_x \chi_i(x) \det(I - t\rho(x))^{-1}$, $\rho$ being the representation
   afforded by $U$. Deduce that the number of invariants of degree $n$ in the $u$'s is
   the coefficient of $t^n$ in $|G|^{-1} \sum_x \det(I - t\rho(x))^{-1}$.
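The numerical claim in Exercise 7 can be checked by machine. The following is a minimal sketch, not part of the original text: it assumes the character table of $\mathrm{Sym}_3$ (classes of 1, a transposition and a 3-cycle, of sizes 1, 3, 2) and takes $h_\lambda$ to be the class sizes; the helper names `inner` and `product` are illustrative only.

```python
from fractions import Fraction

# Assumed character table of Sym3; columns are the classes of
# 1, a transposition, a 3-cycle, with sizes 1, 3, 2.
class_sizes = [1, 3, 2]
chars = [
    [1, 1, 1],    # trivial character
    [1, -1, 1],   # sign character
    [2, 0, -1],   # character of degree 2
]
order = sum(class_sizes)

def inner(a, b):
    """(a, b) = |G|^{-1} * sum over classes of h * a * b (all values are real here)."""
    return Fraction(sum(h * x * y for h, x, y in zip(class_sizes, a, b)), order)

def product(a, b):
    """Pointwise product of two class functions."""
    return [x * y for x, y in zip(a, b)]

# Sum of all g_ijk = (chi_i chi_j, chi_k), compared with |G| * sum of 1/h.
total = sum(inner(product(ci, cj), ck)
            for ci in chars for cj in chars for ck in chars)
formula = order * sum(Fraction(1, h) for h in class_sizes)
print(total, formula)
```

Both numbers come out as 11, in agreement with the value stated in the exercise.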
    $(\lambda u + \lambda' u', v) = \lambda(u, v) + \lambda'(u', v), \quad (v, u) = \overline{(u, v)}, \quad (u, u) > 0$ for $u \ne 0$,    (6.5.1)
which in addition satisfies
    $(ux, vx) = (u, v)$ for all $u, v \in U$, $x \in G$.    (6.5.2)
For, given a unitary representation $\rho$, relative to the basis $v_1, \ldots, v_d$ of $U$, let us
define a hermitian form by writing $(v_i, v_j) = \delta_{ij}$. If the vectors $u$, $v$ have coordinate
rows $\alpha$, $\beta$, then $(u, v) = \alpha\beta^H$ and $(ux, vx) = \alpha\rho(x)(\beta\rho(x))^H = \alpha\rho(x)\rho(x)^H\beta^H =
\alpha\beta^H$, because $\rho(x)\rho(x)^H = I$ by unitarity. Conversely, if we have a positive definite
hermitian form satisfying (6.5.2), then transformation by $x$ preserves the metric and
so must be unitary.
   Any unitary representation is completely reducible. To verify this assertion we take
the corresponding module $U$. If $W$ is any submodule, then its orthogonal complement
$W^\perp = \{u \in U \mid (u, w) = 0$ for all $w \in W\}$ is again a $G$-submodule and is
complementary to $W$. In terms of representations we can also verify this fact by
noting that a reduced matrix $\rho(x)$ must be fully reduced, because its conjugate
transpose is $\rho(x^{-1})$.
   The importance of unitary representations is underlined by the following result,
which incidentally provides another proof of Maschke's theorem.
Thus (u, v) is invariant under G, and it is clearly positive definite hermitian, as a sum
of such forms. On choosing an orthonormal basis, we obtain the desired unitary
6.5 Complex representations                                                                243
Since $\rho$ is irreducible, we have $S^HS = \lambda I$ by Schur's lemma, and here $\lambda > 0$, because
$S^HS$ is positive definite. Writing $\lambda = \mu^2$ ($\mu > 0$) and $T = \mu^{-1}S$, we obtain a unitary
matrix $T$ such that $\sigma(x) = T\rho(x)T^{-1}$.    •
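The averaging argument for invariant forms can be tried on a small example. The sketch below is an illustration under stated assumptions, not the text's construction: it takes a hypothetical non-orthogonal real matrix `s` of order 2 (a representation of the cyclic group of order 2), forms the Gram matrix of the summed form $(u, v) = \sum_x (u\rho(x))(v\rho(x))^T$, and checks invariance in the sense of (6.5.2). Since the matrices are real, transposes stand in for conjugate transposes.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
# Hypothetical example: a non-orthogonal matrix with s*s = I.
s = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(-1)]]
group = [identity, s]
assert matmul(s, s) == identity

# Gram matrix M of the summed form: M = sum over x of rho(x) rho(x)^T.
gram = [[sum(matmul(x, transpose(x))[i][j] for x in group)
         for j in range(2)] for i in range(2)]

# Invariance (ux, vx) = (u, v) amounts to rho(x) M rho(x)^T = M for all x.
invariant = all(matmul(matmul(x, gram), transpose(x)) == gram for x in group)
print(gram, invariant)
```

Here the Gram matrix has rows (3, -1) and (-1, 2), which is positive definite, so an orthonormal basis for this form turns the representation into an orthogonal one.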
   The irreducible complex representations may be classified as follows. Let $\rho$ be an
irreducible complex representation (not necessarily unitary). If $\rho$ is equivalent to a
real representation, it is said to be of the first kind. If $\rho$ is not of the first kind,
but is equivalent to its conjugate $\bar\rho$, it is said to be of the second kind; in the remaining
case $\rho$ is of the third kind. Our next result shows how to distinguish the first two
of these cases. In the proof we shall need the elementary fact that a symmetric unitary
matrix can always be written as the square of a symmetric unitary matrix. We recall
the proof.
   Let $P$ be symmetric and unitary; as a unitary matrix $P$ is similar to a diagonal
matrix, say $S^{-1}PS = D$ for a unitary matrix $S$, and
i.e. $S^TSD = DS^TS$. Now $D$ is diagonal and again unitary, so its diagonal elements
have absolute value 1 and we can find a diagonal matrix $E$ such that $E^2 = D$ and
(6.5.3)
Then either $P^T = P$ and $\rho$ is of the first kind, or $P^T = -P$ and $\rho$ is of the second kind.
Proof. Taking complex conjugates in (6.5.4) we have $\rho(x) = P^T\bar\rho(x)(P^T)^{-1}$, because
$P^{-1} = P^H = \bar P^T$; therefore
   Now it is clear from (6.5.4) that $\rho$ cannot be of the third kind, so it is enough to
show that $\rho$ is of the first kind iff $P^T = P$. If $\rho$ is of the first kind, then there is an
invertible matrix $L$ such that $L^{-1}\rho(x)L$ is real for all $x \in G$, hence by (6.5.4),
It follows that $P\bar LL^{-1}$ commutes with $\rho(x)$ and so $P\bar LL^{-1} = \alpha I$, i.e. $P = \alpha L\bar L^{-1}$. Now
if $P^T = -P$, then $P^{-1} = P^H = -\bar P$ and so $I = PP^{-1} = -\alpha L\bar L^{-1} \cdot \bar\alpha\bar LL^{-1} = -\alpha\bar\alpha I$,
which is a contradiction. Therefore $P^T = P$ in this case.
   Conversely, assume that $P^T = P$. Then $P$ is symmetric unitary and by the above
remark, $P = Q^2$, for a symmetric unitary matrix $Q$. Hence $\bar Q = Q^H = Q^{-1}$, and
$\overline{Q^{-1}\rho(x)Q} = QP^{-1}\rho(x)PQ^{-1} = Q^{-1}\rho(x)Q$. Therefore $\rho$ is equivalent to a real
representation, and so is of the first kind.    •
Corollary 6.5.3. Any complex irreducible representation of the second kind has even
degree.                                                                                       •
Theorem 6.5.4. Let $\chi$ be a complex irreducible character of a finite group $G$ and $\nu(\chi)$
its indicator as in (6.5.5). Then $\chi$ is of the first kind if $\nu(\chi) = 1$, of the second kind if
$\nu(\chi) = -1$ and of the third kind if $\nu(\chi) = 0$.
Proof. Let $\rho$ be a representation affording $\chi$ and denote its degree by $d$. We may take
$\rho$ to be unitary, and then have
    $|G|\,\nu(\chi) = \sum_x \chi(x^2) = \sum_{i,x} \rho_{ii}(x^2) = \sum_{ij}\sum_x \rho_{ij}(x)\rho_{ji}(x)$
    $= \sum_{ij}\sum_x \rho_{ij}(x)\bar\rho_{ij}(x^{-1}).$
If $\rho$ is of the third kind, $\rho$ and $\bar\rho$ are inequivalent and then $\nu(\chi) = 0$ by the orthogonality
relations (Theorem 6.3.3). If $\rho$ is of the first kind, we may assume it to
be real; in that case
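and hence $\nu(\chi) = 1$. Finally, if $\rho$ is of the second kind, then by Proposition 6.5.2
there is a unitary matrix $P$ such that $P^T = -P$ and $P^{-1}\rho(x)P = \bar\rho(x)$; hence on writing
$P = (p_{ij})$, $P^{-1} = (q_{ij})$ we have, again by the orthogonality relations,
    $\nu(\chi) = |G|^{-1} \sum_{ij}\sum_{rs}\sum_x \rho_{ij}(x)\,q_{ir}\rho_{rs}(x^{-1})p_{sj}$
    $= d^{-1} \sum_{i,r} q_{ir}p_{ir} = d^{-1}\operatorname{tr}(P^{-1}P^T) = d^{-1}\operatorname{tr}(-I) = -1.$
Thus $\nu(\chi) = -1$, as we wished to show.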
                                                                                                           •
   Finally we show how the indicator is related to the solution of the equation $x^2 = a$
in $G$.
Proposition 6.5.5. Let $G$ be a finite group. Given $a \in G$, let $t(a)$ be the number of
solutions of the equation $x^2 = a$ in $G$. If $\chi_1, \ldots, \chi_r$ are all the inequivalent complex
irreducible characters of $G$, then
    $t(a) = \sum_{i=1}^r \nu(\chi_i)\chi_i(a).$    (6.5.6)
Proof. It is clear that $t(a)$ is a class function on $G$, so it can be written in the form
$t(a) = \sum c_i\chi_i(a)$, and it only remains to show that $c_i = \nu(\chi_i)$. The sets $T(a) =
\{x \in G \mid x^2 = a\}$ form a partition of $G$, and we have by Theorem 6.4.5, using (6.5.8) to
determine the coefficient $c_i$:
    $= \sum_{x \in G} \chi_i(x^2) = |G| \cdot \nu(\chi_i),$
by (6.5.5). Since the indicator is always real, the desired relation $c_i = \nu(\chi_i)$
follows.                                                                                                   •
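Formula (6.5.6) lends itself to a direct check. The following is a minimal sketch, not part of the original text, for $\mathrm{Sym}_3$: it assumes the three irreducible characters are the trivial character, the sign, and the number of fixed points minus 1, and computes the indicator as $|G|^{-1}\sum_x \chi(x^2)$. All function names are illustrative.

```python
from itertools import permutations
from fractions import Fraction

n = 3
G = list(permutations(range(n)))      # Sym3 as tuples p, with p[i] the image of i

def mul(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(n))

def sign(p):
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def fixed(p):
    return sum(1 for i in range(n) if p[i] == i)

# Assumed list of the irreducible characters of Sym3.
characters = [
    lambda p: 1,               # trivial
    sign,                      # sign
    lambda p: fixed(p) - 1,    # degree 2
]

def indicator(chi):
    """nu(chi) = |G|^{-1} * sum over x of chi(x^2)."""
    return Fraction(sum(chi(mul(x, x)) for x in G), len(G))

def square_roots(a):
    """t(a): the number of solutions of x^2 = a."""
    return sum(1 for x in G if mul(x, x) == a)

nus = [indicator(chi) for chi in characters]
for a in G:
    assert square_roots(a) == sum(nu * chi(a) for nu, chi in zip(nus, characters))
print(nus)
```

All three indicators equal 1, so every irreducible character of $\mathrm{Sym}_3$ is of the first kind, and $t(1) = 4$ is the sum $1 + 1 + 2$ of the degrees.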
   Let us apply the result for $a = 1$. In this case $\chi(1) = d$ is the degree of $\chi$. Bearing
in mind that $\nu(\chi) \le 1$, with equality precisely when $\chi$ is of the first kind, by Theorem
6.5.4, we obtain
Corollary 6.5.6. If $t$ is the number of elements of order 2 in the finite group $G$, and the
degrees of its complex irreducible representations are $d_1, \ldots, d_r$, then
    $1 + t \le d_1 + \cdots + d_r,$
with equality if and only if all the irreducible characters are of the first kind.    •
246                                                  Representation theory of finite groups
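Since equality holds here, all the representations must be of the first kind, and going
through the proof of Proposition 6.5.2, we find that the representation given at the
end of Section 6.4 is equivalent to the real representation
    $a \mapsto \frac{1}{2}\begin{pmatrix} \lambda + \lambda^{-1} & i(\lambda - \lambda^{-1}) \\ i(\lambda^{-1} - \lambda) & \lambda + \lambda^{-1} \end{pmatrix}, \quad b \mapsto \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \lambda = \omega^j,\ j = 1, \ldots, (m-1)/2.$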
Further, we can rule out the low dimensions 2, 3 because an integral representation
can be reduced mod 2, and we cannot have a homomorphism of $\mathrm{Sym}_5$ into $GL_2(\mathbf{F}_2)$
(order $(2^2 - 1)(2^2 - 2) = 6$) or into $GL_3(\mathbf{F}_2)$ (order $(2^3 - 1)(2^3 - 2)(2^3 - 2^2) =
168$) which maps $\mathrm{Alt}_5$ non-trivially. This leaves as the only possibility for the degrees
1, 1, 4, 4, 5, 5, 6.
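The arithmetic in this argument is easy to confirm. The sketch below, added for illustration, uses the standard formula $|GL_n(\mathbf{F}_q)| = \prod_{i=0}^{n-1}(q^n - q^i)$ and checks that the squares of the proposed degrees add up to $|\mathrm{Sym}_5| = 120$; the function name `gl_order` is illustrative.

```python
def gl_order(n, q):
    """Order of GL_n(F_q): the product of (q^n - q^i) for i = 0, ..., n-1."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order

degrees = [1, 1, 4, 4, 5, 5, 6]
print(gl_order(2, 2), gl_order(3, 2), sum(d * d for d in degrees))
```

The two group orders are 6 and 168. Neither is divisible by 60, the order of the simple group $\mathrm{Alt}_5$, so neither group can contain an isomorphic copy of $\mathrm{Alt}_5$.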
Exercises
1. Use the information on $\mathrm{Sym}_5$ to make a character table for it.
2. Find the matrix of transformation which reduces the representation (6.4.14) for
   $D_m$ to the form given above.
3. Let $V$ be a simple $G$-module with real character. Show that there is just one
   invariant bilinear form $b$ on $V$, up to scalar multiples. Show further that $b$ is
   either symmetric or antisymmetric, and that the latter can happen only when
   $\dim V$ is even.
4. Show that in a group $G$ of odd order no element $\ne 1$ is conjugate to its inverse.
   Deduce that for any $y \ne 1$, $\sum_i \chi_i(y)^2 = \sum_i \chi_i(y)\chi_i(y^{-1}) = 0$. Hence show that
   $G$ has an irreducible character of the third kind.
6.6 Representations of the symmetric group                                                  247
5. Show that every non-trivial irreducible character of a group of odd order is of the
   third kind. (Hint. Use the methods of Exercise 4 and the fact that $\chi(1)$ is an odd
   integer.)
6. Find a quadratic polynomial $f$ such that a complex irreducible representation of
   the $n$-th kind has the indicator $f(n)$. For what numbering of the different kinds
   of representations can $f$ be chosen as a linear polynomial?
    $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 7 & 5 & 6 & 8 & 1 & 9 & 2 & 4 & 3 \end{pmatrix} = (1\ 7\ 2\ 5)(3\ 6\ 9)(4\ 8).$
The cycles have no digits in common, so they commute and we can arrange them by
decreasing length. If the lengths are $\alpha_1, \ldots, \alpha_h$, we may thus suppose that
    $\alpha_1 + \alpha_2 + \cdots + \alpha_h = n,$    (6.6.1)
and
    $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_h.$    (6.6.2)
If $\lambda_1$ of the $\alpha_i$ are 1, $\lambda_2$ are 2, etc., we can also write $1^{\lambda_1}2^{\lambda_2} \cdots r^{\lambda_r}$ (where $r$ is the
largest $\alpha_i$) for the set of $\alpha$'s. This is called the cycle structure of the permutation.
Two permutations have the same cycle structure iff they are conjugate in $S_n$: If
then $g = p^{-1}fp$, where $p: a_i \mapsto b_i$. Hence two permutations with the same cycle
structure are conjugate; the converse is clear.
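The passage from the two-row form of a permutation to its cycles, arranged by decreasing length, can be sketched as follows. This is an illustrative helper, not taken from the text; the function name `cycles` and the list encoding of the bottom row are assumptions.

```python
def cycles(images):
    """Disjoint cycles of the permutation i -> images[i-1] on 1..n, longest first."""
    n = len(images)
    seen = set()
    result = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle = []
        i = start
        while i not in seen:       # follow start -> image -> ... until the cycle closes
            seen.add(i)
            cycle.append(i)
            i = images[i - 1]
        result.append(tuple(cycle))
    return sorted(result, key=len, reverse=True)

perm = [7, 5, 6, 8, 1, 9, 2, 4, 3]     # bottom row of the two-row notation
decomposition = cycles(perm)
structure = [len(c) for c in decomposition]
print(decomposition, structure)
```

For the permutation displayed above this gives the cycles (1 7 2 5), (3 6 9), (4 8), with lengths 4, 3, 2 adding up to 9 as in (6.6.1).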
   It follows that the number of conjugacy classes of $S_n$ equals the number of
sequences $(\alpha_1, \ldots, \alpha_h)$ of positive integers satisfying (6.6.1) and (6.6.2); by Theorem
6.4.5 this is also the number of inequivalent irreducible representations of $S_n$. To get
a complete system of irreducible representations of $S_n$ we need only construct for
each sequence $(\alpha_1, \ldots, \alpha_h)$ an irreducible representation, such that representations
corresponding to different sequences are inequivalent. This will be our aim in
what follows.
This is called a Young diagram and is again denoted by $\alpha$. Since $\alpha \vdash n$, we can write
the numbers 1 to $n$ in these squares (in some order). The result is called a Young
tableau. For example, for $\alpha = (3, 2, 1)$:
    2 6 5
    1 3
    4
We see that for each diagram there are $n!$ distinct tableaux. If $T_\alpha$ is a Young tableau
and $g \in S_n$, then $T_\alpha g$ denotes the tableau obtained from $T_\alpha$ by applying $g$.
   We can let each Young tableau represent a permutation by regarding the rows as
cycles. In this way the tableau just illustrated represents the permutation (2 6 5)(1 3).
If $T_\alpha$ represents $c$ in this way, then $T_\alpha g$ represents $g^{-1}cg$, as is easily verified.
   We now fix a tableau $T_\alpha$ and define two subgroups of $S_n$ as follows: $P_\alpha$ is the set of
all permutations leaving each symbol in its row, briefly the set of row permutations of
$T_\alpha$, and $Q_\alpha$ is the set of all permutations leaving each symbol in its column, the set of
column permutations of $T_\alpha$. Here $P_\alpha$ and $Q_\alpha$ depend of course on $T_\alpha$ and not merely
on the diagram $\alpha$. For example, in the above tableau $P_\alpha$ is generated by (2 6), (2 6 5),
(1 3), while $Q_\alpha$ is generated by (2 1), (2 1 4), (3 6). If we apply $g$ to $T_\alpha$ and use the
remark made earlier, we obtain
Lemma 6.6.1. Let $T_\alpha$ be a Young tableau with groups $P_\alpha$, $Q_\alpha$ and let $g \in S_n$. Then the
groups for $T_\alpha g$ are $g^{-1}P_\alpha g$ and $g^{-1}Q_\alpha g$.    •
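     Let $A$ be the group algebra of $S_n$ (over the rational numbers, say) and write $\varepsilon(g)$ or
$\varepsilon_g$ for the sign of the permutation $g$. Writing $p$, $q$ for the typical elements of $P_\alpha$, $Q_\alpha$
respectively, we define two elements of $A$, the sum of the row permutations and the
alternating sum of the column permutations:
    $f_\alpha = \sum_p p, \quad g_\alpha = \sum_q \varepsilon_q q.$    (6.6.3)
Lemma 6.6.2. Let $T_\alpha$ be a Young tableau and $f_\alpha$, $g_\alpha$ as in (6.6.3), and write $|P_\alpha| = r_\alpha$,
$|Q_\alpha| = s_\alpha$. Then
    $pf_\alpha = f_\alpha p = f_\alpha$ for all $p \in P_\alpha$,    (6.6.4)
    $qg_\alpha = g_\alpha q = \varepsilon_q g_\alpha$ for all $q \in Q_\alpha$,    (6.6.5)
    $f_\alpha^2 = r_\alpha f_\alpha, \quad g_\alpha^2 = s_\alpha g_\alpha.$    (6.6.6)
Proof. We have $pf_\alpha = \sum_{p'} pp' = \sum p' = f_\alpha$, $\varepsilon_q qg_\alpha = \sum \varepsilon_q\varepsilon_{q'}qq' = g_\alpha$, which
establishes one half of (6.6.4), (6.6.5); the other half follows similarly. Now
$f_\alpha^2 = \sum_p pf_\alpha = \sum f_\alpha = r_\alpha f_\alpha$, $g_\alpha^2 = \sum \varepsilon_q qg_\alpha = \sum g_\alpha = s_\alpha g_\alpha$.    •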
Lemma 6.6.3. Let $\alpha, \beta \vdash n$ and take any tableaux $T_\alpha$, $T'_\beta$. If $\alpha \ge \beta$ and no two
numbers in the same row of $T_\alpha$ occur in the same column of $T'_\beta$, then (i) $\alpha = \beta$ and
(ii) $T'_\beta = T_\alpha qp$ for some $p \in P_\alpha$, $q \in Q_\alpha$.
   Some care is needed here, for we have abused notation by writing $P_\alpha$ instead of the
more accurate $P(T_\alpha)$. In fact we shall take $P_\alpha$, $Q_\alpha$ to be the groups associated with $T_\alpha$
and write $P'_\beta$, $Q'_\beta$ for the groups of $T'_\beta$.
Proof. Since $\alpha \ge \beta$, we have $\alpha_1 \ge \beta_1$. The first row of $T_\alpha$ has $\alpha_1$ numbers, which
must be in different columns of $T'_\beta$, so $\beta_1 \ge \alpha_1$, and hence $\beta_1 = \alpha_1$. Now for a
certain column permutation $q'_1 \in Q'_\beta$ we can bring these numbers into the top
row, though possibly in a different order from that in $T_\alpha$.
   Leaving out the top row in $T'_\beta q'_1$ and $T_\alpha$ we can repeat the argument, showing that
$\beta_2 = \alpha_2$ and finding $q'_2$ such that $T'_\beta q'_1q'_2$ has the same numbers as $T_\alpha$ in the second
row as well as the top row. After $h$ steps (if $\alpha$ has $h$ parts) we get $q' = q'_1q'_2 \cdots q'_h$
such that $T'_\beta q'$ differs from $T_\alpha$ only by a row permutation: $T'_\beta q' = T_\alpha p$, where
$p \in P_\alpha$, $q' \in Q'_\beta$. Now $T_\alpha = T'_\beta q'p^{-1}$, hence
    $Q_\alpha = (q'p^{-1})^{-1}Q'_\beta q'p^{-1} = pQ'_\beta p^{-1};$
Corollary 6.6.4. Given $\alpha, \beta \vdash n$ and any tableaux $T_\alpha$, $T_\beta$, if $\alpha > \beta$, then
    $f_\alpha Ag_\beta = 0.$    (6.6.8)
By Lemma 6.6.3 there must be two numbers $i$, $k$ which lie in the same row of $T_\alpha$ and
in the same column of $T_\beta$. Write $t = (i\ k)$; then $f_\alpha t = f_\alpha$, $tg_\beta = -g_\beta$, hence $f_\alpha g_\beta =
f_\alpha t^2g_\beta = -f_\alpha g_\beta$ and (6.6.9) follows. Now $x^{-1}g_\beta x$ corresponds to the tableau $T_\beta x$ and
(6.6.7) follows by applying (6.6.9) with $g'_\beta = x^{-1}g_\beta x$. Replacing $x^{-1}$ by $y$ and multiplying
by $y$ on the right, we have $f_\alpha yg_\beta = 0$, and now (6.6.8) follows by summing.
    $h_\alpha = f_\alpha g_\alpha.$    (6.6.11)
We may consider $h_\alpha$ as an operator symmetrizing the rows and antisymmetrizing the
columns; it is called the Young symmetrizer associated with the tableau $T_\alpha$.
Proposition 6.6.5. Let $h_\alpha$ be the Young symmetrizer associated with a tableau $T_\alpha$. Then
the relation
    $pa\varepsilon_qq = a$ for all $p \in P_\alpha$, $q \in Q_\alpha$    (6.6.12)
holds for $a = h_\alpha$, and any $a$ satisfying (6.6.12) must be of the form $a = \lambda h_\alpha$, where $\lambda$ is a
scalar. Moreover,
    $h_\alpha bh_\beta = 0$ for $\alpha > \beta$ and any $b \in A$,    (6.6.13)
(6.6.14)
Proof. By (6.6.4), $pf_\alpha g_\alpha = f_\alpha g_\alpha$, while (6.6.5) yields $f_\alpha g_\alpha q = \varepsilon_qf_\alpha g_\alpha$; hence (6.6.12)
holds for $a = h_\alpha$. Now let $a = \sum a(x)x$ satisfy (6.6.12). Then
    $\sum_x \varepsilon_qa(x)pxq = \sum a(x)x$ for all $p \in P_\alpha$, $q \in Q_\alpha$.    (6.6.15)
we claim that $a(x) = 0$ when $x$ is not of the form $pq$. Consider $T_\alpha$ and $T'_\alpha = T_\alpha x^{-1}$;
by Lemma 6.6.3 there are two numbers $j$, $k$ in the same row in $T_\alpha$ and in the same
column in $T'_\alpha$ (because $T'_\alpha$ is not of the form $T_\alpha pq$). Put $t = (j\ k)$; then we have
$t \in P_\alpha$, $t \in Q'_\alpha$, where $Q'_\alpha$ corresponds to $T'_\alpha$; therefore $t \in xQ_\alpha x^{-1}$ or also
$x^{-1}tx \in Q_\alpha$. In (6.6.15) let us take $p = t$, $q = x^{-1}tx$; comparing coefficients of $x$
we find $(-1)a(x) = a(x)$, hence $a(x) = 0$. Together with (6.6.16) this shows that
$a = a(1) \sum pq\,\varepsilon_q = a(1)h_\alpha$ as claimed.
  Now when $\alpha > \beta$, then by Corollary 6.6.4, $h_\alpha bh_\beta = f_\alpha g_\alpha bf_\beta g_\beta \in f_\alpha Ag_\beta = 0$. This
proves (6.6.13), and (6.6.14) follows because $h_\alpha bh_\alpha$ satisfies (6.6.12).    •
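A small computation makes the Young symmetrizer concrete. The sketch below is an illustration under stated assumptions rather than the text's own construction: it works in the group algebra of $S_3$ with elements stored as dictionaries, uses the tableau with rows (1 2), (3) (written with symbols 0, 1, 2), multiplies permutations left to right, and forms $h = fg$ from the row sum $f$ and the alternating column sum $g$. All function names are illustrative.

```python
from itertools import permutations

n = 3

def mul(p, q):
    """Product pq of permutations written as image tuples: apply p, then q."""
    return tuple(q[p[i]] for i in range(n))

def sign(p):
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def stabilizer(blocks):
    """All permutations mapping every block (a row or a column) onto itself."""
    return [p for p in permutations(range(n))
            if all({p[i] for i in block} == set(block) for block in blocks)]

def product(a, b):
    """Multiply two group-algebra elements stored as {permutation: coefficient}."""
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            pq = mul(p, q)
            out[pq] = out.get(pq, 0) + x * y
    return {p: c for p, c in out.items() if c != 0}

# Tableau with rows (1 2), (3), written with symbols 0, 1, 2.
rows, columns = [(0, 1), (2,)], [(0, 2), (1,)]
P, Q = stabilizer(rows), stabilizer(columns)
f = {p: 1 for p in P}            # sum of the row permutations
g = {q: sign(q) for q in Q}      # alternating sum of the column permutations
h = product(f, g)

# Relation (6.6.12): p h eps_q q = h for all row permutations p, column permutations q.
for p in P:
    for q in Q:
        assert product(product({p: 1}, h), {q: sign(q)}) == h

h_squared = product(h, h)
print(h_squared == {x: 3 * c for x, c in h.items()})
```

In this example $h^2 = 3h$, and the scalar 3 equals $3!/2$, where 2 is the degree of the corresponding representation of $S_3$.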
Theorem 6.6.6. With each Young diagram $\alpha$ let us associate one definite Young tableau
$T_\alpha$ and construct the corresponding Young symmetrizer $h_\alpha$ as an element of the group
algebra $A = \mathbf{Q}S_n$:
Then the $I_\alpha = h_\alpha A$ are simple submodules for the right regular representation of $S_n$; they
are pairwise non-isomorphic and afford a complete system of irreducible representations
for $S_n$.
Proof. We first show that $I_\alpha = h_\alpha A$ is a minimal right ideal of $A$. Given a right ideal
$\mathfrak{m} \subseteq I_\alpha$, we have $\mathfrak{m}h_\alpha \subseteq I_\alpha h_\alpha \subseteq \mathbf{Q}h_\alpha$. We distinguish two cases: (i) $\mathfrak{m}h_\alpha = \mathbf{Q}h_\alpha$, then
$I_\alpha = h_\alpha A = \mathfrak{m}h_\alpha A \subseteq \mathfrak{m}$, hence $I_\alpha = \mathfrak{m}$; (ii) $\mathfrak{m}h_\alpha = 0$, then $\mathfrak{m}^2 \subseteq \mathfrak{m}I_\alpha = \mathfrak{m}h_\alpha A = 0$,
so $\mathfrak{m}^2 = 0$ and therefore $\mathfrak{m} = 0$. It follows that $I_\alpha$ is minimal and the representation
induced by the regular representation is irreducible.
   We next show that $I_\alpha$, $I_\beta$ are not isomorphic for $\alpha \ne \beta$. If $\alpha > \beta$, say, then by
Corollary 6.6.4, $h_\alpha Ah_\beta = 0$, hence $I_\alpha h_\beta = 0$, but $I_\alpha h_\alpha \ne 0$, because $h_\alpha \ne 0$, and
the conclusion follows. Now the number of distinct diagrams is the number of partitions
of $n$, which is just the number of conjugacy classes of $S_n$, as we have seen; hence
by Theorem 6.4.5, we have a complete set of irreducible representations.    •
(6.6.17)
On the other hand, for $a \in I_\alpha$, $a\lambda(h_\alpha) = h_\alpha a = \sum h_\alpha(uv^{-1})a(v) \cdot u$, hence the matrix
of $\lambda(h_\alpha)$ (in the natural basis) is given by
    $\mu_\alpha = n!/d_\alpha.$
Recalling Proposition 6.4.4, we find that the corresponding character is given by
(6.6.19)
From this formula it is possible to calculate $\chi_\alpha$ explicitly in terms of the partitions
$\alpha$ of $n$; we shall give the result here without proof and refer to Weyl (1939),
Chapter VII or James and Kerber (1981), Chapter 2 for details.
  Let $x \in S_n$ have the cycle structure $\beta = (1^{\beta_1}2^{\beta_2} \cdots t^{\beta_t})$; since the value of a
character $\chi$ at $x$ depends only on $\beta$, we shall write it as $\chi(\beta)$. Let $x_1, \ldots, x_h$ be any
variables and denote the Vandermonde determinant formed from them by writing
down the terms on the main diagonal:
    $|x_1^{h-1}\ x_2^{h-2}\ \cdots\ x_h^0|.$
This is an alternating function of the $x$'s, briefly denoted by $\Delta$ in what follows. More
generally, for any positive integers $\alpha_1, \ldots, \alpha_h$ the function
    $|x_1^{\alpha_1+h-1}\ x_2^{\alpha_2+h-2}\ \cdots\ x_h^{\alpha_h}|,$
formed in the same way, is an alternating function of the $x$'s, zero unless all the $\alpha$'s
are in descending order, and it may be written as $S(\alpha_1, \ldots, \alpha_h)\Delta$, where $S$ as a
quotient of two alternating functions is symmetric in the $x$'s. Such a function $S$ is
called a bialternant or S-function.
   Given $\alpha = (\alpha_1, \ldots, \alpha_h) \vdash n$, let us write $r_1 = \alpha_1 + (h - 1)$, $r_2 = \alpha_2 +
(h - 2)$, ..., $r_h = \alpha_h$. We define the power sums in the $x$'s as $s_i = \sum_j x_j^i$ and for a
given $\beta = (\beta_1, \ldots, \beta_t)$ satisfying $\sum i\beta_i = n$ put $\sigma(\beta) = s_1^{\beta_1}s_2^{\beta_2} \cdots s_t^{\beta_t}$. With these
notations we have the following relation for the characters of $S_n$ corresponding to
the partitions into at most $h$ parts:
    $d_\alpha = n!\,\dfrac{\prod_{i>j}(r_j - r_i)}{r_1!\,r_2! \cdots r_h!}.$
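The degree formula in terms of $r_i = \alpha_i + (h - i)$ can be evaluated mechanically. The following sketch, added as an illustration, lists the partitions of 4, computes each degree as $n!$ times the product of the differences $r_i - r_j$ ($i < j$) divided by $r_1! \cdots r_h!$, and checks that the squares of the degrees add up to $4! = 24$; the function names are illustrative.

```python
from math import factorial

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def degree(alpha):
    """Degree computed from r_i = alpha_i + (h - i), i = 1, ..., h."""
    h = len(alpha)
    n = sum(alpha)
    r = [a + (h - 1 - i) for i, a in enumerate(alpha)]   # i is 0-based here
    numerator = factorial(n)
    for i in range(h):
        for j in range(i + 1, h):
            numerator *= r[i] - r[j]
    denominator = 1
    for ri in r:
        denominator *= factorial(ri)
    return numerator // denominator

degrees = {alpha: degree(alpha) for alpha in partitions(4)}
print(degrees, sum(d * d for d in degrees.values()))
```

For $n = 4$ the partitions (4), (3,1), (2,2), (2,1,1), (1,1,1,1) give the degrees 1, 3, 2, 3, 1.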
6.7 Induced representations                                                        253
Exercises
1. Compute the characters of $S_4$, $S_5$ by the methods of this section.
2. For each partition $\alpha$ of $n$ define the conjugate partition as $\alpha'$, where $\alpha'_i$ is the
   number of parts of $\alpha$ that are $\ge i$. Describe the relation between the Young
   diagrams of $\alpha$ and $\alpha'$, and show that $\alpha'' = \alpha$.
3. Show that the character corresponding to the conjugate partition is given by
   $\chi_{\alpha'} = \varepsilon\chi_\alpha$, where $\varepsilon$ is the sign character.
4. Show that for $n \ge 4$, $S_n$ has at least two irreducible representations of degree
   $n - 1$; and show that for $n = 6$ it has four.
5. A group is called ambivalent if each element is conjugate to its inverse. Such a
   group, if non-trivial, must have even order. Show that an ambivalent group has
   a real character table.
6. Show that the symmetric group $S_n$ of any degree $n$ is ambivalent (it can be shown
   that the alternating group $\mathrm{Alt}_n$ is ambivalent precisely when $n = 1, 2, 5, 6, 10, 14$;
   see James and Kerber (1981), Chapter 1).
7. Show that any irreducible representation of $S_n$ of degree greater than 1 is faithful
   except the representation corresponding to $2^2$ for $n = 4$.
Each $Ut_\lambda$ is an $H^{t_\lambda}$-module and under the action of $G$ these terms are permuted
among themselves.
  Given $x \in G$, we can for each $\lambda = 1, \ldots, r$ find a unique $h \in H$ and $\mu$, $1 \le \mu \le r$,
such that $t_\lambda x = ht_\mu$. Then we have
this shows how the action of $x$ permutes the terms in (6.7.3), as well as acting on
each. To find the representation afforded by $U^G$, let us take a basis $u_1, \ldots, u_n$ of
$U$ and let $\rho(h) = (\rho_{ij}(h))$ ($h \in H$) be the corresponding representation of $H$. Then
the elements $u_i \otimes t_\lambda$ form a basis of $U^G$ and we have, for any $x \in G$,
where
    $\rho_{\lambda\mu} = \rho(t_\lambda xt_\mu^{-1})$ if $t_\lambda xt_\mu^{-1} \in H$, and $\rho_{\lambda\mu} = 0$ otherwise.
This is called the induced representation and is denoted by $\operatorname{ind}_H^G \rho$ or $\rho^G$. Hence if the
character of $\rho$ is $\alpha$, then $\rho^G$ has the character $\operatorname{ind}_H^G \alpha = \alpha^G$, given by
    $\alpha^G(x) = \sum_\lambda \operatorname{tr} \rho_{\lambda\lambda},$
where the sum is over all $\lambda$ such that $t_\lambda xt_\lambda^{-1} \in H$. Let us define $\alpha^\circ$ on $G$ by
    $\alpha^\circ(x) = \alpha(x)$ if $x \in H$, and $\alpha^\circ(x) = 0$ otherwise.
Bearing in mind that $\alpha(h) = \operatorname{tr}(\rho(h))$, we can rewrite the equation for $\alpha^G(x)$ as
    $\alpha^G(x) = \sum_\lambda \alpha^\circ(t_\lambda xt_\lambda^{-1}).$    (6.7.4)
    $|H| \cdot \alpha^G(x) = \sum_{h \in H} \sum_\lambda \alpha^\circ(ht_\lambda xt_\lambda^{-1}h^{-1})$
We remark that for any class function $\alpha$ on $H$ this formula defines a class function $\alpha^G$
on $G$. To illustrate (6.7.5), consider the case $H = 1$. There is only one character,
namely the trivial character 1. We see from (6.7.1) that the induced representation
is just the regular representation of $G$. More generally, let us take an arbitrary subgroup
$H$ of $G$ but take the trivial character $1_H$ on $H$. Then we obtain an induced
character
where for any $x \in G$, $n_x$ is the number of elements $t_\lambda xt_\lambda^{-1}$ ($\lambda = 1, \ldots, r$) that lie in $H$.
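   As another example take the dihedral group of order $2m$, $D_m = \mathrm{gp}\{a, b \mid a^m =
b^2 = (ab)^2 = 1\}$, with subgroup $H = \mathrm{gp}\{a\}$ of order $m$. $H$ has the representation
$a \mapsto \omega$, where $\omega^m = 1$. With the transversal $t_1 = 1$, $t_2 = b$ of $H$ in $D_m$ we have
    $\rho(a) = \begin{pmatrix} \omega & 0 \\ 0 & \omega^{-1} \end{pmatrix}, \quad \rho(b) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$
and the character $\chi(a) = \omega + \omega^{-1}$, $\chi(b) = 0$.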
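The dihedral example can be reproduced from formula (6.7.4). In the sketch below, which is an illustration and not part of the text, the value $m = 5$ is a hypothetical choice, elements of $D_m$ are encoded as pairs `(k, s)` standing for $a^kb^s$ (using $bab = a^{-1}$), and `alpha_ext` denotes the character $a^k \mapsto \omega^k$ of $H$ extended by 0 outside $H$.

```python
import cmath

m = 5                                   # hypothetical choice of m
w = cmath.exp(2j * cmath.pi / m)        # a primitive m-th root of unity

# Elements of D_m are pairs (k, s) standing for a^k b^s, with b a b = a^{-1}.
def mul(x, y):
    (k, s), (l, t) = x, y
    return ((k + (l if s == 0 else -l)) % m, (s + t) % 2)

def inv(x):
    k, s = x
    return ((-k) % m, 0) if s == 0 else (k, 1)   # each a^k b is an involution

def alpha_ext(x):
    """The character a^k -> w^k of H = gp{a}, extended by 0 outside H."""
    k, s = x
    return w**k if s == 0 else 0

transversal = [(0, 0), (0, 1)]          # t_1 = 1, t_2 = b

def induced(x):
    """Formula (6.7.4): sum of alpha_ext(t x t^{-1}) over the transversal."""
    return sum(alpha_ext(mul(mul(t, x), inv(t))) for t in transversal)

a, b = (1, 0), (0, 1)
print(induced(a), w + 1 / w, induced(b))
```

Up to rounding, the value at $a$ agrees with $\omega + \omega^{-1}$, and the value at $b$ is 0 because neither conjugate of $b$ by the transversal lies in $H$.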
  It is clear from (6.7.1) that $\operatorname{ind}_H^G$ is a covariant functor; thus
It follows that for any character $\alpha$ of $K$ (indeed for any class function),
The following important relation between induction and restriction was proved by
Georg Frobenius in 1898:
        where the subscripts denote the groups on which the scalar products are taken.
      For irreducible characters $\beta$ on $G$ and $\alpha$ on $H$ this tells us that the multiplicity of $\alpha$
in $\beta_H$ is equal to that of $\beta$ in $\alpha^G$.
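The equality of multiplicities can be observed on a small case. The following sketch is an illustration under stated assumptions: $G = \mathrm{Sym}_3$, $H$ is the stabilizer of one symbol, $\alpha$ is the trivial character of $H$, and $\chi$ is the degree-2 character of $G$ (fixed points minus 1). The induced character is computed by averaging over all of $G$, that is, as $|H|^{-1}\sum_{y \in G} \alpha(yxy^{-1})$ with $\alpha$ taken to be 0 outside $H$; the names are illustrative.

```python
from itertools import permutations
from fractions import Fraction

n = 3
G = list(permutations(range(n)))
H = [p for p in G if p[2] == 2]         # stabilizer of the last symbol, order 2

def mul(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(n))

def inv(p):
    out = [0] * n
    for i, j in enumerate(p):
        out[j] = i
    return tuple(out)

def induce(alpha):
    """alpha^G(x) = |H|^{-1} * sum over y in G of alpha(y x y^{-1}), alpha = 0 off H."""
    def alpha_G(x):
        total = sum(alpha(c) for c in (mul(mul(y, x), inv(y)) for y in G) if c in H)
        return Fraction(total, len(H))
    return alpha_G

def inner(a, b, group):
    """Scalar product of two real-valued class functions on the given group."""
    return Fraction(sum(a(x) * b(x) for x in group), len(group))

trivial_H = lambda x: 1
chi = lambda x: sum(1 for i in range(n) if x[i] == i) - 1   # degree-2 character

lhs = inner(induce(trivial_H), chi, G)   # multiplicity of chi in the induced character
rhs = inner(trivial_H, chi, H)           # multiplicity of the trivial character in chi_H
print(lhs, rhs)
```

Both scalar products equal 1: the degree-2 character occurs once in the permutation character induced from $H$, and the trivial character of $H$ occurs once in the restriction of $\chi$.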
    $= |G|^{-1}|H|^{-1} \sum \alpha^\circ(x^g)\bar\beta(x^g)$
    $= |H|^{-1} \sum_{x \in H} \alpha(x)\bar\beta(x)$
    $G = \bigcup_i Ha_iK$    (6.7.11)
into disjoint sets. It is obtained by letting the direct product $H \times K$ act on $G$ by the
rule: $x \mapsto h^{-1}xk$ for $x \in G$, $(h, k) \in H \times K$. Each orbit has the form $HaK$, for some
(6.7.13)
    $\operatorname{ind}_H^G V = \bigoplus_i W_i,$
where $W_i = \bigoplus\{Vt \mid t \in Ha_iK\}$ is a right $K$-module. Fix $a = a_i$ and put $W = W_i$. For
any $y, z \in K$ we have
    $Hay = Haz \Leftrightarrow yz^{-1} \in H^a \cap K = D,$
say. Hence if $K = \bigcup Dy$ is a coset decomposition of $K$ over $D$, then $HaK = \bigcup Hay$. Now
    $W = ((Va)_D)^K = \operatorname{ind}_D^K (Va)_D,$
by the definition of induced module. Now (6.7.12) follows by summing over $i$ and
restricting to $K$, and (6.7.13) follows by taking characters in (6.7.12).    •
Corollary 6.7.4. Let $G$ be a finite group and $H$ a normal subgroup. For any character $\alpha$
of $H$, $\operatorname{ind}_H^G \alpha$ is irreducible if and only if $\alpha$ is irreducible and different from $\alpha^a$ for all
$a \notin H$.    •
Exercises
1. Show that if $N \lhd G$ and $\alpha$ is a class function on $N$, then $\operatorname{ind}_N^G \alpha$ vanishes outside $N$.
2. Let $\alpha$ be a character of a subgroup $H$ of $G$ and define $\alpha^G$ as in (6.7.5). Show
   directly that $(\alpha^G, \chi)_G$ is a non-negative integer for every irreducible character $\chi$
   of $G$, and deduce that $\alpha^G$ is a character of $G$.
3. If $\rho^+$ denotes the contragredient of $\rho$ (see Exercise 3 of 6.1), show that
   $(\rho^G)^+ = (\rho^+)^G$.
4. Given a group $G$ with subgroup $H$ and characters $\alpha$, $\beta$ on $H$, $G$ respectively, show
   that $\operatorname{ind}_H^G(\alpha \cdot \operatorname{res}_H \beta) = (\operatorname{ind}_H^G \alpha)\beta$.
5. Given a group $G$ and subgroups $H$, $K$, show that if $\alpha$, $\beta$ are characters of $H$, $K$
   respectively, then
(6.8.1)
where $\eta_\lambda$ is given by $\rho(c_\lambda) = \eta_\lambda I$, $\rho$ being the corresponding representation. Further
we saw in Section 6.4 that both $\chi^{(\lambda)}$ and $\eta_\lambda$ are algebraic integers.
    $= d(a\eta_\lambda + b\chi^{(\lambda)}).$
Now both $\eta_\lambda$ and $\chi^{(\lambda)}$ are algebraic integers, hence so is $\chi^{(\lambda)}/d$, and the same holds
for its conjugates. Therefore the norm $N(\chi^{(\lambda)}/d)$ is an integer; as we have seen, it is
less than 1 in absolute value, so $N(\chi^{(\lambda)}/d) = 0$, and it follows that $\chi^{(\lambda)} = 0$, as
claimed.    •
   This lemma has the following important consequence. We recall that for any element
$x$ of a finite group $G$ the number of conjugates of $x$ in $G$ is the index $(G : C_x)$,
where $C_x$ is the centralizer of $x$ in $G$ (see BA, Section 2.1). In what follows we understand
by a simple group a simple non-abelian group.
                                                                                  (6.8.2)
and here the right-hand side is divisible by $p$, because $p^m = |C_\lambda|$ is the index of a
centralizer. On the left of (6.8.2) we have $d_1 = 1$, so two cases can arise. Either
there are more linear representations of $G$, in which case by Proposition 6.4.2
their number is $(G : G')$, where $G'$ is the derived group, hence $G' \ne G$ and so $G$
is not simple; or all the non-trivial representations have degree greater than 1;
then by (6.8.2) there must be a representation $\rho_i$ of degree $d_i$ prime to $p$. By
Lemma 6.8.1, either $\chi_i^{(\lambda)} = 0$ or $\rho_i(x)$ for $x \in C_\lambda$ is a scalar. In the latter case either
$\rho_i$ is not faithful or $x$ lies in the centre of $G$; each time it follows that $G$ is not simple.
This only leaves the alternative $\chi_i^{(\lambda)} = 0$. Thus for any character $\chi_j$ we have either
$\chi_j^{(\lambda)} = 0$ or $\chi_j^{(1)} \equiv 0 \pmod p$, except for $j = 1$, when $\chi_1^{(1)} = \chi_1^{(\lambda)} = 1$. By orthogonality
we have $\sum_j \chi_j^{(1)}\chi_j^{(\lambda)} = 0$, hence $1 \equiv 0 \pmod p$, which is a contradiction.    •
Theorem 6.8.3 (Burnside). Every group of order $p^\alpha q^\beta$, where $\alpha, \beta \ge 1$ and $p$, $q$ are
primes, is soluble.
Proof. Take a simple factor $H$ of order $p^aq^b$; by Theorem 6.8.2 no conjugacy class
can have prime power order, hence $a, b \ge 1$ and moreover, any conjugacy class
has order divisible by $p$ and $q$. The class equation for $H$ reads
    $p^aq^b = 1 + \sum p^{a_i}q^{b_i},$
where $a$, $b$ and all the $a_i$, $b_i$ are positive. This is a contradiction and the result
follows.    •
Lemma 6.8.4. Let $G$ be a finite group and $H$ a non-trivial subgroup such that
$H^x \cap H = 1$ for all $x \notin H$. If $\alpha$ is a class function on $H$ such that $\alpha(1) = 0$ and
$\alpha^G = \operatorname{ind}_H^G \alpha$, then $(\alpha^G)_H = \alpha$ and for any class function $\beta$ on $H$, $(\alpha^G, \beta^G)_G = (\alpha, \beta)_H$.
and this proves the first assertion. For any other class function $\beta$ on $H$, Frobenius
reciprocity gives $(\alpha^G, \beta^G)_G = ((\alpha^G)_H, \beta)_H = (\alpha, \beta)_H$, by what has been proved.    •
  The actual result of Frobenius can be stated in two equivalent forms, in terms of
permutation groups or abstractly.
Theorem 6.8.5 (Frobenius). Let $G$ be a finite transitive permutation group such that
no element $\ne 1$ has more than one fixed point. Then the elements without fixed
point, together with 1, form a normal subgroup $N$ of $G$.
  The normal subgroup N is called the Frobenius kernel. We remark that if H is the
stabilizer of a point, then
NH=G
Theorem 6.8.6 (Frobenius). Let $G$ be a finite group with a subgroup $H$ such that
    $H^x \cap H = 1$ for all $x \in G \setminus H$,    (6.8.4)
i.e. $H$ meets each of its conjugates in 1 and is its own normalizer. Then $H$ has a normal
complement in $G$.
    A group $G$ with these properties is called a Frobenius group and the subgroup $H$
with the property (6.8.4) is called a Frobenius subgroup of $G$. So Theorem 6.8.6
states that every Frobenius subgroup has a normal complement.
Proof. Let us first show the equivalence of Theorems 6.8.5 and 6.8.6. Under the hypothesis
of Theorem 6.8.5 let $H$ be the stabilizer of a point. If $H \lhd G$, then all points
have the same stabilizer. In that case $H = G$ or $H = 1$ and the conclusion follows
with $N = 1$ or $G$ respectively, so we may exclude this case. Further, the conditions
of Theorem 6.8.5 mean that any $x \in G \setminus H$ moves the point fixed by $H$ to another
point, not fixed by any element in $H^+$ (where $H^+ = H \setminus \{1\}$), so that $H^x \cap H = 1$,
while the elements moving all points make up precisely the set $G \setminus \bigcup H^x$. Conversely,
given the hypotheses of Theorem 6.8.6, we see on taking the coset representation
on $G/H$ that no element $\ne 1$ fixes more than one point and the elements moving all
points comprise the complement of the union of all the conjugates of $H$ in $G$. If they,
together with 1, form a subgroup (and this is what we have to prove), then it must be
a complement of $H$ in $G$.
   Let us denote this set, namely $(G \setminus \bigcup H^x) \cup \{1\}$, by $N$. It is clear that $N$ is a normal
set (i.e. a union of conjugacy classes), and writing $|G| = n$, $|H| = m$, $n = mr$, we
have $|N| = n - r(m - 1) = r$, so if $N$ is a subgroup, it will be a complement of $H$
because $r = (G : H)$ and clearly $H \cap N = 1$. So all we need to show is that $N$ is a
subgroup. We shall obtain it as the kernel of a certain representation.
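The set $N$ and the count $|N| = n - r(m - 1) = r$ can be examined in the smallest case. The sketch below, an illustration not taken from the text, uses $G = \mathrm{Sym}_3$ with $H$ the stabilizer of a point; it checks the hypothesis $H^x \cap H = 1$ for $x \notin H$, forms $N$ as the complement of the union of the conjugates of $H$ together with 1, and tests closure under multiplication by brute force.

```python
from itertools import permutations

n = 3
G = list(permutations(range(n)))
identity = tuple(range(n))

def mul(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(n))

def inv(p):
    out = [0] * n
    for i, j in enumerate(p):
        out[j] = i
    return tuple(out)

H = {p for p in G if p[2] == 2}                     # stabilizer of a point
conjugates = {mul(mul(inv(x), h), x) for x in G for h in H}
N = (set(G) - conjugates) | {identity}

# Hypothesis of Theorem 6.8.6: H meets H^x trivially for x outside H.
for x in G:
    if x not in H:
        Hx = {mul(mul(inv(x), h), x) for h in H}
        assert Hx & H == {identity}

closed = all(mul(p, q) in N for p in N for q in N)
m, r = len(H), len(G) // len(H)
print(sorted(N), len(N) == len(G) - r * (m - 1) == r, closed)
```

Here $N$ consists of the identity and the two 3-cycles, so it is the alternating group of order 3, and it is indeed closed under multiplication.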
  Let $\alpha$ be any class function on $H$ and put $\tilde\alpha = \alpha - \alpha(1)1_H$, where $1_H$ is the trivial
character on $H$. Then $\tilde\alpha(1) = 0$, and $\tilde\alpha^G$ is a class function on $G$. Put
    $\alpha^* = \tilde\alpha^G + \alpha(1)1_G.$    (6.8.5)
Then
Exercises
1. Let the finite group $G$ be a split extension of $N$ by a subgroup $H$. Show that $H$ is a
   Frobenius subgroup iff $H$ acts freely on $N^+ = N \setminus \{1\}$, i.e. for any $x \in N^+$,
   $h \in H^+$, we have $x^h \ne x$.
2. Verify that in a dihedral group of order $2m$, where $m$ is odd, the cyclic subgroup
   of order $m$ is a Frobenius kernel.
3. Let G be a finite group with Frobenius subgroup H. Show that under the action of
   H on the coset space G/H there is one orbit of length 1, while all the others have
   length $|H|$. Deduce that $|H|$ divides $(G : H) - 1$ and that the Frobenius kernel N
   has order prime to its index.
4. Show that if a conjugacy class C of a non-trivial group G has $p^m$ elements, where
   $m \ge 1$ and p is prime, then the set $C^{-1}C$ generates a proper normal subgroup
   of G.
5. Give a direct proof of Theorem 6.8.5 in the case where the set acted on by G has
   $p^m$ elements, where $m \ge 1$ and p is prime.
6. Show that every normal Hall subgroup (i.e. of order prime to its index, see
   Section 3.2) is characteristic in G (i.e. admitting all automorphisms of G).
10. Let G be a finite group. Show that if every irreducible representation of G over C
    is 1-dimensional, then G is abelian. (Hint. Diagonalize a matrix of the regular
    representation.)
11. (J. A. Green) Show that if V is a simple G-module over a finite field F,
    then $E = \mathrm{End}_G(V)$ is a finite field. Further, if $|F| = q$, then V ® V ® ... ® V
    (n factors) has exactly $(q^n - 1)/(q - 1)$ simple submodules.
12. Show that the affine group $A = \mathrm{Aff}_1(\mathbf{F}_q)$ of all transformations $x \mapsto ax + b$
    ($a, b \in \mathbf{F}_q$, $a \neq 0$) is a Frobenius group with the translation group as kernel and
    the stabilizer of 0 as complement.
13. Let G be a finite group and $X = (\chi_i^{(\lambda)})$ its character table. Show that any
    automorphism $\alpha$ of G induces a permutation of the rows of X, and a permutation of
    the columns, where the effect of $\alpha$ on $\chi_i$ is defined by $\chi_i^\alpha(x) = \chi_i(x^\alpha)$. If $P(\alpha)$,
    $Q(\alpha)$ are the permutation matrices describing the effects of $\alpha$ on the rows and
    columns of X respectively, then $X^\alpha = P(\alpha)X = XQ(\alpha)$. Deduce that $P(\alpha)$,
    $Q(\alpha)$ are conjugate and hence have the same trace. Hence show that for any
    group A of automorphisms acting on G the number of orbits of the set of
    irreducible characters is the same as the number of orbits of the conjugacy classes
    of G under the action of A.
14. Let G be a Frobenius group with kernel K and complement H. Show that for any
    non-trivial irreducible character $\alpha$ of K, $\mathrm{ind}_K^G\,\alpha$ is an irreducible character of G.
    (Hint. Consider the group of automorphisms of K induced by H; show that
    $c \in H^+$ fixes no conjugacy class $\neq \{1\}$ of K, and use Exercise 13 to show that
    for any non-trivial character $\chi$ of K, $\chi^c \neq \chi$; now apply reciprocity to show
    that $\mathrm{ind}_K^G\,\alpha$ is irreducible.)
15. Let G, H, K be as in Exercise 14. Show that any irreducible character $\chi$ of G is
    either trivial on K or induced up from an irreducible character $\psi$ of K.
    Deduce that for such $\chi$, $\psi$ and any $c \in H$, $\mathrm{res}_H^G\,\chi(c) = \delta_{c1}\psi(1)\cdot|H|$.
16. Let H be a subgroup of $S_n$ and for $h \in H$ denote by $F_h$ the set of numbers left
    fixed by h. Show that $\varphi(h) = |F_h| - 1$ is a character of H. (Hint. Let $S_n$ act by
    permutations on a basis $u_1, \dots, u_n$ of an n-dimensional vector space V and
    take a decomposition of V including the subspace spanned by $\sum u_i$.)
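
The claim of Exercise 12 can be checked by brute force in a small case. The following Python sketch is only an illustration, not a proof. It assumes that q = p is prime, so that $\mathbf{F}_p$ can be modelled by the integers mod p, and the function names are hypothetical. It verifies that no non-identity element of the affine group fixes more than one point and that the fixed-point-free elements, together with 1, are exactly the translations.

```python
def affine_group(p):
    """All maps x -> a*x + b on Z/p with a != 0, encoded as pairs (a, b)."""
    return [(a, b) for a in range(1, p) for b in range(p)]


def fixed_points(g, p):
    """Points of Z/p fixed by the affine map g = (a, b)."""
    a, b = g
    return [x for x in range(p) if (a * x + b) % p == x]


def check_frobenius(p):
    """Return two booleans for the affine group over Z/p (p assumed prime):
    (1) every non-identity element fixes at most one point;
    (2) the fixed-point-free elements plus the identity are the translations.
    """
    group = affine_group(p)
    identity = (1, 0)
    at_most_one = all(len(fixed_points(g, p)) <= 1
                      for g in group if g != identity)
    kernel = {g for g in group if g == identity or not fixed_points(g, p)}
    translations = {(1, b) for b in range(p)}
    return at_most_one, kernel == translations


print(check_frobenius(5))
print(check_frobenius(7))
```

For p = 5 the group has 20 elements, the candidate kernel has 5 and the stabilizer of 0 has 4, in agreement with the count |N| = n - r(m - 1) = r used in the proof of Theorem 6.8.6.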
Noetherian rings and polynomial identities
The Artinian condition on rings leads to a very satisfactory theory, at least in the
semisimple case, yet it excludes such familiar examples as the ring of integers.
This ring is included in the wider class of Noetherian rings, which has been much
studied in recent years. We shall present some of the highlights, such as localization
(Section 7.1), non-commutative principal ideal domains (Section 7.2) and Goldie's
theorem (Section 7.4), and illustrate the theory by examples from skew polynomial
rings and power series rings in Section 7.3.
   Another condition which helps to make rings amenable is the existence of a poly-
nomial identity. The topics treated include generic matrix rings and central polyno-
mials (Section 7.7) and the theorems of Regev (Section 7.6) and Amitsur (Section
7.8), as well as some generalities in Section 7.5, while Kaplansky's basic theorem
on PI-rings is reserved for Chapter 8.
Theorem 7.1.1. Let R be any ring and S a subset of R. Then there exists a ring $R_S$ and a
homomorphism $\lambda : R \to R_S$ which is universal S-inverting.
Proof. In detail the assertion means that $\lambda$ is S-inverting and every S-inverting
homomorphism can be factored uniquely by $\lambda$. To construct $R_S$ we take for each
$a \in R$ a symbol $p_a$ and for each $s \in S$ an additional symbol $q_s$ and form the ring
$R_S$ on all these symbols as generators, with the defining relations
    $p_{a+b} = p_a + p_b, \quad p_{ab} = p_ap_b, \quad p_1 = 1, \quad p_sq_s = q_sp_s = 1 \quad (a, b \in R,\ s \in S).$    (7.1.1)
The first three sets of equations ensure that the mapping $\lambda : a \mapsto p_a$ is a
homomorphism of R into $R_S$, while the fourth shows that $\lambda$ is S-inverting. Now given
any S-inverting homomorphism $f : R \to R'$, we can define a homomorphism g of
the free ring F on the p's and q's into R' by the rules $p_a \mapsto af$, $q_s \mapsto (sf)^{-1}$. It is
clear that g preserves the relations (7.1.1); hence it induces a homomorphism
$g_1 : R_S \to R'$, and it is easily checked that $\lambda g_1 = f$. Moreover, $g_1$ is uniquely
determined by this equation, since its value on $p_a$ is given, while its value on $q_s$ is
determined by (7.1.1): if $q_sg_1 = c$, then $sf \cdot c = (p_sq_s)g_1 = 1 = (q_sp_s)g_1 = c \cdot sf$,
so $c = (sf)^{-1}$. Thus $g_1$ is determined on a generating set of $R_S$ and hence is
unique.                                                                                 •
Corollary 7.1.2. Let R, R' be rings with subsets S, S' respectively. Then any
homomorphism $f : R \to R'$ which maps S into S' can be extended in a unique way to a
homomorphism of $R_S$ into $R'_{S'}$.
Proof. The composition $R \xrightarrow{f} R' \xrightarrow{\lambda'} R'_{S'}$ is S-inverting and so can be factored
uniquely by $\lambda : R \to R_S$ to give the required homomorphism $f_1 : R_S \to R'_{S'}$. •
and this provides a clue to the condition needed. Of course, if every element is to be
expressed as a fraction with denominator in S, we must also assume S to be multi-
plicative, i.e. to contain 1 and be closed under products.
Condition O.1 is called the right Ore condition. A multiplicative subset S of R satisfying
O.1 is called a right Ore set; if S also satisfies O.2, it is called (right) reversible or
also a right denominator set. Such a set allows the construction of right fractions
$a/s = (a\lambda)(s\lambda)^{-1}$, by Theorem 7.1.3. They must be carefully distinguished from
left fractions $(s\lambda)^{-1}(a\lambda)$. By symmetry we have the notion of a (reversible) left
Ore set, which allows us to construct all the elements of $R_S$ as left fractions, and
the set S in Theorem 7.1.3 may well be a right but not left Ore set.
Proof. The proof of Theorem 7.1.3 is similar to the commutative case (BA, Theorem
10.3.1), though more care is needed, owing to the lack of commutativity. Guided by
(7.1.3), we define a relation on $R \times S$ by writing
    $(a, s) \sim (a', s') \Leftrightarrow$ there exist $u, u' \in R$ such that $au = a'u'$, $su = s'u' \in S$.
We claim that this is an equivalence. Clearly it is reflexive and symmetric. To prove
transitivity, let $(a, s) \sim (a', s')$, $(a', s') \sim (a'', s'')$; say $au = a'u'$, $su = s'u' \in S$,
$a'v = a''v'$, $s'v = s''v' \in S$. By O.1 there exist $z \in S$, $z' \in R$ such that $s'u'z = s'vz'$,
hence $s'u'z \in S$ (by multiplicativity) and moreover, $s'(u'z - vz') = 0$, hence by
O.2 there exists $t \in S$ such that $u'zt = vz't$. Now we have $auzt = a'u'zt = a'vz't = a''v'z't$,
$suzt = s'u'zt = s'vz't = s''v'z't$, and this lies in S because $s'u'z \in S$
and $t \in S$. Thus $(a, s) \sim (a'', s'')$.
   We thus have an equivalence on $R \times S$; let us write a/s for the equivalence class
containing (a, s) and call a the numerator and s the denominator of this expression.
We note that (7.1.3) now holds by definition, and it may be interpreted as saying that
two fractions are equal iff when they are brought to a common denominator, their
numerators agree. Of course it follows from O.1 that any two expressions can be
brought to a common denominator. For this reason we can define the addition of
fractions by the rule
                                     a/s + b/s = (a + b)/s.                                (7.1.5)
Here it is necessary to check that the expression on the right depends only on a/s, b/s
and not on a, b, s, a task which may be left to the reader. To define the product of a/s
and b/t we determine $b_1 \in R$ and $s_1 \in S$ such that $bs_1 = sb_1$ and then put
                                      $(a/s)(b/t) = ab_1/ts_1.$
Again the proof that this is well-defined is left to the reader. Now it is easy to check
that with these operations the classes a/s form a ring T say, and the mapping $a \mapsto a/1$
is an S-inverting homomorphism from R to T. Moreover, if $f : R \to R'$ is any
S-inverting homomorphism, then the mapping $h : R \times S \to R'$ given by
$(a, s) \mapsto (af)(sf)^{-1}$ is constant on each equivalence class and so can be factored via T
to provide a homomorphism $f' : T \to R'$ such that $f = \lambda f'$. Here f' is unique,
because it is determined on a/1 and 1/s, so by uniqueness T is indeed the universal
S-inverting ring. Finally, ker $\lambda$ consists of all a such that a/1 = 0/1, i.e. by (7.1.3), all a such that
at = 0 for some $t \in S$.                                                                    •
 An important case is that where S lies in the centre of R. Then O.1-O.2 are automatic
and we have
Corollary 7.1.4. Let R be a ring and S any multiplicative subset of the centre of R. Then
S is a reversible Ore set and the universal S-inverting ring $R_S$ consists of all fractions a/s
($a \in R$, $s \in S$), where
                      $a/s = a'/s' \Leftrightarrow (as' - sa')t = 0$ for some $t \in S$.                  •
The conditions of Theorem 7.1.3 simplify slightly when S consists entirely of regular
elements. Then O.2 is superfluous and ker $\lambda$ = 0. We state this as
Corollary 7.1.5. Let R be a ring and S a right Ore subset of regular elements. Then the
natural homomorphism $\lambda : R \to R_S$ is injective.                                        •
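
As a concrete check of the equality criterion in Corollary 7.1.4, the following Python sketch enumerates the fractions of a small commutative ring. The choice R = Z/6 and S = {1, 2, 4} is an assumption made purely for illustration, and the function name is hypothetical. Since S contains the zero-divisor 2, the natural map is not injective here, in contrast with the situation of Corollary 7.1.5.

```python
def localize(n, S):
    """Fractions a/s over Z/n with s in the multiplicative set S.

    Two pairs are identified when (a*s' - s*a')*t == 0 (mod n) for some t in S,
    which is the criterion for central S. Returns the list of equivalence
    classes and the kernel of a -> a/1.
    """
    def equal(p, q):
        (a, s), (b, u) = p, q
        return any(((a * u - s * b) * t) % n == 0 for t in S)

    classes = []
    for pair in ((a, s) for a in range(n) for s in S):
        for cls in classes:
            if equal(pair, cls[0]):   # the relation is an equivalence
                cls.append(pair)
                break
        else:
            classes.append([pair])
    kernel = [a for a in range(n) if any((a * t) % n == 0 for t in S)]
    return classes, kernel


classes, kernel = localize(6, [1, 2, 4])
print(len(classes), kernel)
```

The 18 pairs fall into 3 classes, so the ring of fractions has 3 elements, and the kernel consists of the elements annihilated by some member of S.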
   The subset T of all regular elements in R is always multiplicative and satisfies O.2.
When it satisfies O.1, we can form $R_T$; this is called the total (or classical) quotient
ring. Generally one understands by a quotient ring a ring in which every regular
element is a unit.
   Finally we note the special case of integral domains.
Then $R^\times$ is a regular Ore set, $K = R_{R^\times}$ is a skew field and the natural mapping
$\lambda : R \to K$ is an embedding. Conversely, if R is an integral domain with an embedding
in a skew field whose elements all have the form $ab^{-1}$ ($a, b \in R$, $b \neq 0$), then (7.1.6)
holds.                                                                                        •
   The skew field K occurring here is called the field of fractions of R. The special case
(7.1.6) of O.1 was used by Ore [1931] in his proof of Corollary 7.1.6; since then there
have been many papers generalizing Ore's construction to the case of Theorem 7.1.3
or a special case. An integral domain satisfying (7.1.6) is called a right Ore domain;
left Ore domains are defined similarly and an Ore domain is a domain which is left
and right Ore.
   The following property of fractions is often useful.
Proposition 7.1.7. Let R be a ring with a reversible right Ore set S and universal
S-inverting ring Rs. Then any finite set in Rs can be brought to a common denominator.
Proof. We shall use induction on the number of elements. Let $a_i/s_i \in R_S$
($i = 1, \dots, n$) be given. For n = 1 there is nothing to prove, so by induction we may
assume the fractions to be in the form $a_1/s_1, a_2/s, \dots, a_n/s$. By O.1 there exist $t \in S$,
$c \in R$ such that $sc = s_1t = u \in S$, hence the fractions can be written $a_1t/u$,
$a_2c/u, \dots, a_nc/u$.                                                                   •
and $f_1, f_2$ are not both zero, for otherwise $\alpha = 0$ and f would be the zero polynomial.
Suppose that $f_1 \neq 0$; then $f_1(a, b) \neq 0$, by the minimality of deg f, but f(a, b) = 0,
hence f(a, b)b = 0 and so
and this is non-zero because the left-hand side is non-zero. This contradicts the fact
that aR n bR = 0, and it proves
   We observe that the last two possibilities are not mutually exclusive; an Ore
domain may well contain a free algebra. We also recall from BA, Further Exercise 9
of Chapter 6, that an algebra containing a free algebra of rank 2 contains a free
algebra of countable rank.
   An interesting observation made by Alfred Goldie [1958] (actually the simplest
case of Proposition 7.4.8 below) is that the Ore condition is a consequence of the
Noetherian condition.
Proposition 7.1.9. An integral domain either is a right Ore domain or it contains free
right ideals of infinite rank. In particular, any right Noetherian domain is right Ore.
Proof. Let R be an integral domain and suppose that R is not right Ore. Then there
exist $a, b \in R^\times$ such that $aR \cap bR = 0$; now the conclusion will follow if we show
that the elements $b, ab, a^2b, \dots$ are right linearly independent over R. Suppose
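that $\sum a^ibc_i = 0$ and let $c_\nu$ be the first non-zero coefficient. We can cancel $a^\nu$ and
obtain the relation $bc_\nu + abc_{\nu+1} + \dots + a^{n-\nu}bc_n = 0$, i.e.
    $bc_\nu = -a(bc_{\nu+1} + \dots + a^{n-\nu-1}bc_n) \in aR \cap bR = 0.$
But $bc_\nu \neq 0$, a contradiction; hence the elements $a^ib$ are right linearly independent.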
Proposition 7.1.10. Let R be a prime ring and S a right Ore set consisting of regular
elements of R. Then R is embedded in $R_S$ and $R_S$ is again prime.
Proof. The mapping $R \to R_S$ is an embedding, by Corollary 7.1.5. Suppose that
uRv = 0, where $u = as^{-1}$, $v = bt^{-1}$ ($a, b \in R$, $s, t \in S$). Then for any $x \in R$,
$axb = as^{-1}\cdot sx\cdot bt^{-1}\cdot t = 0$, hence a = 0 or b = 0 and accordingly u = 0 or v = 0.        •
Corollary 7.1.11. Let R be a prime ring with centre C. Then $C^\times$ consists of regular
elements in R; in particular, C is an integral domain, and if its field of fractions is denoted
by K, then R is embedded in $R_{C^\times} \cong R \otimes_C K$, and the latter ring is again prime, with
centre K.
Proof. We first show that $C^\times$ consists of regular elements. If $c \in C^\times$ and ca = 0
($a \in R$), then for all $x \in R$, cxa = xca = 0, hence a = 0, because R is prime and $c \neq 0$.
Since ac = ca for $a \in R$, this
shows c to be regular. Since K is universal $C^\times$-inverting, it follows that $R \otimes K$ is
universal $C^\times$-inverting and by Proposition 7.1.10, the mapping $R \to R_{C^\times} \cong R \otimes K$
is an embedding. Clearly K is contained in the centre of $R \otimes K$; conversely, if $ac^{-1}$
is in the centre, then for any $x \in R$, $ax = ac^{-1}\cdot cx = cx\cdot ac^{-1} = xa$, because $c \in C$.
Hence $a \in C$ and so $ac^{-1} \in K$.                                                       •
Exercises
 1. Show that every right Artinian ring is its own quotient ring.
 2. Show that an integral domain in which every finitely generated right ideal is
    principal is a right Ore domain (such a ring is called a right Bezout domain).
 3. Show that in a right Noetherian ring every right Ore set is reversible.
 4. Show that for any reversible right Ore set S in a ring R, the ring $R_S$ is flat as left
    R-module.
 5. Show that for any ring R and subset S the natural mapping $R \to R_S$ is an
    epimorphism in the category of rings.
 6. Show that the characteristic can be defined for simple rings as for integral
    domains, and that a simple ring is a Q-algebra or an Fp-algebra according as
    the characteristic is 0 or p.
 7. Let R be a simple ring and S a reversible right Ore subset. Show that the centre of
    $R_S$ coincides with the centre of R.
7.2 Principal ideal domains
Theorem 7.2.1. Let R be a principal ideal domain and $A \in {}^mR^n$. Then there exist
$P \in GL_m(R)$, $Q \in GL_n(R)$ such that for some integer $r \le \min(m, n)$,
Proof. The aim will be to reduce A to the required form by a number of invertible
operations on the rows and columns. In the first place there are the elementary
operations; for the columns they are
(i) interchange two columns,
(ii) multiply a column on the right by a unit factor,
(iii) add a right multiple of one column to another.
For a Euclidean domain these operations are enough, but in the general case another
operation is needed; this is best illustrated by reducing a 1 x 2 matrix. We have a
matrix (a b) and need an invertible 2 x 2 matrix Q such that
                                      (a b)Q = (k 0).                                 (7.2.2)
Clearly k, the highest common left factor (HCLF) of a and b, will be a generator for
the right ideal generated by a and b. We may exclude the case k = 0, for then
a = b = 0 and there is nothing to prove. Thus we have aR + bR = kR, say $a = ka_1$,
$b = kb_1$, and there exist $c_1', d_1'$ such that $ka_1d_1' - kb_1c_1' = k$, hence $a_1d_1' - b_1c_1' = 1$.
By hypothesis $Rc_1' \cap Rd_1'$ is principal, with generator $d_1c_1' = c_1d_1'$ and $c_1, d_1$ have no
common left factor, so there exist $a_1', b_1' \in R$ such that $d_1a_1' - c_1b_1' = 1$. Thus we have
    $\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} \begin{pmatrix} d_1' & -b_1' \\ -c_1' & a_1' \end{pmatrix} = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}.$
This shows that the first matrix on the left, C say, is invertible. It follows that
$(k\ 0)C = (a\ b)$ and $(a\ b)C^{-1} = (k\ 0)$, so $C^{-1}$ is the required matrix. Thus we
have a fourth operation
(iv) multiply two columns on the right by an invertible 2 x 2 matrix.
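
In the commutative Euclidean case the matrix Q of (7.2.2) can be written down directly from Bezout coefficients. The following Python sketch does this over the integers; it is an illustration under that assumption only (the general argument above works in any principal ideal domain), and the function names are hypothetical.

```python
def ext_gcd(a, b):
    """Return (k, x, y) with a*x + b*y == k, where k = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # normalise the sign of the gcd
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def reduce_row(a, b):
    """Return (k, Q) with (a b) Q = (k 0) and Q an integer matrix of determinant 1."""
    k, x, y = ext_gcd(a, b)
    if k == 0:                         # a = b = 0: nothing to do
        return 0, [[1, 0], [0, 1]]
    a1, b1 = a // k, b // k            # a = k*a1, b = k*b1
    # First column gives a*x + b*y = k, second gives -a*b1 + b*a1 = 0.
    return k, [[x, -b1], [y, a1]]


k, Q = reduce_row(12, 18)
print(k, Q)
```

The determinant of Q is $xa_1 + yb_1 = (ax + by)/k = 1$, so Q is invertible over the integers, as operation (iv) requires.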
We can now proceed with the reduction. If A = 0, there is nothing to prove; otherwise
we bring a non-zero element to the (1, 1)-position in A, by permuting rows and
permuting columns, using (i). Next we use (iv) to replace $a_{11}$ successively by the
HCLF of $a_{11}$ and $a_{12}$, then by the HCLF of the new $a_{11}$ and $a_{13}$, and so on. After
n - 1 steps we have transformed A to a form where $a_{12} = a_{13} = \dots = a_{1n} = 0$. By
symmetry the same process can be applied to the first column of A; in the course
of the reduction the first row of A may again become non-zero, but this can
happen only if the length (i.e. the number of factors) of $a_{11}$ is reduced; therefore
by induction on the length of $a_{11}$ we reach the form $a_{11} \oplus A_1$, where $A_1$ is
$(m - 1) \times (n - 1)$. We now apply the same process to $A_1$ and by induction on
max(m, n) reach the form
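                        $\mathrm{diag}(a_1, a_2, \dots, a_r, 0, \dots, 0).$
For any $d \in R$ we have
    $\begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix} = \begin{pmatrix} a_1 & da_2 \\ 0 & a_2 \end{pmatrix};$
now we can further diminish the length of $a_1$ unless $a_1$ is a left factor of $da_2$ for
all $d \in R$, i.e. unless $a_1R \supseteq Ra_2$. But in that case $a_1R \supseteq Ra_2R \supseteq Ra_2$; thus $a_1 \mid c \mid a_2$,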
where c is the invariant generator of the ideal $Ra_2R$. Hence $a_1 \| a_2$, and by repeating
the argument we obtain the expression on the right of (7.2.1). The totality of column
operations amount to right multiplication by an invertible matrix Q, while the row
operations yield P and we thus have the equation (7.2.1).                          •
   We remark that most of the PIDs we encounter will be integral domains with a
norm function satisfying the Euclidean algorithm, i.e. Euclidean domains (possibly
non-commutative). In that case we can instead of (iv) use the Euclidean algorithm,
with an induction on the norm instead of the length, to accomplish the reduction in
Theorem 7.2.1 (cf. the proof of Theorem 3.5.1). Further we can dispense with (ii), so
P, Q can in this case be taken to be products of elementary matrices.
   In the case of simple rings Theorem 7.2.1 still simplifies, since then every invariant
element is a unit.
Corollary 7.2.2. If R is a simple principal ideal domain, then every matrix over R is
associated to a matrix of the form diag(l, 1, ... , 1, a, 0, 0, ... , 0) (a E R).
Proof. If $a \| b$, then either b = 0 or a is a unit. Now any unit can be transformed to 1
by applying (ii), so there can only be one diagonal element not 1 or 0.             •
Proposition 7.2.3. Let R be a principal ideal domain. Then every submodule of a free
module of finite rank n is free of rank at most n. Further, any finitely generated R-
module is a direct sum of cyclic modules
If moreover, R is simple, then every finitely generated R-module is the direct sum of a
cyclic module and a free module of finite rank.
Proof. Let F be a free right R-module of finite rank n and M a submodule of F; we shall
use induction on n. If
$\alpha_1$ denotes the projection on the first component R in F, then $\alpha_1M$ is a right ideal,
principal by hypothesis, and we have the exact sequence
        $0 \to M' \to M \to \alpha_1M \to 0.$
Here M' is a submodule of ker $\alpha_1$, while $\alpha_1M$ is free (of rank 1 or 0), hence the
sequence splits and $M \cong M' \oplus \alpha_1M$, and since M' is free of rank $\le n - 1$ by the
induction hypothesis, the first assertion follows. Now the rest is clear from Theorem
7.2.1 and Corollary 7.2.2.                                                          •
   In a skew field every non-zero element is a unit, so then Corollary 7.2.2 yields the
well-known result that every matrix over a skew field is associated to $I_r \oplus 0$, where r
is the rank of the matrix. We can also use Theorem 7.2.1 to describe the rank of a
matrix over the rational function field K(t), defined as the field of fractions of
K[t], where K is any skew field and t an indeterminate. In K[t] we have the Euclidean
algorithm relative to the degree function, hence K[t] is a PID. We further remark
that if C is the centre of K, then for any polynomial f of degree n over K and any
$\lambda \in C$ such that $f(\lambda) = 0$, we can write $f = (t - \lambda)g$, where g has degree n - 1. By
induction it follows that f cannot have more than n zeros in C.
Lemma 7.2.4. Let K be a skew field with infinite centre C, and consider the polynomial
ring K[t], with field of fractions K(t). If A = A(t) is a matrix over K[t], then the rank
of A over K(t) is the supremum of the ranks of $A(\alpha)$, $\alpha \in C$. In fact, this supremum is
assumed for all but a finite number of values of $\alpha$.
Proof. Since K[t] is a PID we can by Theorem 7.2.1 find invertible matrices P, Q
such that
                $PAQ = \mathrm{diag}(f_1, \dots, f_r, 0, \dots, 0)$, where $f_i \in K[t]$.     (7.2.3)
The product of the non-zero diagonal terms on the right gives us a polynomial f
whose zeros in C are the only points of C at which A = A(t) falls short of its maximum
rank, and the number of these values cannot exceed the degree of f.          •
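
The behaviour described in Lemma 7.2.4 can be observed numerically in the commutative case K = C = Q. The Python sketch below is an illustration only: the matrix A(t) is a made-up example, the sample points are an arbitrary finite range, and sampling of course proves nothing about the remaining values.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a rational matrix, by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def A(t):
    """Hypothetical example: the first row is t times the second row."""
    return [[t, t * t, 0],
            [1, t, 0],
            [0, 0, t * (t - 1)]]


ranks = {t: rank(A(t)) for t in range(-3, 4)}
generic = max(ranks.values())
exceptional = sorted(t for t, r in ranks.items() if r < generic)
print(generic, exceptional)
```

Here the rank over Q(t) is 2, and among the sampled values it drops only at the zeros t = 0, 1 of the diagonal entry t(t - 1).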
Exercises
1. Show that over a PID any submodule of a free module (even of infinite rank) is
   free.
2. Give the details of the proof that for a Euclidean domain operations (i), (iii)
   suffice to accomplish the reduction to the form (7.2.1).
3. Let K be a skew field with centre C. Show that a polynomial over C may well have
   infinitely many zeros in K. How is this to be reconciled with the remark before
   Lemma 7.2.4?
4. Let R be a PID and M a right R-module. An element x of M is called a torsion
   element if xc = 0 for some $c \in R^\times$. Verify that the set tM of all torsion elements
   of M is a submodule of M (called the torsion submodule).
5. If tM = 0, M is called torsion free. Show that any finitely generated torsion free
   module over a PID R is free and deduce that any finitely generated R-module
   splits over its torsion submodule.
6. Let R be any ring and M a right R-module. Show that for any invariant element
   c of R, Mc is a submodule. If R is a PID and M is presented by a matrix
   $\mathrm{diag}(a_1, \dots, a_r, 0, \dots, 0)$, $a_i \| a_{i+1}$, then for any invariant element c such that
   $a_i \mid c \mid a_{i+1}$ we have $M/Mc \cong R/a_1R \oplus \dots \oplus R/a_iR \oplus R/cR \oplus \dots \oplus R/cR$. Deduce
   that the $a_i$ are unique up to similarity, where a, b are similar if $R/aR \cong R/bR$.
where A is the subring of R generated by 1. This looks very different from R[x]; its
elements are not at all like the usual polynomials, but we can simplify matters by
taking the special case of those rings, whose elements can be written in the form
of polynomials. Thus for a given ring R we consider a ring P whose elements can
be written uniquely in the form
    $f = a_0 + xa_1 + \dots + x^na_n \quad (a_i \in R).$                                   (7.3.1)
As usual we define the degree of f as the highest power of x which occurs with a
non-zero coefficient:
    $d(f) = \max\{i \mid a_i \neq 0\}.$                                                    (7.3.2)
We shall characterize the ring P under the assumption that the degree has the usual
properties:
D.1 $d(f) \ge 0$ for $f \neq 0$, $d(0) = -\infty$,
D.2 $d(f - g) \le \max\{d(f), d(g)\}$,
D.3 $d(fg) = d(f) + d(g)$.
An integer-valued function d on a ring satisfying D.1-D.3 is called a degree function
(essentially this means that -d is a valuation, see Section 9.4 or BA, Chapter 9).
Leaving aside the trivial case R = 0, we see from D.3 that P is an integral domain
and moreover, for any $a \in R^\times$, ax has the degree 1, so there exist $a^\alpha, a^\delta \in R$ such
that
    $ax = xa^\alpha + a^\delta.$                                                           (7.3.3)
By the uniqueness of the form (7.3.1), the elements $a^\alpha, a^\delta$ are uniquely determined
by a, and $a^\alpha = 0$ iff a = 0. By (7.3.3) we have $(a + b)x = x(a + b)^\alpha + (a + b)^\delta$,
$ax + bx = xa^\alpha + a^\delta + xb^\alpha + b^\delta$, hence on comparing the right-hand sides we find
    $(a + b)^\alpha = a^\alpha + b^\alpha, \quad (a + b)^\delta = a^\delta + b^\delta.$          (7.3.4)
Theorem 7.3.1. Let P be a ring whose elements can be expressed uniquely as poly-
nomials in x with coefficients in a non-trivial ring R, as in (7.3.1), with a degree func-
tion defined by (7.3.2), and satisfying the commutation rule (7.3.3). Then R is an
integral domain, $\alpha$ is an injective endomorphism, $\delta$ is an $\alpha$-derivation and
$P = R[x; \alpha, \delta]$ is the skew polynomial ring in x over R, relative to $\alpha, \delta$. Conversely,
given an integral domain R with an injective endomorphism $\alpha$ and an $\alpha$-derivation
$\delta$, there exists a skew polynomial ring $R[x; \alpha, \delta]$.
Proof. It only remains to prove the converse. Consider the set $R^{\mathbf{N}}$ of all sequences
$(a_i) = (a_0, a_1, \dots)$ ($a_i \in R$), as right R-module. Besides the right multiplication by
R we have the additive group endomorphism
Hence $ax = xa^\alpha + a^\delta$ in P, and (7.3.3) holds. It follows that every element of P can
be written as a polynomial (7.3.1), and this expression is unique, for we have
In general the left and right skew polynomial rings are distinct, but when $\alpha$ is an
automorphism of R, with inverse $\beta$, say, then on replacing a by $a^\beta$ we can write
(7.3.3) as $a^\beta x = xa + a^{\beta\delta}$, i.e.
    $xa = a^\beta x - a^{\beta\delta}.$
Proposition 7.3.2. The ring $R[x; \alpha, \delta]$ is a left skew polynomial ring whenever $\alpha$ is an
automorphism of R.                                                                   •
   In particular, for a skew field K the skew polynomial ring $K[x; \alpha, \delta]$ is right
Noetherian and hence also right Ore, by Proposition 7.1.9, so we can form its
field of fractions. This is denoted by $K(x; \alpha, \delta)$; its elements are $fg^{-1}$, where f, g
are polynomials (7.3.1) with coefficients in K.
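
To make the commutation rule (7.3.3) concrete, here is a minimal Python sketch for the special case $\delta = 0$, where $ax = xa^\alpha$ and hence $(x^ia)(x^jb) = x^{i+j}a^{\alpha^j}b$. As an assumed example we take K to be the complex numbers with $\alpha$ complex conjugation; the list representation and the names `skew_mul` and `conj` are hypothetical.

```python
def skew_mul(f, g, alpha):
    """Product in R[x; alpha] with delta = 0.

    A polynomial sum x^i a_i (coefficients on the right) is stored as the
    list [a_0, a_1, ...]; alpha is the endomorphism, given as a function.
    """
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c = a
            for _ in range(j):        # moving a past x^j applies alpha j times
                c = alpha(c)
            out[i + j] += c * b
    return out


def conj(z):
    """Complex conjugation, playing the role of alpha."""
    return z.conjugate()


x = [0, 1]                 # the indeterminate x
i = [1j]                   # the constant polynomial i
print(skew_mul(i, x, conj))    # i*x, written with the coefficient on the right
print(skew_mul(x, i, conj))    # x*i
```

In this example ix = -xi, so the ring is non-commutative, while $x^2$ commutes with every constant because conjugation has order 2; the degrees of the factors add up, as D.3 requires.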
Proposition 7.3.4. Any skew polynomial ring over a right Ore domain is again right
Ore.
Proof. Let R be a right Ore domain and K its field of fractions. Since $\alpha$ is an injective
endomorphism of R, it can by Corollary 7.1.2 be extended to an endomorphism of K,
again denoted by $\alpha$. Further, $\delta$ gives rise to an $R^\times$-inverting homomorphism from R to
$K_2$, and this gives an $\alpha$-derivation of K, again written $\delta$. Now we have the inclusions
 The Hilbert basis theorem extends to skew polynomial rings relative to an auto-
morphism (for endomorphisms it need not hold, see Exercise 2).
From these formulae it is easy to show that for a field k of characteristic 0, $A_1[k]$ is
a simple ring. For if $\mathfrak{a}$ is a non-zero ideal in $A_1[k]$, pick an element $f(u, v) \neq 0$ in $\mathfrak{a}$
of least possible degree in u. Then $\partial f/\partial u = fv - vf \in \mathfrak{a}$, but this has lower degree and
so must be 0. Hence f = f(v) is a polynomial in v alone. If its v-degree is taken
minimal, then $\partial f/\partial v = uf - fu = 0$ and so $f = c \in k$. Thus $\mathfrak{a}$ contains a non-zero
element of k and so must be the whole ring, i.e. $A_1[k]$ is simple, as claimed.
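
The commutator formula $fv - vf = \partial f/\partial u$ used above can be tested in a familiar model. The Python sketch below assumes the relation uv - vu = 1 and realizes u as d/dt and v as multiplication by t on Q[t]; this realization and all function names are assumptions made for illustration. For $f = u^n$ it checks that $u^nv - vu^n$ acts in the same way as $nu^{n-1}$.

```python
from fractions import Fraction


def deriv(p):
    """u acts as d/dt; p = [c_0, c_1, ...] stands for sum c_i t^i."""
    return [Fraction(i) * c for i, c in enumerate(p)][1:]


def times_t(p):
    """v acts as multiplication by t."""
    return [Fraction(0)] + list(p)


def sub(p, q):
    """Difference p - q with trailing zero coefficients removed."""
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    r = [a - b for a, b in zip(p, q)]
    while r and r[-1] == 0:
        r.pop()
    return r


def power(op, n, p):
    """Apply the operator op to p, n times."""
    for _ in range(n):
        p = op(p)
    return p


def commutator_matches(n, p):
    """Check (u^n v - v u^n) p == n u^(n-1) p for the polynomial p."""
    lhs = sub(power(deriv, n, times_t(p)), times_t(power(deriv, n, p)))
    rhs = [n * c for c in power(deriv, n - 1, p)]
    return sub(lhs, rhs) == []


p = [Fraction(c) for c in (1, 2, 0, 5, 7)]   # 1 + 2t + 5t^3 + 7t^4
print(all(commutator_matches(n, p) for n in range(1, 5)))
```

The check relies on characteristic 0 only through the use of rational coefficients; it is a spot check on one polynomial, not a proof.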
7.3 Skew polynomial rings and Laurent series
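Taking first the case $\delta = 0$, we can form the ring $R[[x; \alpha]]$ of formal power series
over R as the set of all infinite series
    $f = a_0 + xa_1 + x^2a_2 + \cdots.$                                                    (7.3.11)
infinite sequence $(a_i) = (a_0, a_1, \dots)$, with addition $(a_i) + (b_i) = (a_i + b_i)$ and with
multiplication
    $(a_i)(b_j) = (c_k)$, where $c_k = \sum_{i+j=k} a_i^{\alpha^j}b_j$.                  (7.3.12)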
Alternatively we can regard $R[[x; \alpha]]$ as the completion of the skew polynomial ring
$R[x; \alpha]$ with respect to the powers of the ideal generated by x; these powers define a
topology called the x-adic topology. This topological viewpoint is not essential for
the construction, but it helps in understanding the situation.
   Let R be a ring and $\alpha$ an automorphism of R. The powers of x form an Ore set in
$R[[x; \alpha]]$ and by taking fractions we obtain the ring $R((x; \alpha))$ of all formal Laurent
series
    $\sum_{-r}^{\infty} x^ia_i = x^{-r}a_{-r} + \dots + x^{-1}a_{-1} + a_0 + xa_1 + x^2a_2 + \cdots.$        (7.3.13)
This is again a ring, with the same multiplication (7.3.12); here the restriction to
finitely many negative powers is necessary to ensure that the multiplication rule
(7.3.12) has a sense. This is also the reason for taking $\alpha$ to be an automorphism,
since now j may take negative values in (7.3.12).
   Let us now consider a skew polynomial ring $R[x; \alpha, \delta]$, where $\delta$ may be non-zero,
but $\alpha$ is still an automorphism, and ask whether a power series ring can be formed.
If we attempt to define the multiplication of power series by means of the
commutation formula (7.3.3), we shall find that (apart from a more complicated form
for the coefficients of the product), the product cf, where $c \in R$, cannot always be
expressed as a power series, because there will in general be contributions to the
coefficient of a given power $x^r$ from each term $cx^na_n$ ($n \ge r$) and so we may have
infinitely many such terms to consider. In terms of the x-adic topology we can
express this by saying that left multiplication by $c \in R$ is not continuous; this follows
from (7.3.3), because when $a^\delta \neq 0$, we have o(ax) < o(x).
   One way to overcome this difficulty is to introduce $y = x^{-1}$ and rewrite (7.3.3) in
terms of y. We find
With the help of this commutation formula we can multiply power series in y and
even Laurent series. We observe that in passing from x to $y = x^{-1}$ we have also
had to change the side on which the coefficients are put; of course this is immaterial
as long as $\alpha$ is an automorphism. To be precise, from any skew polynomial ring
$R[x; \alpha, \delta]$ we can form a skew power series ring in $x^{-1}$, with coefficients on the
left; in order to define Laurent series in $x^{-1}$ we need to assume that $\alpha$ is an
automorphism. We shall not pursue this point but note one result which illustrates
the usefulness of power series.
Theorem 7.3.6. Let K be a skew field and $\alpha$ an automorphism of K. Further denote the
centre of K by C and the subfield of C fixed under $\alpha$ by $C_0$. If no positive power of $\alpha$ is an
inner automorphism of K, then the centre of $K(x; \alpha)$ is $C_0$.
Proof. Every element of $K(x; \alpha)$ can be written as a Laurent series $f = \sum x^ia_i$. If this
lies in the centre, then fc = cf for all $c \in K$, i.e. $\sum x^i(a_ic - c^{\alpha^i}a_i) = 0$, hence
    $a_ic = c^{\alpha^i}a_i$ for all $c \in K$ and all i.                                  (7.3.15)
If $a_i \neq 0$ for some i > 0, then $\alpha^i$ is inner, by (7.3.15); in case i < 0, $\alpha^{-i}$ is inner, but
this contradicts the hypothesis. Hence $a_i = 0$ for $i \neq 0$ and $f = a_0 \in K$. Now (7.3.15)
reads $a_0c = ca_0$, so $a_0 \in C$, and since $xa_0 = a_0x = xa_0^\alpha$, we have $f = a_0 \in C_0$.
Conversely, every element of $C_0$ commutes with every element of $K(x; \alpha)$.                   •
  Finally we note a rationality criterion for power series, which applies also in the
skew case.
Proposition 7.3.7. Let K be a skew field with an automorphism α, and consider the
natural embedding of K(x; α) in the skew field K((x; α)) of formal Laurent series.
A given series f = Σ x^i a_i lies in K(x; α) if and only if there exist integers r, n_0 and
elements c_1, …, c_r ∈ K such that
\[ a_n = a_{n-1}^{\alpha} c_1 + a_{n-2}^{\alpha^2} c_2 + \cdots + a_{n-r}^{\alpha^r} c_r \quad \text{for all } n > n_0. \tag{7.3.16} \]
Proof. The series f lies in K(x; α) iff there is a polynomial g with constant term 1 such
that fg is a Laurent polynomial. Writing g = 1 − x c_1 − … − x^r c_r, we require that in
the product (Σ x^i a_i)(1 − Σ x^j c_j) all coefficients of powers beyond a given one, say
x^{n_0}, vanish. On equating the coefficient of x^n to 0 we just obtain (7.3.16), and the
conclusion follows.                                                                     ∎
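In the commutative case (α the identity, an assumption made only for this illustration) the criterion says that a power series is rational exactly when its coefficients eventually satisfy a linear recurrence. A minimal Python sketch, using the hypothetical helper `times_denominator`, checks this for the Fibonacci series, where r = 2 and c_1 = c_2 = 1:

```python
def times_denominator(a, c):
    """Coefficients of f*g for f = sum a[i] x^i (truncated) and
    g = 1 - c[0] x - c[1] x^2 - ...; commutative coefficients only."""
    out = []
    for n in range(len(a)):
        s = a[n]
        for j, cj in enumerate(c, start=1):
            if n - j >= 0:
                s -= a[n - j] * cj
        out.append(s)
    return out

# Fibonacci numbers: a_n = a_{n-1} + a_{n-2} for n >= 2
fib = [0, 1]
while len(fib) < 20:
    fib.append(fib[-1] + fib[-2])

# Multiply by g = 1 - x - x^2; only the coefficient of x survives.
product = times_denominator(fib, [1, 1])
```

Here every coefficient of the product beyond x^1 vanishes, so within the computed range the series agrees with x(1 − x − x^2)^{-1}; in the skew case the same computation would need the twists a_{n−j}^{α^j} of (7.3.16).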
Exercises
 1. Supply the details of the proof of Theorem 7.3.5, and point out where the fact
    that α is an automorphism is used.
 2. (Jategaonkar, Koshevoi) Let K be a skew field and α an endomorphism of K.
    Show that K[x; α] is a left Ore domain iff α is surjective (and hence an
    automorphism). Using Proposition 7.1.8, obtain an embedding of the free algebra
    K⟨x, y⟩ in a right Ore domain, and hence an embedding of K⟨x, y⟩ in a skew
    field.
 3. Find a localization of the Weyl algebra over a field of finite characteristic, which
    is a crossed product.
 4. Show that for any ring K, a Weyl K-ring A_1[K] on u, v may be defined by (7.3.9),
    where u, v commute with the elements of K. Show that if K is a simple ring of
    characteristic 0, then so is A_1[K].
 5. Show that if R is a Noetherian ring with an automorphism α, then R[x, x^{-1}; α]
    is again Noetherian.
7.4 Goldie's theorem
L.1 If M' is large in M and M'' is large in M', then M'' is large in M.
    For if 0 ≠ N ⊆ M, then N ∩ M' ≠ 0, hence N ∩ M'' = (N ∩ M') ∩ M'' ≠ 0.           ∎
L.2 Let M be a right R-module and M' a large submodule of M. For any m ∈ M, the set
    𝔠 = {x ∈ R | mx ∈ M'} is a large right ideal in R, and if m ≠ 0, then 0 ≠ m𝔠 ⊆ M'.
    Clearly 𝔠 is a right ideal; for any non-zero right ideal 𝔞 in R, either m𝔞 = 0, in
    which case 𝔞 ⊆ 𝔠, or m𝔞 ≠ 0; then m𝔞 ∩ M' ≠ 0 and for any a ∈ 𝔞 such that
    0 ≠ ma ∈ M' we have 0 ≠ aR ⊆ 𝔠 ∩ 𝔞, which shows 𝔠 to be right large. If moreover
    m ≠ 0, then taking 𝔞 = R we find a ∈ 𝔠 with ma ≠ 0, so 0 ≠ m𝔠 ⊆ M'.               ∎
L.3 For any nilpotent ideal 𝔫 of R, the left annihilator (𝔫)_l = {x ∈ R | x𝔫 = 0} is a right
    large ideal of R.
    For, given c ≠ 0, there exists s ≥ 1 such that c𝔫^{s−1} ≠ 0, c𝔫^s = 0, and
    c𝔫^{s−1} ⊆ cR ∩ (𝔫)_l.                                                            ∎
   We recall that a module M is called uniform if M ≠ 0 and every non-zero
submodule of M is large. For example, an integral domain R is right uniform (i.e.
uniform as right R-module) iff it is a right Ore domain. With the help of uniform
modules we can define a form of dependence relation which leads to a notion of
rank in general modules. Let M be an R-module (for any ring R) and denote by 𝒰
the collection of all its uniform submodules. On 𝒰 we introduce a dependence
relation as follows. If N, P_1, …, P_r ∈ 𝒰, then N is said to be dependent on P_1, …, P_r
if N ∩ Σ P_i ≠ 0. Generally N is said to be dependent on a (possibly infinite)
family of uniform submodules if it is dependent on a finite subfamily. A set of
uniform modules is independent if no member is dependent on the rest. This
dependence relation satisfies the following conditions:
D.2 (Exchange property) If N is dependent on ℱ ∪ {M'} but not on ℱ, then M' is
    dependent on ℱ ∪ {N}.
  We note that these conditions are like those listed in Section 11.1 of BA (except
that D.1' is a weaker form of D.1 listed there).
  D.0 is clear; to prove D.1' we may take the families to be finite, say N is dependent
on the independent family {P_1, …, P_r} and each P_i is dependent on {Q_1, …, Q_s}. By
hypothesis, N ∩ Σ P_i ≠ 0, so there exists n ∈ N, n ≠ 0, such that
\[ n = p_1 + \cdots + p_r, \quad \text{where } p_i \in P_i. \tag{7.4.2} \]
Writing Q = Σ Q_j, we have to show that N ∩ Q ≠ 0. If p_i ∈ Q for all i, the
conclusion follows from (7.4.2); otherwise we choose an equation (7.4.2) with n ≠ 0
such that the least number of p_i are not in Q. If p_1 ∉ Q say, then since P_1 is uniform,
p_1R ∩ (P_1 ∩ Q) ≠ 0, say 0 ≠ p_1c ∈ Q. Now in
\[ nc = p_1 c + \cdots + p_r c \]
there are fewer terms outside Q than in (7.4.2) and nc ≠ 0, because p_1c ≠ 0 and the
sum Σ P_i is direct. This contradiction shows that N ∩ Q ≠ 0, as claimed.
   The exchange axiom D.2 follows easily: if N is dependent on P_1, …, P_r but not on
P_2, …, P_r then there exists n ∈ N, n ≠ 0, such that (7.4.2) holds with p_1 ≠ 0, by
hypothesis. If we now rewrite (7.4.2) as p_1 = n − p_2 − … − p_r, we have a relation
showing P_1 to be dependent on N, P_2, …, P_r, so D.2 holds.                           ∎
Proposition 7.4.1. Let R be any ring and M an R-module which contains no infinite
direct sums of non-zero submodules. Then there is a direct sum of uniform submodules
which is large in M, so that M has a rank, and in fact the rank is then finite. Conversely,
a module of finite rank contains uniform submodules, but no infinite direct sum of
submodules.
Proof. We begin by showing that every non-zero submodule N of M contains a
uniform submodule. For if N is not itself uniform, then it contains a direct sum
N_1 ⊕ N_1'; now either N_1' is uniform or it contains a direct sum N_2 ⊕ N_2' and
continuing in this way, we obtain in M the direct sum
\[ N_1 \oplus N_2 \oplus \cdots . \]
Since M contains no infinite direct sums, this process must break off, which can
happen only when we reach an N_r' which is uniform. Hence N contains a uniform
submodule.
   Now let Σ_1^r U_i be a direct sum of uniform submodules in M; such a sum exists,
e.g. for r = 0. Either it is large in M or we can find V ≠ 0 such that V ∩ Σ U_i = 0.
By the first part of the proof V contains a uniform submodule U_{r+1} and now
Σ_1^{r+1} U_i is a direct sum of r + 1 terms. If we continue in this way, the process
must break off, because M has no infinite direct sums, and it can end only when
we have a direct sum of uniform submodules which is large in M. This shows that
M has a rank and that this rank is finite. Conversely, if rk M = n, then we know
that any direct sum contains part of a basis and so cannot have more than n
terms.                                                                                    ∎
  This result may be applied to R itself, as left or right R-module, and in this way we
obtain the notion of left rank and right rank of R. For example, an integral domain
has right rank 1 iff it is a right Ore domain. In fact, by Proposition 7.1.9 we obtain
Corollary 7.4.2. An integral domain which has finite right rank necessarily has right
rank 1.                                                                             ∎
   We remark that for a submodule M' of M we have rk M' ≤ rk M, with equality iff
M' is large in M. On the other hand, going over to a quotient module may well raise
the rank, e.g. rk Z = 1, but rk(Z/m) > 1 unless m is a prime power.
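The remark about Z/m can be checked by brute force for small m. The sketch below takes the rank to be the largest number of non-zero submodules whose sum is direct, which is how the rank arises in Proposition 7.4.1. The helper names `subgroups`, `is_direct` and `rank` are hypothetical, and the search is only feasible for small m.

```python
from itertools import combinations
from math import prod

def subgroups(m):
    """Non-zero subgroups dZ/mZ of Z/m, each as a frozenset of residues."""
    return [frozenset(range(0, m, d)) for d in range(1, m) if m % d == 0]

def is_direct(family, m):
    """A sum of finite subgroups is direct iff its size is the product of the sizes."""
    total = {0}
    for h in family:
        total = {(a + b) % m for a in total for b in h}
    return len(total) == prod(len(h) for h in family)

def rank(m):
    """Largest number of non-zero subgroups of Z/m whose sum is direct."""
    subs = subgroups(m)
    best = 0
    for k in range(1, len(subs) + 1):
        if any(is_direct(f, m) for f in combinations(subs, k)):
            best = k
    return best

ranks = {m: rank(m) for m in (8, 9, 12, 30)}
```

For the prime powers 8 and 9 the computed rank is 1, while 12 and 30 give larger values, in line with the remark above.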
   Let R be any ring and M be a right R-module. For any subset S of M we define the
right annihilator of S in R as
                                 (S)_r = {x ∈ R | Sx = 0}.
It is clear that (S)_r is a right ideal; if S is a submodule, (S)_r is even a two-sided ideal.
When S ≠ 0, (S)_r cannot contain 1 and so will be proper. If S = {a}, we write (a)_r
instead of ({a})_r. In particular, this defines right annihilators of subsets of R; the left
annihilator of a subset S of R (or more generally, of a left R-module) is defined
similarly. The ring R is said to satisfy the maximum condition on right annihilators if every
collection of right annihilators in R (of subsets of R) has a maximal member, e.g. any
right Noetherian ring satisfies the maximum condition on right annihilators.
Proof. It is clearly enough to prove the result for principal ideals, and we need only
consider left ideals, for Ra is nil iff (xa)^n = 0 for all x and suitable n = n(x). Now
(xa)^n = 0 implies (ax)^{n+1} = a(xa)^n x = 0, hence if Ra is nil, then so is aR.
   Thus let Ra be a non-zero nil left ideal and choose a maximal annihilator not equal
to R of the form (xa)_r. Writing b = xa, we choose y ∈ R; if yb ≠ 0, take ν ≥ 2 such
that (yb)^{ν−1} ≠ 0, (yb)^ν = 0. Then (b)_r ⊆ ((yb)^{ν−1})_r and by maximality we have
equality here; since yb ∈ ((yb)^{ν−1})_r, we conclude that byb = 0. This holds for all
y ∈ R, even when yb = 0. Hence bRb = 0, b ≠ 0, in contradiction to the fact that
R is semiprime. Hence every nil left (or right) ideal is 0.                         ∎
   We shall also need the notion of a singular submodule. For any ring R and any
right R-module M consider the set
                   Z(M) = {m ∈ M | (m)_r is a large right ideal of R}.
This set is a submodule of M, called the singular submodule. To verify the module
property, let u, v ∈ Z(M); then 𝔞 = (u)_r and 𝔟 = (v)_r are right large, hence so is
𝔞 ∩ 𝔟 and (u − v)(𝔞 ∩ 𝔟) = 0, therefore (u − v)_r is right large. Further, if a ∈ R,
then (ua)_r is right large, by L.2, for x ∈ (ua)_r ⇔ ax ∈ (u)_r and the latter is right
large by hypothesis. Thus Z(M) is indeed a submodule. In particular, taking
M = R, we obtain the right singular ideal Z(R) of R. By what has been shown, it
is a right ideal; in fact it is two-sided, for if (a)_r is right large, then so is (ba)_r ⊇ (a)_r.
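As a small illustration of the definition, the singular ideal of the commutative ring Z/m can be computed directly, since its ideals are the subgroups dZ/mZ with d | m. The following is a sketch with hypothetical helper names, not a general algorithm:

```python
def ideals(m):
    """All ideals of Z/m as frozensets: the subgroups dZ/mZ with d | m."""
    return [frozenset(range(0, m, d)) for d in range(1, m + 1) if m % d == 0]

def annihilator(a, m):
    """(a)_r = {x : ax = 0} in Z/m."""
    return frozenset(x for x in range(m) if (a * x) % m == 0)

def is_large(ideal, m):
    """An ideal is large if it meets every non-zero ideal non-trivially."""
    return all(len(ideal & other) > 1 for other in ideals(m) if len(other) > 1)

def singular_ideal(m):
    """Z(R) for R = Z/m: the elements whose annihilator is large."""
    return sorted(a for a in range(m) if is_large(annihilator(a, m), m))

examples = {m: singular_ideal(m) for m in (4, 6, 12)}
```

In these three cases the singular ideal is zero exactly when the ring has no non-zero nilpotent elements (m = 6).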
   Although Goldie's theorem is concerned with Noetherian rings, it applies to a
somewhat wider class, defined as follows. A ring which is of finite right rank and
satisfies the maximum condition on right annihilators is called a right Goldie ring.
In particular, every right Noetherian ring is right Goldie; of course the converse is
false, as the example of commutative integral domains shows. In a right Goldie
ring the right singular ideal is nilpotent; we shall only need the special case where
the ring is semiprime, when it is a consequence of the next result (see Exercise 9
of Section 8.5 for the general case).
Proposition 7.4.4. In a right Goldie ring R, for each a ∈ R there exists n ≥ 0 such that
a^nR + (a^n)_r is right large. Moreover, (a^ν)_r = (a^n)_r for all ν ≥ n and the sum
a^nR + (a^n)_r is direct.
Proof. The sequence (a)_r ⊆ (a^2)_r ⊆ … becomes stationary, say (a^ν)_r = (a^n)_r for
ν ≥ n. It follows that (a^n)_r ∩ a^nR = 0, for if a^n·a^n x = 0, then x ∈ (a^{2n})_r = (a^n)_r,
hence a^n x = 0. If 𝔠 is any right ideal such that 𝔠 ∩ (a^nR + (a^n)_r) = 0, then the
sum 𝔠 + a^n𝔠 + a^{2n}𝔠 + … is direct, for if a^{sn}c_s + a^{(s+1)n}c_{s+1} + … + a^{tn}c_t = 0,
where c_i ∈ 𝔠, c_s ≠ 0, then c_s ∈ (a^n)_r + a^nR, which is a contradiction. Since R has
finite rank, a^{sn}𝔠 = 0 for some s ≥ 1, i.e. 𝔠 ⊆ (a^{sn})_r = (a^n)_r and it follows that
𝔠 = 𝔠 ∩ (a^nR + (a^n)_r) = 0. This proves that a^nR + (a^n)_r is right large, and we have
seen that the sum is direct.                                                        ∎
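A concrete instance may help. The finite ring Z/12 is Noetherian and hence Goldie, and for a = 2 the chain of annihilators of powers of a can be followed by hand or by the short sketch below (the helper names are hypothetical). It finds the n at which the chain stops and forms a^nR + (a^n)_r.

```python
def right_annihilator(a, m):
    """(a)_r in Z/m."""
    return {x for x in range(m) if (a * x) % m == 0}

def principal(a, m):
    """The ideal aR in Z/m."""
    return {(a * x) % m for x in range(m)}

m, a = 12, 2
# First n at which the chain (a)_r <= (a^2)_r <= ... stops growing;
# once two consecutive terms agree, all later terms agree as well.
n = 1
while right_annihilator(pow(a, n, m), m) != right_annihilator(pow(a, n + 1, m), m):
    n += 1

image = principal(pow(a, n, m), m)            # a^n R
kernel = right_annihilator(pow(a, n, m), m)   # (a^n)_r
total = {(u + v) % m for u in image for v in kernel}
```

Here the sum is direct and is the whole ring, so it is certainly large; in general the proposition only asserts that it is large.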
Corollary 7.4.5. In a right Goldie ring R, if a is left regular, then aR is right large.
Proof. In this case (a^n)_r = 0 for all n and if a^nR is right large, then so is aR.        ∎
  The next lemma, essentially the converse of Corollary 7.4.5, will be needed for
Goldie's theorem, but is also useful elsewhere.
Lemma 7.4.7. In a semiprime right Goldie ring any large right ideal contains a regular
element.
Proof. By Proposition 7.4.3, any nil right ideal of R is 0, so a non-zero right ideal 𝔞
will contain a non-nilpotent element a. By Proposition 7.4.4 we can find a power a_1
of a such that (a_1)_r = (a_1^2)_r and so a_1R + (a_1)_r is direct. If (a_1)_r ∩ 𝔞 ≠ 0, we choose
a_2 ∈ (a_1)_r ∩ 𝔞 such that a_2 ≠ 0 and (a_2)_r = (a_2^2)_r. Then a_2R + ((a_1)_r ∩ (a_2)_r ∩ 𝔞) is a
direct sum contained in (a_1)_r ∩ 𝔞, hence the sum a_1R + a_2R + ((a_1)_r ∩ (a_2)_r ∩ 𝔞) is
direct. If (a_1)_r ∩ (a_2)_r ∩ 𝔞 ≠ 0, we can continue the process; at the n-th stage we have
a direct sum
\[ a_1R \oplus a_2R \oplus \cdots \oplus a_nR \oplus \bigl((a_1)_r \cap \cdots \cap (a_n)_r \cap \mathfrak{a}\bigr). \tag{7.4.4} \]
So far 𝔞 was any right ideal ≠ 0; if we take 𝔞 to be right large, then by (7.4.5) we find
                                                                                     (7.4.6)
  In the presence of maximum conditions the definition of a right Ore set can be
simplified a little; this is sometimes useful, although it is not actually needed here.
Proposition 7.4.8. Let R be a ring with maximum condition on right annihilators and
let S be a multiplicative subset of R such that
(i) for any a ∈ R, s ∈ S, aS ∩ sR ≠ ∅,
(ii) for any a ∈ R, s ∈ S, as = 0 ⇒ a = 0.
Then S is a right Ore set consisting of regular elements.
Proof. We need only show that for any a ∈ R, s ∈ S, sa = 0 ⇒ a = 0. By the
maximum condition the sequence
Theorem 7.4.9 (Goldie's theorem). A ring R has a total quotient ring Q which is
semisimple if and only if R is a semiprime right Goldie ring. Moreover, Q is simple
Artinian if and only if R is prime right Goldie.
Proof. Assume that R is semiprime right Goldie and let S denote the set of all regular
elements of R. We shall show that S is a right Ore set. Given a ∈ R, s ∈ S, define
                                   𝔠 = {x ∈ R | ax ∈ sR}.
By Corollary 7.4.5, sR is right large, hence by L.2, so is 𝔠, therefore it contains a
regular element (Lemma 7.4.7). This shows that sR ∩ aS ≠ ∅, so S is a right Ore set.
   Let Q = R_S be the total quotient ring. If 𝔄 is a large right ideal of Q, then 𝔄 ∩ R is
right large in R and so contains a regular element. This must be a unit in Q, hence
𝔄 = Q, i.e. Q has no proper large right ideals, and so is semisimple, by L.4.
   Conversely, let R be a ring with a semisimple quotient ring Q. We shall show that
for any right ideal 𝔠 of R the following conditions are equivalent:
(a) 𝔠 is right large in R,
(b) 𝔠Q = Q,
(c) 𝔠 contains a regular element of R.
   (a) ⇔ (b). Assume that 𝔠 is right large and let 𝔄 be any non-zero right ideal of Q.
Then 𝔄 ∩ R ≠ 0, hence 𝔄 ∩ R ∩ 𝔠 ≠ 0, and so 𝔄 ∩ 𝔠Q ≠ 0. This shows that 𝔠Q is
right large in Q, hence 𝔠Q = Q by L.4. Conversely, if 𝔠Q = Q and 𝔞 is a non-zero
right ideal in R, then 𝔞Q ∩ 𝔠Q ≠ 0, hence 𝔞 ∩ 𝔠 ≠ 0, so 𝔠 is right large in R.
   (b) ⇔ (c). If 𝔠Q = Q, then 1 = as^{-1}, where a ∈ 𝔠 and s is a regular element in R,
hence a = s is regular. Conversely, if 𝔠 contains a regular element, then clearly
𝔠Q = Q.
   We can now complete the proof of the theorem by verifying that R is semiprime
right Goldie.
   Let 𝔫 be a nilpotent ideal of R. Then (𝔫)_l is a right large ideal in R, by L.3, so it
contains a regular element and it follows that 𝔫 = 0. Thus R is semiprime.
   Next let 𝔞 = Σ 𝔞_λ be a direct sum of non-zero right ideals in R which is right
large; then 𝔞 contains a regular element c, say:
                          c = x_1 + … + x_n,       where x_i ∈ 𝔞_{λ_i}.
Now cR is right large and is contained in 𝔞_{λ_1} + … + 𝔞_{λ_n}, hence the latter sum is right
large, so the sum Σ 𝔞_λ was finite.
                                    (I)~ = (I)~   n R.
Now the maximum condition for right annihilators follows for R, because it holds
in Q. Finally, if R is prime, then so is Q, by Proposition 7.1.10. It follows that Q
is simple. Conversely, if Q is simple, then R must be prime, for if 𝔞, 𝔟 are ideals
in R such that 𝔞𝔟 = 0, then 𝔟Q𝔞 ∩ R is an ideal of R whose square is zero; since R
is semiprime, we have 𝔟Q𝔞 ∩ R = 0 and hence 𝔟Q𝔞 = 0. But Q is simple, so it
follows that 𝔞 = 0 or 𝔟 = 0, and this shows R to be prime.                        ∎
   For prime rings Theorem 7.4.9 was proved by Goldie [1958] and extended by him
to semiprime rings in 1960. The result raises the question of localizing relative to a
prime ideal. If R is a Noetherian ring with a prime ideal 𝔭, and C_𝔭 denotes the set of
all elements of R that are regular mod 𝔭, i.e. whose image in R/𝔭 is regular, then it is
not necessarily the case that C_𝔭 is a right Ore set. Here it is necessary to consider
more than one prime (a so-called 'clique' of prime ideals), and to localize at the
set of elements that are regular modulo all the prime ideals considered. For a detailed
account of this method see Jategaonkar (1986), McConnell and Robson (1987) or
Goodearl and Warfield (1989).
Exercises
 1.   (Kasch-Sandomierski) Show that the socle of a module is the intersection of
      all its large submodules. (Hint. Show first that every submodule is a direct
      summand of a large submodule.)
 2.   Let A be the direct sum of an infinite and a finite (non-zero) cyclic group. Show
      that for the dependence relation defined for A as Z-module the form of D.1'
      without the independence of ℱ (D.1 of BA) does not hold. Show also that
      this form of D.1' does hold on any torsion-free module.
 3.   Show that an Artinian semiprime ring is self-injective.
 4.   Find the rank of Z/m, for a positive integer m.
 5.   Show that for any ring R and any n ≥ 1, rk(R^n) = n·rk R.
 6.   Show that the injective hull of a uniform module is indecomposable. Use this
      fact and the Krull-Schmidt theorem to give another proof of Proposition 7.4.1.
 7.   Let F be the ring of all real continuous functions on the unit interval with
      pointwise operations: (f + g)(x) = f(x) + g(x), (fg)(x) = f(x)g(x). Show that F has
      no uniform ideals.
 8.   Show that the maximum condition on left annihilators is equivalent to the
      minimum condition on right annihilators.
 9.   Show that a reduced ring (x^2 = 0 ⇒ x = 0) is non-singular.
10.   Show that over an Ore domain every finitely generated flat module is projective.
      (Hint. Use Further Exercise 16 of Chapter 4.)
11.   Show that every left and right self-injective ring is its own quotient ring (R is
      right self-injective if R_R is injective).
12. Show that a non-zero submodule of a direct sum of uniform modules has a uni-
    form submodule. (Hint. First find a non-zero submodule in a finite direct sum.)
7.5 PI-algebras
Let k be a field and F = k⟨x_1, …, x_d⟩ the free k-algebra on x_1, …, x_d. The elements
of F are called polynomials and a k-algebra A is said to satisfy the polynomial identity
\[ p(x_1, \ldots, x_d) = 0 \tag{7.5.1} \]
if p is an element of F which vanishes for all values of the x's in A. If A satisfies a
non-trivial polynomial identity, i.e. an identity (7.5.1) where p is not the zero
polynomial, then A is called a PI-algebra. Many of the results proved for Noetherian
rings have their counterparts for PI-algebras. It will simplify matters to assume a
field of coefficients, though it is possible to consider more general coefficient rings
(see Procesi (1973) or Rowen (1980)).
Examples
1. Every commutative k-algebra satisfies the identity xy - yx = 0 and so is a PI-
   algebra.
2. Every Boolean ring satisfies the identity x^2 − x = 0.
3. Every finite-dimensional algebra satisfies an identity. If dim A < n, then A satis-
   fies the identity
                                                                                       (7.5.3)
   If the equation for some element of A has degree less than n, we can bring it to the
   form (7.5.3) by multiplying by a power of x. Writing [x, y] = xy - yx, we obtain
   from (7.5.3),
   Thus the commutators in this expression are linearly dependent and so A satisfies
   the identity
Corollary 7.5.2. Every PI-algebra which is also an integral domain is a left and right
Ore domain.                                                                               •
Proposition 7.5.3. Any algebra A satisfying a polynomial identity of degree n also
satisfies a polynomial identity of degree n which is multilinear.
Proof. Let p = p(x_1, …, x_d) be a polynomial of degree n which vanishes identically
on A, and let r be the highest degree to which any variable occurs, say p has degree r
in x_1. If r > 1, replace p by p(x_1 + x_{d+1}, x_2, …) − p(x_1, x_2, …) − p(x_{d+1}, x_2, …);
since (x_1 + x_{d+1})^r − x_1^r − x_{d+1}^r ≠ 0 (even in finite characteristic, because
x_1x_{d+1} ≠ x_{d+1}x_1), we get a polynomial in which x_1, x_{d+1} occur, but of degree less
than r, and the degree of the other variables is not raised. By a double induction, on the
highest degree r and on the number of variables occurring to this degree, we reduce p to
a polynomial of degree 1 in each variable. The process preserves the total degree,
so we get a polynomial q with a term αx_1x_2⋯x_n, say. Now replace q by
q(…, x_i, …) − q(…, 0, …) (1 ≤ i ≤ n) to get rid of terms not involving x_i.           ∎
Here we do not restrict our algebra to be unital (for otherwise it would have to be
trivial, by (7.5.4)). Then A also satisfies (x + y)^2 − x^2 − y^2 = 0, i.e.
                                       xy + yx = 0,                                  (7.5.5)
and this is multilinear. In characteristic other than 2 the identities (7.5.4) and (7.5.5)
are equivalent, for we can get back from (7.5.5) to (7.5.4) by putting y = x, which
gives 2x^2 = 0.
   A multilinear identity has the advantage that to verify it we need only check the
elements of a basis. Moreover, such identities are preserved under extensions:
Corollary 7.5.4. Let R be a k-algebra with centre C. If R contains a subalgebra A which
is a PI-algebra such that R = AC, then R is a PI-algebra.
Proof. By Proposition 7.5.3, A satisfies a multilinear identity p(x_1, …, x_n) = 0. Let
{u_λ} be a k-basis for A and put a_i = Σ α_{iλ} u_λ, where α_{iλ} ∈ C. Then
\[ p(a_1, \ldots, a_n) = \sum \alpha_{1\lambda_1} \cdots \alpha_{n\lambda_n}\, p(u_{\lambda_1}, \ldots, u_{\lambda_n}) = 0. \]
   In the opposite direction from Proposition 7.5.3 one can show that every
PI-algebra satisfies an identity in two variables. To prove this fact we shall need the fact
that every free algebra of rank at least 2 contains a free subalgebra of countable rank.
This is easily verified: in k⟨x, y⟩ the elements z_n = xy^n (n = 1, 2, …) are free,
because in any equation f(z_1, …, z_n) = 0 we can equate homogeneous components,
and if Σ u_i z_i = 0, then Σ u_i xy^i = 0, and it follows that u_i = 0, so we reach the
conclusion by induction on the degree.
  One of the main results in PI-theory, Kaplansky's theorem, asserts that a primitive
PI-algebra is finite-dimensional; this will be proved in Chapter 8, where primitive
rings are discussed. For the moment we shall show that the n × n matrix ring over a
commutative ring satisfies the standard identity S_{2n} (Amitsur-Levitzki theorem). The
proof uses exterior algebras; we recall that the exterior algebra on a vector space V is
the algebra generated by V with the defining relations v^2 = 0 (v ∈ V). If V has the
basis v_1, …, v_n then a basis for the algebra is given by the elements v_{i_1} ⋯ v_{i_r}
(i_1 < … < i_r, 1 ≤ r ≤ n) (see BA, Section 6.4). We shall also need an elementary
result on traces.
Lemma 7.5.7. Let K be a commutative Q-algebra and A ∈ 𝔐_n(K). If tr(A^r) = 0 for
r = 1, …, n, then A^n = 0.
Proof. Suppose first that K is an algebraically closed field (necessarily of
characteristic 0) and let A be an n × n matrix over K, with eigenvalues λ_1, …, λ_n. The
characteristic polynomial of A is
\[ \det(xI - A) = \prod_{i=1}^{n} (x - \lambda_i) = x^n + c_1 x^{n-1} + \cdots + c_n, \tag{7.5.7} \]
where the c_i are (except for sign) the elementary symmetric functions of the λ's.
Since we are in characteristic 0, we can express c_1, …, c_n as polynomials in the
power sums of the λ's, s_r = Σ λ_i^r = tr(A^r), r = 1, …, n (Newton's formulae):
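Newton's formulae can be illustrated numerically. The sketch below uses the standard recursion k·e_k = Σ_{i=1}^{k} (−1)^{i−1} e_{k−i} s_i for the elementary symmetric functions e_k (so that c_k = (−1)^k e_k). The division by k is where characteristic 0 is needed. The helper name `elementary_from_power_sums` is hypothetical.

```python
from fractions import Fraction

def elementary_from_power_sums(s):
    """Given the power sums s_1, ..., s_n as the list s, return [e_1, ..., e_n]."""
    n = len(s)
    e = [Fraction(1)]  # e_0 = 1
    for k in range(1, n + 1):
        acc = Fraction(0)
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * s[i - 1]
        e.append(acc / k)  # division by k: requires characteristic 0
    return e[1:]

# Eigenvalues 1, 2, 3 have power sums 6, 14, 36.
from_eigenvalues = elementary_from_power_sums([6, 14, 36])
# Vanishing traces tr(A^r) = 0 force every e_k, hence every c_k, to vanish.
from_zero_traces = elementary_from_power_sums([0, 0, 0])
```

When all the power sums vanish, the characteristic polynomial reduces to x^n, which is the mechanism behind the lemma.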
mutative and by (7.5.10), A^2 ∈ 𝔐_n(E_0), so we need only show that tr(A^{2r}) = 0, for
r = 1, …, n, by Lemma 7.5.7. By (7.5.10) this will follow if we show
                     tr(S_m(A_1, …, A_m)) = 0,    where m is even.                (7.5.11)
Let S be the symmetric group on 1, …, m and T the stabilizer of 1 in S; then a
transversal of T in S is 1, τ, …, τ^{m−1}, where τ = (1, 2, …, m). Every element of S is
uniquely expressible as τ^iσ, where 0 ≤ i < m and σ ∈ T; for τ^i brings 1 to the
right place and σ then permutes 2, …, m as needed. Hence we can write
Here the second sum is independent of i, and Σ sgn(τ^i) = 0, because m is even, so
(7.5.11) follows, and this proves (7.5.9) when K is a Q-algebra. Therefore it holds for
a polynomial ring over Z (which can be embedded in a Q-algebra), and hence for any
commutative ring (which is a homomorphic image of a polynomial ring).                ∎
Lemma 7.5.9 (Staircase lemma). Let A be a k-algebra with 1 ≠ 0. Then 𝔐_n(A)
satisfies no polynomial identity of degree less than 2n.
Proof. If A_n = 𝔐_n(A) satisfies an identity of degree r < 2n, then it also satisfies a
multilinear identity of degree r. Let this be p = 0, where each term of p consists of
x_1, …, x_r in some order. Thus p has the form
\[ p = \alpha x_1 x_2 \cdots x_r + p', \tag{7.5.12} \]
where α ∈ k, α ≠ 0 (after renumbering the variables if necessary), and p' is the sum of
products of the x's in other orders than that shown. Now the matrix units in A_n satisfy
\[ e_{11} e_{12} e_{22} e_{23} \cdots e_{n-1\,n} e_{nn} = e_{1n}, \]
while the product in any other order is 0, and this applies even if we only take the
first r, where r > 1. Hence if we put x_1 = e_{11}, x_2 = e_{12}, x_3 = e_{22}, … then the first
term in (7.5.12) is αe_{1i} for some i, while all other terms vanish, so p does not
vanish on A_n and we have reached a contradiction.                                  ∎
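Both the Amitsur-Levitzki theorem and the staircase argument can be tried out on 2 × 2 integer matrices. The sketch below assumes the usual definition of the standard polynomial, S_m(x_1, …, x_m) = Σ sgn(σ) x_{1σ} ⋯ x_{mσ}, summed over all permutations σ. The function names are hypothetical, and a random check is an illustration, not a proof.

```python
from itertools import permutations
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sign(perm):
    """Sign of a permutation of 0..m-1, by counting inversions."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def standard_polynomial(mats):
    """S_m(A_1, ..., A_m): signed sum of the products in every order."""
    n = len(mats[0])
    total = [[0] * n for _ in range(n)]
    for perm in permutations(range(len(mats))):
        prod = [[int(i == j) for j in range(n)] for i in range(n)]
        for idx in perm:
            prod = matmul(prod, mats[idx])
        s = sign(perm)
        total = [[total[i][j] + s * prod[i][j] for j in range(n)] for i in range(n)]
    return total

def unit(i, j, n=2):
    """Matrix unit with a single 1 in row i, column j (0-based)."""
    return [[int((r, c) == (i, j)) for c in range(n)] for r in range(n)]

random.seed(0)
rand = [[[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
        for _ in range(4)]
s4 = standard_polynomial(rand)                                  # degree 2n = 4
s3 = standard_polynomial([unit(0, 0), unit(0, 1), unit(1, 1)])  # e11, e12, e22
```

The value `s4` is the zero matrix, while the staircase e_{11}, e_{12}, e_{22} gives `s3` = e_{12} ≠ 0, so S_3 is not an identity for 2 × 2 matrices.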
Exercises
1. Show that a polynomial which vanishes identically on a non-trivial algebra must
   have zero constant term.
2. Let R be a prime PI-algebra, satisfying an identity of degree d. Show that the left
   (or right) uniform rank of R is < d.
3. Show that if a Q-algebra with 1 satisfies the standard identity S_{2n+1} = 0, then it
   also satisfies S_{2n} = 0.
4. Show that S_n([x, y], [x^2, y], …, [x^n, y]) = 0 holds in 𝔐_n(k) but not in 𝔐_{n+1}(k),
   if k is infinite.
5. Show that S_{n+1}(x_1, …, x_{n+1}) = Σ (−1)^{i+1} x_i S_n(x_1, …, x_{i−1}, x_{i+1}, …, x_{n+1}).
   Deduce that S_n = 0 implies that S_m = 0 for all m > n.
6. Let R be a central simple algebra of degree n over an infinite field as centre. Show
   that R satisfies S_{2n} = 0 but no identity of degree < 2n.
7. Let R, S be such that 𝔐_n(R) ≅ 𝔐_n(S) and R is commutative. Show that S is also
   commutative and deduce that R ≅ S. (Hint. Apply the standard identity with
   suitable arguments including ae_{11}, be_{11}.)
8. Explain the name of Lemma 7.5.9 (keeping in mind matrix notation).
7.6 Varieties of PI-algebras and Regev's theorem
where 𝔱 is the ideal of relations in A. Suppose that A is relatively free and that
λ : F → A is a surjective homomorphism with kernel 𝔱, whose restriction to X
defines a bijection with Y. Given any mapping φ : X → F, there is a unique
endomorphism of F agreeing with φ on X, which we may again denote by φ. We have
to show that φ maps 𝔱 into itself. By hypothesis the mapping λ^{-1}φλ : Y → A extends
to an endomorphism g of A; thus xφλ = xλg for all x ∈ X, hence φλ = λg holds on
all of F. If p ∈ 𝔱, then pλ = 0, hence pφλ = pλg = 0 and it follows that
pφ ∈ ker λ = 𝔱, which shows that A ≅ F/𝔱.
   Conversely, assume that A ≅ F/𝔱, where 𝔱 is a T-ideal and let a mapping
g : Y → A be given. We define φ : X → F as follows: given x ∈ X, choose an element
u of F such that uλ = xλg and put u = xφ. The mapping φ extends to an
endomorphism of F, which will again be denoted by φ. Since 𝔱 is a T-ideal, if p ∈ 𝔱,
then pφ ∈ 𝔱; thus φ can be factored by λ to give an endomorphism h of A such
that φλ = λh. For any x ∈ X we have xφλ = xλh = xλg, hence h is an endomorphism
of A which agrees with g on Y, and this shows A to be relatively free.                  ∎
   Our aim in what follows is to prove that the tensor product of two PI-algebras is
again a PI-algebra. This is Amitai Regev's theorem; the proof given here is due to
Victor Latyshev. Some preparation is necessary.
   Consider a permutation σ of 1, 2, …, n. We use σ to define a partial ordering on
the set {1, …, n} by writing i ≺ j whenever i < j and iσ < jσ. Let us recall that an
antichain in a partially ordered set is a subset of pairwise incomparable elements,
and the width of the set is the maximum number of elements in an antichain.
For example, for our permutation σ an antichain of d elements is a set of numbers
i_1 < i_2 < … < i_d such that i_1σ > i_2σ > … > i_dσ. We also recall (BA, Theorem
1.3.1):
Dilworth's theorem. In any finite partially ordered set S, the minimum number of
disjoint chains into which S can be decomposed is the width of S.            ∎
  Given a permutation σ, suppose that the corresponding partially ordered set can
be decomposed into d chains. Then to specify σ we need only give the d chains and
their images under σ. For example, the permutation
\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 5 & 8 & 1 & 4 & 7 & 6 & 2 \end{pmatrix} \]
defines a partially ordered set of width 4, and it may be expressed as the union of the
chains {1, 2, 3}, {4, 5, 6}, {7}, {8} with images {3, 5, 8}, {1, 4, 7}, {6}, {2}. Dilworth's
theorem allows us to estimate the number of permutations of a given width:
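The example can be checked mechanically. In the sketch below the width is computed as the length of a longest decreasing subsequence of the images, which is exactly an antichain in the sense just described, and a greedy pass splits the positions into chains. The function names are hypothetical, and the greedy rule is one convenient choice, not the only possible decomposition.

```python
def width(images):
    """Length of the longest decreasing subsequence of the images, i.e. the
    largest antichain i_1 < ... < i_d with i_1σ > ... > i_dσ."""
    best = [1] * len(images)
    for j in range(len(images)):
        for i in range(j):
            if images[i] > images[j]:
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)

def greedy_chains(images):
    """Split the positions 1..n into chains along which the images increase."""
    chains = []
    for pos, img in enumerate(images, start=1):
        # chains that can be extended: last image smaller than the new one
        candidates = [c for c in chains if images[c[-1] - 1] < img]
        if candidates:
            # extend the chain whose last image is closest below img
            max(candidates, key=lambda c: images[c[-1] - 1]).append(pos)
        else:
            chains.append([pos])
    return chains

sigma = [3, 5, 8, 1, 4, 7, 6, 2]   # images of 1, ..., 8
w = width(sigma)
chains = greedy_chains(sigma)
```

For this permutation the greedy pass reproduces the four chains listed above, and their number equals the width, as Dilworth's theorem predicts.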
Theorem 7.6.2. The number of permutations of 1, 2, …, n for which any set of d numbers
(2 ≤ d < n) contains at least one pair in their natural order is at most (d − 1)^{2n}.
Proof. Let σ be such a permutation and consider the partial ordering defined by σ (as
above) on {1, 2, …, n}. The hypothesis states that no antichain has d elements, so
the set has width less than d, and by Dilworth's theorem it can be written as a disjoint
union of at most d − 1 chains. Let us number these chains from 1 to δ, where δ < d.
To specify σ we have to give the distribution of 1, 2, …, n and their images under σ
over these δ chains. This can be done by defining two mappings from {1, …, n} to
{1, …, δ}. For each mapping there are δ^n choices, hence in all there are δ^{2n} choices
  Let F = k⟨X⟩ be the free k-algebra on X = {x_1, x_2, …} and denote by L_n the
subspace of all multilinear forms in x_1, …, x_n; L_n is spanned by the monomials
\[ x_{1\sigma} x_{2\sigma} \cdots x_{n\sigma}, \quad \sigma \text{ a permutation of } 1, \ldots, n, \tag{7.6.2} \]
hence its dimension over k is n!. We fix an integer d, 2 ≤ d ≤ n, and call a monomial
(7.6.2) good if the partial ordering defined by σ has width < d; by Theorem 7.6.2 the
number of good monomials (7.6.2) does not exceed (d − 1)^{2n}. We shall use this fact
to bound the dimension of a relatively free algebra; here it will be convenient to
restrict attention to multilinear elements.
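The bound only becomes useful once (d − 1)^{2n} drops below n!, the total number of monomials (7.6.2). A brute-force sketch for small n, with hypothetical helper names, counts the good permutations directly and finds the first n at which the bound beats n! for d = 3:

```python
from itertools import permutations
from math import factorial

def has_decreasing(images, d):
    """True if the sequence contains a decreasing subsequence of length d."""
    best = []
    for j, x in enumerate(images):
        best.append(1 + max((best[i] for i in range(j) if images[i] > x), default=0))
    return max(best, default=0) >= d

def count_good(n, d):
    """Number of permutations of 1..n whose poset has width < d."""
    return sum(1 for p in permutations(range(1, n + 1)) if not has_decreasing(p, d))

n, d = 6, 3
good = count_good(n, d)
bound = (d - 1) ** (2 * n)

# Smallest n for which the bound (d-1)^(2n) is smaller than n!.
first = 1
while (d - 1) ** (2 * first) >= factorial(first):
    first += 1
```

For n = 6 the bound is still larger than 6! and so says nothing new, although the actual count of good permutations is already far smaller. From `first` onwards the good monomials cannot span all of L_n by counting alone.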
Proposition 7.6.3. Let 𝔱 be a T-ideal in the free algebra F = k⟨X⟩, and put 𝔱_n = 𝔱 ∩ L_n,
where L_n is the space of multilinear forms of degree n, as above. If 𝔱 contains a
polynomial of degree d, where 2 ≤ d ≤ n, then
\[ \dim(L_n/\mathfrak{t}_n) \le (d-1)^{2n}. \tag{7.6.3} \]
Proof. Let us take the monomial basis in L_n with the lexicographic ordering. By
linearization it follows that F/𝔱 satisfies a d-linear identity:
For the proof it will be enough to show that L_n is spanned (mod 𝔱) by the good
monomials. Suppose that u is a monomial which is not good. Then we have a
factorization
where α_1 > α_2 > … > α_d. If in (7.6.4) we put y_i = x_{α_i} ⋯ x_{β_i}, y_d = x_{α_d}, we obtain
Theorem 7.6.4 (Regev's theorem, 1972). The tensor product of two PI-algebras over
a field k is again a PI-algebra.
(7.6.7)
For such $n$ the equations (7.6.7) have a non-trivial solution $\gamma_\sigma$ and then $f = \sum_\sigma \gamma_\sigma x_\sigma$
is the required polynomial identity for $A \otimes B$. ∎
Exercises
1. Show that an algebra $A$ is universal for homomorphisms into some family of
   algebras iff $A$ is relatively free.
2. Show that if two algebras $A$, $B$ satisfy the same polynomial identity of degree $d$,
   then $A \otimes B$ satisfies an identity of degree at most $(d - 1)^4$. (Hint. Use Stirling's
   formula: $(n - 1)! \sim n^n e^{-n} \sqrt{2\pi/n}$.)
3. Show that an algebra satisfying $x^2 = 0$ also satisfies $xyz = 0$.
4. Show that if $A$, $B$ satisfy $x^2 = 0$, then $A \otimes B$ satisfies $x^3 = 0$. Find an identity for
   $A \otimes B$ when $A$ satisfies $x^2 = 0$ and $B$ satisfies $x^3 = 0$.
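The form of Stirling's formula quoted in the hint to Exercise 2 can be checked numerically. The following is only a sanity check of the asymptotic statement $(n - 1)! \sim n^n e^{-n} \sqrt{2\pi/n}$ for moderate $n$, not part of any solution; the helper names are illustrative.

```python
import math


def stirling_estimate(n):
    """Right-hand side n^n e^(-n) sqrt(2*pi/n), computed via logarithms."""
    return math.exp(n * math.log(n) - n) * math.sqrt(2 * math.pi / n)


def ratio(n):
    """Quotient (n-1)! / estimate; it should tend to 1 as n grows."""
    return math.factorial(n - 1) / stirling_estimate(n)


if __name__ == "__main__":
    for n in (5, 10, 50, 100):
        print(n, ratio(n))
```

The ratio exceeds 1 by roughly $1/(12n)$, so the approximation is already within one per cent for $n = 10$.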
7.7 Generic matrix rings and central polynomials
Proposition 7.7.1. Let $F = k\langle t_1, \ldots, t_d \rangle$ be the free $k$-algebra, $F_{(n)} = k\langle X_1, \ldots, X_d \rangle_{(n)}$ the
generic matrix ring of degree $n$ and $\nu : F \to F_{(n)}$ the $k$-algebra homomorphism in which
$t_\delta \mapsto X_\delta$. Then $p \in F$ vanishes identically on every $n \times n$ matrix over a commutative
$k$-algebra if and only if $p\nu = 0$.
Proof. If $p$ vanishes on every $n \times n$ matrix ring, then in particular, $p\nu = 0$.
Conversely, if $p\nu = 0$, then since every homomorphism $\varphi : F \to \mathfrak{M}_n(A)$ ($A$ commutative)
can be factored by $\nu$, say $\varphi = \nu\varphi'$, we have $p\varphi = p\nu\varphi' = 0$. ∎
   Thus if $p \in F$ and we want to check whether $p = 0$ holds in all matrix rings, we
need only find its image in $F_{(n)}$. Here it is often convenient to embed the coefficient
ring $k[t_{ij\delta}]$ in an algebraically closed field $K$. Then we can transform any matrix over
$K$ with distinct eigenvalues to diagonal form. Now the generic matrix $X_\delta = (t_{ij\delta})$
certainly has distinct eigenvalues, since we can specialize it to any other matrix.
Hence we can always transform $X_1$ to diagonal form; of course the same applies to
$X_2, \ldots, X_d$, but we cannot transform more than one of the $X$'s simultaneously to
diagonal form, because they do not commute.
   It turns out that $F_{(n)}$ can be embedded in a skew field; this follows from
Proposition 7.7.2. The generic matrix ring $k\langle X_1, \ldots, X_d \rangle_{(n)}$ is a left and right Ore
domain.
Proof. We have already seen that $F_{(n)}$ is a PI-algebra; if we can show that it is an
integral domain, the desired result will follow by Corollary 7.5.2.
   It remains to show that $F_{(n)}$ is an integral domain; the essence of the proof will be
to show that any polynomial identity holding in $F_{(n)}$ also holds in a certain division
algebra. Let $K$ be the field of fractions of the coefficient ring $k[t_{ij\delta}]$ and let $E$ be an
extension field of $K$ with a $K$-automorphism $\alpha$ of order $n$, e.g. we may take
$E = K(\xi_1, \ldots, \xi_n)$ to be a rational function field and $\alpha$ a cyclic permutation of the
$\xi$'s. Let $D = E(z; \alpha)$ be the skew field of fractions of the skew polynomial ring. As
we have seen in Example 6 of Section 7.3, this is a central division algebra of
degree $n$ over its centre $C$, so $C$ is infinite. Let $L$ be a splitting field of $D$; then
$$D_L = D \otimes_C L \cong L_n. \tag{7.7.1}$$
Here the left-hand side contains $D$, while the right-hand side contains $K_n$. Now let
$f, g \in F_{(n)}$ and suppose that $fg = 0$. Then $fg$ vanishes identically on $L_n$ and hence
(by (7.7.1)) on $D$. Since $D$ is a skew field, it follows that for each choice of arguments
either $f$ or $g$ vanishes, so if $y$ is a new indeterminate, then
$$fyg = 0 \tag{7.7.2}$$
identically in $D$. By Proposition 7.5.5, this also holds in $D_L \cong L_n$ and hence in $K_n$, but
$K_n$ is simple, hence prime, so either $f = 0$ or $g = 0$, as elements of $F_{(n)}$. This shows
$F_{(n)}$ to be an integral domain. ∎
   From this result it follows that $F_{(n)}$ has a skew field of fractions, called the generic
division algebra of degree n over k. We shall see below (in Corollary 7.7.5) that its
degree over its centre is n.
   These concepts have been used by Shimshon Amitsur [1972] to prove that not
every division algebra is a crossed product. It can be shown that if the generic division
algebra of degree $n$ were a crossed product, with Galois group $\Gamma$, then every
division algebra of degree $n$ would be a crossed product with group $\Gamma$. Now Amitsur
constructs two division algebras of degree n (for a certain n, and in characteristic 0)
which cannot be expressed as crossed products with the same Galois group. It follows
that the generic division algebra cannot be a crossed product (see Rowen (1988)).
   In studying the centre of a PI-algebra it would be useful to have 'centre-valued' or
'central' polynomials, i.e. polynomials p which when evaluated, always yield elements
of the centre. By a central polynomial for $n \times n$ matrices one understands a polynomial
$p \in k\langle x_1, \ldots, x_d \rangle$ which when evaluated in $\mathfrak{M}_n(A)$, where $A$ is a commutative
$k$-algebra, takes values in the centre of $\mathfrak{M}_n(A)$. Of course we are only interested in
non-constant polynomials, i.e. polynomials taking at least two values.
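   As an example consider $2 \times 2$ matrices over $R$. A commutator of $2 \times 2$ matrices
has trace zero, so it has the form $\alpha = \begin{pmatrix} a & b \\ c & -a \end{pmatrix}$, and its square is
$$\alpha^2 = \begin{pmatrix} a^2 + bc & 0 \\ 0 & a^2 + bc \end{pmatrix}.$$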
This is a scalar, hence $(xy - yx)^2$ is a central polynomial for $2 \times 2$ matrices. This was
essentially the only non-constant central polynomial known for many years, but a
large family of central polynomials for all values of n was discovered in 1972 by
Edward Formanek. In 1974 Yuri Razmyslov discovered multilinear central poly-
nomials for n x n matrices, and we shall now describe his construction. We preface
the main theorem by two remarks.
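The statement that $(xy - yx)^2$ is a non-constant central polynomial for $2 \times 2$ matrices is easy to test on examples. The sketch below works with integer matrices stored as nested lists; the helper names are illustrative and the two sample pairs are arbitrary choices.

```python
def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(x, y):
    """The commutator xy - yx."""
    xy, yx = mat_mul(x, y), mat_mul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]


def commutator_square(x, y):
    """Value of the polynomial (xy - yx)^2 at the pair (x, y)."""
    c = commutator(x, y)
    return mat_mul(c, c)


def is_scalar(m):
    """True if m is a scalar multiple of the unit matrix."""
    return m[0][1] == 0 and m[1][0] == 0 and m[0][0] == m[1][1]


if __name__ == "__main__":
    x, y = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
    print(commutator_square(x, y))   # [[-8, 0], [0, -8]]
    e11, e12 = [[1, 0], [0, 0]], [[0, 1], [0, 0]]
    print(commutator_square(e11, e12))   # [[0, 0], [0, 0]]
```

In the first pair the commutator has $a = -1$, $b = -3$, $c = 3$, so $a^2 + bc = -8$; the second pair gives the value 0, so the polynomial takes at least two values.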
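   Let $k$ be a field; the matrix ring $k_n$ may be considered as an $n^2$-dimensional vector
space with a bilinear form $\operatorname{tr}(xy)$. The usual matrix basis $\{e_{ij}\}$ has the dual basis $\{e_{ji}\}$,
for we clearly have
$$\operatorname{tr}(e_{ij} e_{kl}) = \delta_{jk}\delta_{il}.$$
For simplicity we shall index the $e_{ij}$ by a single suffix $\lambda$: $e_\lambda$ ($\lambda = 1, \ldots, n^2$) and
denote the dual basis by $e_\lambda^*$. We observe that for any matrix $a = \sum a_{ij} e_{ij}$ we have
$$\sum_\lambda e_\lambda a e_\lambda^* = \operatorname{tr}(a) \cdot 1 \quad \text{for any dual bases } \{e_\lambda\}, \{e_\lambda^*\} \text{ of } k_n. \tag{7.7.3}$$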
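For the standard basis $\{e_{ij}\}$ with dual basis $\{e_{ji}\}$, formula (7.7.3) reads $\sum_{i,j} e_{ij} a e_{ji} = \operatorname{tr}(a) \cdot 1$, since $e_{ij} a e_{ji} = a_{jj} e_{ii}$. The following sketch verifies this special case for an integer matrix; indices run from 0 and the function names are illustrative.

```python
def unit(n, i, j):
    """Matrix unit e_ij of size n (indices starting at 0)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


def mat_mul(a, b):
    """Product of two n x n matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dual_basis_sum(a):
    """Sum of e_ij * a * e_ji over all i, j."""
    n = len(a)
    total = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            term = mat_mul(mat_mul(unit(n, i, j), a), unit(n, j, i))
            for r in range(n):
                for c in range(n):
                    total[r][c] += term[r][c]
    return total


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
    print(dual_basis_sum(a))   # 16 times the unit matrix, since tr(a) = 16
```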
Secondly, put $F = k\langle x_1, \ldots, x_d \rangle$ and consider the subspace $\Phi$ of elements that are
homogeneous of degree 1 in $x_1$. These elements can be written in the form
$\sum a_i x_1 b_i$, where $a_i, b_i \in k\langle x_2, \ldots, x_d \rangle$, and the space $\Phi$ admits the linear mapping
Lemma 7.7.3. Given any $k$-algebra $R$, let $x_{ij}$ be $n^2$ commuting indeterminates over $k$
and write $x = (x_{ij})$. Then the $R$-bimodule generated by $x$ in $\mathfrak{M}_n(R \otimes k[x_{ij}])$ admits
the $k$-linear mapping
$$A_x : \sum a_i x b_i \mapsto \sum b_i x a_i. \tag{7.7.4}$$
Proof. The subspace of matrices linear in the $x_{ij}$ has the $R$-basis $x_{ij} e_{rs}$. Let us define
$(x_{ij} e_{rs})^{A_x} = x_{sr} e_{ji}$; then (7.7.4) holds for $a_1 = e_{ri}$, $b_1 = e_{js}$, for then $a_1 x b_1 = x_{ij} e_{rs}$,
$b_1 x a_1 = x_{sr} e_{ji}$. Hence it holds generally, by linearity. ∎
Proof. Let $\{e_\lambda\}$ be a basis for $k_n$ and denote by $E$ the field of fractions of the
polynomial ring in the commuting indeterminates $x_i^{(\lambda)}, y_i^{(\lambda)}$
($\lambda = 1, \ldots, n^2$, $i = 0, 1, \ldots, n^2$). We shall prove the theorem by taking
$x_i = \sum x_i^{(\lambda)} e_\lambda$, $y_i = \sum y_i^{(\lambda)} e_\lambda$ and establishing (7.7.6) in $E_n$.
   In the first place we note that $\operatorname{tr}(C) \ne 0$. For if we take the $x_i$ to be the $e_{rs}$ in some
order, we can choose the $y_j$ so that
while all the other terms in $C$ are 0, and with these values $\operatorname{tr}(C) = 1$.
   It is clear from the definition that $C$ vanishes when we put $x_i = x_j$, where $i \ne j$.
Hence by (7.7.5),
$$\operatorname{tr}\bigl(x_j (C^{A_i}|_{x_i \to 1})\bigr) = \operatorname{tr}(C|_{x_i \to x_j}) = \begin{cases} 0 & \text{if } j \ne i, \\ \operatorname{tr}(C) & \text{if } j = i. \end{cases}$$
It follows that up to a scalar (non-zero!) factor $\{(C^{A_i}|_{x_i \to 1})\}$ is a dual basis for
$\{x_i\}$. So by (7.7.3),
$$D = \sum x_i z (C^{A_i}|_{x_i \to 1}) = \operatorname{tr}(z) \operatorname{tr}(C) \cdot 1,$$
as we wished to show. ∎
   The polynomial $C = C_n$ is called the Capelli polynomial and $D = D_n$ is the
Razmyslov polynomial for $n \times n$ matrices. By Theorem 7.7.4 it is a non-constant
central polynomial on $\mathfrak{M}_n(k)$ whose value is relatively easy to calculate, using the
formula (7.7.6).
   As Claudio Procesi has observed, any central polynomial $\varphi$ for $n \times n$ matrices,
with zero constant term, vanishes identically on $\mathfrak{M}_m(k)$ for $m < n$, for we can
regard any $m \times m$ matrix as an $n \times n$ matrix whose last $n - m$ rows and columns
are 0. Now the value of $\varphi$ must be central, i.e. a scalar, and this scalar is 0, as we
see by looking at the $(n, n)$-entry. More generally, let $A$ be a simple algebra of
degree $m$ with centre $k$. If $E$ is a splitting field, then $A_E = A \otimes E \cong E_m$ and since
$D_n$ is multilinear, it vanishes on $A$ whenever $m < n$. This proves
Corollary 7.7.5. The Razmyslov polynomial $D_n$ is central and non-vanishing for any
central simple algebra of degree $n$, while it vanishes identically on central simple algebras
of degree less than $n$. ∎
   It is clear that $D_n$ does not vanish on the generic division algebra $F_{(n)}$, whereas
$D_{n+1}$ does; this shows $F_{(n)}$ to be of degree $n$. We shall return to this point in Section
8.5, when we come to discuss prime PI-algebras.
Exercises
1. Show that the exterior algebra $\Lambda(V)$ on any vector space $V$ is a PI-algebra, but if $V$
   is infinite-dimensional, $\Lambda(V)$ does not satisfy a standard identity and so cannot
   be embedded in a matrix algebra over a commutative ring.
2. Show that in a prime PI-algebra every non-zero ideal contains a central regular
   element.
3. Show that Corollary 7.7.5 holds for any central polynomial with zero constant
   term. (Hint. Use Proposition 7.5.5 and treat the case of a finite ground field
   separately.)
4. Show that $[\operatorname{tr}(ax)b]^{A_x} = a \cdot \operatorname{tr}(xb)$, where $A_x$ is the Razmyslov transposition.
5. (Razmyslov) Show that the commutators $[a, b] = ab - ba$ span an $(n^2 - 1)$-dimensional
   subspace of $k_n$ and use this fact to prove that (in the notation of
   Theorem 7.7.4) $B = C(x, [u_2, v_2], \ldots, [u_{n^2}, v_{n^2}]; y_0, \ldots, y_{n^2}) = \operatorname{tr}(x)p$, where $p$
   is a matrix depending on $u, v, y$. Deduce that $B^{A_x}$ is again a central polynomial.
6. Let $f$ be a polynomial in the entries of a square matrix $A$ over a field. Show that if
   $f$ is unchanged when $A$ is replaced by $P^{-1}AP$, then $f$ is a symmetric function of the
   eigenvalues of $A$.
7.8 Generalized polynomial identities
bound on the dimension. This can be illustrated by the example of the n x n matrix
ring over a commutative ring, which satisfies a GPI of degree 2:
whereas an ordinary identity has degree at least 2n, as we saw in the staircase lemma
(Lemma 7.5.9). An even simpler example is given by a non-prime ring, which always
satisfies a GPI of degree 1: $axb = 0$ (for suitable $a, b \ne 0$). We shall confine our
attention to the special case of Amitsur's theorem where R is a skew field; thus we
shall essentially prove that a skew field satisfying a GPI is finite-dimensional over
its centre. This then will also provide an independent proof of Kaplansky's theorem
for the case of a skew field. We begin with two lemmas.
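The remark that a non-prime ring satisfies a GPI of degree 1 can be illustrated on a concrete ring. The example chosen here is an assumption of this sketch, not taken from the text: in the ring of upper triangular $2 \times 2$ integer matrices we have $e_{22} x e_{11} = 0$ for every $x$, whereas $e_{11} x e_{22}$ does not vanish identically.

```python
from itertools import product

E11 = [[1, 0], [0, 0]]
E22 = [[0, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def upper_triangular(entries):
    """All upper triangular 2 x 2 matrices with entries from the given range."""
    return [[[p, q], [0, r]] for p, q, r in product(entries, repeat=3)]


def satisfies_axb(a, b, elements):
    """True if a*x*b = 0 for every x in the given sample of ring elements."""
    return all(mat_mul(mat_mul(a, x), b) == ZERO for x in elements)


if __name__ == "__main__":
    sample = upper_triangular(range(-2, 3))
    print(satisfies_axb(E22, E11, sample))   # True
    print(satisfies_axb(E11, E22, sample))   # False
```

The check runs over a finite sample only; the identity itself holds for all upper triangular $x$ because $e_{22} x e_{11}$ picks out the $(2, 1)$-entry of $x$, which is 0.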
Lemma 7.8.1. Let $D$ be a skew field, $\Sigma$ a multiplicative subset of $D$ and $K$ its centralizer
in $D$. If $a_i, b_i \in D^\times$ ($i = 1, \ldots, n$) are such that
(7.8.2)
$$0 = -\sum_j a_1 x u_j b_1 v_j = \sum_{i=2}^n a_i x \Bigl(\sum_j u_j b_i v_j\Bigr).$$
This holds for all $x \in \Sigma$ and is a shorter relation than (7.8.1), hence the coefficient of
each $a_i$ must vanish, i.e. $\sum_j u_j b_i v_j = 0$, for $i = 2, \ldots, n$, as we wished to show.
   This shows the mapping (7.8.2) to be well-defined. It is right $D$-linear, i.e.
$\gamma_i(zc) = \gamma_i(z)c$ for all $c \in D$. Taking $z = 1$, we find that $\gamma_i(c) = \gamma_i \cdot c$ is left multiplication
by an element $\gamma_i$. Moreover, $\gamma_i$ is also left $\Sigma$-linear: $\gamma_i w = w\gamma_i$ for all $w \in \Sigma$,
hence $\gamma_i \in K$. By definition, $\gamma_i b_1 = b_i$, hence
Since $b_1 \ne 0$, we have $\sum a_i \gamma_i = 0$, and here $\gamma_1 = 1$, so the $a$'s are linearly dependent
over $K$. ∎
Lemma 7.8.2. Let $D$, $\Sigma$, $K$ be as in Lemma 7.8.1. Suppose that $a_1, \ldots, a_n \in D$ are right
linearly independent over $K$ and $b_1, \ldots, b_n \in D$ are such that the set
$E = \{\sum a_i x b_i \mid x \in \Sigma\}$ is contained in a finite-dimensional left $K$-space. Then there
exists $c \in D^\times$ such that $Kc\Sigma$ is finite-dimensional as left $K$-space.
Proof. Let $u_1, \ldots, u_r$ be a left $K$-basis for a space containing $E$, so that
$$\sum_{i=1}^n a_i x b_i = \sum_{j=1}^r \lambda_j(x) u_j \quad \text{for all } x \in \Sigma \text{ and some } \lambda_j(x) \in K.$$
If $b_i y \ne y b_i$ for some $y \in \Sigma$ and some $i$, we can apply induction on $n$ to reach the
conclusion. Otherwise $b_i y = y b_i$ for all $y \in \Sigma$ and so $b_i \in K$; hence
$\sum a_i x b_i = (\sum a_i b_i)x$ and we are reduced to the case $n = 1$. ∎
   We now come to the main result of this section. Here the restriction to multilinear
elements is necessary because $\Sigma$ may not admit sums.
where $b_i \in D$ and no term in any $q_j$ has zero degree in the $x$'s. We may again take
$b_1 = 1$ and suppose $f$ chosen so that $r$ and $s$ are minimal. Then for any $y \in \Sigma$
$$f(x_1, \ldots, x_n)y - f(x_1 y, x_2, \ldots, x_n) = \sum_{i=2}^r g_i x_1 (b_i y - y b_i) + \sum_{j=1}^s p_j x_1 (q_j y - y q_j). \tag{7.8.4}$$
Since $r$ was chosen minimal in (7.8.3), $1, b_2, \ldots, b_r$ are linearly independent over $K$,
in particular, $b_i \notin K$ for $i > 1$, so none of the terms $b_i y - y b_i$ in the first sum can
vanish identically for $y \in \Sigma$. Choosing $y = y_0 \in \Sigma$ such that $b_2 y_0 \ne y_0 b_2$, we
obtain a GPI in $D$ with a smaller value of $r$, unless $r = 1$, when the first sum on
the right of (7.8.4) is absent. In the latter case consider $q_j y - y q_j$; if this vanishes
identically for all $y \in \Sigma$, then $q_j \in K$ for all values of the $x$'s in $\Sigma$. Write
$q_j = \sum c_i x_2 d_i$, where the values of $x_3, \ldots, x_n$ in $\Sigma$ are chosen so that $q_j \ne 0$ (which is
possible, by induction on $n$). Then the set $\{\sum c_i y d_i \mid y \in \Sigma\}$ is one-dimensional
over $K$, hence $Kc\Sigma$ is finite-dimensional for some $c \in D^\times$, by Lemma 7.8.2, in
contradiction to the hypothesis.
   There remains the case where the $q_j y - y q_j$ do not all vanish identically; we shall
show that this leads to a contradiction. In this case the left-hand side of (7.8.4), for
suitable $y = y_0 \in D$, is a non-zero polynomial $f_1$, again multilinear in $x_1, \ldots, x_n$, with
no term in which $x_1$ is last. Moreover, each term in $f_1$ has the $x$'s in the same order as
some term in $f$, so if $x_\nu$ does not come last in any term of $f$, then the same is true of $f_1$.
We apply the same reduction to $x_2, \ldots, x_n$ in turn and finally obtain a polynomial $f^*$
in which no $x_\nu$ comes last. This is impossible, so this case cannot occur. ∎
Theorem 7.8.4 (Amitsur). Let $D$ be a skew field with centre $C$ such that $[D : C] = \infty$.
If $f \in D_C\langle X \rangle$ is non-zero, then $f(a) \ne 0$ for some $a \in D^X$.
  The above proof uses simplifications by Wallace Martindale and Yitz Herstein, see
Herstein (1976).
Exercises
1. Let $R$ be a simple ring with centre $C$. Show that Lemma 7.8.1 holds for $D = R$,
   $\Sigma = R$, $K = C$. Deduce that if $a, b \in R$ satisfy $axb = bxa$ for all $x \in R$, and
   $a \ne 0$, then $b = \lambda a$ for some $\lambda \in C$.
2. Let $k$ be a field of characteristic 0. Show that in the Weyl algebra $A_1(k)$ with
   generators $u, v$ every non-trivial multilinear polynomial $f$ is non-zero when the
   variables in $f$ are replaced by suitable powers of $u$ and $v$.
3. Let $G$ be a group and $\Delta(G)$ the set of elements of $G$ which have only finitely many
   conjugates in $G$. Show that $\Delta(G)$ is a characteristic subgroup of $G$. Show further
   that if the group algebra $kG$ has a GPI holding for all arguments in $G$, then $\Delta(G)$
   is of finite index in $G$.
 2. Show that the skew field of fractions of a right Ore domain is unique up to an
    isomorphism leaving R fixed. (Note that this does not extend to general rings,
    e.g. a free algebra of rank at least two has many non-isomorphic skew fields
    of fractions, see Exercise 2 of Section 7.3 and Cohn (1985), (1995).)
 3. Let $M$ be a cancellation monoid. Given $a, b \in M$, if $aM \cap bM = \emptyset$, show that
    the submonoid generated by a and b is free on these generators.
 4. Let R be a ring with total quotient ring Q. Show that Q is semisimple whenever
    every right Q-module is injective as right R-module. (Hint. Take a right ideal in
    Q, form its complement C as right R-module and verify that C is a Q-module.)
 5. Let k be an algebraically closed field and let A be the translation ring over k,
    generated by x, y with xy = y(x + 1) (see Section 7.3, Example 4). Verify that
    yA is a prime ideal and its complement is an Ore set. Show that any prime
    ideal other than 0 or $yA$ has the form $C_a = yA + (x - a)A$, where $a \in k$.
    Verify that $yC_a = C_{a+1}y$ and deduce that the complement of $C_a$ is not an Ore
    set, but the complement of $\bigcap_n C_{a+n}$ is an Ore set, for each $a \in k$.
 6. In the $k$-algebra with generators $x, y$ and defining relation $xy = \lambda yx$, where
    $\lambda \in k^\times$, find the maximal prime ideals and the complements of intersections
    of prime ideals that are Ore sets. (Hint. Treat the case where $\lambda$ is a root of 1
    separately.)
 7. (Ore) Let $k$ be a field containing a finite subfield $F_q$. Show that the elements of
    $k[x]$ which as functions on $k$ are linear over $F_q$ are the $q$-polynomials $\sum a_i x^{q^i}$,
    and that they form a ring under substitution as multiplication:
    $fg(x) = f(g(x))$. Verify that this ring is isomorphic to $k[z; \varphi]$, where $\varphi : a \mapsto a^q$.
 8. (P. Fatou) A power series over Z is called primitive if no prime divides all its
    coefficients. Show that the product of primitive power series is primitive.
    Deduce that if P, Q E Z[x] are coprime polynomials such that the power
    series $P/Q$ has integer coefficients, then $Q(0) = \pm 1$. (Hint. Find polynomials
    $f, g \in Z[x]$ such that $fP + gQ = m$ is a positive integer and express $m$ as a product
    of $Q$ and another series in $Z[[x]]$.)
 9. Let R be a right Bezout domain. Show that any finitely generated torsion-free left
    R-module is free. Deduce that over a 2-sided Bezout domain every finitely
    generated module splits over its torsion submodule.
10. Let $R$ be a right Ore domain and $K$ its field of fractions. Show that any ring
    between $R$ and $K$ is again right Ore.
11. Show that a semiprime right Goldie ring satisfies the minimum condition on
    right annihilator ideals. (Hint. In any chain the rank becomes stationary; now
    use (a) $\Rightarrow$ (c) in the proof of Theorem 7.4.9 and 1.2 to show that essential extensions
    are trivial.)
12. A module is called semi-Artinian if every non-zero quotient contains a simple
    submodule. Show that a semi-Artinian module is non-singular iff its socle is
    projective.
13. (A. R. Kemer) If the symmetrizer $h_\alpha$ corresponds to the polynomial $F_\alpha(x_1, \ldots, x_n)$
    by the rule $\sum a_\sigma \sigma \mapsto \sum a_\sigma x_{1\sigma} x_{2\sigma} \cdots x_{n\sigma}$, show that $F_\alpha$ is the linearization of
    $S_{n_1}(x_1, \ldots, x_{n_1}) \cdots S_{n_r}(x_1, \ldots, x_{n_r})$, where the $S_i$ are standard polynomials and
    $\alpha$ has columns $n_1, \ldots, n_r$.
14. (P. J. Higgins) Write $P(x, y) = \sum_i x^i y x^{n-1-i}$. Show that if an algebra $A$ over a
    field of characteristic prime to $n!$ satisfies $x^n = 0$, then it also satisfies
    $\sum x_{1\sigma} \cdots x_{n\sigma} = 0$, where $\sigma$ runs over all permutations of $1, \ldots, n$, and deduce
    that $P(x, y) = 0$ in $A$. By evaluating the expression $\sum x^i z y^j x^{n-1-i} y^{n-1-j}$ in
    two ways, show that $A$ satisfies the identity $x^{n-1} z y^{n-1} = 0$.
15. (P. J. Higgins) Use Exercise 14 to prove the Nagata-Higman theorem: if an
    algebra $A$ of characteristic prime to $n!$ satisfies the identity $x^n = 0$, then
    $A^{2^n - 1} = 0$. (Hint. Let $I$ be the ideal generated by all elements $a^{n-1}$, where
    $a \in A$. Show that $n!\,IAI = 0$ by Exercise 14 and apply induction on $n$ to $A/I$.)
Rings without finiteness assumption
For general rings there is naturally not as much structure theory as in the Artinian
or Noetherian case. It is true that some of the same methods can be used, e.g. the
radical can be defined, semiprimitive rings can be expressed as subdirect products
of primitive rings etc., but these methods are less precise and they do not lead to
a complete classification. For primitive rings a structure theorem can be proved
using a general version of the density theorem; this is presented in Section 8.1 and
applied in Section 8.2, while Section 8.3 deals with semiprimitive rings. So far we
have taken the existence of a unit element for granted, but some work has been
done on 'rings without one' and we cast a brief glance at it in Section 8.4; we
shall examine the case of simple rings and also see when the existence of a 'one'
follows from other assumptions. In Section 8.5 we study semiprime rings, and in
Section 8.6 we present an analogue of Goldie's theorem for PI-rings. The final
section, Section 8.7, takes a brief look at a natural generalization of principal ideal
domains: free ideal rings.
P. Cohn, Further Algebra and Applications
© Professor P.M. Cohn, 2003
   The first step is to find a matrix representation for $E$. Here it is not necessary for $K$
to be a skew field; we may take any ring $R$ and consider a free $R$-module of infinite
rank $\nu$. Thus we take an index-set $I$ of cardinal $\nu$ and take a free left $R$-module $V$ with
basis $\{v_\alpha\}$ ($\alpha \in I$). In terms of this basis any endomorphism $a$ of $V$ is described by the
equations expressing the image of each $v_\alpha$ in terms of the $v$'s:
$$v_\alpha a = \sum_\beta a_{\alpha\beta} v_\beta. \tag{8.1.1}$$
Here $(a_{\alpha\beta})$ is a $\nu \times \nu$ matrix, i.e. a square array of elements of $R$, whose rows and
columns are indexed by a set $I$ of cardinal $\nu$. Moreover, for each $\alpha \in I$, there are
only finitely many non-zero coefficients $a_{\alpha\beta}$ in (8.1.1), hence each row of the
matrix $(a_{\alpha\beta})$ contains only finitely many non-zero entries; we say: it is row-finite.
Conversely, every row-finite $\nu \times \nu$ matrix over $R$ defines an endomorphism of $V$ relative
to the basis $\{v_\alpha\}$. For we can define $v_\alpha a$ by (8.1.1), and for a general element
$x = \sum \xi_\alpha v_\alpha$ of $V$ put
$$xa = \sum \xi_\alpha a_{\alpha\beta} v_\beta.$$
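It is easily checked that the mapping $a$ so defined is an endomorphism of $V$, so that
we have a bijection between $\operatorname{End}_R(V)$ and the set $\mathfrak{M}_\nu(R)$ of all row-finite $\nu \times \nu$
matrices over $R$. If we define the addition and multiplication of row-finite matrices,
as in the finite case, by the formulae
$$(a_{\alpha\beta}) + (b_{\alpha\beta}) = (a_{\alpha\beta} + b_{\alpha\beta}), \qquad (a_{\alpha\beta})(b_{\alpha\beta}) = \Bigl(\sum_\gamma a_{\alpha\gamma} b_{\gamma\beta}\Bigr),$$
we find that $\mathfrak{M}_\nu(R)$ is a ring isomorphic to $E$. We observe that some restriction on
the matrices, such as row-finiteness, is essential for the product to be defined. Our
conclusion may be stated as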
Theorem 8.1.1. Let $R$ be any ring and $V$ a free left $R$-module of infinite rank $\nu$. Then
$\operatorname{End}_R(V)$ is isomorphic to the ring of all row-finite $\nu \times \nu$ matrices over $R$. ∎
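Row-finiteness is exactly what makes the matrix product computable: each entry of a product involves only finitely many non-zero terms. The sketch below models row-finite matrices over the integers with rows and columns indexed by $0, 1, 2, \ldots$ (countable rank is an assumption made for the illustration); a matrix is stored as a function returning, for each row, a dictionary of its finitely many non-zero entries. The class name and the two shift operators are illustrative.

```python
from collections import defaultdict


class RowFinite:
    """Row-finite integer matrix indexed by 0, 1, 2, ...

    row(i) returns a dict {column: entry} holding the finitely many
    non-zero entries of row i, i.e. the image of the basis vector v_i.
    """

    def __init__(self, row):
        self.row = row

    def __mul__(self, other):
        # Mappings act on the right: v_i(ab) = (v_i a)b, so row i of the
        # product is a finite combination of rows of the second factor.
        def product_row(i):
            out = defaultdict(int)
            for k, a_ik in self.row(i).items():
                for j, b_kj in other.row(k).items():
                    out[j] += a_ik * b_kj
            return {j: v for j, v in out.items() if v != 0}
        return RowFinite(product_row)


# up: v_i -> v_{i+1};  down: v_i -> v_{i-1} for i > 0 and v_0 -> 0.
up = RowFinite(lambda i: {i + 1: 1})
down = RowFinite(lambda i: {i - 1: 1} if i > 0 else {})

if __name__ == "__main__":
    print((up * down).row(0), (up * down).row(5))   # {0: 1} {5: 1}
    print((down * up).row(0), (down * up).row(5))   # {} {5: 1}
```

Here `up * down` is the identity while `down * up` annihilates $v_0$, so the ring contains an element with a one-sided inverse only; this cannot happen for finite square matrices over a field.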
This agrees with the usual definition in the finite-dimensional case, and we have the
following rules:
8.1 The density theorem revisited
Lemma 8.1.2. Let $K$ be a skew field and $V$ a left $K$-space. If $a, b \in \operatorname{End}_K(V)$ and
$\rho(a) \ge \rho(b)$, then there exist $p, q \in \operatorname{End}_K(V)$ such that
$$b = paq, \tag{8.1.2}$$
and $\rho(p) = \rho(q) = \rho(b)$.
Proof. Choose complements $N_a$, $N_b$ of $\ker a$, $\ker b$ in $V$ respectively, so that
$V = \ker a \oplus N_a = \ker b \oplus N_b$. Then $N_a \cong \operatorname{im} a$, $N_b \cong \operatorname{im} b$, so by hypothesis,
$[N_a : K] \ge [N_b : K]$; hence there exists $p \in \operatorname{End}_K(V)$ mapping $\ker b$ to 0 and
embedding $N_b$ in $N_a$; clearly $\rho(p) = \rho(b)$. If $\{u_\lambda\}$ is a basis of $N_b$, then the $u_\lambda p$ are
linearly independent in $N_a$ and the $u_\lambda pa$ are linearly independent, because the
restriction $a|N_a$ is injective. Likewise the $u_\lambda b$ are linearly independent in $Vb$. Now
choose a complement $L$ in $V$ for the subspace spanned by the $u_\lambda pa$, and define $q$
as the endomorphism mapping $L$ to 0 and $u_\lambda pa$ to $u_\lambda b$. Then $\rho(q) = \rho(b)$ and
(8.1.2) holds, for both sides map $u_\lambda$ to $u_\lambda b$ and $\ker b$ to 0. ∎
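The construction in this proof can be followed in a small finite-dimensional instance. The matrices below are an illustrative choice, not taken from the text: $V = \mathbf{Q}^3$ with basis $v_1, v_2, v_3$, mappings act on row vectors from the right, $a$ has rank 2 and $b$ has rank 1. Following the proof, $N_b$ is spanned by $v_1$, $p$ maps $v_1$ to $v_1$ and kills $\ker b$, and $q$ maps $v_1 pa = v_1 + v_2$ to $v_1 b = v_3$ and kills the complement spanned by $v_2, v_3$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(m):
    """Rank over the rationals, by Gaussian elimination."""
    rows = [[Fraction(v) for v in r] for r in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


# Row i of each matrix is the image of the basis vector v_{i+1}.
A = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]   # rank 2
B = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]   # rank 1: v1 -> v3
P = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # v1 -> v1, kills ker b
Q = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]   # v1 + v2 -> v3, kills v2 and v3

if __name__ == "__main__":
    print(mat_mul(mat_mul(P, A), Q) == B)       # True
    print(rank(A), rank(B), rank(P), rank(Q))   # 2 1 1 1
```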
Theorem 8.1.3. Let $K$ be a skew field and $V$ a $K$-space of infinite dimension $\nu$. For any
infinite cardinal $\mu$, denote by $E_\mu$ the set of all endomorphisms of $V$ of rank $< \mu$. Then
the $E_\mu$ (for $\mu \le \nu$) are distinct, each $E_\mu$ is an ideal in $E = \operatorname{End}_K(V)$, and these are the
only ideals apart from 0 and $E$.
Proof. Let $a, b \in E_\mu$, where $\mu$ is an infinite cardinal. Then
rank $< \mu$, hence $\mathfrak{a} \subseteq E_\mu$. Conversely, if $b \in E_\mu$, then $\rho(b) < \mu$, hence $\rho(b) \le \rho(a)$ for
some $a \in \mathfrak{a}$. By Lemma 8.1.2, $b = paq$ for some $p, q \in E$, so $b \in \mathfrak{a}$, and this shows
that $\mathfrak{a} = E_\mu$. Thus every non-zero ideal in $E$ is of the form $E_\mu$ for some infinite
cardinal $\mu$, and clearly if $\mu > \nu$, then $E_\mu = E$. ∎
   This result shows in particular that (for infinite [V : K]) EndK(V) is never simple.
However, it has another property; as we shall see in Section 8.2, EndK(V) is primi-
tive, and there is a close relation between primitive rings and such endomorphism
rings, which is described in Theorem 8.2.3.
   We now come to the density theorem in its general form; our presentation follows
Nicolas Bourbaki. We begin by describing the bicentral action on a module.
   Let $R$ be any ring and $M$ a right $R$-module; the action of $R$ on $M$ may be described
by saying that we have a homomorphism of $R$ into $\operatorname{End}(M)$, each $a \in R$ corresponding
to an endomorphism $a'$ of $M$, qua additive group. The centralizer $S$ of this set
$R' = \{a' \mid a \in R\}$ in $\operatorname{End}(M)$ is the ring of all $R$-endomorphisms, $S = \operatorname{End}_R(M)$,
and we shall regard $M$ as left $S$-module. Since $(\alpha x)a = \alpha(xa)$ for all $x \in M$, $a \in R$,
$\alpha \in S$, by definition of $S$, we see that $M$ becomes an $(S, R)$-bimodule in this way.
Let $T$ be the centralizer of $S$; this is again a subring of $\operatorname{End}(M)$, called the bicentralizer
of $R$. Clearly for any $a \in R$, $a'$ centralizes $S$ and so lies in $T$, i.e. $R \subseteq T$. If equality
holds, we say that $R$ acts bicentrally on $M$, or also that $M_R$ is bicentral.
   As an example consider the ring $R$ itself. As is well known and easily proved (see
BA, Theorem 5.1.3), the centralizer of ${}_RR$ is the set of all right multiplications; by
symmetry the centralizer of $R_R$ is the set of all left multiplications, hence $R_R$ as
well as ${}_RR$ is bicentral. For this property the presence of a unit element is of course
material.
   The definition of 'bicentral' may be restated as follows: Given a right $R$-module $M$,
$R$ acts bicentrally if for each $\theta$ in the bicentralizer $T$ there exists $a \in R$ such that
This reduces to the definition in Section 5.1 when the centralizer of $R$ is a field $k$.
  It is clear that bicentral $\Rightarrow$ dense $\Rightarrow$ full. The next result gives a useful condition
under which dense $\Leftrightarrow$ bicentral.
an $S$-submodule of $M$ and it contains the generating set $u_1, \ldots, u_n$ of $M$, hence
$N = M$, i.e. $x\theta = xa$ for all $x \in M$, so $R$ acts bicentrally. The converse is clear. ∎
Proposition 8.1.5. For any $R$-module $M$ and any $n \ge 1$, $M$ and ${}^nM$ have isomorphic
bicentralizers. ∎
  To say that $R$ acts densely on $M$ means that for all $n$ and all $x \in {}^nM$, $\theta \in T$, there
exists $a \in R$ such that $x\theta = xa$. This just states that $R$ acts fully on ${}^nM$, for all $n$; thus
we have proved
Theorem 8.1.6 (Density theorem). Let $M$ be any right $R$-module. Then $R$ acts densely
on $M$ if and only if $R$ acts fully on ${}^nM$, for all $n \ge 1$. ∎
  We give several applications, which show the power of this result. In the first place,
we clearly have
Corollary 8.1.7. If $R$ acts densely on a module $M$, then it acts densely on ${}^nM$, for any
$n \ge 1$. ∎
Theorem 8.1.8. Let $M$ be a semisimple right $R$-module. Then $R$ acts densely on $M$, and
$M$ is also semisimple over the centralizer of $M$. Moreover, if $R$ is right Artinian, then $M$
is finitely generated over the centralizer of $R$ and $R$ acts bicentrally on $M$.
Proof. Denote by $S$ the centralizer and by $T$ the bicentralizer of $R$ on $M$. We first
show that every $R$-submodule of $M$ is a $T$-submodule. Let $M_1$ be an $R$-submodule
of $M$; since $M$ is semisimple, we have $M = M_1 \oplus M_1'$ for a submodule $M_1'$. Let $\theta_1$
be the projection on $M_1$; then $\theta_1 \in S$, hence for any $t \in T$ and $x \in M_1$,
$xt = (\theta_1 x)t = \theta_1(xt)$, so $xt \in M_1$ as claimed.
   Next we show that $R$ acts fully on $M$, i.e. for each $t \in T$ and $x \in M$ there exists
$a \in R$ such that $xt = xa$. This states that for each $x \in M$, $xT \subseteq xR$. But $xR$ is an
(8.1.5)
   For a simple module we can say rather more. By combining Corollary 8.1.7 with
Schur's lemma (Lemma 6.3.1) we obtain a generalization of Wedderburn's first
structure theorem (see BA, Theorem 5.2.2):
Exercises
1. Let $E = \operatorname{End}(V)$ and $E_\nu$ be as in the text, where $[V : K] = \nu$ is infinite. Verify
   that $E/E_\nu$ is simple but not semisimple. Show that for $a, b \in E$ there exists
   $p$ such that $b = pa$ iff $\operatorname{im} b \subseteq \operatorname{im} a$ and there exists $q$ such that $b = aq$ iff
   $\ker b \supseteq \ker a$. Deduce that $E$ is neither Artinian nor Noetherian; prove the
   same for $E/E_\nu$. (Hint. Consider the set of endomorphisms with image in a given
   finite-dimensional subspace, or with kernel containing a given finite-dimensional
   subspace.)
2. In $E = \operatorname{End}_K(V)$ show that the endomorphisms with image in a given one-dimensional
   subspace form a minimal left ideal and that all such left ideals are
   isomorphic. Show further that the sum of all these left ideals is the unique minimal
   ideal in $E$ (it is the socle of $E$).
3. In the notation of Theorem 8.1.3 show that $E_\mu$, as algebra without 1, has as ideals
   precisely the $E_\lambda$ ($\lambda \le \mu$) and 0.
4. Show that for any vector space $V$ over a field $k$, $\operatorname{End}_k(V)$ is a regular ring.
5. Show that in any monoid the centralizer of any subset is its own bicentralizer.
   Deduce that for any $R$-module $M$, $\operatorname{End}_R(M)$ acts bicentrally on $M$.
6. Let $A$ be a finitely generated abelian group, as $Z$-module. Show that $Z$ acts
   bicentrally on $A$.
7. Show that a simple ring acts bicentrally on any right ideal.
8.2 Primitive rings
Proof. In this case the core of $\mathfrak{a}$ is $\mathfrak{a}$ itself, so 0 must be the maximal ideal of $R$ and
this means that $R$ is a field. ∎
   To give an example, any simple ring (Artinian or not) is primitive, for R has a
maximal right ideal by Krull's theorem (see BA, Theorem 4.2.6) and its core is a
proper ideal, which must be O. The converse does not hold: if V is an infinite-
dimensional vector space over a field and E is its endomorphism ring, then E acts
faithfully on V and V is clearly simple as E-module, so E is primitive, but as we
saw in Section 8.1, E is not simple. The next result describes primitive rings more
precisely:
Theorem 8.2.3. Any primitive ring is isomorphic to a dense ring of linear transformations
in a vector space over a skew field $K$. Conversely, any dense subring of
$\operatorname{End}_K(V)$ is primitive.
Proof. Given a primitive ring $R$, let $V$ be a simple $R$-module on which $R$ acts faithfully.
Its centralizer $K$ is a skew field, by Schur's lemma, and $R$ is naturally embedded
as a dense subring in $\operatorname{End}(V)$, by Corollary 8.1.9. For the converse we need only
observe that any dense subring $R$ of $\operatorname{End}(V)$ acts simply: given $u, v \in V$, $u \ne 0$,
there exists $a \in R$ such that $ua = v$. Hence the $R$-submodule generated by any
$u \ne 0$ is $V$, i.e. $V$ is simple. ∎
   As we saw, a primitive ring need not be simple, but if R is right Artinian as well as
primitive, say it is a dense subring of End_K(V), then V is finitely generated over K,
by Theorem 8.1.8, hence V ≅ K^n and R acts bicentrally: R ≅ End_K(K^n) ≅ K_n. This
expression is unique up to isomorphism of K (BA, Theorem 5.2.2). In the general
case there is no such uniqueness, but we have the following consequence which is
sometimes useful:
Proposition 8.2.4 (O. Litoff). Let R be a primitive ring which is not right Artinian.
Then for every n ≥ 1, R has a subring with a homomorphism onto a full n × n
matrix ring over a skew field.
Proof. We may take R to be a dense subring of End_K(V), where V is an
infinite-dimensional vector space over K. Given n ≥ 1, take an n-dimensional subspace U
of V, with a basis u_1, ..., u_n. Let R_1 be the subring of R mapping U into itself:
every element of R_1 defines by restriction an endomorphism of U, thus we have a
homomorphism R_1 → End_K(U) ≅ K_n, and this is surjective, by density, so it is the
required homomorphism.                                                         •
   Let R be a ring with a minimal right ideal and define the socle s of R as the sum of
all minimal right ideals. This socle is an ideal, for, given any minimal right ideal a of
R and any x ∈ R, then xa is a minimal right ideal or 0 and so is contained in s.
Primitive rings with non-zero socle have a more precise description:
Theorem 8.2.5. A primitive ring has a non-zero socle if and only if in its representation
as a dense ring of linear transformations of a K-space, R contains transformations of
finite (non-zero) rank. When this is so, all faithful simple right ideals of R are
isomorphic and the skew field K is determined up to isomorphism as the centralizer
of a faithful simple right R-module.
Proof. If there is an element of R defining a linear transformation of finite rank,
take c ∈ R such that the rank ρ(c) is the least positive number possible. Then
ker cx ⊇ ker c for any x ∈ R, and if cx ≠ 0, then ρ(cx) = ρ(c), hence ker cx =
ker c, and the complement of ker c is finite-dimensional. By density we can find
y ∈ R such that cxy = c; this shows cR to be minimal, hence R has a non-zero socle.
   Now assume that R has a non-zero socle, and hence a minimal right ideal a. If M
is any faithful simple right R-module, then Ma ≠ 0, so ua ≠ 0 for some u ∈ M. But
ua is a submodule of M, hence ua = M and so the mapping x ↦ ux (x ∈ a) is a
surjective homomorphism a → M. By the minimality of a, M ≅ a as right
R-module, so a is also faithful; this shows that every simple faithful right R-module
is isomorphic to M. It follows that the isomorphism type of K is uniquely determined
as the endomorphism ring of a. Finally, since a is faithful, a² ≠ 0, hence a² = a, and
so ca = a for some c ∈ a. We claim that ρ(c) = 1; for if not, then there exist
x, y ∈ V, where V is the K-space on which R acts, such that xc, yc are linearly
independent over K, and by density there exists b ∈ R such that xcb ≠ 0, ycb = 0. Then
a = cR meets the annihilator n of y in R, and n is a right ideal, so a ⊆ n, by the
minimality of a. But this means that yc = 0, which is a contradiction, and it
shows that R contains elements of rank 1.                                              •
By contrast, simple rings with minimal right ideals are much more special:
Proposition 8.2.6. Any simple ring with minimal right ideals is Artinian.
Proof. Let R be a simple ring with minimal right ideals. The sum of all minimal right
ideals is the socle, a two-sided ideal, which coincides with R, by simplicity. Thus R is a
sum of simple right R-modules, hence a direct such sum, and since R is finitely
generated (by 1), this direct sum is finite. Thus R is right Artinian. Now R is also
semisimple as left R-module, hence it is also left Artinian, and so it is an Artinian
ring.                                                                                  •
Exercises
1. For any ring R and any n ≥ 1, show that R_n is primitive iff R is. More generally,
   show that being primitive is a Morita invariant.
2. Show that every minimal right ideal in a primitive ring has an idempotent
   generator. (Hint. Use the primitivity to show that the right ideal is not nilpotent.)
3. Show that if e is a non-zero idempotent in a primitive ring R, then eRe is
   primitive.
4. Show that for any idempotent e in a ring R, eR is a minimal right ideal iff Re is a
   minimal left ideal. Deduce that in a primitive ring with non-zero socle, the socle
   coincides with the left socle (defined correspondingly).
5. Show that the socle of a primitive ring is a minimal two-sided ideal. Deduce that a
   simple ring with non-zero socle is Artinian (Proposition 8.2.6).
6. Show that a left Artinian primitive ring is simple. What can be said about a (right)
   primitive ring with minimal left ideals?
7. Show that the centre of a primitive ring is an integral domain. If A is any
   commutative integral domain with field of fractions K, show that the set of
   infinite matrices over K which are equal to a scalar in A outside a finite square
   is a primitive ring with centre A.
 Proof. Let {p_λ} be the family of all maximal right ideals of the semiprimitive ring R,
 and denote the core of p_λ by c_λ. Since R is semiprimitive, we have ∩ p_λ = 0, and since
 c_λ ⊆ p_λ, it follows that ∩ c_λ = 0. If we put R_λ = R/c_λ, then R_λ is primitive, for it
 is represented faithfully on the simple module R/p_λ. Now the natural maps
f_λ : R → R_λ can be combined to a homomorphism into the direct product
                                   f : R → ∏ R_λ.                                (8.3.2)
Proposition 8.3.3. Let R be a ring and a an ideal such that a ⊆ J(R). Then
radical is (x). However, in the absence of nil ideals we obtain a semiprimitive ring by
adjoining an indeterminate.
Proposition 8.3.5 (Amitsur). If R is a ring with no non-zero nil ideals, then R[t] is
semiprimitive, where t is an indeterminate.
Proof. We have to show that the radical J of R[t] is 0. If J ≠ 0, let a be the set
consisting of 0 and all leading coefficients of elements in J. It is clear that a is
an ideal in R and the conclusion will follow if we prove that a = 0. Let a_1 ∈ a,
say f = a_1 t^n + ... ∈ J. Then ft ∈ J and so there exists g ∈ R[t] such that
(1 + g)(1 - ft) = 1, i.e.
Let us take r > deg g and equate the coefficients of terms of degree r(n + 1). On the
right there is no contribution, while on the left we have (1 + g)a_1^r, therefore a_1^r = 0.
Thus a is a nil ideal, hence a = 0 and so J = 0, as we had to show.                 •
Corollary 8.3.7. A skew field which satisfies a polynomial identity of degree d is of finite
dimension ≤ [d/2]² over its centre.                                                      •
   The result can be extended to semiprimitive rings as follows. We remark that any
ring R can be embedded in the full matrix ring R_d (e.g. as scalar matrices), hence R_m
can be embedded in R_n whenever m | n.
Proof. By Theorem 8.3.1, R is a subdirect product of primitive rings R_λ, where R_λ is
a homomorphic image of R and hence again satisfies an identity of degree d. By
Theorem 8.3.6, R_λ is a simple algebra of degree n_λ over its centre, where n_λ ≤ d/2.
By taking a splitting field E_λ we can thus embed R_λ in a matrix algebra 𝔐_{n_λ}(E_λ).
The least common multiple of the degrees n_λ which occur is r ≤ [d/2]! and R_λ
can also be embedded in 𝔐_r(E_λ). Now ∏ 𝔐_r(E_λ) ≅ 𝔐_r(A), where A = ∏ E_λ and
so we have an embedding of R in 𝔐_r(A).                                            •
Theorem 8.3.9. Let R be a PI-algebra without non-zero nil ideals. Then R can be
embedded in 9J1 n (A), where A is a commutative ring, and if R satisfies an identity of
degree d, then n ≤ [d/2]!.                                                         •
   For Artinian rings 'semiprimitive' reduces to 'semisimple'; this follows from the
fact that for an Artinian ring R, R/J(R) is semisimple (see BA, Theorem 5.3.5),
and it will also be derived in a more general context below in Section 8.4. In the
Artinian case the radical may be described as the intersection of all maximal
two-sided ideals. This does not hold generally: we clearly have, for any ring R,
where the first equality holds by the characterization of J(R). For an example where
the inclusion is strict, take R = End_K(V), where V is an infinite-dimensional vector
space over a field K. We have seen in Section 8.2 that R is primitive, hence J(R) = 0,
but the ideals in R form a chain, so the intersection on the right is the unique
maximal ideal of R. Let us denote by s_0 the socle of R; in the representation on V this
corresponds to the set of elements of finite rank. Assuming [V : K] to be countable,
with basis {e_i}, let us write a_i for the set of elements of R mapping e_i to 0. Then a_i is a
maximal right ideal and s_0 ⊄ a_i, ∩ a_i = 0. Similarly, if b_i is the set of elements of R
mapping V into Σ_{j≠i} Ke_j, then b_i is a maximal left ideal such that s_0 ⊄ b_i, ∩ b_i = 0.
By Krull's theorem R also has maximal right ideals (and maximal left ideals) con-
taining 50, but the proof is non-constructive, and there is no obvious procedure
for finding them.
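The inclusion discussed in this paragraph can be spelled out as follows. This is a sketch in standard notation; the letters \mathfrak{r} and \mathfrak{m} for maximal right ideals and maximal two-sided ideals, and the name \mathfrak{m}_{\max}, are introduced here for illustration only.

```latex
% For any ring R (with 1):
\[
  J(R) \;=\; \bigcap_{\mathfrak{r}\ \text{maximal right ideal}} \mathfrak{r}
  \;\subseteq\; \bigcap_{\mathfrak{m}\ \text{maximal ideal}} \mathfrak{m}.
\]
% Why the inclusion holds: if \mathfrak{m} is a maximal two-sided ideal, then
% R/\mathfrak{m} is simple, hence primitive, so J(R/\mathfrak{m}) = 0 and
% therefore J(R) \subseteq \mathfrak{m}.
%
% For R = \operatorname{End}_K(V) with V infinite-dimensional the inclusion is strict:
\[
  J(R) \;=\; 0 \;\subsetneq\; \mathfrak{m}_{\max},
\]
% where \mathfrak{m}_{\max} denotes the unique maximal ideal of R.
```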
Exercises
1. Show that a subdirect product of semiprimitive rings is semiprimitive.
2. Show that the Jacobson radical of a ring contains no non-zero idempotent.
   Deduce that every regular ring is semiprimitive.
3. Verify that in an Artinian ring the intersection of all maximal two-sided ideals is
   just the radical.
4. Let R be a ring and a an ideal in R. Show that if J(R/a) = 0, then J(R) ⊆ a.
5. Show that a subdirect product of a finite number of simple rings is a direct
   product of simple rings.
xe = x for all x E M,
ε : A → K.
This means that ε is a ring homomorphism such that (αa)ε = α(aε) for all α ∈ K,
a ∈ A. In particular, (αe)ε = α, hence αe ↦ α is a bijection between K.e (where e
is the one of A) and K, and we may embed K in A by identifying α with αe. The
kernel of ε is the augmentation ideal of A and we have the direct sum decomposition
                              A = ker ε ⊕ K,                                  (8.4.1)
                                 x = (x - xε) + xε.
For any K-algebra A we can form the augmented algebra A^1 = A ⊕ K by defining the
multiplication
                              (a, α)(b, β) = (ab + aβ + αb, αβ),
and this algebra has the unit-element (0, 1) and augmentation ideal A. Conversely, if
C is an augmented K-algebra with augmentation ideal A, then A^1 ≅ C, as augmented
K-algebras. Moreover, the category of A-modules is equivalent to the category of
unital A^1-modules.
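The claims about the unit element and the augmentation ideal can be checked directly from the multiplication rule. The following is a short verification sketch; the symbol \varepsilon for the augmentation map (a, \alpha) \mapsto \alpha is a notational assumption made here.

```latex
% Unit element: (0,1) is a two-sided identity of A^1 = A \oplus K.
\begin{align*}
  (a,\alpha)(0,1) &= (a\cdot 0 + a\cdot 1 + \alpha\cdot 0,\ \alpha\cdot 1) = (a,\alpha),\\
  (0,1)(b,\beta)  &= (0\cdot b + 0\cdot\beta + 1\cdot b,\ 1\cdot\beta) = (b,\beta).
\end{align*}
% A sits inside A^1 as the set of pairs (a,0), and it is a two-sided ideal:
\begin{align*}
  (a,0)(b,\beta)  &= (ab + a\beta,\ 0), &
  (a,\alpha)(b,0) &= (ab + \alpha b,\ 0).
\end{align*}
% The augmentation \varepsilon\colon (a,\alpha) \mapsto \alpha is multiplicative,
% because the second component of the product is \alpha\beta; its kernel is A.
```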
   For K-algebras the notion of simple module has to be modified. A right module M
over a K-algebra A is called simple if MA ≠ 0 and M has no submodules other than
0 and M. When A has a one and acts unitally, this reduces to the previous definition.
Our object will be to study simple A-modules in terms of A^1 (where the results of
Section 8.3 can be used).
   Any simple A-module M can again be represented as A/I for a maximal right ideal
I, but we must also have A² ⊄ I, to ensure that MA ≠ 0. Further, we can no longer
use Krull's theorem to find maximal right ideals because A may not be finitely
generated as right A-module. To overcome these difficulties, let us look more closely
at the correspondence between right ideals in A and in A^1.
   Let I be a right ideal of A and I' a right ideal of A^1 such that I' ⊇ I. Then
I ⊆ I' ∩ A, so there is a natural homomorphism of A-modules
Lemma 8.4.1. Let A be a K-algebra with a right ideal I. Then there is a right ideal I' of
A^1 such that
                                  I' + A = A^1,     I' ∩ A = I,                   (8.4.2)
   In (8.4.3) 1 is the one of A^1, but we can express (8.4.3) entirely within A by writing
a - ea ∈ I for all a ∈ A. A right ideal I satisfying (8.4.3) for some e ∈ A is called a
modular right ideal.
   This lemma shows in particular that for any maximal right ideal I' of A^1 which
does not contain A, I' ∩ A is a modular right ideal of A.
   By a maximal modular right ideal of A we shall understand a maximal member of
the set of all proper modular right ideals. Any right ideal containing a modular right
ideal is again modular (by the definition), hence a maximal modular right ideal is
also maximal in the set of all proper right ideals. Moreover, any proper modular
right ideal is contained in a maximal modular right ideal for, given I ⊇ (1 - e)A,
we can by Krull's theorem find a right ideal containing I but not e, and maximal
with these properties, and this is easily seen to be modular. In fact the notion of a
modular right ideal may be regarded as a device for producing maximal right
ideals in non-unital algebras; the corresponding quotients are simple modules, as
we shall see below.
   In any K-algebra A let us define the Jacobson radical J(A) as the set of all elements
of A represented by 0 in any simple A-module. If J(A) = 0, A is said to be semi-
primitive. For unital algebras these definitions reduce to the earlier ones, by the
characterization quoted in Section 8.3. Now J(A) can be described as follows in
terms of the maximal modular right ideals of A:
It follows that J(A) = J(A^1) ∩ A. Now assume that x ∈ J(A^1) \ A; then
From the definition of the Jacobson radical quoted in Section 8.3 we now obtain
   We note that the intersection of all the maximal right ideals of A is in general
different from J(A). For example, let k be a field and A the algebra of formal
power series in x with zero constant term. Then J(A) = A, but the intersection of
all maximal ideals is xA. More generally, simple algebras have been constructed
which coincide with their Jacobson radical, by Edward Sąsiada in 1961 (see Sąsiada
and Cohn [1967]).
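The power series example can be made explicit. The following sketch assumes the identification of A^1 with k[[x]], which follows from A = x k[[x]], and uses the relation J(A) = J(A^1) ∩ A; the layout of the argument is an illustration and is not taken from the source.

```latex
% A = x\,k[[x]], so A^1 = k \oplus A \cong k[[x]].
% Every a \in A is quasi-regular, since 1 - a is a unit of k[[x]]:
\[
  (1 - a)^{-1} \;=\; 1 + a + a^2 + a^3 + \cdots \;\in\; k[[x]]
  \qquad (a \in x\,k[[x]]).
\]
% Hence J(k[[x]]) = x\,k[[x]] = A, and therefore
\[
  J(A) \;=\; J(A^1) \cap A \;=\; A .
\]
% On the other hand A^2 = x^2 k[[x]] = xA has codimension 1 in A, so it is a
% maximal ideal. If I is any maximal ideal, then A/I has no submodules other
% than 0 and A/I; since J(A) = A there are no simple A-modules, so (A/I)A = 0,
% i.e. A^2 \subseteq I, and hence I = A^2. Thus
\[
  \bigcap \{\, I : I \text{ a maximal ideal of } A \,\} \;=\; A^2 \;=\; xA \;\neq\; A \;=\; J(A).
\]
```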
   We can now prove the Wedderburn structure theorem for semisimple rings under
weaker hypotheses. First an auxiliary result on modular right ideals; we recall that
two right ideals a, b of A are called comaximal if a + b = A.
Lemma 8.4.4. Let A be any K-algebra and a, b modular right ideals which are
comaximal. Then a ∩ b is again modular.
Proof. By hypothesis there exist e, f ∈ A such that (1 - e)A ⊆ a, (1 - f)A ⊆ b. Since
a + b = A, by comaximality, there exist a_i ∈ a, b_i ∈ b (i = 1, 2) such that
                                     e = a_1 + b_1,   f = a_2 + b_2.
Hence for any x ∈ A,
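One way to complete the computation, in the notation of the proof, is sketched below. The element g = b_1 + a_2 is a choice made here for illustration and is not necessarily the one used by the author; the right ideals a, b are written \mathfrak{a}, \mathfrak{b} to keep them apart from their elements.

```latex
% Since a_1 x \in \mathfrak{a} and b_2 x \in \mathfrak{b} (right ideals), and
% x - ex \in \mathfrak{a}, x - fx \in \mathfrak{b} by modularity, we get
\begin{align*}
  x - b_1 x &= (x - ex) + a_1 x \in \mathfrak{a}, \\
  x - a_2 x &= (x - fx) + b_2 x \in \mathfrak{b}.
\end{align*}
% Put g = b_1 + a_2. Then, using a_2 x \in \mathfrak{a} and b_1 x \in \mathfrak{b},
\begin{align*}
  x - gx &= (x - b_1 x) - a_2 x \in \mathfrak{a}, \\
  x - gx &= (x - a_2 x) - b_1 x \in \mathfrak{b},
\end{align*}
% so (1 - g)A \subseteq \mathfrak{a} \cap \mathfrak{b}, i.e. the intersection is modular.
```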
   In Section 8.2 we saw that any simple ring with minimal right ideals is Artinian;
for non-unital algebras this is no longer always so, and it is of some interest to
examine the form such algebras take. To begin with we need to clarify what is to
be understood by a simple algebra; for example, a 1-dimensional vector space A
over a field with multiplication xy = 0 for all x, y ∈ A has no ideals apart from 0
and A, but such trivial cases should clearly be excluded. We therefore define an
algebra A to be simple if A² ≠ 0 and A has no ideals other than 0, A. Now a
simple algebra with minimal right ideals is again equal to its socle, but the latter
need not be finitely generated. To describe such algebras more precisely we need
the notion of a Rees matrix algebra.
   In the rest of this section we shall take all our algebras to be bimodules over a skew
field K such that x(yz) = (xy)z for any x, y, z in K or in the algebra. Thus they could
be described as K-rings, were it not for the fact that they will in general lack 1. We
shall regard them as algebras over the centre of K; an alternative would be to suspend
the convention that rings have a 1, but we shall not take that course to avoid con-
fusion. In practical terms this makes no difference, since we shall not have occasion
to consider the augmented algebra.
   Let C be any algebra and I, Λ any sets; we shall write ^Λ C^I for the set of all matrices
over C with rows indexed by Λ and columns indexed by I, briefly, Λ × I matrices.
Fix a matrix P in ^Λ C^I which is regular, i.e. whose rows are left linearly independent
and whose columns are right linearly independent over C. By the Rees matrix algebra
over C with sandwich matrix P one understands the set M of all I × Λ matrices
A = (a_{iλ}) over C, with almost all entries zero, with componentwise addition:
Theorem 8.4.6. For any algebra A the following conditions are equivalent:
(a)  A is a simple algebra with a minimal right ideal,
(b)  A is a prime algebra which coincides with its socle,
(c)  A is isomorphic to a Rees matrix algebra over a skew field,
(d)  A is isomorphic to a dense algebra of linear transformations of finite rank on a
      vector space over a skew field,
(a°)-(d°) the left-right analogues of (a)-(d).
Proof. The equivalence (a) ⇔ (d) follows as in Section 8.2, so it only remains
to prove (a), (b), (c) equivalent; the equivalence to (a°)-(d°) then follows by the
symmetry of (c).
   (a) ⇒ (b). By hypothesis the socle of A is not zero, so it equals A, and being
simple, A is prime.
    (b) ⇒ (c). Let t be a minimal right ideal. Since A is prime, t² ≠ 0, so t² = t and
hence t = at for some a ∈ t. Choose e ∈ t such that ae = a; if e² ≠ e, then
(e² - e)t = t and we have t = at = a(e² - e)t = 0, a contradiction. Hence e² = e
and of course e ≠ 0, so t = et and e is an idempotent generator. By Lemma 4.3.8
and Schur's lemma, End_A(eA) = eAe is a skew field, K say. Now A, being equal to
its socle, is semisimple as right A-module. Let s be the sum of all right ideals
isomorphic to t. This is a two-sided ideal; if s ≠ A, then we have A = s ⊕ s' for
some non-zero right ideal s', hence s's = 0, and so s' = 0, because A is prime.
Thus A = s is a direct sum of right ideals isomorphic to t. Now t = eA is a left
K-space and we can take a basis u_λ (λ ∈ Λ); likewise Ae is a right K-space with
basis v_i (i ∈ I), say; further we note that u_λ v_i ∈ eAe = K. We define a Λ × I
matrix P = (p_{λi}) over K by
                                      p_{λi} = u_λ v_i,
and claim that P is regular. For if (c_λ) is a family in K, almost all zero, such that
Σ c_λ p_{λi} = 0 for all i, then Σ c_λ u_λ v_i = 0, hence Σ c_λ u_λ annihilates Ae and so must
be 0, and now we have c_λ = 0 by the linear independence of the u_λ. Similarly
Σ p_{λi} d_i = 0 for all λ implies d_i = 0.
  Let M be the Rees matrix algebra over K with sandwich matrix P and define a
mapping f : M → A by
(8.4.6)
Since almost all the a_{iλ} vanish, this is well-defined. We shall establish (c) by proving
that f is an isomorphism. It is clearly additive, and we have
b_{μj} = Σ p_{μi} c_{iλ} p_{λj}.
If b_{μj} = 0 for all μ, j, then by the regularity of P, c_{iλ} = 0 for all i, λ, a contradiction.
Hence b_{μj} ≠ 0 for some (μ, j) ∈ Λ × I. Let us write E_{iλ} for the matrix unit with
(i, λ)-entry 1 and the rest 0. For any d ∈ K we have
where
Since d was arbitrary in K, Ap ranges over K, and every matrix of M is a finite sum of
terms dE_{iλ}, it follows that every matrix of M lies in the ideal generated by C, hence M
is simple.
    It remains to find a minimal right ideal. By the definition of P, p_{λi} ≠ 0 for some
pair (λ, i) ∈ Λ × I. Writing p_{λi} = p, we have p^{-1}E_{iλ} · p^{-1}E_{iλ} = p^{-1}E_{iλ}PE_{iλ}p^{-1} =
p^{-1}E_{iλ} ≠ 0, hence e = p^{-1}E_{iλ} is a non-zero idempotent in M. Now the mapping
φ : K → M defined by c ↦ p^{-1}E_{iλ}c is an injective homomorphism with image
eMe, as is easily verified. Hence eMe is a skew field and so eM is a minimal right
ideal of M.                                                                             •
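The idempotent check in the last step can be made explicit. The sketch below assumes the usual product of a Rees matrix algebra, A ∘ B = APB (ordinary matrix products with the sandwich matrix P in the middle); the symbol ∘ is introduced here for illustration.

```latex
% E_{i\lambda} is the I \times \Lambda matrix unit, P = (p_{\lambda i}) is
% \Lambda \times I, and p = p_{\lambda i} \neq 0. Only the (\lambda, i)-entry of P
% survives between two copies of E_{i\lambda}:
\[
  E_{i\lambda}\, P\, E_{i\lambda} \;=\; p_{\lambda i}\, E_{i\lambda} \;=\; p\, E_{i\lambda}.
\]
% Hence for e = p^{-1} E_{i\lambda}:
\[
  e \circ e \;=\; (p^{-1}E_{i\lambda})\, P\, (p^{-1}E_{i\lambda})
           \;=\; p^{-1}\, p\, p^{-1}\, E_{i\lambda}
           \;=\; p^{-1}E_{i\lambda} \;=\; e .
\]
% The map \varphi(c) = p^{-1}E_{i\lambda}c is multiplicative for the same reason:
\[
  \varphi(c) \circ \varphi(d)
    \;=\; (p^{-1}c)\, p\, (p^{-1}d)\, E_{i\lambda}
    \;=\; p^{-1}cd\, E_{i\lambda}
    \;=\; \varphi(cd).
\]
```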
Exercises
1. Show that a non-zero ideal in a primitive ring is again primitive (as a non-unital
   algebra).
2. Let A be an algebra without nilpotent (two-sided) non-zero ideal. Show that A has
   no nilpotent non-zero left or right ideals. Deduce that for any non-zero idem-
   potent e in A, eA is a minimal right ideal iff eAe is a skew field.
3. Let A be a nil algebra (i.e. every element is nilpotent). Show that every maximal
   right ideal is two-sided and contains A².
4. Let A be an algebra over a field k; show that if A has no zero-divisors, then A is an
   integral domain iff 0 is not modular, as right ideal. By considering End(A), show
   that A can then be embedded in an integral domain.
5. Let k be a field and A the k-subalgebra of the free algebra k⟨x, y⟩ consisting of all
   polynomials with zero constant term. Find two modular right ideals of A whose
   intersection is not modular.
6. Show that if a Λ × I sandwich matrix is row-finite, then |I| ≤ |Λ|.
7. Let A be a Rees matrix algebra over a skew field K with Λ × I sandwich matrix P.
   Show that |Λ| ≤ 2^{|I|} and give an example where equality occurs. (Hint. If V is a
   K-space of dimension |I|, then its dual V* has dimension 2^{|I|}; interpret the rows
   of P as vectors in V*.)
the relation between them. We recall that a prime ring is a ring R ≠ 0 such that for
any two ideals a, b of R,
                              ab = 0 ⇒ a = 0 or b = 0.
Theorem 8.5.2. Let R be a ring, M an m-system in R and a an ideal in R disjoint from
M. Then there exists an ideal p in R which contains a, is disjoint from M and is
maximal with respect to these properties. Any such ideal p is prime.
Proof. Let 𝒜 be the set of all ideals a' of R such that a' ⊇ a, a' ∩ M = ∅. Then
a ∈ 𝒜, so 𝒜 is not empty. It is easily seen to be inductive and so by Zorn's
lemma, it contains a maximal member p, which is an ideal with the required
properties.
  Now let p be an ideal with the properties stated and assume that a, b ⊄ p, ab ⊆ p.
Then a + p, b + p are strictly larger than p and so must meet M, say s = p + a,
t = q + b ∈ M, where p, q are elements of the ideal p and a, b are elements of the
ideals a, b respectively. By hypothesis there exists x ∈ R such that sxt ∈ M; thus
M contains
               (p + a)x(q + b) = px(q + b) + axq + axb ∈ p + ab = p,
a contradiction. Hence a, b ⊄ p implies ab ⊄ p and clearly 1 ∉ p, so p is indeed
prime.                                                                                •
   In the commutative case we found that the intersection of all prime ideals of R is
the set of all nilpotent elements of R. Theorem 8.5.2 can be used to obtain a corre-
sponding result for general rings; however, the situation is a little more complicated
here because the ideal generated by a nilpotent element need not be nilpotent. Let us
recall that an ideal a is called nilpotent if a^r = 0 for some r ≥ 1, i.e. x_1 ... x_r = 0 for
any Xi E a, and a is nil if it consists of nilpotent elements. Clearly every nilpotent
ideal is nil, but the converse does not hold generally.
  We also recall that a ring R is called semiprime if for any two-sided ideal a of R,
                                   a² = 0 ⇒ a = 0.                               (8.5.1)
In terms of elements this states that aRa = 0 implies a = 0, for all a E R. We observe
that a ring is semiprime iff it has no non-zero nilpotent ideals. For when this
holds, (8.5.1) is clearly satisfied. Conversely, assume (8.5.1) and let a be a nilpotent
ideal, say a^{r-1} ≠ 0, a^r = 0, where r ≥ 2. Then 2r - 2 ≥ r, hence 0 = a^{2r-2} =
(a^{r-1})² ≠ 0, by (8.5.1), which is a contradiction.
   In the commutative case the semiprime rings are the reduced rings, i.e. rings in
which 0 is the only nilpotent element; in the general case every reduced ring is semi-
prime, but not conversely.
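A small example may help to separate the two notions. The ring of 2 × 2 matrices over a field k, with matrix units E_{ij}, is chosen here as an illustration; it is not taken from the text.

```latex
% R = M_2(k) is simple, hence prime, hence semiprime. But R is not reduced:
\[
  E_{12}^{\,2} = 0, \qquad E_{12} \neq 0 .
\]
% This is consistent with the element-wise criterion (aRa = 0 implies a = 0),
% because E_{12} R E_{12} is not zero:
\[
  E_{12}\,E_{21}\,E_{12} \;=\; E_{11}\,E_{12} \;=\; E_{12} \;\neq\; 0 .
\]
```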
   An ideal a in a ring R is called semiprime if R/a is a semiprime ring. We note that R
itself, as an ideal of R, is semiprime but not prime.
   We first elucidate the relation between prime and semiprime ideals.
Proposition 8.5.3. Let R be a ring. Then every intersection of prime ideals is semiprime;
conversely every semiprime ideal is an intersection of prime ideals.
Proof. Let c = ∩ p_λ, where the p_λ are prime ideals, and suppose that aRa ⊆ c. Then
aRa ⊆ p_λ, hence a ∈ p_λ for all λ, and so a ∈ ∩ p_λ = c, and this shows c to be
semiprime. Conversely, let c be a semiprime ideal in R; by passing to the residue class ring
R/c, we may take c = 0. We have to show that in a semiprime ring the intersection of
all prime ideals is 0. Take any a ≠ 0 in R; then aRa ≠ 0, so there exists b_0 such that
a_1 = ab_0a ≠ 0. Generally, if we have constructed a_1, ..., a_n such that a_{i+1} ∈ a_iRa_i
and a_n ≠ 0, then there exists b_n such that a_{n+1} = a_nb_na_n ≠ 0. The set
M = {1, a_0 = a, a_1, a_2, ...} is an m-system, for given a_r, a_s, choose n > r, s; then
a_r, a_s are factors of a_n and a_nb_na_n ≠ 0, hence a_rua_s = a_{n+1} ≠ 0 for some u ∈ R. Thus M is
an m-system and a ∈ M, 0 ∉ M; hence we can find a prime ideal p disjoint from
M, by Theorem 8.5.2. It follows that a ∉ p; since a was any non-zero element of R,
we see that the intersection of all prime ideals of R is zero.                        •
Corollary 8.5.4. A ring is semiprime if and only if the intersection of all its prime ideals
is zero. Hence any semiprime ring R can be written as a subdirect product of prime rings,
which are homomorphic images of R. In particular, every semiprimitive ring is semi-
prime.
Proof. The first part follows by applying Proposition 8.5.3 to the zero ideal, and the
second part now follows as in the proof of Theorem 8.3.1.                           •
and such that R/N is semiprime. A ring may have more than one nilradical; in what
follows we shall describe the greatest and least such radical. We begin with a lemma.
Lemma 8.5.5. The sum of any family of nil ideals in a ring is a nil ideal.
Proof. Consider first two nil ideals a_1, a_2 and write a = a_1 + a_2. Then the ideal a/a_2
of the ring R/a_2 is nil, because a/a_2 ≅ a_1/(a_1 ∩ a_2) and the latter is a homomorphic
image of a_1 and hence is nil. Thus any element of a has a power in a_2 and a power of
this is 0, therefore a is nil. Now an induction shows that the sum of any finite
number of nil ideals is nil. In the general case let a = Σ a_λ, where each a_λ is a nil
ideal. Any element of a lies in the sum of a finite number of the a_λ and so is
nilpotent, therefore a is nil.                                                         •
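The two-ideal step of the proof can also be written out at the level of elements. The exponents m and n below are names introduced here for illustration, and the ideals are written \mathfrak{a}_1, \mathfrak{a}_2.

```latex
% Let x = x_1 + x_2 with x_1 \in \mathfrak{a}_1, x_2 \in \mathfrak{a}_2, and x_1^m = 0.
% In the expansion of (x_1 + x_2)^m every word other than x_1^m contains a
% factor x_2, so it lies in the ideal \mathfrak{a}_2:
\[
  x^m \;=\; (x_1 + x_2)^m \;\equiv\; x_1^m \;=\; 0 \pmod{\mathfrak{a}_2},
  \qquad\text{i.e. } x^m \in \mathfrak{a}_2 .
\]
% Since \mathfrak{a}_2 is nil, (x^m)^n = 0 for some n \ge 1, hence
\[
  x^{mn} \;=\; (x^m)^n \;=\; 0 .
\]
```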
   By this lemma the sum of all nil ideals in a ring R is a nil ideal; it is denoted by
U = U(R) and is called the (Baer) upper nilradical or also the Köthe nilradical. It is
indeed a nilradical, for R/U cannot have any non-zero nil ideals, by the maximality of
U; a fortiori U must be semiprime. Since U contains all nil ideals of R, it contains all
nilradicals.
   To obtain the least nilradical we need another definition. An element c of R is
called strongly nilpotent if any sequence c_1 = c, c_2, c_3, ... such that c_{n+1} ∈ c_nRc_n is
ultimately zero. It is clear that such an element is nilpotent and that any element
of a nilpotent (left or right) ideal is strongly nilpotent. Moreover, in a semiprime
ring R, the only strongly nilpotent element is 0. For if c ≠ 0, then cRc ≠ 0, say
c_2 = cac ≠ 0, now c_3 = c_2a'c_2 ≠ 0 for some a' ∈ R and continuing in this way we
obtain a sequence c_1 = c, c_2, c_3, ... such that c_{n+1} ∈ c_nRc_n and no c_n is zero, so c
is not strongly nilpotent. Conversely, if 0 is the only strongly nilpotent element in
R, then R is semiprime. For if not, then cRc = 0 for some c ≠ 0, so c is strongly
nilpotent. Thus we have
Proposition 8.5.6. Any ring R is semiprime if and only if the only strongly nilpotent
element is 0.                                                                               •
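Nilpotent elements need not be strongly nilpotent. As an illustration (an example chosen here, not taken from the text), take the 2 × 2 matrix ring R = M_2(k) over a field k, with matrix units E_{ij}.

```latex
% c = E_{12} is nilpotent: c^2 = 0.
% But the constant sequence c_n = E_{12} satisfies c_{n+1} \in c_n R c_n
% and never reaches zero:
\[
  c_{n+1} \;=\; c_n\,E_{21}\,c_n \;=\; E_{12}\,E_{21}\,E_{12} \;=\; E_{12} \;\neq\; 0
  \qquad (n \ge 1).
\]
% So E_{12} is not strongly nilpotent, in agreement with Proposition 8.5.6,
% since M_2(k) is simple and therefore semiprime.
```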
Theorem 8.5.7. In any ring R the set L(R) of all strongly nilpotent elements is the least
nilradical, and is equal to the intersection of all prime ideals in R:
                                L(R) = ∩ {p | p prime in R}.                            (8.5.2)
Proof. If x is strongly nilpotent in R, so is its residue class x̄ (mod p) for any prime
ideal p, so x̄ = 0, which means that x ∈ p. Thus L(R) is contained in the right-hand
side of (8.5.2). Now suppose that c ∉ L(R); then c is not strongly nilpotent, so there
exists a sequence {c_n} such that c_1 = c, 0 ≠ c_{n+1} ∈ c_nRc_n. The set S = {1, c_1, c_2, ...} is
an m-system, for given c_r, c_s, where r ≤ s say, we have c_{s+1} ∈ c_rRc_r ∩ c_sRc_s. By
Theorem 8.5.2 there is an ideal p which is maximal disjoint from S, and p is
prime, thus c ∉ p and this shows that equality holds in (8.5.2), and incidentally,
that L(R) is indeed an ideal.
   Now L(R) is clearly a nil ideal and R/L(R) is semiprime, hence L(R) is a nilradical,
and by (8.5.2) it is the least ideal with semiprime quotient, hence it is the least nil-
radical.                                                                             •
Köthe's conjecture. If a ring has a non-zero nil right ideal, then it has a non-zero nil
ideal.
   Equivalently, the prime radical contains every nil right ideal. We remark that in
Noetherian rings every nil right ideal is nilpotent (Proposition 7.4.3), so the conjec-
ture is valid in that case.
   In general rings the upper and lower radical may well be distinct; to give an
example it is enough to construct a semiprime ring with a non-zero nil ideal. Let
A be the (non-unital) k-algebra generated by x_1, x_2, ... with the following defining
relations: for any element a involving only x_1, x_2, ..., x_n we have a^n = 0. Then
every element of A is nilpotent, and in the augmented algebra R = A^1 we have the
nil ideal A. However, R is semiprime, for, given a ∈ R^×, let a be of degree m in
the x's and put N = 2m + 2. Any relation involving x_N consists of terms of degree
at least N, hence ax_Na ≠ 0, because this expression has only terms of degree at
most 2m + 1. This shows R to be semiprime.
   For another example, this time finitely generated, take the finitely generated non-
nilpotent nil algebra A constructed by Golod (see BA, Exercise 5 of Section 6.3). It
can be verified that A^1 is semiprime but it has the nil ideal A. More generally, a
simple nil algebra has recently been constructed by Agata Smoktunowicz [2002].
   However, in a Noetherian ring the upper and lower nilradicals coincide. For as we
saw in Proposition 7.4.3, in a semiprime ring with maximum condition on right
annihilators every nil left or right ideal is O. Hence if R is right Noetherian, then
R/L(R) has no non-zero nil ideals and so U(R) = L(R); by symmetry the same
holds for left Noetherian rings, so we have
Proposition 8.5.8. In a right (or left) Noetherian ring the upper and lower nilradical
coincide. Thus in a Noetherian ring the prime radical is a nilpotent ideal containing all
nilpotent ideals.                                                                     •
R/N has zero Levitzki radical, so it is semiprime, and this shows N to be indeed a
nilradical. In any ring we have
and in the first example given above N ⊃ L, at least in characteristic 0, by the
Nagata-Higman theorem (see Further Exercise 15 of Chapter 7), while U ⊃ N in
Golod's example.
Exercises
 1. Show that in any right Noetherian ring 0 can be expressed as a product of prime
    ideals.
 2. Show that a prime ring with a minimal right ideal is primitive.
 3. Show that in a semiprime ring the socle and the left socle coincide. (Hint. Use
    Exercise 2 of Section 8.4.) Give an example to show that this does not hold
    generally.
 4. Let R be the subring of $\mathfrak{M}_2(\mathbf{Z})$ consisting of all matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that
      $a \equiv d$, $b \equiv c \pmod 2$. Show that R is a prime ring, but not an integral domain;
      find all its idempotents.
 5.   Give an example of a commutative ring with a nil ideal which is not nilpotent.
 6.   Show that the sum of any family of locally nilpotent ideals is locally nilpotent.
      (Hint. Imitate the proof of Lemma 8.5.5.)
 7.   In any ring with Levitzki radical N, verify that R/N has zero Levitzki radical.
 8.   In any ring R, define $N_1$ as the sum of all nilpotent ideals and define $N_\alpha$ for any
      ordinal $\alpha$ recursively by $N_{\alpha+1}/N_\alpha = N_1(R/N_\alpha)$, while at a limit ordinal,
      $N_\alpha = \bigcup_{\beta<\alpha} N_\beta$. Show that the union of all the $N_\alpha$ is the lower nilradical of R.
 9.   Show that in a left or right Noetherian ring any nilpotent element is strongly nil-
      potent. Deduce that every nil (left or right) ideal is nilpotent.
10.   Show that a reduced prime ring is an integral domain.
11.   Show that any ideal of a semiprime ring is semiprime, qua non-unital algebra.
12.   Show that in a reduced ring R, if a product of n elements in a certain order is 0,
      then the product in any order is 0. (Hint. Show that if $a_1 \cdots a_n = 0$, then
      $a_1x_1a_2x_2 \cdots a_nx_n = 0$ for all $x_i \in R$.)
13.   Show that in any ring R, every prime ideal contains a minimal prime ideal.
      (Hint. Take a maximal m-system containing the complement of the given
      prime ideal.)
14.   Let R be a reduced ring and p a minimal prime ideal. Show that the monoid M
      generated by the complement of p does not contain O. Deduce that M does not
      meet p and hence that R/p is an integral domain.
15.   Show that every reduced ring is a subdirect product of integral domains. (Hint.
      Use Exercises 10-14. This is a theorem of Andrunakievich-Ryabukhin; the proof
      is due to Herstein.)
Theorem 8.6.1. Let R be a semiprimitive PI-algebra with centre C. Then every non-zero
ideal $\mathfrak{a}$ of R meets C non-trivially: $\mathfrak{a} \cap C \neq 0$.
Proof. By hypothesis R has a family of primitive ideals $\mathfrak{t}_\lambda$ ($\lambda \in \Lambda$) whose intersection
is 0. Let f = 0 be a polynomial identity for R; then $R_\lambda = R/\mathfrak{t}_\lambda$ is a primitive
PI-algebra satisfying f = 0 and we have an embedding
$$R \to \prod R_\lambda,$$
and choose $\mu \in \Lambda$ such that the degree m of $R_\mu$ is maximal. Then the Razmyslov
polynomial $D = D_\mu$ is central and non-zero on $R_\mu$ and since $\mathfrak{a}\varepsilon_\mu = R_\mu$, there exist
$a_1, \ldots, a_r \in \mathfrak{a}$ such that $x = D(a_1\varepsilon_\mu, \ldots, a_r\varepsilon_\mu)$ is a non-zero element of the centre
of $R_\mu$. It follows that $y = D(a_1, \ldots, a_r) \neq 0$; moreover, $y\varepsilon_\lambda = 0$ for $\lambda \notin \Lambda_0$, while
for $\lambda \in \Lambda_0$, $y\varepsilon_\lambda$ is in the centre of $R_\lambda$; therefore $y \in \mathfrak{a} \cap C$.                             •
  Here none of the conditions can be omitted, for the polynomial ring k[x] is a
semiprimitive PI-algebra which is not simple, and for an infinite-dimensional k-
space V, $\mathrm{End}_k(V)$ is a primitive k-algebra whose centre is a field, but which is not
simple (and of course not a PI-algebra).
  Theorem 8.6.1 has a useful generalization. To prove it we first note a PI-analogue
of Proposition 7.4.3.
Proposition 8.6.3. Any prime PI-algebra A satisfies the maximum condition on left (or
right) annihilators; hence every nil left or right ideal of A is zero.
Proof. Let $0 \subset \mathfrak{l}_1 \subset \mathfrak{l}_2 \subset \cdots$ be a strictly ascending chain of left annihilators and let
$\mathfrak{t}_n$ be the right annihilator of $\mathfrak{l}_n$, so that $\mathfrak{t}_1 \supset \mathfrak{t}_2 \supset \cdots$. By Proposition 7.5.3 we may take the polynomial
identity to be multilinear; denote by d the least degree for which there is a
homogeneous multilinear polynomial f of degree d:
such that f vanishes for all $x_i \in \mathfrak{l}_i$ (i = 1, ... , d). Here d > 1, for if $a_1\mathfrak{l}_1 = 0$, then
$\mathfrak{l}_1 = 0$, because A is prime. We thus have
(8.6.1)
Now when $d\sigma \neq d$, then $x_{d\sigma} \in \mathfrak{l}_{d-1}$ and the corresponding term in (8.6.1) is 0; the
remaining terms have the form $a_\sigma x_{1\sigma} \cdots x_{(d-1)\sigma}x_d\mathfrak{t}_{d-1}$. Hence
(8.6.2)
where $\sigma$ ranges over all permutations of 1, ... , d - 1. But $\mathfrak{l}_d\mathfrak{t}_{d-1}$ is a non-zero
two-sided ideal and A is prime, so
and this contradicts the definition of d. Hence the chain of $\mathfrak{l}_i$ breaks off and the
maximum condition holds. By symmetry the same is true for right annihilators
and now the rest follows by Proposition 7.4.3.                                  •
Corollary 8.6.4. In a semiprime PI-algebra every nil left or right ideal is zero.
Theorem 8.6.5 (Rowen). In a semiprime PI-algebra, every non-zero ideal meets the
centre non-trivially. In particular, a semiprime PI-algebra whose centre is a field is
simple.
Proof. Let R be a semiprime PI-algebra with centre C. By Corollary 8.6.4, R has no
non-zero nil ideals, hence by Proposition 8.3.5, the polynomial ring R[t] is
semiprimitive, and its centre is clearly C[t], so by Theorem 8.6.1, for any non-zero
ideal $\mathfrak{a}$ in R we have $\mathfrak{a}[t] \cap C[t] \neq 0$. On comparing coefficients we find that
$\mathfrak{a} \cap C \neq 0$. Now the rest follows as in Corollary 8.6.2.                          •
Theorem 8.6.6 (Posner, 1960). Let R be a prime PI-algebra with centre C. Then C is
an integral domain, and if K is its field of fractions, then the natural mapping
$R \to Q = R \otimes_C K$ is an embedding and Q is a finite-dimensional simple K-algebra.
Proof. The first part follows by Corollary 7.1.11; now any multilinear identity in R
also holds in Q, so by Theorem 8.6.5, Q is simple. Hence by Kaplansky's theorem
(Theorem 8.3.6), Q is finite-dimensional over K.                                  •
  Let R be a prime PI-algebra and Q its quotient ring by fractions of the centre. By
Theorem 8.6.6, Q is of finite dimension over its centre and this dimension is a
square, $n^2$ say (Proposition 5.2.2); the number n is called the PI-degree of R. We
can now determine the PI-degree of the generic division algebra:
and the other arguments of $D_{n\nu}$ also lie in R. Denote by $D_{n\nu i}$ the function obtained
from $D_{n\nu}$ (defined above) by replacing $x_\mu$ by $a_{\mu i}$ for $\mu \neq \nu$. By the remark preceding
8.7 Firs and semifirs                                                                 337
Thus we have a (finite) projective coordinate system for R, showing that R is finitely
generated projective over C.
  It remains to show that (8.6.3) is an isomorphism. We note that $\mathrm{End}_C(R)$ is
generated by elements of the form $cD_{n\nu i}$ and if $D_{n\nu i}(x) = \sum_j u_{\nu ij}xv_{\nu ij}$, then
$(\sum cu_{\nu ij} \otimes v_{\nu ij})\lambda = cD_{n\nu i}$, so $\lambda$ is surjective. Now
If $\sum p_j \otimes q_j \in \ker\lambda$, then $\sum p_jxq_j = 0$ for all $x \in R$, hence $\sum p_j \otimes q_j = 0$ by (8.6.5);
this shows $\lambda$ to be injective, so it is indeed an isomorphism.                    •
  It can be shown that the sufficient condition of this theorem is also necessary (see
Rowen (1980), (1988)).
Exercises
1. Let D be a skew field with centre k. Show that for any (skew) subfield E of D,
   E and Ek have the same PI-degree.
2. Show that in any PI-algebra the upper and lower nilradicals coincide.
3. (Amitsur) Let R be a PI-algebra and $\mathfrak{N}$ the sum of all its nil ideals. Use Corollary
   8.3.8 to show that $R/\mathfrak{N}$ can be embedded in a matrix ring $\mathfrak{M}_n(E)$, where E is a
   commutative ring. Deduce that R satisfies an identity $S_{2n}^m = 0$, where $S_{2n}$ is the
   standard polynomial and $m \geq 1$. (By Exercise 1 of Section 7.7, m cannot always
   be taken to be 1. Hint. Express R as homomorphic image of a relatively free
   algebra satisfying a given polynomial identity holding in R.)
Definition. By a free right ideal ring or right fir for short, we understand a ring R
in which each right ideal is free, of uniquely determined rank. Left firs are defined
similarly and a left and right fir is called a fir.
   It follows that each right (or left) fir R has invariant basis number (IBN), for this is
so when R is right Noetherian, and when this is not the case, it will contain free right
ideals of arbitrary rank, by Proposition 7.1.9. We shall meet firs again in Section 11.5,
where it will be shown that the free algebra $k\langle X\rangle$ on any set X over a field k is a fir;
this may be regarded as a generalization of the fact that the polynomial ring k[x] in a
single variable over a field is a PID.
   Often one meets an even wider class than firs; to describe it we shall need to look
at general relations in rings. A relation
(8.7.2)
(8.7.3)
Theorem 8.7.1. Let R be a non-zero ring. Then the following conditions are equivalent:
(a) Every relation in R can be trivialized.
(b) Every finitely generated right ideal in R is free, as right R -module, of unique rank.
(c) R has IBN and every finitely generated submodule of a free right R-module is free.
(d) Every matrix relation in R can be trivialized.
(a$^\circ$)-(d$^\circ$) the left-right analogues of (a)-(d).
Further, any such ring is an integral domain.
Proof. (a) $\Rightarrow$ (b). Let $\mathfrak{a}$ be a finitely generated right ideal of R and let n be the least
integer such that $\mathfrak{a}$ has an n-element generating set, $u_1, \ldots, u_n$ say. Then $\mathfrak{a}$ is free on
$u_1, \ldots, u_n$, for if not, take a non-trivial relation u.a = 0. By (a) this can be trivialized,
say $u' = uP^{-1}$, $a' = Pa$. Since $a \neq 0$, we have $a' \neq 0$, say $a'_n \neq 0$. But u'.a' = 0 is
trivial, so $u'_n = 0$ and it follows that $\mathfrak{a}$ is generated by $u'_1, \ldots, u'_{n-1}$, which
contradicts the choice of n; hence $\mathfrak{a}$ is free on $u_1, \ldots, u_n$. If $\mathfrak{a}$ has another basis $v_1, \ldots, v_m$
and $m \neq n$, then m > n and $R^m \cong R^n$; this yields an endomorphism f of $\mathfrak{a}$ which is
surjective but not injective. Thus $fu_1, \ldots, fu_n$ generate $\mathfrak{a}$ but not freely; by the first
part we see that $\mathfrak{a}$ can be generated by fewer than n elements, which is again a
contradiction, hence m = n and $\mathfrak{a}$ has unique rank.
   (b) $\Rightarrow$ (c). Let F be a free right R-module and G a finitely generated submodule.
The finite generating set of G involves only finitely many generators of F; by ignoring
the rest we may take F to be finitely generated. Let $\lambda_1$ be the projection of F on the
first factor R and denote by F' the kernel of $\lambda_1$. Then we have the exact sequence
$$0 \to F' \cap G \to G \to \mathfrak{a} \to 0, \tag{8.7.4}$$
where $\mathfrak{a}$ is the image of G under $\lambda_1$, a finitely generated right ideal of R. By (b), $\mathfrak{a}$ is
free, hence (8.7.4) splits and $G \cong (F' \cap G) \oplus \mathfrak{a}$. By induction on the rank of F,
$F' \cap G$ as finitely generated submodule of F' is free and it follows that G is free.
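   We next show that R is an integral domain. Let $a \in R$ and denote its right
annihilator by $\mathfrak{n}$. Then $aR \cong R/\mathfrak{n}$; since aR is free, we have $R = \mathfrak{n} \oplus \mathfrak{a}$, where $\mathfrak{a} \cong aR$.
Both $\mathfrak{n}$ and $\mathfrak{a}$ are free; by the uniqueness of the rank, either $\mathfrak{a} = 0$ or $\mathfrak{n} = 0$, so a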
is either 0 or right regular. This holds for all a E R, hence R is an integral
domain. Now if R is right Ore, then it is embeddable in a skew field and so has
IBN (BA, Theorem 4.6.7); otherwise by Proposition 7.1.9 it has free right ideals of
any finite rank and this rank is unique by (b); hence R has IBN.
   (c) $\Rightarrow$ (d). Let XY = 0 be a matrix relation, where $X \in {}^rR^n$, $Y \in {}^nR^s$. The matrix X
defines a linear map $\varphi: {}^nR \to {}^rR$ by left multiplication and we have an exact
sequence
$$0 \to \ker\varphi \to {}^nR \to \mathrm{im}\,\varphi \to 0. \tag{8.7.5}$$
  A non-zero ring satisfying the conditions of this theorem is called a semifir. By (b)
every left or right fir is a semifir; however there are right firs that are not left firs (see
Exercise 5). We also remark that a commutative semifir is just a Bezout domain,
while a commutative fir is a PID.
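As a small illustration of condition (a) of Theorem 8.7.1 in the commutative case, the following Python sketch trivializes a two-term relation $u_1a_1 + u_2a_2 = 0$ over the Bezout domain Z. It assumes that a relation u'.a' = 0 counts as trivial when, for each index i, either $u'_i = 0$ or $a'_i = 0$, which is how the term is used in the proof above. The helper names `egcd` and `trivialize` are hypothetical and chosen only for this example.

```python
def egcd(a, b):
    """Return (g, s, t) with s*a + t*b == g, where g = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def trivialize(u, a):
    """Given integers with u[0]*a[0] + u[1]*a[1] == 0 and u != (0, 0),
    return (P, u', a') with P invertible over Z, u' = u P^{-1}, a' = P a."""
    u1, u2 = u
    g, s, t = egcd(u1, u2)
    # P^{-1} has determinant (s*u1 + t*u2) / g = 1, so it is invertible over Z.
    P_inv = [[s, -u2 // g], [t, u1 // g]]
    P = [[u1 // g, u2 // g], [-t, s]]
    u_new = (u1 * P_inv[0][0] + u2 * P_inv[1][0],
             u1 * P_inv[0][1] + u2 * P_inv[1][1])
    a_new = (P[0][0] * a[0] + P[0][1] * a[1],
             P[1][0] * a[0] + P[1][1] * a[1])
    return P, u_new, a_new


# The relation 4*3 + 6*(-2) = 0 in Z.
P, u_new, a_new = trivialize((4, 6), (3, -2))
print(P, u_new, a_new)
```

The transformed row has the form (g, 0) with g = gcd(u_1, u_2), and the first entry of the transformed column is $(u_1a_1 + u_2a_2)/g = 0$, so the new relation is trivial in the sense assumed above.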
  As already remarked, the free algebra $k\langle X\rangle$ on any set X over a field k is a fir; more
generally, this holds for the tensor D-ring $D_k\langle X\rangle$, where D is any skew field and k a
subfield. This is proved by means of the weak algorithm, which we shall not define
here (see Cohn (1985), Chapter 2); it is easier to prove the weaker statement that
$D_k\langle X\rangle$ is a semifir and we shall do so now (but see also Section 11.5).
Theorem 8.7.2. Let D be a skew field and F a subfield. Then the tensor D-ring $D_F\langle X\rangle$
on any set X is a semifir.
Proof. Let $\{u_\nu\}$ be a right F-basis of D and $X = \{x_i\}$; then every element of $D_F\langle X\rangle$ can
be uniquely written in the form $c + \sum u_\nu x_if_{\nu i}$, where $c \in D$ and $f_{\nu i} \in D_F\langle X\rangle$. This
follows because we can express every element of D as $a = \sum u_\nu a_\nu$, where $a_\nu \in F$
and $ax_i = \sum u_\nu a_\nu x_i = \sum u_\nu x_ia_\nu$.
   Suppose now that we have a relation in $D_F\langle X\rangle$:
$$\sum_1^n a_ib_i = 0. \tag{8.7.6}$$
In order to show that this relation can be trivialized it is enough to do this in a given
degree; thus we can assume that the $a_i$, $b_i$ are homogeneous and that $\deg a_i +
\deg b_i = r > 0$. We shall use double induction, on n and r. If each $a_i$ has positive
degree, we can write $a_i = \sum u_\nu x_ta_{\nu ti}$; equating cofactors of $u_\nu x_t$ we find
$$\sum_i a_{\nu ti}b_i = 0,$$
and now the result follows by induction on r, for we can make a transformation
reducing one of the b's to 0. There remains the case where some $a_i$, say $a_1$, has
degree 0. Then we can replace $a_2$ by $a_2 - a_1.a_1^{-1}a_2 = 0$ and $b_1$ by $b'_1 = b_1 +
a_1^{-1}a_2b_2$; we thus obtain
and we have now diminished n, and so can apply induction on n to complete the
proof.                                                                                   •
   We conclude this section with a result that is often useful, the inertia lemma for
semifirs (see also Cohn (1985), Lemma 4.6.3). We recall that from any ring R we can
form the polynomial ring R[t] in a central indeterminate t; its completion by power
series is the formal power series ring R [[ t]], and if we localize now at the powers of t
we obtain the formal Laurent series ring R((t)). Since t is regular in R[[t]], the latter
ring is embedded in the Laurent series ring.
   Let B be any ring and A a subring. Then A is said to be finitely inert in B if for any
matrix $Z \in {}^rA^s$, if Z = XY over B, where X is r x n and Y is n x s, there exists
$P \in GL_n(B)$ such that $XP^{-1}$, PY have their entries in A.
Lemma 8.7.3 (Inertia lemma). Let R be a semifir. Then for any central indeterminate
t, the formal power series ring R[[t]] is finitely inert in R((t)).
Proof. Put S = R[[t]], T = R((t)) and indicate the natural homomorphism $S \to R$
by $f \mapsto (f)_0$; it amounts to putting t = 0. We take $A \in {}^rS^s$ and suppose that over T:
                      A = PQ, where P is r x n and Q is n x s.                        (8.7.7)
(8.7.8)
If $\lambda \leq 0$, there is nothing to prove, so assume that $\lambda > 0$. Then $(PQ)_0 = 0$; since R is
a semifir, we can find a matrix $U \in GL_n(R)$ trivializing this relation, and on replacing
P, Q by $PU^{-1}$, UQ we find that for some h ($0 \leq h \leq n$) all the columns in $(P)_0$ after
the first h are 0, while the first h rows of $(Q)_0$ are 0. If we multiply P on the right by
$V = tI_h \oplus I_{n-h}$ and Q on the left by $V^{-1}$, then P becomes divisible by t while Q still
has all its entries in S. In this way we can, by cancelling a factor t, replace $t^{-\lambda}$ by $t^{1-\lambda}$
in (8.7.8) and after $\lambda$ steps obtain the same equation with $\lambda = 0$. This proves the
finite inertia.                                                                               •
   We remark that there is a stronger notion, total inertia, and the inertia theorem
asserts that an inversely filtered fir with inverse weak algorithm is totally inert in
its completion (see Cohn (1985), Theorem 2.9.15).
Exercises
1. Show that a right Ore domain is a right fir iff it is right principal.
2. Show that every semifir is weakly finite.
3. Let R be a semifir and A, B finitely generated submodules of a free R-module.
   Show that $A \cap B$, A + B are again free and that $\mathrm{rk}(A + B) + \mathrm{rk}(A \cap B) =
   \mathrm{rk}\,A + \mathrm{rk}\,B$.
4. Let R be a right fir. Show that any submodule of a free right R-module is free.
5. In the group algebra over a field k of the free group on x, y let R be the subalgebra
   generated by $x, y, x^{-1}y, x^{-2}y, \ldots$. Verify that R is a semifir but not a left fir (it
   can be shown that R is a right fir, see Cohn (1985), Section 2.10).
 4. Let V be a (K, R)-bimodule, where K is a skew field and R is a ring which acts
     fully on V with centralizer K. Show that $V_R$ is simple; if moreover R acts fully on
     $^2V$, then R acts densely on V.
 5. (Jacobson) Let k be a field and R the (unital) k-algebra generated by u, v subject
     to uv = 1. Show that R is primitive by representing it by the linear
     transformations $e_iu = e_{i+1}$, $e_iv = e_{i-1}$ if i > 1, $e_1v = 0$.
 6. (P. Samuel) Let k be a field and $k\langle x, y\rangle$ the free k-algebra on x, y. Show that this is
    primitive by representing it as follows: $e_ix = e_{i^2+1}$, $e_iy = e_{i-1}$ if i > 1, $e_1y = 0$.
 7. Show that in any ring R the ideals with primitive quotient are just the cores of
     maximal right ideals. Show further that the intersection of these ideals is J(R).
 8. Show that in a prime ring R every non-zero right ideal is a faithful R-module.
 9. Show that every maximal ideal in a ring is a prime ideal.
10. Show that the least ideal $\mathfrak{N}$ in a ring R such that $R/\mathfrak{N}$ is semiprime is a nil ideal
    and deduce that $\mathfrak{N}$ is the lower nilradical.
11. Find the socle of the ring of all upper triangular matrices over a field, and show
    that it may differ from the left socle.
12. Let K be a commutative ring and t an indeterminate. Show that if $a_0 + a_1t +
    \cdots + a_nt^n \in K[t]$ is a unit, then $a_0$ is a unit and $a_1, \ldots, a_n$ are nilpotent. Deduce
    that the Jacobson radical of K[t] is its nilradical.
13. Show that the left (or right) ideal generated by a strongly nilpotent element is
    nilpotent.
14. Let R be a prime ring. Show that any non-zero central element of R is a non-
    zero-divisor; deduce that the centre of R is an integral domain.
15. (Kaplansky) With any square matrix A over a ring K we can associate an 'infinite
    periodic' matrix by taking the diagonal sum of countably many copies of A. Let
    R be the ring of all upper triangular infinite periodic matrices over a field k.
    Show that R is prime, with Levitzki nilradical equal to the Jacobson radical.
16. Let R be any K-algebra and define the centroid of R as $\mathrm{End}({}_RR_R)$. Show that the
    centroid $\Gamma$ is a commutative K-algebra and that R has a natural $\Gamma$-algebra
    structure. If R is primitive and $R^2 = R$, show that $\Gamma$ is an integral domain.
17. Show that the group algebra of the additive group of rationals is a Bezout
    domain. For real numbers $\alpha$, $\beta$ show that the monoid M of all positive numbers
    of the form $m\alpha + n\beta$ is such that the monoid algebra kM is Bezout iff $\alpha/\beta$ is
    rational.
18. Over a semifir R show that if a matrix product AB ($A \in {}^rR^n$, $B \in {}^nR^s$) has a u x v
    block of zeros, then for some t ($0 \leq t \leq n$) and a suitable $P \in GL_n(R)$, $AP^{-1}$ has
    a u x t block of zeros and PB has an (n - t) x v block of zeros (this is the partition
    lemma, see Cohn (1985), Lemma 1.1.4).
19. Show that the group algebra of a free group is a semifir.
                         Skew fields
Skew fields arise quite naturally in the application of Schur's lemma and elsewhere,
but most of the known theory deals with the case of skew fields finite-dimensional
over their centres (see Chapter 5). In the general case only isolated results are
known, much of it depending on the coproduct construction (see Cohn (1995),
Schofield (1985)). This lies outside our framework and we shall confine ourselves
to presenting some of the highlights which do not require special prerequisites.
   After some general remarks in Section 9.1, including the Cartan-Brauer-Hua
theorem, we shall give an account in Section 9.2 of determinants over skew fields.
This is followed in Section 9.3 by a proof of the existence of free fields, based on
the specialization lemma whose proof has been much simplified. Many of the con-
cepts (though fewer of the actual results) of valuation theory carry over to skew fields
and in Section 9.4 we shall examine the situation and pursue some of the results that
continue to hold in the general case. The final section, Section 9.5, is concerned with
the question when the left and right dimensions of a field extension are equal. We
shall also meet examples of skew field extensions whose left and right dimensions
are different (Emil Artin's problem); they are most easily obtained as pseudo-
linear extensions, a more tractable class of finite-dimensional extensions.
   Throughout this chapter we shall use the term 'field' to mean 'not necessarily
commutative division ring'; the prefix 'skew' is sometimes added for emphasis.
9.1 Generalities
Let K be a skew field. Its centre C is a commutative field, and the characteristic of C is
also called the characteristic of K. We have already seen in Theorem 5.1.14 that every
finite field is commutative. However, an infinite field may well have a finite centre, in
fact, for every commutative field k there is a skew field with centre k. In characteristic
0 we can adjoin variables u, v with the relation uv - vu = 1 to obtain such a field, but
there is another construction which applies quite generally; a third construction,
generally valid, is used in the proof of Theorem 9.3.3 below.
Proposition 9.1.1. Let k be any commutative field. Then there is a skew field D whose
centre is k.
Proof. Consider the group algebra of the additive group of rationals over
k: $k[x^\lambda \mid \lambda \in \mathbf{Q}]$. Clearly this is a commutative k-algebra; we take the subalgebra
generated by $x^{2^{-n}}$, n = 0, 1, ... and form its field of fractions
$$E = k(x, x^{1/2}, x^{1/4}, \ldots).$$
This field has the automorphism $\alpha: f(x) \mapsto f(x^2)$, which is of infinite order.
Moreover, k is the precise subfield fixed by $\alpha$, for if f involves x, let $2^n$ ($n \geq 0$) be the largest
denominator in any power of x occurring in f. Then $x^{r/2^n}$ occurs in f for some odd r,
but it does not occur in $f^\alpha$, so f is not fixed under $\alpha$. Now form the skew polynomial
ring $E[y; \alpha]$ and let D be its field of fractions. By Theorem 7.3.6, D has the precise
centre k.                                                                                  •
   Much of linear algebra can be done over a skew field (see BA, Chapter 4); this
rests on the fact that every module over a field is free. It is well known (BA,
Theorem 4.6.8) that this property actually characterizes skew fields.
   We go on to prove two general results, which although not used here, are of
importance, with many applications. The first concerns normal subgroups of the
multiplicative group of a field; the proof is based on an idea of Jan Treur.
$$(a + 1)(a + 1, c) = a(a, c) + 1.$$
By the linear independence of 1, a over K, we have (a, c) = (a + 1, c) = 1; thus
ac = ca for all $c \in K$, $a \notin K$. But if $b \in K$, then $a + b \notin K$, therefore $bc - cb =
(a + b)c - c(a + b) - [ac - ca] = 0$, hence c commutes with every element of L,
i.e. K is contained in the centre of L.                                         •
   This result was obtained in the case of division algebras by Henri Cartan in 1947 as
a consequence of his Galois theory of skew fields. Richard Brauer and independently
Hua Loo-Keng observed in 1949 that the general case could be proved directly.
   Our second result concerns additive mappings preserving inversion:
Proof. (E. Artin) We must show that for all $a, b \in K$, either $(ab)^\sigma = a^\sigma b^\sigma$ or for all
$a, b \in K$, $(ab)^\sigma = b^\sigma a^\sigma$. We observe that $a^\sigma(a^{-1})^\sigma = 1$ by (9.1.1), so $a \neq 0 \Rightarrow
a^\sigma \neq 0$ and it follows that $\sigma$ is injective. We start from the following identity (Hua's
identity):
$$a - aba = (a^{-1} + (b^{-1} - a)^{-1})^{-1}, \tag{9.1.2}$$
valid whenever all inversions are defined, i.e. $ab \neq 0, 1$. To prove (9.1.2), we observe
that for any $x \neq 0, 1$,
$$(x^{-1} - 1)^{-1} = (1 - x)^{-1} - 1, \tag{9.1.3}$$
as we see by multiplying out. Let $ab \neq 0, 1$; then $a^{-1}(b^{-1} - a) = (ba)^{-1} - 1$, hence
on taking x = ba in (9.1.3), we find
$$(b^{-1} - a)^{-1}a = ((ba)^{-1} - 1)^{-1} = (1 - ba)^{-1} - 1,$$
$$(b^{-1} - a)^{-1} = (1 - ba)^{-1}a^{-1} - a^{-1} = (a - aba)^{-1} - a^{-1}.$$
Hence
$$a - aba = (a^{-1} + (b^{-1} - a)^{-1})^{-1},$$
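Hua's identity can be checked numerically in a genuinely non-commutative field. The following Python sketch does this in the rational quaternions with exact arithmetic; the class `Quaternion` and the functions `hua_lhs`, `hua_rhs` are hypothetical helpers written for this illustration only, and the sample elements a, b are arbitrary choices with $ab \neq 0, 1$.

```python
from fractions import Fraction


class Quaternion:
    """w + x*i + y*j + z*k with rational coefficients (a skew field)."""

    def __init__(self, w, x=0, y=0, z=0):
        self.c = tuple(Fraction(t) for t in (w, x, y, z))

    def __eq__(self, other):
        return self.c == other.c

    def __add__(self, other):
        return Quaternion(*(s + t for s, t in zip(self.c, other.c)))

    def __sub__(self, other):
        return Quaternion(*(s - t for s, t in zip(self.c, other.c)))

    def __mul__(self, other):
        a1, b1, c1, d1 = self.c
        a2, b2, c2, d2 = other.c
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )

    def inverse(self):
        # The norm w^2 + x^2 + y^2 + z^2 is non-zero unless self is 0.
        n = sum(t * t for t in self.c)
        w, x, y, z = self.c
        return Quaternion(w / n, -x / n, -y / n, -z / n)


def hua_lhs(a, b):
    """Left-hand side of (9.1.2): a - aba."""
    return a - a * b * a


def hua_rhs(a, b):
    """Right-hand side of (9.1.2): (a^{-1} + (b^{-1} - a)^{-1})^{-1}."""
    return (a.inverse() + (b.inverse() - a).inverse()).inverse()


a = Quaternion(1, 2, 0, 1)   # 1 + 2i + k
b = Quaternion(0, 1, 1, 0)   # i + j
print(hua_lhs(a, b) == hua_rhs(a, b))
```

Since a and b do not commute here, the check exercises the order of the factors in aba and not just the commutative special case.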
$= (c - ab - ba + abc^{-1}ba)^\sigma$,
(9.1.7)
for all a, b, c such that $c \neq 0$. For c = ab the right-hand side reduces to
ab - ab - ba + ba = 0, hence the left-hand side of (9.1.7) vanishes for c = ab.
Thus $(ab)^\sigma$ is either $a^\sigma b^\sigma$ or $b^\sigma a^\sigma$; it only remains to show that the same alternative
holds for all pairs.
   Fix $a \in K$ and put $U_a = \{b \in K \mid (ab)^\sigma = a^\sigma b^\sigma\}$, $V_a = \{b \in K \mid (ab)^\sigma = b^\sigma a^\sigma\}$.
They are clearly subgroups of the additive group of K whose union is K, hence for
each $a \in K$, one of them must be all of K (see Exercise 2). Now put
$U = \{a \in K \mid U_a = K\}$, $V = \{a \in K \mid V_a = K\}$; then U, V are again subgroups
whose union is K, so one of them must be all of K, i.e. either $(ab)^\sigma = a^\sigma b^\sigma$ for all
$a, b \in K$ or $(ab)^\sigma = b^\sigma a^\sigma$ for all $a, b \in K$.                                        •
Exercises
1. Let k be a commutative field of characteristic O. Verify that the field of fractions of
   the Weyl algebra $A_1(k)$ has centre k. What is the centre when k has prime
   characteristic?
2. In the proof of Theorem 9.1.3 the fact was used that a group G cannot be the
   union of two proper subgroups H, K. By considering the product of two elements,
   one not in H and one not in K, prove this fact.
3. Let K be a skew field. Show that for any $c \in K$ the centralizer of the set of all
   conjugates of c is either K or the centre of K. Deduce that no non-central element c
   can satisfy the identity $cx^{-1}cx = x^{-1}cxc$ for all $x \in K^\times$.
4. Let $\sigma: K \to L$ be a mapping between fields such that $(x + y)^\sigma = x^\sigma + y^\sigma$, $1^\sigma = \lambda$,
   $(x^\sigma)^{-1} = \lambda^{-1}(x^{-1})^\sigma\lambda^{-1}$. Show that $x^\sigma = x^\tau\lambda$, where $\tau$ is a homomorphism or
   an antihomomorphism.
5. Show that any non-central element in a skew field has infinitely many conjugates.
6. Let K be a skew field. Show that any conjugacy class of elements of K outside the
   centre generates K as a field. Show that any subfield containing $K^{\times\prime}$ coincides
   with K.
7. (Herstein) Let K be a skew field of prime characteristic and G a finite subgroup of
   $K^\times$. Denoting by P the prime subfield of K, show that the P-space spanned by G is
   an algebra, hence a finite field. Deduce that G must be cyclic.
8. Let K be a skew field. Show that any abelian normal subgroup of $K^\times$ is contained
   in the centre of K.
to the non-commutative case; we have seen in Section 5.3 how this can be done for a
finite-dimensional algebra by means of the norm. The general definition is due to
Jean Dieudonné [1943]; before presenting it let us examine the simplest case, that
of a 2 x 2 matrix.
  The columns of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ over a skew field K say, are linearly dependent
iff the equations
$$ax + by = 0, \quad cx + dy = 0, \tag{9.2.1}$$
have a non-trivial solution (x, y). If a = 0, such a solution exists precisely when
b = 0 or c = 0, so let us assume that $a \neq 0$. Then in any non-trivial solution
$y \neq 0$ and by eliminating x from (9.2.1) we obtain $(d - ca^{-1}b)y = 0$, hence the
condition for linear dependence is
$$d - ca^{-1}b = 0. \tag{9.2.2}$$
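The criterion $d - ca^{-1}b = 0$ (for $a \neq 0$) can be tried out in the rational quaternions. In the Python sketch below the second column is the first column multiplied on the right by t, so the columns are dependent and the expression vanishes, whereas naive analogues of the commutative determinant such as ad - bc and ad - cb do not. The class `Quaternion`, the function `schur` and the sample values are hypothetical choices made for this illustration.

```python
from fractions import Fraction


class Quaternion:
    """w + x*i + y*j + z*k with rational coefficients (a skew field)."""

    def __init__(self, w, x=0, y=0, z=0):
        self.c = tuple(Fraction(t) for t in (w, x, y, z))

    def __eq__(self, other):
        return self.c == other.c

    def __add__(self, other):
        return Quaternion(*(s + t for s, t in zip(self.c, other.c)))

    def __sub__(self, other):
        return Quaternion(*(s - t for s, t in zip(self.c, other.c)))

    def __mul__(self, other):
        a1, b1, c1, d1 = self.c
        a2, b2, c2, d2 = other.c
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )

    def inverse(self):
        n = sum(t * t for t in self.c)   # non-zero unless self is 0
        w, x, y, z = self.c
        return Quaternion(w / n, -x / n, -y / n, -z / n)


def schur(a, b, c, d):
    """The expression d - c a^{-1} b, defined when a != 0."""
    return d - c * a.inverse() * b


zero = Quaternion(0)
a = Quaternion(1, 1, 0, 0)   # 1 + i
c = Quaternion(0, 0, 1, 0)   # j
t = Quaternion(0, 0, 0, 1)   # k
b, d = a * t, c * t          # second column = first column times t

# (x, y) = (-t, 1) is a non-trivial solution of ax + by = 0, cx + dy = 0.
x, y = zero - t, Quaternion(1)
print(a * x + b * y == zero, c * x + d * y == zero)
print(schur(a, b, c, d) == zero)
print(a * d - b * c == zero, a * d - c * b == zero)
```

This is only one example, but it shows concretely why a polynomial expression in a, b, c, d such as ad - bc cannot serve as the test for dependence.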
  Depending on which entries of A are zero, we can find various expressions whose
vanishing characterizes the linear dependence of the columns of A, but a few trials
make it clear that there is no polynomial in a, b, e, d with this property. Thus
we must expect a determinant function (if one exists) to be a rational function.
A second point is that under any reasonable definition one would expect         (~ ~ )
and   (~   :) to have the same determinant. This suggests that even for a skew field
the values of the determinant must lie in an abelian group. For any field K we define
the abelianized group $K^{ab}$ as
$$K^{ab} = K^\times/K^{\times\prime},$$
where $K^{\times\prime}$ is the derived group of the multiplicative group $K^\times$. Thus for a
commutative field $K^{ab}$ reduces to $K^\times$. It is clear that $K^{ab}$ is universal for homomorphisms of
$K^\times$ into abelian groups.
   As usual we write GLn(K) for the group of all invertible n x n matrices over K. Let
us recall that any m x n matrix A over K may be interpreted as the matrix of a linear
mapping from an m-dimensional to an n-dimensional vector space over K. By
choosing suitable bases in these spaces we can ensure that A takes the form $I_r \oplus 0$,
where r is the rank of A. Thus there exist $P \in GL_m(K)$, $Q \in GL_n(K)$ such that
$$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}. \tag{9.2.3}$$
This was proved in the remark after Proposition 7.2.3. In particular, (9.2.3) shows
that a square matrix over any field is invertible iff it is left (or equivalently, right)
regular (see also Corollary 9.2.3 below). Such a matrix is also called non-singular
and a non-invertible matrix is called singular.
  Without a determinant we cannot define $SL_n$, but we have the group $E_n(K)$
generated by all elementary matrices $B_{ij}(c) = I + cE_{ij}$ (see Section 3.5). We observe
Lemma 9.2.1 (Whitehead's lemma). Let R be any ring and $n \geq 1$. For any
$A, B \in GL_n(R)$, $AB \oplus I$ and $BA \oplus I$ lie in the same (left or right) coset of $E_{2n}(R)$, a
fact which may be expressed by writing
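for each matrix on the right can be written as a product of $n^2$ elementary matrices.
Hence we have
$$\begin{pmatrix} C^{-1} & 0 \\ 0 & C \end{pmatrix} = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}\begin{pmatrix} 0 & C \\ -C^{-1} & 0 \end{pmatrix} \in E_{2n}(R), \tag{9.2.7}$$
for the matrices on the right are instances of (9.2.6). Now we have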
  As we shall soon see, in the case of a skew field $E_n(K)$ is the derived group
$GL_n(K)'$ (except when n = 2 = |K|), in particular, $E_n$ is a normal subgroup of
$GL_n$; for a commutative field k, $E_n(k) = SL_n(k)$ and the result was proved in
Proposition 3.5.2.
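The factorization used in the proof of Whitehead's lemma can be checked on a concrete example. The Python sketch below works with 2 x 2 rational blocks (so n = 2 and the matrices are 4 x 4). It assumes that the identity cited as (9.2.6) is the standard three-factor one, $\begin{pmatrix} 0 & -X^{-1} \\ X & 0 \end{pmatrix} = \begin{pmatrix} I & -X^{-1} \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ X & I \end{pmatrix}\begin{pmatrix} I & -X^{-1} \\ 0 & I \end{pmatrix}$, and then verifies that $C^{-1} \oplus C$ is a product of two such matrices. All function names are hypothetical and the matrix C is an arbitrary invertible example.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def block(P, Q, R, S):
    """Assemble the 2n x 2n matrix [[P, Q], [R, S]] from n x n blocks."""
    return [p + q for p, q in zip(P, Q)] + [r + s for r, s in zip(R, S)]


def neg(A):
    return [[-x for x in row] for row in A]


def inverse2(A):
    """Inverse of an invertible 2 x 2 matrix over the rationals."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
Z2 = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]


def upper(X):
    """[[I, X], [0, I]], a product of elementary matrices."""
    return block(I2, X, Z2, I2)


def lower(X):
    """[[I, 0], [X, I]], a product of elementary matrices."""
    return block(I2, Z2, X, I2)


def antidiag(X):
    """[[0, -X^{-1}], [X, 0]] computed as upper * lower * upper."""
    Y = neg(inverse2(X))
    return matmul(matmul(upper(Y), lower(X)), upper(Y))


C = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
C_inv = inverse2(C)
lhs = block(C_inv, Z2, Z2, C)                      # the diagonal sum of C^{-1} and C
rhs = matmul(antidiag(I2), antidiag(neg(C_inv)))   # the two factors on the right
print(lhs == rhs)
```

Both factors on the right are values of `antidiag`, taken at X = I and at X = -C^{-1}, so each is a product of block-elementary matrices.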
  We can embed $GL_n(K)$ in $GL_{n+1}(K)$ by mapping A to $A \oplus 1$. In this way we
obtain an ascending chain
$$GL_1(K) \subset GL_2(K) \subset \cdots,$$
whose union is again a group, written GL(K) and called the stable general linear
group. Its elements may be thought of as infinite matrices which differ from the
unit matrix only in a finite square. Similarly the union of the groups En(K) is a
group E(K).
9.2 The Dieudonne determinant                                                                 349
    In order to obtain a definition for the determinant we shall need to refine the
expression (9.2.3) for a matrix. A square matrix is called lower unitriangular if all
the entries on the main diagonal are 1 and those above it are 0; it is clear from
the definition that such a matrix is a product of elementary matrices and hence is
invertible. Moreover, the lower unitriangular matrices form a group under
multiplication. An upper unitriangular matrix is defined similarly. We observe that left
multiplication by a lower unitriangular matrix amounts to adding left multiples of certain
rows to later rows, and right multiplication by an upper unitriangular matrix comes
to adding right multiples of certain columns to later columns.
   We can now describe a decomposition which applies to any matrix over a skew
field. Our account follows that of Peter Draxl (1983), with some simplifications.
Theorem 9.2.2 (Bruhat normal form). Let K be a skew field and $A \in {}^mK^n$. Then A
can be written in the form
$$A = LMU, \tag{9.2.8}$$
               (     I
                   ca- I
                           0) (a
                           l   0
                                         0
                                     d-ca-Ib
                                                   ) ( I
                                                        °a   -I
                                                             1
                                                                  b)   if a # 0,
Proof. Suppose that the first non-zero row of A is the i-th row and that its first
non-zero entry is a_ij. By adding left multiples of the i-th row to each succeeding row we
can reduce every entry in the j-th column except a_ij to 0. These operations
correspond to left multiplication by a certain lower unitriangular matrix. Next we add
right multiples of the j-th column to succeeding columns to reduce all entries in
the i-th row except a_ij to 0; these operations will correspond to right multiplication
by an upper unitriangular matrix. As a result a_ij is the only non-zero element in its
row and column, and all the rows above the i-th are zero. Next we take the first
non-zero row after the i-th, say the k-th row, and with the first non-zero entry a_kl in this
row we continue the process. After at most m steps A has been reduced to the form
M where each row and each column has at most one non-zero entry, with a left
factor which is lower and a right factor which is upper unitriangular. This is the
required decomposition (9.2.8).
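  The reduction in this proof is effectively an algorithm. The sketch below carries it out over
the rational numbers, a commutative field chosen only so that exact arithmetic is available in
the standard library; the function names (`bruhat`, `matmul`, `identity`) are hypothetical. Over
a genuinely non-commutative K the order of the factors in each row and column operation would
have to be kept as in the proof (left multiples for rows, right multiples for columns).

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def bruhat(A):
    """Return (L, M, U) with A = L*M*U, L lower and U upper unitriangular,
    and M having at most one non-zero entry in each row and column."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] for row in A]
    L, U = identity(m), identity(n)
    for i in range(m):
        # first non-zero entry of row i, if any
        j = next((c for c in range(n) if M[i][c] != 0), None)
        if j is None:
            continue
        pivot = M[i][j]
        # clear column j below row i by row operations; record them in L
        for k in range(i + 1, m):
            c = M[k][j] / pivot
            if c != 0:
                for col in range(n):
                    M[k][col] -= c * M[i][col]
                for r in range(m):
                    L[r][i] += L[r][k] * c
        # clear row i to the right of column j by column operations; record in U
        for l in range(j + 1, n):
            c = M[i][l] / pivot
            if c != 0:
                for r in range(m):
                    M[r][l] -= M[r][j] * c
                for col in range(n):
                    U[j][col] += c * U[l][col]
    return L, M, U


A = [[0, 2, 1], [1, 3, 4], [2, 6, 9]]
L, M, U = bruhat(A)
print(matmul(L, matmul(M, U)) == A)
```

The invariant A = LMU is maintained at every step: each row operation on M is undone by a
column operation on L, and each column operation on M by a row operation on U.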
350                                                                           Skew fields
  We remark that the matrices L, U in (9.2.8) are not generally unique. For example,
we have
Corollary 9.2.3. For any m × n matrix A over a skew field K the following conditions
are equivalent:
(a) A is left regular: XA = 0 ⇒ X = 0,
(b) A has a right inverse: AB = I for some B ∈ ^nK^m.
Moreover, when (a), (b) hold, then m ≤ n, with equality if and only if (a), (b) are
equivalent to their left-right analogues.
Proof. Either of (a), (b) clearly holds for A precisely when it holds for its core and it
holds for the core iff each row has a non-zero entry.                                 •
    Let us define a monomial matrix as a square matrix with precisely one non-zero
entry in each row and each column, e.g. the core of an invertible matrix is a mono-
mial matrix. If all the non-zero entries in a monomial matrix are 1, we have a
permutation matrix; this may also be defined as the matrix obtained by permuting
the rows (or equivalently, the columns) of the unit matrix. The determinant of a per-
mutation matrix is 1 or -1 according as the permutation is even or odd. Sometimes
it is more convenient to have matrices of determinant 1; this can be accomplished by
using a signed permutation matrix, i.e. a matrix obtained from the unit matrix by a
series of operations which consist in interchanging two columns and changing the
sign of one of them. By (9.2.6) such a matrix is a product of elementary matrices.
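    The statements about permutation matrices can be checked by brute force for small n. The
sketch below uses integer matrices and hypothetical helper names; `signed_perm_matrix` makes one
convenient choice of signs (negating the first row when the permutation is odd), which is not
the only way to obtain determinant 1.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation given as a tuple of 0..n-1, via inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def perm_matrix(p):
    """Permute the rows of the unit matrix: row i of the result is row p[i] of I."""
    n = len(p)
    return [[1 if j == p[i] else 0 for j in range(n)] for i in range(n)]


def signed_perm_matrix(p):
    """A signed permutation matrix of determinant 1 attached to p."""
    P = perm_matrix(p)
    if sign(p) == -1:
        P[0] = [-x for x in P[0]]
    return P


def det(M):
    """Determinant by the Leibniz formula (fine for small integer matrices)."""
    n = len(M)
    total = 0
    for q in permutations(range(n)):
        term = sign(q)
        for i in range(n):
            term *= M[i][q[i]]
        total += term
    return total


for p in permutations(range(3)):
    print(p, det(perm_matrix(p)), det(signed_perm_matrix(p)))
```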
    Any monomial matrix M may be written in the form
                                       M=DP,                                      (9.2.9)
Proposition 9.2.4. For any skew field K and any n ≥ 1, we have
(9.2.10)
where D_n(K) is the group of all diagonal matrices in GL_n(K). Moreover, for any n ≥ 2,
                                                                                (9.2.11)
(9.2.12)
provided that a = b(1 - v). If K ≠ F_2, there is an element v ≠ 0, 1 and putting
b = a(1 - v)^{-1}, we can use (9.2.12) to express B_ij(a) as a commutator. Hence
E_n ⊆ GL_n' whenever K ≠ F_2. If n ≥ 3, we have
   We now define for any skew field K a mapping δ: GL(K) → K^ab from the stable
linear group to the abelianized group of K, as follows: for A ∈ GL_n(K), δ(A) =
∏_{i=1}^n d̄_i, where the d_i are the diagonal elements of D in the expression (9.2.9) for the
core of A and d̄_i is the residue class of d_i mod K^×'. The value δ(A) is called the Dieudonné
determinant, or simply the determinant of A. To obtain its properties we shall need
to find how it is affected by permutations of its rows; for simplicity we consider the
effect of signed permutations.
Lemma 9.2.5. For any A ∈ GL_n(K) and any signed permutation matrix P,
                                       δ(PA) = δ(A).
Proof. By induction it will be enough to prove the result when P is the signed
permutation matrix corresponding to a transposition, (r, s) say, where r < s. Denote
this matrix by P_0 and take a Bruhat decomposition (9.2.8) of A, where the core is
factorized as in (9.2.9):
                                       A = LDPU,
and denote the (s, r)-entry of L by b. We have
where L', D' are again lower unitriangular and diagonal respectively, and D' differs
from D by an interchange of the r-th and s-th diagonal elements. If c = 0 (which is
the case iff b = 0) or if the permutation corresponding to P_0P preserves the order of
r, s, then the matrix on the right takes the form L'D'P_0PB_rs(c)U; this is again in
Bruhat normal form and so we have δ(P_0A) = δ(A) in this case.
   Suppose now that c ≠ 0 and that P_0P inverts the order r, s. Then the formula
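    \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
      = \begin{pmatrix} 1 & 0 \\ c^{-1} & 1 \end{pmatrix}
        \begin{pmatrix} c & 0 \\ 0 & c^{-1} \end{pmatrix}
        \begin{pmatrix} 1 & -c^{-1} \\ 0 & 1 \end{pmatrix}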
shows that, on writing D_i(c) for the matrix differing from the unit matrix only in the
(i, i)-entry, which is c, we have
                    B_rs(c)P_0 = B_sr(c^{-1})D_r(c)D_s(c^{-1})B_rs(-c^{-1}).
Inserting this in the expression (9.2.13) for P_0A, we find
Theorem 9.2.6. For any skew field K, the determinant function δ is a homomorphism
giving rise to an exact sequence
unitriangular, so BA and A have the same core and hence the same determinant. If
i < j, let P_0 be the signed permutation matrix corresponding to the transposition
(i, j). Using Lemma 9.2.5 and the case just proved, we have
hence the result holds generally. The same argument applies for multiplying by an
elementary matrix on the right. Now Lemma 9.2.1 shows that
This shows δ to be a homomorphism. Clearly its image is K^ab; its kernel includes
GL(K)' = E(K) because K^ab is abelian. Conversely, if δ(A) = 1, then the core of
A has the form DP, where P is a signed permutation matrix and so 1 = δ(A) =
δ(D). By (9.2.7) we can apply elementary operations to reduce D to the form D_1(c),
where c is the product of the diagonal elements of D. But by hypothesis c = 1, hence
D has been reduced to I and so A ∈ E_n(K), as we wished to show.                  •
   Recently another more general form, the quasideterminant, has been defined by
Izrail Gelfand and Vladimir Retakh [1997], which is essentially a rational expression
defined recursively in terms of the (n - 1) × (n - 1) submatrices.
Exercises
1. Show that the transpose of every invertible matrix over a field K is invertible iff K
   is commutative. (Hint. Try a 2 × 2 matrix with (1, 1)-entry 1.)
2. Use Theorem 9.2.2 to show that GL_n(R) = D_n(R)E_n(R) for any local ring R.
3. Show that GL_2(F_2) = E_2(F_2) = Sym_3.
4. Let K be a skew field. Show that A ∈ GL_n(K) can be written as A = LDPU, where
   L is lower, U is upper unitriangular, D is diagonal, P is a permutation matrix and
   PUP^{-1} is also upper triangular. Moreover, such a representation is unique (Draxl
   (1983); this is known as the strict Bruhat normal form).
5. Show that a homomorphism GL_n(K) → Sym_n can be defined by associating with
   A ∈ GL_n(K) the permutation matrix P from the representation A = LDPU.
6. Show that if P is the permutation matrix obtained by applying a permutation σ to
   the rows of I, then it can also be obtained by applying σ^{-1} to the columns of I.
7. Let K be a skew field with centre C. Show that δ restricted to C reduces to the
   usual determinant, provided that no element of C other than 1 is a product of
   commutators.
r ≤ s. For if r > s, we can adjoin zero columns to A and zero rows to B to obtain
square matrices. Now we have
Comparing (r, r)-entries, we obtain 0 = 1, which contradicts the fact that R is non-
trivial.
Lemma 9.3.1. Let R be a non-trivial weakly finite ring and consider a partitioned
matrix over R:
and these transformations leave the inner rank unchanged. If ρA = s, this matrix can
be written as
   The existence proof for free fields is based on a lemma of independent interest, the
specialization lemma. This may be regarded as an analogue of the GPI-theorem
(Theorem 7.8.3), which is used in the proof; its commutative counterpart is the
elementary result that a polynomial vanishing for all values in an infinite field
must be the zero polynomial. In the proof we shall need the result that any full
matrix over the tensor ring D⟨X⟩ remains full over the formal power series ring
D⟨⟨X⟩⟩ (Lemma 5.9.4 of Cohn (1985) or Proposition 6.2.2 of Cohn (1995)).
Lemma 9.3.2 (Specialization lemma). Let D be a skew field with infinite centre C
and such that [D : C] is infinite. Then any full matrix over the tensor ring D_C⟨X⟩ is
invertible for some choice of X in D.
Proof. Let A = A(x) be any full n × n matrix over D_C⟨X⟩ and denote by r the
supremum of its ranks as its arguments range over D. We have to show that
holds over D[[t]], for all a ∈ D^X. This means that the matrix
(9.3.3)
vanishes when the elements of X are replaced by any values in D. Now (9.3.3) is a
power series in t with coefficients that are matrices over D_C⟨X⟩. Thus the coefficients
are generalized polynomial identities (or identically 0), so by Amitsur's GPI-theorem
(Theorem 7.8.4), the expression (9.3.3) vanishes as a matrix over D_C⟨X⟩[[t]]. It
follows that for r < n, A(tx) is non-full over D_C⟨X⟩[[t]]. Hence we can write
A(tx) as a product PQ, where P is n × r and Q is r × n and P, Q have entries from
D_C⟨X⟩[[t]]; putting t = 1, we obtain a corresponding factorization
                                         A(x) = PQ                                (9.3.4)
over D_C⟨⟨X⟩⟩ which by the result quoted can be taken over D_C⟨X⟩. Thus A(x) is
non-full over D_C⟨X⟩, a contradiction, which proves the result.                     •
   In this lemma the condition [D : C] = ∞ is clearly necessary; whether the
condition that C be infinite is needed is not known.
   We can now prove the existence of free fields:
Theorem 9.3.3. Let D be a skew field with centre C and X any set. Then D_C⟨X⟩ can be
embedded in a field U, generated by D_C⟨X⟩, such that every full matrix over D_C⟨X⟩
becomes invertible over U.
Proof. Suppose first that [D : C] = ∞, |C| = ∞, and consider the mapping
                                       D_C⟨X⟩ → D^{D^X},
where p ∈ D_C⟨X⟩ is mapped to (p_f), with p_f = p(xf), for any f ∈ D^X. With each
square matrix A over D_C⟨X⟩ we associate a subset 𝒟(A) of D^X defined by
𝒟(A) is called the singularity support of A. Of course 𝒟(A) = ∅ unless A is full, but
by Lemma 9.3.2, 𝒟(A) ≠ ∅ whenever A is full. If P, Q are any invertible matrices,
then P ⊕ Q is invertible, hence A(x) ⊕ B(x) becomes singular precisely when A(x) or
B(x) becomes singular, thus
                              𝒟(A) ∩ 𝒟(B) = 𝒟(A ⊕ B).
It follows that the family of sets 𝒟(A), where A is full, is closed under finite
intersections. Hence it is contained in an ultrafilter ℱ on D^X (see Section 1.5), and we have
a homomorphism to an ultrapower
(9.3.5)
where by definition, every full matrix A over D_C⟨X⟩ is invertible on 𝒟(A) and so is
invertible in the ultrapower. Hence the subfield of the ultrapower generated by
D_C⟨X⟩ is the required field U.
   In the general case we take indeterminates r, s, t and define D_1 = D(r),
D_2 = D_1(s). On D_2 we have an automorphism α: f(s) ↦ f(rs), with fixed field
D_1. We now form E = D_2(t; α); the centre of E is the centre of D_1, namely C(r)
(see Theorem 7.3.6). This is infinite and E has infinite dimension over C(r), because
the powers s^n are linearly independent. It is clear that we have an embedding
D_C⟨X⟩ → E_{C(r)}⟨X⟩, and it follows from the inertia lemma (Lemma 8.7.3) that this
embedding is honest, i.e. full matrices are mapped to full matrices. Hence on
taking the field U constructed earlier for E_{C(r)}⟨X⟩, we obtain a field over which
every full matrix over D⟨X⟩ becomes invertible.                                    •
   The field U whose existence has been established in Theorem 9.3.3 is denoted by
D_C(⟨X⟩) and is called the universal field of fractions of D_C⟨X⟩ or also the free field
over D with centre C (its centre can be shown to be C). The existence proof for
free fields goes back to Shimshon Amitsur [1966], who used his results on
generalized rational identities. The existence of such a universal field of fractions can be
proved more generally for any tensor ring K_L⟨X⟩, where K is any skew field and L
any subfield of K. This is a special case of the fact that every semifir has a universal
field of fractions over which every full matrix can be inverted (see Cohn (1985),
Chapter 7).
   We remark that any automorphism of D_C⟨X⟩ is honest and therefore extends to an
automorphism of U. Further, by representing derivations as homomorphisms from
D_C⟨X⟩ to U_2 we see that for any automorphisms α, β of U, any (α, β)-derivation of
D_C⟨X⟩ extends to one of U.
Exercises
1. Show that the n × n unit matrix over a ring R is full iff R^n cannot be generated by
   less than n elements. Show also that a non-trivial weakly finite ring has IBN.
2. Let E = k(t) be the field of rational functions in one variable t, with the
   endomorphism α_r: f(t) ↦ f(t^r) (r > 1). Show that the subalgebra of E[x; α_r]
   generated by x and y = xt is free on x, y. Using Exercise 2 of Section 7.3,
   obtain for each r > 1 a field of fractions L_r of the free algebra F. Show that these
   fields are non-isomorphic as F-rings (J. L. Fisher).
3. Show that an endomorphism θ of D_C⟨X⟩ can be extended to an endomorphism of
   the free field iff θ is honest.
4. Show that every honest endomorphism is injective; give an example of an
   endomorphism of the free algebra k⟨X⟩ which is injective but not honest.
5. Verify that over a skew field the inner rank agrees with the rank.
6. Let K be a skew field with infinite centre. Show that for any square matrix A over
   K there is an element a in K such that A - aI is non-singular. For a finite field F
   find a matrix A such that A - xI is singular for all values of x in F (for infinite
   fields with finite centre the question remains open).
7. Show that over a PID, a square matrix is regular iff it is full. Give an example of a
   square matrix over a free algebra which is regular but not full. (Hint. Try a 3 × 3
   matrix with a 2 × 2 block of zeros.)
and it is easily verified that V is a valuation ring on K. In this way valuation rings on
K and valuations correspond to each other; to make the correspondence bijective we
define two valuations v, v' on K with value groups Γ, Γ' to be equivalent if there is an
order-preserving isomorphism φ: Γ → Γ' such that
Theorem 9.4.1. On any field K there is a natural bijection between valuation rings and
equivalence classes of valuations on K.
Proof. This is an easy consequence of the above remarks and may be left to the
reader to prove.                                                            •
(9.4.3)
When V is principal, Γ is infinite cyclic, so then the horizontal sequences split and
                                                                                   (9.4.4)
For this reason the rows of (9.4.3) add nothing to our knowledge in the cases usually
encountered, but in general, especially with a non-abelian value group, (9.4.3)
provides more information about K^×.
  In constructing valuations it is helpful to know that any valuation on an Ore
domain has a unique extension to its field of fractions.
Proposition 9.4.2. Let R be a right Ore domain with field of fractions K. If v is a
valuation on R satisfying V.1-V.3, then v has a unique extension to K.
Proof. If v is to have an extension to K, then for p = as^{-1} ∈ K we must have
                                   v(p) = v(a) - v(s).                             (9.4.5)
Suppose that as^{-1} = a_1s_1^{-1}. Then there exist u, u_1 ∈ R such that su_1 = s_1u ≠ 0,
au_1 = a_1u; hence a = 0 ⇔ a_1 = 0, and when a, a_1 ≠ 0, then -v(a_1) + v(a) =
v(u) - v(u_1) = -v(s_1) + v(s) and so v(a) - v(s) = v(a_1) - v(s_1). This shows the
definition (9.4.5) to be independent of the particular representation as^{-1} of p.
Now it is easily verified that v so defined satisfies V.1-V.3 on K.                •
Examples of valuations
1. Let K be any field with a valuation v. We can extend v to the rational function
   field K(x) by defining v on any polynomial f = Σ x^i a_i (a_i ∈ K) by the rule
                                   v(f) = min{v(a_i)},                             (9.4.6)
   and using Proposition 9.4.2 to extend v to K(x). This is called the Gaussian
   extension of v; the value group is unchanged while the residue class field
   undergoes a simple transcendental extension. The same construction works if instead of
   K(x) we use K(x; α), the skew function field with respect to an automorphism α
   of K, provided that v(a^α) = v(a) for all a ∈ K.
9.4 Valuations on skew fields                                                         361
   we obtain an extension, called the x-adic extension (provided that δ > 0), with the
   same residue class field and enlarged group.
2. Consider the free field k(⟨x, y⟩); let E be the subfield generated over k by
   y_i = x^{-i}yx^i (i ∈ Z). The conjugation by x defines an automorphism α which
   maps E into itself, the 'shift automorphism' y_i ↦ y_{i+1}. Moreover, k(⟨x, y⟩) may
   be obtained as the skew function field E(x; α). Taking the x-adic extension of
   the trivial valuation on E, we obtain a principal valuation on k(⟨x, y⟩) with residue
   class field E. We note that whereas the general aim of valuation theory is to obtain
   a simpler residue class field, E is actually more complicated than the original field.
   As we shall see, in order to simplify the residue class field we must allow a more
   complicated value group.
3. In a skew field it may happen that an element is conjugate to its inverse:
   y^{-1}xy = x^{-1}. For example, let α be the automorphism of the rational function
   field F = k(x) defined by f(x) ↦ f(x^{-1}) and put E = F(y; α). If v is any valuation
   on E, then v(x) = 0, for if v(x) ≠ 0, say v(x) > 0, then v(x^{-1}) < 0, but x^{-1} =
   y^{-1}xy, hence v(x^{-1}) = -v(y) + v(x) + v(y) > 0, a contradiction. In fact fields
   exist in which every element outside the centre is conjugate to its inverse (e.g.
   the existentially closed fields, see Cohn (1995)). It is clear that such a field can
   have no non-trivial valuation.
One of the main tools of the commutative theory is Chevalley's extension theorem
(see BA, Section 9.5), which allows one to construct extensions for any valuations
defined on a subfield. Such a result is not to be expected in general, but an analogue
exists when the value group is abelian, and it is no harder to prove.
   A valuation on a field K is said to be abelian if its value group is abelian. For any
field K we denote the derived group of K^× by K^c. It is clear that any abelian
valuation on K is trivial on K^c. As an almost immediate consequence we have
Lemma 9.4.3. Let K be a skew field with a valuation v and valuation ring V. Then v is
abelian if and only if V ⊇ K^c or equivalently, v(a) = 0 for all a ∈ K^c. Moreover, any
subring A of K such that A ⊇ K^c is invariant, and any ideal in A is invariant.
Proof. The second sentence follows from the fact that v is abelian iff the unit group
of V contains K^c. To prove the last sentence, take any a ∈ A^×, b ∈ K^×; then
b^{-1}ab = a·a^{-1}b^{-1}ab ∈ A, and similarly for any ideal of A.                 •
   To state the analogue of Chevalley's theorem (Lemma 9.4.4) we require the notion
of domination. On any field K we consider the pairs (R, a) consisting of a subring R
of K and a proper ideal a of R. Given two such pairs P_i = (R_i, a_i) (i = 1, 2), we say
that P_1 dominates P_2, in symbols P_1 ≥ P_2, if R_1 ⊇ R_2 and a_1 ⊇ a_2, and write P_1 > P_2
(as usual) for proper domination, i.e. to exclude equality. If the pair (R, a) is such
that R ⊇ K^c, then every element of K^c is a unit in R (because K^c is a group), and so
K^c ∩ a = ∅. The essential step in our construction is the following
Lemma 9.4.4. Let K be a skew field, R a subring containing K^c and a a proper ideal in
R. Then there is a subring V with a proper ideal m such that (V, m) is maximal among
pairs dominating (R, a), and any such maximal pair (V, m) consists of a valuation ring
and its maximal ideal.
Proof. This is quite similar to the commutative case (BA, Lemma 9.4.3); we briefly
recall it to show where changes are needed.
   The pairs dominating (R, a) form an inductive family, so a maximal pair exists by
Zorn's lemma. If (V, m) is a maximal pair, then m is a maximal ideal in V, and since
V ⊇ K^c, V and m are invariant. To show that V is a total subring in K, take c ∈ K; if
c ∉ V, then V[c] ⊃ V, so if the ideal m' generated by m in V[c] is proper, we have
(V[c], m') > (V, m), contradicting the maximality of the latter. Hence m' = V[c]
and we have an equation
(9.4.8)
Here we were able to collect powers of c on the right of each term because of the
invariance of m, using the equation cb = cbc^{-1}·c.
  Similarly if c^{-1} ∉ V, we have
(9.4.9)
We assume that m, n are chosen as small as possible and suppose that m ≥ n, say.
Multiplying (9.4.9) on the right by c^m, we obtain
                                                                                 (9.4.10)
By the invariance of V, xc = c·x^γ for all x ∈ V, where γ = γ(c) is an automorphism
of V which maps m into itself. If we multiply (9.4.8) by 1 - b_0 on the left and (9.4.10)
by a_m^{γ^m} on the right and substitute into (9.4.8), we obtain an equation of the same
form as (9.4.8) but of degree less than m, a contradiction. This proves V to be total,
and hence a valuation ring; from the maximality it is clear that m is the maximal
ideal of V.                                                                           •
v, with maximal ideal m, then mL^c is a proper ideal in VL^c, and by Lemma 9.4.4 there
is a maximal pair (W, n) dominating (VL^c, mL^c). Now W is a valuation ring
satisfying W ∩ K ⊇ V, n ∩ K ⊇ m, hence W ∩ K = V and so W defines the desired
extension.                                                                              •
   To make valuations more tractable we shall require an abelian value group and
commutative residue class field. It is convenient to impose an even stronger con-
dition, as in the next result:
Proposition 9.4.6. Let K be a skew field with a valuation v, having valuation ring V,
maximal ideal m and group of 1-units U_1. Then the following conditions are equivalent:
(a) K^×/U_1 is abelian,
(b) K^c ⊆ 1 + m = U_1,
(c) v(1 - a) > 0 for all a ∈ K^c.
Moreover, when (a)-(c) hold, then the value group and residue class field are
commutative.
Proof. This is an almost immediate consequence of the definitions, and the last
sentence is clear from a glance at the diagram (9.4.3).                     •
Theorem 9.4.7. Let K be a skew field with a quasi-commutative valuation v, and let L
be an extension field of K. Then v can be extended to a quasi-commutative valuation of
L if and only if there is no equation in L of the form
                                                                                  (9.4.12)
because v(a_i) > 0, w(q_j - 1) > 0. It follows that no equation of the form (9.4.12) can
hold. Conversely, assume that there is no equation (9.4.12) and consider the set q of
all expressions Σ a_ip_i + Σ b_j(q_j - 1), where a_i, b_j, p_i, q_j are as before. It is clear that
q is closed under addition and contains the maximal ideal corresponding to the
valuation v. Moreover, q is invariant in L, i.e. u^{-1}qu = q for all u ∈ L^×, because
u^{-1}a_ip_iu = a_i·a_i^{-1}u^{-1}a_iu·u^{-1}p_iu ∈ VL^c, and similarly for the other terms. In the
same way we verify that q admits multiplication. We now define
   Let D be a skew field with centre C and let X be any set. If D has a
quasi-commutative valuation v, one can use Theorem 9.4.7 to extend v to a
quasi-commutative valuation of the free field D(⟨X⟩), but this requires more detail on how
free fields are formed. In essence one uses the specialization lemma to show that if
there is an equation (9.4.12), then X can be specialized to values in D so as to
yield an equation (9.4.12) in D, which is a contradiction (see Cohn [1987], [1989]).
Exercises
1. Show that any total subring of a field is a local ring.
2. A place of a field K in another, L, is defined as a mapping f: K → L ∪ {∞} such
   that f^{-1}(L) = V is an invariant subring of K, the restriction f|V is a
   homomorphism and xf = ∞ implies x ≠ 0 and (x^{-1})f = 0. Show that V is a valuation
   ring and that conversely, every valuation ring on K leads to a place of K in the
   residue class field of V. Define a notion of equivalence of places and show that
   there is a natural bijection between valuation rings on K and equivalence classes
   of places.
3. Verify that a valuation on a field K has value group Z iff its valuation ring is a
   PID.
4. Let F = D_C⟨X⟩ and denote by U the free field D_C(⟨X⟩). Form U(t) with a central
   indeterminate t and define a homomorphism λ: U → U(t) as the identity on D
   and mapping x ∈ X to xt. Let v_0 be the t-adic valuation on U(t) over U (i.e. trivial
   on U) and put v(p) = v_0(pλ) for p ∈ U. Verify that v is a valuation on U; find its
   value group and residue class field.
5. Let K be a field with a valuation v whose value group Γ is a subgroup of R. Define
   an extension of v to the rational function field K(x) by (9.4.7), where δ ∈ R, δ > 0,
   and find the new value group and residue class field. Distinguish the cases
   Γ ∩ δZ = {0}, Γ ∩ δZ ≠ {0}.
6. Let E be the field of fractions of the Weyl algebra on k (generated by u, v with
   uv - vu = 1), where char k = 0. Writing t = u^{-1}, verify that vt = t(v + t). Show
   that the t-adic valuation on E is quasi-commutative.
9.5 Pseudo-linear extensions                                                         365
   There are a number of cases where the left and right dimensions of a field
extension are equal. In the first case equality holds for an extension E/D if E is
finite-dimensional over its centre.
Proposition 9.5.2. Let E/D be a skew field extension and assume that E is
finite-dimensional over its centre. Then
                                    [E : D]_L = [E : D]_R,                        (9.5.1)
Now A is a subalgebra of the division algebra E, so A is also a skew field and by
Proposition 9.5.1,
                      [E : C] = [E : A]_L[A : C] = [E : A]_R[A : C].              (9.5.3)
Since [E : C] is finite, so is [A : C]; dividing (9.5.3) by [A : C], we find that
[E : A]_L = [E : A]_R. If we now multiply by (9.5.2) and use Proposition 9.5.1 and its
left-right dual, we obtain the required formula (9.5.1).                        •
Theorem 9.5.4. If E/D is a skew field extension, then (9.5.4) holds whenever either side
is finite, provided that either (i) D is commutative or (ii) E or D is finite-dimensional
over its centre.
Proof. It only remains to treat the case where D is finite-dimensional over its centre.
Let Z be this centre, and denote the centre of E by C; further assume that [E : D]_L is
finite, so
                                  [E : Z]_L = [E : D]_L[D : Z],                   (9.5.5)
and this is also finite. Denote by K the subfield generated by C and Z; clearly K
is commutative and [E : Z]_L = [E : K]_L[K : Z]_L, so [E : K]_L is finite, hence by
Proposition 9.5.3, [E : K]_L = [E : K]_R; now [K : Z]_L = [K : Z]_R because K is
commutative, and by combining these equalities, we find that [E : Z]_L = [E : Z]_R.
Now (9.5.4) follows from this equation, combined with (9.5.5) and its right-hand
analogue.                                                                      •
and
                        u^2 + uλ + μ = 0 for certain λ, μ ∈ D.                    (9.5.7)
Here c^α, c^δ are uniquely determined by c and a calculation as in Section 7.3 shows α
to be an endomorphism of D and δ an α-derivation. Moreover, the structure of E is
completely determined by D and (9.5.6), (9.5.7).
   Conversely, if D is any field with an endomorphism α and an α-derivation δ, then
for given λ, μ ∈ D it is possible to write down necessary and sufficient conditions for
a quadratic extension of D to be defined by (9.5.6) and (9.5.7) (see Exercise 3).
What has been said shows that every extension of right dimension 2 is pseudo-linear;
in higher dimensions the pseudo-linear extensions form a special class. We note the
following formula for the left dimension:
Proof. Take a generator u of E/D and write E_0 = D, E_i = uE_{i-1} + D (i ≥ 1). Then by
induction on r we have
                                E_r = D + uD + ... + u^rD.
Moreover, each E_i is a left D-module, by (9.5.6), so we have a tower of left
D-modules
                               D = E_0 ⊂ E_1 ⊂ ... ⊂ E_{n-1} = E,
and (9.5.9) will follow if we prove
(9.5.11)
where λ_0, λ_1, ..., λ_{r-1} range independently over I, form a basis of E_r (mod E_{r-1}).
This will prove (9.5.10) and hence (9.5.9).
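  Any c ∈ D can be written as a linear combination of the e_λ with coefficients in D^α,
say c = Σ c_{λ_0}^α e_{λ_0}. If we repeat the process on c_{λ_0} we obtain
c_{λ_0} = Σ c_{λ_0λ_1}^α e_{λ_1}. Hence

    c = \sum c_{\lambda_0 \dots \lambda_{r-1}}^{\alpha^r}
        e_{\lambda_{r-1}}^{\alpha^{r-1}} \cdots e_{\lambda_1}^{\alpha} e_{\lambda_0}.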
Therefore
Hence the elements (9.5.11) span E_r (mod E_{r-1}). To prove their linear
independence, assume that

    \sum u^r c_{\lambda_0 \dots \lambda_{r-1}}^{\alpha^r}
        e_{\lambda_{r-1}}^{\alpha^{r-1}} \cdots e_{\lambda_0} \equiv 0 \pmod{E_{r-1}}.

Since the e_λ are left linearly independent over D^α, we can equate the coefficients of
e_{λ_0} to 0 and using induction on r we find that c_{λ_0...λ_{r-1}} = 0.                •
  In order to obtain a quadratic extension satisfying (9.5.6) and (9.5.7) let us assume
that λ = 0, αδ + δα = 0, and that δ^2 is the inner α^2-derivation induced by -μ.
Further assume that μ^α = μ, μ^δ = 0 and that D contains no element a satisfying
                              a·a^α + a^δ + μ = 0.                                (9.5.12)
  We form the skew polynomial ring R = D[t; α, δ] and consider f = t^2 + μ. For
any c ∈ D we have

    cf = t^2 c^{\alpha^2} + \mu c^{\alpha^2} - c^{\delta^2} + c^{\delta^2}
       = f c^{\alpha^2}.
one obtains a field P containing E in which x has a $2^n$-th root for all $n \ge 0$, so the
mapping $x \mapsto x^2$, $y \mapsto -y$ is an automorphism of P and the restriction to E is
the required endomorphism. This is fairly plausible, but we shall not give a formal
proof here (see Cohn (1995), Section 5.9).
    It is easily checked that $\alpha$ maps D into itself; moreover D admits the inner $\alpha$-derivation
$\delta : a \mapsto ay - ya^\alpha$ induced by y. If we can show that $y \notin D$ and that $\alpha$ is not
an automorphism of D, we have a quadratic extension E/D with left dimension > 2;
in fact we shall find that $[E : D]_L = \infty$.
    Consider the $(\alpha, 1)$-derivation $\gamma$ on E such that $x^\gamma = 0$, $y^\gamma = 1$. We have
$(ab)^\gamma = a^\gamma b + a^\alpha b^\gamma$, by definition, hence $x^\gamma = 0$, $(y^2)^\gamma = y + y^\alpha = 0$,
$(xy - yx^2)^\gamma = x^2 - x^2 = 0$. Hence $\gamma$ vanishes on D, but $y^\gamma = 1$, so $y \notin D$.
    Finally to show that $[E : D]_L = \infty$, we first note that if $[D : D^\alpha]_L = n$, then
$[E : E^\alpha]_L$ is finite. For let $u_1, \dots, u_n$ be a left $D^\alpha$-basis of D. We claim that
$u_i$, $u_iu_j$, $u_iyu_j$ span E as left $E^\alpha$-space. Any $p \in E$ has the form $p = a + yb$, where
$a, b \in D$, say $a = \sum a_i^\alpha u_i$, $b = \sum b_i^\alpha u_i$; hence
$$p = \sum a_i^\alpha u_i + y \sum b_i^\alpha u_i = \sum a_i^\alpha u_i + \sum b_i y u_i - \sum b_i^\delta u_i = \sum a_i^\alpha u_i + \sum b_{ij}^\alpha u_i y u_j - \sum c_{ij}^\alpha u_i u_j,$$
for suitable $b_{ij}, c_{ij} \in D$. This
proves our claim; in particular, it shows that $[E : E^\alpha]_L \le 2n^2 + n$, if $[D : D^\alpha]_L = n$.
Now $E^\alpha = k(\langle x^2, y \rangle)$ and so the elements $xy^r$ (r = 1, 2, ...) are left $E^\alpha$-linearly
independent. This is intuitively plausible and can be proved with the methods of Cohn
(1977), Lemma 5.5.5. It follows that $[E : E^\alpha]_L = \infty$ and by Proposition 9.5.5,
$[E : D]_L = \infty$.
Exercises
1. Show that every cyclic division algebra may be described as a pseudo-linear exten-
   sion of a maximal commutative subfield.
2. Use the methods of this section to construct a field extension of right dimension n
   and infinite left dimension.
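3. Show that (9.5.6) and (9.5.7) define an algebra of right dimension 2 over D iff
   $c^{\alpha\delta} + c^{\delta\alpha} = \lambda c^{\alpha^2} - c^\alpha\lambda$, $c^{\delta^2} + c^\delta\lambda = \mu c^{\alpha^2} - c\mu$, $\lambda^\delta = \mu - \mu^\alpha - \lambda(\lambda - \lambda^\alpha)$,
   $\mu^\delta = \mu(\lambda^\alpha - \lambda)$, and this extension is a field iff $c c^\alpha + c\lambda + \mu \ne 0$ for all $c \in D$.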
The theory of error-correcting codes deals with the design of codes which will detect,
and if possible correct, any errors that occur in transmission. Codes should be dis-
tinguished from cyphers, which form the subject of cryptography. The subject of
codes dates from Claude Shannon's classic paper on information theory (Shannon
[1948]) and Section 10.1 provides a sketch of the background, leading up to the
statement (but no proof) of Shannon's theorem. Most of the codes dealt with are
block codes which are described in Section 10.2, with a more detailed account of
special cases in Sections 10.3-10.5; much of this is an application of the theory of
finite fields (see BA, Section 7.8).
    Many of our codes are binary: q = 2. For example, in the game of 'twenty questions'
an object has to be guessed by asking 20 questions which can be answered 'yes'
or 'no'. This allows one to pick out one object in a million (since $2^{20} \approx 10^6$). Usually
a binary code will have the alphabet {0, 1}; our coded message will then be a string
of 0's and 1's. As a simple check we can add 1 when the number of 1's in the message
is odd and 0 when it is even. If the received message contains seven 1's we know that
a mistake has occurred and we can ask for the message to be repeated (if this is
possible). This is a parity check; it will show us when an odd number of errors
occurs, but it does not enable us to correct errors, as is possible by means of
more elaborate checks. Before describing ways of doing this we shall briefly discuss
the question of information content, although strictly speaking this falls outside our
topic. The rest of this section will not be used in the sequel and can be omitted with-
out loss of continuity.
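
The parity check just described can be sketched in a few lines of Python. This is only an illustration: the function names `add_parity_bit` and `parity_ok` and the sample message are our own choices, not part of the text.

```python
def add_parity_bit(bits):
    """Append 1 if the number of 1's is odd and 0 if it is even,
    so that every transmitted word has an even number of 1's."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """True when the received word contains an even number of 1's."""
    return sum(word) % 2 == 0


message = [1, 0, 1, 1, 0, 1, 0]          # four 1's, so the check digit is 0
sent = add_parity_bit(message)

one_error = sent.copy()
one_error[2] ^= 1                        # flip one digit
two_errors = one_error.copy()
two_errors[5] ^= 1                       # flip a second digit

print(parity_ok(sent), parity_ok(one_error), parity_ok(two_errors))
# True False True
```

A single error is noticed, while two errors cancel in the check; this is the sense in which the parity check shows only an odd number of errors.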
   It is intuitively clear that the probability of error can be made arbitrarily small by
adding sufficiently many checks to our message, and one might think that this will
make the transmission rate also quite small. However, a remarkable theorem due to
Shannon asserts that every transmission channel has a capacity C, usually a positive
number, and for any transmission rate less than C the probability of error can be
made arbitrarily small. Let us briefly explain these terms.
   The information content of a message is determined by the likelihood of the event
it describes. Thus a message describing a highly probable event (e.g. 'the cat is on the
mat') has a low information content, while for an unlikely message ('the cow jumped
over the moon') the information content is large. If the probability of the event
described by the message is $p$, where $0 \le p \le 1$, we shall assign as a measure of
information $-\log_2 p$. Here the minus sign is included to make the information
positive and the logarithm is chosen to ensure that the resulting function is additive.
If independent messages occur with probabilities $p_1, p_2$ then the probability that both
occur is $p_1p_2$ and here the information content is
$$-\log p_1p_2 = -\log p_1 - \log p_2.$$
All logs are taken to the base 2 and the unit of information is the bit (binary digit).
Thus if we use a binary code and 0, 1 are equally likely, then each digit carries the
information -log (1/2) = 1, i.e. one bit of information.
   Suppose we have a channel transmitting our binary code in which a given message,
consisting of blocks of k bits, is encoded into blocks of n bits; the information rate of
this system is defined as
R = k/n. (10.1.1)
We assume further that the probability of error, x say, is the same for each digit; this
is the binary symmetric channel. When an error occurs, the amount of information
lost is -log x, so on average the information lost per digit is -x. log x. But when no
error occurs, there is also some loss of information (because we do not know that no
error occurred); this is -log (1 - x). The total amount of information lost per digit
is therefore
                       H(x) = -x. log x - (1 - x). log (1 - x).
10.2 Block codes                                                                  373
This is also called the entropy, e.g. H(0.1) = 0.469, H(0.01) = 0.0808. The channel
capacity, in bits per digit, is the amount of information passed, i.e.
                                   C(x) = 1 - H(x).
Thus C(0.1) = 0.531, C(0.01) = 0.9192. We note that C(0) = 1; this means that for
x = 0 there is no loss of information. By contrast, C(1/2) = 0; thus when there is an
even chance of error, no information can be sent. The fundamental theorem of
coding theory, proved by Shannon in 1948, states that for any $\varepsilon, \delta > 0$, there exist
codes with information rate R greater than $C(x) - \varepsilon$, for which the probability of
error is less than $\delta$. In other words, information flows through the channel at
nearly the rate C(x) with a probability of error that can be made arbitrarily small.
Here the information rate of the code is represented by (10.1.1). More generally,
if there are M different code words, all of length n, then $R = (\log M)/n$. For a
binary code there are $2^k$ different words of length k, so log M = k and the rate
reduces to k/n.
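
The numerical values quoted above are easily checked. The following sketch assumes nothing beyond the formulae for H(x), C(x) and R given in the text; the function names are ours, and the convention H(0) = H(1) = 0 is the usual limiting value.

```python
from math import log2


def entropy(x):
    """H(x) = -x log x - (1 - x) log(1 - x), logs to base 2."""
    if x in (0.0, 1.0):
        return 0.0                      # limiting value, since t log t -> 0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def capacity(x):
    """C(x) = 1 - H(x) for the binary symmetric channel."""
    return 1.0 - entropy(x)


def rate(M, n):
    """Information rate (log M)/n of a code with M words of length n."""
    return log2(M) / n


for x in (0.1, 0.01, 0.5):
    print(x, round(entropy(x), 4), round(capacity(x), 4))
```

For M = 2^k the function `rate` returns k/n, in agreement with (10.1.1).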
Exercises
1. How many questions need to be asked to determine one object in $10^9$ if the reply
   is one of three alternatives?
2. In the binary symmetric channel with probability of error 1, no information is
   lost, i.e. C(1) = 1. How is this to be interpreted?
3. If n symbols are transmitted and the probability of error in each of them is x,
if we do not wish to specify M. For successful decoding we have to ensure that the
code words are not too close together; if d(x, y) is large, this means that x and y differ
in many of the n places, and x is unlikely to suffer so many changes in transmission
that y is received. Thus our aim will be to find codes for which d is large. Our first
result tells us how a large value of d allows us to detect and correct errors. A code is
said to be r-error-detecting (correcting) if for any word differing from a code word u
in at most r places we can tell that an error has occurred (resp. find the correct code
word u).
   We shall define the r-sphere about a word $x \in Q^n$ as the sphere of radius r with
centre x:
$$B_r(x) = \{y \in Q^n \mid d(x, y) \le r\}. \qquad (10.2.1)$$
Clearly it represents the set of all words differing from x in at most r places.
Proposition 10.2.1. A code with minimum distance d can (i) detect up to d - 1 errors,
and (ii) correct up to $[(d - 1)/2]$ errors.
     Here $[\xi]$ denotes the greatest integer $\le \xi$.
Proof. (i) If x is a code word and s errors occur in transmission, then the received
word x' will be such that d(x, x') = s. Hence if 0 < s < d, x' cannot be a code
word and it will be noticed that an error has occurred.
   (ii) If $e \le [(d - 1)/2]$, then $2e + 1 \le d$ and it follows that the e-spheres about the
different code words are disjoint. For if x, y are code words and $u \in B_e(x) \cap B_e(y)$,
then
$$2e \ge d(x, u) + d(u, y) \ge d(x, y) \ge d \ge 2e + 1,$$
which is a contradiction. Thus for any word differing from a code word x in at most
e places there is a unique nearest code word, namely x.                          •
   For example, the parity check mentioned in Section 10.1 has minimum distance 2
and it will detect single errors, but will not correct errors.
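
Proposition 10.2.1 can be illustrated by computing the minimum distance of a small code directly. The sketch below is ours: words are written as strings, and the two sample codes (the length-3 parity check code and the binary repetition code of length 5) are chosen only for illustration.

```python
from itertools import combinations


def distance(x, y):
    """Number of places in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def minimum_distance(code):
    """Least distance between two distinct code words."""
    return min(distance(x, y) for x, y in combinations(code, 2))


def capabilities(d):
    """(errors detected, errors corrected) for minimum distance d."""
    return d - 1, (d - 1) // 2


parity_code = ["000", "011", "101", "110"]
repetition_code = ["00000", "11111"]

print(capabilities(minimum_distance(parity_code)))       # (1, 0)
print(capabilities(minimum_distance(repetition_code)))   # (4, 2)
```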
   Proposition 10.2.1 puts limits on the number of code words in an error-correcting
code. Given n, d, we denote by A(n, d) or Aq(n, d) the largest number of code words
in a q-ary code of length n and minimum distance d; thus Aq(n, d) is the largest M
for which a q-ary (n, M, d)-code exists. A code for which this maximum is attained
is also called an optimal code; any optimal (n, M, d)-code is necessarily maximal, i.e.
it is not contained in an (n, M + 1, d)-code.
   To obtain estimates for Aq(n, d) we need a formula for the number of elements in
Br(x). This number depends on q, n, r but not on x; it is usually denoted by Vq(n, r).
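To find its value let us count the number of words at distance i from x. These
words differ from x in i places, and the values at these places can be any one of
the $q - 1$ other symbols; hence there are $\binom{n}{i}(q - 1)^i$ such words and
$$V_q(n, r) = \sum_{i=0}^{r} \binom{n}{i}(q - 1)^i.$$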
A set of spheres is said to cover Q or form a covering if every point of Q lies in at least
one sphere. It is called a packing if every point of Q lies in at most one sphere, i.e. the
spheres are non-overlapping. We note that for $0 \le r \le n$,
Theorem 10.2.2. Given integers $q \ge 2$, n, d, put $e = [(d - 1)/2]$. Then the number
$A_q(n, d)$ of code words in an optimal $(n, *, d)$-code satisfies
$$\frac{q^n}{V_q(n, d - 1)} \le A_q(n, d) \le \frac{q^n}{V_q(n, e)}. \qquad (10.2.3)$$
Proof. Let C be an optimal (n, M, d)-code; then C is maximal, and it follows that no
word in $Q^n$ has distance $\ge d$ from all the words of C, for such a word would allow us
to enlarge the code, and so increase M. Hence every word of $Q^n$ is within distance at
most d - 1 of some word of C; thus the (d - 1)-spheres about the code words as
centres cover $Q^n$ and so $M \cdot V(n, d - 1) \ge q^n$, which gives the first inequality in
(10.2.3).
   On the other hand, we have $2e + 1 \le d$, so the spheres $B_e(x)$, as x runs over an
(n, M, d)-code, are disjoint and hence form a packing of $Q^n$. Therefore
$M \cdot V(n, e) \le q^n$, and the second inequality in (10.2.3) follows.                  •
   The above proof actually shows that a code with $q^n/V(n, d - 1)$ code words and
minimum distance d can always be constructed; we shall not carry out the construction
yet, since we shall see in Section 10.3 that it can always be realized by a linear
code.
   The first inequality in (10.2.3) is called the Gilbert-Varshamov bound, and the
second is the sphere-packing or Hamming bound. A code is said to be perfect if,
for some $e \ge 1$, the e-spheres with centres at the code words form both a packing
and a covering of $Q^n$. Such a code is an (n, M, 2e + 1)-code for which
$M \cdot V(n, e) = q^n$, so it is certainly optimal. It is characterized by the property that
every word of $Q^n$ is nearer to one code word than to any of the others. To give
an example, any code consisting of a single code word, or of the whole of $Q^n$ is perfect.
For q = 2 and odd n (with alphabet {0, 1}) the binary repetition code $\{0^n, 1^n\}$ is
also perfect. These are the trivial examples; we shall soon meet non-trivial ones.
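
The two bounds are easy to evaluate numerically. The sketch below takes for $V_q(n, r)$ the usual count $\sum_{i \le r} \binom{n}{i}(q - 1)^i$ of words within distance r of a given word (stated here as an assumption, since it is the only formula the code relies on); the function names are ours.

```python
from math import comb


def sphere_size(q, n, r):
    """V_q(n, r): number of words of Q^n within distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def bounds(q, n, d):
    """Integer forms of the Gilbert-Varshamov and Hamming bounds for A_q(n, d)."""
    e = (d - 1) // 2
    lower = -(-q ** n // sphere_size(q, n, d - 1))   # ceiling of q^n / V(n, d-1)
    upper = q ** n // sphere_size(q, n, e)           # floor of q^n / V(n, e)
    return lower, upper


def meets_hamming_bound(q, n, M, d):
    """True when M.V(n, e) = q^n, i.e. the e-spheres pack and cover Q^n."""
    e = (d - 1) // 2
    return M * sphere_size(q, n, e) == q ** n


print(bounds(2, 7, 3))                    # (5, 16)
print(meets_hamming_bound(2, 5, 2, 5))    # True: binary repetition code, n = 5
```

The upper value 16 for n = 7, d = 3 is attained by a binary code with 16 words, which is therefore perfect.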
   There are several ways of modifying a code to produce others, possibly with better
properties. Methods of extending codes will be discussed in Section 10.3, when we
come to linear codes. For the moment we observe that from any (n, M, d)-code C
we obtain an (n - 1, M, d')-code, where d' = d or d - 1, by deleting the last
symbol of each word. This is called puncturing the code C. If we consider all
words of C ending in a given symbol and take this set of words with the last
376                                                                           Coding theory
symbol omitted, we obtain an (n - 1, M', d')-code, where $M' \le M$ and $d' \ge d$. This
is called shortening the code C. Of course we can also puncture or shorten a given
code by operating on any position other than the last one.
Exercises
1. Show that $A_q(n, 1) = q^n$, $A_q(n, n) = q$.
2. Use the proof of Theorem 10.2.2 to construct an $(n, q^n/V_q(n, d - 1), d)$-code, for
   any $q \ge 2$, n and d.
3. Show that there is a binary (8, 4, 5)-code, and that this is optimal.
4. (The Singleton bound) Prove that $A_q(n, d) \le q^{n-d+1}$. (Hint. Take an optimal
   (n, M, d)-code and puncture it repeatedly; for a linear [n, k]-code of minimum
   distance d this gives $k \le n - d + 1$.)
$u \mapsto uG$.
For example, for the simple parity check code with generator matrix $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$
this takes the form $(u_1, u_2) \mapsto (u_1, u_2, u_1 + u_2)$.
1003 Linear codes                                                                 377
Since any $x \in C$ has the form $x = uG$ ($u \in F_q^k$), the vectors y of $C^\perp$ are obtained as
the solutions of the system $Gy^T = 0$. Here G has rank k, therefore $C^\perp$ is an (n - k)-dimensional
subspace of $F_q^n$; thus $C^\perp$ is an [n, n - k]-code. It is clear from the
definition that $C^{\perp\perp} = C$. When n is even, it may happen that $C^\perp = C$; in that
case C is said to be self-dual. For example, the binary code with generator
matrix $\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$ is self-dual.
  Generally, if G, H are generator matrices for codes C, $C^\perp$ that are mutually dual,
we have
$$GH^T = 0; \qquad (10.3.2)$$
since G, H are both left full, it follows that the sum of their ranks is n and by the
theory of linear equations (see e.g. Cohn (1994), Chapter 4),
A generator matrix H for $C^\perp$ is called a parity check matrix for C. For example,
$xH^T = x_1 + x_2 + x_3$; if this is non-zero, an error has occurred. Our code detects one
error, but no more (as we have already seen). Before introducing more elaborate
codes we shall describe a normal form for the generator and parity check matrices.
   Two block codes of length n are said to be equivalent if one can be obtained from
the other by permuting the n places of the code symbols and (in the case of a linear
code) multiplying the symbols in a given place by a non-zero scalar. For a generator
matrix of a linear code these operations amount to (i) permuting the columns and
(ii) multiplying a column by a non-zero scalar. We can of course also change the
basis and this will not affect the code. This amounts to performing elementary opera-
tions on the rows of the generator matrix (and so may affect the encoding rules). We
recall that any matrix over a field may be reduced to the form
(10.3.5)
by elementary row operations and column permutations (see Cohn (1994), p. 59),
and for a left full matrix the zero rows are of course absent. Hence we obtain
Theorem 10.3.1. Any [n, k]-code is equivalent to a code with generator matrix of the
form
                                       G = (I P),                                 (10.3.6)
where P is a k x (n - k) matrix.                                                   •
   It should be emphasized that whereas the row operations change merely the basis,
the column operations may change the code (to an equivalent code). More precisely,
an [n, k]-code has a generator matrix of the form (10.3.6) iff the first k columns of its
generator matrix are linearly independent. This condition is satisfied in most practical
cases. If we use a generator matrix G in the standard form (10.3.6) for encoding,
the code word uG will consist of the message symbols $u_1, \dots, u_k$ followed by n - k
check symbols.
   The standard form (10.3.6) for the generator matrix of C makes it easy to write
down the parity check matrix; its standard form is
                                                                                  (10.3.7)
For we have $GH^T = P - P = 0$, and since H is left full, $H^T$ is right full, thus of rank
n - k, and it follows that the rows of H form a basis for the dual code $C^\perp$.
   The process of decoding just consists in finding the code word nearest to the
received word. Let us see how the parity check matrix may be used here. We have
an [n, k]-code C with parity check matrix H. For any vector $x \in F_q^n$ the vector
$xH^T \in F_q^{n-k}$ is called a syndrome of x. By (10.3.3), the syndrome of x is 0 precisely
when $x \in C$. More generally, two vectors $x, x' \in F_q^n$ are in the same coset of C iff
$xH^T = x'H^T$. Thus the syndrome determines the coset: if a vector x in C is transmitted
and the received word y is x + e, then y and e have the same syndrome.
To 'decode' y, i.e. to find the nearest code word, we choose a vector f of minimum
weight in the coset of C containing y and then replace y by y - f. Such a vector f of
minimum weight in its coset need not be unique; we choose one such f in each coset
and call it the coset leader. The process described above is called syndrome decoding.
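   For example consider a binary [4, 3]-code with generator matrix
$$G = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.$$
To encode a vector we have $(u_1, u_2, u_3) \mapsto (u_1, u_2, u_3, u_1 + u_2 + u_3)$.
The parity check matrix is H = (1 1 1 1). The possible syndromes are 0 and 1.
We arrange the 16 vectors of $F_2^4$ as a 2 x 8 array with cosets as rows, headed by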
the syndrome and coset leaders:
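$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \qquad H = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$
A standard array is
                             00       0000    1011   0101   1110
                             01       0100    1111   0001   1010
                             10       0010    1001   0111   1100
                             11       1000    0011   1101   0110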
To decode x = (1101) we form $xH^T = (11)$ and then subtract from x the coset
leader for the syndrome (11), giving (1101) - (1000) = (0101). The minimum
distance is 2, so the code again detects single errors.
   It is not necessary for decoding to write down the complete standard array, but
merely the first column, consisting of the coset leaders. We note that this method
of decoding assumes that all errors are equally likely, i.e. that we have a symmetric
channel. In more general cases one has to modify the weight function by taking the
probability of error into account; we shall not enter into the details.
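
Syndrome decoding can be sketched as follows for a binary [4, 2]-code. The matrices `G` and `H` below are an assumption on our part: they are chosen so that the code words are 0000, 1011, 0101, 1110, as in the standard array above, and so that $GH^T = 0$. Only the coset leaders are tabulated, as remarked in the text.

```python
from itertools import product

# Assumed generator and parity check matrices over F_2 (see the lead-in).
G = [(1, 0, 1, 1), (0, 1, 0, 1)]
H = [(1, 0, 1, 0), (1, 1, 0, 1)]


def encode(u):
    """The code word uG."""
    return tuple(sum(ui * gi for ui, gi in zip(u, col)) % 2 for col in zip(*G))


def syndrome(x):
    """The vector xH^T."""
    return tuple(sum(xi * hi for xi, hi in zip(x, row)) % 2 for row in H)


def coset_leaders(n):
    """For each syndrome, one word of least weight having that syndrome."""
    leaders = {}
    # Words are visited by increasing weight, so the first word met
    # for a given syndrome has minimum weight in its coset.
    for x in sorted(product((0, 1), repeat=n), key=sum):
        leaders.setdefault(syndrome(x), x)
    return leaders


def decode(y):
    """Subtract the coset leader belonging to the syndrome of y."""
    f = coset_leaders(len(y))[syndrome(y)]
    return tuple((yi - fi) % 2 for yi, fi in zip(y, f))


print(syndrome((1, 1, 0, 1)), decode((1, 1, 0, 1)))
# (1, 1) (0, 1, 0, 1)
```

For the syndrome (01) there are two words of weight 1 in the coset, 0100 and 0001; the sketch simply takes the first one it meets, which illustrates that a coset leader need not be unique.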
   We now turn to the construction of linear codes with a large value of M for given
n and d. By the Gilbert-Varshamov bound in Theorem 10.2.2 we have $A_q(n, d) \ge q^k$,
provided that $q^{n-k} \ge V_q(n, d - 1)$. However, this result does not guarantee the
construction of linear codes. In fact we can construct linear codes with a rather
better bound, as the next result shows.
Theorem 10.3.2. There exists an [n, k]-code over $F_q$ with minimum distance at least d,
provided that
$$V_q(n - 1, d - 2) < q^{n-k}. \qquad (10.3.8)$$
For comparison we note that Theorem 10.2.2 gives $V_q(n, d - 1) \le q^{n-k}$, so (10.3.8)
is a weaker condition.
Proof. Let C be any [n, k]-code over $F_q$ with parity check matrix H. Each vector x
in C satisfies $xH^T = 0$; this equation means that the entries of x define a linear
dependence between the rows of $H^T$, i.e. the columns of H. We require a code for
which the minimum distance is at least d, i.e. no vector of C has weight less than
d; this will follow if no d - 1 columns of H are linearly dependent.
   To construct such a matrix H we need only choose successively n vectors in ${}^{n-k}F_q$
such that none is a linear combination of d - 2 of the preceding ones. In choosing
the r-th column we have to avoid the vectors that are linear combinations of at most
Thus we can adjoin an r-th column, provided that $V_q(r - 1, d - 2) < q^{n-k}$. By
(10.3.8) this holds for r = 0, 1, ... ,n, so we can form the required parity check
matrix H. This proves the existence of a code with the required properties, since it
is completely determined by H.                                                   •
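
The successive choice of columns in this proof can be tried out in the binary case. In the sketch below (ours, and restricted to q = 2 for simplicity) a column of H is stored as an integer whose binary digits are its entries, so that addition over $F_2$ is the bitwise exclusive or; at each step the least admissible vector is chosen, which is an arbitrary rule.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def forbidden(columns, d):
    """All sums over F_2 of at most d - 2 of the given columns
    (the empty sum gives the zero vector)."""
    return {reduce(xor, subset, 0)
            for size in range(d - 1)
            for subset in combinations(columns, size)}


def greedy_parity_check(n, k, d):
    """Choose n columns in F_2^(n-k), none a combination of d - 2 earlier ones."""
    columns = []
    for _ in range(n):
        bad = forbidden(columns, d)
        choice = next((v for v in range(2 ** (n - k)) if v not in bad), None)
        if choice is None:
            raise ValueError("no admissible column left")
        columns.append(choice)
    return columns


print(greedy_parity_check(7, 4, 3))
# [1, 2, 3, 4, 5, 6, 7]
```

For n = 7, k = 4, d = 3 the columns obtained are all the non-zero vectors of $F_2^3$, and any two of them are linearly independent.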
(10.3.9)
Thus a Hamming code has the property that any two columns of its parity check
matrix are linearly independent. In any code with odd minimum distance
d = 2e + 1 every error pattern of weight at most e is the unique coset leader in its
coset, because two vectors of weight $\le e$ have distance $\le 2e$ and so are in different
cosets. For a perfect code all coset leaders are of this form. Thus in a Hamming
code each vector of weight 1 is the unique vector of least weight in its coset. Now
the number of cosets is $q^n/q^k = q^{n-k}$. Omitting the zero coset we see from
(10.3.10) that we have just n(q - 1) non-zero cosets and these are represented by
taking as coset leaders the n(q - 1) vectors of weight 1. This makes a Hamming
code particularly easy to decode: given $x \in F_q^n$, we calculate $xH^T$. If x has a single
error (which is all we can detect), then $xH^T = yH_j^T$, where $H_j$ is the j-th column
of H and $y \in F_q$. Now the error can be corrected by subtracting y from the j-th
coordinate of x.
   The simplest non-trivial case is the binary [3, 1]-code with generator and parity
check matrices
$$G = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}, \qquad H = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$
It consists in repeating each code word three times. The information rate is
1/3 = 0.33.
   The next case of the Hamming code, the binary [7, 4]-code, is one of the best-
known codes and one of the first to be discovered (in 1947). Its generator and
parity check matrices are
                                                                1 1 0
                                                                101
                                                                   o   0
Here the information rate is 4/7 = 0.57. The minimum distance is 3, so the code will
correct 1 and detect 2 errors.
  From any q-ary [n, k]-code C we can form another code
called the extension of C by parity check. If C is binary with odd minimum distance
d, then the extended code $\hat{C}$ has minimum distance d + 1 and its parity check matrix $\hat{H}$ is obtained
from that of C by bordering it first with a zero column and then a row of 1's.
From an (n, M, d)-code we thus obtain an (n + 1, M, d + 1)-code, and we can get
C back by puncturing $\hat{C}$ (in the last column).
   Theorem 10.3.2 implicitly gives a lower bound for d in terms of q, n, k, but it does
not seem easy to make this explicit. However, we do have the following upper bound
for d:
Proposition 10.3.3 (Plotkin bound). Let C be a linear [n, k]-code over $F_q$. Then the
minimum distance d of C satisfies
$$d \le \frac{n(q - 1)q^{k-1}}{q^k - 1}. \qquad (10.3.11)$$
  Sometimes a more precise measure than the minimum distance is needed. This
where $A_i$ is the number of code words of weight i in C. A basic result, the MacWilliams
identity, relates the weight enumerator of a code to that of its dual. This
is useful for finding the weight enumerator of an [n, k]-code when k is close to n,
so that n - k is small. We begin with a lemma on characters in fields. Here we understand
by a character on a field F a homomorphism from the additive group of F to the
multiplicative group of complex numbers, non-trivial if it takes values other than 1.
By the duality of abelian groups (BA, Section 4.9) every finite field has non-trivial
characters $\chi$, and $\sum \chi(a) = 0$ by orthogonality to the trivial character.
where B(z) is the weight enumerator of the dual code $C^\perp$ and |C| is the number of code
words in C.
Proof. We have
For $v \in C^\perp$ the second sum on the right is |C|. If $v \notin C^\perp$, then $uv^T$ takes every value
in $F_q$ the same number of times, say N times, and we have $\sum \chi(uv^T) =
N \cdot \sum \chi(a) = 0$, because $\chi$ is non-trivial. Hence the right-hand side reduces to
$|C| \cdot B(z)$.                                                                               •
We can now derive a formula for the weight enumerator of the dual code.
Theorem 10.3.5 (MacWilliams identity). Let C be an [n, k]-code over $F_q$ with weight
enumerator A(z) and let B(z) be the weight enumerator of the dual code $C^\perp$. Then
$$B(z) = q^{-k}[1 + (q - 1)z]^n \cdot A\left(\frac{1 - z}{1 + (q - 1)z}\right). \qquad (10.3.14)$$
$$= \sum_{v \in F_q^n} \prod_{i=1}^{n} z^{w(v_i)} \chi(u_i v_i)$$
$$1 + z\Bigl(\sum_{a \ne 0} \chi(a)\Bigr) = 1 - z.$$
Hence we obtain
Substituting into (10.3.13) and remembering that ICI = qk, we obtain (10.3.14). •
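
For q = 2 the identity (10.3.14) reads $|C| \cdot B(z) = \sum_i A_i (1 - z)^i (1 + z)^{n-i}$, and this form can be verified by machine for a small code. The sketch below is ours: the binary [4, 2]-code with generator matrix `G` is an illustrative choice, polynomials are stored as coefficient lists, and the dual code is found by exhaustive search.

```python
from itertools import product


def span(G):
    """All code words uG over F_2, for a generator matrix G given by its rows."""
    n = len(G[0])
    return {tuple(sum(ui * row[j] for ui, row in zip(u, G)) % 2 for j in range(n))
            for u in product((0, 1), repeat=len(G))}


def dual(code, n):
    """All vectors of F_2^n orthogonal to every word of the code."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in code)}


def weight_enumerator(code, n):
    """The list [A_0, ..., A_n], A_i = number of code words of weight i."""
    A = [0] * (n + 1)
    for word in code:
        A[sum(word)] += 1
    return A


def poly_mul(p, q):
    """Product of two polynomials with integer coefficients."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_pow(p, e):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, p)
    return out


def macwilliams_rhs(A, n):
    """Coefficients of sum_i A_i (1 - z)^i (1 + z)^(n - i)."""
    total = [0] * (n + 1)
    for i, a in enumerate(A):
        term = poly_mul(poly_pow([1, -1], i), poly_pow([1, 1], n - i))
        total = [t + a * c for t, c in zip(total, term)]
    return total


G = [(1, 0, 1, 1), (0, 1, 0, 1)]
C = span(G)
A = weight_enumerator(C, 4)
B = weight_enumerator(dual(C, 4), 4)
print(A, B, macwilliams_rhs(A, 4))
# [1, 0, 1, 2, 0] [1, 0, 1, 2, 0] [4, 0, 4, 8, 0]
```

The last list is |C| = 4 times the second, as the identity requires.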
Exercises
1. Show that a code C can correct t errors and detect a further s errors if
   d(C) > 2t + s + 1.
2. Let C be a block code of length n over an alphabet Q and let $\lambda \in Q$. Show that C is
   equivalent to a code which includes the word $\lambda^n$.
3. Show that for odd d, $A_2(n, d) = A_2(n + 1, d + 1)$. (Hint. Use extension by parity
   check and puncturing.)
4. Construct a table of syndromes and coset leaders for the ternary [4, 2]-Hamming
   code.
5. Show that the binary [7, 4]-Hamming code extended by parity check is self-dual.
6. Show that for linear codes $A_q(n, d) \ge q^k$, where k is the largest integer satisfying
   $q^k \cdot V_q(n - 1, d - 2) < q^n$.
7. Show that for a binary [n, k]-code the MacWilliams identity can be written
$$\sum_{i=0}^{n} \binom{i}{r} A_i = 2^{k-r} \sum_{i=0}^{n} (-1)^i \binom{n-i}{n-r} B_i, \qquad \text{for } 0 \le r \le n.$$
8. Verify that formula (10.3.14) is consistent with the corresponding formula for the
   dual code. (Hint. Use the form (10.3.15).)
in the ring $A_n = F_q[x]/(x^n - 1)$. In this sense we can interpret any linear code as a
subset of $A_n$ and the cyclic permutation corresponds to multiplication by x. Clearly
a subspace of $A_n$ admits multiplication by x iff it is an ideal, and this proves
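$$G = \begin{pmatrix} g_0 & g_1 & \dots & g_r & 0 & \dots & 0 \\ 0 & g_0 & \dots & g_{r-1} & g_r & \dots & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & \dots & g_0 & g_1 & \dots & g_r \end{pmatrix}.$$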
Proof. We have $g_r = 1$ and by considering the last r columns we see that G is left full.
The n - r rows represent the code words $g, xg, \dots, x^{n-r-1}g$ and we have to show
that the linear combinations are just the code words. This is clear since the code
words are of the form fg, where f is a polynomial of degree < n - r.                •
   We next derive a parity check matrix for the cyclic code C. This is done most easily
in terms of an appropriate polynomial. Let C be a cyclic [n, k]-code with generator
polynomial g. Then g is a divisor of $x^n - 1$, so there is a unique polynomial h
satisfying
                                    $g(x)h(x) = x^n - 1.$                     (10.4.1)
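$$H = \begin{pmatrix} h_k & h_{k-1} & \dots & h_0 & 0 & \dots & 0 \\ 0 & h_k & \dots & h_1 & h_0 & \dots & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & \dots & h_k & h_{k-1} & \dots & h_0 \end{pmatrix}$$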
(iii) the dual code $C^\perp$ is cyclic, generated by the reciprocal of the check polynomial for C:
(10.4.2)
  We go on to describe how generator and check polynomials are used for coding
and decoding a cyclic code. Let C be a cyclic [n, k]-code with generator polynomial
g of degree r = n - k and check polynomial h of degree k. Given a message
$a = a_0a_1 \dots a_{k-1} \in F_q^k$, we regard this as a polynomial $a = \sum a_ix^i$ of degree < k
over $F_q$. We encode a by multiplying it by g and obtain a polynomial u = ag of
degree < n. We note that any code word $\ne 0$ has degree at least r = deg g.
  For any polynomial f of degree < n we calculate its syndrome S(f) by multiplying
the coefficients of f by the rows of the parity check matrix H. The result is
where for any polynomial $\varphi$ in x, $\varphi_i$ denotes the coefficient of $x^i$. To represent
(10.4.3) as a polynomial, we take the polynomial part of $x^{-k}(fh)$, ignoring powers
beyond $x^{r-1}$. This can also be achieved by reducing fh (mod $x^n - 1$) to a polynomial
of degree < n and then taking the quotient of the division by $x^k$:
Since deg f < n, the highest possible power in fh is $x^{n+k-1}$. When reduced this
becomes $x^{k-1}$, and so does not affect the quotient in (10.4.4). Therefore S(f) is
indeed the syndrome of f, and as before S(f) = 0 precisely when f has the form
ag. By reducing fh (mod $x^n - 1$), we obtain a representative of degree < n; hence
S(f) is of degree < n - k = r, as one would expect.
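
The encoding rule u = ag and the syndrome just described (reduce fh modulo $x^n - 1$, then take the quotient on division by $x^k$) can be sketched for a binary cyclic code of length 7. Everything specific below is our assumption for the purpose of illustration: we take $g = x^3 + x + 1$ and $h = x^4 + x^2 + x + 1$, which satisfy $gh = x^7 - 1$ over $F_2$, and we store a polynomial as an integer whose i-th binary digit is the coefficient of $x^i$.

```python
N, K = 7, 4
G_POLY = 0b1011      # g = x^3 + x + 1
H_POLY = 0b10111     # h = x^4 + x^2 + x + 1


def poly_mul(a, b):
    """Product of two polynomials over F_2 stored as bit masks."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def reduce_mod(p, n=N):
    """Reduce modulo x^n - 1 by adding the coefficient of x^(n+m) to that of x^m."""
    mask = (1 << n) - 1
    while p >> n:
        p = (p & mask) ^ (p >> n)
    return p


def encode(a):
    """The code word ag for a message polynomial a of degree < k."""
    return poly_mul(a, G_POLY)


def syndrome(f):
    """Quotient of (fh mod x^n - 1) on division by x^k."""
    return reduce_mod(poly_mul(f, H_POLY)) >> K


word = encode(0b110)                 # message x^2 + x
received = word ^ 0b1000             # an error in the coefficient of x^3
leaders = {syndrome(1 << j): 1 << j for j in range(N)}   # single-error leaders
corrected = received ^ leaders[syndrome(received)]
print(bin(word), bin(syndrome(received)), corrected == word)
# 0b111010 0b11 True
```

Here the message $x^2 + x$ is encoded as $x^5 + x^4 + x^3 + x$, the received word $x^5 + x^4 + x$ has syndrome $x + 1$, and subtracting the coset leader $x^3$ restores the code word.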
   Now we choose for each possible syndrome u a coset leader L(u) of least weight.
To decode a word f we compute its syndrome S(f) and subtract the corresponding
coset leader: f - LS(f) is a code word, so we have $(f - LS(f))h = a(x^n - 1)$, for
some a, and this a is the required decoding of f. For example, $x^7 - 1 =
(x - 1)(x^3 + x^2 + 1)(x^3 + x + 1)$ is a complete factorization over $F_2$. Let us take
$g = x^3 + x + 1$, $h = x^4 + x^2 + x + 1$, so r = 3, k = 4. Suppose we encode $x^2 + x$,
obtaining the code word $(x^2 + x)(x^3 + x + 1) = x^5 + x^4 + x^3 + x$. Owing to errors
in transmission this is received as $x^5 + x^4 + x$. We have $x^5 + x^4 + x =$
10.4 Cyclic codes                                                                   387
as a parity check matrix for the Hamming code. This will correct one error, and it
seems plausible that we can correct more errors by including further rows, indepen-
dent of (10.4.5).
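   To obtain such rows we recall that for any n distinct elements $c_1, \dots, c_n$ over a
field, the Vandermonde matrix
$$\begin{pmatrix} 1 & 1 & \dots & 1 \\ c_1 & c_2 & \dots & c_n \\ \vdots & \vdots & & \vdots \\ c_1^{n-1} & c_2^{n-1} & \dots & c_n^{n-1} \end{pmatrix}$$
has determinant $\prod_{i>j}(c_i - c_j)$.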
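Theorem 10.4.4. Let q = 2^m and denote by a_1, ..., a_{q-1} the non-zero elements of F_q.
Then for any integer t < q/2 there is a [q, k]-code over F_2 with k ≥ q - mt and
minimum distance at least 2t + 1 and with parity check matrix
\[
H = \begin{pmatrix}
a_1 & a_2 & \cdots & a_{q-1} \\
a_1^3 & a_2^3 & \cdots & a_{q-1}^3 \\
a_1^5 & a_2^5 & \cdots & a_{q-1}^5 \\
\vdots & \vdots & & \vdots \\
a_1^{2t-1} & a_2^{2t-1} & \cdots & a_{q-1}^{2t-1}
\end{pmatrix} \tag{10.4.6}
\]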
   The binary code with parity check matrix (10.4.6) is called a BCH-code, after its
discoverers R. C. Bose, D. K. Ray-Chaudhuri and A. Hocquenghem.
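As a small check of the bound on the minimum distance, the following Python sketch builds the columns of (10.4.6) for q = 16 and t = 2 and lists the resulting binary code by brute force. It rests on assumptions of ours: F_16 is realized as F_2[x]/(x^4 + x + 1) with elements stored as 4-bit integers, each column (a, a^3) is packed into a single 8-bit integer, and the function names are hypothetical.

```python
def gf16_mul(a, b):
    """Multiply in F_16 = F_2[x]/(x^4 + x + 1); elements are 4-bit ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce by x^4 + x + 1
    return r

def gf16_pow(a, e):
    """a^e in F_16 by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf16_mul(r, a)
    return r

def bch_columns(t):
    """For each non-zero a, pack (a, a^3, ..., a^(2t-1)) into one int."""
    cols = []
    for a in range(1, 16):
        col = 0
        for i in range(t):
            col = (col << 4) | gf16_pow(a, 2 * i + 1)
        cols.append(col)
    return cols

def code_words(cols):
    """All binary words c (as bitmasks) with Hc = 0, found by brute force."""
    n = len(cols)
    words = []
    for c in range(1 << n):
        s = 0
        for j in range(n):
            if (c >> j) & 1:
                s ^= cols[j]
        if s == 0:
            words.append(c)
    return words

WORDS = code_words(bch_columns(2))
MIN_WEIGHT = min(bin(c).count("1") for c in WORDS if c)
```

Here the 15 non-zero elements of F_16 index the places, so the words have length 15; WORDS contains 2^7 code words and MIN_WEIGHT comes out as 5 = 2t + 1.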
Exercises
1. The zero code of length n is the subspace 0 of F_q^n. Find its generator polynomial (as
   cyclic code) and describe its dual, the universal code.
2. The repetition code is the [n, 1]-code consisting of all code words (y, y, ..., y),
   y ∈ F_q. Find its generator polynomial and describe its dual, the zero-sum code.
3. Describe the cyclic code with generator polynomial x + 1 and its dual.
4. Verify that the [7, 4]-Hamming code is cyclic and find its generator polynomial.
5. Show that the [8,4]-code obtained by extending the [7,4]-Hamming code by
   parity check is self-dual and has weight enumerator z^8 + 14z^4 + 1.
6. Show that a binary cyclic code contains a vector of odd weight iff x - 1 does not
   divide the generator polynomial. Deduce that such a code contains the repetition
   code.
7. The weight of a polynomial f is defined as the number w(f) of its non-zero
   coefficients. Show that for polynomials over F_2, w(fg) ≤ w(f)w(g).
10.5 Other codes
\[
\sum_i \frac{c_i}{x - a_i} \equiv 0 \pmod{x^{2t}}, \tag{10.5.1}
\]
Definition. Let g be a polynomial of degree t over F_{q^m} and let a_0, ..., a_{n-1} ∈ F_{q^m} be
such that g(a_i) ≠ 0 (i = 0, ..., n - 1). The Goppa code with Goppa polynomial g is
defined as the set of all vectors c = (c_0, c_1, ..., c_{n-1}) satisfying
\[
\sum_{i=0}^{n-1} \frac{c_i}{x - a_i} \equiv 0 \pmod{g(x)}. \tag{10.5.2}
\]
We see that Goppa codes are linear but not necessarily cyclic. In order to find a parity
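check matrix for the Goppa code we recall that in the special case of the BCH-code
this was obtained by writing
\[
(x - a_i)^{-1} = -\sum_{j=0}^{2t-1} x^j a_i^{-j-1} \, (1 - x^{2t} a_i^{-2t})^{-1}.
\]
The coefficients of x^j (taken mod x^{2t}) form the entries of the (j + 1)-th row in the
parity check matrix. We have
further write g(a_i)^{-1} = h_i. Then (10.5.2) becomes Σ c_i h_{ij} = 0, where h_{ij} =
h_i Σ_v g_{v+j+1} a_i^v. Thus the matrix (h_{ij}) is
\[
\begin{pmatrix}
h_0 g_t & \cdots & h_{n-1} g_t \\
h_0 (g_{t-1} + g_t a_0) & \cdots & h_{n-1} (g_{t-1} + g_t a_{n-1}) \\
\vdots & & \vdots \\
h_0 (g_1 + g_2 a_0 + \cdots + g_t a_0^{t-1}) & \cdots & h_{n-1} (g_1 + g_2 a_{n-1} + \cdots + g_t a_{n-1}^{t-1})
\end{pmatrix}
\]
and by elementary row operations (recall that g_t ≠ 0) this reduces to
\[
H = \begin{pmatrix}
h_0 & h_1 & \cdots & h_{n-1} \\
h_0 a_0 & h_1 a_1 & \cdots & h_{n-1} a_{n-1} \\
\vdots & \vdots & & \vdots \\
h_0 a_0^{t-1} & h_1 a_1^{t-1} & \cdots & h_{n-1} a_{n-1}^{t-1}
\end{pmatrix}.
\]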
We see again that any t columns are linearly independent, hence the Goppa code has
minimum distance > t and its dimension is ≥ n - mt. There are several methods of
decoding Goppa codes, based on the Euclidean algorithm (see McEliece (1977)) and
the work of Ramanujan (see Hill (1985)).
   (ii) Let C be any block code of length n. We obtain another code, possibly the
same, by permuting the n places in any way. The permutations which do not
change C form a subgroup of Sym_n, the group of C, which may be denoted by
G(C). For example, the group of a cyclic code contains all translations i ↦ i + r
(mod n). If s is prime to n, we have the permutation
Then
The Galois group of E/F_q is generated by the map x ↦ x^q. Since q ∈ Q, this
operation permutes the zeros of g_0 as well as those of g_1; therefore g_0 and g_1 have their
coefficients in F_q. We note that μ_s interchanges g_0 and g_1 if s ∈ N, hence the
codes with generators g_0 and g_1 are equivalent. We shall be particularly interested
in the QR-code generated by g_0; our aim will be to find restrictions on the minimum
distance d. We recall from number theory (see e.g. BA, Further Exercise 24 of
Chapter 7) that 2 is a quadratic residue mod n iff n ≡ ±1 (mod 8), and that -1 is
a quadratic residue mod n iff n ≡ 1 (mod 4). We shall also need a lemma on weights;
for a polynomial f (regarded as a code word) the weight w(f) is of course just the
number of non-zero coefficients.
Lemma 10.5.1. Let f be a polynomial over F_q such that f(1) ≠ 0. Then
(1 + x + ... + x^{n-1})f has weight at least n. If q = 2 and deg f < n, then
(1 + x + ... + x^{n-1})f has weight exactly n.
Proof. By the division algorithm, f = (x - 1)u + c, where c = f(1) ≠ 0. It follows
that
\[
(1 + x + \cdots + x^{n-1}) f = x^n u - u + (1 + x + \cdots + x^{n-1}) c. \tag{10.5.5}
\]
Suppose that w(u) = r; then the right-hand side has at least r terms of degree ≥ n,
while the terms in u can cancel at most r terms in (1 + x + ... + x^{n-1})c. So the total
weight is ≥ r + (n - r) = n.
  If q = 2 and deg f < n, we again have (10.5.5), where now deg u < n - 1. Each
non-zero term in u will cancel a term in 1 + x + ... + x^{n-1}, and this is exactly
compensated by the corresponding term in x^n u. Hence there are exactly n terms
on the right of (10.5.5).  •
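The binary case of the lemma is easy to confirm by machine for small n. The sketch below is an illustration under our own conventions (polynomials over F_2 stored as integers with bit i the coefficient of x^i; the names gf2_mul, weight and check_lemma are hypothetical). Over F_2 the condition f(1) ≠ 0 simply says that f has an odd number of terms.

```python
def gf2_mul(a, b):
    """Carry-less product of two polynomials over F_2 (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def weight(p):
    """Number of non-zero coefficients."""
    return bin(p).count("1")

def check_lemma(n):
    """True if (1 + x + ... + x^(n-1)) f has weight n for all f with deg f < n, f(1) != 0."""
    ones = (1 << n) - 1  # 1 + x + ... + x^(n-1)
    for f in range(1, 1 << n):
        if weight(f) % 2 == 1 and weight(gf2_mul(ones, f)) != n:
            return False
    return True
```

For instance check_lemma(7) runs through the 64 polynomials of odd weight and degree < 7.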
Proposition 10.5.2. Let C be a QR-code with generator g_0 and let c = c(x) be a code
word in C such that c(1) ≠ 0. Then
\[
w(c)^2 \ge n. \tag{10.5.6}
\]
Proof. The polynomial c(x) is divisible by g_0 but not by x - 1, because c(1) ≠ 0.
For suitable s, μ_s will transform c(x) into a polynomial c*(x) divisible by g_1 and
again not by x - 1. This means that cc* is a multiple of g_0 g_1 =
1 + x + ... + x^{n-1}, and so, by Lemma 10.5.1, w(cc*) ≥ n. Now (10.5.6) follows
because w(cc*) ≤ w(c)w(c*) = w(c)^2.
  If n ≡ -1 (mod 4), then -1 ∈ N and so the operation x ↦ x^{-1} transforms g_0 into
g_1; thus c(x)c(x^{-1}) is divisible by g_0 g_1. Now corresponding terms in c(x) and c(x^{-1})
give rise to a term of degree zero in c(x)c(x^{-1}), so there are at most w(c)^2 - w(c) + 1
terms in all and (10.5.7) follows.
   Finally assume that n ≡ -1 (mod 8) and q = 2; then (10.5.7) applies. Further,
any code word c has degree < n; hence on writing r = deg c, we have
x^r c(x^{-1})c(x) = f g_0 g_1, where f is a polynomial of degree < n. By Lemma 10.5.1,
this product has weight exactly n, and writing d = w(c), we have d^2 - d + 1 ≥ n.
Now consider how terms in c(x)c(x^{-1}) can cancel. We have c = Σ x^{r_i},
c(x^{-1}) = Σ x^{-r_i} and a pair of terms in the product will cancel if r_i - r_j = r_k - r_l.
But in this case r_j - r_i = r_l - r_k and another pair will cancel, so that terms cancel
in fours. Hence we have d^2 - d + 1 - 4t = n; therefore d^2 - d ≡ 2 (mod 4), and
so d ≡ -1 (mod 4).  •
   Let us now take q = 2. Then the condition that q is a quadratic residue mod n
gives n ≡ ±1 (mod 8). Consider the polynomial
\[
B(x) = \sum_{r \in Q} x^r.
\]
Since B(x)^2 = Σ x^{2r} = B(x), it follows that B is idempotent. In particular, for the
primitive element a of E we have B(a)^2 = B(a), so B(a) is 0 or 1. For any r ∈ Q
we have B(a^r) = B(a), while for r ∈ N, B(a^r) + B(a) = Σ_{i=1}^{n-1} a^i = 1. If B(a) = 1,
replace a by a^s, where s ∈ N; since (s, n) = 1, a^s is again a primitive n-th root of 1
and B(a^s) = 0. Thus for a suitable choice of a we have B(a) = 0. It follows that
\[
B(a^i) = \begin{cases}
0 & \text{if } i \in Q, \\
1 & \text{if } i \in N, \\
(n-1)/2 & \text{if } i = 0.
\end{cases}
\]
If n ≡ 1 (mod 8), then B(a^i) vanishes exactly when i ∈ Q ∪ {0}, so the code is then
generated by (x - 1)g_0. Similarly if n ≡ -1 (mod 8), then B(a^i) vanishes when
i ∈ Q, so in this case the generator is g_0.
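The idempotent B(x) and the generator it determines can be computed directly. The following Python sketch is only an illustration: polynomials over F_2 are stored as integers (bit i is the coefficient of x^i), the helper names are our own, and the generator is obtained as gcd(B(x), x^n - 1), which generates the same ideal as B(x) modulo x^n - 1. Which of g_0, g_1 appears depends on the choice of the primitive root, so the sketch makes no claim about that beyond the case n = 7.

```python
def gf2_mul(a, b):
    """Carry-less product of two polynomials over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def reduce_cyclic(p, n):
    """Reduce p modulo x^n - 1."""
    mask = (1 << n) - 1
    while p >> n:
        p = (p & mask) ^ (p >> n)
    return p

def gf2_mod(a, b):
    """Remainder of a on division by b (b non-zero) over F_2."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def gf2_gcd(a, b):
    """Euclidean algorithm over F_2."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def quadratic_residues(n):
    """Non-zero quadratic residues mod the odd prime n."""
    return sorted({(i * i) % n for i in range(1, n)})

def idempotent(n):
    """B(x) = sum of x^r over the quadratic residues r."""
    return sum(1 << r for r in quadratic_residues(n))

def qr_generator(n):
    """Generator of the cyclic code generated by B(x)."""
    return gf2_gcd((1 << n) | 1, idempotent(n))
```

For n = 7 this gives B(x) = x^4 + x^2 + x and the generator x^3 + x + 1; for n = 23 the generator has degree 11.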
   Let C(n) be the binary code defined in this way and C(n)^+ its extension by parity
check. It can be shown that the group of C(n)^+ is transitive on the n + 1 places.
(These places may be interpreted as the points on the projective line over F_n and
the group is then the projective special linear group, see van Lint (1982), p. 88.)
Consider a word c ∈ C of least weight d. Since the group is transitive, we may
assume that the last coordinate in C^+ (the parity check) is 1. This means that c
has odd weight d, say, and so c(1) = 1. Hence by Proposition 10.5.2, d ≡ -1
(mod 4) and d^2 - d + 1 ≥ n.
   For example, for n = 7 we obtain the [7, 4]-Hamming code; here d = 3. A second
(and important) example is the case n = 23. Here
\[
g_0 = x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1.
\]
Since 23 ≡ -1 (mod 8), g_0 is a generator for this code, which is known as the
[23, 12]-Golay code. Since d^2 - d + 1 ≥ 23 and d ≡ -1 (mod 4), it follows that
d ≥ 7, so the 3-spheres about the code words form a packing. On checking their
size, we note the remarkable fact that
\[
2^{12} \left[ 1 + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} \right] = 2^{12} \cdot 2^{11} = 2^{23}.
\]
This shows that C(23) is a perfect code, with minimum distance d = 7. The extended
code C^+ is of length 24, with minimum distance 8, giving rise to the Leech lattice,
a particularly close sphere packing in 24 dimensions. The symmetry group of a
point in this lattice in R^24 is the first Conway group ·0 ('dotto') of order
2^22·3^9·5^4·7^2·11·13·23 ≈ 8.3 × 10^18. The quotient by its centre (which has order 2)
is the sporadic simple group known as ·1, discovered in 1968 by John Conway
(see Conway and Sloane (1988)).
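The statements about the [23, 12]-Golay code lend themselves to a direct check. The Python sketch below is an illustration under our own conventions (polynomials over F_2 as integers with bit i the coefficient of x^i; G0, min_weight and sphere_size are hypothetical names). It confirms that g_0 divides x^23 - 1, finds the minimum weight by running through all 2^12 - 1 non-zero multiples a·g_0 with deg a < 12, and evaluates the size of a 3-sphere.

```python
from math import comb

def gf2_mul(a, b):
    """Carry-less product of two polynomials over F_2."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a, b):
    """Remainder of a on division by b (b non-zero) over F_2."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

# g_0 = x^11 + x^9 + x^7 + x^6 + x^5 + x + 1
G0 = sum(1 << e for e in (11, 9, 7, 6, 5, 1, 0))

def min_weight(g, k):
    """Least weight of a non-zero code word a*g with deg a < k."""
    return min(bin(gf2_mul(a, g)).count("1") for a in range(1, 1 << k))

def sphere_size(n, r):
    """Number of binary words within Hamming distance r of a given word."""
    return sum(comb(n, i) for i in range(r + 1))
```

Since the code words a·g_0 have degree < 23, no reduction mod x^23 - 1 is needed in min_weight.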
Exercises
1. Construct the ternary [11, 6]-Golay code and verify that it is perfect. Find its
   weight enumerator.
2. Construct the extended ternary [12, 6]-Golay code and find its weight enumerator.
   Is it self-dual?
3. A binary self-dual code is called doubly even if all weights of code words are
   divisible by 4. Show that the extended [8,4]-Hamming code is doubly even.
   Show that if there is a [2k, k]-code which is doubly even, then k ≡ 0 (mod 4).
4. Show that the extended binary [24, 12]-Golay code is doubly even. Find its weight
   enumerator.
Groups form the particular case where every element has an inverse. As an example
of a monoid other than a group we may take, for any set A, the set Map(A) = A^A of
all mappings of A into itself, with composition of mappings as multiplication and the
identity mapping as neutral. Many of the concepts defined for groups have a natural
analogue for monoids, e.g. a submonoid of a monoid M is a subset of M containing 1
and admitting multiplication. A homomorphism between monoids M, N is a mapping
f : M → N such that (xy)f = xf.yf, 1_M f = 1_N for x, y ∈ M. Here we had to assume
explicitly that the unit element is preserved by f; for groups this followed from the
other conditions. A generating set of a monoid M is a subset X such that every
element of M can be written as a product of a number of elements of X. For example,
the set N of all natural numbers is a monoid under multiplication, with neutral
element the number 1; here a generating set is given by the set of all prime numbers,
for every positive integer can be written as a product of prime numbers, with 1
expressed as the empty product. Likewise the set N_0 = N ∪ {0} is a monoid under
addition, with neutral element 0 and generating set {1}.
   An example of particular importance for us in the sequel is the following monoid.
Let X be any set, called the alphabet, and denote by X* the set of all finite sequences
of elements of X:
The associative law is easily verified, and it is also seen that the empty sequence 1 is
the neutral element. X* is called the free monoid on X. We remark that when
X = ∅, X* reduces to the trivial monoid consisting of 1 alone. This case will usually
be excluded in what follows. Apart from this trivial case the simplest free monoid is
that on a one-element set, {x} say. The elements are 1, x, x^2, x^3, ..., with the usual
multiplication. We see that {x}* is isomorphic to N_0, the monoid of non-negative
integers under addition, by the rule n ↦ x^n. Since the expression (11.1.1) for an
element w of a free monoid is unique, the number r of factors on the right is an
invariant of w, called its length and written |w|.
   The name 'free monoid' is justified by the following result:
where a' ↦ a is the given correspondence between A' and A. Since every element of
F can be written as a product a'_1 ... a'_r in just one way, f is well-defined by (11.1.3).
It is surjective because A generates M, and f is easily seen to be a homomorphism
by (11.1.2).  •
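The construction in this proof, extending a map defined on the letters to a homomorphism on all words, is easily put into executable form. In the Python sketch below, which is an illustration only, a word of the free monoid is a Python string (concatenation is the multiplication, the empty string plays the role of 1), and a target monoid is described by its multiplication op and its neutral element unit; the name extend is hypothetical.

```python
def extend(f, op, unit):
    """Return the homomorphism X* -> M determined by the map f on letters.

    f    : dict sending each letter of X to an element of M
    op   : the multiplication of M (a function of two arguments)
    unit : the neutral element of M
    """
    def hom(word):
        result = unit
        for letter in word:
            result = op(result, f[letter])
        return result
    return hom

# Example: the letters x, y are sent to the primes 2, 3 in the
# multiplicative monoid of positive integers.
to_integers = extend({"x": 2, "y": 3}, lambda a, b: a * b, 1)
```

Thus to_integers("xxy") is 2·2·3 = 12, and the empty word is sent to 1. The image is the submonoid generated by 2 and 3, in line with the remark that the homomorphism is surjective when the images of the letters generate M.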
Writing for the moment R_x for the mapping s ↦ sx of S into itself, we can express
S.1, S.2 as
\[
R_{xy} = R_x R_y, \qquad R_1 = 1. \tag{11.1.4}
\]
This just amounts to saying that the mapping R : x ↦ R_x is a monoid
homomorphism of M into Map(S). For example, M itself is an M-set, taking the
multiplication in M as M-action. This is sometimes called the regular representation
of M. We can use it to obtain the following analogue of Cayley's theorem for
groups (BA, Theorem 2.2.1):
Proof. Given a monoid M, we take the regular representation of M. If this is x ↦ ρ_x,
then ρ is a homomorphism from M to Map(M), by what has been said, and if
ρ_x = ρ_y, then x = 1.ρ_x = 1.ρ_y = y, hence the homomorphism is injective.  •
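For a finite monoid the regular representation can be written down explicitly and the two properties used in the proof checked by machine. The Python sketch below is an illustration under assumptions of ours: the monoid is given by a list of elements and a multiplication function, a mapping of M into itself is stored as the tuple of images of the listed elements, and mappings are composed from left to right, in keeping with the convention of writing them on the right. The names regular_representation and compose are hypothetical, and the example monoid (the integers mod 4 under multiplication) is our own choice.

```python
def regular_representation(elements, mul):
    """Send x to the tuple of images (s*x for s in elements), i.e. to s -> sx."""
    return {x: tuple(mul(s, x) for s in elements) for x in elements}

def compose(p, q, elements):
    """First apply p, then q; both are tuples of images of the listed elements."""
    index = {e: i for i, e in enumerate(elements)}
    return tuple(q[index[v]] for v in p)

# Example: the integers mod 4 under multiplication, a monoid that is not a group.
ELEMENTS = [0, 1, 2, 3]

def mul_mod4(a, b):
    return (a * b) % 4

RHO = regular_representation(ELEMENTS, mul_mod4)
```

Here RHO[1] is the identity mapping, compose(RHO[x], RHO[y], ELEMENTS) agrees with RHO[xy] for all x, y, and the four mappings are distinct, so the representation is injective.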
this shows that any set S with an M-action can also be regarded as a set with an
X*-action, where X corresponds to a generating set of M.
   A free monoid has several remarkable properties, which can also be used to
characterize it. A monoid M is called conical if xy = 1 ⇒ x = y = 1; M is said to
have cancellation or be a cancellation monoid if for all x, y ∈ M, xu = yu or ux = uy
for some u ∈ M implies x = y. Further, M is rigid if it has cancellation and whenever
ac = bd, there exists z ∈ M such that either a = bz or b = az. We observe that any
free monoid is conical and rigid; the first property is clear from (11.1.2), because
the product in (11.1.2) cannot be 1 unless r = s = 0. To prove cancellation we
note that in any element x_1 ... x_r ≠ 1 the leftmost factor x_1 is unique, as is the
rightmost factor x_r. Thus x_1 ... x_r = y_1 ... y_s can hold only if r = s and x_i = y_i
(i = 1, ..., r). It follows that when xu = yu, say
rigidity, let ac = bd, say a = x_1 ... x_r, b = y_1 ... y_s, c = u_1 ... u_h, d = v_1 ... v_k. Then
we have
By symmetry we may assume that r ≤ s; then x_1 = y_1, ..., x_r = y_r, and hence
b = x_1 ... x_r y_{r+1} ... y_s = az, where z = y_{r+1} ... y_s. This shows a free monoid to be
rigid. We remark that when ac = bd, then a = bz or b = az according as |a| is ≥
or ≤ |b|.
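The argument for rigidity is constructive, as the following Python sketch illustrates. Words of a free monoid are again represented by Python strings, and the function name rigid_witness is hypothetical; given ac = bd it returns the element z together with an indication of which of the two equations a = bz, b = az it satisfies, decided by comparing lengths as in the closing remark.

```python
def rigid_witness(a, b, c, d):
    """Given words with ac = bd, return ('a = bz', z) or ('b = az', z)."""
    if a + c != b + d:
        raise ValueError("the hypothesis ac = bd does not hold")
    if len(a) >= len(b):
        # b is a left factor of a, and z is what remains of a
        return "a = bz", a[len(b):]
    # a is a proper left factor of b, and z is what remains of b
    return "b = az", b[len(a):]
```

For example, with a = xy, c = zx, b = xyz, d = x we have ac = bd = xyzx, and the function returns z = "z" with b = az.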
   By a unit in a monoid M we understand an element u such that v exists in M
satisfying uv = 1, vu = 1. For example, in a conical monoid the only unit is 1. When M
has cancellation, it is enough to assume one of these equations, say uv = 1; for then
(vu)v = v(uv) = v1 = 1v, hence vu = 1 by cancellation, and similarly if vu = 1.
Let us define an atom as a non-unit which cannot be expressed as a product of
two non-units (as in rings). For example, in a free monoid the atoms are just the
elements of length 1. This shows incidentally that in a free monoid the free
generating set is uniquely determined as the set of all atoms. We now have the following
characterization of free monoids:
Theorem 11.1.3. Let F be a monoid and X the set of all its atoms. Then F is free, on X
as free generating set if and only if F is conical and rigid, and is generated by X.
Proof. We have seen that in a free monoid these conditions are satisfied. Conversely,
assume that they hold; we shall show that every element of F can be written in just
one way as a product of elements of X. Any a ∈ F can be expressed as such a product
in at least one way, because X generates F. If we have
                            a = x_1 ... x_r = y_1 ... y_s,    x_i, y_j ∈ X,
then by rigidity, x_1 = y_1 b or y_1 = x_1 b for some b ∈ F, say the former holds. Since
x_1, y_1 are atoms, b must be a unit and so b = 1 because F is conical. Thus x_1 = y_1
and we can cancel this factor and obtain x_2 ... x_r = y_2 ... y_s. By induction on
max(r, s) we find r - 1 = s - 1, i.e. r = s and x_2 = y_2, ..., x_r = y_r. Thus F is
indeed free on X, as we had to show.  •
Exercises
1. Show that every finite cancellation monoid is a group.
2. Let a, b be any elements of a monoid M. Show that if ab and ba are invertible in M,
   then so are a and b, but this does not follow if we only know that ab is invertible.
   What can we say if aba is invertible?
3. Show that every finitely generated monoid which is conical and rigid is free.
4. Show that the additive monoid of non-negative rational numbers is conical and
   rigid but not free.
5. Show that a submonoid of a free monoid is free iff it is rigid. Give examples
   of submonoids of free monoids that are not free. (Hint. Consider first the
   1-generator case.)
6. A set with an associative multiplication is called a semigroup. Show that any
   semigroup S may be embedded in a monoid by defining S^1 = S ∪ {1} with
   multiplication x1 = 1x = x for all x ∈ S.
7. A zero in a monoid M is an element 0 such that Ox = xO = 0 for all x E M. Verify
   that (i) a monoid has at most one zero, (ii) every monoid M can be embedded in
   a monoid M_0 with zero. If M already has a zero, how can the presence of these two
   zeros be reconciled with (i)?
The mathematical model consists of a set of rules of the form: sentence → {subject,
verb}, noun → cow, etc., which will lead to all the sentences of the language and no
others. This amounts to reading the above diagram from the bottom upwards.
   In order to write our sentences we need a finite (non-empty) set X, our alphabet.
As we have seen, the free monoid on X is the set X* of all strings of letters from X,
also called words in X (with a multiplication which we ignore for the moment). By a
language on X we understand any subset of X*. Here we do not distinguish between
words and sentences; we can think of an element of X* as a message, with a
particular symbol of X as a blank space, to separate the words of the message.
Examples
1. L = {x^n | n ≥ 1}. Rules: σ → x, σ → σx. We shall write this more briefly as σ → x;
   σx. A typical derivation is σ → σx → σx^2 → σx^3 → x^4. Similarly the language
   {x^{mr+ns} | m, n ≥ 0} is generated by the rules σ → σx^r; σx^s; 1.
2. L = {xy^n | n ≥ 0}, also written xy*. Rules: σ → x; σ → σy.
3. L = {x^m y^n | m, n ≥ 0}. Rules: σ → xσ; σy; 1.
4. L = {x^m y^n | 0 ≤ m ≤ n}. Rules: σ → xσy; σy; 1.
5. L = {x^n z y^n | n ≥ 0}. Rules: σ → xσy; z.
6. The empty language L = ∅ has the rule σ → σ.
7. The universal language X* has the rules σ → σx; 1 (x ∈ X).
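To experiment with examples of this kind one can enumerate the words produced by a set of rules up to a given length. The Python sketch below is an illustration with conventions of our own: the single variable is written S, terminal letters are lower-case characters, the empty word 1 is the empty string, and a grammar is a dictionary sending each variable to the list of right-hand sides of its rules. The search is breadth-first over the strings that arise in derivations, and the pruning is tailored to grammars like those in Examples 1 to 7, where at most one variable is present at any stage; it is not a general procedure. The name generate is hypothetical.

```python
from collections import deque

def generate(rules, start, max_len):
    """Words of length <= max_len derivable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, ch in enumerate(form) if ch in rules), None)
        if idx is None:
            words.add(form)  # no variables left: a word of the language
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if ch not in rules)
            # prune strings that can no longer lead to a word of length <= max_len
            if terminals <= max_len and len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

EXAMPLE_4 = {"S": ["xSy", "Sy", ""]}   # x^m y^n with 0 <= m <= n
EXAMPLE_5 = {"S": ["xSy", "z"]}        # x^n z y^n with n >= 0
```

For instance generate(EXAMPLE_5, "S", 5) yields the words z, xzy and xxzyy, while the rule of Example 6 yields the empty set.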
This concept of a language is of course too wide to be of use and one singles out
certain classes of languages by imposing conditions on the generating grammar, as
follows. The classification below is known as the Chomsky hierarchy.
   0. By a language of type 0 or a phrase structure language we understand any
language generated by a phrase structure grammar. By no means every language is
of type 0; in fact, since the alphabet and the set of rewriting rules are finite, the
set of all languages of type 0 is countable, whereas there are uncountably many
languages, because an infinite set has uncountably many subsets. It can be shown
that the languages of type 0 are precisely the recursively enumerable subsets of X*
(see e.g. M. Davis (1958)).
The grammar is then also called a CS-grammar. The rule (11.2.1) can be taken to
mean: α is replaced by u in the context fαg.
  2. A language is said to be of type 2, or context-free, or a CF-language if it can be
generated by a grammar with rewriting rules of the form
\[
\alpha \to u, \quad \text{where } \alpha \in V,\ u \in A^*. \tag{11.2.2}
\]
The grammar is then also called a CF-grammar. The rule (11.2.2) means that α is
replaced by u independently of the context in which it occurs.
  3. A language is said to be of type 3, or regular, or finite-state if it can be generated
by a grammar with rules of the form
\[
\alpha \to x\beta, \quad \alpha \to 1, \quad \text{where } x \in X,\ \alpha, \beta \in V. \tag{11.2.3}
\]
Again the term regular is also used for the grammar. Here α is replaced by a variable
following a letter or by 1. Instead of writing the variable β on the right of the
terminal letter we can also restrict the rules so as to have β on the left of the terminal
letter throughout. It can be shown that this leads to the same class of languages (see
Exercise 4 of Section 11.3).
   If ℒ_i (i = 0, 1, 2, 3) denotes the class of all proper languages of type i, then it is
clear that
\[
\mathcal{L}_0 \supseteq \mathcal{L}_1 \supseteq \mathcal{L}_2 \supseteq \mathcal{L}_3; \tag{11.2.4}
\]
in fact the inclusions can all be shown to be strict, but in general it may not be easy to
tell where a given language belongs, since there are usually many grammars
generating it. Thus to show that a given language is context-free we need only find a
CF-grammar generating it, but to show that a language is not context-free we must
show that none of the grammars generating it is CF.
   We note the following alternative definition of grammars of type 1, 2:
|u| ≤ |v|.    (11.2.5)
by replacing the letters in u one at a time by a letter in v, taking care to leave a (new)
variable until last. To give a typical example, if u = u_1 α u_2 u_3, v = v_1 ... v_5, we replace
u → v by the rules u_1 α u_2 u_3 → v_1 β u_2 u_3 → v_1 β u_2 v_5 → v_1 β v_4 v_5 → v, where β does
not occur elsewhere.
   (ii) It is clear from the definition of a CF-language that its rules are characterized
by (11.2.6); the details may be left to the reader.  •
    Sometimes one may wish to include the empty word in a proper language. This is
most easily done by replacing any occurrence of σ on the right of a rule by a new
variable λ, say, for each rule with σ on the left adding the same rule with σ replaced
by λ on the left, and adding the rule σ → 1. For example, to generate the language
{x^n z y^n, 1 | n ≥ 0} we modify Example 5 above: σ → xλy; z; 1, λ → xλy; z.
If we just added σ → 1 to the rules of Example 5, we would also get xy.
    From any improper CF-language L we can obtain the proper CF-language L\{1}
by replacing in any CF-grammar for L, any rule α → 1 by β → u, where u runs
over all words obtained from rules of the form β → U, where U contains
α, by replacing α in U by 1. For example, the language {x^m y^n | m + n > 0} is generated
by σ → xσ; σy; y; x.
    Looking at the examples given earlier, we see that Examples 1, 2 and 3 are regular,
as well as Examples 6 and 7. Examples 4 and 5 are context-free but not regular, as we
shall see in Section 11.3. We conclude with an example of a CS-language which is not
context-free, as Proposition 11.3.6 will show.
Example
8. {x^n z^n y^n | n ≥ 0} has the generating grammar σ → xσλμ; xzμ; 1, μλ → λμ,
   zλ → z^2, μ → y. The first two rules generate all the words x^n zμ(λμ)^{n-1}, the
   fourth moves the λ's past the μ's next to z and the next replaces each λ by z.
   Finally each μ is replaced by y. To obtain the same language without 1 we
   simply omit the rule σ → 1.
Exercises
1. Find a regular grammar to generate the set of all words of even length in X.
2. Show that each finite language is regular.
3. Show that if L, L' are any languages of type i (= 0, 1, 2 or 3), then so are L ∪ L',
   LL' = {uv | u ∈ L, v ∈ L'} and L^o, obtained from L by writing each word in reverse
   order.
4. Show that if L is regular, then so is L*, the language whose words are all the finite
   strings of words from L.
5. Show that regular languages form the smallest class containing all finite languages
   and closed under union, product and *.
6. Show that every context-free language can be generated by a CF-grammar G
   with the property: for each non-terminal variable α there is a derivation
   α → u (u ∈ X*) and for each terminal letter x there is a derivation α → u,
   where x occurs in u.
7. Show that for any CF-grammar G = (X, V) there is a CF-grammar G' producing
   the same language as G such that (i) G' contains no rule α → β, where α, β ∈ V,
   (ii) if L(G) is improper, then G' contains the rule σ → 1 but no other rules with
   1 on the right-hand side and (iii) no rule of G' has σ occurring on the right.
   Thus all rules of G' have the form σ → 1, α → x or α → f, where
   f ∈ (X ∪ V\{σ})^+, |f| ≥ 2.
8. Show that for a given CF-grammar G there exists a CF-grammar G' producing the
   same language as G, with rules α → xf, f ∈ V* and possibly σ → 1 (Greibach
   normal form).
11.3 Automata
Logical machines form a convenient means of studying recursive functions. In
particular, Turing machines lead precisely to recursively enumerable sets, and so
correspond to grammars of type 0, as mentioned earlier. These machines are outside
the scope of this book and will not be discussed further, but we would expect the
more restricted types 1-3 of grammars to correspond to more special machines.
This is in fact the case and in this section we shall define the types of machines
corresponding to these grammars and use them to derive some of their properties.
   A sequential machine M is given by three sets and two functions describing its
action. There is a set S of states as well as two finite alphabets: an input X and
an output Y. The action is described by a transition function δ : S × X → S and an
output function λ : S × X → Y. To operate the machine we start from a given
state s and input x; then the machine passes to the state δ(s, x) and produces the
output λ(s, x). In general the input will not just be a letter but a word w on X.
The machine reads w letter by letter and gives out y ∈ Y according to the output
function λ, while passing through the different states in accordance with the
transition function δ. The output is thus a word on Y, of the same length as w, obtained as
follows. Define mappings δ' : S × X* → S, λ' : S × X* → Y* by the equations
These equations define δ', λ' by induction on the length of words. It is clear that
δ', λ' extend δ, λ respectively and so we may without risk of confusion omit the
primes from δ', λ'. From (11.3.1) it is clear that
so the mapping δ just defines an action of the free monoid X* on S. We note that this
holds even though no conditions were imposed on δ.
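The operation of a sequential machine on words can be imitated in a few lines. The Python sketch below is an illustration only: the transition and output functions are stored as dictionaries keyed by pairs (state, letter), words are Python strings, and the class name SequentialMachine as well as the example (a machine recording the parity of the number of 1s read so far) are our own choices.

```python
class SequentialMachine:
    """A sequential machine given by its transition and output functions."""

    def __init__(self, delta, lam):
        self.delta = delta  # dict: (state, letter) -> state
        self.lam = lam      # dict: (state, letter) -> output letter

    def run(self, state, word):
        """Read word letter by letter; return the final state and the output word."""
        out = []
        for x in word:
            out.append(self.lam[(state, x)])
            state = self.delta[(state, x)]
        return state, "".join(out)

# Example: states 'even' and 'odd' record the parity of the number of 1s read;
# the output letter is the initial of the state reached.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
LAM = {key: target[0] for key, target in DELTA.items()}
PARITY = SequentialMachine(DELTA, LAM)
```

Reading 1101 from the state 'even' leads to the state 'odd' with output oeeo, a word of the same length as the input. Running the machine on uv gives the same final state as running it on u and then on v from the state reached, which is the action property just noted.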
   From this definition it is clear that a machine is completely specified by the set of
all quadruples of the form (s, x, λ(s, x), δ(s, x)). Sometimes it is preferable to start
The members of P are called its edges; each edge (s, x, y, s') has an initial state s, input
x, output y and final state s'. For a sequential machine each pair (s, x) ∈ S × X
determines a unique edge (s, x, y, s') and whenever our set P of edges is such that
C. for each pair (s, x) ∈ S × X there exists a unique y ∈ Y and s' ∈ S such that
(s, x, y, s') ∈ P,
then we can define λ, δ by writing y = λ(s, x), s' = δ(s, x) and we have a sequential
machine. Two edges are consecutive if the final state of the first edge is also the initial
state of the second. By a path for A we understand a sequence u = (u_1, ..., u_n) of
consecutive edges
                                   u_i = (s_{i-1}, x_i, y_i, s_i).
Its length is n, s_0 is the initial and s_n the final state, x_1 ... x_n its input label and
y_1 ... y_n its output label. It is clear how an automaton can be represented by a
graph with the set S of states as vertex set, each edge being labelled by its input
and output. Sometimes one singles out two subsets I, F of S; a path is called
successful if its initial state is in I and its final state in F. The set L(A) of all input
labels of successful paths is a subset of X *, called the behaviour of A, or also the
set accepted by A. We note that the output does not enter into the behaviour;
when the output is absent (so that P now consists of triples (s, x, s')), A is called
an acceptor. As an example consider an acceptor with states s_0, s_1, s_2, input x, y and
transition function

           δ      x      y
           s_0    s_1    s_2
           s_1    s_2    s_1
           s_2    s_2    s_2

[Figure: graph of this acceptor, with vertices s_0, s_1, s_2 and edges labelled x, y.]
The graph is as shown. If I = {s_0}, F = {s_1}, then the behaviour is xy*; for I = {s_0},
F = {s_2} the behaviour is xy*xX* ∪ yX*.
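The two behaviours stated for this acceptor can be confirmed on all short words. In the Python sketch below, which is an illustration with names of our own (DELTA, accepts, behaviour), the transition table is entered as a dictionary and the claimed behaviours are expressed as regular expressions for Python's re module, with [xy]* standing for X*.

```python
import re
from itertools import product

# Transition table of the acceptor with states s0, s1, s2 and input letters x, y.
DELTA = {
    ("s0", "x"): "s1", ("s0", "y"): "s2",
    ("s1", "x"): "s2", ("s1", "y"): "s1",
    ("s2", "x"): "s2", ("s2", "y"): "s2",
}

def accepts(word, initial, finals):
    """True if the path with input label word starting at initial ends in finals."""
    state = initial
    for letter in word:
        state = DELTA[(state, letter)]
    return state in finals

def behaviour(initial, finals, max_len):
    """All accepted words of length <= max_len."""
    found = set()
    for r in range(max_len + 1):
        for letters in product("xy", repeat=r):
            word = "".join(letters)
            if accepts(word, initial, finals):
                found.add(word)
    return found

def matching(pattern, max_len):
    """All words over {x, y} of length <= max_len matched in full by pattern."""
    found = set()
    for r in range(max_len + 1):
        for letters in product("xy", repeat=r):
            word = "".join(letters)
            if re.fullmatch(pattern, word):
                found.add(word)
    return found
```

Comparing behaviour("s0", {"s1"}, 6) with matching(r"xy*", 6), and behaviour("s0", {"s2"}, 6) with matching(r"xy*x[xy]*|y[xy]*", 6), shows agreement on all words of length at most 6.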
  An automaton is said to be complete if it satisfies condition C above (so that we
have a sequential machine), and the set I of initial states consists of a single state.
To operate a complete acceptor we take any word in X* and use it as input with
the machine in state I; this may or may not lead to a successful path, i.e. a path
ending in F. We shall be interested in its behaviour, i.e. the set of input labels
corresponding to successful paths. Thus the above example is a complete acceptor;
we note how the graph makes it very easy to compute its behaviour. In constructing
an acceptor it is usually convenient not to demand completeness, although complete
acceptors are easier to handle. Fortunately there is a reduction allowing us to pass
from one to the other:
Proposition 11.3.1. For each acceptor A there is a complete acceptor C with the same
behaviour. If A is finite (i.e. with a finite set of states), then so is C.
Proof. Let the set of states for A be S, with initial state I and final state F. We take C
to be on the same alphabet as A, with state set the set of all subsets of S, initial state
{I} and final set of states all sets meeting F. The transition function for C is given by
δ(U, x) = V, where V consists of all states v such that (u, x, v) is an edge in A for
some u E U. It is clear that C has the same behaviour as A, and it is easily seen to
be complete.                                                                            •
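The construction in this proof can be carried out mechanically. The Python sketch below is an illustration with some liberties of ours: an acceptor is given by a set of edges (u, x, v) together with sets of initial and final states, the states of the complete acceptor are frozensets of states, and only those subsets reachable from the initial one are generated rather than all subsets (which does not change the behaviour). The function names and the example acceptor, which accepts the words ending in xy, are hypothetical.

```python
from itertools import product

def complete_acceptor(edges, initial, finals, alphabet):
    """Subset construction: returns (start, delta, final_states)."""
    start = frozenset(initial)
    delta, todo, states = {}, [start], {start}
    while todo:
        U = todo.pop()
        for x in alphabet:
            # all states v such that (u, x, v) is an edge for some u in U
            V = frozenset(v for (u, a, v) in edges if u in U and a == x)
            delta[(U, x)] = V
            if V not in states:
                states.add(V)
                todo.append(V)
    final_states = {U for U in states if U & frozenset(finals)}
    return start, delta, final_states

def run_complete(word, start, delta, final_states):
    """Acceptance by the complete acceptor."""
    state = start
    for x in word:
        state = delta[(state, x)]
    return state in final_states

# Example: an incomplete acceptor for the words ending in xy.
EDGES = {(0, "x", 0), (0, "y", 0), (0, "x", 1), (1, "y", 2)}
START, DELTA, FINALS = complete_acceptor(EDGES, {0}, {2}, "xy")
```

In this example three subsets are reachable, namely {0}, {0, 1} and {0, 2}, and run_complete accepts exactly the words ending in xy. The empty set may also occur as a state in general; it then acts as a sink from which no final state is reached.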
L_s = {v ∈ X* | s.v ∈ F};
then L_s consists of the set of words which give a successful path with s as initial state.
Two states s, t are called separable if L_s ≠ L_t, inseparable otherwise. If any two distinct
states are separable, the acceptor is said to be reduced. Every acceptor (finite or
infinite) has a homomorphic image which is reduced and has the same behaviour;
to obtain it we simply identify all pairs of inseparable states.
   For every subset Y of X* we can define a reduced acceptor A(Y) whose behaviour
is Y. The states of A(Y) are the non-empty sets
\[
u^{-1}Y = \{ v \in X^* \mid uv \in Y \},
\]
where u ranges over X*. The initial state is 1^{-1}Y = Y and the final states are the
states u^{-1}Y containing 1. The transition function is defined by
This is a partial function (i.e. not everywhere defined) since u^{-1}Z may be empty, but
it is single-valued. If for u we take a word in X, we have, by induction on the length
of u,
This shows that (11.3.5) holds for any u ∈ X*. As a consequence we have
w ∈ L(A(Y)) ⇔ 1 ∈ Y.w ⇔ w ∈ Y,
which shows that the behaviour of A(Y) is indeed Y. We shall call A(Y) the minimal
acceptor for Y; its properties follow from
Theorem 11.3.2. Let A = (S, i, F) be a trim acceptor and put Y = L(A), the behaviour
of A. Then there is a state homomorphism $\varphi : A \to A(Y)$ to the minimal acceptor for Y
which is surjective on states, given by
                         $\varphi : s \mapsto L_s = \{v \in X^* \mid s.v \in F\}$.                           (11.3.6)
$v \in u^{-1}Y \Leftrightarrow uv \in Y \Leftrightarrow s.v = i.uv \in F \Leftrightarrow v \in L_s$;
   As we have seen, for any subset Y of X* there is a reduced acceptor with behaviour
Y; taking this to be A in Theorem 11.3.2, we find $\varphi$ in this case to be an isomorphism,
by the definition of 'reduced'. It follows that A must be reduced.
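For a finite language Y the minimal acceptor A(Y) can be computed directly from the sets $u^{-1}Y$. The following Python sketch assumes that Y is a non-empty finite set of strings; the function names are hypothetical and the empty string stands for the empty word 1.

```python
def quotient(Z, x):
    """x^{-1}Z = {v : xv in Z} for a finite set of words Z and a letter x."""
    return frozenset(w[len(x):] for w in Z if w.startswith(x))


def minimal_acceptor(Y, alphabet):
    """States are the non-empty sets u^{-1}Y; the initial state is Y itself."""
    start = frozenset(Y)
    states, delta, todo = {start}, {}, [start]
    while todo:
        Z = todo.pop()
        for x in alphabet:
            Zx = quotient(Z, x)
            if Zx:                      # partial function: empty sets are not states
                delta[(Z, x)] = Zx
                if Zx not in states:
                    states.add(Zx)
                    todo.append(Zx)
    finals = {Z for Z in states if "" in Z}   # states containing the empty word
    return states, delta, start, finals


def accepted(acceptor, word):
    """Follow the (partial) transition function along the word."""
    states, delta, Z, finals = acceptor
    for x in word:
        if (Z, x) not in delta:
            return False
        Z = delta[(Z, x)]
    return Z in finals


Y = {"xy", "xxy", "y"}
acceptor = minimal_acceptor(Y, "xy")
```

Distinct states are distinct sets of words, so no two of them are inseparable; for an infinite language the same idea applies, but the sets can no longer be enumerated this way.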
Theorem 11.3.4. A language is regular if and only if it is the precise set accepted by a
finite acceptor.
Proof. Let A = (S, i, F) be a finite acceptor with behaviour Y and write the transition
function as $\delta$ for clarity. For our grammar $G = (X, V, \to)$ we take X to be the input
of A and V = S, the set of states, with $\sigma = i$, the initial state. For each state $\alpha$ in S and
$x \in X$ we include in G the rule
                         $\alpha \to x\beta$, where $\beta = \delta(\alpha, x)$,                           (11.3.7)
and for each final state $\omega$ we include the rule $\omega \to 1$. Given any word $w = x_1 \cdots x_r$,
let us put $\delta(i, x_1) = s_1, \ldots, \delta(s_{r-1}, x_r) = s_r$. Then the rules (11.3.7) include
$i \to x_1 s_1, \ldots, s_{r-1} \to x_r s_r$, hence $\sigma \to x_1 s_1 \to x_1 x_2 s_2 \to \cdots \to x_1 \cdots x_r s_r$. If $s_r \in F$,
then $s_r \to 1$ and $x_1 \cdots x_r$ is included in L(G). On the other hand, if
$x_1 \cdots x_r \in L(G)$, consider the rules in G: they are all of the form $\alpha \to x\beta$ or
$\alpha \to 1$ and the number of variables is constant in the application of the former
rule and decreases by 1 when the latter is applied. Thus any derivation of $x_1 \cdots x_r$
must be of the form $i \to x_1 s_1, s_1 \to x_2 s_2, \ldots, s_{r-1} \to x_r s_r, s_r \to 1$. This means that
$\delta(s_{i-1}, x_i) = s_i$ (i = 1, ..., r) and $s_r \in F$, so $x_1 \cdots x_r$ is accepted by A.
   Conversely, let G be a regular grammar, with derived language L(G). For our
acceptor A we take the alphabet X of G as input and the set V of variables as state
set, with $\sigma$ as initial state and a triple $(\alpha, x, \beta)$ for each rule $\alpha \to x\beta$, while the
final state set consists of all $\alpha$ such that $\alpha \to 1$. Then it is clear that the derivations
of G correspond precisely to the successful paths in A; the details may be left to the
reader. Hence L(G) is the set accepted by A.                                            •
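The passage from an acceptor to a regular grammar in this proof can be illustrated as follows. This is a minimal sketch, assuming the transition function is stored as a Python dictionary; the rule encoding (a triple with `None` marking a rule of the form "variable goes to 1") and the function names are hypothetical.

```python
def grammar_from_acceptor(delta, finals):
    """Rules alpha -> x beta for delta(alpha, x) = beta, and omega -> 1 for final omega."""
    rules = [(a, x, b) for (a, x), b in delta.items()]
    rules += [(w, "", None) for w in finals]     # omega -> 1
    return rules


def derivable(rules, start, word):
    """Decide whether the start variable derives the word.

    A sentential form is always (letters)(one variable), so it suffices to
    track the set of variables that can stand at the right-hand end.
    """
    current = {start}
    for x in word:
        current = {b for (a, y, b) in rules
                   if a in current and y == x and b is not None}
    # finish with a rule alpha -> 1
    return any(a in current and b is None for (a, y, b) in rules)


# Hypothetical acceptor for x y^n with states s0 (initial) and s1 (final).
delta = {("s0", "x"): "s1", ("s1", "y"): "s1"}
rules = grammar_from_acceptor(delta, {"s1"})
```

Reading the triples (alpha, x, beta) back as edges gives the converse construction of the proof.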
   The acceptor constructed in this proof may not be complete, but it is trim
provided that any superfluous variables have been removed from G. It follows by
Theorem 11.3.2 that for a regular language the minimal acceptor is finite. This
provides a practical way of determining whether a language is regular: Y is a regular
language iff its minimal acceptor A(Y) is finite. We illustrate this result by some
examples.
Examples
1. $\{xy^n \mid n \geq 0\}$. The minimal acceptor has states $s_0 = xy^*$ and $s_1 = y^*$, and its
   operation is given by the table:

                    $s_0$    $s_1$
              x     $s_1$
              y              $s_1$

   [Graph: an edge labelled x from $s_0$ to $s_1$ and a loop labelled y at $s_1$.]
408                                                                            Languages and automata
   The initial state is $s_0$ and the final state is $s_1$. The behaviour can be read off from
   the graph.
2. $\{x^n y \mid n \geq 0\}$. Here the states are $s_0 = x^*y$ and $s_1 = \{1\}$, with initial state $s_0$ and
   final state $s_1$:
   [Graph: a loop labelled x at $s_0$ and an edge labelled y from $s_0$ to $s_1$.]
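3. $\{x^n y^n \mid n \geq 0\}$. We have the states $s_0 = \{x^n y^n \mid n \geq 0\}$, $s_1 = \{x^n y^{n+1} \mid n \geq 0\}$,
   $s_2 = \{x^n y^{n+2} \mid n \geq 0\}, \ldots$, $t_0 = \{1\}$, $t_1 = \{y\}$, $t_2 = \{y^2\}, \ldots$. The initial state is $s_0$
   and the final states are $s_0$, $t_0$, while the operations are given by the table below:

              $s_0$   $s_1$   $s_2$   ...   $t_0$   $t_1$   $t_2$   ...
        x     $s_1$   $s_2$   $s_3$   ...
        y             $t_0$   $t_1$   ...           $t_0$   $t_1$   ...

   [Graph: edges labelled x from $s_0$ to $s_1$ to $s_2$ to $s_3$ ..., and edges labelled y from each $s_{k+1}$ to $t_k$ and from each $t_{k+1}$ to $t_k$.]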
It should be clear from these examples how the behaviour of an acceptor may be read
off from its graph. We note that in none of the cases is the acceptor complete; the
transition function, though single-valued, is not everywhere defined. But this does
not impair its usefulness; in any case we could replace it by a complete acceptor,
using Proposition 11.3.1. Sometimes it is easier to test for regularity by means of
the following necessary condition:
Proposition 11.3.6. Let G be a CF-grammar. Then there exist integers p, q such that
every word w of length greater than p in L(G) can be written as $w = w'uzvw''$, where
$uv \neq 1$, $|uzv| \leq q$ and $w'u^n z v^n w'' \in L(G)$ for all $n \geq 1$.
Proof. The rules of G are all of the form $\alpha \to t$, $t \in A^*$. Suppose the number of variables
is k and choose p so large that every word $w \in L(G)$ of length $\geq p$ has more
than k steps in its derivation. Each of these steps has the form $\alpha \to t$; hence some
variable occurs twice, so the part of the derivation from the first to the second occurrence
of $\alpha$ reads $\alpha \to \cdots \to u\alpha v$, where $uv \neq 1$. It follows that $w = w'uzvw''$, where
$\alpha$ occurs only once in the derivation $\alpha \to \cdots \to z$, which therefore has at most k
steps. Therefore uzv has bounded length, and by repeating the steps between $\alpha$
and $u\alpha v$ we obtain $w'u^n z v^n w''$ for all $n \geq 1$.                                            •
    Proposition 11.3.5 shows that the language $\{x^n z y^n\}$, which we saw to be context-free,
is not regular, and Proposition 11.3.6 shows that $\{x^n y^n z^n\}$ is not context-free.
    Let us return to the example $\{x^n z y^n\}$; we have just seen that it is not regular, and so
cannot be obtained from a finite acceptor. Intuitively we can see that a finite acceptor
does not have the means of comparing the exponents of x and y. To make such a
comparison requires a memory of some kind, and we shall now describe a machine
with a memory capable of accepting CF-languages. The memory to be described is of
a rather simple sort, a 'first-in, last-out' store, where we only have access to the last
item in the store.
   A pushdown acceptor (PDA for short) is an acceptor which in addition to its set of
states S and input alphabet X has a set $\Sigma$ of store symbols, with initial symbol $\lambda_0$ and a
transition function $\delta : S \times X \times \Sigma \to S \times \Sigma^*$; but for a given triple of arguments
there may be several or no values. At any stage the machine is described by a
triple $(s_i, w, \alpha)$, where $s_i \in S$, $w \in X^*$, $\alpha \in \Sigma^*$. We apply $\delta$ to the triple consisting
of $s_i$, the first letter of w and the last letter of $\alpha$. If $w = xw'$, $\alpha = \alpha'\lambda$ say, and
$(s_j, \beta)$ is a value of $\delta(s_i, x, \lambda)$, then
                         $(s_i, xw', \alpha'\lambda) \to (s_j, w', \alpha'\beta)$
is a possible move. Thus the effect of $\delta$ is to move into a state $s_j$, remove the initial
factor x from w and replace the final letter $\lambda$ of $\alpha$ by $\beta$. We say that a word w on X is
accepted by the machine if, starting from $(s_0, w, \lambda_0)$ there is a series of moves to take
us to $(s_r, 1, \gamma)$, where $s_0$ is the initial and $s_r$ a final state. With this definition we have
Theorem 11.3.7. The context-free languages constitute the precise class of sets accepted
by pushdown acceptors.                                                                •
   We shall not give the proof here (see e.g. Arbib (1969)), but as an example we
describe a PDA for $\{x^n y^n \mid n \geq 1\}$. Its states are $s_0, s_1, s_2$, where $s_0$ is initial and $s_2$
final. The store symbols are $\lambda$ (initial symbol), $\mu$, $\nu$. We give the values for $\delta(s_i, \cdot, \cdot)$
in the form of a table for each $s_i$:

      $s_0$   $\lambda$      $\mu$         $\nu$              $s_1$   $\lambda$   $\mu$       $\nu$
      x       $s_0\mu$   $s_0\mu\nu$   $s_0\nu^2$         x
      y                  $s_2\mu$      $s_1$              y               $s_2\mu$   $s_1$
Blanks and the remaining values (for $s_2$) remain undefined. To see how $x^n y^n$ is
accepted, but no other strings, we note how the store acts as a memory, remembering
how many factors x have been taken off. If we think of the store as arranged verti-
cally, at each stage we remove the topmost symbol and add a number of symbols at
the top, rather like a stack of plates in a cafeteria; this explains the name.
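The behaviour of this PDA can be explored with a small simulation. The sketch below is an illustration only: the transition table is our reading of the printed table (in particular the entry sending $(s_1, y, \nu)$ to $s_1$ with the top symbol removed is an assumption), the store symbols $\lambda$, $\mu$, $\nu$ are written as the letters L, M, N, and the function name is hypothetical.

```python
# (state, input letter, last store symbol) -> list of (new state, replacement string)
DELTA = {
    ("s0", "x", "L"): [("s0", "M")],
    ("s0", "x", "M"): [("s0", "MN")],
    ("s0", "x", "N"): [("s0", "NN")],
    ("s0", "y", "M"): [("s2", "M")],
    ("s0", "y", "N"): [("s1", "")],
    ("s1", "y", "M"): [("s2", "M")],
    ("s1", "y", "N"): [("s1", "")],   # assumed entry, see the lead-in
}


def pda_accepts(delta, start, finals, initial_symbol, word):
    """Track all reachable (state, store) pairs; the store is a string whose
    last character is the accessible item."""
    configs = {(start, initial_symbol)}
    for x in word:
        nxt = set()
        for state, store in configs:
            if not store:
                continue                       # nothing to read from an empty store
            for new_state, beta in delta.get((state, x, store[-1]), []):
                nxt.add((new_state, store[:-1] + beta))
        configs = nxt
    # accepted if the whole input is consumed in a final state, with any store
    return any(state in finals for state, _ in configs)


def in_language(word):
    return pda_accepts(DELTA, "s0", {"s2"}, "L", word)
```

After reading n letters x the store holds M followed by n - 1 copies of N, so each y removes one N until the final M is reached; this is the memory described above.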
   By a somewhat more elaborate process, with a tape on which the input is written
(a 'linear-bounded' automaton) one can devise a class of machines which accept
precisely all the CS-languages (see Landweber [1963]). These machines are more
special than Turing machines in that their tape length is bounded by a linear func-
tion of the length of the input word.
   Finally we note the following connexion with monoids:
Exercises
 1. Find the languages generated by the following grammars: (i) $\sigma \to \sigma^2; x; y$, (ii)
    $\sigma \to \sigma^3; x; y$, (iii) $\sigma \to \sigma^2; x\sigma y; y\sigma x; 1$, (iv) $\sigma \to x\sigma x; xyx; x$.
 2. Find a CF-grammar on x, y generating the set of all words in which x is
    immediately followed by y.
 3. Find a grammar on x, y generating the set of all words in which each left factor
    has at least as many x's as y's. Is this a CF-language?
 4. A grammar with rules of the form $\alpha \to x$, $\alpha \to \beta x$ is sometimes called left
    regular, while a grammar with rules of the form $\alpha \to x$, $\alpha \to x\beta$ is called right
    regular. Show that every language generated by a left regular grammar can
    also be generated by a right regular grammar. (Hint. Interpret the words of
    the language as circuits in the acceptor graph; a left and a right regular grammar
    correspond to the two senses of traversing these loops.)
 5. Show that every CF-language in one letter is regular.
 6. Show that a language in a one-letter alphabet $\{x^n \mid n \in I\}$ is regular iff the set I of
    exponents is ultimately periodic.
11.4 Variable-length codes                                                          411
 7. Construct a PDA for the set of all palindromes with 'centre marker', i.e. L(G),
    where $G : \sigma \to x\sigma x; y\sigma y; z$. (Hint. Put one half of the word in store and then
    match the other half.)
 8. Find a PDA for the set of all even palindromes $G : \sigma \to x\sigma x; y\sigma y; 1$. (Hint.
    Construct a PDA to 'guess' the centre.)
 9. An automaton is called deterministic (resp. total) if for each $s \in S$, $x \in X$ there
    exists at most (resp. at least) one pair $y \in Y$, $s' \in S$ such that $(s, x, y, s') \in P(A)$.
    For any A define its reverse $A^\circ$ as the automaton with state set S, input Y,
    output X and $P(A^\circ)$ as the set of all $(s', y, x, s)$ with $(s, x, y, s') \in P(A)$. Show that
    $A^\circ$ is deterministic whenever A is reduced.
10. A complete automaton A with N states $s_1, \ldots, s_N$ can be described by a set of
    N x N matrices $P(x|y)$ ($x \in X$, $y \in Y$) where the (i, j)-entry of $P(x|y)$ is 1 if
    $\delta(s_i, x) = y$ and $\lambda(s_i, x) = s_j$, and 0 otherwise. Define $P(u|v)$ recursively for
    $u \in X^*$, $v \in Y^*$ by $P(ux|vy) = P(u|v)P(x|y)$, $P(u|v) = 0$ if $|u| \neq |v|$. Show that
    $P(uu'|vv') = P(u|v)P(u'|v')$. Further put $P(x) = \sum_y P(x|y)$, write $\pi$ for the
    row vector whose i-th component is 1 if $s_i$ is the initial state and 0 otherwise
    and write f for the column vector with component 1 for a final and 0 for a
    non-final state. Show that for any $u \in X^*$, $\pi P(u) f$ is 1 if u is accepted and 0
    otherwise.
11. With the notation of Exercise 10, put $T = \sum_x P(x)$; verify that the (i, j)-entry of
    $T^n$ is the number of words of length n in X which give a path from $s_i$ to $s_j$. Show
    that if $\lambda(n)$ denotes the number of words of length n accepted, then the length
    generating function $L(t) = \sum \lambda(n) t^n$ satisfies $L(t) = \pi (I - tT)^{-1} f$. Use the
    characteristic equation of T to find a recursion formula for $\lambda(n)$.
elements of A; in other words, $\langle A \rangle$ is the free monoid on A as free generating set.
Thus if $w \in \langle A \rangle$ and
If a = bz, we say that b is a prefix of a. Let us define a prefix set as a non-empty subset
of X* in which no element is a prefix of another. For example, {1} is a prefix set; any
prefix set $A \neq \{1\}$ cannot contain 1, because 1 is a prefix of any other element of X*.
   What we have just found is that if A is not a code and $A \neq \{1\}$, then A is not a
prefix set; thus we have
Clearly this relation is reflexive and transitive; we claim that when M is conical, with
cancellation, then '$\leq$' is antisymmetric, so that we have a partial ordering. For if
$u \leq v$, $v \leq u$, then v = uz, u = vz', hence v = uz = vz'z, so z'z = 1 by cancellation,
and since M is conical, we conclude that z = z' = 1.
   We shall be particularly interested in the ordering (11.4.1) on free monoids. In that
case the set of left factors of any element u is totally ordered, by rigidity, and since the
length of chains of factors is bounded by |u|, the ordering satisfies the minimum
condition. In terms of the ordering (11.4.1) on a free monoid, a prefix set is just
an anti-chain, and by BA, Proposition 3.2.8 there is a natural bijection between
anti-chains and lower segments. Here a 'lower segment' is a subset containing
with any element all its left factors; such a set, if non-empty, is called a Schreier set;
it is clear that every Schreier set contains 1.
   Let us describe this correspondence between prefix sets and Schreier sets more
explicitly: if C is a prefix set in X*, then the corresponding Schreier set is the complement
of CX* in X*; for a Schreier set P the corresponding prefix set is the set of all
minimal elements in the complement of P.
   A prefix set C is said to be right large if CX* meets wX* for every $w \in X^*$. By the
rigidity of X* this just amounts to saying that every element of X* is comparable
with some element of C. Hence the Schreier set corresponding to a right large
prefix set C consists precisely of the proper prefixes of elements of C. To sum up
these relations we need one more definition. In any monoid a product AB of subsets
A, B is said to be unambiguous if each element of AB can be written in just one way as
c = ab, where a E A, b E B.
Proposition 11.4.2. Let X* be the free monoid on a finite alphabet X. Then there is a
natural bijection between prefix sets and Schreier sets: to each prefix set C corresponds
$P = X^* \setminus CX^*$ and to each Schreier set P corresponds $C = PX \setminus P$, i.e. the set of minimal
elements in $X^* \setminus P$, and we have the unambiguous product
                                            $X^* = C^*P$.                                         (11.4.2)
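For finite sets the correspondence of this proposition is easy to compute. The following Python sketch uses hypothetical function names, writes the empty word 1 as the empty string, and recovers P from C as the set of proper prefixes, which is valid only when C is right large (as noted before the proposition).

```python
def is_schreier(P):
    """Non-empty and closed under taking left factors (prefixes)."""
    return bool(P) and all(p[:k] in P for p in P for k in range(len(p)))


def is_prefix_set(C):
    """Non-empty, with no element a prefix of another."""
    return bool(C) and not any(a != b and b.startswith(a) for a in C for b in C)


def prefix_set_of(P, X):
    """C = PX \\ P for a Schreier set P over the alphabet X."""
    return {p + x for p in P for x in X} - set(P)


def schreier_set_of(C):
    """Proper prefixes of elements of C (assumes C is right large)."""
    return {c[:k] for c in C for k in range(len(c))}


P = {"", "x", "xy"}
C = prefix_set_of(P, "xy")
```

Here C comes out as {y, xx, xyx, xyy}, and taking proper prefixes of its elements returns the original Schreier set P.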
   For a closer study of codes it is useful to have a numerical measure for the
elements of X*. By a measure on X* we understand a homomorphism $\mu$ of X*
into the multiplicative monoid of positive real numbers, such that
                         $\sum_{x \in X} \mu(x) = 1$.                                  (11.4.4)
Clearly $\mu(1) = 1$ and the value of $\mu$ on X can be assigned arbitrarily as positive real
numbers, subject only to (11.4.4); once this is done, $\mu$ is completely determined on
X* by the homomorphism property. For example, writing $m(x) = r^{-1}$, we obtain
the uniform measure on X*:
                         $m(w) = r^{-|w|}$.
                         $\mu(A) = \sum_{a \in A} \mu(a)$.
We note that $\mu(A)$ is a positive real number or $\infty$, and $\mu(X) = 1$ by (11.4.4). For a
product of subsets we have
                         $\mu(AB) \leq \mu(A)\mu(B)$,                                  (11.4.5)
with equality if the product is unambiguous. To prove (11.4.5), let us first take A, B
finite, say $A = \{a_1, \ldots, a_m\}$, $B = \{b_1, \ldots, b_n\}$. We have
                         $\mu(A)\mu(B) = \sum_{i,j} \mu(a_i)\mu(b_j) = \sum_{i,j} \mu(a_i b_j)$,
and here each member of AB occurs just once if AB is unambiguous, and otherwise
more than once, so we obtain (11.4.5) in this case, with equality in the unambiguous
case. In general (11.4.5) holds for any finite subsets A', B' of A, B by what has been
shown. Therefore $\mu(A'B') \leq \mu(A)\mu(B)$, and now (11.4.5) follows by taking the
limit.
   In particular, for any code C, the product CC is unambiguous by definition, hence
on writing $C^2 = CC$ etc., we have
                         $\mu(C^n) = \mu(C)^n$.                                  (11.4.6)
We shall need an estimate of $\mu(A)$ for finite sets A. We recall that $X^+ = X^* \setminus \{1\}$.
Lemma 11.4.3. Let $\mu$ be any measure on X*. Then for any finite subset A of $X^+$,
                         $\mu(A) \leq \max\{|a| \mid a \in A\}$.                     (11.4.8)
Proof. Since A is finite, the right-hand side of (11.4.8) is finite, say it equals d. Then
$A \subseteq X \cup X^2 \cup \cdots \cup X^d$, and so by (11.4.7),
                         $\mu(A) \leq \mu(X) + \mu(X^2) + \cdots + \mu(X^d) = d$.                    •
From this lemma we can obtain a remarkable inequality satisfied by codes, which
shows that to be a code, a set must not be too large.
McMillan Inequality. Let C be any code on X*. Then for any measure $\mu$ on X we have
                         $\mu(C) \leq 1$.                                     (11.4.9)
Proof. Consider first the case where C is finite and let $\max\{|c| \mid c \in C\} = d$. Then by
(11.4.6), (11.4.8),
                         $\mu(C)^n = \mu(C^n) \leq nd$,
since the elements of $C^n$ have length at most nd. Taking n-th roots, we find that
$\mu(C) \leq (nd)^{1/n}$. Here d is fixed; letting $n \to \infty$, we have $(nd)^{1/n} \to 1$, therefore
$\mu(C) \leq 1$. In the general case every finite subset of C is a code and so satisfies
(11.4.9), hence this also holds for C itself.                                    •
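The inequality is easy to test numerically on small sets. The sketch below uses exact rational arithmetic and the uniform measure on a two-letter alphabet; the function names and the three example sets are our own choices for illustration.

```python
from fractions import Fraction


def measure(word, mu):
    """Extend mu multiplicatively from letters to a word (the empty word has measure 1)."""
    value = Fraction(1)
    for letter in word:
        value *= mu[letter]
    return value


def measure_of_set(A, mu):
    """mu(A) = sum of mu(a) over a in A, for a finite set A."""
    return sum((measure(a, mu) for a in A), Fraction(0))


uniform = {"x": Fraction(1, 2), "y": Fraction(1, 2)}

prefix_code = {"y", "xx", "xyx", "xyy"}   # a prefix set, hence a code
too_large = {"x", "y", "xy"}              # measure 5/4 > 1, so not a code
not_a_code = {"x", "xy", "yx"}            # x.yx = xy.x, although the measure is 1
```

The last set shows that a measure at most 1 does not by itself make a set a code: the word xyx has two factorizations.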
   Of course the condition (11.4.9) is by no means sufficient for a code, since any set
C will satisfy (11.4.9) for a suitable measure, if we choose X large enough.
   A code on X is said to be maximal if it is not a proper subset of a code on X.
Maximal codes always exist by Zorn's lemma, since the property of being a code
is of finite character. The above inequality provides a convenient test for maximality:
Proposition 11.4.4. Let C be a code on X. If $\mu(C) = 1$ for some measure $\mu$, then C is a
maximal code.
Proof. Suppose that C is not a maximal code. Then we can find a code B containing
C and another element b, say. We have $\mu(B) \geq \mu(C) + \mu(b) > 1$, and this contradicts
(11.4.9).                                                                 •
Theorem 11.4.5. Let $n_1, n_2, \ldots$ be any sequence of positive integers. Then there exists a
code $A = \{a_1, a_2, \ldots\}$ with $|a_i| = n_i$ in an alphabet of r letters if and only if
            $r^{-n_1} + r^{-n_2} + \cdots \leq 1$ (Kraft-McMillan inequality).                (11.4.10)
We claim that $A = \{a_1, a_2, \ldots\}$ is a code of the required type. The lengths are right
by construction, and we shall complete the proof by showing that A is a prefix code.
If $a_j$ is a prefix of $a_i$, j < i, then $a_j$ is obtained from $a_i$ by cutting off the last $n_i - n_j$
digits. Thus $p_j$ is the greatest integer in the fraction
The construction of $a_k$ given in Theorem 11.4.5 can be described by the rule: choose
the least number in the ternary scale which is not a prefix of 222 and which has no
$a_i$ (i < k) as a prefix.
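A prefix code with prescribed lengths can be produced by a standard greedy construction, sketched below in Python. It is not claimed to reproduce the proof of Theorem 11.4.5 step by step: it handles only finite sequences, sorts the lengths first, assumes $r \leq 10$ so that the letters can be written as decimal digits, and the function name is hypothetical.

```python
from fractions import Fraction


def kraft_code(lengths, r):
    """Return words over the digits 0..r-1 with the given lengths, forming a prefix code.

    The i-th word is the expansion in the scale of r, to n_i digits, of the
    partial sum q = r^{-n_1} + ... + r^{-n_{i-1}} (lengths taken in increasing order).
    """
    lengths = sorted(lengths)
    if sum(Fraction(1, r ** n) for n in lengths) > 1:
        raise ValueError("the Kraft-McMillan inequality (11.4.10) fails")
    code = []
    q = Fraction(0)
    for n in lengths:
        value = int(q * r ** n)      # an integer, since all earlier lengths are <= n
        digits = []
        for _ in range(n):
            value, d = divmod(value, r)
            digits.append(str(d))
        code.append("".join(reversed(digits)))
        q += Fraction(1, r ** n)
    return code


binary = kraft_code([1, 2, 3, 3], 2)
ternary = kraft_code([1, 1, 2, 2, 3, 3, 3], 3)
```

For the binary example the partial sums are 0, 1/2, 3/4 and 7/8, whose expansions to 1, 2, 3 and 3 digits give the words 0, 10, 110 and 111.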
   We have seen that codes are certain subsets of X* that are not too large; we now
introduce a class of subsets that are not 'too small', in order to study the interplay
between these classes. A subset A of X* is said to be complete if the submonoid
generated by it meets every ideal of X*, i.e. every word in X* occurs as a factor in
some word in $\langle A \rangle$:
                             $X^* w X^* \cap \langle A \rangle \neq \emptyset$ for all $w \in X^*$.
Proposition 11.4.6. Let A be a finite complete subset of X* and let m be the uniform
measure on X. Then $m(A) \geq 1$.
Proof. Let L be the set of prefixes, R the set of suffixes and F the set of factors of
members of A. Since A is finite, L, R and F are all finite. We claim that
                         $R\langle A \rangle L \cup F = X^*$.                                  (11.4.11)
   We note that the converse result does not hold (see Exercise 5). Our final result
clarifies the relation between complete sets and codes.
Theorem 11.4.8 (Boë, de Luca and Restivo [1980]). Let A be a finite subset of X.
Then any two of the following imply the third:
(a) A is a code,
(b) A is complete,
(c) m(A) = 1.
Proof. (a, b) $\Rightarrow$ (c). If A is a complete code, then m(A) = 1 by Proposition 11.4.6
and McMillan's inequality. (b, c) $\Rightarrow$ (a). If m(A) = 1, but A is not a code, then
for some n, $m(A^n) < m(A)^n = 1$, hence $A^n$ is not complete, and so neither is A.
(c, a) $\Rightarrow$ (b). If m(A) = 1 and A is a code, then A is a maximal code and so it is
complete, by Theorem 11.4.7.                                                                •
   The situation is reminiscent of what happens for bases in a vector space. A basis is
a linearly independent spanning set, and of the following conditions on a set A of
vectors in a vector space V any two imply the third:
(a) A is linearly independent,
(b) A is a spanning set,
(c) IAI = dim V.
In the proof the exchange axiom plays a vital role: it has the consequence that any
spanning set contains a linearly independent set which still spans. The analogue
here is false; there are minimal complete sets that are not codes. For example, take
X = {x, y}, $A = \{x^3, x^2yx, x^2y, yx, y\}$. It is easily verified that A is minimal complete.
The subset $A_1 = \{x^3, x^2yx, yx, y\}$ is not a code and $m(A_1) = 1$, while all other proper
subsets of A are codes. What can be shown is the following (see Boë, de Luca and
Restivo [1980]):
  A minimal complete set is a code iff all its proper subsets are codes.
11.5 Free algebras and formal power series rings                                          419
   For finite codes the converse of Theorem 11.4.7 holds: every finite complete code
is maximal (this follows from Theorem 11.4.8 and Proposition 11.4.4), but there are
infinite complete codes which are not maximal (see Exercise 5); this also shows that
Theorem 11.4.8 does not extend to infinite sets.
Exercises
1. Determine which of the following are codes: (i) $\{xy, xy^2, y^2\}$, (ii) $\{x^2, xy, x^2y,
   xy^2, y^2\}$, (iii) $\{x^2, xy^2, x^2y, xy^3, y^2, yx\}$.
2. Construct a code for r = 2 and the sequence 1, 2, 3, .... Likewise for r = 4 and
   the sequence 1, 1, 1, 2, 2, 2, 3, 3, 3, ....
3. Let X be an alphabet, G a group and $f : X^* \to G$ a homomorphism. Show that for
   any subgroup H of G, $Hf^{-1}$ is a free submonoid of X*, and its generating set is a
   maximal code which is bifix (i.e. prefix and suffix). Such a code is called a group
   code.
4. Let X be an alphabet and $w_x(u)$ the x-length of $u \in X^*$. Show that for any integer
   m the mapping $f : u \mapsto w_x(u) \pmod{m}$ is a homomorphism from X* to Z/m and
   describe the group code $0f^{-1}$. Show that for m = 2, X = {x, y},
   $0f^{-1} = \{y\} \cup \{xy^*x\}$; find $0g^{-1}$ where $g : u \mapsto w_x(u) \pmod{3}$.
5. Let X = {x, y}. Show that $\delta(u) = w_x(u) - w_y(u)$ is a homomorphism to Z.
   Describe the corresponding group code D (this is known as the Dyck code).
   Show that D is complete and remains complete if one element is omitted.
6. Let $f : X^* \to Y^*$ be an injective homomorphism of free monoids. Show that if A
   is a code in X*, then Af is a code in Y*; if B is a code in Y*, then $Bf^{-1}$ is a code
   in X*.
7. Let A, B be any codes in X*. Show that $A^n$ is a code for any $n \geq 1$, but AB need
   not be a code.
8. Let X = {x, y}, $\mu(x) = p$, $\mu(y) = q$, where $p, q \geq 0$, p + q = 1. Show that
   $C = \{xy, yx, xy^2x\}$ is not a code, even though it satisfies (11.4.9).
Here (f, u) is called the coefficient of u; in particular, (f, 1) is called the constant term
of f.
Since each element of X* has only a finite number of factors, the sum in (11.5.3) is
finite, so fg is well-defined. The set of all these power series is denoted by $k\langle\langle X \rangle\rangle$; it is
easily seen to form a k-algebra with respect to these operations. For each power series
f its support is defined as
                              $D(f) = \{u \in X^* \mid (f, u) \neq 0\}$.
Thus u lies in the support of f precisely if it occurs in the expression (11.5.1) for f.
The elements of finite support are called polynomials in X; they are in fact just poly-
nomials, i.e. k-linear combinations of products of elements of X, but care must be
taken to preserve the order of the factors, since the elements of X do not commute.
These polynomials form a subalgebra $k\langle X \rangle$, called the free k-algebra on X (see
Section 8.7). We remark that $k\langle X \rangle$ can also be defined as the monoid algebra of
the free monoid X*, in analogy to the group algebra.
   For each power series f we define its order o(f) as the minimum of the lengths of
terms in its support. The order is a positive integer or zero, according as (f, 1) is or is
not zero. For a polynomial f we can also define its degree d(f); it is the maximum of
the lengths of terms in its support. If all terms of f have the same length r, so that
o(f) = d(f) = r, then f is said to be homogeneous of degree r.
   We remark that if u is a series of positive order, we can form the series
                                   $u^* = 1 + u + u^2 + \cdots$.
The infinite series on the right 'converges' because if o(u) = r, then for any given
d, $u^n$ contributes only if $rn \leq d$. Thus in calculating the terms of degree d in u*
we need only consider $u^n$ for n = 0, 1, ..., [d/r]. We also note that u* satisfies the
equations
                                   $u^*u = uu^* = u^* - 1$.
Hence $(1 - u)u^* = u^*(1 - u) = 1$, so u* is the inverse of 1 - u:
                                $(1 - u)^{-1} = u^* = \sum_{n=0}^{\infty} u^n$.                                   (11.5.4)
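The identity (11.5.4) can be checked on truncated series. In the sketch below a series is represented, purely for illustration, as a dictionary from words (strings, with the empty string for 1) to coefficients, and all products are truncated at a chosen degree d; the function names are hypothetical.

```python
def multiply(f, g, d):
    """Product of two series, keeping only the terms of length at most d.

    Words are concatenated in order, since the letters do not commute.
    """
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            if len(u) + len(v) <= d:
                h[u + v] = h.get(u + v, 0) + a * b
    return {w: c for w, c in h.items() if c != 0}


def star(u, d):
    """The terms of u* = 1 + u + u^2 + ... of length at most d."""
    if u.get("", 0) != 0:
        raise ValueError("u must have positive order (zero constant term)")
    result = {"": 1}
    power = {"": 1}
    for _ in range(d):          # u^n has no terms of length <= d once n > d
        power = multiply(power, u, d)
        for w, c in power.items():
            result[w] = result.get(w, 0) + c
    return result


u = {"x": 1, "y": 1}
one_minus_u = {"": 1, "x": -1, "y": -1}
u_star = star(u, 3)
```

With u = x + y the truncated u* contains every word of length at most 3 with coefficient 1, and multiplying it by 1 - u on either side leaves only the constant term 1 up to degree 3.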
It is easily verified that $k\langle X \rangle$ has the familiar universal property: every mapping
$\varphi : X \to A$ into a k-algebra A can be extended in just one way to a homomorphism
$\bar{\varphi} : k\langle X \rangle \to A$. As a consequence every k-algebra can be written as a homomorphic
image of a free k-algebra, possibly on an infinite alphabet. Of course free algebras
on an infinite alphabet are defined in exactly the same way; for power series rings
there are several possible definitions, depending on the degrees assigned to the
variables, but this need not concern us here, as we shall only consider the case of
a finite alphabet.
   The free algebra $k\langle X \rangle$ may be regarded as a generalization of the polynomial ring
k[x], to which it reduces when X consists of a single element x. The polynomial ring
is of course well known and has been thoroughly studied. The main tool is the
Euclidean algorithm; this allows one to prove that k[x] is a principal ideal domain
(PID) and a unique factorization domain (UFD) (see BA, Section 10.2). The UF-
property extends to polynomials in several (commuting) variables (BA, Section
10.3), but there is no analogue to the principal ideal property in this case. For the
non-commutative polynomial ring $k\langle X \rangle$ the UF-property persists, albeit in a more
complicated form, and we shall have no more to say about it here (see Exercise 3
below and Cohn (1985), Chapter 3). The principal ideal property generalizes as
follows. We recall from Section 8.7 that a free right ideal ring, right fir for short,
is a ring R with invariant basis number (IBN) in which every right ideal is free as
right R-module; left firs are defined similarly and a left and right fir is called a fir.
In the commutative case a fir is just a PID and the fact that k[x] is a PID generalizes
to the assertion that $k\langle X \rangle$ is a fir. This is usually proved by the weak algorithm, a
generalization of the Euclidean algorithm, to which it reduces in the commutative
case. We shall not enter into the details (see Cohn (1985), Chapter 2), but confine
ourselves below to giving a direct proof that $k\langle X \rangle$ is a fir. This method, similar to
the technique used to prove that subgroups of free groups are free, is due to Jacques
Lewin; our exposition follows essentially Berstel and Reutenauer (1988) (see also
Cohn (1985) Chapter 6).
Theorem 11.5.1. Let $F = k\langle X \rangle$ be the free algebra on a finite set X over a field k and let
$\mathfrak{a}$ be any right ideal of F. Then there exists a Schreier set P in X* which is maximal
linearly independent (mod $\mathfrak{a}$). If C is the corresponding prefix set, determined as in
Proposition 11.4.2, then for each $c \in C$ there is an element of $\mathfrak{a}$:
                         $f_c = c - \sum \alpha_{c,p}\, p$   ($p \in P$, $\alpha_{c,p} \in k$)
where the sum ranges over all $p \in P$, but has only finitely many non-zero terms for each
$c \in C$, such that $\mathfrak{a}$ is free as right F-module on the $f_c$ ($c \in C$) as basis. Similarly for left
ideals, and F has IBN, so it is a fir.
Proof. The monoid X* is a k-basis of F, hence its image in $F/\mathfrak{a}$ is a spanning set and
it therefore includes a basis of $F/\mathfrak{a}$ as k-space. Moreover, we can choose this basis to
be a Schreier set, by building it up according to length. Thus if $P_n$ is a Schreier set
which forms a basis for all elements of F of degree at most n (mod $\mathfrak{a}$), then the
set $P_n X$ spans the space of elements of degree at most n + 1 and by choosing a
basis from it we obtain a Schreier set $P_{n+1}$ containing $P_n$ and forming a basis
(mod $\mathfrak{a}$) for the elements of degree at most n + 1. In this way we obtain a Schreier
set $P = \bigcup P_n$ which is maximal k-linearly independent (mod $\mathfrak{a}$) and hence a k-basis.
Let C be the corresponding prefix set. For each $c \in C$ the set $P \cup \{c\}$ is still a Schreier
set, but by the maximality of P it is linearly dependent mod $\mathfrak{a}$, say
                         $f_c = c - \sum \alpha_{c,p}\, p \in \mathfrak{a}$,                        (11.5.5)
where the sum ranges over P and almost all the $\alpha_{c,p}$ vanish. We claim that every
$b \in F$ can be written as
                         $b = \sum f_c g_c + \sum \beta_p p$, where $g_c \in F$, $\beta_p \in k$,                (11.5.6)
and the sums range over C and P respectively. By linearity it is enough to prove this
when b is a monomial. When $b \in P$, this is clear; we need only take $\beta_p = 1$ for p = b
and the other coefficients zero. When $b \notin P$, it has a prefix in C by Proposition
11.4.2, say b = cu, where $c \in C$ and $u \in F$. By (11.5.5) we have
                         $b = cu = f_c u + \sum \alpha_{c,p}\, pu$.                        (11.5.7)
For any $p \in P$, either $pu \in P$ or $pu = c_1 u_1$ where $c_1 \in C$ and hence $|p| < |c_1|$,
$|u_1| < |u|$. In the first case we have achieved the form (11.5.6); in the second case
we use induction on |u| to express $c_1 u_1$ in the same form. Thus we can reduce all
the terms on the right of (11.5.7) to the form (11.5.6) and the conclusion follows.
  We claim that the elements (11.5.5) form the desired basis of $\mathfrak{a}$. To show that
they generate $\mathfrak{a}$, let us take $b \in \mathfrak{a}$ and apply the natural homomorphism $F \to F/\mathfrak{a}$.
Writing the image of r as $\bar{r}$, we have
                         $\sum_p \beta_p \bar{p} = 0$.
Since the $\bar{p}$ are linearly independent by construction, we have $\beta_p = 0$, so $b = \sum f_c g_c$
and it follows that the $f_c$ generate $\mathfrak{a}$. To prove their independence over F, assume
that $\sum f_c g_c = 0$, where not all the $g_c$ vanish. Then by (11.5.5)
                         $\sum c g_c = \sum \alpha_{c,p}\, p g_c$.                        (11.5.8)
Take a word w of maximal length occurring in some $g_c$, say in $g_{c'}$. Since C is a prefix
code, c'w occurs with a non-zero coefficient $\lambda$ on the left of (11.5.8). Hence
                         $\lambda = \sum \alpha_{c,p} \mu_{c,p}$,
where $\mu_{c,p}$ is the coefficient of c'w in $p g_c$. Now the relation c'w = pu can hold only
when p is a proper prefix of c', hence |p| < |c'|, |u| > |w| and this contradicts the
definition of w. This contradiction shows that the $f_c$ are linearly independent over
F, so they form a basis of $\mathfrak{a}$, which is therefore a free right ideal. By symmetry
every left ideal is free, and F clearly has IBN, since we have a homomorphism
$F \to k$, obtained by setting X = 0. This shows F to be a fir.                    •
This tells us again that the prefix set C consists of all products px ($p \in P$, $x \in X$),
which are not in P. By Proposition 11.4.2, P is finite iff C is finite and right large,
11.5 Free algebras and formal power series rings                                       423
but in our case this just means that a is finitely generated and large as right ideal,
while the finiteness of P means that a has finite codimension in F. By replacing
the elements of X by 1 in (11.5.9), we thus obtain
Corollary 11.5.2. Let $F = k\langle X\rangle$ be the free algebra as in Theorem 11.5.1 and $\mathfrak{a}$ a right
ideal of $F$. Then $\mathfrak{a}$ has finite codimension in $F$ if and only if $\mathfrak{a}$ is finitely generated and
large as right ideal. If $|X| = d$, $\mathfrak{a}$ has codimension $r$ and has a basis of $n$ elements, then
$$n - 1 = r(d - 1). \tag{11.5.10}$$
∎
We remark that (11.5.10) is analogous to Schreier's formula for the rank of a sub-
group of a free group (see Section 3.4); it is known as the Schreier-Lewin formula.
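
The count in (11.5.10) can be checked on a small case. The following is a minimal sketch, assuming only the description of the prefix set given above ($C$ consists of the products $px$ not in $P$); the particular set `P` and the function names are hypothetical choices made for illustration, not taken from the text.

```python
def is_prefix_closed(P):
    """True if every prefix of every word in P also lies in P."""
    return all(p[:i] in P for p in P for i in range(len(p)))


def prefix_set(P, X):
    """Return C = PX minus P: the products px (p in P, x in X) not in P."""
    return {p + x for p in P for x in X} - set(P)


def is_prefix_code(C):
    """True if no word of C is a proper prefix of another word of C."""
    return not any(c != e and e.startswith(c) for c in C for e in C)


X = ["x", "y"]                      # alphabet, so d = 2
P = {"", "x", "y", "xx", "xy"}      # hypothetical prefix-closed set, r = 5
assert is_prefix_closed(P)

C = prefix_set(P, X)
d, r, n = len(X), len(P), len(C)
print(sorted(C))                    # the words of the prefix set
print(n - 1 == r * (d - 1))         # the count of (11.5.10)
```

Here words are plain Python strings and the empty string plays the role of 1. The sketch only counts words; it does not construct the basis elements $f_c$ of a right ideal.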
   We now turn to consider the power series ring. To describe the structure of $k\langle\langle X\rangle\rangle$
we recall that a local ring is a ring $R$ in which the set of all non-units forms an ideal $\mathfrak{m}$.
Clearly $\mathfrak{m}$ is then the unique maximal ideal of $R$, and $R/\mathfrak{m}$ is a skew field, called the
residue class field of $R$.
Proposition 11.5.3. The power series ring $k\langle\langle X\rangle\rangle$ on any finite set $X$ over a field $k$ is a
local ring with residue class field $k$. Its maximal ideal consists of all elements with zero
constant term.
Proof. The mapping $X \to 0$ defines a homomorphism of $k\langle\langle X\rangle\rangle$ onto $k$, hence the
kernel $\mathfrak{m}$, consisting of all elements with zero constant term, is an ideal in $k\langle\langle X\rangle\rangle$.
It follows that $k\langle\langle X\rangle\rangle/\mathfrak{m} \cong k$, and $\mathfrak{m}$ contains no invertible element. Any element $f$
not in $\mathfrak{m}$ has non-zero constant term $\lambda$ and so $\lambda^{-1}f = 1 - u$, where $o(u) > 0$. By
(11.5.4) we have $(1 - u)^{-1} = u^*$, hence $f^{-1} = \lambda^{-1}u^*$. ∎
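
The inversion step of this proof can be tried out on truncated series. The sketch below is an illustration under stated assumptions: series are stored as dictionaries from words to coefficients, everything above a chosen degree `N` is discarded, and $u^*$ is taken to mean $\sum u^n$, as the use of (11.5.4) suggests. The helper names and the sample series `f` are hypothetical.

```python
from fractions import Fraction

N = 4  # truncation degree, chosen arbitrarily for this sketch


def add(f, g):
    """Sum of two series stored as dicts word -> coefficient."""
    h = dict(f)
    for w, c in g.items():
        h[w] = h.get(w, 0) + c
    return {w: c for w, c in h.items() if c != 0}


def scale(c, f):
    """Multiply every coefficient of f by the scalar c."""
    return {w: c * a for w, a in f.items() if c * a != 0}


def mul(f, g, n=N):
    """Product of two series, discarding all words longer than n."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            if len(u) + len(v) <= n:
                h[u + v] = h.get(u + v, 0) + a * b
    return {w: c for w, c in h.items() if c != 0}


def inverse(f, n=N):
    """Truncated inverse of f: write f = lam*(1 - u) and sum the powers of u."""
    lam = f.get("", 0)
    if lam == 0:
        raise ValueError("zero constant term: f lies in the maximal ideal")
    one = {"": Fraction(1)}
    u = add(one, scale(-1 / Fraction(lam), f))   # u = 1 - f/lam, positive order
    star, power = dict(one), dict(one)
    for _ in range(n):
        power = mul(power, u, n)
        star = add(star, power)
    return scale(1 / Fraction(lam), star)


f = {"": Fraction(2), "x": Fraction(1), "xy": Fraction(-3)}   # sample series
g = inverse(f)
print(mul(f, g) == {"": 1}, mul(g, f) == {"": 1})
```

Truncating at degree `N` amounts to working modulo the series of order greater than `N`, so the check confirms the inverse only up to that degree.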
  A power series $f$ is called rational if it can be obtained from the elements of $k\langle X\rangle$
by a finite number of operations of addition, multiplication and inversion of series
with non-zero constant terms. The rational series form a subring of $k\langle\langle X\rangle\rangle$, denoted
by $k\langle X\rangle_{\mathrm{rat}}$, as we shall see in Proposition 11.5.4 below, and the method of
Proposition 11.5.3 shows that $k\langle X\rangle_{\mathrm{rat}}$ is again a local ring.
  Let us note that any square matrix $A$ over $k\langle X\rangle_{\mathrm{rat}}$ is invertible provided that its
constant term is invertible over $k$. To prove this fact, we write $A = A_0 - B$, where
$A_0$ is over $k$ and $B$ has zero constant term. By hypothesis $A_0$ is invertible over $k$
and on writing $A_0^{-1}A = I - A_0^{-1}B$ we reduce the problem to the case where
$A_0 = I$. As in the scalar case we can now write
$$(I - A_0^{-1}B)^{-1} = \sum_{n \ge 0} (A_0^{-1}B)^n,$$
which makes sense because all the terms of $A_0^{-1}B$ have positive order. Strictly speaking
this does not make it clear that all the entries lie in $k\langle X\rangle_{\mathrm{rat}}$; to see this we need to
invert the entries of $I - U$, where the terms of $U$ have positive orders, term by term.
Since the diagonal entries are units, while the non-diagonal entries are non-units,
this is always possible. This method also yields a criterion for the rationality of
power series.
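
The matrix form of the geometric series can be illustrated in the same truncated setting. This is a sketch under assumptions: the constant term is already $I$, entries are dictionaries from words to coefficients, and words longer than `N` are discarded; the matrix `B` and all helper names are hypothetical.

```python
N = 5  # truncation degree, chosen arbitrarily for this sketch


def add(f, g):
    """Sum of two series stored as dicts word -> coefficient."""
    h = dict(f)
    for w, c in g.items():
        h[w] = h.get(w, 0) + c
    return {w: c for w, c in h.items() if c != 0}


def mul(f, g):
    """Product of two series, discarding all words longer than N."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            if len(u) + len(v) <= N:
                h[u + v] = h.get(u + v, 0) + a * b
    return {w: c for w, c in h.items() if c != 0}


def identity(size):
    """Identity matrix whose entries are series."""
    return [[{"": 1} if i == j else {} for j in range(size)] for i in range(size)]


def mat_add(A, B):
    return [[add(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_mul(A, B):
    size = len(A)
    C = [[{} for _ in range(size)] for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                C[i][j] = add(C[i][j], mul(A[i][k], B[k][j]))
    return C


def geometric_sum(B):
    """Truncated sum I + B + B^2 + ... + B^N."""
    total = power = identity(len(B))
    for _ in range(N):
        power = mat_mul(power, B)
        total = mat_add(total, power)
    return total


B = [[{}, {"x": 1}], [{"y": 1}, {}]]          # entries of positive order
minus_B = [[{w: -c for w, c in e.items()} for e in row] for row in B]
I_minus_B = mat_add(identity(2), minus_B)
S = geometric_sum(B)
print(mat_mul(I_minus_B, S) == identity(2))   # inverse up to degree N
print(S[0][0])                                # truncated (1, 1)-entry
```

The check holds because $(I - B)(I + B + \cdots + B^N) = I - B^{N+1}$ and every term of $B^{N+1}$ has degree above `N`.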
Proposition 11.5.4 (Schützenberger). The set $k\langle X\rangle_{\mathrm{rat}}$ of all rational series is a
subalgebra of $k\langle\langle X\rangle\rangle$. Moreover, for any $f \in k\langle\langle X\rangle\rangle$ the following conditions are equivalent:
(a) $f$ is rational,
(b) $f = u_1$ is the first component of the solution of a system
      $$u = Bu + b, \tag{11.5.11}$$
(c) $f = u_1$ is the first component of the solution of a system
      $$Fu = b, \tag{11.5.12}$$
      where $F$ is a matrix with invertible constant term and $b$ is a column over $k\langle X\rangle$.
  We remark that (11.5.11) can also be written as
$$(I - B)u = b; \tag{11.5.13}$$
thus it has the form (11.5.12), where $F$ now has constant term $I$ and has no terms of
degree higher than 1.
Proof. (a) $\Rightarrow$ (b). We shall show that the set of all elements satisfying (b) forms a
subalgebra of $k\langle\langle X\rangle\rangle$ containing $k\langle X\rangle$, in which every series of order 0 is
invertible. It then follows that this subalgebra contains $k\langle X\rangle_{\mathrm{rat}}$.
  It is clear that $a \in k \cup X$ is the solution of $u_1 = a$. Given $f$, $g$, suppose that $f = u_1$,
where $u$ is the solution of (11.5.11) and $g = v_1$, where $v$ is the solution of $v = Cv + c$,
where $C$ satisfies the same conditions as $B$ in (11.5.11). We shall rewrite these
equations as $(I - B)u = b$, $(I - C)v = c$. Then $f - g$ is the first component of the
solution of the system
$$\begin{pmatrix} I - B & \begin{pmatrix} e_1 - B_1 & 0 \end{pmatrix} \\ 0 & I - C \end{pmatrix} w = \begin{pmatrix} b \\ c \end{pmatrix}, \tag{11.5.14}$$
where $e_1 = (1, 0, \ldots, 0)^T$ and $B_1$ is the first column of $B$; for (11.5.14) is satisfied by
$w = (u_1 - v_1, u_2, \ldots, u_m, v_1, \ldots, v_n)^T$. In (11.5.14) the matrix on the left is not of
the required form, but it can be brought to the form of a matrix with constant
term $I$ (and no terms of degree higher than 1) by subtracting row $m + 1$ from
row 1 to get rid of the coefficient 1 in the $(1, m + 1)$-position.
   Similarly $fg$ is the first component of the solution of
an equation for $g$, where $(1 - g)(1 + f) = 1$. We may assume that $b = (0, \ldots, 0, 1)^T$
by writing (11.5.11) in the form
and here $n > 1$, because $u_1$ has zero constant term. Let $E_{n1}$ be the matrix with $(n, 1)$-entry
1 and 0 elsewhere. Then we have
                                                                                   (11.5.16)
and we can again bring the matrix on the left to the required form by subtracting the
first row from the last, without affecting the solution. Comparing (11.5.16) and
(11.5.17), we find that
  With every language $L$ on the alphabet $X$ we associate a power series $f_L$, its
characteristic series, defined by
$$(f_L, u) = \begin{cases} 1 & \text{if } u \in L, \\ 0 & \text{if } u \notin L. \end{cases}$$
In this way the language is described by a single element of $k\langle\langle X\rangle\rangle$. Our main object
will be to characterize the regular and context-free languages.
moreover, each word of $L$ has a single derivation. Let us number the variables of the
grammar as $u_1, \ldots, u_n$, where $u_1 = \sigma$, and number the terminal letters as $x_1, \ldots, x_r$.
The rules (11.5.18) may be written as
(11.5.19)
where the summations are over all the right-hand sides of rules with $\alpha$ on the left.
If we express (11.5.19) in terms of the $u$'s and $x$'s, we find
(11.5.20)
   For example, consider the language $\{xy^n\}$, with grammar $\sigma \to \sigma y;\ x$. Its characteristic
series is obtained by solving the equation $u = uy + x$, i.e. $u = x(1 - y)^{-1} = \sum xy^n$.
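
The equation $u = uy + x$ of this example can be solved degree by degree by repeated substitution. The following is a minimal sketch, assuming series are stored as dictionaries from words to coefficients and truncated at a degree `N`; the helper names are hypothetical.

```python
N = 6  # truncation degree, chosen arbitrarily for this sketch


def add(f, g):
    """Sum of two series stored as dicts word -> coefficient."""
    h = dict(f)
    for w, c in g.items():
        h[w] = h.get(w, 0) + c
    return {w: c for w, c in h.items() if c != 0}


def mul(f, g):
    """Product of two series, discarding all words longer than N."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            if len(u) + len(v) <= N:
                h[u + v] = h.get(u + v, 0) + a * b
    return {w: c for w, c in h.items() if c != 0}


x, y = {"x": 1}, {"y": 1}

# Substitute u -> u*y + x repeatedly, starting from 0; each pass fixes
# the terms of one more degree.
u = {}
for _ in range(N):
    u = add(mul(u, y), x)

# Characteristic series of the language {x y^n}, truncated at degree N.
f_L = {"x" + "y" * n: 1 for n in range(N)}
print(u == f_L)
```

Each coefficient of the result is 0 or 1, which reflects the fact that every word of the language has a single derivation in this grammar.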
   Next we consider the problem of characterizing context-free languages. For this
purpose we define another subalgebra of the power series ring. An element $f$ of
$k\langle\langle X\rangle\rangle$ is said to be algebraic if it is of the form $f = \alpha + u_1$, where $\alpha \in k$ and $u_1$ is
the first component of the solution of a system of equations
$$u_i = \varphi_i(u, x), \quad i = 1, \ldots, n, \tag{11.5.21}$$
where $\varphi_i$ is a (non-commutative) polynomial in the $u$'s and $x$'s without constant
term or linear term in the $u$'s. The set of all algebraic elements is denoted by
$k\langle X\rangle_{\mathrm{alg}}$; we shall show that it is a subalgebra of $k\langle\langle X\rangle\rangle$.
Here $\varphi_i^{(\nu)}$ is the sum of all terms of degree $\nu$ in $\varphi_i$. By hypothesis, for any term $u_j^{(\mu)}$
occurring in $\varphi_i^{(\nu)}$ we have $\mu < \nu$, so the components $u_i^{(\nu)}$ are uniquely determined
in terms of the $u_j^{(\mu)}$ with $\mu < \nu$, while $u_i^{(0)} = 0$, again by hypothesis. Thus (11.5.21)
has a unique solution $u_i$ of positive order in the $x$'s.
   If $u_i = \varphi_i(u, x)$ ($i = 1, \ldots, m$), $v_j = \psi_j(v, x)$ ($j = 1, \ldots, n$) are two such systems,
then to show that $u_1 - v_1$, $u_1v_1$, $\sum u_1^r = (1 - u_1)^{-1}$ are algebraic we combine the
above systems of equations for $u_i$, $v_j$ with the equations $w = \varphi_1 - \psi_1$, $w = \varphi_1\psi_1$,
$w = \varphi_1 + u_1w$ respectively. This shows that we have a subalgebra. Moreover, the
elements of order 0 are invertible, so we have a local ring. ∎
that the inclusions are strict is easily seen, by considering the case where X consists of
a single letter.
   We now come to the promised characterization of context-free languages:
where $\varphi_i(u, x)$ is the sum of the right-hand sides of all the rules with $u_i$ on the left.
On solving the system (11.5.23) we obtain for $u_1$ the series $f_L$, hence $f_L$ is algebraic.
  Conversely, assume that $f_L$ is algebraic, given by $u_1$, where $u$ is the solution of
(11.5.23). Then the language $L$ is obtained by applying all the rules $u_i \to w$, where
$w$ runs over all the words in the support of $\varphi_i(u, x)$; hence $L$ is context-free, as we
had to show. ∎
  To give an example, the language $\{x^ny^n\}$ has the characteristic series $\sum x^ny^n$,
which is obtained by solving the equation
$$u = 1 + xuy.$$
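
This equation, too, can be solved by repeated substitution, since the term $xuy$ raises the degree by 2. A minimal sketch under the same assumptions as before (series as dictionaries from words to coefficients, truncation at a hypothetical degree `N`, illustrative helper names):

```python
N = 6  # truncation degree, chosen arbitrarily for this sketch


def add(f, g):
    """Sum of two series stored as dicts word -> coefficient."""
    h = dict(f)
    for w, c in g.items():
        h[w] = h.get(w, 0) + c
    return {w: c for w, c in h.items() if c != 0}


def mul(f, g):
    """Product of two series, discarding all words longer than N."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            if len(u) + len(v) <= N:
                h[u + v] = h.get(u + v, 0) + a * b
    return {w: c for w, c in h.items() if c != 0}


x, y, one = {"x": 1}, {"y": 1}, {"": 1}

# Substitute u -> 1 + x*u*y repeatedly, starting from 0.
u = {}
for _ in range(N):
    u = add(one, mul(mul(x, u), y))

# Characteristic series of {x^n y^n}, truncated at degree N.
f_L = {"x" * n + "y" * n: 1 for n in range(N // 2 + 1)}
print(u == f_L)
```

Unlike the equation $u = uy + x$, this one places the unknown between two letters, so it is not of the linear form (11.5.11); the sketch does not by itself show that the series fails to be rational.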
Exercises
1. Adapt the proof of Theorem 11.5.1 to the case of an infinite alphabet.
2. Verify that the prefix set associated with a right ideal of finite codimension in a
   free algebra is a maximal code. To what extent does the finiteness condition of
   Corollary 11.5.2 apply to general (non-free) k-algebras?
3. Factorize the element $xyzyx + xyz + zyx + xyx + x + z$ of $F = k\langle x, y, z\rangle$ in all
   possible ways. (It can be shown that any two complete factorizations of an
   element of $F$ have the same number of terms, and these terms can be paired off in
   such a way that corresponding terms are 'similar', where $a$, $b$ are similar iff
   $F/aF \cong F/bF$.)
4. Show that in $\mathbb{R}\langle x, y\rangle$ the element $xy^2x + xy + yx + x^2 + 1$ is an atom, but it does
   not remain one under extension of $\mathbb{R}$ to $\mathbb{C}$.
5. Show that every CF-language on one letter is regular.
6. Show that the inclusions in (11.5.22) are strict.
7. Define the Hankel matrix of a power series $f$ as the infinite matrix $H(f)$ indexed
   by $X^*$, whose $(u, v)$-entry is $(f, uv)$. Show that $f$ is rational iff $H(f)$ has finite
   rank.
 9. Show that the generating set of a maximal free submonoid of a free monoid is a
    maximal code.
10. Show that a subset $C$ of $X^*$ is a code iff the submonoid generated by $C$ has the
    characteristic series $(1 - C)^{-1}$.
11. Verify that the power series ring on $X$ may be interpreted as the incidence
    algebra of $X^*$ when the latter is ordered by right divisibility (see BA, Section 5.6).
    (Hint. Define $f_{u,v} = (f, z)$ if $u = zv$ and 0 otherwise.) Hence deduce Proposition
    11.5.3 from the fact that the matrix in the incidence algebra is invertible iff all its
    diagonal elements are invertible (see BA, Proposition 5.6.1).
                       Bibliography
This is primarily a list of books where the topics are pursued further (and which were
often used as sources). A second list contains articles having a bearing on the text as
well as those referred to in the text. References in the text are by name and date, the
latter enclosed in round brackets for books and square brackets for papers.
I. Books
Albert, A.A. (1939), Structure of Algebras, AMS Colloquium Publications 24,
  Providence, RI.
Arbib, M.A. (1969), Theories of Abstract Automata, Prentice-Hall, Englewood Cliffs,
  NJ.
Artin, E. (1957), Geometric Algebra, Interscience, New York.
Artin, E. (1965), Collected Papers, Addison-Wesley, Reading, MA.
Barbilian, D. (1956), Teoria Aritmetica a Idealilor (in inele necomutative), Ed. Acad.
  Rep. Pop. Romine, Bucharest.
Barwise, J. (Ed.) (1977), Handbook of Mathematical Logic, North-Holland,
  Amsterdam.
Bass, H. (1962), The Morita Theorems, Oregon Lectures.
Bass, H. (1968), Algebraic K-theory, Benjamin, New York.
Bell, J.L. and Slomson, A.B. (1971), Models and Ultraproducts: An Introduction,
  North-Holland, Amsterdam.
Berstel, J. (1979), Transductions and Context-free Languages, Teubner, Stuttgart.
Berstel, J. and Perrin, D. (1985), Theory of Codes, Academic Press, New York.
Berstel, J. and Reutenauer, C. (1988), Rational Series and their Languages, Springer,
  Heidelberg.
Bourbaki, N. (1974), Algebra I, Chapters 1-3, Addison-Wesley, Reading, MA.
Bourbaki, N. (1990), Algebra II, Chapters 4-7, Springer, Heidelberg.
Burnside, W. (1911, 1955), Theory of Groups of Finite Order, Dover, New York.
Chevalley, C. (1951), Introduction to the Theory of Algebraic Functions of One
  Variable, AMS Math. Surveys 6, New York.
Cohn, P.M. (1966), Morita Equivalence and Duality, Queen Mary College Math.
  Notes.
Cohn, P.M. (1977), Skew Field Constructions, LMS Lecture Note Series 27,
   Cambridge University Press.
Cohn, P.M. (1981), Universal Algebra, 2nd edn., Reidel, Dordrecht.
Cohn, P.M. (1985), Free Rings and their Relations, 2nd edn., LMS Monographs 19,
   Academic Press, London.
Cohn, P.M. (1991), Algebraic Numbers and Algebraic Functions, Chapman & Hall.
Cohn, P.M. (1994), Elements of Linear Algebra, Chapman & Hall, London.
Cohn, P.M. (1995), Skew Fields, Theory of General Division Rings, Vol. 57, Encyclo-
   pedia of Mathematics and its Applications, Cambridge University Press.
Cohn, P.M. (2000), Introduction to Ring Theory, SUMS, Springer, London.
Conway, J.H. and Sloane, N.J.A. (1988), Sphere Packings, Lattices and Groups,
   Grundlehren d. math. Wiss. 290, Springer, Berlin.
Curtis, C.W. and Reiner, I. (1981), Methods of Representation Theory I, John Wiley
   & Sons, New York.
Davis, M. (1958), Computability and Unsolvability, McGraw-Hill, New York.
Dickson, L.E. (1901, 1958), Linear Groups, with an Exposition of the Galois Field
   Theory, Dover, New York.
Draxl, P. (1983), Skew Fields, LMS Lecture Note Series 83, Cambridge University
   Press.
Eilenberg, S. (1974-78), Automata, Languages and Machines, A-C, Academic Press,
   New York.
Eisenbud, D. (1995), Commutative Algebra, with a View to Algebraic Geometry,
   Springer-Verlag, New York.
Faith, C. (1981), Algebra I: Rings, Modules and Categories, Grundlehren d. math.
   Wiss. 190, Springer, Heidelberg.
Feit, W. (1982), The Representation Theory of Finite Groups, North-Holland,
  Amsterdam.
Goodearl, K.R. and Warfield Jr., R.B. (1989), An Introduction to Non-commutative
  Noetherian Rings, LMS Student Texts 16, Cambridge University Press.
Gorenstein, D. (1982), Finite Simple Groups, Plenum, New York.
Hall Jr., M. (1959), The Theory of Groups, Macmillan, New York.
Hartshorne, R. (1977), Algebraic Geometry, Graduate Texts in Math. 52, Springer,
   Heidelberg.
Herman, G.T. and Rozenberg, G. (1975), Developmental Systems and Languages,
  North-Holland, Amsterdam.
Herstein, I.N. (1968), Noncommutative Ring Theory, Carus Math. Monographs 15,
  Math. Association of America.
Herstein, I.N. (1976), Rings with Involution, Chicago Lectures in Math., Chicago
  University Press.
Hill, R. (1985), Introduction to Coding Theory, Oxford University Press.
Hilton, P.J. and Stammbach, U. (1971), A Course in Homological Algebra, Graduate
  Texts in Math. 4, Springer, Heidelberg.
Huppert, B. (1967), Endliche Gruppen I, Grundlehren d. math. Wiss. 134, Springer,
  Berlin.
Jacobson, N. (1953), Lectures in Abstract Algebra II, Linear Algebra, Van Nostrand,
  New York.
II. Papers
Amitsur, S.A. [1965], Generalized polynomial identities and pivotal polynomials,
  Trans. Amer. Math. Soc. 114, 210-226.
Amitsur, S.A. [1966], Rational identities and applications to algebra and geometry,
  J. Algebra 3, 304-359.
Amitsur, S.A. [1972], On central division algebras, Isr. J. Math. 12, 408-420.
Auslander, M. and Goldman, O. [1960], The Brauer group of a commutative ring,
  Trans. Amer. Math. Soc. 97, 367-409.
Azumaya, G. [1950], Corrections and supplementaries to my paper concerning
  Krull-Schmidt's theorem, Nagoya Math. J. 1, 117-124.
Bass, H. [1960], Finitistic dimension and a generalization of semiprimary rings,
  Trans. Amer. Math. Soc. 95, 466-488.
Bergman, G.M. [1974a], Modules over coproducts of rings, Trans. Amer. Math. Soc.
  200, 1-32.
Bergman, G.M. [1974b], Coproducts and some universal ring constructions, Trans.
  Amer. Math. Soc. 200, 33-88.
Bergman, G.M. [1978], The diamond lemma in ring theory, Adv. in Math. 29, 178-
  218.
Boe, J.M., de Luca, A. and Restivo, A. [1980], Minimal completable sets of words,
  Theor. Comput. Sci. 12, 325-332.
Chase, S.U. [1960], Direct products of modules, Trans. Amer. Math. Soc. 97, 457-
  473.
Chevalley, C. [1955], Sur certains groupes simples, Tohoku Math. J. 7, 14-66.
Cohn, P.M. [1956], Embeddings in semigroups with one-sided division, J. London
  Math. Soc. 31, 169-181.
Cohn, P.M. [1961], Quadratic extensions of skew fields, Proc. London Math. Soc. (3)
  11, 531-556.
Cohn, P.M. [1966], On the structure of the GL2 of a ring, Publ. Math. IHES, 30, 5-
  53.
Cohn, P.M. [1987], Valuations in free fields, in 'Algebra, some current trends', Proc.
  Varna 1986, eds. L.L. Avramov and K.B.Tchakerian, Springer Lecture Notes in
  Math. 1352, 75-87.
Cohn, P.M. [1989], The construction of valuations on skew fields, J. Indian Math.
  Soc. 54, 1-45.
Dieudonné, J. [1943], Les déterminants sur un corps non-commutatif, Bull. Soc.
  Math. France 71, 27-45.
Gelfand, I.M. and Retakh, V. [1997], Quasideterminants I, Sel. math. New ser. 3,
  517-546.
Goldie, A.W. [1958], The structure of prime rings under ascending chain conditions,
  Proc. London Math. Soc. (3) 8, 589-608.
Hall, P. [1928], A note on soluble groups, J. London Math. Soc. 3, 89-105.
Hall, P. [1937], A characteristic property of soluble groups, J. London Math. Soc. 12,
  198-200.
Henkin, L. [1960], On mathematical induction, Amer. Math. Monthly 67, 323-338.
The number indicates the page where the term is first used or defined. Terms used
only or mainly in one location are not included. When no page number is given, the
term is defined in BA.
Number systems
N                   the natural numbers, 24
$N_0$               the natural numbers with 0
Z/m                 the numbers mod m
Z                   the integers, 2
Q                   the rational numbers
R                   the real numbers
C                   the complex numbers
Set theory
$\emptyset$         the empty set
$|X|$               the cardinal of the set X
$\mathcal{P}(X)$    the power set (set of all subsets) of X, 6
$X \setminus Y$     the complement of Y in X, xi
$Y^X$ or Map(X, Y)  the set of all mappings from X to Y, 2
Map(X)              set of all mappings of X into itself, 4
ker f               kernel of a correspondence f, 5
$\aleph_0$          aleph-null, the cardinal of N
Number theory
max(a, b)           the larger of a and b
min (a, b)          the smaller of a and b
$a \mid b$          a divides b
Group theory
$\mathrm{Sym}_n$ or $S_n$    symmetric group of degree n
$\mathrm{Alt}_n$ or $A_n$    alternating group of degree n
sgn a                        sign of the permutation a
$\mathrm{St}_p$              stabilizer of the point p, 119
$C_n$                        cyclic group of order n
$D_m$                        dihedral group of order 2m, 93
(G : H)                      index of H in G
$(x, y) = x^{-1}y^{-1}xy$    commutator of x and y
G'                           commutator subgroup (derived group) of G
Aut(G)                       automorphism group of G
Inn(G)                       group of inner automorphisms of G
$N \triangleleft G$          N is a normal subgroup of G
$A \rtimes G$                semidirect product, 93
$N_G(H)$                     normalizer of H in G
$GL_n(R)$                    general linear group over a ring R, 116
$E_n(R)$                     subgroup generated by elementary matrices
$U_n(C)$                     unitary group over C, 242
$Sp_{2m}(K)$                 symplectic group over K, 121
$O_n(K)$, $SO_n(K)$          orthogonal group over K, 126
$K_1(A)$, $SK_1(A)$          Whitehead group of A, 196
Field theory
[V: k]                 dimension of V over k
$N_{E/F}(x)$           norm of x from E to F
$T_{E/F}(x)$           trace of x from E to F
d(f)                   degree of polynomial f, 275
~(R)                   field of fractions of an integral domain R
$F_q$                  field of q elements
$Q_p$                  field of p-adic numbers
H                      Hamilton quaternions, 202
Category theory
Ob($\mathcal{A}$)      class of all $\mathcal{A}$-objects
$\mathcal{A}(X, Y)$    set of all maps from X to Y
Ens                    category of sets
Gp            category of groups
Ab            category of abelian groups
Rg            category of rings
$\mathrm{Mod}_R$         category of right R-modules
${}_S\mathrm{Mod}_R$     category of (S,R)-bimodules
AIlB          product of A and B, 34
AUB           coproduct of A and B, 35
AIJB          biproduct of A and B, 35
$\varinjlim(G)$              direct limit, 46
$\varprojlim(G)$             inverse limit, 46
$H_n(G, A)$                  homology group, 96
$H^n(G, A)$                  cohomology group, 96
$\chi(M)$                    Euler characteristic, 72
$\mathrm{Ext}^n(A, B)$       Ext functor, 73
$\mathrm{Tor}_n(A, B)$       Torsion functor, 75
                       Author index
Hall, Philip (1904-82) 11, 102ff., 105f.
Hamilton, William Rowan (1805-65) 200f.
Hamming, Richard Wesley (1915-98) 373, 375, 380
Hankel, Hermann (1839-73) 428
Hasse, Helmut (1898-1980) 194, 204
Hattori, Akira 159
Henkin, Leon (1921-) 1
Herstein, Israel Nathan (1923-88) 191, 306, 333, 346
Higgins, Philip John (1926-) 308
Higman, Graham (1917-) 308, 333
Hilbert, David (1862-1943) 85, 218
Hirata, Kazuhiko 160
Hochschild, Gerhard P. (1916-) 169
Hocquenghem, Alexis 388
Hölder, Otto (1859-1937) 101
Hopf, Heinz (1894-1971) 115
Hotzel, Eckehart 328
Hua Loo-Keng (1910-85) 344

Iwasawa, Kenkichi 118, 126

Jacobson, Nathan (1910-99) 135, 276, 309, 324, 328, 342
Jategaonkar, Arun Vinayak 281, 318

Kaplansky, Irving (1917-) 303f., 320, 342
Kasch, Friedrich (1923-) 289
Kemer, A. R. 307
Kharchenko, Vladislav Kirillovich 218
Koshevoi, E. G. 281
Köthe, Gottfried (1905-89) 191, 331f.
Kraft, L.G. 416
Krull, Wolfgang (1899-1971) 138f.
Kupferoth, Achim 187

Lambek, Joachim (1922-) 63
Landweber, Peter S. 410
Latyshev, Victor Nicolaevich 296, 298
Laurent, Pierre Alphonse (1813-54) 279
Leech, John (1926-92) 392
Leibniz, Gottfried Wilhelm von (1646-1716) 346
Levitzki, Jacob (1904-56) 293, 332

Mackey, George W. (1916-) 256f.
MacWilliams, Florence J. (1917-90) 382
Magnus, Wilhelm (1907-90) 115
Malcev, Anatoli Ivanovich (1909-67) 177
Martindale, Wallace S. III 306
Maschke, Heinrich (1853-1908) 226ff., 242
Matlis, Eben 164
McMillan, Brockway 415f.
Merkuryev, Alexander S. 217
Molien, Theodor E. (1861-1941) 241, 393
Morita, Kiiti (1915-95) 148, 156, 159

Nagata, Masayochi (1927-) 308, 333
Nakayama, Tadasi (1912-64) 141, 181
Neumann, John von (1903-57) 163, 247
Newman, Maxwell Herman Alexander (1897-1984) 19
Nielsen, Jakob (1890-1959) 133
Noether, Amalie Emmy (1882-1935) 183f., 266

Ore, Oystein (1899-1968) 266ff., 276, 307

Papp, Zoltan 164
Patterson, C.W. 412
Peano, Giuseppe (1858-1932) 24
Platonov, Victor Pavlovich (1939-) 197
Plotkin, M. (1922-) 381
Posner, Edward C. (1933-93) 334f.
Procesi, Claudio (1941-) 302, 336

Quillen, Daniel Grey (1940-) 85

Ramanujan, Srinivasa (1887-1920) 390
Ray-Chaudhuri, Dijendra K. 388
Razmyslov, Yuri Pavlovich (1951-) 301ff.
Rees, David (1918-) 326ff.
Regev, Amitai 296f.
Remak, Robert (1888-1942?) 139
Restivo, Antonio 418
Retakh, Vladimir S. 353
Roganov, Yu. V. 83
Rosset, Shmuel 293
Rowen, Louis Halle 335