John M. Chambers
PROGRAMMING
WITH DATA
A Guide to the S Language
Springer
Preface

S is a programming language and environment for all kinds of computing involving data. It has a simple goal: to turn ideas into software, quickly and faithfully.

The first three chapters of this book survey programming with S. Chapter 1 presents highlights of the main techniques, with examples; Chapter 2 presents the essential concepts underlying S programming; and Chapter 3 provides a quick reference to many of the tools and techniques.
The remaining chapters get into more detail on various aspects of programming in S. Chapters 4 and 5 discuss computations and the objects in S, describing what's already available to use as building blocks in programming. Starting with Chapter 6, we discuss programming itself, first in general and then in terms of mechanisms for dealing with classes (Chapter 7), methods (Chapter 8), documentation (Chapter 9), connections (Chapter 10), interfaces (Chapter 11), and other tools. Two appendices deal with more specialized topics: programming in C with S, and compatibility with earlier versions of S.
Besides the book itself, there is also a large amount of online documentation. You should get quickly into the habit of typing ? followed by a topic name, or using the help facility in a graphical interface, whenever you want to know more about a function or some other topic. The web site
http://cm.bell-labs.com/stat/Sbook
acts as an extension to the book. Look there for tools related to S, developments more recent than the printed version of the book, and pointers to other sources of information.
This book does not assume you have used S before, but it does assume you are interested in programming, in creating new software with S. If you have never used S before, I would recommend familiarizing yourself with the basics of computing with S in an interactive, non-programming mode. If you've purchased a copy of S-Plus, it will include documentation on the use of the system. You can learn the basics from the S-Plus manuals, plus the online documentation facilities. There are a number of books on S and S-Plus, some listed in the references (including books by Krause and Olson [6], Spector [8], and Venables and Ripley [10]). Some time spent with this material would make good preparation for the programming described in this book, if you can be patient enough to postpone those first programming steps for a while.
S-Plus has many sets of functions and optional extensions for various applications, beyond what is described in this book. To find out more about S-Plus, look at the web site, currently
http://www.mathsoft.com/splus.html
In particular, if your system does not seem to have a copy of S or S-Plus, this is where to find out more. The version of S described in this book underlies S-Plus versions 5.0 and higher. The S-Plus libraries may extend or redefine some of the functions described in this book, as well as providing many other
facilities. Make use of the online documentation to check, whenever you're in doubt.
There is a wide variety of libraries and functions developed by users, datasets of general interest, and other useful adjuncts to programming in S. The StatLib archive is the most comprehensive, as well as having many links to other sources of software: look at the web site
http://lib.stat.cmu.edu/S/
This book discusses what underlies and is common to all this software: the S language and programming environment. If you have learned to program in earlier versions of S or S-Plus, you can use what you already know. You will find many extensions and new features, described throughout this book. These include new approaches to classes and methods, to documentation, to handling large objects, and to scheduling events.
There is a great deal of detailed information to dig into. Through it all,
though, keep foremost in mind that fundamental goal: to turn ideas into
software.
About the Cover
The images on the cover of this book were produced by the S-Wafers software, written by Mark Hansen and David James, and described in reference [4]. S-Wafers is a powerful and successful system for visualizing and modeling wafer test data (data from testing electronic chips and other devices). This application, used in Chapter 1 of the book, illustrates the style and power of programming with data. The front cover shows (top to bottom): the actual wafer, a plot of test results, and a mathematical model of spatial effects. The three images symbolize the path we often want to follow in programming with data. Starting from the original application and our ideas about it, programming with data enables us to organize and visualize the data in a direct way. Often, and certainly in this application, the ability to summarize and display the data conveniently proves to be of great value in practice. As our understanding grows, new and perhaps deeper ideas will be implemented, using quantitative models and other advanced techniques. Growing understanding leads to new questions and new challenges to computing with the data. The back cover of the book shows results from an experiment that varied two parameters in the manufacturing; the display shows a two-way table of the results, but a table whose entries are not numbers but the results on whole wafers, condensing many numbers into each visual symbol.

Acknowledgments
It would be impossible to cite more than a small fraction of the contributions to S over the years. Starting with groups of people, the first debt is to colleagues past and present at Bell Labs, not only for so many specific ideas, but even more for a continually stimulating environment, probably never more so than now. At perhaps the opposite scale, from local to worldwide, the community of S users are the real owners of the language, and they have shaped it from the start by choosing how to use it. There is no substitute for a demanding user community. Books and other writing about S, in general and for particular applications, have helped shape the language; special mention here for the contributions of Bill Venables and Brian Ripley.

Many colleagues contributed to earlier versions: Rick Becker, Allan Wilks, and Trevor Hastie all own important parts of S. In the evolution of the current version of S, the community of beta test users both within Lucent Technologies and outside have been of great help. The continuing productive relationship with MathSoft has contributed much to the joint growth of S
and S-Plus.

The contributions from my colleagues at Bell Labs have been essential for the book and for S: Duncan Temple Lang's ideas have renewed the research and pointed to the future; Mark Hansen and David James, with S-Wafers, have provided a paradigm for S software, and have each added many important insights as well. The current approach to documentation needs special mention: Duncan Temple Lang's ideas and software made possible the SGML-based documentation, one of the most exciting recent directions in S; David James also provided essential contributions to this work. Major contributions as well have come from: Stephen Pope, an ideal beta tester; Kishore Singhal, with many years of pioneering use of S; Rafal Kustra, for work on the internals of S; Bill Cleveland and Stu Blank, for making the practical side of S a success. The editorial partnership with John Kimmel, over many years and many books, continues to be a pleasure. Valuable suggestions and comments came from Doug Bates, Linda Clark, Lorraine Denby, Bill Dunlap, Steve Gotowich, Richard Heiberger, Sylvia Isler, Diane Lambert, Clive Loader, Maria Peman, José Pinheiro, Don Sun, Terry Therneau, Luke Tierney, and Scott Vander Wiel.
John Chambers
Bell Labs, April 1998
Contents
1 Highlights
1.1 Computing with S
1.2 Getting Started
1.3 Using S Functions
1.3.1 Arguments to Functions
1.3.2 Arithmetic and Other Operators
1.4 Data, Objects, and Databases
1.4.1 Objects and Classes
1.4.2 Assignments to Databases
1.4.3 Computing with Databases
1.4.4 Getting Data into S; Connections
1.5 Writing S Functions
1.5.1 Creating a Function Object
1.5.2 Turning Tasks into a Function
1.5.3 Editing a Function
1.5.4 Debugging and Testing Functions
1.5.5 Documenting S Functions
1.6 Defining Classes and Methods
1.6.1 Defining Methods
1.6.2 Defining a New Class
1.7 An Extended Example
1.7.1 Creating Objects from the Class
1.7.2 Displaying Objects from the Class
1.7.3 Manipulations and Modeling
2 Concepts
2.1 S and Other Languages
2.2 Communicating with S
2.3 Data: Objects
2.4 The Language
2.4.1 S as a Functional Language
2.4.2 S Expressions as Objects
2.5 Databases and Chapters
2.5.1 The S Model for Databases
252, Deming Ole Staring Capers
2.6 Functions
2.6.1 Function Objects
2.6.2 Defining New Functions
2.7 Methods
2.8 Classes of Objects
2.9 Interfaces
2.9.1 Connections
2.9.2 Interfaces to the Shell
2.9.3 Interfaces to Subroutines
3 Quick Reference
3.1 The S Session
3.2 The S Language
3.3 Computing with S
3.4 Databases
3.5 Programming
3.6 Classes and Methods
3.7 Documentation
3.8 Connections; Reading and Writing; Events
3.9 Interfaces
4 Computations in S
4.1 The S Session
4.1.1 Customizing the Session
4.1.2 Selecting the Working Data
4.1.3 Keeping Track of the Session
4.1.4 Quitting from the Session
4.1.5 Interactive or Not?
4.2 The Language
4.2.1 Syntax
4.2.2 The Language as Objects
4.2.3 The S Evaluator
4.2.4 Control of Computations
4.2.5 Assignment Expressions
4.3 Numeric Computations
4.3.1 Operations on Vectors and Structures
4.3.2 Different Classes of Numeric Data
4.4 Testing and Matching
4.4.1 Comparisons; Tests of Equality
4.4.2 Matching; Hash Tables
4.4.3 Regular Expressions
4.4.4 Partial Matching
4.5 Extracting and Replacing Data
4.5.1 Extracting and Replacing Subsets
4.5.2 General Replacement Expressions
4.6 Graphics
4.7 Models and Advanced Numerical Methods
4.7.1 Matrix Computations
4.7.2 Numerical Linear Algebra
4.7.3 Model-Fitting Functions
4.8 Efficiency in Large Computations
4.8.1 The Whole-Object View
4.8.2 Techniques and Tools for Large Computations
4.8.3 Iteration and the Apply Functions
4.9 The S Evaluation Model
4.9.1 The Evaluator as an S Function
4.9.2 Evaluating Function Calls
4.9.3 Argument Matching
4.9.4 Argument Evaluation
4.9.5 Arbitrarily Many Arguments
4.9.6 Method Selection
4.9.7 Data Sharing
5 Objects, Databases, and Chapters
5.1 Some Important Classes of Objects
5.1.1 Vector Classes: Fundamental S Data
5.1.2 Internal Representation of Vector Classes
5.1.3 Character String Data
5.1.4 Substrings and String Manipulation
5.1.5 Lists, Trees, Recursive Objects
5.1.6 S Structures
5.1.7 Raw Data
5.2 Databases
5.2.1 Finding and Assigning Objects
5.2.2 The Search List
5.2.3 Properties of Attached Databases
5.2.4 The Objects in a Database
5.3 Attaching and Detaching Databases
5.3.1 Attaching Chapters
5.3.2 Optional Argument for Attaching Databases
5.3.3 Actions on Attaching and Detaching Databases
5.3.4 Attaching Objects
5.3.5 Attaching Databases Report
5.4 Chapters
5.4.1 Creating a Chapter
5.4.2 Dumping, Moving, and Rebooting a Chapter
5.5 Dumping and Restoring Objects
5.5.1 Deparsing and Dumping for Editing
5.5.2 The Symbolic Dump Format
5.5.3 Using the Symbolic Dump Format
6 Creating Functions
6.1 S Functions and Expressions
6.1.1 Creating and Editing Functions
6.1.2 Dealing with Optional Arguments
6.1.3 Programming with Arbitrarily Many Arguments
6.1.4 Writing Replacement Functions
6.2 Organizing Your S Software
6.2.1 The Programming Cycle
6.2.2 Organizing the Chapter
6.2.3 Tools for Testing
6.3 Debugging and Error-Handling
6.3.1 Browsing in the Evaluator
6.3.2 Tracing and Interactive Browsing
6.3.3 The Error Option
6.3.4 Additional Control of Errors and Interrupts
6.4 Programming the User Interface
6.4.1 Functions for Parsing and Evaluating
6.4.2 Generating Messages and Errors
7.1.2 Representations and Extensions
7.1.3 Prototypes; New Objects
7.1.4 Computations with Slots
7.1.5 Virtual Classes
7.1.6 Validity-Checking Methods
7.1.7 Structures and Structure-like Classes
7.1.8 Classes with Fixed Definitions
7.2 Relations Between Classes
7.2.1 Specifying is Relations
7.2.2 Coercing: as Relations
7.3 Generating Objects from a Class
7.4 Updating Classes; Version Management
7.5 New Vector Classes
7.6 Metadata for Classes
8 Creating Methods
8.1 Basic Techniques
8.1.1 Method Specification
8.1.2 Editing Methods
8.1.3 Examining Methods
8.1.4 Removing Methods
8.1.5 Tracing Methods
8.2 Methods for Some Important Functions
8.2.1 Scanning a File
8.2.2 Printing
8.2.3 Plotting
8.2.4 Dumping Data for Editing
8.2.5 Extracting and Replacing Subsets
8.2.6 Mathematical and Summary Functions
8.2.7 Arithmetic and Other Operators
8.3 Generic Functions
8.3.1 Generic Functions as Objects
8.3.2 Specifying the Generic Function
8.3.3 Group Generic Functions
9 Documentation
9.1 Viewing Online Documentation
9.2 Self-Documentation
9.3 Editing Documentation
9.4 Documenting Classes
9.5 Documentation Objects
10 Connections
10.1 Reading and Writing with Connections
10.2 Connection Classes
10.3 Opening and Closing Connections
10.4 Standard Input and Output Connections
10.5 Manipulating Connections
10.5.1 Pushing Data Back onto a Connection
10.5.2 Connection Modes; File Positions
10.5.3 Blocking and Non-Blocking Connections
10.5.4 Raw (Binary) Data on Connections
10.6 Connections and Events
10.6.1 Reader Connections
10.6.2 Monitors; Timeout Events
10.6.3 Choosing a Task
11 Interfaces to C and Fortran
11.1 The S Chapter
11.1.1 Initializing the Chapter
11.1.2 Attaching and Detaching the Chapter
11.1.3 Moving the Chapter
11.2 Interface Functions
11.2.1 The Interface to C
11.2.2 The Interface to Fortran
11.3 Classes; Copying
11.4 Dealing with NA's in C
11.5 Raw Data in C
A Programming in C
A.1 The .Call Interface
A.2 C Routines Returning S Objects
A.3 S Objects from Basic S Classes
A.4 Protecting S Objects in C
A.5 C Evaluation Utilities for S Objects
A.6 S Classes in C
A.7 Handling Errors in C
B Compatibility with Older Versions
B.1 Converting Old Libraries and Databases
B.2 Classes
B.3 Modernizing Old Data
B.4 Old-Style Interface to C and Fortran
B.5 Old-Style Documentation
Chapter 1
Highlights
This chapter introduces the S language and programming environment by presenting some of the most common and useful features. The chapter begins and ends with an example, presented in some detail. The sections in between introduce expressions, data, functions, classes and methods: the essential ingredients of programming with data in S. You can read the chapter straight through, but since it is fairly long, you might prefer to skip the example by starting with section 1.2 on page 6. On the other hand, you can follow the example alone by reading section 1.1 and then section 1.7.
1.1 Computing with S
S specializes in computing with data: any application with interesting requirements for organizing, analyzing, or presenting data is a candidate. Much useful computation can be done with data using graphical or menu interfaces or other non-programming approaches; S can be used in this way and often is. The current S-Plus system provides a graphical interface to the S language. The user can select an object (a dataset) of interest and then select one or more tools to display the object, summarize it, or perform more elaborate computations such as fitting some form of model.
This book is about programming with data, meaning that we want to extend the tools available in some way, to program the system to implement some ideas we have. This is where the S language comes in: S aims to make the transition into programming easy, but to allow you to do as serious a job of programming as the application requires or as your time permits. This transition is easier because the graphical interface really is just an interface: behind it is the S language and programming environment. To start programming, we just examine the expressions in the language that correspond to typical tools, and go on to modify these.
Because our focus is on programming, we present the tools in their language form, which also allows the discussion to be independent of different possible user interfaces. Every application will use S and the tools implemented in it differently, of course. However, what S does for you is likely to be similar. You can expect to use S in three main ways. It provides organization for your data, particularly by providing useful classes of objects and techniques for storing the objects in S databases. The existing S tools provide a wide range of visualization, computation, and modeling; in many applications these will include software specialized to the data you will encounter. Finally, S provides programming: a language and environment to turn ideas into new tools.
To make our discussion more concrete, let's introduce an actual application. The application is to data produced from modern manufacturing, specifically the manufacture of integrated circuits. All the processor and memory chips in the computer you are using, as well as countless similar devices hidden in the telephone system, automobiles, household appliances and elsewhere, represent a remarkable evolution of design and manufacture. Great reductions in size and power consumption of electronic devices, with equally significant increases in speed and capacity, require a manufacturing process of correspondingly increased complexity and precision.
Electronic devices (chips) are manufactured on wafers; as the name suggests, these are flat discs, though distinctly inedible. On each disc, a number of devices are manufactured simultaneously, anywhere from a few devices to hundreds. Manufacturing here means many steps of depositing and etching layers of semiconductor material. The end result looks like Figure 1.1: rectangular chips laid out on the wafer.
To monitor and guide the manufacturing process, large quantities of data are generated during the many stages of manufacture. At the end of the process, the devices are subjected to a variety of tests to measure their performance before they can be shipped to users. Techniques for computing with such data, for visualizing, summarizing, and modeling the data, have scored some notable successes in improving our understanding of the underlying process. Software programmed in S has played an important role; we will use some of the ideas incorporated in that software to illustrate programming with data in S. In section 1.7 we will construct some actual software for this application. Here, we're considering the general ideas and how they relate to programming with data.

Figure 1.1: What your computer's components look like at birth: a wafer containing devices (e.g., memory or processors). These are the larger light rectangles arranged in rows and columns. Smaller rectangles are other, special devices.
Let's start with one class of data. When the wafer comes off its "assembly line", the manufacturer needs to test whether the chips perform up to specifications. This is the data we want to use as an example. A complex automated machine, called the probe tester, attaches itself to the terminals on the chips and performs a programmed sequence of tests. In the specific kind of data we're considering, the result of the tests is a single character, representing the state of the chip. There will be one or more ok states and usually several failure states, indicating the stage at which the device failed. The ok devices will go on to be removed from the wafer and shipped (or at least put through further tests); the failed chips will be discarded.
Needless to say, the test results get lots of attention from the people responsible for manufacturing the devices. The overall yield, the fraction of ok devices, is a critical quantity in measuring how well the manufacturing process is working. A long-running or relatively simple process is likely to produce high yields and (depending on your viewpoint) boring test data. But the constant pressure to improve and modify the devices means that new designs and manufacturing techniques are continually being introduced, and understanding the test data in these situations is critical. Remember that the process is extremely complex, and the data is often voluminous. Each wafer can have many devices; wafers are processed in batches, or lots, of as many as fifty wafers, and large manufacturing lines will be processing many lots simultaneously.
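The yield is simple arithmetic on the test results. Here is a minimal sketch in S syntax (it should also run in R, the open-source implementation of S), assuming hypothetical result codes: "0" marks an ok device and letters mark failure modes. Real probe testers use their own codes.

```r
# Hypothetical probe-test results for one wafer: one character per device.
# Assumption: "0" means the device is ok; letters are failure modes.
results = c("0", "0", "A", "0", "B", "0", "0", "A", "0", "0")

# A comparison gives a logical vector; its mean is the fraction of TRUE values.
ok = results == "0"
yield = mean(ok)
yield

# How often each failure mode occurs among the failed devices.
table(results[!ok])
```

Because the comparison produces a whole logical vector at once, the same two lines work unchanged for a wafer with hundreds of devices.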
Figure 1.2: Graphical display of the probe-test data. The various colors (or grey levels here) indicate the different failure modes.

One of the first, and perhaps the most important, of the contributions of computing with such data was just to provide a plot, such as Figure 1.2, in which the test result on each device is color coded. The importance of this is that the plot can present an enormous amount of information compactly, in a form that the viewer can understand easily. Many wafers can be presented simultaneously, perhaps arranged in terms of other information (e.g., Figure 1.3). There may be some 10,000 test results in such a figure, but the eye has no trouble detecting patterns in the results.
The software implementing such plots allows the user to select interactively a variety of viewing modes: to view all failure modes or only one, and to view individual wafers or a composite representing the whole lot. With a web browser, you can look at this kind of display yourself, on our web site
http://cm.bell-labs.com/stat/project/tenamit
So where did programming with data, and in particular with S, come into this? S was used to organize the test data into objects representing the various data collected for wafers as they were manufactured. An S object named probe49998, for example, might contain the probe test data for the lot tagged as 49998. The class of probe49998 would automatically identify what kind of data it contains, and the object would contain the information necessary to make the plots shown above. The plots themselves exist as tools in the form of S functions; more specifically, as methods for the S function plot. A plot like Figure 1.2 for each wafer could be produced by the S expression

    plot(probe49998)

Figure 1.3: Probe-test results for wafers grouped by condition (panels labeled short, medium, and long). Each plot represents the "response" as measured by probe testing.
The plotting tools (and many others) were implemented by Mark Hansen and David James as a package called S-Wafers. People at Lucent Technologies needing the tools could use them directly or through a graphical user interface. A paper by Hansen and James (reference [4]) describes S-Wafers; in section 1.7 we will create similar tools, to show how programming with data can build up such software. In your application, the specific class of data and the functions and methods needed to deal with the data will differ, but the style and approach described in this example usually carry over.
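The idea of a plot method attached to a class of test data can be sketched with the setClass and setMethod functions, as provided by R's methods package. The class name waferTest, its single slot results, and the grey-level display are hypothetical choices for illustration; S-Wafers itself is far more elaborate.

```r
library(methods)

# Hypothetical class: one wafer's probe-test results, one character code
# per device, laid out as a matrix of rows and columns on the wafer.
setClass("waferTest", representation(results = "matrix"))

# A plot method for the class: draw each device in a grey level that
# depends on its result code.
setMethod("plot", "waferTest", function(x, y, ...) {
    codes = factor(x@results)                  # one level per distinct code
    z = matrix(as.integer(codes), nrow(x@results))
    image(z, col = grey.colors(nlevels(codes)), axes = FALSE, ...)
    invisible(x)
})

# A tiny made-up wafer with 2 rows and 3 columns of devices.
wafer = new("waferTest",
            results = matrix(c("0", "0", "A", "0", "B", "0"), nrow = 2))
```

With these definitions, plot(wafer) dispatches to the waferTest method and draws the grid of devices, in the same way that the plot call in the text selects a method from the class of its argument.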
To get a general introduction to programming with S before getting into the example, continue to section 1.2 on the next page. If you want to go on with the example immediately, jump ahead to page 44.
1.2 Getting Started
If you have never used S or S-Plus before, here are a few steps that should get you going. You should pick a directory in which you are going to play with the language. Since S creates files and directories to store objects, life will likely be simpler if you create a directory that is just for use with S, at least to start with. The recommended approach is to create a subdirectory, S, in your login directory. In a shell, execute the commands:

    mkdir $HOME/S
    cd $HOME/S
    Splus CHAPTER

The last command sets the directory up for use with S. Now you can just type to the shell the command that invokes the version of S you are using. With S-Plus, for example, you would type

    Splus
You can also run S through an interface in emacs. Other graphical user interfaces exist and more are being developed all the time. Use one that you find convenient or that works well in your environment. The descriptions in this book should apply to almost any interface; there may be additional tools in a particular interface as well.
Once the session starts, S expects any sort of legal expression as a task: the S evaluator will parse and evaluate what you typed and, usually, print or plot some information in response. The S language should look familiar if you have used languages in the style of C, C++, or Java. The following are legal S expressions:

    x[i] = -y
    if(any(data < 0)) data = exp(data)
    while(!converged)
        { model = refit(model); converged = state(model) }
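The expressions above call functions (refit, state) that stand for whatever a real application would provide. The following self-contained sketch exercises the same constructs, if, while, and a braced sequence of expressions, with a toy iteration: Newton's method for the square root of 2 takes the place of refitting a model. The variable names are illustrative only.

```r
data = c(-1.5, 0.2, 2)

# if(): transform the data only when some value is negative.
if(any(data < 0)) data = exp(data)

# while(): repeat a step until a convergence test succeeds.
# Newton's method for sqrt(2) stands in for refitting a model.
estimate = 1
converged = FALSE
while(!converged) {
    new.estimate = (estimate + 2/estimate)/2
    converged = abs(new.estimate - estimate) < 1e-10
    estimate = new.estimate
}
estimate
```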
The rest of this chapter provides a start on programming with S expressions and functions.
1.3 Using S Functions
S is a functional language. S expressions contain function calls, such as

    summary(budget)

This expression gives S a task, to call the function named summary. S will construct a call to that function, giving it as an argument the name budget. An S function call returns an S object, the value of the call. Usually that's all it does: function calls are evaluated to get the value. When the function call itself makes up the user's whole task, the evaluation of the call completes the task. For the standard user interfaces, S then automatically shows the value of the call.
Time for an example. For the expression summary(budget), suppose we have an object budget in our database, containing some typical information from household accounts: checks paid out with their check numbers, plus deposits and other withdrawals. When the S evaluator encounters the name of an object, it takes that as a shorthand request to get the object from the database. And if nothing else is done with the result, S will show the object to the user, by printing or plotting it. So about the simplest useful S expression is just the name of an object, say budget.
    > budget
       Transaction    Category  Amount Date
    1         1439 Credit Card  -50.50    1
    2         1440     Service  -22.49    1
    3         1441     Service  -13.42    1
    4           NA  Withdrawal -100.00    2
    5           NA     Deposit  652.90    2
    6         1442   Telephone   65.19    2
    7         1443     Service  -18.00    2
    8         1444        Misc  -45.00    9
    9           NA    Interest   33.50    9
    10        1445         Tax -269.00   14
(In our examples, the lines typed by the user begin with the prompt "> ". Other lines are output, printed by S.)

Suppose we want to examine this data. We will use the functions summary and plot.
    > summary(budget)
Franeaction Category Anount Deve
in, 1439 Service :3 Min win, 1 1.0
st Qu.si440 Mithdraval:1 dat Qu. et Ques 1.0
Nedsan 1482 Telephone +1 Median : 20.26 Modian : 2.0
Mean 1442 Tax Ao Mean; 42.37 Moan 34
Ged Quesiéés Misc |: Std Qa.: 99.80 Sed Gai: 3.0
nat. sg85 Tatereot 1 Max. + 652.60 Max. 146.0
Mira: 3 (Other)
    > plot(budget$Date, budget$Amount)
The call to summary in the first expression returns a table that has the same column names as budget, with each column summarizing the corresponding data in budget. We didn't do anything with the result, so it was automatically printed. The second expression produces a scatter plot (not shown here) of the Date and Amount components of the data.
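To try summary on something like budget, here is a sketch that types in the first five rows of the listing as a data frame. It is written in S syntax and should also run in R, where the data.frame function builds the object; the printed layout of the summary will differ somewhat from the S output shown above.

```r
# The first five rows of the budget listing, typed in by hand.
budget = data.frame(
    Transaction = c(1439, 1440, 1441, NA, NA),
    Category = c("Credit Card", "Service", "Service", "Withdrawal", "Deposit"),
    Amount = c(-50.50, -22.49, -13.42, -100.00, 652.90),
    Date = c(1, 1, 1, 2, 2)
)

# Typing the name alone prints the object.
budget

# summary() returns an object; printing it shows one column per column of budget.
s = summary(budget)
print(s)

# The summary of a single numeric column is an ordinary named numeric vector.
amount.summary = summary(budget$Amount)
amount.summary["Max."]

# A scatter plot of Amount against Date (opens a graphics device):
# plot(budget$Date, budget$Amount)
```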
1.3.1 Arguments to Functions
Calls to functions can have any number of arguments, and the arguments can be any S expression.

    today = date()
    fit2 = update(fit, new[date == today, ])
Each function is defined with some set of formal arguments. The arguments supplied in the call are matched to those in the definition. In the call to update, fit would be matched to the first argument and the expression new[date == today, ] to the second.

Not all the arguments in the definition need to be supplied in the call. The function can detect missing arguments. There may be a default expression included in the definition, or the function may do something else, or it may be that for this particular call the argument wasn't needed anyway. The user and the programmer have essentially complete flexibility in treating arguments. Trailing arguments can simply be left out; for example, update might have more than two arguments, but in this call they are all missing.
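The ways a function can treat an omitted argument are easy to see in a small example. The function adjust and its arguments are hypothetical: scale.by has a default expression in the definition, while offset has none, and the body tests for it with missing().

```r
# Hypothetical function with one required and two optional arguments.
adjust = function(x, scale.by = 1, offset) {
    # The default expression for scale.by is used when the caller omits it.
    result = x * scale.by
    # offset has no default; do something extra only if it was supplied.
    if(!missing(offset))
        result = result + offset
    result
}

adjust(c(1, 2, 3))                 # trailing arguments left out
adjust(c(1, 2, 3), 10)             # second argument matched by position
adjust(c(1, 2, 3), offset = -1)    # optional argument given by name
```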
Optional arguments can conveniently be included by name: in the call, we set the name of the argument to an S expression. For example:

    history("budget", max = 2, function = "bshow")
If you don't remember the names of the formal arguments, the online documentation function args tells you:

    > args(history)
    history(pattern=".", max=10, evaluate, call, menu, graphics=TRUE,
        where=audit.file, editor, function, file)
The documentation also gives the default values for arguments that have them. In this example, the formal arguments pattern, max, and function match the expressions given, and all other arguments are missing.
Named arguments are particularly helpful for functions that have many optional arguments used to fine-tune the way the function works: parameters for numeric computations, options for how the function should use data, what to do in special circumstances, and the like. As you can see, history is one of those functions. It's a common tendency for software in any language to accumulate these little features, even though the practice is sometimes deplored. S makes the habit a bit more bearable by allowing the user to omit irrelevant arguments and by allowing those arguments supplied to be given by name. S will complete argument names, so long as you give enough of the name to make it unique (and special user interfaces for S often provide additional help); in this example, two characters would have been enough for each argument.
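Partial matching of argument names can be seen with any function. The function report below is hypothetical; its third argument is called function.name rather than function because function is a reserved word in R, which would reject a call naming an argument function. Any unique prefix of a formal argument name is accepted.

```r
# Hypothetical function whose formal arguments have distinct two-letter prefixes.
report = function(pattern = ".", max = 10, function.name = "") {
    paste(pattern, max, function.name)
}

# Full argument names.
report(pattern = "budget", max = 2, function.name = "bshow")

# Two characters identify each argument uniquely, so these are completed.
report(pa = "budget", ma = 2, fu = "bshow")
```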
The history function is a typical multipurpose function with many arguments. It goes through the expressions for recent tasks, and either returns some of these expressions (unevaluated) or does something specific with them. There are two frequently useful applications: to re-evaluate one of the tasks, or to turn some expressions into a function definition. The optional arguments allow users to filter the expressions for particular strings, say how far back the search should go, and control what is done with the expressions found. The example above uses the argument function to create an S function from, in this case, the last 2 expressions containing the pattern "budget":

    > history("budget", max = 2, function = "bshow")
    Function object bshow defined
    and saved on file bshow.S

(Turn to page 21 to see what to do next.)
1.3.2 Arithmetic and Other Operators
S has the usual operators for arithmetic, comparisons, and logical operations (plus some other operators special to S).

    residuals(fit) + offset
    counts/(1 + rowMeans(counts))
    amount < 0
Operators look different from other functions because they appear in expressions in infix (scientific) notation. When we come to program with such operators, though, we will treat them just like ordinary functions, which is what they are. We put the operator "*" between two arguments instead of in front of its arguments, but this is just because the interface we are using expects to parse such expressions. Most users prefer to see arithmetic expressions written this way, but in fact the S expression x * y is a call to the function "*", and could equally be written "*"(x, y).
Operator "%%" computes modulus. Some special matrix operators do computations in numerical linear algebra, such as matrix multiplication.
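To make the point concrete, here is a minimal sketch in S syntax. It assumes an R session (R is an open-source implementation of the S language), so details such as printing may differ from the S described here; the numbers and the matrix m are purely illustrative.

```r
# In S, operators are ordinary functions that happen to be written infix.
sum.infix = 2 + 3
sum.call = "+"(2, 3)        # the same computation written as a function call
remainder = 7 %% 3          # modulus: 7 = 2 * 3 + 1
m = matrix(1:4, nrow = 2)   # a 2 by 2 matrix, filled by column
mprod = m %*% c(1, 1)       # matrix multiplication; here, the row sums of m
print(c(sum.infix, sum.call, remainder))
print(mprod)
```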
Additional operators in S extract pieces of objects. Square brackets can be used to extract the portion of the object before "[" that corresponds to the expression between the square brackets:
amount[1:3]
amount[amount < 0]
The first expression produces the first 3 elements of amount, the second the subset of elements corresponding to true values in the logical expression. Square brackets can be extended to apply to matrices, multiway arrays, and other similar objects. Multiple arguments between the square brackets refer to rows and columns, etc.:
budget[amount < 0, "Date"]
This selects rows satisfying the logical expression and columns corresponding to the character string name given.
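A small sketch of these subscripting expressions, again assuming R as the S implementation. The vector amount and the data frame budget are hypothetical stand-ins for the book's data; the column names Date and Amount follow the examples in this chapter.

```r
# Hypothetical data standing in for the amounts in a budget.
amount = c(120.50, -45.00, 300.00, -12.25, 80.00)
amount[1:3]              # the first 3 elements
amount[amount < 0]       # elements where the logical test is TRUE

budget = data.frame(Date = c("Jan 3", "Jan 5", "Jan 9", "Jan 12", "Jan 20"),
                    Amount = amount, stringsAsFactors = FALSE)
# Rows selected by a logical test, the column selected by name.
budget[budget$Amount < 0, "Date"]
```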
Other operators extract single elements, components of lists, or the slots of an object; see Chapter 3 or the online documentation, by typing ?"[[", ?"$", or ?"@".
Assignment, written "=" in our examples, is another important S operator. Assignment in S is very general, but its most common use is the familiar one, saving the value computed on the right under the name on the left:
budget = read.table("budget.data")
S will evaluate the right side of the assignment expression and save the result in a database, called in S the working data, under the name on the left side. Then the object budget is available for future tasks, whenever we are working with this database. Assignment operators are also used to modify existing objects, when the expression on the left of the assignment is a function call instead of a name. We call these expressions replacements:
budget[8, 3] = -45.00
As you would expect, this expression replaces the [8, 3] entry of budget with the value on the right side. Replacements in this style are familiar from most languages, but in S replacements are much more general. Any function can appear on the left of the assignment if there is some interpretation of how the object should be modified, using the value on the right side of the assignment. For example, evaluating the expression
length(x1) = 14
means that the length of the object is set to the value 14 and the result is assigned in place of the previous x1. We will take up assignments again on page 14.
1.4 Data, Objects, and Databases
S focuses on programming with data, emphasizing the ability to customize your view of data to match your programming needs. Data in S comes in the form of objects. Everything is an object: this is a fundamental concept in S. In many ways all objects are treated equally. All objects have a class, allowing computations on the objects to be customized. There are dozens of classes provided with S, and defining new ones is very much part of the programming style. The behavior of classes can be customized by defining methods for S functions when they encounter arguments from particular classes.
S maintains databases for objects, allowing users to store and retrieve objects (any objects) by name. All the computations you do in S, from interactive analysis and visualization to the full extent of programming, use and create objects. By attaching databases, users get access to S objects and to libraries of S functions.
1.4.1 Objects and Classes
All S objects have a class, a character string that defines what the object is:
> class(class)
[1] "function"
An object's class defines the object in several ways. First, it can define the method used by a function when the object appears as an argument. Whether we're plotting the object, using it in arithmetic, or extracting data from it, the computations can be tailored to the particular object by a method defined for that class.
Second, the representation of the object is defined by its class. A method using the class can count on finding certain information in the object because the class representation is defined and accessible. The simplest classes hold basic data values, such as numbers or logical values:
> class(x1)
[1] "numeric"
> x1 > 0
[1] T T T T T T T T T T T T T T
> class(x1 > 0)
[1] "logical"
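The same queries can be tried directly. This sketch assumes R; note that R reports "integer" rather than "numeric" for integer vectors such as 1:3, so x1 here is a hypothetical vector of decimal numbers.

```r
x1 = c(2.5, 3.1, 0.7)   # hypothetical numeric data
class(x1)               # "numeric"
class(x1 > 0)           # "logical": comparisons produce logical vectors
class(class)            # "function": functions are objects with a class too
```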
New classes are built up from existing ones, most often by including several slots in the representation, each slot containing some simpler class that forms part of the information in the new class. Slots have names, and programmers can extract or replace information by referring to the slot, using the operator "@". When we come to discuss defining new classes in section 1.6, we'll use slots to do so.
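As a preview of slots, here is a sketch using setClass and new from R's methods package, which implements classes in the style described here. The class "track" and its slots x and y are hypothetical; section 1.6 gives the book's own treatment.

```r
library(methods)
# A hypothetical class whose representation has two numeric slots.
setClass("track", representation(x = "numeric", y = "numeric"))
t1 = new("track", x = c(1, 2, 3), y = c(10, 20, 15))
t1@y                    # extract a slot by name
t1@y[3] = 30            # replace part of a slot
slotNames("track")      # the slot names, "x" and "y"
```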
The third major contribution of classes comes from relations between classes. When one class includes all the behavior of another, we say it extends the other class. Class extension helps greatly in defining new classes: all the methods defined for the earlier class are taken over by the new class, with no reprogramming.
One kind of extension is especially basic to the language. A virtual class exists only so other classes can extend it; it groups together classes that share some important behavior. For example, all the atomic classes mentioned above share the essential notion of having some data values, their elements, which can be extracted and replaced by referring to their index, their position in the object. S calls these objects vectors, and the virtual class "vector" exists so all vector classes can extend it. The numeric object x1 is an example: because it's a vector, we know that the expression x1[1:5] will return a vector of the first 5 elements, and length(x1) will return the number of elements.
When we come to define methods, we will see that vector and other virtual classes are enormously helpful: we can write methods for the virtual class that exploit the common behavior, and have the method apply to all the actual classes automatically. New classes can be defined extending the virtual class; they too automatically inherit all the methods.
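The payoff of a virtual class can be seen in a few lines. This sketch assumes R's methods package, where "vector" is a virtual class extended by the basic vector classes; the generic firstHalf is hypothetical, invented for illustration. One method, written for "vector", serves both numeric and character objects.

```r
library(methods)
# A hypothetical generic function with a single method for the virtual class.
setGeneric("firstHalf", function(x) standardGeneric("firstHalf"))
setMethod("firstHalf", "vector",
          function(x) x[seq_len(length(x) %/% 2)])  # uses only [ and length
firstHalf(c(1.5, 2.5, 3.5, 4.5))   # numeric extends "vector"
firstHalf(c("a", "b", "c", "d"))   # so does character
extends("numeric", "vector")       # TRUE
```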
1.4.2 Assignments to Databases
S maintains databases for users. An S database contains S objects, each associated with a name. The objects can be anything at all: datasets, functions, whatever information we need to keep and reuse (everything is an object, remember?). To give S a task that says "put this object in the database, with this name", we supply an assignment expression, with the name on the left and the object to assign on the right. This looks very familiar from most any programming language:
> lx1 = log(x1 - min(x1) + 1)
As in other languages, the value of the expression on the right can now be retrieved and used by supplying the name lx1.
What happens to implement the assignment, however, is simpler from the user's point of view and more general than in most languages. The programmer does not need to make the assignment legal, by declarations or by making sure that the data being assigned is consistent with previous use of the name. When the assignment is supplied as a task for S to do, the object is saved on the working database, typically either the current directory for this S session or the user's home directory. This database is maintained throughout the S session and from one session to the next. Any object can be assigned to any name.
The expression on the left of the assignment operator can be more than just a name:
x[i] = 0
length(x) = max(10, length(y))
Assignment expressions with function calls on the left are called replacements. The model for them is simple but powerful. If the function on the left has a name as its first argument, S replaces the named object by something new. The new object is the value of a call to the replacement function corresponding to the name of the function on the left. The first argument to the replacement function is the original object and the last argument is the expression on the right side of the assignment. The name of the replacement function by convention is the name of the function on the left, concatenated with "<-" (another assignment operator). The second replacement expression above is equivalent to:
x = "length<-"(x, max(10, length(y)))
The generality of replacement expressions may take some getting used to, but it is a powerful programming technique.
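A sketch of the replacement model, assuming R. The replacement function "first<-" is hypothetical; following R's convention (an assumption as far as S is concerned), its last argument is named value and receives the right side of the assignment. The last lines use the built-in "length<-".

```r
# A hypothetical replacement function that changes the first element of x.
"first<-" = function(x, value) {
  x[1] = value
  x                    # the modified object becomes the new value of x
}
x = c(5, 6, 7)
first(x) = 100         # evaluated as x = "first<-"(x, value = 100)
y = c(5, 6, 7)
y = "first<-"(y, 100)  # the explicit call gives the same result
z = 1:3
length(z) = 5          # built-in replacement: the extra elements are NA
```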
1.4.3 Computing with Databases
Databases are collections of objects, each associated with a name. You always have a database, the working data, available when you're computing with S. The CHAPTER command created an empty working data, and assignment expressions you typed to S will have created some objects in the database. Other than assigning and maybe modifying the objects, you are not expected to manage them: S manages arbitrary objects in the database.
There are other databases always available; in particular, one or more libraries of S functions, supplied with the language. These are not different in any essential way from your own database, except that you won't normally be assigning into them. When you refer to an object by name, whether it's a function or an object containing data, S searches for the object in the databases currently attached.
The function search returns the names of these databases:
> search()
CD med
The first name on the list is always the working database: the name "." means the current directory. The other two names are, in this case, two S libraries.
You can get the names of all the objects on a database from the function objects. Give it the database you're interested in, either by its name or by its position in the search() output. By default, objects looks in the working data:
> objects()
[1] ".Last.value" "budget"
[s] today"
oudget2" —seiee
When we typed the expression summary(budget), S had to find both the function summary and the object budget. The rule is always that S looks in order in the databases in the search list. From the output of objects(), we can see that S will find budget on the working data, but not summary; that must be on one of the S libraries:
> find("summary")
[1] "models"
There are many tools for examining S databases; see the tables in Chapter 4. For most computations, however, it's enough to know that the data we assign is put into the working data.
If other databases have objects we need (either functions or data), we can add these to the search list by calling library or attach:
> library(data)
> search()
Gn'" “modets"
Now the objects in the S library "data" are also available.
The function objects is a useful way to search in the very large collections of functions found on the libraries. Several optional arguments to objects let you restrict your search in any way you can imagine. The argument pattern returns only objects whose names match the argument. So, if we wondered about functions dealing with matrix objects, say, one step would be to look for objects with "matrix" in their name:
> onjectetta
[1] Nessastrix"
pattern = Matrix")
.natrix.defaule”
(3) "ia .nateix” seats
@) preatrix"
m steuatrix®
Other arguments restrict the class of object or apply an arbitrary test to each object.
1.4.4 Getting Data into S; Connections
Most interesting programming will deal with data that originated in some "outside" application or process. The application will have left some data around somewhere, say on a file. We need to make a connection from the file to S in order to read the data in and work with it. S provides an unlimited variety of ways to do this, both through existing functions and through tools that let you design your own functions.
The function scan reads data items, as ordinary text, and interprets them as data for S:
> scan()
1: 348 325 333 345 325 334 334 332
9: 347 335 340 337 323 327
15:
[1] 348 325 333 345 325 334 334 332 347 335 340 337
[13] 323 327
When called with no arguments, scan reads from standard input (keyboard input), prompting with the index of the next item to be read. An empty line, or an EOF signal, terminates input. The scan function has a number of arguments, but the first two are the most important ones. The first argument tells scan to read from a file rather than the keyboard. Most often, the argument is the name of a file in the file system, though in fact it can be any connection object linking S to an external source of data (more about that later in this section). The second argument, what, defines what class of object the user wants scan to look for. If what is, say, any object of class character, then scan will interpret the data items as elements of a character vector. The possibilities are unlimited, because methods can be defined for scan to read data of any class at all.
The method for an ordinary S list, for example, expects to get items successively to add to each of the elements of the list supplied as what. Suppose we want to create a list having two elements named x and y, each being a numeric vector. We will read this in from a file. The file needs to have the first x value, then the first y value, then the second x value, and so on. We will be less likely to make errors if the file has two items per line, though scan doesn't care about that. Suppose the file is "xyData" and its first few lines are:
156 348
182 325
211 333
212 345
218 325
220 334
246 334
Then scan can read the data into the list we want:
> scan("xyData", list(x = numeric(), y = numeric()))
$x:
[1] 156 182 211 212 218 220 246 247 251 252 256 268
[13] 261 268
$y:
[1] 348 325 333 345 325 334 334 332 347 335 340 337
[13] 323 327
There are other methods for scan, and some additional arguments. The most important of the other arguments is n, which controls the number of items to be read, if we don't want to read to the end of the file.
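The following sketch reproduces the list method for scan on a small file, assuming R. A temporary file written with cat stands in for "xyData", and only its first four lines are used; quiet = TRUE is an R argument that suppresses the report of how many items were read.

```r
# Write a small stand-in for the file "xyData": one (x, y) pair per line.
f = tempfile()
cat("156 348\n182 325\n211 333\n212 345\n", file = f)
# what is a list, so successive items go to x and y alternately.
xy = scan(f, list(x = numeric(), y = numeric()), quiet = TRUE)
xy$x
xy$y
# n limits the number of items read; here, just the first two numbers.
first.two = scan(f, n = 2, quiet = TRUE)
unlink(f)
```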
As we mentioned, providing the character string name of a file is only a special case. Any S connection object would do instead. Connection objects represent all the kinds of things in your computing environment from which S can read data, or to which S can write data. The connection objects can be manipulated in a variety of ways in S, when you want more control over how data is transferred.
The most important way to manipulate a connection is to open it. We didn't open "xyData", so scan opened it for us, and then closed it again before returning. Another call to scan with the same file would start reading again at the beginning of the file. That's fine for this example, but in some cases we might want to read pieces of the data from the file, look at those, and then decide what to read next.
Suppose that the same data was written on another file, "xy2", in a different style. First the file contains the number of elements to expect in each of x and y (14 in this case), then all the x data, then all the y data. The essential trick to scanning sequentially is to open the file first. Then each successive call to scan will leave the connection open and pick up reading where the last call left off. The first time, we'll just read one integer number, say nxy. Next we'll read the x data, then the y data.
> xyfile = open("xy2")
> nxy = scan(xyfile, integer(), n = 1)
> x = scan(xyfile, numeric(), n = nxy)
> y = scan(xyfile, numeric(), n = nxy)
> xy2 = list(x = x, y = y)
> xy2
$x:
[1] 156 182 211 212 218 220 246 247 251 252 256 268
[13] 261 268
$y:
[1] 348 325 333 345 325 334 334 332 347 335 340 337
[13] 323 327
> close(xyfile)
As a final step, we explicitly closed the connection. This isn't required, but it is good housekeeping, freeing up operating system resources and preventing any accidental use of the connection later on.
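Here is the sequential-reading pattern as a sketch, assuming R. In R a file connection is created and opened with file(..., open = "r") rather than open("xy2"); the temporary file and its three-element layout are stand-ins for "xy2".

```r
# A stand-in for "xy2": the count, then all the x data, then all the y data.
f = tempfile()
cat("3\n156 182 211\n348 325 333\n", file = f)
xyfile = file(f, open = "r")                         # open the connection ourselves
nxy = scan(xyfile, integer(), n = 1, quiet = TRUE)   # read just the count
x = scan(xyfile, numeric(), n = nxy, quiet = TRUE)   # resumes after the count
y = scan(xyfile, numeric(), n = nxy, quiet = TRUE)   # resumes after the x data
close(xyfile)                                        # good housekeeping
unlink(f)
xy2 = list(x = x, y = y)
```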
1.5 Writing S Functions
Programming in S begins for real when you start writing S functions. S tries to make it easy to get started with the process. A function can be constructed from the expressions you have already given as tasks to S, or it can be the result of wanting to change an existing function. Often, you find yourself writing the same, or similar, expressions over and over. It makes sense then to package these as a function.
1.5.1 Creating a Function Object
You can create a function by just typing in an expression that defines and assigns a function object. The idea is simple: take an expression you would use to compute something interesting, precede the expression by function followed by the parenthesized names of the objects involved, and that expression defines a function. Take the expression log(x - min(x) + 1) we used on page 14. To turn it into a function:
> logtrans = function(x) log(x - min(x) + 1)
Like any assignment expression, this tells S to evaluate the expression on the right of the "=" and associate it with the name on the left. The difference in this case is that the expression on the right, when it is parsed, defines a function object. The parser converts the reserved word function, followed by a parenthesized list of arguments, followed by an S expression, into a function object. The expression is called the body of the function object.
A function definition can appear anywhere you want in an S expression. Most often, it appears on the right of an assignment, as in this example, which associates this function object with the name logtrans.
After the assignment, the S evaluator can take a call of the form:
logtrans(x1)
and evaluate it, using the same rules we discussed in section 1.3. Indeed, there is no difference between a function created by a user's program and one supplied with S.
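A sketch, assuming R, of the same function in use. The data x1 is hypothetical; the call to sapply shows a function definition appearing inline as an argument rather than on the right of an assignment.

```r
logtrans = function(x) log(x - min(x) + 1)
x1 = c(3, 4, 7)                 # hypothetical data
lx = logtrans(x1)               # the same as log(c(1, 2, 5))
# A function definition can appear anywhere an expression can,
# for example as an argument to another function:
res = sapply(list(a = c(3, 4, 7), b = c(10, 12)),
             function(v) max(logtrans(v)))
```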
Once the computations get a little more complicated, providing an in-line function definition as we did here becomes clumsy, or even impossible. Often, we need the function body to contain several expressions, including assignments. The technique is to enclose these expressions in braces and separate them by semicolons or new lines. As in many languages, the expressions in the braced list are evaluated one after another. In S, the key point is that the value of the whole braced list is the value of the last expression. Functions like this quickly get too complicated to type straight in, but perhaps we can manage something fairly simple. Suppose we want to turn the following sequence of tasks into a function:
> a = min(x1)
> b = max(x1) - a
> x1 = (x1 - a)/b
The computation has the effect of shifting the numbers in x1 to the interval 0 to 1. Let's type the same sequence of expressions, but this time as the body of a function:
> shift = function(x) {
+ a = min(x)
+ b = max(x) - a
+ (x - a)/b
+ }
The S parser keeps prompting for more input until it has a syntactically complete expression. The change made from the three tasks to the three expressions is typical: we replaced the third line by an expression without an assignment. This expression becomes the value of a call to the function.
The other key difference between tasks and function definitions is that ordinary assignments in the body of the function, such as those for a and b, now become local assignments in the evaluation of a call to shift. The assignments and any storage they require go away when the call is completed.
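A quick way to see that the assignments inside shift are local, assuming R: an object named a at top level is left alone by a call to shift, whose own a disappears when the call returns. The input data is illustrative.

```r
shift = function(x) {
  a = min(x)          # local to each call of shift
  b = max(x) - a      # local as well
  (x - a)/b           # the value of the braced list, and so of the call
}
a = "untouched"       # a top-level object that happens to share the name
z = shift(c(2, 4, 6))
```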
1.5.2 Turning Tasks into a Function
As an alternative to typing in the three expressions in the body of shift, we can ask S to create a function containing some recently typed tasks. To do this, you supply the name you want to give the new function as an argument to the function history, which will then turn the expressions it picks up into the body of a function. The arguments pattern and max in a call to history select only tasks matching some string pattern and limit the number of tasks selected. In the computations to shift x1, we typed three expressions, each of which contained the string "x1":
> a = min(x1)
> b = max(x1) - a
> x1 = (x1 - a)/b
To turn these into a function object named shift:
> history("x1", 3, function = "shift")
Function object shift defined
and saved on file shift.S
The history call had two side effects: it created an object named shift on the database and it wrote a file "shift.S" containing an equivalent assignment expression. (Computations in a functional language such as S aren't supposed to have side effects, and we do try to avoid them. But S is willing to make exceptions to proper behavior when the result is a useful tool for the user, as it is in this case.)
The notion of capturing some tasks from the recent history is particularly attractive if we really just want to save typing. Consider the plot and summary example from page 8:
> plot(budget$Date, budget$Amount)
> summary(budget)
If I find I'm doing those two computations fairly often, it makes sense to turn them into a function, say bshow. This does it:
> history(function = "bshow", pattern = "budget", max = 2)
Wrote the S definition for function bshow to file bshow.S
> bshow
function()
{
plot(budget$Date, budget$Amount)
summary(budget)
}
That's it: the function bshow is ready to use.
The tasks usually will not be exactly what you want in the function, just a rough approximation. In the bshow case, the function is fine if I always want to look at the same dataset. I might have wanted the function, though, to look at a number of similar datasets, with the dataset being the argument to the function. In the shift case that is most certainly the situation.
The technique is to take the file created by history and edit it. The file bshow.S starts off as this:
bshow = function()
{
plot(budget$Date, budget$Amount)
summary(budget)
}
All we need to do is edit the first line to be:
bshow = function(budget)
Once the file is edited, we can redefine the S object bshow by the S task:
source("bshow.S")
Your chosen user interface environment will determine how you do the editing, and may provide some alternative to source as a mechanism for loading the revised definition back into S. In our examples, we'll avoid assuming any particular interface and stick to standard S mechanisms. In the next subsection, we look at a standard way to do the editing.
1.5.3 Editing a Function
Nothing is perfect, and often you will feel that some existing function, written by you or by someone else, could be improved by a few changes. Since S functions are S objects, procedures for doing this are simple. We will show here how to dump a function to a text file and then source in the edited file to give you a new version.
The only hard part of this is understanding the function well enough to change it! If you wrote the function yourself, you hope that you still remember what it was supposed to do. If it's a standard S function or one written by someone else, your chances will depend on how complicated a job the function does and on how clearly the author codes, as well as your own experience with S, of course. If the function looks obscure, consider that often you can modify the behavior of the function without rewriting it, by just defining a new function that calls it in a slightly special way.
Suppose, though, that you do want to edit the function. If it's one of your own functions, chances are that you will already have a copy of it on a text file. If not, or if it is a function someone else wrote, you can use dump to create a text file. Suppose I wrote the function bshow (see page 21), but you decide that instead of plotting the same two components each time, you want to plot all the pairs of components. You could just create your own version of bshow, but it is usually smarter to start off with a different name for the new function. That allows you to compare the old and new easily, just in case the new function doesn't work quite right. We copy the function object, say to an object named myshow, and use dump to dump the object to a file:
> myshow = bshow
> dump("myshow")
[1] "myshow.S"
We didn't tell dump where to write the output, so it named the file using the object name, as "myshow.S". The file contains:
myshow = function(budget)
{
plot(budget$Date, budget$Amount)
summary(budget)
}
All we need to do in editing is to change the plot call to be plot(budget). Then
source("myshow.S")
creates the revised object myshow.
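The whole dump, edit, and source cycle can be scripted as a sketch, assuming R. R's dump writes to "dumpdata.R" unless told otherwise, so the file name is given explicitly; the call to sub is a hypothetical stand-in for editing the file by hand.

```r
bshow = function(budget) {
  plot(budget$Date, budget$Amount)
  summary(budget)
}
myshow = bshow                      # work on a copy under a new name
f = tempfile(fileext = ".S")
dump("myshow", file = f)            # write an assignment expression to the file
src = readLines(f)
# Stand-in for hand editing: plot all pairs of components instead.
src = sub("plot(budget$Date, budget$Amount)", "plot(budget)", src, fixed = TRUE)
writeLines(src, f)
rm(myshow)
source(f)                           # re-creates myshow from the edited text
unlink(f)
```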
Different user interfaces may provide you with other ways to dump the definition of a function to a file, or to edit a function without being conscious of dumping it to a file at all. Choose whatever mechanism you find most convenient. As you do more programming in S, you will want to develop a systematic approach to organizing the functions and other software you own. Use the S chapter for this: in a chapter you can co-ordinate S functions, other S objects, documentation and, if you want to, related software in other languages. It's worth keeping in mind that S functions are objects: the process described here creates or modifies an S object in your chapter database. How you dump the function to a text file, what name you give the text file, how you source it back in, even whether you keep the text file around are all your decisions. S provides some tools but imposes no rules. You give S the tasks to dump and source: the S evaluator looks only for the S object corresponding to the function's name the next time you call it.
1.5.4 Debugging and Testing Functions
Once a function is defined or redefined, we need to see whether it does what we want. Once the new function is assigned, it is available for use, so we can proceed immediately to give it something to do. S is designed to encourage you to plunge in and try things: if they don't work, the reason will often be obvious. You can edit the function to fix the obvious problem and try again, all within a few minutes.
Sometimes, however, it pays to look a little harder at problems, particularly if the cause is not obvious. S provides some tools to help, and the design of the language lets you use and modify these tools in an unlimited way. Let's look at an example using the function bshow defined on page 21: after editing, we end up with the following function:
function(budget)
{
plot(budget$Date, budget$Amount)
summary(budget)
}
Notice that naming the formal argument budget doesn't create any problem, even though there is also an object named budget on our database. Inside the function, budget is assigned and found locally.
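The point about the formal argument budget can be checked in a few lines, assuming R; describe is a hypothetical function and the two strings are illustrative.

```r
budget = "the database copy"       # a top-level object
describe = function(budget) {
  # Inside the call, budget refers to the argument, not the top-level object.
  paste("inside the call, budget is", budget)
}
msg = describe("the argument")
```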
The first test should be whether we can redo the tasks that inspired the function in the first place:
> bshow(budget)
Transaction Category
win, 11439 Service :3
tet Qu.:1440 Wiehdrava'1
4442 Telephone #1
442 Tax 1
Had act
TMS Interest 21
3 (Other) 22
Good, now let's try it on another object we have around, budget2:
> bshow(budget2)
Problem: No method for plotting class "NULL" vs class "numeric"
Debug ? (y|n):
Not so good. In evaluating the expression, S has come to a point where some function decides there is a problem, and that it doesn't know what to do next.
The function then invokes S's error recovery, for example by calling the S function stop. You have control over what happens next, but by default S invites you to debug the problem interactively, through the function recover. This function sets up an interactive environment for you, in which you can type ordinary S expressions, plus a few commands special to recover.
The first important difference from ordinary evaluation is that you are working locally in the function call where the problem arose. Arguments and objects assigned locally in the function can be studied, even modified. To get started, just answer y to the invitation:
> bshow(budget2)
Error: No method for plotting class "NULL" vs class "numeric"
Debug ? (y|n): y
Browsing in frame of bshow(budget2)
Local Variables: budget
Whenever you don't know what's available, evaluating the expression ? will provide some hints:
> ?
Type any expression. Special commands:
`up', `down' for navigation between frames.
`where' # where are we in the function calls?
`dump' # dump frames, end this task
`q' # end this task, no dump
`go' # retry the expression, with corrections made
Browsing in frame of bshow(budget2)
Local Variables: budget
In this case, let's look at the data:
> budget
row.labels Transaction Amount Data
Magazine 1 14s 2018
Mise 2 May 10818
Deposit 3 MA 1000 19
Staring at that for a bit shows that the last column is called "Data", not "Date", as we had intended. That seems likely to have caused the confusion. We can, in this case, try modifying the data on the fly, to change the last name. Then, the command go will tell S to go on with the task.
> names(budget)
[1] "row.labels" "Transaction" "Amount" "Data"
> names(budget)[[4]] = "Date"
[1] "Date"
Fovtabels Tranwaction Amount
in 1-000 Mine st446 Min. ns 106.00
fet Qv.:i.25 Int Qu.ri648 tet Quo: -85.75
edsan 2.00 Median <1446 Yadian : 20.00
fen 2.00 Mean, 18860 Youn 291.70
Sed 00.:2.75 Sra Ou.sl4e7 Sra Oe. 745.00
Mer 3.000 Rags S447 Mans: $0000.00
Very nice. To be honest, most debugging problems aren't quite so simple that we can just edit the data and go on. More typically, there is something wrong with one of the functions. Just use the command q to quit from recover.
The error recovery is only one way to use interactive browsing in S functions. Instead of waiting until an error occurs, you can call the function browser, which provides the same style of interaction.
You can call this function anywhere, but the simplest way to use it is to trace calls to a particular S function. No editing of the function is needed: a call to trace will provide a temporary version of the function you want to trace. A typical notion is to call the browser either on entering or on exiting a call to some function.
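When an interactive browser is not convenient, the idea of acting on exit from a traced function can be sketched non-interactively. This assumes R, whose trace accepts an exit expression that is evaluated in the frame of the traced call; recording a and b in the object seen is a hypothetical substitute for calling browser, and the input vector is illustrative.

```r
library(methods)            # R's trace relies on the methods package
shift = function(x) {
  a = min(x)
  b = max(x) - a
  (x - a)/b
}
seen = NULL
# On exit from each call, copy the local a and b to the top-level object seen.
trace(shift, exit = quote(seen <<- c(a = a, b = b)), print = FALSE)
xx = shift(c(0.5, NA, 2))
untrace(shift)              # restore the original shift
```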
Consider the shift function defined on page 20. Suppose we're using it to rescale the output from some complicated computation. The results so far are in bigx, a large numeric object. We'll use shift on that and save the result:
> length(bigx)
[1] 400000
> xx = shift(bigx)
We should check that things worked as expected. But clearly we don't plan on examining 400,000 numbers. What to do?
Testing software is a hard and very important part of programming; we'll discuss it many times throughout the book. S has a variety of tools, and the everything is an object concept will turn out to be essential. Right now, we just want a way to look at the data; the function plot might be good, since it would show all the numbers on a single plot. For 400,000 numbers, though, a plot might take a little while. Let's settle for just looking at a few numbers:
> xx[1:10]
[1] NA NA NA NA NA NA NA NA NA NA
The value NA in S stands for undefined numeric data (missing, or the result of some numeric computation whose result is undefined). It seems unlikely that all the initial values were undefined in bigx:
> bigx[1:10]
[1] 0.6817872 -1.2800208 -0.6903045 -0.1199928 1.4296871
[6] -0.3890879 0.130248 -1.5897425 1.5309090 -0.7088244
But let's not just flail around: now we can use some of the debugging tools in S. Let's trace the computations in shift: if we say
trace(shift, browser, exit = browser)
then an interactive browser will be called from each call to shift, once at the beginning and again just before returning. In this case there isn't much to see at the beginning, so let's just use:
trace(shift, exit = browser)
Like recover, browser evaluates any expression you type, but in the context from which browser was called. To remind you where you are, it changes the prompt into a b followed by the name identifying the context. We look at the values of a and b:
> trace(shift, exit = browser)
> xx = shift(bigx)
On exit: Called from: shift(bigx)
b(shift)> a
[1] NA
b(shift)> b
[1] NA
They're both undefined: this clue should make us look up the documentation of min:
b(shift)> ?min
Title:
Usage:
max(..., na.rm=F)
min(..., na.rm=F)
Arguments:
...: numeric arguments
na.rm: a logical value, default `FALSE'. If `TRUE', missing values are ignored.
Value:
the single maximum or minimum value found in any of the arguments; any `NA's in the data produce `NA' as a result unless `na.rm' is `TRUE'.
Aha, so if bigX had any undefined values at all, the min and max will be
undefined. The function is.na reports NA values in its argument; we don't
want to see the 400,000 logical values, so we just ask if any of the values were
missing:
b(shift)> any(is.na(x))
[1] T
b(shift)> 0
That solves the mystery, and now we can decide what we want to do. We
will edit the function shift to fix up the problem.
What do we want here? Either NAs are unacceptable, in which case we
should check and produce an error, or we only want to rescale using the
non-NA values. It's easy to include a test, and the any(is.na(x)) will do
it. The S function stop is the standard way to produce an error: it takes
any number of arguments, pastes them together, and uses the result as an
error message:
if(any(is.na(x)))
    stop("missing values not allowed")
The call to stop puts the user into a dialog with the recover function, just
as happened somewhere deep in the plotting on page 25.
But is this really the right decision on our part? It puts the burden
on the user to do something about missing values. If it were true that
NAs in the data made the shift computation fundamentally meaningless,
that decision would be reasonable. But the shift is just an application
of arithmetic, and S arithmetic isn't bothered by missing values. Missing
values as operands of arithmetic produce missing values in the result, without
complaining, and that seems eminently sensible here.
Functions will be more useful if they don't introduce inessential extra
requirements on the user. In the shift example, all we need to do to retain
the generality of the arithmetic operations is to ignore NA values when
computing the scalars a and b. The online documentation for min on page
28 already showed us how: include the na.rm argument.
Here is an implementation:
shift =
function(x) {
    a = min(x, na.rm=T)
    b = max(x, na.rm=T) - a
    (x - a)/b
}
Notice that the arithmetic to do the shift itself doesn't need to take any
notice of NA values, since we're following the standard S method in dealing
with them. Let's try out the new version:
> xx = shift(bigX)
> xx[1:10]
 [1] 0.8617603 0.259648 0.2529725 0.4420463 0.6877482
 [6] 0.408249 0.5825645 0.218845 0.7037401 0.9601110
> min(xx, na.rm=T); max(xx, na.rm=T)
[1] 0
[1] 1
Now the answer seems reasonable.
The first instinct here might be to just throw away the NAs, returning
only the actual values. This is a bad idea, usually, because the user will
likely be working with other data parallel to x. If we throw away the missing
values, the user can't relate the scaled result to those other objects. Worse,
we didn't provide any clue that the values were thrown away. As a result,
some error or confusion is likely to arise later on, making for a mystery.
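A minimal sketch of the point, in R syntax (R follows the S language closely, though details may differ from the S version described here). The vectors x and site are made-up illustration data. Because the NA-tolerant shift returns a result of the same length as its input, the rescaled values stay aligned with the parallel data:

```r
# NA-tolerant shift: NAs are ignored when computing the scalars a and b,
# and then simply propagate through the arithmetic.
shift = function(x) {
  a = min(x, na.rm = TRUE)
  b = max(x, na.rm = TRUE) - a
  (x - a) / b
}

x    = c(2, NA, 6, 4, NA)            # hypothetical data with missing values
site = c("A", "B", "C", "D", "E")    # hypothetical data parallel to x
xx   = shift(x)
print(xx)                            # 0.0  NA 1.0 0.5  NA

# Nothing was dropped, so xx[i] still refers to site[i]
print(data.frame(site, x, xx))
```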
So far we have concentrated on debugging: browsing interactively after
we have discovered that something needs fixing. Other tools in S help to find
things that need fixing. One useful tool is the function all.equal. You call
this function with two arguments, target and current. The notion is that
both are supposed to have come from equivalent computations: all.equal
compares them and reports the nature of any differences. In general, this is a
tricky business, requiring some notion of when things are "close enough" and
of which pieces of a complicated object we want to regard as essential. The
function can be extended by writing new methods, but it comes equipped to
work on most common S objects. Testing with all.equal works best if there
are some nice identities implied by our new function. In shift, for example,
we should be able to get back the original data by rescaling and then adding
on the minimum of the original:
> a = min(x1)
> b = max(x1) - a
> xx = shift(x1)
> x2 = xx*b + a
> all.equal(x1, x2)
[1] T
Not all the values of x2 will necessarily be identical to those of x1 when we
are doing numerical computations; for this reason, all.equal allows for some
tolerance in deciding on equality. You may want to look at documentation
for other S functions designed to assist in testing: try ?identical, ?do.test,
or ?error.check.
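The identity test above can be tried in R, whose all.equal descends from the S function; the data vector x1 below is hypothetical. The sketch also contrasts all.equal with identical, which insists on exact equality. In R the default tolerance for numeric comparisons is about 1.5e-8 (the square root of machine epsilon):

```r
shift = function(x) {
  a = min(x, na.rm = TRUE)
  b = max(x, na.rm = TRUE) - a
  (x - a) / b
}

x1 = c(3.7, -1.2, 0.45, 10.1, 2.2)   # hypothetical test data without NAs
a  = min(x1)
b  = max(x1) - a
x2 = shift(x1) * b + a               # rescale, then undo the rescaling

print(all.equal(x1, x2))             # TRUE, within all.equal's tolerance
print(all.equal(1, 1 + 1e-10))       # TRUE: relative difference below the tolerance
print(identical(1, 1 + 1e-10))       # FALSE: identical() requires exact equality
print(all.equal(1, 1.1))             # not TRUE: a string describing the difference
```

Note that all.equal returns a character description rather than FALSE when the objects differ, so in tests it is safest to wrap it in isTRUE.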
What general lesson do these examples show? A very important one:
Use S to Program in S. The same S expressions we might use to study data
interactively can be used, in exactly the same way, to test or debug our
programs. When browsing either in error recovery or in a traced function,
all the resources of S are available. These browsers let us work on objects in
the function call frames, but these are ordinary S objects, just local rather
than stored on a database.
1.5.5 Documenting S Functions
We have used the ? online help operator to get information about S functions
and other topics. Once we get into programming, we should think about
documenting our own functions as well. Other people who use our functions
will appreciate it, and even if we're only programming for ourselves, it helps
to document functions in order to remember what they really were meant
to do when we come back to them later on.
All S functions are self-documenting to some extent:
> ?shift
Title:
Function shift
Usage:
shift(x)
Arguments:
x: argument, no default
Not very informative, but at least it tells us the arguments.
The first step in documenting a function is to include some descriptive
comments in the function definition. Any comments at the top of the
definition are interpreted by the self-documenting software in S as a description
of the function. Let's add a couple of lines to the version of shift that checks
for missing values:
shift =
# Rescale 'x' to the range 0 to 1.
# No missing values allowed.
function(x)
{
    if(any(is.na(x)))
        stop("missing values not allowed")
    a = min(x)
    b = max(x) - a
    (x - a)/b
}
With the comments added, users get some useful information:
> ?shift
Title:
Function shift
Usage:
shift(x)
Arguments:
x: argument, no default
Description:
Rescale 'x' to the range 0 to 1. No missing values
allowed.
Adding comments to functions is easy, and you should try to make it a habit
whenever writing a new function to put a line or two describing it at the
top of the definition. When you edit the function, glance at the comments
to see if they should be changed to reflect the change in the function itself.
These simple steps will go a long way to making your functions easy to use.
Eventually, you will likely want some more elegant or thorough documentation
for functions that prove really useful. Section 9.3 discusses techniques
for creating and editing S online documentation formally; for the present
chapter, we can get along fine using comments in the functions.
1.6 Defining Classes and Methods
Methods in S define how a particular function should behave, based on
the class of the arguments to the function. You can find yourself involved in
programming new methods for several reasons, usually arising either from
working with a new class of objects or with a new function:
1. You need to define a new class of objects to represent data that behaves
a bit differently from existing classes of objects.
2. Your application requires a new function, to do something different
from existing functions, and it makes sense for that function to behave
differently for different classes of objects.
You may also just need to revise an existing method or differentiate classes
that were treated together before. However it comes about, defining a new
method is the workhorse of programming with objects in S.
Conversely, defining a new class is less common, but often the crucial
step. Classes encapsulate how we think about the objects we deal with, what
information the objects contain, what makes them valid. You will likely write
many more methods than class definitions, particularly since each new class
definition typically generates a number of new methods. But the usefulness
of your project will depend on good design of the object classes, probably
more than anything.
To begin, though, we will discuss the simpler project of designing
methods. Let's take on a project to write a function that returns a one-line
description of an object. We could just type the name of the object, of
course, and S would show us that object. Also, the function summary, supplied
with S, is designed to summarize the essential information in an object,
usually in a page or so. Our project takes summary one step further, with the
goal of a one-liner.
We will name the function whatis and give it just one argument, the
object we're interested in. One definition might be:
> whatis = function(object) class(object)
> whatis(1:10); whatis(state.x77); whatis(whatis)
[1] "integer"
[1] "matrix"
[1] "function"
Okay, but not much of a programming contribution. What else might we
want to know about the object? Something about how big it is, perhaps.
We can use the S function length. So we might try, as a second attempt:
> whatis = function(object) paste("An object of class",
+     class(object), "and length", length(object))
The S function paste pastes strings from all its arguments into single strings.
Let's try this definition on a few objects:
> whatis(x)
[1] "An object of class numeric and length 14"
[1] "An object of class matrix and length 42"
> whatis(whatis)
[1] "An object of class function and length
Well, better but not great. The idea of length is fine for numeric objects, and
generally for the vector objects we discussed on page 13. But for a matrix
we would like to know the number of rows and columns, and for a function it
may not be clear what we would like, but certainly the length has no obvious
relevance at all. Let's go back to the simpler definition, class(object), and
think things over.
1.6.1 Defining Methods
Around now we realize that the generic purpose of the function (in this case,
to produce an informative one-line summary) needs to be implemented by
different methods for different kinds of objects. The class/method
mechanism in S provides exactly this facility. We will define methods for those
classes of objects where we can see a useful, simple summary. The existing
function still plays a role, now as the default method, to be used when none
of the explicit methods applies. For this purpose we will want to return to
a simple definition.
For a definition of whatis for ordinary vectors of numbers, character
strings, or other kinds of data, the length is quite reasonable. As mentioned
on page 13, this is where virtual classes are so helpful. We don't need to
implement a method for every actual vector class, just one method for all
vectors.
The method is defined by a call to the function setMethod:
setMethod("whatis", "vector",
  function(object)
    paste(class(object), "vector of length", length(object))
)
We tell setMethod three things: what generic function is involved, what
classes of arguments the method corresponds to, and what the definition of
the method is. S uses the term signature for the second item: in general, it
matches any of the arguments of the function to the name of a class. The
definition of the method is a function; it must always have exactly the same
arguments as the generic. This is the first method defined for whatis, so S
just takes the ordinary function as defining the generic.
The vector method will do fine for those ordinary vectors, but for objects
with more complicated classes, we can do more. Consider matrices, for
example. We would like to know the number of rows and number of columns.
What else should we include in a one-line summary? Well, matrices are
examples of S structures: objects that take a vector and add some structure
to it. So we might ask whether the relevant information about the underlying
vector could be included. We decided before that the class and the length
are useful descriptions of vectors, but in this case we don't need the length
if we know the number of rows and columns. We can include the class of
the vector, though, and this is useful since matrices can include any vector
class as data. All that information can be wrapped up in the function:
whatIsMatrix = function(object)
    paste(class(as(object, "vector")), "matrix with",
        nrow(object), "rows and", ncol(object), "columns")
In order to make this the matrix method for whatis, we call the function
setMethod again:
setMethod("whatis", "matrix", whatIsMatrix)
We can call showMethods to see all the methods currently defined for a generic:
> showMethods("whatis")
Database   object
"."        "ANY"
"."        "matrix"
"."        "vector"
Three methods are defined, all on the working database (which happened to
appear as "." in the search list of databases). The method corresponding
to class ANY is the one used if none of the other methods matches the class
of object; in other words, the default method.
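For readers following along in R, here is a sketch of the whole whatis example so far. It assumes R's methods package, where setGeneric is called explicitly to turn the ordinary function into a generic with itself as the default method. It also uses as.vector in place of the text's as(object, "vector") to drop the dim attribute; the matrix passed in is made up, and R formats the showMethods listing differently from the S output above:

```r
library(methods)

# The ordinary function becomes the default ("ANY") method of the generic
whatis = function(object) class(object)
setGeneric("whatis")

# One method for all vectors, via the virtual class "vector"
setMethod("whatis", "vector", function(object)
  paste(class(object), "vector of length", length(object)))

# A more specific method for matrices; as.vector() drops the dim attribute
setMethod("whatis", "matrix", function(object)
  paste(class(as.vector(object)), "matrix with",
        nrow(object), "rows and", ncol(object), "columns"))

whatis(1:10)              # "integer vector of length 10"
whatis(letters)           # "character vector of length 26"
whatis(matrix(0, 14, 3))  # "numeric matrix with 14 rows and 3 columns"
whatis(paste)             # "function", from the default method
showMethods("whatis")
```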
At this point, there are some observations worth noting.
• We did not define the default method. Because there was an existing
> dumpMethod("whatis", "numeric")
Method specification written to file
asked for the method for class numeric; there is no explicit method, but
since class numeric extends class vector, that method was written out. The
file name chosen combines the name of the generic and the signature of the
method we're looking at. The file "whatis.numeric.S" contains:
setMethod("whatis", "numeric",
  function(object)
    paste(class(object), "vector of length", length(object))
)
The setMethod call, when evaluated, will define the new method, but
currently just contains the method for class "vector". After we edit the file,
source("whatis.numeric.S")
will define the method. As an exercise, you might try writing a method for
numeric objects: perhaps in addition to length, the min and max might be
interesting, but take a look at page 29 first.
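One possible answer to the exercise, sketched in R (the generic is declared directly with standardGeneric so the block stands alone). Following the lesson of page 29, it ignores missing values when computing the range and reports how many there were, rather than silently dropping them:

```r
library(methods)
setGeneric("whatis", function(object) standardGeneric("whatis"))

# Length, plus the range of the non-missing values and a count of NAs
setMethod("whatis", "numeric", function(object) {
  desc = paste(class(object), "vector of length", length(object))
  ok = !is.na(object)
  if (any(ok)) {
    r = range(object[ok])
    desc = paste(desc, "from", r[1], "to", r[2])
  }
  if (!all(ok))
    desc = paste(desc, "with", sum(!ok), "missing")
  desc
})

whatis(c(3, NA, 10, -2))  # "numeric vector of length 4 from -2 to 10 with 1 missing"
whatis(1:10)              # "integer vector of length 10 from 1 to 10"
```

Because class integer extends class numeric, the same method also serves integer vectors.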
Let's look at a few calls to whatis with the methods defined so far.
> whatis(1:10)
[1] "integer vector of length 10"
> whatis(x)
[1] "numeric vector of length 14"
> whatis(x > 0)
[1] "logical vector of length 14"
> whatis(letters)
[1] "character vector of length 26"
> whatis(m)
[1] "numeric matrix with 14 rows and 3 columns"
> whatis(paste)
[1] "An object of class function"
The case of a function object still falls through to the default method,
because a function object is not a vector. There is nothing particularly difficult
in dealing with functions as objects, but you will need to find some tools to
help. If you'd like to try writing a whatis method for function objects, see
page 79 in section 2.6.1.
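A sketch of a whatis method for function objects, in R. It relies on two R functions that may not match the S tools the text has in mind: formals, which returns a function's argument list, and args, which also recovers an argument list for most primitive functions. The generic is declared afresh so the block stands alone:

```r
library(methods)
setGeneric("whatis", function(object) standardGeneric("whatis"))

setMethod("whatis", "function", function(object) {
  a = args(object)   # a function with the same arguments; helps with primitives
  nms = if (is.null(a)) character(0) else names(formals(a))
  if (length(nms) == 0) "function with no arguments"
  else paste("function with arguments:", paste(nms, collapse = ", "))
})

whatis(function(x, y = 2) x + y)  # "function with arguments: x, y"
whatis(sum)                       # a primitive; args() recovers its arguments
```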
1.6.2 Defining a New Class
Designing a class is an extremely important step in programming with data,
allowing you to mold S objects to conform to your application's needs. The
key step is to decide what information your class should contain. What does
the data mean and how are we likely to use it? There are often different
ways to organize the same information; no single choice may be unequivocally
right, but some time pondering the consequences of the choices will be well
invested.
The mechanism for creating classes is fairly simple: you call one function,
setClass, to define the new class, and then write some associated functions
and methods to make the class useful. For many new classes, this includes
some or all of the following:
1. software to create objects from the class, such as generating functions
or methods to read objects from external files;
2. perhaps a method to validate an object from the class;
3. methods to show (print and/or plot), or to summarize the objects for
users;
4. data manipulation methods, especially methods to extract and replace
subsets;
5. methods for numeric calculations (arithmetic and other operators, math
functions) that reflect the character of the objects.
We will sketch a few of those here, using a relatively simple, but practical,
example. In section 1.7, a more extended example is given. See section 8.2
for more information about important functions, those for which you may
want to write new methods.
Suppose we are tracking a variable of interest along some axis, perhaps
by a measuring device that records a value y1 at a position x1, a value y2 at
x2, and so on. The vector y is the fundamental data, but we need sometimes
to remember the positions, x, as well. This example was developed by my
colleague Scott Vander Wiel for an application with x being distance along
a length of fiber optic cable, and y some measurement of the cable at that
position. Clearly, though, the concept is a very general one.
How do we want to think of such data? Basically, we want to operate
on the measurements, but always carry along the relevant positions and
use them when this makes sense; for example, when plotting. What is the
natural way to implement this concept? We could represent the data as a
matrix, with two columns. This leaves the user, however, to remember when
to work with one or the other column, or both. We could represent the data
as a list with two components, but this has a similar problem.
S provides a class definition mechanism for such situations. We can decide
what information the class needs, and then define methods for functions
to make the class objects behave as users would expect. Users of the methods
can for most purposes forget about how the class is implemented and
just work with the concept of the data.
Classes can be defined in terms of named slots containing the relevant
information. In this case, the choice of slots is pretty obvious: positions and
measurements, or x and y. The S function setClass records the definition
of a class. Its first two arguments are the name of the class and a
representation for it: pairs of slot names and the class of the corresponding slot.
The function representation constructs this argument. Let's call the new
class track, reflecting the concept of tracking the measurement at various
positions.
setClass("track", representation(x = "numeric", y = "numeric"))
S now knows the representation of the class and can do some elementary
computations to create and print objects from the class. The operator "@"
can be used to extract or set the slots of an object from the class, and the
function class can be used to extract or set the class. To create a track
object from positions pos1 and measurements resp1:
> tr1 = new("track", x = pos1, y = resp1)
The function new returns a new object from any non-virtual class. Its first
argument is the name of the class; all other arguments are optional, and if
they are provided, S tries to interpret them as supplying the data for the
new object. In particular, as we did here, the call to new can supply data for
the slots, using the slot names as argument names.
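The steps so far, sketched in R: define the class, create an object with new using the slot names as arguments, and use @ to extract or replace slots. The vectors pos and resp are made-up stand-ins for pos1 and resp1. In R, new also checks that each slot gets data of the declared class:

```r
library(methods)
setClass("track", representation(x = "numeric", y = "numeric"))

pos  = c(186, 182, 211, 212)      # hypothetical positions
resp = c(10.2, 11.5, 9.8, 12.0)   # hypothetical measurements

tr = new("track", x = pos, y = resp)  # slot names used as argument names
tr@x                                  # extract a slot
tr@y[2] = 11.7                        # replace part of a slot
slotNames("track")                    # "x" "y"

# Supplying data of the wrong class for a slot is an error
bad = try(new("track", x = "not a number"), silent = TRUE)
inherits(bad, "try-error")            # TRUE
```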
Since S knows the representation of the class, an object can be shown
using the known names and classes of the slots. The default method for show
will do this:
> tr1
Slot "x":
 [1] 186 182 211 212 218 220 246 247 251 252 254 258 261 262
Slot "y":
 [1] fe 926 933 345 325 534 996 392 947 235 340 397 323 27
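The default display can be replaced by a show method of our own, one of the methods listed above as typical companions of a new class. A minimal sketch in R; the layout chosen here (a header line followed by a two-column table) is just one possibility, and the data are made up:

```r
library(methods)
setClass("track", representation(x = "numeric", y = "numeric"))

# A compact show method: a header line, then positions and measurements
setMethod("show", "track", function(object) {
  cat("track object with", length(object@y), "measurements\n")
  print(data.frame(x = object@x, y = object@y))
  invisible(object)
})

tr = new("track", x = c(186, 182, 211), y = c(10.2, 11.5, 9.8))
tr   # typing the name calls show()
```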
It's also possible to convert objects into the new class by a call to as. For
example, a named list, xy, with elements x and y could be made into a track
object by as(xy, "track").
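In R, at least, as() has no automatic route from a plain list to a newly defined class, so a coercion method must be registered with setAs before as(xy, "track") will work. A minimal sketch, assuming the list has numeric elements named x and y (the data are hypothetical):

```r
library(methods)
setClass("track", representation(x = "numeric", y = "numeric"))

# Coercion method from a named list with elements x and y
setAs("list", "track", function(from)
  new("track", x = as.numeric(from$x), y = as.numeric(from$y)))

xy = list(x = c(186, 182, 211), y = c(10.2, 11.5, 9.8))  # hypothetical data
tr = as(xy, "track")
tr@y
```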
Most classes will come with generating functions to create objects more
conveniently than by calling new. S imposes no restrictions on these
functions, but for convenience they tend to have the same name as the class.
Their arguments can be anything. For track objects, a simple generator
would look much like what we did directly:
track =
# an object representing
# measurements 'y', tracked at positions 'x'
function(x, y)
{
    x = as(x, "numeric")
    y = as(y, "numeric")
    if(length(x) != length(y))
        stop("x, y should have equal lengths")
    new("track", x = x, y = y)
}
From a user's perspective, a major advantage of a generating function is that
the call is independent of the details of the representation.
> tr1 = track(pos1, resp1)
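The equal-length check in the generator only protects objects created through track. The list of class-related software above mentions an alternative, a validity method. Here is a sketch of it in R, where setClass takes a validity function that new and validObject then call. The generator here uses as.numeric, an R-specific choice, instead of as():

```r
library(methods)

# The equal-length check moves from the generator into the class itself
setClass("track", representation(x = "numeric", y = "numeric"),
         validity = function(object) {
           if (length(object@x) != length(object@y))
             "x, y should have equal lengths"
           else TRUE
         })

# Generator in the style of the text, now relying on the validity method
track = function(x, y) new("track", x = as.numeric(x), y = as.numeric(y))

tr1 = track(c(186, 182, 211), c(10.2, 11.5, 9.8))
validObject(tr1)                          # passes silently

bad = try(track(1:3, 1:2), silent = TRUE) # rejected by the validity method
inherits(bad, "try-error")                # TRUE
```

In R, validity is checked by new and validObject, not on every slot assignment, so a later tr1@y = 1 would only be caught by an explicit validObject(tr1).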