Honey bees live in groups of approximately 40,000 individuals and reproduce at colony level through swarming, during which the old queen leaves the nest with numerous workers and drones to form a new colony. In spring, many clues can be seen in the hive that sometimes indicate the proximity of swarming, such as the presence of more or less mature queen cells. Despite this, the actual date and time of swarming cannot be predicted accurately, as this important physiological event is still not fully understood. Here we show that, by means of a simple transducer secured to the outside wall of a hive, a set of statistically independent instantaneous vibration signals of honey bees can be identified and monitored in time using a fully automated and non-invasive method. The amplitudes of the independent signals form a multi-dimensional time-varying vector which was logged continuously for eight months. We found that, combined with specifically tailored weighting factors, this vector provides a signature highly specific to the swarming process and its build-up in time, thereby shedding new light on it and allowing its prediction several days in advance. The output of our monitoring method could be used to provide signatures specific to other physiological processes in honey bees, and applied to better understand health issues recently encountered by pollinators.
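The abstract does not name the signal-processing algorithm used; purely as an illustration of the kind of pipeline described – independent components extracted from vibration recordings, their amplitudes tracked as a time-varying vector and combined with weighting factors – the following R sketch uses the fastICA package. The signal, frame length, number of components and weights are all invented.

# A minimal sketch, NOT the authors' pipeline: extract statistically
# independent components from short-time spectra of a (simulated) hive
# vibration signal, then track a weighted amplitude score over time.
library(fastICA)

vib <- rnorm(60 * 4096)                            # stand-in for a transducer recording
frames <- matrix(vib, ncol = 4096, byrow = TRUE)   # one frame of signal per row
spec <- t(apply(frames, 1, function(f) Mod(fft(f))[1:256]))  # short-time amplitude spectra

ica <- fastICA(spec, n.comp = 5)   # five independent spectral components (choice arbitrary)
amps <- abs(ica$S)                 # component amplitudes, one row per frame

w <- c(0.5, 0.2, 0.1, 0.1, 0.1)    # hypothetical 'tailored weighting factors'
score <- as.numeric(amps %*% w)    # a scalar signature evolving over time
plot(score, type = "l", xlab = "frame", ylab = "weighted amplitude")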
The document ‘Data Sets for “Basic Statistical Graphics for Archaeology with R: Life Beyond Excel”’ is an Excel file associated with the referenced text. All the data sets in that text that are accompanied by R code for analysis are contained in separate worksheets. See Section 1.6 of the text for details of how this should be used. The text itself provides the necessary bibliographic details and archaeological background.
The document ‘Figures for “Basic Statistical Graphics for Archaeology with R: Life Beyond Excel” – R code’ is a Word file associated with the referenced text. It contains full details of the R code needed to undertake the analyses illustrated in the text where, in some cases, abbreviated code is given. See Section 1.6 of the text for details of how this should be used, as well as the first page of the document itself.
The archaeological literature is permeated with the use of simple statistical graphics, such as histograms, pie charts, barplots, scatterplots and boxplots. The text illustrates how such plots can be obtained using the open source software system R; the other graphical methods covered are kernel density estimates, ternary diagrams and correspondence analysis. Copious illustrative examples using real data are provided and placed in their archaeological context. Full details of the data sets and R code used are available as separate supplementary files.

Part of the motivation for writing the text was dissatisfaction with much of what is to be found in this literature. The authors’ view is that graphs are sometimes unnecessary, where the relevant information is better conveyed in a simple sentence or commentary on a table. Graphical presentation is also often poor, probably because of reliance on the ‘defaults’ in conveniently available menu-driven packages, such as Excel, not specifically designed for statistical analysis. This contention is supported through analysis of, and reference to, real data and the way they are commonly presented in the literature.

Although R is not menu-driven – taking more time to learn than packages in common use – it allows far more control over presentation than such packages. It also forces the user to think about the real need for a graphic, as well as its design. It is hoped that the text will encourage a more critical appraisal of the use of basic graphics in archaeological publications. The same care and attention should be lavished on them as on the main text, into which they should be properly integrated.
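The point about control over presentation can be made concrete. The sketch below, with invented counts, contrasts a default barplot with one where the choices (ordering, orientation, labelling) have been thought about; it is an illustration in the spirit of the text, not an example taken from it.

# Invented counts of vessel forms, drawn with defaults and then redrawn
# with explicit, considered choices.
counts <- c(Bowls = 23, Jars = 41, Flagons = 12, Beakers = 9)

barplot(counts)   # default settings: little thought required, and it shows

counts <- sort(counts, decreasing = TRUE)
barplot(counts,
        horiz = TRUE, las = 1,   # horizontal bars with readable labels
        col = "grey70", border = NA,
        xlab = "Number of vessels",
        main = "Vessel forms (invented data)")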
The book is intended as an introduction to the ‘classical’ methods of multivariate analysis that have been widely used in archaeometric data analysis for the last 40 years or so. The main methods covered are principal component, cluster and linear discriminant analysis, as well as exploratory methods that can be used before these are applied. The emphasis is on the ideas that underpin these methods, their implementation, and interpretation. Some of the relevant mathematics is included for completeness, but the intention is that the book can be read without needing to understand the mathematical detail provided. The intended readership is archaeological scientists, presumed to have some knowledge of basic statistics but not any great experience of multivariate analysis.
This is a book-length treatment on the subject of statistical methods frequently used in Quantitative Archaeology and their implementation using the open-source software R.  It is not intended as a textbook or as an introduction to either statistical methods or R, though it is intended to be accessible. There are plenty of good texts on these subjects to which this text might be seen as complementary.
There is an emphasis on the analysis of real data sets, more so than in typical introductory quantitative archaeology texts. There is, similarly, an emphasis not to be found in other archaeological texts on the practicalities of implementation using modern statistical software, R. This is an open-source and extremely powerful package that has rapidly become something of an ‘industry standard’ in applied statistics and other application areas, but one that has not yet made much visible impression on the archaeological literature.
The underlying premise – that statistical methods are conceptually simpler, and more easily implemented, than is often conceded – is spelled out in the introductory chapter. For those unfamiliar with R, and with data they need to analyse, one way of putting this is that there is life beyond Excel and SPSS well worth making the effort to discover.
The text is moderately lengthy and is also available in the form of individual chapters, along with the data sets used, in Excel format.
To emphasize, this is an exact reprint of the 1994 text, with a new introduction. The original text has been out of print for some time, but has been republished by Eliot Werner Publications/Percheron Press. The new introduction indicates how a rewritten text might be updated, and how the graphics would take advantage of modern software; as it stands, the four main techniques covered – principal component analysis, correspondence analysis, cluster analysis and discriminant analysis – are still widely used in quantitative archaeology much as they were at the time of original publication.
This is a substantial revision of earlier notes on cinemetric data analysis, which remain available. It is, effectively, now a 'book' (the earlier notes were incomplete) with preface, table of contents, index and so on.
Excel data files for "Notes on Quantitative Archaeology and R". The context, bibliographic details, etc. are in the "Notes".
This paper demonstrates that, at a third-century cemetery outside the auxiliary Roman fort at Brougham (England, NGR NY 5450029000), it was only adult males who had glass drinking vessels deposited with them as grave goods. By using significance tests, especially Fisher’s exact test, it can be shown that the pattern was unlikely to have arisen by chance. The deposition of glass cups in male graves was part of a much wider pattern within the cemetery, where artefact deposition both on the pyres and in the graves could be shown to be highly influenced by the age and sex of the deceased. The full consideration of this was published in Cool, H.E.M. 2004. The Roman Cemetery at Brougham, Cumbria: Excavations 1966-1967. Britannia Monograph 21 (London). The statistical methodology was further explored in Cool, H.E.M. and Baxter, M.J. 2006. 'Cemeteries and significance tests', Journal of Roman Archaeology 18, 397-404.
This is an unpublished paper from 1995. It was a comment on a paper in the Scottish Archaeological Review. It was submitted for publication but sadly the journal ceased to publish in the intervening period, so I did not have the satisfaction of having it rejected (which is what I expected). As I hope is evident, the paper is something of a ‘rant’. The article commented on is essentially theoretical but engaged in some data analysis, including one of the worst abuses of histograms I’d seen in the archaeological literature to that point. It reduced me to a mixture of anger and despair, and the paper explains why. I’d have hoped that matters would have improved over more than 20 years, and would not normally resurrect a paper like this, but, alas, not so. Ample evidence of much more recent abuses of histograms, bar charts and pie charts is provided in the more extensive, and possibly more balanced, account in Baxter and Cool (2016). It is interesting that some of the worst abuses are the product of younger scholars of theoretical bent who feel obliged to engage in some sort of data analysis they are not well-equipped for.

https://www.academia.edu/29415587/Basic_Statistical_Graphics_for_Archaeology_with_R_Life_Beyond_Excel

The other reason for revisiting this is, to slightly misquote from Wendy Cope’s poem ‘Making Cocoa for Kingsley Amis’:
‘I knew it wouldn't be much of a paper
But I love the title.’
It is shown that a simple ternary diagram can represent much of the morphological and temporal variation in the set of post-Medieval wine bottles under discussion. This is confirmed using more complex discriminant analysis methodology.

[Additional note: The paper was published nearly 30 years ago and was my first venture into publishing a statistical analysis of archaeological data. Apart from the purely personal, if it retains any interest it is as one of the first - if not the first - applications of Aitchison's (1986) approach to compositional data analysis to archaeological data. I was enthusiastic about this for some years but later developed a more nuanced view, explained in publications from the early 1990s on.]
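For readers unfamiliar with ternary diagrams of the kind used in the paper: a three-part composition closed to sum to 1 can be plotted in base R with a simple coordinate transformation. The sketch below uses invented proportions and hypothetical variable names, not the wine bottle data.

# Ternary diagram in base R: rows of 'comp' are three-part compositions.
comp <- matrix(runif(60), ncol = 3)
comp <- comp / rowSums(comp)      # close each row to sum to 1

x <- comp[, 2] + comp[, 3] / 2    # ternary -> Cartesian coordinates
y <- comp[, 3] * sqrt(3) / 2

plot(x, y, xlim = c(0, 1), ylim = c(0, sqrt(3) / 2), asp = 1,
     axes = FALSE, xlab = "", ylab = "", pch = 16)
polygon(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2))   # bounding triangle
text(c(0, 1, 0.5), c(-0.04, -0.04, sqrt(3) / 2 + 0.04),
     labels = c("base", "body", "neck"))      # hypothetical variables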
The paper is a product of what was a long-running investigation into the production and analysis of Roman colourless glass. The emphasis is archaeological and interpretative; though it draws on statistical ideas these are not the main focus. Caroline Jackson was the lead author.
This is a comparatively early British application of correspondence analysis (CA) to study an archaeological problem. Specifically, CA is used to investigate the relationship between assemblage compositions and building functions. It formed the basis of a 1993 conference presentation, but publication was delayed until 1995. By this stage a much fuller account had also appeared, including the archaeological background, in Cool et al., referenced in the text, which also appeared in 1995.
This paper appeared in the first edition of Internet Archaeology over 20 years ago. A revised version in letterpress is presented here for the first time. Regrettably the title is very similar to that of Baxter (2017), recently made available on academia.edu. The papers are, however, rather different, dealing with opposite ends of the temporal spectrum within which kernel density estimation has been used in archaeology. It is simplest to explain the reasons for and nature of the revisions in a newly added preface. The original paper brought together our earliest work on the use of kernel density estimation for archaeological data analysis. Some of this is now primarily of historical interest, but the paper also involves considerable data analysis, including ways of applying kernel density estimates that have not been emulated much, if at all, in archaeological applications. It is this aspect of the paper which might have some contemporary interest, and the reason for making it available in a differently accessible form.

Preface: This paper, as noted in the abstract, appeared in the first edition of Internet Archaeology in 1996; the original paper is freely available online at http://intarch.ac.uk/journal/issue1/. MB recently made a paper available on academia.edu that reviewed the history of the use of kernel density estimates (KDEs) in archaeology, with a particular emphasis on developments since 2000, when the methodology began to be much more widely used (Baxter, 2017). The present paper has a regrettably similar title to the 2017 paper, but they are rather different. The 2017 paper attempted to bring the story up to date; the present paper goes back to the beginning, so far as we are concerned.

The authors began collaboration on KDEs in the early 1990s. Conference presentations and publications apart, this was our first extended journal publication on the subject, and in an online format then quite novel. We think we can claim some credit for introducing KDEs into the archaeological literature, though they were not really taken up until the 2000s. Our collaboration lasted for the best part of 10 years before our career paths diverged. If Google Scholar is to be believed the Internet Archaeology paper has not been cited much; it is a slightly later paper in the Journal of Archaeological Science that has attracted more attention (Baxter et al., 1997). This paper has sometimes been credited with introducing the use of bivariate KDEs for spatial analysis, and such use is now quite common given the availability of the methodology in GIS systems (Baxter, 2017).

The interest shown in Baxter (2017) encouraged us to revisit the Internet Archaeology paper, which we had not done for some while. While parts of it are inevitably dated, we were surprised how innovative some of it still seemed, which is why we thought it might be worth making it available in an alternative and revised format. In fact we have tried to limit revisions and additions to a minimum except where they seemed necessary. The paper is of its date but we have, with an important exception to be noted, kept it as originally written. Transferring an online paper, which can 'jump around' a lot, to a linear narrative is not necessarily straightforward, but we wrote the original paper as a linear narrative so transferring it to the present format has not been an onerous task.
The main advantage of publishing online was that we were allowed far more text and figures than would be common in conventional journal publication, and this proved highly advantageous.
This paper is over 20 years old, so much of it is of purely historical interest, if any. It is an early account of developmental work undertaken by Christian Beardah and myself on the use of kernel density estimates (KDEs) in archaeology, Christian being very much the lead author.

At the time KDEs had hardly been used in archaeology and little in the way of accessible software was available, so Christian had to develop MATLAB routines to undertake the analysis. One of the purposes of the paper was to advertise these, but the software is now outdated and development has not been maintained. One reason is that software, such as the open-source system R, is now widely available for undertaking kernel density estimation, and this is what I now use. If the paper retains any interest it is because it illustrates some applications of KDEs - such as adaptive and boundary kernels - that have otherwise not been much used in the archaeological literature.
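For the record, the kind of functionality the MATLAB routines provided is now a few lines of R. The sketch below, with simulated data, shows a univariate KDE via density() and a bivariate KDE via MASS::kde2d (fixed-bandwidth only; the adaptive and boundary kernels mentioned above need additional code or packages).

# Univariate and bivariate KDEs in standard R, on simulated data.
library(MASS)

dims <- rnorm(200, mean = 10, sd = 2)        # e.g. artefact dimensions (simulated)
plot(density(dims), main = "Univariate KDE") # bandwidth chosen automatically

xy <- cbind(rnorm(200), rnorm(200))          # e.g. find-spot coordinates (simulated)
dens <- kde2d(xy[, 1], xy[, 2], n = 100)     # bivariate KDE on a 100 x 100 grid
image(dens)                                  # density surface as a heat map
contour(dens, add = TRUE)                    # with contours overlaid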
A simple, simulation-based model of temporal uncertainty is presented that embraces other approaches recently proposed in the literature, including those more usually involving mathematical calculation rather than simulation. More specifically, it is shown how the random generation of dates for events, conditioned by uncertain temporal knowledge of the true date, can be adapted to what has been called the chronological apportioning of artefact assemblages and aoristic analysis (as a temporal rather than spatio-temporal method). The methodology is in the same spirit – though there are differences – as that underpinning the use of summed radiocarbon dates. A possibly novel approach to representing temporal change is suggested. Ideas are illustrated using data extracted from a large corpus of late Iron Age and Roman brooches, where the focus of interest was on their temporal distribution over a period of about 450 years.
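The core of the simulation idea can be sketched briefly. Under the simplest assumption of a uniform distribution over each artefact's possible date range (the paper considers alternatives), random dates are generated, binned into periods, and the exercise repeated; averaging across simulations apportions the assemblage over the periods. The spans and periods below are invented.

# Simulated apportioning of artefacts with uncertain dates to 50-year periods.
set.seed(1)
start <- c(50, 75, 100, 150)      # earliest possible dates (AD) per artefact
end   <- c(150, 125, 250, 300)    # latest possible dates
breaks <- seq(0, 450, by = 50)    # period boundaries

nsim <- 1000
counts <- matrix(0, nsim, length(breaks) - 1)
for (i in seq_len(nsim)) {
  dates <- runif(length(start), start, end)   # one simulated date per artefact
  counts[i, ] <- hist(dates, breaks = breaks, plot = FALSE)$counts
}
apportioned <- colMeans(counts)   # average count per period across simulations
barplot(apportioned, names.arg = head(breaks, -1), xlab = "period start (AD)")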
This paper demonstrates that morphological bias can have a major impact on the types of Roman brooches recovered by metal-detecting. It is demonstrated that penannular brooches and one-piece brooches are under-represented in any assemblage which is based on this method of recovery. Roman brooches are one of the commonest finds reported to the Portable Antiquities Scheme whose data are increasingly being used in large-scale geo-referenced research. The bias highlighted has important implications for such research as it can be shown that there is a systematic under-representation of brooches from particular periods. Though the bias is demonstrated using Roman material, it is shown that items with similar morphologies dating to different periods are also under-represented.
Regional and temporal patterns in brooch use in Britannia are studied, confirming and challenging ‘received wisdoms’ about ‘regionality’. The complexity of the ‘Fibula Event Horizon’ is brought into sharp focus; a similarly complex and unexplained ‘Fibula Abandonment Horizon’ is also clearly demonstrated. Conclusions are insensitive to assumptions about use-life. Detailed analysis for the family of trumpet brooches casts light on hitherto unappreciated features of ‘regionality’. Comparison with continental data suggests the British temporal patterns may be reflecting a wider north-western province pattern. Understudied aspects of bias in metal-detected finds and their implications for studies of this kind are noted.
In archaeological applications involving the spatial clustering of two-dimensional spatial data, k-means cluster analysis has proved a popular method for over 40 years. There are alternatives to k-means analysis which can be thought of as either ‘competitive’ or ‘complementary’, such as k-medoids, model-based clustering, fuzzy clustering and density-based clustering, among many possibilities. Most of these have been little used in archaeology. That k-means has been a method of choice is probably because it is easily understood, is perceived as being geared to archaeological needs, and was rendered accessible at a time when computational resources were limited compared to what is now available. It is, in fact, a long-established approach to clustering that pre-dates archaeological interest in it by some years. The theses of the present paper are that
(a) other methods are available that potentially improve on what is possible with k-means;
(b) these are (mostly) as readily understood as k-means;
(c) they are now as easy to implement as k-means is; and
(d) they merit more attention than they have received from practitioners who find k-means useful.
The arguments are illustrated by extensive application to a data set that has been the subject of several previous studies.
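Two of the alternatives named, k-medoids and density-based clustering, are as easy to run in R as k-means itself, as the sketch below shows on simulated coordinates (the data set analysed in the paper is not reproduced here).

# k-means and two alternatives on simulated two-dimensional coordinates.
library(cluster)   # pam(): k-medoids
library(dbscan)    # dbscan(): density-based clustering

set.seed(1)
xy <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
            matrix(rnorm(100, 2, 0.3), ncol = 2))

km <- kmeans(xy, centers = 2)             # the long-established default
pm <- pam(xy, k = 2)                      # k-medoids: more robust to outliers
db <- dbscan(xy, eps = 0.5, minPts = 5)   # finds non-circular clusters; no k needed

table(km$cluster, pm$clustering)          # compare the two partitions
plot(xy, col = db$cluster + 1L)           # dbscan: 0 = 'noise' points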
The book 'Anglo-Saxon Graves and Grave Goods of the 6th and 7th Centuries AD: A Chronological Framework' (2013) is a major contribution to, and revision of, Anglo-Saxon chronology, the importance of which has already attracted considerable attention. The central Chapters 6 and 7 on the chronology of male and female furnished inhumations depend heavily on statistical and scientific methodology and reasoning that not all Anglo-Saxon scholars are necessarily equipped to follow.  This assessment is based on comment from such scholars, who have also commented that the book makes for very difficult reading. This paper, and its companion on the male graves, was conceived as a 'reader's guide' to the analysis that tries to separate out the more important aspects from those that the less statistically informed reader might wish to avoid.

The statistical methodology developed in the book is, in an archaeological context, innovative and more advanced than other comparable methodology I am familiar with. Its value extends beyond Anglo-Saxon studies and should be of interest to a much wider audience. This and the companion paper have taken a critical approach concerning the 'readability' of some of the text, but the methodological importance should not be obscured. This paper concerns Chapter 7 on the chronology of the female graves; a separate paper on Chapter 6, the male graves, has been written. The analysis of the female graves, the subject of this paper, posed problems of a different and - from a statistical perspective - more interesting nature.
The book 'Anglo-Saxon Graves and Grave Goods of the 6th and 7th Centuries AD: A Chronological Framework' (2013) is a major contribution to, and revision of, Anglo-Saxon chronology, the importance of which has already attracted considerable attention. The central Chapters 6 and 7 on the chronology of male and female furnished inhumations depend heavily on statistical and scientific methodology and reasoning that not all Anglo-Saxon scholars are necessarily equipped to follow.  This assessment is based on comment from such scholars, who have also commented that the book makes for very difficult reading. This paper, and its companion on the female graves, was conceived as a 'reader's guide' to the analysis that tries to separate out the more important aspects from those that the less statistically informed reader might wish to avoid.

The statistical methodology developed in the book is, in an archaeological context, innovative and more advanced than other comparable methodology I am familiar with. The value of this extends beyond Anglo-Saxon studies and should be of interest to a much wider audience. This and the companion paper have taken a critical approach concerning the 'readability' of some of the text, but the methodological importance should not be obscured. This paper concerns Chapter 6 on the chronology of the male graves. A separate paper on Chapter 7, the female graves, has been written because of the different, and interesting, statistical and interpretive challenges it posed.
The paper explores the use of correspondence analysis (CA) for comparing vessel glass assemblages, though it has more general application. A prior requirement is that assemblages be quantified so that comparisons are valid, and a means of doing this is described. The 'peeling' part of the title refers to the fact that an initial CA will often reveal the more obvious structure in the data - for example, clear groups and/or outliers. If these are 'peeled' from the data (e.g. outliers removed, and groups treated separately), then CAs of the subsets so defined can reveal more subtle and archaeologically interesting patterns. Further 'peeling' and analysis can be undertaken if needed; that is, an iterative and exploratory approach to such analysis is advocated. A number of Romano-British assemblages are used for illustration.
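The 'peeling' strategy is easy to emulate with the ca package; the sketch below uses an invented assemblage-by-type table with one deliberately outlying assemblage, not the Romano-British data.

# An initial CA, then a second CA after 'peeling' an obvious outlier.
library(ca)

set.seed(1)
tab <- matrix(rpois(60, 10), nrow = 10,
              dimnames = list(paste0("site", 1:10), paste0("type", 1:6)))
tab[1, ] <- c(60, 1, 1, 1, 1, 1)   # make site1 an obvious outlier

fit1 <- ca(tab)
plot(fit1)                         # site1 dominates the first axis

fit2 <- ca(tab[-1, ])              # 'peel' the outlier and re-run
plot(fit2)                         # subtler structure among the remaining sites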
In archaeological studies involving cemetery analysis based on the content of burials, it can be of interest to examine the data in the form of two-by-two cross-classifications; for example, male/female against presence/absence of some artefact type. The paper provides several examples. The obvious question of interest is whether or not there is an association between the categories involved in the cross-classification. A first thought is to apply a statistical significance test of the null hypothesis of no association, with the chi-squared test coming to mind. Not all archaeologists think like this, however, and the paper was prompted by an article that would have benefited, but didn't, from such an approach. Problems arise if the data set is small, in which case Fisher's exact test is available. This is not widely used in archaeology, and we explain the mechanics of it and of the chi-squared test, with illustrations, in some detail. Our most interesting example concerns the incidence of diffuse idiopathic skeletal hyperostosis (DISH) in male and female burials at Poundbury (Dorset). A quite complicated analysis suggests that males were more prone to DISH than modern reference populations, opening up interesting questions as to why this might be so.
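The mechanics are a one-liner in R. The two-by-two table below is invented, of the general kind described (sex against presence/absence of an artefact type); fisher.test() gives the exact p-value that is preferable when expected counts are small.

# Chi-squared and Fisher's exact tests on an invented two-by-two table.
tab <- matrix(c(9, 1,     # male:   artefact present, absent
                2, 8),    # female: artefact present, absent
              nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("male", "female"),
                              artefact = c("present", "absent")))

chisq.test(tab)    # may warn that expected counts are small
fisher.test(tab)   # exact test, valid for small samples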
The paper is concerned with the statistical analysis, using dimensional data, of a large sample of loomweights found in excavations at Insula VI.1, Pompeii. An unexpected finding was the bi-modality of the distribution of weights, associated with aspects of shape. We are not aware that such phenomena have previously been commented on, probably because of the small sample sizes typically available. Interpretation currently has to be speculative, but we wonder if the bimodality might reflect temporal differences.
An approach to testing for modes in low-dimensional data, Silverman’s test, novel in an archaeological setting, is described and illustrated. ‘Patterns’ in archaeological data can be suggested by the presence of modes. Reassurance is needed that modes suggested by graphical analysis are genuine before attempting substantive archaeological interpretation. The test either provides such reassurance, or else guards against over-interpretation, particularly with small samples. Data on loomweight dimensions, lead isotope ratios, and ceramic compositions are used to illustrate use of the test, dealing with issues concerning outliers and small samples as they arise. The focus is on univariate mode detection.
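A from-scratch sketch of Silverman's test of unimodality is given below; it omits the variance rescaling of Silverman's original proposal, and the published analyses used more careful implementations. The idea: find the smallest bandwidth at which the KDE is unimodal, then bootstrap from the correspondingly smoothed distribution.

# Silverman's bootstrap test of one mode against more than one: a sketch.
nmodes <- function(x, h) {
  d <- density(x, bw = h)$y
  sum(diff(sign(diff(d))) == -2)   # count local maxima of the KDE
}
# critical bandwidth: smallest h at which the KDE has exactly one mode
hcrit <- function(x)
  uniroot(function(h) nmodes(x, h) - 1.5,
          interval = c(1e-3 * sd(x), 10 * sd(x)))$root

silverman <- function(x, B = 200) {
  h0 <- hcrit(x)
  n <- length(x)
  more <- replicate(B, {
    xb <- sample(x, n, replace = TRUE) + rnorm(n, 0, h0)  # smoothed bootstrap
    nmodes(xb, h0) > 1
  })
  mean(more)   # bootstrap p-value: small values argue against unimodality
}

x <- c(rnorm(50, 0), rnorm(50, 4))   # simulated, clearly bimodal data
silverman(x)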
Pompeii and statistics? Iron Age tombs and correspondence analysis? It comes as a surprise to many that there are any applications at all of statistics and mathematics in archaeology. But as Mike Baxter explains, counting what you find can give new insights into the past.
This paper is a modified version of part of the introduction to my 2003 book 'Statistics in Archaeology' that dealt with what I considered to be 'landmark' papers in the development of quantitative archaeology up to about 1990. The original text has largely been retained, but edited so that it reads coherently (I hope) as a paper in its own right. No changes have been made to the original selection, nor have new papers been added, but some of the entries have been expanded to include my later thoughts, with reference to some later publications that have interested me. The section in the 2003 book on the journal and book literature has been retained as it was, but with some additions.
This is the record of a conference presentation at the Computer Applications in Archaeology conference of 1999, not published, for various reasons, until 2004. The main interest lies, if anywhere, in the clustering of kernel density estimates applied to art-historical data derived via scientific methodology.
Three approaches that have been used to investigate assemblage diversity in the archaeological literature, two established and one new, are studied, with a particular emphasis on assemblage richness. It is argued that the established regression and simulation approaches, as often used, are only strictly valid if they assume what they are supposed to test - namely that assemblages are sampled from populations with the same richness or structure. Rarefaction methodology provides an alternative to the simulation approach and suggests that, even if the latter is used, sampling without rather than with replacement is preferable. Some potential limitations of a recently proposed approach using jackknife methods are noted, and it is suggested that bootstrapping may be a more natural resampling method to use.
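Rarefaction by resampling without replacement is straightforward; the sketch below estimates the expected richness of a smaller assemblage drawn from a larger one, with invented counts (the analytic equivalent is vegan::rarefy).

# Expected richness at a reduced sample size, by resampling without replacement.
counts <- c(40, 25, 12, 8, 5, 3, 2, 1, 1, 1)   # invented: ten types, 98 items
items <- rep(seq_along(counts), counts)        # one entry per individual item

rarefy_once <- function(k) length(unique(sample(items, k)))  # sample() is without replacement
richness <- replicate(1000, rarefy_once(30))
mean(richness)   # expected number of types in a sub-assemblage of 30 items

# vegan::rarefy(counts, 30) gives the analytic (hypergeometric) answer.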
K-means spatial clustering, or pure locational clustering, has been a popular tool for spatial data analysis in archaeology since its introduction by Kintigh and Ammerman in 1982. Among its acknowledged limitations is the problem of choosing an appropriate level of clustering and, more seriously, the fact that the method tends to produce circular clusters of equal size, regardless of the true underlying structure. This paper draws on recent developments in the statistical literature to show how these problems can be overcome, and illustrates the methodology on both simulated and real data. What emerges is presented not as an alternative to k-means clustering as usually practiced, but as a method in the same spirit that overcomes some of its limitations.
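The model-based route referred to can be sketched with the mclust package: Gaussian mixtures allow elliptical clusters of unequal size and shape, and BIC offers a criterion for the number of clusters. The simulated coordinates below, one elongated and one small round cluster, illustrate exactly the structures that defeat k-means; this is an illustration, not the paper's analysis.

# Model-based clustering with mclust on simulated spatial coordinates.
library(mclust)

set.seed(1)
xy <- rbind(cbind(rnorm(80, 0, 1.5), rnorm(80, 0, 0.3)),   # elongated cluster
            cbind(rnorm(40, 4, 0.3), rnorm(40, 2, 0.3)))   # small round cluster

fit <- Mclust(xy)   # covariance model and number of clusters chosen by BIC
summary(fit)
plot(fit, what = "classification")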
This short account of neural networks (NNs) in archaeology was originally to be included in my 2003 book 'Statistics in Archaeology' but had to be omitted for reasons of space. At the time I wrote it the neural network literature in archaeology was widely scattered and, by common consent, reasonably limited. I attempted to locate, to about 2002, as many references as I could. The essay comments on some of these, and my general impression then was that NNs still had to 'prove themselves' as a tool for archaeological analysis superior to simpler alternatives. Things may have changed, but NNs are not my field and I haven't read most of the later papers on the subject in detail. With minor changes and additions the text is as I originally wrote it. What I have done is try to locate papers using neural networks that post-date 2002, and include these in the bibliography (such papers are mostly not referenced in the text). The bibliography is now double the length it was in 2003 and makes no claim to be exhaustive. This paper might best be thought of as a bibliographical resource.
This is an updated and rewritten version of a section originally intended for inclusion in Baxter (2003) which had to be omitted for reasons of space. The topic is perhaps a little esoteric, but interesting in that it shows how statistical modelling techniques, applied to a particular problem, can develop in complexity over time. We are very much in the realm of modelling here, rather than the more exploratory forms of statistical data analysis which have, perhaps, had a more dominant role in the archaeological literature. Most of the work on this problem took place from the early 1980s through to the early 2000s and I have not located much since. Updating the paper did not involve a great deal of research; I could only locate one 2004 article that post-dated what I originally wrote, and Google Scholar suggests that no subsequent archaeological publication has drawn on it. The earliest models proposed were straightforward and tractable; they gradually grew in complexity to encompass models that required rather complex Bayesian methodology to estimate them. In essence, it became a computational statistical playground for exploring new and complex statistical methodology beyond, I suspect, the understanding of most archaeologists.
This paper presents a synthesis of current approaches to the comparison of archaeological assemblages. It draws its data from Roman Britain but the methodology discussed is equally applicable to other periods and places. Different types of assemblages, including those of small finds, broken vessels and animal bones, are discussed, and the problems relating to quantification are considered. The different sorts of questions that may be asked of data of varying quality are examined, and it is shown that even 'poor quality' data can provide useful insights into past societies. It is argued that to explore the full richness of the data available, multivariate statistics are an invaluable tool, and this is illustrated by exploration of two groups of assemblages using correspondence analysis. Finally, attitudes within the archaeological community which may prove a barrier to further advances are examined.
Byrd (1997) proposed a non-linear regression model for investigating the relationship between assemblage richness and sample size that has clear advantages compared to the log-linear regression model often used. The method used to estimate the parameters of this model is, however, unsatisfactory, resulting in estimates of population richness notably lower than the maximum assemblage richness observed, and producing visually poor fits to some of the data sets used. Baxter (2001) noted some problems with the method but did not provide the technical details, which this note provides. An improved approach to estimation is described and the model used is generalized. Application to the same data sets used by Byrd produces unequivocally superior results that are sufficiently different to call into question some aspects of the archaeological inferences to be drawn from the analysis.
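Neither Byrd's model nor the improved estimation method is reproduced here, but the general approach - fitting a non-linear richness/sample-size curve whose asymptote estimates population richness - can be sketched with nls(). The asymptotic form and the data below are assumptions for illustration only.

# Fitting an (assumed) asymptotic richness curve S = a*n/(b + n) with nls().
n <- c(20, 50, 100, 200, 400, 800)   # sample sizes (invented)
S <- c(8, 14, 19, 24, 27, 29)        # observed richness (invented)

fit <- nls(S ~ a * n / (b + n), start = list(a = 35, b = 100))
coef(fit)["a"]                       # the asymptote: estimated population richness
plot(n, S)
lines(n, fitted(fit))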
The paper formed the basis of a 2000 conference publication. At the time the co-author, Simon Westwood, and I were exploring the potential of projection pursuit (PP) methodology as an alternative to the usual use of principal component analysis (PCA) for exploring multivariate archaeometric data. For various reasons, discussed in the text, we did not pursue the research in any depth. Chief among these was that I did not think PP had much to offer, in practice, over PCA and, at the time, it was not suited to routine application.
This paper was presented at the 2003 Compositional Data Analysis workshop, held in Girona, Spain. It explores a variety of approaches to the multivariate analysis of glass compositional data. The emphasis is on different approaches to data transformation, including problems posed by zero values when log-transformation is involved.

The workshop was designed to promote the use of log-ratio analysis as expounded at length in Aitchison (1986). While some of our conclusions were equivocal we did suggest that log-ratio analysis did not always work well and that methodology that proponents of log-ratio analysis would regard as 'incorrect' often produced more satisfactory results. Given what I can only describe as the intensive 'evangelical' attitude of such proponents in favour of their preferred methodology the message of our paper, and other similar ones, did not go down well (to put it mildly). A later, accessible and, I hope, balanced review of the issues involved is provided in Baxter and Freestone (2006), Archaeometry 48, 511-531.
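The zero problem mentioned above arises because log-ratio transformations are undefined at zero. A minimal sketch of one common (and not uncontroversial) fix - simple replacement of zeros by a small value before a centred log-ratio (clr) transform - is given below with invented compositions.

# Centred log-ratio transform with a crude zero replacement.
comp <- matrix(c(70, 15, 10, 5,
                 68, 20,  0, 12,
                 72, 14,  9, 5),
               nrow = 3, byrow = TRUE)
comp <- comp / rowSums(comp)            # close each row to sum to 1

comp[comp == 0] <- 0.005                # replace zeros by a small value
comp <- comp / rowSums(comp)            # re-close

clr <- log(comp) - rowMeans(log(comp))  # centred log-ratio transform
pca <- prcomp(clr)                      # 'log-ratio analysis' via PCA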
The paper is a relatively early exploration of issues involved in detecting unusual data in large archaeometric data sets.
[A preliminary and additional word of explanation may be in order here. It's the first paper I published on an archaeometric topic and obviously out-of-date, particularly on the computational front. I was, in a sense, 'selling' a methodology. I realised fairly rapidly that the data set I'd used for illustration had such obvious structure that any method would recover it. I also came to revise my views about the methodology; papers dealing with the evolution of my thought are available on academia.edu, of which the most thorough is the 2006 Archaeometry paper by Baxter and Freestone. For the reasons alluded to above I've not made the paper more widely available, but do get requests to do so, so for what it's worth here it is. Everything that follows is as originally published.]

Several recent articles have published and analysed information on the chemical composition of glass from a range of archaeological contexts. A common theme of most of these papers has been to classify glasses on the basis of their chemical composition. Subsequently a successful classification may be related to temporal or regional differences between the groups identified. Apart from descriptive and graphical appraisal of the data, the statistical methods most used are methods of multivariate analysis, chiefly cluster analysis (CA) or principal component analysis (PCA). These methods are applied either to the raw compositional data or to such data standardised to have similar variability. Recent research by Aitchison (1982, 1983, 1984, 1986) has criticised this approach and has developed an alternative methodology for compositional data analysis. The purpose of this paper is to describe and evaluate this methodology as applied to the classification of glass data. A brief review of the usual approaches, Aitchison's critique and computational considerations is given in the next three sections. The remainder of the paper applies and discusses the approaches. For illustrative purposes the data of Cox and Gillies (1986), on analyses of blue soda glass from York Minster and other sources, are used. The data are reproduced, in reordered form, in Appendix A, with Cox and Gillies' groups 1, 2 and 3 indicated. PCA and CA methods are described in many texts on multivariate analysis. Applications of PCA to glass data include Christie et al. (1979) and Sharaf et al. (1986); applications of CA include Christie et al. (1979), Sharaf et al. (1986), Kuleff et al. (1985) and Cox and Gillies (1986); simpler graphical approaches with similar aims are given in Henderson and Warren (1981) and Tite (1987). The following brief description of PCA and CA summarises their salient features for the purposes of this paper. Assume p commensurable variables are measured on each of n objects and that the distance between objects i and k, d_ik say, is given by …
At CoDaWork'03 we presented work on the analysis of archaeological glass compositional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximate completely compositional data if the main component, silica, is included. We suggested that what has been termed 'crude' principal component analysis (PCA) of standardized data often identified interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The fundamental problem is that, in LRA, minor oxides with high relative variation, which may not be structure-carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub-compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a 'recipe' consisting of two 'ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A 'crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of different recipes being used at different periods, reflected in absolute differences in the composition. LRA can be undertaken either by normalizing the data or defining a 'residual'. In either case, after some 'tuning', these groups are recovered. The results from the normalized LRA are differently interpreted as showing that the source of sand used to make the glass differed. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute differences can also be informative.
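The comparison at the heart of the paper - 'crude' PCA of standardized compositions against PCA after log-ratio transformation - is easily set up in R. The sketch below uses two simulated 'recipe' groups rather than the Israeli glass data.

# 'Crude' PCA versus log-ratio analysis (LRA) on simulated compositions.
set.seed(1)
g1 <- cbind(rnorm(25, 70, 1), rnorm(25, 15, 1), rnorm(25, 10, 1), rnorm(25, 5, 0.5))
g2 <- cbind(rnorm(25, 65, 1), rnorm(25, 18, 1), rnorm(25, 12, 1), rnorm(25, 5, 0.5))
comp <- rbind(g1, g2)
comp <- comp / rowSums(comp)             # close to unit sum

crude <- prcomp(comp, scale. = TRUE)     # PCA of standardized data
clr <- log(comp) - rowMeans(log(comp))   # centred log-ratio transform
lra <- prcomp(clr)                       # log-ratio analysis

grp <- rep(1:2, each = 25)
par(mfrow = c(1, 2))
plot(crude$x[, 1:2], col = grp, main = "'Crude' PCA")
plot(lra$x[, 1:2], col = grp, main = "LRA")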
Recent statistical work on approaches to analysing compositional data - where variables sum to a constant for each row of a data matrix - may encounter difficulties when applied to data of the kind typically arising in scientific archaeology. The reason is that results obtained may be unsatisfactory from a substantive viewpoint for identifiable technical reasons. This paper explores and illustrates some possible resolutions of the problem. A feature of the approach used is to analyse subsets of the variables on separate scales. A synthesis of the results obtained from separate analyses is essential, and the use of multiple correspondence analysis for this purpose is illustrated.
Principal component analysis (PCA) is a useful exploratory technique for investigating structure in multi-elemental analyses of archaeological artefacts. If an exhaustive chemical analysis is available so that the data are 'compositional' in the sense of Aitchison (1986) (i.e. the compositional sum is 100%) then PCA of the correlation matrix (PCAC), as often used, may be inappropriate. Baxter (1989) drew attention to Aitchison's approach (PCAA) to these kinds of data. Despite the theoretical attractions of PCAA there are, as Stone (1987) anticipated, potential 'subject-based' objections to its use. These arise when some of the elements of a composition have a low presence, as may be typical for many kinds of archaeological analyses. A main aim of this paper is to undertake an empirical comparison of PCAC and PCAA on a range of analyses of glass data sets having different kinds of structure. In essence the problem is one of determining an appropriate transformation of the raw data prior to statistical analysis. The increasingly popular technique of correspondence analysis provides an alternative exploratory approach and has been advocated by Underhill and Peisach (1985). This method produces a graphical representation of the rows and columns of a data matrix of non-negative numbers. The row representation, which is of primary concern here, can be obtained by a PCA of suitably transformed data. The use of two forms of correspondence analysis, regarded as PCA of transformed data, is explored here in addition to PCAC and PCAA. The similarity of PCAC and correspondence analysis of the raw data (CAR) under certain conditions is noted. These conditions seem to apply, in practice, for data sets with compositions based on 10-12 elements/oxides. Correspondence analysis after dividing values by their element mean (CAM) - the approach used by Underhill and Peisach (1985) - gives results very similar to PCAA. The mathematical reasons for this similarity are discussed in Baxter et al. (1990). It is believed that the observations in the foregoing paragraph, arising from this empirical study, are new. A practical consequence is that CAM, but not PCAA, can be used with zeroes in the data and thus provides an alternative approach when PCAA is not directly applicable. It will be assumed that n objects have their chemical composition measured with respect to p oxides/elements, so that x_ij is the proportion of the i'th object accounted for by the j'th oxide/element. It is further assumed, for the purposes of exposition, that the sum for each …
Samples from ore bodies, mined for copper in antiquity, can be characterized by measurements on three lead isotope ratios. Given sufficient samples, it is possible to estimate the lead isotope field - a three-dimensional construct - that characterizes the ore body. For the purposes of estimating the extent of a field, or assessing whether bronze artefacts could have been made using copper from a particular field, it is often assumed that fields have a trivariate normal distribution. Using recently published data, for which the sample sizes are larger than usual, this paper casts doubt on this assumption. A variety of tests of univariate normality are applied, both to the original lead isotope ratios and to transformations of them based on principal component analysis; the paper can be read as a case study in the use of tests of univariate normality for assessing multivariate normality. This is not an optimal approach, but is sufficient in the cases considered to suggest that fields are, in fact, 'non-normal'. A direct test of multivariate normality confirms this. Some implications for the use of lead isotope ratio data in archaeology are discussed.
The statistical analysis of lead isotope ratio data in archaeology has attracted considerable controversy, but one area of consensus seems to be that a minimum sample size of 20 is adequate for the satisfactory characterisation of a lead isotope field. The argument in the present paper is that this is too small. Twenty would be satisfactory if the assumption of normality sometimes used in analysing lead isotope data were correct, but it is inadequate for checking this assumption or detecting non-normal structures within a field. Evidence based on both real and simulated data suggests that 40 may be a more realistic minimum, and even this is not always adequate. The consequences of incorrectly assuming normality, and alternative methods of analysis that do not involve this assumption, are investigated.
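The sample-size point lends itself to a small simulation. The sketch below is my construction rather than the paper's code: samples of size 20 and 40 are drawn from a two-component normal mixture, a crude stand-in for a structured field, and the proportion of times a Shapiro-Wilk test detects the non-normality is estimated at each size:

    set.seed(3)
    detect <- function(n, reps = 2000) {
      mean(replicate(reps, {
        x <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 3, 1))  # two 'components'
        shapiro.test(x)$p.value < 0.05
      }))
    }
    detect(20)   # detection rate at n = 20
    detect(40)   # typically appreciably higher at n = 40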
NOTE (2015): The debate to which this paper contributed, on the appropriate statistical treatment of lead-isotope data, is probably no longer of other than 'historical' interest. The statistical methodology used may have an interest independently of the context in which it was applied here.

                                  *******************************

There has been recent and extensive debate about the analysis and interpretation of lead isotope ratio data in archaeology. This paper addresses the specific technical issue of whether data arising from lead isotope fields can be reasonably modelled by trivariate normal distributions. This assumption underpins much of the statistical analysis of such data. It is argued that the univariate and coordinate dependent approaches that have been used to test for normality need to be complemented with truly multivariate tests. Several such tests are described and applied to seven recently published data sets. The results suggest that non-normality may be the rule rather than the exception. Some of the consequences of this are discussed.
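One example of a truly multivariate test of the kind referred to is Mardia's test, based on multivariate measures of skewness and kurtosis. A base-R sketch follows, applied to simulated trivariate normal data in place of the published data sets; packaged versions are available (e.g., in the MVN package):

    # Mardia's multivariate skewness and kurtosis tests
    mardia <- function(X) {
      n <- nrow(X); p <- ncol(X)
      Xc <- scale(X, center = TRUE, scale = FALSE)
      S  <- cov(X) * (n - 1) / n            # maximum-likelihood covariance
      D  <- Xc %*% solve(S) %*% t(Xc)       # matrix of Mahalanobis cross-products
      b1 <- mean(D^3)                       # multivariate skewness
      b2 <- mean(diag(D)^2)                 # multivariate kurtosis
      skew.p <- pchisq(n * b1 / 6, p * (p + 1) * (p + 2) / 6, lower.tail = FALSE)
      kurt.z <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
      c(skew.p = skew.p, kurt.p = 2 * pnorm(abs(kurt.z), lower.tail = FALSE))
    }

    set.seed(4)
    mardia(matrix(rnorm(50 * 3), 50, 3))   # trivariate normal: large p-values expected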
The paper discusses different approaches to data transformation when using principal component analysis. It was written a long time ago, but the material on the use of rank-transformations might be of interest. See the paper itself for a proper summary.
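For the curious, a rank-transformation prior to PCA is a one-step operation in R; a minimal sketch with stand-in data:

    set.seed(5)
    X  <- matrix(rexp(50 * 4), 50, 4)     # stand-in data with skewed variables
    Xr <- apply(X, 2, rank)               # replace each variable by its ranks
    summary(prcomp(Xr, scale. = TRUE))    # PCA of the rank-transformed data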
A statistical re-analysis is undertaken of 118 inductively coupled plasma spectrometry analyses of Romano-British glass specimens found in excavations at Colchester. There are four vessel types present, some of which are associated with chronologically distinct periods. Previous research has suggested little difference in the mean composition of different types. The present paper shows that there are interesting differences in the variation of compositions within types, with some showing much greater compositional stability than others. Some possible models to explain this phenomenon are discussed.
The scientific analysis of ceramics often has the aim of identifying groups of similar artefacts. Much published work focuses on analysis of data derived from geochemical or mineralogical techniques. The former is more likely to be subjected to quantitative statistical analysis. This paper examines some approaches to the statistical analysis of data arising from both kinds of techniques, including ‘mixed-mode’ methods where both types of data are incorporated into analysis. The approaches are illustrated using data derived from 88 Late Bronze Age transport jars from Kommos, Crete. Results suggest that the mixed-mode approach can provide additional insight into the data.
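One standard way of implementing a mixed-mode analysis, offered only as a hedged illustration (the paper's own method may differ), is to compute Gower dissimilarities over combined numeric (chemical) and categorical (mineralogical) variables and then cluster; all variable names below are invented:

    library(cluster)
    set.seed(6)
    dat <- data.frame(SiO2   = rnorm(30, 60, 3),   # chemical (numeric) variables
                      CaO    = rnorm(30, 8, 2),
                      fabric = factor(sample(c("A", "B", "C"), 30, TRUE)),  # mineralogical
                      temper = factor(sample(c("fine", "coarse"), 30, TRUE)))

    d  <- daisy(dat, metric = "gower")    # dissimilarities for mixed-type data
    hc <- hclust(d, method = "average")
    plot(hc)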
In a recently published study of Romano-British colourless glass compositions, using inductively coupled plasma spectroscopy, 28 glasses from Colchester sampled in a previous study were resampled. This was done deliberately, with a view to examining the repeatability of results from sampling on different occasions. We report on our results here, developing in the process some simple statistical methodology that could be applied in similar situations. The potential for combining analyses undertaken at different times is discussed and illustrated.
Compositional data arise commonly in archaeometry, in the study of artefact compositions where the variables measured either sum to 100%, or can be viewed as a subset of such a set of variables. There has been debate in Archaeometry about the appropriate way to analyse such data statistically, which amounts to argument about how the data should be transformed prior to statistical analysis. This paper reviews aspects of the debate and illustrates, using both simulated and real data, that what has been proposed as the ‘correct’ theoretical approach—log-ratio analysis—does not always work well. The reasons for this are discussed.
Principal component, cluster and discriminant analysis are multivariate statistical methods that are widely used in archaeometry. They are examples of what are known in some literatures as unsupervised and supervised learning methods. Over the past 20 years or so, a wide variety of other learning methods have been developed that take advantage of modern computing power and, in some cases, have been designed to handle data sets more complex than those often used in archaeometric data analysis. To date, these methods have had little impact on archaeometry. This paper reviews, in a largely non-technical manner, the ideas behind these newer methods; illustrates their use on a variety of data sets; and attempts to assess their potential for future archaeometric use.
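As a taste of the newer supervised methods of the kind reviewed, a random forest applied to hypothetical grouped compositional data; the choice of method, package and data here is mine, for illustration only:

    library(randomForest)
    set.seed(7)
    X <- data.frame(matrix(rnorm(90 * 5), 90, 5))
    X$group <- factor(rep(1:3, each = 30))
    X[X$group == 2, 1] <- X[X$group == 2, 1] + 2   # build in some group structure
    X[X$group == 3, 2] <- X[X$group == 3, 2] + 2

    fit <- randomForest(group ~ ., data = X, importance = TRUE)
    fit                  # out-of-bag error estimate and confusion matrix
    importance(fit)      # variable importance measures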
Late Antique coarse cooking wares and painted fine wares found at Herdonia (second half of the fourth century to mid-fifth century AD) and Canusium (late sixth century to early seventh century AD) have been chemically and mineralogically characterized. A total of 74 samples (40 of coarse ware and 34 of fine painted ware) was investigated through optical microscopy, scanning electron microscopy, X-ray powder diffraction, inductively coupled plasma optical emission spectrometry, inductively coupled plasma mass spectrometry, neutron activation analysis and X-ray fluorescence. A new statistical method, namely the classification tree methodology, was used for the treatment of geochemical data. The characterization of the Herdonia and Canusium assemblages was combined with a review of earlier results obtained for San Giusto and Posta Crusta, in order to gain insight into Late Antique ceramic trades in northern Apulia. It appears possible to reconstruct a production pattern organized at multiple production sites, both rural and urban, that exploited similar raw material deposits, specialized in certain productions, and commercialized products at different geographical scales. Imports from outside northern Apulia may be identified for coarse wares. A likely area of production is difficult to establish; however, the northern Adriatic coast and the area of Greece may be suggested.
This is a modified version (as of 2014) of a paper originally published in 2004. It was based on, and partly formed the basis for, two talks given at a summer school in Varenna, Italy in 2003, aimed at younger scholars for whom a knowledge of this kind of statistical methodology might prove useful. As such, no originality is claimed for the material presented -- I'd written on much of this before and have expanded on some of the ideas in later work. Some effort was, however, made to focus on the idea of `distance', and the published version is more `mathematical' than the presentations were. Most of the original text has been retained, but some additional later references relevant to the original discussion have been added.
In artefact compositional studies, the selection of variables to use in an analysis is unavoidable. Given this ubiquity, surprisingly little attention has been paid to ways in which variables might be selected. After arguing the case for the importance of variable selection, two systematic approaches to making a choice, which have had little or no application in archaeometry, are discussed and illustrated. One, based on the use of principal components, is appropriate if the structure is not known. The other, based on the use of classification trees, is appropriate when there are known or assumed groups in the data.
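Both approaches can be sketched in R with hypothetical data (the element names and group labels below are invented). Principal component loadings suggest which variables carry the structure when groups are unknown; classification tree variable importance, via the rpart package, serves when groups are known or assumed:

    set.seed(8)
    X <- matrix(rnorm(90 * 6), 90, 6)
    colnames(X) <- paste0("elem", 1:6)
    groups <- factor(rep(1:3, each = 30))
    X[groups == 2, 1] <- X[groups == 2, 1] + 2     # give two variables some
    X[groups == 3, 2] <- X[groups == 3, 2] + 2     # group-related structure

    # Approach 1 (structure unknown): variables with small loadings on the
    # leading components are candidates for omission
    round(prcomp(X, scale. = TRUE)$rotation[, 1:3], 2)

    # Approach 2 (groups known or assumed): tree-based variable importance
    library(rpart)
    fit <- rpart(groups ~ ., data = data.frame(X, groups))
    fit$variable.importance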
Cluster analysis is the most widely used multivariate technique in archaeology, with the majority of applications being exploratory in nature. Model-based methods of clustering have their advocates, but have had little application to archaeometric data. The paper investigates two such methods. They have potential advantages over exploratory techniques, if successful. Mixture maximum-likelihood worked well using low-dimensional lead isotope data, but had problems coping with higher-dimensional ceramic compositional data. For our most challenging example, classification maximum-likelihood performed comparably with more standard methods, but we find no evidence to suggest it should supplant these.
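For readers wanting to experiment, the mixture maximum-likelihood approach is conveniently available in the R package mclust. The sketch below uses simulated stand-in data in place of lead isotope measurements, fitting normal mixtures over a range of component numbers and selecting by BIC:

    library(mclust)
    set.seed(9)
    X <- rbind(matrix(rnorm(60 * 3), 60, 3),           # two artificial groups
               matrix(rnorm(60 * 3, mean = 3), 60, 3)) # in three dimensions
    fit <- Mclust(X, G = 1:5)        # fit mixtures with 1 to 5 components
    summary(fit)                     # chosen model and classification
    plot(fit, what = "classification")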
The approach to the analysis of compositional data involving log-ratio transformation of the data has not been generally adopted by researchers wishing to analyse such data. In the context of exploratory methods of multivariate analysis, such as principal components analysis, where the hope is to identify (cluster) structure in the data, this may be because traditional methods can produce more interpretable results than the log-ratio approach. After illustrating this with an example, circumstances under which the log-ratio approach performs poorly when traditional approaches work well are identified. Log-ratio analysis can be dominated by variables having low absolute presence and high relative variation that do not contribute to, and can obscure, structure in the data. Traditional methods can detect certain kinds of structure in the data that correspond to structure on a ratio scale, after a suitable redefinition of the composition. Since traditional methods often detect such structure more directly than log-ratio analysis it can be concluded that claims that the traditional analysis is “inappropriate” or “meaningless” are exaggerated. This conclusion is based on empirical experience rather than theoretical concerns. The arguments are illustrated using compositional data for alkaline glasses, but have more general application.
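The mechanism is easy to reproduce. In the sketch below, my illustration rather than the paper's data, two groups differ in their major components, while a trace component with low absolute presence and high relative variation carries no group information; PCA of the standardized data recovers the groups, whereas the first component of the clr-based analysis is dominated by the noisy trace variable:

    set.seed(10)
    n <- 50
    major <- rbind(cbind(rnorm(n, 60, 2), rnorm(n, 30, 2)),   # group 1
                   cbind(rnorm(n, 50, 2), rnorm(n, 40, 2)))   # group 2
    trace <- exp(rnorm(2 * n, log(0.05), 1))  # low presence, high relative variation
    X <- 100 * cbind(major, trace) / rowSums(cbind(major, trace))

    grp <- rep(1:2, each = n)
    clr <- log(X) - rowMeans(log(X))

    par(mfrow = c(1, 2))
    plot(prcomp(X, scale. = TRUE)$x[, 1:2], col = grp, main = "PCAC")
    plot(prcomp(clr)$x[, 1:2], col = grp, main = "clr PCA")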
The paper can be thought of as the outcome of a dialog between MJB and RGVH that took place in the mid-2000s, which resulted in two poster presentations at a conference, the details of which I regret I have forgotten. I think it was Quebec, but I didn’t attend. Other than modifications to enhance continuity the text is left as it originally was, with the two posters being merged. Something that emerged in the collaboration was what might be termed the merits of a ‘bivariate-splitting’ approach to data analysis for problems of this kind. Some might regard this as preferable to, and more comprehensible than, what they perceive as ‘complex’ multivariate analysis. Hancock et al. (2008, Archaeometry 50, 710-726) is an interesting exercise in applying bivariate-splitting to a large and complex ceramic compositional data set.
Model-based methods for clustering artefacts, given their chemical composition, often assume sampling from a mixture of multivariate normal distributions and/or make explicit assumptions about the way a composition is formed. It is argued that, analysed within a modelling framework, several important and apparently competing methodologies are more similar than would initially appear. The opportunity is taken to note that models for populations are often not compatible with models for compositions, and that dilution correction, which can be accomplished in a variety of ways, can be interpreted as an attempt to resolve this problem.
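Two common ways of carrying out a dilution correction, in a hedged R sketch with simulated concentrations (the variable choices are mine): renormalizing each analysis to sum to 100%, and expressing elements as ratios to one assumed unaffected by dilution:

    set.seed(11)
    # Simulated concentrations, each row 'diluted' by a different factor
    X <- matrix(runif(40 * 5, 5, 25), 40, 5) * runif(40, 0.5, 1)

    # Correction 1: closure - rescale each row to sum to 100%
    X.closed <- 100 * X / rowSums(X)

    # Correction 2: ratios to a reference element (column 1 here)
    X.ratio <- X[, -1] / X[, 1]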
The paper explores aspects of the careers of actresses who appeared in silent films. An ultimate aim is to examine the effect that the coming of sound film had on these careers. There undoubtedly was an effect, but the paper suggests it has possibly been exaggerated. Much of the `popular' literature, at least, concentrates on the `stars', who are not typical. Most of the actresses were not stars, or not for very long; they had relatively short careers as far as film went, and relatively long and, one hopes, otherwise normal lives. Some biographers tend to emote about the `tragic' lives that Hollywood actresses lived. In fact we don't know much about the majority of these lives. Based on a sample of about 1700 actresses for which basic information is available, some of the `myths' that have accumulated about actresses and Hollywood in the 1920s are queried.
Redfern (2012) disputes the idea that the lognormal distribution is a suitable parametric model for the shot length distribution of films. The force of his arguments against the idea is diminished by problems in his exposition, the most serious of which is the manner in which hypothesis testing is deployed. The application fails to deal adequately with the effect sample size has on p-values, and this compromises much of the analysis. There is, nevertheless, strong evidence of a fairly systematic departure from lognormality, manifest in the fact that distributions mostly remain positively skewed after log-transformation. This is not recognised in the paper, and thus not exploited. The present paper shows that after a second transformation normality can be achieved for well over half the films, which are thus distributionally regular in this sense. Some suggestions as to why lognormality, or any other form of distributional regularity, might be of interest are offered at the start of the paper, which concludes with an illustration of how the establishment of a distributional ‘norm’ might then be exploited.
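The two-stage idea can be sketched in R. Assuming a vector sl of shot lengths in seconds (a simulated stand-in below), the first step tests the logged data; the residual positive skewness is then attacked with a further power (Box-Cox type) transformation, here chosen to maximize the Shapiro-Wilk statistic over a grid. The specific second transformation used in the paper may differ; this is an assumption for illustration:

    set.seed(12)
    sl <- exp(rnorm(300, 1.5, 0.6) + rexp(300, 2))   # skewed beyond lognormal
    y  <- log(sl)
    shapiro.test(y)                          # test of lognormality of sl
    mean((y - mean(y))^3) / sd(y)^3          # positive skewness after logging

    # Second transformation: power transform of the (shifted) logged data
    bc <- function(x, lam) if (abs(lam) < 1e-8) log(x) else (x^lam - 1) / lam
    z  <- y - min(y) + 1                     # shift so the power transform applies
    lams <- seq(-2, 2, by = 0.1)
    W  <- sapply(lams, function(l) shapiro.test(bc(z, l))$statistic)
    shapiro.test(bc(z, lams[which.max(W)]))  # normality after the second transform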
In 1926 D.W. Griffith, in 'Pace in the Movies', issued what might be thought of as a 'prescription' for structuring melodramas of the kind for which he was famous. This can be interpreted in terms of the cutting-pattern to be expected. The paper develops a statistical method of investigating cutting-pattern. The 'prescription' would appear to have been applied to his major feature films between 1914 and 1921, but is much less evident in his feature films after this period, or in his earlier work, and shorter films, at Biograph between 1908 and 1913.
The quantitative study of editing patterns in silent film, for the period 1908-1915, has often relied on the average shot length (ASL). It is, for example, clear that the rapidity of D.W. Griffith's cutting increased over the period 1908 to 1913 as measured by the ASL. Griffith was a mentor of Mack Sennett whose early directorial work at Biograph emulated Griffith in terms of cutting-rates. When Sennett moved to Keystone there was a marked increase in the cutting-rates employed in the films he directed - this paper suggests that it is possible to trace evolution in his cutting-rates over the period 1912-1914. Sennett was, in turn, a mentor of Charlie Chaplin whose directorial efforts at Keystone have been argued to represent a reaction against the fast cutting expected by that studio. A comparison of the ASLs of the films of Sennett and Chaplin supports this argument, but in certain respects the ASL is a blunt tool for comparative purposes, and more nuanced quantitative analysis is possible. This paper examines some graphical approaches based on the use of cumulative frequency distributions of shot-lengths (SLs) that can be more informative than the use of the ASL alone. Among the graphical techniques illustrated are the averaging of cumulative SLs across bodies of films, using a logarithmic scale to highlight differences, and correspondence analysis of the cumulative SLs to investigate patterns of difference between individual films. Apart from the suggestion of evolution in Sennett's cutting-rates from late-1912 to 1914 there is greater complexity in Chaplin's practices at Keystone than can be summarized using the ASL. His practice evolved away from the `style' employed by other Keystone directors, in the direction of making greater use of shots of longer duration. However -- and this cannot be inferred from the ASL -- he also made greater use of shots of short duration. That is, to say that he ended his career at Keystone cutting more slowly than Sennett -- based on analysis of ASLs -- oversimplifies things; he was cutting with greater variety as well.
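A skeletal R version of the graphical ideas, assuming a list of numeric shot-length vectors (simulated stand-ins below, not the Sennett/Chaplin data): each film is reduced to cumulative proportions at a common set of cut-points, plotted on a logarithmic scale, and the film-by-cutpoint table submitted to correspondence analysis (MASS::corresp here):

    set.seed(13)
    films <- replicate(12, exp(rnorm(200, 1.2, 0.7)), simplify = FALSE)

    # Cumulative proportions of shot lengths (seconds) at common cut-points
    cuts <- c(1, 2, 3, 5, 8, 12, 20, 40)
    cum  <- t(sapply(films, function(sl) sapply(cuts, function(ct) mean(sl <= ct))))

    # Cumulative curves, and their average, on a logarithmic scale
    matplot(cuts, t(cum), type = "l", log = "x",
            xlab = "shot length (s)", ylab = "cumulative proportion")
    lines(cuts, colMeans(cum), lwd = 3)

    # Correspondence analysis of the film-by-cutpoint table
    library(MASS)
    ca <- corresp(cum, nf = 2)
    plot(ca$rscore, type = "n"); text(ca$rscore, labels = seq_along(films))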
This paper is a further contribution to the debate about the interpretation of the results in an important paper published by Cutting et al. (2010). Not all the protagonists in the debate believe that claims about evolution in Hollywood film made in the paper can be sustained. The issues are covered in detail in the debate that has taken place on the Cinemetrics website and will not be rehearsed in any detail. The paper elaborates on some points raised in Baxter (2013), but the main intention is to stand back from the specifics and muse on some more general issues. These concern aspects of cinemetrics data analysis associated with the `explosion' of data that the Cinemetrics website has made possible, and the increased complexity of statistical methods that have been used to analyze such data. Particular attention is paid to the contrast between deductive and inductive modes of data analysis, and the use of statistical models. Arguably, Cutting et al. (2010) engage with both modes of analysis, and their conclusions are heavily dependent on models to which alternatives exist, and which may be wrong. This lies, I think, at the heart of the debate.
Cutting, Delong and Nothelfer (2010) use statistical methods to investigate the evolution of shot-length patterns in popular film. They argue, using what they call a `modified autoregressive index' (mAR), that patterns are becoming increasingly clustered and also evolving towards 1/f structure, a pattern described in a later publication as 'like those that our minds may naturally generate'. This paper shows that the interpretation of the mAR index is wrong. It is also shown that the results concerning 1/f patterns can be interpreted in an equally plausible and much less 'exciting' way. That is, although there are undoubtedly interesting temporal patterns in the shot length structure, they can't be interpreted in terms of the 'evolution of Hollywood film' in the sense intended in the original paper.
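For readers wanting to see what a 1/f-type claim involves in practice, a rough sketch (my own construction; the authors' mAR computation is not reproduced here): estimate the power spectrum of a shot-length series and regress log power on log frequency, a slope near -1 being what '1/f' refers to:

    set.seed(14)
    sl <- exp(rnorm(500, 1.3, 0.6))    # stand-in shot-length series

    spec <- spec.pgram(sl, taper = 0, detrend = TRUE, plot = FALSE)
    fit  <- lm(log(spec$spec) ~ log(spec$freq))
    coef(fit)[2]                       # slope; near -1 would suggest 1/f structure
    plot(log(spec$freq), log(spec$spec)); abline(fit)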
Two graphical methods of exploring aspects of cutting structure in film using shot-length data are presented. The first of these converts the cumulative frequencies of shot-lengths to tabular form that can be displayed in various ways, including the use of correspondence analysis. The method is used to investigate the differences in cutting rates used by Mack Sennett and Charlie Chaplin while directing films for the Keystone Company in 1912–14. Results suggest that previous analyses based on the average shot-length may over-simplify the contrast, and some evidence for the evolution of Sennett’s style in 1913 is also suggested. The methods used, like much of the literature, do not take into account the time-series structure of the shot-lengths. In the second approach presented this is allowed for by smoothing the time-series of shot-lengths using non-parametric regression. A computer-intensive way of presenting results graphically is developed, that does not commit the analyst to a particular choice of smoothing level, and is used to investigate an ‘hypothesis’ about D.W. Griffith’s cutting style suggested by a ‘prescription’ for pacing implied in comments made in an article published under his name in the 1920s. It is shown that his major feature films between 1914–21 conform to the prescription, but there is not much evidence for it in his earlier and later work.
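The second approach can be sketched with base R's nonparametric smoothers. The code below is an illustration rather than the paper's implementation: the shot-length sequence of a single (simulated) film is smoothed at several spans, reflecting the paper's refusal to commit to one smoothing level:

    set.seed(15)
    sl  <- exp(rnorm(400, 1.3, 0.6))   # stand-in SL sequence for one film
    idx <- seq_along(sl)

    plot(idx, sl, col = "grey", xlab = "shot number", ylab = "shot length (s)")
    for (sp in c(0.1, 0.25, 0.5)) {    # several smoothing levels
      lines(idx, fitted(loess(sl ~ idx, span = sp)), lwd = 2)
    }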
The statistical modelling of athletics data, particularly world record and Olympic data, has attracted considerable interest. It should be an obvious fact, often neglected, that results - for example, predictions of limits to performance - can be sensitive to the models and subsets of data used. The issues involved are addressed through two case studies, the lessons of which have more general implications. The first shows that different models, having different interpretations, can fit data equally well. In particular the claims of Carbone & Savaglio (2001) that their analysis of speed against time for world records shows a `sharp' distinction between anaerobic and aerobic for the first time `in physical terms' are questioned. A simpler model that does not entail such a distinction fits the data as well. The second study, using progressive best marathon times, shows that the variation in predictions of ultimate limits is so sensitive to the model and subset of data used that any similar exercise that resorts to a single model must be viewed with caution.
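The first case study's point, that different models can fit equally well, can be illustrated in miniature. The sketch below uses hypothetical record-style data, not the values analysed in the paper, and compares a single power law for speed against duration with a two-segment alternative of the kind that a 'sharp' anaerobic/aerobic distinction would imply:

    set.seed(16)
    dur <- c(10, 20, 45, 100, 240, 500, 1000, 1800, 3600, 7500)  # event duration (s)
    v   <- 10.4 * dur^(-0.11) * exp(rnorm(10, 0, 0.01))          # speed (m/s)

    # Model 1: single power law (a straight line on log-log scales)
    m1 <- lm(log(v) ~ log(dur))

    # Model 2: two power laws joined at a breakpoint, searched over a grid
    x <- log(dur)
    rss2 <- sapply(x[3:8], function(b) sum(resid(lm(log(v) ~ x + pmax(x - b, 0)))^2))
    c(rss1 = sum(resid(m1)^2), rss2.best = min(rss2))   # near-identical fits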
The paper examines patterns in swimming performance since World War 2, up to the Beijing Olympics in 2008. World-record and Olympic data are used, with particular attention paid to whether differences in male and female performance are stabilizing.
This isn't exactly an abstract, but what you have is a PowerPoint presentation used for a conference on Cinemetrics in March 2014. To make sense of it (I don't have a written text of the talk) you could look at the talk that was videoed, along with the others, at http://neubauercollegium.uchicago.edu/events/uc/Cinemetrics-Conference/.