
    Paul Garthwaite

    Research Interests:
    We suppose a case is to be compared with controls on the basis of a test that gives a single discrete score. The score of the case may tie with the scores of one or more controls. However, scores relate to an underlying quantity of interest that is continuous and so an observed score can be treated as the rounded value of an underlying continuous score. This makes it reasonable to break ties. This paper addresses the problem of forming a confidence interval for the proportion of controls that have a lower underlying score than the case. In the absence of ties, this is the standard task of making inferences about a binomial proportion and many methods for forming confidence intervals have been proposed. We give a general procedure to extend these methods to handle ties, under the assumption that ties may be broken at random. Properties of the procedure are given and an example examines its performance when it is used to extend several methods. A real example shows that an estimated c...
    In this paper, the difference between two correlated t variables is divided by a function of their sample correlation and the distribution of the resulting quantity is examined. Functions of the sample correlation are found for which this quantity is approximately pivotal and has a t distribution, asymptotically. Simulations show that the asymptotic results hold well for small sample sizes.
    Abstract: During March 2003, Autosub, an autonomous underwater vehicle (AUV) operated by the UK National Oceanography Centre, Southampton, was deployed under sea ice north of Thurston Island, Amundsen Sea, Antarctica (at ∼71° S, 100° W). The vehicle was fitted with an upward-looking 300 kHz acoustic Doppler current profiler (ADCP) to provide current velocity above the AUV. The ADCP also recorded ranges to the ocean-ice interface. Such data can be used to derive sea-ice draft by using a number of novel processing steps such ...
    ... 1. INTRODUCTION Bayesian statistical methods provide a formal mechanism for taking into account prior knowledge, meaning ... This distribution must satisfy the usual laws of probability so that, for example, any variance ... as a design point and the latter as the response. ...
    In some experiments the fixed interval method has proved a better means of quantifying subjective probabilities than the variable interval method, while other experiments have found the converse. This discrepancy in findings is due, at least in part, to variation in task characteristics. The present work examines various forms of both methods and compares their performances through calibration and scoring
    The mean response of body protein accretion in growing animals to their amino acid intake is sometimes described by a rectilinear ("broken-line") model and sometimes by a curvilinear model. The response of a population may be curvilinear as a result of averaging individual rectilinear responses or because individual responses are themselves curvilinear. This experiment was undertaken to distinguish these
    Meta-analysis is now a standard statistical tool for assessing the overall strength and interesting features of a relationship, on the basis of multiple independent studies. There is, however, recent acknowledgement of the fact that in many applications responses are rarely uniquely determined. Hence there has been some change of focus from a single response to the analysis of multiple outcomes. In this paper we propose and evaluate three Bayesian multivariate meta-analysis models: two multivariate analogues of the traditional univariate random effects models which make different assumptions about the relationships between studies and estimates, and a multivariate random effects model which is a Bayesian adaptation of the mixed model approach. Our preferred method is then illustrated through an analysis of a new data set on parental smoking and two health outcomes (asthma and lower respiratory disease) in children.
    Univariate partial least squares (PLS) is a method of modeling relationships between a Y variable and other explanatory variables. It may be used with any number of explanatory variables, even far more than the number of observations. A simple interpretation is given that shows the method to be a straightforward and reasonable way of forming prediction equations. Its relationship to multivariate PLS, in which there are two or more Y variables, is examined, and an example is given in which it is compared by simulation with other methods of forming prediction equations. With univariate PLS, linear combinations of the explanatory variables are formed sequentially and related to Y by ordinary least squares regression. It is shown that these linear combinations, here called components, may be viewed as weighted averages of predictors, where each predictor holds the residual information in an explanatory variable that is not contained in earlier components, and the quantity to be predicted is the vector of residuals from regressing Y against earlier components. A similar strategy is shown to underlie multivariate PLS, except that the quantity to be predicted is a weighted average of the residuals from separately regressing each Y variable against earlier components. This clarifies the differences between univariate and multivariate PLS, and it is argued that in most situations, the univariate method is likely to give the better prediction equations. In the example using simulation, univariate PLS is compared with four other methods of forming prediction equations: ordinary least squares, forward variable selection, principal components regression, and a Stein shrinkage method. Results suggest that PLS is a useful method for forming prediction equations when there are a large number of explanatory variables, particularly when the random error variance is large.
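    The sequential construction described in this abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `pls1_fitted` is hypothetical, and the sketch assumes the single-variable predictors are weighted in proportion to each residual variable's sum of squares, which is one possible weighting and the one that coincides with the usual univariate PLS algorithm.

```python
import numpy as np


def pls1_fitted(X, y, n_components):
    """In-sample fitted values from a univariate PLS sketch (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_mean = y.mean()
    Xr = X - X.mean(axis=0)   # residual explanatory variables (centered to start)
    yr = y - y_mean           # residual of Y after earlier components
    fitted = np.full_like(y, y_mean)
    for _ in range(n_components):
        # Regressing yr on column x_j alone gives the predictor b_j * x_j with
        # b_j = x_j'yr / x_j'x_j. Weighting each predictor by x_j'x_j
        # (an assumption of this sketch) gives t = sum_j (x_j'yr) x_j.
        w = Xr.T @ yr
        t = Xr @ w
        tt = t @ t
        if tt < 1e-12:        # nothing left to explain
            break
        # Relate the component to the current residual of Y by least squares.
        q = (t @ yr) / tt
        fitted += q * t
        yr = yr - q * t
        # Keep only the information in each x_j not contained in this component.
        p = (Xr.T @ t) / tt
        Xr = Xr - np.outer(t, p)
    return fitted
```

    Calling `pls1_fitted(X, y, k)` with a small `k` gives a shrunken fit. With as many components as there are linearly independent explanatory variables, the fitted values generally agree with those from ordinary least squares.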
    Abstract: We have known for some time that content words have “bursty” distributions in text (e.g. Church 00). In contrast, much of the literature assumes that function words are uninformative because they distribute homogeneously (e.g. Katz 96). In this paper based on two sets of ...
    A variety of methods of eliciting a prior distribution for a multivariate normal (MVN) distribution have recently been proposed. This paper reports an experiment in which 16 meteorologists used the methods to quantify their opinions about climatology variables. Our results compare prior models and show, in particular, that it can be better to assume the mean and variance of an
    This edition of Statistical Inference is similar in nature to the original from 1995, see Zbl 0862.62002, with additional material on estimating equations, local likelihood and fiducial intervals. The sections on Gibbs sampling, generalized linear models, tests based on ranks and the bootstrap have been substantially revised. The book is easy to read and is designed to be an advanced level textbook for senior undergraduate students. There is a collection of exercises associated with each chapter. The book is also a useful, comprehensive reference for practising statisticians. Contents: 1. Introduction; 2. Properties of estimators; 3. Maximum likelihood and other methods of estimation; 4. Hypothesis testing; 5. Interval estimation; 6. The decision-theory approach to inference; 7. Bayesian inference; 8. Nonparametric and robust inference; 9. Computationally intensive methods; 10. Generalized linear models.
    In this paper, we propose to investigate style through modeling burstiness in the occurrence patterns of terms in different collections. We set out a fine-grained model that looks at gaps between successive occurrences of the term using a mixture of exponential distributions. A Bayesian framework allows flexibility in fitting the model. The parameter estimates are
    The statistical NLP and IR literatures tend to make a "homogeneity assumption" about the distribution of terms, either by adopting a "bag of words" model, or in their treatment of function words. In this paper we develop a notion of homogeneity detection to a level of statistical significance, and conduct a series of experiments on different datasets, to show that
