0% found this document useful (0 votes)

12 views10 pages

Modelling Lecture 5

This document introduces Generalized Linear Models (GLMs), which extend linear regression concepts to accommodate non-normal and discrete response variables. It outlines the structure of GLMs, including the random component, systematic component, and link function, while emphasizing the importance of model fit and hypothesis testing using deviance statistics. The document also discusses model checking techniques and the use of residuals to evaluate the adequacy of fitted models.

Uploaded by

melophinegwaro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views10 pages

Modelling Lecture 5

Uploaded by

melophinegwaro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

Introduction to Generalized Linear

Models

I. Motivation
In this lecture we extend the ideas of linear regression to the more general idea of a
generalized linear model (GLM). The essence of linear models is that the response
variable is continuous and normally distributed: here we relax these assumptions and
consider cases where the response variable is non-normal and in particular has a
discrete distribution. Although these models are more general than linear models,
nearly all of the techniques for testing hypotheses regarding the regression coefficients,
and checking the assumptions of the model apply directly to a glm. In addition, a
linear model is simply a special type of a generalised linear model and thus all of the
discussion below applies equally to linear models.

In Lecture 1 we saw that a typical statistical model can be expressed as an equation that
equates the mean(s) of the response variable(s) to some function of a linear
combination of the explanatory variables:

E [Y X = x ] = η ( β 0 + β 1 X 1 + L + β p X p ) = η[ LC ( X ; β )] , (1)

In equation (1), the form of the function η( ) is known, as are Y and X (the latter for a
particular choice of explanatory variables). However, the parameters of the model,
β = ( β 0 , β 1 ,K, β p ) , are not known and must be estimated. The simple linear model is a
special case of the above in which the function η() is the identity function.

Generalized linear models extend the ideas underlying the simple linear model

(
E[Y | X ] = β 0 + β 1 X 1 + β 2 X 2 + L + β p X p , Y ~ N µ , σ 2 ) (2)

where the Yi are independent, to the following more general situations:

1. Response variables can follow distributions other than the Normal distribution.
They can be discrete, or logical (one of two categories).

2. The relationship between the response and the explanatory variables does not
have to take on the simple linear form above.

IAUL & D EPARTMENT OF S TATISTICS P AGE

1
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

GLMs are based on the exponential family of distributions that shares many of the
desirable statistical properties of the Normal distribution. Consider a single random
variable Y whose probability distribution depends only on a single parameter θ . The
distribution of Y belongs to the exponential family if it can be written in the form

f ( y,θ ) = exp[{yθ − b(θ )} a(φ ) + c( y, φ )] (3)

where a, b and c are known functions. Furthermore, φ is a scale parameter that is treated
as a nuisance parameter if it is unknown. If φ is known, this is an exponential-family
model with canonical parameter θ . Some well-known distributions that belong to the
exponential family of distributions include the Normal, exponential, Poisson and
binomial distributions. For example, consider the discrete random variable Y that
follows a Poisson distribution with parameter λ . The probability function for Y is

λ y e −λ
f ( y, λ ) =
y!

where y takes the values 0, 1, 2, K . This can be rewritten as

f ( y, λ ) = exp( y log λ − λ − log y!) = exp( yθ − exp(θ ) − log y!) = f ( y,θ ) .

This is exactly the form in (3) with θ = log(λ ) , φ = 1 , a (φ ) = 1 , b(θ ) = exp(θ ) and
c( y,θ ) = − log y! . Similarly, all the other distributions in the exponential family can be
rewritten in form (2).

In the case of GLMs we require an extension of numerical estimation methods to

estimate the parameters β from the linear model in (2), to a more general situation
where there is some non-linear function, g, relating E (Yi ) = µ i to the linear component
x ′ β , that is
i

g (µ i ) = x i ′ β

where g is called the link function. The estimation of the parameters is typically based
on maximum likelihood. Although explicit mathematical expressions can be found for
the estimators in some special cases, numerical optimisation methods are usually
needed. These methods are included in most modern statistical software packages. It
is not the aim of this course to go into any detail on estimation, since we will focus on
the application of these models rather than their estimation.

GLMs are extensively used in the analysis of binary data (e.g. logistic regression) and
count data (e.g. Poisson and log-linear models). We will consider some of these
applications in the following lecture. In this lecture we will introduce the basic GLM
setup and assumptions.

IAUL & D EPARTMENT OF S TATISTICS P AGE

2
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

II. Basic GLM setup and assumptions

We will define the generalized linear model in terms of a set of independent random
variables Y1 , Y2 ,K , YN each with a distribution from the exponential family of
distributions. The generalized linear model has three components:

1. The random component: the response variables, Y1 , Y2 , K , YN , are assumed to

share the same distribution from the exponential family of distributions
introduced in the previous section, with E (Yi ) = µ i and constant variance σ 2 .
This part describes how the response variable is distributed.

2. The systematic component: covariates (explanatory variables) x1 , x 2 , K, x p produce

a linear predictor η given by
p
η = ∑x jβ j
i =1

3. The link function, g , between the random and systematic components. It

describes how the covariates are related to the random component, i.e.
η i = g (µ i ) , where µ i = E (Yi ) and can be any monotone differentiable function.

An important aspect of generalized linear models that we need to keep in mind is that
they assume independent (or at least uncorrelated) observations. A second important
assumption is that there is a single error term in the model. This corresponds to
Assumption 1 for the linear model, namely, that the only error in the model has to do
with the response variable: we will assume that the X variables are measured without
error. In the case of generalized linear models we no longer assume constant variance
of the residuals, although we still have to know how the variance depends on the
mean. The variance function V (µ i ) relates the variance of Yi is related to its mean µ i .
The form of this function is determined by the distribution that is assumed.

III. Goodness of fit and comparing models

Overview
A very important aspect of generalized models, and indeed all statistical models
(Lectures 1-4), is to evaluate the relevance of our model for our data and how well it fits
the data. In statistical terms this is referred to as goodness of fit. We are also interested
in comparing different models and selecting a model that is reasonably simple, but that
provides a good fit. This involves finding a balance between improving the fit on one
side without unnecessary increasing the complexity of the model on the other. In a
statistical modelling framework we perform hypothesis tests to compare how well two
related models fit the data. In the generalized linear model framework, the two models
we compare should have the same probability distribution and link function, but they

IAUL & D EPARTMENT OF S TATISTICS P AGE

3
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

can differ with regards to the number of parameters in the systematic component of the
model. The simpler model relating to the null hypothesis H 0 is therefore a special case
of the other more general model. If the simpler model fits the data as well as the more
general model it is retained on the grounds of parsimony and H 0 cannot be rejected. If
the more general model provides a significantly better fit than the simple model, then
H 0 is rejected in favour of the alternative hypothesis H 1 , which corresponds to the
more general model. In order to make these comparisons we need goodness of fit
statistics to describe how well the models fit. These statistics can be based on any of a
number of criteria such as the maximum value of the log-likelihood function, the
minimum value of the sum of squares criterion or a composite statistic based on the
residuals. If f Y ( y,θ ) is the density function for a random variable Y given the
parameter θ , then the log-likelihood based on a set of independent observations of Y ,
y1 , y 2 ,K , y n , is then defined as

l (µ , y ) = ∑ log f i ( y i ,θ i )
i

where µ = (µ1 , µ 2 , K, µ n ) . It is important to note the subtle shift in emphasis from the
density function. In the density function f ( y,θ ) is considered as a function in y for
fixed θ , whereas the log-likelihood is primarily considered as a function of θ for the
particular data observed ( y1 , y 2 ,K , y n ).

In order to test the hypotheses above, sampling distributions of the goodness of fit
statistics are required. In the following subsection we consider one such goodness of fit
criterion, the deviance, in a bit more detail.

Finally, we will say something about the choice of scale for the analysis. It is an
important aspect of the model selection process, although scaling problems are
considerably reduced in the generalized linear model setup. The normality and
constant variance assumption of the linear regression model is for instance no longer a
requirement. The choice of scale is largely dependent on the purpose for which the
scale will be used. It is also important to keep in mind that no single scale will
simultaneously produce all the desired properties.

The Deviance
One way of assessing the adequacy of a model is to compare it with a more general
model with the maximum number of parameters that can be estimated. It is referred to
as the saturated model. In the saturated model there is basically one parameter per
observation. The deviance assesses the goodness of fit for the model by looking at the
difference between the log-likelihood functions of the saturated model and the model
( ) ( )
under investigation, i.e. l b sat , y − l b, y . Here b sat denotes the maximum likelihood
estimator of the parameter vector of the saturated model, β sat , and b is the maximum
likelihood estimator of the parameters of the model under investigation, β . The

IAUL & D EPARTMENT OF S TATISTICS P AGE

4
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

maximum likelihood estimator is the estimator that maximises the likelihood function.
The deviance is defined as

{( ) ( )}
D = 2 l b sat , y − l b, y .

A small deviance implies a good fit. The sampling distribution of the deviance is
approximately χ 2 (m − p,υ ) , where ν is the non-centrality parameter. The deviance has
an exact χ 2 distribution if the response variables Yi are normally distributed. In this
case, however, D depends on var(Yi ) = σ 2 which, in practice, is usually unknown. This
prevents the direct use of the deviance as a goodness of fit statistic in this case. For
other distributions of the Yi , the deviance may only be approximately chi-square. It
must be noted that this approximation can be very poor for limited amounts of data. In
the case of the binomial and Poisson distributions, for example, D can be calculated
and used directly as a goodness of fit statistic. If the scale parameter φ is unknown or
known to have a value other than one, we us a scaled version of the deviance

D
φ

and we call it the scaled deviance.

The deviance forms the basis for most hypothesis testing for generalized linear models.
Suppose we are interested in comparing the fit of two models. These models need to
have the same probability distribution and the same link function. The models also
need to be hierarchical, which means that the systematic component of the simpler
model M 0 is a special case of the linear component of the more general model M 1 .
Consider the null hypothesis

 β1 
 
H0 : β = β 0 =  M 
β q 
 

that corresponds to model M 0 and a more general hypothesis

 β1 
 
H1 : β = β 1 =  M 
β p 
 

that corresponds to model M 1 , with q < p < N .

We can test H 0 against H 1 using the difference of the deviance statistics

{( ) ( )} { ( ) ( )} { ( ) (
∆D = D0 − D1 = 2 l b sat , y − l b 0 , y − 2 l b sat , y − l b1 , y = 2 l b1 , y − l b 0 , y )}

IAUL & D EPARTMENT OF S TATISTICS P AGE

5
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

If both models describe the data well, then D0 follows a χ 2 (N − q ) distribution and D1
follows a χ 2 ( N − p ) distribution. It then follows that ∆D has a χ 2 ( p − q ) distribution
under certain independence assumptions. If the value of ∆D is consistent with the
χ 2 ( p − q ) distribution we would generally choose the model M 0 corresponding to H 0
because it is simpler. If the value of ∆D is in the critical region of the χ 2 ( p − q )
distribution we would reject H 0 in favour of H 1 on the grounds that model M 1
provides a significantly better description of the data. It must be noted that this model
too may not fit the data particularly well.

If the deviance can be calculated from the data, then ∆D provides a good method for
hypothesis testing. The sampling distribution of ∆D is usually better approximated by
the chi-squared distribution than is the sampling distribution of a single deviance.

Model checking
As was the case for the simple linear model, we should perform model checking after
we have fitted a particular generalized linear model. As before we should look at
whether the model is reasonable and we should investigate whether the various
assumptions we make when we fit and draw inference using the model are satisfied. If
the checks and investigations do reveal that there is a problem, then there are a number
of different solutions available to us. These were discussed in Lecture Notes on model
checking for linear models. In this section we discuss graphical techniques available to
us for trying to detect systematic departures from our generalized linear model that
may for example be the result of an incorrectly specified link function, variance
function or a misspecification of the explanatory variables in the systematic component
in the model. As was the case for the linear model, many of these graphical techniques
entail using the residual values from our fitted model.

For generalized linear models we require an extended definition of residuals,

applicable to distributions other than the Normal distribution. Many of these these
residuals are discussed in detail in McCullagh and Nelder (1989). These residuals can
be used to assess various aspects of the adequacy of a fitted generalized linear model
that we have mentioned above. The residuals should be unrelated to the explanatory
variables and they can also be use to identify any unusual values that may require
some further investigation. Various plots of the residuals can be used to assess these
properties. Systematic pattern in the residual plots can for example be indicative of an
unsuitable link function, wrong scale of one or more of the predictors, or omission of a
quadratic term in a predictor. Examples of extended definitions of residuals that are
widely used in model checking for GLMs include the Pearson and deviance residuals.

The Pearson residuals are just rescaled versions of the raw or response residuals and are
defined as

IAUL & D EPARTMENT OF S TATISTICS P AGE

6
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

( yi − µˆ i )
rP = .
V (µˆ i )

Here V (µ i ) is the variance function. The name is taken from the fact that for the Poisson
distribution the Pearson residual is just the signed square root of the component of the
Pearson X 2 goodness-of-fit statistic, so that
2 2
∑ rp = X .

If the deviance is used as a measure of discrepancy of a generalized linear model, then

each unit contributes a quantity d i to the deviance, so that D = ∑i d i . The deviance
residual is defined as

rD = sign( y i − µ i ) d i .

Most often standardised versions of the above residuals are used in model checking.
The standardised versions of the Pearson and deviance residual are given by

( yi − µˆ i ) rD
rP ′ = ′
and rD =
φˆ V (µˆ i )(1 − h ) φˆ (1 − h )

respectively. In the above h is the equivalent to the leverage that we have defined for
the linear model. In general the deviance residual, either unstandardized or
standardized, is preferred to the Pearson residual. Below we will discuss a few basic
plots of the (standardized) residuals that can be used to check the validity of our
model.

1. Informal checks using the residuals

It is almost standard procedure to consider scatterplots of the residuals against some
function of the fitted values. Scatterplots of the standardized deviance residuals against the
estimated linear predictor η̂ or against the fitted values transformed to a constant scale of the
error distribution are recommended for this purpose. For a few commonly used error
distributions the following transformed values are recommended:

µ̂ for Normal errors,

2 µ̂ for Poisson errors,

2 sin −1 µ̂ for binomial errors,

2 log µ̂ for gamma errors.

The plot should be centred at µˆ = 0 and a constant range. Typical deviations from this
pattern include curvature in the residuals with the mean and a systematic change in

IAUL & D EPARTMENT OF S TATISTICS P AGE

7
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

the range of the residuals with fitted value. Curvature may arise from several causes,
including the wrong choice of link function, wrong choice of scale of one or more of the
covariates, or omission of a quadratic term in a covariate.

A second useful scatterplot is that of the residuals against an explanatory variable in the
linear predictor. This plot should exhibit the same form as the plot above. The
presence of systematic trend usually arises for the same reasons as for the plot above.
Additionally, the trend may also be the result of a faulty scale in another explanatory
variable that is closely correlated with the one under investigation.

The third scatterplot that we will consider is termed an added-variable plot. This is
equivalent to the partial regression plot that we considered for the linear model. This
plot helps us to check if an omitted explanatory variable should be included in the
linear predictor. The added-variable plot for a particular candidate explanatory
variable is formed by (a.) finding the unstandardized residuals for the existing
generalized linear model with response variable Y and any already included
explanatory variables, (b.) finding the unstandardized residuals for another linear
model in which the candidate explanatory variable is treated as the response, using the
same linear predictor as for Y (here, the candidate explanatory variable is treated as
the response variable), and (c.) plotting the first set of residuals against the second set.
The presence of a trend in the points in this plot indicates that you might consider
including the particular candidate variable in the model as an explanatory variable,
and the shape of the trend can tell you what forms of the variable you might include.

2. Checking the variance function

A plot of the absolute residuals against the fitted values gives an informal check on the
adequacy of the assumed variance function. An ill-chosen variance function will result
in a trend in the mean. A positive trend indicates that the current variance function is
increasing to slowly with the mean, and vice versa.

3. Checking the link function

An informal check involves examining the plot of the scale-adjusted dependent
variable against η̂ , the estimated linear predictor. This should approximately be a
straight line. For link functions of the power family an upward curvature in the plot
points to a link with higher power than that used. A downward curvature points to a
lower power. For binary data this plot is uninformative. McCullagh and Nelder
(1989) discuss more formal methods for this situation. Note that checks for the link
function are affected by failure to establish the correct scales for one or more of the
explanatory variables in the linear predictor. This can be validated using partial
residual plots. They are described below.

4. Checking the scales of explanatory variables

The partial residual plot is a useful tool for checking whether the correct scales have
been used for the explanatory variables. In its generalized form the partial residual is

IAUL & D EPARTMENT OF S TATISTICS P AGE

8
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

defined by

u = z − ηˆ + γˆx

where z is the adjusted dependent variable, η̂ the fitted linear predictor and γˆ the
parameter estimate for the explanatory variable x . The plot of u against x provides
an informal check if the scale of x is satisfactory. A correctly specified scale should
result in an approximately linear plot. The form of the plot may suggest a suitable
alternative if the scale is not appropriate. Note that distortions in this plot may also
occur when the scales of other explanatory variables are wrong, which may require that
we look at partial residual plots for several explanatory variables.

5. Checks for outlying or influential points

We also need to check for individual points that may differ from the general pattern set
by the remainder of the points. We defined the leverage h for individual points in the
context of the linear model by which to judge their influence on the fit. We can
consider these measures for GLMs as well, but we must note that a point in the extreme
of the explanatory variable’s range will not necessarily have a high leverage if its
weight is small.

The Cook’s distance was introduced as a measure of influence for the linear model in a
previous lecture. Adapted versions of the Cook’s distance can be used for generalized
linear models. The Studentized residuals

Yi − Yˆ(i )
ei* = ri* =
seˆ(Yi − Yˆ(i ) )

that were introduced in the context of the linear model can also be used to assess the
consistency of individual points. One-step approximations of these residuals exist that
are appropriate for GLMs. To interpret the n values that we get for the leverage,
Cook’s distance and Studentized residuals respectively, we need some measure to
assess how large extreme values would be in a sample of a given size even if no
unusual points were present. Normal plots can be used for this purpose. There are
two forms of Normal plots: the half-Normal plot and the Normal plot. We will not go
into the theoretical detail underlying these plots. We will only mention that the half-
Normal plot is appropriate for non-negative quantities like the leverage and Cook’s
distance, while for the Studentized residuals there are two options, either a half-
Normal plot of r * or a full Normal plot of r * itself. For either plot the ordered values
of the statistic are plotted against the expected order statistics of a Normal sample.
Extreme points will appear at the extremes of the plot, and may possibly deviate from
the trend indicated by the remainder of the points. Note that is trend is not necessarily
linear in the case of the leverage or Cook’s distances.

IAUL & D EPARTMENT OF S TATISTICS P AGE

9
I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

6. Checks for correlations in the errors

An index plot of the residuals should can be used to assess correlation in the residuals.
If the residuals are independent this should fluctuate randomly without systematic
pattern. If the residuals are correlated special modelling methods are needed.

K. Javaras and W. Vos (2002)

References
Davidson, A. C. and Snell, E. J. (1991). Residuals and diagnostics. Chapter 4 of Hinkley
et al. (1991).

Dobson, A. (1990). An Introduction to Generalized Linear Models (2nd ed.). Boca Raton, FL:
Chapman and Hall/CRC.

Hinkley, D. V., Reid, N. and Snell, E. J. eds (1991). Statistical Theory and Modelling. In
Honour of Sir David Cox. London: Chapman & Hall.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). London:
Chapman and Hall.

IAUL & D EPARTMENT OF S TATISTICS P AGE

7 Generalized Linear Models Padua
No ratings yet
7 Generalized Linear Models Padua
29 pages
θ, then the probability density function for Y, θ), can be written as  y∣=exp  ybcd  y θ) is called the natural −m  n y ,
No ratings yet
θ, then the probability density function for Y, θ), can be written as  y∣=exp  ybcd  y θ) is called the natural −m  n y ,
6 pages
Class - Lectur 5&6
No ratings yet
Class - Lectur 5&6
12 pages
Stat5900 f24 Lec9
No ratings yet
Stat5900 f24 Lec9
12 pages
(TRANSLATED) Generalized Linear Model
No ratings yet
(TRANSLATED) Generalized Linear Model
11 pages
Actuarial GLMs Explained
No ratings yet
Actuarial GLMs Explained
74 pages
GLM Slides 2 Exp Family
No ratings yet
GLM Slides 2 Exp Family
35 pages
Lecture 3
No ratings yet
Lecture 3
30 pages
GAMS Getting Started
No ratings yet
GAMS Getting Started
31 pages
(GAM) Application PDF
No ratings yet
(GAM) Application PDF
30 pages
Ho GLM
No ratings yet
Ho GLM
5 pages
Generalized Linear Model
No ratings yet
Generalized Linear Model
9 pages
Note on Generalized Linear Models: y y Xβ w X β w I y Xβ I y Xβ X w X
No ratings yet
Note on Generalized Linear Models: y y Xβ w X β w I y Xβ I y Xβ X w X
4 pages
Httpsemas2.Ui - Ac.idpluginfile - Php2375826mod Resourcecontent1kuliah1 2 PDF
No ratings yet
Httpsemas2.Ui - Ac.idpluginfile - Php2375826mod Resourcecontent1kuliah1 2 PDF
31 pages
2.1972 Generalized Linear Models Nelder Wedderburn
No ratings yet
2.1972 Generalized Linear Models Nelder Wedderburn
16 pages
w6 - Statistical Modelling
No ratings yet
w6 - Statistical Modelling
24 pages
GLM Theory
No ratings yet
GLM Theory
46 pages
General Additive Model - Michael Clark
No ratings yet
General Additive Model - Michael Clark
31 pages
Generalized Linear Models: Simon Jackman Stanford University
No ratings yet
Generalized Linear Models: Simon Jackman Stanford University
7 pages
Chapter 2
No ratings yet
Chapter 2
5 pages
6.1 - Introduction To GLMs
No ratings yet
6.1 - Introduction To GLMs
3 pages
Categorical Notes Ch4
No ratings yet
Categorical Notes Ch4
40 pages
Statistical Modeling Notes
No ratings yet
Statistical Modeling Notes
25 pages
Generalized Linear Models: FX Axb C DX Axb C DX
No ratings yet
Generalized Linear Models: FX Axb C DX Axb C DX
11 pages
An Empirical Study of Generalized Linear Model For
No ratings yet
An Empirical Study of Generalized Linear Model For
4 pages
15 GLM
No ratings yet
15 GLM
32 pages
Intro to General Linear Models
No ratings yet
Intro to General Linear Models
3 pages
RM - Elements of Generalised Linear Models (GLM) and Inference For GLM
No ratings yet
RM - Elements of Generalised Linear Models (GLM) and Inference For GLM
11 pages
S M S T C Lecture 2425
No ratings yet
S M S T C Lecture 2425
45 pages
Final Scribd
100% (1)
Final Scribd
23 pages
Generalized Linear Models
No ratings yet
Generalized Linear Models
7 pages
Generalised Linear Model
No ratings yet
Generalised Linear Model
4 pages
Categorical Data Analysis: GLM Basics
100% (1)
Categorical Data Analysis: GLM Basics
53 pages
University of Illinois - Introduction To GLM
No ratings yet
University of Illinois - Introduction To GLM
39 pages
Ch13slides Generalized Linear Models
No ratings yet
Ch13slides Generalized Linear Models
24 pages
5 - GLM
No ratings yet
5 - GLM
38 pages
GLM & Logistic
No ratings yet
GLM & Logistic
26 pages
Econometria Avanzada: Generalized Linear Models
No ratings yet
Econometria Avanzada: Generalized Linear Models
30 pages
GLM Slides 6 Binary Response Print
No ratings yet
GLM Slides 6 Binary Response Print
55 pages
Chapman-Kolmogorov Equations 37 This Produces The 48511
No ratings yet
Chapman-Kolmogorov Equations 37 This Produces The 48511
9 pages
GLM - NelderWedderburn1972
No ratings yet
GLM - NelderWedderburn1972
16 pages
Multiple Linear Regression
No ratings yet
Multiple Linear Regression
30 pages
Generalized Linear Models GLMs
No ratings yet
Generalized Linear Models GLMs
10 pages
Presentation Generalized Linear Model Theory
No ratings yet
Presentation Generalized Linear Model Theory
77 pages
07 GLM
No ratings yet
07 GLM
49 pages
Lecture BDS 2 23 24 Print
No ratings yet
Lecture BDS 2 23 24 Print
10 pages
Lecture 11
No ratings yet
Lecture 11
6 pages
SandipK - Generalized Linear Models Question Bank
No ratings yet
SandipK - Generalized Linear Models Question Bank
38 pages
GU5232 Syl
No ratings yet
GU5232 Syl
2 pages
GLMexponetialfm
No ratings yet
GLMexponetialfm
22 pages
Lindsey 1997 Applying Generalized Linear Models PDF
100% (2)
Lindsey 1997 Applying Generalized Linear Models PDF
265 pages
Generalized Linear Models-1
No ratings yet
Generalized Linear Models-1
29 pages
1.1 Normal Linear Model: 1.1.1 Theoretical (Population) Model and Fitted (Sample) Model
No ratings yet
1.1 Normal Linear Model: 1.1.1 Theoretical (Population) Model and Fitted (Sample) Model
63 pages
Modelos Lineales Generalizados Con Ejemplos en R
No ratings yet
Modelos Lineales Generalizados Con Ejemplos en R
573 pages
Generalized Linear Models Guide
No ratings yet
Generalized Linear Models Guide
24 pages
Generalized Linear Models For Insurance Data
100% (3)
Generalized Linear Models For Insurance Data
208 pages
Human Anatomy & Physiology Laboratory Manual, Cat Version (12th Edition) - ISBN 0321971353, 978-0321971357
97% (30)
Human Anatomy & Physiology Laboratory Manual, Cat Version (12th Edition) - ISBN 0321971353, 978-0321971357
23 pages
Proposal (1) (1) (1) .Docx - NP
No ratings yet
Proposal (1) (1) (1) .Docx - NP
26 pages
Attachment Summary Sheet - Agriculture Extensions
100% (1)
Attachment Summary Sheet - Agriculture Extensions
13 pages
Artikel Danta Dan Tim
No ratings yet
Artikel Danta Dan Tim
15 pages
Statistics for Engineers & Scientists
No ratings yet
Statistics for Engineers & Scientists
2 pages
Editing Coding Tabulation of Data
No ratings yet
Editing Coding Tabulation of Data
18 pages
ESB2021 Resit With Solution
No ratings yet
ESB2021 Resit With Solution
9 pages
Operation Research BBM 6 Sem Tumkur University
No ratings yet
Operation Research BBM 6 Sem Tumkur University
7 pages
Poetry of The Singing Palm Tree - The Tradition of Tapping Palm Trees in Sabu Raijua NTT
No ratings yet
Poetry of The Singing Palm Tree - The Tradition of Tapping Palm Trees in Sabu Raijua NTT
16 pages
PR m1
No ratings yet
PR m1
15 pages
Feng Shui: The Factor Affecting The Choice of Malaysians When Purchasing A House Facing T-Junction
No ratings yet
Feng Shui: The Factor Affecting The Choice of Malaysians When Purchasing A House Facing T-Junction
75 pages
Regression Analysis: Prof. Prema Muthuswamy KCT, Coimbatore
No ratings yet
Regression Analysis: Prof. Prema Muthuswamy KCT, Coimbatore
22 pages
Title Proposal Food
No ratings yet
Title Proposal Food
10 pages
Sales Analysis at Elite Winery of Last Seven Years and Sales Prediction For The Upcoming Year in Karnataka
No ratings yet
Sales Analysis at Elite Winery of Last Seven Years and Sales Prediction For The Upcoming Year in Karnataka
19 pages
Re: ST: Panel Data-FIXED, RANDOM EFFECTS and Hausman Test: Sindijul@msu - Edu Statalist@hsphsun2.harvard - Edu
No ratings yet
Re: ST: Panel Data-FIXED, RANDOM EFFECTS and Hausman Test: Sindijul@msu - Edu Statalist@hsphsun2.harvard - Edu
5 pages
Using Mixed Methods Research Designs in Health Psy
No ratings yet
Using Mixed Methods Research Designs in Health Psy
16 pages
Applied Psychometrics Using SPSS and AMOS 1st Edition Holmes Finch Instant Download
100% (14)
Applied Psychometrics Using SPSS and AMOS 1st Edition Holmes Finch Instant Download
74 pages
(Original PDF) Statistics For Business: Decision Making and Analysis 3rd Edition Instant Download
100% (4)
(Original PDF) Statistics For Business: Decision Making and Analysis 3rd Edition Instant Download
46 pages
Introduction To Research Methods Part A Test Bank
100% (1)
Introduction To Research Methods Part A Test Bank
17 pages
Answer Key VI
No ratings yet
Answer Key VI
1 page
Qualitative Literature Review Guide
100% (1)
Qualitative Literature Review Guide
4 pages
Practical Research 2 - Importance of Quantitative Research Across Fields MELC 2 - LAS 1
100% (2)
Practical Research 2 - Importance of Quantitative Research Across Fields MELC 2 - LAS 1
12 pages
Air BNB and Hotel Guest Satisfaction Powerpointmmm
No ratings yet
Air BNB and Hotel Guest Satisfaction Powerpointmmm
16 pages
Historical Antecedents in The Course of Science and Technology
No ratings yet
Historical Antecedents in The Course of Science and Technology
24 pages
研究方法与案例分析
100% (2)
研究方法与案例分析
7 pages
Women Empowerment Through Self-Help Groups: The Role of Information& Communication Technology - A Case Study of Jharkhand State in India
No ratings yet
Women Empowerment Through Self-Help Groups: The Role of Information& Communication Technology - A Case Study of Jharkhand State in India
15 pages
DOE Practice Exam for Students
100% (1)
DOE Practice Exam for Students
30 pages
Identification of Operational Risks Impeding The Implementation of Elearning in Higher Education System
No ratings yet
Identification of Operational Risks Impeding The Implementation of Elearning in Higher Education System
17 pages
Financial Forecasting
No ratings yet
Financial Forecasting
38 pages
Research Methods: Report On Various Types of Researches
No ratings yet
Research Methods: Report On Various Types of Researches
9 pages

Modelling Lecture 5

Uploaded by

Modelling Lecture 5

Uploaded by

I NTRODUCTION TO S TATISTICAL M ODELLING T RINITY 2002

Introduction to Generalized Linear

where the Yi are independent, to the following more general situations:

IAUL & D EPARTMENT OF S TATISTICS P AGE

f ( y,θ ) = exp[{yθ − b(θ )} a(φ ) + c( y, φ )] (3)

where y takes the values 0, 1, 2, K . This can be rewritten as

f ( y, λ ) = exp( y log λ − λ − log y!) = exp( yθ − exp(θ ) − log y!) = f ( y,θ ) .

In the case of GLMs we require an extension of numerical estimation methods to

IAUL & D EPARTMENT OF S TATISTICS P AGE

II. Basic GLM setup and assumptions

1. The random component: the response variables, Y1 , Y2 , K , YN , are assumed to

2. The systematic component: covariates (explanatory variables) x1 , x 2 , K, x p produce

3. The link function, g , between the random and systematic components. It

III. Goodness of fit and comparing models

IAUL & D EPARTMENT OF S TATISTICS P AGE

IAUL & D EPARTMENT OF S TATISTICS P AGE

and we call it the scaled deviance.

that corresponds to model M 0 and a more general hypothesis

that corresponds to model M 1 , with q < p < N .

We can test H 0 against H 1 using the difference of the deviance statistics

IAUL & D EPARTMENT OF S TATISTICS P AGE

For generalized linear models we require an extended definition of residuals,

IAUL & D EPARTMENT OF S TATISTICS P AGE

If the deviance is used as a measure of discrepancy of a generalized linear model, then

1. Informal checks using the residuals

µ̂ for Normal errors,

2 µ̂ for Poisson errors,

2 sin −1 µ̂ for binomial errors,

2 log µ̂ for gamma errors.

IAUL & D EPARTMENT OF S TATISTICS P AGE

2. Checking the variance function

3. Checking the link function

4. Checking the scales of explanatory variables

IAUL & D EPARTMENT OF S TATISTICS P AGE

5. Checks for outlying or influential points

IAUL & D EPARTMENT OF S TATISTICS P AGE

6. Checks for correlations in the errors

K. Javaras and W. Vos (2002)

IAUL & D EPARTMENT OF S TATISTICS P AGE

You might also like