Statistical Modelling Assignment II
Please analyse the following data and submit a report based on
the analysis.
Data I: Mosquito death data
Data II: Graduate admission data
Case I: Graduate Admission Data:
For the analysis below, we import a graduate admission dataset
from the UCLA IDRE statistics website.
data1 = read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
head(data1)
admit gre gpa rank
1 0 380 3.61 3
2 1 660 3.67 3
3 1 800 4.00 1
4 1 640 3.19 4
5 0 520 2.93 4
6 1 760 3.00 2
This dataset has a binary response variable, admit, and three
predictor variables: gre, gpa and rank. We treat gre and gpa as
continuous variables. The variable rank, the rank of the applicant's
undergraduate institution, takes the values 1 to 4. We can get basic
descriptive statistics for the entire dataset using summary.
summary(data1)
admit gre gpa rank
Min. :0.0000 Min. :220.0 Min. :2.260 Min. :1.000
1st Qu.:0.0000 1st Qu.:520.0 1st Qu.:3.130 1st Qu.:2.000
Median :0.0000 Median :580.0 Median :3.395 Median :2.000
Mean :0.3175 Mean :587.7 Mean :3.390 Mean :2.485
3rd Qu.:1.0000 3rd Qu.:660.0 3rd Qu.:3.670 3rd Qu.:3.000
Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
Using the Logistic Regression Model:
The code below estimates a logistic regression model using
the glm (generalized linear model) function with a binomial
family. First, we convert rank to a factor so that it is treated
as a categorical variable. We then display the results with
the summary command.
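Written out, the model estimated by this glm call takes the following form. This sketch assumes R's default treatment contrasts for a factor. Under that coding, rank 1 is the reference level, and rank2, rank3 and rank4 are 0/1 indicator variables, which matches the coefficient names in the output.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{gre} + \beta_2\,\mathrm{gpa}
  + \beta_3\,\mathrm{rank2} + \beta_4\,\mathrm{rank3} + \beta_5\,\mathrm{rank4},
\qquad p = \Pr(\mathrm{admit} = 1 \mid \mathrm{gre}, \mathrm{gpa}, \mathrm{rank})
```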
data1$rank = factor(data1$rank)
logit = glm(admit ~ gre + gpa + rank, data = data1, family = 'binomial')
summary(logit)
Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial",
data = data1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6268 -0.8662 -0.6388 1.1490 2.0790
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979 1.139951 -3.500 0.000465 ***
gre 0.002264 0.001094 2.070 0.038465 *
gpa 0.804038 0.331819 2.423 0.015388 *
rank2 -0.675443 0.316490 -2.134 0.032829 *
rank3 -1.340204 0.345306 -3.881 0.000104 ***
rank4 -1.551464 0.417832 -3.713 0.000205 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.52 on 394 degrees of freedom
AIC: 470.52
Number of Fisher Scoring iterations: 4
The logistic regression coefficients give the change in the log
odds of the outcome for a one-unit increase in the predictor
variable, holding the other predictors constant.
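Substituting the rounded estimates from the summary output gives the fitted log odds of admission. Here rank2, rank3 and rank4 are 0/1 indicators, with rank 1 as the baseline. The model is linear on the log-odds scale, so raising one predictor by one unit with the others fixed changes the log odds by exactly its coefficient. The second line works this through for gpa:

```latex
\begin{aligned}
\log\!\left(\frac{\hat p}{1-\hat p}\right)
  &= -3.990 + 0.002264\,\mathrm{gre} + 0.804\,\mathrm{gpa}
     - 0.675\,\mathrm{rank2} - 1.340\,\mathrm{rank3} - 1.551\,\mathrm{rank4},\\
\Delta_{\mathrm{gpa}\to\mathrm{gpa}+1} \log\!\left(\frac{\hat p}{1-\hat p}\right)
  &= 0.804\,(\mathrm{gpa}+1) - 0.804\,\mathrm{gpa} = 0.804.
\end{aligned}
```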
For every one-unit increase in gre, the log odds of admission
(versus non-admission) increase by 0.002.
For a one-unit increase in gpa, the log odds of being
admitted to graduate school increase by 0.804.
Having attended an undergraduate institution with a rank of
2, versus an institution with a rank of 1, changes the log
odds of admission by -0.675.
Having attended an undergraduate institution with a rank of
3, versus an institution with a rank of 1, changes the log
odds of admission by -1.34.
Having attended an undergraduate institution with a rank of
4, versus an institution with a rank of 1, changes the log
odds of admission by -1.55.
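These effects are often easier to read on the odds scale, because exponentiating a log-odds coefficient gives a multiplicative change in the odds. The sketch below hard-codes the estimates printed above so that it runs without the data. With the fitted object available, exp(coef(logit)) should give the same values directly. The 100-point gre contrast is an illustrative choice, not part of the original analysis.

```r
# Coefficient estimates copied by hand from summary(logit) (log-odds scale).
est <- c(gre = 0.002264, gpa = 0.804038,
         rank2 = -0.675443, rank3 = -1.340204, rank4 = -1.551464)

# exp(beta) is the factor by which the odds of admission are multiplied
# for a one-unit increase in the predictor (or for the given rank versus
# rank 1), holding the other predictors fixed.
odds_ratio <- exp(est)
print(round(odds_ratio, 3))

# gre moves in large steps, so a 100-point increase is easier to read:
# the odds are multiplied by exp(100 * beta_gre).
or_gre_100 <- exp(100 * est[["gre"]])
print(round(or_gre_100, 3))
```

For example, each additional gpa point multiplies the odds of admission by roughly 2.2. Odds ratios below 1 for rank2 to rank4 mean lower odds than for rank 1.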
We can use the confint function to obtain confidence intervals for the coefficients.
confint(logit) # Confidence intervals using the profiled log-likelihood.
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) -6.2716202334 -1.792547080
gre 0.0001375921 0.004435874
gpa 0.1602959439 1.464142727
rank2 -1.3008888002 -0.056745722
rank3 -2.0276713127 -0.670372346
rank4 -2.4000265384 -0.753542605
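For comparison, the profile-likelihood intervals above can be set against Wald intervals, computed as estimate ± 1.96 × SE. Wald intervals rest on the same normal approximation that produces the z values and p-values in the summary table. The sketch below uses the printed estimates and standard errors, so it runs without the data. With the fitted object, confint.default(logit) returns Wald intervals directly.

```r
# Estimates and standard errors copied by hand from summary(logit).
est <- c(`(Intercept)` = -3.989979, gre = 0.002264, gpa = 0.804038,
         rank2 = -0.675443, rank3 = -1.340204, rank4 = -1.551464)
se  <- c(`(Intercept)` = 1.139951, gre = 0.001094, gpa = 0.331819,
         rank2 = 0.316490, rank3 = 0.345306, rank4 = 0.417832)

# Wald z statistic and two-sided p-value, as reported in the summary table.
z <- est / se
p <- 2 * pnorm(-abs(z))

# 95% Wald interval: estimate +/- z_{0.975} * SE (z_{0.975} is about 1.96).
zcrit <- qnorm(0.975)
wald_ci <- cbind(`2.5 %` = est - zcrit * se, `97.5 %` = est + zcrit * se)

print(round(cbind(z = z, p = p), 6))
print(wald_ci)
```

For gpa this gives roughly (0.154, 1.454), close to the profile interval (0.160, 1.464). The two differ slightly because a profile-likelihood interval is not forced to be symmetric about the estimate.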
We can also check how well the model fits. One measure of fit
is the significance of the overall model, assessed with a
likelihood-ratio test. This test asks whether the model with
predictors fits significantly better than a model with just an
intercept (the null model). The test statistic is the difference
between the null deviance and the residual deviance:
with(logit, null.deviance - deviance)
[1] 41.45903
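Under the null hypothesis that all non-intercept coefficients are
zero, this statistic approximately follows a chi-square
distribution. Its degrees of freedom equal the difference in
degrees of freedom between the two models (399 - 394 = 5).
Finally, the p-value can be obtained as: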
with(logit, pchisq(null.deviance - deviance, df.null - df.residual,
lower.tail = FALSE))
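[1] 7.578194e-08
The likelihood-ratio statistic of 41.46 on 5 degrees of freedom
gives a p-value of about 7.6e-08, far below 0.05. The model with
gre, gpa and rank therefore fits significantly better than the
intercept-only model.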
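As a check, the same test can be reproduced from the deviances printed by summary(logit), without refitting the model. The printed deviances are rounded to two decimals, so the statistic comes out as 41.46 rather than 41.45903. With the fitted object, anova(update(logit, . ~ 1), logit, test = "Chisq") should give the same comparison.

```r
# Deviances and degrees of freedom copied by hand from summary(logit).
null_dev  <- 499.98; null_df  <- 399
resid_dev <- 458.52; resid_df <- 394

# Likelihood-ratio statistic: drop in deviance relative to the intercept-only model.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df   # one df per non-intercept coefficient

# The upper-tail chi-square probability is the p-value.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The 5% critical value of the chi-square distribution, for comparison.
crit <- qchisq(0.95, df = lr_df)
cat(sprintf("LR = %.2f on %d df, p = %.3g, 5%% critical value = %.2f\n",
            lr_stat, as.integer(lr_df), p_value, crit))
```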