Example sheet 4

1a.
    data <- read.table("salary.txt", sep = "", header = FALSE)
    x <- data$V1
    y <- data$V2
    plot(data)
This produces a scatter plot. Adding type="l" joins the points in the order they appear in the file rather than in order of x, so it gives a tangle of lines unless the data are sorted by x first. The plot shows a roughly straight line. (???)
Transformed plots, with the response on the vertical axis to match the models fitted below:
    plot(x^2, y)
    plot(x, log(y))
    plot(x^2, log(y))

2a.
    fit <- lm(y ~ 1 + I(x^2), data = data)
Remember to wrap x^2 in I(): inside a formula ^ is a formula operator, and I() tells R to do the arithmetic in the brackets instead.
The fitted equation is y = 0.06907(x^2) + 23.50169

2b. Test if the slope is equal to 0.
    summary(fit)
H0: Slope = 0. HA: Slope not equal to 0.
Significance level = 0.05
Slope = 0.06907
Standard error = 0.004784
Degrees of freedom = n-2 = 48
Test statistic (t value) = slope/standard error = 0.06907/0.004784 = 14.44
P value < 2e-16
The P value is less than the significance level, so reject the null hypothesis. There is evidence that the slope does not equal 0.

2c. Test if the intercept is equal to 0.
    summary(fit)
H0: Intercept = 0. HA: Intercept not equal to 0.
Significance level = 0.05
Intercept = 23.50169
Standard error = 2.173313
Degrees of freedom = n-2 = 48
Test statistic (t value) = intercept/standard error = 23.50169/2.173313 = 10.81
P value = 1.84e-14
The P value is less than the significance level, so reject the null hypothesis. There is evidence that the intercept does not equal 0.
(Additional help: http://stattrek.com/regression/slope-test.aspx)

2d. Find an estimate for the variance of the errors.
    (summary(fit)$sigma)^2 = 65.59568
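The t values and P values in 2b and 2c can be checked by hand. The sketch below uses only the estimates and standard errors quoted in the notes, so it runs without salary.txt. The variable names are made up for illustration. Because the inputs are rounded, the P values will only approximately match the summary(fit) output.

```r
# Values copied from the summary(fit) output quoted in 2b and 2c.
slope     <- 0.06907
se_slope  <- 0.004784
intercept <- 23.50169
se_int    <- 2.173313
df        <- 48            # n - 2, as stated in the notes

# t statistic = estimate / standard error
t_slope <- slope / se_slope
t_int   <- intercept / se_int

# Two-sided P value from the t distribution with df degrees of freedom
p_slope <- 2 * pt(-abs(t_slope), df)
p_int   <- 2 * pt(-abs(t_int), df)

# With a single predictor, R-squared can be recovered from the slope t value
r_squared <- t_slope^2 / (t_slope^2 + df)

print(round(c(t_slope = t_slope, t_int = t_int, r_squared = r_squared), 4))
print(c(p_slope = p_slope, p_int = p_int))
```

The last identity, R-squared = t^2/(t^2 + df), only holds for a regression with one predictor. It is a quick way to check that the R-squared read off the output is consistent with the slope test.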
2e. The proportion of the total variation explained by the regression is Multiple R-squared = 0.8128. It is close to 1, so the linear regression fits well.

2f. ???

2g. ???

2hi.
    plot(fitted(fit), residuals(fit))
2hii.
    plot(x, residuals(fit))
Assuming h asks for residual plots: the residuals go on the vertical axis, and the x values are in x (that is, data$V1), since data has no column called x. If the model is adequate, both plots should show the residuals scattered randomly about 0 with roughly constant spread.

3a.
    fit2 <- lm(log(y) ~ 1 + x, data = data)
The fitted equation is log(y) = 0.04998x + 2.93357

3b. Test if the slope is equal to 0.
    summary(fit2)
H0: Slope = 0. HA: Slope not equal to 0.
Significance level = 0.05
Slope = 0.04998
Standard error = 0.002868
Degrees of freedom = n-2 = 48
Test statistic (t value) = 0.04998/0.002868 = 17.43
P value < 2e-16
The P value is less than the significance level, so reject the null hypothesis. There is evidence that the slope does not equal 0.

3c. Test if the intercept is equal to 0.
    summary(fit2)
H0: Intercept = 0. HA: Intercept not equal to 0.
Significance level = 0.05
Intercept = 2.93357
Standard error = 0.056355
Degrees of freedom = n-2 = 48
Test statistic (t value) = 2.93357/0.056355 = 52.05
P value < 2e-16
The P value is less than the significance level, so reject the null hypothesis. There is evidence that the intercept does not equal 0.

3d. Find an estimate for the variance of the errors.
    (summary(fit2)$sigma)^2 = 0.02375086
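The variance estimate in 3d is the residual sum of squares divided by n-2. The sketch below checks this on simulated data, not the salary data, and the names sim and fit_sim are made up for illustration. It also prints the residual standard error next to R-squared, because the two sit on neighbouring lines of the summary() output and are easy to mix up.

```r
set.seed(1)                          # simulated data, NOT the salary data
x <- runif(50, 0, 40)
y <- exp(3 + 0.05 * x + rnorm(50, sd = 0.15))
sim <- data.frame(x = x, y = y)

fit_sim <- lm(log(y) ~ x, data = sim)

# Estimated error variance: residual sum of squares over residual df (n - 2)
rss        <- sum(residuals(fit_sim)^2)
sigma2_hat <- rss / df.residual(fit_sim)

# The same quantity as reported by summary()
sigma2_summary <- summary(fit_sim)$sigma^2

# Coefficient table: estimate, standard error, t value, P value
coef_table <- coef(summary(fit_sim))
print(coef_table)
print(c(by_hand = sigma2_hat, from_summary = sigma2_summary))

# Residual standard error and R-squared are different quantities
print(c(resid_se  = summary(fit_sim)$sigma,
        r_squared = summary(fit_sim)$r.squared))
```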
3e. The proportion of the total variation explained is Multiple R-squared. The 0.1541 noted here is the residual standard error (sqrt(0.02375086) = 0.1541), not R-squared. From the slope t value in 3b, R-squared = t^2/(t^2 + 48) = 17.43^2/(17.43^2 + 48) = 0.864 approximately (check against summary(fit2)). This is close to 1, so the linear regression fits well.

3f, 3g, 3h ???

4?

Example sheet 5

1a.
    data <- read.table("brainweight.txt", header = TRUE, sep = ",", row.names = 1)
    plot(data)
The relationship is not clear from this plot. If we use the transformation
    plot(log(data))
we can see there is a strong positive linear relationship.

1b.
    fit <- lm(log(brain) ~ 1 + log(body), data = data)
The fitted equation is log(y) = 0.7626 log(x) + 2.0918
    summaryfit <- summary(fit)
    (summaryfit$sigma)^2
Estimate of the variance of the errors = 0.5332777

1c.
    anova(fit)
Analysis of Variance Table
Response variable is log(brain)

               Df   Sum squares   Mean squares   F value   P value
    log(body)   1       333.1        333.10       624.63   <2.2e-16
    Residuals  60        32            0.53
We have SSreg/(SSreg+SSerror) = 333.1/(333.1+32) = 0.9123528, which is close to 1, so the linear regression fits well.
H0: beta = 0 and HA: beta not equal to 0. Let alpha = 0.05. P value < alpha, therefore we reject the null hypothesis: there is evidence of a linear relationship between log body weight and log brain weight.

1d. R squared = 0.9124, the same ratio as in 1c, which is close to 1, so a good fit by linear regression. (The 0.7303 in the summary output is the residual standard error, sqrt(0.5332777) = 0.7303, not R squared.) Use graphs to comment on the significance of the model parameters?

1ei.
    newdata <- data.frame(body = 3)
    predict(fit, newdata, interval = "confidence", level = 0.95)
Confidence interval is 2.743277 to 3.115831. This is on the log(brain) scale; apply exp() to get back to the original units. Which interval to use depends on the question: "confidence" is for the mean response at that body weight, "prediction" is for a single new observation, and "none" gives only the fitted value.

1eii.
    newdata <- data.frame(body = 450)
    predict(fit, newdata, interval = "confidence", level = 0.95)
Confidence interval is 6.407064 to 7.093832, again on the log(brain) scale. Don't know if this is correct ???

2. ???
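For the interval question in 1e, the sketch below compares the two interval types. It uses simulated data, not brainweight.txt, and the coefficients and the names sim, fit_sim, conf_int and pred_int are made up for illustration. The confidence interval is for the mean log(brain) at a given body weight, and the prediction interval is for one new animal, so it is always wider.

```r
set.seed(2)                                   # simulated data, NOT brainweight.txt
body  <- exp(rnorm(62, mean = 2, sd = 3))
brain <- exp(2 + 0.75 * log(body) + rnorm(62, sd = 0.7))
sim   <- data.frame(body = body, brain = brain)

fit_sim <- lm(log(brain) ~ log(body), data = sim)
newdata <- data.frame(body = c(3, 450))       # column name must match the formula

# Interval for the MEAN log(brain) at each body weight
conf_int <- predict(fit_sim, newdata, interval = "confidence", level = 0.95)
# Interval for ONE new observation of log(brain) at each body weight
pred_int <- predict(fit_sim, newdata, interval = "prediction", level = 0.95)

# Both are on the log scale; exp() returns them to the original units
print(exp(conf_int))
print(exp(pred_int))
```

Both calls return a matrix with columns fit, lwr and upr. The fit column is the same in each, and only the width of the interval changes.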