Solution Question Bank (Unit-3)
Contents: Measures of central tendency, Skewness, Kurtosis, Curve Fitting, Method of least squares,
fitting of straight lines, fitting of second-degree parabola, Exponential curves, Correlation and Rank
correlation, Regression Analysis: Regression lines of y on x and x on y, regression coefficients, properties of
regression coefficients and nonlinear regression.
Course Outcome (CO3): Understand basic statistical concepts such as moments, skewness, kurtosis, curve
fitting, correlation and regression.
Problem 1:
A cooperative bank has two branches employing 50 and 70 workers respectively. The average salaries paid
by two respective branches are Rs. 360 and Rs. 390 per month. Calculate the mean of the salaries of all the
employees.
Solution:
To calculate the mean salary of all employees, we use the weighted average formula:
$$\text{Weighted Mean} = \frac{\sum (\text{Group Size} \times \text{Group Average})}{\text{Total Group Size}}$$
Given:
• Branch 1: 50 workers, average salary = Rs. 360/month
• Branch 2: 70 workers, average salary = Rs. 390/month
Total Workers:
50 + 70 = 120
Total Salary:
Total Salary = (50 × 360) + (70 × 390)
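Calculations:
50 × 360 = 18,000
70 × 390 = 27,300
Total Salary = 18,000 + 27,300 = 45,300
Mean Salary:
$$\text{Mean Salary} = \frac{\text{Total Salary}}{\text{Total Workers}} = \frac{45{,}300}{120} = 377.5$$
Final Answer: The mean salary of all the employees is Rs. 377.50 per month.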
Problem:2
Find the median of the dataset: 6, 8, 9, 10, 11, 12, 13
Solution:
1. Arrange the numbers in ascending order:
6, 8, 9, 10, 11, 12, 13
2. Count the total number of data points: The total number of data points (n) is 7, which is odd.
3. Find the position of the median: For an odd number of data points, the median is the value at the
position:
$$\text{Median Position} = \frac{n+1}{2}$$
Substituting n = 7:
$$\text{Median Position} = \frac{7+1}{2} = 4$$
4. The 4th number in the dataset is 10.
Final Answer: The median of the dataset is: 10
Problem:3
Find the mode of the following marks obtained by 15 students:
4, 6, 5, 7, 9, 8, 10, 4, 7, 6, 5, 8, 7, 7, 9
Solution:
1. Arrange the data and count the frequency of each number:
4 : 2 times
5 : 2 times
6 : 2 times
7 : 4 times
8 : 2 times
9 : 2 times
10 : 1 time
2. Identify the mode: The mode is the number that appears the most frequently. Here, 7 appears 4 times,
which is more than any other number.
Final Answer: The mode of the given data is:7
Problem:4
Find the arithmetic mean of the following distribution.
x 1 2 3 4 5 6 7
f 5 9 12 17 14 10 6
Solution:
The formula for the arithmetic mean is:
$$\text{Arithmetic Mean} = \frac{\sum f_i x_i}{\sum f_i}$$
where:
• fi is the frequency of each observation.
• xi is the value of each observation.
Step 1: Calculate fi xi for each xi
xi fi fi xi
1 5 5
2 9 18
3 12 36
4 17 68
5 14 70
6 10 60
7 6 42
$$\sum f_i = 5 + 9 + 12 + 17 + 14 + 10 + 6 = 73$$
$$\sum f_i x_i = 5 + 18 + 36 + 68 + 70 + 60 + 42 = 299$$
Step 2: Compute the Arithmetic Mean
$$\text{Arithmetic Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{299}{73} \approx 4.10$$
Final Answer: The arithmetic mean is approximately: 4.10
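The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `frequency_mean` is a hypothetical helper introduced here for illustration, and the data are the x and f rows of Problem 4.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f


# Data of Problem 4
x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]

mean = frequency_mean(x, f)
print(f"sum f = {sum(f)}, mean = {mean:.2f}")
```

The same helper also reproduces Problem 1 if the branch averages are passed as `values` and the branch sizes as `freqs`.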
Problem:5
In an asymmetrical distribution, the mean is 16 and the median is 20. Calculate the mode of the distribution.
Solution:
The empirical relationship between the mean, median, and mode is given by:
Mode = 3 × Median − 2 × Mean
Given:
Mean = 16, Median = 20
Substitute the values:
Mode = 3 × 20 − 2 × 16
Mode = 60 − 32 = 28
Final Answer: The mode of the distribution is:28
Problem:6
The first three central moments of a distribution are given as µ1 = 0, µ2 = 15 and µ3 = −31. Find the moment
coefficient of skewness.
Solution:
The formula for the moment coefficient of skewness is:
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$$
Given:
µ1 = 0, µ2 = 15, µ3 = −31
Compute γ1 :
$$\gamma_1 = \frac{-31}{\sqrt{15^3}} = \frac{-31}{58.09} \approx -0.53$$
Final Answer: The moment coefficient of skewness is approximately:
γ1 ≈ −0.53 (the distribution is negatively skewed)
Problem:7
The first two moments of a distribution about the value ‘2’ of the variable are 1, 16. Show that mean is 3,
and variance is 15.
Solution:
We use the relationships between the moments about a point a (µ′1 and µ′2 ) and the mean and variance:
1. The mean (x̄) is given by:
$$\bar{x} = \mu'_1 + a$$
2. The variance (σ²) is the second central moment:
$$\sigma^2 = \mu_2 = \mu'_2 - (\mu'_1)^2$$
Here a = 2, µ′1 = 1 and µ′2 = 16.
Step 1: Find the Mean (x̄) Using the formula for the mean:
$$\bar{x} = \mu'_1 + a = 1 + 2 = 3$$
Step 2: Find the Variance (σ²) Using the formula for the variance:
$$\sigma^2 = \mu'_2 - (\mu'_1)^2 = 16 - (1)^2 = 15$$
Final Answer:
• Mean (x̄) = 3
• Variance (σ 2 ) = 15
Problem:8
The fourth central moment is µ4 = 48. What must be its standard deviation (σ) in order for the distribution
to be mesokurtic?
Solution:
The kurtosis (β2 ) is given as:
$$\beta_2 = \frac{\mu_4}{\sigma^4}$$
where:
• µ4 is the fourth central moment.
• σ is the standard deviation.
Step 1: Substitute the known values: For a mesokurtic distribution, β2 = 3, and µ4 = 48. Substituting
these into the formula:
$$3 = \frac{48}{\sigma^4}$$
Step 2: Solve for σ⁴:
$$\sigma^4 = \frac{48}{3} = 16$$
Step 3: Solve for σ: Taking the fourth root (or square root twice) of both sides:
$$\sigma = \sqrt[4]{16} = \sqrt{4} = 2$$
Problem:9
Write the normal equations to fit the curve y = ax2 + b by the method of least squares.
Solution:
To fit the curve y = ax² + b using the least squares method, we minimize the sum of squared residuals.
Step 1: Define the Error The error for each data point (xi , yi ) is:
$$e_i = y_i - (a x_i^2 + b)$$
and the quantity to be minimized is the sum of squared errors, $S = \sum e_i^2$.
Step 2: Minimize S To find the values of a and b, we set the partial derivatives of S with respect to a
and b to zero:
$$\frac{\partial S}{\partial a} = 0 \quad\text{and}\quad \frac{\partial S}{\partial b} = 0$$
Step 3: Derive the normal equations After differentiation, the normal equations are:
1. $$\sum y_i = a \sum x_i^2 + nb$$
2. $$\sum x_i^2 y_i = a \sum x_i^4 + b \sum x_i^2$$
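The differentiation step that leads to these two normal equations can be written out explicitly. The following is a short worked derivation under the definition $S = \sum (y_i - a x_i^2 - b)^2$ (sum of squared errors over the n data points); it adds no new result beyond the equations stated above.

```latex
\begin{align*}
S &= \sum_{i=1}^{n} \left(y_i - a x_i^2 - b\right)^2 \\[4pt]
\frac{\partial S}{\partial b} &= -2 \sum \left(y_i - a x_i^2 - b\right) = 0
  &&\Longrightarrow\quad \sum y_i = a \sum x_i^2 + n b \\[4pt]
\frac{\partial S}{\partial a} &= -2 \sum x_i^2 \left(y_i - a x_i^2 - b\right) = 0
  &&\Longrightarrow\quad \sum x_i^2 y_i = a \sum x_i^4 + b \sum x_i^2
\end{align*}
```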
Problem:10
Write the formula for Karl Pearson’s correlation coefficient and state the range of the correlation coefficient.
Solution:
Karl Pearson Correlation Coefficient Formula:
The Karl Pearson correlation coefficient (r) is given by the formula:
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$
where:
• xi and yi are the individual data points of the two variables X and Y ,
• x̄ and ȳ are the means of the variables X and Y , respectively.
Range of the Correlation Coefficient:
The value of the correlation coefficient r lies between -1 and +1, inclusive:
−1 ≤ r ≤ 1
Problem:11
If the covariance between variables x and y is 10, and the variances of x and y are 16 and 9 respectively,
find the coefficient of correlation.
Solution:
The formula for the coefficient of correlation (r) is:
$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
where:
• cov(x, y) is the covariance between x and y,
• σx and σy are the standard deviations of x and y, respectively.
Given:
• cov(x, y) = 10,
• σx² = 16, so σx = √16 = 4,
• σy² = 9, so σy = √9 = 3.
Step 1: Apply the formula for correlation:
$$r = \frac{10}{4 \times 3} = \frac{10}{12} = \frac{5}{6}$$
Final Answer: The coefficient of correlation r is 5/6 ≈ 0.83.
Problem 12:
The lines of regression of y on x and x on y are respectively
$$y = x + 5 \quad\text{and}\quad 16x - 9y = 94.$$
Find the coefficient of correlation.
Solution
To calculate the correlation coefficient (r) between x and y, we use the slopes of the lines of regression.
Given:
• Line of regression of y on x: y = x + 5
Slope of this line (byx ) = 1.
• Line of regression of x on y: 16x − 9y = 94
Rewrite it as $x = \frac{9}{16}y + \frac{94}{16}$, so the slope (bxy ) = 9/16.
Since r² = byx · bxy and both regression coefficients are positive, r is positive:
$$r = \sqrt{1 \times \frac{9}{16}} = \frac{3}{4}$$
Final Answer:
$$r = \frac{3}{4}$$
Problem:13
If the regression coefficients are byx = 0.8 and bxy = 0.2, find the value of the coefficient of correlation (r).
Solution:
The formula for the coefficient of correlation is:
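$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
Given:
byx = 0.8, bxy = 0.2
Substitute these values into the formula:
$$r = \pm\sqrt{0.8 \times 0.2} = \pm\sqrt{0.16} = \pm 0.4$$
The sign of r is the same as the common sign of the regression coefficients. Both are positive here, so
we take the positive root.
Final Answer:
r = 0.4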
Problem:14
If the regression coefficients are byx = 0.8 and bxy = 0.8, find the value of the coefficient of correlation (r).
Solution:
The formula for the coefficient of correlation is:
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
Given:
byx = 0.8, bxy = 0.8
Substitute these values into the formula:
$$r = \pm\sqrt{0.8 \times 0.8} = \pm\sqrt{0.64} = \pm 0.8$$
The sign of r is the same as the common sign of the regression coefficients. Both are positive here, so
we take the positive root.
Final Answer:
r = 0.8
Problem:15
What is the relation between the regression coefficients and the coefficient of Correlation?
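Formula:
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
That is, the coefficient of correlation is the geometric mean of the two regression coefficients, and it
takes the same sign as the regression coefficients (which always share a common sign).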
Problem:16
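Write the normal equations to fit a curve $y = \frac{c_0}{x} + c_1\sqrt{x}$.
Solution:
We aim to fit a curve of the form:
$$y = \frac{c_0}{x} + c_1\sqrt{x}$$
Using the method of least squares, we minimize the sum of squared residuals:
$$S = \sum \left( y_i - \frac{c_0}{x_i} - c_1\sqrt{x_i} \right)^2$$
By differentiating this expression with respect to c0 and c1 , and setting the derivatives to zero, we obtain
the following normal equations:
First Normal Equation (with respect to c0 ):
$$\sum \frac{1}{x_i}\left( y_i - \frac{c_0}{x_i} - c_1\sqrt{x_i} \right) = 0$$
Simplifying this:
$$\sum \frac{y_i}{x_i} = c_0 \sum \frac{1}{x_i^2} + c_1 \sum \frac{1}{\sqrt{x_i}}$$
Second Normal Equation (with respect to c1 ):
$$\sum \sqrt{x_i}\left( y_i - \frac{c_0}{x_i} - c_1\sqrt{x_i} \right) = 0$$
Simplifying this:
$$\sum y_i\sqrt{x_i} = c_0 \sum \frac{1}{\sqrt{x_i}} + c_1 \sum x_i$$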
Problem:17
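Write the formula for rank correlation in the case of tied ranks.
Solution:
$$r_s = 1 - \frac{6\left[\sum d_i^2 + \sum \dfrac{m_i(m_i^2 - 1)}{12}\right]}{n(n^2 - 1)}$$
Where:
di is the difference between the two ranks of the i-th item, n is the number of pairs, and
mi is the number of times a rank is repeated (one correction term is added for each group of tied ranks).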
Question Description (7 Marks)
Problem:1
Calculate the first four central moments and also comment upon Skewness and Kurtosis from the following
data:
Solution:
Calculation of Central Moments, Skewness, and Kurtosis
Given Data:
Step 3: Calculate the Central Moments
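First Central Moment (always zero about the mean):
$$\mu_1 = 0$$
Second Central Moment (Variance):
$$\mu_2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$$
Third Central Moment:
$$\mu_3 = \frac{\sum f(x - \bar{x})^3}{\sum f}$$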
Now, calculate the sum:
$$\sum f(x - \bar{x})^3 = -4096 - 864 + 192 + 5488 = 720$$
Fourth Central Moment:
$$\mu_4 = \frac{\sum f(x - \bar{x})^4}{\sum f}$$
Problem 2:
Calculate the first four central moments about the mean, Skewness, and Kurtosis for the following data
(2021-22):
x 0 1 2 3 4 5 6 7 8
f 1 8 28 56 70 56 28 8 1
Solution:
Step 1: Calculate the Mean (x̄)
The mean (x̄) is calculated by:
$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$
x f f ·x
0 1 0
1 8 8
2 28 56
3 56 168
4 70 280
5 56 280
6 28 168
7 8 56
8 1 8
Total 256 1024
$$\bar{x} = \frac{1024}{256} = 4$$
Step 2: Calculate the Central Moments
The central moments are calculated using the formula:
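$$\mu_r = \frac{\sum f(x - \bar{x})^r}{\sum f}$$
Second Central Moment (Variance):
$$\mu_2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$$
Third Central Moment:
$$\mu_3 = \frac{\sum f(x - \bar{x})^3}{\sum f}$$
Fourth Central Moment:
$$\mu_4 = \frac{\sum f(x - \bar{x})^4}{\sum f}$$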
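The moment formulas above can be evaluated numerically for the data of this problem. The sketch below is illustrative only: `central_moments` is a hypothetical helper name, and the skewness and kurtosis measures used are β1 = µ3²/µ2³ and β2 = µ4/µ2², as defined elsewhere in this question bank.

```python
def central_moments(values, freqs, max_order=4):
    """Return the mean and the central moments mu_1..mu_max_order
    of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    moments = {
        r: sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
        for r in range(1, max_order + 1)
    }
    return mean, moments


# Data of this problem
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
f = [1, 8, 28, 56, 70, 56, 28, 8, 1]

mean, mu = central_moments(x, f)
beta1 = mu[3] ** 2 / mu[2] ** 3   # skewness measure
beta2 = mu[4] / mu[2] ** 2        # kurtosis measure

print("mean =", mean)
print("mu2, mu3, mu4 =", mu[2], mu[3], mu[4])
print("beta1 =", beta1, "beta2 =", beta2)
```

For this data the sketch gives µ2 = 2, µ3 = 0 and µ4 = 11, so β1 = 0 (symmetric) and β2 = 2.75 (slightly platykurtic, since β2 < 3).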
Problem 3:
Compute Skewness and Kurtosis, if the first four moments of a frequency distribution about the value 4 of
the variable are 1, 4, 10, and 45.
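Solution:
We are given the first four moments about the value A = 4:
µ′1 = 1, µ′2 = 4, µ′3 = 10, µ′4 = 45
The central moments follow from:
$$\mu_2 = \mu'_2 - (\mu'_1)^2, \quad \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3, \quad \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$
1. Second Central Moment (µ2 ):
µ2 = 4 − (1)² = 3
2. Third Central Moment (µ3 ):
µ3 = 10 − 3(4)(1) + 2(1)³
µ3 = 10 − 12 + 2 = 0
3. Fourth Central Moment (µ4 ):
µ4 = 45 − 4(10)(1) + 6(4)(1)² − 3(1)⁴ = 45 − 40 + 24 − 3 = 26
Skewness and Kurtosis:
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 0, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{26}{9} \approx 2.89$$
Final Results
• Second Central Moment (µ2 ): 3
• Third Central Moment (µ3 ): 0
• Fourth Central Moment (µ4 ): 26
• Skewness (γ1 ): 0 (symmetric distribution)
• Kurtosis (β2 ): 2.89 (slightly less than 3, so the distribution is slightly platykurtic)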
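The conversion from moments about an arbitrary value A to central moments is the same in Problems 3 to 6, so it can be sketched once as a small function. The name `central_from_shifted` is a hypothetical helper introduced for illustration. It implements only the three conversion formulas quoted in this solution, and is checked here against the moments 1, 4, 10, 45 given in the problem statement.

```python
def central_from_shifted(m1, m2, m3, m4):
    """Convert the first four moments about an arbitrary value A
    (m1..m4) into the central moments mu2, mu3, mu4."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


# Moments about A = 4 from Problem 3
mu2, mu3, mu4 = central_from_shifted(1, 4, 10, 45)
gamma1 = mu3 / mu2 ** 1.5     # moment coefficient of skewness
beta2 = mu4 / mu2 ** 2        # kurtosis

print("mu2, mu3, mu4 =", mu2, mu3, mu4)
print("gamma1 =", gamma1, "beta2 =", round(beta2, 2))
```

Note that the value A itself is not needed for the central moments; it only enters the mean, x̄ = A + µ′1.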
Problem 4:
The first four moments of a frequency distribution about the value 4 of the variable are −1.5, 17, −30 and
80. Find µ1 , µ2 , µ3 and µ4 about the mean. Also find β1 and β2 .
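Solution
We are given the first four moments about the value A = 4:
µ′1 = −1.5, µ′2 = 17, µ′3 = −30, µ′4 = 80
The formulae for central moments (µr ) in terms of moments about A (µ′r ) are:
$$\mu_1 = 0, \quad \mu_2 = \mu'_2 - (\mu'_1)^2, \quad \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3, \quad \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$
Step 1: Central Moments
µ2 = 17 − (−1.5)² = 17 − 2.25 = 14.75
µ3 = −30 − 3(17)(−1.5) + 2(−1.5)³ = −30 + 76.5 − 6.75 = 39.75
µ4 = 80 − 4(−30)(−1.5) + 6(17)(−1.5)² − 3(−1.5)⁴ = 80 − 180 + 229.5 − 15.1875 = 114.3125
Step 2: Skewness and Kurtosis
Skewness (β1 ):
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
Substitute µ3 = 39.75 and µ2 = 14.75:
$$\beta_1 = \frac{(39.75)^2}{(14.75)^3} = \frac{1580.0625}{3209.0469} \approx 0.4924$$
Kurtosis (β2 ):
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
Substitute µ4 = 114.3125 and µ2 = 14.75:
$$\beta_2 = \frac{114.3125}{(14.75)^2} = \frac{114.3125}{217.5625} \approx 0.53$$
Final Results
• First Central Moment (µ1 ): 0
• Second Central Moment (µ2 ): 14.75
• Third Central Moment (µ3 ): 39.75
• Fourth Central Moment (µ4 ): 114.3125
• β1 ≈ 0.4924, β2 ≈ 0.53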
Problem 5:
The first four moments of a frequency distribution about the value 2 of the variable are 2, 20, 40 and 50
respectively. Comment upon the skewness and kurtosis of the distribution.
Solution
Analysis of Skewness and Kurtosis
The first four moments about A = 2 are given as:
µ′1 = 2, µ′2 = 20, µ′3 = 40, µ′4 = 50
Mean (x̄)
The mean of the distribution is:
$$\bar{x} = A + \mu'_1 = 2 + 2 = 4.$$
Central Moments
The central moments are calculated using the formula (with µ′0 = 1):
$$\mu_n = \sum_{k=0}^{n} \binom{n}{k} \mu'_k \,(A - \bar{x})^{n-k} = \sum_{k=0}^{n} \binom{n}{k} \mu'_k \,(-\mu'_1)^{n-k}.$$
This gives:
µ2 = µ′2 − (µ′1 )² = 20 − 4 = 16
µ3 = µ′3 − 3µ′2 µ′1 + 2(µ′1 )³ = 40 − 120 + 16 = −64
µ4 = µ′4 − 4µ′3 µ′1 + 6µ′2 (µ′1 )² − 3(µ′1 )⁴ = 50 − 320 + 480 − 48 = 162
Skewness (γ1 )
Skewness is calculated as:
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{-64}{(16)^{3/2}} = \frac{-64}{64} = -1.$$
Interpretation: Since γ1 is negative, the distribution is negatively skewed.
Kurtosis (β2 )
Kurtosis is calculated as:
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{162}{16^2} = \frac{162}{256} \approx 0.633.$$
Excess kurtosis is:
Excess Kurtosis = β2 − 3 = 0.633 − 3 = −2.367.
Interpretation: The negative excess kurtosis indicates that the distribution is platykurtic (flatter than a
normal distribution).
Conclusion
• The distribution is negatively skewed (γ1 = −1).
• The distribution is platykurtic (β2 ≈ 0.633 < 3), meaning it has lighter tails and a flatter peak compared
to a normal distribution.
Problem 6:
The first four moments of a frequency distribution about the value 5 of the variable are 1, 2.5, 5.5 and 16
respectively. Find the four central moments, the moments about the origin and the coefficient of skewness.
Solution:
Given:
µ′1 = 1,
µ′2 = 2.5,
µ′3 = 5.5,
µ′4 = 16.
M1 = x̄ = A + µ′1 = 5 + 1 = 6.
µ1 = 0,
µ2 = µ′2 − (µ′1 )2 ,
µ3 = µ′3 − 3µ′2 µ′1 + 2(µ′1 )3 ,
µ4 = µ′4 − 4µ′3 µ′1 + 6µ′2 (µ′1 )2 − 3(µ′1 )4 .
Substitute the values:
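1. µ2 = µ′2 − (µ′1 )² = 2.5 − 1² = 2.5 − 1 = 1.5,
2. µ3 = µ′3 − 3µ′2 µ′1 + 2(µ′1 )³ = 5.5 − 3(2.5)(1) + 2(1)³ = 5.5 − 7.5 + 2 = 0,
3. µ4 = µ′4 − 4µ′3 µ′1 + 6µ′2 (µ′1 )² − 3(µ′1 )⁴ = 16 − 4(5.5)(1) + 6(2.5)(1)² − 3(1)⁴ = 16 − 22 + 15 − 3 = 6.
Thus, the central moments are:
µ1 = 0, µ2 = 1.5, µ3 = 0, µ4 = 6.
Step 2: Moments About the Origin
With x̄ = 6, the moments about the origin (M1 to M4 ) follow from the central moments:
M1 = x̄ = 6,
M2 = µ2 + x̄² = 1.5 + 36 = 37.5,
M3 = µ3 + 3µ2 x̄ + x̄³ = 0 + 27 + 216 = 243,
M4 = µ4 + 4µ3 x̄ + 6µ2 x̄² + x̄⁴ = 6 + 0 + 324 + 1296 = 1626.
Step 3: Coefficient of Skewness
The coefficient of skewness γ1 is given by:
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{0}{(1.5)^{3/2}} = 0.$$
Final Results
• Central Moments: µ1 = 0, µ2 = 1.5, µ3 = 0, µ4 = 6
• Moments About the Origin: M1 = 6, M2 = 37.5, M3 = 243, M4 = 1626
• Coefficient of Skewness: γ1 = 0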
Problem:7
Determine the Skewness and Kurtosis for the following data:
Solution
We are given the following frequency distribution, where xi is the midpoint of each class interval.
Steps 1 and 2: Tabulate the Data and Calculate the Mean
We first calculate fi xi :
Class Interval   fi   xi   fi xi
10 − 20          18   15    270
20 − 30          20   25    500
30 − 40          30   35   1050
40 − 50          22   45    990
50 − 60          10   55    550
Now, calculate the totals:
$$\sum f_i x_i = 270 + 500 + 1050 + 990 + 550 = 3360$$
$$\sum f_i = 18 + 20 + 30 + 22 + 10 = 100$$
Thus, the mean is:
$$\bar{x} = \frac{3360}{100} = 33.6$$
Step 3: Calculate the Central Moments
Second Central Moment (Variance) We now calculate (xi − x̄)²:
Class Interval   xi   fi   (xi − x̄)   (xi − x̄)²   fi (xi − x̄)²
10 − 20          15   18   −18.6       345.96       6227.28
20 − 30          25   20    −8.6        73.96       1479.20
30 − 40          35   30     1.4         1.96         58.80
40 − 50          45   22    11.4       129.96       2859.12
50 − 60          55   10    21.4       457.96       4579.60
The variance is:
$$\mu_2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i} = \frac{15204}{100} = 152.04$$
Thus, the standard deviation σ is:
$$\sigma = \sqrt{152.04} \approx 12.33$$
Third Central Moment We now calculate (xi − x̄)³:
Class Interval   xi   fi   (xi − x̄)   (xi − x̄)³   fi (xi − x̄)³
10 − 20          15   18   −18.6      −6434.856    −115827.408
20 − 30          25   20    −8.6       −636.056     −12721.120
30 − 40          35   30     1.4          2.744         82.320
40 − 50          45   22    11.4       1481.544      32593.968
50 − 60          55   10    21.4       9800.344      98003.440
The third central moment is:
$$\mu_3 = \frac{\sum f_i (x_i - \bar{x})^3}{\sum f_i} = \frac{2131.2}{100} = 21.312$$
Fourth Central Moment Squaring the (xi − x̄)² column and weighting by fi gives:
$$\sum f_i (x_i - \bar{x})^4 = 2154389.789 + 109401.632 + 115.248 + 371571.235 + 2097273.616 = 4732751.52$$
$$\mu_4 = \frac{4732751.52}{100} \approx 47327.52$$
Step 4: Calculate Skewness and Kurtosis
Skewness is calculated using the formula:
$$\text{Skewness} = \gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{21.312}{(152.04)^{3/2}} \approx \frac{21.312}{1874.7} \approx 0.011$$
Kurtosis is calculated using the formula:
$$\beta_2 = \frac{\mu_4}{\sigma^4} = \frac{47327.52}{(152.04)^2} = \frac{47327.52}{23116.16} \approx 2.05$$
so the excess kurtosis is β2 − 3 ≈ −0.95.
Final Results:
• Skewness ≈ 0.011 (the distribution is almost symmetric, with a very slight positive skew)
• Kurtosis β2 ≈ 2.05 (excess kurtosis ≈ −0.95, so the distribution is platykurtic)
Problem 8:
Find the coefficient of correlation from the following points of observation (1,3),(2,2),(3,5),(4,4),(5,6).
Solution:
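To find the coefficient of correlation r for the given points of observation, we use the Pearson correlation
coefficient formula:
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$
For the given points, n = 5, Σxi = 15, Σyi = 20, Σxi yi = 68, Σxi² = 55 and Σyi² = 90, so:
$$r = \frac{5 \times 68 - (15 \times 20)}{\sqrt{[5 \times 55 - (15)^2][5 \times 90 - (20)^2]}} = \frac{40}{\sqrt{50 \times 50}} = 0.8$$
Final Answer: The coefficient of correlation is r = 0.8.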
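The sums entering this formula are easy to get wrong by hand, so a quick numerical check can help. The sketch below applies the same computational formula to the five points of this problem. `pearson_r` is a hypothetical helper name used only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the computational (raw-sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Points of observation from this problem
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 5, 4, 6]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```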
Problem:9
A random sample of 5 college students is selected and their grades in Mathematics and Statistics are found
to be:
Student   Mathematics   Statistics
1         85            93
2         60            75
3         73            65
4         40            50
5         90            80
Calculate the rank correlation coefficient.
Solution:
Spearman’s Rank Correlation Coefficient
The formula for the rank correlation coefficient rs is:
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Where:
• n is the number of data points (in this case, n = 5),
• di is the difference between the ranks of corresponding values of Mathematics and Statistics for each student.
Step 1: Rank the Values for Each Subject
Ranks for Mathematics (X): 90 → Rank 1, 85 → Rank 2, 73 → Rank 3, 60 → Rank 4, 40 → Rank 5.
Thus, the ranks for Mathematics (in student order) are:
RankX = [2, 4, 3, 5, 1]
Ranks for Statistics (Y): 93 → Rank 1, 80 → Rank 2, 75 → Rank 3, 65 → Rank 4, 50 → Rank 5.
Thus, the ranks for Statistics (in student order) are:
RankY = [1, 3, 4, 5, 2]
Step 2: Calculate the Differences in Ranks and Square Them
Now, we calculate di = RankX − RankY and di²:
Student   RankX   RankY   di = RankX − RankY   di²
1         2       1        1                   1
2         4       3        1                   1
3         3       4       −1                   1
4         5       5        0                   0
5         1       2       −1                   1
so Σdi² = 4.
Step 3: Apply the Spearman Rank Correlation Formula
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{5(25 - 1)} = 1 - \frac{24}{120} = 0.8$$
Final Answer: The rank correlation coefficient is rs = 0.8.
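The ranking and the rank-difference formula can be sketched in Python as follows. This is a minimal illustration that assumes there are no tied scores (as in this problem). The helper names `ranks_descending` and `spearman_rs` are hypothetical.

```python
def ranks_descending(scores):
    """Rank 1 goes to the highest score. Assumes all scores are distinct."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]


def spearman_rs(xs, ys):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Grades of the five students in this problem
maths = [85, 60, 73, 40, 90]
stats = [93, 75, 65, 50, 80]

print("RankX =", ranks_descending(maths))
print("RankY =", ranks_descending(stats))
rs = spearman_rs(maths, stats)
print(f"rs = {rs:.2f}")
```

With tied scores the ranks would have to be averaged and the tie-corrected formula used instead; that case is not handled by this sketch.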
Problem:10
Calculate the coefficient of correlation for the following heights (in inches) of fathers (X) and their sons (Y ):
Solution:
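The Pearson correlation coefficient r is given by the formula:
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$
Where:
• n is the number of data points,
• xi and yi are the individual data points for fathers' and sons' heights, respectively.
We are given the data for 8 father-son pairs, so n = 8.
Step 1: Calculate the Required Sums
1. Sum of xi and sum of yi :
$$\sum x_i = 65 + 66 + 67 + 67 + 68 + 69 + 70 + 72 = 544$$
$$\sum y_i = 67 + 68 + 65 + 68 + 72 + 72 + 69 + 71 = 552$$
2. Sums of squares and products (pairs taken in the order listed above):
$$\sum x_i^2 = 37028, \qquad \sum y_i^2 = 38132, \qquad \sum x_i y_i = 37560$$
Step 2: Apply the Formula for Pearson's Correlation Coefficient
Now, substitute the values into the formula:
$$r = \frac{8 \times 37560 - 544 \times 552}{\sqrt{[8 \times 37028 - (544)^2][8 \times 38132 - (552)^2]}}$$
Step 3: Simplify the Expression
First, calculate the numerator: 300480 − 300288 = 192.
Then the two brackets in the denominator: 296224 − 295936 = 288 and 305056 − 304704 = 352.
$$r = \frac{192}{\sqrt{288 \times 352}} = \frac{192}{318.4} \approx 0.603$$
Final Answer: The coefficient of correlation is r ≈ 0.603.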
Question 11:
Fit a parabolic curve of second degree to the following data:
x y
0 1
1 1.8
2 1.3
3 2.5
4 6.3
Solution:
Step 1: Calculate the Necessary Summations
x     y      x²    x³    x⁴    x·y    x²·y
0     1      0     0     0     0      0
1     1.8    1     1     1     1.8    1.8
2     1.3    4     8     16    2.6    5.2
3     2.5    9     27    81    7.5    22.5
4     6.3    16    64    256   25.2   100.8
Σ 10  12.9   30    100   354   37.1   130.3
Step 2: Form the Normal Equations
The normal equations are:
$$\sum y = na + b\sum x + c\sum x^2,$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3,$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4.$$
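The three normal equations form a 3 × 3 linear system in a, b and c (for the curve y = a + bx + cx²). The sketch below builds that system from the data of this question and solves it with a small Gauss-Jordan elimination written in plain Python. The helper names `solve_linear` and `fit_parabola` are hypothetical and introduced only for illustration; this is one possible way to carry out the solution step, not the only one.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination
    with partial pivoting (intended for small systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)

    def sx(p):
        return sum(x ** p for x in xs)

    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [
        [n, sx(1), sx(2)],
        [sx(1), sx(2), sx(3)],
        [sx(2), sx(3), sx(4)],
    ]
    return solve_linear(matrix, [sy, sxy, sx2y])


# Data of this question
xs = [0, 1, 2, 3, 4]
ys = [1, 1.8, 1.3, 2.5, 6.3]

a, b, c = fit_parabola(xs, ys)
print(f"y = {a:.2f} + ({b:.2f})x + ({c:.2f})x^2")
```

For this data the system is 12.9 = 5a + 10b + 30c, 37.1 = 10a + 30b + 100c, 130.3 = 30a + 100b + 354c, and the sketch gives a = 1.42, b = −1.07, c = 0.55.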
Problem 12:
Use the method of least squares to find the curve $y = ab^x$ that best fits the following data:
x y
2 8.3
3 15.4
4 33.1
5 65.2
6 127.4
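Solution:
We assume the equation is of the form $y = ab^x$. Taking the natural logarithm of both sides gives
ln y = ln a + x ln b, that is:
$$Y = A + Bx, \qquad \text{where } Y = \ln y,\; A = \ln a,\; B = \ln b$$
Now, we apply the method of least squares to the transformed equation:
The normal equations are:
$$\sum Y = nA + B\sum x$$
$$\sum xY = A\sum x + B\sum x^2$$
We compute the required sums (n = 5):
$$\sum x = 2 + 3 + 4 + 5 + 6 = 20, \qquad \sum x^2 = 4 + 9 + 16 + 25 + 36 = 90$$
$$\sum Y = 2.1163 + 2.7344 + 3.4995 + 4.1775 + 4.8473 = 17.3750$$
$$\sum xY = 4.2326 + 8.2032 + 13.9980 + 20.8875 + 29.0838 = 76.4051$$
We solve the system of equations:
1. 17.3750 = 5A + 20B
2. 76.4051 = 20A + 90B
Subtracting 4 times equation 1 from equation 2 gives 10B = 6.9051, so B = 0.6905 and
A = (17.3750 − 20 × 0.6905)/5 = 0.7130.
Thus, $a = e^A \approx 2.040$ and $b = e^B \approx 1.995$.
Therefore, the best-fitting curve is:
$$y = 2.040 \times (1.995)^x$$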
Problem 13:
Use the method of least squares to find the curve $y = ab^x$ that best fits the following data:
x y
2 144
3 172.8
4 207.4
5 248.8
6 298.5
Solution:
We assume the equation is of the form $y = ab^x$. Taking the natural logarithm of both sides gives
ln y = ln a + x ln b, that is:
$$Y = A + Bx, \qquad \text{where } Y = \ln y,\; A = \ln a,\; B = \ln b$$
Now, we apply the method of least squares to the transformed equation:
The normal equations are:
$$\sum Y = nA + B\sum x$$
$$\sum xY = A\sum x + B\sum x^2$$
We compute the required sums (n = 5):
$$\sum x = 2 + 3 + 4 + 5 + 6 = 20, \qquad \sum x^2 = 4 + 9 + 16 + 25 + 36 = 90$$
$$\sum Y = 4.9698 + 5.1521 + 5.3346 + 5.5166 + 5.6988 = 26.6719$$
$$\sum xY = 9.9396 + 15.4563 + 21.3384 + 27.5830 + 34.1928 = 108.5101$$
We solve the system of equations:
1. 26.6719 = 5A + 20B
2. 108.5101 = 20A + 90B
Subtracting 4 times equation 1 from equation 2 gives 10B = 1.8225, so B ≈ 0.1822 and
A = (26.6719 − 20 × 0.1822)/5 ≈ 4.6054.
Thus, $a = e^A \approx 100.0$ and $b = e^B \approx 1.200$.
Therefore, the best-fitting curve is:
$$y = 100.0 \times (1.200)^x$$
Problem 14:
Use the method of least squares to fit the curve $y = ax + bx^2$ to the following data:
x y
1 1
2 1.2
3 1.8
4 2.5
5 3.6
6 4.7
7 6.6
8 9.1
Solution:
We assume the curve is of the form $y = ax + bx^2$. Minimizing $S = \sum (y - ax - bx^2)^2$ with respect
to a and b gives the normal equations:
1. $\sum xy = a\sum x^2 + b\sum x^3$
2. $\sum x^2 y = a\sum x^3 + b\sum x^4$
We compute the required sums:
$$\sum x^2 = 204, \quad \sum x^3 = 1296, \quad \sum x^4 = 8772, \quad \sum xy = 184, \quad \sum x^2 y = 1227$$
The system of equations is:
1. 184 = 204a + 1296b
2. 1227 = 1296a + 8772b
We solve this system and find a ≈ 0.2171 and b ≈ 0.1078.
Thus, the best-fitting curve is:
$$y = 0.2171x + 0.1078x^2$$
Problem 15:
Find the curve of the form $P = kV^{\gamma}$ using the method of least squares for the following data:
V P
50 135
100 48
150 26
200 17
Solution:
We assume the curve is of the form $P = kV^{\gamma}$. Taking the natural logarithm of both sides gives
ln P = ln k + γ ln V, that is:
$$Y = A + BX, \qquad \text{where } Y = \ln P,\; X = \ln V,\; A = \ln k,\; B = \gamma$$
We apply the method of least squares to this linear equation. The normal equations are:
1. $\sum Y = nA + B\sum X$
2. $\sum XY = A\sum X + B\sum X^2$
After computing the necessary sums (n = 4):
$$\sum X = 18.8261, \qquad \sum Y = 14.8678, \qquad \sum X^2 = 89.6901, \qquad \sum XY = 68.3535$$
We substitute these values into the normal equations:
1. 14.8678 = 4A + 18.8261B
2. 68.3535 = 18.8261A + 89.6901B
Solving this system gives B ≈ −1.496 and A ≈ 10.76, so γ = B ≈ −1.496 and
$k = e^A \approx 4.7 \times 10^4$.
Thus, the fitted curve is approximately $P = 4.7 \times 10^4 \, V^{-1.496}$.
Problem 16:
Using the method of least squares, fit the curve $y = \frac{c_1}{\sqrt{x}} + c_0 x$ to the following data:
x y
0.2 16
0.3 14
0.5 11
1 6
2 3
Solution:
Fitting the Curve $y = \frac{c_1}{\sqrt{x}} + c_0 x$
Step 1: Transform the Model We rewrite the model as:
$$y = c_1 z_1 + c_0 z_0,$$
where:
$$z_1 = \frac{1}{\sqrt{x}}, \qquad z_0 = x.$$
Step 2: Compute Necessary Summations
Problem 17:
Using the method of least squares, fit the curve $f(x) = a + bx + cx^2$ to the following data:
x f (x)
0 1
1 4
2 10
3 17
4 30
Solution:
The equation of the curve is $f(x) = a + bx + cx^2$. The normal equations are:
1. $\sum f(x) = na + b\sum x + c\sum x^2$
2. $\sum x f(x) = a\sum x + b\sum x^2 + c\sum x^3$
3. $\sum x^2 f(x) = a\sum x^2 + b\sum x^3 + c\sum x^4$
After calculating the necessary sums:
$$\sum x = 10, \quad \sum x^2 = 30, \quad \sum x^3 = 100, \quad \sum x^4 = 354$$
$$\sum f(x) = 62, \quad \sum x f(x) = 195, \quad \sum x^2 f(x) = 677$$
We substitute these sums into the normal equations (n = 5):
1. 62 = 5a + 10b + 30c
2. 195 = 10a + 30b + 100c
3. 677 = 30a + 100b + 354c
Eliminating a (equation 2 minus twice equation 1, and equation 3 minus six times equation 1) gives
71 = 10b + 40c and 305 = 40b + 174c, from which c = 1.5, b = 1.1 and then a = 1.2.
Thus, the fitted quadratic curve is $f(x) = 1.2 + 1.1x + 1.5x^2$.
Problem 18:
Using the method of least squares, fit a curve of the form:
$$y = ae^{bx}$$
x y
1 1.0
2 1.2
3 1.8
4 2.5
5 3.6
Solution
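Fitting the Curve $y = ae^{bx}$ Using Least Squares
Step 1: Transform the Model
Taking the natural logarithm of both sides:
$$y = ae^{bx} \implies \ln y = \ln a + bx.$$
Writing Y = ln y and A = ln a, this becomes the straight line:
$$Y = A + bx.$$
Step 2: Tabulate Y = ln y
x y Y = ln y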
1 1 0.000000
2 1.2 0.182321
3 1.8 0.587787
4 2.5 0.916291
5 3.6 1.280934
Step 3: Compute Summations
x      Y          x²    xY
1      0.000000   1     0.000000
2      0.182321   4     0.364642
3      0.587787   9     1.763361
4      0.916291   16    3.665164
5      1.280934   25    6.404670
Σ 15   2.967333   55    12.197837
Step 4: Form the Normal Equations
The normal equations are:
5A + 15b = 2.967333,
15A + 55b = 12.197837.
Step 5: Solve for A and b
Solving the equations, we get:
A = −0.3953, b = 0.3296.
Converting A to a:
a = eA = 0.6735.
Step 6: Fitted Curve
The fitted curve is:
$$y = 0.6735\,e^{0.3296x}.$$
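The log-transform procedure used in this solution can be sketched in Python. The example fits the straight line Y = A + bx to Y = ln y with the closed-form least-squares solution of the two normal equations, then converts back with a = e^A. The function name `fit_exponential` is a hypothetical helper. Note that this minimizes the squared error in ln y, not in y itself, which is the usual assumption behind the linearization.

```python
from math import exp, log


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on Y = ln(y) = A + b*x."""
    n = len(xs)
    big_y = [log(y) for y in ys]
    sx = sum(xs)
    s_y = sum(big_y)
    sxx = sum(x * x for x in xs)
    sx_y = sum(x * v for x, v in zip(xs, big_y))
    # Closed-form solution of the two normal equations
    b = (n * sx_y - sx * s_y) / (n * sxx - sx ** 2)
    big_a = (s_y - b * sx) / n
    return exp(big_a), b


# Data of this problem
xs = [1, 2, 3, 4, 5]
ys = [1.0, 1.2, 1.8, 2.5, 3.6]

a, b = fit_exponential(xs, ys)
print(f"y = {a:.4f} * exp({b:.4f} x)")
```

The same routine handles the form y = ab^x used in Problems 12 and 13: the fitted slope is ln b, so the base is recovered as exp(b).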
Problem 19:
If the following two lines are the regression equations:
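1. 4x − 5y + 33 = 0 (regression of y on x), 2. 20x − 9y = 107 (regression of x on y),
Find the mean values of x and y, the correlation coefficient, and the standard deviation of y, given that
the variance of x is 9.
Solution:
Step 1: Find the Mean Values of x and y
Both regression lines pass through the point (x̄, ȳ), so the means satisfy both equations:
$$4\bar{x} - 5\bar{y} + 33 = 0 \;\Rightarrow\; \bar{x} = \frac{5}{4}\bar{y} - \frac{33}{4} \quad (1)$$
$$20\bar{x} - 9\bar{y} = 107 \;\Rightarrow\; \bar{y} = \frac{20}{9}\bar{x} - \frac{107}{9} \quad (2)$$
Substitute equation (1) into equation (2):
$$\bar{y} = \frac{20}{9}\left(\frac{5}{4}\bar{y} - \frac{33}{4}\right) - \frac{107}{9}$$
Simplifying:
$$\bar{y} = \frac{100}{36}\bar{y} - \frac{660}{36} - \frac{428}{36} = \frac{25}{9}\bar{y} - \frac{1088}{36}$$
Multiplying through by 36 to eliminate the denominator:
$$36\bar{y} = 100\bar{y} - 1088 \;\Rightarrow\; 64\bar{y} = 1088 \;\Rightarrow\; \bar{y} = 17$$
and from equation (1):
$$\bar{x} = \frac{5}{4}(17) - \frac{33}{4} = \frac{52}{4} = 13$$
Step 2: Find the Correlation Coefficient
The correlation coefficient is given by:
$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$
where bxy is the regression coefficient of x on y and byx is the regression coefficient of y on x.
From the given regression equations:
• Writing equation 1 as $y = \frac{4}{5}x + \frac{33}{5}$ gives byx = 4/5 (regression of y on x),
• Writing equation 2 as $x = \frac{9}{20}y + \frac{107}{20}$ gives bxy = 9/20 (regression of x on y).
(The opposite assignment would give byx · bxy = (20/9)(5/4) = 25/9 > 1, which is impossible because the
product equals r² and cannot exceed 1.)
Thus, the correlation coefficient is:
$$r = \sqrt{\frac{4}{5} \times \frac{9}{20}} = \sqrt{\frac{36}{100}} = 0.6$$
The positive root is taken because both regression coefficients are positive.
Step 3: Find the Standard Deviation of y
We are given that the variance of x is 9. The standard deviation of x is:
$$\sigma_x = \sqrt{9} = 3$$
Since byx = r σy /σx , we get:
$$\sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{0.8 \times 3}{0.6} = 4$$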
Problem 20:
Problem Statement
In a partially destroyed laboratory record of an analysis of correlation data, the following results are legible:
Variance of x: σx2 = 9.
The regression equations are:
Solution
Given:
Variance of x = σx2 = 9 =⇒ σx = 3,
and the regression equations:
(c) Coefficient of Correlation
The coefficient of correlation is:
r = 0.6.
Final Answers
• Mean values: x̄ = 13, ȳ = 17.
• Standard deviation of y: σy = 4.
Problem 21:
Two lines of regression are given by:
5x − 2y = 52 (regression of x on y)
and
3x − 8y = 12 (regression of y on x),
and the variance of x is given by σx² = 12.
Calculate the mean values of x and y, the coefficient of correlation, and the variance of y.
Solution:
Step 1: Rearrange the Regression Equations
The regression equations are:
$$5x - 2y = 52 \;\Rightarrow\; x = \frac{2}{5}y + \frac{52}{5}$$
and
$$3x - 8y = 12 \;\Rightarrow\; y = \frac{3}{8}x - \frac{12}{8} = \frac{3}{8}x - \frac{3}{2}.$$
Step 2: Calculate Mean Values of x and y
Both regression lines pass through the point (x̄, ȳ), so the means satisfy both equations:
$$\bar{x} = \frac{2}{5}\bar{y} + \frac{52}{5}, \qquad \bar{y} = \frac{3}{8}\bar{x} - \frac{3}{2}.$$
Now substitute the expression for x̄ into the equation for ȳ:
$$\bar{y} = \frac{3}{8}\left(\frac{2}{5}\bar{y} + \frac{52}{5}\right) - \frac{3}{2}.$$
Simplifying:
$$\bar{y} = \frac{6}{40}\bar{y} + \frac{156}{40} - \frac{60}{40} = \frac{3}{20}\bar{y} + \frac{96}{40} = \frac{3}{20}\bar{y} + \frac{24}{10}.$$
Multiply through by 20 to eliminate the denominator:
$$20\bar{y} = 3\bar{y} + 48 \;\Rightarrow\; 17\bar{y} = 48 \;\Rightarrow\; \bar{y} = \frac{48}{17} \approx 2.82.$$
Now substitute ȳ = 48/17 into the equation for x̄:
$$\bar{x} = \frac{2}{5} \times \frac{48}{17} + \frac{52}{5} = \frac{96}{85} + \frac{884}{85} = \frac{196}{17} \approx 11.53.$$
Thus, the mean values are:
x̄ ≈ 11.53, ȳ ≈ 2.82.
Step 3: Calculate the Coefficient of Correlation
The formula for the correlation coefficient r is given by:
$$r = \pm\sqrt{b_{xy} \times b_{yx}},$$
where bxy is the regression coefficient of x on y, and byx is the regression coefficient of y on x.
From the regression equations:
• bxy = 2/5,
• byx = 3/8.
Both coefficients are positive, so r is positive:
$$r = \sqrt{\frac{2}{5} \times \frac{3}{8}} = \sqrt{\frac{6}{40}} = \sqrt{\frac{3}{20}} \approx 0.387.$$
So, the coefficient of correlation is approximately r ≈ 0.387.
Step 4: Calculate the Variance of y
We are given the variance of x is σx² = 12. Since bxy = r σx /σy , the variance of y can be calculated using
the formula:
$$\sigma_y^2 = \sigma_x^2 \times r^2 \times \frac{1}{b_{xy}^2} = 12 \times \frac{3}{20} \times \frac{25}{4} = 11.25.$$
Final Answers
• Mean of x: x̄ ≈ 11.53,
• Mean of y: ȳ ≈ 2.82,
• Coefficient of correlation: r ≈ 0.387,
• Variance of y: σy² = 11.25.
Problem 22:
The following table gives the age (x) in years of cars and annual maintenance cost (y) in hundred rupees.
x y
1 15
3 18
5 21
7 23
9 22
Calculate the maintenance cost for a 4-year-old car after finding the regression equation.
Solution
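The regression equation is of the form:
y = a + bx
where:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n}$$
Steps 1 to 3: Compute the Sums and the Coefficients
With n = 5, the data give Σx = 25, Σy = 99, Σxy = 533 and Σx² = 165, so:
$$b = \frac{5 \times 533 - 25 \times 99}{5 \times 165 - 25^2} = \frac{190}{200} = 0.95, \qquad a = \frac{99 - 0.95 \times 25}{5} = 15.05$$
The regression equation is therefore y = 15.05 + 0.95x.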
Step 4: Predict Maintenance Cost for a 4-Year-Old Car (x = 4)
y = 15.05 + 0.95(4) = 18.85 (hundred rupees).
Final Answer
The estimated maintenance cost for a 4-year-old car is:
1885 rupees
Problem 23:
From the following data, determine the equations of the line of regression of y on x and x on y:
x y
6 9
2 11
10 5
4 8
8 7
Solution
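The regression equation of y on x is:
y − ȳ = byx (x − x̄),
where:
$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$
The regression equation of x on y is:
x − x̄ = bxy (y − ȳ),
where:
$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}}$$
For the given data (n = 5):
$$\sum x = 30, \quad \sum y = 40, \quad \sum x^2 = 220, \quad \sum y^2 = 340, \quad \sum xy = 214$$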
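Step 3: Calculate Regression Coefficients
$$b_{yx} = \frac{214 - \dfrac{30 \times 40}{5}}{220 - \dfrac{30^2}{5}} = \frac{-26}{40} = -0.65$$
$$b_{xy} = \frac{214 - \dfrac{30 \times 40}{5}}{340 - \dfrac{40^2}{5}} = \frac{-26}{20} = -1.30$$
Step 4: Write the Regression Lines
With x̄ = 30/5 = 6 and ȳ = 40/5 = 8:
• Regression of y on x: y − 8 = −0.65(x − 6), that is, y = 11.9 − 0.65x
• Regression of x on y: x − 6 = −1.30(y − 8), that is, x = 16.4 − 1.30y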
Problem 24:
Fit a parabolic curve of regression of y on x to the following data:
x y
1.0 1.1
1.5 1.3
2.0 1.6
2.5 2.0
3.0 2.7
3.5 3.4
4.0 4.1
Solution
The parabolic regression curve is of the form:
y = a + bx + cx2
Step 2: Solve Normal Equations
Substitute the calculated sums into the normal equations:
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$
Solve these simultaneous equations to find a, b, and c.
Problem 25:
Find the multiple regression equation of X3 on X1 and X2 from the data given below:
X1 X2 X3
3 10 20
5 10 25
6 5 15
8 7 16
12 5 15
10 2 2
Solution
The multiple regression equation is given by:
X3 = a + b1 X1 + b2 X2
To determine a, b1 , and b2 , we use the normal equations:
1. $\sum X_3 = na + b_1\sum X_1 + b_2\sum X_2$
2. $\sum X_3 X_1 = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1 X_2$
3. $\sum X_3 X_2 = a\sum X_2 + b_1\sum X_1 X_2 + b_2\sum X_2^2$
Problem 26:
For the data given below, determine the lines of regression of y on x and x on y:
x y
2 5
4 7
6 9
8 8
10 11
Solution
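Regression of y on x:
The regression equation of y on x is:
y − ȳ = byx (x − x̄)
where:
$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}$$
Regression of x on y:
The regression equation of x on y is:
x − x̄ = bxy (y − ȳ)
where:
$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(y)}$$
Steps 1 and 2: Compute the Means, Variances and Covariance
For the given data (n = 5): x̄ = 30/5 = 6 and ȳ = 40/5 = 8, with Σ(x − x̄)(y − ȳ) = 26, Σ(x − x̄)² = 40
and Σ(y − ȳ)² = 20. Hence Cov(x, y) = 5.2, Var(x) = 8 and Var(y) = 4, so:
$$b_{yx} = \frac{5.2}{8} = 0.65, \qquad b_{xy} = \frac{5.2}{4} = 1.30$$
Step 3: Write the Regression Equations
1. Regression of y on x:
y − 8 = 0.65(x − 6), that is, y = 0.65x + 4.1
2. Regression of x on y:
x − 6 = 1.30(y − 8), that is, x = 1.30y − 4.4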
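The two regression coefficients can be computed directly from the definitions byx = Cov(x, y)/Var(x) and bxy = Cov(x, y)/Var(y). The sketch below does this for the data of this problem. `regression_lines` is a hypothetical helper name, and the common factor 1/n is cancelled in both ratios.

```python
from math import sqrt


def regression_lines(xs, ys):
    """Return (x_mean, y_mean, b_yx, b_xy) for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    # Cov/Var ratios; the 1/n factors cancel
    return x_mean, y_mean, sxy / sxx, sxy / syy


# Data of this problem
xs = [2, 4, 6, 8, 10]
ys = [5, 7, 9, 8, 11]

x_mean, y_mean, b_yx, b_xy = regression_lines(xs, ys)
r = sqrt(b_yx * b_xy)  # positive root, since both coefficients are positive here

print(f"y on x: y - {y_mean} = {b_yx:.2f} (x - {x_mean})")
print(f"x on y: x - {x_mean} = {b_xy:.2f} (y - {y_mean})")
print(f"r = {r:.3f}")
```

The last line uses the relation r = ±√(byx · bxy) from Problem 15 as a consistency check: the product of the two coefficients must not exceed 1.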