In [1]: import pandas as pd
import numpy as np
import scipy.stats as st
In [2]: df=pd.read_csv("INFY.NS. csv"
df.head()
Out[2]:
             Date         Open         High          Low        Close    Adj Close    Volume
    0   3/21/2022  1861.000000  1886.900024  1847.099976  1853.050049  1813.808716  19362085
    1   3/22/2022  1850.000000  1890.000000  1839.000000  1887.400024  1847.431274   5709982
    2   3/23/2022  1897.000000  1900.000000  1857.000000  1872.400024  1832.749023   6192824
    3   3/24/2022  1856.150024  1894.599976  1856.150024  1886.599951  1846.746094  13784303
    4   3/25/2022  1892.000000  1894.000000  1858.000000  1876.550049  1836.811157   3438588

In [3]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 251 entries, 0 to 250
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       251 non-null    object
 1   Open       251 non-null    float64
 2   High       251 non-null    float64
 3   Low        251 non-null    float64
 4   Close      251 non-null    float64
 5   Adj Close  251 non-null    float64
 6   Volume     251 non-null    int64
dtypes: float64(5), int64(1), object(1)
memory usage: 13.9+ KB

In [4]: # Shapiro-Wilk test
        from scipy.stats import shapiro

In [5]: shapiro(df["High"])

Out[5]: ShapiroResult(statistic=0.8402987718582153, pvalue=2.2313751036104985e-15)

The p-value is much lower than 0.05, so we reject the null hypothesis of normality: the High prices are not normally distributed.
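As a reminder of how Shapiro-Wilk output is read, here is a minimal sketch on synthetic data (the arrays and seed are illustrative, not the INFY data):

```python
import numpy as np
from scipy.stats import shapiro

# One clearly non-normal (exponential) and one normal synthetic sample.
rng = np.random.default_rng(1)
skewed = rng.exponential(1.0, 200)
gaussian = rng.normal(0.0, 1.0, 200)

# A small p-value means we reject the null hypothesis that the data are normal.
print(shapiro(skewed).pvalue < 0.05)   # True: normality rejected
print(shapiro(gaussian).pvalue)        # typically well above 0.05
```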
In [6]: # confidence level = 0.8, calculate CI for High
        import scipy.stats as st
        st.norm.interval(alpha=0.8, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

        C:\Users\Administrator\AppData\Local\Temp\ipykernel_7788\2672082760.py:3: DeprecationWarning: Use of keyword argument 'alpha' for method 'interval' is deprecated and will be removed in SciPy 1.11.0. Use first positional argument or keyword argument 'confidence' instead.
          st.norm.interval(alpha=0.8, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[6]: (1543.2997722185137, 1560.791060721725)

In [7]: st.norm.interval(alpha=0.84, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[7]: (1542.4568393539932, 1561.6339935862454)

In [8]: st.norm.interval(alpha=0.89, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[8]: (1541.1389270061688, 1562.9519059340698)

In [9]: # confidence level = 0.84, calculate CI for Close
        import scipy.stats as st
        st.norm.interval(alpha=0.84, loc=np.mean(df["Close"]), scale=st.sem(df["Close"]))

Out[9]: (1527.1994354186372, 1546.4826418403272)

In [10]: import scipy.stats as st
         st.norm.interval(alpha=0.84, loc=np.mean(df["Low"]), scale=st.sem(df["Low"]))

Out[10]: (1513.3218811097054, 1532.4665715835217)
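Per the DeprecationWarning above, newer SciPy expects the level as the first positional argument or as `confidence=`. A minimal sketch on a synthetic sample (with the notebook's data you would pass `df["High"]` instead):

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in sample; values are illustrative only.
sample = np.array([1850.0, 1861.0, 1897.0, 1856.15, 1892.0])

# Same computation as alpha=0.8, but with the non-deprecated keyword.
lo, hi = st.norm.interval(confidence=0.8, loc=np.mean(sample), scale=st.sem(sample))
print(lo, hi)  # the interval brackets the sample mean
```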
Confidence Intervals Using the t Distribution
In [11]: # create a sample dataset
         df1 = np.random.randint(20, 40, 62)

In [12]: shapiro(df1)

Out[12]: ShapiroResult(statistic=0.9521715044975281, pvalue=0.019726457074284554)

In [13]: import scipy.stats as st
         st.t.interval(alpha=0.84, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

Out[13]: (29.118613342362874, 31.181386657637123)

In [14]: import scipy.stats as st
         st.t.interval(alpha=0.94, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

Out[14]: (28.76009951742648, 31.53990048257352)
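What `st.t.interval` computes can be reproduced by hand from `t.ppf`, which makes the underlying formula (mean ± t-critical × standard error) explicit. A sketch on a synthetic sample standing in for `df1` (seed and data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.integers(20, 40, 62)

conf = 0.84
se = stats.sem(sample)
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=len(sample) - 1)
manual = (sample.mean() - t_crit * se, sample.mean() + t_crit * se)

# The library call gives the same pair (confidence= avoids the deprecation warning).
lib = stats.t.interval(confidence=conf, df=len(sample) - 1,
                       loc=sample.mean(), scale=se)
print(manual, lib)
```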
One Sample t-test in Python

In [15]: data = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]

In [16]: import scipy.stats as st
         st.ttest_1samp(data, popmean=1377)

Out[16]: TtestResult(statistic=0.1451848242151389, pvalue=0.8881562000414411, df=8)
Two Sample t-test in Python

In [17]: Sample1 = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]
         Sample2 = [1340, 1352, 1335, 1318, 1387, 1343, 1366, 1396, 1424]

In [18]: st.ttest_ind(Sample1, Sample2)

Out[18]: Ttest_indResult(statistic=1.2675024291577285, pvalue=0.24478627949581977)
In [19]: # hypothesis
         synchronous = [94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73. ...]   # list truncated in the source
         asynchronous = [77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65 ...]   # list truncated in the source

In [20]: shapiro(synchronous)

Out[20]: ShapiroResult(statistic=0.9676008820533752, pvalue=0.6555896997451782)

In [21]: shapiro(asynchronous)

Out[21]: ShapiroResult(statistic=0.8898013830184937, pvalue=0.08030176907777786)

In [22]: print(np.var(synchronous), np.var(asynchronous))

         40.75208677685952 41.81714285714285

In [23]: st.ttest_ind(synchronous, asynchronous, equal_var=False)

Out[23]: Ttest_indResult(statistic=2.8241907458142563, pvalue=0.008754235249671019)
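The cells above follow a common recipe: Shapiro-Wilk on each group, a variance comparison, then `ttest_ind` (Welch's version via `equal_var=False` when the variances differ). A sketch of that recipe on synthetic groups (the data, seed, and variance-ratio threshold are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(80.0, 6.0, 12)
group_b = rng.normal(74.0, 6.0, 12)

# 1) normality check on each group
print(stats.shapiro(group_a).pvalue, stats.shapiro(group_b).pvalue)
# 2) variance comparison; a large ratio argues for Welch's test
ratio = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
# 3) independent two-sample t-test, pooled or Welch depending on the ratio
res = stats.ttest_ind(group_a, group_b, equal_var=bool(ratio < 4))
print(res.statistic, res.pvalue)
```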
t-test in pandas

In [24]: # create pandas DataFrame
         df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                                       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                            'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                                      80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})
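The two-sample test can be run straight from such a DataFrame by splitting the 'score' column on 'method'. A sketch (column names and values mirror the DataFrame above):

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

# Select each group's scores by boolean indexing, then test.
a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']
res = stats.ttest_ind(a, b)
print(res.statistic, res.pvalue)  # statistic is negative: group A's mean is lower
```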
In [25]: df.head(100)

Out[25]:     method  score
         0        A     71
         1        A     72
         2        A     72
         3        A     75
         4        A     78
         5        A     81
         6        A     82
         7        A     83
         8        A     89
         9        A     91
         10       B     80
         11       B     81
         12       B     81
         13       B     84
         14       B     88
         15       B     88
         16       B     89
         17       B     90
         18       B     90
         19       B     91

QUESTION & ANSWER

Q1. An auto company decided to introduce a new six-cylinder car whose mean petrol consumption is claimed to be lower than that of the existing auto engine. It was found that the mean petrol consumption for the 50 cars was 10 km per litre with a standard deviation of 3.5 km per litre. Test at 5% level of significance whether the claim that the new car's petrol consumption is 9.5 km per litre on the average is acceptable.
sit419723, 435 PM LUntilea2- Jupyter Notebook
In [26]: import numpy as np
from scipy.stats import t
# Sample size
n= 50
# Sample mean and standard deviation
xbar = 10
s= 3.5
# Claimed population mean
mu@ = 9.5
# Degrees of freedom
df=n-1
# Calculate the t-statistic
t_statistic = (x_bar - mug) / (s / np.sqrt(n))
# Calculate the p-value
pvalue = t.sf(np.abs(t_statistic), df) * 2
# Test at 5X Level of significance
alpha = 0.05
if p_value < alpha:
print("Reject the null hypothesis. The mean petrol consumption of the new
els
print("Fail to reject the null hypothesis. The mean petrol consumption of +
Fail to reject the null hypothesis. The mean petrol consumption of the new ca
pis not significantly different from 9.5 km per litre.
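The claim in Q1 is directional (lower consumption), so a one-tailed p-value is a natural variant of the two-sided test above. A sketch with the same summary statistics:

```python
import numpy as np
from scipy.stats import t

n, x_bar, s, mu0 = 50, 10, 3.5, 9.5
t_stat = (x_bar - mu0) / (s / np.sqrt(n))
# Right-tailed p-value for H1: mu > 9.5 (the sample mean exceeds the claim)
p_one_sided = t.sf(t_stat, n - 1)
print(t_stat, p_one_sided)  # p is still above 0.05, so the conclusion is unchanged
```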
Q2. A manufacturer of ball pens claims that a certain pen he manufactures has a mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing agent selects a sample of 100 pens and puts them to the test. The mean writing life for the sample was 390 pages. Should the purchasing agent reject the manufacturer's claim at the 1% level?
In [27]: import numpy as np
         from scipy.stats import t

         # Set the significance level (1%)
         alpha = 0.01
         # Sample information
         n = 100
         x_bar = 390
         mu = 400
         s = 20
         # Calculate the t-statistic
         t_stat = (x_bar - mu) / (s / np.sqrt(n))
         # Calculate the degrees of freedom
         df = n - 1
         # Calculate the p-value (left-tailed)
         p_val = t.cdf(t_stat, df)
         # Compare the p-value with alpha
         if p_val < alpha:
             print("Reject null hypothesis. The mean writing life is less than 400 pages.")
         else:
             print("Fail to reject null hypothesis. The mean writing life is 400 pages.")

Reject null hypothesis. The mean writing life is less than 400 pages.
Q3. (i) A sample of 900 members has a mean 3.4 cm and SD 2.61 cm. Is the sample taken from a large population with mean 3.25 cm and SD 2.62 cm?
(ii) If the population is normal and its mean is unknown, find the 95% and 98% confidence limits of the true mean.
In [28]: import numpy as np
         from scipy.stats import t

         sample_mean = 3.4
         sample_sd = 2.61
         sample_size = 900
         alpha = 0.05

         t_critical_95 = t.ppf(1 - alpha/2, sample_size - 1)
         t_critical_98 = t.ppf(1 - 0.02/2, sample_size - 1)

         lower_ci_95 = sample_mean - t_critical_95 * (sample_sd / np.sqrt(sample_size))
         upper_ci_95 = sample_mean + t_critical_95 * (sample_sd / np.sqrt(sample_size))
         lower_ci_98 = sample_mean - t_critical_98 * (sample_sd / np.sqrt(sample_size))
         upper_ci_98 = sample_mean + t_critical_98 * (sample_sd / np.sqrt(sample_size))

         print("95% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_95, upper_ci_95))
         print("98% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_98, upper_ci_98))

95% confidence interval: (3.2293, 3.5707)
98% confidence interval: (3.1972, 3.6028)
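Part (i) of Q3 asks whether the sample comes from a fully specified population, which the cell above does not answer. A z-test sketch with the stated numbers (population SD taken as 2.62 per the question):

```python
import numpy as np
from scipy.stats import norm

n, x_bar = 900, 3.4
pop_mean, pop_sd = 3.25, 2.62

# Large-sample z-test of the sample mean against the population mean.
z = (x_bar - pop_mean) / (pop_sd / np.sqrt(n))
p = 2 * norm.sf(abs(z))
print(z, p)  # |z| < 1.96, so at the 5% level the sample is consistent with this population
```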
Q4. The mean weekly sales of soap bars in departmental stores were 146.3 bars per store. After an advertising campaign the mean weekly sales in 400 stores for a typical week increased to 153.7 and showed a standard deviation of 17.2. Was the advertising campaign successful?

In [29]: import numpy as np
         from scipy.stats import t

         # Sample statistics
         n = 400
         x_bar = 153.7
         s = 17.2
         # Null hypothesis mean
         mu0 = 146.3
         # Degrees of freedom
         df = n - 1
         # Standard error
         se = s / np.sqrt(n)
         # t-statistic
         t_stat = (x_bar - mu0) / se
         # p-value (right-tailed: success means an increase in sales)
         p_value = 1 - t.cdf(t_stat, df)
         # Significance level
         alpha = 0.05
         # Test decision
         if p_value < alpha:
             print("Reject the null hypothesis. The advertising campaign was successful.")
         else:
             print("Fail to reject the null hypothesis. The advertising campaign was not successful.")

Reject the null hypothesis. The advertising campaign was successful.
Q5. The wages of the factory workers are assumed to be normally distributed with mean μ and variance 25. A random sample of 50 workers gives the total wages equal to ₹ 2,550. Test the hypothesis μ = 52 against the alternative hypothesis μ = 49 at the 1% level of significance.
In [30]: import numpy as np
         from scipy import stats

         # Define the sample size, sample mean, and sample standard deviation
         n = 50
         x_bar = 2550 / n
         s = np.sqrt(25)
         # Set the null hypothesis mean and the alternative hypothesis mean
         mu_null = 52
         mu_alt = 49
         # Calculate the t-statistic and p-value
         t_statistic, p_value = stats.ttest_1samp([x_bar], mu_null, axis=0)
         # Calculate the critical value based on the 1% level of significance
         alpha = 0.01
         df = n - 1
         t_critical = stats.t.ppf(1 - alpha/2, df)
         # Print the results
         print("Sample mean:", x_bar)
         print("t-statistic:", t_statistic)
         print("p-value:", p_value)
         print("t-critical:", t_critical)
         if abs(t_statistic) > t_critical or p_value < alpha:
             print("Reject the null hypothesis")
         else:
             print("Fail to reject the null hypothesis")

Sample mean: 51.0
t-statistic: nan
p-value: nan
t-critical: 2.67995197363155
Fail to reject the null hypothesis

C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_axis_nan_policy.py:2: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
  res = hypotest_fun_out(*samples, **kwds)
C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214: RuntimeWarning: divide by zero encountered in divide
  var *= np.divide(n, n-ddof)  # to avoid error on division by zero
C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214: RuntimeWarning: invalid value encountered in double_scalars
  var *= np.divide(n, n-ddof)  # to avoid error on division by zero
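The cell above feeds a single value to `ttest_1samp`, which yields nan: a one-element sample has no variability, hence the RuntimeWarnings. Since the population variance (25) is given, a z-test from the summary statistics is the standard approach; a sketch:

```python
import numpy as np
from scipy.stats import norm

n = 50
x_bar = 2550 / n          # sample mean = 51.0
sigma = np.sqrt(25)       # known population SD = 5
mu_null = 52

z = (x_bar - mu_null) / (sigma / np.sqrt(n))
# The alternative mu = 49 lies below 52, so use a left-tailed test at the 1% level.
p_value = norm.cdf(z)
print(z, p_value)  # z ≈ -1.414, p ≈ 0.079 > 0.01, so fail to reject H0
```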
Q6. An ambulance service claims that it takes on the average 8.9 minutes to reach its destination on emergency calls. To check on this claim, the agency which licenses ambulance services had them timed on 50 emergency calls, getting a mean of 9.3 minutes with a standard deviation of 1.6 minutes. What can they conclude at the chosen level of significance?
In [31]: import scipy.stats as stats

         n = 50
         mean = 9.3
         std = 1.6
         cl = [90, 95, 99]  # confidence levels in percent
         for c in cl:
             z = stats.norm.ppf((1 + c / 100) / 2)
             ci = (mean - z * std / n ** 0.5, mean + z * std / n ** 0.5)
             print(f"At {c}% confidence level, the confidence interval is {ci}")

At 90% confidence level, the confidence interval is (8.927812110823465, 9.672187889176536)
At 95% confidence level, the confidence interval is (8.856510776208104, 9.743489223791897)
At 99% confidence level, the confidence interval is (8.717156362330098, 9.882843637669904)
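Q6 asks for a conclusion about the claimed 8.9-minute average, while the cell above only reports confidence intervals. A one-sample z-test sketch (n = 50 is large, so the normal approximation is reasonable); note it agrees with the intervals: 8.9 lies outside the 90% CI but inside the 95% CI.

```python
import numpy as np
from scipy.stats import norm

n, x_bar, s, mu0 = 50, 9.3, 1.6, 8.9

# z-statistic for the observed mean against the claimed 8.9 minutes
z = (x_bar - mu0) / (s / np.sqrt(n))
p_value = 2 * norm.sf(abs(z))   # two-sided
print(z, p_value)  # z ≈ 1.77, p ≈ 0.077: reject the claim at 10%, but not at 5%
```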