In [1]: import pandas as pd
import numpy as np
import scipy.stats as st
In [2]: df=pd.read_csv("INFY.NS. csv"
df.head()
Out[2]:
             Date         Open         High          Low        Close    Adj Close    Volume
    0   3/21/2022  1861.000000  1886.900024  1847.099976  1853.050049  1813.808716  19362085
    1   3/22/2022  1850.000000  1890.000000  1839.000000  1887.400024  1847.431274   5709982
    2   3/23/2022  1897.000000  1900.000000  1857.000000  1872.400024  1832.749023   6192824
    3   3/24/2022  1856.150024  1894.599976  1856.150024  1886.599951  1846.746094  13784303
    4   3/25/2022  1892.000000  1894.000000  1858.000000  1876.550049  1836.811157   3438588

In [3]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 251 entries, 0 to 250
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       251 non-null    object
 1   Open       251 non-null    float64
 2   High       251 non-null    float64
 3   Low        251 non-null    float64
 4   Close      251 non-null    float64
 5   Adj Close  251 non-null    float64
 6   Volume     251 non-null    int64
dtypes: float64(5), int64(1), object(1)
memory usage: 13.9+ KB

In [4]: # Shapiro-Wilk test
        from scipy.stats import shapiro

In [5]: shapiro(df["High"])

Out[5]: ShapiroResult(statistic=0.8402987718582153, pvalue=2.2313751036104985e-15)

The p-value is much lower than 0.05, so we reject the null hypothesis of normality: the High prices are not normally distributed.
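As a reminder of how Shapiro-Wilk output is read, here is a minimal sketch on synthetic data (the arrays and seed are illustrative, not the INFY data):

```python
import numpy as np
from scipy.stats import shapiro

# One clearly non-normal (exponential) and one normal synthetic sample.
rng = np.random.default_rng(1)
skewed = rng.exponential(1.0, 200)
gaussian = rng.normal(0.0, 1.0, 200)

# A small p-value means we reject the null hypothesis that the data are normal.
print(shapiro(skewed).pvalue < 0.05)   # True: normality rejected
print(shapiro(gaussian).pvalue)        # typically well above 0.05
```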
In [6]: # confidence level = 0.8, calculate CI for High
        import scipy.stats as st
        st.norm.interval(alpha=0.8, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

        C:\Users\Administrator\AppData\Local\Temp\ipykernel_7788\2672082760.py:3: DeprecationWarning: Use of keyword argument 'alpha' for method 'interval' is deprecated and will be removed in SciPy 1.11.0. Use first positional argument or keyword argument 'confidence' instead.
          st.norm.interval(alpha=0.8, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[6]: (1543.2997722185137, 1560.791060721725)

In [7]: st.norm.interval(alpha=0.84, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[7]: (1542.4568393539932, 1561.6339935862454)

In [8]: st.norm.interval(alpha=0.89, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

Out[8]: (1541.1389270061688, 1562.9519059340698)

In [9]: # confidence level = 0.84, calculate CI for Close
        import scipy.stats as st
        st.norm.interval(alpha=0.84, loc=np.mean(df["Close"]), scale=st.sem(df["Close"]))

Out[9]: (1527.1994354186372, 1546.4826418403272)

In [10]: import scipy.stats as st
         st.norm.interval(alpha=0.84, loc=np.mean(df["Low"]), scale=st.sem(df["Low"]))

Out[10]: (1513.3218811097054, 1532.4665715835217)
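Per the DeprecationWarning above, newer SciPy expects the level as the first positional argument or as `confidence=`. A minimal sketch on a synthetic sample (with the notebook's data you would pass `df["High"]` instead):

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in sample; values are illustrative only.
sample = np.array([1850.0, 1861.0, 1897.0, 1856.15, 1892.0])

# Same computation as alpha=0.8, but with the non-deprecated keyword.
lo, hi = st.norm.interval(confidence=0.8, loc=np.mean(sample), scale=st.sem(sample))
print(lo, hi)  # the interval brackets the sample mean
```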
Confidence Intervals Using the t Distribution
In [11]: # create a sample dataset
         df1 = np.random.randint(20, 40, 62)

In [12]: shapiro(df1)

Out[12]: ShapiroResult(statistic=0.9521715044975281, pvalue=0.019726457074284554)

In [13]: import scipy.stats as st
         st.t.interval(alpha=0.84, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

Out[13]: (29.118613342362874, 31.181386657637123)

In [14]: import scipy.stats as st
         st.t.interval(alpha=0.94, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

Out[14]: (28.76009951742648, 31.53990048257352)
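What `st.t.interval` computes can be reproduced by hand from `t.ppf`, which makes the underlying formula (mean ± t-critical × standard error) explicit. A sketch on a synthetic sample standing in for `df1` (seed and data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.integers(20, 40, 62)

conf = 0.84
se = stats.sem(sample)
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=len(sample) - 1)
manual = (sample.mean() - t_crit * se, sample.mean() + t_crit * se)

# The library call gives the same pair (confidence= avoids the deprecation warning).
lib = stats.t.interval(confidence=conf, df=len(sample) - 1,
                       loc=sample.mean(), scale=se)
print(manual, lib)
```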
One Sample t-test in Python

In [15]: data = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]

In [16]: import scipy.stats as st
         st.ttest_1samp(data, popmean=1377)

Out[16]: TtestResult(statistic=0.1451848242151389, pvalue=0.8881562000414411, df=8)
Two Sample t-test in Python

In [17]: Sample1 = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]
         Sample2 = [1340, 1352, 1335, 1318, 1387, 1343, 1366, 1396, 1424]

In [18]: st.ttest_ind(Sample1, Sample2)

Out[18]: Ttest_indResult(statistic=1.2675024291577285, pvalue=0.24478627949581977)
In [19]: # hypothesis
         synchronous = [94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73. ...]   # list truncated in the source
         asynchronous = [77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65 ...]   # list truncated in the source

In [20]: shapiro(synchronous)

Out[20]: ShapiroResult(statistic=0.9676008820533752, pvalue=0.6555896997451782)

In [21]: shapiro(asynchronous)

Out[21]: ShapiroResult(statistic=0.8898013830184937, pvalue=0.08030176907777786)

In [22]: print(np.var(synchronous), np.var(asynchronous))

         40.75208677685952 41.81714285714285

In [23]: st.ttest_ind(synchronous, asynchronous, equal_var=False)

Out[23]: Ttest_indResult(statistic=2.8241907458142563, pvalue=0.008754235249671019)
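The cells above follow a common recipe: Shapiro-Wilk on each group, a variance comparison, then `ttest_ind` (Welch's version via `equal_var=False` when the variances differ). A sketch of that recipe on synthetic groups (the data, seed, and variance-ratio threshold are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(80.0, 6.0, 12)
group_b = rng.normal(74.0, 6.0, 12)

# 1) normality check on each group
print(stats.shapiro(group_a).pvalue, stats.shapiro(group_b).pvalue)
# 2) variance comparison; a large ratio argues for Welch's test
ratio = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
# 3) independent two-sample t-test, pooled or Welch depending on the ratio
res = stats.ttest_ind(group_a, group_b, equal_var=bool(ratio < 4))
print(res.statistic, res.pvalue)
```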
t-test in pandas

In [24]: # create pandas DataFrame
         df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                                       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                            'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                                      80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})
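The two-sample test can be run straight from such a DataFrame by splitting the 'score' column on 'method'. A sketch (column names and values mirror the DataFrame above):

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

# Select each group's scores by boolean indexing, then test.
a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']
res = stats.ttest_ind(a, b)
print(res.statistic, res.pvalue)  # statistic is negative: group A's mean is lower
```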
In [25]: df.head(100)

Out[25]:     method  score
         0        A     71
         1        A     72
         2        A     72
         3        A     75
         4        A     78
         5        A     81
         6        A     82
         7        A     83
         8        A     89
         9        A     91
         10       B     80
         11       B     81
         12       B     81
         13       B     84
         14       B     88
         15       B     88
         16       B     89
         17       B     90
         18       B     90
         19       B     91

QUESTION & ANSWER

Q1. An auto company decided to introduce a new six-cylinder car whose mean petrol consumption is claimed to be lower than that of the existing auto engine. It was found that the mean petrol consumption for the 50 cars was 10 km per litre with a standard deviation of 3.5 km per litre. Test at 5% level of significance whether the claim that the new car's petrol consumption is 9.5 km per litre on the average is acceptable.
sit419723, 435 PM LUntilea2- Jupyter Notebook
In [26]: import numpy as np
from scipy.stats import t
# Sample size
n= 50
# Sample mean and standard deviation
xbar = 10
s= 3.5
# Claimed population mean
mu@ = 9.5
# Degrees of freedom
df=n-1
# Calculate the t-statistic
t_statistic = (x_bar - mug) / (s / np.sqrt(n))
# Calculate the p-value
pvalue = t.sf(np.abs(t_statistic), df) * 2
# Test at 5X Level of significance
alpha = 0.05
if p_value < alpha:
print("Reject the null hypothesis. The mean petrol consumption of the new
els
print("Fail to reject the null hypothesis. The mean petrol consumption of +
Fail to reject the null hypothesis. The mean petrol consumption of the new ca
pis not significantly different from 9.5 km per litre.
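The claim in Q1 is directional (lower consumption), so a one-tailed p-value is a natural variant of the two-sided test above. A sketch with the same summary statistics:

```python
import numpy as np
from scipy.stats import t

n, x_bar, s, mu0 = 50, 10, 3.5, 9.5
t_stat = (x_bar - mu0) / (s / np.sqrt(n))
# Right-tailed p-value for H1: mu > 9.5 (the sample mean exceeds the claim)
p_one_sided = t.sf(t_stat, n - 1)
print(t_stat, p_one_sided)  # p is still above 0.05, so the conclusion is unchanged
```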
Q2. A manufacturer of ball pens claims that a certain pen he manufactures has a mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing agent selects a sample of 100 pens and puts them to the test. The mean writing life for the sample was 390 pages. Should the purchasing agent reject the manufacturer's claim at the 1% level?
In [27]: import numpy as np
         from scipy.stats import t

         # Set the significance level (1%)
         alpha = 0.01
         # Sample information
         n = 100
         x_bar = 390
         mu = 400
         s = 20
         # Calculate the t-statistic
         t_stat = (x_bar - mu) / (s / np.sqrt(n))
         # Calculate the degrees of freedom
         df = n - 1
         # Calculate the p-value (left-tailed)
         p_val = t.cdf(t_stat, df)
         # Compare the p-value with alpha
         if p_val < alpha:
             print("Reject null hypothesis. The mean writing life is less than 400 pages.")
         else:
             print("Fail to reject null hypothesis. The mean writing life is 400 pages.")

Reject null hypothesis. The mean writing life is less than 400 pages.
Q3. (i) A sample of 900 members has a mean 3.4 cm and SD 2.61 cm. Is the sample taken from a large population with mean 3.25 cm and SD 2.62 cm?
(ii) If the population is normal and its mean is unknown, find the 95% and 98% confidence limits of the true mean.
In [28]: import numpy as np
         from scipy.stats import t

         sample_mean = 3.4
         sample_sd = 2.61
         sample_size = 900
         alpha = 0.05

         t_critical_95 = t.ppf(1 - alpha/2, sample_size - 1)
         t_critical_98 = t.ppf(1 - 0.02/2, sample_size - 1)

         lower_ci_95 = sample_mean - t_critical_95 * (sample_sd / np.sqrt(sample_size))
         upper_ci_95 = sample_mean + t_critical_95 * (sample_sd / np.sqrt(sample_size))
         lower_ci_98 = sample_mean - t_critical_98 * (sample_sd / np.sqrt(sample_size))
         upper_ci_98 = sample_mean + t_critical_98 * (sample_sd / np.sqrt(sample_size))

         print("95% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_95, upper_ci_95))
         print("98% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_98, upper_ci_98))

95% confidence interval: (3.2293, 3.5707)
98% confidence interval: (3.1972, 3.6028)
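Part (i) of Q3 asks whether the sample comes from a fully specified population, which the cell above does not answer. A z-test sketch with the stated numbers (population SD taken as 2.62 per the question):

```python
import numpy as np
from scipy.stats import norm

n, x_bar = 900, 3.4
pop_mean, pop_sd = 3.25, 2.62

# Large-sample z-test of the sample mean against the population mean.
z = (x_bar - pop_mean) / (pop_sd / np.sqrt(n))
p = 2 * norm.sf(abs(z))
print(z, p)  # |z| < 1.96, so at the 5% level the sample is consistent with this population
```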
Q4. The mean weekly sales of soap bars in departmental stores were 146.3 bars per store. After an advertising campaign the mean weekly sales in 400 stores for a typical week increased to 153.7 and showed a standard deviation of 17.2. Was the advertising campaign successful?

In [29]: import numpy as np
         from scipy.stats import t

         # Sample statistics
         n = 400
         x_bar = 153.7
         s = 17.2
         # Null hypothesis mean
         mu0 = 146.3
         # Degrees of freedom
         df = n - 1
         # Standard error
         se = s / np.sqrt(n)
         # t-statistic
         t_stat = (x_bar - mu0) / se
         # p-value (right-tailed: success means an increase in sales)
         p_value = 1 - t.cdf(t_stat, df)
         # Significance level
         alpha = 0.05
         # Test decision
         if p_value < alpha:
             print("Reject the null hypothesis. The advertising campaign was successful.")
         else:
             print("Fail to reject the null hypothesis. The advertising campaign was not successful.")

Reject the null hypothesis. The advertising campaign was successful.
Q5. The wages of the factory workers are assumed to be normally distributed with mean μ and variance 25. A random sample of 50 workers gives the total wages equal to ₹ 2,550. Test the hypothesis μ = 52 against the alternative hypothesis μ = 49 at the 1% level of significance.
In [30]: import numpy as np
         from scipy import stats

         # Define the sample size, sample mean, and sample standard deviation
         n = 50
         x_bar = 2550 / n
         s = np.sqrt(25)
         # Set the null hypothesis mean and the alternative hypothesis mean
         mu_null = 52
         mu_alt = 49
         # Calculate the t-statistic and p-value
         t_statistic, p_value = stats.ttest_1samp([x_bar], mu_null, axis=0)
         # Calculate the critical value based on the 1% level of significance
         alpha = 0.01
         df = n - 1
         t_critical = stats.t.ppf(1 - alpha/2, df)
         # Print the results
         print("Sample mean:", x_bar)
         print("t-statistic:", t_statistic)
         print("p-value:", p_value)
         print("t-critical:", t_critical)
         if abs(t_statistic) > t_critical or p_value < alpha:
             print("Reject the null hypothesis")
         else:
             print("Fail to reject the null hypothesis")

Sample mean: 51.0
t-statistic: nan
p-value: nan
t-critical: 2.67995197363155
Fail to reject the null hypothesis

C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_axis_nan_policy.py:2: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
  res = hypotest_fun_out(*samples, **kwds)
C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214: RuntimeWarning: divide by zero encountered in divide
  var *= np.divide(n, n-ddof)  # to avoid error on division by zero
C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214: RuntimeWarning: invalid value encountered in double_scalars
  var *= np.divide(n, n-ddof)  # to avoid error on division by zero
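The cell above feeds a single value to `ttest_1samp`, which yields nan: a one-element sample has no variability, hence the RuntimeWarnings. Since the population variance (25) is given, a z-test from the summary statistics is the standard approach; a sketch:

```python
import numpy as np
from scipy.stats import norm

n = 50
x_bar = 2550 / n          # sample mean = 51.0
sigma = np.sqrt(25)       # known population SD = 5
mu_null = 52

z = (x_bar - mu_null) / (sigma / np.sqrt(n))
# The alternative mu = 49 lies below 52, so use a left-tailed test at the 1% level.
p_value = norm.cdf(z)
print(z, p_value)  # z ≈ -1.414, p ≈ 0.079 > 0.01, so fail to reject H0
```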
Q6. An ambulance service claims that it takes on the average 8.9 minutes to reach its destination on emergency calls. To check on this claim, the agency which licenses ambulance services had them timed on 50 emergency calls, getting a mean of 9.3 minutes with a standard deviation of 1.6 minutes. What can they conclude at the chosen level of significance?
In [31]: import scipy.stats as stats

         n = 50
         mean = 9.3
         std = 1.6
         cl = [90, 95, 99]  # confidence levels in percent
         for c in cl:
             z = stats.norm.ppf((1 + c / 100) / 2)
             ci = (mean - z * std / n ** 0.5, mean + z * std / n ** 0.5)
             print(f"At {c}% confidence level, the confidence interval is {ci}")

At 90% confidence level, the confidence interval is (8.927812110823465, 9.672187889176536)
At 95% confidence level, the confidence interval is (8.856510776208104, 9.743489223791897)
At 99% confidence level, the confidence interval is (8.717156362330098, 9.882843637669904)
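Q6 asks for a conclusion about the claimed 8.9-minute average, while the cell above only reports confidence intervals. A one-sample z-test sketch (n = 50 is large, so the normal approximation is reasonable); note it agrees with the intervals: 8.9 lies outside the 90% CI but inside the 95% CI.

```python
import numpy as np
from scipy.stats import norm

n, x_bar, s, mu0 = 50, 9.3, 1.6, 8.9

# z-statistic for the observed mean against the claimed 8.9 minutes
z = (x_bar - mu0) / (s / np.sqrt(n))
p_value = 2 * norm.sf(abs(z))   # two-sided
print(z, p_value)  # z ≈ 1.77, p ≈ 0.077: reject the claim at 10%, but not at 5%
```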