Hypothesis Testing

4/9/23, 4:35 PM    Untitled2 - Jupyter Notebook    (localhost:8888/notebooks/Stats and ML/Untitled2.ipynb)

In [1]: import pandas as pd
        import numpy as np
        import scipy.stats as st

In [2]: df = pd.read_csv("INFY.NS.csv")
        df.head()

Out[2]: (first five rows of the INFY.NS daily data, 3/21/2022 to 3/25/2022, with
        columns Date, Open, High, Low, Close, Adj Close, Volume; the numeric
        values are not recoverable from the source scan)

In [3]: df.info()

        <class 'pandas.core.frame.DataFrame'>
        RangeIndex: 251 entries, 0 to 250
        Data columns (total 7 columns):
         #   Column     Non-Null Count  Dtype
        ---  ------     --------------  -----
         0   Date       251 non-null    object
         1   Open       251 non-null    float64
         2   High       251 non-null    float64
         3   Low        251 non-null    float64
         4   Close      251 non-null    float64
         5   Adj Close  251 non-null    float64
         6   Volume     251 non-null    int64
        dtypes: float64(5), int64(1), object(1)
        memory usage: 13.9+ KB

In [4]: # Shapiro-Wilk test for normality
        from scipy.stats import shapiro

In [5]: shapiro(df["High"])

Out[5]: ShapiroResult(statistic=0.8402987718582153, pvalue=2.2313751036104985e-15)

The p-value is much lower than 0.05, so we reject the null hypothesis that High is normally distributed.

In [6]: # confidence level = 0.8: calculate a CI for High
        import scipy.stats as st
        st.norm.interval(alpha=0.8, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

        C:\Users\Administrator\AppData\Local\Temp\ipykernel_7788\2672082760.py:3:
        DeprecationWarning: Use of keyword argument 'alpha' for method 'interval'
        is deprecated and will be removed in SciPy 1.11.0. Use first positional
        argument or keyword argument 'confidence' instead.

Out[6]: (1543.2997722185137, 1560.791060721725)
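The DeprecationWarning above points at the replacement API: pass the confidence level as the first positional argument (or `confidence=`) instead of `alpha=`. A minimal sketch on synthetic data (the `rng` sample below is an assumption standing in for `df["High"]`, not the INFY.NS series):

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in for df["High"]; 251 points like the notebook's data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=1550, scale=60, size=251)

m, se = np.mean(sample), st.sem(sample)

# Confidence level as the first positional argument: no DeprecationWarning.
lo, hi = st.norm.interval(0.8, loc=m, scale=se)

# The interval is just mean +/- z * standard error, with z = ppf(0.9)
# for a two-sided 80% level (10% in each tail).
z = st.norm.ppf(0.9)
assert abs(lo - (m - z * se)) < 1e-9
assert abs(hi - (m + z * se)) < 1e-9
```

On SciPy 1.9+, `st.norm.interval(confidence=0.8, ...)` is equivalent.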
In [7]: st.norm.interval(alpha=0.84, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

        (DeprecationWarning as above)

Out[7]: (1542.4568393539932, 1561.6339935862454)

In [8]: st.norm.interval(alpha=0.89, loc=np.mean(df["High"]), scale=st.sem(df["High"]))

        (DeprecationWarning as above)

Out[8]: (1541.1389270061688, 1562.9519059340698)

In [9]: # confidence level = 0.84: calculate a CI for Close
        import scipy.stats as st
        st.norm.interval(alpha=0.84, loc=np.mean(df["Close"]), scale=st.sem(df["Close"]))

        (DeprecationWarning as above)

Out[9]: (1527.1994354186372, 1546.4826418403272)
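Cells In[6] through In[9] raise the confidence level from 0.80 to 0.89 and the interval visibly widens. A small sketch of that monotone relationship on synthetic data (the sample here is an assumption, not the stock series):

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
sample = rng.normal(1550, 60, size=251)
m, se = np.mean(sample), st.sem(sample)

widths = []
for conf in (0.80, 0.84, 0.89):
    lo, hi = st.norm.interval(conf, loc=m, scale=se)
    widths.append(hi - lo)

# A higher confidence level always produces a wider interval.
assert widths[0] < widths[1] < widths[2]
```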
In [10]: import scipy.stats as st
         st.norm.interval(alpha=0.84, loc=np.mean(df["Low"]), scale=st.sem(df["Low"]))

         (DeprecationWarning as above)

Out[10]: (1513.3218811097054, 1532.4665715835217)

Confidence Intervals Using the t Distribution

In [11]: # create a sample dataset
         df1 = np.random.randint(20, 40, 62)

In [12]: shapiro(df1)

Out[12]: ShapiroResult(statistic=0.9521715044975281, pvalue=0.019726457074284554)

In [13]: import scipy.stats as st
         st.t.interval(alpha=0.84, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

         (DeprecationWarning as above)

Out[13]: (29.118613342362874, 31.181386657637123)

In [14]: import scipy.stats as st
         st.t.interval(alpha=0.94, df=len(df1)-1, loc=np.mean(df1), scale=st.sem(df1))

         (DeprecationWarning as above)

Out[14]: (28.76009951742648, 31.53990048257352)

One Sample t-test in Python

In [15]: data = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]

In [16]: import scipy.stats as st
         st.ttest_1samp(data, popmean=1377)

Out[16]: TtestResult(statistic=0.1451848242151389, pvalue=0.8881562000414411, df=8)
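The statistic reported by `ttest_1samp` in In[16] is just (x̄ − μ0) / (s/√n) with the sample standard deviation (ddof=1); a sketch reproducing it by hand from the same data:

```python
import numpy as np
import scipy.stats as st

data = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]
res = st.ttest_1samp(data, popmean=1377)

# t = (sample mean - hypothesized mean) / (sample sd / sqrt(n))
x = np.asarray(data, dtype=float)
t_manual = (x.mean() - 1377) / (x.std(ddof=1) / np.sqrt(len(x)))

assert abs(res.statistic - t_manual) < 1e-9
```

With t ≈ 0.145 and p ≈ 0.89, the sample is entirely consistent with a population mean of 1377.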
Two Sample t-test in Python

In [17]: Sample1 = [1360, 1362, 1355, 1378, 1377, 1393, 1376, 1386, 1414]
         Sample2 = [1340, 1352, 1335, 1318, 1387, 1343, 1366, 1396, 1424]

In [18]: st.ttest_ind(Sample1, Sample2)

Out[18]: Ttest_indResult(statistic=1.2675024291577285, pvalue=0.24478627949581977)

In [19]: # hypothesis: do synchronous and asynchronous classes score differently?
         synchronous = [94.0, 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73. ...]   # remainder cut off in the source
         asynchronous = [77.1, 71.7, 91.0, 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65 ...]   # remainder cut off in the source

In [20]: shapiro(synchronous)

Out[20]: ShapiroResult(statistic=0.9676008820533752, pvalue=0.6555896997451782)

In [21]: shapiro(asynchronous)

Out[21]: ShapiroResult(statistic=0.8898013830184937, pvalue=0.08030176907777786)

In [22]: print(np.var(synchronous), np.var(asynchronous))

         40.75208677685952 41.81714285714285

In [23]: st.ttest_ind(synchronous, asynchronous, equal_var=False)

Out[23]: Ttest_indResult(statistic=2.8241907458142563, pvalue=0.008754235249671019)

t test in pandas

In [24]: # create a pandas DataFrame in long format
         df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                                       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                            'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                                      80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

In [25]: df.head(100)

Out[25]:     method  score
         0        A     71
         1        A     72
         2        A     72
         3        A     75
         4        A     78
         5        A     81
         6        A     82
         7        A     83
         8        A     89
         9        A     91
         10       B     80
         11       B     81
         12       B     81
         13       B     84
         14       B     88
         15       B     88
         16       B     89
         17       B     90
         18       B     90
         19       B     91
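The notebook builds the long-format frame, but the test itself does not appear in the scan. One way to run it, sketched below, is to split the scores by the `method` column and hand both groups to `st.ttest_ind` (column names and scores taken from the frame created in In[24]):

```python
import pandas as pd
import scipy.stats as st

df = pd.DataFrame({'method': ['A'] * 10 + ['B'] * 10,
                   'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
                             80, 81, 81, 84, 88, 88, 89, 90, 90, 91]})

# Split the long-format frame by group and feed both columns to ttest_ind.
a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']
res = st.ttest_ind(a, b)

assert res.pvalue < 0.05   # the two methods' mean scores differ at the 5% level
```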
QUESTION & ANSWER

Q1. An auto company decided to introduce a new six-cylinder car whose mean petrol
consumption is claimed to be lower than that of the existing auto engine. It was
found that the mean petrol consumption for the 50 cars was 10 km per litre with a
standard deviation of 3.5 km per litre. Test at the 5% level of significance
whether the claim that the new car's petrol consumption is 9.5 km per litre on
average is acceptable.

In [26]: import numpy as np
         from scipy.stats import t

         # Sample size
         n = 50
         # Sample mean and standard deviation
         x_bar = 10
         s = 3.5
         # Claimed population mean
         mu0 = 9.5
         # Degrees of freedom
         df = n - 1

         # Calculate the t-statistic
         t_statistic = (x_bar - mu0) / (s / np.sqrt(n))
         # Calculate the p-value (two-sided)
         p_value = t.sf(np.abs(t_statistic), df) * 2

         # Test at 5% level of significance
         alpha = 0.05
         if p_value < alpha:
             print("Reject the null hypothesis. The mean petrol consumption of the new car is significantly different from 9.5 km per litre.")
         else:
             print("Fail to reject the null hypothesis. The mean petrol consumption of the new car is not significantly different from 9.5 km per litre.")

         Fail to reject the null hypothesis. The mean petrol consumption of the new car is not significantly different from 9.5 km per litre.

Q2. A manufacturer of ball pens claims that a certain pen he manufactures has a
mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing
agent selects a sample of 100 pens and puts them to the test. The mean writing
life for the sample was 390 pages. Should the purchasing agent reject the
manufacturer's claim at the 1% level?

In [27]: import numpy as np
         from scipy.stats import t

         # Set the significance level (1%)
         alpha = 0.01
         # Sample information
         n = 100
         x_bar = 390
         mu = 400
         s = 20

         # Calculate the t-statistic
         t_stat = (x_bar - mu) / (s / np.sqrt(n))
         # Calculate the degrees of freedom
         df = n - 1
         # Calculate the p-value (lower tail)
         p_val = t.cdf(t_stat, df)

         # Compare the p-value with alpha
         if p_val < alpha:
             print("Reject null hypothesis. The mean writing life is less than 400 pages.")
         else:
             print("Fail to reject null hypothesis. The mean writing life is 400 pages.")

         Reject null hypothesis. The mean writing life is less than 400 pages.
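Because the 20-page figure in Q2 is the manufacturer's claimed population standard deviation, the classical treatment is a z-test; with n = 100 it agrees with the t-test above to several decimals. A sketch from the summary statistics alone:

```python
import numpy as np
from scipy.stats import norm

# Q2 summary statistics: claimed mean 400, claimed sigma 20, n = 100, sample mean 390
n, x_bar, mu0, sigma = 100, 390.0, 400.0, 20.0

z = (x_bar - mu0) / (sigma / np.sqrt(n))   # = -5.0
p_lower = norm.cdf(z)                       # lower-tail p-value

assert z == -5.0
assert p_lower < 0.01   # reject the claim at the 1% level
```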
Q3. (i) A sample of 900 members has a mean of 3.4 cm and SD of 2.61 cm. Is the
sample taken from a large population with mean 3.25 cm and SD 2.62 cm?
(ii) If the population is normal and its mean is unknown, find the 95% and 98%
confidence limits of the true mean.

In [28]: import numpy as np
         from scipy.stats import t

         sample_mean = 3.4
         sample_sd = 2.61
         sample_size = 900
         alpha = 0.05

         t_critical_95 = t.ppf(1 - alpha/2, sample_size - 1)
         t_critical_98 = t.ppf(1 - 0.02/2, sample_size - 1)

         lower_ci_95 = sample_mean - t_critical_95 * (sample_sd / np.sqrt(sample_size))
         upper_ci_95 = sample_mean + t_critical_95 * (sample_sd / np.sqrt(sample_size))
         lower_ci_98 = sample_mean - t_critical_98 * (sample_sd / np.sqrt(sample_size))
         upper_ci_98 = sample_mean + t_critical_98 * (sample_sd / np.sqrt(sample_size))

         print("95% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_95, upper_ci_95))
         print("98% confidence interval: ({:.4f}, {:.4f})".format(lower_ci_98, upper_ci_98))

         95% confidence interval: (3.2293, 3.5707)
         98% confidence interval: (3.1972, 3.6028)

Q4. The mean weekly sales of soap bars in departmental stores were 146.3 bars per
store. After an advertising campaign the mean weekly sales in 400 stores for a
typical week increased to 153.7 and showed a standard deviation of 17.2. Was the
advertising campaign successful?

In [29]: import numpy as np
         from scipy.stats import t

         # Sample statistics
         n = 400
         x_bar = 153.7
         s = 17.2
         # Null hypothesis
         mu0 = 146.3
         # Degrees of freedom
         df = n - 1
         # Standard error
         se = s / np.sqrt(n)
         # t-statistic
         t_stat = (x_bar - mu0) / se
         # p-value (upper tail)
         p_value = 1 - t.cdf(t_stat, df)
         # Significance level
         alpha = 0.05
         # Test decision
         if p_value < alpha:
             print("Reject the null hypothesis. The advertising campaign was successful.")
         else:
             print("Fail to reject the null hypothesis. The advertising campaign was not successful.")

         Reject the null hypothesis. The advertising campaign was successful.
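Returning to Q3(ii): the hand-built limits in In[28] can be read straight from `st.t.interval` using the same summary statistics, which is a useful cross-check on the arithmetic:

```python
import numpy as np
import scipy.stats as st

n, mean, sd = 900, 3.4, 2.61
se = sd / np.sqrt(n)   # 2.61 / 30 = 0.087

lo95, hi95 = st.t.interval(0.95, df=n - 1, loc=mean, scale=se)
lo98, hi98 = st.t.interval(0.98, df=n - 1, loc=mean, scale=se)

# Matches the printed intervals in In[28] to four decimals.
assert round(lo95, 4) == 3.2293 and round(hi95, 4) == 3.5707
assert round(lo98, 4) == 3.1972 and round(hi98, 4) == 3.6028
```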
Q5. The wages of the factory workers are assumed to be normally distributed with
mean μ and variance 25. A random sample of 50 workers gives the total wages equal
to ₹ 2,550. Test the hypothesis μ = 52 against the alternative hypothesis μ = 49
at the 1% level of significance.

In [30]: import numpy as np
         from scipy import stats

         # Define the sample size, sample mean, and sample standard deviation
         n = 50
         x_bar = 2550 / n
         s = np.sqrt(25)

         # Set the null hypothesis mean and the alternative hypothesis mean
         mu_null = 52
         mu_alt = 49

         # Calculate the t-statistic and p-value
         # NOTE: ttest_1samp is handed a one-element sample here, so the sample
         # standard deviation is undefined and the result is nan (see warnings).
         t_statistic, p_value = stats.ttest_1samp([x_bar], mu_null, axis=0)

         # Calculate the critical value based on the 1% level of significance
         alpha = 0.01
         df = n - 1
         t_critical = stats.t.ppf(1 - alpha/2, df)

         # Print the results
         print("Sample mean:", x_bar)
         print("t-statistic:", t_statistic)
         print("p-value:", p_value)
         print("t-critical:", t_critical)

         if abs(t_statistic) > t_critical or p_value < alpha:
             print("Reject the null hypothesis")
         else:
             print("Fail to reject the null hypothesis")

         Sample mean: 51.0
         t-statistic: nan
         p-value: nan
         t-critical: 2.67995197363155
         Fail to reject the null hypothesis

         C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_axis_nan_policy.py:
         RuntimeWarning: Precision loss occurred in moment calculation due to
         catastrophic cancellation. This occurs when the data are nearly identical.
         Results may be unreliable.
           res = hypotest_fun_out(*samples, **kwds)
         C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214:
         RuntimeWarning: divide by zero encountered in divide
           var *= np.divide(n, n-ddof)  # to avoid error on division by zero
         C:\ProgramData\anaconda3\lib\site-packages\scipy\stats\_stats_py.py:1214:
         RuntimeWarning: invalid value encountered in double_scalars
           var *= np.divide(n, n-ddof)  # to avoid error on division by zero
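The nan in In[30] comes from handing `ttest_1samp` a one-element sample, whose standard deviation is undefined. Since Q5 states the population variance (25), the computation can instead be done as a z-test from the summary statistics; a sketch (one-sided, since the alternative μ = 49 lies below 52):

```python
import numpy as np
from scipy.stats import norm

# Q5 summary statistics: n = 50, total wages 2550, known population variance 25
n = 50
x_bar = 2550 / n          # 51.0
sigma = np.sqrt(25)       # population SD is known, so a z-test applies
mu_null, alpha = 52, 0.01

# Alternative mu = 49 < 52, so the rejection region is in the lower tail.
z = (x_bar - mu_null) / (sigma / np.sqrt(n))   # about -1.414
z_crit = norm.ppf(alpha)                        # about -2.326

assert z > z_crit   # fail to reject H0: mu = 52, matching the notebook's conclusion
```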
Q6. An ambulance service claims that it takes on average 8.9 minutes to reach its
destination on emergency calls. To check this claim, the agency which licenses
ambulance services has them timed on 50 emergency calls, getting a mean of 9.3
minutes with a standard deviation of 1.6 minutes. What can they conclude at the
level of significance?

In [31]: import scipy.stats as stats

         n = 50
         mean = 9.3
         std = 1.6
         cl = [90, 95, 99]  # confidence levels in percent

         for c in cl:
             z = stats.norm.ppf((1 + c / 100) / 2)
             ci = (mean - z * std / n ** 0.5, mean + z * std / n ** 0.5)
             print(f"At {c}% confidence level, the confidence interval is {ci}")

         At 90% confidence level, the confidence interval is (8.927812110823465, 9.672187889176536)
         At 95% confidence level, the confidence interval is (8.856510776208104, 9.743489223791897)
         At 99% confidence level, the confidence interval is (8.717156362330098, 9.882843637669904)
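The loop in In[31] assembles each interval from `norm.ppf` by hand; `st.norm.interval` returns the same endpoints directly. A sketch checking the 95% case against the printed output:

```python
import scipy.stats as st

n, mean, std = 50, 9.3, 1.6
se = std / n ** 0.5

lo, hi = st.norm.interval(0.95, loc=mean, scale=se)

# Same endpoints as the hand-rolled 95% interval printed above.
assert abs(lo - 8.856510776208104) < 1e-6
assert abs(hi - 9.743489223791897) < 1e-6
```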
