

Statistics for Business Analysis

This document contains instructions for completing statistics assignments involving salary data analysis. Key tasks include: 1. Identifying qualitative and quantitative parameters in the data, creating summary tables for job level, and drawing graphs for education, sector, and salary. 2. Calculating descriptive statistics and testing the normality of salary data, drawing box plots to identify outliers for age, and creating a Pareto chart for total salary by discipline. 3. Drawing a scatter diagram between age and salary and calculating probabilities related to normal distributions for TV lifetime, customer service times, and weekly orange deliveries. 4. Analyzing capability indices for order delivery times and providing recommendations to improve supplier performance.

Uploaded by

Karma Ahmed

Statistics assignment

Task 1: Use the Salary Data Sheet provided to perform the following tasks and
comment on the results.

 Identify the type of each parameter (Qualitative & Quantitative).


1. Quantitative parameters: Age, Salary, Customer Satisfaction
2. Qualitative parameters: Name, Job Level, Sector, Education, Discipline
 Create summary table for Job Level.
Job Level (statistics)
N (Valid)         682
N (Missing)       0
Mean              1.9868
Median            2.0000
Mode              1.00
Std. Deviation    0.93777
Variance          0.879
Range             2.00
Minimum           1.00
Maximum           3.00
Sum               1355.00

Job Level (frequencies)
            Frequency   Percent   Valid Percent   Cumulative Percent
Senior         304        44.6        44.6              44.6
Junior          83        12.2        12.2              56.7
Manager        295        43.3        43.3             100.0
Total          682       100.0       100.0
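
The percent and cumulative percent columns follow directly from the three counts. A minimal Python sketch of that arithmetic, using only the frequencies reported above (the function name `frequency_table` is an illustrative choice, not part of the original analysis):

```python
# Frequency table for Job Level, built from the counts reported above.
counts = {"Senior": 304, "Junior": 83, "Manager": 295}

def frequency_table(counts):
    """Return rows of (level, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for level, freq in counts.items():
        running += freq
        rows.append((level, freq, 100 * freq / total, 100 * running / total))
    return rows

table = frequency_table(counts)
for level, freq, pct, cum in table:
    print(f"{level:<8}{freq:>5}{pct:>8.1f}{cum:>8.1f}")
```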

 Draw bar graph for Education.

The sample contains 682 people. Of these, 260 (38%) hold a bachelor's degree,
249 (37%) hold a master's degree, and 173 (25%) hold a doctorate.

 Draw pie chart for Sector.


The pie chart shows the distribution of the 682 people by sector of employment:
175 people (25.7%) work for the government and 507 people (74.3%) work for
private companies.

 Draw a histogram for salary.

- Salaries in the sample range from 4,830 to 30,435.
- The largest group, 308 people (45% of the sample), earns between $5,000 and
$10,000 per month.
- Only seven people, about 1% of the sample, earn less than $5,000.
- Similarly, only 7.1% of people earn more than $30,000.
 Calculate descriptive statistics of Salary and test its normality.

Descriptive Statistics: Salary

One-Sample Kolmogorov-Smirnov Test

A quick check for normality is the ratio IQR/Stdev, which is about 1.35 for a
normal distribution.

Here the ratio is 9410 / 7039 = 1.33, which is close to that value. The result
suggests that the salary data are approximately normally distributed.
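
The ratio check can be reproduced in a few lines. The sketch below uses Python's standard-library `statistics.NormalDist` to derive the reference ratio for a normal distribution and compares it with the IQR and standard deviation quoted above; it is a rule-of-thumb illustration, not a substitute for the Kolmogorov-Smirnov test.

```python
from statistics import NormalDist

# Values reported above for Salary.
iqr = 9410
stdev = 7039

# For any normal distribution, IQR / sigma equals the distance between the
# 25th and 75th percentiles of the standard normal, roughly 1.349.
std_normal = NormalDist()
expected_ratio = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

observed_ratio = iqr / stdev
print(f"observed {observed_ratio:.3f}, expected {expected_ratio:.3f}")
```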

 Draw box plot for Age and determine the existence of outliers.

- The interquartile range (IQR), the difference between the third quartile (Q3)
and the first quartile (Q1), is 16.56.
- The box plot shows no outliers, which suggests that the ages in the sample are
quite homogeneous.

The existence of outliers for Age
Min           24.1
Q1            32.10553
Median        36.0
Q3            48.66272
Max           58.0
Mean          39.7
IQR           16.55719
Low limit     7.26975    (Q1 - 1.5 x IQR)
High limit    73.49851   (Q3 + 1.5 x IQR)
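
The outlier limits follow the usual 1.5 x IQR rule. A short Python sketch, using the quartiles and extremes reported above (the helper name `tukey_fences` is illustrative):

```python
# Quartiles and extremes reported above for Age.
q1, q3 = 32.10553, 48.66272
age_min, age_max = 24.1, 58.0

def tukey_fences(q1, q3, k=1.5):
    """Return the (low, high) limits beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = tukey_fences(q1, q3)
has_outliers = age_min < low or age_max > high
print(f"low={low:.5f}, high={high:.5f}, outliers={has_outliers}")
```

Because the minimum and maximum ages both lie inside the limits, the check agrees with the box plot: no outliers.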

 Draw a Pareto chart for total salary of each Discipline & present your
conclusion about the vital few.

[Figure: Pareto chart, "Sum of Salary by Discipline". Disciplines: Business,
Engineering, IT, Lawyer. Bar totals, largest first: 2,394,975; 2,331,233;
2,236,497; 2,114,859.]

- In this sample, the Engineering group's total salary is 2,394,975, or 26.4% of
the sample's total salary. The Engineering group comprises 181 individuals, or
26.5% of the sample.
- Together, Engineering and IT account for about 52% of the total salary.
- Engineering, IT, Business, and Law all contribute comparable shares of the
total, so no discipline stands out as a "vital few".
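
The cumulative shares behind a Pareto chart can be sketched as follows. The four totals are the bar values from the chart; only the largest is tied to a discipline (Engineering) by the text, so the bars are left unlabeled here.

```python
# Bar totals read from the Pareto chart, largest first. Only the first bar is
# tied to a discipline (Engineering) by the text above.
totals = [2_394_975, 2_331_233, 2_236_497, 2_114_859]

def cumulative_shares(values):
    """Return cumulative percentage shares for values sorted descending."""
    ordered = sorted(values, reverse=True)
    grand_total = sum(ordered)
    shares, running = [], 0
    for value in ordered:
        running += value
        shares.append(100 * running / grand_total)
    return shares

shares = cumulative_shares(totals)
print([round(s, 1) for s in shares])
```

Each bar adds roughly a quarter of the total, which is why no small subset dominates.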
 Draw a scatter diagram between Age & Salary and discuss the result to
obtain conclusion about the relationship between the two parameters.
The scatter diagram shows several distinct paths, which suggests that factors
other than age also influence salary. The association between the two
parameters becomes stronger as age increases.

Task 2:

You are working in a TV set factory. The life of a manufactured TV follows a
normal distribution with μ = 3,500 working hours and σ = 200 hours.

 What is the probability that a TV will work less than 3,350 hours?

$P(X < 3350) = P\left(\frac{X - \mu}{\sigma} < \frac{3350 - 3500}{200}\right) = P(Z < -0.75) = 0.2266$

 What is the probability that a TV will work more than 3,750 hours?

 What is the probability that a TV will work between 3,350 & 3,750 hours?

$P(3350 < X < 3750) = P\left(\frac{3350 - 3500}{200} < \frac{X - \mu}{\sigma} < \frac{3750 - 3500}{200}\right) = P(-0.75 < Z < 1.25) = 0.8944 - 0.2266 = 0.6678$

 What is TV life that you are confident 95% it will keep working?
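$z = \frac{x - \mu}{\sigma}$

Using the two-sided 95% value $z = 1.96$:

$1.96 = \frac{x - 3500}{200}$

$x = (1.96)(200) + 3500 = 3892$

Using the one-sided 95% value $z = 1.645$ instead gives $x = (1.645)(200) + 3500 \approx 3829$, which matches the 3828.971 hours obtained in R. So we are 95% confident that a TV's life will not exceed 3828.971 hours.

[Figure: R code and output]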
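
For readers without the R output, the same quantities can be sketched with Python's standard-library `statistics.NormalDist`. This is an illustration under the stated model (mean 3,500 h, standard deviation 200 h), not the original R code. The last line is an assumption about how "95% confident it will keep working" might be read: the lifetime that 95% of TVs exceed, that is, the 5th percentile.

```python
from statistics import NormalDist

# TV lifetime model stated in the task: mean 3,500 h, standard deviation 200 h.
tv_life = NormalDist(mu=3500, sigma=200)

p_less_3350 = tv_life.cdf(3350)                     # P(X < 3350)
p_more_3750 = 1 - tv_life.cdf(3750)                 # P(X > 3750)
p_between = tv_life.cdf(3750) - tv_life.cdf(3350)   # P(3350 < X < 3750)

# 95th percentile: the lifetime that 95% of TVs do not exceed.
upper_95 = tv_life.inv_cdf(0.95)

# ASSUMPTION: alternative reading, the lifetime that 95% of TVs exceed.
lower_5 = tv_life.inv_cdf(0.05)

print(round(p_less_3350, 4), round(p_more_3750, 4), round(p_between, 4))
print(round(upper_95, 3), round(lower_5, 3))
```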

You are working in a bank. You have collected enough data to determine the
average time needed to serve one customer and found that it follows a normal
distribution with μ = 4.78 minutes and σ = 1.32 minutes.

 What is the probability that you will serve 10 customers every hour?

$P(X = 10) = P\left(\frac{X - \mu}{\sigma} = \frac{10 - 4.78}{1.32}\right) = P(Z = 3.95)$

 What is the probability that you will serve more than 15 customers every
hour?

$P(X > 15) = P\left(\frac{X - \mu}{\sigma} > \frac{15 - 4.78}{1.32}\right) = P(Z > 7.74)$

 What is the probability that you will serve between 10 & 15 customers every
hour?

$P(10 < X < 15) = P\left(\frac{10 - 4.78}{1.32} < \frac{X - \mu}{\sigma} < \frac{15 - 4.78}{1.32}\right) = P(3.95 < Z < 7.74)$

 What is the number of customers you will be 95% confident that you will
serve every hour?
$z = \frac{x - \mu}{\sigma}$

$1.96 = \frac{x - 4.78}{1.32}$

$x = (1.96)(1.32) + 4.78 = 7.3672$

You can be 95% confident of serving about 8 (7.37) customers every hour.

You are thinking about signing a contract as a supplier for one of the biggest
global exporting companies. The draft contract obligates you to deliver 20 tons of
oranges every week. The delivery of oranges during this season follows a
normal distribution with μ = 22.5 tons every week and σ = 3.2 tons.

 What is the probability that you will achieve the contract terms?

$x = 20$, $\mu = 22.5$, $\sigma = 3.2$

$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{3.2} = -0.78$

$P(X \ge 20) = P(Z \ge -0.78) = 0.5 + 0.2823 = 0.7823$

 What is the orange quantity that you will be 95% confident that you will
deliver every week?

$n = \left(\frac{z\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(3.2)}{0.5}\right)^2 = 157.351936$

Because we are not simply required to compute the orange quantity, we use a margin
of error of E = 0.5. Consequently, the orange quantity is 1.58 tonnes.
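
As a cross-check on the orange delivery questions, the sketch below works directly from the stated distribution (mean 22.5 tons, standard deviation 3.2 tons) with Python's `statistics.NormalDist`. The second quantity rests on an assumption that is not in the original answer: "95% confident" is read as the weekly quantity that is exceeded in 95% of weeks, that is, the 5th percentile.

```python
from statistics import NormalDist

# Weekly delivery model stated in the task, in tons per week.
weekly_delivery = NormalDist(mu=22.5, sigma=3.2)

# Probability of delivering at least the contracted 20 tons in a week.
p_meet_contract = 1 - weekly_delivery.cdf(20)

# ASSUMPTION: "95% confident" is read as the quantity exceeded in 95% of
# weeks, i.e. the 5th percentile of the weekly delivery distribution.
quantity_95 = weekly_delivery.inv_cdf(0.05)

print(round(p_meet_contract, 4), round(quantity_95, 2))
```

The probability differs from 0.7823 only in the fourth decimal because the hand calculation rounds z to -0.78 before using the table.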

Task 3: A supplier was requested to deliver orders within 25 to 35 days after
receiving the Purchase Order.

 Calculate the capability indices and provide your comments on the results.

A capability analysis was carried out. Its implications are as follows.

- According to the observed data, the supplier delivers earlier than the lower
limit of 25 days 15,625 times per million. In the long term, this number is
expected to increase to 39,249 times per million.
- According to the observed data, the supplier delivers later than the upper
limit of 35 days 109,375 times per million. In the long term, this figure is
projected to drop to 97,077 times per million.
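
The capability indices themselves are not reproduced in the text, but the long-term rates above imply them if delivery time is assumed to be normally distributed. The Python sketch below backs out the implied mean, standard deviation, and overall indices (often labeled Pp and Ppk) from the two reported tail rates. The normality assumption and the variable names are illustrative, not taken from the original output.

```python
from statistics import NormalDist

LSL, USL = 25, 35  # delivery window in days, from the task

# Expected long-term rates reported above, in parts per million.
ppm_below_lsl = 39_249
ppm_above_usl = 97_077

# ASSUMPTION: delivery time is normal, so each tail rate maps to a z-distance.
std_normal = NormalDist()
z_lower = -std_normal.inv_cdf(ppm_below_lsl / 1e6)     # (mean - LSL) / sd
z_upper = std_normal.inv_cdf(1 - ppm_above_usl / 1e6)  # (USL - mean) / sd

sd = (USL - LSL) / (z_lower + z_upper)
mean = LSL + z_lower * sd

pp = (USL - LSL) / (6 * sd)      # spread of the window relative to 6 sd
ppk = min(z_lower, z_upper) / 3  # distance to the nearer limit, in units of 3 sd

print(round(mean, 2), round(sd, 2), round(pp, 2), round(ppk, 2))
```

Under these assumptions both indices come out well below 1, which is consistent with the high late-delivery rate described above.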
Hypothesis:
Null hypothesis: The mean customer satisfaction is at least three. H0: $\mu \ge 3$
Alternative hypothesis: The mean customer satisfaction is less than three. Ha: $\mu < 3$
Test statistic:

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

where $\bar{x}$ is the sample mean, $n$ is the sample size, and $s$ is the sample standard deviation.

Mean = 4.98 / SD = 3.76

t = -3.57155 ≈ -3.57

The critical t value at the 5% level of significance with 45 df is
-1.67943 (=T.INV(0.05, 45)) ≈ -1.68.

|tcal| > |tcrit|

Consequently, we reject hypothesis H0 and conclude that the mean customer
satisfaction is less than three.

 Provide your recommendation to improve the supplier performance.



Reducing variation and improving delivery capability should be the supplier's top
priorities. One way to do this is to set tighter internal upper and lower limits; in
this case, for example, a lower limit of 27 days and an upper limit of 32 days.
Reaching this level of capability will make the process more effective, which will
directly improve overall performance.
Hypothesis:
Null hypothesis: There is no significant difference between product A and product B
customer satisfaction. (H0: $\mu_A = \mu_B$)
Alternative Hypothesis: There is a significant difference between product A and product
B customer satisfaction. (Ha: $\mu_A \ne \mu_B$)
Test statistic:

Sp = pooled standard deviation of the two samples

 t = 0.61
 t critical = 1.99
 |tcal| < tcrit

There is not enough evidence to reject the null hypothesis. Thus, we conclude that
there is no statistically significant difference between product A and product B in
terms of customer satisfaction.

 Check if there is a significant difference between Product B & Product C
customer satisfaction

Hypothesis:
Null hypothesis: There is no significant difference between product B and product C
customer satisfaction. (H0: $\mu_B = \mu_C$)
Alternative Hypothesis: There is a significant difference between product B and product
C customer satisfaction. (Ha: $\mu_B \ne \mu_C$)
Test statistic:

Sp = pooled standard deviation of the two samples

 t = -0.28
 t critical = 1.99
 |tcal| < tcrit

The null hypothesis is not rejected: there is no statistically significant difference in
customer satisfaction between product B and product C.
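
The two product comparisons use a pooled two-sample t-test. The Python sketch below shows the standard pooled-variance calculation. The sample sizes, means, and standard deviations are hypothetical placeholders, since the product data are not reproduced here; only the critical value 1.99 is taken from the text.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (pooled sd, t statistic, degrees of freedom) for two samples."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df

# HYPOTHETICAL summary statistics, for illustration only.
sp, t_stat, df = pooled_t(n1=41, mean1=5.0, sd1=3.5, n2=41, mean2=4.5, sd2=4.0)

T_CRITICAL = 1.99  # two-sided 5% critical value quoted in the text
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(sp, 3), round(t_stat, 3), df, reject_h0)
```

The decision rule compares the absolute value of t with the critical value, so a negative statistic such as -0.28 is handled the same way as a positive one.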

Task 4: Cycle Time was measured and found to follow a non-normal distribution.


 Transform Cycle Time data using Box-Cox transformation.

The data must be transformed because λ = 1 does not fall within the confidence
interval for λ, that is, between the lower confidence limit (Lower CL) and the upper
confidence limit (Upper CL).
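
For reference, the Box-Cox transformation itself is a simple power transformation. The sketch below applies it for a given λ. The cycle times and the value λ = 0.5 are hypothetical, and the estimation of the optimal λ (which statistical software does by maximum likelihood) is not shown.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox power transformation to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# HYPOTHETICAL cycle times and lambda, for illustration only.
cycle_times = [1.0, 2.5, 4.0, 9.0]
transformed = box_cox(cycle_times, lam=0.5)
print([round(t, 4) for t in transformed])
```

With λ = 1 the transformation only shifts the data by one, which is why an interval for λ that contains 1 means no transformation is needed.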

 Transform Cycle Time data using Johnson transformation.

Task 5: Car manufacturing plant is studying the stability of its processes.
 Provide your comments on the stability of daily production.

The data are sorted from the oldest to the newest date and plotted on an I-MR
control chart.

No out-of-control points appear in the daily output values, and the moving range
chart likewise shows no unusual values.

- Observation: The process is in control and stable.
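
The control limits of an I-MR chart can be sketched with the usual constants for a moving range of two consecutive points. The daily production figures below are hypothetical, and the sketch checks only for points outside the limits, not for runs or trends.

```python
def imr_limits(values):
    """Return center lines and 3-sigma limits for an I-MR chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "x_center": x_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128
        "x_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # D4 = 3.267; the MR lower limit is 0
    }

# HYPOTHETICAL daily production counts, oldest first.
daily_output = [100, 102, 98, 101, 99, 103, 97]
limits = imr_limits(daily_output)
in_control = all(limits["x_lcl"] <= v <= limits["x_ucl"] for v in daily_output)
print(limits, in_control)
```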

 Provide your comments on the stability of defective cars.


a) The stability of defective cars would be evaluated with an NP attribute control
chart if the sample sizes were equal.
b) The number of defective cars is taken as the number of cars repaired.
c) The sample size is the number of cars inspected.
d) The samples do not have the same size.

Seven points fall outside the control limits: two below the Lower Control Limit (LCL)
and five above the Upper Control Limit (UCL). The process is therefore not in
control and not stable.

 Provide your comments on the stability of car defects.

The number of car defects is taken as the number of repaired components, so a C
attribute control chart is used.
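
A C chart places its limits three standard deviations either side of the average defect count, where the standard deviation is the square root of that average. A minimal Python sketch with hypothetical defect counts:

```python
import math

def c_chart_limits(defect_counts):
    """Return (LCL, center line, UCL) for a c chart of defect counts."""
    c_bar = sum(defect_counts) / len(defect_counts)
    spread = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

# HYPOTHETICAL defect counts per inspection period, for illustration only.
defects_per_day = [4, 6, 3, 5, 7, 5, 4, 6]
lcl, center, ucl = c_chart_limits(defects_per_day)
out_of_control = [c for c in defects_per_day if c < lcl or c > ucl]
print(lcl, center, round(ucl, 3), out_of_control)
```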

The process is in control and stable. We can use the data from the labor production
sheet to make a scatter plot of Age (y) against Training (h). Clicking on the graph
displays the fitted equation and the Pcr value for the equation's second term.
Task 6: Sales manager wishing to predict future sales.
 Provide your comments on the sales trend.

The linear trend model has a mean absolute percentage error (MAPE) of 8%, which
corresponds to an accuracy of 92%.

The quadratic trend model also has a MAPE of 8%, an accuracy of 92%.

The exponential trend model has an accuracy of 86%, a MAPE of 14%.

The S-Curve trend model has a MAPE of 7%, an accuracy of 93%. Compared with the
other trend models, the S-Curve model performs best.

 Provide your comments on the sales seasonality.
The sales show seasonality: revenue declines recur at points 8, 12, and 16, that is, every four periods.
 Provide your comments on the obtained accuracy model.
Of all the trend models, the S-curve trend model has the lowest error: its Mean
Absolute Percentage Error (MAPE) is 7%, which corresponds to an accuracy of 93%.
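
The accuracy figures quoted for the trend models are 100% minus the MAPE. A minimal sketch of that calculation in Python, using hypothetical actual and fitted sales values rather than the assignment's data:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

# HYPOTHETICAL actual and fitted sales, for illustration only.
actual_sales = [200, 250, 400, 500]
fitted_sales = [180, 275, 380, 525]

error_pct = mape(actual_sales, fitted_sales)
accuracy_pct = 100 - error_pct
print(error_pct, accuracy_pct)
```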
 Predict the upcoming 3 quarter sales.

It is predicted that Q 19 will come in at 283,915, Q 20 will come in at 291,027, and Q 21
will come in at 297,025.
