
SMDM Project: Submitted By: Tina Das

The document summarizes an analysis of survey data from 62 undergraduate students at Clear Mountain State University (CMSU). 1) Contingency tables were constructed for Gender vs Major, Gender vs Graduation Intention, and Gender vs Employment and Computer Ownership. 2) Probabilities were calculated that a randomly selected student would be male (46.8%) or female (53.2%). Conditional probabilities of majors for male and female students were also found. 3) The probability a randomly chosen student is male and intends to graduate is 27.4%. The probability a randomly selected student is female and does not have a laptop is 4/62 = 6.45%.


SMDM PROJECT

Submitted by:
Tina Das


Problem Statement 1:
A wholesale distributor operating in different regions of Portugal has
information on annual spending of several items in their stores across different
regions and channels. The data consists of 440 large retailers’ annual spending
on 6 different varieties of products in 3 different regions (Lisbon, Oporto,
Other) and across different sales channels (Hotel, Retail).
1.1 Use methods of descriptive statistics to summarize data. Which Region and which
Channel spent the most? Which Region and which Channel spent the least?

Fig 1

Summarising the data:

The data set contains 440 observations and 8 variables: 6 of integer type and 2 of object type.

6 continuous features ('Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper', 'Delicassen')

2 categorical features ('Channel', 'Region')

There are 2 channels (Hotel, Retail) and 3 regions (Lisbon, Oporto, Other).

The top channel is Hotel, with a frequency of 298 out of 440 observations, so 67.7% of the retailers belong to the “Hotel” channel.

71.8% of the retailers belong to the “Other” region. These percentages are shares of observations, not shares of spending.

Fresh has the highest standard deviation, at 12647.3, with a min value of 3 and a max value of 112151.
Q1(25%) is 3127.75, Q3(75%) is 16933.8, with Q2(50%) 8504


Milk item has a mean of 5796.27, standard deviation of 7380.38, with min value of 55 and max value of 73498.
Q1(25%) is 1533, Q3(75%) is 7190.25, with Q2(50%) 3627

Grocery item has a mean of 7951.28, standard deviation of 9503.16, with min value of 3 and max value of
92780. Q1(25%) is 2153, Q3(75%) is 10655.8, with Q2(50%) 4755.5

Frozen has a mean of 3071.93, standard deviation of 4854.67, with min value of 25 and max value of 60869.
Q1(25%) is 742.25, Q3(75%) is 3554.25, with Q2(50%) 1526

Detergents_Paper has a mean of 2881.49, standard deviation of 4767.85, with min value of 3 and max value of
40827. Q1(25%) is 256.75, Q3(75%) is 3922, with Q2(50%) 816.5

Delicatessen has a mean of 1524.87, standard deviation of 2820.11, with min value of 3 and max value of
47943. Q1(25%) is 408.25, Q3(75%) is 1820.25, with Q2(50%) 965.5

Highest Spend

Highest spend by Region is from Other and lowest spend by Region is from Oporto. Highest spend by
Channel is from Hotel and lowest spend by Channel is from Retail.
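
The comparison above amounts to summing spend within each group and taking the largest and smallest totals. A minimal sketch of that step follows; the rows are hypothetical values for illustration only, not the wholesale data, and `total_by` is an assumed helper name.

```python
from collections import defaultdict

# Hypothetical rows for illustration only: (Channel, Region, total spend over the six items).
rows = [
    ("Hotel", "Other", 30000),
    ("Hotel", "Lisbon", 15000),
    ("Retail", "Other", 25000),
    ("Retail", "Oporto", 8000),
    ("Hotel", "Oporto", 5000),
]


def total_by(rows, index):
    """Sum total spend grouped by the field at the given tuple position."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[2]
    return dict(totals)


by_channel = total_by(rows, 0)  # totals per Channel
by_region = total_by(rows, 1)   # totals per Region

top_channel = max(by_channel, key=by_channel.get)
low_region = min(by_region, key=by_region.get)
print(by_channel, by_region, top_channel, low_region)
```

With the real data loaded in pandas, the same totals would normally come from a group-by on 'Channel' or 'Region' followed by a sum.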


1.2 There are 6 different varieties of items that are considered. Describe and
comment/explain all the varieties across Region and Channel? Provide a detailed
justification for your answer.

Following are the 6 different varieties of items that are considered:

1. FRESH : annual spending on fresh products
2. MILK : annual spending on milk products
3. GROCERY : annual spending on grocery products
4. FROZEN : annual spending on frozen products
5. DETERGENTS_PAPER : annual spending on detergents and paper products
6. DELICATESSEN : annual spending on delicatessen products

Fig 2

Observations from Fig 2:

1. Frozen
Frozen has its highest spending in the Lisbon region in the Retail channel. It has its highest spending in the
Oporto region in the Hotel channel.

2. Detergents_Paper
Detergents_Paper has similar spending in the Lisbon and Oporto regions of the Retail channel. It has very
little spending in the Hotel channel.

3. Delicatessen
The Other and Lisbon regions have similar spending in the Retail channel. Delicatessen has its highest
spending in the Other region of the Hotel channel.

4. Milk
The Other and Lisbon regions have similar spending in the Hotel channel. Milk has much lower spending in the
Hotel channel.

5. Frozen
Frozen has its highest spending in the Oporto region in the Hotel channel.

6. Fresh
Fresh has its highest spending in the Other region of the Hotel channel.


Fresh, Milk, Grocery, Frozen, Detergents_Paper and Delicatessen are all highly skewed.
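
One rough check of the direction of skew uses only the summary statistics already reported: when the mean is well above the median, the distribution is right-skewed. The sketch below computes Pearson's second skewness coefficient, 3 * (mean - median) / SD, from those figures. It is an approximation, not the moment-based skewness, and Fresh is left out because its mean is not reported in the summary.

```python
# (mean, median, standard deviation) as reported in the summary statistics.
# Fresh is omitted because its mean is not reported there.
stats = {
    "Milk": (5796.27, 3627, 7380.38),
    "Grocery": (7951.28, 4755.5, 9503.16),
    "Frozen": (3071.93, 1526, 4854.67),
    "Detergents_Paper": (2881.49, 816.5, 4767.85),
    "Delicassen": (1524.87, 965.5, 2820.11),
}


def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


skew = {item: pearson_median_skewness(*values) for item, values in stats.items()}
for item, value in skew.items():
    print(f"{item}: {value:.2f}")
```

A positive coefficient for every item is consistent with right skew, which is what a long tail of very large retailers would produce.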

1.3 On the basis of a descriptive measure of variability, which item shows the most
inconsistent behaviour? Which items show the least inconsistent behaviour?
With reference to Fig 1:

Fresh has the highest standard deviation (12647.3), so it shows the most inconsistent behaviour.

Delicatessen has the lowest standard deviation (2820.11), so it shows the least inconsistent behaviour.
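
Standard deviation depends on the scale of each item, so a relative measure such as the coefficient of variation (SD divided by mean) can rank the items differently. The sketch below is a cross-check under that alternative measure, not the method used above. It uses the means and standard deviations reported in the summary, and Fresh is left out because its mean is not reported there.

```python
# (mean, standard deviation) as reported in the summary statistics.
# Fresh is omitted because its mean is not reported there.
summary = {
    "Milk": (5796.27, 7380.38),
    "Grocery": (7951.28, 9503.16),
    "Frozen": (3071.93, 4854.67),
    "Detergents_Paper": (2881.49, 4767.85),
    "Delicassen": (1524.87, 2820.11),
}

# Coefficient of variation: standard deviation relative to the mean.
cv = {item: sd / mean for item, (mean, sd) in summary.items()}

most_variable = max(cv, key=cv.get)
least_variable = min(cv, key=cv.get)
for item, value in sorted(cv.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{item}: {value:.2f}")
```

On these five items the largest ratio belongs to Delicatessen and the smallest to Grocery, so the answer depends on whether variability is judged in absolute or relative terms.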

1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique
with the help of detailed comments.


Fig 3

Based on Fig 3, yes: all 6 variables, i.e. 'Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper' and
'Delicassen', have outliers.
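
The same conclusion can be checked numerically with the usual 1.5 x IQR rule, which is the convention most box plots use (that Fig 3 is a box plot is an assumption here). The sketch below applies the rule to the quartiles and maximum values reported in the summary statistics.

```python
# (Q1, Q3, max) as reported in the summary statistics.
quartiles = {
    "Fresh": (3127.75, 16933.8, 112151),
    "Milk": (1533, 7190.25, 73498),
    "Grocery": (2153, 10655.8, 92780),
    "Frozen": (742.25, 3554.25, 60869),
    "Detergents_Paper": (256.75, 3922, 40827),
    "Delicassen": (408.25, 1820.25, 47943),
}


def upper_fence(q1, q3, k=1.5):
    """Upper outlier fence: Q3 + k * IQR, with k = 1.5 by convention."""
    return q3 + k * (q3 - q1)


# An item has at least one high outlier if its maximum lies above the fence.
has_high_outlier = {
    item: maximum > upper_fence(q1, q3)
    for item, (q1, q3, maximum) in quartiles.items()
}
print(has_high_outlier)
```

Every maximum lies far above its fence, so each of the six items has at least one high-side outlier under this rule.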

1.5 On the basis of your analysis, what are your recommendations for the business? How
can your analysis help the business to solve its problem? Answer from the business
perspective
The wholesale business is not very consistent across the different channels. We recommend buying in bulk
across the different items.

The Retail and Hotel channels operate very differently. We recommend identifying the product/item
requirements of each channel and concentrating on them accordingly.

Problem Statement 2:


The Student News Service at Clear Mountain State University (CMSU) has
decided to gather data about the undergraduate students that attend CMSU.
CMSU creates and distributes a survey of 14 questions and receives responses
from 62 undergraduates (stored in the Survey data set).

2.1. For this data, construct the following contingency tables (Keep Gender as row
variable)
2.1.1. Gender and Major

2.1.2. Gender and Grad Intention

2.1.3. Gender and Employment


2.1.4. Gender and Computer
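
Each of the four tables is a count of students for every (Gender, other variable) pair, with Gender as the row variable. A minimal sketch of that counting step follows; the records are hypothetical and for illustration only (the real Survey data set has 62 rows), and `contingency` is an assumed helper name.

```python
from collections import Counter

# Hypothetical survey records for illustration only; not the real Survey data.
records = [
    {"Gender": "Female", "Major": "Accounting"},
    {"Gender": "Female", "Major": "Management"},
    {"Gender": "Male", "Major": "Management"},
    {"Gender": "Male", "Major": "Management"},
    {"Gender": "Male", "Major": "CIS"},
]


def contingency(records, row_key, col_key):
    """Count records for each (row value, column value) pair, keyed by row value."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


table = contingency(records, "Gender", "Major")
for gender, row in table.items():
    print(gender, row)
```

With the data in a pandas DataFrame `df`, `pandas.crosstab(df["Gender"], df["Major"])` produces the same kind of table.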

2.2. Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:
2.2.1. What is the probability that a randomly selected CMSU student will be male?
Total number of students = 62

Number of male students = 29

Probability that a randomly selected CMSU student will be male = 29/62

P(Male) = 0.468 (46.8%)

2.2.2. What is the probability that a randomly selected CMSU student will be female?

Total number of students = 62

Number of female students = 33

Probability that a randomly selected CMSU student will be a female = 33/62

P(Female) = 0.532 (53.2%)

2.3. Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:


2.3.1. Find the conditional probability of different majors among the male students in
CMSU.
P(Accounting | Male) = 0.1379 (13.79%)

P(CIS | Male) = 0.0345 (3.45%)

P(Economics/Finance | Male) = 0.1379 (13.79%)

P(International Business | Male) = 0.0690 (6.90%)

P(Management | Male) = 0.2069 (20.69%)

P(Other | Male) = 0.1379 (13.79%)

P(Retailing/Marketing | Male) = 0.1724 (17.24%)

P(Undecided | Male) = 0.1034 (10.34%)

2.3.2 Find the conditional probability of different majors among the female students of
CMSU.
P(Accounting | Female) = 0.0909 (9.09%)

P(CIS | Female) = 0.0909 (9.09%)

P(Economics/Finance | Female) = 0.2121 (21.21%)

P(International Business | Female) = 0.1212 (12.12%)

P(Management | Female) = 0.1212 (12.12%)

P(Other | Female) = 0.0909 (9.09%)

P(Retailing/Marketing | Female) = 0.2727 (27.27%)

P(Undecided | Female) = 0
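
Each conditional probability is the count for a major divided by the number of students of that gender. The sketch below reproduces the figures from counts; the counts are not quoted in the report and are the whole numbers implied by the listed probabilities together with the totals of 29 male and 33 female students.

```python
# Counts implied by the listed probabilities (29 male, 33 female students).
major_counts = {
    "Male": {
        "Accounting": 4, "CIS": 1, "Economics/Finance": 4,
        "International Business": 2, "Management": 6, "Other": 4,
        "Retailing/Marketing": 5, "Undecided": 3,
    },
    "Female": {
        "Accounting": 3, "CIS": 3, "Economics/Finance": 7,
        "International Business": 4, "Management": 4, "Other": 3,
        "Retailing/Marketing": 9, "Undecided": 0,
    },
}


def conditional(counts):
    """P(Major | Gender): each count divided by the total for that gender."""
    total = sum(counts.values())
    return {major: n / total for major, n in counts.items()}


p_major_given = {gender: conditional(c) for gender, c in major_counts.items()}
for gender, probs in p_major_given.items():
    for major, p in probs.items():
        print(f"P({major} | {gender}) = {p:.4f}")
```

Within each gender the probabilities sum to 1, which is a quick sanity check on the table.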

2.4. Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:
2.4.1. Find the probability that a randomly chosen student is a male and intends to
graduate.


Number of male students who intend to graduate (column Yes) = 17.

Total number of students = 62.

Probability = 17/62 = 0.274 (27.4%)

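2.4.2 Find the probability that a randomly selected student is a female and does NOT have
a laptop.

Number of female students having a laptop = 29

Total number of female students = 33

Number of female students not having a laptop = 33 - 29 = 4

Total number of students = 62

Probability of being female and not having a laptop = 4/62 = 0.0645 (6.45%)

Note that 4/33 = 0.1212 (12.12%) is the conditional probability of not having a laptop given that the student is female, which is a different quantity from the joint probability asked for here.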
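
The wording "is a female and does NOT have a laptop" describes a joint event, so the denominator is all 62 students, in the same way as question 2.4.1. Dividing by the 33 female students instead gives the conditional probability of not having a laptop given that the student is female. A short sketch of both quantities, using the counts stated above:

```python
from fractions import Fraction

total_students = 62
females = 33
females_with_laptop = 29
females_without_laptop = females - females_with_laptop  # 4 students

# Joint probability: female AND no laptop, out of all students.
p_joint = Fraction(females_without_laptop, total_students)

# Conditional probability: no laptop GIVEN female, out of female students only.
p_conditional = Fraction(females_without_laptop, females)

print(f"P(Female and no laptop) = {float(p_joint):.4f}")
print(f"P(no laptop | Female)   = {float(p_conditional):.4f}")
```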

2.5. Assume that the sample is representative of the population of CMSU. Based on the
data, answer the following question:
2.5.1. Find the probability that a randomly chosen student is either a male or has full-time
employment?
P(male or full-time employment) = P(male) + P(full-time employment) - P(male and full-time employment)

= 29/62 + 10/62 - 7/62 = 32/62 = 0.516 (51.6%)


2.5.2. Find the conditional probability that given a female student is randomly chosen, she
is majoring in international business or management.
P(int. business or mgmt. | female) = P(int. business | female) + P(mgmt. | female), since the two majors are mutually exclusive events

= 4/33 + 4/33

= 0.1212 + 0.1212 = 0.2424 (24.24%).

2.6.  Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No).
The Undecided students are not considered now and the table is a 2x2 table. Do you think
the graduate intention and being female are independent events?
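
Two events are independent when P(A and B) = P(A) x P(B). A minimal sketch of that comparison for a 2x2 table follows. The counts are hypothetical and for illustration only, not the survey results; `joint_vs_product` and the tolerance of 0.01 are assumptions made for this example.

```python
import math


def joint_vs_product(table, row, col):
    """Return (P(row and col), P(row) * P(col)) for a nested dict of counts."""
    n = sum(sum(cols.values()) for cols in table.values())
    p_row = sum(table[row].values()) / n
    p_col = sum(cols[col] for cols in table.values()) / n
    p_joint = table[row][col] / n
    return p_joint, p_row * p_col


# Hypothetical 2x2 counts for illustration only, not the survey results.
example = {
    "Female": {"Yes": 5, "No": 15},
    "Male": {"Yes": 15, "No": 5},
}

p_joint, p_product = joint_vs_product(example, "Female", "Yes")
# The tolerance is an arbitrary choice for this sketch.
independent = math.isclose(p_joint, p_product, abs_tol=0.01)
print(p_joint, p_product, independent)
```

With sample data the two values rarely match exactly, so a chi-square test of independence is the more formal way to decide.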

2.7. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages.
Answer the following questions based on the data
2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than
3?
Number of students with GPA less than 3 = 17

Probability = 17/62 = 0.274 (27.4%)

2.7.2. Find the conditional probability that a randomly selected male earns 50 or more.
Find the conditional probability that a randomly selected female earns 50 or more.


2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages. For each of them comment whether they follow a normal
distribution. Write a note summarizing your conclusions for this whole Problem 2.

Each of the variables GPA, Salary, Spending and Text Messages follows a normal distribution.

Problem Statement 3
An important quality characteristic used by the manufacturers of ABC asphalt
shingles is the amount of moisture the shingles contain when they are
packaged. Customers may feel that they have purchased a product lacking in
quality if they find moisture and wet shingles inside the packaging. In some
cases, excessive moisture can cause the granules attached to the shingles for
texture and colouring purposes to fall off the shingles resulting in appearance
problems. To monitor the amount of moisture present, the company
conducts moisture tests. A shingle is weighed and then dried. The shingle is
then reweighed, and based on the amount of moisture taken out of the
product, the pounds of moisture per 100 square feet is calculated. The
company would like to show that the mean moisture content is less than
0.35 pound per 100 square feet.

3.1 Do you think there is evidence that the mean moisture contents in both types of shingles
are within the permissible limits? State your conclusions clearly showing all steps.

H0: mean moisture content ≥ 0.35 pounds per 100 square feet

H1: mean moisture content < 0.35 pounds per 100 square feet

Alpha = 0.05

For Sample A -

t = -1.4, p-value = 0.07

Since p-value > 0.05, we cannot reject the null hypothesis.

There is not enough evidence to conclude that the mean moisture content for Sample A shingles is less than
0.35 pounds per 100 square feet.

For Sample B -

t = -3.1, p-value = 0.0021

Since p-value < 0.05, we reject the null hypothesis.

There is enough evidence to conclude that the mean moisture content for Sample B shingles is less than
0.35 pounds per 100 square feet.
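
The test in 3.1 is a one-sample, lower-tailed t-test against 0.35. The sketch below shows how the t statistic is formed and how the decision follows from a p-value. The readings are hypothetical and for illustration only, not the shingle data; `one_sample_t` and `decide` are assumed helper names.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for a test against mu0, using the sample standard deviation (n - 1)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical moisture readings for illustration only, not the shingle data.
readings = [0.30, 0.34, 0.32, 0.36, 0.28]
t_stat = one_sample_t(readings, mu0=0.35)
print(f"t = {t_stat:.4f}")

# Decisions for the two p-values reported in 3.1.
decision_a = decide(0.07)
decision_b = decide(0.0021)
print(decision_a, decision_b)
```

Turning the t statistic into a p-value needs the t distribution with n - 1 degrees of freedom. If SciPy is available, `scipy.stats.ttest_1samp(sample, 0.35, alternative="less")` returns both values in one call.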

3.2 Do you think that the population mean for shingles A and B are equal? Form the
hypothesis and conduct the test of the hypothesis. What assumption do you need to
check before the test for equality of means is performed?

H0: the population means for shingles A and B are equal

H1: the population means for shingles A and B are not equal

Alpha = 0.05

t = 1.29 and p-value = 0.202

As the p-value > α, we cannot reject the null hypothesis. There is not enough evidence to conclude that the population means for shingles A and B differ.

Assumptions to check before running a two-sample t-test: the distributions of the two populations are normal, and the variances of the two distributions are the same.
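
Under the equal-variance assumption, the two-sample t statistic uses a pooled variance. The sketch below shows that calculation together with a simple variance ratio as a first look at the equal-variance assumption. The samples are hypothetical and for illustration only, not the shingle data; `pooled_t` is an assumed helper name.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical samples for illustration only, not the shingle data.
sample_a = [0.30, 0.34, 0.32, 0.36, 0.28]
sample_b = [0.27, 0.31, 0.29, 0.33, 0.25]

t_stat = pooled_t(sample_a, sample_b)
# A ratio close to 1 is consistent with equal variances.
variance_ratio = statistics.variance(sample_a) / statistics.variance(sample_b)
print(f"t = {t_stat:.4f}, variance ratio = {variance_ratio:.4f}")
```

A variance ratio far from 1 would suggest using a test that does not assume equal variances, such as Welch's t-test.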
