Introduction to RStudio
1. What is R? How do you install R and RStudio? What is the latest version of R? Give details.
2. Describe the RStudio layout, with snapshots.
3. What are the two major parts of R language software? Give details.
4. What is a package and how can it be installed?
5. What do you mean by Object Assignment? Explain the two different types of object assignment
using an example.
Basic Functions, Commands, and Objects
6. What is the c() function? Write a sample command for the same.
7. What is a matrix? What are the different ways to create a matrix? Explain using sample commands.
8. What are data frames and vectors? Write a command to create data frames and vectors.
9. Explain the concept of functions in the R language. Name some built-in functions with their descriptions.
10. Write code to identify whether a number is an integer or a non-integer using the if and if…else statements.
11. What is a nested if…else statement? Display the grade of a student (use your name) using a nested
if…else statement for the following data:
If Marks          Then
>= 90             Excellent
>= 70 and < 90    Very Good
>= 50 and < 70    Good
< 50              Poor
12. What is the paste() function used for? Write a sample command for the same.
13. Create a data frame with 8 rows and 5 columns.
a) Write a command to access rows 2, 4, and 5 and columns 1, 2, and 3.
b) Write a command to drop row 1 and column 4 from the data frame.
14. What is the command to access built-in datasets? What is the command to get a description of a
built-in dataset?
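For question 11, the grade bands can be expressed as a nested if…else. The following is a minimal sketch, not a model answer: the helper name `grade_for`, the student name, and the mark are placeholders chosen for illustration.

```r
# Hypothetical helper: maps a numeric mark to the grade bands from question 11.
grade_for <- function(marks) {
  if (marks >= 90) {
    "Excellent"
  } else {
    if (marks >= 70) {        # reached only when marks < 90
      "Very Good"
    } else {
      if (marks >= 50) {      # reached only when marks < 70
        "Good"
      } else {
        "Poor"
      }
    }
  }
}

student <- "Asha"  # placeholder name; replace with your own
marks <- 76        # placeholder mark

cat(student, "scored", marks, "and is graded", grade_for(marks), "\n")
```

Because each inner test runs only after the outer one has failed, the upper bound of every band is implied and does not need to be written out.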
Importing Data from Excel and Data Analysis
15. How do you import an Excel data sheet in RStudio? Explain using snapshots of the same.
16. Calculate correlation by importing data from Excel. Also, explain whether there is a positive or a
negative correlation between advertisement and sales.
Advertisement (in lakhs) Sales (in crores)
15 5
18 12
25 21
21 18
29 23
38 33
32 33
31 31
35 38
37 42
17. Create an Excel data sheet with 3 columns and 5 rows showing the marks scored by 5 students in
3 subjects. Import the data sheet into RStudio. Write sample commands to calculate the maximum
value, minimum value, mean, median, standard deviation, and variance for each subject.
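For question 16, the correlation step might look like the sketch below. To keep it self-contained, the data are typed in directly rather than imported; in practice the same data frame would come from the Excel import (for example via the readxl package, which is an assumption here and not part of the question). The column names `advertisement` and `sales` are illustrative.

```r
# Data from question 16, entered by hand so the example runs without an Excel file.
ads <- data.frame(
  advertisement = c(15, 18, 25, 21, 29, 38, 32, 31, 35, 37),  # in lakhs
  sales         = c(5, 12, 21, 18, 23, 33, 33, 31, 38, 42)    # in crores
)

# cor() computes the Pearson correlation coefficient by default.
r <- cor(ads$advertisement, ads$sales)

# The sign of r gives the direction of the relationship.
direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"

cat("Correlation coefficient:", round(r, 3), "\n")
cat("Direction:", direction, "\n")
```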
Data Visualisation
18. Create a dataset containing the overall marks of 20 BCOM students across 5 sections (A to E).
a) Represent the dataset using a Bar Chart.
b) Create a histogram plot for the same dataset.
c) Create a density plot with different line types.
For each plot, set the colours, the axis labels, the chart title, and the axis range.
19. Load the Motor Trend Car Road Tests dataset (mtcars). Using this dataset, prepare the following
quick plots:
a) Scatter plot with a smoothed line, with Miles/(US) gallon on the y-axis and Weight (lb/1000) on
the x-axis
b) Scatter plot (Miles/(US) gallon on the y-axis and Weight (lb/1000) on the x-axis) with a
smoothed line by group (Number of cylinders)
c) Scatter plot with colours, with Miles/(US) gallon on the y-axis and Weight (lb/1000) on the x-axis
d) Scatter plot (Miles/(US) gallon on the y-axis and Weight (lb/1000) on the x-axis) with
colours by group (Number of gears)
e) Scatter plot (Miles/(US) gallon on the y-axis and Weight (lb/1000) on the x-axis) with
smoothed lines and colours by group (Number of gears)
f) Scatter plot (Miles/(US) gallon on the y-axis and Weight (lb/1000) on the x-axis) with a
smoothed line and the point shape by group (Number of gears)
Hypothesis Testing
A. T-Test: One-Sample t-Test
20. Problem: Test whether the population mean age is equal to 30 at α = 0.05.
Age
23
29
23
45
32
17
42
14
62
25
16
31
45
51
52
48
27
41
39
61
58
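A one-sample test of this kind can be run with base R's `t.test()`. The sketch below is one possible setup, assuming a two-sided alternative (the problem asks only whether the mean equals 30); the variable names are illustrative.

```r
# Ages from question 20.
age <- c(23, 29, 23, 45, 32, 17, 42, 14, 62, 25, 16,
         31, 45, 51, 52, 48, 27, 41, 39, 61, 58)

# H0: mean age = 30; H1: mean age != 30 (two-sided, an assumption).
res <- t.test(age, mu = 30)
print(res)

# Compare the p-value with the chosen significance level.
alpha <- 0.05
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
cat("Decision at alpha =", alpha, ":", decision, "\n")
```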
B. T-Test: Two-Sample t-Test
21. Problem: Test whether the time spent by BCOM students studying Research Methodology
differs from the time spent by BBA students, at a 95% confidence level (α = 0.05).
BCOM (in hrs) BBA (in hrs)
3.2 1.6
4.1 2.9
5.0 4.1
6.1 2.4
2.8 4.1
2.9 2.1
3.1 1.2
3.6 1.6
4.2 3.2
3.1 4.1
2.7 3.9
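One way to set up this comparison is sketched below. Note that `t.test()` with two samples runs Welch's test (unequal variances) by default; whether the pooled-variance version (`var.equal = TRUE`) is wanted instead is a choice the problem leaves open. Variable names are illustrative.

```r
# Study hours from question 21.
bcom <- c(3.2, 4.1, 5.0, 6.1, 2.8, 2.9, 3.1, 3.6, 4.2, 3.1, 2.7)
bba  <- c(1.6, 2.9, 4.1, 2.4, 4.1, 2.1, 1.2, 1.6, 3.2, 4.1, 3.9)

# H0: the two group means are equal; H1: they differ (two-sided).
# conf.level = 0.95 corresponds to alpha = 0.05.
res <- t.test(bcom, bba, alternative = "two.sided", conf.level = 0.95)
print(res)

decision <- if (res$p.value < 0.05) "reject H0" else "fail to reject H0"
cat("Decision:", decision, "\n")
```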
C. T-Test: Two-Sample t-Test
22. Problem: Is there sufficient evidence to suggest that the mean time to exhaustion is greater after
chocolate milk than after a carbohydrate replacement drink? Use a significance level of 0.05.
Cyclist Chocolate Milk Carbohydrate Replacement Drink
1 50.46 32.90
2 47.08 20.10
3 57.51 41.67
4 46.60 32.69
5 49.10 46.33
6 27.50 31.63
7 23.87 50.61
8 28.65 14.99
9 35.37 20.11
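Because the question asks whether one mean is greater than the other, a one-sided alternative is the natural fit. The sketch below follows the section heading and treats the two columns as independent samples; since each cyclist appears in both columns, passing `paired = TRUE` would be a defensible alternative reading. Variable names are illustrative.

```r
# Time to exhaustion from question 22, one value per cyclist and drink.
chocolate    <- c(50.46, 47.08, 57.51, 46.60, 49.10, 27.50, 23.87, 28.65, 35.37)
carbohydrate <- c(32.90, 20.10, 41.67, 32.69, 46.33, 31.63, 50.61, 14.99, 20.11)

# H0: mean(chocolate) <= mean(carbohydrate); H1: mean(chocolate) > mean(carbohydrate).
res <- t.test(chocolate, carbohydrate, alternative = "greater")
print(res)

decision <- if (res$p.value < 0.05) "reject H0" else "fail to reject H0"
cat("Decision at alpha = 0.05:", decision, "\n")
```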
D. T-Test: Paired t-Test
23. Problem: The number of units produced by 15 employees before and after training is given below.
Determine whether the training was effective. Use a 95% confidence level (α = 0.05).
No. of units produced before training No. of units produced after training
34 39
31 42
43 44
36 45
44 44
42 50
37 36
38 45
41 48
45 50
39 47
36 43
42 42
41 49
38 45
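A paired test fits here because each row holds two measurements on the same employee. The sketch below assumes that "effective" means output increased, so it uses a one-sided alternative; a two-sided test would be the more conservative choice. Variable names are illustrative.

```r
# Units produced from question 23, one pair per employee.
before <- c(34, 31, 43, 36, 44, 42, 37, 38, 41, 45, 39, 36, 42, 41, 38)
after  <- c(39, 42, 44, 45, 44, 50, 36, 45, 48, 50, 47, 43, 42, 49, 45)

# H0: mean(after - before) <= 0; H1: mean(after - before) > 0 (an assumption).
res <- t.test(after, before, paired = TRUE, alternative = "greater")
print(res)

decision <- if (res$p.value < 0.05) "reject H0" else "fail to reject H0"
cat("Decision at alpha = 0.05:", decision, "\n")
```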
E. F-Test
24. Problem: Use an F-test to determine whether there is a significant difference between the variances of
the two data sets.
Group 1 Group 2
150 170
125 165
160 130
130 155
160 125
125 150
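Base R's `var.test()` performs this F-test on the ratio of two variances. The following is a minimal sketch; the problem states no significance level, so α = 0.05 is assumed here, and the variable names are illustrative.

```r
# Data from question 24.
group1 <- c(150, 125, 160, 130, 160, 125)
group2 <- c(170, 165, 130, 155, 125, 150)

# H0: the two variances are equal (ratio = 1); H1: they differ.
res <- var.test(group1, group2)
print(res)

alpha <- 0.05  # assumed; not given in the problem
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
cat("Decision:", decision, "\n")
```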
F. One-Way ANOVA
25. Problem: The sales (in crores) for three products (Shampoo, Handwash, and Detergent) are given.
Determine whether there is a significant difference between their mean sales.
Shampoo Handwash Detergent
23 44 22
32 34 24
23 46 31
34 14 20
45 25 27
56 22 18
43 35 22
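`aov()` expects the data in long format, with one column for the response and one for the group. The sketch below reshapes the table by hand; the column names `product` and `value` are illustrative, and α is not given in the problem.

```r
# Sales from question 25, stacked into long format (7 observations per product).
sales <- data.frame(
  product = factor(rep(c("Shampoo", "Handwash", "Detergent"), each = 7)),
  value   = c(23, 32, 23, 34, 45, 56, 43,   # Shampoo
              44, 34, 46, 14, 25, 22, 35,   # Handwash
              22, 24, 31, 20, 27, 18, 22)   # Detergent
)

# H0: all three product means are equal; H1: at least one differs.
fit <- aov(value ~ product, data = sales)
anova_table <- summary(fit)
print(anova_table)
```

The `Pr(>F)` column of the printed table holds the p-value to compare with the chosen significance level.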
G. Chi-Square Test
26. Problem: Determine whether brand preference is independent of age group.
Age/Brand Brand 1 Brand 2 Brand 3
10-20 75 56 72
21-30 60 40 64
31-40 45 52 50
41-50 55 35 45
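A test of independence on a contingency table like this one can be run with `chisq.test()`. The sketch below enters the counts as a matrix; the object name `brand_by_age` is illustrative, and α = 0.05 is assumed because the problem does not state a level.

```r
# Counts from question 26: rows are age groups, columns are brands.
brand_by_age <- matrix(
  c(75, 56, 72,
    60, 40, 64,
    45, 52, 50,
    55, 35, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Age   = c("10-20", "21-30", "31-40", "41-50"),
    Brand = c("Brand 1", "Brand 2", "Brand 3")
  )
)

# H0: brand preference is independent of age group.
res <- chisq.test(brand_by_age)
print(res)

alpha <- 0.05  # assumed; not given in the problem
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
cat("Decision:", decision, "\n")
```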
Packages in R Programming
A. The “tidyr” Package
27. The table below shows the marks scored by 10 students in 3 different groups. Using this data,
apply the following functions from the “tidyr” package: gather, separate, unite, spread, fill, full_seq,
drop_na, and replace_na.
S.No. Group 1 Group 2 Group 3
1 145 275 287
2 292 134 129
3 124 229 139
4 112 175 255
5 215 111 199
6 249 223 305
7 168 146 123
8 157 278 237
9 101 102 245
10 138 144 112
B. The “dplyr” Package
28. Create a dataset of 5 students (5 data rows) with the column heads given below.
Apply the important functions (filter, arrange, select, rename, mutate, transmute, sample_n, and
sample_frac) to this dataset:
Name Economics Marks Mathematics Marks Internship Done (Yes/No)