STA 100 Lab Assignment 1

Ling Ai

2024-10-03

R Workout Lab Assignment: Patient Data Analysis


Instructions: For each question below, you will be working with the patients101.csv dataset. Follow the
steps as outlined and submit your work on Gradescope. The assignment will be graded as complete, partially
complete, or missing. Your TA will go over the material in class, and you are expected to follow along and
complete each step.

Import Data and Inspect the Data Set


CSV (Comma-Separated Values) is a common format for datasets. To load a CSV file into R, you can use
the read.csv() function.
data = read.csv("path/to/your/file.csv")
• path/to/your/file.csv: Replace this with the actual file path of your CSV file.
• data: This is the variable where the data will be stored as a data frame.
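As a minimal sketch of the loading step, the example below reads the first four rows of the data set from an inline string, so it runs without any file on disk. The names csv_text, sample_rows, and csv_path are illustrative assumptions, and the relative path "patients101.csv" is hypothetical: it only works if the file sits in your working directory.

```r
# First four rows of the lab data, embedded as text so no file is needed.
csv_text <- "age,totalchol,sysBP,weight,height,sedmins,obese,marriage,gender
52,193,128,92.3,152.1,60,obese,other,F
63,194,112,71.1,151.7,300,obese,married,F
48,225,128,58.1,162.9,480,normal,divorced,F
21,145,106,79.8,170.0,120,overweight,married,M"

# read.csv() accepts the CSV contents directly through the `text` argument.
sample_rows <- read.csv(text = csv_text)
dim(sample_rows)  # number of rows, then number of columns

# Reading from a real file: check that the path exists before loading.
csv_path <- "patients101.csv"  # hypothetical path, relative to getwd()
if (file.exists(csv_path)) {
  data <- read.csv(csv_path)
} else {
  message("File not found: ", csv_path, " (check getwd() and the file name)")
}
```

Checking file.exists() first tends to give a clearer message than the error read.csv() raises when the path is wrong.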
data = read.csv("/Users/lindaai/Downloads/patients101.csv")
data

## age totalchol sysBP weight height sedmins obese marriage gender
## 1 52 193 128 92.3 152.1 60 obese other F
## 2 63 194 112 71.1 151.7 300 obese married F
## 3 48 225 128 58.1 162.9 480 normal divorced F
## 4 21 145 106 79.8 170.0 120 overweight married M
## 5 66 224 124 116.2 160.0 480 obese widowed F
## 6 31 270 118 77.5 165.8 480 overweight married M
## 7 64 165 158 88.0 183.7 20 overweight married M
## 8 73 241 124 75.8 170.2 240 overweight married M
## 9 39 240 122 100.8 170.3 2 obese married F
## 10 73 183 196 81.2 160.6 240 obese married F
## 11 50 185 106 75.7 171.5 300 overweight married M
## 12 71 292 146 83.8 149.7 480 obese divorced F
## 13 35 330 104 63.8 160.0 360 overweight married F
## 14 44 212 116 93.0 177.0 480 overweight married M
## 15 40 202 112 75.3 156.1 120 obese divorced F
## 16 23 199 114 115.7 182.3 720 obese nevermarried M
## 17 53 251 114 59.7 164.9 300 normal married F
## 18 37 176 100 64.3 161.3 300 normal married F
## 19 60 250 96 54.5 156.4 420 normal married F
## 20 80 140 124 70.4 148.5 480 obese widowed F
## 21 67 154 93.6 172.1 180 obese married M
## 22 58 179 142 106.0 167.2 600 obese married F
## 23 57 142 156 101.3 159.9 600 obese married F
## 24 72 148 132 94.3 177.4 360 obese widowed M
## 25 39 217 132 90.7 150.3 480 obese married F
## 26 35 211 136 78.8 166.6 480 overweight married F
## 27 62 216 110 96.5 182.5 240 overweight married M
## 28 52 184 130 75.0 179.8 360 normal nevermarried M
## 29 75 184 138 76.7 166.6 120 overweight married M
## 30 61 193 130 55.2 160.6 600 normal married F
## 31 73 269 180 77.7 169.7 480 overweight divorced M
## 32 20 187 124 99.5 165.0 600 obese nevermarried F
## 33 60 264 118 78.7 159.0 300 obese divorced F
## 34 56 150 184 77.0 172.5 120 overweight other M
## 35 54 171 134 67.6 180.7 120 normal nevermarried F
## 36 39 173 108 73.9 160.1 90 overweight other F
## 37 31 201 106 80.4 175.8 600 overweight nevermarried F
## 38 28 144 110 74.3 188.5 180 normal nevermarried M
## 39 31 170 100 60.0 158.2 120 normal nevermarried F
## 40 39 239 132 115.8 186.2 300 obese married M
## 41 41 182 106 65.7 160.6 720 overweight divorced F
## 42 51 184 132 77.7 183.0 240 normal other M
## 43 29 155 112 63.4 160.4 240 normal other F
## 44 31 227 126 106.9 163.0 300 obese divorced F
## 45 20 175 114 75.5 170.3 180 overweight nevermarried F
## 46 39 267 122 91.5 173.5 120 obese other M
## 47 48 241 114 60.6 159.3 240 normal married F
## 48 76 133 118 94.2 169.1 480 obese married M
## 49 80 227 120 69.0 175.5 120 normal married M
## 50 28 154 110 79.9 168.9 360 overweight nevermarried M
## 51 32 212 138 109.5 183.3 780 obese other M
## 52 27 153 112 98.4 167.1 60 obese married F
## 53 51 144 132 103.9 178.5 180 obese married M
## 54 27 196 92 70.3 166.6 360 overweight nevermarried F
## 55 44 192 126 62.3 165.8 180 normal divorced M
## 56 20 227 114 106.6 181.5 600 obese nevermarried M
## 57 30 228 104 64.7 159.5 120 overweight married F
## 58 26 164 110 55.7 168.3 360 normal other M
## 59 80 158 150 82.8 168.0 240 overweight married M
## 60 48 243 146 142.1 182.2 240 obese divorced M
## 61 37 237 120 78.0 153.1 180 obese married M
## 62 33 173 104 66.5 165.2 240 normal married F
## 63 80 165 148 49.0 158.8 120 normal married F
## 64 28 236 124 100.8 169.7 90 obese married M
## 65 44 232 102 58.2 173.9 300 normal nevermarried M
## 66 56 238 170 108.5 161.1 600 obese married F
## 67 42 264 110 82.8 171.7 120 overweight married M
## 68 48 298 116 81.5 172.8 180 overweight married M
## 69 38 200 104 71.4 158.0 360 overweight married F
## 70 75 152 124 71.3 173.8 480 normal widowed M
## 71 30 148 116 72.7 183.6 360 normal nevermarried M
## 72 45 180 120 129.2 173.4 180 obese other M
## 73 41 182 92 67.8 165.3 540 normal married F
## 74 49 202 112 82.8 164.8 240 obese nevermarried F
## 75 38 186 99.8 177.6 360 obese other F
## 76 69 205 104 100.6 184.6 120 overweight married M
## 77 61 275 154 58.2 145.8 120 overweight married F
## 78 74 217 132 99.2 156.8 360 obese widowed F
## 79 69 163 134 122.3 176.2 360 obese married M
## 80 78 276 128 75.2 168.8 240 overweight divorced F
## 81 71 196 146 89.1 148.4 180 obese married F
## 82 80 194 170 62.5 160.1 180 normal divorced F
## 83 23 198 96 66.9 163.9 180 normal nevermarried M
## 84 62 194 130 55.7 148.6 120 overweight married F
## 85 41 239 118 100.6 164.4 180 obese married M
## 86 76 162 148 70.0 148.2 360 obese widowed F
## 87 75 242 128 58.6 169.9 180 normal married M
## 88 58 204 132 87.1 170.8 300 overweight married M
## 89 45 178 116 90.2 172.8 600 obese divorced F
## 90 39 170 100 62.2 182.8 840 normal married M
## 91 73 148 176 91.3 167.4 300 obese married M
## 92 62 240 174 76.9 169.6 420 overweight nevermarried M
## 93 38 226 144 71.8 170.2 240 normal other M
## 94 26 188 106 110.5 155.1 240 obese married F
## 95 46 298 128 75.4 152.9 90 obese married F
## 96 30 203 106 100.1 161.0 420 obese other F
## 97 59 266 138 78.0 166.1 600 overweight widowed F
## 98 39 152 118 80.1 169.0 240 overweight married F
## 99 20 162 114 68.9 153.4 360 overweight married F
## 100 76 253 140 93.3 177.0 240 overweight widowed M
A data frame in R is a two-dimensional table-like structure used to store data. It’s similar to a spreadsheet
or a database table where each column represents a variable, and each row represents an observation or data
point.
Here’s a breakdown of the data frame structure:
• Columns: Each column contains data of a specific type (e.g., numeric, character, or factor). For example,
in the patients101.csv dataset, age would be a numeric column, and gender would be a character or factor
column.
• Rows: Each row is a single observation. In our case, each row in the dataset represents one patient’s
information.
You can think of it as an organized collection of variables (columns) where each observation (row) holds
values for those variables.
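To make the row and column idea concrete, here is a small sketch that builds a data frame by hand from three patients in the data set. The name small_df is an illustrative assumption, not part of the assignment.

```r
# Each argument to data.frame() becomes one column (variable);
# the i-th element of every column together forms the i-th row (observation).
small_df <- data.frame(
  age    = c(52, 63, 48),
  weight = c(92.3, 71.1, 58.1),
  gender = c("F", "F", "F")
)

nrow(small_df)   # number of observations
ncol(small_df)   # number of variables
names(small_df)  # column names
small_df[2, ]    # the second patient's full record
```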
Example: Here’s what the data frame looks like for the first 6 rows of our dataset:
head(data,6)

## age totalchol sysBP weight height sedmins obese marriage gender
## 1 52 193 128 92.3 152.1 60 obese other F
## 2 63 194 112 71.1 151.7 300 obese married F
## 3 48 225 128 58.1 162.9 480 normal divorced F
## 4 21 145 106 79.8 170.0 120 overweight married M
## 5 66 224 124 116.2 160.0 480 obese widowed F
## 6 31 270 118 77.5 165.8 480 overweight married M
In R, you can view the structure of a data frame using the str() function, which reports the dimensions of
the data frame and the type of each variable:
str(data)

## 'data.frame': 100 obs. of 9 variables:
## $ age : int 52 63 48 21 66 31 64 73 39 73 ...
## $ totalchol: int 193 194 225 145 224 270 165 241 240 183 ...
## $ sysBP : int 128 112 128 106 124 118 158 124 122 196 ...
## $ weight : num 92.3 71.1 58.1 79.8 116.2 ...
## $ height : num 152 152 163 170 160 ...
## $ sedmins : int 60 300 480 120 480 480 20 240 2 240 ...
## $ obese : chr "obese" "obese" "normal" "overweight" ...
## $ marriage : chr "other" "married" "divorced" "married" ...
## $ gender : chr "F" "F" "F" "M" ...
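The str() output shows that the categorical columns are stored as character (chr) vectors. The sketch below, on a hand-typed four-row sample, lists the column types with sapply() and converts one column to a factor. The names typed_df and col_types are illustrative assumptions, and converting to a factor is optional for this assignment.

```r
# Four patients from the data set; the L suffix makes age an integer column.
typed_df <- data.frame(
  age    = c(52L, 63L, 48L, 21L),
  weight = c(92.3, 71.1, 58.1, 79.8),
  gender = c("F", "F", "F", "M"),
  stringsAsFactors = FALSE  # keep text as character, whatever the R version
)

# One class name per column, similar to the types listed by str().
col_types <- sapply(typed_df, class)
col_types

# A factor stores the distinct categories as levels.
typed_df$gender <- factor(typed_df$gender)
levels(typed_df$gender)
```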
To work with specific columns in a data frame, you can select them in several ways:
1. Using the $ operator:
• This is one of the easiest ways to access a single column. You type the name of the data frame,
followed by $, and then the column name.
data$age

## [1] 52 63 48 21 66 31 64 73 39 73 50 71 35 44 40 23 53 37 60 80 67 58 57 72 39
## [26] 35 62 52 75 61 73 20 60 56 54 39 31 28 31 39 41 51 29 31 20 39 48 76 80 28
## [51] 32 27 51 27 44 20 30 26 80 48 37 33 80 28 44 56 42 48 38 75 30 45 41 49 38
## [76] 69 61 74 69 78 71 80 23 62 41 76 75 58 45 39 73 62 38 26 46 30 59 39 20 76
2. Using square brackets []:
• Data frames can be treated like matrices where rows and columns are accessed using square
brackets. You can select columns by specifying their index (position) or name.
data[,"age"]

## [1] 52 63 48 21 66 31 64 73 39 73 50 71 35 44 40 23 53 37 60 80 67 58 57 72 39
## [26] 35 62 52 75 61 73 20 60 56 54 39 31 28 31 39 41 51 29 31 20 39 48 76 80 28
## [51] 32 27 51 27 44 20 30 26 80 48 37 33 80 28 44 56 42 48 38 75 30 45 41 49 38
## [76] 69 61 74 69 78 71 80 23 62 41 76 75 58 45 39 73 62 38 26 46 30 59 39 20 76
data[,1]

## [1] 52 63 48 21 66 31 64 73 39 73 50 71 35 44 40 23 53 37 60 80 67 58 57 72 39
## [26] 35 62 52 75 61 73 20 60 56 54 39 31 28 31 39 41 51 29 31 20 39 48 76 80 28
## [51] 32 27 51 27 44 20 30 26 80 48 37 33 80 28 44 56 42 48 38 75 30 45 41 49 38
## [76] 69 61 74 69 78 71 80 23 62 41 76 75 58 45 39 73 62 38 26 46 30 59 39 20 76
data[,c(1,3)]

## age sysBP
## 1 52 128
## 2 63 112
## 3 48 128
## 4 21 106
## 5 66 124
## 6 31 118
## 7 64 158
## 8 73 124
## 9 39 122
## 10 73 196
## 11 50 106
## 12 71 146
## 13 35 104
## 14 44 116
## 15 40 112
## 16 23 114
## 17 53 114
## 18 37 100
## 19 60 96
## 20 80 124
## 21 67 114
## 22 58 142
## 23 57 156
## 24 72 132
## 25 39 132
## 26 35 136
## 27 62 110
## 28 52 130
## 29 75 138
## 30 61 130
## 31 73 180
## 32 20 124
## 33 60 118
## 34 56 184
## 35 54 134
## 36 39 108
## 37 31 106
## 38 28 110
## 39 31 100
## 40 39 132
## 41 41 106
## 42 51 132
## 43 29 112
## 44 31 126
## 45 20 114
## 46 39 122
## 47 48 114
## 48 76 118
## 49 80 120
## 50 28 110
## 51 32 138
## 52 27 112
## 53 51 132
## 54 27 92
## 55 44 126
## 56 20 114
## 57 30 104
## 58 26 110
## 59 80 150
## 60 48 146
## 61 37 120
## 62 33 104
## 63 80 148
## 64 28 124
## 65 44 102
## 66 56 170
## 67 42 110
## 68 48 116
## 69 38 104
## 70 75 124
## 71 30 116
## 72 45 120
## 73 41 92
## 74 49 112
## 75 38 108
## 76 69 104
## 77 61 154
## 78 74 132
## 79 69 134
## 80 78 128
## 81 71 146
## 82 80 170
## 83 23 96
## 84 62 130
## 85 41 118
## 86 76 148
## 87 75 128
## 88 58 132
## 89 45 116
## 90 39 100
## 91 73 176
## 92 62 174
## 93 38 144
## 94 26 106
## 95 46 128
## 96 30 106
## 97 59 138
## 98 39 118
## 99 20 114
## 100 76 140
• The first position in the brackets [,] refers to rows, and the second position refers to columns. If you
leave the row position blank (as shown), you select all rows for that column.
By using these methods, you can focus on analyzing specific variables in your dataset without dealing with
the entire data frame.
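The selection methods above can be compared side by side on a small sample. This is a sketch on four hand-typed rows from the data set; cols_df and the result names are illustrative assumptions. The drop = FALSE argument is a standard option of the bracket operator that the text above does not cover.

```r
cols_df <- data.frame(
  age       = c(52, 63, 48, 21),
  totalchol = c(193, 194, 225, 145),
  sysBP     = c(128, 112, 128, 106)
)

by_dollar <- cols_df$age        # $ operator: returns a vector
by_name   <- cols_df[, "age"]   # brackets with a column name: a vector
by_index  <- cols_df[, 1]       # brackets with a column position: a vector
two_cols  <- cols_df[, c(1, 3)] # several columns: a data frame

# A single column normally collapses to a vector;
# drop = FALSE keeps it as a one-column data frame.
one_col_df <- cols_df[, "age", drop = FALSE]

identical(by_dollar, by_name)   # all three single-column forms agree
identical(by_name, by_index)
```

Selecting by name is usually safer than selecting by position, since positions change if columns are added or reordered.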

Question
(a) Find the average systolic blood pressure of all subjects.
avg.sysBP = mean(data$sysBP)

• Answer: The average systolic blood pressure of all subjects is 125.12.


(b) Find the standard deviation of systolic blood pressure of all subjects.
sd(data$sysBP)

## [1] 20.91893
• Answer: 20.91893
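For reference, here is a sketch of what mean() and sd() compute, using the first six sysBP values from the data set. The variable names are illustrative assumptions. Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n.

```r
x <- c(128, 112, 128, 106, 124, 118)  # first six sysBP values
n <- length(x)

# Mean: sum of the values divided by how many there are.
manual_mean <- sum(x) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

all.equal(manual_mean, mean(x))  # matches the built-in mean()
all.equal(manual_sd, sd(x))      # matches the built-in sd()

# A missing value makes the result NA unless na.rm = TRUE is set.
x_with_na <- c(x, NA)
mean(x_with_na)                # NA
mean(x_with_na, na.rm = TRUE)  # mean of the six observed values
```

The patients data shown above has no missing values, so na.rm is not needed for this assignment, but it is a common source of unexpected NA results in other data sets.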
(c) Find the average weight by gender.
mean(data$weight[data$gender=="M"])

## [1] 86.54681
mean(data$weight[data$gender=="F"])

## [1] 78.26415
k = aggregate(weight ~ gender,data,mean)
knitr::kable(k)

gender weight
F 78.26415
M 86.54681

• Answer: Male: 86.54681, Female: 78.26415
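Besides logical subsetting and aggregate(), base R's tapply() is another way to compute a statistic per group. The sketch below uses the first six rows of the data set; group_df and the result names are illustrative assumptions. The same pattern works for any summary function, such as sd.

```r
group_df <- data.frame(
  weight = c(92.3, 71.1, 58.1, 79.8, 116.2, 77.5),
  height = c(152.1, 151.7, 162.9, 170.0, 160.0, 165.8),
  gender = c("F", "F", "F", "M", "F", "M")
)

# tapply(values, groups, function): apply the function within each group.
mean_weight <- tapply(group_df$weight, group_df$gender, mean)
sd_height   <- tapply(group_df$height, group_df$gender, sd)
mean_weight
sd_height

# aggregate() gives the same group means, returned as a data frame.
agg <- aggregate(weight ~ gender, data = group_df, FUN = mean)
all.equal(agg$weight, as.numeric(mean_weight))
```

tapply() returns a named array, which is convenient for quick lookups, while aggregate() returns a data frame that is easier to print as a table.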
(d) Find the standard deviation of height by gender.
sd(data$height[data$gender=="M"])

## [1] 7.204293
sd(data$height[data$gender=="F"])

## [1] 7.721785
aggregate(height~gender,data,sd)

## gender height
## 1 F 7.721785
## 2 M 7.204293
• Answer: Male: 7.204293, Female: 7.721785
(e) Which marriage category has the most subjects?
g = table(data$marriage)
g

##
## divorced married nevermarried other widowed
## 12 52 16 12 8
aggregate(age~marriage,data,length)

## marriage age
## 1 divorced 12
## 2 married 52
## 3 nevermarried 16
## 4 other 12
## 5 widowed 8
• Answer: married (52 of the 100 subjects)
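Rather than reading the largest count off the table by eye, the category can be extracted in code. This is a sketch on the marriage values of the first six rows; the variable names are illustrative assumptions.

```r
marriage <- c("other", "married", "divorced", "married", "widowed", "married")

counts <- table(marriage)  # frequency of each category

# which.max() gives the position of the largest count; names() gives its label.
# If two categories tie, which.max() reports only the first one.
most_common <- names(which.max(counts))
most_common

# Sorting the table shows the full ranking.
sorted_counts <- sort(counts, decreasing = TRUE)
sorted_counts
```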

Submission Instructions:
• Make sure your code runs without errors and produces the correct output.
• Upload your PDF to Gradescope under the corresponding assignment.
• Your assignment will be graded as complete, partially complete, or missing.

Grading Rubric:
• Complete: All questions are answered with correct and functional code.
• Partially complete: Some questions are answered, but there are errors or missing parts in the code.
• Missing: No code is provided or no attempt is made to answer the questions.
