
Lab 1: Efficient Programming

Introduction to Statistical Computing

Luis Torres Serrano

13/01/2025

Contents

Installing and loading packages
Q1. Microbenchmarking
Q2. Efficient set-up
Q3. Efficient programming
Q4. Efficient data I/O
Q5. Efficient data carpentry
Q6. Efficient optimization
Q7. Efficient hardware

This lab is to be done outside of class time. You may collaborate with one classmate, but you must identify both yourself and your classmate in the author field above, and you must submit your own copy of this completed .Rmd file.

Installing and loading packages


In order to perform the exercises in this practice you should install and load the microbenchmark and profvis
packages. Also install the devtools and proftools packages from CRAN.

# YOUR CODE GOES HERE
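
A minimal sketch of one possibility (the install.packages() call only needs to be run once, so it is commented out here):

# install.packages(c("microbenchmark", "profvis", "devtools", "proftools"), dep = TRUE)
library("microbenchmark")
library("profvis")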

From the Bioconductor repository you must also install the graph and Rgraphviz packages. To
install packages from this repository, you must first install the BiocManager package and then use the
BiocManager::install() function to install them.

install.packages("BiocManager", dep = TRUE)


BiocManager::install(c("Rgraphviz","graph"))

Q1. Microbenchmarking
1a. Use the microbenchmark::microbenchmark() function to find out which of the following three methods
is the fastest at computing the cumulative sum of a 100-element vector. By how much is the fastest method
faster than the second fastest?

x <- 1:100 # initiate vector to cumulatively sum

# Method 1: with a for loop (10 lines)


cs_for <- function(x) {
  for (i in x) {
    if (i == 1) {
      xc = x[i]
    } else {
      xc = c(xc, sum(x[1:i]))
    }
  }
  xc
}

# Method 2: with apply (3 lines)


cs_apply <- function(x) {
  sapply(x, function(x) sum(1:x))
}

# Method 3: cumsum (1 line)


cumsum(x)

# YOUR CODE GOES HERE
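
One possible shape of the benchmark call (a sketch only; the exact timings, and hence the answer, depend on your machine):

library("microbenchmark")
mb <- microbenchmark(
  for_loop = cs_for(x),
  apply    = cs_apply(x),
  cumsum   = cumsum(x)
)
mb # compare the median times to see which method wins and by how much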

1b. Run the same benchmark but now with x set to 1:50000. As the benchmark could take too long, set the argument
times = 1 in the microbenchmark() function. Does the relative difference between the fastest and the second
fastest method increase or decrease? By how much?

# YOUR CODE GOES HERE
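
For example, a sketch with a single evaluation per expression:

x <- 1:50000
microbenchmark(cs_for(x), cs_apply(x), cumsum(x), times = 1)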

1c. Try profiling a section of code you have written using the profvis::profvis() function. Where are
the bottlenecks? Were they where you expected?

# YOUR CODE GOES HERE
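
A minimal sketch of how profvis() wraps a block of code (replace the body with a section of your own code):

library("profvis")
profvis({
  x <- runif(5e6)
  x_sorted <- sort(x)
  x_med <- median(x)
})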

1d. Let’s profile a section of code with the Rprof() function. The code section is a function that computes
the sample variance of a numeric vector:

# Compute sample variance of numeric vector x


sampvar <- function(x) {
  # Compute sum of vector x
  my.sum <- function(x) {
    sum <- 0
    for (i in x) {
      sum <- sum + i
    }
    sum
  }

  # Compute sum of squared deviations of the elements of x from the mean mu
  sq.var <- function(x, mu) {
    sum <- 0
    for (i in x) {
      sum <- sum + (i - mu) ^ 2
    }
    sum
  }

  mu <- my.sum(x) / length(x)
  sq <- sq.var(x, mu)
  sq / (length(x) - 1)
}

To use the Rprof() function, you must first specify the file in which to store the profiling results. You then
execute the code you want to profile, and finally call Rprof(NULL) to stop profiling. To profile the
sampvar() function applied to a vector of 100 million random numbers:

x <- runif(1e8)
Rprof("Rprof.out", memory.profiling = TRUE)
y <- sampvar(x)
Rprof(NULL)

Use the summaryRprof() function to print a summary of the code profiling. Which part of the function
takes the most time to execute? Which part requires the most memory?

# YOUR CODE GOES HERE
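
A possible sketch (memory = "both" adds memory columns alongside the timings):

prof_summary <- summaryRprof("Rprof.out", memory = "both")
head(prof_summary$by.self)  # time and memory spent in each function itself
head(prof_summary$by.total) # time including all functions called from it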

1e. The summaryRprof() function prints a summary of the code profiling, but its output is not very user-friendly.
Using the proftools package, let’s visualise the results stored in the Rprof.out file. See the help (?) for the functions
readProfileData() and plotProfileCallGraph() and plot the results of the code profiling from 1d.

# YOUR CODE GOES HERE
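
A minimal sketch (plotProfileCallGraph() relies on the graph and Rgraphviz packages installed above):

library("proftools")
pd <- readProfileData("Rprof.out")
plotProfileCallGraph(pd)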

Q2. Efficient set-up


Let’s check if you have an optimal R installation.
2a. What is the exact version of your computer’s operating system?

# YOUR CODE GOES HERE
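
One possible starting point, using base R only:

Sys.info()[c("sysname", "release", "version")]
sessionInfo() # also reports the platform and R version in use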

2b. Start an activity monitor and execute the following chunk. In it, lapply() (or its parallel version
mclapply()) is used to apply a function, median(), over every column of the data frame object X.

# Note: this uses several GB of RAM and takes several seconds or more, depending on your hardware


# 1: Create large dataset
X <- as.data.frame(matrix(rnorm(1e9), nrow = 1e8))
# 2: Find the median of each column using a single core
r1 <- lapply(X, median)
# 3: Find the median of each column using many cores
r2 <- parallel::mclapply(X, median)

2c. Try modifying the settings of your RStudio setup using the Tools > Global Options menu. Which settings
do you think can affect R performance? (Note down only some of them, not all of them.)

# YOUR CODE GOES HERE

2d. Try some of the shortcuts integrated in RStudio. Which shortcuts do you think can save you development
time? (Note down only some of them, not all of them.)

# YOUR CODE GOES HERE

2e. Check how well your computer is suited to perform data analysis tasks. In the following code chunk
you will run a benchmark test from the benchmarkme package and plot your result against the results from
people around the world. Do you think that you should upgrade your computer?

library("benchmarkme")
# Run standard tests
res_std <- benchmark_std(runs=3)
plot(res_std)
# Run memory I/O tests by reading/writing a 5MB file
res_io <- benchmark_io(runs = 1, size = 5)
plot(res_io)

Q3. Efficient programming


3a. Create a vector x of 100 random numbers and use the microbenchmark package to compare the vectorised
construct x = x + 1 to the for loop version for (i in seq_len(n)) x[i] = x[i] + 1. Try varying the
size of the input vector and check how the results differ. Which functions are being called by each method?

# YOUR CODE GOES HERE
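
A sketch of the comparison for one vector size (wrapping each construct in braces keeps the expressions self-contained):

library("microbenchmark")
n <- 100
x <- rnorm(n)
microbenchmark(
  vectorised = { x <- x + 1 },
  for_loop   = { for (i in seq_len(n)) x[i] <- x[i] + 1 }
)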

3b. Monte Carlo integration can be performed with the following code:

monte_carlo = function(N) {
  hits = 0
  for (i in seq_len(N)) {
    u1 = runif(1)
    u2 = runif(1)
    if (u1 ^ 2 > u2)
      hits = hits + 1
  }
  return(hits / N)
}

Create a vectorized function monte_carlo_vec which does not use a for loop.

# YOUR CODE GOES HERE
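
One possible vectorised sketch: draw all the random numbers at once and count the hits with a logical comparison.

monte_carlo_vec <- function(N) mean(runif(N) ^ 2 > runif(N))
monte_carlo_vec(1e4) # should be close to monte_carlo(1e4)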

3c. How much faster is the vectorized function monte_carlo_vec with respect to the original function
monte_carlo?

# YOUR CODE GOES HERE

3d. Using the memoise() function, create a function called m_fib that is the memoized version of the recursive
function:

fib <- function(n) {
  if (n == 1 || n == 2) return(1)
  fib(n - 1) + fib(n - 2)
}

Then, using microbenchmark, simulate calculating the 10th position of the Fibonacci series 100 times with
each function. How much faster is the memoized version?

# YOUR CODE GOES HERE
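
A possible sketch, assuming the memoise package is installed:

library("memoise")
library("microbenchmark")
m_fib <- memoise(fib)
microbenchmark(fib(10), m_fib(10), times = 100)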

3e. Try varying the parameters of exercise 3d. What happens when you measure the computing time
of calculating the 1st position of the Fibonacci series? And the 25th?

# YOUR CODE GOES HERE

3f. Create the c_fib function as the compiled version of the fib function declared in exercise 3d, using
the cmpfun() function of the compiler package. Which is faster: fib, c_fib or m_fib? And cm_fib (the compiled version
of m_fib)? And mc_fib (the memoized version of c_fib)?

# YOUR CODE GOES HERE
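
A possible sketch of how the variants could be built (note that recent versions of R byte-compile functions automatically via the JIT compiler, so the gap between fib and c_fib may be small):

library("compiler")
c_fib  <- cmpfun(fib)    # compiled version of fib
cm_fib <- cmpfun(m_fib)  # compiled version of the memoized function
mc_fib <- memoise(c_fib) # memoized version of the compiled function
microbenchmark(fib(10), c_fib(10), m_fib(10), cm_fib(10), mc_fib(10))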

Challenge 01. Calculate the computing time for calculating the Fibonacci series 5 times, from the 1st to the
25th position, with the fib, c_fib, m_fib, cm_fib and mc_fib functions. Store the results for each position
and create a plot showing them. At what point does using the memoized version start to pay off?
Hint: use the geom_point() and geom_errorbar() functions of ggplot2 to show the median, lq and uq values
of the microbenchmark analysis.

# YOUR CODE GOES HERE

Q4. Efficient data I/O


4a. Import data from https://github.com/mledoze/countries/raw/master/countries.json using the import()
function from the rio package.

# YOUR CODE GOES HERE
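
A minimal sketch (rio::import() can read directly from a URL and picks the JSON reader from the file extension):

library("rio")
countries <- import("https://github.com/mledoze/countries/raw/master/countries.json")
head(countries)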

4b. Export the data imported in 4a to 3 different file formats of your choosing supported by rio (see
vignette("rio") for supported formats). Try opening these files in external programs. Which file formats
are more portable?

# YOUR CODE GOES HERE

Challenge 03. Create a simple benchmark to compare the write times for the different file formats of 4b.
Which is fastest? Which is the most space efficient?

# YOUR CODE GOES HERE

Q5. Efficient data carpentry


5a. Create the following data.frame:

df_base = data.frame(colA = "A")

Try to guess the output of the following commands. Then remove the eval = FALSE argument and check whether the
output is what you expected.

print(df_base)
df_base$colA
df_base$col
df_base$colB

Now create an equivalent tibble called tbl_df and repeat the above commands.

# YOUR CODE GOES HERE
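
A possible sketch of the tibble counterpart; the interesting part is how it handles the partial name col and the missing column colB compared with the data.frame:

library("tibble")
tbl_df <- tibble(colA = "A")
print(tbl_df)
tbl_df$colA
tbl_df$col
tbl_df$colB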

5b. Load and look at subsets of the pew and lnd_geo_df datasets from the efficient package. What is
untidy about them? Convert each of these datasets into tidy form.

# YOUR CODE GOES HERE

5c. Consider the following string of phone numbers and fruits:

strings = c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
"387 287 6718", "apple", "233.398.9187 ", "482 952 3315",
"239 923 8115", "842 566 4692", "Work: 579-499-7527",
"$1000", "Home: 543.355.3679")

Write expressions in stringr and base R that return a logical vector reporting whether or not each string
contains a number.

# YOUR CODE GOES HERE
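
One simple reading of “contains a number” is “contains at least one digit”, which can be sketched as:

library("stringr")
str_detect(strings, "[0-9]") # stringr
grepl("[0-9]", strings)      # base R equivalent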

Q6. Efficient optimization


6a. Create a vector x and benchmark any(is.na(x)) against anyNA(x). Do the results vary with the size
of the vector?

# YOUR CODE GOES HERE
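
A possible sketch for one vector size (repeat with different lengths to answer the question):

library("microbenchmark")
x <- c(rnorm(1e5), NA)
microbenchmark(any(is.na(x)), anyNA(x))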

6b. Construct a matrix of integers and a matrix of numerics and use pryr::object_size() to compare
how much memory each object occupies.

# YOUR CODE GOES HERE
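
A minimal sketch; integers are stored in 4 bytes and doubles in 8, so the sizes should differ by roughly a factor of two:

library("pryr")
m_int <- matrix(1L, nrow = 1000, ncol = 1000) # integer matrix
m_num <- matrix(1, nrow = 1000, ncol = 1000)  # numeric (double) matrix
object_size(m_int)
object_size(m_num)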

6c. Consider the following piece of code:

double test1() {
  double a = 1.0 / 81;
  double b = 0;
  for (int i = 0; i < 729; ++i)
    b = b + a;
  return b;
}

• Save the function test1() in a separate file. Make sure it works.
• Write a similar function in R and compare the speed of the C++ and R versions (a possible sketch follows this list).
• Create a function called test2() where the double variables have been replaced by float. Do you
still get the correct answer?
• Change b = b + a to b += a to make your code more C++ like.
• (Bonus) What’s the difference between i++ and ++i?
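
A possible sketch for the first two items, assuming the Rcpp package and a working C++ toolchain are available (Rcpp::sourceCpp() on a separate .cpp file is the other common route):

library("Rcpp")
library("microbenchmark")
cppFunction("
  double test1() {
    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++i)
      b = b + a;
    return b;
  }
")

# An equivalent plain R version
test1_r <- function() {
  a <- 1 / 81
  b <- 0
  for (i in 1:729)
    b <- b + a
  b
}

microbenchmark(test1(), test1_r())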

Q7. Efficient hardware


7a. How much RAM does your computer have? (Optional question, privacy above all. Write a random
number if you do not want to share your hardware information.)

# YOUR CODE GOES HERE
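
If you do want to query it from R, the benchmarkme package used in Q2e provides a helper:

benchmarkme::get_ram()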

7b. Using your preferred search engine, how much does it cost to double the amount of available RAM on
your system? (Again, write a random number if you do not want to share your hardware information)

# YOUR CODE GOES HERE

7c. Check if you are using a 32-bit or 64-bit version of R.

# YOUR CODE GOES HERE
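
Two quick base R checks (a pointer size of 8 bytes indicates a 64-bit build):

.Machine$sizeof.pointer
R.version$arch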
