Just Enough R: Learn Data Analysis with R in a Day
About this ebook
Because there is so much buzz around the R programming language in data science and analytics, in health care, and in other domains, I wrote a book on R programming for data analysis. The book aims to teach data analysis using R within a single day to anyone who already knows some programming in any other language. The book has sample code (which can be downloaded as a zip file from the Dropbox link provided in the text of the book) that uses the freely available, CMS-sourced Medicare Physician and Other Supplier Data CY 2014 health-care dataset, which was released to the public a few years back and created quite a stir.
Sivakumaran Raman
Sivakumaran Raman is a physician who has spent most of his career in Medical Informatics and Analytics. With the experience of leadership positions at several large US health insurance and information technology firms, he has extensive expertise working with medical claims and clinical data using big-data platforms like Hadoop and Spark. He counts R among his favorite programming languages along with Scala and Perl.
Whom is This Book For?
If your job involves working with data in any manner, you cannot afford to ignore the R revolution! If your domain is called data analysis, analytics, informatics, data science, reporting, business intelligence, data management, big data, or visualization, you just have to learn R as this programming language is a game-changing sledgehammer.
However, if you have looked at a standard text on R or read some of the online discussions, you might feel that there is a steep learning curve of six months or more to grok the language. I will debunk this myth through my book by focusing on practical essentials instead of theory.
If you have programmed in some language in the past (whether that language be SAS, SPSS, C, C++, C#, Java, Python, Perl, Visual Basic, Ruby, Scala, shell scripts, or plain old SQL), even if you are rusty, this book will get you up and running with R in a single day, writing programs for data analysis and visualization.
At the end of this book you will be able to:
- write R programs that cover the three major phases of data analysis
- visualize data in an illustrative and interactive manner
- move on to using R for big data analytics
R you excited? You should be. Let us charge forward!
Preface
R (https://en.wikipedia.org/wiki/R_(programming_language)) is an interpreted, open-source, free, statistical-programming and data-analysis language. It was created by Ross Ihaka and Robert Gentleman. It is a functional language and has all the standard programming features like variables, functions, objects, loops, and data-structures.
R is perfect for data analysis and visualization. Though R can, in theory, be used for tasks like web programming and building software applications, it is not optimized for these purposes and is not preferred for these tasks. R was created in 1993 and has become very popular because of the rapid growth of the domains of big data, data science, visualization, and analytics.
The aim of this book is to teach the elements of R programming in a single day. This book is meant for people who already know how to program in at least one language and want to learn R. After completing this book, the reader should be able to write simple R programs for data analysis. Instead of adopting a spoon-feeding approach, I assume that the reader is familiar with standard programming constructs like variables, functions and the like – therefore, I only outline differences in the way R does things. The emphasis is on writing and running programs in R for data analysis and visualization. The book includes a sample data-analysis conducted on freely available CMS-sourced (CMS: Centers for Medicare and Medicaid Services) healthcare data. The book does not aim to teach all the elements of statistics, machine learning or data science – since doing so would expand the scope of the book immensely.
Unlike many standard texts on R, the book teaches the most effective way to accomplish any specific task in R. No effort is made to teach all the ways in which a particular task can be completed: No TMTOWTDI (https://en.wikipedia.org/wiki/There's_more_than_one_way_to_do_it)!
All through the text, I provide a lot of Internet links to more information and detail. This is one of the great things about open-source software – it is usually supported by a very active web-based community of users and almost all the answers to questions newbies might have can be found online. The R community is one of the largest and best in this regard. Lastly, instead of laying out all the theory behind R programming (for which there are numerous other sources on the Internet), the emphasis is on learning by doing – the code samples provided throughout the book should be read and understood line by line. The reader should make an effort to complete the practice exercises offered at the ends of certain chapters.
Preparation to Start
Computer
Any Windows® or Linux machine can be used. I would recommend at least 8 GB of Random Access Memory be available on the computer.
The R programs used in this book were run on two different computers:
• R version 3.3.2 on a Windows laptop running Windows 10 Pro, Intel(R) Core(TM) i5-2520M CPU @ 2.50 GHz, 8 GB RAM, L3 cache size 3072 KB
• R version 3.3.3 on a Linux laptop running Ubuntu 14.04, Intel(R) Celeron(R) CPU 1007U @ 1.50 GHz, 8 GB RAM, L3 cache size 2048 KB
R is available for Mac and other platforms as well – interested readers can use these.
Installation of Java
Some of the R packages we will be using are wrappers around Java-based libraries and thus require the Java Runtime Environment (JRE) to be installed on the computer. Please install the latest version of the Oracle JRE (http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html) if you are on Windows. On Linux, you can install either the OpenJDK (http://openjdk.java.net/) Java Runtime (using apt-get or a similar software installation tool) or the Oracle JRE for Linux.
After installation, ensure that the java executable is in the PATH. This can be tested by running the java -version command at the Linux (bash) shell or Windows command line (cmd.exe or powershell.exe) and seeing if the appropriate message appears:
Linux bash shell:
radium@aceraspiredelto:~$ java -version
java version "1.7.0_121"
OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
Windows cmd.exe shell:
C:\Users\shiminty\Desktop>java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
If the java executable is not in the PATH, please edit the PATH variable (at the system or user level) and add the directory that contains the java executable (java.exe on Windows) to it.
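The PATH check described above can also be scripted on Linux. The following is a minimal sketch for a POSIX shell; the helper name is_on_path is an illustrative assumption and is not part of the book's sample code.

```shell
#!/bin/sh
# is_on_path NAME: succeed (exit status 0) if NAME resolves to a command
# on the current PATH, fail otherwise. The function name is hypothetical.
is_on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether java is reachable; print its version if it is.
if is_on_path java; then
    # java -version writes to stderr, so redirect it to stdout.
    java -version 2>&1 || echo "java is on the PATH but failed to run"
else
    echo "java was not found on the PATH"
fi
```

The same helper works for any other executable name, for example is_on_path Rscript once R has been installed.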
Installation of R and associated software and packages
For Windows, R binaries can be downloaded and installed from the R website (https://www.r-project.org/). After installing R on Windows, please edit the PATH variable (at the system or user level) and add the directory that contains R.exe and Rscript.exe to it.
For installation of R on Linux, it is best to use the software package management tool for your Linux distribution. For Debian and Ubuntu Linux, the tool to use is apt-get. Linux installs of R using tools like apt-get mostly add the paths to the R and Rscript executables to the PATH variable. However, if this is not the case, please modify your PATH variable on Linux.
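If the PATH does need to be modified on Linux, one way to do it without creating duplicate entries is sketched below. The function name prepend_to_path and the directory /opt/R/bin are placeholders chosen for illustration; substitute the directory that actually holds the R and Rscript executables on your machine.

```shell
#!/bin/sh
# prepend_to_path DIR: print a PATH value with DIR at the front, unless DIR
# is already one of its entries. The function name is hypothetical.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) printf '%s\n' "$PATH" ;;      # already present: unchanged
        *)        printf '%s\n' "$1:$PATH" ;;   # otherwise put DIR first
    esac
}

# /opt/R/bin is a placeholder directory, not a location the book prescribes.
PATH=$(prepend_to_path /opt/R/bin)
export PATH
```

The change lasts only for the current shell session. Putting the same lines in a shell start-up file such as ~/.profile is a common way to make it persist.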
After R has been installed, install the R packages we will need by running the install.packages() command within R with a list of supplied package names. First, start up the R interactive-session (also called a Read-Eval-Print-Loop or REPL) by typing R at the command line. Then run the install.packages() command copied from the text-box below with the full list of packages to be installed. Make sure you are connected to the Internet and choose a CRAN (Comprehensive R Archive Network) package repository mirror close to your geographical location. If R warns you about the fact that it is installing the packages in a user-level local repository (since you are running R on the machine without admin or root privileges), it is not a cause for concern: Respond with a Yes to this message, and proceed.
On Linux, the command line session looks like this (list of R packages included):
radium@aceraspiredelto:~$ R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> install.packages(c("broom", "choroplethr", "data.table", "datacheck",
    "dplyr", "dtplyr", "ggplot2", "ggvis", "h2o", "htmlwidgets",
    "httr", "jsonlite", "leaflet", "maps", "maptools",
    "OpenStreetMap", "plotly", "randomForest", "R2HTML", "RDSTK",
    "readr", "rjson", "rpart", "RSQLite", "scales", "sqldf",
    "stargazer", "svglite", "tidyr", "tmap", "ztable"))
> q()
Save workspace image? [y/n/c]: n
radium@aceraspiredelto:~$
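The interactive session above can also be driven from the shell, which avoids the mirror-selection prompt. The sketch below only builds and prints the R expression; the function name build_install_expr is hypothetical, and the mirror URL https://cloud.r-project.org is an assumption (any CRAN mirror can be used in its place).

```shell
#!/bin/sh
# build_install_expr PKG...: print an R install.packages() call for the given
# package names, with the CRAN mirror set explicitly through the repos argument.
# The function name is hypothetical; the mirror URL is an assumption.
build_install_expr() {
    list=""
    for pkg in "$@"; do
        # Wrap each name in double quotes and separate entries with ", ".
        list="${list:+$list, }\"$pkg\""
    done
    printf 'install.packages(c(%s), repos = "https://cloud.r-project.org")\n' "$list"
}

# Show the expression for two of the packages from the list above.
build_install_expr dplyr ggplot2

# To run it for real (requires Rscript on the PATH and Internet access):
#   Rscript -e "$(build_install_expr dplyr ggplot2)"
```

Printing the expression first makes it easy to inspect what will be installed before handing it to Rscript.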
Note: On Linux, installing svglite or other R packages may fail because of missing system-level dependencies. The svglite package depends on gdtools, and gdtools in turn requires the Cairo (https://www.cairographics.org/download/) graphics development libraries, which must be installed first using apt-get or a similar package tool. On Ubuntu/Debian Linux, the command is:
sudo apt-get install libcairo2-dev
After this, re-running the install.packages() command for svglite within the R REPL should work smoothly:
radium@aceraspiredelto:~$ R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for