download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/movies-db.xls",
              destfile="movies-db.xls")
download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/movies-db.csv",
              destfile="movies-db.csv")
In [6]: my_data <- read.csv("movies-db.csv")
my_data
A data.frame: 30 × 8
name year length_min genre average_rating cost_millions foreign age_restriction
<fct> <int> <int> <fct> <dbl> <dbl> <int> <int>
Toy Story 1995 81 Animation 8.3 30.0 0 0
Akira 1998 125 Animation 8.1 10.4 1 14
The Breakfast Club 1985 97 Drama 7.9 1.0 0 14
The Artist 2011 100 Romance 8.0 15.0 1 12
Modern Times 1936 87 Comedy 8.6 1.5 0 10
Fight Club 1999 139 Drama 8.9 63.0 0 18
City of God 2002 130 Crime 8.7 3.3 1 18
The Untouchables 1987 119 Drama 7.9 25.0 0 14
Star Wars Episode IV 1977 121 Action 8.7 11.0 0 10
American Beauty 1999 122 Drama 8.4 15.0 0 14
Room 2015 118 Drama 8.3 13.0 1 14
Dr. Strangelove 1964 94 Comedy 8.5 1.8 1 10
The Ring 1998 95 Horror 7.3 1.2 1 18
Monty Python and the Holy Grail 1975 91 Comedy 8.3 0.4 1 18
High School Musical 2006 98 Comedy 5.2 4.2 0 0
Shaun of the Dead 2004 99 Horror 8.0 6.1 1 18
Taxi Driver 1976 113 Crime 8.3 1.3 1 14
The Shawshank Redemption 1994 142 Crime 9.3 25.0 0 16
Interstellar 2014 169 Adventure 8.6 165.0 0 10
Casino 1995 178 Biography 8.2 50.0 0 18
The Goodfellas 1990 145 Biography 8.7 25.0 0 14
Blue is the Warmest Colour 2013 179 Romance 7.8 4.5 1 18
Black Swan 2010 108 Thriller 8.0 13.0 0 16
Back to the Future 1985 116 Sci-fi 8.5 19.0 0 0
The Wave 2008 107 Thriller 7.6 5.5 1 16
Whiplash 2014 106 Drama 8.5 3.3 1 12
The Grand Hotel Budapest 2014 100 Crime 8.1 25.5 0 14
Jumanji 1995 104 Fantasy 6.9 65.0 0 12
The Eternal Sunshine of the Spotless Mind 2004 108 Drama 8.3 20.0 0 14
Chicago 2002 113 Comedy 7.2 45.0 0 12
In [3]: read.excel("movies-db.xls")
Error in read.excel("movies-db.xls"): could not find function "read.excel"
In [7]: head(my_data)
A data.frame: 6 × 8
name year length_min genre average_rating cost_millions foreign age_restriction
<fct> <int> <int> <fct> <dbl> <dbl> <int> <int>
1 Toy Story 1995 81 Animation 8.3 30.0 0 0
2 Akira 1998 125 Animation 8.1 10.4 1 14
3 The Breakfast Club 1985 97 Drama 7.9 1.0 0 14
4 The Artist 2011 100 Romance 8.0 15.0 1 12
5 Modern Times 1936 87 Comedy 8.6 1.5 0 10
6 Fight Club 1999 139 Drama 8.9 63.0 0 18
In [8]: str(my_data)
'data.frame': 30 obs. of 8 variables:
$ name : Factor w/ 30 levels "Akira","American Beauty",..: 29 1 21 20 14 10 8 27 18 2 ...
$ year : int 1995 1998 1985 2011 1936 1999 2002 1987 1977 1999 ...
$ length_min : int 81 125 97 100 87 139 130 119 121 122 ...
$ genre : Factor w/ 12 levels "Action","Adventure",..: 3 3 7 10 5 7 6 7 1 7 ...
$ average_rating : num 8.3 8.1 7.9 8 8.6 8.9 8.7 7.9 8.7 8.4 ...
$ cost_millions : num 30 10.4 1 15 1.5 63 3.3 25 11 15 ...
$ foreign : int 0 1 0 1 0 0 1 0 0 0 ...
$ age_restriction: int 0 14 14 12 10 18 18 14 10 14 ...
In [9]: library(readxl)
In [10]: my_excel_data <- read_excel("movies-db.xls")
In [11]: str(my_excel_data)
tibble [30 × 8] (S3: tbl_df/tbl/data.frame)
$ name : chr [1:30] "Toy Story" "Akira" "The Breakfast Club" "The Artist" ...
$ year : num [1:30] 1995 1998 1985 2011 1936 ...
$ length_min : num [1:30] 81 125 97 100 87 139 130 119 121 122 ...
$ genre : chr [1:30] "Animation" "Animation" "Drama" "Romance" ...
$ average_rating : num [1:30] 8.3 8.1 7.9 8 8.6 8.9 8.7 7.9 8.7 8.4 ...
$ cost_millions : num [1:30] 30 10.4 1 15 1.5 63 3.3 25 11 15 ...
$ foreign : num [1:30] 0 1 0 1 0 0 1 0 0 0 ...
$ age_restriction: num [1:30] 0 14 14 12 10 18 18 14 10 14 ...
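The two `str()` outputs above differ in column types: `read.csv()` produced factors and integers here, while `read_excel()` produced character and double columns. On R 4.0.0 and later, `read.csv()` defaults to `stringsAsFactors = FALSE`, so text columns come back as `chr` unless factors are requested. A minimal sketch of converting types by hand, assuming a small hypothetical data frame (`excel_demo` is not part of the notebook) that stands in for the Excel import:

```r
# Hypothetical stand-in for the read_excel() result:
# text columns are character, numeric columns are double.
excel_demo <- data.frame(
  name  = c("Toy Story", "Akira"),
  genre = c("Animation", "Animation"),
  year  = c(1995, 1998),
  stringsAsFactors = FALSE
)

# Convert selected columns to the types seen in the read.csv() output.
excel_demo$genre <- factor(excel_demo$genre)     # chr -> factor
excel_demo$year  <- as.integer(excel_demo$year)  # double -> integer

# Inspect the resulting column types.
str(excel_demo)
```

Whether to convert at all depends on the analysis; categorical columns such as `genre` are the usual candidates for factors, while free-text columns such as `name` are often left as character.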
In [12]: my_data['name']
A data.frame: 30 × 1
name
<fct>
Toy Story
Akira
The Breakfast Club
The Artist
Modern Times
Fight Club
City of God
The Untouchables
Star Wars Episode IV
American Beauty
Room
Dr. Strangelove
The Ring
Monty Python and the Holy Grail
High School Musical
Shaun of the Dead
Taxi Driver
The Shawshank Redemption
Interstellar
Casino
The Goodfellas
Blue is the Warmest Colour
Black Swan
Back to the Future
The Wave
Whiplash
The Grand Hotel Budapest
Jumanji
The Eternal Sunshine of the Spotless Mind
Chicago
In [13]: my_data$name
Toy Story · Akira · The Breakfast Club · The Artist · Modern Times · Fight Club · City of God ·
The Untouchables · Star Wars Episode IV · American Beauty · Room · Dr. Strangelove · The Ring ·
Monty Python and the Holy Grail · High School Musical · Shaun of the Dead · Taxi Driver ·
The Shawshank Redemption · Interstellar · Casino · The Goodfellas · Blue is the Warmest Colour ·
Black Swan · Back to the Future · The Wave · Whiplash · The Grand Hotel Budapest · Jumanji ·
The Eternal Sunshine of the Spotless Mind · Chicago
Levels:
In [14]: my_data[["name"]]
Toy Story · Akira · The Breakfast Club · The Artist · Modern Times · Fight Club · City of God ·
The Untouchables · Star Wars Episode IV · American Beauty · Room · Dr. Strangelove · The Ring ·
Monty Python and the Holy Grail · High School Musical · Shaun of the Dead · Taxi Driver ·
The Shawshank Redemption · Interstellar · Casino · The Goodfellas · Blue is the Warmest Colour ·
Black Swan · Back to the Future · The Wave · Whiplash · The Grand Hotel Budapest · Jumanji ·
The Eternal Sunshine of the Spotless Mind · Chicago
Levels:
In [15]: my_data[1, c("name","length_min")]
A data.frame: 1 × 2
name length_min
<fct> <int>
1 Toy Story 81
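The access forms in cells 12 to 15 return different kinds of objects: single brackets keep the data frame structure, while `$` and `[[ ]]` extract the column itself. A minimal sketch on a hypothetical two-row data frame (`df_demo` is an illustration, not the notebook's data):

```r
# Hypothetical two-row stand-in for my_data.
df_demo <- data.frame(
  name = c("Toy Story", "Akira"),
  length_min = c(81L, 125L),
  stringsAsFactors = TRUE
)

sub_df  <- df_demo["name"]     # single brackets: still a data.frame
col_dlr <- df_demo$name        # $: the column vector (a factor here)
col_dbl <- df_demo[["name"]]   # [[ ]]: the same vector as $

class(sub_df)                  # "data.frame"
class(col_dlr)                 # "factor"
identical(col_dlr, col_dbl)    # TRUE

# Row and column selection together: row 1, two named columns.
first_row <- df_demo[1, c("name", "length_min")]
dim(first_row)                 # 1 2
```

Single brackets are the form to use when the result should stay a data frame, for example when passing one column to a function that expects tabular input.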
In [16]: data()
Data sets
A data.frame: 104 × 3
Package Item Title
<chr> <chr> <chr>
datasets AirPassengers Monthly Airline Passenger Numbers 1949-1960
datasets BJsales Sales Data with Leading Indicator
datasets BJsales.lead (BJsales) Sales Data with Leading Indicator
datasets BOD Biochemical Oxygen Demand
datasets CO2 Carbon Dioxide Uptake in Grass Plants
datasets ChickWeight Weight versus age of chicks on different diets
datasets DNase Elisa assay of DNase
datasets EuStockMarkets Daily Closing Prices of Major European Stock Indices, 1991-1998
datasets Formaldehyde Determination of Formaldehyde
datasets HairEyeColor Hair and Eye Color of Statistics Students
datasets Harman23.cor Harman Example 2.3
datasets Harman74.cor Harman Example 7.4
datasets Indometh Pharmacokinetics of Indomethacin
datasets InsectSprays Effectiveness of Insect Sprays
datasets JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
datasets LakeHuron Level of Lake Huron 1875-1972
datasets LifeCycleSavings Intercountry Life-Cycle Savings Data
datasets Loblolly Growth of Loblolly pine trees
datasets Nile Flow of the River Nile
datasets Orange Growth of Orange Trees
datasets OrchardSprays Potency of Orchard Sprays
datasets PlantGrowth Results from an Experiment on Plant Growth
datasets Puromycin Reaction Velocity of an Enzymatic Reaction
datasets Seatbelts Road Casualties in Great Britain 1969-84
datasets Theoph Pharmacokinetics of Theophylline
datasets Titanic Survival of passengers on the Titanic
datasets ToothGrowth The Effect of Vitamin C on Tooth Growth in Guinea Pigs
datasets UCBAdmissions Student Admissions at UC Berkeley
datasets UKDriverDeaths Road Casualties in Great Britain 1969-84
datasets UKgas UK Quarterly Gas Consumption
datasets USAccDeaths Accidental Deaths in the US 1973-1978
datasets USArrests Violent Crime Rates by US State
datasets USJudgeRatings Lawyers' Ratings of State Judges in the US Superior Court
datasets USPersonalExpenditure Personal Expenditure Data
datasets UScitiesD Distances Between European Cities and Between US Cities
datasets VADeaths Death Rates in Virginia (1940)
datasets WWWusage Internet Usage per Minute
datasets WorldPhones The World's Telephones
datasets ability.cov Ability and Intelligence Tests
datasets airmiles Passenger Miles on Commercial US Airlines, 1937-1960
datasets airquality New York Air Quality Measurements
datasets anscombe Anscombe's Quartet of 'Identical' Simple Linear Regressions
datasets attenu The Joyner-Boore Attenuation Data
datasets attitude The Chatterjee-Price Attitude Data
datasets austres Quarterly Time Series of the Number of Australian Residents
datasets beaver1 (beavers) Body Temperature Series of Two Beavers
datasets beaver2 (beavers) Body Temperature Series of Two Beavers
datasets cars Speed and Stopping Distances of Cars
datasets chickwts Chicken Weights by Feed Type
datasets co2 Mauna Loa Atmospheric CO2 Concentration
datasets crimtab Student's 3000 Criminals Data
datasets discoveries Yearly Numbers of Important Discoveries
datasets esoph Smoking, Alcohol and (O)esophageal Cancer
datasets euro Conversion Rates of Euro Currencies
datasets euro.cross (euro) Conversion Rates of Euro Currencies
datasets eurodist Distances Between European Cities and Between US Cities
datasets faithful Old Faithful Geyser Data
datasets fdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
datasets freeny Freeny's Revenue Data
datasets freeny.x (freeny) Freeny's Revenue Data
datasets freeny.y (freeny) Freeny's Revenue Data
datasets infert Infertility after Spontaneous and Induced Abortion
datasets iris Edgar Anderson's Iris Data
datasets iris3 Edgar Anderson's Iris Data
datasets islands Areas of the World's Major Landmasses
datasets ldeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
datasets lh Luteinizing Hormone in Blood Samples
datasets longley Longley's Economic Regression Data
datasets lynx Annual Canadian Lynx trappings 1821-1934
datasets mdeaths (UKLungDeaths) Monthly Deaths from Lung Diseases in the UK
datasets morley Michelson Speed of Light Data
datasets mtcars Motor Trend Car Road Tests
datasets nhtemp Average Yearly Temperatures in New Haven
datasets nottem Average Monthly Temperatures at Nottingham, 1920-1939
datasets npk Classical N, P, K Factorial Experiment
datasets occupationalStatus Occupational Status of Fathers and their Sons
datasets precip Annual Precipitation in US Cities
datasets presidents Quarterly Approval Ratings of US Presidents
datasets pressure Vapor Pressure of Mercury as a Function of Temperature
datasets quakes Locations of Earthquakes off Fiji
datasets randu Random Numbers from Congruential Generator RANDU
datasets rivers Lengths of Major North American Rivers
datasets rock Measurements on Petroleum Rock Samples
datasets sleep Student's Sleep Data
datasets stack.loss (stackloss) Brownlee's Stack Loss Plant Data
datasets stack.x (stackloss) Brownlee's Stack Loss Plant Data
datasets stackloss Brownlee's Stack Loss Plant Data
datasets state.abb (state) US State Facts and Figures
datasets state.area (state) US State Facts and Figures
datasets state.center (state) US State Facts and Figures
datasets state.division (state) US State Facts and Figures
datasets state.name (state) US State Facts and Figures
datasets state.region (state) US State Facts and Figures
datasets state.x77 (state) US State Facts and Figures
datasets sunspot.month Monthly Sunspot Data, from 1749 to "Present"
datasets sunspot.year Yearly Sunspot Data, 1700-1988
datasets sunspots Monthly Sunspot Numbers, 1749-1983
datasets swiss Swiss Fertility and Socioeconomic Indicators (1888) Data
datasets treering Yearly Treering Data, -6000-1979
datasets trees Girth, Height and Volume for Black Cherry Trees
datasets uspop Populations Recorded by the US Census
datasets volcano Topographic Information on Auckland's Maunga Whau Volcano
datasets warpbreaks The Number of Breaks in Yarn during Weaving
datasets women Average Heights and Weights for American Women
Use ‘data(package = .packages(all.available = TRUE))’ to list the data sets in all *available* packages.
In [17]: help(women)
women {datasets} R Documentation
Average Heights and Weights for American Women
Description
This data set gives the average heights and weights for American women aged 30–39.
Usage
women
Format
A data frame with 15 observations on 2 variables.
[,1] height numeric Height (in)
[,2] weight numeric Weight (lbs)
Details
The data set appears to have been taken from the American Society of Actuaries Build and Blood Pressure
Study for some (unknown to us) earlier year.
The World Almanac notes: “The figures represent weights in ordinary indoor clothing and shoes, and heights
with shoes”.
Source
The World Almanac and Book of Facts, 1975.
References
McNeil, D. R. (1977) Interactive Data Analysis. Wiley.
Examples
require(graphics)
plot(women, xlab = "Height (in)", ylab = "Weight (lb)",
main = "women data: American women aged 30-39")
[Package datasets version 3.5.1 ]
In [18]: women
A data.frame: 15 × 2
height weight
<dbl> <dbl>
58 115
59 117
60 120
61 123
62 126
63 129
64 132
65 135
66 139
67 142
68 146
69 150
70 154
71 159
72 164
In [20]: summary(my_data)
name year length_min genre
Akira : 1 Min. :1936 Min. : 81.00 Drama :7
American Beauty : 1 1st Qu.:1988 1st Qu.: 99.25 Comedy :5
Back to the Future : 1 Median :1998 Median :110.50 Crime :4
Black Swan : 1 Mean :1996 Mean :116.80 Animation:2
Blue is the Warmest Colour: 1 3rd Qu.:2008 3rd Qu.:124.25 Biography:2
Casino : 1 Max. :2015 Max. :179.00 Horror :2
(Other) :24 (Other) :8
average_rating cost_millions foreign age_restriction
Min. :5.200 Min. : 0.400 Min. :0.0 Min. : 0.00
1st Qu.:7.925 1st Qu.: 3.525 1st Qu.:0.0 1st Qu.:12.00
Median :8.300 Median : 13.000 Median :0.0 Median :14.00
Mean :8.103 Mean : 22.300 Mean :0.4 Mean :12.93
3rd Qu.:8.500 3rd Qu.: 25.000 3rd Qu.:1.0 3rd Qu.:16.00
Max. :9.300 Max. :165.000 Max. :1.0 Max. :18.00