2012 NSF Graduate Research Fellowship statistics
Elson Liu
June 11, 2012
Contents
1 Preliminaries
2 Undergraduate institution
3 Graduate institution
4 Field of Study
5 Subject area
Preliminaries
The awardee list was downloaded from https://www.fastlane.nsf.gov/grfp/AwardeeList.do?method=sort&method%3DloadAwardeeList&exportType=2. Some preprocessing was done in Microsoft Excel: baccalaureate institutions were normalized by converting to lower case, and a Subject column was generated by splitting the Field of Study column on hyphens. The data were then exported in CSV format as NSFAwardeeList2012.csv.

Load libraries

    library(xtable)
    library(ggplot2)
    library(gdata)

Import data

    df <- read.csv("NSFAwardeeList2012.csv", header = TRUE)
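The Excel preprocessing described above could also be done in R. The following is a minimal sketch on a toy data frame, not the procedure actually used for this report. The raw column name Baccalaureate.Institution is an assumption, and the three sample rows are illustrative only.

```r
# Toy stand-in for the raw awardee list (column name of the raw
# baccalaureate field is assumed, not taken from the NSF export).
raw <- data.frame(
  Baccalaureate.Institution = c("Amherst College", "AMHERST COLLEGE",
                                "Yale University"),
  Field.of.Study = c("Chemistry - Chemical Synthesis",
                     "Life Sciences - Ecology",
                     "Engineering - Mechanical"),
  stringsAsFactors = FALSE
)

# Normalize institution names by converting to lower case.
raw$Lower.Case.Baccalaureate <- tolower(raw$Baccalaureate.Institution)

# Keep the part of Field.of.Study before the first hyphen as the subject.
raw$Subject <- trimws(sub("-.*$", "", raw$Field.of.Study))

print(raw[, c("Lower.Case.Baccalaureate", "Subject")])
```

Doing this step in R would keep the whole analysis in one script, at the cost of having to decide how to treat fields whose names themselves contain hyphens.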
Undergraduate institution
Tabulate undergraduate institution frequencies

    ugrads <- table(df$Lower.Case.Baccalaureate, dnn = c("Number of awardees"))

Convert the table back to a data frame

    udf <- as.data.frame(ugrads)
    names(udf) <- c("Undergrad", "Awardees")
    head(udf)
    ##   Undergrad                    Awardees
    ## 1                                     2
    ## 2 agnes scott college                 1
    ## 3 albany state university             1
    ## 4 alfred university                   1
    ## 5 amherst college                     4
    ## 6 appalachian state university        2
Sort by number of awardees

    o <- order(-udf$Awardees)
    ugrads.sorted <- udf[o, ]
    head(ugrads.sorted, n = 35L)
    ##     Undergrad                                  Awardees
    ## 166 massachusetts institute of technology            59
    ## 298 university of california berkeley                51
    ## 73  cornell university                               41
    ## 386 university of texas at austin                    38
    ## 255 stanford university                              36
    ## 120 harvard university                               31
    ## 216 princeton university                             31
    ## 398 university of washington                         29
    ## 436 yale university                                  28
    ## 7   arizona state university                         27
    ## 28  california institute of technology               26
    ## 299 university of california-berkeley                26
    ## 401 university of wisconsin-madison                  26
    ## 24  brown university                                 23
    ## 106 georgia institute of technology                  22
    ## 291 university of arizona                            22
    ## 300 university of california-davis                   22
    ## 311 university of chicago                            20
    ## 69  columbia university                              19
    ## 84  duke university                                  19
    ## 307 university of california, san diego              19
    ## 329 university of illinois at urbana-champaign       19
    ## 336 university of maryland                           18
    ## 432 william marsh rice university                    18
    ## 320 university of florida                            17
    ## 344 university of michigan                           17
    ## 345 university of michigan ann arbor                 17
    ## 348 university of minnesota-twin cities              17
    ## 397 university of virginia main campus               17
    ## 366 university of pennsylvania                       16
    ## 23  brigham young university                         15
    ## 44  carnegie-mellon university                       15
    ## 200 northwestern university                          15
    ## 267 swarthmore college                               15
    ## 270 texas a&m university main campus                 15
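The sorted list contains names that differ only in punctuation, such as "university of california berkeley" and "university of california-berkeley". Assuming such pairs refer to the same institution (the data alone do not confirm this), the counts could be merged on a punctuation-free key. The sketch below uses a toy data frame with three counts copied from the table above; normalize_name is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: lower-case, replace punctuation with spaces,
# and collapse repeated whitespace so spelling variants share one key.
normalize_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Toy data: three rows taken from the sorted table above.
toy <- data.frame(
  Undergrad = c("university of california berkeley",
                "university of california-berkeley",
                "cornell university"),
  Awardees = c(51, 26, 41),
  stringsAsFactors = FALSE
)

# Sum awardees over rows that share the same normalized key.
toy$Key <- normalize_name(toy$Undergrad)
merged <- aggregate(Awardees ~ Key, data = toy, FUN = sum)
merged <- merged[order(-merged$Awardees), ]
print(merged)
```

The key is meant only for grouping: it also alters names such as "texas a&m", so the original spelling should be kept for display. Variants that differ in wording rather than punctuation (for example "university of michigan" and "university of michigan ann arbor") would not be merged by this approach.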
Select undergrad institutions with more than 20 awardees and draw a dotplot

    ugrads.top <- drop.levels(udf[udf$Awardees > 20, ])
    p <- qplot(x = Awardees, y = Undergrad, data = ugrads.top)
    print(p)
[Figure: dot plot of Awardees (x axis) against Undergrad (y axis) for the 17 undergraduate institutions with more than 20 awardees.]
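As an alternative sketch, the same kind of plot can be built with ggplot() and geom_point(), which avoids qplot() (deprecated in recent ggplot2 releases). Here the institutions are ordered by count with reorder(), and unused factor levels are dropped with base R's droplevels() rather than gdata's drop.levels(). The toy data frame holds four rows copied from the sorted table; this is not the code used for the figure in this report.

```r
library(ggplot2)

# Toy data: four rows from the sorted undergraduate table.
toy <- data.frame(
  Undergrad = factor(c("massachusetts institute of technology",
                       "cornell university",
                       "stanford university",
                       "university of chicago")),
  Awardees = c(59, 41, 36, 20)
)

# Keep institutions with more than 20 awardees and drop unused levels.
top <- droplevels(toy[toy$Awardees > 20, ])

# Dot plot with the y axis ordered by number of awardees.
p <- ggplot(top, aes(x = Awardees, y = reorder(Undergrad, Awardees))) +
  geom_point() +
  labs(y = "Undergrad")
print(p)
```

Ordering the y axis by count rather than alphabetically makes the ranking readable directly from the plot.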
Generate a LaTeX-formatted table

    utable <- xtable(ugrads)
    print(utable, type = "latex", file = "undergrads.tex", tabular.environment = "longtable")
Graduate institution
Tabulate graduate institution frequencies

    grads <- table(df$Graduate.Institution, dnn = c("Number of awardees"))

Convert the table back to a data frame

    gdf <- as.data.frame(grads)
    names(gdf) <- c("Grad", "Awardees")
    head(gdf)
    ##   Grad                            Awardees
    ## 1                                        2
    ## 2 American Museum Natural History        4
    ## 3 Arizona State University              15
    ## 4 Auburn University                      1
    ## 5 Baylor College of Medicine             1
    ## 6 Baylor University                      1
Sort by number of awardees

    o <- order(-gdf$Awardees)
    grads.sorted <- gdf[o, ]
    head(grads.sorted, n = 19L)
    ##     Grad                                   Awardees
    ## 96  University of California-Berkeley           190
    ## 49  Massachusetts Institute of Technology       147
    ## 78  Stanford University                         136
    ## 39  Harvard University                           99
    ## 23  Cornell University                           58
    ## 158 University of Washington                     56
    ## 128 University of Michigan Ann Arbor             51
    ## 28  Duke University                              43
    ## 159 University of Wisconsin-Madison              43
    ## 11  California Institute of Technology           42
    ## 60  Northwestern University                      40
    ## 150 University of Texas at Austin                40
    ## 97  University of California-Davis               37
    ## 101 University of California-San Diego           36
    ## 102 University of California-San Francisco       36
    ## 22  Columbia University                          35
    ## 69  Princeton University                         32
    ## 172 Yale University                              32
    ## 99  University of California-Los Angeles         30
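The tabulate, convert, and sort steps are repeated for each column in this report. They could be wrapped in a small helper; the sketch below is one way to do so, with count_sorted a hypothetical function name and the three-row data frame a toy stand-in for the awardee list.

```r
# Hypothetical helper: count occurrences of each value in one column
# and return a data frame sorted by descending count.
count_sorted <- function(data, column, label) {
  counts <- as.data.frame(table(data[[column]]), stringsAsFactors = FALSE)
  names(counts) <- c(label, "Awardees")
  counts[order(-counts$Awardees), ]
}

# Toy stand-in for the awardee list.
toy <- data.frame(
  Graduate.Institution = c("Stanford University",
                           "Harvard University",
                           "Stanford University"),
  stringsAsFactors = FALSE
)

gdf.sorted <- count_sorted(toy, "Graduate.Institution", "Grad")
print(gdf.sorted)
```

With such a helper, the undergraduate, graduate, field, and subject tables would each take one call, for example count_sorted(df, "Field.of.Study", "Field").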
Select grad institutions with more than 30 awardees and draw a dotplot

    grads.top <- drop.levels(gdf[gdf$Awardees > 30, ])
    p <- qplot(x = Awardees, y = Grad, data = grads.top)
    print(p)
[Figure: dot plot of Awardees (x axis) against Grad (y axis) for the 18 graduate institutions with more than 30 awardees.]
Generate a LaTeX-formatted table

    gtable <- xtable(grads)
    print(gtable, type = "latex", file = "grads.tex", tabular.environment = "longtable")
Field of Study
Tabulate field of study frequencies

    fields <- table(df$Field.of.Study, dnn = c("Number of awardees"))

Convert the table back to a data frame

    fdf <- as.data.frame(fields)
    names(fdf) <- c("Field", "Awardees")
    head(fdf)
    ##   Field                                                         Awardees
    ## 1                                                                      2
    ## 2 Chemistry - Chemical Catalysis                                       9
    ## 3 Chemistry - Chemical Measurement and Imaging                        17
    ## 4 Chemistry - Chemical Structure, Dynamics, and Mechanism             16
    ## 5 Chemistry - Chemical Synthesis                                      39
    ## 6 Chemistry - Chemical Theory, Models and Computational Methods        8
Sort by number of awardees

    o <- order(-fdf$Awardees)
    fields.sorted <- fdf[o, ]
    head(fields.sorted, n = 10L)
    ##    Field                                   Awardees
    ## 91 Life Sciences - Ecology                      121
    ## 39 Engineering - Biomedical                      87
    ## 48 Engineering - Mechanical                      79
    ## 40 Engineering - Chemical Engineering            71
    ## 98 Life Sciences - Neurosciences                 64
    ## 38 Engineering - Bioengineering                  62
    ## 43 Engineering - Electrical and Electronic       56
    ## 93 Life Sciences - Evolutionary Biology          51
    ## 97 Life Sciences - Molecular Biology             49
    ## 45 Engineering - Environmental                   46
Select fields of study with more than 40 awardees and draw a dotplot

    fields.top <- drop.levels(fdf[fdf$Awardees > 40, ])
    p <- qplot(x = Awardees, y = Field, data = fields.top)
    print(p)
[Figure: dot plot of Awardees (x axis) against Field (y axis) for the 10 fields of study with more than 40 awardees.]
Generate a LaTeX-formatted table

    ftable <- xtable(fields)
    print(ftable, type = "latex", file = "fields.tex", tabular.environment = "longtable")
Subject area
Tabulate subject area frequencies

    subjects <- table(df$Subject, dnn = c("Number of awardees"))

Convert the table back to a data frame

    sdf <- as.data.frame(subjects)
    names(sdf) <- c("Subject", "Awardees")
    head(sdf, n = 10L)
    ##    Subject               Awardees
    ## 1                               2
    ## 2  Chemistry                  153
    ## 3  Comp/IS/Eng                104
    ## 4  Engineering                529
    ## 5  Geosciences                 94
    ## 6  Life Sciences              566
    ## 7  Materials Research          42
    ## 8  Mathematical Sciences       75
    ## 9  Physics and Astronomy      107
    ## 10 Psychology                 146
Sort by number of awardees

    o <- order(-sdf$Awardees)
    subjects.sorted <- sdf[o, ]
    head(subjects.sorted, n = 10L)
    ##    Subject               Awardees
    ## 6  Life Sciences              566
    ## 4  Engineering                529
    ## 11 Social Sciences            172
    ## 2  Chemistry                  153
    ## 10 Psychology                 146
    ## 9  Physics and Astronomy      107
    ## 3  Comp/IS/Eng                104
    ## 5  Geosciences                 94
    ## 8  Mathematical Sciences       75
    ## 7  Materials Research          42
Draw a dotplot of number of awardees for each subject area

    p <- qplot(x = Awardees, y = Subject, data = subjects.sorted)
    print(p)
[Figure: dot plot of Awardees (x axis) against Subject (y axis) for all subject areas.]
Generate a LaTeX-formatted table

    stable <- xtable(subjects)
    print(stable, type = "latex", file = "subjects.tex", tabular.environment = "longtable")

Subject                               Number of awardees
(blank)                               2
Chemistry                             153
Comp/IS/Eng                           104
Engineering                           529
Geosciences                           94
Life Sciences                         566
Materials Research                    42
Mathematical Sciences                 75
Physics and Astronomy                 107
Psychology                            146
Social Sciences                       172
STEM Education and Learning Research  12