Jurnal Prostho
20 Data Science for All: A University-Wide Course in Data Literacy
David Schuff
20.1 Introduction
Increasing attention has been paid to the demand for data scientists. In fact,
Davenport and Patil (2012) declared the data scientist as the “sexiest job of the 21st
century.” What exactly is meant by the term “data scientist,” however, is unclear.
We often think of a data scientist as a highly quantitative, technically-trained
professional with advanced knowledge of statistics and big data infrastructure
technologies.
However, Davenport and Patil (2012) define a data scientist as “a high-ranking
professional with the training and curiosity to make discoveries in the world of big
data.” Press (2012) defines the data scientist as “an engineer who employs the
scientific method and applies data-discovery tools to find new insights in data.”
D. Schuff (*)
Department of Management Information Systems, Fox School of Business,
Temple University, 210 Speakman Hall, 1810 North 13th Street, Philadelphia,
PA 19122-6083, USA
e-mail: schuff@temple.edu
These definitions are broad and do not necessarily imply a data scientist is a statisti-
cian, a computer scientist, or even a business analyst. Davenport and Patil’s defini-
tion specifically mentions “big data,” while Press’ definition does not.
What these definitions have in common is that they underscore the importance of
data literacy (as opposed to statistical and technological proficiency) as a skill for
discovery. Infusing data literacy into a curriculum is an unrealized opportunity for
higher education to truly make an impact on the current generation as they prepare
to move into the workforce. Universities are squarely focused on providing specialized
Master’s Degrees in Business Analytics; there are now 117 such programs accord-
ing to the website “Master’s In Data Science” (www.mastersindatascience.org).
However, a data-literate undergraduate population, through its sheer numbers,
has a far greater potential impact on the way organizations operate.
This chapter describes the design and structure of a new, unique undergraduate
elective course introduced last year into the curriculum of Temple University, a
large, public University in the Northeastern United States. In its first year it has gone
from a pilot to a regular, multi-section offering in the University’s “General
Education” curriculum by emphasizing practical data literacy through current
events, readily available analysis tools and the methods of scientific inquiry.
Temple University is a large, public, urban institution with over 37,000 students. Its
primary mission is to educate the regional undergraduate population through 140
bachelor’s degree programs (the University also has 126 master’s degree and 57 doctoral
programs). There are 17 schools and colleges including liberal arts, business,
education, law, media and communication, music and dance, and engineering.
As at many large universities, there is an institution-wide core curriculum that
covers several broad categories. To fulfill the “General Education” or “GenEd”
requirements, students must select from a menu of courses in each category; the categories
include analytical reading and writing, humanities, quantitative literacy, arts,
human behavior, race and diversity, science and technology, US society, and world
society.
One of the stated goals of the University’s GenEd program is that, in an environ-
ment where “the amount of information is available … and the speed with which we
can access information … continues to expand,” the University must teach students
“how information is linked and how pieces of information are interrelated” (Temple
University 2015). This is certainly a reasonable and important goal for undergradu-
ate education regardless of a student’s major field of study, with obvious ties to
concepts of data science and data literacy.
Further, Information Systems is a field that is well-positioned to deliver this
material to a broad audience. Key aspects of the IS 2010 Model Curriculum include
“understanding and addressing information requirements” and “exploiting
opportunities created by technology innovations” (Topi et al. 2010). Most importantly,
Information Systems is one of the few fields with the orientation and skill set
to teach data literacy to a non-technical audience. Its emphasis on training business
professionals creates an applied focus on the identification and collection of data for
problem-solving and use of practical analysis tools.
With this in mind, we proposed, developed, and executed a new course for the
University’s GenEd curriculum that would employ this dual focus on data and tech-
nology, targeted at a non-technical audience. The design of the course set out to
inspire an “evidence-based” mindset, encouraging students to identify and use data
relevant to them in their field of study and the larger world around them.
The course was designed to address several of the University GenEd program’s
broad learning goals (Temple University 2015):
1. Information literacy, including the ability to recognize and articulate informa-
tion needs; to locate, critically evaluate, and organize information for a specific
purpose; and to recognize and reflect on the ethical use of information.
2. Development of critical thinking skills, including the evaluation of evidence,
analysis and synthesis of multiple sources, and reflection on varied
perspectives.
3. Communications skills, using spoken and written language to construct a mes-
sage that demonstrates the communicator has established clear goals and has
considered her or his audience.
4. Retrieve, organize, and analyze data associated with a scientific model.
5. Understand and communicate how technology encourages the process of
discovery.
6. Recognize, use, and appreciate scientific or technological thinking for solving
problems that are part of everyday life.
From these broad goals, we developed ten specific learning goals for the course
that could be evaluated through assignments and exams. Because of the course’s
dual focus—literacy and skill-building—the course learning goals span Krathwohl’s
(2002) knowledge dimension, with goals focused on factual (e.g., knowing data sci-
ence terminology), conceptual (e.g., applying data visualization principles to assess
the effectiveness of a graphic), and procedural knowledge (e.g., how to clean a data
set). The learning goals also span the entire range of Krathwohl’s cognitive process
dimension, requiring students to remember, understand, apply, analyze, evaluate,
and create. This is in line with the purpose of the course, which is to impart terminology,
teach basic skills, and have students apply those skills to produce original
knowledge. Table 20.1 lists each learning goal, along with where the specific goal
lies on each dimension. These learning goals can involve several components and
therefore may span multiple levels in a dimension.
This module builds skills in support of the learning goals of “information literacy,”
“critical thinking,” “how technology encourages discovery,” and “technological
thinking for everyday problems.” The basics of scientific inquiry are discussed in this
module, including the notion of theory and hypothesis formation. Students also
learn to identify sources of relevant data. They learn the role of data across
many disciplines, with concrete examples from current events. For example, this
module discusses the National Security Agency’s collection and use of telephony
metadata. This module also covers how citizens and organizations can use
government-published “open data” to understand the world around them.
20.4.3 Overview of Module 3: Working with Data in the Real World
This module builds skills in support of the “information literacy” GenEd learning
goal, and the Science & Technology area goals to “retrieve, organize, and analyze
data” and “technological thinking to solve everyday problems.” Students learn how
to fix problems in data sets. This builds on the course’s first module, where they learn
how to identify data quality issues. Students learn how to address these problems
through data cleansing and transformation to create a usable, reliable data set
using Microsoft Excel. They resolve inconsistencies within and across data
sets, and determine when data is in the wrong form. The exercise below introduces
students to the concept of a Key Performance Indicator. The exercise is intended to
apply the SMART criteria (Specific, Measurable, Achievable, Relevant, and Time
Phased) to evaluate candidate KPIs for a scenario. Another exercise requires the
students to create several KPI scorecards using a more business-oriented scenario:
on-time flight data for a set of airports.
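The cleansing and KPI steps described in this module can be sketched in a few lines of code. The course itself uses Microsoft Excel and Tableau; the Python version below is only an illustration, and the flight records, the function names, and the 80% and 60% thresholds are all assumptions made for this sketch rather than values from the course.

```python
from collections import defaultdict

# Hypothetical raw flight records: (airport code, status). The messy values
# mimic common data-entry problems; none of this data comes from the course.
RAW_FLIGHTS = [
    ("PHL", "on time"),
    ("phl ", "On Time"),
    ("PHL", "late"),
    ("JFK", "ON TIME"),
    ("JFK", "late"),
    ("JFK", "Late "),
    (" bos", "on time"),
    ("BOS", "on time"),
    ("BOS", ""),  # missing status: this row cannot be used
]


def clean(records):
    """Trim whitespace, normalize case, and drop rows with unusable values."""
    cleaned = []
    for airport, status in records:
        airport = airport.strip().upper()
        status = status.strip().lower()
        if airport and status in {"on time", "late"}:
            cleaned.append((airport, status))
    return cleaned


def on_time_rate(records):
    """Return the share of on-time flights per airport (the KPI)."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for airport, status in records:
        totals[airport] += 1
        if status == "on time":
            on_time[airport] += 1
    return {airport: on_time[airport] / totals[airport] for airport in totals}


def scorecard(rates, target=0.80, warning=0.60):
    """Classify each KPI value against assumed thresholds."""
    result = {}
    for airport, rate in rates.items():
        if rate >= target:
            result[airport] = "green"
        elif rate >= warning:
            result[airport] = "yellow"
        else:
            result[airport] = "red"
    return result


flights = clean(RAW_FLIGHTS)
rates = on_time_rate(flights)
statuses = scorecard(rates)
for airport in sorted(rates):
    print(f"{airport}: {rates[airport]:.0%} on time -> {statuses[airport]}")
```

Cleaning comes first because inconsistent spellings such as "phl " and "PHL" would otherwise be counted as different airports and distort the KPI.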
This module builds skills in support of the “critical thinking” GenEd learning goal
and the Science & Technology area goals to “retrieve, organize, and analyze data,”
“how technology encourages discovery,” and “technological thinking for everyday
problems.” Students learn how data is stored and organized. Specifically, they
will learn the differences between spreadsheets and databases and why each is used.
They will also learn three analytics techniques to give them a sense of what can be
done with data analytics, and also to give them hands-on experience with analytics
tools such as Microsoft Excel and Tableau Desktop. For example, students learn
how to use Pivot Tables to summarize large data sets (such as the crime activity
assignment described below), and Association Analysis to discover which products
are likely to be bought together at a store (such as graham crackers and marshmal-
lows). Students also learn to interpret the output from these analyses and make
inferences about underlying patterns in the data.
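To make the idea behind Association Analysis concrete, the following minimal sketch computes three commonly used measures for pairs of products: support (how often the pair appears together), confidence (how often the second product appears given the first), and lift (confidence relative to the second product's overall frequency). Students in the course use Excel and Tableau rather than code; the baskets and the function name `pair_rules` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical store transactions, one set of products per basket.
BASKETS = [
    {"graham crackers", "marshmallows", "chocolate"},
    {"graham crackers", "marshmallows"},
    {"marshmallows", "chocolate"},
    {"graham crackers", "milk"},
    {"milk", "bread"},
]


def pair_rules(baskets):
    """Compute support, confidence, and lift for every ordered product pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Each unordered pair yields two rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = {
                "support": support,
                "confidence": confidence,
                "lift": lift,
            }
    return rules


rules = pair_rules(BASKETS)
rule = rules[("graham crackers", "marshmallows")]
print(rule)
```

In this made-up data, graham crackers and marshmallows appear together in two of five baskets, and the lift is slightly above 1, which suggests the two products are bought together a little more often than chance alone would predict.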
The final project is a group project that requires students to bring together what
they’ve learned throughout the course. Student teams source an “original” data set
(i.e., not one already used in the course), develop a research question, and then
answer that question by using one or more of the data analysis techniques and tools
covered in the course. Students are encouraged to find a data set and question that are
relevant and interesting to them.
The deliverable is a five-minute presentation with two minutes for questions
from the instructor and the class. The short presentation is a deliberate choice
because (1) it forces students to hone their presentation skills by being direct and to
the point and (2) it allows the course to scale.
The suggested format for the final presentation is:
• Slide 1 should list the group members and the title of the presentation.
• Slide 2 will describe the scenario. What question will be answered and why is it
important?
• Slide 3 will describe the data. What are the key elements and how was it obtained?
• Slides 4 and 5 will describe the analysis and the results, making good use of data
visualizations.
• Slide 6 will summarize the conclusions. What was learned? Students should support
their conclusions using the results of the analysis, citing specific evidence.
• Slide 7 will list the references.

2. Within each feed, click on a few tweets and read the replies.
3. Find three examples of positive tweets, three examples of negative tweets,
and three examples of neutral tweets (neither positive nor negative). Write
them down in three lists.
4. Make a note of why you classified them as positive, negative, or neutral.
Part 2: Larger Group (5 min)
1. Find another group to form a group of four.
2. Share your lists of positive and negative tweets. See if you agree with each
other’s choices.
3. Come up with rules for determining whether a tweet is positive or negative.
For example:
(a) Are there certain words which increase your certainty of how to classify
the tweet?
(b) Are there certain tweets that sound positive but really are negative?
(c) How do you detect sarcasm?
(d) How would you explain to someone how to classify tweets?
Part 3: Class Discussion (20 min)
We’ll compare notes. Specifically, we will discuss:
• What are some rules for determining positive versus negative sentiment?
• Were some tweets difficult to categorize? Why?
• In what ways would this be a good method of understanding how people
felt about your brand? In what ways could it give you bad information?
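The tweet-classification exercise asks students to come up with rules for deciding whether a tweet is positive or negative. As a minimal sketch of what such rules might look like once written down precisely, the code below counts words from small positive and negative word lists and flips a word's meaning when it directly follows a negator. The word lists, the negation rule, and the function name `classify` are assumptions for illustration only; the course performs sentiment analysis with Excel and Google Drive, not with this code.

```python
import re

# Hypothetical word lists; a real sentiment lexicon would be much larger.
POSITIVE = {"love", "great", "awesome", "happy", "best"}
NEGATIVE = {"hate", "terrible", "awful", "worst", "broken"}
NEGATORS = {"not", "never", "no"}


def classify(tweet):
    """Label a tweet positive, negative, or neutral by counting lexicon words.

    A sentiment word directly preceded by a negator has its polarity flipped.
    """
    words = re.findall(r"[a-z']+", tweet.lower())
    score = 0
    for i, word in enumerate(words):
        value = (word in POSITIVE) - (word in NEGATIVE)
        if value and i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in [
    "I love this airline, best flight ever!",
    "Not happy with the delay",
    "Flight lands at 5pm",
    "Oh great, another delay",
]:
    print(f"{classify(text):8} | {text}")
```

The last example is labeled positive even though most readers would hear sarcasm. This is the kind of failure the class discussion is meant to surface: simple word rules can give bad information about how people actually feel.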
Some examples of final projects from past classes include:
• Exploring the question of whether the best soccer players are the highest paid.
• An analysis of the national origin of members of terrorist organizations.
• Investigating what time of day students are most likely to answer an online
survey.
• Correlations between stock price and social media sentiment for a major aero-
space firm.
20.6 Conclusions
Data analytics is no longer solely the purview of data scientists. It is a fundamental skill for
the twenty-first century workforce. This is both a challenge and an opportunity for
higher education, one that the Information Systems discipline is uniquely positioned
to meet. With courses such as the one described here, Information Systems can
increase its reach beyond the business school and provide valuable, marketable
skills to a broad audience across the University.
The key to seizing this opportunity is to recognize that “data literacy” is the true
core skill for undergraduate students, not sophisticated analytics techniques. We
must instill in our students an appreciation of evidence-based decision-making
through an appreciation of what data can do and how even simple analysis can yield
sophisticated insights.
Course Description
We are all drowning in data, and so is your future employer. Data pours in from
sources as diverse as social media, customer loyalty programs, weather stations,
smartphones, and credit card purchases. How can you make sense of it all? Those
who can turn raw data into insight will be tomorrow’s decision-makers; those who
can solve problems and communicate using data will be tomorrow’s leaders. This
course will teach you how to harness the power of data by mastering the ways it is
stored, organized, and analyzed to enable better decisions. You will get hands-on
experience by solving problems using a variety of powerful, computer-based data
tools virtually every organization uses. You will also learn to make more impactful
and persuasive presentations by learning the key principles of presenting data
visually.
Course Objectives
Assignments
# Assignment description
1 Create a data analysis plan (individual)
Develop a plan for data analysis by forming hypotheses and finding data sets that will allow
you to test those hypotheses. The scenario: Once students graduate, it’s time for them to go
get a job. But is staying in the area the best choice? Evaluate our city as a place to live,
work, and play compared to the rest of the United States.
2 Analyze a data set using Tableau (individual)
Use Tableau to analyze and reveal various relationships within a data set. Use the data set
from the Environmental Protection Agency on the fuel economy of 2015 model year cars.
Answer a series of questions by creating the most visually effective charts and graphs using
the guidelines discussed in class.
3 Clean a data set (individual)
Correct the errors in a data set for the fictitious company “Vandelay Industries.” The sales
group is suspicious that there might be errors in the data for January. Work with a new data
set of 3296 orders with 5192 line items from January 2014.
4 Group data analysis (group term project)
In groups, perform an original analysis on a data set of your choosing. The data set can
come from any source as long as it is something you have not already worked on for this
course. Possible sources of data include open data from Data.gov, data sets from the Pew
Research Center, sports statistics, a data set from your current employer, or an original
survey conducted by your group.
Your analysis should clearly demonstrate the tools and techniques you’ve been exposed to in
this course. This can take any form you’d like (e.g., comparison of averages across
categories, mapping geographic data, sentiment analysis, developing and visualizing KPIs).
Your group will present your work in class through a five-minute presentation, with two
minutes for questions.
Week/session, topic/key questions, and readings

Module 1: Data in our daily lives

1.1 Introduction
• Course introduction/syllabus
• What is the difference between data, information, and knowledge?
• What makes “big data” big?

1.2 Science and data science
• What is data science?
• What is the difference between a theory and a hypothesis?
• What are the dangers of data analysis without a hypothesis?
Readings:
Dhar, V. (2013). Data Science and Prediction. Communications of the ACM. Vol. 56, No. 12. pp. 64–73
Allain, R. (2013). Three Science Words We Should Stop Using. Wired.com. March 27

2.1 A brief introduction to data
• What are the forms data can take?
• Where does data come from?
• What is metadata? A data dictionary?
Readings:
Stein, G. (2013). I’m Beating the NSA to the Punch by Spying on Myself. Fastcolabs.com. June 12
Di Justo, P. (2013). What the N.S.A. Wants to Know about Your Phone Calls. The New Yorker. June 7

2.2 Identifying Sources of Data
• What kinds of data are available in different disciplines (arts, sciences, medicine, business, government, etc.)?
• What kinds of problems and issues can data insight address?
Readings:
Silver, N. (2014). What the Fox Knows. FiveThirtyEight.com. March 17
Open Data. Wikipedia
Silver, N. (2014). In Search of America’s Best Burrito. FiveThirtyEight.com. June 5

3.1 Learning to (Mis)trust Data
• How do you spot reliable sources of data?
• How do you assess data quality?
• What is the “Filter Bubble?”
Readings:
Weisberg, J. (2011). Bubble Trouble: Is Web Personalization Turning Us Into Solipsistic Twits? Slate.com
Crawford, K. (2013). The Hidden Biases in Big Data. Harvard Business Review Blog Network. April 1
Hayes, B. (2013). In Data We Trust. Business Over Broadway. November 4

3.2 Guest speaker

Module 2: Telling stories with data

4.1 Viewing data
• What are different ways of viewing data?
• When do you need to visualize data?
• What are the basic techniques of data visualization?
Readings:
Unwin, A. (2008). Chapter II.2: Good Graphics? Handbook of Data Visualization. Chen, Hardle, and Unwin (Eds.). pp. 57–78

4.2 Introduction to Tableau
• What is Tableau? What can you do with it?
• How is it different from Microsoft Excel?
Readings:
Hoven, N. (n.d.). Stephen Few on Data Visualization: 8 Core Principles. Tableau Software
Acohido, B. (2013). Watch Out, Terrorists: Big Data is on the Case. USAToday.com. July 29

5.1 Communicating using data
• What are the principles of communicating data?
• How do you communicate complex ideas using data?
• How do you construct visualizations that complement a report? That stand on their own?
Readings:
Davenport, T. (2013). Telling a Story with Data. Deloitte University Press
Matlin, C. (2014). Visualizing a Day in the Life of a New York City Cab. FiveThirtyEight.com. July 17

5.2 Storytelling with infographics
• How are infographics different from other types of visualizations?
• How do infographic tools differ from other data tools we’ve used so far?
Readings:
Krum, R. (2014). Cool infographics: Effective Communication with Data Visualization. (Chapter 1: The Science of Infographics)
Krum, R. (2014). Cool infographics: Effective Communication with Data Visualization. (Chapter 6: Designing Infographics)

6.1/6.2 Exam review/EXAM 1

Module 3: Working with data in the real world

7.1 Dirty Data
• How does data get dirty?
• What are the consequences (i.e., ethical, financial) of dirty data?
• How do you clean it?
Readings:
Redman, T. (2013). Data’s Credibility Problem. Harvard Business Review. Vol. 91, No. 12. pp. 84–88
Gandel, S. (2013). Damn Excel! How the ‘Most Important Software Application of All Time’ Is Ruining the World. Fortune.com. April 17

7.2 Data cleansing
• How do you identify data problems?
• How do you correct data problems?
• When is fixing the data not worth it?
Readings:
Taber, D. (2010). Stupid Data Corruption Tricks: Take our CRM Quiz. CIO.com. November 2
Top Ten Ways to Clean Your Data. Microsoft

8.1 Choosing relevant data
• How do you identify Key Performance Indicators (KPIs)?
• How do you identify the right measure for the selected problem?
Readings:
Performance Indicator. Wikipedia
Schambra, W. (2013). The Tyranny of Success: Nonprofits and Metrics. NonprofitQuarterly.com. December 30

8.2 Evaluating key performance indicators
• How do you categorize and visualize KPIs according to a threshold?
• How do you use Tableau to evaluate KPIs? How would you use Excel?
Readings:
Olson, P. (2014). Wearable Tech is Plugging into Health Insurance. Forbes.com. June 19
Bialik, C. (2014). Tracking Health One Step (and Clap, and Wave, and Fist Pump) at a Time. FiveThirtyEight.com. March 17

9.1 Connecting diverse data
• How do you identify data sets that can be combined?
• How do you combine data sets?
• How do you resolve conflicts?
Readings:
Strickland, J. (n.d.). How Data Integration Works. howstuffworks.com
Gallagher, S. (2014). The GOP Arms Itself for the Next “War” in the Analytics Arms Race. arstechnica.com. February 7

9.2 Creating interactive dashboards
• How does a dashboard differ from an Infographic? A chart?
• How do dashboards facilitate decision-making?
Readings:
Best Practices for Designing Views and Dashboards. Tableau Software
Farmer, D. (2014). The One Skill You Really Need for Data Analysis

10.1/10.2 Exam Review/EXAM 2

Module 4: Analyzing data

11.1 Storing and retrieving data
• What is a database? How are spreadsheets just a type of database?
• How are technology advances changing how we think about storing data?
• What are the core technologies of big data analytics?
Readings:
Rosenblum, M. and Dorsey, P. (n.d.). Knowing Just Enough about Relational Databases. Dummies.com
Bertolucci, J. (2013). How to Explain Hadoop to Non-Geeks. InformationWeek.com. November 19

11.2 Using Tableau to aggregate data
• What can you learn from aggregation?
• How does thinking of data dimensionally help solve problems?
Readings:
Acampora, J. (2013). How to Structure Source Data for Excel Pivot Tables & Unpivot. July 18

12.1 Beyond numbers
• What is the difference between structured and unstructured data?
• What can you learn from text data that you can’t from numeric data?
• What are the tools for text analysis?
Readings:
Hurwitz, J., Nugent, A., Halper, F., and Kaufman, M. (n.d.). Unstructured Data in a Big Data Environment. Dummies.com
Feldman, R. (2013). Techniques and Applications for Sentiment Analysis. Communications of the ACM. Vol. 56, No. 4. pp. 82–89

12.2 Twitter sentiment analysis using Excel and Google Drive
• What are the steps in performing a sentiment analysis?
• What are the challenges in deriving meaningful information from text?
Readings:
Wohlsen, M. (2014). Don’t Worry, Facebook Still Has No Clue How You Feel. Wired.com. July 2

13.1 Predicting the future
• What is predictive analytics? What problems does it address?
• What kinds of analysis can be done?
• What kinds of data are needed for an analysis?
Readings:
Paine, N. (2014). What Analytics Can Teach Us About the Beautiful Game. June 12
Bertolucci, J. (2013). Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive. InformationWeek.com. December 31

13.2 Predictive analytics using Tableau
• Perform a forecasting analysis
• Perform a simple association analysis
Readings:
Peck, D. (2013). They’re Watching You at Work. TheAtlantic.com. November 20

13.1/13.2 Group presentations/FINAL EXAM review
References
Davenport TH, Patil DJ (2012) Data scientist: the sexiest job of the 21st century. Harv Bus Rev
90(10):70–76
Krathwohl D (2002) A revision of Bloom’s taxonomy: an overview. Theory Pract 41(4):212–218
Press G (2012) Data scientists: the definition of sexy. Forbes.
http://www.forbes.com/sites/gilpress/2012/09/27/data-scientists-the-definition-of-sexy/#526d00375187.
Accessed 27 Sept 2012
Temple University (2015) General education program. http://gened.temple.edu. Accessed 29 Feb
2016
Topi H, Valacich J, Wright RT, Kaiser K, Nunamaker JF, Sipior JC, Jan de Vreede G (2010)
IS 2010: curriculum guidelines for undergraduate degree programs in information systems.
Commun Assoc Inf Syst 26:18