
Chapter 20

Data Science for All: A University-Wide Course in Data Literacy

David Schuff

Abstract  Infusing data literacy into a curriculum is an unrealized opportunity for
higher education to truly make an impact on the current generation as they prepare
to move into the workforce. This chapter describes the design and structure of a new,
unique undergraduate elective course introduced into the curriculum of a large, public
University in the Northeastern United States. The course is designed
to inspire an “evidence-based” mindset, encouraging students to identify and use
data relevant to them in their field of study and the larger world around them. The
chapter includes the course goals mapped to specific learning objectives, examples
of exercises and assignments, a reading list, and a course syllabus. Instructors and
institutions interested in bringing data science concepts to a broad audience can use
this course as a foundation to build their own curriculum in this area.

Keywords  Data science • Data literacy • Curriculum design • Pedagogy

20.1  Introduction

Increasing attention has been paid to the demand for data scientists. In fact,
Davenport and Patil (2012) declared the data scientist as the “sexiest job of the 21st
century.” What exactly is meant by the term “data scientist,” however, is unclear.
We often think of a data scientist as a highly quantitative, technically-trained
professional with advanced knowledge of statistics and big data infrastructure
technologies.
However, Davenport and Patil (2012) define a data scientist as “a high-ranking
professional with the training and curiosity to make discoveries in the world of big
data.” Press (2012) defines the data scientist as “an engineer who employs the
scientific method and applies data-discovery tools to find new insights in data.”

D. Schuff (*)
Department of Management Information Systems, Fox School of Business,
Temple University, 210 Speakman Hall, 1810 North 13th Street, Philadelphia,
PA 19122-6083, USA
e-mail: schuff@temple.edu

© Springer International Publishing AG 2018
A.V. Deokar et al. (eds.), Analytics and Data Science, Annals of Information
Systems 21, DOI 10.1007/978-3-319-58097-5_20

These definitions are broad and do not necessarily imply a data scientist is a statisti-
cian, a computer scientist, or even a business analyst. Davenport and Patil’s defini-
tion specifically mentions “big data,” while Press’ definition does not.
What these definitions have in common is that they underscore the importance of
data literacy (as opposed to statistical and technological proficiency) as a skill for
discovery. Infusing data literacy into a curriculum is an unrealized opportunity for
higher education to truly make an impact on the current generation as they prepare
to move into the workforce. Universities are squarely focused on providing specialized
Master’s degrees in Business Analytics; there are now 117 such programs according
to the website “Master’s In Data Science” (www.mastersindatascience.org).
However, a data-literate undergraduate population, through its sheer numbers,
has a far greater potential impact on the way organizations operate.
This chapter describes the design and structure of a new, unique undergraduate
elective course introduced last year into the curriculum of Temple University, a
large, public University in the Northeastern United States. In its first year it has gone
from a pilot to a regular, multi-section offering in the University’s “General
Education” curriculum by emphasizing practical data literacy through current
events, readily available analysis tools and the methods of scientific inquiry.

20.2  The Environment

Temple University is a large, public, urban institution with over 37,000 students. Its
primary mission is to educate the regional undergraduate population through 140
bachelor degree programs (the University also has 126 master’s degree and 57 doc-
toral programs). There are 17 schools and colleges including liberal arts, business,
education, law, media and communication, music and dance, and engineering.
As at many large universities, there is an institution-wide core curriculum that
covers several broad categories. To fulfill the “General Education” or “GenEd”
requirements, students must select from a menu of courses in each category, which
includes analytical reading and writing, humanities, quantitative literacy, arts,
human behavior, race and diversity, science and technology, US society, and world
society.
One of the stated goals of the University’s GenEd program is that, in an environ-
ment where “the amount of information is available … and the speed with which we
can access information … continues to expand,” the University must teach students
“how information is linked and how pieces of information are interrelated” (Temple
University 2015). This is certainly a reasonable and important goal for undergradu-
ate education regardless of a student’s major field of study, with obvious ties to
concepts of data science and data literacy.
Further, Information Systems is a field that is well-positioned to deliver this
material to a broad audience. Key aspects of the IS2010 Model Curriculum include
“understanding and addressing information requirements” and “exploiting
opportunities created by technology innovations” (Topi et al. 2010). Most importantly,
Information Systems is one of the few fields with the orientation and skill set
to teach data literacy to a non-technical audience. Its emphasis on training business
professionals creates an applied focus on the identification and collection of data for
problem-solving and use of practical analysis tools.
With this in mind, we proposed, developed, and executed a new course for the
University’s GenEd curriculum that would employ this dual focus on data and tech-
nology, targeted at a non-technical audience. The design of the course set out to
inspire an “evidence-based” mindset, encouraging students to identify and use data
relevant to them in their field of study and the larger world around them.

20.3  Course Goals

The course was designed to address several of the University GenEd program’s
broad learning goals (Temple University 2015):
1. Information literacy, including the ability to recognize and articulate informa-
tion needs; to locate, critically evaluate, and organize information for a specific
purpose; and to recognize and reflect on the ethical use of information.
2. Development of critical thinking skills, including the evaluation of evidence,
analysis and synthesis of multiple sources, and reflection on varied
perspectives.
3. Communications skills, using spoken and written language to construct a mes-
sage that demonstrates the communicator has established clear goals and has
considered her or his audience.
4. Retrieve, organize, and analyze data associated with a scientific model.
5. Understand and communicate how technology encourages the process of
discovery.
6. Recognize, use, and appreciate scientific or technological thinking for solving
problems that are part of everyday life.
From these broad goals, we developed ten specific learning goals for the course
that could be evaluated through assignments and exams. Because of the course’s
dual focus—literacy and skill-building—the course learning goals span Krathwohl’s
(2002) knowledge dimension, with goals focused on factual (e.g., knowing data sci-
ence terminology), conceptual (e.g., applying data visualization principles to assess
the effectiveness of a graphic), and procedural knowledge (e.g., how to clean a data
set). The learning goals also span the entire range of Krathwohl’s cognitive process
dimension, requiring students to remember, understand, apply, analyze, evaluate,
and create. This is in line with the purpose of the course, which is to impart terminology,
teach basic skills, and have students apply those skills to produce original
knowledge. Table 20.1 lists each learning goal, along with where the specific goal
lies on each dimension. These learning goals can involve several components and
therefore may span multiple levels in a dimension.

Table 20.1  Course learning goals mapped to Krathwohl’s (2002) taxonomy

Learning goal | Knowledge dimension | Cognitive process dimension
1. Describe how advances in technology enable the field of data science. This includes topics such as the storage and retrieval of data, the difference between a relational database and a “flat-file” spreadsheet, and advances due to big data technologies. | Factual | Remember
2. Locate sources of data relevant to their field of study. A key goal of the course is to see the relevance of data and take a more data-driven approach within their own major. Students should be able to identify data sets relevant to a specific problem. When those data sets do not exist, they should be able to create them. | Procedural | Apply, analyze
3. Identify and correct problems with data sets to facilitate analysis. This includes both identifying “bad data” and determining appropriate remedies for correcting it, such as deleting bad data, replacing erroneous numeric values with the mean, and reconciling errors in text labels. | Procedural | Analyze, evaluate
4. Combine data sets from different sources. Students must be able to reconcile differences in numeric and textual representations across data sets so that they can be analyzed as a single data source. | Procedural | Evaluate, create
5. Assess the quality of a data source. Students assess the trustworthiness of a data set by judging its source and its content. This is a significant issue with freely available “open data.” | Conceptual, procedural | Evaluate
6. Convey meaningful insights from a data analysis through visualizations. This includes learning the basic principles of effective visualizations in order to critique existing graphics and make effective choices when creating original ones. | Conceptual, procedural | Understand, apply, create
7. Analyze a data set using pivot tables. Students learn how to use pivot tables to aggregate and summarize data to reveal insights. | Conceptual, procedural | Apply, analyze
8. Determine meaning in textual data using text mining. Students learn the mechanics of sentiment analysis to understand its proper use and its limitations as a decision tool. They also use a simple sentiment analysis tool to analyze a body of text. | Conceptual, procedural | Apply, evaluate
9. Identify when advanced analytics techniques are appropriate. This includes differentiating between descriptive, prescriptive, and predictive analytics and what types of problems each can address. | Factual, conceptual | Remember, understand
10. Predict events that will occur together using association mining. Students perform simple association mining on a common problem (market basket analysis) to take analytics from the theoretical to the practical in a way that can be replicated using their own data. | Procedural | Apply, analyze

Table 20.2  Learning goal mapping by module

Learning goal  Module 1  Module 2  Module 3  Module 4
Information literacy ✔ ✔
Critical thinking ✔ ✔ ✔
Communications skills ✔
Retrieve, organize, and analyze data ✔ ✔
How technology encourages discovery ✔ ✔
Technological thinking for everyday problems ✔ ✔ ✔ ✔

20.4  Course Structure

The course is divided into four multi-week modules:


• Module 1: Data in our Daily Lives
• Module 2: Telling Stories with Data
• Module 3: Working with Data in the Real World
• Module 4: Analyzing Data
Each class session has a discussion component (approximately one-third of the
class time), supplemented with experiential learning both in and outside of class.
In-class experiential activities (about two-thirds of class time) are ungraded and
directly tied to the current discussion topic. Homework assignments build on con-
cepts introduced in the lecture and in-class activities. They employ more in-depth
research and hands-on, computer-based activities using software tools such as
Tableau Desktop and Microsoft Excel. A final group project requires students to
select an issue relevant to them, source a data set, perform an analysis, and com-
municate the insight from their analysis to a general audience.
We designed the course for medium-to-large class sizes (60 students and over)
with multiple sections and instructors. The in-class activities encourage teamwork
through small breakout groups of 4–5 students. The four modules of the course take
students through the process of identifying and collecting, communicating, prepar-
ing, and analyzing data. The modules are deliberately ordered to first give students
foundational skills they will use later in the course (identifying data and communi-
cating results). Also, the course introduces students to concepts of data preparation
before they do basic predictive analysis.
Table 20.2 summarizes how the six learning goals—information literacy, critical
thinking, communications skills, the ability to retrieve, organize, and analyze data,
understanding how technology encourages the process of discovery, and the use of
technological thinking in solving problems that are part of everyday life—are cov-
ered in the course. The rest of this section explains how each module addresses
these learning goals, including a specific example of an in-class activity that supports
the course content. The Appendix contains an abridged version of the course syllabus,
including session-by-session topics, brief assignment descriptions, and a
reading list.

20.4.1  Overview of Module 1: Data in Our Daily Lives

This module builds skills in support of the learning goals of “information literacy,”
“critical thinking,” “how technology encourages discovery,” and “technological
thinking for everyday problems.” The basics of scientific inquiry are discussed in this
module, including the notion of theory and hypothesis formation. Students also
learn to identify sources of relevant data. They learn the role of data across
many disciplines, with concrete examples from current events. For example, this
module discusses the National Security Agency’s collection and use of telephony
metadata. This module also covers how citizens and organizations can use
government-published “open data” to understand the world around them.

Sample In-Class Exercise: Identifying Sources of Data


Objective: Identify uses for data from open data sites. (Open data is the blan-
ket term for publicly available data sets that can be used freely and without
restriction. Usually this data comes from government sources.)
Learning Outcomes:
• Navigate metadata repositories and explore the data sets
• Understand the types of data available through open data sources
• Formulate possible uses for new data sets
Step 1: Explore—individual (10 min)
1. Visit these open data sites: http://www.data.gov and
http://www.opendataphilly.org
2. Browse each site to get a feel for what kind of data is available from each
one. Navigating these sites can be a little cryptic; look for “Data” tabs and
“View the Dataset” and “Download” buttons.
3. Identify two data sets from each site that are interesting to you. Make sure
you write down the name of the data set so you can find it again!
Step 2: Prepare—group (15 min)
1. In groups of three or four, compare and discuss your lists.
2. Select two data sets from each site (Data.gov and OpenDataPhilly) that
you want to share with the class.
3. Identify how that data could be used. Be creative! For example, imagine a
new website or mobile app that would use the data.
4. Designate a member of your group to be the spokesperson.
Step 3: Class Discussion (20 min)
Each group will briefly report out with their best ideas.

20.4.2  Overview of Module 2: Telling Stories with Data

The module builds skills in support of the learning goals of “communications
skills,” “critical thinking,” “retrieve, organize, and analyze data,” and “technological
thinking for everyday problems.” This course has a unique take on the communica-
tion goal—beyond just oral and written skills, today’s students should be able to
communicate visually. They must be able to create clear, informative graphics to
effectively communicate insights in a data set. Students learn key principles that
differentiate good data visualizations from bad ones using guidelines from experts
such as Edward Tufte. Students apply these principles by evaluating a series of
examples pulled from current news sources. For example, Fig. 20.1 shows an
example from Fox News that illustrates how differences between data points can be
misrepresented by truncating the y-axis.
The sample exercise below requires students to locate examples of good and bad
data visualizations and evaluate them using criteria based on a set of guidelines. The
students then post their findings to the course site and present their examples to the
rest of the class. This creates a library of positive and negative examples that they
can refer to later.
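
One way to put a number on the distortion caused by a truncated axis is Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below is only an illustration under stated assumptions: the function name `lie_factor` and the bar values are hypothetical and are not the data behind Fig. 20.1.

```python
def lie_factor(value_a, value_b, axis_start):
    """Tufte's lie factor for two bars drawn from a baseline at axis_start.

    Effect in the data:    relative change between the two values.
    Effect in the graphic: relative change between the two drawn bar heights.
    """
    data_effect = (value_b - value_a) / value_a
    height_a = value_a - axis_start
    height_b = value_b - axis_start
    graphic_effect = (height_b - height_a) / height_a
    return graphic_effect / data_effect


# Hypothetical values chosen for illustration only.
honest = lie_factor(40.0, 50.0, axis_start=0.0)
truncated = lie_factor(40.0, 50.0, axis_start=35.0)
print(f"baseline 0:  lie factor = {honest:.1f}")
print(f"baseline 35: lie factor = {truncated:.1f}")
```

A lie factor close to 1 means the graphic shows the change at its true proportion. With the baseline moved up to 35, the same 25% increase is drawn as a bar three times as tall, so the graphic overstates the effect several times over.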

Sample In-Class Exercise: Finding Good and Bad Data Visualizations


Objective: Understand the difference between effective and ineffective data
visualizations.
Learning Outcomes:
• Identify the message a graphic is trying to convey
• Evaluate how successful the graphic is at conveying that message
• Explain why, according to the principles discussed in class, the graphic is
(or is not) effective
Step 1: Explore—individual (15 min)
1. Visit one or more of these data visualization/infographic sites:
http://www.flowingdata.com, http://www.coolinfographics.com, or
http://dailyinfographic.com
2. Identify one example of a graphic that you think is very good, and one
example of a graphic that you think isn’t that great and could be improved.
So you don’t lose track of your graphics, copy and paste the URL where
you found the graphic from your browser into a Word document. You can
also use the document to make some notes.
3. Using the principles we discussed in class:
(a) Give reasons why each graphic you’ve chosen is good or bad.
(b) Make recommendations for improvement.

Step 2: Prepare—group (15 min)


1. In groups of 3–5, compare and discuss your notes.
2. From your lists, choose one example of a good graphic and one example
of a bad graphic.
3. For each group, post a comment to the entry on the course web site for this
in-class exercise. Make sure the name of at least one member of your
group is included in the post.
Step 3: Class Discussion (15 min)
Each group will report out on their choices. Make sure you enter the URLs
correctly; I’ll display your graphics as you’re talking about them!

Fig. 20.1  Visualization example inappropriately truncating the y-axis

20.4.3  Overview of Module 3: Working with Data in the Real World

This module builds skills in support of the “information literacy” GenEd learning
goal, and the Science & Technology area goals to “retrieve, organize, and analyze
data” and “technological thinking to solve everyday problems.” Students learn how
to fix problems in data sets. This builds on the course’s first module, where they learn
how to identify data quality issues. Students learn how to address these problems
through data cleansing and transformation to create a usable, reliable data set
using Microsoft Excel. They will resolve inconsistencies within and across data
sets, and determine when data is in the wrong form. The exercise below introduces
students to the concept of a Key Performance Indicator. The exercise is intended to
apply the SMART criteria (Specific, Measurable, Achievable, Relevant, and Time
Phased) to evaluate candidate KPIs for a scenario. Another exercise requires the
students to create several KPI scorecards using a more business-oriented scenario:
on-time flight data for a set of airports.
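
The course performs data cleansing in Microsoft Excel. As a rough illustration of the same ideas in code, the sketch below applies two of the remedies named in the learning goals: replacing erroneous numeric values with the mean and reconciling inconsistent text labels. Python is a stand-in for Excel here, and the rows, column names, and validity rule are all hypothetical.

```python
from statistics import mean

# Hypothetical order rows; the column names and values are made up for illustration.
rows = [
    {"region": "East",  "amount": 120.0},
    {"region": "east ", "amount": 80.0},
    {"region": "WEST",  "amount": -5.0},   # negative amount: treated as an error
    {"region": "West",  "amount": None},   # missing amount
    {"region": "West",  "amount": 100.0},
]


def is_valid(amount):
    """Assumed rule: an amount is valid if it is present and not negative."""
    return amount is not None and amount >= 0


def clean(rows):
    # Mean of the valid amounts only, so the bad values do not distort it.
    fill = mean(r["amount"] for r in rows if is_valid(r["amount"]))
    cleaned = []
    for r in rows:
        cleaned.append({
            # Reconcile label variants such as "east " and "WEST".
            "region": r["region"].strip().title(),
            # Replace erroneous or missing values with the mean.
            "amount": r["amount"] if is_valid(r["amount"]) else fill,
        })
    return cleaned


cleaned = clean(rows)
for r in cleaned:
    print(r)
```

Whether to replace a bad value with the mean or to delete the row is a judgment call that depends on the data set, and this sketch takes only one of those routes.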

Sample In-Class Exercise: Identifying Key Performance Indicators


Objective: Select Key Performance Indicators (KPIs) that facilitate evalua-
tion for a given scenario.
Learning Outcomes:
• Identify “good” KPIs that adhere to the SMART criteria
• Select the best KPIs from a list of potential metrics
• Describe the limitations of using KPIs to make an evaluation
Scenario:
Working in groups is something students regularly do in their classes.
However, everybody has worked in a group where the quality of individual
contributions varies significantly among group members.
Your task is to come up with a set of KPIs to evaluate a group member.
This can be used as a tool for groups to evaluate and give feedback to each
other during group projects.
Step 1: Identify Key Performance Indicators (Individual, 5 min)
Working individually, come up with five KPIs that can be used to measure
how a student is doing as a contributor to a group project. This should be done
from the perspective of another student, not an instructor; however, an instructor
might be interested in your list. Make sure each KPI adheres to the
SMART criteria discussed in class: Specific, Measurable, Achievable,
Relevant, and Time Phased.
For example, “Student contributes quality work” is not a good KPI:
• It isn’t specific: What does “quality” mean? What does “contribute”
mean?
• It isn’t measurable: It is unclear how quality would be measured.
• It isn’t time phased: A time period is never specified; is this weekly contributions,
or over the course of the entire project?
However, “Number of ideas contributed each week” would be a much bet-
ter KPI. It is certainly more specific and more measurable!
Step 2: Determine the Best KPIs (Group, 10 min)
In groups of 3–4, compare your lists. Choose the three best KPIs, taking
ideas from your individual lists. Remember that you should choose items that
best adhere to the SMART criteria, but you should also think about the items
that are most important in determining who is a good member of a project
group.

Step 3: Group Discussion (15 min)


Share your KPIs with the rest of the class. They should all meet the SMART
criteria, so beyond that, explain why the ones you’ve chosen are the best to use
for evaluation. Why are they the most helpful in differentiating good group
members from poor ones?

20.4.4  Overview of Module 4: Analyzing Data

This module builds skills in support of the “critical thinking” GenEd learning goal
and the Science & Technology area goals to “retrieve, organize, and analyze data,”
“how technology encourages discovery,” and “technological thinking for everyday
problems.” Students learn how data is stored and organized. Specifically, they
will learn the differences between spreadsheets and databases and why each is used.
They will also learn three analytics techniques to give them a sense of what can be
done with data analytics, and also to give them hands-on experience with analytics
tools such as Microsoft Excel and Tableau Desktop. For example, students learn
how to use Pivot Tables to summarize large data sets (such as the crime activity
assignment described below), and Association Analysis to discover which products
are likely to be bought together at a store (such as graham crackers and marshmal-
lows). Students also learn to interpret the output from these analyses and make
inferences about underlying patterns in the data.
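
The two techniques named above can be sketched in a few lines of code. The course itself uses Microsoft Excel and Tableau Desktop, so Python is only a stand-in, and the incident records, shopping baskets, and function names below are hypothetical. The first part counts records by category the way a pivot table of counts would. The second computes the standard support, confidence, and lift measures for the rule "graham crackers -> marshmallows".

```python
from collections import defaultdict

# Hypothetical incident records, standing in for a crime activity data set.
incidents = [
    {"district": "North", "offense": "theft"},
    {"district": "North", "offense": "theft"},
    {"district": "North", "offense": "vandalism"},
    {"district": "South", "offense": "theft"},
    {"district": "South", "offense": "vandalism"},
    {"district": "South", "offense": "vandalism"},
]


def pivot_count(records, row_key, col_key):
    """Count records for each (row, column) pair, like a pivot table of counts."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += 1
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical shopping baskets for the market basket example.
baskets = [
    {"graham crackers", "marshmallows", "chocolate"},
    {"graham crackers", "marshmallows"},
    {"marshmallows", "milk"},
    {"graham crackers", "milk"},
    {"milk", "bread"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift


pivot = pivot_count(incidents, "district", "offense")
print(pivot)

sup, conf, lift = rule_metrics({"graham crackers"}, {"marshmallows"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent. With only five made-up baskets the numbers carry no real meaning; the point is how the measures are computed.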

20.5  Final Project

The final project is a group project that requires students to bring together what
they’ve learned throughout the course. Student teams source an “original” data set
(i.e., not one already used in the course), develop a research question, and then
answer that question by using one or more of the data analysis techniques and tools
covered in the course. Students are encouraged to find a data set and question that are
relevant and interesting to them.
The deliverable is a five-minute presentation with two minutes for questions
from the instructor and the class. The short presentation is a deliberate choice
because (1) it forces students to hone their presentation skills by being direct and to
the point and (2) it allows the course to scale.
The suggested format for the final presentation is:
• Slide 1 should list the group members and the title of the presentation.
• Slide 2 will describe the scenario. What question will be answered and why is it
important?

In-Class Exercise: Manually Determining the Sentiment of Textual Data


Objective: Differentiate between positive and negative sentiment in text.
Learning Outcomes:
• Perform a manual sentiment analysis of a Twitter stream
• Develop rules for classifying a message as positive or negative
• Explain the problems and issues with accurately describing sentiment
within text
Part 1: Group (15 min)
1. In groups of two, visit two Twitter feeds for well-known brands. You can
choose any brand you want for this, but if you need some ideas:

CocaCola      @CocaCola   http://twitter.com/cocacola
McDonalds     @McDonalds  http://twitter.com/mcdonalds
Honda Motors  @Honda      http://twitter.com/honda
Starbucks     @Starbucks  http://twitter.com/starbucks
Nike          @Nike       http://twitter.com/nike
H&M           @hm         http://twitter.com/hm

2. Within each feed, click on a few tweets and read the replies.
3. Find three examples of positive tweets, three examples of negative tweets,
and three examples of neutral tweets (neither positive nor negative). Write
them down in three lists.
4. Make a note of why you classified them as positive, negative, or neutral.
Part 2: Larger Group (5 min)
1. Find another group to form a group of four.
2. Share your lists of positive and negative tweets. See if you agree with each
other’s choices.
3. Come up with rules for determining whether a tweet is positive or nega-
tive. For example:
(a) Are there certain words which increase your certainty of how to clas-
sify the tweet?
(b) Are there certain tweets that sound positive but really are negative?
(c) How do you detect sarcasm?
(d) How would you explain to someone how to classify tweets?
Part 3: Class Discussion (20 min)
We’ll compare notes. Specifically, we will discuss:
• What are some rules for determining positive versus negative sentiment?
• Were some tweets difficult to categorize? Why?
• In what ways would this be a good method of understanding how people
felt about your brand? In what ways could it give you bad information?
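
The kind of rules students come up with in Part 2 can be written down as a very small word-counting classifier. The sketch below is one such set of rules, not the sentiment analysis tool used in the course. The word lists, the function name `classify`, and the example messages are all made up for illustration.

```python
import re

# Tiny hand-made word lists; a real tool would use a much larger lexicon.
POSITIVE = {"love", "great", "best", "thanks", "awesome"}
NEGATIVE = {"hate", "worst", "terrible", "cold", "rude"}


def classify(text):
    """Label text by counting positive and negative words (an assumed rule)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up messages, not real tweets.
examples = [
    "Love the new menu, thanks!",
    "Worst service ever, the staff was rude.",
    "Is the downtown store open on Sunday?",
    "Great, my order is late again.",  # sarcasm: a human reads this as negative
]
for text in examples:
    print(f"{classify(text):8}  {text}")
```

The last message is labeled positive because it contains "great", even though a human reader would call it a complaint. That is the sarcasm problem raised in the discussion questions, and it shows why word counting alone can give bad information about a brand.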

• Slide 3 will describe the data. What are the key elements and how was it obtained?
• Slides 4 and 5 will describe the analysis and the results, making good use of data
visualizations.
• Slide 6 will summarize the conclusions. What was learned? Students should sup-
port their conclusions using the results of the analysis, citing specific evidence.
• Slide 7 will list the references.
Some examples of final projects from past classes include:
• Exploring the question of whether the best soccer players are the highest paid.
• An analysis of the national origin of members of terrorist organizations.
• Investigating what time of day students are most likely to answer an online
survey.
• Correlations between stock price and social media sentiment for a major aero-
space firm.

20.6  Conclusions

Data analytics is no longer the purview of data scientists alone. It is a fundamental skill for
the twenty-first century workforce. This is both a challenge and an opportunity for
higher education, one that the Information Systems discipline is uniquely positioned
to meet. With courses such as the one described here, Information Systems can
increase its reach beyond the business school and provide valuable, marketable
skills to a broad audience across the University.
The key to seizing this opportunity is to recognize that “data literacy” is the true
core skill for undergraduate students, not sophisticated analytics techniques. We
must instill in our students an appreciation of evidence-based decision-making
through an appreciation of what data can do and how even simple analysis can yield
sophisticated insights.

Biography 

David Schuff is Professor of Management Information Systems in the Fox School
of Business at Temple University. David holds a B.A. in Economics from the
University of Pittsburgh, an M.B.A. from Villanova University, an M.S. in
Information Management from Arizona State University, and a Ph.D. in Business
Administration from Arizona State University. His research interests include the
application of information visualization to decision support systems, data ware-
housing, and the impact of user-generated content on organizations and society.
David’s work has appeared in numerous journals, including Management
Information Systems Quarterly, Decision Sciences, Decision Support Systems,
Information & Management, Communications of the ACM, IEEE Computer, AIS
Transactions on Human-Computer Interaction, and Information Systems Journal.

Appendix: Abbreviated Course Syllabus for Data Science

Course Description

We are all drowning in data, and so is your future employer. Data pours in from
sources as diverse as social media, customer loyalty programs, weather stations,
smartphones, and credit card purchases. How can you make sense of it all? Those
who can turn raw data into insight will be tomorrow’s decision-makers; those who
can solve problems and communicate using data will be tomorrow’s leaders. This
course will teach you how to harness the power of data by mastering the ways it is
stored, organized, and analyzed to enable better decisions. You will get hands-on
experience by solving problems using a variety of powerful, computer-based data
tools virtually every organization uses. You will also learn to make more impactful
and persuasive presentations by learning the key principles of presenting data
visually.

Course Objectives

• Describe how advances in technology enable the field of data science
• Locate sources of data relevant to their field of study
• Identify and correct problems with data sets to facilitate analysis
• Combine data sets from different sources
• Assess the quality of a data source
• Convey meaningful insights from a data analysis through visualizations
• Analyze a data set using pivot tables
• Determine meaning in textual data using text mining
• Identify when advanced analytics techniques are appropriate
• Predict events that will occur together using association mining

Assignments

# Assignment description
1 Create a data analysis plan (individual)
Develop a plan for data analysis by forming hypothesis and finding data sets that will allow
you to test those hypotheses. The scenario: Once students graduate, it’s time for them to go
get a job. But is staying in the area the best choice? Evaluate our city as a place to live,
work, and play compared to the rest of the United States
2 Analyze a data set using tableau (individual)
Use Tableau to analyze and reveal various relationships within a data set. Use the data set
from the Environmental Protection Agency regarding fuel economy 2015 model year cars.
Answer a series of questions by creating the most visually effective charts and graphs using
the guidelines discussed in class
3 Cleaning a data set (individual)
Correct the errors in a data set for the fictitious company “Vandelay Industries.” The sales
group is suspicious that there might be errors in the data for January. Work with a new data
set of 3296 orders with 5192 line items from January 2014
4 Group data analysis (group term project)
In groups, perform an original analysis on a data set of your choosing. The data set can
come from any source as long as it is something you have not already worked on for this
course. Possible sources of data include: open data from Data.gov, data sets from the Pew
Research Center, sports statistics, a data set from your current employer, or an original
survey conducted by your group
Your analysis should clearly demonstrate the tools and techniques you’ve been exposed to in
this course. This can take any form you’d like (e.g., comparison of averages across
categories, mapping geographic data, sentiment analysis, developing and visualizing KPIs)
Your group will present your work in class through a five-minute presentation, with two
minutes for questions

Schedule and Reading List (Current Configuration Is for Two 80-min Sessions per Week)

Week/session  Topic/key questions  Readings
Module 1: Data in our daily lives
1.1 Introduction
 • Course introduction/syllabus
 • What is the difference between data, information, and knowledge?
 • What makes “big data” big?
1.2 Science and data science
 • What is data science?
 • What is the difference between a theory and a hypothesis?
 • What are the dangers of data analysis without a hypothesis?
Readings:
 Dhar, V. (2013). Data Science and Prediction. Communications of the ACM. Vol. 56, No. 12. pp. 64–73
 Allain, R. (2013). Three Science Words We Should Stop Using. Wired.com. March 27
2.1 A brief introduction to data
 • What are the forms data can take?
 • Where does data come from?
 • What is metadata? A data dictionary?
Readings:
 Stein, G. (2013). I’m Beating the NSA to the Punch by Spying on Myself. Fastcolabs.com. June 12
 Di Justo, P. (2013). What the N.S.A. Wants to Know about Your Phone Calls. The New Yorker. June 7
2.2 Identifying Sources of Data
 • What kinds of data are available in different disciplines (arts, sciences, medicine, business, government, etc.)?
 • What kinds of problems and issues can data insight address?
Readings:
 Silver, N. (2014). What the Fox Knows. FiveThirtyEight.com. March 17
 Open Data. Wikipedia
 Silver, N. (2014). In Search of America’s Best Burrito. FiveThirtyEight.com. June 5
3.1 Learning to (Mis)trust Data
 • How do you spot reliable sources of data?
 • How do you assess data quality?
 • What is the “Filter Bubble?”
Readings:
 Weisberg, J. (2011). Bubble Trouble: Is Web Personalization Turning Us Into Solipsistic Twits? Slate.com
 Crawford, K. (2013). The Hidden Biases in Big Data. Harvard Business Review Blog Network. April 1
 Hayes, B. (2013). In Data We Trust. Business Over Broadway. November 4
3.2 Guest speaker
Module 2: Telling stories with data
4.1 Viewing data
 • What are different ways of viewing data?
 • When do you need to visualize data?
 • What are the basic techniques of data visualization?
Readings:
 Unwin, A. (2008). Chapter II.2: Good Graphics? Handbook of Data Visualization. Chen, Hardle, and Unwin (Eds.). pp. 57–78
4.2 Introduction to Tableau
 • What is Tableau? What can you do with it?
 • How is it different from Microsoft Excel?
Readings:
 Hoven, N. (n.d.). Stephen Few on Data Visualization: 8 Core Principles. Tableau Software
 Acohido, B. (2013). Watch Out, Terrorists: Big Data is on the Case. USAToday.com. July 29
5.1 Communicating using data
 • What are the principles of communicating data?
 • How do you communicate complex ideas using data?
 • How do you construct visualizations that complement a report? That stand on their own?
Readings:
 Davenport, T. (2013). Telling a Story with Data. Deloitte University Press
 Matlin, C. (2014). Visualizing a Day in the Life of a New York City Cab. FiveThirtyEight.com. July 17
5.2 Storytelling with infographics
 • How are infographics different from other types of visualizations?
 • How do infographic tools differ from other data tools we’ve used so far?
Readings:
 Krum, R. (2014). Cool infographics: Effective Communication with Data Visualization. (Chapter 1: The Science of Infographics)
 Krum, R. (2014). Cool infographics: Effective Communication with Data Visualization. (Chapter 6: Designing Infographics)
6.1/6.2 Exam review/EXAM 1
Module 3: Working with data in the real world
7.1 Dirty Data
 • How does data get dirty?
 • What are the consequences (e.g., ethical, financial) of dirty data?
 • How do you clean it?
Readings:
 Redman, T. (2013). Data’s Credibility Problem. Harvard Business Review. Vol. 91, No. 12. pp. 84–88
 Gandel, S. (2013). Damn Excel! How the ‘Most Important Software Application of All Time’ Is Ruining the World. Fortune.com. April 17
7.2 Data cleansing
 • How do you identify data problems?
 • How do you correct data problems?
 • When is fixing the data not worth it?
Readings:
 Taber, D. (2010). Stupid Data Corruption Tricks: Take our CRM Quiz. CIO.com. November 2
 Top Ten Ways to Clean Your Data. Microsoft
8.1 Choosing relevant data
 • How do you identify Key Performance Indicators (KPIs)?
 • How do you identify the right measure for the selected problem?
Readings:
 Performance Indicator. Wikipedia
 Schambra, W. (2013). The Tyranny of Success: Nonprofits and Metrics. NonprofitQuarterly.com. December 30
8.2 Evaluating key performance indicators
 • How do you categorize and visualize KPIs according to a threshold?
 • How do you use Tableau to evaluate KPIs? How would you use Excel?
Readings:
 Olson, P. (2014). Wearable Tech is Plugging into Health Insurance. Forbes.com. June 19
 Bialik, C. (2014). Tracking Health One Step (and Clap, and Wave, and Fist Pump) at a Time. FiveThirtyEight.com. March 17
9.1 Connecting diverse data
 • How do you identify data sets that can be combined?
 • How do you combine data sets?
 • How do you resolve conflicts?
Readings:
 Strickland, J. (n.d.). How Data Integration Works. howstuffworks.com
 Gallagher, S. (2014). The GOP Arms Itself for the Next “War” in the Analytics Arms Race. arstechnica.com. February 7
9.2 Creating interactive dashboards
 • How does a dashboard differ from an Infographic? A chart?
 • How do dashboards facilitate decision-making?
Readings:
 Best Practices for Designing Views and Dashboards. Tableau Software
 Farmer, D. (2014). The One Skill You Really Need for Data Analysis
10.1/10.2 Exam Review/EXAM 2
Module 4: Analyzing data
11.1 Storing and retrieving data
 • What is a database? How are spreadsheets just a type of database?
 • How are technology advances changing how we think about storing data?
 • What are the core technologies of big data analytics?
Readings:
 Rosenblum, M. and Dorsey, P. (n.d.). Knowing Just Enough about Relational Databases. Dummies.com
 Bertolucci, J. (2013). How to Explain Hadoop to Non-Geeks. InformationWeek.com. November 19
11.2 Using Tableau to aggregate data
 • What can you learn from aggregation?
 • How does thinking of data dimensionally help solve problems?
Readings:
 Acampora, J. (2013). How to Structure Source Data for Excel Pivot Tables & Unpivot. July 18
12.1 Beyond numbers
 • What is the difference between structured and unstructured data?
 • What can you learn from text data that you can’t from numeric data?
 • What are the tools for text analysis?
Readings:
 Hurwitz, J., Nugent, A., Halper, F., and Kaufman, M. (n.d.). Unstructured Data in a Big Data Environment. Dummies.com
 Feldman, R. (2013). Techniques and Applications for Sentiment Analysis. Communications of the ACM. Vol. 56, No. 4. pp. 82–89
12.2 Twitter sentiment analysis using Excel and Google Drive
 • What are the steps in performing a sentiment analysis?
 • What are the challenges in deriving meaningful information from text?
Readings:
 Wohlsen, M. (2014). Don’t Worry, Facebook Still Has No Clue How You Feel. Wired.com. July 2
13.1 Predicting the future
 • What is predictive analytics? What problems does it address?
 • What kinds of analysis can be done?
 • What kinds of data are needed for an analysis?
Readings:
 Paine, N. (2014). What Analytics Can Teach Us About the Beautiful Game. June 12
 Bertolucci, J. (2013). Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive. InformationWeek.com. December 31
13.2 Predictive analytics using Tableau
 • Perform a forecasting analysis
 • Perform a simple association analysis
Readings:
 Peck, D. (2013). They’re Watching You at Work. TheAtlantic.com. November 20
14.1/14.2 Group presentations/FINAL EXAM review

References

Davenport TH, Patil DJ (2012) Data scientist: the sexiest job of the 21st century. Harv Bus Rev
90(10):70–76
Krathwohl D (2002) A revision of Bloom’s taxonomy: an overview. Theory Pract 41(4):212–218
Press G (2012) Data scientists: the definition of sexy. Forbes.
http://www.forbes.com/sites/gilpress/2012/09/27/data-scientists-the-definition-of-sexy/#526d00375187.
Accessed 27 Sept 2012
Temple University (2015) General education program. http://gened.temple.edu. Accessed 29 Feb
2016
Topi H, Valacich J, Wright RT, Kaiser K, Nunamaker JF, Sipior JC, Jan de Vreede G (2010)
IS 2010: curriculum guidelines for undergraduate degree programs in information systems.
Commun Assoc Inf Syst 26:18
