Intro To Data Literacy Week 1


Week 1: Course Overview & What is Data Literacy?
CECS 1050: Introduction to Data Literacy
Teaching team

• Dr. Nguyen Tuan Binh, Assistant Professor, CECS, binh.nt2@vinuni.edu.vn (Instructor Week 1-7)
• Dr. Le Duy Dung, Assistant Professor, CECS, dung.ld@vinuni.edu.vn (Instructor Week 9-15)
• Nguyen Giang Son, MSc, Research Assistant, CECS, son.ng@vinuni.edu.vn (TA)

CECS1050 Introduction to Data Literacy 2


Course logistics
• When: Weekly class every Tuesday, 13:15-15:05
o Midterm: Week 8
o Final projects & presentations: Week 15
• Where: E403

Learning objectives
On successful completion of this course, you will be able to:
1. Read, understand, create, and communicate data as information.
2. Analyze and visualize data using tools like Excel and Python.
3. Understand the importance of data in business strategy and apply
data-driven insights for decision-making.
4. Develop skills to improve data collection designs and ensure data
quality.
5. Utilize statistical methods and probability to interpret, present and
effectively tell stories about data.

Course Evaluation
Class Participation/Attendance 10%
Quizzes (every 2 weeks) 10%
Assignments (every 2 weeks) 20%
Midterm Exam (Week 8) 25%
Final Project (Report + Presentation) 35%
Total 100%
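
As a rough illustration of how these weights combine, the sketch below computes a weighted course total. The weights come from the table above; the student scores and the names `WEIGHTS`, `course_total`, and `example_scores` are hypothetical and only for illustration.

```python
# Component weights taken from the evaluation table above (they sum to 100%).
WEIGHTS = {
    "participation": 0.10,
    "quizzes": 0.10,
    "assignments": 0.20,
    "midterm": 0.25,
    "final_project": 0.35,
}


def course_total(scores: dict) -> float:
    """Return the weighted course total (0-100) from component scores (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {
    "participation": 100,
    "quizzes": 80,
    "assignments": 90,
    "midterm": 70,
    "final_project": 85,
}

total = course_total(example_scores)
print(f"Course total: {total:.2f}%")
```

Because the final project carries 35% of the grade, a 10-point change there moves the total by 3.5 points, while the same change in quizzes moves it by only 1 point.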

Course Evaluation
Letter Grade Range
A 100% to 89.5%
A- <89.5% to 84.5%
B+ <84.5% to 79.5%
B <79.5% to 74.5%
B- <74.5% to 69.5%
C+ <69.5% to 64.5%
C <64.5% to 59.5%
C- <59.5% to 54.5%
D+ <54.5% to 49.5%
D <49.5% to 44.5%
F <40% to 0%

Poll: How Comfortable Are You with Handling
Data?
“How comfortable are you with interpreting data (like charts or
statistics)?”
• Very Comfortable?
• Somewhat?
• Only a little?
• Not at all?

What is Data?
• Data = "a collection of facts, numbers, words, observations or
other useful information" 1.

1 https://www.ibm.com/think/topics/data

What is Data? Examples

• A table of students' heights
• Picture of handwritten digits (MNIST)
• Audio of Vietnam's Declaration of Independence
• Text of natural language

What is Data? (cont.)
• When processed or interpreted, data becomes useful information.

Image source: https://www.jeffwinterinsights.com/insights/dikw-pyramid
Data in Everyday Life
• Question: Where do you see data?

• Take 5 minutes to list 2–3 ways you encounter data in your daily
life.

What is Data Literacy?
• “Data literacy is the ability to read, work with, analyze, and
communicate with data”
• In other words, being comfortable understanding data and using it
to inform decisions and communicate insights.
• Anyone can develop data literacy – it’s not just for “engineering
people.”

Core Components of Data Literacy
Key Skills of a Data-Literate Person:
• Reading Data: Understanding what data is telling us (e.g.
interpreting a chart or dataset)
• Working with Data: Collecting, producing, or organizing data (e.g.
running a survey or compiling a table)
• Analyzing/Reasoning with Data: Drawing conclusions, finding
patterns, and making decisions based on data
• Communicating Data: Presenting data insights clearly (e.g.
through a story, report, or visualization)

Why Data Literacy? Data Literacy in the Digital
Age
• We live in an era flooded with data:
• About 300-400 million terabytes of data are created every day (2023)
(imagine every person on earth writing 25,000 books, or taking 12,500 photos each day)
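
The comparison above can be sanity-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the world population (~8 billion) and the average file sizes (2 MB per book, 4 MB per photo) are assumptions chosen for illustration, not figures from the slide.

```python
# Assumptions (not from the slide): population and average file sizes.
DAILY_DATA_TB = 400e6     # upper end of the estimate above, in terabytes
WORLD_POPULATION = 8e9    # assumed: about 8 billion people
BOOK_SIZE_MB = 2.0        # assumed size of one book stored as text
PHOTO_SIZE_MB = 4.0       # assumed size of one phone photo

MB_PER_TB = 1_000_000     # decimal units: 1 TB = 1,000,000 MB

per_person_mb = DAILY_DATA_TB * MB_PER_TB / WORLD_POPULATION
books_per_person = per_person_mb / BOOK_SIZE_MB
photos_per_person = per_person_mb / PHOTO_SIZE_MB

print(f"Data per person per day: {per_person_mb / 1000:.0f} GB")
print(f"Equivalent books:  {books_per_person:,.0f}")
print(f"Equivalent photos: {photos_per_person:,.0f}")
```

Under these assumptions the daily volume works out to about 50 GB per person, which matches the 25,000 books or 12,500 photos quoted above. Different file-size assumptions would change the equivalents, which is itself a useful reminder to ask what sits behind a headline number.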

1 Statista: Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2023, with forecasts from 2024 to 2028
Why Data Literacy? Data Literacy in the Digital
Age (cont.)
• Data is highly valuable

→ It is essential to understand and make use of this valuable resource.

• Data literacy: The ability to understand and use data effectively


1 The world’s most valuable resource is no longer oil, but data
2 Meta Invests $14 Billion In Scale AI To Strengthen Model Training
3 Wavestone Data and AI executive leadership survey 2024
Why Data Literacy? – Data Everywhere,
Every Day
• Data is part of daily life:

Why Data Literacy? – In the Workplace
• Data skills are in demand:
• Modern jobs in every field expect you to be comfortable with data.
• Tableau study: “87% of employees rate basic data skills as very important
for their day-to-day operations”

Head of MKT & Comms @ VinUni JD

Have you seen a job description in your field with data skills requirements?

1 Tableau Data literacy explained
Why Data Literacy? – In the Workplace (cont.)
• Data skills are in demand:
• WEF: AI & big data is a core skill in 2025 and is expected to be more important by 2030.

1 WEF The Future of Jobs Report 2025
Why Data Literacy? – In the Workplace (cont.)
• Yet, only ~40% of employees report they’ve been properly trained
in data skills 1
• “companies lose an average of 43 hours per employee per year due to
data-induced procrastination” 2
• → Being data literate can open doors to leadership roles (e.g.,
“analytics translators” who bridge business and technical teams)

1 Tableau Data literacy explained
2 Accenture The Human Impact of Data Literacy
Why Data Literacy? – Better Decisions &
Critical Thinking
• Making Informed Decisions: Data literacy helps you avoid pitfalls
like being misled by false statistics or flashy graphs
• Ask critical questions about data: “Who collected this? Is the sample
biased? What does that percentage really mean?”
• Distinguish correlation vs causation: Just because two things occur
together doesn’t mean one causes the other (more on this next!).
• Trust but verify: Before believing a claim (“This new drug reduces risk by
50%”), you’ll look at the actual numbers and context.
• Adapt to new information: In the digital age, things change fast (trends,
virus outbreaks, market shifts). Data-literate people can interpret new
data and adjust decisions quickly.
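
To make the "reduces risk by 50%" point concrete, the sketch below compares relative and absolute risk reduction for two made-up scenarios. All numbers are hypothetical, and `risk_summary` is an illustrative helper, not a standard library function.

```python
def risk_summary(control_risk: float, treated_risk: float) -> dict:
    """Summarize how much a treatment lowers risk, in relative and absolute terms."""
    absolute_reduction = control_risk - treated_risk
    relative_reduction = absolute_reduction / control_risk
    return {
        "relative_reduction": relative_reduction,      # the headline "50%"
        "absolute_reduction": absolute_reduction,      # change in actual risk
        "number_needed_to_treat": 1 / absolute_reduction,
    }


# Hypothetical scenario 1: a common condition (risk falls from 20% to 10%).
common = risk_summary(control_risk=0.20, treated_risk=0.10)

# Hypothetical scenario 2: a rare condition (risk falls from 0.2% to 0.1%).
rare = risk_summary(control_risk=0.002, treated_risk=0.001)

for name, result in [("common", common), ("rare", rare)]:
    print(
        f"{name}: relative {result['relative_reduction']:.0%}, "
        f"absolute {result['absolute_reduction']:.2%}, "
        f"NNT {result['number_needed_to_treat']:.0f}"
    )
```

Both scenarios support the same "50% reduction" headline, yet in the rare case roughly a thousand people would need treatment to prevent one event. That is why the actual numbers and context matter.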

Key Data Terms & Concepts
• Data vs dataset
• Qualitative vs quantitative data
• Structured vs unstructured data
• Random variables, statistics, correlation, etc.

Data vs. Dataset
• Data = raw facts, figures, or information points we collect about
the world
• Example: one student’s height & weight; a response to one survey
question.
• Dataset: A structured collection of data, usually related to a
certain topic or study. Often presented in a table or spreadsheet
form (rows and columns). It’s multiple data points organized
together.
• Example: A spreadsheet of all students’
heights in this class is a dataset.
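
The distinction can be sketched in Python, one of the tools used in this course. The values and column names below are invented for illustration, and the CSV text stands in for a spreadsheet file.

```python
import csv
import io

# One data point: a single student's height and weight (hypothetical values).
data_point = {"student": "S01", "height_cm": 165, "weight_kg": 55}

# A dataset: many such data points organized as rows and columns.
CSV_TEXT = """student,height_cm,weight_kg
S01,165,55
S02,172,68
S03,158,50
"""

# csv.DictReader turns each row into a dictionary keyed by the column names.
dataset = list(csv.DictReader(io.StringIO(CSV_TEXT)))

print("Rows:", len(dataset))
print("Columns:", list(dataset[0].keys()))
```

Note that `csv` reads every value as text, so numeric columns need converting (for example with `int(...)`) before any calculation.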

Quantitative vs. Qualitative Data
• Quantitative Data: Data that is numerical – it can be measured
or counted in numbers.
• Examples: height in centimeters, test scores, income in dollars.
Quantitative = quantity.
• Qualitative Data: Data that is descriptive or categorical – not
numeric by nature. It describes qualities or categories.
• Examples: favorite color, types of cuisine, interview comments.
Qualitative = quality (characteristics).
• Sometimes qualitative data can be coded as numbers (e.g. 1 for M, 0 for
F), but the number itself isn’t a measurement, just a label.
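
A short sketch of why coded categories are labels rather than measurements: averaging heights is meaningful, while the "average" of category codes depends entirely on which codes were picked. The data and the alternative coding (7 and 2) are hypothetical.

```python
from collections import Counter
from statistics import mean

# Quantitative variable: arithmetic on the values is meaningful.
heights_cm = [165, 172, 158, 180]
print("Mean height:", mean(heights_cm))

# Qualitative variable coded as numbers (1 for M, 0 for F, as in the slide).
gender_codes = [1, 0, 0, 1]
label_for_code = {1: "M", 0: "F"}

# Counting categories is the sensible summary for a coded label.
counts = Counter(label_for_code[code] for code in gender_codes)
print("Counts:", dict(counts))

# Re-coding the same answers (7 for M, 2 for F) changes the "mean"
# but not the counts, which shows the codes are only labels.
recoded = [7 if code == 1 else 2 for code in gender_codes]
print("Mean of original codes:", mean(gender_codes))
print("Mean of recoded values:", mean(recoded))
```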

Structured vs. Unstructured Data
• Structured Data: Data that is organized in a predefined format, making it easy to search and
analyze. Usually found in tables, spreadsheets, databases (rows and columns).

• Unstructured Data: Data without a clear predefined format or model. It’s often textual or
media content. Often requires additional steps to organize & analyze.
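
The difference shows up as soon as you try to extract the same fact from both kinds of data. In this sketch (invented names and values), the structured version is read by column name, while the free-text version needs a hand-written pattern that only works for this particular phrasing.

```python
import csv
import io
import re

# Structured: a predefined format with named columns.
structured = "name,height_cm\nAn,165\nBinh,172\n"
rows = list(csv.DictReader(io.StringIO(structured)))
heights_from_table = [int(row["height_cm"]) for row in rows]

# Unstructured: the same facts buried in free text.
unstructured = "An said she is about 165 cm tall, while Binh thinks he is 172 cm."

# Extra step: a regular expression that looks for a number followed by "cm".
heights_from_text = [int(m) for m in re.findall(r"(\d+)\s*cm", unstructured)]

print(heights_from_table)
print(heights_from_text)
```

The pattern would miss "1.65 m" or "five foot five", which is the kind of extra organizing work unstructured data tends to require.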

• Question: Is most online data structured or unstructured?

Variable
• A characteristic or attribute that can vary among the subjects or
items you’re studying, and which is recorded as data. It is
essentially a column in a dataset.
• Each variable has a name and definition (what it measures).
• Examples: In a student dataset, variables include Age, Score, Grade, Gender, etc.
Age and Score are numerical variables; Grade and Gender are categorical
(qualitative) variables.
• Each individual (or case) has a value for each variable.
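
A minimal sketch of "variable = column": each row is one individual, and pulling out one key across all rows gives that variable's values. The student records and the helpers `column` and `variable_type` are hypothetical.

```python
# Each dictionary is one individual (case); each key is a variable.
students = [
    {"Age": 18, "Score": 8.5, "Grade": "A-", "Gender": "F"},
    {"Age": 19, "Score": 7.0, "Grade": "B", "Gender": "M"},
    {"Age": 18, "Score": 9.0, "Grade": "A", "Gender": "F"},
]


def column(dataset, variable):
    """Return all values of one variable, i.e. one column of the dataset."""
    return [row[variable] for row in dataset]


def variable_type(values):
    """Classify a column as numerical or categorical from its stored values."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_number else "categorical"


for name in students[0]:
    values = column(students, name)
    print(name, values, variable_type(values))
```

This simple type check would misclassify categories stored as number codes, so the variable's definition still has the final say.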

Statistic
• Statistic: In general conversation, “a statistic” often means a
numerical summary of a dataset. It’s a value that describes some
aspect of the data.
• Common statistics include:
• Mean (Average): e.g. average exam score of the class.
• Median: the middle value when data is sorted (e.g. median age of
students).
• Percentage/Proportion: e.g. 80% of students in this class are first-year
students.
• Max/Min: e.g. the highest score and lowest score on a quiz.
• Count: e.g. number of students who responded to a survey
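
Each of the statistics above can be computed with Python's built-in `statistics` module and a few built-in functions. The scores and first-year flags below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical exam scores for five students, and whether each is a first-year.
scores = [6.5, 7.0, 8.0, 9.5, 5.0]
is_first_year = [True, True, False, True, True]

summary = {
    "mean": mean(scores),                    # average score
    "median": median(scores),                # middle value once sorted
    "max": max(scores),                      # highest score
    "min": min(scores),                      # lowest score
    "count": len(scores),                    # number of students
    "first_year_share": sum(is_first_year) / len(is_first_year),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each statistic compresses the whole list into one number, so it is worth reporting more than one: a mean alone hides how spread out the scores are.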

Correlation
• Correlation: A relationship or association between two variables.
When one variable changes, the other tends to change in a
consistent way.

• Positive correlation: test score and wage
• Negative correlation: running time and body fat
• No correlation: coffee consumption and IQ level
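
One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below implements it by hand and applies it to three tiny invented datasets that mirror the cases above; none of the values are real measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data, for illustration only.
test_score = [50, 60, 70, 80, 90]
wage = [10, 12, 15, 17, 20]

running_hours = [0, 1, 2, 3, 4]
body_fat_pct = [30, 27, 25, 22, 20]

coffee_cups = [0, 1, 2, 3, 4]
iq = [100, 105, 95, 105, 100]

r_positive = pearson_r(test_score, wage)
r_negative = pearson_r(running_hours, body_fat_pct)
r_none = pearson_r(coffee_cups, iq)

print(f"test score vs wage:   {r_positive:+.2f}")
print(f"running vs body fat:  {r_negative:+.2f}")
print(f"coffee vs IQ:         {r_none:+.2f}")
```

Values near +1 or -1 indicate a strong straight-line relationship and values near 0 indicate none. The coefficient says nothing about why the two variables move together.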
Activity: Correlation ≠ Causation
• Important: Correlation ≠ causation. Just because two things
correlate doesn’t mean one causes the other. We must be careful
not to jump to conclusions.
• Scenario: Studies show a strong positive correlation between ice
cream sales and number of drownings at beaches. When ice
cream sales go up, drownings also rise.
• Question: Does eating ice cream cause drowning? Why or why
not? (Think for a moment – what else could be going on?)

Activity: Correlation ≠ Causation
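• Answer: No. A third factor, hot weather, likely drives both: on hot days
more people buy ice cream and more people go swimming.

• Lesson: Always ask if there might be another explanation for a
correlation. In data literacy, we seek underlying factors before claiming
cause and effect.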
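
A small simulation can show how a hidden third variable produces a correlation without causation. It assumes temperature drives both ice cream sales and drownings, with no direct link between the two. All coefficients and noise levels are invented, and the helpers `pearson_r` and `residuals` are illustrative.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated days: temperature influences both quantities independently.
temperature = [rng.uniform(15, 35) for _ in range(2000)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson_r(ice_cream_sales, drownings)
adjusted_r = pearson_r(
    residuals(ice_cream_sales, temperature), residuals(drownings, temperature)
)

print(f"Correlation, ignoring temperature:     {raw_r:.2f}")
print(f"Correlation, after removing its effect: {adjusted_r:.2f}")
```

The raw correlation is strong even though the simulation contains no link from ice cream to drowning. Once the temperature effect is removed from both series, the remaining correlation is close to zero.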

Spurious Correlations in The Wild
• Explore for yourself: https://www.tylervigen.com/spurious-correlations

Real-World Case Studies: Data in Action
• Two case studies in each of the following domains:
o Business
o Healthcare
o Education
• We will see how data can lead to better decisions and
innovations.
• As we go through, think about how the terms we just learned
appear in these examples (What data was used? What variables?
What decisions were made?).

Data in Business: Overview
• Companies today rely on data for almost every aspect of
operations and strategy.

• Marketing: targeted ads, personalized recommendations.
• Operations: optimize supply chains (stocking the right products, efficient delivery routes).
• Product Strategy: decide what new products or features to develop.
• Decision-Making Culture: employees at all levels use data to justify decisions (“analytics culture”).

Data in Business: Overview
• Plenty of evidence that businesses that leverage data effectively
often outperform those that rely on gut feeling
• McKinsey study of B2B firms: “Companies that are using data-driven
B2B sales-growth engines report above-market growth and EBITDA
increases from 15-25%.”
• AWS/S&P survey of 2.3k SMBs: “65% of highly data-driven SMBs
financially outperform their competitors, twice as much as less
data-driven SMBs (33 percent)”
• Wavestone survey of >100 Fortune 1000 leaders: “87.0% reporting
successful delivery of measurable business value (from data investment)
to the organization”
• and more …

UPS ORION – Data-Driven Route
Optimization
• Challenge: In 2015, UPS managed 55,000 daily delivery routes and sought to reduce
miles, costs, and emissions.
• Traditional route planning was manual and couldn’t easily account for all variables.
• Solution: UPS developed ORION (“On-Road Integrated Optimization and Navigation”).
• ORION analyzes 200,000+ route options per driver per day to find the most efficient path.
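
To get a feel for why route planning is hard to do by hand, here is a toy sketch that tries every possible visiting order for four stops and keeps the shortest. It is not ORION's algorithm: the depot, the stop coordinates, and straight-line distances are all simplifying assumptions.

```python
from itertools import permutations
from math import dist, factorial

# Hypothetical depot and delivery stops on a simple x/y grid.
DEPOT = (0, 0)
STOPS = {"A": (2, 3), "B": (5, 1), "C": (1, 6), "D": (4, 4)}


def route_length(order):
    """Total straight-line distance: depot -> stops in the given order -> depot."""
    points = [DEPOT] + [STOPS[name] for name in order] + [DEPOT]
    return sum(dist(p, q) for p, q in zip(points, points[1:]))


# Brute force: evaluate every possible ordering of the stops.
all_orders = list(permutations(STOPS))
best_order = min(all_orders, key=route_length)

print("Orders checked:", len(all_orders))
print("Best order:", " -> ".join(best_order))
print(f"Length: {route_length(best_order):.2f}")

# The number of orderings grows factorially with the number of stops.
print("Orderings for 10 stops:", factorial(10))
```

Four stops give only 24 orderings, but ten stops already give more than 3.6 million, so checking every option stops being practical well before a realistic route size. Large-scale systems rely on smarter search and additional data such as traffic and delivery commitments.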

UPS ORION – Data-Driven Route
Optimization

• Inputs: customer locations data, traffic data, delivery commitments data
• Predictive modeling: sequence deliveries optimally, minimizing left turns and idle time

UPS ORION – Data-Driven Route
Optimization
• Results:

Domino’s Pizza – Digital Transformation with
Big Data
• Background: Facing stagnation in the 2000s, Domino’s pivoted to become a “digital-first”
pizza company, leveraging data from online ordering, its app, and loyalty program to drive
strategy.
• The goal was to personalize marketing and streamline operations using real-time
customer data.

Domino’s Pizza – Digital Transformation with
Big Data

Data → Machine Learning → Outcome:
• Customer orders data → item suggestion
• Preferences & timing data → optimized promotions timing
• Location data → new store locations & customized menu to local tastes

Domino’s Pizza – Digital Transformation with
Big Data
• https://cloud.google.com/customers/dominos
• Results:
• One analytics initiative (connecting digital + CRM data) led to an
immediate 6% increase in monthly revenue.
• The company saved ~80% in ad serving costs by targeting customers
more precisely. In the UK and Ireland, online sales climbed 30%
year-over-year, now making up 70% of sales (44% via mobile).
• Implications: The case shows how big data can reinvent a
business model.
• Domino’s now launches data-driven innovations (like one-click ordering,
AI chatbots) faster, guided by customer insight.

Data in Healthcare: Overview
• The healthcare sector generates massive data (patient records,
lab results, public health stats) and is increasingly data-driven:

• Personal Health: How Wearables Are Changing Cancer Care in Vietnam (VinUni)
• Patient Care: improve diagnoses and treatment plans (e.g. predictive models to identify high-risk patients for diseases).
• Operations: reducing wait times, managing staff schedules, ensuring enough beds and supplies based on forecasts.
• Public Health: track disease outbreaks and inform policy decisions.
• Innovation: new medical discoveries or personalized medicine.

Q: Have you or a family member used an app to track health? (That’s personal health data at work).

Case study: HCA’s SPOT Sepsis Alert – Early
Detection Saves Lives
• Background: Sepsis is a life-threatening condition responsible for ~270,000 deaths
per year in the U.S., often due to late detection. HCA Healthcare (with 185 hospitals)
sought to use data to catch sepsis earlier and prompt timely treatment.
• Data Used: HCA developed an algorithm called SPOT (Sepsis Prediction and
Optimization of Therapy) using data from 31 million patient encounters.
• SPOT continuously monitors patients’ electronic health records – vital signs, lab
results, nursing notes – for patterns indicating sepsis risk.
• Method: SPOT employs rules-based and machine-learning algorithms to analyze
subtle changes often invisible to clinicians. It runs 24/7, essentially acting as a
“smoke detector” for sepsis by recognizing concerning data combinations ~6 hours
earlier than manual methods.
• When high-risk patterns emerge, it sends automatic alerts to the care team’s mobile
devices within minutes, enabling prompt intervention (e.g. antibiotics, fluids) before
the patient deteriorates.
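
The rules-based half of such a system can be illustrated with a toy screening function. This sketch is not SPOT's actual logic and is not for clinical use. It applies the widely taught SIRS screening thresholds to one set of readings, and the `Vitals` class, the function names, and the patient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    temperature_c: float
    heart_rate: int        # beats per minute
    respiratory_rate: int  # breaths per minute
    wbc_per_mm3: int       # white blood cell count


def sirs_criteria_met(v: Vitals) -> int:
    """Count how many of the four SIRS screening criteria the readings meet."""
    checks = [
        v.temperature_c > 38.0 or v.temperature_c < 36.0,
        v.heart_rate > 90,
        v.respiratory_rate > 20,
        v.wbc_per_mm3 > 12_000 or v.wbc_per_mm3 < 4_000,
    ]
    return sum(checks)


def should_alert(v: Vitals, threshold: int = 2) -> bool:
    """Raise an alert when at least `threshold` criteria are met."""
    return sirs_criteria_met(v) >= threshold


# Hypothetical readings for two patients.
stable = Vitals(temperature_c=36.8, heart_rate=72, respiratory_rate=14, wbc_per_mm3=7_000)
worrying = Vitals(temperature_c=38.6, heart_rate=110, respiratory_rate=24, wbc_per_mm3=9_000)

print("stable:", should_alert(stable))
print("worrying:", should_alert(worrying))
```

Fixed thresholds like these are simple and transparent but can miss subtle combinations of changes, which is the gap the machine-learning component described above is meant to address.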

Case study: HCA’s SPOT Sepsis Alert –
Early Detection Saves Lives
Patient data (vital signs, lab results, nursing notes) → rules-based and
machine-learning algorithms analyze subtle changes often invisible to
clinicians → detection ~6 hours earlier than manual methods.

Case study: HCA’s SPOT Sepsis Alert –
Early Detection Saves Lives
• Results:
• From 2013 to 2018: SPOT + evidence-based care protocols = saving ~8,000
lives.
• SPOT alerts allow clinicians to begin treatment several hours sooner (each
hour of delay raises death risk 4–7%).
• The initiative earned HCA the 2019 Eisenberg Award for Patient Safety
• Implications: This case demonstrates the life-saving potential of big
data in clinical settings.
• Real-time analytics can catch what human eyes miss, but it also requires
training staff to trust and appropriately act on algorithmic alerts.
• HCA’s enterprise-scale deployment across 170+ hospitals shows such tools can
generalize and work at scale.
• Other hospital systems are now adopting similar early-warning
platforms for sepsis and other conditions.

Case study: Analytics-Driven Reduction of
Hospital Readmissions
• Background: Unplanned hospital readmissions (patients returning within 30 days)
are costly for providers and can indicate suboptimal care. U.S. health systems face
financial penalties for high readmission rates.
• Data & Method: One U.S. regional health system combined predictive modeling
with workflow changes to target readmission risks. They analyzed a wide range of
data (clinical data, prior admissions, comorbidities, and social factors) to predict
which discharged patients were at high risk of readmission.
• These predictions were embedded into clinicians’ and care managers’ daily
workflows.
• Intervention: High-risk patients (as flagged by the model) were enrolled in a 30-day
transition care program.
• Case managers proactively followed up with these patients, ensuring they
understood medications, had timely follow-up appointments, and addressed social
needs (transportation, home care, etc.)
• The approach fully integrated data insights with on-the-ground care coordination
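
The predict-then-intervene workflow can be sketched with a tiny risk score. This is a hypothetical model, not the health system's: the logistic form, the coefficients, the 30% threshold, and the patient records are all invented to show how a prediction becomes a worklist.

```python
from math import exp

# Hypothetical coefficients; a real model would be fitted to historical data.
INTERCEPT = -3.0
COEFFICIENTS = {"prior_admissions": 0.4, "comorbidities": 0.3, "lives_alone": 0.6}
HIGH_RISK_THRESHOLD = 0.30  # assumed cut-off for enrolling in follow-up


def readmission_risk(patient: dict) -> float:
    """Logistic risk score between 0 and 1 from a patient's recorded factors."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)
    return 1 / (1 + exp(-z))


# Hypothetical discharged patients.
patients = [
    {"id": "P1", "prior_admissions": 0, "comorbidities": 1, "lives_alone": 0},
    {"id": "P2", "prior_admissions": 3, "comorbidities": 4, "lives_alone": 1},
    {"id": "P3", "prior_admissions": 1, "comorbidities": 2, "lives_alone": 0},
]

# The worklist a case manager would see: only patients above the threshold.
high_risk = [p["id"] for p in patients if readmission_risk(p) >= HIGH_RISK_THRESHOLD]

for p in patients:
    print(p["id"], f"{readmission_risk(p):.2f}")
print("Enroll in transition care:", high_risk)
```

The score only has value once it feeds a concrete action, here the follow-up list, which matches the case study's emphasis on embedding predictions into daily workflows.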

Case study: Analytics-Driven Reduction of
Hospital Readmissions
• Results: This data-driven initiative achieved striking results. A pilot
hospital saw a 40% improvement in its risk-adjusted readmission
index over three years after analytics integration.
• These reductions translated to hundreds of avoided admissions and
millions in savings, while patients experienced safer recoveries at
home.
• Implications: By predicting and preventing likely bounce-backs,
hospitals improved quality ratings and avoided penalties.
• The case underlines that simply deploying a predictive model isn’t
enough – success came from organizational change: aligning IT,
clinicians, and case management in a coordinated workflow.
• Many health systems are now investing in “mission control” analytics
centers to manage patient flow and prevent readmissions.

Data in Education: Overview
• Schools and universities are using data to enhance learning
experiences:

• Tracking Performance: Teachers use student data (grades, attendance, participation) to see who might need extra help or advanced challenges.
• Personalized Learning: Educational software (like adaptive learning platforms) adjusts the difficulty or topics based on a student’s performance data in real time.
• Institutional Decisions: improve curricula and student support services.
• Early Warning Systems: flag if a student is at risk of dropping out, so advisors can intervene.
• Continuous Improvement: refining teaching methods and curriculum (what’s working, what isn’t).

Case study: Georgia State University – Data
Analytics Boosts Student Success
• Background: Georgia State University (GSU) faced low graduation rates
in the mid-2000s – barely 32% of students, with even lower rates for
Black, Latino, and low-income students. By 2015, GSU embarked on a
data-driven strategy to turn this around and close equity gaps.
• Data Used: GSU built a comprehensive student data warehouse
tracking 10+ years of student records – grades, credit progress,
demographics, financial aid, and even engagement metrics.
• Intervention Method: Using predictive analytics models, GSU’s system
alerts advisors when a student is off track. An advisor center monitors
these alerts daily.
• For example, if a STEM major gets a poor grade in a critical class, the system
flags it (yellow/red status).
• Advisors then proactively reach out within 48 hours to offer tutoring, academic
counseling, or adjust course plans.
• GSU also implemented data-informed initiatives like micro-grants (small
financial awards auto-triggered for students at risk of dropping out due to
finances)

Case study:
Georgia State University
• Results:
• Graduation rate rose by 23 percentage
points
• Achievement gaps by race, ethnicity, and
income were completely eliminated
• An additional ~3,000 students earned degrees
who might otherwise have dropped out.
• Implications:
• GSU’s “Moneyball for education” shows that
institutions can boost outcomes at scale
through predictive advising.
• The university also saw financial benefits:
more students retained means ~$60 million
in added tuition over years.
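
"Percentage points" and "percent" are easy to confuse, so here is the arithmetic spelled out. The sketch assumes the 23-point rise is measured from the roughly 32% baseline mentioned in the background, so the resulting figures are illustrative rather than official.

```python
# Figures from the case study; treating 32% as the starting point is an assumption.
baseline_rate = 32.0          # graduation rate before, in percent
rise_in_points = 23.0         # increase in percentage points

new_rate = baseline_rate + rise_in_points
relative_increase = rise_in_points / baseline_rate * 100

print(f"New graduation rate: {new_rate:.0f}%")
print(f"Rise in percentage points: {rise_in_points:.0f}")
print(f"Relative increase: {relative_increase:.0f}%")
```

Under this assumption, a 23-point rise is a relative increase of about 72%. Both statements describe the same change, so check which one a headline is using.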

Case Study: Fraud Detection in 2018
Vietnam's National Exam
• Background (what raised suspicion): Within a day of releasing 2018 exam results, analysts
noticed anomalously high concentrations of top scores in Hà Giang, a province with
historically lower average performance.
o Example outliers: 36 of the 76 highest A1 combinations (≥27 points) nationwide came from Hà Giang, and
the province accounted for about one-third of top A scores despite a low graduation rate (89.35% vs 97.57%
nationwide).

Case Study: Fraud Detection in 2018
Vietnam's National Exam
• Method: outlier analysis
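
One simple way to flag outliers is the modified z-score, which measures how far each value sits from the median relative to the typical deviation. This is a generic sketch, not the analysts' actual method. The province names and rates are invented, and the 3.5 cut-off is a commonly used rule of thumb.

```python
from statistics import median

# Hypothetical share of candidates (in %) scoring at the top level, by province.
top_score_rate = {
    "Province A": 0.10,
    "Province B": 0.12,
    "Province C": 0.09,
    "Province D": 0.11,
    "Province E": 0.13,
    "Province F": 0.10,
    "Province X": 0.95,
}


def modified_z_scores(values: dict) -> dict:
    """Modified z-score per key, based on the median and median absolute deviation."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return {key: 0.6745 * (v - center) / mad for key, v in values.items()}


scores = modified_z_scores(top_score_rate)
flagged = [name for name, z in scores.items() if abs(z) > 3.5]

for name, z in scores.items():
    print(f"{name}: {z:+.1f}")
print("Flagged for review:", flagged)
```

Using the median instead of the mean keeps the extreme value from distorting its own benchmark. A flag is a prompt for human review, not proof of fraud.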

Case Study: Fraud Detection in 2018
Vietnam's National Exam
Method: Digital forensics
• Investigators cross-checked CD1 vs CD2 and found 114 candidates with altered results.
• A provincial official (Vũ Trọng Lương) had downloaded official answer keys, converted
them to Excel, and overwritten candidates’ files, then manually edited scripts.
• The scheme was exposed because the unaltered CD1 remained on record.
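
Cross-checking two copies of the same records is simple to express in code. The sketch below compares a hypothetical original file with a hypothetical processed file and lists every score that differs. The candidate IDs, subjects, and scores are invented, and the dictionary layout is an assumption, not the real CD1/CD2 format.

```python
# Hypothetical original records (standing in for the untouched copy).
original = {
    "C001": {"math": 4.2, "physics": 5.0, "chemistry": 4.8},
    "C002": {"math": 7.6, "physics": 8.0, "chemistry": 7.2},
    "C003": {"math": 3.0, "physics": 2.8, "chemistry": 4.0},
}

# Hypothetical processed records (standing in for the downstream copy).
processed = {
    "C001": {"math": 4.2, "physics": 5.0, "chemistry": 4.8},
    "C002": {"math": 7.6, "physics": 8.0, "chemistry": 7.2},
    "C003": {"math": 9.0, "physics": 8.8, "chemistry": 9.4},
}


def find_mismatches(original: dict, processed: dict) -> list:
    """Return (candidate, subject, original score, processed score) for every difference."""
    mismatches = []
    for candidate, subjects in original.items():
        for subject, score in subjects.items():
            other = processed.get(candidate, {}).get(subject)
            if other != score:
                mismatches.append((candidate, subject, score, other))
    return mismatches


mismatches = find_mismatches(original, processed)
altered_candidates = sorted({candidate for candidate, *_ in mismatches})

for candidate, subject, before, after in mismatches:
    print(f"{candidate} {subject}: {before} -> {after}")
print("Candidates with altered results:", altered_candidates)
```

The check only works because an independent, unmodified copy exists to compare against.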

Case Study: Fraud Detection in 2018
Vietnam's National Exam
• Results: 114 inflated scores were corrected back to the true marks; some
candidates’ totals had been boosted by up to 29.95 points. Across the 2018
scandal, nearly 20 people were prosecuted (spanning several provinces).
Quick re-scoring prevented fraudulent university admissions.
• Implications:
o Baseline vs. outlier dashboards: Monitor ≥9 rates and ≥27 A/A1 counts by
province/subject against historical and national distributions; auto-flag extreme
deviations for review.
o Independent records: Preserve immutable raw files (CD1); audit any downstream
processed files (CD2) for mismatches.
o Context checks: Watch for mismatches like “many top scores” in a region with lower
graduation rates or past averages, an integrity red flag.

Conclusions: Today’s Takeaways
• Data literacy = ability to read, work with, analyze, and
communicate data.
• In short, data literacy is an essential 21st-century skill.
• Real-world impact: Examples from business (UPS, Domino's
Pizza), healthcare (HCA), and education (Georgia State
University, Khan Academy + public schools) showed
data-driven decisions can lead to better outcomes – higher profits,
healthier patients, and improved learning.
• Basic terms and concepts: You now know key vocabulary – data,
dataset, variables, statistics (mean, etc.), correlation vs causation
– and why each concept is important in analyzing information.
The Next Steps
• Becoming Data Literate is a Journey: as we continue, you’ll get
hands-on experience (e.g., analyzing a simple dataset, creating a
chart).
• Upcoming Topics: In the next sessions, we’ll delve into
o Types of data (structured vs unstructured in more depth, quantitative vs
qualitative)
o Data collection methods
o Basic statistics and visualization
o Practical skills (Excel, a bit of Python) to work with data.
