Week 1 Introduction V1 Class
Unless Otherwise Stated, this presentation refers to study material from AWS, Microsoft Azure, and Snowflake
• Introductions
• Quick Survey
• Course Overview
• Syllabus Walkthrough
• Introduction to Data and More…
INTRODUCTION – ABOUT ME
Mandar Samant
MBA, PMP, CSM, B. Eng (Computer Eng)
Digital, Data, Customer Engagement
Experience: 25+ years in technology and consulting management, with 15+ years in engineering and program
management delivering mission-critical enterprise and data initiatives. Balanced experience in growing startups and
leading engagements in established companies.
Expertise: Solution design, data engineering, technical program management, customer and stakeholder management,
business analysis, lean process redesign, digital transformation.
What I love to do: Hiking, Reading, Percussion Instruments, Solving Problems, Volunteering, and Movies!
Contact:
Syllabus Overview
What is Possible
this course
Ground Rules
• Attendance sheet is your input.
• If it is not there, it is not there.
• Group assignments are a group effort.
• Figure out the group dynamics!
• STRICTLY NO EXCEPTIONS for additional presentations or last-minute grievances about group efforts.
• Raise the flag early!
• Penalty for late submissions.
• Final exam/presentation is in the CLASS. No Exceptions.
• Manage your travel accordingly.
• Fewer excuses and more engagement, please!
• 94.84 is 94.84, and hence A-!
• The final exam is in the CLASS.
Course Composition
Emphasis
All about data
• Data Engineering
• Data Analysis
• New frontiers like Gen AI
• Platforms: Azure and Snowflake (Sprinkle of AWS)
Industry Examples
• HBR Case Studies
• Industry articles
• Classroom Debates & Discussions
• Guest Speaker
Feedback
• Classroom Participation
• Video assignment
• Group Projects
Hands-on Labs
Attendance 70 7%
Class Participation 50 5%
A 95.00-100.00
A- 91.00-94.99
B+ 87.00-90.99
B 83.00-86.99
B- 80.00-82.99
C+ 77.00-79.99
C 73.00-76.99
F 72.99 & below
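The letter-grade bands above can be read as a lookup on the lower bound of each band, with no rounding, which is why 94.84 stays an A-. A minimal Python sketch, assuming scores are percentages from 0 to 100; the names `GRADE_FLOORS` and `letter_grade` are illustrative and not part of the syllabus:

```python
# Lower bound of each band, highest first, taken from the grade table above.
GRADE_FLOORS = [
    (95.00, "A"),
    (91.00, "A-"),
    (87.00, "B+"),
    (83.00, "B"),
    (80.00, "B-"),
    (77.00, "C+"),
    (73.00, "C"),
]


def letter_grade(score: float) -> str:
    """Return the letter grade for a percentage score, without rounding."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"  # 72.99 and below


print(letter_grade(94.84))  # A-
```

Because the comparison uses the raw score, a 94.84 is never rounded up to the 95.00 floor.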
Labs
Class Modality
Use of Generative AI
Why?
Any Questions?
Break
What is Data?
…for decades…
• “Data is food to AI” - AI veteran Andrew Ng
• “Without big data, you are blind and deaf and in the middle of a freeway.” —
Geoffrey Moore
• “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives
profitable activity; so must data be broken down, analyzed for it to have value.” -
Michael Palmer
• “With data collection, ‘the sooner the better’ is always the best answer.” — Marissa
Mayer
• “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
— Jim Barksdale
“You collect as much data as you can. You immerse yourself in that data, but
then make the decision with your heart.”
– Jeff Bezos
Last but not least…
“In olden days, when very little facts were recorded, we were starved for
insights; today with the data deluge, we are still starved for insights.” — Dr.
Rupa Mahanti
How do WE decide…with the data?
• Which types of courses and skills will provide me with more job opportunities?
• Which restaurant has above-average reviews but generally makes it easy to seat a group?
• How do we keep students engaged through the year and reduce the dropout rate?
• What does vendor quality look like for my grocery store chain?
What happened?
Data Literacy Maturity
• Data Indifference: Most decisions are made on gut feeling, without even being curious about data.
• Data Aware: Data is collected but used only at discretion or on an as-needed basis.
• Data Informed
• Data Driven: Data is the DNA of all decision making. Data collection, cleaning, analytics, and insights are mature.
Source:
SpaceX Data: SpaceX: Enabling Space Exploration through Data and Analytics - Digital Innovation and Transformation (harvard.edu)
Lead from the front with Data
Source:
Big data in Insurance: https://www.actuary.org/sites/default/files/files/publications/BigDataAndTheRoleOfTheActuary.pdf
Book: Army of None: Autonomous Weapons and the Future of War, by Paul Scharre
Sports and Data: How Data Analysis In Sports Is Changing The Game (forbes.com)
All is good…But then…
Source: https://www.bcg.com/publications/2020/increasing-odds-of-success-in-digital-transformation
Gartner Predicts 80% of D&A Governance Initiatives Will Fail by 2027, Due to a Lack of a Real or Manufactured Crisis
Hindrances to Data Progression
Source: wavestone-global-technology-data-leaders-survey-2024.pdf
2024 Company Outlook: Data Investments
Source: data-ai-executive-leadership-survey-2024.pdf
2024: State of Generative AI in Companies
Source: data-ai-executive-leadership-survey-2024.pdf
High-Level Challenges in Data Initiatives
Source: data-ai-executive-leadership-survey-2024.pdf
CDO/CDAO Role, Skills, Responsibilities
While there has been notable improvement in the percentage of organizations that report
the CDO/CDAO role as well understood within their organizations, the percentage
remains below 50%, pointing to an ongoing gap between expectations and
understanding. Also, while there is near unanimity (97.2%) that the CDO/CDAO role is
necessary today, a smaller percentage (88.8%) see the role as necessary 5 years
from now.
Source: data-ai-executive-leadership-survey-2024.pdf
How is the CDO/CDAO role perceived?
The nature of CDO/CDAO reporting also appears to vary by industry and whether the role is
seen as primarily an offensive/revenue role versus a defensive/risk function. While the role
appears to have evolved into more of an offensive role for most organizations, particularly in
industries such as consumer packaged goods and retail, for highly regulated industries such as
financial services, risk, regulatory, and compliance demands continue to be a dominant
function of the role.
Source: data-ai-executive-leadership-survey-2024.pdf
Data value over time… a diminishing effect
Value of data, from most to least valuable:
• Preventive & Predictive
• Actionable
• Reactive
• Historical
Data processing maturity & agility, from fastest to slowest:
• Near real time
• Within seconds to minutes
• Within hours
• Within days to months
The trade-offs of data-driven decisions
Three competing dimensions: Cost, Speed of Insights, Prediction Accuracy.
The triple intersection is obviously desired but challenging to achieve in reality.
Cost
• How much should you invest to go faster or predict more accurately?
• How much incremental improvement justifies the additional cost?
Speed of Insights
• How quickly do you need an answer?
• Does waiting for a better answer outweigh answering more quickly?
Prediction Accuracy
• How accurate does the prediction need to be?
• Can you sacrifice accuracy for speed?
• volume
• velocity
• variety
• veracity
• value
Volume
Amount of Data
• It is the base of big data
Velocity
Variety
• Structured
• Unstructured
• Semi-structured
Types of Data
Structured
• Data that has been predefined and formatted to a set structure
• Easily used by business users
• Predefined structure makes it easy on machine learning algorithms
• Cons: Limited storage formats and choices, e.g. RDBMS or DW
• Cons: Predefined structure also limits use and manipulation
Unstructured
• Data stored in its native format and not processed until it is used
• Needs analytics to derive patterns and behaviors
• Native format helps to use data as-is, and it is easy to collect because there are few rules on the structure
• Cons: Requires more storage than structured data, e.g. data lakes or cloud data warehouses
• Cons: Limited skill set availability due to the technical nature of the toolset and frameworks
Sources of Data
Business or Enterprise Application | HRMS, ERP, CRM, PPM, EMR | Structured | Low | Mid | Mid | Low
Documents | PDF, XLS, JSON and so on | Unstructured | Mid | Low | Low | Mid
Data Storage | File streams, NoSQL, R-ORDBMS | Structured/Hybrid | Low | Mid | High | Mid
• Semi-structured
• Hasn’t been organized
• Has metadata, for example, pictures grouped by tags
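The three varieties above can be contrasted with a small Python sketch. All sample records here (orders, photo tags, review text) are made up for illustration; the point is only how much structure each form carries before any processing.

```python
import csv
import io
import json

# Structured: rows follow a predefined schema (fixed, named columns).
structured = io.StringIO(
    "order_id,customer,amount\n1001,Asha,25.50\n1002,Ben,40.00\n"
)
rows = list(csv.DictReader(structured))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: no fixed schema, but metadata (keys, tags) describes the content.
semi_structured = json.loads(
    '[{"file": "img_01.jpg", "tags": ["beach", "family"]},'
    ' {"file": "img_02.jpg", "tags": ["beach"], "camera": "phone"}]'
)
by_tag = {}
for photo in semi_structured:
    for tag in photo["tags"]:
        by_tag.setdefault(tag, []).append(photo["file"])

# Unstructured: raw text in its native form; structure must be derived on use.
unstructured = "Great service, but the delivery was late. Would order again."
word_count = len(unstructured.split())

print(total)       # 65.5
print(by_tag)      # pictures grouped by tag
print(word_count)  # 10
```

The structured rows can be summed directly by column name, the semi-structured records can be grouped by their tags even though the second one carries an extra field, and the unstructured text needs some analysis (here, only a word count) before it yields anything.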
Veracity
Source: https://datascience.aero/big-data-veracity-value/
Value
3 Ps: People, Process, Platform
4 Cs: Clean, Consistent, Conformed, Current
5 Vs: Velocity, Variety, Volume, Veracity, Value
Business Outcomes
Is our new talent aligned with what
• Agility in thought process and adoption of new technology paradigms
• Going beyond the traditional tech stack
As data vision matures and data foundations are laid for aspiring data-driven
organizations, the landscape expects more data engineers, DataOps, data
governance, MDM, MLOps, and ML engineers.
2012 2017
Source:
2012 Big Data Landscape: Forbes Blog
2017 Big Data Landscape: Matt Turck
2021+… The data landscape is an ocean
Source: https://mattturck.com/data2021/
Data analysts, data scientists, predictive modelers, statisticians and other analytics professionals,
• collect
• process
• clean and
• analyze
growing volumes of structured transaction data as well as other forms of data not used by
conventional BI and analytics programs.
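The collect, process, clean, and analyze steps above can be sketched as a tiny Python pipeline. This is an illustration under stated assumptions, not a prescribed method: the store and sales records are made up, and the function names simply mirror the four bullets.

```python
from statistics import mean


def collect():
    """Collect: stand-in for reading from a source system (made-up records)."""
    return [
        {"store": "north", "sales": "120.5"},
        {"store": "south", "sales": ""},  # missing value
        {"store": "north", "sales": "99.5"},
        {"store": "south", "sales": "80"},
    ]


def process(records):
    """Process: convert raw strings into typed values; blanks become None."""
    return [
        {"store": r["store"], "sales": float(r["sales"]) if r["sales"] else None}
        for r in records
    ]


def clean(records):
    """Clean: drop records with missing sales."""
    return [r for r in records if r["sales"] is not None]


def analyze(records):
    """Analyze: average sales per store."""
    by_store = {}
    for r in records:
        by_store.setdefault(r["store"], []).append(r["sales"])
    return {store: mean(values) for store, values in by_store.items()}


result = analyze(clean(process(collect())))
print(result)  # {'north': 110.0, 'south': 80.0}
```

Keeping each step as its own function makes it easy to see where a record was dropped or converted, which matters once volumes grow beyond what can be checked by eye.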
Source:
https://online.hbs.edu/blog/post/data-life-cycle
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
Critical Elements of Data Lifecycle
Source: Reis, Joe; Housley, Matt. Fundamentals of Data Engineering (p. 24). O'Reilly Media.
Modern Data Professional
Next Week
References:
• https://www.newvantage.com/_files/ugd/e5361a_ad5a8b3da8254a71807d2dccdb0844be.pdf
• https://online.hbs.edu/blog/post/data-life-cycle
• https://mattturck.com/data2021/
Thank you…