
Week 1 Introduction V1 Class

The document outlines the course BUAN6335, focusing on organizing for business analytics platforms, led by Prof. Mandar Samant. It includes an overview of the syllabus, grading criteria, and the importance of data in decision-making, emphasizing teamwork and engagement. The course covers various data-related topics, hands-on labs, and case studies, aiming to develop a data-driven mindset among students.


BUAN6335

Organizing for Business Analytics Platforms

Prof. Mandar Samant


Week 1

Unless Otherwise Stated, this presentation refers to study material from AWS, Microsoft Azure, and Snowflake
• Introductions
• Quick Survey
• Course Overview
• Syllabus Walkthrough
• Introduction to Data and More…
INTRODUCTION – ABOUT ME

Mandar Samant
MBA, PMP, CSM, B. Eng (Computer Eng)
Digital, Data, Customer Engagement

HP/Verifone, Ness, Sun Microsystems, Ness, Apollo Edu Group, Amazon, Clairvoyant/EXL, UT-Dallas

Experience: 25+ years in technology and consulting management, with 15+ years in engineering and program
management delivering mission-critical enterprise and data initiatives. Balanced experience in growing startups and
leading engagements in established companies.

Expertise: Solution design, data engineering, technical program management, customer and stakeholder management,
business analysis, lean process redesign, digital transformation.

What I love to do: Hiking, Reading, Percussion Instruments, Solving Problems, Volunteering, and Movies!

Contact:

Email Address: Mandar.Samant@UTDallas.edu


Pinging on Teams is preferred as a first contact. Email is OK too. Please check the office hours in the syllabus.

Syllabus Overview
What is Possible

• New topics and discussions (projects, problems, solutions, and more)
• New ideas
• Early assignment submissions ☺
• Flexible office hours: I have designated timeslots in the syllabus, but you can always book a timeslot or ping me on Teams
Modus Operandi

• Data platform topics are sometimes dry, so we will make them interesting with examples, case studies, labs, and multimedia
• Data platforms have many foundational pillars, such as:
• data storage and processing, data ingestion, data transformation, governance, security, BI/analytics, and data observability
• All the pillars need CONTEXT
• Ask “why” to get to the cause; otherwise we will just be observing or applying temporary band-aids
• There is no substitute for your questions and challenges! Hence, keep churning your thinking throughout the class!

And one last thing….


Teamwork is at the heart of this course

"Great things in business are never done by one person. They're done by a team of people."
Steve Jobs

Image source: Steve Jobs gave us our president (dallasnews.com)


Ground Rules
• Attendance sheet is your input.
• If it is not there, it is not there.
• Group assignments are a group effort.
• Figure out the group dynamics!
• STRICTLY NO EXCEPTIONS for additional presentations and last-minute grievances for the group efforts.
• Raise the flag early!
• Penalty for late submissions.
• Final exam/presentation is in the CLASS. No Exceptions.
• Manage your travel accordingly.
• Fewer excuses and more engagement, please!
• 94.84 is 94.84, and hence A-!
• The final exam is in the CLASS.

We are all sensitive and empathetic, so let us keep it genuine

Course Composition
Emphasis

All about data
• Data Engineering
• Data Analysis
• New frontiers like Gen AI
• Platforms: Azure and Snowflake (sprinkle of AWS)

Industry Examples
• HBR Case Studies
• Industry articles
• Classroom Debates & Discussions
• Guest Speaker

Feedback
• Classroom Participation
• Video assignment
• Group Projects

Hands-on Labs

Sample Case Studies/Articles

Case Study | Type
Architecting Customer 360 Platform with AWS/Azure | Class Discussion
Disney+ Data Challenges and Opportunities | Class Discussion
Uber Pickup Efficiency Problem | Group Presentation
Unstructured data wisdom and AirBnB | Class Discussion / Presentation
John Deere AI Marketplace challenge | Presentation
Data, ML, Computer vision in Supply chain | Class Discussion

The final list of case studies and articles will be provided on eLearning by 1/28.
Grading Criteria

Grading Milestones Points % Weightage

Individual Assignment Milestone 100 10%

Hands-on Labs 200 20%

HBR Case Studies (Group) 200 20%

Project Presentation & Paper (Group) 200 20%

Attendance 70 7%

Final Exam 100 10%

Group Participation (Project and HBR) 80 8%

Class Participation 50 5%

Total 1000 100.00%


Grade Range

Letter Grade Final Point Total (%)

A 95.00-100.00
A- 91.00-94.99
B+ 87.00-90.99
B 83.00-86.99
B- 80.00-82.99
C+ 77.00-79.99
C 73.00-76.99
F 72.99 & below
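
As a worked illustration of the two tables above, here is a minimal Python sketch that turns earned points into a percentage and looks up the letter grade. The point values and grade cut-offs are copied from the tables; the names `MAX_POINTS`, `GRADE_FLOORS`, `percentage`, and `letter_grade` are hypothetical, and the sketch assumes no rounding is applied (consistent with the ground rule that 94.84 is an A-).

```python
# Maximum points per grading milestone, copied from the Grading Criteria table.
MAX_POINTS = {
    "Individual Assignment Milestone": 100,
    "Hands-on Labs": 200,
    "HBR Case Studies (Group)": 200,
    "Project Presentation & Paper (Group)": 200,
    "Attendance": 70,
    "Final Exam": 100,
    "Group Participation (Project and HBR)": 80,
    "Class Participation": 50,
}

# Lower bound (in percent) of each letter grade, copied from the Grade Range table.
GRADE_FLOORS = [
    (95.0, "A"), (91.0, "A-"), (87.0, "B+"), (83.0, "B"),
    (80.0, "B-"), (77.0, "C+"), (73.0, "C"),
]


def percentage(earned):
    """Convert a dict of earned points per milestone into a percentage of the total."""
    total_earned = sum(earned.get(name, 0) for name in MAX_POINTS)
    return 100.0 * total_earned / sum(MAX_POINTS.values())


def letter_grade(pct):
    """Return the letter grade for a percentage, with no rounding applied."""
    for floor, letter in GRADE_FLOORS:
        if pct >= floor:
            return letter
    return "F"
```

For example, `letter_grade(94.84)` returns `"A-"` because 94.84 is below the 95.00 floor for an A.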

Labs

There are 25+ labs, so embrace some interesting hands-on practice!

Class Modality

• Classroom based, unless notified by me
• Class will meet during scheduled times in the assigned room on the days indicated in the coursebook and syllabus
• If anything changes for a specific week, the TA and/or I will notify the entire cohort
• The eLearning portal is the primary source for the course content, instructions, assignments, and so on…

Overall weekly schedule

• In the standard class structure:
• Prepare for the class by reading/watching the prerequisite study material (if any is specified)
• Learn from weekly topics
• Take part in classroom activities, specifically debates and discussions based on scenarios and case studies

• Group work requires preparation of arguments, presentations, and so on. In such settings you are presenting and debating in front of other professionals (other teams) in the class, and a half-baked cake is never a pleasant sight!

Miscellaneous but important points

• All communication regarding graded assessments must occur in writing via UT Dallas email
• Accessibility – please let me know if you need any form of accessibility assistance
• Academic Dishonesty – DON’T DO IT
– I do not hesitate to engage with the UTD Office of Community Standards and Conduct
• Requested exceptions for absences, late submissions, etc., must be in writing and include documentation
– E.g., car accident, death in the family, illness, etc.

Use of Generative AI

I am good with it as long as it is not a mere copy-paste job. I need your take on the discussions, not an LLM engine's.

Why?

In the real world, you will be the face of technology client engagements and certainly cannot hide behind the Gen AI engine.

Any Questions?

Let's get to know you…

Break

What is Data?

Data is the …….


Everyone is talking about the data

…and has been for decades…
• “Data is food to AI” - AI veteran Andrew Ng

• “Without big data, you are blind and deaf and in the middle of a freeway.” —
Geoffrey Moore

• “Data is the new oil.” — Clive Humby

• “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives
profitable activity; so must data be broken down, analyzed for it to have value.” -
Michael Palmer

• “With data collection, ‘the sooner the better’ is always the best answer.” — Marissa
Mayer

• “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
— Jim Barksdale

• “Above all else, show the data.” — Edward R. Tufte


A Few More…

“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.”
- Eric Schmidt, Executive Chairman at Google

“You collect as much data as you can. You immerse yourself in that data, but then make the decision with your heart.”
– Jeff Bezos

Last but not least…

“Torture the data, and it will confess to anything.”


— Ronald Coase

“In olden days, when very little facts were recorded, we were starved for
insights; today with the data deluge, we are still starved for insights.” — Dr.
Rupa Mahanti
How do WE decide…with the data?

• Which website provides me better pricing with free shipping?

• How should I line up my fantasy football team?
• knowing I have a pass-heavy points system?

• Which types of courses and skills will provide me with more job opportunities?

• Which restaurant near me serves the best Thai food?

• Which restaurant has above-average reviews but is generally easy to get seating at for a group?

• What should I pay for this home?

• Which is the fastest way to pay off a student loan?

• How much is my current home or car worth?

• Which travel website provides better deal tracking?

Consumers: Convenience & Value


How do ORGANIZATIONS decide…

• How is my infrastructure handling the peak season?

• Which of these customer transactions should be flagged as fraud?

• What customer experience should be enhanced in our mobile app or website to increase interactions?

• Which patients are most likely to have a relapse?

• How do we keep students engaged through the year and reduce the dropout rate?

• When is the optimum time to harvest this year's crop?

• What does vendor quality look like for my grocery store chain?

Organizations: Top & Bottom Line, Efficiency Driven


Valuable Actionable Insights are hard to achieve without a proper data foundation

Analytics maturity, from less valuable and easier to derive to more valuable and more difficult to derive and sustain:
• Descriptive: What happened?
• Diagnostic: Why did something happen?
• Predictive: What will happen?
• Prescriptive: How can we make something happen or prevent something from happening?

Actionable insights increase with maturity.
Data Literacy and Gen D Worker

01 Reading data involves understanding what data is, and what aspects of the world it represents
02 Working with data involves creating, acquiring, cleaning, and managing it.
03 Analyzing data involves filtering, sorting, aggregating, comparing, and performing other such analytic operations on it.
04 Arguing with data involves using data to support a larger narrative intended to communicate some message to a particular audience
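
To make the "analyzing data" step concrete, here is a minimal Python sketch of the four operations named above (filtering, sorting, aggregating, comparing). The `sales` records and all variable names are hypothetical sample data invented for illustration, and only the standard library is used.

```python
from collections import defaultdict

# Hypothetical sample data: (region, product, units_sold).
sales = [
    ("North", "Widget", 120),
    ("South", "Widget", 80),
    ("North", "Gadget", 45),
    ("South", "Gadget", 150),
    ("West", "Widget", 60),
]

# Filter: keep only the rows for one product.
widgets = [row for row in sales if row[1] == "Widget"]

# Sort: order the filtered rows by units sold, highest first.
widgets_ranked = sorted(widgets, key=lambda row: row[2], reverse=True)

# Aggregate: total units per region across all products.
units_by_region = defaultdict(int)
for region, _product, units in sales:
    units_by_region[region] += units

# Compare: find the region with the highest total.
top_region = max(units_by_region, key=units_by_region.get)
```

"Arguing with data" would then mean using a result such as `top_region` to support a recommendation for a specific audience.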
Progression in Organization Mindset

Data literacy maturity increases along the time and maturity scale:

• Data Indifference: Most decisions are made from gut feeling rather than even being curious about data
• Data Aware: Data is collected but used only discretionarily or on a need basis
• Data Informed: Business users use data many times to make business decisions
• Data Driven: Data is the DNA for all decision making. Data collection, cleaning, analytics, and insights are mature

Source: https://www.smartsheet.com/data-driven-decision-making-management & AWS Academy documentation


Data Driven Organizations

Prominent Examples…

Tesla
• Sensor data (10x more than competitors) to train models to design autopilot/self-driving cars
• Collects an ocean of data about traffic, driver, car, and battery behavior. May use it for monetization

Netflix
• Personalized Movie Recommendation
• Auto Generated Video Thumbnails
• Marketing Optimization
• Predictive and prescriptive analysis

SpaceX
• Data crunching at a rapid pace
• Digital Development
• Reusable rockets
• Data forecast effective yet precision Manufacturing
• Data for Operations

Source:
SpaceX Data: SpaceX: Enabling Space Exploration through Data and Analytics - Digital Innovation and Transformation (harvard.edu)
Lead from the front with Data

Insurance Industry
• Targeted marketing
• Underwriting agility
• Setting pricing and reducing costs
• Claims management
• Fraud detection

Intelligence
• Monitoring and analyzing all possible sources for lawful surveillance
• Automate and apply AI wherever possible as data capture growth is exponential

Healthcare
• Pharmaceutical Research and Development
• Disease Detection
• Health Insurance Risk Assessment

Renewable Energy Industry
• Use customer feedback, smart meters, and transactional data to predict production and need of energy
• Operational efficiency
• Automation
• Building smart cities with confidence

Sports
• Helping teams strategize
• Identifying hidden players and skills
• Game plan revisions
• Fantasy sports
• Expanding partnerships

Source:
Big data in Insurance: https://www.actuary.org/sites/default/files/files/publications/BigDataAndTheRoleOfTheActuary.pdf
Book: Army of None: Autonomous Weapons and the Future of War, by Paul Scharre
Sports and Data: How Data Analysis In Sports Is Changing The Game (forbes.com)
All is good…But then…

Per Gartner, NewVantage Partners…
around 60% of data initiatives fail

The real question is:

If Failure Is Not an Option, Why Are Successful Data Initiatives So Rare?

Source: https://www.bcg.com/publications/2020/increasing-odds-of-success-in-digital-transformation
Gartner Predicts 80% of D&A Governance Initiatives Will Fail by 2027, Due to a Lack of a Real or Manufactured Crisis
Hindrances to Data Progression

• NO DATA QUALITY, NO GEN AI – PERIOD! *
• Reliability or quality of data (31% of surveyed experts) was given as the biggest barrier to implementing GenAI projects.
• Data does not lie, but wrong or stale data can misdirect
• Achieving data-driven leadership remains an elusive aspiration for most organizations
• Data initiatives cannot be executed and managed in silos
• Becoming data-driven requires an organizational focus on cultural change
• Data must be considered part of business strategy rather than just technology enablement

Source *- wavestone-global-technology-data-leaders-survey-2024.pdf
2024 Company Outlook: Data Investments

Source: data-ai-executive-leadership-survey-2024.pdf

2024: State of Generative AI in Companies

Source: data-ai-executive-leadership-survey-2024.pdf

High-Level Challenges in Data Initiatives

Cultural impediments continue to represent the greatest obstacle to corporate goals of achieving a data-driven organization and establishing a data culture within the firm. Not surprisingly, once again, organizations highlighted that the greatest challenge to becoming data-driven was a function of culture, people, process change, and organizational alignment, and had little to do with technology limitations.

Source: data-ai-executive-leadership-survey-2024.pdf
CDO/CDAO Role, Skills, Responsibilities

While there has been notable improvement in the percentage of organizations that report the CDO/CDAO role now being well understood within their organizations, the percentage remains below 50%, pointing to an ongoing gap between expectations and understanding. Also, while there is near unanimity that the CDO/CDAO role is necessary today (97.2%), a smaller percentage (88.8%) see the role as being necessary 5 years from now.

Source: data-ai-executive-leadership-survey-2024.pdf
How is the CDO/CDAO role perceived?

The nature of CDO/CDAO reporting also appears to vary by industry and whether the role is
seen as primarily an offensive/revenue role versus a defensive/risk function. While the role
appears to have evolved into more of an offensive role for most organizations, particularly in
industries such as consumer packaged goods and retail, for highly regulated industries such as
financial services, risk, regulatory, and compliance demands continue to be a dominant
function of the role.

Source: data-ai-executive-leadership-survey-2024.pdf
Data value over time… a diminishing effect

Value of data versus data processing maturity & agility:
• Near real time: Preventive & Predictive (most valuable)
• Within seconds to minutes: Actionable
• Within hours: Reactive
• Within days to months: Historical (less valuable)
The trade-offs of data-driven decisions

Cost, Speed of Insights, and Prediction Accuracy: the triple intersection is obviously desired but challenging to achieve in reality

Cost
• How much should you invest to go faster or predict more accurately?
• How much incremental improvement justifies the additional cost?

Speed
• How quickly do you need an answer?
• Can you sacrifice accuracy for speed?

Accuracy
• How accurate does the prediction need to be?
• Does waiting for a better answer outweigh answering more quickly?

Five Well-Known Data Challenges

Five Key Challenges (5 V’s) in the Data Domain

• volume

• velocity

• variety

• veracity

• value

Volume

Amount of Data
• It is the base of big data

• It is the initial size and amount of data that is collected

• If the volume of data is large enough, it can be considered big data

Velocity

• How quickly data is generated

• How quickly that data moves

• Companies need their data to flow quickly to make it available at the right time to enable business decision making

Example – In healthcare, there are many medical devices made today to monitor patients and collect data. From in-hospital medical equipment to wearable devices, collected data needs to be sent to its destination and analyzed quickly.

Variety

• Diversity of the data types

• Outside of the company

• Within the company

• Structured

• Unstructured

• Semi-structured
Type of Data

Structured
• Data that has been predefined and formatted to a set structure
• Easily used by business users
• Predefined structure makes it easy on machine learning algorithms
• Cons: Limited storage formats and choices (RDBMS or DW)
• Cons: Predefined structure also forces limitations on use and manipulation

Unstructured
• Data stored in its native format and not processed until it is used
• Needs analytics to derive patterns and behaviors
• Native format helps to use data as-is, and it is easy to collect as there are not many rules on the structure
• Cons: Requires more storage compared to structured data, e.g. data lakes or cloud data DWs
• Cons: Limited skill set availability due to the technical nature of the toolset and frameworks
Sources of Data

Source | Example | Type | Complexity | Velocity | Volume | Variety
Business or Enterprise Application | HRMS, ERP, CRM, PPM, EMR | Structured | Low | Mid | Mid | Low
Documents | PDF, XLS, JSON and so on | Unstructured | Mid | Low | Low | Mid
Collaboration Systems/Public Web sites | Emails, Slack, Teams, Govt sites, business sites | Unstructured | Mid | Mid | High | Mid
Media | Videos, Audio files, Images | Unstructured | High | High | High | High
Social Networks | LinkedIn, Twitter, TikTok, Instagram | Unstructured | High | High | High | Mid
Data Storage | File streams, NoSQL, R-ORDBMS | Structured/Hybrid | Low | Mid | High | Mid
Log files | Application, events, transactions, clickstream | Unstructured | Mid | High | High | High
Sensor Data/IoT Data | Medical devices, Household Devices, Security systems, Flight systems | Unstructured | Low | High | High | High
Variety – cont.

• Semi-structured
• Hasn’t been organized
• Has meta-data – example – pictures grouped by tags
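
The three data types under Variety can be contrasted with a minimal Python sketch. Everything in it (the CSV columns, the photo file names and tags, the review text, and the variable names) is hypothetical sample data invented for illustration; only the standard-library `csv`, `io`, and `json` modules are used.

```python
import csv
import io
import json

# Structured: every record follows the same predefined columns.
structured_csv = "customer_id,name,country\n1,Asha,IN\n2,Ben,US\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: no fixed schema, but metadata (here, tags) describes each item.
semi_structured_json = '''
[
  {"file": "img_001.jpg", "tags": ["beach", "sunset"]},
  {"file": "img_002.jpg", "tags": ["beach"], "camera": "phone"}
]
'''
photos = json.loads(semi_structured_json)
beach_photos = [p["file"] for p in photos if "beach" in p.get("tags", [])]

# Unstructured: free text kept in its native format; any structure is
# imposed only at analysis time (here, a naive whitespace word count).
unstructured_text = "Loved the beach resort, but check-in was slow."
word_count = len(unstructured_text.split())
```

Note how the second photo carries an extra `camera` field without breaking anything: that flexibility is what separates semi-structured data from the fixed columns of the CSV.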

Veracity

• Defined as the accuracy or truthfulness of a data set
• In many cases, the veracity of the data sets can be traced back to the source provenance.
• However, when multiple data sources are combined, e.g. to increase variety, the interaction across data sets and the resultant non-homogeneous landscape of data quality can be difficult to track.

Source: https://datascience.aero/big-data-veracity-value/
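
One way to keep veracity traceable when sources are combined is to tag every record with its provenance before merging. The sketch below is a minimal illustration under that assumption, not a prescribed method; the two record lists, the `tag_source` helper, and the field names are hypothetical.

```python
# Hypothetical records from two sources describing the same customers.
crm_records = [
    {"customer_id": 1, "email": "asha@example.com"},
    {"customer_id": 2, "email": "ben@example.com"},
]
billing_records = [
    {"customer_id": 2, "email": "ben.old@example.com"},
    {"customer_id": 3, "email": "chen@example.com"},
]


def tag_source(records, source):
    """Return copies of the records with a provenance field attached."""
    return [{**record, "source": source} for record in records]


combined = tag_source(crm_records, "crm") + tag_source(billing_records, "billing")

# Group values by customer so disagreements between sources become visible.
emails_by_customer = {}
for record in combined:
    by_source = emails_by_customer.setdefault(record["customer_id"], {})
    by_source[record["source"]] = record["email"]

# A conflict is any customer for whom the sources report different emails.
conflicts = {
    customer_id: by_source
    for customer_id, by_source in emails_by_customer.items()
    if len(set(by_source.values())) > 1
}
```

Here only customer 2 ends up in `conflicts`, and because each value is still labeled with its source, someone can decide which system to trust.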

Value

• What can the organization do with the data?

• What insight can be gained?

• How could that impact the bottom line?

• How could that help gain a competitive advantage?



Data Challenges and Opportunities

3 Ps: People, Process, Platform
4 Cs: Clean, Consistent, Conformed, Current
5 Vs: Velocity, Variety, Volume, Veracity, Value

Business Outcomes
Is our new talent aligned with what modern data initiatives expect?

• Agility in thought process and adoption of new technology paradigms
• Going beyond traditional tech stack
• Roles going beyond mere data analysis
• End to End data value realization
The current data roles are evolving… but are we?

• The field of data is now adding technology strategy, governance, and business outcomes to the traditional model.
• Data roles are not limited to only data analysis and visualization anymore.

As data vision is maturing and data foundations are laid for aspiring data-driven organizations, the landscape is expecting more data engineers, data-ops, data governance, MDM, ml-ops, and ml engineers.

Source: Stanford Lightcast AI Index Report 2023


But where to start?

Data tools and technologies are growing at an incredible pace

Source:
2012 Big Data Landscape – Forbes Blog
2017 Big Data Landscape – Matt Turck

2021+… The data landscape is an ocean

Source: https://mattturck.com/data2021/

This all sounds logical...

But how to realize it?

• ALWAYS connect a business outcome to your data projects
• Understand the Data Lifecycle
• Establish standard data infrastructure patterns such as Data Pipelines
• Fail fast and adopt cautiously. No to "Death by Experiments!"
• Improve and Automate

Organize: Data {People, Process, and Platforms}

Data Life Cycle

Two perspectives, similar process

Data Engineering & Analytics Lens
Data Scientist Lens

Data analysts, data scientists, predictive modelers, statisticians, and other analytics professionals
• collect
• process
• clean and
• analyze
growing volumes of structured transaction data as well as other forms of data not used by conventional BI and analytics programs.
Source:
https://online.hbs.edu/blog/post/data-life-cycle
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
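
The collect, process, clean, and analyze steps above can be sketched as a tiny Python pipeline. This is a minimal illustration only: the raw rows, the function names, and the choice of an average as the "analysis" are hypothetical, and a real pipeline would read from actual source systems.

```python
from statistics import mean


def collect():
    """Collect: stand-in for reading from a source system (hypothetical raw rows)."""
    return ["2024-01-01,100", "2024-01-02,", "2024-01-03,140", "bad row"]


def process(raw_rows):
    """Process: split each raw line into its comma-separated fields."""
    return [tuple(row.split(",")) for row in raw_rows]


def clean(records):
    """Clean: drop malformed rows and rows with a missing amount."""
    cleaned = []
    for record in records:
        if len(record) != 2:
            continue  # wrong number of fields
        date, amount = record
        if not amount.strip():
            continue  # amount is missing
        cleaned.append((date, float(amount)))
    return cleaned


def analyze(records):
    """Analyze: compute a simple summary statistic over the clean records."""
    return mean(amount for _date, amount in records)


average_amount = analyze(clean(process(collect())))
```

Each stage takes the previous stage's output, which is the same shape a data pipeline formalizes: two of the four raw rows are dropped during cleaning before the average is computed.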
Critical Elements of the Data Lifecycle

Source: Reis, Joe; Housley, Matt. Fundamentals of Data Engineering (p. 24). O'Reilly Media.
Modern Data Professional

• Must have a good understanding of the Data Lifecycle
• Must have a broad understanding of the technology stack
• Cannot remain an island within their own role
• Must continuously evolve and upgrade capabilities

Effective data analysis solutions require both storage and the ability to analyze data in near real time, with low latency, while yielding high-value returns.

Next Week

- Data Pipelines: What and Why?
- Azure Labs Introductions

References:
• https://www.newvantage.com/_files/ugd/e5361a_ad5a8b3da8254a71807d2dccdb0844be.pdf

• https://online.hbs.edu/blog/post/data-life-cycle

• https://mattturck.com/data2021/

Thank you…
