ISOM3360 Data Mining for Business Analytics
Introduction
Instructor: Yi Yang
Department of ISOM
Spring 2025
Welcome
❑ Course information
2
About me
❑ Instructor: Yi Yang, Associate Professor, ISOM
❑ Research interests: machine learning
❑ Ph.D. from Northwestern University
❑ Taught at UIUC for two years
❑ Worked at IBM Research and Amazon
❑ Consulting for a hedge fund on machine learning
❑ Teaching ISOM 3360, 3370, 5270 and ExecEd
3
Course information
❑ 18~19 lectures
❑ 10 lab sessions
❑ Hands-on problem solving using Python
❑ 3 assignments
❑ 1 team project (3~4 people per group)
❑ 2 exams: Midterm exam (tentatively Mar 20,
7:00pm-8:30pm); Final exam (TBD)
4
Course material
❑ All materials (e.g., lecture slides, readings)
will be posted on the Canvas course website.
❑ Data Mining for Business Analytics: Concepts, Techniques, and
Applications in R, by Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R.
Patel, Kenneth C. Lichtendahl
❑ Data Science for Business: What you need to know about data mining
and data-analytic thinking, by Foster Provost, Tom Fawcett
❑ Learning Data Mining with Python, by Robert Layton
5
Grading components
❑ Lab 5%
❑ Class Attendance/Participation 10%
❑ Homework Assignments 10%
❑ Group Project 15%
❑ Midterm Exam 28%
❑ Final Exam 32%
6
❑ Instructor: Yi Yang
❑ Email: imyiyang@ust.hk (begin the subject line with [ISOM3360])
❑ Office Hours: by appointment
❑ Teaching Assistant:
❑ Sophie Gu, imsophie@ust.hk
7
Academic integrity
8
Questions
9
You may have heard of these
❑ Big Data
❑ Artificial Intelligence
❑ Data Mining
❑ Data Science
❑ Machine Learning
❑ Python
10
Data mining
Data
| | Structured Data | Unstructured Data |
|---|---|---|
| Definition | Data that has a predefined and organized format or schema, often stored in a database or spreadsheet | Data that does not have a predefined or organized format or schema |
| Examples | Tables in a relational database, spreadsheets, log files, financial data | Text documents, social media posts, images, videos, audio recordings, email messages, web pages, GPS data, etc. |
| Techniques | Machine learning (simple) | Natural language processing, computer vision, speech recognition (hard) |
| Data Volume | Usually smaller in volume | Usually much larger in volume |
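To make the contrast concrete, here is a minimal sketch of how a piece of unstructured text can be turned into a structured row of word counts (a "bag of words"). The review text and the helper name `bag_of_words` are made up for illustration; real text-processing pipelines typically do much more.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical unstructured input: a free-text customer review.
review = "Great phone. The battery is great, but the screen scratches easily."

counts = bag_of_words(review)

# Each distinct word can act as a column name and each count as a cell value,
# which gives the text a predefined, table-like format.
for word, count in sorted(counts.items()):
    print(f"{word}: {count}")
```

Once the text is in this form, it can be stored in a table alongside other structured fields.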
13
Unstructured data
❑ In addition to traditional numerical data, a wealth
of potentially valuable business information may
originate in unstructured forms.
14
Unstructured data: Text
Unstructured data: Image
Data mining
❑ Finding patterns in large amounts of data, using
machine learning methods, for actionable
insights
Prediction is the key
❑ Prediction is the key for decision
making under uncertainty.
❑ Better prediction creates competitive
advantages.
Machine Learning
❑ Machine learning algorithms enable computer programs to
automatically analyze data, recognize patterns, and make
predictions for new unseen data.
❑ Machine learning models make predictions.
[Figure: Face recognition from image. Machine learning induces a pattern from data and outputs "It's a FACE".]
❑ Q: Is ChatGPT a machine learning model?
An Example: customer complaint management
Pain point: Firms receive customer complaint
filings on many different aspects. How can they handle the
complaints in a timely manner?
22
Traditional vs. machine learning solution
❑ Traditional: hire a team of customer service staff to read the
complaints and forward them to different teams for
handling.
❑ Machine learning: automatically classify customer complaints
into different categories (e.g., shipping-related, product-defect-
related); automatically rank the priority of the complaints; and
forward the complaints to different teams for handling.
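The classification step of the machine learning solution can be sketched with a small Naive Bayes text classifier written in plain Python. This is only an illustration under stated assumptions: the six training complaints, the two category names, and the helper names `tokenize`, `train`, and `classify` are all made up, and a real system would learn from far more labeled complaints (and would also need a separate step for ranking priority, which is not shown).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in examples:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Prior: share of training complaints in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical labeled complaints (made up for illustration).
training_data = [
    ("My package arrived two weeks late", "shipping"),
    ("The courier lost my package", "shipping"),
    ("Delivery was delayed again", "shipping"),
    ("The blender stopped working after one day", "product defect"),
    ("Screen cracked and the button is broken", "product defect"),
    ("The zipper broke on first use", "product defect"),
]

model = train(training_data)
new_complaint = "Package still not delivered, delivery is late"
print(classify(new_complaint, model))
```

The predicted category could then be used to route the complaint to the matching team automatically.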
Business Intelligence pyramid (top to bottom)
❑ Decision making and strategic planning
❑ Data Mining, Machine Learning
❑ Data retrieval and aggregation
❑ Data management and storage
Predictive vs. Descriptive analytics
| | Descriptive Analytics | Predictive Analytics |
|---|---|---|
| Focus | Understanding past events | Predicting future outcomes |
| Goal | Summarize and present data | Build predictive models |
| Data Type | Historical data | Historical data |
| Analysis | Identify patterns and trends | Build models to make predictions |
| Use Case | Understand past performance | Forecast future outcomes |
| Example | Sales data for the past year | Sales forecast for next quarter |
| Output | Summary statistics and visualizations | Predictive models and forecasts |
| Techniques | Data aggregation, visualization, and basic statistics | Machine learning |
| Decision Making | Reactive | Proactive |
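The sales example in the comparison can be sketched in a few lines of Python. The quarterly sales figures below are made up, and the straight-line trend fitted by least squares is just one simple stand-in for a predictive model; it is not a method prescribed by the course.

```python
from statistics import mean

# Hypothetical quarterly sales (in $ thousands) for the past year.
sales = [100.0, 112.0, 118.0, 130.0]
quarters = list(range(1, len(sales) + 1))

# Descriptive analytics: summarize what already happened.
average_sales = mean(sales)
total_growth = sales[-1] - sales[0]

# Predictive analytics: fit a straight-line trend by least squares,
# then extrapolate it one quarter ahead.
q_mean = mean(quarters)
s_mean = mean(sales)
slope = sum((q - q_mean) * (s - s_mean) for q, s in zip(quarters, sales)) / sum(
    (q - q_mean) ** 2 for q in quarters
)
intercept = s_mean - slope * q_mean
next_quarter = len(sales) + 1
forecast = intercept + slope * next_quarter

print(f"Average sales last year: {average_sales:.1f}")
print(f"Growth from Q1 to Q4: {total_growth:.1f}")
print(f"Forecast for next quarter: {forecast:.1f}")
```

Both parts use the same historical data; only the second produces a statement about a quarter that has not happened yet.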
Exercise
As a company's marketing director, you want to know the answers to the
following questions. Which ones require a data mining solution?
❑ Who are the high-value customers?
❑ Is there an age difference between the high-value customers and the low-value
customers?
❑ Will a particular new customer be a high-value customer?
❑ How much in sales should I expect a new customer to generate?
| Customer | Gender | Age | Membership | Monthly Purchase | Amount |
|---|---|---|---|---|---|
| Alice | F | 25 | Y | 5 | $120 |
| Bob | M | 40 | Y | 3 | $30 |
| Charlie | M | 35 | Y | 6 | $210 |
| Doug | M | 18 | N | 4 | $95 |
| … | … | … | … | … | … |

Let's define customers whose amount > $100 as high-value customers. The rest are low-value customers.
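As a rough illustration, questions about existing customers can be answered by filtering and averaging the table itself. The sketch below uses only the four rows shown (the elided rows are left out), so the numbers are illustrative rather than conclusions about the full data; the dictionary keys are names chosen for this example.

```python
from statistics import mean

# The four customers listed in the table; the elided rows are not included.
customers = [
    {"name": "Alice", "gender": "F", "age": 25, "member": True, "monthly_purchases": 5, "amount": 120},
    {"name": "Bob", "gender": "M", "age": 40, "member": True, "monthly_purchases": 3, "amount": 30},
    {"name": "Charlie", "gender": "M", "age": 35, "member": True, "monthly_purchases": 6, "amount": 210},
    {"name": "Doug", "gender": "M", "age": 18, "member": False, "monthly_purchases": 4, "amount": 95},
]

HIGH_VALUE_THRESHOLD = 100  # amount > $100 defines a high-value customer

high_value = [c for c in customers if c["amount"] > HIGH_VALUE_THRESHOLD]
low_value = [c for c in customers if c["amount"] <= HIGH_VALUE_THRESHOLD]

# Who are the high-value customers?
high_value_names = [c["name"] for c in high_value]

# Is there an age difference between the two groups?
mean_age_high = mean([c["age"] for c in high_value])
mean_age_low = mean([c["age"] for c in low_value])

print("High-value customers:", high_value_names)
print("Mean age (high-value):", mean_age_high)
print("Mean age (low-value):", mean_age_low)
```

The questions about a new customer are different in kind: that customer's amount is not in the table, so answering them requires a model learned from the existing rows.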
Descriptive analytics
Exercise
❑ Say you work at a digital media company that provides
an online video streaming service. You have lots of data
about lots of users watching lots of movies and TV shows. What are
the use cases of predictive analytics?
Course objective
❑ You will learn
❑ Various machine learning models
❑ Hands-on experience through lab practice
❑ Analytical thinking through various business examples
❑ You will not learn
❑ Data warehousing, databases, big data techniques
❑ Business/managerial planning