Big Data Visual Analytics (CS 661)

Instructor: Soumya Dutta

Department of Computer Science and Engineering
Indian Institute of Technology Kanpur (IITK)
email: soumyad@cse.iitk.ac.in
Announcements
• For a quicker response, please email my CSE address rather than my IITK
address:
• My CSE email: soumyad@cse.iitk.ac.in

IITK CS661: Big Data Visual Analytics: Soumya Dutta 2

Acknowledgements
• Some of the following slides are adapted from the excellent course
materials made available by:
• Prof. Klaus Mueller (State University of New York at Stony Brook)
• Prof. Tamara Munzner (University of British Columbia)

Visual Design and Visual Variables

Key Visual Representations
• Gestalt Principles
• The tendency to perceive elements as belonging to a group, based on
certain visual properties
• Pre-attentiveness
• Certain low-level visual aspects are recognized before conscious
awareness
• Visual variables
• The different visual aspects that can be used to encode information

Gestalt Principles
• “Gestalt” is German for “unified whole”
• Grasp the "totality" of something before worrying about the details
• Proximity, similarity, closure, multistability, …

Rubin’s vase: what do you see in this figure?

Pre-attentiveness
• Also called pop-out

Visual Variables
• Two planar variables
• Spatial dimensions (X and Y)
• Six Retinal variables
• Size
• Color
• Shape
• Orientation
• Texture
• Brightness
• Each retinal variable allows one more data attribute to be encoded, in addition to the two planar dimensions

Figure panels: Planar, Size, Brightness, Shape, Texture, Color, Orientation

Takeaways
• The planar (position) variables are the strongest visual variables
• Maps to proximity
• Provides an intuitive organization of information
• Things close together are perceptually grouped together (Gestalt)
• Size and brightness are good secondary visual variables to encode
relative magnitude
• Color is a good visual variable for labeling
• Texture can also be used for labeling, but it supports pop-out only weakly
• Shape provides only limited pop-out

Considerations with Scalability for Big Data
• Must be scalable to
• Number of data points
• Number of dimensions
• Data sources
• Diversity of data sources (heterogeneity)
• Number of users

Visual Analytics can help!

What is Visual Analytics?
• Visualization plus...
• Data processing (analytics)
• Intelligent computing (AI, machine learning)
• Interaction (HCI)
• Pattern discovery
• Storytelling and sensemaking
• Behavioral psychology (cognitive science, human factors)

Visual Analytics is the process of analytical reasoning, often supported by a highly interactive visual interface/tool

Visual Information Seeking Mantra
• Ben Shneiderman’s Mantra: Overview first, zoom and filter, then details-on-demand!
• Overview first
• Zoom
• Filter
• Details on demand

Another Paradigm: Focus + Context
• Focus + Context:
• One single view which shows information in direct context
• Maintains continuity and does not require the viewer to shift back and forth between views
• But: there is distortion!

https://www.youtube.com/watch?v=acsFQvv4B0Q

Use of Visualization
• Visual Perception
• Fast screening of a lot of data
• Pattern recognition
• High-level cognition
• Interaction
• Direct manipulation of data and visualization (Human in the loop)
• Two-way communication

Humans are important!

But Humans are imperfect too!!

Humans Are Imperfect
• Humans tend to overlook/ignore non-focused (and unexpected)
objects even when they are very close and obvious
• Humans also have limited working memory
• Fine details are quickly forgotten when focus changes
• Need to preserve temporal context

Humans Are Imperfect
• Spot the difference: Change blindness

Figure sources: Google, Wikipedia

Human Limitations for Visualization
• The Magic Number Seven (7 ± 2) for visualization
• Not more than 7 ± 2 segments in a pie chart
• Not more than 7 ± 2 colors in a line chart
• and so on …

Miller, G. A. (1956). "The magical number seven, plus or minus two: Some limits on our capacity for processing information". Psychological Review, 63(2), 81–97.

Example of Visual Complexity

Do we really need the background grid? Maybe not!

Handling Data

What Do We Do After Getting the Raw Data?
• Real world data can be dirty!

• Data cleaning (Wrangling)

• Missing values
• Noisy data
• Deal with outliers
• Standardize/normalize
• Resolve inconsistency
• Fuse/merge

Figure: Data Cleaning Cycle (source: https://blog.insycle.com/data-cleaning-hubspot)

Missing Data: Why?
• Data may not always be available or complete!

• Missing data may be due to

• Equipment malfunction
• Data that was inconsistent with other recorded data and thus deleted
• Data not entered due to misunderstanding
• Certain data not being considered important at the time of entry
• Many other reasons

Missing Data: How to Handle?
• How would you estimate the missing value for a dataset?
• Ignore or put in a default value
• Manually fill in (can be tedious or infeasible for large data)
• Use the available value of the nearest neighbor
• Average over all the values
• Use probabilistic methods (regression, Bayesian, decision tree)
• Use AI/ML models to predict missing data
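
Two of the strategies above, filling with the average and filling from the nearest neighbor, can be sketched in a few lines. This is only an illustration under stated assumptions: missing entries are marked with `None`, "nearest neighbor" is taken to mean the closest observed entry by position in an ordered sequence (the slide may equally mean nearest in feature space), and the function names `impute_mean` and `impute_nearest` are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_nearest(values):
    """Replace each None with the observed value closest by index.

    Assumption: the data is an ordered sequence, so "nearest" means
    nearest position. Ties go to the earlier (left) neighbor.
    """
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    result = []
    for i, v in enumerate(values):
        if v is None:
            j = min(observed_idx, key=lambda k: abs(k - i))
            result.append(values[j])
        else:
            result.append(v)
    return result

readings = [2.0, None, 4.0, 6.0, None]
mean_filled = impute_mean(readings)        # [2.0, 4.0, 4.0, 6.0, 4.0]
nearest_filled = impute_nearest(readings)  # [2.0, 2.0, 4.0, 6.0, 6.0]
```

Mean filling keeps the overall average unchanged but flattens local structure, while nearest-neighbor filling preserves local structure at the cost of repeating values.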

Data Normalization and Standardization
• Sometimes we like to have all variables on the same scale
• Min-max normalization

• Standardization

• Clipping tails and outliers

• Set all values beyond ±3σ to the value at ±3σ
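
The formulas for these three operations did not survive in the slide text, so the sketch below uses the standard textbook definitions as an assumption: min-max normalization maps x to (x - min) / (max - min), standardization maps x to (x - mean) / standard deviation, and clipping limits values to mean ± k standard deviations. The function names are hypothetical, and the population standard deviation is an arbitrary choice here.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale to [0, 1]. Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def clip_tails(values, k=3.0):
    """Set every value beyond mean +/- k*sigma to that boundary value."""
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(v, lo), hi) for v in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, sigma 2
normalized = min_max_normalize(data)
standardized = standardize(data)
# k=1 is used only so the effect is visible on this tiny sample;
# the slide clips at 3 sigma.
clipped = clip_tails(data, k=1.0)
```

On this sample the clipping bounds with `k=1.0` are 3 and 7, so only the first and last values change.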

Normalization

Standardization

Robust Scaling

• IQR = Q3 – Q1
• Difference between the 75th percentile and the 25th percentile data
• Robust to outliers
• Relies on the median and IQR, which are robust to extreme values
• Ensures that most of the data falls within a consistent range after scaling
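
A minimal sketch of robust scaling follows, assuming the common definition (x - median) / IQR, since the formula itself is not in the slide text. The function name `robust_scale` is hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one of several quartile conventions.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # IQR = Q3 - Q1
    med = median(values)
    return [(v - med) / iqr for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
scaled = robust_scale(data)         # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier is still present after scaling, but it did not influence the median or the IQR, so the other four values land in a compact range around zero.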

Comparison Among Different Methods of Scaling

Figure panels: Raw Data, Min-max normalization, Standardization, Robust Scaling
Source: https://www.geeksforgeeks.org/standardscaler-minmaxscaler-and-robustscaler-techniques-ml/

Noisy Data
• Noise = Random error in a measured variable
• Faulty data collection instruments
• Data entry problems
• Data transmission problems
• Technology limitations
• Inconsistent naming conventions

Noisy Data: What to Do?
• Binning
• Replace data with bin centers
• Clustering
• Detect and remove outliers
• Semi-automated method
• Combined human and computer inspection
• Detect suspicious values and check them manually
• Regression
• Smooth data by fitting to a regression function
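
Binning and regression smoothing from the list above can be sketched as follows. The details are assumptions for illustration: the bins are equal-width, the regression function is a straight line fitted by ordinary least squares, and the function names are hypothetical.

```python
def smooth_by_bin_centers(values, n_bins):
    """Equal-width binning: replace each value with the center of its bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    smoothed = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        smoothed.append(lo + (idx + 0.5) * width)
    return smoothed

def smooth_by_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by least squares; return fitted values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x for x in xs]

noisy = [0.0, 1.0, 2.0, 7.0, 8.0, 10.0]
binned = smooth_by_bin_centers(noisy, n_bins=2)  # [2.5, 2.5, 2.5, 7.5, 7.5, 7.5]

fitted = smooth_by_linear_regression([0, 1, 2, 3], [1.0, 2.9, 5.1, 7.0])
```

Fewer bins give stronger smoothing but discard more detail, and a straight line is only appropriate when the underlying trend is roughly linear.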

Deal with Small Data → Data Augmentation
• Can you invent meaningful new data?
• Data Augmentation
• Strategy to artificially synthesize new data from existing data
• Common techniques (for images):
• Rotations
• Translations
• Zooms
• Flips
• Color perturbations
• Crops
• Adding noise by jittering
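
Several of these image augmentations can be illustrated on a tiny grayscale image stored as a list of rows. This is a sketch under simplifying assumptions: real pipelines would normally use an image library, the function names are hypothetical, and only 90-degree rotation, integer translation, cropping, flipping, and uniform jitter are shown.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise (rows become columns)."""
    return [list(col) for col in zip(*img[::-1])]

def translate_right(img, shift, fill=0):
    """Shift pixels right by `shift` columns, padding with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in img]

def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def jitter(img, scale, rng):
    """Add uniform noise in [-scale, scale] to every pixel."""
    return [[p + rng.uniform(-scale, scale) for p in row] for row in img]

image = [[1, 2, 3],
         [4, 5, 6]]

flipped = flip_horizontal(image)        # [[3, 2, 1], [6, 5, 4]]
rotated = rotate_90_clockwise(image)    # [[4, 1], [5, 2], [6, 3]]
shifted = translate_right(image, 1)     # [[0, 1, 2], [0, 4, 5]]
cropped = crop(image, 0, 1, 2, 2)       # [[2, 3], [5, 6]]
jittered = jitter(image, 0.1, random.Random(42))
```

Each transformed image is a new training sample that keeps the label of the original, which is what makes the synthesized data meaningful.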

Synthetic Data Generation for Imbalanced Classification
• Needed when the data has a severe imbalance in the class representation
• An ML model trained on such data tends to perform poorly on the minority class
• SMOTE (Synthetic Minority Oversampling Technique) can help
• A data augmentation method

Figure: Imbalanced Data

SMOTE: Synthetic Data Generation for Imbalanced Classification
• How do we generate samples for the minority class?
1. Randomly under-sample the majority class
2. Select a minority class instance (x) at random and find its k nearest minority class neighbors
3. Select one of the k neighbors at random, say (y)
4. Generate a synthetic instance as a convex combination of the two chosen instances x and y
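
The four steps above can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: points are tuples of floats, distance is Euclidean, the convex combination is x + λ(y - x) with λ drawn uniformly from [0, 1], and the function names `undersample_majority` and `smote` are hypothetical.

```python
import math
import random

def undersample_majority(majority, n_keep, rng):
    """Step 1: keep a random subset of the majority class."""
    return rng.sample(majority, n_keep)

def smote(minority, n_new, k, rng):
    """Steps 2-4: synthesize n_new minority points."""
    synthetic = []
    for _ in range(n_new):
        # Step 2: pick a minority instance x and its k nearest
        # minority neighbors (excluding x itself).
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbors = sorted(others, key=lambda p: math.dist(p, x))[:k]
        # Step 3: pick one neighbor y at random.
        y = rng.choice(neighbors)
        # Step 4: convex combination x + lam * (y - x), 0 <= lam <= 1,
        # i.e. a random point on the segment between x and y.
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, y)))
    return synthetic

rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0) for i in range(20)]

kept_majority = undersample_majority(majority, 6, rng)
new_minority = smote(minority, n_new=6, k=2, rng=rng)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.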

SMOTE: Synthetic Data Generation for Imbalanced Classification
• Example:

Figure panels: Imbalanced Data, SMOTE + random under-sampling
Source: https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/

Data Augmentation for Visualization
• Generate new samples according to the data distributions
• Cluster the data (outliers may form clusters!)
• The size of each cluster represents its percentage in the population
• Randomize new samples – bigger clusters get more samples

Augmentation rate ~ Cluster size
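
One way to read "augmentation rate ~ cluster size" is to give each cluster a share of the new samples proportional to its size. The sketch below assumes that cluster labels are already available from some clustering step, and that a new sample is made by adding small Gaussian noise to a randomly chosen cluster member; both are assumptions for illustration, and the function name `augment_by_cluster` is hypothetical.

```python
import random
from collections import defaultdict

def augment_by_cluster(points, labels, n_new, jitter, rng):
    """Generate about n_new samples, split across clusters by cluster size."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    total = len(points)
    new_points, new_labels = [], []
    for lab, members in clusters.items():
        # Bigger clusters get more samples. Rounding means the overall
        # count can differ slightly from n_new.
        quota = round(n_new * len(members) / total)
        for _ in range(quota):
            base = rng.choice(members)
            new_points.append(tuple(c + rng.gauss(0.0, jitter) for c in base))
            new_labels.append(lab)
    return new_points, new_labels

points = [(0.0, 0.0)] * 6 + [(10.0, 10.0)] * 2
labels = ["a"] * 6 + ["b"] * 2

new_points, new_labels = augment_by_cluster(
    points, labels, n_new=8, jitter=0.1, rng=random.Random(1)
)
```

With 6 of 8 points in cluster "a", that cluster receives 6 of the 8 new samples, so the augmented data keeps the original proportions.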

Deal with Big Data → Data Reduction!
• Purpose
• Reduce the data to a size that can be feasibly stored without losing
important information
• Reduce the data so a mining algorithm can be feasibly run

• Alternatives
• Buy more storage
• Buy more computers or faster ones
• Develop more efficient algorithms

• In practice, all of this is happening at the same time

• But data volume and complexity are growing faster
• So, data reduction is important!

Data Reduction: How?
• Summarization (Later in the course)
• Binning
• Distribution-based
• Clustering
• Sampling (Later in the course)
• Systematic/Regular
• Random
• Stratified
• Adaptive/Data-driven
• Importance-driven
• Cluster-based
• Dimension Reduction (Later in the course)
• AI/ML techniques (Later in the course)

Figure labels: Big Data, Summary Data, Sampling, AI/ML model
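
Three of the sampling strategies listed above (systematic, random, and stratified) can be sketched as follows. The function names, the record layout, and the rule of keeping at least one item per stratum are assumptions made for illustration.

```python
import random
from collections import defaultdict

def systematic_sample(items, step):
    """Systematic/regular sampling: keep every step-th item."""
    return items[::step]

def random_sample(items, n, rng):
    """Simple random sampling without replacement."""
    return rng.sample(items, n)

def stratified_sample(items, key, fraction, rng):
    """Sample the same fraction from each stratum defined by key(item)."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Keep at least one item so small strata are not lost.
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample

rng = random.Random(7)
records = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]

regular = systematic_sample(records, step=10)
simple = random_sample(records, 10, rng)
stratified = stratified_sample(records, key=lambda r: r[0], fraction=0.1, rng=rng)
```

All three calls reduce 100 records to 10, but only the stratified sample guarantees that the rare group "b" is represented, here with 1 record next to 9 from group "a".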
