02 Intro To Data Viz

The document provides an introduction to data visualization, emphasizing its importance in conveying information and understanding data through visual perception. It discusses various types of visual representations, their goals, and the principles of effective visual encoding, including the use of color and visual channels. Additionally, it outlines different data types and datasets, along with design principles for creating effective visualizations.


Introduction to Visualization

Saravanan Thirumuruganathan
Topic Goals
• Understand how visualizations convey information

• Understand how to use visual perception capabilities for visualization

• Critically evaluate visualizations in the wild


Data Visualization
• Data visualization is
• the graphical representation of data and information
• the transformation of the symbolic into the geometric [McCormick et al. 1987]

• Often uses visual elements such as charts, graphs, maps, infographics

• Goal: see and understand trends, outliers, and patterns in data.


Data Visualization
• Graphs typically display quantitative information, and include ≥2
scales/axes.
• Charts display discrete relationships among discrete entities.
• Example: flow charts
• Maps display spatial information, possibly with labels and other
information.
• Diagrams are schematic pictures whose parts are symbolic (i.e., not
photographic).
• Infographics are a sort of hybrid of all of the above
Visualization Goals
• Presentation
• Known facts about data
• Task: Communicate results
• Exploration
• Data without hypothesis
• Task: Generate hypothesis
• Confirmation
• Hypothesis is given
• Task: Verify / falsify hypothesis
Visual Representations vs Summary Statistics

Anscombe's Quartet
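
The point of Anscombe's quartet can be checked numerically. The sketch below uses the published quartet values and a hand-written Pearson correlation (the helper names `pearson` and `summarize` are illustrative, not from the slides). The four datasets should give near-identical summaries even though their plots look nothing alike.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "r": round(pearson(xs, ys), 2),
    }

for name, (xs, ys) in quartet.items():
    print(name, summarize(xs, ys))
```

Only a plot reveals that one dataset is roughly linear, one is curved, and two are dominated by a single outlier.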
Visual Representations vs Summary Statistics

Datasaurus Dozen

https://www.math.csi.cuny.edu/~mvj/GC-DataViz-S23/lectures/L1.htm
Why Data Visualization?
"The ability to take data—to be able to understand it,
to process it, to extract value from it, to visualize it, to
communicate it—that's going to be a hugely
important skill in the next decades, […] because now
we really do have essentially free and ubiquitous data.
So the complementary scarce factor is the ability to
understand that data and extract value from it."

Hal Varian, Google's Chief Economist


Why Data Visualization? A Poverty of
Attention
"What information consumes is rather obvious:
it consumes the attention of its recipients.
Hence a wealth of information creates a poverty of attention,
and a need to allocate that attention efficiently among the
overabundance of information sources that might consume it.”

Herbert A. Simon, Nobel Prize winner
Fields: political science, economics, computer science (AI), cognitive psychology.
Data Types: Shneiderman’s Taxonomy
• 1D (sequences)
• Temporal
• 2D (maps)
• 3D (shapes)
• nD (relational)
• Trees (hierarchical)
• Networks (graphs)
• Others (text)

The Eyes Have It: A Task by Data Type Taxonomy for Information Visualization [Shneiderman, 96]
Data Types: Munzner’s Taxonomy
• Item
• An individual entity that is discrete (e.g., a number, a row of a table, etc.)
• Attribute
• Some measurable property of the item
• Link
• A relationship between items
• Position
• Spatial data (e.g., location, coordinates, etc.)
• Grid
• The strategy for sampling continuous data, in terms of the geometric and/or topological relationships between cells.
Items
• An item is a discrete individual entity
• row in a table
• node in a network
Attributes
• Nominal (labels or categories)
• Fruits: apples, oranges, …
• Ordered
• Quality of meat: Grade A, AA, AAA
• Quantitative - Interval (location of zero arbitrary)
• Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
• Only differences (i.e., intervals) may be compared
• Quantitative - Ratio (zero fixed)
• Physical measurement: Length, Mass, Temp (on an absolute scale), …
• Counts and amounts
Attributes
• Nominal (labels or categories)
• Operations: =, ≠
• Ordered
• Operations: =, ≠, <, >
• Q - Interval (location of zero arbitrary)
• Operations: =, ≠, <, >, -
• Can measure distances, differences etc
• Q - Ratio (zero fixed)
• Operations: =, ≠, <, >, -, ÷
• Can measure ratios or proportions
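
The list above is cumulative: each attribute type supports every operation of the previous type plus one more. A minimal sketch of that idea follows. The names `ALLOWED_OPS` and `is_meaningful` are hypothetical, and division is written as `/`.

```python
# Operations that are meaningful for each attribute type.
# Each level inherits the operations of the level before it.
ALLOWED_OPS = {"nominal": {"==", "!="}}
ALLOWED_OPS["ordered"] = ALLOWED_OPS["nominal"] | {"<", ">"}
ALLOWED_OPS["interval"] = ALLOWED_OPS["ordered"] | {"-"}
ALLOWED_OPS["ratio"] = ALLOWED_OPS["interval"] | {"/"}

def is_meaningful(attr_type, op):
    """Return True if `op` makes sense for an attribute of type `attr_type`."""
    return op in ALLOWED_OPS[attr_type]

# Year of birth is an interval attribute: differences make sense, ratios do not.
print(is_meaningful("interval", "-"))
print(is_meaningful("interval", "/"))
```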
Quick Check
• Positive/Negative
• Hot, Warm, Cold
• Temperature Value
• Marital Status: Single, Married, Divorced, …
• Year of Birth
• Age
• Gender
Attribute Semantics
• A sequential attribute has values that run in a single direction, from a minimum to a maximum. Ex. age, height, weight.

• A diverging attribute is one for which we can determine a middle value (or zero value) such that all the values above it are greater than it (or positive) and all the values below it are less than it (or negative). Ex. temperature, altitude.

• A cyclic attribute has values that repeat after a fixed period. Ex. hour of the day, day of the week, month of the year.
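
One practical consequence of cyclic semantics is that distances wrap around. A small sketch (the function name `cyclic_distance` is illustrative):

```python
def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)

# 23:00 and 01:00 are two hours apart, not 22.
print(cyclic_distance(23, 1, 24))
```

A sequential treatment of the same attribute would place 23 and 1 at opposite ends of the scale.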
Dataset Types: Munzner’s Taxonomy
• A dataset is a collection of information that is the target of analysis.
• Complex combination of multiple data types
• Four major types
• Tables
• Networks
• Fields
• Geometry
• [Clusters, Sets, Lists : not very common]
Dataset Types
• Tables: items, attributes
• Networks & Trees: items (nodes), links, attributes
• Fields: grids, positions, attributes
• Geometry: items, positions
• Clusters, Sets and Lists: items
Table/Relational Dataset
• Most common dataset type – we will almost exclusively focus on this
• In a 2D table,
• each row is an item
• each column is an attribute
• each cell has a value of a particular item and a particular attribute.
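
As a rough illustration of the row/column/cell vocabulary, a table can be sketched as a list of dictionaries. The data and the helper names `column` and `cell` are hypothetical.

```python
# Each dictionary is one item (row); each key is one attribute (column).
table = [
    {"fruit": "apple", "grade": "A", "weight_g": 180},
    {"fruit": "orange", "grade": "AA", "weight_g": 140},
    {"fruit": "pear", "grade": "A", "weight_g": 170},
]

def column(rows, attribute):
    """All values of one attribute, one per item."""
    return [row[attribute] for row in rows]

def cell(rows, index, attribute):
    """The value of one attribute for one item."""
    return rows[index][attribute]

print(column(table, "weight_g"))
print(cell(table, 1, "grade"))
```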
(2D) Table Dataset
Multidimensional Tables
• Multiple indices (keys) allow different dimensions to be connected
Network Dataset type
• Networks (or graphs) are useful for representing relationships among
several items.
• An item is a node and a link is a relation between two items.
• Each node can have associated attributes (e.g., city size)
• Each link may also have associated attributes (e.g., distance between
two cities).

• A tree is just a special type of network.
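
A minimal sketch of a network dataset, with attributes on both nodes and links, and a check for the tree special case. The city names and values are made up, links are treated as undirected, and `is_tree` is an illustrative helper.

```python
# Items (nodes) with attributes, and links with attributes.
nodes = {"A": {"size": 3}, "B": {"size": 1}, "C": {"size": 2}, "D": {"size": 5}}
links = [("A", "B", {"distance": 10}), ("A", "C", {"distance": 4}), ("C", "D", {"distance": 7})]

def is_tree(nodes, links):
    """A tree is a connected network without cycles: n - 1 links, all nodes reachable."""
    if not nodes or len(links) != len(nodes) - 1:
        return False
    adjacency = {name: set() for name in nodes}
    for u, v, _attrs in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal from an arbitrary node
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)

print(is_tree(nodes, links))
```

Adding one more link between existing nodes (for example B to D) creates a cycle, so the result is still a network but no longer a tree.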


Fields Dataset type
• Commonly used for scientific analysis
• A field contains attribute values associated with cells.
• Each cell contains measurements or calculations from a continuous
domain.
• Obtaining values from a continuous domain is usually very challenging
because a continuum contains infinitely many positions that could be measured.
• A good sampling strategy for taking measurements from discrete
positions is needed.
• Special case: spatial field
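
The sampling idea can be sketched with a regular grid. The field function `temperature` is a made-up stand-in for a real continuous phenomenon, and `sample_on_grid` is an illustrative helper that assumes at least two samples per axis.

```python
import math

def temperature(x, y):
    """Hypothetical continuous field, defined at every point of the plane."""
    return 20.0 + 5.0 * math.sin(x) * math.cos(y)

def sample_on_grid(field, x_range, y_range, nx, ny):
    """Measure the field at nx * ny regularly spaced positions (nx, ny >= 2)."""
    (x0, x1), (y0, y1) = x_range, y_range
    xs = [x0 + i * (x1 - x0) / (nx - 1) for i in range(nx)]
    ys = [y0 + j * (y1 - y0) / (ny - 1) for j in range(ny)]
    return [[field(x, y) for x in xs] for y in ys]

grid = sample_on_grid(temperature, (0.0, 1.0), (0.0, 1.0), nx=3, ny=2)
print(len(grid), "rows of", len(grid[0]), "cells")
```

A denser grid captures more detail at the cost of more measurements; choosing that trade-off is the sampling strategy the slide refers to.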
Fields Dataset type
Geometry Dataset type
• Data about the shape of items with explicit spatial information.

• The items could be points, lines/curves, 2D surfaces/regions, 3D volumes, or even higher dimensional data.

• Geometry datasets are typically spatial


Visual Encoding
The way in which data is
presented changes how
we consume it, drastically.

Carlos Scheidegger
Visual Encoding

Data
• Physical data types
• int, float, string
• Conceptual data types
• temperature, location
Visual Encoding (maps Data to Visualization)
Visualization
• Visual channels
• x, y, color, opacity
• Graphical marks
• point, line, area
Bertin’s Semiology of Graphics
• Invariants: the idea, concept or topic that unifies all visual marks.
• Components: variable features of the invariant
• Elements: the atomic parts of which each component consists
• Visual mark : A mark is a basic graphical element in an image.
• Position on the plane
Marks: Munzner’s Taxonomy
Marks: Wilkinson’s Taxonomy
Visual Channels
• A visual channel is a way to control the appearance of marks,
independent of the dimensionality of the geometric primitives.

• Popular ones: position, color, shape, tilt and size.

• Unusual ones: depth, luminance, saturation, curvature, etc.


Visual Channels
• Bar charts show two attributes
• Vertical is quantitative
• Horizontal is categorical

• A second quantitative attribute can be encoded by using the visual channel of horizontal spatial position (scatter plot).

• One more categorical attribute may be added by using the visual channel of hue (i.e., color).

• The visual channel of size may be used to add yet another quantitative attribute.
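
The progression above (two positions, then hue, then size) can be sketched as a function from one data item to the properties of one point mark. The attribute names, domains, pixel ranges and hue names below are all hypothetical.

```python
import math

def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`."""
    (d0, d1), (r0, r1) = domain, output_range
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

x_scale = linear_scale((0, 100), (0, 400))   # quantitative -> horizontal position
y_scale = linear_scale((0, 100), (300, 0))   # quantitative -> vertical position (pixel y grows downward)
HUES = {"north": "blue", "south": "orange"}  # categorical -> hue

def encode(item):
    """Map one item with four attributes to the channels of one point mark."""
    return {
        "x": x_scale(item["income"]),
        "y": y_scale(item["score"]),
        "hue": HUES[item["region"]],
        "radius": 5 * math.sqrt(item["population"]),  # keeps mark area proportional to the value
    }

print(encode({"income": 50, "score": 100, "region": "north", "population": 4}))
```

Using the square root for the radius is a design choice in this sketch: it makes the area of the mark, rather than its diameter, scale with the data value.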
Visual Channels
• Each attribute is encoded with a single channel.

• Identity channels tell us what something is or where it is.
• Good for unordered categorical attributes
• Examples: circles/triangles, colors, in motion, inside/outside of an area, etc.

• Magnitude channels tell us how much of something there is.
• Good for ordered (ordinal and quantitative) attributes
• Examples: longer/shorter, larger/smaller, brighter/darker, etc.
Design Principles for Visual Channels

• The expressiveness principle states that the visual encoding should express all of, and only, the information in the dataset attributes.
• Ordered data should be shown in a way that our perceptual system senses as ordered.
• Unordered data should not be shown in a way that perceptually implies an ordering.
• Recall: identity channels for categorical attributes and magnitude channels for ordered attributes
Design Principles for Visual Channels
• The effectiveness principle states that the importance of the attribute
should match the salience (i.e., noticeability) of the channel.
• The most important attributes should be encoded with the most
effective channels in order to be most noticeable.
• Less important attributes can be matched with less effective channels.
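
The two principles can be combined into a simple greedy assignment: walk through the attributes from most to least important and give each one the best unused channel of the right family. The channel orderings below are an assumption for illustration (loosely following commonly cited rankings), not a ranking stated in these slides, and `assign_channels` is a hypothetical helper.

```python
# Assumed orderings, most effective first (illustrative only).
MAGNITUDE_CHANNELS = ["position", "length", "tilt/angle", "area", "luminance", "saturation"]
IDENTITY_CHANNELS = ["spatial region", "hue", "motion", "shape"]

def assign_channels(attributes):
    """Assign channels to (name, kind) pairs listed in decreasing importance.

    Expressiveness: categorical attributes get identity channels,
    ordered and quantitative attributes get magnitude channels.
    Effectiveness: earlier (more important) attributes get better channels.
    """
    magnitude = iter(MAGNITUDE_CHANNELS)
    identity = iter(IDENTITY_CHANNELS)
    assignment = {}
    for name, kind in attributes:
        pool = identity if kind == "categorical" else magnitude
        channel = next(pool, None)
        if channel is None:
            raise ValueError(f"no channel left for attribute {name!r}")
        assignment[name] = channel
    return assignment

print(assign_channels([("price", "quantitative"), ("brand", "categorical"), ("rating", "ordered")]))
```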
Design Principles for Visual Channels
• How do we know which attributes are most important?
• Very task and dataset specific

• How do we know which channels are most effective?


• Accuracy
• Discriminability
• Separability
• Ability to provide visual popout
• Ability to provide perceptual grouping
Channel Accuracy
• Accuracy: how close is human perceptual judgment to some objective measurement of the stimulus?

• Humans perceive different visual channels with different levels of accuracy; they are not all equally distinguishable.

• Most stimuli are perceived as magnified or compressed, with few remaining unchanged.
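
One common model of this magnification or compression is Stevens' power law, in which perceived magnitude grows as the physical intensity raised to a channel-specific exponent. The slides do not name the model, so treat it as an assumption here; the exponents below are approximate, commonly quoted values used only for illustration.

```python
# Approximate exponents (assumed for illustration): 1.0 means accurate perception,
# below 1.0 means differences are compressed, above 1.0 means magnified.
EXPONENTS = {"length": 1.0, "area": 0.7, "brightness": 0.5}

def perceived_ratio(physical_ratio, exponent):
    """Perceived ratio between two stimuli whose physical ratio is given."""
    return physical_ratio ** exponent

for channel, n in EXPONENTS.items():
    print(channel, round(perceived_ratio(4, n), 2))
```

Under these assumptions a 4x difference in length is perceived as roughly 4x, while the same difference in area or brightness is perceived as noticeably smaller.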
Channel Accuracy
Channel Discriminability
• If a data attribute is encoded using a particular visual channel, are the
differences between items (of this attribute) perceptible to the
human as intended?

• Binning: each bin is a distinguishable step or level from the others.

• The higher the number of distinguishable bins, the more discriminable the channel is
Channel Discriminability
• Ratios are more important than absolute magnitudes
• Most continuous variation in stimuli is perceived in discrete steps
Channel Discriminability

• There are three bins: 500, 250, 100
• Adding more will make the chart hard to read and discriminate
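
Binning for a low-discriminability channel such as size can be sketched as snapping each value to the nearest legend level. The three levels come from the slide; the function name `nearest_bin` is illustrative.

```python
BINS = (100, 250, 500)  # the three size levels shown in the legend

def nearest_bin(value, bins=BINS):
    """Snap a value to the closest of a few distinguishable levels."""
    return min(bins, key=lambda level: abs(level - value))

print([nearest_bin(v) for v in (90, 300, 470)])
```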
Channel Separability
• Visual channels are not completely independent from each other,
because some have dependencies and interactions with others.

• Consider potential interactions between visual channels, ranging from the orthogonal and independent separable channels to the
inextricably combined integral channels.
Channel Separability
• Position and hue channels
• Good separation
• Left and right are separable
• Red and blue are separable
• Size and hue channels
• Decent separation
• Big and small are separable
• Blue and violet are not so separable
Channel Separability
• Horizontal and vertical size channels
• Interference with the area channel
• Unexpected grouping: circles, flat ellipses and tall ellipses
• Red and green color channels
• Terrible separation
Channel Popout
• Visual popout: making a distinct item stand out from many others
immediately.
• Popout is also known as preattentive processing
• Color: good
• Shape: less good
• Conjunction (color + shape): not so good
Channel Popout

Tilt/Angle Size Shape


Magnitude Estimation
• How much longer? 4X
• How much steeper slope? 4X
• How much larger area? 4.5X
• How much darker? 2X
• How much bigger value? 4X


Relative Magnitude Comparison

Modified from Hearst’s interpretation of Mackinlay ’88


Bertin’s Levels of Organization

              Nominal   Ordinal   Quantitative
Position      Y         Y         Y
Size          Y         Y         Y
Value         Y         Y         Meh
Texture       Y         Meh
Color         Y
Orientation   Y
Shape         Y
Mackinlay’s Ranking
Empirical estimates of encoding effectiveness
(Use this!) Perceptual ranking of channels
Colors
https://www.reddit.com/r/funny/comments/3jotqq/tadah/
Recall: Bertin’s Levels of Organization

              Nominal   Ordinal   Quantitative
Position      Y         Y         Y
Size          Y         Y         Y
Value         Y         Y         Meh
Texture       Y         Meh
Color         Y
Orientation   Y
Shape         Y
Recall: Mackinlay’s Ranking
Recall: Perceptual Ranking of Channels
Order these colors

Different people will order them differently, so be careful when you use color for ordered items.
Semi-Myth: Color Emotion Guide

https://thelogocompany.net/psychology-of-color-in-logo-design/
Decomposing Color
• Color is not a monolithic entity: it can be decomposed into individual
components

• Typically, it is decomposed into three visual channels

• Because human color perception is three-dimensional

• L, M and S cone cells in the retina


Decomposing Color
• RGB: Red + Green + Blue
• Suitable for computer screens

• CMY: Cyan + Magenta + Yellow
• Suitable for printing
• Needs a lot of ink for darker colors

https://www.sciencephoto.com/media/1156405/view/additive-and-subtractive-colour-mixing-illustration
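
In the idealized case, the two models are complements of each other: each CMY component is one minus the corresponding RGB component. The sketch below assumes components in the range 0 to 1 and ignores real ink and display behavior.

```python
def rgb_to_cmy(rgb):
    """Idealized conversion: cyan absorbs red, magenta absorbs green, yellow absorbs blue."""
    r, g, b = rgb
    return (1 - r, 1 - g, 1 - b)

print(rgb_to_cmy((1, 0, 0)))  # red on screen
print(rgb_to_cmy((0, 0, 0)))  # black on screen
```

Black needs the maximum amount of all three inks in this model, which matches the remark that darker colors need a lot of ink.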
HSL Color Space
• Hue, Saturation and Lightness
• More intuitive and perceptually relevant
• Used by artists
• Relevant to data scientists when preparing charts

• Hue axis captures what we normally think of as pure colors (e.g., red,
green, blue, yellow, purple, etc.)
• Saturation axis is the amount of white mixed with a pure color.
• Pink is desaturated Red
• Lightness axis is the amount of black mixed with a color.
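
Python's standard `colorsys` module can decompose an RGB color into these three channels. Note that it uses the component order hue, lightness, saturation (HLS), and that its exact formulas differ somewhat from the informal "mixing with white or black" description above. The three reds below are illustrative.

```python
import colorsys

def describe(rgb):
    """Decompose an RGB triple (components in 0..1) into hue, saturation and lightness."""
    h, l, s = colorsys.rgb_to_hls(*rgb)  # note the H, L, S order
    return {"hue": round(h, 3), "saturation": round(s, 3), "lightness": round(l, 3)}

print(describe((1.0, 0.0, 0.0)))     # pure red
print(describe((0.75, 0.25, 0.25)))  # same hue and lightness, less saturated
print(describe((0.5, 0.0, 0.0)))     # same hue and saturation, darker
```

All three colors share the same hue; only one of the other two channels changes at a time.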
HSL Color Space
Order these colors
• Lightness: perceived as ordered
• Saturation: perceived as ordered
• Hue: not so much


Color Spaces, Maps and Palettes
• A color space is a specific organization of colors that allows colors to be
represented reproducibly
• A color model represents colors as tuples of numbers (RGB, CMY, HSL)

• A color map defines a mapping between colors and data values, i.e., a visual
encoding with colors.
• Categorical vs ordered color maps
• Continuous vs segmented color maps
Color Palette

• A palette or color scheme is a choice of colors, originally used in artistic and design contexts.
• Provides aesthetic appeal and intuitive contrast

• The Brewer palettes are a good choice
• Most viz libraries provide them
Rainbow Colormap (Do not use it!!!!!)
• The rainbow colormap is perceptually nonlinear
• Steps of the same size at different points in the colormap range are not
perceived as equal by our eyes.

• Luckily, most libraries and languages are phasing it out

Rainbow Colormap

https://www.wsj.com/articles/weather-forecasts-should-get-over-the-rainbow-1538054430
Univariate Categorical
• Aim for maximum distinguishability
• Use hue as the primary visual channel
• Spread hues evenly around the hue circle to
maximize perceptual distance and
produce harmonious color combinations
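
Spreading hues evenly around the hue circle can be sketched with the standard `colorsys` module. The lightness and saturation defaults are arbitrary choices for illustration, and equal steps in HSL hue are only a rough proxy for equal perceptual distance.

```python
import colorsys

def categorical_palette(n, lightness=0.5, saturation=0.8):
    """Return n RGB colors whose hues are evenly spaced around the hue circle."""
    return [colorsys.hls_to_rgb(i / n, lightness, saturation) for i in range(n)]

for color in categorical_palette(6):
    print(tuple(round(c, 2) for c in color))
```

In practice a curated palette (such as the Brewer palettes mentioned earlier) is usually a better choice than a computed one.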
Univariate Ordered
• Dimension 1: Attribute Type
• If ordinal, use a segmented color map
• If quantitative, use a continuous color map
• Dimension 2: Attribute Ordering
• Sequential: ramp up either luminance/brightness or saturation
• Diverging:
• Use neutral color for midpoint
• Saturated color for endpoints
• Distinguish endpoints with hue differences
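
The sequential and diverging recipes above can be sketched with `colorsys`. The specific hues, lightness limits and function names are assumptions made for illustration; production color maps are designed in perceptually uniform spaces rather than HSL.

```python
import colorsys

def sequential_ramp(n, hue=0.6, saturation=0.8):
    """n colors (n >= 2) of one hue, going from light to dark."""
    return [colorsys.hls_to_rgb(hue, 0.9 - 0.6 * i / (n - 1), saturation) for i in range(n)]

def diverging_ramp(n, low_hue=0.6, high_hue=0.0):
    """n colors (n >= 2): saturated endpoints with different hues, neutral midpoint."""
    colors = []
    for i in range(n):
        t = -1 + 2 * i / (n - 1)            # position from -1 (low end) to +1 (high end)
        hue = low_hue if t < 0 else high_hue
        lightness = 0.95 - 0.45 * abs(t)    # lightest in the middle
        colors.append(colorsys.hls_to_rgb(hue, lightness, abs(t)))  # zero saturation at the midpoint
    return colors

print([tuple(round(c, 2) for c in rgb) for rgb in sequential_ramp(5)])
print([tuple(round(c, 2) for c in rgb) for rgb in diverging_ramp(5)])
```

Returning a fixed number of colors gives a segmented map suitable for ordinal data; a continuous map would interpolate between these colors.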
Bivariate Data
• Visual channels are not independent, so interpretation can get very
difficult
• Avoid using color in this case, unless you really know what you
are doing
• Brewer’s Color Scheme Chooser
Recall: Perceptual Ranking of Channels
Advice for Colorists
• Get it right in black and white
• Using colors is sometimes unavoidable
• Color is a less effective visual channel
• So use other, more effective visual channels first
• Use color as a redundant visual channel to reinforce
existing channels
• This ensures your chart can stand on its own even in
black and white
• A tasteful combination of visual channels (e.g.,
color, marker shape, line shape) can make
your visualization memorable
Adapted from Jeffrey Heer slides
https://www.bekk.christmas/post/2022/01/visual-redundancy
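
One rough way to test the black-and-white advice is to convert each color to a gray level and check that the gray levels are still far enough apart. The sketch uses the Rec. 601 luma weights as an approximation of perceived brightness; the threshold `min_gap` and the function names are arbitrary assumptions.

```python
def luma(rgb):
    """Approximate gray level of an RGB color (components in 0..1), Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def distinguishable_in_gray(colors, min_gap=0.1):
    """True if every pair of colors differs by at least min_gap in gray level."""
    levels = sorted(luma(c) for c in colors)
    return all(b - a >= min_gap for a, b in zip(levels, levels[1:]))

print(distinguishable_in_gray([(1, 0, 0), (0, 1, 0)]))        # red vs green
print(distinguishable_in_gray([(1, 0, 0), (0.3, 0.3, 0.3)]))  # red vs dark gray
```

If the check fails, a redundant channel such as marker shape or line style can carry the distinction instead.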
Advice for Colorists

• Use only a few colors (~6 ideal)

• Respect the color blind
• 36% of the population has some visual defect
• 5% of the population is color blind (5-8% of men and 0.5-1% of women)
Advice for Colorists
• Colors should be distinctive and named
• If you do not know what teal is, do not use it :)

Adapted from Jeffrey Heer slides


Advice for Colorists
• Use cultural conventions; appreciate symbolism
• Red, gold are considered lucky colors in China
• Popular colors

https://today.yougov.com/international/articles/12335-why-blue-worlds-favorite-color
Advice for Colorists
• Use semantically resonant color assignments

Adapted from Jeffrey Heer slides
