Introduction to Visualization
Saravanan Thirumuruganathan
Topic Goals
• Understand how visualizations convey information
• Understand how to use visual perception capabilities for visualization
• Critically evaluate visualizations in the wild
Data Visualization
• Data visualization is the
• graphical representation of data and information
• Transformation of the symbolic into the geometric [McCormick et al. 1987]
• Often uses visual elements such as charts, graphs, maps, infographics
• Goal: see and understand trends, outliers, and patterns in data.
Data Visualization
• Graphs typically display quantitative information, and include ≥2
scales/axes.
• Charts display discrete relationships among discrete entities.
• Example: flow charts
• Maps display spatial information, possibly with labels and other
information.
• Diagrams are schematic pictures whose parts are symbolic (i.e., not
photographic).
• Infographics are a sort of hybrid of all of the above
Visualization Goals
• Presentation
• Known facts about data
• Task: Communicate results
• Exploration
• Data without hypothesis
• Task: Generate hypothesis
• Confirmation
• Hypothesis is given
• Task: Verify / falsify hypothesis
Visual Representations vs Summary Statistics
Anscombe's Quartet
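The point of the quartet can be checked numerically. The sketch below hard-codes the four published datasets and computes their summary statistics using only the standard library; the helper names `pearson` and `summarize` are made up for this illustration.

```python
from statistics import mean, variance

# Anscombe's quartet: four small datasets whose summary statistics are
# nearly identical, although their scatter plots look completely different.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def summarize(xs, ys):
    """Collect the usual summary statistics for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, so only a plot reveals that one set is roughly linear, one is curved, and two are dominated by a single outlier.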
Visual Representations vs Summary Statistics
Datasaurus Dozen
https://www.math.csi.cuny.edu/~mvj/GC-DataViz-S23/lectures/L1.htm
Why Data Visualization?
"The ability to take data—to be able to understand it,
to process it, to extract value from it, to visualize it, to
communicate it—that's going to be a hugely
important skill in the next decades, […] because now
we really do have essentially free and ubiquitous data.
So the complementary scarce factor is the ability to
understand that data and extract value from it."
Hal Varian, Google's Chief Economist
Why Data Visualization? A Poverty of
Attention
"What information consumes is rather obvious:
it consumes the attention of its recipients.
Hence a wealth of information creates a poverty of attention,
and a need to allocate that attention efficiently among the
overabundance of information sources that might consume it."
Herbert A. Simon, Nobel Prize winner
Political science, Economics, Computer science (AI),
Cognitive Psychology.
Data Types: Shneiderman’s Taxonomy
• 1D (sequences)
• Temporal
• 2D (maps)
• 3D (shapes)
• nD (relational)
• Trees (hierarchical)
• Networks (graphs)
• Others (text)
The Eyes Have It: A Task by Data Type Taxonomy for Information Visualization [Shneiderman, 96]
Data Types: Munzner’s Taxonomy
• Item
• An individual entity that is discrete (e.g., a number, a row of a table, etc.)
• Attribute
• Some measurable property of the item
• Link
• A relationship between items
• Position
• Spatial data (e.g., location, coordinates, etc.)
• Grid
• The strategy for sampling continuous data, described by the geometric
and/or topological relationships between cells.
Items
• An item is a discrete individual entity
• row in a table
• node in a network
Attributes
• Some measurable property of an item
Attributes
• Nominal (labels or categories)
• Fruits: apples, oranges, …
• Ordered
• Quality of meat: Grade A, AA, AAA
• Quantitative - Interval (location of zero arbitrary)
• Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
• Only differences (i.e. intervals) may be compared
• Quantitative - Ratio (zero fixed)
• Physical measurement: Length, Mass, Temp (on an absolute scale such as Kelvin), …
• Counts and amounts
Attributes
• Nominal (labels or categories)
• Operations: =, ≠
• Ordered
• Operations: =, ≠, <, >
• Q - Interval (location of zero arbitrary)
• Operations: =, ≠, <, >, -
• Can measure distances, differences etc
• Q - Ratio (zero fixed)
• Operations: =, ≠, <, >, -, %
• Can measure ratios or proportions
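The nesting of operations above can be written down as a small lookup table. This is only a sketch: the names `AttributeType`, `ALLOWED_OPS` and `is_meaningful` are invented for this example, and "%" stands for ratios and proportions, as in the list above.

```python
from enum import Enum

class AttributeType(Enum):
    NOMINAL = "nominal"
    ORDERED = "ordered"
    INTERVAL = "interval"
    RATIO = "ratio"

# Each level supports everything the previous level does, plus one more thing.
ALLOWED_OPS = {
    AttributeType.NOMINAL: {"=", "≠"},
    AttributeType.ORDERED: {"=", "≠", "<", ">"},
    AttributeType.INTERVAL: {"=", "≠", "<", ">", "-"},
    AttributeType.RATIO: {"=", "≠", "<", ">", "-", "%"},
}

def is_meaningful(attribute_type, operation):
    """Return True if the operation makes sense for this attribute type."""
    return operation in ALLOWED_OPS[attribute_type]

# Year of birth is an interval attribute: differences make sense, ratios do not.
print(is_meaningful(AttributeType.INTERVAL, "-"))  # True
print(is_meaningful(AttributeType.INTERVAL, "%"))  # False
# Age is a ratio attribute: "twice as old" is a meaningful statement.
print(is_meaningful(AttributeType.RATIO, "%"))     # True
```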
Quick Check
• Positive/Negative
• Hot, Warm, Cold
• Temperature Value
• Marital Status: Single, Married, Divorced
• Year of Birth
• Age
• Gender
Attribute Semantics
• Sequential attribute has values in a certain sequence. Ex. age, height,
weight.
• Diverging attribute is one for which we can determine a middle value
(or zero-value) such that all the values above it are greater than it (or
positive) and all the values below it are less than it (or negative). Ex.
temperature, altitude.
• Cyclic attribute has values that repeat in a period of time. Ex. hour,
week, year.
Dataset Types: Munzner’s Taxonomy
• A dataset is a collection of information that is the target of analysis.
• Complex combination of multiple data types
• Four major types
• Tables
• Networks
• Fields
• Geometry
• [Clusters, Sets, Lists : not very common]
Dataset Types
Dataset Types
• Tables: Items, Attributes
• Networks & Trees: Items (Nodes), Links, Attributes
• Fields: Grids, Positions, Attributes
• Geometry: Items, Positions
• Clusters, Sets and Lists: Items
Table/Relational Dataset
• Most common dataset type – we will almost exclusively focus on this
• In a 2D table,
• each row is an item
• each column is an attribute
• each cell has a value of a particular item and a particular attribute.
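A minimal sketch of this row/column/cell structure in plain Python. The toy data and the helper names `cell` and `column` are hypothetical.

```python
# Hypothetical toy table: each dict is one row (item),
# each key is one column (attribute).
table = [
    {"name": "apple", "grade": "AA", "weight_g": 182},
    {"name": "orange", "grade": "A", "weight_g": 131},
    {"name": "pear", "grade": "AAA", "weight_g": 178},
]

def cell(rows, item_index, attribute):
    """Value at the intersection of one item (row) and one attribute (column)."""
    return rows[item_index][attribute]

def column(rows, attribute):
    """All values of one attribute, one per item."""
    return [row[attribute] for row in rows]

print(cell(table, 1, "weight_g"))
print(column(table, "name"))
```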
(2D) Table Dataset
Multidimensional Tables
• Multiple indices allow different dimensions to be connected
Network Dataset type
• Networks (or graphs) are useful to represent relationships between
several items.
• An item is a node and a link is a relation between two items.
• Each node can have associated attributes (e.g., city size)
• Each link may also have associated attributes (e.g., distance between
two cities).
• A tree is just a special type of network.
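A network dataset can be sketched as a dictionary of nodes plus a list of links, each carrying its own attributes. The city data below is made up, and `is_tree` is a hypothetical helper that uses the standard characterization of an undirected tree: connected, with exactly n - 1 links.

```python
# Hypothetical example: cities as nodes, roads as links.
nodes = {"A": {"size": 500}, "B": {"size": 250}, "C": {"size": 100}}
links = [("A", "B", {"distance_km": 120}), ("B", "C", {"distance_km": 80})]

def is_tree(nodes, links):
    """An undirected network is a tree if it is connected and has n - 1 links."""
    if len(links) != len(nodes) - 1:
        return False
    adjacency = {name: set() for name in nodes}
    for a, b, _attributes in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search from an arbitrary node to test connectivity.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return len(seen) == len(nodes)

print(is_tree(nodes, links))
```

Adding a third link between A and C creates a cycle, so the same network would no longer be a tree.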
Fields Dataset type
• Commonly used for scientific analysis
Fields Dataset type
• A field contains attribute values associated with cells.
• Each cell contains measurements or calculations from a continuous
domain.
• Obtaining values from a continuous domain is challenging because a
continuum contains infinitely many positions, so only a finite sample can
be measured.
• A good sampling strategy for taking measurements from discrete
positions is needed.
• Special case: spatial field
Fields Dataset type
Geometry Dataset type
• Data about the shape of items, with explicit spatial information.
• The items could be points, lines/curves, 2D surfaces/regions, 3D
volumes, or even higher dimensional data.
• Geometry datasets are typically spatial
Visual Encoding
The way in which data is
presented changes how
we consume it, drastically.
Carlos Scheidegger
Visual Encoding
Data → Visual Encoding → Visualization
• Data: physical data types (int, float, string)
• Data: conceptual data types (temperature, location)
• Visualization: visual channels (x, y, color, opacity)
• Visualization: graphical marks (point, line, area)
Bertin’s Semiology of Graphics
• Invariants: the idea, concept or topic that unifies all visual marks.
• Components: variable features of the invariant
• Elements: the atomic parts of which each component consists
Bertin’s Semiology of Graphics
• Elements: the atomic parts of which each component consists
• Visual mark : A mark is a basic graphical element in an image.
• Position on the plane
Marks: Munzner’s Taxonomy
Marks: Wilkinson’s Taxonomy
Visual Channels
• A visual channel is a way to control the appearance of marks,
independent of the dimensionality of the geometric primitives.
• Popular ones: position, color, shape, tilt and size.
• Unusual ones: depth, luminance, saturation, curvature, etc.
Visual Channels
• Bar charts show two attributes
• Vertical is quantitative
• Horizontal is categorical
• A second quantitative attribute can be
encoded by using the visual channel of
horizontal spatial position. (scatter plot)
• One more categorical attribute may be added by
using the visual channel of hue (i.e., color)
• The visual channel of size may be used to add yet
another quantitative attribute
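The build-up above (two positions, then hue, then size) can be sketched as a function that turns one data item into a point mark. Everything here is hypothetical: the attribute names, the value ranges, the pixel ranges and the `HUES` table are chosen only for illustration.

```python
# Hypothetical items with four attributes each.
items = [
    {"income": 30, "life_exp": 62, "region": "Asia", "population": 40},
    {"income": 55, "life_exp": 71, "region": "Europe", "population": 10},
    {"income": 80, "life_exp": 79, "region": "Americas", "population": 90},
]

HUES = {"Asia": "red", "Europe": "blue", "Americas": "green"}

def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the data range [lo, hi] to [out_lo, out_hi]."""
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)

def encode(item):
    """Describe one item as a point mark using four visual channels."""
    return {
        "mark": "point",
        # quantitative -> horizontal position
        "x": scale(item["income"], 0, 100, 0, 400),
        # quantitative -> vertical position (pixel y grows downward)
        "y": scale(item["life_exp"], 50, 90, 300, 0),
        # categorical -> hue
        "hue": HUES[item["region"]],
        # quantitative -> size
        "size": scale(item["population"], 0, 100, 2, 20),
    }

marks = [encode(item) for item in items]
for mark in marks:
    print(mark)
```

Each attribute is bound to exactly one channel, so a renderer only needs the resulting dictionaries to draw the scatter plot.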
Visual Channels
• Each attribute is encoded with a single channel.
• Identity channels tell us what something is or where it is.
• Good for unordered categorical attributes
• Examples: circles/triangles, colors, in motion, inside/outside of an area, etc.
• Magnitude channels tell us how much of something there is.
• Good for ordered (ordinal and quantitative) attributes
• Examples: longer/shorter, larger/smaller, brighter/darker, etc.
Design Principles for Visual Channels
• The expressiveness principle states that the visual encoding should
express all of, and only, the information in the dataset attributes.
• Ordered data should be shown so that our perceptual system senses as
ordered.
• Unordered data should not be shown in a way that perceptually implies an
ordering.
• Recall: identity channels for categorical and magnitude for ordered
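The recall rule can be turned into a small check. This is a sketch, not a complete classification: the two channel sets contain only the examples named on the earlier slide, and the function name `violates_expressiveness` is invented here.

```python
# Channel sets limited to the examples given earlier in these slides.
IDENTITY_CHANNELS = {"shape", "hue", "motion", "region"}
MAGNITUDE_CHANNELS = {"length", "size", "luminance"}

def violates_expressiveness(attribute_type, channel):
    """Flag encodings that hide an ordering or imply one that is not there."""
    if channel not in IDENTITY_CHANNELS | MAGNITUDE_CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    ordered = attribute_type in {"ordinal", "quantitative"}
    if ordered and channel in IDENTITY_CHANNELS:
        return True   # the ordering in the data is not perceived
    if not ordered and channel in MAGNITUDE_CHANNELS:
        return True   # the viewer perceives an ordering that does not exist
    return False

print(violates_expressiveness("categorical", "size"))     # True
print(violates_expressiveness("quantitative", "length"))  # False
```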
Design Principles for Visual Channels
• The effectiveness principle states that the importance of the attribute
should match with the salience (i.e., noticeability) of the channel.
• The most important attributes should be encoded with the most
effective channels in order to be most noticeable.
• Less important attributes can be matched with less effective channels.
Design Principles for Visual Channels
• How do we know which attributes are most important?
• Very task and dataset specific
• How do we know which channels are most effective?
• Accuracy
• Discriminability
• Separability
• Ability to provide visual popout
• Ability to provide perceptual grouping
Channel Accuracy
• Accuracy: How close is human perceptual judgment to some
objective measurement of the stimulus?
• Humans perceive different visual channels with different levels of
accuracy; the channels are not all equally accurate.
• Most stimuli are perceived as magnified or compressed, with few
perceived unchanged.
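One common model of this magnification and compression is Stevens' psychophysical power law, S = I^n, where I is the physical intensity and S the perceived sensation. The sketch below assumes that model; the exponents are approximate, illustrative values of the kind quoted in textbooks and are not taken from these slides.

```python
# Approximate, illustrative exponents (assumptions for this sketch).
# n < 1 compresses the stimulus, n > 1 magnifies it, n = 1 leaves it unchanged.
EXPONENTS = {"length": 1.0, "area": 0.7, "brightness": 0.5, "saturation": 1.7}

def perceived_ratio(physical_ratio, channel):
    """Perceived magnitude ratio under Stevens' power law S = I ** n."""
    return physical_ratio ** EXPONENTS[channel]

for channel in EXPONENTS:
    ratio = perceived_ratio(4, channel)
    print(f"{channel:>10}: a 4x physical change is perceived as about {ratio:.1f}x")
```

Under these assumed exponents only length is judged accurately, which is consistent with position and length ranking as the most accurate channels.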
Channel Accuracy
Channel Discriminability
• If a data attribute is encoded using a particular visual channel, are the
differences between items (of this attribute) perceptible to the
human as intended?
• Binning: each bin is a distinguishable step or level from the other.
• The higher the number of distinguishable bins, the more discriminable the channel is
Channel Discriminability
Channel Discriminability
• Ratios are more important than absolute magnitudes
• Most continuous variation in stimuli is perceived in discrete steps
Channel Discriminability
• There are three bins: 500, 250,
100
• Adding more will make chart
hard to read and discriminate
Channel Separability
• Visual channels are not completely independent from each other,
because some have dependencies on and interactions with others.
• Consider potential interaction between visual channels, ranging from
the orthogonal and independent separable channels to the
inextricably combined integral channels.
Channel Separability
Channel Separability
• Position and Hue channels
• Good separation
• Left and right are separable
• Red and blue are separable
• Size and Hue
• Decent separation
• Big and small are separable
• Blue and violet are not so separable
Channel Separability
Channel Separability
• Horizontal and vertical size channels
• Interference with area channel
• Unexpected grouping: circle, flat and tall ellipses
• Red and green color channels
• Terrible separation
Channel Popout
• Visual popout: making a distinct item stand out from many others
immediately.
• Popout is also known as preattentive processing
• Color: good
• Shape: less good
• Conjunction (color + shape): not so good
Channel Popout
Tilt/Angle Size Shape
Magnitude Estimation
• How much longer? 4X
• How much steeper slope? 4X
• How much larger area? 4.5X
• How much darker? 2X
• How much bigger value? 4X
Relative Magnitude Comparison
Modified from Hearst’s interpretation of Mackinlay ’88
Bertin’s Levels of Organization
Channel      Nominal  Ordinal  Quantitative
Position     Y        Y        Y
Size         Y        Y        Y
Value        Y        Y        Meh
Texture      Y        Meh      -
Color        Y        -        -
Orientation  Y        -        -
Shape        Y        -        -
Mackinlay’s Ranking
Empirical estimates of encoding effectiveness
(Use this!) Perceptual ranking of channels
Colors
https://www.reddit.com/r/funny/comments/3jotqq/tadah/
Recall: Bertin’s Levels of Organization
Channel      Nominal  Ordinal  Quantitative
Position     Y        Y        Y
Size         Y        Y        Y
Value        Y        Y        Meh
Texture      Y        Meh      -
Color        Y        -        -
Orientation  Y        -        -
Shape        Y        -        -
Recall: Mackinlay’s Ranking
Recall: Perceptual Ranking of Channels
Order these colors
Different people will order them differently, so be careful when using color for ordered items.
Semi-Myth: Color Emotion Guide
https://thelogocompany.net/psychology-of-color-in-logo-design/
Decomposing Color
• Color is not a monolithic entity: it can be decomposed to individual
components
• Typically, it is decomposed into three visual channels
• Because human color perception is 3-dimensional.
• L, M and S cone cells in retina
Decomposing Color
• RGB
• Suitable for computer screens
• CMY: Cyan + Magenta + Yellow
• Suitable for printing
• Needs a lot of ink for darker colors
https://www.sciencephoto.com/media/1156405/view/additive-and-subtractive-colour-mixing-illustration
HSL Color Space
• Hue, Saturation and Lightness
• More intuitive and perceptually relevant
• Used by artists
• Relevant to data scientists when preparing charts
• Hue axis captures what we normally think of as pure colors (e.g., red,
green, blue, yellow, purple, etc.)
• Saturation axis is the amount of white mixed with a pure color.
• Pink is desaturated Red
• Lightness axis is the amount of black mixed with a color.
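Python's standard `colorsys` module exposes the same three components, although it orders them hue, lightness, saturation (HLS) and scales each to [0, 1]. The sketch below decomposes a few RGB colors; the helper name `describe` and the sample colors are chosen only for illustration.

```python
import colorsys

def describe(r, g, b):
    """Decompose an RGB color (components in [0, 1]) into HSL components."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note the H, L, S return order
    return {
        "hue_deg": round(h * 360),
        "saturation": round(s, 2),
        "lightness": round(l, 2),
    }

red = describe(1.0, 0.0, 0.0)
dark_red = describe(0.5, 0.0, 0.0)         # same hue, lower lightness
grayish_red = describe(0.75, 0.25, 0.25)   # same hue, lower saturation

for name, color in [("red", red), ("dark red", dark_red), ("grayish red", grayish_red)]:
    print(name, color)
```

All three colors share the hue of red; only one of the other two components changes at a time, which is what makes the decomposition useful when designing color maps.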
HSL Color Space
Order these colors
Order these colors
Order these colors
• Lightness and saturation: perceived as ordered
• Hue: not so much
Color Spaces, Maps and Palettes
• Color space is a specific organization of colors allowing reproducible
representation of colors
• Color model represents colors as tuples of numbers (RGB, CMY, HSL)
• Color map defines a mapping between colors and data values, a visual
encoding with colors.
• Categorical vs ordered color maps
• Continuous vs Segmented color maps
Color Palette
• A palette or color scheme is a choice of colors
originally used in artistic and design contexts.
• Provides aesthetic appeal and intuitive contrast
• The Brewer (ColorBrewer) palettes are a good choice
• Most viz libraries provide them
Rainbow Colormap (Do not use it!!!!!)
• Rainbow colormap is perceptually nonlinear
• steps of the same size at different points in the colormap range are not
perceived equally by our eyes.
• Luckily, most libraries and languages are phasing it out
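One way to see the problem is to track luminance along the map. The sketch below makes two simplifying assumptions: a plain hue sweep from red to blue stands in for a rainbow colormap, and luminance is approximated with the Rec. 709 weights applied directly to the RGB values. The function names are invented for this example.

```python
import colorsys

def luminance(r, g, b):
    """Approximate relative luminance (Rec. 709 weights, RGB treated as linear)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def rainbow(t):
    """Hypothetical rainbow map: hue sweeps from red (t=0) to blue (t=1)."""
    return colorsys.hls_to_rgb(t * 2 / 3, 0.5, 1.0)

def gray_ramp(t):
    """Reference map: a plain black-to-white ramp."""
    return (t, t, t)

def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

steps = [i / 8 for i in range(9)]
rainbow_lum = [luminance(*rainbow(t)) for t in steps]
gray_lum = [luminance(*gray_ramp(t)) for t in steps]

print("rainbow monotonic:", is_monotonic(rainbow_lum))
print("gray monotonic:   ", is_monotonic(gray_lum))
```

Luminance rises toward yellow and then falls toward blue, so equal data steps produce brightness changes of different size and even different direction along the rainbow.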
Rainbow Colormap
https://www.wsj.com/articles/weather-forecasts-should-get-over-the-rainbow-1538054430
Univariate Categorical
• Aim for maximum distinguishability
• Use hue as primary visual channel
• Spread hues evenly around the hue circle to
maximize perceptual distance and
produce harmonious color combinations
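A minimal sketch of even hue spacing, using the standard `colorsys` module. The function name `categorical_palette` and its default lightness and saturation are assumptions. HSL hue is not perceptually uniform, so equal hue steps are only a rough proxy for equal perceptual distance; curated palettes usually do better.

```python
import colorsys

def categorical_palette(n, lightness=0.5, saturation=0.8):
    """Return n hex colors whose hues are evenly spaced around the hue circle."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

print(categorical_palette(6))
```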
Univariate Ordered
• Dimension 1: Attribute Type
• If Ordinal, use Segmented
• If Quantitative, use continuous
• Dimension 2: Attribute Ordering
• Sequential: ramp up either
luminance/brightness or saturation
• Diverging:
• Use neutral color for midpoint
• Saturated color for endpoints
• Distinguish endpoints with hue differences
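The two dimensions above can be sketched with `colorsys`. The function names, the default hues and the lightness ranges below are all assumptions chosen for illustration: `sequential` ramps lightness at a fixed hue, `diverging` fades from a neutral midpoint to two saturated endpoints with different hues, and `segmented` samples a continuous map at a few levels for ordinal data.

```python
import colorsys

def sequential(t, hue=0.6):
    """Sequential map for t in [0, 1]: fixed hue, lightness ramps light to dark."""
    return colorsys.hls_to_rgb(hue, 0.95 - 0.7 * t, 0.8)

def diverging(t, low_hue=0.6, high_hue=0.0):
    """Diverging map for t in [-1, 1]: neutral midpoint, saturated endpoints."""
    hue = low_hue if t < 0 else high_hue
    strength = abs(t)  # distance from the midpoint
    return colorsys.hls_to_rgb(hue, 0.95 - 0.45 * strength, strength)

def segmented(colormap, levels):
    """Sample a continuous map on [0, 1] at evenly spaced levels."""
    return [colormap(i / (levels - 1)) for i in range(levels)]

print(diverging(-1.0), diverging(0.0), diverging(1.0))
print(segmented(sequential, 5))
```

At t = 0 the diverging map has zero saturation and returns a neutral light gray, while the two endpoints differ in hue, matching the guidelines listed above.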
Bivariate Data
• Visual channels are not independent, interpretation can get very
difficult
• Avoid using color in this case, unless you really know what you
are doing
• Brewer’s Color Scheme Chooser
Recall: Perceptual Ranking of Channels
Advice for Colorists
• Get it right in black and white
• Using colors is sometimes unavoidable
• Color is a less effective visual channel
• So use other, more effective visual channels first
• Use color as a redundant visual channel to reinforce
existing channels
• This ensures your chart can stand on its own even in
black and white
• A tasteful combination of visual channels (eg
color, marker shape, line shape etc) can make
your visualization memorable
Adapted from Jeffrey Heer slides https://www.bekk.christmas/post/2022/01/visual-redundancy
Advice for Colorists
• Use only a few colors (~6 ideal)
• Respect the color blind
• 36% of the population has some visual defect
• 5% of the population is color blind (5-8% of men and 0.5-1% of women)
Advice for Colorists
• Colors should be
distinctive and named
• If you do not know what teal
is, do not use it :)
Adapted from Jeffrey Heer slides
Advice for Colorists
• Use cultural conventions; appreciate symbolism
• Red, gold are considered lucky colors in China
• Popular colors
https://today.yougov.com/international/articles/12335-why-blue-worlds-favorite-color
Advice for Colorists
• Use semantically resonant color assignments
Adapted from Jeffrey Heer slides