Book Link: kupdf.net_visualization-analysis-and-designpdf.pdf
From Data to Visualization
• All aesthetics fall into one of two groups: those that can represent
continuous data and those that cannot.
• Data is quantitative when it is numerical and qualitative when it is categorical.
• Variables holding qualitative data are factors, and the different categories are called levels.
• The levels of a factor are most commonly without order (as in the example of dog, cat, fish), but factors can also be ordered (as in the example of good, fair, poor).
• Month: ordered factor
• Day: discrete numerical value
• Location: unordered factor
• Station ID: unordered factor
• Temperature: continuous numerical value
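The distinction between factors and their levels can be sketched in a few lines of Python. This is only an illustration: the `Factor` class and its `rank` method are hypothetical names introduced here, not part of any library, and the level names are taken from the dog/cat/fish and good/fair/poor examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """A qualitative variable: a fixed set of levels, optionally ordered."""
    levels: tuple
    ordered: bool = False

    def rank(self, value):
        """Position of a value among the levels; only defined for ordered factors."""
        if not self.ordered:
            raise ValueError("levels of an unordered factor have no rank")
        return self.levels.index(value)


pet = Factor(levels=("dog", "cat", "fish"))                       # unordered factor
quality = Factor(levels=("poor", "fair", "good"), ordered=True)   # ordered factor

# Comparing levels is meaningful only for the ordered factor.
print(quality.rank("fair") < quality.rank("good"))
```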
Scales Map Data Values onto Aesthetics
• To map data values onto aesthetics, we need to specify which data values correspond to which specific aesthetics values, for example which data values are represented by particular shapes or colors.
• This mapping between data values and aesthetics values is created via scales.
• A scale defines a unique mapping between data and aesthetics.
• A scale must be one-to-one, such that for each specific data value there is exactly one aesthetics value and vice versa.
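A one-to-one scale for a discrete aesthetic can be sketched as a dictionary plus a check that it can be inverted. The function names `make_discrete_scale` and `invert` and the shape names are assumptions made for this illustration.

```python
def make_discrete_scale(data_values, aesthetic_values):
    """Pair each distinct data value with exactly one aesthetic value."""
    data_values = list(data_values)
    aesthetic_values = list(aesthetic_values)
    if len(set(data_values)) != len(data_values):
        raise ValueError("data values must be distinct")
    if len(set(aesthetic_values)) != len(aesthetic_values):
        raise ValueError("aesthetic values must be distinct")
    if len(data_values) != len(aesthetic_values):
        raise ValueError("need exactly one aesthetic value per data value")
    return dict(zip(data_values, aesthetic_values))


def invert(scale):
    """Read the scale backwards: from aesthetic value to data value."""
    return {aesthetic: data for data, aesthetic in scale.items()}


shape_scale = make_discrete_scale(["dog", "cat", "fish"],
                                  ["circle", "square", "triangle"])
print(shape_scale["cat"])               # data value -> aesthetic value
print(invert(shape_scale)["triangle"])  # aesthetic value -> data value
```

Because the mapping is one-to-one, inverting it twice gives back the original scale; a mapping that reused a shape for two categories would be rejected.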
• We can choose which variables we map onto which scales.
• For example, instead of mapping temperature onto the y axis and location onto color, we can do the opposite.
• Because the key variable of interest (temperature) is now shown as color, we need to show sufficiently large colored areas for the colors to convey useful information.
• Therefore, this visualization uses squares instead of lines, one for each month and location, colored by the average temperature normal for each month.
• It uses two position scales (month along the x axis and location along the y axis), but neither is a continuous scale.
• Month is an ordered factor with 12 levels, and location is an unordered factor with 4 levels.
• The two position scales are both discrete.
• For discrete position scales, we generally place the different levels of the factor at an equal spacing along the axis.
• If the factor is ordered (as month is), then the levels need to be placed in the appropriate order.
• If the factor is unordered (as location is), then the order is arbitrary.
• Both Figures 2-3 and 2-4 used three scales in total: two position scales and one color scale.
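Equal spacing of factor levels along an axis can be sketched as follows. The function name `discrete_positions` and the axis length of 120 units are assumptions for illustration; each level is placed at the centre of an equal-width band.

```python
def discrete_positions(levels, axis_start, axis_end):
    """Place each level at the centre of an equal-width band along the axis."""
    band = (axis_end - axis_start) / len(levels)
    return {level: axis_start + band * (i + 0.5)
            for i, level in enumerate(levels)}


# Month is an ordered factor, so the list order is the calendar order.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
x = discrete_positions(months, 0.0, 120.0)
print(x["Jan"], x["Feb"], x["Dec"])
```

For an unordered factor such as location, any permutation of the levels could be passed in; only the spacing is fixed by the scale.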
• We can also use more than three scales at once.
• The example uses five scales: two position scales, one color scale, one size scale, and one shape scale. Each scale represents a different variable from the dataset.
Coordinate Systems and Axes
• For 2D visualizations, two numbers are required to uniquely specify a point, so we need two position scales.
• These two scales are usually, but not necessarily, the x and y axes of the plot.
• We also have to specify the relative geometric arrangement of these scales.
• Conventionally, the x axis runs horizontally and the y axis vertically, but we could choose other arrangements:
• the y axis could run at an acute angle relative to the x axis, or
• one axis could run in a circle and the other radially.
• The combination of a set of position scales and their relative geometric arrangement is called a coordinate system.
• In a 2D Cartesian coordinate system, each location is uniquely specified by an x and a y value.
• The x and y axes run orthogonally to each other, and data values are placed at an even spacing along both axes.
• The two axes are continuous position scales, and they can represent both positive and negative real numbers.
• We need to specify the range of numbers each axis covers.
• In the example, the x axis runs from –2.2 to 3.2 and the y axis runs from –2.2 to 2.2.
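A continuous position scale for a Cartesian axis is a linear map from the axis range to a drawing range. The sketch below uses the axis ranges from the example; the function name `linear_scale` and the pixel extents (540 by 440) are assumptions.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping a data value in `domain` linearly onto `pixel_range`."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


x_scale = linear_scale((-2.2, 3.2), (0.0, 540.0))
# Screen coordinates usually grow downward, so the y range is reversed.
y_scale = linear_scale((-2.2, 2.2), (440.0, 0.0))

print(x_scale(-2.2), x_scale(3.2))   # left and right ends of the x axis
print(y_scale(0.0))                  # vertical middle of the plot
```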
Visualization Analysis & Design
Arrange Tables (Ch 7) I
@tamaramunzner
Visual Encoding design
• There are four visual encoding design choices for how to arrange tabular data spatially.
Encoding design
• Why Arrange?
• The arrange design choice covers all aspects of the use of spatial channels for visual encoding.
• The three highest ranked effectiveness channels for quantitative and ordered attributes are all related to spatial position:
• planar position against a common scale,
• planar position along an unaligned scale, and
• length.
• The highest ranked effectiveness channel for categorical attributes, grouping items within the same region, is also about the use of space.
• There are no nonspatial channels that are highly effective for all attribute types: the others are split into being suitable for either ordered or categorical attributes, but not both, because of the principle of expressiveness.
Inputs for visual encoding:
• The input for visual encoding can be one of two models:
• Mathematical model
• Conceptual model
Types of visual encoding:
• The visual encoding is broadly classified into
• 1. Retinal
• Human beings are very sensitive to these kinds of retinal variables.
• Some of the retinal variables are colours, shapes, size and other kinds of properties.
• Human beings can easily differentiate between these kinds of retinal variables.
• 2. Planar
• drawing graphs across the X- and Y-axis.
• Planar variables work for any data type.
• They work great to present any quantitative data.
• How to present three or more variables?
• Use the retinal variables!
Mapping of data types to encoding:
• Quantitative:
• These are the data types that represent the quantity of certain data.
• Some attributes of this type include position, length, volume, area, etc.
• Ordinal:
• These are the data types that hold data of some order.
• For example, days of the week, which hold the order in which they should be represented.
• Nominal:
• In this kind of data type, the data is represented in the form of names and categories.
Visual encoding principles / variables:
• Position:
• Graphs map data across the X- and Y-axis.
• They work great to present any quantitative data.
• It deals with flat screens and just two planar variables.
• Size:
• The size indicates greater quantity or importance.
• Ayville has a greater population (roughly 4 times greater) than Beeton.
• The bigger the square size, the larger the quantity.
• Orientation
• Orientation typically indicates relative orientation, direction of flow or movement.
• Color
• Color is often modeled as three components – HSV model.
• Hue
• indicates the redness or greenness of the color that it represents.
• Many hues have particular associations
• Saturation or Chroma
• describes the color purity.
• Saturation is often used in combination with value,
• but may also be used independently
• to control the prominence of symbols.
• Value
• gives the color intensity as lightness or darkness.
• Value is often used like size
• to indicate quantity or importance.
• Other color models also exist, e.g.
• RGB (display devices),
• CMYK (cyan, magenta, yellow, black – used in printing).
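The three HSV components can be explored with Python's standard `colorsys` module, which converts HSV to RGB with all components in the range 0 to 1. The helper name `hsv_to_hex` is an assumption made for this sketch.

```python
import colorsys


def hsv_to_hex(hue, saturation, value):
    """Convert HSV components in [0, 1] to a display (RGB) hex string."""
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


pure_red = hsv_to_hex(0.0, 1.0, 1.0)   # hue 0 is red; full saturation and value
no_purity = hsv_to_hex(0.0, 0.0, 1.0)  # zero saturation: hue no longer matters
no_light = hsv_to_hex(0.0, 1.0, 0.0)   # zero value: fully dark
print(pure_red, no_purity, no_light)
```

Lowering saturation moves a color toward gray, and lowering value moves it toward black, which is why value can be read like size to suggest quantity or importance.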
Example
• Visual encoding variables such as color, size, and position each represent a different attribute.
• Color:
• It represents and distinguishes the continents in the given figure.
• Size:
• It represents the medal count of each continent in the given figure.
• X and Y:
• They represent the world map and the position of each continent on it.
Focus on Tables
Keys and values
• key
– independent attribute
– used as unique index to look up items
– key attributes can be categorical or ordinal
– simple tables: 1 key
– multidimensional tables: multiple keys
• value
– dependent attribute, value of cell
– values can be all three of the types: categorical, ordinal, or quantitative
• The unique values for a categorical or ordered attribute are called levels.
• The core design choices for visually encoding tables directly relate to the semantics of the table’s attributes.
• classify arrangements by keys used
– 0, 1, 2, ...
Express: Quantitative Values
Idiom: scatterplot
• express values (magnitudes)
– quantitative attributes
• no keys, only values
– data
• 2 quant attribs
– mark: points
– channels
• horiz + vert position
– tasks
• find trends, outliers, distribution, correlation, clusters
– scalability
• hundreds of items
[A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.]
Scatterplots: Encoding more channels
• additional channels viable since using point marks
– color
– size (1 quant attribute, used to control 2D area)
• note: mapping the value directly to radius would mislead; take the square root, since area grows quadratically with radius
– shape
https://observablehq.com/@d3/scatterplot-with-shapes
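The square-root rule for the size channel can be sketched as below, so that a point's 2D area (not its radius) is proportional to the value. The function name `radius_for_value` and the sample numbers are assumptions.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius such that the circle's area is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


r_small = radius_for_value(25, 100, 20.0)   # a quarter of the maximum value
r_large = radius_for_value(100, 100, 20.0)

# The area ratio matches the value ratio (25 / 100), even though the radius ratio does not.
area_ratio = (math.pi * r_small ** 2) / (math.pi * r_large ** 2)
print(r_small, r_large, area_ratio)
```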
Scatterplot tasks
• correlation
https://www.mathsisfun.com/data/scatter-xy-plots.html
• clusters/groups, and clusters vs classes
https://www.cs.ubc.ca/labs/imager/tr/2014/DRVisTasks/
• The distribution of regions breaks down into three operations:
– separating into regions
– aligning the regions (optional)
– ordering the regions
• The separation should be done according to an attribute that is categorical, whereas alignment and ordering should be done by some other attribute that is ordered.
Some keys
Some keys: Categorical regions
Regions: Separate, order, align
• regions: contiguous bounded areas distinct from each other
– separate into spatial regions: one mark per region (for now)
• use categorical or ordered attribute to separate into regions
– no conflict with expressiveness principle for categorical attributes
• use ordered attribute to order and align regions
Separated and aligned and ordered
• best case
Separated and aligned but not ordered
• limitation: hard to know rank. what's 4th? what's 7th?
Separated but not aligned or ordered
• limitation: hard to make comparisons with size (vs aligned position)
Idiom: bar chart
• one key, one value
– data
• 1 categ attrib, 1 quant attrib
– mark: lines
– channels
• length to express quant value
• spatial regions: one per mark
– separated horizontally, aligned vertically
– ordered by quant attrib
» by label (alphabetical), by length attrib (data-driven)
– task
• compare, lookup values
– scalability
• dozens to hundreds of levels for key attrib [bars], hundreds for values
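The two ordering choices for bars, by label and by the quantitative attribute, can be sketched with plain sorting. The fruit counts are hypothetical data made up for this example.

```python
# Hypothetical table: one categorical key (fruit), one quantitative value (count).
bars = {"pear": 7, "apple": 12, "fig": 3}

by_label = sorted(bars)                               # alphabetical order of the key
by_length = sorted(bars, key=bars.get, reverse=True)  # data-driven order, longest bar first

print(by_label)
print(by_length)
```

Data-driven ordering makes rank questions such as "what's 2nd?" answerable by position alone.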
Idiom: stacked bar chart
• a more complex glyph for each bar,
– where multiple sub-bars are stacked vertically
• The length of the composite glyph still encodes a value,
– as in a standard bar chart, but
• each subcomponent also encodes a length-encoded value.
• Stacked bar charts show information about multidimensional tables,
– specifically a two-dimensional table with two keys.
• The composite glyphs are arranged as a list according to a primary key.
• The other, secondary key is used
– in constructing the vertical structure of the glyph itself.
https://www.d3-graph-gallery.com/graph/barplot_stacked_basicWide.html
• one more key
– data
• 2 categ attrib, 1 quant attrib
– mark: vertical stack of line marks
• glyph: composite object, internal structure from multiple marks
– channels
• length and color hue
• spatial regions: one per glyph
– aligned: full glyph, lowest bar component
– unaligned: other bar components
– task
• part-to-whole relationship
– scalability: asymmetric
• for stacked key attrib, 10-12 levels [segments]
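How one composite glyph is built from its sub-bars can be sketched as a running sum. The function name `stack_segments` and the sample values are assumptions; the dictionary order stands in for the order of the secondary key's levels.

```python
def stack_segments(values_by_secondary_key):
    """Return (key, bottom, top) for each sub-bar, stacked upward from a baseline of 0."""
    segments = []
    bottom = 0.0
    for key, value in values_by_secondary_key.items():
        segments.append((key, bottom, bottom + value))
        bottom += value
    return segments


glyph = stack_segments({"a": 2.0, "b": 3.0, "c": 1.0})
print(glyph)
# Only the first segment starts at the shared baseline (aligned);
# the others start wherever the previous segment ended (unaligned).
total_length = glyph[-1][2]
```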
Idiom: streamgraph
• shows how a numeric variable changes over time for multiple groups
• generalized stacked graph
– emphasizing horizontal continuity
• vs vertical items
– data
• 1 categ key attrib (movies)
• 1 ordered key attrib (time)
• 1 quant value attrib (counts)
– derived data
• geometry: layers, where height encodes counts
• 1 quant attrib (layer ordering)
– scalability
• hundreds of time keys
• dozens to hundreds of movies keys
– more than stacked bars: most layers don’t extend across whole chart
[Stacked Graphs Geometry & Aesthetics. Byron and Wattenberg. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14(6): 1245–1252, (2008).]
https://flowingdata.com/2008/02/25/ebb-and-flow-of-box-office-receipts-over-past-20-years/
Idiom: dot / line chart
• one key, one value
– data
• 2 quant attribs
– mark: points AND line connection marks between them
– channels
• aligned lengths to express quant value
• separated and ordered by key attrib into horizontal regions
– task
• find trend
– connection marks emphasize ordering of items along key axis by explicitly showing relationship between one item and the next
– scalability
• hundreds of key levels, hundreds of value levels
Choosing bar vs line charts
• depends on type of key attrib
– bar charts if categorical
– line charts if ordered
• do not use line charts for categorical key attribs
– violates expressiveness principle
• implication of trend so strong that it overrides semantics!
– “The more male a person is, the taller he/she is”
after [Bars and Lines: A Study of Graphic Communication. Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079.]
Chart axes: label them!
• best practice to label
– few exceptions: individual small multiple views could share axis label
https://xkcd.com/833/
Chart axes: avoid cropping y axis
• include 0 at bottom left or slope misleads
– some exceptions (arbitrary 0, small change matters)
[Truncating the Y-Axis: Threat or Menace? Correll, Bertini, & Franconeri, CHI 2020.]
Idiom: Indexed line charts
• data: 2 quant attribs
– 1 key + 1 value
• derived data: new quant value attrib
– index
– plot instead of original value
• task: show change over time
– principle: normalized, not absolute
• scalability
– same as standard line chart
https://public.tableau.com/profile/ben.jones#!/vizhome/CAStateRevenues/Revenues
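The derived index attribute can be sketched as each value divided by the value at a chosen base point, scaled to 100. The function name `index_series`, the base of 100, and the sample values are assumptions.

```python
def index_series(values, base_position=0, base=100.0):
    """Express each value relative to the value at `base_position` (which becomes `base`)."""
    base_value = values[base_position]
    return [base * v / base_value for v in values]


revenue = [50.0, 55.0, 45.0]      # hypothetical absolute values over time
indexed = index_series(revenue)   # normalized: first time point = 100
print(indexed)
```

Plotting the index instead of the original value lets series with very different absolute magnitudes share one axis.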
Idiom: Gantt charts
• one key, two (related) values
– data
• 1 categ attrib, 2 quant attribs
– mark: line
• length: duration
– channels
• horiz position: start time (+end from duration)
– task
• emphasize temporal overlaps & start/end dependencies between items
– scalability
• dozens of key levels [bars]
• hundreds of value levels [durations]
https://www.r-bloggers.com/gantt-charts-in-r-using-plotly/
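The two related values of a Gantt bar (start time and duration) determine its end, and from there temporal overlap between items can be checked. The task names and numbers below are hypothetical, as are the helper names `end_time` and `overlaps`.

```python
def end_time(start, duration):
    """End of a bar: horizontal start position plus its length."""
    return start + duration


def overlaps(task_a, task_b):
    """True if two (start, duration) tasks share any stretch of time."""
    a_start, a_duration = task_a
    b_start, b_duration = task_b
    return (a_start < end_time(b_start, b_duration)
            and b_start < end_time(a_start, a_duration))


# Hypothetical tasks: key = task name, values = (start, duration).
tasks = {"design": (0, 5), "build": (3, 6), "test": (9, 2)}
print(overlaps(tasks["design"], tasks["build"]))  # build starts before design ends
print(overlaps(tasks["build"], tasks["test"]))    # test starts exactly when build ends
```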
Idiom: Slopegraphs
• two values
– data
• 2 quant value attribs
• (1 derived attrib: change magnitude)
– mark: point + line
• line connecting mark between pts
– channels
• 2 vertical pos: express attrib value
• (linewidth/size, color)
– task
• emphasize changes in rank/value
– scalability
• hundreds of value levels
• dozens of items
https://public.tableau.com/profile/ben.jones#!/vizhome/Slopegraphs/Slopegraphs
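The derived change attribute and the change in rank that a slopegraph emphasizes can be sketched as below. Item names and values are hypothetical, and `changes` and `ranks` are helper names introduced for this example.

```python
def changes(before, after):
    """Derived attribute: change in value for each item."""
    return {item: after[item] - before[item] for item in before}


def ranks(values):
    """Rank items by value; rank 1 is the largest."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {item: position + 1 for position, item in enumerate(ordered)}


before = {"A": 10, "B": 20, "C": 15}
after = {"A": 25, "B": 18, "C": 15}

delta = changes(before, after)
rank_before, rank_after = ranks(before), ranks(after)
print(delta)
print(rank_before, rank_after)
```

The change magnitude could then drive line width or color, as the channels list suggests.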
2 Keys
Idiom: heatmap
• two keys, one value
– data
• 2 categ attribs (gene, experimental condition)
• 1 quant attrib (expression levels)
– marks: point
• separate and align in 2D matrix
– indexed by 2 categorical attributes
– channels
• color by quant attrib
– (ordered diverging colormap)
– task
• find clusters, outliers
– scalability
• 1M items, 100s of categ levels, ~10 quant attrib levels
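Arranging a two-key table into the 2D matrix of a heatmap can be sketched as follows. The gene and condition names and the expression values are hypothetical, and `to_matrix` is a name chosen for this illustration.

```python
def to_matrix(records, row_levels, col_levels):
    """Index (row key, column key, value) records into a 2D matrix; missing cells are None."""
    cell = {(row, col): value for row, col, value in records}
    return [[cell.get((row, col)) for col in col_levels] for row in row_levels]


# Hypothetical data: 2 categorical keys (gene, condition), 1 quantitative value.
records = [("g1", "c1", 0.5), ("g1", "c2", -1.0), ("g2", "c1", 2.0)]
matrix = to_matrix(records, row_levels=["g1", "g2"], col_levels=["c1", "c2"])
print(matrix)
```

Each cell would then be colored by its value; changing the order of `row_levels` or `col_levels` reorders the heatmap without touching the data.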
Heatmap reordering
https://blogs.sas.com/content/iml/2018/05/02/reorder-variables-correlation-heat-map.html
Idiom: cluster heatmap
• in addition
– derived data
• 2 cluster hierarchies
– dendrogram
• parent-child relationships in tree with connection line marks
• leaves aligned so interior branch heights easy to compare
– heatmap
• marks (re-)ordered by cluster hierarchy traversal
• task: assess quality of clusters found by automatic methods
Visualization Analysis & Design
Tables (Ch 7) II
Tamara Munzner
Department of Computer Science
University of British Columbia
@tamaramunzner
Idioms: radial bar chart, star plot
• star plot
– line mark, radial axes meet at central point
• radial bar chart
– line mark, radial axes meet at central ring
– channels: length, angle/orientation
• bar chart
– rectilinear axes, aligned vertically
• accuracy
– length not aligned with radial layouts
• less accurately perceived than rectilinear aligned
[Vismon: Facilitating Risk Assessment and Decision Making In Fisheries Management. Booshehrian, Möller, Peterman, and Munzner. Technical Report TR 2011-04, Simon Fraser University, School of Computing Science, 2011.]
Idiom: radar plot
• radial line chart
– point marks, radial layout
– connecting line marks
• avoid unless data is cyclic
“Radar graphs: Avoid them (99.9% of the time)”
• original: difficult to interpret
• redesign for rectilinear
http://www.thefunctionalart.com/2012/11/radar-graphs-avoid-them-999-of-time.html
Idioms: pie chart, coxcomb chart
• pie chart
– interlocking area marks with angle channel: 2D area varies
• separated & ordered radially, uniform height
– accuracy: area less accurate than rectilinear aligned line length
– task: part-to-whole judgements
• coxcomb chart
– line marks with length channel: 1D length varies
• separated & ordered radially, uniform width
– direct analog to radial bar charts
• data
– 1 categ key attrib, 1 quant value attrib
[A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.]
Coxcomb / nightingale rose / polar area chart
• invented by Florence Nightingale:
Diagram of the Causes of Mortality in the Army in the East
Coxcomb: perception
• encode: 1D length
• decode/perceive: 2D area
• nonuniform line/sector width as length increases
– so area variation is nonlinear wrt line mark length!
• bar chart safer: uniform width, so area is linear with line mark length
– both radial & rectilinear cases
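The nonlinearity can be checked numerically: a sector with fixed angle has area (angle / 2) × length², while a bar with fixed width has area width × length. The function names and the sample angle and width are assumptions.

```python
import math


def sector_area(length, angle):
    """Area of a circular sector with radius `length` and opening `angle` (radians)."""
    return 0.5 * angle * length ** 2


def bar_area(length, width):
    """Area of a rectangular bar of uniform width."""
    return width * length


angle = math.pi / 6   # hypothetical sector angle (12 sectors per circle)
sector_ratio = sector_area(2.0, angle) / sector_area(1.0, angle)
bar_ratio = bar_area(2.0, 0.5) / bar_area(1.0, 0.5)
print(sector_ratio, bar_ratio)   # doubling the length: sector area x4, bar area x2
```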
Pie charts: perception
• some empirical evidence that people
respond to arc length
– decode/perceive: not angles
– maybe also areas?…
• donut charts no worse than pie charts
r Areas: Individual Data Encodings in Pie and Donut Charts. Skau and Kosara. Proc. EuroVis 2016.]
https://eagereyes.org/blog/2016/an-illustrated-tour-of-the-pie-chart-study-results
Pie charts: best practices
• not so bad for two (or few) levels, for part-to-whole task
• dubious for several levels if details matter
• terrible for many levels
https://eagereyes.org/pie-charts
Idioms: normalized stacked bar chart
• task
– part-to-whole judgements
• normalized stacked bar chart
– stacked bar chart, normalized to full vert height
– single stacked bar equivalent to full pie
• high information density: requires narrow rectangle
• pie chart
– information density: requires large circle
http://bl.ocks.org/mbostock/3886208
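Normalizing a stacked bar to full height amounts to dividing each segment by the bar's total, which is the same part-to-whole information a pie chart shows. The function name `normalize_stack` and the values are assumptions.

```python
def normalize_stack(values):
    """Rescale the segments of one stacked bar so that they sum to 1 (full height)."""
    total = sum(values)
    return [v / total for v in values]


shares = normalize_stack([2, 3, 5])   # hypothetical segment values
print(shares)
```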
Idiom: glyphmaps
• rectilinear good for linear vs nonlinear trends
• radial good for cyclic patterns
– evaluating periodicity
[Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models.
Wickham, Hofmann, Wickham, and Cook. Environmetrics 23:5 (2012), 382–393.]
Idiom: SPLOM
• scatterplot matrix (SPLOM)
– rectilinear axes, point mark
– all possible pairs of axes
– scalability
• one dozen attribs
• dozens to hundreds of items
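The "all possible pairs" rule explains why a SPLOM stops scaling at around a dozen attributes. A minimal sketch, with placeholder attribute names; the count below is of distinct unordered pairs, ignoring the diagonal and mirrored panels of a full matrix.

```python
from itertools import combinations


def splom_pairs(attributes):
    """All distinct unordered pairs of attributes, one scatterplot per pair."""
    return list(combinations(attributes, 2))


attributes = ["attr{}".format(i) for i in range(12)]   # one dozen placeholder attributes
pairs = splom_pairs(attributes)
print(len(pairs))   # n * (n - 1) / 2 scatterplots
```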
Idioms: parallel coordinates
• scatterplot limitation
– visual representation with orthogonal axes
– can show only two attributes with spatial position channel
• alternative: line up axes in parallel to show many attributes with position
– item encoded with a jagged line that crosses every axis
– with n attributes shown, the line has n vertices and n−1 segments
• parallel coordinates
– parallel axes, jagged line for item
– rectilinear axes, item as point
• axis ordering is major challenge
– scalability
• dozens of attribs
• hundreds of items
after [Visualization Course Figures. McGuffin, 2014. http://www.michaelmcguffin.com/courses/vis/]
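How one item becomes a jagged line can be sketched by giving each attribute its own vertical axis and normalizing the item's value along it. The function name `polyline`, the axis spacing, the plot height, and the sample item are all assumptions.

```python
def polyline(item, axes, axis_spacing=100.0, height=200.0):
    """Vertices of one item's line; `axes` is a list of (name, min, max) per attribute."""
    points = []
    for i, (name, low, high) in enumerate(axes):
        t = (item[name] - low) / (high - low)   # position along this axis, 0..1
        points.append((i * axis_spacing, height * (1.0 - t)))  # screen y grows downward
    return points


axes = [("a", 0, 10), ("b", 0, 10), ("c", 0, 10)]
points = polyline({"a": 0, "b": 10, "c": 5}, axes)
print(points)
```

Reordering the `axes` list changes which attribute pairs are adjacent, which is why axis ordering matters so much for what patterns are visible.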
Task: Correlation
• scatterplot matrix
– positive correlation
• diagonal low-to-high
– negative correlation
• diagonal high-to-low
– uncorrelated: spread out
• parallel coordinates
– positive correlation
• parallel line segments
– negative correlation
• all segments cross at halfway point
– uncorrelated
https://www.mathsisfun.com/data/scatter-xy-plots.html
[Hyperdimensional Data Analysis Using Parallel Coordinates. Wegman. Journ. American Statistical Association 85:411 (1990), 664–675.]
Parallel coordinates, limitations
• visible patterns only between neighboring axis pairs
• how to pick axis order?
– usual solution: reorderable axes, interactive exploration
– same weakness as many other techniques
• downside of interaction: human-powered search
– some algorithms proposed, none fully solve
Orientation limitations
• rectilinear: scalability wrt #axes
• 2 axes best, 3 problematic, 4+ impossible
• parallel: unfamiliarity, training time
• radial: perceptual limits
– polar coordinate asymmetry
• angles lower precision than length
• nonuniform sector width/size depending on radial distance
– frequently problematic
• but sometimes can be deliberately exploited!
– for 2 attribs of very unequal importance
[Uncovering Strengths and Weaknesses of Radial Visualizations - an Empirical Approach. Diehl, Beck and Burch. IEEE TVCG (Proc. InfoVis) 16(6):935--942, 2010.]
Layout density
Idiom: Dense software overviews
• data: text
– text + 1 quant attrib per line
• derived data:
– one pixel high line
– length according to original
• color line by attrib
• scalability
– 10K+ lines
[Visualization of test information to assist fault localization. Jones, Harrold, Stasko. Proc. ICSE 2002, p 467-477.]
Arrange tables