Data Visualization

The document discusses the principles of data visualization, focusing on the mapping of data values to aesthetics through scales, and the importance of coordinate systems in 2D visualizations. It covers various types of visual encoding, including quantitative, qualitative, and ordinal data, and emphasizes the effectiveness of spatial channels for visual representation. Additionally, it explores different idioms for visualizing data, such as scatterplots, bar charts, and stacked bar charts, highlighting their design choices and tasks they facilitate.


Book Link: kupdf.net_visualization-analysis-and-designpdf.pdf
From Data to Visualization
• All aesthetics fall into one of two groups: those that can represent
continuous data and those that cannot.
From Data to Visualization
• Quantitative: the data is numerical.
• Qualitative: the data is categorical.
• Variables holding qualitative data are factors, and the different categories are called levels.
• The levels of a factor are most commonly without order (as in the example of dog, cat, fish),
• but factors can also be ordered (as in the example of good, fair, poor).
From Data to Visualization
• Month: ordered factor
• Day: discrete numerical value
• Location: unordered factor
• Station ID: unordered factor
• Temperature: continuous numerical value
Scales Map Data Values onto
Aesthetics
• To map data values onto aesthetics, we need to specify which data values correspond to which specific aesthetics values,
• for example, which data values are represented by particular shapes or colors.
• This mapping between data values and aesthetics values is created via scales.
• A scale defines a unique mapping between data and aesthetics.
• A scale must be one-to-one, such that for each specific data value there is exactly one aesthetics value and vice versa.
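The one-to-one requirement can be sketched with a discrete scale modeled as a lookup table. This is only an illustration: the helper name `make_scale`, the location names, and the color strings are assumptions and do not come from the slides.

```python
def make_scale(data_values, aesthetic_values):
    """Build a one-to-one mapping from data values to aesthetic values."""
    if len(data_values) != len(aesthetic_values):
        raise ValueError("need exactly one aesthetic value per data value")
    if len(set(data_values)) != len(data_values):
        raise ValueError("data values must be distinct")
    if len(set(aesthetic_values)) != len(aesthetic_values):
        raise ValueError("aesthetic values must be distinct (one-to-one)")
    return dict(zip(data_values, aesthetic_values))


# Hypothetical example: three locations mapped onto three colors.
color_scale = make_scale(["Chicago", "Houston", "San Diego"],
                         ["red", "green", "blue"])

# Because the scale is one-to-one it can be inverted, which is what a
# reader does when looking up a color in the legend.
inverse_scale = {aes: value for value, aes in color_scale.items()}
```

Reusing the same color for two locations would make the inverse lookup ambiguous, which is why the sketch rejects it.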
Scales Map Data Values onto
Aesthetics
• We can also choose which variables we map onto which scales.
• For example, instead of mapping temperature onto the y axis and location onto color, we can do the opposite.
• Because the key variable of interest (temperature) is now shown as color,
• we need to show sufficiently large colored areas for the colors to convey useful information.
Scales Map Data Values onto
Aesthetics
• Which variables we map onto which scales:
• Therefore, for this visualization, squares were chosen instead of lines,
• one for each month and location, and
• colored by the average temperature normal for each month.
• The figure uses two position scales (month along the x axis and location along the y axis), but neither is a continuous scale.
• Month is an ordered factor with 12 levels, and
• location is an unordered factor with 4 levels.
Scales Map Data Values onto
Aesthetics
• Which variables we map onto which scales:
• The two position scales are both discrete.
• For discrete position scales, we generally place the different levels of the factor at an equal spacing along the axis.
• If the factor is ordered (as month),
• then the levels need to be placed in the appropriate order.
• If the factor is unordered (as location),
• then the order is arbitrary.
• Both Figures 2-3 and 2-4 used three scales in total:
• two position scales and
• one color scale.
Scales Map Data Values onto
Aesthetics
• We can use more than three scales at once.
• The example figure uses five scales:
• two position scales,
• one color scale,
• one size scale, and
• one shape scale.
• Each scale represents a different variable from the dataset.
Coordinate Systems and Axes
• For 2D visualizations, two numbers are required to uniquely specify a point, and
• we therefore need two position scales.
• These two scales are usually, but not necessarily, the x and y axes of the plot.
• We also have to specify the relative geometric arrangement of these scales.
• Conventionally the x axis runs horizontally and the y axis vertically, but we could choose other arrangements:
• the y axis could run at an acute angle relative to the x axis, or
• one axis could run in a circle and the other could run radially.
• The combination of a set of position scales and their relative geometric arrangement is called a coordinate system.
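To illustrate the "one axis in a circle, the other radial" arrangement, the sketch below converts a polar position into the Cartesian position where it would be drawn. The function name `polar_to_cartesian` is an assumption made for this example.

```python
import math


def polar_to_cartesian(radius, angle_radians):
    """Convert a (radius, angle) position into Cartesian (x, y)."""
    x = radius * math.cos(angle_radians)
    y = radius * math.sin(angle_radians)
    return x, y


# A point two units from the center, a quarter turn counterclockwise
# from the positive x direction.
x, y = polar_to_cartesian(2.0, math.pi / 2)
```

The same pair of position scales yields a different picture once the geometric arrangement changes, which is why the arrangement is part of the definition of a coordinate system.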
Coordinate Systems and Axes
• In a 2D Cartesian coordinate system, each location is uniquely specified by an x and a y value.
• The x and y axes run orthogonally to each other, and
• data values are placed in an even spacing along both axes.
• The two axes are continuous position scales, and
• they can represent both positive and negative real numbers.
• We need to specify the range of numbers each axis covers.
• In the example figure, the x axis runs from –2.2 to 3.2 and
• the y axis runs from –2.2 to 2.2.
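A continuous position scale with evenly spaced values amounts to a linear map from the axis range onto drawing coordinates. The sketch below uses the axis ranges from the slide. The helper name `linear_scale` and the 540 by 440 pixel drawing area are assumptions chosen only for illustration.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping data values in `domain` onto `pixel_range`."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        # Even spacing: equal data steps give equal pixel steps.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# x axis covers -2.2 to 3.2; y axis covers -2.2 to 2.2 (from the slide).
x_scale = linear_scale((-2.2, 3.2), (0.0, 540.0))
# Screen y coordinates usually grow downward, so the pixel range is flipped.
y_scale = linear_scale((-2.2, 2.2), (440.0, 0.0))
```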
Visualization Analysis & Design

Arrange Tables (Ch 7) I

@tamaramunzner
Visual Encoding design
• There are four visual encoding design choices for how to arrange tabular data spatially.
Encoding design
• Why Arrange?
• The arrange design choice covers all aspects of the use of spatial channels for visual encoding.
• The three highest ranked effectiveness channels for quantitative and ordered attributes are all related to spatial position:
• planar position against a common scale,
• planar position along an unaligned scale, and
• length.
• The highest ranked effectiveness channel for categorical attributes, grouping items within the same region, is also about the use of space.
• There are no nonspatial channels that are highly effective for all attribute types:
• the others are split into being suitable for either ordered or categorical attributes,
• but not both, because of the principle of expressiveness.
Inputs for visual encoding:
• The input for visual encoding can be one of two models:
• Mathematical model
• Conceptual model
Types of visual encoding:
• Visual encoding is broadly classified into:
• 1. Retinal
• Human beings are very sensitive to these kinds of retinal variables.
• Some of the retinal variables are colours, shapes, size and other kinds of properties.
• Human beings can easily differentiate between these kinds of retinal variables.
• 2. Planar
• Drawing graphs across the X- and Y-axis.
• Planar variables work for any data type.
• They work great to present any quantitative data.
• To present three or more variables, use the retinal variables!
Mapping of data types to
encoding:
• Quantitative:
• These are the data types that represent the quantity of certain data.
• Some attributes of this type include position, length, volume, area, etc.
• Ordinal:
• These are the data types that hold data of some order.
• For example, days of the week, which have an order in which they should be represented.
• Nominal:
• In this kind of data type, the data is represented in the form of names and categories.
Visual encoding principles /
variables:
• Position:
• Graphs map data across the X- and Y-axis.
• They work great to present any quantitative data.
• Position deals with flat screens and just two planar variables.
• Size:
• A larger size indicates greater quantity or importance.
• Ayville has a greater population (roughly 4 times greater) than Beeton.
• The bigger the square, the larger the quantity.
Visual encoding principles /
variables:
• Orientation
• Orientation typically indicates relative orientation, or
• direction of flow or movement.
• Color
• Color is often modeled as three components (the HSV model).
• Hue indicates the redness or greenness of the color.
• Many hues have particular associations.
Visual encoding principles /
variables:
• Color
• Color is often modeled as three components (the HSV model).
• Hue
• indicates the redness or greenness of the color.
• Many hues have particular associations.
• Saturation, or chroma,
• describes the purity of the color.
• Saturation is often used in combination with value,
• but may also be used independently
• to control the prominence of symbols.
• Value
• gives the color intensity as lightness or darkness.
• Value is often used like size
• to indicate quantity or importance.
• Other color models also exist, e.g.
• RGB (display devices),
• CMYK (cyan, magenta, yellow, black – used in printing).
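The relationship between the HSV components and RGB can be explored with Python's standard `colorsys` module. The sketch below holds the hue fixed at red and varies saturation and value separately. The variable names are chosen for this example only.

```python
import colorsys

# Fully saturated, full-value red: hue 0.0 on colorsys's 0..1 hue circle.
pure_red = colorsys.hsv_to_rgb(0.0, 1.0, 1.0)

# Lower saturation: same hue and value, but a less pure (washed out) color.
pale_red = colorsys.hsv_to_rgb(0.0, 0.5, 1.0)

# Lower value: same hue and saturation, but darker.
dark_red = colorsys.hsv_to_rgb(0.0, 1.0, 0.5)

# With zero saturation the hue no longer matters: the result is a gray.
gray = colorsys.hsv_to_rgb(0.3, 0.0, 0.5)
```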
Example
• Visual encoding variables such as color, size, and position each represent a different attribute.
• Color:
• It represents and distinguishes the continents in the given figure.
• Size:
• It represents the medal count of each continent in the given figure.
• X and Y:
• They represent the world map and the position of each continent on it.
Focus on Tables
Keys and values
• key
– independent attribute
– used as unique index to look up items
– key attributes can be categorical or ordinal
– simple tables: 1 key
– multidimensional tables: multiple keys
• value
– dependent attribute, value of cell
– values can be all three of the types: categorical, ordinal, or quantitative
• The unique values for a categorical or ordered attribute are called levels.
• The core design choices for visually encoding tables directly relate to the semantics of the table’s attributes.
• classify arrangements by keys used
– 0, 1, 2, ...
Express: Quantitative Values
Idiom: scatterplot
• express values (magnitudes)
– quantitative attributes
• no keys, only values
– data
• 2 quant attribs
– mark: points
– channels
• horiz + vert position
– tasks
• find trends, outliers, distribution, correlation, clusters
– scalability
• hundreds of items

[A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.]
Scatterplots: Encoding more channels
• additional channels viable since using point marks
– color
– size (1 quant attribute, used to control 2D area)
• note: mapping the value directly to radius would mislead; take the square root, since area grows quadratically with radius
– shape

https://observablehq.com/@d3/scatterplot-with-shapes
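The square-root note on the size channel can be made concrete with a small sketch. The function name `radius_for_value` and the numbers are assumptions for illustration. The point is that the mark's area, not its radius, should be proportional to the data value.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius of a circular mark whose AREA is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


def circle_area(radius):
    return math.pi * radius ** 2


# A value that is one quarter of the maximum gets half the radius,
# which yields one quarter of the area.
small = radius_for_value(25, max_value=100, max_radius=10.0)
large = radius_for_value(100, max_value=100, max_radius=10.0)
area_ratio = circle_area(small) / circle_area(large)
```

Had the value been mapped linearly to the radius, the smaller mark would have covered only one sixteenth of the larger mark's area, overstating the difference.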
Scatterplot tasks
• correlation
https://www.mathsisfun.com/data/scatter-xy-plots.html
• clusters/groups, and clusters vs classes
https://www.cs.ubc.ca/labs/imager/tr/2014/DRVisTasks/
• The distribution of regions breaks down into three operations:
– separating into regions,
– aligning the regions (optional), and
– ordering the regions.
• The separation should be done according to an attribute that is categorical,
• whereas alignment and ordering should be done by some other attribute that is ordered.
Some keys

Some keys: Categorical regions
Regions: Separate, order, align
• regions: contiguous bounded areas distinct from each other
– separate into spatial regions: one mark per region (for now)
• use categorical or ordered attribute to separate into regions
– no conflict with expressiveness principle for categorical attributes
• use ordered attribute to order and align regions
Separated and aligned and ordered
• best case
Separated and aligned but not ordered
• limitation: hard to know rank. what's 4th? what's 7th?
Separated but not aligned or ordered
• limitation: hard to make comparisons with size (vs aligned position)
Idiom: bar chart
• one key, one value
– data
• 1 categ attrib, 1 quant attrib
– mark: lines
– channels
• length to express quant value
• spatial regions: one per mark
– separated horizontally, aligned vertically
– ordered by quant attrib
» by label (alphabetical), by length attrib (data-driven)
– task
• compare, lookup values
– scalability
• dozens to hundreds of levels for key attrib [bars], hundreds for values
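The two bar orderings named on this slide, by label and by length, can be compared in a few lines. The category names and counts below are made-up illustration data.

```python
# Hypothetical table: one categorical key, one quantitative value.
counts = {"cat": 12, "dog": 30, "fish": 7}

# Ordering by label (alphabetical): easy lookup of a known key.
by_label = sorted(counts)

# Ordering by the value attribute (data-driven): makes rank visible.
by_length = sorted(counts, key=counts.get, reverse=True)
```

With the data-driven order, questions such as "what is 2nd?" can be read directly from the bar positions.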
Idiom: stacked bar chart
• a more complex glyph for each bar,
– where multiple sub-bars are stacked vertically
• The length of the composite glyph still encodes a value,
– as in a standard bar chart, but
• each subcomponent also encodes a length-encoded value.
• Stacked bar charts show information about multidimensional tables,
– specifically a two-dimensional table with two keys.
• The composite glyphs are arranged as a list according to a primary key.
• The other, secondary key is used
– in constructing the vertical structure of the glyph itself.

https://www.d3-graph-gallery.com/graph/barplot_stacked_basicWide.html
Idiom: stacked bar chart
• one more key
– data
• 2 categ attrib, 1 quant attrib
– mark: vertical stack of line marks
• glyph: composite object, internal structure from multiple marks
– channels
• length and color hue
• spatial regions: one per glyph
– aligned: full glyph, lowest bar component
– unaligned: other bar components
– task
• part-to-whole relationship
– scalability: asymmetric
• for stacked key attrib, 10-12 levels [segments]

https://www.d3-graph-gallery.com/graph/barplot_stacked_basicWide.html
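The aligned versus unaligned distinction follows from how the segments of one glyph are laid out. A minimal sketch follows, with a hypothetical helper `stack_segments` and made-up values.

```python
def stack_segments(values):
    """Return (bottom, top) extents for segments stacked in the given order."""
    segments = []
    bottom = 0.0
    for value in values:
        segments.append((bottom, bottom + value))
        bottom += value
    return segments


# One glyph (one level of the primary key) with three secondary-key levels.
segments = stack_segments([3.0, 5.0, 2.0])
total_height = segments[-1][1]
```

Only the first segment starts at the common baseline of 0. The others start wherever the segments beneath them end, so their lengths must be compared without a shared scale.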
Idiom: streamgraph
• shows how a numeric variable changes over time for multiple groups
• generalized stacked graph
– emphasizing horizontal continuity
• vs vertical items
– data
• 1 categ key attrib (movies)
• 1 ordered key attrib (time)
• 1 quant value attrib (counts)
– derived data
• geometry: layers, where height encodes counts
• 1 quant attrib (layer ordering)
– scalability
• hundreds of time keys
• dozens to hundreds of movies keys
– more than stacked bars: most layers don’t extend across whole chart

https://flowingdata.com/2008/02/25/ebb-and-flow-of-box-office-receipts-over-past-20-years/
[Stacked Graphs Geometry & Aesthetics. Byron and Wattenberg. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14(6): 1245–1252, (2008).]
Idiom: dot / line chart
• one key, one value
– data
• 2 quant attribs
– mark: points AND line connection marks between them
– channels
• aligned lengths to express quant value
• separated and ordered by key attrib into horizontal regions
– task
• find trend
– connection marks emphasize ordering of items along key axis by explicitly showing relationship between one item and the next
– scalability
• hundreds of key levels, hundreds of value levels
Choosing bar vs line charts
• depends on type of key attrib
– bar charts if categorical
– line charts if ordered
• do not use line charts for categorical key attribs
– violates expressiveness principle
• implication of trend so strong that it overrides semantics!
– “The more male a person is, the taller he/she is”

after [Bars and Lines: A Study of Graphic Communication. Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079.]
Chart axes: label them!
• best practice to label
– few exceptions: individual small multiple views could share axis label

https://xkcd.com/833/

Chart axes: avoid cropping y axis
• include 0 at bottom left or slope misleads
– some exceptions (arbitrary 0, small change matters)

[Truncating the Y-Axis: Threat or Menace? Correll, Bertini, & Franconeri, CHI 2020.]
Idiom: Indexed line charts
• data: 2 quant attribs
– 1 key + 1 value
• derived data: new quant value attrib
– index
– plot instead of original value
• task: show change over time
– principle: normalized, not absolute
• scalability
– same as standard line chart

https://public.tableau.com/profile/ben.jones#!/vizhome/CAStateRevenues/Revenues
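The derived index attribute can be sketched as each value divided by the value at a chosen base period. The slide does not specify a formula, so the base of 100, the function name `index_series`, and the sample numbers are assumptions.

```python
def index_series(values, base_position=0, base_index=100.0):
    """Express each value relative to the value at `base_position`."""
    base = values[base_position]
    return [base_index * value / base for value in values]


# Hypothetical series for one item over three time steps.
indexed = index_series([50.0, 55.0, 45.0])
```

Plotting the indexed values instead of the originals puts series with very different absolute magnitudes on a common, normalized scale.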
Idiom: Gantt charts
• one key, two (related) values
– data
• 1 categ attrib, 2 quant attribs
– mark: line
• length: duration
– channels
• horiz position: start time (+end from duration)
– task
• emphasize temporal overlaps & start/end dependencies between items
– scalability
• dozens of key levels [bars]
• hundreds of value levels [durations]

https://www.r-bloggers.com/gantt-charts-in-r-using-plotly/
Idiom: Slopegraphs
• two values
– data
• 2 quant value attribs
• (1 derived attrib: change magnitude)
– mark: point + line
• line connecting mark between pts
– channels
• 2 vertical pos: express attrib value
• (linewidth/size, color)
– task
• emphasize changes in rank/value
– scalability
• hundreds of value levels
• dozens of items

https://public.tableau.com/profile/ben.jones#!/vizhome/Slopegraphs/Slopegraphs
2 Keys

Idiom: heatmap
• two keys, one value
– data
• 2 categ attribs (gene, experimental condition)
• 1 quant attrib (expression levels)
– marks: point
• separate and align in 2D matrix
– indexed by 2 categorical attributes
– channels
• color by quant attrib
– (ordered diverging colormap)
– task
• find clusters, outliers
– scalability
• 1M items, 100s of categ levels, ~10 quant attrib levels
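The "separate and align in 2D matrix, indexed by 2 categorical attributes" step can be sketched as a pivot from a list of (key, key, value) records into a grid. The gene and condition names, the values, and the helper `to_matrix` are made up for illustration.

```python
def to_matrix(records):
    """Arrange (row_key, col_key, value) records in a 2D grid.

    Cells with no record are left as None.
    """
    row_keys = sorted({row for row, _, _ in records})
    col_keys = sorted({col for _, col, _ in records})
    cell = {(row, col): value for row, col, value in records}
    matrix = [[cell.get((row, col)) for col in col_keys] for row in row_keys]
    return row_keys, col_keys, matrix


# Hypothetical expression levels keyed by gene and experimental condition.
records = [
    ("geneA", "cond1", 0.5),
    ("geneA", "cond2", -1.0),
    ("geneB", "cond1", 2.0),
]
row_keys, col_keys, matrix = to_matrix(records)
```

Each cell of `matrix` would then be colored by its value, for example with a diverging colormap centered on zero.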
Heatmap reordering

https://blogs.sas.com/content/iml/2018/05/02/reorder-variables-correlation-heat-map.html
Idiom: cluster heatmap
• in addition
– derived data
• 2 cluster hierarchies
– dendrogram
• parent-child relationships in tree with connection line marks
• leaves aligned so interior branch heights easy to compare
– heatmap
• marks (re-)ordered by cluster hierarchy traversal
• task: assess quality of clusters found by automatic methods

Visualization Analysis & Design

Tables (Ch 7) II

Tamara Munzner
Department of Computer Science
University of British Columbia
@tamaramunzner
Idioms: radial bar chart, star plot
• star plot
– line mark, radial axes meet at central point
• radial bar chart
– line mark, radial axes meet at central ring
– channels: length, angle/orientation
• bar chart
– rectilinear axes, aligned vertically

• accuracy
– length not aligned with radial layouts
• less accurately perceived than rectilinear aligned

[Vismon: Facilitating Risk Assessment and Decision Making In Fisheries Management. Booshehrian, Möller, Peterman, and Munzner. Technical Report TR 2011-04, Simon Fraser University, School of Computing Science, 2011.]
Idiom: radar plot
• radial line chart
– point marks, radial layout
– connecting line marks

• avoid unless data is cyclic

“Radar graphs: Avoid them (99.9% of the time)”
• original: difficult to interpret
• redesign for rectilinear

http://www.thefunctionalart.com/2012/11/radar-graphs-avoid-them-999-of-time.html
Idioms: pie chart, coxcomb chart
• pie chart
– interlocking area marks with angle channel: 2D area varies
• separated & ordered radially, uniform height
– accuracy: area less accurate than rectilinear aligned line length
– task: part-to-whole judgements
• coxcomb chart
– line marks with length channel: 1D length varies
• separated & ordered radially, uniform width
– direct analog to radial bar charts
• data
– 1 categ key attrib, 1 quant value attrib

[A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28.]
Coxcomb / nightingale rose / polar area chart
• invented by Florence Nightingale:
Diagram of the Causes of Mortality in the Army in the East

Coxcomb: perception
• encode: 1D length
• decode/perceive: 2D area
• nonuniform line/sector width as length increases
– so area variation is nonlinear wrt line mark length!
• bar chart safer: uniform width, so area is linear with line mark length
– both radial & rectilinear cases

[Figure labels: coxcomb, nonuniform width as length increases; radial & rectilinear bars, uniform width as length increases]
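The nonlinear versus linear area behaviour can be checked numerically. The sketch assumes each coxcomb wedge is a circular sector starting at the center, so its area is half the angle times the squared length. The function names, the angle, and the bar width are illustrative choices.

```python
import math


def sector_area(length, angle_radians):
    """Area of a wedge that starts at the center and has radius `length`."""
    return 0.5 * angle_radians * length ** 2


def bar_area(length, width):
    """Area of a bar with uniform width."""
    return width * length


angle = math.pi / 6  # one of twelve equal wedges
sector_ratio = sector_area(2.0, angle) / sector_area(1.0, angle)
bar_ratio = bar_area(2.0, 0.5) / bar_area(1.0, 0.5)
```

Doubling the encoded length quadruples the wedge area but only doubles the bar area, so a reader who judges area rather than length overestimates differences in a coxcomb chart.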
Pie charts: perception
• some empirical evidence that people
respond to arc length
– decode/perceive: not angles
– maybe also areas?…
• donut charts no worse than pie charts

r Areas: Individual Data Encodings in Pie and Donut Charts. Skau and Kosara. Proc. EuroVis 2016.]
https://eagereyes.org/blog/2016/an-illustrated-tour-of-the-pie-chart-study-results
Pie charts: best practices
• not so bad for two (or few) levels, for part-to-whole task
• dubious for several levels if details matter
• terrible for many levels

https://eagereyes.org/pie-charts
Idioms: normalized stacked bar chart
• task
– part-to-whole judgements
• normalized stacked bar chart
– stacked bar chart, normalized to full vert height
– single stacked bar equivalent to full pie
• high information density: requires narrow rectangle
• pie chart
– information density: requires large circle

http://bl.ocks.org/mbostock/3886208
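Normalizing a stacked bar to full height means converting each segment to its share of the bar's total. A minimal sketch with a hypothetical helper `normalize_stack` and made-up values:

```python
def normalize_stack(values):
    """Convert segment values into fractions of the bar's total."""
    total = sum(values)
    return [value / total for value in values]


# One bar with three segments; the fractions sum to 1 (full height).
shares = normalize_stack([2.0, 3.0, 5.0])
```

The resulting shares are the same quantities a pie chart would encode as angles, here laid out along a single narrow rectangle.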
Idiom: glyphmaps

• rectilinear good for


linear vs nonlinear trends

• radial good for cyclic patterns


– evaluating periodicity

[Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models.
Wickham, Hofmann, Wickham, and Cook. Environmetrics 23:5 (2012), 382–393.]

79
80
Idiom: SPLOM
• scatterplot matrix (SPLOM)
– rectilinear axes, point mark
– all possible pairs of axes
– scalability
• one dozen attribs
• dozens to hundreds of items
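The "one dozen attribs" limit is easier to appreciate by counting the panels. The sketch enumerates the distinct unordered pairs of attributes, and the attribute names are placeholders. A full SPLOM grid typically shows each pair twice, mirrored across the diagonal.

```python
from itertools import combinations

# Twelve placeholder attribute names: attr0 ... attr11.
attributes = [f"attr{i}" for i in range(12)]

# Every distinct unordered pair of attributes needs its own scatterplot.
pairs = list(combinations(attributes, 2))
pair_count = len(pairs)  # n * (n - 1) / 2 for n attributes
```

The count grows quadratically with the number of attributes, so each panel shrinks quickly as attributes are added.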
Idioms: parallel coordinates
• scatterplot limitation
– visual representation with orthogonal axes
– can show only two attributes with spatial position channel
• alternative: line up axes in parallel to show many attributes with position
– item encoded with a line with n segments
– n is the number of attributes shown
• parallel coordinates
– parallel axes, jagged line for item
– rectilinear axes, item as point
• axis ordering is major challenge
– scalability
• dozens of attribs
• hundreds of items

after [Visualization Course Figures. McGuffin, 2014. http://www.michaelmcguffin.com/courses/vis/]
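How one item becomes a jagged line across parallel axes can be sketched as follows. Each attribute is rescaled to its own axis (min-max normalization is an assumption here, as are the helper name `to_polylines` and the sample items), and each item then contributes one vertex per axis.

```python
def to_polylines(items):
    """Turn rows of attribute values into per-item polyline vertices.

    Each vertex is (axis_index, position), with position in 0..1 along
    that attribute's own axis.
    """
    n_attrs = len(items[0])
    lows = [min(item[i] for item in items) for i in range(n_attrs)]
    highs = [max(item[i] for item in items) for i in range(n_attrs)]
    lines = []
    for item in items:
        line = []
        for i, value in enumerate(item):
            span = highs[i] - lows[i]
            # A constant attribute has no spread; place it mid-axis.
            position = (value - lows[i]) / span if span else 0.5
            line.append((i, position))
        lines.append(line)
    return lines


# Three hypothetical items with three attributes each.
lines = to_polylines([(1, 10, 100), (3, 20, 300), (2, 30, 200)])
```

Reordering the columns of the input changes which axes are neighbors, which is the axis ordering challenge noted on the slide.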
Task: Correlation
• scatterplot matrix
– positive correlation
• diagonal low-to-high
– negative correlation
• diagonal high-to-low
– uncorrelated: spread out
• parallel coordinates
– positive correlation
• parallel line segments
– negative correlation
• all segments cross at halfway point
– uncorrelated

https://www.mathsisfun.com/data/scatter-xy-plots.html
[Hyperdimensional Data Analysis Using Parallel Coordinates. Wegman. Journ. American Statistical Association 85:411 (1990), 664–675.]
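The visual patterns above correspond to the sign of a correlation coefficient. The slide does not name a specific measure, so the Pearson coefficient in this sketch is a choice made for illustration, and the data are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


xs = [1.0, 2.0, 3.0, 4.0]
rising = pearson(xs, [2.0, 4.0, 6.0, 8.0])   # diagonal low-to-high
falling = pearson(xs, [8.0, 6.0, 4.0, 2.0])  # diagonal high-to-low
```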
Parallel coordinates, limitations
• visible patterns only between neighboring axis pairs
• how to pick axis order?
– usual solution: reorderable axes, interactive exploration
– same weakness as many other techniques
• downside of interaction: human-powered search
– some algorithms proposed, none fully solve

Orientation limitations
• rectilinear: scalability wrt #axes
• 2 axes best, 3 problematic, 4+ impossible
• parallel: unfamiliarity, training time
• radial: perceptual limits
– polar coordinate asymmetry
• angles lower precision than length
• nonuniform sector width/size depending on radial distance
– frequently problematic
• but sometimes can be deliberately exploited!
– for 2 attribs of very unequal importance

[Uncovering Strengths and Weaknesses of Radial Visualizations - an Empirical Approach. Diehl, Beck and Burch. IEEE TVCG (Proc. InfoVis) 16(6):935--942, 2010.]
Layout density

Idiom: Dense software overviews
• data: text
– text + 1 quant attrib per line
• derived data:
– one pixel high line
– length according to original
• color line by attrib
• scalability
– 10K+ lines

[Visualization of test information to assist fault localization. Jones, Harrold, Stasko. Proc. ICSE 2002, p 467-477.]
Arrange tables
