CH 3 (SpatialDB)
CH 3 (SpatialDB)
1.1 Overview
1.2 Application domains
1.3 Compare a SDBMS with a GIS
1.4 Categories of Users
1.5 An example of an SDBMS application
1.6 A Stroll though a spatial database
1.6.1 Data Models, 1.6.2 Query Language, 1.6.3 Query Processing,
1.6.4 File Organization and Indices, 1.6.5 Query Optimization,
1.6.6 Data Mining
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
• Application domains
• users
• How is different from a DBMS?
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
Exercise: List two ways you have used spatial data. Which
software did you use to manipulate spatial data?
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
• What is a SDBMS?
• How is it different from a GIS?
LO3: Learn about the Components of SDBMS
Sections for LO2
Section 1.5 provides an example SDBMS
Section 1.1 and 1.3 compare SDBMS with DBMS and GIS
What is a SDBMS ?
A SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial abstract data types (ADTs)
and a query language from which these ADTs are callable
supports spatial indexing, efficient algorithms for processing
spatial operations, and domain specific rules for query
optimization
Example: Oracle Spatial data cartridge, ESRI SDE
can work with Oracle 8i DBMS
Has spatial data types (e.g. polygon), operations (e.g. overlap)
callable from SQL3 query language
Has spatial indices, e.g. R-trees
SDBMS Example
Consider a spatial dataset with:
County boundary (dashed white line)
Census block - name, area, population,
boundary (dark line)
Water bodies (dark polygons)
Satellite Imagery (gray scale pixels)
Fig 1.2
Modeling Spatial Data in Traditional DBMS
Figure 1.3
Spatial Data Types and Traditional Databases
Traditional relational DBMS
Support simple data types, e.g. number, strings, date
Modeling Spatial data types is tedious
Example: Figure 1.4 shows modeling of polygon using numbers
Three new tables: polygon, edge, points
• Note: Polygon is a polyline where last point and first point are same
A simple unit sqaure represented as 16 rows across 3 tables
Simple spatial operators, e.g. area(), require joining tables
Tedious and computationally inefficient
Fig 1.4
Evolution of DBMS technology
Fig 1.5
Spatial Data Types and Post-relational Databases
Post-relational DBMS
Support user defined abstract data types
Spatial data types (e.g. polygon) can be added
Choice of post-relational DBMS
Object oriented (OO) DBMS
Object relational (OR) DBMS
A spatial database is a collection of spatial data types, operators,
indices, processing strategies, etc. and can work with many post-
relational DBMS as well as programming languages like Java, Visual
Basic etc.
How is a SDBMS different from a GIS ?
GIS is a software to visualize and analyze spatial data
using spatial analysis functions such as
Search Thematic search, search by region, (re-)classification
Location analysis Buffer, corridor, overlay
Terrain analysis Slope/aspect, catchment, drainage network
Flow analysis Connectivity, shortest path
Distribution Change detection, proximity, nearest neighbor
Spatial analysis/Statistics Pattern, centrality, autocorrelation, indices of
similarity, topology: hole description
Measurements Distance, perimeter, shape, adjacency, direction
GIS uses SDBMS
to store, search, query, share large spatial data sets
How is a SDBMS different from a GIS ?
SDBMS focusses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest
neighbor, distance, adjacency, perimeter etc.
Uses spatial indices and query optimization to speedup queries
over large spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
Will one use a GIS or a SDBM to answer the following:
How many neighboring countries does USA have?
Which country has highest number of neighbors?
Evolution of acronym “GIS”
Geographic Information Systems (1980s)
Geographic Information Science (1990s)
Geographic Information Services (2000s)
Fig 1.1
Three meanings of the acronym GIS
Geographic Information Services
Web-sites and service centers for casual users, e.g. travelers
Example: Service (e.g. AAA, mapquest) for route planning
Geographic Information Systems
Software for professional users, e.g. cartographers
Example: ESRI Arc/View software
Geographic Information Science
Concepts, frameworks, theories to formalize use and
development of geographic information systems and services
Example: design spatial data types and operations for querying
Exercise: Which meaning of the term GIS is closest to the focus of
the book titled “Spatial Databases: A Tour”?
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
• Architecture choices
• SDBMS components:
– data model, query languages,
– query processing and optimization
– File organization and indices
– Data Mining
Chapter Sections
1.5 second half
1.6 – entire section
Components of a SDBMS
Recall: a SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial ADTs and a query
language from which these ADTs are callable
supports spatial indexing, algorithms for processing spatial
operations, and domain specific rules for query optimization
Components include
spatial data model, query language, query processing, file
organization and indices, query optimization, etc.
Figure 1.6 shows these components
We discuss each component briefly in chapter 1.6 and in more
detail in later chapters.
Three Layer Architecture Fig 1.6
1.6.1 Spatial Taxonomy, Data Models
Spatial Taxonomy:
multitude of descriptions available to organize space.
Topology models homeomorphic relationships, e.g. overlap
Euclidean space models distance and direction in a plane
Graphs models connectivity, Shortest-Path
Spatial data models
rules to identify identifiable objects and properties of space
Object model help manage identifiable things, e.g. mountains,
cities, land-parcels etc.
Field model help manage continuous and amorphous
phenomenon, e.g. wetlands, satellite imagery, snowfall etc.
More details in chapter 2.
1.6.2 Spatial Query Language
• Spatial query language
• Spatial data types, e.g. point, linestring, polygon, …
• Spatial operations, e.g. overlap, distance, nearest neighbor, …
• Callable from a query language (e.g. SQL3) of underlying DBMS
SELECT S.name
FROM Senator S
WHERE S.district.Area() > 300
• Standards
• SQL3 (a.k.a. SQL 1999) is a standard for query languages
• OGIS is a standard for spatial data types and operators
• Both standards enjoy wide support in industry
• More details in chapters 2 and 3
Multi-scan Query Example
• Spatial join example
SELECT S.name FROM Senator S, Business B
WHERE S.district.Area() > 300 AND Within(B.location, S.district)
• Non-Spatial Join example
SELECT S.name FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = ‘Female’
Fig 1.7
1.6.3 Query Processing
• Efficient algorithms to answer spatial queries
• Common Strategy - filter and refine
• Filter Step:Query Region overlaps with MBRs of B,C and D
• Refine Step: Query Region overlaps with B and C
Fig 1.8
Query Processing of Join Queries
•Example - Determining pairs of intersecting rectangles
• (a):Two sets R and S of rectangles, (b): A rectangle with 2 opposite corners
marked, (c ): Rectangles sorted by smallest X coordinate value
• Plane sweep filter identifies 5 pairs out of 12 for refinement step
•Details of plane sweep algorithm on page 15
Fig 1.9
1.6.4 File Organization and Indices
• A difference between GIS and SDBMS assumptions
•GIS algorithms: dataset is loaded in main memory (Fig. 1.10(a))
•SDBMS: dataset is on secondary storage e.g disk (Fig. 1.10(b))
•SDBMS uses space filling curves and spatial indices
•to efficiently search disk resident large spatial datasets
Fig 1.10
Organizing spatial data with space filling curves
•Issue:
•Sorting is not naturally defined on spatial data
•Many efficient search methods are based on sorting datasets
•Space filling curves
•Impose an ordering on the locations in a multi-dimensional space
•Examples: row-order (Fig. 1.11(a), z-order (Fig 1.11(b))
• Allow use of traditional efficient search methods on spatial data
Fig 1.11
Spatial Indexing: Search Data-Structures
•Choice for spatial indexing:
•B-tree is a hierarchical collection of ranges of linear keys, e.g. numbers
•B-tree index is used for efficient search of traditional data
•B-tree can be used with space filling curve on spatial data
•R-tree provides better search performance yet!
•R-tree is a hierarchical collection of rectangles
•More details in chapter 4
• Examples:
•GIS organize spatial set as a set of layers
•Databases organize dataset as a collection of tables
Why Data Models?
• Data models facilitate
• Early analysis of properties, e.g. storage cost, querying ability, ...
• Reuse of shared data among multiple applications
• Exchange of data across organization
• Conversion of data to new software / environment
• Example- Y2K crisis for year 2000
Many computer software systems were developed without well-defined data
models in 1960s and 1970s. These systems used a variety of data models for
representing time and date. Some of the representations used two digits to
represent years. In late 1990s, people worried that the 2 digit representation of
year may lead to errorneous behaviour. For example age of a person born in
1960 (represented as 60) in year 2000 (represented as 00) may appear
negative and may be flagged as illegal data item. A large amount of effort and
resources (hundreds of Billions of dollars) was spent in revising the software.
Proper use of data model may have significantly reduced the costs. If time and
date were modeled as abstract data types in a software, only a small portion of
the software implementing the date ADT had to be reviewed and revised.
Types of Data Models
•Two Types of data models
•Generic data models
•Developed for business data processing
•Support simple abstract data types (ADTs), e.g. numbers, strings, date
•Not convenient for spatial ADTs, e.g. polygons
•Recall a polygon becomes dozens of rows in 3 tables (Fig. 1.4, pp. 8)
•Need to extend with spatial concepts, e.g. ADTs
•Application Domain specific, e.g. spatial models
•Set of concepts developed in Geographic Info. Science
•Common spatial ADTs across different GIS applications
•Plan of Study
•First study concepts in spatial models
•Then study generic model
•Finally put the two together
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of data models
LO2 : Understand the models of spatial information
• Field based model
• Object based model
LO3: Understand the 3-step design of databases
LO4: Learn about the trends in spatial data models
f + g : x → f ( x) + g ( x)
f g : x → f ( g ( x))
Types of Field Operations
Local: value of the new field at a given location in the spatial frame-work
depends only on the value of the input field at that location(e.g., Thresholding)
Focal:value of the resulting field at a given location depends on the values
that the input field assumes in a small neighborhood of the location(e.g.,
Gradient)
Zonal:Zonal operations are naturally associated with aggregate operators or
the integration function. An operation that calculates the average height of the
trees for each species is a zonal operation.
Point City 0
Curve River 1
Spatial Object Types in OGIS Data Model
Fig 2.2: Each rectangle shows a distinct spatial object type
Classifying Operations on spatial objects in Object Model
•Classifying operations (Tables 2.1, 2.2, pp. 29-31)
• Set based: 2-dimensional spatial objects (e.g. polygons) are sets of points
• a set operation (e.g. intersection) of 2 polygons produce another polygon
• Topological operations: Boundary of USA touches boundary of Canada
•Directional: New York city is to east of Chicago
•Metric: Chicago is about 700 miles from New York city.
• Q? Identify classes of spatial operations not listed in this slide.
Green is A interior ( Ao )
U
Red is boundary of A (A)
Concept Symbol
Entities
Attributes
Multi-valued Attributes
Relationships
ER Diagram for “State-Park”
Fig 2.4
•Exercise:
•List the entities, attributes, relationships in this ER diagram
•Identify cardinality constraint for each relationship.
•How many roads “Accesses” a “Forest_stand”? (one or many)
2.2.2 Logical Data Model: The Relational Model
Relational model is based on set theory
Main concepts
Domain: a set of values for a simple attribute
Relation: cross-product of a set of domains
• Represents a table, i.e. homogeneous collection of rows (tuples)
• The set of columns (i.e. attributes) are same for each row
Comparison to concepts in conceptual data model
Relations are similar to but not identical to entities
Domains are similar to attributes
Translation rules establishing exact correspondence are discussed in 2.2.3
Relational Schema
Schema of a Relation
Enumerates columns, identifies primary key and foreign keys.
Primary Key :
• one or more attributes uniquely identify each row within a table
Foreign keys
• R’s attributes which form primary key of another relation S
• Value of a foreign key in any tuple of R match values in some row of S
Relational schema of a database
collection of schemas of all relations in the database
Example: Figure 2.5 (next slide)
Ablue print summary drawing of the database table structures
Allows analysis of storage costs, data redundancy, querying capabilities
Some databases were designed as relational schema in 1980s
Nowadays, databases are designed as E R models and relational schema is
generated via CASE tools
Relational Schema Example
•Exercise:
•Identify relations with
•primary keys
•foreign keys
•other attributes
•Compare with ER diagram
•Figure 2.4, pp. 37
Fig 2.5
Relational Schema for “Point”, “Line”, “Polygon” and “Elevation”
Fig 2.5
More on Relational Model
Integrity Constraints
Key: Every relation has a primary key.
Entity Integrity: Value of primary key in a row is never undefined
Referential Integrity: Value of an attribute of a Foreign Key must appear as a value
in the primary key of another relationship or must be null.
Fig 2.7
Specifying Pictograms
•Grammar based approach
•Rewrite rule
•like English syntax diagrams
•Classes of pictograms
•Entity pictograms
•basic: point, line, polygon
•collection of basic
•...
•Relationship pictograms
•partition, network
Entity Pictograms: Basic shapes, Collections
Entity Pictograms: Derived and Alternate Shapes
•Derived shape example is city center point from boundary polygon
•Alternate shape example: A road is represented as a polygon for construction
•or as a line for navigation
2.4 Conceptual Data Modeling with UML
•Motivation
•ER Model does not allow user defined operations
•Object oriented software development uses UML
•UML stands for Unified Modeling Language
•It is a standard consisting of several diagrams
•class diagrams are most relevant for data modeling
•UML class diagrams concepts
•Attributes are simple or composite properties
•Methods represent operations, functions and procedures
•Class is a collection of attributes and methods
•Relationship relate classes
•Example UML class diagram: Figure 2.8
UML Class Diagram with Pictograms: Example
•Exercise: Identify classes, attributes, methods, relationships in Fig. 2.8.
•Compare Fig. 2.8 with corresponding ER diagram in Fig. 2.7.
Fig 2.8
Comparing UML Class Diagrams to ER Diagrams
•Concepts in UML class diagram vs. those in ER diagrams
•Class without methods is an Entity
•Attributes are common in both models
•UML does not have key attributes and integrity constraints
• ERD does not have methods
•Relationships properties are richer in ERDs
•Entities in ER diagram relate to datasets, but UML class diagram
•can contain classes which have little to do with data
2.5 Summary
Spatial Information modeling can be classed into Field
based and Object based
Field based for modeling smoothly varying entities, like
rainfall
Object based for modeling discrete entities, like country
Summary
A data model is a high level description of the data
it can help in early analysis of storage cost, data quality
There are two popular models of spatial information
Field based and Object based
Database are designed in 3-steps
Conceptual, Logical and Physical
Pictograms can simplify Conceptual data models
Chapter 3:Spatial Query Languages
3.1 Standard Database Query Languages
3.2 Relational Algebra
3.3 Basic SQL Primer
3.4 Extending SQL for Spatial Data
3.5 Example Queries that emphasize spatial aspects
3.6 Trends: Object-Relational SQL
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of a query language
• What is a query language?
• Why use query languages?
LO2 : Learn to use standard query language (SQL)
LO3: Learn to use spatial ADTs with SQL
LO4: Learn about the trends in query languages
• 3 Relations
Country(Name, Cont, Pop, GDP, Life-Exp, Shape)
City(Name, Country, Pop,Capital, Shape)
River(Name, Origin, Length, Shape)
• Keys
•Primary keys are Country.Name, City.Name, River.Name
• Foreign keys are River.Origin, City.Country
•Data for 3 tables
•Shown on next slide
World database data tables
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of a query language
LO2 : Learn to use standard query language (SQL)
• How to create and populate tables?
• How to query given tables?
LO3: Learn to use spatial ADTs with SQL
LO4: Learn about the trends in query languages
• Related statements
• SELECT statement with INTO clause can insert multiple rows in a table
• Bulk load, import commands also add multiple rows
• DELETE statement removes rows
•UPDATE statement can change values within selected rows
Querying populated Tables in SQL
• SELECT statement
• The commonly used statement to query data in one or more tables
•Returns a relation (table) as result
• Has many clauses
• Can refer to many operators and functions
• Allows nested queries which can be hard to understand
• Scope of our discussion
• Learn enough SQL to appreciate spatial extensions
•Observe example queries
• Read and write simple SELECT statement
• Understand frequently used clauses, e.g. SELECT, FROM, WHERE
• Understand a few operators and function
SELECT Statement- General Information
• Clauses
•SELECT specifies desired columns
•FROM specifies relevant tables
•WHERE specifies qualifying conditions for rows
•ORDER BY specifies sorting columns for results
•GROUP BY, HAVING specifies aggregation and statistics
•Operators and functions
•arithmetic operators, e.g. +, -, …
•comparison operators, e.g. =, <, >, BETWEEN, LIKE…
•logical operators, e.g. AND, OR, NOT, EXISTS,
•set operators, e.g. UNION, IN, ALL, ANY, …
•statistical functions, e.g. SUM, COUNT, ...
• many other operators on strings, date, currency, ...
SELECT Example 1.
• Simplest Query has SELECT and FROM clauses
• Query: List all the cities and the country they belong to.
Result →
SELECT Example 2.
• Commonly 3 clauses (SELECT, FROM, WHERE) are used
•Query: List the names of the capital cities in the CITY table.
SELECT *
FROM CITY
WHERE CAPITAL=‘Y ’
Result →
Query Example…Where clause
Query: List the attributes of countries in the Country relation
where the life-expectancy is less than seventy years.
SELECT Co.Name,Co.Life-Exp
FROM Country Co
WHERE Co.Life-Exp <70
Result →
Multi-table Query Examples
Query: List the capital cities and populations of countries
whose GDP exceeds one trillion dollars.
Note:Tables City and Country are joined by matching “City.Country =
Country.Name”. This simulates relational operator “join” discussed in 3.2
SELECT Ci.Name,Co.Pop
FROM City Ci,Country Co
WHERE Ci.Country =Co.Name
AND Co.GDP >1000.0
AND Ci.Capital=‘Y ’
Multi-table Query Example
Query: What is the name and population of the capital city in the
country where the St. Lawrence River originates?
Note: Three tables are joined together pair at a time. River.Origin is matched
with Country.Name and City.Country is matched with Country.Name. The
order of join is decided by query optimizer and does not affect the result.
Query Examples…Aggregate Staistics
Query: What is the average population of the noncapital cities listed in the
City table?
SELECT AVG(Ci.Pop)
FROM City Ci
WHERE Ci.Capital=‘N ’
Query: List the countries whose GDP is greater than that of Canada.
SELECT Co.Name
FROM Country Co
WHERE Co.GDP >ANY(SELECT Co1.GDP
FROM Country Co1
WHERE Co1.Name =‘Canada ’)
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of a query language
LO2 : Learn to use standard query language (SQL)
LO3: Learn to use spatial ADTs with SQL
• Learn about OGIS standard spatial data types and operations
• Learn to use OGIS spatial ADTs with SQL
LO4: Learn about the trends in query languages
Note: Spatial operator Touch() is used in WHERE clause to join Country table
with itself. This query is an example of spatial self join operation.
Spatial Query with multiple tables
Query: For all the rivers listed in the River table, find the countries through
which they pass.
Note: Spatial operation “Cross” is used to join River and Country tables. This
query represents a spatial join operation.
Query: The St. Lawrence River can supply water to cities that are
within 300 km. List the cities that can use water from the St.
Lawrence.
SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape,300))=1
AND R.Name =‘St.Lawrence ’
Earlier version of OGIS did not provide spatial aggregate operation to support
GIS operations like reclassify.
Using Spatial Operation in Nested Queries
Query: For each river, identify the closest city.
Note: It shows a complex nested query with aggregate operations. Such queries can be written into
two expression, namely a view definition, and a query on the view. The inner query becomes a view
and outer query is runon the view. This is illustrated in the next slide.
Rewriting nested queries using Views
•Views are like tables
•Represent derived data or result of a query
•Can be used to simplify complex nested queries
•Example follows:
CREATE VIEW Neighbor AS
SELECT Co.Name, Count(Co1.Name)AS num neighbors
FROM Country Co,Country Co1
WHERE Touch(Co.Shape,Co1.Shape)
GROUP BY Co.Name
Spatial Databases
GIS Case Studies
Elizabeth Sayed
Elizabeth Stoltzfus
December 4, 2002
Case Studies
Common applications
Spatial databases provide structures for storage and analysis of spatial data
Storing spatial data in a standard database would require excessive amounts of space
Queries to retrieve and analyze spatial data from a standard database would be long and
cumbersome leaving a lot of room for error
Spatial databases provide much more efficient storage, retrieval, and analysis of spatial data
SDBMS:
– Works with an underlying DBMS
– Allows spatial data models and types
– Supports querying language specific to spatial data types
– Provides handling of spatial data and operations
Spatial application
Interface to DBMS
Taxonomy
SDBMS has three layers:
Data types
– Interface to spatial application
DBMS
Operations
– Core spatial functionality
Query language
– Interface to DBMS
Algorithms
Access methods
Spatial query language provides tools and structures specifically for working with spatial data
Find all the counties through which the Merced river runs
SELECT C.Name, R.Name
FROM County C, River R
WHERE Intersect(C.Shape, R.Shape) = 1 AND R.Name = ‘Merced’;
Common applications
Analysis
1. Geomorphometric –slope values, gradients, aspects, convexity
2. Aggregation and expansion
3. Querying
Integration
1. Relationship and conversion among vector and raster data
Pine
(0,2)
Fir Oak
N 0, N *
Derived Shapes User Defined Shape
PointID Within
Monitors Fire Station Monitors
Fire Station
FiName
FiName
PointID
Spatial accuracy
P A
1. Predictions vs. actual P P
Case 2:
2. Impact A A
Possible Prediction
Source: What’s Spatial About Spatial Data Mining pg 490
Methodology:
– Initiated by development of land-use/land cover maps and followed by field measurements
– Spatial database construction development based on 1986 and 1992 land-use/land cover
maps that developed from Landsat MSSR and SPOT
– Weight of sample components of the tree and streams, branches, twigs, etc were estimated
from equations and literature
– Emission rates were developed by plotting and analyzing collected air samples
– Field data measurements and GIS spatial data were combined using a Look Up Table of
Arc/Info.
Source: “Spatial Database Development for green house gas emission Estimation using remote sensing and GIS”
Data Source:
– 3 Landsat TM images of the Pantal study area from 1985, 1990, 1996
– A land-use survey from 1997
Assessment Methodology:
– Normalized Difference Vegetation Index (NDVI) was computed for each year
– NDVI maps of the three years combined and submitted to multi-dimensional image
segmentation
– Classified vegetation
– Produced a color composite by year that identified the density of vegetation
Length
Type
Closet
Hallway
RoomID
HallI Accesses
D
Belongs_T
Room o
Belongs_
To
FurnID
Belongs_To
Furniture
Nam
e