CHAPTER 4: eXtensible Markup Language (XML)
What is XML?
XML stands for eXtensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data, not to display data
XML tags are not predefined. You must define your own tags
XML is designed to be self-descriptive
XML is a W3C Recommendation
What is XML? (cont’d)
Based on Standard Generalized Markup Language (SGML)
Version 1.0 introduced by the World Wide Web Consortium (W3C) in 1998
Bridge for data exchange on the Web
Comparisons
XML                                                    | HTML
Extensible set of tags                                 | Fixed set of tags
Content oriented                                       | Presentation oriented
Standard data infrastructure                           | No data validation capabilities
Allows multiple output forms                           | Single presentation
Designed to describe data, with focus on what data is  | Designed to display data, with focus on how data looks
Designed to store data                                 | Designed to display data
Tags are not predefined; you must define your own tags | Tags are predefined
Authoring XML
Elements
An XML element is made up of a start tag, an end
tag, and data in between.
Example:
<director> Matthew Dunn </director>
Example of another element with the same value:
<actor> Matthew Dunn </actor>
XML tags are case-sensitive, so these are three different tags:
<CITY> <City> <city>
XML can abbreviate empty elements, for example:
<married></married> can be abbreviated to <married/>
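The element rules above can be tried out with a short Python sketch using the standard-library xml.etree.ElementTree module. The <movie> wrapper element and the variable names are assumptions made for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# <movie> is a hypothetical wrapper so the snippet has a single root element.
doc = ET.fromstring(
    "<movie>"
    "<director> Matthew Dunn </director>"
    "<actor> Matthew Dunn </actor>"
    "<married/>"
    "</movie>"
)

director = doc.find("director")
actor = doc.find("actor")

# Two different elements may carry the same character data.
same_value = director.text.strip() == actor.text.strip()

# Tag names are case-sensitive: looking up "Director" finds nothing.
wrong_case = doc.find("Director")

# An empty element written as <married/> has no text content.
married = doc.find("married")

print(director.tag, repr(director.text.strip()))
print(same_value, wrong_case, married.text)
```

Note that the parser keeps the spaces around "Matthew Dunn", so the sketch strips them before comparing.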
Authoring XML
Elements (cont’d)
An attribute is a name-value pair separated by an equal sign (=).
Example:
<City ZIP="94608"> Emeryville </City>
Attributes are used to attach additional, secondary information to an element.
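As a minimal sketch of how an application might read the attribute in the example, the following Python snippet parses the City element with the standard-library xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring('<City ZIP="94608"> Emeryville </City>')

# Attributes are exposed as name-value pairs on the element.
zip_code = city.get("ZIP")

# The element's own data is separate from its attributes.
name = city.text.strip()

# Attribute names are case-sensitive too, so "zip" is not "ZIP".
missing = city.get("zip")

print(name, zip_code, missing)
```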
Authoring XML
Documents
A basic XML document is an XML element that may, but need not, contain nested XML elements.
Example:
<books>
  <book isbn="123">
    <title> Second Chance </title>
    <author> Matthew Dunn </author>
  </book>
</books>
XML Data Model: Example
<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>
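The example suggests that the article's ref attribute points at the book whose id is 123. Assuming that reading (the slides do not state it explicitly), a short Python sketch can follow the reference by indexing elements on their id attribute. The names by_id, article and target are illustrative.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring("""
<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>
""")

# Index every child of the root element by its id attribute.
by_id = {child.get("id"): child for child in books}

# Assumption: ref holds the id of another element in the same document.
article = by_id["555"]
target = by_id[article.get("ref")]

print(article.findtext("author"), "->", target.tag, target.findtext("title"))
```

With such references the data forms a graph rather than a strict tree, which is why the lookup table is needed in addition to the nesting.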
XML Data Model: Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<customer>
<firstname>Michael</firstname>
<lastname>Smith</lastname>
<gender>male</gender>
<address>
<street>197 West Park Ave.</street>
<city>New York</city>
<state>NY</state>
<zip>11375</zip>
<country>US</country>
</address>
<phone>718-235-5670</phone>
<email>msmith278@yahoo.com</email>
</customer>
The corresponding tree:
customer
  firstname
  lastname
  gender
  address
    street
    city
    state
    zip
    country
  phone
  email
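The tree view of the customer document can be reproduced programmatically. The sketch below, using Python's standard xml.etree.ElementTree module, walks the parsed document recursively and lists each tag indented by its depth. The function name outline is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Bytes are used so the parser applies the declared ISO-8859-1 encoding.
customer_xml = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<customer>
<firstname>Michael</firstname>
<lastname>Smith</lastname>
<gender>male</gender>
<address>
<street>197 West Park Ave.</street>
<city>New York</city>
<state>NY</state>
<zip>11375</zip>
<country>US</country>
</address>
<phone>718-235-5670</phone>
<email>msmith278@yahoo.com</email>
</customer>"""


def outline(element, depth=0):
    """Return the element hierarchy as indented lines, one tag per line."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(customer_xml)
print("\n".join(outline(root)))
```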
Authoring XML
Documents (cont’d)
Authoring guidelines:
All elements must have an end tag.
All elements must be cleanly nested (overlapping
elements are not allowed).
All attribute values must be enclosed in quotation
marks.
Each document must have a unique first element,
the root node.
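The guidelines above are what a parser checks when it decides whether a document is well-formed. As a minimal sketch, the Python snippet below feeds one sample per guideline to the standard-library parser and records whether it is accepted. The helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One hypothetical sample per guideline, plus one correct document.
samples = {
    "ok": '<a><b x="1"/></a>',
    "missing end tag": "<a><b></a>",
    "overlapping elements": "<a><b><c></b></c></a>",
    "unquoted attribute value": "<a x=1/>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```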
Document Type
Definitions (DTD)
An XML document may optionally include a DTD.
A DTD serves as a grammar for the underlying XML document, and it is part of the XML language.
DTDs are somewhat unsatisfactory (for example, they offer no data types beyond character data and are not themselves written in XML syntax), but no consensus exists so far beyond the basic DTDs.
A DTD has the form:
<!DOCTYPE name [markupdeclaration]>
DTD (cont’d)
Consider an XML document:
<db>
  <person>
    <name>Alan</name>
    <age>42</age>
    <email>agb@usa.net</email>
  </person>
  <person>………</person>
  ……….
</db>
DTD (cont’d)
A DTD for it might be:
<!DOCTYPE db [
<!ELEMENT db (person*)>
<!ELEMENT person (name, age, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
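To illustrate how the DTD acts as a grammar, the sketch below hand-translates the element declarations into regular expressions over child tag names and checks a parsed document against them. This is only an illustration: Python's standard xml.etree.ElementTree module does not validate against DTDs, the translation is written by hand rather than derived from the DTD text, and stray text between child elements is not checked. The names CONTENT_MODELS, TEXT_ONLY and conforms are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written translation of the declarations: each content model becomes
# a regular expression over child tag names, each name followed by a space.
CONTENT_MODELS = {
    "db": r"(person )*",          # <!ELEMENT db (person*)>
    "person": r"name age email ",  # <!ELEMENT person (name, age, email)>
}
TEXT_ONLY = {"name", "age", "email"}  # declared as (#PCDATA)


def conforms(element):
    """Check an element and its descendants against the models above."""
    if element.tag in TEXT_ONLY:
        return len(element) == 0  # character data only, no child elements
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element was never declared
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


valid = ET.fromstring(
    "<db><person><name>Alan</name><age>42</age>"
    "<email>agb@usa.net</email></person></db>"
)
# Same document with the required <age> element left out.
invalid = ET.fromstring(
    "<db><person><name>Alan</name><email>agb@usa.net</email></person></db>"
)
print(conforms(valid), conforms(invalid))
```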
DTD (cont’d)
Occurrence indicators:
Indicator        | Occurrence            | Meaning
(no indicator)   | Required              | One and only one
?                | Optional              | None or one
*                | Optional, repeatable  | None, one, or more
+                | Required, repeatable  | One or more
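The table can be read as a set of minimum and maximum counts. The following Python sketch encodes that reading and lists which counts from 0 to 3 each indicator allows. The names OCCURRENCE and count_allowed are assumptions made for illustration.

```python
# Each indicator maps to (minimum, maximum) occurrences; None means unbounded.
OCCURRENCE = {
    "": (1, 1),      # no indicator: one and only one
    "?": (0, 1),     # none or one
    "*": (0, None),  # none, one, or more
    "+": (1, None),  # one or more
}


def count_allowed(indicator, count):
    """Return True if an element may occur `count` times under `indicator`."""
    low, high = OCCURRENCE[indicator]
    return count >= low and (high is None or count <= high)


for indicator in OCCURRENCE:
    allowed = [n for n in range(4) if count_allowed(indicator, n)]
    print(repr(indicator), allowed)
```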
XML Query Languages
The first XML query languages:
  LOREL (Stanford)
  XQL
Several other query languages have been developed (e.g. UnQL, XPath)
XML-QL considered by W3C for standardization
Currently W3C is working on a new query language: XQuery
A Query Language for XML: XML-QL
Developed at AT&T Labs
Extracts data from the input XML data
Has variables to which data is bound and templates which show how the output XML data is to be constructed
Uses XML syntax
Based on a where/construct syntax
  where combines the from and where parts of SQL
  construct corresponds to SQL’s select
XML-QL Query: Example 1
Retrieve all authors of books published by Morgan Kaufmann:
where <book>
        <publisher>
          <name> Morgan Kaufmann </name>
        </publisher>
        <title> $T </title>
        <author> $A </author>
      </book> in "www.a.b.c/bib.xml"
construct <result> $A </result>
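For readers without an XML-QL engine, the same where/construct idea can be approximated in Python with the standard xml.etree.ElementTree module. This is a rough sketch, not XML-QL semantics in full: the bibliography below is a made-up stand-in for bib.xml, the <bib> and <results> wrapper elements are assumptions, and the title variable is only checked for presence.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of bib.xml; all data is made up.
bib = ET.fromstring("""
<bib>
  <book>
    <publisher><name>Morgan Kaufmann</name></publisher>
    <title>Title One</title>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher><name>Another Publisher</name></publisher>
    <title>Title Two</title>
    <author>Author C</author>
  </book>
</bib>
""")

results = ET.Element("results")  # wrapper so the output has a single root
for book in bib.iter("book"):
    # "where" part: the book must match the pattern in the query.
    publisher = book.findtext("publisher/name", default="").strip()
    if publisher != "Morgan Kaufmann" or book.find("title") is None:
        continue
    for author in book.findall("author"):
        # "construct" part: emit one <result> per binding of the author.
        ET.SubElement(results, "result").text = author.text

authors = [r.text for r in results]
print(ET.tostring(results, encoding="unicode"))
```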
XML-QL Query: Example 2
XML-QL query asking for all bookstores that sell
The Java Programming Language for under $25:
where <store>
        <name> $N </name>
        <book>
          <title> The Java Programming Language </title>
          <price> $P </price>
        </book>
      </store> in "www.store/bib.xml",
      $P < 25
construct <result> $N </result>
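The second query adds a condition on a bound variable. A rough Python equivalent, again using the standard xml.etree.ElementTree module, is sketched below. The store names and prices are made up, the <stores> wrapper element is an assumption, and stores_selling_under is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the queried document; stores and prices are made up.
stores = ET.fromstring("""
<stores>
  <store>
    <name>Store A</name>
    <book><title>The Java Programming Language</title><price>22.50</price></book>
  </store>
  <store>
    <name>Store B</name>
    <book><title>The Java Programming Language</title><price>31.00</price></book>
  </store>
  <store>
    <name>Store C</name>
    <book><title>Some Other Title</title><price>9.99</price></book>
  </store>
</stores>
""")

TITLE = "The Java Programming Language"


def stores_selling_under(root, title, limit):
    """Return names of stores that list the given title below the price limit."""
    names = []
    for store in root.iter("store"):
        for book in store.findall("book"):
            book_title = book.findtext("title", default="").strip()
            price_text = book.findtext("price")
            if book_title != title or price_text is None:
                continue  # pattern does not match this book
            if float(price_text) < limit:  # the condition on the price variable
                names.append(store.findtext("name", default="").strip())
    return names


cheap = stores_selling_under(stores, TITLE, 25)
print(cheap)
```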
Semistructured Data and
Mediators
Semistructured data is often encountered in
data exchange and integration
At the sources the data may be structured
(e.g. from relational databases)
We model the data as semistructured to
facilitate exchange and integration
Users see an integrated semistructured view
that they can query
Queries are eventually reformulated into
queries over the structured resources (e.g.
SQL)
Only results need to be materialized
What is a mediator?
A complex software component that integrates and transforms data from one or several sources using a declarative specification
Two main contexts:
  Data conversion: converts data between two different models, e.g. by translating data from a relational database into XML
  Data integration: integrates data from different sources into a common view
Converting Relational
Database to XML
Example: Export the following data into XML and
group books by store
Relational Database:
Store (sid, name, phone)
Book (bid, title, authors)
StoreBook (sid, bid, price, stock)
[Figure: diagram linking Store (sid, name, phone) and Book (bid, title, authors) through StoreBook (price, stock)]
Converting Relational
Database to XML (Cont’d)
XML:
<store>
<name> … </name>
<phone> … </phone>
<book> <title>… </title>
<authors> … </authors>
<price> … </price>
</book>
<book>…</book>
…
</store>
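The export described above can be sketched in Python with the standard xml.etree.ElementTree module. The rows below are made-up sample data for the three relations, the <stores> wrapper element and the function name export_stores are assumptions, and a real mediator would express this mapping declaratively rather than in hand-written loops.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows for the three relations; all values are made up.
store_rows = [(1, "Store A", "555-0100"), (2, "Store B", "555-0101")]
book_rows = [(10, "Title One", "Author A"), (11, "Title Two", "Author B")]
storebook_rows = [(1, 10, 20.0, 3), (1, 11, 35.0, 1), (2, 10, 22.0, 5)]


def export_stores(stores, books, storebooks):
    """Build one <store> element per store, with its books nested inside."""
    books_by_id = {bid: (title, authors) for bid, title, authors in books}
    root = ET.Element("stores")  # hypothetical wrapper element
    for sid, name, phone in stores:
        store = ET.SubElement(root, "store")
        ET.SubElement(store, "name").text = name
        ET.SubElement(store, "phone").text = phone
        # Join StoreBook with Book and keep only this store's rows.
        for sb_sid, bid, price, _stock in storebooks:
            if sb_sid != sid:
                continue
            title, authors = books_by_id[bid]
            book = ET.SubElement(store, "book")
            ET.SubElement(book, "title").text = title
            ET.SubElement(book, "authors").text = authors
            ET.SubElement(book, "price").text = str(price)
    return root


root = export_stores(store_rows, book_rows, storebook_rows)
print(ET.tostring(root, encoding="unicode"))
```

Grouping by store means the same book can appear under several stores, each time with that store's own price.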
Example