CS549

Distributed Information Systems


Lecture 2: XML and Internet Databases

This lecture is based on Chapter 26 of Elmasri and Navathe, Fundamentals of Database Systems
Session Learning Outcomes

The learning outcomes of the session are to understand the:


• Anatomy of XML document
• Components of XML document
• XML validation
• Rules for well-formed XML document
• XML DTD: Document Type Definition
• Information retrieval by using XPath & XQuery

- Introduction
• What is XML

• How can XML be used

• What does XML look like

• XML and HTML

• XML is free and extensible

What is XML
• XML stands for Extensible Markup Language.

• XML was developed by the World Wide Web Consortium (www.W3C.org)

• Created in 1996. The first specification was published in 1998 by the W3C

• It is specifically designed for delivering information over the internet.

• XML, like HTML, is a markup language, but unlike HTML it doesn’t have
predefined elements.

• You create your own elements and you assign them any name you like,
hence the term extensible.

How can XML be Used?
• XML is used to Exchange Data

• With XML, data can be exchanged between incompatible systems

• With XML, financial information can be exchanged over the Internet

• XML can be used to Share Data

• XML can be used to Store Data

• XML can make your Data more Useful


• ……..

What does XML look like
The relation Books:

Title    Author   Year
Java     John     1999   …
Pascal   Sara     1980   …
Basic    Mary     1975
Oracle   Emad     1999
….       ….

The equivalent XML document:

<Books>
  <Book>
    <Title> Java </Title>
    <Author> John </Author>
    <Year> 1999 </Year>
  </Book>
  …
  <Book>
    <Title> Oracle </Title>
    <Author> Emad </Author>
    <Year> 1999 </Year>
  </Book>
  ….
</Books>

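The mapping from a relation to an XML document can be sketched in a few lines of Python. This is only an illustration, assuming the standard-library xml.etree.ElementTree module; the helper name relation_to_xml is hypothetical, and the rows are the four tuples shown on the slide.

```python
import xml.etree.ElementTree as ET

# Rows of the Books relation from the slide: (Title, Author, Year).
rows = [
    ("Java", "John", 1999),
    ("Pascal", "Sara", 1980),
    ("Basic", "Mary", 1975),
    ("Oracle", "Emad", 1999),
]
columns = ("Title", "Author", "Year")


def relation_to_xml(table_name, row_name, columns, rows):
    """Build one <row_name> element per tuple and one child element per column."""
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, row_name)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root


books = relation_to_xml("Books", "Book", columns, rows)
xml_text = ET.tostring(books, encoding="unicode")
print(xml_text)
```

Each tuple becomes a Book element and each column becomes a child element, which is one reasonable convention; the same data could equally be stored in attributes.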
XML and HTML …

• XML is not a replacement for HTML


• XML was designed to carry data
• XML and HTML were designed with different goals
• XML was designed to describe data and to focus on what data is
• HTML was designed to display data and to focus on how data looks.
• HTML is about displaying information, while XML is about describing
information

XML and HTML
• HTML is for humans and HTML describes web pages
• You don’t want to see error messages about the web pages you visit
• Browsers ignore and/or correct as many HTML errors as they can, so
HTML is often sloppy

• XML is for computers and XML describes data


• The rules are strict, and errors are not allowed
• In this way, XML is like a programming language
• Current versions of most browsers can display XML
XML is free and extensible
• XML tags are not predefined
• You must "invent" your own tags
• The tags used to mark up HTML documents and the structure of HTML
documents are predefined
• The author of HTML documents can only use tags that are defined in
the HTML standard

• XML allows the author to define his own tags and his own document
structure, hence the term extensible.
The Anatomy of XML Document

<?xml version="1.0"?>                                    <--- XML declaration
<?xml-stylesheet type="text/xsl" href="template.xsl"?>   <--- processing instruction
<!-- File name: Bibliography.xml -->                     <--- comment
<Bibliography>                                           <--- root or document element
  <Book ISBN="1-111-122">                                <--- ISBN is an attribute
    <Title> Java </Title>
    <Author> John </Author>
    <Year> 1998 </Year>
  </Book>                                                <--- elements nested within root element
  .
  .
  <Book>
    <Title> Oracle </Title>
    <Author> Emad </Author>
    <Year> 1999 </Year>
  </Book>
</Bibliography>

Components of an XML Document
• Elements
• Each element has a beginning and ending tag
• <TAG_NAME>...</TAG_NAME>
• Elements can be empty (<TAG_NAME />)

• Attributes
• Describe an element; e.g., data type, data range, etc.
• Can only appear on the beginning tag
• Example: <Book ISBN = "1-111-123">

• Processing instructions
• Encoding specification (Unicode by default)
• Namespace declaration
• Schema declaration

XML declaration
• The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

• The XML declaration is optional in XML 1.0, but it tells every XML
processor which version and encoding to expect (so include it!)

• If present, the XML declaration must be first--not even white space
should precede it

• Note that the brackets are <? and ?>

• version="1.0" is required
• encoding can be "UTF-8" (a Unicode encoding that is compatible with ASCII),
"UTF-16" (also Unicode), or something else, or it can be omitted.
• standalone="yes" states that the document does not depend on declarations
in an external DTD; standalone="no" states that it may
Processing Instructions
• PIs (Processing Instructions) may occur anywhere in the XML document (but
usually in the beginning)

• A PI is a command to the program processing the XML document to handle
it in a certain way

• XML documents are typically processed by more than one program

• Programs that do not recognize a given PI should just ignore it

• General format of a PI: <?target instructions?>

• Example: <?xml-stylesheet type="text/css" href="mySheet.css"?>

XML Elements
• An XML element is everything from the element's start tag to the
element's end tag

• XML Elements are extensible and they have relationships

• XML Elements have simple naming rules:

• Names can contain letters, numbers, and other characters

• Names must not start with a number or punctuation character

• Names must not start with the letters: xml (or XML or Xml ..)
XML Attributes
• XML elements can have attributes. Example: <Book ISBN = "1-111-123">
• Data can be stored in child elements or in attributes
• Should you avoid using attributes?
• Here are some of the problems using attributes:

• attributes cannot contain multiple values (child elements can)

• attributes are not easily expandable (for future changes)

• attributes cannot describe structures (child elements can)

• attributes are more difficult to manipulate by program code

• attribute values are not easy to test against a DTD (which is used to
define the legal elements of an XML document)
Distinction between sub-element and attribute
• In the context of documents, attributes are part of markup, while sub-element
contents are part of the basic document contents

• In the context of data representation, the difference is unclear and may be
confusing

• Same information can be represented in two ways

• <Book … Publisher = "McGraw Hill"> … </Book>

• <Book>
    <Publisher> McGraw Hill </Publisher>
  </Book>

• Suggestion: use attributes for identifiers of elements, and use sub-elements for
contents
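To make the two representations concrete, the short sketch below reads the publisher from each form. It assumes Python's standard xml.etree.ElementTree module; the ISBN value is reused from the earlier slide purely for illustration.

```python
import xml.etree.ElementTree as ET

# The same fact encoded twice: once as an attribute, once as a sub-element.
as_attribute = ET.fromstring('<Book ISBN="1-111-123" Publisher="McGraw Hill"/>')
as_subelement = ET.fromstring(
    '<Book ISBN="1-111-123"><Publisher>McGraw Hill</Publisher></Book>'
)

# Attributes are read with .get(); sub-element text is read with .findtext().
publisher_a = as_attribute.get("Publisher")
publisher_b = as_subelement.findtext("Publisher")

print(publisher_a, publisher_b, sep=" | ")
```

Both forms carry the same value, but only the sub-element form could later grow children of its own (for example a publisher address), which is the reason behind the suggestion to keep attributes for identifiers.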
XML Validation
• Well-Formed XML document:
• Is an XML document with the correct basic syntax

• Valid XML document:


• Must be well formed plus
• Conforms to a predefined DTD or XML Schema.

Rules For Well-Formed XML
• Should begin with the XML declaration (optional in XML 1.0, but recommended)
• Must have one unique root element
• All start tags must match end-tags
• XML tags are case sensitive

• All elements must be closed

• All elements must be properly nested


• All attribute values must be quoted
• XML entities must be used for special characters
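
Several of these rules can be checked mechanically, because a non-validating parser rejects any document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as error:
        return False, str(error)


# One correct sample followed by one violation per rule.
samples = {
    "ok": '<Book ISBN="1-111-123"><Title>Java</Title></Book>',
    "case mismatch": "<Book><Year>1999</year></Book>",
    "unclosed element": "<Book><Title>Java</Book>",
    "bad nesting": "<Book><Title><Author></Title></Author></Book>",
    "unquoted attribute": "<Book ISBN=1-111-123></Book>",
    "raw ampersand": "<Title>Java & XML</Title>",
    "two roots": "<Book></Book><Book></Book>",
}

results = {name: is_well_formed(text)[0] for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted. Note that this checks well-formedness only; it says nothing about validity against a DTD or schema.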
XML DTD: Document Type Definition

• A DTD defines the legal elements of an XML document

• It defines the document structure with a list of legal elements and attributes

• XML Schema
• XML Schema is an XML-based alternative to DTD

• Errors in an XML document stop processing: a conforming XML parser must
report a well-formedness error as fatal rather than try to recover from it

• XML Validators

CDATA
• By default, all text inside an XML document is parsed

• You can force text to be treated as unparsed character data by enclosing
it in <![CDATA[ ... ]]>

• Any characters, even & and <, can occur inside a CDATA

• White space inside a CDATA is (usually) preserved

• The only real restriction is that the character sequence ]]> cannot occur
inside a CDATA

• CDATA is useful when your text has a lot of illegal characters (for
example, if your XML document contains some HTML text)
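
A small sketch of the effect, assuming Python's standard xml.etree.ElementTree module and a made-up note element: the markup-like text inside the CDATA section arrives as plain character data, exactly as the escaped version does.

```python
import xml.etree.ElementTree as ET

# HTML-like text protected by a CDATA section.
with_cdata = ET.fromstring(
    "<note><![CDATA[<b>Tom & Jerry</b> use < and & freely]]></note>"
)

# The same text written with entities instead of CDATA.
with_entities = ET.fromstring(
    "<note>&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; use &lt; and &amp; freely</note>"
)

print(with_cdata.text)
print(len(with_cdata))  # number of child elements: the <b> was not parsed as markup
```

Both documents yield the same text, so CDATA is a convenience for the author rather than a different kind of content.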
XML and DTDs
• A DTD (Document Type Definition) describes the structure of one or more
XML documents.

• Specifically, a DTD describes:

• Elements: name and value
• Attributes, and
• Entities such as &lt; which represents the character '<'

• An XML document is well-formed if it follows certain simple syntactic
rules

• An XML document is valid if it also specifies and conforms to a DTD

Why DTDs?
• With DTD, each of your XML files can carry a description of its
own format with it.

• With a DTD, independent groups of people can agree to use a
common DTD for interchanging data.

• Your application can use a standard DTD to verify that the data
you receive from the outside world is valid.

• You can also use a DTD to verify your own data.

Parsers

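• An XML parser is a program or library that reads the content of an XML
document and makes it available to an application through an API

• Currently popular APIs are DOM (Document Object Model), which builds an
in-memory tree, and SAX (Simple API for XML), which reports events as it reads

• A validating parser is an XML parser that compares the XML
document to a DTD and reports any errors
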
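The difference between the two APIs is easiest to see side by side. The sketch below assumes Python's standard xml.dom.minidom and xml.sax modules (both non-validating); the class name TitleCollector is hypothetical and the document is a trimmed version of the Bibliography example.

```python
import xml.sax
from xml.dom import minidom

document = """<Bibliography>
  <Book ISBN="1-111-122"><Title>Java</Title></Book>
  <Book><Title>Oracle</Title></Book>
</Bibliography>"""

# DOM: the whole document is loaded into a tree that can be navigated freely.
dom = minidom.parseString(document)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("Title")]


# SAX: the parser calls back as it meets each start tag, text run and end tag.
class TitleCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        # Text may arrive in several pieces, so append rather than assign.
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "Title":
            self._in_title = False


collector = TitleCollector()
xml.sax.parseString(document.encode("utf-8"), collector)
sax_titles = collector.titles

print(dom_titles, sax_titles)
```

DOM is convenient when the document fits in memory and must be revisited; SAX keeps no tree, so it suits very large documents that are read once.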
An XML example
<novel>
  <foreword>
    <paragraph> This is a great novel </paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
• An XML document contains (and the DTD describes):
• Elements, such as novel and paragraph, consisting of tags and content
• Attributes, such as number="1", consisting of a name and a value
• Entities (not used in this example)
A DTD example
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter number CDATA #REQUIRED>
]>
<novel>
  <foreword>
    <paragraph> This is a great novel </paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark...</paragraph>
    <paragraph>Suddenly, a shot..!</paragraph>
  </chapter>
</novel>
• A novel consists of a foreword and one or more chapters, in that order
• Each chapter must have a number attribute
• A foreword consists of one or more paragraphs
• A chapter also consists of one or more paragraphs
• A paragraph consists of parsed character data (text that cannot contain any
other elements)
ELEMENT descriptions
• Suffixes:
  ?    optional            foreword?
  +    one or more         chapter+
  *    zero or more        appendix*

• Separators:
  ,    both, in order      foreword?, chapter+
  |    or                  section|chapter

• Grouping:
  ( )  grouping            (section|chapter)+

• #REQUIRED: the attribute is not optional

• Example:
<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter number CDATA #REQUIRED>
]>
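The suffixes, separators and grouping above behave much like regular-expression operators applied to the sequence of child element names. The sketch below illustrates that idea for the novel DTD only: it is not a DTD validator, the content models are translated by hand, and the names CONTENT_MODELS, REQUIRED_ATTRIBUTES and check are hypothetical. It assumes Python's standard re and xml.etree.ElementTree modules.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the novel DTD. Each child name is followed by one space
# so that the DTD operators map directly onto regular-expression operators.
CONTENT_MODELS = {
    "novel": r"foreword (chapter )+",  # (foreword, chapter+)
    "foreword": r"(paragraph )+",      # (paragraph+)
    "chapter": r"(paragraph )+",       # (paragraph+)
    "paragraph": r"",                  # (#PCDATA): text only, no child elements
}
REQUIRED_ATTRIBUTES = {"chapter": ["number"]}  # number CDATA #REQUIRED


def check(element):
    """Return a list of content-model problems in element and its descendants."""
    problems = []
    model = CONTENT_MODELS.get(element.tag)
    child_names = "".join(child.tag + " " for child in element)
    if model is None:
        problems.append(f"<{element.tag}> is not declared")
    elif re.fullmatch(model, child_names) is None:
        found = child_names.strip() or "none"
        problems.append(f"<{element.tag}> has unexpected children: {found}")
    for name in REQUIRED_ATTRIBUTES.get(element.tag, []):
        if name not in element.attrib:
            problems.append(f"<{element.tag}> is missing required attribute {name}")
    for child in element:
        problems.extend(check(child))
    return problems


valid = ET.fromstring(
    "<novel><foreword><paragraph>This is a great novel</paragraph></foreword>"
    '<chapter number="1"><paragraph>It was a dark and stormy night.</paragraph>'
    "</chapter></novel>"
)
invalid = ET.fromstring(
    "<novel><chapter><paragraph>Suddenly, a shot rang out!</paragraph></chapter></novel>"
)

print(check(valid))
print(check(invalid))
```

The second document is well-formed but breaks two declarations: the foreword is missing and the chapter has no number attribute. A real validating parser performs the same kind of comparison directly from the DTD text.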
Another example: XML
<?xml version="1.0"?>
<!DOCTYPE weatherReport SYSTEM "http://www.mysite.com/mydoc.dtd">
<weatherReport>
  <date>05/29/2002</date>
  <location>
    <city>Philadelphia</city>
    <state>PA</state>
    <country>USA</country>
  </location>
  <temperature-range>
    <high scale="F">84</high>
    <low scale="F">51</low>
  </temperature-range>
</weatherReport>
The DTD for this example
<!ELEMENT weatherReport (date, location, temperature-range)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT location (city, state, country)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT temperature-range ((low, high)|(high, low))>
<!ELEMENT low (#PCDATA)>
<!ELEMENT high (#PCDATA)>
<!ATTLIST low scale (C|F) #REQUIRED>
<!ATTLIST high scale (C|F) #REQUIRED>
XML Schema …

• The purpose of an XML Schema is to define the legal building
blocks of an XML document, just like a DTD.
• An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
XML Schema …
• Many think that very soon XML Schemas will be used in most Web
applications as a replacement for DTDs. Here are some reasons:

• XML Schemas are extensible to future additions

• XML Schemas are richer and more useful than DTDs

• XML Schemas are written in XML

• XML Schemas support data types

• XML Schemas support namespaces


XML Schema …
• Look at this simple XML document called "note.xml":

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body> Don't forget me this weekend!</body>
</note>

• This is a simple DTD file called "note.dtd" that defines the elements of the
XML document above ("note.xml"):

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Simple XML schema
• The schema:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com" elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

• The XML document it describes:
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body> Don't forget me this weekend!</body>
</note>
XML schema
• The <schema> element is the root element of every XML schema
<?xml version="1.0"?>

<xs:schema>
...
...
</xs:schema>

• The <schema> element may contain some attributes. A schema declaration often looks
something like this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
...
...
</xs:schema>
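Because an XML Schema is itself an XML document, any ordinary XML parser can read it. The sketch below assumes Python's standard xml.etree.ElementTree module (which does not perform schema validation) and simply lists the element declarations found in the note schema; the prefix map ns is an illustration detail.

```python
import xml.etree.ElementTree as ET

schema_text = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com" elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# The xs: prefix is only shorthand for the XML Schema namespace URI.
ns = {"xs": "http://www.w3.org/2001/XMLSchema"}
declared = [(e.get("name"), e.get("type")) for e in schema.findall(".//xs:element", ns)]

print(schema.tag)
for name, type_name in declared:
    print(name, type_name)
```

The note element has no type attribute because its type is given inline by the nested complexType; the four children are declared as strings.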
Information Retrieval by Using
XPath and XQuery

XML Tree
• An XML file can be viewed as a tree with a root element and leaves that are
connected by branches.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<library> <--- root element
  <book>
    <chapter> </chapter>
    <chapter>
      <section>
        <paragraph>A </paragraph>
        <paragraph>B </paragraph>
      </section>
    </chapter>
  </book>
</library>
• A tree has a root element and child elements.
• Any element can be a parent, that is, it can have sub-elements (child elements).
• An element can have text content and attributes.
• Children on the same level are called siblings (brothers and sisters).
XPath
• XPath is a syntax used for selecting parts of an XML
document
• The way XPath describes paths to elements is similar to the
way an operating system describes paths to files
• XPath is almost a small programming language; it has
functions, tests, and expressions
• XPath is a W3C standard

Terminology
<library>
  <book>
    <chapter>
    </chapter>
    <chapter>
      <section>
        <paragraph>A </paragraph>
        <paragraph>B </paragraph>
      </section>
    </chapter>
  </book>
</library>
• library is the parent of book; book is the parent of the two chapters
• The two chapters are the children of book, and the section is the child of the
second chapter
• The two chapters of the book are siblings (they have the same parent)
• library, book, and the second chapter are the ancestors of the section
• The two chapters, the section, and the two paragraphs are the descendants of
the book
Paths
• Operating System                          • XPath

➢ / = the root directory                    ➢ /library = the root element (if named library)

➢ /users/tony/foo = the file named foo      ➢ /library/book/chapter/section = every section
  in tony in users                            element in a chapter in every book in the library

➢ foo = the file named foo in the           ➢ section = every section element that is a child
  current directory                           of the current element

➢ . = the current directory                 ➢ . = the current element

➢ .. = the parent directory                 ➢ .. = parent of the current element

➢ /users/tony/* = all the files in          ➢ /library/book/chapter/* = all the elements in
  /users/tony                                 /library/book/chapter
Slashes
• A path that begins with a / represents an absolute path, starting from the
top of the document
• Example: /email/message/header/from
• Note that even an absolute path can select more than one element
• A slash by itself means “the whole document”

• A path that does not begin with a / represents a path starting from the
current element
• Example: header/from

• A path that begins with // can start from anywhere in the document
• Example: //header/from selects every from element that is a child of a header
element
• This can be expensive, since it involves searching the entire document
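
These path forms can be tried out with Python's standard xml.etree.ElementTree module. Two caveats apply to this sketch: ElementTree implements only a subset of XPath, and its paths are always relative to the element they are called on, so the leading /library step is implicit and // is written as .// here.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""<library>
  <book>
    <chapter> </chapter>
    <chapter>
      <section>
        <paragraph>A</paragraph>
        <paragraph>B</paragraph>
      </section>
    </chapter>
  </book>
</library>""")

# library is the root element, so "book/chapter/section" plays the role of
# the absolute path /library/book/chapter/section.
sections = library.findall("book/chapter/section")

# ".//paragraph" searches all descendants, like //paragraph in full XPath.
paragraphs = [p.text for p in library.findall(".//paragraph")]

# ".." steps up to the parent: here, the parents of all paragraphs.
parents = [e.tag for e in library.findall(".//paragraph/..")]

print(len(sections), paragraphs, parents)
```

The descendant search has to visit every element below the starting point, which is the cost the last bullet warns about.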
Brackets and last()
• A number in brackets selects a particular matching child
• Example: /library/book[1] selects the first book of the library
• Example: //chapter/section[2] selects the second section of every chapter in the XML
document
• Example: //book/chapter[1]/section[2]
• Only matching elements are counted; for example, if a book has both sections and exercises,
the latter are ignored when counting sections

• The function last() in brackets selects the last matching child


• Example: /library/book/chapter[last()]

• You can even do simple arithmetic


• Example: /library/book/chapter[last()-1]
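
The positional predicates above are among the few that Python's standard xml.etree.ElementTree module supports, so they can be checked directly. The document and its id attributes below are made up for this sketch, and paths are relative to the library root element.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""<library>
  <book id="b1">
    <chapter id="c1"><section id="s1"/><section id="s2"/></chapter>
    <chapter id="c2"><section id="s3"/></chapter>
    <chapter id="c3"><section id="s4"/><section id="s5"/></chapter>
  </book>
  <book id="b2">
    <chapter id="c4"><section id="s6"/></chapter>
  </book>
</library>""")


def ids(path):
    """Return the id attribute of every element selected by path."""
    return [element.get("id") for element in library.findall(path)]


first_book = ids("book[1]")                      # /library/book[1]
second_sections = ids(".//chapter/section[2]")   # //chapter/section[2]
last_chapters = ids("book/chapter[last()]")      # /library/book/chapter[last()]
next_to_last = ids("book/chapter[last()-1]")     # /library/book/chapter[last()-1]

print(first_book, second_sections, last_chapters, next_to_last)
```

Positions are counted per parent: chapters with only one section contribute nothing to section[2], and the second book, which has a single chapter, contributes nothing to chapter[last()-1].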
Stars
• A star, or asterisk, is a “wild card”--it means “all the elements at
this level”
• Example: /library/book/chapter/* selects every child of every chapter of
every book in the library
• Example: //book/* selects every child of every book (chapters,
tableOfContents, index, etc.)
• Example: /*/*/*/paragraph selects every paragraph that has exactly
three ancestors
• Example: //* selects every element in the entire document
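
The wildcard examples can be reproduced with Python's standard xml.etree.ElementTree module, again with paths written relative to the root element. The small library document below is made up for this sketch.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""<library>
  <book>
    <tableOfContents/>
    <chapter><section><paragraph>A</paragraph></section></chapter>
    <chapter><paragraph>B</paragraph></chapter>
    <index/>
  </book>
</library>""")

# //book/* : every child of every book.
children_of_books = [e.tag for e in library.findall("book/*")]

# /library/book/chapter/* : every child of every chapter.
children_of_chapters = [e.tag for e in library.findall("book/chapter/*")]

# /*/*/*/paragraph : the first * matches the root, so relative to the root
# the path is */*/paragraph (paragraphs with exactly three ancestors).
three_ancestors = [e.text for e in library.findall("*/*/paragraph")]

# //* : every element in the document, root included.
everything = [e.tag for e in library.iter()]

print(children_of_books, children_of_chapters, three_ancestors, len(everything))
```

Paragraph A sits one level deeper (inside a section), so it has four ancestors and is not selected by the three-star path.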

XQuery
• XQuery is the language for querying XML data

• XQuery for XML is like SQL for databases

• XQuery is built on XPath expressions

• XQuery is defined by the W3C

• XQuery is supported by all the major database engines (IBM, Oracle,
Microsoft, etc.)

• XQuery 1.0 became a W3C Recommendation in January 2007, so developers can
expect the code to work among different conforming products
XQuery Basic Syntax Rules
• XQuery is case-sensitive

• XQuery elements, attributes, and variables must be valid XML names

• An XQuery string value can be in single or double quotes

• An XQuery variable is defined with a $ followed by a name, e.g. $bookstore

• XQuery comments are delimited by (: and :), e.g. (: XQuery Comment :)

XQuery Example
• Example:

• The following predicate is used to select all the book elements
under the bookstore element that have a price element with a
value that is less than 30:
• doc("books.xml")/bookstore/book[price<30]

• Output
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
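
For readers without an XQuery processor at hand, the same selection can be approximated in Python. This is a sketch under stated assumptions: the BOOKS_XML string is a hypothetical stand-in for books.xml, which the slides do not reproduce; the titles and the 29.99 price come from the slide outputs, while the other two prices are assumed values above 30. The standard xml.etree.ElementTree module has no numeric comparison predicates, so the price test is written as an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (see the assumptions stated above).
BOOKS_XML = """<bookstore>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">XQuery Kick Start</title>
    <price>49.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS_XML)

# Equivalent of /bookstore/book[price<30], with the predicate moved into Python.
cheap = [b for b in bookstore.findall("book") if float(b.findtext("price")) < 30]
cheap_titles = [b.findtext("title") for b in cheap]

print(cheap_titles)
```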
XQuery FLWOR Expressions
• The syntax of a FLWOR (pronounced "flower") expression looks like a combination
of SQL and path expressions

• The following path expression will select all the title elements under the book
elements that are under the bookstore element and that have a price element with a
value that is higher than 30.
doc("books.xml")/bookstore/book[price>30]/title

• The following FLWOR expression will select exactly the same as the path
expression above
for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title
• Output
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
-- FLWOR briefly explained
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title

• FLWOR is an acronym for "For, Let, Where, Order by, Return".

• The for clause binds the variable $x to each book element under the bookstore
element in turn.
• The let clause defines a variable and assigns it a sequence, e.g., let $x := (1 to 5)
• The where clause keeps only the book elements whose price element has a value
greater than 30.
• The order by clause sorts the results according to the specified element
• The return clause specifies what should be returned. Here it returns the title elements
• A FLWOR expression needs at least one for or let clause and a return clause;
where and order by are optional
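Each FLWOR clause has a direct counterpart in ordinary code, which the sketch below spells out step by step in Python. As before, BOOKS_XML is a hypothetical stand-in for books.xml: the titles come from the slide output, and the prices are assumed values chosen so that two books cost more than 30.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; prices other than 29.99 are assumed.
BOOKS_XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS_XML)

limit = 30                                                 # let $limit := 30
candidates = bookstore.findall("book")                     # for $x in .../bookstore/book
selected = [x for x in candidates
            if float(x.findtext("price")) > limit]         # where $x/price > $limit
ordered = sorted(selected,
                 key=lambda x: x.findtext("title"))        # order by $x/title
titles = [x.findtext("title") for x in ordered]            # return $x/title

print(titles)
```

The order by step is what puts Learning XML ahead of XQuery Kick Start, even though it appears later in the document.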
This lecture in exam!
Can be one or more of the following:
• Write an XML file that is equivalent to a relational table
• Draw an XML tree
• Write an XML Schema or DTD for a given XML file
• Use XPath to find answers from an XML file
• Use XQuery to find answers from an XML file
