
Semi-Structured Documents Management
Level: 3rd year Licence
Academic Year: 2022-2023

Dr. MEADI Mohamed Nadjib


Chapter 01: XML Language
Presentation
Introduction
• XML stands for: eXtensible Markup Language
• Developed from SGML
• XML was designed to transport and store data.
• XML is a meta-language.
– A meta-language is a language that's used to define other languages. For instance, you can use XML to define a language like WML.
• XML addresses deficiencies of HTML and SGML:
– Lax syntactical rules (HTML)
– Many complex features that are rarely used (SGML)
• XML can be written by hand or generated by computer
– Useful for data exchange
• XML documents are processed by parsers:
– a program that analyzes the syntax or structure of a given file.
Introduction

• XML tags are not predefined. You must define your own tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation
• XML is Not a Replacement for HTML
• What you can do with XML:
– Define data structures
– Make these structures platform independent
– Process XML-defined data automatically
– Define your own tags
• What you cannot do with XML:
– Define how your data is shown. To show data, you need other
techniques.
Multi-support publication
[Figure: Why XML? Diagram labels: Data Set, XMLizer application, XML, Middleware, Publication (XSL), Digital TV.]
Difference between XML and HTML

• XML was designed to carry data, not to display data


• XML is not a replacement for HTML.
• Different goals:
– XML was designed to describe data and to focus on what data
is.
– HTML was designed to display data and to focus on how data
looks.
• HTML is about displaying information, XML is about describing
information.

CSEB 124 Web Programming


Differences between XML and HTML

1. XML and HTML were designed with different goals:


– XML was designed to transport and store data, with a focus on
what data is.
– HTML was designed to display data, with a focus on how data
looks.
2. HTML is a markup language used to describe the layout of any kind
of information
– XML is a meta-markup language that can be used to define
markup languages that can define the meaning of specific kinds of
information
3. XML does not predefine any tags, whereas HTML tags are predefined in the official specification of HTML
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>

</bibliography>
XML describes the content
How Can XML Be Used?

• Plain Text
– Easy to edit
– Useful for storing small amounts of data (compared to large databases)
– Large amounts of XML data can still be stored efficiently through an XML front end to a database
• Data Identification
– Tell you what kind of data you have
– Can be used in different ways by different applications
How Can XML Be Used?

• XML Separates Data from HTML


• If you need to display dynamic data in your HTML
document, it will take a lot of work to edit the HTML each
time the data changes.
• With XML, data can be stored in separate XML files. This
way you can concentrate on using HTML for layout and
display, and be sure that changes in the underlying data will
not require any changes to the HTML.
• With a few lines of JavaScript code, you can read an external
XML file and update the data content of your web page.
How Can XML Be Used?

• XML Simplifies Data Sharing


• In the real world, computer systems and databases contain
data in incompatible formats.
• XML data is stored in plain text format. This provides a
software- and hardware-independent way of storing data.
• This makes it much easier to create data that can be shared
by different applications.
How Can XML Be Used?

• XML Simplifies Data Transport


• One of the most time-consuming challenges for developers is
to exchange data between incompatible systems over the
Internet.
• Exchanging data as XML greatly reduces this complexity,
since the data can be read by different incompatible
applications.
How Can XML Be Used?

• XML Simplifies Platform Changes


• Upgrading to new systems (hardware or software platforms) is always time-consuming. Large amounts of data must be converted and incompatible data is often lost.
• XML data is stored in text format. This makes it easier to
expand or upgrade to new operating systems, new
applications, or new browsers, without losing data.
How Can XML Be Used?

• XML Makes Your Data More Available


• Different applications can access your data, not only in
HTML pages, but also from XML data sources.
• With XML, your data can be available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.), which also makes it more accessible to blind people or people with other disabilities.
How Can XML Be Used?

• XML is Used to Create New Internet Languages


• A lot of new Internet languages are created with XML.
• Here are some examples:
– XHTML
– WSDL for describing available web services
– WML, the markup language of WAP, for handheld devices
– RSS languages for news feeds
– RDF and OWL for describing resources and ontology
– SMIL for describing multimedia for the web
Why do we need XML?
• XML is used to aid the exchange of data. It makes it possible
to define data in a clear way.
• Both the sending and the receiving parties will use XML to understand the kind of data that's been sent. By using XML, everybody knows that the same interpretation of the data is used.
• XML makes communication easy. It's a great tool for
transactions between businesses.
• You can define other languages with XML. A good example is WML (Wireless Markup Language), the language used in WAP communications.
– WML is just an XML dialect.
Displaying on the web
• Generally, a generic XML document is rendered as raw XML text by
most web browsers. Some display it with 'handles' (e.g. + and - signs in
the margin) that allow parts of the structure to be expanded or collapsed
with mouse-clicks.
• You need style sheets (CSS or XSLT) to render a display of your
choice.
• In order to style the rendering in a browser with CSS or XSLT, the
XML document must include a reference to the stylesheets. E.g.
• <?xml-stylesheet type="text/css" href="myStyleSheet.css"?>
• <?xml-stylesheet type="text/xsl" href="myTransform.xslt"?>
Two Related types of XML documents

1. A "Well Formed" XML document has correct XML syntax. It conforms to the general rules of XML syntax:
– XML documents must have a root element
– XML elements must have a closing tag
– XML tags are case sensitive
– XML elements must be properly nested
– XML attribute values must be quoted

2. A "Valid" XML document is an XML document validated against a DTD.
– A "Valid" XML document is a "Well Formed" XML document,
which also conforms to the rules of a Document Type Definition
(DTD)
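Well-formedness can be checked by any non-validating parser: it simply refuses input that breaks one of the rules above. A minimal sketch with Python's standard `xml.etree.ElementTree`, which raises `ParseError` on such input. Note that it does not check validity against a DTD. The standard library offers no validating parser, so that would need a third-party library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One sample per rule listed above (expected result as the value).
samples = {
    "<note><to>Tove</to></note>": True,       # correct
    "<a/><b/>": False,                         # no single root element
    "<note><to>Tove</note>": False,            # missing closing tag
    "<Note></note>": False,                    # tags are case sensitive
    "<a><b></a></b>": False,                   # improper nesting
    "<note date=2007></note>": False,          # unquoted attribute value
}

for text, expected in samples.items():
    print(f"{text!r:35} well-formed={is_well_formed(text)}")
```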
Valid XML Document
• A valid XML document has a structure that's valid. That's the part
you can check.
• If a document is valid, it's clearly defined what the data in the
document really means.

• To validate XML documents you need a DTD (Document Type Definition) or an XML Schema
• A DTD (or XML Schema) contains the rules for a particular type of XML documents.
• It's the DTD (or XML schema) that defines the language.
XML documents
• XML documents use a self-describing and simple syntax.
• All elements can have sub elements (child elements):
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
• The terms parent, child, and sibling are used to describe the
relationships between elements. Parent elements have
children. Children on the same level are called siblings
(brothers or sisters).
• All elements can have text content and attributes (just like in
HTML).
XML Documents Form a Tree Structure
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry P.</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
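The tree structure can be walked with Python's standard `xml.etree.ElementTree`. The sketch below uses the bookstore example. The XML declaration is omitted because the document is parsed from a Python string rather than from a file. It visits the children of the root element, reads their attributes, and lists the siblings inside one book.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry P.</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)          # the root element: <bookstore>

# The children of the root element, in document order.
for book in root:
    title = book.find("title")
    print(book.get("category"), "|", title.text, "|",
          title.get("lang"), "|", book.findtext("price"))

# The children of one parent are siblings of each other.
siblings = [child.tag for child in root[0]]
print(siblings)
```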
XML Naming Rules
XML elements must follow these naming rules:
• Names can contain letters, digits, hyphens, underscores, and periods
• Names must start with a letter or an underscore (not with a digit or other punctuation)
• Names starting with the letters xml (or XML, or Xml, etc.) are reserved
• Names cannot contain spaces
• Apart from that, any name can be used; no words are reserved.
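The naming rules above can be encoded as a small checker. This is a simplified sketch. It accepts ASCII letters only, whereas the XML specification's `Name` production also allows many non-ASCII letters and the colon (used for namespace prefixes). The function name `is_valid_element_name` is hypothetical.

```python
import re

# Simplified version of the rules above: a letter or underscore first, then
# letters, digits, hyphens, underscores or periods (no spaces, ASCII only).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_element_name(name: str) -> bool:
    if _NAME.fullmatch(name) is None:
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["book", "first-name", "_id", "1book", "first name", "XMLdata"]:
    print(candidate, is_valid_element_name(candidate))
```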
Parts of XML Document

• Declaration: <?xml version='1.0' encoding='UTF-16' standalone='yes'?>
• Element: any text properly nested between two matching tags: <aTag> ... </aTag>.
• Name: the tag's name, e.g. "aTag".
• Content: what appears between the start tag and the end tag.
• Parent-child relationships: these hold between elements and are given by the nesting of the tags.
• Attributes: can be attached to the opening tag.
• Attribute values must be enclosed in quotes.
• Empty elements can be written <aTag/>.
• Comments: any text enclosed in <!-- ... -->
• Processing instructions: enclosed in <? ... ?>; they may be used by the XML processor receiving the document.
More XML: Entity References
• Syntax: &entityname;
• Example:
<element> this is less than &lt; </element>
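• Predefined entities:
– &lt; stands for <
– &gt; stands for >
– &amp; stands for &
– &apos; stands for '
– &quot; stands for "
• Character references: &#38; is the character with Unicode code point 38, i.e. &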
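In practice the parser resolves these references for you. A minimal sketch with Python's standard library, using `xml.etree.ElementTree` for parsing and `xml.sax.saxutils.escape` for the reverse direction when writing text out:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: entity and character references are replaced by the characters
# they stand for.
element = ET.fromstring("<element> this is less than &lt; </element>")
decoded = element.text
quotes = ET.fromstring("<e>&apos;&quot;&#38;</e>").text

# Serializing: escape() turns &, < and > back into entity references.
encoded = escape("if a < b & b > c")

print(repr(decoded), repr(quotes), encoded)
```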


More XML: Processing Instructions
• Syntax: <?target argument?>
• Example:
<product> <name> Alarm Clock </name>
<?ringBell 20?>
<price> 19.99 </price>
</product>

• What do they mean ?


More XML: Comments
• Syntax <!-- .... Comment text... -->

• Yes, they are part of the data model !!!
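Both processing instructions and comments survive parsing as nodes of the document tree, so the receiving application can read them. Here is a sketch with Python's standard `xml.dom.minidom`. The comment text is a made-up example added to the slide's product element.

```python
from xml.dom import minidom

DOC = """<product>
  <!-- not shown to the customer -->
  <name> Alarm Clock </name>
  <?ringBell 20?>
  <price> 19.99 </price>
</product>"""

product = minidom.parseString(DOC).documentElement

# Both kinds of node appear among the element's children.
pis = [n for n in product.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
comments = [n for n in product.childNodes if n.nodeType == n.COMMENT_NODE]

for pi in pis:
    print("PI target:", pi.target, "| data:", pi.data)
for comment in comments:
    print("comment:", comment.data)
```

An application that understands the target `ringBell` can act on its data. Other applications can simply ignore it.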


Attributes in XML Declaration
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
• Version:
– This specifies which version of the XML specification the document adheres to.
– There are two versions of the XML specification, 1.0 and 1.1
• Encoding:
– The encoding declaration identifies which encoding is used to represent the characters in the document, e.g. UTF-8 or UTF-16 (Unicode encodings), or ISO-8859-1.
• Standalone:
– The standalone declaration indicates whether a document relies on information
from an external source, such as external document type definition (DTD), for
its content.
– It must be set to either yes or no:
– yes specifies that the document exists entirely on its own, without
depending on any other files.
– no indicates that the document may depend on an external DTD
XML Namespaces
• http://www.w3.org/TR/REC-xml-names (1/99)
• name ::= [prefix:]localpart

<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
XML Namespaces
• syntactic: <number>, <isbn:number>
• semantic: the namespace URI identifies the vocabulary (it may point to a schema)
<tag xmlns:mystyle="http://…">
… defined here

<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
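A namespace-aware parser resolves each prefix to its URI, so `<number>` and `<isbn:number>` become two different names. The sketch below uses Python's `xml.etree.ElementTree`. The book data is taken from the XML Schema example later in this chapter, and the namespace URI is written here with an `http://` scheme, which is an assumption (the slide gives it without one).

```python
import xml.etree.ElementTree as ET

# Assumption: absolute form of the slide's namespace URI.
ISBN_NS = "http://www.isbn-org.org/def"

DOC = """<book xmlns:isbn="http://www.isbn-org.org/def">
  <title>Being a Dog Is a Full-Time Job</title>
  <number>15</number>
  <isbn:number>0836217462</isbn:number>
</book>"""

book = ET.fromstring(DOC)

# ElementTree reports a prefixed name as "{namespace-URI}localpart".
tags = [child.tag for child in book]

# A prefix-to-URI map lets find()/findtext() accept the prefixed form.
plain = book.findtext("number")                           # unprefixed <number>
isbn = book.findtext("isbn:number", namespaces={"isbn": ISBN_NS})

print(tags, plain, isbn)
```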
Attribute Or Element
1. Use an element:
• When the order is important (the order of attributes is not significant)
• When you want to reuse an element several times (with the same parent)
• When you want (in the future) to have descendants / an internal structure
• To represent a data type (object)
2. Use an attribute:
• When you want to refer to another element
• To indicate the use/type/etc. of an element, e.g. <address usage="prof"> ... </address>
• When you want to impose default values in the DTD
Avoid XML Attributes?
Some of the problems with using attributes are:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)

Attributes are difficult to read and maintain. Use elements for data. Use attributes for information that is not relevant to the data.
XML Document Structure
• An XML document uses two auxiliary files
– Schema file
• DTD or XML Schema
– Style file
• Cascading Style Sheets
• XSLT - XSL (eXtensible Stylesheet Language) is created for this
purpose.
• An XML document is a tree of elements with a single root
• In XML, you define your own tags.
• If you want to use a tag, you'll have to define its meaning.
• This definition is stored in a DTD (Document Type Definition).
You can define your own DTD or use an existing one.
• An alternative to a DTD is an XML Schema.
XML Document Type Definitions
Very Simple DTD
<!DOCTYPE company [
<!ELEMENT company ((person|product)*)>
<!ELEMENT person (ssn, name, office, phone?)>
<!ELEMENT ssn (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT office (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product (pid, name, description?)>
<!ELEMENT pid (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]>
Example of a valid XML document:
<company>
<person> <ssn> 123456789 </ssn>
<name> John </name>
<office> B432 </office>
<phone> 1234 </phone>
</person>
<person> <ssn> 987654321 </ssn>
<name> Jim </name>
<office> B123 </office>
</person>
<product> ... </product>
...
</company>
Content Model
• Element content: what we can put in an element
• Content model:
– Complex = a regular expression over other elements
– Text-only = #PCDATA
– Empty = EMPTY
– Any = ANY
– Mixed content = (#PCDATA | A | B | C)*
• (i.e. very restricted: in mixed content, the child elements can only be allowed in any order and any number of times)
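Because an element-only content model is a regular expression over child element names, it can be checked with an ordinary regex engine. The sketch below hand-translates the company DTD from the previous slide into Python regexes. It ignores #PCDATA, attributes and mixed content, so it illustrates the idea rather than replacing a validating parser. `CONTENT_MODELS` and `conforms` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the element-only content models of the company DTD.
# Each child element contributes "tagname " to the string being matched.
CONTENT_MODELS = {
    "company": r"(?:(?:person|product) )*",    # ((person|product)*)
    "person": r"ssn name office (?:phone )?",   # (ssn, name, office, phone?)
    "product": r"pid name (?:description )?",   # (pid, name, description?)
}


def conforms(elem):
    """Check elem and its descendants against CONTENT_MODELS.

    Elements without an entry (the #PCDATA ones) are not checked.
    """
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            return False
    return all(conforms(child) for child in elem)


DOC = """<company>
  <person><ssn>123456789</ssn><name>John</name><office>B432</office><phone>1234</phone></person>
  <person><ssn>987654321</ssn><name>Jim</name><office>B123</office></person>
</company>"""

print(conforms(ET.fromstring(DOC)))
```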
Attributes in DTDs
<!ELEMENT person (ssn, name, office, phone?)>
<!ATTLIST person age CDATA #REQUIRED>

<person age="25">
<name> ....</name>
...
</person>
Attributes in DTDs
Types:
• CDATA = string
• ID = key
• IDREF = foreign key
• IDREFS = foreign keys separated by space
• (Monday | Wednesday | Friday) = enumeration
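• NMTOKEN = a name token (made of name characters only, but unlike a name it may start with a digit)
• NMTOKENS = multiple name tokens separated by spaces
• ENTITY = the name of an unparsed entity declared in the DTD (rarely used)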
Attributes in DTDs
Kind:
• #REQUIRED
• #IMPLIED = optional
• value = default value
• #FIXED "value" = the only value allowed
Attributes in DTDs
<!ELEMENT person (ssn, name, office, phone?)>
<!ATTLIST person age CDATA #REQUIRED
id ID #REQUIRED
manager IDREF #REQUIRED
manages IDREFS #REQUIRED
>

<person age="25"
id="p29432"
manager="p48293" manages="p34982 p423234">
<name> ....</name>
...
</person>
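An ID behaves like a key, and IDREF/IDREFS behave like foreign keys: a validating parser rejects duplicate IDs and references to IDs that do not exist. A minimal sketch of that check in Python. The attribute names are taken from the slide's example and hard-coded as an assumption, whereas a real validator would read the ID/IDREF/IDREFS declarations from the DTD. `dangling_references` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: attribute names hard-coded from the slide's ATTLIST example.
ID_ATTR, IDREF_ATTRS, IDREFS_ATTRS = "id", ("manager",), ("manages",)


def dangling_references(root):
    """Return IDREF/IDREFS values that match no ID (like broken foreign keys)."""
    ids = [e.get(ID_ATTR) for e in root.iter() if e.get(ID_ATTR) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique in the document")
    known = set(ids)
    missing = []
    for e in root.iter():
        refs = [e.get(a) for a in IDREF_ATTRS if e.get(a) is not None]
        for a in IDREFS_ATTRS:
            refs.extend((e.get(a) or "").split())   # IDREFS: space-separated
        missing.extend(r for r in refs if r not in known)
    return missing


DOC = """<company>
  <person id="p1" manager="p2" manages="p2 p3"><name>Ann</name></person>
  <person id="p2" manager="p1" manages="p1"><name>Bob</name></person>
</company>"""

print(dangling_references(ET.fromstring(DOC)))   # p3 has no matching ID
```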
Using DTDs
• The DTD must be attached to the XML document through its DOCTYPE declaration
• Either include the entire DTD:
– <!DOCTYPE rootElement [ ....... ]>
• Or include a reference to it:
– <!DOCTYPE rootElement SYSTEM "http://www.mydtd.org">
• Or mix the two (e.g. to override the external definition)
Use an internal DTD
• The DTD is declared directly in the XML document, inside the DOCTYPE declaration
Use an external DTD

• The DTD is declared in an external file
– SYSTEM: the DTD is indicated by its location (its URI)
– PUBLIC: the DTD is in the public domain (a standard), indicated by its identifier (FPI) and its location (its URI)
XML Schema
What is XML Schema?

• The origin of schema
– XML Schema documents are used to define and validate the content and structure of XML data.
– XML Schema was originally proposed by Microsoft, but became an official W3C recommendation in May 2001
DTD versus Schema

Limitations of DTD:
• No constraints on character data
• Not using XML syntax
• No support for namespaces
• Very limited for reusability and extensibility
Advantages of Schema:
• Syntax in XML style
• Supports namespaces and import/include
• More data types
• Able to create complex data types by inheritance
• Inheritance by extension or restriction
• More …
An XML Instance Document Example

<book isbn="0836217462">
<title> Being a Dog Is a Full-Time Job</title>
<author>Charles M. Schulz</author>
<qualification> extroverted beagle </qualification>
</book>
The Example’s Schema

<?xml version="1.0" encoding="utf-8"?>


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="qualification" type="xs:string"/>
</xs:sequence>
<xs:attribute name="isbn" type="xs:integer"/>
</xs:complexType>
</xs:element>
</xs:schema>
book.xsd
DTD v.s. XML Schemas
DTD:
<!ELEMENT paper (title,author*,year, (journal|conference))>
XML Schema:
<xs:element name="paper" type="paperType"/>
<xs:complexType name="paperType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="year"/>
<xs:choice> <xs:element name="journal"/>
<xs:element name="conference"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
Example

A valid XML Document:

<paper>
<title> The Essence of XML </title>
<author> Simeon</author>
<author> Wadler</author>
<year>2003</year>
<conference> POPL</conference>
</paper>
Elements v.s. Types

Anonymous (local) type:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

Named (global) type:
<xs:element name="person" type="ttt"/>
<xs:complexType name="ttt">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
</xs:sequence>
</xs:complexType>

Both say the same thing; in DTD:
<!ELEMENT person (name,address)>
Basic commands
• element: association of a type with a tag
o attributes name, type, ref, minOccurs, maxOccurs, ...
• attribute: association of a type with an attribute
o attributes name, type
• simpleType:
o the many basic types: integer, decimal, string, time, date, ID, IDREF, …
o can be restricted by constraints (facets)
• complexType:
o a composition of types that defines an aggregation of typed elements
Simple types
• string: Confirm this is electric
• normalizedString: Confirm this is electric
• token: Confirm this is electric
• byte: -1, 126
• unsignedByte: 0, 126
• base64Binary: GpM7
• hexBinary: 0FB7
• integer: -126789, -1, 0, 1, 126789
• positiveInteger: 1, 126789
• negativeInteger: -126789, -1
• nonNegativeInteger: 0, 1, 126789
• nonPositiveInteger: -126789, -1, 0
• int: -1, 126789675
• unsignedInt: 0, 1267896754
Schema
Simple types
• long: -1, 12678967543233
• unsignedLong: 0, 12678967543233
• short: -1, 12678
• unsignedShort: 0, 12678
• decimal: -1.23, 0, 123.4, 1000.00
• float: -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
• double: -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
• boolean: true, false, 1, 0
• time: 13:20:00.000, 13:20:00.000-05:00
• dateTime: 1999-05-31T13:20:00.000-05:00
• duration: P1Y2M3DT10H30M12.3S
• date: 1999-05-31
• gMonth: --05
• gYear: 1999
Simple types
• gYearMonth: 1999-02
• gDay: ---31
• gMonthDay: --05-31
• Name: shipTo
• QName: po:USAddress
• NCName: USAddress
• anyURI: http://www.example.com/, http://www.example.com/doc.html#ID5
• language: en-GB, en-US, fr
• ID: "A212"
• IDREF: "A212"
• IDREFS: "A212" "B213"
• ENTITY
• ENTITIES
• NOTATION
• NMTOKEN, NMTOKENS: US; Brésil Canada Mexique
Local v.s. Global Types

• Local type:
<xs:element name="person">
[define locally the person's type]
</xs:element>
• Global type:
<xs:element name="person" type="ttt"/>

<xs:complexType name="ttt">
[define here the type ttt]
</xs:complexType>
Global types: can be reused in other elements
Local v.s. Global Elements

• Local element:
<xs:complexType name="ttt">
<xs:sequence>
<xs:element name="address" type="..."/>...
</xs:sequence>
</xs:complexType>
• Global element:
<xs:element name="address" type="..."/>

<xs:complexType name="ttt">
<xs:sequence>
<xs:element ref="address"/> ...
</xs:sequence>
</xs:complexType>
Regular Expressions
Recall the element-type-element alternation:
<xs:complexType name="....">
[regular expression on elements]
</xs:complexType>
Regular expressions:
• <xs:sequence> A B C </...> = A B C
• <xs:choice> A B C </...> = A | B | C
• <xs:group> A B C </...> = (A B C)
• <xs:... minOccurs="0" maxOccurs="unbounded"> ..</...> = (...)*
• <xs:... minOccurs="0" maxOccurs="1"> ..</...> = (...)?
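The compositors and occurrence attributes above map mechanically onto regular-expression syntax. A sketch of that mapping in Python follows. The helper names `occurs`, `sequence`, `choice`, `elem` and `matches` are hypothetical, and `<xs:all>` is not covered because it is not a plain sequence regex. It is applied to `paperType` from the "DTD v.s. XML Schemas" slide.

```python
import re


def occurs(particle, min_occurs=1, max_occurs=1):
    """Apply minOccurs/maxOccurs to a regex particle ('unbounded' allowed)."""
    group = f"(?:{particle})"
    if max_occurs == "unbounded":
        if min_occurs == 0:
            return group + "*"
        if min_occurs == 1:
            return group + "+"
        return group + f"{{{min_occurs},}}"
    if (min_occurs, max_occurs) == (1, 1):
        return group
    if (min_occurs, max_occurs) == (0, 1):
        return group + "?"
    return group + f"{{{min_occurs},{max_occurs}}}"


def sequence(*particles):   # <xs:sequence>: concatenation
    return "".join(particles)


def choice(*particles):     # <xs:choice>: alternation
    return "(?:" + "|".join(particles) + ")"


def elem(name):             # one child element, written as "name "
    return name + " "


# paperType: (title, author*, year, (journal|conference))
PAPER = sequence(
    elem("title"),
    occurs(elem("author"), 0, "unbounded"),
    elem("year"),
    choice(elem("journal"), elem("conference")),
)


def matches(model, child_tags):
    return re.fullmatch(model, "".join(elem(t) for t in child_tags)) is not None


print(PAPER)
print(matches(PAPER, ["title", "author", "author", "year", "conference"]))
```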
Local Names
<xs:element name="person">
<xs:complexType>
. . . . .
<xs:element name="name">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
. . . .
</xs:complexType>
</xs:element>

<xs:element name="product">
<xs:complexType>
. . . . .
<xs:element name="name" type="xs:string"/>
</xs:complexType>
</xs:element>
• name has different meanings in person and in product
“Mixed” Content, “Any” Type
<xs:complexType mixed="true">
. . . .
• Better than in DTDs: can still enforce the type, but now
may have text between any elements

<xs:element name="anything" type="xs:anyType"/>


....
• Means anything is permitted there
“All” Group
<xs:complexType name="PurchaseOrderType">
<xs:all> <xs:element name="shipTo" type="USAddress"/>
<xs:element name="billTo" type="USAddress"/>
<xs:element ref="comment" minOccurs="0"/>
<xs:element name="items" type="Items"/>
</xs:all>
<xs:attribute name="orderDate" type="xs:date"/>
</xs:complexType>

• A restricted form of & in SGML


• Restrictions:
– Only at top level
– Has only elements
– Each element occurs at most once
• E.g. “comment” occurs 0 or 1 times
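The restrictions on `<xs:all>` make it easy to check without a regex: every child must be one of the listed elements, none may repeat, order is free, and every element with the default minOccurs of 1 must be present. A minimal sketch in Python for `PurchaseOrderType`. The helper `matches_all_group` and the root tag `purchaseOrder` are hypothetical.

```python
import xml.etree.ElementTree as ET


def matches_all_group(elem, required, optional=()):
    """Check an <xs:all>-style model: listed children only, each at most
    once, in any order, with every required child present."""
    tags = [child.tag for child in elem]
    allowed = set(required) | set(optional)
    return (
        len(tags) == len(set(tags))        # no element occurs twice
        and set(tags) <= allowed           # no unexpected element
        and set(required) <= set(tags)     # all required elements present
    )


# PurchaseOrderType: comment has minOccurs="0"; the others default to 1.
REQUIRED = ("shipTo", "billTo", "items")
OPTIONAL = ("comment",)

order = ET.fromstring(
    "<purchaseOrder><items/><billTo/><comment/><shipTo/></purchaseOrder>"
)
print(matches_all_group(order, REQUIRED, OPTIONAL))   # any order is accepted
```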
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>

<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Derived Types by Restrictions

<complexContent>
<restriction base="ipo:Items">
… [rewrite the entire content, with restrictions]...
</restriction>
</complexContent>

• May restrict cardinalities, e.g. from (0, unbounded) to (1, 1); may restrict choices; other restrictions…
• Corresponds to set inclusion
The patterns
• Constraints on predefined simple type
• Use of regular expressions
• Example
<xsd:simpleType name="NumItem">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
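XSD patterns are implicitly anchored: the whole lexical value must match, not just a substring. For simple patterns like these, Python's `re.fullmatch` behaves the same way, so the facet can be sketched as follows. XSD regex syntax is not identical to Python's in general, so this shortcut is only safe for basic constructs such as `\d`, character classes and `{n}`. `matches_pattern` is a hypothetical helper.

```python
import re

# The two patterns used on these slides.
NUM_ITEM = r"\d{3}-[A-Z]{2}"   # NumItem: three digits, a hyphen, two capitals
NUM5 = r"\d{5}"                # num5, used on the "Types reuse" slide


def matches_pattern(value: str, pattern: str) -> bool:
    # An XSD pattern must match the whole value, hence fullmatch().
    return re.fullmatch(pattern, value) is not None


for value in ["123-AB", "123-ab", "1234-AB"]:
    print(value, matches_pattern(value, NUM_ITEM))
```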

Facets of Simple Types
• Facets = additional properties restricting a simple type
• 12 constraining facets are defined by XML Schema 1.0:
• length
• minLength
• maxLength
• pattern
• enumeration
• whiteSpace
• maxInclusive
• maxExclusive
• minInclusive
• minExclusive
• totalDigits
• fractionDigits
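Several facets can be combined on one type, and a value must satisfy all of them. Many built-in types are defined this way. For instance, `xs:byte` is derived from `xs:integer` (via `xs:long`, `xs:int` and `xs:short`) by restricting the range to -128..127. The checker below is a small sketch for a handful of facets on integer or string values. The `satisfies` helper and the `WEEKDAY` enumeration are hypothetical.

```python
def satisfies(value: str, facets: dict) -> bool:
    """Check a lexical value against a few XSD facets (integer ranges only)."""
    if "enumeration" in facets and value not in facets["enumeration"]:
        return False
    if "length" in facets and len(value) != facets["length"]:
        return False
    if "minLength" in facets and len(value) < facets["minLength"]:
        return False
    if "maxLength" in facets and len(value) > facets["maxLength"]:
        return False
    if "minInclusive" in facets or "maxInclusive" in facets:
        try:
            number = int(value)
        except ValueError:
            return False
        if number < facets.get("minInclusive", number):
            return False
        if number > facets.get("maxInclusive", number):
            return False
    return True


BYTE = {"minInclusive": -128, "maxInclusive": 127}
UNSIGNED_BYTE = {"minInclusive": 0, "maxInclusive": 255}
WEEKDAY = {"enumeration": {"Monday", "Wednesday", "Friday"}}   # hypothetical type

print(satisfies("126", BYTE), satisfies("128", BYTE), satisfies("Friday", WEEKDAY))
```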
Types reuse
• Simple type derived by restriction:
<xs:simpleType name="num5">
<xs:restriction base="xs:string">
<xs:pattern value="\d{5}"/>
</xs:restriction>
</xs:simpleType>

• Complex type (sequence):
<xs:element name="livre">
<xs:complexType>
<xs:sequence>
<xs:element name="Titre" type="xs:string"/>
<xs:element name="Auteur" type="xs:string"/>
<xs:element name="ISBN" type="num5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Attributes

• The attributes associated with an element are defined with the <attribute> tag:
– name: the attribute's name
– type: the attribute type, which can only be a simple type
– use: specifies whether the attribute is required, optional or prohibited.
Possible values:
• required
• optional
• prohibited
– fixed: a fixed value that cannot be changed
– default: the default value
Attributes
• Attributes can be defined in two different ways:
– Define the attribute separately and use it in the type definition.
– Define the attribute directly inside the type.
Grouping elements
<xs:group name="TitreAuteurISBN">
<xs:sequence>
<xs:element name="Titre" type="xs:string"/>
<xs:element name="Auteur" type="xs:string"/>
<xs:element name="ISBN" type="num5"/>
</xs:sequence>
</xs:group>
<xs:element name="livre">
<xs:complexType>
<xs:sequence>
<xs:group ref="TitreAuteurISBN"/>
<xs:element ref="traduction"/>
</xs:sequence>
</xs:complexType>
</xs:element>

XML Schema Reference
• Reference without namespace

• Reference with namespace

XPath
What is XPath?
• A language designed to be used by both XSL
Transformations (XSLT) and XPointer.
• Provides common syntax and semantics for
functionality shared between XSLT and XPointer.
• Primary purpose: Address ‘parts’ of an XML
document, and provide basic facilities for
manipulation of strings, numbers and booleans.
• W3C Recommendation. November 16, 1999
• Latest version: http://www.w3.org/TR/xpath
Introduction
• XPath uses a compact, string-based, rather
than XML element-based syntax.
• Operates on the abstract, logical structure of
an XML document (tree of nodes) rather
than its surface syntax.
• Uses a path notation (like URLs) to
navigate through this hierarchical tree
structure.
Introduction Cont.
• Defines a way to compute a string-value for
each type of node: element, attribute, text.
• Supports Namespaces.
• The name of a node is a pair consisting of a local part and a namespace URI.
• Expression (Expr) is the primary syntactic
construct.

Data Model
• Treats an XML document as a logical tree
• This tree contains 7 types of nodes:
o Root Node – the root of the document
o Element Nodes – one for each element in the document (an element may carry a unique ID)
o Attribute Nodes
o Namespace Nodes
o Processing Instruction Nodes
o Comment Nodes
o Text Nodes
• The tree structure is ordered and reads from top to
bottom and left to right
Example for XPath Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
Data Model for XPath
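[Figure: the example document as an XPath tree. The root node has a processing instruction, a comment and bib (the root element) as children; bib has two book children; the first book has publisher, author, ... children whose text nodes are "Addison-Wesley", "Serge Abiteboul", ...]
Much like the XQuery data model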
Location Paths

• LocationPath (the most important construct) describes a path from one point to another.
• LocationPath provides the mechanism for 'addressing' items in an XML document
• Two types of paths: relative and absolute
• Composed of a series of steps (one or more) and optional predicates
XPath: Simple Expressions
/bib/book/year

Result: <year> 1995 </year>
<year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)


XPath: Restricted Kleene Closure
//author

Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>
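Python's standard `xml.etree.ElementTree` implements a limited subset of XPath in `find()`/`findall()`. It has no axes such as `ancestor::`, no `text()` and no comparison operators, but it is enough to reproduce the two slides above. Paths are evaluated relative to the element they are called on, so `/bib/...` becomes a path starting below `<bib>`. A sketch:

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book> <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name>
             <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)   # paths below start from <bib>, not from the root node

# /bib/book/year
years = [y.text.strip() for y in bib.findall("book/year")]
# /bib/paper/year : no match, so an empty list
papers = bib.findall("paper/year")
# //author : ".//" searches all descendants of <bib>
authors = bib.findall(".//author")
# /bib//first-name
first_names = [f.text.strip() for f in bib.findall(".//first-name")]

print(years, papers, len(authors), first_names)
```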
Xpath: Functions
/bib/book/author/text()

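Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman

Rick Hull doesn't appear because his author element has no text of its own (apart from whitespace): his name is split into first-name and last-name child elements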

Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
Xpath: Wildcard
//author/*

Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>

* Matches any element


Xpath: Attribute Nodes
/bib/book/@price

Result: "55"

@price means that price has to be an attribute
Xpath: Qualifiers
/bib/book/author[first-name]

Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
Xpath: More Qualifiers
/bib/book/author[firstname][address[//zip][city]]/lastname

Result: <lastname> … </lastname>
<lastname> … </lastname>
Xpath: More Qualifiers
/bib/book[@price < "60"]

/bib/book[author/@age < "25"]

/bib/book[author/text()]
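The functions, wildcards, attribute steps and qualifiers of the previous slides can be approximated with the same ElementTree subset. `text()` and numeric comparisons are not supported there, so those two parts are done in plain Python. This is an emulation, not XPath evaluation. A sketch on the same bib document:

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book> <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name>
             <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# /bib/book/author/text() : no text() step, so read .text and drop
# whitespace-only values (Rick Hull's <author> has only whitespace text).
author_texts = [a.text.strip() for a in bib.findall("book/author")
                if a.text and a.text.strip()]

# //author/* : wildcard over the element children of any <author>
name_parts = [e.tag for e in bib.findall(".//author/*")]

# /bib/book/@price : keep books that have the attribute, then read it
prices = [b.get("price") for b in bib.findall("book[@price]")]

# /bib/book/author[first-name] : predicate on a child element
with_first_name = bib.findall("book/author[first-name]")

# /bib/book[@price < "60"] : the numeric comparison is done in Python
cheap = [b for b in bib.findall("book")
         if b.get("price") is not None and float(b.get("price")) < 60]

print(author_texts, name_parts, prices, len(with_first_name), len(cheap))
```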
Xpath: Summary
bib matches a bib element
* matches any element
/ matches the root node
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname matches…
Xpath: More Details
• An Xpath expression, p, establishes a relation
between:
– A context node, and
– A node in the answer set
• In other words, p denotes a function:
– S[p] : Nodes -> {Nodes}
• Examples:
– author/firstname
– . = self
– .. = parent
– part/*/*/subpart/../name = part/*/*[subpart]/name
The Root and the Root
• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the “document element”
• The “root” is above bib

• /bib = returns the document element
• / = returns the root
• Why? Because we may have comments before and after <bib>; they become siblings of <bib>
• This is advanced xmlogy


Xpath: More Details
• We can navigate along 13 axes:
ancestor
ancestor-or-self
attribute
child
descendant
descendant-or-self
following
following-sibling
namespace
parent
preceding
preceding-sibling
self
• So far we've only seen attribute, child, descendant and descendant-or-self.
Xpath: More Details
• Examples:
– child::author/child::lastname = author/lastname
– child::author/descendant::zip = author//zip
– child::author/parent::* = author/..
– child::author/attribute::age = author/@age
• What does this mean ?
– paper/publisher/parent::*/author
– /bib//address[ancestor::book]
– /bib//author/ancestor::*//zip
Xpath: Even More Details
• name() = the name of the current node
– /bib//*[name()="book"] same as /bib//book
• What does this mean?
/bib//*[ancestor::*[name()!="book"]]

– In a different notation bib.[^book]*._

• Navigation axis gives us strictly more power !


XSLT

XSLT Overview
• What is XSLT?
– XSL is the Extensible Stylesheet Language.
– It has two parts: the transformation language and the
formatting language.
– XSLT provides a syntax for defining rules that
transform an XML document to another document.
• For example, to an HTML document.
– An XSLT “style sheet” consists primarily of a set of
template rules that are used to transform nodes
matching some patterns.
XSLT Overview
• The xml-stylesheet processing instruction in the XML instance references an XSL style sheet.
• In general, children of the stylesheet element in a stylesheet are
templates.
• A template specifies a pattern; the template is applied to nodes in the
XML source document that match this pattern.
– Note: the pattern "/" matches the root node of the document; we will see this later
• In the transformed document, the body of the template element
replaces the matched node in the source document.
• In addition to text, the body may contain further XSL terms, e.g.:
– xsl:value-of extracts data from selected sub-nodes.
XSLT Overview

• We have an XML document and the style sheet (or rules) to transform it. So, how do you transform the document?
• You can transform documents in three ways:
– In the server. A server program, such as a Java servlet, can use a style
sheet to transform a document automatically and serve it to the client.
For example, XML Enabler, which is a servlet that you’ll find at XML for
Java Web site, www.alphaworks.ibm.com/tech/xml4j
– In the client. An XSL-enabled browser may convert XML downloaded
from the server to HTML before display. Currently Internet Explorer
supports a subset of XSLT.
– In a standalone program. XML stored in or generated from a database,
say, maybe “manually” converted to HTML before placing it in the
server’s document directory.
• In any case, a suitable program takes an XML document as input and an XSLT
“style sheet”.
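For the server and standalone cases, the JDK's javax.xml.transform API is one readily available XSLT 1.0 processor. The sketch below is an assumption, not the XML Enabler servlet mentioned above: it compiles a style sheet once into a Templates object (which the JAXP documentation describes as safe to share between threads) and creates a fresh Transformer for each document, as a server handling many requests might do. The class name and the greeting style sheet are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CompiledStylesheet {
    // Hypothetical style sheet: text output built from the 'to' attribute of <msg>.
    static final String GREETING_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/msg/@to'/></xsl:template>"
        + "</xsl:stylesheet>";

    private final Templates templates; // compiled, reusable form of the style sheet

    CompiledStylesheet(String xsl) {
        this.templates = compile(xsl);
    }

    private static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalArgumentException("invalid style sheet", e);
        }
    }

    // A new Transformer per call, because Transformer objects are not thread-safe.
    String apply(String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        CompiledStylesheet sheet = new CompiledStylesheet(GREETING_XSL);
        System.out.println(sheet.apply("<msg to='Alice'/>"));
        System.out.println(sheet.apply("<msg to='Bob'/>"));
    }
}
```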
Format of Style Sheet
• XSLT style sheet is itself an XML document.
• We will be using the XSLT elements from the namespace
http://www.w3.org/1999/XSL/Transform
– By convention, we use the prefix xsl: for this namespace.
• The document root in an XSLT style sheet is an xsl:stylesheet element,
e.g.:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
...
</xsl:stylesheet>
– A synonym for xsl:stylesheet is xsl:transform.
• Several kinds of elements can be nested inside xsl:stylesheet, but by
far the most important is the xsl:template element.
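The synonym can be checked directly: the sketch below builds the same one-template style sheet twice, once with xsl:stylesheet and once with xsl:transform as the root, and runs both with the JDK's XSLT 1.0 processor. The input document and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RootSynonymDemo {
    // Hypothetical input with three <item> elements.
    static final String XML = "<list><item/><item/><item/></list>";

    // Same content, only the name of the root element ("stylesheet" or "transform") varies.
    static String sheet(String root) {
        return "<xsl:" + root + " version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
            + "</xsl:" + root + ">";
    }

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(sheet("stylesheet"), XML));
        System.out.println(transform(sheet("transform"), XML));
    }
}
```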
XSLT
Templates

• The XSLT processor transforms the XML document according to the
transformation models (templates) described in the XSL sheet, producing a new
document of the desired output type.
• Each transformation model defines the processing to be performed on an
element or a set of elements of the source XML document.
• A template is represented by the <template> tag in the XSL sheet.
• An XSL sheet can contain several templates.


• When you match or select nodes, a template tells the XSLT processor how to
transform the node for output.
• So all our templates will have the form:
<xsl:template match="pattern">
template body
</xsl:template>
• The pattern is an XPath expression describing the nodes to which the template
can be applied.
• The processor scans the input document for nodes matching this pattern and
replaces them with the output produced by the template body.
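A small experiment shows the replacement at work, and also something the slides do not mention: nodes without a matching template fall back on XSLT 1.0's built-in rules, which process children and copy text. In the sketch below (hypothetical document and class name, JDK XSLT 1.0 processor), only <title> has a template, so it is replaced by an <h2>, while the text of <year> is copied through by the built-in rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplateMatchDemo {
    // Hypothetical input document.
    static final String XML = "<book><title>XML</title><year>1998</year></book>";

    // Only <title> has an explicit template; the root, <book> and <year> are handled
    // by the built-in rules (apply templates to children, copy text nodes).
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='title'><h2><xsl:value-of select='.'/></h2></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```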
• The content of the <template> tag represents the transformation rules to apply
to the elements selected by the match expression.
• The template tag defines a transformation model. Its attributes:
– match: selects the elements of the XML document to which the
transformation will be applied;
– name: the name of the template; it lets you call the template directly with
<xsl:call-template>, without going through the evaluation of the document's nodes;
– priority: the template's priority, used when the XSLT processor finds several
templates matching the same node.
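The name and priority attributes can be seen together in a short sketch (hypothetical document and class name, JDK XSLT 1.0 processor). Two templates match item; the one with priority='1' outranks the other (default priority 0) and delegates to a named template through xsl:call-template. A named template has no match pattern and runs with the caller's current node, so its value-of still reads the item's text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamedTemplateDemo {
    // Hypothetical input document.
    static final String XML = "<list><item>a</item><item>b</item></list>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:apply-templates select='list/item'/></xsl:template>"
        // Default priority 0: loses against the template below.
        + "<xsl:template match='item'>(never used)</xsl:template>"
        // Explicit priority 1: chosen for every item, calls the named template.
        + "<xsl:template match='item' priority='1'><xsl:call-template name='bracket'/></xsl:template>"
        // Named template: invoked by name, keeps the caller's current node.
        + "<xsl:template name='bracket'>[<xsl:value-of select='.'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```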
• The <apply-templates> tag asks the XSLT processor to apply the defined
templates to the nodes selected by the XPath expression in its select attribute.
– select: identifies the nodes to process; for each of them, the template to
apply is chosen on the basis of the templates' match attributes.
– If select is omitted, templates are applied to all children of the current node.
• When several templates match the same node, the processor resolves the conflict:
– using the priority attribute of the templates, when it is given;
– otherwise, preferring the template with the most specific match expression
(XSLT assigns default priorities, e.g. item[@hot] outranks item).
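The sketch below (hypothetical document and class name, JDK XSLT 1.0 processor) combines both cases. The template for "/" uses apply-templates without select, so it processes all children of the root; the template for list selects only its item children, so <note> is never output. The hot item matches both item (default priority 0) and item[@hot] (default priority 0.5), and the more specific pattern wins.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ApplyTemplatesDemo {
    // Hypothetical input: two items (one marked hot) and a note.
    static final String XML = "<list><item>a</item><item hot='y'>b</item><note>x</note></list>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // No select: process every child of the root node.
        + "<xsl:template match='/'><xsl:apply-templates/></xsl:template>"
        // select restricts processing to the item children; <note> is skipped.
        + "<xsl:template match='list'><xsl:apply-templates select='item'/></xsl:template>"
        + "<xsl:template match='item'>-<xsl:value-of select='.'/></xsl:template>"
        + "<xsl:template match='item[@hot]'>+<xsl:value-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```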
XSLT
Logic: Loops

• <for-each> loops over the nodes selected by the XPath expression in its select
attribute, instantiating its body once for each node (which becomes the current node).
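A minimal for-each sketch, assuming a made-up list document and run with the JDK's XSLT 1.0 processor: the body (the item's value followed by a semicolon) is produced once per selected item, in document order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ForEachDemo {
    // Hypothetical input document.
    static final String XML = "<list><item>a</item><item>b</item><item>c</item></list>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // Body instantiated once per item; '.' is the current item.
        + "<xsl:for-each select='list/item'><xsl:value-of select='.'/>;</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```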
XSLT
Logic: conditional processing

• <if> performs conditional processing: its body is instantiated only if the
expression in its test attribute evaluates to true.
– The test expression accepts the same syntax as XPath predicates.
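As a sketch (hypothetical stock document and class name, JDK XSLT 1.0 processor), the test below compares an attribute with a number, exactly as an XPath predicate would; note that < must be written &lt; inside the XML attribute. Items with fewer than 5 units get a "(low)" marker.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IfDemo {
    // Hypothetical input document.
    static final String XML = "<stock><item name='pen' qty='2'/><item name='ink' qty='10'/></stock>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='stock/item'>"
        + "<xsl:value-of select='@name'/>"
        // Numeric comparison; '<' is escaped as &lt; in the attribute value.
        + "<xsl:if test='@qty &lt; 5'>(low)</xsl:if>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```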
• <choose> makes a choice among several alternatives:
– <when>: processing to perform if its test expression is true (only the first
<when> whose test is true is used);
– <otherwise>: processing to perform if no <when> condition is met.
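A minimal choose sketch, under the same assumptions as before (hypothetical stock document and class name, JDK XSLT 1.0 processor). The conditions are tried in order and only the first true <when> is used, so an item with 0 units is reported as "out" even though 0 &lt; 5 would also hold.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ChooseDemo {
    // Hypothetical input document.
    static final String XML = "<stock><item name='pen' qty='0'/><item name='ink' qty='3'/>"
        + "<item name='pad' qty='12'/></stock>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='stock/item'>"
        + "<xsl:value-of select='@name'/>="
        + "<xsl:choose>"
        + "<xsl:when test='@qty = 0'>out</xsl:when>"       // tried first
        + "<xsl:when test='@qty &lt; 5'>low</xsl:when>"    // tried only if the first is false
        + "<xsl:otherwise>ok</xsl:otherwise>"              // no <when> matched
        + "</xsl:choose>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```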
XSLT
Logic: Ordering

• <sort> is used to order a set of nodes.
• <sort> comes as a child of an <apply-templates> or a <for-each> (in a
<for-each> it must come first) to order the nodes it processes:
– select: expression used as the sorting criterion;
– data-type: text or number, specifies the sort type;
– order: ascending or descending;
– case-order: upper-first or lower-first.
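The data-type attribute changes the result when the key looks numeric. In the sketch below (hypothetical prices and class name, JDK XSLT 1.0 processor), the same descending sort on @price gives 10, 9, 2 as numbers but "9", "2", "10" as text, because "10" sorts before "2" character by character.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SortDemo {
    // Hypothetical input: a (9), b (10), c (2).
    static final String XML = "<list><item name='a' price='9'/><item name='b' price='10'/>"
        + "<item name='c' price='2'/></list>";

    // Lists item names by descending price, sorting with the given data-type.
    static String sheet(String dataType) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:for-each select='list/item'>"
            // xsl:sort must be the first child of xsl:for-each.
            + "<xsl:sort select='@price' data-type='" + dataType + "' order='descending'/>"
            + "<xsl:value-of select='@name'/>;"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(sheet("number"), XML));
        System.out.println(transform(sheet("text"), XML));
    }
}
```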
XSLT
XML content generation

• <copy> provides a simple way to copy the current node to the output:
– use-attribute-sets: names of attribute sets (declared with <attribute-set>)
whose attributes are added to the copy;
– <copy> copies neither the children nor the attributes of an element; the content
of <copy> becomes the content of the copy.
• <element> creates an XML element in the output:
– name: name of the element.
• <attribute> is used together with <element> to add an attribute to it:
– name: attribute name.
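A short sketch of the three instructions (hypothetical document and class name, JDK XSLT 1.0 processor). The <xsl:copy> output shows that the id attribute of the source item is not carried over, and <xsl:element> with a nested <xsl:attribute> builds a new element whose attribute value is computed from the source. The attribute must be added before any child content of the element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CopyElementDemo {
    // Hypothetical input document.
    static final String XML = "<list><item id='i1'>a</item></list>";

    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><xsl:apply-templates select='list/item'/></xsl:template>"
        + "<xsl:template match='item'>"
        // Shallow copy: same element name, no attributes, content given explicitly.
        + "<xsl:copy><xsl:value-of select='.'/></xsl:copy>"
        // New element whose 'for' attribute takes the value of the source @id.
        + "<xsl:element name='label'>"
        + "<xsl:attribute name='for'><xsl:value-of select='@id'/></xsl:attribute>x"
        + "</xsl:element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```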
XSLT
Output

• The <output> tag is a top-level child of the root of the XSLT document
(conventionally placed first); its attributes indicate:
– method: output format, xml, html or text;
– doctype-public: public identifier of the standard (DTD) the output conforms to;
– doctype-system: link to the DTD of this standard;
– indent="yes": the generated file will be indented automatically; disabling
indentation decreases the size of generated files.
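The method attribute alone can change the result. In the sketch below (hypothetical class name, JDK XSLT 1.0 processor), the same template produces <b>hi</b> with method='xml' but only hi with method='text', since the text method writes only the text nodes of the result. omit-xml-declaration is an extra <output> attribute, used here only to keep the xml result short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputMethodDemo {
    // Runs the same one-template style sheet with the given output method.
    static String render(String method) {
        String xsl =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='" + method + "' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'><b>hi</b></xsl:template>"
            + "</xsl:stylesheet>";
        return transform(xsl, "<doc/>");
    }

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("xml"));
        System.out.println(render("text"));
    }
}
```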
Example
Consider the following XML document and XSLT style sheet; the style sheet can be seen as the definition of a query on
the document:
<BIBLIO>
<LIVRE ISBN="2-212-09052-8" LANG="FR">
<AUTEUR>
<NOM>Michard</NOM>
<PRENOM>Alain</PRENOM>
</AUTEUR>
<TITRE>XML, Langage et Applications</TITRE>
<EDITEUR>Eyrolles</EDITEUR>
<DATE_ACHAT>1998</DATE_ACHAT>
<PRIX monnaie="FF">340</PRIX>
</LIVRE>
<LIVRE ISBN="2-7440-0628-9" LANG="FR">
<AUTEUR>
<NOM>Ladd</NOM>
<PRENOM>Eric</PRENOM>
</AUTEUR>
<AUTEUR>
<NOM>O’Donnel</NOM>
<PRENOM>Jim</PRENOM>
</AUTEUR>
<TITRE>HTML4, XML et Java 2</TITRE>
<EDITEUR>Campus Press</EDITEUR>
<PRIX monnaie="FF">349</PRIX>
</LIVRE>
</BIBLIO>
The XSLT style sheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<LIVRES>
<xsl:for-each select="BIBLIO/LIVRE">
<LIVRE>
<ISBN> <xsl:apply-templates select="@ISBN"/></ISBN>
<TITRE> <xsl:apply-templates select="TITRE"/></TITRE>
<xsl:for-each select="AUTEUR">
<AUTEUR><xsl:apply-templates select="NOM"/><xsl:text>, </xsl:text><xsl:apply-templates select="PRENOM"/></AUTEUR>
</xsl:for-each>
<EDITEUR> <xsl:apply-templates select="EDITEUR"/></EDITEUR>
</LIVRE>
</xsl:for-each>
</LIVRES>
</xsl:template>
</xsl:stylesheet>
Applying this style sheet to the document produces the following XML document:

<LIVRES>
<LIVRE>
<ISBN>2-212-09052-8</ISBN>
<TITRE>XML, Langage et Applications</TITRE>
<AUTEUR>Michard, Alain</AUTEUR>
<EDITEUR>Eyrolles</EDITEUR>
</LIVRE>
<LIVRE>
<ISBN>2-7440-0628-9</ISBN>
<TITRE>HTML4, XML et Java 2</TITRE>
<AUTEUR>Ladd, Eric</AUTEUR>
<AUTEUR>O’Donnel, Jim</AUTEUR>
<EDITEUR>Campus Press</EDITEUR>
</LIVRE>
</LIVRES>
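To check this result, a minimal sketch runs the example through the JDK's built-in XSLT 1.0 processor (javax.xml.transform). Two assumptions: the separator between NOM and PRENOM is written as <xsl:text>, </xsl:text>, so that exactly ", " is emitted (a literal ",<newline>" text node would keep its line break), and compact(), a hypothetical helper, strips line breaks, indentation and the XML declaration before comparing, because the exact whitespace produced by indent="yes" depends on the processor. The result lists TITRE before the authors, following the order of the instructions in the style sheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BiblioExample {
    // The BIBLIO document from the slides (U+2019 written as an escape).
    static final String XML = "<BIBLIO>"
        + "<LIVRE ISBN='2-212-09052-8' LANG='FR'>"
        + "<AUTEUR><NOM>Michard</NOM><PRENOM>Alain</PRENOM></AUTEUR>"
        + "<TITRE>XML, Langage et Applications</TITRE><EDITEUR>Eyrolles</EDITEUR>"
        + "<DATE_ACHAT>1998</DATE_ACHAT><PRIX monnaie='FF'>340</PRIX></LIVRE>"
        + "<LIVRE ISBN='2-7440-0628-9' LANG='FR'>"
        + "<AUTEUR><NOM>Ladd</NOM><PRENOM>Eric</PRENOM></AUTEUR>"
        + "<AUTEUR><NOM>O\u2019Donnel</NOM><PRENOM>Jim</PRENOM></AUTEUR>"
        + "<TITRE>HTML4, XML et Java 2</TITRE><EDITEUR>Campus Press</EDITEUR>"
        + "<PRIX monnaie='FF'>349</PRIX></LIVRE>"
        + "</BIBLIO>";

    // The example style sheet, with the author separator written as xsl:text.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' indent='yes'/>"
        + "<xsl:template match='/'><LIVRES>"
        + "<xsl:for-each select='BIBLIO/LIVRE'><LIVRE>"
        + "<ISBN><xsl:apply-templates select='@ISBN'/></ISBN>"
        + "<TITRE><xsl:apply-templates select='TITRE'/></TITRE>"
        + "<xsl:for-each select='AUTEUR'><AUTEUR>"
        + "<xsl:apply-templates select='NOM'/><xsl:text>, </xsl:text>"
        + "<xsl:apply-templates select='PRENOM'/>"
        + "</AUTEUR></xsl:for-each>"
        + "<EDITEUR><xsl:apply-templates select='EDITEUR'/></EDITEUR>"
        + "</LIVRE></xsl:for-each>"
        + "</LIVRES></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    // Removes line breaks with their surrounding indentation, then drops the XML declaration.
    static String compact(String s) {
        String noBreaks = s.replaceAll("\\s*\\n\\s*", "");
        return noBreaks.substring(noBreaks.indexOf("<LIVRES>"));
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```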
