DSS01
Documents
Management
Level: 3rd Licence
Academic Year: 2022-2023
• XML tags are not predefined. You must define your own tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation
• XML is Not a Replacement for HTML
• What you can do with XML:
– Define data structures
– Make these structures platform independent
– Process XML-defined data automatically
– Define your own tags
• What you cannot do with XML:
– Define how your data is shown. To show data, you need other
techniques.
[Figure: multi-channel publication. Labels: Data Set, Middleware, XMLizer, XML, Publication application (XSL), Digital TV]
Why XML?
Difference between XML and HTML
• Plain Text
– Easy to edit
– Useful for storing small (compared to large databases) amounts
of data
– Large amounts of XML data can still be stored efficiently
through an XML front end to a database
• Data Identification
– Tells you what kind of data you have
– Can be used in different ways by different applications
How XML Can Be Used?
Predefined entities:
&lt; <
&gt; >
&amp; &
&apos; '
&quot; "
<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
XML Namespaces
• syntactic: <number> , <isbn:number>
• semantic: provide URL for schema
<tag xmlns:mystyle="http://…">
… defined here
<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
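The syntactic side of namespaces can be observed with any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the namespace URI is the one from the book example, while the element contents are placeholder values chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the slide; the text values are placeholders.
DOC = """<book xmlns:isbn="www.isbn-org.org/def">
  <title>Some title</title>
  <number>15</number>
  <isbn:number>0000</isbn:number>
</book>"""

root = ET.fromstring(DOC)

# The parser expands each prefix to its URI, so the two "number"
# elements end up with different names.
tags = [child.tag for child in root]

# Queries map a prefix of our choice to the namespace URI.
ns = {"isbn": "www.isbn-org.org/def"}
plain = root.find("number").text
qualified = root.find("isbn:number", ns).text

print(tags)
print(plain, qualified)
```

The prefix itself carries no meaning: only the URI it is bound to distinguishes <number> from <isbn:number>.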
Attribute Or Element
1. Use an element:
• When the order is important (attribute order is not significant)
• When you want to reuse an element several times (with the same
parent)
• When you want (in the future) to have descendants / an internal
structure
• To represent a data type (object),
2. Use an attribute:
• When you want to refer to another element
• To indicate the use/type/etc. of an element, e.g. <address usage="prof"> ... </address>
• When you want to impose default values in the DTD
Avoid XML Attributes?
Some of the problems with using attributes are:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
<person age="25">
<name> ....</name>
...
</person>
Attributes in DTDs
Types:
• CDATA = string
• ID = key
• IDREF = foreign key
• IDREFS = foreign keys separated by space
• (Monday | Wednesday | Friday) = enumeration
• NMTOKEN = name token (made only of XML name characters)
• NMTOKENS = multiple name tokens separated by spaces
• ENTITY = name of an unparsed entity declared in the DTD
Kind:
• #REQUIRED
• #IMPLIED = optional
• value = default value
• #FIXED value = the only value allowed
<!ELEMENT person (ssn, name, office, phone?)>
<!ATTLIST person age CDATA #REQUIRED
                 id ID #REQUIRED
                 manager IDREF #REQUIRED
                 manages IDREFS #REQUIRED
>
<person age="25"
        id="p29432"
        manager="p48293" manages="p34982 p423234">
  <name> ....</name>
  ...
</person>
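ID behaves like a key and IDREF/IDREFS like foreign keys. The following is a minimal Python sketch of how such references can be resolved by hand. It assumes a hypothetical <staff> wrapper element and made-up id values, and it does not validate against the DTD (the standard library parser is non-validating).

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the <staff> wrapper and the id values are illustrative only.
DOC = """<staff>
  <person id="p1" age="40" manages="p2 p3"/>
  <person id="p2" age="25" manager="p1"/>
  <person id="p3" age="31" manager="p1"/>
</staff>"""

root = ET.fromstring(DOC)

# ID = key: index every person by its id attribute.
by_id = {p.get("id"): p for p in root.iter("person")}

def resolve(refs):
    """Resolve an IDREF or IDREFS value (space-separated ids) to elements."""
    return [by_id[ref] for ref in refs.split()]

managed_ages = [p.get("age") for p in resolve(by_id["p1"].get("manages"))]
boss_of_p2 = resolve(by_id["p2"].get("manager"))[0].get("id")

print(managed_ages, boss_of_p2)
```

A validating parser would additionally reject duplicate ID values and IDREF values that point to no element.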
Using DTDs
• The DTD must be declared in the XML document
• Either include the entire DTD:
– <!DOCTYPE rootElement [ ....... ]>
• Or include a reference to it:
– <!DOCTYPE rootElement SYSTEM "http://www.mydtd.org">
• Or mix the two (e.g. to override the external definition)
Use an internal DTD
• The DTD is declared directly in the XML document, inside the
DOCTYPE declaration
Use an external DTD
XML Schema
What is XML Schema?
<book isbn="0836217462">
<title> Being a Dog Is a Full-Time Job</title>
<author>Charles M. Schulz</author>
<qualification> extroverted beagle </qualification>
</book>
The Example’s Schema
<paper>
<title> The Essence of XML </title>
<author> Simeon</author>
<author> Wadler</author>
<year>2003</year>
<conference> POPL</conference>
</paper>
Elements vs. Types
• simpleType:
o many built-in basic types: integer, decimal, string, time, date, ID, IDREF, …
o can be refined by constraints (facets)
• complexType:
o a composition of types that defines an aggregation of typed elements
Simple types
• string: Confirm this is electric
• normalizedString: Confirm this is electric
• token: Confirm this is electric
• byte: -1, 126
• unsignedByte: 0, 126
• base64Binary: GpM7
• hexBinary: 0FB7
• integer: -126789, -1, 0, 1, 126789
• positiveInteger: 1, 126789
• negativeInteger: -126789, -1
• nonNegativeInteger: 0, 1, 126789
• nonPositiveInteger: -126789, -1, 0
• int: -1, 126789675
• unsignedInt: 0, 1267896754
• long: -1, 12678967543233
• unsignedLong: 0, 12678967543233
• short: -1, 12678
• unsignedShort: 0, 12678
• decimal: -1.23, 0, 123.4, 1000.00
• float: -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
• double: -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
• boolean: true, false, 1, 0
• time: 13:20:00.000, 13:20:00.000-05:00
• dateTime: 1999-05-31T13:20:00.000-05:00
• duration: P1Y2M3DT10H30M12.3S
• date: 1999-05-31
• gMonth: --05--
• gYear: 1999
• gYearMonth: 1999-02
• gDay: ---31
• gMonthDay: --05-31
• Name: shipTo
• QName: po:USAddress
• NCName: USAddress
• anyURI: http://www.example.com/, http://www.example.com/doc.html#ID5
• language: en-GB, en-US, fr
• ID: "A212"
• IDREF: "A212"
• IDREFS: "A212" "B213"
• ENTITY
• ENTITIES
• NOTATION
• NMTOKEN: US
• NMTOKENS: Brésil Canada Mexique
Local vs. Global Types
• Local type:
<xs:element name="person">
[define locally the person's type]
</xs:element>
• Global type:
<xs:element name="person" type="ttt"/>
<xs:complexType name="ttt">
[define here the type ttt]
</xs:complexType>
Global types: can be reused in other elements
Local vs. Global Elements
• Local element:
<xs:complexType name="ttt">
  <xs:sequence>
    <xs:element name="address" type="..."/>...
  </xs:sequence>
</xs:complexType>
• Global element:
<xs:element name="address" type="..."/>
<xs:complexType name="ttt">
  <xs:sequence>
    <xs:element ref="address"/> ...
  </xs:sequence>
</xs:complexType>
Regular Expressions
Recall the element-type-element alternation:
<xs:complexType name="....">
[regular expression on elements]
</xs:complexType>
Regular expressions:
• <xs:sequence> A B C </...> = A B C
• <xs:choice> A B C </...> = A | B | C
• <xs:group> A B C </...> = (A B C)
• <xs:... minOccurs="0" maxOccurs="unbounded"> ..</...> = (...)*
• <xs:... minOccurs="0" maxOccurs="1"> ..</...> = (...)?
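The analogy with regular expressions can be made concrete. The following is a minimal Python sketch that checks the order of an element's children against an ordinary regular expression. The content model (title, author*, year, conference?) is an assumption made for illustration, and this is only an analogy: it is not how a schema processor validates a document.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model: title, author*, year, conference?
# Each child name is followed by a space so that names cannot run together.
MODEL = re.compile(r"(title )(author )*(year )(conference )?")

def matches_model(element):
    """Return True if the sequence of child element names fits MODEL."""
    names = "".join(child.tag + " " for child in element)
    return MODEL.fullmatch(names) is not None

paper = ET.fromstring(
    "<paper><title>The Essence of XML</title>"
    "<author>Simeon</author><author>Wadler</author>"
    "<year>2003</year><conference>POPL</conference></paper>"
)
wrong_order = ET.fromstring("<paper><year>2003</year><title>T</title></paper>")

ok = matches_model(paper)
bad = matches_model(wrong_order)
print(ok, bad)
```

Here the sequence corresponds to concatenation, maxOccurs="unbounded" to the star, and minOccurs="0" with maxOccurs="1" to the question mark.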
Local Names
name has different meanings in person and in product:
<xs:element name="person">
  <xs:complexType>
    . . . . .
    <xs:element name="name">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstname" type="xs:string"/>
          <xs:element name="lastname" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    . . . .
  </xs:complexType>
</xs:element>
<xs:element name="product">
  <xs:complexType>
    . . . . .
    <xs:element name="name" type="xs:string"/>
  </xs:complexType>
</xs:element>
“Mixed” Content, “Any” Type
<xs:complexType mixed="true">
. . . .
• Better than in DTDs: can still enforce the type, but now
may have text between any elements
<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Derived Types by Restrictions
<complexContent>
<restriction base="ipo:Items">
… [rewrite the entire content, with restrictions]...
</restriction>
</complexContent>
Facets of Simple Types
• Facets = additional properties restricting a simple type
• 15 facets defined by XML Schema
Examples:
• length
• minLength
• maxLength
• pattern
• enumeration
• whiteSpace
• maxInclusive
• maxExclusive
• minInclusive
• minExclusive
• totalDigits
• fractionDigits
Type reuse
• Deriving a simple type by restriction:
<xs:simpleType name="num5">
<xs:restriction base="xs:string">
<xs:pattern value="\d{5}"/>
</xs:restriction>
</xs:simpleType>
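The effect of the pattern facet can be approximated outside a schema processor. The following is a minimal Python sketch, assuming that the standard re module is an acceptable stand-in for the schema regular expression dialect for this simple pattern. The function name is_num5 is hypothetical.

```python
import re

# An XSD pattern must match the whole value, so fullmatch is the
# closest analogue in Python.
NUM5 = re.compile(r"\d{5}")

def is_num5(value: str) -> bool:
    """Return True if value consists of exactly five digits."""
    return NUM5.fullmatch(value) is not None

candidates = ["12345", "1234", "123456", "12a45"]
accepted = [v for v in candidates if is_num5(v)]
print(accepted)
```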
Grouping Elements
<xs:group name="TitreAuteurISBN">
<xs:sequence>
<xs:element name="Titre" type="xs:string"/>
<xs:element name="Auteur" type="xs:string"/>
<xs:element name="ISBN" type="num5"/>
</xs:sequence>
</xs:group>
<xs:element name="livre">
<xs:complexType>
<xs:sequence>
<xs:group ref="TitreAuteurISBN"/>
<xs:element ref="traduction"/>
</xs:sequence>
</xs:complexType>
</xs:element>
XML Schema Reference
• Reference without namespace
XPath
What is XPath?
• A language designed to be used by both XSL
Transformations (XSLT) and XPointer.
• Provides common syntax and semantics for
functionality shared between XSLT and XPointer.
• Primary purpose: Address ‘parts’ of an XML
document, and provide basic facilities for
manipulation of strings, numbers and booleans.
• W3C Recommendation. November 16, 1999
• Latest version: http://www.w3.org/TR/xpath
Introduction
• XPath uses a compact, string-based, rather
than XML element-based syntax.
• Operates on the abstract, logical structure of
an XML document (tree of nodes) rather
than its surface syntax.
• Uses a path notation (like URLs) to
navigate through this hierarchical tree
structure.
Introduction Cont.
• Defines a way to compute a string-value for
each type of node: element, attribute, text.
• Supports Namespaces.
• Name of a node (a pair consisting of a local
part and namespace URI).
• Expression (Expr) is the primary syntactic
construct.
Data Model
• Treats an XML document as a logical tree
• This tree has 7 types of nodes:
o Root Node – the root of the document
o Element Nodes – one for each element in the document; an element
may have a unique ID
o Attribute Nodes
o Namespace Nodes
o Processing Instruction Nodes
o Comment Nodes
o Text Nodes
• The tree structure is ordered and reads from top to
bottom and left to right
Example for XPath Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
Data Model for XPath
[Figure: document tree. Labels: the root, two book nodes, then publisher, author, . . . .]
LocationPath
XPath: Simple Expressions
/bib/book/year
/bib/paper/year
/bib//first-name
Result: <first-name> Rick </first-name>
XPath: Functions
/bib/book/author/text()
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
XPath: Wildcard
//author/*
Result: “55”
/bib/book[author/text()]
XPath: Summary
bib matches a bib element
* matches any element
/ matches the root node
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname matches…
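Some of these patterns can be tried directly. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to the bib element, and comparisons such as @price<"55" are not supported. The document is the bibliography example, with whitespace trimmed inside the elements.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # the bib element is the context node

# book/year : year children of book children
years = [y.text for y in bib.findall("book/year")]

# .//first-name : first-name elements at any depth
first_names = [n.text for n in bib.findall(".//first-name")]

# book[@price] : books that carry a price attribute
priced = [b.findtext("title") for b in bib.findall("book[@price]")]

# .//author/* : any element child of an author
author_children = [e.tag for e in bib.findall(".//author/*")]

print(years, first_names, priced, author_children)
```

A full XPath 1.0 engine is needed for absolute paths from the root node, for text() and for comparison predicates.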
XPath: More Details
• An XPath expression, p, establishes a relation
between:
– A context node, and
– A node in the answer set
• In other words, p denotes a function:
– S[p] : Nodes -> {Nodes}
• Examples:
– author/firstname
– . = self
– .. = parent
– part/*/*/subpart/../name = part/*/*[subpart]/name
The Root and the Root
• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the “document element”
• The “root” is above bib
• Why ? Because we may have comments before and after <bib>; they
become siblings of <bib>
XSLT
XSLT Overview
• What is XSLT?
– XSL is the Extensible Stylesheet Language.
– It has two parts: the transformation language and the
formatting language.
– XSLT provides a syntax for defining rules that
transform an XML document to another document.
• For example, to an HTML document.
– An XSLT “style sheet” consists primarily of a set of
template rules that are used to transform nodes
matching some patterns.
• The xml-stylesheet processing instruction in the XML instance references an XSL
style sheet.
• In general, children of the stylesheet element in a stylesheet are
templates.
• A template specifies a pattern; the template is applied to nodes in the
XML source document that match this pattern.
– Note: the pattern “/” matches the root node of the document, we will see
this later
• In the transformed document, the body of the template element
replaces the matched node in the source document.
• In addition to text, the body may contain further XSL terms, e.g.:
– xsl:value-of extracts data from selected sub-nodes.
• We have an XML document and the style sheet (or rules) to transform it. So,
how do you transform the document?
• You can transform documents in three ways:
– In the server. A server program, such as a Java servlet, can use a style
sheet to transform a document automatically and serve it to the client.
For example, XML Enabler is a servlet that you'll find at the XML for
Java Web site, www.alphaworks.ibm.com/tech/xml4j
– In the client. An XSL-enabled browser may convert XML downloaded
from the server to HTML before display. Currently Internet Explorer
supports a subset of XSLT.
– In a standalone program. XML stored in or generated from a database,
say, may be "manually" converted to HTML before placing it in the
server's document directory.
• In any case, a suitable program takes an XML document and an XSLT
"style sheet" as input.
Format of Style Sheet
• An XSLT style sheet is itself an XML document.
• We will be using the XSLT elements from the namespace
http://www.w3.org/1999/XSL/Transform
– As a matter of convention we use the prefix xsl: for this namespace.
• The document root in an XSLT style sheet is an xsl:stylesheet element,
e.g.:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
...
</xsl:stylesheet>
– A synonym for xsl:stylesheet is xsl:transform.
• Several kinds of elements can be nested inside xsl:stylesheet, but by
far the most important is the xsl:template element.
Templates
• When you match or select nodes, a template tells the XSLT processor how to
transform the node for output
• So all our templates will have the form:
<xsl:template match="pattern">
template body
</xsl:template>
• The pattern is an XPath expression describing the nodes to which the template
can be applied.
• The processor scans the input document for nodes matching this pattern, and
replaces them with the text included in the template body.
• The content of the <template> tag represents the transformation rules to apply
to the elements selected by the match expression
• <for-each> loops over the nodes selected by the XPath expression in its
select attribute
Logic: conditional processing
• <copy> provides a simple way to copy the current node to the output
– use-attribute-sets : names of the attribute sets to add to the copied element
– <copy> does not copy the attributes or the children of the node
• <element> creates an XML element in the output
– name : name of the element
• <attribute> is used in conjunction with <element> to add an
attribute to it
– name : attribute name
Output
• The <output> tag is a top-level element, usually placed as the first child of the
root of the XSLT document. This tag indicates:
– method : the output format: xml, html or text.
– doctype-public : the public identifier of the standard that the output follows.
– doctype-system : the link to the DTD of this standard.
– indent="yes" : the generated file is indented automatically.
• Disabling indentation decreases the size of generated files.
Example
Consider the following XML document and XSLT document, which we can think of as the definition of a query on
the document:
<BIBLIO>
  <LIVRE ISBN="2-212-09052-8" LANG="FR">
    <AUTEUR>
      <NOM>Michard</NOM>
      <PRENOM>Alain</PRENOM>
    </AUTEUR>
    <TITRE>XML, Langage et Applications</TITRE>
    <EDITEUR>Eyrolles</EDITEUR>
    <DATE_ACHAT>1998</DATE_ACHAT>
    <PRIX monnaie="FF">340</PRIX>
  </LIVRE>
  <LIVRE ISBN="2-7440-0628-9" LANG="FR">
    <AUTEUR>
      <NOM>Ladd</NOM>
      <PRENOM>Eric</PRENOM>
    </AUTEUR>
    <AUTEUR>
      <NOM>O’Donnel</NOM>
      <PRENOM>Jim</PRENOM>
    </AUTEUR>
    <TITRE>HTML4, XML et Java 2</TITRE>
    <EDITEUR>Campus Press</EDITEUR>
    <PRIX monnaie="FF">349</PRIX>
  </LIVRE>
</BIBLIO>
Example
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<LIVRES>
<xsl:for-each select="BIBLIO/LIVRE">
<LIVRE>
<ISBN> <xsl:apply-templates select="@ISBN"/></ISBN>
<TITRE> <xsl:apply-templates select="TITRE"/></TITRE>
<xsl:for-each select="AUTEUR">
<AUTEUR> <xsl:apply-templates select="NOM"/>,
<xsl:apply-templates select="PRENOM"/>
</AUTEUR>
</xsl:for-each>
<EDITEUR> <xsl:apply-templates select="EDITEUR"/></EDITEUR>
</LIVRE>
</xsl:for-each>
</LIVRES>
</xsl:template>
</xsl:stylesheet>
Example
Applying this stylesheet to the document produces the following XML document:
<LIVRES>
  <LIVRE>
    <ISBN>2-212-09052-8</ISBN>
    <TITRE>XML, Langage et Applications</TITRE>
    <AUTEUR>Michard, Alain</AUTEUR>
    <EDITEUR>Eyrolles</EDITEUR>
  </LIVRE>
  <LIVRE>
    <ISBN>2-7440-0628-9</ISBN>
    <TITRE>HTML4, XML et Java 2</TITRE>
    <AUTEUR>Ladd, Eric</AUTEUR>
    <AUTEUR>O’Donnel, Jim</AUTEUR>
    <EDITEUR>Campus Press</EDITEUR>
  </LIVRE>
</LIVRES>
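The same query can be written without an XSLT processor, which may help to see what each instruction of the stylesheet does. The following is a minimal Python sketch using the standard xml.etree.ElementTree. It is a hand-written equivalent rather than an execution of the stylesheet, the function name transform is hypothetical, and the input is shortened to the first LIVRE.

```python
import xml.etree.ElementTree as ET

# Shortened input: only the first LIVRE of the example document.
SOURCE = """<BIBLIO>
  <LIVRE ISBN="2-212-09052-8" LANG="FR">
    <AUTEUR><NOM>Michard</NOM><PRENOM>Alain</PRENOM></AUTEUR>
    <TITRE>XML, Langage et Applications</TITRE>
    <EDITEUR>Eyrolles</EDITEUR>
    <PRIX monnaie="FF">340</PRIX>
  </LIVRE>
</BIBLIO>"""

def transform(biblio):
    """Build the LIVRES tree in the order used by the stylesheet."""
    livres = ET.Element("LIVRES")
    for src in biblio.findall("LIVRE"):        # for-each select="BIBLIO/LIVRE"
        out = ET.SubElement(livres, "LIVRE")
        ET.SubElement(out, "ISBN").text = src.get("ISBN")          # @ISBN
        ET.SubElement(out, "TITRE").text = src.findtext("TITRE")   # TITRE
        for author in src.findall("AUTEUR"):   # for-each select="AUTEUR"
            name = f"{author.findtext('NOM')}, {author.findtext('PRENOM')}"
            ET.SubElement(out, "AUTEUR").text = name
        ET.SubElement(out, "EDITEUR").text = src.findtext("EDITEUR")
    return livres

result = transform(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

Elements that the stylesheet never selects, such as PRIX, do not appear in the result.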