Chapter 05
Semi-Structured Data
Chapter 5: XML Parsing: SAX and DOM
Amel Boustil
35000, Algeria.
May 23, 2023
XML Parsing DOM or SAX XML and Python Python and XSLT
Agenda
• XML Parsing
• DOM or SAX
• XML and Python: DOM in Python, SAX in Python, ElementTree API
• Python and XSLT
XML Parser
DOM and SAX parsers are the two most popular parsers used to
parse XML documents. Although both are used for XML parsing,
they work in completely different ways.
DOM
SAX
SAX, also known as the Simple API for XML, is used for parsing
XML documents. It is based on events generated while reading
through the document.
1. Working
The first and major difference between DOM and SAX parsers is how
they work. A DOM parser loads the full XML file into memory and
creates a tree representation of the XML document, while SAX is an
event-based XML parser and doesn't load the whole XML
document into memory.
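The difference in how the two parsers work can be illustrated with a minimal sketch using Python's standard library. The sample document and the `BookCounter` class are hypothetical names chosen for illustration; both approaches count the same elements, but only the DOM version builds a tree first.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
XML = b"<library><book>A</book><book>B</book></library>"

# DOM: the whole document is parsed into an in-memory tree first,
# and the tree is queried afterwards.
dom = minidom.parseString(XML)
dom_count = len(dom.getElementsByTagName("book"))


# SAX: the parser streams through the document and calls the handler
# on each event; nothing is kept unless the handler stores it.
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


counter = BookCounter()
xml.sax.parseString(XML, counter)
sax_count = counter.count

print(dom_count, sax_count)  # 2 2
```

The SAX version keeps only a single integer in memory, which is why the event-based style tends to suit very large documents.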
2. XML Size
3. Full form
DOM stands for Document Object Model, while SAX stands for
Simple API for XML.
4. API Type
5. Ease of Use
6. XPath Capability
7. Direction
• Java
• Perl
• C++
• PHP
• Python
Two methods:
• Node: the base interface for most objects in a document.
• ..., etc.
Element Object
The Element object represents an element in an XML document.
Elements may contain attributes, other elements, or text. If an
element contains text, the text is represented in a text-node.
Element Object Properties (Examples)
Property     Description
attributes   Returns a NamedNodeMap of attributes for the element
baseURI      Returns the absolute base URI of the element
childNodes   Returns a NodeList of child nodes for the element
firstChild   Returns the first child of the element
lastChild    Returns the last child of the element
parentNode   Returns the parent node of the element
nodeValue    Returns the value of the element
nodeName     Returns the name of the element
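Several of these properties can be tried out with `xml.dom.minidom` from the Python standard library. This is a small sketch on a hypothetical one-book document; note that in minidom an element's own `nodeValue` is `None`, and the text is found on the child text node instead.

```python
from xml.dom import minidom

# Hypothetical document; no whitespace between tags, so there are no
# extra whitespace-only text nodes among the children.
doc = minidom.parseString('<book id="b1"><title>XML Basics</title></book>')

book = doc.documentElement     # the root <book> element
title = book.firstChild        # its first (and only) child, <title>
text_node = title.firstChild   # the text node inside <title>

print(book.nodeName)                 # book
print(book.attributes["id"].value)   # b1
print(len(book.childNodes))          # 1
print(book.lastChild is title)       # True
print(title.parentNode is book)      # True
print(text_node.nodeValue)           # XML Basics
print(title.nodeValue)               # None
```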
Element Object
Method                   Description
appendChild()            Adds a new child node to the end of the list of
                         children of the node
getAttribute()           Returns the value of an attribute
getAttributeNode()       Returns an attribute node as an Attribute object
getElementsByTagName()   Returns a NodeList of matching element nodes,
                         and their children
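The methods listed above can be exercised with `xml.dom.minidom`. The following is a minimal sketch on a hypothetical `<library>` document; `createElement()` and `setAttribute()` are additional minidom calls used only to build the node that gets appended.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString('<library><book id="b1"/><book id="b2"/></library>')
root = doc.documentElement

# getElementsByTagName(): all matching descendant elements.
books = root.getElementsByTagName("book")

# getAttribute(): the attribute value as a string.
ids = [b.getAttribute("id") for b in books]

# getAttributeNode(): the attribute as a node object with name and value.
attr_node = books[0].getAttributeNode("id")

# appendChild(): add a new element at the end of the root's children.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b3")
root.appendChild(new_book)

print(ids)                                      # ['b1', 'b2']
print(attr_node.name, attr_node.value)          # id b1
print(len(root.getElementsByTagName("book")))   # 3
```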
Execution Result
SAX API
• Now you can use the parser object created above to parse an XML file:
parser.parse('myfile.xml')
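For context, a complete SAX workflow might look like the sketch below. The `TitleHandler` class and the `<title>` tag are hypothetical, and an in-memory byte stream stands in for the XML file so that the example runs on its own; `make_parser()`, `setContentHandler()` and `parse()` are the standard `xml.sax` calls.

```python
import io
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical tag name)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so accumulate.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# In-memory stand-in for an XML file on disk.
source = io.BytesIO(
    b"<library>"
    b"<book><title>XML Basics</title></book>"
    b"<book><title>Python and XML</title></book>"
    b"</library>"
)
parser.parse(source)

print(handler.titles)  # ['XML Basics', 'Python and XML']
```

Passing a file name to `parser.parse()` instead of the stream works the same way.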
ElementTree API
Property         Description
Tag              It is a string representing the type of data being stored
Attributes       Consists of a number of attributes stored as dictionaries
Text String      A text string having information that needs to be displayed
Tail String      Can also have tail strings if necessary
Child Elements   Consists of a number of child elements stored as sequences
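In `xml.etree.ElementTree` these properties correspond to the `tag`, `attrib`, `text` and `tail` attributes of an `Element`, with the child elements reachable by indexing or iteration. A short sketch on a hypothetical fragment:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with mixed text and a child element.
root = ET.fromstring('<p class="note">Hello <b>world</b>!</p>')
bold = root[0]            # child elements behave like a sequence

print(root.tag)           # p
print(root.attrib)        # {'class': 'note'}
print(repr(root.text))    # 'Hello '
print(bold.text)          # world
print(bold.tail)          # !
print(len(root))          # 1
```

The tail is the text that follows an element's closing tag, up to the next tag.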
There are two ways to parse the file using the `ElementTree' module.
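The slide does not name the two ways; presumably they are `ET.parse()`, which reads from a file, and `ET.fromstring()`, which parses XML already held in a string. The sketch below rests on that assumption and uses a temporary file with hypothetical content.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample content.
XML_TEXT = "<library><book id='b1'>XML Basics</book></library>"

# Way 1 (assumed): parse() reads a file and returns an ElementTree;
# getroot() then gives the root Element.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as f:
    f.write(XML_TEXT)
    path = f.name
try:
    tree = ET.parse(path)
    root_from_file = tree.getroot()
finally:
    os.remove(path)

# Way 2 (assumed): fromstring() parses a string and returns the root
# Element directly.
root_from_string = ET.fromstring(XML_TEXT)

print(root_from_file.tag, root_from_string.tag)  # library library
```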
With Element.find()
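As a minimal sketch on a hypothetical document, `find()` returns the first matching child element, or `None` when nothing matches:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<library>"
    "<book><title>XML Basics</title></book>"
    "<book><title>Python and XML</title></book>"
    "</library>"
)

first_book = root.find("book")          # first direct child named <book>
first_title = root.find("book/title")   # a simple path is also accepted
missing = root.find("magazine")         # no such child

print(first_book.tag)     # book
print(first_title.text)   # XML Basics
print(missing)            # None
```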
With Element.findall()
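A short sketch, again on a hypothetical document: `findall()` returns a list of all matches. A plain tag name matches direct children only, while a `.//` prefix searches the whole subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<library>"
    "<book><title>XML Basics</title></book>"
    "<book><title>Python and XML</title></book>"
    "</library>"
)

books = root.findall("book")                     # direct children
titles = [b.findtext("title") for b in books]
direct_titles = root.findall("title")            # none at the top level
all_titles = root.findall(".//title")            # any depth

print(len(books))           # 2
print(titles)               # ['XML Basics', 'Python and XML']
print(len(direct_titles))   # 0
print(len(all_titles))      # 2
```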
With Element.iter()
With Element.iterfind()
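Both methods return iterators rather than lists. In this sketch on a hypothetical document, `iter()` walks the element and all of its descendants in document order, and `iterfind()` lazily yields the elements matching a path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the first book has an id attribute.
root = ET.fromstring(
    "<library>"
    '<book id="b1"><title>XML Basics</title></book>'
    "<book><title>Python and XML</title></book>"
    "</library>"
)

# iter(): the element itself, then every descendant, depth first.
all_tags = [el.tag for el in root.iter()]

# iter(tag): only descendants with the given tag.
titles = [el.text for el in root.iter("title")]

# iterfind(): matches for a path, here books that carry an id attribute.
books_with_id = [b.get("id") for b in root.iterfind("book[@id]")]

print(all_tags)        # ['library', 'book', 'title', 'book', 'title']
print(titles)          # ['XML Basics', 'Python and XML']
print(books_with_id)   # ['b1']
```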
• We had a look at the SAX API for XML, the DOM API for
XML, and lastly the ElementTree API for XML. They each
have their pros and cons.