Chapter 05

Uploaded by zinebayref
Copyright © All Rights Reserved

XML Parsing DOM or SAX XML and Python Python and XSLT

Semi-Structured Data.
Chapter 5: XML Parsing: SAX and DOM

Amel Boustil

Computer Science Department, FS, University M'Hamed Bougara of Boumerdes, 35000, Algeria.

23 May 2023


Agenda

• XML Parsing
• DOM or SAX
• XML and Python
   • DOM in Python
   • SAX in Python
   • ElementTree API
• Python and XSLT


XML Parser

An XML parser is a software library or package that provides interfaces for client applications to work with an XML document.

Types of XML Parsers

DOM and SAX parsers are the two most popular parsers for XML documents. Although both are used for XML parsing, they work in completely different ways.


DOM

The Document Object Model is a cross-language API recommended by the World Wide Web Consortium (W3C) for accessing and modifying XML documents. It presents an XML document as a tree, which makes it extremely useful for random-access applications.


SAX

SAX, the Simple API for XML, is used for parsing XML documents. It is based on events that the parser generates as it reads through the document.


Difference between SAX and DOM parsers

In this section, we look at some behavioral differences between DOM and SAX parsers.

1. Working

The first and main difference between DOM and SAX parsers is how they work. A DOM parser loads the full XML file into memory and builds a tree representation of the document, whereas SAX is an event-based parser that does not load the whole XML document into memory.
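A minimal sketch of this difference with Python's standard library, assuming a small hypothetical <library> document: minidom builds the whole tree before we query it, while the SAX handler (the BookCounter class is illustrative) only sees events as the parser streams through the input.

```python
import xml.sax
from xml.dom import minidom

XML = b"<library><book>Dune</book><book>Emma</book></library>"

# DOM: the whole document is first loaded into an in-memory tree, then queried
dom = minidom.parseString(XML)
dom_books = [b.firstChild.data for b in dom.getElementsByTagName("book")]

# SAX: no tree is built; the parser calls our handler while it reads the input
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1

counter = BookCounter()
xml.sax.parseString(XML, counter)

print(dom_books)      # ['Dune', 'Emma']
print(counter.count)  # 2
```

With SAX, anything we want to keep (here a simple counter) must be stored by the handler itself, because the parser discards each event once it has been delivered.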


2. XML Size

For small and medium-sized XML documents, DOM is often faster than SAX because, once the tree is built, every operation runs in memory.

3. Full form

DOM stands for Document Object Model, while SAX stands for Simple API for XML.


4. API Type

The DOM parser creates an in-memory tree that lets you access any element, whereas SAX is a push-based, streaming API.

5. Ease of Use

DOM is generally easier to use than a SAX parser.


6. XPath Capability

DOM lets you use XPath to access elements, which is not possible with a SAX parser. XPath is a powerful feature, and it is often the main reason people choose DOM.

7. Direction

SAX is a forward-only parser, which means you cannot go back to earlier parts of the document, whereas DOM lets you navigate in any direction.


8. When to use DOM and SAX parsers

A final difference between DOM and SAX is where each one fits best. A DOM parser is better suited to small XML files when sufficient memory is available, while a SAX parser is better suited to large XML files.


Which languages are supported?

• Java

• Perl

• C++

• PHP

• Python


Python Library for XML

Python's standard library contains the xml package. This package has the following modules, which define XML processing APIs:

• xml.dom: the DOM API definition

• xml.sax: SAX2 base classes and convenience functions

• xml.etree.ElementTree: a simple and lightweight XML processor API


The DOM API

• The xml.dom module offers the easiest way to load an XML document.

• xml.dom.minidom is a minimal implementation of the Document Object Model interface, with an API similar to those available in other languages. It is simpler than the full DOM and also significantly smaller.

• The xml.dom.pulldom module provides a pull parser that generates DOM-accessible fragments of the document.
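The pulldom idea can be sketched as follows, assuming a hypothetical <library> document and an illustrative helper named book_titles: the parser streams events, and only when we meet a <book> do we call expandNode() to build a small DOM subtree for that element alone.

```python
from xml.dom import pulldom

XML = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""

def book_titles(xml_text):
    """Yield (id, title) for each <book>, expanding only that fragment into a DOM."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            events.expandNode(node)  # builds the DOM subtree for this <book> only
            title = node.getElementsByTagName("title")[0]
            yield node.getAttribute("id"), title.firstChild.data

for book_id, title in book_titles(XML):
    print(book_id, title)
```

This keeps memory use close to SAX while still letting us use DOM methods such as getElementsByTagName() on each fragment.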


DOM : Parsing an XML document

Two methods:

• The parse() function takes either a filename or an open file object and returns a Document object representing the content of the document.

• The parseString() function returns a Document object representing the content of the document from a string input.


DOM : Parsing an XML document Example
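A minimal sketch of both functions from xml.dom.minidom, assuming a small hypothetical document; an in-memory BytesIO object stands in for an open file so the example runs without a file on disk.

```python
import io
from xml.dom import minidom

XML = "<library><book id='1'><title>Dune</title></book></library>"

# parseString(): parse from a string held in memory
doc_from_string = minidom.parseString(XML)

# parse(): accepts a filename or an open file object; BytesIO plays the file here
doc_from_file = minidom.parse(io.BytesIO(XML.encode("utf-8")))

# Both calls return a Document whose documentElement is the root <library>
print(doc_from_string.documentElement.tagName)  # library
print(doc_from_file.documentElement.tagName)    # library
```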


Objects in the DOM

• Node: the base interface for most objects in a document.

• NodeList: the interface for a sequence of nodes.

• Document: the object that represents an entire document.

• Element: objects that represent element nodes in the document hierarchy.

• Attr: objects that represent the attribute nodes of an element.

• etc.


Element Object
The Element object represents an element in an XML document. Elements may contain attributes, other elements, or text. If an element contains text, the text is represented in a text node.

Element Object Properties (Examples)

Property      Description
attributes    Returns a NamedNodeMap of the element's attributes
baseURI       Returns the absolute base URI of the element
childNodes    Returns a NodeList of the element's child nodes
firstChild    Returns the first child of the element
lastChild     Returns the last child of the element
parentNode    Returns the parent node of the element
nodeValue     Returns the value of the node (null for element nodes; their text is held by child text nodes)
nodeName      Returns the name of the element
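A minimal sketch of these properties with xml.dom.minidom, assuming a hypothetical <book> document; baseURI is left out of this sketch. Note that the element's own nodeValue is None and that its text sits in a child Text node.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<book id="1" lang="en"><title>Dune</title><year>1965</year></book>')
book = doc.documentElement

print(book.nodeName)                         # book
print(book.nodeValue)                        # None: element nodes carry no value
print(book.attributes["lang"].value)         # en  (attributes is a NamedNodeMap)
print(len(book.childNodes))                  # 2   (childNodes is a NodeList)
print(book.firstChild.nodeName)              # title
print(book.lastChild.nodeName)               # year
print(book.firstChild.parentNode is book)    # True
print(book.firstChild.firstChild.nodeValue)  # Dune (text lives in a Text node)
```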


Element Object Methods (Examples)

Method                   Description
appendChild()            Adds a new child node to the end of the node's list of children
getAttribute()           Returns the value of an attribute
getAttributeNode()       Returns an attribute node as an Attr object
getElementsByTagName()   Returns a NodeList of all descendant elements with the given tag name
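A minimal sketch of these methods with xml.dom.minidom, assuming a hypothetical <library> document; createElement() and setAttribute() are used only to build the node that appendChild() adds.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<library><book id="1"><title>Dune</title></book></library>')
library = doc.documentElement

# appendChild(): add a new <book> at the end of <library>'s children
new_book = doc.createElement("book")
new_book.setAttribute("id", "2")
library.appendChild(new_book)

books = library.getElementsByTagName("book")  # NodeList of matching elements
print(len(books))                             # 2
print(books[1].getAttribute("id"))            # 2 (a plain string)
id_node = books[0].getAttributeNode("id")     # an Attr object, not a string
print(id_node.name, id_node.value)            # id 1
```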


Example for the root
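A minimal sketch of reaching the root element with minidom, assuming a hypothetical <library> document with a name attribute: documentElement returns the root, and hasAttribute()/getAttribute() read its attributes.

```python
from xml.dom import minidom

XML = """<library name="City Library">
  <book id="1"><title>Dune</title></book>
</library>"""

doc = minidom.parseString(XML)
root = doc.documentElement  # the root element of the document

print("Root element:", root.tagName)  # Root element: library
if root.hasAttribute("name"):
    print("name =", root.getAttribute("name"))  # name = City Library
```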


Example for the elements
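A minimal sketch of walking the elements with minidom, assuming a hypothetical <library> document; text_of is an illustrative helper that joins the Text children of the first matching child element.

```python
from xml.dom import minidom

XML = """<library>
  <book id="1"><title>Dune</title><author>Herbert</author></book>
  <book id="2"><title>Emma</title><author>Austen</author></book>
</library>"""

def text_of(element, tag):
    """Return the text of the first <tag> descendant of element."""
    node = element.getElementsByTagName(tag)[0]
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)

doc = minidom.parseString(XML)
rows = []
for book in doc.getElementsByTagName("book"):
    rows.append((book.getAttribute("id"), text_of(book, "title"), text_of(book, "author")))

for row in rows:
    print(*row)  # 1 Dune Herbert / 2 Emma Austen
```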


Execution Result


SAX API

SAX is a standard interface for event-driven XML parsing. To receive its events, we subclass xml.sax.ContentHandler.

• The ContentHandler is the main callback interface in SAX. The order of events in this interface mirrors the order of the information in the document. It handles the tags and attributes of the XML.

• The ContentHandler class provides the startElement() and endElement() methods, which are called when an element starts and ends, respectively. The other main events are:


• ContentHandler.startDocument(): receives notification of the beginning of a document.

• ContentHandler.endDocument(): receives notification of the end of a document.

• ContentHandler.startElement(): signals the start of an element in non-namespace mode.

• ContentHandler.endElement(): signals the end of an element in non-namespace mode.

• ContentHandler.characters(): receives notification of character data.
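To see that the order of these callbacks mirrors the document, here is a minimal sketch with a hypothetical EventLogger handler that simply records each event. Keep in mind that a SAX parser may split character data across several characters() calls.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each SAX callback in the order the parser fires it."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement {name} {dict(attrs)}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def characters(self, content):
        self.events.append(f"characters {content!r}")

logger = EventLogger()
xml.sax.parseString(b'<book id="1"><title>Dune</title></book>', logger)
for event in logger.events:
    print(event)
```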


• The make_parser() function creates a SAX XMLReader object.
    parser = xml.sax.make_parser()

• Then set the content handler to an instance of a user-defined class that subclasses xml.sax.ContentHandler.
    handler = MyHandler()
    parser.setContentHandler(handler)

• Now you can use the parser object above to parse an XML file.
    parser.parse('myfile.xml')
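Put together, the three steps give a runnable program. In this sketch MyHandler is a hypothetical handler that counts elements by name, and an in-memory BytesIO object replaces 'myfile.xml' (parse() accepts a filename or a file object).

```python
import io
import xml.sax

class MyHandler(xml.sax.ContentHandler):
    """Hypothetical handler: counts how many elements of each name appear."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

parser = xml.sax.make_parser()     # creates an XMLReader
handler = MyHandler()
parser.setContentHandler(handler)  # register our callbacks

# A real program would pass a filename such as 'myfile.xml'
parser.parse(io.BytesIO(b"<library><book/><book/></library>"))
print(handler.counts)  # {'library': 1, 'book': 2}
```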


Example of SAX parsing
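A minimal sketch of extracting data with SAX, assuming a hypothetical <library> document and an illustrative BookHandler class. Because SAX keeps no tree, the handler tracks its own state: it remembers the current book id, buffers text while inside <title>, and records a pair when </title> closes.

```python
import xml.sax

class BookHandler(xml.sax.ContentHandler):
    """Collect (id, title) pairs from <book id="..."><title>...</title></book>."""
    def __init__(self):
        super().__init__()
        self.books = []
        self._current_id = None
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._current_id = attrs.get("id")
        elif name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self._in_title = False
            self.books.append((self._current_id, "".join(self._buffer)))

XML = b"""<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""

handler = BookHandler()
xml.sax.parseString(XML, handler)
for book_id, title in handler.books:
    print(book_id, title)
```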



ElementTree API

• The DOM API is vast and offers a cross-language, cross-platform API for working with XML data (it is not specific to any one language).
• The ElementTree API provides an easy-to-use and efficient API for parsing, manipulating, and generating XML documents in Python. It takes a different approach, focusing instead on:
   • An element-centric approach, where XML elements are represented as Element objects that can be easily accessed, modified, and navigated.
   • Parsing the entire XML document into memory, creating a tree structure that can be traversed and manipulated efficiently.


• Elements are treated as if they were lists. This means that if an XML element contains other elements, you can iterate over those child elements with standard iteration such as a for loop.

• The ElementTree API treats attributes like dictionaries. Given a reference to an element, you can access its attrib property, which is a dictionary of all the attribute names and values.

• ElementTree makes searching for content within XML straightforward. It offers functions that accept a limited subset of XPath syntax to search the XML for specific data.
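These three behaviors can be sketched with xml.etree.ElementTree on a hypothetical <library> document: len(), indexing and for-loops over children; attrib and get() for attributes; and an XPath-style predicate in findall().

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book id="1" lang="en"><title>Dune</title></book>
  <book id="2" lang="fr"><title>Vendredi</title></book>
</library>""")

# Elements behave like lists: len(), indexing and for-loops over children
print(len(root))  # 2
first = root[0]
for book in root:
    # ...and attributes behave like a dictionary via .attrib
    print(book.attrib)

print(first.get("lang"))  # en

# findall() accepts a limited XPath subset, e.g. an attribute predicate
print([t.text for t in root.findall("./book[@lang='fr']/title")])  # ['Vendredi']
```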


ElementTree API components


ElementTree is a class that wraps the element structure and allows conversion to and from XML.
The Element type stores hierarchical data structures in memory and has the following properties:

Property         Description
Tag              A string identifying the kind of data the element stores (its name)
Attributes       The element's attributes, stored as a dictionary
Text String      The text content that comes before the element's first child
Tail String      Optional text that follows the element's end tag, before the next sibling
Child Elements   The element's child elements, stored as a sequence
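The difference between text and tail is easiest to see on mixed content. A minimal sketch, assuming a hypothetical <p> element with an inline <b> child:

```python
import xml.etree.ElementTree as ET

para = ET.fromstring('<p class="note">Read <b>this</b> carefully.</p>')
bold = para[0]

print(para.tag)     # p
print(para.attrib)  # {'class': 'note'}
print(para.text)    # 'Read '       (text before the first child)
print(bold.text)    # 'this'
print(bold.tail)    # ' carefully.' (text after </b>, still inside <p>)
print(list(para))   # a one-item list holding the <b> element
```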


Parsing with ElementTree

There are two ways to parse XML using the ElementTree module:

• The parse() function parses an XML document supplied as a file.

• The fromstring() function parses XML supplied as a string (for example, a triple-quoted literal).


Using the parse() function:
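A minimal sketch of ET.parse(), assuming a hypothetical <library> document; a BytesIO object stands in for a file such as 'library.xml'. parse() returns an ElementTree for the whole document, and getroot() gives the root Element.

```python
import io
import xml.etree.ElementTree as ET

# ET.parse() expects a filename (e.g. 'library.xml') or a file object;
# an in-memory file stands in for a file on disk here.
source = io.BytesIO(b"<library><book id='1'><title>Dune</title></book></library>")
tree = ET.parse(source)  # an ElementTree wrapping the whole document
root = tree.getroot()    # the root Element

print(root.tag)                      # library
print(root.find("book/title").text)  # Dune
```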



Using the fromstring() function:
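A minimal sketch of ET.fromstring() on a hypothetical triple-quoted document. Unlike parse(), it returns the root Element directly rather than an ElementTree.

```python
import xml.etree.ElementTree as ET

xml_text = """<library>
  <book id="1"><title>Dune</title></book>
</library>"""

root = ET.fromstring(xml_text)  # the root Element, not an ElementTree
print(root.tag)                 # library
print(root[0].get("id"))        # 1
```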


Finding interesting elements in ElementTree

• Element.findall(): the findall() method finds all occurrences of elements that match a specified tag or ElementPath expression, and returns them as a list. With a plain tag name, only the element's direct children are searched.
Usage: element.findall(tag) or element.findall(ElementPath)

• Element.find(): the find() method finds the first occurrence of an element that matches a specified tag or ElementPath expression. It returns the first matching element, or None if no element matches.
Usage: element.find(tag) or element.find(ElementPath)


With Element.find()
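A minimal sketch of Element.find(), assuming a hypothetical <library> document: only the first match is returned, a path or predicate narrows the search, and a path with no match yields None.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book id="1"><title>Dune</title><year>1965</year></book>
  <book id="2"><title>Emma</title></book>
</library>""")

print(root.find("book").get("id"))            # 1 (only the first match)
print(root.find("book/title").text)           # Dune
print(root.find("book[@id='2']/title").text)  # Emma
print(root.find("book[@id='2']/year"))        # None (no match)
```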


With Element.findall()
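A minimal sketch of Element.findall(), assuming a hypothetical <library> document with one nested <book>: a plain tag only matches direct children, while a .// path searches all descendants and can carry an attribute predicate.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book genre="sf"><title>Dune</title></book>
  <book genre="novel"><title>Emma</title></book>
  <shelf><book genre="sf"><title>Solaris</title></book></shelf>
</library>""")

# Plain tag: direct children only, so the book inside <shelf> is not found
print([b.find("title").text for b in root.findall("book")])  # ['Dune', 'Emma']

# './/' searches every descendant; the predicate filters on an attribute
print([t.text for t in root.findall(".//book[@genre='sf']/title")])  # ['Dune', 'Solaris']
```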


• iter() and iterfind(): both return an iterator over the selected element(s). iter() accepts a tag name, while iterfind() accepts a tag name or an ElementPath expression.
   • The iter() method iterates over all elements in the subtree, at any depth, that match a specified tag. It returns an iterator that yields each matching element.
   • The iterfind() method is used for targeted element selection based on ElementPath expressions.
Usage: element.iter(tag) or element.iterfind(pathExpression)


With Element.iter()
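A minimal sketch of Element.iter(), assuming a hypothetical <library> document: iter(tag) walks the whole subtree in document order, including nested elements, and iter() with no argument yields every element, starting with the element itself.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book><title>Dune</title></book>
  <shelf><book><title>Solaris</title></book></shelf>
</library>""")

# iter(tag) reaches matching elements at any depth
for title in root.iter("title"):
    print(title.text)  # Dune, then Solaris

# With no argument, iter() yields every element, root included
print([el.tag for el in root.iter()])
# ['library', 'book', 'title', 'shelf', 'book', 'title']
```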

With Element.iterfind()
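A minimal sketch of Element.iterfind(), assuming a hypothetical <library> document: it takes the same ElementPath expressions as findall() but yields the matches lazily instead of building a list.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book lang="en"><title>Dune</title></book>
  <book lang="fr"><title>Vendredi</title></book>
  <shelf><book lang="en"><title>Emma</title></book></shelf>
</library>""")

# Titles of English books at any depth, produced one at a time
for title in root.iterfind(".//book[@lang='en']/title"):
    print(title.text)  # Dune, then Emma
```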


Python XML Parsing Summary

• Reading and manipulating XML data in Python can be done with any of the libraries covered above.

• We looked at the SAX API, the DOM API, and finally the ElementTree API for XML. Each has its pros and cons.


Python and XSLT


To apply an existing XSLT stylesheet to an XML document in Python, we can use the lxml library.
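A minimal sketch with lxml, a third-party package installed separately (for example with pip install lxml). The stylesheet and document are hypothetical. etree.XSLT() compiles the stylesheet into a callable transform, calling it on a parsed document returns the result tree, and str() serializes that result (plain text here, because of xsl:output method="text").

```python
import io
from lxml import etree

# Hypothetical stylesheet: output the title of every book as plain text
xslt_root = etree.XML("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="library/book">
      <xsl:value-of select="title"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt_root)  # compile the stylesheet once, reuse it
doc = etree.parse(io.BytesIO(b"<library><book><title>Dune</title></book></library>"))
result = transform(doc)            # apply it to the parsed document

print(str(result))
```

In a real program, the stylesheet and document would typically be loaded from files with etree.parse('style.xsl') and etree.parse('data.xml') (hypothetical filenames).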
