Parsing XML Into Programming Languages: JAXP, DOM, SAX, JDOM/dom4j, Xerces, Xalan, JAXB
• Possible strategies
– Parse by hand with some reusable libraries
– Parse into generic tree structure
– Parse as sequence of events
– Automagically parse to language-specific objects
Parsing by hand
• Advantages
– Complete control
– Good for simple needs; build on the regex package (java.util.regex)
• Disadvantages
– Must write the initial code yourself, even if it later becomes
generalized
– Tedious and error-prone
– Very hard to validate against a schema or DTD
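To make the trade-off concrete, here is a minimal by-hand sketch built on java.util.regex. The class RegexExtract, its method extractAll and the sample XML are hypothetical. It only handles flat, attribute-free elements; the second call shows how easily it breaks once real-world XML (attributes, nesting, comments, entities) appears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Collect the text between every <tag> and </tag> pair.
    // Hypothetical helper: no nesting, attributes, CDATA, comments or entity decoding.
    static List<String> extractAll(String xml, String tag) {
        String t = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + t + ">(.*?)</" + t + ">", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1)); // group 1 = text between the tags (lazy match)
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<people><name>Ada</name><name>Alan</name></people>";
        System.out.println(extractAll(xml, "name")); // [Ada, Alan]

        // A single attribute is enough to defeat the pattern.
        System.out.println(extractAll("<name lang=\"en\">Ada</name>", "name")); // []
    }
}
```

Every new XML feature you want to support means another special case in the pattern, which is exactly the "tedious and error-prone" cost listed above.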
Parsing into generic tree structure
• Advantages
– Industry-wide, language neutral standard exists called DOM
(Document Object Model)
– Learning DOM for one language makes it easy to learn for any
other
– As of JAXP 1.2, support for XML Schema validation
– Have to write much less code to get XML to something you want
to manipulate in your program
• Disadvantages
– Non-intuitive API; doesn't take full advantage of Java
– Still quite a bit of work
What is JAXP?
• JAXP: Java API for XML Processing
– In the Java language, these standard APIs (together with
the XSLT API) make up a set of interfaces known as JAXP
– Java also provides standard implementations, together
with a vendor pluggability layer
– Some of these come standard with the J2SDK; others are
only available with the Java Web Services Developer Pack
– We will study these shortly
Another alternative
• JDOM: a native, published Java API for
representing XML as a tree
• Like DOM, but much more Java-specific and
object-oriented
• However, not supported by other languages
• Also, no support for schema
• dom4j is another alternative
JAXB
• JAXB: Java Architecture for XML Binding
Sample Code
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

• A factory instance is the parser implementation. It can be changed at runtime with a system property (javax.xml.parsers.DocumentBuilderFactory). The JDK has a default; Xerces is much better.
• The imports are for reference. Notice that the Document class comes from the W3C-specified bindings (org.w3c.dom), not from javax.xml.
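Continuing the sample, a minimal sketch of the full path from XML text to a DOM tree might look like the following. The class and method names (DomParseDemo, parse) and the sample document are illustrative assumptions; DocumentBuilder.parse, getDocumentElement and getElementsByTagName are standard JAXP / W3C DOM calls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomParseDemo {
    // Parse an XML string into a DOM Document via the pluggable JAXP factory.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<people><person id=\"1\">Ada</person>"
                           + "<person id=\"2\">Alan</person></people>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName()); // people

        // Find all <person> descendants and read an attribute plus their text.
        NodeList people = root.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element p = (Element) people.item(i);
            System.out.println(p.getAttribute("id") + ": " + p.getTextContent());
        }
    }
}
```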
Validation
• Note that by default the parser will not
validate against a schema or DTD
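One way to turn validation on is the javax.xml.validation package. Note that this package arrived in JAXP 1.3 (Java 5), so it is newer than the JAXP 1.2 schema support mentioned earlier. The sketch below assumes a tiny hypothetical schema in which the root element age must hold a non-negative integer; the class SchemaValidationDemo and method isValid are also hypothetical. A Validator with no ErrorHandler set rethrows validation errors as SAXException, which the sketch turns into a boolean.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationDemo {
    // Hypothetical schema: a single <age> root element of type xs:nonNegativeInteger.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"age\" type=\"xs:nonNegativeInteger\"/>"
      + "</xs:schema>";

    // Returns true if xml is well-formed and valid against XSD, false otherwise.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Default error handling throws on validation (and well-formedness) errors.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<age>42</age>")); // true
        System.out.println(isValid("<age>-1</age>")); // false
    }
}
```

Alternatives are DocumentBuilderFactory.setSchema(...) or, for DTDs, setValidating(true); both report validation errors through an org.xml.sax.ErrorHandler that you normally need to install yourself.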
Each of these has a special and non-obvious associated type, value, and name.
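// Writing a DOM back out with the JAXP identity Transformer (printDocument is an illustrative name)
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

static void printDocument(Document document) throws TransformerException {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer(); // no stylesheet = identity copy
    DOMSource source = new DOMSource(document);
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
}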
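The same identity Transformer can write into a String instead of System.out, which is handy for tests and logging. This sketch assumes a hypothetical helper, DomToString.toXmlString; OutputKeys.OMIT_XML_DECLARATION is a standard output property (OutputKeys.INDENT is another).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToString {
    // Serialize a DOM with the identity Transformer, capturing the output in a String.
    static String toXmlString(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // drop <?xml ...?>
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("hello");
        doc.appendChild(root);
        System.out.println(toXmlString(doc)); // <greeting>hello</greeting>
    }
}
```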
Creating a DOM from scratch
• Sometimes you may want to create a DOM
tree directly in memory. This is done with:
DocumentBuilderFactory factory
    = DocumentBuilderFactory.newInstance();
DocumentBuilder builder
    = factory.newDocumentBuilder();   // may throw ParserConfigurationException
Document document = builder.newDocument();
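Once you have an empty Document, nodes are created through the Document's own factory methods (createElement, createTextNode) and then attached to the tree with appendChild. A minimal sketch, with a hypothetical class BuildDomDemo and made-up element names:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildDomDemo {
    // Build <people><person id="1">Ada</person></people> entirely in memory.
    static Document buildPeople() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.newDocument();

        Element root = document.createElement("people"); // created by the Document...
        document.appendChild(root);                       // ...then attached to the tree

        Element person = document.createElement("person");
        person.setAttribute("id", "1");
        person.appendChild(document.createTextNode("Ada")); // text is a child node
        root.appendChild(person);
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildPeople();
        Element first = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println(first.getAttribute("id") + " " + first.getTextContent()); // 1 Ada
    }
}
```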
Manipulating Nodes
• Once the root node is obtained, typical tree
methods exist to manipulate other elements:
boolean node.hasChildNodes()
NodeList node.getChildNodes()
Node node.getNextSibling()
Node node.getParentNode()
String node.getNodeValue()
String node.getNodeName()
String node.getTextContent()
void node.setNodeValue(String nodeValue)
Node node.insertBefore(Node newChild, Node refChild)
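A short sketch exercising several of these methods: a recursive depth-first walk using hasChildNodes and getChildNodes, the null node value of an element compared with its Text child, and insertBefore. The class NodeWalkDemo, its helpers and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeWalkDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Depth-first walk: names of all element nodes in document order, space-separated.
    static String elementNames(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString().trim();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        if (node.hasChildNodes()) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                collect(children.item(i), out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b>text</b><c/></a>");
        Element root = doc.getDocumentElement();
        System.out.println(elementNames(root)); // a b c

        // An element's own node value is null; its text lives in a child Text node.
        Node b = root.getFirstChild();
        System.out.println(b.getNodeValue());                 // null
        System.out.println(b.getFirstChild().getNodeValue()); // text

        // insertBefore(newChild, refChild): place a new <d> just before <c>.
        Node c = b.getNextSibling();
        root.insertBefore(doc.createElement("d"), c);
        System.out.println(elementNames(root)); // a b d c
    }
}
```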
SAX