SGML and XML
XML stands for eXtensible Markup Language. It is a simple and flexible markup language. It
is often described as a universal language for data on the web because XML documents can be
created and processed from any programming language. It is a widely used standard for
information interchange.
Simplicity: Information coded in XML is easy to read and understand.
Extensibility: XML has no fixed set of tags. You can define your own tags as you need
them.
Self-descriptive: XML documents do not need a special schema set-up, as traditional
databases do, to store data. They can be stored without such definitions because
they carry metadata in the form of tags and attributes.
Scalable: XML is a text format, not a binary one, so you can create and edit files with any
text editor, and it is easy to debug.
Fast access: XML documents are arranged hierarchically, so an element can be located
comparatively quickly by navigating from the root.
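A syntactically correct document is called a well-formed XML document. A well-formed XML
document must follow XML's basic rules of syntax, for example:
o It has exactly one root element.
o Every start tag has a matching end tag, and elements are properly nested.
o Attribute values are quoted.
o Note: A valid XML document is always well-formed, but a well-formed XML document
is not necessarily valid.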
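The well-formedness check can be sketched in Python with the standard library parser. This is a minimal illustration, not part of the original notes; the helper name is_well_formed and the sample strings are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents.
good = "<employee><firstname>vimal</firstname></employee>"
bad_nesting = "<employee><firstname>vimal</employee></firstname>"  # tags overlap
two_roots = "<employee/><employee/>"                               # more than one root

print(is_well_formed(good))         # True
print(is_well_formed(bad_nesting))  # False
print(is_well_formed(two_roots))    # False
```

Note that this only tests well-formedness. Checking validity needs a DTD or schema as well.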
What is DTD?
DTD stands for Document Type Definition. It declares the legal building blocks of an XML
document. It defines:
o Names of elements
o How and where they can be used
o Element attributes
o Proper nesting
A DTD can be supplied in two ways:
o Use the DTD element definition within the XML document itself.
o Provide a DTD as a separate file and reference its name in the XML document.
XML data binding maps XML data to objects in your programming language. It shortens
development effort, simplifies maintenance and increases reliability, which saves
development time and money and makes working with XML data more intuitive.
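As a rough sketch of the idea, the hand-written binding below maps an employee document to a Python object and back. The Employee class and its from_xml and to_xml methods are hypothetical names chosen for this example; real data binding tools usually generate such classes from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Employee:
    """Hypothetical class bound to the <employee> element."""
    firstname: str
    lastname: str
    email: str

    @classmethod
    def from_xml(cls, text):
        # Read each child element's text into the matching field.
        root = ET.fromstring(text)
        return cls(
            firstname=root.findtext("firstname"),
            lastname=root.findtext("lastname"),
            email=root.findtext("email"),
        )

    def to_xml(self):
        # Write each field back out as a child element.
        root = ET.Element("employee")
        for field in ("firstname", "lastname", "email"):
            ET.SubElement(root, field).text = getattr(self, field)
        return ET.tostring(root, encoding="unicode")

doc = (
    "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname>"
    "<email>vimal@javatpoint.com</email></employee>"
)
employee = Employee.from_xml(doc)
print(employee.email)
print(employee.to_xml())
```

The application code works with employee.email rather than with parser calls, which is the convenience data binding is meant to provide.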
Encoding errors occur because an XML document can contain non-ASCII characters, such as
those used in Norwegian and French. They can be avoided by declaring a Unicode encoding (for
example UTF-8) in the XML declaration and saving the file in that encoding.
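The effect of the encoding declaration can be sketched in Python. The sample text is made up for the example; the point is that the bytes on disk must match the encoding the document declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with Norwegian and French characters.
text = '<?xml version="1.0" encoding="UTF-8"?><note>Blåbær og crème brûlée</note>'

# Bytes really are UTF-8, matching the declaration: parsing succeeds.
note = ET.fromstring(text.encode("utf-8"))
print(note.text)

# Bytes are Latin-1 but the declaration still says UTF-8: the parser rejects them.
try:
    ET.fromstring(text.encode("latin-1"))
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True
print(mismatch_detected)
```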
Event-based API: An event-based API reports parsing events, such as the start and end of
an element, to the application through a set of callback functions. SAX is an example of an
event-based API.
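A minimal sketch of the callback style, using Python's xml.sax module. The EventLogger class name and the sample document are assumptions made for the illustration.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each parsing event it is told about."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(b"<employee><firstname>vimal</firstname></employee>", handler)
for event in handler.events:
    print(event)
```

The parser never builds a tree here. The application only sees the stream of events, which is why event-based parsing suits large documents.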
XML Schema
What is XML schema
An XML schema language is used to express constraints on XML documents. Several schema
languages are in use today, for example RELAX NG and XSD (XML Schema Definition).
An XML schema defines the structure of an XML document. It is like a DTD but gives more
control over the XML structure.
Checking Validation
An XML document is called "well-formed" if its syntax is correct. A document that is
well-formed and has also been validated against a schema is called valid.
employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.javatpoint.com"
    xmlns="http://www.javatpoint.com"
    elementFormDefault="qualified">

  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
Let's see an XML file that uses this XML schema (XSD) file.
employee.xml
<?xml version="1.0"?>
<employee
    xmlns="http://www.javatpoint.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.javatpoint.com employee.xsd">

  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
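Python's standard library has no XSD validator, so the sketch below is only a hand-rolled structural check, not real schema validation. It reads the element names from the xs:sequence in employee.xsd and compares them, in order, with the children of the instance document. The function names expected_children and matches_sequence are made up for this illustration, and the check ignores data types and everything else a real validator would enforce.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.javatpoint.com"
    xmlns="http://www.javatpoint.com"
    elementFormDefault="qualified">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XML = """<?xml version="1.0"?>
<employee xmlns="http://www.javatpoint.com">
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>"""

def expected_children(xsd_text):
    """Return the qualified root tag and the ordered child tags the schema declares."""
    schema = ET.fromstring(xsd_text)
    ns = schema.get("targetNamespace")
    root_decl = schema.find(f"{XS}element")
    sequence = root_decl.find(f"{XS}complexType/{XS}sequence")
    names = [e.get("name") for e in sequence.findall(f"{XS}element")]
    return f"{{{ns}}}{root_decl.get('name')}", [f"{{{ns}}}{n}" for n in names]

def matches_sequence(xml_text, xsd_text):
    """Check root name and child order only (illustrative, not full validation)."""
    root_tag, child_tags = expected_children(xsd_text)
    root = ET.fromstring(xml_text)
    return root.tag == root_tag and [child.tag for child in root] == child_tags

print(matches_sequence(XML, XSD))
```

For production use, a dedicated validating library is the appropriate tool. This sketch only shows how the schema's sequence relates to the instance document.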
An element declared in an XML schema has one of two kinds of type:
1. simpleType
2. complexType
simpleType
A simpleType element holds text only. It cannot have attributes or child elements.
complexType
A complexType element can hold attributes and child elements. It can contain
additional sub-elements and can be left empty.
XML Validation
A well-formed XML document can be validated against a DTD or a schema.
A well-formed XML document is an XML document with correct syntax. It helps to know what
a valid XML document is before looking at XML validation.
XML DTD
A DTD defines the legal elements of an XML document.
In simple words, a DTD defines the document structure with a list of legal
elements and attributes.
Both DTD and XML schema are used to check that a well-formed XML document is also valid.
Errors in XML documents should be avoided because a conforming parser stops processing a
document that is not well-formed.
XML schema
It is itself written in XML.
It supports a large number of built-in data types and the definition of derived data types.
XML DTD
What is DTD
DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and
attributes.
Purpose of DTD
Its main purpose is to define the structure of an XML document. It contains a list of legal
elements and defines the structure with their help.
Checking Validation
Before proceeding with XML DTD, you should understand validation. An XML document is called
"well-formed" if its syntax is correct.
A document that is well-formed and has also been validated against a DTD is called valid.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
In the example above, the DOCTYPE declaration refers to an external DTD file. The
declarations in that file are described below.
employee.dtd
Description of DTD
<!DOCTYPE employee: It declares that the root element of the document is employee.
<!ELEMENT lastname: It declares that the lastname element is of type #PCDATA (parsed
character data).
<!ELEMENT email: It declares that the email element is of type #PCDATA (parsed
character data).
An entity reference is written in three parts:
1. An ampersand (&)
2. An entity name
3. A semicolon (;)
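The three-part syntax can be seen at work in a short Python sketch. The predefined entities amp, lt and gt are part of XML itself; the entity name writer and the sample text are hypothetical, declared here in an internal DTD subset purely for illustration.

```python
import xml.etree.ElementTree as ET

# Predefined entities: &amp; &lt; &gt; are replaced by the parser.
predefined = ET.fromstring("<author>Tom &amp; Jerry &lt;tj@example.com&gt;</author>")
print(predefined.text)

# A custom entity (hypothetical name "writer") declared in an internal DTD subset.
doc = """<!DOCTYPE author [
  <!ENTITY writer "Vimal Jaiswal">
]>
<author>&writer;</author>"""
custom = ET.fromstring(doc)
print(custom.text)
```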
author.xml
CDATA vs PCDATA
CDATA
CDATA (unparsed character data): CDATA contains text that the parser does not parse
further. Tags inside CDATA text are not treated as markup, and entities are not
expanded.
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
]]>
</employee>
In the CDATA example above, a CDATA section is placed directly inside the employee element
to keep the text unparsed, so the value of employee is the literal text:
<firstname>vimal</firstname><lastname>jaiswal</lastname><email>vimal@javatpoint.com</email>
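A small Python check of this behaviour, as a sketch with a shortened, made-up document: inside a CDATA section neither the tags nor the entity reference are interpreted.

```python
import xml.etree.ElementTree as ET

doc = "<employee><![CDATA[<firstname>vimal</firstname> &amp; more]]></employee>"
employee = ET.fromstring(doc)

print(len(employee))   # number of child elements: 0
print(employee.text)   # <firstname>vimal</firstname> &amp; more
```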
PCDATA
PCDATA (parsed character data): PCDATA is text that the XML parser parses. Tags inside
PCDATA are treated as markup, and entities are expanded.
In other words, the XML parser examines the text and, if it contains entity references,
replaces them with their values.
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
In the example above, the employee element contains three more elements, 'firstname',
'lastname' and 'email', so the parser descends into them, and the value of employee is
the combined text of firstname, lastname and email.
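For contrast with CDATA, the same content as parsed character data can be inspected in Python. This is a sketch using a compact copy of the employee document without the DOCTYPE line.

```python
import xml.etree.ElementTree as ET

doc = (
    "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname>"
    "<email>vimal@javatpoint.com</email></employee>"
)
employee = ET.fromstring(doc)

child_tags = [child.tag for child in employee]   # tags were treated as markup
combined_text = "".join(employee.itertext())     # text of all descendants, in order

print(child_tags)      # ['firstname', 'lastname', 'email']
print(combined_text)   # vimaljaiswalvimal@javatpoint.com
```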
XML DOM
As a W3C specification, one important objective for the Document Object Model is to
provide a standard programming interface that can be used in a wide variety of
environments and applications. The Document Object Model can be used with any
programming language.
XML DOM defines a standard way to access and manipulate XML documents.
Through the DOM we can modify or delete the content of elements and also create new
elements. The elements, their text and their attributes are all known as nodes.
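The three operations mentioned above (modify, create, delete) can be sketched with Python's xml.dom.minidom, which follows the W3C DOM interface. The sample note document is made up for this example.

```python
from xml.dom import minidom

dom = minidom.parseString("<note><to>Sania</to><from>Serena</from></note>")
note = dom.documentElement

# Modify: change the text node inside <to>.
note.getElementsByTagName("to")[0].firstChild.nodeValue = "Vimal"

# Create: build a new <body> element with a text node and attach it.
body = dom.createElement("body")
body.appendChild(dom.createTextNode("Hello XML DOM"))
note.appendChild(body)

# Delete: remove the <from> element from its parent.
sender = note.getElementsByTagName("from")[0]
note.removeChild(sender)

result = note.toxml()
print(result)   # <note><to>Vimal</to><body>Hello XML DOM</body></note>
```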
<TABLE>
  <ROWS>
    <TR>
      <TD>A</TD>
      <TD>B</TD>
    </TR>
    <TR>
      <TD>C</TD>
      <TD>D</TD>
    </TR>
  </ROWS>
</TABLE>
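To show how the DOM sees this document as a tree of nodes, the sketch below walks it with Python's xml.dom.minidom. The describe helper is a hypothetical name, and the document is written without whitespace so that no whitespace-only text nodes appear.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<TABLE><ROWS>"
    "<TR><TD>A</TD><TD>B</TD></TR>"
    "<TR><TD>C</TD><TD>D</TD></TR>"
    "</ROWS></TABLE>"
)

def describe(node, depth=0, lines=None):
    """Collect one indented line per element node and per text node."""
    lines = [] if lines is None else lines
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "element " + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + "text " + node.nodeValue)
    for child in node.childNodes:
        describe(child, depth + 1, lines)
    return lines

tree = describe(dom.documentElement)
print("\n".join(tree))

# Direct access: the text of every TD element, in document order.
cells = [td.firstChild.nodeValue for td in dom.getElementsByTagName("TD")]
print(cells)   # ['A', 'B', 'C', 'D']
```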
note.xml
Let's see an HTML file that extracts data from an XML document using the DOM.
xmldom.html
<!DOCTYPE html>
<html>
<body>
<h1>Important Note</h1>
<div>
<b>To:</b> <span id="to"></span><br>
<b>From:</b> <span id="from"></span><br>
<b>Message:</b> <span id="message"></span>
</div>
<script>
if (window.XMLHttpRequest)
{// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
}
else
{// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.open("GET","note.xml",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;
document.getElementById("to").innerHTML=
  xmlDoc.getElementsByTagName("to")[0].childNodes[0].nodeValue;
document.getElementById("from").innerHTML=
  xmlDoc.getElementsByTagName("from")[0].childNodes[0].nodeValue;
document.getElementById("message").innerHTML=
  xmlDoc.getElementsByTagName("body")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Output:
Important Note
To: sonoojaiswal@javatpoint.com
From: vimal@javatpoint.com
Message: Hello XML DOM
Let's see an HTML file that extracts data from an XML string using the DOM.
xmldom.html
<!DOCTYPE html>
<html>
<body>
<h1>Important Note2</h1>
<div>
<b>To:</b> <span id="to"></span><br>
<b>From:</b> <span id="from"></span><br>
<b>Message:</b> <span id="message"></span>
</div>
<script>
txt1="<note>";
txt2="<to>Sania Mirza</to>";
txt3="<from>Serena William</from>";
txt4="<body>Don't forget me this weekend!</body>";
txt5="</note>";
txt=txt1+txt2+txt3+txt4+txt5;

if (window.DOMParser)
{
  parser=new DOMParser();
  xmlDoc=parser.parseFromString(txt,"text/xml");
}
else // Internet Explorer
{
  xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.async=false;
  xmlDoc.loadXML(txt);
}
document.getElementById("to").innerHTML=
  xmlDoc.getElementsByTagName("to")[0].childNodes[0].nodeValue;
document.getElementById("from").innerHTML=
  xmlDoc.getElementsByTagName("from")[0].childNodes[0].nodeValue;
document.getElementById("message").innerHTML=
  xmlDoc.getElementsByTagName("body")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Output:
Important Note2
To: Sania Mirza
From: Serena William
Message: Don't forget me this weekend!
XML CSS
cssemployee.css
employee
{
  background-color: pink;
}
firstname,lastname,email
{
  font-size:25px;
  display:block;
  color: blue;
  margin-left: 50px;
}
employee.dtd
employee.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
  <firstname>vimal</firstname>
  <lastname>jaiswal</lastname>
  <email>vimal@javatpoint.com</email>
</employee>
CSS is not generally used to format XML files. W3C recommends XSLT instead of CSS.
XML Database
An XML database is a data persistence software system used for storing large amounts of
information in XML format. It provides a secure place to store XML documents.
You can query the stored data with XQuery, and export and serialize it into the desired
format. XML databases are usually associated with document-oriented databases.
There are two types of XML database:
1. XML-enabled database
2. Native XML database (NXD)
XML-enabled Database
An XML-enabled database works just like a relational database, with an extension provided
for the conversion of XML documents. In this kind of database, data is stored in tables, in
the form of rows and columns.
A native XML database is preferred over an XML-enabled database because it is highly
capable of storing, maintaining and querying XML documents.
<?xml version="1.0"?>
<contact-info>
  <contact1>
    <name>Vimal Jaiswal</name>
    <company>SSSIT.org</company>
    <phone>(0120) 4256464</phone>
  </contact1>
  <contact2>
    <name>Mahesh Sharma </name>
    <company>SSSIT.org</company>
    <phone>09990449935</phone>
  </contact2>
</contact-info>
In the example above, a root element named contact-info holds the contacts (contact1
and contact2). Each contact contains three elements: name, company and phone.
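A native XML database would normally be queried with XQuery. As a rough stand-in, the sketch below queries the same document in memory with the limited XPath subset of Python's xml.etree.ElementTree. It is not XQuery and not a database, only an illustration of path-based access to the contact records.

```python
import xml.etree.ElementTree as ET

doc = """<contact-info>
  <contact1>
    <name>Vimal Jaiswal</name>
    <company>SSSIT.org</company>
    <phone>(0120) 4256464</phone>
  </contact1>
  <contact2>
    <name>Mahesh Sharma </name>
    <company>SSSIT.org</company>
    <phone>09990449935</phone>
  </contact2>
</contact-info>"""

root = ET.fromstring(doc)

# Turn each contact element into a dict of its three child elements.
contacts = [
    {field.tag: field.text.strip() for field in contact}
    for contact in root
]
names = [contact["name"] for contact in contacts]

# Path expression: every phone element one level below any child of the root.
phones = [phone.text for phone in root.findall("./*/phone")]

print(names)    # ['Vimal Jaiswal', 'Mahesh Sharma']
print(phones)   # ['(0120) 4256464', '09990449935']
```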