Copyright © 2004-2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is work in progress towards a revision of the RDF Concepts and Abstract Syntax Recommendation, and is intended to eventually replace that document. It is part of a larger effort to revise the RDF specifications as published in 2004. The most significant changes from the 2004 edition are: modified string literals, a section on skolemization of blank nodes, and many updated references to other specifications (including a change in terminology from “URI references” to “IRIs”). A fuller list of changes that have been made to date is provided in Appendix A. Various areas of work to be tackled in upcoming working drafts are highlighted throughout the document, but should not yet be understood as an exhaustive list.
This document was published by the RDF Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-rdf-comments@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document reflects current progress of the RDF Working Group towards updating the 2004 version of RDF Concepts and Abstract Syntax. The editors expect to work on a number of issues, some of which are listed in boxes like this throughout the document.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax (a data model) on which RDF is based, and which serves to link concrete syntaxes to its formal semantics. It also includes discussion of key concepts, datatyping, character normalization and handling of IRIs.
Normative documentation of RDF falls into the following areas:
The framework is designed so that vocabularies can be layered. The terms defined in [RDF-SCHEMA] are the first such vocabulary. Several other vocabularies for RDF are mentioned in the Primer [RDF-PRIMER].
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].
This section is non-normative.
This section is quite redundant with later normative sections and the RDF Primer. Its removal has been proposed. This is ISSUE-68.
RDF uses the following key concepts:
The underlying structure of any expression in RDF is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph (defined more formally in section 6). This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term “graph”).
Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts: a subject, a predicate and an object.
The direction of the arc is significant: it always points toward the object.
The nodes of an RDF graph are its subjects and objects.
The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by subject and object of the triple. The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains. A formal account of the meaning of RDF graphs is given in [RDF-MT].
A node may be an IRI, a literal, or blank (having no separate form of identification). Properties are IRIs.
An IRI or literal used as a node identifies what that node represents. An IRI used as a predicate identifies a relationship between the things represented by the nodes it connects. A predicate IRI may also be a node in the graph.
A blank node is a node that is not an IRI or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements.
A convention used by some linear representations of an RDF graph to allow several statements to use the same blank node is to use a blank node identifier, which is a local identifier that can be distinguished from all IRIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers. Note that such blank node identifiers are not part of the RDF abstract syntax, and the representation of triples containing blank nodes is entirely dependent on the particular concrete syntax used.
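The re-allocation of blank node identifiers during a merge can be sketched as follows. This is only an illustration under stated assumptions: triples are modelled as Python tuples, blank node identifiers are strings with a `_:` prefix (a convention borrowed from some concrete syntaxes, not part of the abstract syntax), and the names `is_blank` and `merge` are hypothetical.

```python
from itertools import count

def is_blank(term):
    # Assumption: blank node identifiers are strings with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")

def merge(*graphs):
    """Union several graphs while keeping their blank nodes distinct."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # per-graph mapping: local identifier -> fresh identifier

        def rename(term):
            if not is_blank(term):
                return term
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged

# Both graphs happen to use the identifier "_:x", but for different nodes.
g1 = {("_:x", "http://example.org/name", '"Alice"')}
g2 = {("_:x", "http://example.org/name", '"Bob"')}
m = merge(g1, g2)
```

Without the renaming step, the two unrelated `_:x` nodes would collapse into one node that has both names, which changes the meaning of the merged graph.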
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
A datatype consists of a lexical space, a value space and a lexical-to-value mapping (see section 5).
For example, the lexical-to-value mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Lexical-to-Value Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
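As a rough illustration, the table above can be written down as a Python dictionary, from which the lexical space and value space fall out as its keys and values. The variable names are illustrative, and Python's `True` and `False` stand in for 'T' and 'F'.

```python
# Lexical-to-value mapping of xsd:boolean, written as a dict:
# each key is a lexical form, each value is the member of the value space it maps to.
XSD_BOOLEAN_L2V = {"true": True, "1": True, "false": False, "0": False}

lexical_space = set(XSD_BOOLEAN_L2V)         # the four permitted strings
value_space = set(XSD_BOOLEAN_L2V.values())  # the two boolean values
```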
RDF predefines just one datatype, rdf:XMLLiteral, used for embedding XML in RDF (see section 5.1).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with IRIs. The predefined XML Schema datatypes [XMLSCHEMA-2] are expected to be widely used for this purpose.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XMLSCHEMA-2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by an IRI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the predicate.
Literals may be typed or language-tagged:
Continuing the example from section 3.3, the typed literals that can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Lexical-to-Value Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
For text that may contain markup, use typed literals with type rdf:XMLLiteral. If language annotation is required, it must be explicitly included as markup, usually by means of an xml:lang attribute. XHTML [XHTML10] may be included within RDF in this way. Sometimes, in this latter case, an additional span or div element is needed to carry an xml:lang or lang attribute.
Update the XHTML 1.0 reference to something more recent?
The string in both plain and typed literals is recommended to be in Unicode Normal Form C [NFC]. This is motivated by [CHARMOD] particularly section 4 Early Uniform Normalization.
The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [RDF-MT]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred.
RDF uses IRIs to identify resources and properties. Certain IRIs with the following leading substring are defined by the RDF specifications to denote specific concepts:
http://www.w3.org/1999/02/22-rdf-syntax-ns# (conventionally associated with namespace prefix rdf:)
Vocabulary terms in the rdf: namespace are listed and described in detail in the RDF Schema specification [RDF-SCHEMA].
The RDF namespace is also used as an XML namespace [XML-NAMES] to define a number of additional element and attribute names for purely syntactic purposes within the RDF/XML syntax ([RDF-SYNTAX-GRAMMAR], section 5.1). These terms (e.g., rdf:about and rdf:ID) do not denote concepts.
This section perhaps should discuss the XSD datatype map and rdf:PlainLiteral. This is ISSUE-70.
The datatype abstraction used in RDF is compatible with the abstraction used in XML Schema Part 2: Datatypes [XMLSCHEMA-2].
A datatype consists of a lexical space, a value space and a lexical-to-value mapping.
The lexical space of a datatype is a set of Unicode [UNICODE] strings.
The lexical-to-value mapping of a datatype is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
A datatype is identified by one or more IRIs.
RDF may be used with any datatype definition that conforms to this abstraction, even if not defined in terms of XML Schema.
Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF. [RDF-MT] contains a more detailed discussion of specific XML Schema built-in datatypes.
When the datatype is defined using XML Schema:
The canonicalization rules required for XML literals are quite complicated. Increasingly, RDF is produced and consumed in environments where no XML parser and canonicalization engine is available. A possible change to relax the requirements for the lexical space, while retaining the value space, is under discussion. This is ISSUE-13.
RDF provides for XML content as a possible literal value. Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype rdf:XMLLiteral, defined as follows.
The IRI identifying this datatype is http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral.
Not all values of this datatype are compliant with XML 1.1 [XML11]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
XML values can be thought of as the [XML-INFOSET] or the [XPATH] nodeset corresponding to the lexical form, with an appropriate equality function.
RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:XMLLiteral corresponding to a single text node of the same string.
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equivalence between RDF graphs. A definition of equivalence is needed to support the RDF Test Cases [RDF-TESTCASES] specification.
This abstract syntax is the syntax over which the formal semantics are defined. Implementations are free to represent RDF graphs in any other equivalent form. As an example: in an RDF graph, literals with datatype rdf:XMLLiteral can be represented in a non-canonical format, and canonicalization performed during the comparison between two such literals. In this example the comparisons may be performed either between syntactic structures or between their denotations in the domain of discourse. Implementations that do not require any such comparisons can hence be optimized.
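An RDF triple contains three components: the subject, which is an IRI or a blank node; the predicate, which is an IRI; and the object, which is an IRI, a literal or a blank node.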
An RDF triple is conventionally written in the order subject, predicate, object.
The predicate is also known as the property of the triple.
IRIs, blank nodes and literals are collectively known as RDF terms.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
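A minimal sketch of these two definitions, assuming a hypothetical encoding in which each triple is a Python tuple of strings and an RDF graph is a Python `set` of such tuples. The example IRIs and the helper name `nodes` are illustrative only.

```python
KNOWS = "http://example.org/knows"  # hypothetical predicate IRI

graph = {
    ("http://example.org/alice", KNOWS, "_:b0"),
    ("_:b0", KNOWS, "http://example.org/bob"),
    ("_:b0", KNOWS, "http://example.org/bob"),  # repeated triple: a set keeps one copy
}

def nodes(graph):
    """Return the set of subjects and objects of the triples in the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}
```

Because a graph is a set, repeating a triple adds nothing, and a predicate counts as a node only if it also occurs as a subject or object somewhere in the graph.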
Two RDF graphs G and G' are equivalent if there is a bijection M between the sets of nodes of the two graphs, such that:
With this definition, M shows how each blank node in G can be replaced with a new blank node to give G'.
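Graph equivalence can be checked by brute force on small graphs, as sketched below. The conditions placed on M here are an assumption taken from the RDF 2004 definition: M maps blank nodes to blank nodes, leaves IRIs and literals unchanged, and a triple (s, p, o) is in G exactly when (M(s), p, M(o)) is in G'. Blank nodes are again modelled as strings with a `_:` prefix, and all function names are hypothetical.

```python
from itertools import permutations

def is_blank(term):
    # Assumption: blank nodes are strings with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")

def blank_nodes(graph):
    return sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})

def equivalent(g1, g2):
    """Try every bijection between the blank nodes of g1 and g2."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))  # candidate M, identity on IRIs and literals
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False

P = "http://example.org/p"
X = "http://example.org/x"
g = {("_:a", P, "_:b"), ("_:b", P, X)}
g_renamed = {("_:y", P, "_:z"), ("_:z", P, X)}
g_other = {("_:y", P, "_:z"), ("_:y", P, X)}
```

The search is factorial in the number of blank nodes, so this only demonstrates the definition and is not a practical algorithm for large graphs.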
An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [UNICODE] that conforms to the syntax defined in RFC 3987 [IRI]. IRIs are a generalization of URIs [URI]. Every absolute URI and URL is an IRI.
IRIs in the RDF abstract syntax must be absolute, and may contain a fragment identifier.
Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [IRI]. Further normalization must not be performed when comparing IRIs for equality.
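To illustrate, a sketch of this comparison in Python is just string equality. The function name `iri_equal` is hypothetical; the point is that no case folding, percent-decoding or path normalization takes place.

```python
def iri_equal(a: str, b: str) -> bool:
    # Simple String Comparison: character by character, with no normalization.
    return a == b

same = iri_equal("http://example.com/", "http://example.com/")
differs_by_case = iri_equal("http://example.com/%3F", "http://example.com/%3f")
differs_by_path = iri_equal("http://example.com", "http://example.com/")
```

The last two pairs would often be treated as the same address by Web software, but as IRIs in an RDF graph they are different.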
When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [IRI]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
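The three steps named above can be approximated with Python's standard library, as in the following sketch. It is a simplification and not a complete implementation of the mapping in [IRI]: user information in the authority is ignored, Python's built-in `idna` codec (which implements IDNA 2003) is assumed to be acceptable for the domain name, and the function name `iri_to_uri` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched: URI delimiters plus "%" so existing escapes survive.
SAFE = "/:@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    parts = urlsplit(iri)
    # Punycode-encode the domain name (user information is ignored in this sketch).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # quote() UTF-8 encodes non-ASCII characters and %-encodes the resulting octets.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE + "?"),
        quote(parts.fragment, safe=SAFE + "?"),
    ))

uri = iri_to_uri("http://bücher.example/rés")
```

An IRI that is already a URI passes through this function unchanged.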
Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.
Previous versions of RDF used the term “RDF URI Reference” instead of “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters must be percent-encoded as described in section 2.1 of [URI].
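As an illustration, the following sketch percent-encodes only the ten characters listed above and leaves everything else in the string alone. The constant and function names are hypothetical, and the sketch assumes the input is otherwise a valid IRI.

```python
# The characters permitted in RDF URI References but not in IRIs.
LEGACY_CHARS = '<>{}|\\^`" '

def encode_legacy_chars(reference: str) -> str:
    """Percent-encode each legacy character as "%" plus two uppercase hex digits."""
    return "".join(
        f"%{ord(ch):02X}" if ch in LEGACY_CHARS else ch
        for ch in reference
    )

encoded = encode_legacy_chars("http://example.org/a b|c")
```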
Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [IRI]. Non-normalized forms that should be avoided include:
- an explicitly stated HTTP default port (http://example.com:80/); http://example.com/ is preferable
- a completely empty path in HTTP IRIs (http://example.com); http://example.com/ is preferable
- “/./” or “/../” in the path component of an IRI
- lowercase hexadecimal letters within percent-encoding triplets (“%3F” is preferable over “%3f”)
This section is a major departure from RDF 2004 as simple literals are now treated as syntactic sugar for xsd:string typed literals. Further changes to RDF's literal design are under consideration: Language-tagged literals may receive a datatype, and rdf:PlainLiterals [RDF-PLAINLITERAL] may be folded into the design somehow. This is ISSUE-71.
A literal in an RDF graph is either a typed literal or a language-tagged literal.
All literals have a lexical form being a Unicode [UNICODE] string, which should be in Normal Form C [NFC].
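For illustration, Normal Form C can be produced with the `unicodedata` module of Python's standard library. The helper name `to_nfc` is hypothetical; the example string spells “café” with a combining acute accent.

```python
import unicodedata

def to_nfc(lexical_form: str) -> str:
    """Return the lexical form in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", lexical_form)

decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = to_nfc(decomposed)
```

The two strings render identically but differ character by character, which is why producing lexical forms in NFC helps interoperability.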
Language-tagged literals have a lexical form and a non-empty language tag as defined by [BCP47]. The language tag must be well-formed according to section 2.2.9 of [BCP47], and must be normalized to lowercase.
Typed literals have a lexical form and a datatype IRI being an IRI.
Concrete syntaxes may support simple literals, consisting of only a lexical form without any language tag or datatype IRI. Simple literals only exist in concrete syntaxes, and are treated as syntactic sugar for abstract syntax typed literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.
Simple literals and language-tagged literals are collectively known as plain literals.
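A parser for a concrete syntax might apply this rule roughly as follows. This is a sketch under assumptions: the function name `abstract_literal` and the tuple layout it returns are hypothetical and not prescribed by this specification.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def abstract_literal(lexical_form, language=None, datatype=None):
    """Map a literal from a concrete syntax to an abstract syntax literal."""
    if language is not None:
        # Language-tagged literal; the tag is normalized to lowercase.
        return ("language-tagged", lexical_form, language.lower())
    # A simple literal (no language tag, no datatype IRI) becomes xsd:string.
    return ("typed", lexical_form, datatype if datatype is not None else XSD_STRING)

simple = abstract_literal("hello")
tagged = abstract_literal("bonjour", language="FR")
```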
Earlier versions of RDF allowed simple literals in the abstract syntax.
Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML11].
Earlier versions of RDF permitted tags that adhered to the generic tag/subtag syntax of language tags, but were not well-formed according to [BCP47]. Such language tags do not conform to RDF 1.1.
When using the language tag, care must be taken not to confuse language with locale. The language tag relates only to human language text. Presentational issues should be addressed in end-user applications.
The case normalization of language tags is part of the description of the abstract syntax, and consequently the abstract behaviour of RDF applications. It does not constrain an RDF implementation to actually normalize the case. Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.
Two literals are equal if and only if all of the following hold:
RDF Literals are distinct and distinguishable from IRIs; e.g. http://example.org/ as an RDF Literal (untyped, without a language tag) is not equal to http://example.org/ as an IRI.
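Literal equality can be sketched with two small classes. The conditions used are an assumption based on the RDF 2004 definition: the lexical forms compare equal character by character, either both or neither literal has a language tag and the tags compare equal regardless of case, and either both or neither has a datatype IRI and the IRIs compare equal character by character. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: Optional[str] = None
    language: Optional[str] = None

def _norm(tag):
    # Comparison must not be sensitive to the case of the language tag.
    return tag.lower() if tag is not None else None

def literal_equal(a: Literal, b: Literal) -> bool:
    return (
        a.lexical_form == b.lexical_form
        and a.datatype == b.datatype
        and _norm(a.language) == _norm(b.language)
    )

same_tag = literal_equal(Literal("chat", language="FR"), Literal("chat", language="fr"))
same_value_only = literal_equal(Literal("1", XSD + "int"), Literal("01", XSD + "int"))
iri_vs_literal = IRI("http://example.org/") == Literal("http://example.org/", XSD + "string")
```

Note that "1" and "01" denote the same integer value but are different literals, because this equality is defined on the syntactic form.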
The datatype IRI refers to a datatype. For XML Schema built-in datatypes, IRIs such as http://www.w3.org/2001/XMLSchema#int are used. The IRI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which IRIs refer to datatypes.
The value associated with a typed literal is found by applying the lexical-to-value mapping associated with the datatype IRI to the lexical form.
If the lexical form is not in the lexical space of the datatype associated with the datatype IRI, then no literal value can be associated with the typed literal. Such a case, while in error, is not syntactically ill-formed.
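The lookup described above can be sketched as follows, using xsd:boolean as the only known datatype. The registry `L2V`, the sentinel `ILL_TYPED` and the function `literal_value` are hypothetical names; a real system would support many more datatypes and would need a policy for unrecognized datatype IRIs, which this sketch does not address.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Registry: datatype IRI -> lexical-to-value mapping (only xsd:boolean here).
L2V = {
    XSD + "boolean": {"true": True, "1": True, "false": False, "0": False},
}

ILL_TYPED = object()  # marker: no value can be associated with the literal

def literal_value(lexical_form, datatype_iri):
    """Apply the datatype's lexical-to-value mapping to the lexical form."""
    mapping = L2V[datatype_iri]
    return mapping.get(lexical_form, ILL_TYPED)

ok = literal_value("1", XSD + "boolean")
bad = literal_value("yes", XSD + "boolean")
```

The literal with lexical form "yes" is still a syntactically well-formed typed literal; it simply has no value.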
In application contexts, comparing the values of typed literals (see section 6.5.2) is usually more helpful than comparing their syntactic forms (see section 6.5.1). Similarly, for comparing RDF Graphs, semantic notions of entailment (see [RDF-MT]) are usually more helpful than syntactic equality (see section 6.3).
The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all IRIs and the set of all literals are pairwise disjoint.
Otherwise, this set of blank nodes is arbitrary.
RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.
Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
In situations where stronger identification is needed, systems may systematically transform some or all of the blank nodes in an RDF graph into IRIs [IRI]. Systems wishing to do this should mint a new, globally unique IRI (a Skolem IRI) for each blank node so transformed.
This transformation does not change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else.
Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace a blank node, and map back to the source blank node where possible.
Systems that want Skolem IRIs to be recognizable outside of the system boundaries should use a well-known IRI [WELL-KNOWN] with the registered name genid. This is an IRI that uses the HTTP or HTTPS scheme, or another scheme that has been specified to use well-known IRIs; and whose path component starts with /.well-known/genid/.
For example, the authority responsible for the domain example.com could mint the following recognizable Skolem IRI:
http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
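Minting such IRIs for every blank node in a graph might look like the sketch below. It assumes blank nodes are modelled as strings with a `_:` prefix and uses random UUIDs as the unique part; the function names, the default authority and the choice of UUIDs are illustrative, not requirements.

```python
import uuid

def is_blank(term):
    # Assumption: blank nodes are strings with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")

def skolemize(graph, authority="http://example.com"):
    """Replace each blank node with a fresh Skolem IRI under /.well-known/genid/."""
    prefix = authority + "/.well-known/genid/"
    skolem = {}  # blank node -> Skolem IRI, kept so the mapping can be reversed

    def replace(term):
        if not is_blank(term):
            return term
        if term not in skolem:
            skolem[term] = prefix + uuid.uuid4().hex
        return skolem[term]

    return {(replace(s), p, replace(o)) for s, p, o in graph}, skolem

P = "http://example.org/p"
g = {("_:a", P, "_:b"), ("_:a", P, '"x"')}
skolemized, mapping = skolemize(g)
```

Keeping the returned mapping lets a system map each Skolem IRI back to its source blank node, as suggested above.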
IETF registration of the genid name is currently in progress.
RFC 5785 [WELL-KNOWN] only specifies well-known URIs, not IRIs. For the purpose of this document, a well-known IRI is any IRI that results in a well-known URI after IRI-to-URI mapping [IRI].
The Working Group will standardize a model and semantics for multiple graphs and graph stores. The charter notes:
The RDF Community has used the term “named graphs” for a number of years in various settings, but this term is ambiguous, and often refers to what could rather be referred as quoted graphs, graph literals, IRIs for graphs, knowledge bases, graph stores, etc. The term “Support for Multiple Graphs and Graph Stores” is used as a neutral term in this charter; this term is not and should not be considered as definitive. The Working Group will have to define the right term(s).
Progress on the design for this feature is tracked under multiple issues:
This section is non-normative.
This section does not address the case where RDF is embedded in other document formats, such as in RDFa or when an RDF/XML fragment is embedded in SVG. It has been suggested that this may be a general issue for the TAG about the treatment of fragment identifiers when one language is embedded in another. This is ISSUE-37.
This section treats the RDF/XML media type as canonical for establishing the referent of IRIs that include a fragment identifier. Today we have many different media types that can carry RDF graphs, and HTTP content negotiation is more common. Also, the problem addressed in the section (context-dependence of fragment identifiers) went away to some extent when RFC 2396 was replaced by RFC 3986. The latter states that the same fragment should be used for the same thing in resources that have multiple representations (Section 3.5 [URI]). This is ISSUE-69.
RDF uses IRIs, which may include fragment identifiers, as context free identifiers for resources. RFC 2396 states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that an IRI in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an IRI that includes a fragment identifier, the fragment identifier identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the IRI excluding the fragment identifier. Thus:
- When eg:someurl#frag is used in an RDF document, eg:someurl is taken to designate some RDF document (even when no such document can be retrieved).
- eg:someurl#frag means the thing that is indicated, according to the rules of the application/rdf+xml MIME content-type, as a “fragment” or “view” of the RDF document at eg:someurl. If the document does not exist, or cannot be retrieved, or is available only in formats other than application/rdf+xml, then exactly what that view may be is somewhat undetermined, but that does not prevent use of RDF to say things about it.
- An application/rdf+xml document acts as an intermediary between some Web retrievable documents (itself, at least, also any other Web retrievable IRIs that it may use, possibly including schema IRIs and references to other RDF documents), and some set of possibly abstract or non-Web entities that the RDF may describe.
This provides a handling of IRIs and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior. Note that nothing here requires that an RDF application be able to retrieve any representation of resources identified by the IRIs in an RDF graph.
This section is non-normative.
This section does not yet list those who made contributions to the RDF 1.1 version, nor does it list the current RDF WG members.
The RDF 2004 editors acknowledge valuable contributions from Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.
This specification contains a significant contribution from the designers of the RDF typed literal mechanism, Pat Hayes, Sergey Melnik and Patrick Stickler. The document draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha.
This specification is a product of extended deliberations by the members of the RDFcore Working Group and the RDF and RDF Schema Working Group.
This section is non-normative.
The media type registration for application/rdf+xml is archived at http://www.w3.org/2001/sw/RDFCore/mediatype-registration.