The Semantic Web Overview
Luis Sánchez Fernández
Departamento de Ingeniería Telemática, Universidad Carlos III de Madrid
Short history of the Web
1990: Creation of the World Wide Web infrastructure at CERN by Tim Berners-Lee
HTTP, HTML, first Web client, first Web server
1993: Mosaic, first graphical Web client
1994: Netscape Navigator
1996: Commercial use of the WWW becomes widespread
1999: Tim Berners-Lee proposes the Semantic Web
2002: Weblogs and RSS, Web 2.0
6 October 2009: at least 8 billion indexable Web pages
23 September 2010: at least 15 billion indexable Web pages
(according to http://www.worldwidewebsize.com/)
The problem of information overload
The great success of the Web has led to one of its current problems: information overload
Finding and updating relevant information is difficult and time-consuming for people and companies
Example: keeping a state-of-the-art survey up to date
Company employees can spend up to 20% of their working time searching the Web (Outsell Inc., 2002)
Web problems and pitfalls example: search engines
We make queries and get Web pages that are not related to what we wanted
We make queries and do not get Web pages that are related to what we wanted
Another example
Search for images on flickr.com
Java (island/bird/coffee/programming language)
vela (Spanish; in English: candle/sail)
Web problems and pitfalls example: search engines
Some reasons for search engine problems:
polysemy/homonymy
synonymy
multilingualism
In summary: search engines are based not on meanings but on terms (syntactic search)
Semantic Search Engine v1.0
We need a notation to identify meanings
Example: (term, number) tuple
(George Bush, 1): Bush senior
(George Bush, 2): Bush junior
To attach to (parts of) a document the meaning of the terms mentioned in it: semantic annotation
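The notation above can be sketched in Python. This is only an illustration under assumptions: the names `Sense`, `Annotation`, `VOCABULARY` and `annotate` are hypothetical, and the choice of which sense applies is left to whoever writes the annotation.

```python
from typing import NamedTuple

class Sense(NamedTuple):
    """A meaning identified by a (term, number) tuple."""
    term: str
    number: int

# Hypothetical shared vocabulary: sense -> human-readable gloss.
VOCABULARY = {
    Sense("George Bush", 1): "Bush senior",
    Sense("George Bush", 2): "Bush junior",
}

class Annotation(NamedTuple):
    """Attaches a sense to the character span [start, end) of a document."""
    start: int
    end: int
    sense: Sense

def annotate(text: str, term: str, number: int) -> list[Annotation]:
    """Annotate every occurrence of `term` in `text` with the sense (term, number)."""
    sense = Sense(term, number)
    if sense not in VOCABULARY:
        raise KeyError(f"unknown sense: {sense}")
    annotations = []
    pos = text.find(term)
    while pos != -1:
        annotations.append(Annotation(pos, pos + len(term), sense))
        pos = text.find(term, pos + len(term))
    return annotations

doc = "George Bush was elected president in 1988."
annotations = annotate(doc, "George Bush", 1)
for a in annotations:
    print(doc[a.start:a.end], "->", VOCABULARY[a.sense])
```

A search engine that indexes such annotations can match on the sense rather than on the string, so a query for (George Bush, 2) would not return this document.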
Semantic Annotation
How does somebody know the code of a given concept?
By means of vocabularies shared by a community of users
Example: Wikipedia
How does somebody know that the concept represented by some code is referenced in a given Web page?
Semantic annotation
The Web is an open world: need for trust mechanisms
Types of vocabularies
(figure: spectrum of vocabulary types, from least to most formal)
Free text
Controlled vocabulary
Terms/glossary
Thesauri (narrower term relation)
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value restrictions
Disjointness, Inverse, Part-Of ...
General logical constraints
Lassila O, McGuinness D. The Role of Frame-Based Representation on the Semantic Web. Technical Report KSL-01-02, Knowledge Systems Laboratory, Stanford University, 2001.
Controlled Vocabulary
Example: a catalog
http://www.todocoleccion.net/catalogo.cfm
Glossary
http://www.essentialsofmusic.com/glossary/glossary.html
Thesaurus
Example: UNESCO Thesaurus
http://www2.ulcc.ac.uk/unesco/thesaurus.htm
Informal Is-a
Is-a: specifies that a concept is narrower than another
A Professor is-a Human
Luis is-a Professor
Informal Is-a: although the relation usually holds, there is no 100% guarantee
A Mammal is-a Non-Egg-Laying Animal
Counterexample: the platypus (an egg-laying mammal)
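As a rough illustration of why such an is-a relation is only informal, the sketch below stores a toy hierarchy and looks up a property by walking towards broader concepts. The dictionaries, the `lays_eggs` property and the function names are assumptions made for this example, not part of any standard vocabulary.

```python
# Hypothetical toy hierarchy: narrower concept -> broader concept.
IS_A = {
    "Luis": "Professor",
    "Professor": "Human",
    "Platypus": "Mammal",
    "Mammal": "Animal",
}

# Properties stated per concept; a narrower concept may contradict a broader one.
PROPERTIES = {
    "Mammal": {"lays_eggs": False},
    "Platypus": {"lays_eggs": True},  # the exception that makes the is-a informal
}

def broader_concepts(concept: str) -> list[str]:
    """Follow is-a links upwards and return all broader concepts, nearest first."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result

def lookup(concept: str, prop: str):
    """Return the most specific value of `prop`: the concept first, then broader ones."""
    for c in [concept, *broader_concepts(concept)]:
        if prop in PROPERTIES.get(c, {}):
            return PROPERTIES[c][prop]
    return None

print(broader_concepts("Luis"))
print(lookup("Mammal", "lays_eggs"))
print(lookup("Platypus", "lays_eggs"))
```

In a formal is-a hierarchy the statement about mammals would have to hold for every narrower concept, so the platypus entry would be a contradiction rather than an override.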
WordNet
Developed at Princeton University
Contains nouns, verbs, adjectives, adverbs
Organized in synsets (synonym lists + gloss)
Meanings identified by tuples (term, number)
Meanings identified by number
Semantic relationships among synsets
Multilingual version of WordNet: EuroWordNet
Formal Is-a
A hierarchy of concepts
Logical constraints
Additional properties
Axioms that model the relations between concepts and properties in the vocabulary
Ontology definition
Gruber, Borst, 1993:
An ontology is a formal, explicit specification of a shared conceptualization
URIs
Mechanism used to represent namespaces
Used to identify resources
anything you can talk about in a Web document
URI components:
URI scheme: http, ftp, urn, ...
name
Types of URIs: URL, URN
URIref
URI plus (optionally) a fragment identifier
Example
http://www.example.org/index.html#section2
URI: http://www.example.org/index.html
Fragment identifier: section2
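The split of a URIref into URI and fragment identifier can be checked with the Python standard library. This is a small sketch using `urllib.parse`; the variable names are chosen for this example.

```python
from urllib.parse import urldefrag, urlsplit

uriref = "http://www.example.org/index.html#section2"

# urldefrag separates the URI from the optional fragment identifier.
uri, fragment = urldefrag(uriref)

# urlsplit exposes the URI scheme (http, ftp, urn, ...).
scheme = urlsplit(uriref).scheme

print("URI:", uri)
print("Fragment identifier:", fragment)
print("Scheme:", scheme)
```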
Web problems and pitfalls example: search engines
Questions and answers:
What are the names of the Spanish Comunidades Autónomas?
Who is the president of the Spanish political party PSOE?
Semantic Search Engine v2.0
Extension of semantic annotations
Formal representation of statements:
Griñán is PSOE president
isPresident(Griñán, PSOE)
Andalucía is a Spanish Comunidad Autónoma
isCCAA(Andalucía)
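Once statements have a formal representation, questions can be answered by pattern matching instead of text search. The sketch below stores each statement as a tuple and treats `None` as a wildcard; the `STATEMENTS` set and the `query` helper are assumptions made for illustration.

```python
# A statement is a tuple: (predicate, argument, ...).
STATEMENTS = {
    ("isPresident", "Griñán", "PSOE"),
    ("isCCAA", "Andalucía"),
}

def query(predicate, *pattern):
    """Return the argument tuples of matching statements; None in the pattern is a wildcard."""
    results = []
    for stmt in STATEMENTS:
        name, args = stmt[0], stmt[1:]
        if name != predicate or len(args) != len(pattern):
            continue
        if all(p is None or p == a for p, a in zip(pattern, args)):
            results.append(args)
    return sorted(results)

# "Who is the president of PSOE?"
print(query("isPresident", None, "PSOE"))
# "What are the names of the Comunidades Autónomas?"
print(query("isCCAA", None))
```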
Semantic Search Engine v3.0
Imagine the following query:
We want to find PSOE members who hold positions in some Comunidad Autónoma government
We search the Web (1.0) and find this information:
NOTE: real example as of 2005
Semantic Search Engine v3.0
Manuel Chaves is PSOE president
PSOE is a political party
Manuel Chaves is Andalucía government president
Manuel Chaves is Juan Carlos Rodríguez Ibarra's party comrade
Andalucía is a Comunidad Autónoma
Juan Carlos Rodríguez Ibarra is Extremadura government president
Extremadura is a Comunidad Autónoma
Semantic Search Engine v3.0
Domain knowledge:
∀X,Y,Z: isPartyMember(X,Z) ← isPartyMember(Y,Z) ∧ isPartyComrade(X,Y)
∀X,Y: isPartyComrade(Y,X) ← isPartyComrade(X,Y)
∀X,Z: isPartyMember(X,Z) ← isPartyPresident(X,Z)
∀X,Y: isGovernmentMemberCCAA(X,Y) ← isConsejeroCCAA(X,Y)
∀X,Y: isGovernmentMemberCCAA(X,Y) ← isPresidentCCAA(X,Y)
∀X,Y: isPresidentCCAA(X,Y) ← isPresident(X,Y) ∧ isCCAA(Y)
∀X,Y: isPartyPresident(X,Y) ← isPresident(X,Y) ∧ isParty(Y)
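A minimal forward-chaining sketch of this domain knowledge follows. It assumes each rule is read with the conclusion first and the conditions after it, which is the reading consistent with the derivations in the worked steps. The tuple encoding and the names `FOUND`, `apply_rules` and `forward_chain` are hypothetical, and the final query is simplified to "PSOE member who is a member of some government".

```python
# Statements found on the Web, written as (predicate, arg1[, arg2]) tuples.
FOUND = {
    ("isPresident", "Chaves", "PSOE"),
    ("isParty", "PSOE"),
    ("isPresident", "Chaves", "Andalucía"),
    ("isPartyComrade", "Chaves", "Ibarra"),
    ("isCCAA", "Andalucía"),
    ("isPresident", "Ibarra", "Extremadura"),
    ("isCCAA", "Extremadura"),
}

def apply_rules(facts):
    """One pass over the domain rules; returns the statements they derive."""
    new = set()
    for f in facts:
        if f[0] == "isPresident":
            _, x, y = f
            if ("isParty", y) in facts:
                new.add(("isPartyPresident", x, y))
            if ("isCCAA", y) in facts:
                new.add(("isPresidentCCAA", x, y))
        elif f[0] == "isPartyPresident":
            new.add(("isPartyMember", f[1], f[2]))
        elif f[0] in ("isPresidentCCAA", "isConsejeroCCAA"):
            new.add(("isGovernmentMemberCCAA", f[1], f[2]))
        elif f[0] == "isPartyComrade":
            _, x, y = f
            new.add(("isPartyComrade", y, x))  # comradeship is symmetric
            # isPartyMember(X,Z) <- isPartyMember(Y,Z) and isPartyComrade(X,Y)
            for g in facts:
                if g[0] == "isPartyMember" and g[1] == y:
                    new.add(("isPartyMember", x, g[2]))
    return new

def forward_chain(facts):
    """Apply the rules repeatedly until no new statement appears (fixpoint)."""
    facts = set(facts)
    while True:
        derived = apply_rules(facts) - facts
        if not derived:
            return facts
        facts |= derived

facts = forward_chain(FOUND)
members = {f[1] for f in facts if f[0] == "isPartyMember" and f[2] == "PSOE"}
in_government = {f[1] for f in facts if f[0] == "isGovernmentMemberCCAA"}
print(sorted(members & in_government))
```

A production reasoner would use a general rule engine with unification instead of hand-coded branches; the hard-coded version is kept here only because it is short enough to read in one go.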
Algorithm
Example: PSOE
Note: this is just a sample algorithm
Search the (Semantic) Web for statements related to entities mentioned in the query
Apply logical reasoning to derive new statements
Repeat the Web search over the entities (subject, object) present in the statements found
Stop after a number of cycles
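The search loop of this sample algorithm could look roughly like the sketch below. The `WEB` list is a hypothetical stand-in for statements published on the Semantic Web, and the reasoner is passed in as a function; the default one derives nothing, so the example only shows how the set of entities to search grows from cycle to cycle.

```python
# Hypothetical stand-in for the Semantic Web: a fixed list of published statements.
WEB = [
    ("isPresident", "Chaves", "PSOE"),
    ("isParty", "PSOE"),
    ("isPresident", "Chaves", "Andalucía"),
    ("isPartyComrade", "Chaves", "Ibarra"),
    ("isCCAA", "Andalucía"),
    ("isPresident", "Ibarra", "Extremadura"),
    ("isCCAA", "Extremadura"),
]

def search(entity):
    """Return all published statements that mention `entity` as an argument."""
    return {s for s in WEB if entity in s[1:]}

def no_reasoning(facts):
    """Placeholder reasoner: derives no new statements."""
    return set()

def semantic_search(query_entities, reason=no_reasoning, max_cycles=4):
    """Alternate Web search and reasoning for at most `max_cycles` cycles."""
    known = set()
    visited = set()
    frontier = set(query_entities)
    for cycle in range(1, max_cycles + 1):
        if not frontier:
            break
        found = set()
        for entity in frontier:
            found |= search(entity)
        visited |= frontier
        known |= found
        known |= reason(known)
        # Next cycle: entities in the statements just found that were not searched yet.
        frontier = {arg for s in found for arg in s[1:]} - visited
        print(f"cycle {cycle}: next entities to search: {sorted(frontier)}")
    return known

statements = semantic_search({"PSOE"})
```

Starting from PSOE, the four cycles visit Chaves, then Andalucía and Ibarra, then Extremadura. Bounding the number of cycles keeps the search from wandering across the whole Web.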
Step 1
Search for statements where PSOE appears
You get:
isPresident(Chaves,PSOE)
isParty(PSOE)
Applying rules
∀X,Y: isPartyPresident(X,Y) ← isPresident(X,Y) ∧ isParty(Y)
∀X,Z: isPartyMember(X,Z) ← isPartyPresident(X,Z)
You get:
isPartyPresident(Chaves,PSOE)
isPartyMember(Chaves,PSOE)
Step 2
Search for statements where Chaves appears
You get:
isPresident(Chaves,Andalucía)
isPartyComrade(Chaves,Ibarra)
Applying rules
∀X,Y,Z: isPartyMember(X,Z) ← isPartyMember(Y,Z) ∧ isPartyComrade(X,Y)
∀X,Y: isPartyComrade(Y,X) ← isPartyComrade(X,Y)
You get:
isPartyMember(Ibarra,PSOE)
Step 3
Search for statements where Andalucía or Ibarra appear
You get:
isCCAA(Andalucía)
isPresident(Ibarra,Extremadura)
Applying rules
∀X,Y: isPresidentCCAA(X,Y) ← isPresident(X,Y) ∧ isCCAA(Y)
You get:
isPresidentCCAA(Chaves,Andalucía)
Step 4
Search for statements where Extremadura appears
You get:
isCCAA(Extremadura)
Applying rules
∀X,Y: isPresidentCCAA(X,Y) ← isPresident(X,Y) ∧ isCCAA(Y)
You get:
isPresidentCCAA(Ibarra,Extremadura)
Query result
Chaves
Ibarra
Semantic Search Engine v3.0: Components
Knowledge base
Reasoner
These are the well-known basic components of a knowledge-based system
Knowledge Base
Domain knowledge model
Information retrieved from the Web
The domain knowledge model can be built by reusing available components
Example: a model for locations (concepts like city, country, etc.) can be used in different applications (booking a trip, a semantic search application for news items, etc.)
Reusable knowledge components => ontologies
Ontology components
Classes (e.g. Party, Comunidad Autónoma)
Instances (e.g. PSOE, Andalucía)
Properties (e.g. isPartyMember)
Rules
knowledge creation
restrictions
reactive rules
Lightweight vs. heavyweight ontologies
What is the Semantic Web
Formal (machine-readable) description of the content published on the Web
Technologies
Semantic annotation
Ontology engineering
Reasoners
Proposal (T. Berners-Lee)
Applications
Semantic search
Show the user only relevant results
Question/answer functionality
Multimedia content search
Information integration
Several distributed sources (e.g. databases) integrated through a common domain ontology plus mappings
Semantic description of services will enable their discovery
Web task automation
Example: comparing prices across several commercial sites
Filling in forms based on semantics instead of syntax
Material for next session
Protégé ontology editor
http://protege.stanford.edu/
Pellet DL Reasoner
http://clarkparsia.com/pellet
OWL tutorial
http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf