I am looking for people to spend 15 minutes on a study into the difficulties experienced in understanding and reasoning with DLs. See http://people.kmi.open.ac.uk/warren/
All responses will be anonymised and only aggregate results will be published.
What an MCQ is and its parts (stem, key, distractors).
Examples of good and bad distractors.
What makes a good one, per item response theory: tuned difficulty, low guessability, and the right discrimination.
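(Not from the talk, but for reference: these three properties correspond to the parameters of the classic three-parameter logistic IRT model,

$$P(\text{correct} \mid \theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$

where $a$ is the discrimination, $b$ the difficulty, $c$ the guessing floor, and $\theta$ the examinee's ability.)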
MCQ generation is difficult and time-consuming, and you need a lot of them (for exam coverage, security, and practice exams).
Automation!
Difficulty prediction: needed to increase validity and exam quality. People are bad at it.
Similarity conjecture: the degree of similarity between the key and the distractors is proportional to difficulty. Existing measures didn’t work, so we developed new ones (see the ISWC poster).
Key questions: Can we control difficulty? Can we generate exams? Is it cost effective?
Experiment description: 1) Build an ontology. 2) Generate questions. 3) Expert review (3 reviewers: two instructors and one domain expert). 4) Test with students.
Two rounds of student tests (in review session and online).
Usefulness rating: high heterogeneity (only 1 question was agreed upon by all reviewers).
Distractor utility: for all 6 questions, 2 out of 3 distractors.
For 5 out of 6 questions, the key was picked more frequently than the distractors.
Discrimination: better in class than online.
Difficulty: 4 out of 6 correctly predicted by the tool.
Check out the tool:
http://edutechdeveloper.com/MCQGen
QA
QUESTION: About reusability: often the labels are general and not specific to a domain, so they may not be suitable. I.e., reusability in general might hurt applicability.
Need to be able to write things in one’s own language (and there are a lot of languages).
OWL has challenges (e.g., Manchester syntax not suitable for Spanish).
We need a lot more linguistic annotation than just alternative terms.
Some approaches separate linguistic and ontological layer. Community Group ontolex-lemon.
Bantu languages: spoken by >200 million people. Being picked up by the likes of Microsoft, Google, and Facebook.
Bantu languages have between 10 and 23 noun classes, and each noun (a class in OWL) belongs to one. Complexity which must be captured!
“Concordial agreement of the verb with the noun class of the noun of the OWL class.”
No Semantic Web stuff for Bantu. Some XML. Some work on multilingual ontologies.
lemon defines the verbalisation of ontological elements in a particular language. lemon advocates GOLD and ISOcat, which don’t do the job for Bantu noun classes. E.g., a noun class is like gender but with semantic significance.
To address this, they developed a Noun Class Ontology (small: 42 classes, 6 object properties, and 130 axioms).
Word variation: lemon generally uses Perl-like regexes. Chichewa has too much complexity for this.
Agglutination is a challenge.
Application: Lexicalise FOAF and GoodRelations in Chichewa.
(foaf:knows requires a big chunk of rules!)
FOAF: 1:1 correspondence for classes; some odd things; object properties are usually verb phrases; data properties are easier. The FOAF lemon covers 90% of FOAF.
GoodRelations: very domain-specific, thus difficult due to the lack of terms in Chichewa. Only 25% of the entities were lexicalised.
QUESTION: What are the actual remaining problems with OWL re: multilingualism? Annotations don’t quite do it, e.g., the name of an object property varying with the type of its object.
QUESTION: Is this a problem with programming languages as well? Yes, people are tackling it.
QUESTION: You mentioned issues with Manchester Syntax; can we localise it by varying things a bit? No, it’s much harder. [Ed: Much, much harder!]
Two tasks:
survey (http://goog.gl/Cjpqtg): 10 examples rated for intuitiveness, conciseness, and understandability
usability: writing a small ontology
Results: LaTeX nearly as good as Manchester on intuitiveness and better than everything else. On concision, LaTeX was way ahead, with functional syntax next (9.7 vs. 5.3).
Re ontology building: difficulty 3.5, syntax easy to remember 3.17; compare to their prior syntax at 3.67 (all on a 5-point scale).
QUESTION: Does it work with latex files? Nope!
QUESTION: What are the characteristics of the participants? See poster!
POINT: The fragments displayed are non-equivalent in many ways.
QUESTION: Do you need declarations in your syntax? Need to check grammar.
Ontology matching problem (same domain, variant modelling; how to align?).
Three matching principles: consistency principle (no new incoherent classes), conservativity principle (no new entailments within a single signature), locality (map neighbourhoods).
Conservativity via the deductive difference: are there different entailments over a given signature?
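(My gloss, using the standard definition: the deductive difference between ontologies $O_1$ and $O_2$ over a signature $\Sigma$ is

$$\mathrm{diff}_\Sigma(O_1, O_2) = \{\, \alpha \mid \mathrm{sig}(\alpha) \subseteq \Sigma,\ O_2 \models \alpha,\ O_1 \not\models \alpha \,\}$$

so a conservativity violation is a new entailment, over one input ontology's own signature, introduced by the aligned ontology.)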
Approach: approximate the difference (atomic entailments); use modularisation and projection to propositional Horn; reuse the semantic indexing from LogMap to help scalability; the disjointness assumption [Sch05] reduces basic violation detection to SAT; equivalence violations are detected using answer set programming.
[[sorry, had to prepare for covering the next talk!]]
Lots of violations in the experiments. Reasonable computation time (~80 seconds).
Fully automated and “conservative” repair (as well as detection)
Tons of data and ontologies.
What can we do with it? SPARQL Queries (yeek!); Controlled Natural Languages for query (e.g., Quelo); Visual Query formulation (geek); and Faceted Search!
Lots of work on faceted search on semantic web. What’s the common principle?
Faceted Search over RDF
Search over several sets of items
Progressive filters (which extend a query)
output a user-chosen subset of items
Variance in systems is within this paradigm (what you can filter, what filters to add, etc.)
Existing solutions: don’t use ontologies but are data-driven; no theoretical underpinnings (e.g., which fragments of SPARQL are covered; the complexity of those fragments; a formal capture of updates).
Formalised a faceted interface tailored toward RDF and OWL, abstracted from the GUI.
Study expressive power and complexity
Study interface generation and update
Result: SemFacet system (scales to millions of triples).
Simple mapping of facets to predicates, conjunctions, and disjunctions. Then translate to FO (i.e., positive existential formulas: monadic, a directed tree rooted at the free variable; disjuncts connected, sharing one variable).
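(An illustrative example of mine, not from the talk: selecting the facet value Person and, under a worksAt facet, the value University corresponds to the tree-shaped positive existential formula

$$q(x) = \mathit{Person}(x) \wedge \exists y\,\bigl(\mathit{worksAt}(x, y) \wedge \mathit{University}(y)\bigr)$$

rooted at the free variable $x$.)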
Combined complexity: faceted query answering over RDF datasets is tractable. Under active-domain semantics: RL P-complete, EL P-complete, QL in P. Under classical semantics: RL P-complete, EL (guarded) P-complete, QL NP-complete.
Bottom up evaluation.
Interface generation & update. Algos guided by ontology and data. Every facet in the initial interface is justified by an entailment.
Each update is semantically justified.
Facet graph: project the ontology and data onto a graph. Nodes are possible facet values; edges are facet names. Every edge must be justified by an entailed axiom or fact.
The system (SemFacet) combines keyword and faceted search.
Auto-generation of faceted search
In memory
Online and offline reasoning
Scales to millions of triples
Configurable
Keynote: Claudia d’Amato. Machine Learning for Ontology Mining: Perspectives and Issues
Use ML for Ontology mining
Inductive learning (robust)
Supervised, Unsupervised & semi-supervised concept learning (basic refresher)
Instance retrieval regarded as a classification problem; challenging for State of the Art ML Classification
SOTA applied to feature vector representation (not relational DL expressivity)
Implicit closed world assumption (unlike DLs)
SOTA treat classes as disjoint (unlike DLs)
Solutions: new semantic similarity measures for DL representations, which cope with all of these problems.
Problem definition: given a populated ontology, a query concept Q, and a training set with +1, 0, -1 as targets, learn a classifier f such that f(a) = +1 if a is in Q, -1 if a is in ¬Q, and 0 otherwise.
Dual problem: given an individual, find all concepts C it belongs to.
Example: nearest-neighbour voting (given a similarity metric).
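A minimal sketch of that idea, assuming a user-supplied semantic similarity measure sim over individuals (all names here are hypothetical, not the speaker's code):

```python
from collections import Counter

def knn_classify(a, training, sim, k=5):
    """k-NN voting: `training` is a list of (individual, label) pairs,
    with labels in {+1, 0, -1}; `sim` scores pairs of individuals."""
    # Take the k training individuals most similar to the query individual a.
    neighbours = sorted(training, key=lambda pair: sim(a, pair[0]), reverse=True)[:k]
    # Classify by majority vote among those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```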
How to evaluate classifiers.
Compared to a standard reasoner (Pellet), but this didn’t help with the “new” knowledge.
Added evaluation parameters: match rate, omission error rate, commission error rate, induction rate. (Roughly, comparing the classifier’s answer to the reasoner’s: match = they agree; omission = the classifier abstains where the reasoner decides; commission = they contradict; induction = the classifier commits where the reasoner cannot decide.) Key bit: the classic reasoner is indecisive where the classifier is determined.
Commission and omission rates are basically null; the induction rate is not null! New knowledge (perhaps supporting semi-automated ontology population).
Most of the time the most effective method is relational k-NN.
Most scalable: a kernel method embedded in blah blah.
Concept Drift and Novelty Detection via Conceptual clustering methods
Clustering: Intra-cluster similarity is high and inter-cluster similarity is low.
(Key idea: a global decision boundary; if a new candidate cluster is outside it, then it’s either novelty or drift, depending on how it relates to existing clusters.)
Evaluation needs domain expert 🙁
How to learn an intensional description of the new clusters (separate-and-conquer vs. divide-and-conquer).
Ontology enrichment as a pattern discovery problem.
[[holy moly, a lot more stuff; learning DL Safe rules for a variety of things using a variety of techniques]]
Data driven tableaux — drive/guide the reasoning process using data induced rules.
QA
QUESTION: Is the data-driven tableau for unsound inferences or for optimisation? Both!
QUESTION: Tell us a bit more about scalability? What’s the size of the ontologies? Started with toy onts with 1000s of individuals. Scaling up!
First talk, “Nicolas Matentzoglu and Bijan Parsia. The OWL Full/DL gap in the field” (Nico presenting).
Motivation: We were assembling a corpus for a(n OWL DL) reasoner competition. OWL Full is a “problem” for us.
(David: Current OBO is well within OWL DL now!)
Measuring the Gap: In our corpus: 81% OWL Full. <– prima facie odd! (For OWL DL folks :))
Nice tour of how things can be OWL Full in a “silly” way (lack of declarations, sub property of rdfs:label) with concrete examples.
After fixing the “silly” violations, we end up with 40% OWL Full (a 50% reduction).
Declaration failures <– Ugh!
Reserved vocabulary “misuse”: Subpropertying rdfs:label seems harmless. SubClassOf(A, rdf:type) less so.
Procedure: crawl-based corpus. Load and get metrics (using the OWL API profile checker). Repair. Check metrics again.
Two thirds of the violations are declaration failures!!
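To make the flavour of such repairs concrete, here is a minimal sketch of mine (using rdflib, not the authors' OWL API pipeline) of the naive end of declaration repair; a real repair must work out whether each undeclared IRI is an object, data, or annotation property rather than assuming object properties:

```python
from rdflib import Graph, RDF, OWL
from rdflib.term import URIRef

RESERVED = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2001/XMLSchema#",
)

g = Graph()
g.parse("ontology.owl")  # hypothetical input file

for p in set(g.predicates()):
    # Any non-reserved predicate with no rdf:type triple is undeclared.
    if isinstance(p, URIRef) and not str(p).startswith(RESERVED):
        if (p, RDF.type, None) not in g:
            g.add((p, RDF.type, OWL.ObjectProperty))  # naive guess

g.serialize("ontology-repaired.owl", format="xml")
```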
(Chitchat about reserved vocabulary)
Q&A
Point: Old OBO translator is suboptimal. Update to the latest OWL API.
Question: Can I get the fixed versions? Sure!
Question: Versions in corpus? We didn’t sanitise it beyond some minor automated stuff.
——-
David Carral, Adila Krisnadhi, Sebastian Rudolph and Pascal Hitzler. All But Not Nothing: Left-Hand Side Universals for Tractable OWL Profiles presented by Adila.
Problems with universals (vacuous applicability).
If you say All X are Y, we normally assume that there’s at least one X.
“onlySome” R only C and R some C <–coupled! (Common “good practice”)
Called “witnessed universal”
Can be added to OWL EL (and Horn-SROIQ) without compromising polynomiality, but only on LHSs.
Shown by a rewriting into ELH.
Proposed some syntax extensions. (This doesn’t work for OWL RL or QL.)
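(In symbols, the coupling from above: the familiar onlysome macro abbreviates

$$\mathrm{onlysome}(R, C) \equiv \exists R.C \sqcap \forall R.C$$

i.e., every $R$-successor is a $C$ and there is at least one. My reading is that the witnessed universal is exactly this coupled $\forall R.C$, which the paper then permits on left-hand sides of EL axioms.)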
QA
Question: Doesn’t this destroy the “arbitrary use of constructs” property? Yes, but we don’t know how this affects modellers.
Question: Can we use this with ELK? yes.
Question: What entailments does this support? In EL we can’t see any? Dunno! Good question.
——-
Nicolas Matentzoglu and Bijan Parsia. OWL/ZIP: Distributing Large and Modular Ontologies presented by Nico
How do we distribute large and modularised ontologies?
Even if people distribute ZIPPED archives…no standard “starting” point.
Auto-modularised ontologies can have hundreds of modules.
Key results:
Compression rates of 80-90% with greater rates on larger ontologies
Load time (note: we unzip to disk, so this is pessimal): up to 90% overhead, but dropping to 50% for ontologies that take >1 sec to load.
We want to standardise this sort of thing!
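To illustrate the shape of a fix (my sketch of one obvious convention, not the authors' actual proposal): agree on a distinguished archive entry, say root.owl, as the starting point, with the modules alongside:

```python
import zipfile

def pack(archive_path, root_file, module_files):
    """Bundle a root ontology and its modules into one deflated archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.write(root_file, arcname="root.owl")  # conventional entry point
        for i, module in enumerate(module_files):
            z.write(module, arcname=f"modules/{i}.owl")

def unpack(archive_path, target_dir):
    """Unzip to disk (as in the talk's pessimal measurements); a consumer
    then loads root.owl first and resolves imports from modules/."""
    with zipfile.ZipFile(archive_path) as z:
        z.extractall(target_dir)
```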
QA
Question: Carole Goble has been working on Research Objects and packaging them together, so maybe look at that? (Also, they go further with ) There are some binary formats like HDT for RDF, which adds indexing.
Question: What about versioning? Big general issue.
Question: Also collections.
Nico: apt-get for ontologies (ont-get).
COFFEE!!!!
———-
David Osumi-Sutherland, Marta Costa, Robert Court and Cahir O’Kane. Virtual Fly Brain – Using OWL to support the mapping and genetic dissection of the Drosophila brain. presented by David.
Going to talk about the Fly Brain instead 🙂
~200,000 neurons (5-10,000 types?)
Neurons have many structural and behavioural properties used for classification.
Complicated literature for published neurone classes.
Drosophila anatomy ontology
42% is on the nervous system.
50% of its >10,000 classifications are inferred.
Following the Rector normalisation pattern.
Richly annotated and axiomatised. Imports a bit of GO.
Expressiveness is ~EL without explicit nested class expressions.
“Brain region classes are defined with reference to volumes in standard brain.”
“part of” is not as helpful as “part in” for neurones (i.e., they have parts in lots of other things). Various specific specialisations of “overlaps”.
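(My reconstruction of the usual RO-style move: overlaps holds when two things share a part, so it can be driven by a role chain such as

$$\mathit{has\_part} \circ \mathit{part\_of} \sqsubseteq \mathit{overlaps}$$

with the neuron-specific relations as further specialisations.)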
QUESTION: Michel had some issues with using role chains instead of a generic part of (a query issue).
Discussion of image queries (nested expressions! :))
Beyond EL!
“Complete knowledge of spatial information about neurone is common”
QA
Question: Can you reconstruct the full neuron track information? Yes, we have all this low-level microscopy and then [argble bio bargle].
Question: Do you need negation or epistemic negation? Our closure works except for scaling, and it uses, e.g., our role chains etc.
———
Chris Mungall, Heiko Dietze and David Osumi-Sutherland. Use of OWL within the Gene Ontology presented by David.
Historic conception of GO/OBO as DAG.
Then a translation into FOL, which was ditched.
New version of the translation by Horrocks et al.
Design patterns in GO (pre formalisation) which improved quality.
OBO(1.4) is OWL (yay!)
Roundtripping is effective.
GO comes in lots of versions, and the default version is axiom-light.
TermGenie: Web based, templated term submission (with inference checking!) (Sounds supercool)
More about property chains and partonomy.
Nice discussion of “challenges of inputs”
OBO Relations Ontology
Spatial disjointness
Taxon constraints via macro expansions.
OBO format discussion: a list of valuable properties (hackability important). Big ones: readable diffs and easy, stable version control.
QUESTION: What makes it VC friendly? Standard pretty printing of serialisation. R: So we could lift this to other syntaxes?
Moving all editing to Protege via plugins.
Managing inference. GO caches inferences in file. Very bad editing cycle at the moment.
Plans to speedup cycle.
Smuggling a Little EL into databases! Class expressions for annotators.
Rise of the ABoxes (standard conversion of GO annotations to ABox individuals). Having links between annotations to give fuller narratives.
QA
Question: Are the annotation structures related to research object or micro-/nano-publications? Offline!
Question: What’s up with the properties? (loads of discussion)
——–
Keynote: Nicola Guarino. On the semantics of reified relationships
The intro is a bit weird: if you are going to explain how you’re different from the OWL community, you need a reasonable understanding of the OWL community; e.g., we just had two talks on content!
Relations vs. Relationships. Reification of relationships. Facts (true propositions) vs. episodes (truth makers). How episodes and events relate.
Relation is a class of tuples. Relationship is a tuple.
Both relation and relationships can be reified.
Cardinality constraints are on relations.
We might have constraints on relationships. [[BJP: I didn’t understand the example of at most 1 spouse at a time]]
Common reifications: as assertions, as facts (situations/true propositions), as perdurants (events)
Propositions are true or false at certain times. Facts are true propositions. Events are world-bits that exemplify or instantiate propositions. Situations are kinds of events. Events are time-localised.
Episodes. Endurants (entities persisting in time) and perdurants (entities that happen in time). Person vs. a talk.
Ordinary endurants are called objects.
No standard term for ordinary perdurants (event, happening, situation, etc.). Episode: a large class of relevant perdurants.
Relevance:
unity criterion (maximality)
time and context
Episodes are perception-relevant (context is perceptually bound).
Kinds of relationships (a rough taxonomy)
Permanent relationships
Essential (greater(3,2))
Contingent
intrinsic: same-blood-group(John, Mary)
extrinsic: born-in(John, Brazil)
Temporary relationships
Intrinsic: taller(John, Mary)
Extrinsic: loves(John, Mary)
All temporary relationships require an episode as their truth maker
Some permanent relationships (extrinsic) require an episode as their truth maker
Whenever there’s a time varying property, consider putting truth-making episodes in domain of discourse.
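(An illustrative rendering, mine rather than Guarino’s: instead of a time-indexed $\mathit{loves}(\mathit{John}, \mathit{Mary}, t)$, put the truth-making episode into the domain of discourse:

$$\exists e\,\bigl(\mathit{LoveEpisode}(e) \wedge \mathit{lover}(e, \mathit{John}) \wedge \mathit{beloved}(e, \mathit{Mary}) \wedge \mathit{holdsAt}(e, t)\bigr)$$

so the episode $e$ can bear its own properties and temporal extent.)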
Episodes are better than events because events are too limited?
——
Matthew Horridge, Csongor I Nyulas, Tania Tudorache and Mark Musen. WebProtégé: a Web-based Development Environment for OWL Ontologies presented by Matthew.
Google Docs for OWL ontologies.
10,000 projects; 300+ users.
“Horrified to find out that WebProtege has been around for quite a while.”
WebProtege 1.0 based on Protege 3. WebProtege 2 with new UI and based on the OWL API. Simplified UI plus better OWL 2 support for experts.
Open source, using GWT, hosted on GitHub. Locally installable.
DEMO! (Which was AWESOME AND INTERESTING!)
(In particular the reasoning architecture is awesome.)
Simple profile coverage (i.e., what the simplified interface can handle)
Custom form editor!
GitHub integration as a future option.
——-
Ewa Kowalczuk, Jędrzej Potoniec and Agnieszka Lawrynowicz. Extracting Usage Patterns of Ontologies on the Web: a Case Study on GoodRelations Vocabulary in RDFa presented by Ewa.
Analysis of GoodRelations annotations published on the web in RDFa.
They try to show a bunch of stuff: OWL usage, that their pattern tool works, etc.
Used Web Data Commons extracted from Common Crawl.
Fr-ONT-Qu example.
over 2.6 billion quads
They used recursive concise bounded descriptions (haven’t seen those in AGES).
(Ooops, I got caught up with playing with WebProtege! Bad liveblogger!)
We’re in a discussion of the good vs. bad patterns found (expressed as SPARQL queries).
The OWL pattern is a bit odd! It seems to reference the ontology without using it.
Results are online: http://semantic.cs.put.poznan.pl/~ekowalczuk/OWLandGR/
——
POSTER PITCHES
Alexander Šimko and Ondrej Zamazal. Towards Searching for Transformation Patterns in Support of Language Profiling
https://code.google.code/p/tpgen
Automating of Ontology Analysis something something (not in programme!). Ontology summaries, roughly. Ultimately: determine the relation between features in onts and aspects of tools.
Catalina Martínez Costa and Stefan Schulz. An example of approximating DL reasoning by ontology-aware RDF querying. Motivated by the semantic interoperability problem in diagnosis support systems.
Rafael Peñaloza and Aparna Saisree Thuluva. COBRA, a Demo. Sub-ontologies offered as views: instead of materialising each sub-ontology, synthesise a single ontology. Context-based stuff.
Zubeida Khan and C. Maria Keet. The ROMULUS resource for using foundational ontologies. FOs are hard: philosophical notions, which one to use, which to link to, scalability.
Even if you didn’t submit a paper, please consider attending. We’re going to have some working sessions to start the process of drafting Community Reports which may very well change your experience of OWL!
We’re getting it sorted, but until then here’s some details:
OWLED early registration fee: 150 euros (until October 5th). Regular fee: 200 euros (from October 6th). The registration fee covers both days of the workshop and includes coffee breaks and the poster reception.
Work expands to meet the deadlines…so our deadline extension to push out of conflict with the Vienna Summer of Logic did not prevent a flood of piteous pleas that even my stony heart could not resist…thus, we have extended the deadline for OWLED to Aug 4th.
If you aren’t sure if what you have is appropriate, please drop me a line! OWLED is very inclusive.
The 2014 OWL Reasoner Evaluation Workshop and Competition (ORE2014) is the 3rd iteration and the second when there was a meaningful competition. So hurray! We had the workshop today (the competition will be held during DL 2014, to facilitate betting) and it was quite good. In particular, there was some super interesting work on new benchmarks (esp. for conjunctive query) and some reasoner-breaking ontologies with nice investigations. I’ll invite some of those folks to write a blog post about these.
The business meeting (see slides) focused on the critical question of whether
to move organisation into the OWLED community group
to merge ORE with OWLED (or tightly bind them).
Everyone is fine with using the Community Group (that’s, after all, secondary to the actual organisational structure we develop), but there was a marked preference for DL over OWLED as the colocation partner (i.e., 14 would happily go to ORE with DL, while only 2 would happily go to ORE with OWLED, plus 9 more who would go, for 11 total). A lot depends on the participation and enthusiasm of OWLED folks.
The DL community very much enjoyed the live competition last year. It’ll be interesting to see how they react this year. Overall, I think convergence is a good thing and ORE more as “users meet developers” would be a great thing in OWLED.
I’d be remiss not to tip my hat to all those involved in the organisation. ORE is more work than the typical workshop because a competition is a very hard thing to get even marginally right! PC Chairs Ernesto Jiménez-Ruiz and Samantha Bail continued their sterling work even though they couldn’t be there due to (resp) new child and new job. The Competition Organisers were (test infrastructure) Birte Glimm and Andreas Steigmiller and (test corpus) Nicolas Matentzoglu with some help from me. Our invaluable local organizers were Magdalena Ortiz and Mantas Šimkus while Thomas Krennwallner runs the Olympic Games overall.
As last year, Konstantin Korovin, of the University of Manchester, donated his Royal Society (grant RG080491) funded cluster to run the competition.
The T-Shirts this year are so awesome I can barely contain myself. They feature a unicorn and random people wanted one because they “were so fun”. My legacy is complete!
The Call For Papers for OWLED 2014 (colocated with ISWC 2014 in Trentino, Italy on Oct 17 and 18) has been out for a while without a blog post here, so I am taking the fact that we’ve moved the submission date to July 30 (to avoid having the due date fall in the middle of the Vienna Summer of Logic) as the occasion to rectify this.
We’re going to take the first steps toward merging the OWL Reasoner Evaluation (ORE) workshop and contest with OWLED. ORE was great fun last year (with extensive betting on the live contest!) and flushed out lots of interesting reasoners that I, personally, had never heard of before. Part of the contest set is derived from a large crawl of ontologies on the Web, but other “challenge problems” were submitted by working ontologists. This makes ORE a perfect fit for OWLED, and I hope that folks making and using ontologies step up to the challenge of making reasoner developers cry.
The other big goal is getting standardisation efforts rolling again. OWL 2 was a great step, but there are still lots of little niggles we could tackle even before thinking of major extensions. SPARQL 1.1’s entailment regimes have made querying against OWL ontologies with OWL semantics a reality, but the syntax isn’t always ideal and we don’t have a set of best practices around it. The Community group provides a mechanism for publishing reports, so we should use that! If you have a spec or extension idea, please submit a paper about it or contact me directly (even a comment on this post would be good!)
As some of you may know, I’ve been selected to be General Chair of OWLED 2014.
Fun!
It’s my very great pleasure to announce that Valentina Tamma and Maria Keet have agreed to be co-PC chairs. We’ll be sorting out the PC and CFPs shortly.
In all probability, we will be colocated with ISWC 2014 (there are a few more details to be ironed out).
A lot of OWLED this year will be familiar (submit papers/come have fun!), but I also hope to get us back to our roots as an activist community which is effective in improving the state of OWL for all of us.
To that end, I’ve started organizing this community group a bit more (for example, I just rationalized the number of chairs to just me (qua general chair) and Pavel (qua rep of the steering committee); this will evolve) and plan to transition a lot of the normal OWLED infrastructure here. I also hope to get some traction on “reports” (e.g., specs for new features).
Also, we’ve an intention to fold the OWL Reasoner Evaluation Workshop into OWLED. ORE was a great success last year, but we really want to connect reasoner evaluation to people’s lived experience of them. To that end, I think we as a community can make this into a useful way to improve the state of the art as well as reward hardworking reasoner developers.
Chris Mungall initiated a controversial blog post [1] regarding the challenge of diffing OWL files stored in version control systems. Is it possible to specify an ordering of OWL axioms so that they play more nicely in a standard VCS?
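As a strawman for that discussion, a minimal sketch (mine, assuming a functional-style serialisation with one axiom per line; not anyone’s actual proposal): sort the axiom lines deterministically so that textual diffs line up across edits:

```python
def canonicalise(lines):
    """Order axiom lines deterministically, keeping the prefix/ontology
    header and the closing paren of Ontology(...) in place."""
    header, axioms, footer = [], [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines for a stable layout
        if stripped.startswith(("Prefix(", "Ontology(", "Import(")):
            header.append(line)
        elif stripped == ")":
            footer.append(line)
        else:
            axioms.append(line)
    return header + sorted(axioms) + footer

with open("ontology.ofn") as f:  # hypothetical input
    out = canonicalise(f.readlines())
with open("ontology-canonical.ofn", "w") as f:
    f.writelines(out)
```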