Margherita Sini (Food and Agriculture Organization of the United Nations,
Rome, Italy)
Sachit Rajbhandari (Food and Agriculture Organization of the United Nations,
Rome, Italy)
Jeetendra Singh (Indian Institute of Technology, Kanpur, India)
Johannes Keizer (Food and Agriculture Organization of the United Nations,
Rome, Italy)
T.V. Prabhakar (Indian Institute of Technology, Kanpur, India)
Asanee Kawtrakul (Kasetsart University, Bangkok, Thailand)
Smart Organization of Agricultural Knowledge
The example of the AGROVOC Concept Server and Agropedia
Abstract: The trend toward representing information in a form that computers can process more effectively
(the so-called “machine-readable format”) (Berners-Lee 1998) began years ago and has expanded to many domains,
among them agriculture. The Food and Agriculture Organization of the United Nations, Kasetsart
University and the Indian Institute of Technology Kanpur are pioneers in representing information and
knowledge in this domain using modern techniques such as ontology languages. This paper analyzes
two projects developed by these organizations, both of which take a concept-oriented approach to
describing agricultural topics. It is organized in two chapters, one per project, each describing
the innovative aspects, the benefits, and the technology used.
1: The AGROVOC Concept Server
AGROVOC is a multilingual thesaurus for agriculture developed in the early ’80s by
the Food and Agriculture Organization and the European Commission. It was originally
developed in English and translated into the other four official languages of the UN agency
(i.e. Spanish, French, Arabic, Chinese). Owing to its wide use and success (in particular
within the AGRIS network, www.fao.org/agris), it is now available in 20 languages.
As a thesaurus it has been used for indexing resources and for Information Retrieval (IR)
on AGROVOC-indexed documents and in free-text search engines. However, controlled
vocabularies are known to have limited semantics, and reengineering them may help
improve IR and information discovery and contribute to the Semantic Web vision
(Harper 2006).
Along these lines, FAO started the Agricultural Ontology Service Initiative (AOS) in
2001, aiming to provide better services to users through semantic technologies.
One of the services of the AOS is a concept-based system allowing users to organize and
manage the agricultural keywords of the AGROVOC thesaurus as a network of concepts.
This idea led to the AGROVOC Concept Server, for which the following actions were
initiated: identification of the new structure that should be given to the concept server
(Soergel 2004), specification of the ontological model (Liang 2006), revision and
refinement of the AGROVOC Thesaurus data to comply with the new structure
(Kawtrakul 2005 and Sini 2009), and development of the collaborative platform for the
management of the reengineered data (Sini 2008).
Innovative aspects
The reengineering of thesauri into ontologies is a complex matter that several authors
have already discussed, and similar approaches have already been implemented (van
Assem 2006). The solution at FAO was to use the Web Ontology Language (OWL) to
implement the new concept-based structure as indicated in Soergel (2004). FAO
recognized the importance of being able to represent specific types of relationships both
at the concept level and at the term level, and this characterizes the backbone structure
of the AGROVOC Concept Server: all terms are represented as instances of the class
“noun”, which is a subclass of “lexicalization”. The concepts, derived from the AGROVOC
main descriptors, are then connected to these instances through the hasLexicalization
property. For a more detailed description of the full model see (Liang 2006). This
structure makes it possible to represent all lexicalization and terminological information
in addition to the primary conceptual structure of the concept hierarchy. This has been
further elaborated in the Lexical Information Repository model of the NeOn EU project
(Montiel-Ponsoda 2008).
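The backbone structure described above can be pictured with a minimal Python sketch. It is not the OWL model itself (see Liang 2006 for that); the class names only mirror “noun”, “lexicalization” and hasLexicalization, and the concept identifier and sample terms are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lexicalization:
    """A term: a language-tagged label attached to a concept."""
    label: str
    lang: str


@dataclass(frozen=True)
class Noun(Lexicalization):
    """Terms are modelled as instances of Noun, a subclass of Lexicalization."""


@dataclass
class Concept:
    """A language-independent concept carrying its own identifier."""
    uri: str
    has_lexicalization: list = field(default_factory=list)

    def labels(self, lang):
        """Return the labels of this concept in the given language."""
        return [t.label for t in self.has_lexicalization if t.lang == lang]


# Hypothetical identifier and terms, for illustration only.
rice = Concept("http://example.org/concept/c_rice")
rice.has_lexicalization.append(Noun("rice", "en"))
rice.has_lexicalization.append(Noun("riz", "fr"))
rice.has_lexicalization.append(Noun("arroz", "es"))

print(rice.labels("fr"))
```

Keeping the terms as separate objects, rather than as plain strings on the concept, is what allows relationships to be stated at the term level as well as at the concept level.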
Benefits
Once the AGROVOC Concept Server and its management tool, the Workbench, are
completed, many agriculture-related concepts will each have a unique identifier
and will be represented by multiple terms in many languages. The terms may also
include spelling variants, acronyms, dialectal forms or local terms used in specific
geographical areas. This will make it possible to: a) build URI-based indexing systems,
as opposed to traditional word-based indexing (generally using only English), and
therefore create more machine-interpretable catalogues; b) improve
interoperability with other systems that use ontologies: the Concept Server may contain
mappings and links to other URIs, where it does not reuse them; c) give users the
freedom to use any language they wish or are used to, and any term they want, to find
agricultural information (with no need always to refer to preferred terms as defined in
controlled vocabularies).
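Points a) and c) can be illustrated with a small sketch of URI-based indexing in Python. Documents are indexed by concept URI, and any term in any language resolves to that URI before the lookup. The URIs, terms and document identifiers are hypothetical; real identifiers would come from the Concept Server.

```python
from collections import defaultdict

# Hypothetical concept URIs with terms in several languages, including a synonym.
TERMS = {
    "http://example.org/concept/c_maize": ["maize", "corn", "maïs", "maíz"],
    "http://example.org/concept/c_rice": ["rice", "riz", "arroz"],
}

# Every term, preferred or not, points to the URI of its concept.
term_to_uri = {
    term.casefold(): uri for uri, terms in TERMS.items() for term in terms
}

# The catalogue is keyed by concept URI, not by words.
catalogue = defaultdict(set)


def index_document(doc_id, concept_uris):
    """Attach a document to each concept it is about."""
    for uri in concept_uris:
        catalogue[uri].add(doc_id)


def search(term):
    """Resolve a term in any language to its concept, then list its documents."""
    uri = term_to_uri.get(term.casefold())
    return sorted(catalogue.get(uri, set())) if uri else []


index_document("doc-1", ["http://example.org/concept/c_maize"])
index_document(
    "doc-2",
    ["http://example.org/concept/c_maize", "http://example.org/concept/c_rice"],
)

print(search("Corn"))
print(search("maíz"))
print(search("arroz"))
```

Because the index never stores words, a search for “corn” and a search for “maíz” return the same documents.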
Efforts are under way at FAO to apply the Concept Server data to
existing cataloguing systems (e.g. AGRIS, the FAO document repository), and Kasetsart
University is also exploiting the same techniques and tools to improve access to
information, for example for farmers in Thailand.
The Technology
The AGROVOC Concept Server model was designed using OWL DL. A basic
model (www.fao.org/aims/aos) acts as a Foundational Agricultural Ontology, specifying
how agricultural concepts and their lexical representations should be represented.
The original AGROVOC Thesaurus (www.fao.org/agrovoc) is stored in a MySQL
database. It has been converted to OWL using a Java-based routine and imported into a
triple store using the Protégé API¹.
The Workbench system is a Java-based web application. This application accesses the
triple store via the Protégé OWL API and presents the results to users in an AJAX style
using the Google Web Toolkit (GWT). Additional widgets from the GWT incubator have
been used to design a more user-friendly system.
Hibernate has been used to interact with the database, providing high-performance
object/relational persistence, easy retrieval/update of data, transaction management and
database connection pooling. Gilead (previously known as hibernate4gwt) has been used
as a layer between GWT and Hibernate, allowing the use of persistent entities outside the
JVM.
¹ http://protege.stanford.edu/plugins/owl/api/
AGROVOC Workbench also exposes its triple store data via web services and RSS
feeds. With these technologies, other applications can easily access data and news about
changes, which reduces the time and resources they would otherwise spend downloading
the whole database to bring their copy of the AGROVOC Concept Server data up to date.
The system is available at naist.cpe.ku.ac.th/agrovoc.
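How a client application might use such a change feed can be sketched as follows. This is only an illustration under stated assumptions: the layout of the actual Workbench feed is not described here, so the sample below is a generic, hypothetical RSS 2.0 document, and the function name changes_since is invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed content; the real feed layout is an assumption.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Concept changes</title>
  <item><title>Term added to concept A</title>
        <pubDate>Mon, 02 Mar 2009 10:00:00 +0000</pubDate></item>
  <item><title>Relationship updated for concept B</title>
        <pubDate>Thu, 05 Mar 2009 09:30:00 +0000</pubDate></item>
</channel></rss>"""


def changes_since(feed_xml, last_sync):
    """Return the titles of feed items published after the last synchronisation."""
    root = ET.fromstring(feed_xml)
    recent = []
    for item in root.iter("item"):
        # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_sync:
            recent.append(item.findtext("title"))
    return recent


last_sync = datetime(2009, 3, 3, tzinfo=timezone.utc)
print(changes_since(SAMPLE_FEED, last_sync))
```

A client that remembers the time of its last synchronisation only needs to process the items returned, instead of reloading the complete data set.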
2: The Agropedia project
Agropedia can be defined as an agricultural knowledge repository of universal
meta-models and localized content, built collaboratively in multiple languages, with
interfaces appropriate to a variety of users.
The primary objective of the Agropedia project (agropedia.iitk.ac.in) is to build an
infrastructure of agricultural knowledge presented in various customized ways to
different stakeholders, whether scientists, students, extension workers, farmers, or
policy makers. Multilingual and localized information is very important in India.
In order to have a backbone system “smart” enough to allow easy interaction with
other data, e.g. with the agricultural data already available in several Indian institutes
that are partners of the project, such as the G. B. Pant University of Agriculture &
Technology, Pantnagar, or the International Crops Research Institute for the Semi-Arid
Tropics (ICRISAT), the Indian Institute of Technology Kanpur (IITK) decided to develop
Knowledge Models (KMs). These should act as a conceptual reference for many domains,
starting from different crops such as Chickpea, Groundnut, Litchi, Pigeon pea, Rice,
Sorghum, Sugarcane, Vegetable pea, and Wheat.
When working with domain experts, an easy-to-use tool for knowledge representation
is essential. Elaborate ontology editors such as Protégé, the NeOn Toolkit,
Swoop or others are too complex for people who are not Information Management experts.
Consequently, IITK initiated a series of workshops to explain to agronomists, soil
scientists, plant breeders or geneticists, farm managers, and other experts how to
represent knowledge models using the CMap tool (cmap.ihmc.us/coe). A series of maps
was prepared in which agricultural concepts were represented consistently across
multiple maps. Specific guidelines were drawn up so that “Seed_treatment” in the
chickpea KM is the same concept as the one used in the Sorghum KM.
Although the KMs were prepared with the CMap tool, known more as a visual tool for
human rather than machine consumption, the maps were prepared in an
intelligent way: the guidelines homogenized the representation of URIs, allowing easy
linkage or mapping between maps. Instances and classes were differentiated thanks to the
functionalities of the tool, and object properties were taken from a registry for easy reuse.
The registry of relationships (containing object properties and datatype properties) was
built by reusing some of the AOS ontological relationships used to
refine the AGROVOC Thesaurus in the AGROVOC Concept Server
(aims.fao.org/en/website/Ontology-relationships) and by extending them, with
consequent mutual enrichment between the two registries.
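The effect of homogenized URIs can be shown with a short Python sketch. The normalisation rule, the namespace and the relationship name hasPractice are assumptions made for the example; the actual Agropedia guidelines are not reproduced here. The point is only that a shared naming rule makes the same concept resolve to the same URI in every map.

```python
import re

BASE = "http://example.org/km#"  # hypothetical namespace, not the project's own


def concept_uri(label):
    """Build a URI from a label using one rule for every map (assumed rule)."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable characters")
    local = "_".join(words)
    # First letter upper case, the rest lower case, words joined by underscores.
    return BASE + local[0].upper() + local[1:].lower()


def nodes(triples):
    """Collect every subject and object appearing in a map."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


# Two maps written by different experts, with slightly different spellings.
chickpea_km = {
    (concept_uri("Chickpea"), "hasPractice", concept_uri("Seed treatment")),
}
sorghum_km = {
    (concept_uri("Sorghum"), "hasPractice", concept_uri("seed_treatment")),
}

shared = nodes(chickpea_km) & nodes(sorghum_km)
print(shared)
```

The two maps can then be linked on the concepts they share, with no manual mapping step.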
However, a single registry could not be built, because agricultural information needs to
be kept modular: the AGROVOC Concept Server aims to represent
general agriculture-related information and to be the base for the development of more
domain-specific ontologies, whereas the Agropedia KMs aim to represent in more detail
information related to local fertilizers, soil, cropping techniques and methods, etc.
Innovative aspects
Agropedia offers users different semantically oriented tools: textual and audio
blogs, wikis, forums, and the KMs presented in different formats (pdf, static or
context-sensitive images). Users can choose their preferred way of
navigating the KMs: as a simplified hierarchical list or as a network of concepts and
instances. In both cases, when a user clicks an element, he/she gets as results all
resources in the knowledge base (catalogue) related to the selected item. This works
because resources from the library catalogue are already tagged with the concepts from
the KM, which makes the system highly innovative compared to traditional string search.
No matter what language the maps are displayed in, the results will always be the same
(currently, KMs exist in English and Hindi).
Benefits
Agropedia is an attempt to inject social networking and semantic technologies into
agriculture in general and Indian agriculture in particular.
Agricultural knowledge (for that matter, any knowledge) can be said to be of two kinds:
the well-articulated, expert-certified knowledge, which we call Gyandhara, and the tacit,
folk knowledge, called Jandhara. Agropedia attempts to provide a platform for hosting
both kinds of knowledge and for linking and amalgamating them in an interesting way.
The Library section of Agropedia holds the expert-certified knowledge. The wikis, blogs
and forums provide the platform for unregulated, people-created content/knowledge. In
addition, Agropedia permits users to comment on certified knowledge in the library.
These comments, which usually describe how users benefited from that piece, give ratings,
or report variations on the advice that users have devised, are unmoderated, and
they create a situation where the certified knowledge and the end-user's experience sit
together, greatly increasing its utility.
The Technology
The first release of Agropedia was implemented using the Alfresco suite.
Subsequently, because of the need to incorporate other functionalities, the Drupal
Content Management System (CMS) was adopted as the main technological backbone.
The KMs, as mentioned, are created with CMap and exported in SVG format to make
them available as context-sensitive images. Other formats (pdf, jpg) are used for
visualization only. A Java routine converts the OWL version of the KMs (obtained with
a simple export from the CMap tool) into a hierarchical tree that serves as a content
index of the catalogue.
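The kind of transformation performed by that routine can be sketched in Python (the routine itself is written in Java and is not shown here). The sketch assumes the subclass statements have already been read from the exported OWL file as (child, parent) pairs; the concept names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs, as they might be read from the
# rdfs:subClassOf statements of an exported KM.
SUBCLASS_OF = [
    ("Seed_treatment", "Cultural_practice"),
    ("Irrigation", "Cultural_practice"),
    ("Cultural_practice", "Chickpea_KM"),
    ("Pest", "Chickpea_KM"),
]


def build_index(pairs):
    """Turn subclass pairs into an indented, alphabetically ordered tree."""
    children = defaultdict(list)
    has_parent = set()
    for child, parent in pairs:
        children[parent].append(child)
        has_parent.add(child)

    # Roots are the parents that never appear as a child.
    roots = sorted(p for p in children if p not in has_parent)
    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(build_index(SUBCLASS_OF)))
```

Each line of the result corresponds to one entry of the content index, and its indentation gives the level in the hierarchy.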
Drupal is used for implementing the blogs, the chats, the forums, the Q/A area, the user
management area, etc. The taxonomy module is used for tagging and searching the
content.
A Java module for automatic tagging using the KMs is being implemented.
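Since that module is still being implemented, the following Python sketch only illustrates one simple way such tagging could work: matching the labels of KM concepts against the text of a resource. It is not the project's algorithm, and the concepts and labels are hypothetical.

```python
import re

# Hypothetical KM concepts with the labels under which they may appear in text.
KM_LABELS = {
    "Seed_treatment": ["seed treatment", "seed dressing"],
    "Irrigation": ["irrigation"],
    "Chickpea": ["chickpea", "gram"],
}


def suggest_tags(text):
    """Return the KM concepts whose labels occur as whole words in the text."""
    tags = set()
    for concept, labels in KM_LABELS.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.add(concept)
                break  # one matching label is enough for this concept
    return sorted(tags)


print(suggest_tags("Seed dressing before sowing improves chickpea germination."))
```

Matching on whole words avoids false hits such as “gram” inside “programme”. A production tagger would also need to handle inflected forms and labels in Hindi.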
3: Conclusions and Future Works
The work under way at FAO and other AOS partners to make better use of
traditional thesauri is in line with the current strategies of making data more
“processable”. Similarly, the Agropedia project opens the road to the representation of
agricultural knowledge in the form of concept-based maps.
A lot still remains to be done: for the AGROVOC Concept Server, the role
of OWL2 and OWL rules should be investigated, and the
collaborative tool to maintain the data pool should be completed. Planned extensions of
this tool include exploiting knowledge extraction from text corpora in different
languages to enrich the concepts with more terms or synonyms.
In Agropedia, further work is planned to develop more KMs and to export the CMap-formatted
data to OWL for further incorporation and better services through the
Agropedia portal, in particular making use of reasoning tools across the different maps.
Mutual integration of the two systems can be further investigated, since both
address the same community of users.
Finally, Linked Data (linkeddata.org) exposure for both systems is a must.
References
Berners-Lee T., September 1998, Semantic Web Road map,
<www.w3.org/DesignIssues/Semantic.html>
Harper C.A., Tillett B., 2006, Library of Congress controlled vocabularies and their application to
the Semantic Web
Soergel D., Lauser B., Liang A., Keizer J., Katz S., 2004, Reengineering thesauri for new
applications. The AGROVOC example, in Journal of Digital Information, Volume 4 Issue 4,
Article No. 257
Liang A.C., Lauser B., Sini M., Keizer J., Katz S., 2006, From AGROVOC to the Agricultural
Ontology Service / Concept Server. An OWL model for managing ontologies in the agricultural
domain, OWL workshop 2006, Athens, Georgia, U. S. A., and International Conference on
Dublin Core and Metadata Applications, DC-2006--Colima, Mexico Proceedings
Kawtrakul A., Imsombut A., Thunyakijjanukit A., Soergel D., Liang A.C., Sini M., Johannsen G.,
Keizer J., 2005, Automatic Term Relationship Cleaning and Refinement for AGROVOC,
<www.fao.org/docrep/008/af240e/af240e00.htm>
Sini M., Soergel D., Johannsen G., 2009, Proposal for restructuring Scientific Names and Common
Names of Organisms in AGROVOC, <ftp.fao.org/docrep/fao/011/ak283e/ak283e00.pdf>,
<agrovoc-revision-refinement.blogspot.com>, <www.icrisat.org/vasat/agrovoc.htm>
Sini M., Lauser B., Salokhe G., Keizer J., Katz S., 2008, The AGROVOC Concept Server:
rationale, goals and usage, in Library Review, Volume 57 Issue 3, pp. 200-212,
Emerald Group Publishing Limited
Montiel-Ponsoda E., Aguado de Cea G., Gómez-Pérez A., Peters W., 2008, Modelling
Multilinguality in Ontologies, in Coling 2008, Companion volume, Posters and Demonstrations,
pp. 67-70, Manchester, August 2008
van Assem M., Malaise V., Miles A., Schreiber G., 2006, A Method to Convert Thesauri to SKOS,
in European Semantic Web Conference, pp. 95-109, ESWC 2006