Knowledge base development

D. Martinez (1), M. Taboada (2), and J. Mira (3)

(1) Dpto. de Fisica Aplicada. Universidade de Santiago de Compostela. Bernardino Pardo Ouro, S.N. 27002 Lugo. Spain. fadiego@usc.es http://www.usc.es
(2) Dpto. de Electronica e Computacion. Universidade de Santiago de Compostela. 15782 Santiago de Compostela. Spain. chus@dec.usc.es http://aiff.usc.es/ elchus/
(3) Dpto. de Inteligencia Artificial. UNED. Madrid. Spain. jmira@dia.uned.es http://www.ia.uned.es/personal/jmira/

Abstract. The development of knowledge bases has recently come to be regarded as a process of extending an ontology with the specific knowledge of a particular application. Reusing pre-existing knowledge sources yields knowledge bases endowed with a common, standard terminology. However, reuse often requires an enormous effort, so few knowledge-based applications have been built following this approach. In this paper, we analyze some problems that make reuse difficult, as well as the types of activities and currently available resources that may help to improve this process.

1 Introduction

The development of Knowledge Bases (KBs) has recently come to be regarded as a process of extending generic concept descriptions by enumerating the specific details of a particular application [1]. The formal, explicit description of the concepts in a domain of discourse, their properties, the relationships among them and the axioms constitutes the core of the KB. In the Artificial Intelligence literature, this description is referred to as a domain ontology [2]. A KB can therefore be viewed as an extension of an ontology with the specific knowledge of a particular application. Reusing pre-existing knowledge sources during the development process may:

– Facilitate the acquisition of both the domain ontology and its instantiations.
– Give rise to KBs endowed with a common, standard terminology.
– Increase the sharing and reuse of these KBs.
To achieve these potential benefits, we need to have available:

1. The required knowledge sources and resources.
2. Tools that support reuse activities, such as importing, combining and reorganizing the needed contents of the reused knowledge sources.

A large number of knowledge sources in many domains (including reference terminologies, formal ontologies and knowledge bases) have been published in the literature and on the Internet, which makes them more accessible and very attractive for reuse. However, the number of knowledge-based applications built by reuse is still very small [3]. In this paper, we analyze some problems that make reuse difficult. In addition, we present our experience of building KBs by reuse in a specific domain: the medical domain.

Reusing existing knowledge often requires an enormous effort. The first research questions that we try to answer in this paper are therefore the following. What types of activities facilitate the development of KBs by reuse? How do the currently available resources contribute to improving this process? Because the reuse process is so tedious, it would be very valuable to produce a KB that is itself easy to reuse and share. But how should the design process be carried out so as to increase the reuse and sharing of the resulting KB? What properties should a KB satisfy in order to be easily reused and shared? To answer these questions, we had to choose the knowledge sources, as well as the methodologies and tools to be used. These choices are justified in Section 4.

The structure of the paper is as follows. First, we analyze some problems with reuse and review the methodologies oriented to reuse. Then, we present a specific reuse case: the development of a KB for use in a clinical guideline. We justify our choice of the knowledge sources, methodologies and tools.
Next, we go into the details of the main stages in building a KB for this particular case. Finally, we end with some conclusions.

2 Some problems with knowledge reusing

Several problems arise when independently developed knowledge sources are combined and adapted for new knowledge-based applications. One problem comes from the need to combine portions of different sources, which may be formalized to different degrees. In many domains, such as medicine, developing a knowledge base may require importing and combining information from both a formal ontology and a unified terminology server [4]. With regard to reusing ontologies, the problems are well known [5]: ontologies are dispersed over several servers, their formalization differs depending on the hosting server, the same sources are described at different levels of detail, etc. In addition, although many ontology repositories have recently become available on the Web, they provide few search facilities. Unified terminology servers, in contrast, are usually well organized and provide various search facilities. However, the knowledge contained in these servers is not formally represented, so additional steps are necessary to formalize the imported knowledge.

3 Methodologies oriented to reuse

Developing a KB can be viewed as consisting of two interrelated activities:

1. Developing the core ontology of the KB.
2. Specializing the core ontology.

To carry out the first activity, we can use the existing methodologies and tools for developing ontologies (see [5] for a review of the most important ones). In many cases, it is useful to include more than one ontology. However, manually creating this core by coherently including information from all the sources may be too tedious. Recent efforts have tried to address the problem of bringing together disparate source ontologies. Currently, two approaches are distinguished [6]:

1. Merging different ontologies into a single ontology.
2. Assembling several ontologies so that they are consistent and coherent with each other, but keeping them separate, as parts of a resulting ontology (sometimes referred to as alignment).

Several tools now exist to help users find similarities and differences between ontologies, such as ONION [7], Chimaera [8] and PROMPT [6] (see [9] for a comparison of the most prominent ones). In addition, the resulting domain ontology can be enriched with information imported directly from unified terminology servers. However, this activity is not easy, as it does not simply consist of importing all the information from a server into an evolving KB [4]. In many cases, the information must be imported selectively and combined with the domain ontology, possibly by adding extra information. So, again, this activity requires methodologies and tools for developing and integrating ontologies.

4 A scenario: the development of a KB for a clinical guideline

In this section we present a case study: the development of a KB for a specific medical domain. The purpose of the KB is to support an electronic guideline in ophthalmology taken from the American Academy of Ophthalmology (AAO) Web site (http://www.aao.org/aao/education/library/ppp.cfm). We therefore need a methodology general enough for both modeling an ontology (the core of our KB) and extending it with specific details. We have used Protégé-2000 [10], an ontology design and knowledge acquisition tool. For modeling the core of the KB, we have found it very useful to draw on more than one ontology. To simplify the merging of these ontologies, we have chosen PROMPT [6] for several reasons. Firstly, it merges the source ontologies into a resulting ontology. We have found this feature very useful, as it reduces development and merging time, in contrast to tools that only report similarities and differences.
Secondly, it analyzes source concepts, properties (including restrictions on property values) and relationships. Thirdly, it is interactive, allowing the user to accept or reject its suggestions.

We have also reviewed and selected specific concepts from the Unified Medical Language System (UMLS) [11]. This server has been developed and is maintained by the U.S. National Library of Medicine. It is focused on facilitating the development of biomedical systems and the integration of information from different sources. Of all the knowledge sources and tools provided by UMLS, we have used the following:

1. The Metathesaurus, which contains information about a large number of medical concepts, nomenclatures and thesauri.
2. The Semantic Network, which provides a classification of all the concepts represented in the Metathesaurus.
3. The Semantic Navigator, which displays the semantic space related to each UMLS concept.

However, incorporating these specific concepts into our KB is not merely a matter of specializing the core ontology (i.e., of adding instances). In general, it is a development process similar to ontology design.

Next, we summarize the main activities carried out during the development of the KB. Note that we consider only the activities related to building the domain factual model, not the whole knowledge model. The modeling of the implicit algorithm will depend on the specific language chosen to represent the medical guideline. Our main objective is to provide a sufficiently generic KB that may be reused for different purposes, so that it could be used to develop the same electronic medical guideline in different languages.

4.1 Developing the core of the KB

We started to construct the core by developing an initial ontology (referred to as ontoconj-1), which contained only some required concepts from the Semantic Network of UMLS.
Next, we extracted some required parts of the EON ontology [12] (referred to as ontoconj-2). EON is a system for guideline-based medical care. In EON, a core ontology defines the general structure of clinical guidelines. In addition, it includes a set of models containing modeling primitives for representing different types of knowledge in clinical guidelines. For example, the medical domain class provides a core model for representing medical concepts, and the time entity class includes a set of primitives for representing temporal knowledge. We selected and copied these two parts of the EON ontology using PROMPT, which partially automated the extraction process, as described next.

Each time a concept was copied, PROMPT showed a list of suggested operations that should follow from our copy operations. The suggestions mainly consisted of copying all the concepts from the EON ontology that were referred to by any concept in the extracted ontology. At this stage, we therefore had to decide whether or not to include each recommended concept (by accepting the suggestion). If a suggestion is rejected, PROMPT preserves the dangling reference. To remove a dangling reference, we chose between two operations: in some cases, we manually removed the property that produced it; in other cases, we changed the restriction on the property values. In our experience, this process is tedious and error-prone, so we think that it should be automated by the tool.

Next, we merged the two resulting ontologies (ontoconj-1 and ontoconj-2). PROMPT automatically identifies concepts as candidates for merging if they have the same name. In our particular case, this facility is important, as the two source ontologies cover very similar aspects of the same domain and both are taken from the same source, the Semantic Network of the UMLS.
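
As a rough illustration of the two behaviours described above (dangling references left behind by a partial copy, and merge candidates identified by identical names), the following Python sketch models an ontology as a plain dictionary. It is not PROMPT's algorithm or API: the data structure, the function names `dangling_references` and `name_merge_candidates`, and the toy concepts are all assumptions made for illustration.

```python
# Illustrative sketch only: PROMPT's real data model and algorithm are not
# reproduced here. An ontology is ASSUMED to be a dict mapping each concept
# name to the set of concept names that its properties refer to.
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, Set[str]]


def dangling_references(extracted: Ontology) -> List[Tuple[str, str]]:
    """Return (concept, target) pairs whose target is absent from the ontology."""
    return sorted(
        (concept, target)
        for concept, targets in extracted.items()
        for target in targets
        if target not in extracted
    )


def name_merge_candidates(first: Ontology, second: Ontology) -> List[str]:
    """Return the concept names that occur in both ontologies."""
    return sorted(set(first) & set(second))


# Hypothetical fragments standing in for ontoconj-1 and ontoconj-2.
ontoconj_1: Ontology = {
    "Finding": set(),
    "Sign or Symptom": {"Finding"},
}
ontoconj_2: Ontology = {
    "Finding": set(),
    # "Time Entity" is referenced here but was deliberately not copied.
    "Medical Domain Class": {"Finding", "Time Entity"},
}

print(dangling_references(ontoconj_2))
print(name_merge_candidates(ontoconj_1, ontoconj_2))
```

In this toy setting, each reported pair corresponds to a decision of the kind described above: copy the missing concept, remove the offending property, or change its value restriction. Exact name comparison of this kind would miss concepts that are named differently in the two sources, which is one reason an interactive tool is useful.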
Although both draw on the Semantic Network, the two source ontologies were developed independently of each other, so the correspondence between their concepts is not complete. For example, they contain identical concepts with different names (synonyms), their concept hierarchies differ, and they describe similar concepts in different ways. PROMPT identified all these conflicts.

4.2 Specializing the core of the KB

Our strategy for acquiring specific knowledge centered on interviews with an expert, in which the paper version of the medical guideline was analyzed. Each time the expert underlined a medical term, we searched for it in the Metathesaurus. Each Metathesaurus concept belongs to a semantic type (represented by a Semantic Network concept). When found, the concept was imported as a class in Protégé. We reproduced the hierarchical nature of UMLS: each selected Metathesaurus concept was imported as a subclass of the Protégé class representing its semantic type. This process was very tedious, as many similar terms were found and many of them are not defined in the UMLS server. Consequently, the resulting ontology may contain some conceptual mistakes.

Figure 1 shows an example of a concept imported from UMLS. The initial ontology, developed in the previous stage, contained only some concepts from the Semantic Network of UMLS, so we had to import new concepts during this stage. All medical concepts are represented as classes in the Medical Domain Class hierarchy. On the left-hand side, Fig. 1 shows a small part of this hierarchy. Each class in this hierarchy is associated with a Metathesaurus or a Semantic Type concept. In Fig. 1, the class Symptom corresponds to the Metathesaurus concept labeled symptoms<1>. We have added two metaclasses, Metathesaurus Concept Metaclass and Semantic Type Metaclass, which specify the two types of UMLS templates. Each metaclass defines a link to some UMLS concept. For example, in Fig. 1 the class Symptom belongs to the Metathesaurus Concept Metaclass, as shown on the right-hand side.

Fig. 1. A screenshot from Protégé-2000 showing a small part of the medical domain class hierarchy. The class Symptom is shown with some slots on the right and with its associated Metathesaurus concept (symptoms<1>).

In summary, we have followed a hybrid development process. On the one hand, we have searched the UMLS to specialize some parts of our KB (top-down development process). On the other hand, when some Metathesaurus concepts were found useful, we sometimes needed to add new concepts from the Semantic Network in order to classify them, and at other times we needed to group them into more general concepts by defining middle-level concepts (also taken from the Metathesaurus). We carried out this knowledge formalization process following the guide to ontology development by Noy and McGuinness [13].

5 Conclusions

Knowledge reuse is becoming increasingly important in knowledge engineering because of its potential to facilitate the knowledge acquisition process and to endow the resulting system with a common, standard terminology. Several libraries of knowledge sources now exist on the Web, making the reuse approach more attractive. In addition, tools are available for developing ontologies [5, 10, 13] and for merging and combining knowledge sources [6–8]. However, reuse is still a labor-intensive process. Moreover, current methodologies do not guarantee that the resulting KB will itself be reused. To promote this, we should favor a modular KB design that facilitates the future extraction of the needed parts. Another important aspect is to create some type of link between the imported knowledge and the resulting KB. For example, we have represented the imported UMLS concepts separately from the Medical Domain Class.
In this way, when the information contained in UMLS evolves over time, these links will be useful for re-interpreting data and knowledge under the resulting KB by means of the Protégé Axiom Language (PAL).

Acknowledgements: This work has been funded by the Secretaria Xeral de Investigacion e Desenvolvemento da Xunta de Galicia, through the research project XUGA20601A98.

References

1. Musen, M.A.: Modern architectures for intelligent systems: reusable ontologies and problem-solving methods. In: Proc. of the AMIA'98, C.G. Chute (ed.), Orlando, FL (1998) 46-52
2. Guarino, N.: Formal ontology, conceptual analysis and knowledge representation. International Journal of Human-Computer Studies, 43 (1995) 625-40
3. Cohen, P., Chaudhri, V., Pease, A. and Schrag, R.: Does prior knowledge facilitate the development of knowledge-based systems? In: Proc. of the Sixteenth National Conference on Artificial Intelligence (1999) 221-226
4. Li, Q., Shilane, P., Noy, N.F. and Musen, M.A.: Ontology Acquisition from Online Knowledge Sources. In: Proc. of the AMIA Annual Symposium, Los Angeles, CA (2000)
5. Gomez-Perez, A.: Knowledge Sharing and Reuse. In: Liebowitz (ed.): The Handbook on Applied Expert Systems. CRC Press (1998)
6. Noy, N.F. and Musen, M.A.: PROMPT: Algorithm and tool for automated ontology merging and alignment. In: Proc. of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX (2000)
7. Mitra, P., Wiederhold, G. and Kersten, M.: A graph-oriented model for articulation of ontology interdependencies. In: Proc. Conference on Extending Database Technology 2000 (EDBT'2000), Konstanz, Germany (2000)
8. McGuinness, D.L., Fikes, R., Rice, J. and Wilder, S.: An environment for merging and testing large ontologies. In: Proc. of the Seventh International Conference (KR'2000). Morgan Kaufmann Publishers, San Francisco (2000)
9. Noy, N.F. and Musen, M.A.: Evaluating Ontology-Mapping Tools: Requirements and Experience. In: Proc. of the Workshop on Evaluation of Ontology Tools at EKAW'02 (EON2002), Sigüenza, Spain (2002)
10. Gennari, J., Musen, M.A., Fergerson, R.W., Grosso, W.E., Crubezy, M., Eriksson, H., Noy, N.F. and Tu, S.W.: The Evolution of Protégé: An Environment for Knowledge-Based Systems Development. International Journal of Human-Computer Interaction (in press, 2002). http://protege.stanford.edu
11. National Library of Medicine: Unified Medical Language System, Bethesda, MD (2001). http://umlsks.nlm.nih.gov
12. Tu, S.W. and Musen, M.A.: Modeling Data and Knowledge in the EON Guideline Architecture. MedInfo, London, UK (2001). http://www.smi.stanford.edu/projects/eon/
13. Noy, N.F. and McGuinness, D.L.: Ontology Development 101: a guide for creating your first ontology. SMI Technical Report SMI-2001-0880. http://www.stanford.edu