C. Deaton

Semantic Web technologies offer the potential to revolutionize management of health care data by increasing interoperability and reusability while reducing the need for redundant data collection and storage. From 1998 through 2010, Cleveland Clinic sponsored a project designed to explore and develop this potential. The product of this effort, SemanticDB, is a suite of software tools and knowledge resources built to facilitate the collection, storage and use of the diverse data needed to conduct clinical research and health care quality reporting. SemanticDB consists of three main components: 1) a content repository driven by a meta-model that facilitates collection and integration of data in an XML format and automatically converts the data to RDF; 2) an inference-mediated, natural language query interface designed to identify patients who meet complex inclusion and exclusion criteria; and 3) a data production pipeline that uses inference to generate customized views of the repository content for statistical analysis and reporting. Since 2008, this system has been used by the Cleveland Clinic's Heart and Vascular Institute to support numerous clinical investigations, and in 2009 Cleveland Clinic was certified to submit data produced in this manner to national quality monitoring databases sponsored by the Society of Thoracic Surgeons and the American College of Cardiology.
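
As a rough illustration of the first component (XML collected under a meta-model and converted automatically to RDF), the sketch below turns a small XML fragment into N-Triples using only the Python standard library. It is not SemanticDB's converter: the `http://example.org/patient-record#` namespace, the element names, the `id` attribute and the `xml_to_ntriples` helper are all assumptions made for the example, and every field is emitted as a plain string literal.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; the real patient record ontology IRI is not given in the abstract.
BASE = "http://example.org/patient-record#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_ntriples(xml_text):
    """Map each child of the root to a typed subject and each of its
    sub-elements to a (subject, predicate, literal) triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root:
        # Assumes every record element carries an 'id' attribute.
        subject = f"<{BASE}{record.tag}/{record.attrib['id']}>"
        triples.append(f"{subject} {RDF_TYPE} <{BASE}{record.tag}> .")
        for field in record:
            value = (field.text or "").strip()
            triples.append(
                f'{subject} <{BASE}{field.tag}> "{escape_literal(value)}" .'
            )
    return triples


# Invented sample data, for illustration only.
SAMPLE_XML = """<records>
  <Patient id="p1">
    <procedure>CABG</procedure>
    <age>67</age>
  </Patient>
</records>"""

if __name__ == "__main__":
    for line in xml_to_ntriples(SAMPLE_XML):
        print(line)
```

A production converter would be driven by the meta-model rather than by element names, and would assign datatypes and link records to ontology classes instead of emitting untyped literals.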
The ability to identify patient populations for clinical research according to specific criteria is a critical component of clinical informatics. Traditionally, systems built on relational databases are used to store and manage the data targeted for population identification. The process of starting with an abstract research question (posed in prose) and distilling it into the specific query plans and corresponding data structures needed to answer it is often tedious, and it is driven primarily by the idiosyncrasies of how the data is stored rather than by the underlying science that guides the researcher's intuition. Contemporary clinical data management systems have yet to move beyond this approach.

Cleveland Clinic and Cycorp Inc. have developed a state-of-the-art query interface that builds query fragments through natural language-driven interactions with an investigator. This interface targets a clinical research repository composed of patient record content expressed as an RDF dataset that conforms to a patient record OWL ontology and is queried over the SPARQL service protocol. It leverages the Cyc common sense ontology, Cyc's natural language processing capabilities, formalized mappings from the common sense ontology into the patient record ontology, and the availability of a SPARQL service for querying the patient population.
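
To make the notion of query fragments concrete, here is a minimal sketch of how inclusion and exclusion criteria, once mapped to triple patterns, might be assembled into one SPARQL query. It does not reproduce the Cyc-based interface: the `pr:` prefix, the class and property names, and the `build_query` helper are hypothetical, and the use of SPARQL 1.1 `FILTER NOT EXISTS` for exclusion is an assumption of this example rather than a description of the system.

```python
# Hypothetical prefix; the actual patient record ontology IRI is not given in the abstract.
PREFIX = "PREFIX pr: <http://example.org/patient-record#>"


def build_query(include, exclude):
    """Combine triple-pattern fragments into a single SELECT query.

    include: patterns every matching patient must satisfy.
    exclude: patterns that must NOT hold for a matching patient.
    """
    lines = [
        PREFIX,
        "SELECT DISTINCT ?patient WHERE {",
        "  ?patient a pr:Patient .",
    ]
    for pattern in include:
        lines.append(f"  {pattern}")
    for pattern in exclude:
        # Each exclusion criterion becomes a negated sub-pattern.
        lines.append(f"  FILTER NOT EXISTS {{ {pattern} }}")
    lines.append("}")
    return "\n".join(lines)


# Invented criteria: patients who had a CABG procedure and have no diabetes diagnosis.
query = build_query(
    include=["?patient pr:hadProcedure pr:CABG ."],
    exclude=["?patient pr:hasDiagnosis pr:Diabetes ."],
)

if __name__ == "__main__":
    print(query)
```

The resulting string could be sent to any SPARQL endpoint. In the interface described here, the fragments come from the natural language dialogue and the ontology mappings rather than from hand-written patterns.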

In this session, we describe the specifics of this framework, enumerate its strengths and weaknesses, and derive insights into the opportunities and challenges of adopting Semantic Web technologies for managing and identifying patient populations.
Medical records come in many forms, including enterprise-wide electronic health records mandated by the 2009 federal stimulus package, lab and clinic-based data management systems, administrative databases for scheduling, billing and inventory control, and secondary databases for clinical research and quality reporting. Making meaningful use of these disparate data sources to improve the cost and quality of patient care turns on our ability to accurately model their complex semantics and exploit these semantics through machine processing.

Since 2003, Cleveland Clinic has applied semantic technologies to making use of diverse data derived from the treatment of over 200,000 patients with cardiovascular disease over the past 30 years. This presentation outlines the practical issues and challenges faced, and describes how some of these issues have been overcome while others remain unsolved. Topics covered include:

domain ontology development and utilization
linking domain ontologies to specific instance data models
data definition, validation and aggregation
data utilization for health care quality analysis and reporting