2010 was a good year for the Semantic Web, which gained momentum especially in health care and the life sciences. Here are just a few of the reasons. Semantic Web and Linked Data have become a clear choice for projects with a special interest in interoperability and data sharing across enterprises and between project partners. These include the European Innovative Medicines Initiative (IMI), in which a number of projects will want to share results and information across several domains, including drug discovery, electronic patient records, clinical trials, and tissue banking. The Pistoia Alliance SESL project has also moved things forward, showing how to add value to data by linking microarray data with claims in the literature and other public-domain resources. Great news: with another 5 years of NIH funding, NCBO is well positioned to contribute to collaborative science and translational research.
Biobanking gives the Semantic Web an ideal resource-sharing application, one that makes the added value of linked data more concrete for bench scientists, who understand the immediate need to share tissue libraries in order to move certain types of research forward. David Cox (Pfizer), a keynote speaker at BBMRI meetings in the Netherlands and at the Life Sciences Momentum 2010 conference, presents a business case for biobanking, grounded in a solid research strategy that benefits both pharmaceutical companies and their academic partners. In the strategy Cox outlines, a transparent policy on public-domain knowledge and intellectual property is key to making biobanking and drug development work well for academic and pharmaceutical partners alike. Although Cox does not focus on data sharing in his presentation, data sharing plays an implicit but central role in biobanking and in matching biobanking resources with drug development goals. Please see the interview with David Cox, "Collaborate in order to understand the biology".
In the HCLS Interest Group, we have made progress on several fronts. We advanced an approach to creating semantic views on distributed data sources, including both relational databases and triplestores (see the SWObjects federated query tutorial at SWAT4LS). We published on how we applied the Translational Medicine Ontology to patient records in Indivo format and were able to pose cross-domain queries on the results. We demonstrated and published a federated approach to microarray study results in RDF at the Provenance Workshop at ISWC. We further defined several ontologies for describing discourse and annotations. We are also documenting best practices for linked data and checking the correspondence of radiology and pathology reports for breast cancer, with more publications, demonstrations, and W3C notes on the way. Look for a Best Practices document from the Linked Open Drug Data task force and a specification of microarray RDF practices from the BioRDF and Scientific Discourse task forces in the coming months.
Did I mention that I was impressed by TripleMap? It lets you create maps of interest from Linked Open Drug Data in a very nice user interface geared toward essential concepts such as compounds, diseases, and genes.