Data Access and Integration in the ISPIDER Proteomics Grid

  • Conference paper
Data Integration in the Life Sciences (DILS 2006)

Abstract

Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are being developed rapidly now that affordable, reliable methods for studying the proteome have emerged. The protein identifications produced by these methods are held in multiple repositories, which need to be integrated to provide uniform access to them. A number of technologies exist that enable these resources to be accessed in a Grid environment, but because the resources are developed independently, significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture that supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.
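The core integration problem in the abstract is giving uniform access to autonomous repositories whose schemas differ. The toy Python sketch below illustrates that idea only. It is not the OGSA-DAI, OGSA-DQP or AutoMed API and does not reproduce the paper's method. The source names, field names, the `ProteinHit` global-schema record and the helper functions are all hypothetical. Each source keeps its own local schema, a per-source mapping translates its records into a shared global form, and queries are posed against the union of the translated records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ProteinHit:
    """Hypothetical global-schema record: one protein identification."""
    accession: str
    source: str


# Two hypothetical autonomous sources whose local schemas use different field names.
SOURCE_A = [{"prot_acc": "P12345"}, {"prot_acc": "Q67890"}]
SOURCE_B = [{"accession_no": "P12345"}, {"accession_no": "O11111"}]

# Hypothetical per-source mappings from local records to the global schema.
MAPPINGS: Dict[str, Callable[[dict], ProteinHit]] = {
    "source_a": lambda r: ProteinHit(r["prot_acc"], "source_a"),
    "source_b": lambda r: ProteinHit(r["accession_no"], "source_b"),
}


def global_extent(sources: Dict[str, Iterable[dict]]) -> List[ProteinHit]:
    """Return the union of all sources, each record translated to the global schema."""
    hits: List[ProteinHit] = []
    for name, records in sources.items():
        to_global = MAPPINGS[name]
        hits.extend(to_global(r) for r in records)
    return hits


def sources_reporting(accession: str, sources: Dict[str, Iterable[dict]]) -> List[str]:
    """Answer a global-schema query: which sources report this accession?"""
    return sorted({h.source for h in global_extent(sources) if h.accession == accession})


if __name__ == "__main__":
    sources = {"source_a": SOURCE_A, "source_b": SOURCE_B}
    print(sources_reporting("P12345", sources))  # ['source_a', 'source_b']
```

In the architecture the abstract describes, access to each source would go through OGSA-DAI and query evaluation would be distributed by OGSA-DQP, rather than running over in-memory lists as in this sketch.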

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zamboulis, L. et al. (2006). Data Access and Integration in the ISPIDER Proteomics Grid. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_3

  • DOI: https://doi.org/10.1007/11799511_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
