Data Access and Integration in the ISPIDER Proteomics Grid

Lucas Zamboulis,
Hao Fan,
Khalid Belhajjame,
Jennifer Siepen,
Andrew Jones,
Nigel Martin,
Alexandra Poulovassilis,
Simon Hubbard,
Suzanne M. Embury &
Norman W. Paton

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Included in the following conference series:

International Workshop on Data Integration in the Life Sciences

Abstract

Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories, which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.

Author information

Authors and Affiliations

School of Computer Science and Information Systems, Birkbeck, University of London
Lucas Zamboulis, Hao Fan, Nigel Martin & Alexandra Poulovassilis
Department of Biochemistry and Molecular Biology, University College London
Lucas Zamboulis & Hao Fan
Faculty of Life Sciences, University of Manchester
Khalid Belhajjame, Jennifer Siepen, Andrew Jones & Simon Hubbard
School of Computer Science, University of Manchester
Suzanne M. Embury & Norman W. Paton

Editor information

Editors and Affiliations

Humboldt-Universität zu Berlin
Ulf Leser
Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Felix Naumann
IBM Application and Integration Middleware, 1475 Phoenixville Pike, 19380, West Chester, PA, USA
Barbara Eckman

About this paper

Cite this paper

Zamboulis, L. et al. (2006). Data Access and Integration in the ISPIDER Proteomics Grid. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_3

DOI: https://doi.org/10.1007/11799511_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)
