Abstract
Data-intensive workflows process and produce large volumes of data. The volume of data and the number of workflow participants and activities can range from small to very large. The traditional way of logging the experimental process is no longer adequate at this scale. This has created a need for techniques that automatically collect information about workflows, known as provenance. Several solutions for e-Science provenance have been proposed, but these are predominantly domain and application specific. In this chapter, the requirements of e-Science provenance systems are first clearly defined, and then a novel solution that satisfies these requirements, the Vienna e-Science Provenance System (VePS), is proposed. VePS not only promises to be lightweight and independent of the workflow enactment engine, domain, and application, but also measures the significance of workflow parameters using the Ant Colony Optimization metaheuristic. Major contributions include: (1) an interoperable provenance system, (2) quantification of parameter significance, and (3) generation of executable workflow documents.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Khan, F.A., Brezany, P. (2011). Provenance Support for Data-Intensive Scientific Workflows. In: Fiore, S., Aloisio, G. (eds) Grid and Cloud Database Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20045-8_11
DOI: https://doi.org/10.1007/978-3-642-20045-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20044-1
Online ISBN: 978-3-642-20045-8
eBook Packages: Computer Science (R0)