Abstract
This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: Multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over multisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions.
Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) representations of multisets, specifically for Cartesian products, for facilitating dynamic symbolic computation, which intersperses algebraic simplification steps with conventional data processing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins.
Full source code for GMP in Haskell, which is based on generic top-down discrimination (not included), is included for experimentation. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engineering, is a realistic alternative to SQL even for SQL-expressible queries.
Similar content being viewed by others
References
Atkinson, M.P., Buneman, O.P.: Types and persistence in database programming languages. ACM Comput. Surv. 19(2), 105–170 (1987). http://doi.acm.org/10.1145/62070.45066
Backhouse, R.: An exploration of the Bird-Meertens formalism. In: STOP Summer School on Constructive Algorithmics (1989)
Bentley, J.: Programming pearls: Little languages. Commun. ACM 29(8), 711–721 (1986). http://doi.acm.org/10.1145/6424.315691
Breazu-Tannen, V., Buneman, P., Naqvi, S.: Structural recursion as a query language. In: DBPL3: Proceedings of the Third International Workshop on Database Programming Languages: Bulk Types & Persistent Data, pp. 9–19. Morgan Kaufmann, San Francisco (1992)
Buneman, P., Nikhil, R., Frankel, R.: A practical functional programming system for databases. In: FPCA’81: Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, pp. 195–202. ACM, New York (1981). http://doi.acm.org/10.1145/800223.806779
Buneman, P., Frankel, R.E., Nikhil, R.: An implementation technique for database query languages. ACM Trans. Database Syst. 7(2), 164–186 (1982). http://doi.acm.org/10.1145/319702.319711
Buneman, P., Naqvi, S., Tannen, V., Wong, L.: Principles of programming with complex objects and collection types. In: ICDT’92: Selected Papers of the Fourth International Conference on Database Theory. Elsevier, Amsterdam, pp. 3–48 (1995). doi:10.1016/0304-3975(95)00024-Q
Chamberlin, D.: SQL. In: Encyclopedia of Database Systems, 1st edn. Springer, Berlin (2009)
Chaudhuri, S.: An overview of query optimization in relational systems. In: PODS, pp. 34–43. ACM Press, New York (1998)
Cheney, J., Hinze, R.: First-class phantom types. CUCIS TR2003-1901. Cornell University (2003)
Cooper, E., Lindley, S., Wadler, P., Yallop, J.: Links: Web programming without tiers. In: Proceedings of the 5th International Conference on Formal Methods for Components and Objects, pp. 266–296. Springer, Berlin (2006)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: OSDI, pp. 137–150 (2004)
Deshpande, A., Ives, Z., Raman, V: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). doi:10.1561/1900000001
Eini, O.: The pain of implementing LINQ providers. Commun. ACM 54(8), 55–61 (2011). http://doi.acm.org/10.1145/1978542.1978556
Finland, S.: World database from MySQL. Available as example database from MySQL.com (2010). http://downloads.mysql.com/docs/world.sql.zip, the sample data used in the world database is Copyright Statistics Finland, http://www.stat.fi/worldinfigures
GHC Team: The Glasgow Haskell Compiler (2011). http://www.haskell.org/ghc/
Goyal, D., Paige, R.: The formal reconstruction and speedup of the linear time fragment of Willard’s relational calculus subset. In: Proceedings of the IFIP TC 2 WG 2.1 International Workshop on Algorithmic Languages and Calculi, pp. 382–414. Chapman & Hall, London (1997)
Graefe, G.: Query evaluation techniques for large databases. ACM Comput. Surv. 25(2), 73–169 (1993). http://doi.acm.org/10.1145/152610.152611
Grust, T., Sakr, S., Teubner, J.: XQuery on SQL hosts. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, p. 263. VLDB Endowment (2004)
Grust, T., Mayr, M., Rittinger, J., Schreiber, T.: Ferry: Database-supported program execution. In: Çetintemel, U., Zdonik, S.B., Kossmann, D., Tatbul, N. (eds.) SIGMOD Conference, pp. 1063–1066. ACM, New York (2009)
Grust, T., Rittinger, J., Schreiber, T.: Avalanche-safe LINQ compilation. Proc. VLDB Endow. 3, 162–172 (2010). http://dl.acm.org/citation.cfm?id=1920841.1920866
Henglein, F.: Generic discrimination: Sorting and partitioning unshared data in linear time. In: Hook, J., Thiemann, P. (eds.) ICFP’08: Proceeding of the 13th ACM SIGPLAN International Conference on Functional Programming, pp. 91–102. ACM, New York (2008). http://doi.acm.org/10.1145/1411204.1411220
Henglein, F.: Generic top-down discrimination for sorting and partitioning in linear time. Tech. rep., Department of Computer Science, University of Copenhagen (DIKU), submitted to Journal of Functional Programming (JFP) (2010)
Henglein, F.: Optimizing relational algebra operations using discrimination-based joins and lazy products. In: Proc ACM SIGPLAN 2010 Workshop on Partial Evaluation and Program Manipulation, pp. 73–82. ACM, New York (2010). http://doi.acm.org/10.1145/1706356.1706372, also DIKU TOPPS D-report no. 611
Henglein, F., Larsen, K.F.: Generic multiset programming for language-integrated querying. In: Proc. 6th Workshop on Generic Programming (WGP) (2010)
Hudak, P., Fasel, J.H.: A gentle introduction to Haskell Version 98. http://www.haskell.org/tutorial, online tutorial (1999)
Jain, N., Mishra, S., Srinivasan, A., Gehrke, J., Widom, J., Balakrishnan, H., Çetintemel, U., Cherniack, M., Tibbetts, R., Zdonik, S.: Towards a streaming sql standard. Proc. VLDB Endow. 1(2), 1379–1390 (2008)
Leijen, D., Meijer, E.: Domain-specific embedded compilers. In: Proceedings of the 2nd Conference on Domain-Specific Languages, USENIX Association Berkeley, CA, USA, Austin, Texas, pp. 109–122 (1999)
Lindley, S., Wadler, P., Yallop, J., Cooper, E.: Links: Linking theory to practice for the web (2009). http://groups.inf.ed.ac.uk/links/
Meijer, E., Beckman, B., Bierman, G.: LINQ: Reconciling objects, relations, and XML in the .NET Framework. In: Proc. SIGMOD (2006)
Ohori, A.: Orderings and types in databases. In: Bancilhon, F., Buneman, P. (eds.) Advances in Database Programming Languages, Frontier Series, pp. 97–116. Addison-Wesley, ACM Press, Reading, New York (1990), Chap. 6
Ohori, A., Buneman, P.: Type inference in a database programming language. In: Proc. 1988 ACM Conf. on LISP and Functional Programming, pp. 174–183. ACM Press, New York (1988)
O’Sullivan, B.: Criterion package (2010). Available from http://hackage.haskell.org/package/criterion
Peyton Jones, S.: The Haskell 98 language. J. Funct. Program. 13(1), 1–146 (2003)
Peyton Jones, S., Wadler, P.: Comprehensive comprehensions. In: Proc. 2007 Haskell Workshop, Freiburg, Germany (2007)
Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. SIGMOD’79, pp. 23–34. ACM, New York (1979). http://doi.acm.org/10.1145/582095.582099
Skillicorn, D.: Architecture-independent parallel computation, presented at IFIP WG2.1 conference, Burton Manor, England (1990)
Steele, G.: Organizing functional code for parallel execution; or, foldl and foldr considered slightly harmful. In: Proc. ICFP 2009 (2009), invited talk
Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: A column-oriented DBMS. In: Proc. 31st VLDB Conf. Trondheim, Norway (2005)
Tannen, V., Buneman, P., Wong, L.: Naturally embedded query languages. In: ICDT’92: Proceedings of the 4th International Conference on Database Theory, pp. 140–154. Springer, London (1992)
Trinder, P., Wadler, P.: List comprehensions and the relational calculus. In: Proceedings of the 1988 Glasgow Workshop on Functional Programming, Rothesay, Scotland, pp. 115–123 (1988)
Trinder, P., Wadler, P.: Improving list comprehension database queries. In: TENCON’89. Fourth IEEE Region 10 International Conference, Bombay, India, pp. 186–192. IEEE, New York (1989). doi:10.1109/TENCON.1989.176921
Wadler, P.: Comprehending monads. Math. Struct. Comput. Sci. 2, 461–493 (1992)
Willard, D.E.: Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci. 52(1), 157–169 (1996). doi:10.1006/jcss.1996.0012
Willard, D.E.: An algorithm for handling many relational calculus queries efficiently. J. Comput. Syst. Sci. 65(2), 295–331 (2002). doi:10.1006/jcss.2002.1848
Wong, L.: Querying nested collections. Ph.D. thesis, University of Pennsylvania (1994)
Wong, L.: Kleisli, a functional query system. J. Funct. Program. 10(1):19–56 (2000)
Xi, H., Chen, C., Chen, G.: Guarded recursive datatype constructors. In: Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 224–235. ACM, New York (2003)
Author information
Authors and Affiliations
Corresponding author
Additional information
This material is based upon work supported by the Danish Strategic Research Council under Project HIPERFIT (hiperfit.dk) and by the Danish National Advanced Technology Foundation under Project 3gERP.
Rights and permissions
About this article
Cite this article
Henglein, F., Larsen, K.F. Generic multiset programming with discrimination-based joins and symbolic Cartesian products. Higher-Order Symb Comput 23, 337–370 (2010). https://doi.org/10.1007/s10990-011-9078-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10990-011-9078-8