Abstract
Large volumes of data in data warehouses are increasingly processed on cluster architectures to achieve scalability. In this paper we look into the problem of ad hoc star join query processing in cluster architectures. We propose a new technique, the Star Hash Join (SHJ), which exploits a combination of multiple bit filter strategies in such architectures. SHJ generalizes the Pushed Down Bit Filters technique to clusters. The objectives of the technique are to reduce (i) the amount of data communicated, (ii) the amount of data spilled to disk during the execution of intermediate joins in the query plan, and (iii) the amount of memory used by auxiliary data structures such as bit filters.
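The abstract does not spell out how SHJ works, so the following is only a minimal sketch of the general bit filter idea it builds on, not the authors' algorithm. It assumes a single-hash bit filter built per dimension table from the keys that survive that dimension's restriction, and applies all filters to fact rows before any row would be communicated or joined. The names `BitFilter`, `build_filter`, and `prefilter_fact`, the table contents, and the filter size are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


class BitFilter:
    """Single-hash bit filter: false positives possible, no false negatives."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _pos(self, key: int) -> int:
        # Assumed hash choice; any uniform hash over the join keys would do.
        return hash(key) % self.num_bits

    def add(self, key: int) -> None:
        p = self._pos(key)
        self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: int) -> bool:
        p = self._pos(key)
        return bool(self.bits[p // 8] & (1 << (p % 8)))


def build_filter(
    dim_rows: Iterable[Tuple[int, str]],
    predicate: Callable[[str], bool],
    num_bits: int,
) -> BitFilter:
    """Set one bit for each dimension key whose attribute passes the restriction."""
    bf = BitFilter(num_bits)
    for key, attr in dim_rows:
        if predicate(attr):
            bf.add(key)
    return bf


def prefilter_fact(
    fact_rows: Iterable[Sequence], filters: Sequence[BitFilter]
) -> List[Sequence]:
    """Keep fact rows whose leading foreign keys pass every dimension filter.

    Rows dropped here never need to be sent to another node or spilled to disk.
    """
    return [
        row
        for row in fact_rows
        if all(f.may_contain(k) for f, k in zip(filters, row))
    ]


# Hypothetical star schema: two dimensions and one fact table.
customers = [(1, "EU"), (2, "US"), (3, "EU")]
products = [(10, "toy"), (11, "book")]
# Fact rows: (customer_key, product_key, amount)
sales = [(1, 11, 5.0), (2, 11, 7.0), (3, 10, 1.0), (3, 11, 2.0)]

filters = [
    build_filter(customers, lambda region: region == "EU", num_bits=64),
    build_filter(products, lambda kind: kind == "book", num_bits=64),
]
kept = prefilter_fact(sales, filters)
print(kept)  # [(1, 11, 5.0), (3, 11, 2.0)]
```

Because a bit filter can report false positives, the surviving rows would still have to go through the actual joins; the filter only reduces how many rows reach them. Its size trades memory against false-positive rate, which is the tension behind objective (iii).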
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aguilar-Saborit, J., Muntés-Mulero, V., Zuzarte, C., Larriba-Pey, JL. (2005). Ad Hoc Star Join Query Processing in Cluster Architectures. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_20
DOI: https://doi.org/10.1007/11546849_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6