Abstract
The similarity join finds all pairs of similar objects in a large collection. This search problem could be successfully divided into many sub-problems by an algorithm called Quickjoin recently. Besides, this algorithm could be extended to a wide range of application areas as it is based on metric spaces instead of vector spaces only. When the volume of a dataset reaches to a certain degree or the distance measure of the similarity is complex enough, however, Quickjoin still takes much time to accomplish the similarity join task, which leads us to develop a parallel version of the algorithm. In this paper, we present two parallel versions of the Quickjoin algorithm exploiting multi-core and many-core processors respectively as well as evaluate them. Experiments show that the parallelization of this algorithm in our many-core processor yields speedup to 22 at most compared with its non-parallel version.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Levenshtein, V.I.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10 (1966)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Advances in Database Systems, vol. 32. Springer (2006)
Jacox, E.H., Samet, H.: Metric Space Similarity Joins. ACM Transaction on Database Systems 33(2), 1–38 (2008)
Shim, K., Srikant, R., Agrawal, R.: High-Dimensional Similarity Joins. IEEE Transactions on Knowledge and Data Engineering 14(1), 156–171 (2002)
Böhm, C., Braunmüller, B., Krebs, F., Kriegel, H.-P.: Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data. In: SIGMOD, pp. 379–388 (2001)
Lieberman, M.D., Sankaranarayanan, J., Samet, H.: A Fast Similarity Join Algorithm Using Graphics Processing Units. In: ICDE, pp. 1111–1120 (2008)
Faloutsos, C., Lin, K.: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. In: SIGMOD, pp. 163–174 (1995)
Dohnal, V., Gennaro, C., Zezula, P.: Similarity Join in Metric Spaces Using eD-Index. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 484–493. Springer, Heidelberg (2003)
CUDA C Programming Guide: CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
OpenMP: An API for multi-platform shared-memory parallel programming in C/C++ and Fortran, http://www.openmp.org/
Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The Design of OpenMP Tasks. IEEE Transactions on Parallel and Distributed Systems 20(3), 404–418 (2009)
Yianilos, P.N.: Data Structures and Algorithms for Nearest Neighbour Search in General Metric Spaces. In: SODA, pp. 311–321 (1993)
Dynamic Parallelismin CUDA, http://developer.download.nvidia.com/assets/cuda/docs/TechBrief_Dynamic_Parallelism_in_CUDA_v2.pdf
CUDA C/C++ Streams and Concurrency, http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, S., Kim, O., Feng, W. (2013). Accelerating Metric Space Similarity Joins with Multi-core and Many-core Processors. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-39640-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer ScienceComputer Science (R0)