[go: up one dir, main page]

skip to main content
10.1145/253260.253263acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free access

Fast parallel similarity search in multimedia databases

Published: 01 June 1997 Publication History

Abstract

Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods to not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique - in contrast to other declustering methods - guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.

References

[1]
Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D.J.: 'A Basic Local Alignment Search Tool', Journal of Molecular Biology, Vol. 215, No. 3, 1990, pp. 403-410.]]
[2]
Arya S.: "Nearest Neighbor Searching and Applications', Ph.D. thesis, University of Maryland, College Park, MD, 1995.]]
[3]
Biggs N.L.: 'Discrete Mathematics', Oxford Science Publications, Clarendon Press-Oxford, 1989, pp. 172-176.]]
[4]
Berchtold S., B6hm C., Keim D., Kriegel H.-P.: 'A Cost Model For Nearest Neighbor Search in High- Dimensional Data Space', ACM PODS Symposium on Pricinples of Database Systems, 1997, Tucson, Arizona.]]
[5]
Berchtold S., Keim D., Kriegel H.-P.: 'The X-tree: An Index Structure for High-Dimensional Data', 22nd Conf. on Very Large Databases, 1996, Bombay, India, pp. 28-39.]]
[6]
Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles ', Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, 1990, pp. 322-331.]]
[7]
Du H.C., Sobolewski J.S.: 'Disk allocation for cartesian product files on multiple Disk systems', ACM TODS, Journal of Transactions on Database Systems, 1982, pp. 82-101.]]
[8]
Faloutsos C., Barber R., Flickner M., Hafner J., et al.: 'Efficient and Effective Querying by Image Content', Journal of Intelligent Information Systems, 1994, Vol. 3, pp. 231-262.]]
[9]
Faloutsos C., Bhagwat P.: 'Declustering Using Fractals', PDIS Journal of Parallel and Distributed Information Systems, 1993, pp. 18-25.]]
[10]
Friedman J. H., Bentley J. L., Finkel R. A.: 'An Algorithm for Finding Best Matches in Logarithmic Expected Time', ACM Transactions on Mathematical Software, Vol. 3, No. 3, September I977, pp. 209-226.]]
[11]
Hjaltason G. R., Samet H.: 'Ranking in Spatial Databases', Proc. 4th Int. Symp. on Large Spatial Databases, Portland, ME, 1995, pp. 83-95.]]
[12]
Jagadish H. V.: 'A Retrieval Technique for Similar Shapes' Proc. ACM SIGMOD Int. Conf. on Management of Data, 1991, pp. 208-217.]]
[13]
Kukich K.: 'Techniques for Automatically Correcting Words in Text', ACM Computing Surveys, Vol. 24, No. 4, 1992, pp. 377-440.]]
[14]
Kim M.H., Pramanik S.: ' Optimal file distribution for partial match retrieval', Proc. ACM SIGMOD Int. Conf. on Management of Data, 1988, pp. 173-182.]]
[15]
Lin K., Jagadish H. V., Faloutsos C.: 'The TV-tree: An Index Structure for High-Dimensional Data ', VLDB Journal, Vol. 3, pp. 517-542, 1995.]]
[16]
Mehrotra R., Gary J.: 'Feature-Based Retrieval of Similar Shapes', Proc. 9th Int. Conf. on Data Engeneering, April 1993]]
[17]
Mehrotra R., Gary J.: 'Feature-lndex-Based Sililar Shape retrieval', Proc. of the 3rd Working Conf. on Visual Database Systems, March 1995]]
[18]
Preparata F.P., Shamos M. I.: 'Computational Geometry', Chapter 5 ('Proximity: Fundamental Algorithms'), Springer Verlag New York, 1985, pp. 185-225.]]
[19]
Roussopoulos N., Kelley S., Vincent F.: 'Nearest Neighbor Queries', Proc. ACM SIGMOD Int. Conf. on Management of Data, 1995, pp. 71-79.]]
[20]
Ramasubramanian V., Paliwal K. K.: 'Fast k- Dimensional Tree Algorithms for Nearest Neighbor Search with Application to Vector Quantization Encoding', IEEE Transactions on Signal Processing, Vol. 40, No. 3, March 1992, pp. 518-531.]]
[21]
Shoichet B. K., Bodian D. L., Kuntz 1. D.: 'Molecular Docking Using Shape Descriptors', Journal of Computational Chemistry, Vol. 13, No. 3, 1992, pp. 380-397.]]
[22]
Shawney H., Hafner J.: "Efficient Color Histogram hzdexing', Proc. Int. Conf. on Image Processing, 1994, pp. 66-70.]]
[23]
Welch T.: 'Bounds on the Information Retrieval Efficiency of Static File Structures', Technical Report 88, MIT, 1971.]]
[24]
Wallace T., Wintz P.: 'An Efficient Three- Dimensional Aircraft Recognition Algorithm Using Normalized Fourier Descriptors ', Computer Graphics and Image Processing, Vol. ! 3, pp. 99-126, 1980]]

Cited By

View all
  • (2022)Data Allocation with Neural Similarity Estimation for Data-Intensive ComputingComputational Science – ICCS 202210.1007/978-3-031-08757-8_45(534-546)Online publication date: 15-Jun-2022
  • (2019)Data Allocation Service ADAS for the Data Rebalancing of ATLASEPJ Web of Conferences10.1051/epjconf/201921406012214(06012)Online publication date: 17-Sep-2019
  • (2019)Hollow-tree: a metric access method for data with missing valuesJournal of Intelligent Information Systems10.1007/s10844-019-00567-8Online publication date: 9-Jul-2019
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
June 1997
594 pages
ISBN:0897919114
DOI:10.1145/253260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1997

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS97
Sponsor:

Acceptance Rates

SIGMOD '97 Paper Acceptance Rate 42 of 202 submissions, 21%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)84
  • Downloads (Last 6 weeks)30
Reflects downloads up to 11 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Data Allocation with Neural Similarity Estimation for Data-Intensive ComputingComputational Science – ICCS 202210.1007/978-3-031-08757-8_45(534-546)Online publication date: 15-Jun-2022
  • (2019)Data Allocation Service ADAS for the Data Rebalancing of ATLASEPJ Web of Conferences10.1051/epjconf/201921406012214(06012)Online publication date: 17-Sep-2019
  • (2019)Hollow-tree: a metric access method for data with missing valuesJournal of Intelligent Information Systems10.1007/s10844-019-00567-8Online publication date: 9-Jul-2019
  • (2018)Multidimensional range queries on modern hardwareProceedings of the 30th International Conference on Scientific and Statistical Database Management10.1145/3221269.3223031(1-12)Online publication date: 9-Jul-2018
  • (2018)A Tree-Based Graph Coloring Algorithm Using Independent SetProgress in Advanced Computing and Intelligent Engineering10.1007/978-981-13-0224-4_48(537-546)Online publication date: 10-Jul-2018
  • (2018)Data Allocation Based on Evolutionary Data Popularity ClusteringComputational Science – ICCS 201810.1007/978-3-319-93698-7_12(153-166)Online publication date: 12-Jun-2018
  • (2017)Efficient Parallel Processing for KNN QueriesProceedings of the 2017 International Conference on Industrial Design Engineering10.1145/3178264.3178289(88-94)Online publication date: 29-Dec-2017
  • (2017)DBSCAN Revisited, RevisitedACM Transactions on Database Systems10.1145/306833542:3(1-21)Online publication date: 31-Jul-2017
  • (2016)High Performance Parallel Graph Coloring on GPGPUs2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW.2016.11(845-854)Online publication date: May-2016
  • (2016)Efficient and high‐quality sparse graph coloring on GPUsConcurrency and Computation: Practice and Experience10.1002/cpe.406429:10Online publication date: 22-Dec-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media