[go: up one dir, main page]

skip to main content
research-article

Schism: a workload-driven approach to database replication and partitioning

Published: 01 September 2010 Publication History

Abstract

We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner).
The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%.

References

[1]
S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, 2004.
[2]
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. In OSDI, 2006.
[3]
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. PVLDB, 1(2), 2008.
[4]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. SoCC, 2010.
[5]
C. Curino, E. Jones, Y. Zhang, E. Wu, and S. Madden. Relationalcloud: The case for a database service. New England Database Summit, 2010.
[6]
D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Comm. ACM, 1992.
[7]
R. Freeman. Oracle Database 11g New Features. McGraw-Hill, Inc., New York, NY, USA, 2008.
[8]
S. Ghandeharizadeh and D. J. DeWitt. Hybrid-range partitioning strategy: a new declustering strategy for multiprocessor databases machines. In VLDB, 1990.
[9]
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The weka data mining software: An update. SIGKDD Explorations, 11, 2009.
[10]
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1), 1998.
[11]
G. Karypis and V. Kumar. MeTis: Unstructured Graph Partitioning and Sparse Matrix Ordering System, Version 5.0. http://www.cs.umn.edu/~metis, 2009.
[12]
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Systems Technical Journal 49, pages 291--307, 1970.
[13]
R. Khandekar, S. Rao, and U. Vazirani. Graph partitioning using single commodity flows. J. ACM, 56(4):1--15, 2009.
[14]
M. Koyutürk and C. Aykanat. Iterative-improvement-based declustering heuristics for multi-disk databases. Journal of Information Systems, 30(1):47--70, 2005.
[15]
P. Massa and P. Avesani. Controversial users demand local trust metrics: an experimental study on epinions.com community. In AAAI'05, 2005.
[16]
J. M. Pujol, G. Siganos, V. Erramilli, and P. Rodriguez. Scaling online social networks without pains. NetDB, 2009.
[17]
J. R. Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann Series in Machine Learning, 1993.
[18]
J. Rao, C. Zhang, N. Megiddo, and G. Lohman. Automating physical database design in a parallel database. In SIGMOD, 2002.
[19]
D. ren Liu and S. Shekhar. Partitioning similarity graphs: A framework for declustering problems. ISJ, 21, 1996.
[20]
N. Selvakkumaran and G. Karypis. Multi.objective hypergraph partitioning algorithms for cut and maximum subdomain degree minimization. In ICCAD, 2003.
[21]
M. Stonebraker, S. Madden, D. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). In VLDB, 2007.
[22]
M. M. Tsangaris and J. F. Naughton. A stochastic approach for clustering in object bases. SIGMOD Rec., 20(2), 1991.
[23]
D. C. Zilio. Physical database design decision algorithms and concurrent reorganization for parallel database systems. In PhD thesis, 1998.

Cited By

View all
  • (2025)AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access CorrelationsProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710856(142-155)Online publication date: 28-Feb-2025
  • (2024)Lightweight Asynchronous Repartitioning for Local State Partitioned SystemsAnais do XXV Simpósio em Sistemas Computacionais de Alto Desempenho (SSCAD 2024)10.5753/sscad.2024.244785(204-215)Online publication date: 23-Oct-2024
  • (2024)Reducing Persistence Overhead in Parallel State Machine Replication through Time-Phased Partitioned CheckpointJournal of Internet Services and Applications10.5753/jisa.2024.389115:1(194-211)Online publication date: 26-Jul-2024
  • Show More Cited By
  1. Schism: a workload-driven approach to database replication and partitioning

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
    September 2010
    1658 pages

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 September 2010
    Published in PVLDB Volume 3, Issue 1-2

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)95
    • Downloads (Last 6 weeks)13
    Reflects downloads up to 08 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access CorrelationsProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710856(142-155)Online publication date: 28-Feb-2025
    • (2024)Lightweight Asynchronous Repartitioning for Local State Partitioned SystemsAnais do XXV Simpósio em Sistemas Computacionais de Alto Desempenho (SSCAD 2024)10.5753/sscad.2024.244785(204-215)Online publication date: 23-Oct-2024
    • (2024)Reducing Persistence Overhead in Parallel State Machine Replication through Time-Phased Partitioned CheckpointJournal of Internet Services and Applications10.5753/jisa.2024.389115:1(194-211)Online publication date: 26-Jul-2024
    • (2024)GaussDB: A Cloud-Native Multi-Primary Database with Compute-Memory-Storage DisaggregationProceedings of the VLDB Endowment10.14778/3685800.368580617:12(3786-3798)Online publication date: 8-Nov-2024
    • (2024)Towards Optimal Transaction SchedulingProceedings of the VLDB Endowment10.14778/3681954.368195617:11(2694-2707)Online publication date: 30-Aug-2024
    • (2024)A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB StudiesProceedings of the VLDB Endowment10.14778/3636218.363623517:4(823-835)Online publication date: 5-Mar-2024
    • (2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
    • (2024)Scalable High-Quality Hypergraph PartitioningACM Transactions on Algorithms10.1145/362652720:1(1-54)Online publication date: 22-Jan-2024
    • (2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
    • (2024)Locality-Preserving Graph Traversal With Split Live MigrationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343682835:10(1810-1825)Online publication date: 2-Aug-2024
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media