Zhuhe Fang

Papers by Zhuhe Fang

Join algorithms towards in-memory computing

Journal of East China Normal University, Sep 25, 2014

Parallelizing Multiple Pipelines of One Query in a Main Memory Database Cluster

2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018

To fully use the advanced resources of a main memory database cluster, we take independent parallelism into account to parallelize multiple pipelines of one query. However, scheduling resources to multiple pipelines is an intractable problem. Traditional static approaches to this problem may lead to a serious waste of resources and a suboptimal execution order of pipelines, because it is hard to predict the actual data distribution and fluctuating workloads at compile time. In response, we propose a dynamic scheduling algorithm, List with Filling and Preemption (LFPS), based on two techniques. (1) Adaptive filling improves resource utilization by issuing extra pipelines to adaptively fill idle resource "holes" during execution. (2) Cost-based preemption strictly guarantees that the pipelines on a critical path are scheduled first at run time. We implement LFPS in our prototype database system. Under the workloads of TPC-H, experiments show our work improves the finish time of p...

TiDB

Proceedings of the VLDB Endowment, 2020

Hybrid Transactional and Analytical Processing (HTAP) databases require processing transactional and analytical queries in isolation to remove the interference between them. To achieve this, it is necessary to maintain different replicas of data specified for the two types of queries. However, it is challenging to provide a consistent view for distributed replicas within a storage system, where analytical requests can efficiently read consistent and fresh data from transactional workloads at scale and with high availability. To meet this challenge, we propose extending replicated state machine-based consensus algorithms to provide consistent replicas for HTAP workloads. Based on this novel idea, we present a Raft-based HTAP database: TiDB. In the database, we design a multi-Raft storage system which consists of a row store and a column store. The row store is built based on the Raft algorithm. It is scalable to materialize updates from transactional requests with high availability. ...

Interleaved multi-vectorizing

Proceedings of the VLDB Endowment, 2019

SIMD is an instruction set in mainstream processors that provides data-level parallelism to accelerate applications. However, its advantages diminish when applications suffer from heavy cache misses. To eliminate cache misses in SIMD vectorization, we present interleaved multi-vectorizing (IMV) in this paper. It interleaves multiple execution instances of vectorized code to hide memory access latency with more computation. We also propose residual vectorized states to solve the control flow divergence in vectorization. IMV can make full use of the data parallelism in SIMD and the memory-level parallelism through prefetching. It reduces cache misses, branch misses and computation overhead to significantly speed up pointer-chasing applications, and it can be applied to executing entire query pipelines. As experimental results show, IMV achieves up to 4.23X and 3.17X better performance compared with the pure scalar implementation and the pure...
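
The interleaving idea in this abstract can be illustrated with a scalar sketch. The code below is not IMV itself: it uses no SIMD and no residual vectorized states. It only shows how several in-flight hash probes can be interleaved so that a prefetch issued for one probe overlaps with work on the others. The names `Node`, `ChainedTable`, `probe_interleaved` and `kInFlight` are hypothetical, keys are assumed unique, and `__builtin_prefetch` is a GCC/Clang builtin.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chained hash table; all names here are illustrative.
struct Node {
    std::uint64_t key;
    std::uint64_t payload;
    const Node* next;
};

struct ChainedTable {
    std::vector<Node> nodes;           // node storage, reserved up front
    std::vector<const Node*> buckets;  // head pointer of each chain

    ChainedTable(std::size_t bucket_count, std::size_t capacity)
        : buckets(bucket_count, nullptr) {
        assert(bucket_count > 0);
        nodes.reserve(capacity);
    }
    ChainedTable(const ChainedTable&) = delete;  // copies would alias old nodes

    void insert(std::uint64_t key, std::uint64_t payload) {
        assert(nodes.size() < nodes.capacity());  // keep node pointers stable
        const std::size_t b = key % buckets.size();
        nodes.push_back(Node{key, payload, buckets[b]});
        buckets[b] = &nodes.back();
    }
};

// Baseline: finish one probe completely before starting the next.
std::uint64_t probe_scalar(const ChainedTable& t,
                           const std::vector<std::uint64_t>& keys) {
    std::uint64_t sum = 0;
    for (std::uint64_t k : keys) {
        for (const Node* n = t.buckets[k % t.buckets.size()]; n; n = n->next) {
            if (n->key == k) { sum += n->payload; break; }
        }
    }
    return sum;
}

constexpr std::size_t kInFlight = 4;  // assumed number of interleaved probes

// Interleaved: advance each in-flight probe by one chain step per round,
// prefetching the next node before switching to another probe.
std::uint64_t probe_interleaved(const ChainedTable& t,
                                const std::vector<std::uint64_t>& keys) {
    struct Slot { const Node* cur; std::uint64_t key; bool active; };
    std::array<Slot, kInFlight> slots{};
    std::size_t next_key = 0;
    std::size_t active = 0;
    std::uint64_t sum = 0;

    auto refill = [&](Slot& s) {
        if (next_key == keys.size()) { s.active = false; return; }
        s.key = keys[next_key++];
        s.cur = t.buckets[s.key % t.buckets.size()];
        s.active = true;
        ++active;
        __builtin_prefetch(s.cur);  // a prefetch of nullptr is harmless
    };

    for (Slot& s : slots) refill(s);
    while (active > 0) {
        for (Slot& s : slots) {
            if (!s.active) continue;
            if (s.cur == nullptr) {            // chain exhausted, no match
                --active; refill(s);
            } else if (s.cur->key == s.key) {  // match found
                sum += s.cur->payload;
                --active; refill(s);
            } else {                           // follow the chain one step
                s.cur = s.cur->next;
                __builtin_prefetch(s.cur);
            }
        }
    }
    return sum;
}
```

Both functions return the sum of payloads of matching keys, so they can be checked against each other. Any speed difference depends on table size, chain length and hardware, and is not implied by this sketch.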

SIMD Accelerates the Probe Phase of Star Joins in Main Memory Databases

Database Systems for Advanced Applications, 2019

In main memory databases, joins on star schema tables account for the majority of execution time, which is dominated by the expensive probe phase. In this paper, we vertically or horizontally vectorize the probe phase using SIMD. In addition, we speed up the vectorized probe by prefetching. As our results show, the vertically vectorized integrated probe is up to 2.19X (2.63X) faster than its scalar version, as well as 3.24X (2.74X) faster than the traditional execution based on the right-deep-tree plans on CPU processors (co-processors).
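
For orientation, here is a minimal scalar sketch of the probe phase of a star join, the kind of baseline this abstract compares against. It is not the paper's SIMD code. It assumes that an "integrated" probe means checking one fact row against every dimension hash table before moving to the next row, which is one plausible reading of the abstract. `DimTable`, `FactRow` and `probe_star_join` are hypothetical names, and `std::unordered_map` stands in for whatever hash table the authors use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical dimension hash table: primary key -> payload.
using DimTable = std::unordered_map<std::uint32_t, std::uint32_t>;

// Hypothetical fact row: one foreign key per dimension plus a measure.
struct FactRow {
    std::vector<std::uint32_t> fks;
    std::uint32_t measure;
};

// Sums the measure of every fact row whose foreign keys all find a match.
// A row is dropped at the first dimension that misses, so later dimension
// tables are not probed for it.
std::uint64_t probe_star_join(const std::vector<FactRow>& fact,
                              const std::vector<DimTable>& dims) {
    std::uint64_t total = 0;
    for (const FactRow& row : fact) {
        assert(row.fks.size() == dims.size());  // one key per dimension
        bool matched = true;
        for (std::size_t d = 0; matched && d < dims.size(); ++d) {
            matched = dims[d].find(row.fks[d]) != dims[d].end();
        }
        if (matched) total += row.measure;
    }
    return total;
}
```

In a vertically vectorized variant, each SIMD lane would carry a different fact row through this inner loop. That variant is not shown here because its details are not given in the abstract.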