1. Introduction
Together with sorting, selection is one of the most widely used procedures in computer algorithms. Given a sequence A of n numbers and an integer (selection) parameter i, the selection problem asks to find the ith smallest element in A. Sorting trivially solves the selection problem; however, a linear-time algorithm requires a higher level of sophistication. A now classic approach for selection [1,2,3,4,5] from the 1970s is to use an element in A as a pivot to partition A into two smaller subsequences and recurse on one of them with a (possibly different) selection parameter i.
The time complexity of this kind of algorithm is sensitive to the pivots used. If a good pivot is used, many elements of A can be discarded, while if a bad pivot is used, the size of the problem may be reduced only by a constant in the worst case, leading to a quadratic worst-case running time. However, carefully choosing a good pivot can be time consuming. Choosing the pivots randomly (and thus without much effort) yields a well-known randomized selection algorithm with expected linear running time; see, e.g., [6], Ch. 9.2, [7], Ch. 13.5, or [8], Ch. 3.4; however, its worst-case running time is quadratic in n.
The first deterministic linear-time selection algorithm Select is due to Blum et al. [1]; it is recursive in nature. By using the median of medians of small disjoint groups of the input array (of constant size at least 5), good pivots that reduce the size of the problem by a constant fraction, and thereby lead to O(n) time overall, can be chosen at low cost in each recursive invocation. More recently, several variants of Select with groups of 3 and 4, also running in O(n) time, have been obtained by Chen and Dumitrescu and independently by Zwick; see [9]. The selection problem, and computing the median in particular, are closely related to the problem of finding the quantiles of a set.
Quantiles. The kth quantiles of an n-element set are the k − 1 order statistics that divide the sorted set into k equal-sized groups (to within 1). It is known that the kth quantiles of a set can be computed by a recursive algorithm running in O(n log k) time ([6], p. 223). Such an algorithm can be modified, if needed, so that the k groups are also output, say, each as a linked list, within the same overall time. For 1 ≤ i ≤ k, the ith group of elements (bounded by the (i − 1)th and the ith quantile) is referred to as the ith quantile group; the first quantile group consists of the elements less than or equal to the first quantile, and the kth quantile group consists of the elements greater than or equal to the (k − 1)th quantile.
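As a concrete illustration of the definition (not of the recursive algorithm cited above), the following sketch computes the k quantile groups of a set by sorting and slicing; the function name and the group-boundary convention are ours.

```python
def quantile_groups(elements, k):
    """Partition `elements` into k quantile groups whose sizes differ by at most 1.

    Naive illustration of the definition: sort and slice. The recursive
    O(n log k) algorithm mentioned in the text avoids the full sort.
    """
    s = sorted(elements)
    n = len(s)
    groups, start = [], 0
    for i in range(1, k + 1):
        end = (i * n) // k          # cumulative size of the first i groups
        groups.append(s[start:end])
        start = end
    return groups

# Example: the 4th quantiles of {1, ..., 10} induce groups of sizes 2, 3, 2, 3.
print([len(g) for g in quantile_groups(range(1, 11), 4)])
```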
The following problem was devised by Fredman about 25 years ago for a graduate algorithms course. This paper both generalizes and strengthens that result.
A “very sloppy heap” (abbreviated vsh) is a data structure for performing the following operations on a set S: (i) insert and (ii) delete-small. The latter operation deletes (and returns) an element x which is among the ⌈n/2⌉ smallest elements in the set, where n is the current size of the set. Explain how to implement a vsh in constant amortized time per operation.
Our main result is the following; for the optimality of the dependence on k, see the first remark in Section 4.2.
Theorem 1. For any fixed integer k, a data structure for dynamic sets exists accommodating each of the following operations in constant time (that depends on k): (i) Insert a new element, and (ii) Delete (and return) an element of the ith quantile group of the current set, where 1 ≤ i ≤ k. The time per operation is O(log k), which is optimal in a comparison-based model, even in an amortized setting.
Background and related problems. Since the selection problem is of primary importance, the interest in selection algorithms has remained high ever since; see for instance [2,5,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. In particular, determining the comparison complexity for computing various order statistics, including the median, has led to many exciting questions, some of which are still unanswered today; in this respect, Yao's hypothesis on selection ([25], Ch. 4) remains an inspiring endeavor [4,15,23]. A comprehensive review of early developments in selection is provided by Knuth [26].
Computational results on the exact comparison complexity of finding the ith smallest out of n, for small i and n, have been obtained in [18,27]. We also refer the reader to the dedicated book chapters on selection in [6,7,28,29,30] and the more recent articles [9,31,32], including experimental work [33].
The selection problem is also of interest in an online or dynamic setting, where elements are inserted or deleted. A balanced binary search tree on n distinct elements can be augmented with a size attribute for each node, thereby allowing the retrieval of an element of a given rank in O(log n) time ([6], Ch. 14). We say that the ith smallest element has rank i, where 1 ≤ i ≤ n, with ties broken arbitrarily. Further, determining the rank of an element in the data structure can also be done in O(log n) time. Consequently, a dynamic order statistic operation (inserting a new element or deleting an element of a given rank) can be accomplished within the same time bound.
Priority queues (heaps in particular) are ubiquitous data structures; a typical operation set might include Insert, Delete-Min, Decrease-Key (and perhaps also Find-Min, Increase-Key, or other operations). See for instance [34] and the references therein for some new variants and recent developments.
A variant of a priority queue that allows dynamic maintenance of percentiles is the soft heap data structure due to Chazelle [35]; in addition to Create and Meld, the data structure accommodates Insert, Delete and Findmin (Delete x removes item x). Consider a sequence of operations after its creation that includes n Insert operations. For any 0 < ε ≤ 1/2, a soft heap with error rate ε supports each operation in constant amortized time, except for Insert, which takes O(log(1/ε)) amortized time. The data structure ensures that at most εn of its items are corrupted (their keys have been artificially raised). Findmin returns the minimum current key (which might be corrupted). In contrast, our data structure uses O(log k) time per operation, where k plays the role of 1/ε, and thereby outperforms the soft heap with respect to the worst-case time per update (insert or delete) operation.
Definition 1. Let k be a fixed positive integer. A “selectable sloppy heap” (abbreviated ssh) is a data structure for performing the following operations on a set S with n elements:
- (a) Insert x: a new element x is inserted.
- (b) Delete i: this deletes (and returns) some unspecified element x which belongs to the ith quantile group of the current set, where 1 ≤ i ≤ k; if n < k, the deleted element is not subject to any requirement.
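A minimal, deliberately naive reference implementation of this interface (a single sorted array; all names are ours) may help pin down what the two operations are required to do; it is not the data structure of Theorem 1.

```python
import bisect

class NaiveSloppyHeap:
    """Baseline ssh: a sorted array. Insert costs O(n); Delete i also costs O(n)
    for the pop. The point is only to make the interface of Definition 1 concrete."""

    def __init__(self, k):
        self.k = k
        self.items = []

    def insert(self, x):
        bisect.insort(self.items, x)

    def delete(self, i):
        """Delete (and return) some element of the ith quantile group, 1 <= i <= k."""
        n = len(self.items)
        if n == 0:
            return None
        lo = ((i - 1) * n) // self.k   # the ith group occupies ranks lo+1 .. ceil(i*n/k)
        return self.items.pop(lo)      # the element of rank lo+1 lies in that group
```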
Outline of the paper. To explain the main challenges and introduce the main ideas, we first sketch several preliminary implementations of the data structure, meeting suboptimal (in k) benchmarks: (i) O(k log k) amortized time per operation for the first variant in Section 2; (ii) O(k log k) worst-case time per operation for the second variant in Section 2; and (iii) O(log k) amortized time per operation for the third variant in Section 2. We then further refine these methods in Section 3 to obtain an optimal implementation of the data structure running in O(log k) worst-case time per operation. In particular, for constant k, the second variant in Section 2 and the (main) variant in Section 3 run in O(1) worst-case time per operation. We conclude in Section 4 with applications and some remarks.
Comments and notations. Duplicate elements are easily handled by the design. Each operation request is associated with a discrete time step, i.e., the jth operation occurs during the jth time step, where j = 1, 2, …. Without affecting the results, the floor and ceiling functions are omitted in the descriptions of the algorithms and their analyses.
Let A be a set; for an element x, we write x ≤ A if x ≤ a for every a ∈ A, and A ≤ x if a ≤ x for every a ∈ A. The size of a bucket B, i.e., the number of elements in B, is denoted by |B|. If buckets B_1 and B_2 merge into a new bucket B, written as B = B_1 ∪ B_2, we clearly have |B| = |B_1| + |B_2|. Similarly, if a bucket B splits into two buckets B_1, B_2, written as B → B_1, B_2, then we have |B| = |B_1| + |B_2| as well.
2. Some Ideas and Preliminary Suboptimal Solutions
A first variant with O(k log k) amortized time per operation. Let n denote the current number of elements in the data structure, with the elements being stored in a linear list (or an array). By default (if n is large enough), the algorithm proceeds in phases.
If n is below a suitable constant threshold (depending only on k) and no phase is under way, proceed by brute force: For Insert, add the new element to the list; for Delete i, compute the elements in the ith quantile group and delete (return) one of them arbitrarily. Since k is constant, each operation takes O(1) time.
If n is at least that threshold and no phase is under way, start a new phase: Reorganize the data structure by computing the (3k)th quantiles of the set and the corresponding groups; store each group in a list (bucket) of size about n/(3k). The next n/(6k) operations make up the current phase. Process each of these operations by using exclusively elements stored in these buckets. For Insert, add the new element to an initially empty overflow bucket (an empty overflow bucket is created at each reorganization). For Delete i, remove any element from the (3i − 1)th bucket, i.e., from the middle third of the ith quantile group (out of the total k).
The reorganization takes O(n log k) time and is executed about every n/(6k) steps, where each operation counts as one step. The resulting amortized cost per operation is O(n log k)/(n/(6k)) = O(k log k).
A second variant with O(k log k) worst-case time per operation. The above idea is next refined so as to obtain the above expression as a worst-case time bound per operation. However, we need a larger set of quantiles: 5k buckets will suffice (a more detailed analysis likely gives a better bound). We can assume that n exceeds a suitable constant threshold depending only on k. The algorithm proceeds in phases: Each phase starts with a reorganization, namely computing the (5k)th quantiles and the corresponding quantile groups in O(n log k) time. For Insert, add the new element to an initially empty overflow bucket (an empty overflow bucket is created at each reorganization). For Delete i, remove any element from the (5i − 2)th bucket, i.e., from the middle fifth of the ith quantile group (out of the total k).
There are n/(20k) operations, i.e., steps, following each reorganization until the results of the next reorganization become available. Assume that the data structure holds n items when the current reorganization starts (n is redefined at the start of each reorganization) and that there are 5k old buckets to use until the reorganization is finalized. That is, after that many steps, new buckets (however, with an old content) become available. These buckets are used for the next n/(20k) operations, and so on.
To transform the O(k log k) amortized time guarantee into an O(k log k) worst-case guarantee for each operation, spread the execution of the reorganization over the next n/(20k) operations, i.e., over the entire phase. The resulting time per operation is bounded from above by O(n log k)/(n/(20k)) = O(k log k).
Since Delete is serviced from the existent buckets and Insert is serviced using the overflow bucket, each operation involves inserting or deleting one element from a list; thus, apart from the reorganization work spread over the phase, each operation takes O(1) time to process.
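The de-amortization step above, spreading the reorganization over the phase, can be pictured as a coroutine that is advanced by a bounded amount of work at every operation; in the sketch below the names and the per-step budget are ours, and a heap stands in for the linear-time quantile routine.

```python
import heapq

def incremental_rebuild(snapshot, k, items_per_step=4):
    """Background reorganization as a generator. Each next() call performs a
    bounded amount of work; advanced once per operation, it spreads the cost
    of building the (5k)th quantile buckets of `snapshot` over a whole phase.
    Yields None while unfinished, and finally the list of 5k buckets."""
    heap, i = [], 0
    # Stage 1: absorb the snapshot a few items per time step.
    while i < len(snapshot):
        for x in snapshot[i:i + items_per_step]:
            heapq.heappush(heap, x)
        i += items_per_step
        yield None
    # Stage 2: pop in sorted order into 5k nearly equal buckets, a few per step.
    n, t = len(snapshot), 5 * k
    buckets, done = [[] for _ in range(t)], 0
    while done < n:
        for _ in range(min(items_per_step, n - done)):
            buckets[(done * t) // n].append(heapq.heappop(heap))
            done += 1
        yield None
    yield buckets

# Usage: advance with next(rebuild) at every Insert/Delete of the phase; when a
# non-None value is produced, the (old-content) buckets for the next phase are ready.
```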
New buckets become available about every n/(20k) operations; this is a rather imprecise estimate because the current number of elements, n, changes. Since a reorganization is spread out over multiple steps, the result becomes available with some delay, and moreover, its content is (partially) obsolete. Assume there are n elements at the start of a reorganization, during phase j. At the end of the reorganization, the data structure reflects the earlier situation, when there were n elements; this organized structure is used to answer queries during the following phase, j + 1. As such, the number of elements, n′, in the data structure at operation time (during phase j + 1) belongs to the interval [n − n/(9k), n + n/(9k)]. Indeed, the total number of operations in phases j and j + 1 is at most n/(20k) + (n + n/(20k))/(20k) ≤ n/(9k).
To verify that the data structure operates as intended, one needs to check that the rank of a deleted element belongs to the required interval (quantile group). The key observation is that any one operation can affect the rank of any element by at most 1: To be precise, only Insert or Delete of a smaller element can increase the rank of an element by 1 or decrease the rank of an element by 1, respectively.
By the above observation, the rank of a deleted element lies in the interval
[(i − 1)n/k + 2n/(5k) − n/(9k) + 1, (i − 1)n/k + 3n/(5k) + n/(9k)].
We need to show that this rank belongs to the interval
[(i − 1)n′/k + 1, i·n′/k], where n′ ∈ [n − n/(9k), n + n/(9k)] is the current number of elements.
It suffices to verify that
(i − 1)n/k + 2n/(5k) − n/(9k) ≥ (i − 1)(n + n/(9k))/k, (6)
and respectively that
(i − 1)n/k + 3n/(5k) + n/(9k) ≤ i(n − n/(9k))/k. (7)
Inequality (6) is equivalent to 2n/(5k) − n/(9k) ≥ (i − 1)n/(9k²). Since i ≤ k, it is enough to verify that 2/5 − 1/9 ≥ 1/9, which clearly holds.
Inequality (7) is equivalent to 3n/(5k) + n/(9k) + in/(9k²) ≤ n/k. Similarly, since i ≤ k, this follows from the inequality 3/5 + 1/9 + 1/9 ≤ 1.
A third variant with O(log k) amortized time per operation. We briefly describe an implementation achieving O(log k) amortized time per operation that is tailored on the idea of a B-tree; this variant was suggested by Fredman [36]. Use a balanced binary search tree T with O(log k) levels, storing O(k) splitting keys at the leaves. Each leaf comprises a bucket of Θ(n/k) items, with order maintained between buckets, but not within buckets.
As mentioned earlier, a balanced binary search tree on distinct elements can be augmented with a size attribute for each node, thereby allowing the retrieval of an element of a given rank in time proportional to its height. Using this mechanism, T can be augmented with such information, allowing the retrieval of a leaf containing elements within a given quantile group, as needed for answering an operation request, in O(log k) time.
When an insertion causes a bucket to become too large, it gets split into two buckets by performing a median selection operation. The balanced binary search tree is correspondingly updated by deleting a leaf and inserting two new ones. Small buckets (due to deletions) get merged; the balanced binary search tree is correspondingly updated by deleting two leaves and inserting a new one. A bucket of size m, once split, won't get split again sooner than another Ω(m) operations.
When the number of elements doubles (or halves), a tree reorganization is triggered that partitions the present items into uniformly sized buckets, so that the new common bucket size m is Θ(n/k) items. These event-triggered reorganizations ensure that buckets do not become too small unless they are the target of deletions; similarly, buckets do not become too large unless they are the target of insertions. Since the reorganization cost is O(n log k), and Ω(n) operation requests take place between successive reorganizations, this scheme yields O(log k) amortized time per operation.
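A compact sketch of this bucket-based organization follows; a linear scan over the O(k) buckets stands in for the augmented balanced BST (which provides the O(log k) access), splitting uses a sort in place of median selection, bucket merging and the doubling/halving reorganization are omitted, and all names and thresholds are ours.

```python
class SloppyHeapBuckets:
    """Bucketed third variant (sketch). Buckets are kept in key order; no order
    is maintained inside a bucket. Delete i prefers a bucket lying entirely
    inside the ith quantile group, which exists once buckets stay below n/(2k)."""

    def __init__(self, k):
        self.k = k
        self.buckets = [[]]     # unordered buckets, in key order
        self.n = 0

    def _bucket_for_key(self, x):
        for j, b in enumerate(self.buckets):          # linear scan; a BST in the text
            if b and x <= max(b):
                return j
        return len(self.buckets) - 1

    def insert(self, x):
        j = self._bucket_for_key(x)
        b = self.buckets[j]
        b.append(x)
        self.n += 1
        if len(b) > max(2, self.n // (2 * self.k)):   # too large: split around the median
            b.sort()                                   # stands in for median selection
            mid = len(b) // 2
            self.buckets[j:j + 1] = [b[:mid], b[mid:]]

    def delete(self, i):
        """Delete some element of the ith quantile group (1 <= i <= k)."""
        if self.n == 0:
            return None
        lo = ((i - 1) * self.n) // self.k             # the group covers ranks lo+1 .. hi
        hi = (i * self.n) // self.k
        seen, chosen = 0, None
        for b in self.buckets:
            if b:
                if seen >= lo and seen + len(b) <= hi:
                    chosen = b                        # bucket entirely inside the group
                    break
                if chosen is None and seen < hi and seen + len(b) > lo:
                    chosen = b                        # fallback: bucket overlapping the group
            seen += len(b)
        if chosen is None:                            # degenerate case (e.g., empty group)
            chosen = next(b for b in self.buckets if b)
        x = chosen.pop()
        self.n -= 1
        return x
```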
Observe that when k is a constant, the 2nd variant supports operations in constant time.
3. A Variant with Optimal Worst-Case Time per Operation
A brief examination of the approach in the 3rd variant reveals two bottlenecks in achieving O(log k) worst-case time per operation: the median computation that comes with splitting large buckets, and the tree reorganization that occurs when n doubles (or halves). We briefly indicate below how ideas from the 2nd and 3rd early variants are refined to obtain O(log k) worst-case time per operation; in particular, O(1) time for constant k. It is shown in the process how execution in parallel of several sequential procedures can lead to a speed-up of the data structure. This only means that some tasks are run as “background processes”, as no real parallel processing is used in this paper. The idea of slowing down in order to be faster has also appeared in other forms; see, e.g., [37].
A balanced BST T on the splitting keys is used, subdividing the data into buckets. Each bucket is maintained as a linked list. A size attribute is associated with each node, reflecting the number of elements stored in the subtree rooted at the respective node. Modifications in the data structure at each operation are reflected in appropriate updates of the size attributes at the nodes of the search paths involved, with O(log k) time overhead per operation.
If n is the current number of elements, each bucket holds at most n/(2k) elements, and so each of the k quantile groups contains at least one of the buckets entirely. Choosing one such bucket at the end of the search path in the BST for executing a Delete operation from the ith quantile group guarantees its correctness.
The n elements are kept in O(k) buckets of maximum size n/(2k) that form the leaves of a balanced binary search tree T. As in the third variant (in Section 2), each leaf holds a bucket, with order maintained between buckets, but not within buckets. Buckets that become too large are split in order to enforce this upper limit on the bucket size. Median finding procedures, along with other preparatory and follow-up procedures accompanying bucket splits, are scheduled as background computation, as in the second variant (in Section 2).
Our data structure merges small buckets in order to keep the number of buckets under control and renounces the periodic tree reorganizations by introducing new design elements: a round robin process and a priority queue jointly control the maximum bucket size and the number of buckets in the BST. These mechanisms are introduced to prevent buckets from becoming too small or too large as an effect of changes in the total number of elements, n, and not necessarily as an effect of operations directed at them.
Outline and features. Let B_1, …, B_t denote the current buckets, in key order. For illustrating the main ideas, assume now that n is sufficiently large in terms of k. The buckets are linked in a doubly linked linear list L, in key order; adding two links between the last and the first bucket yields a circular list R, referred to as the round robin list. We note that L and R are two views of the same data structure.
Each operation request translates to locating a suitable bucket for implementing the request. The circular list R is traversed in a round robin fashion, so that the current round robin bucket in the list is also examined during the current operation request. The round robin process ensures that (i) the buckets do not exceed their maximum capacity, and (ii) certain “long-term” preparatory bucket-splitting procedures are run in the background over a succession of non-consecutive discrete time steps allocated to the same bucket.
Each bucket split entails a merge-test for the pair of adjacent buckets with the minimum sum of sizes. The process of merging adjacent buckets in L is controlled by a priority queue in the form of a binary min-heap H. If there are t buckets B_1, …, B_t, H holds the t − 1 sums of sizes |B_j| + |B_{j+1}|, for j = 1, …, t − 1; here B_{j+1} denotes the bucket that follows B_j in L. A merge is made provided the minimum value at the top of H is below some threshold; T, L, and H are correspondingly updated. Merging adjacent buckets ensures that the total number of buckets remains under control, regardless of which buckets are accessed by operation requests.
The process of splitting buckets in L is controlled by the round robin process. A bucket split is initiated as a background process provided its size (as a current bucket) is above some threshold. Splitting buckets ensures that bucket sizes remain under control, regardless of which buckets are accessed by operation requests. The bucket granularity is needed to ensure that the sloppy heap data structure responds correctly to operation requests.
Elements of the design. We highlight: (i) use of the priority queue H to keep the number of buckets under control; (ii) use of the round robin process to detect large buckets; and (iii) running the procedures involved at different rates, as needed to ensure that certain task deadlines and precedence constraints among them are met. The data structure maintains the following two invariants:
(I1) Each bucket contains between 1 and n/(2k) elements; apart from being nonempty, there is no limit from below imposed on the bucket size.
(I2) The number of buckets is between 2k and 8k, as implied by the maximum bucket size and a later argument based on the rules of operation (on merging adjacent buckets); see Action 2 and Lemma 1.
Recall that all buckets are linked in a circular round robin list R: in key order, and with the last bucket linked to the first one. A pointer to the current round robin bucket (initialized arbitrarily, say, to the first bucket) is maintained. Each operation request advances this position in the list by one slot. Observe that every bucket becomes the round robin bucket about every t discrete time steps, where t is the current number of buckets. Further, each operation request leads, via the search path in T, to one of the buckets, referred to as the operation bucket. Executing one operation is done by a sequence of actions performed in the operation bucket and the round robin bucket; these are referred to as the current buckets. All these actions are therefore associated with the same discrete time step. Every bucket update (this includes creation or deletion of buckets) entails a corresponding update of the binary heap H in the (at most) two heap elements that depend on it.
A bucket is declared large if its current size exceeds half of the maximum allowed, i.e., if the current size exceeds n/(4k). All other buckets are declared regular. Each operation may cause an update in the status of the round robin bucket and the operation bucket (between large and regular).
Action 1. Execute the requested operation: Either add a new element to the respective bucket, or delete an arbitrary element from it. If the operation bucket becomes empty as a result of the current Delete operation, it is deleted, and T, L, H (and t) are correspondingly updated to reflect this change, in O(log k) time. Status updates in the current two buckets are made, if needed. If a median finding procedure or a follow-up procedure is under way in the current operation bucket or the current round robin bucket (see details below), the next segment, consisting of 500 (or fewer) elementary operations of this background procedure, is executed. These are instructions usually counted in standard analyses of algorithms, such as arithmetic operations, data movements, comparisons, etc. Specifically, we assume that the median of n elements can be found using at most 100n elementary operations.
Action 2 (merge-test). Let s be the minimum value at the top of the heap H. If s ≤ n/(4k), merge the two corresponding adjacent buckets into one, B = B_j ∪ B_{j+1}, and update the tree T, the bucket list L and the binary heap H to reflect this change: (i) Delete the two buckets that are merged and insert the new resulting one into T and L; this only involves simple list handling and adjusting attributes at the nodes of T on the search paths involved. (ii) Extract the minimum element from H and insert into H the (at most) two new sums of sizes of adjacent bucket pairs formed with the new bucket (if they exist).
Handling of T, L, and H takes O(log k), O(1), and O(log k) time, respectively. In particular, this action ensures that the size of any new bucket obtained by a merge is at most n/(4k). It is worth noting that a bucket for which a partitioning procedure is under way, as described below, cannot be part of a pair of buckets to merge (i.e., passing the merge-test); bucket merges are not scheduled as background processes.
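The heap bookkeeping behind the merge-test can be sketched as follows; the class and method names are ours, the threshold is passed in by the caller, and stale heap entries are skipped lazily because Python's heapq lacks a decrease-key operation (the text's version simply updates the at most two affected entries directly).

```python
import heapq
from itertools import count

class Bucket:
    """A bucket: an unordered list of items, doubly linked to its neighbors in key order."""
    __slots__ = ("items", "nxt", "prev")
    def __init__(self, items):
        self.items, self.nxt, self.prev = items, None, None

class MergeController:
    """Sketch of Action 2: a min-heap over the size sums of adjacent bucket pairs."""

    def __init__(self):
        self.heap = []          # entries: (size sum, serial, left bucket of the pair)
        self.version = {}       # id(left bucket) -> serial of its latest valid entry
        self.serial = count()

    def record_pair(self, left):
        """(Re)record the size sum of the pair (left, left.nxt); call this whenever
        the size of a bucket or of its successor changes."""
        if left is not None and left.nxt is not None:
            s = next(self.serial)
            self.version[id(left)] = s
            heapq.heappush(self.heap, (len(left.items) + len(left.nxt.items), s, left))
        elif left is not None:
            self.version.pop(id(left), None)

    def merge_test(self, threshold):
        """If the minimum adjacent size sum is at most `threshold`, merge that pair."""
        while self.heap:
            total, s, left = self.heap[0]
            if self.version.get(id(left)) != s:       # outdated entry: discard it
                heapq.heappop(self.heap)
                continue
            if total > threshold:
                return False                          # every adjacent pair is too large
            heapq.heappop(self.heap)
            right = left.nxt
            left.items.extend(right.items)            # merge the right bucket into the left
            left.nxt = right.nxt
            if right.nxt is not None:
                right.nxt.prev = left
            self.version.pop(id(right), None)
            self.record_pair(left.prev)               # refresh the (at most) two affected sums
            self.record_pair(left)
            return True
        return False
```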
Action 3 (finalizing a split). Similar to the merge operations in Action 2, finalizing a split can be completed in O(log k) time: It essentially involves updating T, L, and H. It will be shown subsequently that the size of any new bucket resulting from a split lies in the range specified in (8).
Besides Actions 1–3, there are actions associated with procedures running in the background, whose execution is spread out over a succession of non-consecutive discrete time steps allocated to the same bucket. These procedures are in preparation of splitting large buckets.
Splitting a large bucket. For simplicity of exposition and without loss of generality, we may assume that n ≥ c₀k for a suitable constant c₀ (any sufficiently large constant will do). It is worth noting that a bucket may become large by repeated deletions from other buckets, leading to a reduction in the total number of items, n; moreover, by the same reasoning, several buckets may become large at the same time.
Recall that a bucket B is large if its size exceeds n/(4k). When this condition is reached, no background procedure is active in that bucket; this condition launches the bucket split described below. Let b be the number of elements existent when the large bucket is detected and the procedure is initiated. Place about half of the elements into a main part B′ and the remaining elements into a secondary part B″. It is worth noting at this point that even this first step of partitioning B into B′ and B″ does not take O(1) time and is in fact part of the background procedure; indeed, each bucket is stored as a linked list and, as such, partitioning it into two lists of prescribed sizes requires scanning the list; its size is b = Θ(n/k).
There are two possible cases: (i) B becomes large as a result of an insertion into B, and (ii) B becomes large while being inactive and n is decreasing (as a result of deletions from other buckets). On one hand, b > n/(4k), since the bucket is large; on the other hand, b ≤ n/(4k) + 2 ≤ n/(3k), since at the latest, a large bucket will be detected and dealt with at the step when it becomes the round robin bucket, namely within at most 8k steps of becoming large, by Lemma 1 (forthcoming at the end of this section). Consequently, in either case, we have n/(4k) < b ≤ n/(3k) when the bucket splitting procedure is initiated. Any insertions into and deletions from the current bucket until the split is finalized are performed using the secondary part B″. A balanced partition of the current bucket (as explained below) will be obtained in at most 3kb = O(n) time steps.
Consider the following 3kb operation requests. By Lemma 1 below, the number of buckets is at most 8k at any time, and so at least 3kb/(8k) = 3b/8 time steps are allocated to B in the round robin process by this time mark. Let τ, where τ ≤ 3kb, mark the first occurrence when a total of b/3 discrete time steps have been allocated to B (as operation steps or round robin steps). As shown next, this number of steps suffices for finalizing the balanced partition and the split of B. The process calls two procedures, labeled (A) and (B):
(A) Start a median finding procedure, i.e., for finding the two quantile groups of B′: An element m is found, together with a partition of B′ into two equal-sized parts B′₁ and B′₂, so that B′₁ ≤ m and m ≤ B′₂. At the point when this procedure is launched, the computational steps are scheduled so that every time the bucket gets accessed (either as the operation bucket or as the round robin bucket), 500 computational steps for selecting the median element of B′ take place. Assume for concreteness that median finding on an input set S takes at most 100|S| elementary operations; when applied to B′, we have |B′| ≤ b, and thus at most 100b elementary operations suffice. At the rate of 500 per discrete time step, it follows that b/5 time steps suffice for finding the median of B′.
(B) After the median has been determined, a follow-up procedure is initiated in the same bucket. It aims at reducing and finally eliminating the leftover part B″, so that in another b/10 discrete steps, a balanced partition of the current bucket is obtained. The follow-up procedure starts with the two quantile groups of the same size, |B′|/2, as created by the median finding procedure, and compares each element of B″ against m, properly placing it in one of the two groups. This procedure runs at the rate of 10 items per discrete time step accessing the current bucket (either as the operation bucket or as the round robin bucket), until finally the partitioning process is completed, with all elements in the current bucket properly placed against the pivot m. Note that |B″| ≤ b at any time during the background procedure (as implied by the inequality |B″| ≤ b/2 + b/3 < b: the part B″ starts with about b/2 elements and gains at most one element per allocated time step, of which there are at most b/3). At the rate of 10 items per discrete time step, it follows that b/10 time steps suffice for completing the follow-up procedure.
The two procedures terminate within a total of at most b/5 + b/10 < b/3 discrete time steps, as required. The parameters are chosen so that the split takes place before the large bucket becomes illegal (i.e., before its size exceeds the maximum allowed).
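The background computation in (A) and (B) can be pictured as a coroutine attached to the large bucket and advanced by one step whenever the bucket is a current bucket; in the sketch below all names are ours, a sort stands in for the linear-time median selection, the per-step budgets only simulate the charging of 500 elementary operations and 10 placements per allocated step, and deletions during the split are assumed to be served from the secondary part.

```python
def background_split(main, side, ops_per_step=500, items_per_step=10):
    """Sketch of procedures (A) and (B) for splitting a large bucket.

    `main` is the frozen part B' and `side` is the mutable part B'' that absorbs
    insertions (and serves deletions) while the split is pending. Each next()
    call models one discrete time step allocated to this bucket; the generator
    yields None while working and finally the two new buckets."""
    # (A) Median of B'. A full sort stands in for linear-time selection; the
    # number of simulated steps follows the "100|B'| elementary operations at
    # 500 per step" accounting used in the text (assumes `main` is nonempty).
    for _ in range(max(1, (100 * len(main)) // ops_per_step)):
        yield None
    main.sort()
    mid = len(main) // 2
    m = main[mid]                                  # pivot: median of B'
    low, high = main[:mid], main[mid:]             # the two halves, low <= m <= high
    # (B) Eliminate the leftover: place the elements of B'' (including items that
    # arrived after the split was initiated) against the pivot, 10 per step.
    while side:
        for _ in range(min(items_per_step, len(side))):
            x = side.pop()
            (low if x <= m else high).append(x)
        yield None
    yield (low, high)                              # the balanced partition of the bucket

# Usage: gen = background_split(b_main, b_side); advance with next(gen) whenever the
# bucket is the operation bucket or the round robin bucket, until it yields the pair.
```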
At most 3kb operation requests have been made since the partitioning procedure was initiated, and so the total number of elements has changed by at most 3kb = O(n) in the meantime; thus, in terms of the current number of elements, n, the split of a large bucket (this qualification refers to the time when the partitioning procedure was initiated) produces two smaller buckets B_1, B_2, where
each of |B_1| and |B_2| is at least a constant fraction of n/k and at most the maximum allowed bucket size, n/(2k). (8)
Remark 1. Let B be a new bucket produced in the current operation. If B is generated by a split operation, then |B| ≤ n/(2k) by (8). If B is generated by a merge operation, then |B| ≤ n/(4k) by the merge-test.
Analysis of merging buckets and maintaining the two invariants. To prove that the two invariants (I1) and (I2) are maintained, we need the following key fact.
Lemma 1. The number of buckets is at most 8k at any time.
Proof. Set N = 8k. Let t denote the number of buckets after the current operation is executed, and let j = 1, 2, … denote the discrete time steps. Let B_1, …, B_t be the buckets after the current step, in key order. Write s_h = |B_h| + |B_{h+1}|, for h = 1, …, t − 1. We proceed by induction on j and show that, if the number of buckets in T (and L) is at most N after each of the preceding time steps, it remains at most N after the current time step. Observe that the number of buckets can only increase by one after a bucket split, and there can be at most two splits associated with a discrete time step. The induction basis is immediate, since initially there is only a constant number of buckets, and then indeed we have t ≤ N, as required.
For the induction step, assume that the number of buckets is at most N after the previous operation. If no bucket splits occur during the execution of the current operation, the number of buckets does not increase, and is thus still at most N after the current operation. If bucket splits occur, it suffices to show that the number of buckets is at most N after each split. Consider a split operation, and let s denote the minimum value at the top of the heap H after the split. There are two cases:
Case 1: s ≤ n/(4k), and thus the merge is executed. Consequently, the number of buckets after the split (and the ensuing merge) is still at most N, as required.
Case 2: s > n/(4k), and thus no merge is executed. Since H is a min-heap, we have
s_h ≥ s > n/(4k), for every h = 1, …, t − 1.
Adding these t − 1 inequalities yields
2n ≥ ∑_{h=1}^{t−1} s_h > (t − 1) · n/(4k),
or t < 8k + 1, i.e., t ≤ N, as claimed, and concluding the induction step. □
Summary of the analysis. As seen from the preceding paragraphs and the fact that the number of buckets is O(k), the total time per operation is O(log k), as required.