In computer science, a fusion tree is a type of tree data structure that implements an associative array on $w$-bit integers on a finite universe, where each of the input integers has size less than $2^w$ and is non-negative. When operating on a collection of $n$ key–value pairs, it uses $O(n)$ space and performs searches in $O(\log_w n)$ time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of $w$. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.[1]
Several advances have been made since Fredman and Willard's original 1990 paper. In 1999[2] it was shown how to implement fusion trees under a model of computation in which all of the underlying operations of the algorithm belong to AC0, a model of circuit complexity that allows addition and bitwise Boolean operations but does not allow the multiplication operations used in the original fusion tree algorithm. A dynamic version of fusion trees using hash tables was proposed in 1996[3] which matched the original structure's $O(\log_w n)$ runtime in expectation. Another dynamic version using exponential trees was proposed in 2007[4] which yields worst-case runtimes of $O(\log_w n + \log \log n)$ per operation. Finally, it was shown that dynamic fusion trees can perform each operation in $O(\log_w n)$ time deterministically.[5]
This data structure implements insertion, deletion, and exact search for a key, as well as predecessor (next smaller value) and successor (next larger value) queries. A by-product of the construction, a constant-time most-significant-bit locator, has also proved useful in further research. Fusion trees achieve their efficiency through word-level parallelism: several small integers are stored in a single machine word and operated on simultaneously, reducing the total number of operations.
A fusion tree is essentially a B-tree with branching factor of $w^{1/5}$ (any small exponent is also possible, as it will not have a great impact on the height of the tree), which gives it a height of $O(\log_w n)$. To achieve the desired runtimes for updates and queries, the fusion tree must be able to search a node containing up to $w^{1/5}$ keys in constant time. This is done by compressing ("sketching") the keys so that all of them fit into one machine word, which in turn allows comparisons to be done in parallel. The search within a node thus combines sketching, parallel comparison, and a most-significant-bit locator.
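As a rough illustration of what such a node has to store, the following C sketch assumes $w = 64$ and uses a small illustrative constant MAX_KEYS in place of the branching factor $w^{1/5}$; the field names are hypothetical and not taken from the original paper:

```c
#include <stdint.h>

/* Illustrative layout of a fusion-tree node over w = 64-bit keys.
 * MAX_KEYS stands in for the branching factor w^(1/5); everything that
 * depends only on the stored keys is precomputed so that a search
 * inside the node costs O(1) word operations. */
#define MAX_KEYS 4                      /* hypothetical small branching factor */

struct fusion_node {
    int      num_keys;                  /* k <= MAX_KEYS keys, kept sorted      */
    uint64_t keys[MAX_KEYS];            /* the actual w-bit keys                */
    uint64_t sketch_mask;               /* OR of 2^(b_i) over the sketch bits   */
    uint64_t sketch_multiplier;         /* the constant m used for sketching    */
    uint64_t node_sketch;               /* 1 sketch(x_1) ... 1 sketch(x_k)      */
    struct fusion_node *child[MAX_KEYS + 1];  /* B-tree style children          */
};
```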
Sketching is the method by which each $w$-bit key at a node containing $k$ keys is compressed into only $k-1$ bits. Each key $x$ may be thought of as a path in the full binary tree of height $w$, starting at the root and ending at the leaf corresponding to $x$: the path descends to the left child when the next bit of $x$ is 0 and to the right child when it is 1, until all $w$ bits have been scanned. To distinguish two paths, it suffices to look at their branching point, the first bit in which the two keys differ. Since there are at most $k$ keys, there are at most $k-1$ branching points, which means that no more than $k-1$ bits are required to identify a key; hence no sketch needs more than $k-1$ bits.
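For illustration (this pairwise scan is not part of the fusion tree's constant-time machinery), the distinguishing bit positions of a node's keys can be collected by taking, for every pair of keys, the most significant bit in which they differ; the example assumes 64-bit keys and a GCC/Clang-style count-leading-zeros builtin:

```c
#include <stdint.h>
#include <stdio.h>

/* Collect the "branching point" bit positions of a set of keys: for every
 * pair, the most significant bit in which the two keys differ.  At most
 * k-1 distinct positions can arise, so a sketch needs at most k-1 bits. */
static int sketch_bit_positions(const uint64_t *keys, int k, int pos[]) {
    int r = 0;
    for (int i = 0; i < k; i++) {
        for (int j = i + 1; j < k; j++) {
            uint64_t diff = keys[i] ^ keys[j];
            if (diff == 0) continue;               /* equal keys: no branching point     */
            int b = 63 - __builtin_clzll(diff);    /* index of the highest differing bit */
            int seen = 0;
            for (int t = 0; t < r; t++)
                if (pos[t] == b) { seen = 1; break; }
            if (!seen) pos[r++] = b;
        }
    }
    return r;                                      /* number of distinct sketch bits, r <= k-1 */
}

int main(void) {
    uint64_t keys[] = {0x2, 0x6, 0x7, 0xA};        /* 0010, 0110, 0111, 1010 */
    int pos[3];
    int r = sketch_bit_positions(keys, 4, pos);
    for (int i = 0; i < r; i++)
        printf("sketch bit b = %d\n", pos[i]);     /* prints bits 2, 3 and 0 */
    return 0;
}
```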
An important property of the sketch function is that it preserves the order of the keys. That is, $\operatorname{sketch}(x) < \operatorname{sketch}(y)$ for any two keys $x < y$. So, for the sorted keys $x_1 < x_2 < \cdots < x_k$ of a node, $\operatorname{sketch}(x_1) < \operatorname{sketch}(x_2) < \cdots < \operatorname{sketch}(x_k)$.
If the locations of the sketch bits are $b_1 < b_2 < \cdots < b_r$, then the sketch of the key $x = x_{w-1} \cdots x_1 x_0$ is the $r$-bit integer

$x_{b_r} x_{b_{r-1}} \cdots x_{b_1}.$
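As a point of reference (computing the sketch this way takes $O(r)$ time per key, which is exactly what the approximate sketch below avoids), a direct bit-by-bit extraction might look as follows, assuming 64-bit words and positions like those from the previous example:

```c
#include <stdint.h>

/* "Perfect" sketch: concatenate the bits of x found at the sketch
 * positions b[0] < b[1] < ... < b[r-1] into a single r-bit integer,
 * with the highest position ending up as the highest sketch bit. */
static uint64_t perfect_sketch(uint64_t x, const int b[], int r) {
    uint64_t s = 0;
    for (int i = r - 1; i >= 0; i--)
        s = (s << 1) | ((x >> b[i]) & UINT64_C(1));
    return s;
}
```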
With only standard word operations, such as those of the C programming language, it is difficult to directly compute the perfect sketch of a key in constant time. Instead, the sketch bits can be packed into a range of size at most $r^4$, using bitwise AND and multiplication; this is called the approximate sketch, which still has all the important bits but also some additional useless bits spread out in a predictable pattern. The bitwise AND operation serves as a mask to remove all non-sketch bits from the key, while the multiplication shifts the sketch bits into a small range. Like the "perfect" sketch, the approximate sketch preserves the order of the keys, so that $\operatorname{sketch}(x_1) < \operatorname{sketch}(x_2) < \cdots < \operatorname{sketch}(x_k)$ still holds.
Some preprocessing is needed to determine the correct multiplication constant. Each sketch bit in location $b_i$ will get shifted to $b_i + m_i$ via a multiplication by $m = \sum_{i=1}^{r} 2^{m_i}$. For the approximate sketch to work, the following three properties must hold:

1. The shifted positions $b_i + m_j$ are distinct for all pairs $i, j$, so that no two sketch bits collide after the multiplication.
2. $b_i + m_i$ is a strictly increasing function of $i$, so that the order of the sketch bits is preserved.
3. $(b_r + m_r) - (b_1 + m_1) \le r^4$, so that the sketch bits are packed into a range of size at most $r^4$.
An inductive argument shows how the $m_i$ can be constructed. Let $m_1 = w - b_1$. Suppose that $1 < t \le r$ and that $m_1, m_2, \ldots, m_{t-1}$ have already been chosen. Then pick the smallest integer $m_t$ such that both properties (1) and (2) are satisfied. Property (1) requires that $m_t \ne b_i - b_j + m_l$ for all $1 \le i, j \le r$ and $1 \le l \le t-1$. Thus, there are less than $tr^2 \le r^3$ values that $m_t$ must avoid. Since $m_t$ is chosen to be minimal, $(b_t + m_t) \le (b_{t-1} + m_{t-1}) + r^3$. This implies Property (3).
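A literal (and unoptimized) rendering of this greedy choice, purely for illustration; it only selects the shift amounts and does not build the double-word multiplier itself:

```c
/* Greedy construction of the shifts m[0..r-1] described in the text:
 * m[0] = w - b[0]; each later m[t] is the smallest value that keeps the
 * shifted positions b[i] + m[j] pairwise distinct (property 1) while
 * keeping b[t] + m[t] strictly increasing (property 2).  The counting
 * argument above bounds the resulting range by r^4 (property 3). */
static void choose_shifts(const int b[], int r, int m[], int w) {
    m[0] = w - b[0];
    for (int t = 1; t < r; t++) {
        int cand = (b[t - 1] + m[t - 1]) - b[t] + 1;   /* enforce property (2) */
        if (cand < 0) cand = 0;
        for (;;) {
            int clash = 0;
            /* property (1): cand must avoid b[i] - b[j] + m[l] for all i, j and l < t */
            for (int l = 0; l < t && !clash; l++)
                for (int i = 0; i < r && !clash; i++)
                    for (int j = 0; j < r && !clash; j++)
                        if (cand == b[i] - b[j] + m[l])
                            clash = 1;
            if (!clash)
                break;
            cand++;
        }
        m[t] = cand;
    }
}
```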
The approximate sketch is thus computed as follows:
1. Mask out all but the sketch bits of the key with a bitwise AND by the constant $\sum_{i=1}^{r} 2^{b_i}$.
2. Multiply the masked key by the predetermined constant $m$. This operation actually requires two machine words, but it can still be performed in constant time.
3. Mask out all but the shifted sketch bits $b_i + m_i$. They now lie in a contiguous block of at most $r^4$ bits.
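A hedged sketch of these three steps in C, assuming the masks and the multiplier have already been precomputed (for example by the preprocessing above); unsigned __int128, a GCC/Clang extension, stands in for the double-word multiplication the text mentions:

```c
#include <stdint.h>

/* Approximate sketch:
 *   1. AND with a mask that keeps only the sketch bits b_1..b_r,
 *   2. multiply by the precomputed constant m = sum of 2^(m_i),
 *   3. AND with a mask that keeps only the shifted bits b_i + m_i,
 * after which the useful bits sit in a window of at most r^4 positions.
 * `lo` is the position of the lowest shifted sketch bit, used to move
 * that window down to the bottom of the word.  Illustrative only. */
static uint64_t approx_sketch(uint64_t x,
                              uint64_t bit_mask,              /* sum of 2^(b_i)       */
                              uint64_t multiplier,            /* sum of 2^(m_i)       */
                              unsigned __int128 shifted_mask, /* sum of 2^(b_i + m_i) */
                              int lo) {
    unsigned __int128 product = (unsigned __int128)(x & bit_mask) * multiplier;
    return (uint64_t)((product & shifted_mask) >> lo);
}
```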
The purpose of the compression achieved by sketching is to allow all of the keys to be stored in one $w$-bit word. Let the node sketch of a node be the bit string

$1\,\operatorname{sketch}(x_1)\ 1\,\operatorname{sketch}(x_2)\ \cdots\ 1\,\operatorname{sketch}(x_k)$
Here, all sketch words are concatenated into one string, each preceded by a set bit. We can assume that the sketch function uses exactly $b \le r^4$ bits. Then each block uses $1 + b \le w^{4/5}$ bits, and since $k \le w^{1/5}$, the total number of bits in the node sketch is at most $w$.
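Assuming each key's (approximate) sketch has already been reduced to exactly SKETCH_BITS bits, packing the node sketch is a short loop of shifts and ORs; the constant below is illustrative and must satisfy $k(1+b) \le w$:

```c
#include <stdint.h>

#define SKETCH_BITS 4   /* hypothetical sketch width b; k*(1+b) must be <= 64 */

/* Build the node sketch 1 sketch(x_1) 1 sketch(x_2) ... 1 sketch(x_k):
 * each key contributes a block of SKETCH_BITS bits prefixed by a set bit,
 * with sketch(x_1) ending up in the most significant block. */
static uint64_t make_node_sketch(const uint64_t sketches[], int k) {
    uint64_t packed = 0;
    for (int i = 0; i < k; i++) {
        packed <<= (1 + SKETCH_BITS);
        packed |= (UINT64_C(1) << SKETCH_BITS) | sketches[i];
    }
    return packed;
}
```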
A brief notational aside: for a bit string $s$ and nonnegative integer $m$, let $s^m$ denote the concatenation of $s$ to itself $m$ times. If $t$ is also a bit string, $st$ denotes the concatenation of $t$ to $s$.
The node sketch makes it possible to search the keys for any $b$-bit integer $y$. Let $z = (0y)^k$, which can be computed in constant time by multiplying $y$ by the constant $(0^b 1)^k$; this makes $z$ as long as the node sketch, so that every block of the node sketch can be compared with the query integer $y$ in one operation, demonstrating word-level parallelism. For example, if $y$ were 5 bits long, it would be multiplied by $000001 \cdots 000001$ to obtain $k$ copies of $y$, one per block. The difference $1\operatorname{sketch}(x_i) - 0y$ leaves the leading bit of the $i$-th block equal to 1 if and only if $\operatorname{sketch}(x_i) \ge y$. The smallest index $i$ with $\operatorname{sketch}(x_i) \ge y$ can therefore be found in constant time as follows: subtract $z$ from the node sketch, mask out all but the leading bit of each block with the constant $(10^b)^k$, and locate the most significant set bit of the result, whose block position gives $i$.
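A hedged sketch of this parallel comparison on a 64-bit node sketch, reusing the illustrative SKETCH_BITS constant from above; it reports how many stored sketches are greater than or equal to the query sketch:

```c
#include <stdint.h>

#define SKETCH_BITS 4                      /* hypothetical sketch width b       */
#define BLOCK_BITS  (1 + SKETCH_BITS)      /* one leading flag bit per block    */

/* Compare a b-bit query sketch y against all k sketches packed in the node
 * sketch (sketch(x_1) in the most significant block).  Subtracting (0y)^k
 * clears the leading bit of exactly those blocks whose sketch is smaller
 * than y; masking with (10^b)^k and a popcount then count, in O(1) word
 * operations, how many stored sketches are >= y. */
static int sketches_at_least(uint64_t node_sketch, uint64_t y, int k) {
    uint64_t repeat_unit = 0, flag_mask = 0;
    for (int i = 0; i < k; i++) {                        /* (0^b 1)^k and (1 0^b)^k;  */
        repeat_unit = (repeat_unit << BLOCK_BITS) | 1u;  /* precomputed once per node */
        flag_mask   = (flag_mask   << BLOCK_BITS)        /* in a real implementation  */
                      | (UINT64_C(1) << SKETCH_BITS);
    }
    uint64_t z    = repeat_unit * y;        /* (0y)^k : y copied into every block     */
    uint64_t diff = node_sketch - z;        /* leading bit of block i survives iff    */
    uint64_t lead = diff & flag_mask;       /* sketch(x_i) >= y                       */
    return __builtin_popcountll(lead);      /* GCC/Clang builtin                      */
}
```

Since the sketches are stored in increasing order with $\operatorname{sketch}(x_1)$ in the most significant block, the smallest index $i$ with $\operatorname{sketch}(x_i) \ge y$ is $k$ minus this count plus one; the construction described in the text instead takes the most significant set bit of the masked difference, which carries the same information.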
For an arbitrary query $q$, parallel comparison computes the index $i$ such that

$\operatorname{sketch}(x_{i-1}) \le \operatorname{sketch}(q) \le \operatorname{sketch}(x_i).$

Unfortunately, this does not give the exact predecessor or successor of $q$, because the location of the sketch of $q$ among the sketches of all the keys may not be the same as the location of $q$ among the actual keys. What is true is that, among all of the keys, either $x_{i-1}$ or $x_i$ has the longest common prefix with $q$. This is because any key $y$ with a longer common prefix with $q$ would also have more sketch bits in common with $q$, and thus $\operatorname{sketch}(y)$ would be closer to $\operatorname{sketch}(q)$ than any $\operatorname{sketch}(x_j)$.
The length of the longest common prefix between two $w$-bit integers $a$ and $b$ can be computed in constant time by finding the most significant bit of the bitwise XOR between $a$ and $b$. This can then be used to mask out all but the longest common prefix.
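A small illustration, assuming $w = 64$ and using a count-leading-zeros builtin as a stand-in for the constant-time most-significant-bit operation:

```c
#include <stdint.h>

/* Length of the longest common prefix of two 64-bit integers: the XOR has
 * its most significant set bit exactly at the first position where a and b
 * disagree, so the number of leading zeros equals the prefix length. */
static int common_prefix_length(uint64_t a, uint64_t b) {
    uint64_t x = a ^ b;
    if (x == 0)
        return 64;                 /* identical keys: the whole word matches */
    return __builtin_clzll(x);     /* GCC/Clang builtin                      */
}
```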
Note that $p$, the longest common prefix just computed, identifies exactly where $q$ branches off from the set of keys. If the next bit of $q$ is 0, then the successor of $q$ is contained in the $p1$ subtree, and if the next bit of $q$ is 1, then the predecessor of $q$ is contained in the $p0$ subtree. This suggests the following algorithm for determining the exact location of $q$:
1. Use parallel comparison to find the index $i$ such that $\operatorname{sketch}(x_{i-1}) \le \operatorname{sketch}(q) \le \operatorname{sketch}(x_i)$.
2. Compute $p$, the longest common prefix of $q$ and either $x_{i-1}$ or $x_i$ (taking the longer of the two), and let $\ell$ be its length.
3. If the bit of $q$ following $p$ is 1, let $e = p\,0\,1^{w-\ell-1}$, the largest value in the $p0$ subtree, and use parallel comparison to search for the predecessor of $\operatorname{sketch}(e)$. This is the actual predecessor of $q$.
4. If the bit of $q$ following $p$ is 0, let $e = p\,1\,0^{w-\ell-1}$, the smallest value in the $p1$ subtree, and use parallel comparison to search for the successor of $\operatorname{sketch}(e)$. This is the actual successor of $q$.
5. Once either the predecessor or the successor of $q$ is known, the exact position of $q$ among the set of keys is determined.

An application of fusion trees to hash tables was given by Willard, who describes a data structure for hashing in which an outer-level hash table with hash chaining is combined with a fusion tree representing each hash chain. In hash chaining, in a hash table with a constant load factor, the average size of a chain is constant, but additionally with high probability all chains have size $O(\log n / \log \log n)$, where $n$ is the number of hashed items. This chain size is small enough that a fusion tree can handle searches and updates within it in constant time per operation. Therefore, the time for all operations in the data structure is constant with high probability. More precisely, with this data structure, for every inverse-quasipolynomial probability $p(n)$, there is a constant $C$ such that the probability that there exists an operation that exceeds time $C$ is at most $p(n)$.[6]
The computational model for the fusion tree algorithm is a word RAM with a specific instruction set, including arithmetic instructions (addition, subtraction and multiplication, all performed modulo $2^w$) and Boolean operations (bitwise AND, NOT, etc.). A double-precision multiplication instruction is also included. It has been shown[7] that removal of the latter instruction makes it impossible to sort faster than $O(n \log n)$, unless it is permitted to use memory space of nearly $2^w$ words (in contrast to the linear space used by fusion trees), or to include other instructions instead.