Version vector explained
A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality tracking among data replicas and are a basic mechanism for optimistic replication. In mathematical terms, the version vector generates a preorder that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a vector clock, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
- Initially all vector counters are zero.
- Each time a replica experiences a local update event, it increments its own counter in the vector by one.
- Each time two replicas and synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters:
Va[x]=Vb[x]=max(Va[x],Vb[x])
. After synchronization, the two replicas have identical version vectors.
Pairs of replicas,,, can be compared by inspecting their version vectors and determined to be either: identical (
), concurrent (
), or ordered (
or
). The ordered relation is defined as: Vector
if and only if every element of
is less than or equal to its corresponding element in
, and at least one of the elements is strictly less than. If neither
or
, but the vectors are not identical, then the two vectors must be concurrent.
Version vectors[1] or variants are used to track updates in many distributed file systems, such as Coda (file system) and Ficus, and are the main data structure behind optimistic replication.[2]
Other mechanisms
- Hash Histories [3] avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guarantees.
- Concise Version Vectors [4] allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
- Version Stamps [5] allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
- Interval Tree Clocks[6] generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
- Bounded Version Vectors [7] allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.
- Dotted Version Vectors [8] address scalability with a small set of servers mediating replica access by a large number of concurrent clients.
External links
Notes and References
- Douglas Parker, Gerald Popek, Gerard Rudisin, Allen Stoughton, Bruce Walker, Evelyn Walton, Johanna Chow, David Edwards, Stephen Kiser, and Charles Kline. Detection of mutual inconsistency in distributed systems. Transactions on Software Engineering. 1983
- David Ratner, Peter Reiher, and Gerald Popek. Dynamic version vector maintenance. Technical Report CSD-970022, Department of Computer Science, University of California, Los Angeles, 1997
- ByungHoon Kang, Robert Wilensky, and John Kubiatowicz.The Hash History Approach for Reconciling Mutual Inconsistency. ICDCS, pp. 670-677, IEEE Computer Society, 2003.
- [Dahlia Malkhi]
- Paulo Almeida, Carlos Baquero and Victor Fonte. Version Stamps: Decentralized Version Vectors. ICDCS, pp. 544-551, 2002.
- Paulo Almeida, Carlos Baquero and Victor Fonte. Interval Tree Clocks. OPODIS, Lecture Notes in Computer Science, Vol. 5401, pp. 259-274, Springer, 2008.
- José Almeida, Paulo Almeida and Carlos Baquero. Bounded Version Vectors. DISC: International Symposium on Distributed Computing, LNCS, 2004.
- Nuno Preguiça, Carlos Baquero, Paulo Almeida, Victor Fonte and Ricardo Gonçalves. Brief Announcement: Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors. ACM PODC, pp. 335-336, 2012.