In cryptography, a universal hashing message authentication code, or UMAC, is a message authentication code (MAC) calculated using universal hashing, which involves choosing a hash function from a class of hash functions according to some secret (random) process and applying it to the message. The resulting digest or fingerprint is then encrypted to hide the identity of the hash function that was used. A variation of the scheme was first published in 1999. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message. In contrast to traditional MACs, which are inherently serial, a UMAC can be computed in parallel. Thus, as machines continue to offer more parallel-processing capabilities, the speed of UMAC implementations can increase.[1]
A specific type of UMAC, also commonly referred to just as "UMAC", is described in an informational RFC published as RFC 4418 in March 2006. It has provable cryptographic strength and is usually substantially less computationally intensive than other MACs. UMAC's design is optimized for 32-bit architectures with SIMD support, with a performance of 1 CPU cycle per byte (cpb) with SIMD and 2 cpb without SIMD. A closely related variant of UMAC that is optimized for 64-bit architectures is given by VMAC, which was submitted to the IETF as a draft in April 2007 but never gathered enough attention to be approved as an RFC.
See main article: universal hashing. Suppose the hash function is chosen from a class of hash functions H, which maps messages into D, the set of possible message digests. This class is called universal if, for any pair of distinct messages, there are at most |H|/|D| functions that map them to the same member of D.
This means that if an attacker wants to replace one message with another and, from his point of view, the hash function was chosen completely randomly, the probability that the UMAC will not detect his modification is at most 1/|D|.
But this definition is not strong enough: if the possible messages are 0 and 1, D = {0, 1} and H consists of the identity operation and NOT, then H is universal. Yet even if the digest is encrypted by modular addition, the attacker can change the message and the digest at the same time and the receiver would not know the difference.
A class of hash functions H that is good to use will make it difficult for an attacker to guess the correct digest d of a fake message f after intercepting one message a with digest c. In other words,
\Pr_h[h(f)=d \mid h(a)=c]
needs to be very small, preferably 1/|D|.
It is easy to construct a class of hash functions when D is a field. For example, if |D| is prime, all the operations are taken modulo |D|. The message a is then encoded as an n-dimensional vector over D. H then has |D|^{n+1} members, each corresponding to an (n+1)-dimensional vector over D. If we let
h(a)=h_0+\sum_{i=1}^{n}h_i a_i
we can use the rules of probabilities and combinatorics to prove that
\Pr_h[h(f)=d \mid h(a)=c]={1\over|D|}
If we properly encrypt all the digests (e.g. with a one-time pad), an attacker cannot learn anything from them and the same hash function can be used for all communication between the two parties. This may not be true for ECB encryption because it may be quite likely that two messages produce the same hash value. In that case some kind of initialization vector, often called the nonce, should be used. It has become common practice to set h_0 = f(nonce), where f is also secret.
Notice that having massive amounts of computing power does not help the attacker at all. If the recipient limits the number of forgeries it accepts (by sleeping whenever it detects one), |D| can be 2^32 or smaller.
The following C function generates a 24-bit UMAC. It assumes that secret is a multiple of 24 bits, msg is not longer than secret, and result already contains the 24 secret bits, e.g. f(nonce). The nonce does not need to be contained in msg.
void UHash24 (uchar *msg, uchar *secret, size_t len, uchar *result)
/* This is the same thing, but grouped up (generating better assembly and stuff). It is still bad and nobody has explained why it's strongly universal. */
void UHash24Ex (uchar *msg, uchar *secret, size_t len, uchar *result)
Functions in the above unnamed strongly universal hash-function family use n multiplications to compute a hash value.
The NH family halves the number of multiplications, which roughly translates to a two-fold speed-up in practice.[2] For speed, UMAC uses the NH hash-function family. NH is specifically designed to use SIMD instructions, and hence UMAC is the first MAC function optimized for SIMD.
The following hash family is 2^{-w}-universal:
\operatorname{NH}_K(M)=\left(\sum_{i=0}^{(n/2)-1}((m_{2i}+k_{2i})\bmod 2^w)\cdot((m_{2i+1}+k_{2i+1})\bmod 2^w)\right)\bmod 2^{2w}
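where the message M = (m_0, ..., m_{n-1}) is a vector of w-bit words, n is even, and the key K supplies one w-bit word k_i for each message word.
In practice, NH is computed with unsigned integers. All multiplications are taken mod 2^w, all additions mod 2^{w/2}, and all inputs are vectors of half-words (w/2 = 32 bits; here w denotes the width of the product rather than of an input word). The algorithm then uses \lceil k/2\rceil multiplications, where k is the number of half-words in the vector.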
RFC 4418 is an informational RFC that describes a wrapping of NH for UMAC. The overall UHASH ("Universal Hash Function") routine produces tags of variable length, which corresponds to the number of iterations (and the total length of the keys) needed in all three layers of its hashing. Several calls to an AES-based key derivation function are used to provide keys for all three keyed hashes.
In RFC 4418, NH is rearranged to take the following form:
Y = 0
for (i = 0; i < t; i += 8) do
\begin{align}
Y&=Y+_{64}((M_{i+0}+_{32}K_{i+0})*_{64}(M_{i+4}+_{32}K_{i+4}))\\
Y&=Y+_{64}((M_{i+1}+_{32}K_{i+1})*_{64}(M_{i+5}+_{32}K_{i+5}))\\
Y&=Y+_{64}((M_{i+2}+_{32}K_{i+2})*_{64}(M_{i+6}+_{32}K_{i+6}))\\
Y&=Y+_{64}((M_{i+3}+_{32}K_{i+3})*_{64}(M_{i+7}+_{32}K_{i+7}))
\end{align}