>Linus mistakenly claims using a cryptographic hash function helps avoid non-malicious collisions, but this is not the case.
Of course it does. Git is borrowing from a different field (cryptography) to get a CRC that "really, really won't collide," because that whole field is completely busted if it does.
Let me put it this way. If I really, really need a random distribution of white noise, I might borrow from a different field, cryptography, to provide it. If cryptographic output is not effectively random and uniformly distributed, that field is broken in some fundamental sense: no information is supposed to make it into the ciphertext, which should be indistinguishable from white noise.
So encrypting your source of white noise for the sole purpose of making it statistically closer to noise is a perfectly valid choice.
Actually, in your comment you said it yourself: after as few as four billion commits, CRC64 would be expected to see a collision. That is tiny compared to the search spaces cryptographers work with.
If you look at the history of git, there was originally no reason to use cryptographic functions except in the sense of the white-noise analogy I just made: Linus borrowed a property from a field other than the one he was working in.
You seem to be operating under the same sort of "cryptographic hash functions are magic!" delusions as Linus.
CRC and SHA1 both produce a uniform distribution. SHA1 does not magically do this better because cryptography. The only things that make CRC and SHA1 any different are:
- SHA1 produces a longer tag (of course CRC256 is a thing)
- SHA1 is hardened against preimage attacks
- SHA1 was intended to be secure against collision attacks (not anymore!)
SHA1, truncated to 32 or 64 bits, will produce a distribution just as uniform as CRC.
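One rough way to probe this claim is an empirical histogram: hash a set of structured inputs, keep some of the output bits, and see how evenly they fill a fixed number of buckets. The sketch below is hypothetical and not from the thread: it assumes Python's `hashlib.sha1` and `zlib.crc32` as the two functions, 256 buckets, consecutive decimal strings as input, and a plain chi-square statistic as the unevenness score (for a truly uniform spread it should land near 255, the bucket count minus one). It is a smoke test, not a proof of uniformity, and the numbers depend on the input set you feed it.

```python
import hashlib
import zlib

NUM_BUCKETS = 256  # hypothetical bucket count for the histogram


def sha1_trunc32(data: bytes) -> int:
    """SHA-1 truncated to its leading 32 bits, read big-endian."""
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def crc32(data: bytes) -> int:
    """CRC-32 as computed by zlib (always unsigned in Python 3)."""
    return zlib.crc32(data)


def bucket_counts(hash_fn, inputs):
    """Histogram of hash values reduced modulo NUM_BUCKETS (i.e. the low 8 bits)."""
    counts = [0] * NUM_BUCKETS
    for item in inputs:
        counts[hash_fn(item) % NUM_BUCKETS] += 1
    return counts


def chi_square(counts):
    """Pearson chi-square statistic against a flat (uniform) expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Deliberately structured inputs: consecutive decimal strings.
inputs = [str(i).encode() for i in range(100_000)]

for name, fn in [("sha1[:32]", sha1_trunc32), ("crc32", crc32)]:
    counts = bucket_counts(fn, inputs)
    print(f"{name}: chi-square over {NUM_BUCKETS} buckets = {chi_square(counts):.1f}")
```

Swapping in other input sets (random bytes, file contents, commit-like text) is the more interesting experiment, since a linear code like CRC and a cryptographic hash can behave differently on highly structured data.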
In a non-security setting, we can pick the size of the tag based on the rough number of objects we'd like to be able to store before we'd expect to see a collision (i.e. the birthday bound). If that number is ~4 billion, then CRC64 is sufficient.
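The birthday bound can be made concrete. For n uniformly random b-bit tags, the probability of at least one collision is approximately 1 - exp(-n(n-1) / 2^(b+1)), and solving for n gives the object count at which that probability reaches p. A minimal Python sketch, assuming the tags behave as if uniformly random (the premise of this comment); the function names are illustrative only:

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n uniformly random
    `bits`-bit tags: 1 - exp(-n(n-1) / 2^(bits+1))."""
    space = 2.0 ** bits
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def objects_for_probability(p: float, bits: int) -> float:
    """Approximate number of tags at which the collision probability
    reaches p: n ~ sqrt(2 * 2^bits * ln(1 / (1 - p)))."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


for bits in (32, 64, 160):
    p = collision_probability(2 ** 32, bits)
    n50 = objects_for_probability(0.5, bits)
    print(f"{bits:>3}-bit tag: P(collision | 2^32 objects) = {p:.3g}, 50% at ~{n50:.3g} objects")
```

With a 64-bit tag and 2^32 objects this gives 1 - e^(-0.5), roughly a 39% chance of at least one collision, and the 50% point sits around 5 billion objects, which matches the "~4 billion" order of magnitude.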