Friday, February 3, 2023

Which one to use - SHA256sum vs md5sum

  • Both utilities, sha256sum and md5sum, compute a hash (digest) of their input.
  • md5sum produces a 128-bit hash (MD5), while sha256sum produces a 256-bit hash (SHA-256). In theory, the longer hash reduces the chance of a collision, i.e. two different files/entities producing the same hash value.
  • Which one to use depends on the use case, i.e. the kind and number of entities/files being hashed. We want to pick a hash that is very unlikely to produce a collision among them.
  • For most cases, a 128-bit hash is more than good enough to support a large number of unique items with a very low probability of collision [2]. E.g. with a 160-bit hash and about 1.7 × 10^20 unique entities, the odds of a collision are roughly 1 in 100 million, which is great for most purposes.
  • Because the number of unique objects of a particular type being compared is usually limited, a 32-bit checksum (e.g. CRC32) suffices in most cases.
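
The collision odds discussed above can be estimated with the birthday approximation. Below is a minimal sketch in Python, assuming hash outputs are uniformly distributed and nobody is deliberately crafting collisions. The function names `collision_probability` and `digest_bits` are made up for illustration and do not come from the referenced articles.

```python
import hashlib
import math
import zlib


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items share a hash value.

    Birthday approximation: p = 1 - exp(-k * (k - 1) / (2 * N)), N = 2**hash_bits.
    Assumes a uniformly distributed hash (an idealization).
    """
    space = 2.0 ** hash_bits
    exponent = -num_items * (num_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(exponent)


def digest_bits(data: bytes) -> dict:
    """Return the output size in bits of each checksum for the given data."""
    return {
        "crc32": 32,  # zlib.crc32 returns an unsigned 32-bit integer
        "md5": hashlib.md5(data).digest_size * 8,
        "sha256": hashlib.sha256(data).digest_size * 8,
    }


if __name__ == "__main__":
    sample = b"hello world\n"
    print("crc32 :", format(zlib.crc32(sample), "08x"))
    print("md5   :", hashlib.md5(sample).hexdigest())
    print("sha256:", hashlib.sha256(sample).hexdigest())
    print("bits  :", digest_bits(sample))

    # Compare the three hash widths for a few collection sizes.
    for items in (1_000, 100_000, 10_000_000):
        for bits in (32, 128, 256):
            p = collision_probability(items, bits)
            print(f"{items:>10} items, {bits:>3}-bit hash: p = {p:.3e}")
```

Under these assumptions a 32-bit checksum reaches roughly a 50% collision chance at about 77,000 items, so whether 32 bits is enough depends on how many objects are actually being compared.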
label: programming, utilities, linux, artifact comparison, SHA256sum vs md5sum, CRC32

Reference
[1] md5sum vs sha256sum - Good discussion, but I do not agree with the hash collision discussion here.
[2] Hash collision probabilities - Nice algorithm for computing probability of hash collision.