Hash Function

How does a blockchain work with hash functions?

Imagine you want to retain and monitor changes to a file, for example a log file. Now, imagine you also want to verify an unbroken history of all changes ever made to the file. How can you proceed?

A well-understood solution uses cryptographic hash functions. Let us briefly introduce the concept in case you are unfamiliar with it.

The ideal cryptographic hash function has five main properties:

  • Deterministic: the same message always results in the same hash.
  • Fast: the hash value for any given message is computed quickly.
  • Pre-image resistant: it is infeasible to recover a message from its hash value except by trying all possible messages.
  • Uncorrelated: a small change to a message alters the hash value so extensively that the new value shows no relation to the old.
  • Collision-resistant: it is infeasible to find two different messages with the same hash value.

A hash can be used to prove an input exactly matches the original, but the original cannot be reconstructed from a hash. So, a hash function can demonstrate that a copy of the file is an authentic replica of the original in every detail.
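
The check described above can be sketched in a few lines. This is only an illustration, assuming SHA-256 from Python's standard `hashlib` module; the helper name `sha256_hex` and the sample log line are made up for the example and do not come from the text.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents, used only for this illustration.
original = b"2024-01-01 service started\n"
exact_copy = bytes(original)
altered_copy = original.replace(b"started", b"stopped")

# Deterministic: identical bytes always give the identical hash.
print(sha256_hex(original) == sha256_hex(exact_copy))    # True

# Any alteration, however small, gives a different hash.
print(sha256_hex(original) == sha256_hex(altered_copy))  # False
```

Comparing two short hashes is enough to decide whether two files of any size are identical, without comparing the files byte by byte.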

How does a blockchain rely on hash functions?

Blockchain technology relies heavily on hash functions, as they help establish the so-called "chain of blocks".
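
To make the "chain of blocks" idea concrete, here is a minimal sketch in which each block stores the hash of the block before it. It is a simplified illustration, not how any particular blockchain is implemented: the block layout, the all-zero starting value, and the names `block_hash`, `build_chain`, and `is_valid` are assumptions made for this example.

```python
import hashlib

# Assumed placeholder for the "previous hash" of the very first block.
GENESIS_PREVIOUS_HASH = "0" * 64


def block_hash(previous_hash: str, data: str) -> str:
    """Hash a block's data together with the hash of the block before it."""
    return hashlib.sha256((previous_hash + data).encode("utf-8")).hexdigest()


def build_chain(entries):
    """Turn a list of entries into blocks, each linked to its predecessor."""
    chain = []
    previous_hash = GENESIS_PREVIOUS_HASH
    for data in entries:
        current_hash = block_hash(previous_hash, data)
        chain.append({"previous_hash": previous_hash, "data": data, "hash": current_hash})
        previous_hash = current_hash
    return chain


def is_valid(chain) -> bool:
    """Recompute every hash and check that every link matches."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["previous_hash"], block["data"]):
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["log line 1", "log line 2", "log line 3"])
print(is_valid(chain))  # True: the history is unbroken

chain[1]["data"] = "tampered"
print(is_valid(chain))  # False: the stored hash no longer matches the data
```

Because each hash depends on the previous one, changing an old entry would require recomputing every later block to hide the change.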

A cryptographic hash function:

  • Converts an input (a.k.a. the message) into an output (a.k.a. the hash).
  • Does the conversion in a reasonable amount of time.
  • Makes it practically impossible to regenerate the message from the hash.
  • Changes the hash beyond recognition at the tiniest change in the message.
  • Makes it practically impossible to find two different messages with the same hash.

With such a function, you can:

  • Prove that you have a message without disclosing the content of the message, for instance:
    • To prove you know your password.
    • To prove you previously wrote a message.
  • Be confident that the message was not altered.
  • Index your messages.
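
Two of the uses above can be sketched as follows. This is a rough illustration, assuming SHA-256 from Python's `hashlib`; the names `commit` and `verify`, the sample messages, and the random nonce are assumptions added for the example. The nonce is included because a bare hash of a short or guessable message could be matched by hashing likely candidates. Real password storage relies on dedicated, deliberately slow functions rather than a single plain hash.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes, nonce: bytes) -> str:
    """Hash the nonce and message together; only this value is published."""
    return hashlib.sha256(nonce + message).hexdigest()


def verify(commitment: str, message: bytes, nonce: bytes) -> bool:
    """Check a revealed message and nonce against the published commitment."""
    return hmac.compare_digest(commitment, commit(message, nonce))


# Step 1: publish the commitment, keep the message and nonce private.
message = b"I predict the upgrade ships on Friday"
nonce = secrets.token_bytes(16)
commitment = commit(message, nonce)

# Step 2: later, reveal both so that anyone can recompute the hash.
print(verify(commitment, message, nonce))               # True: the genuine message
print(verify(commitment, b"a different claim", nonce))  # False: an altered message

# Indexing: use the hash of each message as its lookup key.
index = {}
for text in (b"first message", b"second message"):
    index[hashlib.sha256(text).hexdigest()] = text

key = hashlib.sha256(b"first message").hexdigest()
print(index[key])  # b'first message'
```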

A closer look at a hash function

MD5 provides a convenient example, but it is no longer considered a secure hash function because practical collision attacks against it are known. Bitcoin uses SHA-256. Ethereum uses Keccak-256 and Keccak-512.

MD5 is such a hash function:

$ echo "The quick brown fox jumps over the lazy dog" | md5

Which prints:

37c4b87edffc5d198ff5a185cee7ee09

On Linux, the equivalent command is md5sum, which prints the same hash followed by a dash that stands for standard input.

Now introduce a typo to see what happens (e.g. changing "jumps" to "jump"):

$ echo "The quick brown fox jump over the lazy dog" | md5

Which prints:

4ba496f4eec6ca17253cf8b7129e43be

Notice how the two hashes have nothing in common other than their length. Every MD5 hash has the same length, so the length reveals nothing about the input.
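
The same comparison can be made programmatically. The sketch below assumes Python's standard `hashlib` module; the helpers `md5_digest` and `differing_bits` are hypothetical names for this example. It appends a newline to each sentence because `echo` does the same, so the hashes should match the ones printed by the commands above. For a well-behaved hash function, roughly half of the bits would be expected to differ.

```python
import hashlib


def md5_digest(text: str) -> bytes:
    """Return the raw MD5 digest of text plus the newline that echo appends."""
    return hashlib.md5((text + "\n").encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = md5_digest("The quick brown fox jumps over the lazy dog")
second = md5_digest("The quick brown fox jump over the lazy dog")

print(first.hex())
print(second.hex())
print(differing_bits(first, second), "of", len(first) * 8, "bits differ")
```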

Want to see a hash function live?
  • You can see hashing in action and get a feel for it here: https://www.browserling.com/tools/all-hashes.
  • As you type into the text box, the hash updates automatically. Even a minuscule change to the input creates completely different hashes. You can also see that different hashing algorithms produce different output. Hash algorithms have evolved over time, often for security reasons. Try it out!