Anyone with an interest in bitcoin will have heard the phrase ‘cryptographic hash function’ at some time or other. But what exactly does it mean, and how is it connected to cryptocurrency?
Hash functions are an essential part not only of the bitcoin protocol, but of information security as a whole.
In this article we’ll take a look at how they work, with a simple demonstration along the way.
What’s a hash function?
In the abstract, a hash function is a mathematical process that takes input data of any size, performs an operation on it, and returns output data of a fixed size.
In a more concrete example, a hash function can take a sequence of characters of any length as input – what we call a string – and return a sequence of characters of a fixed length. Whether the input string is a single letter, a word, a sentence, or an entire novel, the output – called the digest – will always be the same length.
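The fixed-length property is easy to check for yourself. The short Python sketch below uses the standard library's hashlib module, with the MD5 algorithm chosen purely as an illustration, to hash inputs of very different sizes and print the length of each digest. The helper name digest_of and the sample inputs are made up for this example.

```python
import hashlib

def digest_of(text):
    """Return the MD5 digest of a string as hexadecimal characters."""
    return hashlib.md5(text.encode()).hexdigest()

# Sample inputs of very different sizes (illustrative only).
samples = [
    "a",
    "bitcoin",
    "a much longer sentence about hash functions",
    "x" * 100000,
]

for sample in samples:
    digest = digest_of(sample)
    # The input length varies; the digest length does not.
    print(len(sample), "characters in ->", len(digest), "characters out")
```

Every line of output should report 32 characters out, because an MD5 digest is always 128 bits, written as 32 hexadecimal characters.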
A common use of this kind of hash function is to store passwords.
When you create a user account with any web service which requires a password, the password is run through a hash function, and only the resulting digest is stored. When you type in your password to log in, the same hash function is run on the password you’ve entered, and the server checks whether the result matches the stored digest.
This means that if a hacker is able to access the database containing the stored hashes, they will not be able to immediately compromise all user accounts because there is no easy way to find the password which produced any given hash.
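The store-and-compare flow described above can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular service does it: the in-memory dictionary stored_digests and the functions register and check_login are hypothetical names, and SHA-256 is used here without a salt. Real services typically add a per-user salt and use a deliberately slow hashing scheme.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to stored digests.
stored_digests = {}

def digest_password(password):
    """Hash a password with SHA-256 (no salt: a simplification for illustration)."""
    return hashlib.sha256(password.encode()).hexdigest()

def register(username, password):
    # Only the digest is stored, never the password itself.
    stored_digests[username] = digest_password(password)

def check_login(username, password):
    stored = stored_digests.get(username)
    if stored is None:
        return False
    # compare_digest compares the two digests in constant time.
    return hmac.compare_digest(stored, digest_password(password))

register("alice", "correct horse")
print(check_login("alice", "correct horse"))  # True
print(check_login("alice", "wrong guess"))    # False
```

Note that the stored value for "alice" is a 64-character digest, so anyone reading the dictionary sees the hash but not the password that produced it.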
Simple hash functions in Python
You can experiment with hash values using Python, a programming language that comes preinstalled on many Mac and Linux systems. (This tutorial will assume you’re using some version of either OS X or Linux, as using Python on Windows is more complicated.)
First, open a terminal, type python (on some systems the command is python3) and hit ENTER.
This will put you into the Python REPL, an environment where you can try out Python commands directly as opposed to writing a programme in a separate file.
Then, type the following, pressing ENTER after each line, and TAB where marked:
import hashlib
def hash(mystring):
[TAB] hash_object = hashlib.md5(mystring.encode())
[TAB] print(hash_object.hexdigest())
[ENTER]
You have now created a function, hash(), which will calculate and print out the hash value for a given string using the MD5 hashing algorithm. To run it, put a string in quotation marks between the parentheses, eg:
hash("CoinDesk rocks")
And press ENTER to see the hash digest of that string.
You will see that calling the hash function on the same string will always generate the same hash, but adding or changing one character will generate a completely different hash value:
hash("CoinDesk rocks") => 7ae26e64679abd1e66cfe1e9b93a9e85
hash("CoinDesk rocks!") => 6b1f6fde5ae60b2fe1bfe50677434c88
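To put a number on how different the two digests are, the sketch below counts how many of the 128 bits in an MD5 digest change when a single exclamation point is added. The helper names md5_hex and differing_bits are made up for this example; for a well-behaved hash function roughly half of the bits would be expected to flip.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode()).hexdigest()

def differing_bits(hex_a, hex_b):
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = md5_hex("CoinDesk rocks")
second = md5_hex("CoinDesk rocks!")

# Same input, same digest: hashing is deterministic.
print(first == md5_hex("CoinDesk rocks"))  # True

# One extra character changes a large share of the output bits.
print(differing_bits(first, second), "of 128 bits differ")
```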
Hash functions in bitcoin
In the bitcoin protocol, hash functions are part of the block hashing algorithm which is used to write new transactions into the blockchain through the mining process.
In bitcoin mining, the inputs for the function are all of the most recent, not-yet-confirmed transactions (along with some additional inputs relating to the timestamp and a reference to the previous block).
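As a rough sketch of how those inputs feed into a single hash, the Python below joins some made-up transactions, a timestamp and a placeholder for the previous block's hash into one string and hashes it. The text layout and the function name block_digest are assumptions for illustration only; a real bitcoin block header is a compact binary structure, not a string like this. The sketch does follow bitcoin in applying SHA-256 twice.

```python
import hashlib

def block_digest(transactions, timestamp, previous_hash):
    """Hash a simplified block: transactions, a timestamp and the previous block's hash."""
    # Hypothetical text layout, chosen only to keep the example readable.
    payload = "|".join([previous_hash, str(timestamp)] + list(transactions))
    # Apply SHA-256 twice, as bitcoin does, to the text payload.
    first_pass = hashlib.sha256(payload.encode()).digest()
    return hashlib.sha256(first_pass).hexdigest()

pending = ["alice pays bob 1 BTC", "carol pays dave 2 BTC"]  # made-up transactions
previous = "00" * 32  # placeholder standing in for the previous block's hash

digest = block_digest(pending, 1400000000, previous)
print(digest)

# Changing any input, here the reference to the previous block, changes the digest.
print(block_digest(pending, 1400000000, "11" * 32) != digest)  # True
```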
In the code example above, we’ve already seen that changing a small part of the input for a hash function results in a completely different output. This property is crucial to the ‘proof of work’ algorithm involved in mining: to successfully ‘solve’ a block, miners try to combine all of the inputs with their own arbitrary piece of input data (known as a nonce) in such a way that the resulting hash starts with a certain number of zeros.
As a basic demonstration, we could try ‘mining’ with our Python hash function by manually adding exclamation points after “CoinDesk rocks!” until we find a hash that starts with a single zero.
>>> hash("CoinDesk rocks!!")
66925f1da83c54354da73d81e013974d
>>> hash("CoinDesk rocks!!!")
c8de96b4cf781a6373766c668ceac0f0
>>> hash("CoinDesk rocks!!!!")
9ea367cea6a2cc4a6f5a1d9a334d0d9e
>>> hash("CoinDesk rocks!!!!!")
b8d43387d98f035e2f0ac49740a5af38
>>> hash("CoinDesk rocks!!!!!!")
0fe46518541f4739613b9ce29ecea6b6 => SOLVED!
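Typing each attempt by hand gets tedious quickly, so the same search can be handed to a loop. The sketch below keeps appending exclamation points until the MD5 digest starts with the requested prefix. The function name mine_with_exclamations and the max_attempts safety limit are made up for this example.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode()).hexdigest()

def mine_with_exclamations(base, prefix="0", max_attempts=1000):
    """Append '!' to base until its MD5 digest starts with prefix."""
    candidate = base
    for _ in range(max_attempts):
        digest = md5_hex(candidate)
        if digest.startswith(prefix):
            return candidate, digest
        # No luck: change the input slightly and try again.
        candidate += "!"
    raise RuntimeError("no match found within max_attempts")

solution, digest = mine_with_exclamations("CoinDesk rocks!")
print(repr(solution), "=>", digest)
```

Passing a longer prefix such as "00" asks for two leading zeros, which on average takes 16 times as many attempts, so max_attempts may need raising.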
Of course, solving the hash for a bitcoin block – which at the time of writing must start with 18 zeros – requires an extremely large amount of computation: even with the combined processing power of all the computers in the network, it still takes approximately 10 minutes to solve a block.
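To get a feel for how quickly the work grows, the sketch below searches for a number (a nonce) that, appended to a fixed string, gives a digest starting with one, two, three and then four zeros, and reports how many attempts each took. It is a toy model under stated assumptions: the function names are made up, the input is a plain string rather than a real block, and the requirement is simplified to counting leading hexadecimal zeros. It does use SHA-256 applied twice, as bitcoin does, rather than MD5.

```python
import hashlib
from itertools import count

def double_sha256_hex(text):
    """Apply SHA-256 twice and return the result as 64 hexadecimal characters."""
    return hashlib.sha256(hashlib.sha256(text.encode()).digest()).hexdigest()

def mine(data, zeros):
    """Find the first nonce whose digest starts with `zeros` hexadecimal zeros."""
    target_prefix = "0" * zeros
    for nonce in count():
        digest = double_sha256_hex(f"{data}{nonce}")
        if digest.startswith(target_prefix):
            # Nonces start at 0, so the number of attempts is nonce + 1.
            return nonce, digest, nonce + 1

for zeros in range(1, 5):
    nonce, digest, attempts = mine("CoinDesk rocks", zeros)
    print(zeros, "zeros:", attempts, "attempts ->", digest)
```

Each extra hexadecimal zero makes a matching digest 16 times less likely, so the expected number of attempts is multiplied by 16 at every step, although any individual run can get lucky or unlucky.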
It’s the need for this large amount of processing power that means new bitcoins get mined over a long period of time, not all at once.
In order to earn bitcoins through mining, you need to put in the huge amount of work necessary to solve a block – and by earning that reward, you’re locking all of the new transactions into a block, which is added to the permanent record of all previous transactions: the blockchain.