What does it mean if a hashing algorithm creates the same hash for two different downloads?

A cryptographic hash can be used for many different tasks. In this video, you’ll learn about hashing, collisions, digital signatures, and more.

A cryptographic hash allows you to take any amount of data– it can be a small bit of text, or it can be an entire book– and you can represent that bit of data as a short string of text. We refer to this short string of hashed text as a message digest.

A hash is not an encrypted version of the original text. It’s really a one-way trip. There’s no way to recover the original text by simply looking at the hashed value. It’s because of this unique characteristic that we commonly use hashes to store passwords.

That way, we can compare a stored hash against another hash that’s given to us later, but we’ll still have no idea what the original password was from the user. We might also use hashes to confirm that a file that we’ve downloaded is identical to the original version of that file. And we use hashing during the creation of a digital signature that allows us to provide authentication, non-repudiation, and integrity to a particular document.

A fundamental characteristic of a good hashing algorithm is that it should be practically impossible to find two different messages that have the exact same hash. When two different messages do produce the same hash, we call this a hash collision, and it’s important that we don’t have two documents that somehow manage to produce exactly the same hash. Let’s look at the process of creating a hash. For this, we’re going to use the SHA256 hashing algorithm. This creates a hash that is 256 bits in length, and we usually represent this as something human-readable by using 64 hexadecimal characters.

For this hash, we’ll take a simple sentence which says, “my name is Professor Messer” with a period. If we were to perform the SHA256 hash of that text, we receive this particular hash value. We can also perform the same function with another bit of text. This one looks almost identical. But instead of having a period at the end, I put an exclamation mark, and I performed exactly the same SHA256 hash.

And you can see that the hash value for these is completely different, even though the only thing that is different between the two messages is that single character at the end: a period or an exclamation mark. A good hashing algorithm is designed so that even a tiny change to the input produces a very different hash, and that is why there is such a remarkable difference between one hash and the other.
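To try this comparison yourself, here is a minimal sketch using Python's standard hashlib module. The exact capitalization of the sentence on the slide is an assumption, so the values printed here may not match the ones shown in the video, and the helper name sha256_hex is just for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Assumed wording of the two sentences; only the last character differs.
first = sha256_hex("My name is Professor Messer.")
second = sha256_hex("My name is Professor Messer!")

print(first)
print(second)

# Count how many of the 64 hex positions hold a different character.
differing = sum(a != b for a, b in zip(first, second))
print(f"{differing} of {len(first)} hex characters differ")
```

Running this shows two 64-character digests that share very little, even though the inputs differ by one character.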

This hashing method can be used with any type of message. It could be a large graphics file, it could be a disk image, or it can be a sentence like the one I used in the previous slide. We use that hashing algorithm to create that fixed size string, which is our message digest. This message digest– the hash that is created from this hashing algorithm– should be unique. If we have two different kinds of input, they should never create exactly the same hash.

If we do have an instance where that occurs, it’s a collision. We had an example of this with the MD5 hashing algorithm, where a collision weakness was found in its compression function in 1996, and full MD5 collisions were demonstrated in 2004. Because of that, we don’t tend to use MD5 to do our hashing these days.

Here is an example of this collision that occurred with MD5. These are two different pieces of information. They’re almost identical, except for the items that are highlighted in red. Those are the only differences between these two messages. But if I use MD5 to hash these messages, I get exactly the same MD5 hash, and that is our collision.
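The actual MD5 collision data from the slide isn't reproduced here. As a rough illustration of what a collision is, the sketch below deliberately weakens SHA-256 by keeping only the first four hex characters of the digest (16 bits), which makes a collision easy to find by brute force. The truncation, the function names, and the "message-N" inputs are all assumptions made for the demonstration and are not how the MD5 collision was found.

```python
import hashlib


def short_hash(message: bytes, hex_chars: int = 4) -> str:
    """A deliberately weak hash: the first hex_chars characters of SHA-256."""
    return hashlib.sha256(message).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Try messages until two different ones share the same short hash."""
    seen = {}  # maps short hash -> first message that produced it
    counter = 0
    while True:
        message = f"message-{counter}".encode("utf-8")
        digest = short_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message
        counter += 1


a, b, digest = find_collision()
print(a, b, "both hash to", digest)
```

With only 16 bits of output there are 65,536 possible values, so a collision has to show up quickly. A full 256-bit digest makes the same search impractical, which is the property a broken algorithm like MD5 no longer provides.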

One common use of hashing is to verify a file that you may have downloaded. You might go to a website and find a number of files posted that have along with them a particular hash value. When you download this file, you can perform the same hash locally on your machine and then compare your results of that hash to the version that is posted on the website to verify that the file you’ve received is identical to the one that’s posted on the website.
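Here is a small sketch of that download check in Python, assuming the website publishes a SHA-256 value as a hex string. The function names and the temporary demo file are made up for the example; in practice you would pass the path of the file you downloaded and the hash copied from the site.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the local hash to the published one, ignoring case and spaces."""
    return file_sha256(path) == published_hex.strip().lower()


# Demo with a stand-in file; a real check would use the downloaded file.
contents = b"pretend this is a downloaded installer"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    demo_path = tmp.name

published = hashlib.sha256(contents).hexdigest()  # stand-in for the site's value
print(matches_published_hash(demo_path, published))
os.remove(demo_path)
```

If even one byte of the file were different, the local hash would no longer match the published value.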

And as we mentioned earlier, it’s common to use hashes when you’re storing passwords. Instead of storing the original plaintext password, you can combine the password with a little bit of extra random information that’s called the salt, hash that combination, and store the resulting hash along with the salt instead of something that could be read by anyone. During the authentication process, the user would input their password. The same salt and hashing function would be applied, and that hash would be compared to the one that is stored. If those two match, then you know the authentication was successful, and you were able to do that without storing the user’s plaintext password.
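The salted password flow can be sketched in a few lines of Python. The video doesn't name a specific algorithm for this step, so the choice of PBKDF2 with SHA-256 from the standard library, the iteration count, and the function names below are all assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the digests."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Only the salt and digest are stored, never the plaintext password.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each user gets a different salt, two users with the same password end up with different stored hashes.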

We can also combine hashes with asymmetric encryption to create digital signatures. Digital signatures can be used to provide integrity so you can be sure that the message that you’ve received is exactly the message that was originally sent. Digital signatures can also provide authentication so you know that the message that you received was really sent by the sender. And digital signatures can provide non-repudiation, which means the sender can’t later deny having signed the message, because we know that the digital signature was not faked or provided by a third party.

To create a digital signature, we need to be able to sign the document with something that no one else has. And with asymmetric encryption, we know that our private key is something that no one else has access to. That means that no one but us can provide that digital signature. And since we’re using asymmetric encryption, the recipient of this digitally signed document can verify my signature by checking it with my public key.

Visually, here’s how this would work. We would start with Alice, who wants to digitally sign a document that she’s sending to Bob. This plaintext document says, “You’re hired, Bob,” and she’ll hash this document to create a hash of that plaintext. Alice will then encrypt this hash with her private key, and obviously, no one else would have access to Alice’s private key but Alice.

The result is this digital signature. This is not an encrypted version of the plaintext. In fact, it’s common to create a digital signature and simply attach the digital signature to the original plaintext that you’re sending to the recipient. And that’s exactly what Alice has done here. She’s taken the original plaintext with the digital signature and sent it to Bob. Bob will take the digital signature off of the plaintext, and he will decrypt it using Alice’s public key.

That will provide the hash that Alice originally created. Now Bob can use exactly the same plaintext, perform the same hashing function that Alice originally did, and come up with a hash of that plaintext. Now Bob’s able to compare the hash that was recovered from the digital signature and the hash of the plaintext that he performed himself. And if those two match, then he knows that this digital signature is valid: the document hasn’t changed, and the signature was created with Alice’s private key.
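The Alice and Bob exchange can be sketched with textbook RSA in Python (3.8 or later for the modular inverse). Everything about the key here is an assumption chosen to keep the example short: the two primes are small, well-known Mersenne primes, there is no padding, and the hash is reduced to fit the tiny modulus. It illustrates the flow of hashing, signing with the private key, and verifying with the public key, and should not be used as a real signature scheme.

```python
import hashlib

# Toy key pair (assumed values, far too small for real use).
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: 'encrypt' the hash of the message with her private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: recover the hash with Alice's public key and compare to his own."""
    return pow(signature, E, N) == digest_as_int(message)


document = b"You're hired, Bob"
signature = sign(document)

print(verify(document, signature))              # document as Alice sent it
print(verify(b"You're fired, Bob", signature))  # tampered document
```

The document itself travels in the clear. Only the hash is transformed with the private key, which matches the point that a signature is not an encrypted copy of the plaintext.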

What is it called when a hashing algorithm creates the same hash from two different messages?

This is called a collision. A collision attack on a hashing system attempts to find two different messages that produce the same hash value.

When two different messages produce the same hash value what has occurred?

If two different messages or files produce the same hashing digest, then a collision has occurred.

What happens if the hash functions used in both phases of the algorithm are the same?

A hash value should be unique, because no two different pieces of data are expected to produce the same hash. If two hashes are found to be the same for two different pieces of data, it's called a 'hash collision', and that algorithm can no longer be trusted for security purposes.

Which type of algorithm is used when two different keys are used in encryption?

Asymmetric algorithms are better known as public-key (or public/private-key) algorithms. This encryption is best used between two parties who have no prior knowledge of each other but want to exchange data securely. Unlike symmetric algorithms, asymmetric algorithms use two different cryptographic keys, one to encrypt and the other to decrypt.