© 2024 Nuran Askarov.

Hashing Explained

Hashing is the process of transforming any given data (such as a file or a message) into a fixed-size value, usually displayed as a string of hexadecimal characters. For a practical example, at the Windows command prompt you can calculate the hash of a file using “certutil”:

certutil -hashfile myfile.txt MD5

The output should look something like this.

MD5 hash of myfile.txt:
65a8e27d8879283831b664bd8b7f0ad4
CertUtil: -hashfile command completed successfully.

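The output size of the MD5 hash is 128 bits (the 32 hexadecimal characters shown above), which is relatively short compared to other hashing functions like SHA-256 or SHA-512. MD5 is also no longer considered collision-resistant, so it is suitable for quick checks like this one but not for security-sensitive purposes.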
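For readers who are not on Windows, roughly the same result can be obtained in Python. The following is a minimal sketch using the standard hashlib module. The helper name hash_file is made up for illustration, and it reads the whole file into memory, which is reasonable only for small files.

```python
import hashlib
from pathlib import Path


def hash_file(path: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a file's contents (hypothetical helper)."""
    data = Path(path).read_bytes()  # loads the whole file; fine for small files
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    import os
    import tempfile

    # Write a throwaway file so the example does not depend on myfile.txt existing.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(b"some file contents")

    md5_hex = hash_file(tmp.name, "md5")
    sha256_hex = hash_file(tmp.name, "sha256")
    print(md5_hex, len(md5_hex) * 4, "bits")        # 32 hex characters = 128 bits
    print(sha256_hex, len(sha256_hex) * 4, "bits")  # 64 hex characters = 256 bits
    os.remove(tmp.name)
```

Each hexadecimal character encodes 4 bits, which is why the length of the printed string grows with the output size of the algorithm.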

Applications of Hashing

Hashing has a wide range of important applications.

Password Storage

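User passwords should never be stored as plaintext. Instead, the password is hashed, and only the hash value is stored in the database.

For example, if a user sets their password to password123, the system hashes that string and stores only the result in the database; with SHA-256, that result is 256 bits, usually written as 64 hexadecimal characters. When the user tries to log in, the system hashes the entered password and compares it to the stored hash; if they match, the user is authenticated. In practice, a single pass of a fast hash such as SHA-256 is not enough for passwords, because attackers can test enormous numbers of guesses very quickly. Real systems add a random per-user salt and use a deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2.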
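The store-then-compare flow can be sketched in Python. Note that this sketch deliberately swaps in a salted, slow function (PBKDF2-HMAC-SHA256 from the standard library) in place of a single SHA-256 pass, since fast unsalted hashes are easy to brute-force. The function names and the iteration count are assumptions made for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware and policy


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both would be stored in the user's database row."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    salt, stored = hash_password("password123")
    print(verify_password("password123", salt, stored))  # True
    print(verify_password("letmein", salt, stored))      # False
```

Because the salt is random, two users with the same password end up with different stored values.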

File Integrity Checking

Hashing can be used to verify if a downloaded file is intact and has not been modified or corrupted.

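For instance, when you download a software update, the provider will typically publish the SHA-256 hash of the installer file. You can then calculate the hash of the downloaded file and compare it to the published value; if they match, you can be confident the file was not corrupted in transit. The check only guards against deliberate tampering if the published hash itself comes from a trusted source, because an attacker who can replace the file on a compromised server may be able to replace a hash published alongside it too.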
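The comparison described above can be sketched in Python as follows. The function names are hypothetical, and the published hash is assumed to be a hexadecimal string copied from the provider's site. The file is read in chunks so that large installers do not have to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash to a published value, ignoring case and stray whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A typical use would be calling matches_published_hash with the path of the downloaded installer and the value from the download page, and refusing to run the installer if it returns False.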

Digital Signatures

Hashes play a role in creating secure digital signatures, which are used for non-repudiation and data integrity purposes.

For example, if Bob wants to send Alice a legally binding contract, he can create a digital signature by hashing the contract and transforming the hash with his private key (in textbook RSA, this step is often described as encrypting the hash). Alice can then verify the signature by applying Bob’s public key to it and comparing the result to her own hash of the contract. A match shows that the holder of Bob’s private key signed the contract and that its contents have not been altered since.
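To make the hash-then-sign idea concrete, here is a toy sketch using textbook RSA with Python's built-in integer arithmetic. Everything about the key is an assumption chosen for illustration: the primes are tiny by cryptographic standards, and there is no padding, so this must not be used for real signatures. Production code would use a vetted cryptography library instead.

```python
import hashlib
import math

# Toy key built from two well-known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 65537                      # public exponent
assert math.gcd(E, PHI) == 1   # required for the private exponent to exist
D = pow(E, -1, PHI)            # private exponent (modular inverse, Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # Reduce the SHA-256 digest modulo N so it fits the toy key.
    # Real signature schemes use a padding scheme instead of this shortcut.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: hash the message, then apply the private key to the hash."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier: apply the public key and compare with a freshly computed hash."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    contract = b"Bob agrees to the terms."
    signature = sign(contract)
    print(verify(contract, signature))  # True
```

Only the short hash is signed, not the whole document, which keeps signing fast regardless of the contract's length.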

Data Deduplication

Hashing is used to detect duplicate data during backup processes to avoid storing redundant copies.

Let’s say you have a backup system that periodically backs up your files. Instead of blindly copying all files every time, the backup system can use hashing to identify duplicate files. It calculates the hash value of each file and compares it to the hash values of files already stored in the backup. If a matching hash is found, it means the file already exists in the backup, so the system skips that file, saving storage space and backup time.
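The skip-if-already-stored logic can be sketched in a few lines of Python. This is a simplified, in-memory model: the function name and the dictionary layout are assumptions, and a real backup tool would work with files on disk and usually with smaller blocks rather than whole files.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Store each distinct content once.

    files maps a file name to its content.
    Returns (store, index): store maps digest -> content,
    index maps file name -> digest of its content.
    """
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:      # first time this content is seen
            store[digest] = content
        index[name] = digest         # duplicates just point at the stored copy
    return store, index


if __name__ == "__main__":
    store, index = deduplicate({
        "report.txt": b"quarterly numbers",
        "report_copy.txt": b"quarterly numbers",
        "notes.txt": b"meeting notes",
    })
    print(len(index), "files,", len(store), "stored copies")  # 3 files, 2 stored copies
```

The index is what lets the system restore every file name later even though identical contents are kept only once.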

Cryptocurrency

The process of mining new bitcoins involves solving complex mathematical puzzles by repeatedly hashing data until a specific target hash value is found.

For example, Bitcoin miners compete to find a block hash that is numerically below a network-defined target, which in practice means the hash starts with a certain number of zeros. They hash the new block’s header, which commits to the block’s transactions, using SHA-256 applied twice, and they change a nonce value on each attempt until a valid hash is found. The first miner to find a valid hash gets to add the new block to the blockchain and receives a reward in bitcoins. Because rewriting a block would mean redoing this work, the process helps prevent double-spending and protects the integrity of the blockchain.
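The nonce search can be illustrated with a toy proof-of-work loop in Python. This is a simplification, not Bitcoin's actual procedure: it uses a single SHA-256 pass over arbitrary bytes and counts leading zero hex digits, and the function name and difficulty parameter are assumptions made for the example.

```python
import hashlib


def mine(block_data: bytes, difficulty: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    nonce, digest = mine(b"example block with transactions", difficulty=4)
    print(nonce, digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, while checking a claimed nonce still takes a single hash. That asymmetry between finding and verifying is what makes the scheme useful.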

Hash Functions

A hash function is a mathematical transformation that converts input data of any size into a fixed-size output, known as a hash value. A good cryptographic hash function has four key properties:

  • Collision-resistant: It is computationally infeasible to find two different inputs that produce the same hash value. Collisions must exist, because there are more possible inputs than outputs, but they cannot be found in practice.
  • Irreversible: Given a hash value, it is computationally infeasible to reconstruct the original input.
  • Sensitive to input changes: Even a tiny change to the input results in a completely different hash value.
  • Deterministic: The same input always produces the same hash output.
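Two of these properties, determinism and sensitivity to input changes, are easy to observe directly. The short Python sketch below uses SHA-256 from the standard library; the helper names are made up for illustration.

```python
import hashlib


def sha256_as_int(data: bytes) -> int:
    """Interpret the 256-bit SHA-256 digest as one big integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between the hashes of a and b."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")


if __name__ == "__main__":
    # Deterministic: hashing the same input twice gives the same value.
    print(sha256_as_int(b"hello") == sha256_as_int(b"hello"))  # True

    # Sensitive to input changes: the inputs differ in one character,
    # yet for a well-behaved hash roughly half of the output bits tend to flip.
    print(differing_bits(b"hello", b"hellp"), "of 256 bits differ")
```

Collision resistance and irreversibility cannot be demonstrated by running code in the same way, since they are claims about what no one can feasibly compute.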

Secure Hash Algorithm

The Secure Hash Algorithm, or SHA, is a family of cryptographic hash functions published as standards by the U.S. National Institute of Standards and Technology (NIST). SHA-1 and SHA-2 were designed by the United States National Security Agency, while SHA-3 came out of a public competition. SHA functions are among the most widely used hashing algorithms and are often employed in digital signatures, secure communications protocols, and various security applications. Some common variants include:

  • SHA-1: Produces a 160-bit hash value. It is still found in legacy systems, but it is considered insecure: a practical collision was demonstrated in 2017, and it should not be used in new designs.
  • SHA-2: A family of hash functions that includes SHA-224, SHA-256, SHA-384, and SHA-512. These algorithms produce hash values of varying lengths (224, 256, 384, and 512 bits, respectively) and are considered more secure than SHA-1.
  • SHA-3: A newer family of hash functions based on the Keccak algorithm, which NIST selected in 2012 after a public competition and standardized in 2015. SHA-3 was designed as an alternative to SHA-2, not as a response to a break in it. It uses a different internal structure (the sponge construction), so an attack on SHA-2 would be unlikely to carry over to it.
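The output sizes listed above can be checked with Python's hashlib module, which ships implementations of all three generations. This is a small sketch; the helper name digest_bits is an assumption made for the example.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Return the output size of a hashlib algorithm in bits."""
    return hashlib.new(algorithm).digest_size * 8


if __name__ == "__main__":
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"):
        sample = hashlib.new(name, b"abc").hexdigest()
        # Print the size and the first few hex digits of the hash of b"abc".
        print(f"{name:9} {digest_bits(name):4} bits  {sample[:16]}...")
```

SHA-3 is offered at the same output sizes as SHA-2, so sha256 and sha3_256 both return 256 bits but produce unrelated values for the same input.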

As you can see, hashing is widely used and is a fundamental concept. Thanks for reading! I hope this article was useful for you.