File verification is the process of using an algorithm to verify the integrity of a computer file. This can be done by comparing two files bit by bit, but that requires two copies of the same file and may miss systematic corruption that affects both copies. A more popular approach is to generate a hash of the copied file and compare it to the hash of the original file.
File integrity can be compromised, which is usually referred to as the file becoming corrupted. A file can become corrupted in a variety of ways: faulty storage media, errors in transmission, write errors during copying or moving, software bugs, and so on.
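The bit-by-bit comparison described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `files_identical` and the 64 KiB chunk size are assumptions made for the example. Python's standard library offers a similar check through `filecmp.cmp(a, b, shallow=False)`.

```python
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed chunk size; avoids loading large files whole


def files_identical(path_a, path_b):
    """Return True if both files have the same length and the same bytes."""
    path_a, path_b = Path(path_a), Path(path_b)
    # Files of different sizes can never be identical, so skip reading them.
    if path_a.stat().st_size != path_b.stat().st_size:
        return False
    with path_a.open("rb") as file_a, path_b.open("rb") as file_b:
        while True:
            chunk_a = file_a.read(CHUNK_SIZE)
            chunk_b = file_b.read(CHUNK_SIZE)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached the end together
                return True
```

Note that this check needs both copies at hand, and it reports a match if both copies were damaged in the same way.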
Hash-based verification checks that a file has not been corrupted by comparing the file’s hash value to a previously calculated value. If these values match, the file is presumed to be unmodified. Because hash functions map many possible inputs to each output, a hash collision may result in a false positive (a corrupted file whose hash still matches), but the likelihood of a collision is negligible for random corruption.
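As a rough sketch of hash-based verification, the Python snippet below hashes a file in chunks and compares the result with a previously recorded digest. The helper names `file_digest` and `matches_recorded_digest`, the SHA-256 default, and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=64 * 1024):
    """Hash a file in chunks and return the lowercase hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_recorded_digest(path, recorded_hex, algorithm="sha256"):
    """Compare the file's current digest with a previously recorded one."""
    # Hex digests are case-insensitive; tools differ in which case they print.
    return file_digest(path, algorithm) == recorded_hex.strip().lower()
```

A typical use is to call `file_digest` once when the file is known to be good, store the result, and call `matches_recorded_digest` after each copy or transfer.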
A checksum file is a small file that contains the checksums of other files.
The “.sha1” file extension indicates a checksum file containing 160-bit SHA-1 hashes in sha1sum format. The “.md5” file extension, or a file named “MD5SUMS”, indicates a checksum file containing 128-bit MD5 hashes in md5sum format. The “.sfv” file extension indicates a checksum file containing 32-bit CRC32 checksums in simple file verification format. The “crc.list” file indicates a checksum file containing 32-bit CRC checksums in brik format.
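To illustrate how a checksum file might be consumed, here is a minimal Python sketch for the md5sum/sha1sum style of file, assuming each line holds a hex digest, a space, a one-character mode flag (a space or `*`), and a file name. The function names are hypothetical, escaped file names are not handled, and listed files are assumed to sit next to the checksum file.

```python
import hashlib
from pathlib import Path


def parse_checksum_text(text):
    """Yield (hex_digest, file_name) pairs from md5sum/sha1sum-style lines."""
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        digest, _, name = line.partition(" ")
        # The character after the first space is the mode flag: " " or "*".
        if name[:1] in (" ", "*"):
            name = name[1:]
        yield digest.lower(), name


def verify_checksum_file(checksum_path, algorithm):
    """Return {file_name: True or False} for every entry in the checksum file."""
    checksum_path = Path(checksum_path)
    results = {}
    for expected, name in parse_checksum_text(checksum_path.read_text()):
        # Assumption: file names are relative to the checksum file's folder.
        target = checksum_path.parent / name
        # Reads each file whole, which is fine for a sketch but not for huge files.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        results[name] = actual == expected
    return results
```

For a `.sha1` file the caller would pass `"sha1"` as the algorithm, and for a `.md5` or `MD5SUMS` file `"md5"`. A missing file raises `FileNotFoundError` in this sketch rather than being reported as a failure.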
As of 2012, best-practice recommendations are to use SHA-2 or SHA-3 to generate new file integrity digests, and to accept MD5 and SHA-1 digests for backward compatibility if stronger digests are not available. The theoretically weaker SHA-1, the weaker MD5, or the much weaker CRC were previously commonly used for file integrity checks.
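One way to apply that recommendation in code is to prefer a SHA-2 or SHA-3 digest whenever one is published and to fall back to SHA-1 or MD5 only otherwise. The Python sketch below is illustrative: the function names and the exact ordering within the preferred group are assumptions, not part of any standard.

```python
import hashlib

# Assumed preference order for this sketch: SHA-2 and SHA-3 first, legacy last.
PREFERRED = ("sha256", "sha512", "sha3_256", "sha3_512")
LEGACY = ("sha1", "md5")


def choose_algorithm(published):
    """Pick which published digest to verify against.

    `published` maps hashlib algorithm names to hex digests.
    """
    for name in PREFERRED + LEGACY:
        if name in published:
            return name
    raise ValueError("no supported digest was published")


def verify_bytes(data, published):
    """Return (algorithm_used, matched) for in-memory data."""
    name = choose_algorithm(published)
    actual = hashlib.new(name, data).hexdigest()
    return name, actual == published[name].lower()
```

If a sender publishes both an MD5 and a SHA-256 digest, this sketch checks only the SHA-256 value. MD5 is consulted only when nothing stronger is listed.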
How to check a file hash to validate integrity
It is good practice to check a file's hash to verify its integrity after receiving a file from any sender or network.
Two PowerShell cmdlets support file integrity checks. PowerShell Core is available on both Windows and Linux.
The Get-FileIntegrity cmdlet gets integrity information for a file on a Resilient File System (ReFS) volume. An example is shown below:
Get-Item -Path 'H:\Temp\*' | Get-FileIntegrity
The Get-FileHash cmdlet computes the hash value for a file by using a specified hash algorithm. A hash value is a fixed-size value derived from the content of the file that is, for practical purposes, unique to that content. Rather than identifying the contents of a file by its file name, extension, or other designation, a hash assigns a value based on the contents of the file alone. File names and extensions can be changed without altering the content of the file, and without changing the hash value. Similarly, the file’s content can be changed without changing the name or extension. However, changing even a single character in the contents of a file changes the hash value of the file.
The purpose of hash values is to provide a cryptographically secure way to verify that the contents of a file have not been changed. While some hash algorithms, including MD5 and SHA1, are no longer considered secure against attack, the goal of a secure hash algorithm is to make it computationally infeasible to change the contents of a file, whether by accident or by malicious or unauthorized attempt, and maintain the same hash value. You can also use hash values to determine whether two different files have exactly the same content. If the hash values of two files are identical, the contents of the files are almost certainly identical as well, because collisions under a secure algorithm are extremely unlikely.
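The relationship between names, contents, and hash values can be checked with a small experiment. The Python sketch below is illustrative only: the file names, the sample text, and the helper names `sha256_of` and `demo` are made up. It hashes a file, renames it, then changes a single character.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a small file read in one go."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def demo():
    """Return digests taken before a rename, after it, and after an edit."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.txt"
        original.write_bytes(b"quarterly totals: 1024\n")
        before = sha256_of(original)

        # Changing the name and extension leaves the content untouched.
        renamed = Path(tmp) / "renamed.dat"
        original.rename(renamed)
        after_rename = sha256_of(renamed)

        # Changing one character of content (4 becomes 5) changes the digest.
        renamed.write_bytes(b"quarterly totals: 1025\n")
        after_edit = sha256_of(renamed)
    return before, after_rename, after_edit
```

In the returned tuple, the first two digests are equal because only the name changed, while the third differs because the content changed.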
By default, the Get-FileHash cmdlet uses the SHA256 algorithm, although any hash algorithm that is supported by the target operating system can be used. An example is shown below:
Get-FileHash C:\Users\user1\Downloads\Contoso8_1_ENT.iso -Algorithm SHA384 | Format-List