-------------------------------------------------------------------------------
Hashing: to generate checksum or MAC
-------------------------------------------------------------------------------
Hashing, what use is it...

Hashing is a one-way function (often loosely called 'one way encryption')
that is used for many tasks...

  * Checksumming or MAC (Message Authentication Codes)
    Which is what the rest of this document is about.

  * Password Storage (one way authentication)
    See "passwd_hashing.txt"

  * Cryptographic Key Derivation (KDF or key stretching)
    See "key_derivation.txt"

-------------------------------------------------------------------------------
Keyed (cryptographic) Hash Function OR MAC (Message Authentication Code)

The generation of a smaller string that represents a large data block, such
as a message or file.  If that data is modified or added to in any way, the
hash of the data will be different.

The hash or MAC can then be regarded as a 'fingerprint', or verification that
the data has not been changed.

Cryptographic hashes are designed to make it very hard to craft input data
that generates a specific hash.  They should also prevent you from changing
the data in a specific way, then adding extra data to the end so as to
generate the same original hash.

-------------------------------------------------------------------------------
Hashed Database Keys...

That is, the checksum of the search string is used to 'locate' or 'lookup'
the desired data in a large database.

This is actually what most 'identification numbers' provide: a fast direct
lookup of the data, so as to avoid a slower search of data indexes.

However it should be noted that hashing is not compression.  You cannot
store 5000 bytes of data in a 64 byte 'hash'.

Because of this, hashed checksums cannot guarantee that two completely
separate data strings do not generate the same hash.  IE: Hashed Data
Clashes.  Hashing does however make the chance of such a 'clash' happening
quite small.
Adding a second 'hash', or other form of identification, makes a data clash
a practical impossibility.

Examples...

  * When a user types a password, it is hashed using the same method (salt,
    iteration counts) and then compared to the stored hash, to see if the
    passwords actually match.  That way the original password is not
    actually stored, or even known, by the system the user is attempting to
    connect to.
    See "passwd_hashing.txt"

  * Determining if one file is a copy of another file.  For example as a
    quick and dirty means of comparing lots of files.  Hashing can be used
    to identify possible matching files, but the file data should then be
    compared byte-by-byte before declaring that any two files really do
    match.
    See "key_derivation.txt"

  * Freenet uses hashes to identify files in its network (cloud)
    distributed filesystem.  All files are referenced via URL-like
    references made up of two separate hashes, which are used to index,
    locate, and download the file for other users.  That means of course
    that without a link, a file in that network becomes 'lost'.

-------------------------------------------------------------------------------
Portable MD5 checksum

The MD5 checksum is available on almost all machines, but not always in the
same command, or with the same output style.

It is however no longer deemed 'safe', but that is relative.

    echo hi | md5
    764efa883dda1e11db47671c4a3bbd9e -

    echo hi | md5sum
    764efa883dda1e11db47671c4a3bbd9e  -

    echo hi | openssl dgst -md5
    (stdin)= 764efa883dda1e11db47671c4a3bbd9e
    # later version output...
    MD5(stdin)= 764efa883dda1e11db47671c4a3bbd9e

To automatically use any available command...

    # The data to checksum is read from standard input.
    # Errors from commands that are not installed are discarded, and
    # sed strips the "...= " prefix or " -" suffix from the output.
    checksum_file=$( ( openssl dgst -md5 || md5sum || md5 ) 2>/dev/null |
                     sed 's/^.*= *//; s/ *-$//' )

-------------------------------------------------------------------------------
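Hash first, then compare

The file-copy example above says a matching hash only suggests that two
files match, and that the data should then be compared byte-by-byte.  The
following is a minimal sketch of that idea, not a definitive implementation.
It assumes "md5sum" and "cmp" are installed, and the function name
"same_file" is hypothetical (it is not part of these notes or of any tool).

```shell
# same_file A B
# Returns 0 only if both files hold identical bytes.
# Hypothetical helper for illustration; assumes md5sum and cmp exist.
same_file() {
  # Keep only the hash, dropping the trailing "  -" that md5sum prints.
  sum_a=$(md5sum < "$1" | sed 's/ .*//')
  sum_b=$(md5sum < "$2" | sed 's/ .*//')

  # Different hashes prove the files differ, so stop here.
  [ "$sum_a" = "$sum_b" ] || return 1

  # Equal hashes only suggest a match; confirm it byte-by-byte.
  cmp -s "$1" "$2"
}
```

When comparing lots of files the hashes would normally be computed once and
stored, so that only files with equal hashes ever need the slower "cmp".

-------------------------------------------------------------------------------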