-------------------------------------------------------------------------------
Hashing: to generate checksum or MAC
-------------------------------------------------------------------------------
Hashing, what use is it...

Hashing is a one-way function (the input cannot be recovered from the output,
so it is not really encryption) that is used for many tasks...

  * Checksumming or MAC (Message Authentication Codes)
    Which is what the rest of this document is about.

  * Password Storage (one way authentication)
    See "passwd_hashing.txt"

  * Cryptographic Key Derivation (KDF or key stretching)
    See "key_derivation.txt"

-------------------------------------------------------------------------------
Keyed (cryptographic) Hash Function
OR  MAC (Message Authentication Code)

The generation of a smaller string that represents a large data block, such as
a message or file.  If that data is modified or added to in any way, the hash
of the data will (almost certainly) be different.  The hash or MAC can then be
regarded as a 'fingerprint', or verification that the data has not been
changed.  A MAC also mixes a secret key into the hash, so that only someone
holding the key can generate or check it.

Cryptographic hashes are designed to make it very hard to generate a specific
hash from 'designed' input data.  They should also prevent you from changing
the data in a specific way, then adding extra data to the end so as to
generate the same hash as the original.
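
As a rough sketch of the keyed form, "openssl dgst" can generate a HMAC from
a message and a shared secret key.  The helper name "hmac_sha256", the key,
and the messages are made up for this example, and it assumes "openssl" is
installed.

```shell
# Sketch only: 'hmac_sha256' is a hypothetical helper, not a standard command.
# Usage: hmac_sha256 KEY MESSAGE    (prints the MAC as a hex string)
hmac_sha256() {
  printf '%s' "$2" |
    openssl dgst -sha256 -hmac "$1" |
    sed 's/^.*= *//'              # strip the "...(stdin)= " prefix
}

key='shared secret'               # example key, known to both ends
mac=$(hmac_sha256 "$key" 'pay bob $10')

# The receiver recomputes the MAC with the same key and compares.
[ "$(hmac_sha256 "$key" 'pay bob $10')" = "$mac" ] && echo "message intact"
[ "$(hmac_sha256 "$key" 'pay bob $99')" = "$mac" ] || echo "message altered"
```

Note that a key given as a command line argument may be visible to other
users in a process listing, so this is for illustration only.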

-------------------------------------------------------------------------------
Hashed Database Keys...

That is, the checksum of the search string is used to 'locate' or 'look up'
the desired data in a large database.

This is actually what most 'identification numbers' provide, a fast direct
lookup of the data, so as to avoid a slower search of data indexes.

However it should be noted that hashing is not compression.  You cannot
store 5000 bytes of data in a 64 byte 'hash'.  Because of this, hashed
checksums cannot guarantee that two completely separate data strings do not
generate the same hash.  That is, a hashed data clash (a 'collision').

It does however make the chance of such a 'clash' happening quite small.
Adding a second 'hash', or other form of identification, will make a data
clash a practical impossibility.
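
The lookup idea can be sketched using a directory as the 'database', with
each record stored in a file named by the hash of its search string.  The
names "hash_key", "db_put", "db_get" and "DB_DIR" are made up for this
example, and it assumes "openssl" and "mktemp" are available.

```shell
# Sketch only: all function and variable names here are hypothetical.
DB_DIR=$(mktemp -d)

# Hash the search string into a fixed length hex key.
hash_key() {
  printf '%s' "$1" | openssl dgst -sha256 | sed 's/^.*= *//'
}

# Store a value in a file named after the hash of its search string.
db_put() { printf '%s\n' "$2" > "$DB_DIR/$(hash_key "$1")"; }

# Lookup goes directly to that file, no index or scan is needed.
db_get() { cat "$DB_DIR/$(hash_key "$1")" 2>/dev/null; }

db_put 'alice' 'uid=1001'
db_put 'a much longer search string, spaces and all' 'uid=1002'
db_get 'alice'        # prints: uid=1001
```

Two search strings that hashed to the same key would overwrite each other
here, which is the 'clash' described above.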

Examples...

  * When a user types a password, it is hashed using the same method (salt,
    iteration counts) and then compared to the stored hash, to see if the
    passwords actually match.  That way the original password is not actually
    stored, or even known by the system the user is attempting to connect to.
    See "passwd_hashing.txt"

  * Determining if one file is a copy of another file.  For example as
    a quick and dirty means of comparing lots of files.  Hashing can be used
    to identify possible matching files, but the file data should then be
    compared byte-by-byte before declaring that any two files really do match.
    See "key_derivation.txt"

  * Freenet uses hashes to identify files in its network (cloud) distributed
    filesystem.  All files are referenced via URL-like references made up of
    two separate hashes, which other users need to index, locate, and
    download that file.  That means, of course, that without a link a file in
    that network becomes 'lost'.
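
The file comparison example above could be sketched as follows, using the
hash as a cheap filter and "cmp" for the final byte-by-byte check.  The
helper names "file_hash" and "same_file" and the test files are made up for
this example, which assumes "openssl" is installed.

```shell
# Sketch only: 'file_hash' and 'same_file' are hypothetical helper names.
file_hash() {
  openssl dgst -sha256 < "$1" | sed 's/^.*= *//'
}

# Succeeds only if both files hash the same AND match byte-by-byte.
same_file() {
  [ "$(file_hash "$1")" = "$(file_hash "$2")" ] || return 1  # cheap filter
  cmp -s "$1" "$2"                                           # final proof
}

tmp=$(mktemp -d)
printf 'same data\n'  > "$tmp/a"
printf 'same data\n'  > "$tmp/b"
printf 'other data\n' > "$tmp/c"

same_file "$tmp/a" "$tmp/b" && echo "a and b match"
same_file "$tmp/a" "$tmp/c" || echo "a and c differ"
```

For just two files "cmp" alone is quicker.  The hash only pays off when
each of many files is hashed once, and the hashes are then sorted or
indexed to find the candidate matches.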

-------------------------------------------------------------------------------
Portable MD5 checksum

The MD5 checksum is available on almost all machines,
but not always in the same command, or with the same output style.
It is however no longer deemed 'safe', but that is relative.

  # NOTE: the checksum shown includes the newline that "echo" adds

  echo hi | md5
  764efa883dda1e11db47671c4a3bbd9e

  echo hi | md5sum
  764efa883dda1e11db47671c4a3bbd9e  -

  echo hi | openssl dgst -md5
  (stdin)= 764efa883dda1e11db47671c4a3bbd9e
  # later version output...
  MD5(stdin)= 764efa883dda1e11db47671c4a3bbd9e

To automatically use any available command...

  checksum_file=$(
     ( openssl dgst -md5 || md5sum || md5 ) <file.txt 2>/dev/null |
         sed 's/^.*= *//; s/ *-$//' )
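
A possible variation is to wrap the command selection in a function, and
pick the tool with "command -v" before running it.  That way a tool that
starts but then fails cannot consume the input before the next one is
tried.  The function name "md5_of" is made up for this sketch.

```shell
# Sketch only: 'md5_of' is a hypothetical wrapper, not a standard command.
# Usage: md5_of FILE    (prints just the hex checksum)
md5_of() {
  if   command -v openssl >/dev/null 2>&1; then openssl dgst -md5 < "$1"
  elif command -v md5sum  >/dev/null 2>&1; then md5sum < "$1"
  elif command -v md5     >/dev/null 2>&1; then md5 < "$1"
  else echo "md5_of: no MD5 command found" >&2
  fi | sed 's/^.*= *//; s/ *-$//'   # same clean up as the one-liner above
}

tmp=$(mktemp)
: > "$tmp"                          # an empty file
md5_of "$tmp"     # d41d8cd98f00b204e9800998ecf8427e  (MD5 of no data)

# Verify a file against a previously recorded checksum.
recorded=d41d8cd98f00b204e9800998ecf8427e
[ "$(md5_of "$tmp")" = "$recorded" ] && echo "checksum OK"
```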

-------------------------------------------------------------------------------