-------------------------------------------------------------------------------
Raw Binary to 'printable' ASCII encoding.

-------------------------------------------------------------------------------
Base64 Encoding...

Encodes a binary string into a printable form using a set of 64 characters,
such that 3 binary characters become 4 printable characters.

EG: each block of 3 binary characters (3 x 8 = 24 bits) is converted into 4
blocks of 6 bits each, and each 6 bit value is then mapped onto a character
set of 64 characters.

See "base_conversion.txt" for programs to do this.

Which characters are used depends on exactly which type of Base64 encoding is
being used, and what the encoded data is used for.  For example, data used in
URLs or filenames may require different characters than data used for some
other purpose.

Padding Character...

  Base64 may require 1 or 2 extra padding characters on the end, to bring the
  final encoded block up to 4 characters.  The number of padding characters
  then specifies the actual length of the final binary block: no padding for
  3, one for 2, and two for 1 binary character.

  The padding character cannot be one of the 64 characters used for the
  encoding itself, so in reality 65 characters are needed.

  Alternatively the length of either the binary text, or the encoded text,
  can be used to determine the amount of padding that is needed.  Assuming of
  course that other non-base64 characters are not passed through as-is.

  Space or just the end of string can be used for padding.  But both should
  be permitted in that case.

Special Characters (2)

  Most differences between the various base64 implementations are in the
  choice of the 2 extra characters (beyond the alpha-numerics), and the extra
  padding character used at the end.

  MIME base64 encoding uses the characters '+' and '/', and '=' for padding.
  But these can cause problems for filenames and URLs.

  Similarly you may not want to use things like

    ' '   space or 'TAB' is generally a bad idea all around!
    '/'   directory separator in UNIX paths and URLs
    '+'   space encoding in URLs
    '='   used for assignments
    '.'   suffix separator, or UNIX hidden files
    '\'   character escapes
    ':'   hostname or device delimiter
    '@'   used to denote email and twitter names

  There is also the long list of shell meta-characters   *?|{}[]()~'`"#^
  that may need special handling by shells, if the encoding is used for
  filenames.

  That is, the choice of these two characters can be very difficult, depending
  on how the encoded binary string is to be stored, transmitted, or
  manipulated.  (see Variations below)

Character Order

  The order in which the characters are assigned to the values 0 to 63 also
  varies.  The most common base64 encoding (MIME) uses uppercase, lowercase,
  then numbers, with the two special characters appended.  That is NOT the
  ASCII ordering.  Other encodings (such as crypt and EncFS, below) do follow
  ASCII order: the special characters and numbers first, then uppercase, then
  lowercase.

  Special characters may also appear before or after the encoded data, as
  part of the file's overall formatting (as was the case in "uuencoding").

Case Sensitivity

  If you can NOT preserve the case of the letters, then you may need to use
  Base32 encoding.  This uses only one case of letters, and converts 5 binary
  characters to 8 printable characters.  Of course that also means a larger
  number of padding characters may be needed.  See below.

-------------------------------------------------------------------------------
Variations of Base64 encoding...

MIME or PEM (Privacy Enhanced Mail)

    A-Z a-z 0-9 + /      (and = used as a pad)

  This is probably the most common form of Base64 encoding today, and
  typically what is meant when you say Base64 encoding.
  It is what is used for encoding SSL certificates.

  However the special characters '/' and '+' make it difficult to use in
  URLs or as filenames.

OpenPGP or Radix-64

  As MIME but with the optional addition of a 24 bit CRC checksum, calculated
  on the input data before encoding, and appended after an additional '='
  separator.
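
  A rough Python sketch of that checksum, for illustration only.  The two
  constants are the ones given for CRC-24 in the OpenPGP specification
  (RFC 4880).  The helper name radix64_body() is made up for this example,
  and the line wrapping and BEGIN/END armor lines of a real OpenPGP file are
  left out.

```python
import base64

CRC24_INIT = 0xB704CE    # initial value (RFC 4880)
CRC24_POLY = 0x1864CFB   # generator polynomial (RFC 4880)


def crc24(data: bytes) -> int:
    """Return the OpenPGP CRC-24 of data, computed one bit at a time."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def radix64_body(data: bytes) -> str:
    """Hypothetical helper: base64 text, then the '=' prefixed checksum."""
    body = base64.b64encode(data).decode("ascii")
    # The 3 checksum bytes are themselves base64 encoded (4 characters).
    check = base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")
    return body + "\n=" + check


print(radix64_body(b"Cat"))
```

  The checksum is taken over the binary input, not over the encoded text, so
  it has to be calculated before (or alongside) the base64 conversion.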

Modified Base64 for use in URLs and Filenames

  This replaces '+' and '/' with the characters '-' and '_' respectively.
  This makes it work well in almost all situations.

  Padding is generally not needed, as the end of the encoded string is marked
  by a space or the end of string.

    A-Z a-z 0-9 - _

Modified Base64 for XML

    A-Z a-z 0-9 _ :

EncFS filename encoding

  A base64 using a modified character set, to store binary encrypted
  filenames.  Note that it uses the characters ',' and '-' for the 2 special
  characters needed, and these are prepended.  The numbers also come before
  the letters, rather than after them as in MIME.

    , - 0-9 A-Z a-z

  Filenames have a 2 byte (MAC-16) checksum added before being encrypted, and
  the result is then base64 encoded using the above characters.

  Source file from EncFS (binary to base64 indexes, then to ascii)
    https://github.com/vgough/encfs/blob/master/encfs/base64.cpp
  Detailed in comments around line 115

Password Crypt base64  (UNIX Shadow, and LDAP)

  LDAP and UNIX shadow files use their own form of base64, which was
  developed for the original DES encoded hash of passwords.  It is not
  filename safe, and could not use the '#' and ':' characters, as these are
  field separators in the file format.

    . / 0-9 A-Z a-z

    http://lists.fedoraproject.org/pipermail/389-users/2009-January/008805.html

  It was developed for encoding the oldest DES password hashing (crypt).
  The newer styles of hashing continued this legacy.

UUencoding

  Originally written around 1980 for sending binary data in mail and USENET
  news articles.  It was for a long time the most well known base64 encoding,
  and uses a completely different arrangement of characters to later forms of
  base64.

  The character array is a direct sub-set of ASCII, which made it very simple
  and fast to convert, as you only need to add character code 32 (space) to
  the 6 bit numbers.  It also means that it is case insensitive, as no
  lowercase characters are used, but it contains almost every
  non-alpha-numeric character in the encoding.
  This made it problematic for other uses.

  The character sequence was then broken up into lines of 45 binary (60
  encoded) characters, each prefixed with an encoded count of the number of
  binary characters on that line (typically 'M' in long files, meaning 45).

  The whole encoding is wrapped by 'begin' and 'end' lines, with a line
  consisting only of a single '`' (zero characters on the line) added just
  before the 'end' line.  The 'begin' line gives the filename and UNIX mode
  of the original file.

  Example...
    begin 644 cat.txt
    #0V%T
    `
    end

  Note uuencode used spaces and other punctuation characters, which caused a
  lot of problems when using the encoding, especially at end-of-lines, even
  in the emails for which it was intended.  (See "uuencode.txt")

  Commonly, to avoid such problems, space was mapped to a backquote '`'.

  Perl uuencoding
    perl -e 'print pack("u","Cat")'
    #0V%T

XXencoding

  Is basically normal base64, but with added wrapping lines like uuencode.
  MIME Base64 was developed from this format (without the wrapping lines).
  It is not filename or URL safe.

    A-Z a-z 0-9 + /

  Example
    begin-base64 644 cat.txt
    Q2F0
    ====

  (Strictly this is the format generated by "uuencode -m".  The original
  'xxencode' program used the character set   + - 0-9 A-Z a-z )

-------------------------------------------------------------------------------
Other non-base64 binary encodings....

BinHex 4.0

  Originally for the old Tandy Electronics TRS-80, but later used by Apple
  Macs.

  It also encodes 6 bits per character (like base64), but with many
  characters replaced so as to avoid confusion on printouts
  (eg: characters '7' 'O' 'g' 'o' etc. were removed)

    !"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr

  A return should be inserted every 64 characters.

  A header in parentheses is used to identify binhex files, and consists of
    (This file must be converted with BinHex 4.0)

Base32

  Case insensitive encoding.
  5 binary characters as 8 printable characters

    A-Z 2-7    and a pad of =   (needing 6, 4, 3 or 1 such padding characters)

  Not many programs use base32

    basenc --base32
    basenc -d --base32
    base32
    base32 -d

Base16

  Case insensitive encoding for binary data.
  1 binary to 2 printable characters.  Basically doubling the size of the
  data.  Essentially it is just 'hex' encoding (using capitals)

    0-9 A-F

  Actually many English words could also be used as a Base16 number.
  You can find such words using a dictionary search...

    egrep '^[a-f]{4,8}$' /usr/share/dict/words

  Example words...
    deaf decaf feed beef decade faded defaced facade deadbeef

  You generally want a string of 8 characters to form a 32 bit word.

  See "base_conversion.txt" for examples and conversions.

Quoted-Printable

  Not to be confused with 't-shirt' quotes!

  This is a format for encoding plain text that may have binary characters
  embedded in it.  It is generally used as a method of MIME encoding for mail.

  This format replaces unprintable characters with hexadecimal codes of the
  form "=XX".  Which of course means equal signs "=" must be encoded as "=3D".
  Not all characters need to be encoded, but can be.

  This for example is used in some news/mail messages and is part of the VMSG
  (SMS messages backup format) files, for the actual messages.

  Example decoding of 'Subject' lines in a VMSG format file...

    perl -pe 'if ( /^Subject;/ ) {
                s/:/\n/; s/=([0-9A-F][0-9A-F])/chr hex $1/eg;
              }' sms.vmsg

  Python Code...
    quopri - Encode and decode MIME quoted-printable data
    https://docs.python.org/3/library/quopri.html

ModHex

  A variation on Base16, designed for use with standardized USB keyboard
  emulation.

  See "yubikey" in "base_conversion.txt"
  as well as the file "info/crypto/yubikey.txt"

Base85

  Another binary to text encoder (for btoa utility), using a much larger
  range of characters for encoding binary.

    4 binary -> 5 printable characters

  Each 4 bytes is taken as a 32 bit number (first byte most significant), and
  repeatedly divided by 85 to produce 5 'remainders', which are then encoded.

    Character set '!' to 'u'

  However as blocks of zero data are common, any all-zero group is replaced
  by a single character 'z', to represent 4 binary zeros.
  That can still be a lot of 'z' in the output.

  It was commonly used for file transfer using 7-bit modems, in protocols
  such as ZModem.

  The biggest problem it has is the same as uuencode.  Certain characters
  have other special meanings, such as '\' escapes, and quotes.

ASCII85

  Adobe (postscript) use of Base85, whitespace ignored, with a final
  delimiter of "~>" at the end of the encoding.  It was generally used to
  encode images in postscript files for printing.

Z85

  Is an alternative encoding that removes some of the special characters.
  Specifically for XML and storing encoded strings in code.

-------------------------------------------------------------------------------
Programs available to handle.

Also see "base_conversion.txt"

Base64 Encoding...

    base64                    # Part of Linux core utilities
    openssl enc -base64
    mimencode < file
    gmime-uuencode --base64   # Gmime package
    perl -ne 'use MIME::Base64; print encode_base64($_);' file
    b64encode                 # personal perl script

    # These also add wrapping lines
    uuencode -m -             # Base64, XXencoding (from sharutils package)
    uuencode file             # uuencode a file
    mmencode -                # uuencoding (from metamail or sharutils)

Base64 Decoding

    base64 -d                 # Part of Linux core utilities
    openssl enc -base64 -d    # no wrapper - multiple input/output methods
    gmime-uudecode --base64
    perl -ne 'use MIME::Base64; print decode_base64($_);' mime_file
    b64decode                 # personal perl script
    uudecode                  # decode either uuencode or XXencoded file

For Example remove and re-add the xxencode wrapper from base64

    > echo passwd | uuencode -m -
    begin-base64 600 -
    cGFzc3dkCg==
    ====

    > echo passwd | uuencode -m - | sed '1d;/^====$/d;'
    cGFzc3dkCg==

    > ( echo 'begin-base64 600 -'; echo cGFzc3dkCg==; echo '====';
      ) | uudecode
    passwd

    > echo passwd | base64
    cGFzc3dkCg==

    > echo cGFzc3dkCg== | base64 -d
    passwd

-------------------------------------------------------------------------------
Perl and Base64

Perl provides the old UUencoding form of base64 encoding in the core pack()
command.  It provides a low level method of base64 encoding.

Encoding.. to base64 without adding the padding characters.

    perl -e 'while(read(STDIN,$_,45)) {
               $_=substr(pack("u",$_),1);  tr# -_`#A-Za-z0-9+/A#;
               print $_; }' < file

The padding of a short final block comes out as 'A' characters on the end,
which then need to be converted to '=' characters instead.

Decoding is harder, as UUencode requires the input to be split into lines of
60 encoded characters, with a binary length character added to each.

EG: a perl4 decoder (ignores non-base64 characters)...

    perl -ne '
       tr#A-Za-z0-9+/\.\_##cd; tr#A-Za-z0-9+/# -_=#;
       print unpack("u", pack("c", 32+int(length($1)*6/8)). $1)
          while( s/(.{60}|.+)// );
    ' mime_file

The perl package "MIME::Base64" provides better and easier methods for
conversion between base64 and binary.

For example...

    #!/usr/bin/perl
    use MIME::Base64;

    my $passwd  = 'cGFzc3dkCg==';            # base64 of "passwd\n"
    my $decoded = decode_base64($passwd);
    print "$decoded\n";
    # Outputs: passwd  (the decoded string already ends in a newline,
    #                   so a blank line follows it)

-------------------------------------------------------------------------------
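Python sketch of the 3 to 4 character conversion

The basic conversion described at the top of this file (3 binary characters
into 4 printable characters, with padding) can be sketched in Python roughly
as below.  This is only an illustration, not how any of the programs above
are implemented.  The names encode64() and ALPHABETS are made up for this
example, and only the MIME and URL-safe character sets are included.

```python
import base64

DIGITS = "0123456789"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Character sets as listed under "Variations of Base64 encoding" above.
ALPHABETS = {
    "mime": UPPER + LOWER + DIGITS + "+/",
    "url":  UPPER + LOWER + DIGITS + "-_",
}


def encode64(data: bytes, variant: str = "mime", pad: str = "=") -> str:
    """Encode data 3 bytes at a time into 4 characters of the chosen set."""
    chars = ALPHABETS[variant]
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        # Left-align the block in 24 bits, zero filling a short final block.
        bits = int.from_bytes(block.ljust(3, b"\0"), "big")
        # Split into four 6 bit values, most significant first.
        quad = [chars[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A block of n bytes needs only n+1 characters, the rest is padding.
        keep = len(block) + 1
        out.append("".join(quad[:keep]) + pad * (4 - keep))
    return "".join(out)


# Cross-check against the standard library encoder.
assert encode64(b"passwd\n") == base64.b64encode(b"passwd\n").decode("ascii")

print(encode64(b"Cat"))                    # Q2F0
print(encode64(b"passwd\n"))               # cGFzc3dkCg==
print(encode64(b"passwd\n", "url", ""))    # cGFzc3dkCg
```

Passing an empty string as the pad gives the unpadded form, as used by the
URL and filename variant.  For real use the standard "base64" module is the
better choice.

-------------------------------------------------------------------------------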