-------------------------------------------------------------------------------
TAR history

The tar command (tape-archive) was originally designed to archive a number of
files and complete directories to magnetic tape.  However, due to its ability
to archive directories, and its availability on EVERY unix (and almost all
non-unix) platform, it has become the standard archive format for files on
unix systems.

The tar program is also available in the GNU source archives, and is commonly
known as "gtar" or gnu-tar.  This program can be compiled for the DOS command
line, which means that it is available for PCs too.  Gnu-tar is basically the
same as normal UNIX tar but has a lot more features (typical of the GNU
variants).  For example, gzip compression is built into the archiving program.

Due to its origins, the default file to which archiving occurs is the tape
drive (typically "/dev/rmt8").  To use this command you need to substitute
your own filename, or `-' to represent stdin/stdout, otherwise it will attempt
to use the tape device, which is probably not available for your use.

-------------------------------------------------------------------------------
Normal tar files (no compression)

  archive a sub-directory "subdir" to a file "subdir.tar"
    first change directory (cd) to just above the sub-directory, then execute
      tar cvf subdir.tar subdir

  de-archive the above tar file
      tar xvf filename.tar

  look at the files in a tar file
      tar tvf filename.tar

-------------------------------------------------------------------------------
Compressed Tar Files

Tar files DO NOT perform any compression on the data, unlike the more modern
archivers now available.  As such, tar archives are normally compressed to
form compressed tar files ".tar.Z".

  compress an existing tar file
      compress directory.tar

  create a compressed-tar of a sub-directory
    cd to just above the sub-directory, then
      tar cvf - subdir | compress > subdir.tar.Z

  de-archive a compressed-tar
      zcat filename.tar.Z | tar xvf -

  list the contents of a compressed-tar file
      zcat filename.tar.Z | tar tvf -

-------------------------------------------------------------------------------
Gzip'ed Tar Files

Today a newer compression method is available, called GZIP, which is freely
available on ALL machines, including PCs.  It uses Lempel-Ziv (LZ77)
compression, which compresses much better than the normal unix compress
program.  Also, the gzip de-compression code will de-compress BOTH compressed
and gzip'ed files.

The "gzip" source is available in the GNU FTP archive on most major software
sources around the world, and pre-compiled PC versions should also be
available in PC archives.

Gzip'ed tar files use the file extension ".tar.gz" or ".tgz".  At one point
".tar.z" (with a little 'z') was also used as the suffix, but this was found
to be too easily confused with a normal compressed (".Z") tar archive.

  gzip an existing tar file
      gzip directory.tar

  create a gzip'ed tar file of a sub-directory
    cd to just above the sub-directory, then
      tar cvf - subdir | gzip > subdir.tar.gz

  de-archive a compressed or gzip'ed tar archive
      gzip -dc filename.tar.gz | tar xvf -

  list the contents of a compressed or gzip'ed tar archive
      gzip -dc filename.tar.gz | tar tvf -

Gzip compression of a tar file can be improved by archiving all the files of
the same type together.  EG: order the files archived so all the GIF files are
together, then all the text files, the HTML files, the JPEGs, etc.

  Example
      tar cvf - `ls | sort -t. +1` | gzip > ../file.tar.gz

NOTE: this archives the files in the current directory.  That is NOT
recommended; you should always archive a sub-directory instead.
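If you want the suffix ordering while still archiving a proper sub-directory,
a rough sketch (assuming a flat "subdir" with no spaces in the file names, and
a sort(1) that accepts the newer "-k" field option) would be...

      # list "subdir/" entries ordered by file suffix, then archive that list
      tar cvf - `ls subdir/* | sort -t. -k2` | gzip > subdir.tar.gz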
-------------------------------------------------------------------------------
Tar TCSH aliases

I have the following CSH aliases in my ".cshrc" file to help handle the
creation of tar files.

#=======8<------CUT HERE--------axes/crowbars permitted---------------
# ...
# make things easier
alias a alias
# ...
# set the paging command for `man' and `archive listing'
setenv PAGER less
# ...
#----------- Remote Aliases -----------
# These aliases are defined for use by remote commands
# or as part of a file manager or vi shell escape
#-------- Archive Aliases ----------
a uc    'uncompress'
# compressed tar archives
a ltar  'zcat <\!:1 | tar tvf - | ${PAGER}'
a ftar  'zcat <\!:1 | tar xvfp - \!:2*'
a ttar  'tar cvf - \!:1 | compress > \!:1.tar.Z'
# uncompress gzip compressed file
a gz    'gzip -v \!:1'
a ugz   'gzip -dv \!:1'
a ltgz  'gzip -dc \!:1 | tar tvf - | ${PAGER}'
a ftgz  'gzip -dc \!:1 | tar xvf - \!:2*'
a ttgz  'tar cvf - \!:1 | gzip > \!:1.tgz'
#=======8<------CUT HERE--------axes/crowbars permitted---------------

  Example: to archive a sub-directory
      ttgz subdir
  That is, "To Tar-Gzip" the sub-directory "subdir" into "subdir.tgz".
  You can then delete the sub-directory.
  NOTE: do NOT have any `/'s in the subdir argument.

  To de-archive
      ftgz subdir.tgz
  and the subdir directory will be re-created in the current directory.
  The archive can then be removed, or left as a backup.

===============================================================================
--------------- Other Archiving Programs -----------------------
-------------------------------------------------------------------------------

All these archivers were developed on NON-UNIX machines and will automatically
compress whole directories or individual files into/from an archive in a
single step.  This is very important in a DOS-like environment, which does not
have the pipes available under UNIX.

-------------------------------------------------------------------------------
Zip Archive

Zip, and the alternative version PkZip, is the `standard' PC archiving
program.  UNIX versions are available on kurango and gucis.  They use numerous
compression algorithms, selecting the one with the best performance for each
individual file.

  Zip a sub-directory
      zip -r subdir.zip subdir
  You can also auto-remove that sub-directory by adding a -m option.

Advantages...
  * You can extract a single file from the archive without the program
    processing the whole archive to do so, as you need to for tar (or
    gzip'ed tar) archives.

Disadvantages...
  * Zip uses the same compressor as Gzip, but compresses each file
    individually.  A Gzip'ed tar archive is compressed over all the files,
    making gzip'ed tar files smaller, and for an archive of GIF files this
    can be 10 times smaller than the same ZIP archive.
    See https://antofthy.gitlab.io/info/misc/tgz_vs_zip.txt

-------------------------------------------------------------------------------
Bzip2 Compression

The newest compression algorithm in common use is bzip2 (suffix ".bz2").
This is very much like gzip (above) but compresses even better than gzip.
It is however slower (EG: a time-space trade-off).

Again, this is only a compression program and NOT an archiver, so tar files
will most commonly be bzip2'ed.  File suffixes are usually ".tar.bz" or
".tar.bz2".  The command works just like gzip, so just use the gzip methods
above, replacing "gzip" with "bzip2".
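For example, a minimal sketch (the file and directory names are placeholders;
assumes the standard bzip2, which takes the same "-d" and "-c" options as
gzip)...

  create a bzip2'ed tar file of a sub-directory
      tar cvf - subdir | bzip2 > subdir.tar.bz2

  de-archive a bzip2'ed tar archive
      bzip2 -dc filename.tar.bz2 | tar xvf -

  list the contents of a bzip2'ed tar archive
      bzip2 -dc filename.tar.bz2 | tar tvf -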
-------------------------------------------------------------------------------
LHarc and Zoo Archives

These commands were developed in the days of the old Amiga micro-computers.
However, a version of the command is available (if you can find it) for UNIX
machines.

It has the same advantages and disadvantages as the zip archive above.
However, it has a much larger set of options and controls than most zip
commands.  For example, unlike zip and tar archives, you can do things like...
  * Pipe out a single file in the archive directly to standard output
    (EG: pipe out, not de-archive!)
  * Extract with OR without the archive directory structure
  * Delete/replace particular archived files, repacking the archive
    afterwards, without needing a full de-archive and re-archive of
    everything.

Individual file selection is something very few archiving programs are
capable of.  I myself use this feature for my random quote message generator,
so I can output a single file to stdout, randomly selected from the archive's
listing.  It also makes it easy to add, delete and update individual files in
the archive without needing to de-archive everything every time (as you need
to do with a gzip'ed tar archive).

-------------------------------------------------------------------------------
RSync and Hardlinked Snapshot Archives

For details on how a RSync or Hardlinked Snapshot Archive is made, see
    https://antofthy.gitlab.io/info/apps/rsync_backup.txt

These archives are simply complete copies (snapshots) of some directory.  The
files are not compressed and look exactly like the original files they were
backed up from.  The only differences, if any, will be in file metadata
(owner, permissions, and timestamps).

However, the files that have not changed between the different snapshots are
all hardlinked to a single on-disk copy of the file.  That is, the same file
on disk can appear in multiple 'snapshots', generally in the same location in
each, but it can have different names and paths if the file was moved or
renamed at some point between snapshots.

This works well with directories that don't change greatly (which is typical),
as only the changed files and the directory structure use extra disk space.
The result is that an archive of, say, 20 to 30 snapshots can often use less
than double the space you would need to hold just one copy of the directory
being archived.  And yet it looks as if you have dozens of snapshots, each
arranged exactly as the directory was at the time that copy was made.

The biggest difference is, as mentioned before, that the permissions and
timestamps of the files are likely to be those of the last snapshot made, and
not those at the time each snapshot was made.  But that is generally not
regarded as critical.

Problems...

The biggest problem with such a snapshot archive is typically finding and
re-hardlinking files that have been moved or renamed when a new snapshot is
made.  But even that has solutions.

Also, editing a file directly in such a snapshot will generally change ALL the
copies of that file, which is probably not desirable.  As such, editing files
in the archive is not advisable.

Also, deleting an unwanted file from one snapshot may not delete the file from
all the snapshots, so the file remains on disk as part of another snapshot.
To fully delete a large, or security-problematic, file from the backup you
must remove ALL the links to that file, whatever their location or name.

You cannot store such hardlinked snapshots in the cloud, as cloud storage
generally does not allow hardlinking.  In that case each and every copy uses
the same amount of disk space as the original.  This system only works well
on a filesystem designed for UNIX or Linux systems.
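As a rough sketch of how such a snapshot is typically created with rsync (the
paths are examples only, and the exact options are a matter of preference; see
the URL above for the full method)...

  # Assuming "/backup/2024-01-01" holds the previous snapshot, make a new
  # snapshot of "/home/user", hardlinking (rather than copying) any file
  # that is unchanged relative to the previous snapshot.
      rsync -a --link-dest=/backup/2024-01-01  /home/user/  /backup/2024-01-02/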
-------------------------------------------------------------------------------