-------------------------------------------------------------------------------
TAR history

The tar command (tape-archive) was originally designed to archive a number of
files and complete directories to magnetic tape.  However, due to its ability
to archive directories, and its availability on EVERY unix (and almost all
non-unix) platform, it has become the standard archive format for files on
unix systems.

The tar program is also available in the GNU source archives, and is commonly
known as "gtar" or gnu-tar.  This program can be compiled for the DOS command
line, which means that it is available for PCs too.  Gnu-tar is basically the
same as normal UNIX tar but has a lot more features (typical of the GNU
variants).  For example, gzip compression is built into the archiving program.

Due to its origins, the default file to which archiving occurs is the tape
drive (typically "/dev/rmt8").  To use this command you need to substitute
your own filename, or `-' to represent stdin/stdout, otherwise it will attempt
to use the tape device, which is probably not available for your use.

-------------------------------------------------------------------------------
Normal tar files (no compression)

  archive a sub-directory "subdir" to a file "subdir.tar"
    first change directory (cd) to just above the sub-directory, then execute
      tar cvf subdir.tar subdir

  de-archive the above tar file
      tar xvf filename.tar

  look at the files in a tar file
      tar tvf filename.tar

-------------------------------------------------------------------------------
Compressed Tar Files

Tar files DO NOT perform any compression on the data, unlike the more modern
archivers now available.  As such, tar archives are normally compressed to
form compressed tar files ".tar.Z".

  compress an existing tar file
      compress directory.tar

  create a compressed-tar of a sub-directory
    cd to just above the sub-directory, then
      tar cvf - subdir | compress > subdir.tar.Z

  de-archive a compressed-tar
      zcat filename.tar.Z | tar xvf -

  list the contents of a compressed-tar file
      zcat filename.tar.Z | tar tvf -

-------------------------------------------------------------------------------
Gzip'ed Tar Files

Today a newer compression method is available, called GZIP, which is freely
available on ALL machines, including PCs.  It uses Lempel-Ziv (LZ77)
compression, which compresses much better than the normal unix compress
program.  Also, the gzip de-compression code will de-compress BOTH compressed
and gzip'ed files.

The "gzip" source is available in the GNU FTP archive on most major software
sources around the world, and pre-compiled PC versions should also be
available in PC archives.

Gzip'ed tar files use the file extension ".tar.gz" or ".tgz".  At one point
".tar.z" (with a little 'z') was also used as the suffix, but this was found
to be too easily confused with a normal compressed (".Z") tar archive.

  gzip an existing tar file
      gzip directory.tar

  create a gzip'ed tar file of a sub-directory
    cd to just above the sub-directory, then
      tar cvf - subdir | gzip > subdir.tar.gz

  de-archive a compressed or gzip'ed tar archive
      gzip -dc filename.tar.gz | tar xvf -

  list the contents of a compressed or gzip'ed tar archive
      gzip -dc filename.tar.gz | tar tvf -

Gzip compression of a tar file can be improved by archiving all the files of
the same type together.  EG: order the files archived so all the GIF files are
together, then all the text files, the HTML files, the JPEGs, etc.

  Example
      tar cvf - `ls | sort -t. +1` | gzip > ../file.tar.gz

NOTE: this archives the files in the current directory.  That is NOT
recommended; you should always archive a sub-directory instead.
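If you want the suffix ordering while still archiving a proper sub-directory,
a rough sketch (assuming a flat "subdir" with no spaces in the file names, and
a sort(1) that accepts the newer "-k" field option) would be...

      # list "subdir/" entries ordered by file suffix, then archive that list
      tar cvf - `ls subdir/* | sort -t. -k2` | gzip > subdir.tar.gz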
-------------------------------------------------------------------------------
Tar TCSH aliases

I have the following CSH aliases in my ".cshrc" file to help handle the
creation of tar files.

#=======8<------CUT HERE--------axes/crowbars permitted---------------
# ...
# make things easier
alias a alias
# ...
# set the paging command for `man' and `archive listing'
setenv PAGER less
# ...
#----------- Remote Aliases -----------
# These aliases are defined for use by remote commands
# or as part of a file manager or vi shell escape
#-------- Archive Aliases ----------
a uc    'uncompress'
# compressed tar archives
a ltar  'zcat <\!:1 | tar tvf - | ${PAGER}'
a ftar  'zcat <\!:1 | tar xvfp - \!:2*'
a ttar  'tar cvf - \!:1 | compress > \!:1.tar.Z'
# uncompress gzip compressed file
a gz    'gzip -v \!:1'
a ugz   'gzip -dv \!:1'
a ltgz  'gzip -dc \!:1 | tar tvf - | ${PAGER}'
a ftgz  'gzip -dc \!:1 | tar xvf - \!:2*'
a ttgz  'tar cvf - \!:1 | gzip > \!:1.tgz'
#=======8<------CUT HERE--------axes/crowbars permitted---------------

  Example: to archive a sub-directory
      ttgz subdir
  That is, "To Tar-Gzip" the sub-directory "subdir" into "subdir.tgz".
  You can then delete the sub-directory.
  NOTE: do NOT have any `/'s in the subdir argument.

  To de-archive
      ftgz subdir.tgz
  and the subdir directory will be re-created in the current directory.
  The archive can then be removed, or left as a backup.

===============================================================================
--------------- Other Archiving Programs -----------------------
-------------------------------------------------------------------------------

All these archivers were developed on NON-UNIX machines and will automatically
compress whole directories or individual files into/from an archive in a
single step.  This is very important in a DOS-like environment, which does not
have the pipes available under UNIX.

-------------------------------------------------------------------------------
Zip Archive

Zip, and the alternative version PkZip, is the `standard' PC archiving
program.  UNIX versions are available on kurango and gucis.  They use numerous
compression algorithms, selecting the one with the best performance for each
individual file.

  Zip a sub-directory
      zip -r subdir.zip subdir
  You can also auto-remove that sub-directory by adding a -m option.

Advantages...
  * You can extract a single file from the archive without the program
    processing the whole archive to do so, as you need to for tar (or
    gzip'ed tar) archives.

Disadvantages...
  * Zip uses the same compressor as Gzip, but compresses each file
    individually.  A Gzip'ed tar archive is compressed over all the files,
    making gzip'ed tar files smaller, and for an archive of GIF files this
    can be 10 times smaller than the same ZIP archive.
    See https://antofthy.gitlab.io/info/misc/tgz_vs_zip.txt

-------------------------------------------------------------------------------
Bzip2 Compression

The newest compression algorithm in common use is bzip2 (suffix ".bz2").
This is very much like gzip (above) but compresses even better than gzip.
It is however slower (EG: a time-space trade-off).

Again, this is only a compression program and NOT an archiver, so tar files
will most commonly be bzip2'ed.  File suffixes are usually ".tar.bz" or
".tar.bz2".  The command works just like gzip, so just use the gzip methods
above, replacing "gzip" with "bzip2".
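For example, a minimal sketch (the file and directory names are placeholders;
assumes the standard bzip2, which takes the same "-d" and "-c" options as
gzip)...

  create a bzip2'ed tar file of a sub-directory
      tar cvf - subdir | bzip2 > subdir.tar.bz2

  de-archive a bzip2'ed tar archive
      bzip2 -dc filename.tar.bz2 | tar xvf -

  list the contents of a bzip2'ed tar archive
      bzip2 -dc filename.tar.bz2 | tar tvf -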
-------------------------------------------------------------------------------
LHarc and Zoo Archives

These commands were developed in the days of the old Amiga micro-computers.
However, a version of the command is available (if you can find it) for UNIX
machines.

It has the same advantages and disadvantages as the zip archive above.
However, it has a much larger set of options and controls than most zip
commands.  For example, unlike zip and tar archives, you can do things like...
  * Pipe out a single file in the archive directly to standard output
    (EG: pipe out, not de-archive!)
  * Extract with OR without the archive directory structure
  * Delete/replace particular archived files, repacking the archive
    afterwards, without needing a full de-archive and re-archive of
    everything.

Individual file selection is something very few archiving programs are
capable of.  I myself use this feature for my random quote message generator,
so I can output a single file to stdout, randomly selected from the archive's
listing.  It also makes it easy to add, delete and update individual files in
the archive without needing to de-archive everything every time (as you need
to do with a gzip'ed tar archive).

-------------------------------------------------------------------------------
RSync and Hardlinked Snapshot Archives

For details on how a RSync or Hardlinked Snapshot Archive is made, see
    https://antofthy.gitlab.io/info/apps/rsync_backup.txt

These archives are simply complete copies (snapshots) of some directory.  The
files are not compressed and look exactly like the original files they were
backed up from.  The only differences, if any, will be in file metadata
(owner, permissions, and timestamps).

However, the files that have not changed between the different snapshots are
all hardlinked to a single on-disk copy of the file.  That is, the same file
on disk can appear in multiple 'snapshots', generally in the same location in
each, but it can have different names and paths if the file was moved or
renamed at some point between snapshots.

This works well with directories that don't change greatly (which is typical),
as only the changed files and the directory structure use extra disk space.
The result is that an archive of, say, 20 to 30 snapshots can often use less
than double the space you would need to hold just one copy of the directory
being archived.  And yet it looks as if you have dozens of snapshots, each
arranged exactly as the directory was at the time that copy was made.

The biggest difference is, as mentioned before, that the permissions and
timestamps of the files are likely to be those of the last snapshot made, and
not those at the time each snapshot was made.  But that is generally not
regarded as critical.

Problems...

The biggest problem with such a snapshot archive is typically finding and
re-hardlinking files that have been moved or renamed when a new snapshot is
made.  But even that has solutions.

Also, editing a file directly in such a snapshot will generally change ALL the
copies of that file, which is probably not desirable.  As such, editing files
in the archive is not advisable.

Also, deleting an unwanted file from one snapshot may not delete the file from
all the snapshots, so the file remains on disk as part of another snapshot.
To fully delete a large, or security-problematic, file from the backup you
must remove ALL the links to that file, whatever their location or name.

You cannot store such hardlinked snapshots in the cloud, as cloud storage
generally does not allow hardlinking.  In that case each and every copy uses
the same amount of disk space as the original.  This system only works well
on a filesystem designed for UNIX or Linux systems.
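As a rough sketch of how such a snapshot is typically created with rsync (the
paths are examples only, and the exact options are a matter of preference; see
the URL above for the full method)...

  # Assuming "/backup/2024-01-01" holds the previous snapshot, make a new
  # snapshot of "/home/user", hardlinking (rather than copying) any file
  # that is unchanged relative to the previous snapshot.
      rsync -a --link-dest=/backup/2024-01-01  /home/user/  /backup/2024-01-02/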
-------------------------------------------------------------------------------