-------------------------------------------------------------------------------
Find Command

For details of the order in which UNIX find traverses a directory tree and
processes files, see the document "info/perl/dir_traversal_notes"

It pays to remember that ALL arguments have a "-a" (and) between them, and
that "-a" binds tighter than "-o".  And remember there are 'lots of ways to
skin a cat'.

You may also like to look at "tree" (a standard linux utility) which outputs
a recursive directory listing.

-------------------------------------------------------------------------------
General Find Functions

   directories() { find "$@" -type d -print; }              # directories only
   allfiles()    { find "$@" -type f -print; }              # all plain files
   executables() { find "$@" -type f -perm -100 -print; }   # executables only
   datafiles()   { find "$@" -type f ! -perm -100 -print; } # non-executables

   # Example usage
   d=public_html  x=-r  c=
   if [ -d "$d" ]; then
      directories "$d" | xargs $x chmod $c 755   # dirs - accessible
      datafiles   "$d" | xargs $x chmod $c 644   # data - readable
      executables "$d" | xargs $x chmod $c 755   # exec - executable
   fi

Find files and directories not globally readable

   find . \! -perm -444 -print

-------------------------------------------------------------------------------
Top-level directory only

The traditional method is to use the directory contents as arguments

   find * -prune ....

That however ignores 'dot' files.  This includes them

   find ./* -prune ....

Another way is to prune at the right level

   find . \! -name . -prune

A modern way is to use the tree depth options (to do top-level only)

   find * -maxdepth 0    # does not include 'dot' files
   find . -maxdepth 1    # don't go deeper than one down
   find .
          -mindepth 1 -prune    # go one level down, then prune any deeper

-------------------------------------------------------------------------------
Exclude a sub-directory

   find {path} {conditions to prune} -prune -o \
        {your usual conditions} -print

Normally you would leave out the "-print" (or "-print0"), but you will need
it when you use "-prune" or it will go wrong.

Example...

   find . -name .snapshot -prune -o -name '*.foo' -print

WARNING: This will also ignore ".snapshot" files/directories found in any
sub-sub-directory, not just the top-level ".snapshot" directory.

You can also use "NOT -path" logic.  BUT that will still recurse into the
sub-directory, and then ignore all the files it finds there.  That can take
time if the sub-directory has a LOT of files.  AND it will still report any
access permission problems in that sub-directory!

   find . \! -path './.snapshot/*' -name '*.foo'

---
More exacting...

Add "-path" (or the equivalent "-wholename") for the specific sub-directory.

   find . -path ./.snapshot -prune -o -name '*.foo' -print

For non-GNU find (without "-path") you can use "-exec test"...

   find . -name .snapshot -exec test '{}' = './.snapshot' \; -prune \
        -o -name '*.foo' -print

DO NOT ignore the type of the filename being pruned!  Note this only matters
if the second condition may want to deal with files of the same name as the
directory being excluded.  I also recommend the use of parentheses for
clarity of the arguments, even though precedence would mean the same thing.

   find . \( -type d -path ./.snapshot -prune \) \
        -o \( -type f -print \)

---
Multiple Directory Exclude

   find . -type d \( -name media -o -name images -o -name backups \) -prune \
        -o -print

OR...

   find . -name media -prune \
        -o -name images -prune \
        -o -name backups -prune \
        -o -print

-------------------------------------------------------------------------------
Exclude a file suffix

   find . -type f \!
        -name '*.bz2' -print0 | xargs -0r bzip2 -v

-------------------------------------------------------------------------------
Find and Delete broken symbolic links

   find -L /app/nagios_scripts -type l -delete

This works as the '-L' causes find to TRY to follow symbolic links.  Broken
links will fail, and as such the '-type l' test will then be true.  It will
never be true if the symbolic link was followed, as find will no longer be
looking at a symbolic link.

The exception is if a good symlink points to a broken symlink.  The broken
one will in that case get removed, leaving the good symlink broken.

See also the linux script 'symlinks', and my own script 'symlink'

-------------------------------------------------------------------------------
Run a command in each directory

   find . -type d -print0 | xargs -0 -n1 {command}
EG:
   find . -type d -print0 | xargs -0 -n1 echo

This could be done in parallel too!

---
Run a separate find in parallel on each top-level sub-directory.
It is faster...

   find . -mindepth 1 -type d -prune -print0 |
      xargs -0 -P0 -i  find {} -name '*.dvi' -print

---
Run a separate find in every sub-directory

   find . -type d -print0 |
      xargs -0 -P0 -i  find {} -maxdepth 1 -name '*.dvi' -print

This can be slower than the above, due to process launching, but...
you can have a list of excluded directories

   find . -type d -print | fgrep -x -v -f exclude_list | xargs -n1 ls

Note the -x to ensure it matches the whole line...
Which isn't very convenient.  Something better is needed.

-------------------------------------------------------------------------------
With parallel execution (recursive compression)

   find . -type f \! -name '*.bz2' -print0 | xargs -0r -n1 -P3 bzip2 -v

   pkill -USR1 xargs    # increase parallelism
   pkill -USR2 xargs    # decrease parallelism

Or using GNU-parallel.  No longer recommended due to its 'citation
requirement' making it less portable.

   find . -type f \!
        -name '*.bz2' -print0 | parallel -q -0 -j3 bzip2 -v

   pkill -USR1 parallel    # get parallel to list the current jobs running

-------------------------------------------------------------------------------
Add a suffix to a filename

This fails, as the argument is needed twice!

   find /path -type f -exec command {}.suffix \;

Find only expands "{}" when it is a separate space-separated argument, and
does not recognise "{}.suffix" as a string substitution.

Using the "-exec" option...

   find /path -type f -exec sh -c 'command "$1".suffix' -- {} \;

WARNING: "find -exec" will pause while the command executes, rather than
continue to search for the next match (pipelined).

Using "sed" piped into a shell

   find /path -type d -print | sed 's:.*:command &.suffix:' | sh

The "sed" solution is probably the most versatile, BUT it is dangerous when
malicious or uncontrolled filenames are possible.

Using "xargs"...

   find /path -type f -print0 | xargs -0r -I{} command {}.suffix

This is better, though not as general as using a shell.  For example...

   # Bash file suffix replacement...
   for name in *.old; do  mv -vn "$name" "${name%%.old}.new"; done

Or you can use a special purpose command...

   # mv_perl - file renaming script
   find /path -type f -print0 | xargs -0r mv_perl 's/\.old$/.new/'

-------------------------------------------------------------------------------
Find-Grep

Grep will NOT output a filename if only one file argument is provided.  As
such, if you use "xargs" to run "grep", add a /dev/null argument to ensure
at least two filenames are always provided.
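The effect is easy to demonstrate; a minimal sketch (the scratch file name
and search string are illustrative):

```shell
# Scratch file to search (name and contents are illustrative).
tmp=$(mktemp -d)
echo hello > "$tmp/one.txt"

# One file argument: grep prints only the matching line, no filename.
grep hello "$tmp/one.txt"              # prints: hello

# A /dev/null second argument forces the "filename:" prefix.
grep hello "$tmp/one.txt" /dev/null    # prints: <path>/one.txt:hello

rm -r "$tmp"
```

In a find-xargs-grep pipeline, /dev/null plays the same role whenever xargs
happens to hand grep only a single filename in its final batch.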
   find /path -name "*.txt" | xargs grep "string" /dev/null

GNU-grep can use the -H option to force filename output

   find /path -name "*.txt" | xargs grep -H "string"

-------------------------------------------------------------------------------
Empty and Old Directories

These rely on GNU-find's
   '-quit'     quit on first match
   '-empty'    empty directory

   # Clean out any old files - may leave empty directories
   #
   # NOTE: you can not use the same technique for directories,
   # as directories become 'modified' when a file is deleted.
   #
   find . -type f -mtime +180 -print0 | xargs -0r rm

   #
   # Clean old directories as a whole...
   #
   # Remove top-level directories with ALL files older than 6 months (180 days)
   for DIR in *; do
      if [[ $(find "${DIR}" -type f -mtime -180 -print -quit) == "" ]]; then
         echo rm -r "${DIR}"
      fi
   done

   #
   # Clean empty directories (if only files are removed)
   #
   # Just remove empty directories (depth first)
   find . -depth -type d -empty -printf "rmdir %p\n"

   # Find top-level directories with no files in any sub-directory
   for DIR in *; do
      if [[ $(find "${DIR}" -type f -print -quit) == "" ]]; then
         echo rm -r "${DIR}"
      fi
   done

See also "Is a directory empty?" in
https://antofthy.gitlab.io/info/shell/file.txt

-------------------------------------------------------------------------------
Xargs and Parallel

Xargs' primary goal was to group filenames into batches before giving them
to a command.  However GNU-xargs can now also do parallel processing (see
above).

However caution is needed to ensure correct quoting of the arguments.
Typically this is done by replacing newlines with NULs in the input

   find ... -print0 | xargs -0r ...

Parallel is a drop-in "xargs" replacement, but it can do the "find" itself,
or run a list of shell commands.  It makes use of multiple processors to run
the batched commands that "xargs" would normally generate, and runs them in
parallel.  It is also a perl script using only standard perl libraries, so
no architecture-specific binary is needed.

   find ...
        -print0 | parallel -0q ...

---
Poor man's "xargs" (to collect groups of filenames) using "fmt"

   ls | fmt |\
   while read args; do
      grep "some words" $args
   done

The 'fmt' does the collection of arguments into lines, which are then read
by the shell loop to run 'batched' commands.  As with "xargs" it will have
quoting problems.

Also note that "xargs" can be used to create a poor man's "fmt" command.
See "Word-Wrapping or Text Formatting" in "info/shell/general.txt"

-------------------------------------------------------------------------------
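The batching described above is easy to observe; a sketch assuming GNU
sort's '-z' option for a predictable order (file names are illustrative):

```shell
# Three scratch files (names are illustrative).
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# NUL-terminate the names, sort them for a stable order, then batch two
# names per command.  Each "echo" invocation prints one batch on its own
# line, so two lines appear: a batch of two names, then the remaining one.
find "$tmp" -type f -print0 | sort -z | xargs -0 -n2 echo

rm -r "$tmp"
```

The same "-n" batching applies when the command is something real like
"grep" or "bzip2", which is the whole point of xargs.
-------------------------------------------------------------------------------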