-------------------------------------------------------------------------------
Interpreted Text File Handling

The "SheBang" file magic

Polyglot Scripts (scripts that can be run by two different interpreters)

-------------------------------------------------------------------------------
Example of how it works...

For a more official explanation see Wikipedia, "Shebang (Unix)", Purpose
  http://en.wikipedia.org/wiki/Shebang_%28Unix%29#Purpose

More info
  http://www.in-ulm.de/~mascheck/various/shebang/
  http://homepages.cwi.nl/~aeb/std/hashexclam-1.html

----
This is my 'hands on' explanation....

Create a script called "echo_script"

  #!/usr/bin/echo Running

Now when you run the script

  ./echo_script  arg1 arg2

You get the output

   Running ./echo_script arg1 arg2

In other words the system executed the command

   /usr/bin/echo Running ./echo_script arg1 arg2

That is to say the program was run with the arguments...
  * ONE interpreter argument, "Running", taken from the shebang line
  * The filename of the script that is to be interpreted
  * The user-supplied arguments "arg1" "arg2"
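The argument list above can be modelled with a rough Python sketch.  It
assumes Linux-style parsing (the interpreter path ends at the first space or
tab, everything after it becomes ONE argument, trailing blanks are dropped);
the function name "shebang_argv" is made up, and this is my own model, not
the kernel's actual code.

```python
import re


def shebang_argv(first_line, script_path, user_args):
    """Return the argv a Linux-style kernel would build for a "#!" script.

    A simplified model: only spaces and tabs count as blanks, so a
    carriage return would be kept as part of the text.
    """
    if not first_line.startswith("#!"):
        raise ValueError("no '#!' magic at the start of the file")
    # Only the first line matters; drop blanks at both ends of it.
    line = first_line[2:].split("\n", 1)[0].strip(" \t")
    # The interpreter path ends at the first space or tab...
    parts = re.split(r"[ \t]", line, maxsplit=1)
    argv = [parts[0]]
    # ...and everything after it (quotes and all) is ONE optional argument.
    if len(parts) > 1 and parts[1].lstrip(" \t"):
        argv.append(parts[1].lstrip(" \t"))
    # Then the script's own filename, then the user's arguments.
    argv.append(script_path)
    argv.extend(user_args)
    return argv


print(shebang_argv("#!/usr/bin/echo Running\n", "./echo_script", ["arg1", "arg2"]))
# prints ['/usr/bin/echo', 'Running', './echo_script', 'arg1', 'arg2']
```

The same model turns "#!/usr/bin/env sed -f" into the single argument
"sed -f", and passes any quotes through untouched.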

--------------------
Internal Caveats...

What really happens is that the kernel's execve() system call sees the special
"#!" (shebang) magic at the start of the file, and runs the command as shown.

The kernel does not search the user's command PATH for the interpreter; the
interpreter program must be given as a hard-coded path.

Only ONE argument may follow the command (before the script's filename) on the
actual shebang line (see below) :-(

Quotes and any other shell syntax are completely ignored: they are passed on
literally as part of that one argument.

Whether the filename argument (the script to be interpreted) is a relative
path or an absolute path depends on the calling shell (its value is passed to
the execve() syscall) and on how the shell resolved the path to the script file.

Typically it is either what the user typed (relative to the current directory)
or the full path from the shell's internal hash of command locations.  It is
the calling shell that decides whether it remains relative, or a full path is
used.

Also on most systems the 'interpreter' MUST be an actual compiled (binary)
program, and not another 'script' of some kind (that is, SheBang is typically
not recursive!).  Linux, however, does allow a script to interpret scripts.

Do not confuse this with the shell itself seeing a kernel failure and silently
taking over the interpretation.  BASH can do this, and can even do so on
a "noexec" mount!  That is, it is interpreting the script, not executing it!
But that is semantics, and may be classed as a bug, depending on who you
talk to.

-------------------------------------------------------------------------------
Find interpreter in user's command PATH...

You can hardcode the interpreter program's location, like this...

  #!/opt/bin/perl

OR use a secondary program, like "env", to search for the interpreter
according to the PATH environment variable ...

  #!/usr/bin/env perl

This uses the "env" command to actually search for the interpreter, so that
its location is not fixed, but can be anywhere along the user's command PATH.
That is, the interpreter's location is not hard-coded into the script, making
it more system-independent, but less secure as to exactly which interpreter
is run.
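As a rough illustration, the PATH search that "env" relies on can be sketched
in Python.  This is my own simplified model, not env's source: the real "env"
just hands the name to the C library's execvp(), which tries each PATH
directory in turn.  The function "find_on_path" and the demo directories are
hypothetical.  The second lookup shows that a name with a trailing CR, as
produced by a DOS-style shebang line, is simply not found.

```python
import os
import stat
import tempfile
from typing import Optional


def is_executable(path: str) -> bool:
    """True for a regular file with at least one execute bit set."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    return stat.S_ISREG(st.st_mode) and bool(st.st_mode & 0o111)


def find_on_path(name: str, path: str) -> Optional[str]:
    """Return the first executable called `name` along `path`, or None."""
    if "/" in name:
        # A name containing a slash is used as given; no PATH search is done.
        return name if is_executable(name) else None
    for directory in path.split(os.pathsep):
        # An empty PATH entry traditionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if is_executable(candidate):
            return candidate
    return None


# Demo: a throw-away PATH of two directories, where only the second holds "perl".
with tempfile.TemporaryDirectory() as tmp:
    first_dir = os.path.join(tmp, "a")
    second_dir = os.path.join(tmp, "b")
    os.mkdir(first_dir)
    os.mkdir(second_dir)
    fake_perl = os.path.join(second_dir, "perl")
    with open(fake_perl, "w") as fh:
        fh.write("#!/bin/sh\necho fake perl\n")
    os.chmod(fake_perl, 0o755)

    demo_path = first_dir + os.pathsep + second_dir
    found = find_on_path("perl", demo_path)        # the copy in the second dir
    found_cr = find_on_path("perl\r", demo_path)   # CR-terminated shebang name
    print(found == fake_perl, found_cr)            # True None
```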

In this form of SheBang, the interpreter can be another 'script' and not
a compiled program, as the "env" command performs its own separate exec()
call.

WARNING: the "env" method does not work for Windows 'CR-LF' terminated lines
such as in Cygwin (PC) environments.  Basically the return character is then
added to the argument, and the interpreter will not be found.

Note that there is nothing special about using the "env" command.
You can also use other similar commands such as

  /usr/bin/nice

Though its path is more system-dependent, and it has other effects (it makes
the interpreter run at a lower priority).  Remember you cannot provide a second
argument, so you have no real control over the 'nice' behaviour. (see next)

-------------------------------------------------------------------------------
SheBang only allows ONE argument!       **** IMPORTANT ****

As such this works
  #!/usr/bin/sed -f

But using this with 'env' does not
  #!/usr/bin/env sed -f
with the error
  /usr/bin/env: sed -f: No such file or directory

The problem is that the kernel's shebang handling in execve() passes only one
argument, while the 'env' line above needs two, and thus fails.

What happens is that all the shebang arguments are merged into a single
argument
  "sed -f"
And of course no program called "sed -f" is found.

For a solution see "Polyglot Scripts", below.

Most systems (based on BSD) do this, including...
  Linux,  NetBSD,  FreeBSD

Other systems...
  Solaris: silently ignores the second and later shebang arguments
  MacOS X: accepts more than one argument
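The three behaviours can be compared with a small Python model.  It is only a
sketch of the descriptions above (the style names "bsd", "solaris" and "macos"
and the function "shebang_args" are my own labels), applied to the text after
the interpreter path in "#!/usr/bin/env sed -f".

```python
def shebang_args(rest_of_line: str, style: str) -> list:
    """Turn the text after the interpreter path into interpreter arguments.

    style "bsd":     Linux, NetBSD, FreeBSD; everything is ONE argument
    style "solaris": only the first word is kept, the rest silently dropped
    style "macos":   the text is split on blanks into several arguments
    """
    text = rest_of_line.strip(" \t")
    if not text:
        return []
    if style == "bsd":
        return [text]
    words = text.split()
    if style == "solaris":
        return words[:1]
    if style == "macos":
        return words
    raise ValueError(f"unknown style: {style!r}")


for style in ("bsd", "solaris", "macos"):
    print(style, ["/usr/bin/env", *shebang_args("sed -f", style), "./script"])
```

Only the "macos" result gives "env" the separate 'sed' and '-f' arguments it
needs to run "sed -f ./script".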

The very first UNIX implementation (by Dennis Ritchie) did not allow any
argument at all.  BSD (Robert Elz) added a single argument (no quoting
mechanism needed).  POSIX does not specify what should be done (thus the
variations).

NOTE: Perl itself parses the shebang line again, so as to locate and
interpret any extra arguments that may be present.  However it cannot handle
"env" style arguments.  But a different mechanism (see below) can be used.

Most languages also allow you to specify interpreter options later in the
script.  For example, in shell you can set options using 'set' (EG: "set -v").

-------------------------------------------------------------------------------
Polyglot Scripts

Scripts which can be read by two different interpreters, typically so that
one (the shell) will locate and call the other, after some preparations or
with additional arguments.

These rely on finding some code that a shell (typically Bourne sh) and the
final interpreter can both parse and execute without error, but with the
result that the shell exec's the real interpreter, regardless of its location
or the number of arguments it needs.

----
Perl

Both Perl and the shell understand "eval" and "if", but Perl uses ';' to
terminate statements and does not treat end-of-line as end-of-statement.
As such, in the following the shell performs the "exec" (which replaces the
shell, so it never reads the "if 0" line), while Perl sees the single
statement "eval ... if 0;", whose condition is false, and so skips it.

The result is that the shell is replaced by "perl", regardless of its location
on the shell's command PATH, with whatever extra options you need.

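  #!/bin/sh
  #! -*- perl -*-
  eval 'exec perl -x -wS "$0" ${1+"$@"}'
    if 0;
  # ...

The second "#!" line is only a comment to the shell, but it is where
"perl -x" starts reading the script, and because it mentions "perl", Perl
will not try to re-run the "/bin/sh" named on the first line.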

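Note that "perl" does re-parse the shebang line to handle multiple arguments
anyway, so this also works, but it still needs the full path.  (Switches such
as -x and -S cannot be used this way; Perl refuses to emulate them on the
"#!" line.)

  #!/usr/bin/perl -w -s
  # ...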

----
Python

The shell script is completely hidden in a Python ''' string, while the shell
sees the weird "''':'" expression as just ':' (the do-nothing command), since
it is an empty '' string followed by a quoted ':'.

Here it is used to have the script run using "python2" if available,
otherwise "python" (which on many systems points to "python3").

  #!/bin/sh
  ''':'
  # This is run in shell...
  if type python2 >/dev/null 2>&1; then
    exec python2 "$0" "$@"
  else
    exec python "$0" "$@"
  fi
  '''
  # The real Python script starts here
  def ...
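To see what Python makes of that header, the standard "ast" module can parse
it.  This is just a quick check (the variable names are made up, and the
"def ..." placeholder is replaced by one print line): the first statement is a
plain string expression holding all of the shell code, and the real script
follows it.

```python
import ast

# The polyglot header from above, followed by one line of "real" Python.
POLYGLOT = """#!/bin/sh
''':'
# This is run in shell...
if type python2 >/dev/null 2>&1; then
  exec python2 "$0" "$@"
else
  exec python "$0" "$@"
fi
'''
print("the real Python script starts here")
"""

tree = ast.parse(POLYGLOT)
shell_part = tree.body[0]

# Statement 1 is a bare string literal, so Python never runs the shell code.
print(type(shell_part).__name__, type(shell_part.value).__name__)  # Expr Constant
print(repr(shell_part.value.value[:2]))  # ":'" (the text right after the opening ''')
print(len(tree.body))                    # 2
```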

----
tcl/wish

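In Tcl a backslash at the end of a comment line continues the comment onto the
next line, but the shell does not allow that.  So Tcl treats the "exec" line
as part of the comment, while the shell runs it.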

  #!/bin/sh
  # the next line restarts using wish \
  exec wish "$0" ${1+"$@"}
  #
  #... tcl/wish script here ...

For a more complex solution see
  http://wiki.tcl.tk/812

----
sed

Sed requires a '-f' argument to interpret a script file, which makes the "env"
solution impossible (see above for an example).

The following is a very deep and tricky solution...

  #!/bin/sh
  # shell to sed interpreter launch function
  b ()
  {
  x
  }
  i\
  f true; then exec sed -f "$0" "$@"; fi
  : ()
  #
  # sed script starts here
  s/a/b/

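"sh" sees 'if true; ....' (the backslash-newline after the 'i' joins it to the
'f' on the next line), with everything before it being valid shell, but
basically ignored (it just defines a function called 'b').

"sed" parses the code and sees a valid sed script (the 'i\' is its insert
command), but branches over it to the '()' label where the real sed code is
located.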
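The key trick is that backslash-newline after the 'i'.  A tiny Python sketch
shows the shell's view of the script, assuming the simplified rule that the
shell deletes every backslash-newline pair (the real shell does not join lines
inside single quotes or comments, which this script happens not to need).
The names are made up for the example.

```python
# The sed polyglot from above, kept verbatim in a raw string.
SED_POLYGLOT = r"""#!/bin/sh
# shell to sed interpreter launch function
b ()
{
x
}
i\
f true; then exec sed -f "$0" "$@"; fi
: ()
#
# sed script starts here
s/a/b/
"""


def shell_view(text: str) -> str:
    """Join lines the way the shell's backslash-newline continuation does."""
    return text.replace("\\\n", "")


# The shell sees an ordinary 'if' statement that exec's sed on this file...
for line in shell_view(SED_POLYGLOT).splitlines():
    if line.startswith("if "):
        print(line)  # if true; then exec sed -f "$0" "$@"; fi

# ...while sed sees an 'i\' insert command, which its branch skips over.
print("i\\" in SED_POLYGLOT.splitlines())  # True
```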

-------------------------------------------------------------------------------
SetUID Attacks on the SheBang

Many systems disallow the use of SetUID SheBang scripts due to the range of
attacks against them.

The most common one is a race condition between the kernel starting the
interpreter and the interpreter opening the script file: in that window the
script's path (for example, a symbolic link) can be switched to a different
file.  Often, when a script is SetUID, the kernel will give the interpreter a
pre-opened /dev/fd/n file descriptor for the file it has opened, instead of
the script path, which solves this race condition.

Shells also often check scripts and run them directly, without involving the
kernel.  As such they may still give the filename as the argument, even if
the interpreter does handle /dev/fd/n.
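The idea behind the /dev/fd/n fix can be demonstrated in user space with
Python.  This is only an analogy (no SetUID or kernel involved, and the file
names are hypothetical): a descriptor opened before the "attacker" swaps the
file keeps pointing at the original, while the path now names the attacker's
file.  It assumes a system that provides /dev/fd, such as Linux, the BSDs or
macOS.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "suid_script")
    with open(script, "w") as fh:
        fh.write("echo trusted\n")

    # The "kernel" opens (and checks) the script, then hands on this name.
    fd = os.open(script, os.O_RDONLY)
    fd_path = f"/dev/fd/{fd}"

    # The "attacker" swaps a different file in under the same path name.
    evil = os.path.join(tmp, "evil")
    with open(evil, "w") as fh:
        fh.write("echo evil\n")
    os.replace(evil, script)

    # A path-based interpreter reads whatever the name points at now...
    with open(script) as fh:
        via_path = fh.read()
    # ...while reopening the descriptor still reaches the original file.
    with open(fd_path) as fh:
        via_fd = fh.read()
    os.close(fd)

print("via path:", via_path.strip())  # via path: echo evil
print("via fd:  ", via_fd.strip())    # via fd:   echo trusted
```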

-------------------------------------------------------------------------------
Scripts Interpreting Scripts

So far only Linux and Minix allow a script to interpret another script
directly.  That is, the #! program is itself another #! program.

See kernel patch for kernel 2.6.27.9
  http://lkml.org/lkml/2008/9/6/66

Do not mistake a script-interpreted script that runs successfully for one that
truly works on that system.  Many shells, on seeing the kernel failure, will
silently take over the script interpretation, and thus it still works, but it
will still fail if a program (like "env") calls the C library to try to run
the script-interpreted script.  EG: you ran "env script", where "script" is
interpreted by another script.

BASH can do this, and can even do so on a "noexec" mount, while the kernel
cannot.  This may be classed as a BUG!

-------------------------------------------------------------------------------
Length of SheBang line

Many systems only read the first N bytes of the file to handle the SheBang,
so a long SheBang line may be truncated.

Systems that truncate...
  SunOS4:     32 bytes
  HP-UX:      80 bytes
  Linux:      127 bytes (255 since Linux 5.1)
  BSD/OS 4.2: unlimited
  AIX 5.1:    255 bytes

These allow at least 128 bytes...
  Solaris AIX IRIX OSF/1

These refuse to exec if the line is longer than the limit
  AIX 4.3:            255 bytes
  FreeBSD (3.4, 4.2): 64 bytes
  FreeBSD (5.0):      128 bytes  (ENAMETOOLONG errno)
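What a truncating system ends up with can be sketched in Python.  This is a
simplified model, not any particular kernel: it assumes the system reads just
"limit" bytes, then uses the text up to the first newline (if one fits).  The
function name and the long interpreter path are made up.

```python
def shebang_seen(file_start: bytes, limit: int) -> bytes:
    """Return the shebang text seen by a system that reads only `limit` bytes."""
    buf = file_start[:limit]
    if not buf.startswith(b"#!"):
        raise ValueError("no '#!' magic at the start of the file")
    # Stop at the newline if it fits; otherwise the line is simply cut short.
    return buf[2:].split(b"\n", 1)[0]


line = b"#!/usr/local/very/long/path/to/some/interpreter --option\n"
print(shebang_seen(line, 32))   # b'/usr/local/very/long/path/to/s'
print(shebang_seen(line, 128))  # b'/usr/local/very/long/path/to/some/interpreter --option'
```

With the 32 byte limit even the interpreter path is cut, and that truncated
path will normally not exist, so the script fails to run in a confusing way.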

-------------------------------------------------------------------------------
Trailing white space and returns

Most systems delete trailing white-space (before the newline),
but others include it in the argument!

This may or may not include a return character (DOS lines ending in '\r\n')

  Linux:   deletes extra whitespace, but not returns
  Solaris: deletes extra whitespace, including returns
  Cygwin:  includes any end-of-line return character
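A short Python model of that list shows why a DOS-edited script breaks on
Linux and Cygwin but not on Solaris.  It is my own reading of the list: the
system names are just labels, and the Cygwin branch assumes nothing at all is
stripped.

```python
def tidy_tail(arg: str, system: str) -> str:
    """Apply the end-of-line handling listed above to a shebang argument."""
    if system == "linux":
        return arg.rstrip(" \t")      # blanks removed, a trailing return kept
    if system == "solaris":
        return arg.rstrip(" \t\r")    # blanks and the return both removed
    if system == "cygwin":
        return arg                    # assumed: everything kept, return included
    raise ValueError(f"unknown system: {system!r}")


for system in ("linux", "solaris", "cygwin"):
    print(system, repr(tidy_tail("perl\r", system)))
```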

-------------------------------------------------------------------------------
UTF-8 and shell script

UTF-8 can be used for scripts, if no 'byte order mark' is present.
But UTF-16 can NOT be used for a script, as its two-byte characters mean the
file can never start with the two plain bytes of the "#!" magic.

Windows often prepends a 'byte order mark' (0xEF 0xBB 0xBF) to UTF-8 files,
thus changing the "#!" magic number into a UTF-8 magic number.  A BOM is
normally not needed for UTF-8, though it is needed for UTF-16 file formats.
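A quick Python check of the first bytes makes the problem visible.  The
function name is made up; it only tests for the two raw bytes '#' '!' that
the shebang magic needs, for plain UTF-8, UTF-8 with a BOM, and UTF-16
(little-endian, chosen here just to get a fixed byte order).

```python
import codecs


def kernel_sees_shebang(data: bytes) -> bool:
    """True if the file starts with the two raw bytes '#' and '!'."""
    return data[:2] == b"#!"


text = "#!/bin/sh\necho hi\n"
plain = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain   # what some Windows editors write
utf16 = text.encode("utf-16-le")     # every character takes two bytes

for name, data in (("utf-8", plain), ("utf-8 + BOM", with_bom), ("utf-16", utf16)):
    print(name, data[:4], kernel_sees_shebang(data))  # only "utf-8" gives True
```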

-------------------------------------------------------------------------------