-------------------------------------------------------------------------------
Interpreted Text File Handling

   The "SheBang" file magic
   Polyglot Scripts (a script that can be run by two different interpreters)

-------------------------------------------------------------------------------
Example of how it works...

For a more official explanation see
  Wikipedia, SheBang, Purpose
    http://en.wikipedia.org/wiki/Shebang_%28Unix%29#Purpose
More info
    http://www.in-ulm.de/~mascheck/various/shebang/
    http://homepages.cwi.nl/~aeb/std/hashexclam-1.html

----
This is my 'hands on' explanation....

Create a script called "echo_script"

  #!/usr/bin/echo Running

Now when you run the script

  ./echo_script arg1 arg2

You get the output

  Running ./echo_script arg1 arg2

In other words the system executed the command

  /usr/bin/echo Running ./echo_script arg1 arg2

That is to say the program was run with the arguments...
  * ONE interpreter script argument  "Running"
  * The filename of the script that is to be interpreted
  * User supplied arguments  "arg1" "arg2"

--------------------
Internal Caveats...

What really happens is that the kernel function execve() sees the special
"#!" (shebang) magic at the start of the file, and runs the command as shown.

The kernel does not search along the user's command PATH; the interpreter
program must be given as a hard coded path.

Only ONE argument may follow the command (before the script's filename) on
the actual shebang line (see below) :-(
Quotes and any other syntax are completely ignored.

Whether the filename argument (the script to be interpreted) is a relative
path or an absolute path depends on the calling shell (its value is passed
to the execve() syscall) and on how the shell resolved the path to the script
file.  Typically it is either what the user typed (relative to the current
directory) or the full path from the shell's internal hash of command
locations.  It is the calling shell that decides whether it remains relative,
or a full path is used.

Also on most systems the 'interpreter' MUST be an actual compiled program,
and not another 'script' of some kind (that is, SheBang is typically not
recursive!).  Linux however does allow a script to interpret scripts.

Do not confuse this with the shell itself seeing a kernel failure and
silently taking over the interpretation.  BASH can do this, and can even do
so on a "noexec" mount!  That is, it is interpreting the script, not
executing it!  But that is semantics, and may be classed as a bug, depending
on who you talk to.

-------------------------------------------------------------------------------
Find interpreter in user's command PATH...

You can hardcode the interpreter program's location, like this...

  #!/opt/bin/perl

OR use a secondary program, like "env", to search for the interpreter
according to the PATH environment variable ...

  #!/usr/bin/env perl

This uses the "env" command to actually search for the interpreter, so that
its location is not fixed, but can be anywhere along the user's command PATH.

That is, the interpreter's location is not hard coded into the script, making
it more system independent, but less secure as to exactly which interpreter
is run.

In this form of SheBang, the interpreter can be another 'script', and not a
compiled program, as the "env" command makes a separate exec() call.

WARNING: the "env" method does not work for Windows 'CR' terminated lines,
such as in cygwin (pc) environments.  Basically the return character will
be added to the argument, and the interpreter will not be found.

Note that there is nothing special about using the "env" command.  You can
also use other similar commands such as

  /usr/bin/nice

Though its path is more system type dependent, and it has other effects
(it makes the interpreter run at a lower priority).  Remember you cannot
provide a second argument, so you have no real control over the 'nice'
behaviour.
(see next)

-------------------------------------------------------------------------------
She-Bang only allows ONE argument!      **** IMPORTANT ****

As such this works

  #!/usr/bin/sed -f

But using this with 'env' does not

  #!/usr/bin/env sed -f

with the error

  /usr/bin/env: sed -f: No such file or directory

The problem is that the kernel only passes on ONE argument.  The above 'env'
needs two arguments and thus fails.

What happens is that all the shebang arguments are merged into a single
argument "sed -f".  And of course no program called "sed -f" is found.

For a solution see "Polyglot Scripts", below.

Most systems (based on BSD) do this, including...
    Linux, NetBSD, FreeBSD

Other systems...
    Solaris:   silently ignores the second and later shebang arguments
    MacOS X:   accepts more than one argument

The very first UNIX (by Dennis Ritchie) did not allow any argument at all.
BSD (Robert Elz) added a single argument (no quoting mechanism needed).
POSIX does not specify what should be done (thus the variations).

NOTE: Perl itself parses the shebang line again, so as to locate and
interpret any extra arguments that may be present.  However it can not
handle "env" style arguments.  But a different mechanism (see below) can be
used.

Most languages also allow you to specify interpreter options later.
For example, in shell you can set options using 'set' (EG: "set -v")

-------------------------------------------------------------------------------
Polyglot Scripts

Scripts which can be read by two different interpreters.  Typically so that
one (the shell) will locate and call the other, after preparations or with
additional arguments.

These rely on finding some code that both a shell (typically Bourne-sh) and
the final interpreter can parse and execute without error, but with the
result that the shell exec's the real interpreter, regardless of its
location, or the number of arguments it needs.

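Before moving on to the individual workarounds, the ONE argument rule
described above can be modelled in a few lines of Python.  This is only a
rough sketch of the Linux/BSD style behaviour these notes describe, not the
kernel's actual code.  The function name "shebang_argv" is made up for
illustration, and line length limits are ignored.

```python
def shebang_argv(first_line, script_path, user_args):
    """Model the argument list built for a '#!' script (simplified)."""
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    # Drop the magic and the newline, then blanks at both ends.
    # A carriage return is NOT a blank here, so it survives.
    rest = first_line[2:].rstrip("\n").strip(" \t")
    if not rest:
        raise ValueError("no interpreter given")
    # Find the first blank after the interpreter path.
    cut = len(rest)
    for i, ch in enumerate(rest):
        if ch in " \t":
            cut = i
            break
    interpreter = rest[:cut]
    argument = rest[cut:].strip(" \t")
    argv = [interpreter]
    if argument:
        argv.append(argument)      # kept whole, never split further
    argv.append(script_path)       # the script to be interpreted
    argv.extend(user_args)         # whatever the user typed
    return argv


# 'env' receives the single argument "sed -f", which is why it fails.
print(shebang_argv("#!/usr/bin/env sed -f\n", "./script", []))
```

Feeding it a line that ends in '\r\n' leaves the return character glued
to the argument, which is the cygwin problem noted for the "env" method.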
----
Perl

Both Perl and Shell understand "eval" and "if", but perl uses ';' for command
termination, and does not treat end-of-line as end-of-command.

As such the following will have the shell perform the "exec", while perl
ignores it as it has an "if 0" condition, which the shell never reaches.
The result is that the shell is replaced by "perl", regardless of its
location on the shell's command PATH, with whatever extra options you need.

  #!/bin/sh
  eval 'exec perl -x -wS $0 ${1+"$@"}'
    if 0;
  # ...

Note that with the "-x" option perl skips leading lines until it finds one
that starts with "#!" and contains "perl", so such a line needs to be present
in the script.

Note that "perl" re-parses the shebang line to handle multiple arguments
anyway, so this also works, but still needs the full path.

  #!/bin/perl -x -wS
  # ...

----
Python

The shell script is completely hidden in a python ''' string, while the shell
ignores the weird "''':'" expression.  Here it is used to have the script run
using "python2" if available, otherwise use "python", which many systems have
pointing to "python3".

  #!/bin/sh
  ''':'
  # This is run in shell...
  if type python2 >/dev/null 2>&1; then
    exec python2 "$0" "$@"
  else
    exec python "$0" "$@"
  fi
  '''
  # The real Python script starts here
  def ...

----
tcl/wish

A backslash line continuation in tcl also applies to comments, but shells do
not allow that.  So tcl treats the "exec" line as part of the comment, while
the shell executes it.

  #!/bin/sh
  # the next line restarts using wish \
  exec wish "$0" ${1+"$@"}
  #
  #... tcl/wish script here ...

For a more complex solution see  http://wiki.tcl.tk/812

----
sed

Sed requires a '-f' argument to interpret a script, which makes the "env"
solution impossible (see above for an example).

The following is a very deep and tricky solution...

  #!/bin/sh
  # shell to sed interpreter launch function
  b ()
  {
  x
  }
  i\
  f true; then exec sed -f "$0" "$@"; fi
  : ()
  #
  # sed script starts here
  s/a/b/

"sh" sees 'if true; ....' with everything else before it being valid shell,
but basically ignored.

"sed" parses the code and sees a valid sed script, but branches over it to
the '()' label where the real sed code is located.

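As a complete instance of the Python technique shown in this section, here is
a minimal runnable sketch.  It assumes a "python3" command exists somewhere
on the PATH; the "greet" function is purely illustrative.  The shell only
ever reads the first three lines, while Python sees those lines as a comment
followed by a harmless string literal.

```python
#!/bin/sh
''':'
exec python3 "$0" "$@"
'''
# Everything from here on is only ever read by Python.
import sys


def greet(args):
    """Return a message showing the arguments that reached Python."""
    return "hello from python: " + " ".join(args)


if __name__ == "__main__":
    print(greet(sys.argv[1:]))
```

Save it, make it executable, and run it directly.  The shell performs the
PATH search, so no hard coded interpreter path or "env" is needed.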
-------------------------------------------------------------------------------
SetUID Attacks on the SheBang

Many systems disallow the use of SetUID SheBang scripts due to the range of
attacks against them.  The most common one is a race condition between the
kernel starting the interpreter and the interpreter opening the script file.

Often when a script is SetUID, the kernel will give the interpreter a
pre-opened /dev/fd/n file descriptor to the file it has opened, instead of
the script path, which solves this race condition.

Shells also often check scripts and run them directly, without bothering the
kernel.  As such the shell may still give the filename as the argument, even
if the interpreter does handle /dev/fd/n

-------------------------------------------------------------------------------
Scripts Interpreting Scripts

So far only Linux and Minix machines allow a script to interpret another
script directly.  That is, the #! program is itself another #! program.

See kernel patch for kernel 2.6.27.9
  http://lkml.org/lkml/2008/9/6/66

Do not mistake a successful script interpreted script for the feature truly
working on a system.  Many shells, on seeing the kernel failure, will
silently take over the script interpretation, and thus it will still work.
But it will still fail if a program (like "env") calls the C-library to try
to run the script interpreted script.

  EG: you ran "env script" where script is interpreted by another script

BASH can do this, and can even do so on a "noexec" mount, while the kernel
can not.  This may be classed as a BUG!

-------------------------------------------------------------------------------
Length of SheBang line

Many systems only read the first N bytes to handle the SheBang, so a long
SheBang line could be truncated.

Systems that truncate...
    SunOS4:        32 bytes
    HP-UX:         80 bytes
    Linux:        127 bytes
    BSD/OS 4.2:   unlimited
    AIX 5.1:      255 bytes

These allow at least 128 bytes...
    Solaris  AIX  IRIX  OSF/1

These refuse to exec if the line is longer than the limit
    AIX 4.3:              255 bytes
    FreeBSD (3.4, 4.2):    64 bytes
    FreeBSD (5.0):        128 bytes   (ENAMETOOLONG errno)

-------------------------------------------------------------------------------
Trailing white space and returns

Most systems delete trailing white-space (before the newline), but others
include it in the argument!  This may or may not include a return character
(DOS lines ending in '\r\n')

    Linux:     deletes extra whitespace, but not returns.
    Solaris:   deletes extra whitespace, including returns
    Cygwin:    includes any end-of-line return character

-------------------------------------------------------------------------------
UTF-8 and shell script

UTF-8 can be used for scripts, if no 'byte order mark' is present.
But UTF-16 can NOT be used as a script due to the special file magic.

Windows often prepends a 'byte order mark' ( 0xEF 0xBB 0xBF ) to UTF-8
files, thus changing the "#!" magic number to a UTF-8 magic number.
The mark is normally not needed for UTF-8, though it is needed for UTF-16
file formats.

-------------------------------------------------------------------------------
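The last three sections (line length, trailing returns, byte order marks)
all come down to inspecting the first bytes of a script.  The following is a
hedged sketch of such a check.  The function name "check_shebang" and the
wording of its warnings are made up for illustration, the default limit of
127 bytes is simply the Linux figure quoted above, and the UTF-16 test is
only a heuristic based on its byte order marks.

```python
BOM_UTF8 = b"\xef\xbb\xbf"
BOM_UTF16 = (b"\xff\xfe", b"\xfe\xff")


def check_shebang(data, max_len=127):
    """Return a list of warnings about the first line of a script.

    data    -- the first bytes read from the script file
    max_len -- assumed shebang length limit for the target system
    """
    if data.startswith(BOM_UTF8):
        return ["UTF-8 byte order mark hides the '#!' magic"]
    if data.startswith(BOM_UTF16):
        return ["looks like UTF-16, which cannot be used as a script"]
    if not data.startswith(b"#!"):
        return ["no '#!' magic at start of file"]

    problems = []
    line = data.split(b"\n", 1)[0]      # first line, newline removed
    if line.endswith(b"\r"):
        problems.append("carriage return at end of shebang line")
    if line.rstrip(b"\r").endswith((b" ", b"\t")):
        problems.append("trailing white space on shebang line")
    if len(line) > max_len:
        problems.append("shebang line longer than %d bytes" % max_len)
    return problems


# A DOS formatted script: the return ends up in the interpreter argument.
print(check_shebang(b"#!/usr/bin/env perl\r\necho\r\n"))
```

In practice "data" would come from something like open(path, "rb").read(512),
which is enough to cover any of the limits listed above.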