-------------------------------------------------------------------------------
Coprocess Programming and Interactive Commands in Shell Scripts

by Anthony Thyssen
Last major update: 28 September 2011

Part of this has been copied to
  http://bash.cyberciti.biz/wiki/index.php?title=Co-Processing_-_controlling_other_programs
a tutorial into which it seems to fit very nicely.

A coprocess is defined as:
  Running two (or more) processes with bi-directional communications.

Basically this is about writing shell scripts which run and communicate with another process, service, or interactive program. Typically the shell script is the master process, controlling a more specialised background slave process.

This is NOT an article about hardware coprocessors, multiple CPUs, FPUs, or even GPUs. It is about running more specialised processes in the background.

-------------------------------------------------------------------------------
Introduction to using a Coprocess

Shell scripts sometimes have to control a program that normally requires or enforces some type of interaction, typically directly with a user. Such programs are generally pre-existing commands that do the required job, but which are far too complex to use in other ways.

Examples include running commands on remote servers, making database requests, programs that deal with complex data, controlling devices, and communicating with network daemons.

So you end up having some 'command' that you want to run under the control of some 'wrapping' script, to do a specific task.

Controlling another program can be tricky. Not only do you need to send information to that command, but you also need to get the command's results and make use of them. You may even need to modify your future communications based on previous results.

As part of this you may need to watch for prompts, errors, end-of-data, end-of-file, timeouts, and other conditions that could make your wrapping script 'flaky', or make it 'lock up'.
-------------------------------------------------------------------------------
What is a coprocess?

Generally speaking it is any process that runs independently, but which you are communicating with. It may be a background process running on the same machine or on another machine. It may even be a network service.

Two-Stage Programming...

  This is a fancy way of saying "writing programs to control other programs". It is often the term used in theoretical computer science papers, whereas co-processing is the programming term. This includes things like:
    * Communicating with databases (mysql, oracle, etc.)
    * Running backup programs and reacting appropriately (dump, tar)
    * Using control interfaces (scientific instruments, Arduino)
    * Processing data too complex for shells to deal with simply (images)
    * Changing passwords using interactive commands
  In reality it is the same thing.

Network Server Communications...

  The act of talking to a very complex network server, such as an NNTP usenet news server, or a local FreeNet node (which I have worked on in the past), is practically identical in complexity to talking with a background command.

  A network server, however, is typically better defined and documented than most CLI interfaces to a library. It usually defines exactly how results and data are returned, typically in the form of an RFC (Internet Request for Comments). This includes:
    * Remote control of a terminal session (via SSH, RSH, or telnet)
    * Retrieving files from some file service (world-wide-web, FTP)
    * Uploading files to some service (Flickr, Dropbox, Twitter)
    * Talking to some network server (NNTP news server, SMTP mail server)
  and so on.

Modem 'Chat' scripts...

  This is also exactly the same problem: setting up modem initialization, dialling, and automatic login over an old modem connection. Typically, after configuring the modem and logging in, the 'chat' script will pass the connection to a terminal program for the user to interact with.
  So if you are old enough to have dealt with modems, you have already done co-processing!

-------------------------------------------------------------------------------
Coprocesses In Summary...

Remember, coprocessing is really about communicating with other interactive programs. They may have been launched by you with a communications channel (pipeline), or may already be running, with you connecting to them via IPC or other networking protocols. All the above things are essentially the same.

The critical aspect is that it is a two-way communication, with FEEDBACK. That is, you are not just doing...

   ---> Format_Requests --> 'Run Command Once' --> Parse_Results --->

which is just a simple 'pipe-lined' programming technique. Instead it includes a data feedback...

   ---> Format_Requests --> 'Background Process' --> Parse_Results --->
              ^                                            |
              `---------<------ feedback --------<---------'

That is, the results will affect the future requests that are made. They may simply determine when those requests can be sent (synchronization), or they may modify the requests based on the data returned (feedback). Without the feedback the co-process is just a simple filtered pipeline.

A simple example of such a co-process is a web page download script. It not only downloads web pages, but may also need to log in to access the web page that is wanted. It may need to go through multiple pages, with the additional complexity of cookies and input forms.

Of course, since this is such a common task, and the input is so well known, many programs (wget, curl, etc.) have been written to remove the need for feedback loops, so that each stage can be performed by simpler individual program calls. In other words, they convert the loop into a linear sequence of separate commands.

-------------------------------------------------------------------------------
Other guides to co-processing...
  Command-line interactive programs in UNIX shell-scripts
    http://www.osnews.com/story/10929

-------------------------------------------------------------------------------
Why use a co-process? Why not a more direct method?

Basic reasons...

  * Convenience
    The command may do a very complex task that would take a huge effort to re-engineer into a more convenient form.

    Of course common problems do get re-engineered. Examples are web page download (using "wget" or "curl"), and running remote commands without compromising security (using "ssh", "rsync" and related programs).

    Also, the command may be the 'simplified' interface to a complex API. Dealing with the API directly may be just as complex, especially when you already have the simplified command waiting to be used.

    Typically you want to use a co-process with that command because it is better than the other alternatives, whatever they may be, if any.

  * Network Interface Only
    While a network or daemon service is not strictly a command, communicating with it is still co-processing. In some cases the network interface is the only way to control or communicate with that service.

    This can include: direct access to "sendmail", "pop", and "imap" mail services, "nntp" usenet downloading, "freenet" node control, and uploading files to Flickr or Dropbox.

  * Automation and Monitoring
    If you want a program to regularly access, check, or monitor some activity, a command line or network connection may be the only practical way to do it. If some feedback is needed to log in to a remote service, co-processing may also be required to automate things.

  * Persistence of State
    Many programs and libraries have a huge setup time and state. In such a situation it can be far better to run the program only once in the background and feed it tasks to do, than to run multiple commands, one for each step.
    The state (or complex data) may also have a high cost in terms of reading, parsing, formatting, and writing that data. That cost is best avoided simply by running the command once, rather than multiple times. You may also get more exact results simply by keeping that data in memory.

    Examples of state include:
      + A network connection, which can take some time to set up.
      + The current directory, or the location you are currently working on.
      + Intermediate data, saved variables, and user functions.
      + Access to data in another form, such as uncompressed and decrypted.

  * Control and Looping
    You may need to decide what the next step is based on the data received from previous calls, or depending on user options. This is generally impossible to do in a strict "input | command | output" data pipeline.

    With co-processes you can make use of the shell (or perl, etc.) script's own IF-THEN and LOOPING constructs to do much more complex tasks that may be difficult to do otherwise.

    Even just converting user commands into 'optional' actions can often be handled more easily using a co-process technique.

  * Handling of Errors, or Special Conditions
    A strict pipeline of operations just cannot modify later actions when some unusual condition develops. A co-process can, which is especially useful for working around known problems or special conditions.

    In many cases this is actually the whole point of creating a wrapper around some existing program in the first place.

  * Multiple Instances
    You can have multiple co-processes, not only with the same 'command', but with a number of different commands and services. You can then have data from one modify the other. I don't mean just 'piping' data between them. I mean modifying things, or depending on data from both the source and the feedback from the destination service.

    One typical example is creating a graphical user interface wrapper for a command. Another is a program that converts data found in one database and stores it in another, completely different database with a completely different structure.
  * Fewer Version Changes or System Dependencies over time
    A command provided for a library API typically does not change as often as the underlying methods and techniques. If some major API change does occur, the command interface often absorbs that change, so that users of the command are less affected.

    This can make a script using a Command Line Interface (CLI) far more portable than a program compiled to use the API directly, especially between different operating systems such as Linux, Solaris, and MacOSX.

    Also, when something does change, a scripted co-process is often simpler to modify than the more direct API program.

    Examples of CLI wrappers to complex APIs include "sqlplus" for Oracle and other databases, "openssl" for encrypted files and secure communications, "gpg" for mail and file encryption, and "imagemagick" for image manipulation. There are probably many others.

-------------------------------------------------------------------------------
Problems with co-processes...

These are the typical problems people face when dealing with a co-process.

1/ TTY Input problem

  The command reads directly from the TTY device rather than from standard input. Examples are mount commands, SSH, and encryption passwords.

  This is typically done for security reasons, such as turning off echo while the user is typing a password. It also makes it harder for a novice user to abuse the command in simple 'password cracking' attempts.

  The typical solution is to wrap the command in some type of PTY, which allows you to pipe in your own automated 'keyboard' input.

  Examples of programs that do only this include "unbuffer", "condom", and "nobuf". Packaged techniques that provide the same service are "pty", "expect", "empty", and "socat" (see below).

  Another program, designed specifically to handle this problem with SSH, is "sshpass". See the commentary below.

  An example of what is needed is on the "socat" examples page.
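A quick way to see what a PTY wrapper changes is to ask the shell whether standard input is a terminal. This is only an illustration: "is_terminal" is a hypothetical helper that shows the `[ -t 0 ]` test a program can make. Programs that open /dev/tty directly go further and ignore standard input entirely.

=======8<--------
#!/bin/bash
# Hypothetical helper: report whether standard input is a terminal,
# which is the kind of check a TTY-sensitive program can make before prompting.
is_terminal() {
  if [ -t 0 ]; then
    echo "stdin is a terminal"
  else
    echo "stdin is not a terminal"
  fi
}

# Fed from a pipe, as a script would normally feed a command:
echo "secret" | is_terminal

# Under a PTY wrapper the same test reports a terminal. This uses the
# util-linux "script" syntax (BSD/macOS "script" takes different arguments):
#   script -q -c '[ -t 0 ] && echo "stdin is a terminal"' /dev/null
=======8<--------

Fed from a pipe, the helper reports "stdin is not a terminal". That is why TTY-only commands need one of the PTY wrappers listed above.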
2/ Input Junked problem

  As a security precaution, some commands will simply read and junk all old input just before prompting for new input. This prevents user type-ahead from being used as a password, especially when network delays make that data irrelevant because the user is given an unexpected password prompt.

  Both "ssh" and "passwd" do this just before prompting for a new password. The oldest versions of "telnet" also did it just after connecting to a remote system. Newer versions have had that 'mis-feature' removed.

  In that case you need to either delay, or specifically wait for the prompt, before sending the necessary data. This is probably what you should be doing anyway. Note that prompts may not have a final newline! (see below)

3/ Buffered Output problem

  Essentially the program does not 'flush' its output! You may need to stop the system STDIO library from 'block' buffering the input/output, and keep it buffered line by line (both input and output).

  One way to do this is to run the command under a PTY, so it looks like it is in an interactive terminal. In that case the STDIO library flushes the output whenever a linefeed is written.

  Note that some modern versions of 'cat' and 'grep' have a 'line-by-line' buffering option added for this very reason. It forces the program to flush after every end-of-line character, rather than leaving it up to the stdio library's buffer handling.

  This last problem can be worse for network commands, which work with network packets rather than lines.

  It can be solved by using:
    "condom" (or "pty -0") of the PTY package,
    the "nobuf" script of the replacement "ptyget" package,
    the "unbuffer" of the "expect" package (standard on most Linux),
    the "stdbuf" of the GNU coreutils package (standard on most Linux).
  The "script" type-scripting program is also supposed to do this.

  See also Buffering Problem in Bash Pipelines
    http://mywiki.wooledge.org/BashFAQ/009

4/ 'End-Of-Data' marker problem

  How do you know your requested information is complete? Basically...
  When has the program finished returning data?

  Many UNIX commands, for example, don't give any final 'end-of-data' indicator when they are finished. Most interactive commands (and remote commands) output a 'prompt' which can be used as an EOD indicator, but not all do. If you can, change the prompt at the start to make it more 'unique', which provides a better EOD marker (see next).

  If a program has no 'end-of-data' indicator, you could send some specific 'echo' command which returns an indicator. Otherwise you would need to wait a 'fixed delay', after which you assume the output is finished.

  You also have to be careful of errors. They may result in your program not receiving the expected data, and may even prevent the appropriate 'end-of-data' response. As such, 'end-of-data' must include not only the expected EOD or prompt, but also end-of-file, timeouts, and possibly unexpected error reports.

  WARNING: Perl-Expect does not seem to recognise 'program exit' as an 'end-of-data' indicator! This is really bad.

  A typical solution is to simply append a 'print this string' as an extra request on the end. That way, when you see that 'string', you know the command has finished processing your request.

5/ Prompt problem

  The prompt (or EOD marker) may not have a final newline character. This can cause buffering problems with programs that are not wrapped in a PTY (placing the command in 'raw' mode). However such prompts are typically flushed. Unless the prompt is fed through other programs that do not always flush their data, this is not a problem.

  It also means that when reading a prompt that does not contain a linefeed or return character, you cannot simply 'read a line' as you would normally. Instead you must read the 'available data', without regard to what that data is.

6/ Binary data.

  No matter how you look at it, shells cannot handle or store a NUL character. As such, shells cannot handle raw binary data.
  If this is a possibility, you will need to have the data read and saved to a file by programs that can handle binary data, such as "dd", "perl", or a compiled C program. That secondary program will of course need to handle any required "waitfor" conditions, as you typically do not want it simply waiting for an EOF condition that will never come.

  Alternatively, if the program allows it, ask it to save binary data directly to a file on disk (which could be a pre-prepared named pipe), or to a separate file descriptor that you provide. Not all programs can do the latter, however.

7/ Handling out-of-band data and errors...

  Some programs may output data, errors, or some type of status report at any time, and this could become mixed with the output you requested. An example is a periodic status update of some background action, sent to a TTY terminal.

  A program can also output unexpected errors and problem reports to standard error, which a PTY handler could then merge into the main data stream, complicating matters further. Worse still, some programs output a status or progress report directly to the TTY, which a PTY handler would again merge in.

  That in turn can complicate the whole situation. You may have to continuously monitor the interactive command for these updates, before, during, and even after you have read the data you specifically requested. It gets worse if this out-of-band data looks similar to the data you requested!

  Worse still, the unexpected data can cause an erroneous 'end-of-data' sequence, or no 'end-of-data' indicator at all (lockup).

  Believe me, this is the bane of interactive program control, and the reason such programs are often flaky and have a bad reputation.

  The only solution is to look not only for your 'end-of-data', but also for these status and error messages, and to identify them as separate from the data you were expecting. Unfortunately, unless you are intimately familiar with a program (and all variants of that program!)
  you are unlikely to be able to predict all such 'exceptions'. So you just collect all the data and parse out error indicators as part of that data, either as a pre-filter or during data collection. As long as the 'end-of-data' indicators are not affected, and the out-of-band data is line buffered, it should not be too difficult.

8/ Timeouts

  When 'end-of-data' problems are also involved, the only common solution is a timeout of some type. The problem with that is "how long do you wait?". Too long, and your program can become very slow when lots of such 'problems' happen. Too short, and you can fail to handle problems caused by slow network connections automatically.

  Another solution, if the application allows it, is to program in some sort of 'heartbeat'. That is, you get the program to output an out-of-band indicator that it is still alive and working, or have it respond to an asynchronous "are you ok?" type signal. This is typically a key sequence that produces a small response of no real importance.

  With shell scripts, you have no simple way to just read all the data that is currently waiting. This is getting better, though, as 'shell select' and non-blocking reads become more available.

  The "sshpass" program, for example, avoids a lot of the complication by handling only TTY-directed IO. It leaves the wrapped "ssh" command's normal stdin/stdout/stderr pipelines connected as normal. This in turn prevents it from mishandling a possible second 'password' request, say from the program that "ssh" ran on the remote machine.

9/ Avoiding Lockups....

  The worst problem with interactive command programming is a lockup. Basically the slave program has finished outputting its data and is waiting for new input, while your controlling program is still waiting for more data that will never come.

  Typically this is caused by the interactive program aborting unexpectedly, or by some unexpected error that means the program never produces the 'end-of-data' response you are waiting for.
  This is another reason why unique 'prompts' make a good solution as EOD and synchronization markers.

  The other form of lockup involves multiple channels of communication. For example, the controlling program is waiting for some expected response from the slave process. Meanwhile the slave process has hit an error and written information about it on the error channel, which is being ignored. As such the expected response never comes.

  Another lockup is similar, but very rare. One of the processes is sending lots of data, but the other process is not reading it. Eventually the communication buffers reach their limits and communication stops, with one program waiting to send and the other failing to read.

  The final type of lockup occurs when the parent does not check for the command dying unexpectedly (child signals), or ignores 'pipe' signals (stream closures).

  Designing the right 'waitfor' data handler is critical to preventing lockups.

Phew... And you thought a co-process would be an easy solution!

Well, in most cases it is. The above are specific things to look for when things go wrong, and should be kept in mind when writing co-processes. In most cases, however, they are not really much of a problem.

-------------------------------------------------------------------------------
Programs that can help with scripted co-processing (not a complete list)

Co-Processor Launchers

  expect
      This is the traditional TCL scripting for running interactive commands. However it is not usually thought of as 'simple' to use, and it does not interact well with more complex scripting (except TCL).

  pty
      A general 'run a program under a PTY' command that is supposed to 'just work'. Last release pty-4.0 (1992). It also provided the code for the original, extremely simple "waitfor" program.

  pty
      Another simple "pty" program, from the Addison-Wesley book
        Advanced Programming in the UNIX Environment, Second Edition,
        by W. Richard Stevens, Stephen A.
        Rago.

  ptyget
      A complete re-write of the pty package, by the same author. Release 0.50.

  empty
      Runs a command in a PTY, connecting it to FIFO named files. Good, but it seems incomplete. See the story behind "empty" at
        Command-line interactive programs in UNIX shell-scripts
        http://www.osnews.com/story/10929

  exec-with-piped.c
      A more primitive "empty", without send/recv/waitfor functions.

  socat
      (see below)

Buffer and TTY control

  unbuffer, condom, nobuf
      Simple scripts, one from each of the first three packages, that provide PTY wrappers for some command. They allow proper line-by-line data pipe-lining of the commands. At least one of these commands is typically vital when creating shell-based co-processing.

  stdbuf
      Similar to the previous PTY commands, but it works by modifying the stream settings (size and mode of the buffer) directly, without wrapping the command in a full PTY. It only affects programs that use the C stdio library and do not set their own buffering. It is part of GNU coreutils, so it is available by default on most Linux machines, even when TCL/Expect is not. It is less intensive than using a full PTY.

  socat
      (see below)

Network Connections and Support

  telnet, mconnect
      Simply connect to a remote network TCP port.

  tcp_client
      A 'telnet' in perl (as an example), without TTY or buffer problems. Multiple versions exist; mail me if interested.

  netcat
      A 'telnet' network connection replacement, providing both client 'connect' and server 'listen' handling for TCP and UDP.

  socat
      A universal network and command connector, with everything you need for any command or network connection. It includes PTY and interactive "readline" wrappers, as well as SSL secure communications, port forwards, and so on. Expect to see this command become more commonly available.

  sshpass
      In many ways "sshpass" is a special 'expect'-like C program that wraps SSH commands, to allow passworded ssh connections from automatic scripts. Of course, using public keys is considered the better alternative. However, the source code for this very small program makes for nice reading.
      The web project page does not really tell you what the program does until you download it and read the manpage. :-(

Multi-Stream Select/Poll

  shell_select.c
      A simple, primitive wrapper around the "select()" system call, to make it available in the shell. Difficult to use and of limited ability.

  shell_select.pl
      A "select()" for the shell, written in perl using the IO::Select module. It allows handling of both stdout and stderr, multiple simultaneous co-processes, or large data flows. (see below)

Miscellaneous

  runscript
      This is actually designed for modem use, but it is feature rich and could be used as a basis for 'expect' or 'waitfor' action handling of a co-process.

  chat
      This is essentially a device controller. It sends specific strings and looks for a specific response. If it does not see that response, it sends a different string to abort and clean up. It is, however, a form of co-process, with the sub-process being a physical device rather than another command.

-------------------------------------------------------------------------------
Co-Processes shell script techniques...

Using co-processes is not generally difficult, at least not until you come across some unexpected problem (see above). The bigger problem is that everyone typically does co-processing in their own way. It depends on the complexity of the interaction, the command or service being used, and the possible errors that need to be dealt with.

Quite a number of techniques have been developed for coprocessing, in scripts or from more advanced languages like C, perl and PHP. Over the years I have used many of them, for one reason or another.

Coprocesses in shells are in some ways easier (no need to deal with the low-level IO library, and a fast debug cycle), and in other ways harder (no direct access to 'select()' and unnamed pipes). However, coprocess programming in the shell is possible.

--- Direct Data Pipeline...
This is basically equivalent to
  UNIX FAQ - Running interactive programs from a shell script
  http://www.faqs.org/faqs/unix-faq/faq/part3/section-9.html
That FAQ, however, only provides a minimal starting point. What follows goes much further.

First, if you do not have the 'feedback' described above, then you do not actually need a co-process. All that is required is a pipeline that feeds the input, as a task completely separate from interpreting the output.

For example, when using "ftp" as a sub-process, you can just use simple 'piped input'. In this case a 'HERE file' of static input will work perfectly fine. The output of the command can then be logged, parsed, or junked as a completely separate stage of the pipeline.

=======8<--------
# download a file from an anonymous ftp server
#   -n  do not attempt auto-login (we send "user" ourselves)
#   -i  turn off prompting during multiple-file transfers
# (add 2>/dev/null to the ftp command to discard its error messages)
ftp -n -i remote.server <<-EOF
user ftp your_email_address
cd /direct/file/is/stored
binary
get filename dest_filename
bye
EOF
=======8<--------

But of course, with a pipe-lined approach you can't make decisions about what to feed into the command after you have started the process. It is a purely 'blind', undirected request that will either succeed or fail, as judged from the parsed output.

Just about all ftp and web server requests are typically performed using this blind, one-connection-per-request method.

So let's look at a more problematic, but simple, command.

--- Timed Data Pipelines...

The password program is designed to interact with a user, not to be driven by a program. If you are the superuser, it will prompt for a new password (twice). Ordinary users get an extra initial prompt for their current password.

Typically a system programmer wants to use "passwd" in a script because they don't want to edit the system password file, with its associated locking and all the possible problems that may cause. One such problem is being left with a UNIX system that has no passwords, and which no one can use.
Also, not all systems record passwords in a password file, so scripting the "passwd" program can make the script more portable.

ASIDE: Yes, I know you should not have passwords on a command line at any time. This is typically unavoidable in a shell script. BUT it does make an excellent and simple example.

However, the "passwd" command only accepts input from a TTY. So a PTY wrapper such as "unbuffer" is required, or you can use the "expect" or "empty" technique.

It also likes to 'junk all input' before prompting the user for a new password. As such you MUST wait long enough for the command to prompt for the new passwords before actually sending them.

For example, this is a typical first attempt. It is a static 'input pipe with pauses' that blindly changes the given user's password (assuming you are the superuser).

=======8<--------
#!/bin/sh
# username is passed as 1st arg, password as 2nd
# ("unbuffer -p" makes unbuffer pass its piped stdin on to the command)
( sleep 5; echo "$2"; sleep 5; echo "$2" ) | unbuffer -p passwd "$1"
=======8<--------

But this is fragile. It will fail if the computer is slow to prompt for the new password, or if some other unexpected error occurs.

It is also very slow with all these pauses in its processing. Imagine using such a script to set the passwords of 1000 new users!

--- Watch output and respond appropriately.

"Expect" is one of the oldest programs designed for co-process handling. It uses both PTYs and WaitFor handling.

For example, here is that same program as an expect script:

=======8<--------
#!/usr/bin/expect
# username is passed as 1st arg, password as 2nd ($argv is indexed from 0)
set password [lindex $argv 1]

# start "passwd" co-process
spawn passwd [lindex $argv 0]

expect "*assword:"
send "$password\r"
expect "*assword:"
send "$password\r"
expect eof
=======8<--------

This runs as fast as possible, because it actually looks for the password prompts that the "passwd" command outputs before sending the requested data.
The 'expect' commands in the above script are more commonly known as "waitfor" strings, as that is the feedback you are 'waiting for'.

The problem with "expect" is that it was written in TCL, a language that not many people understand. There are "expect" versions for other languages, like "PerlExpect" (for 'perl') and "py-expect" (for 'python'). But expect has so many options that it is generally regarded as hard to understand and to use properly. Its examples are not always much help when dealing with specific situations.

Essentially, to use "expect" you have to learn a new language! And who wants to do that? The shell can do the same job with just a little external help.

 - Another alternative is presented in Steve Parker's "Unix / Linux Shell Scripting Tutorial", in a section "Simple Expect Replacement".
     http://steve-parker.org/sh/expect.shtml

   My more generic script is available at
     http://www.ict.griffith.edu.au/anthony/software/shell_expect.sh
   and is used like this...
     shell_expect expect_data log_file | command > log_file

   This is a generic shell script which reads a static list of 'send-expect' operations. The 'send' lines are sent. The 'expect' lines either delay, or wait for specific 'expect strings' to appear in the process's log file.

   Like the previous method, it is essentially just a simple 'chat' script for the co-process: send this, expect that, or abort.

   It is a simple solution for running a co-process with static input and appropriate "waitfor" feedback. It provides a good method of handling a co-process with static input/output data.

   The major drawback of this technique is that it saves the output to a file. Unless the command 'flushes' its output properly, or is wrapped in a PTY, the file may not contain the expected string. Watch out especially if you do further pipeline processing of the data before saving it into the log_file, as every program in the pipeline needs to flush its data continuously rather than buffer it.
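The heart of the log-file approach is a loop that polls the log until the expected string shows up, giving up after a timeout. The following is a minimal bash sketch of that idea, not the actual shell_expect.sh code. "wait_in_log" is a hypothetical name, and the fractional "sleep 0.1" assumes GNU or BSD sleep.

=======8<--------
#!/bin/bash
# wait_in_log PATTERN LOGFILE [TIMEOUT_SECONDS]
# Poll LOGFILE until the fixed string PATTERN appears in it.
# Returns 0 when found, 1 on timeout. (Hypothetical helper, not shell_expect.)
wait_in_log() {
  local pattern=$1 log=$2 timeout=${3:-10} ticks=0
  until grep -qF -- "$pattern" "$log" 2>/dev/null; do
    if (( ticks >= timeout * 10 )); then
      return 1                      # gave up: the string never appeared
    fi
    sleep 0.1                       # fractional sleep: GNU/BSD, not POSIX
    ticks=$(( ticks + 1 ))
  done
  return 0
}

# Demo: a background job stands in for a command that writes a prompt
# (with no trailing newline) to its log file after a short delay.
log=$(mktemp) || exit 1
( sleep 1; printf 'New password: ' ) >"$log" &
if wait_in_log 'assword:' "$log" 5; then
  echo "prompt seen"
else
  echo "timed out"
fi
wait
rm -f "$log"
=======8<--------

This only works if the command actually writes the prompt to the file promptly, which is the drawback noted above. For a stdio-based command, running it as "stdbuf -o0 command >log_file" (unbuffered) may help. Line buffering ("-oL") would not flush a prompt that lacks a newline.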
 - And here is another script that is similar in nature, found on...
     http://grulos.blogspot.com/2006/02/script-expectsh-using-bash-instead-of.html

=======8<--------
#!/bin/bash
#
# expect_list.sh
#
expect=("hi" "white" "in")
reply=("hello" "black" "out")
count=0 word=""

# read one character at a time (spaces and newlines come back empty)
while read -r -n1 char; do
  word="$word$char"
  # restart the partial match if it no longer matches the expected string
  [ "$word" == "${expect[$count]:0:${#word}}" ] || word="$char"
  if [ "$word" == "${expect[$count]}" ]; then
    printf '\n%s\n' "${reply[$count]}"
    (( count++ ))
    [ "$count" -eq "${#expect[@]}" ] && break
    word=""
  fi
done
=======8<--------

Basically each 'expect' string is looked for (without regard for white space or newlines), and when it is seen the equivalent 'reply' string is echoed with a newline. Note that the script does not provide for timeouts, nor does it gracefully handle unexpected events such as EOF.

 - The above methods are typically too simplistic, too restrictive in format, or use a language that is rarely used for anything else (TCL in expect). Of course there are 'expect' equivalents and API libraries in many scripting languages such as Perl, Python, and Ruby, but not much for a simpler shell script.

--- True Shell Co-Processing (Bourne Shell)...

A more shell-centric method is to DIY the co-process in the shell, using FIFO named pipes and a simple "waitfor" C program. For example:

=======8<--------
#!/bin/sh
# username is passed as 1st arg, password as 2nd

# set up "passwd" co-process
mkfifo in.$$ out.$$
# background command ("-p" lets unbuffer pass its stdin on to passwd)
unbuffer -p <in.$$ >out.$$ passwd "$1" &
exec 5>in.$$; rm -f in.$$
exec 4<out.$$; rm -f out.$$

waitfor <&4 'assword:'
echo >&5 "$2"
waitfor <&4 'assword:'
echo >&5 "$2"

# close input pipe (send "end-of-file")
exec 5>&-

# wait for sub-process to exit
wait
=======8<--------

This is probably the most useful co-processing technique in shell scripts, and typically all you need in most cases.

Note that once the named pipes have been opened, the named pipe files themselves are no longer needed, so they can be removed. This saves cleaning them up later. The pipes remain in effect while the file descriptors remain open.
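As "waitfor" and "passwd" are not always at hand, here is a self-contained bash sketch of the same file-descriptor technique that can be run safely. It makes two substitutions. A plain "sh" stands in for the interactive command. An echoed marker string (the hypothetical "__EOD__") replaces prompt matching as the 'end-of-data' indicator, as suggested in the 'End-Of-Data' marker section. The function name "fifo_demo" is also made up.

=======8<--------
#!/bin/bash
# Sketch of FIFO + file descriptor co-processing, with "sh" as the co-process.
fifo_demo() {
  local dir line
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/in" "$dir/out" || return 1

  # Started before the exec lines below, so it does not inherit our fds 4/5.
  sh <"$dir/in" >"$dir/out" 2>&1 &
  exec 5>"$dir/in"      # open the write end first: the child blocks on "in"
  exec 4<"$dir/out"     # then the read end of the command's output
  rm -rf "$dir"         # the names are no longer needed once opened

  # Send a request, followed by a marker that tells us the output is complete.
  printf '%s\n' 'echo hello; echo world' 'echo __EOD__' >&5
  while IFS= read -r line <&4; do
    [ "$line" = "__EOD__" ] && break
    printf 'got: %s\n' "$line"
  done

  exec 5>&- 4<&-        # closing input sends end-of-file, so "sh" exits
  wait
}

fifo_demo
=======8<--------

The order of the two "exec" lines matters. Opening a FIFO blocks until the other end is opened, and the background shell opens its input before its output. Opening "out" first would therefore deadlock.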
The "waitfor" command is a simple program that searches for its argument
in its input, character by character, without waiting for an
'end-of-line' character that may never come. The first "waitfor"
program came from the old pty package (1992). Remember that a prompt
rarely ends in a linefeed, so waiting for a newline character that never
comes is a bad idea. Later I will look at various methods of writing
"waitfor" programs and functions that read and test the program output
(the shell input).

On older UNIX systems you may need to use the command
"/etc/mknod in.$$ p" to create a named pipe. "mkfifo" is the more modern
method, but both do the same thing.

Note that when using exec with file descriptor numbers stored in
variables, you may need to use "eval" first. For example...

=======8<--------
pipe_out="/tmp/command_output.fifo"
PIPE_OUT=8
exec ${PIPE_OUT}<${pipe_out}
=======8<--------

will fail with "bash: exec: 8: not found". Redirections are recognised
before variables are expanded, so the shell sees "8" as the command for
exec to run (with its standard input redirected). Using eval to expand
the variables before the exec works fine...

   eval "exec ${PIPE_OUT}<${pipe_out}"

In bash 4.1 and later, 'exec {PIPE_OUT}<"$pipe_out"' also works: bash
picks an unused file descriptor (10 or above) and stores its number in
$PIPE_OUT.

It can also be useful to know just what file descriptors are currently
in use, so you can avoid one that is already being used. One method is
to list the special directories /dev/fd or /proc/self/fd; another is to
look at the output of "lsof -p $$", but that can be slow.

--- Shell Co-Processing, using internal pipeline...

This example is the same as the last, but uses a shell pipeline to
remove the need for one of the named pipes (the one feeding the
command's input).

=======8<--------
#!/bin/sh
mkfifo cmd_in.$$; exec 2>&1
( exec 4<cmd_in.$$
  waitfor <&4 'assword:'
  echo "$2"
  waitfor <&4 'assword:'
  echo "$2"
  cat <&4 >/dev/null
) | ( pty passwd "$1" >cmd_in.$$ )
rm -f cmd_in.$$
=======8<--------

I personally find this messy, and worse, it places the co-process launch
at the end of the script, rather than at the start where it should be,
to make it obvious what the script is doing.

--- Shell Co-Processing, only using named pipes (no file descriptors)...
This is another example of using FIFO named pipes, but this time I use
the named pipes directly, without using shell file descriptors.

=======8<--------
#!/bin/sh
mkfifo out.fifo in.fifo
unbuffer <in.fifo >out.fifo passwd "$1" &   # background command
cat out.fifo > out.fifo &   # prevent echos to fifo sending EOF
cat > in.fifo &             # That is keep the FIFO pipelines open
input_pid=$!                # Note the input so we can 'close' it

# watch the output and send info
waitfor <out.fifo 'assword:'
echo >in.fifo "$2"

# watch the output and send info
waitfor <out.fifo 'assword:'
echo >in.fifo "$2"

# clean up and close connections
rm out.fifo in.fifo
kill $input_pid
wait
=======8<--------

The two 'cat' background processes are used to 'hold' the named pipes
open, so as to prevent individual 'echo's and 'read's from closing the
pipe. This means sub-programs do not depend on file descriptors that
they may not have access to, but instead can send and receive directly
using the FIFO named pipes. That is to say, multiple separate programs
could be using the same named pipes (though that opens a whole other
level of problems).

More importantly, sub-programs called by your control script do not
need to rely on open file descriptors being passed on through the
'exec' of the sub-program. That can be a particular problem when a
'close-on-exec' flag has been set on the file descriptor, a problem I
have encountered in perl scripts. Bash generally does not use
'close-on-exec', except for its "coproc" file descriptors, which I
thought a rather silly thing for it to do.

Using permanently open named pipes works, but I hate leaving named pipes
around longer than I need to. Also I keep feeling that each 'echo'
command I use will send an EOF to the command, as that is the normal
behaviour. And if something goes wrong you are left with background
'cat's preventing the co-process from exiting.

The only advantage I can see for it is for multiple separate invocations
of some command occasionally talking to the one background process, a
bit like the use of "screen" to hold TTY sessions open over separate
user logins.
But you would have to watch out for two such commands executing at the
same time, or you will get problems. In summary, it could easily be left
in a very funny state, and I regard it as rather a bad programming
technique.

- The "exec_with_piped" launcher

This was an advancement of this technique of co-process communication.
See the old 14 November 1997 package "pipe_scripting.sh"
   http://okmij.org/ftp/Communications.html#sh-agents

=======8<--------
/etc/mknod FIFO-PIPE p
exec_with_piped FIFO-PIPE "ssh remote Mathematica" &
echo "Mathematica-command-1" > FIFO-PIPE
... see the result on the screen ...
echo "Mathematica-command-2" > FIFO-PIPE
... see the result on the screen ...
echo "Quit" > FIFO-PIPE
=======8<--------

Note that this is basically a one-directional named pipe handler (not a
co-process) that launches a background process to do the same task as
the background "cat" in the previous example. The code is simple, and
makes for interesting reading before going on to the next true
co-processing helper using named pipes, "empty".

- Using "empty"...

The "empty" program is designed to handle things using named pipes just
as described above. However it does all the work, and hides all the
details of the named pipes, so as to simplify scripts enormously, and
it also provides other tools such as a "waitfor".

You can download it from http://empty.sourceforge.net/

When "empty" launches the co-process (using -f), it creates the named
pipes and also a background FIFO-holding daemon.

=======8<--------
#!/bin/sh
#
# Changing passwords using "empty" coprocess
#
empty -f passwd "$1"

# waitfor prompt (for 5 seconds) and send response
empty -w -v -t 5 "assword:"
empty -s "$2\n"

# waitfor prompt (for 5 seconds) and send response
empty -w -v -t 5 "assword:"
empty -s "$2\n"

# at this point process should be finished.
# but let's be sure!
empty -k
=======8<--------

You do not actually need to specify the FIFO files to use (the example
above does not), but it is better to provide the FIFO filenames you
want to use, as "empty" then does not need to search for them (in /tmp)
every time it is run.

Also the "empty" background daemon automatically cleans things up if
the controlling process dies or the co-process exits, making what is
otherwise a very dirty hack a rather clean co-processing technique.

However I would not like to use "empty" to send passwords (as above),
as being an external program (rather than a shell built-in like
"echo"), the password arguments will appear in the process ("ps")
listing!

It also does not provide a way to 'pipe' data into "empty" (and thus
the hidden named pipes). Its 'waitfor' system is, to my mind, inadequate
and limited, as it only allows for a single 'expect' string.

Also "empty" does not seem to have any way of specifically closing (EOF)
the command input pipe without also closing the command output pipe,
meaning you cannot get the final bit of data from the command, unless
you can 'quit' that command in some other way.

It is a nifty technique, but "empty" could really use a major expansion
to allow it to handle a broader range of problems, including data
piping.

Note that all the abilities that "empty" provides can be achieved by a
BASH script and functions. One example of something similar was found
in...
   http://www.technetra.com/2009/04/26/discovering-web-access-latencies-using-bash-coprocessing/
a copy of which, with my own notes, is at...
   http://www.ict.griffith.edu.au/anthony/info/shell/co-processes.example_2

--- BASH "coproc", using file descriptors only...

The bash "coproc" built-in can make the launch of a co-process much
easier.
=======8<--------
#!/bin/bash
# username is passed as 1st arg, password as 2nd

# set up "passwd" co-process (unnamed, so bash uses COPROC and COPROC_PID)
coproc { unbuffer passwd "$1"; }
echo "co-process \"passwd\" started on $COPROC_PID"
echo "command input is sent to file descriptor ${COPROC[1]}"
echo "and output is read from file descriptor $COPROC"

# watch the output and send info
waitfor <&$COPROC 'assword:'
echo >&${COPROC[1]} "$2"

# watch the output and send info
waitfor <&$COPROC 'assword:'
echo >&${COPROC[1]} "$2"

# close input pipe (send "end-of-file")
# (eval is needed when the file descriptor is held in a variable)
eval "exec ${COPROC[1]}>&-"

# wait for co-process to exit
wait $COPROC_PID
=======8<--------

This is almost identical to the original FIFO with file descriptors
technique recommended above. There are just no named pipe files
involved, and you do not have to worry about clashing with an existing,
already opened file descriptor.

However, while the launch is clean and neat, the use of a file
descriptor array is rather messy and at times hard to follow. So to make
it more readable you can copy the values into better named variables.

=======8<--------
#!/bin/bash
# username is passed as 1st arg, password as 2nd

# set up "passwd" co-process
coproc p_out { unbuffer passwd "$1"; }
#p_out=${p_out[0]}   # output from command - this is already the case
p_in=${p_out[1]}     # input into command
p_pid=$p_out_PID     # the command's Process ID (for wait and kill)

# watch the output and send info
waitfor <&$p_out 'assword:'
echo >&$p_in "$2"

# watch the output and send info
waitfor <&$p_out 'assword:'
echo >&$p_in "$2"

# close input pipe (send "end-of-file")
eval "exec $p_in>&-"

# wait for co-process to exit
wait $p_pid
=======8<--------

In my tests, the file descriptors used by BASH "coproc" start at 60,
which may be a problem when using the primitive C "shell_select" program
(see below), though a "perl" alternative is also provided that can
handle them. Also, the file descriptors opened by "coproc" are set
"close-on-exec".
This means you cannot pass them to another program to read, write,
waitfor or select on, except by redirecting them onto that program's
standard input or output (as with "waitfor <&$COPROC" above). You can
fix that problem by moving the file descriptors to another number, but
that sort of defeats the whole purpose of using "coproc" in the first
place.

The biggest gotcha with BASH "coproc" is that you can only have one such
process at a time, even if you specifically name it. If you try, you
will get an error...

   warning: execute_coproc: coproc [????] still exists

I would assume that if users demand it from the BASH developers, it
will likely be fixed quickly. All in all, I think it is still better to
DIY it using temporary named pipes, which do not have these problems.

-------------------------------------------------------------------------------
Multiple Output Streams (stdout and stderr)...

As is typical, in most of the above techniques only the STDIN and STDOUT
of the co-process are connected. STDERR is left to output to the same
place as the rest of your program, hopefully into some log that can be
checked when something goes wrong.

However in most of the examples you can simply roll STDERR into the
command's STDOUT, and deal with it as per the rest of the command's
output.

   coproc unbuffer passwd "$1" 2>&1

However merging errors and output may not be convenient, especially if
you want to monitor each output stream as a separate entity, for example
to keep errors and/or progress indicators from 'spoiling' the main data
feed.

When you have two sets of output, and both output lots of data, you can
also end up in an 'output lockup', where you are waiting for normal
output, but the co-process is waiting to output more errors because the
pipe buffer for its errors is full. Or vice-versa.

Basically when two or more output streams are involved (or even multiple
co-processes), you cannot afford to read from only one stream, but have
to be able to read from any stream when data is available.

This basically devolves down to the question...
   When is data available to read?

Closely related to this is...

   When is it safe to feed more data to the co-process?

In both cases it really only becomes important when lots of data is
being sent (such as large files). It is of particular importance when
the co-process involves databases, large file transfers, or image
processing.

There are a number of solutions to this, but essentially they boil down
to two methods: polling (non-blocking reads), and IO wait (the 'select'
system call).

The 'select' method is typically the best solution, but is rarely used
in scripting. It is also a system call that no shell I have seen
actually provides to its users, so you will need to rely on some
external program to make this call.

I only found one 'C' program which provides a form of 'select' for shell
scripts, the source of which is no longer generally available. This
primitive "shell_select" program is given a list of file descriptors,
and returns which descriptors have data to read, or a non-full buffer
to write to.

When I say primitive, that is exactly what I mean. The program was
limited to just the first 32 file descriptors (0 to 31, on a 32 bit
machine), and returned its results as a fairly useless bitmask, from
which shells find it difficult to extract the individual file
descriptors that are ready for handling.

WARNING: this limitation makes the C program version useless with a
bash "coproc". Not that "coproc" directly allows the separation of
stdout and stderr, or other file descriptors, anyway.

Later I wrote a version using the perl "IO::Select" module, and it has
proved to work very well. You can download it from
   http://www.ict.griffith.edu.au/~anthony/software/shell_select.pl

The command is 'eval'ed and sets the variables "$rd_ready" and
"$wr_ready" to a comma separated list of the given file descriptors that
are ready. A "-t" option also allows you to specify a timeout (or a
poll, with "-t 0") of the file descriptors.

--- Co-Processing with both stdout and stderr...
This example uses a "shell_select" program (either the C or perl
version).

It is a wrapper around the "bc" program, to calculate mathematical
expressions the user provides, but it handles stdout, stderr, and a
timeout of the results in different ways...

=======8<--------
#!/bin/bash
#
# a "bc" calculator with 8 decimal places and error handling
#
# set up co-process with both stdout and stderr
mkfifo /tmp/bc_in.$$ /tmp/bc_out.$$ /tmp/bc_err.$$
bc </tmp/bc_in.$$ >/tmp/bc_out.$$ 2>/tmp/bc_err.$$ &
bc_pid=$!
bc_in=10 bc_out=11 bc_err=12
eval "exec $bc_in>/tmp/bc_in.$$"
eval "exec $bc_out</tmp/bc_out.$$"
eval "exec $bc_err</tmp/bc_err.$$"
rm -f /tmp/bc_in.$$ /tmp/bc_out.$$ /tmp/bc_err.$$

while read -r -p "> " user_input; do        # read request from user

  # send user input to "bc", with 8 decimal places
  echo >&$bc_in "scale=8; $user_input"

  # wait for any output or timeout
  eval `shell_select.pl -t 1 -r $bc_out,$bc_err`     # wait/timeout
  if [ -z "$rd_ready" ]; then               # no "bc" output returned?
    if ! kill -0 $bc_pid 2>/dev/null; then  # did the bc co-process die?
      wait $bc_pid
      bc_exit=$?
      echo >&2 "EXIT: bc program exited -- goodbye"
      exit $bc_exit
    fi
    echo >&2 "TIMEOUT: no results for expression"
    continue
  fi

  # handle results (stdout) or errors (stderr) without lockup
  while true; do
    case ",$rd_ready," in
      *,$bc_out,*) read -r <&$bc_out result; echo "Result => $result" ;;
      *,$bc_err,*) read -r <&$bc_err error;  echo "ERROR  => $error" ;;
      *) break ;;   # no file descriptor is ready - end of output
    esac
    eval `shell_select.pl -t 0 -r $bc_out,$bc_err`   # poll for more output
  done
done
=======8<--------

Try running the above program and feeding it expressions like...

   a = 3
   10 / a
   some error
   print "some output\n"

This will handle both normal output results (from stdout) and errors
(from stderr), as well as time out if "bc" takes more than a second to
do its calculations, or simply exits, normally or unexpectedly.

It could have been done using polled reads, but only at the cost of
looping constantly and using up lots of CPU cycles, or of using sleeps,
producing a lack of responsiveness and longer delays in reacting to
unusual events.
WARNING: The above can still fail! For example, if you did not put a
"\n" in that "bc" print test command. Can you see why? How can you fix
that?

The real point of using a helper program like "shell_select.pl" is that
you could use it to handle simultaneous output from multiple
co-processes, or volumes of data far larger than the named pipe buffer
size (whatever that may be; on Linux it is typically 64 KiB).

The above also demonstrates a type of "waitfor" handler, which could
have been turned into a function or separate program. And that is the
next topic.

It is the use of a simple line read, rather than a true "waitfor", that
was the cause of the program's downfall. When no 'newline' is returned,
the shell "read" locks up without any form of timeout. This of course
brings us to creating a proper "waitfor" program, script or function.

-------------------------------------------------------------------------------
WaitFor Handling

The "waitfor" program is something that will look for specific strings,
such as a "prompt" or a "password" request, before returning.

Better still, it should allow you to give a list of possible "strings"
and return which "string" matched, as well as handle EOF and return a
timeout result. It should also be able to report the text leading up to
that string.

It may also be able to send fixed responses for specific "strings",
though this is not always desirable. It is often better to have the
response handled by the calling script rather than by the "waitfor"
function itself.

A C program form of "waitfor" used to be available at...
   ftp://stealth.acf.nyu.edu/pub/flat/misc-waitfor.c
The original ultra simple version was from the pty package (published
1992).

Basically "waitfor" provides the main core 'expect' function of the
"expect" package, and just how complex it is depends on just what you
need to handle.

Here is a shell script called "expect.sh" that does a simple
"waitfor-respond" type task. Note that it assumes named pipes are used
for the IO.
=======8<--------
#!/bin/sh
#
# expect.sh "search" "response"
#
# The named pipes out.fifo and in.fifo connect to the
# interactive program, and should be previously setup
# This has no timeout, or error condition handling.
#
while :; do
  # read whatever is available (up to one 512 byte block)
  dd if=out.fifo bs=1b count=1 2>/dev/null | grep "$1"
  if [ $? -eq 0 ]; then
    echo "$2" > in.fifo     # Match found, send response
    exit 0
  fi
  # Match not found, continue to search.
done
=======8<--------

This is designed using original Bourne Shell constructs only, so it is
portable to very old and odd UNIX systems. Note the use of 'dd' to read
whatever data is available (up to one 512 byte block, as "bs=1b" means),
rather than waiting for an end-of-line. Be aware that each block is
searched separately, so a search string split across two reads will be
missed.

An improvement to the above is to use the BASH 'read' built-in to
terminate each read on the last character of the requested response
string. For example, this defines a function to wait for a string, by
reading until it sees the last character of that string.

=======8<--------
waitfor() {
  local string="$1" lastchar new_output output=""
  lastchar=${string: -1}
  while [ "X${output: -${#string}}" != "X$string" ]; do
    # read up to (and discard) the next copy of the last character;
    # give up with status 1 on EOF, rather than looping forever
    IFS= read -r -d "$lastchar" new_output || return 1
    output="$output$new_output$lastchar"
  done
}
# ...
waitfor 'assword:'
=======8<--------

This is faster than the previous example, as it does not fork 'dd' and
'grep' for every read, but it only works when looking for a single
response (or EOF). However if you are only waiting for an expected
prompt or a fixed end-of-data marker, the above will work fine.

FUTURE EXPANSION
   handling multiple possible responses
   fixed response, versus programmed response
   timeouts and EOF handling
   multiple streams...

--- Bash Read

read -r -u $pipe_out line
   Blocking read until a full line has been read into $line.
   Status: 0 for success, 1 for EOF.
   WARNING: a final incomplete line (no trailing newline) is still
   stored in $line, but the status is 1, so a "while read" loop will
   silently drop it.
read -r -u $pipe_out -N 1 char
   Blocking read of a single character (newlines included).
   Status: 0 for success, 1 for EOF.

read -r -u $pipe_out -d 't' string
   Blocking read until the character 't' has been read (the 't' is
   discarded). Useful for searching for a specific string without
   regard for newlines (see the example above).

read -r -t 0 -u $pipe_out
   A timeout of 0 does not actually read anything, but simply polls the
   input to see if there is something to read. An exit status of 0 means
   yes, and 1 means no. Partial lines and EOF handling are as yet
   unclear.

read -r -t 0.00001 -u $pipe_out line
   This is a non-blocking read that times out very quickly. If a line
   was read you get an exit status of 0 (okay). However it only returns
   success if a full line was available. If a full line was not
   available you get a weird exit code of 142! This happens even for an
   incomplete line followed by EOF.

   The 142 is 128 + 14, where 14 is the signal number of SIGALRM; it is
   how bash reports that the read timed out (the bash manual only
   promises a status greater than 128 on timeout).

   When you see this status, you can then read the output one character
   at a time using -N 1, until all output is read.

===============================================================================
Miscellaneous and specific notes and examples.

-------------------------------------------------------------------------------
Using expect as a PTY wrapper only...

Basically this wraps a command to allow piped, line based input and
output, for commands that use /dev/tty for password reading. This is
actually an old version of the 'unbuffer' expect script.

=======8<--------
#!/bin/sh
#
# Description: unbuffer stdout/stderr of a program
# Author: Don Libes, NIST
#
# Option: -p  allow run program to read from stdin (for simplification)
#
# Note that expect can 'continue' a comment line, so the following re-runs
# this as an expect command regardless of its PATH location.
# \
exec expect -- "$0" ${1+"$@"}

if {[string compare [lindex $argv 0] "-p"] == 0} {
    # pipeline
    set stty_init "-echo"
    eval spawn -noecho [lrange $argv 1 end]
    interact
} else {
    set stty_init "-opost"
    eval spawn -noecho $argv
    set timeout -1
    expect
}
=======8<--------

Note that the "unbuffer" command is installed as part of the expect
package. It is provided here because I actually needed to re-create the
above as a Perl Expect PTY wrapper for a co-process in a perl script
(for passing passwords to programs).

-------------------------------------------------------------------------------
GNU Awk (gawk) 3.1 can do co-processing too
   http://oreilly.com/catalog/awkprog3/chapter/ch10.html

   BEGIN {
     Command = "subprogram args"
     do {
       print data |& Command
       Command |& getline results
     } while (some condition)
     close(Command, "to")
     close(Command)
   }

As is typical of awk, the command string must be exactly the same in
all the co-process calls. Output is flushed automatically, but the
sub-program may need to flush its own output by some means.

If the command is /inet/protocol/local-port/remote-host/remote-port then
the bidirectional pipeline is a network stream. Set local-port to '0'
when you want the system to just pick a port. The remote port can be a
number or a service port name. For example...

   BEGIN {
     Service = "/inet/tcp/0/localhost/daytime"
     Service |& getline
     print $0
     close(Service)
   }

For more information see "info gawkinet".

===============================================================================
Special Command Notes...

Telnet
   This is generally used as a 'network connection' program, but it has
   problems. First, it does not read 'piped' input (especially on Sun
   Microsystems computers). Second, some versions flush their input
   after they have connected, meaning you have to delay input until
   after that point.

Better solutions...

mconnect
   (under Solaris) is an unbuffered stdin/stdout telnet program.

tcp_client
   Similar, but a bi-directional perl script (lots of variants).
netcat or nc
   can handle tcp or udp, on both the client and server (listen) side.

The newest of this series is...

socat
   Described as "netcat" on steroids. Basically it is a 'universal'
   pipeline connector. See my notes in
      http://www.ict.griffith.edu.au/~anthony/info/apps/socat.txt
   and its home page
      http://www.dest-unreach.org/socat/

Of course these only handle the communications; you will still need
appropriate "waitfor" type handling to make it all work.

-------------------------------------------------------------------------------
Editor Controls for this Document:
vim:set tw=72:
-------------------------------------------------------------------------------