-------------------------------------------------------------------------------
                        Co-Process Programming
                -- Interactive Commands in Shell Scripts

      By Anthony Thyssen          Last Major Update 28 September 2011

This article is about the co-processing of software commands, services, or
daemons.  It is NOT an article about hardware co-processors, multi-CPUs, FPUs,
or even GPUs, which also come under the heading of co-processing.

A co-process is defined as two (or more) processes interacting with each other
using bi-directional communications.  This is very different to simply piping
data into or out of a program, as the output from the program can affect the
later input sent to that same program, which is not possible using the normal
pipeline mechanism.

We specifically focus on using shell scripts to run and communicate with
another long-term background process, service, or interactive program.
Typically the shell script is the master process, controlling a more
specialised background slave process.  I find a shell script is preferred as
there is typically a LOT of trial and error, and constant adjustment and
tweaking of the process, which is well suited to shell programming.

Part of this has been copied to
  https://bash.cyberciti.biz/guide/Co-Processing_-_controlling_other_programs
a tutorial into which it seems to fit very nicely.

-------------------------------------------------------------------------------
Introduction to using a Co-Process

Programs sometimes have to control other programs that require or enforce some
type of interaction; ones that typically assume a user will be doing the
typing, and not another program or file stream.  Such programs are generally
pre-existing CLI commands that do jobs far too complex to use in a simple
manner.  Examples include: running commands on remote servers, making general
database requests, programs that deal with complex data, controlling devices,
and communicating with network daemons.

So you end up having some 'command' that you want to run, under the control of
a 'wrapping' shell script, telling it what to do for some specific, highly
complex and involved task.

Controlling another program can be tricky.  Not only do you need to send
information to that command, but you also need to get the command's results
and make use of them.  You may even need to modify your future communications
based on previous results, and with that loop, co-processing is required.
Part of that loop may only be the need to wait for prompts, watch for and
handle errors, end-of-data, end-of-file, timeout if the unexpected happens, or
deal with other conditions that could make such interactions 'flaky', or
result in a 'lockup' condition.

-------------------------------------------------------------------------------
What is a co-process?

Generally speaking it is any process that your program interacts with, with
some form of feedback.  It may be a background process running on the same
machine, a complex command requiring user interaction, or even some form of
network service on another machine.

What is Two-Stage Programming?

It is a fancy way of saying: writing programs to control other programs.  It
is the term often used in theoretical computer science papers, where
"co-processing" is typically the programming term for the same thing.

This includes things like
  * Communicating with databases (mysql, oracle, etc)
  * Running backup programs and reacting appropriately (dump, tar)
  * Running programs that may produce many different results.
  * Using control interfaces (scientific instruments, Arduino)
  * Processing data too complex for shells to deal with simply (images, video)
  * Changing passwords using interactive commands.

Network Server Communications...

The act of talking to a very complex network server, such as an NNTP usenet
news server, or a local FreeNet Node (which I have worked on in the past), is
also practically identical in complexity to talking with a background command.

A network server however is typically better defined and documented than most
CLI interfaces to a library.  It usually defines exactly how results and data
are returned, typically in the form of an RFC (internet Request For Comments),
and does so in a form that is easily parsed and handled.

This includes
  * Remote control of a terminal session (via SSH, RSH, or telnet)
  * Retrieving files from some file service (world-wide-web, FTP)
  * Uploading files to some service (Flickr, Dropbox, Twitter)
  * Talking to some network server (NNTP news server, SMTP mail server)
and so on.

Modem 'Chat' scripts...

This is also exactly the same problem: setting up a modem initialization,
dialling, and automatic login using old modem connection methods.  Typically,
after being configured and logged-in, the 'chat' script will pass the pipeline
to a terminal program so a user can assume direct control.  So if you are old
enough to have dealt with modems, you have already done co-processing!

-------------------------------------------------------------------------------
A more detailed explanation...

Remember, co-processing is really about communicating with and controlling
other interactive programs.  They may have been launched by you with a
communications channel (pipeline), or already running, in which case you
connect to them via IPC or other networking protocols.  All the above things
are essentially the same.

The critical aspect is that it is a two way communication, with FEEDBACK.
That is, you are not just doing...

  ───▶ Format_Requests ───▶ 'Run Command Once' ───▶ Parse_Results ───▶

which is just a simple 'pipelined' programming technique.  Or re-opening the
connection again later to make further requests.

  ───▶ Format_Requests ───▶ 'Run Command Once' ───▶ Parse_Results ───▶
       Make_Decision   ───▶ 'Run New Command'  ───▶ Parse_Results ───▶
       Make_Decision   ───▶ 'Run New Command'  ───▶ Parse_Results

True co-processing more often includes some type of data feedback to the
continually running command or pipeline, without closing and re-opening it.

  ───▶ Format_Requests ───▶ 'Background Process' ───▶ Parse_Results ───▶
              ▲                                             │
              ╰─────────◀──────── feedback ◀────────◀───────╯

That is, the results will affect the future requests that are made.  Either
simply for timing when those requests can be sent (synchronization), or
modifying the request based on the data returned (feedback).  It is the
feedback to the same command/pipeline that makes it a true co-process.

A simple example of such a co-process is a web page download script, which not
only downloads web pages, but may also need to log in to access the web page
that is wanted, all within the same connected session.  It may need to go
through multiple pages to find a specific or latest bit of information or
file, with the additional complexity of cookies and input forms, to actually
get the data you want in an automated way.  It can get complex!
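As a rough shell-level illustration of that feedback loop (jumping ahead to
the BASH "coproc" builtin covered later in this article, and using "bc" simply
as a convenient stand-in for any long-running co-process), the result of one
request can be fed back into the next request:

=======8<--------
# A minimal sketch only: "bc" stands in for any long-running co-process.
coproc BC { bc -l; }

echo >&${BC[1]} "2 + 3"           # first request
read -r answer <&${BC[0]}         # read its result

echo >&${BC[1]} "$answer * 10"    # feedback: next request uses that result
read -r answer <&${BC[0]}
echo "final answer: $answer"

eval "exec ${BC[1]}>&-"           # close input; "bc" exits on end-of-file
wait
=======8<--------

The point is not the arithmetic, but that the second request could not even be
formatted until the first result had been read back.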
Of course, since web page downloading is such a common task, and web
interactions are so well known, many programs ("wget", "curl", etc) have been
written to remove the need for most feedback loops, breaking the job up into
simpler, individual, and separate program calls.  In other words such commands
convert the basic interactions into a linear sequence of completely separate
command calls.

-------------------------------------------------------------------------------
Other guides to co-processing...

  Command-line interactive programs in UNIX shell-scripts
  Development of the "empty" program (see below)
    http://www.osnews.com/story/10929

  Co-Processes in PHP - using proc_open()
    http://php.net/manual/en/function.proc-open.php

  Co-processes in Python - using the subprocess module...
    http://www.doughellmann.com/PyMOTW/subprocess/

-------------------------------------------------------------------------------
Why use a co-process?  Why not a more direct method like an API?

Basic reasons to use a co-process...

  * Convenience

    A provided command may do a very complex task that would take a huge
    effort to re-engineer into a form that you need.  The command is mostly
    right, or just needs to be merged into a larger task.  Such commands will
    often remove the need for doing much of the co-processing complexity.

    For example these are all programs that 'simplify' things...
      + web page download (using "wget" or "curl"),
      + running remote commands without compromising security
        (using "ssh", "rsync", "unison", and related programs),
      + encrypting/archiving files, without requiring expert knowledge,
      + "bc" and "dc" for complex mathematical calculations.
    If you are writing such commands, then you are the one who needs to deal
    with the co-processing issues on behalf of your users.

  * Network Interface Only

    While communicating with a network or daemon service is not strictly a
    command, it is still co-processing.  The network interface can in some
    cases be the only way to control or communicate with that service.  This
    can include: direct access to "SMTP", "POP", and "IMAP" mail services,
    "nntp" usenet downloading, "freenet" node control, even uploading files to
    Flickr or Dropbox.  Basically the 'co-process' may in fact be entirely on
    a different machine, and you talk to it through file streams, just as you
    would to a 'local sub-process'.

  * Automation and Monitoring

    If you want to have a program regularly access, check, or monitor some
    activity, a command line or network connection may be the only practical
    way to do it.  If some feedback for login to a remote service is needed,
    co-processing may also be required to pass usernames, passwords, MFA, and
    setup information at the appropriate times without timing problems.

  * Persistence of State

    Many programs and libraries have a huge setup time, and state.  In such a
    situation it can be far better to run the program once, in the background,
    and feed it tasks to do, rather than running multiple individual commands
    for each step.

    The state (or complex data) may also have a high cost in terms of reading,
    parsing, formatting, and writing that data, which is best avoided simply
    by only running a command once, rather than multiple times.  Also you may
    get more exact results simply by keeping that data in memory.

    Examples of state include
      + Network connection, which can take some time to set up.  (smtp)
      + Current directory, or location you are currently working on.  (ftp)
      + Intermediate data, saved variables, and user functions.  (mysql)
      + Security concerns (password access, write access, encrypted data).

  * Control and Looping

    You may need to decide what the next step is, based on the data received
    from previous calls, or depending on user options.  This is generally
    impossible to do in a strict "input | command | output" data pipeline.
    With co-processes you can make use of the shell/perl script's own IF-THEN
    and LOOPING constructs to do much more complex tasks that may be difficult
    to do otherwise.  Even just converting user commands into 'optional'
    actions can often be handled more easily using a co-process technique.

  * Handling of Errors, or special conditions

    A strict pipeline of operations just cannot modify later actions when some
    unusual condition develops.  This is especially useful for working around
    known problems or special conditions.  In many cases this is actually the
    whole point of creating a wrapper around some existing program in the
    first place.

  * Multiple Instances

    You can have multiple co-processes, not only with the same 'command' but
    with a number of different commands and services.  You can then have data
    from one modify the other.  I don't mean just 'piping' data, but modifying
    things, or depending on data from both the source and the feedback from
    the destination service.  One typical example is creating a graphical user
    interface wrapper for a command, or a program to convert data found in one
    database and store it in another completely different database with a
    completely different structure.

  * Fewer Version Changes or System Dependencies over time

    A provided command wrapping a library API does not change as often as the
    underlying methods and techniques.  If some major API change does occur,
    the command interface often absorbs that change so as to affect users of
    that command less.  This can make a script using a Command Line Interface
    (CLI) far more portable, especially between different operating systems
    such as Linux, Solaris, and Mac OS X, than a program compiled to use an
    API directly.  Also, when something does change, a scripted co-process is
    often simpler to modify than the more direct API program.

    Examples of CLI wrappers to complex APIs include: "sqlplus" for oracle and
    other databases, "openssl" for encrypted files and secure communications,
    "gpg" for mail and file encryption, "imagemagick" for image manipulation.
    And probably many others.

-------------------------------------------------------------------------------
Problems with co-processes...

These are the typical problems people face when dealing with a co-process.

1/ TTY Input problem

  Reading directly from the TTY device, rather than standard input.
  For example: mount commands, SSH, encryption passwords, etc...

  This is typically done for security reasons, such as turning off echo while
  the user is typing the password, or just to make it harder for a novice user
  to abuse the command in simple 'password cracking' attempts.  Or simply so
  it can read passwords, while also reading standard input for data.

  The solution is typically to wrap the command in some type of PTY, allowing
  you to pipe in your own automated 'keyboard' input.  Examples of programs
  that only do this include "unbuffer", "condom", and "nobuf".  Packages that
  provide the same service include "pty", "expect", "empty", and "socat" (see
  below).  Another program that was designed specifically to handle this
  problem with SSH is "sshpass".  See commentary below.  An example of what is
  needed is in the "socat" examples page.
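  For instance, a rough sketch along the lines of those socat examples (this
  is only a hedged sketch, not tested against any particular "passwd"
  implementation; "someuser" and the password are placeholders):

=======8<--------
# Give "passwd" a controlling pseudo-terminal (pty,setsid,ctty) so its
# TTY-only prompts can be answered from a pipe.  The sleeps are a crude way
# of waiting for the prompts (see "Timed Data Pipelines" below).
{ sleep 2; echo 'New_Pass_123'; sleep 2; echo 'New_Pass_123'; } |
    socat STDIO EXEC:'passwd someuser',pty,setsid,ctty
=======8<--------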
  Some commands (like "openssl") can instead read passwords from a
  user-defined file stream, passed to the command by the shell.

2/ Input Junked problem

  Some commands will simply read and junk all old input before prompting for
  new input, as a security precaution.  This prevents user type-ahead from
  being used, especially when network delays make that data irrelevant.

  A typical example is when prompting for a password.  The programs junk all
  previous input before prompting for a password, to assure you enter a
  password and not other delayed input.  Both "ssh" and "passwd" do this just
  before prompting for a new password.  Also the oldest versions of "telnet"
  did this just after connecting to a remote system.  Some newer versions of
  these commands have this 'feature' removed, to allow easier pipe-lining.

  In that case you either need to delay, or specifically wait for the prompt
  to be received, before sending the necessary data.  This is probably what
  you should be doing anyway.  Note: prompts may not have a newline at the
  end!  (see below)

3/ Input being Read, before it is needed

  That is, some programs will read input when they don't yet need to.  For
  example 'ssh' will often gobble up the input even if the command on the
  remote system does not need it yet.  For example

    echo "one" | ( ssh localhost printf 'zero\\n' ; cat )
    zero
    echo "one" | ( ssh -n localhost printf 'zero\\n' ; cat )
    zero
    one

  For this example "ssh" needs a '-n' option to tell it not to read input that
  it is NOT going to use.  That is, the input was actually meant for the "cat"
  command and not "ssh".  This may not be intentional by the program, but may
  be the result of overzealous buffering by the library (see the "head"
  example next).

4/ Buffering Problem

  You may need to stop the system STDIO library from 'block buffering' the
  output.  That is, you don't want it collecting data until its buffer is full
  before sending, but rather sending the data either immediately, or whenever
  it gets a newline.

  One way to do this is to run the command in a PTY (a fake pseudo-terminal)
  so it thinks it is running on a normal interactive terminal.  In that case
  the system will buffer data only until it gets a linefeed, or, if the fake
  terminal is put into raw mode, deliver it unbuffered for immediate delivery.

  A great guide to the STDIO library buffering problems is provided by...
    http://www.pixelbeat.org/programming/stdio_buffering/

  Examples of such solutions include...
    + "condom" (or "pty -0") of the PTY package.
    + "nobuf" script of the replacement "ptyget" package.
    + "unbuffer" of the "expect" package (standard on most linux).

  The "stdbuf" of the GNU coreutils package (standard on most linux and later
  solaris systems) instead uses special I/O control to avoid needing a PTY,
  and is the recommended general solution.  However it will not work with some
  commands that use lower level I/O handling, for example 'cat'.

  Input Example...

  Here more input was read into a buffer by the 'sed' than the program needed,
  thus the 'cat' did not see the second line.

    printf "one\ntwo\n" | ( sed 1q ; cat )
    one

  And here 'sed' will read only one line before aborting, giving the second
  line to 'cat' to output.

    printf "one\ntwo\n" | ( stdbuf -i0 sed 1q ; cat )
    one
    two

  Another example...

    $ coproc S { while read line; do echo "$line"; echo "Cool, huh?"; done; }
    $ echo "Test" >&${S[1]}
    $ head -1 <&${S[0]}
    Test
    $ echo "Test2" >&${S[1]}
    $ head -1 <&${S[0]}
    Test2
    # kill $!
That second "head" should have read and output "Cool, huh?", but the first "head" read a large buffer and threw away that line, so it was not available for the second "head" to read. Output Example... This is especially relevant when doing a grep/awk/sed of the output that has a real time component to them. For example log monitoring... tail -f /var/log/messages | grep 'error of interest' Both the tail and the grep will buffer the lines until the buffer is full before printing, so the results will not be seen as they happen. Using 'stdbuf -oL' to buffer by line tail -f /var/log/messages | stdbuf -oL grep 'error of interest' Modern greps have a option to line buffer tail -f /var/log/messages | grep --line-buffered 'error of interest' Using socat on the input side socat EXEC:'tail -f /var/log/messages',pty,ctty,end-close STDIO | grep 'error of interest' Note that many modern versions of 'cat', 'grep', 'sed' have 'line-by-line' or 'unbuffered' options added for this very reason, causing the program to forcefully flush after each line, rather than leaving it up to the stdio library buffer handling (see above examples). Other less obvious solutions... Running command via "ssh" with a '-t' option will also create a pseudo-TTY wrapper around the command, and will let you run the command on a remote machine at the same time. It also has a '-n' to tell it NOT to read stdin, when it does not need it, and you are only interested in its output. The "script" type-scripting program is also supposed to do this, but will generate logs of the output. One method is to use it as a 'tee' logging program. The 'cat' is to prevent script double echoing its input. long-running-command | script -q /dev/null sh -c 'exec cat > /dev/null' See also Buffering in standard streams http://www.pixelbeat.org/programming/stdio_buffering/ Buffering Problem in Bash Pipelines http://mywiki.wooledge.org/BashFAQ/009 A Forum discussion on the problem https://unix.stackexchange.com/questions/25372/ which also looks problems that read more input than is necessary This data buffering can be especially difficult with network services using packets of data. The packets could be split-up, or even modified and merged into larger chunks by intervening proxies. You cannot rely on data being kept together by end-of-lines, or other record boundaries, or even relay on how the source program originally sent the data. Be especially wary of record boundaries being split over multiple packets and data reads. 5/ 'End-Of-Data' marker problem How do you know your requested information is complete? Basically... When has the program finished returning data? Many UNIX commands for example don't given any final 'end-of-data' indicator when they are finished. Though most interactive commands (and remote commands) will output a 'prompt' which can be used as a EOD indicator. If you can, change the prompt at the start to make it more 'unique', so as to provide a better EOD marker (see "Prompt Problem" next). If a program has no 'end-of-data' indicator, you could just send some extra 'echo' or 'print' command which returns a unique indicator, and keeps your wrapper and the command 'in-sync' with the command. Generating such sync'ed output can be useful in and of itself, especially if some type of counter can be included, to ensure you are actually 'in-sync'. If you can't generate a EOD, you will need to resort to waiting a 'fixed delay', or timeout on the final read, after which you could assume the output is finished. 
  That fixed-delay fallback, however, is the worst case, as it will slow and
  delay your processing enormously, and could be prone to a network delay or
  glitch.

  You also have to be careful of errors, which may result in your program not
  receiving the expected data, and may even prevent the appropriate
  'end-of-data' response.  A big help here is if the error is returned in a
  standard recognizable format that is different to the expected data.
  Network protocols are especially good at providing good error reports, as
  part of the standard response handling.

  As such, 'end-of-data' handling should not only include the expected EOD or
  prompt, but also program terminated/crashed conditions, end-of-file or
  stream, timeout, and known error reports.  Expect such error strings to
  expand over time, as different situations, operating systems, machines, and
  versions of the program can change how errors are reported for the exact
  same error condition.

  The best idea is for you to recognise invalid or unexpected data, rather
  than wait for valid data and known specific error responses.  Or better
  still, a mix of the two.  It is error handling that causes most
  co-processing programs to fail, and the reason they tend to become fragile
  over time.

  WARNING: Perl-Expect does not seem to recognise 'program exit' as a viable
  'end-of-data' indicator!

6/ Prompt problem...

  The prompt (or EOD marker, declaring the co-process has finished its current
  task and is awaiting more input) may not have a final newline character!

  This can cause buffering problems with programs that are not wrapped in a
  PTY allowing you to place the command in a 'raw' terminal mode.  However
  typically such prompts are flushed, so unless the prompt is fed through
  other programs that do not auto-flush incomplete lines, this is not a
  problem.

  It also means that when reading a prompt that does not contain a linefeed or
  return character, you cannot simply 'read a line' as you would normally.  In
  this case you must read all 'available data', without regard to what that
  data is, and append it to previous reads, before looking for the prompt
  string.  Only once you have the prompt can you look at the data
  line-by-line.  As such, if you can change the prompt, you may like to add a
  newline to the end of the prompt.

  Note that if an error did happen, the prompt may not even be the last part
  of the read data, but could actually appear somewhere 'in-the-middle' of the
  data that was read.  That can happen if the command being run performs other
  background tasks, which may continue for a short time after the original
  request you made is finished.  I have seen this happen!

7/ Binary data

  No matter how you look at it, most shells cannot handle or store a NULL
  character in a string.  As such, shells in general cannot easily handle raw
  binary data.  If this is a possibility, you will need to have the data read
  and saved to a file, and then use programs that can handle binary data, such
  as "dd", "perl", "python", or a compiled "C program".  The secondary program
  will then need to handle any "waitfor" conditions required, as typically you
  do not want that program simply waiting for an EOD or EOF condition that
  will never come.  However saving binary data to a file, or to a separate
  file descriptor that you provide, is not always possible.

8/ Handling out-of-band data and errors...

  Some programs may output data, errors, or some type of status report at any
  time, which could become mixed with the output you requested.
  For example, a periodic status update of some background action to a TTY
  terminal.  The classic case of old was the "You have mail" message!  It can
  also output unexpected errors and problem reports to standard error, which a
  PTY handler could then merge into the main data stream, complicating matters
  further.  Worse still, some programs output status or progress reports
  directly to the TTY and not to stdout.  Again, a PTY handler could merge
  these into your data feed.

  Out-of-band data can enormously complicate the whole situation, as you may
  have to continuously monitor an interactive command for these updates,
  before, during, and even after you have read the data you specifically
  requested.  It gets worse if this out-of-band data looks similar to your
  specifically requested data!  Worse still, the unexpected data can cause an
  erroneous 'end-of-data' sequence, or destroy your 'end-of-data' indicator,
  or make the EOD indicator disappear completely.

  The only solution is to not only look for your 'end-of-data', but to also
  look for and identify this status and error data as separate to the data you
  were expecting.  Unfortunately, unless you are intimately familiar with a
  program (and all variants of that program) you are unlikely to be able to
  predict all such 'exceptions'.  This is the cause of 'frail' behaviour, and
  unexpected failures.  If you find you are listing LOTS of 'problem
  conditions' then perhaps you need to re-think your situation.

  Handling unexpected data, and reporting it in a meaningful way, can make the
  life of you and anyone who inherits your program much easier.  That is, you
  should be able to collect all data and parse out unexpected error
  indicators, either as a pre-filter, or during data collection.  As long as
  'end-of-data' indicators are not affected, and the out-of-band data is line
  buffered, it should not be too difficult.

  Even the normal bash login shell can produce such out-of-band data when it
  runs scripts with background commands that are killed or exit at unexpected
  times.  Believe me, out-of-band data is the bane of interactive program
  control!

9/ Timeouts

  When an 'end-of-data' is just not available, or something unexpected
  happens, the common solution is a timeout of some type.  The problem with
  that is "How long do you wait?".  Too long and your program can become very
  slow.  Too short, and you can fail to handle problems caused by a slow
  network connection, or a very busy computer.

  Another solution, if the application allows it, is to program in some sort
  of 'heartbeat'.  That is, you get the program to output an out-of-band
  indicator that it is still alive and working.  Or have it respond to an
  asynchronous "Are you OK?" type of input or signal.  This is typically a key
  sequence that produces a small response of no real importance.

  With shell scripts, you have no simple way to just read all data that is
  currently waiting.  Though this is getting better, with 'shell select' or
  non-blocking reads becoming more available.

  The "sshpass" program, for example, avoids a lot of the complication by only
  handling TTY-directed IO, leaving the wrapped "ssh" command's normal
  stdin/out/err pipelines connected as normal.  This in turn prevents you from
  mishandling a possible second 'password' request, say from a program (like
  "sudo") that "ssh" ran on the remote machine.
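  In BASH, the per-read timeout of 'read -t', combined with
  character-by-character reads, is enough to sketch a small timeout-aware
  'waitfor'.  This is a hedged sketch only, assuming the co-process output is
  readable on file descriptor 3; real 'waitfor' programs are discussed later
  in this article.

=======8<--------
# waitfor "prompt" [timeout_per_char]
# Succeeds when the prompt (or EOD) string has been seen; fails on a read
# timeout or end-of-file.  Reads one character at a time, so a prompt without
# a trailing newline is still matched (newlines arrive as an empty $char and
# are simply dropped, which is fine for prompt matching).
waitfor() {
  local prompt="$1" timeout="${2:-10}" buffer='' char
  while IFS= read -r -n 1 -t "$timeout" char <&3; do
    buffer="$buffer$char"
    case "$buffer" in
      *"$prompt"*) return 0 ;;     # prompt seen
    esac
  done
  return 1                         # timed out, or co-process closed its output
}
=======8<--------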
10/ Avoiding Lockups...

  A real problem with interactive command programming is a lockup.  Basically,
  the slave program has finished outputting its data and is waiting for new
  input, while your controlling program is still waiting for more input data,
  or an End-Of-Data indicator, that will never come.  Typically this is caused
  by the interactive program aborting unexpectedly, or some unexpected error
  causing the program to never produce the desired response you are waiting
  for.  This is another reason why using unique 'prompts', and a timeout, can
  provide a good solution.

  The other form of lockup involves multiple channels of communication.  For
  example, the controlling program is waiting for some type of expected
  response from the slave process, but the slave process errored, and has
  output information about the error on a separate error channel, which is
  being ignored.

  Another lockup is similar but very rare.  One of the processes is sending
  lots of data, but the other process is not reading it.  Eventually the
  communications buffers will reach their (huge) limits and communications
  stops, with one program waiting to send, and the other failing to read.  But
  this could take a very long time due to the size of the network buffers on
  modern machines.

  The final type of lockup is when the parent does not check for the command
  unexpectedly dying (child signals), or ignores 'pipe' closure signals.

Phew...  And you thought a co-process would be an easy solution!

Well, in most cases it is.  The above are generally specific things to look
for when things go wrong, and should be kept in mind when writing
co-processes.  But while they can be a problem, in most cases they are
exceptions and rare events.  At least until they happen to you!

Designing the right 'waitfor' data handler, one that can handle: End-Of-Data,
End-Of-File, out-of-band data, errors, program exit, and timeout; is crucial
to dealing with the above problems.

-------------------------------------------------------------------------------
Programs that can help with scripted co-processing (not a complete list)

Co-Processor Launchers

  expect
    This is the traditional TCL scripting for running interactive commands.
    However it is not usually thought of as 'simple' to use, and does not
    interact well with more complex scripting (except TCL).  These days expect
    is available in most other languages (perl, python, ruby) in one form or
    another, and some shell expect commands are also available.
    See "WaitFor" below for the things such a package should be able to
    provide.

  pty
    A command to run a command in a PTY.  Last release pty-4.0 (1992).
    It also provided code for the original, extremely simple, shell "waitfor"
    type program.

  pty
    Another simple "pty" program from the Addison Wesley book
    Advanced Programming in the UNIX Environment - Second Edition
    by W. Richard Stevens, Stephen A. Rago

  ptyget
    A complete re-write of the pty package, by the same author.  Release 0.50

  empty
    Run a command in a PTY, connecting it to FIFO named files.  Good, but
    seems incomplete.  See the story behind "empty" at...
      Command-line interactive programs in UNIX shell-scripts
      http://www.osnews.com/story/10929

  exec-with-piped.c
    A more primitive "empty", but without send/recv/waitfor functions.

  socat
    (see below)

Buffer and TTY control

  unbuffer, condom, nobuf
    Simple scripts from each of the three packages above that provide PTY
    wrappers for commands.  These provide proper line-by-line STDIO data
    pipe-lining of the commands.  Having at least one of these commands is
    typically vital when creating shell-based co-processing, and I have
    created one, see "Using expect as a PTY wrapper" below.
  stdbuf
    Similar to the previous PTY commands, but works by modifying the stream
    settings (size and mode of the buffer) directly, that is, without using a
    full PTY wrapper around the command.  It is part of 'GNU coreutils' so is
    available on most Linux machines by default, even when TCL/Expect is not.
    Less intensive than using a full PTY, and generally just as effective.

  socat
    Part of a general toolbox command (see below)

Network Connections and Support

  telnet, mconnect
    Just simply connect to a remote network TCP port.

  tcp_client
    A 'telnet' in perl (as an example), without TTY or buffer problems.
    Multiple versions.  Mail me if interested.

  netcat
    A 'telnet' network connection replacement, providing both client 'connect'
    and server 'listen' handling for TCP and UDP.

  socat
    A universal network and command connector.  It has been called "netcat"
    on steroids.  Everything you need for any command, pipeline, file, or
    network connection.  It includes PTY, or interactive "readline" wrappers,
    as well as SSL secure communications, port forwards, and so on.  Expect
    to see this command become more commonly used in complex situations.

  sshpass
    In many ways "sshpass" is a special C program, 'expect'-like, wrapper for
    SSH commands, to allow the use of passworded ssh connections (or other
    programs) from automatic scripts.  Of course using public keys is
    considered the better alternative.  However the source code for this very
    small program makes for nice reading.  The web project page does not
    really tell you what the program does until you download it so you can
    read the manpage.  :-(

    It can even be used for different programs that want a password!
    Example using it with mysql!

      sshpass -p topsecret mysql -u root --password -e 'statement'

    Of course mysql has its own password handling options.

  shell_seek
    A perl program providing the means to seek on shell file handles.  But
    more importantly it allows you to shutdown() one side of a network
    connection (typically the write or send side, to denote EOF), while
    keeping the other direction open (input or receive, for final results).
      https://antofthy.gitlab.io/software/shell_seek

Multi-Stream Select/Poll

  shell_select.c
    A simple primitive wrapper around the "select()" system call, to make it
    available in shell.  Difficult to use, and limited to the first 32 file
    handles.

  shell_select.pl
    A "select()" for shell, written in perl using the IO::Select module.
    Allows handling of both stdout/stderr, or multiple simultaneous
    co-processor handling, or dealing with large data flows.  (see examples
    below)
      https://antofthy.gitlab.io/software/#shell_select

Miscellaneous

  runscript
    This is actually designed for modem use, but is feature rich and could be
    used as a basis for an 'expect' or 'waitfor' handler of a co-processor.

  chat
    This is essentially a device controller.  It sends specific strings, and
    looks for a specific response.  If it does not see it, it sends a
    different response to abort and clean up the process.  It is however a
    form of co-process, with the sub-process being a physical device rather
    than another command.

===============================================================================
Co-Process shell script techniques...

Using co-processes is not generally difficult, at least not until you come
across some unexpected problem (see above).  The bigger problem is that
typically everyone does co-processing in their own way, depending on the
complexity of the interaction, the command/service being used, and the
possible errors that need to be dealt with.
Quite a number of techniques have been developed to use co-processing, in
scripts or from more advanced languages like C, perl, and PHP.  Over the years
I have used many of them, for one reason or another.  Co-processing in shells
is in some ways easier (no need to deal with the low-level IO library, and a
fast debug cycle), and in other ways harder (no direct access to 'select()'
and unnamed pipes, though there are solutions).

-------------------------------------------------------------------------------
Direct Data Pipeline...

This is basically equivalent to
  UNIX FAQ - Running interactive programs from a shell script
  http://www.faqs.org/faqs/unix-faq/faq/part3/section-9.html
However that only provides a minimal starting point.  What follows goes much
further.

First, if you do not have the 'feedback' method described above, then you do
not actually need to use a co-process.  A pipelined process which feeds the
input, as a completely separate task to interpreting the output, is all that
is required.

For example, when using "ftp" as a sub-process, you can just use a simple
'piped-input'.  In this case a 'HERE file' of static input will work perfectly
fine.  The output of the command can then be either logged, parsed, or junked
as a completely separate stage of the pipeline.

=======8<--------
# download file from an anonymous ftp server
ftp -n -i remote.server <<-EOF   # 2>/dev/null
user ftp your_email_address
cd /direct/file/is/stored
binary
get filename dest_filename
bye
EOF
=======8<--------

But of course you can't make decisions about what to feed into the command
using a pipe-lined approach, after you start the process.  It is a purely
'blind', undirected request, that will either succeed or fail, according to
the parsed output.  Just about all scripted ftp and web server requests are
typically performed using this blind, one-connection-per-request, method.

So let's look at a more problematic command.

-------------------------------------------------------------------------------
Timed Data Pipelines...

The "passwd" program is designed to be interactive with a user, not program
driven.  It will prompt for a new password (twice) if you are the superuser.
Ordinary users will also get an extra initial prompt for their current
password, as a security measure.

Typically a system programmer wants to use "passwd" in a script because:
  * It is the same command on every UNIX system that has ever been made.
    (convenience)
  * They don't want to have to edit the system password file, with its
    associated locking, and all the possible problems that may cause.
    (API complexity)
  * They don't care how the password is encrypted and stored.  (data store)
  * When something goes wrong, they don't want to end up with a system that
    no one can use.  (error checking, robustness)

ASIDE: In the following examples I put a password on the command line.  Yes, I
know this is bad at any time, though it is safe to do so when using a shell
built-in like "echo", the BASH "printf", or the KSH "print".  So it is
actually secure in this case, if it is secure in the script.

The problem is that the "passwd" command only accepts input from a TTY.  So a
PTY wrapper such as "unbuffer", "expect", "empty", or "socat" is needed to
allow you to communicate with it.  The "passwd" command also likes to 'junk
all input' before prompting the user for a new password.  As such you MUST
wait at least long enough for the command to prompt for the new passwords
before actually sending them.

For example, this is a typical first attempt.
A static 'input pipe with pauses', that will blindly change the given user's
password (assuming you are the superuser).

=======8<--------
#!/bin/sh
# username is passed as 1st arg, password as 2nd
( sleep 5; echo "$2"; sleep 5; echo "$2" ) | unbuffer passwd "$1"
=======8<--------

But this is fragile, as it will fail if the computer is very slow to prompt
for the new password, or some other unexpected error occurs.  It is also very
slow, with all the extra pauses in its processing.  Imagine using such a
script to set the passwords of 1000 new users!  It would take hours!
Basically it needs some real co-processing techniques...

For a similar example using "telnet" see
  http://steve-parker.org/sh/hints.shtml#telnet

-------------------------------------------------------------------------------
Shell Co-Processing, using shell pipelines...

Here we present a true co-process, which watches the output and responds
appropriately.  The next step is to save the output and look for the
appropriate response.  For example, here I just save the output into a file
and watch it.

=======8<--------
#!/bin/sh
#
# username is passed as 1st arg, password as 2nd
# The script is essentially   ( ... ) | passwd user > $output
#
output="/tmp/passwd_chg.$$"
(
  while :; do     # wait for the appropriate response
    if tail -1 "$output" | grep "assword:" >/dev/null; then
      break
    fi
    sleep 100e-06   # replace usleep 100 -- Arrgghhh
  done
  echo "$2"         # send password

  while :; do     # wait for the appropriate response
    if tail -1 "$output" | grep "assword:" >/dev/null; then
      break
    fi
    sleep 100e-06   # replace usleep 100 -- Arrgghhh
  done
  echo "$2"         # send password
  sleep 1
) | unbuffer passwd "$1" > "$output"
=======8<--------

The above is good, and will actually wait for the appropriate response from
the program before actually sending the data.  The loops shown above are a
form of 'waitfor' or 'expect' code.  It is this code which is an essential
feature of true co-processing.  It can be further improved by adding timeouts
so that if something goes wrong, you don't end up waiting forever.

For a more advanced version of the above see...
  http://steve-parker.org/sh/hints.shtml#telnet
and in particular the "telnet2.sh" script
  http://steve-parker.org/sh/eg/telnet2.txt
This version has timeouts, but uses full-second waits, which can make it quite
slow to respond to the feedback from the command (and the remote system).

-------------------------------------------------------------------------------
Co-Process using expect...

"Expect" is one of the oldest programs specifically designed for co-process
handling.  It provides the PTYs ("unbuffer", used above, is actually an
"expect" script), and it provides the ability to do "waitfor" or "expect"
handling, looking for multiple responses and timeouts.

For example, here is that same program using an expect script

=======8<--------
#!/usr/bin/expect
# username is passed as 1st arg, password as 2nd
set password [lindex $argv 1]
# start "passwd" co-process
spawn passwd [lindex $argv 0]
expect "*assword:"
send "$password\r"
expect "*assword:"
send "$password\r"
expect eof
=======8<--------

This will run as fast as possible, as it will actually look for the password
prompts the "passwd" command outputs, before sending the requested data.  The
'expect' commands in the above are more commonly known as "waitfor" strings,
as that is the feedback that you are 'waiting for'.

The problem with the original "expect" is that it was written in a language
that not many people understand, TCL.
Although there are "expect" versions for other languages, like "Perl-Expect"
(for 'perl') and "py-expect" ('python'), it has so many options that it is
generally regarded as hard to understand and use properly.  The provided
examples are not always much help when dealing with specific situations, and
it requires time to figure out how to use it properly.  Also you may not have
"expect" installed on a computer, making it a less viable option.

Essentially, to use "expect" you have to both install it, and learn a new
language!  And who wants to do that?  The shell can do the same job, with just
a little extra help ("unbuffer" or "empty").

---

Another alternative is presented in Steve Parker's "Unix / Linux Shell
Scripting Tutorial", in a section "Simple Expect Replacement".
  http://steve-parker.org/sh/expect.shtml

My more generic script is available at
  https://antofthy.gitlab.io/software/#shell_expect
and is used like this...

  shell_expect expect_data log_file | command > log_file

This creates a generic shell script which reads a static list of 'send-expect'
operations.  The send lines are sent; the 'expect' lines either delay, or wait
for specific 'expect strings' to appear in the command's output log file.

It is, like the previous method, essentially just a simpler 'chat' script
(like those used by old 'modem' setup programs).  Essentially: send this,
expect that, or abort.  It is a simple solution for running a co-process with
static input, with appropriate feedback "waitfor", and it provides a good
method of handling a co-process with static input/output data.

The major drawback with this technique is that it uses a file for the
command's output, and unless the command 'flushes' its output properly, or is
wrapped in a PTY, the expected string may remain buffered and never be flushed
to the file where it can be read.  Especially ensure the data you are waiting
for is 'flushed' continuously rather than buffered.

---

Here is another script that is similar in nature, found on...
  http://grulos.blogspot.com/2006/02/script-expectsh-using-bash-instead-of.html

=======8<--------
#!/bin/bash
#
# expect_list.sh
#
expect=("hi" "white" "in")
reply=("hello" "black" "out")
while read -n1 "char"; do
  word="$word$char"
  [ "$word" == "${expect[$count]:0:${#word}}" ] || word="$char"
  if [ "$word" == "${expect[$count]}" ]; then
    printf "\n${reply[$count]}\n"
    (( count++ ))
    [ "$count" == "${#expect[@]}" ] && break
    word=""
  fi
done
=======8<--------

Basically each 'expect' string is looked for (without regard for white space
or newlines), and when seen, the equivalent 'reply' string is echoed with a
newline.  (This could be written better using a shell hash instead of arrays.)
Note it does not provide for timeouts, regular expression matching, or
graceful handling of unexpected events such as EOF.

---

The above methods are typically either too simplistic or too restrictive in
format, and do not handle unexpected situations very well.  They also make it
harder to expand as the situation evolves and becomes more complex, which
typically happens as the co-processing program is developed.

-------------------------------------------------------------------------------
Full Shell Co-Processing (Bourne Shell)...

A full shell co-processing method is to set up the command using FIFO named
pipes, and some type of simple "waitfor" C program.  The advantage of using
named pipes is that the command is launched at the top of the script (where it
should be), and the rest of the processing follows in sequence.
The "waitfor" program removes the loops making the rest of the script much cleaner looking, and easier to follow. (WANTED: Source for the "waitfor" C program) =======8<-------- #!/bin/bash # username is passed as 1st arg, password as 2nd # set up "passwd" co-process mkfifo /tmp/passwd_in.$$ /tmp/passwd_out.$$ # make named pipes unbuffer out.$$ passwd "$1" & # background the command passwd_pid=$! passwd_in=10 passwd_out=11 # name the file descriptors eval "exec $passwd_in>/tmp/passwd_in.$$" # assign them eval "exec $passwd_out&$passwd_in "$2" waitfor <&$passwd_out 'assword:' echo >&$passwd_in "$2" # close input pipe (send "end-of-file") eval "exec $passwd_in>&-" # wait for sub-process to exit wait =======8<-------- This is probably the most useful co-processor technique in shell scripts and typically all you need in most cases. Note that once the named pipes have been opened, the actual named pipe files are no longer needed so can be removed, saving the need to clean them up later. The pipe will remain in effect while the file descriptors remain open. The "waitfor" command is a simple program that searches for its argument in the input, character by character, without waiting for that 'end-of-line' character that may never come. The first "waitfor" program I know about came from the old "pty" package (1992), but there have been many variants of it since. Remember a prompt rarely ends in a linefeed, so the waitfor must not actually wait for a full line containing a final newline or return, as the EOL may never come. Later I will look at various methods of writing "waitfor" programs and functions that will read and test the program output (shell input). Side Notes: When using file descriptors stored in variables using exec, will need to use "eval" to substitute the varables. This is for both Bourne Shell and BASH. On older UNIX systems you may need to use the command /etc/mknod in.$$ p to create a named pipe. The "mkfifo" is the more modern method. It can also be useful to know just what file descriptors are in current use so you can avoid something that is already being used. One method to find unused file descriptors is to list the special directories /dev/fd or /proc/self/fd another is to look at the output of "lsof -p $$" but that can be slow. Generally there is no need. ------------------------------------------------------------------------------- Shell Co-Processing, only using named pipes (no file descriptors)... This is another example of using FIFO named pipes, but this time I use the named pipes directly, without using shell file descriptors. =======8<-------- #!/bin/sh mkfifo out.fifo in.fifo unbuffer out.fifo passwd "$1" & # background command cat out.fifo > out.fifo & # prevent echos to fifo sending EOF cat > in.fifo & # That is keep the FIFO pipelines open input_pid=$? # Note the input so we can 'close' it # watch the output and send info waitfor in.fifo "$2" waitfor in.fifo "$2" # clean up and close connections rm out.fifo in.fifo kill $input_pid wait =======8<-------- The two 'cat' background processes are used to 'hold' the named pipe open so as to prevent individual 'echo's and 'read's from closing the pipe. This means sub-programs do not depend on file descriptors that they may not have access too, but instead can send/recv directly to the the given FIFO named pipes. That is to say multiple separate programs could be opening the same named pipes, without the FIFO pipe closing. 
More importantly, sub-programs called by your control script do not need to
rely on open file descriptors being passed via the 'exec' process.  That last
point can be a particular problem when a 'close-on-exec' flag has been set on
the file descriptor.  A problem I have encountered with perl file handles
(which enable 'close-on-exec' for security reasons).  Bash generally does not
use 'close-on-exec', except for its "coproc" file descriptors, which I thought
was rather silly.

Using permanently open named pipes works, but I hate leaving named pipes
around longer than I need to.  Also I keep feeling that each 'echo' command I
use will send an EOF to the command, as that is its normal behaviour.  In fact
it is very difficult to actually send that EOF, unless you specifically kill
the holding 'cat' command.  And finally, if something does go wrong (and it
will), you can be left with running background 'cat' processes preventing the
background co-process from exiting.

The only advantage I can see is for multiple separate invocations of some
command occasionally talking to a shared background process (like a network
daemon process).  A bit like the use of "screen" to hold TTY sessions open
over separate user logins.  But you'd have to watch out for two such commands
executing at the same time, or you'll get problems.  That is to say, you may
need to implement some type of file locking.

In summary, the technique uses extra processes, and named pipes hanging
around, that require appropriate clean-up at the end.  It could easily be left
in a very funny state (co-process left running).  I regard it as a rather bad
co-processing technique.

- The "exec_with_piped" launcher program

This was an advancement on the previous technique of co-process communication.
See the old 14 November 1997 package "pipe_scripting.sh"
  http://okmij.org/ftp/Communications.html#sh-agents

=======8<--------
/etc/mknod FIFO-PIPE p
exec_with_piped FIFO-PIPE "ssh remote Mathematica" &
echo "Mathematica-command-1" > FIFO-PIPE
     ... see the result on the screen ...
echo "Mathematica-command-2" > FIFO-PIPE
     ... see the result on the screen ...
echo "Quit" > FIFO-PIPE
=======8<--------

Note that this is basically a one-directional named pipe handler (not a
co-process) that launches a background process to do the same task as the
background "cat" in the previous example.  The code is simple, and makes for
interesting reading, before going on to the next, true co-processing via
named pipes, helper: "empty".

-------------------------------------------------------------------------------
Using "empty"...

The "empty" program is actually designed to handle things using named pipes,
just as described above.  However it hides all the details, and handles all
the problems that are involved.  This simplifies scripts enormously, while
also providing other tools, like a simple "waitfor" option.  You can download
it from
  http://empty.sourceforge.net/

When "empty" launches the co-process (using -f), it will create the named
pipes and also a background FIFO holding daemon (replacing the background
"cat" commands used previously).

=======8<--------
#!/bin/sh
#
# Changing passwords using "empty" coprocess
#
empty -f passwd "$1"

# waitfor prompt (for max 5 seconds) and send response
empty -w -v -t 5 "assword:"
empty -s "$2\n"

# waitfor prompt (for max 5 seconds) and send response
empty -w -v -t 5 "assword:"
empty -s "$2\n"

# at this point the process should be finished.
# but lets be sure, and kill it!
empty -k
=======8<--------

You do not actually need to specify the FIFO files to use (we didn't), but it
is better to provide the FIFO filenames you want to use, as "empty" then does
not need to search for them (in /tmp) every time it is run (that could take a
bit of time if you have a large /tmp).

Also the "empty" background daemon automatically cleans up if the controlling
shell script dies, or the co-process exits, making what was a 'dirty hack' a
rather clean co-processing technique.

Security Note: In a recent update "empty" now allows you to give the send
string on standard input, allowing you to avoid putting it on the command
line.  This avoids the password in the above being visible in the process
("ps") listing!  Note this is not a problem when using a built-in like "echo".

Its 'waitfor' system is to my mind inadequate and limited, as it only allows
for a single static 'expect' string.  A multi-string 'waitfor' (expect-like)
would be a great addition.

Also "empty" does not seem to have any way of specifically closing (EOF) the
co-process input pipe, without also closing its co-process output pipe.  That
means in some cases you may not get the final bit of data from the co-process,
such as a final summary or results that a program generates on EOF, unless
that program allows some form of 'quit' or 'exit' request.

It is a great technique, but "empty" could really use some extra features to
allow it to handle a broader range of problems.  Note that all the abilities
that "empty" does provide can be achieved by a BASH script and functions.

One example of an "empty"-type program that was written entirely in shell is...
  http://www.technetra.com/2009/04/26/discovering-web-access-latencies-using-bash-coprocessing/
A copy of which (in case the above disappears) is at...
  https://antofthy.gitlab.io/info/co-processing/coprocess_shell.txt

-------------------------------------------------------------------------------
BASH "coproc", using file descriptors only...

The bash "coproc" built-in can make the launch of a co-process much easier.
A good introductory guide is on the "Bash Hackers Wiki"
  https://wiki.bash-hackers.org/syntax/keywords/coproc

=======8<--------
#!/bin/bash
# username is passed as 1st arg, password as 2nd

# set up "passwd" co-process
coproc unbuffer passwd "$1"
echo "co-process \"passwd\" started on $COPROC_PID"
echo "command input is sent to file descriptor ${COPROC[1]}"
echo "and output is read from file descriptor $COPROC"

# watch the output and send info
waitfor <&$COPROC 'assword:'
echo >&${COPROC[1]} "$2"

# watch the output and send info
waitfor <&$COPROC 'assword:'
echo >&${COPROC[1]} "$2"

# close input pipe (send "end-of-file")
eval "exec ${COPROC[1]}>&-"

# wait for co-process to exit
wait $COPROC_PID
=======8<--------

This is almost identical to the original FIFO-with-file-descriptors method we
recommended above.  There are just no named pipe files involved, and you do
not have to worry about clashing with an existing, already opened, file
descriptor.

However, while the launch is clean and neat, the use of a file descriptor
array is rather messy and at times hard to follow.  So to make it more
readable you can reassign the values to better variable names.
=======8<--------
#!/bin/bash
# username is passed as 1st arg, password as 2nd

# set up "passwd" co-process
coproc p_out { unbuffer passwd "$1"; }
#p_out=${p_out[0]}   # output from command - this is already the case
p_in=${p_out[1]}     # input into command
p_pid=$p_out_PID     # the command's Process ID (for wait and kill)

# watch the output and send info
waitfor <&$p_out 'assword:'
echo >&$p_in "$2"

# watch the output and send info
waitfor <&$p_out 'assword:'
echo >&$p_in "$2"

# close input pipe (send "end-of-file")
eval "exec $p_in>&-"

# wait for co-process to exit
wait $p_pid
=======8<--------

In my tests, the file descriptors used by BASH "coproc" start at 60, which may
be a problem when using the primitive "shell_select" program (see below).  But
the "perl" alternative can handle the higher numbered file descriptors.

Also, the shell file descriptors opened by "coproc" are set 'close-on-exec'.
This means you cannot pass them to a child program so that it can directly
read/write/waitfor/select on those file descriptors.  You can fix that problem
by moving the file descriptors to other file descriptors (like stdin/stdout,
as I did in the above), but that sort of defeats the whole purpose of using
"coproc" in the first place.

The biggest gotcha with BASH "coproc" is that you can only have one
co-process, even if you specifically name it.  If you try to create a second
you will get an error.

  warning: execute_coproc: coproc [????] still exists

I would assume that if users demand it, it will likely be updated to allow
multiple coprocs, though for now it appears to be a low priority.  Which is
unsurprising.

All in all I think it is still better to DIY it, using temporary named pipes
(or "empty"), which do not have these problems.

---

There is a similar bash script, "coproc.bash", in the BASH source examples
area.  This provides a similar setup, using FIFO pipes that are removed after
the shell file descriptors are connected.  I think it may have been a
forerunner of the 'coproc' builtin.  The functions it defines provide...

  coprocess open command...
  coprocess close
  coprocess print "string to send"
  coprocess read var
  coprocess status

NOTE: 'close' assumes the command has exited, as it does not kill the process
after closing the pipeline.  Also no 'waitfor' facility is provided.  One
feature of interest is an automatic 'close' if the 'print' causes a 'sigpipe'
error signal.

-------------------------------------------------------------------------------
Korn-sh (ksh) Co-Processing...

The Korn shell provides some very simple pipeline handling for a single
co-process.  You would start a co-process using

  command |&

Then you can send data to the command using

  print -p ...

and read returned data from it using

  read -p ...

Very simple and straight forward, with ksh handling all the pipelines, making
it easy to send and receive data from the co-process.  Though you would still
need to ensure you handle the expected output.

An example of this is the "Calc" script...
  http://www.shelldorado.com/scripts/cmds/calc.txt
Here is a simplified version of that script...

=======8<--------
#!/bin/ksh
Scale=2
Expr="$*"
bc |&                        # start the co-process
set -e                       # terminate on errors
print -p -r -- "$Expr"       # send bc the expression
print -p '"GNUELPF\n"'       # send bc an EOD marker
while read -p; do
  [[ $REPLY = GNUELPF ]] && break   # exit loop on EOD
  print -- "$REPLY"                 # echo result
done
set +e
=======8<--------

It runs "bc" to do the given mathematical calculations.
===============================================================================
Multiple Command Output Streams
OR simultaneous use of stdout and stderr...
OR handling multiple co-processes!

All the above techniques have used only STDIN and STDOUT of the co-process.
STDERR is left to output to the same destination as the rest of your program.
Hopefully into some log, or user display, that can be checked when something
goes wrong.

One way of handling STDERR is to simply roll its output into the STDOUT
stream of the co-process, and deal with errors as part of normal
co-processing techniques.

  command <$in >$out 2>&1

Note that PTY wrappers (like "unbuffer") or remote commands (like "ssh")
already do this automatically, as a consequence of the way internal TTY
handlers work.

However merging errors and output may not be convenient.  It may be hard to
differentiate between normal output and error output, and the waitfor may not
work well with multiple possible input conditions.  Even the "expect" package
does not handle multiple streams very well.

One solution may be to wrap the co-process command, so as to add a unique
marker to the error output.  For example, using a shell script wrapper around
the sub-process within the PTY.

For example, here I use a "perl" command to output to both stdout and stderr,
and add a "sed" pipeline to identify the error channel output before the two
streams become merged together.

  exec 9>&1;
  perl -e 'print "stdout\n"; print STDERR "stderr\n";' 2>&1 >&9 |
    sed --unbuffered 's/^/ERROR: /'

This outputs (though the order may be reversed)...

  stdout
  ERROR: stderr

Another example is my "cmdout" script, which reports the command executed,
marks the output and error lines, and reports the command's final exit
status.  I wrote this years ago to check on how specific commands work and
where I need to look for their output.
  https://antofthy.gitlab.io/software/#cmdout

For more about this see my info on "Shell File Handles and Descriptors"
  https://antofthy.gitlab.io/info/shell/file_handles.txt

---

A better way is to simply keep each output stream as a separate entity.  That
way errors and/or data progress reports do not 'spoil' the main data feed.
For example, logging errors to a file, and checking that file at appropriate
points for such errors and problems.
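As a rough sketch of that idea (my own example, not from the article; "bc"
again stands in for a real co-process, and the short "sleep" is a crude
substitute for proper synchronization)...

=======8<--------
#!/bin/bash
# Sketch: keep the co-process stderr in a log file, and check that log at
# convenient points, rather than merging it into the normal output.
err_log=/tmp/bc_err.$$
coproc bc 2>"$err_log"

echo "scale=8; 10/3" >&${COPROC[1]}
read -r result       <&${COPROC[0]}
echo "Result => $result"

echo "10/0" >&${COPROC[1]}        # provoke a divide-by-zero error
sleep 1                           # crude wait for the error to be written
if [ -s "$err_log" ]; then        # did anything appear on stderr?
  echo "ERROR => $(tail -1 "$err_log")"
fi

eval "exec ${COPROC[1]}>&-"       # close input, "bc" then exits
wait $COPROC_PID
rm -f "$err_log"
=======8<--------

Even in this tiny sketch you can see the synchronization problem that the
rest of this section is about: there is no good way to know when (or if) an
error will appear in the log.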
However processing errors as a separate step also has its problems.  For
example, you could be waiting for the 'end-of-data' marker in the normal
output, but an error has happened, so that marker never comes.  This is an
'output lockup', and unexpected errors can often be the biggest problem with
any co-processing technique.

Another lockup problem is when you are dealing with two streams which are
both generating lots of data.  You could end up with the co-process unable to
output to one channel, as its buffers are all full, while you are trying to
read data from the other stream.

This last condition is however very rare, as modern computers provide very
large buffers (80Kb in one measurement I made of an internet stream), making
it unlikely for an output block to happen.  The problem is more common when
trying to feed large volumes (larger than the buffer size) of input while
also reading large volumes of output.  That is a throughput data lockup.

The solution is the same in all these cases.  Basically, when two or more
output streams are involved (or even multiple co-processes), you can not
generally afford to wait on (read) the output of one stream only.  You need
to be able to read and deal with both output streams simultaneously.

This basically devolves down to the question...

  When is data available to read?

Closely related to this, though far less important, is...

  When is it safe to feed more data to the co-process?

In both cases it really only becomes important when lots of data is being
sent (such as large files).  It is of particular importance when the
co-process involves databases, large file transfers, or image processing.

There are a number of solutions to this, but essentially they boil down to
two methods, both of which should be used.

  * Polling (non-blocking reads)
  * IO wait (select system call)

The 'select' (or system IO wait) is typically the best solution, but it is
rarely seen in scripting.  It is also a system call that no shell I have seen
actually provides to its users as a built-in, and so you will need some type
of external program to make this call.

I only found one 'C' program which provides a form of 'select' for shell
scripts, the source of which is no longer generally available.  This
primitive "shell_select" program is given a list of file descriptors, and
then returns which descriptors have data to read (output from the
co-process), or a non-full buffer for writing (input into the co-process),
though input buffers are rarely, and indeed difficult, to ever fill enough to
block.

When I say primitive, that is exactly what I mean.  The program was limited
to just the first 32 (on a 32 bit machine) file descriptors (0 to 31), and
returned results in a fairly useless bitmask that shells had difficulty
parsing, to determine which file descriptors are ready for handling.

WARNING: This limitation makes the C-program version useless when using it
with a bash "coproc".  Not that "coproc" allows separation of stdout and
stderr, or other file descriptors, directly.

I wrote an equivalent version using the perl "IO::Select" module, and it has
proved to work very well.  It does not have the 32 file descriptor limit, and
outputs its results in a shell compatible form.  You can download it from
  https://antofthy.gitlab.io/software/#shell_select

A "-t" option will also allow you to specify a timeout (or poll) on the file
descriptors, to allow programs to simply test for input while they continue
to perform other operations while waiting.
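Judging from its use in the example of the next section, the perl
"shell_select" is driven something like this.  This is an inferred usage
sketch only -- check the script itself for its actual options and output
format...

=======8<--------
# Inferred usage sketch of "shell_select" (perl version):
# wait up to 5 seconds for file descriptor 10 or 11 to become readable.
eval `shell_select -t 5 -r 10,11`   # sets $rd_ready, e.g.  rd_ready=10,11

case ",$rd_ready," in
  *,10,*)  echo "fd 10 has output waiting to be read" ;;
esac
[ -z "$rd_ready" ] && echo "timed out -- nothing to read"
=======8<--------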
-------------------------------------------------------------------------------
Co-Processing with both stdout and stderr...

This example uses a "shell_select" program (perl version preferred).
  https://antofthy.gitlab.io/software/#shell_select
You can download it from that location, along with a more interactive
version.

For example, this is a wrapper around the "bc" program to calculate
mathematical expressions the user provides, but it handles stdout, stderr,
and a timeout of the results, each in different ways...

=======8<--------
#!/bin/bash
#
# A "bc" calculator with 8 decimal places and error handling.
#
# Example of setting up a co-process with completely separate handling of
# stdin, stdout, and stderr channels.  Basically this example completely
# wrappers the "bc" program allowing for full program control without
# lockups.
#
# Try it with this example input.
#   a = 3
#   10 / a
#   some error
#   print "some output\n"
#
####
#
# NOTE: Without the '-t 0.0001' in the shell "read" built-in below, a print
# without a newline will lockup the program!  In that event you will not get
# a complete line from the read!
#
mkfifo /tmp/bc_in.$$ /tmp/bc_out.$$ /tmp/bc_err.$$
bc </tmp/bc_in.$$ >/tmp/bc_out.$$ 2>/tmp/bc_err.$$ &
bc_pid=$!

bc_in=10 bc_out=11 bc_err=12
eval "exec $bc_in>/tmp/bc_in.$$"     # open the named pipes on those fds
eval "exec $bc_out</tmp/bc_out.$$"
eval "exec $bc_err</tmp/bc_err.$$"
rm -f /tmp/bc_in.$$ /tmp/bc_out.$$ /tmp/bc_err.$$   # pipes open - remove files

while read -r -p "bc> " user_input; do    # read request from user
  # send user input to "bc" -- assuming it is always okay to send
  echo >&$bc_in "scale=8; $user_input"

  # wait for any output, or timeout -- sets $rd_ready
  eval `shell_select -t 1 -r $bc_out,$bc_err`   # wait/timeout
  #echo "rd_ready = $rd_ready"

  if kill -0 $bc_pid 2>/dev/null; then    # has co-processor exited?
    : all is fine
  else
    wait $bc_pid
    status=$?
    echo "BC EXIT"
    exit $status
  fi

  if [ -z "$rd_ready" ]; then    # any output ready to read?
    echo >&2 "TIMEOUT: no results for expression"
    continue
  fi

  # handle results (stdout) or errors (stderr) from appropriate source
  while true; do
    case ",$rd_ready," in
      *,$bc_out,*)
          read -r -t 0.0001 -u $bc_out result
          echo "Result => $result"
          ;;
      *,$bc_err,*)
          read -r -t 0.0001 -u $bc_err error
          echo "ERROR  => $error"
          ;;
      *)  break ;;     # no valid file descriptor - end of output
    esac
    eval `shell_select -t 0 -r $bc_out,$bc_err`   # poll for more output
    #echo "rd_ready = $rd_ready"
  done

done
echo "NO MORE USER INPUT"
echo >&$bc_in "quit"
wait $bc_pid
exit "$?"
=======8<--------

Try running the above program and feeding it the expressions given in the
comments.  It will handle normal results (from stdout) and errors (from
stderr), as well as timeout if "bc" takes more than a second to do its
calculations, or exits, normally or unexpectedly.

It could have been done using polled reads, but only at the cost of looping
constantly, and thus using up lots of CPU cycles.  That can be slowed by
using "sleep" commands, but that produces a lack of responsiveness, and
longer timeouts to unusual events.

WARNING: This script previously failed if you did not put a "\n" in the
'print' command of the example input.  Can you see why?
(See using bash "read" below)

The real point of using a helper program like "shell_select.pl" is that you
could use it to handle simultaneous output from multiple co-processes, or
even volumes of data far larger than the named pipe buffer size.  Though that
is unlikely, as it is a very large buffer!

The above also demonstrates a type of "waitfor" handler, which could have
been turned into a function or separate program.  And that is the next topic.
It is the use of a simple line read, rather than a true "waitfor", that was
the cause of this program's failure on a missing newline.

---

This of course brings us to using a "waitfor" program, script or function.
I showed one such aspect using the bash "read" builtin in the above script.

===============================================================================
WaitFor or Expect Handling

A "waitfor" program is something that will look for specific strings, such as
a "prompt" or "password" request, before returning.  Better still, it should
allow you to have a list of possible "strings" and return which "string"
matched, as well as handle EOF, and a timeout result.  It should be able to
report the text leading up to that string.

It may also be able to send fixed responses for specific "strings", though
this is not always desirable.  It is often better to have the response
handled by the calling script rather than by the "waitfor" function itself.
A C program form of "waitfor" used to be available at...
  ftp://stealth.acf.nyu.edu/pub/flat/misc-waitfor.c
The original ultra-simple version was from the pty package (published 1992).

---

Here is a shell script called "expect.sh" that does a simple
"waitfor-respond" type task.  Note that it assumes named pipes were used for
the IO.
  http://www.osnews.com/story/10929/Command-line_interactive_programs_in_UNIX_shell-scripts/page4/

=======8<--------
#!/bin/sh
#
# expect.sh "search" "response"
#
# The named pipes out.fifo and in.fifo connect to the
# interactive program, and should be previously setup
# This has no timeout, or error condition handling.
#
while :; do
  dd if=out.fifo bs=1b count=1 2>/dev/null | grep "$1"
  if [ $? -eq 0 ]; then
    echo "$2" > in.fifo    # Match found, send response
    exit 0
  fi
  # Match not found, continue to search.
done
=======8<--------

This is designed using original Bourne Shell constructs only, so it can be
used on very old and odd UNIX systems.  Note the use of 'dd' to read whatever
input is currently available (up to one block), without end-of-line problems.

An improvement to the above is to use the BASH 'read' built-in to terminate
the read on the last character of the requested response string.  For
example, this defines a function to wait for a string, by reading until it
sees the last character of that string.

=======8<--------
waitfor() {
  string="$1"
  lastchar=${string: -1 }
  output=""
  while [ "X${output: -${#string} }" != "X$string" ]; do
    read -r -d "$lastchar" new_output
    output="$output$new_output$lastchar"
  done
}
# ...
waitfor 'assword:'
=======8<--------

Note how it appends the string to what was previously read, and then tests
against the number of characters in the wanted string.  This way it works
even if the last character is not unique in the desired output.

This makes it faster, with less looping than the single character reads of
the previous example, but it only works when looking for a single response
(with no EOF or timeout handling).  However if you are only waiting for an
expected prompt, or a fixed end-of-data marker, the above will work fine.

---

Basically the "waitfor" function is the main core of an "expect" type
package, and just how complex it is depends on just what you need to handle.
Things a 'WaitFor' should handle include...

  + multiple search strings (regular expressions)
  + timeouts
  + end of stream (end-of-file, pipeline close, network shutdowns, program exit)
  + signals (sigchld, sigpipe)
  + optionally multiple channels (stdout & stderr)
  + continuation of data reads (the EOD may become split over two reads)
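As a rough sketch of a few of those requirements (multiple fixed strings,
EOF, and a timeout -- but not regular expressions or multiple channels),
something like the following bash (4.1+) function could be used.  This is my
own illustration, not from any package, and note that the timeout is applied
per character read, not overall...

=======8<--------
# waitfor_any timeout string...
#   Read the co-process output (stdin) one character at a time until one of
#   the given fixed strings is seen.  Returns the number (1, 2, ...) of the
#   string that matched, 0 on EOF, or 255 on timeout.  The text read so far
#   is left in $WAITFOR_TEXT.
waitfor_any() {
  local timeout=$1; shift
  local patterns=("$@") char status i
  WAITFOR_TEXT=""
  while :; do
    IFS= read -r -N 1 -t "$timeout" char;  status=$?
    [ $status -gt 128 ] && return 255     # read timed out
    [ $status -ne 0 ]   && return 0       # EOF - stream closed
    WAITFOR_TEXT+=$char
    for i in "${!patterns[@]}"; do
      case $WAITFOR_TEXT in
        *"${patterns[i]}")  return $((i+1)) ;;   # string number i+1 seen
      esac
    done
  done
}

# Example use (assuming $p_out is the co-process output descriptor)...
waitfor_any 10 'assword:' 'denied' <&$p_out
case $? in
  1)   echo "got a password prompt" ;;
  2)   echo "access was denied" ;;
  0)   echo "co-process closed its output" ;;
  255) echo "timed out waiting for a prompt" ;;
esac
=======8<--------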
===============================================================================
Bash "read" Notes...

As read is a very important part of input handling, here I look at its
details and how it all works.

Normal blocking read (waits for newline)...
    read -r -u $pipe_out line
  Note the "-r" to have it ignore backslash escapes.
  Status is 0 for success, or 1 for EOF.
  WARNING: this will ignore a final incomplete line on EOF.

Blocking read of one character (without waiting for newline)...
    read -r -u $pipe_out -N 1 char
  Status is 0 for success, or 1 for EOF.

Blocking read until EOF...
    read -r -u $pipe_out -d '' string
  Useful for searching for a specific string without regard for newlines.
  (see example above)

Poll input to see if there is anything to read...
    read -r -t 0 -u $pipe_out
  Using a timeout of 0 does not actually read anything, but simply polls the
  input to see if there is something to read.  An exit status of 0 means yes,
  and 1 means there is nothing to read, or EOF.  You can't distinguish
  between no input and EOF at this time, except through the use of a
  'select()' system call (see above for "shell_select").

Non-blocking read of a full line...
    read -r -t 0.00001 -u $pipe_out line
  This is a non-blocking read that times out very quickly.  If a line was
  read, you get an exit status of 0 (okay).  However it will only return a
  result if a full line was available.  If a full line was not available you
  get a weird exit code of 142!  This happens even for an incomplete line
  followed by EOF.  That can be very bad if you want to use it to waitfor
  prompts.

  An exit status of 142 means...  128 (signal) + 14 (SIGALRM or timeout)

  When you see this status, it may be that you can then read the output one
  character at a time using "-n1", or without a delimiter using "-d ''", to
  get all the output.

For more on bash reading see...
  https://antofthy.gitlab.io/info/shell/input_reading.txt

===============================================================================
Miscellaneous examples...

Especially those with a specific problem and solution...

-------------------------------------------------------------------------------
Using expect as a PTY wrapper...

Basically this wraps a command to allow piped, line-based input and output,
even for commands that use /dev/tty for password reading.

Here is an old version of the 'unbuffer' expect script.

=======8<--------
#!/bin/sh
#
# Description: unbuffer stdout/stderr of a program
# Author: Don Libes, NIST
#
# Option: -p  allow run program to read from stdin (for simplification)
#
# Note that TCL can 'continue' a comment line, so the following
# causes the shell to re-run this script as an expect command,
# but the Expect TCL script ignores this line.
# \
exec expect -- "$0" ${1+"$@"}

if {[string compare [lindex $argv 0] "-p"] == 0} {
  # pipeline
  set stty_init "-echo"
  eval spawn -noecho [lrange $argv 1 end]
  interact
} else {
  set stty_init "-opost"
  eval spawn -noecho $argv
  set timeout -1
  expect
}
=======8<--------

Note that the "unbuffer" command is installed as part of the expect package.
I have provided the above as I have actually needed to re-create it as a Perl
Expect PTY wrapper for a co-process in a perl script (for passing passwords
to programs that read them from a terminal).

-------------------------------------------------------------------------------
Awk 3.1 can do co-processing too

But you need to use the special 'flush' pipe operator "|&"

  "info gawk" section 4.9.7 - Using 'getline' from a co-process
  "info gawk" section 11.3  - Two-Way Communications with Another Process
  http://oreilly.com/catalog/awkprog3/chapter/ch10.html

=======8<--------
BEGIN {
  COMMAND = "subprogram args"
  do {
    print data |& COMMAND
    COMMAND |& getline results
  } while (some condition)
  close(COMMAND, "to")
  close(COMMAND)
}
=======8<--------
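For a concrete (if trivial) version of that skeleton -- my own sketch, again
using "bc" as the co-process since it answers one line per request...

=======8<--------
gawk 'BEGIN {
  COMMAND = "bc"                      # use the exact same string everywhere
  print "scale=8; 22/7" |& COMMAND    # send a request to the co-process
  COMMAND |& getline result           # read the reply back
  print "22/7 is approximately", result
  close(COMMAND, "to")                # close the "to" side (send EOF)...
  close(COMMAND)                      # ...then close the whole co-process
}'
=======8<--------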
As is typical of "awk", the command string should be exactly the same in
every use of the "|&" operator.  By putting it in a variable, you can ensure
that this will be the case.  I recommend making it all uppercase to
distinguish its special usage as a file handle.

"getline" does not update NR and FNR, as it is not the normal input stream,
but it does split the input up into fields if the result is not assigned
specifically to a variable (such as "results" in the above).

Output is flushed automatically when using "|&", but the sub-program may also
need to automatically flush its own output, or be 'unbuffer' wrapped.  And
you can not read the sub-process's standard-error, as this only goes to the
same standard-error place as awk itself.

If the 'command' filename is
  /inet/protocol/local-port/remote-host/remote-port
then the bidirectional pipeline is to/from a network stream.  That is, gawk
sees no difference between a network co-process and a sub-program co-process.
Set local-port to '0' when you would like the system to just pick an unused
client port as the send port.  The remote port can be either a number or a
service port name.

For example, talk to the local mail server (SMTP)...

=======8<--------
gawk 'BEGIN {
  MAIL = "/inet/tcp/0/localhost/smtp"
  MAIL |& getline;  print $0
  print "HELO localhost" |& MAIL
  MAIL |& getline;  print $0
  print "quit" |& MAIL
  close(MAIL, "to")
  MAIL |& getline;  print $0
  close(MAIL)
}'
=======8<--------

For more information see "info gawkinet".
Note that close() can close just one end of a bi-directional channel.

Awk can use very special code to generate pty's for sub-processes.
For example

  command = "sort -nr"             # command, save in convenience variable
  PROCINFO[command, "pty"] = 1     # update PROCINFO
  print ... |& command             # start two-way pipe

===============================================================================
NOTES for Specific Commands...

Telnet
    This is generally used as a 'network connection' program, but it has
    problems.  First, it does not read 'piped' input (especially on Sun
    Microsystems computers), so a pty wrapper is needed to pipe into it.
    Second, some versions flush their input after they have connected,
    meaning you have to delay input until after you have read that it has
    connected.

Better solutions...

  mconnect    (under Solaris) is an unbuffered stdin/stdout telnet program.
  tcp_client  Similar, but a bi-directional perl script (lots of variants).
  netcat (nc) can handle tcp or udp, and both the client or server (listen)
              side.

The newest of these types of programs is...

  socat       Described as "netcat" on steroids.  Basically it is a
              'universal' pipeline connector.

See my notes about socat in
  https://antofthy.gitlab.io/info/apps/socat.txt
and its home page
  http://www.dest-unreach.org/socat/

Of course this is only for the communications; you will still need
appropriate co-processing and waitfor handling of the data streams to make it
all work.

===============================================================================