Last update: December 23, 2000

Secrets I learned... about standard input/output

Do one thing and do it well.
The Great Canadian Railroad was a daunting undertaking, linking divergent communities across thousands of miles through many different conditions.

A single piece of railway track won't get you very far, but, put a lot of them together, and you have something that will. That's the Unix approach too. And understanding that concept is central to understanding Unix.

It would not surprise me to learn that there are as many Unix tools as that railway has ties. And, just like railway ties, they can be fit together to form something meaningful, to get you where you need to go.

for example
Suppose, for example, that you need a list of the different people currently logged onto the system.

The who command lists everybody on the system. It's a comparatively small task and it's all that who does. It has a few options on how it should do it and it does it very well.
But it doesn't answer the question because, if someone is logged on more than once, it reports that someone multiple times.

The uniq command looks at a list and weeds out duplicates. It has a few options and does its task quite well. Having an eye to the uniqueness of our user list would help. However, the uniq command only looks at adjacent items in a list. If our list looks like
      alan
      bob
      carol
      alan
then uniq won't be able to recognize that Alan is logged on twice. The input to uniq must be pre-sorted.

The sort command sorts things. Sorting is really quite a complicated affair, if it's done efficiently. Unix sort is both efficient and flexible. Sorting is all it does. And it does it very well.

Now, if we only had a way for who to give its results to sort which could give its results to uniq which could give its results to us...

Drip, drip, drip
I'm going to leave you in suspense for a moment while I talk about water heaters and introduce some important terminology.
  • My water heater performs a relatively small task. It heats water. It provides that heated water to a standard output device.
  • What is the standard output of the hot water heater? It's ME. I'm not much of a plumber. I got soaking wet until I re-directed that standard output into a pipe.
  • My hot water bathtub faucet performs a relatively small task. When I first installed it, it didn't do much. It sat there patiently waiting for ME to provide it with some standard input. Eventually I connected it to the hot water pipe and announced "I hereby inform you that your standard input has been re-directed so that it comes from this pipe instead of ME."
  • What is the standard output of the faucet? It's ME sitting there in the bathtub. It's also true that there have been occasions when I've re-directed that standard output into another pipe (actually, a hose) so I could fill up a water bed.
Herein lies a critically important Unix concept.
  • Just about everything gets some standard input from somewhere -- usually ME -- and provides something to standard output which is also usually ME.
  • Standard input may be re-directed so that it comes from a pipe.
  • Standard output may be re-directed so that it goes to a pipe.
  • On the computer, standard input is usually ME which is my keyboard.
  • On the computer, standard output is usually ME which is my screen.
who re-visited
Let's return to the original exercise: a list of the different people currently logged onto the system. Using the concepts of standard input (called stdin), standard output (called stdout), and pipe re-direction, the answer is at hand.
  • Execute the who command, re-directing
          its stdout to go to a pipe.
  • Execute the sort command, re-directing
          its stdin to come from a pipe and
          its stdout to go to a pipe.
  • Execute the uniq command, re-directing
          its stdin to come from a pipe.
    (Don't re-direct uniq's output. Leave stdout alone, which means it will come to ME ie: my screen.)
These sorts of things happen so often in Unix that there are special notations for them. The vertical bar on a keyboard bears about the closest resemblance to a pipe. So, if we want the output of command1 to be piped into the input of command2 we write this:
command1 | command2

So, our answer is revealed by this "command" which is really a series of commands, each of which does one thing (and does it well).

who | sort | uniq
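
If you'd like to try the pipeline without needing several people logged on, here is a small sketch. It uses printf as a stand-in for who, and the function name sample_logins is made up for this illustration; the names are the sample list from earlier.

```shell
# Stand-in for who: prints the sample list, with alan logged on twice.
sample_logins() {
    printf '%s\n' alan bob carol alan
}

# uniq alone only compares adjacent lines, so both alans survive.
sample_logins | uniq

# Sorting first puts the duplicates next to each other, so uniq can drop one.
sample_logins | sort | uniq
```

One caveat: on most systems who prints a terminal name and login time beside each user, so two logins by the same person are not identical lines. Assuming the user name is the first space-delimited column, one more small tool in the chain takes care of that: who | cut -d' ' -f1 | sort | uniq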

File re-directs

Pipes are the intermediators between the links of a chain of tasks. They are the plumbing that exists between commands. There is always a task (command, process, etc) on either side of the pipe. The pipe re-directs the output of the first command into the input of the second command.

There cannot be a pipe at the beginning or end of the chain. But the beginning and/or end of the chain can still have re-direction from/to a file.

  • The last (or only) command may have its stdout re-directed to go out to a file. The symbol used is > which looks rather much like an arrow pointing to the right. Since we read from left to right, it is akin to saying "what happens next". Thus, we could have something like:
    who > foo.bar
    which would re-direct who's output destination so that it instead goes into a file named foo.bar
  • The first (or only) command may have its stdin re-directed to come in from a file. Its symbol is representative of an arrow pointing to the left, ie: <
    The head command (which shows the top few lines of a file) usually takes one argument which is a filename as in
    head foo.bar
    However, in the absence of a file name, head assumes its input is stdin. So, the following command amounts to the same thing:
    head < foo.bar
Let's make that distinction clear.
  • Pipe re-direction via | redirects the output of a command into the input of another command, whereas
  • File re-direction via > and < respectively redirect a command's output or input to a file, not another command.
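
Here is a minimal sketch of both file re-directs in action. It works in a throwaway directory (created with mktemp, which is my addition, as is the file name sorted.bar) so that nothing of yours gets overwritten.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

# Send stdout to a file instead of the screen.
printf '%s\n' peaches apples pears > foo.bar

# head given a file name, and head reading the same file through stdin.
# Both print the first two lines.
head -n 2 foo.bar
head -n 2 < foo.bar

# Both ends at once: sort reads from one file and writes to another.
sort < foo.bar > sorted.bar
cat sorted.bar
```

Notice that in the last command there is no pipe at all. The chain has only one link, and each end of it is attached to a file.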
2 variants of output file re-direction
There are, in fact, two variants of output file re-direction. They are similar.
> re-directs output by writing it to a file. If the re-direction file already exists, it will be replaced. If it does not exist, it will be created.
>> re-directs output by appending it onto the end of an old file. If the re-direction file does not already exist, it will be created anew.
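
The difference is easy to see with a few echo commands. This is only a sketch; the file names log.txt and new.txt are invented for the example, and it again works in a throwaway directory.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

echo first   > log.txt   # creates log.txt
echo second  > log.txt   # replaces the contents: only "second" remains
echo third  >> log.txt   # appends: "second", then "third"
echo fourth >> new.txt   # new.txt did not exist, so >> creates it

cat log.txt
cat new.txt
```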
2 variants of input file re-direction
There are, in fact, two variants of input file re-direction as well. Truthfully, they too are similar, but it may not appear that way at first blush.
< re-directs stdin so that it comes from a file. If a file called my.list looked like this:
    peaches
    apples
    pears

then the command
    doSomething < my.list
would tell doSomething to get its input from my.list
<< re-directs stdin in a rather special way called "HERE (it) IS". Instead of saying "the stdin for this command is from some pre-existing file", this technique essentially says "...and HEREIS that file".

This technique is most often used within a script, but could also be used in a command line
    doSomething <<END
    peaches
    apples
    pears
    END

This is all one command, although it appears on several lines. We are re-directing stdin and essentially saying "and here it is". It is necessary for us to indicate where the input file ends, which we do by specifying some unique string. In this case, I've said that a line consisting of nothing but the word END is the "end of file".
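
To make this something you can actually run, here is a sketch in which sort stands in for doSomething. Sending the result to a file named sorted.list is my own addition, to show that << and > can be used on the same command.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

# sort stands in for doSomething. Its stdin is the three lines between
# the command and the END marker; its stdout goes to sorted.list.
sort <<END > sorted.list
peaches
apples
pears
END

cat sorted.list
```

When you type this yourself, the closing END must begin in the very first column of its line. With plain << the shell will not recognize an indented END as the end of the input.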
Web server CGI programming and pipeline re-directs

CGI programming is a sophisticated example of piping in action. Assume the web browser (eg: Netscape) sends a request to the web server to execute a program called myScript and to pass onto myScript some data that the Netscape user has entered into a form. (What follows describes a form submitted with the POST method, which is the case where the data arrives on stdin.)

Here's what happens.

  1. The browser encodes the information that the user has entered (and maybe some more as well) into a series of Name=Value pairs. The encoded information is all on one line and might look something like this:
          product=gadget&quantity=12
  2. The web browser sends its request ("Please execute a CGI program named myScript") to the web server.
  3. The web server invokes myScript.
  4. In so doing, it re-directs both myScript's stdin to come from the web server and myScript's stdout to go (back) to the web server.
  5. It then provides stdin for myScript and then waits for myScript to generate some stdout.
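
The encoded line from step 1 can itself be taken apart with a small pipeline. What follows is only a rough sketch: the function name split_pairs is made up, and it does not undo the encoding that browsers apply to spaces and special characters (the + and %XX sequences).

```shell
# Read one encoded line on stdin and print each Name=Value pair.
split_pairs() {
    # tr turns each & into a newline, giving one pair per line;
    # read then splits each line at the first = sign.
    tr '&' '\n' | while IFS='=' read -r name value; do
        printf '%s is %s\n' "$name" "$value"
    done
}

# printf plays the part of the web server supplying stdin.
printf 'product=gadget&quantity=12\n' | split_pairs
```

Once again, each piece does one small thing: tr swaps one character for another, and read picks a line apart.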
Consider this small shell script:
 #!/bin/bash
 read INPUT                       #### Get user input (one line from stdin) ####
 echo "Content-type: text/html"   # HTTP header: tells the browser that HTML follows
 echo                             # blank line: marks the end of the headers
 echo "<html><head></head><body>" # (html stuff)
 echo "$INPUT"                    #### Echo back user's input ####
 echo "</body></html>"            # (html stuff)

What does this program do? Well, run it from your terminal and see. The statement
      read INPUT
causes the script to wait for you to provide it with one line of input on stdin.
The script then displays its few lines (including yours) to stdout.

  • Aside from HTML specific lines, this is quite an ordinary script.
  • For purposes of debugging your script as you prepare it for use as a "CGI-script", you can actually run it as an "ordinary" script.
  • When the script does its read (which is from stdin), remember that there is no particular prompt; the script will be waiting for one single line which it expects will contain all the encoded data it will need.
  • You can type that line from your keyboard.
  • Alternatively, you might prepare a one-line file that contains that input. Then, you can run your script using the technique of stdin re-direction that we've discussed:
          myScript < myData
  • Debugging your script in this fashion, ie: without even using the web server, is a good idea. If you execute your script from the web and it contains errors, you may not get a helpful error message out of Netscape.

    Remember that your script's stdout goes to the web server first. The web server passes things on to the Netscape user. If the web server doesn't like what it sees from your script, it may not pass on anything too terribly helpful to you as the Netscape user.

    By executing your script from the command line, you will see whatever output it does or doesn't provide. If something is amiss, you'll be better able to see it.
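
Put together, a command-line debugging session might look something like the sketch below. It recreates myScript and myData in a throwaway directory using the HERE IS technique from earlier. The quoted 'EOF' marker keeps the shell from expanding $INPUT while the script file is being written. I run the script through sh so the sketch does not depend on execute permission; that works here only because this particular script uses nothing specific to bash.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate the script from the text.
cat > myScript <<'EOF'
#!/bin/bash
read INPUT
echo "Content-type: text/html"
echo
echo "<html><head></head><body>"
echo "$INPUT"
echo "</body></html>"
EOF

# A one-line data file holding what the browser would have sent.
echo 'product=gadget&quantity=12' > myData

# Run the script with stdin coming from the file; stdout comes to the screen.
sh ./myScript < myData
```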

Once you can successfully exercise myScript independently of the web, you're well on the way to having it work through the web as well. But you may not yet be out of the woods. The web server is a "user" and that user is not you...

  • The web server "user" (eg: "apache") does not have your access permissions
  • Your PATH does not apply to it.
  • Your environment does not apply to it.
... so more challenges may lie ahead.
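
One way to get a rough feel for the environment difference, without involving the web server at all, is to run a script with env -i, which starts a command with an emptied environment. This is only an approximation under my own assumptions: the script name showEnv, the variable MY_SETTING and the minimal PATH are invented for the sketch, and it does nothing to imitate the web server user's different file permissions.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

# A tiny stand-in script that reports what environment it sees.
cat > showEnv <<'EOF'
echo "PATH is $PATH"
echo "MY_SETTING is ${MY_SETTING:-not set}"
EOF

# Something set in your own login environment (hypothetical variable).
MY_SETTING=hello
export MY_SETTING

# Run normally: the script sees your PATH and your setting.
sh ./showEnv

# Run with an emptied environment and a minimal PATH: the setting is gone.
env -i PATH=/usr/bin:/bin sh ./showEnv
```

If your script only works in the first run, it is probably leaning on something from your own environment that the web server will not supply.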
Summary

In this document, we've attempted to demonstrate a powerful Unix approach aimed at linking small components together via re-direction. Rather than creating large monolithic programs that do everything, Unix comes with hundreds, if not thousands, of small components that may be tied together to accomplish a task.

Along the way, we took a small detour and looked at CGI programming and saw how re-direction applies to it.