Last update: December 23, 2000
Secrets I learned...
about standard input/output
Do one thing and do it well.
The Great Canadian Railroad was a daunting undertaking, linking divergent
communities across thousands of miles through many different conditions.
A single piece of railway track won't get you very far, but, put a lot
of them together, and you have something that will. That's the Unix
approach too. And understanding that concept is central to understanding
Unix.
It would not surprise me to learn that there are as many Unix tools as
that railway has ties. And, just like railway ties, they can be fit together
to form something meaningful, to get you where you need to go.
for example
Suppose, for example, that you need a list of the different people
currently logged onto the system.
The who command lists everybody currently logged onto the system. It's a
comparatively small task and it's all that who does. It has a few options
governing how it does it, and it does it very well.
But it doesn't answer the question because, if someone is logged on more
than once, it reports that someone multiple times.
The uniq command looks at a list and weeds out duplicates. It has
a few options and does its task quite well. Having an eye to the
uniqueness of our user list would help. However, the uniq
command only looks at adjacent items in a list. If our list looks like
alan
bob
carol
alan
then uniq won't be able to recognize that Alan is logged on twice.
The input to uniq must be pre-sorted.
The sort command sorts things. Sorting is really quite a complicated
affair, if it's done efficiently. Unix sort is both efficient
and flexible. Sorting is all it does. And it does it very well.
Now, if we only had a way for who to give its results to
sort which could give its results to uniq which
could give its results to us...
Drip, drip, drip
I'm going to leave you in suspense for a moment while I talk about water
heaters and introduce some important terminology.
- My water heater performs a relatively small task. It heats
water. It provides that heated water to a standard output device.
- What is the standard output of the hot water heater? It's
ME. I'm not much of a plumber. I got soaking wet until I
re-directed that standard output into a pipe.
- My hot water bathtub faucet performs a relatively small task.
When I first installed it, it didn't do much. It sat there patiently
waiting for ME to provide it with some standard input.
Eventually I connected it to the hot water pipe and announced
"I hereby inform you that your standard input has been re-directed
so that it comes from this pipe instead of ME."
- What is the standard output of the faucet? It's ME
sitting there in the bathtub. It's also true that there have been
occasions when I've re-directed that standard output
into another pipe (actually, a hose) so I could fill up a
water bed.
Herein lies a critically important Unix concept.
- Just about everything gets some standard input from somewhere
-- usually ME -- and provides something to standard
output which is also usually ME.
- Standard input may be re-directed so that it
comes from a pipe.
- Standard output may be re-directed so that it
goes to a pipe.
- On the computer, standard input is usually ME which
is my keyboard.
- On the computer, standard output is usually ME which
is my screen.
who re-visited
Let's return to the original exercise: a list of the different
people currently logged onto the system. Using the concepts of standard
input (called stdin), standard output (called stdout), and
pipe re-direction, the answer is at hand.
- Execute the who command, re-directing
its stdout to go to a pipe.
- Execute the sort command, re-directing
its stdin to come from a pipe and
its stdout to go to a pipe.
- Execute the uniq command, re-directing
its stdin to come from a pipe.
(Don't re-direct uniq's output. Leave stdout alone, which
means it will come to ME ie: my screen.)
These sorts of things happen so often in Unix that there are special notations
for them. The vertical bar on a keyboard bears about the closest resemblance to
a pipe. So, if we want the output of command1 to be piped into the
input of command2 we write this:
command1 | command2
So, our answer is revealed by this "command" which is really a series of
commands, each of which does one thing (and does it well).
who | sort | uniq
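One practical wrinkle: on most systems who prints more than a bare name on each line (typically a terminal and a login time as well), so two logins by Alan produce two different lines and uniq has nothing to weed out. Here is a hedged sketch of the same chain with one extra link that keeps only the first column. The sample_who function and its made-up lines are a hypothetical stand-in for who so that the result is repeatable, and awk is my own choice for the extra link, not part of the original example.

```shell
#!/bin/sh
# sample_who is a HYPOTHETICAL stand-in for `who`; the lines are invented
# for illustration and only mimic the usual name/terminal/time layout.
sample_who() {
    printf '%s\n' \
        'alan   pts/0  Dec 23 09:15' \
        'bob    pts/1  Dec 23 09:20' \
        'carol  pts/2  Dec 23 09:31' \
        'alan   pts/3  Dec 23 10:02'
}

# Without sort: the two "alan" lines are not adjacent, so uniq keeps both.
sample_who | awk '{print $1}' | uniq

echo "----"

# With sort: duplicates become adjacent and uniq removes them.
sample_who | awk '{print $1}' | sort | uniq
```

On a real system the equivalent would be along the lines of who | awk '{print $1}' | sort | uniq, assuming the user name is the first column of who's output there.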
File re-directs
Pipes are the intermediators between the links of a chain of tasks.
They are the plumbing that exists between commands. There is always a task
(command, process, etc) on either side of the pipe. The pipe re-directs the
output of the first command into the input of the second command.
There cannot be a pipe at the beginning or end of the chain.
But the beginning and/or end of the chain can still have re-direction from/to
a file.
- The last (or only) command may have its stdout re-directed to go out to
a file. The symbol used is > which
looks rather like an arrow pointing to the right. Since we read
from left to right, it is akin to saying "what happens next". Thus, we
could have something like:
who > foo.bar
which would re-direct who's output destination so that it
instead goes into a file named foo.bar
- The first (or only) command may have its stdin re-directed to come in
from a file. Its symbol is representative of an arrow pointing to
the left, ie: <
The head command (which shows the top few lines of a file)
usually takes one argument which is a filename as in
head foo.bar
However, in the absence of a file name, head assumes its
input is stdin. So, the following command amounts to the same thing:
head < foo.bar
Let's make that distinction clear.
- Pipe re-direction via | redirects the
output of a command into the input of another command, whereas
- File re-direction via >
and < respectively redirect
a command's output or input to a file, not another command.
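To see both file re-directs side by side, here is a small sketch. It works in a scratch directory made with mktemp -d (an assumption: that utility is widely available but not guaranteed everywhere), and it uses printf with made-up names in place of who so the contents of foo.bar are predictable.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file gets clobbered.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" sends printf's stdout into the file foo.bar instead of the screen.
printf '%s\n' alan bob carol > foo.bar

# head given a file name as an argument...
head -n 2 foo.bar

# ...and head with no file name, its stdin re-directed to come from foo.bar.
# Both commands print the same two lines.
head -n 2 < foo.bar
```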
2 variants of output file re-direction
There are, in fact, two variants of output file re-direction. They are similar.
| > | re-directs output by writing it to a file. If the re-direction file already exists, it will be replaced. |
| >> | re-directs output by appending it onto the end of an existing file. If the re-direction file does not already exist, it will be created anew. |
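A quick way to convince yourself of the difference is to write to the same file three times. This is only a sketch; demo.txt is a hypothetical file name and the scratch directory comes from mktemp -d, which I am assuming is available.

```shell
#!/bin/sh
workdir=$(mktemp -d)
log="$workdir/demo.txt"    # hypothetical file name, used only for this demo

echo "first"  >  "$log"    # file does not exist yet: it is created
echo "second" >  "$log"    # file exists: its old contents are replaced
echo "third"  >> "$log"    # file exists: the new line is added at the end

# Shows "second" then "third"; "first" was overwritten by the second write.
cat "$log"
```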
2 variants of input file re-direction
There are, in fact, two variants of input file re-direction as well. Truthfully,
they too are similar, but it may not appear that way at first blush.
| < |
re-directs stdin so that it comes from a file. If a file called
my.list looked like this:
peaches
apples
pears
then the command
doSomething < my.list
would tell doSomething to get its input from
my.list |
| << |
re-directs stdin in a rather special way called "HERE (it) IS".
Instead of saying "the stdin for this command is from some pre-existing
file", this technique essentially says "...and
HEREIS that file".
This technique is most often used within a script, but could also
be used on a command line:
doSomething <<END
peaches
apples
pears
END
This is all one command, although it appears on several lines. We
are re-directing stdin and essentially saying "and here it is". It
is necessary for us to indicate where the input ends, which we
do by specifying some unique string. In this case, I've said that
a line consisting of nothing but the word END marks the
"end of file". |
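Since doSomething is only a placeholder, here is the same "HERE IS" input handed to a real command. I've picked sort purely for illustration; any command that reads stdin would do.

```shell
#!/bin/sh
# Everything between the two END markers becomes sort's stdin.
# The closing END must sit on a line by itself.
sort <<END
peaches
apples
pears
END
```

The three lines come back in alphabetical order (apples, peaches, pears), exactly as if they had been stored in my.list and the command had been sort < my.list.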
Web server CGI programming and pipeline re-directs
CGI programming is a sophisticated example of piping in action. Assume the web
browser (eg: Netscape) sends a request to the web server to execute a program
called myScript and to pass on to myScript some data that the Netscape user has
entered into a form.
Here's what happens.
- The browser encodes the information that the user has entered (and
maybe some more as well) into a series of Name=Value pairs.
The encoded information is all on one line and might look something
like this:
product=gadget&quantity=12
- The web browser sends its request ("Please execute a CGI program
named myScript") to the web server.
- The web server invokes myScript.
- In so doing, it re-directs both myScript's stdin to
come from the web server and myScript's stdout to
go (back) to the web server.
- It then provides stdin for myScript (the encoded form data, when the
form is submitted with the POST method) and then waits for myScript
to generate some stdout.
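To make the Name=Value encoding a little more concrete, here is a rough sketch of how a script might pull such a line apart using nothing but a pipe and read. The split_pairs function is a hypothetical helper of my own, and the sketch deliberately ignores the extra encoding browsers apply to spaces and special characters.

```shell
#!/bin/sh
# The sample line from the text, as it might arrive on stdin.
INPUT='product=gadget&quantity=12'

# split_pairs is a HYPOTHETICAL helper: it turns each "&" into a newline,
# then lets read split every line at the first "=" into name and value.
split_pairs() {
    printf '%s\n' "$1" | tr '&' '\n' | while IFS='=' read -r name value; do
        echo "$name is $value"
    done
}

split_pairs "$INPUT"
```

This prints "product is gadget" followed by "quantity is 12".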
Consider this small shell script:
#!/bin/bash
read -r INPUT                      #### Get user input (one line from stdin) ####
echo "Content-type: text/html"     # HTTP header: says that HTML follows
echo                               # blank line marks the end of the headers
echo "<html><head></head><body>"   # (html stuff)
echo "$INPUT"                      #### Echo back user's input ####
echo "</body></html>"              # (html stuff)
What does this program do? Well, run it from your terminal and see. The
statement
read INPUT
causes the script to wait for you to provide it with some input from
stdin.
The script then writes its few lines (including yours) to stdout.
- Aside from HTML specific lines, this is quite an ordinary script.
- For purposes of debugging your script as you prepare it for
use as a "CGI-script", you can actually run it as an "ordinary"
script.
- When the script does its read (which is from stdin),
remember that there is no particular prompt; the script will be
waiting for one single line which it expects will contain all the
encoded data it will need.
- You can type that line from your keyboard.
- Alternatively, you might prepare a one-line file that contains that
input. Then, you can run your script using the technique of stdin
re-direction that we've discussed:
myScript < myData
- Debugging your script in this fashion, ie: without even using the
web server is a good idea. If you execute your script from the web and
it contains errors, you may not get a helpful error message out of
Netscape.
Remember that your script's stdout goes to the web server
first. The web server passes things on to the Netscape user. If the
web server doesn't like what it sees from your script, it may not pass
on anything too terribly helpful to you as the Netscape user.
By executing your script from the command line, you will see
whatever output it does or doesn't provide. If something is amiss,
you'll be better able to see it.
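The command-line test described above can be sketched end to end as follows. To keep the example self-contained it first writes a copy of the script and a one-line data file into a scratch directory (created with mktemp -d, which I am assuming is available); the names myScript and myData are the ones used in the text, and the data line is the sample from earlier.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Write a copy of the small CGI script. The quoted 'EOF' keeps $INPUT
# from being expanded while the file is being created.
cat > "$workdir/myScript" <<'EOF'
#!/bin/bash
read -r INPUT
echo "Content-type: text/html"
echo
echo "<html><head></head><body>"
echo "$INPUT"
echo "</body></html>"
EOF

# Prepare the one-line input file the script expects on stdin.
printf '%s\n' 'product=gadget&quantity=12' > "$workdir/myData"

# Run the script with its stdin re-directed from the file; whatever it
# would have sent to the web server appears on the screen instead.
bash "$workdir/myScript" < "$workdir/myData"
```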
Once you can successfully exercise myScript independently of the web,
you're well on the way to having it work through the web as well. But you may
not yet be out of the woods. The web server is a "user" and that user is not
you...
- The web server "user" (eg: "apache") does not have your access
permissions.
- Your PATH does not apply to it.
- Your environment does not apply to it.
... so more challenges may lie ahead.
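One rough way to get a feel for the environment part of this, without involving the web server at all, is to run a command with an emptied environment using env -i. This is only an approximation of what a web server does (it says nothing about permissions), and MY_SETTING is a hypothetical variable standing in for anything your login environment normally provides.

```shell
#!/bin/sh
# MY_SETTING is a HYPOTHETICAL variable, exported the way a login
# environment might export something your script quietly relies on.
MY_SETTING=enabled
export MY_SETTING

# Ordinary run: the child shell inherits the variable.
sh -c 'echo "normal run: MY_SETTING=${MY_SETTING:-unset}"'

# env -i starts the child with an empty environment, so the variable
# is gone, much as it would be for a script started by another "user".
env -i /bin/sh -c 'echo "clean run: MY_SETTING=${MY_SETTING:-unset}"'
```

If a script works in the normal run but fails in the clean run, it is probably leaning on something from your own environment.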
Summary
In this document, we've attempted to demonstrate a powerful Unix approach
aimed at linking small components together via re-direction. Rather than
creating large monolithic programs that do everything, Unix comes with
hundreds, if not thousands, of small components that may be tied together to
accomplish a task.
Along the way, we took a small detour and looked at CGI programming and saw
how re-direction applies to it.