Last update: October 29, 2000
Secrets I learned... about the
Unix filesystem
DOS filesystem tree(s)
Small aside: I hesitated to refer to the operating
system largely associated with Microsoft as "DOS".
The acronym means "disk operating
system", which is a generic term. MS-DOS, PC-DOS, Linux and Unix are all
disk operating systems. Anyway, moving right along...
A DOS filesystem consists of several trees. They have device
names such as A: and B:, representing diskette drives,
and C:, D: and E:, representing, well, things that aren't diskette drives,
such as hard disk drives and CD-ROM drives.
Each one of those trees has a "root". Both DOS and Unix extend this
analogy, in some respects consistent with each other but not in others. We'll
see more varied use of the term "root" on the Unix side, but DOS uses it too.
In DOS, full file specifications begin with the root but,
since there are several roots, they must be distinguished from one another
by specifying the device name as well.
Each device may contain files and sub-directories which "branch" off it. The
sub-directories may themselves contain sub-sub-directories in hierarchical
fashion. After the zero or more directories comes a filename and extension.
A full specification consists of the following parts:
Device:\directory\directory\...\filename
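To see those parts pulled apart, here is a minimal sketch using Python's pathlib, which can parse a DOS-style specification without touching any disk. The path itself is a made-up example of mine, not one from a real system.

```python
from pathlib import PureWindowsPath

# Hypothetical example path; PureWindowsPath only parses the text.
spec = PureWindowsPath(r"C:\GAMES\CHESS\CHESS.EXE")

print(spec.drive)              # the device name
print(spec.root)               # the root of that device's tree
print(spec.parts)              # device+root, then each directory, then the filename
print(spec.stem, spec.suffix)  # filename and extension
```

Note that the device and the root are reported separately, which matches the idea that each device has its own root.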
Until "long filename support", directories and filenames were restricted to
no more than 8 characters with optional extensions of no more than 3
characters. Thus they could be up
to 12 characters in length. Hence, the "8-dot-3 rule".
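The length arithmetic of the 8-dot-3 rule can be sketched as a small check. This is only an illustration: the function name is_8dot3 is my own, and it tests lengths only, ignoring the restrictions real DOS places on which characters are allowed.

```python
import re

# Length rule only: 1-8 characters, then optionally a dot and 1-3 characters.
# Assumption: the character-set restrictions real DOS enforces are ignored here.
_EIGHT_DOT_THREE = re.compile(r"[^.]{1,8}(\.[^.]{1,3})?")

def is_8dot3(name: str) -> bool:
    """Return True if name fits the 8-dot-3 length limits."""
    return _EIGHT_DOT_THREE.fullmatch(name) is not None

for candidate in ["AUTOEXEC.BAT", "README", "LONGFILENAME.TXT", "INDEX.HTML"]:
    print(candidate, is_8dot3(candidate))
```

The first two names pass. The third has too long a name and the fourth too long an extension.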
There is a one-to-one relationship between the name of a file and the data
that it contains, i.e. one file, one name.
Strictly speaking, that's not quite true. You see, for the sake of backward
compatibility, DOS maintains 2 filenames for each file: one which is the long
filename and another that codes the long filename into something that still
follows the 8.3 rule. For this discussion, however, forget about that.
A directory is a table of filenames and physical disk address pointers.
The address pointers say "at this address on the disk, you'll find the
contents of this filename."
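The filesystem also keeps track of available addresses in what amounts to a
"free list", which says "if you want to create a new file, you can use this
address for it". (On FAT disks this bookkeeping lives in the file allocation
table rather than in the directory itself.) When you delete a file, all you're
really doing is marking its directory entry as unused and putting its disk
space back into the free list.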
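The idea can be sketched as a toy model. To be clear, this is not how a real DOS disk is laid out: the class ToyDisk and everything in it are my own invention, and I'm assuming one file fits in one address.

```python
class ToyDisk:
    """Toy model: a directory table plus a free list of addresses."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks          # pretend disk contents
        self.free_list = list(range(n_blocks))   # addresses available for new files
        self.directory = {}                      # filename -> address pointer

    def create(self, name, data):
        address = self.free_list.pop(0)          # take an available address
        self.blocks[address] = data
        self.directory[name] = address

    def delete(self, name):
        address = self.directory.pop(name)       # forget the name...
        self.free_list.append(address)           # ...and recycle the address


disk = ToyDisk(4)
disk.create("LETTER.TXT", "Dear Sir")
address = disk.directory["LETTER.TXT"]
disk.delete("LETTER.TXT")
print(disk.blocks[address])   # the old contents are still sitting on the "disk"
```

In this model, deleting never touches the data itself. The contents stay put until some new file is given that address.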
The Unix filesystem tree
A Unix filesystem tree consists of only one tree.
Its name is simply "root" and it's represented as (just)
/
The root may contain files and sub-directories which "branch" off
it. The sub-directories may themselves contain sub-sub-directories. And so on.
A full specification consists of the following parts:
/directory/directory/.../filename
There is very little restriction as to what characters you can and can't put
in a filename or directory. Take my advice however. Don't use blanks or
non-alphanumeric characters unless you know what you're doing.
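Here's a small sketch of why a blank causes grief. It uses Python's shlex module, which splits a command line roughly the way a shell would. The filename is a hypothetical one of mine.

```python
import shlex

awkward = "my file.txt"   # hypothetical filename containing a blank

# Unquoted, a shell-style split sees two separate arguments after "rm".
print(shlex.split("rm " + awkward))

# Quoted, the name survives as one argument.
safe_command = "rm " + shlex.quote(awkward)
print(safe_command)
print(shlex.split(safe_command))
```

Unquoted, the command would try to remove a file named "my" and another named "file.txt", neither of which is the one you meant.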
Inodes
On the surface, a Unix directory has many similarities to a DOS directory.
(Beneath the surface, however, it's a lot different.) Most notably, the
directory table does not contain physical disk address pointers.
Instead, there's an intermediate stage. Filenames are associated with
inodes (pronounced "eye nodes"). In turn, inodes are (typically)
associated with physical locations on the disk.
This leads to an interesting and powerful feature. More than one filename
may be associated with the same inode! This means that a file can
have more than one name! In fact, each inode has a counter (usually called
the link count) which is the number of filenames that point to it. We'll visit
this again later, demonstrating why this feature is so valuable.
When you delete a file, all you're really doing is removing one name and
reducing that counter. When the counter reaches zero, and no running program
still has the file open, the file is really deleted.
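You can watch the counter at work with a few lines of Python. This is a minimal sketch, assuming a Unix filesystem that supports hard links. The file names are made up, and everything happens in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    first = os.path.join(workdir, "first_name")
    second = os.path.join(workdir, "second_name")

    with open(first, "w") as f:
        f.write("same data\n")

    os.link(first, second)            # add a second name for the same inode

    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_with_two_names = os.stat(first).st_nlink

    os.remove(first)                  # removes one name, not the data
    links_after_delete = os.stat(second).st_nlink
    with open(second) as f:
        surviving_data = f.read()

print(same_inode, links_with_two_names, links_after_delete, repr(surviving_data))
```

Both names report the same inode number, the count goes from 2 back to 1 when one name is removed, and the data is still readable through the surviving name.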
Files, files, everything's a file
It seems that everything is a file in Unix! This may take some mind bending.
Even you yourself (keyboard and screen at least) are a file. When you log
on, you open that file. When you log off, you close it.
Want some proof?
Well, at a terminal, Control-D signals end-of-file.
Press Control-D at an empty shell prompt. You'll be logged off!
You are obviously a rather special type of file. Data from you
doesn't get written to disk like a regular file. Instead those
characters get written to your screen.
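If you'd like to see which "file" you are, the following sketch asks the operating system for the name of the terminal behind a file descriptor. The helper terminal_device is my own name, and what it prints depends entirely on how you run it.

```python
import os
import sys
import tempfile

def terminal_device(fd):
    """Return the path of the terminal file behind fd, or None if fd is not a terminal."""
    return os.ttyname(fd) if os.isatty(fd) else None

# Run interactively, this typically prints a path somewhere under /dev.
# Run with input redirected, stdin is not a terminal and None is printed.
try:
    print(terminal_device(sys.stdin.fileno()))
except (AttributeError, ValueError, OSError):
    print(None)   # stdin is closed or replaced (assumption: some runners do this)

# For comparison, a regular file is not a terminal.
with tempfile.TemporaryFile() as ordinary:
    regular_result = terminal_device(ordinary.fileno())
print(regular_result)
```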
When you do a long directory
listing using ls -l the very first
character on the output line indicates the file type. If that first character
is anything other than "-", the "file" is somehow special. (You may have to
do some mind bending to think of it as a file at all.)
| - | a regular file |
| d | a directory of other files |
| c | a character device, where I/O is done a character at a time, e.g. a keyboard/screen |
| b | a block device, where I/O is done a block of characters at a time, e.g. a disk drive |
| l | a symbolic link (pseudonym) to something else |
| | and there are others |
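The same first character can be obtained from a program. Here's a sketch using Python's stat module. The helper type_char is my own, and I'm assuming a system where /dev/null exists as a character device, as it does on Linux.

```python
import os
import stat
import tempfile

def type_char(path):
    """Return the first character that ls -l would show for path."""
    # lstat, not stat, so a symbolic link is reported as itself.
    return stat.filemode(os.lstat(path).st_mode)[0]

with tempfile.TemporaryDirectory() as workdir:
    regular = os.path.join(workdir, "regular")
    link = os.path.join(workdir, "link")
    open(regular, "w").close()
    os.symlink(regular, link)

    found = {
        "regular": type_char(regular),
        "directory": type_char(workdir),
        "link": type_char(link),
        "null device": type_char("/dev/null"),
    }

print(found)
```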
Linked to my own device
When I installed Linux, literally thousands of "files" were placed in my
/dev (as in device) directory. There were perhaps a dozen
"files" relating to a mouse alone. Each one contains information necessary
for the O/S to relate to a particular kind of mouse.
I have only one mouse. So what goes on here?
Well, assume there were only 3 different "mouse" devices in the /dev
directory (a grossly understated assumption) and that they were named
/dev/mThis, /dev/mThat and /dev/mTheOther.
After the O/S knows which of those 3 is my real mouse, I'll find that
a 4th file mysteriously appears in the /dev directory named (just)
/dev/mouse
What is this /dev/mouse file? In a sense, it's not a real file. Rather,
it's a link to one of the other three. You might think of it as an
"alias" or "synonym". Suppose that /dev/mouse is a link to
/dev/mThat. That means that accessing /dev/mouse is pretty
much the same as accessing /dev/mThat.
That's quite convenient. Other software need only reference the
/dev/mouse file.
If I get some other software that I cannot modify, and it insists
on accessing something like /dev/pointingDevice, I might be in
trouble. Using links however, I've got an alternative. I can make a file called
/dev/pointingDevice and have it also point to my real mouse file.
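The arrangement can be tried out safely away from the real /dev. In this sketch a temporary directory stands in for /dev, plain empty files stand in for the three mouse devices, and I'm assuming the links are symbolic links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as fake_dev:   # stand-in for /dev
    for name in ("mThis", "mThat", "mTheOther"):
        open(os.path.join(fake_dev, name), "w").close()   # plain files as placeholders

    real_mouse = os.path.join(fake_dev, "mThat")
    mouse = os.path.join(fake_dev, "mouse")
    pointing_device = os.path.join(fake_dev, "pointingDevice")

    os.symlink(real_mouse, mouse)             # the alias other software expects
    os.symlink(real_mouse, pointing_device)   # a second alias for the same target

    mouse_target = os.path.basename(os.readlink(mouse))
    both_resolve_to_same = os.path.realpath(mouse) == os.path.realpath(pointing_device)
    same_file = os.path.samefile(pointing_device, real_mouse)

print(mouse_target, both_resolve_to_same, same_file)
```

Both aliases resolve to the same underlying file, so software that opens either name ends up talking to the same thing.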