Last update: October 29, 2000

Secrets I learned... about the Unix filesystem

DOS filesystem tree(s)

Small aside: I hesitated to refer to the operating system largely associated with Microsoft as DOS. The acronym means "disk operating system", which is a generic term. MS-DOS, PC-DOS, Linux and Unix are all disk operating systems. Anyway, moving right along...

A DOS filesystem consists of several trees. They have device names such as A: and B: representing diskette drives and C:, D: and E: representing, well, things that aren't, such as hard disk drives and CD-ROM drives.

Each one of those trees has a "root". Both DOS and Unix extend this analogy, in some respects consistent with each other but not in others. We'll see more varied use of the term "root" on the Unix side, but DOS uses it too.

In DOS, full file specifications begin with the root but, since there are several roots, they must be distinguished from one another by specifying the device name as well.

Each device may contain files and sub-directories which "branch" off it. The sub-directories may themselves contain sub-sub-directories in hierarchical fashion. After the zero or more directories comes a filename and extension.

A full specification consists of the following parts:
   Device : directory\ directory\...\ filename
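To make those parts concrete, here is a small Python sketch that pulls a DOS file specification apart. The function name split_dos_path and the sample path are my own inventions for illustration; PureWindowsPath is from Python's standard pathlib module and only manipulates the text of the path, so it runs on any system.

```python
from pathlib import PureWindowsPath

def split_dos_path(spec):
    """Split a DOS file specification into device, directories and filename."""
    path = PureWindowsPath(spec)
    device = path.drive                    # e.g. "C:"
    directories = list(path.parts[1:-1])   # everything between the root and the filename
    filename = path.name                   # e.g. "EDIT.COM"
    return device, directories, filename

# Hypothetical example path, not taken from any real system.
print(split_dos_path(r"C:\DOS\UTIL\EDIT.COM"))
```

For the sample path this gives the device C:, the directories DOS and UTIL, and the filename EDIT.COM.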

Until "long filename support", directories and filenames were restricted to no more than 8 characters with optional extensions of no more than 3 characters. Thus they could be up to 12 characters in length, counting the dot. Hence, the "8-dot-3 rule".
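The length limits of the 8-dot-3 rule can be sketched in a few lines of Python. This is a simplification on my part: the function name follows_8_dot_3 is hypothetical, and it checks lengths only, not which characters DOS actually allows.

```python
def follows_8_dot_3(name):
    """Return True if name fits the 8.3 length limits (lengths only)."""
    base, _dot, extension = name.partition(".")
    if "." in extension:        # more than one dot is not allowed
        return False
    return 1 <= len(base) <= 8 and len(extension) <= 3

# Sample names chosen for illustration.
for candidate in ("AUTOEXEC.BAT", "README", "INDEX.HTML", "LONGFILENAME.TXT"):
    print(candidate, follows_8_dot_3(candidate))
```

The first two names pass. INDEX.HTML fails on its 4-character extension and LONGFILENAME.TXT fails on its 12-character name.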

There is a one-to-one relationship between the name of a file and the data that it contains, i.e., one file, one name.

Strictly speaking, that's not quite true. You see, for backward compatibility's sake, DOS maintains 2 filenames for each file: one which is the long filename and another that codes the long filename into something that still follows the 8.3 rule. For this discussion, however, forget about that.

A directory is a table of filenames and physical disk address pointers. The address pointers say "at this address on the disk, you'll find the contents of this filename."

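The filesystem also keeps track of unused disk addresses, in effect a "free list", which says "if you want to create a new file, you can use this address for it". (On DOS disks this bookkeeping lives in the file allocation table, the FAT, rather than in the directory itself.) When you delete a file, all you're really doing is marking its directory entry as unused and putting its disk space back on the free list.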
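Here is a toy model of that bookkeeping in Python. It is only a sketch of the idea, not the real on-disk layout: the class ToyDisk, its method names and the one-address-per-file simplification are all my own assumptions.

```python
class ToyDisk:
    """A toy model: a table of filenames, plus a list of unused addresses."""

    def __init__(self, block_count):
        self.blocks = [None] * block_count         # pretend disk contents
        self.directory = {}                        # filename -> disk address
        self.free_list = list(range(block_count))  # addresses not in use

    def create(self, name, data):
        if name in self.directory:
            raise FileExistsError(name)
        if not self.free_list:
            raise OSError("disk full")
        address = self.free_list.pop(0)            # take an unused address
        self.blocks[address] = data
        self.directory[name] = address
        return address

    def delete(self, name):
        address = self.directory.pop(name)         # forget the name...
        self.free_list.append(address)             # ...and recycle the address
        # Note: the data itself is not erased, only made available for reuse.

disk = ToyDisk(4)
where = disk.create("LETTER.TXT", "Dear Sir")
disk.delete("LETTER.TXT")
print(where in disk.free_list, disk.blocks[where])
```

Notice that the old contents are still sitting at the recycled address after the delete. Nothing is wiped until some new file claims that address.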

The Unix filesystem tree
A Unix filesystem tree consists of only one tree.
Its name is simply "root" and it's represented as (just)   /

The root may contain files and sub-directories which "branch" off it. The sub-directories may themselves contain sub-sub-directories. And so on.

A full specification consists of the following parts:
   / directory / directory /... / filename
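A short Python sketch shows the same breakdown for a Unix path. The function name split_unix_path and the sample path are hypothetical; PurePosixPath comes from Python's standard pathlib module and works on the text of the path only.

```python
from pathlib import PurePosixPath

def split_unix_path(spec):
    """Split a Unix path into device, root, directories and filename."""
    path = PurePosixPath(spec)
    directories = list(path.parts[1:-1])   # everything between / and the filename
    return path.drive, path.root, directories, path.name

# Hypothetical example path.
print(split_unix_path("/usr/local/bin/perl"))
```

The device part comes back as an empty string. With only one tree, there is nothing to distinguish, so the path starts at / and that's that.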

There is very little restriction as to what characters you can and can't put in a filename or directory name. Take my advice, however. Don't use blanks or non-alphanumeric characters unless you know what you're doing.

Inodes
On the surface, a Unix directory has many similarities to a DOS directory. (Beneath the surface, however, it's a lot different.) Most notably, the directory table does not contain physical disk address pointers. Instead, there's an intermediate stage. Filenames are associated with inodes (pronounced "eye nodes"). In turn, inodes are (typically) associated with physical locations on the disk.

This leads to an interesting and powerful feature. More than one filename may be associated with the same inode! This means that a file can have more than one name! In fact, each inode has an inode counter (usually called the link count) which is the number of filenames that point to it. We'll visit this again later, demonstrating why this feature is so valuable.

When you delete a file, all you're really doing is removing one filename and reducing the inode counter. When that inode counter reaches zero (and no program still has the file open), the file is really deleted.
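You can watch the inode counter go up and down with a few lines of Python. This is a sketch that assumes a Unix-like system where ordinary users may create hard links; the function name link_count_demo and the filenames are made up, while os.link, os.stat, st_ino and st_nlink are standard.

```python
import os
import tempfile

def link_count_demo():
    """Give one file two names, then remove one, recording the link count."""
    counts = []
    with tempfile.TemporaryDirectory() as workdir:
        first = os.path.join(workdir, "first_name")
        second = os.path.join(workdir, "second_name")
        with open(first, "w") as handle:
            handle.write("some data\n")
        counts.append(os.stat(first).st_nlink)     # one name so far
        os.link(first, second)                     # a second name, same inode
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        counts.append(os.stat(first).st_nlink)     # counter went up
        os.remove(first)                           # take away one name
        counts.append(os.stat(second).st_nlink)    # counter went down again
        with open(second) as handle:
            survived = handle.read() == "some data\n"
    return same_inode, counts, survived

print(link_count_demo())
```

On a typical Unix filesystem both names report the same inode number, and the data is still readable through the second name after the first has been removed.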

Files, files, everything's a file
It seems that everything is a file in Unix! This may take some mind bending. Even you yourself (keyboard and screen at least) are a file. When you log on, you open that file. When you log off, you close it.

Want some proof?
Well, typing Control-D at a terminal signals end-of-file. Press Control-D at an empty shell prompt. In most shells, you'll be logged off!

You are obviously a rather special type of file. Data from you doesn't get written to disk like a regular file. Instead those characters get written to your screen.
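A program reading from a file learns that it has hit the end when a read hands back nothing at all. The Python sketch below shows that, with a pipe standing in for the keyboard (my assumption, so that it runs without anyone typing). Closing the writing end plays the part of Control-D. The function name read_until_eof is hypothetical.

```python
import os

def read_until_eof():
    """Read from a pipe until a zero-length read signals end-of-file."""
    read_end, write_end = os.pipe()
    os.write(write_end, b"hello\n")
    os.close(write_end)                   # no more data will ever arrive
    chunks = []
    while True:
        chunk = os.read(read_end, 1024)
        if chunk == b"":                  # zero bytes back means end-of-file
            break
        chunks.append(chunk)
    os.close(read_end)
    return b"".join(chunks)

print(read_until_eof())
```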

When you do a long directory listing using ls -l the very first character on the output line indicates the file type. If that first character is anything other than "-", the "file" is somehow special. (You may have to do some mind bending to think of it as a file at all.)
- a regular file
d a directory of other files
c a character device where I/O is done a character at a time, e.g. a keyboard/screen
b a block device where I/O is done a block of characters at a time, e.g. a disk drive
l a symbolic link (an alias) to something else
  and there are others
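The same type character can be fetched from Python, which is handy for poking around. This is a sketch: the function name type_char is mine, and the sample paths are assumptions that may not exist on your system (the loop skips any that don't). stat.filemode builds the same kind of mode string that ls -l prints.

```python
import os
import stat

def type_char(path):
    """Return the file-type character that ls -l would show for path."""
    mode = os.lstat(path).st_mode      # lstat, so a link reports on itself
    return stat.filemode(mode)[0]      # first character of e.g. "drwxr-xr-x"

# Sample paths are assumptions; any that are missing are skipped.
for sample in ("/", "/etc/passwd", "/dev/null", "/dev/tty"):
    if os.path.lexists(sample):
        print(type_char(sample), sample)
```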

Linked to my own device
When I installed Linux, literally thousands of "files" were placed in my /dev (as in device) directory. There were perhaps a dozen "files" relating to a mouse alone. Each one contains information necessary for the O/S to relate to a particular kind of mouse.

I have only one mouse. So what goes on here?

Well, assume there were only 3 different "mouse" devices in the /dev directory (a grossly understated assumption) and that they were named /dev/mThis, /dev/mThat and /dev/mTheOther.

After the O/S knows which of those 3 is my real mouse, I'll find that a 4th file mysteriously appears in the /dev directory named (just) /dev/mouse.

What is this /dev/mouse file? In a sense, it's not a real file. Rather, it's a link to one of the other three. You might think of it as an "alias" or "synonym". Suppose that /dev/mouse is a link to /dev/mThat. That means that accessing /dev/mouse is pretty much the same as accessing /dev/mThat.

That's quite convenient. Other software need only reference the /dev/mouse file.

If I get some other software that I cannot modify and that insists on accessing something like /dev/pointingDevice, I might be in trouble. Using links, however, I've got an alternative. I can make a file called /dev/pointingDevice and have it also point to my real mouse file.
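The arrangement can be tried out safely in a scratch directory rather than the real /dev. The Python sketch below assumes symbolic links (the "l" type in ls -l) and uses an ordinary text file as a stand-in for the mouse device. The function name mouse_link_demo and the file contents are made up; os.symlink and os.readlink are standard.

```python
import os
import tempfile

def mouse_link_demo():
    """Point two alias names at one stand-in 'device' file."""
    results = {}
    with tempfile.TemporaryDirectory() as fake_dev:
        real = os.path.join(fake_dev, "mThat")
        with open(real, "w") as handle:
            handle.write("pretend mouse data\n")
        for alias in ("mouse", "pointingDevice"):
            alias_path = os.path.join(fake_dev, alias)
            os.symlink("mThat", alias_path)        # relative target in the same directory
            with open(alias_path) as handle:       # opening the alias opens mThat
                contents = handle.read()
            results[alias] = (os.readlink(alias_path), contents)
    return results

print(mouse_link_demo())
```

Both aliases report mThat as their target and both hand back the same contents, which is all the stubborn software needs.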