What is the difference between a hard link and a file?
The very short answer is:
- a file is an anonymous blob of data
- a hardlink is a name for a file
- a symbolic link is a special file whose content is a pathname
Unix files and directories work exactly like files and directories in the real world (and not like folders in the real world); Unix filesystems are (conceptually) structured like this:
- a file is an anonymous blob of data; it doesn't have a name, only a number (inode)
- a directory is a special kind of file which contains a mapping of names to files (more specifically, to inodes); since a directory is just a file, a directory can contain entries for other directories, and that is how the hierarchy is built (this was not at all obvious when Unix filesystems were introduced: a lot of operating systems back then didn't allow directories to contain directories)
- these directory entries are called hardlinks
- a symbolic link is another special kind of file, whose content is a pathname; this pathname is interpreted as the name of another file
- other kinds of special files are: sockets, fifos, block devices, character devices
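To make the "mapping of names to inodes" concrete, here is a minimal sketch in Python, assuming a Unix-like system. The helper name list_directory_entries and the file names are made up for illustration; os.scandir and DirEntry.inode are standard library calls.

```python
import os
import tempfile


def list_directory_entries(path):
    """Return {name: inode number} for every entry in the directory."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}


with tempfile.TemporaryDirectory() as d:
    # Create one regular file and one subdirectory inside d.
    with open(os.path.join(d, "notes.txt"), "w") as f:
        f.write("hello\n")
    os.mkdir(os.path.join(d, "sub"))

    # The directory itself only stores names and the inode numbers they map to.
    for name, inode in sorted(list_directory_entries(d).items()):
        print(f"{name!r} -> inode {inode}")
```

The inode numbers differ from run to run; the point is only that each name maps to a number, and the subdirectory is just another entry.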
Keeping this metaphor in mind, and specifically keeping in mind that Unix directories work like real-world directories and not like real-world folders explains many of the "oddities" that newcomers often encounter, like: why can I delete a file I don't have write access to? Well, for one, you're not deleting the file, you are deleting one of many possible names for the file, and in order to do that, you only need write access to the directory, not the file. Just like in the real world.
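The "delete a file I can't write to" case can be tried out with a small sketch like this one (Python, Unix-like system assumed; the helper name and file name are hypothetical). The file is made read-only, and removing its name still works because only the directory is modified.

```python
import os
import stat
import tempfile


def remove_readonly_file(directory):
    """Create a read-only file, remove its name, report whether the name still exists."""
    path = os.path.join(directory, "readonly.txt")
    with open(path, "w") as f:
        f.write("you cannot modify me\n")
    os.chmod(path, stat.S_IRUSR)  # 0o400: no write permission on the file, even for the owner
    os.unlink(path)               # needs write permission on the directory, not on the file
    return os.path.exists(path)


with tempfile.TemporaryDirectory() as d:
    print("name still exists:", remove_readonly_file(d))
```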
Or, why can I have dangling symlinks? Well, the symlink simply contains a pathname. There is nothing that says that there actually has to be a file with that name.
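A dangling symlink is easy to produce. The following is a minimal Python sketch (the names make_dangling_symlink, "dangling" and "no-such-file" are invented for the example); it shows that the symlink stores a pathname as plain text and that nobody checks whether that pathname resolves to anything.

```python
import os
import tempfile


def make_dangling_symlink(directory):
    """Create a symlink whose target pathname does not exist."""
    link = os.path.join(directory, "dangling")
    os.symlink("no-such-file", link)  # the target is stored as text, never checked
    return link


with tempfile.TemporaryDirectory() as d:
    link = make_dangling_symlink(d)
    print(os.readlink(link))     # the stored pathname: no-such-file
    print(os.path.islink(link))  # True: the symlink itself exists
    print(os.path.exists(link))  # False: following it finds nothing
```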
My question is simply what is the difference of a file and a hard link ?
The difference between a file and a hard link is the same as the difference between you and the line with your name in the phone book.
Hard link is pointing to an inode, so what is a file ? Inode entry itself ? Or an Inode with a hard link ?
A file is an anonymous piece of data. That's it. A file is not an inode, a file has an inode, just like you are not a Social Security Number, you have an SSN.
A hard link is a name for a file. A file can have many names.
Let's say, I create a file with touch, then an Inode entry is created in the Inode Table.
Yes.
And I create a hard link, which has the same Inode number with the file.
No. A hard link doesn't have an inode number, since it's not a file. Only files have inode numbers.
The hardlink associates a name with an inode number.
So did I create a new file ?
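With touch, yes. With the hard link, no: that only added a second name for the file that already existed.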
Or the file is just defined as an Inode ?
No. The file has an inode, it isn't an inode.
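The exchange above can be replayed as a small Python sketch (Unix-like system assumed; the function and file names are made up). One file gets created, then a second name is attached to it, and both names resolve to the same inode.

```python
import os
import tempfile


def create_file_and_hard_link(directory):
    """Create one file, then give it a second name with a hard link."""
    first = os.path.join(directory, "first-name")
    second = os.path.join(directory, "second-name")
    with open(first, "w") as f:  # creates the file (and its inode) plus its first name
        f.write("data\n")
    os.link(first, second)       # adds another name; no new file, no new inode
    return first, second


with tempfile.TemporaryDirectory() as d:
    first, second = create_file_and_hard_link(d)
    print(os.stat(first).st_ino == os.stat(second).st_ino)  # both names lead to one inode
    print(os.path.samefile(first, second))                  # same check, done by the library
    print(sorted(os.listdir(d)))                            # two directory entries
```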
A hard link is a directory entry. A file may have multiple directory entries, if it's present under different names or in different directories. A directory entry is called “hard link” when it's put in relation with other directory entries for the same file.
The inode contains the file's metadata other than its name and contents (location of the contents, permissions, timestamps, etc.). There's one inode per file. (Not all filesystems put the metadata in a clearly identifiable space on disk that you could call “inode”, but it's a common architecture.) A directory entry links a name to an inode. It's possible for more than one directory entry to link to the same inode, hence the term “link”. Such a link is called a “hard link” as opposed to “soft links” or “symbolic links”, which don't say “for this name, use this inode” but “for this name, look up that other name”.
Think of files as rooms and directory entries as doors. “Open the file /foo/bar” means “go to corridor /foo and go to room bar”. “Go to room bar” really means “open the door marked bar and enter the room”, but “go to room bar” is simply a shorter, unremarkable way of saying the same thing. It's possible to have more than one door leading to the same room.
When you create a hard link to an existing file (ln existing new), you're creating a second link to the same file, i.e. you're creating a new directory entry that links to the already-existing file. After creation, the two directory entries have equal status: there isn't one that is “primary” and one that's “secondary”, they're just both links to the same file.
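A quick way to convince yourself that neither link is "primary" is to remove the older name and read the data through the newer one. This is a minimal Python sketch; the names existing and new mirror the ln example, everything else is hypothetical.

```python
import os
import tempfile


def read_after_removing_original(directory):
    """Link 'existing' to 'new', remove 'existing', then read through 'new'."""
    existing = os.path.join(directory, "existing")
    new = os.path.join(directory, "new")
    with open(existing, "w") as f:
        f.write("payload")
    os.link(existing, new)  # same effect as: ln existing new
    os.unlink(existing)     # remove the name that happened to be created first
    with open(new) as f:    # the file is untouched and still reachable
        return f.read()


with tempfile.TemporaryDirectory() as d:
    print(read_after_removing_original(d))
```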
You can also remove all the links to a file without removing the file itself. This happens if you delete a file (i.e. you remove all its directory entries) while a program still has the file open. The file remains on the filesystem, it's only actually removed when the last process that had the file open closes it. In the room-and-doors metaphor, a room that has no doors still takes up space.
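The "room without doors" situation can be sketched in Python as follows (Unix-like system assumed; names are invented for the example). The only directory entry is removed while the file is still open, and the open handle keeps working. The link count is returned but not interpreted here, since what it reports after the last unlink is up to the filesystem.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Remove a file's only name while it is open, then keep using it."""
    path = os.path.join(directory, "scratch")
    with open(path, "w+") as f:
        f.write("still here")
        f.flush()
        os.unlink(path)                    # remove the only directory entry
        gone = not os.path.exists(path)    # the name can no longer be looked up
        f.seek(0)
        data = f.read()                    # the open descriptor still reaches the data
        links = os.fstat(f.fileno()).st_nlink
    # Once the handle is closed, the filesystem is free to reclaim the space.
    return gone, data, links


with tempfile.TemporaryDirectory() as d:
    gone, data, links = read_after_unlink(d)
    print("name gone:", gone)
    print("data read through open handle:", data)
    print("link count reported by fstat:", links)
```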
In addition to all other answers I want to point out the following important properties:
A softlink is a true reference, i.e. it is a small file that contains a pathname. Resolving a softlink happens transparently to the application: if a process opens a file, say /this/path/here, which is a symlink pointing to /that/other/path, then the entire handling of opening /that/other/path is done by the OS. Furthermore, if /that/other/path happens to be a symlink itself, then this is also dealt with by the OS. In fact, the OS follows the chain of symlinks until it finds something else (e.g. a regular file) or until it has followed SYMLOOP_MAX (see sysconf(3)) links, in which case the OS (more precisely: the corresponding system call) returns an error and sets errno to ELOOP. Thus, a circular reference like xyz -> xyz will not stall the process. (For Linux systems see path_resolution(7) for full details.)
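The circular case xyz -> xyz is harmless to try. In this Python sketch (Unix-like system assumed; the helper name is hypothetical) the open call should fail quickly with an OSError carrying ELOOP rather than hang.

```python
import errno
import os
import tempfile


def open_circular_symlink(directory):
    """Create xyz -> xyz and return the errno raised when trying to open it."""
    link = os.path.join(directory, "xyz")
    os.symlink("xyz", link)  # relative target: resolves to the symlink itself
    try:
        with open(link):
            pass
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as d:
    code = open_circular_symlink(d)
    print(errno.errorcode.get(code, code))
```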
Note that a process can check whether a pathname is a symlink or not through the use of lstat(2) and may modify its file attributes (stored in the inode table) through lchown(2) and others (see symlink(7) for the whole story).
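From Python, the lstat(2) check looks roughly like this (a sketch; describe and the file names are made up, os.lstat, os.stat and the stat.S_IS* predicates are standard). os.lstat inspects the symlink itself, while os.stat follows it.

```python
import os
import stat
import tempfile


def describe(path):
    """Classify a pathname without following a final symlink."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    link = os.path.join(d, "link")
    open(target, "w").close()
    os.symlink(target, link)

    print(describe(target))                        # regular file
    print(describe(link))                          # symlink
    print(stat.S_ISREG(os.stat(link).st_mode))     # True: stat follows the link
```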
Now, in terms of permissions, you will notice that symlinks (on Linux) always have permissions 777 (rwxrwxrwx in symbolic notation). This is due to the fact that any other permissions could be bypassed by accessing the actual file anyway. Conversely, 777 for a symlink does not make the symlinked file accessible if it was not accessible in the first place. For instance, a symlink with permissions 777 pointing to a file with permissions 640 does not make the file accessible for "other" (the general public). In other words, a file xyz is accessible through a symlink if and only if it is directly accessible, i.e. without indirection. Thus, the symlink's permissions have no security effect whatsoever.
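The "if and only if" claim can be checked with a sketch like the following (Python, Unix-like system assumed; the helper name and the file name xyz-link are hypothetical). It compares read access to the file directly and via a symlink for two different file modes. The symlink's own mode is printed for reference; Linux reports 0o777, other systems may show something else. Run as root, both access checks come out true regardless of mode, which is still consistent with the claim.

```python
import os
import stat
import tempfile


def readable_directly_and_via_link(directory, mode):
    """Return (symlink mode, readable directly?, readable through the symlink?)."""
    target = os.path.join(directory, "xyz")
    link = os.path.join(directory, "xyz-link")
    with open(target, "w") as f:
        f.write("secret")
    os.chmod(target, mode)
    os.symlink(target, link)
    link_mode = stat.S_IMODE(os.lstat(link).st_mode)  # permissions of the symlink itself
    return link_mode, os.access(target, os.R_OK), os.access(link, os.R_OK)


for mode in (0o640, 0o000):
    with tempfile.TemporaryDirectory() as d:
        link_mode, direct, via_link = readable_directly_and_via_link(d, mode)
        print(f"file mode {mode:03o}, symlink mode {link_mode:03o}: "
              f"direct={direct}, via symlink={via_link}")
```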
One of the main visible differences between hardlinks and symlinks (a.k.a. softlinks) is that symlinks work across filesystems while hardlinks are confined to one filesystem. That is, a file on partition A can be symlinked to from partition B, but it cannot be hardlinked from there. This is clear from the fact that a hardlink is actually an entry in a directory, which consists of a file name and an inode number, and that inode numbers are unique only per file system.
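In code, the filesystem boundary shows up as the error EXDEV from link(2). Here is a minimal Python sketch of a fallback strategy; link_or_symlink and same_filesystem are hypothetical helpers, not part of any library, and the demo only exercises the same-filesystem path because a second filesystem cannot be assumed.

```python
import errno
import os
import tempfile


def same_filesystem(a, b):
    """True if both paths live on the same filesystem (same device number)."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def link_or_symlink(source, destination):
    """Hard-link if possible; fall back to a symlink across filesystems."""
    try:
        os.link(source, destination)
        return "hard link"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV: names are on different filesystems
            raise
    os.symlink(os.path.abspath(source), destination)
    return "symlink"


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "data")
    open(src, "w").close()
    print(same_filesystem(src, d))
    print(link_or_symlink(src, os.path.join(d, "alias")))
```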
The term hardlink is actually somewhat misleading. While for symlinks source and destination are clearly distinguishable (the symlink has its own entry in the inode table), this is not true for hardlinks. If you create a hardlink for a file, the original entry and the hardlink are indistinguishable in terms of what was there first. (Since they refer to the same inode, they share their file attributes such as owner, permissions, timestamps etc.) This leads to the statement that every directory entry is actually a hardlink, and that hardlinking a file just means creating a second (or third, or fourth...) hardlink. In fact, each inode stores a counter for the number of hardlinks to that inode.
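The per-inode counter is visible as st_nlink in the stat result. The sketch below (Python, typical Unix filesystem assumed; helper and file names are invented) records the counter as names are added and removed, and also shows that a permission change made through one name is seen through another.

```python
import os
import stat
import tempfile


def link_counts(directory):
    """Track st_nlink while adding and removing names for one file."""
    a, b, c = (os.path.join(directory, n) for n in ("a", "b", "c"))
    open(a, "w").close()
    counts = [os.stat(a).st_nlink]       # one name so far
    os.link(a, b)
    counts.append(os.stat(a).st_nlink)   # two names
    os.link(a, c)
    counts.append(os.stat(a).st_nlink)   # three names
    os.unlink(a)
    counts.append(os.stat(b).st_nlink)   # back to two; the file itself is unaffected

    os.chmod(b, 0o600)                   # change attributes through one name...
    mode_seen_via_c = stat.S_IMODE(os.stat(c).st_mode)  # ...and read them through another
    return counts, mode_seen_via_c


with tempfile.TemporaryDirectory() as d:
    counts, mode = link_counts(d)
    print(counts)
    print(oct(mode))
```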
Finally, note that ordinary users may not hardlink directories. The reason is that it must be done with the utmost caution: an unwary user may introduce cycles into the otherwise strictly hierarchical file tree, which the usual tools (like fsck) and the OS itself are not prepared to deal with.
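You can watch the refusal happen with a short Python sketch (the helper name and directory names are hypothetical). On Linux, link(2) is documented to fail with EPERM when the source is a directory; other systems may report a different error, so the sketch just returns whatever errno name comes back.

```python
import errno
import os
import tempfile


def try_hardlink_directory(directory):
    """Attempt to hard-link a directory; return the errno name, or None on success."""
    source = os.path.join(directory, "realdir")
    alias = os.path.join(directory, "alias")
    os.mkdir(source)
    try:
        os.link(source, alias)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


with tempfile.TemporaryDirectory() as d:
    print(try_hardlink_directory(d))
```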