How does Unix keep track of a user's working directory when navigating the file system?
The other answers are oversimplifications, each presenting only parts of the story, and are wrong on a couple of points.
There are two ways in which the working directory is tracked:
- For every process, in the kernel-space data structure that represents that process, the kernel stores two vnode references to the vnodes of the working directory and the root directory for that process. The former reference is set by the `chdir()` and `fchdir()` system calls, the latter by `chroot()`. One can see them indirectly in `/proc` on Linux operating systems or via the `fstat` command on FreeBSD and the like:

      % fstat -p $$|head -n 5
      USER     CMD          PID   FD MOUNT              INUM MODE         SZ|DV R/W
      JdeBP    zsh        92648 text /                 24958 -r-xr-xr-x  702360  r
      JdeBP    zsh        92648 ctty /dev                148 crw--w----   pts/4 rw
      JdeBP    zsh        92648   wd /usr/home/JdeBP       4 drwxr-xr-x     124  r
      JdeBP    zsh        92648 root /                     4 drwxr-xr-x      35  r
      %

  When pathname resolution operates, it begins at one or the other of those referenced vnodes, according to whether the path is relative or absolute. (There is a family of `…at()` system calls that allow pathname resolution to begin at the vnode referenced by an open (directory) file descriptor as a third option.)

  In microkernel Unices the data structure is in application space, but the principle of holding open references to these directories remains the same.
- Internally, within shells such as the Z, Korn, Bourne Again, C, and Almquist shells, the shell additionally keeps track of the working directory using string manipulation of an internal string variable. It does this whenever it has cause to call `chdir()`.

  If one changes to a relative pathname, it manipulates the string to append that name. If one changes to an absolute pathname, it replaces the string with the new name. In both cases, it adjusts the string to remove `.` and `..` components and to chase down symbolic links, replacing them with their linked-to names. (Here is the Z shell's code for that, for example.)

  The name in the internal string variable is tracked by a shell variable named `PWD` (or `cwd` in the C shells). This is conventionally exported as an environment variable (named `PWD`) to programs spawned by the shell.
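
The string bookkeeping in the second point can be sketched roughly as follows. This is a minimal illustration in Python, not any shell's actual code: `logical_cd` is a hypothetical name, and the sketch covers only the "logical" case, so it removes `.` and `..` textually and makes no attempt to chase symbolic links.

```python
import posixpath

def logical_cd(pwd, target):
    """Return the new tracked name after changing from pwd to target.

    Hypothetical helper: pure string manipulation, no system calls.
    """
    if target.startswith("/"):
        # Absolute pathname: replace the tracked string outright.
        new = target
    else:
        # Relative pathname: append it to the tracked string.
        new = posixpath.join(pwd, target)
    # Drop "." components and cancel each ".." against the name before it.
    return posixpath.normpath(new)

print(logical_cd("/usr/home/JdeBP", "b"))               # /usr/home/JdeBP/b
print(logical_cd("/usr/home/JdeBP/b", ".."))            # /usr/home/JdeBP
print(logical_cd("/usr/home/JdeBP/b", "/dev/./fd/..")) # /dev
```

A shell would then hand the resulting name to `chdir()` and, if that succeeds, store it in its internal variable.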
These two methods of tracking things are revealed by the `-P` and `-L` options to the `cd` and `pwd` shell built-in commands, and by the differences between the shells' built-in `pwd` commands and both the `/bin/pwd` command and the built-in `pwd` commands of things like (amongst others) VIM and NeoVIM.

    % mkdir a ; ln -s a b
    % (cd b; pwd; /bin/pwd; printenv PWD)
    /usr/home/JdeBP/b
    /usr/home/JdeBP/a
    /usr/home/JdeBP/b
    % (cd b; pwd -P; /bin/pwd -P)
    /usr/home/JdeBP/a
    /usr/home/JdeBP/a
    % (cd b; pwd -L; /bin/pwd -L)
    /usr/home/JdeBP/b
    /usr/home/JdeBP/b
    % (cd -P b; pwd; /bin/pwd; printenv PWD)
    /usr/home/JdeBP/a
    /usr/home/JdeBP/a
    /usr/home/JdeBP/a
    % (cd b; PWD=/hello/there /bin/pwd -L)
    /usr/home/JdeBP/a
    %

As you can see: obtaining the "logical" working directory is a matter of looking at the `PWD` shell variable (or environment variable if one is not the shell program); whereas obtaining the "physical" working directory is a matter of calling the `getcwd()` library function.
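
For a program that is not the shell, the two lookups might look like this. It is a minimal Python sketch; `logical_and_physical` is a hypothetical name, and because Python's `os.chdir()` does not update `PWD`, the demonstration supplies by hand the value a shell would have exported.

```python
import os
import tempfile

def logical_and_physical(environ=os.environ):
    """Return (logical, physical) names of the working directory."""
    logical = environ.get("PWD")  # whatever the parent recorded, if anything
    physical = os.getcwd()        # asks the operating system
    return logical, physical

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    os.mkdir(os.path.join(tmp, "a"))
    os.symlink("a", os.path.join(tmp, "b"))   # b -> a, as in the transcript
    start = os.getcwd()
    os.chdir(os.path.join(tmp, "b"))
    try:
        # Play the shell's part: it would have exported PWD=.../b here.
        print(logical_and_physical({"PWD": os.path.join(tmp, "b")}))
    finally:
        os.chdir(start)
```

The first element keeps the name by which the directory was entered; the second names the directory that the symbolic link points to.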
The operation of the `/bin/pwd` program when the `-L` option is used is somewhat subtle. It cannot trust the value of the `PWD` environment variable that it has inherited. After all, it need not have been invoked by a shell and intervening programs may not have implemented the shell's mechanism of making the `PWD` environment variable always track the name of the working directory. Or someone may do what I did just there.
So what it does is (as the POSIX standard says) check that the name given in `PWD` yields the same thing as the name `.`, as can be seen with a system call trace:

    % ln -s a c
    % (cd b; truss /bin/pwd -L 3>&1 1>&2 2>&3 | grep -E '^stat|__getcwd')
    stat("/usr/home/JdeBP/b",{ mode=drwxr-xr-x ,inode=120932,size=2,blksize=131072 }) = 0 (0x0)
    stat(".",{ mode=drwxr-xr-x ,inode=120932,size=2,blksize=131072 }) = 0 (0x0)
    /usr/home/JdeBP/b
    % (cd b; PWD=/usr/local/etc truss /bin/pwd -L 3>&1 1>&2 2>&3 | grep -E '^stat|__getcwd')
    stat("/usr/local/etc",{ mode=drwxr-xr-x ,inode=14835,size=158,blksize=10240 }) = 0 (0x0)
    stat(".",{ mode=drwxr-xr-x ,inode=120932,size=2,blksize=131072 }) = 0 (0x0)
    __getcwd("/usr/home/JdeBP/a",1024) = 0 (0x0)
    /usr/home/JdeBP/a
    % (cd b; PWD=/hello/there truss /bin/pwd -L 3>&1 1>&2 2>&3 | grep -E '^stat|__getcwd')
    stat("/hello/there",0x7fffffffe730) ERR#2 'No such file or directory'
    __getcwd("/usr/home/JdeBP/a",1024) = 0 (0x0)
    /usr/home/JdeBP/a
    % (cd b; PWD=/usr/home/JdeBP/c truss /bin/pwd -L 3>&1 1>&2 2>&3 | grep -E '^stat|__getcwd')
    stat("/usr/home/JdeBP/c",{ mode=drwxr-xr-x ,inode=120932,size=2,blksize=131072 }) = 0 (0x0)
    stat(".",{ mode=drwxr-xr-x ,inode=120932,size=2,blksize=131072 }) = 0 (0x0)
    /usr/home/JdeBP/c
    %

As you can see: it only calls `getcwd()` if it detects a mismatch; and it can be fooled by setting `PWD` to a string that does indeed name the same directory, but by a different route.
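
The check that the trace reveals can be approximated in a few lines. This is a sketch under stated assumptions, not the source of any real `pwd`: `logical_pwd` is a hypothetical name, the comparison of the two `stat()` results is done with Python's `os.path.samestat()` (device and inode number), and the extra test for an absolute name without `.` or `..` components follows the POSIX description of `pwd -L`.

```python
import os
import tempfile

def logical_pwd(environ=os.environ):
    """Return PWD if it verifiably names the working directory, else getcwd()."""
    candidate = environ.get("PWD", "")
    components = candidate.split("/")
    if candidate.startswith("/") and "." not in components and ".." not in components:
        try:
            # Same device and inode number means the same directory.
            if os.path.samestat(os.stat(candidate), os.stat(".")):
                return candidate
        except OSError:
            pass  # PWD names nothing usable; fall through to getcwd()
    return os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    os.mkdir(os.path.join(tmp, "a"))
    for link in ("b", "c"):
        os.symlink("a", os.path.join(tmp, link))
    start = os.getcwd()
    os.chdir(os.path.join(tmp, "b"))
    try:
        print(logical_pwd({"PWD": os.path.join(tmp, "b")}))  # trusted: ends in /b
        print(logical_pwd({"PWD": "/hello/there"}))          # rejected: ends in /a
        print(logical_pwd({"PWD": os.path.join(tmp, "c")}))  # fooled: ends in /c
    finally:
        os.chdir(start)
```

The third call shows the same weakness as the transcript: any name that reaches the same directory passes the check, whether or not it is the route that was actually taken.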
The `getcwd()` library function is a subject in its own right. But to précis:
- Originally it was purely a library function, that built up a pathname from the working directory back up to the root by repeatedly trying to look up the working directory in the `..` directory. It stopped when it reached a loop where `..` was the same as its working directory or when there was an error trying to open the next `..` up. This would be a lot of system calls under the covers.
- Nowadays the situation is slightly more complex. On FreeBSD, for example (this being true for other operating systems as well), it is a true system call, as you can see in the system call trace given earlier. All of the traversal from the working directory vnode up to the root is done in a single system call, which takes advantage of things like kernel mode code's direct access to the directory entry cache to do the pathname component lookups much more efficiently.
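
The original library-only approach from the first point can be sketched as below. This is an illustration of the technique rather than any historical libc source: `walk_up_getcwd` is a hypothetical name, directories are matched by device and inode number, and the sketch simply raises an error if it cannot read a parent directory.

```python
import os
import tempfile

def walk_up_getcwd():
    """Build the working directory's name by searching each '..' in turn."""
    names = []
    here = "."
    here_stat = os.stat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_stat = os.stat(parent)
        if os.path.samestat(here_stat, parent_stat):
            break  # ".." is the same directory: this is the root
        for entry in os.scandir(parent):
            try:
                entry_stat = entry.stat(follow_symlinks=False)
            except OSError:
                continue  # unreadable entry; it cannot be identified
            if os.path.samestat(entry_stat, here_stat):
                names.append(entry.name)  # found our name in the parent
                break
        else:
            raise FileNotFoundError("directory not found in its parent")
        here, here_stat = parent, parent_stat
    return "/" + "/".join(reversed(names))

with tempfile.TemporaryDirectory() as tmp:
    start = os.getcwd()
    os.chdir(tmp)
    try:
        print(walk_up_getcwd())
        print(os.getcwd())
    finally:
        os.chdir(start)
```

Each level costs a `stat()` of `..` plus a scan of that directory with a `stat()` per entry, which is where the large number of system calls comes from.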
However, note that even on FreeBSD and those other operating systems the kernel does not keep track of the working directory with a string.
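
One way to observe this from user space is to rename the working directory out from under a process. The following is a small Python experiment; `cwd_after_rename` is a hypothetical name, and the result described is what I would expect on Linux and FreeBSD rather than something guaranteed everywhere.

```python
import os
import tempfile

def cwd_after_rename(parent):
    """Enter parent/old, rename it to parent/new, then ask where we are."""
    old = os.path.join(parent, "old")
    new = os.path.join(parent, "new")
    os.mkdir(old)
    start = os.getcwd()
    os.chdir(old)
    try:
        os.rename(old, new)  # no chdir() happens after this point
        return os.getcwd()
    finally:
        os.chdir(start)

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    print(cwd_after_rename(tmp))
```

If the kernel had stored a string, the answer would still end in `old`. Because it holds a reference to the directory itself, the name is reconstructed afresh and ends in `new`.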
Navigating to `..` is again a subject in its own right. Another précis: Although directories conventionally (albeit, as already alluded to, this is not required) contain an actual `..` in the directory data structure on disc, the kernel tracks the parent directory of each directory vnode itself and can thus navigate to the `..` vnode of any working directory. This is somewhat complicated by the mountpoint and changed root mechanisms, which are beyond the scope of this answer.
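
A consequence is that `..` at the system call level is relative to the directory itself, not to the name by which one arrived. The following is a minimal Python sketch with a hypothetical `physical_parent` helper and a made-up layout (`x/a` plus a symbolic link `b` pointing at it).

```python
import os
import tempfile

def physical_parent(directory):
    """chdir() into directory, then into '..', and report where that leads."""
    start = os.getcwd()
    os.chdir(directory)
    try:
        os.chdir("..")  # resolved by the kernel from the directory itself
        return os.getcwd()
    finally:
        os.chdir(start)

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    os.makedirs(os.path.join(tmp, "x", "a"))
    os.symlink(os.path.join("x", "a"), os.path.join(tmp, "b"))  # b -> x/a
    print(physical_parent(os.path.join(tmp, "b")))  # ends in /x, not at tmp
```

A shell tracking the logical name would instead strip the final `b` from its string and end up in the temporary directory itself.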
Aside
Windows NT in fact does a similar thing. There is a single working directory per process, set by the `SetCurrentDirectory()` API call and tracked per process by the kernel via an (internal) open file handle to that directory; and there is a set of environment variables that Win32 programs (not just the command interpreters, but all Win32 programs) use to track the names of multiple working directories (one per drive), appending to or overwriting them whenever they change directory.
Conventionally, unlike the case with Unix and Linux operating systems, Win32 programs do not display these environment variables to users. One can sometimes see them in Unix-like subsystems running on Windows NT, though, as well as by using the command interpreters' `SET` commands in a particular way.
Further reading
- "`pwd`". The Open Group Base Specifications Issue 7. IEEE 1003.1:2008. The Open Group. 2016.
- "Pathname Resolution". The Open Group Base Specifications Issue 7. IEEE 1003.1:2008. The Open Group. 2016.
- https://askubuntu.com/a/636001/43344
- How are files opened in unix?
- what is inode for, in FreeBSD or Solaris
- Strange environment variable !::=::\ in Cygwin
- Why does CDPATH not work as documented in the manuals?
- How can I set zsh to use physical paths?
- Going into a directory linked by a link