Path syntax rules
There are three types of paths:
- relative paths like `foo`, `foo/bar`, `../a`, `.`. They don't start with `/` and are relative to the current directory of the process making a system call with that path.
- absolute paths like `/`, `/foo/bar` or `///x`. They start with 1, or 3 or more, `/`; they are not relative and are looked up starting from the `/` root directory.
- POSIX allows `//foo` to be treated specially, but doesn't specify how. Some systems use that for special cases like network files. It has to be exactly 2 slashes.
Other than at the start, sequences of slashes act like one.
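To make the three cases concrete, here is a minimal sketch that classifies a path purely by counting its leading slashes. The function name `classify_path` and the labels it returns are made up for illustration, and rejecting the empty string is my own assumption, since the rules above don't cover it.

```python
def classify_path(path: str) -> str:
    """Classify a path by its leading slashes (hypothetical helper)."""
    if path == "":
        # Assumption: the empty string is not discussed above, so reject it.
        raise ValueError("empty path")
    leading = len(path) - len(path.lstrip("/"))
    if leading == 0:
        return "relative"                # foo, foo/bar, ../a, .
    if leading == 2:
        return "implementation-defined"  # //foo: POSIX leaves this open
    return "absolute"                    # 1 slash, or 3 or more


for p in ["foo", "../a", ".", "/", "/foo/bar", "///x", "//foo"]:
    print(f"{p!r:12} {classify_path(p)}")
```

Python's own `posixpath.normpath` seems to follow the same rule: it keeps exactly two leading slashes and reduces three or more to one.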
`~` is only special to the shell: it's expanded by the shell and is not special to the system at all. How it's expanded is shell dependent. Shells do other forms of expansion too, like globbing (`*.txt`), variable expansion (`/$foo/$bar`) and others. As far as the system is concerned, `~foo` is just a relative path like `_foo` or `foo`.
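As a rough illustration of that split, the sketch below does a very simplified tilde expansion by hand and then shows that, without it, a path starting with `~` is just a relative path. `shell_like_tilde` and the home directory `/home/demo` are hypothetical; real shells also expand `~user` and differ in the details, which this deliberately ignores.

```python
import posixpath


def shell_like_tilde(path: str, home: str) -> str:
    """Expand a leading "~" or "~/" only (simplified, hypothetical helper)."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    # "~foo" (another user's home) is left alone in this sketch.
    return path


print(shell_like_tilde("~/notes.txt", "/home/demo"))
print(shell_like_tilde("a/~", "/home/demo"))  # not at the start: unchanged

# To the system, "~foo" does not start with "/", so it is a relative path.
print(posixpath.isabs("~foo"))
```

If you need this in a real program, Python's `os.path.expanduser` already does the expansion, including the `~user` form.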
Things to bear in mind:
- `foo/` is not the same as `foo`. It's closer to `foo/.` than `foo` (especially if `foo` is a symlink) for most system calls on most systems (`foo//` is the same as `foo/` though).
- `a/b/../c` is not necessarily the same as `a/c` (for instance if `a/b` is a symlink). Best is not to treat `..` specially.
- it's generally safe to consider `a/././././b` the same as `a/b` though.
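The `..` point is easy to check on a Unix-like system. The sketch below builds a throwaway directory tree in which `a/b` is a symlink, then compares a purely textual cleanup (`os.path.normpath`) with what the filesystem actually resolves (`os.path.realpath`). The directory and file names are invented for the demonstration.

```python
import os
import tempfile


def demo_dotdot_through_symlink():
    """Return (textual result, resolved result) for a/b/../c, relative to a temp root."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)  # the temp dir itself may sit behind a symlink
        os.makedirs(os.path.join(root, "target", "sub"))
        os.mkdir(os.path.join(root, "a"))
        # a/b is a symlink pointing at target/sub
        os.symlink(os.path.join(root, "target", "sub"), os.path.join(root, "a", "b"))
        open(os.path.join(root, "a", "c"), "w").close()
        open(os.path.join(root, "target", "c"), "w").close()

        path = os.path.join(root, "a", "b", "..", "c")
        textual = os.path.normpath(path)   # removes "b/.." as text only
        resolved = os.path.realpath(path)  # follows the symlink first, then ".."
        return os.path.relpath(textual, root), os.path.relpath(resolved, root)


print(demo_dotdot_through_symlink())
```

The two results name different files, which is why collapsing `..` textually is risky in a path library.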
For example, as best as I can tell, it seems that foo/bar and foo//bar both point to the same place.
Yes. This is common because software sometimes concatenates a path assuming the first part was not terminated with a forward slash, so one is thrown in to make sure (meaning there may end up being two or more). `foo///bar` and `foo/////bar` also point to the same place as `foo/bar`. A nice function for a path manipulation library would be one which reduces any number of sequential slashes to one (except at the beginning of a path, where it may be used in a URL-ish way, or, as Stephane points out, for any unspecified special purpose).
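A minimal sketch of such a function might look like this. The name `collapse_slashes` is hypothetical, and the handling of the start of the path is one possible reading of the rule: exactly two leading slashes are kept as they are, while one, or three or more, become a single slash.

```python
import re


def collapse_slashes(path: str) -> str:
    """Reduce runs of slashes to one, leaving a leading "//" untouched."""
    lead = re.match(r"/*", path).group(0)  # the run of slashes at the start, if any
    rest = path[len(lead):]
    if len(lead) == 2:
        prefix = "//"   # exactly two: may be special, so keep it
    elif lead:
        prefix = "/"    # one, or three or more: plain absolute path
    else:
        prefix = ""     # relative path
    return prefix + re.sub(r"/{2,}", "/", rest)


for p in ["foo//bar", "foo/////bar", "//foo//bar", "///x", "foo//"]:
    print(f"{p!r:14} -> {collapse_slashes(p)!r}")
```

Note that a trailing slash is kept (only reduced to one), since `foo/` and `foo` are not always equivalent.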
Also, ~ usually stands for the user's home directory
That transformation is done by the shell via tilde expansion, which only works if the tilde is the first character in the path. Whether or not you need to deal with this depends on context. If the library is to be used with normal programs which receive, e.g., command line arguments containing a path, tilde expansion has already been done by the time they see the path. The only situation I can see it being a concern is if you are processing paths directly from a text file.
Beyond that, `~` is a legal character in a *nix path and should not be changed to anything else. As per this, the only characters which aren't legal in a unix filename are `/` (because it is the path separator) and "null" (aka a zero byte), because it terminates the strings that paths are passed in and is illegal in text generally.
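For a library, that rule boils down to a very small check. The sketch below works on bytes, since filenames are byte strings as far as the system is concerned. `is_legal_filename` is a hypothetical name, rejecting the empty name is my own addition, and the check ignores length limits and anything a particular filesystem might forbid on top.

```python
def is_legal_filename(name: bytes) -> bool:
    """True if a single path component has no "/" and no zero byte."""
    if not name:
        return False  # assumption: an empty component is not a usable name
    return b"/" not in name and b"\x00" not in name


print(is_legal_filename(b"~"))            # tilde is an ordinary character
print(is_legal_filename(b"odd *?\nname")) # so are spaces, globs and newlines
print(is_legal_filename(b"a/b"))          # contains the separator
print(is_legal_filename(b"a\x00b"))       # contains a zero byte
```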