How do I "normalize" a pathname using boost::filesystem?
Boost v1.48 and above
You can use boost::filesystem::canonical
:
path canonical(const path& p, const path& base = current_path());
path canonical(const path& p, system::error_code& ec);
path canonical(const path& p, const path& base, system::error_code& ec);
http://www.boost.org/doc/libs/1_48_0/libs/filesystem/v3/doc/reference.html#canonical
v1.48 and above also provide the boost::filesystem::read_symlink
function for resolving symbolic links.
Boost versions prior to v1.48
As mentioned in other answers, you can't normalise because boost::filesystem can't follow symbolic links. However, you can write a function that normalises "as much as possible" (assuming "." and ".." are treated normally) because boost offers the ability to determine whether or not a file is a symbolic link.
That is to say, if the parent of the ".." is a symbolic link then you have to retain it, otherwise it is probably safe to drop it and it's probably always safe to remove ".".
It's similar to manipulating the actual string, but slightly more elegant.
boost::filesystem::path resolve(
const boost::filesystem::path& p,
const boost::filesystem::path& base = boost::filesystem::current_path())
{
boost::filesystem::path abs_p = boost::filesystem::absolute(p,base);
boost::filesystem::path result;
for(boost::filesystem::path::iterator it=abs_p.begin();
it!=abs_p.end();
++it)
{
if(*it == "..")
{
// /a/b/.. is not necessarily /a if b is a symbolic link
if(boost::filesystem::is_symlink(result) )
result /= *it;
// /a/b/../.. is not /a/b/.. under most circumstances
// We can end up with ..s in our result because of symbolic links
else if(result.filename() == "..")
result /= *it;
// Otherwise it should be safe to resolve the parent
else
result = result.parent_path();
}
else if(*it == ".")
{
// Ignore
}
else
{
// Just cat other path entries
result /= *it;
}
}
return result;
}
With version 3 of boost::filesystem
you can also try to remove all the symbolic links with a call to canonical
. This can be done only for existing paths so a function that also works for non-existing ones would require two steps (tested on MacOS Lion and updated for Windows thanks to @void.pointer's comment):
boost::filesystem::path normalize(const boost::filesystem::path &path) {
boost::filesystem::path absPath = absolute(path);
boost::filesystem::path::iterator it = absPath.begin();
boost::filesystem::path result = *it++;
// Get canonical version of the existing part
for (; exists(result / *it) && it != absPath.end(); ++it) {
result /= *it;
}
result = canonical(result);
// For the rest remove ".." and "." in a path with no symlinks
for (; it != absPath.end(); ++it) {
// Just move back on ../
if (*it == "..") {
result = result.parent_path();
}
// Ignore "."
else if (*it != ".") {
// Just cat other path entries
result /= *it;
}
}
// Make sure the dir separators are correct even on Windows
return result.make_preferred();
}
Your complaints and/or wishes about canonical
have been addressed by Boost 1.60 [1] with
path lexically_normal(const path& p);