locate vs find: usage, pros and cons of each other

locate(1) has only one big advantage over find(1): speed.

find(1), though, has many advantages over locate(1):

find(1) is primordial, going back to the very first version of AT&T Unix. You will even find it in cut-down embedded Linuxes via BusyBox. It is all but universal.

locate(1) is much younger than find(1). The earliest ancestor of locate(1) didn't appear until 1983, and it wasn't widely available as "locate" until 1994, when it was adopted into GNU findutils and into 4.4BSD.

locate(1) is also nonstandard, thus it is not installed by default everywhere. Some POSIX-type OSes don't even offer it as an option, and where it is available, the implementation may lack features you want, because there is no independent standard specifying the minimum feature set that must be available.

There is a de facto standard, being BSD locate(1), but that is only because the other two main flavors of locate implement all of its options: -0, -c, -d, -i, -l, -m, -s, and -S. mlocate implements six additional options not in BSD locate: -b, -e, -P, -q, --regex and -w. GNU locate implements those six plus another four: -A, -D, -E, and -p. (I'm ignoring aliases and minor differences like -? vs -h vs --help.)

The BSDs and Mac OS X ship BSD locate.

Most Linuxes ship GNU locate, but Red Hat Linuxes and Arch ship mlocate instead. Debian doesn't install either in its base install, but offers both versions in its default package repositories; if both are installed at once, "locate" runs mlocate.

Oracle has been shipping mlocate in Solaris since 11.2, released in December 2014. Prior to that, locate was not installed by default on Solaris. (Presumably, this was done to reduce Solaris' command incompatibility with Oracle Linux, which is based on Red Hat Enterprise Linux, which also uses mlocate.)

IBM AIX still doesn't ship any version of locate, at least as of AIX 7.2, unless you install GNU findutils from the AIX Toolbox for Linux Applications.

HP-UX also appears to lack locate in the base system.

Older "real" Unixes generally did not include an implementation of locate.

find(1) has a powerful expression syntax, with many functions, Boolean operators, etc.

find(1) can select files by more than just name. It can select by:
- age
- size
- owner
- file type
- timestamp
- permissions
- depth within the subtree...

When finding files by name, you can use file globbing syntax in all versions of find(1) or, in the GNU and BSD versions, regular expressions.
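
To make the selection criteria above concrete, here is a minimal sketch that sticks to options I believe are in POSIX find, so it should behave the same on GNU, BSD and BusyBox. The `demo` directory and the file names in it are made up for illustration.

```shell
# Build a small throwaway tree so the commands have something to find.
demo=$(mktemp -d)
mkdir -p "$demo/src/deep"
printf 'int main(void) { return 0; }\n' > "$demo/src/main.c"
printf 'notes\n' > "$demo/src/deep/notes.txt"
: > "$demo/src/empty.log"

# By name, using a glob. Quote it so the shell does not expand it first.
c_files=$(find "$demo" -name '*.c')

# By file type: directories only.
dirs=$(find "$demo" -type d)

# By type and size together: regular files of zero length.
empty=$(find "$demo" -type f -size 0)

# Boolean operators: .c OR .txt files, modified within the last day.
recent=$(find "$demo" -type f \( -name '*.c' -o -name '*.txt' \) -mtime -1)

# Remove the throwaway tree; the results stay in the variables above.
rm -rf "$demo"
```

Note that every search starts at "$demo" rather than at /, which is the usual way to keep find both fast and focused.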

Current versions of locate(1) accept glob patterns as find does, but BSD locate doesn't do regexes at all. If you're like me and have to use a variety of machine types, you find yourself preferring grep filtering to developing a dependence on -r or --regex.

locate needs strong filtering more than find does because...

find(1) doesn't necessarily search the entire filesystem. You typically point it at a subdirectory, a parent containing all the files you want it to operate on. The typical behavior for a locate(1) implementation is to spew up all files matching your pattern, leaving it to grep filtering and such to cut its eruption down to size.

(Evil tip: locate / will probably get you a list of all files on the system!)
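
As a rough sketch of the kind of filtering described above, the helper below narrows a list of paths to one directory, and grep -E then supplies the regex matching that BSD locate lacks. `under_dir` is a name I made up, and the sample paths stand in for real locate output so the snippet does not depend on a locate database being present.

```shell
# Keep only the paths on stdin that live under the directory given as $1.
# awk's index() compares literally, so directory names containing regex
# metacharacters need no escaping.
under_dir() {
    prefix=${1%/}/ awk 'index($0, ENVIRON["prefix"]) == 1'
}

# Stand-in for locate output (hypothetical paths).
sample='/etc/ssh/sshd_config
/home/alice/sshd_config
/etc/ssl/openssl.cnf'

# Restrict to /etc, then apply a regex with grep instead of locate --regex.
filtered=$(printf '%s\n' "$sample" | under_dir /etc | grep -E '/ssh/')
```

On a real system the pipeline would start with locate instead of printf, for example `locate sshd_config | under_dir /etc`.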

There are variants of locate(1) like slocate(1) which restrict output based on user permissions, but this is not the default version of locate in any major operating system.

find(1) can do things to files it finds, in addition to just finding them. The most powerful and widely supported such operator is -exec, but there are others. In recent GNU and BSD find implementations, for example, you have the -delete and -execdir operators.
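
A short sketch of -exec in both of its forms, again on a throwaway tree with made-up file names. I use -exec rm for the deletion so the example does not rely on -delete being available.

```shell
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/one.txt"
printf 'c\n' > "$demo/two.txt"
printf 'x\n' > "$demo/skip.log"

# "-exec ... {} \;" runs the command once per matching file.
find "$demo" -type f -name '*.txt' -exec chmod 600 {} \;

# "-exec ... {} +" passes many file names to as few invocations as possible.
total_lines=$(find "$demo" -type f -name '*.txt' -exec cat {} + | wc -l)

# Confirm the chmod took effect: count .txt files whose mode is exactly 600.
locked=$(find "$demo" -type f -name '*.txt' -perm 600 | wc -l)

# Portable deletion; on GNU and BSD find, -delete is the shorter spelling.
find "$demo" -type f -name '*.log' -exec rm -f {} +
remaining=$(find "$demo" -type f | wc -l)

rm -rf "$demo"
```

The `{} +` form is usually the one you want for bulk work, since it avoids starting one process per file.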

find(1) runs in real time, so its output is always up to date.

Because locate(1) relies on a database updated hours or days in the past, its output can be outdated. (This is the stale cache problem.) This coin has two sides:
1. locate can name files that no longer exist.
  
  GNU locate and mlocate have the -e flag, which makes them check that each file still exists before printing its name, but this eats away some of locate's speed advantage, and it isn't available in BSD locate besides.
2. locate will fail to name files that were created since the last database update.

You learn to be somewhat distrustful of locate output, knowing it may be wrong.
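
Where -e is missing, the first half of the problem can be approximated with a small filter. This is only a sketch: `existing_only` is a hypothetical helper, it assumes one path per line (so file names containing newlines will confuse it), and it does nothing about files created since the last database update.

```shell
# Print only those paths from stdin that still exist on disk.
existing_only() {
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Simulate stale locate output: one file exists, the other was never created.
demo=$(mktemp -d)
: > "$demo/still-here"
stale_list="$demo/still-here
$demo/deleted-since-updatedb"

fresh=$(printf '%s\n' "$stale_list" | existing_only)

rm -rf "$demo"
```

In real use the input would come from locate itself, as in `locate -i readme | existing_only`, at the cost of one existence check per result.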

There are ways to solve this problem, but I am not aware of any implementation in widespread use. For example, there is rlocate, but it appears to not work against any modern Linux kernel.

find(1) never has any more privilege than the user running it.

Because locate provides a global service to all users on a system, it wants to have its updatedb process run as root so it can see the entire filesystem. This leads to a choice of security problems:
1. Run updatedb as root, but make its output file world-readable so locate can run without special privileges. This effectively exposes the names of all files in the system to all users. This may be enough of a security breach to cause a real problem.
  
  BSD locate is configured this way on Mac OS X and FreeBSD.
2. Write the database as readable only by root, and make locate setuid root so it can read the database. This means locate effectively has to reimplement the OS's permission system so it doesn't show you files you can't normally see. It also increases the attack surface of your system, specifically risking a root escalation attack.
3. Create a special "locate" user or group to own the database file, and mark the locate binary as setuid/setgid for that user/group so it can read the database. This doesn't prevent privilege escalation attacks by itself, but it greatly mitigates the damage one could cause.
  
  mlocate is configured this way on Red Hat Enterprise Linux.
  
  You still have a problem, though, because if you can use a debugger on locate or cause it to dump core, you can get at privileged parts of the database.

I don't see a way to create a truly "secure" locate command, short of running it separately for each user on the system, which negates much of its advantage over find(1).
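
If you are curious which of the three configurations a given system uses, checking the setuid and setgid bits on the locate binary is a reasonable first step. The sketch below uses a hypothetical helper, `privilege_bits`, and demonstrates it on scratch files rather than on a real locate, since the binary's path and the database location differ between systems.

```shell
# Report which privilege bits are set on a file: "setuid", "setgid",
# "setuid setgid", or "none".
privilege_bits() {
    bits=
    [ -u "$1" ] && bits="setuid"
    [ -g "$1" ] && bits="${bits:+$bits }setgid"
    printf '%s\n' "${bits:-none}"
}

# Demonstrate on scratch files instead of a real locate binary.
demo=$(mktemp -d)
: > "$demo/plain"
: > "$demo/marked"
chmod u+s "$demo/marked"

plain_bits=$(privilege_bits "$demo/plain")
marked_bits=$(privilege_bits "$demo/marked")

rm -rf "$demo"
```

On a real system you would run something like `privilege_bits "$(command -v locate)"`. A result of "none" suggests the world-readable database setup, while "setgid" suggests the dedicated-group setup.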

Bottom line, both are very useful. locate(1) is better when you're just trying to find a particular file by name, which you know exists, but you just don't remember where it is exactly. find(1) is better when you have a focused area to examine, or when you need any of its many advantages.

locate uses a prebuilt database, which should be regularly updated, while find iterates over a filesystem to locate files.

Thus, locate is much faster than find, but can be inaccurate if the database (which can be seen as a cache) is not updated (see the updatedb command).

Also, find offers more granularity, as you can filter files by any of their attributes, while locate only matches a pattern against file names.

A novice or occasional Unix user cannot use find successfully without careful perusal of the man page. Historically, some versions of find didn't even default to -print, adding to the user-hostility.

locate is less flexible, but far more intuitive to use in the common case.
