What kind of database do `updatedb` and `locate` use?
Implementations of `locate`/`updatedb` typically use specific databases tailored to their requirements, rather than a generic database engine. You’ll find those specific databases documented by each implementation; for example:
- GNU findutils’ is documented in `locatedb(5)`, and is pretty much just a list of files (with a specific compression algorithm);
- mlocate’s is documented in `mlocate.db(5)`, and can also be considered a list of directories and files (with metadata).
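To give a feel for what "a list of files with a specific compression algorithm" can look like, here is a minimal sketch of front (prefix) coding, the general idea that `locatedb(5)` describes: each entry in a sorted list records how many leading characters it shares with the previous entry, plus the remaining suffix. The function names `front_encode` and `front_decode` are made up for illustration, and the tuples are not the on-disk byte layout of any real locate database.

```python
import os


def front_encode(paths):
    """Encode sorted paths as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for path in paths:
        # Number of leading characters shared with the previous entry.
        shared = len(os.path.commonprefix([previous, path]))
        encoded.append((shared, path[shared:]))
        previous = path
    return encoded


def front_decode(encoded):
    """Rebuild the original paths from (shared_prefix_length, suffix) pairs."""
    paths = []
    previous = ""
    for shared, suffix in encoded:
        path = previous[:shared] + suffix
        paths.append(path)
        previous = path
    return paths


# Illustrative input only; a real database would hold every indexed path.
paths = sorted(["/usr/bin/locate", "/usr/bin/updatedb", "/usr/share/man"])
encoded = front_encode(paths)
print(encoded)
assert front_decode(encoded) == paths
```

Because neighbouring entries in a sorted file list tend to share long directory prefixes, storing only the suffix saves a lot of space without needing a general-purpose database.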
It seems to be a flat file of C structs, written and read using the GNU libc obstack macros.
See the sources:
https://github.com/msekletar/mlocate/blob/master/src/updatedb.c#L720
https://github.com/msekletar/mlocate/blob/master/src/locate.c#L413
You could get something similar with:
find / -xdev -type f -not -path \*\.git\/\* | gzip -9 > /tmp/files.gz
zgrep file_i_want /tmp/files.gz
As far as I know, the database behind it is Berkeley DB, which is a daemonless key/value database. See its Wikipedia page for more info; an extract:
Berkeley DB (BDB) is a software library intended to provide a high-performance embedded database for key/value data. Berkeley DB is written in C with API bindings for C++, C#, Java, Perl, PHP, Python, Ruby, Smalltalk, Tcl, and many other programming languages. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a relational database.
On RHEL/CentOS the database is located at /var/lib/mlocate/mlocate.db (I'm not sure about other distributions).
The command `locate --statistics` will show you the location of the database and some statistics about it (example):
Database /var/lib/mlocate/mlocate.db:
16,375 directories
242,457 files
11,280,301 bytes in file names
4,526,116 bytes used to store database
For the mlocate format, here is the beginning of the man page:
A mlocate database starts with a file header: 8 bytes for a magic number ("\0mlocate" like a C literal), 4 bytes for the configuration block size in big endian, 1 byte for file format version (0), 1 byte for the “require visibility” flag (0 or 1), 2 bytes padding, and a NUL-terminated path name of the root of the database.
The header is followed by a configuration block, included to ensure databases are not reused if some configuration changes could affect their contents. The size of the configuration block in bytes is stored in the file header. The configuration block is a sequence of variable assignments, ordered by variable name. Each variable assignment consists of a NUL-terminated variable name and an ordered list of NUL-terminated values. The value list is terminated by one more NUL character. The ordering used is defined by the strcmp() function.
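The layout quoted above is simple enough to read by hand. As a rough sketch, assuming only what the man page excerpt states, the following Python parses the fixed header and the configuration block. The function names `parse_mlocate_header` and `parse_config_block` are hypothetical, and the sample bytes (including the variable names in them) are built in memory purely for illustration rather than taken from a real database.

```python
import struct

MAGIC = b"\0mlocate"
# 8-byte magic, 4-byte big-endian config block size, 1-byte format version,
# 1-byte "require visibility" flag, 2 bytes of padding.
HEADER_FORMAT = ">8sIBB2x"


def parse_mlocate_header(data):
    """Parse the file header; return its fields and where the config block starts."""
    magic, conf_size, version, visibility = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != MAGIC:
        raise ValueError("not an mlocate database")
    fixed = struct.calcsize(HEADER_FORMAT)
    # The root path follows the fixed fields and is NUL-terminated.
    end = data.index(b"\0", fixed)
    return {
        "version": version,
        "require_visibility": bool(visibility),
        "conf_size": conf_size,
        "root": data[fixed:end].decode("utf-8", errors="replace"),
        "conf_offset": end + 1,
    }


def parse_config_block(block):
    """Parse 'name NUL value NUL ... NUL' assignments into a dict of lists."""
    config = {}
    pos = 0
    while pos < len(block):
        end = block.index(b"\0", pos)
        name = block[pos:end].decode("utf-8", errors="replace")
        pos = end + 1
        values = []
        # Values run until an extra NUL terminates the list.
        while pos < len(block) and block[pos] != 0:
            end = block.index(b"\0", pos)
            values.append(block[pos:end].decode("utf-8", errors="replace"))
            pos = end + 1
        pos += 1  # skip the list terminator
        config[name] = values
    return config


# Illustrative sample; the variable names here are assumptions for the demo.
conf = b"prune_bind_mounts\0" b"1\0" b"\0" b"prunefs\0" b"NFS\0" b"proc\0" b"\0"
sample = MAGIC + struct.pack(">IBB2x", len(conf), 0, 0) + b"/\0" + conf

header = parse_mlocate_header(sample)
start = header["conf_offset"]
print(header)
print(parse_config_block(sample[start:start + header["conf_size"]]))
```

To try it on a real database you would read the file's bytes instead of `sample`, for example from /var/lib/mlocate/mlocate.db, which usually requires elevated privileges.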