What kind of database do `updatedb` and `locate` use?

Implementations of locate/updatedb typically use specific database formats tailored to their requirements, rather than a generic database engine. Each implementation documents its own format; for example:

GNU findutils’ is documented in locatedb(5), and is pretty much just a list of files (with a specific compression algorithm);
mlocate’s is documented in mlocate.db(5), and can also be considered a list of directories and files (with metadata).
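To make the "specific compression algorithm" more concrete: the GNU format relies on front compression. Because the paths are sorted, consecutive entries share long prefixes, so each entry can store only the change in shared-prefix length plus the new suffix. The Python sketch below illustrates that idea. It is a simplified model that keeps the counts as plain integers instead of the exact on-disk byte encoding described in locatedb(5), and `front_encode`/`front_decode` are hypothetical names.

```python
import os


def front_encode(paths):
    """Front-code sorted paths as (change in shared-prefix length, suffix) pairs.

    Simplified model: counts are Python ints, not the on-disk byte encoding.
    """
    entries = []
    prev, prev_len = "", 0
    for path in sorted(paths):
        # Number of leading characters this path shares with the previous one
        shared = len(os.path.commonprefix([prev, path]))
        # Store only how the shared-prefix length changed, plus the new suffix
        entries.append((shared - prev_len, path[shared:]))
        prev, prev_len = path, shared
    return entries


def front_decode(entries):
    """Rebuild the full paths produced by front_encode."""
    paths = []
    prev, prev_len = "", 0
    for delta, suffix in entries:
        shared = prev_len + delta
        path = prev[:shared] + suffix  # reuse the prefix of the previous path
        paths.append(path)
        prev, prev_len = path, shared
    return paths
```

For example, `front_encode(["/usr/bin/ls", "/usr/bin/cat", "/usr/lib/x"])` returns `[(0, '/usr/bin/cat'), (9, 'ls'), (-4, 'lib/x')]`, so only 2 and 5 characters are stored for the second and third paths.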

It seems to be a flat file of C structs, written and read using the GNU libc obstack macros.

See the sources:

https://github.com/msekletar/mlocate/blob/master/src/updatedb.c#L720

https://github.com/msekletar/mlocate/blob/master/src/locate.c#L413

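You could get something similar with:

    # List regular files on the root filesystem, skipping .git directories
    find / -xdev -path '*/.git' -prune -o -type f -print 2>/dev/null | gzip -9 > /tmp/files.gz
    # Search the compressed list, like a very simple "locate file_i_want"
    zgrep file_i_want /tmp/files.gz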
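The same search can also be done from Python, which makes it easy to post-process the matches. This is a minimal sketch, assuming the list was written one path per line as with the `find` pipeline. `search_file_list` is a hypothetical helper, and unlike `zgrep` it does a plain substring match rather than a regular-expression match.

```python
import gzip


def search_file_list(db_path, pattern):
    """Return every path in a gzip'd, one-path-per-line list that contains pattern."""
    matches = []
    # Text mode; surrogateescape keeps non-UTF-8 file names from raising errors
    with gzip.open(db_path, "rt", encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            path = line.rstrip("\n")
            # Plain substring match (zgrep would treat pattern as a regex)
            if pattern in path:
                matches.append(path)
    return matches
```

Calling `search_file_list("/tmp/files.gz", "file_i_want")` returns the matching paths as a list, in the order they appear in the file.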

As far as I know, it is backed by Berkeley DB, a daemonless key/value database. From Wikipedia:

Berkeley DB (BDB) is a software library intended to provide a high-performance embedded database for key/value data. Berkeley DB is written in C with API bindings for C++, C#, Java, Perl, PHP, Python, Ruby, Smalltalk, Tcl, and many other programming languages. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a relational database.

On RHEL/CentOS the database is located at /var/lib/mlocate/mlocate.db (I'm not sure about other distributions). The command `locate --statistics` shows the database location and some statistics about it, for example:

Database /var/lib/mlocate/mlocate.db:
        16,375 directories
        242,457 files
        11,280,301 bytes in file names
        4,526,116 bytes used to store database

For the mlocate format, here is the beginning of the man page:

A mlocate database starts with a file header: 8 bytes for a magic number ("\0mlocate" like a C literal), 4 bytes for the configuration block size in big endian, 1 byte for file format version (0), 1 byte for the “require visibility” flag (0 or 1), 2 bytes padding, and a NUL-terminated path name of the root of the database.
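The fixed part of that header maps directly onto Python's `struct` module. This is a sketch based only on the layout quoted above; the format string `>8sIBB2x` and the name `parse_mlocate_header` are my own, not taken from mlocate's sources. It validates the magic number and reads the NUL-terminated root path.

```python
import struct

MLOCATE_MAGIC = b"\0mlocate"
# Big endian, no alignment: magic (8 bytes), config block size (4),
# format version (1), "require visibility" flag (1), padding (2) = 16 bytes
HEADER = struct.Struct(">8sIBB2x")


def parse_mlocate_header(data):
    """Parse the mlocate.db file header from the start of the raw bytes.

    Returns (conf_size, version, require_visibility, root, offset), where
    offset is the index of the first byte after the root path's NUL.
    """
    magic, conf_size, version, visibility = HEADER.unpack_from(data, 0)
    if magic != MLOCATE_MAGIC:
        raise ValueError("not an mlocate database")
    # The root path follows the fixed fields and is NUL-terminated
    end = data.index(b"\0", HEADER.size)
    root = data[HEADER.size:end].decode("utf-8", "surrogateescape")
    return conf_size, version, visibility == 1, root, end + 1
```

To try it, read the first few kilobytes of /var/lib/mlocate/mlocate.db in binary mode and pass them in; the returned offset is where the configuration block starts. Reading the database usually requires elevated privileges.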

The header is followed by a configuration block, included to ensure databases are not reused if some configuration changes could affect their contents. The size of the configuration block in bytes is stored in the file header. The configuration block is a sequence of variable assignments, ordered by variable name. Each variable assignment consists of a NUL-terminated variable name and an ordered list of NUL-terminated values. The value list is terminated by one more NUL character. The ordering used is defined by the strcmp() function.
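The configuration block is simple enough to decode with two nested loops. This Python sketch follows the description above and assumes you already have the block's raw bytes (its length is the size stored in the file header). `parse_config_block` is a hypothetical name, and it keeps names and values as raw bytes.

```python
def parse_config_block(block):
    """Parse an mlocate configuration block into {name: [values]} (all bytes).

    Each assignment is a NUL-terminated name, then NUL-terminated values,
    then one extra NUL that ends the value list.
    """
    config = {}
    pos = 0
    while pos < len(block):
        end = block.index(b"\0", pos)
        name = block[pos:end]
        pos = end + 1
        values = []
        # A NUL where a value would start is the end-of-list marker
        while block[pos] != 0:
            end = block.index(b"\0", pos)
            values.append(block[pos:end])
            pos = end + 1
        pos += 1  # skip the NUL that terminates the value list
        config[name] = values
    # Assignments must be sorted by name in strcmp() order
    if list(config) != sorted(config):
        raise ValueError("configuration block is not sorted by name")
    return config
```

Comparing Python `bytes` objects orders them byte by byte as unsigned values, which matches the `strcmp()` ordering the man page specifies for NUL-free strings.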
