Does sqlite3 compress data?

I have a rather big SQLite file in my GIT repository and I was wondering why my total repository size was not growing that much but instead, was even smaller than my SQLite-.db-file. Turns out, GIT compresses the repository by default. A quick check on my .db-file also indicates this, since zipping the .db-file resulted in a zip-archive which was only ~20% the size of the .db-file.

So, at least with the default settings, it doesn't look like SQLite is storing the data in a compressed way. However, regardless of that, it might be fine to add a big SQLite file to a GIT repository, since GIT performs compression automatically.


SQLite is not running a compression algorithm, but it will store data in a binary file instead of a text file. Which means that the data can be stored more efficiently, for example using a 32-bit (4 byte) number to represent 10,000,000 instead of storing it as 8 bytes of text (or more if the file is unicode).

Here are more details on the SQL Database File Format if you are interested.

Does that make sense?


SQLite, by default, does not compress data it writes to the disk; however, SQLite does have a set of "Proprietary Extensions" for that and other purposes. Look for ZIPVFS in the links as follows.

http://www.sqlite.org/support.html and http://www.hwaci.com/sw/sqlite/prosupport.html

You can achieve a lot of "compression" in your data by encoding fields as integers. For example an IP address was engineered to fit into a word (4 bytes). Each octect of the address may be represented in one byte of a word.

string[] octets = '172.168.0.1'.split('.')
int ip = atoi(octets[0]) << 24
ip |= atoi(octets[1]) << 16
ip |= atoi(octets[2]) << 8
ip |= atoi(octets[3])

Additionally, your timestamp may be represented in Unix time, which is the number of seconds since the epoch.

UPDATE mytable SET healthtime = CAST(strftime('%s',healthtime) AS INTEGER);

See the Date and Time functions

Note the CAST directive in the above SQL: SQLite does not enforce type on a column, so you may have a group of digits stored as a string; increasing your field size more than necessary (this will also make certain queries behave oddly).

One more thing: field size is not the only part of the story. Remember that indexes take up a space too, and indexes on integers are more efficient --in terms of disk size and performance.

Tags:

Python

Sqlite