Any potential problems backing up open GeoPackages using Python routine
GeoPackages are SQLite files and so techniques for backing up SQLite files should apply here.
The SQLite docs have some information on backups here: https://sqlite.org/backup.html and gives a couple of methods:
- Use the SQLite backup API from C code. I'm not sure if there's Python bindings for this.
- Open the database and run
VACUUM INTO 'backup.gpkg';
- this will copy the open database to a new one calledbackup.gpkg
as well as vacuuming it to remove data marked as deleted.
If these methods run without error (ie no "Disk Full" or power plug removed) then the resulting output file should be a valid SQLite DB and hence a valid GeoPackage backup of the original. It should also be consistent (ie all atomic operations complete), and should have a minor impact on other processes using the database (ie it won't lock them out for long periods of time).
The VACUUM INTO
method does require a newer version of SQLite
tools than I had on my system just now but downloading and running the latest is really easy - I only had to get the Linux binaries which were a couple of megabytes.
It shouldn't matter if the database is being written to by another process - that process can continue writing while the backup happens, but how much of the current written content is backed up will depend on the exact point that the backup happens. In any case the consistency property will ensure that your backup is complete to the transactional level of the writing process.
You might get more and better answers from http://dba.stackexchange.com/ because after all GeoPackage is just a SQLite database.
A simple backup method in GIS environment would be to run just
ogr2ogr -f gpkg backup.gpkg input.gpkg
.
The same can be done with GDAL Python bindings without using the ogr2ogr executable. Because data gets written into a new database it effectively vacuums the database by the same. But if the GeoPackage is in use in read-write mode and there are pending transactions I am not sure about what data gets stored into the copy.
If you rather play with files and you know that the database is opened as read-only it is safe to backup just the main db file .gpkg. All possible temporary files https://sqlite.org/tempfiles.html can be skipped.
You can also backup just the .gpkg file when the database is opened as read-write but then it is not sure what your backup will contain. A better option is to backup also the journal files. What they are depends on the journal mode that the GeoPackage db is using.
If the GeoPackage db is using rollback journals https://sqlite.org/lockingv3.html#rollback you could check if the journal file exists. If there is no journal file then .gpkg is up to date and you could backup just that. If there is a journal file you can backup also that or loop and wait until the journal disappears. Usually it is just seconds but sometimes it could mean a long wait.
If GeoPackage is set to use write-ahead logging https://sqlite.org/wal.html a sidecar wal file is created also when the database is opened as read-only. As far as I know QGIS is changing GeoPackages into WAL. The wal file disappears only once the last connection to the database is closed properly. In the rollback journal mode the .gpkg is guaranteed to contain all the changes if there is no journal file but in WAL mode that logic cannot be used. If you control the system and know that GeoPackage is opened as read-only you can skip the wal file because it newer contains any pending transactions. Otherwise you should backup .gpkg and wal and shm files together and the backup would contain a snapshot from that exact moment.
Your backup system for the read-write case could also copy the .gpkg and either the journal file or wal+shm into a temporary place and then open and close the database. By doing that the pending changes would be integrated into main database file, the sidecar files would disappear, and you would have just .gpkg file to put in the final backup.