Should GDAL be set to produce GeoTIFF files with compression? Which algorithm should be used?
To select the compression method, use a command like:
gdal_translate -co "COMPRESS=method" src_dataset dst_dataset
When you use compression, the biggest trade-off is the extra processing time required to decompress the image; once decompressed, the image still consumes the same amount of memory. With respect to information loss, there are two basic types of compression:
- lossless - preserves the original data values
- lossy - degrades the data to save even more space
You would use lossless algorithms when the original data values must be preserved, as with DEMs or rasterized features. PACKBITS, DEFLATE and LZW are all lossless and can be ordered according to compression ratio:
- LZW - highest compression ratio, highest processing power
- DEFLATE
- PACKBITS - lowest compression ratio, lowest processing power
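The compression ratio still depends on the data, so treat this ordering as a rough guide (DEFLATE frequently compresses better than LZW, for example). If the data has long runs of identical values, even PACKBITS will yield good results.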
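To illustrate how strongly the ratio depends on the data, here is a minimal sketch using Python's standard zlib module (a DEFLATE implementation). The two "rasters" are synthetic byte arrays invented for this example, not real GeoTIFF content, and the numbers will not match what GDAL produces for a real file.

```python
import random
import zlib


def deflate_ratio(data):
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 6)) / len(data)


# Synthetic single-band 8-bit "rasters", 256 x 256 pixels each (assumed data).
flat = bytes([42]) * (256 * 256)      # the same value everywhere
rng = random.Random(0)                # fixed seed so runs are repeatable
noisy = bytes(rng.randrange(256) for _ in range(256 * 256))

if __name__ == "__main__":
    print(f"flat raster:  {deflate_ratio(flat):.4f}")
    print(f"noisy raster: {deflate_ratio(noisy):.4f}")
```

The uniform raster shrinks to a tiny fraction of its size, while the noisy one barely compresses at all, whichever lossless method you pick.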
In contrast, you would use lossy algorithms such as JPEG to compress rasters that don't have to return exact values. For instance, orthophotos or satellite imagery can be compressed using lossy algorithms.
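If you want to compare the lossless methods on your own data, something like the following sketch may help. It only builds the gdal_translate command lines shown above, one per method, and runs them if asked to and if GDAL is on the PATH. The input name dem.tif and the output naming scheme are hypothetical.

```python
import shutil
import subprocess

# Lossless methods named in the answer above.
LOSSLESS_METHODS = ("LZW", "DEFLATE", "PACKBITS")


def translate_command(src, dst, method):
    """Build the argument list for gdal_translate with a given COMPRESS value."""
    return ["gdal_translate", "-co", f"COMPRESS={method}", src, dst]


def compress_all(src, run=False):
    """Return one command per lossless method; execute them only if run=True."""
    commands = []
    for method in LOSSLESS_METHODS:
        # Hypothetical naming scheme: dem.tif -> dem_lzw.tif, dem_deflate.tif, ...
        dst = f"{src.rsplit('.', 1)[0]}_{method.lower()}.tif"
        cmd = translate_command(src, dst, method)
        commands.append(cmd)
        # Only execute when requested and when GDAL is actually installed.
        if run and shutil.which("gdal_translate"):
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "dem.tif" is a hypothetical input file name.
    for cmd in compress_all("dem.tif"):
        print(" ".join(cmd))
```

Comparing the sizes of the resulting files on a representative sample is usually more informative than any general ranking.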
With lzw and deflate compression, using -co predictor=2 can help with imagery that is smoothly varying, as it compresses the differences from pixel to pixel instead of the absolute values, and these will tend to be small and have more patterns (ref). Predictor is only useful with lzw and deflate compression; the option has no effect with other methods.
gdal_translate -co compress=lzw -co predictor=2 ...
The predictor savings can be dramatic. I just re-compressed a directory of 16-bit GeoTIFF elevation models, which took up 17 GB with the default LZW settings, into just 5 GB with predictor=2.
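The effect of pixel-to-pixel differencing can be reproduced outside GDAL. The sketch below is an illustration of the idea only, not GDAL's or libtiff's implementation: it builds a synthetic, smoothly varying 16-bit raster (an assumption standing in for a DEM), applies horizontal differencing by hand, and compresses both versions with Python's zlib (DEFLATE).

```python
import random
import struct
import zlib

WIDTH, HEIGHT = 256, 256


def make_smooth_rows(seed=0):
    """Synthetic 16-bit 'elevation' rows: each row is a gentle random walk."""
    rng = random.Random(seed)
    rows = []
    for _ in range(HEIGHT):
        value = 1000
        row = []
        for _ in range(WIDTH):
            value += rng.randint(-2, 2)   # neighbours differ by at most 2
            row.append(value)
        rows.append(row)
    return rows


def horizontal_difference(row):
    """Keep the first sample, then store each sample minus its left neighbour."""
    diffs = [row[0]]
    for left, right in zip(row, row[1:]):
        diffs.append((right - left) % 65536)   # wrap to unsigned 16 bits
    return diffs


def pack(rows):
    """Serialise rows as little-endian unsigned 16-bit samples."""
    return b"".join(struct.pack(f"<{len(r)}H", *r) for r in rows)


rows = make_smooth_rows()
plain = zlib.compress(pack(rows))
predicted = zlib.compress(pack([horizontal_difference(r) for r in rows]))

if __name__ == "__main__":
    print("DEFLATE on raw samples:        ", len(plain), "bytes")
    print("DEFLATE on differenced samples:", len(predicted), "bytes")
```

The differenced stream consists almost entirely of a handful of small values, which is exactly the kind of repetition LZW and DEFLATE exploit. Real savings depend on how smooth your data actually is.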
There is conflicting info on the differences between predictors 2 & 3 and when each is best applied (ref1, ref2). Perhaps fuel for another question.
Another easy option for savings is -co tiled=yes. Some software can't read tiled images, but such software is becoming rarer and is mostly outside of GIS (I don't know of any mainstream GIS software now that doesn't read them).
To build on @alfonx's answer of using compressed overviews: this allows the base image to be stored lossless, for data integrity, and the pyramids to be lossy, for speed and some space savings. It's almost the best of both worlds. For the smallest possible overviews with gdaladdo on RGB images, use jpeg compression, average or gauss resampling instead of the default nearest neighbour (this makes the overviews smoother), and a YCBCR photometric overview. See the gdaladdo reference page for more info on these options (though it doesn't say much about what photometric is all about).
This is part of a windows batch file I use to apply external jpeg overviews to all tiffs in a directory:
set _opts= -r gauss --config PHOTOMETRIC_OVERVIEW YCBCR ^
--config COMPRESS_OVERVIEW JPEG --config JPEG_QUALITY_OVERVIEW 85
for %%a in (*.tif) do gdaladdo -ro %_opts% %%a 2 4 8 16 32 64
Notes
- GDAL 1.6.0 introduced gauss resampling, which can lead to better results than average in case of sharp edges with high contrast or noisy patterns. Power-of-2 levels (2 4 8 ...) should be used so that a 3x3 resampling Gaussian kernel is selected.
- JPEG_QUALITY_OVERVIEW 85 - if not specified, the default of 75% is used, which does yield a smaller file, but I find 85% a better compromise in the size vs quality trade-off.
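For anyone not on Windows, the batch loop above could be sketched in Python roughly as follows. It uses only the gdaladdo options already shown in the batch file; the function names are my own, and nothing is executed unless you call run_all yourself.

```python
import shutil
import subprocess
from pathlib import Path

# Same options as the batch file: gauss resampling, YCbCr JPEG overviews at quality 85.
OVERVIEW_OPTS = [
    "-r", "gauss",
    "--config", "PHOTOMETRIC_OVERVIEW", "YCBCR",
    "--config", "COMPRESS_OVERVIEW", "JPEG",
    "--config", "JPEG_QUALITY_OVERVIEW", "85",
]
LEVELS = ["2", "4", "8", "16", "32", "64"]


def overview_commands(directory):
    """Build one gdaladdo command per .tif file in the directory."""
    commands = []
    for tif in sorted(Path(directory).glob("*.tif")):
        # -ro opens the file read-only, so the overviews go to an external file.
        commands.append(["gdaladdo", "-ro", *OVERVIEW_OPTS, str(tif), *LEVELS])
    return commands


def run_all(directory):
    """Run the commands, but only if gdaladdo is installed."""
    if shutil.which("gdaladdo") is None:
        raise RuntimeError("gdaladdo not found on PATH")
    for cmd in overview_commands(directory):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands for the current directory without running them.
    for cmd in overview_commands("."):
        print(" ".join(cmd))
```

Printing the commands first is a cheap way to check the options before letting the loop touch a whole directory.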
Update, 2015: GDAL 1.8 and 2.0 have introduced a lot of new options that are not covered here and that I haven't had time to digest. Read the official GTiff format page; I'm sure there are additional useful settings detailed there.
For big rasters, GeoTIFF offers the possibility to store pre-downscaled overviews as extra images in the GeoTIFF file. This can be done with gdaladdo (= GDAL ADD Overview). When creating these overviews, you can tell GDAL to compress them too:
gdaladdo --config COMPRESS_OVERVIEW JPEG
This speeds up viewing your data without adding too much size. Note: GeoTools-based applications such as GeoServer, uDig, AtlasStyler and Geopublisher can all use this feature and profit from overviews.
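gdaladdo also takes the file name and a list of overview levels (the batch file earlier in this thread uses 2 4 8 16 32 64). As a rough sketch, the levels can be derived from the raster size by doubling the factor until the overview gets small. The 256-pixel stopping threshold and the function names here are my own assumptions, not GDAL defaults.

```python
def overview_levels(width, height, min_size=256):
    """Return power-of-two decimation factors for a raster of the given size.

    min_size is an assumed stopping threshold: factors are added while the
    longer side of the resulting overview is still at least this many pixels.
    """
    levels = []
    factor = 2
    while max(width, height) / factor >= min_size:
        levels.append(factor)
        factor *= 2
    return levels


def compressed_overview_command(path, width, height):
    """Build a gdaladdo command that writes JPEG-compressed overviews."""
    levels = [str(level) for level in overview_levels(width, height)]
    return ["gdaladdo", "--config", "COMPRESS_OVERVIEW", "JPEG", path, *levels]


if __name__ == "__main__":
    # "big.tif" and its 16384 x 16384 size are hypothetical.
    print(" ".join(compressed_overview_command("big.tif", 16384, 16384)))
```

For a 16384 x 16384 raster this yields the levels 2 through 64, the same list used in the batch file.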