combining spatial netcdf files using xarray python

I don't know an "automated" way to do this in python (or R, FORTRAN), only manually reading in the files to a larger array and then writing out that array to a new netcdf file, but there is a more "automated" to do it from the command line using CDO.

If you define a domain description file grid.txt that contains the two (or more) files regions:

gridtype = lonlat
gridsize = 420
xname = lon
xlongname = longitude
xunits = degrees east
yname = lat
ylongname = latitude
yunits = degrees north
xsize = 21
ysize = 20
xfirst = -11.0
xinc = 1
yfirst = -20.0
yinc = 1

and then you "expand" the first file file1.nc to the larger domain and then merge in the contents of both netcdf files:

cdo expand,grid.txt file1.nc large.nc
cdo mergegrid large.nc file1.nc merge1.nc
cdo mergegrid merge1.nc file2.nc final_merge.nc 

I found this soln here: https://code.mpimet.mpg.de/boards/1/topics/26 and have used it when I need to merge 2 or 3 files together. However when I needed to merge many hundred of files together containing e.g. one latitude row of data each, I wrote a manual programme (in R in my case).


xarray now supports multi-dimensional concatenation directly through open_mfdataset.

The documentation on combining data along multiple dimensions is here, but as your question is very similar to this one, I'm going to copy the key parts of my answer here:


You have a 2D concatenation problem: you need to arrange the datasets such that when joined up along x and y, they make a larger dataset which also has dimensions x and y.

As long as len(x) is the same in every file, and len(y) is the same in every file, you should in theory be able to do this in one or two different ways.

1) Using combine='nested'

You can manually specify the order that you need them joined up in. xarray allows you to do this by passing the datasets as a grid, specified as a nested list. In your case, if we had 4 files (named [upper_left, upper_right, lower_left, lower_right]), we would combine them like so:

from xarray import open_mfdataset

grid = [[upper_left, upper_right], 
        [lower_left, lower_right]]

ds = open_mfdataset(grid, concat_dim=['x', 'y'], combine='nested')

We had to tell open_mfdataset which dimensions of the data the rows and colums of the grid corresponded to, so it would know which dimensions to concatenate the data along. That's why we needed to pass concat_dim=['x', 'y'].

2) Using combine='by_coords'

But your data has coordinates in it already - can't xarray just use those to arrange the datasets in the right order? That is what the combine='by_coords' option is for, but unfortunately, it requires 1-dimensional coordinates (also known as dimensional coordinates) to arrange the data. If your files don't have any of those the printout will says Dimensions without coordinates: x, y).

If you can add 1-dimensional coordinates to your files first, then you could use combine='by_coords', then you could just pass a list of all the files in any order, i.e.

ds = open_mfdataset([file1, file2, ...], combine='by_coords')

But otherwise you'll have to use combine='nested'.


My understanding of your question is that you want to want to open multiple netcdf files which contain different spatial sections of your data, where the overall dataset has been broken down along both lat and lon.

If that's the case, then I'm afraid xarray doesn't support this at the moment, I asked about exactly the same issue on the xarray github here.

The same thing was also asked about on SO here. The concat solution mentioned there will work.

In my case I then wanted to save the concatenated dataset to a single new netcdf file, but using this method ended up loading all the data into memory at once. TO get around this I ended up having to use the netcdf python library to solve this at a lower level, but it took a lot of effort.