Opening EOS netCDF4/HDF5 file with correct format using xarray?
To open the data with the projection information you need to open the sub-datasets individually.
I will use a MODIS dataset I have to hand, MOD11A1, as an example, but the same approach applies to yours. You can get the names of the subdatasets using rasterio, for example:
import rasterio
filename = '/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf'
with rasterio.open(filename) as src:
    subdatasets = src.subdatasets
You could use gdal rather than rasterio:
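from osgeo import gdal
g = gdal.Open(filename)
# GetSubDatasets() returns (name, description) tuples; keep only the names
subdatasets = [name for name, _ in g.GetSubDatasets()]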
In this example, subdatasets looks like:
print(subdatasets)
['HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:LST_Day_1km',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Emis_32',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Clear_day_cov',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Clear_night_cov',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:QC_Day',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Day_view_time',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Day_view_angl',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:LST_Night_1km',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:QC_Night',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Night_view_time',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Night_view_angl',
 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:Emis_31']
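Each name in that list appears to follow a driver:type:path:grid:layer pattern. As a small sketch, the parts can be pulled apart with plain string methods. The helper name split_subdataset_name is made up here, and the pattern is inferred from the names printed above rather than taken from a GDAL specification:

```python
from typing import NamedTuple


class SubdatasetName(NamedTuple):
    driver: str   # e.g. 'HDF4_EOS'
    kind: str     # e.g. 'EOS_GRID'
    path: str     # path to the .hdf file
    grid: str     # e.g. 'MODIS_Grid_Daily_1km_LST'
    layer: str    # e.g. 'LST_Day_1km'


def split_subdataset_name(name: str) -> SubdatasetName:
    """Split a 'driver:kind:path:grid:layer' name (hypothetical helper)."""
    # Take the two leading fields from the left and the two trailing fields
    # from the right, so a path that itself contains ':' stays in one piece.
    driver, kind, rest = name.split(':', 2)
    path, grid, layer = rest.rsplit(':', 2)
    # Drop surrounding double quotes in case the path is quoted.
    return SubdatasetName(driver, kind, path.strip('"'), grid, layer)


example = ('HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf'
           ':MODIS_Grid_Daily_1km_LST:LST_Day_1km')
parts = split_subdataset_name(example)
print(parts.layer)  # LST_Day_1km
```

The layer field is the part you would normally use as a variable name when collecting several subdatasets.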
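Opening one of these subdatasets with xr.open_rasterio preserves the projection information. (Note that xr.open_rasterio was deprecated in xarray 0.20 and removed in later releases; rioxarray.open_rasterio is its replacement.)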
import xarray as xr
fname = 'HDF4_EOS:EOS_GRID:/data/MOD11A1.A2019225.h17v03.006.2019226085002.hdf:MODIS_Grid_Daily_1km_LST:LST_Day_1km'
myDataset = xr.open_rasterio(fname)
The result is a DataArray that carries the projection information in its crs and transform attributes:
print(myDataset)
<xarray.DataArray (band: 1, y: 1200, x: 1200)>
[1440000 values with dtype=uint16]
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 6.671e+06 6.67e+06 6.669e+06 ... 5.561e+06 5.56e+06
  * x        (x) float64 -1.111e+06 -1.111e+06 -1.11e+06 ... -1.39e+03 -463.3
Attributes:
    transform:     (926.6254331391667, 0.0, -1111950.519767, 0.0, -926.625433...
    crs:           +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=637100...
    res:           (926.6254331391667, 926.6254331383334)
    is_tiled:      0
    nodatavals:    (0.0,)
    scales:        (0.02,)
    offsets:       (0.0,)
    descriptions:  ('Daily daytime 1km grid Land-surface Temperature',)
    units:         ('K',)
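The values are stored as raw uint16 counts; the scales, offsets and nodatavals attributes describe how to turn them into the listed units (K). Here is a minimal sketch with NumPy, assuming the usual raw * scale + offset convention and treating the nodata value as missing. The function name and the sample numbers are invented for illustration:

```python
import numpy as np


def apply_scale_offset(raw, scale=0.02, offset=0.0, nodata=0):
    """Convert raw counts to physical values, masking nodata as NaN.

    The defaults mirror the scales, offsets and nodatavals attributes
    shown above; the raw * scale + offset convention is an assumption.
    """
    raw = np.asarray(raw)
    values = raw.astype('float64') * scale + offset
    return np.where(raw == nodata, np.nan, values)


# Made-up raw counts, not values read from the file.
sample = np.array([0, 14000, 15000], dtype=np.uint16)
kelvin = apply_scale_offset(sample)
print(kelvin)
```

With the DataArray above this could be called as apply_scale_offset(myDataset.values, myDataset.attrs['scales'][0], myDataset.attrs['offsets'][0], myDataset.attrs['nodatavals'][0]), which returns a plain NumPy array rather than a DataArray.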
If you need all of the subdatasets, loop through them, open each one, and add it to an xarray Dataset.
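One way to sketch that loop is shown below. The function name open_subdatasets and its opener argument are made up for this example; opener stands for whatever callable opens a single subdataset, such as the xr.open_rasterio call used above. Layers are keyed by the last colon-separated field of each name, and the /data/example.hdf path is a placeholder.

```python
def open_subdatasets(names, opener):
    """Open every subdataset and return a dict keyed by layer name.

    names  -- subdataset strings such as those in `subdatasets`
    opener -- callable that opens one subdataset (an assumption of this
              sketch; the answer above uses xr.open_rasterio)
    """
    layers = {}
    for name in names:
        # The layer name is the text after the final ':'.
        layer = name.rsplit(':', 1)[-1]
        layers[layer] = opener(name)
    return layers


# Stand-in opener (len) so the sketch runs without a MODIS file on disk.
demo_names = [
    'HDF4_EOS:EOS_GRID:/data/example.hdf:MODIS_Grid_Daily_1km_LST:LST_Day_1km',
    'HDF4_EOS:EOS_GRID:/data/example.hdf:MODIS_Grid_Daily_1km_LST:QC_Day',
]
demo = open_subdatasets(demo_names, opener=len)
print(sorted(demo))  # ['LST_Day_1km', 'QC_Day']
```

With real data, something like xr.Dataset(open_subdatasets(subdatasets, xr.open_rasterio)) should then combine the layers into one Dataset, since they share the same x and y coordinates. Each variable keeps its length-1 band dimension unless it is squeezed out first.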