Xarray - multidimensional science data¶
This notebook reads aerosol index and Tropospheric $NO_{2}$ Concentration from Sentinel-5P TROPOMI data using Opengeo Tools
Tutorial data and code are from NASA ARSET program: https://appliedsciences.nasa.gov/join-mission/training/english/high-resolution-no2-monitoring-space-tropomi
import numpy as np
# from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import sys
from netCDF4 import Dataset
from pprint import pprint, pp
import pandas as pd
ls TROPOMI_PythonCodesAndData/
S5P_OFFL_L2__AER_AI_20180816T183016_20180816T201146_04361_01_010100_20180822T174822.nc* S5P_OFFL_L2__CO_____20180816T183016_20180816T201146_04361_01_010100_20180822T174815.nc* S5P_RPRO_L2__CH4____20180816T182917_20180816T201245_04361_01_010202_20190101T194705.nc* fileList.txt* read_and_map_tropomi_no2_ai.py* read_tropomi_and_list_sds.py* read_tropomi_no2_ai_and_dump_ascii.py* read_tropomi_no2_ai_at_a_location.py*
Explore NetCDF file for its contents¶
The NetCDF file is like a folder with multiple sub-folders and files within it. Folders are called as groups
and files within it are called as variables
. NASA supplies a cross-platform app called Panoply which gives you a UI to query and visualize NetCDF files. Below is a screenshot of Panoply reading the Aerosol Index file.
The first step is to read this file as a netCDF4.Dataset
class.
file_path = 'TROPOMI_PythonCodesAndData/S5P_OFFL_L2__AER_AI_20180816T183016_20180816T201146_04361_01_010100_20180822T174822.nc'
ds = Dataset(file_path, mode='r')
type(ds)
netCDF4._netCDF4.Dataset
ds.groups.keys()
dict_keys(['PRODUCT', 'METADATA'])
Explore the different variables as a DataFrame
v = {'variables':[], 'long_name':[], 'units':[]}
for var in list(ds.groups['PRODUCT'].variables.keys()):
v['variables'].append(ds.groups['PRODUCT'].variables[var].name)
v['long_name'].append(ds.groups['PRODUCT'].variables[var].long_name)
try:
v['units'].append(ds.groups['PRODUCT'].variables[var].units)
except:
v['units'].append(None)
vars_df = pd.DataFrame.from_dict(v)
vars_df
variables | long_name | units | |
---|---|---|---|
0 | scanline | along-track dimension index | 1 |
1 | ground_pixel | across-track dimension index | 1 |
2 | time | reference time for the measurements | seconds since 2010-01-01 00:00:00 |
3 | corner | pixel corner index | 1 |
4 | latitude | pixel center latitude | degrees_north |
5 | longitude | pixel center longitude | degrees_east |
6 | delta_time | offset from reference start time of measurement | milliseconds |
7 | time_utc | Time of observation as ISO 8601 date-time string | None |
8 | qa_value | data quality value | 1 |
9 | aerosol_index_354_388 | Aerosol index from 388 and 354 nm | 1 |
10 | aerosol_index_340_380 | Aerosol index from 380 and 340 nm | 1 |
11 | aerosol_index_354_388_precision | Precision of aerosol index from 388 and 354 nm | 1 |
12 | aerosol_index_340_380_precision | Precision of aerosol index from 380 and 340 nm | 1 |
Read Aerosol Index 354 - 388 nm into memory¶
# preview contents of the variable
ds.groups['PRODUCT'].variables['aerosol_index_354_388']
<class 'netCDF4._netCDF4.Variable'> float32 aerosol_index_354_388(time, scanline, ground_pixel) units: 1 proposed_standard_name: ultraviolet_aerosol_index comment: Aerosol index from 388 and 354 nm long_name: Aerosol index from 388 and 354 nm radiation_wavelength: [354. 388.] coordinates: longitude latitude ancillary_variables: aerosol_index_354_388_precision _FillValue: 9.96921e+36 path = /PRODUCT unlimited dimensions: current shape = (1, 3245, 450) filling on
ai_data = ds.groups['PRODUCT'].variables['aerosol_index_354_388'][:]
type(ai_data)
numpy.ma.core.MaskedArray
ai_data.shape
(1, 3245, 450)
plt.imshow(ai_data[0]);
Reading using xarray
¶
See https://github.com/acgeospatial/Sentinel-5P/blob/master/sentinel5p_xarray_blog.ipynb
import xarray
xr_data = xarray.open_dataset(file_path, group='PRODUCT',
engine='netcdf4', decode_coords=True)
type(xr_data)
xarray.core.dataset.Dataset
xr_data
<xarray.Dataset> Dimensions: (corner: 4, ground_pixel: 450, scanline: 3245, time: 1) Coordinates: * scanline (scanline) float64 0.0 1.0 ... 3.244e+03 * ground_pixel (ground_pixel) float64 0.0 1.0 ... 449.0 * time (time) datetime64[ns] 2018-08-16 * corner (corner) float64 0.0 1.0 2.0 3.0 latitude (time, scanline, ground_pixel) float32 ... longitude (time, scanline, ground_pixel) float32 ... Data variables: delta_time (time, scanline) timedelta64[ns] 18:51:5... time_utc (time, scanline) object '2018-08-16T18:5... qa_value (time, scanline, ground_pixel) float32 ... aerosol_index_354_388 (time, scanline, ground_pixel) float32 ... aerosol_index_340_380 (time, scanline, ground_pixel) float32 ... aerosol_index_354_388_precision (time, scanline, ground_pixel) float32 ... aerosol_index_340_380_precision (time, scanline, ground_pixel) float32 ...
- corner: 4
- ground_pixel: 450
- scanline: 3245
- time: 1
- scanline(scanline)float640.0 1.0 2.0 ... 3.243e+03 3.244e+03
- units :
- 1
- axis :
- Y
- long_name :
- along-track dimension index
- comment :
- This coordinate variable defines the indices along track; index starts at 0
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 3.242e+03, 3.243e+03, 3.244e+03])
- ground_pixel(ground_pixel)float640.0 1.0 2.0 ... 447.0 448.0 449.0
- units :
- 1
- axis :
- X
- long_name :
- across-track dimension index
- comment :
- This coordinate variable defines the indices across track, from west to east; index starts at 0
array([ 0., 1., 2., ..., 447., 448., 449.])
- time(time)datetime64[ns]2018-08-16
- standard_name :
- time
- axis :
- T
- long_name :
- reference time for the measurements
- comment :
- The time in this variable corresponds to the time in the time_reference global attribute
array(['2018-08-16T00:00:00.000000000'], dtype='datetime64[ns]')
- corner(corner)float640.0 1.0 2.0 3.0
- units :
- 1
- long_name :
- pixel corner index
- comment :
- This coordinate variable defines the indices for the pixel corners; index starts at 0 (counter-clockwise, starting from south-western corner of the pixel in ascending part of the orbit)
array([0., 1., 2., 3.])
- latitude(time, scanline, ground_pixel)float32...
- long_name :
- pixel center latitude
- units :
- degrees_north
- standard_name :
- latitude
- valid_min :
- -90.0
- valid_max :
- 90.0
- bounds :
- /PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
[1460250 values with dtype=float32]
- longitude(time, scanline, ground_pixel)float32...
- long_name :
- pixel center longitude
- units :
- degrees_east
- standard_name :
- longitude
- valid_min :
- -180.0
- valid_max :
- 180.0
- bounds :
- /PRODUCT/SUPPORT_DATA/GEOLOCATIONS/longitude_bounds
[1460250 values with dtype=float32]
- delta_time(time, scanline)timedelta64[ns]...
- long_name :
- offset from reference start time of measurement
array([[67911027000000, 67912107000000, 67913187000000, ..., 71412302000000, 71413382000000, 71414462000000]], dtype='timedelta64[ns]')
- time_utc(time, scanline)object...
- long_name :
- Time of observation as ISO 8601 date-time string
array([['2018-08-16T18:51:51.027000Z', '2018-08-16T18:51:52.107000Z', '2018-08-16T18:51:53.187000Z', ..., '2018-08-16T19:50:12.301999Z', '2018-08-16T19:50:13.381999Z', '2018-08-16T19:50:14.461999Z']], dtype=object)
- qa_value(time, scanline, ground_pixel)float32...
- units :
- 1
- valid_min :
- 0
- valid_max :
- 100
- long_name :
- data quality value
- comment :
- A continuous quality descriptor, varying between 0 (no data) and 1 (full quality data). Recommend to ignore data with qa_value < 0.5
[1460250 values with dtype=float32]
- aerosol_index_354_388(time, scanline, ground_pixel)float32...
- units :
- 1
- proposed_standard_name :
- ultraviolet_aerosol_index
- comment :
- Aerosol index from 388 and 354 nm
- long_name :
- Aerosol index from 388 and 354 nm
- radiation_wavelength :
- [354. 388.]
- ancillary_variables :
- aerosol_index_354_388_precision
[1460250 values with dtype=float32]
- aerosol_index_340_380(time, scanline, ground_pixel)float32...
- units :
- 1
- proposed_standard_name :
- ultraviolet_aerosol_index
- comment :
- Aerosol index from 380 and 340 nm
- long_name :
- Aerosol index from 380 and 340 nm
- radiation_wavelength :
- [340. 380.]
- ancillary_variables :
- aerosol_index_340_380_precision
[1460250 values with dtype=float32]
- aerosol_index_354_388_precision(time, scanline, ground_pixel)float32...
- units :
- 1
- proposed_standard_name :
- ultraviolet_aerosol_index standard_error
- comment :
- Precision of aerosol index from 388 and 354 nm
- long_name :
- Precision of aerosol index from 388 and 354 nm
- radiation_wavelength :
- [354. 388.]
[1460250 values with dtype=float32]
- aerosol_index_340_380_precision(time, scanline, ground_pixel)float32...
- units :
- 1
- proposed_standard_name :
- ultraviolet_aerosol_index standard_error
- comment :
- Precision of aerosol index from 380 and 340 nm
- long_name :
- Precision of aerosol index from 380 and 340 nm
- radiation_wavelength :
- [340. 380.]
[1460250 values with dtype=float32]
xr_data_ai = xr_data['aerosol_index_340_380']
print(type(xr_data_ai))
print(xr_data_ai.shape)
<class 'xarray.core.dataarray.DataArray'> (1, 3245, 450)
xr_data_ai[0].plot();
(xr_data.latitude.attrs, xr_data.longitude.attrs)
({'long_name': 'pixel center latitude', 'units': 'degrees_north', 'standard_name': 'latitude', 'valid_min': -90.0, 'valid_max': 90.0, 'bounds': '/PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds'}, {'long_name': 'pixel center longitude', 'units': 'degrees_east', 'standard_name': 'longitude', 'valid_min': -180.0, 'valid_max': 180.0, 'bounds': '/PRODUCT/SUPPORT_DATA/GEOLOCATIONS/longitude_bounds'})
Plot using matplotlib¶
plt.figure(figsize=(14,8))
ax = plt.axes()
xr_data.aerosol_index_340_380[0].plot.pcolormesh(ax=ax, x='longitude',
y='latitude',
add_colorbar=True, cmap='jet');