Analyse

The ModelScenario class allows users to collate related data sources and calculate modelled output based on this data. The types of data currently included are:

- Timeseries observation data (ObsData)
- Fixed domain sensitivity maps known as footprints (FootprintData)
- Fixed domain flux maps (FluxData); multiple maps can be included and referenced by source name
- Fixed domain vertical curtains at the boundaries, referred to as boundary conditions (BoundaryConditionsData)

class openghg.analyse.ModelScenario(site=None, satellite=None, species=None, inlet=None, height=None, network=None, domain=None, platform=None, max_level=None, obs_region=None, selection=None, model=None, met_model=None, fp_inlet=None, source=None, sources=None, bc_input=None, start_date=None, end_date=None, obs=None, obs_column=None, footprint=None, flux=None, bc=None, store=None)

This class stores together observation data with ancillary data and allows operations to be performed combining these inputs.

__init__(site=None, satellite=None, species=None, inlet=None, height=None, network=None, domain=None, platform=None, max_level=None, obs_region=None, selection=None, model=None, met_model=None, fp_inlet=None, source=None, sources=None, bc_input=None, start_date=None, end_date=None, obs=None, obs_column=None, footprint=None, flux=None, bc=None, store=None)

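Create a ModelScenario instance from a set of keywords used to look up data, or from directly supplied data objects. The instance can also be created empty and populated afterwards (e.g. with the add_obs(), add_footprint(), add_flux() and add_bc() methods).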

The keywords relate to observation, footprint and flux data that may be available within the object store. The combination of keywords supplied is used to extract the relevant data. The related keywords are as follows:

Observation data: site, species, inlet, network, start_date, end_date

Footprint data: site, inlet, domain, model, met_model, species, start_date, end_date

Flux data: species, sources, domain, start_date, end_date
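The following sketch is a rough illustration of how one set of keywords can serve several data types. It splits the supplied keywords into the three groups listed above and drops any whose value is None. The KEYWORD_GROUPS table and the split_keywords helper are hypothetical and are not part of the OpenGHG API. Only the groupings themselves come from this docstring.

```python
from __future__ import annotations

from typing import Any

# Hypothetical helper: groupings copied from the docstring lists above.
KEYWORD_GROUPS: dict[str, tuple[str, ...]] = {
    "obs": ("site", "species", "inlet", "network", "start_date", "end_date"),
    "footprint": ("site", "inlet", "domain", "model", "met_model", "species", "start_date", "end_date"),
    "flux": ("species", "sources", "domain", "start_date", "end_date"),
}


def split_keywords(**kwargs: Any) -> dict[str, dict[str, Any]]:
    """Return, per data type, only the supplied (non-None) keywords relevant to it."""
    return {
        data_type: {key: kwargs[key] for key in keys if kwargs.get(key) is not None}
        for data_type, keys in KEYWORD_GROUPS.items()
    }


groups = split_keywords(site="TAC", species="ch4", inlet="185m", domain="EUROPE", model="NAME")
print(groups["flux"])  # {'species': 'ch4', 'domain': 'EUROPE'}
```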

Args:
site: Site code e.g. “TAC”.
satellite: Satellite name e.g. “GOSAT”.
species: Species code e.g. “ch4”.
inlet: Inlet value e.g. “10m”.
height: Alias for inlet.
network: Network name e.g. “AGAGE”.
domain: Domain name e.g. “EUROPE”.
platform: Platform name e.g. “satellite”, “column-insitu”.
max_level: Maximum level for processing.
obs_region: The geographic region covered by the data (“BRAZIL”, “INDIA”, “UK”).
selection: For satellite only, identifier for any data selection which has been performed on satellite data. This can be based on any form of filtering, binning etc. but should be unique compared to other selections made e.g. “land”, “glint”, “upperlimit”. If not specified, domain will be used.
model: Model name used in creation of footprint e.g. “NAME”.
met_model: Name of met model used in creation of footprint e.g. “UKV”.
fp_inlet: Footprint release height, if this doesn’t match the site inlet value.
source: “anthro” (for “TOTAL”), source name from file otherwise.
sources: Emissions sources.
bc_input: Input keyword for boundary conditions e.g. “mozart” or “cams”.
start_date: Start of date range to use. Note: for flux this may not be applied.
end_date: End of date range to use. Note: for flux this may not be applied.
obs: Supply ObsData object directly (e.g. from get_obs…() functions).
obs_column: Supply ObsColumnData object directly.
footprint: Supply FootprintData object directly (e.g. from get_footprint() function).
flux: Supply FluxData object directly (e.g. from get_flux() function).
bc: Supply BoundaryConditionsData object directly.
store: Name of object store to retrieve data from.
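The sketch below shows one way inlet, height and fp_inlet could relate. It assumes that height is only used when inlet is not given. It also assumes that the footprint release height falls back to the observation inlet when fp_inlet is omitted. resolve_inlets is a hypothetical helper, not an OpenGHG function.

```python
from __future__ import annotations


def resolve_inlets(
    inlet: str | None = None,
    height: str | None = None,
    fp_inlet: str | None = None,
) -> tuple[str | None, str | None]:
    """Return (obs_inlet, footprint_inlet) under the assumptions stated above."""
    # height is an alias for inlet (assumed to be used only if inlet is missing).
    obs_inlet = inlet if inlet is not None else height
    # Assumed fallback: footprint release height defaults to the site inlet.
    footprint_inlet = fp_inlet if fp_inlet is not None else obs_inlet
    return obs_inlet, footprint_inlet


print(resolve_inlets(height="10m"))  # ('10m', '10m')
print(resolve_inlets(inlet="10m", fp_inlet="20m"))  # ('10m', '20m')
```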

Returns:

None

Sets up instance of class with associated values.

TODO: For obs, footprint, flux should we also allow Dataset input and turn these into the appropriate class?

add_bc(species=None, bc_input=None, domain=None, start_date=None, end_date=None, bc=None, store=None)

Add boundary conditions data based on keywords or direct BoundaryConditionsData object.

Return type: None

add_flux(species=None, domain=None, source=None, sources=None, start_date=None, end_date=None, flux=None, store=None)

Add flux data based on keywords or direct FluxData object. Can add flux datasets for multiple sources.

Return type: None

add_footprint(site=None, inlet=None, height=None, domain=None, model=None, satellite=None, obs_region=None, met_model=None, start_date=None, end_date=None, species=None, fp_inlet=None, network=None, footprint=None, store=None)

Add footprint data based on keywords or direct FootprintData object.

Return type: None

add_obs(site=None, species=None, inlet=None, height=None, network=None, start_date=None, end_date=None, obs=None, store=None)

Add observation data based on keywords or direct ObsData object.

Return type: None

add_obs_column(site=None, satellite=None, max_level=None, species=None, platform=None, obs_region=None, domain=None, selection=None, network=None, obs_column=None, store=None)

Add column data based on keywords or direct ObsColumnData object.

Return type: None

calc_modelled_baseline(resample_to='coarsest', platform=None, output_units=1, cache=True, recalculate=False)

Calculate the modelled baseline points based on the site footprint and boundary conditions. Boundary conditions are multiplied by any loss (exp(-t/lifetime)) for the species.
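The loss term can be shown with a minimal sketch in which a boundary condition value is scaled by exp(-t/lifetime). The sketch makes three assumptions:

- t is the time air takes to travel from the domain boundary to the site.
- Both times are in days.
- A species without a lifetime has no loss.

The function name and the example numbers are illustrative only.

```python
from __future__ import annotations

import math


def apply_loss(bc_value: float, travel_time: float, lifetime: float | None) -> float:
    """Scale a boundary-condition value by exp(-travel_time / lifetime).

    travel_time and lifetime must share units (days assumed here).
    lifetime=None is treated as an inert species, so no loss is applied.
    """
    if lifetime is None:
        return bc_value
    return bc_value * math.exp(-travel_time / lifetime)


# Made-up example: 1900 ppb at the boundary, 10 days of transport, ~9-year lifetime.
print(apply_loss(1900.0, 10.0, 9 * 365.0))
```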

The time points returned are dependent on the resample_to option chosen. If obs data is also linked to the ModelScenario instance, this will be used to derive the time points where appropriate.

Parameters:

resample_to (str | None) –
Resample option to use for averaging:
- either one of [“coarsest”, “obs”, “footprint”] to match to the datasets
- or using a valid pandas resample period e.g. “2H”.
- None to skip resampling and forward-fill (“ffill”) the footprint onto the obs time points
Default = “coarsest”.
platform (str | None) – Observation platform used to decide whether to resample e.g. “satellite”, “insitu”, “flask”
cache (bool) – Cache this data after calculation. Default = True.
recalculate (bool) – Make sure to recalculate this data rather than return from cache. Default = False.
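One way to read the “coarsest” option is to estimate the typical sampling interval of the obs and footprint data and align both to whichever interval is longer. The sketch below assumes this behaviour and is not the OpenGHG implementation. It estimates each interval as the median spacing between consecutive timestamps. choose_resample_target is a hypothetical name.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import median


def median_interval(times: list[datetime]) -> timedelta:
    """Median spacing between consecutive timestamps (needs at least two)."""
    ordered = sorted(times)
    return median(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def choose_resample_target(
    obs_times: list[datetime],
    fp_times: list[datetime],
    resample_to: str | None = "coarsest",
) -> str | None:
    """Resolve "coarsest" to "obs" or "footprint"; pass any other option through."""
    if resample_to != "coarsest":
        return resample_to
    # The dataset with the longer typical interval is the coarser one.
    return "obs" if median_interval(obs_times) >= median_interval(fp_times) else "footprint"


start = datetime(2020, 1, 1)
obs_times = [start + timedelta(hours=h) for h in range(6)]  # hourly observations
fp_times = [start + timedelta(hours=2 * h) for h in range(3)]  # 2-hourly footprints
print(choose_resample_target(obs_times, fp_times))  # footprint
```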

Returns:

Modelled baseline values along the time axis

If cache is True: This data will also be cached as the ModelScenario.modelled_baseline attribute. The associated scenario data will be cached as the ModelScenario.scenario attribute.

Return type:

xarray.DataArray

calc_modelled_obs(sources=None, resample_to='coarsest', platform=None, cache=True, recalculate=False, output_fp_x_flux=False, split_by_sectors=False)

Calculate the modelled observation points based on site footprint and fluxes.

The time points returned are dependent on the resample_to option chosen. If obs data is also linked to the ModelScenario instance, this will be used to derive the time points where appropriate.
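The central step is a weighted sum. At each time, the footprint sensitivity in every grid cell is multiplied by the flux in that cell, and the products are summed over the domain. The sketch below uses plain nested lists, a single time-invariant flux map and made-up values. It deliberately leaves out unit handling, time-varying fluxes and the alignment controlled by resample_to. The function names are illustrative.

```python
from __future__ import annotations

Grid = list[list[float]]  # values indexed as [lat][lon]


def fp_x_flux(footprint: Grid, flux: Grid) -> Grid:
    """Cell-by-cell product of a footprint and a flux map on the same grid."""
    return [
        [sensitivity * emission for sensitivity, emission in zip(fp_row, flux_row)]
        for fp_row, flux_row in zip(footprint, flux)
    ]


def modelled_obs(footprints: list[Grid], flux: Grid) -> list[float]:
    """One modelled value per footprint time: fp x flux summed over the domain."""
    return [sum(sum(row) for row in fp_x_flux(fp, flux)) for fp in footprints]


footprints = [
    [[1.0, 0.0], [0.5, 0.5]],  # time 0
    [[0.0, 2.0], [0.0, 0.0]],  # time 1
]
flux = [[2.0, 1.0], [4.0, 0.0]]
print(modelled_obs(footprints, flux))  # [4.0, 2.0]
```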

Parameters:

sources (str | list | None) – Sources to use for flux. All will be used and stacked if not specified.
resample_to (str | None) –
Resample option to use for averaging:
- either one of [“coarsest”, “obs”, “footprint”] to match to the datasets
- or using a valid pandas resample period e.g. “2H”.
- None to skip resampling and forward-fill (“ffill”) the footprint onto the obs time points
Default = “coarsest”.
platform (str | None) – Observation platform used to decide whether to resample e.g. “satellite”, “insitu”, “flask”
cache (bool) – Cache this data after calculation. Default = True.
recalculate (bool) – Make sure to recalculate this data rather than return from cache. Default = False.
output_fp_x_flux (bool) – If True, include the “fp x flux” data variable in the output.
split_by_sectors (bool) – If True, compute separate timeseries (and fp_x_flux) for each flux sector. These are stored under the mf_mod_sectoral and fp_x_flux_sectoral data variables, which have a source dimension for the different flux sources. The total mf_mod and fp_x_flux are available under their usual names.
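The fp x flux weighting is linear in the flux, so the per-sector timeseries produced with split_by_sectors should add up to the total. The sketch below checks that relationship on a flattened two-cell grid with made-up values. The source names and the sectoral_timeseries helper are illustrative, not OpenGHG output.

```python
from __future__ import annotations


def sectoral_timeseries(
    footprints: list[list[float]], fluxes: dict[str, list[float]]
) -> dict[str, list[float]]:
    """Per-source modelled values: sum over grid cells of footprint * flux."""
    return {
        source: [sum(f * q for f, q in zip(fp, flux)) for fp in footprints]
        for source, flux in fluxes.items()
    }


footprints = [[1.0, 0.5], [0.0, 2.0]]  # two times, two (flattened) grid cells
fluxes = {"anthro": [2.0, 2.0], "natural": [0.5, 1.0]}

sectoral = sectoral_timeseries(footprints, fluxes)
total = [sum(values) for values in zip(*sectoral.values())]

# Summing the fluxes first gives the same total as summing the sectoral series.
combined_flux = [a + n for a, n in zip(fluxes["anthro"], fluxes["natural"])]
assert sectoral_timeseries(footprints, {"total": combined_flux})["total"] == total
print(sectoral, total)
```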

Returns:

Modelled observation values along the time axis, optionally with “fp x flux”.

If cache is True: This data will also be cached as the ModelScenario.modelled_obs attribute. The associated scenario data will be cached as the ModelScenario.scenario attribute.

Return type:

xarray.Dataset

combine_flux_sources(sources=None, cache=True, recalculate=False)

Combine flux sources on the time dimension. The sources are aligned to the time axis of the highest-frequency flux source, for both time range and frequency.
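Here is a minimal sketch of that alignment. It makes three assumptions:

- Each lower-frequency source is forward-filled onto the time axis of the highest-frequency source.
- “Stacked” means the aligned sources are summed into a single total.
- The source with the most time points is the highest-frequency one.

Times are plain integers (for example month indices) and fluxes are single numbers rather than maps. All names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right


def forward_fill(
    times: list[int], values: list[float], target_times: list[int]
) -> list[float | None]:
    """For each target time, take the latest value at or before it (None if none)."""
    filled: list[float | None] = []
    for t in target_times:
        i = bisect_right(times, t) - 1
        filled.append(values[i] if i >= 0 else None)
    return filled


def combine_sources(
    sources: dict[str, tuple[list[int], list[float]]]
) -> tuple[list[int], list[float]]:
    """Align every source to the finest time axis (assumed: most points), then sum."""
    finest_times = max((times for times, _ in sources.values()), key=len)
    total = [0.0] * len(finest_times)
    for name, (times, values) in sources.items():
        aligned = forward_fill(times, values, finest_times)
        if any(value is None for value in aligned):
            raise ValueError(f"source {name!r} starts after the combined time range")
        total = [running + value for running, value in zip(total, aligned)]
    return finest_times, total


times, total = combine_sources(
    {
        "anthro": ([0, 1, 2, 3], [1.0, 1.0, 2.0, 2.0]),  # monthly
        "natural": ([0], [0.5]),  # a single annual value
    }
)
print(times, total)  # [0, 1, 2, 3] [1.5, 1.5, 2.5, 2.5]
```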

Parameters:

sources (str | list | None) – Names of sources to combine. Should already be attached to ModelScenario.
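cache (bool) – Cache this data after calculation. Default = True.
recalculate (bool) – Make sure to recalculate this data rather than return from cache. Default = False.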

Returns:

All flux sources stacked on the time dimension.

Return type:

xarray.Dataset

combine_obs_footprint(resample_to='coarsest', platform=None, cache=True, recalculate=False)

Combine observation and footprint data so these are on the same time axis. This will both slice and resample the data to align this axis.

- Data is sliced to the time range covered by both the footprint and obs data
- Data is resampled according to the resample_to input, using the mean
- Data is combined into one dataset
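The slice-then-average steps can be sketched with the standard library. The sketch assumes integer hour timestamps, one dict per dataset, and fixed-width bins anchored at the start of the overlap. overlap and slice_and_resample are hypothetical helpers. The real method works on xarray objects.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def overlap(range_a: tuple[int, int], range_b: tuple[int, int]) -> tuple[int, int]:
    """Time range covered by both datasets (inclusive bounds)."""
    start, end = max(range_a[0], range_b[0]), min(range_a[1], range_b[1])
    if start > end:
        raise ValueError("the datasets do not overlap in time")
    return start, end


def slice_and_resample(
    series: dict[int, float], window: tuple[int, int], step: int
) -> dict[int, float]:
    """Keep points inside window, then average them in bins of width step."""
    start, end = window
    bins: dict[int, list[float]] = defaultdict(list)
    for t, value in series.items():
        if start <= t <= end:
            bins[start + (t - start) // step * step].append(value)
    return {t: mean(values) for t, values in sorted(bins.items())}


obs = {hour: float(hour + 1) for hour in range(6)}  # hourly obs, hours 0-5
window = overlap((0, 5), (2, 8))  # footprints assumed to cover hours 2-8
print(window, slice_and_resample(obs, window, step=2))  # (2, 5) {2: 3.5, 4: 5.5}
```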

Parameters:

resample_to (str | None) –
Resample option to use for averaging:
- either one of [“coarsest”, “obs”, “footprint”] to match to the datasets
- or using a valid pandas resample period e.g. “2H”.
- None to skip resampling and forward-fill (“ffill”) the footprint onto the obs time points
Default = “coarsest”.
platform (str | None) – Observation platform used to decide on resample and alignment steps. If this is not supplied, the function will attempt to extract this value from the metadata and then from the openghg_defs site_info.json details for the site.
cache (bool) – Cache this data after calculation. Default = True.
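The fallback order described for platform can be sketched as a lookup chain: the explicit argument, then the data's metadata, then the site details from openghg_defs. The flat dictionaries below stand in for the metadata and the site_info.json contents. Their layout and the resolve_platform helper are assumptions. They do not reflect the real file structure or an OpenGHG function.

```python
from __future__ import annotations

from typing import Any


def resolve_platform(
    platform: str | None,
    metadata: dict[str, Any],
    site_info: dict[str, dict[str, Any]],
    site: str,
) -> str | None:
    """Return the first platform value found, in the documented order of precedence."""
    if platform is not None:
        return platform  # 1. value supplied by the caller
    if metadata.get("platform") is not None:
        return metadata["platform"]  # 2. value stored in the data's metadata
    return site_info.get(site, {}).get("platform")  # 3. site details (None if absent)


print(resolve_platform(None, {}, {"TAC": {"platform": "insitu"}}, "TAC"))  # insitu
```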

Returns:

Combined dataset aligned along the time dimension

If cache is True: This data will also be cached as the ModelScenario.scenario attribute.

Return type:

xarray.Dataset

footprints_data_merge(resample_to='coarsest', platform=None, calc_timeseries=True, calc_fp_x_flux=False, sources=None, split_by_sectors=False, calc_bc=True, cache=True, recalculate=False)

Produce combined object containing aligned footprint and observation data. Can also include modelled timeseries data derived from flux.

Parameters:

resample_to (str | None) –
Resample option to use for averaging:
- either one of [“coarsest”, “obs”, “footprint”] to match to the datasets
- or using a valid pandas resample period e.g. “2H”.
- None to skip resampling and forward-fill (“ffill”) the footprint onto the obs time points
Default = “coarsest”.
platform (str | None) – Observation platform used to decide whether to resample.
calc_timeseries (bool) – Calculate modelled timeseries based on flux sources.
calc_fp_x_flux (bool) – Calculate “fp x flux” matrix
sources (str | list | None) – Sources to use for flux if calc_timeseries is True. All will be used and stacked if not specified.
split_by_sectors (bool) – If True, separate modelled obs (and footprint x flux) will be calculated for each flux source (“sector”).
calc_bc (bool) – Calculate modelled baseline.
cache (bool) – Cache this data after calculation. Default = True.
recalculate (bool) – Make sure to recalculate this data rather than return from cache. Default = False.

Returns:

Combined dataset containing footprint and observation data

Return type:

xarray.Dataset

plot_comparison(baseline='boundary_conditions', sources=None, resample_to='coarsest', platform=None, cache=True, recalculate=False)

Plot comparison between observation and modelled timeseries data.

Parameters:

baseline (str | None) – Add baseline to data. One of:
- “boundary_conditions” - Uses added boundary conditions to calculate modelled baseline
- “percentile” - Calculates the 1st percentile value across the whole time period
- None - Don’t add a baseline; only plot the modelled observations
sources (str | list | None) – Sources to use for flux. All will be used and stacked if not specified.
resample_to (str | None) –
Resample option to use for averaging:
- either one of [“coarsest”, “obs”, “footprint”] to match to the datasets
- or using a valid pandas resample period e.g. “2H”.
- None to skip resampling and forward-fill (“ffill”) the footprint onto the obs time points
Default = “coarsest”.
platform (str | None) – Observation platform used to decide whether to resample e.g. “satellite”, “insitu”, “flask”
cache (bool) – Cache this data after calculation. Default = True.
recalculate (bool) – Make sure to recalculate this data rather than return from cache. Default = False.
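The “percentile” option can be illustrated by taking a low percentile of the values over the whole period and using it as a constant baseline. The sketch below uses made-up values and makes two assumptions. The percentile is assumed to be applied to the observations. The “inclusive” interpolation method of statistics.quantiles is also an assumed choice. Passing n=100 returns the 1st to 99th percentiles as cut points.

```python
from statistics import quantiles


def percentile_baseline(values: list[float], percentile: int = 1) -> float:
    """Return the given percentile (1-99) of values as a constant baseline."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # Assumed interpolation choice; cut point k-1 is the k-th percentile.
    return quantiles(values, n=100, method="inclusive")[percentile - 1]


observed = [1905.0, 1910.0, 1898.0, 1950.0, 2010.0, 1902.0]  # made-up values
baseline = percentile_baseline(observed)
enhancement = [value - baseline for value in observed]  # values above the baseline
print(round(baseline, 2))  # 1898.2
```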

Return type:

Any

Returns:

Plotly Figure

Interactive plotly graph created with observation and modelled observation data.

plot_timeseries()

Plot the observation timeseries data.

Return type:

Any

Returns:

Plotly Figure

Interactive plotly graph created with observations