store API detail#

openghg.store.Emissions#

The Emissions class is used to process emissions / flux data files.

class openghg.store.Emissions[source]#

This class is used to process emissions / flux data

static read_data(binary_data, metadata, file_metadata)[source]#

Read emissions data from binary data

Parameters
  • binary_data (bytes) – Emissions data

  • metadata (Dict) – Dictionary of metadata

  • file_metadata – File metadata

Returns

UUIDs of Datasources data has been assigned to

Return type

dict

static read_file(filepath, species, source, domain, source_format='openghg', date=None, high_time_resolution=False, period=None, continuous=True, overwrite=False)[source]#

Read emissions file

Parameters
  • filepath (Union[str, Path]) – Path of emissions file

  • species (str) – Species name

  • domain (str) – Emissions domain

  • source (str) – Emissions source

  • date (Optional[str]) – Date associated with emissions as a string

  • source_format (str) – Type of data being input e.g. openghg (internal format)

  • high_time_resolution (Optional[bool]) – If this is a high resolution file

  • period (Union[str, tuple, None]) – Period of measurements. Only needed if this can not be inferred from the time coords. If specified, should be one of:

    • “yearly”, “monthly”

    • suitable pandas Offset Alias

    • tuple of (value, unit) as would be passed to pandas.Timedelta function

  • continuous (bool) – Whether time stamps have to be continuous.

  • overwrite (bool) – Should this data overwrite currently stored data.

Returns

Dictionary of UUIDs of the Datasources the data has been assigned to

Return type

dict
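The three accepted forms of period can be told apart with a few type checks. The following is a minimal sketch using only the standard library; describe_period is a hypothetical helper written for illustration and is not part of OpenGHG, which may handle these cases differently.

```python
from typing import Tuple, Union

# The documented forms of `period`: a keyword or offset alias string,
# a (value, unit) tuple, or None.
PeriodSpec = Union[str, Tuple[float, str], None]


def describe_period(period: PeriodSpec) -> str:
    """Classify a `period` value into one of the documented forms.

    Hypothetical helper for illustration only.
    """
    if period is None:
        # Nothing supplied: the period has to be inferred from the time coords.
        return "infer from time coords"
    if isinstance(period, tuple):
        if len(period) != 2:
            raise ValueError("a tuple period must be (value, unit)")
        value, unit = period
        return f"timedelta of {value} {unit}"
    if not isinstance(period, str):
        raise TypeError("period must be a str, a (value, unit) tuple or None")
    if period.lower() in ("yearly", "monthly"):
        return f"keyword: {period.lower()}"
    # Anything else is assumed to be a pandas offset alias and passed through.
    return f"offset alias: {period}"
```

In this sketch the keyword comparison is case-insensitive, which is an assumption rather than documented behaviour.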

static schema()[source]#

Define schema for emissions Dataset.

Includes flux/emissions for each time and position:
  • “flux”
    • expected dimensions: (“time”, “lat”, “lon”)

Expected data types for all variables and coordinates also included.

Returns

Contains schema for Emissions.

Return type

DataSchema

static transform_data(datapath, database, overwrite=False, **kwargs)[source]#

Read and transform an emissions database. This will find the appropriate parser function to use for the database specified. The necessary inputs are determined by which database is being used.

The underlying parser functions will be of the form:
  • openghg.transform.emissions.parse_{database.lower()}
    • e.g. openghg.transform.emissions.parse_edgar()

Parameters
  • datapath (Union[str, Path]) – Path to local copy of database archive (for now)

  • database (str) – Name of database

  • overwrite (bool) – Should this data overwrite currently stored data which matches.

  • **kwargs (Dict) – Inputs for underlying parser function for the database. Necessary inputs will depend on the database being parsed.

TODO: Could allow a Callable[…, Dataset] type so that a pre-defined function can be passed

Return type

Dict
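The name-based lookup described for transform_data can be sketched as a dictionary dispatch. This is an illustration under stated assumptions: PARSERS, find_parser and the stand-in parse_edgar below are hypothetical and do not reproduce how OpenGHG locates functions in openghg.transform.emissions.

```python
from typing import Any, Callable, Dict


def parse_edgar(datapath: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a database parser (not the OpenGHG function)."""
    return {"database": "edgar", "datapath": datapath, "options": kwargs}


# Hypothetical registry mapping parser names to parser functions.
PARSERS: Dict[str, Callable[..., Dict[str, Any]]] = {"parse_edgar": parse_edgar}


def find_parser(database: str) -> Callable[..., Dict[str, Any]]:
    """Build the name parse_{database.lower()} and look it up."""
    name = f"parse_{database.lower()}"
    try:
        return PARSERS[name]
    except KeyError:
        raise ValueError(f"No parser found for database '{database}'") from None


# The extra keyword arguments are forwarded untouched to the chosen parser.
parser = find_parser("EDGAR")
result = parser("/path/to/archive", species="ch4")
```

Because the database name is lower-cased before the lookup, "EDGAR" and "edgar" resolve to the same parser in this sketch.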

static validate_data(data)[source]#

Validate input data against Emissions schema - definition from Emissions.schema() method.

Parameters

data (Dataset) – xarray Dataset in expected format

Return type

None

Returns

None

Raises a ValueError with details if the input data does not adhere to the Emissions schema.
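To illustrate the kind of check that schema validation implies, here is a minimal sketch that compares variable dimensions against the “flux” entry described by Emissions.schema(). It uses plain dictionaries rather than an xarray Dataset; check_dims and EMISSIONS_DIMS are hypothetical names, and treating dimension order as significant is an assumption.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

# Expected dimensions per variable, taken from the schema description:
# "flux" with dimensions ("time", "lat", "lon").
EMISSIONS_DIMS: Dict[str, Tuple[str, ...]] = {"flux": ("time", "lat", "lon")}


def check_dims(
    actual: Mapping[str, Sequence[str]],
    expected: Mapping[str, Tuple[str, ...]] = EMISSIONS_DIMS,
) -> None:
    """Raise ValueError listing every variable that does not match (sketch only)."""
    problems: List[str] = []
    for name, dims in expected.items():
        if name not in actual:
            problems.append(f"missing variable '{name}'")
        elif tuple(actual[name]) != dims:
            # Assumption: the order of the dimensions matters.
            problems.append(f"'{name}' has dims {tuple(actual[name])}, expected {dims}")
    if problems:
        raise ValueError("; ".join(problems))
```

Collecting all problems before raising gives a single error message with full details, rather than stopping at the first mismatch.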

openghg.store.EulerianModel#

The EulerianModel class is used to process Eulerian model data.

class openghg.store.EulerianModel[source]#

This class is used to process Eulerian model data

static read_file(filepath, model, species, start_date=None, end_date=None, setup=None, overwrite=False)[source]#

Read Eulerian model output

Parameters
  • filepath (Union[str, Path]) – Path of Eulerian model species output

  • model (str) – Eulerian model name

  • species (str) – Species name

  • start_date (Optional[str]) – Start date (inclusive) associated with model run

  • end_date (Optional[str]) – End date (exclusive) associated with model run

  • setup (Optional[str]) – Additional setup details for run

  • overwrite (bool) – Should this data overwrite currently stored data.

Return type

Dict

openghg.store.Footprints#

The Footprints class is used to process footprints model output.

class openghg.store.Footprints[source]#

This class is used to process footprints model output

static read_data(binary_data, metadata, file_metadata)[source]#

Read a footprint from binary data

Parameters
  • binary_data (bytes) – Footprint data

  • metadata (Dict) – Dictionary of metadata

  • file_metadata – File metadata

Returns

UUIDs of Datasources data has been assigned to

Return type

dict

static read_file(filepath, site, height, domain, model, metmodel=None, species=None, network=None, period=None, continuous=True, retrieve_met=False, high_spatial_res=False, high_time_res=False, short_lifetime=False, overwrite=False)[source]#

Reads footprints data files and returns the UUIDs of the Datasources the processed data has been assigned to

Parameters
  • filepath (Union[str, Path]) – Path of file to load

  • site (str) – Site name

  • height (str) – Height above ground level in metres

  • domain (str) – Domain of footprints

  • model (str) – Model used to create footprint (e.g. NAME or FLEXPART)

  • metmodel (Optional[str]) – Underlying meteorological model used (e.g. UKV)

  • species (Optional[str]) – Species name. Only needed if footprint is for a specific species e.g. co2 (and not inert)

  • network (Optional[str]) – Network name

  • period (Union[str, tuple, None]) – Period of measurements. Only needed if this can not be inferred from the time coords

  • continuous (bool) – Whether time stamps have to be continuous.

  • retrieve_met (bool) – Whether to also download meteorological data for this footprints area

  • high_spatial_res (bool) – Indicate footprints include both a low and high spatial resolution.

  • high_time_res (bool) – Indicate footprints are high time resolution (include H_back dimension). Note this will be set to True automatically if species=“co2” (Carbon Dioxide).

  • short_lifetime (bool) – Indicate footprint is for a short-lived species. Needs species input. Note this will be set to True if species has an associated lifetime.

  • overwrite (bool) – Overwrite any currently stored data

Returns

UUIDs of Datasources data has been assigned to

Return type

dict
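The automatic adjustment of high_time_res and short_lifetime described in the parameter list can be sketched as follows. This is not OpenGHG's implementation: resolve_footprint_flags is a hypothetical helper, the lifetimes mapping stands in for whatever species lifetime data the library consults, and both the case-insensitive species match and the ValueError are assumptions.

```python
from typing import Mapping, Optional, Tuple


def resolve_footprint_flags(
    species: Optional[str],
    high_time_res: bool = False,
    short_lifetime: bool = False,
    lifetimes: Optional[Mapping[str, object]] = None,
) -> Tuple[bool, bool]:
    """Return (high_time_res, short_lifetime) after the documented adjustments."""
    lifetimes = lifetimes or {}
    if species is None:
        if short_lifetime:
            # The parameter list says short_lifetime needs a species input.
            raise ValueError("short_lifetime requires a species")
        return high_time_res, short_lifetime
    name = species.lower()
    if name == "co2":
        # Carbon dioxide footprints are always treated as high time resolution.
        high_time_res = True
    if name in lifetimes:
        # A species with an associated lifetime is treated as short-lived.
        short_lifetime = True
    return high_time_res, short_lifetime
```

Flags that the caller sets to True explicitly are never switched off here; the adjustments only turn flags on.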

static schema(particle_locations=True, high_spatial_res=False, high_time_res=False, short_lifetime=False)[source]#

Define schema for footprint Dataset.

The returned schema depends on what the footprint represents, indicated using the keywords. By default, this will include the “fp” variable but this will be superseded if high_spatial_res or high_time_res are specified.

Parameters
  • particle_locations (bool) – Include the 4-directional particle location variables “particle_location_[nesw]” and the associated additional dimension (“height”).

  • high_spatial_res (bool) – Set the footprint variables to include high and low resolution options, “fp_low” and “fp_high”, and the associated additional dimensions (“lat_high”, “lon_high”).

  • high_time_res (bool) – Set the footprint variable to be high time resolution, “fp_HiTRes”, and include the associated dimension (“H_back”).

  • short_lifetime (bool) – Include the additional particle age parameters for short-lived species, “mean_age_particles_[nesw]”.

Return type

DataSchema
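How the keywords change the set of expected variables can be sketched with a small function that returns variable names only. footprint_variables is a hypothetical helper, not the DataSchema returned by Footprints.schema(), and including both the high spatial and high time resolution variables when both flags are set is an assumption.

```python
from typing import List


def footprint_variables(
    particle_locations: bool = True,
    high_spatial_res: bool = False,
    high_time_res: bool = False,
    short_lifetime: bool = False,
) -> List[str]:
    """List the variable names implied by each keyword (illustrative sketch)."""
    names: List[str] = []
    if high_spatial_res:
        names += ["fp_low", "fp_high"]
    if high_time_res:
        names.append("fp_HiTRes")
    if not names:
        # Neither resolution option was requested, so the default "fp" applies.
        names.append("fp")
    if particle_locations:
        names += [f"particle_location_{d}" for d in "nesw"]
    if short_lifetime:
        names += [f"mean_age_particles_{d}" for d in "nesw"]
    return names
```

With the default arguments this yields “fp” plus the four particle location variables.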

static validate_data(data, particle_locations=True, high_spatial_res=False, high_time_res=False, short_lifetime=False)[source]#

Validate data against Footprint schema - definition from Footprints.schema(…) method.

Parameters
  • data (Dataset) – xarray Dataset in expected format

See the Footprints.schema() method for details on the optional inputs.

Return type

None

Returns

None

Raises a ValueError with details if the input data does not adhere to the Footprints schema.

openghg.store.ObsSurface#

The ObsSurface class is used to process surface observation data.

class openghg.store.ObsSurface[source]#

This class is used to process surface observation data

delete(uuid)[source]#

Delete a Datasource with the given UUID

This function deletes both the record of the object store in he

Parameters

uuid (str) – UUID of Datasource

Return type

None

Returns

None

static read_data(binary_data, metadata, file_metadata, precision_data=None)[source]#

Reads binary data passed in by serverless function. The data dictionary should contain sub-dictionaries that contain data and metadata keys.

This is clunky and the ObsSurface.read_file function could be tidied up quite a lot to be more flexible.

Parameters
  • binary_data (bytes) – Binary measurement data

  • metadata (Dict) – Metadata

  • file_metadata (Dict) – File metadata such as original filename

  • precision_data (Optional[bytes]) – GCWERKS precision data

Returns

Dictionary of result

Return type

dict

static read_file(filepath, source_format, network, site, inlet=None, instrument=None, sampling_period=None, measurement_type='insitu', overwrite=False, verify_site_code=True)[source]#

Process files and store in the object store. This function utilises the process functions of the other classes in this submodule to handle each data type.

Parameters
  • filepath (Union[str, Path, Tuple, List]) – Filepath(s)

  • source_format (str) – Data format, for example CRDS, GCWERKS

  • site (str) – Site code/name

  • network (str) – Network name

  • inlet (Optional[str]) – Inlet height. If retrieving multiple files, pass None and OpenGHG will attempt to read inlets from the data.

  • instrument (Optional[str]) – Instrument name

  • sampling_period (Optional[str]) – Sampling period in pandas style (e.g. 2H for 2 hour period, 2m for 2 minute period).

  • measurement_type (str) – Type of measurement e.g. insitu, flask

  • overwrite (bool) – Overwrite previously uploaded data

  • verify_site_code (bool) – Verify the site code

Returns

Dictionary of Datasource UUIDs

Return type

dict

TODO: Should “measurement_type” be changed to “platform” to align with ModelScenario and ObsColumn?
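As a sketch of preparing a call to ObsSurface.read_file, the snippet below assembles keyword arguments and rejects names that are not in the documented signature. build_read_file_kwargs is a hypothetical helper, and the file name, network and site values are placeholders rather than real data; the actual call is left commented out because it needs OpenGHG installed and an object store configured.

```python
from pathlib import Path
from typing import Any, Dict

# Keyword names copied from the documented read_file signature.
READ_FILE_KEYWORDS = {
    "filepath", "source_format", "network", "site", "inlet", "instrument",
    "sampling_period", "measurement_type", "overwrite", "verify_site_code",
}


def build_read_file_kwargs(
    filepath: str, source_format: str, network: str, site: str, **optional: Any
) -> Dict[str, Any]:
    """Collect arguments for read_file, failing early on misspelt names."""
    unknown = set(optional) - READ_FILE_KEYWORDS
    if unknown:
        raise TypeError(f"Unexpected keyword(s): {sorted(unknown)}")
    return {
        "filepath": Path(filepath),
        "source_format": source_format,
        "network": network,
        "site": site,
        **optional,
    }


# Placeholder values for illustration only.
kwargs = build_read_file_kwargs(
    "example_file.dat", "CRDS", "EXAMPLE_NETWORK", "xyz", sampling_period="2H"
)

# With OpenGHG installed, the call would then look like:
# from openghg.store import ObsSurface
# results = ObsSurface.read_file(**kwargs)
```

Checking names up front catches typos such as sample_period before any file processing starts.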

static read_multisite_aqmesh(data_filepath, metadata_filepath, network='aqmesh_glasgow', instrument='aqmesh', sampling_period=60, measurement_type='insitu', overwrite=False)[source]#

Read AQMesh data for the Glasgow network

NOTE - temporary function until we know what kind of AQMesh data we’ll be retrieving in the future.

This data is different in that it contains multiple sites in the same file.

Return type

DefaultDict

static schema(species)[source]#

Define schema for surface observations Dataset.

Only includes mandatory variables:
  • standardised species name (e.g. “ch4”)
    • expected dimensions: (“time”)

Expected data types for variables and coordinates also included.

Returns

Contains basic schema for ObsSurface.

Return type

DataSchema

TODO: Decide how best to incorporate optional variables, e.g. “ch4_variability”, “ch4_number_of_observations”

static store_data(data, overwrite=False, required_metakeys=None)[source]#

This expects already standardised data such as ICOS / CEDA

Parameters
  • data (Dict) – Dictionary of data in standard format, see the data spec under Development -> Data specifications in the documentation

  • overwrite (bool) – If True overwrite currently stored data

  • required_metakeys (Optional[Sequence]) – Keys in the metadata we should use to store this metadata in the object store. If None it defaults to {"species", "site", "station_long_name", "inlet", "instrument", "network", "source_format", "data_source", "icos_data_level"}

Return type

Dict or None
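The role of required_metakeys can be sketched as selecting a subset of the metadata, falling back to the documented defaults when None is passed. select_metadata is a hypothetical helper, and raising a ValueError for missing keys is an assumption about behaviour that the documentation does not state.

```python
from typing import Any, Dict, Mapping, Optional, Sequence

# Default keys as listed in the parameter description.
DEFAULT_METAKEYS = (
    "species", "site", "station_long_name", "inlet", "instrument",
    "network", "source_format", "data_source", "icos_data_level",
)


def select_metadata(
    metadata: Mapping[str, Any], required_metakeys: Optional[Sequence[str]] = None
) -> Dict[str, Any]:
    """Pick the required keys out of a metadata mapping (illustrative sketch)."""
    keys = DEFAULT_METAKEYS if required_metakeys is None else tuple(required_metakeys)
    missing = [key for key in keys if key not in metadata]
    if missing:
        # Assumption: a missing required key is treated as an error.
        raise ValueError(f"Metadata is missing required keys: {missing}")
    return {key: metadata[key] for key in keys}
```

Passing an explicit sequence, for example ("species", "site"), replaces the defaults entirely rather than adding to them.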

store_hashes(hashes)[source]#

Store hashes of data retrieved from a remote data source such as ICOS or CEDA. This takes the full dictionary of hashes, removes the ones we’ve seen before and adds the new.

Parameters

hashes (Dict) – Dictionary of hashes provided by the hash_retrieved_data function

Return type

None

Returns

None
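The "remove the ones we’ve seen before and add the new" step can be sketched with plain dictionaries. merge_new_hashes is a hypothetical function, and the shape of the hash dictionary (hash mapped to a file name) is an assumption made for the example.

```python
from typing import Dict


def merge_new_hashes(seen: Dict[str, str], retrieved: Dict[str, str]) -> Dict[str, str]:
    """Record unseen hashes in `seen` and return only the newly added ones."""
    new = {h: name for h, name in retrieved.items() if h not in seen}
    seen.update(new)
    return new


# Placeholder hashes and file names for illustration only.
seen_hashes = {"abc123": "file_a.nc"}
added = merge_new_hashes(seen_hashes, {"abc123": "file_a.nc", "def456": "file_b.nc"})
```

After the call, seen_hashes holds both entries and added holds only the entry for "def456".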

static validate_data(data, species)[source]#

Validate input data against ObsSurface schema - definition from ObsSurface.schema() method.

Parameters
  • data (Dataset) – xarray Dataset in expected format

  • species (str) – Species name

Return type

None

Returns

None

Raises a ValueError with details if the input data does not adhere to the ObsSurface schema.

Recombination functions#

These handle the recombination of data retrieved from the object store.

openghg.store.recombine_datasets(keys, sort=True, attrs_to_check=None, elevate_inlet=False)[source]#

Combines datasets stored separately in the object store into a single dataset

Parameters
  • keys (List[str]) – List of object store keys

  • sort (Optional[bool]) – Sort the resulting Dataset by the time dimension. Default = True

  • attrs_to_check (Optional[Dict[str, str]]) – Attributes to check for duplicates. If duplicates are present, a new data variable will be created containing the values from each dataset. If a dictionary is passed, the attribute(s) will be retained and the new value assigned. If a list/string is passed, the attribute(s) will be removed.

  • elevate_inlet (bool) – Force the elevation of the inlet attribute

Returns

Combined Dataset

Return type

xarray.Dataset

openghg.store.recombine_multisite(keys, sort=True)[source]#

Recombine the keys from multiple sites into a single Dataset for each site

Parameters
  • keys – A dictionary of lists of keys, keyed by site

  • sort (Optional[bool]) – Sort the resulting Dataset by the time dimension

Returns

Dictionary of xarray.Datasets

Return type

dict
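The two recombination functions can be illustrated with a simplified model in which each stored object is a list of (timestamp, value) records instead of an xarray Dataset. Everything here is an assumption made for illustration: recombine, recombine_by_site, the load callable and the in-memory store are hypothetical and do not use the OpenGHG object store.

```python
from typing import Callable, Dict, List, Tuple

# (ISO 8601 timestamp, value); a stand-in for one row of a Dataset.
Record = Tuple[str, float]


def recombine(
    keys: List[str], load: Callable[[str], List[Record]], sort: bool = True
) -> List[Record]:
    """Concatenate the records stored under each key, optionally sorted by time."""
    combined: List[Record] = []
    for key in keys:
        combined.extend(load(key))
    if sort:
        # ISO 8601 strings in the same format sort chronologically.
        combined.sort(key=lambda record: record[0])
    return combined


def recombine_by_site(
    site_keys: Dict[str, List[str]], load: Callable[[str], List[Record]], sort: bool = True
) -> Dict[str, List[Record]]:
    """Apply recombine to each site's list of keys."""
    return {site: recombine(keys, load, sort) for site, keys in site_keys.items()}


# Placeholder in-memory store for illustration only.
store = {
    "key/a": [("2020-01-02T00:00:00", 2.0)],
    "key/b": [("2020-01-01T00:00:00", 1.0)],
}
combined = recombine(["key/a", "key/b"], store.__getitem__)
```

With sort=False the records keep the order in which the keys were supplied.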

Segmentation functions#

These handle the segmentation of data ready for storage in the object store.

openghg.store.assign_data(data_dict, lookup_results, overwrite, data_type)[source]#

Assign data to a Datasource. This will either create a new Datasource or get an existing Datasource for each gas in the file.

Parameters
  • data_dict – Dictionary containing data and metadata for species

  • lookup_results – Dictionary of lookup results

  • overwrite – If True overwrite current data stored

Returns

Dictionary of UUIDs of Datasources data has been assigned to, keyed by species name

Return type

Dict[str, Dict]
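The create-or-get behaviour can be sketched as follows. This is a simplified model rather than the OpenGHG implementation: assign_sketch is a hypothetical function, and it assumes lookup_results maps each species name to an existing Datasource UUID, or to None when no Datasource exists yet.

```python
import uuid
from typing import Any, Dict, Mapping, Optional


def assign_sketch(
    data_dict: Mapping[str, Any], lookup_results: Mapping[str, Optional[str]]
) -> Dict[str, Dict[str, Any]]:
    """Return, per species, the UUID used and whether it was newly created."""
    assigned: Dict[str, Dict[str, Any]] = {}
    for species in data_dict:
        existing = lookup_results.get(species)
        is_new = existing is None
        # Reuse the existing Datasource UUID, or mint a fresh one.
        ds_uuid = str(uuid.uuid4()) if is_new else existing
        assigned[species] = {"uuid": ds_uuid, "new": is_new}
    return assigned


# Placeholder data for illustration only.
results = assign_sketch(
    {"ch4": {"data": [], "metadata": {}}, "co2": {"data": [], "metadata": {}}},
    {"ch4": "existing-uuid-placeholder", "co2": None},
)
```

The result is keyed by species name, matching the documented return value, though the inner dictionary layout here is invented for the example.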