Standardise - metadata

These functions handle the assignment and standardisation of metadata, and ensure that the NetCDF files created during standardisation have the correct attributes assigned to them.

openghg.standardise.meta.assign_attributes(data, site=None, network=None, sampling_period=None, update_mismatch='never', site_filepath=None, species_filepath=None)

Assign attributes to each site and species dataset. This ensures that the xarray Datasets produced are CF 1.7 compliant. Some of the attributes written to the Dataset are also saved as metadata in the Datasource, allowing more detailed searching of the data.

By default, any stored site or species definitions that are needed are read from the openghg/supplementary_data repository.

Parameters:
  • data (Dict) – Dictionary containing data, metadata and attributes

  • site (Optional[str]) – Site code

  • network (Optional[str]) – Network name

  • sampling_period (Union[str, float, int, None]) – Number of seconds for which the air sample is taken. Only used for the time variable attribute.

  • update_mismatch (str) –

    This determines how mismatches between the internal data “attributes” and the supplied / derived “metadata” are handled. The options are:

    • “never” - don’t update mismatches and raise an AttrMismatchError

    • “from_source” / “attributes” - update mismatches based on the input data (e.g. data attributes)

    • “from_definition” / “metadata” - update mismatches based on associated data (e.g. site_info.json)

  • site_filepath (Union[str, Path, None]) – Alternative site info file

  • species_filepath (Union[str, Path, None]) – Alternative species info file

Returns:

Dictionary of combined data with correct attributes assigned to Datasets

Return type:

dict
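The update_mismatch options come in alias pairs. As a rough illustration only, resolving a single conflicting value could look like the sketch below. resolve_mismatch and the local AttrMismatchError class are hypothetical stand-ins rather than OpenGHG's implementation. The sketch assumes values are compared case-insensitively, as described for sync_surface_metadata.

```python
from typing import Any


class AttrMismatchError(ValueError):
    """Hypothetical stand-in for OpenGHG's AttrMismatchError."""


# Each accepted option string mapped to the source that wins a mismatch
_OPTION_ALIASES = {
    "never": "never",
    "from_source": "attributes",
    "attributes": "attributes",
    "from_definition": "metadata",
    "metadata": "metadata",
}


def resolve_mismatch(key: str, attr_value: Any, meta_value: Any, update_mismatch: str = "never") -> Any:
    """Return the value to keep when a data attribute and a metadata value disagree."""
    try:
        winner = _OPTION_ALIASES[update_mismatch]
    except KeyError:
        raise ValueError(f"Unknown update_mismatch option: {update_mismatch!r}") from None

    # Values that differ only in case are treated as matching; the metadata value is kept
    if str(attr_value).lower() == str(meta_value).lower():
        return meta_value

    if winner == "attributes":
        return attr_value  # trust the input data
    if winner == "metadata":
        return meta_value  # trust the stored definition, e.g. site_info.json
    raise AttrMismatchError(
        f"Mismatch for {key!r}: attribute={attr_value!r}, metadata={meta_value!r}"
    )
```

For example, resolve_mismatch("inlet", "10m", "10M") returns "10M" under every option, while a genuine conflict raises AttrMismatchError unless one of the updating options is chosen.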

openghg.standardise.meta.get_attributes(ds, species, site, network=None, global_attributes=None, units=None, scale=None, sampling_period=None, date_range=None, site_filepath=None, species_filepath=None)

Write attributes to an xarray.Dataset so that it conforms to the CF Conventions v1.6.

The attributes of the Dataset are modified and its variable names are changed.

By default, any stored site or species definitions that are needed are read from the openghg/supplementary_data repository.

Variable names derived from the species name are defined using the define_species_label() function.

Parameters:
  • ds (Dataset) – Should contain variables such as “ch4”, “ch4 repeatability”. Must have a “time” dimension.

  • species (str) – Species name. e.g. “CH4”, “HFC-134a”, “dCH4C13”

  • site (str) – Three-letter site code

  • network (Optional[str]) – Network the site is associated with

  • global_attributes – Dictionary containing any info you want to add to the file header (e.g. {“Contact”: “Contact_Name”})

  • units (Optional[str]) – If not specified, this routine will try to guess the units. Options are listed in units_interpret.

  • scale (Optional[str]) – Calibration scale for species.

  • sampling_period (Union[str, float, int, None]) – Number of seconds for which the air sample is taken. Only used for the time variable attribute.

  • date_range (Optional[List[str]]) – Start and end dates for the output. If you only want an end date, supply a very early start date (e.g. [“1900-01-01”, “2010-01-01”]).

  • site_filepath (Union[str, Path, None]) – Alternative site info file

  • species_filepath (Union[str, Path, None]) – Alternative species info file

Return type:

Dataset
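The date_range tip amounts to a start/end filter on the time coordinate. Below is a minimal stdlib sketch in which a plain list of timestamps stands in for the Dataset's “time” dimension. The helper filter_date_range is hypothetical, and whether get_attributes treats the end date as inclusive is an assumption.

```python
from datetime import datetime
from typing import List


def filter_date_range(times: List[datetime], date_range: List[str]) -> List[datetime]:
    """Keep timestamps between the start and end dates (both treated as inclusive here)."""
    start, end = (datetime.fromisoformat(d) for d in date_range)
    return [t for t in times if start <= t <= end]


times = [datetime(1995, 6, 1), datetime(2005, 3, 15), datetime(2012, 1, 1)]

# Only an end date is wanted, so a very early start date is supplied;
# this keeps the 1995 and 2005 timestamps and drops the 2012 one
print(filter_date_range(times, ["1900-01-01", "2010-01-01"]))
```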

openghg.standardise.meta.sync_surface_metadata(metadata, attributes, keys_to_add=None, update_mismatch='never')

Make sure that any keys duplicated between the metadata and attributes dictionaries have matching values, and that certain keys are present in the metadata.

Parameters:
  • metadata (Dict) – Dictionary of metadata

  • attributes (Dict) – Dictionary of attributes

  • keys_to_add (Optional[List]) – Add these keys to the metadata, if not present, based on the attribute values. Note: this skips any keys which can’t be copied from the attribute values.

  • update_mismatch (str) –

    If a case-insensitive mismatch is found between an attribute and a metadata value, this determines the function behaviour. The options are:

    • “never” - don’t update mismatches and raise an AttrMismatchError

    • “from_source” / “attributes” - update mismatches based on the input attributes

    • “from_definition” / “metadata” - update mismatches based on the input metadata

Returns:

Aligned metadata, attributes

Return type:

dict, dict
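The alignment described above can be sketched in plain Python. This is a simplified, hypothetical illustration: sync_metadata_sketch and the local AttrMismatchError class are stand-ins, not OpenGHG code. It assumes values are compared as case-insensitive strings, and the real function may apply additional rules.

```python
from typing import Any, Dict, List, Optional, Tuple


class AttrMismatchError(ValueError):
    """Hypothetical stand-in for OpenGHG's AttrMismatchError."""


def sync_metadata_sketch(
    metadata: Dict[str, Any],
    attributes: Dict[str, Any],
    keys_to_add: Optional[List[str]] = None,
    update_mismatch: str = "never",
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Align duplicated keys and copy selected attribute values into the metadata."""
    if update_mismatch not in ("never", "from_source", "attributes", "from_definition", "metadata"):
        raise ValueError(f"Unknown update_mismatch option: {update_mismatch!r}")

    # Work on copies so the caller's dictionaries are not mutated
    meta = dict(metadata)
    attrs = dict(attributes)

    # Compare every key present in both dictionaries
    for key in meta.keys() & attrs.keys():
        meta_value, attr_value = meta[key], attrs[key]
        if str(meta_value).lower() == str(attr_value).lower():
            continue  # case-insensitive match, nothing to do
        if update_mismatch in ("from_source", "attributes"):
            meta[key] = attr_value  # the input attributes win
        elif update_mismatch in ("from_definition", "metadata"):
            attrs[key] = meta_value  # the input metadata wins
        else:
            raise AttrMismatchError(f"Metadata and attributes disagree for {key!r}")

    # Copy requested keys from the attributes, skipping any the attributes lack
    for key in keys_to_add or []:
        if key not in meta and key in attrs:
            meta[key] = attrs[key]

    return meta, attrs
```

Returning new dictionaries rather than editing the inputs in place keeps the sketch easy to reason about; the documented function returns an aligned (metadata, attributes) pair in the same way.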