Standardise - metadata
These functions handle the assignment and standardisation of metadata and ensure the NetCDF created during standardisation has the correct attributes assigned to it.
- openghg.standardise.meta.assign_attributes(data, site=None, network=None, sampling_period=None, update_mismatch='never', site_filepath=None, species_filepath=None)
Assign attributes to each site and species dataset. This ensures that the xarray Datasets produced are CF 1.7 compliant. Some of the attributes written to the Dataset are saved as metadata to the Datasource allowing more detailed searching of data.
Stored site and species definitions, when needed, are read from the openghg/openghg_defs repository by default.
- Parameters:
data (dict) – Dictionary containing data, metadata and attributes
site (Optional[str]) – Site code
sampling_period (Union[str, int, float, None]) – Number of seconds for which air sample is taken. Only for time variable attribute
network (Optional[str]) – Network name
update_mismatch (str) – This determines how mismatches between the internal data “attributes” and the supplied / derived “metadata” are handled. This includes the options:
“never” - don’t update mismatches and raise an AttrMismatchError
“from_source” / “attributes” - update mismatches based on input data (e.g. data attributes)
“from_definition” / “metadata” - update mismatches based on associated data (e.g. site_info.json)
site_filepath (Union[str, Path, None]) – Alternative site info file
species_filepath (Union[str, Path, None]) – Alternative species info file
- Returns:
Dictionary of combined data with correct attributes assigned to Datasets
- Return type:
dict
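The update_mismatch options listed above come in alias pairs. As a minimal sketch of how such aliases could be normalised before use, the snippet below maps each documented option onto one canonical name. The helper resolve_update_mismatch is hypothetical and written purely for illustration; it is not part of openghg, and the library may handle the aliases differently.

```python
# Hypothetical helper, not part of openghg: maps each documented
# update_mismatch option onto one canonical name.
_ALIASES = {
    "never": "never",
    "from_source": "from_source",
    "attributes": "from_source",
    "from_definition": "from_definition",
    "metadata": "from_definition",
}


def resolve_update_mismatch(option: str) -> str:
    """Return the canonical form of an update_mismatch option."""
    try:
        return _ALIASES[option]
    except KeyError:
        valid = ", ".join(sorted(_ALIASES))
        raise ValueError(f"Unknown option {option!r}; expected one of: {valid}") from None


print(resolve_update_mismatch("attributes"))
print(resolve_update_mismatch("metadata"))
```

Rejecting unknown strings early, as this sketch does, surfaces a mistyped option before any data is touched.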
- openghg.standardise.meta.get_attributes(ds, species, site, network=None, global_attributes=None, units=None, scale=None, sampling_period=None, date_range=None, site_filepath=None, species_filepath=None)
This function writes attributes to an xarray.Dataset so that they conform with the CF Convention v1.6.
The attributes of the xarray Dataset are modified and variable names are changed.
Stored site and species definitions, when needed, are read from the openghg/openghg_defs repository by default.
Variable names derived from the species name are defined using the define_species_label() function.
- Parameters:
ds (Dataset) – Should contain variables such as “ch4”, “ch4 repeatability”. Must have a “time” dimension.
species (str) – Species name, e.g. “CH4”, “HFC-134a”, “dCH4C13”
site (str) – Three-letter site code
network (Optional[str]) – Network site is associated with
global_attributes – Dictionary containing any info you want to add to the file header (e.g. {“Contact”: “Contact_Name”})
units (Optional[str]) – This routine will try to guess the units unless this is specified. Options are in units_interpret
scale (Optional[str]) – Calibration scale for species.
sampling_period (Union[str, int, float, None]) – Number of seconds for which air sample is taken. Only for time variable attribute
date_range (Optional[list[str]]) – Start and end date for output. If you only want an end date, just put a very early start date (e.g. [“1900-01-01”, “2010-01-01”])
site_filepath (Union[str, Path, None]) – Alternative site info file
species_filepath (Union[str, Path, None]) – Alternative species info file
- Return type:
Dataset
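The date_range convention described above, where a very early start date stands in for "no lower bound", can be illustrated with plain Python. This is a sketch only: within_date_range is a hypothetical helper, the timestamps are made up, and treating both ends of the range as inclusive is an assumption rather than documented library behaviour.

```python
from datetime import datetime


def within_date_range(timestamps, date_range):
    """Keep timestamps between the start and end of date_range (both inclusive here)."""
    start, end = (datetime.fromisoformat(d) for d in date_range)
    return [t for t in timestamps if start <= t <= end]


# Illustrative timestamps, one either side of the chosen end date.
times = [datetime(2009, 6, 1), datetime(2010, 6, 1)]

# A very early start date effectively leaves only the end date as a constraint.
selected = within_date_range(times, ["1900-01-01", "2010-01-01"])
print(selected)
```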
- openghg.standardise.meta.sync_surface_metadata(metadata, attributes, keys_to_add=None, update_mismatch='never')
Makes sure any duplicated keys between the metadata and attributes dictionaries match and that certain keys are present in the metadata.
- Parameters:
metadata (dict) – Dictionary of metadata
attributes (dict) – Attributes
keys_to_add (Optional[list]) – Add these keys to the metadata, if not present, based on the attribute values. Note: this skips any keys which can’t be copied from the attribute values.
update_mismatch (str) – If a case-insensitive mismatch is found between an attribute and a metadata value, this determines the function behaviour. This includes the options:
“never” - don’t update mismatches and raise an AttrMismatchError
“from_source” / “attributes” - update mismatches based on input attributes
“from_definition” / “metadata” - update mismatches based on input metadata
- Returns:
Aligned metadata, attributes
- Return type:
dict, dict
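The behaviour described for sync_surface_metadata can be sketched in plain Python. The code below is not the openghg implementation: sync_sketch is a hypothetical function, AttrMismatchError is a local stand-in for the error named above, the example values are invented, and "can't be copied" is interpreted here as "the key is absent from the attributes", which is an assumption.

```python
class AttrMismatchError(Exception):
    """Local stand-in for the error named in the documentation."""


def sync_sketch(metadata, attributes, keys_to_add=None, update_mismatch="never"):
    """Illustrative only: align keys shared by metadata and attributes."""
    meta, attrs = dict(metadata), dict(attributes)  # work on copies

    for key in meta.keys() & attrs.keys():
        # Compare as lower-case strings so "TAC" and "tac" count as equal.
        if str(meta[key]).lower() == str(attrs[key]).lower():
            continue
        if update_mismatch == "never":
            raise AttrMismatchError(f"{key}: {meta[key]!r} != {attrs[key]!r}")
        if update_mismatch in ("from_source", "attributes"):
            meta[key] = attrs[key]
        elif update_mismatch in ("from_definition", "metadata"):
            attrs[key] = meta[key]
        else:
            raise ValueError(f"Unknown update_mismatch option: {update_mismatch!r}")

    # Copy requested keys from the attributes, skipping any that are absent.
    for key in keys_to_add or []:
        if key not in meta and key in attrs:
            meta[key] = attrs[key]

    return meta, attrs


# Invented example values: "inlet" disagrees, "site" differs only by case.
meta, attrs = sync_sketch(
    {"site": "TAC", "inlet": "100m"},
    {"site": "tac", "inlet": "185m", "station_longitude": 1.14},
    keys_to_add=["station_longitude", "station_height_masl"],
    update_mismatch="from_source",
)
print(meta)
print(attrs)
```

With the default update_mismatch="never", the same inputs would raise the error instead, because the "inlet" values differ by more than case.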