Standardisation functions#

Functions that accept data in specific formats, standardise it to a CF-compliant format and ensure it has the correct metadata attached. The data returned from these functions is then stored in the object store.

Measurement Standardisation#

These functions cover the four types of measurement we currently support.

Surface measurements#

openghg.standardise.standardise_surface(filepaths, source_format, site, network, inlet=None, instrument=None, sampling_period=None, overwrite=False)[source]#

Standardise surface measurements and store the data in the object store.

Parameters
  • filepaths (Union[str, Path, List, Tuple]) – Path of file(s) to process

  • source_format (str) – Format of data, e.g. GCWERKS, CRDS, ICOS

  • site (str) – Site code

  • network (str) – Network name

  • inlet (Optional[str]) – Inlet height in metres

  • instrument (Optional[str]) – Instrument name

  • sampling_period (Optional[str]) – Sampling period as pandas time code, e.g. 1m for 1 minute, 1h for 1 hour

  • overwrite (bool) – Overwrite data currently present in the object store

Returns

Dictionary containing confirmation of standardisation process.

Return type

dict
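A minimal sketch of a call, assuming the openghg package is installed and an object store is configured. The file path, site code, network name and inlet string below are hypothetical placeholders and are not values confirmed by this page. The import is deferred so that the argument-building part can be run on its own.

```python
from pathlib import Path
from typing import Any, Dict


def surface_arguments() -> Dict[str, Any]:
    """Keyword arguments for standardise_surface (all values are placeholders)."""
    return {
        "filepaths": [Path("data/example_site.crds.dat")],  # hypothetical file
        "source_format": "CRDS",  # one of the formats named in the parameter list
        "site": "abc",  # hypothetical site code
        "network": "EXAMPLE",  # hypothetical network name
        "inlet": "100m",  # assumed way of writing an inlet height in metres
        "sampling_period": "1m",  # 1 minute, as in the parameter description
        "overwrite": False,  # leave data already in the object store untouched
    }


def standardise_example_surface_data() -> dict:
    """Run the standardisation; requires openghg and a configured object store."""
    # Deferred import: only needed when the data is actually standardised.
    from openghg.standardise import standardise_surface

    return standardise_surface(**surface_arguments())
```

Keeping the arguments in a plain dictionary makes it easy to check them against the signature above before any data is written to the object store.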

Boundary Conditions#

openghg.standardise.standardise_bc(filepath, species, bc_input, domain, period=None, continuous=True, overwrite=False)[source]#

Standardise boundary condition data and store it in the object store.

Parameters
  • filepath (Union[str, Path]) – Path of boundary conditions file

  • species (str) – Species name

  • bc_input (str) – Input used to create boundary conditions. For example:

      - a model name such as “MOZART” or “CAMS”

      - a description such as “UniformAGAGE” (uniform values based on AGAGE average)

  • domain (str) – Region for boundary conditions

  • period (Union[str, tuple, None]) – Period of measurements. If not passed, this is inferred from the time coordinates

  • overwrite (bool) – Should this data overwrite currently stored data.

Returns

Dictionary containing confirmation of standardisation process.

Return type

dict
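As a hedged illustration, the sketch below passes only the required arguments and leaves period unset, so that it is inferred from the time coordinates as described above. The species and domain values are hypothetical placeholders; "CAMS" is taken from the bc_input description. It assumes openghg is installed and an object store is configured.

```python
from pathlib import Path
from typing import Any, Dict


def boundary_condition_arguments(filepath: Path) -> Dict[str, Any]:
    """Keyword arguments for standardise_bc (species and domain are placeholders)."""
    return {
        "filepath": filepath,
        "species": "ch4",  # hypothetical species name
        "bc_input": "CAMS",  # a model name, as in the parameter description
        "domain": "EUROPE",  # hypothetical domain name
        # "period" is deliberately omitted so it is inferred from the file
        "overwrite": False,
    }


def standardise_example_bc(filepath: Path) -> dict:
    """Run the standardisation; requires openghg and a configured object store."""
    # Deferred import so the argument builder works without openghg installed.
    from openghg.standardise import standardise_bc

    return standardise_bc(**boundary_condition_arguments(filepath))
```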

Emissions / Flux#

openghg.standardise.standardise_flux(filepath, species, source, domain, date=None, high_time_resolution=False, period=None, continuous=True, overwrite=False)[source]#

Standardise flux (emissions) data and store it in the object store.

Parameters
  • filepath (Union[str, Path]) – Path of emissions file

  • species (str) – Species name

  • domain (str) – Emissions domain

  • source (str) – Emissions source

  • high_time_resolution (Optional[bool]) – If this is a high resolution file

  • period (Union[str, tuple, None]) – Period of measurements. If not passed, this is inferred from the time coordinates

  • overwrite (bool) – Should this data overwrite currently stored data.

Returns

Dictionary of the UUIDs of the Datasources the data has been assigned to

Return type

dict
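Because standardise_flux takes a single filepath, several files have to be processed one call at a time. The helper below is a sketch of one way to do that; standardise_each is a hypothetical name, not part of openghg. It accepts the standardisation function as an argument, so it can be tried with a stand-in function before touching a real object store.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def standardise_each(
    filepaths: Iterable[Path],
    standardise: Callable[..., dict],
    **common: Any,
) -> Dict[str, dict]:
    """Call `standardise` once per file and collect the results by file name."""
    results: Dict[str, dict] = {}
    for filepath in filepaths:
        # Every call shares the same keyword arguments apart from the file path.
        results[Path(filepath).name] = standardise(filepath=filepath, **common)
    return results


# Possible usage with openghg installed (argument values are placeholders):
#
#   from openghg.standardise import standardise_flux
#   results = standardise_each(
#       [Path("flux_2012.nc"), Path("flux_2013.nc")],
#       standardise_flux,
#       species="co2",
#       source="example-source",
#       domain="EUROPE",
#   )
```

Results are keyed by file name, so two files with the same name in different directories would overwrite each other in the returned dictionary.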

Footprints#

openghg.standardise.standardise_footprint(filepath, site, height, domain, model, metmodel=None, species=None, network=None, period=None, continuous=True, retrieve_met=False, high_spatial_res=False, high_time_res=False, overwrite=False)[source]#

Reads footprint data files and returns the UUIDs of the Datasources the processed data has been assigned to

Parameters
  • filepath (Union[str, Path]) – Path of file to load

  • site (str) – Site name

  • network (Optional[str]) – Network name

  • height (str) – Height above ground level in metres

  • domain (str) – Domain of footprints

  • model_params – Model run parameters

  • retrieve_met (bool) – Whether to also download meteorological data for this footprint's area

  • high_spatial_res (bool) – Indicate footprints include both a low and high spatial resolution.

  • high_time_res (bool) – Indicate footprints are high time resolution (include H_back dimension). Note that this will be set to True automatically for carbon dioxide data.

  • overwrite (bool) – Overwrite any currently stored data

Returns

Dictionary containing confirmation of standardisation process. None if file already processed.

Return type

dict / None
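Since the return value is either a dictionary or None, calling code needs to handle both cases. The sketch below shows one way to do so; summarise_footprint_result is a hypothetical helper, and the site, height, domain and model values are placeholders rather than values confirmed by this page. It assumes openghg is installed and an object store is configured.

```python
from typing import Optional


def summarise_footprint_result(result: Optional[dict]) -> str:
    """Turn the return value of standardise_footprint into a short message."""
    if result is None:
        # Documented behaviour: None means the file had already been processed.
        return "File already processed; nothing new was stored."
    return f"Standardised footprint data; {len(result)} top-level entries returned."


def standardise_example_footprint(filepath: str) -> str:
    """Run the standardisation; requires openghg and a configured object store."""
    # Deferred import so the summary helper works without openghg installed.
    from openghg.standardise import standardise_footprint

    result = standardise_footprint(
        filepath=filepath,
        site="abc",  # hypothetical site name
        height="100m",  # assumed way of writing a height in metres
        domain="EUROPE",  # hypothetical domain name
        model="NAME",  # hypothetical model name
    )
    return summarise_footprint_result(result)
```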

Behind the scenes, these functions use parsing functions written specifically for each data type. Please see the Developer API for details of these functions.

Metadata#

openghg.standardise.meta.assign_attributes(data, site=None, network=None, sampling_period=None)[source]#

Assign attributes to each site and species dataset. This ensures that the xarray Datasets produced are CF 1.7 compliant. Some of the attributes written to the Dataset are saved as metadata to the Datasource allowing more detailed searching of data.

Parameters
  • data (Dict) – Dictionary containing data, metadata and attributes

  • site (Optional[str]) – Site code

  • sampling_period (Union[str, float, int, None]) – Number of seconds over which each air sample is taken. Used only for the time variable attribute

  • network (Optional[str]) – Network name

Returns

Dictionary of combined data with correct attributes assigned to Datasets

Return type

dict

openghg.standardise.meta.get_attributes(ds, species, site, network=None, global_attributes=None, units=None, scale=None, sampling_period=None, date_range=None)[source]#

This function writes attributes to an xarray.Dataset so that they conform to the CF Convention v1.6.

Attributes of the xarray Dataset are modified, and variable names are changed.

Variable names related to the species name are defined using the define_species_label() function.

Parameters
  • ds (Dataset) – Should contain variables such as “ch4”, “ch4 repeatability”. Must have a “time” dimension.

  • species (str) – Species name. e.g. “CH4”, “HFC-134a”, “dCH4C13”

  • site (str) – Three-letter site code

  • network (Optional[str]) – Network site is associated with

  • global_attributes – Dictionary containing any info you want to add to the file header (e.g. {“Contact”: “Contact_Name”})

  • units (Optional[str]) – This routine will try to guess the units unless this is specified. Options are in units_interpret

  • scale (Optional[str]) – Calibration scale for species.

  • sampling_period (Union[str, float, int, None]) – Number of seconds over which each air sample is taken. Used only for the time variable attribute

  • date_range (Optional[List[str]]) – Start and end date for output. If you only want an end date, just put a very early start date (e.g. [“1900-01-01”, “2010-01-01”])

Return type

Dataset
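The date_range and global_attributes parameters can be illustrated with a short sketch. The helper names and the site code are hypothetical; the "Contact" entry and the early start date follow the examples in the parameter list. Calling get_attributes itself assumes openghg is installed and that ds is an xarray.Dataset with a “time” dimension.

```python
from typing import Any, Dict, List


def open_ended_date_range(end: str, early_start: str = "1900-01-01") -> List[str]:
    """Build a date_range that in effect only sets an end date."""
    return [early_start, end]


def attribute_arguments() -> Dict[str, Any]:
    """Keyword arguments for get_attributes, excluding the Dataset itself."""
    return {
        "species": "CH4",
        "site": "abc",  # hypothetical three-letter site code
        "global_attributes": {"Contact": "Contact_Name"},
        "date_range": open_ended_date_range("2010-01-01"),
    }


def add_attributes(ds):
    """Apply get_attributes to an xarray.Dataset that has a "time" dimension."""
    # Deferred import so the helpers above work without openghg installed.
    from openghg.standardise.meta import get_attributes

    return get_attributes(ds=ds, **attribute_arguments())
```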