Standardise#

Functions that accept data in specific formats, standardise it to a CF-compliant format and ensure it has the correct metadata attached. The data returned from these functions is then stored in the object store.

Measurement Standardisation#

These functions cover the four types of measurement we currently support.

Surface measurements#

openghg.standardise.standardise_surface(filepaths, source_format, site, network, inlet=None, instrument=None, sampling_period=None, calibration_scale=None, overwrite=False)[source]#

Standardise surface measurements and store the data in the object store.

Parameters:
  • filepaths (Union[str, Path, List, Tuple]) – Path of file(s) to process

  • source_format (str) – Format of data e.g. GCWERKS, CRDS, ICOS

  • site (str) – Site code

  • network (str) – Network name

  • inlet (Optional[str]) – Inlet height in metres

  • instrument (Optional[str]) – Instrument name

  • sampling_period (Optional[str]) – Sampling period as pandas time code, e.g. 1m for 1 minute, 1h for 1 hour

  • calibration_scale (Optional[str]) – Calibration scale for the data

  • overwrite (bool) – Overwrite data currently present in the object store

Returns:

Dictionary containing confirmation of standardisation process.

Return type:

dict
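
As a minimal sketch of how a call might be assembled, the snippet below collects the documented arguments in a dictionary and passes them to standardise_surface. The file path, site code and network name are hypothetical placeholders, and the helper names surface_kwargs and run_standardise_surface are not part of OpenGHG. The call itself assumes OpenGHG is installed and an object store is configured.

```python
from pathlib import Path


def surface_kwargs(filepath: str) -> dict:
    """Collect keyword arguments for standardise_surface (illustrative values)."""
    return {
        "filepaths": Path(filepath),
        "source_format": "CRDS",
        "site": "tac",            # hypothetical site code
        "network": "decc",        # hypothetical network name
        "sampling_period": "1h",  # 1 hour, as in the parameter description
        "overwrite": False,
    }


def run_standardise_surface(filepath: str) -> dict:
    """Standardise one surface file; requires OpenGHG and an object store."""
    # Imported here so the helper above can be used without OpenGHG installed.
    from openghg.standardise import standardise_surface

    return standardise_surface(**surface_kwargs(filepath))
```

Keeping the arguments in one dictionary makes it easy to inspect or log exactly what metadata will be attached before anything is written to the object store.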

Boundary Conditions#

openghg.standardise.standardise_bc(filepath, species, bc_input, domain, period=None, continuous=True, overwrite=False)[source]#

Standardise boundary condition data and store it in the object store.

Parameters:
  • filepath (Union[str, Path]) – Path of boundary conditions file

  • species (str) – Species name

  • bc_input (str) – Input used to create boundary conditions. For example:
      - a model name such as “MOZART” or “CAMS”
      - a description such as “UniformAGAGE” (uniform values based on AGAGE average)

  • domain (str) – Region for boundary conditions

  • period (Union[str, tuple, None]) – Period of measurements, if not passed this is inferred from the time coords

  • continuous (bool) – Whether time stamps have to be continuous.

  • overwrite (bool) – Should this data overwrite currently stored data.

Returns:

Dictionary containing confirmation of standardisation process.

Return type:

dict
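
The sketch below shows one way the arguments could be prepared so that period is only passed when it is known, leaving standardise_bc to infer it from the time coords otherwise. The helper names bc_arguments and run_standardise_bc are hypothetical, as are any example values for species, domain and file path. The call assumes OpenGHG is installed and an object store is configured.

```python
from pathlib import Path
from typing import Optional, Union


def bc_arguments(
    filepath: Union[str, Path],
    species: str,
    bc_input: str,
    domain: str,
    period: Optional[Union[str, tuple]] = None,
) -> dict:
    """Build keyword arguments for standardise_bc, omitting period if unknown."""
    kwargs = {
        "filepath": filepath,
        "species": species,
        "bc_input": bc_input,  # e.g. a model name such as "CAMS"
        "domain": domain,
    }
    if period is not None:
        # Only pass period explicitly; otherwise it is inferred from the data.
        kwargs["period"] = period
    return kwargs


def run_standardise_bc(**kwargs) -> dict:
    """Call standardise_bc; requires OpenGHG and an object store."""
    from openghg.standardise import standardise_bc

    return standardise_bc(**kwargs)
```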

Emissions / Flux#

openghg.standardise.standardise_flux(filepath, species, source, domain, high_time_resolution=False, period=None, chunks=None, continuous=True, overwrite=False)[source]#

Process flux data

Parameters:
  • filepath (Union[str, Path]) – Path of emissions file

  • species (str) – Species name

  • source (str) – Emissions source

  • domain (str) – Emissions domain

  • date – Date associated with the emissions, as a string e.g. “2012” or “201206”. Only needed if this cannot be inferred from the time coords

  • high_time_resolution (Optional[bool]) – If this is a high resolution file

  • period (Union[str, tuple, None]) – Period of measurements, if not passed this is inferred from the time coords

  • continuous (bool) – Whether time stamps have to be continuous.

  • overwrite (bool) – Should this data overwrite currently stored data.

Returns:

Dictionary of the Datasource UUIDs the data has been assigned to

Return type:

dict
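
Flux data often arrives as several files that share the same species, source and domain. The following sketch loops over a list of files and applies a standardisation function to each one. The helper standardise_many is hypothetical and not part of OpenGHG; the function to call is passed in as an argument, so the loop can be tried with a stand-in before running it against a real object store.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Union


def standardise_many(
    filepaths: Iterable[Union[str, Path]],
    standardise_fn: Callable[..., dict],
    **common_kwargs,
) -> Dict[str, dict]:
    """Apply standardise_fn to each file, keyed by file name.

    common_kwargs are passed unchanged to every call, for example
    species, source and domain for standardise_flux.
    """
    results: Dict[str, dict] = {}
    for filepath in filepaths:
        path = Path(filepath)
        results[path.name] = standardise_fn(filepath=path, **common_kwargs)
    return results
```

With OpenGHG installed, standardise_flux could be passed as standardise_fn together with species, source and domain keyword arguments; the file names and argument values would depend on your own data.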

Footprints#

openghg.standardise.standardise_footprint(filepath, site, domain, model, inlet=None, height=None, metmodel=None, species=None, network=None, period=None, chunks='auto', continuous=True, retrieve_met=False, high_spatial_res=False, high_time_res=False, overwrite=False)[source]#

Reads footprint data files and returns the UUIDs of the Datasources the processed data has been assigned to

Parameters:
  • filepath (Union[str, Path]) – Path of file to load

  • site (str) – Site name

  • domain (str) – Domain of footprints

  • model (str) – Model used to create footprint (e.g. NAME or FLEXPART)

  • inlet (Optional[str]) – Height above ground level in metres. Format ‘NUMUNIT’ e.g. “10m”

  • height (Optional[str]) – Alias for inlet. One of height or inlet must be included.

  • metmodel (Optional[str]) – Underlying meteorological model used (e.g. UKV)

  • species (Optional[str]) – Species name. Only needed if footprint is for a specific species e.g. co2 (and not inert)

  • network (Optional[str]) – Network name

  • period (Union[str, tuple, None]) – Period of measurements. Only needed if this cannot be inferred from the time coords

  • chunks (Union[int, Dict, Literal['auto'], None]) – Chunk size to use when opening the NetCDF. Set to “auto” for automated chunk sizing

  • continuous (bool) – Whether time stamps have to be continuous.

  • retrieve_met (bool) – Whether to also download meteorological data for this footprint's area

  • high_spatial_res (bool) – Indicate footprints include both a low and high spatial resolution.

  • high_time_res (bool) – Indicate footprints are high time resolution (include H_back dimension). Note this will be set to True automatically for Carbon Dioxide data.

  • overwrite (bool) – Overwrite any currently stored data

Returns:

Dictionary containing confirmation of standardisation process. None if file already processed.

Return type:

dict / None
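
Two details of this function are worth handling on the calling side: one of inlet or height must be supplied, and the return value is None when the file has already been processed. The sketch below illustrates both. The helpers resolve_inlet, describe_result and run_standardise_footprint are hypothetical and do not reproduce OpenGHG's own validation; the call assumes OpenGHG is installed and an object store is configured.

```python
from typing import Optional


def resolve_inlet(inlet: Optional[str] = None, height: Optional[str] = None) -> str:
    """Return the inlet value, accepting height as an alias."""
    value = inlet if inlet is not None else height
    if value is None:
        raise ValueError("One of inlet or height must be included")
    return value


def describe_result(result: Optional[dict]) -> str:
    """Summarise the return value, which is None for already processed files."""
    if result is None:
        return "file already processed"
    return "standardised: " + ", ".join(sorted(str(key) for key in result))


def run_standardise_footprint(filepath, site, domain, model, inlet=None, height=None) -> str:
    """Standardise one footprint file; requires OpenGHG and an object store."""
    from openghg.standardise import standardise_footprint

    result = standardise_footprint(
        filepath=filepath,
        site=site,
        domain=domain,
        model=model,  # e.g. "NAME" or "FLEXPART"
        inlet=resolve_inlet(inlet, height),  # format NUMUNIT, e.g. "10m"
    )
    return describe_result(result)
```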

Helpers#

Some of the functions above require quite specific arguments, as we must ensure all metadata attributed to the data is as correct as possible. These functions help you find the correct arguments in each case.

Behind the scenes these functions use parsing functions that are written specifically for each data type. Please see the Developer API for these functions.