Retrieval functions

These functions retrieve data from the object store.

Search functions

OpenGHG provides a number of search functions, most of them customised to a data type, to make it easier for users to find the data they require in the object store.

To search for surface observations we recommend the use of search_surface.

openghg.retrieve.search_surface(species=None, site=None, inlet=None, instrument=None, measurement_type=None, source_format=None, network=None, start_date=None, end_date=None, data_source=None, sampling_height=None, icos_data_level=None)

Cloud object store search

  • species (Union[str, List[str], None]) – Species

  • site (Union[str, List[str], None]) – Three-letter site code

  • inlet (Union[str, List[str], None]) – Inlet height

  • instrument (Union[str, List[str], None]) – Instrument name

  • measurement_type (Union[str, List[str], None]) – Measurement type

  • data_type – Data type, e.g. “surface”, “column”, “emissions”.

  • start_date (Union[str, List[str], None]) – Start date

  • end_date (Union[str, List[str], None]) – End date

  • data_source (Optional[str]) – Source of data, e.g. noaa_obspack, icoscp, ceda_archive. This argument only needs to be used to narrow the search to data solely from these sources.

  • sampling_height (Optional[str]) – Sampling height of measurements

  • icos_data_level (Optional[int]) – ICOS data level, see ICOS documentation


SearchResults object

Return type

SearchResults
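A minimal usage sketch for search_surface, assuming OpenGHG is installed and the object store already contains data. The helpers `surface_query` and `run_surface_search` are hypothetical names introduced here and are not part of OpenGHG; the date is an illustrative value. Because most arguments accept either a string or a list of strings, the species filter is given as a list.

```python
from typing import Any, Dict


def surface_query(**filters: Any) -> Dict[str, Any]:
    """Hypothetical helper: drop filters that were left as None."""
    return {name: value for name, value in filters.items() if value is not None}


def run_surface_search(query: Dict[str, Any]):
    """Pass the query to search_surface.

    The import is done lazily because it needs OpenGHG installed and an
    object store that already contains data.
    """
    from openghg.retrieve import search_surface

    return search_surface(**query)


# Methane and carbon dioxide at Mace Head; end_date is left unset.
query = surface_query(
    species=["ch4", "co2"],
    site="MHD",
    start_date="2020-01-01",  # illustrative date, not from the documentation
    end_date=None,
)
print(query)

# With a populated object store available:
# results = run_surface_search(query)
```

Only the filters that are set reach the search, so the same helper can be reused as more filters (inlet, instrument, network) become known.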


For a more general search you can use the search function directly. This function accepts any number of keyword arguments.

openghg.retrieve.search(**kwargs)

Search for observations data. Any keyword arguments may be passed to the function and these keywords will be used to search the metadata associated with each Datasource.

This function detects the running environment and routes the call to either the cloud or local search function.

Example / commonly used arguments are given below.

  • species – Terms to search for in Datasources

  • locations – Where to search for the terms in species

  • inlet – Inlet height such as 100m

  • instrument – Instrument name such as picarro

  • find_all – Require all search terms to be satisfied

  • start_date – Start datetime for search. If None, a start datetime of the UNIX epoch is set.

  • end_date – End datetime for search. If None, an end datetime of the current datetime is set.


SearchResults object if results found, otherwise None

Return type

SearchResults or None
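To illustrate what searching metadata by arbitrary keywords means, here is a toy in-memory model. It is a sketch only and not how OpenGHG implements the search: the functions `matches` and `toy_search`, the metadata records, and the case-insensitive comparison are all assumptions made for the example.

```python
from typing import Any, Dict, List


def matches(metadata: Dict[str, Any], **terms: Any) -> bool:
    """Return True if every search term equals the stored metadata value.

    The comparison is case-insensitive on the string form. This is an
    assumption for the sketch, not documented OpenGHG behaviour.
    """
    return all(
        str(metadata.get(key, "")).lower() == str(value).lower()
        for key, value in terms.items()
    )


def toy_search(store: List[Dict[str, Any]], **terms: Any) -> List[Dict[str, Any]]:
    """Filter a list of metadata dictionaries by arbitrary keyword terms."""
    return [metadata for metadata in store if matches(metadata, **terms)]


# Made-up metadata records standing in for Datasources.
store = [
    {"species": "ch4", "site": "mhd", "inlet": "10m", "instrument": "picarro"},
    {"species": "co2", "site": "mhd", "inlet": "10m", "instrument": "picarro"},
    {"species": "ch4", "site": "dji", "inlet": "100m", "instrument": "picarro"},
]

found = toy_search(store, species="ch4", inlet="100m")
print(found)

# The equivalent call against a real object store would look like:
# from openghg.retrieve import search
# results = search(species="ch4", inlet="100m")
```

Any key can be used as a search term, which mirrors the way the search function accepts any keyword argument rather than a fixed parameter list.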

Specific retrieval functions

openghg.retrieve.get_obs_surface(site, species, inlet=None, start_date=None, end_date=None, average=None, network=None, instrument=None, calibration_scale=None, keep_missing=False, skip_ranking=False)

This is the equivalent of the get_obs function from the ACRG repository.

Usage and return values are the same whilst implementation may differ.

  • site (str) – Site of interest e.g. MHD for the Mace Head site.

  • species (str) – Species identifier e.g. ch4 for methane.

  • start_date (Union[str, Timestamp, None]) – Output start date in a format that Pandas can interpret

  • end_date (Union[str, Timestamp, None]) – Output end date in a format that Pandas can interpret

  • inlet (Optional[str]) – Inlet label

  • average (Optional[str]) – Averaging period for each dataset. Each value should be a string of the form e.g. "2H", "30min" (should match pandas offset aliases format).

  • keep_missing (bool) – Keep missing data points or drop them.

  • network (Optional[str]) – Network for the site/instrument (must match number of sites).

  • instrument (Optional[str]) – Specific instrument for the site (must match number of sites).

  • calibration_scale (Optional[str]) – Convert to this calibration scale


ObsData object if data found, else None

Return type

ObsData or None
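Because get_obs_surface returns None when nothing is found, callers may want to fail early instead of passing None along. The sketch below shows one way to do that. `require_data` and `load_mace_head_methane` are hypothetical helpers, the dates are illustrative, and the real call assumes OpenGHG is installed with a populated object store.

```python
from typing import Any, Callable


def require_data(fetch: Callable[..., Any], **kwargs: Any) -> Any:
    """Call a retrieval function and raise if it returns None."""
    result = fetch(**kwargs)
    if result is None:
        raise LookupError(f"No data found for {kwargs}")
    return result


def load_mace_head_methane() -> Any:
    """Retrieve two-hourly averaged methane at Mace Head.

    Needs OpenGHG and a populated object store, so the import is lazy.
    """
    from openghg.retrieve import get_obs_surface

    return require_data(
        get_obs_surface,
        site="MHD",
        species="ch4",
        start_date="2020-01-01",  # illustrative dates
        end_date="2020-02-01",
        average="2H",
    )


# Stand-in retrieval function used here so the sketch runs without OpenGHG.
def empty_store(**kwargs: Any) -> None:
    return None


try:
    require_data(empty_store, site="MHD", species="ch4")
except LookupError as error:
    print(error)
```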

openghg.retrieve.get_flux(species, source, domain, start_date=None, end_date=None, time_resolution=None)

The flux function reads in all flux files for the domain and species as an xarray Dataset. Note that at present ALL flux data is read in per species per domain or by emissions name. To be consistent with the footprints, fluxes should be in mol/m2/s.

  • species (str) – Species name

  • source (str) – Source name

  • domain (str) – Domain e.g. EUROPE

  • start_date (Optional[Timestamp]) – Start date

  • end_date (Optional[Timestamp]) – End date

  • time_resolution (Optional[str]) – One of [“standard”, “high”]


FluxData object

Return type

FluxData
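Since fluxes should be in mol/m2/s to be consistent with the footprints, flux data supplied in mass units needs converting first. The sketch below assumes the input is in kg/m2/s, which is only an example unit; `kg_to_mol_flux` is a hypothetical helper and not part of OpenGHG. The molar masses are standard rounded values.

```python
# Molar masses in g/mol (standard values, rounded).
MOLAR_MASS_G_PER_MOL = {"ch4": 16.043, "co2": 44.009}


def kg_to_mol_flux(flux_kg_m2_s: float, species: str) -> float:
    """Convert a flux from kg/m2/s to mol/m2/s."""
    grams_per_m2_s = flux_kg_m2_s * 1000.0
    return grams_per_m2_s / MOLAR_MASS_G_PER_MOL[species.lower()]


# 1e-9 kg/m2/s of methane expressed in mol/m2/s.
methane_flux = kg_to_mol_flux(1.0e-9, "CH4")
print(f"{methane_flux:.3e} mol/m2/s")
```

Whether a given flux file needs this step depends on how it was produced, so check the units recorded in the file before converting.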


openghg.retrieve.get_footprint(site, domain, height, model=None, start_date=None, end_date=None, species=None)

Get footprints from one site.

  • site (str) – The name of the site given in the footprints. This often matches the site name, but it can differ if footprints for the same site are run with a different met and are named slightly differently from the obs file. E.g. site=“DJI”, site_modifier=“DJI-SAM”: station called DJI, footprints site called DJI-SAM.

  • domain (str) – Domain name for the footprints

  • height (str) – Height of inlet in metres

  • start_date (Optional[Timestamp]) – Output start date in a format that Pandas can interpret

  • end_date (Optional[Timestamp]) – Output end date in a format that Pandas can interpret

  • species (Optional[str]) – Species identifier e.g. “co2” for carbon dioxide. Only needed if the species requires footprints modified from the typical 30-day footprints appropriate for a long-lived species (like methane), e.g. for high time resolution (co2) or for a short-lived species.


FootprintData dataclass

Return type

FootprintData


openghg.retrieve.get_bc(species, domain, bc_input=None, start_date=None, end_date=None)

Get boundary conditions for a given species, domain and bc_input name.

  • species (str) – Species name

  • bc_input (Optional[str]) – Input used to create boundary conditions. For example, a model name such as “MOZART” or “CAMS”, or a description such as “UniformAGAGE” (uniform values based on AGAGE average).

  • domain (str) – Region for boundary conditions e.g. EUROPE

  • start_date (Optional[Timestamp]) – Start date

  • end_date (Optional[Timestamp]) – End date


BoundaryConditionsData object

Return type

BoundaryConditionsData
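The retrieval functions share several arguments (site, species, domain), so it can help to keep them in one place and derive the arguments for each call from that. The sketch below is one possible arrangement, not an OpenGHG feature: `Scenario` and `retrieve_all` are hypothetical, and the height and source values are illustrative placeholders. The actual retrieval assumes OpenGHG is installed with a populated object store.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for settings shared by the retrieval calls."""

    site: str
    species: str
    domain: str
    height: str
    source: str
    bc_input: Optional[str] = None

    def call_arguments(self) -> Dict[str, Dict[str, Any]]:
        """Map each retrieval function name to its keyword arguments."""
        return {
            "get_obs_surface": {"site": self.site, "species": self.species},
            "get_footprint": {
                "site": self.site,
                "domain": self.domain,
                "height": self.height,
            },
            "get_flux": {
                "species": self.species,
                "source": self.source,
                "domain": self.domain,
            },
            "get_bc": {
                "species": self.species,
                "domain": self.domain,
                "bc_input": self.bc_input,
            },
        }


def retrieve_all(scenario: Scenario) -> Dict[str, Any]:
    """Call each retrieval function; needs OpenGHG and a populated store."""
    import openghg.retrieve as retrieve

    return {
        name: getattr(retrieve, name)(**kwargs)
        for name, kwargs in scenario.call_arguments().items()
    }


scenario = Scenario(
    site="MHD",
    species="ch4",
    domain="EUROPE",
    height="10m",     # illustrative placeholder
    source="anthro",  # illustrative placeholder
    bc_input="CAMS",
)
arguments = scenario.call_arguments()
print(arguments["get_bc"])
```

Deriving every call from a single object keeps the species and domain consistent across observations, footprints, fluxes and boundary conditions.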