Developer API#
The functions and methods documented in this section are the internal workings of the OpenGHG library. They are subject to change without warning due to the early stages of development of the project.
Warning
Normal users should not use any of the functions shown here directly as they may be removed or their functionality may change.
Standardisation#
Surface measurements#
These functions take surface measurement data and standardise it for storage in the object store. They ensure the correct metadata and attributes
are recorded with the data, and that the data is CF compliant. They are called by the ObsSurface
class.
parse_aqmesh()
For processing data from the AQMesh network
parse_beaco2n()
For processing data from the BEACO2N network
parse_btt()
For processing data from the BT Tower site
parse_cranfield()
For processing data from Cranfield
parse_crds()
For processing data from CRDS (cavity ring-down spectroscopy) data from the DECC network.
parse_eurocom()
For processing data from the EUROCOM network
parse_gcwerks()
For processing data in the form expected by the GCWERKS package
parse_noaa()
For processing data from the NOAA network
parse_npl()
For processing data from NPL
parse_tmb()
For processing data from the Thames Barrier site
Metadata handling#
These handle the assignment and standardisation of meta`data`.
Attributes#
Ensuring the NetCDF created during standardisation has the correct attributes assigned to it.
assign_attributes()
Assign attributes to a number of datasets.
get_attributes()
Assign attributes to a single dataset, called by the above.
Metadata sync#
surface_standardise()
Ensure the required metadata is shared between the metadata and attributes.
Storage#
These functions and classes handle the lower level storage and retrieval of data from the object store.
Base class#
This provides the functionality required by all data storage and processing classes, namely the saving, retrieval and loading of data from the object store.
BaseStore
Base class which the other core processing modules inherit
Datasource#
The Datasource is the smallest data provider within the OpenGHG topology. A Datasource represents a data provider such as an instrument measuring a specific gas at a specific height at a specific site. For an instrument measuring three gas species at an inlet height of 100m at a site we would have three Datasources.
Datasource
Handles the storage of data, metadata and version information for measurements
Emissions#
Handles the storage of emissions data.
Emissions
EulerianModel#
Handles the storage of Eulerian model data.
METStore#
Handles the storage of met data from the ECMWF data.
Footprints#
Handles the storage of footprints / flux data.
ObsSurface#
Handles the storage of surface measurement data.
Dataclasses#
These dataclasses are used to facilitate the simple packaging and retrieval of data from the object store.
_BaseData
The base class that (most of) the dataclasses inherit.
FluxData
Stores flux data.
FootprintData
Stores footprint data.
METData
Stores meteorological data retrieved from the ECMWF / our local cache.
ObsData
Stores data returned by search functions.
RankSources
Handles the ranking of datasources.
SearchResults
Makes the retrieval of data simple.
Retrieval functions#
These handle the retrieval of data from the object store.
Searching#
search()
Search for data in the object store, accepts any pair of keyword - argument pairs
Example usage:
search(site=”bsd”, inlet=”50m”, species=”co2”, interesting_metadata=”special_units”)
Specific retrieval#
Handle the retrieval of specific data types, some functions may try to mirror the interface of functions in the Bristol ACRG repository but should hopefully be useful to all users.
get_obs_surface()
Get measurements from one site
get_flux()
Reads in all flux files for the domain and species as an
xarray
Datasetget_footprint()
Gets footprints from one site
Object Store#
These functions handle the storage of data in the object store, in JSON or binary format. Each object and piece of data in the
object store is stored at a specific key
, which can be thought of as the address of the data. The data is stored
in a bucket
which in the cloud is a section of the OpenGHG object store. Locally a bucket
is just a normal
directory in the user’s filesystem specific by the OPENGHG_PATH
environment variable.
delete_object()
Delete an object in the store
exists()
Check if an object exists at that key
get_abs_filepaths()
Get absolute filepaths for objects
get_bucket()
Get path to bucket
get_local_bucket()
Get path to local bucket
get_md5()
Get MD5 has of file
get_md5_bytes()
Get MD5 hash of passed bytes
get_object()
Retrieve object from object store
get_object_from_json()
Retrieve JSON object from object store
hash_files()
Get the MD5 hashes of the given files
set_object_from_file()
Set an object in the object store
set_object_from_json()
Create a JSON object in the object store
Utility functions#
This module contains all the helper functions used throughout OpenGHG.
Exporting#
These are used to export data to a format readable by the OpenGHG data dashboard.
to_dashboard()
Export timeseries data to JSON
to_dashboard_mobile()
Export mobile observations data to JSON
Hashing#
These handle hashing of data (usually with SHA1)
hash_file()
Calculate the SHA1 hash of a file
hash_string()
Calculate the SHA1 hash of a UTF-8 encoded string
String manipulation#
String cleaning and formatting functions
clean_string()
Return a lowercase cleaned string
to_lowercase()
Converts a string to lowercase
remove_punctuation()
Removes punctuation from a string
Time#
Helpers to deal with all things datetime.
timestamp_tzaware()
Create a Timestamp with a UTC timezone
timestamp_now()
Create a timezone aware timestamp for now
timestamp_epoch()
Create a timezone aware timestamp for the UNIX epoch (1970-01-01)
daterange_from_str()
Create a daterange from two timestamp strings
daterange_to_str()
Convert a daterange to string
create_daterange_str()
Create a daterange string from two timestamps or strings
create_daterange()
Create a pandas DatetimeIndex from two timestamps
daterange_overlap()
Check if two dateranges overlap
combine_dateranges()
Combine a list of dateranges
split_daterange_str()
Split a daterange string to the component start and end Timestamps
closest_daterange()
Finds the closest daterange in a list of dateranges
valid_daterange()
Check if the passed daterange is valid
find_daterange_gaps()
Find the gaps in a list of dateranges
trim_daterange()
Removes overlapping dates from to_trim
split_encompassed_daterange()
Checks if one of the passed dateranges contains the other, if so, then split the larger daterange into three sections.
daterange_contains()
Checks if one daterange contains another
sanitise_daterange()
Make sure the daterange is correct and return tzaware daterange.
check_nan()
Check if the given value is NaN, is so return an NA string
check_date()
Check if the passed string is a valid date or not, if not returns NA
Iteration#
Our own personal itertools
pairwise()
Return a zip of an iterable where a is the iterable and b is the iterable advanced one step.
unanimous()
Checks that all values in an iterable object are the same
Site Checks#
These perform checks to ensure data processed for each site is correct
verify_site()
Verify that the given site is one we recognize, does fuzzy text matching to suggest a possible valid value.
multiple_inlets()
Check if the passed site has more than one inlet
Cloud#
These handle cloud based functionality
running_in_cloud()
Checks if we’re running in the cloud by checking for the
OPENGHG_CLOUD
environment variable.
Custom Data Types#
Errors#
Customised errors for OpenGHG.
InvalidSiteError
Raised if an invalid site is given
UnknownDataError
Raised if we don’t recognize the data passed
FunctionError
Raised if there has been an error with a serverless function.
ObjectStoreError
Raised if an error accessing an object at a key in the object store occurs
Types#
These are used in conjunction with mypy
to make type hinting easier.
pathType
multiPathType
resultsType