Store

Storage

These classes are used to store each type of data in the object store. Each has a static load function that loads a version of itself from the object store. The read_file function is then used to read data files, call standardisation functions based on the format of the data file, collect metadata and then store the data and metadata in the object store.
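As a rough illustration of that flow, the sketch below dispatches on `source_format` to a standardisation function and then stores the resulting data and metadata. Everything in it is hypothetical: `parse_openghg`, `parse_crds`, the `PARSERS` table and the plain-dict "object store" are stand-ins chosen for this example, not OpenGHG functions, and the generated key only stands in for a Datasource UUID.

```python
from typing import Callable, Dict, Tuple


# Hypothetical parsers: each takes a file path and returns (data, metadata).
# A real standardisation function would read and convert the file.
def parse_openghg(filepath: str) -> Tuple[dict, dict]:
    return {"values": [1.0, 2.0]}, {"source_format": "openghg", "file": filepath}


def parse_crds(filepath: str) -> Tuple[dict, dict]:
    return {"values": [3.0]}, {"source_format": "crds", "file": filepath}


# Dispatch table from source_format to the function that standardises it.
PARSERS: Dict[str, Callable[[str], Tuple[dict, dict]]] = {
    "openghg": parse_openghg,
    "crds": parse_crds,
}


def read_file(filepath: str, source_format: str, store: dict) -> str:
    """Standardise a file according to its format, then store data and metadata."""
    try:
        parser = PARSERS[source_format.lower()]
    except KeyError:
        raise ValueError(f"Unknown source_format: {source_format!r}") from None
    data, metadata = parser(filepath)
    # Simple counter-based key standing in for a Datasource UUID.
    key = f"{metadata['source_format']}/{len(store)}"
    store[key] = {"data": data, "metadata": metadata}
    return key


store: dict = {}
print(read_file("example.nc", "OpenGHG", store))  # openghg/0
```

Keeping the format-to-function mapping in one table means a new format only needs a new parser and one new entry.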

openghg.store.BoundaryConditions

The BoundaryConditions class is used to standardise and store boundary conditions data.

class openghg.store.BoundaryConditions(bucket)

This class is used to process boundary conditions data.

read_data(binary_data, metadata, file_metadata, source_format)

Read boundary conditions data from binary data.

Parameters:

binary_data (bytes) – Boundary conditions data
metadata (dict) – Dictionary of metadata
file_metadata – File metadata
source_format (str) – Type of data being input e.g. openghg (internal format)

Returns:

UUIDs of the Datasources the data has been assigned to

Return type:

dict

read_file(filepath, species, bc_input, domain, source_format, tag=None, period=None, continuous=True, if_exists='auto', save_current='auto', overwrite=False, force=False, compressor=None, filters=None, chunks=None, info_metadata=None)

Read boundary conditions file

Parameters:

filepath (Union[str, Path]) – Path of boundary conditions file
species (str) – Species name
bc_input (str) – Input used to create boundary conditions. For example:
- a model name such as “MOZART” or “CAMS”
- a description such as “UniformAGAGE” (uniform values based on AGAGE average)
domain (str) – Region for boundary conditions
source_format (str) – Type of data being input e.g. openghg (internal format)
period (str | tuple | None) –
Period of measurements. Only needed if this cannot be inferred from the time coords. If specified, should be one of:
- “yearly”, “monthly”
- suitable pandas Offset Alias
- tuple of (value, unit) as would be passed to pandas.Timedelta function
continuous (bool) – Whether time stamps have to be continuous.
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
chunks (dict | None) – Chunking schema to use when storing data. It expects a dictionary of dimension name and chunk size, for example {“time”: 100}. If None, a chunking schema will be set automatically by OpenGHG. See the documentation for guidance on chunking: https://docs.openghg.org/tutorials/local/Adding_data/Adding_ancillary_data.html#chunking. To disable chunking, pass in an empty dictionary.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Returns:

List of dictionaries of the files processed and the Datasource UUIDs the data has been assigned to, plus the “required” metadata

Return type:

list
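A minimal sketch of how the `if_exists`, `save_current` and (deprecated) `overwrite` options described above could be resolved and applied. It assumes timeseries are plain `{timestamp: value}` dictionaries and defines its own `DataOverlapError`; it follows the documented rules but is not OpenGHG's implementation, and `resolve_options` and `merge` are hypothetical names.

```python
from typing import Dict, Tuple


class DataOverlapError(Exception):
    """Local stand-in for the error named in the documentation."""


def resolve_options(if_exists: str = "auto", save_current: str = "auto",
                    overwrite: bool = False) -> Tuple[str, bool]:
    """Return (if_exists, whether to save the current data as a separate version)."""
    if overwrite:  # deprecated flag: behave as if_exists="new"
        if_exists = "new"
    if if_exists not in ("auto", "new", "combine"):
        raise ValueError(f"Invalid if_exists: {if_exists!r}")
    choice = save_current.lower()
    if choice == "auto":
        keep_version = if_exists != "auto"  # "auto" -> False, anything else -> True
    elif choice in ("y", "yes"):
        keep_version = True
    elif choice in ("n", "no"):
        keep_version = False
    else:
        raise ValueError(f"Invalid save_current: {save_current!r}")
    return if_exists, keep_version


def merge(current: Dict[int, float], new: Dict[int, float], if_exists: str) -> Dict[int, float]:
    """Apply the if_exists rule to {timestamp: value} series."""
    if if_exists == "new":
        return dict(new)  # ignore previous data
    overlap = current.keys() & new.keys()
    if if_exists == "auto" and overlap:
        raise DataOverlapError(f"Overlapping timestamps: {sorted(overlap)}")
    merged = dict(current)
    merged.update(new)  # "combine": new values replace current ones at shared times
    return dict(sorted(merged.items()))


print(resolve_options("combine"))             # ('combine', True)
print(merge({1: 1.0}, {2: 2.0}, "auto"))      # {1: 1.0, 2: 2.0}
```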

static schema()

Define schema for boundary conditions Dataset.

Includes volume mole fractions for each time and ordinal, vertical boundary at the edge of the defined domain:

“vmr_n”, “vmr_s”
- expected dimensions: (“time”, “height”, “lon”)
“vmr_e”, “vmr_w”
- expected dimensions: (“time”, “height”, “lat”)

Expected data types for all variables and coordinates also included.

Returns:

Contains schema for BoundaryConditions.

Return type:

DataSchema

static validate_data(data)

Validate input data against BoundaryConditions schema - definition from BoundaryConditions.schema() method.

Parameters:

data (Dataset) – xarray Dataset in expected format

Return type:

None

Returns:

None

Raises a ValueError with details if the input data does not adhere to the BoundaryConditions schema.
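The dimension part of this check can be sketched with the standard library alone. The mapping below encodes the expected dimensions listed under BoundaryConditions.schema(); the input is assumed to be a `{variable name: dimension tuple}` mapping (with an xarray Dataset, one could be built from each variable's `.dims`). Data type checks are omitted, and `validate_dims` is a hypothetical helper rather than the real validate_data.

```python
from typing import Dict, Mapping, Tuple

# Expected dimensions per variable, as listed for BoundaryConditions.schema().
BC_DIMS: Dict[str, Tuple[str, ...]] = {
    "vmr_n": ("time", "height", "lon"),
    "vmr_s": ("time", "height", "lon"),
    "vmr_e": ("time", "height", "lat"),
    "vmr_w": ("time", "height", "lat"),
}


def validate_dims(variables: Mapping[str, Tuple[str, ...]],
                  expected: Mapping[str, Tuple[str, ...]] = BC_DIMS) -> None:
    """Raise ValueError listing every missing variable or dimension mismatch."""
    problems = []
    for name, dims in expected.items():
        if name not in variables:
            problems.append(f"missing variable {name!r}")
        elif tuple(variables[name]) != dims:
            problems.append(f"{name!r} has dims {tuple(variables[name])}, expected {dims}")
    if problems:
        raise ValueError("Data does not match schema: " + "; ".join(problems))
```

Collecting every problem before raising gives a single error that lists all the issues at once, in the spirit of the "ValueError with details" described above.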

openghg.store.Emissions

The Emissions class is used to process emissions / flux data files.

openghg.store.EulerianModel

The EulerianModel class is used to process Eulerian model data.

class openghg.store.EulerianModel(bucket)

This class is used to process Eulerian model data

read_file(filepath, model, species, source_format='openghg', start_date=None, end_date=None, setup=None, tag=None, if_exists='auto', save_current='auto', overwrite=False, force=False, compressor=None, filters=None, chunks=None, info_metadata=None)

Read Eulerian model output

Parameters:

filepath (Union[str, Path]) – Path of Eulerian model species output
model (str) – Eulerian model name
species (str) – Species name
source_format (str) – Data format, for example openghg (internal format)
start_date (str | None) – Start date (inclusive) associated with model run
end_date (str | None) – End date (exclusive) associated with model run
setup (str | None) – Additional setup details for run
tag (str | list | None) – Special tagged values to add to the Datasource. This will be added to any current values if the tag key already exists in a list.
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
chunks (dict | None) – Chunking schema to use when storing data. It expects a dictionary of dimension name and chunk size, for example {“time”: 100}. If None, a chunking schema will be set automatically by OpenGHG. See the documentation for guidance on chunking: https://docs.openghg.org/tutorials/local/Adding_data/Adding_ancillary_data.html#chunking. To disable chunking, pass in an empty dictionary.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Return type:

list[dict]
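The start_date (inclusive) / end_date (exclusive) convention describes a half-open interval. A small sketch of that membership test, assuming ISO 8601 date strings; `in_model_run` is a hypothetical name, not part of OpenGHG.

```python
from datetime import datetime
from typing import Optional


def in_model_run(timestamp: str, start_date: Optional[str] = None,
                 end_date: Optional[str] = None) -> bool:
    """True if start_date <= timestamp < end_date; a missing bound is left open."""
    t = datetime.fromisoformat(timestamp)
    if start_date is not None and t < datetime.fromisoformat(start_date):
        return False
    if end_date is not None and t >= datetime.fromisoformat(end_date):
        return False
    return True


print(in_model_run("2020-01-01", "2020-01-01", "2020-02-01"))  # True (start is inclusive)
print(in_model_run("2020-02-01", "2020-01-01", "2020-02-01"))  # False (end is exclusive)
```

Because the end date is excluded, consecutive runs such as January (2020-01-01 to 2020-02-01) and February (2020-02-01 to 2020-03-01) tile the timeline without sharing a timestamp.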

openghg.store.Footprints

The Footprints class is used to standardise and store footprint model output (for example from NAME or FLEXPART).

class openghg.store.Footprints(bucket)

This class is used to process footprint model output.

chunking_schema(time_resolved=False, high_time_resolution=False, high_spatial_resolution=False, short_lifetime=False, source_format='')

Get chunking schema for footprint data.

Parameters:

time_resolved (bool) – Set footprint variable to be high time resolution.
high_time_resolution (bool) – Deprecated; this argument will be replaced in future versions by time_resolved.
high_spatial_resolution (bool) – Footprint variables include both high and low spatial resolution options.
short_lifetime (bool) – Include additional particle age parameters for short lived species.

Returns:

Chunking schema for footprint data.

Return type:

dict
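Assuming the returned schema takes the same `{dimension name: chunk size}` form as the `chunks` argument of read_file, the sketch below shows what such a dictionary means for storage: how many chunks each dimension is split into. Dimensions not named in the dictionary are assumed to be kept whole, so an empty dictionary yields a single chunk per dimension. The dimension sizes are made up and `chunk_layout` is hypothetical.

```python
import math
from typing import Dict


def chunk_layout(dim_sizes: Dict[str, int], chunks: Dict[str, int]) -> Dict[str, int]:
    """Number of chunks along each dimension for a {dimension: chunk size} schema.

    Dimensions missing from `chunks` are kept whole (an assumption), so an
    empty dictionary gives one chunk per dimension.
    """
    layout = {}
    for dim, size in dim_sizes.items():
        step = chunks.get(dim, max(size, 1))
        if step <= 0:
            raise ValueError(f"Chunk size for {dim!r} must be positive")
        layout[dim] = math.ceil(size / step)
    return layout


sizes = {"time": 250, "lat": 10, "lon": 20}
print(chunk_layout(sizes, {"time": 100}))  # {'time': 3, 'lat': 1, 'lon': 1}
print(chunk_layout(sizes, {}))             # {'time': 1, 'lat': 1, 'lon': 1}
```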

read_data(binary_data, metadata, file_metadata)

Read a footprint from binary data.

Parameters:

binary_data (bytes) – Footprint data
metadata (dict) – Dictionary of metadata
file_metadata – File metadata

Returns:

UUIDs of the Datasources the data has been assigned to

Return type:

dict

read_file(domain, model, filepath, site=None, satellite=None, obs_region=None, selection=None, inlet=None, height=None, met_model=None, species=None, network=None, period=None, tag=None, continuous=True, chunks=None, source_format='acrg_org', retrieve_met=False, high_spatial_resolution=False, time_resolved=False, high_time_resolution=False, short_lifetime=False, if_exists='auto', save_current='auto', overwrite=False, force=False, sort=False, drop_duplicates=False, compressor=None, filters=None, info_metadata=None)

Read footprint data files and return the UUIDs of the Datasources the processed data has been assigned to.

Parameters:

filepath (Union[str, Path, tuple, list]) – Path(s) of file(s) to standardise
site (str | None) – Site name
domain (str) – Domain of footprints
satellite (str | None) – Satellite name
obs_region (str | None) – The geographic region covered by the data (“BRAZIL”, “INDIA”, “UK”).
model (str) – Model used to create footprint (e.g. NAME or FLEXPART)
inlet (str | None) – Height above ground level in metres. Format ‘NUMUNIT’ e.g. “10m”
height (str | None) – Alias for inlet. One of height or inlet MUST be included.
met_model (str | None) – Underlying meteorological model used (e.g. UKV)
species (str | None) – Species name. Only needed if footprint is for a specific species e.g. co2 (and not inert)
network (str | None) – Network name
period (str | tuple | None) – Period of measurements. Only needed if this cannot be inferred from the time coords.
tag (str | list | None) – Special tagged values to add to the Datasource. This will be added to any current values if the tag key already exists in a list.
continuous (bool) – Whether time stamps have to be continuous.
chunks (dict | None) – Chunking schema to use when storing the data. It expects a dictionary of dimension name and chunk size, for example {“time”: 100}. If None, a chunking schema will be set automatically by OpenGHG.
source_format (str) – Type of data being input e.g. acrg_org
retrieve_met (bool) – Whether to also download meteorological data for this footprint area
high_spatial_resolution (bool) – Indicate footprints include both a low and high spatial resolution.
time_resolved (bool) – Indicate footprints are high time resolution (include H_back dimension). Note this will be set to True automatically if species=“co2” (Carbon Dioxide).
high_time_resolution (bool) – Deprecated; this argument will be replaced in future versions by time_resolved.
short_lifetime (bool) – Indicate footprint is for a short-lived species. Needs species input. Note this will be set to True if species has an associated lifetime.
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
sort (bool) – Sort data in time dimension. We recommend NOT sorting footprint data unless necessary.
drop_duplicates (bool) – Drop duplicate timestamps, keeping the first value.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Returns:

UUIDs of the Datasources the data has been assigned to

Return type:

dict
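read_file treats `height` as an alias for `inlet` and expects a 'NUMUNIT' string such as "10m". A hypothetical sketch of that normalisation follows; `resolve_inlet`, the exact regular expression, and rejecting conflicting inlet and height values are assumptions for this example rather than documented OpenGHG behaviour.

```python
import re
from typing import Optional

# 'NUMUNIT': a number (optionally decimal) followed by a unit, e.g. "10m".
_INLET_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*([a-z]+)$", re.IGNORECASE)


def resolve_inlet(inlet: Optional[str] = None, height: Optional[str] = None) -> str:
    """Treat height as an alias for inlet and normalise the value to 'NUMUNIT'."""
    if inlet is None and height is None:
        raise ValueError("One of inlet or height must be provided")
    if inlet is not None and height is not None and inlet != height:
        raise ValueError(f"inlet ({inlet!r}) and height ({height!r}) disagree")
    value = inlet if inlet is not None else height
    match = _INLET_RE.match(value.strip())
    if match is None:
        raise ValueError(f"Expected 'NUMUNIT' format such as '10m', got {value!r}")
    number, unit = match.groups()
    return f"{number}{unit.lower()}"


print(resolve_inlet(height="100 M"))  # 100m
```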

static schema(particle_locations=True, high_spatial_resolution=False, time_resolved=False, high_time_resolution=False, short_lifetime=False, source_format=None)

Define schema for footprint Dataset.

The returned schema depends on what the footprint represents, indicated using the keywords. By default, this will include the “fp” variable, but this will be superseded if high_spatial_resolution or time_resolved are specified.

Parameters:

particle_locations (bool) – Include 4-directional particle location variables (“particle_location_[nesw]”) and the associated additional dimensions (“height”).
high_spatial_resolution (bool) – Footprint variables include both high and low spatial resolution options (“fp_low”, “fp_high”) and the associated additional dimensions (“lat_high”, “lon_high”).
time_resolved (bool) – Footprint variable is high time resolution (“fp_HiTRes”), including the associated dimension (“H_back”).
high_time_resolution (bool) – Deprecated; this argument will be replaced in future versions by time_resolved.
short_lifetime (bool) – Include additional particle age parameters for short-lived species (“mean_age_particles_[nesw]”).
source_format (str | None) – Optional string containing the source format; necessary for “time resolved” footprints since the schema is different for PARIS/FLEXPART and ACRG formats.

Return type:

DataSchema

Returns:

DataSchema object describing this format.

Note: In PARIS format the coordinate dimensions are (“latitude”, “longitude”) rather than (“lat”, “lon”): but given that all other openghg internal formats are (“lat”, “lon”), we are currently keeping all footprint internal formats consistent with this.
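The keyword rules above can be summarised as a function returning the expected variable names. This sketch assumes "superseded" means that "fp" is dropped whenever high_spatial_resolution or time_resolved is set, treats high_time_resolution as an alias for time_resolved, and ignores dimensions, data types and source_format; `expected_fp_variables` is a hypothetical helper, not the real schema() method.

```python
import warnings
from typing import Set


def expected_fp_variables(particle_locations: bool = True,
                          high_spatial_resolution: bool = False,
                          time_resolved: bool = False,
                          high_time_resolution: bool = False,
                          short_lifetime: bool = False) -> Set[str]:
    """Variable names a footprint Dataset is expected to contain (sketch)."""
    if high_time_resolution:
        warnings.warn("high_time_resolution is deprecated; use time_resolved",
                      DeprecationWarning)
        time_resolved = True
    names: Set[str] = set()
    if not (high_spatial_resolution or time_resolved):
        names.add("fp")  # default variable, superseded by the options below
    if high_spatial_resolution:
        names |= {"fp_low", "fp_high"}
    if time_resolved:
        names.add("fp_HiTRes")
    if particle_locations:
        names |= {f"particle_location_{d}" for d in "nesw"}
    if short_lifetime:
        names |= {f"mean_age_particles_{d}" for d in "nesw"}
    return names


print(sorted(expected_fp_variables(particle_locations=False, time_resolved=True)))  # ['fp_HiTRes']
```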

static validate_data(data, particle_locations=True, high_spatial_resolution=False, time_resolved=False, high_time_resolution=False, short_lifetime=False, source_format=None)

Validate data against Footprint schema - definition from Footprints.schema(…) method.

Parameters:

data (Dataset) – xarray Dataset in expected format
particle_locations, high_spatial_resolution, time_resolved, high_time_resolution, short_lifetime, source_format – See the Footprints.schema() method for details on these optional inputs.

Return type:

None

Returns:

None

Raises a ValueError with details if the input data does not adhere to the Footprints schema.

openghg.store.ObsColumn

The ObsColumn class is used to process column / satellite observation data.

class openghg.store.ObsColumn(bucket)

This class is used to process column observation data.

read_file(filepath, species, platform='satellite', obs_region=None, satellite=None, domain=None, selection=None, site=None, network=None, instrument=None, tag=None, source_format='openghg', if_exists='auto', save_current='auto', overwrite=False, force=False, compressor=None, filters=None, chunks=None, info_metadata=None)

Read column observation file

Parameters:

filepath (Union[str, Path]) – Path of observation file
species (str) – Species name or synonym e.g. “ch4”
platform (str) – Type of platform. Should be one of:
- “satellite”
- “site”
satellite (str | None) – Name of satellite (if relevant). Should include satellite OR site.
domain (str | None) – For satellite only. If data has been selected on an area include the identifier name for domain covered. This can map to previously defined domains (see openghg_defs “domain_info.json” file) or a newly defined domain.
selection (str | None) – For satellite only, identifier for any data selection which has been performed on satellite data. This can be based on any form of filtering, binning etc. but should be unique compared to other selections made e.g. “land”, “glint”, “upperlimit”. If not specified, domain will be used.
site (str | None) – Site code/name (if relevant). Should include satellite OR site.
instrument (str | None) – Instrument name e.g. “TANSO-FTS”
network (str | None) – Name of in-situ or satellite network e.g. “TCCON”, “GOSAT”
tag (str | list | None) – Special tagged values to add to the Datasource. This will be added to any current values if the tag key already exists in a list.
source_format (str) – Type of data being input e.g. openghg (internal format)
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
chunks (dict | None) – Chunking schema to use when storing data. It expects a dictionary of dimension name and chunk size, for example {“time”: 100}. If None, a chunking schema will be set automatically by OpenGHG. See the documentation for guidance on chunking: https://docs.openghg.org/tutorials/local/Adding_data/Adding_ancillary_data.html#chunking. To disable chunking, pass in an empty dictionary.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Returns:

Dictionary of the Datasource UUIDs the data has been assigned to

Return type:

dict
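The platform, satellite/site, domain and selection rules described for read_file can be expressed as an argument check. A sketch under stated assumptions: `check_column_args` is hypothetical, and raising an error when domain or selection is passed for site data is a choice made here, since the documentation only says these apply to satellite data.

```python
from typing import Dict, Optional


def check_column_args(platform: str = "satellite",
                      satellite: Optional[str] = None,
                      site: Optional[str] = None,
                      domain: Optional[str] = None,
                      selection: Optional[str] = None) -> Dict[str, str]:
    """Validate the satellite/site arguments and fill in a default selection."""
    platform = platform.lower()
    if platform not in ("satellite", "site"):
        raise ValueError("platform should be 'satellite' or 'site'")
    if (satellite is None) == (site is None):
        raise ValueError("Specify exactly one of satellite or site")
    if platform == "site":
        if site is None:
            raise ValueError("platform='site' requires site")
        if domain is not None or selection is not None:
            raise ValueError("domain and selection apply to satellite data only")
        return {"platform": "site", "site": site}
    if satellite is None:
        raise ValueError("platform='satellite' requires satellite")
    result = {"platform": "satellite", "satellite": satellite}
    # If selection is not given, the domain identifier is used instead.
    chosen = selection if selection is not None else domain
    if chosen is not None:
        result["selection"] = chosen
    return result
```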

openghg.store.ObsSurface

The ObsSurface class is used to process surface observation data.

class openghg.store.ObsSurface(bucket)

This class is used to process surface observation data

delete(uuid)

Delete a Datasource with the given UUID

This function deletes both the record of the object store in the

Parameters:

uuid (str) – UUID of Datasource

Return type:

None

Returns:

None

read_data(binary_data, metadata, file_metadata, precision_data=None, site_filepath=None)

Reads binary data passed in by serverless function. The data dictionary should contain sub-dictionaries that contain data and metadata keys.

This is clunky and the ObsSurface.read_file function could be tidied up quite a lot to be more flexible.

Parameters:

binary_data (bytes) – Binary measurement data
metadata (dict) – Metadata
file_metadata (dict) – File metadata such as original filename
precision_data (bytes | None) – GCWERKS precision data
site_filepath (Union[str, Path, None]) – Alternative site info file (see openghg/openghg_defs repository for format). Otherwise will use the data stored within openghg_defs/data/site_info JSON file by default.

Returns:

Dictionary of results

Return type:

dict

read_file(filepath, source_format, site, network, inlet=None, height=None, instrument=None, data_level=None, data_sublevel=None, dataset_source=None, sampling_period=None, calibration_scale=None, platform=None, measurement_type='insitu', verify_site_code=True, site_filepath=None, tag=None, update_mismatch='never', if_exists='auto', save_current='auto', overwrite=False, force=False, compressor=None, filters=None, chunks=None, info_metadata=None)

Process files and store them in the object store. This function utilises the processing functions of the other classes in this submodule to handle each data type.

Parameters:

filepath (Union[str, Path, tuple, list]) – Filepath(s)
source_format (str) – Data format, for example CRDS, GCWERKS
site (str) – Site code/name
network (str) – Network name
inlet (str | None) – Inlet height. Format ‘NUMUNIT’ e.g. “10m”. If retrieving multiple files, pass None and OpenGHG will attempt to extract this from the file.
height (str | None) – Alias for inlet. Pass None to read inlets from the data.
instrument (str | None) – Instrument name
data_level (str | int | float | None) –
The level of quality control which has been applied to the data. This should follow the convention of:
- “0”: raw sensor output
- “1”: automated quality assurance (QA) performed
- “2”: final data set
- “3”: elaborated data products using the data
data_sublevel (str | float | None) – Can be used to sub-categorise data (typically “L1”) depending on different QA performed before data is finalised.
dataset_source (str | None) – Dataset source name, for example “ICOS”, “InGOS”, “European ObsPack”, “CEDA 2023.06”
sampling_period (Timedelta | str | None) – Sampling period in pandas style (e.g. 2H for 2 hour period, 2m for 2 minute period).
platform (str | None) – Type of measurement platform e.g. “surface-insitu”, “surface-flask”
measurement_type (str) – Type of measurement. For some source_formats this value is added to the attributes. Platform should be used in preference. If platform is specified and measurement_type is not, this will be set to match the platform.
verify_site_code (bool) – Verify the site code
site_filepath (Union[str, Path, None]) –
Alternative site info file (see openghg/openghg_defs repository for format). Otherwise will use the data stored within openghg_defs/data/site_info JSON file by default.

tag (str | list | None) – Special tagged values to add to the Datasource. This will be added to any current values if the tag key already exists in a list.
update_mismatch (str) –
This determines how mismatches between the internal data “attributes” and the supplied / derived “metadata” are handled, i.e. whether they can be updated or should raise an AttrMismatchError. This includes the options:
- “never” - don’t update mismatches and raise an AttrMismatchError
- “from_source” / “attributes” - update mismatches based on input data (e.g. data attributes)
- “from_definition” / “metadata” - update mismatches based on associated data (e.g. site_info.json)
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
chunks (dict | None) – Chunking schema to use when storing data. It expects a dictionary of dimension name and chunk size, for example {“time”: 100}. If None, a chunking schema will be set automatically by OpenGHG. See the documentation for guidance on chunking: https://docs.openghg.org/tutorials/local/Adding_data/Adding_ancillary_data.html#chunking. To disable chunking, pass in an empty dictionary.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Returns:

Dictionary of Datasource UUIDs

Return type:

dict

TODO: Should “measurement_type” be changed to “platform” to align with ModelScenario and ObsColumn?
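The update_mismatch options described for read_file can be sketched as a reconciliation step between data attributes and metadata. This is an illustration only: `reconcile` is hypothetical, `AttrMismatchError` is defined locally, values are compared with plain equality, and non-conflicting keys from both sources are simply merged.

```python
from typing import Any, Dict


class AttrMismatchError(Exception):
    """Local stand-in for the error named in the documentation."""


_ALIASES = {"from_source": "attributes", "from_definition": "metadata"}


def reconcile(attributes: Dict[str, Any], metadata: Dict[str, Any],
              update_mismatch: str = "never") -> Dict[str, Any]:
    """Merge attributes and metadata, resolving conflicting values per update_mismatch."""
    mode = update_mismatch.lower()
    mode = _ALIASES.get(mode, mode)
    if mode not in ("never", "attributes", "metadata"):
        raise ValueError(f"Invalid update_mismatch: {update_mismatch!r}")
    shared = attributes.keys() & metadata.keys()
    mismatched = sorted(k for k in shared if attributes[k] != metadata[k])
    if mismatched and mode == "never":
        raise AttrMismatchError(f"Attributes and metadata disagree for: {mismatched}")
    preferred = attributes if mode == "attributes" else metadata
    merged = {**attributes, **metadata}
    for key in mismatched:
        merged[key] = preferred[key]  # take the value from the chosen source
    return merged
```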

read_multisite_aqmesh(filepath, metadata_filepath, network='aqmesh_glasgow', instrument='aqmesh', sampling_period=60, measurement_type='insitu', if_exists='auto', overwrite=False)

Read AQMesh data for the Glasgow network

NOTE - temporary function until we know what kind of AQMesh data we’ll be retrieving in the future.

This data is different in that it contains multiple sites in the same file.

Return type:

defaultdict
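Since one AQMesh file holds several sites, a reader has to split rows by site. A minimal sketch using a defaultdict, in line with the documented return type; the CSV layout, the "site" column name and `split_by_site` are assumptions for illustration, not the real AQMesh format or OpenGHG code.

```python
import csv
import io
from collections import defaultdict
from typing import DefaultDict, Dict, List


def split_by_site(csv_text: str, site_column: str = "site") -> DefaultDict[str, List[Dict[str, str]]]:
    """Group the rows of a multi-site CSV file by the value in site_column."""
    grouped: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        site = row.pop(site_column)  # remove the site column from each row
        grouped[site].append(row)
    return grouped


sample = "site,time,no2\nA,2021-01-01T00:00,5.1\nB,2021-01-01T00:00,7.2\nA,2021-01-01T01:00,4.9\n"
grouped = split_by_site(sample)
print({site: len(rows) for site, rows in grouped.items()})  # {'A': 2, 'B': 1}
```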

static schema(species)

Define schema for surface observations Dataset.

Only includes mandatory variables:

standardised species name (e.g. “ch4”)
- expected dimensions: (“time”)

Expected data types for variables and coordinates also included.

Returns:

Contains basic schema for ObsSurface.

Return type:

DataSchema

TODO: Decide how best to incorporate optional variables, e.g. “ch4_variability”, “ch4_number_of_observations”

store_data(data, if_exists='auto', overwrite=False, force=False, required_metakeys=None, compressor=None, filters=None)

This expects already standardised data such as ICOS / CEDA.

Parameters:

data (MutableSequence[MetadataAndData]) – Data in standard format; see the data specification under Development -> Data specifications in the documentation.
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - creates new version with just new data
- “combine” - replace and insert new data into current timeseries
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored (checked based on previously retrieved file hashes).
required_metakeys (Sequence | None) –
Keys in the metadata used to store this metadata in the object store. If None, defaults to:

{“species”, “site”, “station_long_name”, “inlet”, “instrument”, “network”, “source_format”, “data_source”, “icos_data_level”}
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.

Return type:

list[dict] | None

Returns:

list of dicts containing details of stored data, or None
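A sketch of how a set of required metadata keys, defaulting to the set listed above, might be pulled out of a metadata dictionary. `lookup_metadata` is hypothetical, and raising ValueError for missing keys is an assumption made for this example rather than documented behaviour.

```python
from typing import Dict, Iterable, Optional

# Default set, as listed for store_data.
DEFAULT_REQUIRED = {
    "species", "site", "station_long_name", "inlet", "instrument",
    "network", "source_format", "data_source", "icos_data_level",
}


def lookup_metadata(metadata: Dict[str, object],
                    required_metakeys: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Return only the required keys, failing loudly if any are missing."""
    keys = set(DEFAULT_REQUIRED if required_metakeys is None else required_metakeys)
    missing = keys.difference(metadata)
    if missing:
        raise ValueError(f"Missing required metadata keys: {sorted(missing)}")
    return {key: metadata[key] for key in sorted(keys)}
```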

static validate_data(data, species)

Validate input data against ObsSurface schema - definition from ObsSurface.schema() method.

Parameters:

data (Dataset) – xarray Dataset in expected format
species (str) – Species name

Return type:

None

Returns:

None

Raises a ValueError with details if the input data does not adhere to the ObsSurface schema.

openghg.store.FluxTimeseries

The FluxTimeseries class is used to process UK inventory data.

class openghg.store.FluxTimeseries(bucket)

This class is used to process one-dimensional timeseries data.

_data_type = 'flux_timeseries'

Class attributes: _root = “FluxTimeseries”, _uuid = “099b597b-0598-4efa-87dd-472dfe027f5d8”, _metakey = f“{_root}/uuid/{_uuid}/metastore”

read_data(binary_data, metadata, file_metadata)

Read flux timeseries data from binary data.

Parameters:

binary_data (bytes) – Flux timeseries data
metadata (dict) – Dictionary of metadata
file_metadata – File metadata

Returns:

UUIDs of the Datasources the data has been assigned to

Return type:

dict

read_file(filepath, species, source, region, domain=None, database=None, database_version=None, model=None, source_format='crf', period=None, tag=None, continuous=True, if_exists='auto', save_current='auto', overwrite=False, force=False, compressor=None, filters=None, info_metadata=None)

Read a one-dimensional timeseries file

Parameters:

filepath (Union[str, Path]) – Path of flux timeseries / emissions timeseries file
species (str) – Species name
domain (str | None) – Geographic domain for the flux timeseries. Default is None, in which case region is used to identify the area.
source (str) – Source of the emissions data, e.g. “energy”, “anthro”, default is ‘anthro’.
region (str) – Region/Country of the CRF data
database (str | None) – Name of database source for this input (if relevant)
database_version (str | None) – Name of database version (if relevant)
model (str | None) – Model name (if relevant)
source_format (str) – Type of data being input e.g. openghg (internal format)
period (str | tuple | None) –
Period of measurements. Only needed if this cannot be inferred from the time coords. If specified, should be one of:
- “yearly”, “monthly”
- suitable pandas Offset Alias
- tuple of (value, unit) as would be passed to pandas.Timedelta function
tag (str | list | None) – Special tagged values to add to the Datasource. This will be added to any current values if the tag key already exists in a list.
continuous (bool) – Whether time stamps have to be continuous.
if_exists (str) –
What to do if existing data is present:
- “auto” - checks new and current data for timeseries overlap:
  - adds data if there is no overlap
  - raises DataOverlapError if there is an overlap
- “new” - just include new data and ignore previous data
- “combine” - replace and insert new data into current timeseries
save_current (str) –
Whether to save data in its current form and create a new version:
- “auto” - depends on the if_exists input (“auto” -> False, any other value -> True)
- “y” / “yes” - save current data exactly as it exists as a separate (previous) version
- “n” / “no” - allow current data to be updated / deleted
overwrite (bool) – Deprecated. If set, the options for if_exists=“new” are used.
force (bool) – Force data to be added even if it is identical to data already stored.
compressor (Any | None) – A custom compressor to use. If None, this will default to Blosc(cname=“zstd”, clevel=5, shuffle=Blosc.SHUFFLE). See https://zarr.readthedocs.io/en/stable/api/codecs.html for more information on compressors.
filters (Any | None) – Filters to apply to the data on storage; defaults to no filtering. See https://zarr.readthedocs.io/en/stable/tutorial.html#filters for more information on picking filters.
info_metadata (dict | None) – Additional tags to describe the data, e.g. {“comment”: “Quality checks have been applied”}

Returns:

Dictionary of the Datasource UUIDs the data has been assigned to

Return type:

dict
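The accepted period forms can be illustrated with the standard library. This sketch keeps "yearly" and "monthly" as calendar labels (they have no fixed length) and converts a small, hand-picked subset of offset-style strings and (value, unit) tuples to `datetime.timedelta`. It does not use pandas and does not cover the full set of pandas offset aliases; `parse_period` is hypothetical.

```python
import re
from datetime import timedelta
from typing import Tuple, Union

# Small, hand-picked subset of unit strings (an assumption, not the pandas list).
_UNITS = {"s": "seconds", "min": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_CALENDAR = {"yearly", "monthly"}
_ALIAS_RE = re.compile(r"^(\d*)([a-z]+)$")


def _to_timedelta(value: float, unit: str) -> timedelta:
    try:
        field = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None
    return timedelta(**{field: value})


def parse_period(period: Union[str, Tuple[float, str]]) -> Union[str, timedelta]:
    """Return a calendar label ("yearly"/"monthly") or a fixed-length timedelta."""
    if isinstance(period, tuple):
        value, unit = period
        return _to_timedelta(value, unit)
    text = period.strip().lower()
    if text in _CALENDAR:
        return text  # calendar periods have no fixed length
    match = _ALIAS_RE.match(text)
    if match is None:
        raise ValueError(f"Unrecognised period: {period!r}")
    number, unit = match.groups()
    return _to_timedelta(int(number or 1), unit)


print(parse_period("30min"))  # 0:30:00
```

Keeping the calendar periods as labels avoids pretending that a month or a year has a fixed duration.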

static schema()

Define schema for one-dimensional timeseries (FluxTimeseries) Dataset.

Includes observations for each time in the defined domain:

“Obs”
- expected dimensions: (“time”)

Expected data types for all variables and coordinates also included.

Returns:

Contains schema for FluxTimeseries.

Return type:

DataSchema

static validate_data(data)

Validate input data against FluxTimeseries schema - definition from FluxTimeseries.schema() method.

Parameters:

data (Dataset) – xarray Dataset in expected format

Return type:

None

Returns:

None

Raises a ValueError if the input data does not adhere to the FluxTimeseries schema.

Recombination functions

These handle the recombination of data retrieved from the object store.

Segmentation functions

These handle the segmentation of data ready for storage in the object store.

Metadata Handling

The data_manager function is used in the same way as the search functions. It takes any number of keyword arguments for searching of metadata and a data_type argument. It returns a DataManager object.
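To illustrate keyword-based metadata search, the sketch below filters a list of metadata records by `data_type` and arbitrary keyword arguments. It does not use or reproduce data_manager or DataManager; `search_metadata`, the record layout and the case-insensitive string comparison are assumptions for this example.

```python
from typing import Any, Dict, List


def search_metadata(records: List[Dict[str, Any]], data_type: str,
                    **kwargs: Any) -> List[Dict[str, Any]]:
    """Return records of the given data_type whose metadata match every keyword."""
    results = []
    for record in records:
        if record.get("data_type") != data_type:
            continue
        # Every keyword must be present and match, ignoring case.
        if all(key in record and str(record[key]).lower() == str(value).lower()
               for key, value in kwargs.items()):
            results.append(record)
    return results
```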

Data types

These helper functions provide a useful way of retrieving the data types OpenGHG can process and their associated storage classes.