Data objects

_BaseData

The base dataclass inherited by (most of) the dataclasses below.

DataManager

This dataclass is used to modify metadata stored in Datasource objects and the metadata store. DataManager instances are created by the data_manager function.

class openghg.dataobjects.DataManager(metadata, store)[source]
__init__(metadata, store)[source]
__str__()[source]

Return str(self).

Return type:

str

delete_datasource(uuid)[source]

Delete Datasource(s) in the object store. At the moment we only support deleting the complete Datasource.

NOTE: Make sure you really want to delete the Datasource(s)

Parameters:

uuid (UnionType[list, str]) – UUID(s) of objects to delete

Return type:

None

Returns:

None

refresh()[source]

Force refresh the internal metadata store with data from the object store.

Return type:

None

Returns:

None

restore(uuid, version='latest')[source]

Restore a backed-up version of a Datasource’s metadata.

Parameters:
  • uuid (str) – UUID of Datasource to retrieve

  • version (UnionType[str, int]) – Version of metadata to restore

Return type:

None

Returns:

None

update_attributes(uuid, version='latest', data_vars=None, update_global=True, to_update=None, to_delete=None)[source]

Update the attributes of the stored Dataset.

This takes UUIDs of Datasources (and optionally a version tag) and updates the associated attributes:

  • To update attributes, pass in a dictionary of key/value pairs to update.

  • To delete attributes, pass in a list of keys to delete.

Parameters:
  • uuid (UnionType[list, str]) – UUID(s) of Datasources to be updated.

  • version (UnionType[str, list[str]]) – optional version string

  • data_vars (UnionType[str, list[str], None]) – optional list of data vars to update; if None, then only global attributes will be updated.

  • update_global (bool) – if True, update global attributes.

  • to_update (UnionType[dict, None]) – Dictionary of metadata to add/update. New key/value pairs will be added. If the key already exists in the metadata, the value will be updated.

  • to_delete (UnionType[str, list, None]) – Key(s) to delete from the metadata

Return type:

None

Returns:

None
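
The interaction between data_vars, update_global, to_update and to_delete can be hard to picture from the parameter list alone. Below is a minimal sketch of the described behaviour. It uses a plain dictionary as a stand-in for the stored Dataset and does not call OpenGHG. The function name update_attrs_sketch, the dictionary layout, the update-then-delete order and the silent handling of missing keys are all assumptions made for illustration, not the library's implementation.

```python
from copy import deepcopy
from typing import Optional, Union


def update_attrs_sketch(
    dataset: dict,
    data_vars: Optional[Union[str, list]] = None,
    update_global: bool = True,
    to_update: Optional[dict] = None,
    to_delete: Optional[Union[str, list]] = None,
) -> dict:
    """Return a copy of `dataset` with attributes added, updated and deleted.

    `dataset` is a hypothetical plain-dict model of a stored Dataset:
    {"attrs": {...}, "data_vars": {name: {"attrs": {...}}}}
    """
    result = deepcopy(dataset)

    # Accept a single name or key as well as a list, as the documented types allow.
    if isinstance(data_vars, str):
        data_vars = [data_vars]
    if isinstance(to_delete, str):
        to_delete = [to_delete]

    # Collect every attribute dictionary that should be modified.
    targets = []
    if update_global:
        targets.append(result["attrs"])
    for name in data_vars or []:
        targets.append(result["data_vars"][name]["attrs"])

    for attrs in targets:
        # New keys are added, existing keys are overwritten.
        attrs.update(to_update or {})
        # Assumption: deleting a key that is not present is silently ignored.
        for key in to_delete or []:
            attrs.pop(key, None)

    return result


stored = {
    "attrs": {"species": "co2", "comment": "draft"},
    "data_vars": {"mf": {"attrs": {"units": "ppm"}}},
}

# Only global attributes change when data_vars is None.
updated = update_attrs_sketch(stored, to_update={"species": "ch4"}, to_delete="comment")
print(updated["attrs"])

# Only the named data variable changes when update_global is False.
updated = update_attrs_sketch(stored, data_vars="mf", update_global=False, to_update={"units": "ppb"})
print(updated["data_vars"]["mf"]["attrs"])
```

Returning a modified copy keeps the sketch easy to test. The real method returns None and acts on the stored data.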

update_metadata(uuid, to_update=None, to_delete=None)[source]

Update the metadata associated with data.

This takes UUIDs of Datasources and updates the associated metadata. To update metadata pass in a dictionary of key/value pairs to update. To delete metadata pass in a list of keys to delete.

Parameters:
  • uuid (UnionType[list, str]) – UUID(s) of Datasources to be updated.

  • to_update (UnionType[dict, None]) – Dictionary of metadata to add/update. New key/value pairs will be added. If the key already exists in the metadata, the value will be updated.

  • to_delete (UnionType[str, list, None]) – Key(s) to delete from the metadata

Return type:

None

Returns:

None

view_backup(uuid=None, version=None)[source]

View backed-up metadata for all Datasources or a single Datasource if a UUID is passed in.

Parameters:

uuid (UnionType[str, None]) – UUID of Datasource

Returns:

Dictionary of versioned metadata

Return type:

dict

SearchResults

This dataclass is returned by the OpenGHG search functions and allows easy retrieval and querying of metadata retrieved by the search function.

class openghg.dataobjects.SearchResults(metadata=None, start_result=None, start_date=None, end_date=None)[source]

This class is used to return data from the search function. It has member functions to retrieve data from the object store.

Parameters:
  • keys – Dictionary of keys keyed by Datasource UUID

  • metadata (UnionType[dict, None]) – Dictionary of metadata keyed by Datasource UUID

  • start_result (UnionType[str, None]) –

    ?

__init__(metadata=None, start_result=None, start_date=None, end_date=None)[source]
__repr__()[source]

Return repr(self).

Return type:

str

__str__()[source]

Return str(self).

Return type:

str

static df_to_table_console_output(df)[source]

Process the DataFrame and display it as a formatted table in the console.

Parameters:

df (DataFrame) – The DataFrame to be processed and displayed.

Return type:

None

Returns:

None

retrieve(dataframe=None, version='latest', sort=True, **kwargs)[source]

Retrieve data from the object store using a filtered pandas DataFrame.

Parameters:
  • dataframe (DataFrame | None) – pandas DataFrame

  • version (str) – Version of data requested from Datasource. Default = “latest”.

  • sort (bool) – Sort data by time in retrieved Dataset

  • **kwargs – Metadata values to search for

Returns:

ObsData object(s)

Return type:

ObsData / List[ObsData]
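
As a rough illustration of how keyword arguments can narrow a set of search results, the sketch below filters a dictionary of metadata keyed by Datasource UUID. It is a standalone model and does not call OpenGHG. The function name filter_metadata, the case-insensitive comparison and the example UUIDs and metadata values are all assumptions made for illustration.

```python
def filter_metadata(metadata: dict, **kwargs) -> list:
    """Return the UUIDs whose metadata matches every keyword argument.

    `metadata` maps Datasource UUID -> metadata dictionary.
    Assumption: values are compared as lower-cased strings.
    """
    matches = []
    for uuid, record in metadata.items():
        if all(
            key in record and str(record[key]).lower() == str(value).lower()
            for key, value in kwargs.items()
        ):
            matches.append(uuid)
    return matches


# Made-up search results: three Datasources with a few metadata keys each.
results = {
    "uuid-1": {"site": "tac", "species": "co2", "inlet": "100m"},
    "uuid-2": {"site": "tac", "species": "ch4", "inlet": "100m"},
    "uuid-3": {"site": "bsd", "species": "co2", "inlet": "248m"},
}

print(filter_metadata(results, site="tac", species="co2"))  # ['uuid-1']
print(filter_metadata(results, species="CO2"))  # ['uuid-1', 'uuid-3']
```

In this sketch a single match corresponds to the case where one ObsData object is returned, and several matches to the case where a list is returned.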

retrieve_all(version='latest', sort=True)[source]

Retrieves all data found during the search

Parameters:
  • version (str) – Version of data requested from Datasource. Default = “latest”.

  • sort (bool) – Sort by time. Note that this may be very memory hungry for large Datasets.

Returns:

ObsData object(s)

Return type:

ObsData / List[ObsData]

uuids()[source]

Return the UUIDs of the found data

Returns:

List of UUIDs

Return type:

list

ObsData

This dataclass is returned by data retrieval functions such as get_obs_surface and the SearchResults retrieve function.

class openghg.dataobjects.ObsData(metadata, data=None, uuid=None, version=None, start_date=None, end_date=None, sort=True, elevate_inlet=False, attrs_to_check=None)[source]

This class is used to return observations data. It can be created with a preloaded xarray Dataset or with a UUID and version number to retrieve data from the Datasource zarr store.

__eq__(other)[source]

Return self==value.

Return type:

bool

__getitem__(key)[source]

Returns the data attribute (xarray Dataset) when the site name is specified. Included as a compatibility layer for the legacy format, a dictionary containing a Dataset for each site code.

Parameters:

key (str) – Site code

Return type:

Any

__hash__ = None
__iter__()[source]

Returns site code as the key for the dictionary as would be expected.

Return type:

Iterator

__len__()[source]

Returns number of key values (fixed at 1 at present)

Return type:

int
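
The three methods above (__getitem__, __iter__ and __len__) together make a single object behave like a one-entry dictionary keyed by site code. The sketch below shows that pattern with a small stand-in class. It is not the ObsData implementation: the class name LegacyCompatSketch, the use of metadata["site"] as the key, the case-insensitive match and the KeyError for other keys are assumptions made for illustration.

```python
from typing import Any, Iterator


class LegacyCompatSketch:
    """Hypothetical stand-in showing dictionary-style access to one dataset."""

    def __init__(self, metadata: dict, data: Any) -> None:
        self.metadata = metadata
        self.data = data

    def __getitem__(self, key: str) -> Any:
        # Assumption: the site code lives in metadata["site"] and is matched
        # case-insensitively; any other key raises KeyError.
        if key.lower() == self.metadata["site"].lower():
            return self.data
        raise KeyError(key)

    def __iter__(self) -> Iterator:
        # Iterating yields the single site code, like iterating over dict keys.
        yield self.metadata["site"]

    def __len__(self) -> int:
        # One object holds data for exactly one site.
        return 1


# A plain dictionary stands in for the xarray Dataset here.
obs = LegacyCompatSketch(metadata={"site": "tac"}, data={"mf": [410.2, 410.5]})

# Legacy code written against {site_code: Dataset} keeps working.
for site in obs:
    print(site, obs[site])
```

With this pattern, older code that loops over a dictionary of site codes can be handed the newer object without changes.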

plot_timeseries(title=None, xlabel=None, ylabel=None, units=None, logo=True)[source]

Plot a timeseries

Return type:

Figure

FluxData

This dataclass is used to return flux/emissions data from the get_flux function.

class openghg.dataobjects.FluxData(metadata, data=None, uuid=None, version=None, start_date=None, end_date=None, sort=True, elevate_inlet=False, attrs_to_check=None)[source]

This class is used to return flux/emissions data from the get_flux function

Parameters:
  • data (UnionType[Dataset, None]) – xarray Dataset

  • metadata (dict) – Dictionary of metadata including model run parameters

__str__()[source]

Return str(self).

Return type:

str

ObsColumnData

This dataclass is used to return observations data from the get_obs_column function

class openghg.dataobjects.ObsColumnData(metadata, data=None, uuid=None, version=None, start_date=None, end_date=None, sort=True, elevate_inlet=False, attrs_to_check=None)[source]

This class is used to return observations data from the get_obs_column function

Parameters:
  • data (UnionType[Dataset, None]) – xarray Dataset

  • metadata (dict) – Dictionary of metadata including model run parameters

__str__()[source]

Return str(self).

Return type:

str

plot_timeseries(title=None, xlabel=None, ylabel=None, units=None, logo=True)[source]

Plot a timeseries

Return type:

Figure

FootprintData

This dataclass is used to return footprint data from the get_footprint function.

class openghg.dataobjects.FootprintData(metadata, data=None, uuid=None, version=None, start_date=None, end_date=None, sort=True, elevate_inlet=False, attrs_to_check=None)[source]

This class is used to return footprint data from the get_footprint function.

Parameters:
  • data (UnionType[Dataset, None]) – xarray Dataset

  • metadata (dict) – Dictionary of metadata including model run parameters

__str__()[source]

Return str(self).

Return type:

str