Managing and deleting data#

Sometimes you might want to modify some metadata after running the data through the standardisation scripts. After the standardisation process the metadata associated with some data can still be edited. This can save time if the data standardisation process is quite time consuming. Data can also be deleted from the object store. We use some lower level functions in this tutorial that you may not have encountered before. Take time to understand what they are doing and check each operation carefully.

Warning

The functionality exposed by the DataManager class could lead to data loss. Please use it carefully.

This tutorial will not work with the normal use_tutorial_store command. We need to manually backup the OpenGHG configuration file and create a new one.

mv ~/.openghg/openghg.conf ~/.openghg/openghg.conf.bak
openghg --quickstart

OpenGHG configuration
---------------------

INFO:openghg.util:We'll first create your user object store.

Enter path for your local object store (default /home/gareth/openghg_store): /home/gareth/testing_store
Would you like to add another object store? (y/n): n
INFO:openghg.util:Configuration written to /home/gareth/.config/openghg/openghg.conf

We’ll first add some footprint data to the object store, we’ll explicitly pass in the name of the store we want to write the data to here, the “user” store.

from openghg.dataobjects import data_manager
from openghg.standardise import standardise_footprint
from openghg.tutorial import retrieve_example_data

tac_fp_inert = "https://github.com/openghg/example_data/raw/main/footprint/tac_footprint_inert_201607.tar.gz"
tac_inert_path = retrieve_example_data(url=tac_fp_inert)[0]

site = "TAC"
inlet = "100m"
domain = "EUROPE"
model = "NAME"

store = "user"

standardise_footprint(filepath=tac_inert_path, site=site, inlet=inlet, domain=domain, model=model, store=store)

Now we’re ready to retrieve the metadata from the object store and create a DataManager object. Again we have to pass in the name of the store.

Note

You can only pass in the name of a store to which you have write access. If you don’t have the correct permissions an ObjectStoreError will be raised.

dm = data_manager(data_type="footprints", site="TAC", height="100m", store="user")
dm.metadata

We want to update the model name so we’ll use the update_metadata method of the DataManager object. To do this we need to take the UUID of the Datasource returned by the data_manager function, this is the key of the metadata dictionary.

NOTE: Each time an object is added to the object store it is assigned a unique id using the Python uuid4 function. This means any UUIDs you see in the documentation won’t match those created when you run these tutorials.

For the purposes of this tutorial we take the first key from the metadata dictionary. We can do this only because we’ve checked the dictionary and seen that only one key exists. It also means you can run through this notebook and it should work without you having to modify it. But be careful, if the dictionary contains more than one key, running the cell below might not result in the UUID you want. Each time you want to modify the data copy and paste the UUID and double check it.

uuid = next(iter(dm.metadata))
updated = {"model": "new_model"}

dm.update_metadata(uuid=uuid, to_update=updated)

When you run update_metadata the internal store of metadata for each Datasource is updated. If you want to really make sure that the metadata in the object store has been updated you can run refresh.

dm.refresh()
metadata = dm.metadata[uuid]

And check the model has been changed.

metadata["model"]

You’ll need to update the metadata for each Datasource. You may automate this process yourself but please be careful as to avoid data loss.

Deleting keys#

Let’s accidentally add too much metadata for the footprint and then delete.

excess_metadata = {"useless_key": "useless_value"}
dm.update_metadata(uuid=uuid, to_update=excess_metadata)
dm.metadata[uuid]["useless_key"]

Oh no! We’ve added some useless metadata, let’s remove it.

to_delete = ["useless_key"]
dm.update_metadata(uuid=uuid, to_delete=to_delete)

And check if the key is in the metadata:

"useless_key" in dm.metadata[uuid]

Restore from backup#

If you’ve accidentally pushed some bad metadata you can fix this easily by restoring from backup. Each DataManager object stores a backup of the current metadata each time you run update_metadata. Let’s add some bad metadata, have a quick look at the backup and then restore it. We’ll start with a fresh DataManager object.

Warning

The backed up data is only stored in memory for the lifetime of the DataManager object. The backup is not stored in the object store.

dm = data_manager(data_type="footprints", site="TAC", height="100m", store="user")
bad_metadata = {"domain": "neptune"}
dm.update_metadata(uuid=uuid, to_update=bad_metadata)

Let’s check the domain

dm.metadata[uuid]["domain"]

Using view_backup we can check the different versions of metadata we have backed up for each Datasource.

dm.view_backup()

To restore the metadata to the previous version we use the restore function. This takes the UUID of the datasource and optionally a version string. The default for the version string is "latest", which is the version most recently backed up. We’ll use the default here.

dm.restore(uuid=uuid)

Now we can check the domain again

dm.metadata[uuid]["domain"]

To really make sure we can force a refresh of all the metadata from the object store and the Datasource.

dm.refresh()

Then check again

dm.metadata[uuid]["domain"]

Multiple backups#

The DataManager object will store a backup each time you run update_metadata. This means you can restore any version of the metadata since you started editing. Do note that the backups, currently, only exist in memory belonging to the DataManager object.

more_metadata = {"time_period": "1m"}
dm.update_metadata(uuid=uuid, to_update=more_metadata)

We can view a specific metadata backup using the version argument. The first version is version 1, here we take a look at the backup made just before we made the update above.

backup_2 = dm.view_backup(uuid=uuid, version=2)
backup_2["time_period"]

Say we want to keep some of the changes we’ve made to the metadata but undo the last one we can restore the last backup. To do this we can pass “latest” to the version argument when using restore.

dm.restore(uuid=uuid, version="latest")
dm.metadata[uuid]["time_period"]

We’re now back to where we want to be.

Deleting data#

To remove data from the object store we use data_manager again

dm = data_manager(data_type="footprints", site="TAC", height="100m", store="user")
dm.metadata

Each key of the metadata dictionary is a Datasource UUID. Please make sure that you double check the UUID of the Datasource you want to delete, this operation cannot be undone! Also remember to change the UUID below to the one in your version of the metadata.

uuid = "13fd70dd-e549-4b06-afdb-9ed495552eed"
dm.delete_datasource(uuid=uuid)

To make sure it’s gone let’s run the search again

dm = data_manager(data_type="footprints", site="TAC", height="100m", store="user")
dm.metadata

An empty dictionary means no results, the deletion worked.

Tidy up#

To restore your old OpenGHG configuration file run

mv ~/.openghg/openghg.conf.bak ~/.openghg/openghg.conf