Modifying and deleting data#
Sometimes you might want to modify metadata after running data through the standardisation scripts. The metadata associated with stored data can still be edited after standardisation, which can save time if the standardisation process is time consuming. Data can also be deleted from the object store.
from openghg.store import data_handler_lookup
from openghg.tutorial import populate_footprint_inert
We’ll first add some footprint data to the object store.
populate_footprint_inert()
result = data_handler_lookup(data_type="footprints", site="TAC", height="100m")
result.metadata
We want to update the model name, so we’ll use the update_metadata method of the DataHandler object. To do this we need the UUID of the Datasource returned by the data_handler_lookup function; this is the key of the metadata dictionary.
NOTE: Each time an object is added to the object store it is assigned a unique id using the Python uuid4 function. This means any UUIDs you see in the documentation won’t match those created when you run these tutorials.
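As a quick illustration (this uses only Python’s standard library uuid module, not the OpenGHG API), uuid4 produces a different identifier on every call:
from uuid import uuid4
# Each call returns a fresh random UUID, so the identifiers in your object
# store won't match the ones printed in this documentation.
print(uuid4())
print(uuid4())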
For the purposes of this tutorial we take the first key from the metadata dictionary. We can do this only because we’ve checked the dictionary and seen that a single key exists; it also means you can run through this notebook without having to modify it. Be careful though: if the dictionary contains more than one key, running the cell below might not give you the UUID you want. Whenever you modify data, copy and paste the UUID and double check it.
uuid = next(iter(result.metadata))
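If you’d rather guard against picking the wrong Datasource, a more defensive sketch (using only the metadata dictionary we’ve already seen) could look like this:
# Only take the single key automatically if exactly one Datasource matched;
# otherwise print the candidates so the UUID can be copied by hand.
if len(result.metadata) == 1:
    uuid = next(iter(result.metadata))
else:
    for candidate_uuid, meta in result.metadata.items():
        print(candidate_uuid, meta.get("model"))
    raise ValueError("More than one Datasource found - check the UUIDs above")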
updated = {"model": "new_model"}
result.update_metadata(uuid=uuid, to_update=updated)
When you run update_metadata the internal store of metadata for each Datasource is updated. If you want to really make sure that the metadata in the object store has been updated you can run refresh.
result.refresh()
metadata = result.metadata[uuid]
And check the model has been changed.
metadata["model"]
Deleting keys#
Let’s accidentally add too much metadata for the footprint and then delete it.
excess_metadata = {"useless_key": "useless_value"}
result.update_metadata(uuid=uuid, to_update=excess_metadata)
result.metadata[uuid]["useless_key"]
Oh no! We’ve added some useless metadata; let’s remove it.
to_delete = ["useless_key"]
result.update_metadata(uuid=uuid, to_delete=to_delete)
And check if the key is in the metadata:
"useless_key" in result.metadata[uuid]
Restore from backup#
If you’ve accidentally pushed some bad metadata you can fix this easily by restoring from backup. Each DataHandler object stores a backup of the current metadata each time you run update_metadata. Let’s add some bad metadata, have a quick look at the backup and then restore it. We’ll start with a fresh DataHandler object.
result = data_handler_lookup(data_type="footprints", site="TAC", height="100m")
bad_metadata = {"domain": "neptune"}
result.update_metadata(uuid=uuid, to_update=bad_metadata)
Let’s check the domain:
result.metadata[uuid]["domain"]
Using view_backup we can check the different versions of metadata we have backed up for each Datasource.
result.view_backup()
To restore the metadata to the previous version we use the restore function. This takes the UUID of the Datasource and, optionally, a version string. The default for the version string is "latest", which is the version most recently backed up. We’ll use the default here.
result.restore(uuid=uuid)
Now we can check the domain again:
result.metadata[uuid]["domain"]
To really make sure, we can force a refresh of all the metadata from the object store and the Datasource.
result.refresh()
Then check again:
result.metadata[uuid]["domain"]
Multiple backups#
The DataHandler object will store a backup each time you run update_metadata. This means you can restore any version of the metadata since you started editing. Do note that the backups currently only exist in memory belonging to the DataHandler object.
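If you’d like a backup that outlives the DataHandler object, one simple approach (a sketch, which assumes the metadata is JSON-serialisable) is to write your own snapshot to disk:
import json
# Keep an independent copy of the current metadata on disk, since the
# DataHandler backups only live in memory.
with open("metadata_backup.json", "w") as f:
    json.dump(result.metadata[uuid], f, indent=2)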
more_metadata = {"time_period": "1m"}
result.update_metadata(uuid=uuid, to_update=more_metadata)
We can view a specific metadata backup using the version argument. The first version is version 1; here we take a look at the backup made just before the update above.
backup_2 = result.view_backup(uuid=uuid, version=2)
backup_2["time_period"]
Say we want to keep some of the changes we’ve made to the metadata but undo the last one: we can restore the most recent backup. To do this we pass "latest" to the version argument when using restore.
result.restore(uuid=uuid, version="latest")
result.metadata[uuid]["time_period"]
We’re now back to where we want to be.
Deleting data#
To remove data from the object store we use data_handler_lookup again.
result = data_handler_lookup(data_type="footprints", site="TAC", height="100m")
result.metadata
Each key of the metadata dictionary is a Datasource UUID. Please make sure that you double check the UUID of the Datasource you want to delete; this operation cannot be undone! Also remember to change the UUID below to the one in your version of the metadata.
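Before deleting anything it can help to print a few identifying fields for each Datasource, so you can be sure you’ve copied the right UUID (the fields printed here are just examples from the metadata we’ve seen):
# Print each candidate UUID alongside some identifying metadata before deleting.
for candidate_uuid, meta in result.metadata.items():
    print(candidate_uuid, meta.get("site"), meta.get("model"), meta.get("domain"))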
uuid = "13fd70dd-e549-4b06-afdb-9ed495552eed"
result.delete_datasource(uuid=uuid)
To make sure it’s gone, let’s run the search again:
result = data_handler_lookup(data_type="footprints", site="TAC", height="100m")
result.metadata
An empty dictionary means no results: the deletion worked.
Cleanup#
If you’re finished with the data in this tutorial you can clean up the tutorial object store using the clear_tutorial_store function.
from openghg.tutorial import clear_tutorial_store
clear_tutorial_store()