Converting an object store

In this tutorial we’ll convert an object store from the old-style NetCDF format to the new Zarr-based object store format.

In this example we have an old-style object store at /home/gareth/openghg_store and we want to convert it to the new Zarr format. To perform the conversion we need the path of the old-style store and the name of the new store to write to.

Let’s first add a new object store to our openghg.conf using openghg --quickstart.

$ openghg --quickstart

OpenGHG configuration
---------------------

INFO:openghg.util:User config exists at /home/gareth/.openghg/openghg.conf, checking...             _user.py:91
INFO:openghg.util:Current user object store path: /home/gareth/openghg_store                       _user.py:102
Would you like to update the path? (y/n): n
Would you like to add another object store? (y/n): y
Enter the name of the store: openghg_store_zarr
Enter the object store path: /home/gareth/openghg_store_zarr

You will now be asked for read/write permissions for the store.
For read only enter r, for read and write enter rw.

Enter object store permissions: rw
Would you like to add another object store? (y/n): n
INFO:openghg.util:Configuration written to /home/gareth/.openghg/openghg.conf

Now we have a new object store called openghg_store_zarr in our configuration file, and the path of that object store is /home/gareth/openghg_store_zarr. We’re now ready to perform the conversion.

from openghg.store.storage import convert_store

old_store = "/home/gareth/openghg_store"
new_store = "openghg_store_zarr"

convert_store(path_in=old_store, store_out=new_store)

The convert_store function iterates over each of the data storage classes (Footprints, ObsSurface, etc.) and adds the data to the new object store. It does this by reading the metadata from the metadata store and passing the data stored as NetCDF files to the appropriate standardise_* functions.

The conversion can take some time, depending on the size of the object store. It is not atomic: if it is interrupted, the new object store may be left in a broken state. In that case we recommend deleting the new object store and starting the conversion again.
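The cleanup step after an interrupted conversion could be sketched as follows. This is a minimal sketch and not part of OpenGHG: reset_store is a hypothetical helper, and it assumes the new store lives entirely under the directory entered during quickstart (here /home/gareth/openghg_store_zarr).

```python
import shutil
from pathlib import Path


def reset_store(store_path):
    """Delete a partially written object store directory, if it exists.

    Hypothetical helper, not part of OpenGHG. Returns True if a directory
    was removed and False if there was nothing to delete.
    """
    path = Path(store_path).expanduser()
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Usage (assumes the store path configured earlier in this tutorial):
# reset_store("/home/gareth/openghg_store_zarr")
# convert_store(path_in=old_store, store_out=new_store)
```

Check the path carefully before running anything like this, since the deletion cannot be undone.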

Note

We recommend populating object stores from the original data, as this results in a more consistent store. Use the conversion process only when necessary.

The conversion function will attempt to catch errors as they are raised during the conversion process. You may see lines such as Error standardising record <uuid>: Codec does not support buffers of > 2147483647 bytes in the log output. These errors mean that the chunks being written to the object store are too large and need to be made smaller by specifying chunk sizes.
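To see which records were affected, the log output can be scanned for the error line quoted above. The following is a minimal sketch that assumes each failure is logged on one line in exactly that form, Error standardising record <uuid>: <message>. failed_records is a hypothetical helper, not an OpenGHG function, and the sample log text and UUID are made up for illustration.

```python
import re

# Assumed pattern, based on the example error line quoted in this tutorial.
ERROR_LINE = re.compile(
    r"Error standardising record (?P<uuid>[^\s:]+): (?P<message>.+)"
)


def failed_records(log_text):
    """Return a dict mapping each failed record's UUID to its error message."""
    failures = {}
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            failures[match["uuid"]] = match["message"].strip()
    return failures


# Illustrative log text only; the UUID is made up.
sample_log = """\
an unrelated log line
Error standardising record 1a2b3c4d-0000-1111-2222-333344445555: Codec does not support buffers of > 2147483647 bytes
"""

for uuid, message in failed_records(sample_log).items():
    print(uuid, "->", message)
```

If your logging setup appends extra columns to each line (such as a source file and line number), the pattern would need adjusting.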

To do this, we can pass the convert_store function a chunks dictionary together with the name of the data type that was being converted when the error occurred.

from openghg.store.storage import convert_store

old_store = "/home/gareth/openghg_store"
new_store = "openghg_store_zarr"

# Convert only the footprints data, using a chunk size of 24 along the time dimension
convert_store(path_in=old_store, store_out=new_store, to_convert=["footprints"], chunks={"time": 24})

You may need to experiment with the chunk sizes to find the optimal size for the data being converted.
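A rough starting point can be estimated from the uncompressed size of a single chunk. The sketch below assumes a dense array and a hypothetical footprint variable with dimensions (time, height, lat, lon) stored as 4-byte floats. The dimension sizes are illustrative only, and chunk_nbytes and max_time_chunk are hypothetical helpers, not part of OpenGHG. The 2147483647 byte limit is taken from the error message shown earlier.

```python
from math import prod

CODEC_LIMIT = 2_147_483_647  # bytes, from the error message (2**31 - 1)


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return prod(chunk_shape) * itemsize


def max_time_chunk(other_dims, itemsize, limit=CODEC_LIMIT):
    """Largest number of time steps per chunk that stays within the limit."""
    bytes_per_step = prod(other_dims) * itemsize
    return limit // bytes_per_step


# Illustrative sizes only: 20 heights, 293 latitudes, 391 longitudes, float32.
other_dims = (20, 293, 391)
itemsize = 4

# Size of one chunk when chunks={"time": 24} is used with these dimensions.
print(chunk_nbytes((24, *other_dims), itemsize))

# Upper bound on the time chunk size for these dimensions.
print(max_time_chunk(other_dims, itemsize))
```

The value returned by max_time_chunk is only an upper bound; a chunk size well below it is likely to be a safer choice, and the best value still depends on how the data will be read.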