Searching and plotting#

In this short tutorial we’ll show how to retrieve some data and create a simple plot using one of our plotting functions.

Using the tutorial object store#

As in the previous tutorial, we will use the tutorial object store to avoid cluttering your personal object store.

from openghg.tutorial import use_tutorial_store

use_tutorial_store()

Now we’ll add some data to the tutorial store.

from openghg.tutorial import populate_surface_data
populate_surface_data()

1. Searching#

Let’s search for all the methane data from Tacolneston. To do this we need to know the site code (“TAC”).

If we didn’t know the site code, we could find it using the summary_site_codes() function:

from openghg.standardise import summary_site_codes

## UNCOMMENT THIS CODE TO SHOW ALL ENTRIES
# import pandas as pd; pd.set_option('display.max_rows', None)

summary = summary_site_codes()
summary

The output of this function is a pandas DataFrame, so we can filter to find sites containing the name “Tacolneston”:

site_long_name = summary["Long name"]
find_tacolneston = site_long_name.str.contains("Tacolneston")

summary[find_tacolneston]

This shows us that the site code for Tacolneston is “TAC”, and also that there are two entries for Tacolneston, since it is included under multiple networks.

To see all available data associated with Tacolneston we can search for this using the site code of “TAC”.

from openghg.retrieve import search

tac_data_search = search(site="tac")

For our search we can take a look at the results property (which is a pandas DataFrame).

tac_data_search.results

To just look for the surface observations we can use the search_surface function specifically. We can also pass multiple keys to extract, for example, just the methane data:

from openghg.retrieve import search_surface

tac_surface_search = search_surface(site="TAC", species="ch4")
tac_surface_search.results

Keyword options when searching#

When searching it is also possible to specify multiple options for keywords. If this is done using a list, then datasources which have any of the specified values will be found. For example if we wanted to search for methane at two specific inlets we could write:

from openghg.retrieve import search_surface

tac_surface_search = search_surface(site="TAC", species="ch4", inlet=["100m", "185m"])
tac_surface_search.results

This will return results from both the 100m and 185m inlets (but not the 54m inlet).

Note: it is also possible to specify a dictionary to provide an option between different keywords but this would most often be for backwards compatability (e.g. if a new keyword is introduced and a previous one retired but still present for some data sources) and so will not be demonstrated in this tutorial.

There are also equivalent search functions for other data types including search_footprints, search_flux and search_bc.

Searching for a range of values#

An alternative way to search for multiple inlet values is by specifying a range of values:

from openghg.retrieve import search_surface

tac_surface_search = search_surface(site="TAC", species="ch4", inlet=slice("100m", "185m"))
tac_surface_search.results

Again, this will return results from both the 100m and 185m inlets.

When a slice is used to specify inlet heights in get_obs_surface, the search results will be combined into a single output with an inlet data variable, if possible. This is useful when the inlet height changes slightly.

Suppose that we have surface data at the BSD site, with inlet heights 248m and 250m. To retrieve this data as a single dataset, we use:

from openghg.retrieve import get_obs_surface

bsd_surface_data = get_obs_surface(site="BSD", species="ch4", inlet=slice("248m", "250m"))

The data in bsd_surface_data.data will have an inlet data variable, which contains the inlet height at each time.

Note that call

bsd_surface_data = get_obs_surface(site="BSD", species="ch4", inlet=["248m", "250m"])

would raise an error, because two datasources would be found, and without specifying a slice, OpenGHG doesn’t know to combine this data.

Further, the range inlet=slice("240m", "260m") would also work, so the exact values do not need to be specified.

2. Plotting#

If we want to take a look at the data from the 185m inlet we can first retrieve the data from the object store and then create a quick timeseries plot. See the SearchResults object documentation for more information.

data_185m = tac_surface_search.retrieve(inlet="185m")

Note

The plots created below may not show up on the online documentation version of this notebook.

We can visualise this data using the in-built plotting commands from the plotting sub-module. We can also modify the inputs to improve how this is displayed:

from openghg.plotting import plot_timeseries

plot_timeseries(data_185m, title="Methane at Tacolneston", xlabel="Time", ylabel="Concentration", units="ppm")

Plotting multiple timeseries#

If there are multiple results for a given search, we can also retrieve all the data and receive a list of ObsData objects.

all_ch4_tac = tac_surface_search.retrieve()

Then we can use the plot_timeseries function from the plotting submodule to compare measurements from different inlets. This creates a Plotly plot that should be interactive and and responsive, even with relatively large amounts of data.

plot_timeseries(data=all_ch4_tac, units="ppb")

3. Comparing different sites#

We can easily compare data for the same species from different sites by doing a quick search to see what’s available

ch4_data = search_surface(species="ch4")

ch4_data.results

Then we refine our search to only retrieve the sites (and inlets) that we want to compare and make a plot

bsd_data = ch4_data.retrieve(site="BSD")
tac_data = ch4_data.retrieve(site="TAC", inlet="54m")
plot_timeseries(data=[bsd_data, tac_data], title="Comparing CH4 measurements at Tacolneston and Bilsdale")