Getting set up

Here we’ll cover setting up your development environment for contributing to OpenGHG. The source code for OpenGHG is available on GitHub.

Setting up your computer

You’ll need git and Python >= 3.10, so please make sure you have both installed before continuing further.
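
If you are unsure whether your machine meets these requirements, a short script can check both at once. The sketch below is not part of OpenGHG: the function name find_problems is made up for illustration, and it only checks that a git executable is on your PATH, not which version of git it is.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated above


def find_problems(version_info, git_path):
    """Return a list of problems; an empty list means both requirements are met.

    find_problems is a hypothetical helper, not an OpenGHG function.
    """
    problems = []
    # Compare only (major, minor) against the minimum version
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = f"{version_info[0]}.{version_info[1]}"
        problems.append(f"Python {found} found, but >= 3.10 is required")
    # shutil.which returns None when the executable is not on PATH
    if git_path is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    problems = find_problems(sys.version_info, shutil.which("git"))
    for problem in problems:
        print("Problem:", problem)
    if not problems:
        print("git and a suitable Python are available")
```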

Clone OpenGHG

First we’ll clone the repository and check out the devel branch, which holds the most up-to-date version of OpenGHG.

git clone https://github.com/openghg/openghg.git
cd openghg
git checkout devel

Next we’ll set up a virtual environment using pixi, pip, or conda.

Environments

Here we cover the creation of an environment and the installation of OpenGHG into it. Installation here means adding OpenGHG to the environment. We’ll install it in developer mode so that any changes you make to the code will automatically be available when you run commands. Similarly, if you run a git pull on the devel branch all changes made will be available to you straight away, without having to reinstall or update OpenGHG within the environment.

pixi

Pixi is the recommended development environment when working with NetCDF, HDF5, or Zarr data. It installs the compiled scientific, HDF5, and NetCDF stack from conda-forge and keeps this OpenGHG checkout editable.

Install Pixi directly with one of the following commands.

On macOS or Linux, use the official installer:

curl -fsSL https://pixi.sh/install.sh | sh

If curl is unavailable, use wget:

wget -qO- https://pixi.sh/install.sh | sh

On macOS with Homebrew:

brew install pixi

Then create the editable OpenGHG development environment from this checkout:

pixi install -e dev
pixi run -e dev python -c "import openghg, h5py, h5netcdf, netCDF4, xarray, zarr"

Useful development commands:

pixi run -e dev test
pixi run -e dev test-storage
pixi run -e dev lint
pixi run -e dev typecheck

Avoid running commands such as pip install -U h5py h5netcdf netcdf4 inside the Pixi environment. That can replace Pixi’s conda-forge HDF5/NetCDF packages with PyPI wheels and reintroduce binary incompatibilities.

OpenGHG should now be installed. You can check this by opening ipython and running

In [1]: import openghg

pip

It is recommended that you develop OpenGHG in a Python virtual environment. Here we’ll create a new folder called envs in our home directory and create a new openghg_devel environment in it.

mkdir -p ~/envs/openghg_devel
python -m venv ~/envs/openghg_devel

Virtual environments provide sandboxes which make it easier to develop and test code. They also allow you to install Python modules without interfering with other Python installations.

We activate our new environment using

source ~/envs/openghg_devel/bin/activate

We’ll first install and update some installation tools

pip install --upgrade pip wheel setuptools

Now, making sure we’re in the root of the OpenGHG repository we just cloned, install OpenGHG’s development dependencies.

pip install -e ".[dev]"

This installs OpenGHG in editable mode (-e / --editable flag) with all development dependencies defined in pyproject.toml.

OpenGHG should now be installed. You can check this by opening ipython and running

In [1]: import openghg

conda

Making sure you’re in the openghg repository folder, run

conda env create -f environment.yaml

Once conda finishes its installation process you can activate the environment

conda activate openghg_env

Next install conda-build which allows us to install packages in develop mode

conda install conda-build

And finally install OpenGHG

conda develop .

OpenGHG should now be installed. You can check this by opening ipython and running

In [1]: import openghg
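
Whichever of the three environments you chose, a slightly stronger check than a bare import is to confirm where Python resolves openghg from. The sketch below uses only the standard library; package_location is a hypothetical helper name, not something OpenGHG provides.

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level package would be imported from, or None.

    find_spec locates the package without importing it.
    package_location is a hypothetical helper, not an OpenGHG function.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin is the path to its __init__.py
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    location = package_location("openghg")
    if location is None:
        print("openghg is not installed in this environment")
    else:
        print("openghg is imported from", location)
```

With a developer-mode install the printed path should sit inside your cloned repository. A path under site-packages would suggest a regular, non-editable install.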

Run tests

It’s a good idea to run the tests to check that everything works on your system. To do this run

pytest -v tests

Testing against multiple versions of Python

Our GitHub workflows test against multiple versions of Python, using the latest versions of dependencies. This can sometimes result in failing tests when you push to GitHub, despite your tests passing locally.

You can use tox to run tests in isolated environments built with the latest dependencies for different versions of Python.

To manage multiple versions of Python, you can use pyenv. After following the installation instructions, you can install multiple versions of Python:

pyenv install 3.10
pyenv install 3.11
pyenv install 3.12

To view all installed versions, call pyenv versions. To view and set your preferred version globally, use pyenv global. To activate multiple versions of Python, you can use pyenv local:

pyenv local 3.10 3.11 3.12

This makes Python 3.10, 3.11, and 3.12 available in the current directory. The python command will default to the first version in the list; in this case, Python 3.10.

To run tests against Python 3.10, 3.11, and 3.12, as well as run the linters (black and flake8) and mypy, call tox in your OpenGHG repo.

To see all jobs that tox can run, use tox -l. You can run a specific job with tox run -e <env>. For instance

tox run -e py312

will run the tests against Python 3.12.

To pass arguments to pytest, you can append them after the tox command as follows:

tox run -e py312 -- tests/analyse/test_scenario.py

Coding Style

OpenGHG is written in Python 3 (>= 3.10). We aim as much as possible to follow the PEP8 Python coding style and recommend that you use a linter such as flake8.

This code has to run on a wide variety of architectures, operating systems and machines - some of which don’t have any graphic libraries, so please be careful when adding a dependency.

With this in mind, we use the following coding conventions:

Naming

We follow a Python style naming convention.

  • Packages: lowercase, singleword

  • Classes: CamelCase

  • Methods: snake_case

  • Functions: snake_case

  • Variables: snake_case

  • Source Files: snake_case with a leading underscore

Functions or variables that are private should be named with a leading underscore. This prevents them from being prominently visible in Python’s help and tab completion.
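
The conventions above can be seen together in one small file. This is an illustrative sketch only: the file name _sample_stats.py and every name in it are hypothetical and do not exist in OpenGHG.

```python
# Hypothetical source file: _sample_stats.py (snake_case, leading underscore)

__all__ = ["SampleStats", "mean_value"]


def _validate(values):
    """Private function: the leading underscore marks it as internal."""
    if not values:
        raise ValueError("values must not be empty")


def mean_value(values):
    """Public function: snake_case."""
    _validate(values)
    return sum(values) / len(values)


class SampleStats:
    """Class: CamelCase."""

    def __init__(self, values):
        _validate(values)
        self._values = list(values)  # private variable: leading underscore

    def value_range(self):
        """Method: snake_case."""
        return max(self._values) - min(self._values)
```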

Modules

OpenGHG consists of the main module, openghg, plus submodules of the form openghg.submodule.

To make OpenGHG easy for new developers to understand, we have a set of rules that will ensure that only necessary public functions, classes and implementation details are exposed to the Python help system.

  • Module files containing implementation details are prefixed with an underscore, i.e. _parameters.py

  • Each module file contains an __all__ variable that lists the specific items that should be imported.

  • Import-heavy package __init__.py files should expose public names lazily. Keep the public names in __all__ and add a matching _EXPORTS mapping from each public name to the private implementation module that defines it:

from importlib import import_module
from typing import Any

__all__ = ["function_a", "ClassB"]

_EXPORTS = {
    "function_a": "._module_a",
    "ClassB": "._module_b",
}

def __getattr__(name: str) -> Any:
    try:
        module_name = _EXPORTS[name]
    except KeyError as exc:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}") from exc

    value = getattr(import_module(module_name, __name__), name)
    globals()[name] = value
    return value

def __dir__() -> list[str]:
    return sorted(__all__)

The invariant is set(__all__) == set(_EXPORTS). Tests should enforce this for each package using the lazy export pattern.

Packages using this pattern must also include a sibling __init__.pyi stub that re-exports the same public names from their implementation modules. The runtime __getattr__ necessarily returns Any, and the stub keeps mypy and other static checkers from losing the real function and class types while preserving lazy runtime imports.

  • The top-level openghg package uses the same idea for subpackages: subpackages are listed in __all__ and imported only when first accessed.

  • Do not rely on importing a package __init__.py to trigger subclass registration, xarray accessor registration, or other implementation-module side effects. Discovery that must work before implementation modules are imported should use declarative metadata instead. For example, store data type discovery is maintained in openghg.store.spec rather than by eagerly importing every store class.

  • If a previous import-time side effect is still useful, provide an explicit opt-in helper rather than restoring the eager import. For example, openghg.enable_pint_xarray() imports pint_xarray and registers the xarray .pint accessor without making import openghg import xarray.

This results in a clean API and documentation, with all extraneous information, e.g. external modules, hidden from the user. This is important when working interactively, since IPython and Jupyter do not respect the __all__ variable when auto-completing, meaning that the user will see a full list of the available names when hitting tab. When following the conventions above, the user will only be able to access the exposed names. This greatly improves the clarity of the package, allowing a new user to quickly determine the available functionality. Any user wishing to expose further implementation detail can, of course, type an underscore to show the hidden names when searching.
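
The invariant set(__all__) == set(_EXPORTS) mentioned above is straightforward to check in a test. The following is a minimal sketch, not OpenGHG’s actual test code: assert_exports_consistent and make_fake_package are hypothetical names, and the package under test is an in-memory stand-in so that the example runs on its own.

```python
import types


def assert_exports_consistent(module):
    """Check that a package's __all__ and _EXPORTS name the same things."""
    public = set(module.__all__)
    lazy = set(module._EXPORTS)
    missing = public - lazy    # advertised in __all__ but not lazily importable
    unlisted = lazy - public   # lazily importable but not advertised
    assert not missing, f"{module.__name__}: only in __all__: {sorted(missing)}"
    assert not unlisted, f"{module.__name__}: only in _EXPORTS: {sorted(unlisted)}"


def make_fake_package(all_names, exports):
    """Build an in-memory stand-in for a package __init__ (illustration only)."""
    package = types.ModuleType("fake_package")
    package.__all__ = list(all_names)
    package._EXPORTS = dict(exports)
    return package


def test_consistent_package_passes():
    package = make_fake_package(
        ["function_a", "ClassB"],
        {"function_a": "._module_a", "ClassB": "._module_b"},
    )
    assert_exports_consistent(package)


def test_inconsistent_package_fails():
    package = make_fake_package(["function_a"], {"ClassB": "._module_b"})
    try:
        assert_exports_consistent(package)
    except AssertionError:
        pass  # the mismatch was detected, as intended
    else:
        raise AssertionError("expected the mismatch to be detected")
```

In a real test suite you would import each package that uses the lazy export pattern and pass it to the helper, for example through pytest.mark.parametrize, instead of building a fake module.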

Type hinting

Throughout the OpenGHG project we use type hinting which allows us to declare the type of the objects that are going to be passed to and returned from functions. This helps improve user understanding of the code and when used in conjunction with tools like mypy can help catch bugs.

If we are writing a function that takes a string and returns a string we can add the types like so

def greeter(name: str) -> str:
    """ Greets the user

        Args:
            name: Name of user
        Returns:
            str: Greeting string
    """
    return 'Hello ' + name

For a function that takes either a string or a list as its argument and returns a list we can write it as

from typing import List, Union

def search(search_terms: Union[str, List]) -> List:
    """ A function that searches

        Args:
            search_terms: Search terms
        Returns:
            list: List of data found
    """
    return ["found_item"]

Workflow

Feature branches

First make sure that you are on the development branch of OpenGHG:

git checkout devel

Now create and switch to a feature branch. This should be prefixed with feature, e.g.

git checkout -b feature-process

Pre-commit

This project uses pre-commit to ensure code is linted and formatted using tools such as flake8, black and others. This ensures errors are caught before the code is checked in the CI pipeline.

To install the hook

pre-commit install

The hook should now run each time you make a commit.

Testing

When working on your feature it is important to write tests to ensure that it does what is expected and doesn’t break any existing functionality. All code added to the project must be covered by tests and tests should be placed inside the tests directory, creating an appropriately named sub-directory for any new submodules.

The test suite is intended to be run using pytest. When run, pytest searches for tests in all directories and files below the current directory, collects the tests together, then runs them. Pytest uses name matching to locate the tests. By default it collects files named test_*.py or *_test.py and, within those files, functions whose names start with test, e.g.:

# Files:
test_file.py       file_test.py
# Functions:
def test_func():
    # code to perform tests...
    ...

We use the convention of test_* when naming files and functions.
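
As an illustration of this convention, a test for the greeter function from the type hinting section could look like the sketch below. The file name tests/test_greeter.py is hypothetical, and greeter is redefined here only to keep the example self-contained; a real test would import the code under test from openghg.

```python
# Hypothetical file: tests/test_greeter.py


def greeter(name: str) -> str:
    """Stand-in for the code under test."""
    return "Hello " + name


def test_greeter_returns_greeting():
    # pytest reports a failure if the assertion is false
    assert greeter("Ada") == "Hello Ada"


def test_greeter_accepts_empty_name():
    assert greeter("") == "Hello "
```

Both the file name and the function names follow the test_* pattern, so pytest should collect the two tests without any extra configuration.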

Running tests

To run the full test suite, simply type:

pytest tests/

To get more detailed information about each test, run pytest using the verbose flag, e.g.:

pytest -v tests/

For more information on the capabilities of pytest please see the pytest documentation.

Continuous integration and delivery

We use GitHub Actions to run a full continuous integration (CI) on all pull requests to devel and master, and all pushes to devel and master. We will not merge a pull request until all tests pass. We only accept pull requests to devel.

Documentation

OpenGHG is fully documented using a combination of hand-written files (in the doc folder) and auto-generated API documentation created from Google style docstrings. The documentation is automatically built using Sphinx. Whenever a commit is pushed to devel the documentation is automatically rebuilt and updated.

To build the documentation locally you will first need to install the documentation dependencies. If you haven’t yet done so, run

pip install -e ".[doc]"

Next ensure you have pandoc installed. Installation instructions can be found on the pandoc website.

Then move to the doc directory and run:

make

When finished, point your browser to build/html/index.html.

Committing

If you create new tests, please make sure that they pass locally before committing. When happy, commit your changes, e.g.

git commit openghg/_new_feature.py tests/test_feature \
    -m "Implementation and test for new feature."

If your edits don’t change the OpenGHG source code e.g. fixing typos in the documentation, then please add [skip ci] to your commit message.

git commit -a -m "Updating docs [skip ci]"

This avoids unnecessarily running the GitHub Actions workflows, e.g. running all the tests and rebuilding the documentation of the OpenGHG package. GitHub Actions are configured in the file .github/workflows/main.yaml.

Next, push your changes to the remote server:

# Push to the feature branch on the main OpenGHG repo, if you have access.
git push origin feature

# Push to the feature branch on your own fork.
git push fork feature

When the feature is complete, create a pull request on GitHub so that the changes can be merged back into the development branch. For more information, see GitHub’s documentation on pull requests.