Util#

Exporting#

These are used to export data to a format readable by the OpenGHG data dashboard.

openghg.util.to_dashboard(data, selected_vars, downsample_n=3, filename=None)[source]#

Takes a Dataset produced by OpenGHG and outputs it into a JSON format readable by the OpenGHG dashboard or a related project.

This also exports a separate file containing the site locations, for use with the map selector component.

Note - this function does not currently support export of data from multiple inlets.

Parameters:
  • data (Union[ObsData, List[ObsData]]) – Retrieved data, either a single ObsData object or a list of them

  • selected_vars (List) – The variables to export

  • downsample_n (int) – Take every nth value from the data

  • filename (Optional[str]) – filename to write output to

Return type:

Optional[Dict]

Returns:

None
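
The downsample_n parameter is described as taking every nth value from the data. A minimal sketch of that idea on a plain Python list, assuming simple stride slicing. The helper name take_every_nth is hypothetical and is not part of OpenGHG; the library's own implementation may differ.

```python
def take_every_nth(values, n=3):
    """Return every nth element of a sequence, starting with the first.

    Hypothetical helper for illustration only; not part of OpenGHG.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Stride slicing keeps indices 0, n, 2n, ...
    return values[::n]


readings = [10, 11, 12, 13, 14, 15, 16]
print(take_every_nth(readings, n=3))  # [10, 13, 16]
```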

openghg.util.to_dashboard_mobile(data, filename=None)[source]#

Export the Glasgow LICOR data to JSON for the dashboard

Parameters:
  • data (Dict) – Data dictionary

  • filename (Union[str, Path, None]) – Filename for export of JSON

Returns:

Dictionary if no filename is given

Return type:

dict or None

Hashing#

These handle hashing of data (usually with SHA1).

openghg.util.hash_file(filepath)[source]#

Opens the file at filepath and calculates its SHA1 hash

Taken from https://stackoverflow.com/a/22058673

Parameters:

filepath (pathlib.Path) – Path to file

Returns:

SHA1 hash

Return type:

str
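
A file's SHA1 hash can be calculated with the standard library's hashlib by reading the file in chunks. The sketch below is an illustration under that assumption; the function name sha1_of_file and the chunk size are hypothetical and this is not necessarily how hash_file is implemented.

```python
import hashlib
from pathlib import Path


def sha1_of_file(filepath: Path, chunk_size: int = 65536) -> str:
    """Hypothetical helper: SHA1 hex digest of a file, read in chunks."""
    sha1 = hashlib.sha1()
    with open(filepath, "rb") as f:
        # Read fixed-size chunks so a large file is never held in memory at once
        while chunk := f.read(chunk_size):
            sha1.update(chunk)
    return sha1.hexdigest()
```

Reading in chunks gives the same digest as hashing the whole file in one call, because hashlib objects accumulate data across update() calls.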

openghg.util.hash_string(to_hash)[source]#

Return the SHA-1 hash of a string

Parameters:

to_hash (str) – String to hash

Returns:

SHA1 hash of string

Return type:

str
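
For a string, the same hashlib call applies once the text has been encoded to bytes. A short sketch, assuming UTF-8 encoding (the encoding and the name sha1_of_string are assumptions, not taken from the OpenGHG source):

```python
import hashlib


def sha1_of_string(to_hash: str) -> str:
    """Hypothetical helper: SHA1 hex digest of a string."""
    # Assumption: the string is encoded as UTF-8 before hashing
    return hashlib.sha1(to_hash.encode("utf-8")).hexdigest()


print(sha1_of_string("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```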

String manipulation#

String cleaning and formatting functions

openghg.util.clean_string(to_clean)[source]#

Returns a lowercase string with only alphanumeric characters and underscores.

Parameters:

to_clean (Optional[str]) – String to clean

Returns:

Clean string

Return type:

str or None
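
One way to produce a lowercase string containing only alphanumeric characters and underscores is a regular expression substitution. This is a sketch of the described behaviour only; clean_sketch is a hypothetical name, and the real clean_string may treat some characters differently.

```python
import re
from typing import Optional


def clean_sketch(to_clean: Optional[str]) -> Optional[str]:
    """Hypothetical helper: lowercase and keep only a-z, 0-9 and underscores."""
    if to_clean is None:
        # Mirror the documented "str or None" return type
        return None
    # Lowercase first, then drop every character outside the allowed set
    return re.sub(r"[^a-z0-9_]", "", to_clean.lower())


print(clean_sketch("Hello, World_01!"))  # helloworld_01
```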

openghg.util.to_lowercase(d, skip_keys=None)[source]#

Convert an object to lowercase. All keys and values in a dictionary will be converted to lowercase as will all objects in a list, tuple or set. You can optionally pass in a list of keys to skip when lowercasing a dictionary.

Based on the answer https://stackoverflow.com/a/40789531/1303032

Parameters:
  • d (Union[Dict, List, Tuple, Set, str]) – Object to lower case

  • skip_keys (Optional[List]) – List of keys to skip when lowercasing.

Returns:

Dictionary of lower case keys and values

Return type:

dict
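
The recursive lowercasing described above can be sketched as follows. The name lower_all is hypothetical, and the handling of skip_keys (a skipped key and its value are both left unchanged) is an assumption about the intended behaviour rather than a statement of what to_lowercase does.

```python
def lower_all(obj, skip_keys=None):
    """Hypothetical helper: recursively lowercase strings in nested containers."""
    skip = set(skip_keys or [])
    if isinstance(obj, dict):
        result = {}
        for key, value in obj.items():
            if key in skip:
                # Assumption: a skipped key and its value are kept untouched
                result[key] = value
            else:
                new_key = key.lower() if isinstance(key, str) else key
                result[new_key] = lower_all(value, skip_keys)
        return result
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same container type from lowercased items
        return type(obj)(lower_all(item, skip_keys) for item in obj)
    if isinstance(obj, str):
        return obj.lower()
    # Numbers, None and other objects pass through unchanged
    return obj


print(lower_all({"Name": "ABC", "Tags": ["X", "Y"]}, skip_keys=["Tags"]))
# {'name': 'abc', 'Tags': ['X', 'Y']}
```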

openghg.util.remove_punctuation(s)[source]#

Removes punctuation and converts the passed string to lowercase

Parameters:

s (str) – String to convert

Returns:

Unpunctuated, lowercased string

Return type:

str

Time#

Helpers to deal with all things datetime.

openghg.util.timestamp_tzaware(timestamp)[source]#

Returns the pandas Timestamp passed as a timezone (UTC) aware Timestamp.

Parameters:

timestamp (pandas.Timestamp) – Timezone naive Timestamp

Returns:

Timezone aware Timestamp

Return type:

pandas.Timestamp
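
The idea of making a timestamp UTC aware can be illustrated with the standard library's datetime in place of pandas. The sketch below is an analogue and not the OpenGHG code: to_utc_aware is a hypothetical name, and treating a naive value as already being in UTC is an assumption.

```python
from datetime import datetime, timezone


def to_utc_aware(ts: datetime) -> datetime:
    """Hypothetical helper: return a UTC-aware copy of a datetime."""
    if ts.tzinfo is None:
        # Assumption: a naive value already represents UTC, so only attach the zone
        return ts.replace(tzinfo=timezone.utc)
    # An aware value is converted to the equivalent instant in UTC
    return ts.astimezone(timezone.utc)


print(to_utc_aware(datetime(2019, 1, 1)))  # 2019-01-01 00:00:00+00:00
```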

openghg.util.timestamp_now()[source]#

Returns a pandas timezone (UTC) aware Timestamp for the current time.

Returns:

Timestamp at current time

Return type:

pandas.Timestamp

openghg.util.timestamp_epoch()[source]#

Returns the UNIX epoch time, 1st of January 1970

Returns:

Timestamp object at epoch

Return type:

pandas.Timestamp

openghg.util.daterange_from_str(daterange_str, freq='D')[source]#

Get a Pandas DatetimeIndex from a string. The created DatetimeIndex has minute frequency.

Parameters:
  • daterange_str (str) – Daterange string of the form 2019-01-01T00:00:00_2019-12-31T00:00:00

Returns:

DatetimeIndex covering daterange

Return type:

pandas.DatetimeIndex
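
A daterange string of this kind joins a start and an end timestamp with an underscore, for example 2019-01-01T00:00:00_2019-12-31T00:00:00. A minimal sketch of splitting such a string into its two endpoints with the standard library; split_daterange_str is a hypothetical helper, and building the DatetimeIndex itself is left to pandas.

```python
from datetime import datetime
from typing import Tuple


def split_daterange_str(daterange_str: str) -> Tuple[datetime, datetime]:
    """Hypothetical helper: parse 'start_end' into two datetimes."""
    # The underscore separates the two ISO 8601 timestamps
    start_str, end_str = daterange_str.split("_")
    return datetime.fromisoformat(start_str), datetime.fromisoformat(end_str)


start, end = split_daterange_str("2019-01-01T00:00:00_2019-12-31T00:00:00")
print(start, end)  # 2019-01-01 00:00:00 2019-12-31 00:00:00
```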

openghg.util.daterange_to_str(daterange)[source]#

Takes a pandas DatetimeIndex created by pandas date_range and converts it to a string of the form 2019-01-01-00:00:00_2019-03-16-00:00:00

Parameters:

daterange (pandas.DatetimeIndex)

Returns:

Daterange in string format

Return type:

str
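
The output form 2019-01-01-00:00:00_2019-03-16-00:00:00 can be reproduced from a start and an end value with strftime. This sketch uses standard library datetimes in place of a DatetimeIndex; format_daterange and the format string are assumptions inferred from the example string above.

```python
from datetime import datetime


def format_daterange(start: datetime, end: datetime) -> str:
    """Hypothetical helper: build a 'start_end' daterange string."""
    # Date and time parts are joined by a hyphen, the two endpoints by an underscore
    fmt = "%Y-%m-%d-%H:%M:%S"
    return f"{start.strftime(fmt)}_{end.strftime(fmt)}"


print(format_daterange(datetime(2019, 1, 1), datetime(2019, 3, 16)))
# 2019-01-01-00:00:00_2019-03-16-00:00:00
```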

openghg.util.create_daterange_str(start, end)[source]#

Convert the passed datetimes into a daterange string for use in searches and Datasource interactions

Parameters:
  • start – Start date

  • end – End date

Returns:

Daterange string

Return type:

str

openghg.util.create_daterange(start, end, freq='D')[source]#

Create a minute aligned daterange

Parameters:
  • start (Timestamp) – Start date

  • end (Timestamp) – End date

Return type:

DatetimeIndex

Returns:

pandas.DatetimeIndex

openghg.util.check_nan(data)[source]#

Check if a number is NaN.

Returns a string that can be JSON serialised.

Parameters:

data (Union[int, float]) – Number

Returns:

NA if not a number, otherwise the number

Return type:

str, float, int
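
A NaN check matters for JSON export because NaN is not a valid JSON value. A minimal sketch of the substitution using math.isnan; nan_to_na is a hypothetical name and the real check_nan may be implemented differently.

```python
import json
import math


def nan_to_na(value):
    """Hypothetical helper: replace NaN with the string 'NA'."""
    if math.isnan(value):
        return "NA"
    # Ordinary ints and floats are returned unchanged
    return value


values = [float("nan"), 1.5]
print(json.dumps([nan_to_na(v) for v in values]))  # ["NA", 1.5]
```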

openghg.util.check_date(date)[source]#

Checks if a date string can be converted to a pd.Timestamp and returns NA if not.

Returns a string that can be JSON serialised.

Parameters:

date (str) – String to test

Returns:

Returns NA if not a date, otherwise date string

Return type:

str

Site Checks#

These perform checks to ensure data processed for each site is correct

openghg.util.verify_site(site)[source]#

Checks if the passed site is a valid one and returns the three letter site code if found. Otherwise fuzzy text matching is used to suggest sites with similar names.

Parameters:

site (str) – Three letter site code or site name

Returns:

Verified three letter site code if valid site

Return type:

str
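
The look-up-then-suggest pattern described above can be sketched with the standard library's difflib. Everything specific here is an assumption: the SITES table holds placeholder codes and names (not real OpenGHG site data), suggest_sites is a hypothetical helper, and difflib stands in for whichever fuzzy matching library OpenGHG uses.

```python
import difflib

# Placeholder site table for illustration; not real OpenGHG site data
SITES = {"AAA": "alpha tower", "BBB": "bravo ridge", "CCC": "charlie point"}


def suggest_sites(query, n=3):
    """Hypothetical helper: return (site_code, suggestions)."""
    query = query.strip()
    if query.upper() in SITES:
        # Exact match on the three letter code
        return query.upper(), []
    by_name = {name: code for code, name in SITES.items()}
    if query.lower() in by_name:
        # Exact match on the site name
        return by_name[query.lower()], []
    # No exact match: offer names that are textually close to the query
    matches = difflib.get_close_matches(query.lower(), list(by_name), n=n, cutoff=0.6)
    return None, matches


print(suggest_sites("aaa"))         # ('AAA', [])
print(suggest_sites("alpha towr"))  # (None, ['alpha tower'])
```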

openghg.util.multiple_inlets(site)[source]#

Check if the passed site has more than one inlet

Parameters:

site (str) – Three letter site code

Returns:

True if multiple inlets

Return type:

bool

Domain#

openghg.util.find_domain(domain, domain_filepath=None)[source]#

Finds the latitude and longitude values in degrees associated with a given domain name.

Parameters:
  • domain (str) – Pre-defined domain name

  • domain_filepath (Union[str, Path, None]) – Alternative domain info file. Defaults to openghg_defs input.

Returns:

Latitude and longitude values for the domain in degrees.

Return type:

array, array

Inlet#

openghg.util.format_inlet(inlet, units='m', key_name=None, special_keywords=None)[source]#

Make sure inlet / height name conforms to standard. The standard imposed can depend on the associated key_name itself (can be supplied as an option to check).

This standard is as follows:
  • number followed by unit

  • number alone if unit / derivative is specified at the end of key_name (e.g. station_height_masl)

  • unchanged if this is one of the special keywords (by default “multiple” or “various”)

Other considerations:
  • For units of “m”, we will also look for “magl” and “masl” (metres above ground and sea level)

  • If the input string just contains numbers, it is assumed this is already within the correct unit.

Parameters:
  • inlet (Union[str, slice, None, list[Union[str, slice, None]]]) – Inlet / Height value in the specified units

  • units (str) – Units for the inlet value (“m” by default)

  • key_name (Optional[str]) – Name of the associated key. This is optional but will be used to determine whether the unit value should be added to the output string.

  • special_keywords (Optional[list]) – Special keywords that inlet could be set to. If inlet matches one of these, no formatting is applied. If this is not set, the special keywords “multiple” and “column” will still be allowed.

Return type:

Union[str, slice, None, list[Union[str, slice, None]]]

Returns:

same type as input, with all strings formatted

Usage:
>>> format_inlet("10")
    "10m"
>>> format_inlet("10m")
    "10m"
>>> format_inlet("10magl")
    "10m"
>>> format_inlet("10.111")
    "10.1m"
>>> format_inlet(["10", 100])
    ["10m", "100m"]
>>> format_inlet("multiple")
    "multiple"
>>> format_inlet("10m", key_name="inlet")
    "10m"
>>> format_inlet("10m", key_name="inlet_magl")
    "10"
>>> format_inlet("10m", key_name="station_height_masl")
    "10"