Exporting#

These are used to export data to a format readable by the OpenGHG data dashboard.

openghg.util.to_dashboard(data, selected_vars, downsample_n=3, filename=None)[source]#

Takes a Dataset produced by OpenGHG and outputs it into a JSON format readable by the OpenGHG dashboard or a related project.

This also exports a separate file with the locations of the sites for use with the map selector component.

Note - this function does not currently support export of data from multiple inlets.

Parameters
  • data (Union[ObsData, List[ObsData]]) – Dictionary of retrieved data

  • selected_vars (List) – The variables to export

  • downsample_n (int) – Take every nth value from the data

  • filename (Optional[str]) – filename to write output to

Return type

Optional[Dict]

Returns

None
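As a rough illustration of what downsample_n means, the sketch below keeps every nth value of a series and serialises the result to JSON. It works on plain lists rather than an OpenGHG Dataset. The helper name downsample, the record layout and the measurement values are all made up for this example and are not part of the OpenGHG API. It also assumes that the first value is always kept.

```python
import json


def downsample(values, n=3):
    """Keep every nth value, starting with the first (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return values[::n]


# Made-up measurements, for illustration only
times = ["2021-01-01T00:00", "2021-01-01T01:00", "2021-01-01T02:00",
         "2021-01-01T03:00", "2021-01-01T04:00", "2021-01-01T05:00",
         "2021-01-01T06:00"]
ch4 = [1900.1, 1901.4, 1899.8, 1902.0, 1903.3, 1901.9, 1900.5]

# Downsample both series with the same stride so they stay aligned
record = {"time": downsample(times), "ch4": downsample(ch4)}
as_json = json.dumps(record)
```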

openghg.util.to_dashboard_mobile(data, filename=None)[source]#

Export the Glasgow LICOR data to JSON for the dashboard

Parameters
  • data (Dict) – Data dictionary

  • filename (Union[str, Path, None]) – Filename for export of JSON

Returns

Dictionary if no filename is given

Return type

dict or None

Hashing#

These handle hashing of data (usually with SHA1)

openghg.util.hash_file(filepath)[source]#

Opens the file at filepath and calculates its SHA1 hash

Taken from https://stackoverflow.com/a/22058673

Parameters

filepath (pathlib.Path) – Path to file

Returns

SHA1 hash

Return type

str
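A minimal sketch of hashing a file with SHA1 using the standard library's hashlib, reading in chunks so large files do not have to fit in memory. The name sha1_of_file and the chunk size are choices made for this example. This is not necessarily how hash_file is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(filepath, chunk_size=65536):
    """Return the hex SHA1 digest of a file, read in fixed-size chunks."""
    sha1 = hashlib.sha1()
    with open(filepath, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


# Usage: hash a small temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_bytes(b"abc")
    digest = sha1_of_file(path)
```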

openghg.util.hash_string(to_hash)[source]#

Return the SHA-1 hash of a string

Parameters

to_hash (str) – String to hash

Returns

SHA1 hash of string

Return type

str
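The same idea for strings, as a sketch. A string has to be encoded to bytes before hashing. The use of UTF-8 here is an assumption, and sha1_of_string is a name chosen for this example.

```python
import hashlib


def sha1_of_string(to_hash):
    """Return the hex SHA1 digest of a string (encoded as UTF-8)."""
    return hashlib.sha1(to_hash.encode("utf-8")).hexdigest()


digest = sha1_of_string("abc")
```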

String manipulation#

String cleaning and formatting functions

openghg.util.clean_string(to_clean: str) → str[source]#
openghg.util.clean_string(to_clean: None) → None

Returns a lowercase string with only alphanumeric characters and underscores.

Parameters

to_clean (Optional[str]) – String to clean

Returns

Clean string

Return type

str or None
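One way to get the behaviour described above, as a sketch using a regular expression. The name clean is chosen for this example. The sketch assumes that disallowed characters, including whitespace, are dropped rather than replaced.

```python
import re


def clean(to_clean):
    """Lowercase a string and keep only a-z, 0-9 and underscores."""
    if to_clean is None:
        return None
    # Lowercase first, then drop anything outside a-z, 0-9 and underscore
    return re.sub(r"[^a-z0-9_]", "", to_clean.lower())
```

For example, clean("CO2_Flux-v1") gives "co2_fluxv1" and clean(None) gives None.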

openghg.util.to_lowercase(d: Dict, skip_keys: Optional[List] = None) → Dict[source]#
openghg.util.to_lowercase(d: List, skip_keys: Optional[List] = None) → List
openghg.util.to_lowercase(d: Tuple, skip_keys: Optional[List] = None) → Tuple
openghg.util.to_lowercase(d: Set, skip_keys: Optional[List] = None) → Set
openghg.util.to_lowercase(d: str, skip_keys: Optional[List] = None) → str

Convert an object to lowercase. All keys and values in a dictionary will be converted to lowercase as will all objects in a list, tuple or set. You can optionally pass in a list of keys to skip when lowercasing a dictionary.

Based on the answer https://stackoverflow.com/a/40789531/1303032

Parameters
  • d (Union[Dict, List, Tuple, Set, str]) – Object to lower case

  • skip_keys (Optional[List]) – List of keys to skip when lowercasing.

Returns

Dictionary of lower case keys and values

Return type

dict
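A recursive sketch of the lowercasing described above. The name lowercase is chosen for this example. It is an assumption that a skipped key keeps both its key and its value unchanged, and that non-string objects are returned as they are.

```python
def lowercase(obj, skip_keys=None):
    """Recursively lowercase strings inside dicts, lists, tuples and sets."""
    skip = set(skip_keys or [])
    if isinstance(obj, dict):
        result = {}
        for key, value in obj.items():
            if key in skip:
                # Assumption: skipped entries are copied over unchanged
                result[key] = value
            else:
                new_key = key.lower() if isinstance(key, str) else key
                result[new_key] = lowercase(value, skip_keys)
        return result
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same container type from the lowercased items
        return type(obj)(lowercase(item, skip_keys) for item in obj)
    if isinstance(obj, str):
        return obj.lower()
    return obj


metadata = {"Site": "BSD", "Inlets": ["108M", "248M"], "Species": "CH4"}
lowered = lowercase(metadata, skip_keys=["Species"])
```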

openghg.util.remove_punctuation(s)[source]#

Removes punctuation and converts the passed string to lowercase

Parameters

s (str) – String to convert

Returns

Unpunctuated, lowercased string

Return type

str
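A sketch of one possible approach, using a regular expression that keeps word characters and whitespace. The name strip_punctuation is chosen for this example. Note that this pattern keeps underscores, and it is not confirmed that remove_punctuation does the same.

```python
import re


def strip_punctuation(s):
    """Remove punctuation from a string and lowercase it."""
    # \w matches letters, digits and underscore; \s matches whitespace
    return re.sub(r"[^\w\s]", "", s).lower()
```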

Time#

Helpers to deal with all things datetime.

openghg.util.timestamp_tzaware(timestamp)[source]#

Returns the pandas Timestamp passed as a timezone (UTC) aware Timestamp.

Parameters

timestamp (pandas.Timestamp) – Timezone naive Timestamp

Returns

Timezone aware Timestamp

Return type

pandas.Timestamp
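The idea can be sketched with the standard library's datetime, which pandas.Timestamp subclasses. The function documented above works on pandas Timestamps, so this is an analogy rather than its implementation. The name to_utc is chosen for this example. It is an assumption that a naive value is interpreted as already being in UTC and that an aware value is converted to UTC.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return a timezone (UTC) aware copy of a datetime."""
    if dt.tzinfo is None:
        # Naive: assume the value is already in UTC and attach the timezone
        return dt.replace(tzinfo=timezone.utc)
    # Aware: convert the same instant to UTC
    return dt.astimezone(timezone.utc)


naive = datetime(2020, 1, 1, 12, 0)
aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
```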

openghg.util.timestamp_now()[source]#

Returns a pandas timezone (UTC) aware Timestamp for the current time.

Returns

Timestamp at current time

Return type

pandas.Timestamp

openghg.util.timestamp_epoch()[source]#

Returns the UNIX epoch time (1st of January 1970)

Returns

Timestamp object at epoch

Return type

pandas.Timestamp

openghg.util.daterange_from_str(daterange_str, freq='D')[source]#

Get a Pandas DatetimeIndex from a string. The created DatetimeIndex has minute frequency.

Parameters
  • daterange_str (str) – Daterange string of the form 2019-01-01T00:00:00_2019-12-31T00:00:00

Returns

DatetimeIndex covering daterange

Return type

pandas.DatetimeIndex

openghg.util.daterange_to_str(daterange)[source]#

Takes a pandas DatetimeIndex created by pandas date_range and converts it to a string of the form 2019-01-01-00:00:00_2019-03-16-00:00:00

Parameters

daterange (pandas.DatetimeIndex) –

Returns

Daterange in string format

Return type

str

openghg.util.create_daterange_str(start, end)[source]#

Convert the passed datetimes into a daterange string for use in searches and Datasource interactions

Parameters
  • start – Start date

  • end – End date

Returns

Daterange string

Return type

str

openghg.util.create_daterange(start, end, freq='D')[source]#

Create a minute aligned daterange

Parameters
  • start (Timestamp) – Start date

  • end (Timestamp) – End date

Return type

DatetimeIndex

Returns

pandas.DatetimeIndex

openghg.util.daterange_overlap(daterange_a, daterange_b)[source]#

Check if daterange_a is within daterange_b.

Parameters
  • daterange_a (str) – Timezone aware daterange string. Example: 2014-01-30-10:52:30+00:00_2014-01-30-13:22:30+00:00

  • daterange_b (str) – As daterange_a

Returns

True if daterange included

Return type

bool
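A sketch of a standard interval overlap test on daterange strings of the form shown above, using only the standard library. The names parse_daterange and ranges_overlap are chosen for this example. It is an assumption that this is the comparison the function makes and that ranges sharing only an endpoint count as overlapping.

```python
from datetime import datetime

# Matches timestamps such as 2014-01-30-10:52:30+00:00
FMT = "%Y-%m-%d-%H:%M:%S%z"


def parse_daterange(daterange):
    """Split 'start_end' into two timezone aware datetimes."""
    start, end = daterange.split("_")
    return datetime.strptime(start, FMT), datetime.strptime(end, FMT)


def ranges_overlap(daterange_a, daterange_b):
    """Return True if the two dateranges share at least one instant."""
    start_a, end_a = parse_daterange(daterange_a)
    start_b, end_b = parse_daterange(daterange_b)
    # Two closed intervals overlap when each starts before the other ends
    return start_a <= end_b and start_b <= end_a


a = "2014-01-30-10:52:30+00:00_2014-01-30-13:22:30+00:00"
b = "2014-01-30-12:00:00+00:00_2014-01-31-00:00:00+00:00"
c = "2015-01-01-00:00:00+00:00_2015-01-02-00:00:00+00:00"
```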

openghg.util.combine_dateranges(dateranges)[source]#

Combine dateranges

Parameters

dateranges (List[str]) – Daterange strings

Returns

List of combined dateranges

Return type

list

Modified from https://codereview.stackexchange.com/a/69249
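Combining dateranges is commonly done by sorting the intervals and merging each one into its predecessor when they overlap. The sketch below shows that technique on date-only strings using the standard library. The names parse and merge_dateranges and the output format are choices made for this example, not necessarily those of combine_dateranges.

```python
from datetime import date


def parse(daterange):
    """Split 'YYYY-MM-DD_YYYY-MM-DD' into a (start, end) pair of dates."""
    start, end = daterange.split("_")
    return date.fromisoformat(start), date.fromisoformat(end)


def merge_dateranges(dateranges):
    """Merge overlapping dateranges and return them in sorted order."""
    intervals = sorted(parse(d) for d in dateranges)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend its end if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [f"{s.isoformat()}_{e.isoformat()}" for s, e in merged]


combined = merge_dateranges([
    "2016-01-01_2016-02-01",
    "2014-01-01_2014-06-01",
    "2014-03-01_2014-09-01",
])
```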

openghg.util.split_daterange_str(daterange_str, date_only=False)[source]#

Split a daterange string to the component start and end Timestamps

Parameters
  • daterange_str (str) – Daterange string of the form 2019-01-01T00:00:00_2019-12-31T00:00:00

  • date_only (bool) – Return only the date portion of the Timestamp, removing the hours / seconds component

Returns

Tuple of start, end timestamps / dates

Return type

tuple (Timestamp / datetime.date, Timestamp / datetime.date)
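A sketch of the splitting step using the standard library. The function documented above returns pandas Timestamps, whereas this example returns datetime objects. The name split_daterange is chosen for this example.

```python
from datetime import date, datetime


def split_daterange(daterange_str, date_only=False):
    """Split 'start_end' into its start and end components."""
    start_str, end_str = daterange_str.split("_")
    start = datetime.fromisoformat(start_str)
    end = datetime.fromisoformat(end_str)
    if date_only:
        # Drop the time of day and keep only the calendar date
        return start.date(), end.date()
    return start, end


daterange = "2019-01-01T00:00:00_2019-12-31T00:00:00"
start, end = split_daterange(daterange)
start_date, end_date = split_daterange(daterange, date_only=True)
```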

openghg.util.closest_daterange(to_compare, dateranges)[source]#

Finds the closest daterange in a list of dateranges

Parameters
  • to_compare (str) – Daterange (as a string) to compare

  • dateranges (Union[str, List[str]]) – List of dateranges

Returns

Daterange from dateranges that’s the closest in time to to_compare

Return type

str

openghg.util.valid_daterange(daterange)[source]#

Check if the passed daterange is valid

Parameters

daterange (str) – Daterange string

Returns

True if valid

Return type

bool

openghg.util.find_daterange_gaps(start_search, end_search, dateranges)[source]#

Given a start and end date and a list of dateranges find the gaps.

For example given a list of dateranges

example = ['2014-09-02_2014-11-01', '2016-09-02_2018-11-01']

start = timestamp_tzaware("2012-01-01")
end = timestamp_tzaware("2019-09-01")

gaps = find_daterange_gaps(start, end, example)

gaps == ['2012-01-01-00:00:00+00:00_2014-09-01-00:00:00+00:00',
         '2014-11-02-00:00:00+00:00_2016-09-01-00:00:00+00:00',
         '2018-11-02-00:00:00+00:00_2019-09-01-00:00:00+00:00']

Parameters
  • start_search (Timestamp) – Start timestamp

  • end_search (Timestamp) – End timestamp

  • dateranges (List[str]) – List of daterange strings

Returns

List of dateranges

Return type

list
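The gap search can be sketched as a single sweep over the sorted dateranges. The version below uses standard library dates instead of pandas Timestamps, so its output strings carry no time or timezone part. The one-day offset between a daterange and the neighbouring gap is taken from the example above. The name find_gaps is chosen for this example.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def find_gaps(start_search, end_search, dateranges):
    """Return the dateranges between start and end not covered by the input."""
    intervals = sorted(
        tuple(date.fromisoformat(part) for part in d.split("_"))
        for d in dateranges
    )
    gaps = []
    cursor = start_search  # first day not yet known to be covered
    for start, end in intervals:
        if cursor > end_search:
            break
        if start > cursor:
            # Uncovered days run from the cursor to the day before this range
            gaps.append((cursor, min(start - ONE_DAY, end_search)))
        cursor = max(cursor, end + ONE_DAY)
    if cursor <= end_search:
        gaps.append((cursor, end_search))
    return [f"{a.isoformat()}_{b.isoformat()}" for a, b in gaps]


example = ["2014-09-02_2014-11-01", "2016-09-02_2018-11-01"]
gaps = find_gaps(date(2012, 1, 1), date(2019, 9, 1), example)
```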

openghg.util.trim_daterange(to_trim, overlapping)[source]#

Removes overlapping dates from to_trim

Parameters
  • to_trim – Daterange to trim down. Dates that overlap with overlapping are removed from to_trim

  • overlapping – Daterange containing dates we want to trim from to_trim

Returns

Trimmed daterange

Return type

str

openghg.util.split_encompassed_daterange(container, contained)[source]#

Checks whether one of the passed dateranges contains the other. If so, the larger daterange is split into three sections.

       <---a--->
<----------b---------->

Here b is split into three and we end up with:

<-dr1-><---a---><-dr2->

Parameters
  • daterange_a – Daterange

  • daterange_b – Daterange

Returns

Dictionary of results

Return type

dict

openghg.util.daterange_contains(container, contained)[source]#

Check if the daterange container contains the daterange contained

Parameters
  • container (str) – Daterange

  • contained (str) – Daterange

Return type

bool

Returns

bool

openghg.util.sanitise_daterange(daterange)[source]#

Make sure the daterange is correct and return tzaware daterange.

Parameters

daterange (str) – Daterange str

Returns

Timezone aware daterange str

Return type

str

openghg.util.check_nan(data)[source]#

Check if a number is NaN.

Returns a string that can be JSON serialised.

Parameters

data (Union[int, float]) – Number

Returns

NA if not a number, otherwise the number

Return type

str, float, int
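A sketch of why such a check is useful. Python's json module writes a float NaN as the bare token NaN, which is not valid JSON, so replacing it with a string first keeps the output portable. The name nan_to_na is chosen for this example.

```python
import json
import math


def nan_to_na(value):
    """Return 'NA' for a float NaN, otherwise the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return "NA"
    return value


row = [1.5, float("nan"), 3]
cleaned = [nan_to_na(v) for v in row]
as_json = json.dumps(cleaned)
```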

openghg.util.check_date(date)[source]#

Checks if a date string can be converted to a pd.Timestamp and returns NA if not.

Returns a string that can be JSON serialised.

Parameters

date (str) – String to test

Returns

Returns NA if not a date, otherwise date string

Return type

str

Iteration#

Our own personal itertools

openghg.util.pairwise(iterable)[source]#

Return a zip of an iterable where a is the iterable and b is the iterable advanced one step.

Parameters

iterable (Iterable) – Any iterable type

Returns

Tuple of iterables

Return type

tuple
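The description above matches the well-known pairwise recipe from the itertools documentation, sketched here. Whether openghg.util.pairwise is implemented exactly this way is not confirmed.

```python
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs: (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one step
    return zip(a, b)


pairs = list(pairwise([1, 2, 3, 4]))
```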

openghg.util.unanimous(seq)[source]#

Checks that all values in an iterable object are the same

Parameters

seq (Dict) – Iterable object

Returns

True if all values are the same

Return type

bool
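A sketch of such a check for a dictionary, comparing every value against the first. The name all_values_equal is chosen for this example. Treating an empty dictionary as unanimous is an assumption.

```python
def all_values_equal(seq):
    """Return True if every value in the dictionary is the same."""
    values = iter(seq.values())
    try:
        first = next(values)
    except StopIteration:
        return True  # assumption: an empty dictionary counts as unanimous
    return all(value == first for value in values)
```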

Site Checks#

These perform checks to ensure data processed for each site is correct

openghg.util.verify_site(site)[source]#

Checks if the passed site is a valid one and returns the three letter site code if found. Otherwise, fuzzy text matching is used to suggest sites with similar names.

Parameters

site (str) – Three letter site code or site name

Returns

Verified three letter site code if valid site

Return type

str
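Fuzzy suggestions of this kind can be sketched with difflib from the standard library. The matching library that verify_site uses is not stated here, so difflib is a stand-in. The SITES table and the name suggest_sites are hypothetical and only serve this example.

```python
from difflib import get_close_matches

# Hypothetical site table for illustration; not OpenGHG's real site data
SITES = {"BSD": "Bilsdale", "MHD": "Mace Head", "TAC": "Tacolneston"}


def suggest_sites(name, n=3, cutoff=0.6):
    """Return known site names that look similar to the given name."""
    # Compare in lowercase but return the names with their original casing
    lookup = {full.lower(): full for full in SITES.values()}
    matches = get_close_matches(name.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[m] for m in matches]


suggestions = suggest_sites("Bilsdal")
```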

openghg.util.multiple_inlets(site)[source]#

Check if the passed site has more than one inlet

Parameters

site (str) – Three letter site code

Returns

True if multiple inlets

Return type

bool

Cloud#

These handle checks on cloud based functionality

openghg.util.running_in_cloud()[source]#

Are we running in the cloud?

Checks for the OPENGHG_CLOUD environment variable being set

Returns

True if running in cloud

Return type

bool
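A sketch of an environment variable check of this kind. The name in_cloud and the optional environ argument are choices made for this example. It is an assumption that any value, including an empty one, counts as the variable being set.

```python
import os


def in_cloud(environ=None):
    """Return True if OPENGHG_CLOUD is present in the environment."""
    env = os.environ if environ is None else environ
    # Assumption: any value, even an empty one, counts as "set"
    return "OPENGHG_CLOUD" in env
```

Passing a dictionary as environ makes the check easy to test without changing the real environment.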

Errors#

Customised errors for OpenGHG

InvalidSiteError

UnknownDataError

FunctionError