Util#
Helper functions used throughout OpenGHG, from file hashing to timestamp handling.
Domain#
- openghg.util.find_domain(domain, domain_filepath=None)[source]#
- Finds the latitude and longitude values in degrees associated with a given domain name. - Parameters:
- domain ( - str) – Pre-defined domain name
- domain_filepath ( - Union[- str,- Path,- None]) – Alternative domain info file. Defaults to openghg_defs input.
 
- Returns:
- Latitude and longitude values for the domain in degrees. 
- Return type:
- array, array 
 
Downloading data#
- openghg.util.download_data(url, filepath=None, timeout=10)[source]#
- Download data file, with progress bar. - Based on https://stackoverflow.com/a/63831344/1303032 - Parameters:
- url ( - str) – URL of content to download
- filepath ( - str|- Path|- None) – Filepath to write out data
- timeout – Timeout for HTTP request (seconds) 
 
- Returns:
- Bytes if no filepath given 
- Return type:
- bytes / None 
 
File handling, compression#
- openghg.util.compress(data)[source]#
- Compress the given data - Parameters:
- data ( - bytes) – Binary data
- Returns:
- Compressed data 
- Return type:
- bytes 
 
- openghg.util.compress_json(data)[source]#
- Convert object to JSON string and compress - Parameters:
- data ( - Any) – Object to pass to json.dumps
- Returns:
- Compressed binary data 
- Return type:
- bytes 
 
- openghg.util.compress_str(s)[source]#
- Compress a string - Parameters:
- s ( - str) – String
- Returns:
- Compressed data 
- Return type:
- bytes 
 
- openghg.util.decompress(data)[source]#
- Decompress the given data - Parameters:
- data ( - bytes) – Compressed data
- Returns:
- Decompressed data 
- Return type:
- bytes 
 
- openghg.util.decompress_json(data)[source]#
- Decompress a string and load to JSON - Parameters:
- data ( - bytes) – Compressed binary data
- Return type:
- Any
- Returns:
- Object loaded from JSON 
 
- openghg.util.decompress_str(data)[source]#
- Decompress a string from bytes - Parameters:
- data ( - bytes) – Compressed data
- Returns:
- Decompressed str 
- Return type:
- str 
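The compress / decompress pairs above are intended to round-trip. The sketch below illustrates that idea for the JSON variants using only the standard library. It is not the OpenGHG implementation: the `bz2` codec, the UTF-8 encoding and the `*_sketch` function names are assumptions made for illustration, since this page does not state which codec is used.

```python
import bz2
import json
from typing import Any


def compress_json_sketch(data: Any) -> bytes:
    """Serialise an object to a JSON string, then compress its UTF-8 bytes."""
    # Assumption: bz2 stands in for whatever codec OpenGHG actually uses
    return bz2.compress(json.dumps(data).encode("utf-8"))


def decompress_json_sketch(data: bytes) -> Any:
    """Reverse of compress_json_sketch: decompress, decode, parse the JSON."""
    return json.loads(bz2.decompress(data).decode("utf-8"))


# Illustrative record; the keys and values are made up
record = {"site": "tac", "inlet": "100m", "values": [1.5, 2.5]}
packed = compress_json_sketch(record)
restored = decompress_json_sketch(packed)
```

Only JSON-serialisable objects survive the round trip unchanged; tuples, for example, come back as lists.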
 
- openghg.util.get_datapath(filename, directory=None)[source]#
- Returns the correct path to data files used for assigning attributes - Parameters:
- filename ( - Union[- str,- Path]) – Name of file to be accessed
- Returns:
- Path of file 
- Return type:
- pathlib.Path 
 
- openghg.util.get_logfile_path()[source]#
- Get the logfile path - Returns:
- Path to logfile 
- Return type:
- Path 
 
- openghg.util.load_json(path)[source]#
- Returns a dictionary deserialised from JSON. - Parameters:
- path ( - str|- Path) – Path to file, can be any filepath
- Returns:
- Dictionary created from JSON 
- Return type:
- dict 
 
- openghg.util.read_header(filepath, comment_char='#')[source]#
- Reads the header lines denoted by the comment_char - Parameters:
- filepath ( - Union[- str,- Path]) – Path to file
- comment_char ( - str) – Character that denotes a comment line at the start of a file
 
- Returns:
- List of lines in the header 
- Return type:
- list 
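As a rough illustration of what reading a comment-delimited header involves, the sketch below collects the leading lines that start with the comment character and stops at the first line that does not. The function name and the choice to keep trailing newlines are assumptions; the real `read_header` may differ in both respects.

```python
import tempfile
from pathlib import Path


def read_header_sketch(filepath, comment_char="#"):
    """Collect the leading lines of a file that start with comment_char."""
    header = []
    with open(filepath, "r") as f:
        for line in f:
            if not line.startswith(comment_char):
                break  # the first non-comment line ends the header
            header.append(line)
    return header


# Write a small hypothetical data file and read its header back
with tempfile.TemporaryDirectory() as tmpdir:
    example = Path(tmpdir) / "example.dat"
    example.write_text("# site: abc\n# units: ppb\n1.0,2.0\n# not part of the header\n")
    header_lines = read_header_sketch(example)
```

Comment lines that appear after the data has started are not treated as part of the header in this sketch.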
 
Hashing#
- openghg.util.hash_bytes(data)[source]#
- Calculate the SHA1 sum of some data - Parameters:
- data ( - bytes) – Binary data
- Returns:
- SHA1 hash 
- Return type:
- str 
 
- openghg.util.hash_file(filepath)[source]#
- Opens the file at filepath and calculates its SHA1 hash - Taken from https://stackoverflow.com/a/22058673 - Parameters:
- filepath (pathlib.Path) – Path to file 
- Returns:
- SHA1 hash 
- Return type:
- str 
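A minimal sketch of SHA1 hashing for bytes and for files, using the standard library `hashlib` module. The function names and the block size are assumptions for illustration; the point is that a file can be hashed in blocks without loading it into memory, and the result matches hashing the same bytes directly.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_bytes_sketch(data: bytes) -> str:
    """Return the SHA1 hex digest of some binary data."""
    return hashlib.sha1(data).hexdigest()


def hash_file_sketch(filepath: Path, blocksize: int = 65536) -> str:
    """Return the SHA1 hex digest of a file, reading it block by block."""
    sha1 = hashlib.sha1()
    with open(filepath, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b""
        for block in iter(lambda: f.read(blocksize), b""):
            sha1.update(block)
    return sha1.hexdigest()


payload = b"openghg" * 100_000
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "data.bin"
    path.write_bytes(payload)
    file_hash = hash_file_sketch(path)

bytes_hash = hash_bytes_sketch(payload)
```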
 
- openghg.util.hash_retrieved_data(to_hash)[source]#
- Hash data retrieved from a data platform. This calculates the SHA1 of the metadata and the start date, end date and the number of timestamps in the Dataset. - Parameters:
- to_hash ( - dict[- str,- dict]) – Dictionary to hash. We expect this to be a dictionary such as {species_key: {“data”: xr.Dataset, “metadata”: {…}}}
 
- Returns:
- Dictionary of hash: species_key 
- Return type:
- dict 
 
Measurement helpers#
- openghg.util.check_lifetime_monthly(lifetime)[source]#
- Check whether retrieved lifetime value represents monthly lifetimes. This checks whether lifetime is a list and contains 12 values. - Parameters:
- lifetime ( - Union[- str,- list[- str],- None]) – str or list representation of lifetime value
- Returns:
- True if lifetime matches criteria for monthly data, False otherwise 
- Return type:
- bool 
 - Raises ValueError:
- if lifetime is a list but does not contain exactly 12 entries, one for each month 
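The rule described above (a list means monthly values, and a list must have exactly 12 entries) can be sketched as follows. The function name and the example lifetime strings are made up for illustration and are not taken from OpenGHG.

```python
def check_lifetime_monthly_sketch(lifetime):
    """Return True for a 12-entry list, False for a str or None."""
    if isinstance(lifetime, list):
        if len(lifetime) != 12:
            raise ValueError(
                f"Expected 12 monthly lifetime values, got {len(lifetime)}"
            )
        return True
    return False


# Hypothetical lifetime values; the real string format is not shown on this page
monthly = check_lifetime_monthly_sketch(["1.0y"] * 12)
single = check_lifetime_monthly_sketch("12.4y")
missing = check_lifetime_monthly_sketch(None)
```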
 
- openghg.util.format_inlet(inlet, units='m', key_name=None, special_keywords=None)[source]#
- Make sure inlet / height name conforms to standard. The standard imposed can depend on the associated key_name itself (can be supplied as an option to check). - This standard is as follows:
- number followed by unit 
- number alone if unit / derivative is specified at the end of key_name (e.g. station_height_masl) 
- unchanged if this is one of the special keywords (by default “multiple” or “various”) 
 
- Other considerations:
- For units of “m”, we will also look for “magl” and “masl” (metres above ground and sea level) 
- If the input string just contains numbers, it is assumed this is already within the correct unit. 
 
 - Parameters:
- inlet ( - str|- slice|- None|- list[- str|- slice|- None]) – Inlet / Height value in the specified units
- units ( - str) – Units for the inlet value (“m” by default)
- key_name ( - str|- None) – Name of the associated key. This is optional but will be used to determine whether the unit value should be added to the output string.
- special_keywords ( - list|- None) – Special keywords that inlet could be set to. If inlet matches one of these, no formatting is applied. If this is not set, the special keywords “multiple” and “column” will still be allowed.
 
- Return type:
- str|- slice|- None|- list[- str|- slice|- None]
- Returns:
- same type as input, with all strings formatted 
 - Usage:
- >>> format_inlet("10")
"10m"
>>> format_inlet("10m")
"10m"
>>> format_inlet("10magl")
"10m"
>>> format_inlet("10.111")
"10.1m"
>>> format_inlet(["10", 100])
["10m", "100m"]
>>> format_inlet("multiple")
"multiple"
>>> format_inlet("10m", key_name="inlet")
"10m"
>>> format_inlet("10m", key_name="inlet_magl")
"10"
>>> format_inlet("10m", key_name="station_height_masl")
"10"
 
- openghg.util.find_matching_site(site_name, possible_sites)[source]#
- Try to find a similar name to site_name in possible_sites and return a suggestion or error string. - Parameters:
- site_name ( - str) – Name of site
- possible_sites – List of sites to check 
 
- Returns:
- Suggestion / error message 
- Return type:
- str 
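One way to produce a "did you mean" style suggestion is the standard library's `difflib.get_close_matches`. The sketch below is an assumption-laden stand-in: this page does not say which matching method OpenGHG uses, and the function name, cutoff and message wording are chosen here for illustration only.

```python
import difflib


def find_matching_site_sketch(site_name, possible_sites):
    """Return a suggestion string if a similar name exists, else an error string."""
    # cutoff=0.6 is difflib's default similarity threshold
    matches = difflib.get_close_matches(site_name, possible_sites, n=3, cutoff=0.6)
    if matches:
        return f"Did you mean {', '.join(matches)}?"
    return f"Unable to find a site similar to {site_name}."


# Illustrative list of site names
sites = ["Tacolneston", "Mace Head", "Bilsdale"]
suggestion = find_matching_site_sketch("Tacolnestan", sites)
no_match = find_matching_site_sketch("zzzz", sites)
```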
 
- openghg.util.multiple_inlets(site, site_filepath=None)[source]#
- Check if the passed site has more than one inlet - Parameters:
- site ( - str) – Three letter site code
- site_filepath ( - Union[- str,- Path,- None]) – Alternative site info file. Defaults to openghg_defs input.
 
- Returns:
- True if multiple inlets 
- Return type:
- bool 
 
- openghg.util.molar_mass(species, species_filepath=None)[source]#
- Extracts the molar mass of a species. - Parameters:
- species ( - str) – Species name
- species_filepath ( - Union[- str,- Path,- None]) – Alternative species info file. Defaults to openghg_defs input.
 
- Returns:
- Molar mass of species 
- Return type:
- float 
 
- openghg.util.species_lifetime(species, species_filepath=None)[source]#
- Find species lifetime. This can either be labelled as “lifetime” or “lifetime_monthly”. - Note: no species synonyms accepted yet - Parameters:
- species ( - str|- None) – Species name e.g. “ch4” or “co2”
- species_filepath ( - Union[- str,- Path,- None]) – Alternative species info file. Defaults to openghg_defs input.
 
- Returns:
- Extracted lifetime or None if no lifetime was present. 
- Return type:
- str / list / None 
 
- openghg.util.synonyms(species, lower=True, allow_new_species=True, species_filepath=None)[source]#
- Check to see if there are other names that we should be using for a particular input. E.g. If CFC-11 or CFC11 was input, go on to use cfc11. - Parameters:
- species ( - str) – Input string that you’re trying to match
- lower ( - bool) – Return all lower case
- allow_new_species ( - bool) – Return original value (may be lower case) if this (or a synonym) is not found in the database. If False, raise a ValueError.
- species_filepath ( - Union[- str,- Path,- None]) – Alternative species info file. Defaults to openghg_defs input.
 
- Returns:
- Matched species string 
- Return type:
- str 
 - TODO: Decide if we need to make this lower case or not. Included this here so this occurs in one place which can be linked to and changed if needed. 
- openghg.util.site_code_finder(site_name, site_filepath=None)[source]#
- Find the site code for a given site name. - Parameters:
- site_name ( - str) – Site long name
- site_filepath ( - Union[- str,- Path,- None]) – Alternative site info file. Defaults to openghg_defs input.
 
- Returns:
- Three letter site code if found 
- Return type:
- str or None 
 
- openghg.util.verify_site(site, site_filepath=None)[source]#
- Check if the passed site is a valid one and return the three letter site code if found. Otherwise, fuzzy text matching is used to suggest sites with similar names. - Parameters:
- site ( - str) – Three letter site code or site name
- site_filepath ( - Union[- str,- Path,- None]) – Alternative site info file. Defaults to openghg_defs input.
 
- Returns:
- Verified three letter site code if valid site 
- Return type:
- str 
 
String handling#
- openghg.util.clean_string(to_clean)[source]#
- Returns a lowercase string with only alphanumeric characters and underscores. - Parameters:
- to_clean ( - str|- None) – String to clean
- Returns:
- Clean string 
- Return type:
- str or None 
 
- openghg.util.is_number(s)[source]#
- Is it a number? - https://stackoverflow.com/q/354038 - Parameters:
- s ( - Any) – String which may be a number
- Return type:
- bool
- Returns:
- bool 
 
- openghg.util.remove_punctuation(s)[source]#
- Removes punctuation and converts the passed string to lowercase - Parameters:
- s ( - str) – String to convert
- Returns:
- Unpunctuated, lowercased string 
- Return type:
- str 
 
- openghg.util.to_lowercase(d, skip_keys=None)[source]#
- Convert an object to lowercase. All keys and values in a dictionary will be converted to lowercase as will all objects in a list, tuple or set. You can optionally pass in a list of keys to skip when lowercasing a dictionary. - Based on the answer https://stackoverflow.com/a/40789531/1303032 - Parameters:
- d ( - dict|- list|- tuple|- set|- str) – Object to lower case
- skip_keys ( - list|- None) – List of keys to skip when lowercasing.
 
- Returns:
- Dictionary of lower case keys and values 
- Return type:
- dict 
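Recursive lowercasing of nested containers can be sketched as below. This is not the OpenGHG code: the function name is hypothetical, and the treatment of `skip_keys` (leaving both the key and its value untouched) is an assumption, since the description above does not spell out exactly what is skipped.

```python
def to_lowercase_sketch(d, skip_keys=None):
    """Lowercase strings inside dicts, lists, tuples and sets, recursively."""
    skip_keys = skip_keys or []
    if isinstance(d, dict):
        result = {}
        for key, value in d.items():
            if key in skip_keys:
                # Assumption: skipped entries keep their original key and value
                result[key] = value
            else:
                new_key = key.lower() if isinstance(key, str) else key
                result[new_key] = to_lowercase_sketch(value, skip_keys)
        return result
    if isinstance(d, (list, tuple, set)):
        # Rebuild the same container type from the lowercased items
        return type(d)(to_lowercase_sketch(item, skip_keys) for item in d)
    if isinstance(d, str):
        return d.lower()
    return d  # numbers, None, etc. are returned unchanged


metadata = {"Site": "TAC", "Species": ["CH4", "CO2"], "Path": "/Data/File.NC"}
lowered = to_lowercase_sketch(metadata, skip_keys=["Path"])
```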
 
Dates and times#
- openghg.util.check_date(date)[source]#
- Check if a date string can be converted to a pd.Timestamp and returns NA if not. - Returns a string that can be JSON serialised. - Parameters:
- date ( - str) – String to test
- Returns:
- Returns NA if not a date, otherwise date string 
- Return type:
- str 
 
- openghg.util.check_nan(data)[source]#
- Check if a number is NaN. - Returns a string that can be JSON serialised. - Parameters:
- data ( - int|- float) – Number
- Returns:
- NA if the value is NaN, otherwise the number 
- Return type:
- str, float, int 
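The reason for replacing NaN before serialising is that Python's `json.dumps` writes it as the bare token `NaN`, which is not valid JSON. A minimal sketch of the substitution follows, with a hypothetical function name and assuming the literal string "NA" as the replacement.

```python
import json
import math


def check_nan_sketch(data):
    """Return "NA" for NaN input, otherwise return the number unchanged."""
    if math.isnan(data):
        return "NA"
    return data


readings = [1.5, float("nan"), 3]
# Replace NaN values so that the result is portable JSON
serialised = json.dumps([check_nan_sketch(x) for x in readings])
```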
 
- openghg.util.closest_daterange(to_compare, dateranges)[source]#
- Finds the closest daterange in a list of dateranges - Parameters:
- to_compare ( - str) – Daterange (as a string) to compare
- dateranges ( - str|- list[- str]) – List of dateranges
 
- Returns:
- Daterange from dateranges that’s the closest in time to to_compare 
- Return type:
- str 
 
- openghg.util.combine_dateranges(dateranges)[source]#
- Combine dateranges - Parameters:
- dateranges ( - list[- str]) – Daterange strings
- Returns:
- List of combined dateranges 
- Return type:
- list 
 - Modified from https://codereview.stackexchange.com/a/69249 
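Combining dateranges is essentially the classic interval-merging problem: sort by start, then extend the previous interval whenever the next one overlaps it. The sketch below works on "start_end" strings with ISO dates using only the standard library. The function name, the input format and the output format are assumptions and will differ from OpenGHG's timezone-aware strings.

```python
from datetime import datetime


def combine_dateranges_sketch(dateranges):
    """Merge overlapping "start_end" daterange strings."""
    # Parse each string into a (start, end) tuple and sort by start
    intervals = sorted(
        tuple(datetime.fromisoformat(part) for part in dr.split("_"))
        for dr in dateranges
    )
    combined = []
    for start, end in intervals:
        if combined and start <= combined[-1][1]:
            # Overlaps or touches the previous interval, so extend it
            combined[-1] = (combined[-1][0], max(combined[-1][1], end))
        else:
            combined.append((start, end))
    return [f"{s.isoformat()}_{e.isoformat()}" for s, e in combined]


merged = combine_dateranges_sketch(
    [
        "2014-01-01_2014-06-30",
        "2014-03-01_2014-12-31",
        "2016-01-01_2016-12-31",
    ]
)
```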
- openghg.util.create_daterange(start, end, freq='D')[source]#
- Create a minute aligned daterange - Parameters:
- start ( - Timestamp) – Start date
- end ( - Timestamp) – End date
 
- Return type:
- DatetimeIndex
- Returns:
- pandas.DatetimeIndex 
 
- openghg.util.create_daterange_str(start, end)[source]#
- Convert the passed datetimes into a daterange string for use in searches and Datasource interactions - Parameters:
- start – Start date 
- end – End date 
 
- Returns:
- Daterange string 
- Return type:
- str 
 
- openghg.util.create_frequency_str(value=None, unit=None, period=None, include_units=True)[source]#
- Create a suitable frequency string based on either a value and unit pair or a period value. The unit will be made singular if the value is 1. - Check time_offset_definition() for accepted input units. - Parameters:
- value ( - int|- float|- None) – Value and unit pair to use
- unit ( - str|- None) – Value and unit pair to use
- period ( - str|- tuple|- None) – Suitable input for period (see parse_period() for more details)
 
- Returns:
- Formatted string
- Examples:
>>> create_frequency_str(value=1, unit="hour")
"1 hour"
>>> create_frequency_str(period="3MS")
"3 months"
>>> create_frequency_str(period="yearly")
"1 year"
- Return type:
- str 
 
- openghg.util.daterange_contains(container, contained)[source]#
- Check if the daterange container contains the daterange contained - Parameters:
- container ( - str) – Daterange
- contained ( - str) – Daterange
 
- Return type:
- bool
- Returns:
- bool 
 
- openghg.util.daterange_from_str(daterange_str, freq='D')[source]#
- Get a Pandas DatetimeIndex from a string. The created DatetimeIndex has minute frequency. - Parameters:
- daterange_str (str) – Daterange string of the form 2019-01-01T00:00:00_2019-12-31T00:00:00 
 
- Returns:
- DatetimeIndex covering daterange 
- Return type:
- pandas.DatetimeIndex 
 
- openghg.util.daterange_overlap(daterange_a, daterange_b)[source]#
- Check if daterange_a is within daterange_b. - Parameters:
- daterange_a (str) – Timezone aware daterange string. Example: 2014-01-30-10:52:30+00:00_2014-01-30-13:22:30+00:00 
- daterange_b (str) – As daterange_a 
 
- Returns:
- True if daterange included 
- Return type:
- bool 
 
- openghg.util.daterange_to_str(daterange)[source]#
- Takes a pandas DatetimeIndex created by pandas date_range and converts it to a string of the form 2019-01-01-00:00:00_2019-03-16-00:00:00 - Parameters:
- daterange (pandas.DatetimeIndex) 
- Returns:
- Daterange in string format 
- Return type:
- str 
 
- openghg.util.find_daterange_gaps(start_search, end_search, dateranges)[source]#
- Given a start and end date and a list of dateranges find the gaps. - For example, given a list of dateranges:
example = ['2014-09-02_2014-11-01', '2016-09-02_2018-11-01']
start = timestamp_tzaware("2012-01-01")
end = timestamp_tzaware("2019-09-01")
gaps = find_daterange_gaps(start, end, example)
gaps == ['2012-01-01-00:00:00+00:00_2014-09-01-00:00:00+00:00',
         '2014-11-02-00:00:00+00:00_2016-09-01-00:00:00+00:00',
         '2018-11-02-00:00:00+00:00_2019-09-01-00:00:00+00:00']
 - Parameters:
- start_search ( - Timestamp) – Start timestamp
- end_search ( - Timestamp) – End timestamp
- dateranges ( - list[- str]) – List of daterange strings
 
- Returns:
- List of dateranges 
- Return type:
- list 
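The gap-finding logic in the example can be sketched at day resolution with the standard library: walk the sorted dateranges with a cursor marking the first uncovered day, and emit a gap whenever the next daterange starts after the cursor. The function name and the plain `date` inputs are assumptions, and the sketch assumes every daterange lies inside the search window; the real function works with timezone-aware Timestamps.

```python
from datetime import date, timedelta


def find_gaps_sketch(start_search, end_search, dateranges):
    """Return the "start_end" gaps between dateranges within a search window."""
    one_day = timedelta(days=1)
    intervals = sorted(
        tuple(date.fromisoformat(part) for part in dr.split("_"))
        for dr in dateranges
    )
    gaps = []
    cursor = start_search  # first day not yet known to be covered
    for start, end in intervals:
        if start > cursor:
            # The gap runs up to the day before this daterange starts
            gaps.append(f"{cursor.isoformat()}_{(start - one_day).isoformat()}")
        cursor = max(cursor, end + one_day)
    if cursor <= end_search:
        gaps.append(f"{cursor.isoformat()}_{end_search.isoformat()}")
    return gaps


example = ["2014-09-02_2014-11-01", "2016-09-02_2018-11-01"]
gaps = find_gaps_sketch(date(2012, 1, 1), date(2019, 9, 1), example)
```

With these inputs the sketch reproduces the dates of the three gaps listed in the example above, without the time and timezone components.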
 
- openghg.util.find_duplicate_timestamps(data)[source]#
- Check for duplicates - Parameters:
- data ( - Dataset|- DataFrame) – Data object to check. Should have a time attribute or index
- Returns:
- A list of duplicates 
- Return type:
- list 
 
- openghg.util.first_last_dates(keys)[source]#
- Find the first and last timestamp from a list of keys - Parameters:
- keys ( - list) – List of keys
- Returns:
- First and last timestamp 
- Return type:
- tuple 
 
- openghg.util.in_daterange(start_a, end_a, start_b, end_b)[source]#
- Check if two dateranges overlap. - Parameters:
- start_a, start_b – Start datetimes of the two dateranges 
- end_a, end_b – End datetimes of the two dateranges 
 
- Returns:
- True if overlap 
- Return type:
- bool 
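Two ranges overlap when each one starts before the other ends. The sketch below expresses that test with standard library datetimes. Treating the endpoints as inclusive is an assumption, as is the function name.

```python
from datetime import datetime


def in_daterange_sketch(start_a, end_a, start_b, end_b):
    """Return True if the range a overlaps the range b (inclusive endpoints)."""
    return start_a <= end_b and end_a >= start_b


overlapping = in_daterange_sketch(
    datetime(2020, 1, 1), datetime(2020, 6, 30),
    datetime(2020, 6, 1), datetime(2020, 12, 31),
)
disjoint = in_daterange_sketch(
    datetime(2020, 1, 1), datetime(2020, 1, 31),
    datetime(2020, 3, 1), datetime(2020, 3, 31),
)
```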
 
- openghg.util.parse_period(period)[source]#
- Parses period input and converts to a value, unit pair. - Check time_offset_definition() for accepted input units. - Parameters:
- period ( - str|- tuple) – Period of measurements. Should be one of:
- “yearly”, “monthly” 
- suitable pandas Offset Alias 
- tuple of (value, unit) as would be passed to pandas.Timedelta function 
 
- Returns:
- class containing value and associated time period (subclass of NamedTuple)
- Examples:
>>> parse_period("12H")
TimePeriod(12, "hours")
>>> parse_period("yearly")
TimePeriod(1, "years")
>>> parse_period("monthly")
TimePeriod(1, "months")
>>> parse_period((1, "minute"))
TimePeriod(1, "minutes")
- Return type:
- TimePeriod 
 
- openghg.util.relative_time_offset(value=None, unit=None, period=None)[source]#
- Create relative time offset based on inputs. This is based on the pandas DateOffset and Timedelta functions. - Check time_offset_definition() for accepted input units. - If the input is “years” or “months” a relative offset (DateOffset) will be created since these are variable units. For example: - “2013-01-01” + 1 year relative offset = “2014-01-01” 
- “2012-05-01” + 2 months relative offset = “2012-07-01” 
 - Otherwise the Timedelta function will be used. - Parameters:
- value ( - int|- float|- None) – Value and unit pair to use
- unit ( - str|- None) – Value and unit pair to use
- period ( - str|- tuple|- None) – Suitable input for period (see parse_period() for more details)
 
- Returns:
- Time offset object, appropriate for the period type 
- Return type:
- DateOffset/Timedelta 
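To see why months and years need a relative offset, compare calendar-aware month arithmetic with a fixed-length delta. The sketch below uses only the standard library in place of pandas; `add_months_sketch` is a hypothetical helper, and clamping the day to the end of a shorter month is a design choice of this sketch.

```python
import calendar
from datetime import date, timedelta


def add_months_sketch(d, months):
    """Add a number of calendar months to a date, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day, e.g. 31 January + 1 month lands on the last day of February
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


one_year_on = add_months_sketch(date(2013, 1, 1), 12)   # relative: same day next year
two_months_on = add_months_sketch(date(2012, 5, 1), 2)  # relative: same day, two months on
fixed_365 = date(2012, 1, 1) + timedelta(days=365)      # fixed length, 2012 is a leap year
```

The fixed 365-day delta lands on 31 December 2012 rather than 1 January 2013, which is the kind of drift a relative offset avoids.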
 
- openghg.util.sanitise_daterange(daterange)[source]#
- Make sure the daterange is correct and return tzaware daterange. - Parameters:
- daterange ( - str) – Daterange str
- Returns:
- Timezone aware daterange str 
- Return type:
- str 
 
- openghg.util.split_daterange_str(daterange_str, date_only=False)[source]#
- Split a daterange string to the component start and end Timestamps - Parameters:
- daterange_str ( - str) – Daterange string of the form 2019-01-01T00:00:00_2019-12-31T00:00:00
- date_only ( - bool) – Return only the date portion of the Timestamp, removing the hours / seconds component
 
- Returns:
- Tuple of start, end timestamps / dates 
- Return type:
- tuple (Timestamp / datetime.date, Timestamp / datetime.date) 
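Splitting a daterange string comes down to splitting on the underscore and parsing each half. The sketch below does this with standard library datetimes rather than pandas Timestamps. The function name is hypothetical and only the ISO form with a "T" separator is handled.

```python
from datetime import date, datetime


def split_daterange_str_sketch(daterange_str, date_only=False):
    """Split "start_end" into a (start, end) tuple of datetimes or dates."""
    start_str, end_str = daterange_str.split("_")
    start = datetime.fromisoformat(start_str)
    end = datetime.fromisoformat(end_str)
    if date_only:
        # Drop the time component
        return start.date(), end.date()
    return start, end


daterange = "2019-01-01T00:00:00_2019-12-31T00:00:00"
start, end = split_daterange_str_sketch(daterange)
start_date, end_date = split_daterange_str_sketch(daterange, date_only=True)
```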
 
- openghg.util.split_encompassed_daterange(container, contained)[source]#
- Checks if one of the passed dateranges contains the other. If so, the larger daterange is split into three sections.
      <---a--->
<---------b----------->
Here b is split into three and we end up with:
<-dr1-><---a---><-dr2->
- Parameters:
- container – Daterange 
- contained – Daterange 
 
- Returns:
- Dictionary of results 
- Return type:
- dict 
 
- openghg.util.time_offset(value=None, unit=None, period=None)[source]#
- Create time offset based on inputs. This will return a Timedelta object and cannot create relative offsets (this includes “weeks”, “months”, “years”). - Parameters:
- value ( - int|- float|- None) – Value and unit pair to use
- unit ( - str|- None) – Value and unit pair to use
- period ( - str|- tuple|- None) – Suitable input for period (see parse_period() for more details)
 
- Returns:
- Time offset object 
- Return type:
- Timedelta 
 
- openghg.util.time_offset_definition()[source]#
- Returns synonym definition for time offset inputs. - Accepted inputs are as follows:
- “months”: “monthly”, “months”, “month”, “MS” 
- “years”: “yearly”, “years”, “annual”, “year”, “AS”, “YS” 
- “weeks”: “weekly”, “weeks”, “week”, “W” 
- “days”: “daily”, “days”, “day”, “D” 
- “hours”: “hourly”, “hours”, “hour”, “hr”, “h”, “H” 
- “minutes”: “minutely”, “minutes”, “minute”, “min”, “m”, “T” 
- “seconds”: “secondly”, “seconds”, “second”, “sec”, “s”, “S” 
 
 - This is to ensure the correct keyword for using the DateOffset and Timedelta functions can be supplied (want this to be “years”, “months” etc.) - Returns:
- Dictionary mapping each unit keyword to its list of synonyms 
- Return type:
- dict 
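The synonym table above can be inverted into a lookup from any accepted input to its canonical unit. The sketch below copies the table from this page; the names `TIME_OFFSET_SYNONYMS`, `UNIT_LOOKUP` and `canonical_unit` are hypothetical and not part of OpenGHG. The lookup is case-sensitive, because "m" means minutes while "MS" means months.

```python
# Synonym table as listed on this page
TIME_OFFSET_SYNONYMS = {
    "months": ["monthly", "months", "month", "MS"],
    "years": ["yearly", "years", "annual", "year", "AS", "YS"],
    "weeks": ["weekly", "weeks", "week", "W"],
    "days": ["daily", "days", "day", "D"],
    "hours": ["hourly", "hours", "hour", "hr", "h", "H"],
    "minutes": ["minutely", "minutes", "minute", "min", "m", "T"],
    "seconds": ["secondly", "seconds", "second", "sec", "s", "S"],
}

# Invert the table: each synonym maps to its canonical unit keyword
UNIT_LOOKUP = {
    synonym: unit
    for unit, synonym_list in TIME_OFFSET_SYNONYMS.items()
    for synonym in synonym_list
}


def canonical_unit(unit):
    """Return the canonical keyword ("years", "months", ...) for an input unit."""
    try:
        return UNIT_LOOKUP[unit]
    except KeyError:
        raise ValueError(f"Unrecognised time unit: {unit!r}") from None


month_start = canonical_unit("MS")
minutes = canonical_unit("m")
hours = canonical_unit("H")
```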
 
- openghg.util.timestamp_epoch()[source]#
- Returns the UNIX epoch (1st of January 1970) as a Timestamp. - Returns:
- Timestamp object at epoch 
- Return type:
- pandas.Timestamp 
 
- openghg.util.timestamp_now()[source]#
- Returns a pandas timezone (UTC) aware Timestamp for the current time. - Returns:
- Timestamp at current time 
- Return type:
- pandas.Timestamp 
 
- openghg.util.timestamp_tzaware(timestamp)[source]#
- Returns the pandas Timestamp passed as a timezone (UTC) aware Timestamp. - Parameters:
- timestamp (pandas.Timestamp) – Timezone naive Timestamp 
- Returns:
- Timezone aware Timestamp 
- Return type:
- pandas.Timestamp 
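The idea behind making a timestamp timezone aware can be illustrated with standard library datetimes: a naive value is labelled as UTC, while an already aware value is converted to UTC. This is a stand-in for the pandas-based functions above. The function name is hypothetical, and treating naive input as already being in UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def tzaware_sketch(ts):
    """Return ts as a UTC-aware datetime."""
    if ts.tzinfo is None:
        # Assumption: a naive timestamp is taken to already be in UTC
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


naive = datetime(2020, 5, 1, 12, 0)
aware_plus_two = datetime(2020, 5, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

from_naive = tzaware_sketch(naive)            # same clock time, now labelled UTC
from_aware = tzaware_sketch(aware_plus_two)   # converted from UTC+2 to UTC
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
```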
 
User#
Handling user configuration files.
- openghg.util.create_config(silent=False)[source]#
- Creates a user config. - Parameters:
- silent ( - bool) – Creates the basic configuration file with only the user's object store in a default location. 
 
- Return type:
- None
- Returns:
- None 
 
- openghg.util.get_user_config_path()[source]#
- Returns path to user config file. - This file is created in the user’s home directory in ~/.ghgconfig/openghg/user.conf on Linux / macOS or in LOCALAPPDATA/openghg/openghg.conf on Windows. - Returns:
- Path to user config file 
- Return type:
- pathlib.Path 
 
Miscellaneous#
Some itertools-like functions.
