icenet.data.datasets package

Submodules

icenet.data.datasets.utils module

class icenet.data.datasets.utils.SplittingMixin

Bases: object

Read train, val, test datasets from tfrecord protocol buffer files.

Also splits and shuffles the data if specified.

Example

This mixin is not meant to be used directly; the following only gives an idea of its use:

# Initialise SplittingMixin
>>> split_dataset = SplittingMixin()

# Add file paths to the train, validation, and test datasets
>>> split_dataset.add_records(base_path="./network_datasets/notebook_data/", hemi="south")

add_records(base_path: str, hemi: str) → None

Add list of paths to train, val, test *.tfrecord(s) to relevant instance attributes.

Add sorted list of file paths to train, validation, and test datasets in SplittingMixin.

Parameters:
  • base_path (str) – The base path where the datasets are located.

  • hemi (str) – The hemisphere the datasets correspond to.

Returns:

None. Updates self.train_fns, self.val_fns, self.test_fns with list of *.tfrecord files.
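A minimal sketch of how sorted path lists of this kind could be collected with the standard library. The helper name `collect_records` and the `<base_path>/<hemi>/<split>/*.tfrecord` directory layout are assumptions made for illustration; they are not taken from this page.

```python
import glob
import os


def collect_records(base_path: str, hemi: str) -> dict:
    """Hypothetical helper: gather sorted *.tfrecord paths for each split.

    Assumes a <base_path>/<hemi>/<split>/ layout (an illustrative guess).
    """
    records = {}
    for split in ("train", "val", "test"):
        pattern = os.path.join(base_path, hemi, split, "*.tfrecord")
        # glob returns paths in arbitrary order, so sort for reproducibility.
        records[split] = sorted(glob.glob(pattern))
    return records
```

Sorting the paths keeps the file order stable between runs, which matters if the lists are later truncated or shuffled with a fixed seed.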

property batch_size: int

The dataset’s batch size.

check_dataset(split: str = 'train') → None

Check the dataset for NaN values and log debugging information about its shape and bounds.

Also logs a warning if any NaN values are found.

Parameters:

split (str, optional) – The split of the dataset to check. Defaults to “train”.
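The kind of check described here can be sketched in plain Python. The helper `count_nans` is hypothetical and works on an iterable of flat numeric batches rather than on a TensorFlow dataset; it only illustrates the "debug the bounds, warn on NaN" pattern.

```python
import logging
import math


def count_nans(batches) -> int:
    """Hypothetical helper: count NaN values across an iterable of flat batches."""
    nan_count = 0
    lowest, highest = math.inf, -math.inf
    for batch in batches:
        for value in batch:
            if math.isnan(value):
                nan_count += 1
            else:
                lowest = min(lowest, value)
                highest = max(highest, value)
    # Bounds go to the debug log; NaN findings are raised to a warning.
    logging.debug("bounds: min=%s max=%s", lowest, highest)
    if nan_count:
        logging.warning("found %d NaN value(s)", nan_count)
    return nan_count
```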

property dtype: str

The dataset’s data type.

get_split_datasets(ratio: object = None)

Retrieves train, val, and test datasets from corresponding attributes of SplittingMixin.

Retrieves the train, validation, and test datasets from the file paths stored in the train_fns, val_fns, and test_fns attributes of SplittingMixin.

Parameters:

ratio (optional) – A float, no greater than 1, giving the fraction of the file lists to use; the lists are truncated accordingly. If not specified, all files are used. Defaults to None.

Returns:

A tuple containing the train, validation, and test datasets.

Return type:

tuple

Raises:
  • RuntimeError – If no files have been found in the train, validation, and test datasets.

  • RuntimeError – If the ratio is greater than 1.
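The two error conditions and the effect of ratio can be sketched as follows. The helper `truncate_file_lists` is hypothetical, and rounding down with `int()` is an assumption about how the truncation point is chosen.

```python
def truncate_file_lists(train_fns, val_fns, test_fns, ratio=None):
    """Hypothetical helper mirroring the documented checks on the file lists."""
    if not (len(train_fns) + len(val_fns) + len(test_fns)):
        raise RuntimeError("No files have been found in any split")

    if ratio is not None:
        if ratio > 1.0:
            raise RuntimeError("ratio cannot be greater than 1")
        # Assumption: keep the first int(len * ratio) entries of each list.
        train_fns = train_fns[: int(len(train_fns) * ratio)]
        val_fns = val_fns[: int(len(val_fns) * ratio)]
        test_fns = test_fns[: int(len(test_fns) * ratio)]

    return train_fns, val_fns, test_fns
```

For example, with four training files and `ratio=0.5`, this sketch keeps the first two.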

property n_forecast_days: int

The number of days to forecast in prediction.

property num_channels: int

The number of channels in the dataset.

property shape: object

The shape of the dataset.

property shuffling: bool

A flag for whether training dataset(s) are marked to be shuffled.

test_fns = []
train_fns = []
val_fns = []

icenet.data.datasets.utils.get_decoder(shape: object, channels: object, forecasts: object, num_vars: int = 1, dtype: str = 'float32') → object

Returns a decoder function used for parsing and decoding data from a tfrecord protocol buffer.

Parameters:
  • shape – The shape of the input data.

  • channels – The number of channels in the input data.

  • forecasts – The number of days to forecast in prediction.

  • num_vars (optional) – The number of variables in the input data. Defaults to 1.

  • dtype (optional) – The data type of the input data. Defaults to “float32”.

Returns:

A function that can be used to parse and decode data. It takes in a protocol buffer (tfrecord) as input and returns the parsed and decoded data.
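The "function that returns a decoder" pattern can be illustrated with a standard-library stand-in. This is not the icenet implementation and does not parse real tfrecord files. The name `make_flat_decoder`, the flat little-endian record layout, and the input and target sizes derived from the arguments are all assumptions made for the sketch.

```python
import struct
from math import prod


def make_flat_decoder(shape, channels, forecasts, num_vars=1, dtype="float32"):
    """Hypothetical stand-in for a decoder factory (not the icenet code).

    Assumes each record is a flat little-endian byte string holding the
    inputs followed by the targets.
    """
    fmt_char = {"float32": "f", "float64": "d"}[dtype]
    x_len = prod(shape) * channels               # assumed input size
    y_len = prod(shape) * forecasts * num_vars   # assumed target size
    fmt = f"<{x_len + y_len}{fmt_char}"

    def decode(record: bytes):
        # The closure reuses the sizes computed once when the decoder was built.
        if len(record) != struct.calcsize(fmt):
            raise ValueError("record size does not match the configured shape")
        values = struct.unpack(fmt, record)
        return list(values[:x_len]), list(values[x_len:])

    return decode
```

Building the decoder once and applying it to every record keeps the shape bookkeeping out of the per-record path, which is the main reason for returning a function rather than decoding directly.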

Module contents