icenet.data.datasets package

Submodules

icenet.data.datasets.utils module

class icenet.data.datasets.utils.SplittingMixin

Bases: object

Read train, val, test datasets from tfrecord protocol buffer files.

Also split and shuffle the data, if specified.

Example

This mixin is not to be used directly, but to give an idea of its use:

# Initialise SplittingMixin
>>> split_dataset = SplittingMixin()

# Add file paths to the train, validation, and test datasets
>>> split_dataset.add_records(base_path="./network_datasets/notebook_data/", hemi="south")

add_records(base_path: str, hemi: str) → None

Add sorted lists of train, validation, and test *.tfrecord file paths to the corresponding instance attributes of SplittingMixin.

Parameters:

base_path (str) – The base path where the datasets are located.
hemi (str) – The hemisphere the datasets correspond to.

Returns:

None. Updates self.train_fns, self.val_fns, and self.test_fns with lists of *.tfrecord files.
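As a rough illustration of the behaviour described above, the following sketch collects sorted *.tfrecord paths per split. The class name RecordPaths and the directory layout <base_path>/<hemi>/<split>/ are assumptions made for this example; they are not taken from the icenet source.

```python
import glob
import os


class RecordPaths:
    """Hypothetical stand-in for a class that uses SplittingMixin."""

    def __init__(self):
        # One list of file paths per split
        self.train_fns = []
        self.val_fns = []
        self.test_fns = []

    def add_records(self, base_path: str, hemi: str) -> None:
        # Assumed layout: <base_path>/<hemi>/<split>/*.tfrecord
        for split in ("train", "val", "test"):
            pattern = os.path.join(base_path, hemi, split, "*.tfrecord")
            # Sort so the file order is deterministic across runs
            getattr(self, f"{split}_fns").extend(sorted(glob.glob(pattern)))
```

Calling add_records more than once in this sketch appends to the existing lists rather than replacing them.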

property batch_size: int

The dataset’s batch size.

check_dataset(split: str = 'train') → None

Check the dataset for NaN values and log debugging information about its shape and bounds.

Also logs a warning if any NaN values are found.

Parameters:

split (str, optional) – The split of the dataset to check. Defaults to “train”.
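The kind of check described here can be sketched in plain Python. This is only an illustration: the function name check_values is hypothetical, it operates on a flat iterable of floats rather than on dataset batches, and returning the NaN count is an addition for the sake of the example.

```python
import logging
import math


def check_values(values, name="train"):
    """Hypothetical NaN check over a flat iterable of floats."""
    values = list(values)
    nan_count = sum(1 for v in values if math.isnan(v))
    valid = [v for v in values if not math.isnan(v)]

    # Debug-level summary of size and bounds (NaN values excluded)
    if valid:
        logging.debug("%s: %d values, min %s, max %s",
                      name, len(values), min(valid), max(valid))

    # Warn only when NaN values are present
    if nan_count:
        logging.warning("%s: %d NaN values found", name, nan_count)

    return nan_count
```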

property dtype: str

The dataset’s data type.

get_split_datasets(ratio: object = None)

Retrieves the train, validation, and test datasets from the file paths stored in the train_fns, val_fns, and test_fns attributes of SplittingMixin.

Parameters:

ratio (optional) – A float, no greater than 1, giving the fraction of the dataset file lists to keep. If not specified, all files are used. Defaults to None.

Returns:

A tuple containing the train, validation, and test datasets.

Return type:

tuple

Raises:

RuntimeError – If no files have been found in the train, validation, and test datasets.
RuntimeError – If the ratio is greater than 1.
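The ratio handling and the two error conditions can be sketched as follows. The helper truncate_file_lists is hypothetical and works on plain lists of file names, whereas the real method returns datasets. Keeping the leading int(len * ratio) entries of each list is an assumed truncation rule, not one confirmed by this page.

```python
def truncate_file_lists(train_fns, val_fns, test_fns, ratio=None):
    """Hypothetical helper illustrating the documented ratio behaviour."""
    # Documented error: no files in any of the three splits
    if not (train_fns or val_fns or test_fns):
        raise RuntimeError("No files have been found")

    if ratio is None:
        return train_fns, val_fns, test_fns

    # Documented error: ratio greater than 1
    if ratio > 1.0:
        raise RuntimeError("Ratio cannot be more than 1")

    # Assumed rule: keep the leading fraction of each list
    return tuple(fns[:int(len(fns) * ratio)]
                 for fns in (train_fns, val_fns, test_fns))
```

Under this rule a small list can be truncated to nothing (for example, one file with a ratio of 0.5), so a real implementation may want to guard against empty splits.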

property n_forecast_days: int

The number of days to forecast in prediction.

property num_channels: int

The number of channels in the dataset.

property shape: object

The shape of the dataset.

property shuffling: bool

A flag for whether training dataset(s) are marked to be shuffled.

test_fns = []

train_fns = []

val_fns = []

icenet.data.datasets.utils.get_decoder(shape: object, channels: object, forecasts: object, num_vars: int = 1, dtype: str = 'float32') → object

Returns a decoder function used for parsing and decoding data from tfrecord protocol buffer.

Parameters:

shape – The shape of the input data.
channels – The number of channels in the input data.
forecasts – The number of days to forecast in prediction.
num_vars (optional) – The number of variables in the input data. Defaults to 1.
dtype (optional) – The data type of the input data. Defaults to “float32”.

Returns:

A function that can be used to parse and decode data. It takes a protocol buffer (tfrecord) as input and returns the parsed and decoded data.

Module contents