icenet.data package

Submodules

icenet.data.cli module

icenet.data.cli.add_date_args(arg_parser: object)[source]
Parameters:

arg_parser

icenet.data.cli.csv_arg(string: str) → list[source]
Parameters:

string

Returns:

icenet.data.cli.csv_of_csv_arg(string: str) → list[source]
Parameters:

string

Returns:

icenet.data.cli.date_arg(string: str) → object[source]
Parameters:

string

Returns:

icenet.data.cli.dates_arg(string: str) → object[source]
Parameters:

string

Returns:

icenet.data.cli.download_args(choices: object = None, dates: bool = True, dates_optional: bool = False, var_specs: bool = True, workers: bool = False, extra_args: object = ()) → object[source]
Parameters:
  • choices

  • dates

  • dates_optional

  • var_specs

  • workers

  • extra_args

Returns:

icenet.data.cli.int_or_list_arg(string: str) → object[source]
Parameters:

string

Returns:

icenet.data.cli.process_args(dates: bool = True, ref_option: bool = True, extra_args: object = ()) → object[source]
Parameters:
  • dates

  • ref_option

  • extra_args

Returns:

icenet.data.cli.process_date_args(args: object) → dict[source]
Parameters:

args

Returns:
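
Example

A minimal sketch of a downloader entry point combining these helpers; the flag choices shown, and the assumption that download_args returns an already-parsed argument namespace (rather than the parser itself), are illustrative:

   from icenet.data.cli import download_args, process_date_args

   # Build and parse downloader arguments (dates, variable specs, workers).
   args = download_args(workers=True)
   # Convert the parsed date strings into date objects, keyed per argument.
   dates = process_date_args(args)
   print(dates)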

icenet.data.dataset module

class icenet.data.dataset.IceNetDataSet(configuration_path: str, *args, batch_size: int = 4, path: str = './network_datasets', shuffling: bool = False, **kwargs)[source]

Bases: SplittingMixin, DataCollection

Initialises and configures a dataset.

It loads a JSON configuration file, updates the _config attribute with the result, creates a data loader, and provides methods to access the dataset.

_config

A dict used to store the configuration loaded from the JSON file.

_configuration_path

The path to the JSON configuration file.

_batch_size

The batch size for the data loader.

Type:

int

_counts

A dict with the number of elements in train, val, and test.

_dtype

The type of the dataset.

Type:

object

_loader_config

The path to the data loader configuration file.

_generate_workers

An integer representing the number of workers for parallel processing with Dask.

_n_forecast_days

An integer representing the number of days to predict for.

Type:

int

_num_channels

An integer representing the number of channels (input variables) in the dataset.

Type:

int

_shape

The shape of the dataset.

Type:

int

_shuffling

A flag indicating whether to shuffle the data or not.

Type:

bool

property channels: list

The list of channels (variable names) specified in the dataset config file.

property counts: dict

A dict with the number of elements in train, val, and test, as recorded in the config file.

get_data_loader(n_forecast_days: object = None, generate_workers: object = None) → object[source]

Create an instance of the IceNetDataLoader class.

Parameters:
  • n_forecast_days (optional) – The number of forecast days to be used by the data loader. If not provided, defaults to the value specified in the configuration file.

  • generate_workers (optional) – An integer representing the number of workers to use for parallel processing with Dask. If not provided, defaults to the value specified in the configuration file.

Returns:

An instance of the DaskMultiWorkerLoader class configured with the specified parameters.

property loader_config: str

The path to the JSON loader configuration file stored in the dataset config file.
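
Example

A minimal sketch of typical usage; the configuration file name is hypothetical:

   from icenet.data.dataset import IceNetDataSet

   ds = IceNetDataSet("dataset_config.example.json",
                      batch_size=4,
                      shuffling=True)
   loader = ds.get_data_loader()   # defaults taken from the configuration
   print(ds.channels)
   print(ds.counts)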

class icenet.data.dataset.MergedIceNetDataSet(configuration_paths: object, *args, batch_size: int = 4, path: str = './network_datasets', shuffling: bool = False, **kwargs)[source]

Bases: SplittingMixin, DataCollection

Parameters:
  • identifier

  • configuration_paths – List of configurations to load

  • batch_size

  • path

property channels
check_dataset(split: str = 'train')[source]
Parameters:

split

property counts
get_data_loader()[source]
Returns:
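
Example

A minimal sketch merging two dataset configurations; the file names are hypothetical:

   from icenet.data.dataset import MergedIceNetDataSet

   merged = MergedIceNetDataSet(["dataset_config.north.json",
                                 "dataset_config.south.json"])
   merged.check_dataset("train")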

icenet.data.dataset.check_dataset() → None[source]

Check the dataset for a specific split.

icenet.data.dataset.get_args() → object[source]

Parse command line arguments using the argparse module.

Returns:

An object containing the parsed command line arguments.

Example

Assuming command line arguments are provided:

   args = get_args()
   print(args.dataset)
   print(args.split)
   print(args.verbose)

icenet.data.loader module

icenet.data.loader.create()[source]
icenet.data.loader.create_get_args() → object[source]

Converts input data creation argument strings to objects, and assigns them as attributes to the namespace.

The args added in this function relate to the dataloader creation process.

Returns:

An argparse namespace with all arguments added via add_argument accessible as object attributes.
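
Example

A minimal sketch; which dataloader-creation options exist depends on the icenet version, so treat the printed attribute names as illustrative:

   from icenet.data.loader import create_get_args

   args = create_get_args()
   print(vars(args))   # all dataloader creation options as a dict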

icenet.data.loader.save_sample(output_folder: str, date: object, sample: tuple)[source]
Parameters:
  • output_folder

  • date

  • sample

icenet.data.process module

class icenet.data.process.IceNetPreProcessor(abs_vars, anom_vars, name, train_dates, val_dates, test_dates, *args, data_shape=(432, 432), dtype=<class 'numpy.float32'>, exclude_vars=(), file_filters=('latlon_', ), identifier=None, linear_trends=('siconca', ), linear_trend_steps=7, meta_vars=(), missing_dates=(), minmax=True, no_normalise=('siconca', ), path='./processed', parallel_opens=False, ref_procdir=None, source_data='./data', update_key=None, update_loader=True, **kwargs)[source]

Bases: Processor

Parameters:
  • abs_vars

  • anom_vars

  • name

  • train_dates

  • val_dates

  • test_dates

  • *args

  • data_shape

  • dtype

  • exclude_vars

  • file_filters

  • identifier

  • linear_trends

  • linear_trend_steps

  • meta_vars

  • missing_dates

  • minmax

  • no_normalise

  • path

  • parallel_opens

  • ref_procdir

  • source_data

  • update_key

  • update_loader

DATE_FORMAT = '%Y_%m_%d'
static mean_and_std(array: object)[source]

Return the mean and standard deviation of an array-like object (intended use case is normalising a raw satellite data array based on a list of samples used for training).

Parameters:

array
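
Example

A minimal sketch; the return value is assumed to be a (mean, std) pair:

   import numpy as np

   from icenet.data.process import IceNetPreProcessor

   arr = np.array([0.0, 0.5, 1.0])
   mean, std = IceNetPreProcessor.mean_and_std(arr)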

property missing_dates
post_normalisation(var_name: str, da: object)[source]
Parameters:
  • var_name

  • da

Returns:

pre_normalisation(var_name: str, da: object)[source]
Parameters:
  • var_name

  • da

Returns:

process()[source]
update_loader_config()[source]
Returns:
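
Example

A minimal sketch of constructing and running the pre-processor; the variable names and dates are hypothetical, and the default ./data source directory is assumed to exist and be populated:

   import datetime as dt

   from icenet.data.process import IceNetPreProcessor

   pp = IceNetPreProcessor(
       abs_vars=["uas", "vas"],
       anom_vars=["tas", "zg500"],
       name="example_run",
       train_dates=[dt.date(2010, 1, 1)],
       val_dates=[dt.date(2011, 1, 1)],
       test_dates=[dt.date(2012, 1, 1)],
   )
   pp.init_source_data(lag_days=1)
   pp.process()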

icenet.data.producers module

class icenet.data.producers.DataCollection(*args, identifier: object = None, north: bool = True, south: bool = False, path: str = './data', **kwargs)[source]

Bases: HemisphereMixin

An abstract base class providing a common interface for data collection classes.

_identifier

The identifier of the data collection.

_path

The base path of the data collection.

_hemisphere

The hemisphere(s) of the data collection.

Type:

int

property base_path: str

The base path of the data collection.

property identifier: object

The identifier (label) for this data collection.

class icenet.data.producers.DataProducer(*args, dry: bool = False, overwrite: bool = False, **kwargs)[source]

Bases: DataCollection

Manages the creation and organisation of data files.

dry

Flag specifying whether the data producer should be in dry run mode or not.

overwrite

Flag specifying whether existing files should be overwritten or not.

get_data_var_folder(var: str, append: object = None, hemisphere: object = None, missing_error: bool = False) → str[source]

Returns the path for a specific data variable.

Appends additional folders to the path if specified in the append parameter.

Parameters:
  • var – The data variable.

  • append (optional) – Additional folders to append to the path. Defaults to None.

  • hemisphere (optional) – The hemisphere. Defaults to None.

  • missing_error (optional) – Flag to specify if missing directories should be treated as an error. Defaults to False.

Returns:

The path for the specific data variable.

Return type:

str
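
Example

A minimal sketch; the identifier and resulting folder layout are illustrative, and append is assumed to take a list of folder names:

   from icenet.data.producers import DataProducer

   producer = DataProducer(identifier="example", path="./data")
   # Resolve (and create, unless in dry mode) the folder for a variable,
   # with an extra subfolder appended.
   folder = producer.get_data_var_folder("siconca", append=["2020"])
   print(folder)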

class icenet.data.producers.Downloader(*args, **kwargs)[source]

Bases: DataProducer

Abstract base class for a downloader.

abstract download()[source]

Abstract download method for this downloader; must be implemented by subclasses.
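
Example

A minimal sketch of a concrete subclass (hypothetical) satisfying the abstract interface; Generator.generate() below follows the same pattern:

   from icenet.data.producers import Downloader

   class ExampleDownloader(Downloader):
       def download(self):
           # Fetch source files into the collection's base path here.
           ...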

class icenet.data.producers.Generator(*args, **kwargs)[source]

Bases: DataProducer

Abstract base class for a generator.

abstract generate()[source]

Abstract generate method for this generator; must be implemented by subclasses.

class icenet.data.producers.Processor(identifier: str, source_data: object, *args, file_filters: object = (), lead_time: int = 93, test_dates: object = (), train_dates: object = (), val_dates: object = (), **kwargs)[source]

Bases: DataProducer

An abstract base class for data processing classes.

Provides methods for initialising source data, processing the data, and saving the processed data to standard netCDF files.

_file_filters

List of file filters to exclude certain files during data processing.

_lead_time

Forecast/lead time used in the data processing.

source_data

Path to the source data directory.

_var_files

Dictionary storing variable files organised by variable name.

_processed_files

Dictionary storing the processed files organised by variable name.

_dates

Named tuple that stores the dates used for training, validation, and testing.

property dates: object

The dates used for training, validation, and testing in this class, as a collections.namedtuple.

init_source_data(lag_days: object = None) → None[source]

Initialises source data by globbing the files and organising them by date. If lag_days > 0, the previous lag_days days are added where not already present in self._dates; if self._lead_time > 0, the next self._lead_time days are added where not already present in self._dates.

Parameters:

lag_days – The number of lag days to include in the data processing.

Returns:

None. The method updates the _var_files attribute of the Processor object.

Raises:

OSError – If the source data directory does not exist.

property lead_time: int

The lead time used in the data processing.

abstract process()[source]

Abstract method defining data processing; must be implemented by subclasses.

property processed_files: dict

A dict with the processed files organised by variable name.

save_processed_file(var_name: str, name: str, data: object, **kwargs) → str[source]

Save processed data to netCDF file.

Parameters:
  • var_name – The name of the variable.

  • name – The name of the file.

  • data – The data to be saved.

  • **kwargs – Additional keyword arguments to be passed to the get_data_var_folder method.

Returns:

The path of the saved netCDF file.

property source_data: str

The source data directory as a string.
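
Example

A minimal sketch of a concrete Processor subclass; the identifier, paths, dates, and the use of an xarray placeholder for the processed data are illustrative, and the ./data source directory is assumed to exist:

   import datetime as dt

   import xarray as xr

   from icenet.data.producers import Processor

   class ExampleProcessor(Processor):
       def process(self):
           # Transform discovered source files, then persist as netCDF.
           da = xr.DataArray([0.0], name="siconca")
           self.save_processed_file("siconca", "siconca_abs.nc", da)

   proc = ExampleProcessor("example", "./data",
                           train_dates=(dt.date(2020, 1, 1),))
   proc.init_source_data(lag_days=1)
   proc.process()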

icenet.data.utils module

icenet.data.utils.assign_lat_lon_coord_system(cube: object)[source]

Assign coordinate system to iris cube to allow regridding.

Parameters:

cube
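
Example

A minimal sketch; the input file name is hypothetical:

   import iris

   from icenet.data.utils import assign_lat_lon_coord_system

   cube = iris.load_cube("input.nc")
   assign_lat_lon_coord_system(cube)   # cube can now be regridded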

icenet.data.utils.esgf_search(server, files_type, local_node, latest, project, format, use_csrf, **search)[source]

Below taken from https://hub.binder.pangeo.io/user/pangeo-data-pan–cmip6-examples-ro965nih/lab and adapted slightly.

Parameters:
  • server

  • files_type

  • local_node

  • latest

  • project

  • format

  • use_csrf

  • search

Returns:
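
Example

A minimal sketch of an ESGF facet search via the **search keyword arguments; the facet names and values shown are illustrative and depend on the ESGF index being queried:

   from icenet.data.utils import esgf_search

   urls = esgf_search(variable_id="siconc",
                      table_id="SImon",
                      experiment_id="historical",
                      source_id="MRI-ESM2-0")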

icenet.data.utils.gridcell_angles_from_dim_coords(cube: object)[source]

Author: Tony Phillips (BAS)

Wrapper for gridcell_angles() that derives the 2D X and Y lon/lat coordinates from 1D X and Y coordinates identifiable as ‘x’ and ‘y’ axes.

The provided cube must have a coordinate system so that its X and Y coordinate bounds (which are derived if necessary) can be converted to lons and lats.

Parameters:

cube

Returns:

icenet.data.utils.invert_gridcell_angles(angles: object)[source]

Author: Tony Phillips (BAS)

Negate a cube of gridcell angles in place, transforming gridcell_angle_from_true_east <-> true_east_from_gridcell_angle.

Parameters:

angles

icenet.data.utils.rotate_grid_vectors(u_cube: object, v_cube: object, angles: object)[source]

Author: Tony Phillips (BAS)

Wrapper for rotate_grid_vectors() that can rotate multiple masked spatial fields in one go by iterating over the horizontal spatial axes in slices.

Parameters:
  • u_cube

  • v_cube

  • angles

Returns:
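
Example

A minimal sketch chaining the three angle helpers; the component file names are hypothetical, and the return value is assumed to be the rotated (u, v) cube pair:

   import iris

   from icenet.data.utils import (gridcell_angles_from_dim_coords,
                                  invert_gridcell_angles,
                                  rotate_grid_vectors)

   u_cube = iris.load_cube("u10.nc")   # hypothetical vector components
   v_cube = iris.load_cube("v10.nc")

   angles = gridcell_angles_from_dim_coords(u_cube)
   invert_gridcell_angles(angles)      # negates the angle cube in place
   u_true, v_true = rotate_grid_vectors(u_cube, v_cube, angles)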

Module contents