DataPreprocessor

Bases: ABC, FromConfigMixin

Abstract class for data preprocessors

Data preprocessors are callables that take an array of data and return the preprocessed array. The dataset calls its data preprocessor on the data fetched for each tile.

Currently implemented data preprocessors:
  • CompositePreprocessor: Composes multiple data preprocessors
  • NormalizePreprocessor: Applies min-max normalization
  • StandardizePreprocessor: Applies standardization

__call__ abstractmethod

Preprocesses the data.

PARAMETER DESCRIPTION
data

data to preprocess

TYPE: npt.NDArray

RETURNS DESCRIPTION
npt.NDArray

preprocessed data
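
To illustrate the callable interface described above, here is a minimal sketch of a custom data preprocessor. It is not taken from the library: the `DataPreprocessor` class below is a stand-in that omits `FromConfigMixin`, `OffsetPreprocessor` is a hypothetical name invented for this example, and the height x width x channels layout of the tile is an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np
import numpy.typing as npt


class DataPreprocessor(ABC):
    """Stand-in for the library's abstract class (FromConfigMixin omitted)."""

    @abstractmethod
    def __call__(self, data: npt.NDArray) -> npt.NDArray:
        """Preprocesses the data."""


class OffsetPreprocessor(DataPreprocessor):
    """Hypothetical preprocessor that subtracts a constant offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, data: npt.NDArray) -> npt.NDArray:
        # Convert to float32 first; this returns a new array,
        # so the input tile is left unmodified.
        return data.astype(np.float32) - np.float32(self.offset)


# Assumed tile layout: height x width x channels
tile = np.full((2, 2, 3), 10, dtype=np.uint8)
offset_preprocessor = OffsetPreprocessor(offset=4.0)
preprocessed_tile = offset_preprocessor(tile)
```

Because the stand-in base class declares `__call__` as abstract, a subclass only becomes instantiable once it implements `__call__`.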


CompositePreprocessor

Bases: DataPreprocessor

Data preprocessor that composes multiple data preprocessors

PARAMETER DESCRIPTION
data_preprocessors

data preprocessors

TYPE: list[DataPreprocessor]

from_config classmethod

Creates a composite preprocessor from the configuration.

PARAMETER DESCRIPTION
config

configuration

TYPE: CompositePreprocessorConfig

RETURNS DESCRIPTION
CompositePreprocessor

composite preprocessor

__call__

Preprocesses the data with each data preprocessor.

PARAMETER DESCRIPTION
data

data to preprocess

TYPE: npt.NDArray

RETURNS DESCRIPTION
npt.NDArray

preprocessed data
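
The composition can be pictured with the following sketch. It assumes that the preprocessors are applied one after another in list order, each receiving the output of the previous one; the source does not state the order explicitly. The helper names `compose`, `to_unit_range`, and `center` are hypothetical and not part of the library.

```python
from typing import Callable, Sequence

import numpy as np
import numpy.typing as npt

Preprocessor = Callable[[npt.NDArray], npt.NDArray]


def compose(data_preprocessors: Sequence[Preprocessor]) -> Preprocessor:
    """Returns a callable that applies the preprocessors one after another."""

    def composed(data: npt.NDArray) -> npt.NDArray:
        for data_preprocessor in data_preprocessors:
            # Assumption: the output of one preprocessor
            # is the input of the next (list order).
            data = data_preprocessor(data)
        return data

    return composed


def to_unit_range(data: npt.NDArray) -> npt.NDArray:
    """Hypothetical step: scales uint8 values to the range of 0 to 1."""
    return data.astype(np.float32) / np.float32(255.0)


def center(data: npt.NDArray) -> npt.NDArray:
    """Hypothetical step: shifts values so that 0.5 maps to 0."""
    return data - np.float32(0.5)


composite = compose([to_unit_range, center])
result = composite(np.array([0, 255], dtype=np.uint8))
```

Under this assumption the order matters: swapping the two steps would subtract 0.5 from the raw uint8 values before scaling them.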


CompositePreprocessorConfig

Bases: pydantic.BaseModel

Configuration for the from_config class method of CompositePreprocessor

ATTRIBUTE DESCRIPTION
data_preprocessors_configs

configurations of the data preprocessors

TYPE: list[DataPreprocessorConfig]


DataPreprocessorConfig

Bases: pydantic.BaseModel

Configuration for data preprocessors

ATTRIBUTE DESCRIPTION
name

name of the data preprocessor

TYPE: str

config

configuration of the data preprocessor

TYPE: NormalizePreprocessorConfig | StandardizePreprocessorConfig


NormalizePreprocessor

Bases: DataPreprocessor

Data preprocessor that applies min-max normalization

Examples:

Assume the data is a 3-channel image of data type uint8.

You can scale the data to a range of 0 to 1 by normalizing the data.

>>> normalize_preprocessor = NormalizePreprocessor(
...     min_values=[0.] * 3,
...     max_values=[255.] * 3,
... )
>>> preprocessed_data = normalize_preprocessor(data)

PARAMETER DESCRIPTION
min_values

minimum values of the data (per channel)

TYPE: list[float]

max_values

maximum values of the data (per channel)

TYPE: list[float]

from_config classmethod

Creates a normalize preprocessor from the configuration.

PARAMETER DESCRIPTION
config

configuration

TYPE: NormalizePreprocessorConfig

RETURNS DESCRIPTION
NormalizePreprocessor

normalize preprocessor

__call__

Preprocesses the data by applying min-max normalization.

PARAMETER DESCRIPTION
data

data to preprocess

TYPE: npt.NDArray

RETURNS DESCRIPTION
npt.NDArray[np.float32]

preprocessed data
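
As a rough illustration of per-channel min-max normalization, the sketch below computes `(data - min) / (max - min)` with NumPy. It is a standalone function and not the library's implementation: the name `min_max_normalize` is hypothetical, and the channels are assumed to lie on the last axis so that the per-channel values broadcast.

```python
import numpy as np
import numpy.typing as npt


def min_max_normalize(
    data: npt.NDArray,
    min_values: list[float],
    max_values: list[float],
) -> npt.NDArray[np.float32]:
    """Scales each channel so that its minimum maps to 0 and its maximum to 1."""
    # Assumption: channels are on the last axis,
    # so the per-channel arrays broadcast against the data.
    min_array = np.asarray(min_values, dtype=np.float32)
    max_array = np.asarray(max_values, dtype=np.float32)
    return (data.astype(np.float32) - min_array) / (max_array - min_array)


# One pixel of a 3-channel uint8 image (assumed layout: height x width x channels)
data = np.array([[[0, 51, 255]]], dtype=np.uint8)
normalized = min_max_normalize(
    data,
    min_values=[0.] * 3,
    max_values=[255.] * 3,
)
```

With this formula, input values outside the given minimum and maximum produce results outside the range of 0 to 1, and a channel whose minimum equals its maximum leads to a division by zero.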


NormalizePreprocessorConfig

Bases: pydantic.BaseModel

Configuration for the from_config class method of NormalizePreprocessor

ATTRIBUTE DESCRIPTION
min_values

minimum values of the data (per channel)

TYPE: list[float]

max_values

maximum values of the data (per channel)

TYPE: list[float]


StandardizePreprocessor

Bases: DataPreprocessor

Data preprocessor that applies standardization

Examples:

Assume the data is a 3-channel image of data type float32.

You can scale the data to have a mean of approximately 0 and a standard deviation of approximately 1 by standardizing it. In this example, the per-channel mean and standard deviation values of the ImageNet dataset are used, which assumes the data is already scaled to a range of 0 to 1.

>>> standardize_preprocessor = StandardizePreprocessor(
...     mean_values=[.485, .456, .406],
...     std_values=[.229, .224, .225],
... )
>>> preprocessed_data = standardize_preprocessor(data)

PARAMETER DESCRIPTION
mean_values

mean values of the data (per channel)

TYPE: list[float]

std_values

standard deviation values of the data (per channel)

TYPE: list[float]

from_config classmethod

Creates a standardize preprocessor from the configuration.

PARAMETER DESCRIPTION
config

configuration

TYPE: StandardizePreprocessorConfig

RETURNS DESCRIPTION
StandardizePreprocessor

standardize preprocessor

__call__

Preprocesses the data by applying standardization.

PARAMETER DESCRIPTION
data

data to preprocess

TYPE: npt.NDArray

RETURNS DESCRIPTION
npt.NDArray[np.float32]

preprocessed data
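
Per-channel standardization can be sketched as `(data - mean) / std`. The function below is a standalone illustration and not the library's implementation: the name `standardize` is hypothetical, and the channels are assumed to lie on the last axis.

```python
import numpy as np
import numpy.typing as npt


def standardize(
    data: npt.NDArray,
    mean_values: list[float],
    std_values: list[float],
) -> npt.NDArray[np.float32]:
    """Subtracts the per-channel mean and divides by the per-channel std."""
    # Assumption: channels are on the last axis,
    # so the per-channel arrays broadcast against the data.
    mean_array = np.asarray(mean_values, dtype=np.float32)
    std_array = np.asarray(std_values, dtype=np.float32)
    return (data.astype(np.float32) - mean_array) / std_array


# Two pixels of a 3-channel float32 image (assumed layout: height x width x channels):
# the first equals the channel means, the second lies one std above them.
data = np.array(
    [[[.485, .456, .406], [.714, .680, .631]]],
    dtype=np.float32,
)
standardized = standardize(
    data,
    mean_values=[.485, .456, .406],
    std_values=[.229, .224, .225],
)
```

A pixel equal to the channel means maps to 0 in every channel, and a pixel one standard deviation above the means maps to roughly 1.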


StandardizePreprocessorConfig

Bases: pydantic.BaseModel

Configuration for the from_config class method of StandardizePreprocessor

ATTRIBUTE DESCRIPTION
mean_values

mean values of the data (per channel)

TYPE: list[float]

std_values

standard deviation values of the data (per channel)

TYPE: list[float]