preprocessing
Defines the preprocessing configurations that determine how the individual members of an ensemble differ from one another.
ClassifierEnsembleConfig
dataclass
Bases: EnsembleConfig
Configuration for a classifier ensemble member.
See EnsembleConfig for more details.
generate_for_classification
classmethod
generate_for_classification(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
class_shift_method: Literal["rotate", "shuffle"] | None,
n_classes: int,
random_state: int | Generator | None
) -> list[ClassifierEnsembleConfig]
Generate ensemble configurations for classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`class_shift_method` | `Literal['rotate', 'shuffle'] \| None` | How to shift classes for class permutation. | required |
`n_classes` | `int` | Number of classes. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[ClassifierEnsembleConfig]` | List of ensemble configurations. |
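The three `class_shift_method` options can be illustrated with a short sketch. The helper `make_class_permutation` is hypothetical and is not a library function. The sketch assumes the following:

- "rotate" is a cyclic shift of the class labels.
- "shuffle" is an arbitrary random permutation.
- None means no permutation.

Stdlib `random.Random` stands in for the NumPy `Generator` accepted by `random_state`.

```python
import random
from typing import Literal, Optional


def make_class_permutation(
    n_classes: int,
    method: Optional[Literal["rotate", "shuffle"]],
    rng: random.Random,
) -> Optional[list[int]]:
    """Hypothetical helper: build one class permutation for an ensemble member."""
    if method is None:
        # No class permutation for this member (assumed meaning of None).
        return None
    labels = list(range(n_classes))
    if method == "rotate":
        # Cyclic shift: every label moves by the same random offset.
        shift = rng.randrange(n_classes)
        return labels[shift:] + labels[:shift]
    if method == "shuffle":
        # Arbitrary random reordering of the labels.
        rng.shuffle(labels)
        return labels
    raise ValueError(f"Unknown class_shift_method: {method!r}")


rng = random.Random(0)
perm = make_class_permutation(4, "rotate", rng)
y = [0, 1, 2, 3, 1]
y_permuted = [perm[c] for c in y]  # relabel each target with the permutation
```

Relabelling the targets through `perm[c]` is one way to apply such a permutation. The library's own mechanism is not described on this page.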
generate_for_regression
classmethod
generate_for_regression(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
target_transforms: Sequence[
TransformerMixin | Pipeline | None
],
random_state: int | Generator | None
) -> list[RegressorEnsembleConfig]
Generate ensemble configurations for regression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`target_transforms` | `Sequence[TransformerMixin \| Pipeline \| None]` | Target transformations to apply. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[RegressorEnsembleConfig]` | List of ensemble configurations. |
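The `polynomial_features` parameter accepts "no", "all", or an int cap. The sketch below assumes one common interpretation: it appends products of feature pairs (degree-2 interaction terms), and an int limits how many are added. The helper `add_pairwise_products` is hypothetical. The library may choose or construct its polynomial features differently, for example by picking pairs at random.

```python
from itertools import combinations
from typing import Literal, Union


def add_pairwise_products(
    X: list[list[float]],
    polynomial_features: Union[Literal["no", "all"], int],
) -> list[list[float]]:
    """Hypothetical sketch: append products of feature pairs to every row."""
    if polynomial_features == "no":
        # No polynomial features: return an unchanged copy.
        return [row[:] for row in X]
    n_features = len(X[0]) if X else 0
    pairs = list(combinations(range(n_features), 2))
    if polynomial_features != "all":
        # An int caps how many product features are appended.
        pairs = pairs[: int(polynomial_features)]
    return [row + [row[i] * row[j] for i, j in pairs] for row in X]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X_poly = add_pairwise_products(X, 2)
```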
to_pipeline
to_pipeline(
*, random_state: int | Generator | None
) -> SequentialFeatureTransformer
Convert the ensemble configuration to a preprocessing pipeline.
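`to_pipeline` returns a `SequentialFeatureTransformer`. The minimal sketch below illustrates what the name suggests: steps are fitted one after another, each on the previous step's output, and are then replayed on new data. `SequentialSketch` and `CenterColumns` are hypothetical names. This is not the library's class, and it does not reproduce its steps.

```python
from dataclasses import dataclass, field


class CenterColumns:
    """Toy step: subtract each column's training mean."""

    def fit_transform(self, X):
        n = len(X)
        # Learn one mean per column from the training data.
        self.means = [sum(col) / n for col in zip(*X)]
        return self.transform(X)

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means)] for row in X]


@dataclass
class SequentialSketch:
    """Hypothetical stand-in for a sequential feature transformer."""

    steps: list = field(default_factory=list)

    def fit_transform(self, X):
        for step in self.steps:
            X = step.fit_transform(X)  # each step is fitted on the previous output
        return X

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)  # reuse what each step learned during fitting
        return X


pipe = SequentialSketch([CenterColumns()])
X_train_t = pipe.fit_transform([[1.0, 2.0], [3.0, 4.0]])
X_test_t = pipe.transform([[2.0, 3.0]])
```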
EnsembleConfig
dataclass
Configuration for an ensemble member.
Attributes:
Name | Type | Description |
---|---|---|
`feature_shift_count` | `int` | How much to shift the feature columns. |
`class_permutation` | `int` | Permutation to apply to the classes. |
`preprocess_config` | `PreprocessorConfig` | Preprocessor configuration to use. |
`subsample_ix` | `NDArray[int64] \| None` | Indices of samples to use for this ensemble member. If None, no subsampling is done. |
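`feature_shift_count` only states how much to shift the feature columns. How the shift is applied depends on `feature_shift_decoder` ("shuffle", "rotate", or None). The hypothetical helper `shift_feature_columns` below sketches one reading of those options:

- "rotate" cycles the columns by the shift count.
- "shuffle" uses the count as a seed for a fixed random column order.
- None leaves the columns unchanged.

All three interpretations are assumptions.

```python
import random
from typing import Literal, Optional


def shift_feature_columns(
    X: list[list[float]],
    shift_count: int,
    decoder: Optional[Literal["shuffle", "rotate"]],
) -> list[list[float]]:
    """Hypothetical sketch: reorder the feature columns of X for one ensemble member."""
    n_features = len(X[0]) if X else 0
    order = list(range(n_features))
    if decoder == "rotate":
        # Cyclic shift: column i moves to position (i - shift_count) mod n_features.
        if n_features:
            k = shift_count % n_features
            order = order[k:] + order[:k]
    elif decoder == "shuffle":
        # shift_count acts as a seed, so the member always gets the same order.
        random.Random(shift_count).shuffle(order)
    elif decoder is not None:
        raise ValueError(f"Unknown feature_shift_decoder: {decoder!r}")
    # With decoder None the order stays the identity and the columns are unchanged.
    return [[row[i] for i in order] for row in X]


X = [[1, 2, 3], [4, 5, 6]]
X_rotated = shift_feature_columns(X, 1, "rotate")
```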
generate_for_classification
classmethod
generate_for_classification(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
class_shift_method: Literal["rotate", "shuffle"] | None,
n_classes: int,
random_state: int | Generator | None
) -> list[ClassifierEnsembleConfig]
Generate ensemble configurations for classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`class_shift_method` | `Literal['rotate', 'shuffle'] \| None` | How to shift classes for class permutation. | required |
`n_classes` | `int` | Number of classes. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[ClassifierEnsembleConfig]` | List of ensemble configurations. |
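This reference does not spell out how the `n` ensemble members are divided among the `preprocessor_configs`. One simple scheme gives every config an equal share and hands any remainder to the first configs in order. It is sketched below with the hypothetical helper `assign_configs`; treat it as an illustration, not as the library's exact rule.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def assign_configs(configs: Sequence[T], n: int) -> list[T]:
    """Hypothetical sketch: spread n ensemble members as evenly as possible over configs."""
    if not configs:
        raise ValueError("at least one config is required")
    per_config = n // len(configs)
    # Every config first gets per_config members, with repeats kept together.
    assigned = [c for c in configs for _ in range(per_config)]
    # The n % len(configs) leftover members go to the first configs, one each.
    assigned.extend(configs[: n - len(assigned)])
    return assigned


members = assign_configs(["quantile", "power", "none"], 7)
```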
generate_for_regression
classmethod
generate_for_regression(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
target_transforms: Sequence[
TransformerMixin | Pipeline | None
],
random_state: int | Generator | None
) -> list[RegressorEnsembleConfig]
Generate ensemble configurations for regression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`target_transforms` | `Sequence[TransformerMixin \| Pipeline \| None]` | Target transformations to apply. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[RegressorEnsembleConfig]` | List of ensemble configurations. |
to_pipeline
to_pipeline(
*, random_state: int | Generator | None
) -> SequentialFeatureTransformer
Convert the ensemble configuration to a preprocessing pipeline.
PreprocessorConfig
dataclass
Configuration for data preprocessors.
Attributes:
Name | Type | Description |
---|---|---|
`name` | `Literal['per_feature', 'power', 'safepower', 'power_box', 'safepower_box', 'quantile_uni_coarse', 'quantile_norm_coarse', 'quantile_uni', 'quantile_norm', 'quantile_uni_fine', 'quantile_norm_fine', 'robust', 'kdi', 'none', 'kdi_random_alpha', 'kdi_uni', 'kdi_random_alpha_uni', 'adaptive', 'norm_and_kdi', 'kdi_alpha_0.3_uni', 'kdi_alpha_0.5_uni', 'kdi_alpha_0.8_uni', 'kdi_alpha_1.0_uni', 'kdi_alpha_1.2_uni', 'kdi_alpha_1.5_uni', 'kdi_alpha_2.0_uni', 'kdi_alpha_3.0_uni', 'kdi_alpha_5.0_uni', 'kdi_alpha_0.3', 'kdi_alpha_0.5', 'kdi_alpha_0.8', 'kdi_alpha_1.0', 'kdi_alpha_1.2', 'kdi_alpha_1.5', 'kdi_alpha_2.0', 'kdi_alpha_3.0', 'kdi_alpha_5.0']` | Name of the preprocessor. |
`categorical_name` | `Literal['none', 'numeric', 'onehot', 'ordinal', 'ordinal_shuffled', 'ordinal_very_common_categories_shuffled']` | Name of the categorical encoding method. Options: "none", "numeric", "onehot", "ordinal", "ordinal_shuffled", "ordinal_very_common_categories_shuffled". |
`append_original` | `bool` | Whether to append the original features to the transformed features. |
`subsample_features` | `float` | Fraction of features to subsample. -1 means no subsampling. |
`global_transformer_name` | `str \| None` | Name of the global transformer to use. |
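The `subsample_features` attribute is documented as a fraction of features, with -1 meaning no subsampling. The hypothetical helper `subsample_feature_indices` sketches those semantics. The rounding rule (round down, keep at least one feature) and the use of stdlib `random.Random` are assumptions, not the library's behaviour.

```python
import random
from typing import Optional


def subsample_feature_indices(
    n_features: int, subsample_features: float, seed: Optional[int] = None
) -> list[int]:
    """Hypothetical sketch of the documented subsample_features semantics."""
    if subsample_features == -1:
        # -1 means no subsampling: keep every feature.
        return list(range(n_features))
    if not 0.0 < subsample_features <= 1.0:
        raise ValueError("subsample_features must be -1 or a fraction in (0, 1]")
    # Round down, but keep at least one feature (both choices are assumptions).
    k = max(1, int(subsample_features * n_features))
    return sorted(random.Random(seed).sample(range(n_features), k))


kept = subsample_feature_indices(10, 0.5, seed=0)
```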
RegressorEnsembleConfig
dataclass
Bases: EnsembleConfig
Configuration for a regression ensemble member.
See EnsembleConfig for more details.
generate_for_classification
classmethod
generate_for_classification(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
class_shift_method: Literal["rotate", "shuffle"] | None,
n_classes: int,
random_state: int | Generator | None
) -> list[ClassifierEnsembleConfig]
Generate ensemble configurations for classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`class_shift_method` | `Literal['rotate', 'shuffle'] \| None` | How to shift classes for class permutation. | required |
`n_classes` | `int` | Number of classes. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[ClassifierEnsembleConfig]` | List of ensemble configurations. |
generate_for_regression
classmethod
generate_for_regression(
*,
n: int,
subsample_size: int | float | None,
max_index: int,
add_fingerprint_feature: bool,
polynomial_features: Literal["no", "all"] | int,
feature_shift_decoder: (
Literal["shuffle", "rotate"] | None
),
preprocessor_configs: Sequence[PreprocessorConfig],
target_transforms: Sequence[
TransformerMixin | Pipeline | None
],
random_state: int | Generator | None
) -> list[RegressorEnsembleConfig]
Generate ensemble configurations for regression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of ensemble configurations to generate. | required |
`subsample_size` | `int \| float \| None` | Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None, no subsampling is done. | required |
`max_index` | `int` | Maximum index to generate for. | required |
`add_fingerprint_feature` | `bool` | Whether to add fingerprint features. | required |
`polynomial_features` | `Literal['no', 'all'] \| int` | Maximum number of polynomial features to add, if any. | required |
`feature_shift_decoder` | `Literal['shuffle', 'rotate'] \| None` | How to shift the features. | required |
`preprocessor_configs` | `Sequence[PreprocessorConfig]` | Preprocessor configurations to use on the data. | required |
`target_transforms` | `Sequence[TransformerMixin \| Pipeline \| None]` | Target transformations to apply. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
Returns:
Type | Description |
---|---|
`list[RegressorEnsembleConfig]` | List of ensemble configurations. |
to_pipeline
to_pipeline(
*, random_state: int | Generator | None
) -> SequentialFeatureTransformer
Convert the ensemble configuration to a preprocessing pipeline.
balance
Take a list of elements and return a new list in which each element appears n times.
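The signature of `balance` is not shown on this page. A minimal re-creation might look like the sketch below. It assumes the function takes an iterable `x` and a repeat count `n` and places the copies of each element next to each other. `balance_sketch` is a hypothetical name, and the ordering of the repeats is an assumption.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def balance_sketch(x: Iterable[T], n: int) -> list[T]:
    """Hypothetical re-creation: return a new list in which every element of x appears n times."""
    # Copies of each element are placed consecutively (assumed ordering).
    return [item for item in x for _ in range(n)]


repeated = balance_sketch(["a", "b"], 2)
```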
default_classifier_preprocessor_configs
default_classifier_preprocessor_configs() -> (
list[PreprocessorConfig]
)
Default preprocessor configurations for classification.
default_regressor_preprocessor_configs
default_regressor_preprocessor_configs() -> (
list[PreprocessorConfig]
)
Default preprocessor configurations for regression.
fit_preprocessing
fit_preprocessing(
configs: Sequence[EnsembleConfig],
X_train: ndarray,
y_train: ndarray,
*,
random_state: int | Generator | None,
cat_ix: list[int],
n_workers: int,
parallel_mode: Literal["block", "as-ready", "in-order"]
) -> Iterator[
tuple[
EnsembleConfig,
SequentialFeatureTransformer,
ndarray,
ndarray,
list[int],
]
]
Fit preprocessing pipelines in parallel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`configs` | `Sequence[EnsembleConfig]` | List of ensemble configurations. | required |
`X_train` | `ndarray` | Training data. | required |
`y_train` | `ndarray` | Training target. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |
`cat_ix` | `list[int]` | Indices of categorical features. | required |
`n_workers` | `int` | Number of workers to use. | required |
`parallel_mode` | `Literal['block', 'as-ready', 'in-order']` | Parallel mode to use: "block", "as-ready", or "in-order". | required |
Returns:
Type | Description |
---|---|
`Iterator[tuple[EnsembleConfig, SequentialFeatureTransformer, ndarray, ndarray, list[int]]]` | Iterator of tuples containing the ensemble configuration, the fitted preprocessing pipeline, the transformed training data, the transformed target, and the indices of categorical features. |
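This reference does not describe what each `parallel_mode` does. The sketch below infers the three modes from their names only:

- "block" waits for every job, then yields the results in input order.
- "as-ready" yields results in completion order.
- "in-order" yields lazily in input order.

`run_parallel` is a hypothetical helper built on a thread pool from `concurrent.futures`. The library may use a different execution backend.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Literal, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_parallel(
    fn: Callable[[T], R],
    items: Iterable[T],
    *,
    n_workers: int,
    parallel_mode: Literal["block", "as-ready", "in-order"],
) -> Iterator[R]:
    """Hypothetical sketch of three ways to hand back results of parallel jobs."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        if parallel_mode == "block":
            # Wait until every job has finished, then yield in input order.
            results = list(pool.map(fn, items))
            yield from results
        elif parallel_mode == "as-ready":
            # Yield each result as soon as its job completes, in any order.
            futures = [pool.submit(fn, item) for item in items]
            for fut in as_completed(futures):
                yield fut.result()
        elif parallel_mode == "in-order":
            # Yield lazily in input order, blocking only on the next result needed.
            yield from pool.map(fn, items)
        else:
            raise ValueError(f"Unknown parallel_mode: {parallel_mode!r}")


squares = list(run_parallel(lambda v: v * v, [1, 2, 3], n_workers=2, parallel_mode="block"))
```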
fit_preprocessing_one
fit_preprocessing_one(
config: EnsembleConfig,
X_train: ndarray,
y_train: ndarray,
random_state: int | Generator | None = None,
*,
cat_ix: list[int]
) -> tuple[
EnsembleConfig,
SequentialFeatureTransformer,
ndarray,
ndarray,
list[int],
]
Fit preprocessing pipeline for a single ensemble configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`config` | `EnsembleConfig` | Ensemble configuration. | required |
`X_train` | `ndarray` | Training data. | required |
`y_train` | `ndarray` | Training target. | required |
`random_state` | `int \| Generator \| None` | Random seed. | `None` |
`cat_ix` | `list[int]` | Indices of categorical features. | required |
Returns:
Type | Description |
---|---|
`tuple[EnsembleConfig, SequentialFeatureTransformer, ndarray, ndarray, list[int]]` | Tuple containing the ensemble configuration, the fitted preprocessing pipeline, the transformed training data, the transformed target, and the indices of categorical features. |
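Each ensemble member carries a `subsample_ix` attribute (documented on `EnsembleConfig`) that lists the training samples it should use. The sketch below shows that row selection in isolation. Whether `fit_preprocessing_one` performs exactly this step internally is an assumption, and `select_member_rows` is a hypothetical helper.

```python
from typing import Optional, Sequence


def select_member_rows(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    subsample_ix: Optional[Sequence[int]],
) -> tuple[list[list[float]], list[float]]:
    """Hypothetical sketch: keep only the training rows assigned to one ensemble member."""
    if subsample_ix is None:
        # No subsampling: the member sees every training row.
        return [list(row) for row in X], list(y)
    # Rows and targets are picked with the same indices so they stay aligned.
    return [list(X[i]) for i in subsample_ix], [y[i] for i in subsample_ix]


X_sub, y_sub = select_member_rows([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0], [2, 0])
```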
generate_index_permutations
generate_index_permutations(
n: int,
*,
max_index: int,
subsample: int | float,
random_state: int | Generator | None
) -> list[NDArray[int64]]
Generate indices for subsampling from the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`n` | `int` | Number of indices to generate. | required |
`max_index` | `int` | Maximum index to generate. | required |
`subsample` | `int \| float` | Number of indices to subsample. If int, subsample that many indices. If float, subsample that fraction of indices. | required |
`random_state` | `int \| Generator \| None` | Random number generator. | required |

Returns:

Type | Description |
---|---|
`list[NDArray[int64]]` | List of indices to subsample. |
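The function draws `n` random index subsets from the range `[0, max_index)`. A stdlib-only sketch of that idea follows, with `random.Random` standing in for a NumPy `Generator`. `generate_index_permutations_sketch` is a hypothetical name. The rounding of float fractions and the capping of int counts at `max_index` are assumptions, not documented behaviour.

```python
import random
from typing import Optional, Union


def generate_index_permutations_sketch(
    n: int,
    *,
    max_index: int,
    subsample: Union[int, float],
    random_state: Optional[int] = None,
) -> list[list[int]]:
    """Hypothetical sketch: draw n random subsets of row indices in [0, max_index)."""
    rng = random.Random(random_state)
    if isinstance(subsample, float):
        if not 0.0 < subsample <= 1.0:
            raise ValueError("a float subsample must lie in (0, 1]")
        # A float is a fraction of max_index (rounding down is an assumption).
        size = int(subsample * max_index)
    else:
        # An int is an absolute count, capped at max_index (the cap is an assumption).
        size = min(subsample, max_index)
    # Each ensemble member gets its own random selection of distinct indices.
    return [rng.sample(range(max_index), size) for _ in range(n)]


idx = generate_index_permutations_sketch(3, max_index=10, subsample=0.5, random_state=0)
```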