greedy_weighted_ensemble

GreedyWeightedEnsemble

Bases: AbstractValidationUtils

get_oof_per_estimator

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name Type Description Default
X ndarray

training data (features)

required
y ndarray

training labels

required
return_loss_per_estimator bool

if True, also return the loss per estimator.

False
impute_dropped_instances bool

if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).

True
_extra_processing bool
False

Returns:

Type Description
list[ndarray] | tuple[list[ndarray], list[float]]

Either the OOF predictions alone, or the OOF predictions together with the loss per estimator.

If self.is_holdout is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.
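
To illustrate why repeated holdout can leave NaN entries, here is a minimal NumPy sketch. It is not this library's implementation: `repeated_holdout_oof_sketch` is a hypothetical helper, and a constant mean predictor stands in for a real base model. Rows that never land in a held-out split receive no prediction and stay NaN, so they should be masked before computing a loss.

```python
import numpy as np


def repeated_holdout_oof_sketch(y, n_repeats=1, test_fraction=0.3, seed=0):
    """Hypothetical helper: average held-out predictions over repeated holdout."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, round(n * test_fraction))
    pred_sum = np.zeros(n)
    n_seen = np.zeros(n, dtype=int)
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        # Stand-in "model": predict the training mean for every held-out row.
        pred_sum[test_idx] += y[train_idx].mean()
        n_seen[test_idx] += 1
    # Rows that were never held out keep NaN as their OOF prediction.
    oof = np.full(n, np.nan)
    covered = n_seen > 0
    oof[covered] = pred_sum[covered] / n_seen[covered]
    return oof


y = np.arange(10, dtype=float)
oof = repeated_holdout_oof_sketch(y, n_repeats=1, test_fraction=0.3)
covered = ~np.isnan(oof)
# Score only the rows that were held out at least once.
mse = np.mean((y[covered] - oof[covered]) ** 2)
```

With a single repeat and a 30% holdout, only 3 of the 10 rows are covered; more repeats typically shrink the NaN region but do not guarantee full coverage.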

not_enough_time

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name Type Description Default
current_repeat int

The current repeat index

required

Returns:

Type Description
bool

True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.
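
One way such a heuristic could look is sketched below. This is an assumption, not the library's code: `not_enough_time_sketch` is a hypothetical function, the elapsed time and time limit are passed in explicitly, and `current_repeat` is assumed to be zero-based.

```python
def not_enough_time_sketch(elapsed, time_limit, current_repeat):
    """Hypothetical heuristic: compare remaining time to the average repeat time."""
    if time_limit is None:
        # Without a time limit there is always "enough" time.
        return False
    completed_repeats = current_repeat + 1  # assumes a zero-based repeat index
    avg_time_per_repeat = elapsed / completed_repeats
    remaining = time_limit - elapsed
    return remaining < avg_time_per_repeat


# 3 repeats took 30 s (10 s each) and 30 s remain: another repeat fits.
fits = not_enough_time_sketch(elapsed=30.0, time_limit=60.0, current_repeat=2)
# 5 repeats took 55 s (11 s each) and only 5 s remain: stop early.
stop = not_enough_time_sketch(elapsed=55.0, time_limit=60.0, current_repeat=4)
```

Because the estimate relies on the average of past repeats, a repeat that runs slower than average can still overshoot the limit.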

set_time_limit

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Type Description
bool

True if the time limit has been reached; False otherwise, including when no time limit was set.
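
The interplay of set_time_limit and time_limit_reached can be sketched with a small stand-alone class. `TimerSketch` and its attributes are hypothetical and only mirror the documented behavior (start the clock once, report False when no limit is set); the real class may track time differently.

```python
import time


class TimerSketch:
    """Hypothetical stand-in for the documented time-limit helpers."""

    def __init__(self, time_limit=None):
        self.time_limit = time_limit  # seconds, or None for "no limit"
        self._start = None

    def set_time_limit(self):
        # Record the start time; call this once at the beginning of validation.
        self._start = time.monotonic()

    def time_limit_reached(self):
        # No limit configured, or timer never started: never report "reached".
        if self.time_limit is None or self._start is None:
            return False
        return time.monotonic() - self._start >= self.time_limit


timer = TimerSketch(time_limit=1000.0)
timer.set_time_limit()
reached = timer.time_limit_reached()
```

The sketch uses `time.monotonic()` so that system clock adjustments cannot affect the elapsed-time measurement.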

GreedyWeightedEnsembleClassifier

Bases: GreedyWeightedEnsemble, AbstractValidationUtilsClassification

get_oof_per_estimator

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name Type Description Default
X ndarray

training data (features)

required
y ndarray

training labels

required
return_loss_per_estimator bool

if True, also return the loss per estimator.

False
impute_dropped_instances bool

if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).

True
_extra_processing bool
False

Returns:

Type Description
list[ndarray] | tuple[list[ndarray], list[float]]

Either the OOF predictions alone, or the OOF predictions together with the loss per estimator.

If self.is_holdout is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.

not_enough_time

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name Type Description Default
current_repeat int

The current repeat index

required

Returns:

Type Description
bool

True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.

set_time_limit

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Type Description
bool

True if the time limit has been reached; False otherwise, including when no time limit was set.

GreedyWeightedEnsembleRegressor

Bases: GreedyWeightedEnsemble, AbstractValidationUtilsRegression

get_oof_per_estimator

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name Type Description Default
X ndarray

training data (features)

required
y ndarray

training labels

required
return_loss_per_estimator bool

if True, also return the loss per estimator.

False
impute_dropped_instances bool

if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).

True
_extra_processing bool
False

Returns:

Type Description
list[ndarray] | tuple[list[ndarray], list[float]]

Either the OOF predictions alone, or the OOF predictions together with the loss per estimator.

If self.is_holdout is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.

not_enough_time

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name Type Description Default
current_repeat int

The current repeat index

required

Returns:

Type Description
bool

True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.

set_time_limit

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Type Description
bool

True if the time limit has been reached; False otherwise, including when no time limit was set.

caruana_weighted

caruana_weighted(
    predictions: list[ndarray],
    labels: ndarray,
    seed,
    n_iterations,
    loss_function,
)

Caruana's ensemble selection with replacement.
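
For readers unfamiliar with the technique, the following is a minimal NumPy sketch of greedy ensemble selection with replacement. It is written under assumptions and is not the body of `caruana_weighted`: the function name `caruana_selection_sketch`, the argument order, the random tie-breaking, and the returned normalized weights are all illustrative choices. In each iteration the model whose addition gives the lowest loss of the averaged ensemble is added (a model may be picked repeatedly), and the selection counts at the best loss seen become the weights.

```python
import numpy as np


def caruana_selection_sketch(predictions, labels, n_iterations, loss_function, seed=0):
    """Hypothetical sketch: greedy ensemble selection with replacement."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(predictions), dtype=int)
    running_sum = np.zeros_like(predictions[0], dtype=float)
    best_loss, best_counts = np.inf, counts.copy()
    for iteration in range(1, n_iterations + 1):
        # Loss of the averaged ensemble if each candidate were added next.
        losses = np.array(
            [loss_function(labels, (running_sum + p) / iteration) for p in predictions]
        )
        # Break ties between equally good candidates at random (an assumption).
        chosen = rng.choice(np.flatnonzero(losses == losses.min()))
        counts[chosen] += 1
        running_sum += predictions[chosen]
        if losses[chosen] < best_loss:
            best_loss, best_counts = losses[chosen], counts.copy()
    # Selection counts at the best iteration, normalized to sum to 1.
    return best_counts / best_counts.sum()


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


labels = np.array([1.0, 2.0, 3.0])
# Two models with opposite biases: their average is a perfect predictor.
predictions = [labels + 1.0, labels - 1.0]
weights = caruana_selection_sketch(predictions, labels, n_iterations=4, loss_function=mse)
```

Selecting with replacement lets strong models accumulate more weight, while tracking the best iteration keeps later selections from degrading the final ensemble.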