Prior Labs

Greedy weighted ensemble

greedy_weighted_ensemble ¶

GreedyWeightedEnsemble ¶

Bases: AbstractValidationUtils

get_oof_per_estimator ¶

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	training data (features)	required
`y`	`ndarray`	training labels	required
`return_loss_per_estimator`	`bool`	if True, also return the loss per estimator.	`False`
`impute_dropped_instances`	`bool`	if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).	`True`
`_extra_processing`	`bool`		`False`

Returns:

Type	Description
`list[ndarray] \| tuple[list[ndarray], list[float]]`	Either only the OOF predictions, or the OOF predictions and the loss per estimator. If `self.is_holdout` is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.
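Because repeated holdout can leave some instances without a prediction, code that consumes the OOF predictions usually has to mask NaN rows before computing a loss. The following is a minimal sketch with NumPy. It uses made-up arrays in place of a real return value, and `valid_row_mask` is a hypothetical helper that is not part of this module.

```python
import numpy as np


def valid_row_mask(oof_per_estimator):
    """Return a boolean mask of rows that every estimator covered (no NaN).

    Hypothetical helper for illustration only.
    """
    mask = np.ones(len(oof_per_estimator[0]), dtype=bool)
    for oof in oof_per_estimator:
        # Flatten trailing dimensions so 1-D and 2-D predictions are handled alike.
        flat = np.asarray(oof, dtype=float).reshape(len(oof), -1)
        mask &= ~np.isnan(flat).any(axis=1)
    return mask


# Illustrative stand-in for the list returned by get_oof_per_estimator.
oof = [
    np.array([[0.9, 0.1], [np.nan, np.nan], [0.2, 0.8]]),
    np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]),
]
mask = valid_row_mask(oof)
```

Indexing both the labels and each prediction array with `mask` keeps them aligned before any loss is computed.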

not_enough_time ¶

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name	Type	Description	Default
`current_repeat`	`int`	The current repeat index	required

Returns:

Name	Type	Description
`bool`	`bool`	True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.
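One way such a heuristic could work is to compare the remaining time budget with the average duration of the repeats completed so far. The sketch below is an assumption about the general idea, not the library's implementation: `likely_not_enough_time` and its arguments are hypothetical, and `current_repeat` is assumed to be a 0-based index of the repeat that has just finished.

```python
def likely_not_enough_time(elapsed_seconds, time_limit_seconds, current_repeat):
    """Hypothetical helper: True if another repeat probably will not fit."""
    if time_limit_seconds is None:
        # Without a time limit there is nothing to run out of.
        return False
    completed_repeats = current_repeat + 1  # assumes a 0-based index
    average_per_repeat = elapsed_seconds / completed_repeats
    remaining = time_limit_seconds - elapsed_seconds
    # Stop early when the remaining budget is below the average cost of a repeat.
    return remaining < average_per_repeat
```

Since the estimate relies on an average, a single unusually slow or fast repeat can make the prediction wrong in either direction.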

set_time_limit ¶

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached ¶

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Name	Type	Description
`bool`	`bool`	True if the time limit has been reached; False otherwise, including when no time limit was set
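The interplay between `set_time_limit` and `time_limit_reached` can be illustrated with a small standalone timer. This is a sketch under stated assumptions: the class `TimeLimitTracker` and its `time_limit` attribute are hypothetical names, and the real class may store and log this state differently.

```python
import time


class TimeLimitTracker:
    """Hypothetical stand-in that mimics the two timing methods."""

    def __init__(self, time_limit=None):
        self.time_limit = time_limit  # seconds, or None for no limit
        self._start = None

    def set_time_limit(self):
        # Record the start time; call this at the beginning of validation.
        self._start = time.monotonic()

    def time_limit_reached(self):
        # No limit (or timer never started) means the limit is never reached.
        if self.time_limit is None or self._start is None:
            return False
        return time.monotonic() - self._start >= self.time_limit


tracker = TimeLimitTracker(time_limit=3600.0)
tracker.set_time_limit()
reached = tracker.time_limit_reached()
```

The sketch uses `time.monotonic()` so that system clock adjustments cannot distort the elapsed time.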

GreedyWeightedEnsembleClassifier ¶

Bases: GreedyWeightedEnsemble, AbstractValidationUtilsClassification

get_oof_per_estimator ¶

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	training data (features)	required
`y`	`ndarray`	training labels	required
`return_loss_per_estimator`	`bool`	if True, also return the loss per estimator.	`False`
`impute_dropped_instances`	`bool`	if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).	`True`
`_extra_processing`	`bool`		`False`

Returns:

Type	Description
`list[ndarray] \| tuple[list[ndarray], list[float]]`	Either only the OOF predictions, or the OOF predictions and the loss per estimator. If `self.is_holdout` is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.

not_enough_time ¶

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name	Type	Description	Default
`current_repeat`	`int`	The current repeat index	required

Returns:

Name	Type	Description
`bool`	`bool`	True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.

set_time_limit ¶

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached ¶

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Name	Type	Description
`bool`	`bool`	True if the time limit has been reached; False otherwise, including when no time limit was set

GreedyWeightedEnsembleRegressor ¶

Bases: GreedyWeightedEnsemble, AbstractValidationUtilsRegression

get_oof_per_estimator ¶

get_oof_per_estimator(
    X: ndarray,
    y: ndarray,
    *,
    return_loss_per_estimator: bool = False,
    impute_dropped_instances: bool = True,
    _extra_processing: bool = False
) -> list[ndarray] | tuple[list[ndarray], list[float]]

Get out-of-fold (OOF) predictions for each base model.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	training data (features)	required
`y`	`ndarray`	training labels	required
`return_loss_per_estimator`	`bool`	if True, also return the loss per estimator.	`False`
`impute_dropped_instances`	`bool`	if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).	`True`
`_extra_processing`	`bool`		`False`

Returns:

Type	Description
`list[ndarray] \| tuple[list[ndarray], list[float]]`	Either only the OOF predictions, or the OOF predictions and the loss per estimator. If `self.is_holdout` is True, the OOF predictions can contain NaN values for instances not covered during repeated holdout.

not_enough_time ¶

not_enough_time(current_repeat: int) -> bool

Simple heuristic to stop cross-validation early if not enough time is left for another repeat.

Parameters:

Name	Type	Description	Default
`current_repeat`	`int`	The current repeat index	required

Returns:

Name	Type	Description
`bool`	`bool`	True if there likely isn't enough time for another repeat, False otherwise

Note

This is a heuristic based on average time per repeat so far and may not be exact.

set_time_limit ¶

set_time_limit() -> None

Initialize the timer for time-limited execution.

Sets the start time for time limit tracking and logs the time limit info. This method should be called at the beginning of validation.

time_limit_reached ¶

time_limit_reached() -> bool

Check if the time limit for execution has been reached.

Returns:

Name	Type	Description
`bool`	`bool`	True if the time limit has been reached; False otherwise, including when no time limit was set

caruana_weighted ¶

caruana_weighted(
    predictions: list[ndarray],
    labels: ndarray,
    seed,
    n_iterations,
    loss_function,
)

Caruana's ensemble selection with replacement.
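For orientation, ensemble selection with replacement greedily adds, at each iteration, the base model whose inclusion gives the lowest loss for the averaged ensemble. A model may be picked several times, and its selection count becomes its weight. The sketch below illustrates that general technique and is not the body of `caruana_weighted`: the function name, the `loss_function(labels, predictions)` argument order, and the return value are assumptions, and the `seed` argument is left out (ties go to the first candidate here).

```python
import numpy as np


def greedy_selection_with_replacement(predictions, labels, n_iterations, loss_function):
    """Hypothetical sketch: per-model weights from greedy selection with replacement.

    Assumes n_iterations >= 1 and loss_function(labels, ensemble_prediction).
    """
    counts = np.zeros(len(predictions), dtype=int)
    running_sum = np.zeros_like(predictions[0], dtype=float)
    for step in range(1, n_iterations + 1):
        # Loss of the ensemble average if each candidate were added next.
        losses = [loss_function(labels, (running_sum + p) / step) for p in predictions]
        best = int(np.argmin(losses))  # first index wins on ties
        counts[best] += 1
        running_sum += predictions[best]
    # Selection frequencies act as ensemble weights and sum to 1.
    return counts / counts.sum()


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


labels = np.array([1.0, 2.0, 3.0])
predictions = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])]
weights = greedy_selection_with_replacement(predictions, labels, 5, mse)
```

In this toy example the first model already matches the labels exactly, so it is selected in every iteration and receives all of the weight.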