sklearn_based_random_forest_tabpfn

Random Forest implementation that uses TabPFN at the leaf nodes.

RandomForestTabPFNBase

Base class for shared functionality.

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

RandomForestTabPFNClassifier

Bases: RandomForestTabPFNBase, RandomForestClassifier

RandomForestTabPFNClassifier implements Random Forest using TabPFN at leaf nodes.

This classifier combines decision trees with TabPFN models at the leaf nodes for improved performance on tabular data. It extends scikit-learn's RandomForestClassifier with TabPFN's neural network capabilities.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tabpfn | | TabPFNClassifier instance to use at leaf nodes | None |
| n_jobs | | Number of parallel jobs | 1 |
| categorical_features | | List of categorical feature indices | None |
| show_progress | | Whether to display progress during fitting | False |
| verbose | | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | | Whether to use adaptive tree-based method | True |
| fit_nodes | | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | | Metric used for adaptive node fitting | 'log_loss' |
| adaptive_tree_test_size | | Test size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | | Minimum validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | | Whether to preprocess X only once | False |
| max_predict_time | | Maximum time allowed for prediction (seconds) | 60 |
| rf_average_logits | | Whether to average logits instead of probabilities | True |
| dt_average_logits | | Whether to average logits in decision trees | True |
| adaptive_tree_skip_class_missing | | Whether to skip classes missing in nodes | True |
| n_estimators | | Number of trees in the forest | 100 |
| criterion | | Function to measure split quality | 'gini' |
| max_depth | | Maximum depth of the trees | 5 |
| min_samples_split | | Minimum samples required to split a node | 1000 |
| min_samples_leaf | | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | | Number of features to consider for best split | 'sqrt' |
| max_leaf_nodes | | Maximum number of leaf nodes | None |
| min_impurity_decrease | | Minimum impurity decrease required for split | 0.0 |
| bootstrap | | Whether to use bootstrap samples | True |
| oob_score | | Whether to use out-of-bag samples | False |
| random_state | | Controls randomness of the estimator | None |
| warm_start | | Whether to reuse previous solution | False |
| class_weight | | Weights associated with classes | None |
| ccp_alpha | | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | | Number of samples to draw to train each tree | None |
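
A minimal usage sketch follows. The import path (`tabpfn_extensions.rf_pfn`), the synthetic data, and the small `n_estimators`/`max_depth` values are illustrative assumptions, not recommended settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier
# Import path is an assumption; adjust it to where the class lives in your installation.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNClassifier

# Synthetic binary classification data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestTabPFNClassifier(
    tabpfn=TabPFNClassifier(),  # TabPFN model used at the leaf nodes
    n_estimators=4,             # small forest keeps the sketch fast
    max_depth=3,
)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)          # shape (n_test,)
proba = clf.predict_proba(X_test)   # shape (n_test, n_classes)
```

Because every leaf node fits a TabPFN model, deeper or larger forests multiply the number of TabPFN fits, which is why shallow trees and a modest number of estimators are the usual starting point.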

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

init_base_estimator

init_base_estimator()

Initialize a base decision tree estimator.

Returns:

| Type | Description |
|------|-------------|
| | A new DecisionTreeTabPFNClassifier instance |

predict

predict(X: ndarray) -> ndarray

Predict class for X.

The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| y | ndarray | ndarray of shape (n_samples,). The predicted classes. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |
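
As a hedged illustration of the voting rule described above (the data and the import path are assumptions; `classes_` is inherited from scikit-learn's RandomForestClassifier), the prediction should correspond to the argmax of the averaged class probabilities:

```python
import numpy as np
from tabpfn import TabPFNClassifier
# Import path is an assumption.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNClassifier

rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestTabPFNClassifier(tabpfn=TabPFNClassifier(), n_estimators=2, max_depth=2)
clf.fit(X, y)

# The class with the highest mean probability estimate across the trees...
proba = clf.predict_proba(X)
vote = clf.classes_[np.argmax(proba, axis=1)]
# ...should match the output of predict(), up to the probability/logit
# averaging mode configured at construction time.
print(np.mean(vote == clf.predict(X)))
```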

predict_proba

predict_proba(X: ndarray) -> ndarray

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| p | ndarray | ndarray of shape (n_samples, n_classes). The class probabilities of the input samples. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |

RandomForestTabPFNRegressor

Bases: RandomForestTabPFNBase, RandomForestRegressor

RandomForestTabPFNRegressor implements a Random Forest using TabPFN at leaf nodes.

This regressor combines decision trees with TabPFN models at the leaf nodes for improved regression performance on tabular data. It extends scikit-learn's RandomForestRegressor with TabPFN's neural network capabilities.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tabpfn | | TabPFNRegressor instance to use at leaf nodes | None |
| n_jobs | | Number of parallel jobs | 1 |
| categorical_features | | List of categorical feature indices | None |
| show_progress | | Whether to display progress during fitting | False |
| verbose | | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | | Whether to use adaptive tree-based method | True |
| fit_nodes | | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | | Metric used for adaptive node fitting | 'rmse' |
| adaptive_tree_test_size | | Test size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | | Minimum validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | | Whether to preprocess X only once | False |
| max_predict_time | | Maximum time allowed for prediction (seconds) | -1 |
| rf_average_logits | | Whether to average logits instead of raw predictions | False |
| n_estimators | | Number of trees in the forest | 16 |
| criterion | | Function to measure split quality | 'friedman_mse' |
| max_depth | | Maximum depth of the trees | 5 |
| min_samples_split | | Minimum samples required to split a node | 300 |
| min_samples_leaf | | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | | Number of features to consider for best split | 'sqrt' |
| max_leaf_nodes | | Maximum number of leaf nodes | None |
| min_impurity_decrease | | Minimum impurity decrease required for split | 0.0 |
| bootstrap | | Whether to use bootstrap samples | True |
| oob_score | | Whether to use out-of-bag samples | False |
| random_state | | Controls randomness of the estimator | None |
| warm_start | | Whether to reuse previous solution | False |
| ccp_alpha | | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | | Number of samples to draw to train each tree | None |
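
A minimal usage sketch for the regressor, under the same assumptions as the classifier example above (the import path and synthetic data are illustrative only):

```python
import numpy as np
from tabpfn import TabPFNRegressor
# Import path is an assumption; adjust it to your installation.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNRegressor

# Synthetic regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((400, 8))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

reg = RandomForestTabPFNRegressor(
    tabpfn=TabPFNRegressor(),   # TabPFN model used at the leaf nodes
    n_estimators=4,
    max_depth=3,
)
reg.fit(X, y)

y_pred = reg.predict(X[:10])    # mean of the per-tree predictions
```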

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

init_base_estimator

init_base_estimator()

Initialize a base decision tree estimator.

Returns:

| Type | Description |
|------|-------------|
| | A new DecisionTreeTabPFNRegressor instance |

predict

predict(X: ndarray) -> ndarray

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the trees in the forest.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| y | ndarray | ndarray of shape (n_samples,) or (n_samples, n_outputs). The predicted values. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |

softmax_numpy

softmax_numpy(logits: ndarray) -> ndarray

Apply softmax to numpy array of logits.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| logits | ndarray | Input logits array | required |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Probabilities after softmax |
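
For reference, a numerically stable softmax over the last axis in NumPy typically looks like the sketch below. This illustrates the operation described above; it is not necessarily the exact implementation, and the axis choice is an assumption:

```python
import numpy as np

def softmax_numpy_sketch(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (illustrative sketch)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # guard against overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

probs = softmax_numpy_sketch(np.array([[2.0, 1.0, 0.1]]))
# Each row of `probs` sums to 1, e.g. approximately [[0.659, 0.242, 0.099]].
```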