sklearn_based_random_forest_tabpfn ¶
Random Forest implementation that uses TabPFN at the leaf nodes.
RandomForestTabPFNBase ¶
Base class providing functionality shared by the TabPFN random forest classifier and regressor.
fit ¶
Fits RandomForestTabPFN.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|---|---|
| | Fitted model |

Raises:

| Type | Description |
|---|---|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |
get_n_estimators ¶
Get the number of estimators to use.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|---|---|
| int | Number of estimators |
RandomForestTabPFNClassifier ¶
Bases: RandomForestTabPFNBase, RandomForestClassifier
RandomForestTabPFNClassifier implements Random Forest using TabPFN at leaf nodes.
This classifier combines decision trees with TabPFN models at the leaf nodes for improved performance on tabular data. It extends scikit-learn's RandomForestClassifier with TabPFN's neural network capabilities.
Parameters:

| Name | Description | Default |
|---|---|---|
| tabpfn | TabPFNClassifier instance to use at leaf nodes | None |
| n_jobs | Number of parallel jobs | 1 |
| categorical_features | List of categorical feature indices | None |
| show_progress | Whether to display progress during fitting | False |
| verbose | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | Whether to use the adaptive tree-based method | True |
| fit_nodes | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | Metric used for adaptive node fitting | 'log_loss' |
| adaptive_tree_test_size | Test-set size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | Minimum number of validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | Whether to preprocess X only once | False |
| max_predict_time | Maximum time allowed for prediction (seconds) | 60 |
| rf_average_logits | Whether to average logits instead of probabilities across trees | True |
| dt_average_logits | Whether to average logits in decision trees | True |
| adaptive_tree_skip_class_missing | Whether to skip classes missing in nodes | True |
| n_estimators | Number of trees in the forest | 100 |
| criterion | Function to measure split quality | 'gini' |
| max_depth | Maximum depth of the trees | 5 |
| min_samples_split | Minimum samples required to split a node | 1000 |
| min_samples_leaf | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | Number of features to consider when looking for the best split | 'sqrt' |
| max_leaf_nodes | Maximum number of leaf nodes | None |
| min_impurity_decrease | Minimum impurity decrease required to split a node | 0.0 |
| bootstrap | Whether to use bootstrap samples | True |
| oob_score | Whether to use out-of-bag samples to estimate the generalization score | False |
| random_state | Controls the randomness of the estimator | None |
| warm_start | Whether to reuse the solution of the previous fit | False |
| class_weight | Weights associated with classes | None |
| ccp_alpha | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | Number of samples to draw to train each tree | None |
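The effect of `rf_average_logits` can be seen in a small NumPy sketch (illustrative only, not code from this package; the `softmax` helper and toy logits are made up): averaging per-tree logits before the softmax generally yields different ensemble probabilities than averaging per-tree probabilities.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-tree logits for one sample over 3 classes (2 trees).
logits = np.array([[2.0, 0.0, -1.0],
                   [0.0, 3.0, -2.0]])

# rf_average_logits=True: average logits first, then apply softmax once.
p_logit_avg = softmax(logits.mean(axis=0))

# rf_average_logits=False: softmax per tree, then average the probabilities.
p_prob_avg = softmax(logits, axis=-1).mean(axis=0)

print(p_logit_avg)
print(p_prob_avg)
```

Both results are valid probability vectors, but logit averaging weights confident trees more heavily, which is why it is the default here.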
fit ¶
Fits RandomForestTabPFN.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|---|---|
| | Fitted model |

Raises:

| Type | Description |
|---|---|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |
get_n_estimators ¶
Get the number of estimators to use.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|---|---|
| int | Number of estimators |
init_base_estimator ¶
Initialize a base decision tree estimator.
Returns:

| Type | Description |
|---|---|
| DecisionTreeTabPFNClassifier | A new DecisionTreeTabPFNClassifier instance |
predict ¶
Predict class for X.
The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like or sparse matrix of shape (n_samples, n_features) | The input samples | required |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray of shape (n_samples,) | The predicted classes |

Raises:

| Type | Description |
|---|---|
| ValueError | If model is not fitted |
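The soft-voting rule described above can be sketched in a few lines of NumPy (the per-tree probabilities are made-up toy values, not output from this package):

```python
import numpy as np

# Hypothetical class-probability estimates: 3 trees x 2 samples x 2 classes.
tree_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # tree 1
    [[0.6, 0.4], [0.4, 0.6]],   # tree 2
    [[0.3, 0.7], [0.1, 0.9]],   # tree 3
])

mean_probs = tree_probs.mean(axis=0)   # average probabilities over trees
predicted = mean_probs.argmax(axis=1)  # class with highest mean probability
print(predicted)  # → [0 1]
```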
predict_proba ¶
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like or sparse matrix of shape (n_samples, n_features) | The input samples | required |

Returns:

| Name | Type | Description |
|---|---|---|
| p | ndarray of shape (n_samples, n_classes) | The class probabilities of the input samples |

Raises:

| Type | Description |
|---|---|
| ValueError | If model is not fitted |
RandomForestTabPFNRegressor ¶
Bases: RandomForestTabPFNBase, RandomForestRegressor
RandomForestTabPFNRegressor implements a Random Forest using TabPFN at leaf nodes.
This regressor combines decision trees with TabPFN models at the leaf nodes for improved regression performance on tabular data. It extends scikit-learn's RandomForestRegressor with TabPFN's neural network capabilities.
Parameters:

| Name | Description | Default |
|---|---|---|
| tabpfn | TabPFNRegressor instance to use at leaf nodes | None |
| n_jobs | Number of parallel jobs | 1 |
| categorical_features | List of categorical feature indices | None |
| show_progress | Whether to display progress during fitting | False |
| verbose | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | Whether to use the adaptive tree-based method | True |
| fit_nodes | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | Metric used for adaptive node fitting | 'rmse' |
| adaptive_tree_test_size | Test-set size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | Minimum number of validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | Whether to preprocess X only once | False |
| max_predict_time | Maximum time allowed for prediction (seconds) | -1 |
| rf_average_logits | Whether to average logits instead of raw predictions | False |
| n_estimators | Number of trees in the forest | 16 |
| criterion | Function to measure split quality | 'friedman_mse' |
| max_depth | Maximum depth of the trees | 5 |
| min_samples_split | Minimum samples required to split a node | 300 |
| min_samples_leaf | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | Number of features to consider when looking for the best split | 'sqrt' |
| max_leaf_nodes | Maximum number of leaf nodes | None |
| min_impurity_decrease | Minimum impurity decrease required to split a node | 0.0 |
| bootstrap | Whether to use bootstrap samples | True |
| oob_score | Whether to use out-of-bag samples to estimate the generalization score | False |
| random_state | Controls the randomness of the estimator | None |
| warm_start | Whether to reuse the solution of the previous fit | False |
| ccp_alpha | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | Number of samples to draw to train each tree | None |
fit ¶
Fits RandomForestTabPFN.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|---|---|
| | Fitted model |

Raises:

| Type | Description |
|---|---|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |
get_n_estimators ¶
Get the number of estimators to use.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|---|---|
| int | Number of estimators |
init_base_estimator ¶
Initialize a base decision tree estimator.
Returns:

| Type | Description |
|---|---|
| DecisionTreeTabPFNRegressor | A new DecisionTreeTabPFNRegressor instance |
predict ¶
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the trees in the forest.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like or sparse matrix of shape (n_samples, n_features) | The input samples | required |

Returns:

| Name | Type | Description |
|---|---|---|
| y | ndarray of shape (n_samples,) or (n_samples, n_outputs) | The predicted values |

Raises:

| Type | Description |
|---|---|
| ValueError | If model is not fitted |
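The averaging rule above amounts to a single `mean` over the tree axis; a minimal NumPy sketch with made-up per-tree predictions (not output from this package):

```python
import numpy as np

# Hypothetical regression predictions: 3 trees x 4 samples.
tree_preds = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 1.8, 3.3, 3.9],
    [0.8, 2.2, 2.7, 4.1],
])

# Forest prediction = mean of the per-tree predictions for each sample.
y_pred = tree_preds.mean(axis=0)
print(y_pred)
```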
softmax_numpy ¶
Apply softmax to numpy array of logits.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| logits | ndarray | Input logits array | required |

Returns:

| Type | Description |
|---|---|
| ndarray | Probabilities after softmax |
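A numerically stable version of such a helper can be written as follows (a sketch consistent with the signature above, not necessarily this package's exact implementation; the max-subtraction trick is a standard stability measure and does not change the result):

```python
import numpy as np

def softmax_numpy(logits: np.ndarray) -> np.ndarray:
    """Softmax over the last axis, stabilized by subtracting the max logit."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

probs = softmax_numpy(np.array([1.0, 2.0, 3.0]))
print(probs)
```

Because of the shift, the function returns identical probabilities for `[1, 2, 3]` and `[1001, 1002, 1003]`, where a naive `np.exp` would overflow.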