sklearn_based_random_forest_tabpfn

Random Forest implementation that uses TabPFN at the leaf nodes.

RandomForestTabPFNBase

Base class for shared functionality.

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

RandomForestTabPFNClassifier

Bases: RandomForestTabPFNBase, RandomForestClassifier

RandomForestTabPFNClassifier implements Random Forest using TabPFN at leaf nodes.

This classifier combines decision trees with TabPFN models at the leaf nodes for improved performance on tabular data. It extends scikit-learn's RandomForestClassifier with TabPFN's neural network capabilities.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tabpfn | | TabPFNClassifier instance to use at leaf nodes | None |
| n_jobs | | Number of parallel jobs | 1 |
| categorical_features | | List of categorical feature indices | None |
| show_progress | | Whether to display progress during fitting | False |
| verbose | | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | | Whether to use adaptive tree-based method | True |
| fit_nodes | | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | | Metric used for adaptive node fitting | 'log_loss' |
| adaptive_tree_test_size | | Test size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | | Minimum validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | | Whether to preprocess X only once | False |
| max_predict_time | | Maximum time allowed for prediction (seconds) | 60 |
| rf_average_logits | | Whether to average logits instead of probabilities | True |
| dt_average_logits | | Whether to average logits in decision trees | True |
| adaptive_tree_skip_class_missing | | Whether to skip classes missing in nodes | True |
| n_estimators | | Number of trees in the forest | 100 |
| criterion | | Function to measure split quality | 'gini' |
| max_depth | | Maximum depth of the trees | 5 |
| min_samples_split | | Minimum samples required to split a node | 1000 |
| min_samples_leaf | | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | | Number of features to consider for best split | 'sqrt' |
| max_leaf_nodes | | Maximum number of leaf nodes | None |
| min_impurity_decrease | | Minimum impurity decrease required for split | 0.0 |
| bootstrap | | Whether to use bootstrap samples | True |
| oob_score | | Whether to use out-of-bag samples | False |
| random_state | | Controls randomness of the estimator | None |
| warm_start | | Whether to reuse previous solution | False |
| class_weight | | Weights associated with classes | None |
| ccp_alpha | | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | | Number of samples to draw to train each tree | None |
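
A minimal usage sketch follows. The import path (`tabpfn_extensions.rf_pfn`), the synthetic data, and the small `n_estimators`/`max_depth` values are illustrative assumptions, not recommended settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier
# Import path is an assumption; adjust it to where the class lives in your installation.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNClassifier

# Synthetic binary classification data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestTabPFNClassifier(
    tabpfn=TabPFNClassifier(),  # TabPFN model used at the leaf nodes
    n_estimators=4,             # small forest keeps the sketch fast
    max_depth=3,
)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)          # shape (n_test,)
proba = clf.predict_proba(X_test)   # shape (n_test, n_classes)
```

Because every leaf node fits a TabPFN model, deeper or larger forests multiply the number of TabPFN fits, which is why shallow trees and a modest number of estimators are the usual starting point.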

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

init_base_estimator

init_base_estimator()

Initialize a base decision tree estimator.

Returns:

| Type | Description |
|------|-------------|
| | A new DecisionTreeTabPFNClassifier instance |

predict

predict(X: ndarray) -> ndarray

Predict class for X.

The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| y | ndarray | ndarray of shape (n_samples,). The predicted classes. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |
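
As a hedged illustration of the voting rule described above (the data and the import path are assumptions; `classes_` is inherited from scikit-learn's RandomForestClassifier), the prediction should correspond to the argmax of the averaged class probabilities:

```python
import numpy as np
from tabpfn import TabPFNClassifier
# Import path is an assumption.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNClassifier

rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestTabPFNClassifier(tabpfn=TabPFNClassifier(), n_estimators=2, max_depth=2)
clf.fit(X, y)

# The class with the highest mean probability estimate across the trees...
proba = clf.predict_proba(X)
vote = clf.classes_[np.argmax(proba, axis=1)]
# ...should match the output of predict(), up to the probability/logit
# averaging mode configured at construction time.
print(np.mean(vote == clf.predict(X)))
```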

predict_proba

predict_proba(X: ndarray) -> ndarray

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| p | ndarray | ndarray of shape (n_samples, n_classes). The class probabilities of the input samples. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |

RandomForestTabPFNRegressor

Bases: RandomForestTabPFNBase, RandomForestRegressor

RandomForestTabPFNRegressor implements a Random Forest using TabPFN at leaf nodes.

This regressor combines decision trees with TabPFN models at the leaf nodes for improved regression performance on tabular data. It extends scikit-learn's RandomForestRegressor with TabPFN's neural network capabilities.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tabpfn | | TabPFNRegressor instance to use at leaf nodes | None |
| n_jobs | | Number of parallel jobs | 1 |
| categorical_features | | List of categorical feature indices | None |
| show_progress | | Whether to display progress during fitting | False |
| verbose | | Verbosity level (0=quiet, >0=verbose) | 0 |
| adaptive_tree | | Whether to use adaptive tree-based method | True |
| fit_nodes | | Whether to fit the leaf node models | True |
| adaptive_tree_overwrite_metric | | Metric used for adaptive node fitting | 'rmse' |
| adaptive_tree_test_size | | Test size for adaptive node fitting | 0.2 |
| adaptive_tree_min_train_samples | | Minimum samples for training leaf nodes | 100 |
| adaptive_tree_max_train_samples | | Maximum samples for training leaf nodes | 5000 |
| adaptive_tree_min_valid_samples_fraction_of_train | | Minimum validation samples as a fraction of training samples | 0.2 |
| preprocess_X_once | | Whether to preprocess X only once | False |
| max_predict_time | | Maximum time allowed for prediction (seconds) | -1 |
| rf_average_logits | | Whether to average logits instead of raw predictions | False |
| n_estimators | | Number of trees in the forest | 16 |
| criterion | | Function to measure split quality | 'friedman_mse' |
| max_depth | | Maximum depth of the trees | 5 |
| min_samples_split | | Minimum samples required to split a node | 300 |
| min_samples_leaf | | Minimum samples required at a leaf node | 5 |
| min_weight_fraction_leaf | | Minimum weighted fraction of the total sample weight required at a leaf node | 0.0 |
| max_features | | Number of features to consider for best split | 'sqrt' |
| max_leaf_nodes | | Maximum number of leaf nodes | None |
| min_impurity_decrease | | Minimum impurity decrease required for split | 0.0 |
| bootstrap | | Whether to use bootstrap samples | True |
| oob_score | | Whether to use out-of-bag samples | False |
| random_state | | Controls randomness of the estimator | None |
| warm_start | | Whether to reuse previous solution | False |
| ccp_alpha | | Complexity parameter for minimal cost-complexity pruning | 0.0 |
| max_samples | | Number of samples to draw to train each tree | None |
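
A minimal usage sketch for the regressor, under the same assumptions as the classifier example above (the import path and synthetic data are illustrative only):

```python
import numpy as np
from tabpfn import TabPFNRegressor
# Import path is an assumption; adjust it to your installation.
from tabpfn_extensions.rf_pfn import RandomForestTabPFNRegressor

# Synthetic regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((400, 8))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

reg = RandomForestTabPFNRegressor(
    tabpfn=TabPFNRegressor(),   # TabPFN model used at the leaf nodes
    n_estimators=4,
    max_depth=3,
)
reg.fit(X, y)

y_pred = reg.predict(X[:10])    # mean of the per-tree predictions
```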

fit

fit(X: ndarray, y: ndarray, sample_weight: ndarray = None)

Fits RandomForestTabPFN.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Feature training data | required |
| y | ndarray | Label training data | required |
| sample_weight | ndarray | Weights of each sample | None |

Returns:

| Type | Description |
|------|-------------|
| | Fitted model |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If n_estimators is not positive |
| ValueError | If tabpfn is None |
| TypeError | If tabpfn is not of the expected type |

get_n_estimators

get_n_estimators(X: ndarray) -> int

Get the number of estimators to use.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | Input features | required |

Returns:

| Type | Description |
|------|-------------|
| int | Number of estimators |

init_base_estimator

init_base_estimator()

Initialize a base decision tree estimator.

Returns:

| Type | Description |
|------|-------------|
| | A new DecisionTreeTabPFNRegressor instance |

predict

predict(X: ndarray) -> ndarray

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the trees in the forest.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | ndarray | {array-like, sparse matrix} of shape (n_samples, n_features). The input samples. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| y | ndarray | ndarray of shape (n_samples,) or (n_samples, n_outputs). The predicted values. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If model is not fitted |

softmax_numpy

softmax_numpy(logits: ndarray) -> ndarray

Apply softmax to numpy array of logits.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| logits | ndarray | Input logits array | required |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Probabilities after softmax |
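
For reference, a numerically stable softmax over the last axis in NumPy typically looks like the sketch below. This illustrates the operation described above; it is not necessarily the exact implementation, and the axis choice is an assumption:

```python
import numpy as np

def softmax_numpy_sketch(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (illustrative sketch)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # guard against overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

probs = softmax_numpy_sketch(np.array([[2.0, 1.0, 0.1]]))
# Each row of `probs` sums to 1, e.g. approximately [[0.659, 0.242, 0.099]].
```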