# TabPFNRegressor

Bases: `BaseEstimator`, `RegressorMixin`, `TabPFNModelSelection`

## `__init__`
```python
__init__(
    model: str = "default",
    n_estimators: int = 8,
    preprocess_transforms: Tuple[PreprocessorConfig, ...] = (
        PreprocessorConfig(
            "quantile_uni",
            append_original=True,
            categorical_name="ordinal_very_common_categories_shuffled",
            global_transformer_name="svd",
        ),
        PreprocessorConfig("safepower", categorical_name="onehot"),
    ),
    feature_shift_decoder: str = "shuffle",
    normalize_with_test: bool = False,
    average_logits: bool = False,
    optimize_metric: Literal[
        "mse", "rmse", "mae", "r2", "mean", "median", "mode", "exact_match", None
    ] = "rmse",
    transformer_predict_kwargs: Optional[Dict] = None,
    softmax_temperature: Optional[float] = -0.1,
    use_poly_features=False,
    max_poly_features=50,
    remove_outliers=-1,
    regression_y_preprocess_transforms: Optional[
        Tuple[None | Literal["safepower", "power", "quantile_norm"], ...]
    ] = (None, "safepower"),
    add_fingerprint_features: bool = True,
    cancel_nan_borders: bool = True,
    super_bar_dist_averaging: bool = False,
    subsample_samples: float = -1,
)
```
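The `softmax_temperature` parameter is log-spaced: the documented rule `logits <- logits / exp(softmax_temperature)` means a negative value (like the default -0.1) divides by a factor below 1 and sharpens the output distribution, while a positive value flattens it. A minimal, self-contained sketch of that transformation in plain Python (illustrative only, not tabpfn's internal code):

```python
import math

def scale_logits(logits, softmax_temperature):
    # Documented rule: logits <- logits / exp(softmax_temperature).
    # exp(t) < 1 for negative t, so negative temperatures sharpen
    # the resulting softmax; positive temperatures flatten it.
    factor = math.exp(softmax_temperature)
    return [l / factor for l in logits]

def softmax(logits):
    # Numerically stable softmax for comparison.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
baseline = softmax(logits)
sharpened = softmax(scale_logits(logits, -0.1))  # the default temperature
flattened = softmax(scale_logits(logits, 0.5))
```

With the default -0.1, the top probability rises relative to the unscaled softmax; with a positive temperature it falls.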
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | The model string is the path to the model. | `'default'` |
| `n_estimators` | `int` | The number of ensemble configurations to use; this is the most important setting. | `8` |
| `preprocess_transforms` | `Tuple[PreprocessorConfig, ...]` | A tuple of `PreprocessorConfig` objects specifying the preprocessing steps to use. Transform names follow the pattern `(none\|power\|quantile_norm\|quantile_uni\|quantile_uni_coarse\|robust...)[_all][_and_none]`, where the first part specifies the preprocessing step. | `(PreprocessorConfig('quantile_uni', append_original=True, categorical_name='ordinal_very_common_categories_shuffled', global_transformer_name='svd'), PreprocessorConfig('safepower', categorical_name='onehot'))` |
| `feature_shift_decoder` | `str` | How to shift features for each ensemble configuration; one of `"shuffle"`, `"none"`, `"local_shuffle"`, `"rotate"`, `"auto_rotate"`. | `'shuffle'` |
| `normalize_with_test` | `bool` | If True, the test set is also used to normalize the data; otherwise only the training set is used. | `False` |
| `average_logits` | `bool` | Whether to average logits instead of probabilities across ensemble members. | `False` |
| `optimize_metric` | `Literal['mse', 'rmse', 'mae', 'r2', 'mean', 'median', 'mode', 'exact_match', None]` | The optimization metric to use. | `'rmse'` |
| `transformer_predict_kwargs` | `Optional[Dict]` | Additional keyword arguments passed to the transformer's predict method. | `None` |
| `softmax_temperature` | `Optional[float]` | A log-spaced temperature, applied as `logits <- logits / exp(softmax_temperature)`. | `-0.1` |
| `use_poly_features` | `bool` | Whether to use polynomial features as the last preprocessing step. | `False` |
| `max_poly_features` | `int` | Maximum number of polynomial features to use; `None` means unlimited. | `50` |
| `remove_outliers` | `float` | If not 0.0, removes outliers from the input features: values more than `remove_outliers` standard deviations from the mean are removed. | `-1` |
| `regression_y_preprocess_transforms` | `Optional[Tuple[None \| Literal['safepower', 'power', 'quantile_norm'], ...]]` | Preprocessing transforms for the target variable; each element is one of `None`, `'safepower'`, `'power'`, or `'quantile_norm'`. | `(None, 'safepower')` |
| `add_fingerprint_features` | `bool` | If True, adds one feature of random values to the input features. This helps the transformer discern duplicated samples. | `True` |
| `cancel_nan_borders` | `bool` | Whether to ignore buckets that are transformed to NaN values by inverting a target preprocessing transform. | `True` |
| `super_bar_dist_averaging` | `bool` | If True, the bar distributions of the ensemble members are averaged over a shared set of borders. | `False` |
| `subsample_samples` | `float` | If not -1, uses a random subset of the samples for training in each ensemble configuration. A value of 1 or above is an absolute number of samples; a value in (0, 1) is a fraction of the training set size. | `-1` |
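The three-way convention of `subsample_samples` (disabled at the default -1, a fraction below 1, an absolute count at 1 or above) can be sketched with a small helper. `resolve_subsample` below is a hypothetical illustration of the documented semantics, not tabpfn's actual implementation:

```python
def resolve_subsample(subsample_samples: float, n_train: int) -> int:
    """Hypothetical helper mirroring the documented `subsample_samples`
    semantics; illustrative only, not tabpfn's actual code."""
    if subsample_samples <= 0:
        # Default -1 (or any non-positive value): no subsampling.
        return n_train
    if subsample_samples < 1:
        # In (0, 1): interpreted as a fraction of the training set size.
        return max(1, int(subsample_samples * n_train))
    # 1 or above: an absolute number of samples, capped at the set size.
    return min(int(subsample_samples), n_train)
```

For a training set of 1000 rows, `-1` keeps all 1000 samples, `0.25` keeps 250, and `300` keeps exactly 300 per ensemble configuration.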