utils ¶
A collection of random utilities for the TabPFN models.
infer_categorical_features ¶
infer_categorical_features(
X: ndarray,
*,
provided: Sequence[int] | None,
min_samples_for_inference: int,
max_unique_for_category: int,
min_unique_for_numerical: int
) -> list[int]
Infer the categorical features from the given data.
Note
This function may infer particular columns to not be categorical as defined by what suits the model predictions and it's pre-training.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X |
ndarray
|
The data to infer the categorical features from. |
required |
provided |
Sequence[int] | None
|
Any user provided indices of what is considered categorical. |
required |
min_samples_for_inference |
int
|
The minimum number of samples required for automatic inference of features which were not provided as categorical. |
required |
max_unique_for_category |
int
|
The maximum number of unique values for a feature to be considered categorical. |
required |
min_unique_for_numerical |
int
|
The minimum number of unique values for a feature to be considered numerical. |
required |
Returns:
Type | Description |
---|---|
list[int]
|
The indices of inferred categorical features. |
infer_device_and_type ¶
Infer the device and data type from the given device string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device |
str | device | None
|
The device to infer the type from. |
required |
Returns:
Type | Description |
---|---|
device
|
The inferred device |
infer_fp16_inference_mode ¶
Infer whether fp16 inference should be enabled.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device |
device
|
The device to validate against. |
required |
enable |
bool | None
|
Whether it should be enabled, |
required |
Returns:
Type | Description |
---|---|
bool
|
Whether to use fp16 inference or not. |
Raises:
Type | Description |
---|---|
ValueError
|
If fp16 inference was enabled and device type does not support it. |
infer_random_state ¶
Infer the random state from the given input.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
random_state |
int | RandomState | Generator | None
|
The random state to infer. |
required |
Returns:
Type | Description |
---|---|
tuple[int, Generator]
|
A static integer seed and a random number generator. |
is_autocast_available ¶
Infer whether autocast is available for the given device type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device_type |
str
|
The device type to check for autocast availability. |
required |
Returns:
Type | Description |
---|---|
bool
|
Whether autocast is available for the given device type. |
load_model_criterion_config ¶
load_model_criterion_config(
model_path: None | str | Path,
*,
check_bar_distribution_criterion: bool,
cache_trainset_representation: bool,
which: Literal["regressor", "classifier"],
version: Literal["v2"] = "v2",
download: bool,
model_seed: int
) -> tuple[
PerFeatureTransformer,
BCEWithLogitsLoss
| CrossEntropyLoss
| FullSupportBarDistribution,
InferenceConfig,
]
Load the model, criterion, and config from the given path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_path |
None | str | Path
|
The path to the model. |
required |
check_bar_distribution_criterion |
bool
|
Whether to check if the criterion is a FullSupportBarDistribution, which is the expected criterion for models trained for regression. |
required |
cache_trainset_representation |
bool
|
Whether the model should know to cache the trainset representation. |
required |
which |
Literal['regressor', 'classifier']
|
Whether the model is a regressor or classifier. |
required |
version |
Literal['v2']
|
The version of the model. |
'v2'
|
download |
bool
|
Whether to download the model if it doesn't exist. |
required |
model_seed |
int
|
The seed of the model. |
required |
Returns:
Type | Description |
---|---|
tuple[PerFeatureTransformer, BCEWithLogitsLoss | CrossEntropyLoss | FullSupportBarDistribution, InferenceConfig]
|
The model, criterion, and config. |
translate_probs_across_borders ¶
Translate the probabilities across the borders.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logits |
Tensor
|
The logits defining the distribution to translate. |
required |
frm |
Tensor
|
The borders to translate from. |
required |
to |
Tensor
|
The borders to translate to. |
required |
Returns:
Type | Description |
---|---|
Tensor
|
The translated probabilities. |
update_encoder_outlier_params ¶
update_encoder_outlier_params(
model: Module,
remove_outliers_std: float | None,
seed: int | None,
*,
inplace: Literal[True]
) -> None
Update the encoder to handle outliers in the model.
Warning
This only happens inplace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model |
Module
|
The model to update. |
required |
remove_outliers_std |
float | None
|
The standard deviation to remove outliers. |
required |
seed |
int | None
|
The seed to use, if any. |
required |
inplace |
Literal[True]
|
Whether to do the operation inplace. |
required |
Raises:
Type | Description |
---|---|
ValueError
|
If |
validate_X_predict ¶
validate_X_predict(
X: XType, estimator: TabPFNRegressor | TabPFNClassifier
) -> ndarray
Validate the input data for prediction.
validate_Xy_fit ¶
validate_Xy_fit(
X: XType,
y: YType,
estimator: TabPFNRegressor | TabPFNClassifier,
*,
max_num_features: int,
max_num_samples: int,
ensure_y_numeric: bool = False,
ignore_pretraining_limits: bool = False
) -> tuple[ndarray, ndarray, NDArray[Any] | None, int]
Validate the input data for fitting.