tabpfn_embedding

TabPFNEmbedding

TabPFNEmbedding is a utility for extracting embeddings from TabPFNClassifier or TabPFNRegressor models. It supports standard training (vanilla embedding) as well as K-fold cross-validation for embedding extraction.

  • When n_fold=0, the model extracts vanilla embeddings by training on the entire dataset.
  • When n_fold>0, K-fold cross-validation is applied, following the method proposed in "A Closer Look at TabPFN v2: Strength, Limitation, and Extension" (https://arxiv.org/abs/2502.17361). According to that paper, a larger n_fold tends to yield more effective embeddings.
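
The K-fold mode can be pictured as out-of-fold extraction: each training row is embedded by a model that was fitted on the other folds only. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not the library's implementation; `kfold_indices`, `out_of_fold_embeddings`, and `toy_fit_and_embed` are hypothetical names, and the toy function merely stands in for fitting TabPFN on the context rows and embedding the query rows.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Embedding = List[float]


def kfold_indices(n_samples: int, n_fold: int) -> List[List[int]]:
    """Split range(n_samples) into n_fold contiguous, near-equal folds (no shuffling)."""
    if n_fold < 2 or n_fold > n_samples:
        raise ValueError("n_fold must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_fold)
    folds: List[List[int]] = []
    start = 0
    for i in range(n_fold):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_embeddings(
    X: Sequence[Row],
    y: Sequence[float],
    n_fold: int,
    fit_and_embed: Callable[[List[Row], List[float], List[Row]], List[Embedding]],
) -> List[Embedding]:
    """Embed every row using a model fitted on the folds that exclude it."""
    embeddings: List[Embedding] = [[] for _ in X]
    for held_out in kfold_indices(len(X), n_fold):
        held = set(held_out)
        context = [i for i in range(len(X)) if i not in held]
        fold_embeddings = fit_and_embed(
            [X[i] for i in context],
            [y[i] for i in context],
            [X[i] for i in held_out],
        )
        # Write each held-out row's embedding back to its original position.
        for i, emb in zip(held_out, fold_embeddings):
            embeddings[i] = emb
    return embeddings


def toy_fit_and_embed(X_ctx, y_ctx, X_query):
    """Hypothetical stand-in model: [mean of context labels, sum of the row's features]."""
    mean_y = sum(y_ctx) / len(y_ctx)
    return [[mean_y, float(sum(row))] for row in X_query]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
y = [0.0, 1.0, 0.0, 1.0, 1.0]
oof = out_of_fold_embeddings(X, y, n_fold=2, fit_and_embed=toy_fit_and_embed)
```

Because the first component of each toy embedding is the mean label of the context rows, it is easy to see that no row contributes its own label to its own embedding, which is the property out-of-fold extraction is meant to guarantee.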

NOTE: This functionality requires the full TabPFN implementation (pip install tabpfn) and is not compatible with the TabPFN client (pip install tabpfn-client). The client version does not provide access to model embeddings.

Parameters:

Name | Type | Description | Default
tabpfn_clf | TabPFNClassifier, optional | An instance of TabPFNClassifier to handle classification tasks. | None
tabpfn_reg | TabPFNRegressor, optional | An instance of TabPFNRegressor to handle regression tasks. | None
n_fold | int, default=0 | Number of folds for K-fold cross-validation. If set to 0, standard training is used. | 0

Attributes:

Name | Type | Description
model | TabPFNClassifier or TabPFNRegressor | The model used for embedding extraction.

Examples:

>>> from tabpfn_extensions import TabPFNClassifier  # Must use full TabPFN package
>>> # Import path assumed from the package layout; adjust if your version differs.
>>> from tabpfn_extensions.embedding import TabPFNEmbedding
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import fetch_openml
>>> # Load the kc1 dataset from OpenML as NumPy arrays.
>>> X, y = fetch_openml(name='kc1', version=1, as_frame=False, return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
>>> clf = TabPFNClassifier(n_estimators=1)
>>> # n_fold=0 selects vanilla embeddings (no cross-validation).
>>> embedding_extractor = TabPFNEmbedding(tabpfn_clf=clf, n_fold=0)
>>> train_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="train")
>>> test_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="test")

fit

fit(X_train: ndarray, y_train: ndarray) -> None

Trains the TabPFN model on the given dataset.

Parameters:

Name | Type | Description | Default
X_train | ndarray | Training feature data. | required
y_train | ndarray | Training target labels. | required

Raises:

Type | Description
ValueError | If no model is set before calling fit.
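
The guard described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `EmbeddingExtractorSketch` and `DummyModel` are hypothetical stand-ins, and the rule that the classifier takes precedence over the regressor when both are passed is assumed for the example.

```python
from typing import Any, Optional


class EmbeddingExtractorSketch:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""

    def __init__(
        self,
        tabpfn_clf: Optional[Any] = None,
        tabpfn_reg: Optional[Any] = None,
        n_fold: int = 0,
    ) -> None:
        # Assumption: prefer the classifier if given, otherwise use the regressor.
        self.model = tabpfn_clf if tabpfn_clf is not None else tabpfn_reg
        self.n_fold = n_fold

    def fit(self, X_train: Any, y_train: Any) -> None:
        # Fail early with a clear error when neither model was supplied.
        if self.model is None:
            raise ValueError("No model has been set.")
        self.model.fit(X_train, y_train)


class DummyModel:
    """Hypothetical model that only records whether fit was called."""

    def __init__(self) -> None:
        self.fitted = False

    def fit(self, X: Any, y: Any) -> "DummyModel":
        self.fitted = True
        return self


# Without a model, fit raises ValueError.
try:
    EmbeddingExtractorSketch().fit([[0.0]], [0])
    raised = False
except ValueError:
    raised = True

# With a model, fit delegates to it.
dummy = DummyModel()
EmbeddingExtractorSketch(tabpfn_clf=dummy).fit([[0.0]], [0])
```

In practice this means passing either tabpfn_clf or tabpfn_reg to the constructor before calling fit or get_embeddings.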

get_embeddings

get_embeddings(
    X_train: ndarray,
    y_train: ndarray,
    X: ndarray,
    data_source: str,
) -> ndarray

Extracts embeddings for the given dataset using the trained model.

Parameters:

Name | Type | Description | Default
X_train | ndarray | Training feature data. | required
y_train | ndarray | Training target labels. | required
X | ndarray | Data for which embeddings are to be extracted. | required
data_source | str | Specifies the data source: "train" for training data or "test" for test data. | required

Returns:

Type | Description
ndarray | The extracted embeddings.

Raises:

Type | Description
ValueError | If no model is set before calling get_embeddings.