tabpfn_embedding

TabPFNEmbedding

TabPFNEmbedding is a utility for extracting embeddings from TabPFNClassifier or TabPFNRegressor models. It supports standard training (vanilla embedding) as well as K-fold cross-validation for embedding extraction.

When n_fold=0, the model extracts vanilla embeddings by training on the entire dataset.
When n_fold>0, embeddings are extracted with K-fold cross-validation, following the method proposed in "A Closer Look at TabPFN v2: Strength, Limitation, and Extension" (https://arxiv.org/abs/2502.17361). According to that paper, a larger n_fold yields more effective embeddings.
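
The K-fold idea can be illustrated without TabPFN itself. The sketch below is not the library's implementation: `kfold_indices`, `out_of_fold_embeddings`, and the toy `mean_offset_embed` function are hypothetical names introduced only for this example. It assumes that each training row is embedded by a model fitted on the other folds, so that no row contributes to its own embedding.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Embedding = List[float]
FitEmbed = Callable[[List[Row], List[float], List[Row]], List[Embedding]]


def kfold_indices(n_samples: int, n_fold: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into n_fold contiguous (train, held_out) index pairs."""
    if n_fold < 2:
        raise ValueError("n_fold must be at least 2 for K-fold splitting")
    if n_fold > n_samples:
        raise ValueError("n_fold cannot exceed n_samples")
    base, extra = divmod(n_samples, n_fold)
    folds = []
    start = 0
    for i in range(n_fold):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, held_out))
        start += size
    return folds


def out_of_fold_embeddings(
    X: List[Row], y: List[float], n_fold: int, fit_embed: FitEmbed
) -> List[Embedding]:
    """Embed every training row with a model that was not fitted on that row."""
    result: List[Embedding] = [[] for _ in X]
    for train_idx, held_idx in kfold_indices(len(X), n_fold):
        X_fit = [X[i] for i in train_idx]
        y_fit = [y[i] for i in train_idx]
        X_held = [X[i] for i in held_idx]
        # Write each held-out embedding back to its original row position.
        for i, emb in zip(held_idx, fit_embed(X_fit, y_fit, X_held)):
            result[i] = emb
    return result


def mean_offset_embed(X_fit: List[Row], y_fit: List[float], X_query: List[Row]) -> List[Embedding]:
    """Toy stand-in for a real embedder: query rows minus the fitting rows' feature means."""
    n_features = len(X_fit[0])
    means = [sum(row[j] for row in X_fit) / len(X_fit) for j in range(n_features)]
    return [[row[j] - means[j] for j in range(n_features)] for row in X_query]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 0, 1]
print(out_of_fold_embeddings(X, y, n_fold=2, fit_embed=mean_offset_embed))
# [[-2.5], [-1.5], [1.5], [2.5]]
```

In this toy run, rows 0 and 1 are embedded relative to the mean of rows 2 and 3, and vice versa, which is the out-of-fold property the sketch is meant to show.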

NOTE: This functionality requires the full TabPFN implementation (pip install tabpfn) and is not compatible with the TabPFN client (pip install tabpfn-client). The client version does not provide access to model embeddings.

Parameters:

Name	Type	Description	Default
`tabpfn_clf`	`TabPFNClassifier`, optional	An instance of TabPFNClassifier to handle classification tasks.	`None`
`tabpfn_reg`	`TabPFNRegressor`, optional	An instance of TabPFNRegressor to handle regression tasks.	`None`
`n_fold`	`int`	Number of folds for K-fold cross-validation. If set to 0, standard training is used.	`0`

Attributes:

Name	Type	Description
`model`	`TabPFNClassifier` or `TabPFNRegressor`	The model used for embedding extraction.

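Examples:

>>> from tabpfn_extensions import TabPFNClassifier  # Must use full TabPFN package
>>> from tabpfn_extensions.embedding import TabPFNEmbedding  # import path assumed from the package layout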
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import fetch_openml
>>> X, y = fetch_openml(name='kc1', version=1, as_frame=False, return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
>>> clf = TabPFNClassifier(n_estimators=1)
>>> embedding_extractor = TabPFNEmbedding(tabpfn_clf=clf, n_fold=0)
>>> train_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="train")
>>> test_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="test")

fit

fit(X_train: ndarray, y_train: ndarray) -> None

Trains the TabPFN model on the given dataset.

Parameters:

Name	Type	Description	Default
`X_train`	`ndarray`	Training feature data.	required
`y_train`	`ndarray`	Training target labels.	required

Raises:

Type	Description
`ValueError`	If no model is set before calling fit.
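
The `ValueError` condition can be pictured with a small stand-in class. This is a hedged sketch, not the library's source: `EmbeddingExtractorSketch` is a hypothetical name, and the rule that the classifier takes precedence when both models are supplied is an assumption made for illustration.

```python
from typing import Any, Optional


class EmbeddingExtractorSketch:
    """Hypothetical stand-in mirroring only the documented constructor arguments."""

    def __init__(
        self,
        tabpfn_clf: Optional[Any] = None,
        tabpfn_reg: Optional[Any] = None,
        n_fold: int = 0,
    ) -> None:
        # Assumption: prefer the classifier if given, otherwise fall back to the regressor.
        self.model = tabpfn_clf if tabpfn_clf is not None else tabpfn_reg
        self.n_fold = n_fold

    def fit(self, X_train: Any, y_train: Any) -> None:
        # Mirrors the documented behaviour: fitting without a model is an error.
        if self.model is None:
            raise ValueError("No model has been set.")
        self.model.fit(X_train, y_train)


try:
    EmbeddingExtractorSketch().fit([[1.0]], [0])
except ValueError as err:
    print(f"ValueError: {err}")
```

Passing either `tabpfn_clf` or `tabpfn_reg` at construction time avoids the error.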

get_embeddings

get_embeddings(
    X_train: ndarray,
    y_train: ndarray,
    X: ndarray,
    data_source: str,
) -> ndarray

Extracts embeddings for the given dataset using the trained model.

Parameters:

Name	Type	Description	Default
`X_train`	`ndarray`	Training feature data.	required
`y_train`	`ndarray`	Training target labels.	required
`X`	`ndarray`	Data for which embeddings are to be extracted.	required
`data_source`	`str`	Specifies the data source: "train" for training data or "test" for test data.	required

Returns:

Type	Description
`ndarray`	The extracted embeddings.

Raises:

Type	Description
`ValueError`	If no model is set before calling get_embeddings.