tabpfn_embedding
TabPFNEmbedding
TabPFNEmbedding is a utility for extracting embeddings from TabPFNClassifier or TabPFNRegressor models. It supports standard training (vanilla embedding) as well as K-fold cross-validation for embedding extraction.
- When n_fold=0, the model extracts vanilla embeddings by training on the entire dataset.
- When n_fold>0, K-fold cross-validation is applied based on the method proposed in "A Closer Look at TabPFN v2: Strength, Limitation, and Extension" (https://arxiv.org/abs/2502.17361), where a larger n_fold improves embedding effectiveness.
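One way to picture the K-fold mode is as out-of-fold extraction: each training row is embedded by a model fitted on the other folds, so the row's own label is never part of the context. The block below is a minimal pure-Python sketch under that assumption, not the library's implementation. The names kfold_indices, out_of_fold_embeddings, and toy_embed are hypothetical, and toy_embed only stands in for a fitted TabPFN model.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical signature: embed_fn(X_context, y_context, X_query) -> one vector per query row.
EmbedFn = Callable[[Sequence[Vector], Sequence[float], Sequence[Vector]], List[Vector]]


def kfold_indices(n_samples: int, n_fold: int) -> List[List[int]]:
    """Split range(n_samples) into n_fold contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_fold)
    folds: List[List[int]] = []
    start = 0
    for i in range(n_fold):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more row
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_embeddings(
    X: Sequence[Vector], y: Sequence[float], n_fold: int, embed_fn: EmbedFn
) -> List[Vector]:
    """Embed every row using only the rows from the other folds as context."""
    if n_fold < 2 or n_fold > len(X):
        raise ValueError("n_fold must be between 2 and the number of samples")
    result: List[Optional[Vector]] = [None] * len(X)
    for held_out in kfold_indices(len(X), n_fold):
        held = set(held_out)
        context = [i for i in range(len(X)) if i not in held]
        vectors = embed_fn(
            [X[i] for i in context],
            [y[i] for i in context],
            [X[i] for i in held_out],
        )
        # Write each held-out embedding back to the row's original position.
        for i, vec in zip(held_out, vectors):
            result[i] = vec
    return [vec for vec in result if vec is not None]


def toy_embed(X_context, y_context, X_query):
    """Stand-in for a real model: [mean context label, sum of the row's features]."""
    label_mean = sum(y_context) / len(y_context)
    return [[label_mean, sum(row)] for row in X_query]
```

Calling out_of_fold_embeddings(X, y, 2, toy_embed) on four rows splits them into two folds, and the first component of each embedding reflects only the labels of the opposite fold. With the real class, passing n_fold>0 to TabPFNEmbedding takes the place of this manual loop.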
NOTE: This functionality requires the full TabPFN implementation (pip install tabpfn) and is not compatible with the TabPFN client (pip install tabpfn-client). The client version does not provide access to model embeddings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tabpfn_clf | TabPFNClassifier, optional | An instance of TabPFNClassifier to handle classification tasks. | None |
tabpfn_reg | TabPFNRegressor, optional | An instance of TabPFNRegressor to handle regression tasks. | None |
n_fold | int | Number of folds for K-fold cross-validation. If set to 0, standard training is used. | 0 |
Attributes:
Name | Type | Description |
---|---|---|
model | TabPFNClassifier or TabPFNRegressor | The model used for embedding extraction. |
Examples:
>>> from tabpfn_extensions import TabPFNClassifier  # Must use full TabPFN package
>>> from tabpfn_extensions.embedding import TabPFNEmbedding
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import fetch_openml
>>> # Load the kc1 dataset from OpenML as NumPy arrays
>>> X, y = fetch_openml(name='kc1', version=1, as_frame=False, return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
>>> clf = TabPFNClassifier(n_estimators=1)
>>> # n_fold=0 extracts vanilla embeddings (no K-fold cross-validation)
>>> embedding_extractor = TabPFNEmbedding(tabpfn_clf=clf, n_fold=0)
>>> train_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="train")
>>> test_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="test")
fit
Trains the TabPFN model on the given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_train | ndarray | Training feature data. | required |
y_train | ndarray | Training target labels. | required |
Raises:
Type | Description |
---|---|
ValueError | If no model is set before calling fit. |
get_embeddings
Extracts embeddings for the given dataset using the trained model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_train | ndarray | Training feature data. | required |
y_train | ndarray | Training target labels. | required |
X | ndarray | Data for which embeddings are to be extracted. | required |
data_source | str | Specifies the data source: "train" for training data, "test" for test data. | required |
Returns:
Type | Description |
---|---|
ndarray | The extracted embeddings. |
Raises:
Type | Description |
---|---|
ValueError | If no model is set before calling get_embeddings. |
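Both fit and get_embeddings are documented to raise ValueError when no model is set. The sketch below illustrates that contract and one way a caller might handle it. EmbeddingExtractorSketch, DummyModel, and try_fit are hypothetical stand-ins, and the rule that whichever of tabpfn_clf or tabpfn_reg is provided becomes the model is an assumption, not confirmed library behaviour.

```python
from typing import Any, Optional, Sequence


class DummyModel:
    """Hypothetical placeholder for a TabPFNClassifier or TabPFNRegressor."""

    def __init__(self) -> None:
        self.fitted = False

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> "DummyModel":
        self.fitted = True
        return self


class EmbeddingExtractorSketch:
    """Stand-in that mimics only the documented 'no model set' guard."""

    def __init__(
        self,
        tabpfn_clf: Optional[Any] = None,
        tabpfn_reg: Optional[Any] = None,
        n_fold: int = 0,
    ) -> None:
        # Assumption: the provided model (classifier first) is stored as `model`.
        self.model = tabpfn_clf if tabpfn_clf is not None else tabpfn_reg
        self.n_fold = n_fold

    def fit(self, X_train: Sequence[Any], y_train: Sequence[Any]) -> None:
        if self.model is None:
            raise ValueError("No model has been set.")
        self.model.fit(X_train, y_train)


def try_fit(extractor: EmbeddingExtractorSketch, X: Sequence[Any], y: Sequence[Any]) -> str:
    """Return 'ok' on success, or a readable message if no model was configured."""
    try:
        extractor.fit(X, y)
    except ValueError as exc:
        return f"configuration error: {exc}"
    return "ok"
```

Constructing the extractor with neither model and calling try_fit yields the configuration-error message, while passing a model lets fit proceed normally.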