Classification¶
TabPFN provides a powerful interface for handling classification tasks on tabular data. The TabPFNClassifier
class can be used for binary and multi-class classification problems.
Example¶
Below is an example of how to use TabPFNClassifier
for a multi-class classification task, shown first with the tabpfn_client package, which sends inference requests to the hosted TabPFN API:
from tabpfn_client import TabPFNClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train classifier
classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)
classifier.fit(X_train, y_train)
# Evaluate
y_pred = classifier.predict(X_test)
print('Test Accuracy:', accuracy_score(y_test, y_pred))
The same example with the local tabpfn package, which runs inference on your own machine:
from tabpfn import TabPFNClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train classifier
classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)
classifier.fit(X_train, y_train)
# Evaluate
y_pred = classifier.predict(X_test)
print('Test Accuracy:', accuracy_score(y_test, y_pred))
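Both variants follow the scikit-learn estimator interface, so class probabilities are also available. The snippet below is a minimal sketch that reuses the fitted classifier and test split from the example above and relies only on the standard predict_proba method:
# Predicted class probabilities, one column per class (shape: n_samples x n_classes)
proba = classifier.predict_proba(X_test)
print('Probabilities for the first test sample:', proba[0])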
Example with AutoTabPFNClassifier¶
Abstract
The AutoTabPFNClassifier and AutoTabPFNRegressor automatically run a hyperparameter search and build an ensemble of models with strong hyperparameter configurations. You can control the runtime via max_time; no further adjustments are needed to get the best results.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn.scripts.estimator.post_hoc_ensembles import AutoTabPFNClassifier, AutoTabPFNRegressor
# we refer to the PHE (post hoc ensemble) variant of TabPFN as AutoTabPFN in the code
clf = AutoTabPFNClassifier(device='auto', max_time=30)
# Load the breast cancer dataset and split it
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Fit the ensemble within the max_time budget
clf.fit(X_train, y_train)
# Predict class probabilities and derive hard labels
preds = clf.predict_proba(X_test)
y_eval = np.argmax(preds, axis=1)
print('ROC AUC:', roc_auc_score(y_test, preds[:, 1]), 'Accuracy:', accuracy_score(y_test, y_eval))
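AutoTabPFNRegressor is imported alongside the classifier above but not demonstrated. The following is a minimal sketch assuming it accepts the same device and max_time arguments and follows the standard scikit-learn fit/predict interface; the dataset choice (load_diabetes) is only illustrative:
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from tabpfn.scripts.estimator.post_hoc_ensembles import AutoTabPFNRegressor
# Regression data; illustrative choice only
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Assumed to take the same device/max_time arguments as AutoTabPFNClassifier
reg = AutoTabPFNRegressor(device='auto', max_time=30)
reg.fit(X_train, y_train)
print('MSE:', mean_squared_error(y_test, reg.predict(X_test)))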