many_class_classifier

This module provides a classifier that overcomes TabPFN's limitation on the number of classes (typically 10) by using a meta-classifier approach. It works by breaking down multi-class problems into multiple sub-problems, each within TabPFN's class limit.

Key features:

- Handles any number of classes beyond TabPFN's native limit
- Uses an efficient codebook approach to minimize the number of base models
- Compatible with both the TabPFN and TabPFN-client backends
- Maintains high accuracy through redundant coding
- Follows the scikit-learn estimator interface

Example usage
```python
from tabpfn import TabPFNClassifier  # or: from tabpfn_client import TabPFNClassifier
from tabpfn_extensions.many_class import ManyClassClassifier

# Create a base TabPFN classifier
base_clf = TabPFNClassifier()

# Wrap it with ManyClassClassifier to handle more classes
many_class_clf = ManyClassClassifier(
    estimator=base_clf,
    alphabet_size=10,  # TabPFN's maximum number of classes
)

# Use it like any scikit-learn classifier, even with more than 10 classes
many_class_clf.fit(X_train, y_train)
y_pred = many_class_clf.predict(X_test)
```

ManyClassClassifier

Bases: BaseEstimator, ClassifierMixin

Output-Code multiclass strategy to extend TabPFN beyond its class limit.

This class enables TabPFN to handle classification problems with any number of classes by using a meta-classifier approach. It creates an efficient coding system that maps the original classes to multiple sub-problems, each within TabPFN's class limit.
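The coding idea can be sketched as follows. This is a conceptual illustration with a hypothetical helper name, not the library's actual codebook construction: each of the original classes is assigned a codeword of pseudo-labels (one per sub-problem), and a prediction is decoded by finding the class whose codeword best matches the sub-classifiers' outputs.

```python
import numpy as np

def make_codebook(n_classes, alphabet_size, n_estimators, random_state=0):
    """Random n-ary codebook (illustrative only).

    Entry [i, j] is the pseudo-label (0 .. alphabet_size - 1) that original
    class j receives in sub-problem i, so each sub-problem stays within the
    base estimator's class limit.
    """
    rng = np.random.default_rng(random_state)
    return rng.integers(0, alphabet_size, size=(n_estimators, n_classes))

# 15 original classes, but every sub-problem uses at most 10 pseudo-labels
codebook = make_codebook(n_classes=15, alphabet_size=10, n_estimators=8)

# Encoding: original label y -> one pseudo-label per sub-problem
y = 12
encoded = codebook[:, y]

# Decoding: pick the class whose codeword agrees with the most sub-predictions
predictions = encoded.copy()  # pretend every sub-classifier was correct
matches = (codebook == predictions[:, None]).sum(axis=0)
decoded = int(np.argmax(matches))
```

With redundant rows (more sub-problems than strictly necessary), decoding by best match tolerates individual sub-classifier errors.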

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `estimator` | | A classifier implementing `fit()` and `predict_proba()` methods, typically a `TabPFNClassifier` instance. The base classifier should be able to handle up to `alphabet_size` classes. | *required* |
| `alphabet_size` | | Maximum number of classes the base estimator can handle. If `None`, the value is read from `estimator.max_num_classes_`. | `None` |
| `n_estimators` | | Number of base estimators to train. If `None`, an optimal number is calculated from the number of classes and `alphabet_size`. | `None` |
| `n_estimators_redundancy` | | Redundancy factor for the auto-calculated number of estimators. Higher values increase reliability but also computational cost. | `4` |
| `random_state` | | Controls the randomization used to initialize the codebook. Pass an `int` for reproducible results. | `None` |

Attributes:

| Name | Description |
| --- | --- |
| `classes_` | Array containing the unique target labels. |
| `code_book_` | N-ary array containing the coding scheme that maps the original classes to base-classifier sub-problems. |
| `no_mapping_needed_` | `True` if the number of classes is within the `alphabet_size` limit, allowing direct use of the base estimator without any mapping. |
| `classes_index_` | Maps class labels to their indices in `classes_`. |
| `X_train` | Training data stored for reuse during prediction. |
| `Y_train` | Encoded training labels for each base estimator. |

Examples:

```python
>>> from sklearn.datasets import load_iris
>>> from tabpfn import TabPFNClassifier
>>> from tabpfn_extensions.many_class import ManyClassClassifier
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> base_clf = TabPFNClassifier()
>>> many_clf = ManyClassClassifier(base_clf, alphabet_size=base_clf.max_num_classes_)
>>> many_clf.fit(X_train, y_train)
>>> y_pred = many_clf.predict(X_test)
```

fit

```python
fit(X: ndarray, y: ndarray, **fit_params) -> ManyClassClassifier
```

Fit underlying estimators.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Data matrix of shape `(n_samples, n_features)`. | *required* |
| `y` | `ndarray` | Multi-class target labels of shape `(n_samples,)`. | *required* |
| `**fit_params` | | Parameters passed to the `estimator.fit` method of each sub-estimator. | `{}` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `self` | `ManyClassClassifier` | The fitted instance. |

get_alphabet_size

```python
get_alphabet_size() -> int
```

Get the alphabet size to use for the codebook.

Returns:

| Type | Description |
| --- | --- |
| `int` | The alphabet size to use. |

get_n_estimators

```python
get_n_estimators(n_classes: int) -> int
```

Calculate the number of estimators to use based on the number of classes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `n_classes` | `int` | The number of classes in the classification problem. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `int` | The number of estimators to use. |
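The exact formula is internal to the library, but the reasoning behind such a calculation can be sketched as follows (the helper name and formula are illustrative assumptions, not the implementation):

```python
import math

def n_estimators_needed(n_classes, alphabet_size, redundancy=4):
    """Illustrative calculation (hypothetical, not the library's exact formula).

    At least ceil(log_alphabet(n_classes)) sub-problems are needed so that
    every class can receive a distinct codeword; the redundancy factor
    multiplies that minimum so individual sub-classifier errors can be
    absorbed during decoding.
    """
    minimum = math.ceil(math.log(n_classes) / math.log(alphabet_size))
    return minimum * redundancy

n_estimators_needed(15, 10)  # minimum of 2 sub-problems, times redundancy 4
```

This is why `n_estimators_redundancy` trades computational cost for reliability: each extra multiple of the minimum adds codeword distance between classes.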

predict

```python
predict(X: ndarray) -> ndarray
```

Predict multi-class targets using underlying estimators.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Data matrix of shape `(n_samples, n_features)`. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `ndarray` | Predicted multi-class targets of shape `(n_samples,)`. |

predict_proba

```python
predict_proba(X: ndarray) -> ndarray
```

Predict probabilities using the underlying estimators.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X` | `ndarray` | Data matrix of shape `(n_samples, n_features)`. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `ndarray` | The probability of the samples for each class in the model, of shape `(n_samples, n_classes)`, where classes are ordered as they are in `self.classes_`. |
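Conceptually, the class probabilities come from combining the sub-classifiers' pseudo-label probabilities through the codebook. The sketch below is an illustrative decoding step under that assumption, not the library's actual code:

```python
import numpy as np

# 2 sub-problems, 3 original classes; entry [i, j] is the pseudo-label that
# class j carries in sub-problem i (toy codebook for illustration)
codebook = np.array([[0, 1, 2],
                     [2, 0, 1]])

# sub_proba[i] has shape (n_samples, alphabet_size): the probabilities that
# sub-classifier i assigns to each pseudo-label for one sample
sub_proba = [
    np.array([[0.7, 0.2, 0.1]]),
    np.array([[0.1, 0.2, 0.7]]),
]

# For each class j, average the probability each sub-classifier assigned to
# the pseudo-label that class j carries in that sub-problem
n_samples, n_classes = 1, codebook.shape[1]
scores = np.zeros((n_samples, n_classes))
for i, proba in enumerate(sub_proba):
    scores += proba[:, codebook[i]]
scores /= len(sub_proba)

# Normalize so each row sums to 1, ordered as in classes_
proba_out = scores / scores.sum(axis=1, keepdims=True)
```

Here both sub-classifiers favor the pseudo-labels of class 0's codeword `(0, 2)`, so class 0 receives the highest combined probability.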

set_categorical_features

```python
set_categorical_features(categorical_features: list[int]) -> None
```

Set categorical features for the base estimator if it supports it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `categorical_features` | `list[int]` | List of categorical feature indices. | *required* |