many_class_classifier ¶
This module provides a classifier that overcomes TabPFN's limitation on the number of classes (typically 10) by using a meta-classifier approach. It works by breaking down multi-class problems into multiple sub-problems, each within TabPFN's class limit.
Key features:

- Handles any number of classes beyond TabPFN's native limits
- Uses an efficient codebook approach to minimize the number of base models (see the sketch below)
- Compatible with both TabPFN and TabPFN-client backends
- Maintains high accuracy through redundant coding
- Follows scikit-learn's Estimator interface
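The codebook idea can be illustrated independently of TabPFN. The following is a toy sketch (not the library's actual encoding): each original class is assigned a code word over a small alphabet, each code position defines one sub-problem with at most `alphabet_size` labels, and a prediction is decoded by finding the class whose code word best matches the sub-problem outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 25      # more classes than the base model supports
alphabet_size = 10  # per-sub-problem class limit (e.g. TabPFN's limit)
n_positions = 6     # code word length = number of sub-problems (illustrative)

# Toy codebook: one code word (row) per original class, symbols in [0, alphabet_size).
code_book = rng.integers(0, alphabet_size, size=(n_classes, n_positions))

# Each column defines one sub-problem whose targets are that column's symbols,
# so every sub-problem has at most `alphabet_size` distinct labels.
print([len(np.unique(code_book[:, j])) for j in range(n_positions)])

# Decoding: given the symbol each sub-problem predicts for a sample, pick the
# class whose code word agrees in the most positions.
observed = code_book[7].copy()                   # pretend the true class is 7
observed[2] = (observed[2] + 1) % alphabet_size  # one sub-problem got it wrong
matches = (code_book == observed).sum(axis=1)
print("decoded class:", int(np.argmax(matches)))  # redundancy usually absorbs the error
```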
Example usage
```python
from tabpfn import TabPFNClassifier  # or from tabpfn_client
from tabpfn_extensions.many_class import ManyClassClassifier

# Create a base TabPFN classifier
base_clf = TabPFNClassifier()

# Wrap it with ManyClassClassifier to handle more classes
many_class_clf = ManyClassClassifier(
    estimator=base_clf,
    alphabet_size=10,  # Use TabPFN's maximum class limit
)

# Use like any scikit-learn classifier, even with more than 10 classes
many_class_clf.fit(X_train, y_train)
y_pred = many_class_clf.predict(X_test)
```
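To make the snippet above self-contained, training data with more than 10 classes can be generated synthetically; the sketch below uses scikit-learn's `make_classification` (the dataset and split are illustrative, not part of the module):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic problem with 15 classes - beyond TabPFN's native 10-class limit
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=15,
    n_classes=15,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```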
ManyClassClassifier ¶
Bases: `BaseEstimator`, `ClassifierMixin`
Output-Code multiclass strategy to extend TabPFN beyond its class limit.
This class enables TabPFN to handle classification problems with any number of classes by using a meta-classifier approach. It creates an efficient coding system that maps the original classes to multiple sub-problems, each within TabPFN's class limit.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `estimator` | | A classifier implementing `fit()` and `predict_proba()` methods. Typically a `TabPFNClassifier` instance. The base classifier should be able to handle up to `alphabet_size` classes. | required |
| `alphabet_size` | | Maximum number of classes the base estimator can handle. If None, it will try to get this from `estimator.max_num_classes_`. | None |
| `n_estimators` | | Number of base estimators to train. If None, an optimal number is calculated based on the number of classes and `alphabet_size`. | None |
| `n_estimators_redundancy` | | Redundancy factor for the auto-calculated number of estimators. Higher values increase reliability but also computational cost. | 4 |
| `random_state` | | Controls the randomization used to initialize the codebook. Pass an int for reproducible results. | None |
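The defaults can also be overridden explicitly. A minimal construction sketch using the documented parameters (the values shown are arbitrary):

```python
from tabpfn import TabPFNClassifier
from tabpfn_extensions.many_class import ManyClassClassifier

clf = ManyClassClassifier(
    estimator=TabPFNClassifier(),
    alphabet_size=10,           # base estimator's class limit
    n_estimators=None,          # let the wrapper choose the number of sub-problems
    n_estimators_redundancy=4,  # extra codes for more reliable decoding
    random_state=42,            # reproducible codebook
)
```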
Attributes:

| Name | Type | Description |
|---|---|---|
| `classes_` | | Array containing the unique target labels. |
| `code_book_` | | N-ary array containing the coding scheme that maps original classes to base classifier problems. |
| `no_mapping_needed_` | | True if the number of classes is within the `alphabet_size` limit, allowing direct use of the base estimator without any mapping. |
| `classes_index_` | | Maps class labels to their indices in `classes_`. |
| `X_train` | | Training data stored for reuse during prediction. |
| `Y_train` | | Encoded training labels for each base estimator. |
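After fitting, the documented attributes can be inspected to see how the problem was encoded. A sketch continuing the snippets above (the exact values depend on the data):

```python
clf.fit(X_train, y_train)

print(clf.classes_)            # unique target labels
print(clf.no_mapping_needed_)  # True if the classes already fit within alphabet_size
if not clf.no_mapping_needed_:
    # coding scheme mapping classes to sub-problems (guarded, since the codebook
    # may be trivial when no mapping is needed)
    print(clf.code_book_.shape)
```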
Examples:
>>> from sklearn.datasets import load_iris
>>> from tabpfn import TabPFNClassifier
>>> from tabpfn_extensions.many_class import ManyClassClassifier
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> base_clf = TabPFNClassifier()
>>> many_clf = ManyClassClassifier(base_clf, alphabet_size=base_clf.max_num_classes_)
>>> many_clf.fit(X_train, y_train)
>>> y_pred = many_clf.predict(X_test)
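Since iris has only three classes, the wrapper stays within the alphabet size and can delegate directly to the base estimator (see `no_mapping_needed_`); evaluation follows the usual scikit-learn pattern:

>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_test, y_pred)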
fit ¶
`fit(X: ndarray, y: ndarray, **fit_params) -> ManyClassClassifier`
Fit underlying estimators.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data matrix of shape (n_samples, n_features). | required |
| `y` | `ndarray` | Multi-class target labels of shape (n_samples,). | required |
| `**fit_params` | | Additional parameters passed to the underlying estimator's `fit` method. | {} |
Returns:

| Name | Type | Description |
|---|---|---|
| `self` | `ManyClassClassifier` | Returns a fitted instance of self. |
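Because `fit` returns the fitted instance, calls can be chained in the usual scikit-learn style (a minimal sketch reusing the names from the example above):

```python
y_pred = (
    ManyClassClassifier(TabPFNClassifier(), alphabet_size=10)
    .fit(X_train, y_train)
    .predict(X_test)
)
```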
get_alphabet_size ¶
Get the alphabet size to use for the codebook.
Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The alphabet size to use. |
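A short sketch of the fallback behaviour described above: when no `alphabet_size` is given at construction, the value is taken from the base estimator.

```python
clf = ManyClassClassifier(estimator=TabPFNClassifier())  # alphabet_size left as None
print(clf.get_alphabet_size())  # falls back to estimator.max_num_classes_
```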
get_n_estimators ¶
Calculate the number of estimators to use based on the number of classes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_classes` | `int` | The number of classes in the classification problem. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The number of estimators to use. |
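For example, the sketch below asks how many base estimators would be trained for a hypothetical 50-class problem; the exact count depends on `alphabet_size` and `n_estimators_redundancy`.

```python
clf = ManyClassClassifier(estimator=TabPFNClassifier(), alphabet_size=10)
print(clf.get_n_estimators(n_classes=50))
```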
predict ¶
Predict multi-class targets using underlying estimators.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data matrix of shape (n_samples, n_features). | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | Predicted multi-class targets of shape (n_samples,). |
predict_proba ¶
Predict probabilities using the underlying estimators.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Data matrix of shape (n_samples, n_features). | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | The probability of the samples for each class in the model, of shape (n_samples, n_classes), where classes are ordered as they are in `self.classes_`. |
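The columns follow the order of `classes_`, so labels can be recovered from the most probable column. A minimal sketch using the fitted `many_clf` from the class example:

```python
import numpy as np

proba = many_clf.predict_proba(X_test)
print(proba.shape)  # (n_samples, n_classes)
labels = many_clf.classes_[np.argmax(proba, axis=1)]  # most probable class per sample
```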
set_categorical_features ¶
Set categorical features for the base estimator if it supports it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `categorical_features` | `list[int]` | List of categorical feature indices. | required |