TabPFN-3: Technical Report

Abstract

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data.

A new performance standard. On the standard tabular benchmark TabArena, a single forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and Pareto-dominates the speed/performance frontier. TabPFN-3 also scales to more diverse datasets: it ranks first on datasets with many classes and beats 8-hour-tuned gradient-boosted-tree baselines on datasets with up to 1M training rows and 200 features.

Thinking mode. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering, TabPFN-3-Plus (Thinking), exploits this to beat all non-TabPFN models by over 200 Elo on the standard TabArena benchmark, rising to 420 Elo on the largest-data subset, and to outperform AutoGluon 1.5 extreme in less than a tenth of its runtime, without using LLMs, real data, internet search, or any other model besides TabPFN.

Broader capabilities. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on many-class datasets, relational data (new SOTA foundation model on RelBenchV1) and tabular-text datasets (SOTA on TabSTAR via TabPFN-3-Plus). It also directly improves existing integrations of TabPFN: a specialized TabPFN-3 checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation through shapiq is up to 120× faster with KV caching.

An enterprise-ready model. TabPFN-3 achieves this performance while being up to 20× faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking enable scaling to 1M rows on a single H100 while maintaining fast inference.
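Row-chunking is a standard memory-reduction pattern: test rows are processed in fixed-size blocks rather than all at once, so peak memory scales with the chunk size instead of the full test set. The sketch below illustrates the generic idea only; it is not TabPFN-3's internal implementation, and `predict_in_chunks` is a hypothetical helper.

```python
import numpy as np

def predict_in_chunks(predict_fn, X_test, chunk_size=100_000):
    """Run inference over test rows in fixed-size chunks.

    Illustrative sketch of the generic memory-saving pattern:
    predict chunk-by-chunk, then concatenate the results. Peak
    activation memory depends on chunk_size, not len(X_test).
    """
    outputs = []
    for start in range(0, len(X_test), chunk_size):
        outputs.append(predict_fn(X_test[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)

# Toy stand-in for a fitted model's predict function.
dummy_predict = lambda X: X.sum(axis=1)
X = np.arange(12, dtype=float).reshape(6, 2)
preds = predict_in_chunks(dummy_predict, X, chunk_size=2)
```

Because each chunk attends to the same (cached) training context, chunked predictions are identical to a single full-batch pass; only the memory profile changes.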

We release TabPFN-3 under the TABPFN-3.0 License v1.0, permissive for research and internal evaluation. TabPFN-3-Plus (Thinking) is available via API and enterprise licensing including on-prem and VPC environments (AWS SageMaker, Azure AI Foundry).

Figure 1: Performance on the TabArena benchmark, largest data subset (10k-100k samples). TabPFN-3 outperforms every other model in a single forward pass. TabPFN-3-Plus (Thinking) is better still, outperforming AutoGluon 1.5 extreme, a complex ensemble of models tuned for 4 hours, while being 10× faster.

Introduction

Tabular data sits at the core of operational decision-making across science and industry, including clinical risk prediction, credit scoring, predictive maintenance, and scientific measurement. While gradient-boosted trees were the reliable default for decades, tabular foundation models have displaced them as the strongest predictors on standard small-to-medium-sized benchmarks over the last year.

Earlier TabPFN releases established and extended this paradigm. TabPFN v1 showed that a transformer pretrained on synthetic tasks could approximate Bayesian inference in a single forward pass, though only on a thousand rows of clean numerical data. TabPFN v2 scaled this to datasets with 10,000 rows, categorical features, missing values, and outliers, becoming the first tabular foundation model to outperform tuned gradient-boosted trees on standard benchmarks. TabPFN-2.5 extended this strong performance to 100,000 rows and 2,000 features and matched four-hour-tuned ensembles in a single forward pass. Across these releases, an active research ecosystem of extensions grew on top of the core model – spanning time-series forecasting, causal inference, Bayesian optimization, graph learning, interpretability, and reinforcement learning – with over 200 published applications and more than three million PyPI downloads.

TabPFN-3 is shaped by feedback from users and the entire ecosystem. To remove common bottlenecks, we scaled beyond a hundred thousand rows to one million rows, cut the memory and latency of inference at scale, added support for many-class classification, and sharpened the calibrated predictive distributions delivered in a single forward pass. Furthermore, we carefully designed the TabPFN-3 model and training process to lift performance on core tabular prediction as well as on the many downstream extensions built on top of the open-source model, in particular time-series forecasting, multi-table relational data, and interpretability.

The remainder of this report describes the architecture, prior, and inference-time optimizations of TabPFN-3 (Section 2); evaluates its performance on public and internal benchmarks across classification, regression, many-class, time-series, and relational data (Section 3); surveys the adoption and ecosystem the model is built for (Section 4); and details licensing and availability (Section 5). Appendices provide architectural hyperparameters, prior visualizations, additional internal and more detailed benchmark results, and an extensive list of published TabPFN use cases. For installation and usage, see docs.priorlabs.ai.


What's Next

We are working with customers to apply TabPFN-2.5 Scaling Mode to their largest datasets.