
AI's Blind Spot: The Structured Data Challenge

TL;DR: Structured data is the language of measurement and decision-making, yet AI still treats it as an afterthought. At Prior Labs, we're building Multimodal Tabular Foundation Models (TFMs), starting with TabPFN, that understand tables natively—learning statistical reasoning directly from data. Our vision is broader: truly agentic AI systems capable of understanding high-level goals, fusing tables, language, and images to reason, integrate domain knowledge, infer causality, and adapt dynamically. This isn't just better analytics—it's a new foundation for discovery across science, medicine, and the global economy.

While artificial intelligence masters language and vision, it remains surprisingly inept with structured data. This isn't niche data: structured tables are the language of measurement and empirical observation. AI now generates art and poems, yet it struggles to natively comprehend the core operational data in the spreadsheets and databases that drive critical decisions across medicine, finance, science, and virtually every industry. This isn't just a gap; it's a massive bottleneck holding back progress.

Imagine, instead, a future where AI doesn't just interact with tables through brittle tools, but understands them. A future where intelligent agents can instantly forecast market trends from financial logs, accelerate the discovery of life-saving drugs by interpreting clinical trial data, optimize global supply chains using real-time sensor readings, prevent billions of dollars in fraud by spotting anomalies in transactions, and personalize medicine based on genomic insights. This isn't merely about better analytics; it's about transforming how discovery happens, how businesses operate, and how we tackle grand challenges like cancer and climate change. It promises to reshape data science itself, from university curricula to organizational structures. At Prior Labs, we are building this future.

Why Structured Data Remains AI's Unconquered Frontier

Today, we witness iterative cycles where domain specialists brief data scientists, who then wrestle with outdated models, aggregate findings, report back, and painstakingly refine questions or data inputs—a process ill-suited to the pace of modern discovery and business. While LLMs can call tools to interact with tables, they lack a deep, internal understanding of the data itself, inheriting the limitations of the tools they use.

But why has structured data proven so resistant to the foundation model revolution? Tables are different:

  • Data Accessibility Bias: AI's growth was fuelled by public text and images. Critical tabular data often remains private (spreadsheets rarely go viral), leaving little public data for training large models.
  • Architectural Mismatch: Standard LLMs lack native mechanisms for tabular layouts and numerics. Their one-dimensional, sequential architecture is built to understand language, not numbers; this is like grasping an image by hearing its pixels read aloud. We need AI designed specifically for tabular data patterns (see the sketch after this list).
  • Inherent Complexity: Tables mix diverse data types (numeric, categorical, ordinal, missing values) while often encoding highly complex domains (e.g., genomics, physics, finance). Interpreting this deep semantic and structural complexity challenges standard AI architectures.
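
To make the architectural mismatch concrete, here is a toy Python sketch (with made-up numbers, purely illustrative) contrasting the flattened token stream a text-first LLM consumes with the two-dimensional numeric array a table-native model can operate on directly:

```python
import numpy as np

# A tiny table: rows are patients, columns are [age, systolic BP, cholesterol].
table = np.array([
    [63, 145, 233],
    [41, 130, 204],
    [57, 120, 354],
])

# What a text LLM sees: the table flattened into a 1-D character stream.
# Column membership and numeric magnitude must be re-inferred from tokens.
flattened = " ".join(str(v) for row in table for v in row)
print(flattened)  # "63 145 233 41 130 204 57 120 354"

# What a table-native model sees: a 2-D array of actual numbers, where
# "same column" and "same row" are structural facts, not guesses.
print(table.shape)         # (3, 3)
print(table[:, 1].mean())  # mean systolic BP, computed over a column directly
```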

Building Native Intelligence for Tables

Prior Labs is tackling this challenge by developing Tabular Foundation Models (TFMs), a paradigm shift away from training on specific downstream tasks and toward teaching the model statistical reasoning itself.

Our first major breakthrough, TabPFN, exemplifies this. TabPFN is a Transformer model, leveraging the compute and architectural power of the modern AI era, but pre-trained exclusively on millions of synthetic tabular datasets encompassing a vast diversity of underlying structures and patterns. This unique pre-training process imbues TabPFN with a rich statistical "prior," allowing it to implicitly understand tabular data through a native architecture. It treats numbers as numbers, grasps 2D relationships, and avoids the information loss common with standard tokenization approaches.
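
To give a flavor of what pre-training on synthetic data means, here is a deliberately minimal sketch of sampling one random tabular task. The generator below (Gaussian features labeled by a randomly initialized two-layer network) is our own illustrative stand-in, not the actual TabPFN prior, which is far richer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_rows=128, n_features=5, hidden=16):
    """Draw one random tabular dataset from a simple hand-rolled prior."""
    X = rng.normal(size=(n_rows, n_features))
    # A random two-layer network plays the role of the unknown mechanism
    # relating features to labels in this particular synthetic task.
    W1 = rng.normal(size=(n_features, hidden))
    W2 = rng.normal(size=(hidden, 1))
    logits = np.tanh(X @ W1) @ W2
    y = (logits.ravel() > np.median(logits)).astype(int)  # balanced binary labels
    return X, y

# Pre-training streams millions of such (X, y) tasks past the Transformer,
# which learns to predict held-out rows of each task from the rows it is
# shown -- that is, it learns the act of statistical inference itself.
X, y = sample_synthetic_dataset()
print(X.shape, y.mean())  # (128, 5) 0.5
```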

TabPFN uses in-context learning (ICL): it processes new data examples at inference time, delivering state-of-the-art predictions in seconds, zero-shot, without retraining or tuning. Validated in Nature, its speed, accuracy, and remarkable generalization confirm the power of this approach.
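
In practice, this zero-shot workflow looks like ordinary scikit-learn code. The sketch below uses the open-source tabpfn package's scikit-learn-style interface (exact API details may vary between package versions):

```python
# pip install tabpfn scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = TabPFNClassifier()    # no architecture search, no hyperparameter tuning
clf.fit(X_train, y_train)   # stores the labeled context; no gradient updates
pred = clf.predict(X_test)  # one forward pass, conditioned on that context

print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
```

Note that fit here does not train anything in the usual sense: the labeled rows simply become the context the pre-trained Transformer conditions on when predicting.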

Multimodal Models for Truly Agentic Data Science

TabPFN is a crucial first step, but our ultimate vision extends far beyond specialized tabular models. We are building the next generation: Multimodal TFMs designed for inherent multimodal understanding.

Imagine AI that doesn't just call tools, but natively fuses the statistical patterns in tables with the semantic context of language and the perceptual information from images, all within a single, unified architecture.

Such integrated models will power AI agents capable of:

  • Understanding high-level analytical goals expressed in natural language.
  • Intelligently gathering, querying, and integrating data from diverse sources.
  • Integrating common sense, users' domain knowledge, and additional information sources with the statistical patterns in the data to improve predictions.
  • Engaging in dynamic dialogue to explore results, refine hypotheses, and surface insights invisible to human analysis alone.

Just as LLMs provided a foundational layer for language tasks, we envision TFMs becoming the core intelligence engine for reasoning over structured and multimodal data. They are designed to empower platforms like Snowflake, Databricks, and SAP, along with the broader ecosystem of companies building in the application layer, by providing deep, native data understanding: the missing predictive and analytical intelligence layer needed to unlock the full potential of modern data infrastructure. This extends to robust outlier detection, accurate forecasting, high-fidelity synthetic data generation, and analysis across entire heterogeneous databases.

Tackling the Hard Questions

Realizing this vision requires solving some of the most complex and fundamental challenges in AI, problems that have stumped the field for decades:

  • Semantic Reasoning: Truly blending statistical power with contextual and domain-specific knowledge within a unified architecture.
  • Inferring Causality: Moving beyond correlation to identify the causal drivers.
  • Ensuring Trust: Making complex AI reasoning transparent, fair, interpretable, and dependable.

These questions define the cutting edge of AI, and answering them is core to our mission. Led and advised by pioneers in AI, AutoML, and causality, including Frank Hutter and Bernhard Schölkopf, we aim to build the world's best team of researchers to tackle these questions, driven by the goal of creating the best possible products and supported by world-class engineers, without whom none of this would be possible.

Building the Future & Ecosystem at Prior Labs

Join us to solve deep AI problems with global impact and transform how entire industries make decisions. We seek passionate, world-class researchers and engineers to join our collaborative team, pioneer solutions to these fundamental challenges with significant compute resources, and build systems that unlock understanding through intelligence truly native to the data itself.

Recognizing that optimal performance often requires domain-specific knowledge, we will launch a dedicated fine-tuning program in the coming days. This initiative will help organizations, particularly in complex fields like genomics, clinical trials, trading, and financial modeling, adapt our models – more to follow soon!

Looking ahead throughout the year, we plan several impactful releases. Key developments will include scaling our models to handle up to one million samples, substantially reducing inference times, doubling down on time series forecasting, and introducing relational understanding. We will also launch a dedicated open-source repository to foster education, research and experimentation, alongside releasing a series of agentic features aimed at automating complex data science processes.

We believe that just as LLMs democratized interaction with language, TFMs will democratize deep data analysis and decision making. Let's build the future of structured data together.

Frank, Noah & Sauraj — April 24th, 2025