
A-Team Insight Blogs

When AI Meets SEC Filings: Where LLMs Deliver, Where They Don’t, and Why the Plumbing Matters


Can LLMs reliably do what financial analysts do? Compare peers, track firms over time, and spot what’s changed? A panel at the recent A-Team Group/Eagle Alpha Alternative Data Conference New York explored the question, and new research from Goldman Sachs and Yale suggests the answer is more nuanced than the industry assumes.

The panel session “How to Use AI to Build Better Data Products” brought together Norman Niemer, Head of Research & Data Science at UBS, and Sid Ghatak, founder of Increase Alpha and former chief AI architect at the Federal Energy Regulatory Commission. The discussion, moderated by Andrew Delaney of A-Team Group, surfaced a set of operational tensions that anyone building or buying alternative data products should be paying attention to.

The headline question – where is AI delivering the most value in alternative data product development? – drew a range of responses. But the more revealing discussion concerned whether large language models belong at the core of financial data extraction pipelines, or whether they’re better deployed at the edges as productivity tools and presentation layers, while something more deterministic does the heavy lifting.

The Deterministic Pipeline vs. the LLM

One panellist described a data product built entirely on proprietary, non-LLM artificial intelligence: a deterministic extraction engine that pulls structured features from SEC filings to predict short-term returns. Because the system doesn’t use language models, it is technically incapable of hallucination, it runs the same way every time, and the output is completely auditable and compliant.
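
To make the deterministic idea concrete, the sketch below shows the kind of rule-based extraction step such an engine might rely on. The field names and the regular expression are illustrative assumptions, not a description of the panellist's actual system.

```python
# Illustrative sketch only: a deterministic, rule-based feature extractor.
# Field names and patterns are hypothetical, not the panellist's actual engine.
import re
from dataclasses import dataclass

@dataclass
class FilingFeatures:
    accession_no: str          # SEC accession number of the filing
    risk_factor_words: int     # length of the "Risk Factors" section, in words
    mentions_going_concern: bool

def extract_features(accession_no: str, filing_text: str) -> FilingFeatures:
    """Pull the same features from every filing, the same way every time."""
    # Grab the text between the "Item 1A" and "Item 1B" headings (10-K convention).
    match = re.search(r"Item\s+1A\.(.*?)Item\s+1B\.", filing_text,
                      re.IGNORECASE | re.DOTALL)
    risk_section = match.group(1) if match else ""
    return FilingFeatures(
        accession_no=accession_no,
        risk_factor_words=len(risk_section.split()),
        mentions_going_concern="going concern" in filing_text.lower(),
    )
```

Because the rules are fixed, the same filing always produces the same features, which is what makes the output auditable.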

The panellist referenced Fin-RATE, a benchmark published in February 2026 by researchers at Goldman Sachs and Yale, which tests LLMs against three task types that mirror real analyst workflows: extracting facts from a single filing; comparing disclosures across companies; and tracking how the same firm’s filings change over time.
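
As an illustration of the benchmark's structure, the sketch below models the three task types the way an evaluation harness might. The schema and identifiers are hypothetical, not Fin-RATE's published format.

```python
# Hypothetical representation of the three task types described above.
from dataclasses import dataclass
from enum import Enum

class TaskType(Enum):
    SINGLE_FILING_EXTRACTION = "extract facts from a single filing"
    CROSS_COMPANY_COMPARISON = "compare disclosures across companies"
    LONGITUDINAL_TRACKING = "track how one firm's filings change over time"

@dataclass
class EvalItem:
    task: TaskType
    question: str
    filings: list[str]   # identifiers of the filings the answer must draw on
    gold_answer: str

example = EvalItem(
    task=TaskType.CROSS_COMPANY_COMPARISON,
    question="Which issuer discloses the higher customer-concentration risk?",
    filings=["FILING-A-2024", "FILING-B-2024"],   # placeholder identifiers
    gold_answer="Issuer A",
)
```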

The headline finding is that LLMs are reasonably competent at single-document extraction, but degrade substantially on the tasks analysts actually spend their time on. When asked to compare disclosures across companies, models fabricated comparative claims and confused which data belonged to which entity. When tracking the same firm over time, they treated each year’s filing independently, producing temporal mismatches and invented trend claims. Finance-tuned models were particularly brittle: strong on single documents, they suffered near-total collapse on cross-entity work.

Under retrieval-augmented conditions – the way most production systems actually work – performance dropped further still, and the researchers demonstrated this was primarily a retrieval problem rather than a generation one. The models performed adequately when given the right evidence; the bottleneck was surfacing it. For cross-company queries, the necessary evidence was never retrieved at all for the vast majority of questions. A hierarchical retrieval approach – pre-bucketing documents by company and year before searching – dramatically improved results, suggesting that how you index and organise your corpus determines whether the AI layer works at all. This maps directly to a point made repeatedly during the panel: the “unsexy” work of data organisation, entity resolution and temporal indexing determines whether AI can deliver on its promise.
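
The sketch below illustrates that pre-bucketing idea under simple assumptions: documents arrive tagged with company and year, and search only runs inside the matching bucket. The lexical scorer is a stand-in for whatever embedding-based ranking a production system would use.

```python
# Illustrative sketch of hierarchical retrieval: restrict the corpus to a
# (company, year) bucket first, then rank within it. Not any vendor's API.
from collections import defaultdict

class BucketedIndex:
    def __init__(self):
        # (company, year) -> list of document chunks
        self.buckets = defaultdict(list)

    def add(self, company: str, year: int, chunk: str) -> None:
        self.buckets[(company, year)].append(chunk)

    def search(self, company: str, year: int, query: str, k: int = 5) -> list[str]:
        # Narrow the search space before any similarity ranking happens.
        candidates = self.buckets.get((company, year), [])
        return sorted(candidates, key=lambda c: _overlap(query, c), reverse=True)[:k]

def _overlap(query: str, chunk: str) -> int:
    """Toy lexical score (shared word count); the bucketing is the point, not the scorer."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))
```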

AI as Plumbing: The Unglamorous Work That Matters Most

The panel converged on a shared observation: AI’s most tangible impact in alternative data isn’t in signal generation. It’s in data engineering – the entity resolution, fuzzy matching, normalisation, and deduplication work that has historically been the most time-consuming part of onboarding alternative datasets.

One participant described using AI to match property reviews against a holdings database, a task previously too labour-intensive to justify. The gain wasn’t speed on an existing task; it was enabling analysis that wouldn’t have been attempted at all. Another described compressing days of what-if regression work into near-instant iterations.
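
The sketch below shows the flavour of fuzzy matching involved, using only the Python standard library; the 0.7 similarity threshold is an assumption that would need tuning against real holdings data.

```python
# Standard-library sketch of fuzzy name matching against a holdings list.
from difflib import SequenceMatcher

def best_match(name: str, holdings: list[str], threshold: float = 0.7):
    """Return the holding whose name is most similar to `name`, or None."""
    def score(candidate: str) -> float:
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
    candidate = max(holdings, key=score, default=None)
    if candidate is not None and score(candidate) >= threshold:
        return candidate
    return None

# A free-text review name resolves to the closest entry in the holdings database.
print(best_match("Marriott Intl Inc",
                 ["Marriott International, Inc.", "Hilton Worldwide Holdings Inc."]))
```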

But trust remains the binding constraint. Even participants who use AI extensively for entity resolution noted they still trust human verification more, particularly for historical mappings. AI can match a brand to a company entity today, but corporate structures and brand portfolios change over time. Maintaining point-in-time accuracy across a ten-year history is precisely the temporal mismatch problem that Fin-RATE documents formally.
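
A point-in-time mapping table is one common way to handle this; the sketch below uses invented brands, entities and dates purely to illustrate the idea.

```python
# Hypothetical point-in-time mapping: each brand-to-entity link carries a
# validity window, so a 2015 observation resolves to the owner at that date.
from dataclasses import dataclass
from datetime import date

@dataclass
class Mapping:
    brand: str
    entity: str
    valid_from: date
    valid_to: date | None   # None means the mapping is still current

MAPPINGS = [
    Mapping("CityStay Hotels", "Legacy Hospitality plc", date(2010, 1, 1), date(2018, 7, 1)),
    Mapping("CityStay Hotels", "Global Lodging Group Inc.", date(2018, 7, 1), None),
]

def resolve(brand: str, as_of: date) -> str | None:
    """Return the entity that owned `brand` on `as_of`, if the history covers it."""
    for m in MAPPINGS:
        if m.brand == brand and m.valid_from <= as_of and (m.valid_to is None or as_of < m.valid_to):
            return m.entity
    return None

print(resolve("CityStay Hotels", date(2015, 6, 1)))   # owner before the change
print(resolve("CityStay Hotels", date(2021, 6, 1)))   # owner after the change
```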

There was also a candid acknowledgement that LLMs are fundamentally language models, and much of the data that needs cleaning is numerical, tabular, and structural. This aligns with Fin-RATE’s finding that finance-specific numerical errors – confusion over units and scales, mistakes in computation logic – constitute a persistent category of LLM failure.

One structurally interesting observation: dirty data is the primary obstacle to AI adoption in financial workflows, and AI itself may be the best tool for cleaning it. But this requires deliberate investment in what multiple participants called “ugly plumbing work,” i.e. effort that organisations often skip in favour of more visible applications. The panel’s audience poll confirmed the point: data quality and reliability topped the list of challenges, ahead of integration, explainability, and cost.

Competitive Implications

The panel explicitly addressed whether AI is a productivity tool or a capability enabler. The consensus was both, but the boundary matters strategically.

If AI dramatically reduces the cost of data engineering – onboarding, mapping, and normalising messy datasets – then smaller firms can now do work that previously required the infrastructure budgets of the largest shops. Data engineering used to be part of the alpha; if AI commoditises it, differentiation has to come from proprietary data sources, the quality of the analytical layer, or the speed and reliability of delivery. The perennial alternative data question – does the signal get competed away when everyone has the same tools? – takes on a new form.

For data products that use AI, trust and transparency go hand in hand. The panel discussion identified three things that build credibility with institutional clients: verifiable track records based on live predictions rather than backtests; full source citations that trace every claim back to a specific section of a specific filing; and honesty about which parts of the pipeline are deterministic and which rely on probabilistic AI. That last point matters more than ever. If LLMs fabricate comparative claims and invent trends – as Fin-RATE documents – then clients evaluating AI-powered data products need to know exactly where in the process those models are being used, and what validation sits around them.
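
What a “full source citation” could look like in practice is sketched below. The field names are assumptions rather than any standard, but the principle is that every claim carries a pointer back to a specific span of a specific filing, plus a flag recording whether an LLM was involved in producing it.

```python
# Sketch of a citation-carrying claim; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    cik: str            # SEC Central Index Key of the issuer
    accession_no: str   # identifier of the filing
    section: str        # e.g. "Item 7. Management's Discussion and Analysis"
    char_start: int     # character offsets of the supporting passage
    char_end: int

@dataclass
class Claim:
    text: str
    citations: list[Citation]
    llm_generated: bool   # True if a language model produced any part of the claim

def is_auditable(claim: Claim) -> bool:
    """A claim with no citations cannot be traced back and should be flagged."""
    return len(claim.citations) > 0
```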

What This Means in Practice

For those evaluating or building AI-powered data products, several conclusions emerge. Single-document LLM extraction is meaningful but imperfect; multi-document synthesis – comparison and longitudinal tracking – is where models fail systematically. If a vendor claims AI-powered comparative analysis, the right questions concern validation methodology, hallucination rates, and entity alignment.

Retrieval architecture may matter more than model selection for production systems. How documents are indexed, bucketed, and queried has a larger impact on accuracy than which LLM generates the answer.
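
One way to test that claim is to measure evidence recall before looking at answer quality at all; the sketch below assumes each benchmark question lists the filings it depends on, which is an assumption about the evaluation setup rather than a documented feature of any benchmark.

```python
# Sketch of an evidence-recall check: before blaming the model, measure how
# often the retriever surfaces the filings a question actually depends on.
def evidence_recall(retrieved_ids: list[str], required_ids: list[str]) -> float:
    """Fraction of required evidence documents present in the retrieved set."""
    if not required_ids:
        return 1.0
    hits = sum(1 for doc_id in required_ids if doc_id in retrieved_ids)
    return hits / len(required_ids)

# If this number is low, swapping in a stronger LLM will not fix the answers.
print(evidence_recall(["FILING-A-2023", "FILING-A-2022"],
                      ["FILING-A-2023", "FILING-B-2023"]))  # prints 0.5
```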

Data quality work remains the critical enabler. AI can help, but it requires deliberate investment and doesn’t happen by default. Organisations that skip the plumbing build on unstable foundations.

And the deterministic-vs-probabilistic architectural choice is commercially significant. Products built on deterministic pipelines offer auditability and consistency; those on LLMs offer flexibility and breadth. Understanding where each applies is essential for informed procurement.

