How Much Hedge Fund Alpha Is Lost Before the Model Even Runs?

How often is a hedge fund’s apparent model failure actually a data failure in disguise? A model that stops working. A backtest that does not replicate. A risk number that needs explaining. How to tell one from the other – before reaching for the model – was the recurring question of an hour-long discussion at a recent A-Team webinar.

The Data Foundation for Alpha: How Fragmented Data is Eroding Hedge Fund Performance, sponsored by FE fundinfo, brought together Renato Guerrieri, Head of Quantitative Strategy for Liquid Alternatives at Downing; Matthew Bell, Senior Data Scientist at Man Group; and Kirsty Joss, Head of Distribution Support and Feeds (Product) at FE fundinfo.

Three threads ran through the hour. First, that the most expensive form of fragmented data is the kind that produces convincing-looking backtests, rather than the kind that slows research down. Second, that point-in-time discipline is harder to enforce properly than firms often acknowledge, with lineage the discipline most often skipped and the most costly to skip. And third, that AI, applied to both ends of the validation problem, is raising the stakes in both directions.

Most model failures are data problems in disguise

Guerrieri’s framing turned on a single phrase. “Many research debates are data debates wearing a quantitative costume,” he said. The same economic reality can appear in different shapes depending on the provider, identifier, timestamps and history – and two systems can disagree while both being correct. The researcher becomes an archaeologist before reaching the investment question.

The cost, on Guerrieri’s reading, is less the cleaning time – visible and measurable as it is – than the false confidence that follows in the front office: a signal that may not be the signal the model is testing, propagated into production on the strength of a backtest that looked right. “A slow process would have been annoying,” he said. “A wrong process with cleaner-looking outputs is much worse.”

An audience poll on where fragmented data hurts firms most produced a roughly even split between research productivity, backtest reliability and operational cost. Risk measurement received no votes; almost no-one said they did not measure the cost. Asked to react, Guerrieri singled out backtest reliability as the most dangerous of the three. “Bad data does not always scream,” he said. “Sometimes it arrives as a clean chart and a spectacular Sharpe ratio.”

Point-in-time discipline runs deeper than the data

On point-in-time, Bell flagged a category error in how funds buy data. Every vendor claims point-in-time, he noted, and most are not lying – they simply use the term to mean different things. The danger is the unspoken extension. Point-in-time on the features is necessary; on its own, it is not sufficient. If the security mappings, the model versioning, or the methodology revisions are backfilled rather than recorded as they were known at the time, the discipline collapses. A point-in-time feature loaded into a backtest with non-point-in-time mappings has stopped being point-in-time data.
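Bell's point about mappings can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than anything described at the webinar: the `MappingRecord` fields, the identifiers `V1`, `SEC_OLD` and `SEC_NEW`, and the two lookup functions are hypothetical. The idea is that each mapping row carries both the date it applies from and the date it was recorded, and a backtest only sees rows recorded on or before the simulated date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MappingRecord:
    """One vendor-to-internal identifier mapping (hypothetical schema)."""
    vendor_id: str
    internal_id: str
    effective_from: date  # date the mapping applies from in the real world
    recorded_on: date     # date the mapping was actually recorded


def _latest(candidates) -> Optional[str]:
    """Pick the most recent mapping among the candidates, if any."""
    if not candidates:
        return None
    best = max(candidates, key=lambda r: (r.effective_from, r.recorded_on))
    return best.internal_id


def map_as_known(records, vendor_id: str, as_of: date) -> Optional[str]:
    """Point-in-time lookup: only rows already recorded by `as_of` count."""
    return _latest([
        r for r in records
        if r.vendor_id == vendor_id
        and r.effective_from <= as_of
        and r.recorded_on <= as_of
    ])


def map_backfilled(records, vendor_id: str, as_of: date) -> Optional[str]:
    """Backfilled lookup: ignores when a row was recorded (look-ahead risk)."""
    return _latest([
        r for r in records
        if r.vendor_id == vendor_id and r.effective_from <= as_of
    ])


# Hypothetical history: a correction recorded in 2023 but dated back to 2020.
RECORDS = [
    MappingRecord("V1", "SEC_OLD", date(2020, 1, 1), date(2020, 1, 1)),
    MappingRecord("V1", "SEC_NEW", date(2020, 1, 1), date(2023, 5, 1)),
]

backtest_date = date(2021, 6, 30)
print(map_as_known(RECORDS, "V1", backtest_date))    # SEC_OLD
print(map_backfilled(RECORDS, "V1", backtest_date))  # SEC_NEW
```

In this toy history the two lookups disagree for any backtest date before the correction was recorded, which is the situation Bell describes: the feature may be point-in-time, but the join that attaches it to a security is not.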

For Guerrieri, whether a backtest can be trusted at all comes down to three questions: what was known, when it was known, and whether the strategy could have acted on it at the time. The third is the one most often skipped – even where information was available in time, the practical question of whether a UK fund could have traded on overnight US news rarely makes it into historical returns. “Many beautiful backtests are just look-ahead bias with nice formatting,” Guerrieri said.

From the vendor side, Joss raised a parallel concern. Fund portfolio holdings data, she noted, is often actively embargoed by asset managers for 90 days or more – meaning a backtest constructed on holdings as at a given quarter-end will, if built without reference to embargo terms, draw on information no investor could have acted on at the time. The constraint is structural; firms that do not record it carry the bias forward.
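The embargo constraint Joss describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's feed: the 90-day figure is the lower bound quoted above (actual terms vary by manager), and the function names, snapshot layout and ticker `ABC` are hypothetical.

```python
from datetime import date, timedelta

# Assumed embargo length; the figure quoted is "90 days or more".
EMBARGO_DAYS = 90


def first_usable_date(as_at: date, embargo_days: int = EMBARGO_DAYS) -> date:
    """Earliest date a backtest may treat holdings dated `as_at` as known."""
    return as_at + timedelta(days=embargo_days)


def usable_holdings(snapshots, decision_date: date,
                    embargo_days: int = EMBARGO_DAYS):
    """Return the latest (as_at, holdings) snapshot whose embargo has expired
    by `decision_date`, or None if nothing has been released yet."""
    released = [
        snap for snap in snapshots
        if first_usable_date(snap[0], embargo_days) <= decision_date
    ]
    return max(released, key=lambda snap: snap[0]) if released else None


# Hypothetical quarter-end snapshots: (as-at date, {ticker: weight}).
SNAPSHOTS = [
    (date(2024, 3, 31), {"ABC": 0.6}),
    (date(2024, 6, 30), {"ABC": 0.4}),
]

# In mid-July the June quarter-end is still embargoed, so the backtest
# should still be using the March snapshot.
print(usable_holdings(SNAPSHOTS, date(2024, 7, 15)))
```

Recording the embargo as a parameter of the lookup, rather than leaving it as tribal knowledge, is one way of making sure the constraint is carried into every backtest that touches the holdings data.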

Lineage gets neglected first, and costs the most

Asked which of validation, governance and lineage gets skipped first in a fast-growing fund, Guerrieri and Bell – from quite different vantage points – gave the same answer: lineage.

Validation gets attention when something breaks, Guerrieri said. Governance gets attention when a senior figure asks the question. Lineage gets treated as documentation – boring, perpetually deferred, never urgent enough to fund. The cost compounds out of sight. “Bad lineage becomes bad conviction,” he said, “and bad conviction is ultimately expensive.”

Bell’s evidence for the same point came from the operational reality of running thousands of signals across multiple strategies. When a signal moves unexpectedly, the diagnostic question is whether the market moved, the underlying vendor restated, the security mappings shifted, or a pipeline upgrade introduced silent breakage. Without lineage in place, that question has no answer – and the loss of confidence in the signal that follows is a direct research cost.
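A minimal Python sketch of what lineage buys in that diagnostic, assuming each signal value is stored alongside a record of the inputs that produced it. The `LineageRecord` fields and the version strings are hypothetical and are not a description of Man Group's systems.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LineageRecord:
    """Versions of the inputs behind one signal value (hypothetical schema)."""
    vendor_file_version: str
    mapping_version: str
    pipeline_version: str


def changed_inputs(before: LineageRecord, after: LineageRecord) -> dict:
    """Return {field: (old, new)} for every lineage field that differs."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


# Hypothetical lineage for the same signal on two consecutive days.
yesterday = LineageRecord("2024-06-01.v1", "map-17", "pipe-3.2")
today = LineageRecord("2024-06-01.v1", "map-18", "pipe-3.2")

diff = changed_inputs(yesterday, today)
if diff:
    print("Upstream change to investigate first:", diff)
else:
    print("No recorded upstream change; a market move is the leading candidate.")
```

An empty diff does not prove the market moved, but it narrows the search; without the record there is nothing to compare.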

The framing both panellists landed on is that lineage is research infrastructure – treated as such by funds that scale well, and as back-office documentation by funds that do not. Funds pay for that gap twice: visibly, in research time, and invisibly, in portfolio manager confidence in the signals reaching them.

AI raises the stakes on both sides of the validation problem

Bell brought the AI question down to a recent case he encountered, in which an AI tool, asked to analyse a set of datasets and produce a research output, had found one of the underlying databases offline. Rather than flag the gap, the tool fabricated the missing data and presented it in a polished report. The output was unusually impressive; that was what drew Bell’s eye. The discovery was made before the report reached a researcher, but the implication, Bell noted, is that traditional data quality frameworks do not yet account for the failure modes AI introduces.

Guerrieri put the problem in scale terms. With traditional research, bad data damages one model or one backtest. With AI-driven workflows, the same input can contaminate retrieval, summaries, generated code, model training and the downstream decisions that follow. “AI just gives the garbage a better grammar,” he said.

Joss pushed back on the all-negative reading. At the point of inbound data, she argued, AI is most usefully deployed as an additive layer on top of established rules-based validation – picking up spikes and anomalies at the dataset level that field-by-field checks would miss. The procurement conversation has shifted in response. Coverage, fields, format and cost remain on the buyer’s list, but detailed questions about validation methodology, audit trail and sourcing have moved up it. The reason: data flowing directly into AI workflows has no one picking through it at the other end, so the stakes of getting it right at source are higher.

A second audience poll on what hedge funds most want from data providers in 2026 ranked clearer accountability when issues arise above greater transparency on validation and lineage, with point-in-time accuracy and faster onboarding lower. Build-versus-buy has changed in parallel, Bell suggested: toward buying the data and building everything else, with the contested grey area now sitting around AI-enabled, research-ready products that blur the distinction between dataset and model.

Guerrieri’s closing argument compressed the hour. Weaker data, he said, does not stop a model looking sophisticated, a backtest looking impressive, or a dashboard looking clean – the investment decision is already compromised by the time those outputs arrive. Fragmented data, on his account, is an alpha problem dressed as an operational one.

The Data Foundation for Alpha: How Fragmented Data is Eroding Hedge Fund Performance was sponsored by FE fundinfo. The full recording is available on the Market & Alt Data Insight website.
