A-Team Insight Blogs

Is the Most Expensive Failure Mode in Alt Data on the User Side?

21 May 2026


Asked at last week's A-Team/Eagle Alpha Alternative Data Conference in London which datasets his firm had ever taken out of production, one panellist paused, considered the question, and admitted he could not think of a single one. The answer was offered without defensiveness, as a genuine reflection. It also turned out to be one of the most revealing moments of the session.

The panel – “Building and Scaling an Alternative Data Function” – was moderated by Mike O’Hara, editor of Market & Alt Data Insight, and brought together Caio Natividade, Managing Director and Global Head of Quantitative Investment Solutions Research at Deutsche Bank; Dr Tim Drye, Founder of Curious Blue Fish; and Alexander Denev, CEO of Turnleaf Analytics, a member of the GARP AI Advisory Committee and author of “The Book of Alternative Data”.

The discussion ranged across structure, operations, accountability, skills and spend. What emerged from all of them was a less comfortable underlying question: how much of what alt data teams do is built on the assumption that the things they are working with – datasets, models, even the team's own understanding of what a dataset is for – sit still long enough to be managed?

The Dataset That Never Gets Retired

The retirement question was the cleanest illustration. Pressed on why no dataset had ever been formally retired, one panellist explained that the type of data his function prioritises is data that gives scale – coverage across an entire asset class or multiple asset classes. Niche datasets that lose their applicability after a specific window do not clear the bar for onboarding in the first place. Structured, broad-coverage data, by his account, was not retired because it did not expire; its relevance ebbed and returned.

A second panellist, speaking from the vendor’s perspective, gave the converse view. Clients, he said, frequently buy a dataset, fail to use it, and then ask two years later for a contract extension so they can try again. The cash flows were welcome. Large organisations, he added, were the worst offenders.

Between those two accounts sat the more substantive admission, offered at the close of the session: that a firm had taken roughly two years to realise it was using a news flow and sentiment dataset in the wrong way. The initial approach – mapping headline scores to alphas – was simply incorrect. Other applications of the same data turned out to be more productive once the team understood them. The fault, the panellist said plainly, was on the user’s side, not the vendor’s.

That admission cut against the dominant narrative in alt data discourse, in which underperformance is attributed to vendor methodology, drift, or misrepresentation. Most of the post-mortem frameworks the industry has developed are oriented towards evaluating vendors, not a firm's own analytical assumptions. Yet the most expensive failure mode described here was a two-year period in which a user did not understand what its own dataset could do.

Centralised, Embedded, or Hybrid

The same question – what is stable and what is not – sat under the earlier operating model debate. An audience poll opened the session with a clear preference for fully centralised data teams. One panellist immediately – and vehemently – disagreed. Another said hybrid worked best, citing the difficulty of negotiating with fully centralised data functions inside large banks, where the data team could become bureaucratically isolated. A third said yes to centralisation on grounds of scale, price competitiveness and technical debt – and then described a structure in which a data science operations lead sat inside the research function but worked closely with the centralised data team. That, his fellow panellists pointed out, was hybrid.

What had appeared to be a three-way disagreement resolved into a debate about labelling. The shared destination was an operating model in which centralised infrastructure – the catalogue, the onboarding pipeline, the contracts – coexists with embedded analysts close to the end user. The substantive question is not whether to centralise but whether the communication between the two actually works.

Building for Instability

The operations discussion produced some of the session's most concrete material. One speaker walked through a working monitoring regime. Around 50 automated data checks sit at the entrance to each pipeline, distinguishing engineering failures from quieter problems in which the data continues to look genuine but its quality has deteriorated. Around 1,000 variables per country are maintained deliberately to create redundancy: if a source disappears, the model substitutes the nearest neighbour. Any flagged anomaly goes to a human in the loop, and AI is used for both modelling and anomaly detection. Sources do disappear – Google Mobility was cited as a recent example – and the architecture is built around the expectation that this will happen, not the hope that it will not.
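The speaker did not describe an implementation, so what follows is not theirs. It is a minimal Python sketch of the same shape, assuming a two-tier split of entrance checks (outright engineering failures versus a quieter "variance collapse" flag), absolute Pearson correlation over shared history as the meaning of "nearest neighbour", and a plain list standing in for the human review queue. Every name here (`entrance_checks`, `nearest_neighbour`, `assemble_inputs`, `card_spend`, `footfall`, `web_traffic`) and every threshold is hypothetical, and two checks stand in for the roughly 50 the speaker mentioned.

```python
import math
from statistics import mean, pstdev

# Hypothetical sketch: names, thresholds and the correlation-based notion of
# "nearest neighbour" are assumptions, not the panellist's actual system.


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def entrance_checks(values, recent=5, collapse_ratio=0.1):
    """Return (engineering_failures, quality_flags) for one incoming series."""
    engineering, quality = [], []
    # Engineering failures: the feed is gone or plainly broken.
    if not values:
        engineering.append("empty feed")
        return engineering, quality
    if any(v is None or not math.isfinite(v) for v in values):
        engineering.append("missing or non-finite values")
        return engineering, quality
    # Quality deterioration: the data still parses, but recent variability has
    # collapsed relative to history (e.g. a source silently repeating values).
    if len(values) > 2 * recent:
        hist_sd = pstdev(values[:-recent])
        recent_sd = pstdev(values[-recent:])
        if hist_sd > 0 and recent_sd < collapse_ratio * hist_sd:
            quality.append("variance collapse in recent window")
    return engineering, quality


def nearest_neighbour(target, panel, exclude=()):
    """Surviving variable whose history is most correlated (|r|) with target's."""
    candidates = [k for k in panel if k != target and k not in exclude]
    if not candidates:
        return None
    return max(candidates, key=lambda k: abs(pearson(panel[target], panel[k])))


def assemble_inputs(live, panel, recent=5):
    """Check each live feed, substitute failed ones, queue anything for review."""
    inputs, review_queue = {}, []
    checks = {name: entrance_checks(live.get(name, []), recent) for name in panel}
    failed = [name for name, (eng, _) in checks.items() if eng]
    for name, (eng, quality) in checks.items():
        if eng:
            sub = nearest_neighbour(name, panel, exclude=failed)
            if sub is None:
                review_queue.append((name, "no substitute available"))
                continue
            inputs[name] = live[sub]  # sub passed its checks, so it is live
            review_queue.append((name, f"substituted with {sub}"))
        else:
            inputs[name] = live[name]
            # Quality flags do not block the feed; they go to a human instead.
            review_queue.extend((name, flag) for flag in quality)
    return inputs, review_queue


# Illustrative data only: panel holds each variable's history, live today's feeds.
panel = {
    "card_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "footfall": [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],
    "web_traffic": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
live = {"footfall": panel["footfall"], "web_traffic": panel["web_traffic"]}
inputs, review_queue = assemble_inputs(live, panel)
# review_queue -> [("card_spend", "substituted with footfall")]
```

Here the `card_spend` feed has vanished, so the sketch falls back to `footfall`, whose history tracks it most closely, and records the substitution for a human to see. The substitute's raw values are passed through unchanged; a real system would presumably rescale or re-fit the neighbour first, which this sketch leaves out.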

Another panellist made the same point philosophically. Most statistical practice, he argued, assumes that markets are at some form of equilibrium and that the data around them is stationary with independent noise. He suggested that markets never reach equilibrium. The dynamics, in his view, were the most informative property of the data; the largest value, he said, is captured by detecting real change earlier than the rest of the market, because mispricing persists in the gap between the change happening and the market accommodating it. Treating data as stationary, he argued, imposes a structure that requires much larger change before it is detected at all.

The intuition – that the assumption of stability is itself a source of risk – connected directly to the monitoring practices described moments earlier, and to the two-year delay in finding the right use for the news flow dataset. Different vocabularies, the same underlying problem.

The Open Question

Vendor evaluation frameworks have matured. Monitoring regimes have matured. The discipline of asking, periodically and rigorously, whether a firm still understands what its own datasets are for has not. That, on the evidence of this session, is potentially where the most expensive failures now live.
