The knowledge platform for the financial technology industry

A-Team Insight Blogs

Is the Most Expensive Failure Mode in Alt Data on the User Side?


Asked at the A-Team/Eagle Alpha Alternative Data Conference in London last week which datasets his firm had ever taken out of production, one panellist paused, considered the question, and admitted he could not think of a single one. The answer was offered without defensiveness, as a genuine reflection. It also turned out to be one of the most revealing moments of the session.

The panel – “Building and Scaling an Alternative Data Function” – was moderated by Mike O’Hara, editor of Market & Alt Data Insight, and brought together Caio Natividade, Managing Director and Global Head of Quantitative Investment Solutions Research at Deutsche Bank; Dr Tim Drye, Founder of Curious Blue Fish; and Alexander Denev, CEO of Turnleaf Analytics, a member of the GARP AI Advisory Committee and author of “The Book of Alternative Data”.

The discussion ranged across structure, operations, accountability, skills and spend. What emerged from all of these themes was a less comfortable underlying question: how much of what alt data teams do is built on the assumption that the things they are working with – datasets, models, even the team’s own understanding of what a dataset is for – sit still long enough to be managed?

The Dataset That Never Gets Retired

The retirement question was the cleanest illustration. Pressed on why no dataset had ever been formally retired, one panellist explained that the type of data his function prioritises is data that gives scale – coverage across an entire asset class or multiple asset classes. Niche datasets that lose their applicability after a specific window do not clear the bar for onboarding in the first place. Structured, broad-coverage data, by his account, was not retired because it did not expire; its relevance ebbed and returned.

A second panellist, speaking from the vendor’s perspective, gave the converse view. Clients, he said, frequently buy a dataset, fail to use it, and then ask two years later for a contract extension so they can try again. The cash flows were welcome. Large organisations, he added, were the worst offenders.

Between those two accounts sat the more substantive admission, offered at the close of the session: that a firm had taken roughly two years to realise it was using a news flow and sentiment dataset in the wrong way. The initial approach – mapping headline scores to alphas – was simply incorrect. Other applications of the same data turned out to be more productive once the team understood them. The fault, the panellist said plainly, was on the user’s side, not the vendor’s.

That admission cut against the dominant narrative in alt data discourse, in which underperformance is attributed to vendor methodology, drift, or misrepresentation. Most of the post-mortem frameworks the industry has developed are oriented towards evaluating vendors rather than a firm’s own analytical assumptions. Yet the most expensive failure mode described here was a two-year period in which a user did not understand what its own dataset could do.

Centralised, Embedded, or Hybrid

The same question – what is stable and what is not – sat under the earlier operating model debate. An audience poll at the start of the session showed a clear preference for fully centralised data teams. One panellist immediately – and vehemently – disagreed. Another said hybrid worked best, citing the difficulty of negotiating with fully centralised data functions inside large banks, where the data team could become bureaucratically isolated. A third said yes to centralisation on grounds of scale, price competitiveness and technical debt – and then described a structure in which a data science operations lead sat inside the research function but worked closely with the centralised data team. That, his fellow panellists pointed out, was hybrid.

What had appeared to be a three-way disagreement resolved into a debate about labelling. The shared destination was an operating model in which centralised infrastructure – the catalogue, the onboarding pipeline, the contracts – coexists with embedded analysts close to the end user. The substantive question is not whether to centralise but whether the communication between the two actually works.

Building for Instability

The operations discussion produced some of the session’s most concrete material. One speaker walked through a working monitoring regime: around 50 automated data checks at the entrance to each pipeline, distinguishing engineering failures from quieter problems where the data continues to look genuine but its quality has deteriorated. Around 1,000 variables per country, used deliberately to create redundancy: if a source disappears, the model substitutes the nearest neighbour. A human in the loop for any flagged anomaly. AI used for both modelling and anomaly detection. Sources do disappear – Google Mobility was cited as a recent example – and the architecture is built around the expectation that this will happen, not the hope that it will not.

Another panellist made the same point philosophically. Most statistical practice, he argued, assumes that markets are at some form of equilibrium and that the data around them is stationary with independent noise. He suggested that markets never reach equilibrium. The dynamics, in his view, were the most informative property of the data; the largest value, he said, is captured by detecting real change earlier than the rest of the market, because mispricing persists in the gap between the change happening and the market accommodating it. Treating data as stationary, he argued, imposes a structure that requires much larger change before it is detected at all.
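One standard way to make the point about detecting change earlier concrete is to compare a point-by-point outlier rule, which treats each observation as an independent draw from a fixed distribution, with the two-sided CUSUM, which accumulates small persistent deviations. This is an illustration chosen for this write-up, not the method the panellist described; the synthetic series, the known baseline (`mu=0`, `sigma=1`) and the CUSUM parameters `k=0.5` and `h=4.0` are assumptions.

```python
def first_sigma_alarm(xs, mu, sigma, k_sigma=3.0):
    """Index of the first point more than k_sigma standard deviations from mu, else None."""
    for t, x in enumerate(xs):
        if abs(x - mu) > k_sigma * sigma:
            return t
    return None


def first_cusum_alarm(xs, mu, sigma, k=0.5, h=4.0):
    """Two-sided CUSUM: accumulate standardised deviations beyond slack k; alarm above h."""
    s_hi = s_lo = 0.0
    for t, x in enumerate(xs):
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)  # evidence of a downward shift
        if s_hi > h or s_lo > h:
            return t
    return None


# Hypothetical series: small fixed noise pattern, then a persistent +1.5 sigma shift at t = 20.
noise = [0.3, -0.2, 0.1, -0.4, 0.2]
series = [noise[t % 5] + (1.5 if t >= 20 else 0.0) for t in range(40)]

print(first_sigma_alarm(series, mu=0.0, sigma=1.0))  # None: no single point exceeds 3 sigma
print(first_cusum_alarm(series, mu=0.0, sigma=1.0))  # 24: five observations into the shift
```

CUSUM still assumes a known pre-change baseline, so it is not a model of markets that never reach equilibrium. What the comparison does show is how a detector built to look for persistent change picks up a shift that a stationary, point-by-point threshold never flags at all.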

The intuition – that the assumption of stability is itself a source of risk – connected directly to the monitoring practices described moments earlier, and to the two-year delay in finding the right use for the news flow dataset. Different vocabularies, the same underlying problem.

The Open Question

Vendor evaluation frameworks have matured. Monitoring regimes have matured. The discipline of asking, periodically and rigorously, whether a firm still understands what its own datasets are for has not. On the evidence of this session, that may be where the most expensive failures now live.
