A-Team Insight Blogs

Where Generative AI Belongs in the Institutional Data Stack – and Where It Doesn’t

18 May 2026

Two and a half years into the generative AI cycle, the working consensus among practitioners deploying AI against unstructured content in capital markets has quietly settled into a layered architecture. Generative models sit at the desktop, research and operational-efficiency layer. The feed, signal and modelling layer that drives systematic alpha still runs on the older, deterministic NLP stack.

The distinction is not always made explicit, but it surfaced repeatedly during a recent A-Team Group webinar on AI strategies for unstructured data, sponsored by LSEG and moderated by Andrew Delaney, President and Chief Content Officer of A-Team Group. The panel featured Calum Conejo-Watt, Head of Data Governance & Quality at Lombard Odier Investment Managers; Nicole Allen, Director of Product, Text Analytics at LSEG; Katie Prideaux, Director of Quant Solutions, Analytics at LSEG; and Richard Peterson, CEO of MarketPsych Data.

The reason for the split is reproducibility. “With generative AI, the tagging that a lot of people are doing is not deterministic,” said Peterson. “You can give a sentiment score on a document and find that if you do it again later, you get a different output. That makes it less suited for alpha generation but still interesting for other use cases like document tagging.” Encoder-only transformer models such as BERT and RoBERTa, by contrast, produce stable, reproducible classifications – the same document scored at the same point in time returns the same result. For systematic strategies that depend on backtesting and model replication, that property is non-negotiable.
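The reproducibility requirement can be expressed as a simple gate: score the same document several times and accept the scorer only if every run agrees. The sketch below is illustrative only. The word lists and both scorers are hypothetical stand-ins (a toy lexicon scorer for the deterministic case, random noise for the sampled case), not BERT, RoBERTa or any vendor's model.

```python
import random

# Hypothetical word lists, for illustration only.
POSITIVE = {"growth", "strong", "beat", "optimistic"}
NEGATIVE = {"loss", "weak", "miss", "decline"}


def deterministic_score(text: str) -> float:
    """Toy lexicon scorer: a fixed function of the input text."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sampled_score(text: str, rng: random.Random) -> float:
    """Stand-in for a sampled generative scorer: same input, noisy output."""
    return round(deterministic_score(text) + rng.uniform(-0.2, 0.2), 3)


def is_reproducible(scorer, text: str, runs: int = 5) -> bool:
    """Score the same document several times and require identical output."""
    return len({scorer(text) for _ in range(runs)}) == 1


doc = "Strong growth this quarter despite a small decline in margins"
rng = random.Random()  # deliberately unseeded, to mimic run-to-run variation

stable = is_reproducible(deterministic_score, doc)
noisy = is_reproducible(lambda t: sampled_score(t, rng), doc)
print(stable, noisy)
```

The deterministic scorer passes the gate on every run. The sampled scorer will almost always fail it, which is the property that keeps such outputs out of a backtest.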

The same architectural split appears at the vendor level. “When I think of generative AI, I think of summarisation and question answering. A lot of those applications are useful for desktop analysis and research,” said Allen. “In terms of our feeds data and predictive analytics, I think about that somewhat separately. We are able to deal with larger volumes of content, and we have better connectors like MCPs.” MCP – Model Context Protocol – was raised several times on the webinar as a standardisation layer that improves auditability and helps avoid bespoke integration work, though Allen later cautioned that its role in cleaning unstructured data sets is secondary to its role in exploring them.

The Desktop and Operational Layer

At the operational layer, the case for generative AI is more straightforward. Prideaux pointed to ideation, code review and model interrogation as areas where generative tools have meaningfully changed the workflow. “You can take code, which technically would be structured data, and go and analyse that model. Is it running efficiently? Can it be optimised for performance or speed? Have I got any survivorship bias?” she said.

Conejo-Watt described a customer-experience deployment that captures the operational efficiency case: live Arabic-to-English transcription of incoming customer calls, semantic analysis of sentiment, and an LLM trained on standard operating procedures to advise agents in real time. “Maybe not alpha generation, but something that helped the organisation with complaints,” he said. Compliance review of investment documents – cross-referencing newsletters, prospectuses and disclosures for consistency – is another use case, one where generative summarisation cuts down work that previously required manual review.

None of these applications requires the deterministic reproducibility that the modelling layer demands. The layered architecture rests on a simple trade-off: approximate answers are tolerable in human-in-the-loop workflows, but not in inputs feeding a backtest.

Where Generative AI Earns Its Place in Alpha Workflows

The picture is not that generative AI has no role in signal generation, but that its role is narrower and more specific than the prevailing narrative suggests. Peterson described one productive use: generating synthetic training data to improve the accuracy of downstream deterministic classifiers. “I can ask an LLM to tell me how a biotech executive sounds when he’s optimistic. You can get all these permutations and get synthetic data from the LLMs. That has really improved our accuracy and ability to score tight nuances,” he said.
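The pattern Peterson describes, generative output used as training data for a deterministic classifier, can be sketched in a few lines. This is a toy under stated assumptions, not MarketPsych's method: the phrase templates are hypothetical stand-ins for LLM-generated permutations, and the classifier is a plain word-count lookup chosen so the example runs offline.

```python
from collections import Counter
from itertools import product

# Stand-in for LLM output: in practice these phrasings would come from a
# generative model. Here they are hypothetical templates.
OPTIMISTIC_OPENERS = ["we expect", "we anticipate", "we are confident of"]
OPTIMISTIC_ENDINGS = ["approval next year", "strong trial results ahead"]
ENTHUSIASTIC_OPENERS = ["we are thrilled with", "we are delighted by"]
ENTHUSIASTIC_ENDINGS = ["this quarter", "the team"]


def synthetic_examples():
    """Expand the templates into labelled (text, label) training pairs."""
    data = [(f"{a} {b}", "optimism")
            for a, b in product(OPTIMISTIC_OPENERS, OPTIMISTIC_ENDINGS)]
    data += [(f"{a} {b}", "enthusiasm")
             for a, b in product(ENTHUSIASTIC_OPENERS, ENTHUSIASTIC_ENDINGS)]
    return data


def train(examples):
    """Count word frequencies per label; the result is a fixed lookup table."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def classify(counts, text):
    """Pick the label whose training vocabulary overlaps the text most.

    Ties break alphabetically, so the same text always gets the same label.
    """
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(sorted(scores), key=scores.get)


model = train(synthetic_examples())
label = classify(model, "we anticipate approval in the coming year")
print(label)
```

The generative step happens once, offline, when the training set is built. Scoring itself involves no sampling, so the same transcript always receives the same label.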

MarketPsych’s emotion-classification work on earnings call transcripts illustrates the kind of signal that becomes accessible at scale once the architecture is in place. The firm classifies thirteen distinct emotions in CEO statements. According to Peterson, CEO enthusiasm and general positivity on calls do not correlate with subsequent outperformance – but optimism specifically, defined as future-oriented tone, does. “There are subtleties that we can tease out with AI,” he said. The result is consistent with a broader pattern: the marginal value comes from precise, narrow classification, not from broad summarisation.

The Cost and Licensing Overlay

The architectural split is now appearing in data-licensing contracts as well. Several panellists noted that data providers have begun differentiating pricing between traditional NLP use and generative reproduction. “I spoke with a text provider who said you are allowed to use LLMs on our content, you just can’t do generative reproduction,” said Peterson. “If you use generative AI, you can’t reproduce text, but you can use it to label the text.”

Conejo-Watt confirmed the trend on the consumer side. “Across data providers, you are not only paying for access to data, but now you have to pay for AI use cases. It really affects conversations internally,” he said. The EU AI Act adds a further constraint: processing that touches personal data can fall into a high-risk category requiring additional guardrails. Conejo-Watt flagged Guide Labs’ Steerling-8B, an open-source model released earlier this year that is engineered to trace outputs back to specific training data, as the kind of architecture financial services will increasingly require. “It is all about trust,” he said.

The Adoption Plateau

A live audience poll during the session found just over half of attendees actively deploying AI to add structure to unstructured data sets, around a third with a plan in place, and 6 per cent with a complete process implemented. Allen noted that the distribution has not shifted materially in twelve months. “Doing this work does require a lot of resource, so people are being very thoughtful about it. You have to have the right expertise in house,” she said.

That caution maps onto the layered architecture. Firms that have separated the desktop layer from the modelling layer can deploy generative tools quickly against operational workflows while taking longer over the deterministic feed and signal layer. Firms that have not made the distinction tend to treat AI adoption as a single strategic decision – and stall.

The implication for buy-side data leaders is that “AI strategy” is no longer a useful unit of analysis. The more productive questions are which layer of the data stack a given capability belongs in, what its tolerance for non-determinism is, and which licensing terms it is deployed under. The architecture that has emerged is more pragmatic than the headlines suggest. It is also more defensible.
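The three checks named above (layer, tolerance for non-determinism, licensing terms) can be written down as a small placement test. The sketch below is one hypothetical way to encode them. The class names, fields and rules are assumptions made for illustration and do not reflect any firm's or vendor's actual policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Layer(Enum):
    DESKTOP = "desktop"      # research, summarisation, operations
    MODELLING = "modelling"  # feeds, signals, backtests


@dataclass(frozen=True)
class Capability:
    name: str
    layer: Layer
    deterministic: bool    # same input always gives the same output?
    reproduces_text: bool  # does it emit source text rather than labels?


@dataclass(frozen=True)
class Licence:
    allows_labelling: bool
    allows_generative_reproduction: bool


def placement_issues(cap: Capability, licence: Licence) -> List[str]:
    """Return the reasons, if any, that a capability is misplaced."""
    issues = []
    if cap.layer is Layer.MODELLING and not cap.deterministic:
        issues.append("non-deterministic output feeding the modelling layer")
    if cap.reproduces_text and not licence.allows_generative_reproduction:
        issues.append("licence does not permit generative reproduction")
    if not cap.reproduces_text and not licence.allows_labelling:
        issues.append("licence does not permit labelling")
    return issues


# Hypothetical licence: labelling allowed, generative reproduction not.
licence = Licence(allows_labelling=True, allows_generative_reproduction=False)

tagger = Capability("encoder sentiment tagger", Layer.MODELLING,
                    deterministic=True, reproduces_text=False)
llm_signal = Capability("LLM sentiment score", Layer.MODELLING,
                        deterministic=False, reproduces_text=False)
summariser = Capability("LLM summariser", Layer.DESKTOP,
                        deterministic=False, reproduces_text=True)

for cap in (tagger, llm_signal, summariser):
    print(cap.name, placement_issues(cap, licence))
```

Evaluating each capability separately, rather than approving or rejecting "AI" as a whole, is the point of the exercise: the tagger passes, while the other two fail for different reasons.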

Recorded Webinar: From Data to Alpha: AI Strategies for Taming Unstructured Data
16 April 2026
