The knowledge platform for the financial technology industry

A-Team Insight Blogs

Where Generative AI Belongs in the Institutional Data Stack – and Where It Doesn’t

Two and a half years into the generative AI cycle, the working consensus among practitioners deploying AI against unstructured content in capital markets has quietly settled into a layered architecture. Generative models sit at the desktop, research and operational-efficiency layer. The feed, signal and modelling layer that drives systematic alpha still runs on the older, deterministic NLP stack. The distinction is not always made explicit, but it surfaced repeatedly during a recent A-Team Group webinar on AI strategies for unstructured data, sponsored by LSEG and moderated by Andrew Delaney, President and Chief Content Officer of A-Team Group. The panel featured Calum Conejo-Watt, Head of Data Governance & Quality at Lombard Odier Investment Managers; Nicole Allen, Director of Product, Text Analytics at LSEG; Katie Prideaux, Director of Quant Solutions, Analytics at LSEG; and Richard Peterson, CEO of MarketPsych Data.

The reason for the split is reproducibility. “With generative AI, the tagging that a lot of people are doing is not deterministic,” said Peterson. “You can give a sentiment score on a document and find that if you do it again later, you get a different output. That makes it less suited for alpha generation but still interesting for other use cases like document tagging.” Encoder-only transformer models such as BERT and RoBERTa, by contrast, produce stable, reproducible classifications – the same document scored at the same point in time returns the same result. For systematic strategies that depend on backtesting and model replication, that property is non-negotiable.
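The reproducibility property described above can be checked mechanically: score the same document several times and confirm that every run returns the same value. The sketch below is illustrative only. `lexicon_score` and `sampled_score` are hypothetical stand-ins (a toy word-count scorer and the same scorer with random noise added), not any panellist's model, and the word lists are invented for the example.

```python
import random
from typing import Callable

# Hypothetical word lists, invented purely for this illustration.
POSITIVE = {"growth", "strong", "beat"}
NEGATIVE = {"loss", "weak", "miss"}


def lexicon_score(text: str) -> float:
    """Deterministic stand-in scorer: (positive - negative) / word count."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


# Stand-in for sampling randomness in a generative scorer (an assumption).
_rng = random.Random(0)


def sampled_score(text: str) -> float:
    """Non-deterministic stand-in: the same base score plus random noise."""
    return lexicon_score(text) + _rng.uniform(-0.2, 0.2)


def is_reproducible(score: Callable[[str], float], text: str, runs: int = 5) -> bool:
    """Return True if repeated scoring of the same text gives one distinct value."""
    results = {score(text) for _ in range(runs)}
    return len(results) == 1


doc = "strong growth but one weak quarter"
print(is_reproducible(lexicon_score, doc))  # deterministic scorer
print(is_reproducible(sampled_score, doc))  # noisy scorer
```

A check of this kind is only a smoke test. Passing it shows that repeated runs agree today, not that a score will survive a change of model version.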

The same architectural split appears at the vendor level. “When I think of generative AI, I think of summarisation and question answering. A lot of those applications are useful for desktop analysis and research,” said Allen. “In terms of our feeds data and predictive analytics, I think about that somewhat separately. We are able to deal with larger volumes of content, and we have better connectors like MCPs.” MCP – Model Context Protocol – was raised several times on the webinar as a standardisation layer that improves auditability and helps avoid bespoke integration work, though Allen later cautioned that its role in cleaning unstructured data sets is secondary to its role in exploring them.

The Desktop and Operational Layer

At the operational layer, the case for generative AI is more straightforward. Prideaux pointed to ideation, code review and model interrogation as areas where generative tools have meaningfully changed the workflow. “You can take code, which technically would be structured data, and go and analyse that model. Is it running efficiently? Can it be optimised for performance or speed? Have I got any survivorship bias?” she said.

Conejo-Watt described a customer-experience deployment that captures the operational efficiency case: live Arabic-to-English transcription of incoming customer calls, semantic analysis of sentiment, and an LLM trained on standard operating procedures to advise agents in real time. “Maybe not alpha generation, but something that helped the organisation with complaints,” he said. The compliance review of investment documents – cross-referencing newsletters, prospectuses and disclosures for consistency – is another use case, one where generative summarisation reduces work that previously required manual headcount.

None of these applications requires the deterministic reproducibility that the modelling layer demands. The layered architecture rests on a simple trade-off: approximate answers are acceptable in human-in-the-loop workflows, but not as inputs to a backtest.

Where Generative AI Earns Its Place in Alpha Workflows

The picture is not that generative AI has no role in signal generation, but that its role is narrower and more specific than the prevailing narrative suggests. Peterson described one productive use: generating synthetic training data to improve the accuracy of downstream deterministic classifiers. “I can ask an LLM to tell me how a biotech executive sounds when he’s optimistic. You can get all these permutations and get synthetic data from the LLMs. That has really improved our accuracy and ability to score tight nuances,” he said.

MarketPsych’s emotion-classification work on earnings call transcripts illustrates the kind of signal that becomes accessible at scale once the architecture is in place. The firm classifies thirteen distinct emotions in CEO statements. According to Peterson, CEO enthusiasm and general positivity on calls do not correlate with subsequent outperformance – but optimism specifically, defined as future-oriented tone, does. “There are subtleties that we can tease out with AI,” he said. The result is consistent with a broader pattern: the marginal value comes from precise, narrow classification, not from broad summarisation.

The Cost and Licensing Overlay

The architectural split is now appearing in data-licensing contracts as well. Several panellists noted that data providers have begun differentiating pricing between traditional NLP use and generative reproduction. “I spoke with a text provider who said you are allowed to use LLMs on our content, you just can’t do generative reproduction,” said Peterson. “If you use generative AI, you can’t reproduce text, but you can use it to label the text.”

Conejo-Watt confirmed the trend on the consumer side. “Across data providers, you are not only paying for access to data, but now you have to pay for AI use cases. It really affects conversations internally,” he said. The EU AI Act introduces a further constraint: any processing that touches personal data falls into a high-risk category requiring additional guardrails. Conejo-Watt flagged Guide Labs’ Steerling-8B, an open-source model released earlier this year that is engineered to trace outputs back to specific training data, as the kind of architecture financial services will increasingly require. “It is all about trust,” he said.

The Adoption Plateau

A live audience poll during the session found just over half of attendees actively deploying AI to add structure to unstructured data sets, around a third with a plan in place, and 6 per cent with a complete process implemented. Allen noted that the distribution has not shifted materially in twelve months. “Doing this work does require a lot of resource, so people are being very thoughtful about it. You have to have the right expertise in house,” she said.

That caution maps onto the layered architecture. Firms that have separated the desktop layer from the modelling layer can deploy generative tools quickly against operational workflows while taking longer over the deterministic feed and signal layer. Firms that have not made the distinction tend to treat AI adoption as a single strategic decision – and stall.

The implication for buy-side data leaders is that “AI strategy” is no longer a useful unit of analysis. The more productive questions are which layer of the data stack a given capability belongs in, what its tolerance for non-determinism is, and which licensing terms it is being deployed under. The architecture that has emerged is more pragmatic than the headlines suggest. It is also more defensible.
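Those three considerations (layer, tolerance for non-determinism, licensing terms) can be written down as a per-capability checklist. The sketch below is one hypothetical way to encode them. The class, field and function names are assumptions made for illustration, and the two rules simply restate the panel's points that non-deterministic outputs should not feed a backtest and that some licences permit labelling but not generative reproduction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Layer(Enum):
    DESKTOP = "desktop, research and operational efficiency"
    MODELLING = "feed, signal and modelling"


@dataclass(frozen=True)
class Capability:
    """Hypothetical description of one AI capability under review."""
    name: str
    feeds_backtest: bool           # does the output become a model input?
    output_is_deterministic: bool  # same input, same output on every run?
    reproduces_source_text: bool   # e.g. summaries that restate licensed text


def assign_layer(cap: Capability) -> Layer:
    """Place a capability in the modelling layer only if it feeds a backtest."""
    return Layer.MODELLING if cap.feeds_backtest else Layer.DESKTOP


def review(cap: Capability, licence_allows_reproduction: bool) -> List[str]:
    """Return the issues to resolve before deployment (empty list if none)."""
    issues = []
    if assign_layer(cap) is Layer.MODELLING and not cap.output_is_deterministic:
        issues.append("non-deterministic output feeding a backtest")
    if cap.reproduces_source_text and not licence_allows_reproduction:
        issues.append("generative reproduction not covered by licence")
    return issues


summariser = Capability("research summariser", False, False, True)
llm_signal = Capability("LLM sentiment signal", True, False, False)

print(assign_layer(summariser).value, review(summariser, licence_allows_reproduction=False))
print(assign_layer(llm_signal).value, review(llm_signal, licence_allows_reproduction=False))
```

A real review would cover more than these two rules, such as the personal-data constraints raised in the discussion of the EU AI Act, but the point stands that each capability is assessed on its own rather than under a single strategy decision.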
