The knowledge platform for the financial technology industry

A-Team Insight Blogs

Where Generative AI Belongs in the Institutional Data Stack – and Where It Doesn’t

Two and a half years into the generative AI cycle, the working consensus among practitioners deploying AI against unstructured content in capital markets has quietly settled into a layered architecture. Generative models sit at the desktop, research and operational-efficiency layer. The feed, signal and modelling layer that drives systematic alpha still runs on the older, deterministic NLP stack. The distinction is not always made explicit, but it surfaced repeatedly during a recent A-Team Group webinar on AI strategies for unstructured data, sponsored by LSEG and moderated by Andrew Delaney, President and Chief Content Officer of A-Team Group. The panel featured Calum Conejo-Watt, Head of Data Governance & Quality at Lombard Odier Investment Managers; Nicole Allen, Director of Product, Text Analytics at LSEG; Katie Prideaux, Director of Quant Solutions, Analytics at LSEG; and Richard Peterson, CEO of MarketPsych Data.

The reason for the split is reproducibility. “With generative AI, the tagging that a lot of people are doing is not deterministic,” said Peterson. “You can give a sentiment score on a document and find that if you do it again later, you get a different output. That makes it less suited for alpha generation but still interesting for other use cases like document tagging.” Encoder-only transformer models such as BERT and RoBERTa, by contrast, produce stable, reproducible classifications – the same document scored at the same point in time returns the same result. For systematic strategies that depend on backtesting and model replication, that property is non-negotiable.
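
The reproducibility point can be made concrete with a toy sketch. The Python below is illustrative only and rests on assumptions: `deterministic_score` is a hypothetical lexicon-based stand-in for an encoder-style classifier, `sampled_score` is a hypothetical stand-in for a tagger that samples its label, and the lexicon and weights are invented. Neither function is a real model or the panellists' method.

```python
import random

# Hypothetical toy lexicon; a real encoder model would learn its own weights.
LEXICON = {"growth": 1.0, "strong": 0.5, "loss": -1.0, "weak": -0.5}
LABELS = ["negative", "neutral", "positive"]


def lexicon_total(text: str) -> float:
    """Sum the lexicon weights of the whitespace-separated tokens."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())


def deterministic_score(text: str) -> str:
    """Same input always maps to the same label (no randomness involved)."""
    total = lexicon_total(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def sampled_score(text: str, rng: random.Random) -> str:
    """Stand-in for a sampling tagger: the label is drawn from a distribution."""
    total = lexicon_total(text)
    # Weights lean towards the lexicon's view but leave mass on other labels.
    weights = [1.0 + max(-total, 0.0), 1.0, 1.0 + max(total, 0.0)]
    return rng.choices(LABELS, weights=weights, k=1)[0]


def is_reproducible(scorer, text: str, runs: int = 50) -> bool:
    """True if repeated scoring of the same text yields exactly one label."""
    return len({scorer(text) for _ in range(runs)}) == 1


doc = "strong growth despite weak margins"
rng = random.Random()  # unseeded on purpose, to mimic run-to-run variation
print(is_reproducible(deterministic_score, doc))
print(is_reproducible(lambda t: sampled_score(t, rng), doc))
```

The first check always passes because nothing in the scorer is random. The second will almost always fail, which is the property that makes a sampled label hard to use as a backtest input.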

The same architectural split appears at the vendor level. “When I think of generative AI, I think of summarisation and question answering. A lot of those applications are useful for desktop analysis and research,” said Allen. “In terms of our feeds data and predictive analytics, I think about that somewhat separately. We are able to deal with larger volumes of content, and we have better connectors like MCPs.” MCP – Model Context Protocol – was raised several times on the webinar as a standardisation layer that improves auditability and helps avoid bespoke integration work, though Allen later cautioned that its role in cleaning unstructured data sets is secondary to its role in exploring them.

The Desktop and Operational Layer

At the operational layer, the case for generative AI is more straightforward. Prideaux pointed to ideation, code review and model interrogation as areas where generative tools have meaningfully changed the workflow. “You can take code, which technically would be structured data, and go and analyse that model. Is it running efficiently? Can it be optimised for performance or speed? Have I got any survivorship bias?” she said.

Conejo-Watt described a customer-experience deployment that captures the operational efficiency case: live Arabic-to-English transcription of incoming customer calls, semantic analysis of sentiment, and an LLM trained on standard operating procedures to advise agents in real time. “Maybe not alpha generation, but something that helped the organisation with complaints,” he said. The compliance review of investment documents – cross-referencing newsletters, prospectuses and disclosures for consistency – is another use case, one where generative summarisation reduces work that previously depended on manual headcount.

None of these applications requires the deterministic reproducibility that the modelling layer demands. The layered architecture rests on a simple trade-off: approximate answers are tolerable in human-in-the-loop workflows, but not in inputs feeding a backtest.

Where Generative AI Earns Its Place in Alpha Workflows

The picture is not that generative AI has no role in signal generation, but that its role is narrower and more specific than the prevailing narrative suggests. Peterson described one productive use: generating synthetic training data to improve the accuracy of downstream deterministic classifiers. “I can ask an LLM to tell me how a biotech executive sounds when he’s optimistic. You can get all these permutations and get synthetic data from the LLMs. That has really improved our accuracy and ability to score tight nuances,” he said.
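
A minimal sketch of that pattern, under loud assumptions: the templates below are a hypothetical stand-in for LLM-generated permutations (no LLM is called), the two labels and all phrases are invented, and the downstream model is a small multinomial naive Bayes classifier chosen only because it fits in a few lines of standard-library Python. It is not MarketPsych's pipeline. What it shows is the division of labour: the generative step produces training text once, and the classifier that scores live documents is deterministic.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical templates standing in for LLM-generated permutations.
SUBJECTS = ["we", "our team", "management"]
PHRASES = {
    "optimistic": ["expect to accelerate", "anticipate strong demand",
                   "will expand next year"],
    "not_optimistic": ["delivered solid results", "saw headwinds",
                       "reported lower volumes"],
}


def synthetic_examples():
    """Enumerate (text, label) pairs from the templates above."""
    return [(f"{s} {p}", label)
            for label, phrases in PHRASES.items()
            for s, p in product(SUBJECTS, phrases)]


def train(examples):
    """Collect per-label word counts for a multinomial naive Bayes model."""
    counts = defaultdict(Counter)
    docs = Counter()
    for text, label in examples:
        docs[label] += 1
        counts[label].update(text.split())
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab


def classify(model, text):
    """Deterministic argmax over log-probabilities with add-one smoothing."""
    counts, docs, vocab = model
    n_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(counts):  # sorted so ties break the same way each run
        total = sum(counts[label].values())
        score = math.log(docs[label] / n_docs)
        for word in text.split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(synthetic_examples())
print(classify(model, "we expect to expand next year"))  # optimistic
```

Because the classifier has no sampling step, rescoring the same transcript returns the same label, so the synthetic data improves coverage without reintroducing non-determinism at scoring time.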

MarketPsych’s emotion-classification work on earnings call transcripts illustrates the kind of signal that becomes accessible at scale once the architecture is in place. The firm classifies thirteen distinct emotions in CEO statements. According to Peterson, CEO enthusiasm and general positivity on calls do not correlate with subsequent outperformance – but optimism specifically, defined as future-oriented tone, does. “There are subtleties that we can tease out with AI,” he said. The result is consistent with a broader pattern: the marginal value comes from precise, narrow classification, not from broad summarisation.

The Cost and Licensing Overlay

The architectural split is now appearing in data-licensing contracts as well. Several panellists noted that data providers have begun differentiating pricing between traditional NLP use and generative reproduction. “I spoke with a text provider who said you are allowed to use LLMs on our content, you just can’t do generative reproduction,” said Peterson. “If you use generative AI, you can’t reproduce text, but you can use it to label the text.”

Conejo-Watt confirmed the trend on the consumer side. “Across data providers, you are not only paying for access to data, but now you have to pay for AI use cases. It really affects conversations internally,” he said. The EU AI Act introduces a further constraint: any processing that touches personal data falls into a high-risk category requiring additional guardrails. Conejo-Watt flagged Guide Labs’ Steerling-8B, an open-source model released earlier this year that is engineered to trace outputs back to specific training data, as the kind of architecture financial services will increasingly require. “It is all about trust,” he said.

The Adoption Plateau

A live audience poll during the session found just over half of attendees actively deploying AI to add structure to unstructured data sets, around a third with a plan in place, and 6 per cent with a complete process implemented. Allen noted that the distribution has not shifted materially in twelve months. “Doing this work does require a lot of resource, so people are being very thoughtful about it. You have to have the right expertise in house,” she said.

That caution maps onto the layered architecture. Firms that have separated the desktop layer from the modelling layer can deploy generative tools quickly against operational workflows while taking longer over the deterministic feed and signal layer. Firms that have not made the distinction tend to treat AI adoption as a single strategic decision – and stall.

The implication for buy-side data leaders is that “AI strategy” is no longer a useful unit of analysis. The more productive question is which layer of the data stack a given capability belongs in, what its tolerance for non-determinism is, and which licensing terms it is being deployed under. The architecture that has emerged is more pragmatic than the headlines suggest. It is also more defensible.
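
Those three questions can be written down as a simple review checklist. The sketch below is an assumption-laden illustration, not an established framework: the `Layer`, `Capability` and `Licence` types, their field names and the `review` function are all hypothetical, and the licence model is reduced to the single label-versus-reproduce distinction described by the panel.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    DESKTOP = "desktop"      # research, summarisation, operational efficiency
    MODELLING = "modelling"  # feeds, signals, backtested models


@dataclass(frozen=True)
class Capability:
    name: str
    layer: Layer
    deterministic: bool    # same input always gives the same output
    reproduces_text: bool  # emits source text rather than labels or scores


@dataclass(frozen=True)
class Licence:
    allows_generative_reproduction: bool


def review(cap: Capability, lic: Licence) -> list[str]:
    """Return the issues a capability raises for its layer and licence."""
    issues = []
    if cap.layer is Layer.MODELLING and not cap.deterministic:
        issues.append("non-deterministic output feeding the modelling layer")
    if cap.reproduces_text and not lic.allows_generative_reproduction:
        issues.append("licence does not permit generative reproduction")
    return issues


# Hypothetical capabilities and a label-only licence, for illustration.
sentiment_llm = Capability("LLM sentiment tagger", Layer.MODELLING,
                           deterministic=False, reproduces_text=False)
summariser = Capability("research summariser", Layer.DESKTOP,
                        deterministic=False, reproduces_text=True)
label_only = Licence(allows_generative_reproduction=False)

for cap in (sentiment_llm, summariser):
    print(cap.name, "->", review(cap, label_only) or "no issues flagged")
```

Evaluating each capability separately, rather than approving or rejecting "AI" as a whole, is the design choice the sketch is meant to illustrate.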
