A-Team Insight Blogs

Clean Data Is Not Enough to Power AI

26 May 2026

By Shai Popat, managing director, product and commercial strategy, financial information, SIX.

Agentic AI projects are beginning to roll out across the financial industry. Many firms are testing AI’s feasibility by assigning it relatively simple tasks, such as summarising information or retrieving data and documents from internal databases.

Two maxims are often cited when discussing AI adoption. One is “Garbage in, garbage out”, and the other is “Trust but verify”. Both are important principles, but neither fully explains how to solve one of AI’s biggest problems: hallucinations.

“Garbage in, garbage out” relates to data quality. If the data used to train or feed an AI model is poor, the insights it generates will inevitably suffer. Clean, sanitised and optimised data inputs are critically important, but they are not enough on their own.

This becomes especially important in complex financial datasets such as corporate actions, symbology, issuer hierarchies and regulatory reporting, where even a small inconsistency in interpretation can materially affect multiple teams and workflows across a financial institution.

Hallucinations Persist

Many firms experimenting with agentic AI will already have encountered hallucinations despite using high-quality data inputs. For agentic AI to meet the high standards required for safe use in financial institutions, robust governance frameworks are essential. Firms are increasingly having to balance the benefits of speed against the risks posed by “acceptable” hallucinations.

Legacy infrastructure is also ill-equipped to handle the rapid inflow of data that AI systems require. Even the highest quality datasets will occasionally contain errors. Previously, if something looked wrong, a human could intervene and correct it. And even if an issue was missed initially, the pace of human decision-making often meant it would be caught before spreading further.

AI operates at a very different speed. A single incorrect input can be processed and reused multiple times before anyone notices. Consider a corporate action example. If an AI agent hallucinates an incorrect coupon payment from a bond prospectus, that error could flow into cashflow projections, distorting portfolio valuations and bond pricing. Firms therefore need clear frameworks governing how data is used across AI systems to ensure everyone is working from the same accurate information.
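To make the coupon example concrete, the following is a minimal sketch of how one wrong input can flow into a valuation. All figures are hypothetical (a five-year annual-pay bond, a flat 3% discount rate, a true coupon of 2.5% and a hallucinated coupon of 5.2%), and the function names are illustrative rather than taken from any particular system.

```python
def bond_cashflows(face, coupon_rate, years):
    """Annual coupon cashflows, with principal repaid in the final year."""
    flows = [face * coupon_rate] * years
    flows[-1] += face
    return flows


def present_value(flows, discount_rate):
    """Discount each cashflow back from the end of year t (t starts at 1)."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(flows, start=1))


# Hypothetical inputs, chosen only for illustration.
FACE, YEARS, DISCOUNT = 100.0, 5, 0.03
true_coupon = 0.025          # what the prospectus actually says (assumed)
hallucinated_coupon = 0.052  # what the agent extracted (assumed)

true_price = present_value(bond_cashflows(FACE, true_coupon, YEARS), DISCOUNT)
wrong_price = present_value(bond_cashflows(FACE, hallucinated_coupon, YEARS), DISCOUNT)

# The same wrong price then feeds a position valuation downstream.
position_face = 10_000_000
valuation_error = (wrong_price - true_price) / FACE * position_face

print(f"price with true coupon:         {true_price:.2f}")
print(f"price with hallucinated coupon: {wrong_price:.2f}")
print(f"valuation error on position:    {valuation_error:,.0f}")
```

Nothing in the calculation itself is faulty. Every downstream step is correct arithmetic applied to one incorrect input, which is why the error is hard to spot from the output alone.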

Trace and Validate

This is where the second adage, “trust but verify”, becomes equally important. If an AI-generated output appears questionable, the person overseeing it must be able to trace and validate the result rather than simply accept it at face value.

That requires governance around the verification process itself. If AI remains a black box, determining why a hallucination occurred, or even whether an output is wrong, becomes extremely time-consuming. But with the right controls in place, firms can understand how the AI moved from A to B. That makes it far easier to identify the root cause of a hallucination and verify whether the output is correct.
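One way to picture moving from A to B is a value that carries its own trail of evidence. The sketch below is an assumption about what such a record might look like, not a description of any vendor's implementation: `TracedValue`, `TraceStep` and `is_supported` are hypothetical names, and the prospectus sentence is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    action: str  # what the agent did, e.g. "extract" or "convert"
    detail: str  # the evidence for that action


@dataclass
class TracedValue:
    name: str
    value: float
    source: str                      # hypothetical document identifier
    steps: list = field(default_factory=list)

    def record(self, action, detail):
        """Append one step to the trail and return self for chaining."""
        self.steps.append(TraceStep(action, detail))
        return self


def is_supported(traced, source_text):
    """True only if at least one 'extract' step exists and every quoted
    excerpt appears verbatim in the source text."""
    excerpts = [s.detail for s in traced.steps if s.action == "extract"]
    return bool(excerpts) and all(e in source_text for e in excerpts)


# Invented source sentence, for illustration only.
prospectus = "The Notes bear interest at a rate of 2.50 per cent. per annum."

good = (TracedValue("coupon_rate", 0.025, "prospectus-001")
        .record("extract", "2.50 per cent. per annum")
        .record("convert", "2.50 / 100 = 0.025"))

bad = (TracedValue("coupon_rate", 0.052, "prospectus-001")
       .record("extract", "5.20 per cent. per annum"))

print(is_supported(good, prospectus))
print(is_supported(bad, prospectus))
```

A check like this only confirms that the quoted evidence exists in the source. It does not prove the interpretation is right, but it gives the reviewer a specific excerpt and a list of steps to inspect instead of a black box.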

For decades, financial firms have developed governance procedures to minimise the impact of human error, whether fat-finger trades or an extra zero entered into a system. The same discipline is now needed for AI, ensuring outputs move from plausible to defensible and from interesting to usable.

Semantic Layer

As AI adoption becomes more widespread, increasing volumes of data are being transmitted through APIs designed specifically for AI use cases. This simplifies access and removes translation layers that can introduce additional errors. Fewer layers also improve accountability, as there are fewer points at which errors can enter the process, making audit trails easier to manage.

The crucial ingredient that makes AI usable within this streamlined model is the semantic layer. Sitting above raw data and APIs, the semantic layer provides business meaning, enabling AI to translate a question into the correct data calls, joins and calculations using consistent definitions. APIs and Model Context Protocol (MCP) servers make data accessible to AI agents, but they do not provide meaning on their own.
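As a rough illustration of consistent definitions, a semantic layer can be thought of as a shared registry that maps business terms onto precise conditions over the underlying fields. The sketch below is deliberately simplified and entirely hypothetical: the term names, field names and sample rows are assumptions, and a production layer would also cover joins and calculations.

```python
# Hypothetical registry: one agreed definition per business term.
SEMANTIC_TERMS = {
    # Assumes domicile is stored as an ISO 3166 country code.
    "swiss-domiciled": lambda row: row["issuer_domicile"] == "CH",
    "fixed-rate": lambda row: row["coupon_type"] == "FIXED",
}

# Invented sample data, for illustration only.
INSTRUMENTS = [
    {"name": "Bond A", "issuer_domicile": "CH", "coupon_type": "FIXED"},
    {"name": "Bond B", "issuer_domicile": "DE", "coupon_type": "FIXED"},
    {"name": "Bond C", "issuer_domicile": "CH", "coupon_type": "FLOATING"},
]


def select(rows, *terms):
    """Return rows matching every term; refuse terms with no agreed definition."""
    unknown = [t for t in terms if t not in SEMANTIC_TERMS]
    if unknown:
        raise KeyError(f"no agreed definition for: {unknown}")
    return [r for r in rows if all(SEMANTIC_TERMS[t](r) for t in terms)]


matches = select(INSTRUMENTS, "swiss-domiciled", "fixed-rate")
print([r["name"] for r in matches])
```

The design choice worth noting is the refusal on unknown terms. An agent that cannot find a definition fails loudly instead of guessing at what the term might mean.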

Even with high-quality data, AI struggles without that context. It may not understand what “Swiss-domiciled” refers to, how identifiers such as Valor, ISIN and LEI connect to one another, or how ratings should be standardised. The result is often a fragmented set of outputs that users must piece together manually, undermining the very purpose of agentic AI.
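The identifier and ratings points can be sketched as a small crosswalk. Everything below is an assumption made for illustration: the identifier values are placeholders rather than real Valor, ISIN or LEI codes, no checksum validation is attempted, and the rating mapping shows only a few commonly used correspondences between agency symbols.

```python
# Hypothetical crosswalk: each row ties an instrument's identifiers to its issuer.
CROSSWALK = [
    {"valor": "000000001", "isin": "XX0000000001", "issuer_lei": "LEI-PLACEHOLDER-A"},
    {"valor": "000000002", "isin": "XX0000000002", "issuer_lei": "LEI-PLACEHOLDER-A"},
    {"valor": "000000003", "isin": "XX0000000003", "issuer_lei": "LEI-PLACEHOLDER-B"},
]

# Assumed mapping of agency-specific symbols onto one internal scale.
RATING_SCALE = {"Aaa": "AAA", "AAA": "AAA", "Baa3": "BBB-", "BBB-": "BBB-"}


def resolve(id_type, value):
    """Find the single crosswalk row for an instrument-level identifier."""
    matches = [row for row in CROSSWALK if row[id_type] == value]
    if len(matches) != 1:
        raise LookupError(f"{id_type}={value!r} matched {len(matches)} rows")
    return matches[0]


def instruments_for_issuer(lei):
    """List the ISINs of all instruments linked to one issuer LEI."""
    return [row["isin"] for row in CROSSWALK if row["issuer_lei"] == lei]


def standardise_rating(symbol):
    """Map an agency symbol to the internal scale, failing on unmapped symbols."""
    if symbol not in RATING_SCALE:
        raise ValueError(f"unmapped rating symbol: {symbol!r}")
    return RATING_SCALE[symbol]


row = resolve("isin", "XX0000000002")
print(row["valor"], row["issuer_lei"])
print(instruments_for_issuer("LEI-PLACEHOLDER-A"))
print(standardise_rating("Baa3"))
```

With a mapping like this in place, an agent asked about an issuer can gather every linked instrument in one step, which is the piecing together that users would otherwise do by hand.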

For data providers, which play a critical role in enabling the adoption of agentic AI, the responsibility extends beyond simply supplying high-quality data. They must also help build the foundations of the semantic layer through standardised datasets and identifier mapping. Increasingly, this data is delivered through cloud-enabled architectures, reducing fragmentation, improving consistency and enhancing timeliness. This is particularly valuable when querying large volumes of information. Cloud infrastructure, combined with APIs accessible through the Model Context Protocol, creates a scalable bridge between AI agents and enterprise data, replacing one-off queries with continuous, structured access.

The firms that succeed with agentic AI will be those with the strongest data foundations and governance frameworks, capable of combining advanced AI models with interoperability and contextual understanding across reference, regulatory and event-driven datasets. That is what will allow high-quality data to be used to its fullest extent without introducing additional risk.
