The knowledge platform for the financial technology industry

A-Team Insight Blogs

FCA and Turing Institute Collaborate on Synthetic Data to Advance AML Detection

The Financial Conduct Authority has published a research note from its synthetic data anti-money laundering project, an initiative that began in autumn 2024 and was developed with the Alan Turing Institute, Plenitude Consulting, and Napier AI to create a synthetic dataset for AML detection testing. The paper marks the culmination of that work to date and sets out how the dataset will be used in the FCA’s upcoming AML detection sprint, giving firms a controlled environment in which to test transaction monitoring approaches without relying on live customer data. The project adds to the FCA’s use of shared testing environments for regulatory innovation and industry collaboration.

Firms need rich transactional data to test whether models can identify suspicious behaviour that unfolds across multiple accounts, entities, and payment flows, yet access to that data is restricted by privacy, legal and confidentiality concerns. The FCA notes that criminals are estimated to launder between 2% and 5% of global GDP each year, or about $800 billion to $2 trillion, underscoring both the scale of the threat and the pressure on financial institutions to invest in more effective detection and prevention tools.

The regulator worked with the Alan Turing Institute, Plenitude Consulting and Napier AI, each contributing a different layer of expertise. The FCA’s role covered regulatory leadership, oversight, and technical skills. The Alan Turing Institute brought synthetic data research and technical capability. Plenitude contributed financial crime and industry expertise. Napier AI supplied applied technology and product experience in financial crime detection.

Methodology

The dataset was created in three stages:

  • The project began with real banking data that had already been anonymised at source, with no personal data included in the initial request. That provided the statistical base for the exercise while limiting privacy risk from the outset.
  • The dataset was then enriched with synthetic money laundering typologies designed to reflect recognisable real-world behaviours associated with illicit finance, so the resulting data could be used to test whether detection tools can identify suspicious patterns rather than only normal transaction activity.
  • Using the anonymised source data and the embedded typologies, the team generated fully synthetic datasets with the Adaptive and Iterative Mechanism (AIM), a privacy-focused method that introduces controlled randomness to prevent reverse engineering of individual customers or transactions while preserving enough structure for meaningful analysis. Access to the final dataset is limited to participating firms in the data sprint under contractual and control measures.
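The FCA paper does not publish code, but the typology-enrichment stage can be sketched in miniature. The snippet below is an illustrative assumption, not the project's implementation: the `Txn` record, the `inject_smurfing` helper, and the account names are all hypothetical, and "smurfing" (splitting a large sum into many small deposits through mule accounts) is just one recognisable real-world typology of the kind the paper describes embedding.

```python
import random
from dataclasses import dataclass

random.seed(7)  # reproducible illustration

@dataclass
class Txn:
    sender: str
    receiver: str
    amount: float
    label: str = "normal"  # ground-truth tag used later to score detectors

def inject_smurfing(txns, mule_accounts, total, n_splits):
    """Embed a simple 'smurfing' typology: one large sum broken into
    many smaller deposits routed through mule accounts."""
    per_txn = total / n_splits
    for i in range(n_splits):
        mule = mule_accounts[i % len(mule_accounts)]
        txns.append(Txn(sender=mule, receiver="TARGET", amount=per_txn,
                        label="smurfing"))
    return txns

# Background 'normal' activity standing in for the anonymised source data,
# plus one embedded typology with known ground truth.
background = [Txn(f"A{i}", f"B{i}", random.uniform(10, 500)) for i in range(100)]
enriched = inject_smurfing(background, ["M1", "M2", "M3"],
                           total=45_000, n_splits=9)

flagged = [t for t in enriched if t.label != "normal"]
print(len(enriched), len(flagged), flagged[0].amount)  # 109 9 5000.0
```

Because each injected transaction carries a ground-truth label, a detection tool's hit rate on the embedded patterns can be scored exactly, which is what makes this kind of enriched dataset useful for testing.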

The testing phase showed that the dataset is usable for controlled AML testing. The team found little statistical divergence from the anonymised source data while maintaining privacy safeguards through the generation process, and tests on the embedded typologies produced a useful range of detectability rather than patterns that were either too obvious or too obscure.
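A "little statistical divergence" claim of this kind is typically checked marginal by marginal. The sketch below is an assumed illustration, not the FCA team's actual evaluation: two lognormal samples stand in for an anonymised source column and its synthetic counterpart, and the Jensen–Shannon distance between their binned distributions serves as the divergence measure (zero means identical marginals; larger means more divergent).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)

# Stand-ins for one column of the anonymised source data and its
# synthetic counterpart, e.g. transaction amounts.
source = rng.lognormal(mean=4.0, sigma=1.0, size=50_000)
synthetic = rng.lognormal(mean=4.0, sigma=1.05, size=50_000)

# Compare the two marginal distributions over shared histogram bins.
bins = np.histogram_bin_edges(np.concatenate([source, synthetic]), bins=50)
p, _ = np.histogram(source, bins=bins)
q, _ = np.histogram(synthetic, bins=bins)

# jensenshannon normalises the histograms to probability vectors internally.
jsd = float(jensenshannon(p, q))
print(round(jsd, 3))  # small value: the marginals closely agree
```

In practice a utility evaluation would repeat this across many columns and joint marginals, since per-column agreement alone does not guarantee that relationships between fields survive the generation process.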

The next phase is the Synthetic Data AML Solution Sprint, which the FCA will run through its Digital Sandbox. The regulator says participating firms will use the dataset to test transaction monitoring approaches and then reconvene to share findings. Applications are open until 26 April. The stated aim is to create a setting in which new detection techniques can be demonstrated and challenged without exposing real customer data or requiring privileged access to bank datasets.

The report is also careful about the limits of what synthetic data can do:

  • The dataset can only capture known typologies and cannot reflect laundering methods that have not yet been identified or codified. It also notes internal coherence challenges, including the difficulty of preserving realistic relationships between customers, accounts and transactions, and the limitations of modelling time-based behaviour when transactions are generated independently. In some cases, the project chose to document anomalies rather than sanitise the data, on the basis that an unrealistically clean dataset would be less useful for AML testing.
  • Synthetic data can contain emergent artefacts that arise from privacy processing, typology injection, and modelling choices rather than genuine risk signals. Firms could optimise their systems to the typologies embedded in the dataset without improving broader detection capability, or place too much confidence in results derived from synthetic rather than live operational data. The FCA’s conclusion is that synthetic data should complement rather than replace real-world calibration and validation.
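The overfitting risk the FCA flags can be made concrete with a deliberately tiny example. Everything here is a hypothetical illustration, not material from the paper: a detection rule tuned to the amount band of an embedded "structuring" typology scores perfectly on the patterns it was calibrated against, yet misses an unseen typology entirely, which is exactly the failure mode the report warns about.

```python
# Embedded typology: amounts structured just under a 10,000 reporting
# threshold (the pattern the rule below is tuned to).
amounts_known = [9_500, 9_700, 9_900]
# Hypothetical unseen typology: low-value layering the rule never saw.
amounts_unseen = [2_000, 2_100, 1_950]

def flags(lo, hi, amounts):
    """A naive band-pass rule: flag any amount inside [lo, hi]."""
    return [a for a in amounts if lo <= a <= hi]

# Rule over-fitted to the embedded typology's amount band:
print(len(flags(9_000, 10_000, amounts_known)))   # 3 -> perfect on known
print(len(flags(9_000, 10_000, amounts_unseen)))  # 0 -> blind to unseen
```

Holding some typologies out of calibration and scoring against them separately is one simple guard against mistaking memorisation of the embedded patterns for genuine detection capability.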

After the data sprint, the FCA will use participant feedback to refine the dataset so that it continues to reflect the complexity of financial crime, and will consider whether access could eventually extend beyond the sandbox. Any broader rollout would depend on stronger governance, privacy safeguards, and alignment with international standards, supported by clearer technical documentation and disclosure of limitations. The next phase will also need to address which additional typologies should be added, how access should be managed beyond the sandbox, and what evaluation standards are needed so results can be compared and trusted across participants.

The project shows the FCA using collaborative testing infrastructure to address a persistent financial crime risk. By bringing together public sector oversight, academic research, and industry expertise to build a usable synthetic AML testing environment, the regulator is showing how structured collaboration can strengthen detection capabilities while preserving the confidentiality and controlled handling of sensitive financial data.
