About A-Team Marketing Services
The knowledge platform for the financial technology industry

A-Team Insight Blogs

FCA and Turing Institute Collaborate on Synthetic Data to Advance AML Detection


The Financial Conduct Authority has published a research note from its synthetic data anti-money laundering project, an initiative that began in autumn 2024 and was developed with the Alan Turing Institute, Plenitude Consulting, and Napier AI to create a synthetic dataset for AML detection testing. The paper marks the culmination of that work to date and sets out how the dataset will be used in the FCA’s upcoming AML detection sprint, giving firms a controlled environment in which to test transaction monitoring approaches without relying on live customer data. The project adds to the FCA’s use of shared testing environments for regulatory innovation and industry collaboration.

Firms need rich transactional data to test whether models can identify suspicious behaviour that unfolds across multiple accounts, entities, and payment flows, yet access to that data is restricted by privacy, legal and confidentiality concerns. The FCA notes that criminals are estimated to launder between 2% and 5% of global GDP each year, or about $800 billion to $2 trillion, underscoring both the scale of the threat and the pressure on financial institutions to invest in more effective detection and prevention tools.

The regulator worked with the Alan Turing Institute, Plenitude Consulting and Napier AI, each contributing a different layer of expertise. The FCA’s role covered regulatory leadership, oversight, and technical skills. The Alan Turing Institute brought synthetic data research and technical capability. Plenitude contributed financial crime and industry expertise. Napier AI supplied applied technology and product experience in financial crime detection.

Methodology

The dataset was created in three stages:

  • The project began with real banking data that had already been anonymised at source, with no personal data included in the initial request. That provided the statistical base for the exercise while limiting privacy risk from the outset.
  • The dataset was then enriched with synthetic money laundering typologies designed to reflect recognisable real-world behaviours associated with illicit finance, so the resulting data could be used to test whether detection tools can identify suspicious patterns rather than only normal transaction activity.
  • Using the anonymised source data and the embedded typologies, the team generated fully synthetic datasets with the Adaptive and Iterative Mechanism (AIM), a privacy-focused method that introduces controlled randomness to prevent reverse engineering of individual customers or transactions while preserving enough structure for meaningful analysis. Access to the final dataset is limited to participating firms in the data sprint under contractual and control measures.
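The note does not publish the internals of the AIM generation step, but the core idea it describes — measuring aggregate statistics with calibrated noise so individual records cannot be reverse-engineered, then sampling fully synthetic records from those noisy statistics — can be sketched in simplified form. The snippet below is an illustrative reduction, not the project's implementation: it measures a single histogram (marginal) of hypothetical transaction amounts with Laplace noise and samples synthetic values from it, whereas AIM iteratively and adaptively selects many such marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_marginal(values, bins, epsilon):
    """Measure a histogram (marginal) with Laplace noise calibrated to epsilon.

    Each record contributes to exactly one bin, so the L1 sensitivity is 1
    and noise of scale 1/epsilon gives epsilon-differential privacy.
    """
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0, None)      # counts cannot go negative
    probs = noisy / noisy.sum()          # normalise to a sampling distribution
    return probs, edges

def sample_synthetic(probs, edges, n):
    """Draw synthetic values: pick a bin, then a uniform point inside it."""
    idx = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])

# Toy stand-in for anonymised source transaction amounts (hypothetical data)
source = rng.lognormal(mean=4.0, sigma=1.0, size=10_000)

probs, edges = noisy_marginal(source, bins=50, epsilon=1.0)
synthetic = sample_synthetic(probs, edges, n=10_000)
```

Because the synthetic records are drawn from the noisy distribution rather than copied from source rows, no individual transaction can be traced back, yet the overall shape of the data is preserved for analysis — the trade-off the report describes.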

The testing phase showed the dataset is usable for controlled AML testing. The team found little statistical divergence from the anonymised source data, while maintaining privacy safeguards through the generation process. Tests on the embedded typologies also produced a useful range of detectability, rather than patterns that were either too obvious or too obscure.
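The note does not state which divergence measures the team used. One simple check of this kind, sketched here with hypothetical lognormal samples standing in for the source and synthetic data, compares binned marginal distributions using total variation distance, which is 0 for identical distributions and 1 for disjoint ones:

```python
import numpy as np

def total_variation(a, b, bins=50):
    """Estimate total variation distance between two samples on a shared binning."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return 0.5 * np.abs(pa - pb).sum()

rng = np.random.default_rng(1)
source = rng.lognormal(4.0, 1.0, size=5_000)     # stand-in for anonymised data
synthetic = rng.lognormal(4.0, 1.0, size=5_000)  # stand-in for generated data

tv = total_variation(source, synthetic)
# A value near 0 would indicate little statistical divergence on this marginal
```

In practice a project like this would run such checks across many marginals and joint distributions, not a single variable, but the principle — quantifying how far the synthetic data drifts from the anonymised source — is the same.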

The next phase is the Synthetic Data AML Solution Sprint, which the FCA will run through its Digital Sandbox. The regulator says participating firms will use the dataset to test transaction monitoring approaches and then reconvene to share findings. Applications are open until 26 April. The stated aim is to create a setting in which new detection techniques can be demonstrated and challenged without exposing real customer data or requiring privileged access to bank datasets.

The report is also careful about the limits of what synthetic data can do:

  • The dataset can only capture known typologies and cannot reflect laundering methods that have not yet been identified or codified. It also notes internal coherence challenges, including the difficulty of preserving realistic relationships between customers, accounts and transactions, and the limitations of modelling time-based behaviour when transactions are generated independently. In some cases, the project chose to document anomalies rather than sanitise the data, on the basis that an unrealistically clean dataset would be less useful for AML testing.
  • Synthetic data can contain emergent artefacts that arise from privacy processing, typology injection, and modelling choices rather than genuine risk signals. Firms could optimise their systems to the typologies embedded in the dataset without improving broader detection capability, or place too much confidence in results derived from synthetic rather than live operational data. The FCA’s conclusion is that synthetic data should complement rather than replace real-world calibration and validation.

After the data sprint, the FCA will use participant feedback to refine the dataset so that it continues to reflect the complexity of financial crime, and will consider whether access could eventually extend beyond the sandbox. Any broader rollout would depend on stronger governance, privacy safeguards, and alignment with international standards, supported by clearer technical documentation and disclosure of limitations. The next phase will also need to address which additional typologies should be added, how access should be managed beyond the sandbox, and what evaluation standards are needed so results can be compared and trusted across participants.

The project shows the FCA using collaborative testing infrastructure to address a persistent financial crime risk. By bringing together public sector oversight, academic research, and industry expertise to build a usable synthetic AML testing environment, the regulator is showing how structured collaboration can strengthen detection capabilities while preserving the confidentiality and controlled handling of sensitive financial data.


Related content

WEBINAR

Recorded Webinar: GenAI and LLM case studies for Surveillance, Screening and Scanning

As Generative AI (GenAI) and Large Language Models (LLMs) move from pilot to production, compliance, surveillance, and screening functions are seeing tangible results – and new risks. From trade surveillance to adverse media screening to policy and regulatory scanning, GenAI and LLMs promise to tackle complexity and volume at a scale never seen before. But...

BLOG

FCA Stablecoin Cohort Commences Live Sandbox Testing

The Financial Conduct Authority (FCA) has selected Monee Financial Technologies, ReStabilise, Revolut and VVTX to participate in a new stablecoins cohort within its Regulatory Sandbox, with testing due to begin in Q1 2026. On the FCA’s framing, the cohort is designed to test how stablecoin services operate against proposed UK rules in a controlled environment,...

EVENT

RegTech Summit New York

Now in its 9th year, the RegTech Summit in New York will bring together the RegTech ecosystem to explore how the North American capital markets industry can leverage technology to drive innovation, cut costs and support regulatory change.

GUIDE

Entity Data Management

Entity data management has historically been a rather overlooked area of the reference data landscape, but with the increased focus on managing risk, the industry is finally taking notice. It is now generally agreed to be critical to every financial institution; although the rewards for investment in entity data management appear to be rather small,...