The knowledge platform for the financial technology industry

A-Team Insight Blogs

FCA and Turing Institute Collaborate on Synthetic Data to Advance AML Detection


The Financial Conduct Authority has published a research note from its synthetic data anti-money laundering project, an initiative that began in autumn 2024 and was developed with the Alan Turing Institute, Plenitude Consulting, and Napier AI to create a synthetic dataset for AML detection testing. The paper marks the culmination of that work to date and sets out how the dataset will be used in the FCA’s upcoming AML detection sprint, giving firms a controlled environment in which to test transaction monitoring approaches without relying on live customer data. The project adds to the FCA’s use of shared testing environments for regulatory innovation and industry collaboration.

Firms need rich transactional data to test whether models can identify suspicious behaviour that unfolds across multiple accounts, entities, and payment flows, yet access to that data is restricted by privacy, legal and confidentiality concerns. The FCA notes that criminals are estimated to launder between 2% and 5% of global GDP each year, or about $800 billion to $2 trillion, underscoring both the scale of the threat and the pressure on financial institutions to invest in more effective detection and prevention tools.

Each of the project's partners contributed a different layer of expertise. The FCA provided regulatory leadership, oversight, and technical skills. The Alan Turing Institute brought synthetic data research and technical capability. Plenitude contributed financial crime and industry expertise. Napier AI supplied applied technology and product experience in financial crime detection.

Methodology

The dataset was created in three stages:

  • The project began with real banking data that had already been anonymised at source, with no personal data included in the initial request. That provided the statistical base for the exercise while limiting privacy risk from the outset.
  • The dataset was then enriched with synthetic money laundering typologies designed to reflect recognisable real-world behaviours associated with illicit finance, so the resulting data could be used to test whether detection tools can identify suspicious patterns rather than only normal transaction activity.
  • Using the anonymised source data and the embedded typologies, the team generated fully synthetic datasets with the Adaptive and Iterative Mechanism (AIM), a privacy-focused method that introduces controlled randomness to prevent reverse engineering of individual customers or transactions while preserving enough structure for meaningful analysis. Access to the final dataset is limited to participating firms in the data sprint under contractual and control measures.
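The project's code has not been published, but the core idea of the final stage — generating records from noised statistics so that no individual source row can be reconstructed — can be sketched in miniature. The example below is a hypothetical illustration, not the FCA/Turing implementation: it replaces AIM's adaptive, iterative selection of many marginals with a single Laplace-noised categorical marginal, which is the simplest form of the same privacy technique.

```python
import random
from collections import Counter

def dp_synthesize(records, epsilon=1.0, n_out=1000, seed=42):
    """Toy differentially private synthesizer (illustrative only).

    Counts the frequency of each category in the source data, adds
    Laplace noise scaled to 1/epsilon (the standard mechanism for
    private counting queries), then samples a fully synthetic dataset
    from the noised distribution. Real systems such as AIM noise many
    adaptively chosen marginals, not just one.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon);
    # clamp noised counts at zero so they remain valid weights.
    noisy = {k: max(0.0, c + rng.expovariate(epsilon) - rng.expovariate(epsilon))
             for k, c in counts.items()}
    total = sum(noisy.values())
    categories = list(noisy)
    weights = [noisy[k] / total for k in categories]
    # Sample synthetic records from the noised marginal only; the
    # originals are never copied into the output.
    return rng.choices(categories, weights=weights, k=n_out)

# Hypothetical example: typology labels standing in for richer records.
source = ["normal"] * 950 + ["structuring"] * 30 + ["layering"] * 20
synthetic = dp_synthesize(source, epsilon=0.5, n_out=1000)
```

The design point this toy shares with the real pipeline is that the synthetic output depends on the source data only through noised aggregate statistics, which is what prevents reverse engineering of individual customers or transactions.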

The testing phase showed the dataset is usable for controlled AML testing. The team found little statistical divergence from the anonymised source data, while maintaining privacy safeguards through the generation process. Tests on the embedded typologies also produced a useful range of detectability, rather than patterns that were either too obvious or too obscure.
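The research note does not specify which divergence measures the team used; a common, simple fidelity check of this kind is the total variation distance between corresponding categorical marginals of the source and synthetic datasets. The helper below is a generic sketch of such a test under that assumption, not the project's actual evaluation code.

```python
from collections import Counter

def total_variation(sample_a, sample_b):
    """Total variation distance between the empirical distributions of
    two samples: 0.0 means identical marginals, 1.0 means disjoint
    support. Values near zero indicate little statistical divergence."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    keys = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[k] / na - cb[k] / nb) for k in keys)

# Hypothetical check: source vs. synthetic label distributions.
source    = ["normal"] * 95 + ["suspicious"] * 5
synthetic = ["normal"] * 93 + ["suspicious"] * 7
print(round(total_variation(source, synthetic), 3))  # → 0.02
```

In practice such checks would run per column and on joint distributions of key fields, since the privacy noise in the generation step guarantees the marginals will never match exactly.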

The next phase is the Synthetic Data AML Solution Sprint, which the FCA will run through its Digital Sandbox. The regulator says participating firms will use the dataset to test transaction monitoring approaches and then reconvene to share findings. Applications are open until 26 April. The stated aim is to create a setting in which new detection techniques can be demonstrated and challenged without exposing real customer data or requiring privileged access to bank datasets.

The report is also careful about the limits of what synthetic data can do:

  • The dataset can only capture known typologies and cannot reflect laundering methods that have not yet been identified or codified. The report also notes internal coherence challenges, including the difficulty of preserving realistic relationships between customers, accounts and transactions, and the limits of modelling time-based behaviour when transactions are generated independently. In some cases, the project chose to document anomalies rather than sanitise the data, on the basis that an unrealistically clean dataset would be less useful for AML testing.
  • Synthetic data can contain emergent artefacts that arise from privacy processing, typology injection, and modelling choices rather than genuine risk signals. Firms could optimise their systems to the typologies embedded in the dataset without improving broader detection capability, or place too much confidence in results derived from synthetic rather than live operational data. The FCA’s conclusion is that synthetic data should complement rather than replace real-world calibration and validation.

After the data sprint, the FCA will use participant feedback to refine the dataset so that it continues to reflect the complexity of financial crime, and will consider whether access could eventually extend beyond the sandbox. Any broader rollout would depend on stronger governance, privacy safeguards, and alignment with international standards, supported by clearer technical documentation and disclosure of limitations. The next phase will also need to address which additional typologies should be added, how access should be managed beyond the sandbox, and what evaluation standards are needed so that results can be compared and trusted across participants.

The project shows the FCA using collaborative testing infrastructure to address a persistent financial crime risk. By bringing together public sector oversight, academic research, and industry expertise to build a usable synthetic AML testing environment, the regulator is showing how structured collaboration can strengthen detection capabilities while preserving the confidentiality and controlled handling of sensitive financial data.

