Leveraging NLP for Regulatory Compliance

Natural language processing (NLP) is being used to accelerate every step of the compliance journey, from identifying relevant regulatory updates to understanding their content to mapping the changes onto internal infrastructure, according to panellists on a recent A-Team Group webinar, ‘Leveraging NLP for regulatory compliance’.

Key takeaways included:

Many firms have yet to recognise the potential of NLP
Complementary disciplines are a vital element of any NLP strategy
NLP is applicable to both rules- and principles-based regulation
Firms need to be realistic about what can be achieved

The webinar examined how NLP, an emerging form of artificial intelligence (AI), is affecting workflows in the regulatory and compliance space. It featured panellists Yin Lu, head of product at Cube Global; Selja Seppala, lecturer in the department of business information systems at University College Cork; and Melanie Williams, senior regulatory change manager at Revolut.

The webinar kicked off with an audience poll, asking attendees what types of technologies they were using. Almost half (46%) referred to machine learning, with 38% mentioning NLP. Almost one-third (31%), however, said they were not using any form of AI, which underscores the challenge faced by those encouraging more universal adoption.

The panel discussed how NLP is best used in the compliance space. According to one panellist, NLP’s extensive use of speech and text-based data makes it particularly relevant to regulatory change management, which relies on analysis of text. This panellist also suggested NLP could be applied to fraud detection, which typically draws on surveillance of both voice and electronic communications channels. More broadly, the technique can be used to identify relationships within and between datasets, making it useful for counter-terrorism and anti-money laundering (AML) as well as fraud detection.

The panel identified three characteristics of text that make NLP a compelling capability as part of firms’ compliance efforts. Firstly, most text-based compliance data is extremely long. Secondly, the complexity of text is very high – not just the terminology but also the self-referencing and cross-referencing within a document. The Bank of England demonstrated this when it compared pre-crash regulatory text with post-crash regulatory text and found that linguistic units in the post-crash text took longer to process.

The third characteristic is the lack of standardisation in both language (as humans often talk about the same concepts in different ways) and data format. For the most part, firms are dealing with regulation that is not machine-readable. To allow computer processing of text-based rules, NLP can be used with techniques like computer vision – an interdisciplinary scientific field that allows computers to gain high-level understanding from digital images or videos – to extract key points from text and put them into machine-readable format.

The panel agreed that regulatory requirements are becoming more demanding, which is driving financial institutions to consider NLP as a response. NLP promises greater efficiency: rather than manually scanning documents to find relevant elements and keep track of incremental changes, compliance teams can use the technology to automate tracking of all relevant content, segregated by type. Using NLP, compliance executives can distinguish between an obligation and mere guidance or general market information. This too frees up time for other tasks and reduces human error. All this helps demonstrate compliance to regulators.
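As a rough illustration of separating obligations from guidance and general information, the sketch below labels sentences by the modal cues they contain. It is a minimal rule-based stand-in, not the approach of any panellist or vendor: the cue lists and the function names `classify_sentence` and `classify_text` are assumptions made for this example, and a production system would more likely use a trained model reviewed by subject matter experts.

```python
import re

# Hypothetical cue lists chosen for illustration only; real regulatory
# language needs far richer patterns or a trained classifier.
OBLIGATION_CUES = re.compile(
    r"\b(must|shall|is required to|are required to)\b", re.IGNORECASE
)
GUIDANCE_CUES = re.compile(
    r"\b(should|may|is expected to|are expected to)\b", re.IGNORECASE
)


def classify_sentence(sentence: str) -> str:
    """Label one sentence as 'obligation', 'guidance' or 'information'."""
    # Obligation cues are checked first so that a sentence containing both
    # "must" and "should" is treated as binding.
    if OBLIGATION_CUES.search(sentence):
        return "obligation"
    if GUIDANCE_CUES.search(sentence):
        return "guidance"
    return "information"


def classify_text(text: str) -> list[tuple[str, str]]:
    """Split text into sentences naively and label each one."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(classify_sentence(s), s) for s in sentences]


if __name__ == "__main__":
    sample = (
        "Firms must report transactions within one business day. "
        "Firms should review their controls annually. "
        "The consultation closed in March."
    )
    for label, sentence in classify_text(sample):
        print(f"{label:12} {sentence}")
```

Keyword rules like these are transparent and easy to audit, but they will misread negation, quoted text and context-dependent uses of words such as "may", which is one reason expert feedback matters.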

In general, the panel suggested that rules-based rather than principles-based regulation is more appropriate for NLP solutions. Rules-based regulations provide more concrete information, such as transaction amounts, dates, durations and mentions of individuals, that follows clear patterns and is well suited to the use of NLP, panellists agreed. But much depends on the use case. For example, NLP could suit both types of regulation when seeking to detect regulatory change, reduce manual input and free up teams for more complex tasks, eventually reducing the total cost of compliance.
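To show why clearly patterned information lends itself to automated extraction, here is a minimal sketch that pulls amounts, dates and durations out of a sentence with regular expressions. The patterns, the sample sentence and the function name `extract_entities` are assumptions for illustration; they cover only a few formats and are not taken from the webinar.

```python
import re

# Illustrative patterns only: they handle a handful of common formats.
AMOUNT = re.compile(r"(?:EUR|USD|GBP|[€$£])\s?\d+(?:,\d{3})*(?:\.\d+)?")
DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)
DURATION = re.compile(
    r"\b\d+ (?:business |calendar |working )?(?:day|week|month|year)s?\b",
    re.IGNORECASE,
)


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return the amounts, dates and durations found in the text."""
    return {
        "amounts": AMOUNT.findall(text),
        "dates": DATE.findall(text),
        "durations": DURATION.findall(text),
    }


if __name__ == "__main__":
    rule = (
        "From 1 January 2025, transfers above EUR 10,000 "
        "must be reported within 5 business days."
    )
    print(extract_entities(rule))
```

Principles-based text rarely contains such regular surface patterns, so this style of extraction would offer much less there.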

Panellists discussed the various ways NLP can fit into firms’ compliance systems, and detailed how regulatory content could be fed into an organisation’s existing internal systems or how a solution provider could offer new tools for capturing content. What’s key with the latter approach, the panel agreed, is that firms manage expectations about how the technology can support their compliance programme before finding the right provider.

Implementation requires a significant amount of effort because there are a lot of moving parts to manage internally, so one option is to take a phased approach and be realistic. This is not a quick fix, and it might take some time to reap the benefits, said one of the participants. Companies have to invest in testing and implementation because NLP relies on machine learning: the more feedback they put into it, the better the machine can understand their requirements.

The next audience poll considered implementation challenges. Two-thirds of respondents referred to a shortfall in necessary skills, followed by legacy systems.

It was noted that semantic processing is possibly the most difficult NLP-related task because it requires not only NLP but also external resources and knowledge bases, and even in non-specialised areas the technology is still being developed. Among the other major challenges identified by the panel was the complexity of regulatory texts. At a linguistic level, regulators tend to use very long sentences with lots of relative clauses and lists. The resulting syntactic structures are very complex and difficult for NLP systems to parse, which can lead to low performance and undermine user confidence.

A second challenge relates to data silos, which can exist within an organisation’s data or across legislation, so an NLP model developed for one piece of legislation may not be readily applicable to another. In the poll, 42% of respondents cited data silos as an obstacle to implementation.

The third challenge is posed by the need to have reliably annotated, domain-specific data sets that can be used to develop accurate and reliable NLP models and also for evaluating and validating them. For this reason it is very important to have benchmarks to evaluate a system to show that it is reliable. To ensure the annotations are reliable it is also important to involve subject matter experts in the process, which is also a challenge as they have to be trained in the tasks they are involved in.
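One common way to benchmark a system against expert annotations is to compute precision, recall and F1 for a given label. The sketch below is an illustration under stated assumptions: the sentence identifiers, the sample data and the function name `precision_recall_f1` are invented for this example and do not come from the panel.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Compare predicted items with expert-annotated (gold) items."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence IDs that experts marked as obligations.
    gold_obligations = {"s1", "s3"}
    # Hypothetical sentence IDs that the system marked as obligations.
    predicted_obligations = {"s1", "s2", "s3"}

    p, r, f1 = precision_recall_f1(gold_obligations, predicted_obligations)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this invented example the system finds every expert-marked obligation but also flags one sentence the experts did not, so recall is perfect while precision is lower.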

To underline the complexity required to create just one data set for a sub-area, one panellist referenced the Contract Understanding Atticus Dataset or CUAD, which in the course of a single year received 13,000 annotations of important clauses.

An audience member asked where firms should start with their NLP implementations. The panel suggested organisations should set out what they expect to achieve, the kind of content that will be needed, what systems the NLP will plug into, who is going to be involved, and the resources available.
