A-Team Insight Blogs

The Lexicon is Dead – Time to Reboot

19 March 2021

By Matt Storey, Chief Product Officer, SteelEye.

For years now, surveillance of regulated employees at financial services firms has relied on the lexicon, a piece of technology that scans staff communications for specific words or sequences of words that could be associated with an act of market abuse or misconduct. For example, a firm might have a lexicon watch for the word ‘secret’ as it might be indicative of a person who is sharing information they should not be.

Over time, the lexicon has become regarded as an essential compliance tool that is frustrating to use, in part because the technology used often generates a high number of inaccurate results (false positives) and hasn’t been updated for many years, despite advances in how and where we communicate.

Despite its flaws, the lexicon remains a powerful tool for identifying misconduct and signs of market abuse and will always have an important role to play in compliance. However, what is being used today is not fit for purpose. It is time to reinvent it.

New communications channels and evolving language

Communication channels are constantly changing, and new platforms pose a big risk for firms’ compliance teams as they represent new places where regulated employees can hold unmonitored conversations. For example, despite the policies many financial firms have which ban the use of WhatsApp, traders are still able to communicate illicitly on this channel. Also, preferred channels can change rapidly, with use of Signal increasing after Elon Musk advocated that people switch from WhatsApp to this growing communications channel.

Compliance teams need to either capture the communications on these channels or flag the intent to communicate on unauthorised channels. However, most lexicons are rigid and inflexible and do not allow for regular updates. This means that firms are unable to set up searches for communications such as ‘let’s speak on WhatsApp’ because the word ‘WhatsApp’ simply hasn’t been added to the lexicon dictionary.
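To make the idea concrete, here is a minimal sketch of a channel watchlist that can be extended without touching the matching logic. The channel list and the function names are assumptions made for illustration and do not describe any particular vendor's lexicon.

```python
import re

# Hypothetical, editable watchlist of channel names (an assumption for this sketch).
UNAUTHORISED_CHANNELS = ["whatsapp", "signal", "telegram"]

def build_channel_pattern(channels):
    """Compile one case-insensitive pattern matching any listed channel as a whole word."""
    alternatives = "|".join(re.escape(name) for name in channels)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

def flag_channel_mentions(message, channels=UNAUTHORISED_CHANNELS):
    """Return the sorted, lower-cased channel names mentioned in the message."""
    pattern = build_channel_pattern(channels)
    return sorted({match.group(0).lower() for match in pattern.finditer(message)})
```

Because the watchlist is plain data, adding a newly popular platform is a one-line change, for example `flag_channel_mentions(text, channels=UNAUTHORISED_CHANNELS + ["threema"])`. In practice such a change would still go through the firm's own review before rollout.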

Language has also changed. On instant messaging platforms we often abbreviate. ‘I will’ becomes ‘I’ll’ or simply ‘ill’. ‘Tomorrow’ becomes ‘tmr’ or ‘tomoz’. Yet most lexicons today only accept the full, correct or even British (as opposed to American) spelling of words, meaning that firms are likely to miss key signals of risk.
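One way to picture a lexicon that copes with abbreviations is to normalise each message to a canonical form before matching. The sketch below assumes a small, hand-maintained variant map; the entries and the function names are illustrative only.

```python
import re

# Hypothetical variant map (an assumption); a real one would be maintained and reviewed.
VARIANTS = {
    "tmr": "tomorrow",
    "tomoz": "tomorrow",
    "i'll": "i will",
    "ill": "i will",    # ambiguous: can also mean "unwell"
    "color": "colour",  # American spelling mapped to the British form
}

def normalise(message):
    """Lower-case the message, tokenise it, and replace known variants."""
    text = message.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    tokens = [token.strip("'") for token in re.findall(r"[a-z0-9']+", text)]
    return " ".join(VARIANTS.get(token, token) for token in tokens if token)

def matches_term(message, term):
    """Check whether a lexicon term appears in the normalised message."""
    return re.search(rf"\b{re.escape(term)}\b", normalise(message)) is not None
```

With this map, `normalise("ill call u tmr")` gives `"i will call u tomorrow"`, so a search for 'tomorrow' also catches 'tmr' and 'tomoz'. Mapping 'ill' to 'I will' will sometimes be wrong, which is one reason context matters.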

Understanding a wider context

Another key issue that limits the effectiveness of legacy lexicons is that they do not consider context. Take for example someone writing ‘let’s split it’. If this is in relation to a client account this should create a flag, but if two colleagues are talking about the lunch bill, it should not.

Because most lexicons today can’t distinguish between these types of situations, firms are often forced to limit themselves to only a small number of search terms to ensure they can manage the workload. This is a significant problem as these firms could be missing key signs of market abuse by not including enough words in their lexicon-based alerts.

There are of course the firms with larger teams that decide to add more phrases, but these compliance departments often find themselves overwhelmed with alert volumes – struggling to distinguish the genuine risks from the false positives, such as the lunch example above.

Firms need to link the language used with the intent by understanding the wider context of how a word has been applied.

A new approach

The industry needs a lexicon that can be adjusted in line with changes in how we communicate – capturing all iterations of words, colloquialisms, sentences, and how different people talk on different platforms. However, it is important to note that any communications surveillance system built around a lexicon needs to have advanced change controls and allow users to only roll out updates once they have evaluated the impact on their compliance programme.

Context is also vital to understand the intent behind a communication, as it enables analysis of what was said around a flagged word.

Another crucial element is the ability to both include and exclude words depending on the context in which they were used. For example, if the word ‘virus’ is used in the same sentence as ‘Covid-19’, the system should not raise a flag, but if the word is used together with the name of a co-worker or in a phrase such as ‘he is a virus’ this is likely a conduct misdemeanour that should be investigated.
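The include and exclude logic can be sketched as a rule that looks at each sentence separately. This is a simplified illustration under stated assumptions: the `ContextRule` class and its fields are hypothetical, sentences are split naively on punctuation, and detecting a co-worker's name is left out.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ContextRule:
    """Flag a term unless an excluding word appears in the same sentence."""
    term: str
    exclude_if_present: set = field(default_factory=set)

    def flags(self, message):
        # Naive sentence split on ., ! and ? (an assumption for this sketch).
        for sentence in re.split(r"[.!?]+", message.lower()):
            words = set(re.findall(r"[a-z0-9-]+", sentence))
            if self.term in words and not (words & self.exclude_if_present):
                return True
        return False

rule = ContextRule("virus", exclude_if_present={"covid-19", "covid"})
```

Here `rule.flags("He caught the virus, it was Covid-19.")` returns `False`, while `rule.flags("Honestly, he is a virus.")` returns `True`. The exclusion only applies within a sentence, so a mention of Covid-19 elsewhere in the message does not suppress the flag.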

Of course, deciphering modern language and how it is used is no small feat. These are just some of the key areas where we hope to see advancements in lexicon technology.

Lexicons are, and will remain, important to help firms detect signs of wrongdoing – so the industry must focus on getting them right. By applying modern technology, the workings of lexicons can be improved significantly, balancing firms’ need to monitor a much wider range of search terms against the need for more accuracy and fewer false positives.
