A-Team Insight Blogs

The Cost of Dirty Data

6 November 2018


By Giles Nelson, Chief Technology Officer, Financial Services, MarkLogic

The cost of dirty data – data that is inaccurate, incomplete or inconsistent – is enormous. Earlier this year, Gartner reported that, on average, poor-quality data cost organisations $15 million in 2017. These findings were reinforced by MIT Sloan Management Review, which reported that dirty data costs the average business an astonishing 15% to 25% of revenue.

With global revenues of around $80 billion per year in investment banking alone, a loss of 15% to 25% would amount to roughly $12 billion to $20 billion a year. The cost of dirty data in financial services is astronomical. So, where does it come from and what can be done about it?

What’s the source?

Human error is a significant source. An Experian study found that human error influences over 60% of dirty data. When different departments enter related data into separate data silos without proper governance, downstream data warehouses, data marts and data lakes will be fouled. Records will be duplicated, often with variations such as misspelled names and addresses. Data silos with poor constraints will also lead to dates, account numbers or personal information being held in different formats, making them difficult or impossible to reconcile automatically.
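To make the format problem concrete, here is a minimal Python sketch of how values from different silos might be brought into one form before reconciliation. The list of date formats, the function names and the sample values are all assumptions made for illustration, not anything taken from a real system. Note that a string such as 06/11/2018 is ambiguous between day-first and month-first conventions, so a real pipeline would need to know which convention each source system uses.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats that different silos might use for the same date.
# Day-first is assumed for the slash and dot variants.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(raw: str) -> Optional[date]:
    """Return a date parsed from any known format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None  # unparseable values are flagged, not guessed


def normalise_account(raw: str) -> str:
    """Keep only letters and digits, upper-cased, so variants compare equal."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


if __name__ == "__main__":
    for value in ("2018-11-06", "06/11/2018", "06.11.2018", "Nov 6th"):
        print(repr(value), "->", normalise_date(value))
    print(normalise_account("gb-12 345"))
```

Returning None for an unrecognised value, rather than guessing, keeps the bad entry visible so that it can be routed for review.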

Further, once created, dirty data can remain hidden for years, which makes it even more difficult to detect and deal with when it is actually found. Most businesses only find out about dirty data when it’s reported by customers or prospects – a particularly poor way to track down and solve data issues.

And, still in 2018, dealing with print is an issue for many financial services firms. The scanning, marking up and import of printed documents is a recipe for the introduction of errors.

Many organisations search for inconsistent and inaccurate data using manual processes because their data is decentralised and in too many different systems. Harvard Business Review reports that analysts spend 50% of their time searching for data, correcting errors and seeking out confirmatory sources for data they don’t trust. These processes tend to fall into the same trap as the data – instead of consolidated processing, each department is responsible for its own data inaccuracies. While this may work in some instances, it also contributes to internal inconsistencies between department silos. The fix happens in one place, but not in another, which just leads to more data problems.

The impacts of dirty data

All of these issues result in enormous productivity losses and, perhaps worse, a systemic loss of confidence in the data being used to power the business. The estimates above of revenue loss because of poor data seem extraordinary, but even if they represent the upper limit of the true cost, the impact is still very significant.

In a highly regulated industry, such as financial services, dirty data has an even greater cost. Missing, incomplete and inaccurate data can lead to the wrong trade being made, to decisions taking even longer while further manual checks are carried out, and to regulatory breaches. MiFID II has, of course, placed significant extra burdens on financial firms to ensure their data is in order.

Cleaning up the mess

What can be done? Here are a few things that organisations having difficulty with dirty data should be thinking about:

Achieving one golden version of data has long been an objective. Be careful though – doing this for all the data in an organisation without setting the whole data estate in concrete is an impossible task.
Take a data-first approach, rather than a model-first one. Cleaning up dirty data involves removing invalid entries and duplicates, combining previously siloed records, and so on. The path to clean-up can be incremental. Taking the conventional approach and imposing a data model first, before doing anything with the data, leads to less flexibility and more cost.
Start building confidence in the data. Too often, data is present in isolation, with no knowledge of its provenance – when it was created, its source system and whether it’s been combined with other data. This metadata is valuable in proving a data item’s worth and actually preventing dirty data in the first place.
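The clean-up steps and the provenance idea above can be sketched together in a few lines of Python. This is an illustrative sketch under stated assumptions, not a description of any particular product: the Record class, the merge_silos function, the silo names and the sample rows are all hypothetical. It drops entries with no usable account number, merges duplicates that differ only in formatting, and records which source systems contributed to each merged record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    account: str
    name: str
    # Provenance: the names of the source systems this record came from.
    sources: List[str] = field(default_factory=list)


def merge_silos(silos: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Record]:
    """Combine (account, name) rows from several silos into one record per account."""
    merged: Dict[str, Record] = {}
    for source, rows in silos.items():
        for account, name in rows:
            # Normalise the key so formatting variants count as duplicates.
            key = "".join(ch for ch in account if ch.isalnum()).upper()
            if not key:
                continue  # invalid entry: no usable account number
            # The first name seen is kept; later silos only add provenance.
            record = merged.setdefault(key, Record(account=key, name=name.strip()))
            if source not in record.sources:
                record.sources.append(source)
    return merged


if __name__ == "__main__":
    # Hypothetical sample data from two departmental silos.
    silos = {
        "crm": [("gb-12 345", "Jane Smith"), ("", "Nobody")],
        "trading": [("GB12345", "Jane Smyth"), ("GB99999", "Ali Khan")],
    }
    for record in merge_silos(silos).values():
        print(record)
```

Keeping the first name seen is a deliberately simple rule. Because each merged record lists its sources, a conflicting spelling in a later silo can still be traced back and resolved in a later, incremental pass.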

In conclusion, it’s worth stopping dirty data from slowing you down. The business impact of dirty data is staggering, but an individual organisation can avoid the morass if it takes the right approach. Clean, reliable data makes the business more agile and responsive, and cuts down on wasted effort by data scientists and knowledge workers. And remember that potential 25% loss of revenue. It’s there to be clawed back.
