Datactics is heading into 2022 with the release of additional solutions on its flagship self-service data quality platform, a strong business performance in 2021, and new hires and offices to match. To find out more about the company’s plans for 2022 and highlights of 2021, we caught up with CEO Stuart Harvey and head of software development and machine learning Fiona Browne.
“Through 2022, we will continue to help our customers leverage data quality to make smart business decisions using our machine learning (ML) platform with a human in the loop,” says Harvey. “We will also build on a trend we see in the market – CDOs have a range of platforms handling data governance, lineage, master data management, and quality. They need more interoperation between the platforms to deal with both data at rest and data in motion.”
Datactics has long experience in cleansing, matching and storing data at rest in a static repository. Moving on to data in motion, it is working with data lineage specialist Solidatus to measure the quality of data as it moves through an organisation by integrating data sets, lineage and quality measures.
Data quality rules library
As well as driving innovation through partnership, Datactics continues to invest in augmenting and automating its platform as far as possible. The platform is deterministic and rules-based, and incorporates ML to reduce manual processes and offer more options than traditional data management and quality solutions.
The company’s new data quality rules library and ML model can be used to profile data at the start of a data quality journey: the model examines a client’s data and identifies which rules from the library suit it. These rules can then be applied alongside the client’s own, more specific data quality rules. “These are bulk data quality rules for a first pass of the data,” says Browne. “Our rules are on the platform and additional rules can be added by the client.” The bulk data quality rules solution is being used in a proof of concept at a large financial institution.
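The article does not describe how the ML model decides which library rules suit a data set. The sketch below is a minimal illustration that assumes a simple profiling heuristic in place of the model: each hypothetical `Rule` carries an `applies_to` check run over a column’s values and a per-value `check`. Library rules that pass profiling are applied as a bulk first pass, and client-specific rules are appended per column. The rule names, the 80% threshold and the `first_pass` helper are illustrative assumptions, not Datactics’ implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rule structure; not the Datactics rules library format.
@dataclass
class Rule:
    name: str
    applies_to: Callable[[list], bool]  # profiling check: is the rule suitable for this column?
    check: Callable[[str], bool]        # per-value check: True means the value passes

def mostly(values, pred, threshold=0.8):
    """True if at least `threshold` of the non-blank values satisfy `pred`."""
    non_blank = [v for v in values if v.strip()]
    return bool(non_blank) and sum(pred(v) for v in non_blank) / len(non_blank) >= threshold

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_iso_date(v):
    return bool(ISO_DATE.fullmatch(v))

def is_email(v):
    return bool(EMAIL.fullmatch(v))

# A tiny stand-in for a library of generic, bulk rules.
LIBRARY = [
    Rule("not_blank", lambda vals: True, lambda v: bool(v.strip())),
    Rule("iso_date", lambda vals: mostly(vals, is_iso_date), is_iso_date),
    Rule("email_format", lambda vals: mostly(vals, lambda v: "@" in v), is_email),
]

def suggest_rules(values, library=LIBRARY):
    """Profile one column and return the library rules that look suitable for it."""
    return [rule for rule in library if rule.applies_to(values)]

def first_pass(table, client_rules=None):
    """Apply suggested library rules plus any client rules; return failing row indices."""
    client_rules = client_rules or {}
    failures = {}
    for column, values in table.items():
        for rule in suggest_rules(values) + client_rules.get(column, []):
            bad_rows = [i for i, v in enumerate(values) if not rule.check(v)]
            if bad_rows:
                failures[(column, rule.name)] = bad_rows
    return failures
```

For example, a column in which most values look like ISO dates is assigned the `iso_date` rule, and the one value that does not match is reported by its row index; a client could then add a stricter, column-specific `Rule` of its own.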
Datactics is also making more use of statistical ML models to find errors in data. Browne explains: “From a data profiling point of view, finding errors in data is not always easy, and when you do find them, they are often heterogeneous.” To solve the problem, Datactics is using co-occurrence analysis to calculate how often pairs of values from two columns of a table occur together. Pairs that occur frequently reveal a relationship in the data, for example London and UK. If there is a spelling mistake, such as Landon and UK, the pair rarely occurs, suggesting there is an error in the data.
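The co-occurrence idea can be sketched in a few lines. This is a minimal illustration, not Datactics’ model: it counts each (city, country) pair and flags a pair when it accounts for less than a hypothetical `min_share` (5% here) of all rows with the same country. The column names, the threshold and the share-based test are assumptions chosen for the example.

```python
from collections import Counter

def co_occurrence(rows, col_a, col_b):
    """Count how often each (col_a, col_b) value pair appears together."""
    return Counter((row[col_a], row[col_b]) for row in rows)

def suspicious_pairs(rows, col_a, col_b, min_share=0.05):
    """Flag pairs that account for less than `min_share` of the rows sharing their col_b value."""
    pairs = co_occurrence(rows, col_a, col_b)
    totals_b = Counter(row[col_b] for row in rows)
    return sorted(
        (a, b, count)
        for (a, b), count in pairs.items()
        if count / totals_b[b] < min_share
    )

# Usage: 60 London/UK rows, 30 Manchester/UK, 1 misspelt Landon/UK, 20 Paris/France.
rows = ([{"city": "London", "country": "UK"}] * 60
        + [{"city": "Manchester", "country": "UK"}] * 30
        + [{"city": "Landon", "country": "UK"}]
        + [{"city": "Paris", "country": "France"}] * 20)
print(suspicious_pairs(rows, "city", "country"))  # [('Landon', 'UK', 1)]
```

Using a share of the rows for each country, rather than a raw count, keeps large and small groups comparable. A genuinely rare but valid pair (a small town, say) would also be flagged, so the output is best treated as a list of candidates for human review.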
Co-occurrence capabilities will be rolled out as part of Datactics’ data profiling offering and have already been the subject of a proof of concept.
Break analysis service
The company’s new break analysis service is designed to give insight into data by showing the criticality of data breaks. It adds a dashboard to the Datactics data quality platform and uses the Neo4j graph database to show how data breaks are related. Remediation can then be prioritised, and data quality metrics and KPIs tracked over time.
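The article does not say how breaks are linked in Neo4j. As a rough, in-memory illustration (not a Neo4j or Cypher example), the sketch below assumes two breaks are related when they affect the same record, groups related breaks into connected components with a union-find, and ranks the groups by combined criticality so that remediation can start with the most critical cluster. The `Break` fields and the integer criticality scores are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Break:
    break_id: str
    record_id: str
    rule: str
    criticality: int  # hypothetical score: higher means more business-critical

def related_groups(breaks):
    """Group breaks that share a record, then rank groups by total criticality."""
    parent = {b.break_id: b.break_id for b in breaks}

    def find(x):
        # Follow parent links to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # Treat "same record" as an edge between breaks in the graph.
    by_record = defaultdict(list)
    for b in breaks:
        by_record[b.record_id].append(b.break_id)
    for ids in by_record.values():
        for other in ids[1:]:
            union(ids[0], other)

    groups = defaultdict(list)
    for b in breaks:
        groups[find(b.break_id)].append(b)
    # Highest combined criticality first, as a simple remediation priority.
    return sorted(groups.values(), key=lambda g: sum(b.criticality for b in g), reverse=True)
```

In a graph database the same idea would be expressed as nodes and relationships queried directly; the union-find here only shows the grouping and prioritisation logic.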
Datactics also started (and finished) its Rapid Match project in 2021. Funded by Innovate UK, the project was designed to produce a reproducible, generalised framework to address the complexity of integrating and matching data at scale. It integrated diverse data sets from the UK Office for National Statistics (ONS) and UK Companies House (CH) on the Datactics platform to provide a view of regional funding, sectors and the impact of Covid.
The project outcomes included a data quality pipeline for Companies House data. Browne says: “There are a lot of data quality issues in the data as there is the scope for free text entry.” To improve matters for the millions of firms that use Companies House data for KYC and AML checks, as well as for identifying ultimate beneficial ownership, Datactics has taken the data, which is public, and is producing a monthly report of statistics on the data quality of Companies House records. It has also created a cleansed version of the data that can be provided as a service. While the service has yet to be commercialised, Browne describes it as ‘a vehicle to show what our platform can do’.
2021 in review
“2021 was a good year in terms of customers,” says Harvey, noting the addition of five major clients using its self-service data quality platform. One of these took the company into the insurance sector for the first time, while the others added to its presence in the financial services and government markets across the US, UK and Asia.
Bolstering this growth and readying for 2022, the Belfast, Northern Ireland-based company added an office in London and a presence in New York City. It also hired an account manager and a partnership manager at headquarters, plus additional software developers and data scientists, bringing total headcount to over 60.
Acknowledging that it is increasingly difficult to find talent, Datactics also established a data engineering academy that started to recruit in 2021 and has already attracted a handful of trainees who will ‘earn while they learn’, and be offered jobs at Datactics when training is complete. Harvey comments: “At the moment, we have data engineering trainees. We will add software developers too as we look to build out a broad range of skills.”