Data lineage is key to regulatory compliance and business development, yet it remains a significant challenge for many firms because they lack an understanding of their data. Other barriers to implementation include having too much data (a problem that can, ironically, be solved by data lineage), internal culture, lack of management buy-in, and budget shortfalls.
These challenges, identified in an audience poll taken during a recent A-Team Group webinar, are inevitably limiting data lineage adoption and use cases, but there are approaches and solutions that can help firms improve functionality from both a technology and a business perspective. The webinar speakers identified regulatory compliance as the primary use case of data lineage: essentially, having control of data and the ability to identify where it has come from, and when and how it has been changed en route.
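To make the idea of tracing where data has come from, and when and how it has changed, more concrete, here is a minimal Python sketch. It was not presented in the webinar; the `LineageStep` record, the `trace_upstream` function and the dataset names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LineageStep:
    """One hop in a lineage trail (hypothetical record layout)."""
    source: str           # dataset the data came from
    target: str           # dataset the data was written to
    transformation: str   # how the data was changed
    applied_at: datetime  # when the change was applied


def trace_upstream(dataset, steps):
    """Return every recorded step that feeds `dataset`, nearest hops first."""
    trail, seen, frontier = [], set(), [dataset]
    while frontier:
        current = frontier.pop(0)
        for step in steps:
            if step.target == current and step not in seen:
                seen.add(step)  # guards against cycles in the records
                trail.append(step)
                frontier.append(step.source)
    return trail


# Illustrative records only; names and timestamps are made up.
steps = [
    LineageStep("trades_raw", "trades_clean", "remove duplicates", datetime(2024, 1, 2, 6, 0)),
    LineageStep("trades_clean", "risk_report", "aggregate by desk", datetime(2024, 1, 2, 7, 0)),
    LineageStep("fx_rates", "risk_report", "convert to USD", datetime(2024, 1, 2, 7, 0)),
]

for step in trace_upstream("risk_report", steps):
    print(f"{step.target} <- {step.source}: {step.transformation} at {step.applied_at}")
```

A real implementation would typically harvest these records from ETL tools or query logs rather than write them by hand.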
With this level of lineage in place, it becomes possible to build out use cases such as gaining value from data, and developing new products. Nick Golovin, founder and CEO at Data Virtuality and a participant in the webinar, noted that there is consensus between IT and business that data lineage is essential, but a divide in approaches. He commented: “IT works mostly on data extraction for data lineage, and data granularity and quality. The business needs data lineage so that it can make better use of data and better decisions. It needs data lineage information and the data itself.”
In the case of data lineage, different approaches do not have to cause difficulties, provided, as the webinar speakers agreed, that it is a shared responsibility between IT, business and risk. Adding a comment on the importance of data lineage, Daniel Bertha, director, data steward at Sumitomo Mitsui Banking Corporation, said: “Data is the second most important asset after employees.”
So, how best to approach implementation? Ian Rowlands, director, product marketing at ASG Technologies, described the three Rs of getting the right data to the right people at the right time. He explained: “Data lineage is a programme, it is about building a data lineage factory. Think about scope and don’t boil the ocean. Make an inventory of data assets – you can’t manage what you don’t know you have and when you know what you have you can reduce volume by cutting out duplicate data and data that is not used. Then you need a business glossary tied to the underlying data, and to recognise different users’ points of view. Monitor for data change continually and make sure all stakeholders are aware of change. You can’t do this manually.”
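The inventory step Rowlands describes could be sketched along the following lines. This is an illustration under stated assumptions, not a method from the webinar: the `DataAsset` record, the asset names and the dates are hypothetical, and treating two assets with identical column sets as duplicates is a deliberately crude stand-in for real duplicate detection.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataAsset:
    """Hypothetical inventory entry for one dataset."""
    name: str
    columns: frozenset          # column names, used here as a crude fingerprint
    last_read: Optional[date]   # None means no read has been recorded


def find_duplicates(assets):
    """Group assets whose column sets are identical (assumed duplicates)."""
    groups = defaultdict(list)
    for asset in assets:
        groups[asset.columns].append(asset.name)
    return [names for names in groups.values() if len(names) > 1]


def find_unused(assets, cutoff):
    """List assets never read, or not read since `cutoff`."""
    return [a.name for a in assets if a.last_read is None or a.last_read < cutoff]


# Illustrative inventory only.
inventory = [
    DataAsset("crm.customers", frozenset({"id", "name", "country"}), date(2024, 5, 1)),
    DataAsset("staging.customers_copy", frozenset({"id", "name", "country"}), None),
    DataAsset("risk.exposures", frozenset({"trade_id", "exposure"}), date(2021, 1, 15)),
]

duplicates = find_duplicates(inventory)
unused = find_unused(inventory, cutoff=date(2023, 1, 1))
print("Possible duplicates:", duplicates)
print("Candidates for retirement:", unused)
```

Both lists are candidates for review by data owners rather than automatic deletion, since matching column names do not prove matching content.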
When the conversation moved on to automation of data lineage, an audience poll reflected the difficulty of moving away from manual processes: 37% of respondents said they have not automated lineage at all, 34% have automated to a limited extent, 17% to a reasonable extent, 6% to a great extent and a further 6% to the greatest extent possible.
While there is not, and is not likely to be, a single solution for complete automation, the webinar speakers encouraged practitioners working on data lineage to look at innovative technologies such as artificial intelligence (AI) and machine learning (ML) that can handle issues such as similarity analysis. Golovin commented: “I have seen proofs of concept using AI, ML and natural language processing (NLP) to bind business terms to technical data assets.”
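As a rough illustration of what similarity analysis between business terms and technical data assets can look like, the sketch below uses plain string similarity from Python's standard `difflib` module. It is far simpler than the AI, ML and NLP proofs of concept Golovin mentions and is not taken from them; the function names, the sample terms and columns, and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case a name and treat underscores and hyphens as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similarity(a: str, b: str) -> float:
    """String similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def bind_terms(terms, columns, threshold=0.6):
    """Map each business term to its most similar column, or None if too weak."""
    bindings = {}
    for term in terms:
        best = max(columns, key=lambda column: similarity(term, column))
        bindings[term] = best if similarity(term, best) >= threshold else None
    return bindings


# Hypothetical glossary terms and column names.
glossary_terms = ["Customer Name", "Trade Date", "Counterparty"]
technical_columns = ["cust_name", "trade_dt", "settlement_amount"]

bindings = bind_terms(glossary_terms, technical_columns)
for term, column in bindings.items():
    print(f"{term!r} -> {column!r}")
```

Terms left unbound, such as "Counterparty" in this sample, would be passed to a data steward for manual review, which is where more sophisticated techniques could reduce the workload.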
The webinar went on to discuss potential breaking points in data lineage and how they can be resolved, as well as the benefits of successful implementation. It concluded with final advice from the speakers, led by Rowlands, who said: “Lay the foundation, maintain it, explore use cases and find value.”
To find out more about how to approach and implement data lineage successfully, you can listen to a recording of the complete webinar here.