By Mark Hermeling, Chief Technology Officer at Asset Control.
Within data management, data lineage is becoming an increasingly important capability. Being able to pinpoint the origins of data and follow how it evolves over time gives organisations across a wide range of sectors a clear view of their data, lets them track it exactly, and helps them manage regulatory compliance and financial reporting.
Increasingly, data lineage is becoming hard-wired into regulations and data quality frameworks – ultimately the need is for ‘explainability’. If a bank, for example, values a position at $25 million, it might need to explain why it is valued at that amount, how it came to that decision and what data points it used in arriving at the valuation. All this context, and more, may need to be tracked.
Corporate treasurers and finance staff have a major role to play in ensuring that the process of reporting and consolidating financial data is consistently transparent and explainable. They can also use data lineage and the related concept of explainability in other contexts. When changes are made to existing processes, for example, data lineage can be used in diagnostics to improve data quality. It can also be key to data licensing. If an organisation is licensing third-party content and therefore has to abide by associated restrictions, it will need to know what data is a derivative of another piece of data.
Finally, data lineage can be key to managing client records better and to taking greater care over where client data is moved and how it is used. After all, if a client invokes GDPR to demand that their data be expunged from the records, the organisation will need to know where all of that client’s details have ended up in order to comply.
Horizontal and vertical lineage
To do all this efficiently and well, organisations will need to implement two kinds of data lineage: horizontal and vertical. Horizontal data lineage traces a specific item of data as it moves from source to destination, typically across systems and reports. It keeps track of the data the business has consumed, where it subsequently went, and who touched it along the way. The objective is to be able to trace any item back upstream to its origin.
Vertical data lineage, in contrast, describes the transformations that happen to a piece of data on the journey. The data could be an element feeding into a calculation, one of the sources of a bond curve calculation, for example. The lineage in this case would be to ‘go back’ from the bond curve and see what individual bonds formed part of the input at a specific point in time.
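The idea of 'going back' from a derived value to its inputs can be sketched as a walk over recorded derivations. This is a minimal illustration only: the identifiers (bond_curve, bond_A_price, vendor_X_quote) and the helper names are hypothetical, and a production system would also record the point in time of each derivation.

```python
# Each derived item maps to the items it was directly computed from.
# All identifiers here are made up for illustration.
derived_from: dict[str, list[str]] = {}

def record_derivation(output: str, inputs: list[str]) -> None:
    """Note that 'output' was computed from the given inputs."""
    derived_from.setdefault(output, []).extend(inputs)

def trace_inputs(item: str) -> set[str]:
    """Return every item that 'item' depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

record_derivation("bond_curve", ["bond_A_price", "bond_B_price"])
record_derivation("bond_A_price", ["vendor_X_quote"])
```

Calling trace_inputs("bond_curve") then lists both bond prices and the vendor quote behind one of them. The same walk could flag which outputs are derivatives of licensed content.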
In short, horizontal data lineage traces data back to the original source, while vertical data lineage reverse-engineers the transformations that happen along the way, whether they are simple processes like cross-referencing, or tracking the different taxonomies that exist for financial instruments or industry classifications. Often, in order to compare like for like, an organisation might want to express an issuer or counterparty within the same taxonomy. So, for example, if one taxonomy labels a segment ‘IT’ and another calls the same segment ‘computer systems’, the organisation may need to ensure that the same label is used for both.
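The taxonomy example lends itself to a small cross-reference table. The sketch below assumes two hypothetical taxonomies, taxonomy_a and taxonomy_b, and an arbitrary common label; none of these names come from a real classification scheme.

```python
# Hypothetical cross-reference table:
# (source taxonomy, label in that taxonomy) -> common label.
CROSS_REFERENCE = {
    ("taxonomy_a", "IT"): "Information Technology",
    ("taxonomy_b", "computer systems"): "Information Technology",
}

def to_common_label(taxonomy: str, label: str) -> str:
    """Map a taxonomy-specific label to the common label.

    Raises KeyError for unmapped labels so that gaps in the
    cross-reference table are noticed rather than silently ignored.
    """
    try:
        return CROSS_REFERENCE[(taxonomy, label)]
    except KeyError:
        raise KeyError(f"No cross-reference for {label!r} in {taxonomy!r}") from None
```

With this table, an issuer classified as ‘IT’ in one source and ‘computer systems’ in another resolves to the same segment, which allows a like-for-like comparison.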
Meeting the challenge
The specific challenge in terms of an organisation’s ability to reverse engineer is that it will need to keep track of input data sources and their value at the time a transformation took place, all the calculation parameters that fed into the calculation and their value at the time it was done, and the algorithm that was used.
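One way to picture this is a record saved alongside every calculated value, holding the three things listed above: the inputs as they were, the parameters as they were, and the algorithm used. The sketch below is an assumption-laden illustration; the CalculationRecord structure, the average_price calculation and the version tag "average_price/v1" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CalculationRecord:
    output_name: str
    output_value: float
    inputs: dict            # input source -> value used at calculation time
    parameters: dict        # calculation parameter -> value used
    algorithm: str          # identifier and version of the logic applied
    calculated_at: datetime

def average_price(inputs: dict) -> float:
    """Stand-in calculation: the plain average of the input values."""
    return sum(inputs.values()) / len(inputs)

def run_with_lineage(name: str, inputs: dict, parameters: dict) -> CalculationRecord:
    """Run the calculation and capture everything needed to explain it later."""
    value = average_price(inputs)
    return CalculationRecord(
        output_name=name,
        output_value=value,
        inputs=dict(inputs),          # copy, so later changes do not alter the record
        parameters=dict(parameters),
        algorithm="average_price/v1",
        calculated_at=datetime.now(timezone.utc),
    )

record = run_with_lineage(
    "position_value",
    {"vendor_a": 100.0, "vendor_b": 102.0},
    {"currency": "USD"},
)
```

Because the record copies the inputs and parameters at calculation time, the valuation can still be explained after the underlying sources have moved on.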
To address the data lineage challenge, firms need bi-temporality, that is, a record of both the date a value applies to and the time it was recorded, so they know the value of the data and associated business logic at the time the calculation took place. They need to be able to track metadata and keep cross-reference tables between different taxonomies and classification schemes up to date, and they need a clear administrative process detailing who can access data, where it goes, where it came from, and what its sources were. They also need a clearly documented sourcing hierarchy, so that data sources can be inspected and everybody who needs the data can get access to it.
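Bi-temporality can be illustrated with a simple lookup that answers the question 'what did we believe the value for this date was, at the moment the calculation ran?'. The sketch below is a deliberately simplified model with hypothetical names and figures; it stores one business date per version rather than full validity intervals.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

@dataclass(frozen=True)
class PriceVersion:
    instrument: str
    valid_on: date          # business date the value applies to
    recorded_at: datetime   # when this version entered the system
    value: float

# Illustrative history: an initial price and a later correction for the same date.
HISTORY = [
    PriceVersion("bond_A", date(2024, 1, 31), datetime(2024, 1, 31, 18, 0), 99.5),
    PriceVersion("bond_A", date(2024, 1, 31), datetime(2024, 2, 2, 9, 0), 99.7),
]

def value_as_known(history, instrument: str, valid_on: date,
                   known_at: datetime) -> Optional[float]:
    """Return the value for 'valid_on' as it was known at 'known_at'."""
    candidates = [
        v for v in history
        if v.instrument == instrument
        and v.valid_on == valid_on
        and v.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # The most recently recorded version visible at 'known_at' wins.
    return max(candidates, key=lambda v: v.recorded_at).value
```

A calculation run on 1 February would have seen 99.5, while one run after the correction on 2 February would see 99.7. Keeping both versions is what makes the earlier result explainable.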
This is a complex undertaking, but organisations, and more specifically treasury departments, that understand the requirement and can put the right combination of processes and technology in place will be best placed both to meet data lineage requirements and to gain a competitive edge.