One partial solution to the new norm of constantly changing compliance is data lineage, the art of tracking all the internal and external data used by your organisation, where it’s come from, how it’s changed, and ultimately whether it can be trusted.
Enterprise data lineage tool vendors suggest that by automating this often manual process, financial institutions can benefit not just from ticking regulatory boxes, but also from more streamlined operations and opportunities to identify new business. But with reports of lengthy implementations and only partial success, some people in the financial world may take some convincing that this is not simply a costly chore of documenting data for regulatory purposes, and that it can bring business gain with the pain.
Reflecting this, a recent A-Team Group webinar asked delegates to rank the benefits of data lineage. The results showed 77% of delegates selecting a ‘better understanding of data’ and 74% ‘securing regulatory compliance’, way ahead of more exciting bottom-line benefits like ‘reducing costs’ (41%) and ‘new business opportunities’ (only 24% – ouch).
That said, these are relatively early days in turning data lineage to advantage. Simon Hankinson, global financial services market lead at data lineage tool vendor Collibra, says data lineage was initially seen as a defensive tool required to respond to regulation, although many banks are increasingly taking a proactive approach and using data lineage to source better data for decision making and allow the business to view data more strategically.
Philip Dutton, co-founder of Solidatus, agrees that regulation is a driver of data lineage, led by BCBS 239 and followed by regulations such as MiFID II, GDPR and CCAR, although lineage can also create a vastly improved understanding of data, particularly when it is visualised.
He also notes the criticality of data lineage as regtech takes hold, adding micro applications to systems infrastructure to the extent that financial institutions could run thousands of systems. He says: “If you want to build out the latest and greatest disruptive environment based on regtech, and you are also trying to make savings across the technology stack, data lineage is key.”
The data management challenges of data lineage include large volumes of data, a legacy environment, sourcing data from disparate systems, getting data into a uniform format and being able to extract useful information. That said, the potential benefits of lineage are driving uptake.
Take the case of American Fidelity Assurance, which worked with ASG Technologies to install a data lineage system in 2015 that would modernise its analytical processes. Before that, the firm had to manually or semi-automatically map its daily scheduler processes to capture data like predecessors, successors, job flows and other dependencies.
One project to map its daily schedule involved 500 jobs and over 100 daily schedules, and took well over 100 hours. The data lineage system now provides this type of information in minutes. If a job terminates abnormally, the company can quickly see what it was and trace it back to source. In another striking example of cost saving, the firm cut 464 instances of tax information down to 12.
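As a rough illustration of what tracing a job back to source involves, the sketch below models scheduler dependencies as a simple map of jobs to their direct predecessors and walks it upstream. The job names and the `PREDECESSORS` structure are hypothetical, invented for this example; they are not taken from American Fidelity's or ASG's systems, and a real lineage tool would harvest this metadata automatically rather than rely on a hand-written map.

```python
from collections import deque

# Hypothetical scheduler metadata: each job maps to its direct predecessors.
PREDECESSORS = {
    "load_policies": [],
    "load_claims": [],
    "merge_accounts": ["load_policies", "load_claims"],
    "tax_summary": ["merge_accounts"],
    "daily_report": ["tax_summary", "load_claims"],
}


def upstream_jobs(job, predecessors):
    """Return every job that feeds `job`, nearest first (breadth-first walk)."""
    seen = []
    queue = deque(predecessors.get(job, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(predecessors.get(current, []))
    return seen


def source_jobs(job, predecessors):
    """Return the upstream jobs that have no predecessors of their own."""
    return sorted(j for j in upstream_jobs(job, predecessors) if not predecessors.get(j))


def successors(job, predecessors):
    """Invert the map to list the jobs that directly depend on `job`."""
    return sorted(j for j, preds in predecessors.items() if job in preds)


if __name__ == "__main__":
    # Suppose "daily_report" terminated abnormally: what does it depend on?
    print(upstream_jobs("daily_report", PREDECESSORS))
    print(source_jobs("daily_report", PREDECESSORS))
    print(successors("load_claims", PREDECESSORS))
```

With the dependencies held as data, the question of what fed a failed job becomes a lookup rather than a manual mapping exercise, which is the kind of saving the firm describes.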
Mark Nance, American Fidelity’s chief data officer, likens the introduction of automated data lineage tracking to a first taste of ice cream. He says: “If someone has never really eaten great ice cream before and finally gets their first bite, it changes their entire outlook. I didn’t know how much I was missing until I had that first bite. I want more of that. Tracing data was the same experience.”
That is a good example of lineage at work, yet there’s no escaping the fact that many firms have a long way to go to achieve an automated solution. Speaking on the webinar noted above, Mike Smith, head of data strategy and governance for Citi Bank’s US consumer, commercial and mortgage group, says: “Financial institutions are still behind on the process of tracking data lineage. The increased reporting requirements around BCBS 239, KYC (Know Your Customer), AML (Anti-Money Laundering) and everything else are creating a tsunami of issues that require us to understand where data is traded, what happens when it goes into the data supply chain, and reporting requirements when it is put into a report.”
In response to the tsunami, Smith agrees with other expert speakers that an automated data lineage tool – like that used by American Fidelity – is the solution. He explains: “There is a lot of effort around trying to determine lineage on a manual basis, and it’s extremely costly and unsustainable. From a regulatory standpoint, it’s also hard to defend. Data lineage needs to be put into a tool so you’re able to scale it and support high volumes of lineage. If not, you’re just throwing a lot of money at it. Regulators want us to understand lineage faster than we can really react, so it’s hard to build that out in a manual way.”
These automated tools are not foolproof. As Smith says: “Everyone in an organisation today wants a silver bullet, they want one solution vendor to come in, pull the trigger and give you 100% definitive lineage – and that’s just not the real world. Our systems are too complex and have been around a long time, so it’s going to get you 85-90% of the way there and you’re going to do some manual stitching. Once the stitching is in there the data lineage will be sustainable.”
Tool vendors accept there are a number of potential issues to overcome in a data lineage project, such as:
- Project length and complexity
- Technology complexity
- Constantly changing regulatory requirements
- Poor understanding of lineage
- Lack of management buy-in
- High volumes of siloed data
- Problems mining outsourced and automated data feeds
- Managing and consolidating lineage across multiple systems
- Organising and sustaining data lineage so it remains useful to the organisation.
If those are some of the issues, what’s the best way to make data lineage work? The need is to prioritise your most important data and look for ‘quick wins’. Sue Habas, vice president of strategic technologies at ASG, advises: “Have a well-defined scope. Data lineage projects can quickly get out of hand.”
Hankinson agrees: “Take a top-down approach, there is a lot of metadata out there, not all of it useful. Don’t try and capture all of it. Focus on what actually matters to the business and regulators.”
Smith spells out what this approach means in practice: “You can’t do all the big problems all at once – you’ll never get it funded, you’ll never get your business case built. There are data issues everywhere, you’ve got to figure out where there’s a lot of pain in your organisation. Then take a small subset of that data, define what your business, functional and technical requirements are and from there evaluate what’s the best tool that’s going to help you scheme that lineage. Do a proof of concept (PoC), test it against your current manual processes, and evaluate the time savings it provides. You have to show some small wins and show how it leads to bigger wins.”
Bloomberg PolarLake application specialist Duncan Cooper – who made the quip about ‘death, taxes and regulation’ – points out that while organisations need quick wins, they actually benefit most from sticking with data lineage for the long haul. “This is a service, not a project,” he says. “For me it’s about changing the mindset. It’s not big bang, it’s a service.”
Habas agrees: “Once the lineage is complete, you don’t want to stop there, you want to industrialise the process. You’ll be surprised how much lineage changes week to week, month to month.”
The point these experts make is that automated data lineage on its own does not provide extra value to the business; that comes from sustaining lineage and using it to gain new abilities – to find the right data quickly and decide what data is important, to switch off unnecessary feeds and eliminate data duplication, and ultimately to hand data ownership to the relevant line of business that can best exploit it for financial gain.
As Hankinson says: “On its own, data lineage has a relatively low value. The benefit is in improving the reliability of data for the business. Think about the value of sustaining data lineage. If you make it easy by making a portal for the data and allowing the business to access reports that are now governed, there are significant operational efficiencies to the organisation. You can ingest separate data sources and combine them, and that’s an important part of making data lineage and data governance sustainable, and also valuable.”
In terms of regulatory benefits, he says: “If new regulations come in, there is significant overlap in the data being governed if it comes from a platform that cuts across silos. So instead of AML documentation being in Excel or in a SharePoint silo, and the CCAR team having its documentation on a different SharePoint site, you bring that together. Then, with new regulation you can establish what part of the data is already being governed and documented by another regulation. That’s a significant opportunity to reduce duplication of effort.”
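The overlap Hankinson describes can be pictured as a simple set comparison. The sketch below assumes a shared catalogue that records which data elements are already governed under each regulation; the `GOVERNED` catalogue, the element names and the `coverage_for` helper are all hypothetical, made up for illustration rather than drawn from Collibra or any other product.

```python
# Hypothetical shared catalogue: data elements already governed per regulation.
GOVERNED = {
    "AML": {"customer_id", "customer_name", "transaction_amount", "country"},
    "CCAR": {"customer_id", "loan_balance", "transaction_amount", "risk_rating"},
}


def coverage_for(new_requirements, governed):
    """Split a new regulation's data elements into already-governed and gaps.

    Returns (already, gaps): `already` maps each covered element to the
    regulations that document it; `gaps` lists elements nobody governs yet.
    """
    already = {}
    for element in new_requirements:
        regs = sorted(r for r, elements in governed.items() if element in elements)
        if regs:
            already[element] = regs
    gaps = sorted(set(new_requirements) - set(already))
    return already, gaps


if __name__ == "__main__":
    # Elements a hypothetical new regulation asks the firm to document.
    new_regulation = {"customer_id", "country", "consent_flag"}
    already, gaps = coverage_for(new_regulation, GOVERNED)
    print(already)
    print(gaps)
```

Only the elements returned as gaps would need fresh documentation; everything else is already covered by existing governance work, which is where the reduction in duplicated effort comes from.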
As organisations work to realise the benefits of data lineage, one positive highlighted by Habas is the improving level and accessibility of automated data lineage tools: “There are more systems integrators offering lineage as a service, and platforms are getting easier to implement.” Such advances will be well received by financial organisations seeking a solution to the death-and-taxes-like certainty of yet more regulation.
How to get the most out of data lineage
- Let your most important data – to the business and regulators – drive the project
- Initially, prioritise this data only, to get a quick win and business buy-in
- Use automated tools to capture data lineage
- Data lineage is an ongoing, not one-off, effort
- Create a sustained data lineage and governance framework to track changes to data
- Establish which departments will access data lineage and how they will use it.