Data Management Insight Knowledge Hub
 
Data Lineage
In a nutshell: Data lineage traces data from source to destination, noting every move the data makes and taking into account any changes to the data during its journey for full traceability. It is critical to regulatory compliance and offers numerous business and operational benefits.
Read on in our Knowledge Hub ‘Everything you need to know’ section to understand the full details of what data lineage is all about, who it impacts, the key requirements, the technical and data challenges it presents, and the outlook.
You can also take a look at all the latest content we have related to data lineage. And you can see a listing of key vendors delivering solutions to this data and technological challenge.
                Key resources
Blogs

Cautious and Steady Adoption of Unstructured Data Capabilities Advocated by Experts in DMI Webinar
Financial institutions are taking a considered approach to integrating unstructured data into their systems, exercising caution as they get to grips with the mushrooming data format and the technology that is enabling generation of it. At the most recent A-Team Group Data Management Insight webinar, experts and audience members alike attested to the growing importance…

Rethinking Data Management in Financial Services: Virtualisation Over Static Storage
By Thomas McHugh, Co-Founder and Chief Executive, FINBOURNE Technology. In Financial Services (FS), data management has long been centred around traditional database storage. However, this approach is fundamentally misaligned with the nature of FS data, which is process-driven rather than static. The industry needs a shift in perspective – one that prioritises virtualisation over rigid…

Modern Data Landscape Comes Under Scrutiny at Data Management Summit London
From data products and marketplaces to the new challenges of regulatory compliance and the latest thinking on unstructured data, A-Team Group’s Data Management Summit London 2025 took in the full breadth of topics that chief data officers and their teams are dealing with daily. With a line up of C-suite executives and expert speakers from…
              	White papers

The Data Transformation Imperative: From Operational Burden to Strategic Advantage
A new report by SimCorp examines the investment management industry’s data opportunities – and challenges – in a fast-changing financial landscape, offering guidance on how companies can plot a course towards future-proofing their data capabilities and making immediate operational improvements. New data complexity challenges are presenting themselves to investment managers at an immense scale, the…

Valuing Hard-to-Price Securities in Volatile Markets
Valuing hard-to-price securities has long been a complex endeavour for asset managers and institutional investment firms, particularly for OTC instruments and illiquid assets. This challenge is amplified in volatile markets, where obtaining direct, observable prices becomes even more difficult, making accurate valuations a significant issue with serious consequences. To address this, the use of reliable,…

Nordic ESG Data Matures: Transformative Insights from Strategic Imperative to Practical Application
In the Nordic region, ESG has moved beyond high-level compliance. Financial institutions are now deeply involved in practical data management complexities, value creation, and integrating ESG into core investment decision-making processes. This white paper, based on an A-Team Group executive roundtable in Stockholm, supported by Rimes, offers insights into how leading Nordic buy-side firms are…
        Webinars

Recorded Webinar: Unlocking Transparency in Private Markets: Data-Driven Strategies in Asset Management
As asset managers continue to increase their allocations in private assets, the demand for greater transparency, risk oversight, and operational efficiency is growing rapidly. Managing private markets data presents its own set of unique challenges due to a lack of transparency, disparate sources and lack of standardization. Without reliable access, your firm may face inefficiencies,…

Recorded Webinar: End-to-End Lineage for Financial Services: The Missing Link for Both Compliance and AI Readiness
The importance of complete robust end-to-end data lineage in financial services and capital markets cannot be overstated. Without the ability to trace and verify data across its lifecycle, many critical workflows – from trade reconciliation to risk management – cannot be executed effectively. At the top of the list is regulatory compliance. Regulators demand a…

Recorded Webinar: Hearing from the Experts: AI Governance Best Practices
The rapid spread of artificial intelligence in the financial industry presents data teams with novel challenges. AI’s ability to harvest and utilize vast amounts of data has raised concerns about the privacy and security of sensitive proprietary data and the ethical and legal use of external information. Robust data governance frameworks provide the guardrails needed…
          Guides

Data Lineage Handbook
Data lineage has become a critical concern for data managers in capital markets as it is key to both regulatory compliance and business opportunity. The regulatory requirement for data lineage kicked in with BCBS 239 in 2016 and has since been extended to many other regulations that oblige firms to provide transparency and a data…

Regulatory Data Handbook 2025 – Thirteenth Edition
Welcome to the thirteenth edition of A-Team Group’s Regulatory Data Handbook, a unique and practical guide to capital markets regulation, regulatory change, and the data and data management requirements of compliance across Europe, the UK, US and Asia-Pacific. This year’s edition lands at a moment of accelerating regulatory divergence and intensifying data focused supervision. Inside,…

AI in Capital Markets: Practical Insight for a Transforming Industry – Free Handbook
AI is no longer on the horizon – it’s embedded in the infrastructure of modern capital markets. But separating real impact from inflated promises requires a grounded, practical understanding. The AI in Capital Markets Handbook 2025 provides exactly that. Designed for data-driven professionals across the trade life-cycle, compliance, infrastructure, and strategy, this handbook goes beyond…
Everything you need to know about: Data Governance & Lineage
What is data lineage?
Data lineage covers the lifecycle of data, from its origins, through to what happens to the data when it is processed by different systems, and where it moves from and to over time. It can be applied to most types of data and systems, and is particularly valuable in complex, high volume data environments. It is also a key element of data governance, providing an understanding of where data comes from, how systems process the data, how it is used and by whom.
The importance of data lineage has escalated in recent years as regulators demand full transparency and audit trails of the data behind all trading decisions.
But over time firms have come to understand the value and benefits it can deliver. Acceleration of automation has also advanced use cases. Beyond compliance, extensive data lineage can provide operational transparency and reduce risk and costs. From a business perspective, data lineage can improve data quality and allow the business to make better decisions and spot new business opportunities and strategies.
Data lineage is often represented visually to show the movement of data from source to destination, changes to the data and how it is transformed by processes or users as it moves from one system to another across an enterprise, and how it splits or converges after each move. Visualisation can demonstrate data lineage at different levels of granularity, perhaps at a high level providing data lineage that shows which systems data interacts with before it reaches its destination. As granularity increases, it becomes possible to provide detail around the particular data, such as its attributes and the quality of the data at specific points in the lineage.
By building a picture of how data flows through an organisation and is transformed from source to destination, it is possible to create complete audit trails of data points, an aspect of lineage that has become increasingly necessary to meeting regulatory requirements and ensuring data integrity for the business.
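As a minimal sketch of the idea above, lineage can be modelled as a directed graph in which each edge records a move and the transformation applied on that move. Walking the graph backwards from a destination yields the audit trail and the original sources. The system names, the edge list and the function names below are illustrative assumptions, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target, transformation applied on the move).
EDGES = [
    ("trade_capture", "trade_store", "load"),
    ("market_data_feed", "pricing_engine", "normalise prices"),
    ("trade_store", "pricing_engine", "join on instrument id"),
    ("pricing_engine", "risk_report", "aggregate exposure"),
]


def build_upstream_index(edges):
    """Map each target to the (source, transformation) pairs that feed it."""
    upstream = defaultdict(list)
    for source, target, transformation in edges:
        upstream[target].append((source, transformation))
    return upstream


def audit_trail(node, upstream):
    """Return every hop that contributes to `node`, walking back to the origins."""
    trail = []
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for source, transformation in upstream.get(current, []):
            trail.append((source, current, transformation))
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return trail


def origins(node, upstream):
    """Return the contributing systems that have no upstream feed of their own."""
    sources = {source for source, _, _ in audit_trail(node, upstream)}
    return sorted(s for s in sources if s not in upstream)


if __name__ == "__main__":
    index = build_upstream_index(EDGES)
    for hop in audit_trail("risk_report", index):
        print(hop)
    print(origins("risk_report", index))
```

Because data can converge from several sources, the trail is a list of hops rather than a single path. A finer-grained version could carry attribute names and quality measures on each edge.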
The necessary scope of data lineage can be determined by regulatory requirements, enterprise data management strategy, data impact and critical data elements. It is not necessary to boil the ocean – instead, best practice identifies regulatory requirements and business processes to which the application of data lineage is beneficial.
Who is involved in data lineage?
Reflecting the regulatory compliance and business use cases of data lineage, related job titles include:
- Business analyst
- Business intelligence developer
- Compliance officer
- Data analyst
- Data architect
- Data governance analyst
- Data modeller
- Data quality analyst
- Solutions architect
Regulations driving adoption
Data lineage was initially implemented by financial institutions to track data across individual data management projects. It rose to prominence and became part of the regulatory landscape following the implementation of BCBS 239 in January 2016, a Basel Committee on Banking Supervision (BCBS) rule designed to improve data aggregation and reporting across financial markets, as well as accountability for data.
These requirements were the early drivers of improved data lineage, which has since been reinforced by a number of regulations that require firms to implement lineage to demonstrate exactly how they came to the results published in regulatory reports. Data lineage allows firms to not only prove the validity of report entries, but also take a proactive approach to identifying and fixing any gaps in required data.
BCBS 239
Regulatory requirement: Basel Committee on Banking Supervision rule 239 (BCBS 239) came into force on January 1, 2016 and is designed to improve risk data aggregation and reporting. It is based on 14 principles that underpin accurate risk aggregation and reporting in normal times and times of crisis. To achieve compliance, banks must capture risk data across the organisation, establish consistent data taxonomies, and store data in a way that makes it easily accessible and straightforward to understand.
Data lineage response: Data lineage must be implemented to support risk aggregation, data accuracy and reporting, and conversely, to ensure risk data can be traced back to its origin and risk reports can be defended.
GDPR
Regulatory requirement: General Data Protection Regulation (GDPR) is an EU data privacy regulation that came into force on May 25, 2018. It is designed to harmonise data privacy laws across Europe and protect EU citizens’ data privacy. The requirements of GDPR include gaining explicit consent to process personal data, giving data subjects access to their personal data, ensuring data portability, notifying authorities and individuals of data breaches, and giving individuals the right to be forgotten.
Data lineage response: Firms subject to GDPR are dependent on data lineage to track data and provide transparency about where it is and how it is used. Data lineage provides firms with the ability to demonstrate compliance with the regulation and, from a data subject’s perspective, supports access to personal data and the execution of other rights such as the right to be forgotten.
MiFID II
Regulatory requirement: Markets in Financial Instruments Directive II (MiFID II) is a principles-based directive issued by the EU. It took effect on January 3, 2018, and aims to increase transparency across Europe’s financial markets and ensure investor protection. The demand for reference and market data for both pre- and post-trade transparency, including trade reporting and transaction reporting, is unprecedented, leading to data management challenges including sourcing required data, reporting in near real-time, and uploading reference and market data to MiFID II mechanisms including Approved Publication Arrangements (APAs) and Approved Reporting Mechanisms (ARMs).
Data lineage response: MiFID II operations can benefit from data lineage in a number of ways. Lineage can be used to identify any gaps in trade reporting data, and any similarities across numerous regulatory reporting obligations. It can also be used to map MiFID II reporting data from source systems to APAs and ARMs and vice versa.
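One way to picture the gap identification described above is to compare the fields a report requires with the fields that have a documented source in the lineage map. The sketch below assumes a simple dictionary from report field to source attribute. The field names and source names are hypothetical and far shorter than a real transaction report.

```python
# Hypothetical required fields; a real transaction report has many more.
REQUIRED_REPORT_FIELDS = {"instrument_id", "price", "quantity", "trade_time", "buyer_id"}

# Hypothetical lineage map: report field -> source system attribute.
FIELD_LINEAGE = {
    "instrument_id": "reference_data.isin",
    "price": "order_management.exec_price",
    "quantity": "order_management.exec_qty",
    "trade_time": "order_management.exec_timestamp",
}


def find_reporting_gaps(required_fields, field_lineage):
    """Return required fields that have no documented source, sorted by name."""
    return sorted(f for f in required_fields if not field_lineage.get(f))


if __name__ == "__main__":
    print(find_reporting_gaps(REQUIRED_REPORT_FIELDS, FIELD_LINEAGE))
```

The same comparison run across several reports would show which source attributes feed more than one obligation.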
CCAR
Regulatory requirement: The Comprehensive Capital Analysis and Review (CCAR) is an annual exercise carried out by the Federal Reserve to assess whether the largest bank holding companies (BHCs) operating in the US have sufficient capital to continue operations throughout times of economic and financial stress, and have robust, forward-looking capital planning processes that account for their unique risks. From a data management perspective, CCAR requires data sourcing, analytics and risk data aggregation for stress tests designed to assess the capital adequacy of BHCs and for regulatory reporting purposes.
Data lineage response: CCAR requires attribute level data lineage to track data from source to destination and ensure the validity and veracity of capital plans. Data lineage can also be used to identify any data gaps in reporting and highlight any data quality issues.
FRTB
Regulatory requirement: The Fundamental Review of the Trading Book (FRTB) regulation was originally scheduled to take effect in 2022, although the Basel Committee subsequently deferred its implementation date to January 2023 and national timetables vary. It is a response to the 2008 financial crisis, which exposed fundamental weaknesses in the design of the trading book regime, and focuses on a revised internal model approach to market risk and capital requirements, a revised standardised approach, a shift from value at risk to an expected shortfall measure of risk, incorporation of the risk of market illiquidity, and reduced scope for arbitrage between banking and trading books.
The data management challenges of the regulation are significant and include data sourcing, facilitating capital calculations, and gathering historical data as well as real price observations for executed trades or committed quotes to meet requirements around non-modellable risk factors (NMRFs) and the linked risk factor eligibility test.
Data lineage response: To satisfy the demands of FRTB, data lineage may be needed to track historical data and trade data aggregation required for the risk factor eligibility test of NMRFs, essentially the provision of at least 24 real price observations of the value of the risk factor over the previous 12 months.
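The observation count mentioned above can be illustrated with a short check. This is a simplified sketch that models only the criterion quoted in the text, namely at least 24 real price observations over the previous 12 months. It assumes one observation per date and a 365-day window, and it does not model any further conditions of the regulatory test. The function and parameter names are hypothetical.

```python
from datetime import date, timedelta

MIN_OBSERVATIONS = 24  # threshold quoted in the text above


def passes_observation_count(observation_dates, as_of, window_days=365):
    """Check whether enough distinct observation dates fall in the look-back window.

    Simplification: each date counts once and only the count is tested.
    """
    window_start = as_of - timedelta(days=window_days)
    in_window = {d for d in observation_dates if window_start < d <= as_of}
    return len(in_window) >= MIN_OBSERVATIONS


if __name__ == "__main__":
    as_of = date(2024, 12, 31)
    fortnightly = [as_of - timedelta(days=14 * i) for i in range(24)]
    print(passes_observation_count(fortnightly, as_of))
```

Lineage matters here because each counted observation has to be traceable to an executed trade or committed quote in a source system.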
Business use cases of data lineage
Beyond regulatory compliance, data lineage offers business benefits, but it must be approached as a long-term activity rather than a point solution if it is to provide ongoing value.
Among the business benefits of successful data lineage implementation are:
Understanding data: It may sound simple, but understanding data that is used and stored across an organisation can be very difficult when it includes masses of internal data, several sources of external data, data silos and data in different formats. By applying data lineage, it is possible to gain a greater understanding of the data a company holds, where it is, what it is used for, its value and potential. With a good understanding of data, it is also possible to assign responsibility for data ownership to individuals, departments or lines of business within the organisation.
Improved business decisions: By providing access to accurate, trusted data quickly and efficiently, data lineage allows the business to make smarter, faster and better informed decisions. Decisions can be made more proactively where there is data lineage and defended on the basis of being able to determine the exact data underlying a decision.
Identifying business opportunities: Using data lineage to gain a better understanding of data, and to visualise data and processes, can provide new business opportunities, such as the potential to create new products by combining certain data and processes, or the possibility of finding an external partner to upscale and commercialise specific datasets.
Data discovery: Data lineage provides the ability to decide what data is important and find the right data quickly. This is crucial to business decisions and can help firms remain competitive and identify new business opportunities.
Improved analytics: More reliable and better quality data that is understood and easily accessible supports improved analytics and the knock-on effect of better business decisions.
Increased efficiency: By eliminating duplicated data and redundant data and systems, and providing a clear view of data and how it changes and moves around an organisation, data lineage can provide increased operational efficiency that can support both cost reduction and business needs for fast access to trusted data.
Impact assessment: Data lineage can be used to study how changes to IT systems or business processes could affect specific products or reports downstream.
Cost reduction: Data lineage offers a number of ways to reduce costs. The need to review data across an organisation as a first step of data lineage allows firms to identify and delete any duplicated data, focus on data silos and decide their fate, and discover unused data that can be eradicated and redundant systems that can be switched off. This will optimise a firm’s data footprint and reduce the costs of data management.
Understanding data provides an opportunity to review licensed data, which may be licensed more than once in any one organisation or not used to any great extent, avoid the penalties of using unlicensed data, and renegotiate licenses with data vendors to make external data provision more cost effective.
Data lineage and data discovery can also support new projects at lower cost as some required data and processes can be identified and reused.
Business intelligence and change management: The ability of data lineage to expose an organisation’s data lends itself well to business intelligence and change management. What-if analyses can be made using existing data and processes, starter projects can be undertaken to predict outcomes of change, and favourable projects can be developed quickly using existing and new resources. Rather than calling on IT to build new systems from scratch, the business can discover how new commercial concepts could work before investing in systems.
Data ownership: By clarifying where data is, who uses it and what for, data lineage can allow data ownership to be handed over to relevant individuals, departments or lines of business that can best exploit the data.
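The impact assessment use case listed above can be sketched as a downstream traversal of the lineage graph: given a system that is about to change, list every system or report that consumes its data directly or indirectly. The flows and names below are hypothetical examples.

```python
from collections import defaultdict, deque

# Hypothetical data flows: (upstream system, downstream system or report).
FLOWS = [
    ("crm", "client_master"),
    ("client_master", "kyc_report"),
    ("client_master", "billing"),
    ("billing", "revenue_dashboard"),
    ("market_data_feed", "pricing_engine"),
]


def downstream_impact(changed, flows):
    """Return everything reachable downstream of `changed`, in breadth-first order."""
    children = defaultdict(list)
    for upstream, downstream in flows:
        children[upstream].append(downstream)

    impacted = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    print(downstream_impact("client_master", FLOWS))
```

Breadth-first order lists the nearest consumers first, which is often the order in which owners need to be consulted about a change.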
What are the challenges of implementing data lineage?
The challenges of implementing data lineage fall into three buckets – operations, technology and data management.
Operations
The operational challenges of data lineage start with winning management buy-in and funding for a solution that can be expensive, requires significant human input, and offers only a modicum of advantage in early implementation.
The best approach here is to educate management and start small. Decide whether a pilot project is going to provide insight into business opportunities or achieve an element of regulatory compliance, prioritise the most important and relevant data, scope the project carefully, and identify stakeholders that should be involved.
In the first instance, it may be useful to assess where required data comes from manually and create baseline data lineage before considering automation. It is also important to make sure the pilot project is scalable for other data sources or areas of the organisation before making a business case for lineage.
Proving the concept of data lineage and demonstrating quick wins to the business should, hopefully, be enough to start the journey towards a larger data lineage programme spanning part or all of the organisation.
Technology
The technology challenges of data lineage arise from growing numbers of regulations with overlapping requirements, and from smarter auditors and regulators asking for responses to questions on demand. Technology innovation adds to the challenge, with cloud-based applications and services, big data systems, machine learning, artificial intelligence and natural language processing technologies creating complex infrastructure. Data can be managed in new and interesting ways, but keeping track of it and ensuring it can be trusted is increasingly difficult.
At the heart of addressing these challenges is the selection of a solution, or solutions, to support an organisation’s data lineage. Questions to consider include:
- How much lineage is already in place?
- To what extent will manual lineage be necessary?
- How will lineage be documented?
- How will it need to be scaled?
- How will impact assessment be managed?
- What is the long-term aim for automation?
- Which areas of the organisation will be covered and at what level in terms of technical and business lineage?
- How will data lineage be sustained?
- What skills are required?
- How much will it cost?
There are no catch-all answers to these questions and few organisations will find answers to all of them in one solution, leading most firms to implement a combination of in-house systems and vendor solutions.
Whatever the selected solution, however, it will not provide value in isolation. It is important to consider how data lineage and its metadata will integrate with the rest of an organisation’s business metadata as this will provide rich data and the ability to slice and dice the data. Data lineage also needs to run alongside an organisation’s systems development lifecycle plan to ensure it is maintained as technologies are changed.
And, of course, scalable and flexible technology is essential, not only to master growing volumes of existing data types, but also to embrace additional datasets, alternative data, regulatory change and new regulatory requirements.
Data management
Implementing automated data lineage is a complex data management task that can involve huge volumes of data, multiple legacy systems, mountains of spreadsheets, siloed data, uncharted data flows and mixed data formats, as well as the creation of metadata to describe the data.
Early considerations include identifying all the data across an organisation, assessing data quality and bringing manual processes into an automated lineage framework wherever possible.
An inventory of data can start the process of identifying which data is important to the business and should be part of a data lineage programme, which data can be left as is, and which data can be scrapped. Challenges here include mining outsourced and black box data, which can be difficult, if not impossible, to capture.
As well as identifying data that can be scrapped, the initial data inventory can uncover redundant systems that can be switched off, reducing the operations burden and the cost of systems infrastructure.
As data lineage is built out, data quality must be constantly monitored to facilitate lineage that is fit for purpose. Data quality can be addressed separately to data lineage, perhaps using the concept of a ‘data quality firewall’ based on a data management platform that enforces data policies and ensures data quality controls are executed before data is input to systems. Alternatively, it can be addressed within a data lineage framework using rules, controls and alerts.
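The ‘data quality firewall’ concept described above can be illustrated as a gate that runs a set of rules against each record before it is passed on to downstream systems. This is a minimal sketch under assumed rules and field names. A real data management platform would express policies and controls in its own way.

```python
def not_missing(field):
    """Build a rule that fails when `field` is absent or empty."""
    def rule(record):
        return record.get(field) not in (None, "")
    rule.__name__ = f"not_missing({field})"
    return rule


def positive(field):
    """Build a rule that fails unless `field` holds a number greater than zero."""
    def rule(record):
        value = record.get(field)
        return isinstance(value, (int, float)) and value > 0
    rule.__name__ = f"positive({field})"
    return rule


# Hypothetical policy: every record needs an instrument id and a positive price.
RULES = [not_missing("instrument_id"), positive("price")]


def firewall(records, rules):
    """Split records into those passing every rule and those rejected, with reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = [rule.__name__ for rule in rules if not rule(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    sample = [
        {"instrument_id": "ABC", "price": 10.5},
        {"instrument_id": "", "price": 10.5},
        {"instrument_id": "XYZ", "price": -1},
    ]
    ok, bad = firewall(sample, RULES)
    print(len(ok), bad)
```

Keeping the failure reasons alongside each rejected record gives the alerts mentioned above something concrete to report.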
Technology solutions
While most data lineage projects start as in-house manual developments responding to a specific requirement, an increasingly regulated environment, growing volumes of data and the need to provide fast access to business data are driving automation, in many cases based on a combination of in-house and vendor solutions.
A typical data lineage automation solution includes functionality that captures and documents data flows, such as a flow of financial instruments, from the data source to its final destination, perhaps a regulatory or internal report. Drilldown functionality allows particular points in the lineage to be inspected more closely, while traceability and audit ensure it is possible to track a piece of data through its journey across an organisation and verify its accuracy. Filtering capabilities allow users to filter for different data categories, such as reference data or trade data, and understand the data’s lineage and attributes.
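The filtering and drilldown functions described above can be sketched with a flat list of lineage records, each tagged with a data category. The record structure, category labels and system names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    dataset: str
    category: str      # e.g. "reference" or "trade"
    source: str
    destination: str


# Hypothetical captured flows.
RECORDS = [
    LineageRecord("instrument_master", "reference", "vendor_feed", "security_master"),
    LineageRecord("executions", "trade", "order_management", "trade_store"),
    LineageRecord("executions", "trade", "trade_store", "transaction_report"),
]


def filter_by_category(records, category):
    """Return only the flows that carry the given data category."""
    return [r for r in records if r.category == category]


def drill_down(records, system):
    """Return the flows that enter or leave `system`."""
    return [r for r in records if system in (r.source, r.destination)]


if __name__ == "__main__":
    print(filter_by_category(RECORDS, "trade"))
    print(drill_down(RECORDS, "trade_store"))
```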
Another technology facet of data lineage is visualisation, which can provide a real-time view of data moving through processes and systems, improve the understanding of data, highlight any defects in data flows, and visualise the impact of any changes to data and systems. Documentation is managed dynamically to reflect these changes in lineage.
Automation can also capture business logic and/or metadata that can be stored in a repository and used to create source to target data lineage, eliminate duplicated or redundant data, and provide business and technical users with the ability to locate, understand, and manage information that supports business operations.
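To illustrate how metadata held in a repository might be turned into source to target lineage, the sketch below chains single-hop column mappings into an end-to-end path. It assumes each column is read from exactly one upstream column, which is a simplification, and all column names are hypothetical.

```python
# Hypothetical repository content: each entry maps a target column to the
# column it was read from, one hop at a time.
HOP_MAPPINGS = {
    "report.notional": "warehouse.notional_usd",
    "warehouse.notional_usd": "staging.notional",
    "staging.notional": "trading_system.trade_amount",
    "report.counterparty": "warehouse.cpty_name",
}


def source_to_target(target, hop_mappings):
    """Follow single-hop mappings back from `target`; return the path, source first."""
    path = [target]
    seen = {target}
    while path[-1] in hop_mappings:
        previous = hop_mappings[path[-1]]
        if previous in seen:
            raise ValueError(f"cycle detected at {previous}")
        seen.add(previous)
        path.append(previous)
    return list(reversed(path))


if __name__ == "__main__":
    print(" -> ".join(source_to_target("report.notional", HOP_MAPPINGS)))
```

A column with several upstream inputs would need a graph rather than a dictionary, but the principle of composing stored hops is the same.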
These types of automated solutions offer many benefits, including the ability to trace data errors, identify discrepancies, control access to information and model what would happen if a new process or department were added to the business. They can also reduce time spent on validating data accuracy and put trusted information in the hands of decision makers.
Vendor solutions provide these types of functionality. There may be slight differences in underlying technologies, scope and potential for automation, but the key difference between vendor solutions is delivery, with some vendors providing cloud-based solutions that can be up and running quickly, and others offering enterprise software solutions that need to be implemented and maintained in-house.
Outlook
Going forward, data lineage is likely to follow the steady flow of data, applications and analytics into the cloud. Extensive automation is expected to become the norm, bringing the goal of zero-gap data lineage within reach.
Vendor solutions
 
ASG Technologies
ASG Technologies Group provides more than 3,000 global organizations with a modern approach to Digital Transformation. ASG is the only solutions provider for both Information Management and IT Systems. ASG’s Information Management solutions enable companies to find, understand, govern and deliver information of any kind, from any source through its lifecycle. The IT Systems Management solutions empower companies to support digital initiatives, operate IT infrastructure more efficiently and reduce the cost of managing IT systems landscapes.
 
MarkLogic
MarkLogic is an operational and transactional Enterprise NoSQL database platform trusted by global organizations to integrate their most critical data. Designed to integrate data from silos better, faster, and with less cost, MarkLogic can help integrate data and build a 360-degree view up to four times faster than if using a traditional database.
- 3d innovations (3di) – Data lineage for data compliance and licensing solutions
- AxiomSL – Data capture and visualisation of data sources, data flows and business logic
- Bloomberg – Solutions based on the Financial Instrument Global Identifier (FIGI)
- Cambridge Semantics – Automatic capture of schema and statistical metadata describing data sources
- Collibra – Interactive data lineage diagrams
- Compact Solutions – Metadata integration platform providing data lineage
- Datum – Metadata management for use cases including General Data Protection Regulation (GDPR)
- Dremio – Data lineage to support analytics
- Erwin – Web-based solution mapping data elements to sources
- Global IDs – Data lineage layer that maps columns and tables to establish data flow
- IBM – Metadata based data and business lineage
- Informatica – Data lineage based on a machine learning enterprise data catalogue
- Manta – Documents data lineage as it crunches programming code and provides an interactive map
- Octopai – Automated cross-platform metadata management and data lineage
- Smartlogic – Data lineage based on a semantic AI platform
- Solidatus – Visualised data lineage based on metadata management
- Talend – Cloud-based open source and enterprise lineage solutions
- Trifacta – Data wrangling column-based solution