By Nick Jones, Senior Consultant, Citisoft
Have you ever tried to do a simple job around the house and spent several times longer looking for the right tools and materials than it took to actually do the job? Have you ever wanted to make a change to a system and been frustrated by the time and cost needed to check what impact (if any) the change would have on ‘upstream’ and/or ‘downstream’ systems?
The Problem with Data Integration Tools
There is a good selection of feature-rich ‘data management’ tools in the investment management market, each being sold on the promise that it will ‘generate consistency and coherence in all your data’. Indeed, the best integration tools excel in moving data from A to B in an organised and controlled manner.
However, the ensuing integration projects frequently fail to deliver full benefits, on schedule or on budget. When moving data from A to B expands to become moving some of it from B to C as well, and then some similar data has to be moved from D to C, and from D to E, and finally from C to F, there can be a proliferation of ad-hoc point-to-point data flows. Individually, the flows all make perfect sense, and individually they may be developed and deployed efficiently. But the task of managing the data becomes more and more challenging as the number of point-to-point interfaces grows.
This is not always the fault of the tools themselves. It is at least partly because the tools are being used incorrectly. What is often not appreciated (and what vendors have little incentive to highlight) is that data integration tools are not, in themselves, a sufficient data management solution, though they do form an essential element of a broader solution.
In an information-centric business, data is one of the main materials used to produce our products. And data integration tools can do a superb job of moving it around … so that we never quite know where we should be looking to find it, in the correct state, when we most need it! (Any analogy with domestic partners tidying things up should be studiously avoided in this context.)
So, in order to manage data effectively, and enable us to get the maximum benefit from it and from the products it supports, we need a data map, a form of indexing (or metadata) that tells us where we can find any particular piece of data in any specific state.
At its core, this data map, or metadata, is a logical model of the data we are using, expressed and described in business terms. As well as telling us what the data means, it can tell us where it is stored, how it reached its location, and where it will be used. As soon as this information is available, it begins to provide benefits: faster requirements definition, more consistent use of data, and easier change impact analysis.
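To make the change impact analysis point concrete, here is a minimal sketch in Python. It assumes the data map records each data flow as a simple (source, target) pair, and it reuses the A to F flows from the earlier example. The names FLOWS, downstream and upstream are illustrative only; they are not taken from any particular product.

```python
# Illustrative data map: each entry is one recorded flow (source system, target system).
# These are the flows from the earlier example: A->B, B->C, D->C, D->E, C->F.
FLOWS = [("A", "B"), ("B", "C"), ("D", "C"), ("D", "E"), ("C", "F")]


def downstream(system, flows):
    """Return every system that directly or indirectly receives data from `system`."""
    seen = set()
    stack = [system]
    while stack:
        current = stack.pop()
        for source, target in flows:
            if source == current and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def upstream(system, flows):
    """Return every system that directly or indirectly feeds data into `system`."""
    reversed_flows = [(target, source) for source, target in flows]
    return downstream(system, reversed_flows)


if __name__ == "__main__":
    # Which systems could be affected by a change to A, and what feeds C?
    print(sorted(downstream("A", FLOWS)))
    print(sorted(upstream("C", FLOWS)))
```

With the flows recorded in one place, the question of what a change to system A might affect becomes a lookup rather than an investigation. A real data map would hold this at the level of individual data items and their states, not just whole systems.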
Once the data from disparate sources can all be addressed as if it were in a single place, with known and consistent forms and structures, it becomes possible to standardise procedures for operating on items, such as moving them between locations or performing calculations and transformations. A data warehouse can be a good starting point for this metadata structure, if the data model is appropriate.
Once a structure is in place, it is feasible to use it to support automated generation of new, self-documenting data flows, including data interfaces and outbound reports. (Metadata-aware tools not only move the data, they record where they have put it too.)
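As a rough illustration of what "metadata-aware" can mean, the Python sketch below drives a data flow from a mapping between source fields and business terms, and records what it moved as it runs. Everything here is assumed for the purpose of the example: the LOGICAL_MODEL terms, the run_flow function, the field names and the catalogue structure are hypothetical, not a description of any specific tool or of the client implementation mentioned in this article.

```python
# Hypothetical logical model: business term -> plain-language description.
LOGICAL_MODEL = {
    "portfolio_code": "Identifier of the portfolio",
    "market_value": "Market value in the portfolio's base currency",
}


def run_flow(name, mapping, rows, target, catalogue):
    """Apply a source-field -> business-term mapping and record where the data went.

    `mapping` is the only thing that has to be written for a new flow.
    `catalogue` is a plain list acting as the metadata store in this sketch.
    """
    # Refuse mappings that use terms the logical model does not define.
    unknown = [term for term in mapping.values() if term not in LOGICAL_MODEL]
    if unknown:
        raise ValueError(f"Terms not in logical model: {unknown}")

    # Move the data: rename each source field to its business term.
    output = [{term: row[field] for field, term in mapping.items()} for row in rows]

    # Self-documentation: one catalogue entry per mapped field.
    for field, term in mapping.items():
        catalogue.append(
            {"flow": name, "source_field": field, "term": term, "target": target}
        )
    return output


if __name__ == "__main__":
    catalogue = []
    mapping = {"PF_ID": "portfolio_code", "MV": "market_value"}  # illustrative field names
    rows = [{"PF_ID": "P1", "MV": 100.0}]
    print(run_flow("client_x_inbound", mapping, rows, "warehouse", catalogue))
    print(catalogue)
```

The design point is that a new flow needs only a new mapping, and the record of where each item was placed is produced as a by-product of running the flow rather than written up afterwards.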
This is not just a theoretical possibility. A Citisoft client is already using this metadata-driven development approach, and has cut the development time for new client inbound data flows by 50-75%. A lengthy specification, development and testing cycle has been replaced by an iterative ‘map, run, check’ process, performed by (technically aware) Business Analysts. This allows much closer client and business involvement in the process, and much quicker identification (and resolution) of problems prior to deployment.
Cleaning Up: Enhanced Visibility and Usability
Extracting maximum value from data integration tools requires that they are used in the context of a logical data model. Creating this logical structure is simpler if there is a physical data model (e.g. a data warehouse) underpinning it.
With the metadata structure in place it is simpler to: co-ordinate multiple data integrations and extracts, ensure consistency of mapping, avoid multiple ad-hoc point-to-point data flows, and massively increase the visibility and usability of data.