A common theme at conferences and events in recent months has been data quality management. Alongside this, we have seen an emerging trend whereby data management is no longer regarded as a purely technical activity, and data itself is treated as a true business asset that carries business risks. Many firms are now positioning data management under operations or even directly at board level.
The Enterprise Data Management (EDM) Council (www.edmcouncil.org) has been at the forefront of this shift in emphasis. Managing data across the enterprise means applying a business understanding to data-related issues. Data quality management and the integration of new and legacy systems both require a technology-independent view of the business facts that the data represents.
The EDM Council has commissioned a ‘semantics repository’ to provide a business-facing, technology-independent view of financial data semantics. It has been developed for the Council by Hypercube, headed by the author.
Producing this has been a challenge. A business view of semantics requires two things: it must be expressive enough to represent real things, and it must provide views that can be validated and owned by the business.
To find a format expressive enough for real things and real facts, we looked to the work of the World Wide Web Consortium (W3C), which has developed the Web Ontology Language (OWL). OWL can define real kinds of ‘thing’ in any subject area and capture facts about those things. However, the tools that use it do not (yet) provide a suitably business-friendly view.
Our second challenge, then, was to come up with a format that business subject matter experts can validate. Currently, business folks are happy to use spreadsheets to record most of their requirements and semantics, sometimes with remarkably complex structures. We therefore laid out a spreadsheet structure which, to paraphrase Einstein, was as simple as possible but no simpler, designed so that its rows and columns correspond directly to features of the OWL language.
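To illustrate how spreadsheet rows can map directly onto OWL, here is a minimal Python sketch. The column layout (Term, Parent, Definition), the example terms and the `http://example.org/semantics#` namespace are all hypothetical and are not the repository's actual structure; the output uses the standard OWL and RDFS vocabulary (`owl:Class`, `rdfs:subClassOf`, `rdfs:comment`) in Turtle syntax.

```python
import csv
import io

# Hypothetical sheet layout: one row per kind of 'thing'.
# These columns are an illustrative assumption, not the repository's layout.
SHEET = """Term,Parent,Definition
DebtInstrument,FinancialInstrument,An instrument evidencing a debt obligation
Bond,DebtInstrument,A debt instrument with scheduled coupon payments
"""


def row_to_turtle(row: dict) -> str:
    """Map one spreadsheet row to OWL class statements in Turtle syntax."""
    lines = [f":{row['Term']} a owl:Class ;"]
    if row["Parent"]:
        # The Parent column becomes the taxonomic (subclass) link.
        lines.append(f"    rdfs:subClassOf :{row['Parent']} ;")
    # Escape double quotes so the definition is a valid Turtle string literal.
    definition = row["Definition"].replace('"', '\\"')
    lines.append(f'    rdfs:comment "{definition}" .')
    return "\n".join(lines)


def sheet_to_turtle(text: str) -> str:
    """Convert the whole sheet, prefixed with the namespace declarations."""
    prefixes = [
        "@prefix : <http://example.org/semantics#> .",  # hypothetical namespace
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    reader = csv.DictReader(io.StringIO(text))
    body = [row_to_turtle(row) for row in reader]
    return "\n\n".join(["\n".join(prefixes)] + body)


print(sheet_to_turtle(SHEET))
```

Because each column maps to exactly one OWL construct, a subject matter expert can review the sheet without reading OWL, while the generated statements remain machine-processable.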
The second format, which some business stakeholders are comfortable with, is the simple block diagram, of the kind one would draw in Visio. These diagrams typically show boxes and lines, with no underlying language to learn. A picture may paint a thousand words, but it has to be understood. We developed a format with boxes for kinds of ‘thing’, lines showing relationships between those things, and a taxonomic relationship whereby a kind of thing is shown as a specialised version of the thing above it: a bond is a debt instrument, just as a mammal is a vertebrate.
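The taxonomic lines in such a diagram amount to a chain of ‘is a’ links. The sketch below uses hypothetical class names and assumes a single parent per kind of thing (OWL itself permits several); it shows how following those links answers whether one kind of thing is a specialised version of another.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy fragment: each kind of 'thing' points to the
# more general kind above it; None marks the top of the hierarchy.
PARENT: Dict[str, Optional[str]] = {
    "FinancialInstrument": None,
    "DebtInstrument": "FinancialInstrument",
    "Bond": "DebtInstrument",
    "EquityInstrument": "FinancialInstrument",
    "Share": "EquityInstrument",
}


def ancestors(kind: str) -> List[str]:
    """Return every more general kind above `kind`, nearest first."""
    result = []
    parent = PARENT[kind]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def is_a(kind: str, general: str) -> bool:
    """True if `kind` is `general` itself or a specialised version of it."""
    return kind == general or general in ancestors(kind)


print(ancestors("Bond"))                 # ['DebtInstrument', 'FinancialInstrument']
print(is_a("Bond", "DebtInstrument"))    # True
print(is_a("Bond", "EquityInstrument"))  # False
```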
The EDM Council Semantics Repository produces both of these formats from a single underlying model. It shows kinds of thing in the securities world (financial instruments, contractual terms, payments and so on) and facts about them. Some of those facts relate one thing to another: for example, a traded security ‘has’ an issuer. Other facts are simple values such as names, dates and numbers, for instance the current value of a bond's coupon. Anything that is neither a thing nor a fact belongs to some technical design, and has no place in our semantics model.
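The distinction between facts that relate one thing to another and simpler facts can be sketched as follows. In OWL terms these correspond roughly to object properties and datatype properties. The `Thing` and `Fact` classes, the identifiers and the values are hypothetical illustrations, not part of the repository.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Thing:
    """An instance of some kind of 'thing', with a hypothetical identifier."""
    identifier: str
    kind: str


@dataclass(frozen=True)
class Fact:
    """A single fact about a thing: subject, predicate and value."""
    subject: Thing
    predicate: str
    value: object  # either another Thing or a simple value

    def is_relationship(self) -> bool:
        # A relationship fact points at another thing rather than a simple value.
        return isinstance(self.value, Thing)


issuer = Thing("ACME", "Issuer")
bond = Thing("ACME-5Y", "Bond")

facts = [
    Fact(bond, "hasIssuer", issuer),               # relates one thing to another
    Fact(bond, "couponRate", 0.045),               # simple number
    Fact(bond, "maturityDate", date(2030, 6, 1)),  # simple date
    Fact(bond, "name", "ACME 4.5% 2030"),          # simple name
]

for f in facts:
    label = "relationship" if f.is_relationship() else "simple fact"
    print(f"{f.predicate}: {label}")
```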
To get some content into this model, we reverse-engineered meaning from the various standards that the technology folks have produced over the years: MDDL for market data, the ISO securities data model (called FIBIM), the FpML derivatives message standard and others. The model was structured principally around the ISO 10962 Classification of Financial Instruments (CFI) standard.
Reverse engineering from the technical standards in this way was never going to produce a complete and canonical model of securities terms, but it provided enough material to get things started. The project follows established requirements management practice, whereby a first draft is presented to business subject matter experts as the basis for capturing their knowledge.
The first draft of the model has been completed. The EDM Council has given a number of demonstrations of it to the industry, and the response has been overwhelmingly positive. Some very useful early feedback has been incorporated into the next iteration of the model.
The EDM Council will host a series of online subject matter expert reviews starting in the New Year, each targeted at a specific area of knowledge. These will begin with common securities terms, followed by equities, debt, funds and business entities. As each part of the model is reviewed, that subject area will move from ‘draft’ status to being identified as a consensus view of industry knowledge in that area.
What is it useful for?
The initial brief was to pin down those elusive business terms and definitions. At its most basic, the repository could be used within a firm to replace ad hoc spreadsheets of data terms and definitions. This alone will introduce efficiencies and a level of control over technology development, and the repository can be adapted locally to define all the business semantics within a firm. Future changes to technology can then be driven from the business level down, rather than from the technology up.
However, the model is considerably more powerful than a set of tightly controlled spreadsheets. The repository is built in UML (the Unified Modeling Language), which can form the core of a model-driven development framework, both for new systems and for integrating new or acquired legacy systems. With a few simple transformations, the model can be used to produce future-proof data model designs.
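As a rough illustration of the kind of transformation meant here, the sketch below turns a tiny, hypothetical business model into relational table definitions: each kind of thing becomes a table, simple facts become columns and relationships become foreign keys. The model, the naming convention and the SQL type mapping are assumptions for illustration only; a real transformation from the repository's UML would also need to handle the taxonomy, optionality and cardinality.

```python
from datetime import date

# Hypothetical business model: for each kind of thing, its simple facts
# (name -> value type) and its relationships (name -> target kind).
MODEL = {
    "Issuer": {"facts": {"legal_name": str}, "relations": {}},
    "Bond": {
        "facts": {"coupon_rate": float, "maturity_date": date},
        "relations": {"issuer": "Issuer"},
    },
}

# Assumed mapping from value types to SQL column types.
SQL_TYPES = {str: "VARCHAR(255)", float: "DECIMAL(9, 6)", date: "DATE"}


def to_ddl(kind: str, spec: dict) -> str:
    """Generate a CREATE TABLE statement for one kind of thing."""
    table = kind.lower()
    cols = [f"  {table}_id INTEGER PRIMARY KEY"]
    for name, value_type in spec["facts"].items():
        # Simple facts become plain columns.
        cols.append(f"  {name} {SQL_TYPES[value_type]}")
    for name, target in spec["relations"].items():
        # Relationships become foreign keys to the target kind's table.
        ref = target.lower()
        cols.append(f"  {name}_id INTEGER REFERENCES {ref}({ref}_id)")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"


for kind, spec in MODEL.items():
    print(to_ddl(kind, spec))
```

Keeping the business model as the single source means a change to a business term can be regenerated into every downstream design, rather than patched by hand in each system.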
Participating in these reviews will help ensure that the repository represents a broad industry consensus and that the data in each participating firm is supported by it. We can also identify ways in which the repository can be used to gain the efficiencies and controls of requirements-driven data development. This will put people on the road to data quality nirvana.