By Mike Bennett, director of Hypercube
We are all of us wiser than we were earlier in the year. The credit crisis, at its heart, is a crisis of knowledge: if we knew the right things at the right time, it seems that fund managers might have made different decisions, regulators could regulated and so on. Everyone from Alan Greenspan downwards has said that if we had a better data we might have done things differently.
A number of forward looking firms have already been looking at ways to improve the management of data in their firms, working towards the goal of having a single representation of the truth.
But can you have truth without meaning?
The data we use starts its life with clear enough meanings: terms defined in legally binding contracts or term sheets, prices quoted on trading venues and so on. Somewhere along the line, the meaning seems to get eroded.
The industry has recognised this problem for some time, and a lot of time and effort has gone into standardisation efforts for market data messages, securities data models and so on. Somehow these didn’t deliver what the industry really wanted. Handing the problem to a technical team to create a message or a data model does not solve the problem, it merely gives you a message or a data model, making it more obvious that the problem of common terms and definitions is not a technical problem to begin with. What is really needed is a standardisation of the terms themselves.
For example, an ISO committee was set up to define a common securities data model. Having sat on that committee I can say that the business subject experts were of the highest calibre, and a large number of terms were defined, collected and put into the data model. However, as soon as the model was shown to people who had not sat around the table, the questions started rolling in. Some 70% of the questions were along the lines of: “what does this mean?”, “is this the same as one of those?” and “what does this complex model structure actually show?” The model was a mix of meaningful business knowledge and technical model design. For example there is a term ‘other instrument identifier’. Does this mean another identifier for the same instrument, or the identifier of some other (underlying) instrument? Or both?
This is the problem with trying to set up definitions for data elements or message fields. Sometimes a different word means the same thing; sometimes the same word means something completely different. Does redemption for a bond mean the same thing as redemption in a fund? If not, can the same word be used and have two meanings? A good database design will combine common terms wherever possible, such as redemption and coupon payment streams. Good design pushes out knowledge.
Realistically you can’t deal with meanings by focusing on words. You can get a group of business experts in a room to argue all day about the meaning of a word; what you really want to do is find out all the meanings they know about. What the industry really needs to capture is the semantics of the terms themselves not the words, message tags or database field names.
Can this be done or is it an impossible, philosophical sounding ideal? The technosphere is awash with new terms like semantic technology, OWL and web 3.0, which appear to offer an answer to the problem of capturing those elusive meanings?
Using these semantics tools should be approached with caution. Anybody can pick up a semantics tool and create a model that mixes meanings with design features. You can import a data model into a semantics format and it will still be the same data model with the same design content and the same ambiguities, but in a different language.
What needs to go in a semantics model is real world things, and the facts about each of those things. So for example a bond is a debt instrument, which is a contract. An interest payment is an event. These are the simple facts we need to capture before we design anything.
There are two important features to a semantics model: everything is defined as a kind of ‘thing’, and each thing has facts that distinguish it from another thing.
Having a given set of facts makes something a member of the set of all things that have that fact. So everything with a backbone is a vertebrate. If a given thing has all the facts relating to a collateralised debt obligation, then it is a collateralised debt obligation. If it walks like a duck, swims like a duck and quacks like a duck, it is a duck. Because this uses set theory, it does not require the business domain expert to understand any computing concept.
For example, if I ask you what a Wertpapier is, you will tell me two things about it: what sort of thing it is (a debt instrument) and what facts about it distinguish it from other debt instruments. It is these two mechanisms: defining where something sits in the universe of things, and what facts distinguish it, that allows a meaningful definition of the terms.
Creating a semantic model requires skill, imagination and insight. The real skill lies in managing to not design something. Creating a semantic model requires getting business experts together, but also facilitating what they do so that their knowledge is captured for all time in a format that deals with the semantic issues.
An example of this approach is the EDM Council Semantics Repository (www.hypercube.co.uk/edmcouncil). This defines everything as a ‘thing’, along with definitions and relations in a technology neutral semantics format. This will be peer reviewed by subject matter experts until it represents an industry consensus on the terms and definitions behind financial data. This will be an industry resource that anyone can refer to for technical development and integration.