By Colin Gibson, Architecture & Data Management Consultant at Katano Limited.
What is the item you find hardest to locate in a supermarket? You know where to find it in your local store, but what confounds you when you try to find it in a different chain? What about Bovril Yeast Extract (other yeast extract brands are available)? Who decided to position it alongside the stocks and gravies while they put Marmite (see!) with the jams and spreads? This is a trivial and hopefully fun illustration of why, in the world of data, hierarchies should be approached with care.
Don’t get me wrong. Hierarchies are incredibly useful. Imagine trying to find your Bovril or Marmite if products were placed at the whim of each store manager. However, I would imagine most people involved with Data Management in any line of business can think of examples where hierarchies have also caused confusion or even conflict.
- Hierarchy – a group of people or things arranged in order of rank. I would add classifications to that…
- Classification – a division or category in a system that divides things into groups or types…
So a Classification Hierarchy is an ordered grouping of things and groups / types of things. And that is where the problem starts. How things are grouped together is seldom based on indisputable facts. Hierarchies of legal entities may be an exception, but even then, what do you do with 50:50 joint ventures? Classification Hierarchies are artificial. Useful, but artificial.
So what can go wrong?
Don’t assume that there must be only one hierarchy of the same class of data
Take how countries or states / counties are grouped into “regions”. Countries are real. States / counties are real. (And, yes, there generally is a fact-based hierarchy of States/Counties rolling up to Countries … although I am sure someone, somewhere can think of a dispute that may break that rule). But the way States are grouped for Sales Management purposes may be different to how they are grouped for Logistics Planning purposes. Using the wrong hierarchy for an intended purpose, or assuming that two purposes use the same hierarchy, will lead to confusion and issues.
In the world of Finance, I doubt if any two banks have an identical classification scheme for how financial products are grouped into “classes”. Convertible Bonds are an often-quoted example. Do these belong in the Equities business or the Debt business?
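To make the regional example concrete, here is a minimal Python sketch in which the same states roll up differently depending on the purpose. Everything in it is an assumption for illustration: the purpose names, the region labels, the groupings and the `region_for` helper are invented, not taken from any real organisation.

```python
# Illustrative data only: the groupings below are invented for this sketch.
# Each purpose gets its own state-to-region hierarchy.
REGION_HIERARCHIES = {
    "sales": {"Ohio": "Midwest", "Kentucky": "Midwest", "Tennessee": "South"},
    "logistics": {"Ohio": "Great Lakes", "Kentucky": "South", "Tennessee": "South"},
}


def region_for(state: str, purpose: str) -> str:
    """Return the region a state rolls up to for one named purpose."""
    if purpose not in REGION_HIERARCHIES:
        # Fail loudly rather than silently falling back to another hierarchy.
        raise ValueError(f"No hierarchy defined for purpose {purpose!r}")
    return REGION_HIERARCHIES[purpose][state]
```

With this data, `region_for("Kentucky", "sales")` gives "Midwest" while `region_for("Kentucky", "logistics")` gives "South". Forcing the caller to name the purpose is the point of the design: there is no default hierarchy to pick up by accident.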
Don’t assume that a piece of information can always be found at a consistent level in a hierarchy
An example here would be the level at which budgetary authority sits in an internal business hierarchy. Even if a hierarchy has uniform depth (i.e. it always has the same number of layers from the bottom to the top, as opposed to a ragged hierarchy which doesn’t) it might not be the case that budgetary authority, or any other type of authority, can be found n levels up from the bottom or m levels down from the top. Make sure your systems are not designed with that as a constraint.
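One way to avoid baking a fixed level into a system is to record authority as an attribute of the node itself and search upwards for it. The Python sketch below is only an illustration under that assumption: the `OrgUnit` class, the `holds_budget` flag and the unit names are hypothetical, and it assumes the parent links contain no cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgUnit:
    name: str
    parent: Optional["OrgUnit"] = None
    holds_budget: bool = False  # hypothetical flag marking budgetary authority


def budget_holder(unit: OrgUnit) -> Optional[OrgUnit]:
    """Walk up from `unit` and return the nearest unit flagged as holding budget."""
    current: Optional[OrgUnit] = unit
    while current is not None:
        if current.holds_budget:
            return current
        current = current.parent
    return None  # nobody above this unit holds budget


# Same depth on both branches, but authority sits at different levels.
group = OrgUnit("Group")
division_a = OrgUnit("Division A", parent=group, holds_budget=True)
team_a1 = OrgUnit("Team A1", parent=division_a)
division_b = OrgUnit("Division B", parent=group)
team_b1 = OrgUnit("Team B1", parent=division_b, holds_budget=True)
```

Here `budget_holder(team_a1)` returns Division A, one level up, while `budget_holder(team_b1)` returns Team B1 itself. A rule such as "authority is always one level above the leaf" would be wrong for the second branch even though both branches have the same depth.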
Don’t assume that all hierarchies with a similar name have consistent things at the bottom / leaf level
Industry Classification is probably the best example here. As someone once said, “The great thing about standards is there are so many to choose from”. That is certainly the case with Industry Classifications. Across the different standards there are differences in the groupings of groups. But even at the lowest level, the most granular, detailed groups that a business could be classified into are not consistent from one standard to the next.
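A practical consequence is that translating between two schemes at leaf level is rarely a one-to-one lookup. The Python sketch below shows one way to make that visible. Schemes "A" and "B", their codes and the `translate` helper are all invented for illustration and do not correspond to any real industry standard.

```python
# Hypothetical crosswalk from leaf codes in scheme A to leaf codes in scheme B.
# The codes are invented; a leaf may map to one, several or no target leaves.
CROSSWALK_A_TO_B = {
    "A-101": {"B-11"},          # clean one-to-one match
    "A-102": {"B-11", "B-12"},  # scheme B splits what scheme A keeps together
    "A-103": set(),             # no equivalent leaf in scheme B
}


def translate(code_a: str) -> tuple[list[str], bool]:
    """Return the candidate scheme-B leaves and whether the match is exact."""
    candidates = sorted(CROSSWALK_A_TO_B[code_a])
    return candidates, len(candidates) == 1
```

Returning the full candidate list together with an "exact" flag, rather than a single code, forces whoever consumes the result to decide what to do with ambiguous or missing matches.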
I am sure many seasoned data professionals will have their own favourite war story of where hierarchies have caused confusion. I would love to hear them.
Remember. Hierarchies are useful. But approach with care!
To hear more about this, sign up to attend our upcoming Data Management Summit, where Colin will be moderating a panel on ‘Are we talking the same language? Releasing the potential of your data with a business aligned data model’.