By: Roland Bullivant, Silwood Technology
The EU’s new rules on data protection, enshrined in the General Data Protection Regulation (GDPR), come into force in May 2018. They fortify the rights of citizens over their own data and put more obligations on organisations of all sizes to manage and protect that personal data.
The UK Information Commissioner’s Office provides a great deal of useful information, including a 12-step guide to becoming compliant.
One of these steps, Number 2, suggests you ‘should document what personal data you hold, where it came from and who you share it with. You may need to organise an information audit across the organisation or within particular business areas’. One reason for this is the requirement to be able to support requests from data subjects. For example, if a customer wants to know what data is held about her, wishes to have data erased or corrected, or decides to withdraw consent to data being processed, it will be necessary to respond to those requests quickly and effectively.
In addition, if an organisation discovers it has inaccurate personal data that it has shared with other organisations, it will need to inform the other organisations about the inaccuracy, so they can amend their own records.
This is impossible unless the organisation knows what personal data it holds and where that data is located. This article discusses the methods organisations might employ to find where personal data exists in an ERP or CRM system, and an alternative, software-driven approach.
Needles in haystacks?
For many organisations, the primary mechanism used to record what personal data is held, and where, will be an enterprise data catalogue, data dictionary or data governance platform. This will provide data analysts with the information about personal data required to enable compliance with GDPR requirements on consent and the rights of data subjects.
The key ingredient for these platforms will be metadata from across the enterprise IT landscape. For many systems, finding that information is quite straightforward as the metadata or data model or source data is easy to locate and understand. In addition, there are many software products that can scan systems and deliver required information relatively easily.
However, if an organisation is running enterprise CRM or ERP applications from SAP, Oracle, Salesforce, Microsoft or others, finding metadata that relates to personal data will be more of a challenge, especially if its location is not already known.
This is because of the size, complexity and level of customisation of the data models (metadata) that underpin these systems. Also, in most cases, the metadata is very opaque because the database system catalogue provides nothing useful in the form of business names for tables and fields, and no information about table relationships. This means standard database tools or scanners will not deliver anything of value in the search for personal data.
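To make the point concrete, the following is a minimal sketch of what a generic scanner sees when it reads only the database system catalogue. It uses an in-memory SQLite database as a stand-in, and the table and column names (`ZT001`, `F01` and so on) are invented for illustration; they are not taken from any real ERP or CRM package.

```python
import sqlite3

# Stand-in for a packaged application's database: physical names only,
# with no business descriptions stored in the system catalogue.
# ZT001 and F01..F03 are hypothetical names used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZT001 (F01 TEXT, F02 TEXT, F03 TEXT)")


def catalogue_columns(connection):
    """Return (table, column) pairs as exposed by the system catalogue."""
    columns = []
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        for row in connection.execute(f'PRAGMA table_info("{table}")'):
            columns.append((table, row[1]))
    return columns


columns = catalogue_columns(conn)
# A naive scan for "birth" in the physical column names finds nothing,
# even if one of these columns actually holds a date of birth.
hits = [c for c in columns if "birth" in c[1].lower()]
print(columns)
print(hits)
```

The scan returns every column, but nothing in the catalogue says which of them, if any, holds personal data.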
What methods can you use to find personal data information?
As an example, consider the methods you might employ to locate all the tables that store a particular personal data attribute in an SAP system. In the example used for this article, ‘Date of Birth’ has been selected as the piece of personal data to be located. A base SAP ERP system has well over 90,000 tables and 900,000 fields. In practice, the data model is often much larger and made more complex by the number of customisations that have been implemented.
While SAP has by far the largest and most complex data model, other ERP and CRM packages also have significant numbers of tables and fields. For example, a typical Oracle eBusiness Suite system has over 22,000 tables and 500,000 fields. A standard Microsoft Dynamics AX system has over 7,000 tables and 100,000 fields. Even a large Salesforce implementation can have over 3,000 tables.
The first question to ask when referring to documentation relating to metadata is ‘does it exist?’. If it does exist, can you access it easily? Another potential problem is confirming whether it reflects any changes that have been made to the data model during the delivery project or subsequently.
If documentation is present it can provide a good starting point for metadata discovery. However, for large and complex ERP and CRM systems, trying to find individual instances of attributes that relate to personal data could present a problem. This is because the task of searching for each personal data item through documentation relating to tens of thousands of tables and hundreds of thousands of attributes is significant.
Finally, there is the additional task of ensuring the information is accurately recorded in the data catalogue, which could involve data being rekeyed or copied into the system.
Asking internal technical specialists
Your internal technical specialists will have access to whatever tools are provided for exploring the definitions of tables and field attributes within the system by each ERP and CRM vendor. It is likely they will also have good knowledge of the system and the particular way in which it has been customised to meet specific requirements.
They should then be able to go through a process to locate the personal data attributes for each table and record that information, perhaps in a spreadsheet or directly in the data catalogue tool.
One challenge with this approach is that searching and recording individual personal data attributes across large numbers of tables may not be supported by the tools provided. This would increase the amount of time it takes to achieve the task using this method. One result could be uncertainty as to whether all personal data attributes have been identified.
Engaging software vendors, staff or consultants
It may be necessary to engage the services of external consultants to achieve the same results as internal specialists. These could be application or GDPR specialists from a systems integrator or possibly from the supplier of the data catalogue or data dictionary software.
Depending on their experience and level of competence, they may be able to make use of whatever tools are available to them. Alternatively, they could provide some base templates that can provide the foundation for further exploration and comparison with the metadata as implemented.
Whichever approach is taken, some work will be needed for them to familiarise themselves with the particular changes that have been made to the underlying data model and that might be relevant in the context of GDPR. This familiarisation is likely to slow down the personal data discovery process.
It may be possible, if the metadata has been located using software tools or copied into a spreadsheet for example, to automate some of the processes for bringing personal data attributes into the data catalogue or glossary.
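As a rough illustration of that kind of automation, the sketch below reads a spreadsheet export (CSV) of table and field metadata, keeps the rows whose descriptions mention a personal data term, and shapes them into records that could be loaded into a catalogue. The column names, the sample rows, the list of terms and the output record layout are all assumptions made for this example; a real data catalogue will have its own import format.

```python
import csv
import io
import json

# Hypothetical list of personal data terms to look for in descriptions.
PERSONAL_DATA_TERMS = ("date of birth", "surname", "email")

# Hypothetical spreadsheet export; table and field names are invented.
SPREADSHEET_CSV = """table,field,description
ZTAB_EMP,F0012,Date of Birth
ZTAB_EMP,F0020,Cost centre
ZTAB_CUST,EML,Customer email address
"""


def to_catalogue_entries(csv_text, terms=PERSONAL_DATA_TERMS):
    """Turn metadata rows that mention a personal data term into records."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = row["description"].casefold()
        matched = [term for term in terms if term in description]
        if matched:
            entries.append(
                {
                    "asset": f'{row["table"]}.{row["field"]}',
                    "business_name": row["description"],
                    "tags": ["personal-data"] + matched,
                }
            )
    return entries


print(json.dumps(to_catalogue_entries(SPREADSHEET_CSV), indent=2))
```

Simple substring matching like this only works when meaningful descriptions are available in the spreadsheet, and the results would still need review by someone who knows the system.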
When all else fails, staff may resort to searching the internet for data models that they hope will describe the personal data held in the source ERP or CRM packages. This can be helpful, but it is best approached with a degree of caution. There is a risk that whatever is found may not represent what is in your own systems, so some work will be necessary to compare the two versions.
It may be possible to search for a list of all attributes in, say, an SAP system. However, with over 900,000 in a standard implementation, isolating those relevant for GDPR would be an extensive task.
Using dedicated metadata discovery software
Almost all data catalogue and data glossary software vendors have facilities to connect to source systems, locate and then import metadata into their platforms. This works really well if the source systems have relatively small amounts of metadata that is easy to find, understand and use.
Packaged ERP and CRM systems present a much more arduous challenge when it comes to accessing and making sense of their metadata. The size of their data models, combined with high levels of customisation, often impenetrable naming conventions, and a lack of meaningful information in the database itself, means traditional methods and non-specialist tools are of limited value.
There are very few software products that offer an alternative approach and provide data analysts with unique intelligence about the metadata in ERP and CRM packages. Typically, these products work by accessing or extracting rich metadata, as implemented, from where it resides in the application and then storing it in a repository. Often, the source of this metadata is the application’s data dictionary, although some applications maintain their metadata elsewhere.
Doing it this way means that customisations to the data model are automatically surfaced so that users can be confident they are working with accurate metadata. Importantly, these products provide logical as well as physical information about tables and attributes, and discover the relationships between tables. This means it is easier to search for and locate personal data attributes.
For example, without a discovery product that can search an entire SAP system for attributes described by the string ‘social security’, the analyst would be reduced to hoping that someone knows that the physical name for that attribute in the database is ‘CS04’. Similarly, without the specialist search facilities offered by these products, it would not be possible to identify quickly that a particular instance of SAP has 90 tables that contain a field described by the string ‘date of birth’.
Example showing how metadata discovery software can search for and find a list of SAP tables that contain one or more fields with the string ‘date of birth’
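The idea behind such a search can be sketched in a few lines. This assumes the metadata has already been extracted into a repository that holds a logical description alongside each physical table and field name. It is an illustration of the principle, not a description of how any particular product works, and the table names, field names and descriptions below are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    table: str        # physical table name (hypothetical)
    field: str        # physical field name (hypothetical)
    description: str  # logical description taken from the data dictionary


def tables_with_attribute(repository, phrase):
    """Return the tables with one or more fields described by the phrase."""
    needle = phrase.casefold()
    return sorted(
        {f.table for f in repository if needle in f.description.casefold()}
    )


# Invented sample repository; a real one would hold hundreds of thousands
# of fields extracted from the application's own metadata.
REPOSITORY = [
    FieldMeta("ZTAB_EMP", "F0012", "Date of Birth"),
    FieldMeta("ZTAB_EMP", "F0013", "Employee surname"),
    FieldMeta("ZTAB_CUST", "BDAT", "Customer date of birth"),
    FieldMeta("ZTAB_ORD", "ODAT", "Order date"),
]

print(tables_with_attribute(REPOSITORY, "date of birth"))
```

Because the search runs against logical descriptions rather than physical names, the analyst does not need to know in advance how the attribute is named in the database.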
Ideally, these metadata discovery products should allow the relevant personal data metadata to be shared with other software platforms, including data catalogues and data glossaries. In contrast to traditional, manual and more resource-hungry methods, this software-driven approach means the whole process can be accomplished much more quickly and accurately.
Using technology in this way to help discovery of ERP and CRM metadata will assist and accelerate the information gathering part of the GDPR compliance process and improve the confidence and trust the business can have in the data.