The absence of a centralised data management strategy is the biggest hurdle to integrating data from different sources for use in artificial intelligence applications.
That was the finding of a survey of capital markets participants at a recent A-Team LIVE webinar “How to Organise, Integrate, and Structure Data for Successful AI”.
While expert speakers at the event agreed that a robust data infrastructure is foundational to achieving the cost savings and efficiencies that AI promises, half of attendees said in a poll that the lack of a centralised data management strategy was the single biggest challenge to achieving those goals. A quarter said that poor data quality and governance were their biggest obstacle.
The poll was held amid a deep-dive look at the opportunities, challenges and solutions that financial institutions are experiencing as they roll out AI applications.
The panel comprised Arijit Bhattacharya, Head of Data Governance at Northern Trust; Sumanda Basu, Head of Data Quality, Data Risks, Data Controls and Data Domain Lead at Société Générale; and Vahe Andonians, Chief Executive and Founder of Cognaize, which sponsored the event.
Data Quality
Other polls held during the webinar found that more than half of respondents regard clean, well-organised and accessible data as the most critical factor in the successful use of AI. Another found that almost nine in 10 respondents said custom-built benchmarks were the top requirement their organisations were looking for in vendor systems to manage private assets.
The webinar also heard that:
- Generic data quality metrics are no longer adequate for sophisticated AI applications; data must be accurate, complete and contextualised for machine consumption. Data for modern AI can be differentiated by five criteria: context, accessibility, governance for outcome, iterative learning and human expertise in the loop.
- Monolithic data management approaches should be replaced with highly differentiated and cross-functional strategies for managing diverse data types.
- Different AI technologies demand distinct data pipelines: Machine Learning (ML) relies on structured data; Natural Language Processing (NLP) operates on unstructured text; and Deep Learning (DL), used for complex media such as images or audio, requires very high annotation accuracy and substantial storage (a minimal routing sketch follows this list).
- The challenge of data lineage and explainability remains acute, especially where AI models drive critical decisions. Lineage shouldn't be regarded as a niche data function but as an operational tool (see the lineage sketch after this list).
- AI agents will dramatically change operating models because they possess a feature that humans lack: the ability to forget. Because agents are guided by context rather than exhaustive training, policy changes can be implemented instantly, in sharp contrast to the lengthy change management cycles required to update human-led processes (see the context sketch after this list). This capability fundamentally changes the economics of development by automating engineering and data science roles.
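The pipeline point above can be made concrete with a short sketch. The code below routes a data asset to a technology-specific preparation pipeline based on its modality; the class, function and step names are illustrative assumptions rather than any specific vendor's API.

```python
# Minimal sketch: routing data assets to technology-specific preparation
# pipelines. All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DataAsset:
    name: str
    modality: str  # "structured", "text", or "media"
    payload: object

def prepare_for_ml(asset: DataAsset) -> dict:
    # Structured data: validate the schema, handle gaps, engineer features.
    return {"pipeline": "ml", "asset": asset.name,
            "steps": ["schema_check", "impute", "feature_engineering"]}

def prepare_for_nlp(asset: DataAsset) -> dict:
    # Unstructured text: clean, tokenise and chunk for downstream models.
    return {"pipeline": "nlp", "asset": asset.name,
            "steps": ["clean", "tokenise", "chunk"]}

def prepare_for_dl(asset: DataAsset) -> dict:
    # Complex media: enforce annotation quality checks and plan for heavy storage.
    return {"pipeline": "dl", "asset": asset.name,
            "steps": ["annotation_qa", "resize", "archive_raw"]}

PIPELINES = {"structured": prepare_for_ml, "text": prepare_for_nlp, "media": prepare_for_dl}

def route(asset: DataAsset) -> dict:
    try:
        return PIPELINES[asset.modality](asset)
    except KeyError:
        raise ValueError(f"No pipeline registered for modality '{asset.modality}'")

if __name__ == "__main__":
    print(route(DataAsset("trade_history", "structured", payload=None)))
    print(route(DataAsset("loan_agreements", "text", payload=None)))
```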
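On the lineage point, one way to treat lineage as an operational tool rather than a niche function is to record it as part of every transformation. The sketch below logs input and output fingerprints for each pipeline step; the record structure is an assumption for illustration and does not reflect any particular lineage product.

```python
# Minimal sketch: capturing lineage alongside each transformation so an
# AI-driven output can be traced back to its inputs.
import hashlib
import json
import time

LINEAGE_LOG = []  # in practice this would be a durable lineage store

def fingerprint(obj) -> str:
    # Stable hash of the data as a lightweight stand-in for a dataset version id.
    return hashlib.sha256(json.dumps(obj, sort_keys=True, default=str).encode()).hexdigest()[:12]

def tracked(step_name):
    """Decorator that records input/output fingerprints for a pipeline step."""
    def wrap(fn):
        def inner(data):
            out = fn(data)
            LINEAGE_LOG.append({
                "step": step_name,
                "input": fingerprint(data),
                "output": fingerprint(out),
                "ts": time.time(),
            })
            return out
        return inner
    return wrap

@tracked("normalise_prices")
def normalise_prices(rows):
    return [{**r, "price": round(float(r["price"]), 2)} for r in rows]

if __name__ == "__main__":
    normalise_prices([{"isin": "XS0000000001", "price": "101.2345"}])
    print(json.dumps(LINEAGE_LOG, indent=2))
```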
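Finally, the "guided by context rather than exhaustive training" observation can be illustrated by injecting the current policy text into an agent's instructions at request time, so an edited policy takes effect on the very next call with no retraining cycle. The policy wording and the call_model stub below are hypothetical placeholders, not a real model API.

```python
# Minimal sketch of context-driven policy: the agent sees only the policy text
# supplied with each request, so a policy edit applies immediately.

POLICY_DOC = "Escalate any counterparty exposure above USD 50m to the risk desk."

def build_agent_instructions(task: str, policy_text: str) -> str:
    # The live policy is part of the context, not baked into model weights.
    return (
        "You are an operations agent for a capital markets firm.\n"
        f"Current policy (authoritative, supersedes prior guidance):\n{policy_text}\n\n"
        f"Task: {task}"
    )

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; there is nothing stale to "unlearn"
    # because the agent only ever acts on the context it is handed.
    return f"[model response guided by: {prompt[:60]}...]"

if __name__ == "__main__":
    print(call_model(build_agent_instructions("Review today's exposures", POLICY_DOC)))
    # A policy update is just new text in the context on the next request:
    POLICY_DOC = "Escalate any counterparty exposure above USD 25m to the risk desk."
    print(call_model(build_agent_instructions("Review today's exposures", POLICY_DOC)))
```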