A round-up of emerging technologies at this week’s A-Team Data Management Summit found industry experts favouring grid and big data technologies as they strive to architect the data management platforms of the future.
A panel discussion moderated by A-Team Group editor-in-chief Andrew Delaney looked first at big data, how it is perceived and managed, and how it can be used in situ, with computing power moving to the data, a reversal of the traditional data-to-compute approach.
Benjamin Stopford, a specialist in high performance computing technologies at RBS Global Banking and Markets, said: “There is a lot of hype around big data. The concept comes from Internet companies, their approach to data and the use of modelling tools for unstructured data.”
Emmanuel Lecerf, senior solution architect, financial services industry, at Platform Computing, an IBM company, added: “Every bank doubles data creation every day. The data includes unstructured data and we see that as big data.”
If big data means unstructured data to some, for others it has a wider meaning. As Rupert Brown, lead architect in the CTO office at UBS, put it: “Big data is the result of markets going electronic. It is not particularly unstructured data but many-structured data, such as document centric data, square data and graphic data. We used to talk about grid computing as the big compute, now we have big data to go with it. The issue now is how do we manage the data not the compute?”
Answering this question, Stopford said: “Collocating data and processing delivers great efficiencies, but it is not an easy task and it needs to be done on an application basis rather than in a centralised way.”
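The contrast between moving data to the compute and moving compute to the data can be illustrated with a minimal sketch. It is a toy model in a single process, not any panellist's implementation: the `DataNode` class, its methods and the "records moved" counter are hypothetical names introduced only for illustration.

```python
class DataNode:
    """Hypothetical node that holds one partition of the data."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def fetch_all(self):
        # Data-to-compute: ship every record back to the caller.
        return list(self._records)

    def run_local(self, func):
        # Compute-to-data: run func beside the data, return only its result.
        return func(self._records)


def total_data_to_compute(nodes):
    """Pull all records to one place, then sum them there."""
    total, items_moved = 0, 0
    for node in nodes:
        records = node.fetch_all()
        items_moved += len(records)
        total += sum(records)
    return total, items_moved


def total_compute_to_data(nodes):
    """Send the sum function to each node and combine the partial results."""
    partials = [node.run_local(sum) for node in nodes]
    return sum(partials), len(partials)
```

Both functions return the same total, but the second moves one partial result per node rather than every record, which is the efficiency that collocation aims for. The sketch ignores the application-specific partitioning that Stopford describes as the hard part.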
Considering the processing power of grid computing, Lecerf said: “Grid technology is now mature and many customers have implemented it, so it can be used to underpin big data. The need is to transform infrastructure and merge grids to analyse big data.” Touching on Platform’s Symphony product, which is based on service-oriented architecture grid computing, he commented: “We are looking at how to manage compute and data in the same product. It is key for customers and will allow them to do more with less.”
Discussion around managing large volumes of data quickly and efficiently covered not only grid computing and data location, but also the less well established open source Hadoop software framework and the MapReduce programming model it implements, which are used to write applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. Stopford noted RBS’s interest in Hadoop, which also featured in other presentations and workshops considering the challenges of supporting applications that are both compute and data intensive.
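The MapReduce programming model can be sketched in a few lines using the customary word count example. This is a single-process illustration of the map, shuffle and reduce steps and does not use the Hadoop API. The function names are assumptions chosen for readability, and a real framework would run the map and reduce calls in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across many nodes without coordination between them, which is what makes the model suit large clusters.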
Turning the discussion to cloud computing, Delaney asked the panellists if they were using or ready to use the technology. While neither of the panel’s users, UBS and RBS, has adopted cloud computing, both said the technology is of interest but that adoption is constrained by a lack of infrastructure architects in the market. As Brown commented: “We may need to poach a few from Amazon.”
Stopford suggested in-memory technology had not taken off, at least at RBS, and pointed in preference to data caching and virtualisation. Asked how he would build the bank’s data architecture if he could build it from scratch, he concluded: “I would start with a single data model that everyone would use – mapping is the root of all evil. Then there would be a mechanism to extend the model at the base layer and I would avoid too many mapping layers.”