By Philip Brittan, Thomson Reuters
Although significant, the scale of data in financial services is dramatically smaller than it is for online consumer companies like Google or Amazon. For financial services, petabytes are huge; for those firms, they are a rounding error. Nevertheless, the need for real-time throughput makes financial services data a unique challenge, and there is a great deal to learn and gain from the big data world, not only in how we manage data at the enterprise level but also in how we present and manipulate that data through next-generation desktops such as Eikon.
You might say financial firms wrestle with “somewhat big data.” The difference is that much financial data is structured and numerical, which takes up relatively little space, whereas the data Google deals with involves vast quantities of unstructured text, video and geospatial imagery. That kind of data takes up a lot of space and requires massive computing power to process, which is why big data companies bring literally millions of server CPUs to bear on the problem. This has driven the development of technologies like MapReduce and Hadoop, which efficiently harness massively parallel server farms, and of databases like BigTable and Cassandra, which manage very large datasets seamlessly and with high performance across many servers.
Luckily for us, technologies like Hadoop are useful long before you reach exabytes of data. In financial services we can use them to tackle the data issues we do have, and, frankly, once a big data infrastructure is in place, it is very easy to create all kinds of related data with knock-on benefits. One example we have put into practice in Eikon is indexing the original data in multiple ways. This speeds up look-up and analysis at run time and opens up new ways of looking at the data; in the Eikon search function, it produces faster and more accurate auto-suggest results.
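As a rough illustration of indexing the same data in more than one way, the Python sketch below builds two prefix indexes over a small set of instrument records, one keyed on tickers and one on name words, and uses both to drive auto-suggest. The records, the `build_prefix_index` and `suggest` helpers and the ranking rule (ticker matches before name matches) are hypothetical assumptions for this example, not a description of how Eikon search actually works.

```python
from collections import defaultdict

# Hypothetical instrument records; fields and values are illustrative only.
INSTRUMENTS = [
    {"id": 1, "ticker": "VOD.L", "name": "Vodafone Group PLC"},
    {"id": 2, "ticker": "BP.L", "name": "BP PLC"},
    {"id": 3, "ticker": "BARC.L", "name": "Barclays PLC"},
]

def build_prefix_index(records, fields, max_prefix=10):
    """Map every prefix of every word in `fields` to the set of record ids containing it."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            # Treat "." as a word separator so "VOD.L" yields "vod" and "l".
            for word in rec[field].lower().replace(".", " ").split():
                for n in range(1, min(len(word), max_prefix) + 1):
                    index[word[:n]].add(rec["id"])
    return index

# Two derived indexes built once from the same source data.
TICKER_INDEX = build_prefix_index(INSTRUMENTS, ["ticker"])
NAME_INDEX = build_prefix_index(INSTRUMENTS, ["name"])

def suggest(query, limit=5):
    """Return instrument names matching the query, ticker matches ranked first."""
    q = query.lower().strip()
    by_id = {rec["id"]: rec for rec in INSTRUMENTS}
    ticker_hits = sorted(TICKER_INDEX.get(q, set()))
    name_hits = sorted(NAME_INDEX.get(q, set()) - set(ticker_hits))
    return [by_id[i]["name"] for i in (ticker_hits + name_hits)[:limit]]

print(suggest("b"))  # ['BP PLC', 'Barclays PLC']
```

The design choice is to pay the indexing cost once, at ingestion, so that each keystroke becomes a dictionary look-up rather than a scan of the whole dataset.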
Financial services also faces different issues that stem specifically from the structured nature of its data, which leaves room for technology firms to innovate along a different dimension of the big data space. The industry has “complex data” that supports a number of analytic use cases, including large matrix retrieval, complex time series analytics, aggregation and screening, and much of that data is hierarchical and nested. Simply throwing MapReduce at complex data of this kind does not necessarily help, because it would require massive denormalisation of the underlying data: the nested data all has to be laid out flat, which can lose information contained in the very structure of the original data. To get around this, we have developed a proprietary method for very fast retrieval of large amounts of complex data without transforming it at all, which keeps data ingestion efficient and preserves its semantic content.
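The information loss from denormalisation shows up even in a tiny hypothetical example. The Python sketch below flattens a nested company record with two independent repeated groups (business segments and broker estimates) into flat rows. The cross product repeats every segment’s revenue once per estimate, so a naive sum over the flat rows is wrong unless you reconstruct the grouping that flattening discarded. The record layout and the `flatten` helper are illustrative assumptions; this demonstrates the problem, not the proprietary method described above.

```python
from itertools import product

# Hypothetical nested record: one company with two independent repeated groups.
company = {
    "ticker": "XYZ",
    "segments": [{"name": "Retail", "revenue": 120.0},
                 {"name": "Wholesale", "revenue": 80.0}],
    "estimates": [{"broker": "A", "eps": 1.10},
                  {"broker": "B", "eps": 1.25},
                  {"broker": "C", "eps": 1.05}],
}

def flatten(rec):
    """Denormalise into flat rows: every segment paired with every estimate."""
    return [
        {"ticker": rec["ticker"], "segment": s["name"], "revenue": s["revenue"],
         "broker": e["broker"], "eps": e["eps"]}
        for s, e in product(rec["segments"], rec["estimates"])
    ]

rows = flatten(company)
nested_revenue = sum(s["revenue"] for s in company["segments"])
naive_flat_revenue = sum(r["revenue"] for r in rows)  # each segment counted 3 times
# Getting the right total back requires knowing the original grouping,
# e.g. de-duplicating on segment, which is the structure flattening threw away.
dedup_revenue = sum({r["segment"]: r["revenue"] for r in rows}.values())
print(len(rows), nested_revenue, naive_flat_revenue, dedup_revenue)  # 6 200.0 600.0 200.0
```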
For example, investors often look for growth stocks, and a common way to identify them is to compare each stock’s Compounded Growth Rate (CGR) over a period of time with the median for its industry, keeping only the stocks that exceed that median. The computation is intensive because the growth rate has to be calculated not only for every stock in the universe but also for the industry each one belongs to. Typically, this kind of analytic takes tens of seconds to compute. The Eikon data cloud, through its proprietary “vega effect”, decomposes the query into data retrieval, data-level analytics and application analytics, and applies speed and scale to each layer. The techniques applied include efficient data retrieval algorithms, vectorisation and parallel processing. Using these techniques, the same analytic now takes less than one second to compute.
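The screen itself can be sketched in a few lines of plain Python. Assuming CGR is computed in the usual way as (end / start) ** (1 / years) - 1, and that each industry’s benchmark is the median CGR of its member stocks, the sketch below computes every stock’s rate, derives the industry medians and keeps the stocks strictly above their own median. The universe, the metric values and the function names are hypothetical; a production system would vectorise and parallelise these steps rather than loop in Python.

```python
from collections import defaultdict
from statistics import median

# Hypothetical universe: ticker -> (industry, metric at start, metric at end).
# The metric (e.g. revenue or EPS) and all values are illustrative only.
UNIVERSE = {
    "AAA": ("Tech", 100.0, 161.05),
    "BBB": ("Tech", 100.0, 127.63),
    "CCC": ("Tech", 100.0, 140.26),
    "DDD": ("Energy", 50.0, 55.0),
    "EEE": ("Energy", 50.0, 72.0),
}
YEARS = 5

def cgr(start, end, years):
    """Compounded growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1.0 / years) - 1.0

def growth_screen(universe, years):
    # 1. Growth rate for every stock in the universe.
    rates = {t: cgr(s, e, years) for t, (_, s, e) in universe.items()}
    # 2. Median growth rate for each industry.
    by_industry = defaultdict(list)
    for t, (ind, _, _) in universe.items():
        by_industry[ind].append(rates[t])
    medians = {ind: median(rs) for ind, rs in by_industry.items()}
    # 3. Keep stocks whose growth rate strictly beats their industry median.
    return sorted(t for t, (ind, _, _) in universe.items() if rates[t] > medians[ind])

print(growth_screen(UNIVERSE, YEARS))  # ['AAA', 'EEE']
```

The three numbered steps loosely mirror the layering the text describes: retrieving per-stock values, computing per-stock analytics, then applying the cross-sectional comparison.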
Data management, storage and retrieval make up one aspect of big data. The other, equally important aspect is what becomes possible when you have large amounts of data. The more data you have, the more statistically significant the patterns in it become, which means you can use a variety of statistical methods to tease real insight out of all that data. Those statistical methods lie at the heart of Google’s search engine and Amazon’s recommendation engine. They also make it possible to cluster related news articles and let LinkedIn and Facebook suggest people you probably know. We are able to apply similar methods to Eikon usage data, for example to help customise Eikon search and navigation for a particular user and to study patterns in news readership. In this way, Eikon can continually learn and improve the results it returns, enriching the user experience on an ongoing basis.
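One of the simplest statistical methods behind “viewed this, also viewed that” features is co-occurrence counting. The Python sketch below counts how often pairs of items appear in the same session of a hypothetical usage log and recommends the items most often viewed alongside a given one. The session data, the tickers and the helper names are illustrative assumptions, and this is a basic stand-in technique, not the method Eikon, Amazon or LinkedIn actually use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage log: each entry is the set of instrument pages one user viewed in a session.
SESSIONS = [
    {"VOD.L", "BT.L", "ORAN.PA"},
    {"VOD.L", "BT.L"},
    {"VOD.L", "ORAN.PA", "TEF.MC"},
    {"BP.L", "SHEL.L"},
    {"BT.L", "VOD.L", "TEF.MC"},
]

def co_occurrence(sessions):
    """Count, for each ordered pair of items, how many sessions contain both."""
    counts = Counter()
    for items in sessions:
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, counts, k=3):
    """Return up to k items most often viewed alongside `item`, ties broken alphabetically."""
    related = [(other, n) for (a, other), n in counts.items() if a == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:k]]

counts = co_occurrence(SESSIONS)
print(recommend("VOD.L", counts))  # ['BT.L', 'ORAN.PA', 'TEF.MC']
```

Raw counts favour popular items; real systems typically normalise them (for example by each item’s overall frequency), which is where the larger data volumes the text mentions start to matter.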
Data gets really big when you start to look at unstructured text, which is the heart of the big data challenge. As we make more use of the big data infrastructure now available to us, we have been able to look beyond structured numerical data and analyse news stories. Applying statistical methods to that news text allows us to build things like our award-winning news sentiment engine.
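To make “applying statistical methods to news text” concrete, the sketch below trains a tiny multinomial naive Bayes classifier with Laplace smoothing on a handful of hypothetical labelled headlines and uses it to label new ones. Naive Bayes is a swapped-in textbook technique chosen for brevity; nothing in the text says the Thomson Reuters sentiment engine works this way, and the training data and function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled headlines; a real system would train on a large corpus.
TRAINING = [
    ("profit beats forecasts as sales surge", "positive"),
    ("shares jump after strong results", "positive"),
    ("upgrade lifts shares to record high", "positive"),
    ("profit warning sends shares lower", "negative"),
    ("shares slump on weak sales", "negative"),
    ("downgrade hits shares after losses widen", "negative"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Multinomial naive Bayes: per-class document counts and word counts."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior, using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("shares slump after profit warning", model))  # negative
```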
Another data challenge, one more specific to financial services, is real-time throughput. Companies like Google ingest extremely large amounts of data quickly and store time series of usage data, such as which ads were clicked on by which users at what time. But the end point for that data is generally a statistical analysis run over large volumes of it at once. With pricing data, what matters is getting the data from its original source to the end user with as little latency as possible, while storing it and possibly filtering, cleaning and analysing it along the way. One common use case in financial services is creating derived time series, where calculations have to run in real time in response to every incoming tick of the underlying data. This is fundamentally the same challenge faced by high-frequency traders, whose algorithms must make buy/sell decisions within tiny fractions of a second based on a mix of historical and incoming real-time data.
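Derived time series are usually kept cheap per tick by updating running state instead of recomputing over history. As a minimal sketch, assuming each tick carries a price and a positive size, the Python class below maintains a volume-weighted average price (VWAP) and an exponential moving average (EMA) in constant time per tick. The `Tick` and `DerivedSeries` names and the `alpha` smoothing parameter are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Tick:
    price: float
    size: float  # assumed positive

@dataclass
class DerivedSeries:
    """Derived values updated in O(1) per tick; alpha is a hypothetical EMA smoothing factor."""
    alpha: float = 0.5
    cum_pv: float = 0.0           # running sum of price * size
    cum_size: float = 0.0         # running sum of size
    ema: Optional[float] = None   # unset until the first tick arrives

    def on_tick(self, tick: Tick) -> Tuple[float, float]:
        # VWAP: cumulative price*size over cumulative size, with no re-scan of history.
        self.cum_pv += tick.price * tick.size
        self.cum_size += tick.size
        vwap = self.cum_pv / self.cum_size
        # EMA: seed with the first price, then blend each new price into the average.
        if self.ema is None:
            self.ema = tick.price
        else:
            self.ema = self.alpha * tick.price + (1 - self.alpha) * self.ema
        return vwap, self.ema

series = DerivedSeries(alpha=0.5)
for t in [Tick(100.0, 10.0), Tick(102.0, 30.0), Tick(101.0, 60.0)]:
    print(series.on_tick(t))
```

Because each update touches only a few running totals, the latency added per tick stays constant however long the history grows, which is the property the real-time use case needs.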
The financial services industry can only benefit from the innovations big data technology brings to how we manage, store, analyse and deliver market data. Going big gives us the breathing space to innovate further, enabling us to fully realise our vision for a next-generation desktop in Eikon and to harness the full power of capabilities like natural search to transform how we find and consume financial information.