A-Team Insight Blogs

Opinion: Big Data Solutions to the Problem of Volume

6 December 2011

By Amir Halfon, Senior Director for Technology, Capital Markets, Oracle Financial Services

With the latest developments in the European debt crisis reverberating across the globe, the need to manage the large amounts of data related to risk exposures is more apparent than ever.

As I mentioned in my previous blog post, the ability to gain a holistic view of exposures and positions (requiring rapid, timely, and aggregated access to large amounts of financial data that are growing exponentially) is becoming paramount to financial institutions worldwide. The challenge that many firms are facing right now is how to keep up with the sheer volumes of data that are involved.

So, for this second installment, I’d like to focus on the seemingly most obvious of the “four Vs” of Big Data – volume – and talk about the technical patterns and approaches associated with processing very large amounts of data.

The most relevant strategy is, of course, parallelism. As an industry we have spent a lot of effort on parallelizing computation, but data parallelism remains a challenge and is the focal point of most current projects. It is also becoming apparent that in many cases compute grids are bottlenecking on data access, so the pattern of moving compute tasks to the data, rather than moving large amounts of data over the network, is becoming more and more prevalent.

Several technical approaches combine these strategies, parallelizing both data management and computation, while bringing compute tasks close to the data:

Engineered Machines

Engineered machines integrate software and hardware mechanisms, combining data and compute parallelization with partitioning, compression and a high-bandwidth backplane to provide very high throughput data processing capabilities. Some of them are actually able to delegate query and analytics execution to the nodes that hold the data, thus radically minimizing data movement. They do this by replacing the traditional SAN with intelligent storage nodes that can do much more than simple I/O operations, and which are connected to the compute nodes with a high-throughput fabric such as InfiniBand.

Integrated Analytics

Whether using engineered machines or not, the concept of performing analytics right on the data management system is a very powerful one, again following the philosophy of moving computation to the data rather than the other way around. Whether it’s ROLAP, MOLAP, predictive, or statistical analytics, today’s relational database management systems are capable of doing a lot of computation right where the data is stored. Some of them actually integrate their data parallelism mechanisms with the analytical engines, so that analytical tasks are parallelized along the same principles.
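To make the contrast concrete, here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in for a full analytical RDBMS (SQLite does not parallelize queries), and the positions table, its columns and the sample rows are invented for illustration. The first variant pulls every row to the client and aggregates there; the second asks the database to aggregate and returns only the totals.

```python
import sqlite3

# Hypothetical sample data: (desk, instrument, notional). Illustration only.
SAMPLE_POSITIONS = [
    ("rates", "swap_1", 100.0),
    ("rates", "swap_2", 200.0),
    ("fx", "fwd_1", 50.0),
    ("fx", "fwd_2", 25.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (desk TEXT, instrument TEXT, notional REAL)")
conn.executemany("INSERT INTO positions VALUES (?, ?, ?)", SAMPLE_POSITIONS)

# Variant 1: move the data to the computation.
# Every row crosses the connection and is summed on the client.
rows_moved = conn.execute("SELECT desk, notional FROM positions").fetchall()
client_totals = {}
for desk, notional in rows_moved:
    client_totals[desk] = client_totals.get(desk, 0.0) + notional

# Variant 2: move the computation to the data.
# The database aggregates and returns one row per desk.
result_rows = conn.execute(
    "SELECT desk, SUM(notional) FROM positions GROUP BY desk"
).fetchall()
db_totals = dict(result_rows)

print("same answer:", client_totals == db_totals)
print("rows returned:", len(rows_moved), "vs", len(result_rows))
```

Both variants produce the same totals; the difference is how many rows have to leave the database, which is the cost that grows with volume.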

The combination of high throughput analytics with engineered machines has enabled several financial firms to dramatically reduce the time it takes to run analytical workloads. Whether it’s EOD batch processing, on-demand risk calculation, or pricing and valuations, firms are able to do a lot more in much less time, directly affecting the business by enabling continuous, on-demand data processing.

Data Grids

Unlike compute grids, data grids focus on the challenge of data parallelism. Some of them also provide the ability to ship compute tasks to the nodes holding the data in memory, rather than sending data to compute nodes as most compute grids do. Again, this is based on the principle that it’s cheaper to ship a compute task than it is to move large amounts of data across the wire.

Several firms have been using data grids to aggregate market data as well as positions data across desks and geographies. Some go even further by continuously executing certain analytics right on the nodes where this data is held, achieving a real-time view of exposures, P&L and other calculated metrics.
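The idea of shipping a task to the nodes that hold the data can be sketched in a few lines of Python. This is a single-process toy, not the API of any real data grid product: GridNode, DataGrid, run_everywhere and the sample positions are all hypothetical names and values chosen for illustration. Each node runs the task against its own partition, and only the small partial results are combined.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Position = Tuple[str, str, float]  # (desk, counterparty, exposure)
Totals = Dict[str, float]


@dataclass
class GridNode:
    """Hypothetical grid member holding one partition of positions in memory."""
    positions: List[Position] = field(default_factory=list)

    def execute(self, task: Callable[[List[Position]], Totals]) -> Totals:
        # The task runs where the data lives; only its result leaves the node.
        return task(self.positions)


class DataGrid:
    def __init__(self, node_count: int) -> None:
        self.nodes = [GridNode() for _ in range(node_count)]

    def put(self, position: Position) -> None:
        # Simple deterministic partitioning on the counterparty name.
        index = sum(position[1].encode("utf-8")) % len(self.nodes)
        self.nodes[index].positions.append(position)

    def run_everywhere(self, task, combine):
        # Ship the task to every node, then merge the partial results.
        return combine([node.execute(task) for node in self.nodes])


def exposure_by_counterparty(positions: List[Position]) -> Totals:
    totals: Totals = defaultdict(float)
    for _desk, counterparty, exposure in positions:
        totals[counterparty] += exposure
    return dict(totals)


def merge_totals(partials: List[Totals]) -> Totals:
    merged: Totals = defaultdict(float)
    for partial in partials:
        for counterparty, exposure in partial.items():
            merged[counterparty] += exposure
    return dict(merged)


grid = DataGrid(node_count=3)
for position in [
    ("rates", "BANK_A", 100.0),
    ("fx", "BANK_B", 50.0),
    ("credit", "BANK_A", 25.0),
    ("fx", "BANK_C", 10.0),
]:
    grid.put(position)

exposures = grid.run_everywhere(exposure_by_counterparty, merge_totals)
print(exposures)
```

In a real grid the nodes would be separate processes or machines and the task would be serialized and sent over the network, but the shape of the computation (local partial aggregation followed by a small merge) stays the same.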

NoSQL

The concept of schema-less data management (which is what NoSQL is really all about) has been gaining momentum in recent years. At its core is the notion that developers can be more productive by circumventing the need for complex schema design during the development lifecycle of data-intensive applications, especially when the data lends itself to key-value modelling (e.g. time-series data).
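As a rough illustration of key-value modelling for time-series data, the Python sketch below stores each observation under a composite (symbol, timestamp) key and lets every record carry whatever fields it happens to have. TickStore and its methods are hypothetical names, the quotes are made-up values, and the whole thing is an in-memory stand-in rather than the interface of any particular NoSQL product.

```python
import bisect
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (symbol, ISO-8601 timestamp)


class TickStore:
    """Hypothetical schema-less store: sorted composite keys, free-form values."""

    def __init__(self) -> None:
        self._keys: List[Key] = []           # kept sorted for range scans
        self._values: Dict[Key, Dict[str, Any]] = {}

    def put(self, symbol: str, timestamp: str, record: Dict[str, Any]) -> None:
        key = (symbol, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = record           # no schema check: any fields allowed

    def scan(self, symbol: str, start: str, end: str):
        # ISO-8601 timestamps of equal precision sort lexicographically,
        # so a time window is a contiguous slice of the sorted keys.
        lo = bisect.bisect_left(self._keys, (symbol, start))
        hi = bisect.bisect_right(self._keys, (symbol, end))
        return [(key[1], self._values[key]) for key in self._keys[lo:hi]]


ticks = TickStore()
ticks.put("EURUSD", "2011-12-06T09:00:00", {"bid": 1.3401, "ask": 1.3403})
ticks.put("EURUSD", "2011-12-06T09:00:01", {"bid": 1.3402, "ask": 1.3404, "venue": "X"})
ticks.put("EURUSD", "2011-12-06T09:00:05", {"bid": 1.3400, "ask": 1.3402})
ticks.put("BUND", "2011-12-06T09:00:00", {"yield_pct": 2.2})

window = ticks.scan("EURUSD", "2011-12-06T09:00:00", "2011-12-06T09:00:01")
print(window)
```

Note that the second record carries an extra venue field and the BUND record has a different shape altogether; nothing had to be declared up front, which is the productivity argument made above.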

Despite being based on different principles, most of these technologies still follow a similar philosophy to data grids: they distribute the data horizontally across many nodes and model it in an object-oriented rather than a relational manner. They also enable the execution of compute tasks close to the data in order to minimize data movement over the network.
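Horizontal distribution usually comes down to a rule that maps each key to a node. The Python sketch below shows one simple rule, hash partitioning, under stated assumptions: ShardedStore and node_for are hypothetical names, the shards are plain in-process dictionaries standing in for separate machines, and the trade keys are invented.

```python
import hashlib
from typing import Any, Dict, List, Optional


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class ShardedStore:
    """Hypothetical store that spreads keys across several shards."""

    def __init__(self, node_count: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(node_count)]

    def put(self, key: str, value: Any) -> None:
        self.shards[node_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        # The same rule that placed the key tells us where to look for it.
        return self.shards[node_for(key, len(self.shards))].get(key)


sharded = ShardedStore(node_count=4)
for i in range(100):
    sharded.put(f"trade-{i}", {"id": i})

shard_sizes = [len(shard) for shard in sharded.shards]
print(shard_sizes)
print(sharded.get("trade-42"))
```

A plain modulo rule like this reshuffles most keys whenever the number of nodes changes, which is why production systems often use consistent hashing or a similar scheme instead.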

It is important to keep in mind that despite the name, NoSQL technologies are not necessarily antithetical to RDBMSs. In fact they become much more powerful when combined with traditional data warehousing and business intelligence tools. I therefore tend to view these technologies on a continuum rather than in dialectic opposition.

In future posts, I’ll delve into this topic in more detail, particularly in relation to Hadoop, which is quickly becoming a de facto standard, and continue the discussion of the “four Vs” of Big Data.
