About a-team Marketing Services
The knowledge platform for the financial technology industry

A-Team Insight Blogs

Tech Matters with Pete Harris: With Decision Latency, Performance Focus Turns Inward, and Parallel

As efforts to reduce data transmission latency begin to hit the speed of light barrier (the wireless networks being rolled out are pretty much the end game), attention is turning away from the speed of trade execution and towards the speed of trade construction and optimisation – decision latency as some call it. Addressing decision latency has moved the performance focus to within the data centre, and how to squeeze the maximum from servers, typically configured as tightly coupled clusters, running application code designed to take advantage of parallel processing.

To reduce decision latency, servers need to perform complex calculations and analytics – such as Monte Carlo simulations and vector arithmetic – as well as processes such as sorting and manipulating large datasets. As a result, there is a need to optimise both compute processing and data I/O.

Several technology approaches are currently gaining traction to improve compute and data management performance, including overclocking CPUs, tightly coupling CPU and storage, interconnecting servers with PCI Express, deploying solid state Flash storage on the main RAM bus, implementing in-memory data stores, and leveraging parallel processing capabilities within the server or cluster.

Parallelism focuses on processing logic that can be broken down into component parts, and on executing those components alongside one another rather than one after another (the latter is called serial processing). Not all logic lends itself to the parallel approach (simple processing of financial market data feeds is one example that does not), but much of the compute and data manipulation related to pre-trade analytics, trade construction and risk management is well suited to it.
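As a rough illustration of that decomposition, the sketch below splits a Monte Carlo valuation of a European call option into independent chunks and runs them either one after another or side by side. The function names, the option parameters and the use of Python's standard library are all assumptions made for this example and do not come from any product mentioned here. Threads are used only to keep the sketch self-contained; in CPython they would not deliver a real speedup for this kind of arithmetic, which is where native code, processes or co-processors come in.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, n_paths, spot=100.0, strike=105.0,
                   rate=0.01, vol=0.2, maturity=1.0):
    """Sum the discounted call payoffs of n_paths independent price paths.

    All market parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # private generator: chunks share no state
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += discount * max(terminal - strike, 0.0)
    return total


def price_serial(seeds, paths_per_chunk):
    """Run the chunks one after another and average the payoffs."""
    partials = [simulate_chunk(seed, paths_per_chunk) for seed in seeds]
    return sum(partials) / (len(seeds) * paths_per_chunk)


def price_parallel(seeds, paths_per_chunk, workers=4):
    """Run the same chunks concurrently; map() returns results in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda seed: simulate_chunk(seed, paths_per_chunk), seeds))
    return sum(partials) / (len(seeds) * paths_per_chunk)


if __name__ == "__main__":
    seeds = list(range(8))
    print("serial estimate:  ", price_serial(seeds, 2000))
    print("parallel estimate:", price_parallel(seeds, 2000))
```

Because each chunk has its own seed and touches no shared data, the two versions produce the same estimate. That independence is the property that makes this kind of workload a natural fit for many cores.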

When it comes to implementing parallelism at the hardware level, a couple of technology approaches have been most commonly adopted: multi-core CPUs from Intel or AMD, or Graphics Processing Units from Nvidia.

In recent times, Intel has invested heavily in multi-core CPUs, with its current “Ivy Bridge” generation of Xeon chips having as many as 15 x86 cores – with each core capable of executing two logic threads simultaneously. Intel has been working closely with developers of financial markets applications – including Kx Systems, Redline Trading Solutions and Tibco Software – to ensure that their offerings make best use of multi-core architectures.

Meanwhile, GPUs – or more correctly GPGPUs (for General-Purpose Graphics Processing Units) – are massively parallel by design and have been adopted for a number of applications, notably by the likes of BNP Paribas for derivatives pricing and JPMorgan Chase for risk management, and by analytics provider Hanweck Associates for calculating Greeks on options data feeds from a number of exchanges.

Tests conducted last year by Xcelerit, which has developed a toolkit to make it easier to write parallel financial applications, compared a single-core Intel Sandy Bridge CPU to one with eight cores and also to an Nvidia Tesla GPU co-processor card (with 2,688 cores) for Monte Carlo simulations. Compared to the single-core CPU, the multi-core configuration was 19 times as fast, and the GPU was 96 times as fast.

Importantly, Xcelerit’s test team noted that the boost sought from parallelism is highly dependent on the type of application being run. Monte Carlo calculations lend themselves to parallel processing, are heavily compute bound and require little memory access, so the gains are significant. As is the case when optimising most complex software code for a hardware platform, the devil is in the detail, and “your mileage may vary.”
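One common way to reason about why mileage varies is Amdahl's law, a textbook model that is not cited by Xcelerit or elsewhere in this piece. It assumes that a fixed fraction of a program's run time can be spread across workers while the rest stays serial. The sketch below applies it to a few made-up parallel fractions; the function name and the fractions are illustrative assumptions, and the worker counts of 8 and 240 simply echo the core and thread counts discussed in this article.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of run time that can be parallelised (0.0 to 1.0)
    workers: number of cores or threads sharing the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative fractions only: they are not measurements of any application.
    for fraction in (0.50, 0.95, 1.00):
        row = [round(amdahl_speedup(fraction, n), 2) for n in (8, 240)]
        print(f"parallel fraction {fraction:.2f}: speedup on 8 and 240 workers = {row}")
```

Under this model, code that is only half parallel can never run more than twice as fast however many cores are added, which is why workloads that are almost entirely parallel, such as Monte Carlo, show the largest gains. The model ignores vector units, caches and memory bandwidth, so it is a guide to the shape of the curve rather than a predictor of benchmark results.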

But multi-core CPUs and GPUs are just the beginning of the world of parallelism. Intel is now targeting financial markets applications with the many-core architecture – implemented in its Xeon Phi co-processor card – that it has already deployed in the world of scientific supercomputing.

Indeed, I first heard of Xeon Phi when it was deployed in significant numbers in Stampede, the latest supercomputer hosted by the University of Texas at Austin. The peak performance of Stampede, which was fully let loose in the Fall of 2013 with nearly 500,000 cores, is 8,520 teraflops, making it the seventh fastest supercomputer worldwide, according to the widely recognised “Top 500 Supercomputer” list. Moreover, Xeon Phi also underpins the Milky Way 2 machine at China’s National Supercomputer Centre. With more than three million cores peaking at 54,902 teraflops, it is currently the world’s fastest supercomputer.

The current version of Xeon Phi – codenamed Knights Corner – features 60 x86 cores, each of which can run four threads simultaneously. When Xcelerit ran its Monte Carlo test on this configuration, it performed 43 times faster than the single-core implementation. But there’s more to come when the Knights Landing version is released next year, with 72 cores, floating point and vector processing, and 384GB of on-board memory. It doesn’t take too much to figure out Intel’s trajectory.
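To put the quoted figures side by side, the short sketch below derives the hardware thread count of Knights Corner and the ratios between the three reported speedups. It uses only the numbers given in this article; the dictionary keys and the helper name are labels invented for the example. The ratios should be read as rough indicators, since the article does not say whether every run used identical settings.

```python
# Speedups over the single-core Sandy Bridge baseline, as quoted in the article.
reported_speedup = {
    "8-core Sandy Bridge CPU": 19,
    "Nvidia Tesla GPU": 96,
    "Xeon Phi (Knights Corner)": 43,
}

# Knights Corner: 60 cores, each running four threads simultaneously.
knights_corner_threads = 60 * 4


def relative_speedup(faster, slower):
    """How many times faster one configuration was than another."""
    return reported_speedup[faster] / reported_speedup[slower]


if __name__ == "__main__":
    print("Knights Corner hardware threads:", knights_corner_threads)
    pairs = [
        ("Nvidia Tesla GPU", "8-core Sandy Bridge CPU"),
        ("Xeon Phi (Knights Corner)", "8-core Sandy Bridge CPU"),
        ("Nvidia Tesla GPU", "Xeon Phi (Knights Corner)"),
    ]
    for faster, slower in pairs:
        ratio = relative_speedup(faster, slower)
        print(f"{faster} vs {slower}: {ratio:.2f}x")
```

On these figures, the co-processor card lands between the eight-core CPU and the GPU for this particular Monte Carlo workload.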

Unsurprisingly, Intel makes much of the compatibility of Xeon Phi with its mainstream Xeon processors, and the ease of programmability and portability of applications that comes with it. It all adds up, it says, to reduced development risk, cost and time of implementation.

Intel is also happy that Xcelerit has just released a version of its toolkit that supports Xeon Phi. “The Xeon Phi coprocessor can really deliver spectacular performance to clever programmers who make good use of its cores, caches and vector processing units,” says Intel principal engineer Robert Geva, adding: “This Xcelerit SDK is very welcome as it opens up Phi performance to programmers who don’t have that expertise.”

Meanwhile, advocates of GPU technology point to frameworks such as the Nvidia-backed CUDA, which provides parallel processing extensions to languages such as C, C++ and Fortran (as well as support for tools widely used in finance, like Matlab and Mathematica) that ease parallel application development.

Just as Intel has for some years squared off against FPGA co-processors for such tasks as low-latency data feed handling and order-book building, it now looks set to do battle with the GPU crowd when it comes to reducing decision latency for intelligent trading approaches. Along the way, there might also be skirmishes with other processor architectures, such as IBM’s POWER8, which supports 96 threads per chip – a battleground made more likely given Big Blue’s shift in focus away from x86-based systems.

Pete Harris is Principal of Lighthouse Partners, an Austin, TX-based consulting company that helps innovative technology companies with their marketing endeavors. www.lighthouse-partners.com.
