Two benchmark tests from STAC (the Securities Technology Analysis Center), covering low-latency trade execution and analytics, illustrate how applications are using hardware advances to boost performance. The tests, on Redline Trading Solutions’ InRush market data platform and on Kx Systems’ kdb+ database, show two different approaches to exploiting features of Intel’s latest Xeon “Sandy Bridge” processors.
In the latest STAC-T1 tests, based on the proposed ‘tick-to-trade’ benchmark, Redline’s InRush 3 platform benefited from continuous optimisation for each new generation of Intel multi-core chips, but configuring the server to use Dell Processor Acceleration Technology (DPAT) also contributed to the low latency and low jitter observed.
DPAT is a free, optional BIOS firmware update, available only for selected Dell servers, that gives control over the Turbo Boost feature of Intel processors so that a higher clock frequency can be locked in for a set number of active cores. The higher frequency increases processing speed, reducing latency, and because the frequency stays constant, jitter is also reduced; under normal Turbo Boost operation, by contrast, the frequency varies dynamically with load and thermal conditions.
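A simplified model shows why a locked frequency helps. Assume, for illustration only, that a latency-critical code path takes a fixed number N of clock cycles, and ignore memory and I/O stalls, which do not scale with the core clock. If the frequency f can drift between f_min and f_max, the same path’s execution time varies by the fraction below; locking f at f_max removes that variation and also gives the shortest time.

```latex
t = \frac{N}{f}
\qquad\Rightarrow\qquad
\frac{t_{\max} - t_{\min}}{t_{\max}}
  = 1 - \frac{N/f_{\max}}{N/f_{\min}}
  = 1 - \frac{f_{\min}}{f_{\max}}
```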
Running on a Dell PowerEdge R720 server with 16 cores operating continuously at 3.3 GHz (boosted from a nominal 2.9 GHz), InRush 3 recorded a mean latency of 5.2 microseconds, with a standard deviation of 1.2 microseconds, while handling a simulated market data feed at eight times the normal message rate. An earlier STAC-T1 benchmark, conducted on an HP server running without Turbo Boost, delivered a mean latency of 6.1 microseconds, with a standard deviation of 1.4 microseconds.
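The reported figures can be compared with a few lines of arithmetic. This Python sketch uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Figures as reported above (microseconds, GHz).
DPAT_RUN = {"mean_us": 5.2, "stdev_us": 1.2}     # Dell R720, DPAT, 3.3 GHz locked
EARLIER_RUN = {"mean_us": 6.1, "stdev_us": 1.4}  # HP server, no Turbo Boost
NOMINAL_GHZ, LOCKED_GHZ = 2.9, 3.3


def relative_change(new, old):
    """Fractional change from old to new (negative means a reduction)."""
    return (new - old) / old


def coefficient_of_variation(run):
    """Standard deviation as a fraction of the mean: a unit-free jitter measure."""
    return run["stdev_us"] / run["mean_us"]


mean_change = relative_change(DPAT_RUN["mean_us"], EARLIER_RUN["mean_us"])
stdev_change = relative_change(DPAT_RUN["stdev_us"], EARLIER_RUN["stdev_us"])
clock_gain = LOCKED_GHZ / NOMINAL_GHZ

print(f"mean latency change:  {mean_change:+.1%}")   # -14.8%
print(f"std deviation change: {stdev_change:+.1%}")  # -14.3%
print(f"clock uplift:         {clock_gain:.3f}x")    # 1.138x
print(f"CV, DPAT run:         {coefficient_of_variation(DPAT_RUN):.3f}")     # 0.231
print(f"CV, earlier run:      {coefficient_of_variation(EARLIER_RUN):.3f}")  # 0.230
```

Because the two runs used different servers and configurations, this is not a controlled comparison. The arithmetic also shows that the standard deviation fell roughly in proportion to the mean, so the spread stayed at about 23% of the mean in both runs.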
In contrast to the boost Redline gained from simply enabling DPAT, Kx Systems put considerable effort into optimising release 3.1 of its kdb+ database to take advantage of parallel multi-core processing and of new instructions built into Intel’s latest processors.
STAC exercised kdb+ 3.1 using tests related to its STAC-M3 ‘time series analysis’ benchmark standard. Notably, the new release performed the NBBO test, which computes the daily national best bid and offer across U.S. exchanges for every symbol and involves heavy read/write activity, in just 32.6 seconds, down from the 5.2 minutes (312 seconds) that kdb+ 2.7 took in 2011 on a similar hardware stack: roughly a 9.6-fold improvement.
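The core of an NBBO calculation can be sketched as follows: keep each exchange’s latest quote per symbol and, after every update, take the highest bid and the lowest offer across exchanges. This is a minimal Python illustration with hypothetical symbols, exchange codes and prices; it ignores quote sizes, quote conditions and locked or crossed markets, and it is not how kdb+ implements the test.

```python
from collections import defaultdict
from typing import NamedTuple


class Quote(NamedTuple):
    time: int       # illustrative timestamp, e.g. ns since midnight
    symbol: str
    exchange: str
    bid: float
    ask: float


class NBBO(NamedTuple):
    time: int
    symbol: str
    best_bid: float
    best_ask: float


def compute_nbbo(quotes):
    """Return the NBBO for the affected symbol after every quote, in time order."""
    book = defaultdict(dict)  # symbol -> {exchange: (bid, ask)}, latest quote only
    out = []
    for q in sorted(quotes, key=lambda q: q.time):
        venues = book[q.symbol]
        venues[q.exchange] = (q.bid, q.ask)  # replace this venue's previous quote
        best_bid = max(bid for bid, _ in venues.values())
        best_ask = min(ask for _, ask in venues.values())
        out.append(NBBO(q.time, q.symbol, best_bid, best_ask))
    return out


# Hypothetical quotes from two made-up venues, EX1 and EX2.
sample = [
    Quote(1, "AAA", "EX1", 10.00, 10.05),
    Quote(2, "AAA", "EX2", 10.01, 10.06),
    Quote(3, "BBB", "EX1", 20.00, 20.10),
    Quote(4, "AAA", "EX1", 9.99, 10.04),
]

for row in compute_nbbo(sample):
    print(row)
```

After the fourth quote, AAA’s best bid (10.01) comes from EX2 while its best offer (10.04) comes from EX1, which is the point of the calculation: the national best prices can sit on different venues.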
Kx’s chief strategist Simon Garland notes that the NBBO test benefits particularly from multi-core processing, because each symbol can be processed independently and the work can therefore be spread across cores.
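Because no symbol’s NBBO depends on any other symbol’s quotes, the work partitions naturally by symbol. The sketch below (hypothetical data and function names, not Kx’s code) groups quotes by symbol and hands each group to a worker pool. It uses a thread pool for simplicity; in CPython, CPU-bound Python code only runs truly in parallel across processes, so a `ProcessPoolExecutor` can be passed instead (with the usual `if __name__ == "__main__":` guard).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical quote tuples: (time, symbol, exchange, bid, ask).


def nbbo_for_symbol(quotes):
    """NBBO after each update, for one symbol's time-ordered quotes."""
    latest = {}  # exchange -> (bid, ask)
    out = []
    for time, symbol, exchange, bid, ask in quotes:
        latest[exchange] = (bid, ask)
        out.append((time, symbol,
                    max(b for b, _ in latest.values()),
                    min(a for _, a in latest.values())))
    return out


def partition_by_symbol(quotes):
    """Group quotes by symbol, preserving time order within each group."""
    groups = defaultdict(list)
    for q in sorted(quotes, key=lambda q: q[0]):
        groups[q[1]].append(q)
    return groups


def parallel_nbbo(quotes, workers=4, executor_cls=ThreadPoolExecutor):
    """Compute each symbol's NBBO as an independent task."""
    groups = partition_by_symbol(quotes)
    with executor_cls(max_workers=workers) as pool:
        # Symbols share no state, so the tasks need no locks.
        per_symbol = pool.map(nbbo_for_symbol, groups.values())
        return dict(zip(groups.keys(), per_symbol))


sample = [
    (1, "AAA", "EX1", 10.00, 10.05),
    (2, "BBB", "EX1", 20.00, 20.10),
    (3, "AAA", "EX2", 10.01, 10.06),
    (4, "BBB", "EX2", 19.99, 20.08),
]

for symbol, rows in parallel_nbbo(sample).items():
    print(symbol, rows)
```

Partitioning by symbol keeps the merge step trivial: each task’s output is already complete for its symbol, so results are simply collected rather than combined.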
Garland also points to performance gains in 3.1 from the use of Intel’s SIMD instruction-set extensions. These include Streaming SIMD Extensions (SSE), which speed up code that applies the same operation to many data elements, and Advanced Vector Extensions (AVX), introduced with Sandy Bridge, which widen the SIMD registers to 256 bits and are aimed mainly at floating-point computation.
Rather than relying on the compiler to generate these instructions, Garland says, kdb+ uses them in specifically optimised ways, and it has been built to configure itself automatically so that it runs on servers whether or not they support them.
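Self-configuration of this kind is commonly done by detecting CPU features once and then dispatching to the best available code path. The Python sketch below shows only that general pattern, under stated assumptions: it reads feature flags from Linux’s /proc/cpuinfo (returning an empty set elsewhere), and its “vector” kernel merely emulates the structure of a SIMD reduction, with 2 or 4 lanes standing in for 128-bit SSE2 and 256-bit AVX registers holding doubles. Python itself does not emit these instructions, and kdb+’s actual mechanism is not described here; all function names are hypothetical.

```python
def detect_cpu_flags(path="/proc/cpuinfo"):
    """Return CPU feature flags as reported by Linux on x86; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc not readable
    return set()


def sum_scalar(xs):
    """Baseline: one addition per element."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_lanes(xs, width):
    """Emulate a SIMD reduction: `width` partial sums advanced in lock-step."""
    lanes = [0.0] * width
    n = len(xs) - len(xs) % width        # elements that fill whole vectors
    for i in range(0, n, width):
        for lane in range(width):        # a single vector add on real hardware
            lanes[lane] += xs[i + lane]
    return sum(lanes) + sum(xs[n:])      # combine the lanes, then add the tail


def choose_sum(flags):
    """Select a kernel once, based on the instruction sets the CPU reports."""
    if "avx" in flags:
        return lambda xs: sum_lanes(xs, 4)  # 256-bit registers: 4 doubles
    if "sse2" in flags:
        return lambda xs: sum_lanes(xs, 2)  # 128-bit registers: 2 doubles
    return sum_scalar                       # no usable SIMD extension reported


fast_sum = choose_sum(detect_cpu_flags())  # decided once, at start-up
print(fast_sum([float(i) for i in range(10)]))  # 45.0
```

Splitting a sum into lanes changes the order of floating-point additions, so vectorised and scalar results can differ in the last bits; the sample data uses whole numbers, which sum exactly either way.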
These latest benchmarks for Redline and Kx illustrate how mainstream platforms based on Intel’s x86 architecture are boosting application performance, reducing the need for more exotic hardware acceleration such as FPGAs or GPUs.