Following last month’s announcement of Ultra Messaging SMX, Informatica has published a range of latency and throughput figures for the shared memory transport, covering several programming languages. Messaging latency as low as 39 nanoseconds was recorded, and overall latency was more than 16 times lower than in tests of an earlier version of the transport conducted in May 2010.
Ultra Messaging SMX is designed for messaging within a single server, and in fact within a single multi-core chip, an architecture that has been increasingly adopted as Intel has rolled out its Sandy Bridge (and now Ivy Bridge) microprocessors, with up to 12 cores on certain Ivy Bridge chips. SMX takes advantage of on-chip cache memory, which is faster to access than standard RAM.
Latency tests were conducted between threads running on the same core (Intel supports two threads per core) and between cores on the same chip. Throughput tests were conducted from one thread to threads across many cores on the same chip. Informatica did not test latency between cores on different sockets, since it would have been higher than within a single socket.
Informatica tested the transport through its C, C# and Java APIs, noting that trading systems are often built using a number of languages and so such support is a typical requirement. The test systems for latency included one server with a 4-core Intel Xeon E5-1620 clocked at 3.6 GHz, while the throughput tests used a server with a (pre-release) 10-core Ivy Bridge chip operating at 2.8 GHz. CentOS and Red Hat Linux hosted the C and Java tests, with Microsoft Windows 7 Professional SP1 hosting the C# tests.
Some highlights from the tests are:
* Thread to thread latency on the same core, for the C API, was 39 nanoseconds for 16 byte messages, 48 nanoseconds for 128 byte messages, and 81 nanoseconds for 512 byte messages.
* Thread to thread latency on a sibling core, for the C API, was 103 nanoseconds for 16 byte messages, 111 nanoseconds for 128 byte messages, and 135 nanoseconds for 512 byte messages.
* C# and Java latencies were a bit higher. For example, latency for 512 byte messages between threads on the same core was 135 nanoseconds for C# and 106 nanoseconds for Java.
* As an example of a throughput test, 16 byte messages were transmitted from one thread to up to 19 other threads on the same chip. With 19 receivers and the C API, throughput of 133.92 million messages/second was achieved without batching of messages. Batching, which increases latency, raised this to 305.34 million messages/second. Informatica found that throughput increased nearly linearly as receivers were added.
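To put the highlights above in perspective, the quoted figures can be converted into approximate processor clock cycles and a batching gain. The short Python sketch below is only illustrative arithmetic on the numbers reported in this article. It assumes the C API latency tests ran at the Xeon E5-1620's nominal 3.6 GHz clock, which the published results may not guarantee, and the function names are hypothetical helpers rather than anything from Informatica's APIs.

```python
# Figures quoted in the article: C API latency in nanoseconds, keyed by
# message size in bytes.
SAME_CORE_NS = {16: 39, 128: 48, 512: 81}
SIBLING_CORE_NS = {16: 103, 128: 111, 512: 135}

# Throughput with 19 receivers, in millions of messages per second.
UNBATCHED_MMSGS = 133.92
BATCHED_MMSGS = 305.34

# Assumption: the latency tests ran at the E5-1620's nominal 3.6 GHz clock.
CLOCK_GHZ = 3.6


def ns_to_cycles(latency_ns, clock_ghz=CLOCK_GHZ):
    """Convert a latency in nanoseconds to approximate clock cycles."""
    # A clock of N GHz completes N cycles per nanosecond.
    return latency_ns * clock_ghz


def batching_gain(unbatched, batched):
    """Return the throughput ratio of batched to unbatched sending."""
    return batched / unbatched


if __name__ == "__main__":
    for size in sorted(SAME_CORE_NS):
        same = SAME_CORE_NS[size]
        sibling = SIBLING_CORE_NS[size]
        print(
            f"{size:>3} bytes: same core {same} ns "
            f"(~{ns_to_cycles(same):.0f} cycles), "
            f"sibling core {sibling} ns "
            f"(~{ns_to_cycles(sibling):.0f} cycles), "
            f"cross-core penalty {sibling - same} ns"
        )
    gain = batching_gain(UNBATCHED_MMSGS, BATCHED_MMSGS)
    print(f"Batching gain with 19 receivers: {gain:.2f}x")
```

Under that clock assumption, the 39 nanosecond figure corresponds to roughly 140 clock cycles, and batching raised throughput by a factor of about 2.28.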
While the significant decrease in high frequency trading has reduced the overall need for such low latency transports, Informatica notes that they are still required for other trading operations and strategies, such as arbitrage, market making and smart order routing.