Building systems for low-latency market data integration with trading systems requires a rare combination of technology skills and financial markets understanding. IntelligentTradingTechnology.com talked to Art Novikov, founder of Trading Physics, which leverages that combination for its customers – whether they be vendors or market participants.
Q: First off, can you explain what Trading Physics does, and provide some background on the company?
A: Trading Physics provides interfaces to real-time and historical data feeds, as well as software development, integration, and custom data processing services that help private and corporate investors expedite the design, implementation, and benchmarking of their trading strategies. We have also integrated our technology with that of our partners, including market data vendors, brokers, and trading software companies, to allow for rapid, turnkey deployment of our customers’ systems.
Q: Can you drill down on the kind of analytics services and low-latency systems that you build for your customers?
A: Our sweet spot is high-performance processing of large volumes of market data and delivery of the output through a specified interface or format under given latency and bandwidth constraints. Take as an example our proprietary set of 600 indicators and oscillators that are computed off the equity depth feeds. These indicators numerically measure traders’ behaviour and expectations based on how orders are placed, traded, and deleted, as well as on the state of the open order book. Customers can then feed +1 for overbought and -1 for oversold stocks into their algorithms, with timestamp granularity down to one millisecond. We have customers who deploy our indicator server in-house for their strategy R&D and real-time trading. They gain immediate insight into the forces that are driving the market right out of the box, which saves them months of programmer time that would otherwise go into developing and optimising their own algorithms for working with 50 gigabytes of daily order flow data.
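As an editorial illustration of the +1/-1 signal idea, here is a minimal sketch in Python. The Trading Physics indicators are proprietary and are not described in the interview, so everything below is an assumption made for illustration: the `BookSnapshot` type, the `imbalance_signal` function, the 0.6 threshold, and the choice to treat dominant buy-side size as the "overbought" reading.

```python
from dataclasses import dataclass


@dataclass
class BookSnapshot:
    """Hypothetical order book summary at one millisecond timestamp."""
    timestamp_ms: int  # milliseconds since the start of the session
    bid_size: int      # total resting buy-side size near the top of book
    ask_size: int      # total resting sell-side size near the top of book


def imbalance_signal(snap: BookSnapshot, threshold: float = 0.6) -> int:
    """Map order book imbalance to +1, -1, or 0.

    Assumption: buy-side dominance is read as +1 and sell-side
    dominance as -1. A real indicator would use far richer inputs
    (order placements, trades, deletions).
    """
    total = snap.bid_size + snap.ask_size
    if total == 0:
        return 0  # empty book: no signal
    imbalance = (snap.bid_size - snap.ask_size) / total  # lies in [-1, 1]
    if imbalance >= threshold:
        return 1
    if imbalance <= -threshold:
        return -1
    return 0


if __name__ == "__main__":
    snap = BookSnapshot(timestamp_ms=34_200_001, bid_size=900, ask_size=100)
    print(imbalance_signal(snap))
```

A trading algorithm would consume such a value per ticker and per timestamp. The threshold is a tuning parameter and would need to be calibrated against historical data.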
Another example is our recent Mercury 2.0 interface to depth-of-book as well as trades and quotes messages for various asset classes, such as equities, futures, and options. This interface allows the client to programmatically access data consolidated from multiple feed sources in a normalised fashion, in both real-time and historical replay modes. It is not designed for an end user, such as a trader, but for a trading application that requires low-latency access to hundreds of gigabytes of real-time data and hundreds of terabytes of historical data. The interface also enables the definition of custom asset classes, allowing our customers to load and replay data in the formats they are already set up with.
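The idea of consolidating several feeds into one normalised, time-ordered stream can be sketched as follows. This is not the Mercury 2.0 interface, whose API is not described here. The `Event` record, the two source formats, and the function names are hypothetical, and the sketch assumes each source is already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical normalised message shared by all sources."""
    ts_us: int    # timestamp in microseconds
    source: str   # which feed the message came from
    symbol: str
    kind: str     # "trade" or "quote"
    price: float
    size: int


def normalise_feed_a(rows: Iterable[tuple]) -> Iterator[Event]:
    # Assumed format A: (ts_us, symbol, price, size) tuples, trades only.
    for ts_us, symbol, price, size in rows:
        yield Event(ts_us, "A", symbol, "trade", price, size)


def normalise_feed_b(rows: Iterable[dict]) -> Iterator[Event]:
    # Assumed format B: dicts with millisecond times and prices in cents.
    for r in rows:
        yield Event(r["t_ms"] * 1000, "B", r["sym"], r["type"],
                    r["px_cents"] / 100, r["qty"])


def consolidate(*feeds: Iterable[Event]) -> Iterator[Event]:
    # Lazily merge already-sorted feeds into one stream ordered by time.
    return heapq.merge(*feeds, key=lambda e: e.ts_us)


if __name__ == "__main__":
    feed_a = [(1000, "IBM", 190.50, 100), (3000, "IBM", 190.60, 200)]
    feed_b = [{"t_ms": 2, "sym": "IBM", "type": "quote",
               "px_cents": 19055, "qty": 300}]
    for event in consolidate(normalise_feed_a(feed_a),
                             normalise_feed_b(feed_b)):
        print(event)
```

Because the merge is lazy, the same consumer loop could run over a live source or over a historical replay, as long as both yield `Event` records in time order.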
Q: How much commonality is there in the technology choices that you make internally and across customers? Are there particular technologies or vendors that you leverage?
A: There is a lot of commonality in the hardware space because, like our customers, we depend on reliable, fast equipment to keep us going. We use the same high-performance Myricom and Solarflare network cards, IBM/Blade and Arista switches, and Intel Xeon-based servers for maximum single-thread speed. We work with customers on the optimal design, although the final decision is, of course, the customer’s. We have learned that software preferences vary widely, especially when it comes to Linux distributions and development environments. For our part, we prefer Intel’s i7-2600K desktop CPUs (overclockable to 4.4 GHz) for tasks that require the fastest single-thread processing, Myricom’s 10GbE NICs with DBL 2.0 for low network-layer latency, and Intel SSDs for storage reliability and speed. For the operating system and development environment, we like CentOS 6.2 and NetBeans 7.1. The core is written in C++ with C# interoperability. We have developed a wide array of our own tools specifically tailored to process the data in a multi-threaded fashion: for example, decompression/compression, socket and disk I/O, and processing are all performed in different threads. We have tailored our solutions to 64-bit architecture and use machine language and compiler intrinsics to achieve the fastest processing, and we help our customers take advantage of this as well.
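The staged design mentioned in the answer, with I/O, decompression, and processing each running in its own thread, can be sketched structurally as below. The company's core is written in C++ and its tools are not public, so this Python version only shows the shape of such a pipeline. The function names, the queue sizes, and the line-counting "processing" step are all assumptions.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel that tells the next stage to stop


def reader(chunks, out_q):
    # I/O stage: stands in for reading compressed blocks from disk or a socket.
    for chunk in chunks:
        out_q.put(chunk)
    out_q.put(_DONE)


def decompressor(in_q, out_q):
    # Decompression stage, running in its own thread.
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(zlib.decompress(item))


def processor(in_q, results):
    # Processing stage: here it simply counts messages (lines) per block.
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(len(item.splitlines()))


def run_pipeline(chunks):
    raw_q = queue.Queue(maxsize=8)    # bounded queues apply back-pressure
    plain_q = queue.Queue(maxsize=8)
    results = []
    threads = [
        threading.Thread(target=reader, args=(chunks, raw_q)),
        threading.Thread(target=decompressor, args=(raw_q, plain_q)),
        threading.Thread(target=processor, args=(plain_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    blocks = [zlib.compress(b"add\ntrade\ndelete"), zlib.compress(b"add")]
    print(run_pipeline(blocks))
```

With one thread per stage and FIFO queues, blocks come out in the order they went in. A production system would replace the general-purpose queues with lower-latency synchronisation, which is the kind of custom primitive the interview alludes to.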
Q: Where do you see room for improvements in latency? How have you been reducing latency for recent deployments?
A: One way of improving latency is algorithm design. That is why we have committed ourselves to designing formats, interfaces, and protocols that are programmer-friendly, so that the client application saves CPU cycles by receiving data that is ready and easy to use. We have spent three years fine-tuning our core server platform, optimising our data processing algorithms down to single machine instructions and building a highly scalable concurrent processing architecture that uses our own high-performance, low-latency synchronisation primitives tailored to the processor architecture and to the algorithms being executed. For example, we can generate the entire time series for 600 indicators for 8,000 tickers off a trading session with 500 million events in less than 60 seconds on a 4-core mainstream CPU.
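To put the quoted figures in perspective, the implied throughput can be worked out with simple arithmetic. The sketch below assumes the work divides evenly across the four cores, which the interview does not state, and the helper names are made up for illustration.

```python
def events_per_second(events: int, seconds: float) -> float:
    # Aggregate event rate over the whole run.
    return events / seconds


def per_event_budget_ns(events: int, seconds: float, cores: int) -> float:
    # Core-time available per event, assuming perfect scaling across cores.
    return seconds * cores * 1e9 / events


if __name__ == "__main__":
    events, seconds, cores = 500_000_000, 60, 4
    rate = events_per_second(events, seconds)
    budget = per_event_budget_ns(events, seconds, cores)
    print(f"{rate:,.0f} events/s, {budget:.0f} ns of core time per event")
```

Since the run is quoted as taking less than 60 seconds, these are bounds: at least about 8.3 million events per second overall, and at most about 480 nanoseconds of core time per event.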
We also use solutions from third-party vendors that help overcome OS bottlenecks, such as direct access to the network interface that bypasses the networking stack. For instance, Myricom’s DBL 2.0 layer was able to deliver 4.0 microsecond average latency in our multicast version of the Mercury protocol, meaning that the client application receives the results over the network only 4.0 microseconds after the original event arrived at their switch.
In most cases, however, we are bound by the actual hardware interfaces. For example, top-of-the-line 10GbE multicast latency is 3-4 microseconds, with throughput of up to 2.5 million packets per second. One way to deal with this would be to use an alternative interconnect, such as InfiniBand. However, that increases complexity and cost for customer deployments.
Q: What is next for the company, in terms of products or technology directions?
A: We are rolling out a suite of Mercury 2.0 solutions tailored to different deployment scenarios: an ultra-low-latency multicast co-location version (10GbE), a high-speed TCP co-location version (1Gbit and 100Mbit), and a mainstream version for feed delivery over the internet (together with our market data vendor, broker, and trade station partners). This will streamline the entire trading strategy development cycle, as customers can start with the internet and historical-replay-capable versions and then move over to co-location as higher bandwidth and lower latency become more important, while preserving the existing interfaces.
Another direction is offering our server technology to customers who wish to use their own data on our high-speed, low-latency server. This offering will be of interest to customers who already have their own proprietary sources of market data in their own format, but are struggling to collect, index, store, and provide random access to gigabytes of real-time and terabytes of replay data for their traders and algorithms.