Hardware acceleration means more than FPGAs. Network packet processors are also being adopted in the financial markets to boost the performance of trading applications. IntelligentTradingTechnology.com talked to Vibhoosh Gupta, business leader at GE Intelligent Platforms, to find out more.
Q: Let’s begin with the basics: What is a packet processor, and how does it compare functionally and in performance to (a) a traditional x86 CPU and (b) an FPGA?
A: A packet processor is a specialised compute element, created specifically to handle the movement, analysis and manipulation of data from Ethernet packets. It contains circuitry that includes RISC processing elements, cache memory, dedicated processing acceleration, packet input and output ports and buffering, and high-speed interconnection between all of these elements.
A packet processor and an x86 CPU share a number of features and functions, but the x86 CPU is a general-purpose processor that can be programmed to perform packet processing functions. The packet processor, by contrast, is designed with specific processing functionality to give it maximum performance when operating on data packets: packet identification, inspection, encryption, data extraction and other aspects of packet handling.
Comparing a packet processor and an FPGA (designed to perform a processing function) is an interesting contrast. The FPGA can be developed with a customised and specific set of functions, most directly applicable to a specific (packet processing) requirement, but while it offers high performance, it is limited in its ability to do additional, perhaps newly-defined tasks without redesign.
The packet processor is programmable, and therefore flexible in the processing work it can be asked to do. A change of task means a change in the software running on the processor cores, resulting in shorter downtime and a shorter overall time to revenue. In this sense, packet processors are the perfect combination of flexibility and performance.
Q: Typically, what kind of functions might be implemented in packet processors? Where does their multi-core capability come into play?
A: Typical functionality provided inside a packet processor includes encryption/decryption, compression/decompression, packet data buffering, specialised I/O to match the expected packet interfaces (like 1G and 10G Ethernet) and high-speed memory control, as well as one or more RISC processing cores. Applications where packet processors tend to be used include deep packet inspection, security gateways, load balancing and lawful intercept, and they are gaining traction in financial applications such as low-latency risk management and content processing.
Multi-core refers to the number of processing elements included in the packet processor. Today, most competitive packet processors include multiple processing cores along with their buffering, I/O and acceleration functionality. Having multiple processing cores allows the packet processor to achieve high performance either by applying core processing capability in parallel to a single flow of packet data, or by handling multiple traffic flows with one of the cores assigned to each flow. Also, the types of packet work to be done can be partitioned among groups of cores, allowing unrelated processing tasks to reside in the same packet processor device. A couple of examples here:
I. A specific packet processing application may consist of a single processing core executing 150 instructions per packet which, when clocked at 1GHz, is able to support the traffic load of a 1Gbps Ethernet link without dropping packets. In order to increase the supported traffic load to, say, 4 x 1Gbps links, the user would load the same application on three other available processing cores in the same packet processor, and enable the work scheduling circuitry inside the packet processor to distribute the incoming or outgoing packets to this 4-core processing configuration. The processing work is then shared among those 4 cores while the line rate of the packet traffic is still maintained.
II. For a particular flow of packet traffic, the user may need to perform encryption or decryption on the packets before the packet payload is available for the revenue-generating work that the user has in mind. With this requirement, the user would allocate one or two cores for their cryptographic application, then run their payload application on the remaining cores in the processor. The user can balance the processing power and cryptographic algorithm selection against the payload processing requirements and run these two separate tasks on multiple processing cores in their application.
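The arithmetic behind example I can be sketched in a few lines. This is an illustrative sketch only, not GE or Cavium code: it is written in Python for readability rather than the C used on the cards, the function names are hypothetical, and it assumes worst-case 64-byte Ethernet frames with the standard 20 bytes of preamble and inter-frame gap on the wire. The CRC-based `pick_core` is a software stand-in for the hardware work scheduler, whose actual algorithm is not described here.

```python
import zlib

# Assumption: worst case is minimum-size Ethernet frames. Each frame also
# occupies 20 extra bytes on the wire (7 preamble + 1 start delimiter +
# 12 inter-frame gap).
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def max_packets_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Highest packet rate one link can deliver for a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


def cycle_budget_per_packet(clock_hz, num_cores, num_links, link_bps,
                            frame_bytes=MIN_FRAME_BYTES):
    """Clock cycles available per packet when the load is shared evenly."""
    total_pps = num_links * max_packets_per_second(link_bps, frame_bytes)
    return clock_hz * num_cores / total_pps


def pick_core(flow_key, num_cores):
    """Hypothetical scheduler: hash a flow key so one flow stays on one core."""
    return zlib.crc32(flow_key) % num_cores


GHZ = 1_000_000_000
GBPS = 1_000_000_000

# (cores, links) combinations taken from example I, plus the case of
# leaving all four links on a single core.
for cores, links in [(1, 1), (4, 4), (1, 4)]:
    budget = cycle_budget_per_packet(GHZ, cores, links, GBPS)
    print(f"{cores} core(s), {links} x 1Gbps: {budget:.0f} cycles per packet")

# Made-up flow identifiers, used only to show the distribution step.
flows = [b"10.0.0.1:5000->10.0.0.2:6000",
         b"10.0.0.3:5001->10.0.0.2:6000",
         b"10.0.0.4:5002->10.0.0.2:6001"]
for flow in flows:
    print(flow.decode(), "-> core", pick_core(flow, 4))
```

Under these assumptions, one core serving one link and four cores serving four links have the same per-packet cycle budget, which is why adding cores in step with links preserves line rate. Leaving four links on one core cuts the budget to a quarter, which leaves little room for a 150-instruction application once memory stalls are taken into account.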
Q: How does one go about programming packet processors? Does it involve special languages and development tools? How quickly can code be developed for packet processors?
A: In the past, specialised processors had unique instruction sets and dedicated development tools for programming. Today, most packet processors use the standard C programming language and more familiar development tools for programming and debugging designs. Users familiar with Linux software development will be at home with these programming tools. That said, the above descriptions highlight that there is a fair amount of sophistication employed by today’s packet processors. To get the most performance from these processors, the user needs to understand the available functions and their implementation, as well as the Linux and bare metal operating system environments that are needed to take best advantage of this processing capability.
Using the standard tool sets and example programs in source code, applications can be created fairly quickly. Optimisation for performance can consume development time, but offerings like the GE Intelligent Platforms WANic FasTCP Optimised TCP/IP stack help get the user to their end-goals for functionality and performance.
Q: What products related to packet processors does GE Intelligent Platforms produce and what do they consist of?
A: GE Intelligent Platforms has been serving the embedded board market for 15 years, and has had a portfolio of packet processing products based on the Cavium Networks Octeon family of packet processors for more than eight years. Most recently, with an eye on opportunities in the financial markets, GE has developed the WANic-66512 PCI-Express packet processing card, using the Cavium Octeon II CN6645 10-core processor and providing two 10GbE optical or direct attach SFP+ ports for the line interface. This product’s Octeon II processor operates at up to 1.5GHz clock speed (the fastest in the market) and supports up to 8GB of DDR3 memory for program and packet data storage.
All available features of the Cavium Octeon II processor, like packet time stamping and the PCI-Express host interface, are supported, and a complete software development kit is provided for application development.
GE Intelligent Platforms offers a complementary product, the WANic-6354, which is also Octeon II-based but uses a lower core count version and provides four 1Gb Ethernet Optical or Direct Attach SFP ports for the line interface.
As noted above, GE has extended its packet processing software offerings by developing a TCP/IP stack for users to deploy in their applications. This licensed software product, the WANic FasTCP TCP/IP stack, is a version of the open source lwIP stack that has been ported to and optimised for the Cavium Octeon II multicore processor. Users can get a leg up in their development effort by taking advantage of the high performance and low latency offered by this software package.
Q: How would a trading firm use your WANic network cards in conjunction with a mainstream server for HFT applications? Which components would run on the network cards, and which on the host server?
A: In financial applications that need the ability to operate on market data packets at line speed while maintaining low packet latency and maximum flexibility when functional changes are needed, the WANic-66512 packet processor card offers the right capability. A trading firm would deploy a WANic-66512 in one or more of their server PCI-Express slots, develop their multi-core application software using development kits and low-latency TCP/IP stack software like WANic FasTCP, then optimise for best performance and lowest latency. Resulting solutions could address market feed data analysis, regulatory or security applications or the distribution of trade data to selected customers.
Because the WANic-66512 is a programmable packet processor card, users can create a variety of solutions for their application, defining the type and amount of work distribution between the host system and the packet processor, starting on a small scale and expanding their application as required. When feature updates or functional changes are needed, they would be achieved through a short development cycle, change of the multi-core software and reload on the deployed cards, maintaining control of total cost of development and operation.
Q: What is the future development path for packet processors and your network cards? Where do you see this deployment model heading?
A: As packet processor technology evolves, we have the opportunity to consider how the multiple cores, higher clock speeds and new functionality that will be available might be used to address present and future applications. A successful strategy for packet processor deployment will require the right mix of compute capability, operational flexibility, power and thermal considerations as well as software development and implementation. It is the job of product developers like GE Intelligent Platforms to bring new technology and development expertise together to create the next generation of packet processing products.
In the HFT world, low latency is one of the most important requirements, with versatility and flexibility of programming significant as well. Taking advantage of higher clock speed processors in future packet processor products will allow users to push overall latency down while increasing performance, providing the best mix of latency, performance and flexibility for HFT applications.
Use of packet processing technology, originally designed for telecom applications, as an HFT solution is generating interest among present and potential users in the financial community. As these users come to understand the performance/latency/flexibility equation that packet processors provide, interest in today’s and tomorrow’s packet processing products will increase. GE Intelligent Platforms will be at the forefront of this effort.