Following up on the big Cisco news of the other week, the launch of its Nexus 3548 switch featuring latency of less than 190 nanoseconds, IntelligentTradingTechnology.com wanted to find out more about the new offering, its capabilities and the technology behind it. Dave Malik, senior director within Cisco's financial services group, provided some answers.
Q: One of the features of the Nexus 3548 is wire speed Network Address Translation, or NAT. What is NAT, and why is it important to trading firms? Before the 3548, how would NAT be implemented, and what would have been the latency hit?
A: Network Address Translation (NAT) is a feature of many edge/boundary devices (routers, switches, firewalls) that translates one or more IP addresses from one network to another IP address space (e.g. private network to public network, or internal network to external network) and vice versa.
NAT is extremely important to trading firms, since there are scenarios where a security boundary needs to be created between clients, exchanges/venues and other untrusted networks. See below:
Scenario 1: Exchanges only allowing incoming connections from their managed IP address space. When a participant firm needs to send order flow to an exchange, it will want to mask its internal address schema and translate to the one required by the exchange.
Scenario 2: Client access connectivity between a (sell side) member firm co-located at the exchange and its clients. The member firm will have a demarcation point, through which clients are able to send orders. Both the sell-side firm and its clients gain additional security at this boundary via Network Address Translation. In some cases, overlapping IP address spaces even exist in this environment, a situation NAT can resolve and simplify.
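The address rewriting in these scenarios can be sketched in a few lines of Python; the addresses and translation table below are invented for illustration and do not reflect any real exchange configuration:

```python
# Minimal sketch of static source NAT, as a trading firm's edge device might
# apply it: internal addresses are rewritten to the exchange-facing address
# space before an order leaves the firm. All names and addresses are illustrative.

INTERNAL_TO_EXCHANGE = {
    # internal order-gateway address -> exchange-assigned address
    "10.1.1.15": "198.51.100.15",
    "10.1.1.16": "198.51.100.16",
}

def translate_source(packet: dict) -> dict:
    """Rewrite the source IP of an outbound packet using the static NAT table."""
    translated = dict(packet)
    try:
        translated["src"] = INTERNAL_TO_EXCHANGE[packet["src"]]
    except KeyError:
        raise ValueError(f"no NAT binding for {packet['src']}")
    return translated

outbound = {"src": "10.1.1.15", "dst": "203.0.113.7", "payload": b"NEW_ORDER"}
print(translate_source(outbound)["src"])  # the exchange sees 198.51.100.15
```

In a software device this table walk is where the latency penalty discussed below comes from; the point of the 3548 is to do the equivalent rewrite in hardware at line rate.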
Firms currently pay a latency penalty for the address translation, since it cannot be performed at line rate. With NAT enabled, the latency budget in traditional network devices runs from double-digit microseconds upwards, depending on the implementation.
The Nexus 3548 provides NAT capabilities with no compromise to performance and latency, at speeds as fast as 190 nanoseconds (ns). Speed and control are key in high velocity trading environments.
Q: What was the thinking that drove the decision to design and manufacture your own “Algo Boost” ASIC chip, and not to continue to use ones from third parties?
A: As a world leader in networking, Cisco has strong relationships with our clients in the financial markets, and we actively listen to their changing requirements to address the challenges of latency, security and compliance that directly impact their business. Third-party or "merchant" silicon was not meeting the unique requirements of high performance environments, so Cisco decided to build innovative silicon from the ground up in an Application Specific Integrated Circuit (ASIC) that significantly lowered latency while not sacrificing full feature capabilities.
With custom engineered silicon, Cisco is able to provide unprecedented performance, visibility and control: groundbreaking latency of 50 ns in Warp Span mode and 190 ns in Warp mode, combined with line-rate NAT, Precision Time Protocol and Active Buffer Monitoring capabilities, plus full Layer 2/3 functionality with programmability designed to address many use cases. We are excited by the response and interest we have received from customers since we announced this innovation.
Q: Can you provide some information on the kinds of analytics that the Algo Boost chip provides, and how these can be useful for trading systems?
A: The unique Cisco Algo Boost technology in the custom ASIC switching silicon provides an industry-leading performance analytics framework, delivering granular visibility that helps financial traders accelerate price discovery, increase order flow liquidity, and better manage regulatory requirements.
The Active Buffer Monitoring analytics proactively monitor the network to help avoid latency or congestion, through intelligence embedded in the trading fabric. Congestion increases latency and is the enemy in the high-speed trading world. It can be caused by microbursts during periods of high volatility, by many-to-one flows, and by speed mismatches within the infrastructure. In these situations, queued packets must wait for the buffer to clear before continuing transmission.
Platforms currently available in the market do not proactively inform administrators of rising congestion: when the congestion occurred, how long it was present, and whether any packets were dropped, at a granular time interval.
The Active Buffer Monitoring capability provides real-time, granular buffer occupancy analytics that give administrators the visibility to address all of these requirements and eliminate blind spots. It also provides rich per-port buffer histograms, illustrating the percentage of time switch buffers were empty, fully occupied, or anywhere in between, with up to 10 ns granularity, versus others that provide only utilisation snapshots and watermarks. With this deeper visibility, an administrator can easily detect which ports encountered microbursts, the exact time when the bursts occurred, and the maximum level of buffer utilisation during each microburst.
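The histogram idea can be sketched as follows; the occupancy samples, timestamps and bucket bands here are assumptions made for illustration, not the switch's actual data format:

```python
from collections import Counter

# Hypothetical buffer-occupancy samples for one port: (timestamp_ns, occupancy_pct).
samples = [
    (100, 0), (110, 5), (120, 60), (130, 95), (140, 100), (150, 40), (160, 0),
]

# Bucket each sample into coarse occupancy bands, mirroring the
# "empty / fully occupied / anywhere in between" histogram described above.
def band(pct):
    if pct == 0:
        return "empty"
    if pct >= 90:
        return "near-full"
    return "partial"

histogram = Counter(band(pct) for _, pct in samples)

# Microburst detection: the first timestamp at which occupancy crossed 90%,
# plus the peak utilisation reached during the burst.
burst_start = next(ts for ts, pct in samples if pct >= 90)
peak = max(pct for _, pct in samples)
print(histogram, burst_start, peak)
```

From data like this an administrator can read off both when a burst began and how close the buffer came to dropping packets.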
It is now possible to proactively detect the health of the fabric and take action to avoid bottlenecks. Trade correlation engines within a trading infrastructure can now associate possible delays or discards of orders in a flow and conduct forensic analysis to see if the network was the root cause of the associated problem.
All of the data collected on the switch can be exported via XML and on-switch scripting capabilities via Python to be leveraged by data mining applications or trading applications themselves. Applications now have the capability to adapt to communication patterns based on intelligence that is being provided by the switching fabric. And since events occur in very granular instances in time, Precision Time Protocol implementation in hardware is essential for correlation of all analytics. Furthermore, Cisco has embedded a 1 PPS (pulse per second) timing interface to verify the accuracy of the synchronised infrastructure.
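As a sketch of how an off-switch application might consume such an export, the following parses a hypothetical XML document of per-port buffer statistics; the element and attribute names are invented for illustration, not the switch's actual export schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of per-port buffer statistics. The element and
# attribute names are invented; a real export schema may differ.
EXPORT = """\
<buffer-stats>
  <port name="Eth1/1" peak-occupancy-pct="97" drops="0"/>
  <port name="Eth1/2" peak-occupancy-pct="12" drops="0"/>
  <port name="Eth1/3" peak-occupancy-pct="100" drops="42"/>
</buffer-stats>
"""

root = ET.fromstring(EXPORT)

# Flag ports whose buffers peaked above a congestion threshold, so a data
# mining or trading application could correlate them with order-flow delays.
hot_ports = [
    p.get("name")
    for p in root.findall("port")
    if int(p.get("peak-occupancy-pct")) >= 90
]
print(hot_ports)  # ['Eth1/1', 'Eth1/3']
```

A correlation engine would join such port-level findings against PTP-stamped order timestamps to decide whether the network was the root cause of a delay.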
Several enhancements to Switch Port Analyser (SPAN) functionality are also available through Algo Boost. First, to limit the capture to traffic of interest, an administrator can easily sample or filter the traffic to be captured, and truncate frames to a certain offset before capture. Also, when the device is synchronised to a master clock via PTP, the switch can add timestamps to the Encapsulated Remote SPAN headers of captured packets for richer analysis and correlation. These deeper filtering mechanisms can be used to monitor for gapping, slippage and slowness in order flow or market data within the trading infrastructure.
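Conceptually, the truncation and timestamping steps could be modelled like this; the record layout and offset below are assumptions for illustration, not the actual ERSPAN header format:

```python
import struct

# Sketch of two of the SPAN enhancements described above: truncating captured
# frames to a fixed offset, and prepending a PTP-derived timestamp to each
# capture record. The record layout here is invented for illustration.

TRUNCATE_OFFSET = 64  # keep only headers, drop payload beyond this offset

def capture(frame: bytes, ptp_timestamp_ns: int) -> bytes:
    truncated = frame[:TRUNCATE_OFFSET]
    # 8-byte big-endian timestamp followed by the truncated frame
    return struct.pack(">Q", ptp_timestamp_ns) + truncated

record = capture(b"\x00" * 1500, 1_700_000_000_000_000_000)
print(len(record))  # 72 = 8-byte timestamp + 64 truncated bytes
```

Truncation keeps capture volumes manageable while preserving the headers needed to spot gapping or slippage, and the timestamp lets records from different switches be correlated on one PTP timeline.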
Q: In Warp mode, latency is reduced but at the expense of the number of network nodes that can be addressed. What’s the technical reason for that? Is it going to have an impact for most trading systems?
A: As an outcome of feedback from several clients, Cisco's silicon was designed to provide a solid balance of scalability, performance and ultra-low latency of 190 ns. When the 3548 is put into Warp mode, the ASIC stores all types of data (MAC addresses, routes and so on) in a single TCAM (Ternary Content Addressable Memory), allowing it to run multiple functions in a single lookup operation, lowering latency. Because a single table is used instead of multiple larger tables, scalability is reduced. In Warp mode, the Nexus 3548 switch supports 4K unicast routes, 8K host routes and 8K MAC addresses in its hardware tables. These scalability parameters are still far beyond what any trading fabric requires.
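The trade-off can be illustrated with a toy model (purely illustrative; real TCAM hardware matches entries in parallel, and the capacity figure below is made up): a single shared table resolves a packet in one lookup, but all entry types compete for its fixed capacity.

```python
# Toy model of the Warp-mode trade-off: one shared table serves both MAC and
# route lookups (one lookup per packet) instead of two separate, larger
# tables (two lookups). The capacity is illustrative, not a hardware spec.

SHARED_CAPACITY = 8  # a single TCAM shared by all entry types

shared_table = {}  # key: (kind, value) -> forwarding result

def install(kind, key, result):
    if len(shared_table) >= SHARED_CAPACITY:
        raise MemoryError("shared table full: scalability is reduced in Warp mode")
    shared_table[(kind, key)] = result

install("mac", "aa:bb:cc:dd:ee:01", "port1")
install("route", "198.51.100.0/24", "port2")

def forward(packet):
    # One table resolves the packet, whichever kind of entry matches.
    for kind in ("mac", "route"):
        hit = shared_table.get((kind, packet.get(kind)))
        if hit:
            return hit
    return "flood"

print(forward({"mac": "aa:bb:cc:dd:ee:01"}))  # port1
```

Merging the tables removes a lookup stage from the forwarding path, which is where the latency saving comes from; the cost is that MAC and route entries now share one capacity budget.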
Q: And in Warp Span mode, latency is further – significantly – reduced. How has that been made possible?
A: On the Nexus 3548 platform, Warp Span mode reduces port-to-port latency to approximately 50 nanoseconds. Traffic ingressing a specific port is replicated to a user-configurable group of egress ports. The latency savings come from the fact that the packet replication occurs without any classification, lookup processing or Access Control List checks.
Warp Span works simultaneously with the normal and Warp forwarding modes. For example, traffic coming in on the ingress port can be replicated via Warp Span and, at the same time, be filtered with ACLs and switched via Layer 2/3, with zero impact on latency. This mode enables bypass and traffic-splitting capabilities for time-sensitive traffic such as market data.
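A rough sketch of the replication path, with invented port names, might look like this: frames from the designated ingress port are copied to the configured egress group with no lookup at all.

```python
# Sketch of Warp Span style replication: frames arriving on a designated
# ingress port are copied to a configured group of egress ports without any
# lookup, classification or ACL processing. Port names are illustrative.

WARP_SPAN_INGRESS = "Eth1/1"
EGRESS_GROUP = ["Eth1/10", "Eth1/11", "Eth1/12"]

def replicate(ingress_port, frame):
    if ingress_port != WARP_SPAN_INGRESS:
        return []  # other ports follow the normal forwarding path
    # No table lookup: every frame is blindly copied to the whole group.
    return [(egress, frame) for egress in EGRESS_GROUP]

copies = replicate("Eth1/1", b"MARKET_DATA_TICK")
print(len(copies))  # 3
```

Because nothing is inspected on this path, it suits one-to-many distribution of latency-critical feeds such as market data, while the same frame can still traverse the normal filtered Layer 2/3 path in parallel.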
Our clients are extremely excited to be able to enhance price discovery in the market and access real-time market information within 50 nanoseconds. Many exchanges and venues are also showing great interest in leveraging this Algo Boost capability for market data distribution to their members.