In-memory technology is finding new applications in the financial markets and in big data processing. We found out how ScaleOut Software is addressing these needs from the company’s COO Dave Brinker.
Q: How did ScaleOut Software get started – what’s in the name and what is its focus?
A: ScaleOut Software was founded by William Bain after he left Microsoft, and its first product shipped in January 2005. He saw that applications running on server farms were encountering increased workloads that slowed application response times. Simply adding servers did not improve performance because data access bottlenecks, such as central database storage, remained in place. Storing data on each server didn’t help either, since load-balanced servers require uniform access to data that might be on another server.
The answer was to provide in-memory data grid (IMDG) middleware that could span a cluster of servers to provide fast data access and could scale its performance linearly by simply adding servers to the grid. Since its founding, ScaleOut Software has focused on in-memory solutions, providing leading-edge parallel computing technology it develops in-house. The origin of the name is straightforward; our products scale application performance across networked servers, thus the term “scaling out.” This can be contrasted to “scaling up” performance by using more powerful single machines.
Q: What are your key products and how are they typically being used in the financial markets?
A: Here is a summary of our products:
– ScaleOut StateServer gives your applications fast performance and scalability by using an IMDG to store fast-changing data. ScaleOut StateServer’s fast performance, ease of use, and portability set it apart from the competition.
– ScaleOut Analytics Server adds powerful, easy-to-use map/reduce-style data analytics to rapidly mine the in-memory data grid for trends and patterns. (Analytics Server is the successor to Grid Computing Edition.)
– ScaleOut hServer is designed to provide in-memory storage for Hadoop programs to accelerate access to fast-changing data. ScaleOut hServer is available in a free community edition as well as a range of commercial editions. Open source API libraries are included.
– ScaleOut GeoServer provides data replication between data centers for disaster recovery, and it enables global access to grid data from remote sites.
– ScaleOut Management Pack adds industry-leading tools for examining grid data and quickly taking snapshots of the in-memory data grid.
There are several use cases for our products in financial services applications. Here are a few of the most common:
Real-time Risk Analysis: Using ScaleOut Analytics Server, customers can maintain investment positions in the IMDG, update them with streaming market data and trades, and simultaneously run continuous risk analytics on the positions according to defined business rules. Alerts can be raised when out-of-range conditions occur. Benefit: firms can respond in real time to changing risk profiles during the trading day.
Scalable Position Keeping: Using Analytics Server’s parallel computing capabilities, trading firms can hold their portfolio positions in memory and update them in real time. Benefit: traders always have fresh data at their fingertips.
Application Scalability: Using ScaleOut StateServer’s IMDG, firms can provide online applications for banking, loans, equity trading, etc. that scale to meet increased workloads. Benefit: consistently fast response times for application users even during peak loads.
Grid Computing/HPC: For firms using traditional grid computing for Monte Carlo simulations or other massively parallel tasks, ScaleOut can be used as an IMDG for storing raw data, state information, or computational data. Benefit: Overall performance is faster due to the speed and scalability of the IMDG as a data store when coupled with HPC.
Data Virtualisation: Where a firm has multiple geographically remote data grids, ScaleOut GeoServer can be used to create a single virtual data grid enabling applications to easily access data, regardless of where it is stored. This is especially useful in cases where, say, pricing or portfolio data is maintained in different locations, but must be accessible by any location. GeoServer’s multiple coherency policies enable the application developer to easily implement the desired functionality. Benefit: Complicated Web services code is eliminated and remote data centers are integrated.
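To make the Real-time Risk Analysis use case above more concrete, here is a minimal Python sketch of the pattern it describes: positions held in memory, updated by trades and price ticks, and checked against a business rule that raises alerts. It does not use any ScaleOut API. The plain dictionary standing in for the IMDG, the names Position and PositionStore, and the exposure limit rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    quantity: int
    price: float  # last known market price

    @property
    def exposure(self) -> float:
        # Absolute market value of the position.
        return abs(self.quantity * self.price)


class PositionStore:
    """Hypothetical stand-in for an in-memory data grid: a plain dict."""

    def __init__(self, exposure_limit: float):
        self.positions = {}                   # symbol -> Position
        self.exposure_limit = exposure_limit  # assumed business rule

    def apply_trade(self, symbol: str, quantity: int, price: float) -> None:
        # Create the position on first trade, then adjust quantity and price.
        pos = self.positions.setdefault(symbol, Position(symbol, 0, price))
        pos.quantity += quantity
        pos.price = price

    def apply_price(self, symbol: str, price: float) -> None:
        # Streaming market data only updates positions that already exist.
        if symbol in self.positions:
            self.positions[symbol].price = price

    def check_risk(self) -> list:
        # Return one alert string per position that breaks the rule.
        return [
            f"ALERT {p.symbol}: exposure {p.exposure:.2f} exceeds limit"
            for p in self.positions.values()
            if p.exposure > self.exposure_limit
        ]


store = PositionStore(exposure_limit=100_000.0)
store.apply_trade("ABC", 1_000, 50.0)  # exposure 50,000: within the limit
store.apply_trade("XYZ", 500, 80.0)    # exposure 40,000: within the limit
alerts_before = store.check_risk()
store.apply_price("ABC", 120.0)        # exposure 120,000: over the limit
alerts_after = store.check_risk()
print(alerts_after)
```

In a real grid the positions would be partitioned across servers and the rule check would run in parallel on each partition; the single-process loop here only shows the update-then-evaluate cycle.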
Q: What kind of performance advantages do they deliver versus existing systems?
A: Our systems are designed to provide near real-time computation. We are focused on cutting the run time of tasks that currently take minutes down to seconds. For example, a proof of concept done for a hedge fund provides risk analytics results in one second versus the fifteen minutes it took using their current method (SQL Server). You can read about this use case in detail on our website at http://www.scaleoutsoftware.com/solutions/hedge-fund-strategy-management/.
These results are possible because our architecture is built on pure in-memory computing combined with parallel processing technology. There are dozens of use cases where near real-time results can yield significant business benefits. Note: By contrast, our products are not designed for the millisecond response times that are common in HFT and other extremely low latency environments.
Q: In-memory approaches are not new. So why are they getting so much attention of late?
A: There are several factors at work:
– Competitive pressure: The need to make business decisions before your competition is stronger than ever.
– Memory prices: RAM technology has advanced to the point where very high memory servers are affordable for a much wider range of companies.
– Software advancements: In-memory computing software has developed to a point of providing more powerful and useful solutions than ever before. At the same time, the developer community has become more familiar with in-memory technologies and their benefits.
Q: How does your technology play with big data technologies like Hadoop?
A: We have recently released a new product, ScaleOut hServer. This product is specifically designed to work with Hadoop. The product’s aim is to enable Hadoop for real-time analytics, and this release is the first step toward that goal (look for more releases in a few months). Instead of storing data on disk within the Hadoop Distributed File System (HDFS), ScaleOut hServer uses its fast, scalable IMDG. This provides two main benefits (separate use cases):
– Live data can be continuously updated and analysed using standard Hadoop MapReduce programs. (Note that HDFS cannot support “live” data because it cannot be updated.) In this model you store data directly in our IMDG and it can be accessed by Hadoop for normal Hadoop processing.
– ScaleOut hServer can be used as a transparent in-memory cache for HDFS data sets that fit within the IMDG’s memory. In this usage model, when you run a Hadoop MapReduce job, key/value pairs pass from your HDFS record readers into the mappers and ScaleOut hServer stores them in the IMDG. On subsequent runs, it transparently reads key/value pairs from the IMDG, providing a significant speed-up (11x in benchmarks) in data access time.
In this first release of ScaleOut hServer, the data access bottleneck is removed. In subsequent releases we will address other Hadoop performance bottlenecks that prevent it from being a real-time system.
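The caching model described above, in which key/value pairs are captured on the first run and replayed from memory on later runs, can be sketched in a few lines of Python. This is an illustration of the general read-through caching idea only, not the hServer API or implementation. The class CachingRecordReader, the dictionary standing in for the IMDG, and the data set name "input.txt" are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KeyValue = Tuple[str, str]


class CachingRecordReader:
    """Wraps a slow record source and replays it from memory on later runs."""

    def __init__(
        self,
        source: Callable[[], Iterable[KeyValue]],
        cache: Dict[str, List[KeyValue]],
        dataset_id: str,
    ):
        self.source = source          # stand-in for a disk-based record reader
        self.cache = cache            # stand-in for the in-memory data grid
        self.dataset_id = dataset_id  # key under which the data set is cached

    def __iter__(self) -> Iterator[KeyValue]:
        cached = self.cache.get(self.dataset_id)
        if cached is not None:
            # Later runs: serve pairs from memory, never touching the source.
            yield from cached
            return
        # First run: pass pairs through to the mapper while buffering them.
        buffered = []
        for pair in self.source():
            buffered.append(pair)
            yield pair
        # Store only after a complete pass so the cache is never partial.
        self.cache[self.dataset_id] = buffered


reads = {"count": 0}  # counts how often the slow source is actually read


def read_from_disk() -> Iterator[KeyValue]:
    reads["count"] += 1
    yield from [("line1", "a b"), ("line2", "b c")]


grid = {}
first = list(CachingRecordReader(read_from_disk, grid, "input.txt"))
second = list(CachingRecordReader(read_from_disk, grid, "input.txt"))
print(first == second, reads["count"])
```

The mapper code consuming the reader does not change between runs, which is what makes the cache transparent in this sketch.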
Q: What does the future hold for in-memory technologies, and for ScaleOut in particular?
A: We believe the trend is clear: there will be continued pressure to process more data faster. The path for meeting this need is to use in-memory computing because it provides clear advantages in speed. Those technologies that marry the speed of in-memory with solid distributed computing technology will be able to provide solutions that are not only fast, but also scale to handle very large data sets. ScaleOut has been a pure-play in exactly these two areas. We plan to remain 100% focused on this powerful combination of capabilities and to keep providing solutions that meet the needs of our customers.