Big Data pervades every aspect of our lives, whether in support of smartphone apps or more sophisticated AI techniques used across many industries. The financial services segment is no exception, and the explosion of data in financial markets presents both opportunities and challenges for firms deploying analytics to make better trading decisions and run more effective surveillance.
A recent A-Team Group webinar – ‘Data management for analytics and market surveillance’ – identified ways in which practitioners are deploying new technologies to harness the potential of expanded data sets. Panellists also discussed the challenges involved in establishing robust data management disciplines for analytical applications, whether for trading advantage, market surveillance or regulatory compliance.
The webinar featured panellists Nick Maslavets, Global Head of Surveillance Development, Methodologies, & Analytics at Citadel; Ilija Zovko, Senior Researcher at Aspect Capital; and James Corcoran, CTO Enterprise Solutions at Kx, which sponsored the session. A recording of the webinar is available here.
Key challenges identified by the panel included how to ensure data quality across multiple lines of business, and how to implement the necessary data sourcing, integration and governance processes.
From a trading perspective, trading algos use much more information than is readily available on generic trading platforms, which only scratch the surface of what is now available in the markets. To build algorithms that can react meaningfully to unprecedented market conditions, such as those seen in 2020, more granular microstructure data is needed.
On the surveillance side, it is becoming increasingly necessary for firms to monitor not only their own activity, but also data from across the market. This is particularly important for protecting P&L. Whenever a firm enters a market, it is potentially subject to manipulation by external participants, and the more frequently a firm trades and the higher its volumes, the bigger a target it becomes. By analysing data from across the market at a microstructure level, firms are in a better position to identify if and when their algos or their flow become targets for manipulation.
First principles for building a robust data management system
Firms need to balance the pragmatic approach of delivering solutions that create value for the business in the short term with the longer-term approach of building more strategic solutions that can ingest new data sources and scale to multiple users and high data volumes. Equally important is discipline in how applications interconnect: well-defined interfaces are essential if data silos are to be avoided.
One pitfall to avoid is attempting to anticipate queries by pre-computing and storing intermediate results. The danger here is that if a specific question has not been foreseen, it will remain unanswered. A better approach is to decouple the computation from the data by investing in a strong data management and query capability, so that questions can be addressed directly to the source data.
Another key principle is to cleanse data before it enters the platform rather than after it has been ingested. A strong ingest layer that provides cleansing, metadata and tagging is therefore key to data quality and consistency.
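The contrast between pre-computed results and direct queries can be sketched in a few lines of Python. This is a minimal illustration only: the `Trade` record, the sample trades and the `vwap` helper are hypothetical names invented here, not anything presented by the panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    size: int
    venue: str


# Hypothetical raw trades, invented purely for illustration.
TRADES = [
    Trade("ABC", 10.00, 100, "X"),
    Trade("ABC", 10.10, 300, "Y"),
    Trade("XYZ", 20.00, 200, "X"),
    Trade("ABC", 10.20, 100, "X"),
]


def vwap(trades, predicate=lambda t: True):
    """Volume-weighted average price over the trades matching a predicate."""
    selected = [t for t in trades if predicate(t)]
    volume = sum(t.size for t in selected)
    if volume == 0:
        return None
    return sum(t.price * t.size for t in selected) / volume


# Pre-computed approach: only the anticipated question (VWAP per symbol) is stored.
precomputed = {
    sym: vwap(TRADES, lambda t, sym=sym: t.symbol == sym)
    for sym in {t.symbol for t in TRADES}
}

# An unanticipated question (ABC on venue X only) cannot be answered from
# `precomputed`, but it can be put directly to the source data.
abc_on_x = vwap(TRADES, lambda t: t.symbol == "ABC" and t.venue == "X")
```

The stored per-symbol figures cannot be broken down by venue after the fact, whereas the query over raw trades accepts any filter.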
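As a rough sketch of what cleansing at the ingest layer might look like, the Python below validates, normalises and tags each record before it is accepted. The field names (`symbol`, `price`, `size`, `ts`), the validation rules and the `source` tag are assumptions made for illustration, not a description of any particular platform.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CleanTick:
    symbol: str
    price: float
    size: int
    ts: datetime
    source: str  # metadata tag recording which feed the tick came from


def cleanse(raw: dict, source: str) -> Optional[CleanTick]:
    """Return a normalised tick, or None if the record fails validation."""
    try:
        symbol = str(raw["symbol"]).strip().upper()
        price = float(raw["price"])
        size = int(raw["size"])
        ts = datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc)
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None
    if not symbol or not math.isfinite(price) or price <= 0 or size <= 0:
        return None
    return CleanTick(symbol, price, size, ts, source)


def ingest(records, source):
    """Split incoming records into accepted ticks and rejected raw records."""
    accepted, rejected = [], []
    for raw in records:
        tick = cleanse(raw, source)
        if tick is None:
            rejected.append(raw)
        else:
            accepted.append(tick)
    return accepted, rejected
```

Keeping the rejected records, rather than silently dropping them, lets data quality problems be traced back to the feed that produced them.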
Synchronising and integrating multiple data sources
Data sources should be approached as a continuum rather than as totally separate environments, particularly in hybrid set-ups that include on-premise, private cloud and public cloud. By leveraging best practices around cloud-native infosec and identity access management, and by integrating with tooling that can work in a hybrid environment, firms can create an all-encompassing platform.
One way of achieving this is by having a connector or gateway layer that can load balance across the various environments. This provides the ability to span across the continuum, abstracting the data source so that the business user – whether that’s a quantitative researcher, an analyst or even an application – doesn’t need to know where the data resides.
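A gateway of this kind could look roughly like the following Python sketch. Everything here is hypothetical: the `Backend` protocol, the `InMemoryBackend` stand-in and the simple round-robin policy are assumptions chosen to keep the example small, and a production gateway would also need to handle authentication, failover and entitlements.

```python
from itertools import cycle
from typing import Protocol


class Backend(Protocol):
    def query(self, dataset: str, symbol: str) -> list: ...


class InMemoryBackend:
    """Stand-in for an on-premise, private-cloud or public-cloud data store."""

    def __init__(self, location, data):
        self.location = location
        self.calls = 0
        self._data = data

    def query(self, dataset, symbol):
        self.calls += 1
        return [r for r in self._data.get(dataset, []) if r["symbol"] == symbol]


class Gateway:
    """Routes queries by dataset name and round-robins across replicas."""

    def __init__(self):
        self._routes = {}

    def register(self, dataset, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._routes[dataset] = cycle(backends)

    def query(self, dataset, symbol):
        if dataset not in self._routes:
            raise KeyError(f"no backend registered for dataset {dataset!r}")
        backend = next(self._routes[dataset])
        return backend.query(dataset, symbol)
```

The caller asks the gateway for `("trades", "ABC")` and never sees `location`, which is the abstraction the paragraph above describes.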
Once the data is abstracted in this way, data sources can be changed as needed, because the analytics components are agnostic to where the data is held. The onus then falls upon the users to ensure that the data is well understood and in a suitable format, without necessarily knowing its origins.
The ideal approach is to use raw market data as the starting point and build a set of functions around that data that can address questions from different domains. Surveillance analysts need to be able to look at market microstructure data at the same level of detail as algorithmic traders in the front office. It’s important to have a highly granular data set that’s available to everyone who needs to access it, and to have the ability to filter it as necessary on the fly, depending on the use case.
Data should be tightly integrated but loosely coupled. Internal and external data sources and feeds need a tight level of integration, but firms also need the ability to swap those sources in and out over time, for example when moving to a different data vendor or replacing one data source with another. That can only be achieved through a decoupled architecture. Firms should avoid building vendor-specific or data source-specific logic deep within the database or the application.
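One common way to keep vendor-specific logic out of the database and the application is to confine it to thin adapters at the edge, as in the Python sketch below. The two vendor message formats, the `Quote` schema and the adapter names are all invented for illustration and do not correspond to any real feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Internal, vendor-neutral schema used by every downstream component."""
    symbol: str
    bid: float
    ask: float


def from_vendor_a(msg: dict) -> Quote:
    # Hypothetical vendor A sends decimal prices under short keys.
    return Quote(msg["sym"], float(msg["b"]), float(msg["a"]))


def from_vendor_b(msg: dict) -> Quote:
    # Hypothetical vendor B sends prices as integer hundredths.
    return Quote(msg["ticker"], msg["bid_ticks"] / 100, msg["ask_ticks"] / 100)


# Swapping or adding a vendor means changing this table, nothing else.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalise(vendor: str, msg: dict) -> Quote:
    return ADAPTERS[vendor](msg)


def spread(quote: Quote) -> float:
    """Analytics see only the internal schema, never a vendor format."""
    return quote.ask - quote.bid
```

Because `spread` depends only on `Quote`, replacing one vendor with another leaves the analytics code untouched.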
Working with streaming and real-time data
Kx’s approach is to combine streaming data with massive amounts of data at rest, and to be able to capture multiple different data sources simultaneously, running computations on data in flight. This enables real-time decisioning and real-time insights across global markets and multiple asset classes, whilst at the same time giving users a simple interface to query data and run computations such as rebuilding an order book, for example. The Kx platform is designed to do this at scale, in a real-time environment, in a secure way, while also integrating with enterprise environments. The platform is also interoperable with visualisation tools, machine learning tools, and security and authentication products, thereby giving users horizontal capability across the organisation.
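To make the order book example concrete, here is a minimal Python sketch of rebuilding a price-level book from a stream of updates. It is not Kx's implementation, which is not described in the session. The update format assumed here is a `(side, price, size)` tuple in which `size` is the new total quantity at that price level and a size of zero removes the level.

```python
def rebuild_book(updates):
    """Replay price-level updates and return (bids, asks) as price -> size dicts."""
    bids, asks = {}, {}
    for side, price, size in updates:
        if side == "B":
            book = bids
        elif side == "S":
            book = asks
        else:
            raise ValueError(f"unknown side {side!r}")
        if size == 0:
            book.pop(price, None)  # level removed
        else:
            book[price] = size     # level added or replaced
    return bids, asks


def top_of_book(bids, asks):
    """Best bid is the highest bid price; best ask is the lowest ask price."""
    best_bid = max(bids) if bids else None
    best_ask = min(asks) if asks else None
    return best_bid, best_ask
```

Replaying the same updates up to any chosen point reproduces the book as it stood at that moment, which is what makes this kind of computation useful for both trading research and surveillance.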
Regarding surveillance against market abuse, it should be noted that not all types of potential manipulation can be detected in real time. Often, a complete day of data or even multiple days of data is required to find patterns and properly analyse them. However, the more potentially manipulative or suspicious activity firms can detect and stop in real time, the better. The longer it takes for such behaviour to be detected, the more severe the punishment the firm is likely to receive from the regulator. Also, if a firm detects that an algo is being manipulated by external market participants and is able to stop it right away, P&L is protected.
The importance of integrating streaming data in real time lies not only in analysing the data ‘on the wire’, but in looking at that data within the context of history. Firms need to be able to analyse trends by running large-scale analytics and computations using statistical data and embedding those data points into a real-time stream.
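One simple way to place streaming data in historical context is to compute a baseline from data at rest and score each live event against it. The Python sketch below does this with a z-score on traded volume. The event fields, the choice of population standard deviation and the threshold of 3.0 are illustrative assumptions, not recommendations from the panel.

```python
from statistics import mean, pstdev


def build_baseline(history):
    """Map each symbol to (mean, standard deviation) of its historical volumes."""
    return {sym: (mean(vols), pstdev(vols)) for sym, vols in history.items() if vols}


def enrich(event, baseline):
    """Return a copy of the live event with a z-score against history added."""
    stats = baseline.get(event["symbol"])
    zscore = None
    if stats is not None and stats[1] > 0:
        mu, sigma = stats
        zscore = (event["volume"] - mu) / sigma
    return {**event, "zscore": zscore}


def stream_alerts(events, baseline, threshold=3.0):
    """Yield enriched events whose volume deviates strongly from history."""
    for event in events:
        enriched = enrich(event, baseline)
        if enriched["zscore"] is not None and abs(enriched["zscore"]) > threshold:
            yield enriched
```

Because `stream_alerts` is a generator, each event is scored as it arrives rather than after the stream has been collected.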
From an algorithmic trading perspective, real-time data is more important in unexpected circumstances than in ‘business as usual’ situations. Algos may have been designed for a completely different market regime than a firm might currently find itself in. It should therefore be possible to analyse current market circumstances, so that the best decision or the appropriate algorithm can be applied.
In summary, six key takeaways emerged from the panel discussion:
- Extracting information directly from market data is the ideal approach, but firms should understand that this is not a trivial task.
- When implementing a data management solution for trading or surveillance, think ahead, with at least a five-year plan.
- Position your data mining platform as close to the source data as possible.
- Adopt a real-time mindset.
- Include cloud in your long-term plans, even if you’re not planning on implementing anything on the cloud over the next couple of quarters.
- Always build with scalability in mind.