Machine-readable news technology enables trading and investment firms to process unstructured news data so that it can be incorporated into the investment process. It has become a powerful tool for firms seeking a competitive edge in today’s markets, as it can help them derive sentiment, which in turn can be used to generate alpha.
But how can firms separate the signals from the noise? And what are the practical considerations of integrating machine-readable news into a trading environment?
This was the topic of discussion at a recent A-Team Group webinar, ‘Integrating Intelligent Machine Readable News’, sponsored by Moody’s Analytics and featuring panellists Andrea Nardon, Chief Quant Officer at Black Alpha Capital; Gurraj Singh Sangha, Chief Quantitative Investment Officer at Token Metrics; Saeed Amen, Founder at Cuemacro; and Sergio Gago Huerta, Managing Director of Media Solutions at Moody’s Analytics.

The webinar began with panellists outlining where machine-readable news can add value, particularly for traders operating on a short-term time horizon. Panellists agreed that machine-readable news is more appropriate for firms operating at a micro, rather than a macro, level, particularly if the firm can develop an understanding of how events from the unstructured news world relate to one another, and how that can be translated into actionable signals.
One panellist pointed out that for market makers, integrating machine-readable news is not just about alpha generation, it is also about risk control, citing how they might immediately widen their bid/offer spreads following a news item, in order not to get ‘steamrollered’ by people taking liquidity.
A prerequisite for firms operating at a micro level is to have well-structured knowledge graphs, with everything mapped appropriately to tickers. However, setting up such a knowledge graph can be a complex process, so this is an area where firms typically rely on the vendor community to create these graphs and keep them constantly up to date with appropriately tagged metadata.
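The idea of mapping news entities to tickers can be sketched as a simple alias-to-instrument lookup. This is a minimal illustration, not a production knowledge graph; all entity names, tickers, and fields here are hypothetical.

```python
# Illustrative sketch (all names hypothetical): a minimal "knowledge graph"
# that resolves raw news mentions to tradeable tickers, with tagged metadata.
from dataclasses import dataclass, field

@dataclass
class EntityNode:
    name: str                                  # entity as it appears in news text
    ticker: str                                # mapped instrument ticker
    aliases: set = field(default_factory=set)  # alternative spellings seen in news
    sector: str = ""                           # example metadata tag

class KnowledgeGraph:
    def __init__(self):
        self._by_alias = {}  # lowercase alias -> EntityNode

    def add(self, node: EntityNode):
        # Index the canonical name and every alias for case-insensitive lookup.
        for alias in {node.name, *node.aliases}:
            self._by_alias[alias.lower()] = node

    def resolve(self, mention: str):
        """Map a raw news mention to a ticker, or None if unknown."""
        node = self._by_alias.get(mention.lower())
        return node.ticker if node else None

kg = KnowledgeGraph()
kg.add(EntityNode("Acme Corp", "ACME",
                  aliases={"Acme", "Acme Corporation"},
                  sector="Industrials"))
print(kg.resolve("acme corporation"))  # ACME
print(kg.resolve("Unknown Co"))        # None
```

A real vendor-maintained graph would also track relationships between entities (suppliers, parents, competitors), which is what allows events to be related to one another as the panellists described.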
A number of challenges were discussed. One panellist highlighted the regulatory hurdles that need to be considered, for example, compliance with the US SEC requirement to embargo or handicap content so that it is delivered to everyone at the same time. Another concern was raised around the exclusivity of datasets: if a firm has a news data feed that is exclusive to it, and the vendor does not sell that feed to anyone else, the data could potentially be classified as non-public information, which regulators might take issue with.
Other challenges discussed revolved around the accuracy and timeliness of the news data. Data accuracy is key if false signals are to be avoided. And latency-sensitive firms such as high frequency traders and market makers need to ensure that they can receive, process, and act on the data without delay. This means that a typical cloud-based data infrastructure designed for big data and data science may not suffice, and firms may need to consider co-located infrastructures, with the absolute minimum network latency between the receipt of the news feed, the data extraction from it, and the signal generation from that content.
In terms of practical approaches to integrating machine-readable news and pitfalls to avoid, panellists made a number of recommendations.
As there are many different sources of news data, it is likely that firms will need to aggregate a number of sources, some of which might already be aggregated. Panellists recommended not sticking to one specific source, but combining sources in a way that can provide unique insights. Firms should therefore consider what types of sources a trader would look at, before automating the process and replicating it on a systematic basis, and use their domain expertise to filter the news in the right way. Social media can be a useful addition to derive sentiment, but as it can often be inaccurate, it should be combined with news from authoritative sources to improve trading signals.
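One simple way to combine authoritative and noisier sources is a weighted blend of per-source sentiment, with social media down-weighted. The sketch below is purely illustrative: the source names, weights, and scores are assumptions, not values from the webinar.

```python
# Hedged sketch: blend sentiment from several feeds, down-weighting noisier
# sources such as social media. All weights and scores here are made up.
def blended_sentiment(scores: dict, weights: dict) -> float:
    """Weighted average of per-source sentiment scores in [-1, 1]."""
    total_w = sum(weights[s] for s in scores)
    return sum(scores[s] * weights[s] for s in scores) / total_w

# Hypothetical source reliability weights and one item's sentiment per source.
weights = {"newswire": 0.6, "regulatory": 0.3, "social": 0.1}
scores = {"newswire": 0.4, "regulatory": 0.2, "social": -0.8}

print(round(blended_sentiment(scores, weights), 2))  # 0.22
```

Note how the strongly negative social-media score barely moves the blend, reflecting the panellists' point that social sentiment should supplement, not drive, the signal.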
As integrating news feeds can be a costly and complex enterprise, panellists agreed that there’s no point embarking on the journey unless budget is available to research the subject, purchase the data and develop the appropriate models.
Costs can however be minimised at the start, by doing one thing at a time. First, get the sources and use them manually, as if in front of a trading terminal, but use those sources as the principal input, to evaluate what works and what doesn’t. Only then, start automating the NLP and other elements to extract the signal from the noise. Look at how the filters are working. What correlations are there, and over what time horizon?
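The first automation step can be as crude as a keyword lexicon applied to headlines, the kind of filter one might sanity-check by hand before investing in fuller NLP. The lexicon and headlines below are invented for illustration.

```python
# Illustrative first-pass filter (lexicon and headlines are hypothetical):
# a naive keyword score whose output can be eyeballed against what a trader
# would conclude, before any heavier NLP is automated.
POSITIVE = {"beats", "upgrade", "record"}
NEGATIVE = {"misses", "downgrade", "probe"}

def headline_score(headline: str) -> int:
    """Count positive minus negative lexicon hits in a headline."""
    words = set(headline.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

headlines = [
    "Acme beats estimates, sets record quarter",
    "Regulator opens probe into Acme accounting",
]
for h in headlines:
    print(headline_score(h), h)  # 2 ..., then -1 ...
```

Once scores like these exist, the next question the panellists raise, what correlations exist and over what horizon, becomes testable by aligning the scores against subsequent returns.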
Another recommendation was to set benchmarks and success criteria. It’s always important to be as clear as possible from the start, as to why you’re doing this, what you intend to achieve and how you can measure success against those benchmarks.
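A concrete success criterion might be the signal's hit rate against subsequent returns, compared to a pre-agreed benchmark. The metric and threshold below are examples of the kind of benchmark one could set, not recommendations from the webinar.

```python
# Hypothetical success metric: the fraction of periods where the signal's
# direction matched the subsequent return's direction. The 55% benchmark
# and the sample data are illustrative assumptions.
def hit_rate(signals, returns) -> float:
    """Share of periods where sign(signal) == sign(return)."""
    hits = sum(1 for s, r in zip(signals, returns) if s * r > 0)
    return hits / len(signals)

signals = [1, -1, 1, 1, -1]           # +1 = long signal, -1 = short signal
returns = [0.4, -0.2, -0.1, 0.3, -0.5]  # subsequent period returns

rate = hit_rate(signals, returns)
print(rate, rate > 0.55)  # 0.8 True
```

Fixing such a threshold up front makes "measuring success against those benchmarks" an objective pass/fail check rather than a post-hoc judgement.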
And finally, test, test and keep testing.