By Stephane Dubois, Founder and Chief Executive Officer, Xignite.
Over the last two to three years, several vendors have come into the market data space focused on selling financial data cheap. This is not a new phenomenon. We have seen it happen time and time again. After all, Xignite’s first API sold for as little as $19.95/month back in 2003. It was never our mission to be cheap – we simply did not fully grasp the value of our data. Between affordable rates, streamlined integration and the elimination of a lot of legacy infrastructure, our clients derived significant cost savings.
In contrast, today’s newcomers have made it their mantra to sell data cheap. To be fair, they sometimes have a point that market data is too expensive. Traditional providers like Bloomberg and Refinitiv have not been shy about using their stronghold on the market to charge unreasonable prices. At the same time, the quality of the data those providers deliver is often significantly higher than that of the newcomers, whose data in the best case comes from second-tier providers of lesser quality, and in the worst case is illegitimately scraped from public websites or outright and shamelessly stolen from others.
So, this raises the question of what quality market data is worth, and how one can distinguish quality data from low-quality data.
Data is a tricky product to buy because it is intangible. You cannot feel and touch its quality the way you can tell a real Louis Vuitton bag from a pale imitation. You cannot take a quick glance and know. It takes use and extensive testing before you detect the gaps and bad data points, or run into the missing corporate actions that will cost you weeks of work and tens of thousands of dollars. It’s a bit like buying a lemon car. You could not tell it was a lemon from the outside. It’s not until you are stranded in the middle of nowhere, paying hundreds of dollars for a tow truck while your vacation is ruined, that you realize you’ve been had.
On top of that, non-professional data buyers, such as individual investors, tend to have a highly biased perception of the value of data. That perception is shaped by the fact that a lot of data is available for free on Yahoo! Finance (this data is paid for through advertising instead of user fees, and it is certainly not available for commercial use). The fact that data can be easily copied also contributes to that perception. Many inexperienced buyers have no idea what causes market data to be bad or how much effort it takes to keep it accurate. They see a premium historical data set here and a cheaper version there, cannot tell the difference, and go for the cheap one. Then, depending on their use case, they might keep limping along with the cheap data set, working around every data problem that pollutes their existence, or they may start swearing by all the Norse gods that they will never be had again.
So, what does all this have to do with TSLA and AAPL?
As you may remember, TSLA and AAPL, two of the most widely traded and eyeballed stocks in the industry, both split on the same day, August 31, 2020. The impact of a stock split on the underlying market data is significant. The stock price is divided by the split ratio, volumes are multiplied by it, all historical data for the security (end-of-day prices as well as intraday data) must be adjusted, and so on. If you don’t do this right, it will stick out like a sore thumb on your historical charts. If this happens on stocks as closely tracked as TSLA and AAPL, everyone will notice.
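To make the adjustment concrete, here is a minimal Python sketch, assuming a simple end-of-day record holding only a close and a volume. The `Bar` type and `adjust_for_split` function are hypothetical names for illustration, not Xignite’s API. For a split of `ratio` new shares per old share (AAPL split 4-for-1 and TSLA 5-for-1), every bar before the ex-date has its price divided by the ratio and its volume multiplied by it, so the series stays continuous across the split.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bar:
    """One end-of-day record (hypothetical schema for illustration)."""
    day: date
    close: float
    volume: int


def adjust_for_split(bars, ex_date, ratio):
    """Return split-adjusted copies of the bars; the input list is not modified.

    For a `ratio`-for-1 split, bars before `ex_date` get their close divided
    by `ratio` and their volume multiplied by `ratio`. Bars on or after the
    ex-date already trade on the post-split basis and are left unchanged.
    """
    adjusted = []
    for bar in bars:
        if bar.day < ex_date:
            adjusted.append(Bar(bar.day, bar.close / ratio, round(bar.volume * ratio)))
        else:
            adjusted.append(bar)
    return adjusted


# Illustrative numbers only, not actual AAPL or TSLA history.
history = [Bar(date(2020, 8, 28), 500.0, 1_000), Bar(date(2020, 8, 31), 125.0, 4_000)]
adjusted = adjust_for_split(history, date(2020, 8, 31), 4)
print(adjusted[0])  # Bar(day=datetime.date(2020, 8, 28), close=125.0, volume=4000)
```

This sketch handles a single split on closing prices only. A real pipeline would also adjust open, high and low prices, intraday bars and per-share values such as dividends, and would have to compose several corporate actions in sequence.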
Given the importance of TSLA and AAPL to retail trading, the ramifications of those splits were significant. There was unusual market activity that day, and two of the most popular trading platforms (Robinhood and TD Ameritrade) went down for a while due to the impact of the splits on their operations. Schwab and Vanguard were affected as well. On the surface, those outages were not market data related; they may have been caused by the unusual activity that day. And while that is understandable in a unique situation like this, it was still surprising to see such massive businesses stumble over a very predictable event.
But most interesting is the fact that at least two of the emerging market data players missed the splits altogether. As a result, clients who depended on them for accurate data did not get it. The impact on their business was probably significant.
The question is: how can a financial data provider miss such widely advertised splits?
Most likely, those vendors relied on their upstream provider to get the data right, and that upstream provider missed it. But it also means those vendors had neither manual nor automated processes in place to verify that such impactful data points were not missing. And if you miss two such significant market events, what else are you missing?
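What might such an automated check look like? Below is a minimal Python sketch of one possible safeguard, assuming a daily series of closes and a set of ex-dates for which a corporate action is on file. The function name, input shapes and the 40% threshold are all hypothetical choices for illustration. The idea is that a large day-over-day price move with no recorded corporate action is exactly the signature of a missed split, so it is raised as an exception for a human to review rather than silently accepted.

```python
from datetime import date


def flag_suspected_splits(closes, known_action_dates, threshold=0.4):
    """Flag large day-over-day moves that have no recorded corporate action.

    closes: list of (date, close) tuples sorted by date (hypothetical input).
    known_action_dates: set of ex-dates with a corporate action on file.
    threshold: relative move that triggers review; 0.4 means a 40% change.
    Returns (date, implied_ratio) pairs, where implied_ratio is the previous
    close divided by the current close (about 4.0 for a missed 4-for-1 split).
    """
    exceptions = []
    for (prev_day, prev_close), (day, close) in zip(closes, closes[1:]):
        move = abs(close / prev_close - 1.0)
        if move > threshold and day not in known_action_dates:
            # Real news can also move a stock this much, so this routes the
            # date to a reviewer instead of correcting the data automatically.
            exceptions.append((day, prev_close / close))
    return exceptions


# Illustrative numbers only: a 4x overnight drop with no split on file.
series = [(date(2020, 8, 28), 500.0), (date(2020, 8, 31), 125.0)]
print(flag_suspected_splits(series, set()))  # [(datetime.date(2020, 8, 31), 4.0)]
```

A reverse split shows up the same way, with an implied ratio below 1. Once the corporate action is loaded, its ex-date is in `known_action_dates` and the exception clears.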
This perfectly illustrates the difference between a premium data provider and a cheap one. At Xignite, we worked through the weekend to ensure no data points would be missing and held an all-hands-on-deck Monday morning to get ahead of any potential issues. I am sure that many other data vendors did the same. Cheap market data may be alluring on the surface, but it will never offer this level of expertise and thoroughness, making it far less economical in the long run.
Market data is complex, and corporate actions (such as splits, dividends and mergers) are one of its most complex aspects. Providers who take the time to build a robust, differentiated offering are uniquely positioned to help in this area. For example, after years of relying only on our data sources and automated processes for accuracy, we have invested in building a team whose sole purpose is data quality and preventing issues such as the missing TSLA and AAPL splits.
Building this capability is expensive and complicated. Beyond staffing, it requires an architecture that decouples data types (pricing, corporate actions) and automatically stitches historical data together based on corporate actions, as well as a long list of specific data quality processes, including cross-validation across APIs and sources, statistical monitoring, exception reporting and manual overrides for complex corporate actions.
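As an illustration of the first of those processes, cross-validation across sources, here is a minimal Python sketch. It assumes two independent feeds that each map a symbol to its latest close; the function name, the dictionary inputs and the 0.5% tolerance are hypothetical choices, not a description of Xignite’s actual checks. Any symbol whose prices disagree beyond the tolerance, or that appears in only one feed, is reported as an exception.

```python
def cross_validate(primary, secondary, tolerance=0.005):
    """Compare closing prices from two sources and report disagreements.

    primary, secondary: dicts mapping symbol -> close (hypothetical feeds).
    tolerance: maximum allowed relative difference (0.005 means 0.5%).
    Returns a list of (symbol, reason) exceptions sorted by symbol.
    """
    exceptions = []
    for symbol in sorted(primary.keys() | secondary.keys()):
        a = primary.get(symbol)
        b = secondary.get(symbol)
        if a is None or b is None:
            exceptions.append((symbol, "missing from one source"))
            continue
        denom = max(abs(a), abs(b))
        # Relative difference against the larger value; two zero prices agree.
        if denom and abs(a - b) / denom > tolerance:
            exceptions.append((symbol, f"mismatch: {a} vs {b}"))
    return exceptions


# Illustrative numbers only: the second feed still shows an unadjusted price.
feed_a = {"AAPL": 125.0, "TSLA": 500.0}
feed_b = {"AAPL": 500.0, "TSLA": 500.0, "MSFT": 200.0}
print(cross_validate(feed_a, feed_b))
# [('AAPL', 'mismatch: 125.0 vs 500.0'), ('MSFT', 'missing from one source')]
```

Comparing against the larger of the two values keeps the check symmetric, so it does not matter which feed is treated as primary.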
Doing so means admitting that, as tech people, we cannot automate everything. Quality also requires manual intervention, so if you only have a handful of employees, it is almost impossible to achieve. What it also means is that to get high-quality data, you should expect to pay a premium over the cheaper alternatives. With market data, you truly get what you pay for.