
Has the institutional alternative data market reached a phase in which the easy sources of edge have closed? Datasets that once generated standalone alpha are widely distributed, the AI tooling layered on top of them is increasingly commoditised, and the differentiator has migrated to a less glamorous middle ground: validation, transformation, kill criteria, and the human judgement wrapped around the technology.
That was the line of argument running through the opening Leaders Panel at the A-Team Group/Eagle Alpha Alternative Data Conference in London, where production – rather than acquisition – emerged as the binding constraint.
The panel, moderated by Julia Meigh, alternative data specialist, featured Matthew Bell, senior data scientist at Man Group; Timothée Consigny, chief technology officer and head of GenAI innovation at H2O Asset Management; Renato Guerrieri, head of quantitative strategy – liquid alternatives at Downing; Rob Schack, senior vice president of partnerships at Decision Desk HQ; and Shivani Shah-Knowles, team manager, account management at Sensor Tower.
Access Has Stopped Paying
One panellist noted that simply having a particular dataset is no longer a source of edge in itself, as most datasets in 2026 are widely distributed and rival funds are likely looking at the same feeds. The same panellist made the point that a backtest Sharpe ratio of 2.5 routinely shrinks to 0.8 once a strategy goes into production and trades live. The reference point offered was 2016, when a single alternative dataset could generate alpha on its own. That window, the argument ran, has closed.
The first audience poll was consistent with that conclusion: respondents named access to unique and proprietary datasets, as opposed to widely distributed ones, as the most important source of edge. Another panellist linked the result to a recurring client request for bespoke datasets engineered against a specific thesis rather than prepackaged products. Standardised distribution is now table stakes; the commercial conversation has moved to whether a vendor can deliver something the buyer’s competitors do not also have.
AI Is a Commodity, Not a Differentiator
The panel was unsparing on what one contributor described as “AI washing,” with every vendor now claiming an AI solution and the word “agent” doing inconsistent lifting – in some cases describing little more than a wrapper prompt around a frontier model. Another panellist offered a structural framing: a workflow has three components – model, data and user – and the model itself has become the least durable. Models change rapidly, data remains important, but the user and the use case are what persist.
The argument was made sharper later. The major AI systems, suggested one speaker, are available to every firm in the market, from the largest multi-strategy hedge funds to the smallest asset manager. The differentiator is no longer the model but the institutional process wrapped around it: the prompts, the documentation, the team-specific workflows, the conventions for how analysts interact with the tools. The human factor is reasserting itself, on this view, precisely because the underlying technology has stopped being scarce.
One panellist made the point that black-box AI-generated vendor signals are increasingly difficult to onboard on the same logic. If a researcher cannot explain how a signal is constructed, the firm will not trade real money on it. Vendors building defensible product are using AI to clean and enhance existing datasets, rather than marketing AI-generated signals as standalone offerings.
The P-Hacking Problem Gets Worse, Not Better
If access has stopped paying and the model layer is commoditised, where does AI genuinely add value? The panel’s answer was qualified. AI is changing the discovery stage – surfacing questions, summarising research, and making qualitative data such as earnings transcripts and sell-side notes systematically usable in ways that previously required dedicated machine learning engineering. It is also unlocking textual datasets at a scale that would not previously have justified researcher time: one panellist gave the example of using news data to track copper mine outages in Chile, a niche no team would staff manually but that an AI workflow can surface alongside thousands of similar ideas.
The risk, multiple panellists agreed, is that the same workflow industrialises P-hacking – the practice, knowing or unwitting, of running enough statistical tests across a dataset that a spurious correlation eventually appears significant by chance alone. The point was made that if clustering techniques are run a sufficient number of times across a large textual dataset, one of the resulting signals is almost guaranteed to look excellent in a backtest without any underlying economic logic. The countermeasure, the argument ran, remains a researcher capable of asking whether there is a fundamental economic reason for the signal to exist. A separate panellist made an adjacent argument: textual data retrieval has stopped being the bottleneck, and the constraint has shifted to distinguishing statistically real signals from economically viable ones, where liquidity, capacity and overcrowding determine whether a backtested signal survives implementation. A further concern raised was an emerging closed loop in which one AI drafts research notes processed by another AI to extract signal, with informational content potentially degraded at each step.
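The multiple-testing effect the panel described can be illustrated with a toy simulation. The sketch below is not anything the panellists presented: it assumes daily returns that are pure Gaussian noise with zero mean (so the true Sharpe ratio of every "strategy" is zero), a risk-free rate of zero, and a 252-day year, and the function names are invented for illustration. It simply reports the best backtest Sharpe ratio found as the number of candidate signals grows.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # assumed length of one backtest year


def annualised_sharpe(daily_returns):
    """Annualised Sharpe ratio of daily returns (risk-free rate assumed zero)."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(TRADING_DAYS)


def best_noise_sharpe(n_signals, n_days=TRADING_DAYS, seed=0):
    """Backtest n_signals pure-noise strategies and return the best Sharpe seen.

    Every strategy has zero expected return by construction, so any
    impressive number returned here is a product of selection alone.
    """
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_signals):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, annualised_sharpe(returns))
    return best


if __name__ == "__main__":
    # The more candidates are tried, the better the "winner" looks.
    for n in (1, 10, 100, 1000):
        print(n, round(best_noise_sharpe(n), 2))
```

Under these assumptions the best of a thousand noise strategies will usually show a backtest Sharpe ratio that looks tradeable, which is why the panel's test of economic rationale cannot be replaced by the backtest itself.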
Where New Signal Is Genuinely Coming From
The strongest single area of agreement on novel signal sources was prediction markets. Multiple panellists flagged Kalshi and Polymarket as a source the market is taking seriously, particularly around US political and policy events. The specific use case offered was using prediction market data to quantify the impact of late-cycle news events – scandals, surprise announcements – where polling cannot keep up. One panellist noted that big news stories often have a smaller effect on outcomes than headlines suggest, and that prediction markets provide the first reliable quantitative read on which stories matter and which do not.
Beyond prediction markets, the panel pointed to expanding use cases in private markets, credit, FX and commodities. Private markets benefit structurally from the absence of standardised disclosures, which makes datasets such as app usage and headcount data more valuable. As AI lowers the technical barrier to onboarding, the user base is also widening beyond firms with large data science teams.
The Capabilities That Matter Now
Asked what now constitutes a must-have, the responses converged on a short list. On the data product itself: clean mapping to tickers or company identifiers, long point-in-time history, and the ability to trial data ahead of contract. For systematic macro use, 20 years of point-in-time history was flagged as an entry point – a meaningfully higher bar than quant equity teams typically require. Coverage breadth was identified as a primary multiplier: more instruments traded means more alpha, which in turn justifies more spend.
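For readers less familiar with the term, point-in-time history means a backtest can only see the value that was actually known on a given date, not a later revision. A minimal sketch of that lookup follows; the release records, dates and the `value_as_of` helper are hypothetical and do not reflect any vendor's schema, and the list is assumed to be sorted by publication date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical records: (date the value became known, reported value).
# A revision is stored as a new row instead of overwriting the earlier one.
RELEASES = [
    (date(2024, 1, 15), 100.0),  # first print
    (date(2024, 2, 10), 104.0),  # later revision of the same figure
    (date(2024, 3, 15), 110.0),  # next period's first print
]


def value_as_of(releases, when):
    """Return the latest value published on or before `when`.

    Returns None if nothing had been published yet. Assumes `releases`
    is sorted in ascending order of publication date.
    """
    known_dates = [published for published, _ in releases]
    idx = bisect_right(known_dates, when)
    return releases[idx - 1][1] if idx else None


if __name__ == "__main__":
    # On 1 February only the first print was known, not the revision.
    print(value_as_of(RELEASES, date(2024, 2, 1)))
```

A dataset that keeps only the latest revised value would hand the backtest information that was not available at the time, which is one way a backtested signal can flatter itself.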
The forward-looking capability was first-party data ownership – proprietary panels and primary collection rather than scraped or aggregated public sources. As AI makes public-source aggregation easier, vendors whose entire offering rests on that aggregation become structurally exposed. A complementary argument was made for rebuilding any alt data stack today as a modular architecture: buying where capability is genuinely commoditised, building internally where context matters, and forcing every dataset to answer a specific question rather than entering an inventory of speculative holdings.
On time-to-insight, the panel was clear: time-to-data is no longer the relevant measure; what matters is time to a validated investment decision. The best teams are not just fast at testing ideas – they are faster at killing the ones that do not survive scrutiny. In an environment where research time is the scarcest input, an inability to discard ideas quickly compounds into what one panellist described as self-deception at scale.
The second audience poll, on where firms are increasing investment in 2026, returned new data sources as the leading answer. That reads naturally against the rest of the discussion: if undifferentiated access is no longer edge but proprietary access still is, the search for genuinely novel sources is the rational response. A tougher question – and one the panel left unresolved – is whether the next phase of competition will be won at the sourcing layer at all, or whether the durable edge has already moved to the institutional process that decides what to do with the data once it arrives.