
A trading firm will pay for 15 years of price history from a market data vendor and, on a parallel housekeeping schedule, purge its own order flow, client activity and system logs after five to seven years. The vendor history is treated as an asset worth a recurring licence fee. The firm’s own data – generated by its desks every day – is treated as a storage cost to be cleared. That asymmetry, raised in the closing minutes of a panel on data as a competitive differentiator at A-Team Group’s recent TradingTech Summit New York, is the contradiction running underneath much of the current enthusiasm for alternative data: institutions are scouring the market for external signal while letting potentially the most differentiated signal they own expire on a retention timer.
The panel’s nominal subject was the “data product” mindset – the shift from treating data as a service piped in from outside to treating it as an owned product with clear lineage, quality controls and accountability. But a thread picked up repeatedly across the discussion was about a specific category of owned data: the operational telemetry generated inside the firm, and why so little of it ever becomes an investable signal.
Why The Question Has Sharpened
The interest in internal data is not new. What has changed is the cost of getting it wrong. One line of argument held that artificial intelligence has turned data quality from a hygiene problem into a model-risk problem. In the pre-AI workflow, poorly sourced data produced a bad report that a human might catch. In an AI workflow, the same data feeds a model that is then deployed at scale, producing thousands of decisions before anyone notices the input was flawed. A single bad volatility surface entering a pricing model was offered as the kind of error that no longer stays contained. That shift, one participant suggested, is what has moved the data conversation out of the technology function and into the boardroom.
It is the same logic that has been driving the push to capture trading-floor “dark data” – the client intent and market colour locked inside unstructured chat streams – which Market & Alt Data Insight sister publication TradingTech Insight examined in its feature on agentic workflows on the trading desk. That earlier discussion concluded that platform openness and AI maturity had finally made the external-facing exhaust capturable, and that the question was no longer whether firms should act but how. The New York panel, in effect, turned the same lens inward – and found that for a firm’s own operational data, the “how” is where the difficulty actually sits.
From Exhaust To Signal
The raw material is plentiful. Customer activity logs, transaction and operations data, order flow, RFQs that missed, execution outcomes – the panel described a continuous stream of operational data that most firms generate and few systematically mine. The constraint is not availability. It is the work required to turn that raw output into something a model can use.
New datasets, internal ones included, rarely arrive in production-ready form. The discussion traced a sequence that has to happen before any of this becomes useable: governance and privacy controls first, then normalisation, then testing and validation, then the modelling and quant research that actually extracts a signal. Large language models were credited with a genuine but bounded role in that chain – streamlining metadata labelling, mapping and anomaly detection – but the panel located the differentiation in the domain expertise applied to the data rather than in the tooling itself. One participant put the distinction directly: AI technology will become commoditised; human insight will not.
Where firms follow that sequence, the payoff described was concrete. One trading desk recounted building from its own telemetry – including debit and credit card activity and other client-activity data – first to support an internal equity research platform, then cross-pollinating that data across other businesses, and ultimately packaging and monetising it for external clients. The progression matters: the same data asset moved from internal input, to cross-desk resource, to a product sold outside the firm. It is the clearest available demonstration that the internal data contains signal worth the processing cost – and that the firms which will outperform are not those consuming the most datasets but those with a disciplined framework for judging which signals are worth onboarding and supporting at scale.
Is Ownership the Obstacle, Not Architecture?
If the opportunity is understood and the worked example exists, the obvious question is why internal data so often goes uncaptured. The panel’s answer was consistent: the failure is one of incentives rather than capability.
Silos, more than one participant argued, are not built by technologists. Technology has no reason to create them; it generally inherits the resulting headache. Silos are built by organisational structure – by business units with competing revenue mandates, each with its own definition of a shared concept and its own reasons to guard its data. The same trade notional can carry a different definition on every desk that touches it. A model on one desk may require an input parameterised in a way that breaks another desk’s convention. Each divergence becomes a reason to keep data local, and the accumulated technical and business debt behind those local arrangements becomes a reason not to disturb them.
That makes the data-product shift a governance problem before it is an engineering one. The technological roadmap was described as already available – a common data catalogue so people can find what exists, data lineage so they can trust it, and granular access control so they can use it compliantly – but the panel was emphatic that the three only work in concert, and only when an incentive structure mandates their use from the top down. The point was made that data governance has to be mandated, with data sharing made frictionless, because business units will not volunteer for it: efficiency is treated as a technology department’s job, while the business invests in what generates revenue. On that reading, the firm-wide sharing of internal data is unlikely to emerge from the desks on its own, however valuable the signal sitting inside them.
There was a counter-current worth noting. The dark-data feature referenced earlier had argued that the most transformative uses would come not from technologists prescribing them but from desks taking ownership and experimenting. The New York panel did not dispute that desks are where the value gets realised; its argument was that desks will not take that ownership while the incentive structure rewards holding data as a local competitive advantage rather than treating it as shared institutional infrastructure. The analogy offered was compute: business units once hoarded on-premise servers as a form of sovereignty, until cloud economics made hoarding pointless and the culture changed. Internal data, the suggestion ran, has yet to go through its equivalent shift.
What It Leaves Open
The panel closed on a reframing rather than a solution. The constraint it kept describing was, at root, a question of leadership: everyone has access to the same tools, and what differs is whether an organisation is structured – and incentivised – to treat its own operational data as an asset worth owning rather than a cost to be cleared.
For buy-side and sell-side data functions, that leaves a practical question rather than a tidy takeaway. The external alternative data market will keep expanding, and the discipline of evaluating whether a bought signal justifies its cost is now well established. The harder discipline may be the one the panel kept returning to: building the internal ownership, governance and accountability that would let a firm recognise the signal it already generates – before the retention policy deletes it.