
Validating GenAI Models in Finance: A Q&A with Chandrakant Maheshwari on Risk, Governance, and the Rise of Agentic AI


At a recent RegTech Insight Advisory Board session, a discussion with Chandrakant on generative AI (GenAI) and model risk management underscored the need to cut through the hype and myths around GenAI and emerging agentic AI in regulated markets.

This Q&A is the result. It examines why traditional model validation techniques, such as ROC curves and confusion matrices, can’t capture the behaviour of narrative-driven AI, and how a “small wins” rollout (from policy summarization to document comparison) builds practical ‘governance scaffolds’ and human-in-the-loop guardrails. Along the way, it addresses GenAI’s blind spots (hallucinations, prompt sensitivity, and bias amplification) and how continuous monitoring and scenario testing deliver controls.

Why has GenAI disrupted the way we think about model risk and validation?

Because GenAI isn’t just another model. It rewrites the rules of how models interact with data, users, and decisions. Traditional validation was built for structured, deterministic systems; think of models that generate a number, a score, or a classification. GenAI generates narratives. It reacts to tone. It adapts to phrasing. This fluidity creates both power and unpredictability.

We’ve now entered an era where model outputs are probabilistic, context-sensitive, and user-dependent. That’s not a problem you solve with ROC curves and confusion matrices. You solve it with scenario testing, alignment evaluation, human oversight, and continuous monitoring.

You argue for a “small wins” approach to GenAI adoption. What does that look like?

It’s about proving value through low-risk, high-learnability use cases. Start with scenarios where GenAI augments, rather than replaces, human judgment. Think policy summarization, knowledge retrieval, document comparison. These allow you to build internal literacy, establish governance frameworks, and validate model performance incrementally.

Every small win teaches you how your people interact with GenAI, what kind of guardrails work, where the model drifts, and what quality assurance looks like in practice. That builds trust not just in the model, but in the organization’s ability to manage it responsibly.

What are the specific risks GenAI introduces that traditional validation may miss?

Several. First, hallucinations: confident but false outputs can pass undetected unless you’re running factuality checks. Second, prompt sensitivity: small variations in user input can lead to inconsistent results, which aren’t acceptable in compliance settings (a simple consistency check along these lines is sketched after this answer).

Third, bias amplification: models trained on large, uncurated datasets can produce discriminatory or misleading recommendations.

And finally, the biggest risk: agency drift. As GenAI becomes more autonomous, especially with tool use, memory, and multi-step reasoning, we’re entering the era of agentic AI. These systems don’t just respond to prompts; they pursue goals, take actions, and chain responses together. That changes everything about how we assess responsibility, traceability, and risk.
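
To make the prompt-sensitivity point concrete, here is a minimal sketch, assuming a hypothetical generate_answer stand-in for whatever model endpoint a firm actually uses: the same compliance question is asked in several paraphrased forms and the answers are compared, so divergent responses can be flagged for human review.

```python
# Minimal prompt-sensitivity sketch: ask the same question in paraphrased forms
# and flag the run if any pair of answers diverges too far.
from difflib import SequenceMatcher
from itertools import combinations

def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder: route the prompt to your GenAI endpoint.
    raise NotImplementedError

def consistency_check(prompts: list[str], threshold: float = 0.8) -> dict:
    answers = [generate_answer(p) for p in prompts]
    similarities = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in combinations(answers, 2)
    ]
    worst = min(similarities) if similarities else 1.0
    return {"worst_pair_similarity": worst, "flag_for_review": worst < threshold}

paraphrases = [
    "Can a client be onboarded before source-of-funds checks are complete?",
    "Is onboarding permitted when the source of funds is not yet verified?",
    "May we open an account while source-of-funds verification is pending?",
]
# result = consistency_check(paraphrases)  # escalate to a reviewer if flagged
```

Text similarity is a crude proxy; in practice firms layer factuality checks against source documents on top, but the structure is the same: fixed test prompts, repeatable scoring, and a human escalation path.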

Let’s go deeper on that—what’s your take on agentic AI in regulated industries?

It’s a double-edged sword. On one hand, agentic AI can automate complex, multistep compliance workflows. On the other hand, it introduces emergent behaviour that’s hard to fully anticipate or constrain. When a model executes a reasoning loop or autonomously selects tools to answer a query, the validation challenge becomes not just “What did the model say?” but “Why did it take this path? Was that path valid?”

To manage agentic AI, we need to evolve from static validation to intent-level validation. That means not just validating answers, but validating the model’s goals, decision boundaries, and tool invocations. And that requires a fusion of policy, ethics, risk, and AI engineering, something few firms are truly ready for. The emerging Model Context Protocol (MCP) standard will be important. MCP is an open standard, described as the “USB-C port for AI agents.” It promises to standardize tool integration into compliant, auditable workflows, enabling AI agents to plug into data sources, governance controls, and risk-management peripherals with traceable consistency.
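
As an illustration of intent-level validation, the sketch below, with an entirely hypothetical policy table and tool names and no reference to any specific MCP implementation, checks an agent’s proposed tool call against the tools approved for its stated goal before anything executes.

```python
# Minimal intent-level validation sketch: a proposed tool call is checked
# against the tools permitted for the agent's stated goal before execution.
from dataclasses import dataclass

@dataclass
class ToolCall:
    goal: str          # the intent the agent says it is pursuing
    tool: str          # the tool it wants to invoke
    arguments: dict    # the arguments it proposes

# Illustrative policy table: approved goals mapped to permitted tools.
POLICY = {
    "summarise_policy": {"allowed_tools": {"document_store.read"}},
    "screen_transaction": {"allowed_tools": {"document_store.read", "sanctions_list.lookup"}},
}

def validate_invocation(call: ToolCall) -> tuple[bool, str]:
    rule = POLICY.get(call.goal)
    if rule is None:
        return False, f"goal '{call.goal}' is not an approved intent"
    if call.tool not in rule["allowed_tools"]:
        return False, f"tool '{call.tool}' is not permitted for goal '{call.goal}'"
    return True, "invocation allowed; record it in the audit trail"

# Example: an agent approved only to summarise policy but trying to send email
# is blocked and escalated to a human reviewer.
ok, reason = validate_invocation(ToolCall("summarise_policy", "email.send", {"to": "client"}))
```

The point is that the validation target becomes the path, the goal, tool, and arguments taken together, not just the final answer.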

How does governance help create safety without stifling innovation?

Governance is not bureaucracy; it’s scaffolding. Good governance enables safe innovation by defining ownership, creating feedback loops, establishing red lines, and ensuring traceability. In practice, that means clear model usage policies, escalation paths for ethical concerns, model cards with limitations, and audit trails of who used what, when, and why.

Without governance, validation becomes guesswork. You can’t rely on accuracy metrics alone; you need process metrics. Was the prompt appropriate? Was the override logged? Was the reviewer’s judgment captured? These layers are what make GenAI safe enough to scale.
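
A minimal sketch of what capturing those process metrics could look like, with illustrative field names rather than any particular vendor’s schema: each interaction is appended to an audit trail, and a simple metric reports how many overrides lack a recorded reviewer judgment.

```python
# Illustrative audit-trail logging plus one process metric (unlogged overrides).
import json
from datetime import datetime, timezone

def log_interaction(user: str, prompt: str, output: str,
                    overridden: bool, reviewer_note: str | None) -> dict:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "prompt": prompt,
        "output": output,
        "overridden": overridden,
        "reviewer_note": reviewer_note,
    }
    # Append-only JSON lines file as a stand-in for a proper audit store.
    with open("genai_audit_trail.jsonl", "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

def unlogged_override_rate(entries: list[dict]) -> float:
    # Process metric: share of overrides with no reviewer judgment recorded.
    overrides = [e for e in entries if e["overridden"]]
    if not overrides:
        return 0.0
    missing = [e for e in overrides if not e["reviewer_note"]]
    return len(missing) / len(overrides)
```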

Many firms still use GenAI as a glorified search engine. What’s the right way to evolve?

Think of GenAI in phases. Phase 1 is retrieval: summarizing documents, answering questions. Phase 2 is judgment: flagging anomalies, proposing actions. Phase 3 is autonomy: executing tasks, adapting over time. But you can’t skip phases. Each phase demands more maturity in validation, governance, and human-in-the-loop controls.

If you’re using GenAI for policy Q&A, you need prompt testing and alignment scoring. If you’re using it to pre-review suspicious transactions, you need bias audits and scenario testing. If you’re deploying agentic workflows, you need guardrails that monitor decision paths and restrict tool access based on risk classification.
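
For the policy Q&A case, prompt testing with alignment scoring might look like the hedged sketch below; the test cases, required key points, and the generate_answer stub are all assumptions for illustration, not a prescribed method.

```python
# Illustrative prompt-test suite: each case pairs a policy question with the key
# points a correct answer must contain, and the model's answer is scored on coverage.
def generate_answer(prompt: str) -> str:
    raise NotImplementedError  # placeholder for the model call

TEST_CASES = [
    {
        "question": "When must enhanced due diligence be applied?",
        "required_points": ["politically exposed person", "high-risk jurisdiction"],
    },
]

def alignment_score(case: dict) -> float:
    answer = generate_answer(case["question"]).lower()
    hits = sum(1 for point in case["required_points"] if point in answer)
    return hits / len(case["required_points"])

# Run the suite on every model or prompt-template change and fail the release
# if any case drops below an agreed threshold (for example 1.0 for policy Q&A).
```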

What’s the biggest myth you see around GenAI deployment?

That speed is the advantage. It’s not. The real advantage is alignment with users, policies, regulations, and organizational goals. A GenAI model that’s fast but misaligned is worse than useless; it’s a liability. You may win a proof-of-concept demo, but lose regulatory trust, brand reputation, or customer confidence.

I tell teams: don’t aim for “deployment in 30 days.” Aim for “trust in 30 days.” That’s a much harder problem, but it’s the right one to solve.

You’ve talked about shifting from metrics to systems. Can you unpack that?

Metrics are flashlights: they show you specific dimensions like precision or semantic similarity. But GenAI performance depends on context, phrasing, and user expectations. That’s why we need validation systems, not just metrics.

A validation system spans data quality checks, prompt engineering stress tests, human review protocols, output logging, real-time drift monitoring, and retraining triggers. It’s not a dashboard; it’s a living process that learns, adapts, and improves with use. You don’t validate once; you validate continuously.
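
As one concrete piece of such a system, a drift monitor can be as simple as comparing a recent window of validation scores with a baseline and triggering review when they slip; the scores and threshold below are illustrative assumptions.

```python
# Minimal drift-monitoring sketch: compare recent validation scores to a baseline
# window and trigger review or retraining when the drop exceeds a threshold.
from statistics import mean

def drift_check(baseline_scores: list[float], recent_scores: list[float],
                max_drop: float = 0.05) -> dict:
    drop = mean(baseline_scores) - mean(recent_scores)
    return {
        "baseline_mean": mean(baseline_scores),
        "recent_mean": mean(recent_scores),
        "drop": drop,
        "trigger_review": drop > max_drop,
    }

# Example: weekly scores slipping from ~0.95 to ~0.85 trips the trigger and
# routes the model back through validation rather than degrading silently.
result = drift_check([0.96, 0.95, 0.94], [0.86, 0.85, 0.84])
```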

How do you see regulatory expectations evolving?

We’re seeing convergence: the EU AI Act, US executive orders, NYDFS Part 500, SR 11-7; they’re all pointing toward three pillars: governance, explainability, and accountability. Whether it’s credit scoring, fraud detection, or customer communication, GenAI will increasingly be treated as a high-risk system.

And regulators will ask: Can you show what your model did, why it did it, and how you prevented harm? If the answer is no, your defence can’t be “But it was accurate.” That won’t fly.

What’s your final advice for risk leaders navigating GenAI adoption?

Build trust before you scale. Pick low-risk use cases, validate them deeply, and document every decision. Create a cross-functional GenAI council. Don’t treat governance as a bolt-on. And most importantly: stay humble. These models are powerful, but also fragile.

The goal isn’t to move fast; it’s to move wisely.

