From Pilot Purgatory to P&L Powerhouse
This article was first published by AMA and, as part of the AMA Global Network, is republished by Management Centre Europe with permission.
Five years into the AI rush, many organisations have the same problem: AI is “in production”… but not in the income statement.
- Pilots are running.
- Models are live.
- Dashboards look impressive.
Yet when finance looks for results, the impact is often hard to find.
This article explores why the problem is rarely the technology itself. Instead, it shows how a lack of organisational readiness—around data, governance, integration, and financial accountability—keeps AI stuck in pilot mode.
If you have dozens of AI models in production and nothing on the income statement, you don’t have an AI problem; you have a readiness and governance problem. Here’s an operations-first, CFO-proof playbook for turning AI experiments into earnings.
THE 3:47 P.M. QUESTION
At 3:47 p.m. on Thursday, a CFO walked into my office and asked one question: “We have dozens of AI models in production. Why can’t I find a dollar of value on our books?”
That moment crystallized what I now see across construction, manufacturing, logistics, energy, healthcare operations, and financial services: enthusiastic pilots, impressive dashboards, big vendor fees—and no auditable P&L impact.
This isn’t happening in a vacuum. Finance leaders face unprecedented pressure today. AvidXchange’s July 2025 survey (“Are Finance Teams Equipped for Sustained Economic Volatility?”) found that 86% of finance leaders have expressed concern about the economy and 64% about inflation specifically. In October 2025, the New York Times reported that Amazon is pouring $125 billion into AI infrastructure (“Big Tech’s A.I. Spending Is Accelerating (Again)”). Meanwhile, enterprise abandonment of AI initiatives jumped from 17% to 42% in just two years, with 95% of generative AI pilots failing to deliver measurable financial impact, according to a research report from MIT’s Project NANDA.
Five years after the AI gold rush began, senior executives are discovering a brutal truth: Impressive demos don’t translate to income statements. In this environment, CFOs aren’t asking “What’s your AI strategy?” They’re asking, “Where’s my money?”
The core issue isn’t model accuracy. It’s organizational readiness.
TRADITIONAL ENTERPRISES MUST ORCHESTRATE AI
In a software company, AI can be the product. In a traditional enterprise, AI is never the product. The product is a finished building, a safe shift, a container that leaves port on time, a portfolio delivered within mandate. AI has to work inside ERP constraints, compliance regimes, union workflows, third-party systems, and thin margins that tolerate very little waste.
Here’s the critical insight I learned after 25 years of implementing AI across Fortune 500 companies: Traditional enterprises don’t manufacture AI. They orchestrate it.
The major platforms available—Azure AI, Vertex, Bedrock, OpenAI—are powerful and deliberately industry-agnostic. Their engineering is not the bottleneck. The bottleneck is whether the enterprise is ready to orchestrate them.
Most failures trace back to three gaps that no vendor can fix for you. First is integration discipline: Do AI recommendations automatically flow into your systems of record and trigger workflows, or do they die in dashboards and spreadsheets? Second is governance rigor: Can you explain any AI decision to an auditor, reproduce it on demand, and show when and why a model changed? Third is finance attribution: Can you prove, with counterfactuals and CFO sign-off, that an AI-enabled decision actually reduced cost, improved safety, or protected margin?
Think of the “edges” of AI—models, copilots, agents—as the cranes on a job site. The “rails”—data, governance, integration, controls—are the foundations, roads, and power. When the rails are weak, the edges look good in board decks but die in the field.
A TWO-PHASE MODEL: READINESS BEFORE VENDORS
After that 3:47 p.m. question, I stopped letting organizations jump straight from ambition to vendor selection. Instead, I built a disciplined two-phase model that changed how we approach AI investments.
Phase 1 assesses AI readiness across 10 dimensions (see “The 10 Dimensions of AI Readiness”). It identifies and prioritizes remediation for weak areas before platform purchase, assigning owners and timelines. This phase builds internal capability to orchestrate AI, not just deploy it.
Phase 2 focuses on vendor selection and scale, and it begins only when overall readiness reaches at least 70/100 across the 10 dimensions, with no dimension scoring below 6. Engagement involves evaluating platforms against real constraints (not slideware), deploying into an integrated, governed, and measurable environment, and continuously monitoring and attributing value.
C-suite leaders have four non-delegable responsibilities in this model: diagnose readiness, lead remediation, negotiate vendor fit from a position of strength, and enforce stage-gate discipline. I’ve learned that saying “not yet” is often the most strategic AI decision you will make.
THE 10-DIMENSION ENTERPRISE AI READINESS MODEL
The readiness model distills 25 years of work across Cisco, BlackRock, ARCO Construction, and other enterprises into a single question: Can your organization safely, repeatably, and profitably absorb AI into operations? To answer this question, score each dimension from 0 to 10. You need at least 70 overall and no dimension below 6 before you scale beyond a pilot.
Three of the dimensions are kill criteria, requiring a score of 6 or higher before serious vendor engagement: Data Infrastructure Quality (dimension 1), Governance and Compliance (dimension 2), and Measurement and Finance Attribution (dimension 5). Weak data corrupts models; regulatory exposure is unacceptable; and ROI must be proven to sustain investment. Do not engage vendors beyond experiments if any of these scores falls below 6.
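To make these gates concrete, here is a minimal sketch in Python. The thresholds come straight from the model above; the function names and structure are illustrative assumptions, not a prescribed tool.

```python
# A minimal sketch of the readiness gates described above. The thresholds
# (kill criteria at 6+, 70/100 overall, no dimension below 6) follow the
# article; function names and structure are illustrative assumptions.

KILL_CRITERIA = [
    "Data Infrastructure Quality",          # dimension 1
    "Governance and Compliance",            # dimension 2
    "Measurement and Finance Attribution",  # dimension 5
]

def can_engage_vendors_seriously(scores: dict[str, int]) -> bool:
    # Each kill criterion must score at least 6 before serious vendor talks.
    return all(scores[dim] >= 6 for dim in KILL_CRITERIA)

def can_scale_beyond_pilot(scores: dict[str, int]) -> bool:
    # Full gate: at least 70/100 overall and no single dimension below 6.
    return sum(scores.values()) >= 70 and min(scores.values()) >= 6
```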
If remediation is needed, you can’t fix all 10 dimensions at once. In traditional enterprises, the sequence matters because some dimensions are foundational to others. Start by raising Data Infrastructure, Governance, and Finance Attribution to at least 6.
Data Infrastructure improvements begin with establishing automated quality checks that flag inconsistencies within minutes, not days, and route alerts to data stewards with authority to fix them. Implement lineage tracking so that every AI decision can be traced back to its source system, source record, and transformation logic. At ARCO, this meant building a unified cloud data warehouse that consolidated 18 disconnected systems with real-time quality monitoring. It wasn’t glamorous, but it was essential.
To strengthen Governance, create a signed playbook that legal, risk, and audit co-own, documenting exactly which decisions AI can make autonomously, which require human review, and how exceptions are escalated. At BlackRock, our governance playbook specified that any ESG controversy alert required portfolio manager review within 24 hours, with escalation to compliance if exposure exceeded defined thresholds. This clarity protected us legally and built trust operationally.
For Finance Attribution, work with your CFO to design control groups or baseline comparisons that isolate AI’s impact, then establish a monthly value ledger where finance signs off on documented savings or improvements. This isn’t optional. If finance won’t sign your ROI calculation, it’s not real savings.
Next, strengthen the dimensions of Technical Ecosystem Integration and Human-in-the-Loop Design so that AI recommendations reach systems of record (ERP, CRM) with clear decision rules, minimizing manual data entry. Integration maturity means AI outputs trigger work orders, update records, or adjust schedules. Human-in-the-Loop requires mapping decision types to authority levels: automated with audits, needing supervisor approval, or always requiring human judgment.
Finally, the remaining essential elements include:
- Security and Access: mature, least-privilege access with audit trails.
- Operational Embedding: AI recommendations with one-click actions in frontline tools.
- Change Management and Skills: role-specific playbooks and coaching.
- Capacity and Economics: understanding AI unit economics and modeling compute costs.
- Stage-Gate Discipline: documented pass/fail criteria and formal business leader sign-off for pilot graduation.
VENDOR EVALUATION: QUESTIONS THAT ACTUALLY MATTER
Once readiness exceeds 70%, the focus shifts from flashy demos to finding platforms that fit existing systems. Successful vendors, especially in traditional enterprises, must clearly answer these critical questions:
- Data quality requirements and missing-data handling
- Automatic output to ERP, CRM, or dispatch (versus manual exports)
- Audit validation and trails
- Model drift detection and alerting
- Total cost of ownership at scale and time-to-production
- Deployment playbooks for intensive industries
- Speed of model deactivation
- Provision of two similar-scale industry references

Vendors unable to answer these clearly are not ready.
To evaluate vendors, I use three stage-gates, each with explicit targets that must be met before advancing:
The pilot stage runs 3 to 6 months to prove the concept. Select one high-value use case with reliable data, involve 10 to 50 users, require 100% human review of AI decisions initially, and establish baselines and control groups to measure impact.

The production stage spans 6 to 12 months and proves trust and repeatability. Expand to 100 to 500 users, integrate AI outputs into systems of record with no manual copy-paste, track override rates trending downward, and produce a monthly CFO-signed value ledger documenting financial impact.
The scale stage takes 12 to 24 months and proves enterprise value. Deploy to hundreds or thousands of users, embed AI into core workflows across sites or regions, document P&L impact in business reviews, and institutionalize governance, model lifecycle management, and training programs.
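As a sketch of what explicit, pre-agreed targets can look like, the three gates might be written down as data. The durations, user counts, and requirements below mirror the stages above; the structure and field names are hypothetical.

```python
# Hypothetical encoding of the three stage gates as explicit pass/fail
# criteria. Durations, user counts, and requirements follow the article;
# the dictionary structure and field names are illustrative assumptions.

STAGE_GATES = {
    "pilot": {  # prove the concept
        "duration_months": (3, 6),
        "users": "10 to 50",
        "must_show": [
            "100% human review of AI decisions initially",
            "baselines and control groups in place",
        ],
    },
    "production": {  # prove trust and repeatability
        "duration_months": (6, 12),
        "users": "100 to 500",
        "must_show": [
            "outputs integrated into systems of record, no manual copy-paste",
            "override rates trending downward",
            "monthly CFO-signed value ledger",
        ],
    },
    "scale": {  # prove enterprise value
        "duration_months": (12, 24),
        "users": "hundreds to thousands",
        "must_show": [
            "AI embedded in core workflows across sites or regions",
            "P&L impact documented in business reviews",
            "governance, model lifecycle management, and training institutionalized",
        ],
    },
}
```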
Most organizations overlook a crucial step: If an AI use case fails, stop, document lessons, and reassign the team. ARCO deliberately dropped four early AI candidates lacking ROI or clean integration. This discipline focused capacity on successful programs, resulting in a 23% reduction in mean project overruns.
FINANCE-GRADE MECHANICS: TURNING MODELS INTO MONEY
For each AI use case, I use a one-page value ledger with the CFO, tracking baseline performance, AI recommendations and actions, confirmed impactful interventions, documented dollar impact, the counterfactual method for isolating AI’s contribution, and a timestamped, finance-signed CFO sign-off.
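Purely as an illustration, the ledger’s fields could be captured like this; the field names and types are my assumptions, not a standard template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# A hypothetical rendering of the one-page value ledger described above.
# Fields follow the article's list; names and types are assumptions.

@dataclass
class ValueLedgerEntry:
    use_case: str
    baseline_performance: str         # pre-AI performance on the target KPI
    ai_recommendations: str           # recommendations/actions the AI produced
    confirmed_interventions: int      # interventions confirmed to have impact
    dollar_impact: float              # documented dollar impact for the period
    counterfactual_method: str        # how AI's contribution was isolated
    cfo_signed: bool                  # finance has signed off on the dollars
    signed_on: Optional[date] = None  # timestamp of the CFO sign-off
```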
Coverage rate measures the percentage of AI decisions that humans review, reflecting how much oversight the organization still requires. Start at near-100% review in months 1 to 3, taper to 50% in months 4 to 6, 25% in months 7 to 12, and 10% thereafter as trust builds. Trust velocity tracks how quickly mandatory human review gives way to statistical audit as override rates fall and outcomes stay positive, showing how fast AI can safely scale.
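That taper is simple enough to state as a schedule; a minimal sketch, with the percentages taken from the article and the function itself hypothetical:

```python
def target_review_rate(months_live: int) -> float:
    # Review-rate taper from the article: near-100% human review in
    # months 1-3, 50% in months 4-6, 25% in months 7-12, 10% thereafter.
    if months_live <= 3:
        return 1.00
    if months_live <= 6:
        return 0.50
    if months_live <= 12:
        return 0.25
    return 0.10
```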
These metrics determine how quickly you can expand without losing control, giving finance the confidence to back wider deployment. BlackRock tracked trust velocity for nine months, and override rates dropped from 42% to 8% as portfolio managers came to trust the AI after it detected ESG controversies 27 days sooner than rating agencies.
CASE SNAPSHOTS: WHAT GOOD LOOKS LIKE
At Cisco, a Universal Order Visibility program instrumented a $10 million-per-day supply chain with comprehensive vendor integrations and exception playbooks. The program delivered a 33% reduction in aged backlog and a 70% improvement in partner satisfaction, protecting more than $300 million in annual revenue. The success came from treating supply chain visibility as an integrated data and governance challenge, not just a technology deployment. During Hurricane Harvey and COVID-19 border closures, UOV maintained 85% fulfillment continuity while competitors fell below 60%.
BlackRock’s ESG data infrastructure for the Aladdin platform demonstrated how industrial-grade rails enable competitive advantage. The system ingested over 10 million data points daily with rigorous quality assurance, lineage tracking, and compliance controls. Portfolio teams gained earlier controversy signals than rating agencies—27 days on average—supporting significant growth in sustainable assets under management. The differentiator was the discipline applied to data quality and governance, not the models themselves.
ARCO Construction’s AI-enabled operations program combined forecasted overrun detection, disciplined change-order management, and supervisor rituals that embedded AI recommendations into daily decision making. The pilot phase documented $2.1 million in savings with clear attribution to specific interventions. The program later scaled to deliver a 23% reduction in mean project overruns across 250 active projects and cut reporting cycles from 20 days to 5. Success came from middle-out adoption and relentless finance-grade measurement that gave executives confidence to expand.
In each case, the differentiators were the same: strong rails, middle-out adoption, and relentless finance-grade measurement.
Top-down mandates put AI in strategy decks. Bottom-up heroics put it in one team. Sustainable adoption happens in the middle, with supervisors, managers, and operators who translate vision into execution.
Successful programs I have led consistently include a weekly “AI + finance” review of AI recommendations, overrides, and value; supervisor playbooks that translate AI outputs into actions; continuous (not one-time) coaching for frontline leaders; and rituals that celebrate moments when AI and human judgment together prevented losses, improved safety, or protected margin.
When you do this well, finance becomes a sponsor instead of a skeptic. At ARCO, our CFO went from asking “Why can’t I find a dollar of value?” to asking “What else can we automate?” in less than a year. That shift happened because we gave him a monthly ledger he could defend to the board.
A 180-DAY PLAN FOR THE C-SUITE
If you want to know whether AI will be a slide in your story or a line on your income statement, you can find out in six months.
On days 1 through 30, focus on baselines and guardrails. Launch a weekly AI + finance review, define shared terms and map five high-value decisions in one function, capture baseline KPIs and costs, and score your readiness. If Data Infrastructure, Governance, or Finance Attribution scores below 6, pause scale-up and run a “rails sprint” to fix foundational gaps.
On days 31 through 90, run a pilot that can graduate. Choose one use case with clear economics and solid data, establish coverage rate from day one, put in place A/B tests or matched control groups, and begin the value ledger with finance as co-author.
On days 91 through 180, move to production and proof. Expand to 100 to 500 users with full integration into systems of record, drive overrides down while maintaining or improving outcomes, present the trend lines—readiness, coverage, and CFO-signed value—to the board, and decide explicitly whether to scale or stop.
Before approving another “must-see” demo, I would urge any CEO, CIO, CDO, or chief AI officer to insist on three things: a readiness score of at least 70, with no single dimension below 6; a documented stage-gate with business and finance co-ownership; and a one-page value ledger where finance signs the dollars every month. If you enforce those basics relentlessly for one cycle, you’ll discover within 180 days whether AI in your enterprise is a fad or a new form of financial infrastructure you can trust.
WRITTEN BY
Robin Patra is a globally recognized leader in enterprise AI, data, and digital transformation with more than 25 years driving measurable business outcomes across Fortune 500 firms including Cisco, BlackRock, and ARCO Construction. He has architected and scaled AI platforms that generated $2.1 million to more than $300 million in documented P&L impact across the manufacturing, financial services, and construction industries.
At MCE, AI is not a technology programme. It is a management and leadership capability.
Explore our AI leadership programmes
Leading an AI-Ready Organisation
For senior leaders shaping direction, culture, and accountability.
- 3 Days In Person
- 6 Sessions Online
- Available In Company
Augmented Leadership: Blending EQ with AI
For senior leaders navigating judgment, trust, and human impact.
- 2 Days Online + 3 Days In Person
- Available In Company
AI for Managers: From Resistance to Results
For managers responsible for teams, workflows, and performance.
- 3 Days In Person
- 6 Sessions Online
- Available In Company