Tools, Tools, Tools

But Not a Drop of ROI in Sight

"Water, water, every where, / Nor any drop to drink."

— Samuel Taylor Coleridge, The Rime of the Ancient Mariner, 1798
The Solow Paradox returns — investment everywhere, returns nowhere

"You can see the computer age everywhere but in the productivity statistics."

— Robert Solow, Nobel Laureate, 1987

Nearly four decades ago, Robert Solow looked at the billions being poured into computing and asked the obvious question nobody in the boardroom wanted to hear. The machines were everywhere. The returns were nowhere. It took another decade (and arguably wholesale reinvention of how companies actually worked) before the productivity statistics caught up with the capital expenditure.

I've been thinking a lot about Professor Solow lately. Because here we are again.

Enterprises spent an estimated $37 billion on Generative AI in 2025, up 3.2x year-over-year, making it the fastest-growing software category in history (Menlo Ventures, State of Generative AI in the Enterprise 2025). McKinsey estimates the technology could add $2.6 to $4.4 trillion in annual value to the global economy. 90% of American knowledge workers report using some form of AI-enabled tool. LLMs are the fastest-adopted consumer technology in human history… faster than the smartphone, faster than the web browser, faster than electricity.

And yet. MIT's Project NANDA report (The GenAI Divide: State of AI in Business 2025) went viral last summer with the headline claim that 95% of enterprise AI initiatives have delivered zero measurable P&L impact. That number got breathlessly repeated across LinkedIn, Fortune, and every conference keynote for months. It deserves scrutiny.

The NANDA study was based on just 300+ publicly disclosed initiatives, 52 interviews, and 153 survey responses (i.e., a small and self-selecting sample). As Kevin Werbach at Wharton pointed out in a detailed critique, the 95% figure appears to come from a narrow subsection on "custom enterprise AI tools" where "successful" was defined as "causing a marked and sustained productivity and/or P&L impact". This means that "unsuccessful" explicitly does not mean "zero returns." Futuriom was blunter, calling it "propagandized clickbait" with "incomprehensible charts" and "incomprehensible leaps of logic." The full supporting data has never been released. I think the report's directional insight is correct: most enterprises haven't cracked the code on scaling AI value. But the specific numbers are unreliable, and the report has been weaponised by people who want to declare the whole "GenAI thing" a bust.

The more rigorous data paints a more nuanced picture. The Wharton/GBK Accountable Acceleration report (October 2025) (a three-year annual study surveying 800+ enterprise decision-makers at US firms with 1,000+ employees) found that 74% of leaders already report positive ROI from GenAI, 72% are formally tracking structured business-linked ROI metrics, and 88% plan to increase AI budgets in the next 12 months. But here's the tension: McKinsey's March 2025 State of AI report found that over 80% reported no tangible enterprise-level EBIT impact, and only 1% described their rollouts as "mature." S&P Global's 2025 data showed 42% of companies abandoned most of their AI initiatives entirely, up from 17% the prior year.

Both things are true simultaneously. Leaders perceive positive returns. Enterprise P&L statements don't show them yet. That gap is the whole story.

You can see GenAI everywhere but in the productivity statistics. The Solow Paradox is back — and it has a $37 billion price tag.

Now, before we start throwing up our hands, let me be clear about something. The NANDA crowd and the doom-loop pundits would have you believe this is all a giant bust. I don't buy it, and neither should you. These tools take the basic motions that pace many of our lives at work and fundamentally augment, automate, eliminate, and re-imagine them. Unlike the proclaimed RPA revolution of the mid-2010s (jokingly, "Rapidly Pushing Automatically"), these non-deterministic tools can embed themselves in the disordered, semi-structured workflows and tasks that torment white-collar professionals (particularly those in enterprise, business, technology, and professional services).

Make no mistake: these tools are still just, well, tools. But they are immensely powerful tools that herald the next wave of the now 30-something-year-old digital disruption sweeping all our sectors.

My position: the ROI isn't missing. It's hiding. And it's hiding behind four distinct problems that compound each other:

  1. We're measuring it wrong.
  2. Value is migrating from incumbents to disruptors and we're confusing value shift for disappearance.
  3. It takes far longer than anyone budgeted because every GenAI initiative is really a Target Operating Model transformation.
  4. We're choosing the wrong tools for the wrong business impact.

Stay with me, because these arguments are additive. And their combined implication, it seems to me, is thundering: continue to have confidence, and double down.

——————

Problem 1: We're Measuring It Wrong

The most straightforward explanation for the paradox was a favourite of former Federal Reserve Chairman Alan Greenspan. It goes something like this: human beings are very good at counting stuff. Product quantity is tangible. Products either exist or they don't. What Greenspan and others postulated in the early 1990s is that we are not sufficiently good at measuring three things: the quantity of services provided (as distinct from goods), the quality improvement of goods and services overall, and — most insidiously — those innovations that make human interactions better: better educated, better decided, better communicated.

Relative to some other recent General Purpose Technologies (ironically, also GPTs), GenAI is particularly vulnerable to this measurement error. Just what is the value of a personalisation benefit? Of innovation and R&D efficiency? Of instant execution? Of better information production? Of value reinvestment, where time saved on one task gets ploughed into higher-order work?

The research on power users makes this structural, not anecdotal. The Harvard/BCG "Jagged Technological Frontier" study (Dell'Acqua, Mollick et al., 2023) we have touched on before (a pre-registered randomised controlled experiment with 758 consultants), found that AI-assisted consultants completed 12.2% more tasks, finished them 25.1% faster, and produced results rated over 40% higher quality by blinded evaluators. The skill-levelling effect was dramatic: below-average performers improved by 43%, while above-average performers improved by 17%. The Stanford/NBER study (Brynjolfsson, Li, and Raymond, published in The Quarterly Journal of Economics, May 2025) tracked 5,172 customer support agents and found a 15% overall productivity gain, with novice workers seeing roughly 30% improvement.

These gains are real and rigorous. But here's where the measurement problem bites hardest: they are overwhelmingly (still) being privatised by individuals rather than captured by the firm. The Adecco Group's 2024 survey of 35,000 workers found the mean time saving for AI users is one hour per day but the distribution is heavily skewed. The top 5% save three to four hours daily. Microsoft's 2024 Work Trend Index (31,000 workers, 31 countries) found the top 5% of Teams users saved a full workday per month through meeting summarisation alone.

Stop and think about that for a moment. Your best people are finishing their work in half the time. But your dashboards don't show twice the output. The surplus is absorbed into organisational slack, spent on personal development, or simply used to leave the office earlier. Mollick calls these workers "secret cyborgs". Over 50% of American workers now use AI at work, reporting a threefold performance improvement on a fifth of their tasks, but they don't disclose it because, as Mollick puts it, "if people think they're going to be fired or punished because they're showing productivity gains, they're just not going to show you."

I call this "value privatisation" and it is the single largest source of phantom ROI in enterprise GenAI today. The productivity gains are real at the individual level but invisible at the enterprise level because no one has designed the organisational mechanisms to capture them.
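
To size what value privatisation can mean in practice, here is a back-of-envelope sketch. Every parameter below is an illustrative assumption of mine, not a figure from the studies above:

```python
# Back-of-envelope estimate of "value privatisation" at a hypothetical firm.
# All inputs are illustrative assumptions, not figures from the cited studies.

def privatised_value(employees, adoption_rate, hours_saved_per_day,
                     working_days, loaded_hourly_cost, capture_rate):
    """Annual value of AI time savings, split into captured vs. privatised."""
    adopters = employees * adoption_rate
    gross_value = adopters * hours_saved_per_day * working_days * loaded_hourly_cost
    captured = gross_value * capture_rate    # shows up in enterprise metrics
    privatised = gross_value - captured      # absorbed as slack, invisible to the P&L
    return gross_value, captured, privatised

# Hypothetical 1,000-person firm: 50% adoption, one hour saved per day,
# $75 loaded hourly cost, and only 10% of the surplus formally recaptured.
gross, captured, hidden = privatised_value(1_000, 0.5, 1.0, 220, 75, 0.10)
print(f"Gross: ${gross:,.0f}  Captured: ${captured:,.0f}  Hidden: ${hidden:,.0f}")
```

Run the numbers with your own assumptions; the point is that even modest per-person savings compound into millions of dollars that never surface in enterprise metrics unless a capture mechanism exists.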

Meanwhile, capital flows to the wrong places. When executives in various studies allocate hypothetical GenAI budgets, roughly 70% goes toward Sales and Marketing, because those metrics are visible, legible, and easily communicated in board presentations. But multiple sources document the most dramatic cost savings in back-office operations: BPO elimination in customer service and document processing, reductions in external creative and content costs, and streamlined procurement workflows. Deloitte's State of AI in the Enterprise 2026 report confirms this pattern: two-thirds of organisations now report productivity and efficiency gains from AI, but only 20% are seeing actual revenue growth. The gains are real. They're just not where many of the dashboards are pointed.

I call this the Hype-Value Inversion. The flow of capital and public attention does not follow the flow of actual value. The companies that crack this first (i.e., the ones that develop what I'd call a "value dashboard" tracking decision velocity, innovation cycle time, risk exposure reduction, and customer lifetime value uplift) will allocate capital more intelligently and compound their advantage while competitors remain paralysed by a perceived lack of ROI.

——————

Problem 2: Value Is Migrating — and We're Confusing Destruction for Disappearance

The economist Joseph Schumpeter described capitalism as a "perennial gale of creative destruction." GenAI is that gale at hurricane strength. And its initial impact is not a gentle uplift across the board. It's a violent, industry-wide compression of profit pools, where value is destroyed and redistributed long before it is created anew.

The popular narrative focuses on whether GenAI will cause mass layoffs. For most of 2024 and early 2025, the evidence said: not yet. That narrative is shifting and in some places shifting fast. In 2025, companies directly cited AI in announcing 55,000 job cuts, more than 12 times the number attributed to AI just two years earlier (Challenger, Gray & Christmas). Block's Jack Dorsey announced in February 2026 that AI tools would allow the company to shrink from 10,000 employees to 6,000 and the stock surged over 20%. Salesforce cut 4,000 customer support roles after its AI agents began handling 50% of interactions. Accenture exited 11,000 employees it could not reskill. Duolingo stopped using human contractors for work AI can handle. The week of late February 2026, when Citrini Research's viral essay predicted an "AI doom loop" of layoffs and margin-fattening, felt like a genuine inflection point in public sentiment.

Now for the important caveat. There is a significant amount of what I'd call "AI washing" happening here. A Harvard Business Review survey of 1,006 global executives (January 2026) found that only 2% reported large headcount reductions tied to actual AI implementation. A full 60% made cuts in anticipation of future AI efficiencies. Oxford Economics was blunt: "We suspect some firms are trying to dress up layoffs as a good news story rather than bad news." The 55,000 AI-cited cuts represent just 4.5% of total reported job losses; standard "market and economic conditions" accounted for four times as many. But the direction of travel is clear, even if the magnitude is still contested.

What is happening (and what matters far more for enterprise services) is more subtle and more consequential: the clearest ROI is often emerging from reducing external spend — eliminating BPO contracts, cutting agency fees, replacing expensive outsourced services with AI-powered internal systems. Companies that have successfully crossed this threshold report saving $2–10 million annually from BPO elimination alone, and 30% or more in external creative and content costs.

This is a "re-shoring" of knowledge work. It is not bringing work back to the home country in human form; it is bringing it back into the enterprise in automated form. For three decades, companies offshored process-driven knowledge work to BPO providers using labour arbitrage. GenAI is eliminating the need for that arbitrage entirely. The value isn't destroyed… it's recaptured and consolidated by the core enterprise.

If you're a PE investor with portfolio companies that sell outsourced professional services, that paragraph should keep you up at night. If you're a PE investor whose portfolio companies buy those services, it's an opportunity to fundamentally restructure cost bases. Either way, understanding which side of this value migration you're on is one of the most important strategic questions of the next 18 months.

Simultaneously, AI-native disruptors are doing something more radical than automating old processes. They're introducing entirely new business models: selling the completed work product rather than a tool to help humans work faster, exploiting incumbent business model conflicts where GenAI makes existing revenue models unnecessary, and compressing development cycles so dramatically that vendor switching accelerates. As GenAI capabilities commoditise and API costs fall by as much as 90%, the basis of competition shifts to price (and AI-centric companies often operate with gross margins of 50–60% or even less, closer to a services business than the 80–90% margins of traditional SaaS).

BCG's September 2025 study of 1,250+ firms found that the 5% of "future-built" companies achieving AI value at scale enjoy 1.7x revenue growth, 3.6x three-year total shareholder returns, and 1.6x EBIT margin versus laggards. The value isn't disappearing from the economy. It's migrating… from incumbents to disruptors, from external providers to internal capabilities, and from legacy business models to AI-native ones. The aggregate statistics look flat because the winners and losers are cancelling each other out.

As I explored in Tasks, Processes, and Journeys, the level of business model intervention matters enormously. Task-level automation compresses costs. Process-level redesign restructures how value is created. Journey-level transformation changes who captures value entirely. The firms getting crushed in this Great Compression are stuck at the task level. The firms building moats are working at the journey level.

——————

Problem 3: It Takes Far Longer Than Anyone Budgeted

Here's the uncomfortable truth that most GenAI vendors don't want to tell you and most boards don't want to hear: every significant GenAI initiative is a full Target Operating Model transformation. Not a software upgrade. Not a feature deployment. A wholesale rewiring of how work gets done across five interdependent dimensions simultaneously.

The evidence for this is overwhelming. McKinsey's State of AI research found that redesigning workflows has the single biggest effect on an organisation's ability to see EBIT impact from GenAI… not buying better tools, not hiring more data scientists. Deloitte's 2026 data tells us where organisations actually stand: only 34% are using AI to deeply transform (creating new products, services, or reinventing core processes). Another 30% are redesigning key processes. And a full 37% are using AI at a surface level with little or no change to existing processes. In other words, roughly two-thirds of organisations are trying to get AI-native results from legacy operating models.

It doesn't work. And it can't work. Because GenAI transformation requires concurrent change across every lever:

  • People & Skills. Legacy: siloed expertise; periodic training; tech skills in IT. AI-native: pervasive AI literacy; continuous learning; new hybrid roles; "builders not doers".
  • Processes & Ways of Working. Legacy: linear, manual, human-gated workflows. AI-native: dynamic, AI-augmented end-to-end journeys, redesigned from first principles.
  • Governance & Structure. Legacy: centralised, slow, risk-averse; IT-led. AI-native: federated with clear accountability; agile risk management; business-led AI governance.
  • Data & Analytics. Legacy: data as an IT cost centre; siloed, poor quality. AI-native: data as a strategic C-suite asset; assets in structured formats (MD/JSON); stale KM culled.
  • Technology & Systems. Legacy: monolithic, on-premise; fragmented point solutions. AI-native: composable, cloud-native; integrated platform; rapid experimentation enabled.

The ramping effects are real and uneven. Large enterprises take nine months or longer to scale a successful pilot into full production. Mid-market companies do it in 90 days. This isn't just a speed difference: it's a compounding competitive disadvantage! While one company is stuck in committee meetings, its more agile competitor's AI has been learning and improving for three quarters.

The follow-up BCG/Harvard study (Randazzo, Lifshitz, Kellogg, Dell'Acqua, Mollick, Candelon, Lakhani; December 2025) analysed 4,975 human-AI interactions from 244 BCG consultants and found three archetypes with sharply different outcomes. Cyborgs (60%) wove AI throughout their entire workflow in iterative dialogue (e.g., probing, extending, pushing back). They developed entirely new AI-related capabilities. Centaurs (14%) maintained a clear division of labour, using AI selectively while keeping strategic judgment human. They achieved the highest accuracy of any group. Self-Automators (27%) consolidated their entire workflow into one or two AI interactions, accepting outputs with minimal engagement. 44% of this group accepted AI's output without any modification. They developed neither domain expertise nor AI skills.

This connects directly to what I wrote about in The Apprenticeship Conundrum and what Ethan Mollick's most recent research at Wharton has made quantitatively (and uncomfortably) clear. In his January 2026 piece "Management as AI Superpower," Mollick argues that as AI becomes agentic (doing things, not just saying things), the critical skill shifts from prompt crafting to what he calls the "Equation of Agentic Work" — judging whether to delegate based on human baseline time, probability of AI success, and AI processing time. He tested this with executive MBA students who built entire startups from scratch in four days using AI agents. Most had never written code. The results were, in Mollick's words, "an order of magnitude further" than students working over a full semester. The binding constraint wasn't technical skill. It was management skill: knowing what to ask for, how to scope deliverables, and when to override the machine.
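
Mollick does not publish a closed-form version of the equation in that piece, so the following is only one plausible formalisation, under my own assumptions: delegate when the expected time cost of the AI path (processing, plus review, plus redoing the work on failure) undercuts the human baseline.

```python
# One plausible (assumed) formalisation of Mollick's "Equation of Agentic Work".
# The decomposition below is my illustration, not Mollick's published formula:
# delegate when the expected time of the AI path beats the human baseline.

def should_delegate(t_human, p_success, t_ai, t_verify):
    """
    t_human   : time for a human to do the task from scratch (hours)
    p_success : probability the AI output is usable after review
    t_ai      : AI processing time (hours)
    t_verify  : human time to review and scope the AI output (hours)
    Assumes that on failure the human still does the task from scratch.
    """
    expected_ai_path = t_ai + t_verify + (1 - p_success) * t_human
    return expected_ai_path < t_human

# A four-hour drafting task, 80% usable AI output, near-instant generation,
# thirty minutes of review: delegation wins on expected time.
print(should_delegate(t_human=4.0, p_success=0.8, t_ai=0.05, t_verify=0.5))
```

Note what the heuristic rewards: it is pure "Management 101" judgment. Estimating p_success and t_verify well is exactly the scoping-and-review skill Mollick argues now binds.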

Mollick's earlier video game industry research provides the empirical anchor: the quality of middle managers explained more than 20% of eventual revenues, a larger effect than senior leadership or creative designers. His point is sharp: every field has already invented the paperwork that makes good AI delegation work. Software developers write PRDs. The Marines use Five Paragraph Orders. Consultants scope engagement deliverables. All of these, Mollick notes, "work remarkably well as AI prompts." The skills needed are "Management 101": setting clear direction, providing good examples, defining what "done" looks like. The people who struggled most in his experiments weren't the non-technical participants. They were the ones who couldn't articulate what they wanted.

As I argued in Expertise, IQ, and EQ, the returns to operator capability are exponential in the GenAI era. But here's the rub: the BCG/Harvard study also found that on tasks outside AI's capability boundary, AI users were 19 percentage points less likely to produce correct solutions than those working without AI. This is the "sleeping driver" risk: passive delegation degrades judgment precisely where human expertise matters most.

And there's a further twist that nobody is talking about enough: the production bottleneck has moved. For decades, the gating constraint in knowledge work was creating the output. It was writing the brief, drafting the analysis, building the model. GenAI has obliterated that constraint. A marketing manager who previously spent two weeks developing a campaign brief now produces three polished concepts in an afternoon. But her creative director still has one hour per week scheduled for reviews. Legal still needs five business days to approve. The budget chain still requires four sequential sign-offs. She's producing faster, but each concept still takes three weeks to clear the system.

Asana's Work Innovation Lab (9,000+ knowledge workers, November 2025) crystallised this as the "absorption bottleneck": organisational value is capped by the slower of two rates, production and absorption. When production velocity increases 10x but absorption capacity remains flat, the bottleneck simply shifts from "we can't make it fast enough" to "we can't review, approve, and act on it fast enough." In software development, the data is even starker: Faros AI's analysis of 10,000+ developers across 1,255 teams found that while individual developers complete 21% more tasks with AI, code review times increased 91% and PR sizes grew 154%. The mathematics don't work. Individual gains are absorbed entirely by downstream friction.
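
The absorption-bottleneck framing reduces to a simple queueing identity: throughput is capped by the minimum of the production and absorption rates, and everything above the cap piles up as backlog. A toy sketch (the rates are my illustrative numbers, not Asana's):

```python
# Toy model of the "absorption bottleneck": system throughput is capped by
# the slower of production and absorption; the excess accumulates as backlog.
# All rates are illustrative assumptions.

def simulate(production_rate, absorption_rate, weeks):
    """Items shipped vs. items stuck in review after `weeks` of work."""
    shipped = min(production_rate, absorption_rate) * weeks
    backlog = max(0, production_rate - absorption_rate) * weeks
    return shipped, backlog

# Before AI: 10 items/week produced, 10 reviewed. After AI: production up 10x,
# but absorption (review, approval, sign-off capacity) unchanged.
before = simulate(10, 10, 12)    # (120 shipped, 0 backlog)
after = simulate(100, 10, 12)    # still 120 shipped; 1,080 items stuck
print(before, after)
```

Tenfold production with flat absorption ships not one extra item per quarter; it only grows the review queue. That is why the remedy is redesigning absorption capacity, not buying more production.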

This is a TOM problem… not a technology problem. Your approval chains, governance processes, and review cadences were designed for a world where production was slow. They now need to be redesigned for a world where production is fast and judgment is the scarce resource.

The binding constraint isn't technology. It's the "frozen middle" — that layer of management where current methods work "reasonably well" and the learning curve feels daunting. Plan to invest 2–3x more in process change, training, and enablement than in the technology itself. Map your actual processes, not your wishful thinking. Find the doom loops and hidden knowledge. And above all, build a talent model that creates builders, not doers.

——————

Problem 4: We're Choosing the Wrong Tools for the Wrong Business Impact

This is the problem I see most often in client work, and it's the one that frustrates me the most because it's the most fixable.

There is a spectrum of GenAI tool complexity — from bare-metal LLM interactions through custom GPTs and Projects, to single-agent orchestration, to multi-agent workflows. And there is a spectrum of business impact — from simple augmentation through automation, elimination, re-orchestration, to the creation of entirely new capabilities and business models. The ROI crisis is, in large part, a matching problem. Organisations are chronically over-engineering solutions for simple tasks and under-powering solutions for transformational ones.

Here's a directional framework I often use with clients:

  • Augment: help humans do existing work faster and better. Right tool: bare-metal LLM or embedded Copilot features. Wrong tool: building custom agentic workflows for summarisation tasks.
  • Automate: replace repetitive human steps in a defined workflow. Right tool: custom GPTs/Projects with structured data pipelines and QA. Wrong tool: throwing bare-metal ChatGPT at structured processes and hoping for consistency.
  • Eliminate: remove entire process steps or external dependencies. Right tool: single-agent orchestration with defined data flows. Wrong tool: expecting an off-the-shelf Copilot feature to replace a BPO contract.
  • Re-orchestrate: fundamentally rewire how a multi-step process works end-to-end. Right tool: multi-agent or agentic orchestration with integrated data sources. Wrong tool: using a custom GPT when you need real-time data access and cross-system coordination.
  • Create: build entirely new products, services, or business models. Right tool: full agentic architecture with new TOM design. Wrong tool: any tool deployed without a corresponding business model and TOM transformation.

Three principles govern this directional matching heuristic:

  • Don't over-engineer for simple tasks, and don't under-power for complex ones. If your people need help summarising meeting notes, turn on a Copilot feature. Don't build a custom agent. If you want to replace a $5 million BPO contract with an automated workflow, a Copilot feature isn't going to cut it. Match the tool to the specific outcome.
  • Greater tool complexity requires substantially more Target Operating Model transformation. Moving from augmentation to re-orchestration doesn't just require better technology — it requires changes to processes, skills, governance, data infrastructure, and often pricing models and client contracts. This is where Problem 3 and Problem 4 compound each other: organisations pick complex tools without budgeting for the TOM transformation required to make them work.
  • Simpler implementations can often deliver the same business impact with less risk. I see firms jumping to multi-agent architectures because they sound sophisticated when a well-configured Custom GPT with structured data would deliver 80% of the value at 20% of the cost and complexity. The goal isn't to deploy the most impressive technology. It's to match tool complexity to business impact intent, then invest in the TOM changes required to capture the value.
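
The matching heuristic can be sketched as a simple lookup. The tiers and TOM-load labels below are my illustrative encoding of the framework, not a definitive taxonomy:

```python
# Sketch of the tool-matching heuristic: map a business impact intent to a
# recommended tool tier and the rough TOM transformation load it implies.
# Tiers and labels are illustrative encodings of the framework in the text.

TOOL_MATCH = {
    "augment":        ("bare-metal LLM / embedded Copilot features", "low TOM change"),
    "automate":       ("custom GPTs/Projects + data pipelines and QA", "moderate TOM change"),
    "eliminate":      ("single-agent orchestration, defined data flows", "high TOM change"),
    "re-orchestrate": ("multi-agent orchestration, integrated data", "very high TOM change"),
    "create":         ("full agentic architecture + new TOM design", "full TOM transformation"),
}

def recommend(intent):
    """Return (tool tier, TOM load) for an impact intent, or raise on unknown."""
    try:
        return TOOL_MATCH[intent.lower()]
    except KeyError:
        raise ValueError(f"Unknown impact intent: {intent!r}")

tool, tom_load = recommend("automate")
print(f"{tool} ({tom_load})")
```

The useful part is the second column: tool choice and TOM investment travel together, so picking a tier further down the list without budgeting for the matching transformation load is exactly the compounding failure described above.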

One more critical point: do not use Generative AI when other tools are better. Advanced analytics, deterministic AI, and traditional methods still outperform GenAI for many tasks: anything requiring absolute mathematical precision, specific regulatory compliance, or perfectly reproducible outputs, for example. The worst thing you can do for your GenAI programme's credibility is deploy it on tasks where it's the wrong tool and then point to the predictable failures as evidence that "AI doesn't work."

——————

So Where Is the ROI?

It's not a mystery. It's a choice.

The ROI is hiding in back-office transformation that nobody's measuring. It's being privatised by power users while enterprises fumble their formal deployments. It's migrating from incumbents to disruptors in ways the aggregate statistics can't see. It's trapped behind operating models that nobody's willing to rewire. And it's being squandered by mismatching tool complexity to business impact intent.

But (and this is the critical "but") the companies that are getting it right are getting outsized returns. BCG's 5% of "future-built" companies are capturing 1.7x revenue growth and 3.6x shareholder returns. The Wharton/GBK study shows 74% of enterprise leaders already reporting positive ROI. This outcome is a far cry from NANDA's doom-and-gloom 95% failure rate. The Harvard/BCG experiments show 40%+ quality improvements for properly integrated users. The Stanford study shows novice workers performing like experienced veterans within two months of AI-assisted work.

The Solow Paradox resolved itself the first time around. It will resolve itself again. But only for the organisations that:

  • Redesign their measurement systems to capture the systemic value GenAI creates — not just the front-office metrics that are easy to count.
  • Understand the value migration and position themselves on the right side of it — whether that means bringing outsourced capabilities in-house or fundamentally rethinking what they sell and how.
  • Treat every GenAI deployment as a Target Operating Model transformation and invest 2–3x more in process change, training, and enablement than in technology. That includes redesigning your review, approval, and governance processes — because the bottleneck has moved from production to absorption, and your committee cadences were designed for a slower world.
  • Match tool complexity to business impact intent — ruthlessly, honestly, and with the discipline to start simple and scale only when the organisational foundation is ready.

And one more thing. Opting out is not an option. The learning moats, the reinforcing network effects that will embed in GenAI-intermediated value chains, and the step-change improvements in client and customer value propositions are going to arrive soon, endure, and be structurally difficult to overcome if you're late. The AI you use today is the worst AI you will ever use. It only gets better from here.

The ROI was never about the tools. It was always about the work.

Disclaimer: These views are my own and reflect no other organisation. They are current today but likely to evolve rapidly as our world, markets, and technologies do. Comments are welcome but please be constructive and civil — we are all trying to work out answers to this new world together!

Nota Bene: A friend asked me if I write these posts or does an LLM! I write all the words you see above. I do ask an LLM to critique it for me, identify any grammar errors, and fact-check my references. But the words all remain my own. These posts take me a long, long time to write. Apologies you have gotten so few of them lately!