Tags: AI, Engineering, Product Infra
2025

Everyone is going to get AI Product Development wrong

Writing up my thoughts on product infra had me thinking about the broader questions related to AI product development. And by “AI product development” I mean all of:

  • Using AI to accelerate product development
  • Creating better, novel, and previously impossible products that rely on AI
  • AI-as-product: how to accelerate AI development itself

ProdInfra and product development are often more art than science already. Perhaps even more “waves hands” than art or science. How much harder is it going to be to reliably predict and deliver products in the era of accelerating AI?

We can improve our chances of ultimate success in three clear ways. First, embrace ProdInfra to make it easier for product teams to escape local maxima. Next, when betting your future on efficiency gains from AI, pick a metric AI won’t just game. Finally, consider the challenge of blending research and product development efforts.

Hill climbing, AI, and ProdInfra

Modern development is incredibly fortunate to have o11y teams, not to mention platform eng/infra, release engineering, corp eng, and other teams. All of these contribute to faster iteration from different perspectives. All will accelerate development and increase iteration rate. This is awesome. Unfortunately they are necessary but not sufficient, because — in my experience — without the design and product biases prod infra brings to infrastructure improvement, the tendency is to accelerate hill climbing.

Hill climbing is important. Duh. However, as many organizations have learned to their detriment, accelerating your rate of hill climbing to a local maximum doesn’t necessarily help. Particularly in times of significant or accelerating change, when the hill you’re on may suddenly turn out to be an incredibly undesirable place to be. So, while we absolutely want teams maniacally focused on improving how products climb the hill they’re on, prod infra is one of the few deeply technical teams with the right mix to ease and accelerate the exploration of new hills.
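The algorithmic version of this trap is easy to sketch. A minimal, hypothetical example in Python (toy landscape and parameters are mine, not a model of any real product): a greedy hill climber parks on whatever peak is nearest, and only deliberate exploration — here, crude random restarts — ever finds the taller hill.

```python
import math
import random

def hill_climb(f, x, step=0.1, iters=1000):
    """Greedy hill climbing: only ever accept moves that go uphill."""
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if f(candidate) > f(x):
            x = candidate
    return x

def with_restarts(f, restarts=20):
    """Exploration: climb from many random starting points, keep the best hill found."""
    return max((hill_climb(f, random.uniform(-10, 10)) for _ in range(restarts)), key=f)

# A landscape with a tempting small hill near x = 1 and a much taller one near x = 6.
f = lambda x: math.exp(-(x - 1) ** 2) + 3 * math.exp(-((x - 6) ** 2) / 2)

stuck = hill_climb(f, 0.0)    # converges near x = 1, the nearest (small) hill
explored = with_restarts(f)   # almost always converges near x = 6, the tall one
```

Making the greedy climber faster only gets you to the small hill sooner; it never changes which hill you end up on. That is the restart loop’s job, and it’s a decent metaphor for what prod infra can make cheap.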

nota bene: I’m not talking about changing mission or vision here. Shifting those is the responsibility of the CEO and leadership team. I’m generally discussing the strategy of how you’re currently pursuing your mission (what the company is doing now) and vision (where it is headed). High-functioning leadership teams take advantage of what product and related teams discover about the reality of the current mission and vision.

We’re already seeing the results of this. I wrote about it last week, as did others. This despite the fact that both OpenAI and Anthropic have hoovered up some of the most wonderful and thoughtful members of Facebook UIE/ProdInfra teams from the mobile transition era.

Great product development and product infra — already a challenge and opportunity for many orgs — is about to become many times more important. It’s also almost certainly going to go pear shaped.

Delivering positive change

Measuring product engineering productivity is already a shitshow. Hopefully, you’ve moved beyond lines of code or pull requests. Seriously, it’s not 1975 at IBM, people. Despite everyone agreeing that LOC are a stupid way to measure productivity, I’ve seen them referenced as a meaningful statistic everywhere from perf at Facebook to meetings at eBay. The whole “but we can measure it” is deeply ingrained and actively destructive.

OK, Mr. But Actually, I agree that teams at the extremes are often worth studying. Duh.

It’s also about to get way worse, because nothing can game commits or lines of code like AI tooling. These tools are so good at it that one of the normal protections against metrics driving pathological behavior — always measuring at the team level — is about to get completely overwhelmed by AI iteration speed. I look forward to the consultancies that truly make bank over the next few years off of both of these problems.

So, what metrics do work? To me, it’s the rate of positive change delivered to customers. I prefer this framing for a few reasons:

  1. Like o11y and Key Performance Indicators, teams need to understand and be able to describe how they are measuring user benefits. They could adopt Google’s Critical User Journeys or pick metrics they believe deeply align with users. Either way, you can’t even pretend to measure rate of positive change delivery without understanding what a positive change is.

  2. Because it’s a rate, it’s sensitive. Teams across the organization are able to contribute and — critically — see their impact.

  3. It captures technical, design, process, and reliability wins. Even the development of a single product is a massive set of entangled dependencies of time spent thinking, building, and learning. Multiple product teams or divisions — no matter the attempts to split the orgs — create exponential effects. Different groups correctly operate with more attention to specific pieces of this puzzle. Focusing on what really changes for the customers gives you a powerful tool to measure overall effectiveness.
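The measurement itself is simple once a team has done the hard work of defining “positive.” A toy sketch in Python (the Delivery shape, field names, and numbers are all hypothetical; the real benefit metric is whatever user-aligned measure the team chose above):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Delivery:
    shipped: date
    description: str
    benefit: float  # measured movement on the team's user-aligned metric

def rate_of_positive_change(deliveries, start, end):
    """Positive changes delivered to customers per week over [start, end]."""
    weeks = (end - start).days / 7
    positive = [d for d in deliveries if start <= d.shipped <= end and d.benefit > 0]
    return len(positive) / weeks

deliveries = [
    Delivery(date(2025, 6, 2), "faster checkout", 0.8),
    Delivery(date(2025, 6, 9), "refactor only, no user-visible change", 0.0),
    Delivery(date(2025, 6, 16), "fewer failed uploads", 1.2),
]

# 2 positive changes over 4 weeks -> 0.5 per week
rate = rate_of_positive_change(deliveries, date(2025, 6, 1), date(2025, 6, 29))
```

Note what the gamed metrics miss: the refactor-only ship counts for nothing here no matter how many lines it touched, and a one-line reliability fix counts fully.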

We know AIs are convincing even when they are completely wrong. They are an infinite supply of six-foot-two, blond Harvard grads who speak with a British accent and can eloquently reference anything humanity has ever written. Understanding whether they are really helping your organization requires not just end-to-end development metrics but a concrete understanding of how customers’ experiences — whether those customers are internal or external — are being improved.

Reese’s and the Bitter Lesson

Many companies find pairing academic researchers and product development incredibly challenging, an oil and water mix. I keep getting surprised by this, because the era of game development I grew up in could not have been more driven by the combination of two great tastes that taste great together. Just in rendering, BSPs unlocked high performance 3D graphics, Ed Catmull’s subdivision surfaces transformed both offline rendering (Pixar) and realtime rendering (basically all games today), and NVIDIA’s development of the programmable pipeline meant every graphics researcher was now also a game engine developer porting research to realtime.

It continued at Linden, where we figured out how to closely collaborate with researchers in fluids (Jos Stam’s stable fluids), noise (leading to a delightful conversation with Ken Perlin about what he called “Perlin Noise” when he submitted papers), and IP (Larry Lessig was as responsible as anyone for the critical decision to grant IP rights). I could keep going, the list goes on and on.

So, what happens when you don’t bridge the gap between research and product? First off, you get a lot of products somewhere between “suboptimal” and “lousy.” AI is partially masking this right now because GenAI’s capabilities are so magical and novel. Moreover, the incredible rewards being generated even by the perception of being an “AI researcher” creates all kinds of incentives to not let your pristine research groups get distracted by a bunch of freewheeling hackers doing gross things to their beautiful Python.

There’s even some reason to believe them. The bitter lesson has been a core component of AI for a long time. Basically, every time we’ve tried to hand-craft smarter AIs, we would have been better off figuring out how to apply more compute. As Drew Breunig notes, this thinking is having a moment. Drew’s skepticism echoes my own.

We know the bitter lesson applies when we have high quality data and clear ways to measure outcomes. We saw the wins that came first from applying this to larger and larger sets of training data, then from scaling inference time, aka “reasoning.” Will those techniques — and other research techniques — keep accelerating AI? I hope so.

But it would be foolish to dismiss the benefits of truly great, coupled product development alongside all that research. Product developers who deeply understand the strengths, weaknesses, and goals of cutting edge AI research can explore products that can’t possibly exist today. They can play with ideas that will also create novel training sets to continue to improve AI. They’re also more likely to find gross approximations that work, approximations researchers didn’t even think to try.

I was going to write up a hypothetical to try and prove this point, but as Simon Willison notes, AI in China is having an incredible month, export controls and chip limitations be damned. You think they are keeping the hackers separate from the researchers? In all of training, model development, and the products they’re building around the models, we’re seeing an incredible pace of exploration far in excess of what access to computing would predict.

Google, Meta, and everyone else going blank check for AI researchers are setting themselves up for very complex operational challenges — and not the obvious ones that come from high pay disparities. Big Tech has been figuring out how to navigate that challenge for 20 years now.

No, the real challenge is that most pure researchers prefer to just work with other pure researchers. It’s comfortable, familiar, and in well run orgs produces real results. Plus, they’ll be able to point to their deals and say “you promised I was part of a lab.” Ironically, many of these same researchers have spun out to startups, trying to combine research and product development, only to boomerang back into R&D labs.

The more researchers and teams do this, the more they will be depriving themselves of the opportunity to move faster, to invent more, and to increase their chances of achieving their ultimate goal of AGI.

We live in interesting times

I’ve always felt transitions are the best time to be building products. AI feels like that times a bazillion. With funding — and stakes — this high, the companies and teams that figure out how to be sustainably better at creating the future are going to win and win big.