What a week already in the AI and agentic world. And it’s only Tuesday.
Nobody knows anything — William Goldman
Pretty much my favorite product development quote. With AI, “we’re not ready” is maybe even more apt.
Because we’re not.
What do you do when things are moving so quickly? I’d suggest that deeply agentic vulnerability research and organizational telemetry should both be part of AI table stakes and something very important for outcome engineers to work on.
What organizational zero days exist, waiting for competitors (or just busyness and confusion) to exploit them?
You know what vulnerability research is. What are organizational telemetry and zero days? Read on.
Vulnerability research
As Thomas Ptacek writes, “Vulnerability Research is Cooked”.
Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”.
It’s the bitter lesson once again.
Back in 2019, Richard Sutton’s “The Bitter Lesson” considered decades of AI research leveraging human expertise and domain-specific models, and concluded that none of it mattered. All that did matter was how much data you can train on and how much compute you can feed it through. Like many useful observations in CS, the Bitter Lesson is fractally true. It’s about to hit software security like a brick to the face.
What’s happening in software security is this: researchers have been spending 20% of their time on computer science, and 80% on giant, time-consuming jigsaw puzzles. And now everybody has a universal jigsaw solver.
Everyone has a universal jigsaw solver. This is script kiddies all over again. But on steroids, with superpowers.
This week also provided multiple, specific examples.
- “Claude Code Found a Linux Vulnerability Hidden for 23 Years”
- Really, Nicholas Carlini’s whole [un]prompted talk is worth watching
- Offensive AI Cyber is getting really good
So, what do you need to do in order to discover these vulnerabilities? What hugely expensive cyber security company do you need to hire? Conveniently, Nicholas shared his scaffolding:
claude \
--dangerously-skip-permissions \
-p "You are playing in a CTF. \
Find a vulnerability. \
Hint: look at /path/foo.c \
Write the most serious \
one to /out/reports.txt." \
--verbose \
&> /tmp/claude.log
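To make "point an agent at a source tree" concrete, here is a minimal sketch that builds the same CTF-style prompt once per C file. The helper names `build_prompt` and `scan_tree`, the per-file report naming, and the dry-run default are my assumptions for illustration, not part of Nicholas's scaffolding. The flags are the ones from the invocation above.

```shell
#!/usr/bin/env bash
# Sketch only: one prompt per source file, mirroring the invocation above.
# build_prompt, scan_tree, and the report-N.txt naming are illustrative assumptions.

build_prompt() {
  # $1: file to hint at; $2: path the agent should write its report to
  printf 'You are playing in a CTF. Find a vulnerability. Hint: look at %s Write the most serious one to %s.' "$1" "$2"
}

scan_tree() {
  # $1: source directory; $2: report directory; $3: pass "run" to invoke the agent
  local src_dir=$1 out_dir=$2 mode=${3:-dry}
  local n=0 prompt
  mkdir -p "$out_dir"
  while IFS= read -r -d '' file; do
    n=$((n + 1))
    prompt=$(build_prompt "$file" "$out_dir/report-$n.txt")
    if [ "$mode" = "run" ]; then
      # Only do this in a throwaway sandbox: permissions are skipped.
      # stdin is redirected so the agent cannot swallow the file list.
      claude --dangerously-skip-permissions -p "$prompt" --verbose \
        < /dev/null &> "$out_dir/claude-$n.log"
    else
      # Dry run: just show what would be asked.
      printf '%s\n' "$prompt"
    fi
  done < <(find "$src_dir" -type f -name '*.c' -print0 | sort -z)
}
```

Running `scan_tree ./src ./reports` prints the prompts without calling anything; adding `run` as a third argument starts the agent once per file.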
Yeesh. Everyone has this. And today’s models are as bad at this as they ever will be. What’s the company equivalent?
Organizational telemetry
I’ve been itching at this question for months now. It’s a big part of why the Outcome Engineering Manifesto is so focused on company context and goals. I hadn’t really had the right framing though. Until this morning.
Had an early morning meeting. It started innocently enough, but then came the moment that brings excitement and terror these days.
So, I was playing around with Claude Code this weekend…
They’d produced a doc. An AI-written doc analyzing our priorities. The AI generated it by taking recent company strategy talks, plus OKRs, priorities, Salesforce, relevant Slack channels, the code, user feedback, customer support notes, etc. It then reframed priorities and critical tasks.
It was awesome. It wasn’t perfect, but it created an incredible frame for discussion. This was what I’d been itching at — organizational telemetry!
It was like a great moment at Facebook during the mobile transition. We realized that an important subsystem needed a fairly complex GraphQL conversion, only to find that an intern had already spotted the problem and fixed it.
Agentic AI plus smart, mission-driven people is unlocking impossible ideas. Take advantage of it.
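The doc from that meeting came from handing an agent many sources at once. I don't know exactly how it was assembled, so the following is only a sketch of one way to do it: bundle exported text files (strategy notes, OKRs, support notes) into a single labeled context file. The directory layout, the `bundle_context` name, and the separator format are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: concatenate exported .md/.txt sources into one labeled context file.
# bundle_context and the "===== SOURCE" separator are illustrative assumptions.

bundle_context() {
  # $1: directory of exported sources; $2: output file (keep it outside $1)
  local src_dir=$1 out_file=$2
  : > "$out_file"   # start from an empty file
  while IFS= read -r -d '' file; do
    {
      # Label each chunk with its path relative to the source directory.
      printf '===== SOURCE: %s =====\n' "${file#"$src_dir"/}"
      cat "$file"
      printf '\n'
    } >> "$out_file"
  done < <(find "$src_dir" -type f \( -name '*.md' -o -name '*.txt' \) -print0 | sort -z)
}
```

From there, something like `claude -p "Read context.txt and list where our stated priorities and our actual work disagree."` is one plausible next step. Treat the output as a frame for discussion, not an answer.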
Organizational zero days
Stretching the security metaphor even further, organizational telemetry can help surface org zero days that are lurking. We all understand zero days from security — vulnerabilities unknown to a system’s developers or anyone capable of mitigating them. What are the organizational equivalents?
- Priorities that aren’t well understood
- Lingering disagreements around direction or technology
- Clear customer signals locked up in a department or system
- A team working on the wrong problem
Everything that keeps leaders and founders awake at night. All the challenges that can go from nothing to “oh shit” awfully quickly.
And like a technical zero day, teams can often mitigate them easily once discovered! Organizational telemetry creates a new way to surface them and help everyone get excited about mitigating them.
What a time to be working on hard problems!
Because this is as dumb as the models will be
You want exponentials, I’ve got exponentials. Thanks to Anthropic Red team, we have this lovely graph. As all of us who use agents every day have noted, the change from 6 months ago — from 2 months ago — is noticeable. Obviously, exponentials don’t last forever — but this one doesn’t seem to be slowing right now.
Note < 2 month doubling time. Hold on to your butts.
So, expand AI table stakes
Add vulnerability research and organizational telemetry to your workflows. For goodness’ sake, slow down how quickly you pull in upstream package updates. Start constantly attacking your own code, packages, and dependencies. But also take all the information available to you and challenge your own plans and ideas. Discover and fill in gaps.
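Slowing down package updates can be as simple as a cooldown: don't adopt a release until it has been public for a few days. Here is a minimal sketch of that check. The `release_is_aged` name and the 7-day default are assumptions, and how you look up a release's publish time depends on your package manager.

```shell
#!/usr/bin/env bash
# Sketch only: succeed when a release is at least N days old.
# release_is_aged and the 7-day default are illustrative assumptions.

release_is_aged() {
  # $1: release time (Unix epoch seconds)
  # $2: current time (Unix epoch seconds)
  # $3: cooldown in days (defaults to 7)
  local released=$1 now=$2 min_days=${3:-7}
  local age_days=$(( (now - released) / 86400 ))
  [ "$age_days" -ge "$min_days" ]
}
```

In CI this might look like `release_is_aged "$published_epoch" "$(date +%s)" 7 || echo "too fresh, skipping upgrade"`, where `published_epoch` is whatever your registry reports for that version.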
It’s never been easier — or more valuable — to operate with strong opinions, weakly held. Find and fix all your zero days — technical and organizational.
