Tags: Bruce Schneier, Prompt Injection, Lethal Trifecta, Security
2025

OODA Loops and Setec Astronomy

For the last 20 years, I have been alternately amused and terrified by the military cosplaying via lingo in the tech sector — with references to “S-2s” being among the most eye-rolling. I will however make a rule-proving exception with Bruce Schneier’s latest article about AI security: “Agentic AI’s OODA Loop Problem.”

Few people have thought longer or more deeply about cyber security than Bruce, and his reasoning behind adopting the OODA-loop framework is dead-on.

Traditional OODA analysis assumes trusted inputs and outputs, in the same way that classical AI assumed trusted sensors, controlled environments, and physical boundaries. This no longer holds true. AI agents don’t just execute OODA loops; they embed untrusted actors within them.

The OODA Loop

For those unfamiliar with the term, the OODA loop is a fighter pilot term originated by Air Force Colonel John Boyd. Boyd is credited with inventing basically everything about modern jet-fighter combat, from energy being the core currency of fighter engagements to the decision-making framework known as OODA:

  • Observe
  • Orient
  • Decide
  • Act

The influence and debate around OODA is far-ranging, but the important concepts to take away are the idea of gathering information and data, processing it in the context of goals, making a decision comparatively late in the process, and then acting. Then you repeat the loop, with new data from your actions.

It’s the core of most agile thinking. “Strong opinions, loosely held” is OODA shorthand. It’s a very effective methodology in many circumstances and is even built to be resilient to noise/misdirection in observations. Unfortunately, it is not designed to tolerate a hostile actor running their own OODA loop within each step.

And that’s the world we’re stepping into.

The Threat Surface

Schneier’s article breaks down the implications for each step:

Observe: The risks include adversarial examples, prompt injection, and sensor spoofing. A sticker fools computer vision, a string fools an LLM. The observation layer lacks authentication and integrity.

Orient: The risks include training data poisoning, context manipulation, and semantic backdoors. The model’s worldview—its orientation—can be influenced by attackers months before deployment. Encoded behavior activates on trigger phrases.

Decide: The risks include logic corruption via fine-tuning attacks, reward hacking, and objective misalignment. The decision process itself becomes the payload. Models can be manipulated to trust malicious sources preferentially.

Act: The risks include output manipulation, tool confusion, and action hijacking. MCP and similar protocols multiply attack surfaces. Each tool call trusts prior stages implicitly.

These are all supply-chain and compiler attacks as a service. It used to be that these types of attacks required significant time, money, and/or technical expertise — consider the cleverness of Ken Thompson’s 40-year-old backdooring of the C-compiler — but these are now available to pretty much anyone with an LLM.

Suddenly, rather than debating “Fast, Cheap, Good”, we’re debating “Fast, Smart, Secure”:

This is the agentic AI security trilemma. Fast, smart, secure; pick any two. Fast and smart—you can’t verify your inputs. Smart and secure—you check everything, slowly, because AI itself can’t be used for this. Secure and fast—you’re stuck with models with intentionally limited capabilities.

Alignment and Integrity

OpenAI had a chance to discuss these issues around the launch of Atlas but was deafeningly quiet about it initially. Their CISO did a long post to twitter, which Simon Willison pulled into a manageable post. It’s pretty sobering reading. Sure, their goals are admirable:

Our long-term goal is that you should be able to trust ChatGPT agent to use your browser, the same way you’d trust your most competent, trustworthy, and security-aware colleague or friend.

Sure, and I want a pony. The how gets much thinner. On the hand, they advocate for logged our mode and forced human observation — basically Schneier’s “slower, less smart” tradeoff — but then we get this absolutely brutal comment:

New levels of intelligence and capability require the technology, society, the risk mitigation strategy to co-evolve. And as with computer viruses in the early 2000s, we think it’s important for everyone to understand responsible usage, including thinking about prompt injection attacks, so we can all learn to benefit from this technology safely.

Let’s be clear: nobody understands responsible usage for LLMs. If they did, we wouldn’t have daily reports of successful data exfiltration. Or LLM psychosis. Or “error-ridden” rulings by US District Judges.

The good news — such as there is — is that the big model developers have every incentive to solve the alignment problem and make architecture improvements at every stage from training through inference. AI slop requires this. Model integrity and user safety, too.

What about right now? My recommendation would be that if you are exploring agentic browsers — and anyone working in AI really should — to do it in logged out, locked down, and sandboxed ways. I would avoid browser makers known for abuse of robots.txt and user data. Yolo-mode only in very controlled ways.