Bad Vibes
In December 2025, Amazon’s AI coding tool Kiro was asked to make some changes to a cloud management system. Instead of making the changes, it deleted the entire environment and recreated it from scratch. Thirteen hours of downtime. Amazon called it “an extremely limited event.”
Three months later, it got worse. On March 2, 2026, AI-assisted code changes caused 120,000 lost orders and 1.6 million website errors. Three days after that, on March 5, Amazon’s ecommerce site went down for six hours — a 99% drop in orders across North American marketplaces, resulting in 6.3 million lost orders. An internal briefing note, obtained by the Financial Times, described a “trend of incidents” with “high blast radius” tied to “Gen-AI assisted changes” for which “best practices and safeguards are not yet fully established.”
Amazon’s fix: junior and mid-level engineers can no longer push AI-assisted code without a senior signing off. They’re also rolling out a 90-day safety reset targeting 335 critical systems.
I don’t actually care about the Amazon checkout cart. I mean, I care — millions of users couldn’t buy things, that’s a real cost — but that’s not why I’m writing this. I’m writing this because the pattern underneath the outage is the same pattern underneath almost everything going wrong in the world right now. And nobody is naming it plainly.
I wrote recently about the visibility gap — the inability to tell the difference between disciplined AI-assisted development and vibe coding. That piece was personal. A Reddit commenter dismissed my audiobook tool; I wrote about what it felt like and what it meant. The stakes were low. Someone’s chapters might be wrong.
Here the stakes aren’t low.
Amazon laid off thousands of engineers — 16,000 in January 2026 alone. They invested $200 billion in AI infrastructure. They deployed 21,000 AI agents across their Stores division, claiming $2 billion in cost savings and 4.5x developer velocity. They mandated that 80% of engineers use Kiro weekly, tracked as a corporate OKR. They did not establish best practices or safeguards for how those tools should be used. Their own systems started breaking. They held a meeting.
James Gosling — the creator of Java, who spent years at AWS before leaving — put it this way: the “ROI analysis was disastrously shortsighted.” These are complex interconnected systems, he said, and “unless the whole ecosystem is comprehended in total, bad decisions are made.”
And Amazon’s solution? A bottleneck. A gate at the top, where senior engineers review what junior engineers produce. Not distributed capability at every level. Not structured process embedded in the workflow. A checkpoint. One person, at one level, expected to catch what the system was designed to miss.
Meanwhile, 1,500 Amazon engineers signed an internal petition protesting the Kiro mandate itself — arguing that external tools outperformed Kiro on tasks like multi-language refactoring, and that the company was spending more organizational energy enforcing tool adoption than ensuring tool safety.
This is not a process. This is a tourniquet — judgment concentrated in the few while disempowering the many.
The critique of AI-generated code is real, and it’s grounded in data, not anxiety.
A University of Naples study analyzing over 500,000 code samples found that AI-generated code tends to be simpler on the surface but carries more high-severity security vulnerabilities underneath — the kind of problems that look clean until they break in production. Cortex’s 2026 engineering benchmark found that pull requests per developer increased 20% but incidents per pull request increased 23.5%. More code, faster. More breakage, faster. The same amplification, in both directions.
But the pattern extends beyond code. You’ve done it yourself — typed a symptom into a chatbot at midnight, half-asleep, because the doctor’s office is closed and the answer sounded good enough to let you go back to sleep. More than forty million people do this with ChatGPT every day, according to OpenAI’s own numbers. A quarter of its users ask healthcare questions every week.
Now imagine the same dynamic inside a hospital. Not a patient googling symptoms — a clinician, mid-procedure, asking a chatbot whether a particular electrode placement is safe. ECRI, a nonpartisan patient safety organization, tested exactly this scenario. The chatbot confidently said the placement was appropriate. It wasn’t. That placement causes burns. The answer was wrong, the tone was authoritative, and there was no signal — none — that the tool didn’t know what it was talking about.
ECRI named AI chatbots the number one health technology hazard for 2026. Forty million people a day are asking these tools medical questions. The tools answer every one of them with the same fluency, the same confidence, whether the answer is right or lethal.
A Stanford-Harvard collaboration released its State of Clinical AI report in January 2026: “Better evaluation, not just better models, is the prerequisite for trustworthy clinical AI.”
Better evaluation. Not better tools. Better humans in the loop.
And this isn’t just a coding problem, or even a healthcare one. The same pattern — deploy the tool, skip the process, deal with the consequences later — shows up everywhere the incentive is speed over judgment. Nowhere more vividly than Grok.
In late December 2025, Elon Musk’s AI chatbot introduced an image editing feature on X. Users discovered they could tag Grok in any post and ask it to alter photos. Within days, the platform was generating thousands of nonconsensual sexualized images per hour — of real women, from their real photos, without their consent. An analysis of 20,000 images generated in the first week found that 2% appeared to depict minors. Researchers calculated that users were creating 6,700 sexually suggestive or nudified images per hour — eighty-four times the output of the top five deepfake websites combined.
Grok itself acknowledged generating sexualized images of girls it estimated to be twelve to sixteen years old. Multiple countries launched investigations. The Philippines blocked the chatbot entirely. France expanded a criminal investigation. Britain’s media regulator made “urgent contact” with Musk’s companies. The California Attorney General opened an investigation.
xAI’s automated response to press inquiries: “Legacy Media Lies.” Musk posted laugh-cry emojis.
AI safety researchers had warned for months that the feature would be abused. The guardrails were known to be insufficient. The tool was deployed anyway, because the cultural posture of the company — and, increasingly, of the moment — is that guardrails are censorship, caution is weakness, and moving fast is its own justification.
This is not a technology failure. This is a values failure. And the cost was the industrialized production of child sexual abuse material.
A TechTarget analysis of the agentic coding trend put it in one sentence I haven’t been able to shake: organizations without real discipline will simply “generate chaos quicker.”
Generate chaos quicker. That’s the sentence.
AI doesn’t care what it amplifies. If it finds a disciplined verification process, it accelerates good output. If it finds “ship what the model gives you,” it accelerates the shipping of things nobody understands well enough to maintain. If it finds no guardrails at all, it produces 6,700 nonconsensual sexual images per hour.
The tool is the same in every case. The process is different. That’s the human contribution.
This is why I care about process. Not because I love methodology. Not because I think frameworks are inherently interesting. Because process is how human judgment gets operationalized — how it moves from being something one person has in their head to something a team practices, an organization embeds, a system depends on. Process is systematized verification at every stage, not a checkpoint at the end. The person who ships the code can explain what it does and why. Decisions are visible, reviewable, and accountable.
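To make that concrete, here is a minimal sketch in Python of what “verification at every stage” might look like when it lives in the pipeline rather than in one senior engineer’s queue. Everything in it is hypothetical: the Change fields, the gate names, and the “ai-assisted” provenance label are assumptions made for illustration, not a description of Amazon’s tooling or of any real CI system.

```python
"""A minimal, hypothetical sketch of verification at every stage.

None of these field names or gates describe a real system; they only
illustrate checks that are distributed and automatic rather than
concentrated in one senior reviewer.
"""

from dataclasses import dataclass, field


@dataclass
class Change:
    """A proposed change, as the pipeline sees it."""
    author: str
    ai_assisted: bool                # provenance: was a coding agent involved?
    description: str                 # the author's own explanation of the change
    tests_added: bool                # did verification ship with the change?
    reviewer: str | None = None      # a peer at any level, not one senior gate
    checks_passed: list[str] = field(default_factory=list)


def verify(change: Change) -> list[str]:
    """Return the failed gates. An empty list means the change may ship."""
    failures = []

    # Gate 1: the person shipping the code can explain what it does and why.
    if len(change.description.strip()) < 50:
        failures.append("no meaningful explanation of the change")

    # Gate 2: AI-assisted changes carry a visible provenance label.
    if change.ai_assisted and "ai-assisted" not in change.checks_passed:
        failures.append("AI-assisted change missing provenance label")

    # Gate 3: verification ships with the change, not after the incident.
    if not change.tests_added:
        failures.append("no tests accompany the change")

    # Gate 4: a second human is accountable, and it can be any peer.
    if change.reviewer is None or change.reviewer == change.author:
        failures.append("no independent reviewer")

    return failures


if __name__ == "__main__":
    change = Change(
        author="dev-a",
        ai_assisted=True,
        description="Refactors the retry logic; see design note for failure modes.",
        tests_added=True,
        reviewer="dev-b",
        checks_passed=["lint", "unit-tests", "ai-assisted"],
    )
    problems = verify(change)
    print("ship it" if not problems else f"blocked: {problems}")
```

The particular gates don’t matter. What matters is that each one is cheap, automatic, and owned at the level where the work happens, so no single reviewer has to be the last line of defense.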
Without process, you get Amazon’s tourniquet — one senior engineer as the last line of defense. Without process, you get healthcare chatbots generating confident wrong answers that nobody is structured to catch. Without process, you get a platform that produces child sexual abuse material at industrial scale and responds with an emoji.
With process — structured, distributed, human-accountable process at every level — the tools become genuinely powerful. Not because the AI is better. Because the humans are better. Because someone is driving, at every level, with the judgment and the authority and the responsibility to evaluate what the tool produces before it ships.
That’s hard work. It’s slower than vibe coding and less exciting than the promise that AI will do everything for us. It requires something that the current cultural moment finds deeply unfashionable: the belief that human thinking matters, that it takes practice, and that there is no shortcut to the place where shortcuts work.
The thinking has to be distributed. The responsibility cannot be.
Sources
Amazon / Kiro: “Amazon’s AI Coding Tool Blamed for Outages,” Financial Times, March 2026. Internal briefing note obtained by FT. James Gosling remarks reported in multiple outlets. Internal petition reported by Business Insider, March 2026.
Code quality: Corso et al., University of Naples, analysis of 500,000+ code samples, 2025. Cortex 2026 Engineering Benchmark Report.
Healthcare: ECRI, “Top 10 Health Technology Hazards for 2026,” January 2026 — including electrode placement test and chatbot error findings. OpenAI healthcare usage data cited in ECRI report. Brodeur et al., “The State of Clinical AI (2026),” ARISE network, Stanford-Harvard, January 2026.
Grok / xAI: “Users Are Generating Thousands of Nonconsensual Sexual Images Per Hour with Grok,” reported across multiple outlets, January 2026. Analysis of 20,000 generated images and 6,700/hour figure from independent researchers. Country-level regulatory responses documented in The Guardian, Reuters, and BBC.
The companion piece to this essay, “Prompting, Not Programming,” examines the visibility gap from the practitioner’s side — what happens when disciplined AI collaboration and vibe coding look the same from the outside.
— ATM