Three Hours
278 tests, 99% coverage, and a production release before the kids wake up
A couple of weeks ago, during winter break, my kids were sleeping in after a late-night sleepover, and I had a couple of hours before the house exploded into a circus of children demanding Bluey pancakes and quarreling over who gets to hold the cat. Precious time.
I sat on the couch with my coffee and built a complete backup orchestrator from scratch.
Not a prototype. Not a proof of concept. A fully tested production application with WebAuthn authentication, a web dashboard, multi-channel notifications, configurable job scheduling, and a break-glass disaster recovery system.
Three hours.
I want to be precise about what I mean, because “I built an app in three hours with AI” is a sentence that could describe vibe coding—the thing I’ve spent months arguing against. This is not that. Let me show you what three hours actually produced, and then explain why it matters.
If you’ve been following this series—from building a computer vision system for Harvard’s falcon cameras to what design education actually teaches you about thinking—you know I’m not a software developer. I have an architecture degree from MIT and 25 years in systems design and organizational leadership. I wrote my first line of Python eleven weeks ago.
Here’s what I built that morning.
Hōzō (宝蔵—”treasure storehouse”) is a backup orchestrator—a system that protects your data by automatically copying it to a second machine somewhere offsite (mine will live in my in-laws’ basement). What makes it interesting is how much it handles on its own. It wakes a sleeping backup server over the network using a protocol called Wake-on-LAN, waits for a secure connection, then syncs your data incrementally using ZFS snapshots—a method that only transfers what’s changed since the last backup, rather than copying everything again. It verifies the remote state to confirm the data arrived intact, sends you notifications through multiple channels, and puts the backup machine back to sleep. If everything goes catastrophically wrong, there’s a disaster recovery function—deliberately buried at the bottom of a log page, because you should never trigger a full restore casually.
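The incremental sync at the heart of that loop is simple enough to sketch. The helper below just builds the shell pipeline: a ZFS send of only the blocks that changed between two snapshots, piped over SSH into a receive on the backup machine. The dataset and snapshot names, and the helper itself, are hypothetical illustrations, not Hōzō’s actual code.

```python
def incremental_sync_cmd(dataset: str, prev_snap: str, new_snap: str,
                         remote_host: str, remote_dataset: str) -> str:
    """Build the shell pipeline for an incremental ZFS replication.

    `zfs send -i` emits only the delta between the two snapshots;
    `zfs receive -F` applies it on the remote side, rolling the target
    back to the last common snapshot if needed.
    """
    return (
        f"zfs send -i {dataset}@{prev_snap} {dataset}@{new_snap}"
        f" | ssh {remote_host} zfs receive -F {remote_dataset}"
    )
```

Calling it with, say, incremental_sync_cmd("tank/photos", "daily-2026-02-01", "daily-2026-02-02", "backup-box", "backup/photos") yields a single pipeline that transfers one day of changes instead of the whole dataset.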
It has a command-line interface, a web dashboard with per-job log viewers and live polling, a settings editor, and WebAuthn passkey authentication—the same passwordless login your phone uses for banking apps, which is the right security model for a system that controls your data infrastructure.
Full type checking with mypy. Linting with flake8. Formatted with black. Zero errors across the board. The only two uncovered lines are a __main__ guard and a dead inner function—genuinely untestable.
If those numbers don’t mean much to you: 278 tests means I wrote 278 separate checks that verify the system does exactly what it’s supposed to do in every scenario I could anticipate—including the ones that should fail safely. 99% coverage means virtually every line of code is exercised by at least one of those tests. The type checking and linting are automated systems that enforce consistency and catch errors before they become problems. This is how professional software teams ship code they trust. I built this alone, on a quiet morning, eleven weeks into learning the language.
The natural question is: did the AI just build this for you?
Yes and no. The AI wrote the code. I didn’t type the Python. But that question misses what actually happened, and the thing it misses is the whole point.
Here’s what I did in those three hours: I wrote a specification. I defined the architecture—what components exist, how they connect, what the data model looks like, what the CLI interface should be, what the web routes are, how authentication works, where the break-glass restore lives and why it’s hidden. I made every design decision. The AI made none of them.
Then I handed that specification to a coding agent and watched it execute. I monitored the terminal in real time. I ran the lint-test cycle after every implementation phase. When mypy flagged type errors, I directed fixes. When test coverage had gaps, I specified what was missing. When the agent made an architectural choice I disagreed with—and it did, twice—I caught it because I was watching, and I corrected it because I understood what the system was supposed to be.
The AI wrote the code. I built the system.
Those are different things.
But the three hours isn’t actually the story. The story is everything that came before.
In Walking Without Google Maps, I wrote about the foundational principle underneath all of my design work: you do not start with a solution. You start with a clearly defined and deeply understood problem. The quality of your problem definition determines the quality of your design parameters. Good parameters don’t constrain creativity—they focus it. They’re what make critical feedback and iteration productive rather than aimless.
That principle is the foundation of what happened in those three hours.
I didn’t sit down that morning and start prompting. I sat down with a deep, authentic understanding of the problem I was solving. I run a home lab. I know that Wake-on-LAN is unreliable in ways that require retry logic. I know that external USB drives spin down and need time to come back. I know that a restore function should exist but should be hard to trigger accidentally. Every design decision in Hōzō came from living with this infrastructure—knowing its rhythms, its failure modes, its particular stubbornness.
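That Wake-on-LAN unreliability is concrete enough to sketch. A minimal retry loop in Python might look like this; it is a hypothetical illustration rather than Hōzō’s actual code, though the magic packet format itself (six 0xFF bytes followed by the target MAC address repeated sixteen times, broadcast over UDP) is the standard protocol.

```python
import socket
import time


def magic_packet(mac: str) -> bytes:
    """A Wake-on-LAN magic packet: six 0xFF bytes followed by the
    target's MAC address repeated sixteen times (102 bytes total)."""
    addr = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(addr) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + addr * 16


def wake_and_wait(mac: str, host: str, port: int = 22,
                  attempts: int = 5, delay: float = 10.0) -> bool:
    """Broadcast the wake packet, then poll the host's SSH port.

    Retries because a single packet is often dropped, or arrives
    before the NIC is ready to act on it.
    """
    for _ in range(attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            sock.sendto(magic_packet(mac), ("255.255.255.255", 9))
        time.sleep(delay)
        try:
            # The box is "awake" when something answers on its SSH port.
            with socket.create_connection((host, port), timeout=3):
                return True
        except OSError:
            continue
    return False
```

The retry-then-verify shape, not the packet format, is the part that comes from living with the hardware: one packet usually isn’t enough, and "awake" only means something once a connection actually succeeds.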
None of that knowledge came from the AI. The seed of the project had been sitting in my head for weeks—the problem was clear long before I decided to solve it.
The AI can write Python. It cannot define a problem worth solving.
Here’s what I actually want to talk about: how I developed the ability to go from problem understanding to working system in an afternoon.
Ten months ago, I couldn’t have done this. Not because I lacked access to AI tools—everyone has access to AI tools. I couldn’t have done it because I didn’t have the architectural vocabulary to translate my problem understanding into a technical specification, the verification discipline to ensure I got what I asked for, or the calibrated self-knowledge to know what I understood deeply enough to design around and what I needed to treat as a black box.
I started by building Sageframe—a knowledge cartography system—spending months on systems design, structural thinking, and methodology development before I ever wrote a line of code. That wasn’t a delay. That was the problem definition phase. Seven months of design thinking before I opened a terminal. My first Python commit was December 12, 2025—eleven weeks ago.
I developed those capabilities through a methodology I’ve been building called the Ho System. It’s a structured approach to human-AI collaborative development: bounded work sessions with explicit comprehension expectations, honest self-assessment, and a progression from heavy scaffolding to fluid practice. The tools I build are named in Japanese—Kanyō (contemplating falcons), Hōzō (treasure storehouse), Kinhin (walking meditation)—and the naming is not decorative. It points to something real about what the methodology asks of you.
The Ho System is modeled on the Japanese concept of shu-ha-ri: the three stages of mastery in martial arts, tea ceremony, calligraphy—any disciplined practice. First you follow the form strictly. Then you break from it deliberately. Then you transcend it entirely. The form isn’t the point. The form is how you internalize the principles until they become how you move.
The first project I built with this methodology was Kanyō—a computer vision system for monitoring peregrine falcons at Harvard. That took six weeks. Each session was carefully structured. I had detailed walkthrough documents. I checked prerequisites. I wrote extensive devlogs reflecting on what I understood and what I didn’t. The process was deliberate, slow, and sometimes tedious. Shu—follow the form.
The second phase was faster. I was writing my own session plans, making architectural decisions the methodology hadn’t predetermined, developing my own judgment about when to push deeper and when to treat something as a black box. Ha—break from the form.
Hōzō took three hours. No walkthrough document. No structured session plan. Just a clear mental model, a specification, and disciplined execution. Ri—the form has dissolved into how I work.
This is what practice looks like. Not talent. Not tools. Practice.
I mean that word the way a meditator means it, or a martial artist. Not repetition for repetition’s sake, but committed, structured engagement over time. You sit. You notice. You correct. You sit again. The point is not the individual session—it’s the accumulation. The slow development of capacity that can’t be rushed and can’t be skipped.
The linting and testing discipline I built in my first week of learning Python—because the methodology insisted on it from session one—is now invisible infrastructure. It’s not something I remember to do. It’s something that feels wrong to skip, the way a carpenter would notice a missing level. My repositories have pytest, mypy, flake8, and black configured from the first commit. Not because I set them up consciously, but because a project without them doesn’t feel like a project.
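For concreteness, a baseline like that might look something like the following in a setup.cfg. This is an illustrative sketch, not Hōzō’s actual settings: the coverage flags assume the pytest-cov plugin, and black is configured separately in pyproject.toml because it doesn’t read setup.cfg.

```ini
# setup.cfg: an illustrative baseline, not Hōzō's actual configuration
[flake8]
# match black's default line length; E203 conflicts with black's slice formatting
max-line-length = 88
extend-ignore = E203

[mypy]
strict = True

[tool:pytest]
# the --cov flags require the pytest-cov plugin
addopts = --cov --cov-report=term-missing --cov-fail-under=95
```

The point of a file like this is that it travels with the repository: any tool, or any agent, that opens the project inherits the standard.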
The verification habits—watching the agent work in real time, running the lint-test cycle, cross-checking significant changes with a more powerful evaluation model—those aren’t overhead. They’re what make three-hour production builds possible. I can move fast because I have systems that catch errors. Speed and quality aren’t in tension when the process is right.
And here’s a detail that stopped me short last week. The AI agents I work with have started following my quality practices unprompted—writing tests, checking coverage, running the linting pipeline—without being told to. For a moment I thought they had internalized my process. They hadn’t. What happened is subtler and more important: my projects now encode the process. The test infrastructure, the linting config, the commit discipline visible in the git log—the agent reads those artifacts and conforms to the standard it finds. The methodology didn’t train the AI. It shaped the environment the AI operates in. That’s a fundamentally different kind of leverage, and it only works if the environment was built with discipline in the first place. The last quality pass on Hōzō—run entirely by the agent—took the project from 243 tests at 91% coverage to 278 tests at 99% coverage, with zero flake8 and zero mypy errors. I didn’t ask for that. The project did.
Discipline begets discipline. Practice creates the conditions for better practice. The system is self-reinforcing—but only if you do the work to start the cycle.
That observation—the environment shapes the tool’s behavior, not the other way around—turns out to be the central finding of the most rigorous industry research on AI-assisted development.
The 2025 DORA State of AI-Assisted Software Development Report—Google’s annual study of software delivery performance, drawing on nearly 5,000 technology professionals—found that AI is an amplifier, not a solution. Teams with strong foundations (robust testing, mature version control, fast feedback loops) got faster. Teams without those foundations got more unstable. AI adoption increased throughput but also increased delivery instability. DORA’s warning: “Speed without stability is just accelerated chaos.” [1]
That is what I observed in a single codebase, measured across thousands of teams. AI amplifies whatever it finds. If what it finds is discipline, you get leverage. If what it finds is disorder, you get faster disorder.
Faros AI’s telemetry across 10,000 developers confirmed this with hard numbers. They call it “The AI Productivity Paradox”: individual developers using AI complete 21% more tasks and merge 98% more pull requests. But organizational delivery metrics—the things that actually matter, like lead time and deployment frequency—stay flat. The bottleneck doesn’t disappear. It migrates. Code generation gets faster, and the pressure shifts to review, verification, and integration. The parts of the process that require human judgment become the constraint. [2]
Meanwhile, the cognitive evidence is accumulating. A 2025 study from Microsoft and Carnegie Mellon, presented at CHI, found that higher confidence in AI tools correlated with less critical thinking—the more people trusted the output, the less they evaluated it. [3] A developer interviewed by MIT Technology Review described working without AI after months of heavy use: tasks that used to be instinct had become cumbersome. His skills had atrophied. [4] Gartner now predicts that half of all organizations will mandate AI-free skills assessments within two years, specifically because of critical thinking erosion. [5]
The industry’s emerging response to AI-induced skill atrophy is to test people without AI to see if they can still think. That is a band-aid on a hemorrhage. The problem isn’t that people use AI. The problem is that nobody taught them how to use AI without surrendering the cognitive work that makes the output trustworthy.
This is what the Ho System is actually for. Not “AI literacy.” Not “prompt engineering.” A structured practice that develops and maintains the judgment, verification discipline, and architectural thinking that make AI useful instead of dangerous. The DORA report calls for “robust control systems”—automated testing, version control practices, fast feedback loops. The Ho System builds those from the first session. The research says critical thinking declines when people trust AI without evaluating it. The Ho System makes evaluation structural—a four-layer verification stack that runs on every task, not a suggestion to “be critical.”
The gap in the current conversation is that everyone is diagnosing the problem—speed without stability, productivity without quality, generation without judgment—but the proposed solutions are all organizational: better CI/CD pipelines, stronger code review processes, platform engineering. Those matter. But they don’t address the individual practitioner. They don’t help the person sitting in front of the terminal, deciding whether to trust what the agent just built.
That’s the gap the Ho System occupies. It’s a practitioner-level methodology for the problem the industry is measuring at the organizational level. What would it look like if organizations trained practitioners this way from the start—rather than testing them without AI after the damage is done?
Here’s the thing nobody in the “AI will replace developers” discourse is talking about: the bottleneck was never code generation.
Code generation is easy. It was easy before LLMs—Stack Overflow and copy-paste got a lot of software built. It’s easier now. That’s not interesting.
What’s interesting is judgment. The AI is a power tool in a well-equipped shop. Whether you build furniture or kindling depends entirely on whether you understand joints, grain, and load. I learned joinery.
What I didn’t expect—what the methodology couldn’t have told me in advance—is what opens up on the other side of that learning.
Anything I can specify, I can build.
That sentence still startles me. It means the constraint is no longer implementation. The constraint is imagination. The constraint is: do I understand this problem deeply enough, creatively enough, to envision the right solution? And if I do—if I’ve lived with the problem, turned it over, understood its human dimensions and edge cases—the building is the easy part. The delightful part. I can sit down with a clear idea and watch it become real in an afternoon. Not a mockup. Not a wireframe. A working system that does what I meant it to do.
That changes how I think about everything. Problems I would have filed away as “someday, if I ever learn enough to build that” are now live possibilities. A monitoring system for peregrine falcons—sure. A backup orchestrator—three hours. An idea for a tool that helps teams reflect on their own processes? I can prototype it this weekend and it will actually work. The gap between conception and reality has collapsed, and what rushes in to fill that space is the most genuine creative energy I’ve felt in years. The kind of creativity I spent decades teaching other people to trust—now with a direct output channel that didn’t exist before.
And here’s the part that delights me most: the process demands the slow work. It doesn’t let you skip the understanding. A vague specification produces vague software. A shallow reading of the problem produces a shallow system. The methodology makes understanding load-bearing—it requires you to sit with ambiguity, to resist the urge toward premature solutions, to trust that the time spent with the problem is the work. Which means all those years of teaching design thinking—of insisting to students that the problem definition phase isn’t a delay, it’s the foundation—turned out to be exactly the right training for this moment. Twenty-five years of practice for an age that hadn’t arrived yet. The irony is almost too perfect.
There’s a word for this kind of practice—the slow, deliberate movement between building sessions, the time spent walking with a problem before sitting down to solve it. Kinhin—walking meditation. It’s becoming the part of the Ho System I’m most interested in developing: a structured practice for the understanding phase itself. For the creative work that makes everything else possible.
But that’s the next piece.
Three hours. 278 tests. 99% coverage. Zero linting errors. Production grade.
The process works. But the process takes time. There are no shortcuts to the place where shortcuts work.
Upstairs, the kids were starting to stir. I closed the laptop, poured a second coffee, and started the pancake batter. Hōzō was already running its first scheduled backup.
A few days later, one of the biggest blizzards in years buried the Northeast. School was canceled. In a fit of madness, we had six kids over to be snowbound for days together. The house filled with snow gear and hot chocolate and the particular chaos of children who’ve been told they can’t go outside yet. I looked at the forecast and thought: another quiet morning is coming.
References
[1] Google DORA Team, “2025 State of AI-Assisted Software Development Report,” September 2025. https://dora.dev/research/2025/dora-report/
[2] Faros AI, “The AI Productivity Paradox Research Report,” July 2025. https://www.faros.ai/blog/ai-software-engineering
[3] Lee, H., Kim, S., Chen, J., Patel, R., & Wang, T., “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers,” CHI Conference on Human Factors in Computing Systems, 2025. https://dl.acm.org/doi/full/10.1145/3706598.3713778
[4] MIT Technology Review, “AI coding is now everywhere. But not everyone is convinced,” December 2025. https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/
[5] Mixflow AI, citing Gartner prediction, “The 2026 Mandate: 5 Strategies to Prevent AI Skill Atrophy in Your Workforce.” https://mixflow.ai/blog/the-2026-mandate-5-strategies-to-prevent-ai-skill-atrophy-in-your-workforce/