You Wouldn't One-Shot a Hospital

Software is millions of tiny prompts. It always was.

Nobody would one-shot a hospital.

If you walked up to a construction firm and said 'build me a 200-bed regional hospital with an emergency department, two operating theatres, a maternity ward, and parking for five hundred vehicles', and then expected a hospital to materialise from that single sentence, people would question your sanity. A hospital is the product of thousands of people executing millions of individual decisions over years. The client briefs the architect. The architect instructs the structural engineers. The engineers specify requirements for the contractors. The contractors direct their teams. The instructions branch downward, becoming more specific at every level, until the actual work - laying a tile, welding a joint, wiring a circuit - is governed by an instruction so precise that the person executing it can do so without understanding the full context of the building above them.

This is exactly how most people try to build software with LLMs. A prompt goes into Lovable or Bolt or a chat window: 'build me a project management tool with Kanban boards, time tracking, and Stripe billing.' Thirty seconds later, something appears on screen. It has pages and buttons and what looks like functionality. It demos well. It falls apart the moment a real user touches it, for the same reason a one-shot hospital would have no plumbing behind the walls: one instruction, however detailed, cannot carry the weight of a complex system.

The people building serious software with LLMs have figured out what the one-shot crowd has not. The skill is building the tree, not writing the prompt.

The One-Shot Fallacy

The tools are partly responsible for the expectation. Lovable, Bolt, Replit Agent, v0 - they are marketed on the promise that the gap between idea and product is now one prompt wide. Type a description, get an application. For simple things, this genuinely works. A landing page, a contact form, a portfolio site. These are one-prompt problems because they are one-level-of-complexity problems. No architecture decisions. No state management tradeoffs. No integration logic that needs to handle failure modes.

The fallacy is in extrapolating from these successes to complex systems. Because a landing page materialised from one prompt, people assume a SaaS application will too, if the prompt is detailed enough. They write longer prompts. They add specifications. They describe the database schema, the user roles, the API endpoints, the edge cases. And the LLM, which is nothing if not obliging, produces something that matches the description on the surface. The failure is invisible in the demo. It becomes visible in the first week of real use, when someone does something the prompt did not anticipate - which is most things - and the application has no considered response because nobody considered it.

The longest prompt in the world is still one level of decomposition. A two-thousand-word specification typed into a single chat message is not a substitute for a prompt tree, any more than a two-thousand-word letter to a construction firm is a substitute for architectural drawings, structural calculations, and work orders. Detail is not depth. Length is not structure. The difference between a prompt and a tree is not how much you say. It is how many levels of translation exist between what you say and what gets built.

How a Hospital Gets Built

When a client decides to build a hospital, the first instruction is broad. This region needs a 200-bed hospital. Here is the budget, here is the land, here is the deadline. That instruction goes to a project director, whose job is not to build anything but to decompose the problem into streams: architecture, structural engineering, mechanical and electrical services, medical equipment procurement, regulatory compliance. Each stream receives its own brief - a prompt derived from the original intent but translated into the language of that domain.

The architect receives: design a building with these departments, these patient capacities, these adjacency requirements. Emergency near imaging. The ICU near the operating theatres. Outpatient clinics near the main entrance. The architect does not pass this directly to a draughtsperson and wait. They make hundreds of decisions first. Building orientation. Circulation patterns. Natural light strategy. Fire escape routing. Ceiling heights for equipment clearance. Then they issue their own prompts downstream: draw the second floor plan. Detail the emergency entrance. Model the ventilation requirements for the operating theatres.

Each of these generates further instructions. The structural engineer receives the architect's plans and translates them into load calculations, foundation specifications, steel grades. These specifications become work orders for the contractors, who translate them into task assignments for their crews. And so on, branching downward, until someone is being told to lay these specific tiles in this specific pattern with this specific adhesive at this specific spacing.

The tiler does not need to understand the hospital's patient flow strategy. They need their instruction to be correct and specific. The quality of the hospital depends on every level issuing good prompts to the level below, and every level making sound decisions within its own domain. The project director handles budget and timeline. The architect handles space and flow. The engineer handles forces and materials. The foreman handles sequence and labour. Each level operates with different expertise on a different class of problem, and the translation between levels is where the real intellectual work happens.

There is a second hierarchy running alongside the first: the checkers. The structural engineer's calculations are peer-reviewed. The architect's plans are assessed against building codes. The electrical work is inspected by a certified inspector. The completed building passes fire safety review. Making and checking are parallel processes at every level of the tree. The prompts flow downward. The verification flows upward. No responsible construction project allows any level to be both sole maker and sole checker of its own work.

The Prompt Tree

A prompt tree is not a technique for talking to LLMs. It is a technique for thinking about work.

The hospital is not a special case. It is a clear case. Every complex project in human history has been organised the same way: a hierarchy of instructions, each level translating broad intent into narrower action. Film productions work this way. Military operations work this way. Publishing a newspaper works this way. The prompt - an instruction issued to an agent with the expectation that they will act on it - is not an invention of the LLM era. It is how organised work has functioned for centuries. The LLM did not create the need for prompts. It gave us a new recipient for them.

In software, the prompt tree looks like this. The root is the product vision: a project management tool for small teams that replaces spreadsheet chaos. Level one is the product specification: what features exist, who the users are, what the core workflows look like, and - critically - what we are not building. Level two is the technical architecture: framework, data model, authentication approach, deployment story, integration points. Level three is the module specification: the Kanban board has these column states, these card transitions, this drag-and-drop behaviour, this persistence model. Level four is implementation: write the function that moves a card between columns, the endpoint that persists board state, the styles that render the layout. Level five is verification: confirm that cards transition correctly, that the API rejects invalid state changes, that the layout holds on a narrow screen.

Each level is a prompt. Each level's output becomes the input for the level below it. And each level requires different thinking. The product spec requires market understanding. The architecture requires engineering judgment. The module spec requires design thinking. The implementation requires craft. Collapsing the tree into a single prompt forces one instruction to carry all of these concerns simultaneously, and no single instruction can do that well. Not because LLMs are limited, but because the concerns are genuinely distinct.
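The chaining of levels can be sketched in code. This is a minimal illustration, not a real tool: `PromptNode`, `run_level`, and the `stub_llm` stand-in are all hypothetical names invented here, and the stub simply echoes what it was asked so the structure is visible without calling any actual model.

```python
from dataclasses import dataclass, field

@dataclass
class PromptNode:
    """One level of the tree: an instruction plus the output it produced."""
    name: str
    prompt: str
    output: str = ""
    children: list["PromptNode"] = field(default_factory=list)

def run_level(node: PromptNode, llm) -> None:
    """Execute this node's prompt, then pass its output down as context."""
    node.output = llm(node.prompt)
    for child in node.children:
        # The parent's output becomes part of the child's instruction:
        # each level translates broad intent into a narrower task.
        child.prompt = (
            f"Context from '{node.name}':\n{node.output}\n\nTask: {child.prompt}"
        )
        run_level(child, llm)

# Stand-in for a model call: echoes the task it was given.
def stub_llm(prompt: str) -> str:
    return f"[output for: {prompt.splitlines()[-1][:60]}]"

root = PromptNode("vision", "A project management tool for small teams.")
spec = PromptNode("product-spec", "List core workflows and explicit non-goals.")
arch = PromptNode("architecture", "Choose framework, data model, auth approach.")
root.children = [spec]
spec.children = [arch]

run_level(root, stub_llm)
```

The point of the sketch is the shape, not the code: the architecture node never sees the raw vision, only the product spec's output, just as the structural engineer works from the architect's plans rather than the client's brief.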

LLMs are remarkably good at levels four and five. They write functions, generate tests, handle boilerplate with speed and accuracy that would have seemed absurd five years ago. They are getting better at level three, particularly when given clear architectural constraints to work within. They are surprisingly useful at level two if you provide the right context and ask the right questions. They cannot do level one for you, because level one requires knowledge the model does not have: who your customers are, what frustrates them about their current tools, what they would pay money to solve, and what already exists that you need to be meaningfully better than. That knowledge comes from you. It always will.

Where People Skip Levels

The one-shot approach collapses the entire tree into a single jump from root to leaf. 'Build me a project management tool' goes directly to code. Levels one through three - the product specification, the architecture, the module designs - are skipped entirely. And those are the levels where the actual intellectual work lives.

When you skip levels, the LLM fills in the gaps with plausible defaults. It decides what features to include, how to structure the data, how components relate to each other, what happens when something goes wrong. It makes these decisions based on the most common patterns in its training data, which means it builds the average of everything it has seen. The result is generic. Not wrong, exactly, but not considered. Not the product of someone who understood a specific problem and made deliberate tradeoffs. It is a plausible hospital, with corridors that lead somewhere and rooms that have doors. But the emergency department is in the wrong place, the ventilation in the operating theatres is inadequate, and the building does not meet local seismic codes, because nobody with the relevant expertise made those decisions. The LLM made them by default, which is to say, nobody made them at all.

The levels exist because the work at each level is genuinely different. Deciding what to build requires product judgment. Deciding how to structure it requires architectural judgment. Deciding how each piece behaves requires design judgment. These are different skills exercised by different types of thinking, and they produce different outputs. A product spec is not a shorter version of an architecture document. An architecture document is not a vaguer version of a module spec. They are different artefacts answering different questions. Collapsing them into a single prompt does not eliminate the work. It eliminates the thought.

Lovable is an excellent bricklayer. Fast, consistent, increasingly capable. Give it a level-four prompt - 'write a React component that renders a Kanban board with drag-and-drop, conforming to this data model and this state machine' - and it will do outstanding work. Give it a level-zero prompt - 'build me a project management tool' - and it will do its best to simulate the five levels of decomposition you skipped. What you get back is a simulation. Realistic at a distance. Hollow when you lean on it.

The Checker Problem

The make-check pattern: Build your prompt tree with alternating make and check nodes. Spec, then critique the spec. Architecture, then stress-test the architecture. Code, then review the code. Run each check in a separate context from the make step - the same session that produced the work is anchored to the choices it already made.

In the hospital, every maker has a checker. The electrician wires the panel; the inspector verifies it meets code. The architect draws plans; the peer reviewer assesses them against structural and regulatory requirements. This is not bureaucracy. It is how complex systems achieve reliability. Making and checking require different cognitive modes. Making is generative: produce something that satisfies the instruction. Checking is analytical: verify that what was produced is correct, complete, and actually satisfies the intent behind the instruction, not just its literal wording.

When you one-shot, you get a maker with no checker. The LLM generates everything in one pass, and the session that generated it lacks the separation of context needed to critique it effectively. It can catch syntax errors, certainly. But can it question whether a feature should exist? Whether the data model will buckle under a usage pattern the original prompt did not describe? Whether the architecture embeds assumptions that will be expensive to undo in three months? These are checking questions, and they need a vantage point separate from the one that produced the work.

In a well-built prompt tree, checking is woven into every level. You write the product spec, then you review the spec - not in the same session, not in the same breath. You start a fresh context and prompt it to be adversarial: what assumptions is this spec making? What user needs has it ignored? What is the riskiest bet in this document? You do the same with the architecture, the module designs, the implementation. Each level has a make step and a check step. The hospital does not allow the electrician to inspect their own wiring. Your prompt tree should not allow the session that wrote the code to be the only one that reviews it.
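The separation of maker and checker contexts can be made concrete. A minimal sketch, assuming a `call_llm(messages)` function you supply (any chat-completion client with the usual role/content message format); the function names and the review prompt wording are illustrative, not a prescribed API.

```python
ADVERSARIAL_REVIEW = (
    "You did not write the document below. Review it adversarially: "
    "what assumptions does it make? What user needs does it ignore? "
    "What is the riskiest bet in it?\n\n{artifact}"
)

def make(call_llm, instruction: str) -> str:
    # Maker session: generative mode, one instruction in, one artifact out.
    return call_llm([{"role": "user", "content": instruction}])

def check(call_llm, artifact: str) -> str:
    # Checker session: a brand-new message list that sees only the artifact,
    # never the conversation that produced it - so it is not anchored to
    # the maker's choices.
    return call_llm([
        {"role": "user", "content": ADVERSARIAL_REVIEW.format(artifact=artifact)}
    ])

def make_then_check(call_llm, instruction: str) -> tuple[str, str]:
    artifact = make(call_llm, instruction)
    critique = check(call_llm, artifact)
    return artifact, critique
```

The design choice that matters is the fresh message list in `check`: the reviewer gets the artifact cold, the way a building inspector gets the wiring, not the electrician's reasoning about it.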

The checking prompts are the ones most people skip. Writing them feels like overhead when the making was so fast. Why review the spec when you could already be generating code? Because the spec is where the expensive mistakes are. A wrong line of code costs minutes to fix. A wrong architectural assumption costs weeks. A wrong product decision costs the project. Checking is cheapest at the top of the tree and most expensive at the bottom, which is exactly the opposite of where most people invest their attention.

The Tree Is the Product

What separates people building real software with LLMs from people generating demos is prompt management, not prompt engineering. Prompt engineering - the craft of writing an effective instruction for a given level - is a useful skill. But mastering one level is not mastering the tree. The tree is the higher-order skill: decomposition, delegation, verification, applied across every layer from vision to deployment.

This is management. It is the same work a general contractor does on a building site, a CTO does in a technology company, a director does on a film production. The medium changed. You are delegating to a model instead of to a team of people. But the principles are identical. Break the work into layers. Scope each layer to the capability of the agent executing it. Verify the output before passing it down. Fix problems at the level where they originate, not at the leaves.
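The "fix problems at the level where they originate" principle can be sketched as a retry loop. This is a hypothetical illustration: `make` and `verify` are stand-ins for model calls you would supply, and the amendment strategy (appending the problems to the instruction) is one simple choice among many.

```python
def delegate(make, verify, instruction: str, max_retries: int = 2) -> str:
    """Run one layer of the tree: make the artifact, check it, retry here."""
    for attempt in range(max_retries + 1):
        artifact = make(instruction)
        problems = verify(artifact)
        if not problems:
            return artifact
        # The fix happens at the level that issued the instruction, by
        # amending the instruction itself - not by patching the leaves.
        instruction = f"{instruction}\nAvoid these problems: {problems}"
    raise RuntimeError(f"layer failed after {max_retries + 1} attempts")
```

Each layer owns its own instruction and its own retries; a failure that survives the retries is escalated upward as an exception rather than quietly patched below.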

Nobody who has built anything complex has ever done it in a single instruction. The hospital is millions of prompts. The film is millions of prompts. The newspaper is millions of prompts. Software was always going to be millions of prompts too. The LLM did not change that. It just gave you a faster way to execute the leaves.
