How to Evaluate a Startup's Technology

Quick Technical Due Diligence Without the Buzzwords


Every few months, an investor asks me to look at a startup's technology. The request is always framed the same way: "We like the team and the market. Can you tell us if the tech is real?" They want a technical due diligence assessment, and what they expect is a detailed report covering architecture, code quality, scalability, security, testing practices, and infrastructure. A comprehensive checklist, professionally evaluated.

What I actually do looks nothing like that.

Technical due diligence, done well, is closer to buying a used car than to conducting an audit. An audit checks whether everything is where it should be. A used car inspection asks a different question: is this vehicle going to get me where I need to go, and what is the seller not telling me? You do not inspect a used car by reading the owner's manual. You drive it. You listen to the engine at sixty kilometres an hour. You check whether the tyres are wearing evenly. You look at the small details that reveal how the car has actually been treated, as opposed to how it has been prepared for sale.

The startup equivalent of a fresh coat of paint is remarkably easy to produce on short notice, and remarkably easy to mistake for evidence of sound engineering. I have learned, through doing this enough times, that the real signals are few, specific, and almost never found in the places most investors think to look.

The Staging Problem

When a startup knows due diligence is coming, a predictable sequence of events begins. Someone tidies up the README. Someone else writes architecture documentation that did not exist the week before. Commented-out code gets removed. Variable names get cleaned up. A few lingering pull requests get merged or closed. If the team is particularly diligent, they will set up a staging environment and walk you through a prepared demo that hits all the right notes.

None of this is dishonest, exactly. It is the engineering equivalent of cleaning your flat before guests arrive. Everyone does it. The problem is not that the startup is presenting its best face. The problem is that the evaluation is happening against the staged version, and the staged version tells you almost nothing about how the engineering actually works on a Monday morning when three things are broken and the biggest customer is threatening to leave.

I learned this early. One of the first due diligence reviews I conducted was for a Series A company that presented beautifully. Clean repository. Comprehensive tests. Architecture diagram that looked like it belonged in a textbook. I spent two days reviewing and gave the investor a positive report. Six months later the company's engineering team fell apart. Three of five engineers quit within the same quarter. The codebase, it turned out, was the product of a two-week cleanup sprint conducted specifically for our review. The actual development process was chaotic. The tests existed but nobody ran them regularly. The architecture diagram described the system they intended to build, not the one they had.

The staging problem has a straightforward solution: look at the things that cannot be staged on short notice. Commit history cannot be fabricated in a week. Deployment frequency cannot be invented. The pattern of how pull requests are reviewed - who reviews them, how long reviews take, how substantive the comments are - is visible in the git history and is nearly impossible to fake. Incident response history, if the company uses any kind of issue tracker, reveals how the team behaves under pressure, which is when engineering quality actually matters.

I spend very little time reading code during due diligence. I spend a lot of time reading git logs.
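The kind of archaeology described above needs nothing more than `git log`. As a minimal sketch (the format string and the function name are my own, not part of any standard tooling), a few lines of Python can turn a log into the two numbers that are hardest to stage on short notice: sustained commit cadence and contributor concentration.

```python
from collections import Counter
from datetime import datetime

def summarize_history(log_lines):
    """Summarise commit cadence and author mix.

    log_lines: output of `git log --format='%H|%an|%aI'`,
    one commit per line (hash|author|ISO-8601 author date).
    """
    authors = Counter()
    dates = []
    for line in log_lines:
        _, author, iso_date = line.strip().split("|")
        authors[author] += 1
        dates.append(datetime.fromisoformat(iso_date))
    # guard against a single-day history dividing by zero
    span_days = max((max(dates) - min(dates)).days, 1)
    return {
        # a steady multi-month cadence cannot be fabricated in a
        # two-week cleanup sprint
        "commits_per_week": round(len(dates) * 7 / span_days, 1),
        # one author dominating every commit is its own signal
        "top_authors": authors.most_common(3),
    }
```

A history whose commit rate spikes sharply in the fortnight before the review tells you exactly what the Series A company above would have told me, had I thought to look.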

Deployment Velocity

Deployment velocity is a proxy for engineering health, in the same way that resting heart rate is a proxy for cardiovascular fitness.

If I could ask only one question during a technical evaluation, it would be this: show me a change going from idea to production. Not a major feature. A small change. A bug fix, a copy update, a minor improvement. Walk me through the entire path: who decides it needs to happen, who writes the code, how it gets reviewed, how it gets tested, how it reaches users, and how long the whole thing takes.

The answer to this question reveals more about the state of the engineering than any amount of code review. A team that can ship a small change to production in a few hours has, by definition, solved a constellation of hard problems. They have a codebase that is not so tangled that a small change risks breaking unrelated things. They have automated testing that gives them confidence to deploy without extensive manual verification. They have deployment infrastructure that does not require a dedicated engineer to babysit. They have a culture where people are trusted to make decisions and ship them without layers of approval.

A team that takes two weeks to ship a bug fix is telling you something different. The specific reasons will vary - maybe the test suite takes four hours to run, maybe deployments require a manual checklist and a senior engineer's sign-off, maybe the codebase is so interconnected that touching one feature breaks three others - but the root cause is always the same. The engineering has accumulated friction faster than the team has removed it, and that friction compounds.

You do not need to understand the physiology to read the signal. A healthy engineering organisation ships frequently because frequent shipping is a consequence of doing dozens of things right. An unhealthy one ships slowly because any one of those things has gone wrong, and the symptom is the same regardless of which specific thing it is.

When I find a startup that deploys multiple times per day with confidence, I rarely find other serious engineering problems. When I find a startup that deploys once a fortnight with anxiety, I always do.
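Deployment cadence is also measurable rather than anecdotal, provided the company tags its releases. As a rough sketch (it assumes release tags actually track production deploys, which you should verify rather than take on faith), the median gap between tags gives a single comparable number:

```python
from datetime import datetime
from statistics import median

def deploy_cadence_days(deploy_times):
    """Median gap in days between production deploys.

    deploy_times: ISO-8601 timestamps, e.g. from
    `git for-each-ref --format='%(creatordate:iso-strict)' refs/tags`,
    assuming release tags correspond to production deploys.
    """
    times = sorted(datetime.fromisoformat(t) for t in deploy_times)
    # gap between each consecutive pair of deploys, in days
    gaps = [(b - a).total_seconds() / 86400
            for a, b in zip(times, times[1:])]
    return round(median(gaps), 1)
```

The median matters more than the mean here: one heroic release weekend should not be allowed to disguise a fortnightly baseline.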

Proportional Complexity

There is a version of the used car that should make you suspicious for reasons that are the opposite of what you expect. Not the car with obvious problems - the car that has been modified far beyond what the situation requires. The hatchback with a racing exhaust. The city commuter with off-road suspension. These modifications are not improvements. They are evidence that the owner was more interested in the vehicle than in the journey.

I see the engineering equivalent constantly. A startup serving two hundred users, running on a microservices architecture with Kubernetes orchestration, a service mesh, event-driven messaging between twelve separate services, and a data pipeline built for millions of daily events. The technical sophistication is genuine. The engineering is often quite good. But the proportionality is wrong, and disproportionate complexity is one of the most reliable red flags in due diligence.

Good engineering is proportional to the problem it solves. A startup at the seed stage solving a well-defined problem for a small number of users should have a simple, boring architecture. A monolith. A relational database. Server-side rendering. Deployments that a single engineer can understand end to end. The interesting technical decisions should be in the product, not in the infrastructure.

When I see the opposite - when the infrastructure is more sophisticated than the product - I know one of a few things is true. The team is optimising for a scale they have not reached and may never reach, which means they are spending engineering time on hypothetical problems instead of real ones. Or the technical founder is building the system they find interesting rather than the system the business needs, which is a judgment problem. Or the team hired too many senior infrastructure engineers too early and those engineers, reasonably, built the kind of system they know how to build.

The healthiest startups I evaluate have engineering that looks almost disappointingly simple relative to what you would expect. They have made the boring choice at every point where the boring choice worked, and they have only reached for complexity where the problem genuinely demanded it. This is engineering maturity, and it is far more predictive of long-term success than technical sophistication.

The Honest Conversation

Technical debt is not the problem. Not knowing where it is, is the problem.

Every startup has technical debt. This is not a criticism. It is a description of reality. Any company that has been building quickly, responding to customer needs, and iterating on its product will have accumulated shortcuts, workarounds, and architectural decisions that made sense at the time and make less sense now. The absence of technical debt in an early-stage startup is not a sign of quality. It is a sign that the team has been polishing instead of shipping, which is its own kind of problem.

What matters is whether the team knows where the debt is.

I always ask the CTO or technical lead the same question: if you could rebuild any part of the system from scratch, what would it be and why? The answer is one of the most revealing moments in any due diligence process.

A good CTO answers immediately and specifically. "The authentication system. We bolted it on in month two and it has been a source of bugs ever since. We patched it enough to be secure, but every new feature that touches user permissions takes twice as long as it should because the data model is wrong." This answer tells me several things at once. The CTO knows where the problems are. They understand why the problems exist. They have made a deliberate decision to live with the debt rather than fix it, which means they are prioritising appropriately. And they are honest enough to say all of this to someone evaluating their technology, which is a character signal as much as a technical one.

A bad answer sounds like this: "We're pretty happy with the architecture. There are always things you could improve, but honestly we've built a pretty solid foundation." This tells me the CTO either does not know where the problems are - which means they are not looking - or is unwilling to discuss them honestly with an evaluator, which is worse. Every codebase has significant weaknesses. A technical leader who cannot name theirs is not one I would trust with an investor's money.

The middle answer - "Everything, honestly, it's all a mess" - is also a red flag, though a subtler one. This suggests a CTO who has lost confidence in the codebase, or who equates self-deprecation with honesty. The useful answer is specific, not comprehensive. A team that says everything needs rebuilding does not know how to prioritise, and prioritisation is the core skill of engineering leadership.

The AI Wrapper Test

The wrapper test: Ask what happens if the model provider raises prices tenfold or ships a competing feature. Companies with genuine technical depth talk about their own infrastructure. The rest talk about their roadmap.

AI startups require a specific additional lens, because the gap between what is presented and what exists is wider in AI than in almost any other category of software.

The fundamental question is straightforward: what does this company own that would be difficult to replicate? In traditional software, the answer is usually the codebase, the architecture, the accumulated product decisions baked into the system. In AI startups, the answer should be one or more of the following: proprietary training data, a fine-tuned or purpose-built model, a domain-specific evaluation framework, or a data flywheel where usage improves the product in ways that are difficult to bootstrap from zero.

What the answer often turns out to be is: a well-crafted system prompt and an API key.

I do not say this dismissively. Some genuinely valuable businesses have been built as integration layers over foundation models. The skill of understanding a customer's problem well enough to translate it into effective AI interactions is real and worth something. But it is not a technical moat, and investors evaluating these companies as technology investments need to understand the distinction.

The test I apply is simple. I ask: what happens to your product if your model provider raises prices by a factor of ten, or releases a feature that directly competes with your core use case? Companies with real technical depth have answers that involve their own infrastructure. They have trained models on proprietary data. They have built evaluation pipelines that measure output quality against domain-specific benchmarks. They can switch providers or run open-source models because their value is in the layer above the foundation model, not in access to it.

Companies without technical depth have answers that involve hoping this does not happen. Their product is, in effect, a user interface and a set of prompts that sit on top of someone else's intelligence. When I examine the engineering, I find a relatively thin application layer - often well-built, often well-designed - that delegates all of the hard work to an API call. The team's technical roadmap, if they are honest about it, consists largely of waiting for the next model to be better and then passing that improvement through to their users as if it were their own.

I have evaluated perhaps twenty AI startups in the past eighteen months. Roughly a third had genuine technical depth. The rest were wrappers of varying sophistication. The wrappers are not worthless - some of them are solving real problems and generating real revenue - but they should be valued as distribution and design businesses, not as technology businesses. The valuation multiples are very different, and investors who confuse the two categories are overpaying.

What Investors Actually Need to Know

Most investors who commission technical due diligence want certainty they cannot have. They want to be told the code is good or bad, the architecture will scale or will not, the technology is real or is not. These are the wrong questions, and answering them with false precision would be a disservice.

The right question is whether the engineering team makes good decisions under constraints. Every startup operates under constraints - time, money, information, talent. The quality of the engineering is not determined by how it looks in ideal conditions but by how the team has handled the tradeoffs that constraints impose. Did they take shortcuts in the right places? Did they invest in reliability where it mattered - in the payment system, in data integrity, in the core algorithm - and accept imperfection where it did not? Do they know the difference?

This is engineering judgment, and it is what I am actually evaluating when I review a startup's technology. Not whether the code is clean. Not whether they chose the right framework. Not whether test coverage hits some arbitrary threshold. I am evaluating whether the people making technical decisions have good instincts about what matters, the honesty to acknowledge what they have got wrong, and the discipline to fix the things that need fixing while leaving alone the things that do not.

An investor does not need to understand code to evaluate this. They need to ask the right questions and know what good answers sound like. Good answers are specific. Good answers acknowledge tradeoffs explicitly. Good answers demonstrate awareness of what is working and what is not, without either false confidence or performative humility.

The used car analogy holds to the end. You do not need to be a mechanic to buy a good used car. You need to know which questions to ask, which answers should worry you, and when the seller's confidence is earned versus performed. The best technical due diligence does the same thing. It gives the investor a framework for evaluating engineering judgment, not a checklist for evaluating code.

AI startups, technology, investing