How to Measure Engineering Teams Without Breaking Them

A bonus for shipping zero bugs

A CTO I worked under early in my career offered his developers a big bonus if they could get a famously buggy platform down to zero bugs. He expected it to take months. They did it in about two weeks.

He was thrilled, right up until a contractor looked under the bonnet some time later. The developers had not fixed anything. They had silenced every error. No errors logged, no bugs reported, bonus paid, platform as broken as ever.

That is what happens to any number the moment you turn it into a target. People will hit the number. They just will not do it the way you imagined, because you stopped paying them to fix the platform and started paying them to make a metric say zero.

It has a name, Goodhart’s Law, and the person who made me think about it properly is James Charlesworth.

Meet James Charlesworth

James is Director of Engineering at Pendo, a product analytics platform. He also builds courses, and he came into engineering through the side door rather than the front one. His degree is in electronics, so he taught himself to code and taught himself how to hold his own in rooms full of computer science graduates.

His first job explains most of what he believes. At seventeen he soldered connections on a fire alarm production line. He clocked in with a physical stamp card, clocked out the same way, and his entire worth for the day was two numbers: the minutes he was present, and the count of fire alarms he pushed through.

Everything he thinks about leading engineers is a reaction against that line. Nobody would judge a designer by how many Figma files they made today, or a product manager by how many slide decks they produced this month. It is obviously absurd. Yet we do the exact equivalent to engineers and call it productivity. Same creative work, same problem-solving, just expressed in code instead of slides, and somehow that one difference convinces people a tally is a fair measure of the person.

The number is not the thing

Measure everything you can. Build the widest picture you can afford. The damage starts the moment you take one of those measurements and make it the thing people are rewarded or punished for, because that is the moment the number stops describing reality and starts describing how badly people want the reward.

The no-bugs bonus is the obvious version. Here is a quieter one from my own past. As a junior I watched a head of engineering, leaned on by a CTO for good quarterly numbers, pick lines of code added as his target. I thought it was a great idea at the time. I was young and I did not know any better. The memory makes me wince now, because lines of code added is a target you hit by writing more code than the job needs, which is the opposite of good engineering.

James’s instant reaction to that story was the tell of someone who has thought about it for years. Added, or removed? Removed is usually the one that matters. His team kept a code deletion trophy, handed out weekly to whoever cut the most code from the base, because deleting code is often worth more than writing it. Point a target at lines added and you get bloat. Point it at lines removed and you get a different kind of gaming. The honest position is that the count was never the point in the first place.

The instinct underneath all of this is a preference for the dashboard over the conversation. A dashboard is clean and quantified and fits in a board deck. A conversation is messy and human and impossible to put in a cell. So leaders trust the dashboard more, even when the dashboard is measuring the wrong thing, because it feels more rigorous. It is not. It is just easier to look at.

We measure what is easy, not what matters

Robert McNamara made his name at Ford, where he rose to run the company and was brilliant at it. He measured stock, manufacturing times, every part, and used the numbers to predict and improve. Then he was made US Secretary of Defense and ran the Vietnam War the same way, on troop numbers, body counts, wins and losses, all on spreadsheets. It failed, because the things that actually decided that war, such as morale and the will of the people on the ground, do not show up in a body count.

That failure has a name too, the McNamara fallacy, and it is the trap of over-optimising what is easy to measure while quietly ignoring, and eventually denying the existence of, what is hard to measure but matters more.

The parallel to an engineering team is exact. The things that predict whether a team performs are the hardest things to put a number on. Trust between the people on it. How well they communicate. Whether they are happy, because a miserable team is rarely a creative one. None of that lands cleanly on a dashboard, so most leaders measure pull requests and story points and tell themselves they understand the team. They understand the easy third of it. The two thirds that decide the outcome are the part nobody put on the screen.

DORA is a good tool aimed at the wrong target

DORA metrics are the cleanest example going of a good tool being misused, and they are everywhere now, so they are worth being precise about.

The four measures (lead time for changes, deployment frequency, change failure rate, and mean time to recovery) are genuinely valuable, for one specific reason. They describe a whole DevOps system, from what developers do to what production does in response. On a platform or site reliability team they tell you how your system copes with change and failure, and that is real, useful information.
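To make "describes a system" concrete, here is a minimal sketch of the four measures computed over a whole pipeline. The `Deployment` record, its field names, and the choice of a median for lead time are my assumptions for illustration, not an official DORA definition. Note what is missing from the record: there is no author field, so the summary cannot be sliced by person.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; these field names are assumptions.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Summarise the four measures for one pipeline over a time window."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures that have been restored contribute to recovery time.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time_hours": median(lt.total_seconds() for lt in lead_times) / 3600,
        "deployments_per_day": len(deployments) / window_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": (
            sum(r.total_seconds() for r in recoveries) / len(recoveries) / 3600
            if recoveries
            else None
        ),
    }


# Made-up example data: four deployments in a two-day window, one of which failed.
base = datetime(2024, 1, 1, 9, 0)
example = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base, base + timedelta(hours=4)),
    Deployment(base, base + timedelta(hours=6)),
    Deployment(base, base + timedelta(hours=8), failed=True,
               restored_at=base + timedelta(hours=9)),
]
summary = dora_summary(example, window_days=2)
print(summary)
```

The input is a stream of deployments and the output is a statement about the pipeline. That is the level at which these numbers mean something.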

They fall apart the instant you point them at people. The moment DORA stops describing a pipeline and starts describing this engineer or that team, it has been bent into a job it was never built for, and you are back to lines of code with a more respectable acronym.

Leaders keep reaching for it anyway, and the reason is more human than technical. AI is changing how software gets built, nobody is certain what their team will look like in a year, and that uncertainty makes leaders feel like control is slipping. A clean dashboard is a comfort blanket. DORA is very easy to stand up, and it is packaged in the kind of board-friendly language, MTTR and all, that makes an executive think this is the rigorous thing we should run everywhere. It spreads as a people metric precisely because it looks authoritative and it settles the nerves of someone who feels out of control. Neither of those is the same as it being the right thing to measure.

The only metrics that matter are the ones customers feel

The things worth optimising for are business and product outcomes. If you run a SaaS product, that is your adoption, your retention, your customer feedback, your revenue. Nothing else is the scoreboard.

You could not tell me the mean time to recovery on Amazon, but you know exactly what their product outcomes are, because you use the thing and you decide whether to keep paying for it. Customers experience outcomes. They never see activity. They have no idea how many pull requests went into the feature, only whether the feature is any good.

The uncomfortable bit for engineers is that too many of us hide behind the old line that our job ends when the code ships. If the wrong thing got built, that was product’s problem, not ours. It was always our problem. Building the right thing is the job of a combined product and engineering team, and disowning the outcome is just a way of avoiding the hard part.

You do not need a director’s title to pull a team toward this. Most teams run sprint demos. Any engineer presenting one, senior IC included, can tie what they built to the outcome it is meant to move. Not “here is the feature” but “here is the feature, and here is the number we expect it to move and why.” Do that and it travels upward on its own. A director watching that demo sees a team that thinks in outcomes, and starts holding the quieter teams to the same standard. You change the culture by modelling it, not by mandating it.

Silos are the same problem wearing a different coat. When I onboard engineers I send them to spend a few days in customer service, then sales, then marketing, so they hear the real customer problems and learn the company’s actual voice before they write a line. Most companies never do this, which is why the manager has to be the one who breaks the wall down. Your relationship with your counterpart in product or marketing matters as much as your relationship with your own engineers, because that relationship is the silo-breaker. If your people are not being put in front of other functions, you have to carry that context yourself and bring it back to them.

AI is making the measurement problem worse

Some companies have started tracking token spend per developer, and some engineers wear being a top-five token spender like a medal. It is lines of code all over again. Token spend is an activity metric. It tells you how much someone used a tool, not whether anything good came out the other end.

The question worth asking is about speed and learning instead. AI is genuinely good at teaching you something fast and at getting a prototype in front of a real user quicker than was ever possible. So measure that. Track the time from having an idea to a person using a rough version of it. That number should be falling. If it is, the token spend was a side effect, not an achievement.
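One rough way to track that number, as a sketch only. The function names, the plain list of durations, and the "older half versus newer half" comparison are all assumptions of mine. Any honest trend check would do, and a handful of data points proves very little.

```python
from datetime import date
from statistics import median


def days_to_first_use(idea_on: date, first_used_on: date) -> int:
    """Days from having the idea to a real person using a rough version."""
    return (first_used_on - idea_on).days


def is_falling(durations: list[int]) -> bool:
    """Compare the median of the older half with the median of the newer half.

    `durations` must be in chronological order, oldest idea first.
    """
    if len(durations) < 4:
        raise ValueError("need at least four data points to compare halves")
    mid = len(durations) // 2
    return median(durations[mid:]) < median(durations[:mid])


# Made-up example: four ideas, oldest first.
history = [
    days_to_first_use(date(2024, 1, 1), date(2024, 1, 31)),  # 30 days
    days_to_first_use(date(2024, 2, 1), date(2024, 2, 29)),  # 28 days
    days_to_first_use(date(2024, 3, 1), date(2024, 3, 15)),  # 14 days
    days_to_first_use(date(2024, 4, 1), date(2024, 4, 11)),  # 10 days
]
print(is_falling(history))
```

Keep this at team level and use it to start a conversation. Hang a bonus on it and people will ship thinner and thinner "rough versions" until the number looks great and means nothing.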

Lean on AI for everything, though, and the human collaboration that produces good work quietly erodes, which is the cost no dashboard will show you. Some in-person time matters, not five days a week, but enough that people stand at a whiteboard and talk things through. AI will not replace the chat over coffee or the argument at the whiteboard, and a team that retreats entirely into private conversations with a model is not a team. It is a room of individual operators each running a fleet of agents, and that produces worse work than people building together.

How you are measured once you stop being the one who builds

The day you become a manager you go from the best IC on the team to the worst manager on it. Engineering and management are separate ladders, and switching drops you from near the top of one to the very bottom of the other. The instinct that earned you the promotion, being the strongest engineer, is now the quickest way out the door if you keep indulging it, because the job is no longer to do the work. It is to enable the people who do, and which way you lean when that gets hard is part of what kind of manager you are quietly becoming.

So your output stops being yours. You are measured by the output of the people who report to you. Whether they are visibly doing great work. Whether you are their voice in the organisation. Whether you are breaking down silos and helping them manage upward. Every engineer you promote is a mark in your favour. So is every underperformer you handle well, because performance managing one struggling person on a team of five is a gift to the other four.

A project gets cancelled, the team is handed something new, and whether you can carry them through it without half of them quitting is the truest measure of the job. The managers who report to James get measured on exactly that: what their teams say about them, how often their people escalate over their heads, and how many engineers name them on the way out. Your team is watching you more closely than you think. You can disagree with a decision privately and still commit to it in front of them, and that is the job, leading people from one mission to the next without the wheels coming off.

People do not quit their jobs. They quit their managers.

Final thoughts

Every story here is the same story. The number is a finger pointing at the thing that matters. Start optimising the finger and you lose the thing it was pointing at.

Measure widely, keep the messy human signals that no dashboard will ever hold, anchor on the outcomes your customers actually feel, and never let a single metric become the target. The job was always harder than a spreadsheet. That is the whole reason it is worth doing well.

If you want to know what kind of engineering leader you are becoming, and the blind spot that comes with your type, I built a quiz that scores you across all five types. Three minutes, and it is free. Find out yours here.

And if you want to get sharper at the part of the job no dashboard measures, the people part, that is the whole of what we do inside EM Accelerator.
