
Notes on

Software Engineering at Google

by Titus Winters, Tom Manshreck, and Hyrum Wright



Programming vs software engineering

The book draws a hard line between the two. Programming is writing code. Software engineering is everything you need to make that code useful for as long as it needs to be, across teams and across years.

Getting good at programming is a tactic. Getting good at software engineering is strategy.
There’s a 100,000x difference between the lifespan of short-lived code and long-lived code, and the same practices don’t apply at both ends.

The book defines sustainability precisely: your codebase is sustainable when you are able to change all the things you ought to change, safely, for the life of your codebase.
You might choose not to change, but you need the capability. It’s one thing to know in theory that you can recover from tape, and another to know in practice exactly how to do it and what it’ll cost.

It’s programming if ‘clever’ is a compliment, but it’s software engineering if ‘clever’ is an accusation.

Both clever and clean code have their place; each is unwelcome where the other is called for.

This doesn’t mean you should always engineer for the long term. Shipping fast early in a company’s lifetime is a reasonable trade-off to validate ideas.
Lots of programmers don’t seem to agree with this, or maybe don’t understand it, which is interesting.
A serial startup developer could have 10 years of experience and zero experience maintaining anything that lives longer than a year. That’s fine, it’s a different game.

Hyrum’s Law

With enough users of an API, it doesn’t matter what you promise in the contract. All observable behaviors will be depended on by somebody.

Even with great engineers and strong code review, you can’t assume adherence to published contracts. Users depend on bugs, on hash ordering, on behaviors you never intended as part of the interface.
Being clear about your promises buys some flexibility, but it doesn’t buy immunity from breakage.
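A toy illustration of how this happens (function and data invented): an API that promises a set of active user IDs with no ordering guarantee, whose implementation nonetheless returns them in a stable order.

```python
def active_user_ids(users):
    """Contract: return the IDs of active users. Order is unspecified."""
    # Implementation detail: dict iteration preserves insertion order, and
    # this data source happens to be populated in ascending ID order.
    return [uid for uid, active in users.items() if active]

users = {1: True, 2: False, 3: True}

# A caller who relies only on the documented contract:
assert set(active_user_ids(users)) == {1, 3}

# A caller who relies on the observed (but unpromised) ordering. It works
# today, so per Hyrum's Law somebody will ship it, and any implementation
# change that reorders the result now breaks them.
assert active_user_ids(users)[0] == 1
```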

Policies that scale (and don’t)

“Does this scale?” is one of the most useful questions in the book. It can matter a little or a lot, depending on how long your code will live and how big your org will get.

Spotting policies with bad scaling properties is straightforward. Imagine the organization growing 10x or 100x. Does the amount of work per engineer grow with org size or codebase size? If yes, and there’s no automation to offset it, you have a scaling problem.
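A back-of-the-envelope version of that thought experiment (all numbers invented): compare an "every team migrates itself" policy against an "expert team automates it" policy as the org grows.

```python
def per_team_policy(num_teams, hours_per_team=40):
    # Every team independently ramps up, migrates, and discards the
    # knowledge afterward: total cost grows linearly with org size.
    return num_teams * hours_per_team

def expert_team_policy(num_teams, automation_hours=400, hours_per_team=2):
    # One expert team pays a fixed cost to build tooling; the remaining
    # per-team cost is small, so growth is much flatter.
    return automation_hours + num_teams * hours_per_team

for teams in (10, 100, 1000):
    print(teams, per_team_policy(teams), expert_team_policy(teams))
# 10 teams:     400 vs  420  (the dedicated team isn't worth it yet)
# 100 teams:   4000 vs  600
# 1000 teams: 40000 vs 2400
```

The crossover is the point of the 10x/100x question: a policy that looks fine at today's size can become the dominant cost once per-engineer work scales with the org.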

The traditional way of deprecating software, telling internal teams “you have 3 months until we delete this,” fails in large systems.
Dependencies are too complex, and a single build break ripples across the company.
Google replaced this with the “Churn Rule”: infrastructure teams do the migration themselves, or make the update backward-compatible. This works because expertise scales better than asking every team to independently ramp up, solve their local problem, and throw away the knowledge afterward.

The Beyoncé Rule: if an infrastructure change breaks your product but your CI tests didn’t catch it, that’s on you. “If you liked it, you should have put a CI test on it.”
This shifts responsibility to product teams to keep their tests in the CI system and frees infrastructure teams from tracking down every affected team. The more the org grows, the more valuable this becomes.

After a painful compiler upgrade, Google distilled their approach to three words: automate, consolidate, get good.
Automation lets one person do more. Consolidation limits the scope of low-level changes. Expertise makes the whole process faster each time.
Any painful experience is a chance to learn how to manage the complexity so the next time hurts less.

More specifically, codebase flexibility comes from five things: expertise (you’ve done this before), stability (fewer changes between releases because you adopt them regularly), conformity (less untouched code because you upgrade often), familiarity (you spot redundancies and automate them), and policy (rules like the Beyoncé Rule mean you only worry about what’s visible in CI).
These compound. The more often you do upgrades, the easier each one gets.

Trade-offs and decision-making

Google has a strong distaste for “because I said so.” Decisions need reasons, not authority. The goal is consensus, not unanimity. “I don’t agree with your metrics, but I see how you can come to that conclusion” is a valid outcome.

Cost isn’t just money. It’s engineering effort, CPU time, transaction costs, opportunity costs, societal costs. And biases (status quo bias, loss aversion) warp how we evaluate all of them.

In fields like software engineering, personnel cost usually dominates financial cost. Keeping engineers happy and focused can swing productivity by 10-20%.
Morale drops when you’re shipping boring stuff, or not shipping at all. Usually better to optimize for that than chase small efficiency gains that reduce work satisfaction.

The markers story is a perfect example of misplaced optimization. Many organizations treat whiteboard markers as a precious resource: tightly controlled, always in short supply. Half the markers at any whiteboard are dry, meetings get disrupted, and trains of thought derail, all over a product that costs less than a dollar.
Just buy plenty of packs and stay in flow. The time you save is worth far more than the markers cost.

Being data-driven is a good start, but most decisions are a mix of data, assumption, precedent, and argument. When the data changes, be willing to reverse course. Leaders who admit mistakes are more respected, not less.

Not everything that matters is measurable. Part of a leader’s job is exercising judgment and asserting that something is important even when there’s no dashboard to prove it.

Teams and collaboration

Software engineering is a team endeavor. The lone genius programmer fantasy rarely maps to reality. Even world-changing work is almost always a spark of inspiration followed by a heroic team effort.
It would be insane to have a team made up of only Michael Jordans.

Engineers tend to build in secret, perhaps out of insecurity, then reveal their polished work months later. This is harmful. You risk fundamental design mistakes and building something the world has moved past by the time you emerge.
Reminds me of Hamming’s talk “You and Your Research”: don’t work with a closed door, metaphorical or not. The trick to staying relevant is staying in touch.

The book suggests small rooms of 4-8 people to encourage spontaneous conversation. Open-plan is bad, but so is one-engineer-one-closed-office.
I think the mechanism matters more than the layout, though. Remote work proves that the important thing isn’t how you group people in a room, it’s how you communicate within teams.

The DevOps philosophy makes the feedback point explicit: get feedback early and think about production early. Every upstream fix is cheaper than a downstream one.
Though there’s an upfront cost to shifting left, so opportunity cost still matters.

Humility, respect, and trust

The three pillars of working on a team:

  1. Humility — not omniscient or infallible, open to self-improvement.
  2. Respect — genuinely caring about others and appreciating their abilities.
  3. Trust — believing others are competent and letting them drive when appropriate.

Without these, empowering others is nearly impossible. You can’t do your best work watching everyone all the time, and neither can they.

You are not your code. Lots of people get defensive about code reviews and take criticism personally.
Not productive. Both sides of feedback are hard, but detaching your self-worth from your code makes it a lot easier to receive.

Google made these pillars concrete with their “Googleyness” definition: thrives in ambiguity, values feedback, challenges status quo, puts the user first, cares about the team, does the right thing.
Though from what I can tell, they don’t use “Googley” much anymore. They try to set more specific expectations now.

A good postmortem is blameless and structured: brief summary, timeline from discovery to resolution, root cause, impact assessment, immediate fixes with owners, prevention items, and lessons learned.

Leadership

Leaders serve their teams, not the other way around. The book pushes servant leadership: remove obstacles, build consensus, get your hands dirty when needed.

Traditional managers worry about how to get things done, whereas great managers worry about what things get done (and trust their team to figure out how to do it).

Don’t babysit employees. If you need to, the problem isn’t the employees. Empower them to make decisions. They are adults.

Hire people smarter than you. People you’d want to work for. People who can replace you — and then give them the chance to.
I’d hire for confidence, humility, curiosity, bias for action, agency, autonomy, competence.
Hiring wrong is far more expensive than the cost of finding the right person. Don’t lower the bar because you need to hire quickly.

Don’t ignore low performers. “Hope is not a strategy.” High performers end up pulling the load, and that kills morale.

Don’t try to be everyone’s best friend, either. When you hold power over someone’s career, friendship gets complicated. You can lead with a soft touch without being a close friend.

When someone asks for advice, don’t jump into solution mode. Ask questions. Help them solve it themselves.
People like ideas better when the ideas are their own.
Let others learn, even when you could do it faster. A junior spending three hours on something you could do in twenty minutes is frustrating, but it’s how teams grow.

The book frames scaled leadership around three principles:

  • Always Be Deciding — identify blinders, identify trade-offs, decide, iterate. No analysis paralysis.
  • Always Be Leaving — build an organization that solves problems without you. Ask yourself: what can I do that nobody else on my team can do?
  • Always Be Scaling — 95% observation and listening, 5% critical adjustments.

Put teams in charge of problems, not solutions. A team anchored to “we manage the Git repos” will resist switching to a better version control system. A team anchored to “we provide version control” is free to experiment.

The leader is always on stage. Your team watches how you react to small talk, how you handle email, how you carry yourself at lunch. If they read confidence, it spreads. If they read fear, that spreads too.

Shield your team from organizational chaos so they can perform. Share relevant information, but don’t distract them with politics that won’t actually affect them.
And let them know when they’re doing well. Don’t let it be all criticism and postmortems.

Easily reversible decisions are cheap: if you turn out to be wrong, you can undo them. So when a decision is easily reversible, act quickly.

Knowledge sharing

Avoid single points of failure. The “let me take care of that for you” habit optimizes for short-term efficiency (“it’s faster for me to do it”) at the cost of long-term scalability. The team never learns.

Before removing something, understand why it’s there. Chesterton’s fence:

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle… a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think.”
— G.K. Chesterton

You should always be in an environment where there’s something to learn. If not, you stagnate.

Psychological safety is the foundation for knowledge sharing. Start small: ask questions and write things down.

Measuring engineering productivity

Don’t measure unless the result is actionable. If no one will change behavior based on the outcome, don’t bother. And the person who asks for the measurement should be the one empowered to take action.

Google uses the Goals/Signals/Metrics (GSM) framework. Goals are desired end results, stated without measurement details. Signals are what you’d like to measure, which might not be directly measurable. Metrics are what you can actually measure, proxies for signals.
This prevents the streetlight effect (measuring what’s easy instead of what matters) and stops stakeholders from cherry-picking metrics after the fact to justify a preferred outcome.
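One GSM entry sketched as data, for concreteness. The goal is adapted from the book's readability-process case study; the signal and metric wording here is my paraphrase.

```python
# A single GSM entry. The goal is stated with no measurement details;
# signals are what we ideally want to know (possibly unmeasurable);
# metrics are the measurable proxies we settle for.
readability_gsm = {
    "goal": "Engineers write higher-quality code as a result of the "
            "readability process",
    "signals": [
        "Engineers who have been granted readability judge their code "
        "to be higher quality",
    ],
    "metrics": [
        "Survey: proportion of engineers reporting higher code quality",
        "Logs: change in reviewer comment density after certification",
    ],
}

def is_well_formed(entry):
    # Working strictly top-down (goal -> signals -> metrics) is what
    # blocks cherry-picking a convenient metric after the fact: a metric
    # with no signal or goal behind it has no place in the entry.
    return all(entry.get(key) for key in ("goal", "signals", "metrics"))

assert is_well_formed(readability_gsm)
```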

QUANTS covers five components of engineering productivity: Quality of code, Attention from engineers, Intellectual complexity, Tempo and velocity, Satisfaction. They trade off against each other. Improving one can drive others down, so be aware of the full picture even if you’re only setting goals for a couple.

Testing

Test via public APIs, not implementation details. Test state, not interactions. Testing interactions makes your tests brittle, because any internal refactor forces you to rewrite them.
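A sketch of the difference, using `unittest.mock` for the brittle version (class and function names invented):

```python
from unittest import mock

class Database:
    def __init__(self):
        self._rows = {}
    def put(self, key, value):
        self._rows[key] = value
    def get(self, key):
        return self._rows.get(key)

def rename_user(db, user_id, new_name):
    db.put(user_id, new_name)

# Interaction test: asserts HOW the code did its job. Refactoring
# rename_user to batch its writes would break this test with no bug.
def test_rename_user_interactions():
    db = mock.create_autospec(Database, instance=True)
    rename_user(db, 42, "ada")
    db.put.assert_called_once_with(42, "ada")

# State test: asserts WHAT the system looks like afterward. It survives
# internal refactors as long as the observable outcome is unchanged.
def test_rename_user_state():
    db = Database()
    rename_user(db, 42, "ada")
    assert db.get(42) == "ada"
```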

Test behaviors, not methods. A behavior is a guarantee about how a system responds to inputs in a particular state: “Given an empty bank account, when attempting to withdraw, then the transaction is rejected.”
One method can implement multiple behaviors, and one behavior can span multiple methods. Structuring tests around behaviors keeps them stable as the code evolves.
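The bank-account behavior above might be written like this (a minimal sketch; the class and exception names are invented). Each test is named for the guarantee it checks, not for a method:

```python
class InsufficientFunds(Exception):
    pass

class BankAccount:
    def __init__(self, balance=0):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFunds()
        self.balance -= amount

def test_withdraw_from_empty_account_is_rejected():
    account = BankAccount(balance=0)      # given: an empty account
    try:
        account.withdraw(10)              # when: attempting to withdraw
        assert False, "expected rejection"
    except InsufficientFunds:
        pass                              # then: the transaction is rejected

def test_withdraw_reduces_balance_when_funds_exist():
    account = BankAccount(balance=100)    # given: sufficient funds
    account.withdraw(30)                  # when: withdrawing
    assert account.balance == 70          # then: the balance shrinks
```

Note that `withdraw` appears in both tests: one method, two behaviors, and either test can change without touching the other.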

Code coverage is not a good measure of test quality. It becomes a goal in itself. Think about what you’re testing, not just that lines are being executed.

Prefer real implementations over test doubles. Fakes over stubs. Overuse of mocks leads to tests that are unclear and brittle.
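A fake, unlike a stub, is a lightweight but genuinely working implementation, so it can be maintained once and trusted by many tests (this example is invented):

```python
class FakeEmailService:
    """Fake: behaves like the real service (records sends, validates
    input) without touching the network. A stub would just return canned
    answers and enforce nothing."""
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        if "@" not in to:
            raise ValueError(f"invalid address: {to}")
        self.sent.append((to, body))

def notify(service, user_email):
    service.send(user_email, "Your build is green.")

def test_notify_sends_one_email():
    service = FakeEmailService()
    notify(service, "dev@example.com")
    assert service.sent == [("dev@example.com", "Your build is green.")]
```

Because the fake validates its inputs, a test that passes a malformed address fails here just as it would in production, which a stub would silently allow.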

The Beyoncé Rule shows up in the testing chapters too: “test everything you don’t want to break.”

Code is a liability

Code doesn’t bring value. Functionality does. Code is the cost you endure to get that functionality.
If you could get the same functionality from one line instead of ten thousand, you’d pick the one line.

Focus on value per unit of code. Add sparingly and remove the unnecessary, because Technical Debt accumulates when you don’t.

Removing things is often harder than building them, because users depend on the system beyond its original design (Hyrum’s Law again). But keeping dead code around has costs: maintenance, ecosystem confusion, drag on new feature development. Those costs are diffuse and hard to measure, which is exactly why they tend to be ignored.

Version control and trunk-based development

Google uses a massive monorepo with trunk-based development. Roughly 50,000 engineers, one repository, 60-70k commits per work day. They built a custom distributed VCS called Piper to handle the scale.

Development branches should be minimal and short-lived. Long-lived branches are not a good default.

Research from DORA/Accelerate has shown that trunk-based development predicts high performance in development organizations.

Tooling and developer flow

Roughly 60-70% of developers build locally. A lot of productive time gets lost waiting for builds, and the cost of distributed build infrastructure almost always pays for itself.

Google built an internal code search tool that became central to productivity across the company. At one point it ran on Jeff Dean’s personal computer, which caused company-wide distress when he went on vacation and the machine was shut down.
Good build systems and code search are basic infrastructure.
