I have Keanu Reeves to thank for one of my favourite life mottos: Be excellent to each other. It seems to be something that I can apply so often, in so many ways.
Take “working with other people on source code” for example.
For the sake of this post, I’m talking about large repositories with many people working semi-independently on them. Those are most often found in companies, but there are some examples in the world of Open Source.
Setting the scene
There are always social expectations anywhere you have groups of people attempting to collaborate and get along with one another. You and whoever you live with? There are social expectations. You and the people you work with? There are social expectations. You and whoever you share a code repository with? There are social expectations.
One of the things that I’ve noticed is that people seldom think about the social expectations of the repos they inhabit. And I guess that’s only natural.
Maybe we don’t share our repo with anyone else. What we say should happen, happens, because we’re the only ones that are impacted by it.
Some of us share a repo with only a few people, and we tend to work fairly closely with them. It’s normally pretty easy to come to an agreement on whatever the hot-button contentious issue of the day is.
But, as teams grow, and the repository becomes larger, we end up at a point where we don’t necessarily know all the people. You’ll see things like code owners files appearing at this point, and the number of people who can commit at the root of the repository tends to be dramatically reduced.
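Code owners files are one way that narrowing gets expressed. A GitHub-style sketch (every path and team name here is invented for illustration) might look like:

```
# CODEOWNERS sketch: all paths and teams here are hypothetical.
# GitHub applies the *last* matching pattern, so the broad rule
# goes first and the more specific rules below override it.
*                      @example-org/repo-admins
/infra/build/          @example-org/build-infra
/services/payments/    @example-org/payments-team
```

Combined with branch protection, this keeps the set of people who can approve changes at the root small, while teams keep autonomy over their own directories.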
And this is where we should be excellent to each other.
But, if we have the freedom to do whatever we want in a small repo, why would we possibly want to be in a larger one, let alone taking things to the extreme of a monorepo?
The more, the merrier
Well, for the same reason that living in cities is so popular: for the trade-off of some freedom and some additional costs, there are a whole heap of advantages, and these come from the scale of what you’re sharing.
For example, let’s take the classic bugbear of anyone working in a large repo: updating shared dependencies.
You might think that updating a dependency in a small repo isn’t really that much work, and most of the time you’d be right. Someone gets assigned the task to do the update, they do the work, fix any issues, and they’re done. Simple.
Except, when you’re doing large scale programming, it’s not just one repo. There might be dozens (hundreds!) of repos. For each of those repos, an engineer must handle the update. Cumulatively, the engineering hours required to perform the update tend to be higher than changing a similar number of projects in a single larger repo.
It’s all to do with context — if the update doesn’t cause any problems, and there are no weird bugs, then the cost of getting one engineer in one place to do one update is clearly lower than the cost of getting dozens of engineers to do the same work across multiple repos. And when things go wrong, the experience gained fixing one project can be applied to others in the same repo.
That is, in the best case scenario, the larger repo is cheaper overall to update than the smaller repos, even if it requires more work for the engineer doing the heavy lifting.
Put another way, smaller repos optimise for the micro-case. Larger repos allow optimisations for the macro-case. When you have limited engineering capacity and many projects, it most often makes sense to optimise for the macro-case.
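To make the macro-case concrete, here’s a toy back-of-envelope model. Everything in it (the function names, the hours, the linear cost shape) is an invented illustration, not data:

```python
# A toy back-of-envelope model of aggregate update effort.
# Every number here is invented purely for illustration.

def many_repos_cost(num_repos, hours_per_repo):
    """Each repo's owner repeats the whole update, with no shared context."""
    return num_repos * hours_per_repo

def monorepo_cost(num_projects, base_hours, marginal_hours):
    """One engineer pays a fixed up-front cost, then a small per-project
    cost, reusing what they learned fixing the first project."""
    return base_hours + num_projects * marginal_hours

# 50 projects: 4 hours per standalone repo, versus 8 hours up front
# plus half an hour per project in the shared repo.
print(many_repos_cost(50, 4))        # 200 engineer-hours
print(monorepo_cost(50, 8, 0.5))     # 33.0 engineer-hours
```

The exact numbers don’t matter; the point is that the many-repos cost scales with the number of repos, while the shared-repo cost grows far more slowly once the fixed cost is paid.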
But no repo is an island. We deploy our software. At some point, our code will need to integrate with someone else’s code. It’s only then that we find out whether our assumptions about how other projects will act are right or wrong.
With a small repo, we can iterate quickly. We can do so because we delay the point at which we integrate with others. But! It’s a truism in software development that the later a defect is found, the more expensive it is to fix. By delaying the point of integration, we’ve increased the cost of fixing any integration.
On the other hand, while it’s deeply frustrating for someone working in a larger repo to find out that they’ve broken something, we’re front-loading the cost of integration. Intuitively, this means that the overall cost of that integration will be lower.
Put another way, smaller repos optimise for local changes, trading that for increased integration costs. Larger repos optimise for reduced overall integration costs, trading that for more effort being required to land a single change in the tree.
And then, there’s the costs of CI and build infrastructure. Unless a repo is particularly simple, there is likely to be some kind of build process, and some kind of CI pipeline. As the amount of code grows, the CI pipelines may slow down. We tend to end up with someone being assigned to “make the build faster”, or with dedicated build engineers.
Again, with many small repos, the individual costs may not be high (after all, these repos tend to be simpler by definition), but aggregated over an entire organisation, the total engineering cost tends to be higher. Larger repos can aggregate this cost into experts in the build process, who can focus on improving feedback loops for significantly more engineers.
Put another way, we can view the build as the fulcrum on which we’re trying to move the world. The larger the codebase, the longer our lever, and the more impact an improvement can have on people.
And these are only some of the ways that working in a larger codebase can be more efficient than working with the same amount of code spread across multiple repos.
But it hurts
There are major downsides to large repos: some technical, some social. I clearly have an agenda, but it would be wrong to ignore them entirely 😁
Technically, larger repos need a build tool that can cope with large repos. They take longer to clone. Thought needs to be put into making CI processes more efficient for repos with multiple projects, and not all CI tools are set up for this challenge. But the build tool is the kicker: over a certain size, most tools end up weeping gently and not working well at all. That’s one of the reasons I’m a fan of tools such as Bazel.
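Part of why a tool like Bazel copes is that dependencies are declared explicitly per target, so only the affected slice of the graph gets rebuilt and retested. A minimal sketch (all target and path names here are hypothetical):

```starlark
# BUILD.bazel for a hypothetical //services/payments package.
# Bazel only rebuilds and retests targets whose dependency graph changed.
java_library(
    name = "payments",
    srcs = glob(["src/main/java/**/*.java"]),
    deps = [
        "//libs/money",                      # another project in the repo
        "@maven//:com_google_guava_guava",   # a shared third-party dep
    ],
)

java_test(
    name = "payments_test",
    srcs = glob(["src/test/java/**/*.java"]),
    deps = [":payments"],
)
```

An unrelated change elsewhere in the tree leaves these targets untouched, which is what keeps CI tractable as the repo grows.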
But the social issues are the important ones. Most of us have only ever spent time in smaller repos. It chafes and hurts to have to meet the social contract of working in a larger repo.
We want to update a dependency for ourselves: it seems like wasted effort on our part to have to upgrade everyone else. And, guess what? You’re right! It is more effort for you to do that update for everyone, but it’s still more efficient overall. And it ignores all the times when someone else has jumped through the upgrade hoops and silently updated something you depend on.
Our PR takes longer to land because the CI build tells us some project we know nothing about now has failing tests. What a pain! I don’t care about them! Figuring out what’s wrong with their code is slowing me down! And, guess what? You’re right! It is slowing you down. But, here’s the thing, overall it’s more efficient. You have the context for your change right there in your mind. While fixing a failing build or test is seldom fun, it’s still easier to fix it with better context. So, yes, it’s slower for you, but it’s faster overall.
A brief discussion about updating shared dependencies
It’s worth spending a bit of time thinking about updating dependencies. When talking to people who are sceptical about working in a large repo, this is normally presented as the number one problem to solve.
One argument against shared repos is that the difficulty of making dependency updates can lead to the ossification of the repo. Imagine someone wants to try an experiment in production with some fancy new library, but pulling this in will mean that a transitive dependency needs to be updated, and this causes some other service that they don’t care about to need work done. Do you try the experiment, or not? It takes engineering effort to do this work, and because the cost is higher in a shared repo, the possibility that the investigation isn’t worth that effort is higher.
So this is a good time to have a thoughtful conversation about the risk/reward trade-off that needs to be made. If I were working on this, the first thing I’d do is a quick spike to see if the work was as complicated as I feared.
This is why most large repos I’ve seen have had a mechanism in place to allow multiple versions of the same dependency to live in the same repo for a very short period of time, or allow people to release experiments from a relatively short-lived branch. In both of these cases, the choice is made thoughtfully and carefully. Yes, it does mean that sometimes less experimentation happens, but the question remains of the value of those experiments.
The other argument against a shared repo when discussing updating deps is what to do when you can’t clean up someone else’s code because you don’t have enough context about how it works. There’s a simple answer to this, but it’s not one that goes down very well: have A Conversation.
Now, this implies that the social contract in a large repo is that teams are aware that they’re sharing the repo with others, and they’re willing to be good neighbours. If someone comes to you asking for help in a part of the tree that you’re familiar with, then offering the help they need is a neighbourly thing to do.
Some people may recoil from this because they don’t like talking to people, and that’s unfortunate. Other people may shy away from having a conversation because they know the team they want to talk to has absolutely no capacity or time to help them. I’d suggest that running a team ragged like this isn’t necessarily in the best interests of the long-term health of the codebase. Another reason not to have this conversation is that the work culture precludes it. If that’s the case, then the social pressures against having a small number of large repos will cause fragmentation and separation into smaller repos, no matter the engineering costs.
Social considerations almost always end up trumping technical concerns.
As a final note, I’ve observed that dependency updates (no matter the size of the repo) tend to be bimodal: most are pretty easy and straightforward, but some turn out to be absolute monsters. At some point, I should blog some strategies for dealing with these.
Choose what to optimise for
Really, the social contract of a larger repo is that you accept that there will be times where what you want to do is slower and more difficult, because that discomfort will lead to reduced effort overall.
Conversely, the social contract of a smaller repo is that we’re optimising for our smaller team’s comfort, at the price of higher integration costs, and needing to be responsible for all updates to our dependencies ourselves.
Which really means that you have a choice: do you optimise for the smaller or larger repo? Do you choose to spend more engineering effort overall in a less visible way, or do you spend less engineering effort overall, but because integration happens sooner, in a more visible way?
To me, it’s obvious which approach I’d pick in almost all cases.