@kvnsmth
Last active March 5, 2023 21:58
A real world usage for git subtrees.

Let's say you have an iOS project, and you want to use some external library, like AFNetworking. How do you integrate it?

With submodules

Add the project to your repo:

git submodule add git@github.com:AFNetworking/AFNetworking.git Vendor/AFNetworking

or something to that effect.

Well, what happens if you find a bug in AFNetworking that you want to fix? With submodules, I'd usually:

  1. Fork AFNetworking
  2. Go through the pain of changing my project's submodules to point to my new fork
  3. Make my change and commit it to my fork
  4. Submit a pull request to AFNetworking repo
  5. Wait to see if the pull request is accepted, but keep my fork up-to-date in the meantime
  6. If it is accepted, do the whole dance to switch my submodule back to the official AFNetworking repo
  7. Continue as usual
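
Steps 2 and 6 are the fiddly part. Here is a minimal sketch of that "dance", run in a throwaway sandbox so it needs no network access. The local `upstream` and `fork` directories and the `Vendor/Lib` path are hypothetical stand-ins for the official AFNetworking repo, my fork, and `Vendor/AFNetworking`; the `protocol.file.allow` override is only there because the demo uses local paths.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-ins for the official repo and a personal fork.
git init -q -b main upstream
git -C upstream commit -q --allow-empty -m "upstream: initial"
git clone -q upstream fork

# The project that vendors the library as a submodule.
git init -q -b main app
cd app
git commit -q --allow-empty -m "app: initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" Vendor/Lib
git commit -q -m "Add Lib as a submodule"

# Step 2: repoint the submodule at the fork.
git config -f .gitmodules submodule.Vendor/Lib.url "$work/fork"
git submodule --quiet sync
git commit -q -am "Point Vendor/Lib at my fork"
git -C Vendor/Lib remote get-url origin   # now the fork path

# Step 6: switch back once the pull request is merged.
git config -f .gitmodules submodule.Vendor/Lib.url "$work/upstream"
git submodule --quiet sync
git commit -q -am "Point Vendor/Lib back at upstream"
git -C Vendor/Lib remote get-url origin   # the upstream path again
```

Every teammate who already has the submodule checked out also has to run `git submodule sync` after each of those URL changes, which is a good part of why this gets tedious.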

Okay, that sucks. If you've ever done it, you know how painful it is and how finicky submodules can be.

With subtrees

Add the project to your repo:

git subtree add --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master

This is pretty similar so far, except that the other members of your team won't have to remember to run git submodule update, because subtrees actually store the source in your repo. Nice.
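
To see what "store the source in your repo" means for a teammate, here is a small sandbox sketch. The local `upstream` repo, its `lib.txt` file, and the `Vendor/Lib` prefix are made-up stand-ins for AFNetworking and `Vendor/AFNetworking`, and it assumes your Git install ships the `git subtree` command.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the library repo.
git init -q -b main upstream
echo "library code" > upstream/lib.txt
git -C upstream add lib.txt
git -C upstream commit -q -m "upstream: add lib.txt"

# The app repo needs at least one commit before a subtree can be added.
git init -q -b main app
git -C app commit -q --allow-empty -m "app: initial commit"
(cd app && git subtree add --prefix=Vendor/Lib --squash "$work/upstream" main)

# A teammate's plain clone already contains the vendored source,
# with no .gitmodules file and no extra update step.
git clone -q app teammate
ls teammate/Vendor/Lib
```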

Now, let's say we have the same bug. What do I do differently now that I'm using subtrees?

I make my change and commit it to my project's repository. Technically, I could stop now if I wanted to, since the fixed code is in my repository. But I want to be a good open source citizen, so what do I do?

Be a good open source citizen

  1. I'll fork AFNetworking into my account on GitHub.
  2. Back in my local repo, I'll split the subtree's history out onto its own branch so that I can push changes to my fork:

git subtree split --prefix=Vendor/AFNetworking/ --branch AFNetworking

  3. I'll push my change to my fork, but on a branch to make the pull request more awesome:

git push git@github.com:kvnsmth/AFNetworking.git AFNetworking:critical-bug-fix

  4. I would issue a pull request and hope it gets accepted, but a big difference is that the acceptance of my change doesn't keep me from being able to easily stay in sync with the official AFNetworking repo.

I can still do:

git subtree pull --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master

to stay up-to-date with the latest in the official repository.
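
If you want to try the whole round trip without touching a real project, the sketch below replays it in a temporary directory. Everything in it is a stand-in I made up for the demo: the local `upstream` repo plays the official AFNetworking repo, the bare `fork.git` plays my GitHub fork, `Vendor/Lib` plays `Vendor/AFNetworking`, and `lib-export` is just a name for the split branch. It assumes `git subtree` is installed.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # keep the merge in "subtree pull" from opening an editor
work=$(mktemp -d)
cd "$work"

# Stand-ins for the official repo and my fork of it.
git init -q -b main upstream
echo "v1" > upstream/lib.txt
git -C upstream add lib.txt
git -C upstream commit -q -m "upstream: v1"
git clone -q --bare upstream fork.git

# My project, with the library added as a squashed subtree.
git init -q -b main app
cd app
git commit -q --allow-empty -m "app: initial commit"
git subtree add --prefix=Vendor/Lib --squash "$work/upstream" main

# Fix the bug directly in the vendored copy and commit it to my project.
echo "v1 with fix" > Vendor/Lib/lib.txt
git commit -q -am "Fix critical bug in Lib"

# Split the vendored history onto its own branch and push it to the fork.
git subtree split --prefix=Vendor/Lib --branch lib-export
git push -q "$work/fork.git" lib-export:critical-bug-fix

# Upstream moves on in the meantime; pulling still works with my fix in place.
echo "new upstream file" > "$work/upstream/extra.txt"
git -C "$work/upstream" add extra.txt
git -C "$work/upstream" commit -q -m "upstream: add extra.txt"
git subtree pull --prefix=Vendor/Lib --squash "$work/upstream" main
```

After the last command the project has both the new upstream file and my local fix, while the fork carries the fix on its own `critical-bug-fix` branch, ready for a pull request.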

Now, I think that is much better than using submodules and a lot less invasive to my repo.

@funkytaco

Curious about everyone's current thoughts as well as I'm researching git submodules versus subtrees.

@luca-ing

@funkytaco I'm curious as well.

The thing that I find offputting about subtrees is exactly what people praise about them: they hide the existence of other repos a little too well for my taste.

If I clone a repo that contains subtrees, I will not really notice. Whatever changes I make may not find their way back out to the original repo that created the subtree. I'm surprised that nobody else seems to mind this at all.
I fear that while it is robust on my own repo (it will not break, as easily happens with submodules), the cohesion between the linked repos weakens. This is bad if you're developing e.g. a library in one repo and an application that uses your library in a separate repo (which would make a lot of sense to me).
You have to manually split out your changes and apply them to the remote repo.

OTOH submodules are very brittle, so I'm reluctant to use them as well.

I have no good answer right now.
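
One partial mitigation for the "I will not really notice" problem: `git subtree add` and `git subtree pull` record a `git-subtree-dir:` line in the commit messages they create, so a fresh clone can at least be asked which directories came from subtrees. The helper below is a rough sketch; the function name `list_subtrees` is made up, and it only finds subtrees whose marker commits are reachable from the current branch and were not rewritten.

```shell
# Print every directory that subtree metadata in the history refers to.
list_subtrees() {
  git log --grep='git-subtree-dir:' --format='%b' |
    sed -n 's/^git-subtree-dir: //p' |
    sort -u
}
```

Run `list_subtrees` from inside the clone. It tells you where the subtrees live, not which remote they came from, so the upstream URL still has to be documented somewhere by hand.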

@gayanpathirage

Could someone comment on this from an enterprise perspective, e.g. a company managing more than 100 repos developed and shared between hundreds of developers? (NOTE: The repos are linked to each other as dependencies, e.g. libraries.)

@munjeli

munjeli commented Apr 12, 2016

@gayanpathirage
I'm doing enterprise infrastructure development for a consultancy and can speak to scale a bit in the context of DevOps, if not dev. I'm preparing a conference talk on 'Metaprogramming in Metarepositories'. I've gone to subtrees rather than submodules because of the difficulty for users, but I really liked programming with submodules to expose the service dependencies when I'm programming across repositories. I love it as an infrastructure dev environment, but am still working to understand where it otherwise makes sense.

Regarding repositories with hundreds of subs: there are use cases where it seems to make sense (one company I worked with used a metarepository to integrate multiple chef-repos), but you need to write tooling to handle it effectively and abstract away the risk of corrupting the repository through mishandling. It can also be slow to pull, obviously, and parallel pulling is essential, or working on a propagated box. I use a small metarepository on a build server to integrate a handful of applications with their deploy code, and it's a bit slow to pull, but once it's there I can do a lot of builds quickly and there's better transparency for failure around the automation, as all the code is in the same place. I have devs push and pull into the individual repositories, and the metarepository is largely for automation.

Generally, devs don't like metarepositories! Too much source control complexity! But metarepos can model actual infrastructure through source dependencies, and that model can go a long way to explaining your infrastructure patterns. I don't like the idea of HUGE metarepositories: the pulling is going to be messy or risky, and it's generally not going to be efficient. It's a way to distribute an entire collection of applications and libraries for exploration or running automation rather than for everyday contributions.

@YueLinHo

Could someone comment about this from enterprise perspective

The entire Windows codebase is moving to a single Git repo... and from the Design History:

Submodules
...
In the end, we dropped that approach, because it created nearly as many problems as it fixed. 

For one, we found that we were complicating people’s workflows
...

Second, it’s not really possible to do atomic commits or pushes across multiple repos
...

And third, most developers are not interested in becoming version control experts, 
...

Can't find anything about subtree there, perhaps it's not even an option. :P

Another ref.:

We started down at least 2 failed paths to scale Git.
Probably the most extensive one was to use Git submodules to stitch together lots of repos into a single “super” repo.
I won’t go into details but after 6 months of working on that we realized it wasn’t going to work
– too many edge cases, too much complexity and fragility.
We needed a bulletproof solution that would be well supported by almost all Git tooling.

(From https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/)
