git annex - this is a separate bit of software that sits "atop" git and, in its own words:
allows managing files with git, without checking the file contents into git.
Say what!? It basically stores your directory contents in git via symlinks, so really you are just checking in a bunch of symlinks - not content. This keeps the repo very small.
Why is this useful? Well annex then lets you fetch the actual file and keep things in sync. It's been described as 'your own personal dropbox'.
There's a project to make it more user friendly and 'just click', but it works perfectly well via the command line as if you were playing with git repos - so for people like us it's already "ready".
Assuming you've installed it somehow, let's say you have a folder of documents or large files that you don't want to check directly into a repo but do want to track and grab easily whenever:
cd ~/folder/
git init
git annex init
git annex add .
git add .
git commit -am "Initial import"
Now the magic happens when git annex add runs - it will go through everything and replace it with a symlink to the real object within your .git folder. When you then run git add . you're checking in symlinks.
In this case my documents folder is over 38GB but the actual 'working copy' when checked out is 86MB.
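
To see why checking in symlinks keeps things so small, here's a rough sketch using plain git only (no git annex needed). The store/blob-0001 path and report.pdf name are made up for illustration and are not annex's real object layout; the point is just what git records for a link.

```shell
#!/bin/sh
# Sketch with plain git only: what does git actually store for a symlink?
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical layout: the real content is parked in store/,
# and a link is left where the document used to be.
mkdir store
printf 'pretend this is a very large PDF\n' > store/blob-0001
ln -s store/blob-0001 report.pdf

# Stage only the link, not the content behind it.
git add report.pdf

# Mode 120000 is how git marks a symlink entry in the index.
mode=$(git ls-files -s report.pdf | cut -d' ' -f1)

# The staged blob is just the link target text, not the PDF bytes.
stored=$(git cat-file -p :report.pdf)

echo "mode=$mode stored=$stored"
```

So however big the file behind the link is, git itself only ever holds a short path string for it.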
So go off to your netbook or home machine or whatnot and clone this new repo, e.g.:
git clone ssh://user@host/home/user/folder
Now you get a full copy of the structure - but no content. You can now do stuff like this:
git annex get *.pdf
git annex sync
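
Until you get a file, its link in a fresh clone points at content that isn't there yet. A quick-and-dirty way to spot those, assuming your annex is using symlinks as described above, is to look for dangling links. The list_missing helper and the demo file names below are my own invention, not part of git annex.

```shell
#!/bin/sh
# list_missing DIR: print symlinks under DIR whose target does not exist.
# (hypothetical helper; skips the .git directory itself)
list_missing() {
    find "$1" -name .git -prune -o -type l ! -exec test -e {} \; -print
}

# Tiny demo in a scratch directory: one link with content, one without.
demo=$(mktemp -d)
printf 'here\n' > "$demo/real.txt"
ln -s real.txt "$demo/present.pdf"
ln -s .git/annex/objects/nowhere "$demo/absent.pdf"

missing=$(list_missing "$demo")
echo "$missing"
```

Anything it prints is a candidate for git annex get.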
If you make changes like renaming files etc. you can simply commit and push. On another remote, a sync will take those changes and apply them. If new files are created it is able to look through its various remotes and pull them in.
It even handles S3 buckets as a remote! Check it out, it could make something like document/video/big-data management much more 'in tune' with your source control.
So far I like it! If you can get around what happens when you do an ls -l!