# create a git repo in a directory of your liking
mkdir gitinternals
cd gitinternals
git init -b main
# add two .txt files and commit them
echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt
git add README.txt
git commit -m "Project Setup"
# update README.txt's contents
echo "a git guide" > README.txt
git add README.txt
git commit -m "Updated README"
Git: Merge, Cherry-Pick & Rebase
An unconventional guide
Last updated on January 26, 2022
You can use this guide to get a deep understanding of how Git's merges, rebases & cherry-picks work under the hood, so that you'll never fear them again.
(Editor’s note: At ~5500 words, you probably don’t want to try reading this on a mobile device. Bookmark it and come back later. And even on a desktop, eat this elephant one bite at a time.)
Introduction
Sure, everyone and their grandmother uses Git and seems to be comfortable with it.
But did you ever botch a merge and then your solution was to delete and re-clone your repository? Without quite knowing what went wrong and why?
Or did a rebase suddenly make tens of merge conflicts pop up, one after another, and you didn’t know what the hell was going on?
In short, do you have nagging doubts whenever it comes to merging, rebasing and cherry-picking?
Fear not, you’ve come to the right place: The remainder of this guide will help you get rid of those fears.
(Teaser: By the end of this article, you’ll understand that a git cherry-pick is essentially just a git merge. And a git rebase is essentially just a git cherry-pick. Sounds crazy? Read on!)
Git Storage Internals
Before you jump right into the nitty-gritty details of merging, let’s have a look at how Git stores your files and commits.
It might seem a bit weird to start off with internal details, but take a leap of faith: Those internals are the building blocks for everything else in this guide, so you’ll need to know them first.
Scenario: Committing Two Files
Open up your terminal and execute the commands shown at the very top of this guide (the block starting with mkdir gitinternals).
You created two .txt files in a first commit, then updated the contents of one file (README.txt) in a second commit.
Here’s a question for you: How do you think Git will store those two commits, or rather the two versions of README.txt?
- Will it store full files, i.e. a marcobehler.com guide AND a git guide, somewhere?
- Will it store deltas, something like a (-marcobehler.com)(+git) guide (pseudo-code)?
Bonus question: How the hell would the answer to this help with merging or rebasing?
Let’s find out!
Inspecting Git repos: 'git cat-file'
Let’s execute a git log in your repository, and you’ll get output similar to this:
# in your repository's directory
git log
# git log's output
commit 142e5cf36d9f2047f24341883bd564b1d5170370 (HEAD -> main)
Author: Marco Behler <marco@marcobehler.com>
Date: Tue Dec 28 09:54:44 2021 +0100
Updated README
commit 715247c8426d3c16881539118e1eafeb38439b1c
Author: Marco Behler <marco@marcobehler.com>
Date: Tue Dec 28 09:54:25 2021 +0100
Project Setup
So far, nothing surprising - you’ll see your two commits. Something that you’ve seen, but probably ignored plenty of times, are commit ids. Here’s the second commit’s id.
commit 142e5cf36d9f2047f24341883bd564b1d5170370
More specifically, 142e5cf36d9f2047f24341883bd564b1d5170370 is not just a random id, it’s a SHA-1 hash.
But, what exactly has been hashed here?
Instead of spoiling the answer, let’s use another built-in git command: git cat-file. It basically allows you to have a look at something which git stores somewhere in your repository’s .git folder, given that you happen to know its SHA1-hash. Sounds useful, right?
Execute the following command (and make sure to try this with the SHA1 hash that you are getting for your commit):
# make sure to change the SHA1-hash!
git cat-file -p 142e5cf36d9f2047f24341883bd564b1d5170370
(Note: The -p option makes sure to pretty-print its output.)
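If you’re unsure what kind of object a hash refers to, git cat-file has two more flags besides -p: -t prints the object’s type and -s prints its size in bytes. Here is a minimal sketch that wraps both in a helper. The name describe_object is made up for this example; it is not a git command.

```shell
# Hypothetical helper, not part of git: summarize the object behind a revision or hash.
describe_object() {
  echo "id:   $(git rev-parse --verify "$1")"   # resolves HEAD, branch names or abbreviated hashes
  echo "type: $(git cat-file -t "$1")"          # commit, tree, blob or tag
  echo "size: $(git cat-file -s "$1") bytes"    # size of the object's uncompressed content
}
```

Inside your repository, describe_object HEAD should report the type commit. Feed it the tree or blob hashes you’ll meet further down and the type changes accordingly.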
You’ll get output similar to this:
# git cat-file's output
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
author Marco Behler <marco@marcobehler.com> 1641459065 +0100
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100
Updated README
Tada! This is what a commit looks like in Git. It’s a text file with…6 lines (well 5, and an empty one to delimit your commit message from the rest). Yes, really.
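And if you put those lines, prefixed with the short header that Git adds to every object (the object type and the content’s size in bytes), into a sha1sum() function, you’ll end up with your SHA1 hash: 142e5cf36d9f2047f24341883bd564b1d5170370!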
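You can check this claim yourself. One detail to be aware of: Git does not hash the bare lines. Before hashing, it prepends a short header consisting of the object type and the content size (commit <size>, followed by a NUL byte). The sketch below therefore pipes the raw commit back into git hash-object, which adds that header for you. The helper name recompute_commit_id is an assumption for illustration only.

```shell
# Hypothetical helper: recompute a commit's id from its raw content.
recompute_commit_id() {
  # "git cat-file commit" prints the commit's raw lines (tree, parent, author, ...).
  # "git hash-object -t commit --stdin" prepends the "commit <size>\0" header,
  # hashes the result and prints the id, without writing anything to the repository.
  git cat-file commit "$1" | git hash-object -t commit --stdin
}
```

In your repository, recompute_commit_id HEAD should print exactly the same id as git rev-parse HEAD.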
Now, some of those lines from your commit (file) you’ll be familiar with:
# who committed the file?
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100
# what's the commit message?
Updated README
Whereas some other parts of the commit probably look unfamiliar:
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
Let’s (rightly) assume for now that parent(s) simply references the commit that came before the current commit. Then, what does the tree line stand for? Execute another git cat-file to find out!
# make sure to change the SHA1-hash to that of your tree!
git cat-file -p c4548e069652a6825894699ef7740a620ea0a6a8
Look, this tree seems to be yet another text file, referencing (snapshots of) all the files in your repository at the time of the commit!
100644 blob ddd3b7b6335a636af9a9241096455e834f12f636 LICENSE.txt
100644 blob 773fc76fe191ceff24259d4e66efc90e86093b0c README.txt
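By the way, you don’t have to copy the tree hash by hand. Git’s revision syntax lets you append ^{tree} to a commit to address its tree directly. A small sketch (tree_of is a hypothetical helper name, not a git command):

```shell
# Hypothetical helper: print the tree (file listing) that a commit points to.
tree_of() {
  echo "tree id: $(git rev-parse "$1^{tree}")"   # same hash as the commit's "tree" line
  git cat-file -p "$1^{tree}"                    # one "<mode> <type> <hash> <name>" entry per line
}
```

Running tree_of HEAD lists LICENSE.txt and README.txt with the blob hashes of your latest commit; tree_of HEAD~1 does the same for the commit before it.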
Can this be true? Well, you’ll find out by doing one last git cat-file, this time using README.txt’s hash.
git cat-file -p 773fc76fe191ceff24259d4e66efc90e86093b0c
Which leads to the following output:
"a git guide"
Does this look familiar? Yes, it is a snapshot of your README.txt file, at the time of the second commit, i.e. when you updated the readme. Which means that it does look like Git stores the full file contents for every commit (assuming the contents have changed)?
Well, to be sure, let’s repeat the git cat-file game for the first commit (which serves as a great exercise, so refer back to the git log output and repeat the steps!). You’ll end up with something like this:
# cat'ing README.txt snapshotted during the first commit
git cat-file -p fe066d3f7568e13ef031b495e35c94be91b6366c
"a marcobehler.com guide"
Take-Away: Git doesn’t store deltas between commits. It always stores snapshots, i.e. the full file, for every commit (as long as the file changed and its SHA1-hash is not already in your repository).
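The part in parentheses is easy to verify: LICENSE.txt did not change between your two commits, so both trees should point to the very same blob, while README.txt gets a new one. A minimal sketch using the <commit>:<path> revision syntax (blob_id and same_snapshot are hypothetical helper names):

```shell
# Hypothetical helper: print the id of the blob stored for a path in a given commit.
blob_id() {
  git rev-parse "$1:$2"
}

# Hypothetical helper: report whether a file's snapshot is shared between two commits.
same_snapshot() {
  if [ "$(blob_id "$1" "$3")" = "$(blob_id "$2" "$3")" ]; then
    echo "$3: same blob in $1 and $2"
  else
    echo "$3: different blobs in $1 and $2"
  fi
}
```

In the repository from this guide, same_snapshot HEAD~1 HEAD LICENSE.txt should report the same blob, and same_snapshot HEAD~1 HEAD README.txt should report different blobs.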
I'm @MarcoBehler and I share everything I know about making awesome software through my guides, screencasts, talks and courses.