
Git: Merge, Cherry-Pick & Rebase

An unconventional guide

Last updated on January 26, 2022


You can use this guide to get a deep understanding of how Git's merges, rebases & cherry-picks work under the hood, so that you'll never fear them again.

(Editor’s note: At ~5500 words, you probably don’t want to try reading this on a mobile device. Bookmark it and come back later. And even on a desktop, eat this elephant one bite at a time.)

Introduction

Sure, everyone and their grandmother uses Git and seems to be comfortable with it.

But did you ever botch a merge and then your solution was to delete and re-clone your repository? Without quite knowing what went wrong and why?

Or did a rebase suddenly make tens of merge conflicts pop up, one after another, and you didn’t know what the hell was going on?

In short, do you have nagging doubts whenever it comes to merging, rebasing and cherry-picking?

Fear not, you’ve come to the right place: The remainder of this guide will help you get rid of those fears.

(Teaser: By the end of this article, you’ll understand that a git cherry-pick is essentially just a git merge. And a git rebase is essentially just a git cherry-pick. Sounds crazy? Read on!)

Git Storage Internals

Before you jump right into the nitty-gritty details of merging, let’s have a look at how Git stores your files and commits.

It might seem a bit weird to start off with internal details, but take a leap of faith: Those internals are the building blocks for everything else in this guide, so you’ll need to know them first.

Scenario: Committing Two Files

Open up your terminal and execute the following commands.

# create a git repo in a directory of your liking

mkdir gitinternals
cd gitinternals
git init -b main

# add two .txt files and commit them

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt

git add LICENSE.txt
git add README.txt
git commit -m "Project Setup"

# update README.txt's contents

echo "a git guide" > README.txt

git add README.txt
git commit -m "Updated README"

You created two .txt files in a first commit, then updated the contents of one file (README.txt) in a second commit.

Here’s a question for you: How do you think Git will store those two commits, or rather the two versions of README.txt?

  • Will it store full files, i.e. a marcobehler.com guide AND a git guide, somewhere?

  • Will it store deltas, something like a (-marcobehler.com)(+git) guide (pseudo-code)?

Bonus question: How the hell would the answer to this help with merging or rebasing?

Let’s find out!

Inspecting Git repos: 'git cat-file'

Let’s execute a git log in your repository, and you’ll get output similar to this:

# in your repository's directory
git log

# git log's output

commit 142e5cf36d9f2047f24341883bd564b1d5170370 (HEAD -> main)
Author: Marco Behler <marco@marcobehler.com>
Date:   Tue Dec 28 09:54:44 2021 +0100

    Updated README

commit 715247c8426d3c16881539118e1eafeb38439b1c
Author: Marco Behler <marco@marcobehler.com>
Date:   Tue Dec 28 09:54:25 2021 +0100

    Project Setup

So far, nothing surprising - you’ll see your two commits. Something you’ve seen, but probably ignored, plenty of times is the commit id. Here’s the second commit’s id:

commit 142e5cf36d9f2047f24341883bd564b1d5170370

More specifically, 142e5cf36d9f2047f24341883bd564b1d5170370 is not just a random id, it’s a SHA-1 hash.

But, what exactly has been hashed here?

Instead of spoiling the answer, let’s use another built-in git command: git cat-file. It basically allows you to have a look at something which git stores somewhere in your repository’s .git folder, given that you happen to know its SHA1-hash. Sounds useful, right?

Execute the following command (and make sure to use the SHA1 hash that you are getting for your own commit):

# make sure to change the SHA1-hash!
git cat-file -p 142e5cf36d9f2047f24341883bd564b1d5170370

(Note: The -p option makes sure to pretty-print its output.)
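If you'd like to know what kind of thing a hash points to before printing it, git cat-file can also report an object's type (-t) and its size in bytes (-s). Here's a minimal sketch, assuming a throwaway repository created with mktemp. The committer name and email are placeholders and not part of this guide's setup.

```shell
# Throwaway repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "a git guide" > README.txt
git add README.txt
# Placeholder identity, only so the commit succeeds on a fresh machine.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Updated README"

commit_hash=$(git rev-parse HEAD)               # full hash of the latest commit
object_type=$(git cat-file -t "$commit_hash")   # type of the object behind the hash
object_size=$(git cat-file -s "$commit_hash")   # size of its content in bytes

echo "$commit_hash is a $object_type of $object_size bytes"
```

The type tells you which of Git's object kinds you are looking at, which comes in handy once you start following hashes from one object to the next.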

You’ll get output similar to this:

# git cat-file's output
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
author Marco Behler <marco@marcobehler.com> 1641459065 +0100
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100

Updated README

Tada! This is what a commit looks like in Git. It’s a text file with… 6 lines (well, 5, and an empty one to delimit your commit message from the rest). Yes, really.

And if you put those lines into a sha1sum() function, you’ll end up with your SHA1 hash: 142e5cf36d9f2047f24341883bd564b1d5170370!

For the advanced reader: Git doesn't exactly do sha1sum(filecontent), it actually does sha1sum(header + filecontent) - but we'll cover this in a bit.
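If you can't wait, here's a rough sketch of that header + content idea. It assumes the header is the object type, a space, the content size in bytes and a NUL byte, that your repository uses the default SHA-1 object format, and that sha1sum from GNU coreutils is installed (on macOS, shasum would be the usual stand-in). The repository and committer identity are throwaway placeholders.

```shell
# Throwaway repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "a git guide" > README.txt
git add README.txt
# Placeholder identity, only so the commit succeeds on a fresh machine.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Project Setup"

commit_hash=$(git rev-parse HEAD)
size=$(git cat-file -s "$commit_hash")   # content size in bytes

# Hash "commit <size>\0" followed by the raw commit content.
recomputed=$(
  { printf 'commit %s\0' "$size"; git cat-file commit "$commit_hash"; } \
    | sha1sum | cut -d' ' -f1
)

echo "git says:   $commit_hash"
echo "recomputed: $recomputed"
```

Both lines should show the same hash. Note that git cat-file commit (without -p) prints the raw, unformatted content, which is what the hash is computed over.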

Now, some of those lines from your commit (file) you’ll be familiar with:

# who committed the file?
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100

# what's the commit message?
Updated README

Whereas some other parts of the commit probably look unfamiliar:

tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c

Let’s (rightly) assume for now that parent(s) simply references the commit that came before the current commit. Then, what does the tree line stand for? Execute another git cat-file to find out!

# make sure to change the SHA1-hash to that of your tree!
git cat-file -p c4548e069652a6825894699ef7740a620ea0a6a8

Look, this tree seems to be yet another text file, referencing (snapshots of) all the files in your repository at the time of the commit!

100644 blob ddd3b7b6335a636af9a9241096455e834f12f636    LICENSE.txt
100644 blob 773fc76fe191ceff24259d4e66efc90e86093b0c    README.txt
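By the way, you don't have to copy the tree hash by hand. As a small sketch (again using a throwaway repository and a placeholder identity), git rev-parse with the ^{tree} suffix resolves a commit to its tree, and git ls-tree prints the same listing directly from a commit.

```shell
# Throwaway repository with the guide's two files.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "-TODO-" > LICENSE.txt
echo "a git guide" > README.txt
git add LICENSE.txt README.txt
# Placeholder identity, only so the commit succeeds on a fresh machine.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Project Setup"

tree_hash=$(git rev-parse 'HEAD^{tree}')       # the tree the commit points to
tree_type=$(git cat-file -t "$tree_hash")      # object type behind that hash
tree_listing=$(git cat-file -p "$tree_hash")   # pretty-printed tree
ls_tree_listing=$(git ls-tree HEAD)            # same listing, straight from the commit

echo "$tree_listing"
```

Each line of the listing holds a file mode, the object type, the object's hash and the file name.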

Can this be true? Well, you’ll find out by doing one last git cat-file, this time using README.txt’s hash.

git cat-file -p 773fc76fe191ceff24259d4e66efc90e86093b0c

Which leads to the following output:

"a git guide"

Does this look familiar? Yes, it is a snapshot of your README.txt file at the time of the second commit, i.e. when you updated the readme. So it does look like Git stores the full file contents for every commit (assuming the contents have changed).
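You can double-check that the hash in the tree really belongs to the file's contents. The sketch below assumes a throwaway repository and a placeholder identity: git hash-object computes the hash Git would assign to a file in your working directory, and git rev-parse HEAD:README.txt looks up the hash recorded for that path in the latest commit.

```shell
# Throwaway repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "a git guide" > README.txt
git add README.txt
# Placeholder identity, only so the commit succeeds on a fresh machine.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Updated README"

hash_from_file=$(git hash-object README.txt)          # computed from the file on disk
hash_from_commit=$(git rev-parse HEAD:README.txt)     # recorded in the commit's tree
stored_content=$(git cat-file -p "$hash_from_commit") # what Git stored under that hash

echo "$hash_from_file"
echo "$hash_from_commit"
echo "$stored_content"
```

As long as README.txt hasn't been edited since the commit, the two hashes match, and the stored content is the complete file, not a diff.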

Well, to be sure, let’s repeat the git cat-file game for the first commit (which serves as a great exercise, so refer back to the git log output and repeat the steps!). You’ll end up with something like this:

# cat'ing README.txt snapshotted during the first commit
git cat-file -p fe066d3f7568e13ef031b495e35c94be91b6366c

"a marcobehler.com guide"
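If walking log -> commit -> tree -> file by hand feels tedious, here's a shortcut sketch. It assumes the same two commits as in the scenario above, recreated in a throwaway repository with a placeholder identity. HEAD~1 refers to the commit before the latest one, and the revision:path syntax resolves straight to the file's hash in that commit.

```shell
# Recreate the guide's two commits in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only so the commits succeed on a fresh machine.
commit() { git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"; }

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
commit "Project Setup"

echo "a git guide" > README.txt
git add README.txt
commit "Updated README"

first_readme=$(git rev-parse 'HEAD~1:README.txt')    # README.txt in the first commit
second_readme=$(git rev-parse 'HEAD:README.txt')     # README.txt in the second commit
first_content=$(git cat-file -p "$first_readme")
second_content=$(git cat-file -p "$second_readme")

echo "$first_readme  -> $first_content"
echo "$second_readme -> $second_content"
```

Two different hashes, two complete versions of the file.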

Take-Away: Git doesn’t store deltas between commits. It always stores snapshots, i.e. the full file, for every commit (as long as the file changed and its SHA1 hash is not already in your repository).
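The "not already in your repository" part is easy to see for yourself. In the sketch below (a throwaway repository recreating the scenario's two commits, with a placeholder identity), LICENSE.txt never changes, so both commits point to the very same hash for it, while README.txt gets a new one.

```shell
# Recreate the guide's two commits in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only so the commits succeed on a fresh machine.
commit() { git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"; }

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
commit "Project Setup"

echo "a git guide" > README.txt
git add README.txt
commit "Updated README"

license_first=$(git rev-parse 'HEAD~1:LICENSE.txt')
license_second=$(git rev-parse 'HEAD:LICENSE.txt')
readme_first=$(git rev-parse 'HEAD~1:README.txt')
readme_second=$(git rev-parse 'HEAD:README.txt')

echo "LICENSE.txt: $license_first / $license_second"   # same hash twice
echo "README.txt:  $readme_first / $readme_second"     # two different hashes

# Number of objects stored so far: 2 commits + 2 trees + 3 file snapshots = 7
object_count=$(git count-objects | cut -d' ' -f1)
echo "$object_count objects"
```

So the unchanged LICENSE.txt is stored once and simply referenced by both trees, whereas each version of README.txt gets its own full snapshot.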

This is also the reason why Git is not a great choice for projects with many binary assets that change frequently.

Comments (read-only)

6 comments

Anonymous September 10, 2024
Great article. Git isn't the easiest concept in the software development. I doubt if it is really possible to master this completely. This article clears the mist from some of the most commonly used features in Git. Looking forward to additional articles on working with git reflog, and other future topics mentioned above.
Alan Jacob June 07, 2024
In the cherry-pick algorithm you mention step 2:

A diff between the cherry-picked’s parent and your current commit, will give you all the changes that Git needs to apply, to go from the parent, to your current state.

Why is this needed? , shouldn't we only be using the diff between cherry-pick commit and its parent?(step1)

for example:

lets call the cherry pick parent commit as P, the cherrypick commit as C and current state as H (before cherry-pick)

lets say at commit H there is file a.txt,

at commit P there are two files a.txt and b.txt (b.txt was introduced at some point before P, this file is not present on the H commit branch)

at commit C there are three files a.txt, b.txt, c.txt, (commit C introduced new file c.txt)

during cherrypick if step 2 is applied that would imply that after the cherry pick is done b.txt would appear in final state, but that is not the observed behaviour.

Can you please clarify maybe i didn't understand it correctly
Anonymous March 09, 2022
I bought this thinking it was complete. Looks like it is still in the works. Is that right? Will there be a downloadable PDF and video courses for this? Thanks!
Marco Behler March 09, 2022
Hi there, it actually is complete, not in the works. There's a few additional topics that might be added in a future revision, but I wanted to wait for some feedback from users first. As for PDF/video, nope, nothing concrete planned. Do you prefer video?
Intelygenz January 28, 2022
If I accept the changes from the first-to-be-rebased commit then the second rebase step will not generate any merge conflicts as diff 2 candidate, i.e. diff between first-to-be-rebased commit and the newly cherry-picked first-to-be-rebased commit *, would be empty (the LICENSE.txt file in both are identical).

That being said, I didn't have to give different names for the commits using this strategy. Is there something that I'm missing?
Marco Behler January 28, 2022
Yup, correct < clap > :)

As for editing the commit message, you mean when there was a conflict or when there wasn't one?
Intelygenz January 28, 2022
Both
Anonymous January 18, 2022
Just curious where this commit came from?
# cat'ing the first commit's file-tree/file-hash
git cat-file -p fe066d3f7568e13ef031b495e35c94be91b6366c
Marco Behler January 18, 2022
Ha, the wording isn't great. That's not the hash of the first commit, but the hash of the README.txt file snapshotted during the first commit. And for that, you'll have to go through log -> commit -> tree -> files. I'll fix the wording, thanks for the feedback.

let mut author = ?

I'm @MarcoBehler and I share everything I know about making awesome software through my guides, screencasts, talks and courses.
