Git Internals, Branching, and Diff – A Practical Deep Dive


If you are comfortable using git add, git commit, and git push but want to understand what really happens behind the scenes, this guide is for you.

We will go from “What is Git?” all the way to Git’s internal object model, branches as pointers, and how git diff actually compares changes.


1. What Is Git?

Git is a Distributed Version Control System (DVCS). That means every developer has:

  • A full copy of the repository (the .git directory)

  • Complete history including all branches, commits, and tags

  • The ability to work offline, commit, branch, merge, and later push to a remote

Unlike older centralized systems like CVS or SVN, Git does not depend on a single central repository for day-to-day work. The “server” is mostly used as a synchronization point for collaboration.

Centralized vs Distributed

| Feature | Centralized VCS (CVS, SVN) | Git (DVCS) |
| --- | --- | --- |
| Server dependency | Mandatory | Not required for local work |
| Offline work | Generally no | Yes |
| Performance | Slower (network bound) | Faster (most operations are local) |
| History stored | On server | Full history stored locally |
| Branching model | Limited or heavy | Lightweight branching and rebasing |

Why Git Changed the Game

Git is powerful because of how it stores and tracks data:

  • Uses snapshots instead of just line-by-line diffs

  • Keeps immutable commit history

  • Performs operations locally which makes them fast

  • Supports advanced branching and merging strategies

  • Uses SHA-1 hashes to identify content and ensure integrity

  • Stores data efficiently with content deduplication


2. Core Git Terminology (With Internals)

| Term | Purpose | Behind the Scenes |
| --- | --- | --- |
| git init | Creates a new repo | Makes a .git/ directory with metadata and references |
| Working directory | Where you edit files | Files can be tracked or untracked |
| Staging area | Prepares content for commit | Stored in .git/index as metadata about staged files |
| Commit | Saves a snapshot of project state | New commit object with a SHA-1 that points to a tree and its parent(s) |
| Push | Sends commits to remote | Updates refs on the remote such as refs/heads/main |
| HEAD | Current branch pointer | .git/HEAD normally holds a symbolic ref to a branch (or a raw commit hash when detached) |
| Refs | Names for commits (branches, tags) | Files in .git/refs/ and entries in .git/packed-refs |

3. Git’s Core Object Model

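Every file version, directory listing, commit, and annotated tag that Git stores lives under .git/objects as one of four object types.

| Object Type | Purpose |
| --- | --- |
| Blob | File content (no filename, just data) |
| Tree | Directory structure: filenames, modes, and hashes |
| Commit | Snapshot reference plus metadata and parent links |
| Tag | Annotated tag: a named pointer to a commit or other object, with tagger and message (lightweight tags are plain refs) |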

Blob Objects

  • Represent raw file content

  • Compressed and stored by hash (content addressable)

  • Identified by a SHA-1 hash

echo "Hello World" > file.txt
git hash-object -w file.txt

With -w, Git writes the blob to .git/objects/<first2>/<restOfHash>. Without it, git hash-object only prints the hash and stores nothing.
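To see the blob land on disk, here is a minimal sketch that runs in a throwaway directory (the mktemp location and variable names are assumptions for illustration). It uses the -w flag, which asks git hash-object to write the object rather than only print its hash, and then reads the object back with git cat-file.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # throwaway repository

echo "Hello World" > file.txt
blob=$(git hash-object -w file.txt)      # -w stores the blob in .git/objects
echo "$blob"

# The object lives at .git/objects/<first 2 hex chars>/<remaining chars>
dir=$(printf '%s' "$blob" | cut -c1-2)
rest=$(printf '%s' "$blob" | cut -c3-)
ls ".git/objects/$dir/$rest"

git cat-file -t "$blob"                  # object type
git cat-file -p "$blob"                  # original content, decompressed
```

The file on disk is zlib-compressed, which is why it is read back through git cat-file rather than cat.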


Tree Objects

Trees describe directories and their contents. They map:

  • Filenames to blob or tree hashes

  • File modes and types (file, directory, executable)

You can inspect a tree with:

git ls-tree HEAD

This prints the tree for the current commit.
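A small sketch of what that looks like with one file and one subdirectory. The repository location, file names, and the "Demo User" identity are made up for the example.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"

mkdir src
echo "print('hi')" > src/app.py
echo "# Demo" > README.md
git add . && git commit -q -m "Add README and src"

git ls-tree HEAD       # root tree: a blob entry (README.md) and a tree entry (src)
git ls-tree -r HEAD    # -r recurses and lists every blob with its full path
```

Each output line has the form mode, type, hash, name. Regular files show mode 100644 and subdirectories show mode 040000 with type tree.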


Commit Objects

A commit ties everything together:

  • Points to a tree (the root directory for that snapshot)

  • Stores author, committer, date, and message

  • Points to zero or more parent commits (none for the first commit, two or more for a merge)

git cat-file -p HEAD

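Example output (illustrative; a true first commit would have no parent line):

tree 48ab23...
parent a1c9e7...
author Aditya <adi@devops.com> 1712674914 +0530
committer Aditya <adi@devops.com> 1712674914 +0530

Initial commit

There is no separate date line. The date is stored as a Unix timestamp plus timezone on the author and committer lines, and git log renders it as Tue Apr 9 20:31:54 2024 +0530.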
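The links a commit carries can also be read individually. The following sketch, run in a scratch repository with a hypothetical identity, resolves the tree and parent of HEAD with git rev-parse so they can be compared with the raw git cat-file output.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"             # hypothetical identity
git config user.email "demo@example.com"
echo "v1" > notes.txt && git add notes.txt && git commit -q -m "First"
echo "v2" > notes.txt && git commit -q -am "Second"

git cat-file -p HEAD                         # tree, parent, author, committer, message
tree=$(git rev-parse 'HEAD^{tree}')          # the tree this commit points to
parent=$(git rev-parse 'HEAD^')              # its first (and here only) parent
echo "tree:   $tree"
echo "parent: $parent"
git cat-file -p "$parent"                    # the first commit: no parent line at all
```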

4. The Git Commit DAG

Git history is not just a straight line. It is a Directed Acyclic Graph (DAG) of commits.

A --- B --- C --- D  (main)
       \
        E --- F      (feature)

  • Each commit points to its parent or parents

  • Merge commits have multiple parents

  • “Directed acyclic” means every edge has a direction (each commit points back to its parent) and following those edges never leads back to a commit already visited

  • Git resolves branches and history using these parent relationships

This structure allows branching and merging without copying entire directories.
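The graph above can be rebuilt in a scratch repository. This is a sketch under a few assumptions: make_commit is a hypothetical helper, the identity is made up, and git symbolic-ref is used only to force the first branch to be called main regardless of the local default.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main        # first branch is named main
git config user.name "Demo User"; git config user.email "demo@example.com"

# Hypothetical helper: one new file and one commit per letter
make_commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

make_commit A; make_commit B
git checkout -q -b feature                   # feature forks at B
make_commit E; make_commit F
git checkout -q main
make_commit C; make_commit D

base=$(git merge-base main feature)          # the fork point, found via parent links
git merge -q --no-edit feature               # merge commit with two parents
git log --graph --oneline --all              # draws the DAG
parents=$(git rev-list --parents -n 1 HEAD)  # "<merge> <parent1> <parent2>"
echo "$parents"
```

Nothing was copied to create the feature branch; the fork point and the merge are both derived from parent pointers alone.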


5. SHA-1 Hashing in Git

Git uses SHA-1 hashes like 9fceb02c... to uniquely identify:

  • Blobs

  • Trees

  • Commits

  • Tags

Properties:

  • Even a one-character change creates a completely different hash

  • The hash is based on the content, so Git is content addressable

  • Ensures data integrity and enables deduplication
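These properties can be checked by hand. Git does not hash the bare file content: for a blob it hashes a short header, "blob <size in bytes>" followed by a NUL byte, and then the content. The sketch below assumes a repository using the default SHA-1 object format and a sha1sum binary (GNU coreutils; on macOS, shasum would be the usual substitute).

```shell
set -e
cd "$(mktemp -d)" && git init -q

# "Hello World\n" is 12 bytes, so the header is "blob 12" plus a NUL byte.
manual=$(printf 'blob 12\0Hello World\n' | sha1sum | cut -d' ' -f1)
viagit=$(echo "Hello World" | git hash-object --stdin)
echo "$manual"; echo "$viagit"               # the two IDs match

# Identical content gives an identical ID whatever the filename (deduplication).
echo "Hello World" > a.txt; echo "Hello World" > b.txt
same_a=$(git hash-object a.txt); same_b=$(git hash-object b.txt)

# A one-character change gives an unrelated ID.
changed=$(echo "Hello World!" | git hash-object --stdin)
echo "$changed"
```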


6. What Actually Happens During Common Git Commands

git add file.txt

Behind the scenes:

  1. Git hashes the file content.

  2. Creates a blob object if that content does not already exist.

  3. Updates the .git/index (staging area) to reference this blob.
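The three steps can be observed directly. In this sketch (scratch repository, illustrative variable names) the object count is taken before and after git add, and git ls-files --stage prints the index entry that now references the blob.

```shell
set -e
cd "$(mktemp -d)" && git init -q
echo "Hello Git" > file.txt

before=$(find .git/objects -type f | wc -l)   # no objects yet
git add file.txt
after=$(find .git/objects -type f | wc -l)    # one new blob, written before any commit

git ls-files --stage                          # index entry: mode, blob hash, stage number, path
staged=$(git ls-files --stage file.txt | cut -d' ' -f2)
expected=$(git hash-object file.txt)          # hash of the working-tree content
echo "$staged"; echo "$expected"
```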

git commit -m "msg"

Git will:

  1. Write a tree object that represents the directory structure and staged blobs.

  2. Create a commit object that points to that tree and the previous commit.

  3. Move the current branch ref to this new commit. HEAD follows because it points at that branch.
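The same three steps can be performed with Git's low-level plumbing commands. This is a sketch of the idea rather than a replacement for git commit, which also runs hooks and handles many edge cases. The identity and file names are hypothetical.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "Hello Git" > hello.txt
git add hello.txt

tree=$(git write-tree)                           # 1. tree object built from the index
commit=$(echo "msg" | git commit-tree "$tree")   # 2. commit object (no -p: this is the first commit)
git update-ref HEAD "$commit"                    # 3. move the branch that HEAD points to

git log --oneline
git status --short                               # prints nothing: index and HEAD agree
```

For a later commit, step 2 would pass the previous commit with -p so that the new commit records its parent.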

git push

  • Sends new commits and updated refs to the remote repository, typically over HTTPS or SSH.

  • Remote refs like refs/heads/main are updated to point to the latest commit.
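A push can be tried without any server by letting a local bare repository stand in for the remote. This is only a sketch: the paths are hypothetical, and a real remote would normally be reached over the network rather than through a filesystem path.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"            # stands in for the server
git init -q "$work/local" && cd "$work/local"
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "Hello Git" > hello.txt
git add hello.txt && git commit -q -m "Initial commit"

git remote add origin "$work/remote.git"
git push -q origin HEAD:refs/heads/main          # send objects, then update the remote ref

local_tip=$(git rev-parse HEAD)
remote_tip=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)
echo "$local_tip"; echo "$remote_tip"            # same commit on both sides
```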


7. Inside the .git Directory

A typical .git directory looks like this:

.git/
├── HEAD          # Pointer to the current branch
├── config        # Repo configuration
├── index         # Staging area (binary)
├── objects/      # All Git objects (blobs, trees, commits, tags)
└── refs/         # Branch and tag references
    ├── heads/    # Local branches
    └── tags/     # Tags
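A quick way to confirm this layout is to look inside a freshly created repository. In the sketch below, git symbolic-ref is used to point HEAD at refs/heads/main explicitly, so the result does not depend on the local default branch name.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main     # make HEAD point at refs/heads/main
cat .git/HEAD                             # ref: refs/heads/main
ls .git                                   # HEAD, config, objects, refs, ...
ls .git/refs                              # heads and tags
head_ref=$(git symbolic-ref HEAD)         # same information through Git itself
echo "$head_ref"
```

The index file is missing from a brand-new repository; it appears once something has been staged.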

8. Explore Git Internals Yourself

Try this small lab in any folder:

# Create repo and commit
git init
echo "Hello Git" > hello.txt
git add hello.txt
git commit -m "Initial commit"

# Inspect internals
ls .git/objects
git cat-file -p HEAD             # the commit object
git cat-file -p 'HEAD^{tree}'    # the root tree that commit points to
git cat-file -p HEAD:hello.txt   # the blob behind hello.txt

This makes the theory very concrete.


9. Git Branch Management (Concepts + Internals)

What Is a Git Branch?

A branch is a lightweight movable pointer to a commit. It is not a separate copy of all files. Think of it like a bookmark in the commit DAG.

  • You can move the branch by committing new changes.

  • You can create new branches without duplicating data.

How Branches Work Under the Hood

  • HEAD points to your current branch.

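  • Each branch is a ref named refs/heads/<branch-name> that holds one SHA-1 commit hash. It is normally a small file under .git/refs/heads/, or a line in .git/packed-refs once refs have been packed.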

  • Creating a branch is just writing a new file with a hash.

Example:

git branch feature-x
cat .git/refs/heads/feature-x

You will see a commit hash like:

3adf23d5d61e2e6aee43b00e93e93e1e9e32a012

When you make new commits on feature-x, Git updates this file with the new commit hash.
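The pointer behaviour is easy to watch. This sketch assumes the default file-based ref storage with the branch still stored as a loose file (not yet packed); the identity and file names are illustrative.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
echo one > f.txt && git add f.txt && git commit -q -m "one"

git branch feature-x                          # new ref pointing at the current commit
start=$(git rev-parse feature-x)
cat .git/refs/heads/feature-x                 # the ref file holds that hash

git checkout -q feature-x
echo two >> f.txt && git commit -q -am "two"  # committing moves feature-x forward
moved=$(git rev-parse feature-x)
cat .git/refs/heads/feature-x                 # now the new commit's hash
git log --oneline --decorate --all
```

git rev-parse <branch> is the more robust way to read a branch tip, since it works whether the ref is loose or packed.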


Essential Branch Commands with Practical Meaning

| Task | Command | Scenario |
| --- | --- | --- |
| Create branch | git branch feature-x | Start a new login feature |
| Switch to branch | git switch feature-x or git checkout feature-x | Begin working on that feature |
| Create and switch | git checkout -b feature-x | One shot: create then move to new branch |
| List local branches | git branch | See which branches you have locally |
| Delete merged branch | git branch -d feature-x | Cleanup after merging |
| Force delete branch | git branch -D feature-x | Delete even if not merged |
| Show remote branches | git branch -r | See branches on origin / remote |
| Show all (local + remote) | git branch -a | Full list |

Small Branching Assignment

git init -b main   # name the first branch main (needs Git 2.28+; older versions default to master)
echo "Hello World" > hello.txt
git add . && git commit -m "Initial commit"

# Create and use a feature branch
git branch feature-login
git checkout feature-login
echo "Login Page" > login.html
git add . && git commit -m "Add login page"

Now go back to main:

git checkout main
ls

You will notice login.html is not there on main. That is branch isolation in action.


Branch Naming Conventions for Teams

Use meaningful, consistent names:

| Branch Type | Example | Purpose |
| --- | --- | --- |
| Mainline | main or master | Production ready code |
| Integration | develop | Integration and testing |
| Feature | feature/payment-integration | New features |
| Bugfix | bugfix/login-crash | Non critical bug fixes |
| Hotfix | hotfix/critical-vuln | Urgent fixes for production |
| Release | release/v1.2.0 | Stabilization before release |

1. Git Flow (Vincent Driessen Model)

Good for large, release based projects.

Branches:

  • main for production

  • develop for integration

  • feature/*, release/*, hotfix/*

Example flow:

# Start from develop
git checkout develop
git checkout -b feature/cart-page

# After work
git checkout develop
git merge feature/cart-page

# Prepare a release
git checkout -b release/v1.0

# After testing
git checkout main
git merge release/v1.0
git tag v1.0

# Sync back to develop
git checkout develop
git merge release/v1.0

2. GitHub Flow

Ideal for continuous deployment and simpler workflows:

  • Only main as the long lived branch

  • Short lived feature branches and Pull Requests

Example:

git checkout -b feature/signup-form main
# Work, commit, push, open PR, review, merge

3. Trunk Based Development

Used by many large scale teams:

  • Everyone commits frequently to main (or trunk)

  • Uses feature flags to hide incomplete features

  • Requires strong CI and automated testing


Best Practices for Branch Management

  • Use descriptive names like feature/signup-form

  • Delete stale branches regularly

  • Merge or rebase frequently so branches do not drift too far

  • Prefer rebase locally, merge on remote (via PR) for cleaner history

  • Use git fetch --prune to clean up stale remote tracking branches

  • Use PR templates for consistent reviews

Common Branching Issues and Fixes

| Issue | Possible Fix |
| --- | --- |
| Branch started from wrong base | git rebase --onto correct-base wrong-base |
| Extra unwanted commits in a branch | git rebase -i to drop them, or git reset / git cherry-pick |
| Merge conflict | Resolve manually, then git add and git commit |
| Old remote refs after deletion on server | git fetch --prune |
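As an illustration of the first row, the sketch below moves a branch that was accidentally started from another branch back onto main using git rebase --onto, which replays only the commits made after the wrong base. Branch names, the make_commit helper, and the identity are all hypothetical.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"; git config user.email "demo@example.com"

# Hypothetical helper: one new file and one commit per name
make_commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

make_commit base
git checkout -q -b wrong-base && make_commit unrelated      # someone else's work
git checkout -q -b feature-x  && make_commit feature-work   # branched from the wrong place

# Replay only the commits after wrong-base onto main
git rebase -q --onto main wrong-base feature-x
git log --oneline --graph --all
ls                                                          # unrelated.txt is gone from feature-x
```

Because a rebase rewrites commits, this is best done before the branch has been shared with others.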

Practice Ideas

  1. Simulate a full Git Flow cycle from develop to main.

  2. Intentionally create a merge conflict and resolve it.

  3. Create and delete a few dummy branches both locally and on remote.

  4. Create a PR in GitHub or GitLab from a feature branch.


10. git diff in Depth

What Is git diff?

git diff shows line by line changes between two versions of your project. It can compare:

  • Working directory vs staging area

  • Staging area vs last commit

  • Any two commits

  • Any two branches

Common git diff Commands

| Command | Compares | Use Case |
| --- | --- | --- |
| git diff | Working directory vs staging | See unstaged changes |
| git diff --cached | Staging vs last commit | See what will be committed |
| git diff HEAD | Working directory vs last commit | All uncommitted changes |
| git diff A B | Commit A vs Commit B | Compare two points in history |
| git diff branch1..branch2 | branch1 vs branch2 | See what changed between branches |
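The first three rows are easiest to tell apart when one file has a staged change and an unstaged change at the same time. The following sketch sets that up in a scratch repository (file contents and identity are illustrative).

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
printf 'one\n' > file.txt && git add file.txt && git commit -q -m "one"

printf 'one\ntwo\n' > file.txt && git add file.txt   # staged change: adds "two"
printf 'one\ntwo\nthree\n' > file.txt                # unstaged change: adds "three"

git diff            # working directory vs staging: only "+three"
git diff --cached   # staging vs last commit: only "+two"
git diff HEAD       # working directory vs last commit: "+two" and "+three"
```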

Practical Examples

# Before staging
git diff

# After staging
git add file.txt
git diff --cached

# Between two commits
git diff 4e9f5d1 29c3dd2

# Between branches
git diff develop..feature-xyz

Understanding git diff Output

Example snippet:

diff --git a/index.html b/index.html
index 83db48f..bf2692e 100644
--- a/index.html
+++ b/index.html
@@ -5,7 +5,7 @@
 <title>Homepage</title>
-<h1>Hello World</h1>
+<h1>Welcome to Git Course</h1>

Key parts:

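  • --- and +++ name the file before (a/) and after (b/) the change.

  • @@ -5,7 +5,7 @@ is the hunk header: the hunk starts at line 5 and covers 7 lines in both the old and the new file (the snippet above shows only part of it).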

  • Lines starting with - are removed.

  • Lines starting with + are added.
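To produce output of this shape yourself, the sketch below commits a small hypothetical index.html, changes the heading, and runs git diff twice. The file content is invented for the example, so the line numbers in its hunk header differ from the snippet above.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
cat > index.html <<'EOF'
<html>
<head>
<title>Homepage</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
EOF
git add index.html && git commit -q -m "Add homepage"

# Change the heading on line 6
sed 's/Hello World/Welcome to Git Course/' index.html > index.new && mv index.new index.html

git diff        # the changed line plus up to three unchanged context lines on each side
git diff -U0    # the same change with zero context lines
```

Unchanged context lines start with a single space, which is how they are told apart from - and + lines.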

Advanced Diff Usage

  • Changes made by the last 3 commits (HEAD~3 vs HEAD):

      git diff HEAD~3 HEAD
    
  • Staged vs last commit:

      git diff --cached
    
  • Word by word diff:

      git diff --word-diff
    
  • Summary statistics:

      git diff --stat
    

Tips:

  • Use git difftool to launch graphical comparison tools.

  • Example alias:

      git config --global alias.diffs "diff --stat --cached"
    

11. Git in One Line: Snapshots + DAG + Content Hashing

Git is essentially:

| Core Concept | Benefit |
| --- | --- |
| Snapshot based | Each commit stores the full project state |
| DAG structure | Powerful branching, merging, and rebasing |
| SHA-1 hashing | Strong data integrity and automatic deduplication |
| Local repository | Fast operations and full offline capability |
| Immutable commits | Clean audit trail and trustworthy history |

12. Final Takeaways

  • Git is not just “a VCS”. It is a snapshot engine with a content addressed filesystem.

  • Every commit records a complete tree snapshot plus metadata.

  • Blobs store content, trees organize it, commits record history, refs name points in that history.

  • Git’s internals (SHA-1 hashes, DAG of commits, and local objects) are what make it fast, robust, and scalable for teams of any size.