How Git Works from the Inside Out
- When you run git init, Git creates a hidden folder called .git. This is where everything lives: commits, branches, tags, logs, and so on.
- Contrary to a common belief, Git doesn't store your history as a series of diffs. Each commit records a snapshot of your tracked files.
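- When you add a file with git add, Git compresses the file's contents, stores them as a blob object in .git/objects/, and records the file in the index (the staging area).
- Git then records the folder structure in tree objects: which blob belongs to which filename, in which folder. Strictly speaking, these trees are written from the index at commit time, not by git add.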
- When you commit with git commit, Git creates a commit object that:
  - Points to the current tree (snapshot).
  - Stores the author, committer, message, and timestamps.
  - Links to the previous (parent) commit(s).
- The commit is now a snapshot of your repo at that moment, not just a change: it's a full version of what your files looked like.
- Every blob, tree, and commit is identified by a SHA-1 hash of its content: every object in Git is content-addressed.
- When you create a branch, Git just makes a simple pointer to a specific commit hash: a small file in .git/refs/heads/ (or, once refs have been packed, a line in .git/packed-refs).
- When you run git checkout on a branch, Git:
  - Reads the commit the branch points to.
  - Rebuilds the index and working directory to match the tree in that commit.
  - Points HEAD at that branch.
- When you make new commits on that branch, Git moves the branch pointer to the new commit; old commits are untouched.
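- git log walks backward from your current commit, following parent links: that's your full history.
- git merge creates a new commit with two parents, combining the histories of both branches. If your current branch has no commits the other branch lacks, Git simply fast-forwards the pointer instead, without creating a merge commit.
- git reset moves the current branch pointer to a different commit. Depending on the mode, it also resets the index (--mixed, the default) or the index and the working directory (--hard); --soft moves only the pointer.
- If you delete a branch, its commits aren't deleted right away. Git's garbage collector removes them only once nothing else (another branch, a tag, or a reflog entry) references them.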
- git stash saves your uncommitted changes (both staged and unstaged) as special commits referenced by refs/stash, then resets your working directory and index to the last commit. It works like a temporary, hidden branch.
- git rebase rewrites history by replaying your commits on top of a different parent. Existing commits are never modified; Git literally reconstructs that part of the graph with new commit objects.
- Pushing and pulling just send these objects and refs across the network between repositories (the .git folder on your machine and the one on the remote). A pull is a fetch followed by a merge or rebase.
- The .git folder is the real repo: your working directory is just a temporary view of one snapshot.
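The pointer-and-snapshot model described above can be sketched as a toy in Python. This is a simplified illustration under stated assumptions: the object layout, hashing repr() output instead of Git's real serialization, and the names objects, refs, commit, and log are all hypothetical, not Git's actual storage format.

```python
import hashlib

# Toy in-memory model of Git's commit graph (hypothetical layout, not Git's
# real object format): commits are immutable snapshots, branches are pointers.
objects = {}      # commit id -> commit object
refs = {}         # branch name -> commit id
HEAD = "main"     # HEAD names the currently checked-out branch


def commit(message: str, snapshot: dict) -> str:
    """Record a snapshot whose parent is the current branch tip, then move the branch."""
    parent = refs.get(HEAD)
    obj = {"message": message, "parents": [parent] if parent else [], "snapshot": dict(snapshot)}
    # Content addressing: the id is a hash of the commit's own contents.
    commit_id = hashlib.sha1(repr(sorted(obj.items())).encode()).hexdigest()
    objects[commit_id] = obj
    refs[HEAD] = commit_id   # only the pointer moves; older commits are untouched
    return commit_id


def log(start: str) -> list:
    """Walk first-parent links backward, like a simplified `git log --first-parent`."""
    history, current = [], start
    while current is not None:
        obj = objects[current]
        history.append(obj["message"])
        current = obj["parents"][0] if obj["parents"] else None
    return history


c1 = commit("Add README", {"README.md": "hello\n"})
c2 = commit("Edit README", {"README.md": "hello, world\n"})
refs["feature"] = c2        # creating a branch just writes another pointer
print(log(refs["main"]))    # ['Edit README', 'Add README']
```

Note that creating the feature branch copied nothing: it is one more entry pointing at an existing commit.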
Git Internals Deep Dive
The Object Database
Git’s internal storage system is remarkably simple but powerful:
- Blob objects: Store file contents (not filenames)
  - Each file's contents are stored as a zlib-compressed blob
  - Identical files (even in different directories) are stored only once
  - Identified by the SHA-1 hash of a short header (the word "blob", the size in bytes, and a NUL byte) followed by the content
- Tree objects: Store directory structures
  - List of pointers to blobs and subtrees
  - Each entry has a mode (file type and permissions, e.g. 100644 for a regular file), a filename, and the SHA-1 pointer of the blob or subtree
  - Represents a single directory/folder at a point in time
- Commit objects: Point to a specific tree
  - Contain metadata: author, committer, timestamps, message
  - Point to parent commit(s)
  - Multiple commits can point to the same tree (for example, an empty commit made with git commit --allow-empty reuses its parent's tree)
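- Tag objects: Named pointers to another object, usually a commit
  - Created only for annotated tags, which add metadata (tagger, date, message); lightweight tags are plain refs with no tag object
  - Signed tags also carry a cryptographic (e.g. GPG) signature that git tag -v can verify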
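To make the blob and tree layout concrete, here is a minimal Python sketch that computes object ids the way git hash-object does: a header with the type and size, a NUL byte, then the content, hashed with SHA-1 and stored zlib-compressed under .git/objects/. The helper names (object_bytes, hash_object, loose_object, tree_data) are hypothetical, and the tree serializer assumes the classic SHA-1 object format and ASCII filenames.

```python
import hashlib
import zlib


def object_bytes(obj_type: str, data: bytes) -> bytes:
    """Loose-object layout: '<type> <size>\\0' followed by the raw content."""
    return f"{obj_type} {len(data)}\0".encode() + data


def hash_object(obj_type: str, data: bytes) -> str:
    """SHA-1 of header + content: the id `git hash-object` reports."""
    return hashlib.sha1(object_bytes(obj_type, data)).hexdigest()


def loose_object(obj_type: str, data: bytes) -> tuple[str, bytes]:
    """Path under .git/ and the zlib-compressed bytes of a loose object."""
    sha = hash_object(obj_type, data)
    return f"objects/{sha[:2]}/{sha[2:]}", zlib.compress(object_bytes(obj_type, data))


def tree_data(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_sha) entries into a raw tree body.

    Entries are sorted by name, with directory names compared as if they
    ended in '/'. Raw trees write the directory mode as '40000'.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return name + "/" if mode == "40000" else name

    body = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(sha)  # 20 raw bytes, not hex
    return body


blob = hash_object("blob", b"test content\n")
print(blob)                                # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(hash_object("tree", tree_data([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
print(hash_object("tree", tree_data([("100644", "test.txt", blob)])))
```

Because the filename lives in the tree entry rather than in the blob, two identical files anywhere in the repository hash to the same blob id.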
Working Directory States
Git tracks files through different states:
- Untracked: Git doesn’t know about the file yet
- Tracked: Git is aware of this file (it is in the last commit or the index); every tracked file is in one or more of the states below
- Modified: Changed but not staged
- Staged: Marked to go into the next commit
- Committed: Safely stored in the repository
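These states can be derived by comparing three versions of each path: the last commit (HEAD), the index, and the working directory, which is roughly what git status reports. This is a simplified sketch: file_states is a hypothetical helper working on blob hashes, and real Git also uses cached file metadata to avoid rehashing unchanged files. It also shows that a file can be staged and modified at the same time.

```python
from typing import Optional


def file_states(head: Optional[str], index: Optional[str], worktree: Optional[str]) -> set[str]:
    """Classify one path from three blob hashes (None means the path is absent there).

    head:     blob hash in the last commit
    index:    blob hash in the staging area
    worktree: hash of the file currently on disk
    """
    if head is None and index is None:
        return {"untracked"} if worktree is not None else set()
    states = set()
    if index != head:
        states.add("staged")        # the index differs from the last commit
    if worktree != index:
        states.add("modified")      # the disk differs from what is staged
    return states or {"committed"}  # all three versions agree


print(file_states("a1", "b2", "c3"))  # staged, then edited again: both states
```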
References and Pointers
Git maintains several types of references:
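- Branches (refs/heads/*): Each points to the latest commit in a line of development
- HEAD: Points to the current branch (usually as a symbolic ref such as ref: refs/heads/main) or, in "detached HEAD" state, directly to a commit
- Remote-tracking branches (refs/remotes/*): Local records of where the branches on a remote repository pointed at your last fetch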
- Tags (refs/tags/*): Named pointers to specific commits
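On disk, HEAD is normally a one-line file holding a symbolic ref, each branch file holds a commit hash, and refs may also be consolidated into .git/packed-refs. The sketch below follows that chain. resolve_ref is a hypothetical helper that skips much of what real Git handles (alternative ref storage backends, worktrees, validation); in practice, git rev-parse HEAD does this job.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, ref: str = "HEAD") -> str:
    """Follow symbolic refs (e.g. HEAD -> refs/heads/main) down to a commit hash."""
    loose = git_dir / ref
    if loose.is_file():
        content = loose.read_text().strip()
        if content.startswith("ref: "):
            return resolve_ref(git_dir, content[len("ref: "):])
        return content                       # detached HEAD or a plain ref
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):  # header comment or peeled-tag line
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    raise KeyError(f"unknown ref: {ref}")


# Usage: build a fake .git directory in a temporary folder (placeholder hashes).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text("a" * 40 + "\n")
    (git_dir / "packed-refs").write_text(
        "# pack-refs with: peeled fully-peeled sorted\n" + "b" * 40 + " refs/tags/v1.0\n"
    )
    head_sha = resolve_ref(git_dir)
    tag_sha = resolve_ref(git_dir, "refs/tags/v1.0")

print(head_sha[:7], tag_sha[:7])  # aaaaaaa bbbbbbb
```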
The Index (Staging Area)
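The index is a binary file at .git/index that:
- Acts as a staging area between the working directory and the repository
- Contains a sorted list of paths, each with its mode, object name (blob SHA-1), stage number (nonzero only during merge conflicts), and cached file metadata such as size and modification time
- Serves as a “snapshot of the proposed next commit”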
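Because the index is binary, its layout is easiest to see by parsing it. This sketch reads only the fixed 12-byte header described in Git's index-format documentation (the signature DIRC, a version number, and an entry count, all big-endian); the per-entry records that follow are more involved and are not parsed here. read_index_header is a hypothetical name.

```python
import struct


def read_index_header(raw: bytes) -> tuple[int, int]:
    """Parse the 12-byte header at the start of .git/index.

    Layout (integers are big-endian):
      4 bytes  signature b"DIRC" ("directory cache")
      4 bytes  version number (2, 3, or 4)
      4 bytes  number of index entries
    """
    signature, version, count = struct.unpack(">4sII", raw[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, count


# Against a real repository (run from the repo root):
#   with open(".git/index", "rb") as f:
#       print(read_index_header(f.read(12)))
example = struct.pack(">4sII", b"DIRC", 2, 3)
print(read_index_header(example))  # (2, 3)
```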
Git’s Content-Addressable System
- Every object is identified by its SHA-1 hash
- This means Git can verify data integrity
- Finding objects is fast (looking up by hash)
- Allows for deduplication (same content = same hash)
- Makes distributed repositories practical: two repositories can tell which objects they share simply by comparing hashes
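These properties all follow from using the hash as the key. The toy store below is a sketch (ObjectStore is a hypothetical class that reuses Git's "type, size, NUL, content" hashing convention): storing the same bytes twice yields one entry, a lookup is a single dictionary access, and reading an object rehashes it to detect corruption, similar in spirit to the checks git fsck performs.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: each object's key is the hash of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}

    @staticmethod
    def _hash(kind: str, data: bytes) -> str:
        # Same "<type> <size>\0<content>" convention Git uses for object ids.
        return hashlib.sha1(f"{kind} {len(data)}\0".encode() + data).hexdigest()

    def put(self, kind: str, data: bytes) -> str:
        key = self._hash(kind, data)
        self._objects.setdefault(key, (kind, data))  # same content -> same key -> stored once
        return key

    def get(self, key: str) -> bytes:
        kind, data = self._objects[key]              # lookup by hash: one dict access
        if self._hash(kind, data) != key:            # integrity: rehash and compare
            raise ValueError(f"object {key} is corrupt")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put("blob", b"same bytes\n")
second = store.put("blob", b"same bytes\n")  # e.g. an identical file elsewhere in the tree
print(first == second, len(store))           # True 1
```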
Pack Files and Git’s Compression
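To save space, Git periodically packs objects, mainly when garbage collection (git gc) runs, which Git also triggers automatically from time to time, and when objects are sent over the network: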
- Similar objects are stored as deltas against each other
- Objects are compressed using zlib
- Pack files store many objects in a single file
- Pack index provides fast lookup into pack files
- This dramatically reduces storage requirements
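The delta idea can be illustrated with a toy encoder that describes a new version as "copy these bytes from the base" and "insert these new bytes" instructions, which is conceptually how pack deltas work. This is a sketch under stated assumptions: it uses Python's difflib to find matching runs and a made-up tuple instruction format, not Git's binary delta encoding or its heuristics for choosing which objects to delta against.

```python
import zlib
from difflib import SequenceMatcher

# Hypothetical instruction format (NOT Git's binary delta format):
#   ("copy", offset, length)  -> copy bytes from the base object
#   ("insert", data)          -> append literal bytes


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": bytes that exist only in the base are simply not copied
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n" * 20
v2 = v1.replace(b"line two\n", b"line 2\n", 1)   # a small edit to a large file
delta = make_delta(v1, v2)
assert apply_delta(v1, delta) == v2
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(v2), literal, len(zlib.compress(v1)))  # full size, new bytes in delta, compressed base
```

Storing v2 as a delta means keeping only a few instructions and a handful of literal bytes instead of a second full copy.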
The Reflog
Git maintains a log of where your HEAD and branch references have been:
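- .git/logs/HEAD records every change to HEAD
- .git/logs/refs/heads/* tracks each branch's movements
- The reflog helps you recover from mistakes (even after git reset --hard, as long as the work was committed)
- Entries expire after 90 days by default (30 days for entries whose commits are no longer reachable); both limits are configurable via gc.reflogExpire and gc.reflogExpireUnreachable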
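Each reflog file is plain text with one line per movement. The parser below is a sketch assuming the standard line layout (old hash, new hash, identity, Unix timestamp, time zone, then a tab and a message); parse_reflog, ReflogEntry, and the sample data are hypothetical. The old hash on a reset entry is exactly what you need to undo that reset. In everyday use, git reflog shows the same information.

```python
import re
from dataclasses import dataclass

# One line of .git/logs/HEAD:
#   <old-sha> <new-sha> <name> <<email>> <unix-time> <tz>\t<message>
REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<who>.*?) <(?P<email>[^>]*)> (?P<time>\d+) (?P<tz>[+-]\d{4})\t(?P<message>.*)$"
)


@dataclass
class ReflogEntry:
    old: str
    new: str
    who: str
    email: str
    time: int
    tz: str
    message: str


def parse_reflog(text: str) -> list[ReflogEntry]:
    entries = []
    for line in text.splitlines():
        match = REFLOG_LINE.match(line)
        if match:
            fields = match.groupdict()
            fields["time"] = int(fields["time"])
            entries.append(ReflogEntry(**fields))
    return entries


A, B, ZERO = "a" * 40, "b" * 40, "0" * 40   # placeholder hashes for the example
sample = "\n".join([
    f"{ZERO} {A} Ada <ada@example.com> 1700000000 +0000\tcommit (initial): Add README",
    f"{A} {B} Ada <ada@example.com> 1700000100 +0000\tcommit: Edit README",
    f"{B} {A} Ada <ada@example.com> 1700000200 +0000\treset: moving to HEAD~1",
])
entries = parse_reflog(sample)
lost = entries[-1].old   # the commit the reset moved away from
print(lost == B)         # True; `git reset --hard <lost>` would bring it back
```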
Git Hooks
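Git can run scripts called hooks (stored in .git/hooks/) at key events:
- Pre-commit: Runs before Git asks for the commit message; a non-zero exit aborts the commit
- Post-commit: Runs after a commit is created
- Pre-push: Runs before pushing to a remote; a non-zero exit aborts the push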
- And many others
These can be used for linting, testing, enforcing commit message standards, and more.
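Hooks are ordinary executable files, so they can be written in any language. As a sketch, here is a Python commit-msg hook, another standard hook: Git passes it the path of the file holding the proposed message and aborts the commit on a non-zero exit. The rules and the names problems and MAX_SUBJECT are a hypothetical example policy, not anything Git enforces.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook: save as .git/hooks/commit-msg and make it executable."""
import sys

MAX_SUBJECT = 72  # hypothetical team convention


def problems(message: str) -> list[str]:
    # Skip comment lines, in case the editor template's comments are still present.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    found = []
    if not lines or not lines[0].strip():
        found.append("subject line is empty")
    elif len(lines[0]) > MAX_SUBJECT:
        found.append(f"subject line exceeds {MAX_SUBJECT} characters")
    if len(lines) > 1 and lines[1].strip():
        found.append("second line should be blank")
    return found


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        errors = problems(f.read())
    for error in errors:
        print(f"commit-msg: {error}", file=sys.stderr)
    return 1 if errors else 0   # non-zero exit makes Git abort the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```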
Merging and Conflict Resolution
When Git merges branches:
- It finds the common ancestor (merge base)
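- It performs a three-way merge: it compares each branch's version against the base and combines the changes from both
- If the same part was edited differently on each side, it marks a conflict
- Conflict markers (<<<<<<<, =======, >>>>>>>) are inserted around the conflicting section
- You resolve the conflict by editing the file, staging it with git add, and creating the merge commit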
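The core decision rule of a three-way merge is small enough to sketch. Given the merge base and both sides' versions of one region, a change made on only one side wins automatically, and a conflict is reported only when both sides changed the region differently. This assumes the hard part, aligning the three files into matching regions (which Git's diff machinery does), has already been done; merge_hunk and the branch labels are hypothetical.

```python
CONFLICT_START, CONFLICT_SEP, CONFLICT_END = "<<<<<<<", "=======", ">>>>>>>"


def merge_hunk(base: list[str], ours: list[str], theirs: list[str],
               our_label: str = "HEAD", their_label: str = "feature") -> tuple[list[str], bool]:
    """Three-way merge of one aligned region; returns (merged lines, had_conflict)."""
    if ours == theirs:       # both sides made the same change (or none)
        return ours, False
    if ours == base:         # only their side changed this region
        return theirs, False
    if theirs == base:       # only our side changed it
        return ours, False
    conflict = ([f"{CONFLICT_START} {our_label}"] + ours + [CONFLICT_SEP]
                + theirs + [f"{CONFLICT_END} {their_label}"])
    return conflict, True


merged, had_conflict = merge_hunk(["x = 1"], ["x = 2"], ["x = 3"])
print("\n".join(merged))
```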
Rebasing Under the Hood
When you rebase:
- Git finds the common ancestor of current branch and target branch
- Generates a diff for each commit on your branch
- Applies each diff one by one on top of the target branch
- Creates new commits with the same messages and changes but different parents (and therefore different hashes)
- Moves your branch pointer to the new chain of commits
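Those steps can be modeled with a toy sketch. The Commit class, the diff and rebase helpers, and the id scheme are hypothetical simplifications with no conflict handling; the point is to show why rebased commits get new identities: each replayed commit keeps its message and its changes, but because its parent differs, its id differs too.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]
    files: tuple          # sorted (path, content) pairs: the snapshot
    message: str

    @property
    def id(self) -> str:
        # Toy id that depends on parent, snapshot, and message, as Git's does.
        return hashlib.sha1(repr((self.parent, self.files, self.message)).encode()).hexdigest()


def diff(old: dict, new: dict) -> dict:
    """Paths whose content changed (None marks a deletion)."""
    return {p: new.get(p) for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def rebase(branch: list[Commit], base_snapshot: dict, onto: Commit) -> list[Commit]:
    """Replay each commit's diff on top of `onto`, creating new commit objects."""
    replayed, tip, previous = [], onto, base_snapshot  # previous = snapshot at the merge base
    for c in branch:
        current = dict(c.files)
        files = dict(tip.files)
        for path, content in diff(previous, current).items():  # apply this commit's changes
            if content is None:
                files.pop(path, None)
            else:
                files[path] = content
        tip = Commit(tip.id, tuple(sorted(files.items())), c.message)
        replayed.append(tip)
        previous = current
    return replayed


base = Commit(None, (("a.txt", "1"),), "base")
main_tip = Commit(base.id, (("a.txt", "1"), ("m.txt", "main")), "main work")
f1 = Commit(base.id, (("a.txt", "2"),), "feature 1")
f2 = Commit(f1.id, (("a.txt", "2"), ("f.txt", "new")), "feature 2")
rebased = rebase([f1, f2], dict(base.files), main_tip)
print([c.message for c in rebased])                   # ['feature 1', 'feature 2']
print(rebased[0].parent == main_tip.id, rebased[0].id != f1.id)  # True True
```

The original f1 and f2 still exist unchanged; only the branch pointer would move to the last replayed commit.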
Transport Protocols
Git supports several transport protocols:
- Local: Direct access to repository on disk
- HTTP/HTTPS: Efficient, firewall-friendly protocol
- SSH: Secure authenticated access
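- Git protocol (git://): Often the fastest, but has no authentication or encryption
During transfers, the two sides first work out which commits the other already has, and Git sends only the missing objects (packed into a pack file), which keeps most fetches and pushes small.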
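The "send only what's missing" idea reduces to a set difference over the commit graph. In this sketch the graph is a plain dictionary and the other side's known commits are given up front; real Git discovers them through a have/want negotiation and then also sends the trees and blobs those commits need. The function names are hypothetical.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], tips: list[str]) -> set[str]:
    """All commits reachable from `tips` by following parent links."""
    seen, queue = set(), deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen or commit not in graph:  # skip visited or unknown commits
            continue
        seen.add(commit)
        queue.extend(graph[commit])
    return seen


def commits_to_send(graph: dict[str, list[str]], local_tip: str, remote_has: list[str]) -> set[str]:
    """Commits reachable locally but not from anything the other side already has."""
    return reachable(graph, [local_tip]) - reachable(graph, remote_has)


# Parent links: commit -> parents (a straight line of history A..E).
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
print(sorted(commits_to_send(history, "E", ["C"])))  # ['D', 'E']
```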