How Git Works From Inside Out

When you run git init, Git creates a hidden folder called .git - this is where everything lives: commits, branches, tags, logs, etc.
Git doesn’t store a list of changes per file - each commit records a snapshot of all your files, not a diff (unlike what most people think).
When you add a file with git add, Git compresses the file’s contents and stores it as a blob object in .git/objects/.
Git also records the file’s path and blob hash in the index (.git/index). The folder structure is captured later, at commit time, as tree objects - each one records which blob belongs to which filename in one folder.
When you commit with git commit, Git creates a commit object:
- Points to the current tree (snapshot).
- Stores the author, message, timestamp.
- Links to the previous commit(s).
The commit is now a snapshot of your repo at that moment, not just a change - it’s a full version of what files looked like.
All of these blobs, trees, and commits are stored under the SHA-1 hash of their contents (newer Git versions can optionally use SHA-256) - every object in Git is content-addressed.
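
To make "content-addressed" concrete, here is a minimal Python sketch of how a blob’s ID can be computed. The function name git_blob_id is made up for this example; the header format ("blob", the size, then a NUL byte) is what Git hashes in front of the file’s bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same bytes always give the same ID, whatever the filename is.
print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

If you have Git installed, running git hash-object on a file with the same bytes should print the same ID.
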
When you create a branch, Git just makes a simple pointer (a file in .git/refs/heads/) to a specific commit hash.
When you run git checkout, Git:
- Reads the commit pointed to by the branch.
- Rebuilds the index and the working directory to match the tree in that commit.
- Points HEAD at that branch, so new commits land on it.
When you make new commits on that branch, Git updates the branch pointer to the new commit - old commits are untouched.
git log walks backward from your current commit following parent links - that’s the full history reachable from where you are.
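git merge normally creates a new commit with two parents - combining the histories of both branches. If your branch has no new commits of its own, Git just moves the pointer forward instead (a fast-forward).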
If you run git reset, you’re just moving the branch pointer to a different commit (and optionally resetting the index and your working directory to match it).
If you delete a branch, the commits aren’t deleted right away - Git’s garbage collector only removes them once nothing else (another branch, a tag, or an unexpired reflog entry) references them.
git stash saves your uncommitted changes (working directory and index) as hidden commits referenced by refs/stash and resets your directory back to the last commit - it’s like a temporary branch.
git rebase rewrites commit history by replaying your commits onto a new parent - because a commit’s hash covers its parent, this means building new commit objects, while the old ones stay untouched.
Pushing and fetching just transfer these objects and update refs between .git folders (yours and the remote’s); git pull is a fetch followed by a merge or rebase.
The .git folder is the real repo - your working directory is just a temporary view of one snapshot.

NOTE: The content below is additional technical knowledge and not necessary for basic understanding. Feel free to stop here if you're looking for just the essential process.

Git Internals Deep Dive

The Object Database

Git’s internal storage system is remarkably simple but powerful:

Blob objects: Store file contents (not filenames)
- Each version of a file’s content is stored as a zlib-compressed blob
- Identical files (even in different directories) are stored only once
- Identified by the SHA-1 hash of their content, prefixed with a short header giving the object type and size
Tree objects: Store directory structures
- List of pointers to blobs and subtrees
- Each entry has a mode (file permissions), type, filename, and SHA-1 pointer
- Represents a single directory/folder at a point in time
Commit objects: Point to a single top-level tree
- Contains metadata: author, committer, timestamp, message
- Points to parent commit(s)
- Multiple commits can point to the same tree (for example, an empty commit reuses its parent’s tree)
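Tag objects: Created by annotated tags; point to another object (usually a commit)
- Store the tag name, tagger, date, and message
- Lightweight tags are not objects at all, just refs in refs/tags/
- Signed tags also carry a cryptographic signature for verification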
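
The object types above can be sketched in a few lines of Python. This is a simplified illustration, not Git’s implementation: the helper names (object_id, tree_body, commit_body), the author string, and the fixed +0000 timezone are assumptions for the example, and the entry sorting ignores Git’s special rule for subdirectories.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>' + NUL + body, the way Git names every object."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """entries: list of (mode, name, object_id_hex) for ONE directory."""
    # Simplification: plain sort by name. Real Git sorts subtrees as if
    # their name ended with "/".
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        # Each entry is "<mode> <name>", a NUL byte, then the raw 20-byte hash.
        out += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    return out


def commit_body(tree, parents, who, timestamp, message) -> bytes:
    """Build the text of a commit object (same person as author and committer)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {who} {timestamp} +0000")
    lines.append(f"committer {who} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


who = "A U Thor <author@example.com>"  # made-up identity
blob = object_id("blob", b"hello world\n")
tree = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
first = object_id("commit", commit_body(tree, [], who, 0, "first"))
# An "empty" commit: new metadata and parent, but the very same tree.
second = object_id("commit", commit_body(tree, [first], who, 60, "empty"))
print(tree, first, second, sep="\n")
```

Notice that the filename lives in the tree entry, not in the blob, which is why renaming a file doesn’t create a new blob.
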

Working Directory States

Git tracks files through different states:

Untracked: Git doesn’t know about the file yet
Tracked: Git is aware of this file
- Modified: Changed but not staged
- Staged: Marked to go into the next commit
- Committed: Matches the version safely stored in the last commit
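
One way to picture these states is as a comparison of three snapshots: the last commit, the index, and the working directory. The sketch below is a simplification under that assumption; the classify function and its dictionary inputs (path to content hash) are invented for illustration and skip edge cases such as deleted files.

```python
def classify(path, head, index, worktree):
    """Return the state of path given three path -> content-hash mappings."""
    in_head = head.get(path)
    in_index = index.get(path)
    in_work = worktree.get(path)

    if in_head is None and in_index is None:
        # Git has never been told about this file.
        return "untracked" if in_work is not None else "unknown"

    states = []
    if in_index != in_head:
        states.append("staged")    # index differs from the last commit
    if in_work != in_index:
        states.append("modified")  # working copy differs from the index
    return "+".join(states) or "committed"


head = {"a.txt": "h1", "b.txt": "h1", "c.txt": "h1"}
index = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h1"}
work = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h9", "new.txt": "h5"}

for name in sorted(work):
    print(name, classify(name, head, index, work))
```

A file can be staged and modified at the same time: you staged one version, then kept editing.
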

References and Pointers

Git maintains several types of references:

Branches (refs/heads/*): Point to the latest commit in a line of development
HEAD (.git/HEAD): Normally names the current branch; in "detached HEAD" state it holds a commit hash directly
Remote-tracking branches (refs/remotes/*): Local records of where branches on a remote were at the last fetch
Tags (refs/tags/*): Named pointers to specific commits
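
Since refs are just small text files, following HEAD to a commit takes only a few lines. The sketch below builds a throwaway directory that mimics the layout rather than using a real repository; the resolve function and the placeholder hash are assumptions for the example, and it ignores packed refs (.git/packed-refs), which real repositories also use.

```python
import tempfile
from pathlib import Path


def resolve(git_dir: Path, name: str = "HEAD") -> str:
    """Follow 'ref: ...' indirections until a commit hash is found."""
    text = (git_dir / name).read_text().strip()
    while text.startswith("ref: "):
        name = text[len("ref: "):]
        text = (git_dir / name).read_text().strip()
    return text


# A fake .git directory with one branch; the hash is a placeholder.
git_dir = Path(tempfile.mkdtemp())
(git_dir / "refs" / "heads").mkdir(parents=True)
commit = "0123456789abcdef0123456789abcdef01234567"
(git_dir / "refs" / "heads" / "main").write_text(commit + "\n")
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")

print(resolve(git_dir))  # HEAD -> refs/heads/main -> commit hash

# "Creating a branch" is just writing another small file.
(git_dir / "refs" / "heads" / "feature").write_text(commit + "\n")
print(resolve(git_dir, "refs/heads/feature"))
```
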

The Index (Staging Area)

The index is a binary file in .git/index that:

Acts as a staging area between working directory and repository
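Contains a sorted list of paths, each with its mode, object name (blob hash), and stage number (non-zero only during merge conflicts), plus cached file stats used to spot changes quickly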
Serves as a “snapshot of proposed next commit”

Git’s Content-Addressable System

Every object is identified by its SHA-1 hash
This means Git can verify data integrity (a corrupted object no longer matches its hash)
Finding objects is fast (looking up by hash)
Allows for deduplication (same content = same hash)
Makes distributed work efficient (two repositories can compare hashes to see which objects the other is missing)
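
Here is a small in-memory sketch of a content-addressed store showing the deduplication and integrity properties listed above. The ObjectStore class is hypothetical; a dictionary stands in for the files under .git/objects/, and only the header-plus-zlib encoding mirrors what Git does for loose objects.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self._files = {}  # object id -> compressed bytes

    def put(self, kind: str, body: bytes) -> str:
        raw = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
        oid = hashlib.sha1(raw).hexdigest()
        # Same content gives the same id, so a second put stores nothing new.
        self._files.setdefault(oid, zlib.compress(raw))
        return oid

    def get(self, oid: str):
        raw = zlib.decompress(self._files[oid])
        # Integrity check: the data must still hash to its own name.
        if hashlib.sha1(raw).hexdigest() != oid:
            raise ValueError(f"object {oid} is corrupt")
        header, _, body = raw.partition(b"\0")
        kind = header.decode("ascii").split(" ")[0]
        return kind, body

    def __len__(self):
        return len(self._files)


store = ObjectStore()
first = store.put("blob", b"same bytes\n")
second = store.put("blob", b"same bytes\n")  # e.g. a copy in another folder
print(first == second, len(store))           # True 1
print(store.get(first))
```
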

Pack Files and Git’s Compression

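Each new object starts out as its own zlib-compressed file (a loose object). To save space, Git occasionally runs garbage collection (git gc), which packs them:

Similar objects are stored as deltas against each other
Objects and deltas are compressed using zlib
Pack files (.pack) store many objects in a single file
A pack index (.idx) provides fast lookup into each pack file
This can reduce storage dramatically, especially for files that change a little at a time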
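
The idea behind delta storage can be sketched with "copy" and "insert" instructions: describe a new version as pieces copied from an old one plus a few new bytes. This is only an illustration of the concept. It uses Python’s difflib to find matching regions, which is not how Git finds deltas, and the tuple format is made up rather than Git’s binary delta encoding.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as copies from base plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # bytes not found in base
        # A pure deletion needs no instruction: we just don't copy it.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from base and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


old = b"line one\nline two\nline three\n" * 10
new = old.replace(b"two", b"2", 1) + b"one more line\n"
delta = make_delta(old, new)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"{len(new)} bytes rebuilt from {literal} new bytes plus copies")
```
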

The Reflog

Git maintains a log of where your HEAD and branch references have been:

.git/logs/HEAD records all changes to HEAD
.git/logs/refs/heads/* tracks branch movements
The reflog helps recover from mistakes (even after git reset --hard)
Entries expire after 90 days by default (30 days for entries no longer reachable from the current branch tip)
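
Reflog files are plain text with one line per movement: old hash, new hash, who did it, a timestamp with timezone, then a tab and a message. A rough parser might look like the sketch below; the ReflogEntry type, the function name, and the sample line are made up for illustration.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str
    new: str
    who: str
    timestamp: int
    tz: str
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one reflog line into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the end.
    who, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, who, int(timestamp), tz, message)


sample = (
    "0" * 40 + " " + "a" * 40
    + " A U Thor <author@example.com> 1700000000 +0100"
    + "\tcommit (initial): first commit\n"
)
entry = parse_reflog_line(sample)
print(entry.new[:7], entry.message)
```

An all-zero old hash, as in the sample, means the ref did not exist before this entry.
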

Git Hooks

Git runs executable scripts from .git/hooks/ (by default) at key events:

pre-commit: Run before a commit is finalized
post-commit: Run after a commit is created
pre-push: Run before pushing to a remote
And many others

These can be used for linting, testing, enforcing commit message standards, and more.
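
As an example of enforcing commit message standards, here is a sketch of a commit-msg hook written in Python. Git calls this hook with the path of the file holding the proposed message, and a non-zero exit status aborts the commit. The specific rules (a 72-character subject, a blank second line) are a hypothetical team convention, not something Git requires.

```python
import sys

MAX_SUBJECT = 72  # hypothetical team rule


def check_message(text: str) -> list:
    """Return a list of problems found in a commit message."""
    # Lines starting with "#" are comments in the default message template.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    problems = []
    if not lines or not lines[0].strip():
        problems.append("subject line is empty")
    elif len(lines[0]) > MAX_SUBJECT:
        problems.append(f"subject line is longer than {MAX_SUBJECT} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


def main(argv) -> int:
    # argv[1] is the path to the file containing the proposed message.
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


print(check_message("Fix off-by-one in pager\n\nLonger explanation here.\n"))  # []
```

To try it, you would save it as .git/hooks/commit-msg with a Python shebang line on top, add a final line calling sys.exit(main(sys.argv)), and make the file executable.
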

Merging and Conflict Resolution

When Git merges branches:

It finds the common ancestor (merge base)
It compares each branch tip against that ancestor (a three-way merge) and combines the changes from both sides
If the same part was edited differently, it marks a conflict
Conflict markers (<<<<<<<, =======, >>>>>>>) are inserted
You resolve by editing the file, staging it with git add, and committing to create the merge commit
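
The steps above can be sketched at the level of whole files. This toy version finds merge bases by walking parent links and then applies the three-way rule per path; the function names and the dictionary shapes are assumptions for the example. Real Git goes further and merges conflicting files line by line, which this sketch does not attempt.

```python
def ancestors(parents, start):
    """All commits reachable from start (including start) via parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not themselves ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def merge_trees(base, ours, theirs):
    """Three-way merge of path -> content-hash mappings, file by file."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o           # both sides agree
        elif o == b:
            result = t           # only they changed it
        elif t == b:
            result = o           # only we changed it
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(merge_bases(parents, "C", "D"))  # {'B'}

base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2"}
theirs = {"a": "1", "b": "3", "c": "4", "d": "5"}
print(merge_trees(base, ours, theirs))
```
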

Rebasing Under the Hood

When you rebase:

Git finds the common ancestor of current branch and target branch
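Works out the change introduced by each commit on your branch since that ancestor
Applies each change one by one on top of the target branch, pausing if one of them conflicts
Creates new commits with the same messages but different parents (and therefore different hashes)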
Moves your branch pointer to the new chain of commits
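
A toy model makes it clear why rebased commits get new hashes. In the sketch below, a commit’s ID is a hash over its parent, message, and change, so replaying the same change onto a different parent has to produce a different ID. The graph dictionary, the "patch" representation (path to new content), and the function names are all invented for illustration, and only linear history is handled.

```python
import hashlib


def add_commit(graph, parent, message, patch):
    """Store a commit whose id depends on its parent, message, and change."""
    payload = repr((parent, message, sorted(patch.items()))).encode()
    cid = hashlib.sha1(payload).hexdigest()
    graph[cid] = {"parent": parent, "message": message, "patch": patch}
    return cid


def rebase(graph, branch_tip, onto):
    """Replay the commits unique to branch_tip on top of onto; return new tip."""
    upstream, commit = set(), onto
    while commit is not None:
        upstream.add(commit)
        commit = graph[commit]["parent"]

    to_replay, commit = [], branch_tip
    while commit is not None and commit not in upstream:
        to_replay.append(commit)
        commit = graph[commit]["parent"]

    new_tip = onto
    for old in reversed(to_replay):  # oldest first
        new_tip = add_commit(graph, new_tip,
                             graph[old]["message"], graph[old]["patch"])
    return new_tip


graph = {}
root = add_commit(graph, None, "root", {"a.txt": "1"})
main1 = add_commit(graph, root, "main work", {"b.txt": "1"})
feat1 = add_commit(graph, root, "feature 1", {"c.txt": "1"})
feat2 = add_commit(graph, feat1, "feature 2", {"c.txt": "2"})

new_tip = rebase(graph, feat2, main1)
print(new_tip != feat2, graph[new_tip]["message"])  # True feature 2
```

The old commits are still in the graph afterwards; only the branch pointer would move to new_tip.
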

Transport Protocols

Git supports several transport protocols:

Local: Direct access to repository on disk
HTTP/HTTPS: Firewall-friendly; the modern "smart" HTTP protocol is about as efficient as SSH
SSH: Secure authenticated access
Git protocol (git://): Lightweight and fast, but unauthenticated and unencrypted, so it is typically used read-only

During transfers, Git only sends objects the other side doesn’t have, making operations extremely efficient.
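
Conceptually, "only sends objects the other side doesn’t have" is a set difference over the object graph: everything reachable from what the receiver wants, minus everything reachable from what it already has. The sketch below shows just that idea. The graph layout (object ID to the IDs it references) and the function names are assumptions, and the real protocol works this out through a back-and-forth negotiation rather than one function call.

```python
def reachable(graph, tips):
    """Every object reachable from the given starting objects."""
    seen, stack = set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(graph.get(oid, []))
    return seen


def objects_to_send(graph, wants, haves):
    """Objects needed for wants that are not already implied by haves."""
    return reachable(graph, wants) - reachable(graph, haves)


# commit -> [tree, parents...], tree -> [blobs...]; ids are placeholders.
graph = {
    "c1": ["t1"],
    "c2": ["t2", "c1"],
    "t1": ["b1"],
    "t2": ["b1", "b2"],  # b1 is unchanged between the two commits
    "b1": [],
    "b2": [],
}

# The receiver already has c1 and asks for c2.
print(sorted(objects_to_send(graph, ["c2"], ["c1"])))  # ['b2', 'c2', 't2']
```
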