How Git Works From Inside Out

  1. When you run git init, Git creates a hidden folder called .git - this is where everything lives: commits, branches, tags, logs, etc.

  2. Git doesn’t store your history as a series of diffs - conceptually, each commit records a full snapshot of your tracked files (contrary to what many people assume).

  3. When you add a file with git add, Git compresses the file’s contents, stores them as a blob object in .git/objects/, and records that blob in the index (the staging area).

  4. At commit time, Git turns the index into tree objects - each tree records one folder (which blob or subtree belongs to which filename).

  5. When you commit with git commit, Git creates a commit object:

    • Points to the current tree (snapshot).
    • Stores the author, message, timestamp.
    • Links to the previous commit(s).
  6. The commit is now a snapshot of your repo at that moment, not just a change - it’s a full version of what every tracked file looked like.

  7. All of these blobs, trees, and commits are named by the SHA-1 hash of their content - every object in Git is content-addressed.

  8. When you create a branch, Git just makes a simple pointer (a file in .git/refs/heads/) to a specific commit hash.

  9. When you run git checkout, Git:

    • Reads the commit pointed to by the branch.
    • Rebuilds the index and working directory to match the tree in that commit.
    • Points HEAD at the branch you checked out.
  10. When you make new commits on that branch, Git updates the branch pointer to the new commit - old commits are untouched.

  11. git log walks backward from your current commit following parent links - that’s your full history.

  12. git merge normally creates a new commit with two parents, combining the histories of both branches - unless a fast-forward is possible, in which case Git just moves the branch pointer.

  13. If you run git reset, you’re just moving the branch pointer (and optionally resetting the index and working directory, depending on --soft, --mixed, or --hard) to a different commit.

  14. If you delete a branch, the commits aren’t deleted until Git’s garbage collector runs, and only if nothing else (another branch, a tag, or a reflog entry) still references them.

  15. git stash saves your uncommitted changes (working directory and index) as hidden commits referenced by refs/stash and resets your working directory to the last commit - it’s like a temporary branch.

  16. git rebase rewrites commit history by replaying your commits onto a new parent - the existing commits are never modified; Git rebuilds that part of the graph with new commit objects.

  17. Pushing/pulling just sends these objects and refs across the network between .git folders (locally and remotely).

  18. The .git folder is the real repo - your working directory is just a temporary view of one snapshot.
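
The steps above can be sketched as a toy in-memory model. This is not Git's real object format: objects, refs, store, commit, and log are hypothetical names, and hashing repr(...) is an assumption made to keep the sketch short. It only illustrates snapshots, content addressing, branch pointers, and walking parent links.

```python
import hashlib

objects = {}  # object id -> (kind, payload); stands in for .git/objects/
refs = {}     # branch name -> commit id; stands in for .git/refs/heads/

def store(kind, payload):
    """Save an object under the SHA-1 of its content and return that id."""
    oid = hashlib.sha1(repr((kind, payload)).encode()).hexdigest()
    objects[oid] = (kind, payload)
    return oid

def commit(branch, files, message):
    """Snapshot every file, then move the branch pointer to the new commit."""
    tree = store("tree", tuple(sorted(
        (name, store("blob", text)) for name, text in files.items())))
    new_commit = store("commit", (tree, refs.get(branch), message))
    refs[branch] = new_commit  # older commits are left untouched
    return new_commit

def log(branch):
    """Follow parent links backward from the branch tip."""
    messages, oid = [], refs.get(branch)
    while oid is not None:
        _, (_tree, parent, message) = objects[oid]
        messages.append(message)
        oid = parent
    return messages

first = commit("main", {"a.txt": "hello"}, "first")
refs["feature"] = refs["main"]  # creating a branch copies one pointer
second = commit("main", {"a.txt": "hello", "b.txt": "world"}, "second")
blob_count = sum(1 for kind, _ in objects.values() if kind == "blob")
```

Both commits contain a.txt, yet blob_count is 2, because the unchanged file hashes to the same blob and is stored only once.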

NOTE: The content below is additional technical knowledge and not necessary for basic understanding. Feel free to stop here if you're looking for just the essential process.

Git Internals Deep Dive

The Object Database

Git’s internal storage system is remarkably simple but powerful:

  1. Blob objects: Store file contents (not filenames)

    • Each file is stored as a compressed blob
    • Identical files (even in different directories) are stored only once
    • Identified by the SHA-1 hash of a short header ("blob", the size, and a NUL byte) followed by their content
  2. Tree objects: Store directory structures

    • List of pointers to blobs and subtrees
    • Each entry has a mode (file permissions), type, filename, and SHA-1 pointer
    • Represents a single directory/folder at a point in time
  3. Commit objects: Point to a specific tree

    • Contains metadata: author, committer, timestamp, message
    • Points to parent commit(s)
    • Multiple commits can point to the same tree (for example, empty commits)
  4. Tag objects: Annotated tags, usually pointing to commits

    • Store the tagger, date, and tag message (lightweight tags are plain refs with no tag object)
    • Signed tags also embed a cryptographic signature
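
As a minimal sketch of how an object ID is derived, the snippet below hashes a "<kind> <size>" header, a NUL byte, and the body, then zlib-compresses the same bytes the way a loose object is stored. The function names git_object_id and loose_object are hypothetical, and the sketch assumes a SHA-1 repository.

```python
import hashlib
import zlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over a '<kind> <size>' header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def loose_object(kind: str, body: bytes):
    """Return the path under .git/objects/ and the zlib-compressed bytes."""
    oid = git_object_id(kind, body)
    header = f"{kind} {len(body)}".encode() + b"\0"
    # The first two hex digits name the directory, the rest name the file.
    return f"{oid[:2]}/{oid[2:]}", zlib.compress(header + body)

empty_blob = git_object_id("blob", b"")
path, stored = loose_object("blob", b"test content\n")
```

Because the filename is not part of the hashed bytes, two files with identical contents map to the same blob regardless of where they live in the tree.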

Working Directory States

Git tracks files through different states:

  1. Untracked: Git doesn’t know about the file yet
  2. Tracked: Git is aware of this file
    • Modified: Changed but not staged
    • Staged: Marked to go into the next commit
    • Committed: Unchanged since the last commit, so the version in the repository matches your file

References and Pointers

Git maintains several types of references:

  1. Branches (refs/heads/*): Point to the latest commit in a line of development
  2. HEAD: Points to the current branch, or directly to a commit when detached
  3. Remote-tracking branches (refs/remotes/*): Local records of where branches on remote repositories were at the last fetch
  4. Tags (refs/tags/*): Named pointers to specific commits
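
To make the pointer idea concrete, here is a rough sketch that resolves HEAD by reading plain files. The helper resolve_head and the made-up commit ID are hypothetical, and the sketch assumes loose ref files only (it deliberately ignores packed-refs).

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir):
    """Return (branch ref or None, commit id or None) for a .git directory."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD: the file holds a commit id
    ref = head[len("ref: "):]
    ref_file = git_dir / ref
    # A branch with no commits yet has no ref file (packed-refs is ignored here).
    commit_id = ref_file.read_text().strip() if ref_file.exists() else None
    return ref, commit_id

# Build a throwaway layout with a made-up commit id to exercise the function.
fake_id = "1" * 40
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(fake_id + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    on_branch = resolve_head(git_dir)
    (git_dir / "HEAD").write_text(fake_id + "\n")
    detached = resolve_head(git_dir)
```

In the first case HEAD names a branch and the branch file names the commit; in the second, HEAD holds the commit ID itself.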

The Index (Staging Area)

The index is a binary file at .git/index that:

  • Acts as a staging area between working directory and repository
  • Contains a sorted list of paths, each with its mode, object name, and stage number
  • Serves as a “snapshot of proposed next commit”
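
One way to picture the index is as the middle of a three-way comparison between the last commit, the index, and the working directory. The sketch below assumes each is a simple path-to-hash mapping; the function name status and its labels are hypothetical, and deletions are folded into the same labels for brevity.

```python
def status(head, index, worktree):
    """Label each path by comparing HEAD, the index, and the working directory."""
    report = {}
    for path in sorted(set(head) | set(index) | set(worktree)):
        labels = []
        if path not in head and path not in index:
            labels.append("untracked")
        else:
            if index.get(path) != head.get(path):
                labels.append("staged")    # index differs from the last commit
            if worktree.get(path) != index.get(path):
                labels.append("modified")  # working copy differs from the index
        report[path] = labels or ["committed"]
    return report

head = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
index = {"a.txt": "h1", "b.txt": "h9", "c.txt": "h3"}
worktree = {"a.txt": "h1", "b.txt": "h9", "c.txt": "h7", "d.txt": "h4"}
report = status(head, index, worktree)
```

A file can be both staged and modified at once, which happens when you edit it again after running git add.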

Git’s Content-Addressable System

  1. Every object is identified by its SHA-1 hash
  2. This means Git can verify data integrity
  3. Finding objects is fast (looking up by hash)
  4. Allows for deduplication (same content = same hash)
  5. Lets any two repositories agree on an object’s name without coordinating, which keeps distributed work simple and efficient
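
The integrity check in particular is easy to sketch: recompute the hash of the stored bytes and compare it with the name the object was filed under. The helper verify is a hypothetical name, and the sketch assumes a zlib-compressed loose object in a SHA-1 repository.

```python
import hashlib
import zlib

def verify(object_id: str, compressed: bytes) -> bool:
    """Check that the stored bytes still hash to the id they are filed under."""
    raw = zlib.decompress(compressed)  # header plus body, as stored on disk
    return hashlib.sha1(raw).hexdigest() == object_id

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
intact = zlib.compress(b"blob 0\0")
tampered = zlib.compress(b"blob 1\0x")

intact_ok = verify(EMPTY_BLOB, intact)
tampered_ok = verify(EMPTY_BLOB, tampered)
```

Any change to the content changes the hash, so a corrupted or altered object no longer matches its own name.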

Pack Files and Git’s Compression

To save space, Git occasionally runs garbage collection (git gc), which packs loose objects into pack files:

  1. Similar objects are stored as deltas against each other
  2. Objects are compressed using zlib
  3. Pack files store many objects in a single file
  4. Pack index provides fast lookup into pack files
  5. This dramatically reduces storage requirements
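
The delta idea can be illustrated with a toy encoder that describes a target as "copy this range from the base" and "insert these new bytes" instructions. This is only a sketch: Git's real pack deltas use a compact binary encoding and their own matching heuristics, while make_delta and apply_delta below are hypothetical helpers that lean on difflib to find matching ranges.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes):
    """Describe target as copy-from-base and insert-new-bytes instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # bytes missing from base
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base object and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)

base = b"line one\nline two\nline three\n"
target = b"line one\nline 2\nline three\nline four\n"
delta = make_delta(base, target)
rebuilt = apply_delta(base, delta)
```

Only the inserted bytes have to be stored alongside the copy instructions, which is why near-identical versions of a file cost very little extra space.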

The Reflog

Git maintains a log of where your HEAD and branch references have been:

  1. .git/logs/HEAD records all changes to HEAD
  2. .git/logs/refs/heads/* tracks branch movements
  3. The reflog helps recover from mistakes (even after git reset --hard)
  4. Entries expire after 90 days by default (30 days for entries that are no longer reachable from the current branch tip)
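
Reflog files are plain text, one line per pointer movement: the old ID, the new ID, who moved it, a timestamp with timezone, then a tab and a message. The parser below is a sketch under that assumption; parse_reflog_line is a hypothetical helper, and the sample line uses made-up IDs and a made-up identity.

```python
def parse_reflog_line(line: str) -> dict:
    """Split one reflog line into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old_id, new_id, rest = meta.split(" ", 2)
    identity, timestamp, timezone = rest.rsplit(" ", 2)
    return {
        "old": old_id,
        "new": new_id,
        "who": identity,
        "time": int(timestamp),  # seconds since the Unix epoch
        "tz": timezone,
        "message": message,
    }

# Made-up sample entry; real lines live in .git/logs/HEAD.
sample = (
    "0" * 40 + " " + "a" * 40
    + " Jane Doe <jane@example.com> 1700000000 +0000"
    + "\tcommit (initial): first\n"
)
entry = parse_reflog_line(sample)
```

The "old" field of each entry is what makes recovery possible: it records where the pointer was before the move.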

Git Hooks

Git runs executable scripts from .git/hooks/ at key events:

  1. pre-commit: Runs before a commit is finalized
  2. post-commit: Runs after a commit is created
  3. pre-push: Runs before pushing to a remote
  4. And many others

These can be used for linting, testing, enforcing commit message standards, and more.
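
As one illustration of enforcing commit message standards, here is a sketch of a commit-msg hook, which Git calls with the path of the message file as its argument. The 72-character limit and the names MAX_SUBJECT, check_message, and main are assumptions chosen for the example, not Git defaults.

```python
import sys

MAX_SUBJECT = 72  # hypothetical team rule, not a Git default

def check_message(text):
    """Return a list of problems found in a commit message."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    subject = lines[0].strip() if lines else ""
    problems = []
    if not subject:
        problems.append("subject line is empty")
    elif len(subject) > MAX_SUBJECT:
        problems.append(f"subject line exceeds {MAX_SUBJECT} characters")
    return problems

def main(argv):
    # Git passes the path of the message file as the hook's only argument.
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status aborts the commit

# When installed as an executable .git/hooks/commit-msg script,
# finish the file with: sys.exit(main(sys.argv))
```

Keeping the rule in a separate check_message function makes it testable without going through Git at all.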

Merging and Conflict Resolution

When Git merges branches:

  1. It finds the common ancestor (merge base)
  2. It combines the changes each branch made relative to that ancestor (a three-way merge)
  3. If the same part was edited differently, it marks a conflict
  4. Conflict markers (<<<<<<<, =======, >>>>>>>) are inserted
  5. You resolve by editing the file, staging it with git add, and creating the merge commit
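
The decision rule behind a three-way merge can be sketched at whole-file granularity: if only one side changed a path relative to the merge base, take that side; if both changed it differently, report a conflict. The function three_way_merge is a hypothetical helper working on path-to-hash mappings, whereas real Git also merges within files, line by line.

```python
def three_way_merge(base, ours, theirs):
    """Merge two path->hash snapshots using their common ancestor."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (or both deleted it)
        elif o == b:
            result = t              # only their side changed it
        elif t == b:
            result = o              # only our side changed it
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:      # None means the path was deleted
            merged[path] = result
    return merged, conflicts

base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2"}
theirs = {"a": "1", "b": "3", "c": "4", "d": "5"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

Without the common ancestor there would be no way to tell which side made the change, which is why finding the merge base comes first.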

Rebasing Under the Hood

When you rebase:

  1. Git finds the common ancestor of the current branch and the target branch
  2. Generates a diff for each commit on your branch
  3. Applies each diff one by one on top of the target branch
  4. Creates new commits with the same messages but different parents (and therefore different hashes)
  5. Moves your branch pointer to the new chain of commits
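
The graph side of these steps can be sketched with a toy commit store. This model tracks only parent links and messages and assumes linear history; it skips re-applying the actual file changes, which is where real rebases can hit conflicts. All names (commits, make_commit, ancestors, rebase) are hypothetical.

```python
import hashlib

commits = {}  # commit id -> {"parent": id or None, "message": str}

def make_commit(parent, message):
    """Create a commit whose id depends on its parent and its message."""
    cid = hashlib.sha1(f"{parent}|{message}".encode()).hexdigest()[:8]
    commits[cid] = {"parent": parent, "message": message}
    return cid

def ancestors(cid):
    """List a commit and everything before it, newest first."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = commits[cid]["parent"]
    return chain

def rebase(branch_tip, onto_tip):
    """Replay the branch's own commits on top of onto_tip; return the new tip."""
    onto_history = set(ancestors(onto_tip))
    to_replay = [c for c in ancestors(branch_tip) if c not in onto_history]
    new_tip = onto_tip
    for cid in reversed(to_replay):  # oldest first
        new_tip = make_commit(new_tip, commits[cid]["message"])
    return new_tip

root = make_commit(None, "base")
main_tip = make_commit(root, "main work")
feature_1 = make_commit(root, "feature 1")
feature_2 = make_commit(feature_1, "feature 2")
new_tip = rebase(feature_2, main_tip)
history = [commits[c]["message"] for c in ancestors(new_tip)]
```

The original feature commits are still in the store afterward; they simply stop being reachable from the branch once its pointer moves to new_tip.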

Transport Protocols

Git supports several transport protocols:

  1. Local: Direct access to repository on disk
  2. HTTP/HTTPS: Efficient, firewall-friendly protocol
  3. SSH: Secure authenticated access
  4. Git protocol (git://): Fast, but offers no authentication or encryption

During transfers, the two sides first work out which commits they have in common, so Git only sends objects the other side doesn’t have, which keeps transfers small.
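
A rough sketch of that selection: collect everything reachable from what the receiver wants, then subtract everything reachable from what it already has. The graph below is a made-up example mapping each object to the objects it references, and reachable and objects_to_send are hypothetical helpers, not Git's actual negotiation protocol.

```python
def reachable(graph, tips):
    """Collect every object reachable from the given starting points."""
    seen, stack = set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))
    return seen

def objects_to_send(graph, wants, haves):
    """Objects needed for `wants` that are not already implied by `haves`."""
    return reachable(graph, wants) - reachable(graph, haves)

# Made-up object graph: commits (c*) reference a parent and a tree (t*),
# and trees reference blobs (b*).
graph = {
    "c3": ["c2", "t3"], "c2": ["c1", "t2"], "c1": ["t1"],
    "t1": ["b1"], "t2": ["b1", "b2"], "t3": ["b1", "b3"],
}
to_send = objects_to_send(graph, wants=["c3"], haves=["c2"])
```

Here only the new commit, its tree, and the one new blob need to travel; the unchanged blob b1 is already on the other side.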