> ls .git/
A deep dive into git internals
Before we start …
http://pollev.com/markusfuchs839
Who knows what “Event sourcing” is?
“Classic” way of storing data in a database
ID  NAME                             IS_ARCHIVED  SPEAKER_ID  ADDRESS_ID
1   Die zahnärztliche Niederlassung  false        15          84
2   Psychosomatik I                  true         301         12
“Event sourced” way
ID  COURSE_ID  EVENT_TYPE            EVENT_DATA
1   1          CourseCreated         { "name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12 }
2   1          CourseSpeakerChanged  { "new_speaker_id": 15 }
3   1          CourseNameChanged     { "new_name": "Psychosomatik I" }
“Event sourced” way
Handles CourseCreated event
Handles CourseSpeakerChanged event
Handles CourseNameChanged event
Handles CourseArchived event
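
The handlers above can be sketched in a few lines of Python. This is only an illustration of the replay idea, not code from the talk: the `Course` class and the `apply` and `replay` functions are hypothetical names, and the events are the three rows from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Course:
    name: str
    speaker_id: int
    address_id: int
    is_archived: bool = False


def apply(course, event_type, data):
    """Return the course state after applying a single event."""
    if event_type == "CourseCreated":
        return Course(data["name"], data["speaker_id"], data["address_id"])
    if event_type == "CourseSpeakerChanged":
        return replace(course, speaker_id=data["new_speaker_id"])
    if event_type == "CourseNameChanged":
        return replace(course, name=data["new_name"])
    if event_type == "CourseArchived":
        return replace(course, is_archived=True)
    raise ValueError(f"unknown event type: {event_type}")


def replay(events):
    """Rebuild the current state by applying all events in order."""
    course = None
    for event_type, data in events:
        course = apply(course, event_type, data)
    return course


# The event stream of course 1 from the table above.
EVENTS = [
    ("CourseCreated", {"name": "Psychosomatik 1", "address_id": 84, "speaker_id": 12}),
    ("CourseSpeakerChanged", {"new_speaker_id": 15}),
    ("CourseNameChanged", {"new_name": "Psychosomatik I"}),
]

current = replay(EVENTS)          # state after all events
at_creation = replay(EVENTS[:1])  # "course at time X": replay only a prefix
```

Replaying only a prefix of the stream is what gives "course at time X" for free; it also shows why a long stream makes reads slower.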
Pros and cons
• Audit log for free
• Version tracking for free
• Course at time X
But:
• Performance with a large number of events
• Querying is not as easy because we never store the current state in
the database (Find all archived courses)
• It can make some operational tasks harder
🤔Why am I telling you this?
Pros and cons
• Audit log for free
• Version tracking for free
• Course at time X
But:
• Filtering is not as easy because we never store the current state in
the database (Find all archived courses)
• It can make some operational tasks harder
• Performance with a large number of events
KI University - Git internals
Learning goals
• A little bit about the history of git
• Internal storage mechanisms
• What’s in the .git/ folder?
• Which data structures are used by git?
git (/ɡɪt/)
I'm an egotistical bastard, and I name all my projects after myself.
First 'Linux', now 'git'.
Linus Torvalds
Source: https://www.urbandictionary.com/define.php?term=Git
Facts
• Development started in April 2005
• Linux kernel team was using BitKeeper (but the owner withdrew
free use of the product)
• Linus Torvalds wanted a DVCS but none met his needs
> ls .git/
Let’s dive in … 🏊♂️
https://github.com/fum36205/repository
01 Create a new file
02 Add the file to the index
03 Create a new commit
04 We can see the commit in the log (incl. its hash)
01 Switch to a new branch that starts at our initial commit
02 Create a copy of our hello.txt file and name it hello3.txt
03 Create a new commit with it
04 We can see all commits of this branch but not the ones from master
Learnings so far …
• .git/refs/heads contains one text file for each branch (named
like the branch itself) = “branch pointer”
e.g. .git/refs/heads/master, .git/refs/heads/feature/a
• HEAD is a text file that normally contains the path (relative to
.git/) to the currently checked out branch
e.g. ref: refs/heads/feature/a
(in “detached HEAD” state it contains a commit hash instead)
• Creating new branches is very easy (we only need to store a
reference to the commit)
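
To make the two lookups concrete, here is a minimal Python sketch that follows HEAD to the branch pointer. The function name `current_branch_commit` is made up for this example, and the sketch assumes the branch is stored as a loose file under refs/heads (it does not read packed refs).

```python
from pathlib import Path


def current_branch_commit(git_dir):
    """Return (ref, commit_hash) for the checked-out branch.

    ref is None when HEAD is detached; commit_hash is None when the
    ref file does not exist (e.g. packed refs, not handled here).
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit hash directly.
        return None, head
    ref = head[len("ref: "):]      # e.g. "refs/heads/feature/a"
    ref_file = git_dir / ref       # the branch pointer is a small text file
    if not ref_file.exists():
        return ref, None
    return ref, ref_file.read_text().strip()
```

Calling `current_branch_commit(".git")` inside a repository should return the same hash that `git rev-parse HEAD` prints, as long as the assumptions above hold.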
How does git know which commits
belong to a branch?
01 Get the commit hashes of master …
02 … and feature/a
03 Print out the contents of both commits with git cat-file
Learnings
• git cat-file -p <hash> allows us to look at the contents of a
commit
• Each commit contains
• Information about its author
• The commit message
• A timestamp when it was created
• and a reference to its direct ancestor commit
• All commits of a single branch form a linked list that can be
traversed back to the initial/first commit
• .git/refs/heads/… points to the head of this linked list
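
A toy model of this linked list, assuming made-up commit ids (`c1`, `c2`, `c3`) and a plain dictionary instead of real git objects. Real commits can also have several parents (merge commits), which this sketch ignores.

```python
# Hypothetical data: two branches that share the initial commit c1.
COMMITS = {
    "c1": {"message": "Initial commit", "parent": None},
    "c2": {"message": "Second commit on master", "parent": "c1"},
    "c3": {"message": "Add hello3.txt", "parent": "c1"},
}
BRANCHES = {"master": "c2", "feature/a": "c3"}  # branch pointers


def history(commits, head):
    """Yield commit ids from `head` back to the initial commit."""
    current = head
    while current is not None:
        yield current
        current = commits[current]["parent"]


feature_log = list(history(COMMITS, BRANCHES["feature/a"]))
master_log = list(history(COMMITS, BRANCHES["master"]))
```

Starting from the feature/a pointer, the walk never reaches `c2`, which is why the log of that branch does not show the commit that exists only on master.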
Q: How is the commit hash generated?
A: By hashing its contents with SHA-1.
Q: How does git store the files?
👀
Let’s create a subfolder
01 Move hello.txt into a newly created subfolder
02 Create a new commit with this change
03 The tree object of the root folder now contains a reference to another tree object
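[Diagram: three commits, each pointing to the tree object of its root folder. One tree references two blobs, one references a single blob, and one references a subtree (which in turn references a blob) plus a blob. Arrows = referenced by hash]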
There are THREE different kinds of “objects”
commit
• Author
• Commit message
• Timestamp
• Reference to the previous commit
• Reference to the tree object of the root folder
tree
• One per folder (incl. one for the root folder)
• Contains the names of all files and subfolders in the folder and references to their corresponding blob and tree objects
blob
• File contents
Hash = SHA-1(object)
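
What exactly gets hashed can be sketched in Python. In a SHA-1 repository the input is a short header (object type, a space, the body length in bytes, a NUL byte) followed by the body; the helper name `git_object_hash` is made up for this example.

```python
import hashlib


def git_object_hash(obj_type, body):
    """Hash an object the way git does in a SHA-1 repository:
    SHA-1 over "<type> <size in bytes>\\0" followed by the raw body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Two files with identical content get the same blob hash,
# no matter what they are called or where they live.
hash_a = git_object_hash("blob", b"hello world\n")
hash_b = git_object_hash("blob", b"hello world\n")
```

The result can be compared with `git hash-object <file>` for a file with the same content. Because the file name is not part of the blob, a copied file produces no new blob object.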
But where are these objects stored?
The objects themselves are stored in a key-value store (KVS)
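
A simplified Python sketch of such a store for “loose” objects: the key is the hash, the value is the zlib-compressed object, and the file lives in a sub-directory named after the first two hex characters of the key. The function names are hypothetical, and real git does more (temporary files, packfiles), so treat this as a model only.

```python
import hashlib
import zlib
from pathlib import Path


def write_loose_object(objects_dir, obj_type, body):
    """Store an object under its hash and return the key."""
    store = f"{obj_type} {len(body)}".encode() + b"\0" + body
    key = hashlib.sha1(store).hexdigest()
    path = Path(objects_dir) / key[:2] / key[2:]
    if not path.exists():  # same content -> same key -> stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return key


def read_loose_object(objects_dir, key):
    """Look up an object by key and return (type, body)."""
    path = Path(objects_dir) / key[:2] / key[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, _size = header.decode().split(" ")
    return obj_type, body
```

Writing the same content twice returns the same key and leaves a single file on disk, which is the deduplication the deck refers to.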
Advantages
• Efficient storage/transfer because objects with the same content
are only stored once (same hash)
• If you fucked something up the chances are very high that it can
be fixed 🎉
✅ ✅
Merkle trees (not “Merkel”)
Source: https://komodoplatform.com/whats-merkle-tree/
commit: 676e6c8
  tree: d80ea91
    tree: 61b7138
      blob: 9b4930f
    blob: 9b4930f
commit: c3b1130
  tree: ab3a8b9
    tree: 61b7138
      blob: 9b4930f
    blob: ed9e506
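
The comparison of these two commits can be sketched in Python. The hashes are the ones from the diagram; the entry names (`subfolder`, `hello.txt`, `hello3.txt`) and the dictionary layout are assumptions made for this illustration, not git's real tree format.

```python
# tree hash -> {entry name: (kind, hash)}; names are assumed for illustration
OBJECTS = {
    "d80ea91": {"subfolder": ("tree", "61b7138"), "hello3.txt": ("blob", "9b4930f")},
    "ab3a8b9": {"subfolder": ("tree", "61b7138"), "hello3.txt": ("blob", "ed9e506")},
    "61b7138": {"hello.txt": ("blob", "9b4930f")},
}


def changed_paths(objects, old_tree, new_tree, prefix=""):
    """List paths that differ between two trees, comparing hashes only."""
    if old_tree == new_tree:
        return []  # same hash -> identical subtree, no need to descend
    old_entries = objects.get(old_tree, {})
    new_entries = objects.get(new_tree, {})
    changed = []
    for name in sorted(old_entries.keys() | new_entries.keys()):
        old = old_entries.get(name)
        new = new_entries.get(name)
        if old == new:
            continue  # unchanged blob or unchanged subtree
        if old and new and old[0] == new[0] == "tree":
            changed += changed_paths(objects, old[1], new[1], prefix + name + "/")
        else:
            changed.append(prefix + name)
    return changed


diff = changed_paths(OBJECTS, "d80ea91", "ab3a8b9")
```

The subtree 61b7138 appears in both commits with the same hash, so the walk skips it entirely and only the changed root entry is reported.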
✅ ✅ ✅
Further reading
Building Git
James Coglan
https://shop.jcoglan.com/building-git/
Pro Git
Scott Chacon
https://git-scm.com/book/de/v2
Thank you

Editor's Notes

  • #5: Modifying the data is as simple as updating the row in the database
  • #6: “Event sourcing”: we don’t store the current state of the object but a stream of events that modified it: CourseCreated, CourseChanged, CourseArchived. Later, we can “reconstruct” the current state of the object by applying all events in order again. If we want to archive the course, we need to insert a new event into the stream.
  • #7: Reconstructing the current state later happens in code where we iterate over all events in the stream (from the database) And apply the corresponding changes e.g. CourseArchived event is handled by setting tha
  • #8: You store the timestamp of the event and also the user who executed the operation. CQRS. Snapshots to cache intermediate states.
  • #12: Indeed, if we type git log -p, for each commit we can see all changes that have been made to the working directory (creating files, modifying files, deleting files). The current state of the working directory can always be derived by applying all changes/change events since the beginning.
  • #13: Did you already know that git works like this? https://www.polleverywhere.com/multiple_choice_polls/VnIwdXx7qmTsGepodbekY
  • #15: What does git have to do with these two persons (Linus and Angela Merkel) and a shopping cart? Bingo: but please only in your mind (don’t scream bingo ;))
  • #17: Applying patches should be fast. “Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision.” Include very strong safeguards against corruption, either accidental or malicious. Support a distributed, BitKeeper-like workflow.
  • #18: Learning goal of today: to know what’s inside the .git folder in a repository folder
  • #19: Git already created a .git/ folder with some files and subfolders, but most of it is still empty. Only the hooks folder contains some example hooks, and the refs folder contains two empty sub-directories called “heads” and “tags”. I’ve also published this sample repository on GitHub (so you can follow along if you want to).
  • #29: We take all that and hash it with SHA-1. Any change to any of this data also means that a new hash is generated (changing the email of the author, for example).
  • #30: And the folder structure of the working directory at point in time X (commit X)
  • #31: Let’s go back to our two commits and what git stores about them. If you looked carefully, you might already have noticed that there is another piece of information stored that I didn’t mention yet.
  • #32: We can again use our friend git cat-file to find out what’s behind the hash. Why do they all contain the same hash? Because the hashes for blobs are calculated the same way as those of the commits: by hashing their content. Files that have the same content are stored under the same hash. Because we just copied the file we initially created, they all have the same hash.
  • #34: Only references. We still don’t know where the actual contents of the objects (e.g. files) are stored.
  • #37: We didn’t look into the objects directory yet. That’s what we’re going to do now.
  • #38: The key is the object’s hash. The objects folder contains sub-directories named after the first two hexadecimal characters (i.e. the first byte) of the hash.
  • #40: Every time you run “git add”, a new blob object is added to the objects KVS (unless there already is an object with the same contents). Objects with the same hash are only stored once. Objects are stored with a small header and deflated with zlib (tree objects use a binary format). Objects are never removed as long as there is a reference to them somewhere. The garbage collector removes unreferenced objects. A packing algorithm compresses objects.
  • #41: Unless it was already sent to the remote; even then it’s more of a communication problem. Say you accidentally created a squashed merge commit instead of a “normal” merge commit.
  • #42: Select the **true** statements about the way _git_ works. https://www.polleverywhere.com/multiple_choice_polls/QH0fsFke1EnN0cjmZHgDR
  • #43: Three side tasks: What does git have to do with these two persons and a shopping cart?
  • #44: Named after its inventor Ralph Merkle. Also called a hash tree. Leaf nodes contain the actual content. Inner nodes contain the hashes of their children. Cryptographic properties: finding differences between two trees is really easy/efficient. Used in Bitcoin (Merkle proof), where a user can verify that a transaction has taken place by sending only hashes.
  • #45: Trust based on probabilities. An attacker couldn’t come up with a chain of hashes that produces the exact same Merkle root.
  • #46: If we change a file (hello3.txt) and create a new commit, this blob will get a new hash (because its contents have changed). In turn, the tree object will also get a new hash (because its contents, the references to the blobs/trees, have changed). And we also get a new commit hash (not only because of the modified tree hash but also because of the timestamp). If we later (git log -p) want to find out where these two commits differ (and where not), we only need to traverse the trees: subtrees with the same hash have not changed.