git as a NoSQL database
Kenneth Truyers
.NET Developer
Microsoft MVP / Pluralsight author
@kennethtruyers
www.kenneth-truyers.net
Kenneth Truyers - Using Git as a NoSql database - Codemotion Milan 2018
Mechanics of git as a database
Why it’s a fantastic idea
Why it’s a terrible idea
Database
“a structured set of data held in a computer, especially one that is accessible in various ways”
NoSQL
“provides a mechanism for storage and retrieval of data in means other than the tabular relations used in relational databases”
Schema-less
Non-relational
The naïve way
> git init myDatabase
Initialized empty Git repository in D:/myDatabase/.git/
> cd myDatabase
> echo {"id": 1, "name": "kenneth"} > 1.json
> git add 1.json
> git commit -m "Added person"
[master (root-commit) 6c6b907] Added person
1 file changed, 1 insertion(+)
create mode 100644 1.json
> git show master:1.json
{"id": 1, "name": "kenneth"}
HOW
Git repositories
Client Server (Bare repository)
Git data model
HOW
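[Diagram: the Reference Database holds master, which points at a commit; the Object Database holds the commits, each pointing at a tree, and the trees pointing at blobs. Labels shown: commits 05c1ce, 9918e4, 5d8r56c; trees d6916d, 2c5d87r, 2c59c06, 4d52c5a; blobs da95f82, 42d0620]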
Git commands: Porcelain
add archive bisect branch
checkout cherry-pick clean clone
commit diff fetch gc
init log merge mv
pull push rebase reset
revert rm show shortlog
stash status submodule tag
HOW
Git commands: Plumbing
apply commit-tree hash-object merge-file
mktag mktree for-each-ref read-tree
update-ref write-tree cat-file diff-files
diff-index diff-tree merge-base rev-list
show-index show-ref unpack-tree send-pack
http-fetch http-push shell check-attr
mailinfo mailsplit sh-setup stripspace
HOW
Plumbing
We need to get our hands dirty
Blobs
> echo {"id": 1, "name": "kenneth"} | git hash-object -w --stdin
da95f8264a0ffe3df10e94eed6371ea83aee9a4d
> git cat-file -p da95f
{"id": 1, "name": "kenneth"}
[Diagram: blob da95f82]
HOW
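The hash returned above depends only on the content, which is what makes the object database content-addressable. A minimal sketch in POSIX shell (an assumption, since the slides use a Windows prompt), assuming a SHA-1 repository and GNU `sha1sum`, recomputes the blob hash by hand:

```shell
set -e
cd "$(mktemp -d)"
git init -q

content='{"id": 1, "name": "kenneth"}'

# What git computes for a blob (no -w, so nothing is stored).
git_hash=$(printf '%s' "$content" | git hash-object --stdin)

# The same digest by hand: SHA-1 over the header "blob <length>\0" plus the content.
# ${#content} counts characters, which equals bytes for this ASCII sample.
manual_hash=$( { printf 'blob %s\0' "${#content}"; printf '%s' "$content"; } \
  | sha1sum | cut -d' ' -f1 )

echo "git:    $git_hash"
echo "manual: $manual_hash"
```

Two writes of identical content therefore end up as one stored object.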
Trees
> git update-index --add --cacheinfo 100644 da95f8264a0ffe3df10e94eed6371ea83aee9a4d 1.json
> git write-tree
d6916d3e27baa9ef2742c2ba09696f22e41011a1
> git cat-file -p d6916d
100644 blob da95f8264a0ffe3df10e94eed6371ea83aee9a4d    1.json
[Diagram: tree d6916d3 -> blob da95f82 (1.json)]
HOW
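The slide builds the tree through the index. As an alternative sketch, `git mktree` (also in the plumbing list) builds a tree directly from `ls-tree`-style lines, which avoids the index altogether. POSIX shell and the temporary bare repository are assumptions made for illustration:

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare

blob=$(printf '%s\n' '{"id": 1, "name": "kenneth"}' | git hash-object -w --stdin)

# mktree reads lines of the form "<mode> <type> <hash><TAB><name>".
tree=$(printf '100644 blob %s\t1.json\n' "$blob" | git mktree)

git cat-file -p "$tree"                      # prints the single tree entry
listing=$(git ls-tree --name-only "$tree")   # just the entry names
```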
Commits
> echo "commit kenneth" | git commit-tree d6916d
05c1cec5685bbb84e806886dba0de5e2f120ab2a
> git log --stat 05c1ce
commit 05c1cec5685bbb84e806886dba0de5e2f120ab2a
Author: Kenneth Truyers <truyers.kenneth@gmail.com>
Date: Sat Apr 29 10:15:23 2017 +0200
"commit kenneth"
1.json | 1 +
1 file changed, 1 insertion(+)
> git show 05c1cec:1.json
{"id": 1, "name": "kenneth"}
[Diagram: commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
Updating the file
> echo {"id": 1, "name": "updated name"} | git hash-object -w --stdin
42d0d209ecf70a96666f5a4c8ed97f3fd2b75dda
> git update-index --add --cacheinfo 100644 42d0d209ecf70a96666f5a4c8ed97f3fd2b75dda 1.json
> git write-tree
2c59068b29c38db26eda42def74b7142de392212
> echo "Commit Kenneth v2" | git commit-tree 2c59068 -p 05c1cec
9918e46dfc4241f0782265285970a7c16bf499e4
[Diagram: commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json); its parent, commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
Updating the file
> git log --stat 9918e46
9918e46dfc4241f0782265285970a7c16bf4 "Commit Kenneth v2"
1.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
05c1cec5685bbb84e806886dba0de5e2f120 "commit kenneth"
1.json | 1 +
1 file changed, 1 insertion(+)
> git show 9918e4:1.json
{"id": 1, "name": "updated name"}
> git show 05c1cec:1.json
{"id": 1, "name": "kenneth"}
[Diagram: commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json); its parent, commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
[Diagram: master -> commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json); parent commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
Refs
> git update-ref refs/heads/master 9918e4
> git show master:1.json
{"id": 1, "name": "updated name"}
HOW
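The four steps (blob, tree, commit, ref) can be wrapped into a pair of helpers. The sketch below is an illustration in POSIX shell, not the talk's code: `save` and `get` are hypothetical names, and the scratch index file is an assumption that lets `update-index` run inside a bare repository.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# save <branch> <key> <json>: hypothetical helper, stores one key and moves the branch.
save() {
  branch=$1 key=$2 value=$3
  ref="refs/heads/$branch"
  blob=$(printf '%s\n' "$value" | git hash-object -w --stdin)
  export GIT_INDEX_FILE="$PWD/save.index"    # scratch index (assumption)
  rm -f "$GIT_INDEX_FILE"
  if parent=$(git rev-parse -q --verify "$ref"); then
    git read-tree "$parent"                  # start from the current tree
    parent_args="-p $parent"
  else
    parent_args=""                           # first commit on this branch
  fi
  git update-index --add --cacheinfo 100644 "$blob" "$key"
  tree=$(git write-tree)
  commit=$(git commit-tree "$tree" $parent_args -m "save $key")
  git update-ref "$ref" "$commit"
  unset GIT_INDEX_FILE
  echo "$commit"
}

# get <branch> <key>: hypothetical helper, prints the stored value.
get() { git show "$1:$2"; }

save master 1.json '{"id": 1, "name": "kenneth"}' >/dev/null
save master 2.json '{"id": 2, "name": "second"}' >/dev/null
save master 1.json '{"id": 1, "name": "updated name"}' >/dev/null
get master 1.json
```

Each call to `save` produces exactly one commit, so the branch history doubles as a write log.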
Cool, but… I’m a programmer
libgit2
portable, pure C implementation of the core git methods
https://libgit2.github.com/
libgit2sharp
brings the might and speed of libgit2 to the managed world of .NET and Mono
https://github.com/libgit2/libgit2sharp
HOW
Building a git db in C#
public class GitDb
{
    public string Save<T>(string branch, string key, T value)
    {
    }

    public T Get<T>(string branch, string key)
    {
    }
}
HOW
Building a git db in C#
> git init --bare
public class GitDb
{
    readonly Repository _repo;

    public GitDb(string path)
    {
        Repository.Init(path, isBare: true);
        _repo = new Repository(path);
    }

    public string Save<T>(string branch, string key, T value)
    {
    }

    public T Get<T>(string branch, string key)
    {
    }
}
HOW
Writing a blob
public string Save<T>(string branch, string key, T value)
{
    string val = JsonConvert.SerializeObject(value);
    byte[] bytes = Encoding.UTF8.GetBytes(val);
    Stream stream = new MemoryStream(bytes);
    Blob blob = _repo.ObjectDatabase.CreateBlob(stream);
}
[Diagram: new blob 42d0d20 is written (commit 9918e46 and tree 2c59068 do not exist yet); master -> commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
Writing a tree
public string Save<T>(string branch, string key, T value)
{
    string val = JsonConvert.SerializeObject(value);
    byte[] bytes = Encoding.UTF8.GetBytes(val);
    Stream stream = new MemoryStream(bytes);
    Blob blob = _repo.ObjectDatabase.CreateBlob(stream);

    Commit currentCommit = _repo.Branches[branch].Tip;
    TreeDefinition treeDef = TreeDefinition.From(currentCommit);
    treeDef.Add(key, blob, Mode.NonExecutableFile);
    Tree tree = _repo.ObjectDatabase.CreateTree(treeDef);
}
[Diagram: new tree 2c59068 -> blob 42d0d20 (1.json) is written (commit 9918e46 does not exist yet); master -> commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
Committing the tree
public string Save<T>(string branch, string key, T value)
{
    string val = JsonConvert.SerializeObject(value);
    byte[] bytes = Encoding.UTF8.GetBytes(val);
    Stream stream = new MemoryStream(bytes);
    Blob blob = _repo.ObjectDatabase.CreateBlob(stream);

    Commit currentCommit = _repo.Branches[branch].Tip;
    TreeDefinition treeDef = TreeDefinition.From(currentCommit);
    treeDef.Add(key, blob, Mode.NonExecutableFile);
    var tree = _repo.ObjectDatabase.CreateTree(treeDef);

    // CreateCommit takes both an author and a committer signature
    var signature = new Signature("author", "email", DateTimeOffset.Now);
    Commit commit = _repo.ObjectDatabase.CreateCommit(
        author: signature,
        committer: signature,
        message: "commit message",
        tree: tree,
        parents: new List<Commit> { currentCommit },
        prettifyMessage: true);
}
[Diagram: new commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json) is written; master still points at commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
[Diagram: master moves from commit 05c1cec (tree d6916d3 -> blob da95f82, 1.json) to commit 9918e46 (tree 2c59068 -> blob 42d0d20, 1.json)]
Updating the branch
public string Save<T>(string branch, string key, T value)
{
    // Serialize the value and store it as a blob
    string val = JsonConvert.SerializeObject(value);
    byte[] bytes = Encoding.UTF8.GetBytes(val);
    Stream stream = new MemoryStream(bytes);
    Blob blob = _repo.ObjectDatabase.CreateBlob(stream);

    // Build a new tree from the branch tip, with the key added or replaced
    Commit currentCommit = _repo.Branches[branch].Tip;
    TreeDefinition treeDef = TreeDefinition.From(currentCommit);
    treeDef.Add(key, blob, Mode.NonExecutableFile);
    var tree = _repo.ObjectDatabase.CreateTree(treeDef);

    // CreateCommit takes both an author and a committer signature
    var signature = new Signature("author", "email", DateTimeOffset.Now);
    Commit commit = _repo.ObjectDatabase.CreateCommit(
        author: signature,
        committer: signature,
        message: "commit message",
        tree: tree,
        parents: new List<Commit> { currentCommit },
        prettifyMessage: true);

    // Move the branch reference to the new commit
    string refName = _repo.Branches
        .Single(b => b.FriendlyName == branch)
        .CanonicalName;
    _repo.Refs.UpdateTarget(_repo.Refs[refName], commit.Id);
    return commit.Sha;
}
HOW
Reading
public T Get<T>(string branch, string key)
{
    Branch branchRef = _repo.Branches[branch];
    Commit commit = branchRef.Tip;
    TreeEntry entry = commit[key];
    GitObject gitObj = entry.Target;
    Blob blob = gitObj as Blob;
    string content = blob.GetContentText();
    return JsonConvert.DeserializeObject<T>(content);
}
[Diagram: master -> commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json); parent commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json)]
HOW
Cool, but… where’s my ORM?
HOW
github.com/YellowLineParking/Appy.GitDb
goo.gl/vQpxra
Why it’s a
fantastic idea
Schema-less
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}
WHY
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Street { get; set; }
    public string Country { get; set; }
}
Versioning and roll-back
WHY
> git show master:1.json
{"id": 1, "name": "updated name"}
> git update-ref refs/heads/master 05c1ce
> git show master:1.json
{"id": 1, "name": "kenneth"}
[Diagram: master moves back from commit 9918e46 (tree 2c59068 -> blob 42d0d20, 1.json) to commit 05c1cec (tree d6916d3 -> blob da95f82, 1.json)]
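Because every write is a commit, old versions of a key stay readable without moving the branch, and a roll-back is a single ref update. A minimal sketch in POSIX shell (the temporary repository and sample values are assumptions for illustration):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf '%s\n' '{"id": 1, "name": "kenneth"}' > 1.json
git add 1.json
git commit -q -m "v1"
v1=$(git rev-parse HEAD)

printf '%s\n' '{"id": 1, "name": "updated name"}' > 1.json
git commit -q -am "v2"

# Every commit that touched the key, newest first.
versions=$(git log --format=%H -- 1.json)

# Read an old version without moving the branch.
old=$(git show "$v1:1.json")

# Roll the branch pointer back, as on the slide.
git update-ref refs/heads/master "$v1"
current=$(git show master:1.json)
```

The newer commit is not deleted by the roll-back; it simply stops being reachable from `master`.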
Diff
WHY
> git diff 9918e4 05c1ce -- 1.json
diff --git a/1.json b/1.json
--- a/1.json
+++ b/1.json
@@ -1 +1 @@
-{"id": 1, "name": "updated name"}
+{"id": 1, "name": "kenneth"}
[Diagram: the two commits being compared: commit 9918e46 -> tree 2c59068 -> blob 42d0d20 (1.json) and commit 05c1cec -> tree d6916d3 -> blob da95f82 (1.json), with master at 05c1cec]
Backup & Replication
WHY
> git remote add backup https://github.com/user/repo.git
> git push backup
> type .git/hooks/post-commit
#!/bin/sh
git push backup
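One caveat worth sketching: `post-commit` is run by `git commit`, so writes made with plumbing commands (`commit-tree` plus `update-ref`) do not trigger it and need an explicit push. The example below is a hedged illustration in POSIX shell; the local `backup.git` directory stands in for a hosted remote.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/db.git"
git init -q --bare "$work/backup.git"     # stand-in for a remote repository
cd "$work/db.git"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A plumbing-only write, as in the earlier slides.
blob=$(printf '%s\n' '{"id": 1, "name": "kenneth"}' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\t1.json\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m "Added person")
git update-ref refs/heads/master "$commit"

# Replicate explicitly after the write.
git remote add backup "$work/backup.git"
git push -q backup refs/heads/master

restored=$(git --git-dir="$work/backup.git" show master:1.json)
```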
Transactions
Short lived
WHY
> echo "file1" | git hash-object -w --stdin
42d0d20...
> echo "file2" | git hash-object -w --stdin
5d8f5r0...
> git update-index --add --cacheinfo 100644 42d0d20... 1.json
> git update-index --add --cacheinfo 100644 5d8f5r0... 2.json
> git write-tree
9c38db2...
> echo "commit transaction" | git commit-tree 9c38db2 -p 05c1cec
9918e46...
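A runnable variant of the same idea, in POSIX shell with a temporary repository (both assumptions): two keys are staged and become visible together, because readers only see the branch move once. Unlike the slide, the commit here has no parent, since the repository starts empty.

```shell
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

before=$(git rev-parse -q --verify refs/heads/master || echo none)

blob1=$(printf 'file1\n' | git hash-object -w --stdin)
blob2=$(printf 'file2\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob1" 1.json
git update-index --add --cacheinfo 100644 "$blob2" 2.json
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -m "commit transaction")

# Nothing above is visible through master until this single ref update.
git update-ref refs/heads/master "$commit"
```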
Transactions
Long lived
WHY
> git checkout -b transaction
...
...
...
> git checkout master
> git merge transaction
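Filled in with concrete steps, a long-lived transaction might look like the sketch below (POSIX shell; the file names and contents are made up for illustration). The branch collects several commits, and `master` only changes when the branch is merged.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf '%s\n' '{"id": 1}' > 1.json
git add 1.json
git commit -q -m "initial"

git checkout -q -b transaction
printf '%s\n' '{"id": 2}' > 2.json
git add 2.json
git commit -q -m "step 1"
printf '%s\n' '{"id": 3}' > 3.json
git add 3.json
git commit -q -m "step 2"

# master is untouched while the transaction is open.
during=$(git ls-tree --name-only master | tr '\n' ' ')

git checkout -q master
git merge -q transaction      # "commit" the transaction
after=$(git ls-tree --name-only master | tr '\n' ' ')
# To abandon the transaction instead: git branch -D transaction
```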
Tooling
WHY
App
GIT
Repository
Appy.GitDb
bash
Why it’s a
terrible idea
Queries
WHY
NOT
KEY
KEY PREFIX (sort of*)
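The two supported lookups can be sketched as follows (POSIX shell; the `users/` and `orders/` layout is an assumption). A key lookup is a path lookup, and a prefix lookup is a listing of one subtree, so the prefix has to fall on a directory boundary, which may be what the "sort of" refers to.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir users orders
printf '%s\n' '{"id": 1, "name": "kenneth"}' > users/1.json
printf '%s\n' '{"id": 2, "name": "second"}'  > users/2.json
printf '%s\n' '{"id": 1, "total": 10}'       > orders/1.json
git add .
git commit -q -m "seed"

# Lookup by key: one path, one blob.
by_key=$(git show master:users/2.json)

# Lookup by key prefix: list the entries of one subtree.
by_prefix=$(git ls-tree --name-only master users/ | tr '\n' ' ')
```

Anything else, such as "all users named kenneth", means reading and deserializing every blob.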
Solution
[Diagram: the App WRITEs to GIT; a COMMIT HOOK forwards changes to SQL, ElasticSearch, ...; the App READs from those stores]
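What such a hook needs is the list of keys that changed between two commits. The sketch below (POSIX shell) obtains it with `git diff-tree`; which hook runs it and how the result reaches SQL or ElasticSearch are not specified on the slide, so the `reindex` lines are only placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf '%s\n' '{"id": 1, "name": "kenneth"}' > 1.json
git add 1.json
git commit -q -m "first"
old=$(git rev-parse HEAD)

printf '%s\n' '{"id": 1, "name": "updated name"}' > 1.json
printf '%s\n' '{"id": 2, "name": "second"}' > 2.json
git add .
git commit -q -m "second"
new=$(git rev-parse HEAD)

# Changed keys between two commits: A(dded), M(odified), D(eleted).
changes=$(git diff-tree -r --name-status "$old" "$new")

# Placeholder for the real work: a hook would push each key to the read store.
actions=$(printf '%s\n' "$changes" | while IFS="$(printf '\t')" read -r status key; do
  echo "reindex $status $key"
done)
printf '%s\n' "$actions"
```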
WHY
NOT
public string Save<T>(string branch, string key, T value)
{
    Blob blob = _repo.ObjectDatabase.CreateBlob(...);
    TreeDefinition treeDef = TreeDefinition.From(...);
    treeDef.Add(key, blob, Mode.NonExecutableFile);
    Tree tree = _repo.ObjectDatabase.CreateTree(treeDef);
    Commit commit = _repo.ObjectDatabase.CreateCommit(...);
    _repo.Refs.UpdateTarget(_repo.Refs[refName], commit.Id);
}
Concurrency
WHY
NOT
[Diagram: master -> commit A -> tree A -> blob A; two concurrent writers have each created their own objects, tree B -> blob B and tree C -> blob C]
[Diagram: commit B and commit C both build on commit A; master points at commit A and both writers try to move it, one to B and one to C]
var locks = new Dictionary<string, object>
{
    { "master", new object() },
    { "branch", new object() },
};

lock (locks[branch])
{
    // ...
}
Solution
WHY
NOT
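The lock above serializes writers inside one process. A different technique, shown here only as a hedged alternative and not as the talk's solution, is an optimistic compare-and-swap on the ref: `git update-ref <ref> <new> <old>` refuses to move the ref unless it still holds `<old>`. The sketch is POSIX shell with a temporary bare repository.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

empty_tree=$(git mktree </dev/null)
base=$(git commit-tree "$empty_tree" -m "A")
git update-ref refs/heads/master "$base"

# Two writers both start from commit A.
b=$(git commit-tree "$empty_tree" -p "$base" -m "B")
c=$(git commit-tree "$empty_tree" -p "$base" -m "C")

# The first writer wins: master still equals A, so it moves to B.
git update-ref refs/heads/master "$b" "$base"

# The second writer expected A but finds B, so the update is refused.
if git update-ref refs/heads/master "$c" "$base" 2>/dev/null; then
  second=accepted
else
  second=rejected   # writer C would re-read the tip, rebuild its tree and retry
fi
echo "$second"
```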
Performance
WHY
NOT
Writes per second
125 (10,000 files)
18 (1,000,000 files)
tree A (1,000,000 entries)
100644 blob b9e36d8ed060ebdad585e93cec84 1.json
100644 blob 5a947c79aae7b7f6c663f632bfcf 2.json
...
100644 blob 4b54f5a8e56cfbc90947e3071ef6 999998.json
100644 blob 06179b3af5668bf7aa1dcb205949 999999.json
100644 blob a3d0db85108eb3e0c3d2b989a2ce 1000000.json
tree B (1,000,001 entries, written in full after adding one file)
100644 blob b9e36d8ed060ebdad585e93cec84 1.json
100644 blob 5a947c79aae7b7f6c663f632bfcf 2.json
...
100644 blob 4b54f5a8e56cfbc90947e3071ef6 999998.json
100644 blob 06179b3af5668bf7aa1dcb205949 999999.json
100644 blob a3d0db85108eb3e0c3d2b989a2ce 1000000.json
100644 blob 5bc5e544f68a56f992ca2be2fd26 1000001.json
[Diagram: master, commit, tree A, tree B]
[Diagram: nested trees, each node labeled 100]
Solution: tree nesting
WHY
NOT
Writes per second
125 (10,000 files)
18 (1,000,000 files)
250 (1,000,000 files, nested trees)
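Nesting helps because a write only rewrites the trees along one path instead of a single tree with a million entries. One way to get there, sketched in POSIX shell, is to derive the path from the key. `key_path` is a hypothetical helper, and the three-level, two-digit layout is an assumption chosen so that no tree holds more than 100 entries for ids below 1,000,000.

```shell
set -e

# key_path <numeric id>: hypothetical helper, maps an id to a nested path.
# Ids are expected without leading zeros.
key_path() {
  padded=$(printf '%06d' "$1")
  a=$(printf '%s' "$padded" | cut -c1-2)
  b=$(printf '%s' "$padded" | cut -c3-4)
  c=$(printf '%s' "$padded" | cut -c5-6)
  printf '%s/%s/%s.json\n' "$a" "$b" "$c"
}

key_path 7
key_path 123456
key_path 999999
```

With this layout a single write rewrites three trees of at most 100 entries each.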
Solution: partitioning
WHY
NOT
[Diagram: App -> Repo 1 (Disk 1), Repo 2 (Disk 2), Repo 3 (Disk 3), ...]
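Partitioning comes down to routing each write to a repository chosen by a partition key. The sketch below is POSIX shell; `repo_for` and `put` are hypothetical helpers, the country codes are sample data, and `put` replaces the whole tree with a single key to keep the example short.

```shell
set -e
root=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# repo_for <partition>: hypothetical helper, one bare repository per partition.
repo_for() {
  dir="$root/$1.git"
  [ -d "$dir" ] || git init -q --bare "$dir"
  printf '%s\n' "$dir"
}

# put <partition> <key> <json>: hypothetical helper, writes into that partition only.
put() {
  repo=$(repo_for "$1")
  blob=$(printf '%s\n' "$3" | git --git-dir="$repo" hash-object -w --stdin)
  tree=$(printf '100644 blob %s\t%s\n' "$blob" "$2" | git --git-dir="$repo" mktree)
  commit=$(git --git-dir="$repo" commit-tree "$tree" -m "save $2")
  git --git-dir="$repo" update-ref refs/heads/master "$commit"
}

put es 1.json '{"id": 1, "country": "es"}'
put be 1.json '{"id": 1, "country": "be"}'
```

Each repository can then live on its own disk, and writers to different partitions never contend for the same branch.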
Solution: back-end
WHY
NOT
[Diagram: App -> GIT]
Merge conflicts
WHY
NOT
Solution?
WHY
NOT
Good idea
Content heavy: CMS, Wiki, …
Partitionable: Country, customer, …
Bad idea
Fast writes
Immediate consistency
Demo
(ab)use your tools
Kenneth Truyers
@kennethtruyers
www.kenneth-truyers.net
bit.ly/git-db-code
bit.ly/git-nosql
