The State of Decentralized Storage
26 September 2022
The Global Data Landscape
Decentralized storage is still a small fraction of the pie
[Chart: Amount of data generated globally (zettabytes)]
In 2021, the world generated approximately 79 zettabytes of data [1], or 79B terabytes of data.
In 2019, Google, Amazon, Microsoft and Facebook stored at least 1.2M TB [2].
BUT… 0.015% of generated data is stored [3].
Even with a 5x increase in 2022, only
Sources:
[1] Statista
[2] Science Focus
[3] An Honest Report on Web3 Data & Storage
History of Decentralized Storage
June 1999: Launch of Napster, a P2P audio file sharing system
Early 2000: Launch of Gnutella, the first decentralized file sharing protocol
May 2000: First version of LimeWire released
July 2001: First version of BitTorrent released
July 2010: The Pirate Bay, a P2P file sharing site, is shut down by authorities
February 2015: Birth of IPFS
June 2015: Launch of the Sia network
September 2017: Filecoin’s ICO raises $257M
June 2018: Launch of the Arweave mainnet
July 2018: Justin Sun’s Tron Foundation acquires BitTorrent for $140M
Late 2018: Launch of Storj, a decentralized cloud storage platform
October 2019: Launch of BitTorrent File System (BTFS) mainnet
October 2020: Launch of Filecoin mainnet
Why Do We Need Decentralized Storage?
The benefits of decentralized storage mirror those of blockchain systems

Data Availability & Resiliency
Current state of centralized storage:
► Data is typically stored on one main site, so it becomes unavailable if that site goes offline
► Backup storage / site(s) can be provisioned but usually cost more
► Typically works over HTTP, which points to a specific path. If the data is no longer at that path, the link goes dead and returns a 404 error.
How decentralized storage improves on this:
► Data is typically broken down into many parts and stored on multiple nodes, creating natural redundancy
► Uses advanced techniques to reassemble data without needing all the parts, further improving availability
► On-chain storage is “always up” as long as there are miners / validators.
► In addition to supporting HTTP, mainly uses IPFS, which takes a content-based approach that does not depend on path

Security / Encryption
Current state of centralized storage:
► Data stored may (or may not) be encrypted
► Encryption keys are also stored in centralized databases, making them prime targets for hacks
How decentralized storage improves on this:
► Most solutions offer automatic encryption, and each individual part of the data can be encrypted separately

Data Integrity
Current state of centralized storage:
► Tracing unauthorized changes to the data requires a logging tool to have been set up beforehand. If no such tool was in place, such changes may go unnoticed
How decentralized storage improves on this:
► IPFS’ approach uses a hash to identify data
► Hashing helps ensure that data has not been corrupted or improperly altered, as any change to the data changes the hash

Privacy
Current state of centralized storage:
► Certain storage providers monetize the content of your data, e.g. by serving ads
How decentralized storage improves on this:
► As each node stores encrypted data, the node has no way to read the contents of the data

Open / Censorship Resistance
Current state of centralized storage:
► Provider may undertake KYC on prospective clients
► Data stored at a specific site is governed by local data laws / regulations
How decentralized storage improves on this:
► Permissionless solutions where anyone can store data
► As nodes have no knowledge of the actual content of the data, they cannot censor it based on what it contains

Performance / Scalability
Current state of centralized storage:
► Typically has multi-tier offerings based on performance
► Actual performance depends on proximity to the provider’s data centers, which are typically built in locations with advanced bandwidth infrastructure
► Sites are typically large data centers run by providers, which requires significant investment from providers
How decentralized storage improves on this:
► Multiple tiers are beginning to emerge (though not yet prevalent and not as varied as those of centralized providers)
► Nodes could potentially be spread more widely across the globe, helping reach more far-flung areas
► Any miner / validator that meets the much lower minimum hardware requirements can join in return for rewards
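The content-based addressing and hash-based integrity check described in the table can be illustrated with a small sketch. It is not how IPFS is implemented: real IPFS content identifiers use a multihash encoding and chunked data, whereas this example assumes a plain hex SHA-256 digest as the identifier, and `ContentStore` is a hypothetical in-memory class introduced only for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content identifier."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content hash instead of by path (hypothetical)."""

    def __init__(self):
        self._blocks = {}  # maps identifier -> raw bytes

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blocks[cid]
        # Re-hash on read: any corruption or tampering changes the digest.
        if content_id(data) != cid:
            raise ValueError("stored data does not match its identifier")
        return data


store = ContentStore()
original = b"hello decentralized storage"
cid = store.put(original)

# Changing even one byte of the data yields a different identifier.
tampered_cid = content_id(original + b"!")
print(cid != tampered_cid)
```

Because the identifier is derived from the bytes themselves, a lookup does not depend on where the data lives, and a reader can verify what it received without trusting the node that served it.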
Who’s Currently Using It?
Used by a wide range of users, both in and out of the crypto space

NFTs
• Images, audio and video files are usually too expensive to store on the blockchain.
• With decentralized storage, the content of the NFT (audio, images and its metadata) can be stored off-chain and referenced by a unique hash.
• The content can then be hosted on IPFS or other decentralized solutions, where it can be stored and accessed by that hash.

Developers
• Allows developers to create tools and websites in an isolated environment.
• Developers can create and host important files and UIs that are censorship-resistant and pivotal for decentralized applications (dApps).
• Networks can store historic on-chain data to reduce the computational load of validators.

Everyday Users
• Users can permanently store important documents and retrieve them even if centralized services fail or cease to exist.
• Businesses can back up their data permanently for future purposes.
• Ability to access websites or information restricted by centralized entities.
Decentralized Storage Projects by Data Persistence Mechanisms
Except for Arweave, the decentralized storage projects today are contract-based

Contract-based Persistence
▪ Instead of replicating data across every node on the network, a set of nodes enters into a contract to store a piece of data for a specific period of time.
▪ The contract can then be renewed if the time period needs to be extended.
▪ Instead of the entire data set, the hash of where the data is located gets stored on-chain.

Blockchain-based Persistence
▪ Technically, every blockchain is a distributed database and can function as a decentralized storage network.
▪ However, most blockchains are not built to store large amounts of data. They are designed to store transactions, and are typically also append-only.
▪ They are also inefficient in the sense that every node on the network needs to keep a copy of the data.

Source: Ethereum Foundation
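Contract-based persistence can be pictured as a small record: a hash, a set of nodes, and a time window that can be extended. The sketch below is only a mental model under stated assumptions. `StorageContract`, its fields, and the day-based clock are all hypothetical and do not correspond to any particular network's contract format.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class StorageContract:
    """Hypothetical record of a time-limited storage deal."""
    content_hash: str   # only this hash would be recorded on-chain
    node_ids: tuple     # the set of nodes that agreed to hold the data
    start_day: int
    duration_days: int

    def expires_on(self) -> int:
        return self.start_day + self.duration_days

    def is_active(self, today: int) -> bool:
        return self.start_day <= today < self.expires_on()

    def renew(self, extra_days: int) -> None:
        """Extend the contract when the storage period needs to be longer."""
        if extra_days <= 0:
            raise ValueError("extra_days must be positive")
        self.duration_days += extra_days


data = b"archive.tar contents"
contract = StorageContract(
    content_hash=hashlib.sha256(data).hexdigest(),
    node_ids=("node-a", "node-b", "node-c"),
    start_day=0,
    duration_days=90,
)

active_before = contract.is_active(today=120)  # the 90-day term has lapsed by day 120
contract.renew(extra_days=90)                  # term now runs to day 180
active_after = contract.is_active(today=120)
print(active_before, active_after)
```

The point of the design is that the chain only carries the small, fixed-size hash and the terms of the deal, while the bulky data sits with the contracted nodes.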
Features and Technical Specifications
Each decentralized storage solution comes with its own unique features

Filecoin
Current Market Cap: ~$1,520M
Consensus Algorithm: Proof of Spacetime (PoSt) & Proof of Replication (PoR)
Data Replication & Retrieval: Users choose the number of copies to be replicated
Encryption: Users choose whether to encrypt their stored data or not
Smart Contract Execution: Utilizes Filecoin Virtual Machine (FVM)
Minimum Hosting Requirements: CPU: 8 cores or more; RAM: 137GB or more; Hard Drive: 1.1TB or more

Arweave
Current Market Cap: ~$546M
Consensus Algorithm: Succinct Proof of Random Access (SPoRA)
Data Replication & Retrieval: Via recall of data stored by miners. Data is replicated over 16 times across the blockweave
Encryption: Users choose whether to encrypt their stored data or not
Smart Contract Execution: ‘Lazy’ SmartWeave contracts that are executed and validated by users, not the network
Minimum Hosting Requirements: CPU: 6 cores or more; RAM: 8.6GB or more; Hard Drive: 4TB or more

Storj
Current Market Cap: ~$81M
Consensus Algorithm: Proof of Availability (PoA)
Data Replication & Retrieval: Via Reed-Solomon erasure coding. Data is split into 80 pieces and only 29 are needed for retrieval
Encryption: Automatically encrypted using the AES-256 algorithm by default
Smart Contract Execution: Does not have smart contracts
Minimum Hosting Requirements: CPU: 1 core or more; RAM: 2GB or more; Hard Drive: 550GB or more

Sia
Current Market Cap: ~$214M
Consensus Algorithm: Proof of Work (PoW)
Data Replication & Retrieval: Via Reed-Solomon erasure coding. Data is split into 30 pieces and only 10 are needed for retrieval
Encryption: Automatically encrypted using the Threefish algorithm by default
Smart Contract Execution: File contracts between renters and storage providers, automatically enforced by the network
Minimum Hosting Requirements: CPU: 4 cores or more; RAM: 8GB or more; Hard Drive: 64GB or more

BitTorrent (BTFS)
Current Market Cap: ~$851M
Consensus Algorithm: Proof of Stake (PoS)
Data Replication & Retrieval: Via Reed-Solomon erasure coding. Data is split into 30 pieces and only 10 are needed for retrieval
Encryption: Users choose whether to encrypt their stored data or not
Smart Contract Execution: Utilizes BitTorrent-Chain Virtual Machine (BTTCVM)
Minimum Hosting Requirements: CPU: 1 core or more; RAM: 1GB or more; Hard Drive: 32GB or more

Amazon
Current Market Cap: $1.5T [1]
Consensus Algorithm: N/A
Data Replication & Retrieval: Users choose specific files to replicate within or across different regions
Encryption: Users can enable server-side encryption using the AES-256 algorithm
Smart Contract Execution: N/A
Minimum Hosting Requirements: N/A

[1] Source: Amazon market cap, Barron’s
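The erasure-coding figures quoted above (80 pieces with 29 needed, and 30 pieces with 10 needed) imply a storage overhead and a loss tolerance that are easy to work out. The sketch below does only that arithmetic and does not implement Reed-Solomon coding itself. `ErasureScheme` is a hypothetical helper, and the durability estimate assumes each piece is available independently with the same probability, which real networks do not guarantee.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class ErasureScheme:
    total_pieces: int    # n: pieces produced from each segment of data
    needed_pieces: int   # k: pieces required to reconstruct it

    @property
    def expansion_factor(self) -> float:
        """Raw storage consumed per unit of user data (n / k)."""
        return self.total_pieces / self.needed_pieces

    @property
    def tolerated_losses(self) -> int:
        """Pieces that can disappear while the data stays recoverable."""
        return self.total_pieces - self.needed_pieces

    def durability(self, p_piece_available: float) -> float:
        """Probability that at least k of n pieces are available,
        assuming independent pieces (a simplifying assumption)."""
        n, k, p = self.total_pieces, self.needed_pieces, p_piece_available
        return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


scheme_80_29 = ErasureScheme(total_pieces=80, needed_pieces=29)
scheme_30_10 = ErasureScheme(total_pieces=30, needed_pieces=10)

for name, s in [("80/29", scheme_80_29), ("30/10", scheme_30_10)]:
    print(
        f"{name}: expansion {s.expansion_factor:.2f}x, "
        f"tolerates {s.tolerated_losses} lost pieces, "
        f"recoverable with probability {s.durability(0.8):.6f} at 80% piece availability"
    )
```

Both schemes store roughly three times the original data, but they spread it across many more independent pieces than three full copies would, which is why a large share of nodes can go offline without losing the file.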
Active Nodes
Is it really decentralized?
[Chart: Active Nodes] Arweave: 69; Sia: 692; Filecoin: 4,051; Storj: 15,860; BitTorrent: 4.87M*
Not all nodes are created equal. Technical or hardware requirements for some nodes, such as those of Filecoin and Arweave, are considerably higher. Different systems present different tradeoffs.
Ultimately, decentralized storage services provide censorship resistance if users’ data are split and distributed widely enough.
Sources: ViewBlock, SiaStats, Filfox, StorjStats, BTFS Scan (Data as of 8 Sept 2022)
*BTFS documentation indicates that 1) node count may include both renters and hosts, and 2) there is no limit to the number of nodes that can be run from a single public IP
Capacity vs Usage
Decentralized storage capacity has increased exponentially in the past 2 years
As the NFT season of 2021 took off, there was a
surge of demand for decentralized storage,
resulting in a massive increase of available
storage. By the end of 2021, the total storage
capacity breached 16.7M TB, increasing by
more than 4x from 2020.
Filecoin currently has the largest capacity
compared to other decentralized storage
solutions, with network storage power of over
21M TB. That’s more than 40x the capacity of
BitTorrent’s BTFS network, the
2nd largest decentralized storage provider.
However, most of this storage currently remains
unused. As of Q3 2022, only 1% of Filecoin’s
total capacity is actively being used. On the
other hand, usage on other smaller solutions
is much higher. For example, ~64% of Storj’s
total capacity is currently utilized.
*Total capacity data includes Arweave, Filecoin, Storj, and Sia. BTFS excluded due to incomplete data
**Arweave usage is always equal to its capacity
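The capacity and usage figures above can be turned into rough absolute numbers. This is a back-of-the-envelope sketch that treats the quoted values ("over 21M TB", "only 1%", "more than 40x", "more than 4x") as exact, so the results are approximations and bounds rather than measured data. All variable names are illustrative.

```python
TB_PER_MILLION = 1_000_000

filecoin_capacity_tb = 21 * TB_PER_MILLION   # "over 21M TB"
filecoin_utilization_pct = 1                 # "only 1% ... actively being used"
filecoin_to_btfs_ratio = 40                  # Filecoin is "more than 40x" BTFS

# Storage actually in use on Filecoin, if exactly 1% of 21M TB is utilized.
filecoin_used_tb = filecoin_capacity_tb * filecoin_utilization_pct / 100

# Implied BTFS capacity if the ratio were exactly 40x.
btfs_capacity_tb = filecoin_capacity_tb / filecoin_to_btfs_ratio

total_capacity_2021_tb = 16_700_000          # "breached 16.7M TB" by end of 2021
growth_multiple = 4                          # "more than 4x from 2020"

# Implied total capacity at the end of 2020 if growth were exactly 4x.
total_capacity_2020_tb = total_capacity_2021_tb / growth_multiple

print(f"Filecoin in use: ~{filecoin_used_tb:,.0f} TB")
print(f"Implied BTFS capacity: ~{btfs_capacity_tb:,.0f} TB")
print(f"Implied 2020 total capacity: ~{total_capacity_2020_tb:,.0f} TB")
```

Seen this way, the headline capacity mostly measures supply: on these assumptions, the data actually stored on Filecoin is a small fraction of what the network could hold.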
Cost of Decentralized Storage
Decentralized Trumps Centralized

Monthly Price per TB
Decentralized Providers: $0.0002; $1.09; $4.00; $0.94; $3.01
Centralized Providers: $6.00*; $5.00; $7.00; $5.00; $7.00*
* Only 2TB packages available. Figure is derived by dividing the cost of the package by half.

Decentralized storage is much cheaper
• On pricing, decentralized storage beats centralized storage. Even within the decentralized sector there are outliers, especially Filecoin.
• Under Filecoin Plus, an incentive program that boosts rewards for legitimate, verified deals, storage providers offer near-zero or zero fees to compete for block rewards.
• These rewards are often subsidized by Filecoin as it aims to grow the network.

The catch:
• Bandwidth: upload (ingress) and retrieval (egress) fees are involved as well.
• Storj charges $7 / TB to upload / download, while Sia costs $0.41 / TB to upload and $2 / TB to download.
• Filecoin charges a market price that’s quoted by the storage or retrieval miners.
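Since bandwidth fees sit on top of the monthly storage price, the effective bill depends on how much data moves in and out. The sketch below combines the two, using the Storj and Sia bandwidth fees exactly as quoted in the text. The `BandwidthFees` class, the `monthly_cost` function, the workload, and the $4.00 per TB storage price passed in the example are hypothetical choices for illustration, not a statement of any provider's actual price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandwidthFees:
    upload_per_tb: float    # ingress fee in USD per TB
    download_per_tb: float  # egress fee in USD per TB


# Bandwidth fees as quoted in the text above.
STORJ_FEES = BandwidthFees(upload_per_tb=7.00, download_per_tb=7.00)
SIA_FEES = BandwidthFees(upload_per_tb=0.41, download_per_tb=2.00)


def bandwidth_cost(fees: BandwidthFees, uploaded_tb: float, downloaded_tb: float) -> float:
    """Cost of moving data in and out, excluding storage itself."""
    return fees.upload_per_tb * uploaded_tb + fees.download_per_tb * downloaded_tb


def monthly_cost(storage_price_per_tb: float, fees: BandwidthFees,
                 stored_tb: float, uploaded_tb: float, downloaded_tb: float) -> float:
    """Storage plus bandwidth for one month."""
    return storage_price_per_tb * stored_tb + bandwidth_cost(fees, uploaded_tb, downloaded_tb)


# Hypothetical workload: store 10 TB, upload all of it this month, download 2 TB.
storj_bandwidth = bandwidth_cost(STORJ_FEES, uploaded_tb=10, downloaded_tb=2)
sia_bandwidth = bandwidth_cost(SIA_FEES, uploaded_tb=10, downloaded_tb=2)

# Hypothetical storage price of $4.00 per TB, chosen only to show the combined bill.
example_total = monthly_cost(4.00, STORJ_FEES, stored_tb=10, uploaded_tb=10, downloaded_tb=2)

print(f"Storj-style bandwidth: ${storj_bandwidth:.2f}")
print(f"Sia-style bandwidth: ${sia_bandwidth:.2f}")
print(f"Example monthly total: ${example_total:.2f}")
```

For read-heavy workloads the bandwidth term can exceed the storage term, so a per-TB storage price alone understates what a user would pay.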
Protocol Revenue
Filecoin is far ahead of its peers, but is still a long way behind centralized services

[Chart: Revenue (Q2 2022)]
Centralized Storage: AWS $19.7B; Oracle $2.9B; Dropbox $0.6B
Decentralized Storage: Filecoin $13,372,089; Arweave $193,430; SiaCoin $35,531

The bulk of decentralized storage protocols’ revenue comes from network fees and is closely tied to the price of their coins (in USD terms).
Filecoin’s revenue has taken a hit due to the slumping FIL price. For context, FIL closed at $5.32 at the end of Q2 2022, down 85% from the start of the year. Yet Filecoin is still by far the largest decentralized storage solution by revenue.
Decentralized storage, however, still pales in comparison with its centralized counterparts. AWS, for instance, raked in close to $20 billion in Q2, more than 1,000 times the revenue of Filecoin, Arweave and Siacoin combined.
Decentralized storage networks are still growing, and we can expect more services and applications to be built on top of them, providing additional revenue streams in the future.
Revenue data for other decentralized solutions such as Storj and BitTorrent is unavailable.
Sources: TokenTerminal, SiaStats, CNBC
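The "more than 1,000 times" comparison and the FIL price drop can be checked directly from the figures given. This is a quick arithmetic sketch using the Q2 2022 revenue values as read from the chart labels. The variable names are illustrative, and the implied start-of-year FIL price assumes the 85% decline is exact.

```python
centralized_q2_2022_usd = {"AWS": 19.7e9, "Oracle": 2.9e9, "Dropbox": 0.6e9}
decentralized_q2_2022_usd = {"Filecoin": 13_372_089, "Arweave": 193_430, "SiaCoin": 35_531}

# Combined revenue of the three decentralized protocols.
decentralized_total = sum(decentralized_q2_2022_usd.values())

# How many times larger AWS revenue is than that combined total.
aws_multiple = centralized_q2_2022_usd["AWS"] / decentralized_total

# Filecoin's share of the decentralized total.
filecoin_share = decentralized_q2_2022_usd["Filecoin"] / decentralized_total

# FIL closed Q2 2022 at $5.32 after an 85% decline, so the starting price
# is the closing price divided by the fraction that remained.
fil_q2_close = 5.32
fil_start_of_year = fil_q2_close / (1 - 0.85)

print(f"Decentralized total: ${decentralized_total:,}")
print(f"AWS multiple: {aws_multiple:,.0f}x")
print(f"Filecoin share: {filecoin_share:.1%}")
print(f"Implied FIL price at start of year: ${fil_start_of_year:.2f}")
```

The ratio comes out well above 1,000, consistent with the claim in the text, and nearly all decentralized protocol revenue in this sample is Filecoin's.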
Decentralized Apps on Decentralized Storage
What's being built on storage L1s?

The most ubiquitous use of decentralized storage is for storing NFTs and Web3 data. However, apps span many different current and emerging use cases, some of which may overlap. Others include content distribution networks, decentralized IDs, oracles, payment systems, e-mail and more.
Arweave leads in terms of native applications and smart contract capabilities, while Filecoin has many prominent users (e.g. MagicEden, OpenSea, Audius, etc.) and stores important data (e.g. Shoah Foundation, Internet Archive).
Further buildout of the Filecoin Virtual Machine in late 2022 could see more native applications built on Filecoin.

Categories: Web3 storage, Dev Tooling, Data Market, Consumer Storage, Marketplaces, Socials, Others
Apps shown:
• Web3.Storage, NFT.storage, Filedrive, Fleek, Estuary, Lighthouse, Ocean, Filehive, ChainSafe Files, Slate, Chingari, Filfox, Filscan, Filecoin Green
• Kyve, Permafrost, Via, Arweave.Design, Gitopia, TestWeave, Amplify, Koii, Meson.network, Ardrive, Akord, Evermore, Verto, Pianity, Metaweave, Decent.land, Glass, Koii, Weve, Sarcophagus, ArVerify, Traxa
• Uplink CLI, Drivex, Arq, FileZilla, Fastly
• Skynet, Filebase, VUP, Arzen, SkyFeed, SkyID, SkySend
*BTFS apps still in development
Looking to the future
Decentralized storage is part of the value chain of online computing

To understand where decentralized storage is headed, we must look at the bigger picture. Data storage is merely a subset of the larger value chain of online computing.

Value Chain: User, Computing, Storage

Blockchain
1. Immutable data
2. Censorship-resistant
3. Usually cheaper

Cloud
1. Convenient
2. Established network effect
3. One-stop shop for IT products
Understanding the competition
Incumbents already have a head start after capturing the value chain

1. While decentralized storage providers focus on one aspect of the value chain, incumbents like Amazon already have a suite of cloud products designed for online computing, and not just for storage. Under storage alone, AWS has 9 different types of products, catering to different needs. In total, it has 227 products (as of 15 Sept 2022) across different lines like AI, IoT etc. This allows it to cross-sell different products while offering a holistic solution under the umbrella of Cloud services.

2. Incumbents have existing products that funnel users into their storage space. Platforms like Google and Microsoft offer email, messaging services and word processing programs (e.g., Microsoft Word and Google Docs). Users will naturally save their files on the most convenient platform, which is usually the incumbent’s native product.

3. The regulatory space surrounding data management is strict. Compliance with the GDPR has become the main hurdle that every business faces when it comes to data compliance. As it stands, there are some arguments to suggest that blockchain technology is not compliant due to the permanence of blockchain data [1]. At the very least, the legal state of decentralized storage is uncertain. As a business/consumer, would you be willing to take that risk? Or would you rather stick with existing cloud providers that are more familiar?

[1] Source: Arweave News
Conclusion and Key Takeaways
There is a lot of ground to cover before decentralized storage can become mainstream

Many decentralized storage solutions are not as decentralized as you think.

Decentralized storage has its advantages, but it will take time before it comes even remotely close to the big boys. Outside of censorship concerns and costs, there is little incentive to migrate away from centralized storage providers, especially when you consider the compliance concerns. The first mover advantage has allowed incumbents to secure a strong network effect. Not only that, incumbents like Amazon and Microsoft have a suite of complementary IT-related products to reinforce user stickiness.

Bridging the gap will require innovative methods to capture other parts of the value chain and funnel users into decentralized storage platforms. Some opportunities include:
• Integration with traditional software companies
• Building products which can attract genuine users
• Tackling niche areas that are dominated by cloud technology, such as edge computing

THANK YOU!
@bobbyong
@coingecko

The State of Decentralized Storage

  • 2. The Global Data Landscape In 2019, Google, Amazon, Microsoft and Facebook stored at least 1.2M TB2 Total Coins Tracked 8,563 Decentralized storage is still a small fraction of the pie Amount of data generated globally (zettabytes) In 2021, the world generated approximately 79 zettabytes of data1 Sources: 1 Statista 2 Science Focus 3 An Honest Report on Web3 Data & Storage 79B terabytes of data BUT… 0.015% of generated data is stored3 Even with a 5x increase in 2022, only 2
  • 3. History of Decentralized Storage Early 2000 Launch of Napster, a P2P audio file sharing system June 1999 Launch of Gnutella, the first decentralized file sharing protocol May 2000 First version of LimeWire released February 2015 Birth of IPFS June 2015 Launch of the Sia network June 2018 Late 2018 Launch of the Arweave mainnet October 2020 Launch of Filecoin mainnet Launch of Storj, a decentralized cloud storage platform October 2019 Launch of BitTorrent File System (BTFS) mainnet 3 September 2017 Filecoin’s ICO raises $257M July 2018 Justin Sun’s Tron Foundation acquires BitTorrent for $140M July 2001 First version of BitTorrent released July 2010 The Pirate Bay, a P2P file sharing site, is shut down by authorities
  • 4. 4 Why Do We Need Decentralized Storage? The benefits of decentralized storage mimic those of blockchain systems Key Aspects Current State of Centralized Storage How does Decentralized Storage Improve on This Data Availability & Resiliency ► Data is typically stored on one main site, putting it at risk of data becoming unavailable if site goes offline ► Backup storage / site(s) can be provisioned but would usually cost more ► Typically works on HTTP, which points to a specific path. If data is no longer at the path it becomes a dead link and returns 404 error. ► Data is typically broken down into many parts and stored on multiple nodes, creating natural redundancy ► Uses advance techniques to reassemble data without needing all the parts, further improving availability ► On-chain storage is “always up” as long as there are miners / validators. ► In addition to supporting HTTP, mainly uses IPFS, which has a content-based approach – not dependent on path Security / Encryption ► Data stored may (or may not) be encrypted ► Encryption keys are also stored on centralized databases, making them prime targets for hacks ► Most solutions offer auto-encryption and each individual part of the data can be encrypted separately Data Integrity ► Tracing of any unauthorized changes to the data would require prior setup of a logging tool. If no such tool was implemented such changes may go unnoticed ► IPFS’ approach uses hash to identify data ► Hashing helps ensure that data have not been corrupted / altered improperly, as doing so will change the hash Privacy ► Certain storage providers would use the content of your data as a form of monetization, e.g. 
via serving ads ► As each node stores encrypted data, there is no way for the node to read the contents of the data Open / Censorship Resistance ► Provider may undertake KYC on prospective clients ► Data stored in specific sites would be governed by local data laws / regulations ► Permissionless solutions where anyone can store data ► As nodes have no knowledge of actual content of data, there is no censorship possible Performance / Scalability ► Typically would have multi-tier offerings based on performance ► Actual performance based on proximity to providers’ data centers, which are typically built in locations with advance bandwidth infrastructure ► Sites are typically large datacenters run by providers, which requires significant investment from providers ► Beginning to see emergence of multiple tiers (though still not prevalent and still not as varied as centralized providers) ► Potentially could have greater coverage of nodes spread out across the globe, helping reach more far-flung areas ► Any miners / validator that meet much lower minimum hardware requirements can join in return for rewards
  • 5. Who’s Currently Using It? Universally used by everyone, both in and out of the crypto space • Images, audio and video files are usually too expensive to store on the blockchain. • But using decentralized storage, the content of the NFT (audio, images and its metadata) can be stored off- chain using a unique hash. • The hash can then be hosted on IPFS or other decentralized solutions where it can be stored and accessed. • Allows developers to create tools and websites in an isolated environment. • Developers can create and host important files and UIs that are censorship-resistant and pivotal for decentralized applications (dApps) • Networks can store historic on- chain data to reduce the computational load of validators. • Users can permanently store important documents and retrieve them even if centralized services fail or cease to exist. • Businesses can backup their data permanently for future purposes. • Ability to access websites or information restricted by centralized entities. NFTs Everyday Users Developers 5
  • 6. 6 Decentralized Storage Projects by Data Persistence Mechanisms Except Arweave, the decentralized storage projects today are contract-based Contract-based Persistence ▪ Instead of replicating data across every node on the network, a set of multiple nodes enter into a contract to store a piece data for a specific period of time ▪ The contract then can be renewed if the time period needs to be extended. ▪ Instead of the entire data set, the hash of where the data is located gets stored on-chain. Blockchain-based Persistence ▪ Technically, every blockchain is a distributed database and can function as a decentralized storage network ▪ However, most blockchains are not built to store large amounts of data. They are designed more to store transactions, and are typically also append-only. ▪ They are also inefficient in the sense that every node on the network needs to keep a copy of the data. Source: Ethereum Foundation
  • 7. Features and Technical Specifications: Each decentralized storage solution comes with its own unique features
      ▪ Filecoin
          Current market cap: ~$1,520M
          Consensus algorithm: Proof of Spacetime (PoSt) & Proof of Replication (PoR)
          Data replication & retrieval: Users choose the number of copies to be replicated
          Encryption: Users choose whether to encrypt their stored data
          Smart contract execution: Utilizes the Filecoin Virtual Machine (FVM)
          Minimum hosting requirements: CPU 8 cores or more; RAM 137GB or more; hard drive 1.1TB or more
      ▪ Arweave
          Current market cap: ~$546M
          Consensus algorithm: Succinct Proof of Random Access (SPoRA)
          Data replication & retrieval: Via recall of data stored by miners. Data is replicated more than 16 times across the blockweave
          Encryption: Users choose whether to encrypt their stored data
          Smart contract execution: 'Lazy' SmartWeave contracts that are executed and validated by users, not the network
          Minimum hosting requirements: CPU 6 cores or more; RAM 8.6GB or more; hard drive 4TB or more
      ▪ Storj
          Current market cap: ~$81M
          Consensus algorithm: Proof of Availability (PoA)
          Data replication & retrieval: Via Reed-Solomon erasure coding. Data is split into 80 pieces and only 29 are needed for retrieval
          Encryption: Automatically encrypted using the AES-256 algorithm by default
          Smart contract execution: Does not have smart contracts
          Minimum hosting requirements: CPU 1 core or more; RAM 2GB or more; hard drive 550GB or more
      ▪ Sia
          Current market cap: ~$214M
          Consensus algorithm: Proof of Work (PoW)
          Data replication & retrieval: Via Reed-Solomon erasure coding. Data is split into 30 pieces and only 10 are needed for retrieval
          Encryption: Automatically encrypted using the Threefish algorithm by default
          Smart contract execution: File contracts between renters and storage providers, automatically enforced by the network
          Minimum hosting requirements: CPU 4 cores or more; RAM 8GB or more; hard drive 64GB or more
      ▪ BTFS
          Current market cap: ~$851M
          Consensus algorithm: Proof of Stake (PoS)
          Data replication & retrieval: Via Reed-Solomon erasure coding. Data is split into 30 pieces and only 10 are needed for retrieval
          Encryption: Users choose whether to encrypt their stored data
          Smart contract execution: Utilizes the BitTorrent-Chain Virtual Machine (BTTCVM)
          Minimum hosting requirements: CPU 1 core or more; RAM 1GB or more; hard drive 32GB or more
      ▪ Amazon
          Current market cap: $1.5T¹
          Consensus algorithm: N/A
          Data replication & retrieval: Users choose specific files to replicate within or across different regions
          Encryption: Users can enable server-side encryption using the AES-256 algorithm
          Smart contract execution: N/A
          Minimum hosting requirements: N/A
      ¹ Source: Barron's (Amazon market cap)
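The Reed-Solomon parameters quoted on this slide (80 pieces with 29 needed, and 30 pieces with 10 needed) imply a storage overhead and a loss tolerance that can be worked out directly. The sketch below assumes an ideal scheme in which any `needed_pieces` of the `total_pieces` are enough to rebuild the file and each piece is 1/`needed_pieces` of the original size; the function name is hypothetical.

```python
from fractions import Fraction


def erasure_profile(total_pieces: int, needed_pieces: int) -> dict:
    """Overhead and loss tolerance for an idealized k-of-n erasure code."""
    if not 0 < needed_pieces <= total_pieces:
        raise ValueError("needed_pieces must be between 1 and total_pieces")
    return {
        # Bytes stored across the network per byte of original data.
        "expansion_factor": Fraction(total_pieces, needed_pieces),
        # Pieces that can disappear before the file becomes unrecoverable.
        "tolerable_losses": total_pieces - needed_pieces,
    }


schemes = {"80 pieces, 29 needed": (80, 29), "30 pieces, 10 needed": (30, 10)}
for label, (n, k) in schemes.items():
    profile = erasure_profile(n, k)
    print(f"{label}: stores {float(profile['expansion_factor']):.2f}x the original size, "
          f"survives the loss of {profile['tolerable_losses']} pieces")
```

Under these assumptions, the 80/29 scheme stores roughly 2.76 times the original size and tolerates losing 51 pieces, while the 30/10 scheme stores 3 times the original size and tolerates losing 20.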
  • 8. Active Nodes: Is it really decentralized?
      Active nodes (chart data): Arweave 69; Sia 692; Filecoin 4,051; Storj 15,860; BitTorrent 4.87M*
      Not all nodes are created equal. Some networks, such as Filecoin and Arweave, have considerably higher technical or hardware requirements for their nodes. Different systems make different tradeoffs.
      Ultimately, decentralized storage services provide censorship resistance if users' data is split and distributed widely enough.
      *BTFS documentation indicates that 1) the node count may include both renters and hosts, and 2) there is no limit to the number of nodes that can be run from a single public IP.
      Sources: ViewBlock, SiaStats, Filfox, StorjStats, BTFS Scan (data as of 8 Sept 2022)
  • 9. Capacity vs Usage: Decentralized storage capacity has increased exponentially in the past 2 years
      As the NFT season of 2021 took off, demand for decentralized storage surged, resulting in a massive increase in available storage. By the end of 2021, total storage capacity had exceeded 16.7M TB, an increase of more than 4x from 2020.
      Filecoin currently has the largest capacity of any decentralized storage solution, with network storage power of over 21M TB. That is more than 40x the capacity of BitTorrent's BTFS network, the 2nd largest decentralized storage provider.
      However, most of this storage currently remains unused. As of Q3 2022, only 1% of Filecoin's total capacity is actively being used. Usage on smaller solutions, on the other hand, is much higher. For example, ~64% of Storj's total capacity is currently utilized.
      *Total capacity data includes Arweave, Filecoin, Storj, and Sia. BTFS is excluded due to incomplete data.
      **Arweave usage is always equal to its capacity.
  • 10. Cost of Decentralized Storage: Decentralized Trumps Centralized
      [Chart: Monthly Price per TB, Centralized Providers vs. Decentralized Providers. Price labels: $0.0002, $6.00*, $1.09, $5.00, $4.00, $7.00, $0.94, $5.00, $3.01, $7.00*]
      * Only 2TB packages available. Figure is derived by halving the cost of the package.
      Decentralized storage is much cheaper
      ▪ On price, decentralized storage beats centralized storage. Even within the decentralized sector there are outliers, notably Filecoin.
      ▪ Filecoin Plus, an incentive program that boosts rewards for legitimate, verified deals, has seen storage providers offering near-zero or zero fees to compete for block rewards.
      ▪ These rewards are often subsidized by Filecoin as it aims to grow the network.
      The catch:
      ▪ Bandwidth: upload (ingress) and retrieval (egress) fees are involved as well.
      ▪ Storj charges $7 / TB to upload / download, while Sia costs $0.41 / TB to upload and $2 / TB to download.
      ▪ Filecoin charges a market price that is quoted by the storage or retrieval miners.
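To see how the bandwidth fees mentioned on this slide affect the total bill, here is a rough sketch. The upload and download fees are the Storj and Sia figures quoted above; the storage price and the workload are placeholder assumptions made up for illustration, and the function name is hypothetical.

```python
def monthly_cost_usd(tb_stored, tb_uploaded, tb_downloaded,
                     storage_price, upload_fee, download_fee):
    """Total monthly bill in USD; all prices and fees are per TB."""
    return (tb_stored * storage_price
            + tb_uploaded * upload_fee
            + tb_downloaded * download_fee)


# Bandwidth fees quoted on the slide (USD per TB).
SIA_FEES = {"upload_fee": 0.41, "download_fee": 2.00}
STORJ_FEES = {"upload_fee": 7.00, "download_fee": 7.00}

# ASSUMPTION: placeholder storage price, not a figure taken from the slide.
HYPOTHETICAL_STORAGE_PRICE = 1.00

# ASSUMPTION: made-up workload for one month.
workload = {"tb_stored": 10, "tb_uploaded": 10, "tb_downloaded": 5}

sia_total = monthly_cost_usd(
    **workload, storage_price=HYPOTHETICAL_STORAGE_PRICE, **SIA_FEES)
storj_total = monthly_cost_usd(
    **workload, storage_price=HYPOTHETICAL_STORAGE_PRICE, **STORJ_FEES)

print(f"Sia-style fees:   ${sia_total:.2f}")
print(f"Storj-style fees: ${storj_total:.2f}")
```

With this made-up workload, bandwidth accounts for more than half of the bill in both cases, which is why the headline price per TB stored does not tell the whole story.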
  • 11. Protocol Revenue: Filecoin is far ahead of its peers, but is still a long way behind centralized services
      Revenue (Q2 2022), centralized storage: AWS $19.7B; Oracle $2.9B; Dropbox $0.6B
      Revenue (Q2 2022), decentralized storage: Filecoin $13,372,089; Arweave $193,430; Siacoin $35,531
      The bulk of decentralized storage protocols' revenue comes from network fees and is closely tied to the price of their coins (in USD terms).
      Filecoin's revenue has taken a hit due to the slumping FIL price. For context, FIL closed Q2 2022 at $5.32, down 85% from the start of the year. Yet Filecoin is still by far the largest decentralized storage solution by revenue.
      Decentralized storage, however, still pales in comparison with its centralized counterparts. AWS, for instance, raked in close to $20 billion in Q2, more than 1,000 times the revenue of Filecoin, Arweave and Siacoin combined.
      Decentralized storage networks are still growing, and we can expect more services and applications to be built on top of them, providing additional revenue streams in the future.
      Revenue data for other decentralized solutions such as Storj and BitTorrent is unavailable.
      Sources: TokenTerminal, SiaStats, CNBC
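The "more than 1,000 times" comparison can be checked against the Q2 2022 figures quoted on this slide. The short calculation below uses only those figures; the variable names are illustrative.

```python
# Q2 2022 revenue figures as quoted on the slide (USD).
AWS_REVENUE = 19.7e9
DECENTRALIZED_REVENUE = {
    "Filecoin": 13_372_089,
    "Arweave": 193_430,
    "Siacoin": 35_531,
}

# Combined revenue of the three decentralized protocols.
combined = sum(DECENTRALIZED_REVENUE.values())

# How many times larger AWS revenue is than the combined figure.
ratio = AWS_REVENUE / combined

# Filecoin's share of the combined decentralized revenue.
filecoin_share = DECENTRALIZED_REVENUE["Filecoin"] / combined

print(f"Combined decentralized revenue: ${combined:,}")
print(f"AWS / combined: {ratio:,.0f}x")
print(f"Filecoin share of combined: {filecoin_share:.1%}")
```

On these numbers, the three protocols together earned about $13.6 million, AWS revenue was roughly 1,448 times that, and Filecoin accounted for about 98% of the combined decentralized total.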
  • 12. Decentralized Apps on Decentralized Storage: What's being built on storage L1s?
      The most ubiquitous use of decentralized storage is storing NFTs and Web3 data. However, apps span many different current and emerging use cases, some of which may overlap. Others include content distribution networks, decentralized IDs, oracles, payment systems, e-mail and more.
      Arweave leads in terms of native applications and smart contract capabilities, while Filecoin has many prominent users (e.g. MagicEden, OpenSea, Audius) and stores important data (e.g. Shoah Foundation, Internet Archive). Further buildout of the Filecoin Virtual Machine in late 2022 could see more native applications built on Filecoin.
      App categories: Web3 storage, Dev Tooling, Data Market, Consumer Storage, Marketplaces, Socials, Others
      Apps shown: Web3.Storage, NFT.storage, Filedrive, Fleek, Estuary, Lighthouse, Ocean, Filehive, ChainSafe Files, Slate, Chingari, Filfox, Filscan, Filecoin Green, Kyve, Permafrost, Via, Arweave.Design, Gitopia, TestWeave, Amplify, Koii, Meson.network, Ardrive, Akord, Evermore, Verto, Pianity, Metaweave, Decent.land, Glass, Koii, Weve, Sarcophagus, ArVerify, Traxa, Uplink CLI, Drivex, Arq, FileZilla, Fastly, Skynet, Filebase, VUP, Arzen, SkyFeed, SkyID, SkySend
      *BTFS apps still in development
  • 13. Looking to the future: Decentralized storage is part of the value chain of online computing
      To understand where decentralized storage is headed, we must look at the bigger picture. Data storage is merely a subset of the larger value chain of online computing.
      Value Chain (diagram): User, Computing, Storage
      Blockchain: 1. Immutable data 2. Censorship-resistant 3. Usually cheaper
      Cloud: 1. Convenient 2. Established network effect 3. One-stop shop for IT products
  • 14. Understanding the competition: Incumbents already have a head start after capturing the value chain
      1. While decentralized storage providers focus on one aspect of the value chain, incumbents like Amazon already have a suite of cloud products designed for online computing, and not just for storage. Under storage alone, AWS has 9 different types of products catering to different needs. In total, it has 227 products (as of 15 Sept 2022) across lines such as AI and IoT. This allows it to cross-sell different products while offering a holistic solution under the umbrella of Cloud services.
      2. Incumbents have existing products that funnel users into their storage space. Platforms like Google and Microsoft offer email and messenger services and word processing programs (e.g., Microsoft Word and Google Docs). Users will naturally save their files on the most convenient platform, which is usually the incumbent's native product.
      3. The regulatory space surrounding data management is strict. Compliance with the GDPR has become the main hurdle that every business faces when it comes to data compliance. As it stands, there are some arguments to suggest that blockchain technology is not compliant due to the permanence of blockchain data¹. At the very least, the legal state of decentralized storage is uncertain. As a business or consumer, would you be willing to take that risk? Or would you rather stick with existing cloud providers that are more familiar?
      ¹ Source: Arweave News
  • 15. Conclusion and Key Takeaways: There is a lot of ground to cover before decentralized storage can become mainstream
      Many decentralized storage solutions are not as decentralized as you think.
      Decentralized storage has its advantages, but it will take time before it comes even remotely close to the big boys. Outside of censorship concerns and costs, there is little incentive to migrate away from centralized storage providers, especially when you consider the compliance concerns.
      The first-mover advantage has allowed incumbents to secure a strong network effect. Not only that, incumbents like Amazon and Microsoft have a suite of complementary IT-related products to reinforce user stickiness.
      Bridging the gap will require innovative methods to capture other parts of the value chain and funnel users into decentralized storage platforms. Some opportunities include:
      ▪ Integration with traditional software companies
      ▪ Building products which can attract genuine users
      ▪ Tackling niche areas that are dominated by cloud technology, such as edge computing