Testing data and metadata backends with ClawIO

Testing storage and metadata backends
Hugo González Labrador, Arno Formella
LIA2, University of Vigo
CS3: Cloud Storage Services for Novel Applications and Workflows
Zürich, January 2016

Outline
• Origin of the project
• Architecture
• Storage backends
• Benchmark results
• Conclusion
• Outlook

Why did we develop ClawIO?
• Cloud Synchronisation Benchmarking Framework.
• Curiosity about testing data and metadata backends that are novel for synchronisation platforms.
• Flexible: plug in your own implementations in any language and technology.
• Experiment with a new design that:
  • Avoids the synchronisation between the DB (containing the metadata) and the filesystem (containing the data) that most sync platforms using a local filesystem have to perform.
• Experience from being a Technical Student at CERN working on the CERNBox project.

Architecture
[Architecture diagram. Labels: ownCloud clients, 3rd-party apps, technical users (CLI); HTTP; REST API; SYNC (ownCloud Sync Protocol); AUTH; SHARE; gRPC; META and DATA (repeated three times); MGM; Both; backends: local FS (XATTR/NOXATTR), EOS, S3/SWIFT/RADOSGW.]

What metadata do we keep?
Metadata Key    Metadata Value
CHECKSUM        md5:8c8d357b5e872bbacd45197626bd5759
MTIME           104857600
PATH            /local/users/d/demo/file
FILEID          a8584c90-ae2a-4dd8-84a7-f18ced109cce
ETAG            956494f8-5120-4165-afba-ad5f8d13b8ef
Legend: Data; Metadata used by ownCloud
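The five keys in the table can be pictured as one small record per resource. The following is a minimal sketch only: the class name, the field types and the `build_metadata` helper are hypothetical, and generating FILEID and ETAG as random UUIDs is an assumption based on the shape of the example values, not something the slides state.

```python
from dataclasses import dataclass
import hashlib
import uuid


@dataclass(frozen=True)
class FileMetadata:
    """One metadata record per resource; fields mirror the table keys."""
    checksum: str  # "<algorithm>:<hex digest>", e.g. "md5:..."
    mtime: int     # modification time as an integer, as in the table
    path: str      # logical path of the resource
    fileid: str    # UUID-shaped identifier (assumption)
    etag: str      # UUID-shaped version tag (assumption)


def build_metadata(path: str, content: bytes, mtime: int) -> FileMetadata:
    """Hypothetical helper: derive a record for freshly written content."""
    digest = hashlib.md5(content).hexdigest()
    return FileMetadata(
        checksum=f"md5:{digest}",
        mtime=mtime,
        path=path,
        fileid=str(uuid.uuid4()),
        etag=str(uuid.uuid4()),
    )


record = build_metadata("/local/users/d/demo/file", b"example", 104857600)
```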

SETUP ONE: Local FS with MySQL as the metadata store
[Diagram labels: META UNIT; DATA UNIT (Ext4 FS); DATA; FILEID; CHECKSUM; MTIME; ETAG; PATH.]

SETUP TWO: Local FS with MySQL and XATTRs as metadata stores
[Diagram labels: META UNIT; DATA UNIT (Ext4 FS, XATTR); DATA; FILEID and CHECKSUM (each shown twice); MTIME; ETAG; PATH.]
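Setup two keeps part of the metadata in extended attributes on the data file itself. The sketch below illustrates the idea under stated assumptions: the diagram labels suggest that FILEID and CHECKSUM are the values held in XATTRs, and the attribute names `user.fileid` and `user.checksum` are hypothetical. It relies on `os.setxattr` and `os.getxattr`, which exist only on Linux and need a filesystem with user xattr support, so the writer reports failure instead of raising.

```python
import os
import tempfile


def write_identity_xattrs(path, fileid, checksum):
    """Attach FILEID and CHECKSUM to the file as user xattrs.

    Returns True on success, False when the platform or filesystem
    does not support user extended attributes.
    """
    if not hasattr(os, "setxattr"):  # os.setxattr is Linux-only
        return False
    try:
        os.setxattr(path, "user.fileid", fileid.encode())
        os.setxattr(path, "user.checksum", checksum.encode())
    except OSError:
        return False
    return True


def read_identity_xattrs(path):
    """Read back the two attributes written by write_identity_xattrs."""
    return {
        "fileid": os.getxattr(path, "user.fileid").decode(),
        "checksum": os.getxattr(path, "user.checksum").decode(),
    }


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file")
    open(src, "wb").close()
    if write_identity_xattrs(src, "a8584c90-ae2a-4dd8-84a7-f18ced109cce", "md5:0"):
        dst = os.path.join(tmp, "renamed")
        os.rename(src, dst)  # the attributes move with the file
        attrs = read_identity_xattrs(dst)
```

Because the attributes live on the file, a rename inside the filesystem carries them along without a separate database update, which is one way to read the design goal of avoiding DB and filesystem synchronisation.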

SETUP THREE: Local FS with REDIS and XATTRs as metadata stores
[Diagram labels: META UNIT; DATA UNIT (Ext4 FS, XATTR); DATA; FILEID and CHECKSUM (each shown twice); MTIME; ETAG; PATH.]

Machine Specs, Deployment scenario and Operations to benchmark
VM
CPU     64 cores, Intel(R) Xeon(R) CPU E5-4640 v2 @ 2.20GHz
RAM     64 GB
DISK    SAS-3 12 Gb/s 4 TB Seagate Constellation Enterprise ST4000NM3401 (RAID6)
Deployment
• 16 services (servers) on the same VM
• bench client also on the same VM
Operations
• Stat
• Upload
VM infrastructure provided by [logo not preserved]

STAT benchmark
• The STAT operation is similar to a Unix file stat operation or a WebDAV PROPFIND.
• Its objective is to retrieve the metadata associated with a particular resource (file or folder).
• For each level of concurrency, 10 000 requests are triggered and the test is repeated 5 times.
• The operation uses gRPC over HTTP/2.
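The procedure described for the STAT benchmark (a fixed number of requests per concurrency level, repeated several times) can be sketched as a small harness. This is an illustration, not the ClawIO benchmark client: the function names are hypothetical, the concurrency levels are made up because the slides do not list them, and a local `os.stat` call stands in for the remote gRPC STAT request.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def run_level(operation, concurrency, requests):
    """Issue `requests` calls to `operation` from `concurrency` workers.

    Returns the achieved throughput in requests per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for all calls and re-raises any worker exception
        list(pool.map(lambda _: operation(), range(requests)))
    elapsed = time.perf_counter() - start
    return requests / elapsed


def benchmark(operation, levels, requests=10_000, repeats=5):
    """Return {concurrency level: [throughput of each repeat]}."""
    return {
        level: [run_level(operation, level, requests) for _ in range(repeats)]
        for level in levels
    }


def local_stat():
    """Stand-in for the remote STAT call (assumption)."""
    os.stat(".")


# Small numbers so the example finishes quickly; the slides use
# 10 000 requests per level and 5 repeats.
results = benchmark(local_stat, levels=[1, 4, 16], requests=500, repeats=2)
```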

UPLOAD benchmark
• The upload operation uploads a file randomly chosen from a fixed set of 100 files that follow the distribution of files observed on CERNBox.
• The chosen file is uploaded 5000 times per concurrency level, to random target destinations to avoid overwrites. The benchmark is repeated 5 times.
• The operation uses HTTP/1.1.
• Upload triggers metadata propagation.
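The selection logic of the UPLOAD benchmark can be sketched as a generator of upload tasks. This follows one reading of the description (a single file is drawn from the fixed set and reused for every upload of a concurrency level); the function name, the `/bench` prefix and the use of UUIDs as random target names are all hypothetical.

```python
import random
import uuid


def upload_plan(files, uploads=5000, prefix="/bench", seed=None):
    """Yield (source_file, target_path) pairs for one concurrency level.

    One file is drawn at random from the fixed set and reused for the
    whole level; every target path is unique, so no upload overwrites
    an earlier one.
    """
    rng = random.Random(seed)
    source = rng.choice(files)
    for _ in range(uploads):
        yield source, f"{prefix}/{uuid.uuid4()}"


# Placeholder names for the fixed set of 100 files
fixed_set = [f"sample-{i:03d}.bin" for i in range(100)]
plan = list(upload_plan(fixed_set, uploads=10))
```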

What can we extract from these results?
• Retrieving the file hierarchy from the FS favours novel use cases and access to existing data repositories.
• In-memory databases increase performance and can scale to a high number of records (with a 70-byte memory footprint per file, 64 GiB holds roughly 982 million files).
• Use of XATTRs makes the system more consistent.
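The capacity estimate for the in-memory store is simple arithmetic and can be checked directly. Taking 64 GiB as 64 × 2^30 bytes and the stated 70 bytes per file (ignoring any per-record overhead of the database itself, which is an assumption), the result is on the order of 10^9 files:

```python
BYTES_PER_FILE = 70        # memory footprint per file, from the slide
RAM_BYTES = 64 * 2**30     # 64 GiB = 68,719,476,736 bytes

max_files = RAM_BYTES // BYTES_PER_FILE
print(f"{max_files:,}")    # 981,706,810
```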

WE THINK WE ARE ON THE RIGHT TRACK
Piotr’s Analytics System

What comes next?
• Improve performance (more parallelisation)
• Run more benchmarks: upload with checksums, overwrites, remove, move, fuzz
• Run the benchmarks on a cluster (the ClawIO design scales out)
• Implement more backends: EOS, S3/SWIFT/RADOSGW (plug in your backend, suggestions are welcome)
• Implement sharing and run benchmarks on shared folders
• Test other sync protocols: Seafile, StorageSync?

Acknowledgements
• Centro de Investigación, Transferencia e Innovación
(CITI)
• PhD Jakub T. Mościcki @CERN
• Piotr Mrowczynski @Technical University of Denmark
• LIA2, University of Vigo
