Benchmarking Personal Cloud
Storage

Spyros Eleftheriadis – itp12401@hua.gr
Harokopio University of Athens
Department of Informatics and Telematics
Postgraduate Program "Informatics and Telematics"
Modern Computer System Architectures

Thursday 16 January 2014
Introduction
• Services like Dropbox, SkyDrive and Google
Drive are becoming pervasive in people’s
routine.
• Such applications are data-intensive and their
increasing usage already produces a significant
share of Internet traffic.
• Very little is known about how providers
implement their services and the implications of
different designs.
Goals
• How do different providers tackle the problem of
synchronizing people’s files?
• Development of a methodology that helps to
understand both system architecture and client
capabilities.

• What are the consequences of each design on
performance?
METHODOLOGY AND
SERVICES
Testbed I
• The testbed is composed of two parts
• (i) a test computer that runs the application-under-test in the desired
operating system
• (ii) the testing application.
• The complete environment can run either in a single machine, or in separate
machines provided that the testing application can intercept traffic from the
test computer.
Testbed II
• The actual testbed is a single Linux machine, which both controls the experiments
and hosts a virtual machine that runs the test computer (Windows 7
Enterprise).
• The testing application receives as input benchmarking parameters
describing the sequence of operations to be performed. The testing
application acts remotely on the test computer, generating specific
workloads in the form of file batches, which are manipulated using an FTP
client. Files of different types are created or modified at run-time, e.g.,
text files composed of random words from a dictionary, images with
random pixels, or random binary files.
• Generated files are synchronized to the cloud by the application-under-test
and the exchanged traffic is monitored to compute performance
metrics. These include the amount of traffic seen during the experiments,
the time before actual synchronization starts and the time to complete
synchronization.
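The workload generation described above can be sketched as follows. This is a minimal illustration, not the tool used in the study: the word list, the file naming scheme and the function names are assumptions made for the example.

```python
import os
import random
import tempfile

# Hypothetical stand-in for a dictionary file; the slides do not name one.
WORDS = ["cloud", "storage", "sync", "file", "chunk", "delta", "bundle", "traffic"]

def random_text(n_bytes, rng):
    """Build ASCII text of exactly n_bytes from randomly chosen words."""
    parts = []
    size = 0
    while size < n_bytes:
        word = rng.choice(WORDS) + " "
        parts.append(word)
        size += len(word)
    return "".join(parts)[:n_bytes].encode("ascii")

def random_binary(n_bytes, rng):
    """Build n_bytes of pseudo-random bytes (hard to compress)."""
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))

def make_batch(directory, n_files, n_bytes, kind, seed=0):
    """Write n_files files of n_bytes each and return their paths."""
    rng = random.Random(seed)
    maker = random_text if kind == "text" else random_binary
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"{kind}_{i:03d}.dat")
        with open(path, "wb") as handle:
            handle.write(maker(n_bytes, rng))
        paths.append(path)
    return paths

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        batch = make_batch(folder, n_files=10, n_bytes=10_000, kind="text")
        print(len(batch), "files written")
```

In a real run the batch would be written into the folder watched by the application-under-test, so that the client starts synchronizing it.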
Architecture & Data Centers
To identify how the analyzed services operate, they observe the
DNS names of contacted servers when:

• (i) starting the application
• (ii) immediately after files are manipulated
• (iii) when the application is in idle state
Contacted DNS name > IP address > Hybrid methodology >
Data center location estimation with a precision of about a hundred
kilometers
Checking Capabilities
Personal cloud storage applications can implement several capabilities
to optimize storage usage and to speed up transfers.
These capabilities include the adoption of
• chunking (i.e., splitting content into a maximum size data unit),
• bundling (i.e., the transmission of multiple small files as a single
object),
• deduplication (i.e., avoiding re-transmitting content already available
on servers),
• delta encoding (i.e., transmission of only modified portions of a file)
and
• compression.
For each case, a specific test has been designed to observe if the given
capability is implemented.
Benchmarking Performance
After knowing how the services are designed in terms of both data center
locations and system capabilities, we check how such choices influence
synchronization performance and the amount of overhead traffic.
They designed 8 benchmarks varying:
• number of files
• file sizes
• file types
• All files in the sets are created at run-time by our testing application.
• Each experiment is repeated 24 times per service, allowing at least 5 min
between experiments to avoid creating abnormal workloads to servers.
• The benchmark of a single storage service lasts for about 1 day.
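A quick arithmetic check shows why one service takes about a day. The sketch assumes that "each experiment" means each of the 8 benchmarks and that roughly one 5 min pause follows every run; both are readings of the slide, not stated facts.

```python
BENCHMARKS = 8       # benchmark sets per service (from the slide)
REPETITIONS = 24     # repetitions of each experiment (from the slide)
MIN_PAUSE_MIN = 5    # minimum pause between experiments, in minutes

experiments = BENCHMARKS * REPETITIONS
pause_hours = experiments * MIN_PAUSE_MIN / 60

print(experiments)   # 192
print(pause_hours)   # 16.0
```

The pauses alone add up to about 16 hours, so with the transfer time of each run on top, a duration of about one day per service is plausible.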
Tested Storage Services
• Dropbox, Google Drive and SkyDrive are selected
because they are among the most popular offers
(according to the volume of search queries containing
names of cloud storage services on Google Trends).
• Wuala is considered because it is a system that offers
encryption on the client side. (They want to verify the
impact of such a privacy layer on synchronization
performance.)
• Amazon Cloud Drive is included to compare its
performance to Dropbox, since both services rely on
Amazon Web Services (AWS) data centers.
SYSTEM ARCHITECTURE
Protocols I
• All clients use HTTPS, except for the Dropbox
notification protocol, which relies on plain
HTTP. Interestingly, some Wuala storage
operations also use HTTP, since users’ privacy
has already been secured by local encryption.
• All services but Wuala use separate servers for
control and storage.
Protocols II
Firstly, the applications authenticate the user and check if any content has to be
updated.
• SkyDrive requires about 150 kB in total, 4 times more than the others. This happens
because the application contacts many Microsoft Live servers during login (13 in this
example).
Secondly, once login is completed, the applications keep exchanging data with the
cloud.
• Wuala is polling servers every 5 min on average (equivalent background traffic of about
60 b/s).
• Google Drive follows close, with a lightweight 40 s polling interval (42 b/s).
• Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively).
• Amazon Cloud Drive is polling servers every 15 s, each time opening a new HTTPS
connection. This notification strategy consumes 6 kb/s – i.e., about 65 MB per day. This
information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks)
and to the system: 1 million users would generate approximately 6 Gb/s of
signaling traffic alone! As the results for other providers demonstrate,
such a design is not optimal and it seems
possible to improve it.
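The background traffic figures above can be checked with a short calculation. The rates are the ones quoted on the slide; the function names and the convention 1 MB = 10^6 bytes are assumptions of this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_megabytes(bits_per_second):
    """Convert an average rate in b/s to MB per day (1 MB = 10**6 bytes)."""
    return bits_per_second * SECONDS_PER_DAY / 8 / 1e6

def aggregate_gbps(bits_per_second, users):
    """Aggregate rate in Gb/s if every user generates the same traffic."""
    return bits_per_second * users / 1e9

# Background rates in b/s, as quoted on the slide.
rates = {
    "Wuala": 60,
    "Google Drive": 42,
    "Dropbox": 82,
    "SkyDrive": 32,
    "Amazon Cloud Drive": 6000,
}

for name, rate in rates.items():
    print(f"{name}: {daily_megabytes(rate):.2f} MB/day")

print(aggregate_gbps(6000, 1_000_000), "Gb/s for 1 million Cloud Drive users")
```

At 6 kb/s the result is 64.8 MB per day, which matches the "about 65 MB" on the slide, while the other services stay below 1 MB per day.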
Protocols III
Data Centers
• Dropbox uses its own servers (in the San Jose area) for client management,
while storage is handled by Amazon servers in Northern Virginia.
• Cloud Drive uses three AWS data centers: two are used for both storage
and control (in Ireland and Northern Virginia); a third one is used for storage
only (in Oregon).
• SkyDrive relies on Microsoft’s data centers in the Seattle area (for storage)
and Southern Virginia (for storage and control). We also identified a
destination in Singapore (for control only).
• Wuala data centers are located in Europe: two in the Nuremberg area, one
in Zurich and a fourth in Northern France. None is owned by Wuala.
• Google Drive follows a different approach: TCP connections are terminated
at the closest Google edge node, from where the traffic is routed to the
actual storage/control data center using Google’s private network.
Google Drive Edge Nodes
CLOUD SERVICE
CLIENT CAPABILITIES
Chunking
• Only Amazon Cloud Drive does not perform chunking

• Google Drive uses 8 MB chunks
• Dropbox uses 4 MB chunks
• SkyDrive and Wuala use variable chunk sizes
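Fixed-size chunking can be illustrated with a few lines of code. This is only a sketch of the general idea: the function name is made up for the example, and taking 1 MB as 2^20 bytes is an assumption, since the slides do not say which convention applies.

```python
MB = 1024 * 1024  # assumption: the slides do not say whether MB is 10**6 or 2**20 bytes

def split_into_chunks(data, chunk_size):
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

file_content = bytes(10 * MB)  # a 10 MB file of zero bytes, for illustration
for label, size in (("4 MB chunks", 4 * MB), ("8 MB chunks", 8 * MB)):
    chunks = split_into_chunks(file_content, size)
    print(label, [len(c) // MB for c in chunks])
```

With these sizes a 10 MB file is sent as pieces of 4, 4 and 2 MB in the first case and 8 and 2 MB in the second.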
Bundling
• Only Dropbox implements a file-bundling strategy

• Google Drive and Amazon Cloud Drive open one
separate TCP (and SSL) connection for each file.
• SkyDrive and Wuala submit files sequentially, waiting
for application layer acknowledgments between each
upload.
Client-side Deduplication
• Only Dropbox and Wuala implement deduplication.
• All other services have to upload the same data
even if it is readily available at the storage server.
Interestingly, Dropbox and Wuala can identify
copies of users’ files even after they are deleted
and later restored.
• In the case of Wuala, deduplication is compatible
with local encryption, i.e., two identical files
generate two identical encrypted versions.
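One common way to implement deduplication is to index content by a cryptographic hash and skip the upload when the hash is already known. The slides do not say how Dropbox or Wuala do it, so the class below, including its name and the choice of SHA-256, is a hypothetical sketch.

```python
import hashlib

class DedupStore:
    """Toy server that stores each distinct piece of content only once."""

    def __init__(self):
        self.blobs = {}          # digest -> content
        self.uploaded_bytes = 0  # bytes that actually had to be transferred

    def upload(self, data):
        """Store data unless identical content is already known."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.uploaded_bytes += len(data)
        return digest

store = DedupStore()
photo = b"same bytes" * 1000
store.upload(photo)  # first copy is transferred
store.upload(photo)  # second copy is recognized and skipped
print(store.uploaded_bytes)  # 10000
```

Because the store keeps content indexed by its hash, a file that is deleted locally and later restored would also be recognized, provided the server has not discarded the content in the meantime.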
Delta Encoding
• Delta encoding is a specialized compression
technique that calculates file differences between
two copies, allowing the transmission of only the
modifications between revisions.

• Only Dropbox fully implements delta encoding
• Wuala does not implement delta encoding.
However, deduplication prevents the client from
uploading those chunks not affected by the
change.
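The principle of delta encoding can be sketched with Python's standard difflib module. This is not the algorithm used by any of the tested services (the slides do not name one); the operation format and the function names are assumptions of the example.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Describe new as copies from old plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": ship the new bytes
            ops.append(("data", new[j1:j2]))
        # "delete" needs no operation: those old bytes are simply not copied
    return ops

def apply_delta(old, ops):
    """Rebuild the new revision from the old one and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def literal_bytes(ops):
    """Count the bytes of new content that must be transmitted."""
    return sum(len(op[1]) for op in ops if op[0] == "data")

old = b"The quick brown fox jumps over the lazy dog."
new = b"The quick brown cat jumps over the lazy dog."
delta = make_delta(old, new)
print(apply_delta(old, delta) == new, literal_bytes(delta), len(new))
```

Only the literal bytes and a few copy instructions travel over the network, so a small edit to a large file costs far less than a full upload.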
Delta encoding
Compression
• Compression reduces traffic and storage
requirements at the expense of processing time.
• Dropbox and Google Drive compress data before
transmission.
• Google Drive implements smart policies that verify
the file format before compressing.
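A compression test relies on the contrast between compressible and incompressible payloads: if the observed upload is much smaller than a highly compressible file, the client compresses. The sketch below shows that contrast with zlib, which is chosen only for illustration; the slides do not say which algorithm each client uses.

```python
import os
import zlib

def compression_ratio(data, level=6):
    """Return compressed size divided by original size."""
    return len(zlib.compress(data, level)) / len(data)

text = b"benchmarking personal cloud storage " * 1000  # highly repetitive
noise = os.urandom(len(text))                          # random bytes

print(f"text:  {compression_ratio(text):.3f}")
print(f"noise: {compression_ratio(noise):.3f}")
```

The repetitive text shrinks to a small fraction of its size, while the random bytes do not shrink at all, which is why random binary files are a useful control in such a test.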
PERFORMANCE
Synchronization Startup
• Dropbox is the fastest service to start
synchronizing single files. Its bundling strategy,
however, slightly delays start up with multiple files.
As we will show next, such strategy pays back in
total upload time.
• Wuala also increases its startup time when
multiple files are submitted.
• SkyDrive is by far
the slowest.
Completion Time
• Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s)
are the fastest, since each TCP connection is
terminated at data centers near our testbed.
• Dropbox and SkyDrive, on the other hand, are the
services most impacted by the distance to their data centers.
Protocol Overhead
• Protocol overhead is the total storage and control
traffic over the benchmark size.

• Amazon Cloud Drive presents a very high
overhead because of its high number of control
flows opened for every file transfer.
• Dropbox exhibits the highest
overhead among the
remaining services, possibly
owing to the signaling cost
of implementing its
advanced capabilities.
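The overhead definition above translates directly into a ratio. The function name and the traffic numbers in the example are hypothetical; they are not measurements from the study.

```python
def protocol_overhead(storage_bytes, control_bytes, benchmark_bytes):
    """Total storage plus control traffic, divided by the benchmark size."""
    if benchmark_bytes <= 0:
        raise ValueError("benchmark_bytes must be positive")
    return (storage_bytes + control_bytes) / benchmark_bytes

# Hypothetical run: a 1 MB benchmark (100 files of 10 kB each) that
# produced 1.2 MB of storage traffic and 0.3 MB of control traffic.
print(protocol_overhead(1_200_000, 300_000, 1_000_000))  # 1.5
```

A value of 1.0 would mean that exactly the benchmark bytes crossed the network; a client that compresses or deduplicates can go below 1.0, while per-file connections and signaling push the value up.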
Conclusions
• Dropbox implements most of the checked capabilities, and its
sophisticated client clearly boosts performance.

• Wuala deploys client side encryption, and this feature does
not seem to affect Wuala synchronization performance.
• These 4 examples confirm the role played by data center
placement in a centralized approach. Taking the perspective
of European users only, network latency is still an important
limitation for U.S.-centric services, such as Dropbox and
SkyDrive. Services deploying data centers near our test
location, such as Wuala and Google Drive, have an advantage.
Thank you!
