Windows Azure Storage
Overview, Internals and Best Practices
Sponsors
About me
 Program Manager @ Edgar Online, RRD
 Windows Azure MVP
 Co-organizer of Odessa .NET User Group
 Ukrainian IT Awards 2013 Winner – Software Engineering
 http://cloudytimes.azurewebsites.net/
 http://www.linkedin.com/in/antonvidishchev
 https://www.facebook.com/anton.vidishchev
What is Windows Azure Storage?
Windows Azure Storage
 Cloud Storage - Anywhere and anytime access
 Blobs, Disks, Tables and Queues

 Highly Durable, Available and Massively Scalable
 Easily build “internet scale” applications
 10 trillion stored objects
 900K requests/sec on average (2.3+ trillion per month)

 Pay for what you use
 Exposed via easy and open REST APIs
 Client libraries in .NET, Java, Node.js, Python, PHP,
Ruby
Abstractions – Blobs and Disks
Abstractions – Tables and Queues
Data centers
Windows Azure Data Storage Concepts
Within an Account:
 Container holds Blobs – https://<account>.blob.core.windows.net/<container>
 Table holds Entities – https://<account>.table.core.windows.net/<table>
 Queue holds Messages – https://<account>.queue.core.windows.net/<queue>
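As a quick illustration (a sketch, not from the deck), the .NET storage client exposes these three per-service endpoints directly on the account object; the connection string below is a placeholder:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

// Each abstraction lives behind its own service endpoint for the account.
Console.WriteLine(account.BlobEndpoint);   // https://<account>.blob.core.windows.net/
Console.WriteLine(account.TableEndpoint);  // https://<account>.table.core.windows.net/
Console.WriteLine(account.QueueEndpoint);  // https://<account>.queue.core.windows.net/
```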
How is Azure Storage used by Microsoft?
Internals
Design Goals
Highly Available with Strong Consistency
 Provide access to data in the face of failures/partitioning

Durability
 Replicate data several times within and across regions

Scalability
 Need to scale to zettabytes
 Provide a global namespace to access data around
the world
 Automatically scale out and load balance data to
meet peak traffic demands
Windows Azure Storage Stamps
Access blob storage via the URL: http://<account>.blob.core.windows.net/
[Diagram] The Location Service directs data access, through load balancers (LB), to a Storage Stamp. Each Storage Stamp contains Front-Ends, a Partition Layer and a DFS Layer; intra-stamp replication keeps copies within a stamp, while inter-stamp (geo) replication copies data between stamps.
Architecture Layers inside Stamps
[Diagram] Layers within a stamp: Front-Ends on top of the Partition Layer (which owns the index), on top of the DFS Layer.
Availability with Consistency for Writing
All writes are appends to the end of a log, which is
an append to the last extent in the log
Write Consistency across all replicas for an
extent:
 Appends are ordered the same across all
3 replicas for an extent (file)
 Only return success if all 3 replica
appends are committed to storage
 When an extent gets to a certain size, or on a
write failure/load balance, seal the extent’s replica
set and never append any more data to it

Write Availability: To handle failures during write
 Seal extent’s replica set
 Append immediately to a new extent
(replica set) on 3 other available nodes
 Add this new extent to the end of the
partition’s log (stream)

Availability with Consistency for Reading
Read Consistency: Can read from any replica, since the data in each replica for an extent is bit-wise identical

Read Availability: Send out parallel read requests if the first read is taking longer than the 95th-percentile latency
Dynamic Load Balancing – Partition Layer
Spreads index/transaction processing across partition servers
 Master monitors traffic load/resource utilization on partition servers
 Dynamically load balances partitions across servers to achieve better performance/availability

Does not move data around; only reassigns which part of the index a partition server is responsible for

Dynamic Load Balancing – DFS Layer
DFS read load balancing across replicas
 Monitor latency/load on each node/replica; dynamically select which replica to read from, and start additional reads in parallel based on the 95th-percentile latency
Architecture Summary
 Durability: All data is stored with at least 3 replicas
 Consistency: All committed data across all 3 replicas is identical
 Availability: Can read from any of the 3 replicas; if there are any issues writing, seal the
extent and continue appending to a new extent
 Performance/Scale: Retry based on 95th-percentile latencies; automatically scale out and
load balance based on load/capacity



Additional details can be found in the SOSP paper:

 “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong
Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct.
2011
Best Practices
General .NET Best Practices For Azure
Storage
 Disable Nagle for small messages (< 1400 bytes)
 ServicePointManager.UseNagleAlgorithm = false;

 Disable Expect 100-Continue
 ServicePointManager.Expect100Continue = false;

 Increase the default connection limit
 ServicePointManager.DefaultConnectionLimit = 100; (or more)

 Take advantage of the .NET 4.5 GC
 GC performance is greatly improved
 Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx
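Taken together, these are a few lines at application start-up. A minimal sketch (the connection limit of 100 is the slide's suggestion; tune it for your workload):

```csharp
using System.Net;

// Apply once at application start-up, before any storage calls are made.

// Nagle batches small TCP payloads; disabling it helps requests < ~1400 bytes
// (table entities, queue messages).
ServicePointManager.UseNagleAlgorithm = false;

// Skip the extra "Expect: 100-Continue" handshake round trip on PUT/POST requests.
ServicePointManager.Expect100Continue = false;

// The default is 2 concurrent connections per host; raise it for parallel workloads.
ServicePointManager.DefaultConnectionLimit = 100; // or more
```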
General Best Practices
 Locate storage accounts close to compute/users
 Understand account scalability targets
 Use multiple storage accounts to get more
 Distribute your storage accounts across regions

 Consider warming up the storage for better performance
 Cache critical data sets
 To get more requests/sec than the account/partition targets
 As a backup data set to fall back on

 Distribute load over many partitions and avoid spikes
General Best Practices (cont.)
 Use HTTPS
 Optimize what you send & receive

 Blobs: Range reads, Metadata, Head Requests
 Tables: Upsert, Projection, Point Queries
 Queues: Update Message

 Control Parallelism at the application layer

 Unbounded parallelism can lead to high latencies and throttling

 Enable Logging & Metrics on each storage
service
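To make "optimize what you send & receive" concrete for blobs, a hedged sketch using the WindowsAzure.Storage .NET client; the container name, blob name, and connection string are placeholders:

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudBlockBlob blob = CloudStorageAccount.Parse(connectionString)
                                         .CreateCloudBlobClient()
                                         .GetContainerReference("logs")            // hypothetical container
                                         .GetBlockBlobReference("2013/12/big.log"); // hypothetical blob

// HEAD request: fetch properties and metadata without downloading the body.
blob.FetchAttributes();
long size = blob.Properties.Length;

// Range read: download only the last 1 MB instead of the whole blob.
using (var target = new MemoryStream())
{
    blob.DownloadRangeToStream(target, Math.Max(0, size - 1024 * 1024), Math.Min(size, 1024 * 1024));
}
```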
Blob Best Practices
 Try to match your read size with your write size
 Avoid reading small ranges on blobs with large blocks
 CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes

 How do I upload a folder the fastest?
 Upload multiple blobs simultaneously

 How do I upload a blob the fastest?
 Use parallel block upload

 Concurrency (C) – multiple workers upload different blobs
 Parallelism (P) – multiple workers upload different blocks of the same blob
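A minimal sketch of parallel block upload (P) with the .NET client, using BlobRequestOptions.ParallelOperationThreadCount; the container, blob name, file path and thread count are assumptions:

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudBlockBlob blob = CloudStorageAccount.Parse(connectionString)
                                         .CreateCloudBlobClient()
                                         .GetContainerReference("uploads")      // hypothetical container
                                         .GetBlockBlobReference("backup.vhd");  // hypothetical blob

blob.StreamWriteSizeInBytes = 4 * 1024 * 1024;   // 4 MB blocks

var options = new BlobRequestOptions
{
    // P: number of blocks of this one blob uploaded in parallel.
    ParallelOperationThreadCount = 8
};

using (var file = File.OpenRead(@"C:\data\backup.vhd"))  // hypothetical path
{
    blob.UploadFromStream(file, null, options);
}
```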
Concurrency Vs. Blob Parallelism
XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB):
• C=1, P=1 => averaged ~13.2 MB/s
• C=1, P=30 => averaged ~50.72 MB/s
• C=30, P=1 => averaged ~96.64 MB/s
• A single TCP connection is bound by TCP rate control & RTT
• P=30 vs. C=30: the C=30 test completed almost twice as fast
• A single blob is bound by the limits of a single partition
• Accessing multiple blobs concurrently scales
[Chart] Upload time in seconds per configuration (y-axis 0–10,000 s).
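For comparison, a sketch of the concurrency (C) pattern: many workers each uploading a different blob. The folder, container name and degree of parallelism are assumptions:

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudBlobContainer container = CloudStorageAccount.Parse(connectionString)
                                                  .CreateCloudBlobClient()
                                                  .GetContainerReference("uploads"); // hypothetical container

string[] files = Directory.GetFiles(@"C:\data\folder"); // hypothetical folder

// C: each worker uploads a different blob, spreading load over many partitions.
Parallel.ForEach(files,
    new ParallelOptions { MaxDegreeOfParallelism = 30 },
    path =>
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
        using (var stream = File.OpenRead(path))
        {
            blob.UploadFromStream(stream);
        }
    });
```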
Blob Download
XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB):
• C=1, P=1 => averaged ~96 MB/s
• C=30, P=1 => averaged ~130 MB/s
[Chart] Download time in seconds for C=1, P=1 vs. C=30, P=1 (y-axis 0–140 s).
Table Best Practices
 Critical Queries: Select PartitionKey, RowKey to avoid hotspots

 Table scans are expensive – avoid them at all costs for latency-sensitive
scenarios

 Batch: Same PartitionKey for entities that need to be updated
together
 Schema-less: Store multiple types in same table
 Single Index – {PartitionKey, RowKey}: If needed, concatenate
columns to form composite keys
 Entity Locality: {PartitionKey, RowKey} determines sort order

 Store related entities together to reduce IO and improve performance

 Table Service Client Layer in 2.1 and 2.2: Dramatic performance
improvements and better NoSQL interface
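A hedged sketch of a point query and an entity-group (batch) write with the .NET table client; the table name, keys and properties are hypothetical:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudTable table = CloudStorageAccount.Parse(connectionString)
                                      .CreateCloudTableClient()
                                      .GetTableReference("Orders");   // hypothetical table
table.CreateIfNotExists();

// Point query: PartitionKey + RowKey is the cheapest and fastest lookup.
TableResult result = table.Execute(
    TableOperation.Retrieve<DynamicTableEntity>("customer-42", "order-0001"));

// Batch: all entities share the same PartitionKey, so the writes commit as one entity-group transaction.
var batch = new TableBatchOperation();
for (int i = 0; i < 3; i++)
{
    var entity = new DynamicTableEntity("customer-42", "order-000" + i);
    entity.Properties["Total"] = new EntityProperty(19.99 * i);
    batch.InsertOrReplace(entity);
}
table.ExecuteBatch(batch);
```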
Queue Best Practices
 Make message processing idempotent: Messages
become visible again if a client worker fails to delete
the message
 Benefit from Update Message: Extend the visibility time
based on the message, or save intermediate state
 Message Count: Use this to scale workers
 Dequeue Count: Use it to identify poison messages
or to validate the invisibility time used
 Blobs to store large messages: Increase throughput
by having larger batches
 Multiple Queues: To get more than a single queue
(partition) target
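A sketch of a worker loop that applies these points: DequeueCount for poison-message detection, UpdateMessage to extend invisibility, and delete-after-success so processing stays idempotent. The queue name and thresholds are assumptions:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

string connectionString = "<storage account connection string>"; // placeholder – supply a real one
CloudQueue queue = CloudStorageAccount.Parse(connectionString)
                                      .CreateCloudQueueClient()
                                      .GetQueueReference("work-items");   // hypothetical queue

CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(1));    // initial invisibility window
if (message != null)
{
    if (message.DequeueCount > 5)
    {
        // Poison message: it keeps reappearing, so delete it (or dead-letter it elsewhere) instead of retrying forever.
        queue.DeleteMessage(message);
    }
    else
    {
        // Long-running work: extend invisibility (and optionally save intermediate state).
        queue.UpdateMessage(message, TimeSpan.FromMinutes(5), MessageUpdateFields.Visibility);

        Console.WriteLine(message.AsString); // placeholder for idempotent work (safe to run more than once)

        queue.DeleteMessage(message);        // only delete after the work succeeds
    }
}
```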
Thank you!
 Q&A
