194B
GETTING 100B METRICS TO DISK
Jonathan Thurman - Site Reliability Engineer
@jthurman42

http://www.flickr.com/photos/meteopassione/9157134653/
NEW RELIC

• Performance Monitoring
• Web Apps
• Mobile Apps
• Servers
• Databases, Caches & More…
• Software Analytics
OKAY, YOU
COLLECT DATA
• 194 Billion Metrics
• 100,000 req/sec
• 2 Gbps Inbound
• 216 Terabytes
• All backed by MySQL

http://www.flickr.com/photos/bobsfever/6658919861/
HOW WE GOT HERE

http://www.flickr.com/photos/auvet/853157494/
BUILDING BLOCKS

• Hosted Environment
• Xen Virtual Machines
• Data storage
• ATA over Ethernet
• SATA drives
• MySQL 5.0
• Single Ruby on Rails Application
http://www.flickr.com/photos/riekhavoc/4648423297/
SHARDING FROM
INCEPTION
• Account Information
• Read heavy
• Single HA Instance
• Agent Data
• Write heavy
• 8 shards based on AccountId

http://www.flickr.com/photos/erikb/48221952/
TALE OF
TWO MODELS

• Ruby on Rails
• class ShardData < ActiveRecord::Base
• Look up shard for Account
• Override ConnectionHandler

http://www.flickr.com/photos/jungle_boy/140279885/
TRIBBLES TABLES

• Metric table name contains
• AccountID
• Year and Julian Day
• Resolution
• ts_72_13221_1h
• Currently ~200k tables per DB
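
The naming scheme above can be sketched as follows. This is an illustration only: the field order (account ID, two-digit year plus day of year, resolution) is inferred from the single example name ts_72_13221_1h, and both function names are hypothetical.

```python
from datetime import date, datetime


def timeslice_table_name(account_id, day, resolution):
    """Build a per-account, per-day table name such as ts_72_13221_1h."""
    # %y is the two-digit year, %j the zero-padded day of year ("Julian day").
    return f"ts_{account_id}_{day.strftime('%y%j')}_{resolution}"


def parse_timeslice_table_name(name):
    """Split a table name back into (account_id, day, resolution)."""
    prefix, account_id, yyddd, resolution = name.split("_")
    if prefix != "ts":
        raise ValueError(f"not a timeslice table: {name}")
    day = datetime.strptime(yyddd, "%y%j").date()
    return int(account_id), day, resolution


if __name__ == "__main__":
    print(timeslice_table_name(72, date(2013, 8, 9), "1h"))
    print(parse_timeslice_table_name("ts_72_13221_1h"))
```

Encoding the account, day and resolution in the name means a single account-day can be located, backed up or dropped without touching any other data.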

http://www.flickr.com/photos/15942690@N00/4571141076/
BINGE AND PURGE

• Purging data
• DELETE FROM …
• DROP TABLE …
• innodb_file_per_table
• innodb_lazy_drop_table (pre 5.5.30-30.2)
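
With one table per account and day, purging can be a matter of dropping whole tables instead of deleting rows. A minimal sketch of that idea, assuming the table name layout ts_<account>_<yyddd>_<resolution> and a hypothetical retention window passed in by the caller:

```python
from datetime import date, datetime, timedelta


def expired_tables(table_names, today, retention_days):
    """Return the tables whose embedded day is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in table_names:
        # Assumed layout: ts_<account>_<yyddd>_<resolution>
        yyddd = name.split("_")[2]
        day = datetime.strptime(yyddd, "%y%j").date()
        if day < cutoff:
            expired.append(name)
    return expired


def drop_statements(table_names):
    """Build one DROP TABLE statement per expired table."""
    return [f"DROP TABLE IF EXISTS `{name}`;" for name in table_names]


if __name__ == "__main__":
    tables = ["ts_72_13221_1h", "ts_72_13100_1h"]
    for statement in drop_statements(expired_tables(tables, date(2013, 8, 10), 30)):
        print(statement)
```

With innodb_file_per_table enabled, each table lives in its own tablespace file, so a drop returns the space to the filesystem.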

http://www.flickr.com/photos/exalthim/2261294871/
http://www.flickr.com/photos/heliocentric/1571127347/

http://www.flickr.com/photos/davidmonro/8331755849/

http://www.flickr.com/photos/aigle_dore/6225535459/
GROWING PAINS

http://www.flickr.com/photos/aigle_dore/5626285743/
MULTIPLE POINTS
OF FAILURE

• Single shard slows down
• App servers wait for response
• DB connection pool becomes full
• Site goes down

http://www.flickr.com/photos/boston_public_library/8204384670/
SHARDGUARD

• Monitor all databases
• Identify shard status:
• Bad? Mark as “wedged”
• Good? Clear “wedged” flag
• ShardData checks status!
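
The wedged-flag idea can be sketched in a few lines. This is not the actual ShardGuard code (the original models are Ruby on Rails); the class and function names are hypothetical, and the health check itself is left to the caller.

```python
class ShardWedgedError(Exception):
    """Raised when a caller tries to use a shard that is marked as wedged."""


class ShardGuard:
    """Tracks which shards are currently marked as wedged."""

    def __init__(self):
        self._wedged = set()

    def report(self, shard_id, healthy):
        # Bad? Mark as wedged. Good? Clear the wedged flag.
        if healthy:
            self._wedged.discard(shard_id)
        else:
            self._wedged.add(shard_id)

    def is_wedged(self, shard_id):
        return shard_id in self._wedged


def query_shard(guard, shard_id, run_query):
    """Fail fast instead of tying up a connection on a wedged shard."""
    if guard.is_wedged(shard_id):
        raise ShardWedgedError(f"shard {shard_id} is wedged")
    return run_query(shard_id)
```

Failing fast keeps one slow shard from filling the connection pool, so accounts on healthy shards keep working.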

http://www.flickr.com/photos/mac_filko/5486980804/
STABILITY AND
PERFORMANCE

• Degraded performance
• New Accounts => Shard 9!
• Old accounts remain as-is

http://www.flickr.com/photos/ejpphoto/7823027272/
DATA COLLECTION

• Rails isn’t great for data collection
• Ruby isn’t great either…
• Rewritten in Java using Jetty

http://www.flickr.com/photos/autograt/224540606/
http://www.flickr.com/photos/epsos/8474532085/

CACHE IS KING

• Buffered, not queued
• RAM is cheaper than I/O
• Get creative with batch processing
INSERT INTO
(SELECT …

• Select rows and re-process
• Cache last hour in Java’s Heap
• Write a journal and post-process it
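
One reading of "cache last hour in Java's heap" is that 1-minute samples are aggregated in memory and written out as 1-hour rows once the hour is complete, which avoids re-reading the minute data from disk. The sketch below illustrates that reading only. It is written in Python rather than Java, and the class name and the choice of aggregates (count, total, min, max) are assumptions.

```python
class HourlyRollup:
    """Aggregate 1-minute samples in memory and emit 1-hour rows."""

    def __init__(self):
        # (metric_id, hour_start_epoch) -> [count, total, minimum, maximum]
        self._buckets = {}

    def add(self, metric_id, epoch_seconds, value):
        hour_start = epoch_seconds - (epoch_seconds % 3600)
        key = (metric_id, hour_start)
        bucket = self._buckets.get(key)
        if bucket is None:
            self._buckets[key] = [1, value, value, value]
        else:
            bucket[0] += 1
            bucket[1] += value
            bucket[2] = min(bucket[2], value)
            bucket[3] = max(bucket[3], value)

    def flush_before(self, epoch_seconds):
        """Remove and return rows for hours that ended at or before epoch_seconds."""
        done = sorted(k for k in self._buckets if k[1] + 3600 <= epoch_seconds)
        return [(m, h, *self._buckets.pop((m, h))) for m, h in done]
```

The trade-off is that a process restart loses the in-memory hour, which is where the journal option on the slide comes in.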

http://www.flickr.com/photos/esoteric_13/4741001804/
READ / WRITE
PROBLEM

• Sequential Inserts
• Batched in 5k chunks
• Optimize for Throughput
• Must complete < 1 minute
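
Batching in 5k chunks might look like the sketch below. The chunk size comes from the slide; the column names (metric_id, ts, value) and function names are hypothetical, and the generated SQL uses %s placeholders in the style of common Python MySQL drivers.

```python
from itertools import islice


def chunks(rows, size=5000):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def multi_row_insert(table, batch):
    """Build one parameterized multi-row INSERT for (metric_id, ts, value) rows."""
    placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
    sql = f"INSERT INTO `{table}` (metric_id, ts, value) VALUES {placeholders}"
    params = [field for row in batch for field in row]
    return sql, params
```

One statement per batch amortizes round trips and commit overhead, which favors throughput over per-row latency.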
READ / WRITE
PROBLEM

• Scattered Reads
• Optimized for Latency
• Unique Covering Indexes
MOVE TO
HARDWARE
• Instant performance!
• Just add…
• Datacenter - Chicago, US
• Servers - Dell
• Storage - Direct Attached
• Time - About 6 months

http://www.flickr.com/photos/zebble/9621007/
SPINNING
RUST

• Dell MD1200 shelves
• 8 Disks per shelf
• RAID 5 virtual disk
• Dedicated Hot-spare

http://www.flickr.com/photos/walkn/5472536812/
THE GREAT
EXPANSE

• MD1200s support 12 disks
• Add four more!
• Online RAID expansion

http://www.flickr.com/photos/aigle_dore/5853807037/
#FAIL

• “On-line” expansion, not so much
• Added second 4 disk RAID 5
• LVM Concatenation for space

http://www.flickr.com/photos/fireflythegreat/2845637227/
NEED MORE
CAPACITY

• Tight on disk space
• Performance not an issue
• New Accounts => Shard 10!
• Old Accounts as-is

http://www.flickr.com/photos/seandreilinger/6289721616/
SHARD PITFALLS

http://www.flickr.com/photos/21206761@N00/469110140/
MIGRATION
PROBLEM

• Accounts cannot move
• Not all tables have the shard key
• Rails defaults to auto-increment IDs
• Massive primary key collisions
• Punt and move the metrics
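
The primary key collision is easy to reproduce on paper. In this toy sketch (hypothetical function names, no database involved), each shard hands out auto-increment IDs starting from 1, so moving rows from one shard to another clashes on every ID the two have in common.

```python
def auto_increment_ids(row_count):
    """IDs a fresh table hands out when every insert takes the next integer."""
    return list(range(1, row_count + 1))


def colliding_ids(source_ids, target_ids):
    """Return primary keys present on both shards, which would clash on a move."""
    return sorted(set(source_ids) & set(target_ids))


if __name__ == "__main__":
    shard_1 = auto_increment_ids(5)
    shard_2 = auto_increment_ids(3)
    print(colliding_ids(shard_1, shard_2))
```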

http://www.flickr.com/photos/tzafrir/125380911/
BREAKING UP IS
HARD TO DO

• Agent Databases
• Metadata / Notes / Errors
• Timeslice Databases
• Time-series metric data
• 1 Minute and 1 Hour resolution

http://www.flickr.com/photos/rsepulveda/4275236049/
RESOURCE POOLS

• Distributed by Shard Key
• Distribution can CHANGE
• Lookup table, not hash
• Data can be MOVED
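
A lookup table differs from a hash in that placement is stored rather than computed, so it can change per account. A minimal sketch of the contrast, with hypothetical names and an in-memory dict standing in for the real lookup table:

```python
class ShardDirectory:
    """Explicit account -> shard mapping that can change over time."""

    def __init__(self, default_shard):
        self._assignments = {}
        self.default_shard = default_shard  # where new accounts land

    def shard_for(self, account_id):
        # The first lookup pins the account to the current default shard.
        return self._assignments.setdefault(account_id, self.default_shard)

    def move(self, account_id, new_shard):
        # Intended to be called once the account's data has been copied.
        self._assignments[account_id] = new_shard


def hash_shard(account_id, shard_count):
    """Hash-style placement for comparison: changing shard_count remaps accounts."""
    return account_id % shard_count
```

With the directory, pointing new accounts at a new shard leaves existing accounts where they are, and a single account can be relocated by updating one entry.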

http://www.flickr.com/photos/dclark3996/4971906528/
BACKUPS

• Custom mysqldump wrapper
• Based on business need
• Backup per table
• Ignore tables to be purged
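
A per-table backup wrapper could be structured roughly as below. This is a sketch, not the wrapper described on the slide: the function name and database name are hypothetical, and --single-transaction is simply one common mysqldump option for InnoDB tables, not necessarily what was used.

```python
def backup_commands(database, tables, to_be_purged):
    """Return one mysqldump argument list per table that is worth backing up."""
    skip = set(to_be_purged)
    commands = []
    for table in sorted(tables):
        if table in skip:
            continue  # no point backing up data that is about to be dropped
        commands.append(["mysqldump", "--single-transaction", database, table])
    return commands


if __name__ == "__main__":
    for argv in backup_commands(
        "timeslices",  # hypothetical database name
        ["ts_72_13221_1h", "ts_72_13100_1h"],
        to_be_purged=["ts_72_13100_1h"],
    ):
        print(" ".join(argv))
```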

http://www.flickr.com/photos/usdagov/6896218334/
EVOLUTION

http://www.flickr.com/photos/pfsullivan_1056/3485953405/
SSD REVOLUTION

• 600GB Intel 320 SSDs
• Dell MD1220 Direct Attached shelf
• Disks are no longer the bottleneck
• Inserts in Read-optimized order are “fast enough”
YOU CAN USE SSD
WITH DATABASES

• 6 of 420 drives RMA’d
• March 2012 to Aug 2013
• Average 180TB lifetime writes
• 91% wear remaining

http://www.flickr.com/photos/joeshlabotnik/3584172834/
REDUNDANT ARRAY
OF EXPENSIVE DISKS

• Rebuilds under load > 4 hours
• Migrated to RAID 60
• 2 x 12 disk span
• Ditch the Hot-spares
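
The capacity cost of the RAID 60 layout can be worked out with a little arithmetic. The sketch assumes the standard RAID 6 cost of two parity disks per span; the 600 GB drive size is borrowed from the SSD slide as a nominal figure, and the function names are hypothetical.

```python
def raid60_usable_disks(spans, disks_per_span):
    """RAID 60 stripes across RAID 6 spans; each span gives up two disks to parity."""
    return spans * (disks_per_span - 2)


def usable_capacity_gb(spans, disks_per_span, disk_gb):
    """Nominal usable capacity, ignoring filesystem and controller overhead."""
    return raid60_usable_disks(spans, disks_per_span) * disk_gb


if __name__ == "__main__":
    # 2 spans of 12 disks: 20 of 24 disks hold data.
    print(raid60_usable_disks(2, 12))
    print(usable_capacity_gb(2, 12, 600))
```

Each span tolerates two simultaneous disk failures, which is the usual argument for giving up dedicated hot-spares.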

http://www.flickr.com/photos/mbk/27640225/
XFS TUNING

• mkfs.xfs -s size=4096
• Mount options
• noatime
• nobarrier
• inode64
• logbsize=256k
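
Put together, the settings above correspond to one mkfs invocation and one set of mount options. The helper below only assembles the strings. The device and mount point are hypothetical placeholders; the flags are the ones listed on the slide.

```python
XFS_MOUNT_OPTIONS = ["noatime", "nobarrier", "inode64", "logbsize=256k"]


def mkfs_command(device):
    """Argument list for creating the filesystem with a 4096-byte sector size."""
    return ["mkfs.xfs", "-s", "size=4096", device]


def fstab_line(device, mount_point):
    """An /etc/fstab entry using the mount options from the slide."""
    options = ",".join(XFS_MOUNT_OPTIONS)
    return f"{device} {mount_point} xfs {options} 0 0"


if __name__ == "__main__":
    print(" ".join(mkfs_command("/dev/sdb1")))       # hypothetical device
    print(fstab_line("/dev/sdb1", "/var/lib/mysql"))  # hypothetical mount point
```

nobarrier disables write barriers and is generally only considered safe with a protected write cache. Recent Linux kernels have dropped this XFS option, so check the documentation for the kernel in use.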

http://www.flickr.com/photos/rocketlass/5169004165/
SHARDGUARD
PART DEUX

• Protect all the things!
• Kill UI queries over 75 seconds
• Kill background queries over 1 hour
• Yes, all of them
• No really, kill them, now
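
The kill policy can be sketched as a filter over SHOW PROCESSLIST output. The two limits come from the slide. Telling UI queries from background queries by MySQL user name is an assumption made for this example, as are the function name and user names.

```python
UI_LIMIT_SECONDS = 75
BACKGROUND_LIMIT_SECONDS = 3600


def queries_to_kill(processlist, background_users):
    """Return KILL QUERY statements for queries that exceeded their time limit.

    processlist: iterable of dicts carrying the Id, User, Command and Time
    columns reported by SHOW PROCESSLIST.
    """
    statements = []
    for row in processlist:
        if row["Command"] != "Query":
            continue  # idle connections and replication threads are left alone
        if row["User"] in background_users:
            limit = BACKGROUND_LIMIT_SECONDS
        else:
            limit = UI_LIMIT_SECONDS
        if row["Time"] > limit:
            statements.append(f"KILL QUERY {row['Id']};")
    return statements
```

A caller would run this on a schedule against every database and execute the returned statements, with no exceptions for "important" queries.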

http://www.flickr.com/photos/chiky/7194089194/
IF YOU DON’T
BELIEVE ME…

• Delayed Job
• Long running background query
• InnoDB History List Traversal
TO INFINITY AND BEYOND

http://www.flickr.com/photos/temma2/1149223191/
HARDWARE V2

• Dell R620
• 2 x Intel E5-2690 @ 2.90GHz
• 96GB RAM
• MD1220 Storage Shelf
• 800GB Intel SSD S3500

http://www.flickr.com/photos/tnarik/2590037637/
CONTINUOUS
IMPROVEMENT

• EXT4 / ZFS / XFS
• RAID Card vs HBA
• Percona Server 5.6
• Multiple MySQL Instances
• Databases per Service

http://www.flickr.com/photos/shawnclover/8555834230/
JOIN THE TEAM

NewRelic.com/jobs
