Long-Term Storage
Panel Session

Erik Riedel, EMC
Library of Congress Workshop
September 2012

top picture “Once Blue” by Jesse Wagstaff via flickr/cc
right picture by Austin Marshall via flickr/cc

revision 3
  
Parameters

•  Non-compressible data
•  Long-term storage
•  Very high reliability
•  Request rate of 10% per year
•  5, 20, 50 PB in 2012, 2015, 2018
  
Density

2012
Capacity  Disks (raw) @ 3TB  Disks (protected)  Racks @ 480 disks
5 PB      1,700 disks        2,700 disks        6 racks
20 PB     6,700 disks        11,000 disks       23 racks
50 PB     17,000 disks       27,000 disks       56 racks
  
2015
Capacity  Disks (raw) @ 6TB  Disks (protected)  Racks @ 600 disks
5 PB      830 disks          1,300 disks        3 racks
20 PB     3,300 disks        5,300 disks        9 racks
50 PB     8,300 disks        13,000 disks       23 racks
  
2018
Capacity  Disks (raw) @ 10TB  Disks (protected)  Racks @ 600 disks
5 PB      500 disks           800 disks          2 racks
20 PB     2,000 disks         3,200 disks        6 racks
50 PB     5,000 disks         8,000 disks        14 racks
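
The density rows can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the author's own calculation: it assumes decimal units (1 PB = 1000 TB), takes the 1.6x erasure-coding overhead from the Assumptions slide, and rounds every count up, whereas the slides round disk counts to two significant figures. The function name `density` is hypothetical.

```python
from fractions import Fraction
from math import ceil

# Erasure-coding overhead of 1.6x, taken from the Assumptions slide.
OVERHEAD = Fraction(16, 10)


def density(capacity_pb: int, disk_tb: int, disks_per_rack: int) -> tuple[int, int, int]:
    """Return (raw disks, protected disks, racks) for one table row.

    Assumes decimal units (1 PB = 1000 TB) and rounds every count up;
    the slides instead round disk counts to two significant figures.
    """
    raw = Fraction(capacity_pb * 1000, disk_tb)   # disks needed for the data alone
    protected = raw * OVERHEAD                    # disks needed including redundancy
    racks = ceil(protected / disks_per_rack)      # whole racks only
    return ceil(raw), ceil(protected), racks


if __name__ == "__main__":
    for year, disk_tb, per_rack in [(2012, 3, 480), (2015, 6, 600), (2018, 10, 600)]:
        for capacity_pb in (5, 20, 50):
            raw, protected, racks = density(capacity_pb, disk_tb, per_rack)
            print(f"{year} {capacity_pb:>2} PB: {raw:>6,} raw, "
                  f"{protected:>6,} protected, {racks} racks")
```

Exact fractions are used instead of floats so that the rack count is not pushed up by a rounding error when a disk count lands exactly on a rack boundary.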
  
Performance

2012, request rate 10%/yr
Capacity  Request rate  Disks   Disk BW   Racks  Bandwidth  Actual BW  Days-to-fill
5 PB      16 MB/s       2,700   200 GB/s  6      30 GB/s    3 GB/s     19
20 PB     63 MB/s       11,000  1.1 TB/s  23     115 GB/s   11 GB/s    20
50 PB     159 MB/s      27,000  2.7 TB/s  56     280 GB/s   28 GB/s    21
  
2012, request rate 10%/2day
Capacity  Request rate  Disks   Disk BW   Racks  Bandwidth  Actual BW  Days-to-fill
5 PB      2.9 GB/s      2,700   200 GB/s  6      30 GB/s    3 GB/s     19
20 PB     11 GB/s       11,000  1.1 TB/s  23     115 GB/s   11 GB/s    20
50 PB     29 GB/s       27,000  2.7 TB/s  56     280 GB/s   28 GB/s    21
  
2018, request rate 10%/2day
Capacity  Request rate  Disks   Disk BW   Racks  Bandwidth  Actual BW  Days-to-fill
5 PB      2.9 GB/s      800     80 GB/s   2      10 GB/s    3.3 GB/s   17
20 PB     11 GB/s       3,200   320 GB/s  6      30 GB/s    10 GB/s    23
50 PB     29 GB/s       8,000   800 GB/s  14     70 GB/s    23 GB/s    25
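
The request-rate, bandwidth, and days-to-fill columns follow from the stated assumptions (10% of the data read per year or per 2 days, 40 Gb/s per rack, actual bandwidth a fixed fraction of theoretical). A minimal sketch of that arithmetic is below; the function names are hypothetical and decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) are an assumption. Results may differ from the slides in the last digit because the slides round intermediate values.

```python
SECONDS_PER_DAY = 86_400


def request_rate_bytes_per_s(capacity_pb: float, fraction: float = 0.10,
                             days: float = 365) -> float:
    """Average read rate if `fraction` of the data is read evenly over `days` days."""
    return capacity_pb * 1e15 * fraction / (days * SECONDS_PER_DAY)


def rack_bandwidth_gb_per_s(racks: int, gbit_per_rack: float = 40) -> float:
    """Theoretical network bandwidth in GB/s at 40 Gb/s per rack (4x 10 GbE)."""
    return racks * gbit_per_rack / 8


def days_to_fill(capacity_pb: float, actual_gb_per_s: float) -> float:
    """Days needed to write the full capacity at the given sustained bandwidth."""
    seconds = capacity_pb * 1e15 / (actual_gb_per_s * 1e9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 2012, 5 PB: even access over a year versus the same volume in 2 days.
    print(f"{request_rate_bytes_per_s(5) / 1e6:.0f} MB/s")
    print(f"{request_rate_bytes_per_s(5, days=2) / 1e9:.1f} GB/s")
    theoretical = rack_bandwidth_gb_per_s(6)
    actual = theoretical / 10  # 1/10 of theoretical, per the 2012 assumption
    print(f"{theoretical:.0f} GB/s theoretical, {days_to_fill(5, actual):.0f} days to fill")
```

The bursty 10%/2day case is what makes the network column matter: the 2012 request rates (2.9, 11, 29 GB/s) sit at roughly the same level as the actual bandwidth of each configuration.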
  
Cost	
  
2012      $/month @ $0.01/GB
5 PB      $50,000/month
20 PB     $200,000/month
50 PB     $500,000/month

Cost if using e.g. “cold” public cloud storage
  
2012            sqft/person  $/sqft  $/month         Location
20 employees    90           $48     $86,000/month   Washington, DC
80 employees    75           $48     $288,000/month  Washington, DC
200 employees   75           $24     $360,000/month  Minneapolis, MN
  
For comparison, the cost to “store” 20 librarians or data scientists
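
Both cost tables are simple products, which makes the comparison easy to check. The sketch below is illustrative only: the function names are hypothetical, 1 PB is taken as 1,000,000 GB, and the office figure is computed exactly as the slide does (employees x sqft per person x $/sqft) without interpreting the rent period.

```python
def cloud_cost_per_month(capacity_pb: int, usd_per_gb: float = 0.01) -> int:
    """Monthly cost in USD of storing `capacity_pb` at a flat per-GB price."""
    gigabytes = capacity_pb * 1_000_000  # decimal units (assumption)
    return round(gigabytes * usd_per_gb)


def office_cost(employees: int, sqft_per_person: int, usd_per_sqft: int) -> int:
    """Office-space cost as the product used on the slide."""
    return employees * sqft_per_person * usd_per_sqft


if __name__ == "__main__":
    for capacity_pb in (5, 20, 50):
        print(f"{capacity_pb:>2} PB: ${cloud_cost_per_month(capacity_pb):,}/month")
    for employees, sqft, rate, city in [
        (20, 90, 48, "Washington, DC"),
        (80, 75, 48, "Washington, DC"),
        (200, 75, 24, "Minneapolis, MN"),
    ]:
        print(f"{employees:>3} employees: ${office_cost(employees, sqft, rate):,} ({city})")
```

The first office row evaluates to $86,400, which the slide rounds to $86,000.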
  
Assumptions

•  Data protection in a single data center, using an erasure-coding scheme at 1.6x overhead
•  480 drive racks in 2012 (40U)
•  600 drive racks in 2015 and 2018 (50+U)
•  10%/year access assumes 10% of total data is accessed in even distribution over 365 days/year, 24 hours/day – optimistic
•  10%/2day access assumes 10% of data is accessed on only 2 days per year (say Thanksgiving and Xmas) – very bursty
•  Bandwidth is theoretical bandwidth at 40 Gb/s per rack (4x 10 GbE)
•  Actual bandwidth is 1/10 of theoretical maximum for 2012 and 2015; up to 1/3 theoretical max for 2018 (software improvements)
•  sqft per person and $/sqft references

http://www.inc.com/news/articles/2010/10/washington-dc-rents-top-those-in-nyc.html
http://newsfeed.time.com/2011/02/08/youre-not-imagining-it-your-cubicle-is-getting-smaller/
  
References

•  Why access to data matters, not just “dark storage”, but wide access to electronic data:
   –  The Internet Archive
   –  http://archive.org/about/
   –  History of the Internet, still online after 20 years
   –  http://www.cs.cmu.edu/~riedel/library/birthday.html
      (from April 2003, LoC workshop on Digital Preservation)
•  What about Flash?
   –  Death of Disks (has been widely exaggerated)
   –  http://www.cs.cmu.edu/~riedel/#HECFSIO2011
   –  How to Build Big Storage as a Cloud
   –  http://storageconference.org/2012/Presentations/R00.Keynote.pdf
  
Backup	
  
What About Tape?

pictures by Gill Wildman via flickr/cc

•  Tapes are not a commodity technology
•  2011 total worldwide market for tape cartridges is about 8m units (just under $1b annual revenue)
•  Compare to the HDD business at 650m units in 2010 (close to $40b annual revenue)
•  80 disk drives are manufactured for each tape cartridge; robots are complicated
•  Fits particular application segments very well, but is not a general-purpose solution

http://www.storagenewsletter.com/news/tapes/sccg-ww-tape-market-lto-1q11
http://techreport.com/discussions.x/20890
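
The "80 disk drives for each tape cartridge" figure is the ratio of the two unit counts quoted above. A minimal sketch of that arithmetic follows; the function names are hypothetical, the inputs are the approximate figures from the slide ("about 8m", "just under $1b", "close to $40b"), and the two markets are quoted for different years (2011 and 2010), so the results are rough.

```python
def units_ratio(hdd_units: float, tape_units: float) -> float:
    """Disk drives shipped per tape cartridge shipped."""
    return hdd_units / tape_units


def revenue_ratio(hdd_revenue_usd: float, tape_revenue_usd: float) -> float:
    """HDD market revenue as a multiple of tape cartridge revenue."""
    return hdd_revenue_usd / tape_revenue_usd


if __name__ == "__main__":
    # Approximate figures as quoted on the slide.
    print(f"{units_ratio(650e6, 8e6):.0f} drives per cartridge")
    print(f"{revenue_ratio(40e9, 1e9):.0f}x revenue")
```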
  
David Anderson, James Dykes, Erik Riedel “SCSI vs. ATA - More than an interface” 2nd Conference on File and Storage Technology (FAST). San Francisco, CA. April 2003. www.cs.cmu.edu/~riedel/#SCSIvsATA
  
