What are the risks that may affect the availability of a data center
The availability of a data center is the proportion of time that its operations run
without failure. Availability is determined by a system’s reliability and its recovery
time. Because system downtime can have a major impact on a business, it is necessary
to understand the factors that affect data center availability.
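The link between reliability, recovery time and availability can be made concrete with
the commonly used steady-state formula: availability = MTBF / (MTBF + MTTR). The sketch
below is only an illustration. The function names and the sample figures (2,000 hours
between failures, 2 hours to recover) are assumptions and do not come from any survey
quoted in this article.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, used only for a rough downtime estimate


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mtbf_hours: mean time between failures (a measure of reliability)
    mttr_hours: mean time to repair (the recovery time)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the calculation.
    a = availability(mtbf_hours=2000.0, mttr_hours=2.0)
    print(f"availability: {a:.4%}")
    print(f"expected downtime: {annual_downtime_hours(a):.1f} hours/year")
```

The formula shows that availability improves either by failing less often (a higher
MTBF) or by recovering faster (a lower MTTR), which is why both factors matter.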
These factors generally fall into four categories:
• Nature
• Human
• Utility
• Equipment
Nature
Nature has one of the largest impacts on data center availability. Natural events such
as tornadoes, hurricanes, flooding and earthquakes cannot be predicted reliably and can
cause a complete disaster. Humans have very little control over natural calamities,
which is why they can affect availability so severely. Maintaining data access in the
event of a disaster can mean the difference between a company’s success or failure.
Let us look at some incidents that have occurred at various companies and their data
centers.
• Lightning: They say lightning doesn’t strike the same place twice, but in 2015 one of
Google’s European data centers was struck by lightning not once, but four times,
causing errors in 5% of the disks responsible for Google Compute Engine (GCE)
instances. Although the company restored many of the drives, an estimated
0.000001% of data stored in the data center was irrecoverably lost. While that might
not sound like much, try telling that to the customers who were affected by it.
• Hurricanes: According to National Geographic, 2017 was the most expensive
hurricane season in U.S. history, costing roughly $200 billion. With their combination
of high winds, storm surge, and heavy rains, hurricanes are one of the most
dangerous natural disasters data centers must contend with. The sudden flooding
resulting from Hurricane Sandy in 2012 caused extensive data center outages in New
York and New Jersey. These failures were made even worse by the fact that backup
systems were located in the same geographic region and were knocked out by the
same weather event.
• Tornadoes: A devastating 2011 tornado ripped through several hospital buildings in
Joplin, Missouri, one of which was a data center. While none of the data lost was
mission-critical, that was only because most of the information stored there had
been migrated to a new offsite data center just a few weeks earlier. Hospital officials
noted that if the tornado had hit a month earlier, the data loss would have been
catastrophic and rendered the hospital completely inoperable.
• Flooding: Severe flooding in Leeds, UK caused a Vodafone data center to temporarily
lose power during Christmas of 2015. While data loss was negligible, the power
outage disrupted mobile phone service temporarily. Vodafone, of course, has a bit of
history with flooding, having suffered one of the most infamous data center disasters
when its Istanbul data center was devastated by flooding in 2009.
• Earthquakes: So far, data centers have been lucky. Modern architectural standards
and additional precautions (such as special enclosures and rollers for server racks)
have gone a long way towards protecting data centers from earthquakes, even in
high-risk areas.
• The Unexpected…: Disaster planning is all about expecting the unexpected. Take, for
instance, the squirrel that knocked Yahoo’s Santa Clara data center offline for several
hours in 2010, or the truck that drove into a transformer feeding power into a
Rackspace data center in 2007.
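The Hurricane Sandy example above suggests one check that is easy to automate: whether
a backup site is far enough from the primary site to escape the same regional event.
The sketch below uses the haversine great-circle formula. The 160 km minimum separation
and the approximate city coordinates are assumptions made for illustration only, not a
recommendation from any standard.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_too_close(primary, backup, min_separation_km: float = 160.0) -> bool:
    """True if the backup site is closer than the assumed minimum separation.

    primary and backup are (latitude, longitude) tuples in degrees.
    """
    return haversine_km(*primary, *backup) < min_separation_km


if __name__ == "__main__":
    # Approximate, illustrative coordinates.
    new_york = (40.71, -74.01)
    newark = (40.74, -74.17)
    chicago = (41.88, -87.63)
    print("Newark backup too close:", sites_too_close(new_york, newark))
    print("Chicago backup too close:", sites_too_close(new_york, chicago))
```

Distance alone is a rough proxy. A real assessment would also consider shared power
grids, flood plains and network paths.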
Human
According to a survey conducted by Aperture Research Institute, human errors are
behind 57.3% of all data center outages. The second most common reason was
improper failover with 43.7%.
Above: Diagram from the Aperture survey.
Another survey gives a similar picture. According to the Uptime Institute, 70% of data
center outages are due to human error rather than a fault in the infrastructure design.
Furthermore, “mistakes” that led to an outage can often be traced to a poor decision by
senior management.
The two organizations report different figures, most likely because they surveyed
different entities and environments. Taken together, the surveys suggest that human
mistakes cause more data center outages than any other factor. Some examples of
human-caused data center issues:
• Activation of the emergency power-off (EPO) switch
• Adjusting the temperature from Fahrenheit to Celsius
• Pulling power cords out of equipment
• Overloading a circuit
• Not following standard policies or procedures
To minimize the risk of the “human factor” affecting operations, it is important to
have up-to-date documentation on everything connected to your data center and
manuals on how different critical operations should be performed. Manuals and
documentation together with scheduled tests should help you avoid many of the
problems and outages described in this survey.
Utility
For a data center, the primary utility is the electric power drawn from local
providers (which can be government or private entities). The secondary power sources
are the diesel generators and UPS systems. All other mechanical systems in the data
center depend directly or indirectly on the availability of this power.
An Uptime Institute survey finds that the power usage effectiveness of data centers is
better than ever. However, the same survey indicates that power outages have
increased significantly. The Global Data Center Survey report from Uptime
Institute gathered responses from nearly 900 data center operators and IT
practitioners, both from major data center providers and from private,
company-owned data centers (you can download the report from the link above).
Even when all equipment is deployed with redundancy, there is a chance that these
machines will not work as expected during an incident. For example, diesel rotary
uninterruptible power supply (DRUPS) systems were implicated in power disruptions
that in 2014 affected Amazon Web Services in Sydney, a former Telecity facility called
Sovereign House in London, now owned by Digital Realty Trust, and the Singapore Stock
Exchange. The disruption at Amazon was caused by what the company called “an unusually
long voltage sag.” In these incidents, the root cause of the outage was a utility
failure followed by backup machines failing to start. Some commonly reported failure
scenarios are:
• Generator fails to start
• Generator fails after X hours of running
• Utility power partially fails (usually the loss of one of three phases)
• UPS fails to switch to battery
• UPS fails to switch from battery to input power
These incidents show that periodic checks and preventive maintenance are essential
and go a long way toward limiting the impact of failures.
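A periodic check schedule of this kind can be tracked with very little code. The sketch
below is a minimal illustration, not a replacement for a maintenance management system.
The task names and the 30 and 90 day intervals are hypothetical values chosen to mirror
the failure scenarios listed above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MaintenanceTask:
    name: str
    interval_days: int   # how often the check should be repeated
    last_done: date      # when the check was last completed

    def due_date(self) -> date:
        """Date by which the next check should be completed."""
        return self.last_done + timedelta(days=self.interval_days)


def overdue_tasks(tasks, today: date):
    """Return the tasks whose due date has already passed."""
    return [t for t in tasks if t.due_date() < today]


if __name__ == "__main__":
    # Hypothetical tasks and intervals, for illustration only.
    tasks = [
        MaintenanceTask("Generator start test under load", 30, date(2024, 4, 1)),
        MaintenanceTask("UPS transfer-to-battery test", 90, date(2024, 4, 15)),
    ]
    for task in overdue_tasks(tasks, today=date(2024, 6, 1)):
        print(f"OVERDUE: {task.name} (was due {task.due_date()})")
```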
Equipment
A data center’s infrastructure is a large collection of equipment, and its success
depends on all of it working together efficiently. Any electrical, mechanical,
cooling, networking or server equipment can fail unexpectedly. Whether it’s a server
reaching the end of its five-year expected lifespan or a UPS backup battery dying
before it should, equipment failure is one of the most common causes of data center
outages.
With today’s powerful data center infrastructure management (DCIM) tools, facilities
can monitor the overall health of their own equipment as well as colocated assets.
While it may not be possible to predict every failure, sophisticated algorithms can
monitor equipment performance continually to anticipate when hardware is
reaching the end of its lifecycle or is prone to break down. When these problems are
identified, data center personnel can plan to switch out faulty or outdated
equipment without having to take critical systems offline. With the
right redundancies, backups and emergency spares in place, even an unexpected
failure can be managed without compromising network performance.
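As a rough idea of what such monitoring can look like, the sketch below fits a straight
line to periodic UPS battery capacity readings and estimates how many days remain
before the capacity falls below a replacement threshold. This is a simple least-squares
trend, not the algorithm of any particular DCIM product. The 80% threshold and the
sample readings are assumptions made for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all readings were taken at the same time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, capacity_pct, threshold_pct=80.0):
    """Estimate days from the last reading until capacity hits the threshold.

    Returns None when the readings show no downward trend.
    The default threshold is an assumed replacement point.
    """
    slope, intercept = linear_fit(days, capacity_pct)
    if slope >= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])


if __name__ == "__main__":
    # Hypothetical readings: day of measurement and measured capacity in percent.
    days = [0, 30, 60, 90]
    capacity = [100.0, 98.0, 96.0, 94.0]
    remaining = days_until_threshold(days, capacity)
    print(f"estimated days until replacement threshold: {remaining:.0f}")
```

A straight-line trend is the simplest possible model. Real battery degradation is often
not linear, so an estimate like this is best treated as an early warning rather than a
precise prediction.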
Source: www.vxchnge.com & www.pingdom.com
Have a comment or points to be reviewed? Knowledge is power, and it increases by
sharing. Feel free to comment.

