DB2 pureScale
Availability & Recovery
© 2010 IBM Corporation
October 13, 2010
Aamer Sachedina (aamers@ca.ibm.com)
Kelly Schlamb (kschlamb@ca.ibm.com)
Information Management
Continuous Availability
Protect from infrastructure outages
Automatic workload balancing
Automatically recovers from component failures
Tolerates multiple node failures
Duplexed secondary global lock and memory manager

DB2 Cluster Services Overview
Integrated DB2 component
Single install as part of DB2 installation
Upgrades and maintenance through DB2 fixpack
[Diagram: DB2 members and CFs running on top of DB2 Cluster Services, which comprises the Cluster File System (GPFS), the Cluster Manager (RSCT), and Cluster Automation (Tivoli SA MP)]

DB2 Cluster Services
DB2 Cluster Services (DB2 CS) tightly integrates these IBM products into DB2 pureScale:
– Reliable Scalable Cluster Technology (RSCT)
– Tivoli System Automation for Multiplatforms (Tivoli SA MP)
– IBM General Parallel File System (GPFS)
DB2 instance creation creates the RSCT and GPFS domains across hosts
A single command adds a host to the instance:
db2iupdt -add -m newhost.acme.com db2inst1

DB2 pureScale HA Architecture
[Diagram: four members, each running DB2 CS, connected over the cluster interconnect to the primary and secondary CFs (each also running CS), with shared storage on GPFS]

Virtually Instantaneous Recovery From Node Failure
Protect from infrastructure-related outages
– Automatically redistribute workload to surviving nodes
– Automatically recover in-flight transactions in as little as 15-20 seconds, including detection of the problem
[Diagram: application servers and DB2 clients connected to the cluster]

Minimize the Impact of Planned Outages
Keep your system up
– During OS fixes
– HW updates
– Administration
Maintenance sequence:
1. Identify member
2. Do maintenance
3. Bring node back online

Member Hardware Failure
Power cord tripped over accidentally
DB2 Cluster Services loses heartbeat and declares member down
– Informs other members and CF servers
– Fences member from logs and data
– Initiates automated member restart on another (“guest”) host
  > Using a reduced, pre-allocated memory model
– Member restart is like database crash recovery in a single-system database, but is much faster
  • Redo limited to in-flight transactions (due to FAC, force at commit)
  • Benefits from page cache in CF
In the meantime, client connections are automatically re-routed to healthy members
– Based on least load (by default), or
– To a pre-designated failover member
Other members remain fully available throughout (“Online Failover”)
– Primary CF retains update locks held by member at the time of failure
– Other members can continue to read and update data not locked for write access by failed member
Member restart completes
– Retained locks released and all data fully available
Automatic; Ultra Fast; Online
Almost all data remains available. Affected connections are transparently re-routed to other members.
[Diagram: clients see a single database view; members (DB2 with CS), each with its own log, over shared data; primary and secondary CFs hold updated pages and global locks]
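
The retained-lock behaviour described on this slide can be illustrated with a toy model. This is a sketch only, not DB2 code: the `LockTable` class, its method names, and the page identifiers are hypothetical. It models just the rule stated above, that the update locks of a failed member are retained until member restart completes, while other members keep working on everything else.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    """Toy stand-in for the global lock state kept by the primary CF (hypothetical)."""
    # page id -> member id currently holding the update (write) lock
    update_locks: dict = field(default_factory=dict)
    # members that have failed and whose locks are being retained
    failed_members: set = field(default_factory=set)

    def acquire_update(self, member: int, page: str) -> bool:
        """Grant an update lock unless another member already holds it."""
        holder = self.update_locks.get(page)
        if holder is not None and holder != member:
            return False  # held, or retained on behalf of a failed member
        self.update_locks[page] = member
        return True

    def member_failed(self, member: int) -> None:
        """Mark the member down; its update locks are retained, not dropped."""
        self.failed_members.add(member)

    def member_restart_complete(self, member: int) -> None:
        """Release the retained locks once the member's restart has finished."""
        self.failed_members.discard(member)
        self.update_locks = {
            page: holder
            for page, holder in self.update_locks.items()
            if holder != member
        }

    def retained_pages(self) -> set:
        """Pages still locked on behalf of failed members."""
        return {p for p, m in self.update_locks.items() if m in self.failed_members}


locks = LockTable()
locks.acquire_update(3, "page-A")            # member 3 is updating page-A
locks.member_failed(3)                       # member 3's host loses power

blocked = locks.acquire_update(1, "page-A")  # retained lock blocks this update
allowed = locks.acquire_update(1, "page-B")  # unrelated data stays available

locks.member_restart_complete(3)             # member restart finishes
after_restart = locks.acquire_update(1, "page-A")
print(blocked, allowed, after_restart)       # False True True
```

Only `page-A` is unavailable while member 3 is down, which is the sense in which "almost all data remains available" during the restart window.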

Member Failback
Power restored and system rebooted
DB2 Cluster Services automatically detects system availability
– Informs other members and PowerHA pureScale (CF) servers
– Removes fence
– Brings up member on home host
Client connections automatically re-routed back to member
[Diagram: same cluster layout as the previous slide, with the failed member running on its home host again]
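
The two re-routing policies named in the deck (least load by default, or a pre-designated failover member) and the route-back on failback can be sketched as a small selection function. The function `pick_member`, its parameters, and the load figures are all hypothetical; the deck does not show how the DB2 client implements this, so treat it as an illustration of the policy rather than of the actual mechanism.

```python
from typing import Optional


def pick_member(loads: dict, healthy: set, preferred: Optional[int] = None) -> int:
    """Choose a target member for a connection.

    loads     -- member id -> current load figure (hypothetical metric)
    healthy   -- ids of members currently able to accept work
    preferred -- optional pre-designated member; used only when it is healthy
    """
    if preferred is not None and preferred in healthy:
        return preferred
    candidates = [m for m in healthy if m in loads]
    if not candidates:
        raise RuntimeError("no healthy member available")
    # Least load wins; ties are broken by the lower member id.
    return min(candidates, key=lambda m: (loads[m], m))


loads = {0: 0.60, 1: 0.25, 2: 0.40, 3: 0.30}

# Member 3's host fails: its connections go to the least-loaded survivor.
on_failure = pick_member(loads, healthy={0, 1, 2})

# Same failure, but member 2 was pre-designated as the failover target.
designated = pick_member(loads, healthy={0, 1, 2}, preferred=2)

# Failback: member 3 is healthy on its home host again, so connections
# that belong to it are routed back.
on_failback = pick_member(loads, healthy={0, 1, 2, 3}, preferred=3)

print(on_failure, designated, on_failback)  # 1 2 3
```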

Member Hardware Failure and Failback
[Diagram: members on host0 to host3, primary CF on host4, secondary CF on host5, per-member logs and shared data]

db2nodes.cfg:
0 host0 0 - MEMBER
1 host1 0 - MEMBER
2 host2 0 - MEMBER
3 host3 0 - MEMBER
4 host4 0 - CF
5 host5 0 - CF

Normal operation, every member on its home host:
> db2instance -list
ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  STARTED               host3      host3         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      ACTIVE    NO                NO
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO

After host3 fails, member 3 restarts on guest host host2:
> db2instance -list
ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  RESTARTING            host3      host2         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      INACTIVE  NO                YES
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO

Member restart complete, member 3 waiting to fail back to host3:
> db2instance -list
ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  WAITING_FOR_FAILBACK  host3      host2         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      INACTIVE  NO                YES
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO
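
For monitoring scripts, the member rows of a `db2instance -list` listing can be scanned for members that are not running on their home host. The sketch below is an assumption-laden illustration: it parses only the simplified six-column layout shown on this slide (real output may carry additional columns and separator lines), and the names `InstanceRow`, `parse_instance_rows`, and `displaced_members` are hypothetical helpers, not part of DB2.

```python
from typing import NamedTuple


class InstanceRow(NamedTuple):
    ident: int
    kind: str          # MEMBER or CF
    state: str
    home_host: str
    current_host: str
    alert: str


def parse_instance_rows(text: str) -> list:
    """Parse the member/CF rows of the simplified listing shown on the slide."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly six fields and start with a numeric ID;
        # this skips the column header, the host section, and blank lines.
        if len(fields) == 6 and fields[0].isdigit():
            rows.append(InstanceRow(int(fields[0]), *fields[1:]))
    return rows


def displaced_members(rows: list) -> list:
    """Members currently running somewhere other than their home host."""
    return [r for r in rows
            if r.kind == "MEMBER" and r.home_host != r.current_host]


# Listing copied from the slide (member 3 waiting to fail back to host3).
SAMPLE = """\
ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  WAITING_FOR_FAILBACK  host3      host2         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host3      INACTIVE  NO                YES
"""

rows = parse_instance_rows(SAMPLE)
displaced = displaced_members(rows)
print([(r.ident, r.state, r.current_host) for r in displaced])
# [(3, 'WAITING_FOR_FAILBACK', 'host2')]
```

Comparing HOME_HOST with CURRENT_HOST catches both the RESTARTING and the WAITING_FOR_FAILBACK listings, since in each of them member 3 is reported on host2.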

Summary: Single Failure

Failure mode: Member
Other members remain online? Yes
Automatic & transparent? Yes
Comments: Only data that was in flight on the failed member remains locked temporarily. Connections to the failed member transparently move to another member.

Failure mode: Primary CF
Other members remain online? Yes
Automatic & transparent? Yes
Comments: Momentary “blip” in CF service. Transparent to members (in-flight CF requests just take a few more seconds before completing normally).

Failure mode: Secondary CF
Other members remain online? Yes
Automatic & transparent? Yes
Comments: Momentary “blip” in CF service. Transparent to members (in-flight CF requests just take a few more seconds before completing normally).

Summary: Multiple Failures

Failure mode: Multiple members
Other members remain online? Yes
Automatic & transparent? Yes
Comments: Only data that was in flight on the failed members remains locked temporarily. Recoveries are done in parallel. Connections to each failed member transparently move to another member.

Failure mode: Member and a CF
Other members remain online? Yes
Automatic & transparent? Yes
Comments: Same as member failure, plus a momentary, transparent “blip” in CF service. Connections to the failed member transparently move to another member.
