G D P S A N D S Y S T E M C O M P L E X
M A I N F R A M E C L U S T E R I N G
Najmi Mansoor Ahmed
Principal Architect PSS (IBM ALCS v241 under z/OS 2.1)
Presented on 23-Nov-2016
BASICS
• A system is made up of hardware products including a central processor (CPU), and
software products, with the primary one being an operating system such as z/OS.
• The CPU and other system hardware, such as channels and storage (RAM), make up a Central Processor Complex (CPC), or in general terms, a mainframe box.
[Diagram: a mainframe with attached disks]
• It is possible to run a mainframe with a single processor, or uniprocessor (a single CP), but this is not a typical system.
• When all the CPs share central storage and a single OS image manages the processing,
work is assigned to a CP that is available to do the work. If a CP fails, work can be routed
to another CP.
[Diagram: a mainframe running z/OS on a single CP (CP1), with attached disks]
Multiprocessor and Uniprocessor
 The ability to partition a large system into multiple smaller systems, called logical partitions or LPARs, is now a core requirement in practically all mainframe installations. It allows you to build virtual clusters of CPUs, operating systems, and applications within a single box.
Clustering
• The mainframe has three major clustering techniques:
• Basic shared DASD
• CTC rings
• Parallel Sysplex
[Diagram: two LPARs, each running z/OS, connected through their channels to a shared disk control unit]
A channel is a high-speed data bus. Today's mainframes use FICON (FIber CONnection) channels.
Basic Shared Storage (DASD)
Basic shared DASD is typically used when operations staff control which jobs go to which system and ensure that there is no conflict (both systems trying to update the same data at the same time), as sketched below.
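The update conflict the operations staff must avoid can be illustrated with a coarse per-device lock, loosely analogous to the hardware RESERVE/RELEASE serialization used with shared DASD. A minimal Python sketch (all names invented; this is not actual z/OS behaviour):

```python
import threading

class SharedDasd:
    """Toy model of a shared DASD volume: one reserve lock per device."""
    def __init__(self):
        self._reserve = threading.Lock()  # stands in for hardware RESERVE/RELEASE
        self.dataset = {}

    def update(self, system, key, delta):
        with self._reserve:               # RESERVE: the whole device is held
            # While reserved, the other system cannot touch the volume,
            # so this read-modify-write cannot interleave and lose an update.
            current = self.dataset.get(key, 0)
            self.dataset[key] = current + delta
        # lock released: RELEASE

dasd = SharedDasd()
threads = [threading.Thread(target=dasd.update, args=(f"SYS{i}", "REC1", 1))
           for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(dasd.dataset["REC1"])  # always 2: no lost update
```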
Clustering
• Channel-to-Channel (CTC) ring
[Diagram: two LPARs, each running z/OS with its own channels and disk control unit, connected to each other via a CTC ring]
Channel-to-Channel (CTC)
A CTC connection simulates an I/O device that can be used by one system to communicate with another, and it provides the data path and synchronization for the data transfer. When a CTC is used to connect two channels, a loosely coupled multiprocessing system is established.
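To make the idea concrete, here is a rough user-space analogue in Python: a socket pair stands in for the CTC link, giving two "systems" a synchronized point-to-point data path. The message format is invented; a real CTC is a channel-level device, not a socket:

```python
import socket
import threading

# socketpair stands in for the CTC link: a synchronized,
# bidirectional point-to-point data path between two "systems".
sys_a, sys_b = socket.socketpair()

def system_b():
    msg = sys_b.recv(1024)            # read side of the simulated CTC
    sys_b.sendall(b"ACK:" + msg)      # reply over the same link

t = threading.Thread(target=system_b)
t.start()
sys_a.sendall(b"JOB7 COMPLETE")       # write side: looks like ordinary device I/O
print(sys_a.recv(1024).decode())      # -> ACK:JOB7 COMPLETE
t.join()
```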
• A loosely coupled configuration has more than one mainframe, managed by more than one z/OS image.
• Although a loosely coupled configuration increases system capacity, it is not as easy to
manage as either a uniprocessor or a tightly coupled multiprocessor.
• Each system must be managed separately, often by a human operator, who monitors
product-specific messages on a set of consoles for each system.
• Products and applications that need to communicate and are running on separate
systems have to create their own communication mechanism.
Loosely coupled multiprocessors
[Diagram: two z/OS images, each managing four CPs (CP1-CP4)]
• To help solve the difficulties of managing many z/OS systems, IBM introduced the z/OS systems complex, or sysplex.
• A sysplex is a collection of z/OS systems that cooperate, using certain hardware and software products, to
process work.
Sysplex
[Diagram: four mainframes (Mainframe 1-4) sharing disks]
• SYSPLEX (System Complex) is the clustering of multiple systems for availability, workload sharing, recovery, and resource and data sharing.
• A system complex can be built in the same data centre or across two different data centres, with from 2 to 32 mainframes.
Sysplex
[Diagram: Mainframe 1 through Mainframe 4 clustered in a sysplex]
SYSPLEX
A sysplex is a clustering technique:
• Every server (node) / LPAR has access to the data resources
• Every cloned, sysplex-enabled application can run on every LPAR
• The sysplex appears as a single large system with a single operating interface to control it
Types of Sysplex
Base Sysplex
• Joining systems through Channel-to-Channel (CTC) connections
[Diagram: two LPARs, each running z/OS with its own channels and disk control unit, connected via a CTC ring]
PARALLEL SYSPLEX
A Parallel Sysplex is a cluster of IBM mainframes acting together
as a single system image with z/OS.
Used for disaster recovery, Parallel Sysplex combines data
sharing and parallel computing to allow a cluster of up to 32
systems to share a workload for high performance and high
availability.
PARALLEL SYSPLEX
Parallel Sysplex = Base Sysplex + Coupling Facility (CF)
[Diagram: Mainframe 1 at Site A and Mainframe 2 at Site B forming a system complex (sysplex)]
Types of Sysplex
Parallel Sysplex
• An enhancement to Base Sysplex: systems are joined through a Coupling Facility (CF)
[Diagram: two LPARs, each running z/OS with its own channels and disk control unit, connected via a CTC ring and via CF channels to a Coupling Facility]
Types of Sysplex
[Diagram: as above, with disks attached to each disk control unit]
 Parallel Sysplex technology is an enabling technology with two critical capabilities:
 Parallel processing
 Read/write data sharing across multiple systems with full data integrity: "shared data" (as opposed to "shared nothing")
Parallel Sysplex
A key component in any Parallel Sysplex is the Coupling Facility (CF) infrastructure. The coupling facility enables the sharing of central memory between systems.
Parallel Sysplex is analogous in concept to a UNIX cluster: it allows the customer to operate multiple copies of the operating system as a single system. This allows systems to be added or removed as needed, while applications continue to run.
[Diagram: primary and secondary system complexes (clustering), each hosting production, standby, and non-production systems]
Coupling Facility (CF) structure
• A coupling facility is a special logical partition that runs the coupling facility control code (CFCC) and provides high-speed caching, list processing, and locking functions in a sysplex.
• A CF functions largely as a fast scratch pad. It is used for three purposes:
 Locking information that is shared among all attached systems
 Cache information (such as for a database) that is shared among all attached systems
 Data list information that is shared among all attached systems
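The three structure types can be pictured as a shared in-memory scratch pad. The following Python sketch models that idea with invented class and method names; the real CF is driven by CFCC and accessed through z/OS services, not an API like this:

```python
class CouplingFacility:
    """Toy scratch pad mirroring the CF's three structure types."""
    def __init__(self):
        self.locks = {}    # lock structures: resource -> owning system
        self.cache = {}    # cache structures: e.g. shared database buffers
        self.lists = {}    # list structures: shared queues and lists

    def acquire_lock(self, resource, system):
        owner = self.locks.setdefault(resource, system)
        return owner == system          # True if this system now holds the lock

    def release_lock(self, resource, system):
        if self.locks.get(resource) == system:
            del self.locks[resource]

    def cache_put(self, key, value):
        self.cache[key] = value         # visible to all attached systems

    def list_push(self, name, item):
        self.lists.setdefault(name, []).append(item)

cf = CouplingFacility()
print(cf.acquire_lock("DB2.TBL1", "SYSA"))  # True
print(cf.acquire_lock("DB2.TBL1", "SYSB"))  # False: SYSA already holds it
```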
Characteristics of Parallel Sysplex
 A common time source to synchronize all mainframe systems' clocks
 Coupling Facility (CF): sharing of central memory between systems for high-performance data sharing
 Cross System Coupling Facility (XCF): allows systems to communicate peer-to-peer
 Global Resource Serialization (GRS): allows multiple systems to access the same resources concurrently, serializing where necessary to ensure exclusive access and prevent conflicting updates to the same data (see the sketch after the diagram below)
 Couple Data Sets (CDS): required by the sysplex to store information about its systems
[Diagram: two z/OS LPARs connected by FICON, each running XCF and GRS, sharing a Sysplex Timer, a Coupling Facility, and Couple Data Sets (CDS)]
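The GRS semantics sketched below are those of a classic shared/exclusive lock: many concurrent readers, or exactly one writer. This toy Python model uses invented names (enq/deq mimic the spirit of the ENQ/DEQ macros, not their real interface):

```python
import threading

class GlobalResourceSerializer:
    """Toy ENQ/DEQ: many shared holders OR one exclusive holder."""
    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0
        self._exclusive = False

    def enq(self, exclusive=False):
        with self._cond:
            if exclusive:
                while self._shared or self._exclusive:
                    self._cond.wait()       # wait until no one holds the resource
                self._exclusive = True
            else:
                while self._exclusive:
                    self._cond.wait()       # readers only wait for a writer
                self._shared += 1

    def deq(self, exclusive=False):
        with self._cond:
            if exclusive:
                self._exclusive = False
            else:
                self._shared -= 1
            self._cond.notify_all()

grs = GlobalResourceSerializer()
grs.enq()                    # SYSA reads
grs.enq()                    # SYSB reads concurrently: allowed
grs.deq(); grs.deq()
grs.enq(exclusive=True)      # SYSA updates: now the sole holder
grs.deq(exclusive=True)
```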
Characteristics of Parallel Sysplex
• The best practice for any data-sharing Parallel Sysplex is to implement at least one failure-isolated CF.
• It is critical that every Parallel Sysplex has at least two CFs connected to every member of the sysplex.
[Diagram: the same configuration with two Coupling Facilities (CF) for redundancy]
Characteristics of Parallel Sysplex
 A timer is a mandatory hardware requirement for a parallel sysplex consisting of more than one zSeries server.
 It provides synchronization for the time-of-day (TOD) clocks of multiple servers, thereby allowing events started by different servers to be properly sequenced in time.
 When multiple servers update the same database, all updates must be time-stamped in the proper sequence (illustrated in the sketch below).
 The Server Time Protocol feature is designed to provide the capability for multiple servers and Coupling Facilities to maintain time synchronization with each other.
 Redundancy of timers allows the sysplex to stay up if either timer has a planned or unplanned outage.
[Diagram: two z/OS LPARs connected by FICON, each running XCF and GRS, with redundant CFs, Couple Data Sets, and an STP timer]
Server Time Protocol (STP) is a time synchronization architecture designed to provide the capability for multiple servers (CPCs) to maintain time synchronization with each other and to form a Coordinated Timing Network (CTN). To maintain time accuracy, the STP facility supports connectivity to an External Time Source (ETS). STP is implemented in IBM's Licensed Internal Code (LIC).
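The following small Python illustration shows why synchronized TOD clocks matter: recovery must be able to sort updates from different servers by timestamp. The log records and values are invented:

```python
from datetime import datetime, timezone

# Log records from two servers updating the same database record.
# With STP keeping the TOD clocks in sync, sorting by timestamp
# recovers the true update order; with drifting clocks it would not.
log = [
    ("SYSB", datetime(2016, 11, 23, 10, 0, 0, 500, tzinfo=timezone.utc), "bal=80"),
    ("SYSA", datetime(2016, 11, 23, 10, 0, 0, 200, tzinfo=timezone.utc), "bal=100"),
]
for system, ts, update in sorted(log, key=lambda rec: rec[1]):
    print(ts.time(), system, update)   # SYSA's update replays first
```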
Parallel Sysplex - Benefits
Continuous Availability
 With a Parallel Sysplex cluster it is possible to construct a parallel processing environment with no single points of failure.
 Because of the redundancy in the configuration, there is a significant reduction in the number of single points of failure.
 Hardware and software maintenance and installations can be performed in a non-disruptive manner. Through data sharing and dynamic workload management, servers can be dynamically removed from or added to the cluster, allowing installation and maintenance activities to be performed while the remaining systems continue to work.
Capacity
 A Parallel Sysplex environment can scale nearly linearly from 2 to 32 systems.
Dynamic Workload Balancing
 The entire Parallel Sysplex cluster can be viewed as a single logical resource to end users and business applications.
GDPS
Geographically Dispersed Parallel Sysplex (GDPS) is an extension of Parallel Sysplex, with mainframes located, potentially, in different cities and/or data centres.
GDPS is an end-to-end application availability solution.
Geographically Dispersed Parallel Sysplex - GDPS
• It is the ultimate disaster recovery and continuous availability solution for a multi-site enterprise.
• GDPS is a combination of storage replication and Parallel Sysplex technology.
• It automates Parallel Sysplex operational tasks and performs failure recovery from a single point of control.
• Types of GDPS configurations:
• GDPS/PPRC is based on synchronous data mirroring technology (PPRC) and can be used with mainframes up to 200 kilometres (120 mi) apart.
• GDPS/XRC is based on asynchronous Extended Remote Copy (XRC) technology, with no restrictions on distance.
• GDPS/Global Mirror is based on asynchronous IBM Global Mirror technology, with no restrictions on distance.
• GDPS/Active-Active is a disaster recovery / continuous availability solution based on two or more sites, separated by unlimited distances, running the same applications and having the same data, to provide cross-site workload balancing.
GDPS ACTIVE/ACTIVE
To achieve a GDPS Active/Active configuration:
• All critical data must be PPRCed and HyperSwap-enabled
• All critical CF structures must be duplexed
• Applications must be Parallel Sysplex enabled
GDPS/PPRC
GDPS/PPRC is a metro-area Continuous Availability (CA) and Disaster Recovery (DR) solution, based upon:
 A multi-site Parallel Sysplex
 Synchronous disk replication
It supports two configurations:
 Active/Standby, or single-site workload
 Active/Active, or multi-site workload
[Diagram: GDPS/PPRC with active disks at Site 1 mirrored via PPRC / Metro Mirror to warm disks at Site 2]
• Even with the multi-path and RAID architecture within DASD subsystems, the single copy of the data continues to be a single point of failure (SPOF).
• A failure of a disk subsystem, or even a single disk array, can take down major applications, the system, or even the sysplex.
• GDPS/PPRC is IBM's disk replication technology for removing this SPOF.
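Synchronous mirroring (PPRC / Metro Mirror) means a write is not acknowledged to the application until the secondary copy is hardened, which is what bounds the supported distance. A toy Python model of that write path (invented names; real PPRC runs inside the disk subsystems):

```python
class Volume:
    """Toy disk volume."""
    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data
        return True                      # device acknowledges the write

class PprcPair:
    """Synchronous mirror: the primary write completes only after the
    secondary has hardened its copy (remote latency bounds distance)."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, block, data):
        ok_primary = self.primary.write(block, data)
        ok_secondary = self.secondary.write(block, data)   # remote I/O
        if not (ok_primary and ok_secondary):
            raise IOError("mirror out of sync: suspend the pair")
        return True                      # only now acknowledge the application

pair = PprcPair(Volume(), Volume())
pair.write("CYL0001", b"payroll")
assert pair.primary.blocks == pair.secondary.blocks   # zero data loss (RPO 0)
```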
GDPS Hyperswap
A Parallel Sysplex environment is designed to reduce outages by replicating hardware, operating systems, and application components. In spite of this redundancy, having only one copy of the data is an exposure.
If there is a problem writing to or accessing the primary disks, there is a need to swap I/O from the primary disks to the secondary disks.
HyperSwap, a feature of GDPS, enhances resilience by immediately switching I/O operations from the primary to the secondary disks, thereby providing near-continuous access to data.
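The essence of HyperSwap is a transparent re-pointing of I/O from the primary to the secondary devices when the primary fails. A minimal Python sketch of that control flow, with invented names and a deliberately failing volume:

```python
class Volume:
    """Toy disk volume that can be forced to fail."""
    def __init__(self, fail=False):
        self.blocks, self.fail = {}, fail

    def write(self, block, data):
        if self.fail:
            raise IOError("disk subsystem failure")
        self.blocks[block] = data

class HyperSwapRouter:
    """Routes application I/O to whichever mirror copy is primary."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, block, data):
        try:
            self.primary.write(block, data)
        except IOError:
            # Swap: re-point I/O at the synchronized secondary copy.
            # The failure is masked; the application never sees it.
            self.primary, self.secondary = self.secondary, self.primary
            self.primary.write(block, data)

router = HyperSwapRouter(Volume(fail=True), Volume())
router.write("CYL0001", b"booking")       # transparently served by Site 2
print(router.primary.blocks)              # {'CYL0001': b'booking'}
```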
[Diagram: primary disks mirrored to secondary disks via PPRC / Metro Mirror]
GDPS Hyperswap
[Diagram: GDPS controlling systems at the primary and secondary sites]
HyperSwap provides continuous availability of data by masking disk outages; it automates switching between the two copies of the data in real time, without causing an application outage.
[Diagram: on a failure, HyperSwap redirects I/O to the secondary copy; controlling systems K1 and K2 share the CDS]
GDPS Controlling System
• In order for GDPS to operate, there must be a separate, isolated z/OS system known as the Controlling system.
• GDPS environments without a Controlling system are not supported.
• IBM strongly recommends that 2 Controlling systems are set up per sysplex.
• The idea is for one to act as a Primary and the other as a Backup.
[Diagram: GDPS controlling systems K1 (primary site) and K2 (secondary site), with PPRC / Metro Mirror between the sites]
What does the GDPS controlling system do?
1. It performs situation analysis (after an unplanned event) to determine the status of the production systems and/or disks.
2. It drives automated recovery actions.
The controlling system must be in the same sysplex so that it can see all the messages from the systems in the sysplex and communicate with them.
[Diagram: controlling systems K1 and K2 as z/OS LPARs at the primary and secondary sites, each with XCF, GRS, and a CF, sharing Couple Data Sets over PPRC / Metro Mirror]
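The controlling system's two responsibilities map naturally onto a monitor-and-react loop: classify the failure, then drive a scripted recovery action. A simplified Python sketch with invented states and actions (real GDPS works from system messages and automation scripts):

```python
def analyze(status):
    """Situation analysis: classify the unplanned event."""
    if not status["primary_disks_ok"] and status["production_ok"]:
        return "DISK_FAILURE"
    if not status["production_ok"]:
        return "SITE_FAILURE"
    return "HEALTHY"

def recover(situation):
    """Drive automated recovery actions from a single point of control."""
    actions = {
        "DISK_FAILURE": ["hyperswap to secondary disks"],
        "SITE_FAILURE": ["hyperswap to secondary disks",
                         "restart production systems at Site 2"],
        "HEALTHY": [],
    }
    for action in actions[situation]:
        print("K-system drives:", action)

# Controlling system observing sysplex status (values invented):
recover(analyze({"primary_disks_ok": False, "production_ok": True}))
# -> K-system drives: hyperswap to secondary disks
```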
[Diagram: the same configuration with ALCS running alongside K1 and K2 at each site]
Why does a GDPS configuration need a controlling system?
The availability of the controlling system is fundamental to GDPS.
The GDPS controlling system is designed to survive a failure at the opposite site from the primary disks. Primary disks are normally in Site 1, and the controlling system in Site 2 is designed to survive if Site 1 or the disks in Site 1 fail.
Final view - Combining the jigsaw puzzle
[Diagram: the final GDPS-managed configuration: primary and secondary sites 40 km apart, each with ALCS, XCF, GRS, a CF, and a controlling system (K1/K2); CF links (timer links), PPRC links, and ISL channels run over ADVA/DWDM; TIBCO applications at Site 1 and Site 2 connect via the ESW network]
Conclusion
Mainframe physical clustering (system complex / sysplex) between dispersed data centres (GDPS) provides enterprise-level disaster recovery, data sharing, and parallel computing capability to share workload for high performance and high availability.
G D P S A N D S Y S T E M C O M P L E X
N A J M I M A N S O O R A H M E D