GDDR Solution Design and Implementation Techniques


                                                      Version 1.0




• Business Requirement Definitions
• Project Planning and Change Management Guidelines
• Technology Selection and Solution Design
• Architecture and Sizing Considerations




Dean Smith
Peter Callewaert
Chris Fallon
John Egan
Copyright © 2011 EMC Corporation. All rights reserved.
    EMC believes the information in this publication is accurate as of its publication date. The information is
    subject to change without notice.

    THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO
    REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
    PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
    FITNESS FOR A PARTICULAR PURPOSE.

    Use, copying, and distribution of any EMC software described in this publication requires an applicable
    software license.

    For the most up-to-date regulatory document for your product line, go to the Technical Documentation and
    Advisories section on EMC Powerlink.

    For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

    All other trademarks used herein are the property of their respective owners.


    Part number h8231




Contents




Preface

Chapter 1   Requirements Definition
            Introduction .......................................................................................   16
            Understanding Business Continuity requirements .....................                                   18
            High Availability...............................................................................       21
            Recovery Time Objective .................................................................              22
            Recovery Point Objective.................................................................              23
            EMC GDDR role................................................................................          24


Chapter 2   Planning for your GDDR Implementation
            Understanding project constraints .................................................                    28
            The importance of test systems.......................................................                  29
            Minimizing operational risks during implementations..............                                      30
            The importance of design ................................................................              32
            Construction of your timeline and milestones .............................                             33
            Review change management procedures......................................                              34


Chapter 3   GDDR Managed Technologies
            EMC Symmetrix VMAX with EMC Enginuity............................. 36
              Symmetrix VMAX hardware platform....................................36
              EMC Enginuity operating environment..................................38
              Symmetrix features for mainframe ..........................................38
              EMC Mainframe Enablers .........................................................39
            EMC TimeFinder............................................................................... 44
              TimeFinder/Clone......................................................................44
              TimeFinder/Consistency Group ..............................................45






                                  TimeFinder/Mirror for z/OS ................................................... 45
                                EMC Symmetrix Remote Data Facility (SRDF) ............................ 48
                                  SRDF/Synchronous ................................................................... 48
                                  SRDF/Asynchronous................................................................. 48
                                  SRDF/A Multi-Session Consistency (MSC) ........................... 51
                                  Concurrent SRDF........................................................................ 54
                                  Cascaded SRDF........................................................................... 55
                                  SRDF/Extended Distance Protection ...................................... 56
                                EMC SRDF consistency groups ...................................................... 58
                                EMC AutoSwap ................................................................................ 62
                                EMC SRDF/Star................................................................................ 64


Chapter 4   EMC Geographically Dispersed Disaster Restart (EMC GDDR)
                                Supported configurations................................................................ 72
                                   SRDF/S with ConGroup ........................................................... 73
                                   SRDF/S with AutoSwap ........................................................... 76
                                   SRDF/A ....................................................................................... 78
                                   SRDF/Star ................................................................................... 81
                                   EMC GDDR SRDF/Star complex ............................................ 84
                                   SRDF/Star with AutoSwap ...................................................... 85
                                EMC GDDR requirements............................................................... 89
                                   Mainframe environment requirements ................................... 89
                                   Minimum hardware requirements .......................................... 90
                                   Minimum software requirements ............................................ 90
                                   Additional configuration requirements .................................. 91
                                EMC GDDR fundamental concepts ............................................... 93
                                   EMC GDDR control systems .................................................... 93
                                   Workload location ..................................................................... 94
                                   EMC GDDR processes ............................................................... 96
                                EMC GDDR components................................................................. 98
                                   Parameters .................................................................................. 98
                                   User interface ............................................................................. 99
                                   Events and rules....................................................................... 112
                                   Procedures ................................................................................ 113
                                EMC GDDR procedures ................................................................ 114
                                EMC GDDR supported scripts ..................................................... 116
                                   GDDR for SRDF/S with ConGroup (Two sites - DC1
                                   and DC2) .................................................................................... 119
                                   GDDR for SRDF/S with AutoSwap (Two sites - DC1
                                   and DC2) .................................................................................... 121
                                   GDDR for SRDF/A (Two sites - DC1 and DC3) .................. 123





                  GDDR for SRDF/Star (Three sites) ........................................125
                  GDDR for SRDF/Star with AutoSwap (Three sites) ...........130


Chapter 5   GDDR Managed Systems Design Considerations
            Overview .......................................................................................... 140
            Ignore, Reset Clear, or IPL in place............................................... 142
            Contingency systems and workload movement ........................ 143
            HMC-Only systems (new in GDDR 3.1)...................................... 144
            LPAR and CEC recovery ................................................................ 146
                Planned CPC Swap .................................................................. 147
            Automated Configuration Check - DASD................................... 149
                GDDR variables influencing the expected configuration ...149


Chapter 6   Storage Infrastructure Design Considerations
            Storage configuration and layout ................................................. 152
            Infrastructure recommendations .................................................. 153
                Gatekeeper device list ..............................................................153
                Started tasks names ................................................................. 155
                Important dataset names .........................................................157
            EMC software levels ....................................................................... 158
                LPAR configuration specifics ..................................................158
            Sysplex and coupling facility considerations.............................. 160


Chapter 7   Best Practices for Network Design
            Overview .......................................................................................... 164
            Understanding bandwidth requirements.................................... 165
            Synchronous network design considerations ............................. 168
            Asynchronous network design considerations .......................... 172


Glossary








Figures




     Title                                                                                                       Page
1    Business Continuity example ....................................................................... 19
2    RTO example................................................................................................... 22
3    RPO example................................................................................................... 23
4    EMC Foundation technologies ..................................................................... 25
5    VMAX architecture ........................................................................................ 37
6    z/OS SymmAPI architecture........................................................................ 40
7    TimeFinder/Mirror under Enginuity 5773 and earlier ............................ 46
8    TimeFinder/Mirror under Enginuity 5874 and later................................ 47
9    SRDF/A delta sets.......................................................................................... 49
10   Delta set switching ......................................................................................... 51
11   SRDF/A cycles and delta sets ...................................................................... 53
12   Cascaded SRDF configuration...................................................................... 56
13   SRDF/EDP basic configuration ................................................................... 57
14   Primary and secondary relationships ......................................................... 59
15   Failed links ...................................................................................................... 60
16   Primary systems 1, 2, and 3 in a consistency group.................................. 61
17   Concurrent SRDF/Star configuration ......................................................... 66
18   Cascaded SRDF/Star configuration ............................................................ 67
19   Cascaded SRDF/Star with EDP (diskless) configuration ........................ 68
20   SRDF/S with ConGroup environment ....................................................... 73
21   EMC GDDR complex with SRDF/S ............................................................ 75
22   SRDF/S with AutoSwap environment ....................................................... 76
23   SRDF/A environment ................................................................................... 78
24   GDDR complex with SRDF/A ..................................................................... 79
25   Concurrent SRDF/Star environment .......................................................... 81
26   Cascaded SRDF/Star environment ............................................................. 82
27   GDDR Complex with Star............................................................................. 84
28   Concurrent SRDF/Star with AutoSwap environment ............................. 86
29   Cascaded SRDF/Star with AutoSwap environment ................................ 87
30   Primary Options Menu................................................................................ 100





               31      Change GDDR ISPF Profile Variable Values ...........................................                           102
               32      Setup and Maintenance...............................................................................           103
               33      View GDDR Configuration ........................................................................               105
               34      Manage Site Roles ........................................................................................     106
               35      Perform Health Check .................................................................................         107
               36      Select Script to Run ......................................................................................    109
               37      View GDDR Script Statistics.......................................................................             110
               38      Actions Menu................................................................................................   111
               39      GDDR Actions Menu...................................................................................           144
               40      GDDR Perform HMC LPAR Actions ........................................................                         145
               41      Write distribution.........................................................................................    166
               42      Bandwidth requirement ..............................................................................           167
               43      DASD write activity by volser ...................................................................              169
               44      DASD Volumes Flagged for Tuning .........................................................                      170
               45      Recommended asynchronous network topology ...................................                                  174




Tables




    Title                                                                                                    Page
1   Mainframe environment requirements........................................................ 89
2   Minimum hardware requirements ............................................................... 90
3   Software requirements ................................................................................... 91
4   Script names and descriptions..................................................................... 116
5   Script mapping .............................................................................................. 118
6   GDDR managed system recover map........................................................ 141
7   Software levels ............................................................................................... 158








Preface




           This EMC Engineering TechBook draws on GDDR field expertise,
           highlighting best practices relating to both technology and project
           management disciplines required to achieve success during GDDR
implementations. This field-inspired guide starts with the first risk area in a
business continuity project: a basic understanding of business continuity or
high availability requirements, and then proceeds through analysis, technology
selection, and architecture design guidelines. Following the guidelines in
           this GDDR TechBook will help ensure that your business continuity or high
           availability project is successful.
           As part of an effort to improve and enhance the performance and capabilities
           of its product lines, EMC periodically releases revisions of its hardware and
           software. Therefore, some functions described in this document may not be
           supported by all versions of the software or hardware currently in use. For
           the most up-to-date information on product features, refer to your product
           release notes. If a product does not function properly or does not function as
           described in this document, please contact your EMC representative.

           Note: This document was accurate as of the time of publication. However, as
           information is added, new versions of this document may be released to the
           EMC Powerlink website. Check the Powerlink website to ensure that you are
           using the latest version of this document.


Audience   This manual is intended for customers considering improvements to
           business continuity or high availability technology deployments.







                  Related         The following is a list of related documents that may assist readers
             documentation        with more detailed information on topics described in this TechBook.
                                  These documents can be found on the EMC Powerlink website at
http://Powerlink.EMC.com.
                                  ◆   ResourcePak Base for z/OS
                                  ◆   SRDF Host Component for z/OS
                                  ◆   TimeFinder/Clone Mainframe SNAP Facility
                                  ◆   TimeFinder/Mirror for z/OS
                                  ◆   Consistency Groups for z/OS
                                  ◆   TimeFinder Utility for z/OS
                                  ◆   AutoSwap
                                  ◆   GDDR for SRDF/Star with AutoSwap
                                  ◆   GDDR for SRDF/Star
                                  ◆   GDDR for SRDF/S with AutoSwap
                                  ◆   GDDR for SRDF/S with ConGroup
                                  ◆   GDDR for SRDF/A

               Authors of this    Dean Smith is a Consulting Software Engineer and has recently
                   TechBook       rejoined EMC with the GDDR team after working abroad for several
                                  years. With nearly 33 years of IT experience in the Mainframe
                                  environment in both client and systems development, Dean has
                                  focused exclusively on the justification, architecture, and
implementation of business continuity solutions worldwide for the
                                  last 12 years.
                                  Peter Callewaert is a Senior Solutions Architect at EMC, with an
                                  active role in the evolution of GDDR as a product, in the
                                  troubleshooting and fixing of product issues, and in providing
                                  support to the GDDR Solutions Support team. He has been with EMC
                                  for 12 years in various roles including Pre-Sales, Customer Service,
                                  and on-site Implementation. He has 20+ years of IT-experience
                                  mainly in the Mainframe environment, with an ever-increasing focus
                                  on Disaster Recovery and Business Continuity.
                                  Chris Fallon is a Consulting Corporate System Engineer in the EMC
                                  Symmetrix Engineering and Virtualization product group with over
22 years of experience in enterprise data center systems and storage.
Chris has been working with the Symmetrix since 1995 and






SRDF since 1997, and has developed, architected, and deployed
                      geographically dispersed business continuity solutions on the
                      Symmetrix storage array.
                      John Egan is an EMC Distinguished Engineer in the EMC Symmetrix
                      Engineering and Virtualization product group. He leads the System z
                      Automation team, responsible for the GDDR and EzSM products. His
                      background extends over 30 years with specialization in large scale
                      systems integration, software engineering, project management, and
                      design.

Conventions used in   EMC uses the following conventions for special notices.
    this document
                      Note: A note presents information that is important, but not hazard-related.




                      A caution contains information essential to avoid data loss or
                      damage to the system or equipment.


                      IMPORTANT
An important notice contains information essential to software or hardware operation.
                      Typographical conventions

                      EMC uses the following type style conventions in this document:
                      Normal                 Used in running (nonprocedural) text for:
                                             • Names of interface elements (such as names of windows,
                                               dialog boxes, buttons, fields, and menus)
                                             • Names of resources, attributes, pools, Boolean expressions,
                                               buttons, DQL statements, keywords, clauses, environment
                                               variables, functions, utilities
                                             • URLs, pathnames, filenames, directory names, computer
                                               names, filenames, links, groups, service keys, file systems,
                                               notifications
                      Bold                   Used in running (nonprocedural) text for:
                                             • Names of commands, daemons, options, programs,
                                               processes, services, applications, utilities, kernels,
                                               notifications, system calls, man pages
                                             Used in procedures for:
                                             • Names of interface elements (such as names of windows,
                                               dialog boxes, buttons, fields, and menus)
                                             • What user specifically selects, clicks, presses, or types







                                 Italic                 Used in all text (including procedures) for:
                                                        • Full titles of publications referenced in text
                                                        • Emphasis (for example a new term)
                                                        • Variables
                                 Courier                Used for:
                                                        • System output, such as an error message or script
                                                        • Specific user input (such as commands)
                                                        • URLs, complete paths, filenames, prompts, and syntax when
                                                          shown outside of running text
                                 Courier italic         Used in procedures for:
                                                        • Variables on command line
                                                        • User input variables
                                 <>                     Angle brackets enclose parameter or variable values supplied by
                                                        the user
                                 []                     Square brackets enclose optional values
                                 |                      Vertical bar indicates alternate selections - the bar means “or”
                                 {}                     Braces indicate content that you must specify (that is, x or y or z)
                                 ...                    Ellipses indicate nonessential information omitted from the
                                                        example




                                  We'd like to hear from you!
                                  Your feedback on our TechBooks is important to us! We want our
                                  books to be as helpful and relevant as possible, so please feel free to
                                  send us your comments, opinions and thoughts on this or any other
                                  TechBook:
                                  TechBooks@emc.com




1

                                                    Requirements Definition




This chapter contains the following information:
◆   Introduction ........................................................................................   16
◆   Understanding Business Continuity requirements.......................                                   18
◆   High Availability................................................................................       21
◆   Recovery Time Objective ..................................................................              22
◆   Recovery Point Objective ..................................................................             23
◆   EMC GDDR role.................................................................................          24








     Introduction
                                 This chapter introduces EMC® Geographically Dispersed Disaster
                                 Restart (EMC GDDR) and discusses its role in High Availability
                                 (HA), Business Continuity (BC) and disaster restart (DR) strategies.
In recent years, business continuity requirements for many
enterprises have changed, particularly in response to the threat of
terrorism as well as to major natural disasters. What has evolved is
the notion of a "regional disaster," in which local, synchronously
replicated data centers both fall "in region" with respect to the scope
of a single disaster event. A regional disaster, such as a major flood
or a prolonged and widespread power disruption, could affect both
data centers concurrently. In response, many sites have re-evaluated
their disaster readiness capabilities.
Commerce has not stood still either. The shift toward online,
internet-enabled enterprises has driven the continued build-out of
high availability and continuous-compute architectures to support
"24*7" business operations.
Consistent with these trends, customers are building data centers
and technology infrastructure to match changing business
requirements. A common practice for customers running
synchronous replication is to add a third-site asynchronous "out of
region" disaster recovery center. This is particularly evident among
the large financial institutions.
Customers with high availability (HA) or continuous availability
requirements are trending towards building adjacent primary and
secondary data centers to gain the benefit of synchronous data
replication with minimal performance impact. Another common data
center strategy to support HA requirements is to modify existing
data center infrastructure for redundant power connections and
physical firewall separation of compute and/or storage
infrastructure. These strategies allow synchronous replication to
operate with minimal latency overhead and also enable construction
of very high-speed, high-bandwidth networks to support
processor-to-processor interconnects for the IBM Parallel Sysplex
Coupling Facility.







The larger financial institutions are trending toward building
combined HA and Disaster Recovery (DR) solutions, in which
synchronous and asynchronous replication architectures are
combined with management infrastructure to detect and manage
potentially catastrophic disaster-in-progress "events."








     Understanding Business Continuity requirements
                                 Business Continuity (BC) requirements tend to cluster around
                                 industry verticals. Telecommunications companies typically deploy
some form of two-site replication and recovery architecture. Large
financial institutions are trending towards three data center
strategies, with three-site architectures that support both HA and
DR. The largest retailers tend towards geographically separated
two-site data center strategies, some deploying asynchronous and
others synchronous data replication.
Figure 1 on page 19 illustrates that as the size, sophistication, and
wealth of an enterprise increase over time, business continuity
requirements shift from essentially no protection through disaster
recovery and business restart, and eventually towards high
availability.








[Figure 1 graphic: a spectrum ranging from small businesses with a single
data center and largely procedural, manual, low-security IT; through dual
data center strategies (manufacturing and telecommunication, transportation
and logistics, resources, food and consumer goods manufacturing, essential
services, retail and online) at roughly a $10b market capitalization; to
three data center strategies at financial services firms, banks, and
brokerages ($20b+ market capitalization) with fully architectural,
automated, high-security IT. SYM-002544]



         Figure 1         Business Continuity example

                          Disaster recovery is the process of restoring a previous copy of the
                          data as the starting point for resumption of business operations,
                          typically from a tape image. It is then necessary to apply database
                          logs or perform other manual processes to bring the data to the last
                          known point of consistency. Depending on the size of the enterprise
                          this process can range from a few hours to days and weeks.
                          Disaster restart is the process of restarting dependent-write consistent
                          copies of data. This type of “restart” data image is created by EMC
                          data replication technologies. The image is presented to the operating





                                 system for IPL. The data itself is in “crash restart” form, the exact
                                 same image that would be available following the loss of power at a
                                 primary (non-replicated) data center. Disaster restart displaces tape
                                 based recovery and greatly reduces the Recovery Time Objective
                                 (RTO).
                                 Database management systems and mainframe applications have
                                 evolved over time to restart from these data images. Any incomplete
or partially complete transactions are "backed out" by the application
                                 layer to find a point where business resumption can occur. Any
                                 backed out or incomplete transactions are re-applied after business
                                 operations resume. The time required for restart depends on the data
                                 change rate and state of the databases at the time of disruption:
                                 ◆   If a database is shut down normally, achieving a point of
                                     consistency on restart requires minimal work.
                                 ◆   However, if the database terminates abnormally or is resumed
                                     from a “restart” data image, the restart process is elongated
                                     depending on the number and size of in-flight transactions at the
                                     time of DBMS termination.
                                 An image of a database created using EMC consistency technology
                                 while the database is in operation is in a dependent-write consistent
                                 state, also known as a DBMS restartable image. The restart of this
                                 image transforms it to a transactionally consistent data state by
                                 completing committed transactions and rolling back uncommitted
                                 transactions during the normal database initialization process.
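To make the distinction between a restartable (dependent-write consistent) image and a transactionally consistent one more concrete, the following sketch is a toy illustration only, not how any particular DBMS or EMC product is implemented: it replays a hypothetical write-ahead log at restart, rolling committed transactions forward and discarding in-flight ones.

# Toy illustration of restart from a dependent-write consistent image.
# Not DBMS or EMC product code; it only shows why a crash-consistent image
# can be transformed into a transactionally consistent state by the normal
# initialization (log replay) process.

def restart(log):
    """Replay a write-ahead log captured in a crash-consistent image."""
    committed, changes = set(), {}
    for txn, action, payload in log:
        if action == "update":
            changes.setdefault(txn, []).append(payload)
        elif action == "commit":
            committed.add(txn)

    database = {}
    for txn in committed:                 # roll forward committed work
        for key, value in changes.get(txn, []):
            database[key] = value
    # uncommitted (in-flight) transactions are simply discarded (rolled back)
    return database

log = [
    ("T1", "update", ("acct-100", 500)),
    ("T1", "commit", None),
    ("T2", "update", ("acct-200", 900)),   # in flight at the "crash"
]
print(restart(log))   # {'acct-100': 500} -- T2's change is rolled back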








High Availability
                    High Availability (HA) solutions are built on synchronous data
                    replication to enable EMC AutoSwap™ host software to interact with
                    storage arrays to achieve a non-disruptive swap of DASD I/O from a
                    source array to a synchronous target array. Swap operations can
                    occur automatically for unplanned events, such as the loss of all
                    FICON connectivity between a z/OS host and its attached EMC
                    Symmetrix® array(s), or for planned events through an operator
                    command interface and also through panel dialogs supplied with
                    GDDR.
                    Synchronous data replication is required for swap operations because
                    of the requirement for the replicated data to be an exact image of the
                    primary disk at the precise moment of the swap event.
Because asynchronous replication is chronologically behind the
primary storage instance (by a measure of seconds or minutes), it is
not possible to deploy AutoSwap in an asynchronous replication
environment.
                    Some customers require both HA and DR solutions and deploy a
                    combination of synchronous and asynchronous replication
                    infrastructures. These solutions provide non-disruptive swaps to the
                    secondary synchronous disk for localized “disasters” such as
severing the FICON cables between a host and its local storage, as
well as DR protection in the event of a catastrophic loss of both the
primary and secondary data centers.








     Recovery Time Objective
                                     To gain an understanding of the type of replication and recovery
                                     infrastructure that must be in place to support an enterprise, it is
                                     typical to first classify applications by criticality. Many customers
                                     have three or more application tiers, whereby applications are
                                     categorized by time-to-recover following planned or unplanned
                                     service interruption.
Ranges can vary from high availability (RTO = zero) through to hours
or days to recover; it all depends on the impact to the business, traded
off against the cost of protection against service interruption.
                                     Figure 2 illustrates the relationship between the cost of the solution
                                     and the cost of the service interruption. Once the RTO is known, the
                                     cost of the solution can be determined.




                          Figure 2   RTO example
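The tiering exercise described above can be sketched in a few lines of code. The following example is purely illustrative and not part of GDDR; the tier names, RTO thresholds, and application names are hypothetical and should be replaced by the results of your own business impact analysis.

# Illustrative sketch only -- maps each application's stated RTO to a
# recovery tier. Thresholds and tier descriptions are assumptions.
from dataclasses import dataclass

@dataclass
class Application:
    name: str
    rto_hours: float  # maximum tolerable time-to-recover

def classify(app: Application) -> str:
    """Map an application's RTO to an indicative recovery tier."""
    if app.rto_hours == 0:
        return "Tier 1: high availability (continuous operations)"
    if app.rto_hours <= 4:
        return "Tier 2: disaster restart from replicated disk"
    return "Tier 3: disaster recovery from tape or deferred restart"

portfolio = [
    Application("core banking", 0),
    Application("settlement batch", 4),
    Application("internal reporting", 48),
]

for app in portfolio:
    print(f"{app.name}: {classify(app)}")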








Recovery Point Objective
                     Applications must also be classified according to the amount of data
loss that could be sustained as a result of recovery from a service
                     interruption. The measure of data loss is termed Recovery Point
                     Objective (RPO). Typically synchronous data replication supports a
                     near zero RPO, whereas asynchronous data replication supports RPO
                     measured in terms of seconds, minutes or hours. For asynchronous
                     data replication the measure of data loss is usually proportional to
                     the amount of Wide Area Network (WAN) bandwidth applied to the
                     solution. Figure 3 illustrates the relationship between the cost of the
                     solution and the cost of the service interruption. Once the RPO is
                     known, the cost of the solution can be determined.




          Figure 3   RPO example
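As a rough, hedged illustration of how WAN bandwidth bounds the achievable RPO for asynchronous replication, the sketch below assumes a steady write rate, a dedicated replication link, and a fixed replication cycle time. The 30-second cycle, the two-cycle bound, and the example figures are assumptions for illustration only; real sizing should follow the bandwidth analysis described in Chapter 7.

# Rough, illustrative estimate only -- not an EMC sizing tool. Assumes a
# steady write rate, a dedicated replication link, and a fixed cycle time;
# the default 30-second cycle and the example numbers are hypothetical.

def rpo_estimate(write_mbps: float, wan_mbps: float, cycle_seconds: float = 30.0) -> str:
    """Describe RPO behavior for asynchronous replication.

    When the link keeps pace with the write rate, data loss is bounded by
    roughly two replication cycles; when it does not, the replication
    backlog (and hence the RPO) grows over time.
    """
    if wan_mbps >= write_mbps:
        return f"RPO bounded at about {2 * cycle_seconds:.0f} seconds"
    growth = (write_mbps - wan_mbps) / wan_mbps
    return (f"link undersized: backlog grows by about {growth:.2f} seconds "
            f"of additional RPO per second of sustained writes")

print(rpo_estimate(write_mbps=80, wan_mbps=100))    # link keeps pace
print(rpo_estimate(write_mbps=150, wan_mbps=100))   # link undersized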








     EMC GDDR role
                                 EMC Geographically Dispersed Disaster Restart (EMC GDDR) is a
                                 software product that runs on the IBM z/OS operating system to
                                 automate business recovery following both planned outages and
                                 disaster situations, including the total loss of a data center. EMC
                                 GDDR does not provide replication and recovery services itself, but
                                 rather monitors and automates the services provided by other EMC
                                 products, as well as third-party products required for continuous
                                 operations or business restart.
                                 EMC GDDR facilitates business continuity by generating scripts that
                                 can be run on demand; for example, restart business applications
                                 following a major data center incident, or resume replication to
                                 provide ongoing data protection following unplanned link outages.
                                 Scripts are customized at the time of invocation by an expert system
                                 that tailors the steps based on the configuration and the event that
                                 GDDR is managing. Through automatic event detection and
                                 end-to-end automation of managed technologies, GDDR removes
                                 human error from the recovery process and allows it to complete in
                                 the shortest time possible.
                                 The GDDR expert system is also invoked to automatically generate
                                 planned procedures, such as moving compute operations from one
data center to another. The gold standard for high availability
compute operations is the ability to move from scheduled DR test
weekend activities to regularly scheduled data center swaps without
disrupting application workloads.
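To illustrate, conceptually only, what tailoring steps based on the configuration and the event can mean, the sketch below builds an ordered procedure from a configuration name and an event. GDDR does not expose such an API; the event names and steps shown are hypothetical placeholders, and only the configuration names correspond to the supported configurations listed later in this book.

# Conceptual illustration only: not GDDR code. Shows the general idea of
# tailoring a procedure from the managed configuration and detected event.
from typing import List

SUPPORTED_CONFIGS = {"SRDF/S with ConGroup", "SRDF/S with AutoSwap",
                     "SRDF/A", "SRDF/Star", "SRDF/Star with AutoSwap"}

def build_procedure(configuration: str, event: str) -> List[str]:
    """Return an ordered list of steps tailored to the configuration and event."""
    if configuration not in SUPPORTED_CONFIGS:
        raise ValueError(f"unsupported configuration: {configuration}")

    steps = ["validate current site roles and replication state"]
    if event == "planned site swap":
        steps += ["quiesce application workload at the primary site",
                  "swap DASD I/O to the secondary site",
                  "restart workload at the new primary site"]
    elif event == "loss of primary data center":
        steps += ["confirm disaster declaration with the operator",
                  "make secondary/tertiary DASD read-write",
                  "IPL contingency systems and restart workload"]
    steps.append("resume or reconfigure replication for ongoing protection")
    return steps

for step in build_procedure("SRDF/Star", "planned site swap"):
    print("-", step)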







           Figure 4 on page 25 shows the relationship between the software
           layers and the Symmetrix hardware layer.




Figure 4   EMC Foundation technologies








2
                                               Planning for your GDDR
                                                      Implementation




This chapter deals with important considerations to ensure that your
major infrastructure project is successful. Some of the issues
highlighted here are best practices specific to managing technology
risk. Other recommendations relate to the common but important
risk factors associated with project administration and change
management:
◆   Understanding project constraints ..................................................         28
◆   The importance of test systems........................................................       29
◆   Minimizing operational risks during implementations ...............                          30
◆   The importance of design .................................................................   32
◆   Construction of your timeline and milestones ..............................                  33
◆   Review change management procedures.......................................                   34








     Understanding project constraints
                                        EMC GDDR implementations operate under the “triple” constraint
                                        model of project management. Essentially, there are three major areas
                                        of constraint on any infrastructure project:
                                        ◆   Time
                                        ◆   Quality
                                        ◆   Cost
                                        In many instances the date of implementation is already fixed.
                                        Ideally, the implementation date will have been calculated from the
                                        sum total of all of the complex technical infrastructure tasks that must
                                        be accomplished. These tasks may include:
                                        ◆   Workload measurement
                                        ◆   Bandwidth assessment
◆   Compute resiliency (to support HA or DR, etc.)
◆   Storage layout and I/O access density (cache, device types, RAID
    type, etc.)
◆   Data center planning (floor space, power, cooling, etc.)
                                        ◆   Data migration planning
                                        ◆   Construction of test facility
                                        ◆   Staff training
It is a major risk factor if the target date for completion of the
project has been set without due regard to the number and
complexity of the tasks required to achieve the goal.
In theory, the three-constraint model is a method for the project
manager to retain control of project delivery. If the customer owns
all three constraints, it is impossible for the project manager to
obtain or retain control of the project. Control can be maintained by
allowing the project manager to own at least one of the constraints,
typically the quality objective. In environments where time and
budget are fixed, the only remaining variable is quality. Inevitably,
if the time constraint is unreasonable, project quality will be
affected.








The importance of test systems
As the complexity of solutions increases, so too does the requirement
for your operations staff to become competent with new
technology. This is best achieved with a test environment. Even if the
customer cannot afford a perpetual test environment, serious
consideration should be given to constructing an interim test
environment for the duration of the GDDR project. It will allow
operations staff to safely acquire knowledge and also provide a
framework for the formalized "hands-on" training offered by the EMC
GDDR implementation team, both during the project and more
formally as part of the acceptance criteria towards the end of the
project.
                  Some customers can benefit from this more than others. If your
                  operations staff do not have experience with replication and
                  automation technologies in general and EMC in particular, then the
                  construction of a test environment should be considered mandatory.
If, however, the customer has the necessary skills in-house, then the
benefit of a test environment is limited largely to ensuring that
software and fixes provided by EMC and other vendors operate
correctly in the customer environment.








     Minimizing operational risks during implementations
                                        There are several methods for implementing GDDR into production
                                        environments. The two approaches outlined below are used where a
                                        customer cannot tolerate downtime in their existing production
workflow. The techniques described allow for both continuous
operation of production systems and maintenance of the existing
DR posture.
                                        The parallel infrastructure approach is a technique where new
                                        network, compute and storage infrastructure is commissioned
                                        adjacent to the existing production systems. Data is migrated from
                                        the existing production infrastructure into the new GDDR managed
environment to allow infrastructure and application-oriented
testing to be conducted. The testing process is destructive to the data
image, so the data migration process is repeated for each test cycle.
Once the infrastructure has been proven to operate successfully in the
new environment, one final data migration is conducted. The GDDR
managed infrastructure becomes the new production environment
and the older infrastructure is decommissioned. Another significant
advantage of this technique is that not only can the resiliency of the
compute infrastructure be preserved, but the existing HA/DR posture
can also be retained until the new infrastructure is deployed. The
parallel infrastructure approach is the "best
                                        practice” for deploying GDDR management infrastructures into
                                        customer environments with both minimal risk and minimal
                                        disruption to existing application workloads.
                                        The partial production approach is where a small subset of DASD
                                        volumes on the production system is used to prove the entire
                                        infrastructure. This allows for the validation of GDDR and
prerequisite technologies in the production environment, but
without actually using DASD that contains production data. When
the scope of the parameters is later increased to address the full
environment, the first execution at that scope takes place live in the
production environment. The partial production approach is inferior
to the parallel infrastructure approach in that the full scope of
functionality cannot be fully tested without risking impact to the
production environment. The emphasis here is to eliminate (in
particular) typing errors in parameter libraries, security issues, and
issues with gatekeeper devices, firewalls, and so on, all of which can
be debugged without impact using the parallel infrastructure
approach but cannot be fully detected in the partial production
approach until actually encountered live in the production
environment.




The importance of a test environment to allow operations staff to
obtain competency cannot be overstressed when the partial
production approach is used. The partial production approach is a
good compromise implementation method where the customer is
able to provide sufficient test windows to safely test the new
infrastructure "in place." It is also commonly used where financial or
other constraints on a project make the "parallel infrastructure"
implementation approach impractical.








     The importance of design
                                        Another significant area of addressable risk with GDDR projects is in
                                        the correct sizing of the entire solution. This book sets out guidelines
                                        for properly sizing the proposed solution. Particular attention must
                                        be paid to the sizing of network infrastructure. See Chapter 7, “Best
                                        Practices for Network Design,” for more details. A second major area
of design risk is the storage layout. Storage must be configured not only for raw capacity and RAID protection but also for throughput. The solution must not only sustain the I/O load of the production environment but also retain sufficient reserve cache and throughput capacity to operate the replication infrastructure and, if required, restart operations. See the section "Storage Infrastructure Design Considerations" for more details on this important topic. EMC recommends early (pre-sales) contact with GDDR solution design experts so that any solution proposal meets design standards for optimal operational performance. The GDDR team operates globally and is centrally managed from EMC Corporate Headquarters in Hopkinton, Massachusetts.
                                        The team can be contacted via email at the following address:
                                        gddrsolutionsupport@emc.com

                                        Conventional mail:

                                        GDDR Solution Support Team
                                        Mailstop 1/P-36
                                        176 South Street
                                        Hopkinton, MA 01748 USA
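
To illustrate the network-sizing point above, the following Python sketch converts a measured peak write rate into a rough link-size estimate. It is an illustrative calculation only; the compression ratio, headroom factor, and example workload figure are assumptions to be replaced with values measured for the actual environment, not EMC sizing guidance.

# Rough, illustrative replication link-sizing arithmetic (assumed inputs, not EMC guidance).

def required_link_mbps(peak_write_mb_per_sec: float,
                       compression_ratio: float = 2.0,
                       headroom: float = 1.3) -> float:
    """Convert a peak host write rate (MB/s) into an estimated link size (Mb/s).

    compression_ratio and headroom are assumptions to be replaced with
    values measured for the actual workload and network equipment.
    """
    effective_mb_per_sec = peak_write_mb_per_sec / compression_ratio
    megabits_per_sec = effective_mb_per_sec * 8      # bytes -> bits
    return megabits_per_sec * headroom               # allow for resynchronization and growth

if __name__ == "__main__":
    # Hypothetical example: 120 MB/s of peak writes observed in workload measurements.
    print(f"Estimated link size: {required_link_mbps(120):.0f} Mb/s")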








Construction of your timeline and milestones
Setting a “too aggressive” timeline can cause a project to fail. As discussed earlier in the “three constraint” model for project administration, if the timeline is fixed and the budget is fixed, that leaves only one variable: quality. In other words, to meet strict deadlines for task completion, shortcuts may be taken in quality assurance measures. The result can be a sequence of failed changes or service interruptions to the production environment.
One way to avoid placing undue or unreasonable time pressure on a project is to involve the technical architecture and implementation staff in the construction of the implementation timeline.








     Review change management procedures
The “GDDR Project” is typically much larger than just installing the GDDR software and customizing parameter libraries. As discussed earlier, there may be changes to physical infrastructure and utilities such as air conditioning, as well as to IT infrastructure such as network provisioning, new compute and storage arrays, switches, and routers. The scope of these projects is very large: in the space of a few months, a large number of potentially high-risk changes must be executed. Change scheduling is important in order to avoid conflicts between infrastructure provisioning groups and to align with the business calendar.
                                        However, much more important than change scheduling is change
                                        management. This is a much broader topic and addresses issues such
                                        as:
                                        ◆   Change scheduling
                                        ◆   Backout/fallback planning
                                        ◆   Risk mitigation strategies
                                        ◆   Documentation
                                        ◆   Root cause analysis (for failed changes)
When trying to determine the health of a change management system, look beyond the number of change “tickets” to see how many changes are rejected because of scheduling conflicts, inadequate documentation, insufficient risk mitigation, and so on. A robust change management system exhibits much more than change scheduling.
Assess the health of your change management systems and, if necessary, adjust them to deal with the large volume of changes typically encountered during these infrastructure projects.




Chapter 3   GDDR Managed Technologies




EMC provides many hardware and software products that support
applications in the z/OS environment. The following products, some
required and some configuration-specific, are used in EMC GDDR
solutions:
◆   EMC Symmetrix VMAX with EMC Enginuity..............................                               36
◆   EMC TimeFinder ................................................................................   44
◆   EMC Symmetrix Remote Data Facility (SRDF) .............................                           48
◆   EMC SRDF consistency groups........................................................               58
◆   EMC AutoSwap..................................................................................    62
◆   EMC SRDF/Star .................................................................................   64








     EMC Symmetrix VMAX with EMC Enginuity
                               Symmetrix hardware architecture and the EMC Enginuity™
                               operating environment are the foundation for the Symmetrix storage
                               platform. This environment consists of the following components:
                               ◆   Symmetrix hardware
                               ◆   Enginuity-based operating functions
                               ◆   Mainframe Enablers
                               ◆   Symmetrix application program interface (API) for mainframe
                               ◆   Symmetrix-based applications
                               ◆   Host-based Symmetrix applications
                               ◆   Independent software vendor (ISV) applications


     Symmetrix VMAX hardware platform
The Symmetrix VMAX™ design is based on a highly available VMAX
                               Engine with redundant CPU, memory, and connectivity on two
                               directors for fault tolerance. Symmetrix VMAX Engines connect to
                               and scale out linearly through the Virtual Matrix Architecture®,
                               which allows resources to be shared across Symmetrix VMAX
                               engines. To meet growth requirements, additional engines can be
                               added non-disruptively for efficient and dynamic scaling of capacity
                               and performance.







           Figure 5 on page 37 shows the Symmetrix VMAX hardware
           architecture.




Figure 5   VMAX architecture








     EMC Enginuity operating environment
                               The Enginuity operating system provides controls for all components
in a Symmetrix array. Enginuity coordinates real-time events related to the processing of production data, providing the following services:
                               ◆   Manages system resources to intelligently optimize performance
                                   across a wide range of I/O workload requirements.
                               ◆   Ensures system availability through advanced fault monitoring,
                                   detection and correction capabilities and provides concurrent
                                   maintenance and serviceability features.
                               ◆   Provides the foundation for specific software features available
                                   through EMC disaster recovery, business continuance, and
                                   storage management software.
                               ◆   Supports functional services for both Symmetrix-based
                                   functionality and for a large suite of EMC storage application
                                   software.
                               ◆   Defines priority of each task, including basic system maintenance,
                                   I/O processing, and application processing.
                               ◆   Provides uniform access through APIs for internal calls and
                                   provides an external interface to allow integration with other
                                   software providers.


     Symmetrix features for mainframe
                               Symmetrix storage systems appear to mainframe operating systems
                               as any of the following control units: IBM 3990, IBM 2105, and IBM
                               2107. The physical storage devices can appear to the mainframe
                               operating system as any mixture of different sized 3380 and 3390
                               devices. Mainframe host connectivity is supported through serial
                               ESCON and FICON channels.
                               Fibre Channel and GigE are supported options in SRDF®
                               environments. Symmetrix GigE directors in an SRDF environment
                               provide direct TCP/IP connectivity end-to-end for remote replication
solutions over extended distances. This eliminates the need for costly FC-to-IP converters and allows the existing IP infrastructure to be used without major disruption.
                               EMC supported IBM compatibility features include:
                               ◆   Channel Command Emulation for IBM ESS 2105/2107





                    ◆    Multiple Allegiance (MA)
                    ◆    Parallel Access Volume (PAV) Static and Dynamic
                    ◆    HyperPAV
                    ◆    Concurrent Copy
                    ◆    Dynamic Channel Path Management (DCM)
                    ◆    Dynamic Path Reconnection (DPR) support
                    ◆    Host Data Compression
                    ◆    Logical Path and Control Unit Address Support (CUADD)
                    ◆    Mainframe systems hypervolumes
                    ◆    Partitioned Dataset (PDS) Search Assist
                    ◆    FlashCopy
                    ◆    GDPS
                    ◆    GDPS HyperSwap
                    ◆    Peer to Peer Remote Copy (PPRC)
                    ◆    Extended Remote Copy (XRC)
                    ◆    Extended Address Volume
                    ◆    Extended Distance FICON
                    ◆    High Performance FICON (zHPF)
                    ◆    DFSMS support for Flash Drives
                    ◆    8 Gb/s FICON


EMC Mainframe Enablers
                    Mainframe Enablers (MFE) is a software suite that includes the
                    Symmetrix API runtime libraries for all EMC mainframe software.
                    These software packages can be used to monitor device configuration
                    and status and to perform control operations on devices and data
                    objects within a storage complex. The following components are
                    included in Mainframe Enablers (MFE) 7.0 and above:
                    ◆    ResourcePak® Base for z/OS
                    ◆    SRDF Host Component for z/OS
                    ◆    TimeFinder®/Clone Mainframe SNAP Facility
                    ◆    TimeFinder/Mirror for z/OS





                                    ◆   Consistency Groups for z/OS
                                    ◆   TimeFinder Utility for z/OS
                                    ◆   AutoSwap

     ResourcePak Base for z/OS
                                    ResourcePak Base delivers EMC Symmetrix Control Facility
                                    (EMCSCF) for IBM and IBM-compatible mainframes. EMCSCF
                                    provides a uniform interface for EMC and ISV software products.
                                    EMCSCF delivers a "persistent address space" on the host that
                                    facilitates communication between the host and the Symmetrix as
                                    well as other EMC-delivered and partner-delivered applications. In
                                    GDDR managed configurations this host is normally the GDDR
                                    Master C-System. ResourcePak Base performs tasks such as the
                                    following:
                                    ◆   Maintaining an active repository of information about EMC
                                        Symmetrix devices attached to z/OS environments and making
                                        that information available to other EMC products.
                                    ◆   Performing automation functions.
                                    ◆   Handling inter-LPAR (logical partition) communication through
                                        the Symmetrix storage system.


Figure 6   z/OS SymmAPI architecture







                        ResourcePak Base provides faster delivery of new Symmetrix
                        functions by EMC and ISV partners, along with easier upgrades. It
also speeds data gathering when using tools such as TimeFinder/Mirror query, because device status information is cached along with other important information.
                        ResourcePak Base for z/OS is a prerequisite for EMC mainframe
                        applications like the TimeFinder/Clone Mainframe SNAP Facility or
                        SRDF Host Component for z/OS. As of release 7.0, these products are
                        packaged and installed together as Mainframe Enablers, along with
                        TimeFinder/Mirror for z/OS, Consistency Groups for z/OS,
                        TimeFinder Utility for z/OS, and AutoSwap.
                        ResourcePak Base provides the following functionality through
                        EMCSCF.

Cross-system communication
                        Inter-LPAR communication is handled by the EMCSCF cross-system
                        communication (CSC) component. CSC uses a Symmetrix storage
                        system to facilitate communication between LPARs. Several EMC
                        Symmetrix mainframe applications use CSC to handle inter-LPAR
                        communications.

Non-disruptive SymmAPI-MF refreshes
                        As of version 5.3, EMCSCF allows the SymmAPI-MF to be refreshed
                        non-disruptively. Refreshing SymmAPI-MF does not impact
                        currently executing applications that use SymmAPI-MF; for example,
                        SRDF Host Component for z/OS or TimeFinder/Clone Mainframe
                        SNAP Facility.

SRDF/A Monitor
                        The SRDF/A Monitor in ResourcePak Base is designed to:
                        ◆    Find EMC Symmetrix controllers that are running SRDF/A.
                        ◆    Collect and write SMF data about those controllers.
                        After ResourcePak Base is installed, the SRDF/A Monitor is started
                        as a subtask of EMCSCF.

Group Name Service support
                        ResourcePak Base includes support for Symmetrix Group Name
                        Service (GNS). Using GNS, you can define a device group once and
                        then use that single definition across multiple EMC products on
                        multiple platforms. This means that you can use a device group






                                defined through GNS with both mainframe and open systems-based
                                EMC applications. GNS also allows you to define group names for
                                volumes that can then be operated upon by various other commands.

        Pool management         With ResourcePak Base V5.7 or higher, generalized device pool
                                management is a provided service. Pool devices are a predefined set
                                of devices that provide a pool of physical space. Pool devices are not
                                host-accessible. The CONFIGPOOL commands allow management of
                                SNAPPOOLS or DSEPOOLS with CONFIGPOOL batch statements.

     SRDF/A Multi-Session Consistency
                                SRDF/A Multi-Session Consistency (MSC) is a task in EMCSCF that
                                ensures remote R2 consistency across multiple Symmetrix storage
                                systems running SRDF/A. MSC provides the following:
                                ◆   Coordination of SRDF/A cycle switches across systems.
                                ◆   Up to 24 SRDF groups in a multi-session group.
                                ◆   One SRDF/A session and one SRDF group per Symmetrix
                                    storage system when using Enginuity release level 5X70.
                                ◆   With Enginuity release level 5X71 and later, SRDF groups are
                                    dynamic and are not limited to one per Symmetrix storage
                                    system. Group commands of ENABLE, DISPLAY, DISABLE,
                                    REFRESH, and RESTART are now available.

     SWAP services
                                ResourcePak Base deploys a SWAP service in EMCSCF. It is used by
                                EMC AutoSwap for planned outages with the ConGroup Continuous
                                Availability Extensions (CAX).

     Recovery services
                                Recovery service commands allow you to perform recovery on local
                                or remote devices (if the links are available for the remote devices).

     Licensed Feature Code management
                                EMCSCF manages Licensed Feature Codes (LFCs) to enable
                                separately chargeable features in EMC software. These features
                                require an LFC to be provided during the installation and
                                customization of EMCSCF. LFCs are available for:
                                ◆   Symmetrix Priority Control
                                ◆   Dynamic Cache Partitioning







◆   AutoSwap (Congroup with AutoSwap Extensions) - separate
    LFCs are required for planned and unplanned swaps
◆   EMC z/OS Storage Manager
◆   SRDF/Asynchronous (MSC)
◆   SRDF/Automated Replication
◆   SRDF/Star
◆   TimeFinder/Clone (TARGET)
◆   TimeFinder/Consistency Group (CONSISTENT)
◆   TimeFinder/Snap (VDEV)








     EMC TimeFinder
                               TimeFinder is a family of products that enables both volume-based
                               replication and data set level replication within a single Symmetrix
                               system. Data is copied from Symmetrix devices using array-based
                               resources without using host CPU or I/O. The source Symmetrix
                               devices remain online for regular I/O operations while the copies are
                               created. GDDR automation extends to managing TimeFinder within
                               an array or across multiple arrays, for example to provide "gold copy"
                               enterprise data images for DR testing.
                               TimeFinder enables customers to create multiple point-in-time copies
                               of data, allowing simultaneous execution of business tasks that were
                               previously sequential. GDDR exploits the following TimeFinder
                               modes of operation:
                               ◆   TimeFinder/Mirror enables users to configure special devices
                                   called business continuance volumes (BCVs) to create a mirror
                                   image of Symmetrix standard devices. Using BCVs, TimeFinder
                                   creates a point-in-time copy of data that can be repurposed. The
                                   TimeFinder/Mirror component extends the basic API command
                                   set of Mainframe Enablers to include commands that specifically
                                   manage Symmetrix BCVs and standard devices.
                               ◆   TimeFinder/Clone enables users to make copies of data from
                                   source volumes to target volumes without consuming mirror
                                   positions within the Symmetrix. The data is available to a target's
                                   host immediately upon activation, even if the copy process has
                                   not completed. Data may be copied from a single source device to
                                   multiple target devices. A source device can be either a
                                   Symmetrix standard device or a BCV device.


     TimeFinder/Clone
                               TimeFinder/Clone for z/OS produces point-in-time copies of full
                               volumes or individual datasets. TimeFinder/Clone operations
                               involve full volumes or datasets where the amount of data at the
                               source is the same as the amount of data at the target.
                               TimeFinder/Clone provides significant configuration flexibility
                               because clone copies do not require Symmetrix mirror positions.
                               TimeFinder/Clone source and target devices can have any form of
                               RAID protection. The clone copies can also be configured as a
                               standard device or as a Business Continuance Volume (BCV).





                        A Full-Volume Snap captures a complete replica of the source volume
                        on the target volume in the local Symmetrix system.
                        TimeFinder/Clone requests that span control units can invoke an
                        external datamover to accomplish the request.


TimeFinder/Consistency Group
                        TimeFinder/Consistency Group, using the Enginuity Consistency
                        Assist (ECA) feature, provides consistent snap operations on multiple
                        volumes so that the targets are dependent-write consistent.
                        TimeFinder/Consistency Group is available for full device, virtual
                        device, and remote full device snaps. The source and target device
                        pairs must reside in the same Symmetrix system. Consistency can be
                        preserved over multiple volumes.


TimeFinder/Mirror for z/OS
                        EMC TimeFinder/Mirror is a business continuance solution. GDDR
                        uses TimeFinder/Mirror to make full-volume copies of production
                        data from a standard Symmetrix device (which is online for regular
                        I/O operations from the host) to a Business Continuance Volume
(BCV) with which the standard device is paired. The BCV is a specially tagged volume established when the Symmetrix unit is configured. The BCV functions as a mirror controlled with the TimeFinder/Mirror ESTABLISH, SPLIT, RE-ESTABLISH, and RESTORE commands. The BCV device can be separated (split) from the standard device and used for backup, restore, decision support, or application testing. In GDDR-controlled environments, GDDR re-establishes split BCVs with their associated standard devices at the conclusion of test windows, or otherwise as required during disaster recovery operations.
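
As an informal illustration of the ESTABLISH/SPLIT/RE-ESTABLISH/RESTORE lifecycle described above, the following Python sketch models the BCV pairing states conceptually. It is not TimeFinder/Mirror syntax; the class and state names are assumptions made purely for illustration.

# Conceptual model of the BCV lifecycle described above (not TimeFinder/Mirror syntax).

class BCVPair:
    """Tracks the relationship between a standard device and its BCV."""

    def __init__(self):
        self.state = "NEVER_ESTABLISHED"

    def establish(self):
        # Full synchronization of the BCV from the standard device.
        assert self.state in ("NEVER_ESTABLISHED", "SPLIT")
        self.state = "ESTABLISHED"

    def split(self):
        # The BCV becomes an independently addressable point-in-time copy.
        assert self.state == "ESTABLISHED"
        self.state = "SPLIT"

    def re_establish(self):
        # Resynchronize the BCV with the standard device after use.
        assert self.state == "SPLIT"
        self.state = "ESTABLISHED"

    def restore(self):
        # Copy the BCV contents back to the standard device.
        assert self.state == "SPLIT"
        self.state = "RESTORED"

pair = BCVPair()
pair.establish()      # pair the BCV with the standard device
pair.split()          # take the point-in-time copy, e.g. for DR testing
pair.re_establish()   # GDDR re-establishes the BCV at the end of the test window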







                               Under Enginuity 5773 and earlier, TimeFinder/Mirror supports two
                               local-replication technologies as shown in Figure 7 on page 46.




Figure 7   TimeFinder/Mirror under Enginuity 5773 and earlier







           Starting with Enginuity 5874, TimeFinder/Mirror uses clone
           emulation for all operations, as shown in Figure 8 on page 47.




Figure 8   TimeFinder/Mirror under Enginuity 5874 and later

No specific steps are needed to exploit clone emulation. Whenever TimeFinder/Mirror detects a Symmetrix controller running at Enginuity level 5874 or later, it automatically sets the mode to clone emulation and converts all TimeFinder/Mirror commands accordingly. GDDR uses TimeFinder/Mirror command syntax; on Enginuity 5874 and later code releases, clone emulation is used transparently, so GDDR-generated scripts continue to operate as they did in TimeFinder/Mirror environments.






     EMC Symmetrix Remote Data Facility (SRDF)
                               The SRDF family of products provides synchronous and
                               asynchronous remote replication capabilities for Symmetrix storage
                               systems. At the conceptual level, SRDF is mirroring (RAID level 1)
                               one logical disk device (the primary source/R1 within the primary
                               Symmetrix storage system) to a second logical device (the secondary
                               target/R2 within a physically separate secondary Symmetrix storage
                               system) over Fibre Channel or GigE high-speed communication links.
                               SRDF's basic premise is that a remote mirror of data in a different
                               Symmetrix storage system can serve as a valuable resource for:
                               ◆   Protecting data using geographical separation.
                               ◆   Giving applications a second location from which to retrieve data
                                   should the primary location become unavailable for any reason.
                               ◆   Providing a means to establish a set of volumes on which to
                                   conduct parallel operations, such as testing or modeling.
GDDR supports SRDF/Synchronous, SRDF/Asynchronous, and SRDF/Star in various two-site and three-site solutions.
                               The following paragraphs describe the SRDF features exploited by
                               GDDR.


     SRDF/Synchronous
                               SRDF synchronous (SRDF/S) mode facilitates disaster recovery
                               within the customer's campus or metropolitan area network through
                               real-time synchronous remote mirroring from one Symmetrix system
                               to one or more Symmetrix systems.


     SRDF/Asynchronous
                               SRDF asynchronous (SRDF/A) mode provides a long distance
                               disaster restart data image with minimal impact on performance.
In asynchronous mode, Symmetrix systems transfer host writes from the primary Symmetrix system to the secondary Symmetrix system using dependent-write consistent delta sets, transferred in cycles:
                               ◆   Each delta set contains groups of write I/Os for processing,
                                   which are managed for dependent-write consistency by the
                                   Enginuity operating environment.






           ◆   SRDF/A transfers these sets of data using cycles of operation, one
               cycle at a time, between the primary Symmetrix system and the
               secondary Symmetrix system.
Write Folding improves the efficiency of the SRDF network links. If the same data is updated multiple times within one cycle, the systems send only the most current version across the SRDF links. This is called Write Folding and is a major advantage over competing asynchronous replication solutions because it decreases network bandwidth consumption and the number of I/Os the SRDF director processes, thereby reducing the system overhead per host I/O.
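
A minimal Python sketch of the write-folding idea follows. The device and track identifiers are hypothetical, and real Enginuity processing is far more involved; the sketch only shows why repeated writes to the same track consume link bandwidth once per cycle.

# Minimal sketch of write folding: within one SRDF/A capture cycle, repeated
# writes to the same track are collapsed so only the latest image is transmitted.

def fold_writes(cycle_writes):
    """cycle_writes: iterable of (track_id, data) in arrival order."""
    folded = {}
    for track_id, data in cycle_writes:
        folded[track_id] = data          # a later write replaces an earlier one
    return folded

writes = [("dev0100.trk7", "v1"), ("dev0100.trk9", "a"), ("dev0100.trk7", "v2")]
print(fold_writes(writes))               # track 7 is sent once, with "v2" only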
           Figure 9 on page 49 illustrates SRDF/A delta sets and their
           relationships to cycle processing.
Figure 9   SRDF/A delta sets

           In single session mode, the Symmetrix SRDF director ensures
           dependent-write consistency within SRDF/A by obtaining the active
           cycle number from a single location in global memory and assigning
           it to each I/O. The director retains that cycle number even if a cycle
           switch occurs during the life of that I/O. This results in an atomic
           cycle switch process for dependent-write sequences, even though it is
           not physically an atomic event across a range of volumes. As a result,
           two I/Os with a dependent relationship between them can be in the
           same cycle, or the dependent I/O can be in a subsequent cycle.
           Delta set switching is at the core of the SRDF/A active session
           operation. The following is required before a primary Symmetrix
           system cycle switch can occur:







                               ◆   The transmit delta set must have completed data transfer to the
                                   secondary Symmetrix system.
                               ◆   The minimum cycle time (single session mode) or minimum cycle
                                   target time (MSC mode) must be reached.
                               ◆   The previous Apply delta set must have been completed.
                               During the delta-set switching process:
                               1. Write I/Os are collected in the capture delta set on the primary
                                  Symmetrix system. The previous cycle's transmit delta set is
                                  completing the SRDF transfer to the secondary Symmetrix system
                                  receive delta set, which is the N-1 copy. The secondary Symmetrix
                                  system apply delta set (N-2) is written to global memory, so that
                                  data is marked write pending to the secondary devices.
                               2. The primary Symmetrix system waits for the minimum cycle time
                                  to elapse and the transmit delta set to empty, meaning that all
                                  data has been transferred to the secondary Symmetrix system.
                               3. Once these conditions are satisfied, the primary Symmetrix
                                  system sends a commit message to the secondary Symmetrix
                                  system to begin the secondary Symmetrix system cycle switch.
                               4. On the primary system, the new capture delta set is available to
                                  collect new host I/Os. Before the secondary Symmetrix system
                                  cycle switch can occur, the following must be true:
                                   • The secondary Symmetrix system receives the commit
                                     message from the primary Symmetrix system.
                                   • Apply delta set (N-2 copy) completes its write process
                                     marking the data write pending to the secondary devices.
                               5. Once the secondary Symmetrix system receives the commit
                                  message from the primary Symmetrix system, the secondary
                                  Symmetrix system verifies the apply delta set has been written.
                                  This occurs while the primary Symmetrix system is performing
                                  the cycle switch between the capture and transmit delta sets.
                               6. The next step is a delta set cycle switch on the secondary
                                  Symmetrix system between the receive (inactive) and apply
                                  (active) delta sets. This preserves the dependent-write consistent
                                  copy at the secondary Symmetrix system prior to receiving the
                                  next dependent-write consistent delta set.
                               7. The secondary Symmetrix system sends an acknowledgement to
                                  the primary Symmetrix system. The data in the apply delta set is
                                  written to disk.
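
The numbered sequence above can be condensed into a small, purely illustrative Python simulation of the capture, transmit, receive, and apply delta sets. The names and structures are assumptions made for illustration and do not represent Enginuity internals.

# Illustrative simulation of the SRDF/A cycle-switch sequence described above.
# The four delta sets are modeled as simple lists; real Enginuity behavior also
# involves global memory, minimum cycle times, and commit messaging.

class SRDFASession:
    def __init__(self):
        self.capture = []    # N    (primary, active)
        self.transmit = []   # N-1  (primary, inactive)
        self.receive = []    # N-1  (secondary, inactive)
        self.apply = []      # N-2  (secondary, active)
        self.r2_image = {}   # dependent-write consistent image on the R2 devices

    def host_write(self, track, data):
        self.capture.append((track, data))

    def transfer(self):
        # The transmit delta set drains across the SRDF links into the receive delta set.
        self.receive.extend(self.transmit)
        self.transmit.clear()

    def cycle_switch(self):
        # Preconditions from the text: transmit empty and previous apply completed.
        assert not self.transmit, "transmit delta set must be empty"
        for track, data in self.apply:   # N-2 data marked write pending, then destaged
            self.r2_image[track] = data
        self.apply = self.receive        # secondary switch: receive becomes apply
        self.receive = []
        self.transmit = self.capture     # primary switch: capture becomes transmit
        self.capture = []

session = SRDFASession()
session.host_write("trk1", "A")
session.cycle_switch()                   # trk1 now in transmit (N-1)
session.transfer()                       # trk1 now in the secondary receive delta set
session.host_write("trk2", "B")
session.cycle_switch()                   # trk1 becomes apply (N-2), trk2 transmits
session.transfer()
session.cycle_switch()
print(session.r2_image)                  # {'trk1': 'A'}; trk2 is applied on the next switch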





                               Figure 10 on page 51 shows the delta set switching process.

Figure 10   Delta set switching

The figure legend summarizes the sequence:
1. The capture delta set (DS) collects application write I/O.
2. The primary waits for the minimum cycle time and for the transmit DS to empty, then:
   a. tells the secondary to commit the receive DS (beginning secondary step 3 in unison);
   b. halts SRDF transfer;
   c. performs the primary cycle switch (the capture DS becomes the transmit DS);
   d. makes a new capture DS available for host I/O.
3. The secondary receives the commit from the primary, then:
   a. checks whether the data in the apply DS is restored (marked write pending to the R2 devices);
   b. performs the secondary cycle switch (the receive DS becomes the apply DS);
   c. makes a new receive DS available for SRDF transfer;
   d. sends the primary an acknowledgement;
   e. begins the restore of the apply DS.
4. The primary receives acknowledgement of the secondary cycle switch and SRDF transfer begins.



SRDF/A Multi-Session Consistency (MSC)
                               SRDF/A Multi-Session Consistency (MSC) supports SRDF/A
                               operations in configurations where there are multiple primary
                               Symmetrix systems or multiple primary Symmetrix system SRDF
                               groups connected to multiple secondary Symmetrix systems or
                               multiple secondary Symmetrix systems SRDF groups. SRDF/A MSC
                               configurations can also support mixed open systems and mainframe
                               data controlled within the same SRDF/A MSC session.
                               GDDR managed solutions using SRDF asynchronous replication
                               require MSC because the interface to control asynchronous
                               replication is provided via MSC.
                               Achieving data consistency across multiple SRDF/A groups requires
                               the cycle switch process be coordinated among the participating
                               Symmetrix system SRDF groups or systems, and that the switch
                               occur during a very brief time period when no host writes are being





                               serviced by any participating Symmetrix system. SRDF control
                               software running on the host provides a single coordination point to
                               drive the cycle switch process in all participating Symmetrix systems.
                               I/Os are processed exactly the same way in SRDF/A MSC mode as
                               they are in single session mode:
                               1. The active cycle on the primary Symmetrix system contains the
                                  current host writes or N data version in the capture delta set.
                               2. The inactive cycle contains the N-1 data version that is transferred
                                  using SRDF/A from the primary Symmetrix system to the
                                  secondary Symmetrix system. The primary inactive delta set is
                                  the transmit delta set and the secondary Symmetrix system's
                                  inactive delta set is the receive delta set.
                               3. The active cycle on the secondary Symmetrix system contains the
                                  N-2 data version of the apply delta set. This is the guaranteed
                                  dependent-write consistent image in the event of a disaster or
                                  failure.







            Figure 11 on page 53 illustrates the delta sets and their relationships
            to SRDF/A cycles.
Figure 11   SRDF/A cycles and delta sets

            For the host to control the cycle switch process, the Symmetrix
            systems must be aware that they are running in multi-session
            consistency mode. This is done using the SRDF control software
            running on the host. The host software:
            1. Coordinates the cycle switching for all SRDF/A sessions
               comprising the SRDF/A MSC configuration.
            2. Monitors for any failure to propagate data to the secondary
               Symmetrix system devices and drops all SRDF/A sessions
               together to maintain dependent-write consistency.
            3. Performs MSC cleanup if able.
            As part of the process to enter MSC mode, and with each cycle switch
            issued thereafter, Enginuity assigns a cycle tag to each new capture
            cycle. That cycle tag is retained throughout that cycle's life. This cycle
            tag is a value that is common across all participating SRDF/A





                               sessions and eliminates the need to synchronize the cycle numbers
                               across them. The cycle tag is the mechanism by which
                               dependent-write consistency is assured across multiple MSC
                               controlled participating Symmetrix system SRDF groups or systems.
                               SRDF/A MSC mode performs a coordinated cycle switch during a
                               very short window of time referred to as an SRDF/A window, which
                               is actually a flag (open/closed) indicating when there are no host
                               writes being completed. The SRDF/A window flag is an attribute of
                               the SRDF/A group and is checked at the start of each I/O, imposing
                               no additional overhead because the front-end director is already
                               obtaining the cycle number from global memory as part of the
                               existing SRDF/A cycle switch operations.
                               When the host software discovers that all the SRDF groups and
                               Symmetrix systems are ready for a cycle switch, MSC "opens" the
                               SRDF/A window and issues a single command to each SRDF group
to perform a cycle switch. In multi-session mode, the front-end director obtains the cycle number at the start of each write and also checks the SRDF/A window flag. If the flag is on (an open window), the front-end director disconnects the incoming host write I/O and begins polling to determine when the SRDF MSC host control software has closed the window. While the window is open, any write I/Os that start are disconnected, so no dependent-write I/Os are issued by any host to any devices in the SRDF/A MSC session.
The SRDF/A window remains open on each SRDF group and Symmetrix system until the last SRDF group and Symmetrix system in the multi-session group acknowledges to the SRDF MSC host control software that the open-and-switch command has been processed, indicating a successful cycle switch. At that point the SRDF MSC host control software issues a close command for each SRDF/A group under MSC control. As a result, dependent-write consistency across the SRDF/A MSC session is ensured.
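
The coordination described above can be summarized in a simplified Python sketch. It treats each SRDF group as an object with an open/closed window flag and a shared cycle tag; it is an assumption-laden illustration, not the SRDF MSC host control software.

# Simplified illustration of the MSC coordinated cycle switch: open the SRDF/A
# window on every group, switch them all under one cycle tag, then close the
# window.  Real MSC host software also handles errors, timeouts, and cleanup.

class SRDFAGroup:
    def __init__(self, name):
        self.name = name
        self.window_open = False
        self.cycle_tag = 0

    def ready_for_switch(self):
        return True                  # placeholder: transmit drained, minimum cycle time reached

    def open_window_and_switch(self, tag):
        self.window_open = True      # new dependent writes disconnect and poll
        self.cycle_tag = tag         # common tag keeps the groups consistent with each other
        return True                  # acknowledge the open-and-switch command

    def close_window(self):
        self.window_open = False

def msc_cycle_switch(groups, tag):
    if not all(g.ready_for_switch() for g in groups):
        return False
    acks = [g.open_window_and_switch(tag) for g in groups]   # windows stay open on all groups
    if all(acks):                                             # until the last group acknowledges
        for g in groups:
            g.close_window()
    return all(acks)

groups = [SRDFAGroup("grpA"), SRDFAGroup("grpB")]
print(msc_cycle_switch(groups, tag=42))    # True: all groups switched under one cycle tag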


     Concurrent SRDF
                               Enginuity version 5567 and later support the ability for a single
                               primary device to be remotely mirrored to two secondary devices
                               concurrently. This feature is called concurrent SRDF and is supported
                               on Fibre Channel, Gigabit Ethernet (GigE) topologies. In concurrent
                               SRDF, the primary (R11) devices must be assigned to two different






                SRDF groups. This is because each remote mirror can be assigned to
                only one SRDF group, even if the two secondary devices reside in the
                same Symmetrix system.
                The secondary devices in a concurrent SRDF configuration can be
                operating in synchronous, adaptive copy, or asynchronous mode. The
                only unsupported combination is for both secondary devices to be
                operating in asynchronous mode. Concurrent SRDF/S with SRDF/A
                provides the ability to remotely mirror a group of devices in
                synchronous mode to one secondary site and in asynchronous mode
                to a different secondary site which may be an extended distance site.
                Normal I/O operational rules for SRDF also apply to Concurrent
                SRDF configurations. When operating in synchronous mode, ending
                status for an I/O is not presented to the host until the remote
                Symmetrix system acknowledges receipt of the I/O to the primary
                Symmetrix system. If both secondary devices are operating in
                synchronous mode, ending status is not presented to the host until
                both devices acknowledge receipt of the I/O. If one remote mirror is
                in synchronous mode and one remote mirror is in adaptive copy or
                asynchronous mode, ending status is presented to the host when the
                synchronous device acknowledges receipt of the I/O. GDDR
                automation provides scripts that allow customers to convert from
                concurrent to cascaded mode provided that the initial configuration
                is installed as cascaded mode.
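
The ending-status rules in the preceding paragraph can be summarized in a few lines of illustrative Python. The mode labels are simplified stand-ins, not SRDF syntax.

# Illustrative summary of when ending status is returned to the host for a
# concurrent SRDF (R11) device, based on the rules described above.

def ending_status_waits_for(leg_modes):
    """leg_modes: modes of the two remote mirrors, e.g. ("sync", "async")."""
    # Both legs asynchronous is the one unsupported combination noted above.
    assert leg_modes != ("async", "async"), "both legs asynchronous is not supported"
    # The host waits only for the legs operating in synchronous mode.
    return [mode for mode in leg_modes if mode == "sync"]

print(ending_status_waits_for(("sync", "sync")))       # wait for both legs
print(ending_status_waits_for(("sync", "async")))      # wait for the synchronous leg only
print(ending_status_waits_for(("adaptive", "async")))  # no synchronous wait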


Cascaded SRDF
                Cascaded SRDF is a three-site disaster recovery solution where data
                from a primary site is synchronously replicated to a secondary site,
                and then asynchronously replicated from the secondary site to a
                tertiary site, as shown in Figure 12 on page 56. The core benefit
                behind a cascaded configuration is its inherent capability to continue
                replicating, with minimal user intervention, from the secondary site
                to the tertiary site in the event that the primary site fails. This enables
                a faster recovery at the tertiary site, provided that the tertiary site is
                where the customer wishes to restart production operations.







Cascaded SRDF uses dual-role SRDF devices (R21 devices) at the secondary site, which act as both an R2 to the primary site and an R1 to the tertiary site, as shown in the following figure.

Figure 12   Cascaded SRDF configuration

                                The following SRDF modes are allowed in Cascaded SRDF:
                                ◆   R1 -> R21: SRDF/S, SRDF/A, Adaptive copy disk mode,
                                    Adaptive copy write-pending mode
                                ◆   R21 -> R2: SRDF/A, Adaptive copy disk mode
                                The most common implementation is for the first hop to be in
                                SRDF/S mode, and the second hop to be in SRDF/A mode.
                                The following limitations apply to Cascaded SRDF:
◆   Only one hop (R1 -> R21 or R21 -> R2) can be asynchronous at a time
                                ◆   Like concurrent SRDF, the two SRDF mirrors of an R21 cannot be
                                    configured using the same SRDF group. GDDR automation
                                    provides scripts that allow customers to convert from cascaded to
                                    concurrent mode provided that the initial configuration is
                                    installed as cascaded mode.
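
The allowed mode combinations and the single-asynchronous-hop rule listed above can be captured in a short validation sketch. It is illustrative only, and the mode labels are simplified stand-ins for the SRDF modes.

# Illustrative check of the cascaded SRDF mode rules listed above.

ALLOWED_HOP1 = {"srdf_s", "srdf_a", "acp_disk", "acp_wp"}   # R1 -> R21
ALLOWED_HOP2 = {"srdf_a", "acp_disk"}                        # R21 -> R2

def valid_cascaded_config(hop1_mode, hop2_mode):
    if hop1_mode not in ALLOWED_HOP1 or hop2_mode not in ALLOWED_HOP2:
        return False
    # Only one hop may run asynchronously at a time.
    return not (hop1_mode == "srdf_a" and hop2_mode == "srdf_a")

print(valid_cascaded_config("srdf_s", "srdf_a"))   # True: the most common implementation
print(valid_cascaded_config("srdf_a", "srdf_a"))   # False: both hops asynchronous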


     SRDF/Extended Distance Protection
                                SRDF/Extended Distance Protection (EDP) is a three-site
                                configuration that requires Enginuity 5874 and later running on the
                                secondary Symmetrix system and Enginuity 5773 or 5874 and later
                                running on the primary and tertiary systems. Figure 13 on page 57
                                shows an example of an SRDF/EDP basic configuration.







              SRDF/EDP is achieved through a cascaded SRDF setup, where a
              Symmetrix VMAX system at a secondary site uses diskless R21
              devices to capture only the differential data that would be owed to
              the tertiary site in the event of a primary site failure.

                    [Figure: R1 (full copy) at the primary site replicates via SRDF/S to a diskless R21 device (DL Dev, not a full copy, differences only) at the secondary site, which replicates via SRDF/A to R2 (full copy) at the tertiary site]
  Figure 13   SRDF/EDP basic configuration

               SRDF/EDP provides a long-distance replication solution with the
               ability to achieve a zero RPO at the tertiary site. It is also a lower cost
               solution when compared with other three-site solutions such as
               concurrent and cascaded SRDF.
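
               The role of the diskless R21 hop can be pictured with a small
               conceptual model. The sketch below is not EMC code; the DisklessR21
               class, the track numbering, and the drain_to_r2 helper are assumptions
               used only to illustrate holding differences rather than a full copy.

        # Conceptual model only -- not EMC code. The secondary site keeps just the
        # differential data owed to the tertiary site, never a full copy.

        class DisklessR21:
            def __init__(self):
                self.owed_tracks = {}                  # track number -> pending data

            def receive_from_r1(self, track, data):
                self.owed_tracks[track] = data         # held in cache, owed to R2

            def drain_to_r2(self, send):
                # After a primary-site failure, only the differences are shipped,
                # allowing recovery at the tertiary site without a full resend.
                for track in sorted(self.owed_tracks):
                    send(track, self.owed_tracks[track])
                self.owed_tracks.clear()

        hop = DisklessR21()
        hop.receive_from_r1(42, b"updated track image")
        hop.drain_to_r2(lambda track, data: print(f"sending track {track} to R2"))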




     EMC SRDF consistency groups
                               GDDR manages a single consistency group. Devices in the group can
                               be FBA (Open Systems) or CKD (mainframe) format. The design of
                               ConGroup and GDDR allows customers to build HA and DR
                               solutions around the concept of the enterprise. Because GDDR is
                               installed outside of the scope of a Sysplex, GDDR-controlled solutions
                               can also span multiple Sysplex environments. An SRDF consistency
                               group is a collection of related Symmetrix devices that are configured
                               to act in unison to maintain data integrity. The devices in a consistency
                               group can be spread across multiple Symmetrix systems.
                               Consistency Groups for z/OS (ConGroup) is an SRDF product
                               offering designed to ensure the dependent-write consistency of the
                               data remotely mirrored by SRDF/S operations in the event of a
                               rolling disaster. Most applications, and in particular database
                               management systems (DBMSs), have dependent-write logic
                               embedded in them to ensure data integrity if a hardware or software
                               failure occurs.
                               ConGroup is based on the concept of dependent-write operations. A
                               dependent-write is a write not issued by an application until a prior,
                               related write I/O operation is completed. An example of
                               dependent-write activity is a database update:
                               1. The DBMS writes to the disk containing the transaction log.
                               2. The DBMS writes the data to the actual database.
                               3. The DBMS writes again to the log volume to indicate that the
                                  database update was made.
                               In a remotely mirrored environment, data consistency cannot be
                               ensured if one of the writes is remotely mirrored, but its predecessor
                               write was not. This could occur, for example, in a rolling disaster
                               where a communication loss occurs and affects only a subset of the
                               devices involved in the remote copy function.
                               ConGroup prevents a rolling disaster from affecting data integrity at
                               the secondary site. When ConGroup detects any write I/O to a
                               volume that cannot communicate with its R2 (secondary) mirror, it
                               suspends the remote mirroring for all volumes defined to the
                               consistency group before completing the intercepted I/O and
                               returning control to the application. This is referred to as a ConGroup
                               trip. In this way, ConGroup ensures a dependent-write consistent
            image is available at the secondary site by preventing a
            dependent-write I/O from reaching the secondary site if the previous
            I/O only gets as far as the primary mirror.
            I/O to the primary devices in the consistency group can still occur
            even when the devices are Not Ready on the SRDF links. Such
            updates are not immediately sent to the secondary site. However,
            they are propagated after the affected links are again operational, and
            data transfer from the primary devices to the secondary devices
            resumes.
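
            The trip behavior described above can be summarized with a short
            conceptual sketch. This is not ConGroup code; the class names, the
            can_reach_r2 check, and the suspend logic are assumptions chosen only
            to illustrate the all-or-nothing suspension that takes place before the
            intercepted I/O is completed.

        # Conceptual model only -- not EMC code.

        class Volume:
            def __init__(self, name, r2_reachable=True):
                self.name = name
                self.r2_reachable = r2_reachable
                self.remote_mirroring = True
                self.local_writes = []

            def can_reach_r2(self):
                return self.r2_reachable

            def suspend_remote_mirroring(self):
                self.remote_mirroring = False

            def write_local(self, data):
                self.local_writes.append(data)

        class ConsistencyGroup:
            def __init__(self, volumes):
                self.volumes = volumes
                self.tripped = False

            def write(self, volume, data):
                # If any volume in the group cannot reach its R2 mirror, suspend
                # remote mirroring for *all* volumes before completing the I/O.
                if not self.tripped and not volume.can_reach_r2():
                    for vol in self.volumes:
                        vol.suspend_remote_mirroring()
                    self.tripped = True                 # the "ConGroup trip"
                volume.write_local(data)                # I/O completes; control
                                                        # returns to the application

        group = ConsistencyGroup([Volume("LOG"), Volume("DB", r2_reachable=False)])
        group.write(group.volumes[1], "db update")      # trips the whole group
        group.write(group.volumes[0], "log update")     # accepted locally only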
            Assume an SRDF configuration consists of three Symmetrix systems
            with primary devices, and two additional Symmetrix systems with
            secondary devices. The systems with primary devices send data to
            the systems with secondary devices as shown in Figure 14 on
            page 59.


             [Figure: Primary 1, Primary 2, and Primary 3 systems replicating to Secondary 1 and Secondary 2 systems]
Figure 14   Primary and secondary relationships




                                Next, assume that the links between primary system 2 and secondary
                                system 1 fail. Without a consistency group, primary systems 1 and 3
                                continue to write data to the secondary site systems 1 and 2 while
                                primary system 2 does not, as shown in Figure 15 on page 60. The
                                copy of the data spread across secondary systems 1 and 2 becomes
                                inconsistent.


                                  [Figure: Primary 1, Primary 2, and Primary 3 replicating to Secondary 1 and Secondary 2, with the links from Primary 2 to Secondary 1 failed]




                    Figure 15   Failed links

                                However, if primary systems 1, 2, and 3 belong to a consistency
                                group, as shown in Figure 16 on page 61, and the links between
                                primary system 2 and secondary system 1 fail, the consistency group
                                automatically stops primary systems 1 and 3 from sending data to
                                secondary systems 1 and 2. Thus, the dependent-write consistency of
                                the data spanning secondary systems 1 and 2 remains intact.




                      [Figure: Primary 1, Primary 2, and Primary 3 enclosed in a consistency group, replicating to Secondary 1 and Secondary 2]
Figure 16   Primary systems 1, 2, and 3 in a consistency group




     EMC AutoSwap


                               EMC AutoSwap provides the ability to move (swap) workload I/Os
                               transparently from volumes in one set of Symmetrix storage systems
                               to volumes in other Symmetrix storage systems without operational
                               interruption. AutoSwap, with SRDF and EMC Consistency Groups,
                               dramatically increases data availability.
                               AutoSwap differs from IBM HyperSwap in that AutoSwap is not
                               confined to operating completely within a Sysplex. AutoSwap can
                               coordinate swaps for multiple monoplex or Sysplex systems in a
                               single swap event.
                               Swaps may be initiated either manually as planned events or
                               automatically as unplanned events (upon failure detection).
                               ◆   Planned swaps facilitate operations such as non-disruptive
                                   building maintenance, power reconfiguration, DASD relocation,
                                   and channel path connectivity reorganization.
                               ◆   Unplanned swaps protect systems against outages in a number of
                                   scenarios. Examples include: power supply failures, building
                                   infrastructure faults, air conditioning problems, loss of channel
                                   connectivity, entire DASD system failures, operator error, or the
                                   consequences of intended or unintended fire suppression system
                                   discharge.


◆   In GDDR-managed environments, planned swap events are
    initiated and controlled by GDDR. Unplanned swap events are
    initiated and controlled by AutoSwap. GDDR monitors for
    unplanned swap events and reconfigures the surviving
    infrastructure to deal with the new source of write I/O activity
    after a successful unplanned swap event.
In conjunction with EMC Consistency Groups, swaps are performed
concurrently while application workloads continue. This protects
data against unforeseen events and ensures that swaps are unique,
atomic operations that maintain dependent-write consistency.
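
The atomic, group-wide nature of a swap can be illustrated with a brief
sketch. This is not AutoSwap code; the DevicePair class and the readiness
check are assumptions used only to show the all-or-nothing semantics
described above.

        # Conceptual model only -- not EMC code.

        class DevicePair:
            def __init__(self, name):
                self.name = name
                self.active_side = "R1"        # I/O currently directed at the R1 device

            def ready_to_swap(self):
                return True                    # e.g., paths to the R2 side are online

            def redirect_io(self):
                self.active_side = "R2"

        def swap_group(pairs):
            """All-or-nothing swap: either every pair is redirected, or none are."""
            if not all(pair.ready_to_swap() for pair in pairs):
                raise RuntimeError("swap aborted; group left unchanged")
            for pair in pairs:
                pair.redirect_io()             # applications keep running throughout

        pairs = [DevicePair("DB01"), DevicePair("LOG01")]
        swap_group(pairs)
        assert all(pair.active_side == "R2" for pair in pairs)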




     EMC SRDF/Star
                               SRDF/Star is a three-site disaster recovery solution consisting of a
                               primary Symmetrix system (Site A) and two remote Symmetrix
                               systems (Sites B and C). SRDF/Star provides advanced multisite
                               business continuity replication that augments concurrent SRDF/S
                               (synchronous) and SRDF/A (asynchronous) operations for the same
                               primary volumes with the ability to incrementally establish or
                               reestablish SRDF/A replication between the two surviving sites in the
                               event of a primary site outage.
                               In mainframe environments, EMC GDDR is a mandatory
                               requirement for all SRDF/Star deployments to automate site swap
                               activity and/or disaster restart. In the event of a primary site outage,
                               the EMC GDDR SRDF/Star solution allows customers to quickly
                               move operations and re-establish protection between the remaining
                               sites. Once conditions permit, customers can rejoin the primary site to
                               the configuration, resuming SRDF/Star protection.
                               SRDF/Star can operate in concurrent or cascaded environments,
                               providing the appropriate differential synchronization. These
                               environments address different recovery and availability objectives:
                               ◆   Concurrent SRDF/Star positions the secondary site or the remote
                                   site as potential recovery sites, and provides differential
                                   resynchronization between the secondary and remote sites. To
                                   achieve this positioning, some level of reconfiguration
                                   intervention is required to access point-of-disaster data.
                               ◆   Cascaded SRDF/Star positions only the remote site as the
                                   recovery site with minimal intervention to access
                                   point-of-disaster data. This solution differentially synchronizes
                                   data between the primary site and the remote site.
                               ◆   SRDF/EDP can also be used in an SRDF/Star configuration. By
                                   design, SRDF/EDP only allows recovery at the tertiary site.
                               SRDF/Star provides rapid reestablishment of cross-site protection in
                               the event of primary site failure. Rather than a full resynchronization
                               between the remote sites, SRDF/Star provides differential
                               synchronization between the remote sites, and dramatically reduces
                               the time it takes to remotely protect the new production site.
                               SRDF/Star also provides a mechanism to determine which remote
                               site has the most current data in the event of a rolling disaster that
affects the primary site. In all cases, users maintain the ability to
choose which site to operate from and which site's data to use when
recovering from a primary site failure.
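
Two of the ideas above, selecting the remote site with the most current
data and resynchronizing differentially rather than fully, can be
sketched as follows. This is a conceptual illustration only, not EMC
code; the cycle numbers and invalid-track sets are assumptions.

        # Conceptual model only -- not EMC code.

        def most_current_site(last_cycle_received):
            """Pick the remote site that received the latest replication cycle."""
            return max(last_cycle_received, key=last_cycle_received.get)

        def differential_tracks(source_invalid, target_invalid):
            """Tracks to copy so the two remote sites match again.

            Only tracks marked invalid on either side are copied, instead of
            re-copying every track on every volume."""
            return sorted(source_invalid | target_invalid)

        print(most_current_site({"DC2": 1042, "DC3": 1039}))    # -> DC2
        print(differential_tracks({7, 9, 23}, {9, 31}))         # -> [7, 9, 23, 31]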
The host-based Multi-Session Consistency (MSC) task at the primary
(R1) site controls normal SRDF/Star operation. MSC performs
session management at SRDF/S site B and when necessary at
SRDF/A site C. The MSC session management task maintains the
information needed to perform differential synchronization between
site B and site C. Automation for SRDF/Star mainframe
configuration management is provided by EMC GDDR.
In Concurrent SRDF/Star operations, site B serves as the secondary
site and the target of the SRDF/S links from site A. Site C serves as
the tertiary site and the secondary site of the SRDF/A links from site
A. The recovery links are the SRDF/A links between site C and site B.
Figure 17 on page 66 shows a Concurrent SRDF/Star configuration.




                    [Figure: primary site (A) R1 devices with active SRDF/Synchronous links to R2 devices at the secondary site (B) and active SRDF/Asynchronous links to R2 devices at the out-of-region site (C); inactive SRDF/A recovery links between site C and site B]
                    Figure 17    Concurrent SRDF/Star configuration

                                 SRDF/Star in a cascaded SRDF configuration has the ability to
                                 incrementally establish an SRDF/A session between the primary and
                                 the asynchronous site in the event the synchronous site fails. With
                                 cascaded SRDF/Star, the synchronous secondary site is always more
                                 current than the asynchronous secondary site.




               Figure 18 on page 67 shows a basic cascaded SRDF/Star
               configuration. Site B serves as the secondary site and the target of the
               SRDF/S links from site A. Site C serves as the out of region site and
               the target of the SRDF/A links from site B. The recovery SRDF/A
               links are between site C and site A.

            [Figure: primary site (A) R1 devices with active SRDF/Synchronous links to R21 devices at the secondary site (B), which replicate via active SRDF/Asynchronous links to R2 devices at the out-of-region site (C); inactive SRDF/A recovery links between site C and site A]
Figure 18      Cascaded SRDF/Star configuration




                                       Figure 19 on page 68 illustrates a cascaded SRDF/Star EDP diskless
                                       environment with the remote devices at the asynchronous secondary
                                       site. The diskless R21 device streamlines the link connections to the
                                       asynchronous site in cascaded mode. No data copies are available at
                                       the synchronous target site B because that site's Symmetrix systems
                                       use diskless, cache-only R21 devices.
                    [Figure: primary site (A) R11 devices with active SRDF/Synchronous links to diskless R21 (DL Dev) devices at the synchronous secondary site (B), which replicate via active SRDF/Asynchronous links to R2 devices at the asynchronous secondary site (C); inactive SRDF/A recovery links between site C and site A]
                    Figure 19          Cascaded SRDF/Star with EDP (diskless) configuration




4
EMC Geographically Dispersed Disaster Restart (EMC GDDR)




This chapter contains the following:
◆   Supported configurations ................................................................. 72
◆   EMC GDDR requirements ................................................................ 89
◆   EMC GDDR fundamental concepts ................................................ 93
◆   EMC GDDR components.................................................................. 98
◆   EMC GDDR procedures.................................................................. 114
◆   EMC GDDR supported scripts....................................................... 116




                                     EMC Geographically Dispersed Disaster Restart (EMC GDDR) can be
                                     implemented in a variety of configurations involving two or three
                                     sites, SRDF/S, SRDF/A, ConGroup, AutoSwap, SRDF/EDP,
                                     and SRDF/Star. In the mainframe environment, EMC GDDR is a
                                     requirement for an SRDF/Star configuration. EMC GDDR can manage
                                     environments that comprise the following elements:
                                     ◆   Multiple z/OS systems
                                     ◆   Multiple Sysplexes
                                     ◆   Multiple Symmetrix controllers
                                     ◆   Intermix of CKD and FBA/FBAM DASD and BCVs
                                     In each configuration, EMC GDDR provides specific capabilities
                                     tailored to that configuration. However, the major features of EMC
                                     GDDR are common across all topologies.
                                     Because EMC GDDR manages production systems following
                                     disasters, it does not reside on the same servers that it is seeking to
                                     protect. EMC GDDR resides on separate z/OS systems from the host
                                     servers that run your application workloads. EMC GDDR is installed
                                     on a control LPAR at each site. These control LPARs are referred to as
                                     GDDR nodes, Control Systems, or C-Systems. Each EMC GDDR node
                                     is aware of the other EMC GDDR nodes through network
                                     connections between each site. This multi-site awareness allows EMC
                                     GDDR to detect disasters and identify survivors. In two-site
                                     configurations, EMC GDDR can recover business at the surviving
                                     site. In three-site configurations, EMC GDDR can nominate a control
                                     LPAR to assume the leadership role for EMC GDDR and recover
                                     business at one of the surviving sites.
                                     To achieve the task of business restart, EMC GDDR automation
                                     extends well beyond the disk level and into the host operating system
                                     level, where it has sufficient control of, and access to, third-party
                                     software and hardware products to provide automated recovery
                                     capabilities.
                                     EMC GDDR can distinguish normal operational disruptions from
                                     disasters and respond accordingly. For example, EMC GDDR is able
                                     to distinguish between network outages (SRDF link drop) and real
                                     disasters. This awareness is achieved by periodic exchange of
                                     dual-direction heartbeats between the EMC GDDR C-Systems. EMC
                                     GDDR constantly checks for disaster situations and ensures that
                                     other GDDR systems are "healthy." This checking allows EMC GDDR
                                     to recognize, and act on, potential disaster situations, even if only one
                                     EMC GDDR C-system survives. "Split brain" problems associated
with cluster technologies are avoided through operator prompts.
Upon initial recognition of such an event, EMC GDDR issues messages to the
operator console seeking confirmation of the event and, further,
confirmation of the restart actions required.
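
The heartbeat checking and operator confirmation described above can be
modeled conceptually as follows. This sketch is not GDDR code; the timeout
value, class names, and console prompt are assumptions made for this example.

        # Conceptual model only -- not EMC GDDR code.

        import time

        HEARTBEAT_TIMEOUT = 30                  # seconds of silence before suspicion

        class CSystem:
            def __init__(self, name, peers):
                self.name = name
                self.last_seen = {peer: time.time() for peer in peers}

            def record_heartbeat(self, peer):
                self.last_seen[peer] = time.time()

            def check_peers(self):
                """Distinguish a quiet peer from a disaster by asking the operator."""
                now = time.time()
                for peer, seen in self.last_seen.items():
                    if now - seen > HEARTBEAT_TIMEOUT:
                        # No unilateral action: operator confirmation avoids the
                        # "split brain" behavior of acting on a false alarm.
                        answer = input(f"{peer} heartbeat lost. Confirm disaster? (yes/no) ")
                        if answer.strip().lower() == "yes":
                            print(f"{self.name}: proceeding with restart actions")

        c1 = CSystem("C1", peers=["C2", "C3"])
        c1.record_heartbeat("C2")
        # c1.check_peers()    # would prompt the operator if a peer went quiet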
In any EMC GDDR installation, one Control System (C-System) is
designated as the Master C-System. In a two-site configuration,
failure of the Master C-System will result in the remaining C-System
becoming the Master. In a three-site configuration, if a local or
regional disaster occurs, EMC GDDR can determine which of the
surviving sites will execute the recovery. Changes to EMC GDDR
configuration information can only be made on the EMC GDDR
Master Control System. EMC GDDR propagates these changes to the
subordinate EMC GDDR C-system(s) using inter-system
communications. Restart procedures following disasters are
coordinated from the EMC GDDR Master C-System. EMC GDDR
scripts, whether for planned or unplanned scenarios, can only be
executed from the Master C-System. If necessary, the Master role can
be manually transferred from one C-System to another through the
EMC GDDR user interface.




     Supported configurations
                                     EMC GDDR is available to monitor and manage the following
                                     configurations and topologies:
                                     Two-site solutions:
                                     ◆   SRDF/S with ConGroup — The two-site SRDF/S with ConGroup
                                         configuration provides disaster restart capabilities at site DC2.
                                     ◆   SRDF/S with AutoSwap — The two-site SRDF/S with AutoSwap
                                         configuration provides for near-continuous availability through
                                         device failover between DC1 and DC2.
                                     ◆   SRDF/A — The two-site SRDF/A configuration provides
                                         disaster restart capabilities at site DC3.
                                     Three-site solutions:
                                     ◆   SRDF/Star — The three-site SRDF/Star configuration provides
                                         disaster restart capabilities at either the Secondary DASD site or
                                         DC3. Concurrent and Cascaded SRDF support further minimize
                                         the DC3 recovery time objective. R22 support is available,
                                         removing the need to create device pairs during GDDR scripts.
                                     ◆   SRDF/Star with AutoSwap — The three-site SRDF/Star with
                                         AutoSwap configuration provides for near-continuous
                                         availability through device failover between DC1 and DC2 as
                                         well as disaster restart capabilities at DC3. Concurrent and
                                         Cascaded SRDF support further minimize the DC3 recovery time
                                         objective. R22 support is available, removing the need to create
                                         device pairs during GDDR scripts.
                                     ◆   SRDF/Star with EDP.
                                     EMC GDDR functionality, controlled by a parameter library, can be
                                     customized to operate in any of these two or three-site
                                     configurations. During EMC GDDR implementations, the EMC
                                     GDDR parameter library is customized to reflect:
                                     ◆   The prerequisite software stack.
                                     ◆   The desired data center topology (two-site versus three-site,
                                         synchronous or asynchronous).




                        An EMC GDDR complex consists of EMC GDDR control systems
                        (C-Systems), the z/OS and open systems hosts, and EMC Symmetrix
                        storage systems which support an organization's mission-critical
                        workload. A single GDDR complex is able to support multiple
                        Sysplexes.


SRDF/S with ConGroup
                        The two-site SRDF/S with ConGroup configuration provides disaster
                        restart capabilities at site DC2. Figure 20 on page 73 illustrates EMC
                        GDDR operation in the SRDF/S with Consistency Group
                        environment.



                         [Figure: EMC GDDR C-Systems at DC1 and DC2 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 replicating over active SRDF links (SRDF/S) to R2 devices at DC2; active and standby Escon/Ficon channels shown]
            Figure 20   SRDF/S with ConGroup environment

                        As Figure 20 on page 73 shows, the relationship between the DC1 and
                        DC2 sites is maintained through SRDF/S replication of primary disk
                        images at DC1 to DC2. Both open systems (FBA) and mainframe
                        (CKD) disk images can be replicated. Figure 20 on page 73 also shows
                        the two EMC GDDR C-Systems with their heartbeat communication
                        paths, separate from the production disk and computer facilities. The
                        C-system at the Primary DASD site has EMC Consistency Group
                        software installed. It is recommended that one or more of the Primary
                        DASD site production z/OS systems also have EMC Consistency
              Group software installed. SRDF/S and ConGroup ensure that at the
                                     point that EMC GDDR receives notification of an unplanned or
                                     failure event, a point of consistency is already achieved.
                                     In this environment, EMC GDDR can do the following:
                                     ◆   Manage planned site swaps
                                     ◆   Restart processing at the secondary site following unplanned
                                         primary site events
                                     ◆   Perform standard operational tasks:
                                         • IPL, system reset, activate, deactivate
                                         • Trigger stop/start of business workloads
                                     ◆   Actively monitor for unplanned/failure events
                                         • Sites
                                         • Systems
                                         • Loss of SRDF/S
                                         • ConGroup trip
                                         • Inter-site communication failure

     EMC GDDR SRDF/S complex
                                     Each GDDR complex manages one consistency group. A consistency
                                     group is a named group of source (R1) volumes managed by the EMC
                                     Consistency Group (ConGroup) application as a unit. The volumes
                                     can be any mix of FBA and CKD devices on multiple Symmetrix units
                                     supporting a system, a SYSPLEX, or multiple SYSPLEXs, as well as
                                     "Open systems" hosts (Windows and various Unix flavors). Figure 21
                                     on page 75 depicts a logical view of a typical EMC GDDR complex.




                   [Figure: DC1 (primary site) and DC2 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 at DC1 and C2 (Master C) at DC2; R1 devices at DC1 replicating via SRDF/S to R2 devices at DC2, with local devices and BCVs at each site]
              Figure 21     EMC GDDR complex with SRDF/S

                            The following are brief descriptions of the components that comprise
                            this GDDR complex:
                            BCVs — BCVs (Business Continuance Volumes) can be supported at
                            each of the sites. They may be established at the DC2 site and split at
                            the DC1 site. BCVs taken at the recovery site are often referred to as
                            "Gold Copy" devices.
                            C1 and C2 — C1 and C2 are the EMC GDDR Control LPARs (or
                            C-Systems) at each of the sites.
                            Primary site — The primary site is the site where the production
                            workload is located.
                            Primary DASD site — The primary DASD (direct access storage
                            device) site is the site where the source (R1) DASD is located. The
                            primary DASD site is the same as the primary site.
                            Secondary site — The secondary site is the site where the
                            contingency or standby systems are located.
                            Secondary DASD site — The secondary DASD site is the site where
                            the target (R2) DASD is located. The secondary DASD site is the same
                            as the secondary site.
                            Sites DC1 and DC2 — Sites DC1 and DC2 are the primary and
                            secondary data centers of critical production applications and data.
                            DC1 is the primary site, with SRDF/S data replication to the
                            secondary site, DC2. These sites are considered fully equivalent for
                            strategic production applications, connected with highly redundant
                            direct network links. At all times, all production data is replicated
                            synchronously between the two sites.


     SRDF/S with AutoSwap
                                     The two-site SRDF/S with AutoSwap configuration provides for
                                     near-continuous availability through device failover between DC1
                                     and DC2. Figure 22 on page 76 illustrates EMC GDDR operation in
                                     the SRDF/S with AutoSwap environment.



                                     [Figure: EMC GDDR C-Systems at DC1 and DC2 linked by GDDR heartbeat communication; AutoSwap between DC1 and DC2; R1 devices at DC1 replicating over active SRDF links (SRDF/S) to R2 devices at DC2; active and standby Escon/Ficon channels shown]
                       Figure 22     SRDF/S with AutoSwap environment

                                     As Figure 22 on page 76 shows, the relationship between the DC1 and
                                     DC2 sites is maintained through SRDF/S replication of primary disk
                                     images at DC1 to DC2. Both open systems (FBA) and mainframe
                                     (CKD) disk images can be replicated. Figure 22 on page 76 also shows
                                     the two EMC GDDR C-Systems with their heartbeat communication
                                     paths, separate from the production disk and computer facilities.
                                     Each of the DC1 and DC2 production z/OS LPARs as well as both
                                     C-systems has EMC AutoSwap and EMC Consistency Group
                                     (ConGroup) software installed. AutoSwap and ConGroup ensure
                                     that a point of consistency exists whenever EMC GDDR receives
                                     notification of an unplanned or failure event.



In this environment, EMC GDDR can do the following:
◆   Manage planned site swaps
◆   Manage recovery after unplanned site swaps
◆   Perform standard operational tasks:
    • IPL, system reset, activate, deactivate
    • Trigger stop/start of business workloads
◆   Actively monitor for unplanned/failure events
    • Sites
    • Systems
    • Loss of SRDF/S
    • ConGroup trip
    • Inter-site communication failure
    • AutoSwap events
◆   Configure/reconfigure
    • Couple datasets
◆   Manage coupling facilities
    • Policies




     SRDF/A
                                     The two-site SRDF/A configuration provides disaster restart
                                     capabilities at site DC3. If DC3 is the Primary DASD site,
                                     GDDR offers disaster restart capabilities at DC1. Figure 23 on page 78
                                     illustrates EMC GDDR operation in the SRDF/A environment.



                                      [Figure: EMC GDDR C-Systems at DC1 and DC3 linked by GDDR heartbeat communication; R1 devices at DC1 replicating over active SRDF links (SRDF/A) to R2 devices at DC3; active and standby Escon/Ficon channels shown]
                       Figure 23     SRDF/A environment

                                     As Figure 23 on page 78 shows, the relationship between the DC1 and
                                     DC3 sites is maintained through SRDF/A replication of primary disk
                                     images from DC1 to DC3. Both open systems (FBA) and mainframe
                                     (CKD) disk images can be replicated. It also shows the two EMC
                                     GDDR C-Systems with their heartbeat communication paths,
                                     separate from the production disk and computer facilities.
                                     EMC GDDR does not have a requirement to "freeze" I/O to obtain a
                                     point of consistency. Multi-Session Consistency and SRDF/A provide
                                     the mechanism to obtain a point of consistency. At the point that
                                     EMC GDDR receives notification of an unplanned or failure event, a
                                     point of consistency is already achieved through these foundation
                                     technologies.




                                 In this environment, EMC GDDR can do the following:
                                 ◆     Manage planned site swaps
                                 ◆     Restart processing at the secondary site following unplanned
                                       primary site events
                                 ◆     Perform standard operational tasks
                                       • IPL, system reset, activate, deactivate
                                       • Trigger stop/start of business workloads
                                 ◆     Actively monitor for unplanned/failure events
                                       • Sites
                                       • Systems
                                       • Loss of SRDF/A
                                       • Inter-site communication failure

EMC GDDR SRDF/A complex
                                 Each GDDR complex can manage one Multi-Session Consistency
                                 (MSC) group. An MSC group is a named group, consisting of
                                 multiple RDF groups operating in SRDF/A mode, managed by the
                                 EMC MSC control software feature as a single unit. These groups can
                                 have any mix of CKD and FBA devices, on multiple Symmetrix units
                                 supporting a system, a SYSPLEX, or multiple SYSPLEXs as well as
                                 "Open systems" hosts (Windows and various flavours of Unix).
                                 Figure 24 on page 79 depicts a logical view of a typical EMC GDDR
                                 complex.

                        [Figure: DC1 (primary site) and DC3 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 at DC1 and C3 (Master C) at DC3; R1 devices at DC1 replicating via SRDF/A to R2 devices at DC3, with local devices and BCVs at each site]
                   Figure 24     GDDR complex with SRDF/A



                                     The following are brief descriptions of the components that comprise
                                     this GDDR complex:
                                     BCVs — Business Continuance Volumes can be supported at each of
                                     the sites. They may be established at the DC3 site and split at the DC1
                                     site.
                                     C1 and C3 — C1 and C3 are the EMC GDDR Control LPARs (or
                                     C-Systems) at each of the sites.
                                     Primary site — The primary site is the site where the production
                                     workload is located.
                                     Primary DASD site — The primary DASD (direct access storage
                                     device) site is the site where the source (R1) DASD is located. The
                                     primary DASD site is the same as the primary site.
                                     Secondary site — The secondary site is the site where the
                                     contingency or standby systems are located.
                                     Secondary DASD site — The secondary DASD site is the site where
                                     the target (R2) DASD is located. The secondary DASD site is the same
                                     as the secondary site.
                                     Sites DC1 and DC3 — Sites DC1 and DC3 are the primary and
                                     secondary data centers of critical production applications and data.
                                     DC1 is the primary site, with SRDF/A data replication to the
                                     secondary site, DC3. These sites are considered fully equivalent for
                                     strategic production applications, connected with highly redundant
                                     direct network links. At all times, all production data is replicated
                                     asynchronously between the two sites.




SRDF/Star
Configuration
                              The three-site SRDF/Star configuration provides disaster restart
                              capabilities at DC2 or DC3. Figure 25 on page 81 illustrates EMC
                              GDDR operation in a concurrent SRDF/Star environment. Figure 26
                              on page 82 illustrates EMC GDDR operation in a cascaded
                              SRDF/Star environment.



                             [Figure: EMC GDDR C-Systems at DC1, DC2, and DC3 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 with active SRDF/S links to R2 devices at DC2 and active SRDF/A links to R2 devices at DC3; SRDF links in standby mode and active/standby Escon/Ficon channels shown]
                Figure 25     Concurrent SRDF/Star environment




                                    [Figure: EMC GDDR C-Systems at DC1, DC2, and DC3 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 with active SRDF/S links to R21 devices at DC2, which replicate via active SRDF/A links to R2 devices at DC3; SRDF links in standby mode and active/standby Escon/Ficon channels shown]
                       Figure 26     Cascaded SRDF/Star environment

                                     As Figure 25 on page 81 and Figure 26 on page 82 show, the
                                     relationship between the DC1 and DC2 sites is maintained through
                                     SRDF/Synchronous replication of primary disk images at DC1 to
                                     DC2. Both open systems (FBA) and mainframe (CKD) disk images
                                     can be replicated. In a concurrent configuration, the asynchronous
                                     relationship is between DC1 and DC3, while in a cascaded
                                     environment, the asynchronous relationship is between DC2 and
                                     DC3.
                                     Figure 25 on page 81 and Figure 26 on page 82 also show the three
                                     EMC GDDR C-Systems with their independent heartbeat
                                     communication paths, separate from the production disk and
                                     computer facilities. The C-system at the Primary DASD site has EMC
                                     Consistency Group software installed. It is recommended that one or
more of the Primary DASD site production z/OS systems also have
EMC Consistency Group software installed. During relevant GDDR
scripts, GDDR will ensure the Consistency Group software is started
where needed.
In this environment, EMC GDDR can perform the following tasks:
◆   Manage planned site swaps
◆   Manage recovery after unplanned site swaps
◆   Manage reconfiguration of the SRDF/Star environment between
    concurrent and cascaded topologies
◆   Manage reconfiguration of the SRDF/Star environment from
    cascaded to concurrent with a primary processing site move
◆   Perform standard operational tasks:
    • IPL, system reset, activate, deactivate
    • Trigger stop/start of business workloads
◆   Actively monitor for unplanned/failure events, including:
    • Sites
    • Systems
    • ConGroup trip
    • Loss of SRDF/S
    • Loss of SRDF/A
    • Inter-site communication failure




     EMC GDDR SRDF/Star complex
                                         Figure 27 on page 84 depicts a logical view of a typical EMC GDDR
                                         complex in a concurrent configuration.

                              [Figure: DC1 (primary site) and DC2 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 and C2 (Master C); R1 devices at DC1 replicating via SRDF/S to R2 devices at DC2 and via SRDF/A to R2 devices at DC3 (tertiary site, with C-System C3); local devices and BCVs at each site]
                           Figure 27     GDDR Complex with Star

                                         The following are brief descriptions of the components that comprise
                                         this GDDR complex:
                                         BCVs — BCVs (Business Continuance Volumes) can be supported at
                                         each of the sites.
                                         Primary site — The primary site is the site where the production
                                         workload is located.




                      Primary DASD site — The primary DASD (direct access storage
                      device) site is the site where the source (R1) DASD is located. The
                      primary DASD site is normally the same as the primary site.
                      Secondary site — The secondary site is the site where the
                      contingency or standby systems are located.
                      Secondary DASD site — The secondary DASD site is the site where
                      the target (R2) DASD is located. The secondary DASD site is normally
                      the same as the secondary site.
                      Sites DC1, DC2, and DC3 — Sites DC1, DC2, and DC3 are used
                      together in an SRDF/Star data replication three-site topology, where
                      DC1 is the primary site with concurrent or cascaded replication to the
                      secondary site DC2 and the tertiary site DC3. Site DC3 is the tertiary
                      data center for critical production applications and data. It is
                      connected with redundant network to both DC1 and DC2. Data is
                      replicated asynchronously from the current primary DASD site or
                      secondary DASD site (with cascaded SRDF) with an intended
                      recovery point objective (RPO) in a short time period.


SRDF/Star with AutoSwap
                      The three-site SRDF/Star with AutoSwap configuration provides for
                      near-continuous availability through device failover between sites
                      DC1 and DC2 as well as disaster restart capabilities at site DC3.
                      Figure 28 on page 86 illustrates EMC GDDR operation in a
                      Concurrent SRDF/Star with AutoSwap environment. Figure 29 on
                      page 87 illustrates EMC GDDR operation in a Cascaded SRDF/Star
                      with AutoSwap environment.








                      [Figure content: EMC GDDR C-Systems at DC1, DC2, and DC3; AutoSwap between DC1 and DC2;
                      R1 at DC1 replicating via SRDF/S to R2 at DC2 and via SRDF/A to R2 at DC3. Legend: GDDR heartbeat
                      communication, active and standby ESCON/FICON channels, active SRDF links, SRDF links in standby mode.]



                       Figure 28     Concurrent SRDF/Star with AutoSwap environment








                      [Figure content: EMC GDDR C-Systems at DC1, DC2, and DC3; AutoSwap between DC1 and DC2;
                      R1 at DC1 replicating via SRDF/S to R21 at DC2, which replicates via SRDF/A to R2 at DC3. Legend:
                      GDDR heartbeat communication, active and standby ESCON/FICON channels, active SRDF links,
                      SRDF links in standby mode.]



Figure 29     Cascaded SRDF/Star with AutoSwap environment

              As Figure 28 on page 86 and Figure 29 on page 87 show, the
              relationship between the DC1 and DC2 sites is maintained through
              SRDF/Synchronous replication of primary disk images at DC1 to
              DC2. Both open systems (FBA) and mainframe (CKD) disk images
              can be replicated. In a concurrent configuration, the asynchronous
              relationship is between DC1 and DC3, while in a cascaded
              environment, the asynchronous relationship is between DC2 and
              DC3.
              These illustrations show the three EMC GDDR C-Systems with their
              independent heartbeat communication paths, separate from the
              production disk and computer facilities. Each of the DC1 and DC2
              production z/OS LPARs has EMC AutoSwap and EMC Consistency
              Group (ConGroup) installed.





                                     In this environment, EMC GDDR can perform the following tasks:
                                     ◆   Manage planned site swaps
                                     ◆   Manage recovery after unplanned site swaps
                                     ◆   Manage reconfiguration of the SRDF/Star environment between
                                         the concurrent and cascaded topologies
                                     ◆   Manage reconfiguration of the SRDF/Star environment from
                                         cascaded to concurrent with a primary processing site move
                                     ◆   Perform standard operational tasks:
                                         • IPL, system reset, activate, deactivate
                                         • Trigger stop/start of business workloads
                                     ◆   Actively monitor for unplanned/failure events, including:
                                         • Sites
                                         • Systems
                                         • ConGroup trip
                                         • Loss of SRDF/S
                                         • Loss of SRDF/A
                                         • Inter-site communication failure
                                         • AutoSwap events
                                     ◆   Configure/reconfigure:
                                         • Couple datasets
                                     ◆   Manage coupling facilities:
                                         • Policies








EMC GDDR requirements

Mainframe environment requirements
                      The basic infrastructure must support the specific EMC technologies
                      involved in the configuration (for example, SRDF/S with AutoSwap
                      or SRDF/A). In addition, EMC GDDR has the following specific
                      infrastructure requirements:
                      ◆   One LPAR is required for each C-System, preferably using DASD
                          isolated from the managed environment.
                      ◆   There must be network connectivity between all C-Systems.
                      ◆   An HMC (Hardware Management Console) must be available at
                          each site that can be accessed from each C-System (access to these
                          HMCs can be protected by means of a private VLAN).
                      EMC GDDR has the mainframe environment requirements listed in
                      Table 1 on page 89.

            Table 1   Mainframe environment requirements

                      Item                               Requirements

                      Processor hardware configuration   Any system that supports current IBM mainframe
                                                         operating systems

                      DASD hardware configuration        Any supported Symmetrix DASD model at an
                                                         Enginuity microcode level specified in the EMC
                                                         GDDR Release Notes

                      Software                           Any currently supported IBM operating system

                      EMC GDDR supports and can manage the following combinations of
                      DASD in a single Enterprise Consistency Group:
                      ◆   Single EMC Symmetrix controllers configured with any of the
                          following:
                          • All CKD devices
                          • All FBA and FBA-META devices
                          • Any combination of CKD, FBA and FBA-META devices
                      ◆   Multiple EMC Symmetrix controllers configured with any of the
                          following:
                          • All CKD devices




                                         • All FBA and FBA-META devices
                                         • Any combination of CKD, FBA and FBA-META devices
                                     Management and monitoring of both CKD and FBA/FBA-META
                                     devices is performed from the z/OS platform where the EMC GDDR
                       application resides. From the EMC GDDR point of view, CKD and
                       FBA/FBA-META Symmetrix devices are treated identically: all are
                       command targets of SRDF Host Component configuration commands
                       using local or remote syntax.
                       EMC GDDR requires that if any device in an RDF group is defined to
                       GDDR, then all devices in that group must be defined to GDDR. Most
                       GDDR actions are directed at the RDF group level, although in some
                       cases GDDR acts on device ranges where appropriate.
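                       As a conceptual illustration of this all-or-nothing rule, the following
                       Python sketch (not GDDR code; the group and device numbers are
                       invented for the example) flags any RDF group that is only partially
                       defined to GDDR:

    # Conceptual sketch only: GDDR enforces this rule internally.
    # Group and device identifiers below are invented for illustration.

    def check_rdf_group_coverage(rdf_groups, gddr_devices):
        """Return the RDF groups that are only partially defined to GDDR.

        rdf_groups   -- dict mapping RDF group name -> set of device numbers
        gddr_devices -- set of device numbers defined to GDDR
        """
        partial = {}
        for group, devices in rdf_groups.items():
            defined = devices & gddr_devices
            if defined and defined != devices:
                partial[group] = sorted(devices - gddr_devices)
        return partial

    rdf_groups = {
        "RDFGRP01": {"1A00", "1A01", "1A02"},
        "RDFGRP02": {"2B00", "2B01"},
    }
    gddr_devices = {"1A00", "1A01", "1A02", "2B00"}   # 2B01 is missing

    for group, missing in check_rdf_group_coverage(rdf_groups, gddr_devices).items():
        print(f"RDF group {group} is partially defined; missing devices: {missing}")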
                       EMC GDDR itself places no limit on the number of EMC Symmetrix
                       controllers or devices that can be managed; any limits that do apply
                       come from restrictions in EMC hardware and software.


     Minimum hardware requirements
                                     Table 2 on page 90 describes the recommended minimum processor
                                     and I/O configuration for an EMC GDDR C-System.
                       Table 2       Minimum hardware requirements


                                      Item                                      Requirements

                                      Logical processors                        1 (2 are recommended)

                                      MSU                                       15 on IBM 2084-306 (or equivalent)

                                      Storage                                   512 MB

                                      Logical paths to own local DASD devices   4

                                      Logical paths to managed DASD devices     4



     Minimum software requirements
                                     The minimum software requirements for EMC GDDR for the various
                                     supported configurations are listed in the following sections. The
                                     EMC GDDR Release Notes provide the latest information regarding






                            supported software release levels. Installation procedures for the
                            EMC software products listed can be found in the EMC Mainframe
                            Enablers Installation and Customization Guide.

                  Table 3   Software requirements

Configuration                    HMC API or BCPii support    MFE           Consistency Group     AutoSwap

SRDF/S with ConGroup                                              
SRDF/S with AutoSwap                                                                
SRDF/A                                               
SRDF/Star                                                         
SRDF/Star with AutoSwap                                                             


Additional configuration requirements
                            The following sections list additional configuration requirements for
                            the supported EMC GDDR configurations.

SRDF/S with ConGroup
                            Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/S environment.

SRDF/S with AutoSwap
                            Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/S environment.
                             SRDF/S with AutoSwap has the following additional requirements:
                            ◆   CAX protection must be added to the SRDF/S-defined
                                ConGroups.
                            ◆   LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified.
                            The EMC Consistency Group for z/OS Product Guide and EMC
                            AutoSwap Product Guide provide information on these items.

                  SRDF/A    Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/A environment. EMC GDDR is
                            compatible with SRDF Automated Recovery functionality.







                                     SRDF/A MSC has the following additional gatekeeper requirement:
                                     ◆   There must be one or more gatekeeper devices for each
                                          MSC-controlled RDF group. These gatekeeper devices must be defined
                                          in the OS configuration as OFFLINE at IPL and as regular local devices
                                          (not BCV, SRDF, SAV, and so forth).

                     SRDF/Star       Refer to the SRDF Host Component for z/OS Product Guide for
                                     information on configuring an SRDF/Star environment.

     SRDF/Star with AutoSwap
                                     Refer to the SRDF Host Component for z/OS Product Guide for
                                     information on configuring an SRDF/Star environment.
                                     SRDF/Star with AutoSwap has the following additional
                                     requirements:
                                     ◆   CAX protection must be added to the SRDF/Star-defined
                                         ConGroup.
                                     ◆   LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified.
                                     The EMC Consistency Group for z/OS Product Guide and EMC
                                     AutoSwap Product Guide provide information on these items.
                                     In addition, there must be one or more gatekeeper devices for each
                                     MSC-controlled RDF group. These gatekeeper devices must be:
                                     ◆   SRDF/S-protected (you can choose any available SRDF/S RDF
                                         group)
                                     ◆   ConGroup and CAX-protected
                                     ◆   Not SRDF/A-protected
                                     ◆   If CKD, in OS configuration as ONLINE at IPL
                                     ◆   If FBA, in OS configuration as OFFLINE at IPL








EMC GDDR fundamental concepts

EMC GDDR control systems
                     The EMC GDDR control systems are more commonly referred to as
                     EMC GDDR C-Systems. One EMC GDDR C-System is located at each
                     site in a separate z/OS System. Each EMC GDDR C-System runs in
                     monoplex mode from local DASD. EMC GDDR C-Systems do not run
                      any production workload. EMC recommends that GDDR be installed
                      on dedicated C-System DASD on controllers separate from the
                      production DASD.
                      Deploying GDDR C-Systems as dedicated z/OS systems ensures their
                      availability during both planned and unplanned events affecting
                      application or site operations. Isolating the GDDR C-Systems from the
                      managed sysplexes ensures GDDR operational independence and
                      provides capabilities that other automated business continuity
                      solutions cannot offer.
                     The main functions of an EMC GDDR C-System are to:
                     ◆   Control the recovery after an outage
                     ◆   Control a planned site swap
                     One of the C-Systems is designated as the Master C-System. During
                     normal operations, the Master C-System is the central control point
                     for all EMC GDDR activities. The Master C-System is located at the
                     secondary DASD site in configurations with AutoSwap and on the
                     primary DASD site in other configurations. All EMC GDDR
                     C-Systems are potential candidates to take over as the Master
                      C-System. If the Master C-System becomes unavailable for some
                      reason, a C-System at another site assumes the EMC GDDR master
                      function ownership. This way, either the Master C-System survives a
                      site loss, or the Master role is automatically transferred
                     to a surviving site. When the original Master C-System becomes
                     available, the master function ownership automatically transfers back
                     to the correct location.
                     Some EMC GDDR functions can only be carried out by the Master
                     C-System, for example:
                     ◆   Running planned processes
                     ◆   Updating EMC GDDR parameters
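                      The placement and takeover behavior described above can be
                      summarized in a small Python sketch. This is purely illustrative; GDDR
                      implements the logic internally, and the site names and function names
                      here are assumptions for the example:

    # Illustrative sketch of Master C-System placement, assuming the rules
    # described in the text: with AutoSwap the Master runs at the secondary
    # DASD site, otherwise at the primary DASD site; if that C-System is
    # down, a surviving C-System takes over.

    def preferred_master(primary_dasd_site, secondary_dasd_site, autoswap):
        return secondary_dasd_site if autoswap else primary_dasd_site

    def select_master(sites_up, primary_dasd_site, secondary_dasd_site, autoswap):
        preferred = preferred_master(primary_dasd_site, secondary_dasd_site, autoswap)
        if preferred in sites_up:
            return preferred
        # Fall back to any surviving C-System site.
        return next(iter(sorted(sites_up)), None)

    # Example: AutoSwap configuration, DC2 (the preferred Master site) is lost.
    print(select_master({"DC1", "DC3"}, "DC1", "DC2", autoswap=True))   # -> DC1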







     Workload location               In an EMC GDDR complex without AutoSwap, the business or
                                     production workload runs at a single site; that is, one side of the
                                     Sysplex. This is the same location as the primary DASD site.
                                     In an EMC GDDR complex with AutoSwap, the business or
                                     production workload can run as either a single site workload, usually
                                     at the same location as the primary DASD site, or as a multi-site
                                     workload, where the production workload runs at both the primary
                                     and secondary sites.
                                      In an EMC GDDR complex with an SRDF/A two-site configuration,
                                     the production systems and primary DASD must always be at the
                                     same site.
                                     A contingency or standby system is a system that provides backup to
                                     a production system. The characteristics of a contingency system in
                                     an EMC GDDR installation depend on the configuration.
                                     Contingency or standby systems are typically located at the same
                                     location as the secondary DASD. Multiple locations containing
                                     contingency or standby systems may be used to increase availability
                                     and provide disaster restart options. Regional contingency systems
                                     are typically located in the same location as the secondary DASD,
                                     while out-of-region standby systems provide protection from
                                     geographic and infrastructure exposures that may negatively impact
                                     the primary and secondary sites.
                                     In an SRDF/S two-site configuration, a contingency system normally
                                     provides a hot backup to the production system and is typically
                                     located at the same location as the secondary DASD. The contingency
                                     system:
                                     ◆   Is in the same Sysplex as its production system partner
                                     ◆   Is IPLed, but runs no business workload
                                     In an SRDF/A two-site configuration, a contingency or standby
                                     system is a system that replaces production system capacity in the
                                     event of a loss of use of the primary site. It is located at an appropriate
                                     distance from the primary systems to minimize risks from geographic
                                     and infrastructure exposures which may negatively impact primary
                                     systems availability. The contingency system:
                                     ◆   May be used for expendable workloads which are displaced by
                                         business workload following the loss of a primary site.







                      ◆   May be cold (not powered up), or warm (powered up but not
                          IPLed) systems reserved for business workload restart and
                          testing of restart processes, but not in support of any meaningful
                          day-to-day workload.
                      A contingency or standby system in a SRDF/Star configuration
                      normally provides a hot backup to a production system. A
                      contingency system:
                      ◆   Is in the same Sysplex as its production system partner
                      ◆   Is IPLed, but runs no business workload
                      ◆   May be an out-of-region location equipped with idle processor
                          capacity reserved for support of business workload restarts
                      Any production or contingency/standby system defined to EMC
                      GDDR is known as an EMC GDDR managed system.

Managed workloads     EMC GDDR can trigger the stop and restart of production workloads
                      on:
                      ◆   z/OS systems
                      ◆   Distributed systems

 External workloads   External workloads run on mainframe systems which do not have
                      their DASD in the managed Symmetrix units. EMC GDDR can
                      coordinate stop and start of the workload on these "non-managed"
                      mainframe systems with the workload stop and start actions for
                      managed systems.

  Excluded systems    Excluded systems apply to GDDR configurations with AutoSwap
                      only. Symmetrix arrays can be shared by multiple systems and some
                      of those systems may not require EMC GDDR protection. EMC
                      GDDR can be configured to exclude certain z/OS systems from
                       workload management, even though these systems have their DASD in
                       the managed Symmetrix arrays. Because their DASD is managed, these
                       systems must be running AutoSwap; otherwise they would fail during
                       GDDR-managed site swaps.

HMC-Bypass option     This option applies at the LPAR level or the site level and prevents
                      GDDR from performing any action that would require HMC access for
                      the relevant site or LPAR. Its goal is to allow GDDR to be deployed at
                      sites operating under a third-party facilities management contract,
                      where the third party has exclusive access to the HMC and HMC access
                      by individual customers using the provided facilities is not allowed.





         HMC-only systems            EMC GDDR can be configured to restrict IPL and CBU (Capacity
                                     Backup Upgrade) actions for certain systems to the online interface.
                                     No other actions or automation are performed for these systems.


     EMC GDDR processes
                                     An EMC GDDR process or script is a dynamically determined
                                     sequence of function calls to achieve a predetermined result.
                                     Generally one function call corresponds to one action. An EMC
                                     GDDR process is executed by calling EMC GDDR provided routines,
                                     either from a batch job or as a result of specific messages being
                                     issued. There are two types of EMC GDDR processes:

            Planned process          An EMC GDDR planned process is initiated through the EMC GDDR
                                     interface to perform a planned task.

     Unplanned process or Takeover process
                                     The EMC GDDR unplanned process or takeover process can only be
                                     initiated following an error that results in a possible takeover
                                     situation. Takeover processes are initiated as a result of certain
                                     messages being issued or specific events occurring.
                                     The messages or events that trigger an unplanned or takeover process
                                     can originate on any system, either a C-System or a production
                                      system. In response to a trigger event, and following operator
                                      confirmation, the current Master C-System invokes the necessary
                                      processes based on the current EMC GDDR complex and managed
                                      system configurations. Processes are invoked following confirmation
                                      of any of the following types of failure or loss:
                                     ◆   Sites
                                     ◆   DASD
                                     ◆   Systems
                                     ◆   Loss of SRDF link
                                     ◆   Loss of host channels

     EMC GDDR process restart
                                      In the event an EMC GDDR process does not complete properly, the
                                     return codes from the function calls that make up an EMC GDDR
                                     process are saved in EMC GDDR global variables. For functions that
                                      issue EMC SRDF Host Component commands, the return codes of the






commands are also saved. If multiple commands are issued from one
function, the return codes from each command are saved in EMC
GDDR global variables.
After the cause of the original failure has been identified and
resolved, the EMC GDDR process can be rerun. EMC GDDR uses the
saved return codes to establish the point of restart; that is, the point of
the previous failure. This ensures that no modifications to the
supplied EMC GDDR process jobs are required in order to rerun after
a failure. If the underlying problem causing a GDDR function to fail
has been resolved in a way that makes the GDDR function call
unnecessary, the GDDR interface provides an easy way to skip the
failed function call. If necessary, restart can be forced in a particular
step by manually setting the EMC GDDR system variables
appropriately.
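The restart behavior can be pictured with the following Python sketch. It is
a simplified model, not GDDR code: a dictionary stands in for the EMC GDDR
global variables that hold the saved return codes, and the step names are
invented:

    # Simplified model of rerunning a process from the point of failure.
    # Saved return codes stand in for GDDR global variables.

    def run_process(steps, saved_rc, skip=()):
        """Run each (name, func) step, skipping steps that already completed
        (saved return code 0) or that the operator chose to skip."""
        for name, func in steps:
            if saved_rc.get(name) == 0 or name in skip:
                continue                     # already done or explicitly skipped
            rc = func()
            saved_rc[name] = rc
            if rc != 0:
                print(f"step {name} failed with rc={rc}; fix and rerun")
                return saved_rc
        print("process completed")
        return saved_rc

    steps = [
        ("STEP1", lambda: 0),
        ("STEP2", lambda: 8),    # fails on the first run
        ("STEP3", lambda: 0),
    ]
    rcs = run_process(steps, {})             # stops at STEP2
    steps[1] = ("STEP2", lambda: 0)          # underlying problem resolved
    run_process(steps, rcs)                  # restarts at STEP2, then STEP3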








     EMC GDDR components
                                      The EMC GDDR software environment includes many components
                                     controlled by a parameter library. Parameters define the EMC GDDR
                                     managed configuration. Some parameters are maintained in external
                                     parmlib members, but most are maintained via the EMC GDDR user
                                     interface and loaded into EMC GDDR global variables by the
                                     parameter activation process. The user interface is an ISPF
                                     application available only on the EMC GDDR C-Systems. There are
                                     events in which EMC GDDR is interested and message rules that
                                     determine the actions to be taken in response to those events. Most
                                     GDDR procedures run as started tasks and include EMC foundation
                                     products such as ResourcePak Base and SRDF Host Component.


     Parameters
     Internal parameters
                                      The GDDR Parameter Wizard is the control interface for managing
                                      GDDR parameters. It should only be used on the C-System that is
                                      intended to be the Master C-System for the
                                     managed configuration. During the activation process, the new
                                     parameters are loaded into EMC GDDR global variables on the
                                     Master C-System and propagated to the remaining C-Systems using
                                     GDDR peer-to-peer communication.







                    The basic process of updating EMC GDDR parameters consists of the
                    following steps:
                    ◆   Back up the existing parameters. The parameter management
                        process uses an existing backup of parameter values as the
                        starting point for subsequent updates.
                    ◆   Select the backup as input.
                    ◆   Modify parameters as required. The updates are captured in PDS
                        members in a work dataset.
                    ◆   Validate the parameter set. This is done via a batch job submitted
                        from the user interface.
                    ◆   Activate the parameter set. This is also a batch job submitted from
                        the user interface.
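                     A minimal sketch of this workflow is shown below, assuming
                     hypothetical function names; in practice each step is performed through
                     the GDDR ISPF panels and the batch validation and activation jobs, not
                     through code like this:

    # Hypothetical outline of the parameter update flow described above.
    # Each function is a placeholder for work actually done via the GDDR
    # ISPF interface and batch validation/activation jobs.

    def update_gddr_parameters(current_backup, changes):
        work_copy = dict(current_backup)      # 1. start from an existing backup
        work_copy.update(changes)             # 2./3. select backup, apply edits
        if not validate(work_copy):           # 4. batch validation job
            raise ValueError("parameter validation failed")
        activate(work_copy)                   # 5. batch activation job
        return work_copy                      # becomes the next backup baseline

    def validate(params):
        # Placeholder check: every parameter must have a value.
        return all(value is not None for value in params.values())

    def activate(params):
        print(f"activating {len(params)} parameters on the Master C-System")

    update_gddr_parameters({"PRIMARY_SITE": "DC1"}, {"PRIMARY_SITE": "DC2"})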


User interface      The EMC GDDR user interface is an ISPF application. It is used only
                    on the EMC GDDR C-Systems.

System components
                    The EMC GDDR SAMPLIB contains an exec called GDDREXC that
                     can be customized to allocate the required EMC GDDR libraries.
                    Once customized, this exec can be integrated into a site's ISPF panels
                    in accordance with site standards. EMC GDDR panels will display
                    only the options the user is authorized to use.
                     The GDDR ISPF user interface does not currently support operation in
                     both screens when running in split-screen mode.







      Primary Options Menu
                                     Invoking EMC GDDR produces the Primary Options Menu, as
                                     shown in Figure 30 on page 100.




                       Figure 30     Primary Options Menu

                                     The Primary Options Menu displays information on the current state
                                     of EMC GDDR in the environment. It shows the name of the
                                     C-System and site, which C-System is currently Master, which site is
                                     the primary processing site, which site is the primary DASD site,
                                     whether EMC GDDR automation is currently enabled, and the name
                                     of the planned or unplanned script, if any, that is currently in process.
                                     From this screen, EMC GDDR automation can be toggled on and off
                                     using the commands GDDR ON and GDDR OFF. This menu also
                                     provides access to the following functions:







◆   Update Personal GDDR ISPF Profile — Allows each user to change
    GDDR ISPF profile variable values associated with their TSO ID.
    This includes items like datasets required to submit scripts, and
    job card information. Each user must enter their profile updates
    before performing any setup and maintenance functions.
◆   GDDR Setup and Maintenance — Provides options to manage
    EMC GDDR internal parameters, system variables and system
    options.
◆   View GDDR configuration — Displays EMC GDDR configuration
    details for sites, features, and C-Systems.
◆   Manage Site Roles — Allows manual changes to be made to site
    roles, including transferring the Master role to a different
    C-System, changing the AutoSwap owner, and moving the MSC
    control function.
◆   Perform pre-script checkup — Displays any active events, shows
    the status of the EMC GDDR internal command queue, and
    performs communication checks to validate that all production
    systems and C-Systems are active and all inter-systems
    communications are enabled.
◆   Run GDDR Scripts — Allows the user to run EMC GDDR
    background scripts.
◆   View GDDR Script Statistics — Displays timing information for
    recently-executed scripts.
◆   Perform GDDR Actions — Provides options to perform actions
    such as HMC LPAR and CBU actions, HMC discovery, and
    managing couple datasets.







      Option P: Profile - Update Personal GDDR ISPF Profile
                                     Choosing this option from the Primary Options Menu displays the
                                     screen shown in Figure 31 on page 102.




                       Figure 31     Change GDDR ISPF Profile Variable Values

                                      The GDDR Subsystem name cannot be changed, but all other values
                                     requested on the screen must be filled in to allow the user to perform
                                     EMC GDDR functions like parameter maintenance and script
                                     execution.







Option M: Maintenance - GDDR Setup and Maintenance
                         Figure 32 on page 103 shows the EMC GDDR Setup and Maintenance
                         Menu.




             Figure 32   Setup and Maintenance

                         The GDDR ON and GDDR OFF commands can be used from this
                         menu to enable and disable EMC GDDR automation. This menu also
                         provides access to the following functions:
                         ◆   Manage GDDR Parameters — Provides access to a series of
                             screens that are used to maintain the EMC GDDR internal
                              parameters described under "Internal parameters" on page 98. These
                             screens will be presented in detail later in this TechBook.
                         ◆   Message, Debug, and Trace options — Provides the ability to set
                             default debugging levels and override those values for specific
                             programs.







                                     ◆   Manage GDDR Internal Command Queue — Provides
                                         information on the command queue used by EMC GDDR to store
                                         SRDF commands that have not yet been executed. This screen
                                         can be used to alter EMC GDDR processing. EMC strongly
                                         recommends that no action be taken on this screen unless
                                         instructed to do so by EMC GDDR Solution Support.
                                     ◆   Perform HMC Discovery — Discovers the HMC objects at each
                                         site. This function can be used to validate the HMC information
                                         entered during EMC GDDR configuration as well as the EMC
                                         GDDR HMC interface installation and setup.
                                     ◆   Refresh GDDR Message Table — Refreshes the GDDRMSG table
                                          managed by GDDRMAIN.
                                     ◆   Manage GDDR System Variables — Provides the ability to view
                                         and modify all system variables used by EMC GDDR to describe
                                          and manage the environment. Any updates made through this panel
                                         remain local to the current C-system and are not subject to any
                                         validation.
                                     In general, use of this screen should be avoided without guidance
                                     from EMC GDDR Solution Support.







Option G: GDDR Config - View GDDR Configuration
                         Selecting this option displays a screen similar to Figure 33 on
                         page 105 with the information relevant to the EMC GDDR
                         configuration in use. No changes can be made on this screen.




             Figure 33   View GDDR Configuration







      Option R: Roles - Manage Site Roles
                                     EMC GDDR script processing manages the location of the Master
                                     C-System, the AutoSwap Owner system, and the MSC Control site as
                                     part of normal operations. The Manage Site Roles panel is provided
                                     to address exception conditions such as preparing for a script restart.
                                     Figure 34 on page 106 shows the Manage Site Roles menu.




                       Figure 34     Manage Site Roles

                                     The following choices are available:
                                     ◆   Transfer Master-C system function — Displays the sites in the
                                         EMC GDDR complex and indicates the current and
                                         recommended Master. The Master C-System function can be
                                         transferred to another site.
                                     ◆   Move AutoSwap owner — Displays the sites involved in an
                                         AutoSwap relationship and indicates the current and
                                         recommended owner. AutoSwap ownership can be transferred to
                                         another site.





                         ◆   Move Primary MSC Server — Displays the sites eligible to
                             assume the MSC control function and indicates the current and
                             recommended owner. MSC control can be transferred to another
                             site.

Option C: Checkup - Perform Pre-Script Checkup
                         When this option is selected, EMC GDDR performs a series of health
                         checks and returns the results. A screen similar to Figure 35 on
                         page 107 is presented.




             Figure 35   Perform Health Check

                         In this example, the following information is provided:
                         ◆   There is an SRA and an MSC event
                         ◆   GDDR is running in Degraded Mode
                         ◆   The GDDMPARM member was found to be consistent






                                      ◆   STAR-HA is not available (either it is not configured or there is no
                                          secondary MSC server)
                                      ◆   The GDDR command queue is empty
                                      ◆   All C-Systems are operating normally from an HMC perspective,
                                          but none of them has GDDRMAIN fully operational (some
                                          components are inactive; a D (detail) line command is available to
                                          drill down on this situation)
                                      ◆   There was an HMC setup issue preventing system-status
                                          determination for the two managed systems, and GDDRMAIN is
                                          not active on the managed systems

      Option S: Scripts - Run GDDR Scripts
                                     This option can only be chosen on the Master C-System. It displays
                                     the Select Script to Run screen, which is a dynamic display. It lists
                                     EMC GDDR scripts that are eligible to run under the current
                                     configuration, and also specifies configuration information and
                                     scripts that are in progress, if any. Depending on the location of the
                                     primary site and primary DASD locations, the panel is customized to
                                     display valid choices. Scripts that cannot be used at the current site
                                     do not appear. The Configuration field indicates the features of the
                                     configuration that EMC GDDR takes into account when determining
                                     which scripts are eligible to run and hence which are present in the
                                     list of scripts displayed on the panel. Figure 36 on page 109 shows a
                                     sample of the Scripts to Run screen.








Figure 36   Select Script to Run
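Conceptually, the panel applies a filter along the lines of the following
Python sketch. The eligibility rules shown here are an invented example (the
script names are taken from Table 4, but the features each script is assumed
to need are illustrative only):

    # Invented example of configuration-based script filtering.
    # Each candidate script declares the features and site role it needs.

    scripts = [
        {"name": "GDDRPA42", "needs": {"autoswap"}, "runs_at": "primary"},
        {"name": "GDDRPA27", "needs": {"star"},     "runs_at": "any"},
        {"name": "GDD2P03A", "needs": set(),        "runs_at": "any"},
    ]

    def eligible_scripts(configuration_features, current_site_role):
        return [
            s["name"]
            for s in scripts
            if s["needs"] <= configuration_features
            and s["runs_at"] in ("any", current_site_role)
        ]

    print(eligible_scripts({"star"}, "secondary"))   # -> ['GDDRPA27', 'GDD2P03A']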







      Option T: Timing - View GDDR Script Statistics
                                     Choosing this option displays a list of scripts that have run on this
                                     C-System, as shown in Figure 37 on page 110.




                       Figure 37     View GDDR Script Statistics

                                     Selecting a script displays step start and end times. If a script were
                                     currently running, it would be displayed on the "Last planned:" or
                                     "Last unplanned:" line.







Option A: Actions - Perform GDDR Actions
                          Figure 38 on page 111 is the sub-menu displayed when Option A is
                          chosen.




              Figure 38   Actions Menu

                          From this menu, the following actions are available:
                          ◆   Perform HMC Discovery — Discovers the HMC objects at each
                              site. This function can be used to validate the HMC information
                              entered during EMC GDDR configuration as well as the EMC
                              GDDR HMC interface installation and setup. This function is also
                              available from the GDDR Setup and Maintenance menu.
                          ◆   Perform HMC LPAR actions — Provides the ability to carry out
                              actions against the EMC GDDR defined sites and systems
                              including activating or deactivating an LPAR, performing a load







                                         clear, reset clear, stop or start for an LPAR, and querying an LPAR
                                         for its status. This option can only be used on the Master
                                         C-System.
                                     ◆   Perform HMC CBU actions — Presents a list of sites with
                                         processors enabled for CBU (Capacity Backup Upgrade)
                                         activation, if any. Action codes can be used to activate backup
                                         capacity (REAL mode), simulate activation of backup capacity
                                         (TEST mode), terminate actual backup capacity activation, or
                                         terminate simulated backup capacity activation. These actions
                                         can only be performed on the Master C-System.
                                     ◆   Manage Couple Datasets — Checks the couple dataset
                                         configurations on all systems. Currently used couple datasets are
                                         compared against those defined on the "Define Managed Couple
                                         Datasets" panel during EMC GDDR parameter definition. If
                                         discrepancies are found, SETXCF commands are issued to bring
                                         couple dataset usage in line with EMC GDDR parameter
                                         specifications. This function is provided to adjust couple datasets
                                         when it is not appropriate to run a site swap script, which will
                                         also align couple dataset usage with the defined parameters.
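                                      The comparison performed by the Manage Couple Datasets option can
                                      be pictured with the sketch below. It is illustrative only: the dataset
                                      names are invented, and the SETXCF command text is built as a plain
                                      string purely to show the shape of the adjustment:

    # Illustrative comparison of in-use couple datasets against the couple
    # datasets defined to GDDR, producing SETXCF-style adjustments.
    # Dataset names are invented for the example.

    def couple_dataset_actions(in_use, defined):
        """in_use/defined: dict mapping couple dataset type -> dataset name."""
        actions = []
        for cds_type, wanted in defined.items():
            if in_use.get(cds_type) != wanted:
                actions.append(f"SETXCF COUPLE,TYPE={cds_type},PCOUPLE={wanted}")
        return actions

    in_use  = {"SYSPLEX": "SYS1.XCF.CDS01", "CFRM": "SYS1.CFRM.CDS01"}
    defined = {"SYSPLEX": "SYS1.XCF.CDS01", "CFRM": "SYS1.CFRM.CDS02"}

    for command in couple_dataset_actions(in_use, defined):
        print(command)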


      Events and rules               An EMC GDDR event is a state change in a component part of the
                                     environment that EMC GDDR is actively monitoring. Examples of
                                     EMC GDDR events include:
                                     ◆   CGT — ConGroup trip has occurred/state change
                                     ◆   CGD — ConGroup group is disabled/state change
                                     ◆   SRA — SRDF/A link is down
                                     ◆   MHB — Missing C-System heartbeat
                                     ◆   LNK — Link to a C-System down
                                     ◆   RDF — RDF link down
                                     ◆   LDR — Local Disaster
                                     ◆   RDR — Regional Disaster
                                     Events are stored in EMC GDDR global variables. An event can have
                                     a state of either TRUE or FALSE. If the event has a state of TRUE, it
                                     has occurred or is currently occurring. If the event has a state of
                                     FALSE, it is no longer occurring. An event that is TRUE is considered
                                     an exception. EMC GDDR events are used by the GDDR event
                                     monitor and GDDR processes to determine environment state. A
                                     change in state can then:





                          ◆   Trigger unplanned/takeover processes
                          ◆   Prevent a planned process from running
                           EMC GDDR is supplied with message interception rules. These
                          message rules have two primary functions:
                          ◆   To detect events that EMC GDDR is interested in and set the
                              appropriate EMC GDDR event TRUE or FALSE.
                          ◆   To detect events that EMC GDDR processes have to wait for
                              (WTOR), and reply as to the success or failure of the waited for
                              event. This will determine if an EMC GDDR process proceeds or
                              terminates.
                          EMC GDDR uses intersystems communication to route message
                          traffic between production systems and C-systems. The arrival of a
                          message at the target production system can be used to trigger an
                          automation rule (for example in IBM Tivoli NetView or BMC
                          Control-M). Such rules can be used to start or shut down workloads
                          on the appropriate systems.
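                           A minimal sketch of this event-state model is shown below. The event
                           codes are those listed above; the mapping from message identifiers to
                           events is an invented example, not the actual GDDR message rule set:

    # Event states are modeled as booleans, as described in the text:
    # TRUE means the event has occurred or is occurring (an exception).

    events = {"CGT": False, "SRA": False, "MHB": False, "RDF": False}

    # Invented message-rule table: message id -> (event code, new state).
    message_rules = {
        "CONGROUP_TRIPPED":  ("CGT", True),
        "CONGROUP_RESUMED":  ("CGT", False),
        "SRDFA_LINK_DOWN":   ("SRA", True),
        "SRDFA_LINK_UP":     ("SRA", False),
    }

    def apply_message(message_id):
        rule = message_rules.get(message_id)
        if rule:
            event, state = rule
            events[event] = state

    def exceptions():
        """Events that are TRUE are treated as exceptions."""
        return [code for code, state in events.items() if state]

    apply_message("SRDFA_LINK_DOWN")
    print(exceptions())    # -> ['SRA']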


Procedures                There are several JCL procedures used in an EMC GDDR
                          environment, though most are not part of EMC GDDR itself. They
                          include the tasks for EMC foundation products such as ResourcePak
                          Base and SRDF Host Component. The names presented here are the
                          vendor defaults and may not be the names used in any given
                          environment.
                           If a customer environment does not support the BCPii HMC
                           interface, CA-OPS is required. Refer to the GDDR product guides for
                          more information.

EMC foundation products
                          Depending on the EMC GDDR configuration in use, there will be two
                          or three EMC foundation product tasks running. All environments
                          will run EMCSCF and EMCRDF, and environments including
                          SRDF/S will also run EMCCGRP:
                          ◆   EMCSCF — ResourcePak Base or Symmetrix Control Facility
                          ◆   EMCRDF — SRDF Host Component
                          ◆   EMCCGRP — Consistency Group
                          For more information on the EMC foundation products refer to
                          Chapter 2 of this book or the EMC product documentation.







      EMC GDDR procedures
                                     These are the JCL procedures needed in an EMC GDDR environment:
                                     ◆   GDDRMAIN — Main EMC GDDR address space.
                                     ◆   GDDREVM — EMC GDDR Event Monitor. The event monitor is
                                         used to analyze event state changes in which EMC GDDR is
                                         interested. On detecting the occurrence of selected events, the
                                         event monitor determines what action to take and prompts
                                         operators with the appropriate choices via WTOR.
                                     ◆   GDDRHBM — EMC GDDR C-System heartbeat monitor. The
                                         heartbeat monitor aids the event monitor in determining the
                                         status of the EMC GDDR managed environment. The lack of a
                                          heartbeat from a particular C-System is used to determine the state of
                                          that C-System and its site (a conceptual sketch of this check follows
                                          this list).
                                     ◆   GDDRWORK — GDDR Workers. Some pieces of work required
                                         for proper GDDR functioning will run under separate
                                         GDDRWORK started tasks. This includes all GDDR HMC
                                          functions, GDDR command queue processing, and parts of GDDR
                                          scripts shipped to a remote C-System.
                                     ◆   GDDRPROC — Used to run scripts. This is the only EMC GDDR
                                         procedure not used for a started task that is up and running at all
                                         times.
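                                      The heartbeat check mentioned under GDDRHBM can be pictured as
                                      follows. This is a conceptual sketch only; the timeout value and the use
                                      of simple timestamps are assumptions, not GDDR internals:

    # Conceptual sketch of heartbeat-based status detection, assuming each
    # C-System records a timestamp (epoch seconds) for its last heartbeat.
    import time

    HEARTBEAT_TIMEOUT = 120   # seconds; an assumed threshold, not a GDDR value

    def missing_heartbeats(last_heartbeat, now=None):
        """Return the C-Systems whose heartbeat is older than the timeout."""
        now = now if now is not None else time.time()
        return [
            system
            for system, stamp in last_heartbeat.items()
            if now - stamp > HEARTBEAT_TIMEOUT
        ]

    now = time.time()
    last_heartbeat = {"C1": now - 30, "C2": now - 600, "C3": now - 15}
    print(missing_heartbeats(last_heartbeat, now))   # -> ['C2']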

      Started task locations         The started tasks that need to run on any given system in the EMC
                                     GDDR complex depend on whether it is a managed production or
                                     C-System, which site it is, and which configuration is in use.
                                      GDDRMAIN runs on all C-Systems in the GDDR complex. In
                                      three-site configurations, it can optionally be omitted at DC3 if the
                                      customer chooses to run DC3 as a "lights-out" data center.
                                     GDDRMAIN also runs on all fully managed production systems.
                                     GDDREVM, GDDRHBM and GDDRWORK run on all C-Systems at
                                     all sites in the EMC GDDR complex. They are started automatically
                                     by GDDRMAIN. They are not required on the productions systems.
                                     EMCSCF runs on all C-Systems and all IPLed production systems at
                                     all sites.
                                     EMCRDF runs on all C-systems.




114     GDDR Solution Design and Implementation Techniques
EMC Geographically Dispersed Disaster Restart (EMC GDDR)



EMCCGRP runs on C-Systems and one or more production systems
at the primary DASD site when SRDF/S is in use. It runs on all
systems both at DC1 and DC2 in AutoSwap configurations.
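
        The placement rules above can be condensed into a single sketch. The
        function and its flags below are illustrative stand-ins, not GDDR
        parameters, and they assume that the Event Monitor, Heartbeat Monitor
        and Workers are present only where GDDRMAIN itself runs:

            def gddr_started_tasks(is_c_system, site, lights_out_dc3=False,
                                   uses_srdf_s=False, uses_autoswap=False,
                                   primary_dasd_site="DC1"):
                """Started tasks expected on one system, per the placement rules above."""
                tasks = set()
                if is_c_system:
                    if site == "DC3" and lights_out_dc3:
                        return tasks                                  # "lights-out" DC3 C-System
                    tasks.add("GDDRMAIN")
                    tasks.update({"GDDREVM", "GDDRHBM", "GDDRWORK"})  # started by GDDRMAIN
                    tasks.update({"EMCSCF", "EMCRDF"})
                    if uses_srdf_s:
                        tasks.add("EMCCGRP")
                else:                                                 # fully managed, IPLed production system
                    tasks.update({"GDDRMAIN", "EMCSCF"})
                    if uses_srdf_s and (site == primary_dasd_site or
                                        (uses_autoswap and site in ("DC1", "DC2"))):
                        tasks.add("EMCCGRP")    # needed on one or more such production systems
                return tasks

            print(sorted(gddr_started_tasks(is_c_system=True, site="DC1", uses_srdf_s=True)))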








      EMC GDDR supported scripts
                                        Table 4 lists the script names, a short description of each script's
                                        function, and the event type that causes the script to be generated.

                        Table 4         Script names and descriptions

       Script name        Description                                                  Event Type

       GDDRPA51           Reconfigure to Concurrent                                    RECONFIGURE
       GDDRPA52           Reconfigure to Cascaded                                      RECONFIGURE
       GDDRPA53           Reconfigure to Concurrent SRDF with Site Move                RECONFIGURE

       GDDRPA42           Swap production from <P> to <S> (Protected)                  PLANNED
       GDDRPA45           Swap production from <P> to <S> (FAST)                       PLANNED
       GDDRPA21           Swap DASD from <P> to <S> (Protected)                        PLANNED
       GDDRPA25           Swap DASD from <P> to <S> (FAST)                             PLANNED
       GDD2P17A           Abandon Site <P> (site swap)                                 PLANNED
       GDD2P18A           Restart production at <S> after site swap                    PLANNED
       GDD2P01A           Perform test IPL from BCVs at <tgt site>                     PLANNED
       GDDRPA27           Perform test IPL from R2s at DC3                             PLANNED
       GDD2P03A           Perform test IPL from R2s at <S>                             PLANNED

       GDDRUP31           Recover after unplanned swap                                 UNPLANNED
       GDD2U09A           Recover after unplanned swap or LDR                          UNPLANNED
       GDD2U13A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDD2U10A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDD2U12A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDDRUP41           Recover after loss of <S>                                    UNPLANNED
       GDDRPA0A(1)        Resume replication after unplanned swap                      UNPLANNED
       GDDRPA0A(2)        Resume replication after loss of <P>                         UNPLANNED
       GDDRPA0A(3)        Resume replication after loss of <S>                         UNPLANNED
       GDD2PA0A(4)        Resume replication after unplanned swap                      UNPLANNED
       GDD2PA0A(5)        Resume replication after loss of <P>                         UNPLANNED

       GDDRPA23           Resume ConGroup                                              RESUME
       GDD2P02A           Resume after test IPL from BCVs at <tgt site>                RESUME
       GDDRPA28           Resume SRDF/A after test IPL at DC3                          RESUME
       GDDRPM29           Resume SRDF/A in MSC mode                                    RESUME
       GDDRPF29           Resume SRDF/A in STAR mode                                   RESUME
       GDD2P14A           Resume replication after link failure                        RESUME
       GDD2P16A           Resume after test IPL from R2s at <S>                        RESUME

       GDDRPAAB           Abandon Sites DC1 and DC2                                    DC3 ONLY
       GDDRPA05           Recover at DC3 after RDR at DC1 and DC2                      DC3 ONLY
       GDDRPA06           Restart production LPARs at DC3 SRDFA to <tgt site>          DC3 ONLY
       GDDRPA07           Recover at DC3 after LDR at DC1 SRDFA to DC2                 DC3 ONLY

                               Where:
                               ◆   PLANNED — These scripts result from operator interaction with
                                   GDDR panels
                               ◆   UNPLANNED — These scripts result from unplanned events
                                   detected by the GDDR Event Monitor (EVM)
                               ◆   RESUME — These scripts resume replication after some type of
                                   service interruption (planned or unplanned)
                               ◆   DC3 ONLY — These scripts execute only at DC3, for planned test
                                   events or for unplanned disaster event management
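
        For planning documents or operator aids, the information in Table 4 lends
        itself to a simple lookup structure. The sketch below shows only a few
        representative entries, copied from Table 4; the structure itself is
        illustrative and not part of GDDR:

            # Script name -> (description, event type), a small subset of Table 4.
            SCRIPT_CATALOG = {
                "GDDRPA51": ("Reconfigure to Concurrent",                   "RECONFIGURE"),
                "GDDRPA42": ("Swap production from <P> to <S> (Protected)", "PLANNED"),
                "GDD2U13A": ("Recover after loss of <P> (LDR)",             "UNPLANNED"),
                "GDDRPA23": ("Resume ConGroup",                             "RESUME"),
                "GDDRPAAB": ("Abandon Sites DC1 and DC2",                   "DC3 ONLY"),
            }

            def scripts_for_event(event_type):
                """Return the script names associated with a given event type."""
                return [name for name, (_desc, etype) in SCRIPT_CATALOG.items()
                        if etype == event_type]

            print(scripts_for_event("PLANNED"))   # ['GDDRPA42']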







                                                             Table 5 shows scripts mapped to the configuration type where the
                                                             script executes.

                                 Table 5                      Script mapping

                                 (Table 5 maps each script name listed in Table 4 to the configurations
                                 in which it executes: Concurrent Star with AutoSwap, Concurrent Star,
                                 Concurrent Star 2 Site, Cascaded Star with AutoSwap, Cascaded Star,
                                 Cascaded Star with EDP, SRDF/A with MSC, SRDF/S with AutoSwap,
                                 and SRDF/S.)


GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)
Planned script operations
                                            ◆   Abandon Site DC1 (site swap) — Shut down the single-site
                                                workload at the primary site in preparation for the restart of
                                                processing at the secondary site (the combined abandon and
                                                restart flow is sketched after this list):
                                                  • Stop the business workload at the primary DASD site


                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC2 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC2 — IPL contingency systems at
                                         site DC2 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC2 — IPL contingency systems at
                                         site DC2 using R2 devices:
                                         • Confirm that SRDF/S has been stopped normally via a
                                           ConGroup trip
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable
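
        The first two planned operations above form one end-to-end flow: abandon
        the workload at DC1, then restart it at DC2. The sketch below simply
        strings the documented steps together; it is illustrative only and the
        names used are not GDDR artifacts:

            # Planned DC1 -> DC2 site swap for GDDR with SRDF/S and ConGroup,
            # using the step wording from the two bullets above.
            ABANDON_DC1 = [
                "Stop the business workload at the primary DASD site",
                "Wait for the stop of all business applications",
                "Reset clear all production systems managed by EMC GDDR",
            ]

            RESTART_AT_DC2 = [
                "Attempt reset clear of all systems at the primary DASD site",
                "Activate CBU (if required)",
                "Activate all needed LPARs, including CFs at the secondary DASD site",
                "Create a consistency point at the secondary DASD site",
                "Prepare the SRDF environment",
                "IPL all needed production systems",
            ]

            def planned_site_swap():
                """Ordered steps: abandon at DC1, then restart at DC2."""
                return ABANDON_DC1 + RESTART_AT_DC2

            for step in planned_site_swap():
                print(step)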

      Unplanned script operations
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a
                                          ConGroup trip and the GDDR Heartbeat Monitor has timed out.
                                          These events occurring concurrently are viewed by EMC GDDR
                                          as a local disaster (a sketch of this classification follows this
                                          list):
                                         • Confirm that a ConGroup trip occurred
                                         • Confirm that SRDF links failed
                                         • Confirm that a local disaster (LDR) event occurred
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs and condition R2s at secondary site for restart
                                         • Activate contingency systems

                                • Restart applications
                            ◆   Resume replication after loss of DC1 — Resume the SRDF/S link
                                to the secondary site after a primary site disaster:
                                • Confirm SRDF/S links are down
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and SRDF/S restart commands
                                • Reestablish BCVs at the secondary site
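
        The distinction the Event Monitor draws between a local disaster and a
        simple replication interruption (noted in the 'Recover after loss of DC1
        (LDR)' item above) can be pictured as follows. This is a conceptual
        sketch only; the real EVM logic and its timers are internal to GDDR and
        the function below is invented for the example:

            def classify_primary_site_event(congroup_trip, srdf_links_down,
                                            primary_heartbeat_timed_out):
                """Rough classification of a primary-site event, per the text above."""
                if congroup_trip and srdf_links_down and primary_heartbeat_timed_out:
                    return "LDR"            # 'Recover after loss of DC1 (LDR)'
                if congroup_trip or srdf_links_down:
                    return "LINK_FAILURE"   # later: 'Resume replication after link failure'
                return "NORMAL"

            print(classify_primary_site_event(True, True, True))    # LDR
            print(classify_primary_site_event(True, False, False))  # LINK_FAILURE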

Resumption operations
                            ◆   Resume after test IPL from BCVs at DC2 — Reset clear
                                contingency systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume after test IPL from R2s at DC2 — Reset clear contingency
                                systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Restart SRDF/S to DC2
                            ◆   Resume replication after link failure — Restore the SRDF/S link
                                after a loss of the link:
                                • Confirm SRDF/S links are down
                                • Stop ConGroup on all systems
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and restart commands
                                • Reestablish BCVs at the secondary site


GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2)
Planned script operations
                            ◆   Swap production from DC1 to DC2 — Swap the DASD and the
                                single-site workload from site DC1 to site DC2:
                                • Stop the business workload at the primary DASD site
                                • Swap the DASD to the secondary DASD site (AutoSwap
                                  followed by SRDF/S personality swap)


                                         • Resume SRDF/S
                                         • Restart the business workload
                                     ◆   Swap DASD from DC1 to DC2 — Swap the DASD only from site
                                         DC1 to site DC2.
                                     ◆   Perform test IPL from BCVs at DC2 — IPL contingency systems at
                                         site DC2 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC2 — IPL contingency systems at
                                         site DC2 using R2 devices:
                                          • Confirm that SRDF/S has been stopped normally via a
                                            ConGroup trip
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                     ◆   Recover after unplanned swap — Restart processing at the
                                         secondary site:
                                         • Stop business workload on the old primary site (if applicable)
                                         • Split BCVs at the new primary site
                                         • Move couple datasets to the new primary site (if applicable)
                                         • Trigger restart of business workload at the new primary site
                                         • Transfer Master C-System to the new secondary site
                                     ◆   Recover after loss of DC1 (LDR) — Restart the single site
                                         workload after the GDDR Event Monitor has detected a
                                         ConGroup trip and the GDDR Heartbeat Monitor has timed out.
                                         These events occurring concurrently are viewed by EMC GDDR
                                         as a local disaster:
                                         • Confirm that a local disaster (LDR) event occurred
                                         • Condition R2s at the secondary site for restart
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs at the secondary site
                                         • Activate contingency systems
                                          • Manage Coupling Facility structures and couple datasets


                                • Restart applications
                            ◆   Resume replication after unplanned swap — Resume the SRDF/S
                                link to the secondary site after a primary site disaster:
                                • Confirm SRDF/S links are down
                                • Split BCVs at the secondary site, if applicable
                                • Perform ConGroup cleanup, transfer AutoSwap ownership
                                  and restart SRDF/S
                                 • Manage Coupling Facility structures and couple datasets
                                • Reestablish BCVs at the secondary site

Resumption operations
                            ◆   Resume after test IPL from BCVs at DC2 — Reset clear
                                contingency systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume after test IPL from R2s at DC2 — Reset clear contingency
                                systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Restart SRDF/S with AutoSwap to DC2
                            ◆   Resume replication after link failure — Restore the SRDF/S link
                                after a loss of the link:
                                • Confirm SRDF/S links are down
                                • Stop ConGroup on all systems
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and restart commands
                                • Reestablish BCVs at the secondary site


GDDR for SRDF/A (Two sites - DC1 and DC3)
Planned script operations
                            ◆   Abandon Site DC1 (site swap) — Shut down the single site
                                workload at the primary site in preparation for the restart of
                                processing at the secondary site:


                                         • Stop the business workload at the primary DASD site
                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC3 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate and load test LPARs using BCV volumes

      Unplanned script operations
                                     ◆   Recover after loss of DC1 (RDR) — Restart the single site
                                         workload after the GDDR Event Monitor has detected an
                                         SRDF/A link down event and the GDDR Heartbeat Monitor has
                                         timed out. These events occurring concurrently are viewed by
                                         EMC GDDR as a regional disaster:
                                         • Confirm that an MSC drop occurred
                                         • Confirm that SRDF links failed
                                         • Confirm that a regional disaster (RDR) event occurred
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs and condition R2s at secondary site for restart
                                         • Activate contingency systems
                                         • Restart applications
                                     ◆   Resume replication after loss of DC1 — Resume the SRDF/A link
                                         to the secondary site after a primary site disaster:
                                         • Confirm SRDF/A links are down
                                         • Split BCVs at the secondary site, if applicable


                                • Issue MSC cleanup and SRDF/A restart commands
                                • Reestablish BCVs at the secondary site

Resumption operations
                             ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                 contingency systems after a test at DC3:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume SRDF/A after link loss — Restore the SRDF/A link after
                                a loss of the link (planned or unplanned).


GDDR for SRDF/Star (Three sites)
Reconfiguration operations
                            ◆   Reconfigure to concurrent SRDF — Reconfigure a Cascaded
                                SRDF/Star environment to a Concurrent SRDF/Star
                                environment. The result is that site DC1 is protected at site DC2
                                using SRDF/S, and DC1 is the source of the SRDF/A replication
                                to DC3. The workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC2 and DC3
                                • Perform a differential CREATEPAIR from DC1 to DC3
                                • Reestablish the SRDF/Star environment (DC1 to DC3)
                            ◆   Reconfigure to cascaded SRDF — Reconfigure a Concurrent
                                SRDF/Star environment to a Cascaded SRDF/Star environment.
                                The result is that site DC1 is protected at site DC2 using SRDF/S,
                                and DC2 is the source of the SRDF/A replication to DC3. The
                                workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC1 and DC3
                                • Perform a differential CREATEPAIR from DC2 to DC3
                                • Reestablish the SRDF/Star environment (DC2 to DC3)
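
        Both reconfiguration operations follow the same pattern and differ only
        in which site feeds the SRDF/A leg to DC3. The sketch below restates the
        ordered steps from the list above; the function itself is illustrative
        and not a GDDR interface:

            def reconfiguration_steps(target):
                """Ordered steps to reach a concurrent or cascaded SRDF/Star topology."""
                if target not in ("concurrent", "cascaded"):
                    raise ValueError("target must be 'concurrent' or 'cascaded'")
                # concurrent: DC1 becomes the SRDF/A source; cascaded: DC2 becomes the source
                old_source, new_source = ("DC2", "DC1") if target == "concurrent" else ("DC1", "DC2")
                return [
                    "Stop SRDF/A and delete the SRDF/A relationship between "
                    f"{old_source} and DC3",
                    f"Perform a differential CREATEPAIR from {new_source} to DC3",
                    f"Reestablish the SRDF/Star environment ({new_source} to DC3)",
                ]

            for step in reconfiguration_steps("cascaded"):
                print(step)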

Planned script operations
                            ◆   Abandon Site DC1 (site swap) — Shut down the single site
                                workload at the primary site in preparation for the restart of
                                processing at the secondary site:


                                         • Stop the business workload at the primary DASD site
                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC2 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC3 — IPL contingency systems at
                                         site DC3 using R2 devices:
                                         • Confirm that SRDF/A has been stopped normally via an
                                           SRDF/A PENDDROP
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a local
                                          disaster (LDR) at the primary site. The following events,
                                          occurring concurrently, are considered a local disaster: loss of
                                          SRDF/A, loss of SRDF/S or a ConGroup trip, and a timeout of
                                          the primary site GDDR Heartbeat Monitor:

                                      Note: In the cascaded SRDF with ConGroup topology, the SRDF/A event is
                                      not considered for LDR.

                                         • Confirm that a local disaster (LDR) event occurred


    • Deactivate systems at old primary site
    • Prepare the SRDF environment at secondary site
    • Create a consistent point on BCVs at DC3
    • Perform a differential CREATEPAIR between the new primary
      site and DC3
    • Start SRDF/A in MSC mode
    • Reestablish the SRDF/Star environment
    • Reestablish BCVs at DC3
    • Activate production systems
◆   Recover after loss of DC2 — In environments which support
    Cascaded SRDF mode, restore data protection for the primary
    site should the secondary site suffer an outage resulting in loss of
    SRDF/S and SRDF/A. This script is submitted by the GDDR
    Event Monitor when a ConGroup trip is detected and the GDDR
    Heartbeat Monitor at the secondary site has timed out. These
    events occurring concurrently are viewed by EMC GDDR as a
    secondary site disaster. Following the restoration of the
    secondary site, the script 'Resume replication after loss of DC2' is
    used to resume SRDF/S:
    • Provide the option to stop primary site and secondary site
      workloads since all remote protection is lost
    • Split BCVs at DC3
    • Remove the SRDF pairing between DC3 and the secondary
      DASD site
    • Perform a differential re-synchronization between the primary
      site and DC3
    • Start SRDF/A in concurrent MSC mode
    • Provide the option to start primary site workloads
    • Reestablish BCVs at DC3
◆   Resume replication after loss of DC1 — Resume the SRDF/S link
    to the secondary site after a primary site disaster:
    • Confirm SRDF/S links are down
    • Stop ConGroup on all systems
    • Split BCVs at the secondary site, if applicable
    • Issue ConGroup cleanup and restart commands
    • Reestablish BCVs at the secondary site


                                     ◆   Resume replication after loss of DC2 — In environments which
                                         support Cascaded SRDF mode, resume the SRDF/S link to the
                                         primary site after a secondary site disaster and restart SRDF/A in
                                         SRDF/Star mode. The script appears in the 'Select Script to Run'
                                         panel of the current Master C-System after completion of the
                                         'Recover after loss of DC2' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the primary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the primary site

      Resumption operations
                                     ◆   Resume ConGroup — Resume consistency group protection
                                         following a ConGroup trip:
                                         • Resume the consistency group.
                                         • Reestablish the BCVs at the secondary DASD site, if
                                           applicable.
                                     ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                         contingency systems after a test at DC3:
                                         • Stop test business workload, if applicable
                                         • Reset clear test LPARs
                                         • Reestablish the BCVs
                                     ◆   Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A
                                         link to DC3 (either from DC1 or DC2 depending upon where the
                                         production workload is currently running) after a test on DC3:
                                         • Reset clear all systems IPL’d during the test of DC3 (at DC3)
                                         • Deactivate all LPARs previously activated for the test of DC3
                                           (at DC3)
                                         • Restart SRDF/Star to DC3
                                     ◆   Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A
                                         link to DC3 in MSC mode (from either DC1 or DC2 depending
                                         upon where the production workload is currently running) after
                                         a planned or unplanned swap.
                                     ◆   Resume SRDF/A (SRDF/Star) to DC3 — Restore the SRDF/A
                                         link to DC3 in SRDF/Star mode (from either DC1 or DC2) after a
                                         planned or unplanned stop of SRDF/A.


DC3-specific operations
                          ◆   Abandon Sites DC1 and DC2 — In the event of a perceived threat
                              to both DC1 and DC2, perform an orderly shutdown of DC1 and
                              DC2 and force the C-System at DC3 as the new EMC GDDR
                              Master C-System:
                              • Stop the business applications
                              • Wait for the stop of all business applications
                              • Reset clear all production systems managed by EMC GDDR
                              • Create a consistency point on the DC3 BCVs
                              • Create a consistency point on the secondary DASD BCVs
                              • Stop ConGroup on all remaining systems
                              • Force the C-System at DC3 as the new EMC GDDR Master
                                C-System
                          ◆   Recover at DC3 after RDR at DC1 and DC2 — In the event of a
                              major failure that prevents the production workload from being
                              run from either DC1 or DC2, restart the production LPARs at site
                              DC3 and reestablish SRDF/A to site DC1 or DC2. There must be
                              an SRDF/A connection to either DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2
                              • Activate all needed LPARs
                              • Activate CBU (if required)
                              • Create a consistency point at DC3
                              • Prepare SRDF environment
                              • IPL all needed production systems
                          ◆   Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the
                              event of a major failure that prevents the production workload
                              from being run from either DC1 or DC2, restart the production
                              LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2 (except
                                C-System)
                              • Activate all needed LPARs
                              • Activate CBU (if required)
                              • Create a consistency point at DC3
                              • Prepare SRDF environment
                              • IPL all needed production systems



                                         • Perform a differential CREATEPAIR between DC3 and DC1 or
                                           DC2
                                         • Start SRDF/A in MSC mode
                                      ◆   Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event
                                          of a local disaster (LDR) that prevents the production workload
                                          from being run at DC1, where DC2 is a bunker site (storage only),
                                          restart the production LPARs at site DC3 and reestablish SRDF/A
                                          to DC2. The distinction between this script and the 'Restart
                                          production LPARs at DC3 SRDFA to DC1 or DC2' script is that
                                          the 'Star C-Ahead' indicator is interrogated so that the workload
                                          can restart with the DC3 data, or with the DC2 data if it is more
                                          current:
                                         • Activate all needed LPARs including CFs at DC3
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC2
                                         • Start SRDF/A in MSC mode
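
        The 'Star C-Ahead' check described above amounts to a simple
        data-currency decision. The sketch below assumes the indicator means that
        the DC2 (synchronous) data is ahead of the DC3 data; the function is
        invented for the example and is not a GDDR interface:

            def restart_data_source(star_c_ahead):
                """Restart with the DC2 data when it is more current, otherwise DC3."""
                return "DC2" if star_c_ahead else "DC3"

            print(restart_data_source(star_c_ahead=False))  # DC3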


      GDDR for SRDF/Star with AutoSwap (Three sites)
      Reconfiguration operations
                                     ◆   Reconfigure to concurrent SRDF — Reconfigure a Cascaded
                                         SRDF/Star with AutoSwap environment to a Concurrent
                                         SRDF/Star with AutoSwap environment. The result is that site
                                         DC1 is protected at site DC2 using SRDF/S and AutoSwap, and
                                         DC1 is the source of the SRDF/A replication to DC3. The
                                         workload continues at DC1:
                                         • Stop SRDF/A and delete the SRDF/A relationship between
                                           DC2 and DC3
                                         • Perform a differential CREATEPAIR from DC1 to DC3
                                         • Reestablish the SRDF/Star with AutoSwap environment (DC1
                                           to DC3)
                                     ◆   Reconfigure to concurrent SRDF with site move — Reconfigure a
                                         Cascaded SRDF/Star with AutoSwap environment to
                                         Concurrent SRDF/Star with AutoSwap with a primary
                                         processing site move. The initial state is that the workload at DC1

                                is protected at DC2 with SRDF/S. The result is that DC2 is
                                protected at DC1 using SRDF/S and AutoSwap, and DC2 is the
                                source of the SRDF/A replication to DC3. The workload will be
                                running at DC2:
                                • Stop the business workload at site DC1
                                • Swap the DASD to DC2 (AutoSwap followed by SRDF/S
                                  personality swap)
                                • Reset clear all production systems managed by EMC GDDR
                                • Resume SRDF/S (DC2 to DC1)
                                • Restart the business workload
                                • Reestablish the SRDF/Star with AutoSwap environment (DC2
                                  to DC3)
                            ◆   Reconfigure to cascaded SRDF — Reconfigure a Concurrent
                                SRDF/Star with AutoSwap environment to a Cascaded
                                SRDF/Star with AutoSwap environment. The result is that site
                                DC1 is protected at site DC2 using SRDF/S and AutoSwap, and
                                DC2 is the source of the SRDF/A replication to DC3. The
                                workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC1 and DC3
                                • Perform a differential CREATEPAIR from DC2 to DC3
                                • Reestablish the SRDF/Star with AutoSwap environment (DC2
                                  to DC3)

Planned script operations
                            ◆   Swap production from DC1 to DC2 — Swap the DASD and the
                                single-site workload from the primary DASD site to the
                                secondary DASD site. In environments with geographically
                                dispersed Sysplexes supported by cross-site host-DASD
                                channels, Couple Dataset and Coupling Facility Structure
                                management is performed as a result of a site move, either
                                planned or unplanned. These actions are controlled by the
                                Realign Couple Datasets and Rebuild Couple Facility Structure
                                parameters:
                                • Stop the business workload at the primary DASD site
                                • Swap the DASD to the secondary DASD site (AutoSwap
                                  followed by SRDF/S personality swap)
                                • Resume SRDF/S



                                         • Restart the business workload
                                         • Reestablish the SRDF/Star with AutoSwap environment
                                     ◆   Swap DASD from DC1 to DC2 — Swap only the DASD from the
                                         primary DASD site to the secondary DASD site.
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC3 — IPL contingency systems at
                                         site DC3 using R2 devices:
                                         • Confirm that SRDF/A has been stopped normally via an
                                           SRDF/A PENDDROP
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                      ◆   Recover after unplanned swap — Restart processing at the
                                          secondary site after an unplanned swap has completed
                                          successfully:
                                         • Stop business workload on the old primary site (if applicable)
                                         • Trigger restart of business workload at the new primary site (if
                                           applicable)
                                         • Remove SRDF pairing between DC3 and the primary DASD
                                           site
                                         • Stop ConGroup on all systems
                                         • Perform a differential CREATEPAIR between the new primary
                                           DASD site and DC3
                                         • Start SRDF/A in MSC mode
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a local
                                          disaster (LDR) at the primary site. The following events,
                                          occurring concurrently, are considered a local disaster: loss of
                                          SRDF/A, loss of SRDF/S or a ConGroup trip, and a timeout of
                                          the primary site GDDR Heartbeat Monitor:
                                         • Confirm that a local disaster (LDR) event occurred



    • Deactivate systems at old primary site
    • Prepare the SRDF environment at secondary site
    • Create a consistent point on BCVs at DC3
    • Perform a differential CREATEPAIR between the new primary
      site and DC3
    • Start SRDF/A in MSC mode
    • Reestablish the SRDF/Star environment
    • Reestablish BCVs at DC3
    • Manage Couple Facility structures and datasets
    • Activate production systems
◆   Recover after loss of DC2 — In environments which support
    Cascaded SRDF mode, restore data protection for the primary
    site should the secondary site suffer an outage resulting in loss of
    SRDF/S and SRDF/A. This script is submitted by the GDDR
    Event Monitor when a ConGroup trip is detected and the GDDR
    Heartbeat Monitor at the secondary site has timed out. These
    events occurring concurrently are viewed by EMC GDDR as a
    secondary site disaster. Following the restoration of the
     secondary site, the script 'Resume replication after loss of DC2' is
    used to resume SRDF/S:
    • Provide the option to stop primary site and secondary site
      workloads since all remote protection is lost
    • Split BCVs at DC3
    • Remove the SRDF pairing between DC3 and the secondary
      DASD site
    • Perform a differential re-synchronization between the primary
      site and DC3
    • Start SRDF/A in concurrent MSC mode
    • Provide the option to start primary site workloads
    • Reestablish BCVs at DC3
◆   Resume replication after unplanned swap — Resume the SRDF/S
    link to the secondary DASD site after an unplanned swap (due to
    the loss of the primary DASD). The script appears in the 'Select
    Script to Run' panel of the current Master C-System after
    completion of the 'Recover after unplanned swap' script:
    • Remove the SRDF pairing between the secondary DASD site
      and the primary DASD site


                                         • Clean ConGroup bits from secondary DASD
                                         • Resume SRDF/S (from current primary DASD site to the
                                           previous primary DASD site)
                                         • Put SRDF/A into SRDF/Star with AutoSwap mode
                                     ◆   Resume replication after loss of DC1 — Resume the SRDF/S link
                                         to the secondary site after a primary site disaster and restart
                                         SRDF/A in SRDF/Star with AutoSwap mode. The script appears
                                         in the 'Select Script to Run' panel of the current Master C-System
                                         after completion of the 'Recover after loss of DC1 (LDR)' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the secondary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the secondary site
                                     ◆   Resume replication after loss of DC2 — In environments which
                                         support Cascaded SRDF mode, resume the SRDF/S link to the
                                         primary site after a secondary site disaster and restart SRDF/A in
                                         SRDF/Star with AutoSwap mode. The script appears in the
                                         'Select Script to Run' panel of the current Master C-System after
                                         completion of the 'Recover after loss of DC2' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the primary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the primary site

      Resumption operations
                                     ◆   Resume ConGroup — Resume consistency group protection
                                         following a ConGroup trip:
                                         • Resume the consistency group
                                         • Re-establish the BCVs at the secondary DASD site, if
                                           applicable
                                     ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                         contingency systems after a test at DC3:
                                         • Stop test business workload, if applicable
                                         • Reset clear test LPARs


                              • Reestablish the BCVs
                          ◆   Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A
                              link to DC3 (either from DC1 or DC2 depending upon where the
                              production workload is currently running) after a test on DC3:
                              • Reset clear all systems IPLed during the test of DC3 (at DC3)
                              • Deactivate all LPARs previously activated for the test of DC3
                                (at DC3)
                              • Restart SRDF/Star with AutoSwap to DC3
                          ◆   Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A
                              link to DC3 in MSC mode (from either DC1 or DC2 depending
                              upon where the production workload is currently running) after
                              a planned or unplanned swap.
                          ◆   Resume SRDF/A (SRDF/Star AutoSwap) to DC3 — Restore the
                              SRDF/A link to DC3 in SRDF/Star with AutoSwap mode (from
                              either DC1 or DC2) after a planned or unplanned stop of
                              SRDF/A.

DC3-specific operations
◆   Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 to become the new EMC GDDR Master C-System:
                              • Stop the business applications
                              • Wait for the stop of all business applications
                              • Reset clear all production systems managed by EMC GDDR
                              • Create a consistency point on the DC3 BCVs
                              • Create a consistency point on the secondary DASD BCVs
                              • Stop SRDF/Star with AutoSwap
                              • Stop ConGroup on all remaining systems
• Force the C-System at DC3 to become the new EMC GDDR Master C-System
                          ◆   Recover at DC3 after RDR at DC1 and DC2 — In the event of a
                              major failure that prevents the production workload from being
                              run from either DC1 or DC2, restart the production LPARs at site
                              DC3 and reestablish SRDF/A to site DC1 or DC2. There must be
                              an SRDF/A connection to either DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2





                                         • Activate all needed LPARs
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                     ◆   Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the
                                         event of a major failure that prevents the production workload
                                         from being run from either DC1 or DC2, restart the production
                                         LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2:
                                         • Attempt reset clear of all systems at DC1/DC2 (except
                                           C-System)
                                         • Activate all needed LPARs
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC1 or
                                           DC2
                                         • Start SRDF/A in MSC mode
                                     ◆   Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event
                                         of a local disaster (LDR) that prevents the production workload
                                         from being run at DC1, and DC2 is a bunker site (storage only),
                                         restart the production LPARs at site DC3 and reestablish SRDF/A
to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' script is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current:
                                         • Activate all needed LPARs including CFs at DC3
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC2
                                         • Start SRDF/A in MSC mode






The remainder of this TechBook describes the lab environment, documents the installation and configuration of GDDR, and walks through running the scripts with more detail on the steps each script executes.




5
                                                                      GDDR Managed
                                                                       Systems Design
                                                                       Considerations




This chapter contains the following:
◆   Overview ...........................................................................................   140
◆   Ignore, Reset Clear, or IPL in place................................................                   142
◆   Contingency systems and workload movement .........................                                    143
◆   HMC-Only systems (new in GDDR 3.1).......................................                              144
◆   LPAR and CEC recovery .................................................................                146
◆   Automated Configuration Check - DASD ...................................                               149








      Overview
One of the many decisions that must be made as part of a GDDR implementation is whether or not GDDR should manage production systems, production LPARs and/or production CPCs. It is important to understand that systems management in GDDR is optional. Some customers choose to have GDDR automation extend into managing zSeries systems, LPARs and workloads; other customers want some systems managed by GDDR and others excluded; and some customers do not want GDDR to do anything to their environment other than manage EMC infrastructure software such as SRDF, TimeFinder and AutoSwap. The GDDR systems management functions are flexible and can accommodate this wide-ranging set of customer operational requirements.
                                   If a decision is made for GDDR to manage system recovery beyond
                                   just the storage layer, a further decision needs to be made whether to
                                   include systems management into the scope of the implementation
                                   project or defer systems management to a later date. Deferring
                                   systems management to a second phase of implementation does
                                   make the implementation project simpler and shorter. When a
                                   decision is made to implement GDDR systems management, the
                                   following features/functions must be considered for each system to
                                   be managed by GDDR:
                                   ◆   Ignore, Reset Clear or IPL in place
                                   ◆   Contingency systems and workload movement
                                   ◆   HMC Only
                                   ◆   LPAR and CPC recovery
                                   Production systems under GDDR control are known as P-Systems or
                                   sometimes “managed systems.” These are the customer systems that
                                   run application workload.
Commencing with GDDR 3.1, EMC began introducing more functionality into CEC, LPAR and system recovery options. Some customers use GDDR to manage their P-Systems and others choose to manage these systems themselves. GDDR has been architected to provide significant flexibility when it comes to P-System management.







For customers that have more complex environments, it helps to build a GDDR managed system recovery map similar to Table 6 on page 141. This documents the actions that GDDR is to take for event management on any of the managed systems.

                Table 6         GDDR managed system recovery map

LPAR NAME          E01           E02           E03           E04           F01           F02           F03

Location           DC1           DC1           DC1           DC1           DC2           DC2           DC2

CPC                SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E
                   IBM           IBM           IBM           IBM           IBM           IBM           IBM
                   2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04
                   #012345       #012345       #012345       #012345       #6789A        #6789A        #6789A

Workload type      Production    Production    Development   Test          Production    Production    Development

Sysplex Name       Prod          Prod                                      Prod          Prod

System Name        MVSA          MVSB          MVSC          MVSD          MVSE          MVSF          MVSG

Desired State      UP            UP            UP            UP            UP            UP            UP

Recover            Yes           Yes           Yes           No            Yes           Yes           Yes

Recovery type      Workload      Workload      Recovery      Reset Clear                               IPL in place
                   Move          Move          LPAR

Recovery location  F01           F02           F03

Activate CBU                                                               Y             Y             N








      Ignore, Reset Clear, or IPL in place
When a system is defined to GDDR systems management, a decision must be made regarding the GDDR treatment of a failure of the defined system. The options are (a conceptual sketch of how such a per-system choice might be dispatched follows the list):
◆   Ignore — The system is defined to GDDR, but this instructs GDDR to take no action if GDDR observes that this system has failed. Failure is typically "noticed" by GDDR when the EMC SCF product de-registers the system, typically because of a polling time-out.
◆   Reset Clear — If GDDR detects that the system has crashed (such as SCF de-registration), GDDR will access the HMC and RESET CLEAR the LPAR but take no other action.
◆   IPL in place — If GDDR detects that the system has crashed, GDDR will access the HMC, RESET CLEAR the system and then IPL in place.
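The following REXX fragment is a minimal sketch of how such a per-system failure action might be dispatched. It is not GDDR code: the action table, the system names and the messages are invented for illustration, and a real implementation would drive the HMC through GDDR itself.

/* REXX - illustrative dispatch of the three failure options.       */
/* Not GDDR code: action table, system names and messages invented. */
action.     = 'IGNORE'                 /* default: take no action   */
action.MVSC = 'RESETCLEAR'
action.MVSG = 'IPLINPLACE'

parse upper arg system .               /* system reported as failed */
select
   when action.system = 'IGNORE'     then ,
      say 'No action taken for' system
   when action.system = 'RESETCLEAR' then ,
      say 'Reset Clear the LPAR of' system 'via the HMC'
   when action.system = 'IPLINPLACE' then do
      say 'Reset Clear the LPAR of' system 'via the HMC'
      say 'IPL' system 'in place'
   end
end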








Contingency systems and workload movement
Contingency systems are "hot standby" systems that are IPLed and active and (typically) are located in a different data center. Often, contingency systems are the "remote half" of a geographically separated, single-site workload, parallel Sysplex. In the event of a primary site failure (for example, system MVSA fails in LPAR E01 in Table 6 above), GDDR triggers a workload restart in system MVSE at site DC2 in LPAR F01. GDDR does this by sending message GDDX191I to the MVSE console. A message rule must be written and executed on MVSE to intercept the GDDX191I message and fire a REXX exec that carries out the actions necessary to start the production workload. This message-interception rule would be written for whatever automation package runs on MVSE; an illustrative sketch follows.
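As a minimal sketch, assuming the automation package passes the intercepted message text to a REXX exec, such a rule could look like the following. The exec logic and the started task names are placeholders, not product-supplied names.

/* REXX - illustrative GDDX191I interception exec (hypothetical).   */
/* Assumes the automation product on MVSE invokes this exec with    */
/* the intercepted message text as its argument.                    */
parse arg msgtext
if pos('GDDX191I', msgtext) = 0 then exit 0   /* not our message    */

cmd.1 = 'S PRODDB2'                  /* restart database manager    */
cmd.2 = 'S PRODCICS'                 /* restart transaction regions */
cmd.3 = 'S PRODSCHD'                 /* release the batch scheduler */
cmd.0 = 3

do i = 1 to cmd.0
   /* In a real rule the command would be issued through the        */
   /* automation product's command interface; SAY is used here      */
   /* purely for illustration.                                      */
   say 'Would issue:' cmd.i
end
exit 0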








      HMC-Only systems (new in GDDR 3.1)
For systems defined as "HMC-Only," GDDR can be used in place of native access to the HMC to perform system actions. Interaction with HMC-Only systems is obtained through the "LPAR Actions" panel, shown in Figure 39 on page 144, by choosing option "L".




                      Figure 39    GDDR Actions Menu

After choosing option "L", a screen similar to Figure 40 on page 145 is displayed. Systems defined as HMC-Only will appear. The "T" column shows the GDDR system type; OS in the display indicates that this is an "HMC-Only" system with Standard IPL parms. For a full description of the fields in this display, refer to the GDDR product guides.








Figure 40   GDDR Perform HMC LPAR Actions

All of the functions displayed on this screen are protected by SAF profiles. This offers more control than is available from direct HMC access. Customers choose to add "HMC-Only" systems so that they have tighter control over access to these powerful system commands.








      LPAR and CEC recovery
GDDR can tolerate the loss of a C-System. In the event of a C-System failure, GDDR will report "degraded mode" errors periodically until the C-System rejoins the GDDR complex. In the case of the loss of the GDDR Master C-System, another GDDR C-System will take over the master function. Again, "degraded mode" will be reported until the missing C-System returns. LPAR Recovery can be specified for GDDR C-Systems, which may help minimize the amount of time GDDR operates in "degraded mode."
                                   For production systems, the following options are available:
                                   ◆   Ignore
                                   ◆   IPL in place
                                   ◆   Reset Clear
                                   ◆   Start workload in contingency systems.
The "desired state" indicator can be useful if a system needs to be down for maintenance, ensuring that GDDR does not kick into action. If SCF is stopped normally on that system, GDDR will not react anyway: SCF controls the de-registration process, so GDDR will know that the system has been shut down in a controlled manner.
                                   For managed systems, the location indicator is either set to “H”
                                   indicating “H”ome location for this system or “A” indicating
                                   “A”way, meaning that this system is operating in its recovery
                                   location. It is possible that the Home and Away locations are in the
                                   same data center, but commonly the Away location is a different data
                                   center.
For managed systems, the following should be considered (a short sketch of the Home/Away and SITE rules follows the list):
                                   ◆   Multi-site workloads do not have CONTingency systems.
◆   A system can have LPAR-RECOVERY specified, indicating that the workload can be restarted in a different LPAR from the one where the system is currently running. The recovery LPAR can be in the same or a different data center.
◆   Recovery LPARs are defined to GDDR as REGULAR LPARs, but are defined a second time, on the system Recovery Attributes panel, as the recovery for a protected system.







                   ◆   A protected system can have a CONTingency system as a
                       Recovery LPAR
                   ◆   Sysplex systems with more than one LPAR can have Recovery
                       LPARs defined, which could be in the primary or secondary data
                       center.
◆   Systems protected with LPAR Recovery are either Home or Away, but their SITE value does not change when they are "Away".
                   ◆   Systems NOT protected by LPAR Recovery can (optionally) have
                       a second regular LPAR defined for them, on a different site. These
                       second LPARs could even be the home location of a system
                       protected with LPAR_Recovery.
◆   The difference between a Recovery LPAR definition and a second LPAR definition is subtle:
                       • The second LPAR definition is NOT considered as a recovery
                         option, and will never lead to a system running in that
                         location to be displaced.
                       • The SITE value for a system NOT protected with
                         LPAR_RECOVERY changes depending on whether it is
                         running in its first or second LPAR.
◆   LPAR Recovery for C-Systems avoids degraded mode when a C-System is lost.
◆   GDDR LPAR Recovery supports repeated failures: an LPAR in the Away location is protected with LPAR Recovery back in the Home location.
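As a minimal sketch of the Home/Away and SITE rules above (this is not GDDR code; the argument names are invented for illustration):

/* REXX - illustration of the Home/Away and SITE rules listed above.*/
/* protected   : YES if the system has LPAR_RECOVERY                */
/* runninglpar : LPAR the system is currently running in            */
/* homelpar    : the system's home LPAR                             */
/* homesite/othersite : sites of the home and second/recovery LPAR  */
parse upper arg protected runninglpar homelpar homesite othersite
if protected = 'YES' then
   site = homesite            /* SITE never changes, even when Away */
else
   if runninglpar = homelpar then site = homesite
   else site = othersite      /* SITE follows the second LPAR       */
say 'Reported SITE:' site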


Planned CPC Swap   This is a new planned script, triggered from the "Perform HMC LPAR Actions" panel. That panel also received several other changes in GDDR 3.2:
                   ◆   Action X LOAD Recovery: for a system protected with LPAR
                       recovery only
                   ◆   E show events: queries GDDR global variables for events related
                       to a system
                   ◆   T Toggle Desired state
◆   Site column: now indicates the site of the LPAR where the system currently runs
◆   T Type of IPL parameters: S: STD, A: Alternate, B: BCV





◆   IPL parms shown as "*" (not new): IPL parms are only shown if their definition matches the current primary DASD site








Automated Configuration Check - DASD
Traditionally, GDDR parameter validation has been largely limited to checking syntax and internal consistency, with relatively few verifications against real-world data.
GDDR 3.2 introduced the Automated Configuration Check—DASD function with the GDDRACCD utility.
One of the more challenging parts of defining an environment to GDDR has always been ensuring the correct definition of large and complex SRDF configurations.
GDDRACCD removes the uncertainty about the correctness of an existing GDDR SRDF device definition by checking the real configuration found in the GDDR-managed EMC Symmetrix storage.


GDDR variables influencing the expected configuration
GDDRACCD works by formulating a detailed expectation of the configuration based on a series of GDDR global variables. This expectation is then compared against what is actually found in the array. The check runs in five phases:
Phase 1
Validate the GDDR parameters for internal consistency and exit if errors are found: there is no point querying the array if the parameters are already known to be bad.
Phases 2 and 3
Query the array and report deviations from the expected configuration.
Phase 4
Provide the information that allows the user to verify what GDDRACCD based its decisions on.
Phase 5
Generate RDF.DEVICES parms that allow comparison to the existing parms, although differences are to be expected:
◆   Real parms could be a subset of generated parms
◆   Ranges in real parms could be a split of generated ranges
◆   Gatekeepers will certainly be different
A conceptual sketch of this kind of expected-versus-found comparison follows.
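The following REXX fragment is a minimal conceptual sketch of an expected-versus-found comparison. It is not the GDDRACCD implementation; the device lists are invented, and in practice the expected list would be built from GDDR global variables and the found list from a query of the array.

/* REXX - conceptual expected-versus-found device comparison.       */
/* Not GDDRACCD; device numbers below are invented.                 */
expected = '00E1 00E2 00E3 011A'      /* devices GDDR expects       */
found    = '00E1 00E3 011A 011B'      /* devices the query reports  */

missing = ''
extra   = ''
do i = 1 to words(expected)
   if wordpos(word(expected, i), found) = 0 then ,
      missing = missing word(expected, i)
end
do i = 1 to words(found)
   if wordpos(word(found, i), expected) = 0 then ,
      extra = extra word(found, i)
end
if missing <> '' then say 'Defined to GDDR but not found :' strip(missing)
if extra   <> '' then say 'Found but not defined to GDDR :' strip(extra)
if missing = '' & extra = '' then say 'Definitions match the configuration'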







6
                                                    Storage Infrastructure
                                                   Design Considerations




This chapter contains the following:
◆   Storage configuration and layout ..................................................            152
◆   Infrastructure recommendations ...................................................             153
◆   EMC software levels ........................................................................   158
◆   Sysplex and coupling facility considerations...............................                    160








      Storage configuration and layout
                                       Symmetrix storage systems incorporate many standard features that
                                       ensure a higher level of recoverability and availability than other
                                       storage systems. They are configurable at the logical volume level so
                                       different protection schemes can be applied to different classes of
                                       data within the same Symmetrix storage system on the same physical
                                       device. Customers choose data protection options, such as the
                                       following, to match their data requirements:
                                       ◆       Mirroring (RAID 1) or RAID 10
                                       ◆       RAID 6 (6+2) and RAID 6 (14+2)
                                       ◆       RAID 5 (3+1) and RAID 5 (7+1)
                                       ◆       Symmetrix Remote Data Facility (SRDF)
                                       ◆       TimeFinder
                                       ◆       Dynamic Sparing
                                       ◆       Global Sparing








Infrastructure recommendations
There is benefit in standardizing configuration best practices across the installed base of GDDR worldwide. This makes the support function more efficient, in that support personnel will be able to provide guidance aided by common naming conventions and infrastructure layout. The recommendations in this section are also designed to minimize the overheads associated with large infrastructure deployments. A good example is the number of device ranges: having very small device ranges increases the overheads associated with infrastructure manipulations, consuming more C-System memory, more worker tasks, more commands issued to the arrays, and so on. Engaging the GDDR Solution Support team very early in your design phase will ensure an appropriate trade-off between configuration flexibility and associated overheads.


Gatekeeper device list
A gatekeeper device is a reserved device used to direct commands to a particular array. During certain activities, such as data center swaps, the volume of commands issued to these devices can be quite high. GDDR does not require gatekeeper devices for itself, but some of the underlying technologies do. For performance and to avoid potential conflicts, gatekeeper devices for SRDF/S, SRDF/A, MSC/Star and so on must be defined and must be separate devices.
It is very important to understand the layout of gatekeeper devices and which component uses which gatekeepers. It is recommended to create and maintain charts similar to these for each array in the solution. This example shows the SRDF gatekeeper devices and RDF groups for an SRDF/Star with AutoSwap configuration with one VMAX frame located at each site. The list of SRDF gatekeeper devices and RDF groups used at each location describes the intended purpose for use in commands. The required Host Component (H.C.) command prefix is also provided to help facilitate command generation for each specific RDF group. This additional information can be helpful in a configuration where multiple SRDF Host Component tasks are present.







DC1 Complex:

        GDDRPARM                    Gatekeeper = 803D-8075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2455:

        SRDF/S - STAR:              Group = 10

        R/W - AD:                   Group = 21

        SRDF/A - Concurrent STAR:   Group = 20

        MSC/STAR                    Gatekeeper = 8037 (00DB)

        SRDFA/MSC                   Gatekeeper = 8076 (011A)

        SCF                         Gatekeeper = 8038-803A (00DC-00DE)

        CSC                         Gatekeeper = 803B-803C (00DF-00E0)


DC2 Complex:

        GDDRPARM                    Gatekeeper = 903D-9075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2457:

        SRDF/S - STAR:              Group = 10

        STAR - Recovery:            Group = 30

        MSC/STAR                    Gatekeeper = 9037 (00DB)

        SRDFA/MSC                   Gatekeeper = 9076 (011A)

        SCF                         Gatekeeper = 9038-903A (00DC-00DE)

        CSC                         Gatekeeper = 903B-903C (00DF-00E0)







DC3 Complex:

        GDDRPARM                    Gatekeeper = 803D-8075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2465:

        SRDF/A - Concurrent STAR:   Group = 20

        R/W - AD:                   Group = 21

        STAR - Recovery:            Group = 30

        SRDFA/MSC                   Gatekeeper = 8076 (011A)

        SCF                         Gatekeeper = 8038-803A (00DC-00DE)

        CSC                         Gatekeeper = 803B-803C (00DF-00E0)



Started task names   EMC recommends that the following conventions be adopted for started task names.
                      STCs in DC1
                      GDDR managed Production LPARs:
                      ◆   EMCBASE
                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                      ◆   EMCCGRP
                          • SYSISVP.GDDR.PARMLIB(CONGRP34)
                      ◆   EMCSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                      ◆   GDDRMAIN
                          • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                      GDDR Control LPAR Customer-Specific-Name:
                      ◆   GDDRSCF
                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                      ◆   GDDRSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)





                                       ◆       GDDRCGRP
                                               • SYSISVP.GDDR.PARMLIB(CONGRP34)
                                       ◆       GDDRMAIN
                                               • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                                       ◆       GDDREVM
                                               • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                                       ◆       GDDRHBM
                                               • SYS1.GDDR.PARMLIB(GDDRHBMP)
                                       ◆       GDDRWORK
                                       ◆       GDDRPARMS
                                               • SYSISVP.GDDR.PARMLIB
                                       STCs in DC2
                                       GDDR Control LPAR Customer-Specific-Name:
                                       ◆       GDDRSCF
                                               • SYSISVP.GDDR.PARMLIB(SCFINI00)
                                       ◆       GDDRSRDF
                                               • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                                       ◆       GDDRCGRP
                                               • SYSISVP.GDDR.PARMLIB(CONGRP34)
                                       ◆       GDDRMAIN
                                               • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                                       ◆       GDDREVM
                                               • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                                       ◆       GDDRHBM
                                               • SYSISVP.GDDR.PARMLIB(GDDRHBMP)
                                       ◆       GDDRWORK
                                       ◆       GDDRPARMS
                                               • SYSISVP.GDDR.PARMLIB
                                       STCs in DC3
                                       GDDR Control LPAR Customer-Specific-Name:
                                       ◆       GDDRSCF





                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                     ◆    GDDRSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                     ◆    GDDRMAIN
                          • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                     ◆    GDDREVM
                          • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                     ◆    GDDRWORK
                     ◆    GDDRHBM
                          • SYSISVP.GDDR.PARMLIB(GDDRHBMP)


Important dataset names
The following is a list of the dataset names which are used by the various LPARs for GDDR control. These are the names as supplied by EMC; regardless of the naming convention used, it is important to ensure that all support staff are familiar with it:
                     ◆    SYSISVP.GDDR.*
                     ◆    SYS1.GDDR.*
                     ◆    SYS1.EMC.*








      EMC software levels
                                        Maintain a list of software levels, as shown in Table 7.

                         Table 7        Software levels

                                         Customer-Specific-Name SOFTWARE LEVELS

                         Updated                                                   8/17/2010

                         Mainframe Enablers                   Version              Version - Fix Level

                         Resource Pack Base                   7.x                  7.x - SF7xxxx

                         Consistency Group                    7.x                  7.x - SC7xxxx

                         SRDF Host Component                  7.x                  7.x - SR7xxxx

                         TimeFinder Mirror                    7.x                  7.x - ST7xxxx

                         TimeFinder Snap                      7.x                  7.x - SN7xxxx

                         TimeFinder Utility                   7.x                  7.x - SU7xxxx

                         GDDR                                 3.x                  3.x - GD3xxxx

                         Microcode

                         VMAX base code                       5874.230.183

                         VMAX E-Packs                                              RB/23 with 50853, 50854

                         DMX-3/DMX-4 base code                5773.155.107

                         E-Packs                                                   RB/53 with 49123, 49125



      LPAR configuration specifics
These are guidelines/best practices for GDDR C-System LPAR configurations.
Each GDDR LPAR requires:
                                        ◆      2 GB Storage
                                        ◆      2 Shared CPs
                                        ◆      Minimum 25 Mod-9 DASD Units (or equivalent if different
                                               volume sizes are used)







GDDR LPAR Configuration Restrictions and Considerations:
◆   GDDR Control LPAR DASD volumes are not shared with any
    LPAR
◆   No Coupling Facility or Specialty Engine Requirements.
◆   No DR Requirements for GDDR Control LPARs.
◆   Initial GDDR LPAR to be modeled from CMCX LPAR
◆   Maintain recommended EMC Symmetrix Microcode Levels
◆   Initial Security Databases created from customer-environment
    LPAR database
◆   LPAR Access Restricted to Infrastructure Support Staff
GDDR - Required Automation and Software Elements:
◆   z/OS 1.x Operating System
◆   CA/ACF2 Security
◆   EMC SRDF Product Suite
Example customer Infrastructure Management Components:
◆   IOC Console Operations Support / Automation (Tivoli Netview,
    SA for z/OS)
◆   Storage (SMS, DFHSM DASD Space Mgmt, Tape Mgmt)
◆   Network (FTP, Firewall, TN3270, NDM)
◆   Systems Mgmt (Incident, Change (ECMS))
◆   Change Mgmt (Version and Change Control)
◆   SMF and Logrec Data Collection (Security, Billing, Performance)
◆   Output Archival / Retrieval / Spool Mgmt (product)
◆   Infrastructure Job Scheduling (Data Collection, Infrastructure Backups)
Cross System Communications (CSC) Requirements:
Update the SCFINIxx member in SYSISVP.GDDR.PARMLIB with the following parameters on all LPARs (PROD and GDDR C-Systems):
◆   SCF.CSC.IDLEPOLL=5
◆   SCF.CSC.EXPIRECYCLE=20
◆   SCF.CSC.MITPERIOD=60








      Sysplex and coupling facility considerations
                                       GDDR operations, system management and enablement of data
                                       consistency are compliant with z/OS Sysplex availability, design and
                                       operations. GDDR extends Sysplex availability by addressing a wide
                                       range of planned and unplanned operations for Sysplex systems,
                                       DASD resources, Sysplex resources, Central Processors, processor
                                       capacities and sites.
Sysplex systems require a single time source and synchronous mirroring between the source DASD and recovery DASD. Asynchronous mirroring does not support Sysplex operations; therefore, Sysplex linkage between asynchronous operational sites is not a consideration for sites using SRDF/A replication, either as a two-site GDDR SRDF/A or to the third site in STAR configurations.
Sysplex configuration is a key component of the business continuity capability of a proposed solution. The design of the Sysplex, its required connectivity and the topology of shared Sysplex resources all have a role in the solution's redundancy, performance and business continuity. Typically, solutions designed for higher availability and continuous availability have increasingly complex Sysplex considerations. In contrast, GDDR CONGROUP solutions designed for site switch and system restart require very modest Sysplex considerations between the sites.
GDDR's C-Systems do not have any Sysplex operational requirement. The C-Systems are stand-alone systems outside any of the managed Sysplexes. This isolates the C-Systems from Sysplex-wide issues, simplifies C-System implementation, and allows the C-Systems to manage multiple Sysplexes simultaneously. It also allows asynchronous mirroring solutions, and some synchronous mirroring solutions, to operate without any cross-site Sysplex connectivity and without degradation of Recovery Time Objectives.
When data is shared within a Sysplex, data consistency is presumed, and GDDR's architecture should align with Sysplex requirements for data consistency. In most instances, a Sysplex's mirrored DASD resources required for the restart of systems and workload should belong to a single CONGROUP. There is no limitation on the number of Sysplexes encompassed by a single CONGROUP and managed by GDDR.







Significant variations in the required Sysplex connectivity depend on the Sysplex topology, the models of the zSeries hardware, the Sysplex timing mechanisms and the desired level of business continuity. Careful planning of the cross-site linkages, supported distances, sufficient redundancy and considerations for performance impact must be part of the early design and project planning.
Typically, when architecting a multi-site Sysplex, DWDMs are employed to coalesce, aggregate and manage the Sysplex connectivity as well as the cross-site DASD and mirroring connectivity. Solutions including AutoSwap are easily facilitated by cross-site DASD connectivity over DWDMs. Furthermore, many of the zSeries linkage protocols cannot employ channel extension, requiring dark fiber or DWDM connectivity.

When the solution includes AutoSwap, regardless of the Sysplex topology, all DASD that is required to stay up:
◆   Must be mirrored in SRDF/S
◆   Must be available to the systems at DC1 and DC2
◆   Must have all R1 devices on one site and all R2 devices on the opposite site
◆   If it has a UCB on the R1 side, must also have a UCB on the R2 side
When employing Sysplex LOGR, the Sysplex LOGGER data sets should be mirrored with SRDF/S and SRDF/A in SRDF/Star configurations.
Single Site Sysplex
All active production systems reside at the primary site; the systems comprising the Sysplex cannot span DC1 and DC2. This is usually representative of GDDR CONGROUP or GDDR SRDF/A solutions where a complete Sysplex restart is required to restart systems and workload at the recovery site.
No Sysplex connectivity is required between sites, Sysplex time is not an issue, and duplexing of structures between sites is not indicated; structures would be rebuilt when the systems are IPLed at the recovery site. Restart of systems can be facilitated by the occasional mirroring of system resources, including the Sysplex couple data sets. The Sysplex couple data sets should not be mirrored using SRDF/S and should not be in the CONGROUP (consider mirroring with







Adaptive Copy). Alternatively, with minimal cross-site DASD connectivity, the Sysplex couple data sets can be mirrored with Sysplex couple data set duplexing.
In the event of a CONGROUP trip, GDDR does not attempt to stop all systems in order to preserve the consistency of the target SRDF/S DASD and the contents of the target duplexed Coupling Facility structures; this technique has only limited value during the restart of systems.
                                       Multi-Site Sysplex
                                       This architecture is consistent with solutions requiring high or
                                       continuous levels of availability, when some subset of operations is
                                       expected to continue in the event of systems, DASD, site failures or
                                       multiple failures.
The zSeries systems hosting z/OS systems in the multi-site Sysplex at both sites must share a single time consistency in a Coordinated Timing Network (CTN). The CTN topology, redundancy and protocols depend upon the timer technology employed. When employing STP, placement of the PTS and BTS is an important consideration.
Base Sysplex requires cross-site CTC connectivity between all zSeries CPCs, a single time consistency and common Sysplex couple data sets.
Parallel Sysplex employs Coupling Facilities to facilitate system-to-system communications. CFs can be Internal or External Coupling Facilities, and all systems require connectivity to the CFs; these Sysplex links should be redundant. CFs should be deployed in a manner consistent with the business continuity objectives. If structures are duplexed, the duplexing links should also be redundant.
                                       AutoSwap is usually included in the implementation to eliminate
                                       DASD as a single point of failure. If so, all DASD required for
                                       operations must be mirrored. The Sysplex couple data sets must not
                                       be mirrored with SRDF/S and must rely upon Sysplex couple data
                                       set duplexing. Cross site DASD connectivity consistent with cross
                                       site operations is typically a consideration.




7
                                                                     Best Practices for
                                                                      Network Design




This chapter contains the following:
◆   Overview ...........................................................................................   164
◆   Understanding bandwidth requirements.....................................                              165
◆   Synchronous network design considerations ..............................                               168
◆   Asynchronous network design considerations ...........................                                 172








      Overview
After the synchronous and/or asynchronous bandwidth assessment has been conducted, attention turns to the design and construction of the network. There are many different types of networks, but for this discussion we differentiate between networks designed to carry synchronous workloads and those designed to carry asynchronous workloads.
Delays in the provisioning, debugging and load testing of network infrastructure are one of the most common causes of delay for GDDR projects. Remember the constraint framework for the project: if your network provisioning is late, be prepared to adjust the implementation timelines around new and realistic dates.








Understanding bandwidth requirements
Before a network topology can be designed, the data change rate must be measured. This change rate will be used to determine both synchronous and asynchronous network bandwidth requirements. EMC TS has toolsets and methods that are used to measure existing I/O flows to construct data replication requirements over time.
Replication requirements not only vary throughout the 24-hour day (batch versus online) but also throughout the month and year, with most enterprises having at least one peak processing period. Therefore the data sampling for modeling must span a critical time period or at least be "factored" from the observation period based on the known difference between the observation period and the peak period.
Both SRDF/S and SRDF/A use block-level I/O for replication, therefore the collected data can be used for both modeling exercises. The actual bandwidth requirement will typically vary between the two types of networks (even for combined synchronous and asynchronous). For HA networks, page volumes and temporary data sets are included in the replication set; for asynchronous networks these data types are typically excluded since, in the event that the DR site copy is required, the contents of the page volumes and temporary data set volumes are not needed for system restart.

Note: Figure 41 on page 166 shows a measured workload with a significant write I/O imbalance across the three Symmetrix arrays. Part of the deployment of the project will be to move workloads between the arrays to achieve a more balanced I/O profile.








                         Figure 41   Write distribution

The raw (uncompressed) bandwidth requirement is determined by multiplying the number of writes observed in a specific interval by the size of the writes. Earlier design tools used the Connect Time to determine the transmission interval and then calculated the block size. This method does not work for FICON-attached arrays, so an average block size is determined by dividing the total bytes transferred in the time interval by the number of writes observed. The method also varies based on the data source, such as SMF records versus internal EMC Symmetrix STP data.
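As a minimal sketch of this arithmetic (the figures below are invented; in practice they would come from SMF or Symmetrix STP data for the peak interval):

/* REXX - raw bandwidth estimate for a single observation interval. */
/* Input figures are invented for illustration only.                */
numeric digits 12
writes  = 54000            /* write I/Os observed in the interval   */
bytes   = 1512000000       /* total bytes written in the interval   */
seconds = 900              /* interval length (15 minutes)          */

avgblk = bytes / writes                   /* average block size     */
mbsec  = (bytes / seconds) / (1024*1024)  /* raw MB/s requirement   */
say 'Average block size:' format(avgblk,,0) 'bytes'
say 'Raw bandwidth     :' format(mbsec,,1) 'MB/s (uncompressed)'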
For synchronous workloads, performance is critical. Therefore this analysis includes a check for "hot volumes"; if any are detected, a list is produced of the volumes where the overhead of synchronous replication is likely to be problematic. Further analysis of these volumes is required to determine the contents of each volume and identify mitigation techniques, which may include:
                                     ◆   Re-blocking data sets





            ◆   Separating hot data sets
            ◆   Deploying SMS data striping
            ◆   Changing ACS routines
            ◆   Balancing workload across Symmetrix arrays
At the conclusion of the study period, a report is produced which includes a profile of the observed bandwidth requirement, as shown in Figure 42 on page 167.

Note: Many customers are accustomed to working with their local account teams to size infrastructure deployments. In most cases, however, an account team will not have experience designing a solution of this complexity. The GDDR Solution Support Team can help direct EMC Corporate resources to ensure that this vital piece of supporting infrastructure meets best practice standards.




Figure 42   Bandwidth requirement




      Synchronous network design considerations
All array-based synchronous data replication adds overhead to write response times. For this reason it is essential to understand the impact of synchronous data replication before building out any infrastructure. EMC has sophisticated modeling tools that apply the overheads of SRDF/S synchronous mirroring at zero distance and then add latency into the model as the distance between the primary (R1) and secondary (R2) storage is increased. At some point in these modeling studies the maximum separation between the R1 and R2 becomes known: the point at which the increased I/O response time begins to slow write throughput so that the observed write activity to the volume can no longer be sustained. Post implementation, SRDF/S overheads can be seen as “disconnect” time when looking at I/O response times.
Figure 43 on page 169 shows one part of a typical synchronous design study output: DASD volumes sorted in descending order of write activity. Looking more closely at volume CI9003, it can be seen to have a peak write activity of 300 writes per second. To sustain 300 serialized writes per second, this volume can average no more than 1 second / 300 writes = 3.3 milliseconds per write. A separate analysis (not shown) reveals the average response time to this volume to be 0.5 milliseconds. Subtracting 0.5 from 3.3 shows that the overhead of synchronous replication cannot add more than 3.3 - 0.5 = 2.8 milliseconds to this I/O before the application would experience performance degradation (because the throughput requirement of 300 writes per second cannot be maintained if the average response time exceeds 3.3 milliseconds).




Figure 43   DASD write activity by volser

Applying this logic to all volumes in the study group produces a chart similar to Figure 44 on page 170. The chart shows the existing throughput requirement (in blue) and the likely throughput requirement (in red) when synchronous overheads at the customer-specific distance between the R1 and R2 devices are applied to the existing write I/O response time.
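
The per-volume arithmetic above can be generalized as a simple screening calculation. The sketch below is illustrative only and assumes serialized writes to each volume; the volume names other than CI9003 and the 1.5 ms modeled overhead are hypothetical.

    def sync_headroom_ms(peak_writes_per_sec, avg_response_ms):
        """Maximum synchronous overhead (ms) a volume can absorb before its observed
        peak write rate can no longer be sustained.

        A volume driving N serialized writes per second cannot average more than
        1000/N ms per write, so headroom is that ceiling minus the response time
        already observed. A small or negative result flags the volume for tuning.
        """
        return (1000.0 / peak_writes_per_sec) - avg_response_ms

    def flag_volumes_for_tuning(volumes, modeled_sync_overhead_ms):
        """Return volsers whose headroom is smaller than the modeled SRDF/S overhead."""
        return [volser for volser, (writes, resp_ms) in volumes.items()
                if sync_headroom_ms(writes, resp_ms) < modeled_sync_overhead_ms]

    # CI9003 uses the figures from the text; the other volumes are hypothetical
    volumes = {
        "CI9003": (300, 0.5),   # 300 writes/s peak, 0.5 ms average response time
        "DB2L01": (650, 0.7),
        "BATCH9": (120, 1.2),
    }
    print(round(sync_headroom_ms(300, 0.5), 2))      # ~2.83 ms of headroom
    print(flag_volumes_for_tuning(volumes, 1.5))     # volumes at risk with 1.5 ms of modeled overhead
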




                         Figure 44   DASD Volumes Flagged for Tuning

                                     Tuning for synchronous replication can be done well in advance of
                                     the implementation of GDDR. The following techniques are typically
                                     used to address performance issues:
                                     ◆   Use sequential data striping
◆   Change the blocksize of the data set writes. Many of the overheads associated with synchronous replication, such as the speed-of-light delay, are fixed per write whether the block being transmitted is small or large. Larger block writes therefore move more data for each fixed overhead paid, and are more efficient in terms of write throughput under synchronous replication than smaller block sizes (see the sketch after this list).
                                     ◆   Separate “hot” data sets — Two or more write I/O intensive data
                                         sets can be moved from the same volser and separated.
                                     ◆   Change ACS routines — May be needed to ensure that separated
                                         data sets remain separated.
Most customers deploy a “dark fiber” network to support synchronous replication. The Network Terminating Equipment (NTE) for these networks is usually a Dense Wave Division Multiplexer (DWDM). DWDMs are able to concentrate multiple wavelengths onto the single fiber(s) laid between data centers. DWDMs themselves add very little latency, so the greatest write I/O performance impact is typically the speed-of-light transmission of data through the fiber, which is directly proportional to the length of the fiber route.
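
A back-of-the-envelope estimate of that distance effect can be made from the speed of light in fiber (roughly 200 km per millisecond). The sketch below assumes one protocol round trip per write; the actual number of exchanges depends on the replication protocol and link configuration, so treat the output as a lower bound on the added latency.

    FIBER_KM_PER_MS = 200.0   # speed of light in glass, roughly c / 1.47

    def propagation_overhead_ms(fiber_route_km, round_trips_per_write=1):
        """Estimated speed-of-light latency added to each synchronous write, in ms.

        Uses the fiber route length (not the straight-line distance) and assumes one
        round trip per write; DWDM equipment latency is ignored as negligible.
        """
        one_way_ms = fiber_route_km / FIBER_KM_PER_MS
        return 2 * one_way_ms * round_trips_per_write

    for km in (10, 50, 100):
        print(f"{km:>3} km fiber route: ~{propagation_overhead_ms(km):.2f} ms per write")
    # 10 km ~0.10 ms, 50 km ~0.50 ms, 100 km ~1.00 ms
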


The DWDM approach offers another advantage for customers looking to build HA infrastructures: in addition to the Fibre Channel packets generated by SRDF/S replication, the same fiber can carry the FICON signals necessary to support AutoSwap.
In some instances customers will build HA DASD infrastructures
where the R1 and R2 devices are in the same data center (perhaps
separated by a thermal firewall), in which case direct connect or
connect via cascading directors is common.




      Asynchronous network design considerations
Asynchronous replication is provided by SRDF/A. The interface to SRDF/A is EMC Multi-Session Consistency (MSC) software. MSC provides an API that GDDR uses to control SRDF/A both when it is running in “MSC mode” and when it is running in “Star mode”. The GDDR for SRDF/A configuration (two sites) controls SRDF/A replication through the MSC API, and the same API is used by GDDR to control GDDR for SRDF/Star solutions (three sites). For more specific information on these control interfaces, refer to the sections “SRDF/Star with AutoSwap” on page 85, “GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)” on page 119, and “GDDR for SRDF/A (Two sites - DC1 and DC3)” on page 123.
SRDF/A is designed as a cache-to-cache replication infrastructure. It typically supports cycle times down to three seconds, assuming the network infrastructure is designed to carry the workload. Since it is theoretically possible to lose up to two SRDF/A cycles in a DR event, the RPO is calculated by multiplying the SRDF/A cycle time by two; a 3-second cycle time is therefore said to support an RPO of 6 seconds (a small worked example follows the list below). It is important to understand that the RPO is an objective: at times it may not be possible for the replication infrastructure to maintain the objective time. Examples include program loops that generate large amounts of write I/O, unusually high write activity such as during database “reorgs,” and a compromised network (for example, a failed switch). EMC has instituted a number of measures to improve the resiliency of SRDF/A replication:
                                    ◆   Transmit Idle — Temporarily suspend SRDF/A data
                                        transmission to allow SRDF/A replication to survive temporary
                                        link problems such as link “bounces”.
                                    ◆   Delta Set Extension (DSE) — Elongates the SRDF/A capture
                                        cycle during longer duration temporary link problems, instead of
                                        terminating SRDF/A replication. DSE uses a disk pool to buffer
                                        the current capture cycle data until such time as it can be
                                        de-staged across the link.
                                    ◆   Write Pacing — Slowing the arrival rate of write activity into the
                                        capture cycle by accepting the write to cache, disconnecting for a
                                        specified period of time (milliseconds) and then reconnecting to
                                        complete the I/O.
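
The following sketch captures the RPO arithmetic described before the list, together with a simple sustainability check. It is a deliberately simplified steady-state view (no DSE, write pacing, or cycle elongation modeled) and the link and compression figures are assumptions.

    def estimated_rpo_secs(cycle_time_secs):
        """Up to two SRDF/A cycles can be lost in a disaster, so RPO ~= 2 x cycle time."""
        return 2 * cycle_time_secs

    def rpo_is_sustainable(peak_write_mbps, link_mbps, compression_ratio=1.0):
        """An RPO target only holds while the (compressed) peak write rate fits on the
        link; beyond that point cycles elongate and the resiliency features above
        merely buy time rather than preserve the objective."""
        return (peak_write_mbps / compression_ratio) <= link_mbps

    print(estimated_rpo_secs(3))                                                             # 6-second RPO
    print(rpo_is_sustainable(peak_write_mbps=900, link_mbps=1000, compression_ratio=1.5))    # True
    print(rpo_is_sustainable(peak_write_mbps=1800, link_mbps=1000, compression_ratio=1.5))   # False
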




None of these improvements should be assumed to be substitutes for provisioning the network to the observed peak bandwidth requirement. To build one or two years of operational resiliency into the network, it should be provisioned at the peak observed rate plus the annual rate of increase in observed throughput. This can be calculated from the known growth in storage; assuming a growth rate of 20% over the past 12 months, the allowance for capacity increase can be calculated as:
Bandwidth_growth = (Total_Write_IOs * I/O_Size * 20%) / Compression_Ratio
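
A minimal sketch of that provisioning calculation follows. It adds the growth allowance from the formula above to the observed (compressed) peak; the 20% growth rate, the compression ratio, and the sample workload are assumptions to be replaced with measured values.

    def provisioned_bandwidth_mbps(total_write_ios_per_sec, avg_io_size_bytes,
                                   annual_growth_rate=0.20, compression_ratio=2.0):
        """Bandwidth to provision, in Mb/s: observed compressed peak plus a growth allowance.

        Growth allowance = (Total_Write_IOs * I/O size * growth rate) / compression ratio,
        per the formula above, added on top of the observed peak.
        """
        peak_mbps = (total_write_ios_per_sec * avg_io_size_bytes * 8) / (compression_ratio * 1_000_000)
        return peak_mbps * (1 + annual_growth_rate)

    # Example: 25,000 write I/Os per second at an average of 4 KB per write
    print(f"{provisioned_bandwidth_mbps(25_000, 4_096):.0f} Mb/s")   # ~492 Mb/s
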
Many of the GDDR-supported configurations are very large. The designer must consider the scalability of the proposed network and also the ease of troubleshooting it. For mainframe workloads, EMC has observed FCIP networks to be reliable, scalable, and relatively easy to troubleshoot and tune. Consider operational complexity when looking to tune or balance network workloads. Gig/E networks are becoming more popular but introduce more complexity for workload balancing, and, if used, WAN optimizers can complicate the network architecture and impact the throughput, latency, and scalability of the network.
EMC has observed the following type of network topology to be very successful in large-scale SRDF/A implementations and therefore recommends it for SRDF/A replication. Figure 45 on page 174 shows 15 storage arrays in three sites (DC1, DC2 and DC3) deploying multiple switches to aggregate bandwidth up to 10 Gb/s for long-distance transmission and then splitting back through switches on the target side to the Symmetrix arrays at the remote location. Note how readily this architecture allows communications paths, storage devices, and switches to be scaled.




                         Figure 45   Recommended asynchronous network topology




Glossary




                   This glossary contains terms related to disk storage subsystems.
                   Many of these terms are used in this manual.

                   A
 alternate track   A track designated to contain data in place of a defective primary
                   track. See also ”primary track.”

       actuator    A set of access arms and their attached read/write heads, which
                   move as an independent component within a head and disk assembly
                   (HDA).

        adapter    Card that provides the physical interface between the director and
                   disk devices (SCSI adapter), director and parallel channels (Bus & Tag
                   adapter), director and serial channels (Serial adapter).

                   C
         cache     Random access electronic storage used to retain frequently used data
                   for faster access by the channel.

     cache slot    Unit of cache equivalent to one track.

channel director   The component in the Symmetrix subsystem that interfaces between
                   the host channels and data storage. It transfers data between the
                   channel and cache.




                  controller ID    Controller identification number of the director the disks are
                                   channeled to for EREP usage. There is only one controller ID for
                                   Symmetrix.

                          CKD      Count Key Data, a data recording format employing self-defining
                                   record formats in which each record is represented by a count area
                                   that identifies the record and specifies its format, an optional key area
                                   that may be used to identify the data area contents, and a data area
                                   that contains the user data for the record. CKD can also refer to a set
                                   of channel commands that are accepted by a device that employs the
                                   CKD recording format.

                                   D
                         DASD      Direct access storage device, a device that provides nonvolatile
                                   storage of computer data and random access to that data.

              data availability    Access to any and all user data by the application.

            delayed fast write     There is no room in cache for the data presented by the write
                                   operation.

                      destage      The asynchronous write of new or updated data from cache to disk
                                   device.

                        device     A uniquely addressable part of the Symmetrix subsystem that
                                   consists of a set of access arms, the associated disk surfaces, and the
                                   electronic circuitry required to locate, read, and write data. See also
                                   ”volume.”

              device address       The hexadecimal value that uniquely defines a physical I/O device
                                   on a channel path in an MVS environment. See also ”unit address.”

               device number       The value that logically identifies a disk device in a string.

                   diagnostics     System level tests or firmware designed to inspect, detect, and correct
                                   failing components. These tests are comprehensive and self-invoking.

                       director    The component in the Symmetrix subsystem that allows Symmetrix
                                   to transfer data between the host channels and disk devices. See also
                                   ”channel director.”

                  disk director    The component in the Symmetrix subsystem that interfaces between
                                   cache and the disk devices.


   dual-initiator   A Symmetrix feature that automatically creates a backup data path to
                    the disk devices serviced directly by a disk director, if that disk
                    director or the disk management hardware for those devices fails.

dynamic sparing     A Symmetrix feature that automatically transfers data from a failing
                    disk device to an available spare disk device without affecting data
                    availability. This feature supports all non-mirrored devices in the
                    Symmetrix subsystem.

                    E
         ESCON      Enterprise Systems Connection, a set of IBM and vendor products
                    that connect mainframe computers with each other and with attached
                    storage, locally attached workstations, and other devices using
                    optical fiber technology and dynamically modifiable switches called
                    ESCON Directors. See also ”ESCON director.”

 ESCON director     Device that provides a dynamic switching function and extended link
                    path lengths (with XDF capability) when attaching an ESCON
                    channel to a Symmetrix serial channel interface.

                    F
       fast write   In Symmetrix, a write operation at cache speed that does not require
                    immediate transfer of data to disk. The data is written directly to
                    cache and is available for later destaging.

            FBA     Fixed Block Architecture, disk device data storage format using
                    fixed-size data blocks.

             FRU    Field Replaceable Unit, a component that is replaced or added by
                    service personnel as a single entity.

          frame     Data packet format in an ESCON environment. See also ”ESCON.”

                    G
    gatekeeper      A small logical volume on a Symmetrix storage subsystem used to
                    pass commands from a host to the Symmetrix storage subsystem.
                    Gatekeeper devices are configured on standard Symmetrix disks.

              GB     Gigabyte, 10⁹ bytes.




                                   H
                head and disk      A field replaceable unit in the Symmetrix subsystem containing the
                    assembly       disk and actuator.

                home address       The first field on a CKD track that identifies the track and defines its
                                   operational status. The home address is written after the index point
                                   on each track.

                hyper-volume       The ability to define more than one logical volume on a single
                   extension       physical disk device making use of its full formatted capacity. These
                                   logical volumes are user-selectable in size. The minimum volume size
                                   is one cylinder and the maximum size depends on the disk device
                                   capacity and the emulation mode selected.

                                   I
                             ID    Identifier, a sequence of bits or characters that identifies a program,
                                   device, controller, or system.

                           IML     Initial microcode program loading.

                 index marker      Indicates the physical beginning and end of a track.

                   index point     The reference point on a disk surface that determines the start of a
                                   track.

                       INLINES     An EMC-provided host-based Cache Reporter utility for viewing
                                   short and long term cache statistics at the system console.

                   I/O device      An addressable input/output unit, such as a disk device.

                                   K
                              K    Kilobyte, 1024 bytes.

                                   L
           least recently used     The algorithm used to identify and make available the cache space by
               algorithm (LRU)     removing the least recently used data.

               logical volume      A user-defined storage device. In the Model 5200, the user can define
                                   a physical disk device as one or two logical volumes.



            long miss   Requested data is not in cache and is not in the process of being
                        fetched.

longitudinal redundancy    Exclusive OR (XOR) of the accumulated bytes in the data record.
            code (LRC)
                        M
                   MB    Megabyte, 10⁶ bytes.

        mirrored pair   A logical volume with all data recorded twice, once on each of two
                        different physical devices.

            mirroring   The Symmetrix maintains two identical copies of a designated
                        volume on separate disks. Each volume automatically updates
                        during a write operation. If one disk device fails, Symmetrix
                        automatically uses the other disk device.

                        P
          physical ID   Physical identification number of the Symmetrix director for EREP
                        usage. This value automatically increments by one for each director
                        installed in Symmetrix. This number must be unique in the
                        mainframe system. It should be an even number. This number is
                        referred to as the SCU_ID.

        primary track   The original track on which data is stored. See also ”alternate track.”

            promotion    The process of moving data from a track on the disk device to a
                         cache slot.

                        R
             read hit   Data requested by the read operation is in cache.

            read miss   Data requested by the read operation is not in cache.

          record zero   The first record after the home address.

                        S
           scrubbing    The process of reading, checking the error correction bits, and writing
                        corrected data back to the source.




                 SCSI adapter      Card in the Symmetrix subsystem that provides the physical interface
                                   between the disk director and the disk devices.

                     short miss    Requested data is not in cache, but is in the process of being fetched.

                           SSID    For 3990 storage control emulations, this value identifies the physical
                                   components of a logical DASD subsystem. The SSID must be a
                                   unique number in the host system. It should be an even number and
                                   start on a zero boundary.

                         stage     The process of writing data from a disk device to cache.

           storage control unit    The component in the Symmetrix subsystem that connects
                                   Symmetrix to the host channels. It performs channel commands and
                                   communicates with the disk directors and cache. See also ”channel
                                   director.”

                         string    A series of connected disk devices sharing the same disk director.

                                   U
                  unit address     The hexadecimal value that uniquely defines a physical I/O device
                                   on a channel path in an MVS environment. See also ”device address.”

                                   V
                       volume      A general term referring to a storage device. In the Symmetrix
                     subsystem, a volume corresponds to a single disk device.

                                   W
                       write hit   There is room in cache for the data presented by the write operation.

                     write miss    There is no room in cache for the data presented by the write
                                   operation.




180        GDDR Solution Design and Implementation Techniques

More Related Content

PDF
Why Networked FICON Storage Is Better Than Direct Attached Storage
PDF
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
PPTX
Disaster Recovery using Spectrum Scale Active File Management
PDF
z/OS Communications Server Overview
PDF
Desmontando la siciliana 2
PDF
IP Routing on z/OS
PPTX
Relación de confianza
PDF
Xilinx Data Center Strategy and CCIX
Why Networked FICON Storage Is Better Than Direct Attached Storage
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
Disaster Recovery using Spectrum Scale Active File Management
z/OS Communications Server Overview
Desmontando la siciliana 2
IP Routing on z/OS
Relación de confianza
Xilinx Data Center Strategy and CCIX

Viewers also liked (16)

PDF
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
PDF
NFC切手
PDF
Analyst Report: EMA - The Industrialization of Fraud Demands a Dynamic Intell...
 
PDF
Fotonovel·la tutorial adrià, roger i gerard
PDF
NDEF WriterとOSとPaSoRi
PPTX
psychology of old age
PDF
Hadoop Design Patterns
 
PPT
Snapshot of-juxt-india-online-landscape-2010-press
PPTX
Fri papal response to luther
PPTX
การนำเสนอโดยการเขียน
PPT
My evalutauion question 1
PDF
ものに愛着を持たせる
PPT
Money supply inflation
PPTX
летние мастер-классы в сидо
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
NFC切手
Analyst Report: EMA - The Industrialization of Fraud Demands a Dynamic Intell...
 
Fotonovel·la tutorial adrià, roger i gerard
NDEF WriterとOSとPaSoRi
psychology of old age
Hadoop Design Patterns
 
Snapshot of-juxt-india-online-landscape-2010-press
Fri papal response to luther
การนำเสนอโดยการเขียน
My evalutauion question 1
ものに愛着を持たせる
Money supply inflation
летние мастер-классы в сидо
Ad

Similar to GDDR Solution Design and Implementation Techniques (20)

PDF
TechBook: DB2 for z/OS Using EMC Symmetrix Storage Systems
 
PDF
TechBook: EMC Compatibility Features for IBM Copy Services on z/OS
 
PDF
Using the SRDF Adapter with VMware Site Recovery Manager 5.1
 
PDF
Micron memorias agate.pdf
PDF
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
 
PDF
Managing Data Center Connectivity TechBook
 
PDF
ATV320_Modbus_TCP_EtherNet_IP_Manual_NVE41313_02.pdf
PDF
PDF
Manual Hardware CLP Schneider - M258
PDF
Emf2192 ib _ethercat aif module__v3-1__en
PDF
520com um001 -en-e
PDF
Manual guia power flex525
PDF
Spm6600 Dx6i Manual Lo Res
PDF
MCM User Guide
PDF
Www.dedoimedo.com crash-book
PDF
Solid Cam 2009 Milling User Guide
PDF
Man hinh dieu khien
PDF
Dm00046982
PDF
Manual programação stm 32 f4
PDF
Lenze 8400-motec-manual
TechBook: DB2 for z/OS Using EMC Symmetrix Storage Systems
 
TechBook: EMC Compatibility Features for IBM Copy Services on z/OS
 
Using the SRDF Adapter with VMware Site Recovery Manager 5.1
 
Micron memorias agate.pdf
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
 
Managing Data Center Connectivity TechBook
 
ATV320_Modbus_TCP_EtherNet_IP_Manual_NVE41313_02.pdf
Manual Hardware CLP Schneider - M258
Emf2192 ib _ethercat aif module__v3-1__en
520com um001 -en-e
Manual guia power flex525
Spm6600 Dx6i Manual Lo Res
MCM User Guide
Www.dedoimedo.com crash-book
Solid Cam 2009 Milling User Guide
Man hinh dieu khien
Dm00046982
Manual programação stm 32 f4
Lenze 8400-motec-manual
Ad

More from EMC (20)

PPTX
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
PDF
Cloud Foundry Summit Berlin Keynote
 
PPTX
EMC GLOBAL DATA PROTECTION INDEX
 
PDF
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
PDF
Citrix ready-webinar-xtremio
 
PDF
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
PPTX
EMC with Mirantis Openstack
 
PPTX
Modern infrastructure for business data lake
 
PDF
Force Cyber Criminals to Shop Elsewhere
 
PDF
Pivotal : Moments in Container History
 
PDF
Data Lake Protection - A Technical Review
 
PDF
Mobile E-commerce: Friend or Foe
 
PDF
Virtualization Myths Infographic
 
PDF
Intelligence-Driven GRC for Security
 
PDF
The Trust Paradox: Access Management and Trust in an Insecure Age
 
PDF
EMC Technology Day - SRM University 2015
 
PDF
EMC Academic Summit 2015
 
PDF
Data Science and Big Data Analytics Book from EMC Education Services
 
PDF
Using EMC Symmetrix Storage in VMware vSphere Environments
 
PDF
Using EMC VNX storage with VMware vSphereTechBook
 
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
 

Recently uploaded (20)

PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
sap open course for s4hana steps from ECC to s4
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Machine Learning_overview_presentation.pptx
PPTX
Cloud computing and distributed systems.
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPT
Teaching material agriculture food technology
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Electronic commerce courselecture one. Pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
MIND Revenue Release Quarter 2 2025 Press Release
“AI and Expert System Decision Support & Business Intelligence Systems”
sap open course for s4hana steps from ECC to s4
MYSQL Presentation for SQL database connectivity
Machine Learning_overview_presentation.pptx
Cloud computing and distributed systems.
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Teaching material agriculture food technology
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Approach and Philosophy of On baking technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
20250228 LYD VKU AI Blended-Learning.pptx
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Electronic commerce courselecture one. Pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Programs and apps: productivity, graphics, security and other tools
Digital-Transformation-Roadmap-for-Companies.pptx
Assigned Numbers - 2025 - Bluetooth® Document
MIND Revenue Release Quarter 2 2025 Press Release

GDDR Solution Design and Implementation Techniques

  • 1. GDDR Solution Design and Implementation Techniques Version 1.0 • Business Requirement Definitions • Project Planning and Change Management Guidelines • Technology Selection and Solution Design • Architecture and Sizing Considerations Dean Smith Peter Callewaert Chris Fallon John Egan
  • 2. Copyright © 2011 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Powerlink. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners. Part number h8231 2 GDDR Solution Design and Implementation Techniques
  • 3. Contents Preface Chapter 1 Requirements Definition Introduction ....................................................................................... 16 Understanding Business Continuity requirements ..................... 18 High Availability............................................................................... 21 Recovery Time Objective ................................................................. 22 Recovery Point Objective................................................................. 23 EMC GDDR role................................................................................ 24 Chapter 2 Planning for your GDDR Implementation Understanding project constraints ................................................. 28 The importance of test systems....................................................... 29 Minimizing operational risks during implementations.............. 30 The importance of design ................................................................ 32 Construction of your timeline and milestones ............................. 33 Review change management procedures...................................... 34 Chapter 3 GDDR Managed Technologies EMC Symmetrix VMAX with EMC Enginuity............................. 36 Symmetrix VMAX hardware platform....................................36 EMC Enginuity operating environment..................................38 Symmetrix features for mainframe ..........................................38 EMC Mainframe Enablers .........................................................39 EMC TimeFinder............................................................................... 44 TimeFinder/Clone......................................................................44 TimeFinder/Consistency Group ..............................................45 GDDR Solution Design and Implementation Techniques 3
  • 4. Contents TimeFinder/Mirror for z/OS ................................................... 45 EMC Symmetrix Remote Data Facility (SRDF) ............................ 48 SRDF/Synchronous ................................................................... 48 SRDF/Asynchronous................................................................. 48 SRDF/A Multi-Session Consistency (MSC) ........................... 51 Concurrent SRDF........................................................................ 54 Cascaded SRDF........................................................................... 55 SRDF/Extended Distance Protection ...................................... 56 EMC SRDF consistency groups ...................................................... 58 EMC AutoSwap ................................................................................ 62 EMC SRDF/Star................................................................................ 64 Chapter 4 EMC Geographically Dispersed Disaster Restart (EMC GDDR) Supported configurations................................................................ 72 SRDF/S with ConGroup ........................................................... 73 SRDF/S with AutoSwap ........................................................... 76 SRDF/A ....................................................................................... 78 SRDF/Star ................................................................................... 81 EMC GDDR SRDF/Star complex ............................................ 84 SRDF/Star with AutoSwap ...................................................... 85 EMC GDDR requirements............................................................... 89 Mainframe environment requirements ................................... 89 Minimum hardware requirements .......................................... 90 Minimum software requirements ............................................ 90 Additional configuration requirements .................................. 91 EMC GDDR fundamental concepts ............................................... 93 EMC GDDR control systems .................................................... 93 Workload location ..................................................................... 94 EMC GDDR processes ............................................................... 96 EMC GDDR components................................................................. 98 Parameters .................................................................................. 98 User interface ............................................................................. 99 Events and rules....................................................................... 112 Procedures ................................................................................ 113 EMC GDDR procedures ................................................................ 114 EMC GDDR supported scripts ..................................................... 116 GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2) .................................................................................... 119 GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2) .................................................................................... 121 GDDR for SRDF/A (Two sites - DC1 and DC3) .................. 123 4 GDDR Solution Design and Implementation Techniques
  • 5. Contents GDDR for SRDF/Star (Three sites) ........................................125 GDDR for SRDF/Star with AutoSwap (Three sites) ...........130 Chapter 5 GDDR Managed Systems Design Considerations Overview .......................................................................................... 140 Ignore, Reset Clear, or IPL in place............................................... 142 Contingency systems and workload movement ........................ 143 HMC-Only systems (new in GDDR 3.1)...................................... 144 LPAR and CEC recovery ................................................................ 146 Planned CPC Swap .................................................................. 147 Automated Configuration Check - DASD................................... 149 GDDR variables influencing the expected configuration ...149 Chapter 6 Storage Infrastructure Design Considerations Storage configuration and layout ................................................. 152 Infrastructure recommendations .................................................. 153 Gatekeeper device list ..............................................................153 Started tasks names ................................................................. 155 Important dataset names .........................................................157 EMC software levels ....................................................................... 158 LPAR configuration specifics ..................................................158 Sysplex and coupling facility considerations.............................. 160 Chapter 7 Best Practices for Network Design Overview .......................................................................................... 164 Understanding bandwidth requirements.................................... 165 Synchronous network design considerations ............................. 168 Asynchronous network design considerations .......................... 172 Glossary GDDR Solution Design and Implementation Techniques 5
  • 6. Contents 6 GDDR Solution Design and Implementation Techniques
  • 7. Figures Title Page 1 Business Continuity example ....................................................................... 19 2 RTO example................................................................................................... 22 3 RPO example................................................................................................... 23 4 EMC Foundation technologies ..................................................................... 25 5 VMAX architecture ........................................................................................ 37 6 z/OS SymmAPI architecture........................................................................ 40 7 TimeFinder/Mirror under Enginuity 5773 and earlier ............................ 46 8 TimeFinder/Mirror under Enginuity 5874 and later................................ 47 9 SRDF/A delta sets.......................................................................................... 49 10 Delta set switching ......................................................................................... 51 11 SRDF/A cycles and delta sets ...................................................................... 53 12 Cascaded SRDF configuration...................................................................... 56 13 SRDF/EDP basic configuration ................................................................... 57 14 Primary and secondary relationships ......................................................... 59 15 Failed links ...................................................................................................... 60 16 Primary systems 1, 2, and 3 in a consistency group.................................. 61 17 Concurrent SRDF/Star configuration ......................................................... 66 18 Cascaded SRDF/Star configuration ............................................................ 67 19 Cascaded SRDF/Star with EDP (diskless) configuration ........................ 68 20 SRDF/S with ConGroup environment ....................................................... 73 21 EMC GDDR complex with SRDF/S ............................................................ 75 22 SRDF/S with AutoSwap environment ....................................................... 76 23 SRDF/A environment ................................................................................... 78 24 GDDR complex with SRDF/A ..................................................................... 79 25 Concurrent SRDF/Star environment .......................................................... 81 26 Cascaded SRDF/Star environment ............................................................. 82 27 GDDR Complex with Star............................................................................. 84 28 Concurrent SRDF/Star with AutoSwap environment ............................. 86 29 Cascaded SRDF/Star with AutoSwap environment ................................ 87 30 Primary Options Menu................................................................................ 100 GDDR Solution Design and Implementation Techniques 7
  • 8. Figures 31 Change GDDR ISPF Profile Variable Values ........................................... 102 32 Setup and Maintenance............................................................................... 103 33 View GDDR Configuration ........................................................................ 105 34 Manage Site Roles ........................................................................................ 106 35 Perform Health Check ................................................................................. 107 36 Select Script to Run ...................................................................................... 109 37 View GDDR Script Statistics....................................................................... 110 38 Actions Menu................................................................................................ 111 39 GDDR Actions Menu................................................................................... 144 40 GDDR Perform HMC LPAR Actions ........................................................ 145 41 Write distribution......................................................................................... 166 42 Bandwidth requirement .............................................................................. 167 43 DASD write activity by volser ................................................................... 169 44 DASD Volumes Flagged for Tuning ......................................................... 170 45 Recommended asynchronous network topology ................................... 174 8 GDDR Solution Design and Implementation Techniques
  • 9. Tables Title Page 1 Mainframe environment requirements........................................................ 89 2 Minimum hardware requirements ............................................................... 90 3 Software requirements ................................................................................... 91 4 Script names and descriptions..................................................................... 116 5 Script mapping .............................................................................................. 118 6 GDDR managed system recover map........................................................ 141 7 Software levels ............................................................................................... 158 GDDR Solution Design and Implementation Techniques 9
  • 10. Tables 10 GDDR Solution Design and Implementation Techniques
  • 11. Preface This EMC Engineering TechBook draws on GDDR field expertise, highlighting best practices relating to both technology and project management disciplines required to achieve success during GDDR implementations. This field-inspired guide starts with the first risk area to a business continuity project: basic understanding of business continuity or high availability requirements, then proceeds through analysis, technology selection, and architecture design guidelines. Following the guidelines in this GDDR TechBook will help ensure that your business continuity or high availability project is successful. As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes. If a product does not function properly or does not function as described in this document, please contact your EMC representative. Note: This document was accurate as of the time of publication. However, as information is added, new versions of this document may be released to the EMC Powerlink website. Check the Powerlink website to ensure that you are using the latest version of this document. Audience This manual is intended for customers considering improvements to business continuity or high availability technology deployments. GDDR Solution Design and Implementation Techniques 11
  • 12. Preface Related The following is a list of related documents that may assist readers documentation with more detailed information on topics described in this TechBook. These documents can be found on the EMC Powerlink website at http://Powerlink.EMC.com. ◆ ResourcePak Base for z/OS ◆ SRDF Host Component for z/OS ◆ TimeFinder/Clone Mainframe SNAP Facility ◆ TimeFinder/Mirror for z/OS ◆ Consistency Groups for z/OS ◆ TimeFinder Utility for z/OS ◆ AutoSwap ◆ GDDR for SRDF/Star with AutoSwap ◆ GDDR for SRDF/Star ◆ GDDR for SRDF/S with AutoSwap ◆ GDDR for SRDF/S with ConGroup ◆ GDDR for SRDF/A Authors of this Dean Smith is a Consulting Software Engineer and has recently TechBook rejoined EMC with the GDDR team after working abroad for several years. With nearly 33 years of IT experience in the Mainframe environment in both client and systems development, Dean has focused exclusively on the justification, architecture, and implementation of business continuity solutions world wide for the last 12 years. Peter Callewaert is a Senior Solutions Architect at EMC, with an active role in the evolution of GDDR as a product, in the troubleshooting and fixing of product issues, and in providing support to the GDDR Solutions Support team. He has been with EMC for 12 years in various roles including Pre-Sales, Customer Service, and on-site Implementation. He has 20+ years of IT-experience mainly in the Mainframe environment, with an ever-increasing focus on Disaster Recovery and Business Continuity. Chris Fallon is a Consulting Corporate System Engineer in the EMC Symmetrix Engineering and Virtualization product group with over 22 years experience in enterprise data center systems and storage expertise. Chris has been working with the Symmetrix since 1995 and 12 GDDR Solution Design and Implementation Techniques
  • 13. Preface SRDF since 1997 where he has developed, architected, and deployed geographically dispersed business continuity solutions on the Symmetrix storage array. John Egan is an EMC Distinguished Engineer in the EMC Symmetrix Engineering and Virtualization product group. He leads the System z Automation team, responsible for the GDDR and EzSM products. His background extends over 30 years with specialization in large scale systems integration, software engineering, project management, and design. Conventions used in EMC uses the following conventions for special notices. this document Note: A note presents information that is important, but not hazard-related. A caution contains information essential to avoid data loss or damage to the system or equipment. IMPORTANT An important notice contains information essential to operation Typographical conventions EMC uses the following type style conventions in this document: Normal Used in running (nonprocedural) text for: • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus) • Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities • URLs, pathnames, filenames, directory names, computer names, filenames, links, groups, service keys, file systems, notifications Bold Used in running (nonprocedural) text for: • Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages Used in procedures for: • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus) • What user specifically selects, clicks, presses, or types GDDR Solution Design and Implementation Techniques 13
  • 14. Preface Italic Used in all text (including procedures) for: • Full titles of publications referenced in text • Emphasis (for example a new term) • Variables Courier Used for: • System output, such as an error message or script • Specific user input (such as commands) • URLs, complete paths, filenames, prompts, and syntax when shown outside of running text Courier italic Used in procedures for: • Variables on command line • User input variables <> Angle brackets enclose parameter or variable values supplied by the user [] Square brackets enclose optional values | Vertical bar indicates alternate selections - the bar means “or” {} Braces indicate content that you must specify (that is, x or y or z) ... Ellipses indicate nonessential information omitted from the example We'd like to hear from you! Your feedback on our TechBooks is important to us! We want our books to be as helpful and relevant as possible, so please feel free to send us your comments, opinions and thoughts on this or any other TechBook: TechBooks@emc.com 14 GDDR Solution Design and Implementation Techniques
  • 15. 1 Requirements Definition This chapter contains the following information: ◆ Introduction ........................................................................................ 16 ◆ Understanding Business Continuity requirements....................... 18 ◆ High Availability................................................................................ 21 ◆ Recovery Time Objective .................................................................. 22 ◆ Recovery Point Objective .................................................................. 23 ◆ EMC GDDR role................................................................................. 24 Requirements Definition 15
  • 16. Requirements Definition Introduction This chapter introduces EMC® Geographically Dispersed Disaster Restart (EMC GDDR) and discusses its role in High Availability (HA), Business Continuity (BC) and disaster restart (DR) strategies. In recent years, business continuity requirements for many enterprises have changed particularly in response to the threat of terrorism as well as from major natural disasters. What has evolved is the notion of a “regional disaster”—whereby local synchronously replicated centers are “in region” in terms of the scope of disaster events. A regional disaster such as a major flood or prolonged and widespread power disruption could concurrently affect both data centers. In response to this many sites have re-evaluated their disaster readiness capabilities. Commerce has not stood still either. The shift toward online/internet enabled enterprises has continued to build high availability and continuous compute architectures to support “24*7” business operations. Consistent with these trends we have seen customers building data centers and technology infrastructure to match changing business requirements. A common practice for customers running synchronous replication is to add third-site asynchronous “out of region” disaster recovery centers. This has been particularly observed for the large financial institutions. Customers with high (HA) or continuous availability requirements are trending towards building adjacent primary and secondary data centers to gain the benefit of synchronous data replication with minimal performance impact. Another common data center strategy to support HA requirements is to modify existing data center infrastructure for redundant power connect and physical firewall separation for compute and/or storage infrastructure. These strategies facilitate synchronous replication to operate with minimum latency overhead and also enables construction of very high-speed, high bandwidth networks to support processor-to-processor interconnect for the IBM Parallel Sysplex Coupling Facility. 16 GDDR Solution Design and Implementation Techniques
The larger financial institutions are trending toward building combined HA and Disaster Recovery (DR) solutions, in which synchronous and asynchronous replication architectures are combined with data replication and management infrastructure to detect and manage potentially catastrophic disaster-in-progress events.
Understanding Business Continuity requirements

Business Continuity (BC) requirements tend to cluster around industry verticals. Telecommunications companies typically deploy some form of two-site replication and recovery architecture. Large financial institutions are trending toward three-site architectures that combine HA and DR solutions in a three-data-center strategy. The largest retailers tend toward geographically separated two-site data center strategies, some deploying asynchronous and others synchronous data replication.

Figure 1 on page 19 illustrates that as the size, sophistication, and wealth of an enterprise increase over time, business continuity requirements shift from essentially no protection through disaster recovery and business restart, and eventually toward high availability.
Figure 1 Business Continuity example
(The figure plots procedural/manual approaches against IT-architectural/automated approaches and single, dual, and three-data-center strategies as enterprise size grows from small businesses, food manufacturing, and retail through telecommunications, transportation and logistics, to financial services and online brokerages, with market capitalization ranging from roughly $10b to $20b+.)

Disaster recovery is the process of restoring a previous copy of the data, typically from a tape image, as the starting point for resumption of business operations. It is then necessary to apply database logs or perform other manual processes to bring the data to the last known point of consistency. Depending on the size of the enterprise, this process can range from a few hours to days or weeks.

Disaster restart is the process of restarting dependent-write consistent copies of data. This type of "restart" data image is created by EMC data replication technologies. The image is presented to the operating system for IPL. The data itself is in "crash restart" form, the same image that would be available following a loss of power at a primary (non-replicated) data center. Disaster restart displaces tape-based recovery and greatly reduces the Recovery Time Objective (RTO).

Database management systems and mainframe applications have evolved over time to restart from these data images. Any incomplete or partially complete transactions are backed out by the application layer to find a point from which business resumption can occur. Any backed-out or incomplete transactions are re-applied after business operations resume.

The time required for restart depends on the data change rate and the state of the databases at the time of disruption:

◆ If a database is shut down normally, achieving a point of consistency on restart requires minimal work.

◆ If the database terminates abnormally or is resumed from a "restart" data image, the restart process is elongated, depending on the number and size of in-flight transactions at the time of DBMS termination.

An image of a database created using EMC consistency technology while the database is in operation is in a dependent-write consistent state, also known as a DBMS restartable image. The restart of this image transforms it to a transactionally consistent data state by completing committed transactions and rolling back uncommitted transactions during the normal database initialization process.
High Availability

High Availability (HA) solutions are built on synchronous data replication, which enables EMC AutoSwap™ host software to interact with storage arrays to achieve a non-disruptive swap of DASD I/O from a source array to a synchronous target array. Swap operations can occur automatically for unplanned events, such as the loss of all FICON connectivity between a z/OS host and its attached EMC Symmetrix® arrays, or for planned events through an operator command interface and through panel dialogs supplied with GDDR.

Synchronous data replication is required for swap operations because the replicated data must be an exact image of the primary disk at the precise moment of the swap event. Asynchronous replication is chronologically behind the primary storage instance (by seconds or minutes), so it is not possible to deploy AutoSwap into an asynchronous replication environment.

Some customers require both HA and DR solutions and deploy a combination of synchronous and asynchronous replication infrastructures. These solutions provide non-disruptive swaps to the secondary synchronous disk for localized "disasters," such as severing of the FICON cables between a host and its local storage, as well as DR protection in the event of a catastrophic loss of the primary and secondary data centers.
Recovery Time Objective

To understand the type of replication and recovery infrastructure that must be in place to support an enterprise, it is typical to first classify applications by criticality. Many customers have three or more application tiers, in which applications are categorized by time-to-recover following a planned or unplanned service interruption. Ranges can vary from high availability (RTO = zero) through hours or days to recover; it all depends on the impact to the business, traded off against the cost of protection against service interruption.

Figure 2 illustrates the relationship between the cost of the solution and the cost of the service interruption. Once the RTO is known, the cost of the solution can be determined.

Figure 2 RTO example
Recovery Point Objective

Applications must also be classified according to the amount of data loss that could be sustained as a result of recovery from a service interruption. The measure of data loss is termed the Recovery Point Objective (RPO). Typically, synchronous data replication supports a near-zero RPO, whereas asynchronous data replication supports an RPO measured in seconds, minutes, or hours. For asynchronous data replication, the achievable RPO is usually proportional to the amount of Wide Area Network (WAN) bandwidth applied to the solution.

Figure 3 illustrates the relationship between the cost of the solution and the cost of the service interruption. Once the RPO is known, the cost of the solution can be determined.

Figure 3 RPO example
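The relationship between WAN bandwidth and achievable RPO can be illustrated with a back-of-envelope calculation. The following Python sketch is illustrative only and is not an EMC sizing tool: the write rate, bandwidth, cycle time, and the two-cycle rule of thumb (which anticipates the SRDF/A delta set behavior described in Chapter 3) are all assumptions, and real designs should be sized from measured workload data.

    # Back-of-envelope RPO estimator for an asynchronous replication link.
    # All figures are hypothetical; real sizing must also account for write
    # folding, compression, and burst profiles.

    def estimate_async_rpo_seconds(write_mb_per_sec, wan_mb_per_sec,
                                   cycle_time_sec=30.0):
        """Rough RPO exposure for cycle-based asynchronous replication.

        If the WAN can drain a cycle's worth of writes within one cycle time,
        the exposure stays near two cycles (capture plus transmit). If it
        cannot, the backlog, and therefore the potential data loss, grows with
        the ratio of write rate to available bandwidth.
        """
        if wan_mb_per_sec <= 0:
            raise ValueError("WAN bandwidth must be positive")
        drain_time = (write_mb_per_sec * cycle_time_sec) / wan_mb_per_sec
        return 2 * max(cycle_time_sec, drain_time)

    print(estimate_async_rpo_seconds(80, 100))  # 60.0 seconds
    print(estimate_async_rpo_seconds(80, 50))   # 96.0 seconds: halving the
                                                # bandwidth raises the exposure

Once the link can no longer drain a cycle within the cycle time, exposure grows roughly in proportion to the write-rate-to-bandwidth ratio, which is one reason bandwidth assessment appears among the planning tasks in Chapter 2.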
EMC GDDR role

EMC Geographically Dispersed Disaster Restart (EMC GDDR) is a software product that runs on the IBM z/OS operating system to automate business recovery following both planned outages and disaster situations, including the total loss of a data center. EMC GDDR does not provide replication and recovery services itself; rather, it monitors and automates the services provided by other EMC products, as well as third-party products, required for continuous operations or business restart.

EMC GDDR facilitates business continuity by generating scripts that can be run on demand, for example to restart business applications following a major data center incident, or to resume replication and restore ongoing data protection following an unplanned link outage. Scripts are customized at the time of invocation by an expert system that tailors the steps to the configuration and the event that GDDR is managing. Through automatic event detection and end-to-end automation of managed technologies, GDDR removes human error from the recovery process and allows it to complete in the shortest possible time.

The GDDR expert system is also invoked to generate planned procedures, such as moving compute operations from one data center to another. The ability to move from scheduled DR-test weekends to regularly scheduled data center swaps, without disrupting application workloads, is the gold standard for high availability compute operations.
Figure 4 on page 25 shows the relationship between the software layers and the Symmetrix hardware layer.

Figure 4 EMC Foundation technologies
2  Planning for your GDDR Implementation

This chapter deals with important considerations to ensure that your major infrastructure project is successful. Some of the issues highlighted here are best practices specific to managing technology risk. Other recommendations relate to the common but important risk factors associated with project administration and change management:

◆ Understanding project constraints .................................................. 28
◆ The importance of test systems........................................................ 29
◆ Minimizing operational risks during implementations ............... 30
◆ The importance of design ................................................................. 32
◆ Construction of your timeline and milestones .............................. 33
◆ Review change management procedures....................................... 34
Understanding project constraints

EMC GDDR implementations operate under the "triple constraint" model of project management. Essentially, there are three major areas of constraint on any infrastructure project:

◆ Time
◆ Quality
◆ Cost

In many instances the date of implementation is already fixed. Ideally, the implementation date will have been calculated from the sum of all the complex technical infrastructure tasks that must be accomplished. These tasks may include:

◆ Workload measurement
◆ Bandwidth assessment
◆ Compute resiliency (to support HA or DR, for example)
◆ Storage layout and I/O access density (cache, device types, RAID type, and so on)
◆ Data center planning (floor space, power, cooling)
◆ Data migration planning
◆ Construction of a test facility
◆ Staff training

It is a major risk factor if the target date for completion of the project has been set without due regard to the number and complexity of the tasks required to achieve the goal.

In theory, the triple constraint model is a method for the project manager to retain control of project delivery. If the customer owns all three constraints, it is impossible for the project manager to obtain or retain control of the project. Control can be maintained by allowing the project manager to control at least one of the constraints, typically the quality objective. In environments where time and budget are fixed, the only remaining variable is quality. Inevitably, if the time constraint is unreasonable, project quality will suffer.
The importance of test systems

As the complexity of solutions increases, so does the requirement to allow your operations staff to become competent with new technology. This is best achieved with a test environment. Even if the customer cannot afford a perpetual test environment, serious consideration should be given to constructing an interim test environment for the duration of the GDDR project. It allows operations staff to safely acquire knowledge and also provides a framework for the formalized "hands on" training offered by the EMC GDDR implementation team, both during the project and, more formally, as part of the acceptance criteria toward the end of the project.

Some customers can benefit from this more than others. If your operations staff do not have experience with replication and automation technologies in general, and EMC technologies in particular, then the construction of a test environment should be considered mandatory. If, however, the customer has the necessary skills in-house, the benefit of a test environment is limited mainly to ensuring that software and fixes provided by EMC and other vendors operate correctly in the customer environment.
Minimizing operational risks during implementations

There are several methods for implementing GDDR in production environments. The two approaches outlined below are used where a customer cannot tolerate downtime in their existing production workflow. The techniques allow both continuous operation of production systems and maintenance of the existing DR posture.

The parallel infrastructure approach is a technique where new network, compute, and storage infrastructure is commissioned adjacent to the existing production systems. Data is migrated from the existing production infrastructure into the new GDDR managed environment so that infrastructure and application-oriented testing can be conducted. The testing process is destructive to the data image, so the data migration is repeated for each test cycle. Once the infrastructure has been proven to operate successfully in the new environment, one final data migration is conducted. The GDDR managed infrastructure becomes the new production environment and the older infrastructure is decommissioned. Another significant advantage of this technique is that not only can the resiliency of the compute infrastructure be preserved, but the existing HA/DR posture can also be retained until the new infrastructure is deployed. The parallel infrastructure approach is the best practice for deploying GDDR management infrastructures into customer environments with minimal risk and minimal disruption to existing application workloads.

The partial production approach uses a small subset of DASD volumes on the production system to prove the entire infrastructure. This allows validation of GDDR and its prerequisite technologies in the production environment, but without using DASD that contains production data. When the scope of the parameters is later increased to address the full environment, the first full-scope execution takes place live in the production environment. The partial production approach is inferior to the parallel infrastructure approach in that the full scope of functionality cannot be fully tested without risking impact to production environments. The emphasis here is on eliminating, in particular, typing errors in parameter libraries, security issues, and issues with gatekeeper devices, firewalls, and the like, all of which can be debugged without impact using the parallel infrastructure approach but which cannot be fully detected in the partial production approach until actually encountered live in the production environment.
The importance of a test environment that allows operations staff to gain competency cannot be overstressed when the partial production approach is used. The partial production approach is a good compromise implementation method where the customer is able to provide sufficient test windows to safely test the new infrastructure "in place." It is also commonly used where financial or other constraints on a project make the parallel infrastructure approach impractical.
The importance of design

Another significant area of addressable risk in GDDR projects is the correct sizing of the entire solution. This book sets out guidelines for properly sizing the proposed solution. Particular attention must be paid to the sizing of the network infrastructure. See Chapter 7, "Best Practices for Network Design," for more details.

A second major area of design risk is the storage layout. Storage must be configured not only for raw capacity, RAID groups, and so on, but also for throughput. The solution must be designed to sustain the I/O load of the production environment and also to have sufficient reserve capacity, in terms of cache and throughput, to operate the replication infrastructure and (if required) restart operations. See the section "Storage Infrastructure Design Considerations" for more details on this important topic.

EMC recommends early (pre-sales) contact with GDDR solution design experts to ensure that any solution proposal meets design standards for optimal operational performance. The GDDR team operates globally and is centrally managed from EMC Corporate Headquarters in Hopkinton, Massachusetts. The team can be contacted by email at:

gddrsolutionsupport@emc.com

Conventional mail:

GDDR Solution Support Team
Mailstop 1/P-36
176 South Street
Hopkinton, MA 01748 USA
Construction of your timeline and milestones

Setting a timeline that is too aggressive can cause a project to fail. As discussed earlier in the triple constraint model of project administration, if the timeline is fixed and the budget is fixed, that leaves only one variable: quality. In other words, in order to meet strict deadlines for task completion, shortcuts may be taken in quality assurance measures. The result may be a sequence of failed changes or service interruptions to the production environment.

One way to avoid placing undue or unreasonable time pressure on a project is to involve the technical architecture and implementation people in the construction of the implementation timeline.
Review change management procedures

The "GDDR project" is typically much larger than just installing the GDDR software and customizing parameter libraries. As discussed earlier, there may be changes to physical infrastructure and utilities, such as air conditioning, as well as to IT infrastructure such as network provisioning, new compute and storage arrays, switches, and routers. The scope of these projects is very large: in the space of a few months a large number of potentially high-risk changes need to be executed.

Change scheduling is important in order to avoid conflicts between infrastructure provisioning groups and to align with the business calendar. Much more important than change scheduling, however, is change management. This is a broader topic that addresses issues such as:

◆ Change scheduling
◆ Backout/fallback planning
◆ Risk mitigation strategies
◆ Documentation
◆ Root cause analysis (for failed changes)

When trying to determine the health of a change management system, look beyond the number of change "tickets" to see how many changes are rejected because of scheduling conflicts, inadequate documentation, insufficient risk mitigation, and so on. A robust change management system exhibits much more than change scheduling. Assess the health of your change management systems and, if necessary, adjust them to deal with the large volume of changes typically encountered during these infrastructure projects.
3  GDDR Managed Technologies

EMC provides many hardware and software products that support applications in the z/OS environment. The following products, some required and some configuration-specific, are used in EMC GDDR solutions:

◆ EMC Symmetrix VMAX with EMC Enginuity.............................. 36
◆ EMC TimeFinder ................................................................................ 44
◆ EMC Symmetrix Remote Data Facility (SRDF) ............................. 48
◆ EMC SRDF consistency groups........................................................ 58
◆ EMC AutoSwap.................................................................................. 62
◆ EMC SRDF/Star ................................................................................. 64
EMC Symmetrix VMAX with EMC Enginuity

Symmetrix hardware architecture and the EMC Enginuity™ operating environment are the foundation of the Symmetrix storage platform. This environment consists of the following components:

◆ Symmetrix hardware
◆ Enginuity-based operating functions
◆ Mainframe Enablers
◆ Symmetrix application program interface (API) for mainframe
◆ Symmetrix-based applications
◆ Host-based Symmetrix applications
◆ Independent software vendor (ISV) applications

Symmetrix VMAX hardware platform

The Symmetrix VMAX™ design is based on a highly available VMAX Engine with redundant CPU, memory, and connectivity on two directors for fault tolerance. Symmetrix VMAX Engines connect to and scale out linearly through the Virtual Matrix Architecture®, which allows resources to be shared across VMAX Engines. To meet growth requirements, additional engines can be added non-disruptively for efficient and dynamic scaling of capacity and performance.
Figure 5 on page 37 shows the Symmetrix VMAX hardware architecture.

Figure 5 VMAX architecture
EMC Enginuity operating environment

The Enginuity operating system provides controls for all components in a Symmetrix array. Enginuity coordinates real-time events related to the processing of production data, providing the following services:

◆ Manages system resources to intelligently optimize performance across a wide range of I/O workload requirements.
◆ Ensures system availability through advanced fault monitoring, detection, and correction capabilities, and provides concurrent maintenance and serviceability features.
◆ Provides the foundation for specific software features available through EMC disaster recovery, business continuance, and storage management software.
◆ Supports functional services for both Symmetrix-based functionality and for a large suite of EMC storage application software.
◆ Defines the priority of each task, including basic system maintenance, I/O processing, and application processing.
◆ Provides uniform access through APIs for internal calls and provides an external interface to allow integration with other software providers.

Symmetrix features for mainframe

Symmetrix storage systems appear to mainframe operating systems as any of the following control units: IBM 3990, IBM 2105, and IBM 2107. The physical storage devices can appear to the mainframe operating system as any mixture of different sized 3380 and 3390 devices. Mainframe host connectivity is supported through serial ESCON and FICON channels. Fibre Channel and GigE are supported options in SRDF® environments. Symmetrix GigE directors in an SRDF environment provide direct, end-to-end TCP/IP connectivity for remote replication solutions over extended distances. This eliminates the need for costly FC-to-IP converters and helps utilize the existing IP infrastructure without major disruption.

EMC supported IBM compatibility features include:

◆ Channel Command Emulation for IBM ESS 2105/2107
◆ Multiple Allegiance (MA)
◆ Parallel Access Volume (PAV), Static and Dynamic
◆ HyperPAV
◆ Concurrent Copy
◆ Dynamic Channel Path Management (DCM)
◆ Dynamic Path Reconnection (DPR) support
◆ Host Data Compression
◆ Logical Path and Control Unit Address Support (CUADD)
◆ Mainframe systems hypervolumes
◆ Partitioned Dataset (PDS) Search Assist
◆ FlashCopy
◆ GDPS
◆ GDPS HyperSwap
◆ Peer to Peer Remote Copy (PPRC)
◆ Extended Remote Copy (XRC)
◆ Extended Address Volume
◆ Extended Distance FICON
◆ High Performance FICON (zHPF)
◆ DFSMS support for Flash Drives
◆ 8 Gb/s FICON

EMC Mainframe Enablers

Mainframe Enablers (MFE) is a software suite that includes the Symmetrix API runtime libraries for all EMC mainframe software. These software packages can be used to monitor device configuration and status and to perform control operations on devices and data objects within a storage complex. The following components are included in Mainframe Enablers (MFE) 7.0 and above:

◆ ResourcePak® Base for z/OS
◆ SRDF Host Component for z/OS
◆ TimeFinder®/Clone Mainframe SNAP Facility
◆ TimeFinder/Mirror for z/OS
◆ Consistency Groups for z/OS
◆ TimeFinder Utility for z/OS
◆ AutoSwap

ResourcePak Base for z/OS

ResourcePak Base delivers EMC Symmetrix Control Facility (EMCSCF) for IBM and IBM-compatible mainframes. EMCSCF provides a uniform interface for EMC and ISV software products. EMCSCF delivers a "persistent address space" on the host that facilitates communication between the host and the Symmetrix, as well as with other EMC-delivered and partner-delivered applications. In GDDR managed configurations this host is normally the GDDR Master C-System.

ResourcePak Base performs tasks such as the following:

◆ Maintaining an active repository of information about EMC Symmetrix devices attached to z/OS environments and making that information available to other EMC products.
◆ Performing automation functions.
◆ Handling inter-LPAR (logical partition) communication through the Symmetrix storage system.

Figure 6 z/OS SymmAPI architecture
(The figure shows EMC- or ISV-developed Symmetrix products, such as TimeFinder and SRDF Host Component, issuing EMCSAI program calls and SNAPAPI requests to the Symmetrix Control Facility (ResourcePak Base), which provides automation (swap), metadata (configuration information, device status), and event monitoring services against Symmetrix devices.)
ResourcePak Base provides faster delivery of new Symmetrix functions by EMC and ISV partners, along with easier upgrades. It also provides the ability to gather data when using tools such as the TimeFinder/Mirror query, because device status information is cached along with other important information.

ResourcePak Base for z/OS is a prerequisite for EMC mainframe applications such as the TimeFinder/Clone Mainframe SNAP Facility and SRDF Host Component for z/OS. As of release 7.0, these products are packaged and installed together as Mainframe Enablers, along with TimeFinder/Mirror for z/OS, Consistency Groups for z/OS, TimeFinder Utility for z/OS, and AutoSwap.

ResourcePak Base provides the following functionality through EMCSCF.

Cross-system communication

Inter-LPAR communication is handled by the EMCSCF cross-system communication (CSC) component. CSC uses a Symmetrix storage system to facilitate communication between LPARs. Several EMC Symmetrix mainframe applications use CSC to handle inter-LPAR communications.

Non-disruptive SymmAPI-MF refreshes

As of version 5.3, EMCSCF allows the SymmAPI-MF to be refreshed non-disruptively. Refreshing SymmAPI-MF does not impact currently executing applications that use SymmAPI-MF, for example SRDF Host Component for z/OS or the TimeFinder/Clone Mainframe SNAP Facility.

SRDF/A Monitor

The SRDF/A Monitor in ResourcePak Base is designed to:

◆ Find EMC Symmetrix controllers that are running SRDF/A.
◆ Collect and write SMF data about those controllers.

After ResourcePak Base is installed, the SRDF/A Monitor is started as a subtask of EMCSCF.

Group Name Service support

ResourcePak Base includes support for Symmetrix Group Name Service (GNS). Using GNS, you can define a device group once and then use that single definition across multiple EMC products on multiple platforms. This means that you can use a device group defined through GNS with both mainframe and open systems-based EMC applications. GNS also allows you to define group names for volumes that can then be operated upon by various other commands.

Pool management

With ResourcePak Base V5.7 or higher, generalized device pool management is a provided service. Pool devices are a predefined set of devices that provide a pool of physical space. Pool devices are not host-accessible. The CONFIGPOOL commands allow management of SNAPPOOLs or DSEPOOLs with CONFIGPOOL batch statements.

SRDF/A Multi-Session Consistency

SRDF/A Multi-Session Consistency (MSC) is a task in EMCSCF that ensures remote R2 consistency across multiple Symmetrix storage systems running SRDF/A. MSC provides the following:

◆ Coordination of SRDF/A cycle switches across systems.
◆ Up to 24 SRDF groups in a multi-session group.
◆ One SRDF/A session and one SRDF group per Symmetrix storage system when using Enginuity release level 5X70.
◆ With Enginuity release level 5X71 and later, SRDF groups are dynamic and are not limited to one per Symmetrix storage system. Group commands of ENABLE, DISPLAY, DISABLE, REFRESH, and RESTART are available.

SWAP services

ResourcePak Base deploys a SWAP service in EMCSCF. It is used by EMC AutoSwap for planned outages with the ConGroup Continuous Availability Extensions (CAX).

Recovery services

Recovery service commands allow you to perform recovery on local or remote devices (if the links are available for the remote devices).

Licensed Feature Code management

EMCSCF manages Licensed Feature Codes (LFCs) to enable separately chargeable features in EMC software. These features require an LFC to be provided during the installation and customization of EMCSCF. LFCs are available for:

◆ Symmetrix Priority Control
◆ Dynamic Cache Partitioning
◆ AutoSwap (ConGroup with AutoSwap Extensions); separate LFCs are required for planned and unplanned swaps
◆ EMC z/OS Storage Manager
◆ SRDF/Asynchronous (MSC)
◆ SRDF/Automated Replication
◆ SRDF/Star
◆ TimeFinder/Clone (TARGET)
◆ TimeFinder/Consistency Group (CONSISTENT)
◆ TimeFinder/Snap (VDEV)
EMC TimeFinder

TimeFinder is a family of products that enables both volume-based and data set level replication within a single Symmetrix system. Data is copied from Symmetrix devices using array-based resources, without consuming host CPU or I/O. The source Symmetrix devices remain online for regular I/O operations while the copies are created. GDDR automation extends to managing TimeFinder within an array or across multiple arrays, for example to provide "gold copy" enterprise data images for DR testing.

TimeFinder enables customers to create multiple point-in-time copies of data, allowing simultaneous execution of business tasks that were previously sequential. GDDR exploits the following TimeFinder modes of operation:

◆ TimeFinder/Mirror enables users to configure special devices called business continuance volumes (BCVs) to create a mirror image of Symmetrix standard devices. Using BCVs, TimeFinder creates a point-in-time copy of data that can be repurposed. The TimeFinder/Mirror component extends the basic API command set of Mainframe Enablers with commands that specifically manage Symmetrix BCVs and standard devices.

◆ TimeFinder/Clone enables users to make copies of data from source volumes to target volumes without consuming mirror positions within the Symmetrix. The data is available to the target's host immediately upon activation, even if the copy process has not completed. Data may be copied from a single source device to multiple target devices. A source device can be either a Symmetrix standard device or a BCV device.

TimeFinder/Clone

TimeFinder/Clone for z/OS produces point-in-time copies of full volumes or individual datasets. TimeFinder/Clone operations involve full volumes or datasets where the amount of data at the source is the same as the amount of data at the target. TimeFinder/Clone provides significant configuration flexibility because clone copies do not require Symmetrix mirror positions. TimeFinder/Clone source and target devices can have any form of RAID protection, and the clone copies can be configured as standard devices or as Business Continuance Volumes (BCVs).
A Full-Volume Snap captures a complete replica of the source volume on the target volume in the local Symmetrix system. TimeFinder/Clone requests that span control units can invoke an external datamover to accomplish the request.

TimeFinder/Consistency Group

TimeFinder/Consistency Group, using the Enginuity Consistency Assist (ECA) feature, provides consistent snap operations on multiple volumes so that the targets are dependent-write consistent. TimeFinder/Consistency Group is available for full device, virtual device, and remote full device snaps. The source and target device pairs must reside in the same Symmetrix system. Consistency can be preserved over multiple volumes.

TimeFinder/Mirror for z/OS

EMC TimeFinder/Mirror is a business continuance solution. GDDR uses TimeFinder/Mirror to make full-volume copies of production data from a standard Symmetrix device (which remains online for regular I/O operations from the host) to a Business Continuance Volume (BCV) with which the standard device is paired. The BCV is a specially tagged volume established when the Symmetrix unit is configured. The BCV functions as a mirror controlled with the TimeFinder/Mirror ESTABLISH, SPLIT, RE-ESTABLISH, and RESTORE commands. The BCV device can be separated (split) from the standard device and used for backup, restore, decision support, or application testing. In GDDR controlled environments, GDDR re-establishes split BCVs with their associated standard devices at the conclusion of test windows, or otherwise as required during disaster recovery operations.
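The BCV lifecycle that GDDR drives with the commands named above can be pictured as a simple state machine. The following Python sketch is a conceptual model only: the class, method names, and device numbers are illustrative assumptions and do not represent TimeFinder command syntax or the GDDR API.

    # Conceptual model of the TimeFinder/Mirror BCV lifecycle
    # (ESTABLISH, SPLIT, RE-ESTABLISH, RESTORE). Illustrative only.

    class BCVPair:
        """Tracks the relationship between a standard device and its BCV."""

        def __init__(self, std, bcv):
            self.std, self.bcv = std, bcv
            self.state = "NEVER_ESTABLISHED"

        def establish(self):
            # Full synchronization: the BCV becomes a mirror of the standard device.
            assert self.state in ("NEVER_ESTABLISHED", "SPLIT")
            self.state = "ESTABLISHED"

        def split(self):
            # The BCV is separated and holds a point-in-time copy usable for
            # backup, restore, decision support, or application testing.
            assert self.state == "ESTABLISHED"
            self.state = "SPLIT"

        def re_establish(self):
            # Resynchronize the BCV with changes made since the split.
            assert self.state == "SPLIT"
            self.state = "ESTABLISHED"

        def restore(self):
            # Copy the BCV's point-in-time image back to the standard device.
            assert self.state == "SPLIT"
            self.state = "RESTORED"

    # A GDDR-style DR test window: split for testing, re-establish afterwards.
    pair = BCVPair(std="C400", bcv="C800")   # hypothetical device numbers
    pair.establish()
    pair.split()          # test against the point-in-time image
    pair.re_establish()   # GDDR re-establishes the BCV at the end of the window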
Under Enginuity 5773 and earlier, TimeFinder/Mirror supports two local-replication technologies, as shown in Figure 7 on page 46.

Figure 7 TimeFinder/Mirror under Enginuity 5773 and earlier
(The figure shows TimeFinder/Mirror commands and TimeFinder/Clone commands entering the emulation API, with Mirror and Clone as separate underlying technologies.)
Starting with Enginuity 5874, TimeFinder/Mirror uses clone emulation for all operations, as shown in Figure 8 on page 47.

Figure 8 TimeFinder/Mirror under Enginuity 5874 and later
(The figure shows TimeFinder/Mirror and TimeFinder/Clone commands entering the emulation API, with Clone as the only underlying technology.)

No specific steps need to be taken to exploit clone emulation. Whenever TimeFinder/Mirror detects a Symmetrix controller running at Enginuity level 5874 or later, it automatically sets the mode to clone emulation, and all TimeFinder/Mirror commands are converted to clone emulation. GDDR uses TimeFinder/Mirror command syntax; for Enginuity 5874 and later code releases clone emulation is used, and this is transparent to GDDR. GDDR generated scripts therefore continue to operate as they did in TimeFinder/Mirror environments.
EMC Symmetrix Remote Data Facility (SRDF)

The SRDF family of products provides synchronous and asynchronous remote replication capabilities for Symmetrix storage systems. At the conceptual level, SRDF mirrors (RAID level 1) one logical disk device (the primary source/R1 within the primary Symmetrix storage system) to a second logical device (the secondary target/R2 within a physically separate secondary Symmetrix storage system) over Fibre Channel or GigE high-speed communication links.

SRDF's basic premise is that a remote mirror of data in a different Symmetrix storage system can serve as a valuable resource for:

◆ Protecting data through geographical separation.
◆ Giving applications a second location from which to retrieve data should the primary location become unavailable for any reason.
◆ Providing a means to establish a set of volumes on which to conduct parallel operations, such as testing or modeling.

GDDR supports SRDF/Synchronous, SRDF/Asynchronous, and SRDF/Star solutions in various two- and three-site configurations. The following paragraphs describe the SRDF features exploited by GDDR.

SRDF/Synchronous

SRDF synchronous (SRDF/S) mode facilitates disaster recovery within the customer's campus or metropolitan area network through real-time synchronous remote mirroring from one Symmetrix system to one or more Symmetrix systems.

SRDF/Asynchronous

SRDF asynchronous (SRDF/A) mode provides a long-distance disaster restart data image with minimal impact on performance. Symmetrix systems implement asynchronous mode by sending host writes from the primary Symmetrix to the secondary Symmetrix system in dependent-write consistent delta sets transferred in cycles:

◆ Each delta set contains groups of write I/Os for processing, which are managed for dependent-write consistency by the Enginuity operating environment.
◆ SRDF/A transfers these sets of data using cycles of operation, one cycle at a time, between the primary Symmetrix system and the secondary Symmetrix system.

Write Folding improves the efficiency of the SRDF network links. If there are multiple updates to the same data in the same cycle, the systems send only the most current data across the SRDF links. This is called Write Folding and is a major advantage over competing asynchronous replication solutions: it decreases network bandwidth consumption and the number of I/Os the SRDF director processes, thereby reducing the system overhead per host I/O.

Figure 9 on page 49 illustrates SRDF/A delta sets and their relationships to cycle processing.

Figure 9 SRDF/A delta sets
(The figure shows the capture delta set (N, "active" cycle) and transmit delta set (N-1, "inactive" cycle) on the primary Symmetrix, and the receive delta set (N-1, "inactive" cycle) and apply delta set (N-2, "active" cycle) on the secondary Symmetrix.)

In single session mode, the Symmetrix SRDF director ensures dependent-write consistency within SRDF/A by obtaining the active cycle number from a single location in global memory and assigning it to each I/O. The director retains that cycle number even if a cycle switch occurs during the life of that I/O. This results in an atomic cycle switch process for dependent-write sequences, even though it is not physically an atomic event across a range of volumes. As a result, two I/Os with a dependent relationship between them can be in the same cycle, or the dependent I/O can be in a subsequent cycle.

Delta set switching is at the core of the SRDF/A active session operation. The following is required before a primary Symmetrix system cycle switch can occur:
◆ The transmit delta set must have completed data transfer to the secondary Symmetrix system.
◆ The minimum cycle time (single session mode) or minimum cycle target time (MSC mode) must be reached.
◆ The previous apply delta set must have been completed.

During the delta-set switching process:

1. Write I/Os are collected in the capture delta set on the primary Symmetrix system. The previous cycle's transmit delta set is completing the SRDF transfer to the secondary Symmetrix system's receive delta set, which holds the N-1 copy. The secondary Symmetrix system's apply delta set (N-2) is written to global memory, so that data is marked write pending to the secondary devices.

2. The primary Symmetrix system waits for the minimum cycle time to elapse and for the transmit delta set to empty, meaning that all data has been transferred to the secondary Symmetrix system.

3. Once these conditions are satisfied, the primary Symmetrix system sends a commit message to the secondary Symmetrix system to begin the secondary Symmetrix system cycle switch.

4. On the primary system, the new capture delta set is available to collect new host I/Os. Before the secondary Symmetrix system cycle switch can occur, the following must be true:

   • The secondary Symmetrix system has received the commit message from the primary Symmetrix system.
   • The apply delta set (N-2 copy) has completed its write process, marking the data write pending to the secondary devices.

5. Once the secondary Symmetrix system receives the commit message from the primary Symmetrix system, it verifies that the apply delta set has been written. This occurs while the primary Symmetrix system is performing the cycle switch between the capture and transmit delta sets.

6. Next, a delta set cycle switch occurs on the secondary Symmetrix system between the receive (inactive) and apply (active) delta sets. This preserves the dependent-write consistent copy at the secondary Symmetrix system prior to receiving the next dependent-write consistent delta set.

7. The secondary Symmetrix system sends an acknowledgement to the primary Symmetrix system. The data in the apply delta set is written to disk.
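The sequence above can be summarized with a small conceptual model. The Python sketch below is illustrative only: it collapses the primary and secondary switches into a single step, omits the minimum cycle time and the commit/acknowledge messaging, and uses assumed names rather than Enginuity internals. Its purpose is simply to show how the four delta sets rotate and how write folding keeps only the latest image of a track within a cycle.

    # Minimal conceptual model of the SRDF/A cycle switch described above.
    # Delta sets are dictionaries keyed by track, so repeated writes to the
    # same track within a cycle are "write folded". Illustrative only.

    class SrdfASession:
        def __init__(self):
            self.capture = {}    # N   : active cycle collecting host writes
            self.transmit = {}   # N-1 : inactive cycle draining to secondary
            self.receive = {}    # N-1 : inactive cycle on the secondary
            self.apply = {}      # N-2 : active cycle, write pending to R2

        def host_write(self, track, data):
            self.capture[track] = data    # write folding keeps the latest image

        def transfer(self):
            while self.transmit:          # data moves across the SRDF links
                track, data = self.transmit.popitem()
                self.receive[track] = data

        def cycle_switch(self):
            # Preconditions from the text: transmit empty and previous apply
            # complete (minimum cycle time is not modelled here).
            if self.transmit:
                raise RuntimeError("transmit delta set not yet empty")
            self.apply = self.receive     # secondary: receive becomes apply (N-2)
            self.receive = {}
            self.transmit = self.capture  # primary: capture becomes transmit (N-1)
            self.capture = {}             # a fresh capture set collects new writes

    session = SrdfASession()
    session.host_write("0A01", "v1")
    session.host_write("0A01", "v2")      # folded: only "v2" will be sent
    session.cycle_switch()
    session.transfer()
    session.cycle_switch()                # "v2" is now in the consistent apply set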
Figure 10 on page 51 shows the delta set switching process.

Figure 10 Delta set switching
(The figure legend summarizes the process: (1) the capture delta set collects application write I/O; (2) the primary waits for the minimum cycle time and for the transmit delta set to empty, tells the secondary to commit the receive delta set, halts SRDF transfer, switches capture to transmit, and makes a new capture delta set available for host I/O; (3) the secondary checks that the apply delta set has been restored, switches receive to apply, makes a new receive delta set available for SRDF transfer, acknowledges the primary, and begins restoring the apply delta set; (4) the primary receives the acknowledgement and SRDF transfer begins.)

SRDF/A Multi-Session Consistency (MSC)

SRDF/A Multi-Session Consistency (MSC) supports SRDF/A operations in configurations where there are multiple primary Symmetrix systems, or multiple primary Symmetrix system SRDF groups, connected to multiple secondary Symmetrix systems or secondary Symmetrix system SRDF groups. SRDF/A MSC configurations can also support mixed open systems and mainframe data controlled within the same SRDF/A MSC session. GDDR managed solutions using SRDF asynchronous replication require MSC because the interface to control asynchronous replication is provided via MSC.

Achieving data consistency across multiple SRDF/A groups requires that the cycle switch process be coordinated among the participating Symmetrix system SRDF groups or systems, and that the switch occur during a very brief time period when no host writes are being serviced by any participating Symmetrix system. SRDF control software running on the host provides a single coordination point to drive the cycle switch process in all participating Symmetrix systems.

I/Os are processed exactly the same way in SRDF/A MSC mode as they are in single session mode:

1. The active cycle on the primary Symmetrix system contains the current host writes, or N data version, in the capture delta set.

2. The inactive cycle contains the N-1 data version that is transferred using SRDF/A from the primary Symmetrix system to the secondary Symmetrix system. The primary inactive delta set is the transmit delta set and the secondary Symmetrix system's inactive delta set is the receive delta set.

3. The active cycle on the secondary Symmetrix system contains the N-2 data version in the apply delta set. This is the guaranteed dependent-write consistent image in the event of a disaster or failure.
Figure 11 on page 53 illustrates the delta sets and their relationships to SRDF/A cycles.

Figure 11 SRDF/A cycles and delta sets
(The figure shows multiple primary Symmetrix systems, each with capture (N, "active") and transmit (N-1, "inactive") delta sets, paired with multiple secondary Symmetrix systems, each with receive (N-1, "inactive") and apply (N-2, "active") delta sets.)

For the host to control the cycle switch process, the Symmetrix systems must be aware that they are running in multi-session consistency mode. This is done using the SRDF control software running on the host. The host software:

1. Coordinates the cycle switching for all SRDF/A sessions comprising the SRDF/A MSC configuration.

2. Monitors for any failure to propagate data to the secondary Symmetrix system devices and drops all SRDF/A sessions together to maintain dependent-write consistency.

3. Performs MSC cleanup if able.

As part of the process of entering MSC mode, and with each cycle switch issued thereafter, Enginuity assigns a cycle tag to each new capture cycle. That cycle tag is retained throughout that cycle's life. The cycle tag is a value that is common across all participating SRDF/A sessions and eliminates the need to synchronize cycle numbers across them. The cycle tag is the mechanism by which dependent-write consistency is assured across the multiple participating Symmetrix system SRDF groups or systems under MSC control.

SRDF/A MSC mode performs a coordinated cycle switch during a very short window of time referred to as the SRDF/A window, which is actually a flag (open/closed) indicating when no host writes are being completed. The SRDF/A window flag is an attribute of the SRDF/A group and is checked at the start of each I/O, imposing no additional overhead because the front-end director is already obtaining the cycle number from global memory as part of the existing SRDF/A cycle switch operations.

When the host software discovers that all the SRDF groups and Symmetrix systems are ready for a cycle switch, MSC "opens" the SRDF/A window and issues a single command to each SRDF group to perform a cycle switch. In multi-session mode, the front-end director obtains the cycle number at the start of each write and also checks the SRDF/A window flag; if the flag is on (an open window), the front-end director disconnects upon receiving the host write I/O and begins polling to determine when the SRDF MSC host control software has closed the window. While the window is open, any write I/Os that start are disconnected, and as a result no dependent-write I/Os are issued by any host to any devices in the SRDF/A MSC session.

The SRDF/A window remains open on each SRDF group and Symmetrix system until the last SRDF group and Symmetrix system in the multi-session group acknowledges to the SRDF MSC host control software that the open-and-switch command has been processed, indicating a successful cycle switch. At this point the SRDF MSC host control software issues a close command for each SRDF/A group under MSC control. As a result, dependent-write consistency across the SRDF/A MSC session is ensured.
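The open/switch/close sequence can be pictured with the following sketch. This is a conceptual model under assumed names, not the MSC implementation or its command interface; it shows only the ordering guarantees described above: all groups must be ready, writes arriving while the window is open are held, and the window is closed only after every group has acknowledged its cycle switch.

    # Illustrative sketch of the MSC coordinated cycle switch (the "SRDF/A
    # window"). Names and structure are assumptions for illustration.

    class SrdfGroup:
        def __init__(self, name):
            self.name = name
            self.window_open = False
            self.held_writes = []

        def ready_for_switch(self):
            # Placeholder: transmit delta set empty, minimum cycle target
            # time reached, and so on.
            return True

        def open_window_and_switch(self):
            self.window_open = True       # new writes are disconnected (held)
            # ... the per-group cycle switch would happen here ...
            return True                   # acknowledge the open-and-switch command

        def close_window(self):
            self.window_open = False      # held writes are redriven afterwards
            self.held_writes.clear()      # (the redrive itself is not modelled)

        def host_write(self, io):
            if self.window_open:
                self.held_writes.append(io)   # no dependent write can complete
            # otherwise the write enters the capture delta set as usual

    def msc_cycle_switch(groups):
        # 1. All participating groups must be ready before anything happens.
        if not all(g.ready_for_switch() for g in groups):
            return False
        # 2. Open the window and switch every group; while any window is open
        #    no dependent-write I/O completes anywhere in the MSC session.
        acks = [g.open_window_and_switch() for g in groups]
        # 3. Only after the last group acknowledges is the window closed.
        if all(acks):
            for g in groups:
                g.close_window()
            return True
        return False

    msc_cycle_switch([SrdfGroup("GRP01"), SrdfGroup("GRP02")])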
Concurrent SRDF

Enginuity version 5567 and later support the ability for a single primary device to be remotely mirrored to two secondary devices concurrently. This feature is called concurrent SRDF and is supported on Fibre Channel and Gigabit Ethernet (GigE) topologies. In concurrent SRDF, the primary (R11) devices must be assigned to two different SRDF groups, because each remote mirror can be assigned to only one SRDF group, even if the two secondary devices reside in the same Symmetrix system.

The secondary devices in a concurrent SRDF configuration can operate in synchronous, adaptive copy, or asynchronous mode. The only unsupported combination is for both secondary devices to operate in asynchronous mode. Concurrent SRDF/S with SRDF/A provides the ability to remotely mirror a group of devices in synchronous mode to one secondary site and in asynchronous mode to a different secondary site, which may be an extended-distance site.

Normal I/O operational rules for SRDF also apply to concurrent SRDF configurations. When operating in synchronous mode, ending status for an I/O is not presented to the host until the remote Symmetrix system acknowledges receipt of the I/O to the primary Symmetrix system. If both secondary devices are operating in synchronous mode, ending status is not presented to the host until both devices acknowledge receipt of the I/O. If one remote mirror is in synchronous mode and the other is in adaptive copy or asynchronous mode, ending status is presented to the host when the synchronous device acknowledges receipt of the I/O.

GDDR automation provides scripts that allow customers to convert from concurrent to cascaded mode, provided that the initial configuration is installed as cascaded mode.
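The ending-status rules described in this section reduce to a simple decision, sketched here for illustration. The mode names are simplified assumptions; the function merely restates which remote legs must acknowledge an I/O before ending status is presented to the host.

    # Sketch of the ending-status rules for concurrent SRDF described above.
    # Mode names are simplified assumptions ("sync", "async", "acp").

    def mirrors_to_wait_for(mode_b, mode_c):
        """Remote mirrors that must acknowledge an I/O before ending status
        is presented to the host."""
        if mode_b == "async" and mode_c == "async":
            raise ValueError("both legs asynchronous is unsupported")
        # Ending status waits for every synchronous leg; asynchronous and
        # adaptive copy legs do not delay it.
        return {site for site, mode in (("B", mode_b), ("C", mode_c))
                if mode == "sync"}

    print(mirrors_to_wait_for("sync", "async"))  # {'B'}
    print(mirrors_to_wait_for("sync", "sync"))   # {'B', 'C'}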
Cascaded SRDF

Cascaded SRDF is a three-site disaster recovery solution in which data from a primary site is synchronously replicated to a secondary site, and then asynchronously replicated from the secondary site to a tertiary site, as shown in Figure 12 on page 56. The core benefit of a cascaded configuration is its inherent capability to continue replicating, with minimal user intervention, from the secondary site to the tertiary site in the event that the primary site fails. This enables a faster recovery at the tertiary site, provided that the tertiary site is where the customer wishes to restart production operations.

Cascaded SRDF uses dual-role SRDF devices (R21 devices) at the secondary site, which act as both an R2 to the primary site and an R1 to the tertiary site, as shown in the following figure.

Figure 12 Cascaded SRDF configuration
(The host writes to the R1 device at the primary site, which is replicated with SRDF/S to the R21 device at the secondary site and from there with SRDF/A to the R2 device at the tertiary site.)

The following SRDF modes are allowed in cascaded SRDF:

◆ R1 -> R21: SRDF/S, SRDF/A, adaptive copy disk mode, adaptive copy write-pending mode
◆ R21 -> R2: SRDF/A, adaptive copy disk mode

The most common implementation is for the first hop to be in SRDF/S mode and the second hop to be in SRDF/A mode. The following limitations apply to cascaded SRDF:

◆ Only one hop (R1 -> R21 or R21 -> R2) can be asynchronous at a time.
◆ As in concurrent SRDF, the two SRDF mirrors of an R21 cannot be configured using the same SRDF group.

GDDR automation provides scripts that allow customers to convert from cascaded to concurrent mode, provided that the initial configuration is installed as cascaded mode.
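The allowed-mode and single-asynchronous-hop rules listed above can be expressed as a small validation routine. This sketch is illustrative only, with simplified mode names; it is not a configuration tool.

    # Sketch validating the cascaded SRDF mode rules listed above.

    R1_TO_R21 = {"sync", "async", "acp_disk", "acp_wp"}
    R21_TO_R2 = {"async", "acp_disk"}

    def validate_cascaded(first_hop, second_hop):
        if first_hop not in R1_TO_R21:
            raise ValueError("R1->R21 mode %r not allowed" % first_hop)
        if second_hop not in R21_TO_R2:
            raise ValueError("R21->R2 mode %r not allowed" % second_hop)
        if first_hop == "async" and second_hop == "async":
            raise ValueError("only one hop can be asynchronous at a time")

    validate_cascaded("sync", "async")      # the most common implementation
    validate_cascaded("acp_disk", "async")  # also permitted by the rules above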
SRDF/Extended Distance Protection

SRDF/Extended Distance Protection (EDP) is a three-site configuration that requires Enginuity 5874 or later running on the secondary Symmetrix system, and Enginuity 5773 or 5874 and later running on the primary and tertiary systems. Figure 13 on page 57 shows an example of a basic SRDF/EDP configuration.

SRDF/EDP is achieved through a cascaded SRDF setup in which a Symmetrix VMAX system at the secondary site uses diskless R21 devices to capture only the differential data that would be owed to the tertiary site in the event of a primary site failure.

Figure 13 SRDF/EDP basic configuration
(The host writes to the R1 device at the primary site (full copy), which is replicated with SRDF/S to a diskless R21 device at the secondary site (not a full copy; differences only) and with SRDF/A to the R2 device at the tertiary site (full copy).)

SRDF/EDP provides a long-distance replication solution with the ability to achieve a zero RPO at the tertiary site. It is also a lower-cost solution when compared with three-site solutions such as concurrent and cascaded SRDF.
EMC SRDF consistency groups

GDDR manages a single consistency group. Devices in the group can be in FBA (open systems) or CKD (mainframe) format. The design of ConGroup and GDDR allows customers to build HA and DR solutions around the concept of the enterprise. Because GDDR is installed outside of the scope of a Sysplex, GDDR controlled solutions can also span multiple Sysplex environments.

An SRDF consistency group is a collection of related Symmetrix devices that are configured to act in unison to maintain data integrity. The devices in a consistency group can be spread across multiple Symmetrix systems.

Consistency Groups for z/OS (ConGroup) is an SRDF product offering designed to ensure the dependent-write consistency of the data remotely mirrored by SRDF/S operations in the event of a rolling disaster.

Most applications, and in particular database management systems (DBMSs), have dependent-write logic embedded in them to ensure data integrity if a hardware or software failure occurs. ConGroup is based on the concept of dependent-write operations. A dependent-write is a write that is not issued by an application until a prior, related write I/O operation has completed. An example of dependent-write activity is a database update:

1. The DBMS writes to the disk containing the transaction log.
2. The DBMS writes the data to the actual database.
3. The DBMS writes again to the log volume to indicate that the database update was made.

In a remotely mirrored environment, data consistency cannot be ensured if one of the writes is remotely mirrored but its predecessor write was not. This could occur, for example, in a rolling disaster where a communication loss affects only a subset of the devices involved in the remote copy function.

ConGroup prevents a rolling disaster from affecting data integrity at the secondary site. When ConGroup detects any write I/O to a volume that cannot communicate with its R2 (secondary) mirror, it suspends remote mirroring for all volumes defined to the consistency group before completing the intercepted I/O and returning control to the application. This is referred to as a ConGroup trip. In this way, ConGroup ensures that a dependent-write consistent image is available at the secondary site by preventing a dependent-write I/O from reaching the secondary site if the previous I/O only got as far as the primary mirror.

I/O to the primary devices in the consistency group can still occur even when the devices are Not Ready on the SRDF links. Such updates are not immediately sent to the secondary site. However, they are propagated after the affected links are again operational and data transfer from the primary devices to the secondary devices resumes.

Assume an SRDF configuration consists of three Symmetrix systems with primary devices and two additional Symmetrix systems with secondary devices. The systems with primary devices send data to the systems with secondary devices, as shown in Figure 14 on page 59.

Figure 14 Primary and secondary relationships
(Primary systems 1, 2, and 3 replicate to secondary systems 1 and 2.)
Next, assume that the links between primary system 2 and secondary system 1 fail. Without a consistency group, primary systems 1 and 3 continue to write data to secondary systems 1 and 2 while primary system 2 does not, as shown in Figure 15 on page 60. The copy of the data spread across secondary systems 1 and 2 becomes inconsistent.

Figure 15 Failed links
(Primary systems 1 and 3 continue replicating to secondary systems 1 and 2; the links from primary system 2 to secondary system 1 have failed.)

However, if primary systems 1, 2, and 3 belong to a consistency group, as shown in Figure 16 on page 61, and the links between primary system 2 and secondary system 1 fail, the consistency group automatically stops primary systems 1 and 3 from sending data to secondary systems 1 and 2. Thus, the dependent-write consistency of the data spanning secondary systems 1 and 2 remains intact.
Figure 16 Primary systems 1, 2, and 3 in a consistency group
(The three primary systems are grouped in a single consistency group replicating to secondary systems 1 and 2.)
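The trip behavior described in this section can be summarized with the following conceptual sketch. It is illustrative only (volume names and structure are assumptions, and the queuing and later propagation of updates are not modelled); it shows why suspending the whole group before completing the intercepted write keeps the secondary image dependent-write consistent.

    # Conceptual sketch of a ConGroup "trip". When a write targets a volume
    # whose R2 mirror is unreachable, remote mirroring is suspended for every
    # volume in the group before that write completes, so no later dependent
    # write can reach the secondary ahead of its predecessor.

    class ConsistencyGroup:
        def __init__(self, volumes):
            self.link_ok = {v: True for v in volumes}   # R2 reachability
            self.mirroring_suspended = False            # True after a trip

        def link_failure(self, volume):
            self.link_ok[volume] = False

        def write(self, volume, data):
            if not self.mirroring_suspended and not self.link_ok[volume]:
                # Trip: suspend SRDF for ALL volumes in the group first ...
                self.mirroring_suspended = True
            # ... then complete the intercepted I/O locally and return control
            # to the application. Updates are propagated once the links are
            # operational again (not modelled here).
            return "I/O complete"

    group = ConsistencyGroup(["LOG1", "DB1"])   # hypothetical volume names
    group.link_failure("LOG1")
    group.write("LOG1", "log record")    # trips the group
    group.write("DB1", "table update")   # completes locally only; the secondary
                                         # image stays dependent-write consistent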
EMC AutoSwap

EMC AutoSwap provides the ability to move (swap) workload I/Os transparently from volumes in one set of Symmetrix storage systems to volumes in other Symmetrix storage systems without operational interruption. AutoSwap, with SRDF and EMC Consistency Groups, dramatically increases data availability.

AutoSwap differs from IBM HyperSwap in that AutoSwap is not confined to operating entirely within a Sysplex. AutoSwap can coordinate swaps for multiple monoplex or Sysplex systems in a single swap event.

Swaps may be initiated either manually as planned events or automatically as unplanned events (upon failure detection).

◆ Planned swaps facilitate operations such as non-disruptive building maintenance, power reconfiguration, DASD relocation, and channel path connectivity reorganization.

◆ Unplanned swaps protect systems against outages in a number of scenarios. Examples include power supply failures, building infrastructure faults, air conditioning problems, loss of channel connectivity, entire DASD system failures, operator error, or the consequences of intended or unintended fire suppression system discharge.
◆ In GDDR-managed environments, planned swap events are initiated and controlled by GDDR. Unplanned swap events are initiated and controlled by AutoSwap. GDDR monitors for unplanned swap events and, after a successful unplanned swap, reconfigures the surviving infrastructure to deal with the new source of write I/O activity.

In conjunction with EMC Consistency Groups, swaps are performed concurrently while application workloads continue. This protects data against unforeseen events and ensures that swaps are unique, atomic operations that maintain dependent-write consistency.
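The following is a rough conceptual sketch, in Python, of what makes a swap atomic from the application's point of view; all names and device numbers are invented and this does not represent the AutoSwap implementation or its interfaces. The point it illustrates is that the active device for every volume in the group changes in a single state transition, so I/O is never split between the pre-swap and post-swap sides.

# Hypothetical model of a group-wide swap; names are illustrative, not EMC APIs.

class SwapGroup:
    def __init__(self, pairs):
        # pairs: {volser: (device at DC1, device at DC2)}
        self.pairs = pairs
        self.active_side = 0       # 0 = DC1 devices, 1 = DC2 devices

    def active_device(self, volser):
        return self.pairs[volser][self.active_side]

    def swap(self, planned=True):
        # One state change covers every volume in the group, so applications
        # never see a mix of pre-swap and post-swap devices.
        self.active_side = 1 - self.active_side
        kind = "planned" if planned else "unplanned"
        print(kind, "swap complete; I/O now targets side", self.active_side)

group = SwapGroup({"PROD01": ("1000", "2000"), "PROD02": ("1001", "2001")})
print(group.active_device("PROD01"))   # 1000 (DC1 device)
group.swap(planned=True)               # e.g. for planned building maintenance
print(group.active_device("PROD01"))   # 2000 (DC2 device)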
  • 64. GDDR Managed Technologies EMC SRDF/Star SRDF/Star is a three-site disaster recovery solution consisting of a primary Symmetrix system (Site A) and two remote Symmetrix systems (Sites B and C). SRDF/Star provides advanced multisite business continuity replication that augments concurrent SRDF/S (synchronous) and SRDF/A (asynchronous) operations for the same primary volumes with the ability to incrementally establish or reestablish SRDF/A replication between the primary (R1) and remote (R2) sites in the event of a primary site outage. In mainframe environments, EMC GDDR is a mandatory requirement for all SRDF/Star deployments to automate site swap activity and/or disaster restart. In the event of a primary site outage, the EMC GDDR SRDF/Star solution allows customers to quickly move operations and re-establish protection between the remaining sites. Once conditions permit, customers can rejoin the primary site to the configuration, resuming SRDF/Star protection. SRDF/Star can operate in concurrent or cascaded environments, providing the appropriate differential synchronization. These environments address different recovery and availability objectives: ◆ Concurrent SRDF/Star positions the secondary site or the remote site as potential recovery sites, and provides differential resynchronization between the secondary and remote sites. To achieve this positioning, some level of reconfiguration intervention is required to access point-of-disaster data. ◆ Cascaded SRDF/Star positions only the remote site as the recovery site with minimal intervention to access point-of-disaster data. This solution differentially synchronizes data between the primary site and the remote site. ◆ SRDF/EDP can also be used in an SRDF/Star configuration. By design, SRDF/EDP only allows recovery at the tertiary site. SRDF/Star provides rapid reestablishment of cross-site protection in the event of primary site failure. Rather than a full resynchronization between the remote sites, SRDF/Star provides differential synchronization between the remote sites, and dramatically reduces the time it takes to remotely protect the new production site. SRDF/Star also provides a mechanism to determine which remote site has the most current data in the event of a rolling disaster that 64 GDDR Solution Design and Implementation Techniques
  • 65. GDDR Managed Technologies affects the primary site. In all cases, users maintain the ability to choose which site to operate from and which site's data to use when recovering from a primary site failure. The host-based Multi-Session Consistency (MSC) task at the primary (R1) site controls normal SRDF/Star operation. MSC performs session management at SRDF/S site B and when necessary at SRDF/A site C. The MSC session management task maintains the information needed to perform differential synchronization between site B and site C. Automation for SRDF/Star mainframe configuration management is provided by EMC GDDR. In Concurrent SRDF/Star operations, site B serves as the secondary site and the target of the SRDF/S links from site A. Site C serves as the tertiary site and the secondary site of the SRDF/A links from site A. The recovery links are the SRDF/A links between site C and site B. Figure 17 on page 66 shows a Concurrent SRDF/Star configuration. EMC SRDF/Star 65
Figure 17 Concurrent SRDF/Star configuration

SRDF/Star in a cascaded SRDF configuration has the ability to incrementally establish an SRDF/A session between the primary site and the asynchronous site in the event the synchronous site fails. With cascaded SRDF/Star, the synchronous secondary site is always more current than the asynchronous secondary site.
Figure 18 on page 67 shows a basic cascaded SRDF/Star configuration. Site B serves as the secondary site and the target of the SRDF/S links from site A. Site C serves as the out of region site and the target of the SRDF/A links from site B. The recovery SRDF/A links are between site C and site A.

Figure 18 Cascaded SRDF/Star configuration
Figure 19 on page 68 illustrates a cascaded SRDF/Star EDP diskless environment with the remote devices at the asynchronous secondary site. The diskless R21 device streamlines the link connections to the asynchronous site in cascaded mode. No data copies are available at the synchronous target site B because that site's Symmetrix systems use diskless, cache-only R21 devices.

Figure 19 Cascaded SRDF/Star with EDP (diskless) configuration
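As a very rough illustration of why differential synchronization shortens the re-protection window compared with a full resynchronization, consider the following sketch (Python, with invented track counts; it does not model how SRDF actually tracks invalid tracks): only the tracks that changed since the last point common to the two remote sites need to be copied.

# Rough comparison of full versus differential resynchronization.
# Track counts are invented; this does not model SRDF internals.

TOTAL_TRACKS = 1_000_000

def full_resync():
    # Copy every track from the new production site to the other remote site.
    return TOTAL_TRACKS

def differential_resync(changed_tracks):
    # Copy only the tracks changed since the last point common to both remote sites.
    return len(changed_tracks)

changed = set(range(2_500))   # tracks that differ between sites B and C after the outage
print("full resync copies:        ", full_resync(), "tracks")
print("differential resync copies:", differential_resync(changed), "tracks")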
  • 69. 4 EMC Geographically Dispersed Disaster Restart (EMC GDDR) This chapter contains the following: ◆ Supported configurations ................................................................. 72 ◆ EMC GDDR requirements ................................................................ 89 ◆ EMC GDDR fundamental concepts ................................................ 93 ◆ EMC GDDR components.................................................................. 98 ◆ EMC GDDR procedures.................................................................. 114 ◆ EMC GDDR supported scripts....................................................... 116 EMC Geographically Dispersed Disaster Restart (EMC GDDR) 69
  • 70. EMC Geographically Dispersed Disaster Restart (EMC GDDR) EMC Geographically Dispersed Disaster Restart (EMC GDDR) can be implemented in a variety of configurations involving two or three-sites, SRDF/S, SRDF/A, ConGroup, AutoSwap, SRDF/EDP, and SRDF/Star. In the mainframe environment, EMC GDDR is a requirement for a SRDF/Star configuration. EMC GDDR can manage environments that are comprised of the following elements: ◆ Multiple z/OS systems ◆ Multiple Sysplexes ◆ Multiple Symmetrix controllers ◆ Intermix of CKD and FBA/FBAM DASD and BCVs In each configuration, EMC GDDR provides specific capabilities tailored to that configuration. However, the major features of EMC GDDR are common across all topologies. Because EMC GDDR manages production systems following disasters, it does not reside on the same servers that it is seeking to protect. EMC GDDR resides on separate z/OS systems from the host servers that run your application workloads. EMC GDDR is installed on a control LPAR at each site. These control LPARs are referred to as GDDR nodes, Control Systems, or C-Systems. Each EMC GDDR node is aware of the other EMC GDDR nodes through network connections between each site. This multi-site awareness allows EMC GDDR to detect disasters and identify survivors. In two-site configurations, EMC GDDR can recover business at the surviving site. In three-site configurations, EMC GDDR can nominate a control LPAR to assume the leadership role for EMC GDDR and recover business at one of the surviving sites. To achieve the task of business restart, EMC GDDR automation extends well beyond the disk level and into the host operating system level where sufficient controls and access to third party software and hardware products exist to enable EMC GDDR to provide automated recovery capabilities. EMC GDDR can distinguish normal operational disruptions from disasters and respond accordingly. For example, EMC GDDR is able to distinguish between network outages (SRDF link drop) and real disasters. This awareness is achieved by periodic exchange of dual-direction heartbeats between the EMC GDDR C-Systems. EMC GDDR constantly checks for disaster situations and ensures that other GDDR systems are "healthy." This checking allows EMC GDDR to recognize, and act on, potential disaster situations, even if only one EMC GDDR C-system survives. "Split brain" problems associated 70 GDDR Solution Design and Implementation Techniques
  • 71. EMC Geographically Dispersed Disaster Restart (EMC GDDR) with cluster technologies are avoided through operator prompts. Upon the initial recognition stage, EMC GDDR issues messages to the operator console seeking confirmation of the event and, further, confirmation of restart actions required. In any EMC GDDR installation, one Control System (C-System) is designated as the Master C-System. In a two-site configuration, failure of the Master C-System will result in the remaining C-System becoming the Master. In a three-site configuration, if a local or regional disaster occurs EMC GDDR can determine which of the surviving sites will execute the recovery. Changes to EMC GDDR configuration information can only be made on the EMC GDDR Master Control System. EMC GDDR propagates these changes to the subordinate EMC GDDR C-system(s) using inter-system communications. Restart procedures following disasters are coordinated from the EMC GDDR Master C-System. EMC GDDR scripts, whether for planned or unplanned scenarios, can only be executed from the Master C-System. If necessary, the Master role can be manually transferred from one C-System to another through the EMC GDDR user interface. 71
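A simplified, hypothetical sketch of this kind of health checking is shown below (Python; the class, intervals, thresholds, and decision logic are invented for illustration and are not GDDR code). It shows the general idea: a missing heartbeat that coincides with an SRDF link drop is first treated as a potential communications problem, while a sustained loss raises the missing-heartbeat event and leads to an operator prompt rather than an automatic takeover.

# Hypothetical sketch of C-System heartbeat checking; names, intervals and
# thresholds are invented for illustration.
import time

HEARTBEAT_INTERVAL = 30    # seconds between heartbeats (assumed value)
MISSING_THRESHOLD = 3      # missed intervals before raising an event

class CSystemMonitor:
    def __init__(self, peer_sites):
        self.last_seen = {site: time.time() for site in peer_sites}

    def receive_heartbeat(self, site):
        self.last_seen[site] = time.time()

    def check_peers(self, srdf_link_down):
        """Distinguish a likely network problem from a potential site loss."""
        now = time.time()
        for site, seen in self.last_seen.items():
            missed = (now - seen) / HEARTBEAT_INTERVAL
            if missed < MISSING_THRESHOLD:
                continue
            if srdf_link_down and missed < 2 * MISSING_THRESHOLD:
                # Heartbeat and SRDF lost together and only recently: this may
                # still be a communications outage, so keep monitoring.
                print(site, ": possible network outage, continuing to monitor")
            else:
                # Sustained heartbeat loss: raise the missing-heartbeat event and
                # prompt the operator (WTOR) before any restart action, which is
                # how "split brain" takeovers are avoided.
                print(site, ": MHB event raised, prompting operator for confirmation")

monitor = CSystemMonitor(["DC2", "DC3"])
monitor.receive_heartbeat("DC2")
monitor.last_seen["DC3"] -= 10 * HEARTBEAT_INTERVAL   # simulate a silent DC3
monitor.check_peers(srdf_link_down=False)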
  • 72. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Supported configurations EMC GDDR is available to monitor and manage the following configurations and topologies: Two-site solutions: ◆ SRDF/S with ConGroup — The two-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2. ◆ SRDF/S with AutoSwap — The two-site SRDF/S with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2. ◆ SRDF/A — The two-site SRDF/A configuration provides disaster restart capabilities at site DC3. Three-site solutions: ◆ SRDF/Star — The three-site SRDF/Star configuration provides disaster restart capabilities at either the Secondary DASD site or DC3. Concurrent and Cascaded SRDF support further minimize the DC3 recovery time objective. R22 support is available, removing the need to create device pairs during GDDR scripts. ◆ SRDF/Star with AutoSwap — The three-site SRDF/Star with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2 as well as disaster restart capabilities at DC3. Concurrent and Cascaded SRDF support further minimize the DC3 recovery time objective. R22 support is available, removing the need to create device pairs during GDDR scripts. ◆ SRDF/Star with EDP. EMC GDDR functionality, controlled by a parameter library, can be customized to operate in any of these two or three-site configurations. During EMC GDDR implementations, the EMC GDDR parameter library is customized to reflect: ◆ The prerequisite software stack. ◆ The desired data center topology (two-site versus three-site, synchronous or asynchronous). 72 GDDR Solution Design and Implementation Techniques
  • 73. EMC Geographically Dispersed Disaster Restart (EMC GDDR) An EMC GDDR complex consists of EMC GDDR control systems (C-Systems), the z/OS and open systems hosts, and EMC Symmetrix storage systems which support an organization's mission-critical workload. A single GDDR complex is able to support multiple Sysplexes. SRDF/S with ConGroup The two-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2. Figure 20 on page 73 illustrates EMC GDDR operation in the SRDF/S with Consistency Group environment. DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R2 SRDF/S GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002226 Figure 20 SRDF/S with ConGroup environment As Figure 20 on page 73 shows, the relationship between the DC1 and DC2 sites is maintained through SRDF/S replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 20 on page 73 also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. The C-system at the Primary DASD site has EMC Consistency Group software installed. It is recommended that one or more of the Primary DASD site production z/OS systems also has EMC Consistency Supported configurations 73
  • 74. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Group software installed. SRDF/S and ConGroup ensure that at the point that EMC GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved. In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Restart processing at the secondary site following unplanned primary site events ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/S • ConGroup trip • Inter-site communication failure EMC GDDR SRDF/S complex Each GDDR complex manages one consistency group. A consistency group is a named group of source (R1) volumes managed by the EMC Consistency Group (ConGroup) application as a unit. The volumes can be any mix of FBA and CKD devices on multiple Symmetrix units supporting a system, a SYSPLEX, or multiple SYSPLEXs, as well as "Open systems" hosts (Windows and various Unix flavors). Figure 21 on page 75 depicts a logical view of a typical EMC GDDR complex. 74 GDDR Solution Design and Implementation Techniques
  • 75. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary site Secondary site DC1 DC2 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C2 Sysplex #1 Sysplex #1 Master C SRDF/S Local R1 R2 Local BCV BCV SYM-002219 Figure 21 EMC GDDR complex with SRDF/S The following are brief descriptions of the components that comprise this GDDR complex: BCVs — BCVs (Business Continuance Volumes) can be supported at each of the sites. They may be established at the DC2 site and split at the DC1 site. BCVs taken at the recovery site are often referred to as "Gold Copy" devices. C1 and C2 — C1 and C2 are the EMC GDDR Control LPARs (or C-Systems) at each of the sites. Primary site — The primary site is the site where the production workload is located. Primary DASD site — The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is the same as the secondary site. Sites DC1 and DC2 — Sites DC1 and DC2 are the primary and secondary data centers of critical production applications and data. DC1 is the primary site, with SRDF/S data replication to the secondary site, DC2. These sites are considered fully equivalent for Supported configurations 75
  • 76. EMC Geographically Dispersed Disaster Restart (EMC GDDR) strategic production applications, connected with highly redundant direct network links. At all times, all production data is replicated synchronously between the two-sites. SRDF/S with AutoSwap The two-site SRDF/S with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2. Figure 22 on page 76 illustrates EMC GDDR operation in the SRDF/S with AutoSwap environment. DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R2 SRDF/S GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002225 Figure 22 SRDF/S with AutoSwap environment As Figure 22 on page 76 shows, the relationship between the DC1 and DC2 sites is maintained through SRDF/S replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 22 on page 76 also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs as well as both C-systems has EMC AutoSwap and EMC Consistency Group (ConGroup) software installed. AutoSwap and ConGroup ensure that a point of consistency exists whenever EMC GDDR receives notification of an unplanned or failure event. 76 GDDR Solution Design and Implementation Techniques
  • 77. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/S • ConGroup trip • Inter-site communication failure • AutoSwap events ◆ Configure/reconfigure • Couple datasets ◆ Manage coupling facilities • Policies Supported configurations 77
  • 78. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/A The two-site SRDF/A configuration provides disaster restart capabilities at site DC3. In case DC3 was the Primary DASD site, GDDR offers disaster restart capabilities at DC1. Figure 23 on page 78 illustrates EMC GDDR operation in the SRDF/A environment. DC1 DC3 EMC EMC GDDR GDDR R1 R2 SRDF/A GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002224 Figure 23 SRDF/A environment As Figure 23 on page 78 shows, the relationship between the DC1 and DC3 sites is maintained through SRDF/A replication of primary disk images from DC1 to DC3. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. It also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. EMC GDDR does not have a requirement to "freeze" I/O to obtain a point of consistency. Multi-Session Consistency and SRDF/A provide the mechanism to obtain a point of consistency. At the point that EMC GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved through these foundation technologies. 78 GDDR Solution Design and Implementation Techniques
  • 79. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Restart processing at the secondary site following unplanned primary site events ◆ Perform standard operational tasks • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/A • Inter-site communication failure EMC GDDR SRDF/A complex Each GDDR complex can manage one Multi-Session Consistency (MSC) group. An MSC group is a named group, consisting of multiple RDF groups operating in SRDF/A mode, managed by the EMC MSC control software feature as a single unit. These groups can have any mix of CKD and FBA devices, on multiple Symmetrix units supporting a system, a SYSPLEX, or multiple SYSPLEXs as well as "Open systems" hosts (Windows and various flavours of Unix). Figure 24 on page 79 depicts a logical view of a typical EMC GDDR complex. Primary site Secondary site DC1 DC3 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C3 Sysplex #1 Sysplex #1 Master C SRDF/A Local R1 R2 Local BCV BCV SYM002218 Figure 24 GDDR complex with SRDF/A Supported configurations 79
  • 80. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The following are brief descriptions of the components that comprise this GDDR complex: BCVs — Business Continuance Volumes can be supported at each of the sites. They may be established at the DC3 site and split at the DC1 site. C1 and C3 — C1 and C3 are the EMC GDDR Control LPARs (or C-Systems) at each of the sites. Primary site —The primary site is the site where the production workload is located. Primary DASD site —The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is the same as the secondary site. Sites DC1 and DC3 — Sites DC1 and DC3 are the primary and secondary data centers of critical production applications and data. DC1 is the primary site, with SRDF/A data replication to the secondary site, DC3. These sites are considered fully equivalent for strategic production applications, connected with highly redundant direct network links. At all times, all production data is replicated asynchronously between the two-sites. 80 GDDR Solution Design and Implementation Techniques
  • 81. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/Star Configuration The three-site SRDF/Star configuration provides disaster restart capabilities at DC2 or DC3. Figure 25 on page 81 illustrates EMC GDDR operation in a concurrent SRDF/Star environment. Figure 26 on page 82 illustrates EMC GDDR operation in a cascaded SRDF/Star environment. DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R2 SRDF/S DC3 SRDF/A R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002223 Figure 25 Concurrent SRDF/Star environment Supported configurations 81
  • 82. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R21 SRDF/S SRDF/A DC3 R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002221 Figure 26 Cascaded SRDF/Star environment As Figure 25 on page 81 and Figure 26 on page 82 show, the relationship between the DC1 and DC2 sites is maintained through SRDF/Synchronous replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. In a concurrent configuration, the asynchronous relationship is between DC1 and DC3, while in a cascaded environment, the asynchronous relationship is between DC2 and DC3. Figure 25 on page 81 and Figure 26 on page 82 also show the three EMC GDDR C-Systems with their independent heartbeat communication paths, separate from the production disk and computer facilities. The C-system at the Primary DASD site has EMC Consistency Group software installed. It is recommended that one or 82 GDDR Solution Design and Implementation Techniques
  • 83. EMC Geographically Dispersed Disaster Restart (EMC GDDR) more of the Primary DASD site production z/OS systems also has EMC Consistency Group software installed. During relevant GDDR scripts, GDDR will ensure the Consistency Group software is started where needed. In this environment, EMC GDDR can perform the following tasks: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Manage reconfiguration of the SRDF/Star environment between concurrent and cascaded topologies ◆ Manage reconfiguration of the SRDF/Star environment from cascaded to concurrent with a primary processing site move ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events, including: • Sites • Systems • ConGroup trip • Loss of SRDF/S • Loss of SRDF/A • Inter-site communication failure Supported configurations 83
  • 84. EMC Geographically Dispersed Disaster Restart (EMC GDDR) EMC GDDR SRDF/Star complex Figure 27 on page 84 depicts a logical view of a typical EMC GDDR complex in a concurrent configuration. Primary site Secondary site DC1 DC2 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C2 Sysplex #1 Sysplex #1 Master C SRDF/S Local R1 R2 Local BCV BCV SRDF/A DC3 C3 R2 Local BCV Tertiary site SYM-002220 Figure 27 GDDR Complex with Star The following are brief descriptions of the components that comprise this GDDR complex: BCVs — BCVs (Business Continuance Volumes) can be supported at each of the sites. Primary site — The primary site is the site where the production workload is located. 84 GDDR Solution Design and Implementation Techniques
  • 85. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary DASD site — The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is normally the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is normally the same as the secondary site. Sites DC1, DC2, and DC3 — Site DC1, DC2, and DC3 are used together in an SRDF/Star data replication three-site topology, where DC1 is the primary site with concurrent or cascaded replication to the secondary site DC2 and the tertiary site DC3. Site DC3 is the tertiary data center for critical production applications and data. It is connected with redundant network to both DC1 and DC2. Data is replicated asynchronously from the current primary DASD site or secondary DASD site (with cascaded SRDF) with an intended recovery point objective (RPO) in a short time period. SRDF/Star with AutoSwap The three-site SRDF/Star with AutoSwap configuration provides for near-continuous availability through device failover between sites DC1 and DC2 as well as disaster restart capabilities at site DC3. Figure 28 on page 86 illustrates EMC GDDR operation in a Concurrent SRDF/Star with AutoSwap environment. Figure 29 on page 87 illustrates EMC GDDR operation in a Cascaded SRDF/Star with AutoSwap environment. Supported configurations 85
  • 86. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R2 SRDF/S DC3 SRDF/A R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002228 Figure 28 Concurrent SRDF/Star with AutoSwap environment 86 GDDR Solution Design and Implementation Techniques
  • 87. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R21 SRDF/S SRDF/A DC3 R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002222 Figure 29 Cascaded SRDF/Star with AutoSwap environment As Figure 28 on page 86 and Figure 29 on page 87 show, the relationship between the DC1 and DC2 sites is maintained through SRDF/Synchronous replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. In a concurrent configuration, the asynchronous relationship is between DC1 and DC3, while in a cascaded environment, the asynchronous relationship is between DC2 and DC3. These illustrations show the three EMC GDDR C-Systems with their independent heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs has EMC AutoSwap and EMC Consistency Group (ConGroup) installed. Supported configurations 87
  • 88. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can perform the following tasks: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Manage reconfiguration of the SRDF/Star environment between the concurrent and cascaded topologies ◆ Manage reconfiguration of the SRDF/Star environment from cascaded to concurrent with a primary processing site move ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events, including: • Sites • Systems • ConGroup trip • Loss of SRDF/S • Loss of SRDF/A • Inter-site communication failure • AutoSwap events ◆ Configure/reconfigure: • Couple datasets ◆ Manage coupling facilities: • Policies 88 GDDR Solution Design and Implementation Techniques
EMC GDDR requirements

Mainframe environment requirements

The basic infrastructure must support the specific EMC technologies involved in the configuration (for example, SRDF/S with AutoSwap or SRDF/A). In addition, EMC GDDR has the following specific infrastructure requirements:

◆ One LPAR is required for each C-System, preferably using DASD isolated from the managed environment.
◆ There must be network connectivity between all C-Systems.
◆ An HMC (Hardware Management Console) must be available at each site that can be accessed from each C-System (access to these HMCs can be protected by means of a private VLAN).

EMC GDDR has the mainframe environment requirements listed in Table 1 on page 89.

Table 1 Mainframe environment requirements

Item                               Requirements
Processor hardware configuration   Any system that supports current IBM mainframe operating systems
DASD hardware configuration        Any supported Symmetrix DASD model at an Enginuity microcode level specified in the EMC GDDR Release Notes
Software                           Any currently supported IBM operating system

EMC GDDR supports and can manage the following combinations of DASD in a single Enterprise Consistency Group:

◆ Single EMC Symmetrix controllers configured with any of the following:
• All CKD devices
• All FBA and FBA-META devices
• Any combination of CKD, FBA and FBA-META devices
◆ Multiple EMC Symmetrix controllers configured with any of the following:
• All CKD devices
• All FBA and FBA-META devices
• Any combination of CKD, FBA and FBA-META devices

Management and monitoring of both CKD and FBA/FBA-META devices is performed from the z/OS platform where the EMC GDDR application resides. From the EMC GDDR point of view, CKD and FBA/FBA-META Symmetrix devices are the same; that is, each is treated no differently than the other. They are all command targets of SRDF Host Component configuration commands using local or remote syntax.

EMC GDDR requires that if even one device in an RDF group is defined to GDDR, then all devices in that group must be defined to GDDR. Most GDDR actions are directed at the RDF group level (although in some cases, GDDR will act on device ranges if that is appropriate).

EMC GDDR has no limitations on the number of EMC Symmetrix controllers/devices that can be managed. Any limitations are subject to restrictions in EMC hardware and software.

Minimum hardware requirements

Table 2 on page 90 describes the recommended minimum processor and I/O configuration for an EMC GDDR C-System.

Table 2 Minimum hardware requirements

Item                                       Requirements
Logical processors                         1 (2 are recommended)
MSU                                        15 on IBM 2084-306 (or equivalent)
Storage                                    512 MB
Logical paths to own local DASD devices    4
Logical paths to managed DASD devices      4

Minimum software requirements

The minimum software requirements for EMC GDDR for the various supported configurations are listed in the following sections. The EMC GDDR Release Notes provide the latest information regarding
  • 91. EMC Geographically Dispersed Disaster Restart (EMC GDDR) supported software release levels. Installation procedures for the EMC software products listed can be found in the EMC Mainframe Enablers Installation and Customization Guide. Table 3 Software requirements Configuration HMC API or BCPii MFE Consistency AutoSwap support Group SRDF/S with ConGroup    SRDF/S with AutoSwap     SRDF/A   SRDF/Star    SRDF/Star with AutoSwap     Additional configuration requirements The following sections list additional configuration requirements for the supported EMC GDDR configurations. SRDF/S with ConGroup Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/S environment. SRDF/S with AutoSwap Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/S environment. ◆ SRDF/S with AutoSwap has the following additional requirements: ◆ CAX protection must be added to the SRDF/S-defined ConGroups. ◆ LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified. The EMC Consistency Group for z/OS Product Guide and EMC AutoSwap Product Guide provide information on these items. SRDF/A Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/A environment. EMC GDDR is compatible with SRDF Automated Recovery functionality. EMC GDDR requirements 91
  • 92. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/A MSC has the following additional gatekeeper requirement: ◆ There must be one or more gatekeeper devices for each MSC-controlled RDF group. These gatekeeper devices must be in OS configuration as OFFLINE at IPL as regular local devices (not BCV, SRDF, SAV, and so forth). SRDF/Star Refer to the SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/Star environment. SRDF/Star with AutoSwap Refer to the SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/Star environment. SRDF/Star with AutoSwap has the following additional requirements: ◆ CAX protection must be added to the SRDF/Star-defined ConGroup. ◆ LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified. The EMC Consistency Group for z/OS Product Guide and EMC AutoSwap Product Guide provide information on these items. In addition, there must be one or more gatekeeper devices for each MSC-controlled RDF group. These gatekeeper devices must be: ◆ SRDF/S-protected (you can choose any available SRDF/S RDF group) ◆ ConGroup and CAX-protected ◆ Not SRDF/A-protected ◆ If CKD, in OS configuration as ONLINE at IPL ◆ If FBA, in OS configuration as OFFLINE at IPL 92 GDDR Solution Design and Implementation Techniques
EMC GDDR fundamental concepts

EMC GDDR control systems

The EMC GDDR control systems are more commonly referred to as EMC GDDR C-Systems. One EMC GDDR C-System is located at each site in a separate z/OS system. Each EMC GDDR C-System runs in monoplex mode from local DASD. EMC GDDR C-Systems do not run any production workload. EMC recommends that GDDR is installed on dedicated C-System DASD on separate controllers from the production DASD. Deploying GDDR C-Systems as dedicated z/OS systems ensures their availability for both planned and unplanned events affecting application and/or site operations. Isolation of the GDDR C-Systems from the managed sysplex(es) ensures GDDR operational isolation and provides capabilities that other automated business continuity solutions cannot offer.

The main functions of an EMC GDDR C-System are to:

◆ Control the recovery after an outage
◆ Control a planned site swap

One of the C-Systems is designated as the Master C-System. During normal operations, the Master C-System is the central control point for all EMC GDDR activities. The Master C-System is located at the secondary DASD site in configurations with AutoSwap and at the primary DASD site in other configurations. All EMC GDDR C-Systems are potential candidates to take over as the Master C-System; if the Master C-System becomes unavailable for some reason, a C-System at another location/site assumes the EMC GDDR master function ownership. This way, either the Master C-System survives a site loss, or the Master role is automatically transferred to a surviving site. When the original Master C-System becomes available, the master function ownership automatically transfers back to the correct location.

Some EMC GDDR functions can only be carried out by the Master C-System, for example:

◆ Running planned processes
◆ Updating EMC GDDR parameters
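The Master C-System ownership behavior described above can be summarized with a small hypothetical sketch (Python; the class, method names, and site names used here are invented): when the Master becomes unavailable another C-System assumes the role, and when the preferred Master returns the role transfers back.

# Hypothetical sketch of Master C-System ownership handling; not EMC code.

class GDDRComplex:
    def __init__(self, c_systems, preferred_master):
        self.available = {name: True for name in c_systems}
        self.preferred_master = preferred_master
        self.master = preferred_master

    def mark_unavailable(self, name):
        self.available[name] = False
        if name == self.master:
            # A surviving C-System assumes the Master role so that planned
            # and takeover scripts can still be run.
            survivors = [c for c, up in self.available.items() if up]
            self.master = survivors[0]
            print("Master role assumed by", self.master)

    def mark_available(self, name):
        self.available[name] = True
        if name == self.preferred_master and self.master != name:
            # When the original Master C-System returns, ownership transfers
            # back to the preferred location.
            self.master = name
            print("Master role returned to", self.master)

complex_ = GDDRComplex(["C1", "C2", "C3"], preferred_master="C1")
complex_.mark_unavailable("C1")   # Master role assumed by C2
complex_.mark_available("C1")     # Master role returned to C1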
  • 94. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Workload location In an EMC GDDR complex without AutoSwap, the business or production workload runs at a single site; that is, one side of the Sysplex. This is the same location as the primary DASD site. In an EMC GDDR complex with AutoSwap, the business or production workload can run as either a single site workload, usually at the same location as the primary DASD site, or as a multi-site workload, where the production workload runs at both the primary and secondary sites. In an EMC GDDR complex with an SRDF/A two-site configuration the production systems and primary DASD must always be at the same site. A contingency or standby system is a system that provides backup to a production system. The characteristics of a contingency system in an EMC GDDR installation depend on the configuration. Contingency or standby systems are typically located at the same location as the secondary DASD. Multiple locations containing contingency or standby systems may be used to increase availability and provide disaster restart options. Regional contingency systems are typically located in the same location as the secondary DASD, while out-of-region standby systems provide protection from geographic and infrastructure exposures that may negatively impact the primary and secondary sites. In an SRDF/S two-site configuration, a contingency system normally provides a hot backup to the production system and is typically located at the same location as the secondary DASD. The contingency system: ◆ Is in the same Sysplex as its production system partner ◆ Is IPLed, but runs no business workload In an SRDF/A two-site configuration, a contingency or standby system is a system that replaces production system capacity in the event of a loss of use of the primary site. It is located at an appropriate distance from the primary systems to minimize risks from geographic and infrastructure exposures which may negatively impact primary systems availability. The contingency system: ◆ May be used for expendable workloads which are displaced by business workload following the loss of a primary site. 94 GDDR Solution Design and Implementation Techniques
  • 95. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ May be cold (not powered up), or warm (powered up but not IPLed) systems reserved for business workload restart and testing of restart processes, but not in support of any meaningful day-to-day workload. A contingency or standby system in a SRDF/Star configuration normally provides a hot backup to a production system. A contingency system: ◆ Is in the same Sysplex as its production system partner ◆ Is IPLed, but runs no business workload ◆ May be an out-of-region location equipped with idle processor capacity reserved for support of business workload restarts Any production or contingency/standby system defined to EMC GDDR is known as an EMC GDDR managed system. Managed workloads EMC GDDR can trigger the stop and restart of production workloads on: ◆ z/OS systems ◆ Distributed systems External workloads External workloads run on mainframe systems which do not have their DASD in the managed Symmetrix units. EMC GDDR can coordinate stop and start of the workload on these "non-managed" mainframe systems with the workload stop and start actions for managed systems. Excluded systems Excluded systems apply to GDDR configurations with AutoSwap only. Symmetrix arrays can be shared by multiple systems and some of those systems may not require EMC GDDR protection. EMC GDDR can be configured to exclude certain z/OS systems from workload management - although these systems have their DASD in the managed Symmetrix arrays. Because of this, these systems must be running AutoSwap, or else they would fail during GDDR managed site swaps. HMC-Bypass option Applies at the LPAR-level or the Site level and prevents GDDR from performing ANY action that would require HMC access for the relevant site or LPAR. Goal of this option is to allow GDDR to be deployed at sites under a third-party Facilities Management contract, where the third-party has exclusive access to the HMC, and this HMC-access by individual customers using the provided facilities is not allowed. EMC GDDR fundamental concepts 95
  • 96. EMC Geographically Dispersed Disaster Restart (EMC GDDR) HMC-only systems EMC GDDR can be configured to restrict IPL and CBU (Capacity Backup Upgrade) actions for certain systems to the online interface. No other actions or automation are performed for these systems. EMC GDDR processes An EMC GDDR process or script is a dynamically determined sequence of function calls to achieve a predetermined result. Generally one function call corresponds to one action. An EMC GDDR process is executed by calling EMC GDDR provided routines, either from a batch job or as a result of specific messages being issued. There are two types of EMC GDDR processes: Planned process An EMC GDDR planned process is initiated through the EMC GDDR interface to perform a planned task. Unplanned process or Takeover process The EMC GDDR unplanned process or takeover process can only be initiated following an error that results in a possible takeover situation. Takeover processes are initiated as a result of certain messages being issued or specific events occurring. The messages or events that trigger an unplanned or takeover process can originate on any system, either a C-System or a production system. In response to a trigger event the current Master C-System, following operator confirmation of any trigger event, will invoke the necessary processes based on the current EMC GDDR complex and managed system configurations. Processes are invoked following operator confirmation of any of the following types of failure or loss: ◆ Sites ◆ DASD ◆ Systems ◆ Loss of SRDF link ◆ Loss of host channels EMC GDDR process restart In the event an EMC GDDR process does not complete properly the return codes from the function calls that make up an EMC GDDR process are saved in EMC GDDR global variables. For functions that issue EMC SRDF Host Component commands, the return code of the 96 GDDR Solution Design and Implementation Techniques
commands are also saved. If multiple commands are issued from one function, the return codes from each command are saved in EMC GDDR global variables. After the cause of the original failure has been identified and resolved, the EMC GDDR process can be rerun. EMC GDDR uses the saved return codes to establish the point of restart; that is, the point of the previous failure. This ensures that no modifications to the supplied EMC GDDR process jobs are required in order to rerun after a failure. If the underlying problem that caused a GDDR function to fail has been resolved in a way that makes the GDDR function call unnecessary, the GDDR interface provides an easy way to skip the failed function call. If necessary, restart can be forced at a particular step by manually setting the EMC GDDR system variables appropriately.
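The restart mechanism lends itself to a short illustration. The sketch below is hypothetical Python (the step names, return codes, and the dictionary standing in for GDDR global variables are all invented): steps whose saved return code is zero are bypassed on a rerun, the script resumes at the failed step, and a failed step can be explicitly skipped when the underlying problem has been resolved another way.

# Hypothetical sketch of script restart driven by saved return codes.
# The dictionary stands in for GDDR global variables; all names are invented.

saved_rc = {}    # step name -> last return code (the restart checkpoint)

def run_script(steps, skip=()):
    for name, func in steps:
        if saved_rc.get(name) == 0:
            continue                          # already completed on a previous run
        if name in skip:
            print(name, "skipped by operator request")
            saved_rc[name] = 0
            continue
        rc = func()
        saved_rc[name] = rc                   # persist the step's return code
        if rc != 0:
            print(name, "failed with RC =", rc, "- fix the problem and rerun")
            return False
    print("script complete")
    return True

attempts = {"SWAP_DASD": 0}

def suspend_srdf():
    return 0

def swap_dasd():
    attempts["SWAP_DASD"] += 1
    return 8 if attempts["SWAP_DASD"] == 1 else 0   # fails only on the first attempt

def restart_workload():
    return 0

steps = [("SUSPEND_SRDF", suspend_srdf),
         ("SWAP_DASD", swap_dasd),
         ("RESTART_WORKLOAD", restart_workload)]

run_script(steps)   # stops at SWAP_DASD (RC = 8)
run_script(steps)   # restarts at SWAP_DASD and then completes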
EMC GDDR components

The EMC GDDR software environment includes many components controlled by a parameter library. Parameters define the EMC GDDR managed configuration. Some parameters are maintained in external parmlib members, but most are maintained via the EMC GDDR user interface and loaded into EMC GDDR global variables by the parameter activation process. The user interface is an ISPF application available only on the EMC GDDR C-Systems. There are events in which EMC GDDR is interested, and message rules determine the actions to be taken in response to those events. Most GDDR procedures run as started tasks and include EMC foundation products such as ResourcePak Base and SRDF Host Component.

Parameters

Internal parameters

The GDDR Parameter Wizard is the control interface for managing GDDR parameters. It should only be used on the C-System which is intended to be the Master C-System for the managed configuration. During the activation process, the new parameters are loaded into EMC GDDR global variables on the Master C-System and propagated to the remaining C-Systems using GDDR peer-to-peer communication.
  • 99. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The basic process of updating EMC GDDR parameters consists of the following steps: ◆ Back up the existing parameters. The parameter management process uses an existing backup of parameter values as the starting point for subsequent updates. ◆ Select the backup as input. ◆ Modify parameters as required. The updates are captured in PDS members in a work dataset. ◆ Validate the parameter set. This is done via a batch job submitted from the user interface. ◆ Activate the parameter set. This is also a batch job submitted from the user interface. User interface The EMC GDDR user interface is an ISPF application. It is used only on the EMC GDDR C-Systems. System components The EMC GDDR SAMPLIB contains an exec called GDDREXC that can be customized to allocate the required EMC GDDR and libraries. Once customized, this exec can be integrated into a site's ISPF panels in accordance with site standards. EMC GDDR panels will display only the options the user is authorized to use. The GDDR ISPF user interface does not currently support operating in both screens when in split screen mode. EMC GDDR components 99
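Tying together the parameter-maintenance steps listed earlier on this page (back up, select the backup, modify, validate, activate), the following hypothetical sketch outlines the flow in Python; the function and parameter names are invented, and in the product these steps are ISPF panels and batch jobs rather than code.

# Hypothetical outline of the parameter update flow on the Master C-System.
# Function names are invented; in the product these are panels and batch jobs.
import copy

def update_parameters(current_params, changes, validate, activate):
    backup = copy.deepcopy(current_params)   # 1. back up the existing parameters
    work = copy.deepcopy(backup)             # 2. use the backup as the work copy
    work.update(changes)                     # 3. modify parameters as required
    problems = validate(work)                # 4. validation batch job
    if problems:
        raise ValueError("validation failed: " + "; ".join(problems))
    activate(work)                           # 5. activation loads the global variables
    return work                              #    and propagates them to the other C-Systems

def validate(params):
    return [] if params.get("PRIMARY_SITE") in ("DC1", "DC2") else ["invalid PRIMARY_SITE"]

def activate(params):
    for peer in ("C2", "C3"):
        print("propagating", len(params), "parameters to", peer)

params = {"PRIMARY_SITE": "DC1", "TOPOLOGY": "STAR"}
params = update_parameters(params, {"PRIMARY_SITE": "DC2"}, validate, activate)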
  • 100. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary Options Menu Invoking EMC GDDR produces the Primary Options Menu, as shown in Figure 30 on page 100. Figure 30 Primary Options Menu The Primary Options Menu displays information on the current state of EMC GDDR in the environment. It shows the name of the C-System and site, which C-System is currently Master, which site is the primary processing site, which site is the primary DASD site, whether EMC GDDR automation is currently enabled, and the name of the planned or unplanned script, if any, that is currently in process. From this screen, EMC GDDR automation can be toggled on and off using the commands GDDR ON and GDDR OFF. This menu also provides access to the following functions: 100 GDDR Solution Design and Implementation Techniques
  • 101. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Update Personal GDDR ISPF Profile allows each user to change GDDR ISPF profile variable values associated with their TSO ID. This includes items like datasets required to submit scripts, and job card information. Each user must enter their profile updates before performing any setup and maintenance functions. ◆ GDDR Setup and Maintenance — Provides options to manage EMC GDDR internal parameters, system variables and system options. ◆ View GDDR configuration — Displays EMC GDDR configuration details for sites, features, and C-Systems. ◆ Manage Site Roles — Allows manual changes to be made to site roles, including transferring the Master role to a different C-System, changing the AutoSwap owner, and moving the MSC control function. ◆ Perform pre-script checkup — Displays any active events, shows the status of the EMC GDDR internal command queue, and performs communication checks to validate that all production systems and C-Systems are active and all inter-systems communications are enabled. ◆ Run GDDR Scripts — Allows the user to run EMC GDDR background scripts. ◆ View GDDR Script Statistics — Displays timing information for recently-executed scripts. ◆ Perform GDDR Actions — Provides options to perform actions such as HMC LPAR and CBU actions, HMC discovery, and managing couple datasets. EMC GDDR components 101
  • 102. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option P: Profile - Update Personal GDDR ISPF Profile Choosing this option from the Primary Options Menu displays the screen shown in Figure 31 on page 102. Figure 31 Change GDDR ISPF Profile Variable Values The GDDR Subsystem name cannot be changed but all other values requested on the screen must be filled in to allow the user to perform EMC GDDR functions like parameter maintenance and script execution. 102 GDDR Solution Design and Implementation Techniques
  • 103. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option M: Maintenance - GDDR Setup and Maintenance Figure 32 on page 103 shows the EMC GDDR Setup and Maintenance Menu. Figure 32 Setup and Maintenance The GDDR ON and GDDR OFF commands can be used from this menu to enable and disable EMC GDDR automation. This menu also provides access to the following functions: ◆ Manage GDDR Parameters — Provides access to a series of screens that are used to maintain the EMC GDDR internal parameters described in 5.4.1.1 Internal parameters. These screens will be presented in detail later in this TechBook. ◆ Message, Debug, and Trace options — Provides the ability to set default debugging levels and override those values for specific programs. EMC GDDR components 103
  • 104. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Manage GDDR Internal Command Queue — Provides information on the command queue used by EMC GDDR to store SRDF commands that have not yet been executed. This screen can be used to alter EMC GDDR processing. EMC strongly recommends that no action be taken on this screen unless instructed to do so by EMC GDDR Solution Support. ◆ Perform HMC Discovery — Discovers the HMC objects at each site. This function can be used to validate the HMC information entered during EMC GDDR configuration as well as the EMC GDDR HMC interface installation and setup. ◆ Refresh GDDR Message Table — Refreshes the GDDRMSG table managed by GDDRMAIN ◆ Manage GDDR System Variables — Provides the ability to view and modify all system variables used by EMC GDDR to describe and manage the environment. Any updates done thru this panel remain local to the current C-system and are not subject to any validation. In general, use of this screen should be avoided without guidance from EMC GDDR Solution Support. 104 GDDR Solution Design and Implementation Techniques
  • 105. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option G: GDDR Config - View GDDR Configuration Selecting this option displays a screen similar to Figure 33 on page 105 with the information relevant to the EMC GDDR configuration in use. No changes can be made on this screen. Figure 33 View GDDR Configuration EMC GDDR components 105
  • 106. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option R: Roles - Manage Site Roles EMC GDDR script processing manages the location of the Master C-System, the AutoSwap Owner system, and the MSC Control site as part of normal operations. The Manage Site Roles panel is provided to address exception conditions such as preparing for a script restart. Figure 34 on page 106 shows the Manage Site Roles menu. Figure 34 Manage Site Roles The following choices are available: ◆ Transfer Master-C system function — Displays the sites in the EMC GDDR complex and indicates the current and recommended Master. The Master C-System function can be transferred to another site. ◆ Move AutoSwap owner — Displays the sites involved in an AutoSwap relationship and indicates the current and recommended owner. AutoSwap ownership can be transferred to another site. 106 GDDR Solution Design and Implementation Techniques
  • 107. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Move Primary MSC Server — Displays the sites eligible to assume the MSC control function and indicates the current and recommended owner. MSC control can be transferred to another site. Option C: Checkup - Perform Pre-Script Checkup When this option is selected, EMC GDDR performs a series of health checks and returns the results. A screen similar to Figure 35 on page 107 is presented. Figure 35 Perform Health Check In this example, the following information is provided: ◆ There is an SRA and an MSC event ◆ GDDR is running in Degraded Mode ◆ The GDDMPARM member was found to be consistent EMC GDDR components 107
◆ STAR-HA is not available (either it is not configured, or there is no secondary MSC server)
◆ The GDDR command queue is empty
◆ All C-Systems are operating normally from an HMC perspective, but none of them have GDDRMAIN fully operational (some components are inactive; a D (detail) line command is available to drill down on this situation)
◆ There was an HMC setup issue preventing system-status determination for the two managed systems, and GDDRMAIN is not active on the managed systems

Option S: Scripts - Run GDDR Scripts

This option can only be chosen on the Master C-System. It displays the Select Script to Run screen, which is a dynamic display. It lists the EMC GDDR scripts that are eligible to run under the current configuration, and also specifies configuration information and any scripts that are in progress. Depending on the locations of the primary site and the primary DASD, the panel is customized to display valid choices. Scripts that cannot be used at the current site do not appear. The Configuration field indicates the features of the configuration that EMC GDDR takes into account when determining which scripts are eligible to run and hence which are present in the list of scripts displayed on the panel. Figure 36 on page 109 shows a sample of the Select Script to Run screen.
Figure 36 Select Script to Run
  • 110. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option T: Timing - View GDDR Script Statistics Choosing this option displays a list of scripts that have run on this C-System, as shown in Figure 37 on page 110. Figure 37 View GDDR Script Statistics Selecting a script displays step start and end times. If a script were currently running, it would be displayed on the "Last planned:" or "Last unplanned:" line. 110 GDDR Solution Design and Implementation Techniques
  • 111. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option A: Actions - Perform GDDR Actions Figure 38 on page 111 is the sub-menu displayed when Option A is chosen. Figure 38 Actions Menu From this menu, the following actions are available: ◆ Perform HMC Discovery — Discovers the HMC objects at each site. This function can be used to validate the HMC information entered during EMC GDDR configuration as well as the EMC GDDR HMC interface installation and setup. This function is also available from the GDDR Setup and Maintenance menu. ◆ Perform HMC LPAR actions — Provides the ability to carry out actions against the EMC GDDR defined sites and systems including activating or deactivating an LPAR, performing a load EMC GDDR components 111
  • 112. EMC Geographically Dispersed Disaster Restart (EMC GDDR) clear, reset clear, stop or start for an LPAR, and querying an LPAR for its status. This option can only be used on the Master C-System. ◆ Perform HMC CBU actions — Presents a list of sites with processors enabled for CBU (Capacity Backup Upgrade) activation, if any. Action codes can be used to activate backup capacity (REAL mode), simulate activation of backup capacity (TEST mode), terminate actual backup capacity activation, or terminate simulated backup capacity activation. These actions can only be performed on the Master C-System. ◆ Manage Couple Datasets — Checks the couple dataset configurations on all systems. Currently used couple datasets are compared against those defined on the "Define Managed Couple Datasets" panel during EMC GDDR parameter definition. If discrepancies are found, SETXCF commands are issued to bring couple dataset usage in line with EMC GDDR parameter specifications. This function is provided to adjust couple datasets when it is not appropriate to run a site swap script, which will also align couple dataset usage with the defined parameters. Events and rules An EMC GDDR event is a state change in a component part of the environment that EMC GDDR is actively monitoring. Examples of EMC GDDR events include: ◆ CGT — ConGroup trip has occurred/state change ◆ CGD — ConGroup group is disabled/state change ◆ SRA — SRDF/A link is down ◆ MHB — Missing C-System heartbeat ◆ LNK — Link to a C-System down ◆ RDF — RDF link down ◆ LDR — Local Disaster ◆ RDR — Regional Disaster Events are stored in EMC GDDR global variables. An event can have a state of either TRUE or FALSE. If the event has a state of TRUE, it has occurred or is currently occurring. If the event has a state of FALSE, it is no longer occurring. An event that is TRUE is considered an exception. EMC GDDR events are used by the GDDR event monitor and GDDR processes to determine environment state. A change in state can then: 112 GDDR Solution Design and Implementation Techniques
◆ Trigger unplanned/takeover processes
◆ Prevent a planned process from running

EMC GDDR is supplied with message interception rules. These message rules have two primary functions:
◆ To detect events that EMC GDDR is interested in and set the appropriate EMC GDDR event TRUE or FALSE.
◆ To detect events that EMC GDDR processes have to wait for (WTOR), and reply as to the success or failure of the waited-for event. This determines whether an EMC GDDR process proceeds or terminates.

EMC GDDR uses intersystem communication to route message traffic between production systems and C-Systems. The arrival of a message at the target production system can be used to trigger an automation rule (for example, in IBM Tivoli NetView or BMC Control-M). Such rules can be used to start or shut down workloads on the appropriate systems.

Procedures

There are several JCL procedures used in an EMC GDDR environment, though most are not part of EMC GDDR itself. They include the tasks for EMC foundation products such as ResourcePak Base and SRDF Host Component. The names presented here are the vendor defaults and may not be the names used in any given environment. If a customer environment does not support the BCPii HMC interface, CA-OPS is required. Refer to the GDDR product guides for more information.

EMC foundation products

Depending on the EMC GDDR configuration in use, there will be two or three EMC foundation product tasks running. All environments will run EMCSCF and EMCRDF, and environments including SRDF/S will also run EMCCGRP:
◆ EMCSCF — ResourcePak Base or Symmetrix Control Facility
◆ EMCRDF — SRDF Host Component
◆ EMCCGRP — Consistency Group

For more information on the EMC foundation products, refer to Chapter 2 of this book or the EMC product documentation.
EMC GDDR procedures

These are the JCL procedures needed in an EMC GDDR environment:
◆ GDDRMAIN — Main EMC GDDR address space.
◆ GDDREVM — EMC GDDR Event Monitor. The event monitor is used to analyze event state changes in which EMC GDDR is interested. On detecting the occurrence of selected events, the event monitor determines what action to take and prompts operators with the appropriate choices via WTOR.
◆ GDDRHBM — EMC GDDR C-System heartbeat monitor. The heartbeat monitor aids the event monitor in determining the status of the EMC GDDR managed environment. The lack of a heartbeat from a particular C-System is used to determine the state of that C-System and the site.
◆ GDDRWORK — GDDR workers. Some pieces of work required for proper GDDR functioning run under separate GDDRWORK started tasks. This includes all GDDR HMC functions, GDDR command queue processing, and the parts of GDDR scripts shipped to a remote C-System.
◆ GDDRPROC — Used to run scripts. This is the only EMC GDDR procedure not used for a started task that is up and running at all times.

Started task locations

The started tasks that need to run on any given system in the EMC GDDR complex depend on whether it is a managed production system or a C-System, which site it is at, and which configuration is in use.

GDDRMAIN runs on all C-Systems in the GDDR complex. In three-site configurations, it optionally does not run at DC3 if the customer chooses to run DC3 as a "lights-out" data center. GDDRMAIN also runs on all fully managed production systems.

GDDREVM, GDDRHBM and GDDRWORK run on all C-Systems at all sites in the EMC GDDR complex. They are started automatically by GDDRMAIN. They are not required on the production systems.

EMCSCF runs on all C-Systems and all IPLed production systems at all sites. EMCRDF runs on all C-Systems.
EMCCGRP runs on C-Systems and on one or more production systems at the primary DASD site when SRDF/S is in use. It runs on all systems at both DC1 and DC2 in AutoSwap configurations.
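Because the required set of started tasks differs by system role and site, it can be useful to verify them after an IPL or a configuration change. The following REXX fragment is a minimal sketch of such a check; it assumes the SDSF REXX interface (ISFCALLS/ISFEXEC) is available and uses the default task names listed above, so adjust the expected list to the names and roles actually in use at your site.

/* REXX - sketch: check that expected GDDR/EMC started tasks are active */
expected = 'GDDRMAIN GDDREVM GDDRHBM GDDRWORK EMCSCF EMCRDF'  /* C-System defaults */

rc = isfcalls('ON')            /* enable the SDSF host command environment */
isfprefix = '*'                /* do not filter on job name prefix         */
isfowner  = '*'                /* do not filter on owner                   */
Address SDSF "ISFEXEC ST"      /* status display; job names land in JNAME. */

active = ''
do i = 1 to isfrows
  active = active jname.i
end
rc = isfcalls('OFF')

missing = ''
do w = 1 to words(expected)
  if wordpos(word(expected, w), active) = 0 then
    missing = missing word(expected, w)
end

if missing = '' then say 'All expected GDDR/EMC started tasks are active.'
else say 'Not active:' missing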
EMC GDDR supported scripts

Table 4 shows the mapping of script names to a short description of the script function, as well as the event type that causes the script to be generated.

Table 4 Script names and descriptions

Script name    Description                                          Event type
GDDRPA51       Reconfigure to Concurrent                            RECONFIGURE
GDDRPA52       Reconfigure to Cascaded                              RECONFIGURE
GDDRPA53       Reconfigure to concurrent SRDF with Site Move        RECONFIGURE
GDDRPA42       Swap production from <P> to <S> (Protected)          PLANNED
GDDRPA45       Swap production from <P> to <S> (FAST)               PLANNED
GDDRPA21       Swap DASD from <P> to <S> (Protected)                PLANNED
GDDRPA25       Swap DASD from <P> to <S> - FAST                     PLANNED
GDD2P17A       Abandon Site <P> (site swap)                         PLANNED
GDD2P18A       Restart production at <S> after site swap            PLANNED
GDD2P01A       Perform test IPL from BCVs at <tgt site>             PLANNED
GDDRPA27       Perform test IPL from R2s at DC3                     PLANNED
GDD2P03A       Perform test IPL from R2s at <S>                     PLANNED
GDDRUP31       Recover after unplanned swap                         UNPLANNED
GDD2U09A       Recover after unplanned swap or LDR                  UNPLANNED
GDD2U13A       Recover after loss of <P> (LDR)                      UNPLANNED
GDD2U10A       Recover after loss of <P> (LDR)                      UNPLANNED
GDD2U12A       Recover after loss of <P> (LDR)                      UNPLANNED
GDDRUP41       Recover after loss of <S>                            UNPLANNED
GDDRPA0A(1)    Resume replication after unplanned swap              UNPLANNED
GDDRPA0A(2)    Resume replication after loss of <P>                 UNPLANNED
GDDRPA0A(3)    Resume replication after loss of <S>                 UNPLANNED
GDD2PA0A(4)    Resume replication after unplanned swap              UNPLANNED
GDD2PA0A(5)    Resume replication after loss of <P>                 UNPLANNED
GDDRPA23       Resume ConGroup                                      RESUME
GDD2P02A       Resume after test IPL from BCVs at <tgt site>        RESUME
GDDRPA28       Resume SRDF/A after test IPL at DC3                  RESUME
GDDRPM29       Resume SRDF/A in MSC mode                            RESUME
GDDRPF29       Resume SRDF/A in STAR mode                           RESUME
GDD2P14A       Resume replication after link failure                RESUME
GDD2P16A       Resume after test IPL from R2s at <S>                RESUME
GDDRPAAB       Abandon Sites DC1 and DC2                            DC3 ONLY
GDDRPA05       Recover at DC3 after RDR at DC1 and DC2              DC3 ONLY
GDDRPA06       Restart Production LPARs at DC3 SRDFA to <tgt site>  DC3 ONLY
GDDRPA07       Recover at DC3 after LDR at DC1 SRDFA to DC2         DC3 ONLY

Where:
◆ PLANNED — Result from operator interaction with GDDR panels
◆ UNPLANNED — Result from unplanned events detected by the GDDR Event Monitor (EVM)
◆ RESUME — Resume replication after some type of service interruption (planned or unplanned)
◆ DC3 ONLY — These scripts only execute at DC3, for planned test events or for unplanned disaster event management
Table 5 shows the scripts mapped to the configuration types in which each script executes. The configuration columns of Table 5 cover the supported GDDR configurations, including Concurrent Star with AutoSwap, Concurrent Star, Cascaded Star with AutoSwap, Cascaded Star, Cascaded Star with EDP, two-site AutoSwap, SRDF/A with MSC, SRDF/S with AutoSwap, and SRDF/S.

Table 5 Script mapping (script-to-configuration applicability matrix)

GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)

Planned script operations

◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site:
• Stop the business workload at the primary DASD site
  • 120. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC2 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC2 — IPL contingency systems at site DC2 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC2 — IPL contingency systems at site DC2 using R2 devices: • Confirm that SRDF/S has been stopped normally via a ConGroup trip • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a ConGroup trip and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a local disaster: • Confirm that a ConGroup trip occurred • Confirm that SRDF links failed • Confirm that a local disaster (LDR) event occurred • Shut down applications at the primary site, if applicable • Split BCVs and condition R2s at secondary site for restart • Activate contingency systems 120 GDDR Solution Design and Implementation Techniques
  • 121. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart applications ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and SRDF/S restart commands • Reestablish BCVs at the secondary site Resumption operations ◆ Resume after test IPL from BCVs at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume after test IPL from R2s at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Restart SRDF/S to DC2 ◆ Resume replication after link failure — Restore the SRDF/S link after a loss of the link: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2) Planned script operations ◆ Swap production from DC1 to DC2 — Swap the DASD and the single-site workload from site DC1 to site DC2: • Stop the business workload at the primary DASD site • Swap the DASD to the secondary DASD site (AutoSwap followed by SRDF/S personality swap) EMC GDDR supported scripts 121
  • 122. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Resume SRDF/S • Restart the business workload ◆ Swap DASD from DC1 to DC2 — Swap the DASD only from site DC1 to site DC2. ◆ Perform test IPL from BCVs at DC2 — IPL contingency systems at site DC2 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC2 — IPL contingency systems at site DC2 using R2 devices: • Confirm that SRDF/S has been stopped normally via a Congroup trip • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after unplanned swap — Restart processing at the secondary site: • Stop business workload on the old primary site (if applicable) • Split BCVs at the new primary site • Move couple datasets to the new primary site (if applicable) • Trigger restart of business workload at the new primary site • Transfer Master C-System to the new secondary site ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a ConGroup trip and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a local disaster: • Confirm that a local disaster (LDR) event occurred • Condition R2s at the secondary site for restart • Shut down applications at the primary site, if applicable • Split BCVs at the secondary site • Activate contingency systems • Manage Couple Facility structures and datasets 122 GDDR Solution Design and Implementation Techniques
  • 123. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart applications ◆ Resume replication after unplanned swap — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Split BCVs at the secondary site, if applicable • Perform ConGroup cleanup, transfer AutoSwap ownership and restart SRDF/S • Manage Couple Facility structures and datasets • Reestablish BCVs at the secondary site Resumption operations ◆ Resume after test IPL from BCVs at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume after test IPL from R2s at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Restart SRDF/S with AutoSwap to DC2 ◆ Resume replication after link failure — Restore the SRDF/S link after a loss of the link: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site GDDR for SRDF/A (Two sites - DC1 and DC3) Planned script operations ◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site: EMC GDDR supported scripts 123
  • 124. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Stop the business workload at the primary DASD site • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC3 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate and load test LPARs using BCV volumes Unplanned script operations ◆ Recover after loss of DC1 (RDR) — Restart the single site workload after the GDDR Event Monitor has detected an SRDF/A link down event and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a regional disaster: • Confirm that an MSC drop occurred • Confirm that SRDF links failed • Confirm that a regional disaster (RDR) event occurred • Shut down applications at the primary site, if applicable • Split BCVs and condition R2s at secondary site for restart • Activate contingency systems • Restart applications ◆ Resume replication after loss of DC1 — Resume the SRDF/A link to the secondary site after a primary site disaster: • Confirm SRDF/A links are down • Split BCVs at the secondary site, if applicable 124 GDDR Solution Design and Implementation Techniques
• Issue MSC cleanup and SRDF/A restart commands
• Reestablish BCVs at the secondary site

Resumption operations

◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3:
• Stop test business workload, if applicable
• Reset clear test LPARs
• Reestablish the BCVs
◆ Resume SRDF/A after link loss — Restore the SRDF/A link after a loss of the link (planned or unplanned).

GDDR for SRDF/Star (Three sites)

Reconfiguration operations

◆ Reconfigure to concurrent SRDF — Reconfigure a Cascaded SRDF/Star environment to a Concurrent SRDF/Star environment. The result is that site DC1 is protected at site DC2 using SRDF/S, and DC1 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC2 and DC3
• Perform a differential CREATEPAIR from DC1 to DC3
• Reestablish the SRDF/Star environment (DC1 to DC3)
◆ Reconfigure to cascaded SRDF — Reconfigure a Concurrent SRDF/Star environment to a Cascaded SRDF/Star environment. The result is that site DC1 is protected at site DC2 using SRDF/S, and DC2 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC1 and DC3
• Perform a differential CREATEPAIR from DC2 to DC3
• Reestablish the SRDF/Star environment (DC2 to DC3)

Planned script operations

◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site:
  • 126. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Stop the business workload at the primary DASD site • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC2 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC3 — IPL contingency systems at site DC3 using R2 devices: • Confirm that SRDF/A has been stopped normally via an SRDF/A PENDDROP • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a local disaster (LDR) at the primary site. The following events occurring concurrently are considered a local disaster: Loss of SRDF/A, Loss of SRDF/S, or a ConGroup trip, or the primary site GDDR Heartbeat Monitor has timed out: Note: In the cascaded SRDF with ConGroup topology, the SRDF/A event is not considered for LDR. • Confirm that a local disaster (LDR) event occurred 126 GDDR Solution Design and Implementation Techniques
  • 127. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Deactivate systems at old primary site • Prepare the SRDF environment at secondary site • Create a consistent point on BCVs at DC3 • Perform a differential CREATEPAIR between the new primary site and DC3 • Start SRDF/A in MSC mode • Reestablish the SRDF/Star environment • Reestablish BCVs at DC3 • Activate production systems ◆ Recover after loss of DC2 — In environments which support Cascaded SRDF mode, restore data protection for the primary site should the secondary site suffer an outage resulting in loss of SRDF/S and SRDF/A. This script is submitted by the GDDR Event Monitor when a ConGroup trip is detected and the GDDR Heartbeat Monitor at the secondary site has timed out. These events occurring concurrently are viewed by EMC GDDR as a secondary site disaster. Following the restoration of the secondary site, the script 'Resume replication after loss of DC2' is used to resume SRDF/S: • Provide the option to stop primary site and secondary site workloads since all remote protection is lost • Split BCVs at DC3 • Remove the SRDF pairing between DC3 and the secondary DASD site • Perform a differential re-synchronization between the primary site and DC3 • Start SRDF/A in concurrent MSC mode • Provide the option to start primary site workloads • Reestablish BCVs at DC3 ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site EMC GDDR supported scripts 127
  • 128. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Resume replication after loss of DC2 — In environments which support Cascaded SRDF mode, resume the SRDF/S link to the primary site after a secondary site disaster and restart SRDF/A in SRDF/Star mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC2' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the primary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the primary site Resumption operations ◆ Resume ConGroup — Resume consistency group protection following a ConGroup trip: • Resume the consistency group. • Reestablish the BCVs at the secondary DASD site, if applicable. ◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A link to DC3 (either from DC1 or DC2 depending upon where the production workload is currently running) after a test on DC3: • Reset clear all systems IPL’d during the test of DC3 (at DC3) • Deactivate all LPARs previously activated for the test of DC3 (at DC3) • Restart SRDF/Star to DC3 ◆ Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A link to DC3 in MSC mode (from either DC1 or DC2 depending upon where the production workload is currently running) after a planned or unplanned swap. ◆ Resume SRDF/A (SRDF/Star) to DC3 — Restore the SRDF/A link to DC3 in SRDF/Star mode (from either DC1 or DC2) after a planned or unplanned stop of SRDF/A. 128 GDDR Solution Design and Implementation Techniques
  • 129. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC3-specific operations ◆ Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 as the new EMC GDDR Master C-System: • Stop the business applications • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR • Create a consistency point on the DC3 BCVs • Create a consistency point on the secondary DASD BCVs • Stop ConGroup on all remaining systems • Force the C-System at DC3 as the new EMC GDDR Master C-System ◆ Recover at DC3 after RDR at DC1 and DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2. There must be an SRDF/A connection to either DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems ◆ Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 (except C-System) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems EMC GDDR supported scripts 129
• Perform a differential CREATEPAIR between DC3 and DC1 or DC2
• Start SRDF/A in MSC mode
◆ Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event of a local disaster (LDR) that prevents the production workload from being run at DC1, and DC2 is a bunker site (storage only), restart the production LPARs at site DC3 and reestablish SRDF/A to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' script is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current:
• Activate all needed LPARs including CFs at DC3
• Activate CBU (if required)
• Create a consistency point at DC3
• Prepare SRDF environment
• IPL all needed production systems
• Perform a differential CREATEPAIR between DC3 and DC2
• Start SRDF/A in MSC mode

GDDR for SRDF/Star with AutoSwap (Three sites)

Reconfiguration operations

◆ Reconfigure to concurrent SRDF — Reconfigure a Cascaded SRDF/Star with AutoSwap environment to a Concurrent SRDF/Star with AutoSwap environment. The result is that site DC1 is protected at site DC2 using SRDF/S and AutoSwap, and DC1 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC2 and DC3
• Perform a differential CREATEPAIR from DC1 to DC3
• Reestablish the SRDF/Star with AutoSwap environment (DC1 to DC3)
◆ Reconfigure to concurrent SRDF with site move — Reconfigure a Cascaded SRDF/Star with AutoSwap environment to Concurrent SRDF/Star with AutoSwap with a primary processing site move. The initial state is that the workload at DC1
  • 131. EMC Geographically Dispersed Disaster Restart (EMC GDDR) is protected at DC2 with SRDF/S. The result is that DC2 is protected at DC1 using SRDF/S and AutoSwap, and DC2 is the source of the SRDF/A replication to DC3. The workload will be running at DC2: • Stop the business workload at site DC1 • Swap the DASD to DC2 (AutoSwap followed by SRDF/S personality swap) • Reset clear all production systems managed by EMC GDDR • Resume SRDF/S (DC2 to DC1) • Restart the business workload • Reestablish the SRDF/Star with AutoSwap environment (DC2 to DC3) ◆ Reconfigure to cascaded SRDF — Reconfigure a Concurrent SRDF/Star with AutoSwap environment to a Cascaded SRDF/Star with AutoSwap environment. The result is that site DC1 is protected at site DC2 using SRDF/S and AutoSwap, and DC2 is the source of the SRDF/A replication to DC3. The workload continues at DC1: • Stop SRDF/A and delete the SRDF/A relationship between DC1 and DC3 • Perform a differential CREATEPAIR from DC2 to DC3 • Reestablish the SRDF/Star with AutoSwap environment (DC2 to DC3) Planned script operations ◆ Swap production from DC1 to DC2 — Swap the DASD and the single-site workload from the primary DASD site to the secondary DASD site. In environments with geographically dispersed Sysplexes supported by cross-site host-DASD channels, Couple Dataset and Coupling Facility Structure management is performed as a result of a site move, either planned or unplanned. These actions are controlled by the Realign Couple Datasets and Rebuild Couple Facility Structure parameters: • Stop the business workload at the primary DASD site • Swap the DASD to the secondary DASD site (AutoSwap followed by SRDF/S personality swap) • Resume SRDF/S EMC GDDR supported scripts 131
  • 132. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart the business workload • Reestablish the SRDF/Star with AutoSwap environment ◆ Swap DASD from DC1 to DC2 — Swap only the DASD from the primary DASD site to the secondary DASD site. ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC3 — IPL contingency systems at site DC3 using R2 devices: • Confirm that SRDF/A has been stopped normally via an SRDF/A PENDDROP • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after unplanned swap — Restart processing at the secondary site after an unplanned swap has completed successfully. • Stop business workload on the old primary site (if applicable) • Trigger restart of business workload at the new primary site (if applicable) • Remove SRDF pairing between DC3 and the primary DASD site • Stop ConGroup on all systems • Perform a differential CREATEPAIR between the new primary DASD site and DC3 • Start SRDF/A in MSC mode ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a local disaster (LDR) at the primary site. The following events occurring concurrently are considered a local disaster: Loss of SRDF/A, Loss of SRDF/S, or a ConGroup trip, or the primary site GDDR Heartbeat Monitor has timed out: • Confirm that a local disaster (LDR) event occurred 132 GDDR Solution Design and Implementation Techniques
• Deactivate systems at old primary site
• Prepare the SRDF environment at secondary site
• Create a consistent point on BCVs at DC3
• Perform a differential CREATEPAIR between the new primary site and DC3
• Start SRDF/A in MSC mode
• Reestablish the SRDF/Star environment
• Reestablish BCVs at DC3
• Manage Couple Facility structures and datasets
• Activate production systems
◆ Recover after loss of DC2 — In environments which support Cascaded SRDF mode, restore data protection for the primary site should the secondary site suffer an outage resulting in loss of SRDF/S and SRDF/A. This script is submitted by the GDDR Event Monitor when a ConGroup trip is detected and the GDDR Heartbeat Monitor at the secondary site has timed out. These events occurring concurrently are viewed by EMC GDDR as a secondary site disaster. Following the restoration of the secondary site, the script 'Resume replication after loss of DC2' is used to resume SRDF/S:
• Provide the option to stop primary site and secondary site workloads since all remote protection is lost
• Split BCVs at DC3
• Remove the SRDF pairing between DC3 and the secondary DASD site
• Perform a differential re-synchronization between the primary site and DC3
• Start SRDF/A in concurrent MSC mode
• Provide the option to start primary site workloads
• Reestablish BCVs at DC3
◆ Resume replication after unplanned swap — Resume the SRDF/S link to the secondary DASD site after an unplanned swap (due to the loss of the primary DASD). The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after unplanned swap' script:
• Remove the SRDF pairing between the secondary DASD site and the primary DASD site
  • 134. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Clean ConGroup bits from secondary DASD • Resume SRDF/S (from current primary DASD site to the previous primary DASD site) • Put SRDF/A into SRDF/Star with AutoSwap mode ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster and restart SRDF/A in SRDF/Star with AutoSwap mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC1 (LDR)' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site ◆ Resume replication after loss of DC2 — In environments which support Cascaded SRDF mode, resume the SRDF/S link to the primary site after a secondary site disaster and restart SRDF/A in SRDF/Star with AutoSwap mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC2' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the primary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the primary site Resumption operations ◆ Resume ConGroup — Resume consistency group protection following a ConGroup trip: • Resume the consistency group • Re-establish the BCVs at the secondary DASD site, if applicable ◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3: • Stop test business workload, if applicable • Reset clear test LPARs 134 GDDR Solution Design and Implementation Techniques
  • 135. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Reestablish the BCVs ◆ Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A link to DC3 (either from DC1 or DC2 depending upon where the production workload is currently running) after a test on DC3: • Reset clear all systems IPLed during the test of DC3 (at DC3) • Deactivate all LPARs previously activated for the test of DC3 (at DC3) • Restart SRDF/Star with AutoSwap to DC3 ◆ Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A link to DC3 in MSC mode (from either DC1 or DC2 depending upon where the production workload is currently running) after a planned or unplanned swap. ◆ Resume SRDF/A (SRDF/Star AutoSwap) to DC3 — Restore the SRDF/A link to DC3 in SRDF/Star with AutoSwap mode (from either DC1 or DC2) after a planned or unplanned stop of SRDF/A. DC3-specific operations ◆ Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 as the new EMC GDDR Master C-System: • Stop the business applications • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR • Create a consistency point on the DC3 BCVs • Create a consistency point on the secondary DASD BCVs • Stop SRDF/Star with AutoSwap • Stop ConGroup on all remaining systems • Force the C-System at DC3 as the new EMC GDDR Master C-System ◆ Recover at DC3 after RDR at DC1 and DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2. There must be an SRDF/A connection to either DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 EMC GDDR supported scripts 135
  • 136. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems ◆ Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 (except C-System) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems • Perform a differential CREATEPAIR between DC3 and DC1 or DC2 • Start SRDF/A in MSC mode ◆ Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event of a local disaster (LDR) that prevents the production workload from being run at DC1, and DC2 is a bunker site (storage only), restart the production LPARs at site DC3 and reestablish SRDF/A to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' scripts is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current: • Activate all needed LPARs including CFs at DC3 • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems • Perform a differential CREATEPAIR between DC3 and DC2 • Start SRDF/A in MSC mode 136 GDDR Solution Design and Implementation Techniques
  • 137. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The remainder of the TechBook will describe the lab environment, document the installation and configuration of GDDR, and document running scripts with more detail on the steps executed in the scripts. EMC GDDR supported scripts 137
  • 139. 5 GDDR Managed Systems Design Considerations This chapter contains the following: ◆ Overview ........................................................................................... 140 ◆ Ignore, Reset Clear, or IPL in place................................................ 142 ◆ Contingency systems and workload movement ......................... 143 ◆ HMC-Only systems (new in GDDR 3.1)....................................... 144 ◆ LPAR and CEC recovery ................................................................. 146 ◆ Automated Configuration Check - DASD ................................... 149 GDDR Managed Systems Design Considerations 139
Overview

One of the many decisions that must be made as part of a GDDR implementation is whether customers want GDDR to manage production systems, production LPARs and/or production CPCs. It is important to understand that systems management in GDDR is optional. Some customers choose to have GDDR automation extend into managing zSeries systems, LPARs and workloads; other customers want some systems managed by GDDR and others excluded; and finally there is a set of customers who do not want GDDR to do anything to their environment other than manage EMC infrastructure software such as SRDF, TimeFinder and AutoSwap. The GDDR systems management functions are flexible and can accommodate this wide-ranging set of customer operational requirements.

If a decision is made for GDDR to manage system recovery beyond just the storage layer, a further decision needs to be made whether to include systems management in the scope of the implementation project or defer it to a later date. Deferring systems management to a second phase does make the implementation project simpler and shorter.

When a decision is made to implement GDDR systems management, the following features/functions must be considered for each system to be managed by GDDR:
◆ Ignore, Reset Clear or IPL in place
◆ Contingency systems and workload movement
◆ HMC Only
◆ LPAR and CPC recovery

Production systems under GDDR control are known as P-Systems or sometimes "managed systems." These are the customer systems that run application workload. Commencing with GDDR 3.1, EMC began introducing more functionality into CEC, LPAR and system recovery options. Some customers use GDDR to manage their P-Systems and others choose to manage these systems themselves. GDDR has been architected to provide significant flexibility when it comes to P-System management.
For customers that have more complex environments, it helps to build a GDDR managed system recovery map similar to Table 6 on page 141. This documents the actions that GDDR is to take for event management on any of the managed systems.

Table 6 GDDR managed system recovery map

LPAR name          E01            E02            E03            E04     F01            F02            F03
Location           DC1            DC1            DC1            DC1     DC2            DC2            DC2
CPC                SYSTEM E, IBM 2096-S07-R04 #012345 (E01-E04)         SYSTEM E, IBM 2096-S07-R04 #6789A (F01-F03)
Workload type      Production     Production     Development    Test    Production     Production     Development
Sysplex name       Prod           Prod                                  Prod           Prod
System name        MVSA           MVSB           MVSC           MVSD    MVSE           MVSF           MVSG
Desired state      UP             UP             UP             UP      UP             UP             UP
Recover            Yes            Yes            Yes            No      Yes            Yes            Yes
Recovery type      Workload Move  Workload Move  Recovery LPAR          Reset Clear    IPL in place
Recovery location  F01            F02            F03
Activate CBU       Y              Y              N
Ignore, Reset Clear, or IPL in place

When a system is defined to GDDR systems management, a decision must be made regarding GDDR's treatment of a failure of the defined system. The options are:
◆ Ignore — The system is defined to GDDR, but this instructs GDDR to take no action if GDDR observes that this system has failed. Failure is typically "noticed" by GDDR when the EMC SCF product "de-registers" the system, usually because of a polling time-out.
◆ Reset Clear — If GDDR detects that the system has crashed (such as SCF de-registration), GDDR will access the HMC and RESET CLEAR the LPAR, but take no other action.
◆ IPL in place — If GDDR detects that the system has crashed, GDDR will access the HMC, RESET CLEAR the system and then IPL it in place.
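As a purely illustrative sketch (not a GDDR interface), a recovery map like Table 6 can also be kept in machine-readable form so it can be reviewed or cross-checked against run books. The system names, LPARs and actions below are the hypothetical values from Table 6.

/* REXX - illustrative only: encode the recovery map from Table 6 and  */
/* look up the intended GDDR action for a failed system. This is not a */
/* GDDR API; it is a documentation/cross-check aid.                    */
map.0 = 3
map.1 = 'MVSA E01 DC1 WORKLOAD-MOVE F01'   /* system lpar site action target */
map.2 = 'MVSB E02 DC1 WORKLOAD-MOVE F02'
map.3 = 'MVSD E04 DC1 IGNORE        -'     /* Recover = No in Table 6        */

arg failedsys .
if failedsys = '' then failedsys = 'MVSA'

action = 'NOT-DEFINED'
do i = 1 to map.0
  parse var map.i sys lpar site act target .
  if sys = failedsys then do
    action = act
    say 'System' sys 'in LPAR' lpar 'at' site': action =' act', target =' target
    leave
  end
end
if action = 'NOT-DEFINED' then say failedsys 'is not in the recovery map'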
Contingency systems and workload movement

Contingency systems are "hot standby" systems that are IPLed and active and (typically) are located in a different data center. Often, contingency systems are the "remote half" of a geographically separated, single-site-workload parallel Sysplex. In the event of a primary site failure (for example, system MVSA fails in LPAR E01 in the chart above), GDDR will trigger workload restart in system MVSE at site DC2 in LPAR F01. GDDR does this by sending message GDDX191I to the MVSE console. A message rule must be written and enabled on MVSE to intercept the GDDX191I message and fire a REXX rule for that system to carry out the actions necessary to start the production workload. This message-interception REXX rule would be written for whatever automation package runs on MVSE, as in the sketch below.
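The following is a minimal sketch of such a rule, assuming a NetView-style environment where the automation table drives a REXX exec when GDDX191I arrives and where the MVS command is available to issue system commands. The started-task names (CICSPRD, BATCHINI) are placeholders, and the way the message text is passed to the exec is site-specific; take the exact message layout from the GDDR product guides.

/* REXX - GDDX191I sketch: start the production workload on MVSE       */
/* Driven by the automation product when GDDX191I arrives; adapt the   */
/* command interface and workload names to your environment.           */
parse arg msgid msgtext          /* as passed by the automation table (text unused here) */

if msgid <> 'GDDX191I' then exit 0   /* only act on the expected message */

say 'GDDX191I received - starting production workload on this system'

workload = 'CICSPRD BATCHINI'        /* placeholder started tasks to bring up */

do w = 1 to words(workload)
  stc = word(workload, w)
  'MVS S' stc                        /* NetView: forward an MVS START command */
end

exit 0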
HMC-Only systems (new in GDDR 3.1)

For systems defined as "HMC-Only", GDDR can be used in place of native access to the HMC to perform system actions. Interaction with HMC-Only systems is obtained through the "LPAR Actions" panel, shown in Figure 39 on page 144, by choosing option "L".

Figure 39 GDDR Actions Menu

After choosing option "L", a screen similar to Figure 40 on page 145 is displayed. Systems defined as HMC-Only will appear. The "T" column in the display shows the GDDR system type; OS in the display indicates that this is an "HMC-Only" system with standard IPL parms. For a full description of the fields in this display, refer to the GDDR product guides.
Figure 40 GDDR Perform HMC LPAR Actions

All of the functions displayed on this screen are protected by SAF profiles. This offers finer-grained control than is available from direct HMC access. Customers choose to add HMC-Only systems so that they have tighter control over access to these powerful system commands.
LPAR and CEC recovery

GDDR can tolerate the loss of a C-System. In the event of a C-System failure, GDDR will report "degraded mode" errors periodically until the C-System rejoins the GDDR complex. In the case of the loss of the GDDR Master C-System, another GDDR C-System will take over the master function. Again, "degraded mode" will be reported until the missing C-System returns. LPAR Recovery can be specified for GDDR C-Systems, which may help minimize the amount of time GDDR operates in "degraded mode."

For production systems, the following options are available:
◆ Ignore
◆ IPL in place
◆ Reset Clear
◆ Start workload in contingency systems

The "desired state" indicator can be useful if a system needs to be down for maintenance, ensuring that GDDR will not kick into action. If SCF is stopped normally on that system, GDDR will not kick into action anyway, as SCF controls the de-registration process and GDDR will know that the system has been shut down in a controlled manner.

For managed systems, the location indicator is set either to "H", indicating the Home location for this system, or to "A", indicating Away, meaning that this system is operating in its recovery location. It is possible that the Home and Away locations are in the same data center, but commonly the Away location is a different data center.

For managed systems, the following should be considered:
◆ Multi-site workloads do not have CONTingency systems.
◆ A system can have LPAR-RECOVERY specified, indicating that the workload can be restarted in a different LPAR from where the system is currently running. The recovery LPAR can be in the same or a different data center.
◆ Recovery LPARs are defined to GDDR as REGULAR LPARs, but are defined a second time, on the system Recovery Attributes panel, as recovery for a protected system
◆ A protected system can have a CONTingency system as a Recovery LPAR.
◆ Sysplex systems with more than one LPAR can have Recovery LPARs defined, which could be in the primary or secondary data center.
◆ Systems protected with LPAR recovery are either Home or Away, but their SITE value does not change when they are "Away".
◆ Systems NOT protected by LPAR Recovery can (optionally) have a second regular LPAR defined for them, at a different site. These second LPARs could even be the home location of a system protected with LPAR Recovery.
◆ The difference between a Recovery LPAR definition and a second LPAR definition is subtle:
• The second LPAR definition is NOT considered a recovery option, and will never lead to a system running in that location being displaced.
• The SITE value for a system NOT protected with LPAR Recovery changes depending on whether it is running in its first or second LPAR.
◆ LPAR Recovery for C-Systems avoids degraded mode when a C-System is lost.
◆ GDDR LPAR Recovery supports repeated failures: an LPAR in its Away location is itself protected with LPAR Recovery in its Home location.

Planned CPC Swap

This is a new planned script, triggered from the "Perform HMC LPAR Actions" panel. GDDR 3.2 also introduced a few other changes to that panel:
◆ Action X (LOAD Recovery): for a system protected with LPAR recovery only
◆ Action E (show events): queries GDDR global variables for events related to a system
◆ Action T (Toggle Desired state)
◆ Site column: now indicates the site of the LPAR where the system currently runs
◆ T (Type of IPL parameters) column: S: STD, A: Alternate, B: BCV
◆ IPL parms shown as "*" (not new): IPL parms are only shown if their definition matches the current primary DASD site
Automated Configuration Check - DASD

Traditionally, GDDR parameter validation has been largely limited to checking syntax and internal consistency, with relatively few verifications against real-world data. GDDR 3.2 introduced the Automated Configuration Check - DASD function with the GDDRACCD utility. One of the more challenging parts of defining an environment to GDDR has always been ensuring the correct definition of large and complex SRDF configurations. GDDRACCD removes the uncertainty about the correctness of an existing GDDR SRDF device definition by checking the real configuration found in the GDDR-managed EMC Symmetrix storage.

GDDR variables influencing the expected configuration

GDDRACCD works by formulating a detailed expectation of the configuration based on a whole series of GDDR global variables. This expectation is then compared against what is found in the Symmetrix array. GDDRACCD runs in five phases:

Phase 1 — Validates the parameters for internal consistency and exits if errors are found; there is no point checking the array if the parameters are already known to be bad.
Phases 2 and 3 — Query the array and report deviations.
Phase 4 — Provides information allowing the user to verify what GDDRACCD based its decisions on.
Phase 5 — RDF.DEVICES parms allow comparison with the existing parms, but differences are to be expected:
◆ Real parms could be a subset of generated parms
◆ Ranges in real parms could be a split of generated ranges
◆ Gatekeepers will certainly be different
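The expectation-versus-actual pattern that GDDRACCD follows can be illustrated with a small sketch. The fragment below is not the GDDRACCD utility; it only shows the general idea of comparing an expected device list (as would be derived from GDDR global variables) with an actual list (as would be returned by a query of the array), and the device numbers are made up for the example.

/* REXX - illustrative only: compare an expected SRDF device list      */
/* against an actual one and report deviations, in the spirit of the   */
/* GDDRACCD expectation check. Device numbers are invented examples.   */
expected = '00E1 00E2 00E3 00E4 011A'     /* devices expected from parms   */
actual   = '00E1 00E2 00E4 011A 011B'     /* devices reported by the query */

missing = ''                              /* expected but not found        */
extra   = ''                              /* found but not expected        */

do i = 1 to words(expected)
  d = word(expected, i)
  if wordpos(d, actual) = 0 then missing = missing d
end
do i = 1 to words(actual)
  d = word(actual, i)
  if wordpos(d, expected) = 0 then extra = extra d
end

if missing = '' & extra = '' then
  say 'Configuration matches the expectation.'
else do
  if missing <> '' then say 'Expected but not found:' missing
  if extra   <> '' then say 'Found but not expected:' extra
end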
  • 151. 6 Storage Infrastructure Design Considerations This chapter contains the following: ◆ Storage configuration and layout .................................................. 152 ◆ Infrastructure recommendations ................................................... 153 ◆ EMC software levels ........................................................................ 158 ◆ Sysplex and coupling facility considerations............................... 160 Storage Infrastructure Design Considerations 151
  • 152. Storage Infrastructure Design Considerations Storage configuration and layout Symmetrix storage systems incorporate many standard features that ensure a higher level of recoverability and availability than other storage systems. They are configurable at the logical volume level so different protection schemes can be applied to different classes of data within the same Symmetrix storage system on the same physical device. Customers choose data protection options, such as the following, to match their data requirements: ◆ Mirroring (RAID 1) or RAID 10 ◆ RAID 6 (6+2) and RAID 6 (14+2) ◆ RAID 5 (3+1) and RAID 5 (7+1) ◆ Symmetrix Remote Data Facility (SRDF) ◆ TimeFinder ◆ Dynamic Sparing ◆ Global Sparing 152 GDDR Solution Design and Implementation Techniques
Infrastructure recommendations

There is benefit in standardizing configuration best practices across the installed base of GDDR worldwide. This makes the support function more efficient, in that support people will be able to provide guidance aided by common naming conventions and infrastructure layout. Also, the recommendations in this section are designed to minimize the overheads associated with large infrastructure deployments. A good example is the number of device ranges: having very many small device ranges increases the overheads associated with infrastructure manipulations - consuming more C-System memory, more worker tasks, more commands issued to the arrays, and so on. Engaging the GDDR Solution Support team very early in your design phase will ensure an appropriate trade-off between configuration flexibility and associated overheads.

Gatekeeper device list

A gatekeeper device is a reserved device used to direct commands to a particular array. During certain activities, such as data center swaps, the volume of commands issued to these devices can be quite high. GDDR does not require gatekeeper devices for itself, but some underlying technologies do require these devices. For performance, and to avoid potential conflicts, the gatekeeper devices for SRDF/S, SRDF/A, MSC/Star, and so on must be defined and must be separate devices. It is very important to understand the layout of the gatekeeper devices and which component uses which gatekeepers. It is recommended to create and maintain charts similar to the following for each array in the solution.

This example shows the SRDF gatekeeper devices and RDF groups for an SRDF/Star with AutoSwap configuration with one VMAX frame located at each site. The following list of SRDF gatekeeper devices and RDF groups used at each location describes their intended purpose for use in commands. The required Host Component (H.C.) command prefix is also provided to help facilitate command generation for each specific RDF group. This additional information can be helpful in a configuration where multiple SRDF Host Component tasks are present.
DC1 Complex:

GDDRPARM Gatekeeper = 803D-8075 (00E1-0119)
H.C. Prefix ## Customer-Specific-Name VMAX# 2455:
  SRDF/S - STAR: Group = 10
  R/W - AD: Group = 21
  SRDF/A - Concurrent STAR: Group = 20
MSC/STAR Gatekeeper = 8037 (00DB)
SRDFA/MSC Gatekeeper = 8076 (011A)
SCF Gatekeeper = 8038-803A (00DC-00DE)
CSC Gatekeeper = 803B-803C (00DF-00E0)

DC2 Complex:

GDDRPARM Gatekeeper = 903D-9075 (00E1-0119)
H.C. Prefix ## Customer-Specific-Name VMAX# 2457:
  SRDF/S - STAR: Group = 10
  STAR - Recovery: Group = 30
MSC/STAR Gatekeeper = 9037 (00DB)
SRDFA/MSC Gatekeeper = 9076 (011A)
SCF Gatekeeper = 9038-903A (00DC-00DE)
CSC Gatekeeper = 903B-903C (00DF-00E0)
  • 155. Storage Infrastructure Design Considerations DC3 Complex: GDDRPARM Gatekeeper=803D-8075 (00E1-0119) H.C.Prefix ## Customer-Specific-Name VMAX# 2465: SRDF/A - Concurrent STAR: Group = 20 R/W - AD: Group = 21 STAR - Recovery: Group = 30 SRDFA/MSC Gatekeeper = 8076 (011A) SCF Gatekeeper = 8038-803A (00DC-00DE) CSC Gatekeeper = 803B-803C (00DF-00E0) Started tasks names EMC recommends the following conventions are adopted for started task names. STCs in DC1 GDDR managed Production LPARs: ◆ EMCBASE • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ EMCCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ EMCSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ GDDRSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) Infrastructure recommendations 155
  • 156. Storage Infrastructure Design Considerations ◆ GDDRCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) ◆ GDDREVM • SYSISVP.GDDR.PARMLIB(GDDREVMP) ◆ GDDRHBM • SYS1.GDDR.PARMLIB(GDDRHBMP) ◆ GDDRWORK ◆ GDDRPARMS • SYSISVP.GDDR.PARMLIB STCs in DC2 GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ GDDRSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) ◆ GDDRCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) ◆ GDDREVM • SYSISVP.GDDR.PARMLIB(GDDREVMP) ◆ GDDRHBM • SYSISVP.GDDR.PARMLIB(GDDRHBMP) ◆ GDDRWORK ◆ GDDRPARMS • SYSISVP.GDDR.PARMLIB STCs in DC3 GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF 156 GDDR Solution Design and Implementation Techniques
• SYSISVP.GDDR.PARMLIB(SCFINI00)
◆ GDDRSRDF
• SYSISVP.GDDR.PARMLIB(EMCCMD00)
◆ GDDRMAIN
• SYSISVP.GDDR.PARMLIB(GDDMPARM)
◆ GDDREVM
• SYSISVP.GDDR.PARMLIB(GDDREVMP)
◆ GDDRWORK
◆ GDDRHBM
• SYSISVP.GDDR.PARMLIB(GDDRHBMP)

Important dataset names

The following is a list of the dataset names used by the various LPARs for GDDR control. These are the names as supplied by EMC; regardless of the naming convention used, it is important to ensure that any support staff are familiar with it:
◆ SYSISVP.GDDR.*
◆ SYS1.GDDR.*
◆ SYS1.EMC.*
EMC software levels

Maintain a list of software levels, as shown in Table 7.

Table 7 Software levels (Customer-Specific-Name, updated 8/17/2010)

Mainframe Enablers      Version   Version - Fix Level
ResourcePak Base        7.x       7.x - SF7xxxx
Consistency Group       7.x       7.x - SC7xxxx
SRDF Host Component     7.x       7.x - SR7xxxx
TimeFinder Mirror       7.x       7.x - ST7xxxx
TimeFinder Snap         7.x       7.x - SN7xxxx
TimeFinder Utility      7.x       7.x - SU7xxxx
GDDR                    3.x       3.x - GD3xxxx

Microcode
VMAX base code          5874.230.183
VMAX E-Packs            RB/23 with 50853, 50854
DMX-3/DMX-4 base code   5773.155.107
E-Packs                 RB/53 with 49123, 49125

LPAR configuration specifics

These are guidelines/best practices for GDDR C-System LPAR configurations. Each GDDR LPAR will require:
◆ 2 GB storage
◆ 2 shared CPs
◆ A minimum of 25 Mod-9 DASD units (or the equivalent if different volume sizes are used)
GDDR LPAR Configuration Restrictions and Considerations:
◆ GDDR Control LPAR DASD volumes are not shared with any other LPAR
◆ No Coupling Facility or Specialty Engine requirements
◆ No DR requirements for GDDR Control LPARs
◆ Initial GDDR LPAR to be modeled from the CMCX LPAR
◆ Maintain recommended EMC Symmetrix microcode levels
◆ Initial security databases created from the customer-environment LPAR database
◆ LPAR access restricted to infrastructure support staff

GDDR - Required Automation and Software Elements:
◆ z/OS 1.x Operating System
◆ CA/ACF2 Security
◆ EMC SRDF Product Suite

Example customer Infrastructure Management Components:
◆ IOC Console Operations Support / Automation (Tivoli NetView, SA for z/OS)
◆ Storage (SMS, DFHSM DASD Space Mgmt, Tape Mgmt)
◆ Network (FTP, Firewall, TN3270, NDM)
◆ Systems Mgmt (Incident, Change (ECMS))
◆ Change Mgmt (Version and Change Control)
◆ SMF and Logrec Data Collection (Security, Billing, Performance)
◆ Output Archival / Retrieval / Spool Mgmt (product)
◆ Infrastructure Job Scheduling (Data Collection, Infrastructure Backups)

Cross System Communications (CSC) Requirements:

Update the SCFINIxx member in SYSISVP.GDDR.PARMLIB with the following settings on all LPARs (PROD and GDDR C-Systems):
◆ SCF.CSC.IDLEPOLL=5
◆ SCF.CSC.EXPIRECYCLE=20
◆ SCF.CSC.MITPERIOD=60
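A quick way to confirm that these CSC settings are present on each LPAR is to read the member and check the values. The fragment below is a minimal sketch, assuming the SCFINI00 member name and the SYSISVP.GDDR.PARMLIB dataset shown above, and TSO/E REXX (ALLOC/EXECIO) services; adjust the names to your own conventions.

/* REXX - sketch: verify the CSC settings in the SCF initialization member */
parmdsn    = "'SYSISVP.GDDR.PARMLIB(SCFINI00)'"   /* adjust to your naming   */
required.1 = 'SCF.CSC.IDLEPOLL=5'
required.2 = 'SCF.CSC.EXPIRECYCLE=20'
required.3 = 'SCF.CSC.MITPERIOD=60'
required.0 = 3

"ALLOC F(PARMIN) DA("parmdsn") SHR REUSE"
"EXECIO * DISKR PARMIN (STEM line. FINIS"     /* read the whole member    */
"FREE F(PARMIN)"

do r = 1 to required.0
  found = 0
  do i = 1 to line.0
    /* remove blanks so 'KEYWORD = value' and 'KEYWORD=value' both match */
    if pos(required.r, space(line.i, 0)) > 0 then found = 1
  end
  if found then say 'OK      :' required.r
  else          say 'MISSING :' required.r
end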
Sysplex and coupling facility considerations

GDDR operations, system management, and enablement of data consistency are compliant with z/OS Sysplex availability, design, and operations. GDDR extends Sysplex availability by addressing a wide range of planned and unplanned operations for Sysplex systems, DASD resources, Sysplex resources, central processors, processor capacities, and sites.

Sysplex systems require a single time source and synchronous mirroring between the source DASD and recovery DASD. Asynchronous mirroring does not support Sysplex operations; therefore, Sysplex linkage between sites connected by asynchronous mirroring is not a consideration, either in a two-site GDDR SRDF/A configuration or to the third site in SRDF/Star configurations.

Sysplex configuration is a key component of the business continuity capability of a proposed solution. The design of the Sysplex, its required connectivity, and the topology of shared Sysplex resources all play a role in the solution's redundancy, performance, and business continuity. Typically, solutions designed for higher availability and continuous availability have increasingly complex Sysplex considerations, whereas GDDR CONGROUP solutions designed for site switch and system restart require only modest Sysplex considerations between the sites.

GDDR's C-Systems do not have any Sysplex operational requirement. The C-Systems are standalone systems outside any of the managed Sysplexes. This isolates the C-Systems from Sysplex-wide issues, simplifies C-System implementation, and allows the C-Systems to manage multiple Sysplexes simultaneously. It also allows asynchronous mirroring solutions, and some synchronous mirroring solutions, to operate without any cross-site Sysplex connectivity and without degradation of Recovery Time Objectives.

When data is shared within a Sysplex, data consistency is presumed, and GDDR's architecture should align with Sysplex requirements for data consistency. In most instances, a Sysplex's mirrored DASD resources required for the restart of systems and workload should belong to a single CONGROUP. There is no limitation on the number of Sysplexes encompassed by a single CONGROUP and managed by GDDR.
Significant variations in the required Sysplex connectivity depend on the Sysplex topology, the models of the z/series hardware, the Sysplex timing mechanisms, and the desired level of business continuity. Careful planning of the cross-site linkages, supported distances, sufficient redundancy, and considerations for performance impact must be part of the early design and project planning.

Typically, when architecting a multisite Sysplex, DWDMs are employed to coalesce, aggregate, and manage the Sysplex connectivity as well as the cross-site DASD and mirroring connectivity. Solutions including AutoSwap are easily facilitated by cross-site DASD connectivity over DWDMs. Furthermore, many of the z/series linkage protocols cannot employ channel extension, requiring dark fiber or DWDM connectivity.

When the solution includes AutoSwap, regardless of the Sysplex topology, all DASD that is required to stay up:

◆ Must be mirrored with SRDF/S
◆ Must be available to the systems at DC1 and DC2
◆ Must have all R1 devices on one site and all R2 devices on the opposite site
◆ If it has a UCB on the R1 side, must also have a UCB on the R2 side

When employing Sysplex LOGR, the Sysplex LOGGER data sets should be mirrored with SRDF/S, and with SRDF/A in SRDF/Star configurations.

Single Site Sysplex

All active production systems reside at the primary site; the systems comprising the Sysplex do not span DC1 and DC2. This is usually representative of GDDR CONGROUP or GDDR SRDF/A solutions, where a complete Sysplex restart is required to restart systems and workload at the recovery site. No Sysplex connectivity is required between sites, Sysplex time is not an issue, and duplexing of structures between sites is not indicated; structures would be rebuilt when the systems are IPLed at the recovery site.

Restart of systems can be facilitated by the occasional mirroring of system resources, including the Sysplex couple data sets. The Sysplex couple data sets should not be mirrored using SRDF/S and should not be in the CONGROUP (consider mirroring with Adaptive Copy). Alternatively, with minimal cross-site DASD connectivity, the Sysplex couple data sets can be maintained at both sites with Sysplex couple data set duplexing. In the event of a CONGROUP trip, GDDR does not attempt to stop all systems to preserve the consistency of the target SRDF/S DASD and the contents of the target duplexed Coupling Facility structures. This technique has only limited value during the restart of systems.

Multi-Site Sysplex

This architecture is consistent with solutions requiring high or continuous levels of availability, where some subset of operations is expected to continue in the event of system, DASD, or site failures, or multiple failures. The z/series systems hosting z/OS images in the multisite Sysplex at both sites must share a single, consistent time source in a Coordinated Timing Network (CTN). The CTN topology, redundancy, and protocols depend upon the timer technology employed. When employing STP, placement of the PTS and BTS is an important consideration.

Base Sysplex requires cross-site CTC connectivity between all z/series CPCs, a single time source, and common Sysplex couple data sets. Parallel Sysplex employs Coupling Facilities to facilitate system-to-system communications. CFs can be Internal or External Coupling Facilities, and all systems require connectivity to the CFs; these Sysplex links should be redundant. CFs should be deployed in a manner consistent with the business continuity objectives, and if structures are duplexed, the duplexing links should also be redundant.

AutoSwap is usually included in the implementation to eliminate DASD as a single point of failure. If so, all DASD required for operations must be mirrored. The Sysplex couple data sets must not be mirrored with SRDF/S and must rely upon Sysplex couple data set duplexing. Cross-site DASD connectivity consistent with cross-site operations is typically a consideration.
7  Best Practices for Network Design

This chapter contains the following:

◆ Overview ........................................................................................... 164
◆ Understanding bandwidth requirements..................................... 165
◆ Synchronous network design considerations .............................. 168
◆ Asynchronous network design considerations ........................... 172
Overview

After the synchronous and/or asynchronous bandwidth assessment has been conducted, attention turns to the design and construction of the network. There are many different types of networks, but for this discussion we differentiate between networks designed to carry synchronous workloads and those designed to carry asynchronous workloads.

Delays in the provisioning, debugging, and load testing of network infrastructure are among the most common causes of delay for GDDR projects. Remember the constraint framework for the project: if your network provisioning is late, be prepared to adjust the implementation timeline around new and realistic dates.
Understanding bandwidth requirements

Before a network topology can be designed, the data change rate must be measured. This change rate is used to determine both synchronous and asynchronous network bandwidth requirements. EMC TS has toolsets and methods that are used to measure existing I/O flows and construct data replication requirements over time. Replication requirements vary not only throughout the 24-hour day (batch versus online) but also throughout the month and year, with most enterprises having at least one peak processing period. Therefore, the data sampling for modeling must span a critical time period, or at least be "factored" from the observation period based on the known difference between the observation period and the peak period.

Both SRDF/S and SRDF/A use block-level I/O for replication, so the collected data can be used for both modeling exercises. The actual bandwidth requirement will typically vary between the two types of networks (even for combined synchronous and asynchronous configurations). For HA networks, page volumes and temporary data sets are included in the replication set; for asynchronous networks these data types are typically excluded, since if the DR site copy is ever needed, the contents of the page volumes and temporary data set volumes are not required for system restart.

Note: Figure 41 on page 166 shows a measured workload with significant write I/O imbalance across the three Symmetrix arrays. Part of the deployment of the project will be to move workloads between the arrays to achieve a more balanced I/O profile.
Figure 41  Write distribution

The raw (uncompressed) bandwidth requirement is determined by multiplying the number of writes observed in a specific interval by the size of the writes. Earlier design tools used the Connect Time to determine the transmission interval and then calculate the blocksize. That method does not work for FICON-attached arrays, so an average block size is instead determined by dividing the total bytes transferred in the time interval by the number of writes observed (a sketch of this arithmetic follows Figure 42). The method also varies based on the data source, such as SMF records versus internal EMC Symmetrix STP data.

For synchronous workloads, performance is critical. Therefore this analysis includes a check for "hot volumes"; if any are detected, a list is produced of the volumes where the overhead of synchronous replication is likely to be problematic. Further analysis of these volumes is required to determine their contents and identify mitigation techniques, which may include:

◆ Re-blocking data sets
◆ Separating hot data sets
◆ Deploying SMS data striping
◆ Changing ACS routines
◆ Balancing workload across Symmetrix arrays

At the conclusion of the study period, a report is produced which includes a profile of the observed bandwidth requirement, as shown in Figure 42 on page 167.

Note: Many customers are accustomed to working with their local account teams for the sizing of infrastructure deployments. Arguably, though, your account team will not have experience designing for this type of complex solution. The GDDR Solution Support Team can help to direct EMC corporate resources to ensure that this vital piece of supporting infrastructure meets best practice standards.

Figure 42  Bandwidth requirement
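The average block size and raw bandwidth arithmetic described above can be sketched as follows. The interval record layout and sample values are illustrative; in practice the write counts and byte totals come from SMF records or Symmetrix STP data, and the peak factor reflects the known difference between the observation period and the peak period.

```python
# Illustrative sketch of the raw (uncompressed) bandwidth arithmetic described above.
# Each interval record carries the observed write count and total bytes written;
# the field names and sample values are illustrative only.

from dataclasses import dataclass

@dataclass
class Interval:
    seconds: int        # length of the measurement interval
    write_count: int    # writes observed in the interval
    bytes_written: int  # total bytes transferred by those writes

def raw_bandwidth_mbps(interval: Interval, peak_factor: float = 1.0) -> float:
    """Raw bandwidth requirement in megabits per second for one interval.

    peak_factor scales an off-peak observation up to the known peak period,
    as described in the text (use 1.0 if the sample already covers the peak).
    """
    avg_block = interval.bytes_written / interval.write_count       # average block size
    bytes_per_sec = interval.write_count * avg_block / interval.seconds
    return bytes_per_sec * 8 / 1_000_000 * peak_factor

if __name__ == "__main__":
    sample = Interval(seconds=900, write_count=2_700_000, bytes_written=2_700_000 * 12_000)
    print(f"{raw_bandwidth_mbps(sample, peak_factor=1.3):.0f} Mb/s")
```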
Synchronous network design considerations

All array-based synchronous data replication creates an overhead on write response times. For this reason it is absolutely essential to understand the impact of synchronous data replication before building out any infrastructure. EMC has sophisticated modeling tools which can be used to apply the overheads of SRDF/S synchronous mirroring at zero distance and then add latency to the model as the distance between the primary (R1) and secondary (R2) storage is increased. At some point in these modeling studies the maximum amount of separation between the R1 and R2 becomes known: the point at which the increased I/O response time begins to slow write throughput so much that the observed write activity to the volume can no longer be sustained. Post-implementation, SRDF/S overheads can be seen as "disconnect" time when looking at I/O response times.

Figure 43 on page 169 shows one part of a typical synchronous design study output: DASD volumes sorted in descending order of write activity. Looking more closely at volume CI9003, it can be seen to have a peak write activity of 300 writes per second. In effect, this volume could have an average response time of up to 3.3 milliseconds (1 second / 300 writes) and still sustain a throughput of 300 writes per second. A separate analysis (not shown) reveals the average response time to this volume to be 0.5 milliseconds. Subtracting 0.5 from 3.3 shows that the overheads of synchronous replication cannot add more than 3.3 - 0.5 = 2.8 milliseconds to this I/O before the application would experience performance degradation, because the throughput requirement of 300 writes per second cannot be maintained if the average response time exceeds 3.3 milliseconds.

Figure 43  DASD write activity by volser

Applying this logic to all volumes in the study group produces a chart similar to Figure 44 on page 170. The chart shows the existing throughput requirement (in blue) and the likely throughput requirement (in red) when the synchronous overheads at the customer-specific distance between the R1 and R2 devices are applied to the existing write I/O response time.
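The per-volume latency budget used in the CI9003 example can be expressed as a small calculation. The following sketch simply reproduces the arithmetic from the text; the input values are those of the worked example and would normally come from the study data.

```python
# Illustrative sketch of the per-volume latency budget logic used above for CI9003.
# peak_writes_per_sec and the current response time come from the study data;
# the values below reproduce the worked example in the text.

def sync_latency_budget_ms(peak_writes_per_sec: float, current_resp_ms: float) -> float:
    """Maximum added response time (ms) before peak write throughput can no longer be sustained."""
    ceiling_ms = 1000.0 / peak_writes_per_sec   # e.g. 300 writes/s -> 3.3 ms average ceiling
    return ceiling_ms - current_resp_ms

if __name__ == "__main__":
    budget = sync_latency_budget_ms(peak_writes_per_sec=300, current_resp_ms=0.5)
    print(f"SRDF/S may add up to {budget:.1f} ms to this volume's writes")  # ~2.8 ms
```

For volume CI9003 the sketch returns approximately 2.8 ms, matching the figure derived above.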
Figure 44  DASD volumes flagged for tuning

Tuning for synchronous replication can be done well in advance of the implementation of GDDR. The following techniques are typically used to address performance issues:

◆ Use sequential data striping.
◆ Change the blocksize of the data set. Many of the overheads associated with synchronous replication are fixed, such as the speed-of-light delay, and apply whether the block being transmitted is small or large. Therefore, large-block writes are more efficient in terms of write throughput over time in synchronous replication than smaller block sizes.
◆ Separate "hot" data sets — two or more write-I/O-intensive data sets on the same volser can be moved apart and separated.
◆ Change ACS routines — this may be needed to ensure that separated data sets remain separated.

Most customers deploy a "dark fiber" network to support synchronous replication. The Network Terminating Equipment (NTE) for these networks is usually a Dense Wavelength Division Multiplexer (DWDM). DWDMs are able to concentrate multiple wavelengths onto the single fiber(s) laid between data centers. DWDMs add very little latency, so the greatest impact in terms of write I/O performance is typically the speed-of-light transmission of data through the fiber, which is directly proportional to the distance of the fiber route.
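Because the speed-of-light delay is proportional to the fiber route distance, a first-order estimate of the distance penalty can be made before any modeling. The following sketch assumes roughly 5 microseconds per kilometer one way in fiber and a single round trip per synchronous write; the real SRDF/S protocol and the intervening equipment add further overhead, so treat this as a lower bound, not a substitute for the modeling studies described above.

```python
# Illustrative sketch: first-order speed-of-light estimate of the distance penalty
# on a synchronous write. Light in fiber travels roughly 5 microseconds per km one
# way; a single round trip per write is assumed, which understates the real
# SRDF/S protocol and equipment overheads.

ONE_WAY_US_PER_KM = 5.0

def propagation_penalty_ms(fiber_route_km: float, round_trips: int = 1) -> float:
    """Added milliseconds per write due to distance alone."""
    return fiber_route_km * ONE_WAY_US_PER_KM * 2 * round_trips / 1000.0

if __name__ == "__main__":
    for km in (10, 50, 100, 200):
        print(f"{km:>4} km route: ~{propagation_penalty_ms(km):.1f} ms per write")
```

Comparing the result against a volume's latency budget (for example, the 2.8 ms calculated for CI9003) gives an early indication of whether a proposed separation distance is plausible.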
Another advantage of the DWDM approach applies to customers looking to build HA infrastructures. In addition to the Fibre Channel packets generated by SRDF/S replication, the same fiber can carry the FICON signals necessary to support AutoSwap. In some instances customers build HA DASD infrastructures where the R1 and R2 devices are in the same data center (perhaps separated by a thermal firewall), in which case direct connection, or connection via cascading directors, is common.
Asynchronous network design considerations

Asynchronous replication is provided by SRDF/A. The interface to SRDF/A is EMC Multi-Session Consistency (MSC) software. MSC provides an API used by GDDR to control SRDF/A both when it is running in "MSC mode" and when it is running in "Star mode". The GDDR for SRDF/A configuration (two sites) controls SRDF/A replication using the MSC API, and the same API is used by GDDR to control GDDR for SRDF/Star solutions (three sites). For more specific information on these control interfaces, refer to "SRDF/Star with AutoSwap" on page 85, "GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)" on page 119, and "GDDR for SRDF/A (Two sites - DC1 and DC3)" on page 123.

SRDF/A is designed as a cache-to-cache replication infrastructure. It typically supports cycle times down to three-second durations, assuming that the network infrastructure is designed to carry the workload. Since it is theoretically possible to lose up to two SRDF/A cycles in a DR event, the RPO is calculated by multiplying the SRDF/A cycle time by two. Thus, a 3-second cycle time is said to support an RPO of 6 seconds. It is important to understand that the RPO is an objective; at times it may not be possible for the replication infrastructure to maintain the objective time. Examples include program loops that generate large amounts of write I/O, unusually high write activity such as during database "reorgs", and a compromised network (for example, a failed switch).

EMC has instituted a number of measures to improve the resiliency of SRDF/A replication:

◆ Transmit Idle — Temporarily suspends SRDF/A data transmission to allow SRDF/A replication to survive temporary link problems such as link "bounces".
◆ Delta Set Extension (DSE) — Elongates the SRDF/A capture cycle during longer-duration temporary link problems, instead of terminating SRDF/A replication. DSE uses a disk pool to buffer the current capture cycle data until such time as it can be de-staged across the link.
◆ Write Pacing — Slows the arrival rate of write activity into the capture cycle by accepting the write to cache, disconnecting for a specified period of time (milliseconds), and then reconnecting to complete the I/O.
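The RPO arithmetic described earlier in this section (up to two cycles can be lost, so the objective is twice the cycle time) can be sketched as follows; the cycle times shown are illustrative values only.

```python
# Illustrative sketch of the RPO arithmetic described above: since up to two
# SRDF/A cycles can be lost in a DR event, the objective is twice the cycle time.

def srdfa_rpo_seconds(cycle_time_seconds: float) -> float:
    """Recovery Point Objective implied by an SRDF/A cycle time."""
    return 2 * cycle_time_seconds

if __name__ == "__main__":
    for cycle in (3, 15, 30, 60):
        print(f"{cycle:>3} s cycle time -> {srdfa_rpo_seconds(cycle):.0f} s RPO")
```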
None of these resiliency measures should be assumed to be a substitute for provisioning the network to the observed peak bandwidth requirement. To build one or two years of operational resiliency into the network, it should be provisioned at the peak observed rate plus an allowance for the annual rate of increase in observed throughput. This allowance can be calculated from the known growth in storage. Assuming, for example, a growth rate of 20% over the past 12 months, the allowance for capacity increase can be calculated as:

Bandwidth_growth = (Total_Write_IOs * I/O_Size * 20%) / Compression_Ratio

Many of the GDDR-supported configurations are very large. The designer must consider the scalability of the proposed network and also the ease of troubleshooting. For mainframe workloads, EMC has observed FCIP networks to be reliable, scalable, and relatively easy to troubleshoot and tune. Consider operational complexity when looking to tune or balance network workloads. Gig/E networks are becoming more popular, but they do introduce more complexity for workload balancing, and, if used, WAN optimizers can complicate the network architecture and impact the throughput, latency, and scalability of the network.
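The bandwidth-growth allowance formula above can be sketched as follows. The growth rate and compression ratio are inputs rather than fixed values; the 20% growth rate matches the example in the text, and the 2:1 compression ratio is an illustrative assumption that should be replaced with the ratio observed for the actual workload.

```python
# Illustrative sketch of the bandwidth-growth allowance formula above. The growth
# rate and compression ratio are inputs to the calculation; the defaults below are
# illustrative assumptions, not recommendations.

def bandwidth_growth_mbps(total_write_ios_per_sec: float,
                          avg_io_size_bytes: float,
                          annual_growth_rate: float = 0.20,
                          compression_ratio: float = 2.0) -> float:
    """Extra megabits per second to provision for one year of growth."""
    raw_bps = total_write_ios_per_sec * avg_io_size_bytes * 8
    return raw_bps * annual_growth_rate / compression_ratio / 1_000_000

if __name__ == "__main__":
    extra = bandwidth_growth_mbps(total_write_ios_per_sec=3000, avg_io_size_bytes=12_000)
    print(f"Provision roughly an extra {extra:.0f} Mb/s on top of the observed peak")
```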
  • 174. Best Practices for Network Design 60 61 BI BI 60 62 61 63 10Gbps ULH DC1 Bouh Router 62 64 63 65 BI BI 66 67 68 69 BI BI 68 6A 69 6B DC2 10Gbps ULH Bouh Router 6A 6C 6B 6D BI BI 6E SYM-002545 Figure 45 Recommended asynchronous network topology 174 GDDR Solution Design and Implementation Techniques
Glossary

This glossary contains terms related to disk storage subsystems. Many of these terms are used in this manual.

A

alternate track: A track designated to contain data in place of a defective primary track. See also "primary track."

actuator: A set of access arms and their attached read/write heads, which move as an independent component within a head and disk assembly (HDA).

adapter: Card that provides the physical interface between the director and disk devices (SCSI adapter), director and parallel channels (Bus & Tag adapter), director and serial channels (Serial adapter).

C

cache: Random access electronic storage used to retain frequently used data for faster access by the channel.

cache slot: Unit of cache equivalent to one track.

channel director: The component in the Symmetrix subsystem that interfaces between the host channels and data storage. It transfers data between the channel and cache.

controller ID: Controller identification number of the director the disks are channeled to for EREP usage. There is only one controller ID for Symmetrix.

CKD: Count Key Data, a data recording format employing self-defining record formats in which each record is represented by a count area that identifies the record and specifies its format, an optional key area that may be used to identify the data area contents, and a data area that contains the user data for the record. CKD can also refer to a set of channel commands that are accepted by a device that employs the CKD recording format.

D

DASD: Direct access storage device, a device that provides nonvolatile storage of computer data and random access to that data.

data availability: Access to any and all user data by the application.

delayed fast write: There is no room in cache for the data presented by the write operation.

destage: The asynchronous write of new or updated data from cache to disk device.

device: A uniquely addressable part of the Symmetrix subsystem that consists of a set of access arms, the associated disk surfaces, and the electronic circuitry required to locate, read, and write data. See also "volume."

device address: The hexadecimal value that uniquely defines a physical I/O device on a channel path in an MVS environment. See also "unit address."

device number: The value that logically identifies a disk device in a string.

diagnostics: System level tests or firmware designed to inspect, detect, and correct failing components. These tests are comprehensive and self-invoking.

director: The component in the Symmetrix subsystem that allows Symmetrix to transfer data between the host channels and disk devices. See also "channel director."

disk director: The component in the Symmetrix subsystem that interfaces between cache and the disk devices.

dual-initiator: A Symmetrix feature that automatically creates a backup data path to the disk devices serviced directly by a disk director, if that disk director or the disk management hardware for those devices fails.

dynamic sparing: A Symmetrix feature that automatically transfers data from a failing disk device to an available spare disk device without affecting data availability. This feature supports all non-mirrored devices in the Symmetrix subsystem.

E

ESCON: Enterprise Systems Connection, a set of IBM and vendor products that connect mainframe computers with each other and with attached storage, locally attached workstations, and other devices using optical fiber technology and dynamically modifiable switches called ESCON Directors. See also "ESCON director."

ESCON director: Device that provides a dynamic switching function and extended link path lengths (with XDF capability) when attaching an ESCON channel to a Symmetrix serial channel interface.

F

fast write: In Symmetrix, a write operation at cache speed that does not require immediate transfer of data to disk. The data is written directly to cache and is available for later destaging.

FBA: Fixed Block Architecture, disk device data storage format using fixed-size data blocks.

FRU: Field Replaceable Unit, a component that is replaced or added by service personnel as a single entity.

frame: Data packet format in an ESCON environment. See also "ESCON."

G

gatekeeper: A small logical volume on a Symmetrix storage subsystem used to pass commands from a host to the Symmetrix storage subsystem. Gatekeeper devices are configured on standard Symmetrix disks.

GB: Gigabyte, 10⁹ bytes.
H

head and disk assembly: A field replaceable unit in the Symmetrix subsystem containing the disk and actuator.

home address: The first field on a CKD track that identifies the track and defines its operational status. The home address is written after the index point on each track.

hyper-volume extension: The ability to define more than one logical volume on a single physical disk device, making use of its full formatted capacity. These logical volumes are user-selectable in size. The minimum volume size is one cylinder and the maximum size depends on the disk device capacity and the emulation mode selected.

I

ID: Identifier, a sequence of bits or characters that identifies a program, device, controller, or system.

IML: Initial microcode program loading.

index marker: Indicates the physical beginning and end of a track.

index point: The reference point on a disk surface that determines the start of a track.

INLINES: An EMC-provided host-based Cache Reporter utility for viewing short and long term cache statistics at the system console.

I/O device: An addressable input/output unit, such as a disk device.

K

K: Kilobyte, 1024 bytes.

L

least recently used algorithm (LRU): The algorithm used to identify and make available cache space by removing the least recently used data.

logical volume: A user-defined storage device. In the Model 5200, the user can define a physical disk device as one or two logical volumes.

long miss: Requested data is not in cache and is not in the process of being fetched.

longitudinal redundancy code (LRC): Exclusive OR (XOR) of the accumulated bytes in the data record.

M

MB: Megabyte, 10⁶ bytes.

mirrored pair: A logical volume with all data recorded twice, once on each of two different physical devices.

mirroring: The Symmetrix maintains two identical copies of a designated volume on separate disks. Each volume automatically updates during a write operation. If one disk device fails, Symmetrix automatically uses the other disk device.

P

physical ID: Physical identification number of the Symmetrix director for EREP usage. This value automatically increments by one for each director installed in Symmetrix. This number must be unique in the mainframe system. It should be an even number. This number is referred to as the SCU_ID.

primary track: The original track on which data is stored. See also "alternate track."

promotion: The process of moving data from a track on the disk device to a cache slot.

R

read hit: Data requested by the read operation is in cache.

read miss: Data requested by the read operation is not in cache.

record zero: The first record after the home address.

S

scrubbing: The process of reading, checking the error correction bits, and writing corrected data back to the source.
SCSI adapter: Card in the Symmetrix subsystem that provides the physical interface between the disk director and the disk devices.

short miss: Requested data is not in cache, but is in the process of being fetched.

SSID: For 3990 storage control emulations, this value identifies the physical components of a logical DASD subsystem. The SSID must be a unique number in the host system. It should be an even number and start on a zero boundary.

stage: The process of writing data from a disk device to cache.

storage control unit: The component in the Symmetrix subsystem that connects Symmetrix to the host channels. It performs channel commands and communicates with the disk directors and cache. See also "channel director."

string: A series of connected disk devices sharing the same disk director.

U

unit address: The hexadecimal value that uniquely defines a physical I/O device on a channel path in an MVS environment. See also "device address."

V

volume: A general term referring to a storage device. In the Symmetrix subsystem, a volume corresponds to a single disk device.

W

write hit: There is room in cache for the data presented by the write operation.

write miss: There is no room in cache for the data presented by the write operation.