GDDR Solution Design and Implementation Techniques


                                                      Version 1.0




• Business Requirement Definitions
• Project Planning and Change Management Guidelines
• Technology Selection and Solution Design
• Architecture and Sizing Considerations




Dean Smith
Peter Callewaert
Chris Fallon
John Egan
Copyright © 2011 EMC Corporation. All rights reserved.
    EMC believes the information in this publication is accurate as of its publication date. The information is
    subject to change without notice.

    THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO
    REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
    PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
    FITNESS FOR A PARTICULAR PURPOSE.

    Use, copying, and distribution of any EMC software described in this publication requires an applicable
    software license.

    For the most up-to-date regulatory document for your product line, go to the Technical Documentation and
    Advisories section on EMC Powerlink.

    For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

    All other trademarks used herein are the property of their respective owners.


    Part number h8231




Contents




Preface

Chapter 1   Requirements Definition
            Introduction .......................................................................................   16
            Understanding Business Continuity requirements .....................                                   18
            High Availability...............................................................................       21
            Recovery Time Objective .................................................................              22
            Recovery Point Objective.................................................................              23
            EMC GDDR role................................................................................          24


Chapter 2   Planning for your GDDR Implementation
            Understanding project constraints .................................................                    28
            The importance of test systems.......................................................                  29
            Minimizing operational risks during implementations..............                                      30
            The importance of design ................................................................              32
            Construction of your timeline and milestones .............................                             33
            Review change management procedures......................................                              34


Chapter 3   GDDR Managed Technologies
            EMC Symmetrix VMAX with EMC Enginuity............................. 36
              Symmetrix VMAX hardware platform....................................36
              EMC Enginuity operating environment..................................38
              Symmetrix features for mainframe ..........................................38
              EMC Mainframe Enablers .........................................................39
            EMC TimeFinder............................................................................... 44
              TimeFinder/Clone......................................................................44
              TimeFinder/Consistency Group ..............................................45






                                  TimeFinder/Mirror for z/OS ................................................... 45
                                EMC Symmetrix Remote Data Facility (SRDF) ............................ 48
                                  SRDF/Synchronous ................................................................... 48
                                  SRDF/Asynchronous................................................................. 48
                                  SRDF/A Multi-Session Consistency (MSC) ........................... 51
                                  Concurrent SRDF........................................................................ 54
                                  Cascaded SRDF........................................................................... 55
                                  SRDF/Extended Distance Protection ...................................... 56
                                EMC SRDF consistency groups ...................................................... 58
                                EMC AutoSwap ................................................................................ 62
                                EMC SRDF/Star................................................................................ 64


Chapter 4   EMC Geographically Dispersed Disaster Restart (EMC GDDR)
                                Supported configurations................................................................ 72
                                   SRDF/S with ConGroup ........................................................... 73
                                   SRDF/S with AutoSwap ........................................................... 76
                                   SRDF/A ....................................................................................... 78
                                   SRDF/Star ................................................................................... 81
                                   EMC GDDR SRDF/Star complex ............................................ 84
                                   SRDF/Star with AutoSwap ...................................................... 85
                                EMC GDDR requirements............................................................... 89
                                   Mainframe environment requirements ................................... 89
                                   Minimum hardware requirements .......................................... 90
                                   Minimum software requirements ............................................ 90
                                   Additional configuration requirements .................................. 91
                                EMC GDDR fundamental concepts ............................................... 93
                                   EMC GDDR control systems .................................................... 93
                                   Workload location ..................................................................... 94
                                   EMC GDDR processes ............................................................... 96
                                EMC GDDR components................................................................. 98
                                   Parameters .................................................................................. 98
                                   User interface ............................................................................. 99
                                   Events and rules....................................................................... 112
                                   Procedures ................................................................................ 113
                                EMC GDDR procedures ................................................................ 114
                                EMC GDDR supported scripts ..................................................... 116
                                   GDDR for SRDF/S with ConGroup (Two sites - DC1
                                   and DC2) .................................................................................... 119
                                   GDDR for SRDF/S with AutoSwap (Two sites - DC1
                                   and DC2) .................................................................................... 121
                                   GDDR for SRDF/A (Two sites - DC1 and DC3) .................. 123





                  GDDR for SRDF/Star (Three sites) ........................................125
                  GDDR for SRDF/Star with AutoSwap (Three sites) ...........130


Chapter 5   GDDR Managed Systems Design Considerations
            Overview .......................................................................................... 140
            Ignore, Reset Clear, or IPL in place............................................... 142
            Contingency systems and workload movement ........................ 143
            HMC-Only systems (new in GDDR 3.1)...................................... 144
            LPAR and CEC recovery ................................................................ 146
                Planned CPC Swap .................................................................. 147
            Automated Configuration Check - DASD................................... 149
                GDDR variables influencing the expected configuration ...149


Chapter 6   Storage Infrastructure Design Considerations
            Storage configuration and layout ................................................. 152
            Infrastructure recommendations .................................................. 153
                Gatekeeper device list ..............................................................153
                Started tasks names ................................................................. 155
                Important dataset names .........................................................157
            EMC software levels ....................................................................... 158
                LPAR configuration specifics ..................................................158
            Sysplex and coupling facility considerations.............................. 160


Chapter 7   Best Practices for Network Design
            Overview .......................................................................................... 164
            Understanding bandwidth requirements.................................... 165
            Synchronous network design considerations ............................. 168
            Asynchronous network design considerations .......................... 172


Glossary








Figures




     Title                                                                                                       Page
1    Business Continuity example ....................................................................... 19
2    RTO example................................................................................................... 22
3    RPO example................................................................................................... 23
4    EMC Foundation technologies ..................................................................... 25
5    VMAX architecture ........................................................................................ 37
6    z/OS SymmAPI architecture........................................................................ 40
7    TimeFinder/Mirror under Enginuity 5773 and earlier ............................ 46
8    TimeFinder/Mirror under Enginuity 5874 and later................................ 47
9    SRDF/A delta sets.......................................................................................... 49
10   Delta set switching ......................................................................................... 51
11   SRDF/A cycles and delta sets ...................................................................... 53
12   Cascaded SRDF configuration...................................................................... 56
13   SRDF/EDP basic configuration ................................................................... 57
14   Primary and secondary relationships ......................................................... 59
15   Failed links ...................................................................................................... 60
16   Primary systems 1, 2, and 3 in a consistency group.................................. 61
17   Concurrent SRDF/Star configuration ......................................................... 66
18   Cascaded SRDF/Star configuration ............................................................ 67
19   Cascaded SRDF/Star with EDP (diskless) configuration ........................ 68
20   SRDF/S with ConGroup environment ....................................................... 73
21   EMC GDDR complex with SRDF/S ............................................................ 75
22   SRDF/S with AutoSwap environment ....................................................... 76
23   SRDF/A environment ................................................................................... 78
24   GDDR complex with SRDF/A ..................................................................... 79
25   Concurrent SRDF/Star environment .......................................................... 81
26   Cascaded SRDF/Star environment ............................................................. 82
27   GDDR Complex with Star............................................................................. 84
28   Concurrent SRDF/Star with AutoSwap environment ............................. 86
29   Cascaded SRDF/Star with AutoSwap environment ................................ 87
30   Primary Options Menu................................................................................ 100





               31      Change GDDR ISPF Profile Variable Values ...........................................                           102
               32      Setup and Maintenance...............................................................................           103
               33      View GDDR Configuration ........................................................................               105
               34      Manage Site Roles ........................................................................................     106
               35      Perform Health Check .................................................................................         107
               36      Select Script to Run ......................................................................................    109
               37      View GDDR Script Statistics.......................................................................             110
               38      Actions Menu................................................................................................   111
               39      GDDR Actions Menu...................................................................................           144
               40      GDDR Perform HMC LPAR Actions ........................................................                         145
               41      Write distribution.........................................................................................    166
               42      Bandwidth requirement ..............................................................................           167
               43      DASD write activity by volser ...................................................................              169
               44      DASD Volumes Flagged for Tuning .........................................................                      170
               45      Recommended asynchronous network topology ...................................                                  174




Tables




    Title                                                                                                    Page
1   Mainframe environment requirements........................................................ 89
2   Minimum hardware requirements ............................................................... 90
3   Software requirements ................................................................................... 91
4   Script names and descriptions..................................................................... 116
5   Script mapping .............................................................................................. 118
6   GDDR managed system recover map........................................................ 141
7   Software levels ............................................................................................... 158








Preface




           This EMC Engineering TechBook draws on GDDR field expertise,
           highlighting best practices relating to both technology and project
           management disciplines required to achieve success during GDDR
implementations. This field-inspired guide starts with the first risk area in a
business continuity project: a basic understanding of business continuity or
high availability requirements, and then proceeds through analysis, technology
selection, and architecture design guidelines. Following the guidelines in
           this GDDR TechBook will help ensure that your business continuity or high
           availability project is successful.
           As part of an effort to improve and enhance the performance and capabilities
           of its product lines, EMC periodically releases revisions of its hardware and
           software. Therefore, some functions described in this document may not be
           supported by all versions of the software or hardware currently in use. For
           the most up-to-date information on product features, refer to your product
           release notes. If a product does not function properly or does not function as
           described in this document, please contact your EMC representative.

           Note: This document was accurate as of the time of publication. However, as
           information is added, new versions of this document may be released to the
           EMC Powerlink website. Check the Powerlink website to ensure that you are
           using the latest version of this document.


Audience   This manual is intended for customers considering improvements to
           business continuity or high availability technology deployments.







                  Related         The following is a list of related documents that may assist readers
             documentation        with more detailed information on topics described in this TechBook.
                                  These documents can be found on the EMC Powerlink website at
http://Powerlink.EMC.com.
                                  ◆   ResourcePak Base for z/OS
                                  ◆   SRDF Host Component for z/OS
                                  ◆   TimeFinder/Clone Mainframe SNAP Facility
                                  ◆   TimeFinder/Mirror for z/OS
                                  ◆   Consistency Groups for z/OS
                                  ◆   TimeFinder Utility for z/OS
                                  ◆   AutoSwap
                                  ◆   GDDR for SRDF/Star with AutoSwap
                                  ◆   GDDR for SRDF/Star
                                  ◆   GDDR for SRDF/S with AutoSwap
                                  ◆   GDDR for SRDF/S with ConGroup
                                  ◆   GDDR for SRDF/A

               Authors of this    Dean Smith is a Consulting Software Engineer and has recently
                   TechBook       rejoined EMC with the GDDR team after working abroad for several
                                  years. With nearly 33 years of IT experience in the Mainframe
                                  environment in both client and systems development, Dean has
                                  focused exclusively on the justification, architecture, and
implementation of business continuity solutions worldwide for the
                                  last 12 years.
                                  Peter Callewaert is a Senior Solutions Architect at EMC, with an
                                  active role in the evolution of GDDR as a product, in the
                                  troubleshooting and fixing of product issues, and in providing
                                  support to the GDDR Solutions Support team. He has been with EMC
                                  for 12 years in various roles including Pre-Sales, Customer Service,
                                  and on-site Implementation. He has 20+ years of IT-experience
                                  mainly in the Mainframe environment, with an ever-increasing focus
                                  on Disaster Recovery and Business Continuity.
                                  Chris Fallon is a Consulting Corporate System Engineer in the EMC
                                  Symmetrix Engineering and Virtualization product group with over
22 years of experience in enterprise data center systems and storage.
Chris has been working with the Symmetrix since 1995 and






SRDF since 1997, and has developed, architected, and deployed
                      geographically dispersed business continuity solutions on the
                      Symmetrix storage array.
                      John Egan is an EMC Distinguished Engineer in the EMC Symmetrix
                      Engineering and Virtualization product group. He leads the System z
                      Automation team, responsible for the GDDR and EzSM products. His
                      background extends over 30 years with specialization in large scale
                      systems integration, software engineering, project management, and
                      design.

Conventions used in   EMC uses the following conventions for special notices.
    this document
                      Note: A note presents information that is important, but not hazard-related.




                      A caution contains information essential to avoid data loss or
                      damage to the system or equipment.


                      IMPORTANT
An important notice contains information essential to software or hardware operation.
                      Typographical conventions

                      EMC uses the following type style conventions in this document:
                      Normal                 Used in running (nonprocedural) text for:
                                             • Names of interface elements (such as names of windows,
                                               dialog boxes, buttons, fields, and menus)
                                             • Names of resources, attributes, pools, Boolean expressions,
                                               buttons, DQL statements, keywords, clauses, environment
                                               variables, functions, utilities
                                             • URLs, pathnames, filenames, directory names, computer
                                               names, filenames, links, groups, service keys, file systems,
                                               notifications
                      Bold                   Used in running (nonprocedural) text for:
                                             • Names of commands, daemons, options, programs,
                                               processes, services, applications, utilities, kernels,
                                               notifications, system calls, man pages
                                             Used in procedures for:
                                             • Names of interface elements (such as names of windows,
                                               dialog boxes, buttons, fields, and menus)
                                             • What user specifically selects, clicks, presses, or types







                                 Italic                 Used in all text (including procedures) for:
                                                        • Full titles of publications referenced in text
                                                        • Emphasis (for example a new term)
                                                        • Variables
                                 Courier                Used for:
                                                        • System output, such as an error message or script
                                                        • Specific user input (such as commands)
                                                        • URLs, complete paths, filenames, prompts, and syntax when
                                                          shown outside of running text
                                 Courier italic         Used in procedures for:
                                                        • Variables on command line
                                                        • User input variables
                                 <>                     Angle brackets enclose parameter or variable values supplied by
                                                        the user
                                 []                     Square brackets enclose optional values
                                 |                      Vertical bar indicates alternate selections - the bar means “or”
                                 {}                     Braces indicate content that you must specify (that is, x or y or z)
                                 ...                    Ellipses indicate nonessential information omitted from the
                                                        example




                                  We'd like to hear from you!
                                  Your feedback on our TechBooks is important to us! We want our
                                  books to be as helpful and relevant as possible, so please feel free to
                                  send us your comments, opinions and thoughts on this or any other
                                  TechBook:
                                  TechBooks@emc.com




1

                                                    Requirements Definition




This chapter contains the following information:
◆   Introduction ........................................................................................   16
◆   Understanding Business Continuity requirements.......................                                   18
◆   High Availability................................................................................       21
◆   Recovery Time Objective ..................................................................              22
◆   Recovery Point Objective ..................................................................             23
◆   EMC GDDR role.................................................................................          24








     Introduction
                                 This chapter introduces EMC® Geographically Dispersed Disaster
                                 Restart (EMC GDDR) and discusses its role in High Availability
                                 (HA), Business Continuity (BC) and disaster restart (DR) strategies.
In recent years, business continuity requirements for many
enterprises have changed, particularly in response to the threat of
terrorism as well as to major natural disasters. What has evolved is
the notion of a "regional disaster," in which local, synchronously
replicated data centers both fall "in region" with respect to the scope
of a single disaster event. A regional disaster, such as a major flood
or a prolonged and widespread power disruption, could affect both
data centers concurrently. In response, many sites have re-evaluated
their disaster readiness capabilities.
Commerce has not stood still either. The shift toward online,
internet-enabled enterprises has driven the continued build-out of
high availability and continuous-compute architectures to support
"24*7" business operations.
Consistent with these trends, customers are building data centers
and technology infrastructure to match changing business
requirements. A common practice for customers running
synchronous replication is to add a third-site asynchronous "out of
region" disaster recovery center. This is particularly evident among
the large financial institutions.
Customers with high availability (HA) or continuous availability
requirements are trending towards building adjacent primary and
secondary data centers to gain the benefit of synchronous data
replication with minimal performance impact. Another common data
center strategy to support HA requirements is to modify existing
data center infrastructure for redundant power connections and
physical firewall separation of compute and/or storage
infrastructure. These strategies allow synchronous replication to
operate with minimal latency overhead and also enable construction
of very high-speed, high-bandwidth networks to support
processor-to-processor interconnects for the IBM Parallel Sysplex
Coupling Facility.







The larger financial institutions are trending toward building
combined HA and Disaster Recovery (DR) solutions, in which
synchronous and asynchronous replication architectures are
combined with management infrastructure to detect and manage
potentially catastrophic disaster-in-progress "events."








     Understanding Business Continuity requirements
                                 Business Continuity (BC) requirements tend to cluster around
                                 industry verticals. Telecommunications companies typically deploy
some form of two-site replication and recovery architecture. Large
financial institutions are trending towards three data center
strategies, with three-site architectures that support both HA and
DR. The largest retailers tend towards geographically separated
two-site data center strategies, some deploying asynchronous and
others synchronous data replication.
Figure 1 on page 19 illustrates that as the size, sophistication, and
wealth of an enterprise increase over time, business continuity
requirements shift from essentially no protection through disaster
recovery and business restart, and eventually towards high
availability.








[Figure 1 graphic: a spectrum ranging from small businesses with a single
data center and largely procedural, manual, low-security IT; through dual
data center strategies (manufacturing and telecommunication, transportation
and logistics, resources, food and consumer goods manufacturing, essential
services, retail and online) at roughly a $10b market capitalization; to
three data center strategies at financial services firms, banks, and
brokerages ($20b+ market capitalization) with fully architectural,
automated, high-security IT. SYM-002544]



         Figure 1         Business Continuity example

                          Disaster recovery is the process of restoring a previous copy of the
                          data as the starting point for resumption of business operations,
                          typically from a tape image. It is then necessary to apply database
                          logs or perform other manual processes to bring the data to the last
                          known point of consistency. Depending on the size of the enterprise
                          this process can range from a few hours to days and weeks.
                          Disaster restart is the process of restarting dependent-write consistent
                          copies of data. This type of “restart” data image is created by EMC
                          data replication technologies. The image is presented to the operating





                                 system for IPL. The data itself is in “crash restart” form, the exact
                                 same image that would be available following the loss of power at a
                                 primary (non-replicated) data center. Disaster restart displaces tape
                                 based recovery and greatly reduces the Recovery Time Objective
                                 (RTO).
                                 Database management systems and mainframe applications have
                                 evolved over time to restart from these data images. Any incomplete
or partially complete transactions are "backed out" by the application
                                 layer to find a point where business resumption can occur. Any
                                 backed out or incomplete transactions are re-applied after business
                                 operations resume. The time required for restart depends on the data
                                 change rate and state of the databases at the time of disruption:
                                 ◆   If a database is shut down normally, achieving a point of
                                     consistency on restart requires minimal work.
                                 ◆   However, if the database terminates abnormally or is resumed
                                     from a “restart” data image, the restart process is elongated
                                     depending on the number and size of in-flight transactions at the
                                     time of DBMS termination.
                                 An image of a database created using EMC consistency technology
                                 while the database is in operation is in a dependent-write consistent
                                 state, also known as a DBMS restartable image. The restart of this
                                 image transforms it to a transactionally consistent data state by
                                 completing committed transactions and rolling back uncommitted
                                 transactions during the normal database initialization process.
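To make the distinction between a restartable (dependent-write consistent) image and a transactionally consistent one more concrete, the following sketch is a toy illustration only, not how any particular DBMS or EMC product is implemented: it replays a hypothetical write-ahead log at restart, rolling committed transactions forward and discarding in-flight ones.

# Toy illustration of restart from a dependent-write consistent image.
# Not DBMS or EMC product code; it only shows why a crash-consistent image
# can be transformed into a transactionally consistent state by the normal
# initialization (log replay) process.

def restart(log):
    """Replay a write-ahead log captured in a crash-consistent image."""
    committed, changes = set(), {}
    for txn, action, payload in log:
        if action == "update":
            changes.setdefault(txn, []).append(payload)
        elif action == "commit":
            committed.add(txn)

    database = {}
    for txn in committed:                 # roll forward committed work
        for key, value in changes.get(txn, []):
            database[key] = value
    # uncommitted (in-flight) transactions are simply discarded (rolled back)
    return database

log = [
    ("T1", "update", ("acct-100", 500)),
    ("T1", "commit", None),
    ("T2", "update", ("acct-200", 900)),   # in flight at the "crash"
]
print(restart(log))   # {'acct-100': 500} -- T2's change is rolled back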








High Availability
                    High Availability (HA) solutions are built on synchronous data
                    replication to enable EMC AutoSwap™ host software to interact with
                    storage arrays to achieve a non-disruptive swap of DASD I/O from a
                    source array to a synchronous target array. Swap operations can
                    occur automatically for unplanned events, such as the loss of all
                    FICON connectivity between a z/OS host and its attached EMC
                    Symmetrix® array(s), or for planned events through an operator
                    command interface and also through panel dialogs supplied with
                    GDDR.
                    Synchronous data replication is required for swap operations because
                    of the requirement for the replicated data to be an exact image of the
                    primary disk at the precise moment of the swap event.
Because asynchronous replication is chronologically behind the
primary storage instance (by a measure of seconds or minutes), it is
not possible to deploy AutoSwap in an asynchronous replication
environment.
                    Some customers require both HA and DR solutions and deploy a
                    combination of synchronous and asynchronous replication
                    infrastructures. These solutions provide non-disruptive swaps to the
                    secondary synchronous disk for localized “disasters” such as
severing the FICON cables between a host and its local storage, as
well as DR protection in the event of a catastrophic loss of both the
primary and secondary data centers.








     Recovery Time Objective
                                     To gain an understanding of the type of replication and recovery
                                     infrastructure that must be in place to support an enterprise, it is
                                     typical to first classify applications by criticality. Many customers
                                     have three or more application tiers, whereby applications are
                                     categorized by time-to-recover following planned or unplanned
                                     service interruption.
Ranges can vary from high availability (RTO = zero) through to hours
or days to recover; it all depends on the impact to the business, traded
off against the cost of protection against service interruption.
                                     Figure 2 illustrates the relationship between the cost of the solution
                                     and the cost of the service interruption. Once the RTO is known, the
                                     cost of the solution can be determined.




                          Figure 2   RTO example
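The tiering exercise described above can be sketched in a few lines of code. The following example is purely illustrative and not part of GDDR; the tier names, RTO thresholds, and application names are hypothetical and should be replaced by the results of your own business impact analysis.

# Illustrative sketch only -- maps each application's stated RTO to a
# recovery tier. Thresholds and tier descriptions are assumptions.
from dataclasses import dataclass

@dataclass
class Application:
    name: str
    rto_hours: float  # maximum tolerable time-to-recover

def classify(app: Application) -> str:
    """Map an application's RTO to an indicative recovery tier."""
    if app.rto_hours == 0:
        return "Tier 1: high availability (continuous operations)"
    if app.rto_hours <= 4:
        return "Tier 2: disaster restart from replicated disk"
    return "Tier 3: disaster recovery from tape or deferred restart"

portfolio = [
    Application("core banking", 0),
    Application("settlement batch", 4),
    Application("internal reporting", 48),
]

for app in portfolio:
    print(f"{app.name}: {classify(app)}")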








Recovery Point Objective
                     Applications must also be classified according to the amount of data
loss that could be sustained as a result of recovery from a service
                     interruption. The measure of data loss is termed Recovery Point
                     Objective (RPO). Typically synchronous data replication supports a
                     near zero RPO, whereas asynchronous data replication supports RPO
                     measured in terms of seconds, minutes or hours. For asynchronous
                     data replication the measure of data loss is usually proportional to
                     the amount of Wide Area Network (WAN) bandwidth applied to the
                     solution. Figure 3 illustrates the relationship between the cost of the
                     solution and the cost of the service interruption. Once the RPO is
                     known, the cost of the solution can be determined.




          Figure 3   RPO example
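As a rough, hedged illustration of how WAN bandwidth bounds the achievable RPO for asynchronous replication, the sketch below assumes a steady write rate, a dedicated replication link, and a fixed replication cycle time. The 30-second cycle, the two-cycle bound, and the example figures are assumptions for illustration only; real sizing should follow the bandwidth analysis described in Chapter 7.

# Rough, illustrative estimate only -- not an EMC sizing tool. Assumes a
# steady write rate, a dedicated replication link, and a fixed cycle time;
# the default 30-second cycle and the example numbers are hypothetical.

def rpo_estimate(write_mbps: float, wan_mbps: float, cycle_seconds: float = 30.0) -> str:
    """Describe RPO behavior for asynchronous replication.

    When the link keeps pace with the write rate, data loss is bounded by
    roughly two replication cycles; when it does not, the replication
    backlog (and hence the RPO) grows over time.
    """
    if wan_mbps >= write_mbps:
        return f"RPO bounded at about {2 * cycle_seconds:.0f} seconds"
    growth = (write_mbps - wan_mbps) / wan_mbps
    return (f"link undersized: backlog grows by about {growth:.2f} seconds "
            f"of additional RPO per second of sustained writes")

print(rpo_estimate(write_mbps=80, wan_mbps=100))    # link keeps pace
print(rpo_estimate(write_mbps=150, wan_mbps=100))   # link undersized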








     EMC GDDR role
                                 EMC Geographically Dispersed Disaster Restart (EMC GDDR) is a
                                 software product that runs on the IBM z/OS operating system to
                                 automate business recovery following both planned outages and
                                 disaster situations, including the total loss of a data center. EMC
                                 GDDR does not provide replication and recovery services itself, but
                                 rather monitors and automates the services provided by other EMC
                                 products, as well as third-party products required for continuous
                                 operations or business restart.
                                 EMC GDDR facilitates business continuity by generating scripts that
                                 can be run on demand; for example, restart business applications
                                 following a major data center incident, or resume replication to
                                 provide ongoing data protection following unplanned link outages.
                                 Scripts are customized at the time of invocation by an expert system
                                 that tailors the steps based on the configuration and the event that
                                 GDDR is managing. Through automatic event detection and
                                 end-to-end automation of managed technologies, GDDR removes
                                 human error from the recovery process and allows it to complete in
                                 the shortest time possible.
                                 The GDDR expert system is also invoked to automatically generate
                                 planned procedures, such as moving compute operations from one
data center to another. The gold standard for high availability
compute operations is the ability to move from scheduled DR test
weekend activities to regularly scheduled data center swaps without
disrupting application workloads.
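To illustrate, conceptually only, what tailoring steps based on the configuration and the event can mean, the sketch below builds an ordered procedure from a configuration name and an event. GDDR does not expose such an API; the event names and steps shown are hypothetical placeholders, and only the configuration names correspond to the supported configurations listed later in this book.

# Conceptual illustration only: not GDDR code. Shows the general idea of
# tailoring a procedure from the managed configuration and detected event.
from typing import List

SUPPORTED_CONFIGS = {"SRDF/S with ConGroup", "SRDF/S with AutoSwap",
                     "SRDF/A", "SRDF/Star", "SRDF/Star with AutoSwap"}

def build_procedure(configuration: str, event: str) -> List[str]:
    """Return an ordered list of steps tailored to the configuration and event."""
    if configuration not in SUPPORTED_CONFIGS:
        raise ValueError(f"unsupported configuration: {configuration}")

    steps = ["validate current site roles and replication state"]
    if event == "planned site swap":
        steps += ["quiesce application workload at the primary site",
                  "swap DASD I/O to the secondary site",
                  "restart workload at the new primary site"]
    elif event == "loss of primary data center":
        steps += ["confirm disaster declaration with the operator",
                  "make secondary/tertiary DASD read-write",
                  "IPL contingency systems and restart workload"]
    steps.append("resume or reconfigure replication for ongoing protection")
    return steps

for step in build_procedure("SRDF/Star", "planned site swap"):
    print("-", step)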







           Figure 4 on page 25 shows the relationship between the software
           layers and the Symmetrix hardware layer.




Figure 4   EMC Foundation technologies








2
                                               Planning for your GDDR
                                                      Implementation




This chapter deals with important considerations to ensure that your
major infrastructure project is successful. Some of the issues
highlighted here are best practices specific to managing technology
risk. Other recommendations relate to the common but important
risk factors associated with project administration and change
management:
◆   Understanding project constraints ..................................................         28
◆   The importance of test systems........................................................       29
◆   Minimizing operational risks during implementations ...............                          30
◆   The importance of design .................................................................   32
◆   Construction of your timeline and milestones ..............................                  33
◆   Review change management procedures.......................................                   34








     Understanding project constraints
                                        EMC GDDR implementations operate under the “triple” constraint
                                        model of project management. Essentially, there are three major areas
                                        of constraint on any infrastructure project:
                                        ◆   Time
                                        ◆   Quality
                                        ◆   Cost
                                        In many instances the date of implementation is already fixed.
                                        Ideally, the implementation date will have been calculated from the
                                        sum total of all of the complex technical infrastructure tasks that must
                                        be accomplished. These tasks may include:
                                        ◆   Workload measurement
                                        ◆   Bandwidth assessment
◆   Compute resiliency (to support HA or DR, etc.)
◆   Storage layout and I/O access density (cache, device types, RAID
    type, etc.)
◆   Data center planning (floor space, power, cooling, etc.)
                                        ◆   Data migration planning
                                        ◆   Construction of test facility
                                        ◆   Staff training
It is a major risk factor if the target date for completion of the
project has been set without due regard to the number and
complexity of the tasks required to achieve the goal.
In theory, the three-constraint model is a method for the project
manager to retain control of project delivery. If the customer owns
all three constraints, it is impossible for the project manager to
obtain or retain control of the project. Control can be maintained by
allowing the project manager to own at least one of the constraints,
typically the quality objective. In environments where time and
budget are fixed, the only remaining variable is quality. Inevitably,
if the time constraint is unreasonable, project quality will be
affected.








The importance of test systems
As the complexity of solutions increases, so too does the requirement
for your operations staff to become competent with new
technology. This is best achieved with a test environment. Even if the
customer cannot afford a perpetual test environment, serious
consideration should be given to constructing an interim test
environment for the duration of the GDDR project. It will allow
operations staff to safely acquire knowledge and also provide a
framework for the formalized "hands-on" training offered by the EMC
GDDR implementation team, both during the project and more
formally as part of the acceptance criteria towards the end of the
project.
                  Some customers can benefit from this more than others. If your
                  operations staff do not have experience with replication and
                  automation technologies in general and EMC in particular, then the
                  construction of a test environment should be considered mandatory.
If, however, the customer has the necessary skills in-house, then the
benefit of a test environment is limited largely to ensuring that
software and fixes provided by EMC and other vendors operate
correctly in the customer environment.








     Minimizing operational risks during implementations
                                        There are several methods for implementing GDDR into production
                                        environments. The two approaches outlined below are used where a
                                        customer cannot tolerate downtime in their existing production
workflow. The techniques described allow for both continuous
operation of production systems and maintenance of the existing
DR posture.
                                        The parallel infrastructure approach is a technique where new
                                        network, compute and storage infrastructure is commissioned
                                        adjacent to the existing production systems. Data is migrated from
                                        the existing production infrastructure into the new GDDR managed
environment to allow infrastructure and application-oriented
testing to be conducted. The testing process is destructive to the data
image, so the data migration process is repeated for each test cycle.
Once the infrastructure has been proven to operate successfully in the
new environment, one final data migration is conducted. The GDDR
managed infrastructure becomes the new production environment
and the older infrastructure is decommissioned. Another significant
advantage of this technique is that not only can the resiliency of the
compute infrastructure be preserved, but the existing HA/DR posture
can also be retained until the new infrastructure is deployed. The
parallel infrastructure approach is the "best
                                        practice” for deploying GDDR management infrastructures into
                                        customer environments with both minimal risk and minimal
                                        disruption to existing application workloads.
                                        The partial production approach is where a small subset of DASD
                                        volumes on the production system is used to prove the entire
                                        infrastructure. This allows for the validation of GDDR and
prerequisite technologies in the production environment, but
without actually using DASD that contains production data. When
the scope of the parameters is later increased to address the full
environment, the first execution at that scope takes place live in the
production environment. The partial production approach is inferior
to the parallel infrastructure approach in that the full scope of
functionality cannot be fully tested without risking impact to the
production environment. The emphasis here is to eliminate (in
particular) typing errors in parameter libraries, security issues, and
issues with gatekeeper devices, firewalls, and so on, all of which can
be debugged without impact using the parallel infrastructure
approach but cannot be fully detected in the partial production
approach until actually encountered live in the production
environment.




The importance of a test environment to allow operations staff to
obtain competency cannot be overstressed when the partial
production approach is used. The partial production approach is a
good compromise implementation method where the customer is
able to provide sufficient test windows to safely test the new
infrastructure "in place." It is also commonly used where financial or
other constraints on a project make the "parallel infrastructure"
implementation approach impractical.








     The importance of design
                                        Another significant area of addressable risk with GDDR projects is in
                                        the correct sizing of the entire solution. This book sets out guidelines
                                        for properly sizing the proposed solution. Particular attention must
                                        be paid to the sizing of network infrastructure. See Chapter 7, “Best
                                        Practices for Network Design,” for more details. A second major area
of design risk is the storage layout. Storage must be configured not only for raw capacity and RAID protection but also for throughput. The solution must not only sustain the I/O load of the production environment but also retain sufficient reserve cache and throughput capacity to operate the replication infrastructure and, if required, restart operations. See the section "Storage Infrastructure Design Considerations" for more details on this important topic. EMC recommends early (pre-sales) contact with GDDR solution design experts so that any solution proposal meets design standards for optimal operational performance. The GDDR team operates globally and is centrally managed from EMC Corporate Headquarters in Hopkinton, Massachusetts.
                                        The team can be contacted via email at the following address:
                                        gddrsolutionsupport@emc.com

                                        Conventional mail:

                                        GDDR Solution Support Team
                                        Mailstop 1/P-36
                                        176 South Street
                                        Hopkinton, MA 01748 USA
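
To illustrate the network-sizing point above, the following Python sketch converts a measured peak write rate into a rough link-size estimate. It is an illustrative calculation only; the compression ratio, headroom factor, and example workload figure are assumptions to be replaced with values measured for the actual environment, not EMC sizing guidance.

# Rough, illustrative replication link-sizing arithmetic (assumed inputs, not EMC guidance).

def required_link_mbps(peak_write_mb_per_sec: float,
                       compression_ratio: float = 2.0,
                       headroom: float = 1.3) -> float:
    """Convert a peak host write rate (MB/s) into an estimated link size (Mb/s).

    compression_ratio and headroom are assumptions to be replaced with
    values measured for the actual workload and network equipment.
    """
    effective_mb_per_sec = peak_write_mb_per_sec / compression_ratio
    megabits_per_sec = effective_mb_per_sec * 8      # bytes -> bits
    return megabits_per_sec * headroom               # allow for resynchronization and growth

if __name__ == "__main__":
    # Hypothetical example: 120 MB/s of peak writes observed in workload measurements.
    print(f"Estimated link size: {required_link_mbps(120):.0f} Mb/s")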








Construction of your timeline and milestones
Setting a “too aggressive” timeline can cause a project to fail. As discussed earlier in the “three constraint” model for project administration, if the timeline is fixed and the budget is fixed, that leaves only one variable: quality. In other words, to meet strict deadlines for task completion, shortcuts may be taken in quality assurance measures. The result can be a sequence of failed changes or service interruptions to the production environment.
One way to avoid placing undue or unreasonable time pressure on a project is to involve the technical architecture and implementation staff in the construction of the implementation timeline.








     Review change management procedures
The “GDDR Project” is typically much larger than just installing the GDDR software and customizing parameter libraries. As discussed earlier, there may be changes to physical infrastructure and utilities such as air conditioning, as well as to IT infrastructure such as network provisioning, new compute and storage arrays, switches, and routers. The scope of these projects is very large: in the space of a few months, a large number of potentially high-risk changes must be executed. Change scheduling is important in order to avoid conflicts between infrastructure provisioning groups and to align with the business calendar.
                                        However, much more important than change scheduling is change
                                        management. This is a much broader topic and addresses issues such
                                        as:
                                        ◆   Change scheduling
                                        ◆   Backout/fallback planning
                                        ◆   Risk mitigation strategies
                                        ◆   Documentation
                                        ◆   Root cause analysis (for failed changes)
When trying to determine the health of a change management system, look beyond the number of change “tickets” to see how many changes are rejected because of scheduling conflicts, inadequate documentation, insufficient risk mitigation, and so on. A robust change management system exhibits much more than change scheduling.
Assess the health of your change management systems and, if necessary, adjust them to deal with the large volume of changes typically encountered during these infrastructure projects.




Chapter 3   GDDR Managed Technologies




EMC provides many hardware and software products that support
applications in the z/OS environment. The following products, some
required and some configuration-specific, are used in EMC GDDR
solutions:
◆   EMC Symmetrix VMAX with EMC Enginuity..............................                               36
◆   EMC TimeFinder ................................................................................   44
◆   EMC Symmetrix Remote Data Facility (SRDF) .............................                           48
◆   EMC SRDF consistency groups........................................................               58
◆   EMC AutoSwap..................................................................................    62
◆   EMC SRDF/Star .................................................................................   64








     EMC Symmetrix VMAX with EMC Enginuity
                               Symmetrix hardware architecture and the EMC Enginuity™
                               operating environment are the foundation for the Symmetrix storage
                               platform. This environment consists of the following components:
                               ◆   Symmetrix hardware
                               ◆   Enginuity-based operating functions
                               ◆   Mainframe Enablers
                               ◆   Symmetrix application program interface (API) for mainframe
                               ◆   Symmetrix-based applications
                               ◆   Host-based Symmetrix applications
                               ◆   Independent software vendor (ISV) applications


     Symmetrix VMAX hardware platform
The Symmetrix VMAX™ design is based on a highly available VMAX
                               Engine with redundant CPU, memory, and connectivity on two
                               directors for fault tolerance. Symmetrix VMAX Engines connect to
                               and scale out linearly through the Virtual Matrix Architecture®,
                               which allows resources to be shared across Symmetrix VMAX
                               engines. To meet growth requirements, additional engines can be
                               added non-disruptively for efficient and dynamic scaling of capacity
                               and performance.







           Figure 5 on page 37 shows the Symmetrix VMAX hardware
           architecture.




Figure 5   VMAX architecture








     EMC Enginuity operating environment
                               The Enginuity operating system provides controls for all components
in a Symmetrix array. Enginuity coordinates real-time events related to the processing of production data, providing the following services:
                               ◆   Manages system resources to intelligently optimize performance
                                   across a wide range of I/O workload requirements.
                               ◆   Ensures system availability through advanced fault monitoring,
                                   detection and correction capabilities and provides concurrent
                                   maintenance and serviceability features.
                               ◆   Provides the foundation for specific software features available
                                   through EMC disaster recovery, business continuance, and
                                   storage management software.
                               ◆   Supports functional services for both Symmetrix-based
                                   functionality and for a large suite of EMC storage application
                                   software.
                               ◆   Defines priority of each task, including basic system maintenance,
                                   I/O processing, and application processing.
                               ◆   Provides uniform access through APIs for internal calls and
                                   provides an external interface to allow integration with other
                                   software providers.


     Symmetrix features for mainframe
                               Symmetrix storage systems appear to mainframe operating systems
                               as any of the following control units: IBM 3990, IBM 2105, and IBM
                               2107. The physical storage devices can appear to the mainframe
                               operating system as any mixture of different sized 3380 and 3390
                               devices. Mainframe host connectivity is supported through serial
                               ESCON and FICON channels.
                               Fibre Channel and GigE are supported options in SRDF®
                               environments. Symmetrix GigE directors in an SRDF environment
                               provide direct TCP/IP connectivity end-to-end for remote replication
solutions over extended distances. This eliminates the need for costly FC-to-IP converters and allows the existing IP infrastructure to be used without major disruption.
                               EMC supported IBM compatibility features include:
                               ◆   Channel Command Emulation for IBM ESS 2105/2107





                    ◆    Multiple Allegiance (MA)
                    ◆    Parallel Access Volume (PAV) Static and Dynamic
                    ◆    HyperPAV
                    ◆    Concurrent Copy
                    ◆    Dynamic Channel Path Management (DCM)
                    ◆    Dynamic Path Reconnection (DPR) support
                    ◆    Host Data Compression
                    ◆    Logical Path and Control Unit Address Support (CUADD)
                    ◆    Mainframe systems hypervolumes
                    ◆    Partitioned Dataset (PDS) Search Assist
                    ◆    FlashCopy
                    ◆    GDPS
                    ◆    GDPS HyperSwap
                    ◆    Peer to Peer Remote Copy (PPRC)
                    ◆    Extended Remote Copy (XRC)
                    ◆    Extended Address Volume
                    ◆    Extended Distance FICON
                    ◆    High Performance FICON (zHPF)
                    ◆    DFSMS support for Flash Drives
                    ◆    8 Gb/s FICON


EMC Mainframe Enablers
                    Mainframe Enablers (MFE) is a software suite that includes the
                    Symmetrix API runtime libraries for all EMC mainframe software.
                    These software packages can be used to monitor device configuration
                    and status and to perform control operations on devices and data
                    objects within a storage complex. The following components are
                    included in Mainframe Enablers (MFE) 7.0 and above:
                    ◆    ResourcePak® Base for z/OS
                    ◆    SRDF Host Component for z/OS
                    ◆    TimeFinder®/Clone Mainframe SNAP Facility
                    ◆    TimeFinder/Mirror for z/OS





                                    ◆   Consistency Groups for z/OS
                                    ◆   TimeFinder Utility for z/OS
                                    ◆   AutoSwap

     ResourcePak Base for z/OS
                                    ResourcePak Base delivers EMC Symmetrix Control Facility
                                    (EMCSCF) for IBM and IBM-compatible mainframes. EMCSCF
                                    provides a uniform interface for EMC and ISV software products.
                                    EMCSCF delivers a "persistent address space" on the host that
                                    facilitates communication between the host and the Symmetrix as
                                    well as other EMC-delivered and partner-delivered applications. In
                                    GDDR managed configurations this host is normally the GDDR
                                    Master C-System. ResourcePak Base performs tasks such as the
                                    following:
                                    ◆   Maintaining an active repository of information about EMC
                                        Symmetrix devices attached to z/OS environments and making
                                        that information available to other EMC products.
                                    ◆   Performing automation functions.
                                    ◆   Handling inter-LPAR (logical partition) communication through
                                        the Symmetrix storage system.


Figure 6   z/OS SymmAPI architecture







                        ResourcePak Base provides faster delivery of new Symmetrix
                        functions by EMC and ISV partners, along with easier upgrades. It
also speeds data gathering when using tools such as TimeFinder/Mirror query, because device status information is cached along with other important information.
                        ResourcePak Base for z/OS is a prerequisite for EMC mainframe
                        applications like the TimeFinder/Clone Mainframe SNAP Facility or
                        SRDF Host Component for z/OS. As of release 7.0, these products are
                        packaged and installed together as Mainframe Enablers, along with
                        TimeFinder/Mirror for z/OS, Consistency Groups for z/OS,
                        TimeFinder Utility for z/OS, and AutoSwap.
                        ResourcePak Base provides the following functionality through
                        EMCSCF.

Cross-system communication
                        Inter-LPAR communication is handled by the EMCSCF cross-system
                        communication (CSC) component. CSC uses a Symmetrix storage
                        system to facilitate communication between LPARs. Several EMC
                        Symmetrix mainframe applications use CSC to handle inter-LPAR
                        communications.

Non-disruptive SymmAPI-MF refreshes
                        As of version 5.3, EMCSCF allows the SymmAPI-MF to be refreshed
                        non-disruptively. Refreshing SymmAPI-MF does not impact
                        currently executing applications that use SymmAPI-MF; for example,
                        SRDF Host Component for z/OS or TimeFinder/Clone Mainframe
                        SNAP Facility.

SRDF/A Monitor
                        The SRDF/A Monitor in ResourcePak Base is designed to:
                        ◆    Find EMC Symmetrix controllers that are running SRDF/A.
                        ◆    Collect and write SMF data about those controllers.
                        After ResourcePak Base is installed, the SRDF/A Monitor is started
                        as a subtask of EMCSCF.

Group Name Service support
                        ResourcePak Base includes support for Symmetrix Group Name
                        Service (GNS). Using GNS, you can define a device group once and
                        then use that single definition across multiple EMC products on
                        multiple platforms. This means that you can use a device group






                                defined through GNS with both mainframe and open systems-based
                                EMC applications. GNS also allows you to define group names for
                                volumes that can then be operated upon by various other commands.

        Pool management         With ResourcePak Base V5.7 or higher, generalized device pool
                                management is a provided service. Pool devices are a predefined set
                                of devices that provide a pool of physical space. Pool devices are not
                                host-accessible. The CONFIGPOOL commands allow management of
                                SNAPPOOLS or DSEPOOLS with CONFIGPOOL batch statements.

     SRDF/A Multi-Session Consistency
                                SRDF/A Multi-Session Consistency (MSC) is a task in EMCSCF that
                                ensures remote R2 consistency across multiple Symmetrix storage
                                systems running SRDF/A. MSC provides the following:
                                ◆   Coordination of SRDF/A cycle switches across systems.
                                ◆   Up to 24 SRDF groups in a multi-session group.
                                ◆   One SRDF/A session and one SRDF group per Symmetrix
                                    storage system when using Enginuity release level 5X70.
                                ◆   With Enginuity release level 5X71 and later, SRDF groups are
                                    dynamic and are not limited to one per Symmetrix storage
                                    system. Group commands of ENABLE, DISPLAY, DISABLE,
                                    REFRESH, and RESTART are now available.

     SWAP services
                                ResourcePak Base deploys a SWAP service in EMCSCF. It is used by
                                EMC AutoSwap for planned outages with the ConGroup Continuous
                                Availability Extensions (CAX).

     Recovery services
                                Recovery service commands allow you to perform recovery on local
                                or remote devices (if the links are available for the remote devices).

     Licensed Feature Code management
                                EMCSCF manages Licensed Feature Codes (LFCs) to enable
                                separately chargeable features in EMC software. These features
                                require an LFC to be provided during the installation and
                                customization of EMCSCF. LFCs are available for:
                                ◆   Symmetrix Priority Control
                                ◆   Dynamic Cache Partitioning







◆   AutoSwap (Congroup with AutoSwap Extensions) - separate
    LFCs are required for planned and unplanned swaps
◆   EMC z/OS Storage Manager
◆   SRDF/Asynchronous (MSC)
◆   SRDF/Automated Replication
◆   SRDF/Star
◆   TimeFinder/Clone (TARGET)
◆   TimeFinder/Consistency Group (CONSISTENT)
◆   TimeFinder/Snap (VDEV)








     EMC TimeFinder
                               TimeFinder is a family of products that enables both volume-based
                               replication and data set level replication within a single Symmetrix
                               system. Data is copied from Symmetrix devices using array-based
                               resources without using host CPU or I/O. The source Symmetrix
                               devices remain online for regular I/O operations while the copies are
                               created. GDDR automation extends to managing TimeFinder within
                               an array or across multiple arrays, for example to provide "gold copy"
                               enterprise data images for DR testing.
                               TimeFinder enables customers to create multiple point-in-time copies
                               of data, allowing simultaneous execution of business tasks that were
                               previously sequential. GDDR exploits the following TimeFinder
                               modes of operation:
                               ◆   TimeFinder/Mirror enables users to configure special devices
                                   called business continuance volumes (BCVs) to create a mirror
                                   image of Symmetrix standard devices. Using BCVs, TimeFinder
                                   creates a point-in-time copy of data that can be repurposed. The
                                   TimeFinder/Mirror component extends the basic API command
                                   set of Mainframe Enablers to include commands that specifically
                                   manage Symmetrix BCVs and standard devices.
                               ◆   TimeFinder/Clone enables users to make copies of data from
                                   source volumes to target volumes without consuming mirror
                                   positions within the Symmetrix. The data is available to a target's
                                   host immediately upon activation, even if the copy process has
                                   not completed. Data may be copied from a single source device to
                                   multiple target devices. A source device can be either a
                                   Symmetrix standard device or a BCV device.


     TimeFinder/Clone
                               TimeFinder/Clone for z/OS produces point-in-time copies of full
                               volumes or individual datasets. TimeFinder/Clone operations
                               involve full volumes or datasets where the amount of data at the
                               source is the same as the amount of data at the target.
                               TimeFinder/Clone provides significant configuration flexibility
                               because clone copies do not require Symmetrix mirror positions.
                               TimeFinder/Clone source and target devices can have any form of
                               RAID protection. The clone copies can also be configured as a
                               standard device or as a Business Continuance Volume (BCV).





                        A Full-Volume Snap captures a complete replica of the source volume
                        on the target volume in the local Symmetrix system.
                        TimeFinder/Clone requests that span control units can invoke an
                        external datamover to accomplish the request.


TimeFinder/Consistency Group
                        TimeFinder/Consistency Group, using the Enginuity Consistency
                        Assist (ECA) feature, provides consistent snap operations on multiple
                        volumes so that the targets are dependent-write consistent.
                        TimeFinder/Consistency Group is available for full device, virtual
                        device, and remote full device snaps. The source and target device
                        pairs must reside in the same Symmetrix system. Consistency can be
                        preserved over multiple volumes.


TimeFinder/Mirror for z/OS
                        EMC TimeFinder/Mirror is a business continuance solution. GDDR
                        uses TimeFinder/Mirror to make full-volume copies of production
                        data from a standard Symmetrix device (which is online for regular
                        I/O operations from the host) to a Business Continuance Volume
(BCV) with which the standard device is paired. The BCV is a specially tagged volume established when the Symmetrix unit is configured. The BCV functions as a mirror controlled with the TimeFinder/Mirror ESTABLISH, SPLIT, RE-ESTABLISH, and RESTORE commands. The BCV device can be separated (split) from the standard device and used for backup, restore, decision support, or application testing. In GDDR-controlled environments, GDDR re-establishes split BCVs with their associated standard devices at the conclusion of test windows, or otherwise as required during disaster recovery operations.
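
As an informal illustration of the ESTABLISH/SPLIT/RE-ESTABLISH/RESTORE lifecycle described above, the following Python sketch models the BCV pairing states conceptually. It is not TimeFinder/Mirror syntax; the class and state names are assumptions made purely for illustration.

# Conceptual model of the BCV lifecycle described above (not TimeFinder/Mirror syntax).

class BCVPair:
    """Tracks the relationship between a standard device and its BCV."""

    def __init__(self):
        self.state = "NEVER_ESTABLISHED"

    def establish(self):
        # Full synchronization of the BCV from the standard device.
        assert self.state in ("NEVER_ESTABLISHED", "SPLIT")
        self.state = "ESTABLISHED"

    def split(self):
        # The BCV becomes an independently addressable point-in-time copy.
        assert self.state == "ESTABLISHED"
        self.state = "SPLIT"

    def re_establish(self):
        # Resynchronize the BCV with the standard device after use.
        assert self.state == "SPLIT"
        self.state = "ESTABLISHED"

    def restore(self):
        # Copy the BCV contents back to the standard device.
        assert self.state == "SPLIT"
        self.state = "RESTORED"

pair = BCVPair()
pair.establish()      # pair the BCV with the standard device
pair.split()          # take the point-in-time copy, e.g. for DR testing
pair.re_establish()   # GDDR re-establishes the BCV at the end of the test window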







                               Under Enginuity 5773 and earlier, TimeFinder/Mirror supports two
                               local-replication technologies as shown in Figure 7 on page 46.




Figure 7   TimeFinder/Mirror under Enginuity 5773 and earlier







           Starting with Enginuity 5874, TimeFinder/Mirror uses clone
           emulation for all operations, as shown in Figure 8 on page 47.




Figure 8   TimeFinder/Mirror under Enginuity 5874 and later

No specific steps are needed to exploit clone emulation. Whenever TimeFinder/Mirror detects a Symmetrix controller running at Enginuity level 5874 or later, it automatically sets the mode to clone emulation and converts all TimeFinder/Mirror commands accordingly. GDDR uses TimeFinder/Mirror command syntax; on Enginuity 5874 and later code releases, clone emulation is used transparently, so GDDR-generated scripts continue to operate as they did in TimeFinder/Mirror environments.






     EMC Symmetrix Remote Data Facility (SRDF)
                               The SRDF family of products provides synchronous and
                               asynchronous remote replication capabilities for Symmetrix storage
                               systems. At the conceptual level, SRDF is mirroring (RAID level 1)
                               one logical disk device (the primary source/R1 within the primary
                               Symmetrix storage system) to a second logical device (the secondary
                               target/R2 within a physically separate secondary Symmetrix storage
                               system) over Fibre Channel or GigE high-speed communication links.
                               SRDF's basic premise is that a remote mirror of data in a different
                               Symmetrix storage system can serve as a valuable resource for:
                               ◆   Protecting data using geographical separation.
                               ◆   Giving applications a second location from which to retrieve data
                                   should the primary location become unavailable for any reason.
                               ◆   Providing a means to establish a set of volumes on which to
                                   conduct parallel operations, such as testing or modeling.
GDDR supports SRDF/Synchronous, SRDF/Asynchronous, and SRDF/Star in various two-site and three-site solutions.
                               The following paragraphs describe the SRDF features exploited by
                               GDDR.


     SRDF/Synchronous
                               SRDF synchronous (SRDF/S) mode facilitates disaster recovery
                               within the customer's campus or metropolitan area network through
                               real-time synchronous remote mirroring from one Symmetrix system
                               to one or more Symmetrix systems.


     SRDF/Asynchronous
                               SRDF asynchronous (SRDF/A) mode provides a long distance
                               disaster restart data image with minimal impact on performance.
In asynchronous mode, Symmetrix systems transfer host writes from the primary Symmetrix system to the secondary Symmetrix system using dependent-write consistent delta sets, transferred in cycles:
                               ◆   Each delta set contains groups of write I/Os for processing,
                                   which are managed for dependent-write consistency by the
                                   Enginuity operating environment.






           ◆   SRDF/A transfers these sets of data using cycles of operation, one
               cycle at a time, between the primary Symmetrix system and the
               secondary Symmetrix system.
Write Folding improves the efficiency of the SRDF network links. If the same data is updated multiple times within one cycle, the systems send only the most current version across the SRDF links. This is called Write Folding and is a major advantage over competing asynchronous replication solutions because it decreases network bandwidth consumption and the number of I/Os the SRDF director processes, thereby reducing the system overhead per host I/O.
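
A minimal Python sketch of the write-folding idea follows. The device and track identifiers are hypothetical, and real Enginuity processing is far more involved; the sketch only shows why repeated writes to the same track consume link bandwidth once per cycle.

# Minimal sketch of write folding: within one SRDF/A capture cycle, repeated
# writes to the same track are collapsed so only the latest image is transmitted.

def fold_writes(cycle_writes):
    """cycle_writes: iterable of (track_id, data) in arrival order."""
    folded = {}
    for track_id, data in cycle_writes:
        folded[track_id] = data          # a later write replaces an earlier one
    return folded

writes = [("dev0100.trk7", "v1"), ("dev0100.trk9", "a"), ("dev0100.trk7", "v2")]
print(fold_writes(writes))               # track 7 is sent once, with "v2" only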
           Figure 9 on page 49 illustrates SRDF/A delta sets and their
           relationships to cycle processing.
Figure 9   SRDF/A delta sets

           In single session mode, the Symmetrix SRDF director ensures
           dependent-write consistency within SRDF/A by obtaining the active
           cycle number from a single location in global memory and assigning
           it to each I/O. The director retains that cycle number even if a cycle
           switch occurs during the life of that I/O. This results in an atomic
           cycle switch process for dependent-write sequences, even though it is
           not physically an atomic event across a range of volumes. As a result,
           two I/Os with a dependent relationship between them can be in the
           same cycle, or the dependent I/O can be in a subsequent cycle.
           Delta set switching is at the core of the SRDF/A active session
           operation. The following is required before a primary Symmetrix
           system cycle switch can occur:







                               ◆   The transmit delta set must have completed data transfer to the
                                   secondary Symmetrix system.
                               ◆   The minimum cycle time (single session mode) or minimum cycle
                                   target time (MSC mode) must be reached.
                               ◆   The previous Apply delta set must have been completed.
                               During the delta-set switching process:
                               1. Write I/Os are collected in the capture delta set on the primary
                                  Symmetrix system. The previous cycle's transmit delta set is
                                  completing the SRDF transfer to the secondary Symmetrix system
                                  receive delta set, which is the N-1 copy. The secondary Symmetrix
                                  system apply delta set (N-2) is written to global memory, so that
                                  data is marked write pending to the secondary devices.
                               2. The primary Symmetrix system waits for the minimum cycle time
                                  to elapse and the transmit delta set to empty, meaning that all
                                  data has been transferred to the secondary Symmetrix system.
                               3. Once these conditions are satisfied, the primary Symmetrix
                                  system sends a commit message to the secondary Symmetrix
                                  system to begin the secondary Symmetrix system cycle switch.
                               4. On the primary system, the new capture delta set is available to
                                  collect new host I/Os. Before the secondary Symmetrix system
                                  cycle switch can occur, the following must be true:
                                   • The secondary Symmetrix system receives the commit
                                     message from the primary Symmetrix system.
                                   • Apply delta set (N-2 copy) completes its write process
                                     marking the data write pending to the secondary devices.
                               5. Once the secondary Symmetrix system receives the commit
                                  message from the primary Symmetrix system, the secondary
                                  Symmetrix system verifies the apply delta set has been written.
                                  This occurs while the primary Symmetrix system is performing
                                  the cycle switch between the capture and transmit delta sets.
                               6. The next step is a delta set cycle switch on the secondary
                                  Symmetrix system between the receive (inactive) and apply
                                  (active) delta sets. This preserves the dependent-write consistent
                                  copy at the secondary Symmetrix system prior to receiving the
                                  next dependent-write consistent delta set.
                               7. The secondary Symmetrix system sends an acknowledgement to
                                  the primary Symmetrix system. The data in the apply delta set is
                                  written to disk.
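
The numbered sequence above can be condensed into a small, purely illustrative Python simulation of the capture, transmit, receive, and apply delta sets. The names and structures are assumptions made for illustration and do not represent Enginuity internals.

# Illustrative simulation of the SRDF/A cycle-switch sequence described above.
# The four delta sets are modeled as simple lists; real Enginuity behavior also
# involves global memory, minimum cycle times, and commit messaging.

class SRDFASession:
    def __init__(self):
        self.capture = []    # N    (primary, active)
        self.transmit = []   # N-1  (primary, inactive)
        self.receive = []    # N-1  (secondary, inactive)
        self.apply = []      # N-2  (secondary, active)
        self.r2_image = {}   # dependent-write consistent image on the R2 devices

    def host_write(self, track, data):
        self.capture.append((track, data))

    def transfer(self):
        # The transmit delta set drains across the SRDF links into the receive delta set.
        self.receive.extend(self.transmit)
        self.transmit.clear()

    def cycle_switch(self):
        # Preconditions from the text: transmit empty and previous apply completed.
        assert not self.transmit, "transmit delta set must be empty"
        for track, data in self.apply:   # N-2 data marked write pending, then destaged
            self.r2_image[track] = data
        self.apply = self.receive        # secondary switch: receive becomes apply
        self.receive = []
        self.transmit = self.capture     # primary switch: capture becomes transmit
        self.capture = []

session = SRDFASession()
session.host_write("trk1", "A")
session.cycle_switch()                   # trk1 now in transmit (N-1)
session.transfer()                       # trk1 now in the secondary receive delta set
session.host_write("trk2", "B")
session.cycle_switch()                   # trk1 becomes apply (N-2), trk2 transmits
session.transfer()
session.cycle_switch()
print(session.r2_image)                  # {'trk1': 'A'}; trk2 is applied on the next switch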





                               Figure 10 on page 51 shows the delta set switching process.

Figure 10   Delta set switching

The figure legend summarizes the sequence:
1. The capture delta set (DS) collects application write I/O.
2. The primary waits for the minimum cycle time and for the transmit DS to empty, then:
   a. tells the secondary to commit the receive DS (beginning secondary step 3 in unison);
   b. halts SRDF transfer;
   c. performs the primary cycle switch (the capture DS becomes the transmit DS);
   d. makes a new capture DS available for host I/O.
3. The secondary receives the commit from the primary, then:
   a. checks whether the data in the apply DS is restored (marked write pending to the R2 devices);
   b. performs the secondary cycle switch (the receive DS becomes the apply DS);
   c. makes a new receive DS available for SRDF transfer;
   d. sends the primary an acknowledgement;
   e. begins the restore of the apply DS.
4. The primary receives acknowledgement of the secondary cycle switch and SRDF transfer begins.



SRDF/A Multi-Session Consistency (MSC)
                               SRDF/A Multi-Session Consistency (MSC) supports SRDF/A
                               operations in configurations where there are multiple primary
                               Symmetrix systems or multiple primary Symmetrix system SRDF
                               groups connected to multiple secondary Symmetrix systems or
                               multiple secondary Symmetrix systems SRDF groups. SRDF/A MSC
                               configurations can also support mixed open systems and mainframe
                               data controlled within the same SRDF/A MSC session.
                               GDDR managed solutions using SRDF asynchronous replication
                               require MSC because the interface to control asynchronous
                               replication is provided via MSC.
                               Achieving data consistency across multiple SRDF/A groups requires
                               the cycle switch process be coordinated among the participating
                               Symmetrix system SRDF groups or systems, and that the switch
                               occur during a very brief time period when no host writes are being





                               serviced by any participating Symmetrix system. SRDF control
                               software running on the host provides a single coordination point to
                               drive the cycle switch process in all participating Symmetrix systems.
                               I/Os are processed exactly the same way in SRDF/A MSC mode as
                               they are in single session mode:
                               1. The active cycle on the primary Symmetrix system contains the
                                  current host writes or N data version in the capture delta set.
                               2. The inactive cycle contains the N-1 data version that is transferred
                                  using SRDF/A from the primary Symmetrix system to the
                                  secondary Symmetrix system. The primary inactive delta set is
                                  the transmit delta set and the secondary Symmetrix system's
                                  inactive delta set is the receive delta set.
                               3. The active cycle on the secondary Symmetrix system contains the
                                  N-2 data version of the apply delta set. This is the guaranteed
                                  dependent-write consistent image in the event of a disaster or
                                  failure.







            Figure 11 on page 53 illustrates the delta sets and their relationships
            to SRDF/A cycles.
Figure 11   SRDF/A cycles and delta sets

            For the host to control the cycle switch process, the Symmetrix
            systems must be aware that they are running in multi-session
            consistency mode. This is done using the SRDF control software
            running on the host. The host software:
            1. Coordinates the cycle switching for all SRDF/A sessions
               comprising the SRDF/A MSC configuration.
            2. Monitors for any failure to propagate data to the secondary
               Symmetrix system devices and drops all SRDF/A sessions
               together to maintain dependent-write consistency.
            3. Performs MSC cleanup if able.
            As part of the process to enter MSC mode, and with each cycle switch
            issued thereafter, Enginuity assigns a cycle tag to each new capture
            cycle. That cycle tag is retained throughout that cycle's life. This cycle
            tag is a value that is common across all participating SRDF/A





                               sessions and eliminates the need to synchronize the cycle numbers
                               across them. The cycle tag is the mechanism by which
                               dependent-write consistency is assured across multiple MSC
                               controlled participating Symmetrix system SRDF groups or systems.
                               SRDF/A MSC mode performs a coordinated cycle switch during a
                               very short window of time referred to as an SRDF/A window, which
                               is actually a flag (open/closed) indicating when there are no host
                               writes being completed. The SRDF/A window flag is an attribute of
                               the SRDF/A group and is checked at the start of each I/O, imposing
                               no additional overhead because the front-end director is already
                               obtaining the cycle number from global memory as part of the
                               existing SRDF/A cycle switch operations.
                               When the host software discovers that all the SRDF groups and
                               Symmetrix systems are ready for a cycle switch, MSC "opens" the
                               SRDF/A window and issues a single command to each SRDF group
to perform a cycle switch. In multi-session mode, the front-end director obtains the cycle number at the start of each write and also checks the SRDF/A window flag. If the flag is on (an open window), the front-end director disconnects the incoming host write I/O and begins polling to determine when the SRDF MSC host control software has closed the window. While the window is open, any write I/Os that start are disconnected, so no dependent-write I/Os are issued by any host to any devices in the SRDF/A MSC session.
The SRDF/A window remains open on each SRDF group and Symmetrix system until the last SRDF group and Symmetrix system in the multi-session group acknowledges to the SRDF MSC host control software that the open-and-switch command has been processed, indicating a successful cycle switch. At that point the SRDF MSC host control software issues a close command for each SRDF/A group under MSC control. As a result, dependent-write consistency across the SRDF/A MSC session is ensured.
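
The coordination described above can be summarized in a simplified Python sketch. It treats each SRDF group as an object with an open/closed window flag and a shared cycle tag; it is an assumption-laden illustration, not the SRDF MSC host control software.

# Simplified illustration of the MSC coordinated cycle switch: open the SRDF/A
# window on every group, switch them all under one cycle tag, then close the
# window.  Real MSC host software also handles errors, timeouts, and cleanup.

class SRDFAGroup:
    def __init__(self, name):
        self.name = name
        self.window_open = False
        self.cycle_tag = 0

    def ready_for_switch(self):
        return True                  # placeholder: transmit drained, minimum cycle time reached

    def open_window_and_switch(self, tag):
        self.window_open = True      # new dependent writes disconnect and poll
        self.cycle_tag = tag         # common tag keeps the groups consistent with each other
        return True                  # acknowledge the open-and-switch command

    def close_window(self):
        self.window_open = False

def msc_cycle_switch(groups, tag):
    if not all(g.ready_for_switch() for g in groups):
        return False
    acks = [g.open_window_and_switch(tag) for g in groups]   # windows stay open on all groups
    if all(acks):                                             # until the last group acknowledges
        for g in groups:
            g.close_window()
    return all(acks)

groups = [SRDFAGroup("grpA"), SRDFAGroup("grpB")]
print(msc_cycle_switch(groups, tag=42))    # True: all groups switched under one cycle tag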


     Concurrent SRDF
                               Enginuity version 5567 and later support the ability for a single
                               primary device to be remotely mirrored to two secondary devices
                               concurrently. This feature is called concurrent SRDF and is supported
                               on Fibre Channel, Gigabit Ethernet (GigE) topologies. In concurrent
                               SRDF, the primary (R11) devices must be assigned to two different






                SRDF groups. This is because each remote mirror can be assigned to
                only one SRDF group, even if the two secondary devices reside in the
                same Symmetrix system.
                The secondary devices in a concurrent SRDF configuration can be
                operating in synchronous, adaptive copy, or asynchronous mode. The
                only unsupported combination is for both secondary devices to be
                operating in asynchronous mode. Concurrent SRDF/S with SRDF/A
                provides the ability to remotely mirror a group of devices in
                synchronous mode to one secondary site and in asynchronous mode
                to a different secondary site which may be an extended distance site.
                Normal I/O operational rules for SRDF also apply to Concurrent
                SRDF configurations. When operating in synchronous mode, ending
                status for an I/O is not presented to the host until the remote
                Symmetrix system acknowledges receipt of the I/O to the primary
                Symmetrix system. If both secondary devices are operating in
                synchronous mode, ending status is not presented to the host until
                both devices acknowledge receipt of the I/O. If one remote mirror is
                in synchronous mode and one remote mirror is in adaptive copy or
                asynchronous mode, ending status is presented to the host when the
                synchronous device acknowledges receipt of the I/O. GDDR
                automation provides scripts that allow customers to convert from
                concurrent to cascaded mode provided that the initial configuration
                is installed as cascaded mode.
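
The ending-status rules in the preceding paragraph can be summarized in a few lines of illustrative Python. The mode labels are simplified stand-ins, not SRDF syntax.

# Illustrative summary of when ending status is returned to the host for a
# concurrent SRDF (R11) device, based on the rules described above.

def ending_status_waits_for(leg_modes):
    """leg_modes: modes of the two remote mirrors, e.g. ("sync", "async")."""
    # Both legs asynchronous is the one unsupported combination noted above.
    assert leg_modes != ("async", "async"), "both legs asynchronous is not supported"
    # The host waits only for the legs operating in synchronous mode.
    return [mode for mode in leg_modes if mode == "sync"]

print(ending_status_waits_for(("sync", "sync")))       # wait for both legs
print(ending_status_waits_for(("sync", "async")))      # wait for the synchronous leg only
print(ending_status_waits_for(("adaptive", "async")))  # no synchronous wait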


Cascaded SRDF
                Cascaded SRDF is a three-site disaster recovery solution where data
                from a primary site is synchronously replicated to a secondary site,
                and then asynchronously replicated from the secondary site to a
                tertiary site, as shown in Figure 12 on page 56. The core benefit
                behind a cascaded configuration is its inherent capability to continue
                replicating, with minimal user intervention, from the secondary site
                to the tertiary site in the event that the primary site fails. This enables
                a faster recovery at the tertiary site, provided that the tertiary site is
                where the customer wishes to restart production operations.







Cascaded SRDF uses dual-role SRDF devices (R21 devices) at the secondary site, which act as both an R2 to the primary site and an R1 to the tertiary site, as shown in the following figure.

Figure 12   Cascaded SRDF configuration

                                The following SRDF modes are allowed in Cascaded SRDF:
                                ◆   R1 -> R21: SRDF/S, SRDF/A, Adaptive copy disk mode,
                                    Adaptive copy write-pending mode
                                ◆   R21 -> R2: SRDF/A, Adaptive copy disk mode
                                The most common implementation is for the first hop to be in
                                SRDF/S mode, and the second hop to be in SRDF/A mode.
                                The following limitations apply to Cascaded SRDF:
◆   Only one hop (R1 -> R21 or R21 -> R2) can be asynchronous at a time
                                ◆   Like concurrent SRDF, the two SRDF mirrors of an R21 cannot be
                                    configured using the same SRDF group. GDDR automation
                                    provides scripts that allow customers to convert from cascaded to
                                    concurrent mode provided that the initial configuration is
                                    installed as cascaded mode.
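
The allowed mode combinations and the single-asynchronous-hop rule listed above can be captured in a short validation sketch. It is illustrative only, and the mode labels are simplified stand-ins for the SRDF modes.

# Illustrative check of the cascaded SRDF mode rules listed above.

ALLOWED_HOP1 = {"srdf_s", "srdf_a", "acp_disk", "acp_wp"}   # R1 -> R21
ALLOWED_HOP2 = {"srdf_a", "acp_disk"}                        # R21 -> R2

def valid_cascaded_config(hop1_mode, hop2_mode):
    if hop1_mode not in ALLOWED_HOP1 or hop2_mode not in ALLOWED_HOP2:
        return False
    # Only one hop may run asynchronously at a time.
    return not (hop1_mode == "srdf_a" and hop2_mode == "srdf_a")

print(valid_cascaded_config("srdf_s", "srdf_a"))   # True: the most common implementation
print(valid_cascaded_config("srdf_a", "srdf_a"))   # False: both hops asynchronous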


     SRDF/Extended Distance Protection
                                SRDF/Extended Distance Protection (EDP) is a three-site
                                configuration that requires Enginuity 5874 and later running on the
                                secondary Symmetrix system and Enginuity 5773 or 5874 and later
                                running on the primary and tertiary systems. Figure 13 on page 57
                                shows an example of an SRDF/EDP basic configuration.







              SRDF/EDP is achieved through a cascaded SRDF setup, where a
              Symmetrix VMAX system at a secondary site uses diskless R21
              devices to capture only the differential data that would be owed to
              the tertiary site in the event of a primary site failure.

                    [Figure: R1 (full copy) at the primary site replicates via SRDF/S to a diskless R21 device (DL Dev, not a full copy, differences only) at the secondary site, which replicates via SRDF/A to R2 (full copy) at the tertiary site]
  Figure 13   SRDF/EDP basic configuration

               SRDF/EDP provides a long-distance replication solution with the
               ability to achieve a zero RPO at the tertiary site. It is also a lower cost
               solution when compared with other three-site solutions such as
               concurrent and cascaded SRDF.
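
               The role of the diskless R21 hop can be pictured with a small
               conceptual model. The sketch below is not EMC code; the DisklessR21
               class, the track numbering, and the drain_to_r2 helper are assumptions
               used only to illustrate holding differences rather than a full copy.

        # Conceptual model only -- not EMC code. The secondary site keeps just the
        # differential data owed to the tertiary site, never a full copy.

        class DisklessR21:
            def __init__(self):
                self.owed_tracks = {}                  # track number -> pending data

            def receive_from_r1(self, track, data):
                self.owed_tracks[track] = data         # held in cache, owed to R2

            def drain_to_r2(self, send):
                # After a primary-site failure, only the differences are shipped,
                # allowing recovery at the tertiary site without a full resend.
                for track in sorted(self.owed_tracks):
                    send(track, self.owed_tracks[track])
                self.owed_tracks.clear()

        hop = DisklessR21()
        hop.receive_from_r1(42, b"updated track image")
        hop.drain_to_r2(lambda track, data: print(f"sending track {track} to R2"))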




     EMC SRDF consistency groups
                               GDDR manages a single consistency group. Devices in the group can
                               be FBA (Open Systems) or CKD (mainframe) format. The design of
                               ConGroup and GDDR allows customers to build HA and DR
                               solutions around the concept of the enterprise. Because GDDR is
                               installed outside of the scope of a Sysplex, GDDR-controlled solutions
                               can also span multiple Sysplex environments. An SRDF consistency
                               group is a collection of related Symmetrix devices that are configured
                               to act in unison to maintain data integrity. The devices in a consistency
                               group can be spread across multiple Symmetrix systems.
                               Consistency Groups for z/OS (ConGroup) is an SRDF product
                               offering designed to ensure the dependent-write consistency of the
                               data remotely mirrored by SRDF/S operations in the event of a
                               rolling disaster. Most applications, and in particular database
                               management systems (DBMSs), have dependent-write logic
                               embedded in them to ensure data integrity if a hardware or software
                               failure occurs.
                               ConGroup is based on the concept of dependent-write operations. A
                               dependent-write is a write not issued by an application until a prior,
                               related write I/O operation is completed. An example of
                               dependent-write activity is a database update:
                               1. The DBMS writes to the disk containing the transaction log.
                               2. The DBMS writes the data to the actual database.
                               3. The DBMS writes again to the log volume to indicate that the
                                  database update was made.
                               In a remotely mirrored environment, data consistency cannot be
                               ensured if one of the writes is remotely mirrored, but its predecessor
                               write was not. This could occur, for example, in a rolling disaster
                               where a communication loss occurs and affects only a subset of the
                               devices involved in the remote copy function.
                               ConGroup prevents a rolling disaster from affecting data integrity at
                               the secondary site. When ConGroup detects any write I/O to a
                               volume that cannot communicate with its R2 (secondary) mirror, it
                               suspends the remote mirroring for all volumes defined to the
                               consistency group before completing the intercepted I/O and
                               returning control to the application. This is referred to as a ConGroup
                               trip. In this way, ConGroup ensures a dependent-write consistent
            image is available at the secondary site by preventing a
            dependent-write I/O from reaching the secondary site if the previous
            I/O only gets as far as the primary mirror.
            I/O to the primary devices in the consistency group can still occur
            even when the devices are Not Ready on the SRDF links. Such
            updates are not immediately sent to the secondary site. However,
            they are propagated after the affected links are again operational, and
            data transfer from the primary devices to the secondary devices
            resumes.
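
            The trip behavior described above can be summarized with a short
            conceptual sketch. This is not ConGroup code; the class names, the
            can_reach_r2 check, and the suspend logic are assumptions chosen only
            to illustrate the all-or-nothing suspension that takes place before the
            intercepted I/O is completed.

        # Conceptual model only -- not EMC code.

        class Volume:
            def __init__(self, name, r2_reachable=True):
                self.name = name
                self.r2_reachable = r2_reachable
                self.remote_mirroring = True
                self.local_writes = []

            def can_reach_r2(self):
                return self.r2_reachable

            def suspend_remote_mirroring(self):
                self.remote_mirroring = False

            def write_local(self, data):
                self.local_writes.append(data)

        class ConsistencyGroup:
            def __init__(self, volumes):
                self.volumes = volumes
                self.tripped = False

            def write(self, volume, data):
                # If any volume in the group cannot reach its R2 mirror, suspend
                # remote mirroring for *all* volumes before completing the I/O.
                if not self.tripped and not volume.can_reach_r2():
                    for vol in self.volumes:
                        vol.suspend_remote_mirroring()
                    self.tripped = True                 # the "ConGroup trip"
                volume.write_local(data)                # I/O completes; control
                                                        # returns to the application

        group = ConsistencyGroup([Volume("LOG"), Volume("DB", r2_reachable=False)])
        group.write(group.volumes[1], "db update")      # trips the whole group
        group.write(group.volumes[0], "log update")     # accepted locally only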
            Assume an SRDF configuration consists of three Symmetrix systems
            with primary devices, and two additional Symmetrix systems with
            secondary devices. The systems with primary devices send data to
            the systems with secondary devices as shown in Figure 14 on
            page 59.


             [Figure: Primary 1, Primary 2, and Primary 3 systems replicating to Secondary 1 and Secondary 2 systems]
Figure 14   Primary and secondary relationships




                                Next, assume that the links between primary system 2 and secondary
                                system 1 fail. Without a consistency group, primary systems 1 and 3
                                continue to write data to the secondary site systems 1 and 2 while
                                primary system 2 does not, as shown in Figure 15 on page 60. The
                                copy of the data spread across secondary systems 1 and 2 becomes
                                inconsistent.


                                  [Figure: Primary 1, Primary 2, and Primary 3 replicating to Secondary 1 and Secondary 2, with the links from Primary 2 to Secondary 1 failed]




                    Figure 15   Failed links

                                However, if primary systems 1, 2, and 3 belong to a consistency
                                group, as shown in Figure 16 on page 61, and the links between
                                primary system 2 and secondary system 1 fail, the consistency group
                                automatically stops primary systems 1 and 3 from sending data to
                                secondary systems 1 and 2. Thus, the dependent-write consistency of
                                the data spanning secondary systems 1 and 2 remains intact.




                      [Figure: Primary 1, Primary 2, and Primary 3 enclosed in a consistency group, replicating to Secondary 1 and Secondary 2]
Figure 16   Primary systems 1, 2, and 3 in a consistency group




     EMC AutoSwap


                               EMC AutoSwap provides the ability to move (swap) workload I/Os
                               transparently from volumes in one set of Symmetrix storage systems
                               to volumes in other Symmetrix storage systems without operational
                               interruption. AutoSwap, with SRDF and EMC Consistency Groups,
                               dramatically increases data availability.
                               AutoSwap differs from IBM HyperSwap in that AutoSwap is not
                               confined to operating completely within a Sysplex. AutoSwap can
                               coordinate swaps for multiple monoplex or Sysplex systems in a
                               single swap event.
                               Swaps may be initiated either manually as planned events or
                               automatically as unplanned events (upon failure detection).
                               ◆   Planned swaps facilitate operations such as non-disruptive
                                   building maintenance, power reconfiguration, DASD relocation,
                                   and channel path connectivity reorganization.
                               ◆   Unplanned swaps protect systems against outages in a number of
                                   scenarios. Examples include: power supply failures, building
                                   infrastructure faults, air conditioning problems, loss of channel
                                   connectivity, entire DASD system failures, operator error, or the
                                   consequences of intended or unintended fire suppression system
                                   discharge.


◆   In GDDR-managed environments, planned swap events are
    initiated and controlled by GDDR. Unplanned swap events are
    initiated and controlled by AutoSwap. GDDR monitors for
    unplanned swap events and reconfigures the surviving
    infrastructure to deal with the new source of write I/O activity
    after a successful unplanned swap event.
In conjunction with EMC Consistency Groups, swaps are performed
concurrently while application workloads continue. This protects
data against unforeseen events and ensures that swaps are unique,
atomic operations that maintain dependent-write consistency.
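
The atomic, group-wide nature of a swap can be illustrated with a brief
sketch. This is not AutoSwap code; the DevicePair class and the readiness
check are assumptions used only to show the all-or-nothing semantics
described above.

        # Conceptual model only -- not EMC code.

        class DevicePair:
            def __init__(self, name):
                self.name = name
                self.active_side = "R1"        # I/O currently directed at the R1 device

            def ready_to_swap(self):
                return True                    # e.g., paths to the R2 side are online

            def redirect_io(self):
                self.active_side = "R2"

        def swap_group(pairs):
            """All-or-nothing swap: either every pair is redirected, or none are."""
            if not all(pair.ready_to_swap() for pair in pairs):
                raise RuntimeError("swap aborted; group left unchanged")
            for pair in pairs:
                pair.redirect_io()             # applications keep running throughout

        pairs = [DevicePair("DB01"), DevicePair("LOG01")]
        swap_group(pairs)
        assert all(pair.active_side == "R2" for pair in pairs)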




     EMC SRDF/Star
                               SRDF/Star is a three-site disaster recovery solution consisting of a
                               primary Symmetrix system (Site A) and two remote Symmetrix
                               systems (Sites B and C). SRDF/Star provides advanced multisite
                               business continuity replication that augments concurrent SRDF/S
                               (synchronous) and SRDF/A (asynchronous) operations for the same
                               primary volumes with the ability to incrementally establish or
                               reestablish SRDF/A replication between the two surviving sites in the
                               event of a primary site outage.
                               In mainframe environments, EMC GDDR is a mandatory
                               requirement for all SRDF/Star deployments to automate site swap
                               activity and/or disaster restart. In the event of a primary site outage,
                               the EMC GDDR SRDF/Star solution allows customers to quickly
                               move operations and re-establish protection between the remaining
                               sites. Once conditions permit, customers can rejoin the primary site to
                               the configuration, resuming SRDF/Star protection.
                               SRDF/Star can operate in concurrent or cascaded environments,
                               providing the appropriate differential synchronization. These
                               environments address different recovery and availability objectives:
                               ◆   Concurrent SRDF/Star positions the secondary site or the remote
                                   site as potential recovery sites, and provides differential
                                   resynchronization between the secondary and remote sites. To
                                   achieve this positioning, some level of reconfiguration
                                   intervention is required to access point-of-disaster data.
                               ◆   Cascaded SRDF/Star positions only the remote site as the
                                   recovery site with minimal intervention to access
                                   point-of-disaster data. This solution differentially synchronizes
                                   data between the primary site and the remote site.
                               ◆   SRDF/EDP can also be used in an SRDF/Star configuration. By
                                   design, SRDF/EDP only allows recovery at the tertiary site.
                               SRDF/Star provides rapid reestablishment of cross-site protection in
                               the event of primary site failure. Rather than a full resynchronization
                               between the remote sites, SRDF/Star provides differential
                               synchronization between the remote sites, and dramatically reduces
                               the time it takes to remotely protect the new production site.
                               SRDF/Star also provides a mechanism to determine which remote
                               site has the most current data in the event of a rolling disaster that
affects the primary site. In all cases, users maintain the ability to
choose which site to operate from and which site's data to use when
recovering from a primary site failure.
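
Two of the ideas above, selecting the remote site with the most current
data and resynchronizing differentially rather than fully, can be
sketched as follows. This is a conceptual illustration only, not EMC
code; the cycle numbers and invalid-track sets are assumptions.

        # Conceptual model only -- not EMC code.

        def most_current_site(last_cycle_received):
            """Pick the remote site that received the latest replication cycle."""
            return max(last_cycle_received, key=last_cycle_received.get)

        def differential_tracks(source_invalid, target_invalid):
            """Tracks to copy so the two remote sites match again.

            Only tracks marked invalid on either side are copied, instead of
            re-copying every track on every volume."""
            return sorted(source_invalid | target_invalid)

        print(most_current_site({"DC2": 1042, "DC3": 1039}))    # -> DC2
        print(differential_tracks({7, 9, 23}, {9, 31}))         # -> [7, 9, 23, 31]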
The host-based Multi-Session Consistency (MSC) task at the primary
(R1) site controls normal SRDF/Star operation. MSC performs
session management at SRDF/S site B and when necessary at
SRDF/A site C. The MSC session management task maintains the
information needed to perform differential synchronization between
site B and site C. Automation for SRDF/Star mainframe
configuration management is provided by EMC GDDR.
In Concurrent SRDF/Star operations, site B serves as the secondary
site and the target of the SRDF/S links from site A. Site C serves as
the tertiary site and the secondary site of the SRDF/A links from site
A. The recovery links are the SRDF/A links between site C and site B.
Figure 17 on page 66 shows a Concurrent SRDF/Star configuration.




                    [Figure: primary site (A) R1 devices with active SRDF/Synchronous links to R2 devices at the secondary site (B) and active SRDF/Asynchronous links to R2 devices at the out-of-region site (C); inactive SRDF/A recovery links between site C and site B]
                    Figure 17    Concurrent SRDF/Star configuration

                                 SRDF/Star in a cascaded SRDF configuration has the ability to
                                 incrementally establish an SRDF/A session between the primary and
                                 the asynchronous site in the event the synchronous site fails. With
                                 cascaded SRDF/Star, the synchronous secondary site is always more
                                 current than the asynchronous secondary site.




               Figure 18 on page 67 shows a basic cascaded SRDF/Star
               configuration. Site B serves as the secondary site and the target of the
               SRDF/S links from site A. Site C serves as the out of region site and
               the target of the SRDF/A links from site B. The recovery SRDF/A
               links are between site C and site A.

            [Figure: primary site (A) R1 devices with active SRDF/Synchronous links to R21 devices at the secondary site (B), which replicate via active SRDF/Asynchronous links to R2 devices at the out-of-region site (C); inactive SRDF/A recovery links between site C and site A]
Figure 18      Cascaded SRDF/Star configuration




                                       Figure 19 on page 68 illustrates a cascaded SRDF/Star EDP diskless
                                       environment with the remote devices at the asynchronous secondary
                                       site. The diskless R21 device streamlines the link connections to the
                                       asynchronous site in cascaded mode. No data copies are available at
                                       the synchronous target site B because that site's Symmetrix systems
                                       use diskless, cache-only R21 devices.
                    [Figure: primary site (A) R11 devices with active SRDF/Synchronous links to diskless R21 (DL Dev) devices at the synchronous secondary site (B), which replicate via active SRDF/Asynchronous links to R2 devices at the asynchronous secondary site (C); inactive SRDF/A recovery links between site C and site A]
                    Figure 19          Cascaded SRDF/Star with EDP (diskless) configuration




4
EMC Geographically Dispersed Disaster Restart (EMC GDDR)




This chapter contains the following:
◆   Supported configurations ................................................................. 72
◆   EMC GDDR requirements ................................................................ 89
◆   EMC GDDR fundamental concepts ................................................ 93
◆   EMC GDDR components.................................................................. 98
◆   EMC GDDR procedures.................................................................. 114
◆   EMC GDDR supported scripts....................................................... 116




                                     EMC Geographically Dispersed Disaster Restart (EMC GDDR) can be
                                     implemented in a variety of configurations involving two or three
                                     sites, SRDF/S, SRDF/A, ConGroup, AutoSwap, SRDF/EDP,
                                     and SRDF/Star. In the mainframe environment, EMC GDDR is a
                                     requirement for an SRDF/Star configuration. EMC GDDR can manage
                                     environments that comprise the following elements:
                                     ◆   Multiple z/OS systems
                                     ◆   Multiple Sysplexes
                                     ◆   Multiple Symmetrix controllers
                                     ◆   Intermix of CKD and FBA/FBAM DASD and BCVs
                                     In each configuration, EMC GDDR provides specific capabilities
                                     tailored to that configuration. However, the major features of EMC
                                     GDDR are common across all topologies.
                                     Because EMC GDDR manages production systems following
                                     disasters, it does not reside on the same servers that it is seeking to
                                     protect. EMC GDDR resides on separate z/OS systems from the host
                                     servers that run your application workloads. EMC GDDR is installed
                                     on a control LPAR at each site. These control LPARs are referred to as
                                     GDDR nodes, Control Systems, or C-Systems. Each EMC GDDR node
                                     is aware of the other EMC GDDR nodes through network
                                     connections between each site. This multi-site awareness allows EMC
                                     GDDR to detect disasters and identify survivors. In two-site
                                     configurations, EMC GDDR can recover business at the surviving
                                     site. In three-site configurations, EMC GDDR can nominate a control
                                     LPAR to assume the leadership role for EMC GDDR and recover
                                     business at one of the surviving sites.
                                     To achieve the task of business restart, EMC GDDR automation
                                     extends well beyond the disk level and into the host operating system
                                     level, where it has sufficient control of, and access to, third-party
                                     software and hardware products to provide automated recovery
                                     capabilities.
                                     EMC GDDR can distinguish normal operational disruptions from
                                     disasters and respond accordingly. For example, EMC GDDR is able
                                     to distinguish between network outages (SRDF link drop) and real
                                     disasters. This awareness is achieved by periodic exchange of
                                     dual-direction heartbeats between the EMC GDDR C-Systems. EMC
                                     GDDR constantly checks for disaster situations and ensures that
                                     other GDDR systems are "healthy." This checking allows EMC GDDR
                                     to recognize, and act on, potential disaster situations, even if only one
                                     EMC GDDR C-system survives. "Split brain" problems associated
with cluster technologies are avoided through operator prompts.
Upon initial recognition of such an event, EMC GDDR issues messages to the
operator console seeking confirmation of the event and, further,
confirmation of the restart actions required.
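
The heartbeat checking and operator confirmation described above can be
modeled conceptually as follows. This sketch is not GDDR code; the timeout
value, class names, and console prompt are assumptions made for this example.

        # Conceptual model only -- not EMC GDDR code.

        import time

        HEARTBEAT_TIMEOUT = 30                  # seconds of silence before suspicion

        class CSystem:
            def __init__(self, name, peers):
                self.name = name
                self.last_seen = {peer: time.time() for peer in peers}

            def record_heartbeat(self, peer):
                self.last_seen[peer] = time.time()

            def check_peers(self):
                """Distinguish a quiet peer from a disaster by asking the operator."""
                now = time.time()
                for peer, seen in self.last_seen.items():
                    if now - seen > HEARTBEAT_TIMEOUT:
                        # No unilateral action: operator confirmation avoids the
                        # "split brain" behavior of acting on a false alarm.
                        answer = input(f"{peer} heartbeat lost. Confirm disaster? (yes/no) ")
                        if answer.strip().lower() == "yes":
                            print(f"{self.name}: proceeding with restart actions")

        c1 = CSystem("C1", peers=["C2", "C3"])
        c1.record_heartbeat("C2")
        # c1.check_peers()    # would prompt the operator if a peer went quiet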
In any EMC GDDR installation, one Control System (C-System) is
designated as the Master C-System. In a two-site configuration,
failure of the Master C-System will result in the remaining C-System
becoming the Master. In a three-site configuration, if a local or
regional disaster occurs, EMC GDDR can determine which of the
surviving sites will execute the recovery. Changes to EMC GDDR
configuration information can only be made on the EMC GDDR
Master Control System. EMC GDDR propagates these changes to the
subordinate EMC GDDR C-system(s) using inter-system
communications. Restart procedures following disasters are
coordinated from the EMC GDDR Master C-System. EMC GDDR
scripts, whether for planned or unplanned scenarios, can only be
executed from the Master C-System. If necessary, the Master role can
be manually transferred from one C-System to another through the
EMC GDDR user interface.




     Supported configurations
                                     EMC GDDR is available to monitor and manage the following
                                     configurations and topologies:
                                     Two-site solutions:
                                     ◆   SRDF/S with ConGroup — The two-site SRDF/S with ConGroup
                                         configuration provides disaster restart capabilities at site DC2.
                                     ◆   SRDF/S with AutoSwap — The two-site SRDF/S with AutoSwap
                                         configuration provides for near-continuous availability through
                                         device failover between DC1 and DC2.
                                     ◆   SRDF/A — The two-site SRDF/A configuration provides
                                         disaster restart capabilities at site DC3.
                                     Three-site solutions:
                                     ◆   SRDF/Star — The three-site SRDF/Star configuration provides
                                         disaster restart capabilities at either the Secondary DASD site or
                                         DC3. Concurrent and Cascaded SRDF support further minimize
                                         the DC3 recovery time objective. R22 support is available,
                                         removing the need to create device pairs during GDDR scripts.
                                     ◆   SRDF/Star with AutoSwap — The three-site SRDF/Star with
                                         AutoSwap configuration provides for near-continuous
                                         availability through device failover between DC1 and DC2 as
                                         well as disaster restart capabilities at DC3. Concurrent and
                                         Cascaded SRDF support further minimize the DC3 recovery time
                                         objective. R22 support is available, removing the need to create
                                         device pairs during GDDR scripts.
                                     ◆   SRDF/Star with EDP.
                                     EMC GDDR functionality, controlled by a parameter library, can be
                                     customized to operate in any of these two or three-site
                                     configurations. During EMC GDDR implementations, the EMC
                                     GDDR parameter library is customized to reflect:
                                     ◆   The prerequisite software stack.
                                     ◆   The desired data center topology (two-site versus three-site,
                                         synchronous or asynchronous).




                        An EMC GDDR complex consists of EMC GDDR control systems
                        (C-Systems), the z/OS and open systems hosts, and EMC Symmetrix
                        storage systems which support an organization's mission-critical
                        workload. A single GDDR complex is able to support multiple
                        Sysplexes.


SRDF/S with ConGroup
                        The two-site SRDF/S with ConGroup configuration provides disaster
                        restart capabilities at site DC2. Figure 20 on page 73 illustrates EMC
                        GDDR operation in the SRDF/S with Consistency Group
                        environment.



                         [Figure: EMC GDDR C-Systems at DC1 and DC2 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 replicating over active SRDF links (SRDF/S) to R2 devices at DC2; active and standby Escon/Ficon channels shown]
            Figure 20   SRDF/S with ConGroup environment

                        As Figure 20 on page 73 shows, the relationship between the DC1 and
                        DC2 sites is maintained through SRDF/S replication of primary disk
                        images at DC1 to DC2. Both open systems (FBA) and mainframe
                        (CKD) disk images can be replicated. Figure 20 on page 73 also shows
                        the two EMC GDDR C-Systems with their heartbeat communication
                        paths, separate from the production disk and computer facilities. The
                        C-system at the Primary DASD site has EMC Consistency Group
                        software installed. It is recommended that one or more of the Primary
                        DASD site production z/OS systems also have EMC Consistency
              Group software installed. SRDF/S and ConGroup ensure that at the
                                     point that EMC GDDR receives notification of an unplanned or
                                     failure event, a point of consistency is already achieved.
                                     In this environment, EMC GDDR can do the following:
                                     ◆   Manage planned site swaps
                                     ◆   Restart processing at the secondary site following unplanned
                                         primary site events
                                     ◆   Perform standard operational tasks:
                                         • IPL, system reset, activate, deactivate
                                         • Trigger stop/start of business workloads
                                     ◆   Actively monitor for unplanned/failure events
                                         • Sites
                                         • Systems
                                         • Loss of SRDF/S
                                         • ConGroup trip
                                         • Inter-site communication failure

     EMC GDDR SRDF/S complex
                                     Each GDDR complex manages one consistency group. A consistency
                                     group is a named group of source (R1) volumes managed by the EMC
                                     Consistency Group (ConGroup) application as a unit. The volumes
                                     can be any mix of FBA and CKD devices on multiple Symmetrix units
                                     supporting a system, a SYSPLEX, or multiple SYSPLEXs, as well as
                                     "Open systems" hosts (Windows and various Unix flavors). Figure 21
                                     on page 75 depicts a logical view of a typical EMC GDDR complex.




                   [Figure: DC1 (primary site) and DC2 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 at DC1 and C2 (Master C) at DC2; R1 devices at DC1 replicating via SRDF/S to R2 devices at DC2, with local devices and BCVs at each site]
              Figure 21     EMC GDDR complex with SRDF/S

                            The following are brief descriptions of the components that comprise
                            this GDDR complex:
                            BCVs — BCVs (Business Continuance Volumes) can be supported at
                            each of the sites. They may be established at the DC2 site and split at
                            the DC1 site. BCVs taken at the recovery site are often referred to as
                            "Gold Copy" devices.
                            C1 and C2 — C1 and C2 are the EMC GDDR Control LPARs (or
                            C-Systems) at each of the sites.
                            Primary site — The primary site is the site where the production
                            workload is located.
                            Primary DASD site — The primary DASD (direct access storage
                            device) site is the site where the source (R1) DASD is located. The
                            primary DASD site is the same as the primary site.
                            Secondary site — The secondary site is the site where the
                            contingency or standby systems are located.
                            Secondary DASD site — The secondary DASD site is the site where
                            the target (R2) DASD is located. The secondary DASD site is the same
                            as the secondary site.
                            Sites DC1 and DC2 — Sites DC1 and DC2 are the primary and
                            secondary data centers of critical production applications and data.
                            DC1 is the primary site, with SRDF/S data replication to the
                            secondary site, DC2. These sites are considered fully equivalent for
                            strategic production applications, connected with highly redundant
                            direct network links. At all times, all production data is replicated
                            synchronously between the two sites.


     SRDF/S with AutoSwap
                                     The two-site SRDF/S with AutoSwap configuration provides for
                                     near-continuous availability through device failover between DC1
                                     and DC2. Figure 22 on page 76 illustrates EMC GDDR operation in
                                     the SRDF/S with AutoSwap environment.



                                     [Figure: EMC GDDR C-Systems at DC1 and DC2 linked by GDDR heartbeat communication; AutoSwap between DC1 and DC2; R1 devices at DC1 replicating over active SRDF links (SRDF/S) to R2 devices at DC2; active and standby Escon/Ficon channels shown]
                       Figure 22     SRDF/S with AutoSwap environment

                                     As Figure 22 on page 76 shows, the relationship between the DC1 and
                                     DC2 sites is maintained through SRDF/S replication of primary disk
                                     images at DC1 to DC2. Both open systems (FBA) and mainframe
                                     (CKD) disk images can be replicated. Figure 22 on page 76 also shows
                                     the two EMC GDDR C-Systems with their heartbeat communication
                                     paths, separate from the production disk and computer facilities.
                                     Each of the DC1 and DC2 production z/OS LPARs as well as both
                                     C-systems has EMC AutoSwap and EMC Consistency Group
                                     (ConGroup) software installed. AutoSwap and ConGroup ensure
                                     that a point of consistency exists whenever EMC GDDR receives
                                     notification of an unplanned or failure event.



In this environment, EMC GDDR can do the following:
◆   Manage planned site swaps
◆   Manage recovery after unplanned site swaps
◆   Perform standard operational tasks:
    • IPL, system reset, activate, deactivate
    • Trigger stop/start of business workloads
◆   Actively monitor for unplanned/failure events
    • Sites
    • Systems
    • Loss of SRDF/S
    • ConGroup trip
    • Inter-site communication failure
    • AutoSwap events
◆   Configure/reconfigure
    • Couple datasets
◆   Manage coupling facilities
    • Policies




     SRDF/A
                                     The two-site SRDF/A configuration provides disaster restart
                                     capabilities at site DC3. If DC3 is the Primary DASD site,
                                     GDDR offers disaster restart capabilities at DC1. Figure 23 on page 78
                                     illustrates EMC GDDR operation in the SRDF/A environment.



                                      [Figure: EMC GDDR C-Systems at DC1 and DC3 linked by GDDR heartbeat communication; R1 devices at DC1 replicating over active SRDF links (SRDF/A) to R2 devices at DC3; active and standby Escon/Ficon channels shown]
                       Figure 23     SRDF/A environment

                                     As Figure 23 on page 78 shows, the relationship between the DC1 and
                                     DC3 sites is maintained through SRDF/A replication of primary disk
                                     images from DC1 to DC3. Both open systems (FBA) and mainframe
                                     (CKD) disk images can be replicated. It also shows the two EMC
                                     GDDR C-Systems with their heartbeat communication paths,
                                     separate from the production disk and computer facilities.
                                     EMC GDDR does not have a requirement to "freeze" I/O to obtain a
                                     point of consistency. Multi-Session Consistency and SRDF/A provide
                                     the mechanism to obtain a point of consistency. At the point that
                                     EMC GDDR receives notification of an unplanned or failure event, a
                                     point of consistency is already achieved through these foundation
                                     technologies.




                                 In this environment, EMC GDDR can do the following:
                                 ◆     Manage planned site swaps
                                 ◆     Restart processing at the secondary site following unplanned
                                       primary site events
                                 ◆     Perform standard operational tasks
                                       • IPL, system reset, activate, deactivate
                                       • Trigger stop/start of business workloads
                                 ◆     Actively monitor for unplanned/failure events
                                       • Sites
                                       • Systems
                                       • Loss of SRDF/A
                                       • Inter-site communication failure

EMC GDDR SRDF/A complex
                                 Each GDDR complex can manage one Multi-Session Consistency
                                 (MSC) group. An MSC group is a named group, consisting of
                                 multiple RDF groups operating in SRDF/A mode, managed by the
                                 EMC MSC control software feature as a single unit. These groups can
                                 have any mix of CKD and FBA devices, on multiple Symmetrix units
                                 supporting a system, a SYSPLEX, or multiple SYSPLEXs as well as
                                 "Open systems" hosts (Windows and various flavours of Unix).
                                 Figure 24 on page 79 depicts a logical view of a typical EMC GDDR
                                 complex.

                        [Figure: DC1 (primary site) and DC3 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 at DC1 and C3 (Master C) at DC3; R1 devices at DC1 replicating via SRDF/A to R2 devices at DC3, with local devices and BCVs at each site]
                   Figure 24     GDDR complex with SRDF/A



                                     The following are brief descriptions of the components that comprise
                                     this GDDR complex:
                                     BCVs — Business Continuance Volumes can be supported at each of
                                     the sites. They may be established at the DC3 site and split at the DC1
                                     site.
                                     C1 and C3 — C1 and C3 are the EMC GDDR Control LPARs (or
                                     C-Systems) at each of the sites.
                                     Primary site — The primary site is the site where the production
                                     workload is located.
                                     Primary DASD site — The primary DASD (direct access storage
                                     device) site is the site where the source (R1) DASD is located. The
                                     primary DASD site is the same as the primary site.
                                     Secondary site — The secondary site is the site where the
                                     contingency or standby systems are located.
                                     Secondary DASD site — The secondary DASD site is the site where
                                     the target (R2) DASD is located. The secondary DASD site is the same
                                     as the secondary site.
                                     Sites DC1 and DC3 — Sites DC1 and DC3 are the primary and
                                     secondary data centers of critical production applications and data.
                                     DC1 is the primary site, with SRDF/A data replication to the
                                     secondary site, DC3. These sites are considered fully equivalent for
                                     strategic production applications, connected with highly redundant
                                     direct network links. At all times, all production data is replicated
                                     asynchronously between the two sites.




SRDF/Star
Configuration
                              The three-site SRDF/Star configuration provides disaster restart
                              capabilities at DC2 or DC3. Figure 25 on page 81 illustrates EMC
                              GDDR operation in a concurrent SRDF/Star environment. Figure 26
                              on page 82 illustrates EMC GDDR operation in a cascaded
                              SRDF/Star environment.



                             [Figure: EMC GDDR C-Systems at DC1, DC2, and DC3 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 with active SRDF/S links to R2 devices at DC2 and active SRDF/A links to R2 devices at DC3; SRDF links in standby mode and active/standby Escon/Ficon channels shown]
                Figure 25     Concurrent SRDF/Star environment




                                    [Figure: EMC GDDR C-Systems at DC1, DC2, and DC3 linked by GDDR heartbeat communication; ConGroup-protected R1 devices at DC1 with active SRDF/S links to R21 devices at DC2, which replicate via active SRDF/A links to R2 devices at DC3; SRDF links in standby mode and active/standby Escon/Ficon channels shown]
                       Figure 26     Cascaded SRDF/Star environment

                                     As Figure 25 on page 81 and Figure 26 on page 82 show, the
                                     relationship between the DC1 and DC2 sites is maintained through
                                     SRDF/Synchronous replication of primary disk images at DC1 to
                                     DC2. Both open systems (FBA) and mainframe (CKD) disk images
                                     can be replicated. In a concurrent configuration, the asynchronous
                                     relationship is between DC1 and DC3, while in a cascaded
                                     environment, the asynchronous relationship is between DC2 and
                                     DC3.
                                     Figure 25 on page 81 and Figure 26 on page 82 also show the three
                                     EMC GDDR C-Systems with their independent heartbeat
                                     communication paths, separate from the production disk and
                                     computer facilities. The C-system at the Primary DASD site has EMC
                                     Consistency Group software installed. It is recommended that one or
more of the Primary DASD site production z/OS systems also have
EMC Consistency Group software installed. During relevant GDDR
scripts, GDDR will ensure the Consistency Group software is started
where needed.
In this environment, EMC GDDR can perform the following tasks:
◆   Manage planned site swaps
◆   Manage recovery after unplanned site swaps
◆   Manage reconfiguration of the SRDF/Star environment between
    concurrent and cascaded topologies
◆   Manage reconfiguration of the SRDF/Star environment from
    cascaded to concurrent with a primary processing site move
◆   Perform standard operational tasks:
    • IPL, system reset, activate, deactivate
    • Trigger stop/start of business workloads
◆   Actively monitor for unplanned/failure events, including:
    • Sites
    • Systems
    • ConGroup trip
    • Loss of SRDF/S
    • Loss of SRDF/A
    • Inter-site communication failure




     EMC GDDR SRDF/Star complex
                                         Figure 27 on page 84 depicts a logical view of a typical EMC GDDR
                                         complex in a concurrent configuration.

                              [Figure: DC1 (primary site) and DC2 (secondary site), each running Sysplex #1 through Sysplex #n, with C-Systems C1 and C2 (Master C); R1 devices at DC1 replicating via SRDF/S to R2 devices at DC2 and via SRDF/A to R2 devices at DC3 (tertiary site, with C-System C3); local devices and BCVs at each site]
                           Figure 27     GDDR Complex with Star

                                         The following are brief descriptions of the components that comprise
                                         this GDDR complex:
                                         BCVs — BCVs (Business Continuance Volumes) can be supported at
                                         each of the sites.
                                         Primary site — The primary site is the site where the production
                                         workload is located.




                      Primary DASD site — The primary DASD (direct access storage
                      device) site is the site where the source (R1) DASD is located. The
                      primary DASD site is normally the same as the primary site.
                      Secondary site — The secondary site is the site where the
                      contingency or standby systems are located.
                      Secondary DASD site — The secondary DASD site is the site where
                      the target (R2) DASD is located. The secondary DASD site is normally
                      the same as the secondary site.
                      Sites DC1, DC2, and DC3 — Sites DC1, DC2, and DC3 are used
                      together in an SRDF/Star data replication three-site topology, where
                      DC1 is the primary site with concurrent or cascaded replication to the
                      secondary site DC2 and the tertiary site DC3. Site DC3 is the tertiary
                      data center for critical production applications and data. It is
                      connected with redundant network to both DC1 and DC2. Data is
                      replicated asynchronously from the current primary DASD site or
                      secondary DASD site (with cascaded SRDF) with an intended
                      recovery point objective (RPO) in a short time period.


SRDF/Star with AutoSwap
                      The three-site SRDF/Star with AutoSwap configuration provides for
                      near-continuous availability through device failover between sites
                      DC1 and DC2 as well as disaster restart capabilities at site DC3.
                      Figure 28 on page 86 illustrates EMC GDDR operation in a
                      Concurrent SRDF/Star with AutoSwap environment. Figure 29 on
                      page 87 illustrates EMC GDDR operation in a Cascaded SRDF/Star
                      with AutoSwap environment.








                      [Figure content: EMC GDDR C-Systems at DC1, DC2, and DC3; AutoSwap between DC1 and DC2;
                      R1 at DC1 replicating via SRDF/S to R2 at DC2 and via SRDF/A to R2 at DC3. Legend: GDDR heartbeat
                      communication, active and standby ESCON/FICON channels, active SRDF links, SRDF links in standby mode.]



                       Figure 28     Concurrent SRDF/Star with AutoSwap environment








                      [Figure content: EMC GDDR C-Systems at DC1, DC2, and DC3; AutoSwap between DC1 and DC2;
                      R1 at DC1 replicating via SRDF/S to R21 at DC2, which replicates via SRDF/A to R2 at DC3. Legend:
                      GDDR heartbeat communication, active and standby ESCON/FICON channels, active SRDF links,
                      SRDF links in standby mode.]



Figure 29     Cascaded SRDF/Star with AutoSwap environment

              As Figure 28 on page 86 and Figure 29 on page 87 show, the
              relationship between the DC1 and DC2 sites is maintained through
              SRDF/Synchronous replication of primary disk images at DC1 to
              DC2. Both open systems (FBA) and mainframe (CKD) disk images
              can be replicated. In a concurrent configuration, the asynchronous
              relationship is between DC1 and DC3, while in a cascaded
              environment, the asynchronous relationship is between DC2 and
              DC3.
              These illustrations show the three EMC GDDR C-Systems with their
              independent heartbeat communication paths, separate from the
              production disk and computer facilities. Each of the DC1 and DC2
              production z/OS LPARs has EMC AutoSwap and EMC Consistency
              Group (ConGroup) installed.





                                     In this environment, EMC GDDR can perform the following tasks:
                                     ◆   Manage planned site swaps
                                     ◆   Manage recovery after unplanned site swaps
                                     ◆   Manage reconfiguration of the SRDF/Star environment between
                                         the concurrent and cascaded topologies
                                     ◆   Manage reconfiguration of the SRDF/Star environment from
                                         cascaded to concurrent with a primary processing site move
                                     ◆   Perform standard operational tasks:
                                         • IPL, system reset, activate, deactivate
                                         • Trigger stop/start of business workloads
                                     ◆   Actively monitor for unplanned/failure events, including:
                                         • Sites
                                         • Systems
                                         • ConGroup trip
                                         • Loss of SRDF/S
                                         • Loss of SRDF/A
                                         • Inter-site communication failure
                                         • AutoSwap events
                                     ◆   Configure/reconfigure:
                                         • Couple datasets
                                     ◆   Manage coupling facilities:
                                         • Policies








EMC GDDR requirements

Mainframe environment requirements
                      The basic infrastructure must support the specific EMC technologies
                      involved in the configuration (for example, SRDF/S with AutoSwap
                      or SRDF/A). In addition, EMC GDDR has the following specific
                      infrastructure requirements:
                      ◆   One LPAR is required for each C-System, preferably using DASD
                          isolated from the managed environment.
                      ◆   There must be network connectivity between all C-Systems.
                      ◆   An HMC (Hardware Management Console) must be available at
                          each site that can be accessed from each C-System (access to these
                          HMCs can be protected by means of a private VLAN).
                      EMC GDDR has the mainframe environment requirements listed in
                      Table 1 on page 89.

            Table 1   Mainframe environment requirements

                      Item                               Requirements

                      Processor hardware configuration   Any system that supports current IBM mainframe
                                                         operating systems

                      DASD hardware configuration        Any supported Symmetrix DASD model at an
                                                         Enginuity microcode level specified in the EMC
                                                         GDDR Release Notes

                      Software                           Any currently supported IBM operating system

                      EMC GDDR supports and can manage the following combinations of
                      DASD in a single Enterprise Consistency Group:
                      ◆   Single EMC Symmetrix controllers configured with any of the
                          following:
                          • All CKD devices
                          • All FBA and FBA-META devices
                          • Any combination of CKD, FBA and FBA-META devices
                      ◆   Multiple EMC Symmetrix controllers configured with any of the
                          following:
                          • All CKD devices




                                         • All FBA and FBA-META devices
                                         • Any combination of CKD, FBA and FBA-META devices
                                     Management and monitoring of both CKD and FBA/FBA-META
                                     devices is performed from the z/OS platform where the EMC GDDR
                       application resides. From the EMC GDDR point of view, CKD and
                       FBA/FBA-META Symmetrix devices are treated identically: all are
                       command targets of SRDF Host Component configuration commands
                       using local or remote syntax.
                       EMC GDDR requires that if any device in an RDF group is defined to
                       GDDR, then all devices in that group must be defined to GDDR. Most
                       GDDR actions are directed at the RDF group level, although in some
                       cases GDDR acts on device ranges where appropriate.
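                       As a conceptual illustration of this all-or-nothing rule, the following
                       Python sketch (not GDDR code; the group and device numbers are
                       invented for the example) flags any RDF group that is only partially
                       defined to GDDR:

    # Conceptual sketch only: GDDR enforces this rule internally.
    # Group and device identifiers below are invented for illustration.

    def check_rdf_group_coverage(rdf_groups, gddr_devices):
        """Return the RDF groups that are only partially defined to GDDR.

        rdf_groups   -- dict mapping RDF group name -> set of device numbers
        gddr_devices -- set of device numbers defined to GDDR
        """
        partial = {}
        for group, devices in rdf_groups.items():
            defined = devices & gddr_devices
            if defined and defined != devices:
                partial[group] = sorted(devices - gddr_devices)
        return partial

    rdf_groups = {
        "RDFGRP01": {"1A00", "1A01", "1A02"},
        "RDFGRP02": {"2B00", "2B01"},
    }
    gddr_devices = {"1A00", "1A01", "1A02", "2B00"}   # 2B01 is missing

    for group, missing in check_rdf_group_coverage(rdf_groups, gddr_devices).items():
        print(f"RDF group {group} is partially defined; missing devices: {missing}")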
                       EMC GDDR itself places no limit on the number of EMC Symmetrix
                       controllers or devices that can be managed; any limits that do apply
                       come from restrictions in EMC hardware and software.


     Minimum hardware requirements
                                     Table 2 on page 90 describes the recommended minimum processor
                                     and I/O configuration for an EMC GDDR C-System.
                       Table 2       Minimum hardware requirements


                                      Item                                      Requirements

                                      Logical processors                        1 (2 are recommended)

                                      MSU                                       15 on IBM 2084-306 (or equivalent)

                                      Storage                                   512 MB

                                      Logical paths to own local DASD devices   4

                                      Logical paths to managed DASD devices     4



     Minimum software requirements
                                     The minimum software requirements for EMC GDDR for the various
                                     supported configurations are listed in the following sections. The
                                     EMC GDDR Release Notes provide the latest information regarding






                            supported software release levels. Installation procedures for the
                            EMC software products listed can be found in the EMC Mainframe
                            Enablers Installation and Customization Guide.

                  Table 3   Software requirements

Configuration                    HMC API or BCPii support    MFE           Consistency Group     AutoSwap

SRDF/S with ConGroup                                              
SRDF/S with AutoSwap                                                                
SRDF/A                                               
SRDF/Star                                                         
SRDF/Star with AutoSwap                                                             


Additional configuration requirements
                            The following sections list additional configuration requirements for
                            the supported EMC GDDR configurations.

SRDF/S with ConGroup
                            Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/S environment.

SRDF/S with AutoSwap
                            Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/S environment.
                             SRDF/S with AutoSwap has the following additional requirements:
                            ◆   CAX protection must be added to the SRDF/S-defined
                                ConGroups.
                            ◆   LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified.
                            The EMC Consistency Group for z/OS Product Guide and EMC
                            AutoSwap Product Guide provide information on these items.

                  SRDF/A    Refer to the EMC SRDF Host Component for z/OS Product Guide for
                            information on configuring an SRDF/A environment. EMC GDDR is
                            compatible with SRDF Automated Recovery functionality.







                                     SRDF/A MSC has the following additional gatekeeper requirement:
                                     ◆   There must be one or more gatekeeper devices for each
                                          MSC-controlled RDF group. These gatekeeper devices must be defined
                                          in the OS configuration as OFFLINE at IPL and as regular local devices
                                          (not BCV, SRDF, SAV, and so forth).

                     SRDF/Star       Refer to the SRDF Host Component for z/OS Product Guide for
                                     information on configuring an SRDF/Star environment.

     SRDF/Star with AutoSwap
                                     Refer to the SRDF Host Component for z/OS Product Guide for
                                     information on configuring an SRDF/Star environment.
                                     SRDF/Star with AutoSwap has the following additional
                                     requirements:
                                     ◆   CAX protection must be added to the SRDF/Star-defined
                                         ConGroup.
                                     ◆   LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified.
                                     The EMC Consistency Group for z/OS Product Guide and EMC
                                     AutoSwap Product Guide provide information on these items.
                                     In addition, there must be one or more gatekeeper devices for each
                                     MSC-controlled RDF group. These gatekeeper devices must be:
                                     ◆   SRDF/S-protected (you can choose any available SRDF/S RDF
                                         group)
                                     ◆   ConGroup and CAX-protected
                                     ◆   Not SRDF/A-protected
                                     ◆   If CKD, in OS configuration as ONLINE at IPL
                                     ◆   If FBA, in OS configuration as OFFLINE at IPL








EMC GDDR fundamental concepts

EMC GDDR control systems
                     The EMC GDDR control systems are more commonly referred to as
                     EMC GDDR C-Systems. One EMC GDDR C-System is located at each
                     site in a separate z/OS System. Each EMC GDDR C-System runs in
                     monoplex mode from local DASD. EMC GDDR C-Systems do not run
                      any production workload. EMC recommends that GDDR be installed
                      on dedicated C-System DASD on controllers separate from the
                      production DASD.
                      Deploying GDDR C-Systems as dedicated z/OS systems ensures their
                      availability during both planned and unplanned events affecting
                      application or site operations. Isolating the GDDR C-Systems from the
                      managed sysplexes ensures GDDR operational independence and
                      provides capabilities that other automated business continuity
                      solutions cannot offer.
                     The main functions of an EMC GDDR C-System are to:
                     ◆   Control the recovery after an outage
                     ◆   Control a planned site swap
                     One of the C-Systems is designated as the Master C-System. During
                     normal operations, the Master C-System is the central control point
                     for all EMC GDDR activities. The Master C-System is located at the
                     secondary DASD site in configurations with AutoSwap and on the
                     primary DASD site in other configurations. All EMC GDDR
                     C-Systems are potential candidates to take over as the Master
                      C-System. If the Master C-System becomes unavailable for some
                      reason, a C-System at another site assumes the EMC GDDR master
                      function ownership. This way, either the Master C-System survives a
                      site loss, or the Master role is automatically transferred
                     to a surviving site. When the original Master C-System becomes
                     available, the master function ownership automatically transfers back
                     to the correct location.
                     Some EMC GDDR functions can only be carried out by the Master
                     C-System, for example:
                     ◆   Running planned processes
                     ◆   Updating EMC GDDR parameters
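                      The placement and takeover behavior described above can be
                      summarized in a small Python sketch. This is purely illustrative; GDDR
                      implements the logic internally, and the site names and function names
                      here are assumptions for the example:

    # Illustrative sketch of Master C-System placement, assuming the rules
    # described in the text: with AutoSwap the Master runs at the secondary
    # DASD site, otherwise at the primary DASD site; if that C-System is
    # down, a surviving C-System takes over.

    def preferred_master(primary_dasd_site, secondary_dasd_site, autoswap):
        return secondary_dasd_site if autoswap else primary_dasd_site

    def select_master(sites_up, primary_dasd_site, secondary_dasd_site, autoswap):
        preferred = preferred_master(primary_dasd_site, secondary_dasd_site, autoswap)
        if preferred in sites_up:
            return preferred
        # Fall back to any surviving C-System site.
        return next(iter(sorted(sites_up)), None)

    # Example: AutoSwap configuration, DC2 (the preferred Master site) is lost.
    print(select_master({"DC1", "DC3"}, "DC1", "DC2", autoswap=True))   # -> DC1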







     Workload location               In an EMC GDDR complex without AutoSwap, the business or
                                     production workload runs at a single site; that is, one side of the
                                     Sysplex. This is the same location as the primary DASD site.
                                     In an EMC GDDR complex with AutoSwap, the business or
                                     production workload can run as either a single site workload, usually
                                     at the same location as the primary DASD site, or as a multi-site
                                     workload, where the production workload runs at both the primary
                                     and secondary sites.
                                      In an EMC GDDR complex with an SRDF/A two-site configuration,
                                     the production systems and primary DASD must always be at the
                                     same site.
                                     A contingency or standby system is a system that provides backup to
                                     a production system. The characteristics of a contingency system in
                                     an EMC GDDR installation depend on the configuration.
                                     Contingency or standby systems are typically located at the same
                                     location as the secondary DASD. Multiple locations containing
                                     contingency or standby systems may be used to increase availability
                                     and provide disaster restart options. Regional contingency systems
                                     are typically located in the same location as the secondary DASD,
                                     while out-of-region standby systems provide protection from
                                     geographic and infrastructure exposures that may negatively impact
                                     the primary and secondary sites.
                                     In an SRDF/S two-site configuration, a contingency system normally
                                     provides a hot backup to the production system and is typically
                                     located at the same location as the secondary DASD. The contingency
                                     system:
                                     ◆   Is in the same Sysplex as its production system partner
                                     ◆   Is IPLed, but runs no business workload
                                     In an SRDF/A two-site configuration, a contingency or standby
                                     system is a system that replaces production system capacity in the
                                     event of a loss of use of the primary site. It is located at an appropriate
                                     distance from the primary systems to minimize risks from geographic
                                     and infrastructure exposures which may negatively impact primary
                                     systems availability. The contingency system:
                                     ◆   May be used for expendable workloads which are displaced by
                                         business workload following the loss of a primary site.







                      ◆   May be cold (not powered up), or warm (powered up but not
                          IPLed) systems reserved for business workload restart and
                          testing of restart processes, but not in support of any meaningful
                          day-to-day workload.
                      A contingency or standby system in a SRDF/Star configuration
                      normally provides a hot backup to a production system. A
                      contingency system:
                      ◆   Is in the same Sysplex as its production system partner
                      ◆   Is IPLed, but runs no business workload
                      ◆   May be an out-of-region location equipped with idle processor
                          capacity reserved for support of business workload restarts
                      Any production or contingency/standby system defined to EMC
                      GDDR is known as an EMC GDDR managed system.

Managed workloads     EMC GDDR can trigger the stop and restart of production workloads
                      on:
                      ◆   z/OS systems
                      ◆   Distributed systems

 External workloads   External workloads run on mainframe systems which do not have
                      their DASD in the managed Symmetrix units. EMC GDDR can
                      coordinate stop and start of the workload on these "non-managed"
                      mainframe systems with the workload stop and start actions for
                      managed systems.

  Excluded systems    Excluded systems apply to GDDR configurations with AutoSwap
                      only. Symmetrix arrays can be shared by multiple systems and some
                      of those systems may not require EMC GDDR protection. EMC
                      GDDR can be configured to exclude certain z/OS systems from
                       workload management, even though these systems have their DASD in
                       the managed Symmetrix arrays. Because their DASD is managed, these
                       systems must be running AutoSwap; otherwise they would fail during
                       GDDR-managed site swaps.

HMC-Bypass option     This option applies at the LPAR level or the site level and prevents
                      GDDR from performing any action that would require HMC access for
                      the relevant site or LPAR. Its goal is to allow GDDR to be deployed at
                      sites operating under a third-party facilities management contract,
                      where the third party has exclusive access to the HMC and HMC access
                      by individual customers using the provided facilities is not allowed.





         HMC-only systems            EMC GDDR can be configured to restrict IPL and CBU (Capacity
                                     Backup Upgrade) actions for certain systems to the online interface.
                                     No other actions or automation are performed for these systems.


     EMC GDDR processes
                                     An EMC GDDR process or script is a dynamically determined
                                     sequence of function calls to achieve a predetermined result.
                                     Generally one function call corresponds to one action. An EMC
                                     GDDR process is executed by calling EMC GDDR provided routines,
                                     either from a batch job or as a result of specific messages being
                                     issued. There are two types of EMC GDDR processes:

            Planned process          An EMC GDDR planned process is initiated through the EMC GDDR
                                     interface to perform a planned task.

     Unplanned process or Takeover process
                                     The EMC GDDR unplanned process or takeover process can only be
                                     initiated following an error that results in a possible takeover
                                     situation. Takeover processes are initiated as a result of certain
                                     messages being issued or specific events occurring.
                                     The messages or events that trigger an unplanned or takeover process
                                     can originate on any system, either a C-System or a production
                                      system. In response to a trigger event, and following operator
                                      confirmation, the current Master C-System invokes the necessary
                                      processes based on the current EMC GDDR complex and managed
                                      system configurations. Processes are invoked following confirmation
                                      of any of the following types of failure or loss:
                                     ◆   Sites
                                     ◆   DASD
                                     ◆   Systems
                                     ◆   Loss of SRDF link
                                     ◆   Loss of host channels

     EMC GDDR process restart
                                      In the event an EMC GDDR process does not complete properly, the
                                     return codes from the function calls that make up an EMC GDDR
                                     process are saved in EMC GDDR global variables. For functions that
                                      issue EMC SRDF Host Component commands, the return codes of the






commands are also saved. If multiple commands are issued from one
function, the return codes from each command are saved in EMC
GDDR global variables.
After the cause of the original failure has been identified and
resolved, the EMC GDDR process can be rerun. EMC GDDR uses the
saved return codes to establish the point of restart; that is, the point of
the previous failure. This ensures that no modifications to the
supplied EMC GDDR process jobs are required in order to rerun after
a failure. If the underlying problem causing a GDDR function to fail
has been resolved in a way that makes the GDDR function call
unnecessary, the GDDR interface provides an easy way to skip the
failed function call. If necessary, restart can be forced in a particular
step by manually setting the EMC GDDR system variables
appropriately.
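The restart behavior can be pictured with the following Python sketch. It is
a simplified model, not GDDR code: a dictionary stands in for the EMC GDDR
global variables that hold the saved return codes, and the step names are
invented:

    # Simplified model of rerunning a process from the point of failure.
    # Saved return codes stand in for GDDR global variables.

    def run_process(steps, saved_rc, skip=()):
        """Run each (name, func) step, skipping steps that already completed
        (saved return code 0) or that the operator chose to skip."""
        for name, func in steps:
            if saved_rc.get(name) == 0 or name in skip:
                continue                     # already done or explicitly skipped
            rc = func()
            saved_rc[name] = rc
            if rc != 0:
                print(f"step {name} failed with rc={rc}; fix and rerun")
                return saved_rc
        print("process completed")
        return saved_rc

    steps = [
        ("STEP1", lambda: 0),
        ("STEP2", lambda: 8),    # fails on the first run
        ("STEP3", lambda: 0),
    ]
    rcs = run_process(steps, {})             # stops at STEP2
    steps[1] = ("STEP2", lambda: 0)          # underlying problem resolved
    run_process(steps, rcs)                  # restarts at STEP2, then STEP3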








     EMC GDDR components
                                      The EMC GDDR software environment includes many components
                                     controlled by a parameter library. Parameters define the EMC GDDR
                                     managed configuration. Some parameters are maintained in external
                                     parmlib members, but most are maintained via the EMC GDDR user
                                     interface and loaded into EMC GDDR global variables by the
                                     parameter activation process. The user interface is an ISPF
                                     application available only on the EMC GDDR C-Systems. There are
                                     events in which EMC GDDR is interested and message rules that
                                     determine the actions to be taken in response to those events. Most
                                     GDDR procedures run as started tasks and include EMC foundation
                                     products such as ResourcePak Base and SRDF Host Component.


     Parameters
     Internal parameters
                                      The GDDR Parameter Wizard is the control interface for managing
                                      GDDR parameters. It should only be used on the C-System that is
                                      intended to be the Master C-System for the
                                     managed configuration. During the activation process, the new
                                     parameters are loaded into EMC GDDR global variables on the
                                     Master C-System and propagated to the remaining C-Systems using
                                     GDDR peer-to-peer communication.







                    The basic process of updating EMC GDDR parameters consists of the
                    following steps:
                    ◆   Back up the existing parameters. The parameter management
                        process uses an existing backup of parameter values as the
                        starting point for subsequent updates.
                    ◆   Select the backup as input.
                    ◆   Modify parameters as required. The updates are captured in PDS
                        members in a work dataset.
                    ◆   Validate the parameter set. This is done via a batch job submitted
                        from the user interface.
                    ◆   Activate the parameter set. This is also a batch job submitted from
                        the user interface.
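                     A minimal sketch of this workflow is shown below, assuming
                     hypothetical function names; in practice each step is performed through
                     the GDDR ISPF panels and the batch validation and activation jobs, not
                     through code like this:

    # Hypothetical outline of the parameter update flow described above.
    # Each function is a placeholder for work actually done via the GDDR
    # ISPF interface and batch validation/activation jobs.

    def update_gddr_parameters(current_backup, changes):
        work_copy = dict(current_backup)      # 1. start from an existing backup
        work_copy.update(changes)             # 2./3. select backup, apply edits
        if not validate(work_copy):           # 4. batch validation job
            raise ValueError("parameter validation failed")
        activate(work_copy)                   # 5. batch activation job
        return work_copy                      # becomes the next backup baseline

    def validate(params):
        # Placeholder check: every parameter must have a value.
        return all(value is not None for value in params.values())

    def activate(params):
        print(f"activating {len(params)} parameters on the Master C-System")

    update_gddr_parameters({"PRIMARY_SITE": "DC1"}, {"PRIMARY_SITE": "DC2"})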


User interface      The EMC GDDR user interface is an ISPF application. It is used only
                    on the EMC GDDR C-Systems.

System components
                    The EMC GDDR SAMPLIB contains an exec called GDDREXC that
                     can be customized to allocate the required EMC GDDR libraries.
                    Once customized, this exec can be integrated into a site's ISPF panels
                    in accordance with site standards. EMC GDDR panels will display
                    only the options the user is authorized to use.
                     The GDDR ISPF user interface does not currently support operation in
                     both screens when running in split-screen mode.







      Primary Options Menu
                                     Invoking EMC GDDR produces the Primary Options Menu, as
                                     shown in Figure 30 on page 100.




                       Figure 30     Primary Options Menu

                                     The Primary Options Menu displays information on the current state
                                     of EMC GDDR in the environment. It shows the name of the
                                     C-System and site, which C-System is currently Master, which site is
                                     the primary processing site, which site is the primary DASD site,
                                     whether EMC GDDR automation is currently enabled, and the name
                                     of the planned or unplanned script, if any, that is currently in process.
                                     From this screen, EMC GDDR automation can be toggled on and off
                                     using the commands GDDR ON and GDDR OFF. This menu also
                                     provides access to the following functions:







◆   Update Personal GDDR ISPF Profile — Allows each user to change
    GDDR ISPF profile variable values associated with their TSO ID.
    This includes items like datasets required to submit scripts, and
    job card information. Each user must enter their profile updates
    before performing any setup and maintenance functions.
◆   GDDR Setup and Maintenance — Provides options to manage
    EMC GDDR internal parameters, system variables and system
    options.
◆   View GDDR configuration — Displays EMC GDDR configuration
    details for sites, features, and C-Systems.
◆   Manage Site Roles — Allows manual changes to be made to site
    roles, including transferring the Master role to a different
    C-System, changing the AutoSwap owner, and moving the MSC
    control function.
◆   Perform pre-script checkup — Displays any active events, shows
    the status of the EMC GDDR internal command queue, and
    performs communication checks to validate that all production
    systems and C-Systems are active and all inter-systems
    communications are enabled.
◆   Run GDDR Scripts — Allows the user to run EMC GDDR
    background scripts.
◆   View GDDR Script Statistics — Displays timing information for
    recently-executed scripts.
◆   Perform GDDR Actions — Provides options to perform actions
    such as HMC LPAR and CBU actions, HMC discovery, and
    managing couple datasets.







      Option P: Profile - Update Personal GDDR ISPF Profile
                                     Choosing this option from the Primary Options Menu displays the
                                     screen shown in Figure 31 on page 102.




                       Figure 31     Change GDDR ISPF Profile Variable Values

                                      The GDDR Subsystem name cannot be changed, but all other values
                                     requested on the screen must be filled in to allow the user to perform
                                     EMC GDDR functions like parameter maintenance and script
                                     execution.







Option M: Maintenance - GDDR Setup and Maintenance
                         Figure 32 on page 103 shows the EMC GDDR Setup and Maintenance
                         Menu.




             Figure 32   Setup and Maintenance

                         The GDDR ON and GDDR OFF commands can be used from this
                         menu to enable and disable EMC GDDR automation. This menu also
                         provides access to the following functions:
                         ◆   Manage GDDR Parameters — Provides access to a series of
                             screens that are used to maintain the EMC GDDR internal
                              parameters described under "Internal parameters" on page 98. These
                             screens will be presented in detail later in this TechBook.
                         ◆   Message, Debug, and Trace options — Provides the ability to set
                             default debugging levels and override those values for specific
                             programs.







                                     ◆   Manage GDDR Internal Command Queue — Provides
                                         information on the command queue used by EMC GDDR to store
                                         SRDF commands that have not yet been executed. This screen
                                         can be used to alter EMC GDDR processing. EMC strongly
                                         recommends that no action be taken on this screen unless
                                         instructed to do so by EMC GDDR Solution Support.
                                     ◆   Perform HMC Discovery — Discovers the HMC objects at each
                                         site. This function can be used to validate the HMC information
                                         entered during EMC GDDR configuration as well as the EMC
                                         GDDR HMC interface installation and setup.
                                     ◆   Refresh GDDR Message Table — Refreshes the GDDRMSG table
                                          managed by GDDRMAIN.
                                     ◆   Manage GDDR System Variables — Provides the ability to view
                                         and modify all system variables used by EMC GDDR to describe
                                          and manage the environment. Any updates made through this panel
                                         remain local to the current C-system and are not subject to any
                                         validation.
                                     In general, use of this screen should be avoided without guidance
                                     from EMC GDDR Solution Support.







Option G: GDDR Config - View GDDR Configuration
                         Selecting this option displays a screen similar to Figure 33 on
                         page 105 with the information relevant to the EMC GDDR
                         configuration in use. No changes can be made on this screen.




             Figure 33   View GDDR Configuration







      Option R: Roles - Manage Site Roles
                                     EMC GDDR script processing manages the location of the Master
                                     C-System, the AutoSwap Owner system, and the MSC Control site as
                                     part of normal operations. The Manage Site Roles panel is provided
                                     to address exception conditions such as preparing for a script restart.
                                     Figure 34 on page 106 shows the Manage Site Roles menu.




                       Figure 34     Manage Site Roles

                                     The following choices are available:
                                     ◆   Transfer Master-C system function — Displays the sites in the
                                         EMC GDDR complex and indicates the current and
                                         recommended Master. The Master C-System function can be
                                         transferred to another site.
                                     ◆   Move AutoSwap owner — Displays the sites involved in an
                                         AutoSwap relationship and indicates the current and
                                         recommended owner. AutoSwap ownership can be transferred to
                                         another site.





                         ◆   Move Primary MSC Server — Displays the sites eligible to
                             assume the MSC control function and indicates the current and
                             recommended owner. MSC control can be transferred to another
                             site.

Option C: Checkup - Perform Pre-Script Checkup
                         When this option is selected, EMC GDDR performs a series of health
                         checks and returns the results. A screen similar to Figure 35 on
                         page 107 is presented.




             Figure 35   Perform Health Check

                         In this example, the following information is provided:
                         ◆   There is an SRA and an MSC event
                         ◆   GDDR is running in Degraded Mode
                         ◆   The GDDMPARM member was found to be consistent






                                      ◆   STAR-HA is not available (either it is not configured or there is no
                                          secondary MSC server)
                                      ◆   The GDDR command queue is empty
                                      ◆   All C-Systems are operating normally from an HMC perspective,
                                          but none of them has GDDRMAIN fully operational (some
                                          components are inactive; a D (detail) line command is available to
                                          drill down on this situation)
                                      ◆   There was an HMC setup issue preventing system-status
                                          determination for the two managed systems, and GDDRMAIN is
                                          not active on the managed systems

      Option S: Scripts - Run GDDR Scripts
                                     This option can only be chosen on the Master C-System. It displays
                                     the Select Script to Run screen, which is a dynamic display. It lists
                                     EMC GDDR scripts that are eligible to run under the current
                                     configuration, and also specifies configuration information and
                                     scripts that are in progress, if any. Depending on the location of the
                                     primary site and primary DASD locations, the panel is customized to
                                     display valid choices. Scripts that cannot be used at the current site
                                     do not appear. The Configuration field indicates the features of the
                                     configuration that EMC GDDR takes into account when determining
                                     which scripts are eligible to run and hence which are present in the
                                     list of scripts displayed on the panel. Figure 36 on page 109 shows a
                                     sample of the Scripts to Run screen.








Figure 36   Select Script to Run
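Conceptually, the panel applies a filter along the lines of the following
Python sketch. The eligibility rules shown here are an invented example (the
script names are taken from Table 4, but the features each script is assumed
to need are illustrative only):

    # Invented example of configuration-based script filtering.
    # Each candidate script declares the features and site role it needs.

    scripts = [
        {"name": "GDDRPA42", "needs": {"autoswap"}, "runs_at": "primary"},
        {"name": "GDDRPA27", "needs": {"star"},     "runs_at": "any"},
        {"name": "GDD2P03A", "needs": set(),        "runs_at": "any"},
    ]

    def eligible_scripts(configuration_features, current_site_role):
        return [
            s["name"]
            for s in scripts
            if s["needs"] <= configuration_features
            and s["runs_at"] in ("any", current_site_role)
        ]

    print(eligible_scripts({"star"}, "secondary"))   # -> ['GDDRPA27', 'GDD2P03A']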







      Option T: Timing - View GDDR Script Statistics
                                     Choosing this option displays a list of scripts that have run on this
                                     C-System, as shown in Figure 37 on page 110.




                       Figure 37     View GDDR Script Statistics

                                     Selecting a script displays step start and end times. If a script were
                                     currently running, it would be displayed on the "Last planned:" or
                                     "Last unplanned:" line.







Option A: Actions - Perform GDDR Actions
                          Figure 38 on page 111 is the sub-menu displayed when Option A is
                          chosen.




              Figure 38   Actions Menu

                          From this menu, the following actions are available:
                          ◆   Perform HMC Discovery — Discovers the HMC objects at each
                              site. This function can be used to validate the HMC information
                              entered during EMC GDDR configuration as well as the EMC
                              GDDR HMC interface installation and setup. This function is also
                              available from the GDDR Setup and Maintenance menu.
                          ◆   Perform HMC LPAR actions — Provides the ability to carry out
                              actions against the EMC GDDR defined sites and systems
                              including activating or deactivating an LPAR, performing a load







                                         clear, reset clear, stop or start for an LPAR, and querying an LPAR
                                         for its status. This option can only be used on the Master
                                         C-System.
                                     ◆   Perform HMC CBU actions — Presents a list of sites with
                                         processors enabled for CBU (Capacity Backup Upgrade)
                                         activation, if any. Action codes can be used to activate backup
                                         capacity (REAL mode), simulate activation of backup capacity
                                         (TEST mode), terminate actual backup capacity activation, or
                                         terminate simulated backup capacity activation. These actions
                                         can only be performed on the Master C-System.
                                     ◆   Manage Couple Datasets — Checks the couple dataset
                                         configurations on all systems. Currently used couple datasets are
                                         compared against those defined on the "Define Managed Couple
                                         Datasets" panel during EMC GDDR parameter definition. If
                                         discrepancies are found, SETXCF commands are issued to bring
                                         couple dataset usage in line with EMC GDDR parameter
                                         specifications. This function is provided to adjust couple datasets
                                         when it is not appropriate to run a site swap script, which will
                                         also align couple dataset usage with the defined parameters.
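                                      The comparison performed by the Manage Couple Datasets option can
                                      be pictured with the sketch below. It is illustrative only: the dataset
                                      names are invented, and the SETXCF command text is built as a plain
                                      string purely to show the shape of the adjustment:

    # Illustrative comparison of in-use couple datasets against the couple
    # datasets defined to GDDR, producing SETXCF-style adjustments.
    # Dataset names are invented for the example.

    def couple_dataset_actions(in_use, defined):
        """in_use/defined: dict mapping couple dataset type -> dataset name."""
        actions = []
        for cds_type, wanted in defined.items():
            if in_use.get(cds_type) != wanted:
                actions.append(f"SETXCF COUPLE,TYPE={cds_type},PCOUPLE={wanted}")
        return actions

    in_use  = {"SYSPLEX": "SYS1.XCF.CDS01", "CFRM": "SYS1.CFRM.CDS01"}
    defined = {"SYSPLEX": "SYS1.XCF.CDS01", "CFRM": "SYS1.CFRM.CDS02"}

    for command in couple_dataset_actions(in_use, defined):
        print(command)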


      Events and rules               An EMC GDDR event is a state change in a component part of the
                                     environment that EMC GDDR is actively monitoring. Examples of
                                     EMC GDDR events include:
                                     ◆   CGT — ConGroup trip has occurred/state change
                                     ◆   CGD — ConGroup group is disabled/state change
                                     ◆   SRA — SRDF/A link is down
                                     ◆   MHB — Missing C-System heartbeat
                                     ◆   LNK — Link to a C-System down
                                     ◆   RDF — RDF link down
                                     ◆   LDR — Local Disaster
                                     ◆   RDR — Regional Disaster
                                     Events are stored in EMC GDDR global variables. An event can have
                                     a state of either TRUE or FALSE. If the event has a state of TRUE, it
                                     has occurred or is currently occurring. If the event has a state of
                                     FALSE, it is no longer occurring. An event that is TRUE is considered
                                     an exception. EMC GDDR events are used by the GDDR event
                                     monitor and GDDR processes to determine environment state. A
                                     change in state can then:





                          ◆   Trigger unplanned/takeover processes
                          ◆   Prevent a planned process from running
                           EMC GDDR is supplied with message interception rules. These
                          message rules have two primary functions:
                          ◆   To detect events that EMC GDDR is interested in and set the
                              appropriate EMC GDDR event TRUE or FALSE.
                          ◆   To detect events that EMC GDDR processes have to wait for
                              (WTOR), and reply as to the success or failure of the waited for
                              event. This will determine if an EMC GDDR process proceeds or
                              terminates.
                          EMC GDDR uses intersystems communication to route message
                          traffic between production systems and C-systems. The arrival of a
                          message at the target production system can be used to trigger an
                          automation rule (for example in IBM Tivoli NetView or BMC
                          Control-M). Such rules can be used to start or shut down workloads
                          on the appropriate systems.
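                           A minimal sketch of this event-state model is shown below. The event
                           codes are those listed above; the mapping from message identifiers to
                           events is an invented example, not the actual GDDR message rule set:

    # Event states are modeled as booleans, as described in the text:
    # TRUE means the event has occurred or is occurring (an exception).

    events = {"CGT": False, "SRA": False, "MHB": False, "RDF": False}

    # Invented message-rule table: message id -> (event code, new state).
    message_rules = {
        "CONGROUP_TRIPPED":  ("CGT", True),
        "CONGROUP_RESUMED":  ("CGT", False),
        "SRDFA_LINK_DOWN":   ("SRA", True),
        "SRDFA_LINK_UP":     ("SRA", False),
    }

    def apply_message(message_id):
        rule = message_rules.get(message_id)
        if rule:
            event, state = rule
            events[event] = state

    def exceptions():
        """Events that are TRUE are treated as exceptions."""
        return [code for code, state in events.items() if state]

    apply_message("SRDFA_LINK_DOWN")
    print(exceptions())    # -> ['SRA']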


Procedures                There are several JCL procedures used in an EMC GDDR
                          environment, though most are not part of EMC GDDR itself. They
                          include the tasks for EMC foundation products such as ResourcePak
                          Base and SRDF Host Component. The names presented here are the
                          vendor defaults and may not be the names used in any given
                          environment.
                           If a customer environment does not support the BCPii HMC
                           interface, CA-OPS is required. Refer to the GDDR product guides for
                          more information.

EMC foundation products
                          Depending on the EMC GDDR configuration in use, there will be two
                          or three EMC foundation product tasks running. All environments
                          will run EMCSCF and EMCRDF, and environments including
                          SRDF/S will also run EMCCGRP:
                          ◆   EMCSCF — ResourcePak Base or Symmetrix Control Facility
                          ◆   EMCRDF — SRDF Host Component
                          ◆   EMCCGRP — Consistency Group
                          For more information on the EMC foundation products refer to
                          Chapter 2 of this book or the EMC product documentation.







      EMC GDDR procedures
                                     These are the JCL procedures needed in an EMC GDDR environment:
                                     ◆   GDDRMAIN — Main EMC GDDR address space.
                                     ◆   GDDREVM — EMC GDDR Event Monitor. The event monitor is
                                         used to analyze event state changes in which EMC GDDR is
                                         interested. On detecting the occurrence of selected events, the
                                         event monitor determines what action to take and prompts
                                         operators with the appropriate choices via WTOR.
                                     ◆   GDDRHBM — EMC GDDR C-System heartbeat monitor. The
                                         heartbeat monitor aids the event monitor in determining the
                                         status of the EMC GDDR managed environment. The lack of a
                                          heartbeat from a particular C-System is used to determine the state of
                                          that C-System and its site (a conceptual sketch of this check follows
                                          this list).
                                     ◆   GDDRWORK — GDDR Workers. Some pieces of work required
                                         for proper GDDR functioning will run under separate
                                         GDDRWORK started tasks. This includes all GDDR HMC
                                          functions, GDDR command queue processing, and parts of GDDR
                                          scripts shipped to a remote C-System.
                                     ◆   GDDRPROC — Used to run scripts. This is the only EMC GDDR
                                         procedure not used for a started task that is up and running at all
                                         times.
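                                      The heartbeat check mentioned under GDDRHBM can be pictured as
                                      follows. This is a conceptual sketch only; the timeout value and the use
                                      of simple timestamps are assumptions, not GDDR internals:

    # Conceptual sketch of heartbeat-based status detection, assuming each
    # C-System records a timestamp (epoch seconds) for its last heartbeat.
    import time

    HEARTBEAT_TIMEOUT = 120   # seconds; an assumed threshold, not a GDDR value

    def missing_heartbeats(last_heartbeat, now=None):
        """Return the C-Systems whose heartbeat is older than the timeout."""
        now = now if now is not None else time.time()
        return [
            system
            for system, stamp in last_heartbeat.items()
            if now - stamp > HEARTBEAT_TIMEOUT
        ]

    now = time.time()
    last_heartbeat = {"C1": now - 30, "C2": now - 600, "C3": now - 15}
    print(missing_heartbeats(last_heartbeat, now))   # -> ['C2']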

      Started task locations         The started tasks that need to run on any given system in the EMC
                                     GDDR complex depend on whether it is a managed production or
                                     C-System, which site it is, and which configuration is in use.
                                      GDDRMAIN runs on all C-Systems in the GDDR complex. In
                                      three-site configurations, it can optionally be omitted at DC3 if the
                                      customer chooses to run DC3 as a "lights-out" data center.
                                     GDDRMAIN also runs on all fully managed production systems.
                                     GDDREVM, GDDRHBM and GDDRWORK run on all C-Systems at
                                     all sites in the EMC GDDR complex. They are started automatically
                                     by GDDRMAIN. They are not required on the productions systems.
                                     EMCSCF runs on all C-Systems and all IPLed production systems at
                                     all sites.
                                     EMCRDF runs on all C-systems.




114     GDDR Solution Design and Implementation Techniques
EMC Geographically Dispersed Disaster Restart (EMC GDDR)



EMCCGRP runs on C-Systems and one or more production systems
at the primary DASD site when SRDF/S is in use. It runs on all
systems both at DC1 and DC2 in AutoSwap configurations.
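
        The placement rules above can be condensed into a single sketch. The
        function and its flags below are illustrative stand-ins, not GDDR
        parameters, and they assume that the Event Monitor, Heartbeat Monitor
        and Workers are present only where GDDRMAIN itself runs:

            def gddr_started_tasks(is_c_system, site, lights_out_dc3=False,
                                   uses_srdf_s=False, uses_autoswap=False,
                                   primary_dasd_site="DC1"):
                """Started tasks expected on one system, per the placement rules above."""
                tasks = set()
                if is_c_system:
                    if site == "DC3" and lights_out_dc3:
                        return tasks                                  # "lights-out" DC3 C-System
                    tasks.add("GDDRMAIN")
                    tasks.update({"GDDREVM", "GDDRHBM", "GDDRWORK"})  # started by GDDRMAIN
                    tasks.update({"EMCSCF", "EMCRDF"})
                    if uses_srdf_s:
                        tasks.add("EMCCGRP")
                else:                                                 # fully managed, IPLed production system
                    tasks.update({"GDDRMAIN", "EMCSCF"})
                    if uses_srdf_s and (site == primary_dasd_site or
                                        (uses_autoswap and site in ("DC1", "DC2"))):
                        tasks.add("EMCCGRP")    # needed on one or more such production systems
                return tasks

            print(sorted(gddr_started_tasks(is_c_system=True, site="DC1", uses_srdf_s=True)))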








      EMC GDDR supported scripts
                                        Table 4 lists the script names, a short description of each script's
                                        function, and the event type that causes the script to be generated.

                        Table 4         Script names and descriptions

       Script name        Description                                                  Event Type

       GDDRPA51           Reconfigure to Concurrent                                    RECONFIGURE
       GDDRPA52           Reconfigure to Cascaded                                      RECONFIGURE
       GDDRPA53           Reconfigure to Concurrent SRDF with Site Move                RECONFIGURE

       GDDRPA42           Swap production from <P> to <S> (Protected)                  PLANNED
       GDDRPA45           Swap production from <P> to <S> (FAST)                       PLANNED
       GDDRPA21           Swap DASD from <P> to <S> (Protected)                        PLANNED
       GDDRPA25           Swap DASD from <P> to <S> (FAST)                             PLANNED
       GDD2P17A           Abandon Site <P> (site swap)                                 PLANNED
       GDD2P18A           Restart production at <S> after site swap                    PLANNED
       GDD2P01A           Perform test IPL from BCVs at <tgt site>                     PLANNED
       GDDRPA27           Perform test IPL from R2s at DC3                             PLANNED
       GDD2P03A           Perform test IPL from R2s at <S>                             PLANNED

       GDDRUP31           Recover after unplanned swap                                 UNPLANNED
       GDD2U09A           Recover after unplanned swap or LDR                          UNPLANNED
       GDD2U13A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDD2U10A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDD2U12A           Recover after loss of <P> (LDR)                              UNPLANNED
       GDDRUP41           Recover after loss of <S>                                    UNPLANNED
       GDDRPA0A(1)        Resume replication after unplanned swap                      UNPLANNED
       GDDRPA0A(2)        Resume replication after loss of <P>                         UNPLANNED
       GDDRPA0A(3)        Resume replication after loss of <S>                         UNPLANNED
       GDD2PA0A(4)        Resume replication after unplanned swap                      UNPLANNED
       GDD2PA0A(5)        Resume replication after loss of <P>                         UNPLANNED

       GDDRPA23           Resume ConGroup                                              RESUME
       GDD2P02A           Resume after test IPL from BCVs at <tgt site>                RESUME
       GDDRPA28           Resume SRDF/A after test IPL at DC3                          RESUME
       GDDRPM29           Resume SRDF/A in MSC mode                                    RESUME
       GDDRPF29           Resume SRDF/A in STAR mode                                   RESUME
       GDD2P14A           Resume replication after link failure                        RESUME
       GDD2P16A           Resume after test IPL from R2s at <S>                        RESUME

       GDDRPAAB           Abandon Sites DC1 and DC2                                    DC3 ONLY
       GDDRPA05           Recover at DC3 after RDR at DC1 and DC2                      DC3 ONLY
       GDDRPA06           Restart production LPARs at DC3 SRDFA to <tgt site>          DC3 ONLY
       GDDRPA07           Recover at DC3 after LDR at DC1 SRDFA to DC2                 DC3 ONLY

                               Where:
                               ◆   PLANNED — These scripts result from operator interaction with
                                   GDDR panels
                               ◆   UNPLANNED — These scripts result from unplanned events
                                   detected by the GDDR Event Monitor (EVM)
                               ◆   RESUME — These scripts resume replication after some type of
                                   service interruption (planned or unplanned)
                               ◆   DC3 ONLY — These scripts execute only at DC3, for planned test
                                   events or for unplanned disaster event management
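
        For planning documents or operator aids, the information in Table 4 lends
        itself to a simple lookup structure. The sketch below shows only a few
        representative entries, copied from Table 4; the structure itself is
        illustrative and not part of GDDR:

            # Script name -> (description, event type), a small subset of Table 4.
            SCRIPT_CATALOG = {
                "GDDRPA51": ("Reconfigure to Concurrent",                   "RECONFIGURE"),
                "GDDRPA42": ("Swap production from <P> to <S> (Protected)", "PLANNED"),
                "GDD2U13A": ("Recover after loss of <P> (LDR)",             "UNPLANNED"),
                "GDDRPA23": ("Resume ConGroup",                             "RESUME"),
                "GDDRPAAB": ("Abandon Sites DC1 and DC2",                   "DC3 ONLY"),
            }

            def scripts_for_event(event_type):
                """Return the script names associated with a given event type."""
                return [name for name, (_desc, etype) in SCRIPT_CATALOG.items()
                        if etype == event_type]

            print(scripts_for_event("PLANNED"))   # ['GDDRPA42']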







                                                             Table 5 shows scripts mapped to the configuration type where the
                                                             script executes.

                                 Table 5                      Script mapping

                                 (Table 5 maps each script name listed in Table 4 to the configurations
                                 in which it executes: Concurrent Star with AutoSwap, Concurrent Star,
                                 Concurrent Star 2 Site, Cascaded Star with AutoSwap, Cascaded Star,
                                 Cascaded Star with EDP, SRDF/A with MSC, SRDF/S with AutoSwap,
                                 and SRDF/S.)


GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)
Planned script operations
                                            ◆   Abandon Site DC1 (site swap) — Shut down the single-site
                                                workload at the primary site in preparation for the restart of
                                                processing at the secondary site (the combined abandon and
                                                restart flow is sketched after this list):
                                                  • Stop the business workload at the primary DASD site


                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC2 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC2 — IPL contingency systems at
                                         site DC2 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC2 — IPL contingency systems at
                                         site DC2 using R2 devices:
                                         • Confirm that SRDF/S has been stopped normally via a
                                           ConGroup trip
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable
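
        The first two planned operations above form one end-to-end flow: abandon
        the workload at DC1, then restart it at DC2. The sketch below simply
        strings the documented steps together; it is illustrative only and the
        names used are not GDDR artifacts:

            # Planned DC1 -> DC2 site swap for GDDR with SRDF/S and ConGroup,
            # using the step wording from the two bullets above.
            ABANDON_DC1 = [
                "Stop the business workload at the primary DASD site",
                "Wait for the stop of all business applications",
                "Reset clear all production systems managed by EMC GDDR",
            ]

            RESTART_AT_DC2 = [
                "Attempt reset clear of all systems at the primary DASD site",
                "Activate CBU (if required)",
                "Activate all needed LPARs, including CFs at the secondary DASD site",
                "Create a consistency point at the secondary DASD site",
                "Prepare the SRDF environment",
                "IPL all needed production systems",
            ]

            def planned_site_swap():
                """Ordered steps: abandon at DC1, then restart at DC2."""
                return ABANDON_DC1 + RESTART_AT_DC2

            for step in planned_site_swap():
                print(step)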

      Unplanned script operations
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a
                                          ConGroup trip and the GDDR Heartbeat Monitor has timed out.
                                          These events occurring concurrently are viewed by EMC GDDR
                                          as a local disaster (a sketch of this classification follows this
                                          list):
                                         • Confirm that a ConGroup trip occurred
                                         • Confirm that SRDF links failed
                                         • Confirm that a local disaster (LDR) event occurred
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs and condition R2s at secondary site for restart
                                         • Activate contingency systems

                                • Restart applications
                            ◆   Resume replication after loss of DC1 — Resume the SRDF/S link
                                to the secondary site after a primary site disaster:
                                • Confirm SRDF/S links are down
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and SRDF/S restart commands
                                • Reestablish BCVs at the secondary site
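
        The distinction the Event Monitor draws between a local disaster and a
        simple replication interruption (noted in the 'Recover after loss of DC1
        (LDR)' item above) can be pictured as follows. This is a conceptual
        sketch only; the real EVM logic and its timers are internal to GDDR and
        the function below is invented for the example:

            def classify_primary_site_event(congroup_trip, srdf_links_down,
                                            primary_heartbeat_timed_out):
                """Rough classification of a primary-site event, per the text above."""
                if congroup_trip and srdf_links_down and primary_heartbeat_timed_out:
                    return "LDR"            # 'Recover after loss of DC1 (LDR)'
                if congroup_trip or srdf_links_down:
                    return "LINK_FAILURE"   # later: 'Resume replication after link failure'
                return "NORMAL"

            print(classify_primary_site_event(True, True, True))    # LDR
            print(classify_primary_site_event(True, False, False))  # LINK_FAILURE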

Resumption operations
                            ◆   Resume after test IPL from BCVs at DC2 — Reset clear
                                contingency systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume after test IPL from R2s at DC2 — Reset clear contingency
                                systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Restart SRDF/S to DC2
                            ◆   Resume replication after link failure — Restore the SRDF/S link
                                after a loss of the link:
                                • Confirm SRDF/S links are down
                                • Stop ConGroup on all systems
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and restart commands
                                • Reestablish BCVs at the secondary site


GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2)
Planned script operations
                            ◆   Swap production from DC1 to DC2 — Swap the DASD and the
                                single-site workload from site DC1 to site DC2:
                                • Stop the business workload at the primary DASD site
                                • Swap the DASD to the secondary DASD site (AutoSwap
                                  followed by SRDF/S personality swap)


                                         • Resume SRDF/S
                                         • Restart the business workload
                                     ◆   Swap DASD from DC1 to DC2 — Swap the DASD only from site
                                         DC1 to site DC2.
                                     ◆   Perform test IPL from BCVs at DC2 — IPL contingency systems at
                                         site DC2 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC2 — IPL contingency systems at
                                         site DC2 using R2 devices:
                                          • Confirm that SRDF/S has been stopped normally via a
                                            ConGroup trip
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                     ◆   Recover after unplanned swap — Restart processing at the
                                         secondary site:
                                         • Stop business workload on the old primary site (if applicable)
                                         • Split BCVs at the new primary site
                                         • Move couple datasets to the new primary site (if applicable)
                                         • Trigger restart of business workload at the new primary site
                                         • Transfer Master C-System to the new secondary site
                                     ◆   Recover after loss of DC1 (LDR) — Restart the single site
                                         workload after the GDDR Event Monitor has detected a
                                         ConGroup trip and the GDDR Heartbeat Monitor has timed out.
                                         These events occurring concurrently are viewed by EMC GDDR
                                         as a local disaster:
                                         • Confirm that a local disaster (LDR) event occurred
                                         • Condition R2s at the secondary site for restart
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs at the secondary site
                                         • Activate contingency systems
                                          • Manage Coupling Facility structures and couple datasets


                                • Restart applications
                            ◆   Resume replication after unplanned swap — Resume the SRDF/S
                                link to the secondary site after a primary site disaster:
                                • Confirm SRDF/S links are down
                                • Split BCVs at the secondary site, if applicable
                                • Perform ConGroup cleanup, transfer AutoSwap ownership
                                  and restart SRDF/S
                                 • Manage Coupling Facility structures and couple datasets
                                • Reestablish BCVs at the secondary site

Resumption operations
                            ◆   Resume after test IPL from BCVs at DC2 — Reset clear
                                contingency systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume after test IPL from R2s at DC2 — Reset clear contingency
                                systems after a test at DC2:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Restart SRDF/S with AutoSwap to DC2
                            ◆   Resume replication after link failure — Restore the SRDF/S link
                                after a loss of the link:
                                • Confirm SRDF/S links are down
                                • Stop ConGroup on all systems
                                • Split BCVs at the secondary site, if applicable
                                • Issue ConGroup cleanup and restart commands
                                • Reestablish BCVs at the secondary site


GDDR for SRDF/A (Two sites - DC1 and DC3)
Planned script operations
                            ◆   Abandon Site DC1 (site swap) — Shut down the single site
                                workload at the primary site in preparation for the restart of
                                processing at the secondary site:


                                         • Stop the business workload at the primary DASD site
                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC3 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate and load test LPARs using BCV volumes

      Unplanned script operations
                                     ◆   Recover after loss of DC1 (RDR) — Restart the single site
                                         workload after the GDDR Event Monitor has detected an
                                         SRDF/A link down event and the GDDR Heartbeat Monitor has
                                         timed out. These events occurring concurrently are viewed by
                                         EMC GDDR as a regional disaster:
                                         • Confirm that an MSC drop occurred
                                         • Confirm that SRDF links failed
                                         • Confirm that a regional disaster (RDR) event occurred
                                         • Shut down applications at the primary site, if applicable
                                         • Split BCVs and condition R2s at secondary site for restart
                                         • Activate contingency systems
                                         • Restart applications
                                     ◆   Resume replication after loss of DC1 — Resume the SRDF/A link
                                         to the secondary site after a primary site disaster:
                                         • Confirm SRDF/A links are down
                                         • Split BCVs at the secondary site, if applicable


                                • Issue MSC cleanup and SRDF/A restart commands
                                • Reestablish BCVs at the secondary site

Resumption operations
                             ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                 contingency systems after a test at DC3:
                                • Stop test business workload, if applicable
                                • Reset clear test LPARs
                                • Reestablish the BCVs
                            ◆   Resume SRDF/A after link loss — Restore the SRDF/A link after
                                a loss of the link (planned or unplanned).


GDDR for SRDF/Star (Three sites)
Reconfiguration operations
                            ◆   Reconfigure to concurrent SRDF — Reconfigure a Cascaded
                                SRDF/Star environment to a Concurrent SRDF/Star
                                environment. The result is that site DC1 is protected at site DC2
                                using SRDF/S, and DC1 is the source of the SRDF/A replication
                                to DC3. The workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC2 and DC3
                                • Perform a differential CREATEPAIR from DC1 to DC3
                                • Reestablish the SRDF/Star environment (DC1 to DC3)
                            ◆   Reconfigure to cascaded SRDF — Reconfigure a Concurrent
                                SRDF/Star environment to a Cascaded SRDF/Star environment.
                                The result is that site DC1 is protected at site DC2 using SRDF/S,
                                and DC2 is the source of the SRDF/A replication to DC3. The
                                workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC1 and DC3
                                • Perform a differential CREATEPAIR from DC2 to DC3
                                • Reestablish the SRDF/Star environment (DC2 to DC3)
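
        Both reconfiguration operations follow the same pattern and differ only
        in which site feeds the SRDF/A leg to DC3. The sketch below restates the
        ordered steps from the list above; the function itself is illustrative
        and not a GDDR interface:

            def reconfiguration_steps(target):
                """Ordered steps to reach a concurrent or cascaded SRDF/Star topology."""
                if target not in ("concurrent", "cascaded"):
                    raise ValueError("target must be 'concurrent' or 'cascaded'")
                # concurrent: DC1 becomes the SRDF/A source; cascaded: DC2 becomes the source
                old_source, new_source = ("DC2", "DC1") if target == "concurrent" else ("DC1", "DC2")
                return [
                    "Stop SRDF/A and delete the SRDF/A relationship between "
                    f"{old_source} and DC3",
                    f"Perform a differential CREATEPAIR from {new_source} to DC3",
                    f"Reestablish the SRDF/Star environment ({new_source} to DC3)",
                ]

            for step in reconfiguration_steps("cascaded"):
                print(step)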

Planned script operations
                            ◆   Abandon Site DC1 (site swap) — Shut down the single site
                                workload at the primary site in preparation for the restart of
                                processing at the secondary site:


                                         • Stop the business workload at the primary DASD site
                                         • Wait for the stop of all business applications
                                         • Reset clear all production systems managed by EMC GDDR
                                     ◆   Restart production at DC2 after site swap — Restart the single site
                                         workload after the 'Abandon Site DC1 (site swap)' script has
                                         completed successfully:
                                         • Attempt reset clear of all systems at the primary DASD site
                                         • Activate CBU (if required)
                                         • Activate all needed LPARs, including CFs at the secondary
                                           DASD site
                                         • Create a consistency point at the secondary DASD site
                                         • Prepare the SRDF environment
                                         • IPL all needed production systems
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC3 — IPL contingency systems at
                                         site DC3 using R2 devices:
                                         • Confirm that SRDF/A has been stopped normally via an
                                           SRDF/A PENDDROP
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a local
                                          disaster (LDR) at the primary site. The following events,
                                          occurring concurrently, are considered a local disaster: loss of
                                          SRDF/A, loss of SRDF/S or a ConGroup trip, and a timeout of
                                          the primary site GDDR Heartbeat Monitor:

                                      Note: In the cascaded SRDF with ConGroup topology, the SRDF/A event is
                                      not considered for LDR.

                                         • Confirm that a local disaster (LDR) event occurred


    • Deactivate systems at old primary site
    • Prepare the SRDF environment at secondary site
    • Create a consistent point on BCVs at DC3
    • Perform a differential CREATEPAIR between the new primary
      site and DC3
    • Start SRDF/A in MSC mode
    • Reestablish the SRDF/Star environment
    • Reestablish BCVs at DC3
    • Activate production systems
◆   Recover after loss of DC2 — In environments which support
    Cascaded SRDF mode, restore data protection for the primary
    site should the secondary site suffer an outage resulting in loss of
    SRDF/S and SRDF/A. This script is submitted by the GDDR
    Event Monitor when a ConGroup trip is detected and the GDDR
    Heartbeat Monitor at the secondary site has timed out. These
    events occurring concurrently are viewed by EMC GDDR as a
    secondary site disaster. Following the restoration of the
    secondary site, the script 'Resume replication after loss of DC2' is
    used to resume SRDF/S:
    • Provide the option to stop primary site and secondary site
      workloads since all remote protection is lost
    • Split BCVs at DC3
    • Remove the SRDF pairing between DC3 and the secondary
      DASD site
    • Perform a differential re-synchronization between the primary
      site and DC3
    • Start SRDF/A in concurrent MSC mode
    • Provide the option to start primary site workloads
    • Reestablish BCVs at DC3
◆   Resume replication after loss of DC1 — Resume the SRDF/S link
    to the secondary site after a primary site disaster:
    • Confirm SRDF/S links are down
    • Stop ConGroup on all systems
    • Split BCVs at the secondary site, if applicable
    • Issue ConGroup cleanup and restart commands
    • Reestablish BCVs at the secondary site


                                     ◆   Resume replication after loss of DC2 — In environments which
                                         support Cascaded SRDF mode, resume the SRDF/S link to the
                                         primary site after a secondary site disaster and restart SRDF/A in
                                         SRDF/Star mode. The script appears in the 'Select Script to Run'
                                         panel of the current Master C-System after completion of the
                                         'Recover after loss of DC2' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the primary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the primary site

      Resumption operations
                                     ◆   Resume ConGroup — Resume consistency group protection
                                         following a ConGroup trip:
                                         • Resume the consistency group.
                                         • Reestablish the BCVs at the secondary DASD site, if
                                           applicable.
                                     ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                         contingency systems after a test at DC3:
                                         • Stop test business workload, if applicable
                                         • Reset clear test LPARs
                                         • Reestablish the BCVs
                                     ◆   Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A
                                         link to DC3 (either from DC1 or DC2 depending upon where the
                                         production workload is currently running) after a test on DC3:
                                         • Reset clear all systems IPL’d during the test of DC3 (at DC3)
                                         • Deactivate all LPARs previously activated for the test of DC3
                                           (at DC3)
                                         • Restart SRDF/Star to DC3
                                     ◆   Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A
                                         link to DC3 in MSC mode (from either DC1 or DC2 depending
                                         upon where the production workload is currently running) after
                                         a planned or unplanned swap.
                                     ◆   Resume SRDF/A (SRDF/Star) to DC3 — Restore the SRDF/A
                                         link to DC3 in SRDF/Star mode (from either DC1 or DC2) after a
                                         planned or unplanned stop of SRDF/A.


DC3-specific operations
                          ◆   Abandon Sites DC1 and DC2 — In the event of a perceived threat
                              to both DC1 and DC2, perform an orderly shutdown of DC1 and
                              DC2 and force the C-System at DC3 as the new EMC GDDR
                              Master C-System:
                              • Stop the business applications
                              • Wait for the stop of all business applications
                              • Reset clear all production systems managed by EMC GDDR
                              • Create a consistency point on the DC3 BCVs
                              • Create a consistency point on the secondary DASD BCVs
                              • Stop ConGroup on all remaining systems
                              • Force the C-System at DC3 as the new EMC GDDR Master
                                C-System
                          ◆   Recover at DC3 after RDR at DC1 and DC2 — In the event of a
                              major failure that prevents the production workload from being
                              run from either DC1 or DC2, restart the production LPARs at site
                              DC3 and reestablish SRDF/A to site DC1 or DC2. There must be
                              an SRDF/A connection to either DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2
                              • Activate all needed LPARs
                              • Activate CBU (if required)
                              • Create a consistency point at DC3
                              • Prepare SRDF environment
                              • IPL all needed production systems
                          ◆   Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the
                              event of a major failure that prevents the production workload
                              from being run from either DC1 or DC2, restart the production
                              LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2 (except
                                C-System)
                              • Activate all needed LPARs
                              • Activate CBU (if required)
                              • Create a consistency point at DC3
                              • Prepare SRDF environment
                              • IPL all needed production systems



                                         • Perform a differential CREATEPAIR between DC3 and DC1 or
                                           DC2
                                         • Start SRDF/A in MSC mode
                                      ◆   Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event
                                          of a local disaster (LDR) that prevents the production workload
                                          from being run at DC1, where DC2 is a bunker site (storage only),
                                          restart the production LPARs at site DC3 and reestablish SRDF/A
                                          to DC2. The distinction between this script and the 'Restart
                                          production LPARs at DC3 SRDFA to DC1 or DC2' script is that
                                          the 'Star C-Ahead' indicator is interrogated so that the workload
                                          can restart with the DC3 data, or with the DC2 data if it is more
                                          current:
                                         • Activate all needed LPARs including CFs at DC3
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC2
                                         • Start SRDF/A in MSC mode
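
        The 'Star C-Ahead' check described above amounts to a simple
        data-currency decision. The sketch below assumes the indicator means that
        the DC2 (synchronous) data is ahead of the DC3 data; the function is
        invented for the example and is not a GDDR interface:

            def restart_data_source(star_c_ahead):
                """Restart with the DC2 data when it is more current, otherwise DC3."""
                return "DC2" if star_c_ahead else "DC3"

            print(restart_data_source(star_c_ahead=False))  # DC3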


      GDDR for SRDF/Star with AutoSwap (Three sites)
      Reconfiguration operations
                                     ◆   Reconfigure to concurrent SRDF — Reconfigure a Cascaded
                                         SRDF/Star with AutoSwap environment to a Concurrent
                                         SRDF/Star with AutoSwap environment. The result is that site
                                         DC1 is protected at site DC2 using SRDF/S and AutoSwap, and
                                         DC1 is the source of the SRDF/A replication to DC3. The
                                         workload continues at DC1:
                                         • Stop SRDF/A and delete the SRDF/A relationship between
                                           DC2 and DC3
                                         • Perform a differential CREATEPAIR from DC1 to DC3
                                         • Reestablish the SRDF/Star with AutoSwap environment (DC1
                                           to DC3)
                                     ◆   Reconfigure to concurrent SRDF with site move — Reconfigure a
                                         Cascaded SRDF/Star with AutoSwap environment to
                                         Concurrent SRDF/Star with AutoSwap with a primary
                                         processing site move. The initial state is that the workload at DC1

                                is protected at DC2 with SRDF/S. The result is that DC2 is
                                protected at DC1 using SRDF/S and AutoSwap, and DC2 is the
                                source of the SRDF/A replication to DC3. The workload will be
                                running at DC2:
                                • Stop the business workload at site DC1
                                • Swap the DASD to DC2 (AutoSwap followed by SRDF/S
                                  personality swap)
                                • Reset clear all production systems managed by EMC GDDR
                                • Resume SRDF/S (DC2 to DC1)
                                • Restart the business workload
                                • Reestablish the SRDF/Star with AutoSwap environment (DC2
                                  to DC3)
                            ◆   Reconfigure to cascaded SRDF — Reconfigure a Concurrent
                                SRDF/Star with AutoSwap environment to a Cascaded
                                SRDF/Star with AutoSwap environment. The result is that site
                                DC1 is protected at site DC2 using SRDF/S and AutoSwap, and
                                DC2 is the source of the SRDF/A replication to DC3. The
                                workload continues at DC1:
                                • Stop SRDF/A and delete the SRDF/A relationship between
                                  DC1 and DC3
                                • Perform a differential CREATEPAIR from DC2 to DC3
                                • Reestablish the SRDF/Star with AutoSwap environment (DC2
                                  to DC3)

Planned script operations
                            ◆   Swap production from DC1 to DC2 — Swap the DASD and the
                                single-site workload from the primary DASD site to the
                                secondary DASD site. In environments with geographically
                                dispersed Sysplexes supported by cross-site host-DASD
                                channels, Couple Dataset and Coupling Facility Structure
                                management is performed as a result of a site move, either
                                planned or unplanned. These actions are controlled by the
                                Realign Couple Datasets and Rebuild Couple Facility Structure
                                parameters:
                                • Stop the business workload at the primary DASD site
                                • Swap the DASD to the secondary DASD site (AutoSwap
                                  followed by SRDF/S personality swap)
                                • Resume SRDF/S



                                         • Restart the business workload
                                         • Reestablish the SRDF/Star with AutoSwap environment
                                     ◆   Swap DASD from DC1 to DC2 — Swap only the DASD from the
                                         primary DASD site to the secondary DASD site.
                                     ◆   Perform test IPL from BCVs at DC3 — IPL contingency systems at
                                         site DC3 using BCV devices:
                                         • Split BCVs; make them R/W
                                         • Activate test LPARs using BCV volumes
                                         • Start test business workload, if applicable
                                     ◆   Perform test IPL from R2s at DC3 — IPL contingency systems at
                                         site DC3 using R2 devices:
                                         • Confirm that SRDF/A has been stopped normally via an
                                           SRDF/A PENDDROP
                                         • Activate LPARs using R2 volumes
                                         • Start test business workload, if applicable

      Unplanned script operations
                                      ◆   Recover after unplanned swap — Restart processing at the
                                          secondary site after an unplanned swap has completed
                                          successfully:
                                         • Stop business workload on the old primary site (if applicable)
                                         • Trigger restart of business workload at the new primary site (if
                                           applicable)
                                         • Remove SRDF pairing between DC3 and the primary DASD
                                           site
                                         • Stop ConGroup on all systems
                                         • Perform a differential CREATEPAIR between the new primary
                                           DASD site and DC3
                                         • Start SRDF/A in MSC mode
                                      ◆   Recover after loss of DC1 (LDR) — Restart the single-site
                                          workload after the GDDR Event Monitor has detected a local
                                          disaster (LDR) at the primary site. The following events,
                                          occurring concurrently, are considered a local disaster: loss of
                                          SRDF/A, loss of SRDF/S or a ConGroup trip, and a timeout of
                                          the primary site GDDR Heartbeat Monitor:
                                         • Confirm that a local disaster (LDR) event occurred



    • Deactivate systems at old primary site
    • Prepare the SRDF environment at secondary site
    • Create a consistent point on BCVs at DC3
    • Perform a differential CREATEPAIR between the new primary
      site and DC3
    • Start SRDF/A in MSC mode
    • Reestablish the SRDF/Star environment
    • Reestablish BCVs at DC3
    • Manage Couple Facility structures and datasets
    • Activate production systems
◆   Recover after loss of DC2 — In environments which support
    Cascaded SRDF mode, restore data protection for the primary
    site should the secondary site suffer an outage resulting in loss of
    SRDF/S and SRDF/A. This script is submitted by the GDDR
    Event Monitor when a ConGroup trip is detected and the GDDR
    Heartbeat Monitor at the secondary site has timed out. These
    events occurring concurrently are viewed by EMC GDDR as a
    secondary site disaster. Following the restoration of the
     secondary site, the script 'Resume replication after loss of DC2' is
    used to resume SRDF/S:
    • Provide the option to stop primary site and secondary site
      workloads since all remote protection is lost
    • Split BCVs at DC3
    • Remove the SRDF pairing between DC3 and the secondary
      DASD site
    • Perform a differential re-synchronization between the primary
      site and DC3
    • Start SRDF/A in concurrent MSC mode
    • Provide the option to start primary site workloads
    • Reestablish BCVs at DC3
◆   Resume replication after unplanned swap — Resume the SRDF/S
    link to the secondary DASD site after an unplanned swap (due to
    the loss of the primary DASD). The script appears in the 'Select
    Script to Run' panel of the current Master C-System after
    completion of the 'Recover after unplanned swap' script:
    • Remove the SRDF pairing between the secondary DASD site
      and the primary DASD site


                                         • Clean ConGroup bits from secondary DASD
                                         • Resume SRDF/S (from current primary DASD site to the
                                           previous primary DASD site)
                                         • Put SRDF/A into SRDF/Star with AutoSwap mode
                                     ◆   Resume replication after loss of DC1 — Resume the SRDF/S link
                                         to the secondary site after a primary site disaster and restart
                                         SRDF/A in SRDF/Star with AutoSwap mode. The script appears
                                         in the 'Select Script to Run' panel of the current Master C-System
                                         after completion of the 'Recover after loss of DC1 (LDR)' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the secondary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the secondary site
                                     ◆   Resume replication after loss of DC2 — In environments which
                                         support Cascaded SRDF mode, resume the SRDF/S link to the
                                         primary site after a secondary site disaster and restart SRDF/A in
                                         SRDF/Star with AutoSwap mode. The script appears in the
                                         'Select Script to Run' panel of the current Master C-System after
                                         completion of the 'Recover after loss of DC2' script:
                                         • Confirm SRDF/S links are down
                                         • Stop ConGroup on all systems
                                         • Split BCVs at the primary site, if applicable
                                         • Issue ConGroup cleanup and restart commands
                                         • Reestablish BCVs at the primary site

      Resumption operations
                                     ◆   Resume ConGroup — Resume consistency group protection
                                         following a ConGroup trip:
                                         • Resume the consistency group
                                         • Re-establish the BCVs at the secondary DASD site, if
                                           applicable
                                     ◆   Resume after test IPL from BCVs at DC3 — Reset clear
                                         contingency systems after a test at DC3:
                                         • Stop test business workload, if applicable
                                         • Reset clear test LPARs


                              • Reestablish the BCVs
                          ◆   Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A
                              link to DC3 (either from DC1 or DC2 depending upon where the
                              production workload is currently running) after a test on DC3:
                              • Reset clear all systems IPLed during the test of DC3 (at DC3)
                              • Deactivate all LPARs previously activated for the test of DC3
                                (at DC3)
                              • Restart SRDF/Star with AutoSwap to DC3
                          ◆   Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A
                              link to DC3 in MSC mode (from either DC1 or DC2 depending
                              upon where the production workload is currently running) after
                              a planned or unplanned swap.
                          ◆   Resume SRDF/A (SRDF/Star AutoSwap) to DC3 — Restore the
                              SRDF/A link to DC3 in SRDF/Star with AutoSwap mode (from
                              either DC1 or DC2) after a planned or unplanned stop of
                              SRDF/A.

DC3-specific operations
◆   Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 to become the new EMC GDDR Master C-System:
                              • Stop the business applications
                              • Wait for the stop of all business applications
                              • Reset clear all production systems managed by EMC GDDR
                              • Create a consistency point on the DC3 BCVs
                              • Create a consistency point on the secondary DASD BCVs
                              • Stop SRDF/Star with AutoSwap
                              • Stop ConGroup on all remaining systems
• Force the C-System at DC3 to become the new EMC GDDR Master C-System
                          ◆   Recover at DC3 after RDR at DC1 and DC2 — In the event of a
                              major failure that prevents the production workload from being
                              run from either DC1 or DC2, restart the production LPARs at site
                              DC3 and reestablish SRDF/A to site DC1 or DC2. There must be
                              an SRDF/A connection to either DC1 or DC2:
                              • Attempt reset clear of all systems at DC1/DC2





                                         • Activate all needed LPARs
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                     ◆   Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the
                                         event of a major failure that prevents the production workload
                                         from being run from either DC1 or DC2, restart the production
                                         LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2:
                                         • Attempt reset clear of all systems at DC1/DC2 (except
                                           C-System)
                                         • Activate all needed LPARs
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC1 or
                                           DC2
                                         • Start SRDF/A in MSC mode
                                     ◆   Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event
                                         of a local disaster (LDR) that prevents the production workload
                                         from being run at DC1, and DC2 is a bunker site (storage only),
                                         restart the production LPARs at site DC3 and reestablish SRDF/A
to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' script is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current:
                                         • Activate all needed LPARs including CFs at DC3
                                         • Activate CBU (if required)
                                         • Create a consistency point at DC3
                                         • Prepare SRDF environment
                                         • IPL all needed production systems
                                         • Perform a differential CREATEPAIR between DC3 and DC2
                                         • Start SRDF/A in MSC mode






The remainder of this TechBook describes the lab environment, documents the installation and configuration of GDDR, and walks through running the scripts with more detail on the steps each script executes.




5
                                                                      GDDR Managed
                                                                       Systems Design
                                                                       Considerations




This chapter contains the following:
◆   Overview ...........................................................................................   140
◆   Ignore, Reset Clear, or IPL in place................................................                   142
◆   Contingency systems and workload movement .........................                                    143
◆   HMC-Only systems (new in GDDR 3.1).......................................                              144
◆   LPAR and CEC recovery .................................................................                146
◆   Automated Configuration Check - DASD ...................................                               149








      Overview
One of the many decisions that must be made as part of a GDDR implementation is whether or not GDDR should manage production systems, production LPARs and/or production CPCs. It is important to understand that systems management in GDDR is optional. Some customers choose to have GDDR automation extend into managing zSeries systems, LPARs and workloads; other customers want some systems managed by GDDR and others excluded; and some customers do not want GDDR to do anything to their environment other than manage EMC infrastructure software such as SRDF, TimeFinder and AutoSwap. The GDDR systems management functions are flexible and can accommodate this wide-ranging set of customer operational requirements.
                                   If a decision is made for GDDR to manage system recovery beyond
                                   just the storage layer, a further decision needs to be made whether to
                                   include systems management into the scope of the implementation
                                   project or defer systems management to a later date. Deferring
                                   systems management to a second phase of implementation does
                                   make the implementation project simpler and shorter. When a
                                   decision is made to implement GDDR systems management, the
                                   following features/functions must be considered for each system to
                                   be managed by GDDR:
                                   ◆   Ignore, Reset Clear or IPL in place
                                   ◆   Contingency systems and workload movement
                                   ◆   HMC Only
                                   ◆   LPAR and CPC recovery
                                   Production systems under GDDR control are known as P-Systems or
                                   sometimes “managed systems.” These are the customer systems that
                                   run application workload.
Commencing with GDDR 3.1, EMC began introducing more functionality into CEC, LPAR and system recovery options. Some customers use GDDR to manage their P-Systems and others choose to manage these systems themselves. GDDR has been architected to provide significant flexibility when it comes to P-System management.







For customers that have more complex environments, it helps to build a GDDR managed system recovery map similar to Table 6 on page 141. This documents the actions that GDDR is to take for event management on any of the managed systems.

                Table 6         GDDR managed system recovery map

LPAR NAME          E01           E02           E03           E04           F01           F02           F03

Location           DC1           DC1           DC1           DC1           DC2           DC2           DC2

CPC                SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E      SYSTEM E
                   IBM           IBM           IBM           IBM           IBM           IBM           IBM
                   2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04  2096-S07-R04
                   #012345       #012345       #012345       #012345       #6789A        #6789A        #6789A

Workload type      Production    Production    Development   Test          Production    Production    Development

Sysplex Name       Prod          Prod                                      Prod          Prod

System Name        MVSA          MVSB          MVSC          MVSD          MVSE          MVSF          MVSG

Desired State      UP            UP            UP            UP            UP            UP            UP

Recover            Yes           Yes           Yes           No            Yes           Yes           Yes

Recovery type      Workload      Workload      Recovery      Reset Clear                               IPL in place
                   Move          Move          LPAR

Recovery location  F01           F02           F03

Activate CBU                                                               Y             Y             N








      Ignore, Reset Clear, or IPL in place
When a system is defined to GDDR systems management, a decision must be made regarding the GDDR treatment of a failure of the defined system. The options are (a conceptual sketch of how such a per-system choice might be dispatched follows the list):
◆   Ignore — The system is defined to GDDR, but this instructs GDDR to take no action if GDDR observes that this system has failed. Failure is typically "noticed" by GDDR when the EMC SCF product de-registers the system, typically because of a polling time-out.
◆   Reset Clear — If GDDR detects that the system has crashed (such as SCF de-registration), GDDR will access the HMC and RESET CLEAR the LPAR but take no other action.
◆   IPL in place — If GDDR detects that the system has crashed, GDDR will access the HMC, RESET CLEAR the system and then IPL in place.
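The following REXX fragment is a minimal sketch of how such a per-system failure action might be dispatched. It is not GDDR code: the action table, the system names and the messages are invented for illustration, and a real implementation would drive the HMC through GDDR itself.

/* REXX - illustrative dispatch of the three failure options.       */
/* Not GDDR code: action table, system names and messages invented. */
action.     = 'IGNORE'                 /* default: take no action   */
action.MVSC = 'RESETCLEAR'
action.MVSG = 'IPLINPLACE'

parse upper arg system .               /* system reported as failed */
select
   when action.system = 'IGNORE'     then ,
      say 'No action taken for' system
   when action.system = 'RESETCLEAR' then ,
      say 'Reset Clear the LPAR of' system 'via the HMC'
   when action.system = 'IPLINPLACE' then do
      say 'Reset Clear the LPAR of' system 'via the HMC'
      say 'IPL' system 'in place'
   end
end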








Contingency systems and workload movement
Contingency systems are "hot standby" systems that are IPLed and active and (typically) are located in a different data center. Often, contingency systems are the "remote half" of a geographically separated, single-site workload, parallel Sysplex. In the event of a primary site failure (for example, system MVSA fails in LPAR E01 in Table 6 above), GDDR triggers a workload restart in system MVSE at site DC2 in LPAR F01. GDDR does this by sending message GDDX191I to the MVSE console. A message rule must be written and executed on MVSE to intercept the GDDX191I message and fire a REXX exec that carries out the actions necessary to start the production workload. This message-interception rule would be written for whatever automation package runs on MVSE; an illustrative sketch follows.
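As a minimal sketch, assuming the automation package passes the intercepted message text to a REXX exec, such a rule could look like the following. The exec logic and the started task names are placeholders, not product-supplied names.

/* REXX - illustrative GDDX191I interception exec (hypothetical).   */
/* Assumes the automation product on MVSE invokes this exec with    */
/* the intercepted message text as its argument.                    */
parse arg msgtext
if pos('GDDX191I', msgtext) = 0 then exit 0   /* not our message    */

cmd.1 = 'S PRODDB2'                  /* restart database manager    */
cmd.2 = 'S PRODCICS'                 /* restart transaction regions */
cmd.3 = 'S PRODSCHD'                 /* release the batch scheduler */
cmd.0 = 3

do i = 1 to cmd.0
   /* In a real rule the command would be issued through the        */
   /* automation product's command interface; SAY is used here      */
   /* purely for illustration.                                      */
   say 'Would issue:' cmd.i
end
exit 0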








      HMC-Only systems (new in GDDR 3.1)
For systems defined as "HMC-Only," GDDR can be used in place of native access to the HMC to perform system actions. Interaction with HMC-Only systems is obtained through the "LPAR Actions" panel, shown in Figure 39 on page 144, by choosing option "L".




                      Figure 39    GDDR Actions Menu

After choosing option "L", a screen similar to Figure 40 on page 145 is displayed. Systems defined as HMC-Only will appear. The "T" column shows the GDDR system type; OS in the display indicates that this is an "HMC-Only" system with Standard IPL parms. For a full description of the fields in this display, refer to the GDDR product guides.








Figure 40   GDDR Perform HMC LPAR Actions

All of the functions displayed on this screen are protected by SAF profiles. This offers more control than is available from direct HMC access. Customers choose to add "HMC-Only" systems so that they have tighter control over access to these powerful system commands.








      LPAR and CEC recovery
GDDR can tolerate the loss of a C-System. In the event of a C-System failure, GDDR will report "degraded mode" errors periodically until the C-System rejoins the GDDR complex. In the case of the loss of the GDDR Master C-System, another GDDR C-System will take over the master function. Again, "degraded mode" will be reported until the missing C-System returns. LPAR Recovery can be specified for GDDR C-Systems, which may help minimize the amount of time GDDR operates in "degraded mode."
                                   For production systems, the following options are available:
                                   ◆   Ignore
                                   ◆   IPL in place
                                   ◆   Reset Clear
                                   ◆   Start workload in contingency systems.
The "desired state" indicator can be useful if a system needs to be down for maintenance, ensuring that GDDR does not kick into action. If SCF is stopped normally on that system, GDDR will not react anyway: SCF controls the de-registration process, so GDDR will know that the system has been shut down in a controlled manner.
                                   For managed systems, the location indicator is either set to “H”
                                   indicating “H”ome location for this system or “A” indicating
                                   “A”way, meaning that this system is operating in its recovery
                                   location. It is possible that the Home and Away locations are in the
                                   same data center, but commonly the Away location is a different data
                                   center.
For managed systems, the following should be considered (a short sketch of the Home/Away and SITE rules follows the list):
                                   ◆   Multi-site workloads do not have CONTingency systems.
◆   A system can have LPAR-RECOVERY specified, indicating that the workload can be restarted in a different LPAR from the one where the system is currently running. The recovery LPAR can be in the same or a different data center.
◆   Recovery LPARs are defined to GDDR as REGULAR LPARs, but are defined a second time, on the system Recovery Attributes panel, as the recovery for a protected system.







                   ◆   A protected system can have a CONTingency system as a
                       Recovery LPAR
                   ◆   Sysplex systems with more than one LPAR can have Recovery
                       LPARs defined, which could be in the primary or secondary data
                       center.
◆   Systems protected with LPAR Recovery are either Home or Away, but their SITE value does not change when they are "Away".
                   ◆   Systems NOT protected by LPAR Recovery can (optionally) have
                       a second regular LPAR defined for them, on a different site. These
                       second LPARs could even be the home location of a system
                       protected with LPAR_Recovery.
◆   The difference between a Recovery LPAR definition and a second LPAR definition is subtle:
                       • The second LPAR definition is NOT considered as a recovery
                         option, and will never lead to a system running in that
                         location to be displaced.
                       • The SITE value for a system NOT protected with
                         LPAR_RECOVERY changes depending on whether it is
                         running in its first or second LPAR.
◆   LPAR Recovery for C-Systems avoids degraded mode when a C-System is lost.
◆   GDDR LPAR Recovery supports repeated failures: an LPAR in the Away location is protected with LPAR Recovery back in the Home location.
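As a minimal sketch of the Home/Away and SITE rules above (this is not GDDR code; the argument names are invented for illustration):

/* REXX - illustration of the Home/Away and SITE rules listed above.*/
/* protected   : YES if the system has LPAR_RECOVERY                */
/* runninglpar : LPAR the system is currently running in            */
/* homelpar    : the system's home LPAR                             */
/* homesite/othersite : sites of the home and second/recovery LPAR  */
parse upper arg protected runninglpar homelpar homesite othersite
if protected = 'YES' then
   site = homesite            /* SITE never changes, even when Away */
else
   if runninglpar = homelpar then site = homesite
   else site = othersite      /* SITE follows the second LPAR       */
say 'Reported SITE:' site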


Planned CPC Swap   This is a new planned script, triggered from the "Perform HMC LPAR Actions" panel. That panel also received several other changes in GDDR 3.2:
                   ◆   Action X LOAD Recovery: for a system protected with LPAR
                       recovery only
                   ◆   E show events: queries GDDR global variables for events related
                       to a system
                   ◆   T Toggle Desired state
◆   Site column: now indicates the site of the LPAR where the system currently runs
◆   T Type of IPL parameters: S: STD, A: Alternate, B: BCV





◆   IPL parms shown as "*" (not new): IPL parms are only shown if their definition matches the current primary DASD site








Automated Configuration Check - DASD
Traditionally, GDDR parameter validation has been largely limited to checking syntax and internal consistency, with relatively few verifications against real-world data.
GDDR 3.2 introduced the Automated Configuration Check—DASD function with the GDDRACCD utility.
One of the more challenging parts of defining an environment to GDDR has always been ensuring the correct definition of large and complex SRDF configurations.
GDDRACCD removes the uncertainty about the correctness of an existing GDDR SRDF device definition by checking the real configuration found in the GDDR-managed EMC Symmetrix storage.


GDDR variables influencing the expected configuration
GDDRACCD works by formulating a detailed expectation of the configuration based on a series of GDDR global variables. This expectation is then compared against what is actually found in the array. The check runs in five phases:
Phase 1
Validate the GDDR parameters for internal consistency and exit if errors are found: there is no point querying the array if the parameters are already known to be bad.
Phases 2 and 3
Query the array and report deviations from the expected configuration.
Phase 4
Provide the information that allows the user to verify what GDDRACCD based its decisions on.
Phase 5
Generate RDF.DEVICES parms that allow comparison to the existing parms, although differences are to be expected:
◆   Real parms could be a subset of generated parms
◆   Ranges in real parms could be a split of generated ranges
◆   Gatekeepers will certainly be different
A conceptual sketch of this kind of expected-versus-found comparison follows.
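The following REXX fragment is a minimal conceptual sketch of an expected-versus-found comparison. It is not the GDDRACCD implementation; the device lists are invented, and in practice the expected list would be built from GDDR global variables and the found list from a query of the array.

/* REXX - conceptual expected-versus-found device comparison.       */
/* Not GDDRACCD; device numbers below are invented.                 */
expected = '00E1 00E2 00E3 011A'      /* devices GDDR expects       */
found    = '00E1 00E3 011A 011B'      /* devices the query reports  */

missing = ''
extra   = ''
do i = 1 to words(expected)
   if wordpos(word(expected, i), found) = 0 then ,
      missing = missing word(expected, i)
end
do i = 1 to words(found)
   if wordpos(word(found, i), expected) = 0 then ,
      extra = extra word(found, i)
end
if missing <> '' then say 'Defined to GDDR but not found :' strip(missing)
if extra   <> '' then say 'Found but not defined to GDDR :' strip(extra)
if missing = '' & extra = '' then say 'Definitions match the configuration'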







6
                                                    Storage Infrastructure
                                                   Design Considerations




This chapter contains the following:
◆   Storage configuration and layout ..................................................            152
◆   Infrastructure recommendations ...................................................             153
◆   EMC software levels ........................................................................   158
◆   Sysplex and coupling facility considerations...............................                    160








      Storage configuration and layout
                                       Symmetrix storage systems incorporate many standard features that
                                       ensure a higher level of recoverability and availability than other
                                       storage systems. They are configurable at the logical volume level so
                                       different protection schemes can be applied to different classes of
                                       data within the same Symmetrix storage system on the same physical
                                       device. Customers choose data protection options, such as the
                                       following, to match their data requirements:
                                       ◆       Mirroring (RAID 1) or RAID 10
                                       ◆       RAID 6 (6+2) and RAID 6 (14+2)
                                       ◆       RAID 5 (3+1) and RAID 5 (7+1)
                                       ◆       Symmetrix Remote Data Facility (SRDF)
                                       ◆       TimeFinder
                                       ◆       Dynamic Sparing
                                       ◆       Global Sparing








Infrastructure recommendations
There is benefit in standardizing configuration best practices across the installed base of GDDR worldwide. This makes the support function more efficient, in that support personnel will be able to provide guidance aided by common naming conventions and infrastructure layout. The recommendations in this section are also designed to minimize the overheads associated with large infrastructure deployments. A good example is the number of device ranges: having very small device ranges increases the overheads associated with infrastructure manipulations, consuming more C-System memory, more worker tasks, more commands issued to the arrays, and so on. Engaging the GDDR Solution Support team very early in your design phase will ensure an appropriate trade-off between configuration flexibility and associated overheads.


Gatekeeper device list
A gatekeeper device is a reserved device used to direct commands to a particular array. During certain activities, such as data center swaps, the volume of commands issued to these devices can be quite high. GDDR does not require gatekeeper devices for itself, but some of the underlying technologies do. For performance and to avoid potential conflicts, gatekeeper devices for SRDF/S, SRDF/A, MSC/Star and so on must be defined and must be separate devices.
It is very important to understand the layout of gatekeeper devices and which component uses which gatekeepers. It is recommended to create and maintain charts similar to these for each array in the solution. This example shows the SRDF gatekeeper devices and RDF groups for an SRDF/Star with AutoSwap configuration with one VMAX frame located at each site. The list of SRDF gatekeeper devices and RDF groups used at each location describes the intended purpose for use in commands. The required Host Component (H.C.) command prefix is also provided to help facilitate command generation for each specific RDF group. This additional information can be helpful in a configuration where multiple SRDF Host Component tasks are present.







DC1 Complex:

        GDDRPARM                    Gatekeeper = 803D-8075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2455:

        SRDF/S - STAR:              Group = 10

        R/W - AD:                   Group = 21

        SRDF/A - Concurrent STAR:   Group = 20

        MSC/STAR                    Gatekeeper = 8037 (00DB)

        SRDFA/MSC                   Gatekeeper = 8076 (011A)

        SCF                         Gatekeeper = 8038-803A (00DC-00DE)

        CSC                         Gatekeeper = 803B-803C (00DF-00E0)


DC2 Complex:

        GDDRPARM                    Gatekeeper = 903D-9075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2457:

        SRDF/S - STAR:              Group = 10

        STAR - Recovery:            Group = 30

        MSC/STAR                    Gatekeeper = 9037 (00DB)

        SRDFA/MSC                   Gatekeeper = 9076 (011A)

        SCF                         Gatekeeper = 9038-903A (00DC-00DE)

        CSC                         Gatekeeper = 903B-903C (00DF-00E0)







DC3 Complex:

        GDDRPARM                    Gatekeeper = 803D-8075 (00E1-0119)

        H.C. Prefix                 ##


Customer-Specific-Name VMAX# 2465:

        SRDF/A - Concurrent STAR:   Group = 20

        R/W - AD:                   Group = 21

        STAR - Recovery:            Group = 30

        SRDFA/MSC                   Gatekeeper = 8076 (011A)

        SCF                         Gatekeeper = 8038-803A (00DC-00DE)

        CSC                         Gatekeeper = 803B-803C (00DF-00E0)



Started task names   EMC recommends that the following conventions be adopted for started task names.
                      STCs in DC1
                      GDDR managed Production LPARs:
                      ◆   EMCBASE
                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                      ◆   EMCCGRP
                          • SYSISVP.GDDR.PARMLIB(CONGRP34)
                      ◆   EMCSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                      ◆   GDDRMAIN
                          • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                      GDDR Control LPAR Customer-Specific-Name:
                      ◆   GDDRSCF
                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                      ◆   GDDRSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)





                                       ◆       GDDRCGRP
                                               • SYSISVP.GDDR.PARMLIB(CONGRP34)
                                       ◆       GDDRMAIN
                                               • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                                       ◆       GDDREVM
                                               • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                                       ◆       GDDRHBM
                                               • SYS1.GDDR.PARMLIB(GDDRHBMP)
                                       ◆       GDDRWORK
                                       ◆       GDDRPARMS
                                               • SYSISVP.GDDR.PARMLIB
                                       STCs in DC2
                                       GDDR Control LPAR Customer-Specific-Name:
                                       ◆       GDDRSCF
                                               • SYSISVP.GDDR.PARMLIB(SCFINI00)
                                       ◆       GDDRSRDF
                                               • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                                       ◆       GDDRCGRP
                                               • SYSISVP.GDDR.PARMLIB(CONGRP34)
                                       ◆       GDDRMAIN
                                               • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                                       ◆       GDDREVM
                                               • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                                       ◆       GDDRHBM
                                               • SYSISVP.GDDR.PARMLIB(GDDRHBMP)
                                       ◆       GDDRWORK
                                       ◆       GDDRPARMS
                                               • SYSISVP.GDDR.PARMLIB
                                       STCs in DC3
                                       GDDR Control LPAR Customer-Specific-Name:
                                       ◆       GDDRSCF





                          • SYSISVP.GDDR.PARMLIB(SCFINI00)
                     ◆    GDDRSRDF
                          • SYSISVP.GDDR.PARMLIB(EMCCMD00)
                     ◆    GDDRMAIN
                          • SYSISVP.GDDR.PARMLIB(GDDMPARM)
                     ◆    GDDREVM
                          • SYSISVP.GDDR.PARMLIB(GDDREVMP)
                     ◆    GDDRWORK
                     ◆    GDDRHBM
                          • SYSISVP.GDDR.PARMLIB(GDDRHBMP)


Important dataset names
The following is a list of the dataset names which are used by the various LPARs for GDDR control. These are the names as supplied by EMC; regardless of the naming convention used, it is important to ensure that all support staff are familiar with it:
                     ◆    SYSISVP.GDDR.*
                     ◆    SYS1.GDDR.*
                     ◆    SYS1.EMC.*








      EMC software levels
                                        Maintain a list of software levels, as shown in Table 7.

                         Table 7        Software levels

                                         Customer-Specific-Name SOFTWARE LEVELS

                         Updated                                                   8/17/2010

                         Mainframe Enablers                   Version              Version - Fix Level

                         Resource Pack Base                   7.x                  7.x - SF7xxxx

                         Consistency Group                    7.x                  7.x - SC7xxxx

                         SRDF Host Component                  7.x                  7.x - SR7xxxx

                         TimeFinder Mirror                    7.x                  7.x - ST7xxxx

                         TimeFinder Snap                      7.x                  7.x - SN7xxxx

                         TimeFinder Utility                   7.x                  7.x - SU7xxxx

                         GDDR                                 3.x                  3.x - GD3xxxx

                         Microcode

                         VMAX base code                       5874.230.183

                         VMAX E-Packs                                              RB/23 with 50853, 50854

                         DMX-3/DMX-4 base code                5773.155.107

                         E-Packs                                                   RB/53 with 49123, 49125



      LPAR configuration specifics
These are guidelines/best practices for GDDR C-System LPAR configurations.
Each GDDR LPAR requires:
                                        ◆      2 GB Storage
                                        ◆      2 Shared CPs
                                        ◆      Minimum 25 Mod-9 DASD Units (or equivalent if different
                                               volume sizes are used)







GDDR LPAR Configuration Restrictions and Considerations:
◆   GDDR Control LPAR DASD volumes are not shared with any
    LPAR
◆   No Coupling Facility or Specialty Engine Requirements.
◆   No DR Requirements for GDDR Control LPARs.
◆   Initial GDDR LPAR to be modeled from CMCX LPAR
◆   Maintain recommended EMC Symmetrix Microcode Levels
◆   Initial Security Databases created from customer-environment
    LPAR database
◆   LPAR Access Restricted to Infrastructure Support Staff
GDDR - Required Automation and Software Elements:
◆   z/OS 1.x Operating System
◆   CA/ACF2 Security
◆   EMC SRDF Product Suite
Example customer Infrastructure Management Components:
◆   IOC Console Operations Support / Automation (Tivoli Netview,
    SA for z/OS)
◆   Storage (SMS, DFHSM DASD Space Mgmt, Tape Mgmt)
◆   Network (FTP, Firewall, TN3270, NDM)
◆   Systems Mgmt (Incident, Change (ECMS))
◆   Change Mgmt (Version and Change Control)
◆   SMF and Logrec Data Collection (Security, Billing, Performance)
◆   Output Archival / Retrieval / Spool Mgmt (product)
◆   Infrastructure Job Scheduling (Data Collection, Infrastructure Backups)
Cross System Communications (CSC) Requirements:
Update the SCFINIxx member in SYSISVP.GDDR.PARMLIB with the following parameters on all LPARs (PROD and GDDR C-Systems):
◆   SCF.CSC.IDLEPOLL=5
◆   SCF.CSC.EXPIRECYCLE=20
◆   SCF.CSC.MITPERIOD=60








      Sysplex and coupling facility considerations
                                       GDDR operations, system management and enablement of data
                                       consistency are compliant with z/OS Sysplex availability, design and
                                       operations. GDDR extends Sysplex availability by addressing a wide
                                       range of planned and unplanned operations for Sysplex systems,
                                       DASD resources, Sysplex resources, Central Processors, processor
                                       capacities and sites.
Sysplex systems require a single time source and synchronous mirroring between the source DASD and recovery DASD. Asynchronous mirroring does not support Sysplex operations; therefore, Sysplex linkage between asynchronous operational sites is not a consideration for sites using SRDF/A replication, either as a two-site GDDR SRDF/A or to the third site in STAR configurations.
Sysplex configuration is a key component of the business continuity capability of a proposed solution. The design of the Sysplex, its required connectivity and the topology of shared Sysplex resources all have a role in the solution's redundancy, performance and business continuity. Typically, solutions designed for higher availability and continuous availability have increasingly complex Sysplex considerations. In contrast, GDDR CONGROUP solutions designed for site switch and system restart require very modest Sysplex considerations between the sites.
GDDR's C-Systems do not have any Sysplex operational requirement. The C-Systems are stand-alone systems outside any of the managed Sysplexes. This isolates the C-Systems from Sysplex-wide issues, simplifies C-System implementation, and allows the C-Systems to manage multiple Sysplexes simultaneously. It also allows asynchronous mirroring solutions, and some synchronous mirroring solutions, to operate without any cross-site Sysplex connectivity and without degradation of Recovery Time Objectives.
When data is shared within a Sysplex, data consistency is presumed, and GDDR's architecture should align with Sysplex requirements for data consistency. In most instances, a Sysplex's mirrored DASD resources required for the restart of systems and workload should belong to a single CONGROUP. There is no limitation on the number of Sysplexes encompassed by a single CONGROUP and managed by GDDR.







Significant variations in the required Sysplex connectivity depend on the Sysplex topology, the models of the zSeries hardware, the Sysplex timing mechanisms and the desired level of business continuity. Careful planning of the cross-site linkages, supported distances, sufficient redundancy and considerations for performance impact must be part of the early design and project planning.
Typically, when architecting a multi-site Sysplex, DWDMs are employed to coalesce, aggregate and manage the Sysplex connectivity as well as the cross-site DASD and mirroring connectivity. Solutions including AutoSwap are easily facilitated by cross-site DASD connectivity over DWDMs. Furthermore, many of the zSeries linkage protocols cannot employ channel extension, requiring dark fiber or DWDM connectivity.

When the solution includes AutoSwap, regardless of the Sysplex topology, all DASD that is required to stay up:
◆   Must be mirrored in SRDF/S
◆   Must be available to the systems at DC1 and DC2
◆   Must have all R1 devices on one site and all R2 devices on the opposite site
◆   If it has a UCB on the R1 side, must also have a UCB on the R2 side
When employing Sysplex LOGR, the Sysplex LOGGER data sets should be mirrored with SRDF/S and SRDF/A in SRDF/Star configurations.
Single Site Sysplex
All active production systems reside at the primary site; the systems comprising the Sysplex cannot span DC1 and DC2. This is usually representative of GDDR CONGROUP or GDDR SRDF/A solutions where a complete Sysplex restart is required to restart systems and workload at the recovery site.
No Sysplex connectivity is required between sites, Sysplex time is not an issue, and duplexing of structures between sites is not indicated; structures would be rebuilt when the systems are IPLed at the recovery site. Restart of systems can be facilitated by the occasional mirroring of system resources, including the Sysplex couple data sets. The Sysplex couple data sets should not be mirrored using SRDF/S and should not be in the CONGROUP (consider mirroring with







Adaptive Copy). Alternatively, with minimal cross-site DASD connectivity, the Sysplex couple data sets can be mirrored with Sysplex couple data set duplexing.
In the event of a CONGROUP trip, GDDR does not attempt to stop all systems in order to preserve the consistency of the target SRDF/S DASD and the contents of the target duplexed Coupling Facility structures; this technique has only limited value during the restart of systems.
                                       Multi-Site Sysplex
                                       This architecture is consistent with solutions requiring high or
                                       continuous levels of availability, when some subset of operations is
                                       expected to continue in the event of systems, DASD, site failures or
                                       multiple failures.
The zSeries systems hosting z/OS systems in the multi-site Sysplex at both sites must share a single time consistency in a Coordinated Timing Network (CTN). The CTN topology, redundancy and protocols depend upon the timer technology employed. When employing STP, placement of the PTS and BTS is an important consideration.
Base Sysplex requires cross-site CTC connectivity between all zSeries CPCs, a single time consistency and common Sysplex couple data sets.
Parallel Sysplex employs Coupling Facilities to facilitate system-to-system communications. CFs can be Internal or External Coupling Facilities, and all systems require connectivity to the CFs; these Sysplex links should be redundant. CFs should be deployed in a manner consistent with the business continuity objectives. If structures are duplexed, the duplexing links should also be redundant.
                                       AutoSwap is usually included in the implementation to eliminate
                                       DASD as a single point of failure. If so, all DASD required for
                                       operations must be mirrored. The Sysplex couple data sets must not
                                       be mirrored with SRDF/S and must rely upon Sysplex couple data
                                       set duplexing. Cross site DASD connectivity consistent with cross
                                       site operations is typically a consideration.




7
                                                                     Best Practices for
                                                                      Network Design




This chapter contains the following:
◆   Overview ...........................................................................................   164
◆   Understanding bandwidth requirements.....................................                              165
◆   Synchronous network design considerations ..............................                               168
◆   Asynchronous network design considerations ...........................                                 172








      Overview
After the synchronous and/or asynchronous bandwidth assessment has been conducted, attention turns to the design and construction of the network. There are many different types of networks, but for this discussion we differentiate between networks designed to carry synchronous workloads and those designed to carry asynchronous workloads.
Delays in the provisioning, debugging and load testing of network infrastructure are one of the most common causes of delay for GDDR projects. Remember the constraint framework for the project: if your network provisioning is late, be prepared to adjust the implementation timelines around new and realistic dates.








Understanding bandwidth requirements
Before a network topology can be designed, the data change rate must be measured. This change rate will be used to determine both synchronous and asynchronous network bandwidth requirements. EMC TS has toolsets and methods that are used to measure existing I/O flows to construct data replication requirements over time.
Replication requirements not only vary throughout the 24-hour day (batch versus online) but also throughout the month and year, with most enterprises having at least one peak processing period. Therefore the data sampling for modeling must span a critical time period or at least be "factored" from the observation period based on the known difference between the observation period and the peak period.
Both SRDF/S and SRDF/A use block-level I/O for replication, therefore the collected data can be used for both modeling exercises. The actual bandwidth requirement will typically vary between the two types of networks (even for combined synchronous and asynchronous). For HA networks, page volumes and temporary data sets are included in the replication set; for asynchronous networks these data types are typically excluded since, in the event that the DR site copy is required, the contents of the page volumes and temporary data set volumes are not needed for system restart.

Note: Figure 41 on page 166 shows a measured workload with a significant write I/O imbalance across the three Symmetrix arrays. Part of the deployment of the project will be to move workloads between the arrays to achieve a more balanced I/O profile.








                         Figure 41   Write distribution

The raw (uncompressed) bandwidth requirement is determined by multiplying the number of writes observed in a specific interval by the size of the writes. Earlier design tools used the Connect Time to determine the transmission interval and then calculated the block size. This method does not work for FICON-attached arrays, so an average block size is determined by dividing the total bytes transferred in the time interval by the number of writes observed. The method also varies based on the data source, such as SMF records versus internal EMC Symmetrix STP data.
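As a minimal sketch of this arithmetic (the figures below are invented; in practice they would come from SMF or Symmetrix STP data for the peak interval):

/* REXX - raw bandwidth estimate for a single observation interval. */
/* Input figures are invented for illustration only.                */
numeric digits 12
writes  = 54000            /* write I/Os observed in the interval   */
bytes   = 1512000000       /* total bytes written in the interval   */
seconds = 900              /* interval length (15 minutes)          */

avgblk = bytes / writes                   /* average block size     */
mbsec  = (bytes / seconds) / (1024*1024)  /* raw MB/s requirement   */
say 'Average block size:' format(avgblk,,0) 'bytes'
say 'Raw bandwidth     :' format(mbsec,,1) 'MB/s (uncompressed)'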
For synchronous workloads, performance is critical. Therefore this analysis includes a check for "hot volumes"; if any are detected, a list is produced of the volumes where the overhead of synchronous replication is likely to be problematic. Further analysis of these volumes is required to determine the contents of each volume and identify mitigation techniques, which may include:
                                     ◆   Re-blocking data sets





            ◆   Separating hot data sets
            ◆   Deploying SMS data striping
            ◆   Changing ACS routines
            ◆   Balancing workload across Symmetrix arrays
At the conclusion of the study period, a report is produced which includes a profile of the observed bandwidth requirement, as shown in Figure 42 on page 167.

Note: Many customers are accustomed to working with their local account teams to size infrastructure deployments. In most cases, however, an account team will not have experience designing a solution of this complexity. The GDDR Solution Support Team can help direct EMC Corporate resources to ensure that this vital piece of supporting infrastructure meets best practice standards.




Figure 42   Bandwidth requirement




      Synchronous network design considerations
All array-based synchronous data replication adds overhead to write response times. For this reason it is essential to understand the impact of synchronous data replication before building out any infrastructure. EMC has sophisticated modeling tools that apply the overheads of SRDF/S synchronous mirroring at zero distance and then add latency into the model as the distance between the primary (R1) and secondary (R2) storage is increased. At some point in these modeling studies the maximum separation between the R1 and R2 becomes known: the point at which the increased I/O response time begins to slow write throughput so that the observed write activity to the volume can no longer be sustained. Post implementation, SRDF/S overheads can be seen as “disconnect” time when looking at I/O response times.
Figure 43 on page 169 shows one part of a typical synchronous design study output: DASD volumes sorted in descending order of write activity. Looking more closely at volume CI9003, it can be seen to have a peak write activity of 300 writes per second. To sustain 300 serialized writes per second, this volume can average no more than 1 second / 300 writes = 3.3 milliseconds per write. A separate analysis (not shown) reveals the average response time to this volume to be 0.5 milliseconds. Subtracting 0.5 from 3.3 shows that the overhead of synchronous replication cannot add more than 3.3 - 0.5 = 2.8 milliseconds to this I/O before the application would experience performance degradation (because the throughput requirement of 300 writes per second cannot be maintained if the average response time exceeds 3.3 milliseconds).




Figure 43   DASD write activity by volser

Applying this logic to all volumes in the study group produces a chart similar to Figure 44 on page 170. The chart shows the existing throughput requirement (in blue) and the likely throughput requirement (in red) when synchronous overheads at the customer-specific distance between the R1 and R2 devices are applied to the existing write I/O response time.
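
The per-volume arithmetic above can be generalized as a simple screening calculation. The sketch below is illustrative only and assumes serialized writes to each volume; the volume names other than CI9003 and the 1.5 ms modeled overhead are hypothetical.

    def sync_headroom_ms(peak_writes_per_sec, avg_response_ms):
        """Maximum synchronous overhead (ms) a volume can absorb before its observed
        peak write rate can no longer be sustained.

        A volume driving N serialized writes per second cannot average more than
        1000/N ms per write, so headroom is that ceiling minus the response time
        already observed. A small or negative result flags the volume for tuning.
        """
        return (1000.0 / peak_writes_per_sec) - avg_response_ms

    def flag_volumes_for_tuning(volumes, modeled_sync_overhead_ms):
        """Return volsers whose headroom is smaller than the modeled SRDF/S overhead."""
        return [volser for volser, (writes, resp_ms) in volumes.items()
                if sync_headroom_ms(writes, resp_ms) < modeled_sync_overhead_ms]

    # CI9003 uses the figures from the text; the other volumes are hypothetical
    volumes = {
        "CI9003": (300, 0.5),   # 300 writes/s peak, 0.5 ms average response time
        "DB2L01": (650, 0.7),
        "BATCH9": (120, 1.2),
    }
    print(round(sync_headroom_ms(300, 0.5), 2))      # ~2.83 ms of headroom
    print(flag_volumes_for_tuning(volumes, 1.5))     # volumes at risk with 1.5 ms of modeled overhead
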




                         Figure 44   DASD Volumes Flagged for Tuning

                                     Tuning for synchronous replication can be done well in advance of
                                     the implementation of GDDR. The following techniques are typically
                                     used to address performance issues:
                                     ◆   Use sequential data striping
◆   Change the blocksize of the data set writes. Many of the overheads associated with synchronous replication, such as the speed-of-light delay, are fixed per write whether the block being transmitted is small or large. Larger block writes therefore move more data for each fixed overhead paid, and are more efficient in terms of write throughput under synchronous replication than smaller block sizes (see the sketch after this list).
                                     ◆   Separate “hot” data sets — Two or more write I/O intensive data
                                         sets can be moved from the same volser and separated.
                                     ◆   Change ACS routines — May be needed to ensure that separated
                                         data sets remain separated.
Most customers deploy a “dark fiber” network to support synchronous replication. The Network Terminating Equipment (NTE) for these networks is usually a Dense Wave Division Multiplexer (DWDM). DWDMs are able to concentrate multiple wavelengths onto the single fiber(s) laid between data centers. DWDMs themselves add very little latency, so the greatest write I/O performance impact is typically the speed-of-light transmission of data through the fiber, which is directly proportional to the length of the fiber route.
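
A back-of-the-envelope estimate of that distance effect can be made from the speed of light in fiber (roughly 200 km per millisecond). The sketch below assumes one protocol round trip per write; the actual number of exchanges depends on the replication protocol and link configuration, so treat the output as a lower bound on the added latency.

    FIBER_KM_PER_MS = 200.0   # speed of light in glass, roughly c / 1.47

    def propagation_overhead_ms(fiber_route_km, round_trips_per_write=1):
        """Estimated speed-of-light latency added to each synchronous write, in ms.

        Uses the fiber route length (not the straight-line distance) and assumes one
        round trip per write; DWDM equipment latency is ignored as negligible.
        """
        one_way_ms = fiber_route_km / FIBER_KM_PER_MS
        return 2 * one_way_ms * round_trips_per_write

    for km in (10, 50, 100):
        print(f"{km:>3} km fiber route: ~{propagation_overhead_ms(km):.2f} ms per write")
    # 10 km ~0.10 ms, 50 km ~0.50 ms, 100 km ~1.00 ms
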


The DWDM approach offers another advantage for customers looking to build HA infrastructures: in addition to the Fibre Channel packets generated by SRDF/S replication, the same fiber can carry the FICON signals necessary to support AutoSwap.
In some instances customers will build HA DASD infrastructures
where the R1 and R2 devices are in the same data center (perhaps
separated by a thermal firewall), in which case direct connect or
connect via cascading directors is common.




      Asynchronous network design considerations
Asynchronous replication is provided by SRDF/A. The interface to SRDF/A is EMC Multi-Session Consistency (MSC) software. MSC provides an API that GDDR uses to control SRDF/A both when it is running in “MSC mode” and when it is running in “Star mode”. The GDDR for SRDF/A configuration (two sites) controls SRDF/A replication through the MSC API, and the same API is used by GDDR to control GDDR for SRDF/Star solutions (three sites). For more specific information on these control interfaces, refer to the sections “SRDF/Star with AutoSwap” on page 85, “GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)” on page 119, and “GDDR for SRDF/A (Two sites - DC1 and DC3)” on page 123.
SRDF/A is designed as a cache-to-cache replication infrastructure. It typically supports cycle times down to three seconds, assuming the network infrastructure is designed to carry the workload. Since it is theoretically possible to lose up to two SRDF/A cycles in a DR event, the RPO is calculated by multiplying the SRDF/A cycle time by two; a 3-second cycle time is therefore said to support an RPO of 6 seconds (a small worked example follows the list below). It is important to understand that the RPO is an objective: at times it may not be possible for the replication infrastructure to maintain the objective time. Examples include program loops that generate large amounts of write I/O, unusually high write activity such as during database “reorgs,” and a compromised network (for example, a failed switch). EMC has instituted a number of measures to improve the resiliency of SRDF/A replication:
                                    ◆   Transmit Idle — Temporarily suspend SRDF/A data
                                        transmission to allow SRDF/A replication to survive temporary
                                        link problems such as link “bounces”.
                                    ◆   Delta Set Extension (DSE) — Elongates the SRDF/A capture
                                        cycle during longer duration temporary link problems, instead of
                                        terminating SRDF/A replication. DSE uses a disk pool to buffer
                                        the current capture cycle data until such time as it can be
                                        de-staged across the link.
                                    ◆   Write Pacing — Slowing the arrival rate of write activity into the
                                        capture cycle by accepting the write to cache, disconnecting for a
                                        specified period of time (milliseconds) and then reconnecting to
                                        complete the I/O.
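
The following sketch captures the RPO arithmetic described before the list, together with a simple sustainability check. It is a deliberately simplified steady-state view (no DSE, write pacing, or cycle elongation modeled) and the link and compression figures are assumptions.

    def estimated_rpo_secs(cycle_time_secs):
        """Up to two SRDF/A cycles can be lost in a disaster, so RPO ~= 2 x cycle time."""
        return 2 * cycle_time_secs

    def rpo_is_sustainable(peak_write_mbps, link_mbps, compression_ratio=1.0):
        """An RPO target only holds while the (compressed) peak write rate fits on the
        link; beyond that point cycles elongate and the resiliency features above
        merely buy time rather than preserve the objective."""
        return (peak_write_mbps / compression_ratio) <= link_mbps

    print(estimated_rpo_secs(3))                                                             # 6-second RPO
    print(rpo_is_sustainable(peak_write_mbps=900, link_mbps=1000, compression_ratio=1.5))    # True
    print(rpo_is_sustainable(peak_write_mbps=1800, link_mbps=1000, compression_ratio=1.5))   # False
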




None of these improvements should be assumed to be substitutes for provisioning the network to the observed peak bandwidth requirement. To build one or two years of operational resiliency into the network, it should be provisioned at the peak observed rate plus the annual rate of increase in observed throughput. This can be calculated from the known growth in storage; assuming a growth rate of 20% over the past 12 months, the allowance for capacity increase can be calculated as:
Bandwidth_growth = (Total_Write_IOs * I/O_Size * 20%) / Compression_Ratio
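
A minimal sketch of that provisioning calculation follows. It adds the growth allowance from the formula above to the observed (compressed) peak; the 20% growth rate, the compression ratio, and the sample workload are assumptions to be replaced with measured values.

    def provisioned_bandwidth_mbps(total_write_ios_per_sec, avg_io_size_bytes,
                                   annual_growth_rate=0.20, compression_ratio=2.0):
        """Bandwidth to provision, in Mb/s: observed compressed peak plus a growth allowance.

        Growth allowance = (Total_Write_IOs * I/O size * growth rate) / compression ratio,
        per the formula above, added on top of the observed peak.
        """
        peak_mbps = (total_write_ios_per_sec * avg_io_size_bytes * 8) / (compression_ratio * 1_000_000)
        return peak_mbps * (1 + annual_growth_rate)

    # Example: 25,000 write I/Os per second at an average of 4 KB per write
    print(f"{provisioned_bandwidth_mbps(25_000, 4_096):.0f} Mb/s")   # ~492 Mb/s
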
Many of the GDDR-supported configurations are very large. The designer must consider the scalability of the proposed network and also the ease of troubleshooting it. For mainframe workloads, EMC has observed FCIP networks to be reliable, scalable, and relatively easy to troubleshoot and tune. Consider operational complexity when looking to tune or balance network workloads. Gig/E networks are becoming more popular but introduce more complexity for workload balancing, and, if used, WAN optimizers can complicate the network architecture and impact the throughput, latency, and scalability of the network.
EMC has observed the following type of network topology to be very successful in large-scale SRDF/A implementations and therefore recommends it for SRDF/A replication. Figure 45 on page 174 shows 15 storage arrays in three sites (DC1, DC2 and DC3) deploying multiple switches to aggregate bandwidth up to 10 Gb/s for long-distance transmission and then splitting back through switches on the target side to the Symmetrix arrays at the remote location. Note how readily this architecture allows communications paths, storage devices, and switches to be scaled.




                         Figure 45   Recommended asynchronous network topology




Glossary




                   This glossary contains terms related to disk storage subsystems.
                   Many of these terms are used in this manual.

                   A
 alternate track   A track designated to contain data in place of a defective primary
                   track. See also ”primary track.”

       actuator    A set of access arms and their attached read/write heads, which
                   move as an independent component within a head and disk assembly
                   (HDA).

        adapter    Card that provides the physical interface between the director and
                   disk devices (SCSI adapter), director and parallel channels (Bus & Tag
                   adapter), director and serial channels (Serial adapter).

                   C
         cache     Random access electronic storage used to retain frequently used data
                   for faster access by the channel.

     cache slot    Unit of cache equivalent to one track.

channel director   The component in the Symmetrix subsystem that interfaces between
                   the host channels and data storage. It transfers data between the
                   channel and cache.




                  controller ID    Controller identification number of the director the disks are
                                   channeled to for EREP usage. There is only one controller ID for
                                   Symmetrix.

                          CKD      Count Key Data, a data recording format employing self-defining
                                   record formats in which each record is represented by a count area
                                   that identifies the record and specifies its format, an optional key area
                                   that may be used to identify the data area contents, and a data area
                                   that contains the user data for the record. CKD can also refer to a set
                                   of channel commands that are accepted by a device that employs the
                                   CKD recording format.

                                   D
                         DASD      Direct access storage device, a device that provides nonvolatile
                                   storage of computer data and random access to that data.

              data availability    Access to any and all user data by the application.

            delayed fast write     There is no room in cache for the data presented by the write
                                   operation.

                      destage      The asynchronous write of new or updated data from cache to disk
                                   device.

                        device     A uniquely addressable part of the Symmetrix subsystem that
                                   consists of a set of access arms, the associated disk surfaces, and the
                                   electronic circuitry required to locate, read, and write data. See also
                                   ”volume.”

              device address       The hexadecimal value that uniquely defines a physical I/O device
                                   on a channel path in an MVS environment. See also ”unit address.”

               device number       The value that logically identifies a disk device in a string.

                   diagnostics     System level tests or firmware designed to inspect, detect, and correct
                                   failing components. These tests are comprehensive and self-invoking.

                       director    The component in the Symmetrix subsystem that allows Symmetrix
                                   to transfer data between the host channels and disk devices. See also
                                   ”channel director.”

                  disk director    The component in the Symmetrix subsystem that interfaces between
                                   cache and the disk devices.


   dual-initiator   A Symmetrix feature that automatically creates a backup data path to
                    the disk devices serviced directly by a disk director, if that disk
                    director or the disk management hardware for those devices fails.

dynamic sparing     A Symmetrix feature that automatically transfers data from a failing
                    disk device to an available spare disk device without affecting data
                    availability. This feature supports all non-mirrored devices in the
                    Symmetrix subsystem.

                    E
         ESCON      Enterprise Systems Connection, a set of IBM and vendor products
                    that connect mainframe computers with each other and with attached
                    storage, locally attached workstations, and other devices using
                    optical fiber technology and dynamically modifiable switches called
                    ESCON Directors. See also ”ESCON director.”

 ESCON director     Device that provides a dynamic switching function and extended link
                    path lengths (with XDF capability) when attaching an ESCON
                    channel to a Symmetrix serial channel interface.

                    F
       fast write   In Symmetrix, a write operation at cache speed that does not require
                    immediate transfer of data to disk. The data is written directly to
                    cache and is available for later destaging.

            FBA     Fixed Block Architecture, disk device data storage format using
                    fixed-size data blocks.

             FRU    Field Replaceable Unit, a component that is replaced or added by
                    service personnel as a single entity.

          frame     Data packet format in an ESCON environment. See also ”ESCON.”

                    G
    gatekeeper      A small logical volume on a Symmetrix storage subsystem used to
                    pass commands from a host to the Symmetrix storage subsystem.
                    Gatekeeper devices are configured on standard Symmetrix disks.

              GB     Gigabyte, 10⁹ bytes.




                                   H
                head and disk      A field replaceable unit in the Symmetrix subsystem containing the
                    assembly       disk and actuator.

                home address       The first field on a CKD track that identifies the track and defines its
                                   operational status. The home address is written after the index point
                                   on each track.

                hyper-volume       The ability to define more than one logical volume on a single
                   extension       physical disk device making use of its full formatted capacity. These
                                   logical volumes are user-selectable in size. The minimum volume size
                                   is one cylinder and the maximum size depends on the disk device
                                   capacity and the emulation mode selected.

                                   I
                             ID    Identifier, a sequence of bits or characters that identifies a program,
                                   device, controller, or system.

                           IML     Initial microcode program loading.

                 index marker      Indicates the physical beginning and end of a track.

                   index point     The reference point on a disk surface that determines the start of a
                                   track.

                       INLINES     An EMC-provided host-based Cache Reporter utility for viewing
                                   short and long term cache statistics at the system console.

                   I/O device      An addressable input/output unit, such as a disk device.

                                   K
                              K    Kilobyte, 1024 bytes.

                                   L
           least recently used     The algorithm used to identify and make available the cache space by
               algorithm (LRU)     removing the least recently used data.

               logical volume      A user-defined storage device. In the Model 5200, the user can define
                                   a physical disk device as one or two logical volumes.



            long miss   Requested data is not in cache and is not in the process of being
                        fetched.

longitudinal redundancy    Exclusive OR (XOR) of the accumulated bytes in the data record.
            code (LRC)
                        M
                   MB    Megabyte, 10⁶ bytes.

        mirrored pair   A logical volume with all data recorded twice, once on each of two
                        different physical devices.

            mirroring   The Symmetrix maintains two identical copies of a designated
                        volume on separate disks. Each volume automatically updates
                        during a write operation. If one disk device fails, Symmetrix
                        automatically uses the other disk device.

                        P
          physical ID   Physical identification number of the Symmetrix director for EREP
                        usage. This value automatically increments by one for each director
                        installed in Symmetrix. This number must be unique in the
                        mainframe system. It should be an even number. This number is
                        referred to as the SCU_ID.

        primary track   The original track on which data is stored. See also ”alternate track.”

            promotion    The process of moving data from a track on the disk device to a
                         cache slot.

                        R
             read hit   Data requested by the read operation is in cache.

            read miss   Data requested by the read operation is not in cache.

          record zero   The first record after the home address.

                        S
           scrubbing    The process of reading, checking the error correction bits, and writing
                        corrected data back to the source.




                 SCSI adapter      Card in the Symmetrix subsystem that provides the physical interface
                                   between the disk director and the disk devices.

                     short miss    Requested data is not in cache, but is in the process of being fetched.

                           SSID    For 3990 storage control emulations, this value identifies the physical
                                   components of a logical DASD subsystem. The SSID must be a
                                   unique number in the host system. It should be an even number and
                                   start on a zero boundary.

                         stage     The process of writing data from a disk device to cache.

           storage control unit    The component in the Symmetrix subsystem that connects
                                   Symmetrix to the host channels. It performs channel commands and
                                   communicates with the disk directors and cache. See also ”channel
                                   director.”

                         string    A series of connected disk devices sharing the same disk director.

                                   U
                  unit address     The hexadecimal value that uniquely defines a physical I/O device
                                   on a channel path in an MVS environment. See also ”device address.”

                                   V
                       volume      A general term referring to a storage device. In the Symmetrix
                     subsystem, a volume corresponds to a single disk device.

                                   W
                       write hit   There is room in cache for the data presented by the write operation.

                     write miss    There is no room in cache for the data presented by the write
                                   operation.




180        GDDR Solution Design and Implementation Techniques

More Related Content

PDF
Why Networked FICON Storage Is Better Than Direct Attached Storage
PDF
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
PPTX
Disaster Recovery using Spectrum Scale Active File Management
PDF
z/OS Communications Server Overview
PDF
Desmontando la siciliana 2
PDF
IP Routing on z/OS
PPTX
Relación de confianza
PDF
Xilinx Data Center Strategy and CCIX
Why Networked FICON Storage Is Better Than Direct Attached Storage
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
Disaster Recovery using Spectrum Scale Active File Management
z/OS Communications Server Overview
Desmontando la siciliana 2
IP Routing on z/OS
Relación de confianza
Xilinx Data Center Strategy and CCIX

Viewers also liked (16)

PDF
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
PDF
NFC切手
PDF
Analyst Report: EMA - The Industrialization of Fraud Demands a Dynamic Intell...
 
PDF
Fotonovel·la tutorial adrià, roger i gerard
PDF
NDEF WriterとOSとPaSoRi
PPTX
psychology of old age
PDF
Hadoop Design Patterns
 
PPT
Snapshot of-juxt-india-online-landscape-2010-press
PPTX
Fri papal response to luther
PPTX
การนำเสนอโดยการเขียน
PPT
My evalutauion question 1
PDF
ものに愛着を持たせる
PPT
Money supply inflation
PPTX
летние мастер-классы в сидо
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
NFC切手
Analyst Report: EMA - The Industrialization of Fraud Demands a Dynamic Intell...
 
Fotonovel·la tutorial adrià, roger i gerard
NDEF WriterとOSとPaSoRi
psychology of old age
Hadoop Design Patterns
 
Snapshot of-juxt-india-online-landscape-2010-press
Fri papal response to luther
การนำเสนอโดยการเขียน
My evalutauion question 1
ものに愛着を持たせる
Money supply inflation
летние мастер-классы в сидо
Ad

Similar to GDDR Solution Design and Implementation Techniques (20)

PDF
TechBook: DB2 for z/OS Using EMC Symmetrix Storage Systems
 
PDF
TechBook: EMC Compatibility Features for IBM Copy Services on z/OS
 
PDF
Using the SRDF Adapter with VMware Site Recovery Manager 5.1
 
PDF
Micron memorias agate.pdf
PDF
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
 
PDF
Managing Data Center Connectivity TechBook
 
PDF
ATV320_Modbus_TCP_EtherNet_IP_Manual_NVE41313_02.pdf
PDF
PDF
Manual Hardware CLP Schneider - M258
PDF
Emf2192 ib _ethercat aif module__v3-1__en
PDF
520com um001 -en-e
PDF
Manual guia power flex525
PDF
Spm6600 Dx6i Manual Lo Res
PDF
MCM User Guide
PDF
Www.dedoimedo.com crash-book
PDF
Solid Cam 2009 Milling User Guide
PDF
Man hinh dieu khien
PDF
Dm00046982
PDF
Manual programação stm 32 f4
PDF
Lenze 8400-motec-manual
TechBook: DB2 for z/OS Using EMC Symmetrix Storage Systems
 
TechBook: EMC Compatibility Features for IBM Copy Services on z/OS
 
Using the SRDF Adapter with VMware Site Recovery Manager 5.1
 
Micron memorias agate.pdf
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
 
Managing Data Center Connectivity TechBook
 
ATV320_Modbus_TCP_EtherNet_IP_Manual_NVE41313_02.pdf
Manual Hardware CLP Schneider - M258
Emf2192 ib _ethercat aif module__v3-1__en
520com um001 -en-e
Manual guia power flex525
Spm6600 Dx6i Manual Lo Res
MCM User Guide
Www.dedoimedo.com crash-book
Solid Cam 2009 Milling User Guide
Man hinh dieu khien
Dm00046982
Manual programação stm 32 f4
Lenze 8400-motec-manual
Ad

More from EMC (20)

PPTX
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
PDF
Cloud Foundry Summit Berlin Keynote
 
PPTX
EMC GLOBAL DATA PROTECTION INDEX
 
PDF
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
PDF
Citrix ready-webinar-xtremio
 
PDF
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
PPTX
EMC with Mirantis Openstack
 
PPTX
Modern infrastructure for business data lake
 
PDF
Force Cyber Criminals to Shop Elsewhere
 
PDF
Pivotal : Moments in Container History
 
PDF
Data Lake Protection - A Technical Review
 
PDF
Mobile E-commerce: Friend or Foe
 
PDF
Virtualization Myths Infographic
 
PDF
Intelligence-Driven GRC for Security
 
PDF
The Trust Paradox: Access Management and Trust in an Insecure Age
 
PDF
EMC Technology Day - SRM University 2015
 
PDF
EMC Academic Summit 2015
 
PDF
Data Science and Big Data Analytics Book from EMC Education Services
 
PDF
Using EMC Symmetrix Storage in VMware vSphere Environments
 
PDF
Using EMC VNX storage with VMware vSphereTechBook
 
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
 

Recently uploaded (20)

PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
sap open course for s4hana steps from ECC to s4
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Machine Learning_overview_presentation.pptx
PPTX
Cloud computing and distributed systems.
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPT
Teaching material agriculture food technology
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Electronic commerce courselecture one. Pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
MIND Revenue Release Quarter 2 2025 Press Release
“AI and Expert System Decision Support & Business Intelligence Systems”
sap open course for s4hana steps from ECC to s4
MYSQL Presentation for SQL database connectivity
Machine Learning_overview_presentation.pptx
Cloud computing and distributed systems.
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Teaching material agriculture food technology
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Approach and Philosophy of On baking technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
20250228 LYD VKU AI Blended-Learning.pptx
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Electronic commerce courselecture one. Pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Programs and apps: productivity, graphics, security and other tools
Digital-Transformation-Roadmap-for-Companies.pptx
Assigned Numbers - 2025 - Bluetooth® Document
MIND Revenue Release Quarter 2 2025 Press Release

GDDR Solution Design and Implementation Techniques

  • 1. GDDR Solution Design and Implementation Techniques Version 1.0 • Business Requirement Definitions • Project Planning and Change Management Guidelines • Technology Selection and Solution Design • Architecture and Sizing Considerations Dean Smith Peter Callewaert Chris Fallon John Egan
  • 2. Copyright © 2011 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Powerlink. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners. Part number h8231 2 GDDR Solution Design and Implementation Techniques
  • 3. Contents Preface Chapter 1 Requirements Definition Introduction ....................................................................................... 16 Understanding Business Continuity requirements ..................... 18 High Availability............................................................................... 21 Recovery Time Objective ................................................................. 22 Recovery Point Objective................................................................. 23 EMC GDDR role................................................................................ 24 Chapter 2 Planning for your GDDR Implementation Understanding project constraints ................................................. 28 The importance of test systems....................................................... 29 Minimizing operational risks during implementations.............. 30 The importance of design ................................................................ 32 Construction of your timeline and milestones ............................. 33 Review change management procedures...................................... 34 Chapter 3 GDDR Managed Technologies EMC Symmetrix VMAX with EMC Enginuity............................. 36 Symmetrix VMAX hardware platform....................................36 EMC Enginuity operating environment..................................38 Symmetrix features for mainframe ..........................................38 EMC Mainframe Enablers .........................................................39 EMC TimeFinder............................................................................... 44 TimeFinder/Clone......................................................................44 TimeFinder/Consistency Group ..............................................45 GDDR Solution Design and Implementation Techniques 3
  • 4. Contents TimeFinder/Mirror for z/OS ................................................... 45 EMC Symmetrix Remote Data Facility (SRDF) ............................ 48 SRDF/Synchronous ................................................................... 48 SRDF/Asynchronous................................................................. 48 SRDF/A Multi-Session Consistency (MSC) ........................... 51 Concurrent SRDF........................................................................ 54 Cascaded SRDF........................................................................... 55 SRDF/Extended Distance Protection ...................................... 56 EMC SRDF consistency groups ...................................................... 58 EMC AutoSwap ................................................................................ 62 EMC SRDF/Star................................................................................ 64 Chapter 4 EMC Geographically Dispersed Disaster Restart (EMC GDDR) Supported configurations................................................................ 72 SRDF/S with ConGroup ........................................................... 73 SRDF/S with AutoSwap ........................................................... 76 SRDF/A ....................................................................................... 78 SRDF/Star ................................................................................... 81 EMC GDDR SRDF/Star complex ............................................ 84 SRDF/Star with AutoSwap ...................................................... 85 EMC GDDR requirements............................................................... 89 Mainframe environment requirements ................................... 89 Minimum hardware requirements .......................................... 90 Minimum software requirements ............................................ 90 Additional configuration requirements .................................. 91 EMC GDDR fundamental concepts ............................................... 93 EMC GDDR control systems .................................................... 93 Workload location ..................................................................... 94 EMC GDDR processes ............................................................... 96 EMC GDDR components................................................................. 98 Parameters .................................................................................. 98 User interface ............................................................................. 99 Events and rules....................................................................... 112 Procedures ................................................................................ 113 EMC GDDR procedures ................................................................ 114 EMC GDDR supported scripts ..................................................... 116 GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2) .................................................................................... 119 GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2) .................................................................................... 121 GDDR for SRDF/A (Two sites - DC1 and DC3) .................. 123 4 GDDR Solution Design and Implementation Techniques
  • 5. Contents GDDR for SRDF/Star (Three sites) ........................................125 GDDR for SRDF/Star with AutoSwap (Three sites) ...........130 Chapter 5 GDDR Managed Systems Design Considerations Overview .......................................................................................... 140 Ignore, Reset Clear, or IPL in place............................................... 142 Contingency systems and workload movement ........................ 143 HMC-Only systems (new in GDDR 3.1)...................................... 144 LPAR and CEC recovery ................................................................ 146 Planned CPC Swap .................................................................. 147 Automated Configuration Check - DASD................................... 149 GDDR variables influencing the expected configuration ...149 Chapter 6 Storage Infrastructure Design Considerations Storage configuration and layout ................................................. 152 Infrastructure recommendations .................................................. 153 Gatekeeper device list ..............................................................153 Started tasks names ................................................................. 155 Important dataset names .........................................................157 EMC software levels ....................................................................... 158 LPAR configuration specifics ..................................................158 Sysplex and coupling facility considerations.............................. 160 Chapter 7 Best Practices for Network Design Overview .......................................................................................... 164 Understanding bandwidth requirements.................................... 165 Synchronous network design considerations ............................. 168 Asynchronous network design considerations .......................... 172 Glossary GDDR Solution Design and Implementation Techniques 5
  • 6. Contents 6 GDDR Solution Design and Implementation Techniques
  • 7. Figures Title Page 1 Business Continuity example ....................................................................... 19 2 RTO example................................................................................................... 22 3 RPO example................................................................................................... 23 4 EMC Foundation technologies ..................................................................... 25 5 VMAX architecture ........................................................................................ 37 6 z/OS SymmAPI architecture........................................................................ 40 7 TimeFinder/Mirror under Enginuity 5773 and earlier ............................ 46 8 TimeFinder/Mirror under Enginuity 5874 and later................................ 47 9 SRDF/A delta sets.......................................................................................... 49 10 Delta set switching ......................................................................................... 51 11 SRDF/A cycles and delta sets ...................................................................... 53 12 Cascaded SRDF configuration...................................................................... 56 13 SRDF/EDP basic configuration ................................................................... 57 14 Primary and secondary relationships ......................................................... 59 15 Failed links ...................................................................................................... 60 16 Primary systems 1, 2, and 3 in a consistency group.................................. 61 17 Concurrent SRDF/Star configuration ......................................................... 66 18 Cascaded SRDF/Star configuration ............................................................ 67 19 Cascaded SRDF/Star with EDP (diskless) configuration ........................ 68 20 SRDF/S with ConGroup environment ....................................................... 73 21 EMC GDDR complex with SRDF/S ............................................................ 75 22 SRDF/S with AutoSwap environment ....................................................... 76 23 SRDF/A environment ................................................................................... 78 24 GDDR complex with SRDF/A ..................................................................... 79 25 Concurrent SRDF/Star environment .......................................................... 81 26 Cascaded SRDF/Star environment ............................................................. 82 27 GDDR Complex with Star............................................................................. 84 28 Concurrent SRDF/Star with AutoSwap environment ............................. 86 29 Cascaded SRDF/Star with AutoSwap environment ................................ 87 30 Primary Options Menu................................................................................ 100 GDDR Solution Design and Implementation Techniques 7
  • 8. Figures 31 Change GDDR ISPF Profile Variable Values ........................................... 102 32 Setup and Maintenance............................................................................... 103 33 View GDDR Configuration ........................................................................ 105 34 Manage Site Roles ........................................................................................ 106 35 Perform Health Check ................................................................................. 107 36 Select Script to Run ...................................................................................... 109 37 View GDDR Script Statistics....................................................................... 110 38 Actions Menu................................................................................................ 111 39 GDDR Actions Menu................................................................................... 144 40 GDDR Perform HMC LPAR Actions ........................................................ 145 41 Write distribution......................................................................................... 166 42 Bandwidth requirement .............................................................................. 167 43 DASD write activity by volser ................................................................... 169 44 DASD Volumes Flagged for Tuning ......................................................... 170 45 Recommended asynchronous network topology ................................... 174 8 GDDR Solution Design and Implementation Techniques
  • 9. Tables Title Page 1 Mainframe environment requirements........................................................ 89 2 Minimum hardware requirements ............................................................... 90 3 Software requirements ................................................................................... 91 4 Script names and descriptions..................................................................... 116 5 Script mapping .............................................................................................. 118 6 GDDR managed system recover map........................................................ 141 7 Software levels ............................................................................................... 158 GDDR Solution Design and Implementation Techniques 9
  • 10. Tables 10 GDDR Solution Design and Implementation Techniques
  • 11. Preface This EMC Engineering TechBook draws on GDDR field expertise, highlighting best practices relating to both technology and project management disciplines required to achieve success during GDDR implementations. This field-inspired guide starts with the first risk area to a business continuity project: basic understanding of business continuity or high availability requirements, then proceeds through analysis, technology selection, and architecture design guidelines. Following the guidelines in this GDDR TechBook will help ensure that your business continuity or high availability project is successful. As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes. If a product does not function properly or does not function as described in this document, please contact your EMC representative. Note: This document was accurate as of the time of publication. However, as information is added, new versions of this document may be released to the EMC Powerlink website. Check the Powerlink website to ensure that you are using the latest version of this document. Audience This manual is intended for customers considering improvements to business continuity or high availability technology deployments. GDDR Solution Design and Implementation Techniques 11
  • 12. Preface Related The following is a list of related documents that may assist readers documentation with more detailed information on topics described in this TechBook. These documents can be found on the EMC Powerlink website at http://Powerlink.EMC.com. ◆ ResourcePak Base for z/OS ◆ SRDF Host Component for z/OS ◆ TimeFinder/Clone Mainframe SNAP Facility ◆ TimeFinder/Mirror for z/OS ◆ Consistency Groups for z/OS ◆ TimeFinder Utility for z/OS ◆ AutoSwap ◆ GDDR for SRDF/Star with AutoSwap ◆ GDDR for SRDF/Star ◆ GDDR for SRDF/S with AutoSwap ◆ GDDR for SRDF/S with ConGroup ◆ GDDR for SRDF/A Authors of this Dean Smith is a Consulting Software Engineer and has recently TechBook rejoined EMC with the GDDR team after working abroad for several years. With nearly 33 years of IT experience in the Mainframe environment in both client and systems development, Dean has focused exclusively on the justification, architecture, and implementation of business continuity solutions world wide for the last 12 years. Peter Callewaert is a Senior Solutions Architect at EMC, with an active role in the evolution of GDDR as a product, in the troubleshooting and fixing of product issues, and in providing support to the GDDR Solutions Support team. He has been with EMC for 12 years in various roles including Pre-Sales, Customer Service, and on-site Implementation. He has 20+ years of IT-experience mainly in the Mainframe environment, with an ever-increasing focus on Disaster Recovery and Business Continuity. Chris Fallon is a Consulting Corporate System Engineer in the EMC Symmetrix Engineering and Virtualization product group with over 22 years experience in enterprise data center systems and storage expertise. Chris has been working with the Symmetrix since 1995 and 12 GDDR Solution Design and Implementation Techniques
  • 13. Preface SRDF since 1997 where he has developed, architected, and deployed geographically dispersed business continuity solutions on the Symmetrix storage array. John Egan is an EMC Distinguished Engineer in the EMC Symmetrix Engineering and Virtualization product group. He leads the System z Automation team, responsible for the GDDR and EzSM products. His background extends over 30 years with specialization in large scale systems integration, software engineering, project management, and design. Conventions used in EMC uses the following conventions for special notices. this document Note: A note presents information that is important, but not hazard-related. A caution contains information essential to avoid data loss or damage to the system or equipment. IMPORTANT An important notice contains information essential to operation Typographical conventions EMC uses the following type style conventions in this document: Normal Used in running (nonprocedural) text for: • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus) • Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities • URLs, pathnames, filenames, directory names, computer names, filenames, links, groups, service keys, file systems, notifications Bold Used in running (nonprocedural) text for: • Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages Used in procedures for: • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus) • What user specifically selects, clicks, presses, or types GDDR Solution Design and Implementation Techniques 13
  • 14. Preface Italic Used in all text (including procedures) for: • Full titles of publications referenced in text • Emphasis (for example a new term) • Variables Courier Used for: • System output, such as an error message or script • Specific user input (such as commands) • URLs, complete paths, filenames, prompts, and syntax when shown outside of running text Courier italic Used in procedures for: • Variables on command line • User input variables <> Angle brackets enclose parameter or variable values supplied by the user [] Square brackets enclose optional values | Vertical bar indicates alternate selections - the bar means “or” {} Braces indicate content that you must specify (that is, x or y or z) ... Ellipses indicate nonessential information omitted from the example We'd like to hear from you! Your feedback on our TechBooks is important to us! We want our books to be as helpful and relevant as possible, so please feel free to send us your comments, opinions and thoughts on this or any other TechBook: TechBooks@emc.com 14 GDDR Solution Design and Implementation Techniques
  • 15. 1 Requirements Definition This chapter contains the following information: ◆ Introduction ........................................................................................ 16 ◆ Understanding Business Continuity requirements....................... 18 ◆ High Availability................................................................................ 21 ◆ Recovery Time Objective .................................................................. 22 ◆ Recovery Point Objective .................................................................. 23 ◆ EMC GDDR role................................................................................. 24 Requirements Definition 15
  • 16. Requirements Definition Introduction This chapter introduces EMC® Geographically Dispersed Disaster Restart (EMC GDDR) and discusses its role in High Availability (HA), Business Continuity (BC) and disaster restart (DR) strategies. In recent years, business continuity requirements for many enterprises have changed particularly in response to the threat of terrorism as well as from major natural disasters. What has evolved is the notion of a “regional disaster”—whereby local synchronously replicated centers are “in region” in terms of the scope of disaster events. A regional disaster such as a major flood or prolonged and widespread power disruption could concurrently affect both data centers. In response to this many sites have re-evaluated their disaster readiness capabilities. Commerce has not stood still either. The shift toward online/internet enabled enterprises has continued to build high availability and continuous compute architectures to support “24*7” business operations. Consistent with these trends we have seen customers building data centers and technology infrastructure to match changing business requirements. A common practice for customers running synchronous replication is to add third-site asynchronous “out of region” disaster recovery centers. This has been particularly observed for the large financial institutions. Customers with high (HA) or continuous availability requirements are trending towards building adjacent primary and secondary data centers to gain the benefit of synchronous data replication with minimal performance impact. Another common data center strategy to support HA requirements is to modify existing data center infrastructure for redundant power connect and physical firewall separation for compute and/or storage infrastructure. These strategies facilitate synchronous replication to operate with minimum latency overhead and also enables construction of very high-speed, high bandwidth networks to support processor-to-processor interconnect for the IBM Parallel Sysplex Coupling Facility. 16 GDDR Solution Design and Implementation Techniques
The larger financial institutions are trending toward building combined HA and Disaster Recovery (DR) solutions, in which synchronous and asynchronous replication architectures are combined with data replication and management infrastructure to detect and manage potentially catastrophic disaster-in-progress events.
Understanding Business Continuity requirements

Business Continuity (BC) requirements tend to cluster around industry verticals. Telecommunications companies typically deploy some form of two-site replication and recovery architecture. Large financial institutions are trending toward three-site architectures that combine HA and DR solutions in a three-data-center strategy. The largest retailers tend toward geographically separated two-site data center strategies, some deploying asynchronous and others synchronous data replication.

Figure 1 on page 19 illustrates that as the size, sophistication, and wealth of an enterprise increase over time, business continuity requirements shift from essentially no protection through disaster recovery and business restart, and eventually toward high availability.
Figure 1 Business Continuity example
(The figure plots procedural/manual approaches against IT-architectural/automated approaches and single, dual, and three-data-center strategies as enterprise size grows from small businesses, food manufacturing, and retail through telecommunications, transportation and logistics, to financial services and online brokerages, with market capitalization ranging from roughly $10b to $20b+.)

Disaster recovery is the process of restoring a previous copy of the data, typically from a tape image, as the starting point for resumption of business operations. It is then necessary to apply database logs or perform other manual processes to bring the data to the last known point of consistency. Depending on the size of the enterprise, this process can range from a few hours to days or weeks.

Disaster restart is the process of restarting dependent-write consistent copies of data. This type of "restart" data image is created by EMC data replication technologies. The image is presented to the operating system for IPL. The data itself is in "crash restart" form, the same image that would be available following a loss of power at a primary (non-replicated) data center. Disaster restart displaces tape-based recovery and greatly reduces the Recovery Time Objective (RTO).

Database management systems and mainframe applications have evolved over time to restart from these data images. Any incomplete or partially complete transactions are backed out by the application layer to find a point from which business resumption can occur. Any backed-out or incomplete transactions are re-applied after business operations resume.

The time required for restart depends on the data change rate and the state of the databases at the time of disruption:

◆ If a database is shut down normally, achieving a point of consistency on restart requires minimal work.

◆ If the database terminates abnormally or is resumed from a "restart" data image, the restart process is elongated, depending on the number and size of in-flight transactions at the time of DBMS termination.

An image of a database created using EMC consistency technology while the database is in operation is in a dependent-write consistent state, also known as a DBMS restartable image. The restart of this image transforms it to a transactionally consistent data state by completing committed transactions and rolling back uncommitted transactions during the normal database initialization process.
High Availability

High Availability (HA) solutions are built on synchronous data replication, which enables EMC AutoSwap™ host software to interact with storage arrays to achieve a non-disruptive swap of DASD I/O from a source array to a synchronous target array. Swap operations can occur automatically for unplanned events, such as the loss of all FICON connectivity between a z/OS host and its attached EMC Symmetrix® arrays, or for planned events through an operator command interface and through panel dialogs supplied with GDDR.

Synchronous data replication is required for swap operations because the replicated data must be an exact image of the primary disk at the precise moment of the swap event. Asynchronous replication is chronologically behind the primary storage instance (by seconds or minutes), so it is not possible to deploy AutoSwap into an asynchronous replication environment.

Some customers require both HA and DR solutions and deploy a combination of synchronous and asynchronous replication infrastructures. These solutions provide non-disruptive swaps to the secondary synchronous disk for localized "disasters," such as severing of the FICON cables between a host and its local storage, as well as DR protection in the event of a catastrophic loss of the primary and secondary data centers.
Recovery Time Objective

To understand the type of replication and recovery infrastructure that must be in place to support an enterprise, it is typical to first classify applications by criticality. Many customers have three or more application tiers, in which applications are categorized by time-to-recover following a planned or unplanned service interruption. Ranges can vary from high availability (RTO = zero) through hours or days to recover; it all depends on the impact to the business, traded off against the cost of protection against service interruption.

Figure 2 illustrates the relationship between the cost of the solution and the cost of the service interruption. Once the RTO is known, the cost of the solution can be determined.

Figure 2 RTO example
Recovery Point Objective

Applications must also be classified according to the amount of data loss that could be sustained as a result of recovery from a service interruption. The measure of data loss is termed the Recovery Point Objective (RPO). Typically, synchronous data replication supports a near-zero RPO, whereas asynchronous data replication supports an RPO measured in seconds, minutes, or hours. For asynchronous data replication, the achievable RPO is usually proportional to the amount of Wide Area Network (WAN) bandwidth applied to the solution.

Figure 3 illustrates the relationship between the cost of the solution and the cost of the service interruption. Once the RPO is known, the cost of the solution can be determined.

Figure 3 RPO example
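The relationship between WAN bandwidth and achievable RPO can be illustrated with a back-of-envelope calculation. The following Python sketch is illustrative only and is not an EMC sizing tool: the write rate, bandwidth, cycle time, and the two-cycle rule of thumb (which anticipates the SRDF/A delta set behavior described in Chapter 3) are all assumptions, and real designs should be sized from measured workload data.

    # Back-of-envelope RPO estimator for an asynchronous replication link.
    # All figures are hypothetical; real sizing must also account for write
    # folding, compression, and burst profiles.

    def estimate_async_rpo_seconds(write_mb_per_sec, wan_mb_per_sec,
                                   cycle_time_sec=30.0):
        """Rough RPO exposure for cycle-based asynchronous replication.

        If the WAN can drain a cycle's worth of writes within one cycle time,
        the exposure stays near two cycles (capture plus transmit). If it
        cannot, the backlog, and therefore the potential data loss, grows with
        the ratio of write rate to available bandwidth.
        """
        if wan_mb_per_sec <= 0:
            raise ValueError("WAN bandwidth must be positive")
        drain_time = (write_mb_per_sec * cycle_time_sec) / wan_mb_per_sec
        return 2 * max(cycle_time_sec, drain_time)

    print(estimate_async_rpo_seconds(80, 100))  # 60.0 seconds
    print(estimate_async_rpo_seconds(80, 50))   # 96.0 seconds: halving the
                                                # bandwidth raises the exposure

Once the link can no longer drain a cycle within the cycle time, exposure grows roughly in proportion to the write-rate-to-bandwidth ratio, which is one reason bandwidth assessment appears among the planning tasks in Chapter 2.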
EMC GDDR role

EMC Geographically Dispersed Disaster Restart (EMC GDDR) is a software product that runs on the IBM z/OS operating system to automate business recovery following both planned outages and disaster situations, including the total loss of a data center. EMC GDDR does not provide replication and recovery services itself; rather, it monitors and automates the services provided by other EMC products, as well as third-party products, required for continuous operations or business restart.

EMC GDDR facilitates business continuity by generating scripts that can be run on demand, for example to restart business applications following a major data center incident, or to resume replication and restore ongoing data protection following an unplanned link outage. Scripts are customized at the time of invocation by an expert system that tailors the steps to the configuration and the event that GDDR is managing. Through automatic event detection and end-to-end automation of managed technologies, GDDR removes human error from the recovery process and allows it to complete in the shortest possible time.

The GDDR expert system is also invoked to generate planned procedures, such as moving compute operations from one data center to another. The ability to move from scheduled DR-test weekends to regularly scheduled data center swaps, without disrupting application workloads, is the gold standard for high availability compute operations.
Figure 4 on page 25 shows the relationship between the software layers and the Symmetrix hardware layer.

Figure 4 EMC Foundation technologies
2  Planning for your GDDR Implementation

This chapter deals with important considerations to ensure that your major infrastructure project is successful. Some of the issues highlighted here are best practices specific to managing technology risk. Other recommendations relate to the common but important risk factors associated with project administration and change management:

◆ Understanding project constraints .................................................. 28
◆ The importance of test systems........................................................ 29
◆ Minimizing operational risks during implementations ............... 30
◆ The importance of design ................................................................. 32
◆ Construction of your timeline and milestones .............................. 33
◆ Review change management procedures....................................... 34
Understanding project constraints

EMC GDDR implementations operate under the "triple constraint" model of project management. Essentially, there are three major areas of constraint on any infrastructure project:

◆ Time
◆ Quality
◆ Cost

In many instances the date of implementation is already fixed. Ideally, the implementation date will have been calculated from the sum of all the complex technical infrastructure tasks that must be accomplished. These tasks may include:

◆ Workload measurement
◆ Bandwidth assessment
◆ Compute resiliency (to support HA or DR, for example)
◆ Storage layout and I/O access density (cache, device types, RAID type, and so on)
◆ Data center planning (floor space, power, cooling)
◆ Data migration planning
◆ Construction of a test facility
◆ Staff training

It is a major risk factor if the target date for completion of the project has been set without due regard to the number and complexity of the tasks required to achieve the goal.

In theory, the triple constraint model is a method for the project manager to retain control of project delivery. If the customer owns all three constraints, it is impossible for the project manager to obtain or retain control of the project. Control can be maintained by allowing the project manager to control at least one of the constraints, typically the quality objective. In environments where time and budget are fixed, the only remaining variable is quality. Inevitably, if the time constraint is unreasonable, project quality will suffer.
The importance of test systems

As the complexity of solutions increases, so does the requirement to allow your operations staff to become competent with new technology. This is best achieved with a test environment. Even if the customer cannot afford a perpetual test environment, serious consideration should be given to constructing an interim test environment for the duration of the GDDR project. It allows operations staff to safely acquire knowledge and also provides a framework for the formalized "hands on" training offered by the EMC GDDR implementation team, both during the project and, more formally, as part of the acceptance criteria toward the end of the project.

Some customers can benefit from this more than others. If your operations staff do not have experience with replication and automation technologies in general, and EMC technologies in particular, then the construction of a test environment should be considered mandatory. If, however, the customer has the necessary skills in-house, the benefit of a test environment is limited mainly to ensuring that software and fixes provided by EMC and other vendors operate correctly in the customer environment.
Minimizing operational risks during implementations

There are several methods for implementing GDDR in production environments. The two approaches outlined below are used where a customer cannot tolerate downtime in their existing production workflow. The techniques allow both continuous operation of production systems and maintenance of the existing DR posture.

The parallel infrastructure approach is a technique where new network, compute, and storage infrastructure is commissioned adjacent to the existing production systems. Data is migrated from the existing production infrastructure into the new GDDR managed environment so that infrastructure and application-oriented testing can be conducted. The testing process is destructive to the data image, so the data migration is repeated for each test cycle. Once the infrastructure has been proven to operate successfully in the new environment, one final data migration is conducted. The GDDR managed infrastructure becomes the new production environment and the older infrastructure is decommissioned. Another significant advantage of this technique is that not only can the resiliency of the compute infrastructure be preserved, but the existing HA/DR posture can also be retained until the new infrastructure is deployed. The parallel infrastructure approach is the best practice for deploying GDDR management infrastructures into customer environments with minimal risk and minimal disruption to existing application workloads.

The partial production approach uses a small subset of DASD volumes on the production system to prove the entire infrastructure. This allows validation of GDDR and its prerequisite technologies in the production environment, but without using DASD that contains production data. When the scope of the parameters is later increased to address the full environment, the first full-scope execution takes place live in the production environment. The partial production approach is inferior to the parallel infrastructure approach in that the full scope of functionality cannot be fully tested without risking impact to production environments. The emphasis here is on eliminating, in particular, typing errors in parameter libraries, security issues, and issues with gatekeeper devices, firewalls, and the like, all of which can be debugged without impact using the parallel infrastructure approach but which cannot be fully detected in the partial production approach until actually encountered live in the production environment.
The importance of a test environment that allows operations staff to gain competency cannot be overstressed when the partial production approach is used. The partial production approach is a good compromise implementation method where the customer is able to provide sufficient test windows to safely test the new infrastructure "in place." It is also commonly used where financial or other constraints on a project make the parallel infrastructure approach impractical.
The importance of design

Another significant area of addressable risk in GDDR projects is the correct sizing of the entire solution. This book sets out guidelines for properly sizing the proposed solution. Particular attention must be paid to the sizing of the network infrastructure. See Chapter 7, "Best Practices for Network Design," for more details.

A second major area of design risk is the storage layout. Storage must be configured not only for raw capacity, RAID groups, and so on, but also for throughput. The solution must be designed to sustain the I/O load of the production environment and also to have sufficient reserve capacity, in terms of cache and throughput, to operate the replication infrastructure and (if required) restart operations. See the section "Storage Infrastructure Design Considerations" for more details on this important topic.

EMC recommends early (pre-sales) contact with GDDR solution design experts to ensure that any solution proposal meets design standards for optimal operational performance. The GDDR team operates globally and is centrally managed from EMC Corporate Headquarters in Hopkinton, Massachusetts. The team can be contacted by email at:

gddrsolutionsupport@emc.com

Conventional mail:

GDDR Solution Support Team
Mailstop 1/P-36
176 South Street
Hopkinton, MA 01748 USA
Construction of your timeline and milestones

Setting a timeline that is too aggressive can cause a project to fail. As discussed earlier in the triple constraint model of project administration, if the timeline is fixed and the budget is fixed, that leaves only one variable: quality. In other words, in order to meet strict deadlines for task completion, shortcuts may be taken in quality assurance measures. The result may be a sequence of failed changes or service interruptions to the production environment.

One way to avoid placing undue or unreasonable time pressure on a project is to involve the technical architecture and implementation people in the construction of the implementation timeline.
Review change management procedures

The "GDDR project" is typically much larger than just installing the GDDR software and customizing parameter libraries. As discussed earlier, there may be changes to physical infrastructure and utilities, such as air conditioning, as well as to IT infrastructure such as network provisioning, new compute and storage arrays, switches, and routers. The scope of these projects is very large: in the space of a few months a large number of potentially high-risk changes need to be executed.

Change scheduling is important in order to avoid conflicts between infrastructure provisioning groups and to align with the business calendar. Much more important than change scheduling, however, is change management. This is a broader topic that addresses issues such as:

◆ Change scheduling
◆ Backout/fallback planning
◆ Risk mitigation strategies
◆ Documentation
◆ Root cause analysis (for failed changes)

When trying to determine the health of a change management system, look beyond the number of change "tickets" to see how many changes are rejected because of scheduling conflicts, inadequate documentation, insufficient risk mitigation, and so on. A robust change management system exhibits much more than change scheduling. Assess the health of your change management systems and, if necessary, adjust them to deal with the large volume of changes typically encountered during these infrastructure projects.
3  GDDR Managed Technologies

EMC provides many hardware and software products that support applications in the z/OS environment. The following products, some required and some configuration-specific, are used in EMC GDDR solutions:

◆ EMC Symmetrix VMAX with EMC Enginuity.............................. 36
◆ EMC TimeFinder ................................................................................ 44
◆ EMC Symmetrix Remote Data Facility (SRDF) ............................. 48
◆ EMC SRDF consistency groups........................................................ 58
◆ EMC AutoSwap.................................................................................. 62
◆ EMC SRDF/Star ................................................................................. 64
EMC Symmetrix VMAX with EMC Enginuity

Symmetrix hardware architecture and the EMC Enginuity™ operating environment are the foundation of the Symmetrix storage platform. This environment consists of the following components:

◆ Symmetrix hardware
◆ Enginuity-based operating functions
◆ Mainframe Enablers
◆ Symmetrix application program interface (API) for mainframe
◆ Symmetrix-based applications
◆ Host-based Symmetrix applications
◆ Independent software vendor (ISV) applications

Symmetrix VMAX hardware platform

The Symmetrix VMAX™ design is based on a highly available VMAX Engine with redundant CPU, memory, and connectivity on two directors for fault tolerance. Symmetrix VMAX Engines connect to and scale out linearly through the Virtual Matrix Architecture®, which allows resources to be shared across VMAX Engines. To meet growth requirements, additional engines can be added non-disruptively for efficient and dynamic scaling of capacity and performance.
Figure 5 on page 37 shows the Symmetrix VMAX hardware architecture.

Figure 5 VMAX architecture
EMC Enginuity operating environment

The Enginuity operating system provides controls for all components in a Symmetrix array. Enginuity coordinates real-time events related to the processing of production data, providing the following services:

◆ Manages system resources to intelligently optimize performance across a wide range of I/O workload requirements.
◆ Ensures system availability through advanced fault monitoring, detection, and correction capabilities, and provides concurrent maintenance and serviceability features.
◆ Provides the foundation for specific software features available through EMC disaster recovery, business continuance, and storage management software.
◆ Supports functional services for both Symmetrix-based functionality and for a large suite of EMC storage application software.
◆ Defines the priority of each task, including basic system maintenance, I/O processing, and application processing.
◆ Provides uniform access through APIs for internal calls and provides an external interface to allow integration with other software providers.

Symmetrix features for mainframe

Symmetrix storage systems appear to mainframe operating systems as any of the following control units: IBM 3990, IBM 2105, and IBM 2107. The physical storage devices can appear to the mainframe operating system as any mixture of different sized 3380 and 3390 devices. Mainframe host connectivity is supported through serial ESCON and FICON channels. Fibre Channel and GigE are supported options in SRDF® environments. Symmetrix GigE directors in an SRDF environment provide direct, end-to-end TCP/IP connectivity for remote replication solutions over extended distances. This eliminates the need for costly FC-to-IP converters and helps utilize the existing IP infrastructure without major disruption.

EMC supported IBM compatibility features include:

◆ Channel Command Emulation for IBM ESS 2105/2107
◆ Multiple Allegiance (MA)
◆ Parallel Access Volume (PAV), Static and Dynamic
◆ HyperPAV
◆ Concurrent Copy
◆ Dynamic Channel Path Management (DCM)
◆ Dynamic Path Reconnection (DPR) support
◆ Host Data Compression
◆ Logical Path and Control Unit Address Support (CUADD)
◆ Mainframe systems hypervolumes
◆ Partitioned Dataset (PDS) Search Assist
◆ FlashCopy
◆ GDPS
◆ GDPS HyperSwap
◆ Peer to Peer Remote Copy (PPRC)
◆ Extended Remote Copy (XRC)
◆ Extended Address Volume
◆ Extended Distance FICON
◆ High Performance FICON (zHPF)
◆ DFSMS support for Flash Drives
◆ 8 Gb/s FICON

EMC Mainframe Enablers

Mainframe Enablers (MFE) is a software suite that includes the Symmetrix API runtime libraries for all EMC mainframe software. These software packages can be used to monitor device configuration and status and to perform control operations on devices and data objects within a storage complex. The following components are included in Mainframe Enablers (MFE) 7.0 and above:

◆ ResourcePak® Base for z/OS
◆ SRDF Host Component for z/OS
◆ TimeFinder®/Clone Mainframe SNAP Facility
◆ TimeFinder/Mirror for z/OS
◆ Consistency Groups for z/OS
◆ TimeFinder Utility for z/OS
◆ AutoSwap

ResourcePak Base for z/OS

ResourcePak Base delivers EMC Symmetrix Control Facility (EMCSCF) for IBM and IBM-compatible mainframes. EMCSCF provides a uniform interface for EMC and ISV software products. EMCSCF delivers a "persistent address space" on the host that facilitates communication between the host and the Symmetrix, as well as with other EMC-delivered and partner-delivered applications. In GDDR managed configurations this host is normally the GDDR Master C-System.

ResourcePak Base performs tasks such as the following:

◆ Maintaining an active repository of information about EMC Symmetrix devices attached to z/OS environments and making that information available to other EMC products.
◆ Performing automation functions.
◆ Handling inter-LPAR (logical partition) communication through the Symmetrix storage system.

Figure 6 z/OS SymmAPI architecture
(The figure shows EMC- or ISV-developed Symmetrix products, such as TimeFinder and SRDF Host Component, issuing EMCSAI program calls and SNAPAPI requests to the Symmetrix Control Facility (ResourcePak Base), which provides automation (swap), metadata (configuration information, device status), and event monitoring services against Symmetrix devices.)
ResourcePak Base provides faster delivery of new Symmetrix functions by EMC and ISV partners, along with easier upgrades. It also provides the ability to gather data when using tools such as the TimeFinder/Mirror query, because device status information is cached along with other important information.

ResourcePak Base for z/OS is a prerequisite for EMC mainframe applications such as the TimeFinder/Clone Mainframe SNAP Facility and SRDF Host Component for z/OS. As of release 7.0, these products are packaged and installed together as Mainframe Enablers, along with TimeFinder/Mirror for z/OS, Consistency Groups for z/OS, TimeFinder Utility for z/OS, and AutoSwap.

ResourcePak Base provides the following functionality through EMCSCF.

Cross-system communication

Inter-LPAR communication is handled by the EMCSCF cross-system communication (CSC) component. CSC uses a Symmetrix storage system to facilitate communication between LPARs. Several EMC Symmetrix mainframe applications use CSC to handle inter-LPAR communications.

Non-disruptive SymmAPI-MF refreshes

As of version 5.3, EMCSCF allows the SymmAPI-MF to be refreshed non-disruptively. Refreshing SymmAPI-MF does not impact currently executing applications that use SymmAPI-MF, for example SRDF Host Component for z/OS or the TimeFinder/Clone Mainframe SNAP Facility.

SRDF/A Monitor

The SRDF/A Monitor in ResourcePak Base is designed to:

◆ Find EMC Symmetrix controllers that are running SRDF/A.
◆ Collect and write SMF data about those controllers.

After ResourcePak Base is installed, the SRDF/A Monitor is started as a subtask of EMCSCF.

Group Name Service support

ResourcePak Base includes support for Symmetrix Group Name Service (GNS). Using GNS, you can define a device group once and then use that single definition across multiple EMC products on multiple platforms. This means that you can use a device group defined through GNS with both mainframe and open systems-based EMC applications. GNS also allows you to define group names for volumes that can then be operated upon by various other commands.

Pool management

With ResourcePak Base V5.7 or higher, generalized device pool management is a provided service. Pool devices are a predefined set of devices that provide a pool of physical space. Pool devices are not host-accessible. The CONFIGPOOL commands allow management of SNAPPOOLs or DSEPOOLs with CONFIGPOOL batch statements.

SRDF/A Multi-Session Consistency

SRDF/A Multi-Session Consistency (MSC) is a task in EMCSCF that ensures remote R2 consistency across multiple Symmetrix storage systems running SRDF/A. MSC provides the following:

◆ Coordination of SRDF/A cycle switches across systems.
◆ Up to 24 SRDF groups in a multi-session group.
◆ One SRDF/A session and one SRDF group per Symmetrix storage system when using Enginuity release level 5X70.
◆ With Enginuity release level 5X71 and later, SRDF groups are dynamic and are not limited to one per Symmetrix storage system. Group commands of ENABLE, DISPLAY, DISABLE, REFRESH, and RESTART are available.

SWAP services

ResourcePak Base deploys a SWAP service in EMCSCF. It is used by EMC AutoSwap for planned outages with the ConGroup Continuous Availability Extensions (CAX).

Recovery services

Recovery service commands allow you to perform recovery on local or remote devices (if the links are available for the remote devices).

Licensed Feature Code management

EMCSCF manages Licensed Feature Codes (LFCs) to enable separately chargeable features in EMC software. These features require an LFC to be provided during the installation and customization of EMCSCF. LFCs are available for:

◆ Symmetrix Priority Control
◆ Dynamic Cache Partitioning
◆ AutoSwap (ConGroup with AutoSwap Extensions); separate LFCs are required for planned and unplanned swaps
◆ EMC z/OS Storage Manager
◆ SRDF/Asynchronous (MSC)
◆ SRDF/Automated Replication
◆ SRDF/Star
◆ TimeFinder/Clone (TARGET)
◆ TimeFinder/Consistency Group (CONSISTENT)
◆ TimeFinder/Snap (VDEV)
EMC TimeFinder

TimeFinder is a family of products that enables both volume-based and data set level replication within a single Symmetrix system. Data is copied from Symmetrix devices using array-based resources, without consuming host CPU or I/O. The source Symmetrix devices remain online for regular I/O operations while the copies are created. GDDR automation extends to managing TimeFinder within an array or across multiple arrays, for example to provide "gold copy" enterprise data images for DR testing.

TimeFinder enables customers to create multiple point-in-time copies of data, allowing simultaneous execution of business tasks that were previously sequential. GDDR exploits the following TimeFinder modes of operation:

◆ TimeFinder/Mirror enables users to configure special devices called business continuance volumes (BCVs) to create a mirror image of Symmetrix standard devices. Using BCVs, TimeFinder creates a point-in-time copy of data that can be repurposed. The TimeFinder/Mirror component extends the basic API command set of Mainframe Enablers with commands that specifically manage Symmetrix BCVs and standard devices.

◆ TimeFinder/Clone enables users to make copies of data from source volumes to target volumes without consuming mirror positions within the Symmetrix. The data is available to the target's host immediately upon activation, even if the copy process has not completed. Data may be copied from a single source device to multiple target devices. A source device can be either a Symmetrix standard device or a BCV device.

TimeFinder/Clone

TimeFinder/Clone for z/OS produces point-in-time copies of full volumes or individual datasets. TimeFinder/Clone operations involve full volumes or datasets where the amount of data at the source is the same as the amount of data at the target. TimeFinder/Clone provides significant configuration flexibility because clone copies do not require Symmetrix mirror positions. TimeFinder/Clone source and target devices can have any form of RAID protection, and the clone copies can be configured as standard devices or as Business Continuance Volumes (BCVs).
A Full-Volume Snap captures a complete replica of the source volume on the target volume in the local Symmetrix system. TimeFinder/Clone requests that span control units can invoke an external datamover to accomplish the request.

TimeFinder/Consistency Group

TimeFinder/Consistency Group, using the Enginuity Consistency Assist (ECA) feature, provides consistent snap operations on multiple volumes so that the targets are dependent-write consistent. TimeFinder/Consistency Group is available for full device, virtual device, and remote full device snaps. The source and target device pairs must reside in the same Symmetrix system. Consistency can be preserved over multiple volumes.

TimeFinder/Mirror for z/OS

EMC TimeFinder/Mirror is a business continuance solution. GDDR uses TimeFinder/Mirror to make full-volume copies of production data from a standard Symmetrix device (which remains online for regular I/O operations from the host) to a Business Continuance Volume (BCV) with which the standard device is paired. The BCV is a specially tagged volume established when the Symmetrix unit is configured. The BCV functions as a mirror controlled with the TimeFinder/Mirror ESTABLISH, SPLIT, RE-ESTABLISH, and RESTORE commands. The BCV device can be separated (split) from the standard device and used for backup, restore, decision support, or application testing. In GDDR controlled environments, GDDR re-establishes split BCVs with their associated standard devices at the conclusion of test windows, or otherwise as required during disaster recovery operations.
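The BCV lifecycle that GDDR drives with the commands named above can be pictured as a simple state machine. The following Python sketch is a conceptual model only: the class, method names, and device numbers are illustrative assumptions and do not represent TimeFinder command syntax or the GDDR API.

    # Conceptual model of the TimeFinder/Mirror BCV lifecycle
    # (ESTABLISH, SPLIT, RE-ESTABLISH, RESTORE). Illustrative only.

    class BCVPair:
        """Tracks the relationship between a standard device and its BCV."""

        def __init__(self, std, bcv):
            self.std, self.bcv = std, bcv
            self.state = "NEVER_ESTABLISHED"

        def establish(self):
            # Full synchronization: the BCV becomes a mirror of the standard device.
            assert self.state in ("NEVER_ESTABLISHED", "SPLIT")
            self.state = "ESTABLISHED"

        def split(self):
            # The BCV is separated and holds a point-in-time copy usable for
            # backup, restore, decision support, or application testing.
            assert self.state == "ESTABLISHED"
            self.state = "SPLIT"

        def re_establish(self):
            # Resynchronize the BCV with changes made since the split.
            assert self.state == "SPLIT"
            self.state = "ESTABLISHED"

        def restore(self):
            # Copy the BCV's point-in-time image back to the standard device.
            assert self.state == "SPLIT"
            self.state = "RESTORED"

    # A GDDR-style DR test window: split for testing, re-establish afterwards.
    pair = BCVPair(std="C400", bcv="C800")   # hypothetical device numbers
    pair.establish()
    pair.split()          # test against the point-in-time image
    pair.re_establish()   # GDDR re-establishes the BCV at the end of the window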
Under Enginuity 5773 and earlier, TimeFinder/Mirror supports two local-replication technologies, as shown in Figure 7 on page 46.

Figure 7 TimeFinder/Mirror under Enginuity 5773 and earlier
(The figure shows TimeFinder/Mirror commands and TimeFinder/Clone commands entering the emulation API, with Mirror and Clone as separate underlying technologies.)
Starting with Enginuity 5874, TimeFinder/Mirror uses clone emulation for all operations, as shown in Figure 8 on page 47.

Figure 8 TimeFinder/Mirror under Enginuity 5874 and later
(The figure shows TimeFinder/Mirror and TimeFinder/Clone commands entering the emulation API, with Clone as the only underlying technology.)

No specific steps need to be taken to exploit clone emulation. Whenever TimeFinder/Mirror detects a Symmetrix controller running at Enginuity level 5874 or later, it automatically sets the mode to clone emulation, and all TimeFinder/Mirror commands are converted to clone emulation. GDDR uses TimeFinder/Mirror command syntax; for Enginuity 5874 and later code releases clone emulation is used, and this is transparent to GDDR. GDDR generated scripts therefore continue to operate as they did in TimeFinder/Mirror environments.
EMC Symmetrix Remote Data Facility (SRDF)

The SRDF family of products provides synchronous and asynchronous remote replication capabilities for Symmetrix storage systems. At the conceptual level, SRDF mirrors (RAID level 1) one logical disk device (the primary source/R1 within the primary Symmetrix storage system) to a second logical device (the secondary target/R2 within a physically separate secondary Symmetrix storage system) over Fibre Channel or GigE high-speed communication links.

SRDF's basic premise is that a remote mirror of data in a different Symmetrix storage system can serve as a valuable resource for:

◆ Protecting data through geographical separation.
◆ Giving applications a second location from which to retrieve data should the primary location become unavailable for any reason.
◆ Providing a means to establish a set of volumes on which to conduct parallel operations, such as testing or modeling.

GDDR supports SRDF/Synchronous, SRDF/Asynchronous, and SRDF/Star solutions in various two- and three-site configurations. The following paragraphs describe the SRDF features exploited by GDDR.

SRDF/Synchronous

SRDF synchronous (SRDF/S) mode facilitates disaster recovery within the customer's campus or metropolitan area network through real-time synchronous remote mirroring from one Symmetrix system to one or more Symmetrix systems.

SRDF/Asynchronous

SRDF asynchronous (SRDF/A) mode provides a long-distance disaster restart data image with minimal impact on performance. Symmetrix systems implement asynchronous mode by sending host writes from the primary Symmetrix to the secondary Symmetrix system in dependent-write consistent delta sets transferred in cycles:

◆ Each delta set contains groups of write I/Os for processing, which are managed for dependent-write consistency by the Enginuity operating environment.
◆ SRDF/A transfers these sets of data using cycles of operation, one cycle at a time, between the primary Symmetrix system and the secondary Symmetrix system.

Write Folding improves the efficiency of the SRDF network links. If there are multiple updates to the same data in the same cycle, the systems send only the most current data across the SRDF links. This is called Write Folding and is a major advantage over competing asynchronous replication solutions: it decreases network bandwidth consumption and the number of I/Os the SRDF director processes, thereby reducing the system overhead per host I/O.

Figure 9 on page 49 illustrates SRDF/A delta sets and their relationships to cycle processing.

Figure 9 SRDF/A delta sets
(The figure shows the capture delta set (N, "active" cycle) and transmit delta set (N-1, "inactive" cycle) on the primary Symmetrix, and the receive delta set (N-1, "inactive" cycle) and apply delta set (N-2, "active" cycle) on the secondary Symmetrix.)

In single session mode, the Symmetrix SRDF director ensures dependent-write consistency within SRDF/A by obtaining the active cycle number from a single location in global memory and assigning it to each I/O. The director retains that cycle number even if a cycle switch occurs during the life of that I/O. This results in an atomic cycle switch process for dependent-write sequences, even though it is not physically an atomic event across a range of volumes. As a result, two I/Os with a dependent relationship between them can be in the same cycle, or the dependent I/O can be in a subsequent cycle.

Delta set switching is at the core of the SRDF/A active session operation. The following is required before a primary Symmetrix system cycle switch can occur:
◆ The transmit delta set must have completed data transfer to the secondary Symmetrix system.
◆ The minimum cycle time (single session mode) or minimum cycle target time (MSC mode) must be reached.
◆ The previous apply delta set must have been completed.

During the delta-set switching process:

1. Write I/Os are collected in the capture delta set on the primary Symmetrix system. The previous cycle's transmit delta set is completing the SRDF transfer to the secondary Symmetrix system's receive delta set, which holds the N-1 copy. The secondary Symmetrix system's apply delta set (N-2) is written to global memory, so that data is marked write pending to the secondary devices.

2. The primary Symmetrix system waits for the minimum cycle time to elapse and for the transmit delta set to empty, meaning that all data has been transferred to the secondary Symmetrix system.

3. Once these conditions are satisfied, the primary Symmetrix system sends a commit message to the secondary Symmetrix system to begin the secondary Symmetrix system cycle switch.

4. On the primary system, the new capture delta set is available to collect new host I/Os. Before the secondary Symmetrix system cycle switch can occur, the following must be true:

   • The secondary Symmetrix system has received the commit message from the primary Symmetrix system.
   • The apply delta set (N-2 copy) has completed its write process, marking the data write pending to the secondary devices.

5. Once the secondary Symmetrix system receives the commit message from the primary Symmetrix system, it verifies that the apply delta set has been written. This occurs while the primary Symmetrix system is performing the cycle switch between the capture and transmit delta sets.

6. Next, a delta set cycle switch occurs on the secondary Symmetrix system between the receive (inactive) and apply (active) delta sets. This preserves the dependent-write consistent copy at the secondary Symmetrix system prior to receiving the next dependent-write consistent delta set.

7. The secondary Symmetrix system sends an acknowledgement to the primary Symmetrix system. The data in the apply delta set is written to disk.
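The sequence above can be summarized with a small conceptual model. The Python sketch below is illustrative only: it collapses the primary and secondary switches into a single step, omits the minimum cycle time and the commit/acknowledge messaging, and uses assumed names rather than Enginuity internals. Its purpose is simply to show how the four delta sets rotate and how write folding keeps only the latest image of a track within a cycle.

    # Minimal conceptual model of the SRDF/A cycle switch described above.
    # Delta sets are dictionaries keyed by track, so repeated writes to the
    # same track within a cycle are "write folded". Illustrative only.

    class SrdfASession:
        def __init__(self):
            self.capture = {}    # N   : active cycle collecting host writes
            self.transmit = {}   # N-1 : inactive cycle draining to secondary
            self.receive = {}    # N-1 : inactive cycle on the secondary
            self.apply = {}      # N-2 : active cycle, write pending to R2

        def host_write(self, track, data):
            self.capture[track] = data    # write folding keeps the latest image

        def transfer(self):
            while self.transmit:          # data moves across the SRDF links
                track, data = self.transmit.popitem()
                self.receive[track] = data

        def cycle_switch(self):
            # Preconditions from the text: transmit empty and previous apply
            # complete (minimum cycle time is not modelled here).
            if self.transmit:
                raise RuntimeError("transmit delta set not yet empty")
            self.apply = self.receive     # secondary: receive becomes apply (N-2)
            self.receive = {}
            self.transmit = self.capture  # primary: capture becomes transmit (N-1)
            self.capture = {}             # a fresh capture set collects new writes

    session = SrdfASession()
    session.host_write("0A01", "v1")
    session.host_write("0A01", "v2")      # folded: only "v2" will be sent
    session.cycle_switch()
    session.transfer()
    session.cycle_switch()                # "v2" is now in the consistent apply set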
Figure 10 on page 51 shows the delta set switching process.

Figure 10 Delta set switching
(The figure legend summarizes the process: (1) the capture delta set collects application write I/O; (2) the primary waits for the minimum cycle time and for the transmit delta set to empty, tells the secondary to commit the receive delta set, halts SRDF transfer, switches capture to transmit, and makes a new capture delta set available for host I/O; (3) the secondary checks that the apply delta set has been restored, switches receive to apply, makes a new receive delta set available for SRDF transfer, acknowledges the primary, and begins restoring the apply delta set; (4) the primary receives the acknowledgement and SRDF transfer begins.)

SRDF/A Multi-Session Consistency (MSC)

SRDF/A Multi-Session Consistency (MSC) supports SRDF/A operations in configurations where there are multiple primary Symmetrix systems, or multiple primary Symmetrix system SRDF groups, connected to multiple secondary Symmetrix systems or secondary Symmetrix system SRDF groups. SRDF/A MSC configurations can also support mixed open systems and mainframe data controlled within the same SRDF/A MSC session. GDDR managed solutions using SRDF asynchronous replication require MSC because the interface to control asynchronous replication is provided via MSC.

Achieving data consistency across multiple SRDF/A groups requires that the cycle switch process be coordinated among the participating Symmetrix system SRDF groups or systems, and that the switch occur during a very brief time period when no host writes are being serviced by any participating Symmetrix system. SRDF control software running on the host provides a single coordination point to drive the cycle switch process in all participating Symmetrix systems.

I/Os are processed exactly the same way in SRDF/A MSC mode as they are in single session mode:

1. The active cycle on the primary Symmetrix system contains the current host writes, or N data version, in the capture delta set.

2. The inactive cycle contains the N-1 data version that is transferred using SRDF/A from the primary Symmetrix system to the secondary Symmetrix system. The primary inactive delta set is the transmit delta set and the secondary Symmetrix system's inactive delta set is the receive delta set.

3. The active cycle on the secondary Symmetrix system contains the N-2 data version in the apply delta set. This is the guaranteed dependent-write consistent image in the event of a disaster or failure.
Figure 11 on page 53 illustrates the delta sets and their relationships to SRDF/A cycles.

Figure 11 SRDF/A cycles and delta sets
(The figure shows multiple primary Symmetrix systems, each with capture (N, "active") and transmit (N-1, "inactive") delta sets, paired with multiple secondary Symmetrix systems, each with receive (N-1, "inactive") and apply (N-2, "active") delta sets.)

For the host to control the cycle switch process, the Symmetrix systems must be aware that they are running in multi-session consistency mode. This is done using the SRDF control software running on the host. The host software:

1. Coordinates the cycle switching for all SRDF/A sessions comprising the SRDF/A MSC configuration.

2. Monitors for any failure to propagate data to the secondary Symmetrix system devices and drops all SRDF/A sessions together to maintain dependent-write consistency.

3. Performs MSC cleanup if able.

As part of the process of entering MSC mode, and with each cycle switch issued thereafter, Enginuity assigns a cycle tag to each new capture cycle. That cycle tag is retained throughout that cycle's life. The cycle tag is a value that is common across all participating SRDF/A sessions and eliminates the need to synchronize cycle numbers across them. The cycle tag is the mechanism by which dependent-write consistency is assured across the multiple participating Symmetrix system SRDF groups or systems under MSC control.

SRDF/A MSC mode performs a coordinated cycle switch during a very short window of time referred to as the SRDF/A window, which is actually a flag (open/closed) indicating when no host writes are being completed. The SRDF/A window flag is an attribute of the SRDF/A group and is checked at the start of each I/O, imposing no additional overhead because the front-end director is already obtaining the cycle number from global memory as part of the existing SRDF/A cycle switch operations.

When the host software discovers that all the SRDF groups and Symmetrix systems are ready for a cycle switch, MSC "opens" the SRDF/A window and issues a single command to each SRDF group to perform a cycle switch. In multi-session mode, the front-end director obtains the cycle number at the start of each write and also checks the SRDF/A window flag; if the flag is on (an open window), the front-end director disconnects upon receiving the host write I/O and begins polling to determine when the SRDF MSC host control software has closed the window. While the window is open, any write I/Os that start are disconnected, and as a result no dependent-write I/Os are issued by any host to any devices in the SRDF/A MSC session.

The SRDF/A window remains open on each SRDF group and Symmetrix system until the last SRDF group and Symmetrix system in the multi-session group acknowledges to the SRDF MSC host control software that the open-and-switch command has been processed, indicating a successful cycle switch. At this point the SRDF MSC host control software issues a close command for each SRDF/A group under MSC control. As a result, dependent-write consistency across the SRDF/A MSC session is ensured.
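The open/switch/close sequence can be pictured with the following sketch. This is a conceptual model under assumed names, not the MSC implementation or its command interface; it shows only the ordering guarantees described above: all groups must be ready, writes arriving while the window is open are held, and the window is closed only after every group has acknowledged its cycle switch.

    # Illustrative sketch of the MSC coordinated cycle switch (the "SRDF/A
    # window"). Names and structure are assumptions for illustration.

    class SrdfGroup:
        def __init__(self, name):
            self.name = name
            self.window_open = False
            self.held_writes = []

        def ready_for_switch(self):
            # Placeholder: transmit delta set empty, minimum cycle target
            # time reached, and so on.
            return True

        def open_window_and_switch(self):
            self.window_open = True       # new writes are disconnected (held)
            # ... the per-group cycle switch would happen here ...
            return True                   # acknowledge the open-and-switch command

        def close_window(self):
            self.window_open = False      # held writes are redriven afterwards
            self.held_writes.clear()      # (the redrive itself is not modelled)

        def host_write(self, io):
            if self.window_open:
                self.held_writes.append(io)   # no dependent write can complete
            # otherwise the write enters the capture delta set as usual

    def msc_cycle_switch(groups):
        # 1. All participating groups must be ready before anything happens.
        if not all(g.ready_for_switch() for g in groups):
            return False
        # 2. Open the window and switch every group; while any window is open
        #    no dependent-write I/O completes anywhere in the MSC session.
        acks = [g.open_window_and_switch() for g in groups]
        # 3. Only after the last group acknowledges is the window closed.
        if all(acks):
            for g in groups:
                g.close_window()
            return True
        return False

    msc_cycle_switch([SrdfGroup("GRP01"), SrdfGroup("GRP02")])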
Concurrent SRDF

Enginuity version 5567 and later support the ability for a single primary device to be remotely mirrored to two secondary devices concurrently. This feature is called concurrent SRDF and is supported on Fibre Channel and Gigabit Ethernet (GigE) topologies. In concurrent SRDF, the primary (R11) devices must be assigned to two different SRDF groups, because each remote mirror can be assigned to only one SRDF group, even if the two secondary devices reside in the same Symmetrix system.

The secondary devices in a concurrent SRDF configuration can operate in synchronous, adaptive copy, or asynchronous mode. The only unsupported combination is for both secondary devices to operate in asynchronous mode. Concurrent SRDF/S with SRDF/A provides the ability to remotely mirror a group of devices in synchronous mode to one secondary site and in asynchronous mode to a different secondary site, which may be an extended-distance site.

Normal I/O operational rules for SRDF also apply to concurrent SRDF configurations. When operating in synchronous mode, ending status for an I/O is not presented to the host until the remote Symmetrix system acknowledges receipt of the I/O to the primary Symmetrix system. If both secondary devices are operating in synchronous mode, ending status is not presented to the host until both devices acknowledge receipt of the I/O. If one remote mirror is in synchronous mode and the other is in adaptive copy or asynchronous mode, ending status is presented to the host when the synchronous device acknowledges receipt of the I/O.

GDDR automation provides scripts that allow customers to convert from concurrent to cascaded mode, provided that the initial configuration is installed as cascaded mode.
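The ending-status rules described in this section reduce to a simple decision, sketched here for illustration. The mode names are simplified assumptions; the function merely restates which remote legs must acknowledge an I/O before ending status is presented to the host.

    # Sketch of the ending-status rules for concurrent SRDF described above.
    # Mode names are simplified assumptions ("sync", "async", "acp").

    def mirrors_to_wait_for(mode_b, mode_c):
        """Remote mirrors that must acknowledge an I/O before ending status
        is presented to the host."""
        if mode_b == "async" and mode_c == "async":
            raise ValueError("both legs asynchronous is unsupported")
        # Ending status waits for every synchronous leg; asynchronous and
        # adaptive copy legs do not delay it.
        return {site for site, mode in (("B", mode_b), ("C", mode_c))
                if mode == "sync"}

    print(mirrors_to_wait_for("sync", "async"))  # {'B'}
    print(mirrors_to_wait_for("sync", "sync"))   # {'B', 'C'}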
Cascaded SRDF

Cascaded SRDF is a three-site disaster recovery solution in which data from a primary site is synchronously replicated to a secondary site, and then asynchronously replicated from the secondary site to a tertiary site, as shown in Figure 12 on page 56. The core benefit of a cascaded configuration is its inherent capability to continue replicating, with minimal user intervention, from the secondary site to the tertiary site in the event that the primary site fails. This enables a faster recovery at the tertiary site, provided that the tertiary site is where the customer wishes to restart production operations.

Cascaded SRDF uses dual-role SRDF devices (R21 devices) at the secondary site, which act as both an R2 to the primary site and an R1 to the tertiary site, as shown in the following figure.

Figure 12 Cascaded SRDF configuration
(The host writes to the R1 device at the primary site, which is replicated with SRDF/S to the R21 device at the secondary site and from there with SRDF/A to the R2 device at the tertiary site.)

The following SRDF modes are allowed in cascaded SRDF:

◆ R1 -> R21: SRDF/S, SRDF/A, adaptive copy disk mode, adaptive copy write-pending mode
◆ R21 -> R2: SRDF/A, adaptive copy disk mode

The most common implementation is for the first hop to be in SRDF/S mode and the second hop to be in SRDF/A mode. The following limitations apply to cascaded SRDF:

◆ Only one hop (R1 -> R21 or R21 -> R2) can be asynchronous at a time.
◆ As in concurrent SRDF, the two SRDF mirrors of an R21 cannot be configured using the same SRDF group.

GDDR automation provides scripts that allow customers to convert from cascaded to concurrent mode, provided that the initial configuration is installed as cascaded mode.
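The allowed-mode and single-asynchronous-hop rules listed above can be expressed as a small validation routine. This sketch is illustrative only, with simplified mode names; it is not a configuration tool.

    # Sketch validating the cascaded SRDF mode rules listed above.

    R1_TO_R21 = {"sync", "async", "acp_disk", "acp_wp"}
    R21_TO_R2 = {"async", "acp_disk"}

    def validate_cascaded(first_hop, second_hop):
        if first_hop not in R1_TO_R21:
            raise ValueError("R1->R21 mode %r not allowed" % first_hop)
        if second_hop not in R21_TO_R2:
            raise ValueError("R21->R2 mode %r not allowed" % second_hop)
        if first_hop == "async" and second_hop == "async":
            raise ValueError("only one hop can be asynchronous at a time")

    validate_cascaded("sync", "async")      # the most common implementation
    validate_cascaded("acp_disk", "async")  # also permitted by the rules above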
SRDF/Extended Distance Protection

SRDF/Extended Distance Protection (EDP) is a three-site configuration that requires Enginuity 5874 or later running on the secondary Symmetrix system, and Enginuity 5773 or 5874 and later running on the primary and tertiary systems. Figure 13 on page 57 shows an example of a basic SRDF/EDP configuration.

SRDF/EDP is achieved through a cascaded SRDF setup in which a Symmetrix VMAX system at the secondary site uses diskless R21 devices to capture only the differential data that would be owed to the tertiary site in the event of a primary site failure.

Figure 13 SRDF/EDP basic configuration
(The host writes to the R1 device at the primary site (full copy), which is replicated with SRDF/S to a diskless R21 device at the secondary site (not a full copy; differences only) and with SRDF/A to the R2 device at the tertiary site (full copy).)

SRDF/EDP provides a long-distance replication solution with the ability to achieve a zero RPO at the tertiary site. It is also a lower-cost solution when compared with three-site solutions such as concurrent and cascaded SRDF.
EMC SRDF consistency groups

GDDR manages a single consistency group. Devices in the group can be in FBA (open systems) or CKD (mainframe) format. The design of ConGroup and GDDR allows customers to build HA and DR solutions around the concept of the enterprise. Because GDDR is installed outside of the scope of a Sysplex, GDDR controlled solutions can also span multiple Sysplex environments.

An SRDF consistency group is a collection of related Symmetrix devices that are configured to act in unison to maintain data integrity. The devices in a consistency group can be spread across multiple Symmetrix systems.

Consistency Groups for z/OS (ConGroup) is an SRDF product offering designed to ensure the dependent-write consistency of the data remotely mirrored by SRDF/S operations in the event of a rolling disaster.

Most applications, and in particular database management systems (DBMSs), have dependent-write logic embedded in them to ensure data integrity if a hardware or software failure occurs. ConGroup is based on the concept of dependent-write operations. A dependent-write is a write that is not issued by an application until a prior, related write I/O operation has completed. An example of dependent-write activity is a database update:

1. The DBMS writes to the disk containing the transaction log.
2. The DBMS writes the data to the actual database.
3. The DBMS writes again to the log volume to indicate that the database update was made.

In a remotely mirrored environment, data consistency cannot be ensured if one of the writes is remotely mirrored but its predecessor write was not. This could occur, for example, in a rolling disaster where a communication loss affects only a subset of the devices involved in the remote copy function.

ConGroup prevents a rolling disaster from affecting data integrity at the secondary site. When ConGroup detects any write I/O to a volume that cannot communicate with its R2 (secondary) mirror, it suspends remote mirroring for all volumes defined to the consistency group before completing the intercepted I/O and returning control to the application. This is referred to as a ConGroup trip. In this way, ConGroup ensures that a dependent-write consistent image is available at the secondary site by preventing a dependent-write I/O from reaching the secondary site if the previous I/O only got as far as the primary mirror.

I/O to the primary devices in the consistency group can still occur even when the devices are Not Ready on the SRDF links. Such updates are not immediately sent to the secondary site. However, they are propagated after the affected links are again operational and data transfer from the primary devices to the secondary devices resumes.

Assume an SRDF configuration consists of three Symmetrix systems with primary devices and two additional Symmetrix systems with secondary devices. The systems with primary devices send data to the systems with secondary devices, as shown in Figure 14 on page 59.

Figure 14 Primary and secondary relationships
(Primary systems 1, 2, and 3 replicate to secondary systems 1 and 2.)
Next, assume that the links between primary system 2 and secondary system 1 fail. Without a consistency group, primary systems 1 and 3 continue to write data to secondary systems 1 and 2 while primary system 2 does not, as shown in Figure 15 on page 60. The copy of the data spread across secondary systems 1 and 2 becomes inconsistent.

Figure 15 Failed links
(Primary systems 1 and 3 continue replicating to secondary systems 1 and 2; the links from primary system 2 to secondary system 1 have failed.)

However, if primary systems 1, 2, and 3 belong to a consistency group, as shown in Figure 16 on page 61, and the links between primary system 2 and secondary system 1 fail, the consistency group automatically stops primary systems 1 and 3 from sending data to secondary systems 1 and 2. Thus, the dependent-write consistency of the data spanning secondary systems 1 and 2 remains intact.
Figure 16 Primary systems 1, 2, and 3 in a consistency group
(The three primary systems are grouped in a single consistency group replicating to secondary systems 1 and 2.)
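The trip behavior described in this section can be summarized with the following conceptual sketch. It is illustrative only (volume names and structure are assumptions, and the queuing and later propagation of updates are not modelled); it shows why suspending the whole group before completing the intercepted write keeps the secondary image dependent-write consistent.

    # Conceptual sketch of a ConGroup "trip". When a write targets a volume
    # whose R2 mirror is unreachable, remote mirroring is suspended for every
    # volume in the group before that write completes, so no later dependent
    # write can reach the secondary ahead of its predecessor.

    class ConsistencyGroup:
        def __init__(self, volumes):
            self.link_ok = {v: True for v in volumes}   # R2 reachability
            self.mirroring_suspended = False            # True after a trip

        def link_failure(self, volume):
            self.link_ok[volume] = False

        def write(self, volume, data):
            if not self.mirroring_suspended and not self.link_ok[volume]:
                # Trip: suspend SRDF for ALL volumes in the group first ...
                self.mirroring_suspended = True
            # ... then complete the intercepted I/O locally and return control
            # to the application. Updates are propagated once the links are
            # operational again (not modelled here).
            return "I/O complete"

    group = ConsistencyGroup(["LOG1", "DB1"])   # hypothetical volume names
    group.link_failure("LOG1")
    group.write("LOG1", "log record")    # trips the group
    group.write("DB1", "table update")   # completes locally only; the secondary
                                         # image stays dependent-write consistent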
EMC AutoSwap

EMC AutoSwap provides the ability to move (swap) workload I/Os transparently from volumes in one set of Symmetrix storage systems to volumes in other Symmetrix storage systems without operational interruption. AutoSwap, with SRDF and EMC Consistency Groups, dramatically increases data availability.

AutoSwap differs from IBM HyperSwap in that AutoSwap is not confined to operating entirely within a Sysplex. AutoSwap can coordinate swaps for multiple monoplex or Sysplex systems in a single swap event.

Swaps may be initiated either manually as planned events or automatically as unplanned events (upon failure detection).

◆ Planned swaps facilitate operations such as non-disruptive building maintenance, power reconfiguration, DASD relocation, and channel path connectivity reorganization.

◆ Unplanned swaps protect systems against outages in a number of scenarios. Examples include power supply failures, building infrastructure faults, air conditioning problems, loss of channel connectivity, entire DASD system failures, operator error, or the consequences of intended or unintended fire suppression system discharge.
◆ In GDDR-managed environments, planned swap events are initiated and controlled by GDDR. Unplanned swap events are initiated and controlled by AutoSwap. GDDR monitors for unplanned swap events and, after a successful unplanned swap, reconfigures the surviving infrastructure to deal with the new source of write I/O activity.

In conjunction with EMC Consistency Groups, swaps are performed concurrently while application workloads continue. This protects data against unforeseen events and ensures that swaps are unique, atomic operations that maintain dependent-write consistency.
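The following is a rough conceptual sketch, in Python, of what makes a swap atomic from the application's point of view; all names and device numbers are invented and this does not represent the AutoSwap implementation or its interfaces. The point it illustrates is that the active device for every volume in the group changes in a single state transition, so I/O is never split between the pre-swap and post-swap sides.

# Hypothetical model of a group-wide swap; names are illustrative, not EMC APIs.

class SwapGroup:
    def __init__(self, pairs):
        # pairs: {volser: (device at DC1, device at DC2)}
        self.pairs = pairs
        self.active_side = 0       # 0 = DC1 devices, 1 = DC2 devices

    def active_device(self, volser):
        return self.pairs[volser][self.active_side]

    def swap(self, planned=True):
        # One state change covers every volume in the group, so applications
        # never see a mix of pre-swap and post-swap devices.
        self.active_side = 1 - self.active_side
        kind = "planned" if planned else "unplanned"
        print(kind, "swap complete; I/O now targets side", self.active_side)

group = SwapGroup({"PROD01": ("1000", "2000"), "PROD02": ("1001", "2001")})
print(group.active_device("PROD01"))   # 1000 (DC1 device)
group.swap(planned=True)               # e.g. for planned building maintenance
print(group.active_device("PROD01"))   # 2000 (DC2 device)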
  • 64. GDDR Managed Technologies EMC SRDF/Star SRDF/Star is a three-site disaster recovery solution consisting of a primary Symmetrix system (Site A) and two remote Symmetrix systems (Sites B and C). SRDF/Star provides advanced multisite business continuity replication that augments concurrent SRDF/S (synchronous) and SRDF/A (asynchronous) operations for the same primary volumes with the ability to incrementally establish or reestablish SRDF/A replication between the primary (R1) and remote (R2) sites in the event of a primary site outage. In mainframe environments, EMC GDDR is a mandatory requirement for all SRDF/Star deployments to automate site swap activity and/or disaster restart. In the event of a primary site outage, the EMC GDDR SRDF/Star solution allows customers to quickly move operations and re-establish protection between the remaining sites. Once conditions permit, customers can rejoin the primary site to the configuration, resuming SRDF/Star protection. SRDF/Star can operate in concurrent or cascaded environments, providing the appropriate differential synchronization. These environments address different recovery and availability objectives: ◆ Concurrent SRDF/Star positions the secondary site or the remote site as potential recovery sites, and provides differential resynchronization between the secondary and remote sites. To achieve this positioning, some level of reconfiguration intervention is required to access point-of-disaster data. ◆ Cascaded SRDF/Star positions only the remote site as the recovery site with minimal intervention to access point-of-disaster data. This solution differentially synchronizes data between the primary site and the remote site. ◆ SRDF/EDP can also be used in an SRDF/Star configuration. By design, SRDF/EDP only allows recovery at the tertiary site. SRDF/Star provides rapid reestablishment of cross-site protection in the event of primary site failure. Rather than a full resynchronization between the remote sites, SRDF/Star provides differential synchronization between the remote sites, and dramatically reduces the time it takes to remotely protect the new production site. SRDF/Star also provides a mechanism to determine which remote site has the most current data in the event of a rolling disaster that 64 GDDR Solution Design and Implementation Techniques
  • 65. GDDR Managed Technologies affects the primary site. In all cases, users maintain the ability to choose which site to operate from and which site's data to use when recovering from a primary site failure. The host-based Multi-Session Consistency (MSC) task at the primary (R1) site controls normal SRDF/Star operation. MSC performs session management at SRDF/S site B and when necessary at SRDF/A site C. The MSC session management task maintains the information needed to perform differential synchronization between site B and site C. Automation for SRDF/Star mainframe configuration management is provided by EMC GDDR. In Concurrent SRDF/Star operations, site B serves as the secondary site and the target of the SRDF/S links from site A. Site C serves as the tertiary site and the secondary site of the SRDF/A links from site A. The recovery links are the SRDF/A links between site C and site B. Figure 17 on page 66 shows a Concurrent SRDF/Star configuration. EMC SRDF/Star 65
Figure 17 Concurrent SRDF/Star configuration

SRDF/Star in a cascaded SRDF configuration has the ability to incrementally establish an SRDF/A session between the primary site and the asynchronous site in the event the synchronous site fails. With cascaded SRDF/Star, the synchronous secondary site is always more current than the asynchronous secondary site.
Figure 18 on page 67 shows a basic cascaded SRDF/Star configuration. Site B serves as the secondary site and the target of the SRDF/S links from site A. Site C serves as the out of region site and the target of the SRDF/A links from site B. The recovery SRDF/A links are between site C and site A.

Figure 18 Cascaded SRDF/Star configuration
Figure 19 on page 68 illustrates a cascaded SRDF/Star EDP diskless environment with the remote devices at the asynchronous secondary site. The diskless R21 device streamlines the link connections to the asynchronous site in cascaded mode. No data copies are available at the synchronous target site B because that site's Symmetrix systems use diskless, cache-only R21 devices.

Figure 19 Cascaded SRDF/Star with EDP (diskless) configuration
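As a very rough illustration of why differential synchronization shortens the re-protection window compared with a full resynchronization, consider the following sketch (Python, with invented track counts; it does not model how SRDF actually tracks invalid tracks): only the tracks that changed since the last point common to the two remote sites need to be copied.

# Rough comparison of full versus differential resynchronization.
# Track counts are invented; this does not model SRDF internals.

TOTAL_TRACKS = 1_000_000

def full_resync():
    # Copy every track from the new production site to the other remote site.
    return TOTAL_TRACKS

def differential_resync(changed_tracks):
    # Copy only the tracks changed since the last point common to both remote sites.
    return len(changed_tracks)

changed = set(range(2_500))   # tracks that differ between sites B and C after the outage
print("full resync copies:        ", full_resync(), "tracks")
print("differential resync copies:", differential_resync(changed), "tracks")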
  • 69. 4 EMC Geographically Dispersed Disaster Restart (EMC GDDR) This chapter contains the following: ◆ Supported configurations ................................................................. 72 ◆ EMC GDDR requirements ................................................................ 89 ◆ EMC GDDR fundamental concepts ................................................ 93 ◆ EMC GDDR components.................................................................. 98 ◆ EMC GDDR procedures.................................................................. 114 ◆ EMC GDDR supported scripts....................................................... 116 EMC Geographically Dispersed Disaster Restart (EMC GDDR) 69
  • 70. EMC Geographically Dispersed Disaster Restart (EMC GDDR) EMC Geographically Dispersed Disaster Restart (EMC GDDR) can be implemented in a variety of configurations involving two or three-sites, SRDF/S, SRDF/A, ConGroup, AutoSwap, SRDF/EDP, and SRDF/Star. In the mainframe environment, EMC GDDR is a requirement for a SRDF/Star configuration. EMC GDDR can manage environments that are comprised of the following elements: ◆ Multiple z/OS systems ◆ Multiple Sysplexes ◆ Multiple Symmetrix controllers ◆ Intermix of CKD and FBA/FBAM DASD and BCVs In each configuration, EMC GDDR provides specific capabilities tailored to that configuration. However, the major features of EMC GDDR are common across all topologies. Because EMC GDDR manages production systems following disasters, it does not reside on the same servers that it is seeking to protect. EMC GDDR resides on separate z/OS systems from the host servers that run your application workloads. EMC GDDR is installed on a control LPAR at each site. These control LPARs are referred to as GDDR nodes, Control Systems, or C-Systems. Each EMC GDDR node is aware of the other EMC GDDR nodes through network connections between each site. This multi-site awareness allows EMC GDDR to detect disasters and identify survivors. In two-site configurations, EMC GDDR can recover business at the surviving site. In three-site configurations, EMC GDDR can nominate a control LPAR to assume the leadership role for EMC GDDR and recover business at one of the surviving sites. To achieve the task of business restart, EMC GDDR automation extends well beyond the disk level and into the host operating system level where sufficient controls and access to third party software and hardware products exist to enable EMC GDDR to provide automated recovery capabilities. EMC GDDR can distinguish normal operational disruptions from disasters and respond accordingly. For example, EMC GDDR is able to distinguish between network outages (SRDF link drop) and real disasters. This awareness is achieved by periodic exchange of dual-direction heartbeats between the EMC GDDR C-Systems. EMC GDDR constantly checks for disaster situations and ensures that other GDDR systems are "healthy." This checking allows EMC GDDR to recognize, and act on, potential disaster situations, even if only one EMC GDDR C-system survives. "Split brain" problems associated 70 GDDR Solution Design and Implementation Techniques
  • 71. EMC Geographically Dispersed Disaster Restart (EMC GDDR) with cluster technologies are avoided through operator prompts. Upon the initial recognition stage, EMC GDDR issues messages to the operator console seeking confirmation of the event and, further, confirmation of restart actions required. In any EMC GDDR installation, one Control System (C-System) is designated as the Master C-System. In a two-site configuration, failure of the Master C-System will result in the remaining C-System becoming the Master. In a three-site configuration, if a local or regional disaster occurs EMC GDDR can determine which of the surviving sites will execute the recovery. Changes to EMC GDDR configuration information can only be made on the EMC GDDR Master Control System. EMC GDDR propagates these changes to the subordinate EMC GDDR C-system(s) using inter-system communications. Restart procedures following disasters are coordinated from the EMC GDDR Master C-System. EMC GDDR scripts, whether for planned or unplanned scenarios, can only be executed from the Master C-System. If necessary, the Master role can be manually transferred from one C-System to another through the EMC GDDR user interface. 71
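A simplified, hypothetical sketch of this kind of health checking is shown below (Python; the class, intervals, thresholds, and decision logic are invented for illustration and are not GDDR code). It shows the general idea: a missing heartbeat that coincides with an SRDF link drop is first treated as a potential communications problem, while a sustained loss raises the missing-heartbeat event and leads to an operator prompt rather than an automatic takeover.

# Hypothetical sketch of C-System heartbeat checking; names, intervals and
# thresholds are invented for illustration.
import time

HEARTBEAT_INTERVAL = 30    # seconds between heartbeats (assumed value)
MISSING_THRESHOLD = 3      # missed intervals before raising an event

class CSystemMonitor:
    def __init__(self, peer_sites):
        self.last_seen = {site: time.time() for site in peer_sites}

    def receive_heartbeat(self, site):
        self.last_seen[site] = time.time()

    def check_peers(self, srdf_link_down):
        """Distinguish a likely network problem from a potential site loss."""
        now = time.time()
        for site, seen in self.last_seen.items():
            missed = (now - seen) / HEARTBEAT_INTERVAL
            if missed < MISSING_THRESHOLD:
                continue
            if srdf_link_down and missed < 2 * MISSING_THRESHOLD:
                # Heartbeat and SRDF lost together and only recently: this may
                # still be a communications outage, so keep monitoring.
                print(site, ": possible network outage, continuing to monitor")
            else:
                # Sustained heartbeat loss: raise the missing-heartbeat event and
                # prompt the operator (WTOR) before any restart action, which is
                # how "split brain" takeovers are avoided.
                print(site, ": MHB event raised, prompting operator for confirmation")

monitor = CSystemMonitor(["DC2", "DC3"])
monitor.receive_heartbeat("DC2")
monitor.last_seen["DC3"] -= 10 * HEARTBEAT_INTERVAL   # simulate a silent DC3
monitor.check_peers(srdf_link_down=False)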
  • 72. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Supported configurations EMC GDDR is available to monitor and manage the following configurations and topologies: Two-site solutions: ◆ SRDF/S with ConGroup — The two-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2. ◆ SRDF/S with AutoSwap — The two-site SRDF/S with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2. ◆ SRDF/A — The two-site SRDF/A configuration provides disaster restart capabilities at site DC3. Three-site solutions: ◆ SRDF/Star — The three-site SRDF/Star configuration provides disaster restart capabilities at either the Secondary DASD site or DC3. Concurrent and Cascaded SRDF support further minimize the DC3 recovery time objective. R22 support is available, removing the need to create device pairs during GDDR scripts. ◆ SRDF/Star with AutoSwap — The three-site SRDF/Star with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2 as well as disaster restart capabilities at DC3. Concurrent and Cascaded SRDF support further minimize the DC3 recovery time objective. R22 support is available, removing the need to create device pairs during GDDR scripts. ◆ SRDF/Star with EDP. EMC GDDR functionality, controlled by a parameter library, can be customized to operate in any of these two or three-site configurations. During EMC GDDR implementations, the EMC GDDR parameter library is customized to reflect: ◆ The prerequisite software stack. ◆ The desired data center topology (two-site versus three-site, synchronous or asynchronous). 72 GDDR Solution Design and Implementation Techniques
  • 73. EMC Geographically Dispersed Disaster Restart (EMC GDDR) An EMC GDDR complex consists of EMC GDDR control systems (C-Systems), the z/OS and open systems hosts, and EMC Symmetrix storage systems which support an organization's mission-critical workload. A single GDDR complex is able to support multiple Sysplexes. SRDF/S with ConGroup The two-site SRDF/S with ConGroup configuration provides disaster restart capabilities at site DC2. Figure 20 on page 73 illustrates EMC GDDR operation in the SRDF/S with Consistency Group environment. DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R2 SRDF/S GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002226 Figure 20 SRDF/S with ConGroup environment As Figure 20 on page 73 shows, the relationship between the DC1 and DC2 sites is maintained through SRDF/S replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 20 on page 73 also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. The C-system at the Primary DASD site has EMC Consistency Group software installed. It is recommended that one or more of the Primary DASD site production z/OS systems also has EMC Consistency Supported configurations 73
  • 74. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Group software installed. SRDF/S and ConGroup ensure that at the point that EMC GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved. In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Restart processing at the secondary site following unplanned primary site events ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/S • ConGroup trip • Inter-site communication failure EMC GDDR SRDF/S complex Each GDDR complex manages one consistency group. A consistency group is a named group of source (R1) volumes managed by the EMC Consistency Group (ConGroup) application as a unit. The volumes can be any mix of FBA and CKD devices on multiple Symmetrix units supporting a system, a SYSPLEX, or multiple SYSPLEXs, as well as "Open systems" hosts (Windows and various Unix flavors). Figure 21 on page 75 depicts a logical view of a typical EMC GDDR complex. 74 GDDR Solution Design and Implementation Techniques
  • 75. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary site Secondary site DC1 DC2 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C2 Sysplex #1 Sysplex #1 Master C SRDF/S Local R1 R2 Local BCV BCV SYM-002219 Figure 21 EMC GDDR complex with SRDF/S The following are brief descriptions of the components that comprise this GDDR complex: BCVs — BCVs (Business Continuance Volumes) can be supported at each of the sites. They may be established at the DC2 site and split at the DC1 site. BCVs taken at the recovery site are often referred to as "Gold Copy" devices. C1 and C2 — C1 and C2 are the EMC GDDR Control LPARs (or C-Systems) at each of the sites. Primary site — The primary site is the site where the production workload is located. Primary DASD site — The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is the same as the secondary site. Sites DC1 and DC2 — Sites DC1 and DC2 are the primary and secondary data centers of critical production applications and data. DC1 is the primary site, with SRDF/S data replication to the secondary site, DC2. These sites are considered fully equivalent for Supported configurations 75
  • 76. EMC Geographically Dispersed Disaster Restart (EMC GDDR) strategic production applications, connected with highly redundant direct network links. At all times, all production data is replicated synchronously between the two-sites. SRDF/S with AutoSwap The two-site SRDF/S with AutoSwap configuration provides for near-continuous availability through device failover between DC1 and DC2. Figure 22 on page 76 illustrates EMC GDDR operation in the SRDF/S with AutoSwap environment. DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R2 SRDF/S GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002225 Figure 22 SRDF/S with AutoSwap environment As Figure 22 on page 76 shows, the relationship between the DC1 and DC2 sites is maintained through SRDF/S replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. Figure 22 on page 76 also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs as well as both C-systems has EMC AutoSwap and EMC Consistency Group (ConGroup) software installed. AutoSwap and ConGroup ensure that a point of consistency exists whenever EMC GDDR receives notification of an unplanned or failure event. 76 GDDR Solution Design and Implementation Techniques
  • 77. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/S • ConGroup trip • Inter-site communication failure • AutoSwap events ◆ Configure/reconfigure • Couple datasets ◆ Manage coupling facilities • Policies Supported configurations 77
  • 78. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/A The two-site SRDF/A configuration provides disaster restart capabilities at site DC3. In case DC3 was the Primary DASD site, GDDR offers disaster restart capabilities at DC1. Figure 23 on page 78 illustrates EMC GDDR operation in the SRDF/A environment. DC1 DC3 EMC EMC GDDR GDDR R1 R2 SRDF/A GDDR heartbeat communication Active Escon/Ficon channels Standby Escon/Ficon channels Active SRDF links SYM-002224 Figure 23 SRDF/A environment As Figure 23 on page 78 shows, the relationship between the DC1 and DC3 sites is maintained through SRDF/A replication of primary disk images from DC1 to DC3. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. It also shows the two EMC GDDR C-Systems with their heartbeat communication paths, separate from the production disk and computer facilities. EMC GDDR does not have a requirement to "freeze" I/O to obtain a point of consistency. Multi-Session Consistency and SRDF/A provide the mechanism to obtain a point of consistency. At the point that EMC GDDR receives notification of an unplanned or failure event, a point of consistency is already achieved through these foundation technologies. 78 GDDR Solution Design and Implementation Techniques
  • 79. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can do the following: ◆ Manage planned site swaps ◆ Restart processing at the secondary site following unplanned primary site events ◆ Perform standard operational tasks • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events • Sites • Systems • Loss of SRDF/A • Inter-site communication failure EMC GDDR SRDF/A complex Each GDDR complex can manage one Multi-Session Consistency (MSC) group. An MSC group is a named group, consisting of multiple RDF groups operating in SRDF/A mode, managed by the EMC MSC control software feature as a single unit. These groups can have any mix of CKD and FBA devices, on multiple Symmetrix units supporting a system, a SYSPLEX, or multiple SYSPLEXs as well as "Open systems" hosts (Windows and various flavours of Unix). Figure 24 on page 79 depicts a logical view of a typical EMC GDDR complex. Primary site Secondary site DC1 DC3 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C3 Sysplex #1 Sysplex #1 Master C SRDF/A Local R1 R2 Local BCV BCV SYM002218 Figure 24 GDDR complex with SRDF/A Supported configurations 79
  • 80. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The following are brief descriptions of the components that comprise this GDDR complex: BCVs — Business Continuance Volumes can be supported at each of the sites. They may be established at the DC3 site and split at the DC1 site. C1 and C3 — C1 and C3 are the EMC GDDR Control LPARs (or C-Systems) at each of the sites. Primary site —The primary site is the site where the production workload is located. Primary DASD site —The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is the same as the secondary site. Sites DC1 and DC3 — Sites DC1 and DC3 are the primary and secondary data centers of critical production applications and data. DC1 is the primary site, with SRDF/A data replication to the secondary site, DC3. These sites are considered fully equivalent for strategic production applications, connected with highly redundant direct network links. At all times, all production data is replicated asynchronously between the two-sites. 80 GDDR Solution Design and Implementation Techniques
  • 81. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/Star Configuration The three-site SRDF/Star configuration provides disaster restart capabilities at DC2 or DC3. Figure 25 on page 81 illustrates EMC GDDR operation in a concurrent SRDF/Star environment. Figure 26 on page 82 illustrates EMC GDDR operation in a cascaded SRDF/Star environment. DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R2 SRDF/S DC3 SRDF/A R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002223 Figure 25 Concurrent SRDF/Star environment Supported configurations 81
  • 82. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 EMC EMC GDDR GDDR ConGroup ConGroup R1 R21 SRDF/S SRDF/A DC3 R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002221 Figure 26 Cascaded SRDF/Star environment As Figure 25 on page 81 and Figure 26 on page 82 show, the relationship between the DC1 and DC2 sites is maintained through SRDF/Synchronous replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. In a concurrent configuration, the asynchronous relationship is between DC1 and DC3, while in a cascaded environment, the asynchronous relationship is between DC2 and DC3. Figure 25 on page 81 and Figure 26 on page 82 also show the three EMC GDDR C-Systems with their independent heartbeat communication paths, separate from the production disk and computer facilities. The C-system at the Primary DASD site has EMC Consistency Group software installed. It is recommended that one or 82 GDDR Solution Design and Implementation Techniques
  • 83. EMC Geographically Dispersed Disaster Restart (EMC GDDR) more of the Primary DASD site production z/OS systems also has EMC Consistency Group software installed. During relevant GDDR scripts, GDDR will ensure the Consistency Group software is started where needed. In this environment, EMC GDDR can perform the following tasks: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Manage reconfiguration of the SRDF/Star environment between concurrent and cascaded topologies ◆ Manage reconfiguration of the SRDF/Star environment from cascaded to concurrent with a primary processing site move ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events, including: • Sites • Systems • ConGroup trip • Loss of SRDF/S • Loss of SRDF/A • Inter-site communication failure Supported configurations 83
  • 84. EMC Geographically Dispersed Disaster Restart (EMC GDDR) EMC GDDR SRDF/Star complex Figure 27 on page 84 depicts a logical view of a typical EMC GDDR complex in a concurrent configuration. Primary site Secondary site DC1 DC2 Sysplex #n Sysplex #n Sysplex #2 Sysplex #2 C1 C2 Sysplex #1 Sysplex #1 Master C SRDF/S Local R1 R2 Local BCV BCV SRDF/A DC3 C3 R2 Local BCV Tertiary site SYM-002220 Figure 27 GDDR Complex with Star The following are brief descriptions of the components that comprise this GDDR complex: BCVs — BCVs (Business Continuance Volumes) can be supported at each of the sites. Primary site — The primary site is the site where the production workload is located. 84 GDDR Solution Design and Implementation Techniques
  • 85. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary DASD site — The primary DASD (direct access storage device) site is the site where the source (R1) DASD is located. The primary DASD site is normally the same as the primary site. Secondary site — The secondary site is the site where the contingency or standby systems are located. Secondary DASD site — The secondary DASD site is the site where the target (R2) DASD is located. The secondary DASD site is normally the same as the secondary site. Sites DC1, DC2, and DC3 — Site DC1, DC2, and DC3 are used together in an SRDF/Star data replication three-site topology, where DC1 is the primary site with concurrent or cascaded replication to the secondary site DC2 and the tertiary site DC3. Site DC3 is the tertiary data center for critical production applications and data. It is connected with redundant network to both DC1 and DC2. Data is replicated asynchronously from the current primary DASD site or secondary DASD site (with cascaded SRDF) with an intended recovery point objective (RPO) in a short time period. SRDF/Star with AutoSwap The three-site SRDF/Star with AutoSwap configuration provides for near-continuous availability through device failover between sites DC1 and DC2 as well as disaster restart capabilities at site DC3. Figure 28 on page 86 illustrates EMC GDDR operation in a Concurrent SRDF/Star with AutoSwap environment. Figure 29 on page 87 illustrates EMC GDDR operation in a Cascaded SRDF/Star with AutoSwap environment. Supported configurations 85
  • 86. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R2 SRDF/S DC3 SRDF/A R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002228 Figure 28 Concurrent SRDF/Star with AutoSwap environment 86 GDDR Solution Design and Implementation Techniques
  • 87. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC1 DC2 AutoSwap EMC EMC GDDR GDDR AutoSwap AutoSwap R1 R21 SRDF/S SRDF/A DC3 R2 GDDR heartbeat communication Active Escon/Ficon channels EMC Standby Escon/Ficon channels GDDR Active SRDF links SRDF links in standby mode SYM-002222 Figure 29 Cascaded SRDF/Star with AutoSwap environment As Figure 28 on page 86 and Figure 29 on page 87 show, the relationship between the DC1 and DC2 sites is maintained through SRDF/Synchronous replication of primary disk images at DC1 to DC2. Both open systems (FBA) and mainframe (CKD) disk images can be replicated. In a concurrent configuration, the asynchronous relationship is between DC1 and DC3, while in a cascaded environment, the asynchronous relationship is between DC2 and DC3. These illustrations show the three EMC GDDR C-Systems with their independent heartbeat communication paths, separate from the production disk and computer facilities. Each of the DC1 and DC2 production z/OS LPARs has EMC AutoSwap and EMC Consistency Group (ConGroup) installed. Supported configurations 87
  • 88. EMC Geographically Dispersed Disaster Restart (EMC GDDR) In this environment, EMC GDDR can perform the following tasks: ◆ Manage planned site swaps ◆ Manage recovery after unplanned site swaps ◆ Manage reconfiguration of the SRDF/Star environment between the concurrent and cascaded topologies ◆ Manage reconfiguration of the SRDF/Star environment from cascaded to concurrent with a primary processing site move ◆ Perform standard operational tasks: • IPL, system reset, activate, deactivate • Trigger stop/start of business workloads ◆ Actively monitor for unplanned/failure events, including: • Sites • Systems • ConGroup trip • Loss of SRDF/S • Loss of SRDF/A • Inter-site communication failure • AutoSwap events ◆ Configure/reconfigure: • Couple datasets ◆ Manage coupling facilities: • Policies 88 GDDR Solution Design and Implementation Techniques
EMC GDDR requirements

Mainframe environment requirements

The basic infrastructure must support the specific EMC technologies involved in the configuration (for example, SRDF/S with AutoSwap or SRDF/A). In addition, EMC GDDR has the following specific infrastructure requirements:

◆ One LPAR is required for each C-System, preferably using DASD isolated from the managed environment.
◆ There must be network connectivity between all C-Systems.
◆ An HMC (Hardware Management Console) must be available at each site that can be accessed from each C-System (access to these HMCs can be protected by means of a private VLAN).

EMC GDDR has the mainframe environment requirements listed in Table 1 on page 89.

Table 1 Mainframe environment requirements

Item                               Requirements
Processor hardware configuration   Any system that supports current IBM mainframe operating systems
DASD hardware configuration        Any supported Symmetrix DASD model at an Enginuity microcode level specified in the EMC GDDR Release Notes
Software                           Any currently supported IBM operating system

EMC GDDR supports and can manage the following combinations of DASD in a single Enterprise Consistency Group:

◆ Single EMC Symmetrix controllers configured with any of the following:
• All CKD devices
• All FBA and FBA-META devices
• Any combination of CKD, FBA and FBA-META devices
◆ Multiple EMC Symmetrix controllers configured with any of the following:
• All CKD devices
• All FBA and FBA-META devices
• Any combination of CKD, FBA and FBA-META devices

Management and monitoring of both CKD and FBA/FBA-META devices is performed from the z/OS platform where the EMC GDDR application resides. From the EMC GDDR point of view, CKD and FBA/FBA-META Symmetrix devices are the same; that is, each is treated no differently than the other. They are all command targets of SRDF Host Component configuration commands using local or remote syntax.

EMC GDDR requires that if even one device in an RDF group is defined to GDDR, then all devices in that group must be defined to GDDR. Most GDDR actions are directed at the RDF group level (although in some cases, GDDR will act on device ranges if that is appropriate).

EMC GDDR has no limitations on the number of EMC Symmetrix controllers/devices that can be managed. Any limitations are subject to restrictions in EMC hardware and software.

Minimum hardware requirements

Table 2 on page 90 describes the recommended minimum processor and I/O configuration for an EMC GDDR C-System.

Table 2 Minimum hardware requirements

Item                                       Requirements
Logical processors                         1 (2 are recommended)
MSU                                        15 on IBM 2084-306 (or equivalent)
Storage                                    512 MB
Logical paths to own local DASD devices    4
Logical paths to managed DASD devices      4

Minimum software requirements

The minimum software requirements for EMC GDDR for the various supported configurations are listed in the following sections. The EMC GDDR Release Notes provide the latest information regarding
  • 91. EMC Geographically Dispersed Disaster Restart (EMC GDDR) supported software release levels. Installation procedures for the EMC software products listed can be found in the EMC Mainframe Enablers Installation and Customization Guide. Table 3 Software requirements Configuration HMC API or BCPii MFE Consistency AutoSwap support Group SRDF/S with ConGroup    SRDF/S with AutoSwap     SRDF/A   SRDF/Star    SRDF/Star with AutoSwap     Additional configuration requirements The following sections list additional configuration requirements for the supported EMC GDDR configurations. SRDF/S with ConGroup Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/S environment. SRDF/S with AutoSwap Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/S environment. ◆ SRDF/S with AutoSwap has the following additional requirements: ◆ CAX protection must be added to the SRDF/S-defined ConGroups. ◆ LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified. The EMC Consistency Group for z/OS Product Guide and EMC AutoSwap Product Guide provide information on these items. SRDF/A Refer to the EMC SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/A environment. EMC GDDR is compatible with SRDF Automated Recovery functionality. EMC GDDR requirements 91
  • 92. EMC Geographically Dispersed Disaster Restart (EMC GDDR) SRDF/A MSC has the following additional gatekeeper requirement: ◆ There must be one or more gatekeeper devices for each MSC-controlled RDF group. These gatekeeper devices must be in OS configuration as OFFLINE at IPL as regular local devices (not BCV, SRDF, SAV, and so forth). SRDF/Star Refer to the SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/Star environment. SRDF/Star with AutoSwap Refer to the SRDF Host Component for z/OS Product Guide for information on configuring an SRDF/Star environment. SRDF/Star with AutoSwap has the following additional requirements: ◆ CAX protection must be added to the SRDF/Star-defined ConGroup. ◆ LOSTOWNERPOLICY ONSWAP=OPERATOR must be specified. The EMC Consistency Group for z/OS Product Guide and EMC AutoSwap Product Guide provide information on these items. In addition, there must be one or more gatekeeper devices for each MSC-controlled RDF group. These gatekeeper devices must be: ◆ SRDF/S-protected (you can choose any available SRDF/S RDF group) ◆ ConGroup and CAX-protected ◆ Not SRDF/A-protected ◆ If CKD, in OS configuration as ONLINE at IPL ◆ If FBA, in OS configuration as OFFLINE at IPL 92 GDDR Solution Design and Implementation Techniques
EMC GDDR fundamental concepts

EMC GDDR control systems

The EMC GDDR control systems are more commonly referred to as EMC GDDR C-Systems. One EMC GDDR C-System is located at each site in a separate z/OS system. Each EMC GDDR C-System runs in monoplex mode from local DASD. EMC GDDR C-Systems do not run any production workload. EMC recommends that GDDR is installed on dedicated C-System DASD on separate controllers from the production DASD. Deploying GDDR C-Systems as dedicated z/OS systems ensures their availability for both planned and unplanned events affecting application and/or site operations. Isolation of the GDDR C-Systems from the managed sysplex(es) ensures GDDR operational isolation and provides capabilities that other automated business continuity solutions cannot offer.

The main functions of an EMC GDDR C-System are to:

◆ Control the recovery after an outage
◆ Control a planned site swap

One of the C-Systems is designated as the Master C-System. During normal operations, the Master C-System is the central control point for all EMC GDDR activities. The Master C-System is located at the secondary DASD site in configurations with AutoSwap and at the primary DASD site in other configurations. All EMC GDDR C-Systems are potential candidates to take over as the Master C-System; if the Master C-System becomes unavailable for some reason, a C-System at another location/site assumes the EMC GDDR master function ownership. This way, either the Master C-System survives a site loss, or the Master role is automatically transferred to a surviving site. When the original Master C-System becomes available, the master function ownership automatically transfers back to the correct location.

Some EMC GDDR functions can only be carried out by the Master C-System, for example:

◆ Running planned processes
◆ Updating EMC GDDR parameters
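The Master C-System ownership behavior described above can be summarized with a small hypothetical sketch (Python; the class, method names, and site names used here are invented): when the Master becomes unavailable another C-System assumes the role, and when the preferred Master returns the role transfers back.

# Hypothetical sketch of Master C-System ownership handling; not EMC code.

class GDDRComplex:
    def __init__(self, c_systems, preferred_master):
        self.available = {name: True for name in c_systems}
        self.preferred_master = preferred_master
        self.master = preferred_master

    def mark_unavailable(self, name):
        self.available[name] = False
        if name == self.master:
            # A surviving C-System assumes the Master role so that planned
            # and takeover scripts can still be run.
            survivors = [c for c, up in self.available.items() if up]
            self.master = survivors[0]
            print("Master role assumed by", self.master)

    def mark_available(self, name):
        self.available[name] = True
        if name == self.preferred_master and self.master != name:
            # When the original Master C-System returns, ownership transfers
            # back to the preferred location.
            self.master = name
            print("Master role returned to", self.master)

complex_ = GDDRComplex(["C1", "C2", "C3"], preferred_master="C1")
complex_.mark_unavailable("C1")   # Master role assumed by C2
complex_.mark_available("C1")     # Master role returned to C1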
  • 94. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Workload location In an EMC GDDR complex without AutoSwap, the business or production workload runs at a single site; that is, one side of the Sysplex. This is the same location as the primary DASD site. In an EMC GDDR complex with AutoSwap, the business or production workload can run as either a single site workload, usually at the same location as the primary DASD site, or as a multi-site workload, where the production workload runs at both the primary and secondary sites. In an EMC GDDR complex with an SRDF/A two-site configuration the production systems and primary DASD must always be at the same site. A contingency or standby system is a system that provides backup to a production system. The characteristics of a contingency system in an EMC GDDR installation depend on the configuration. Contingency or standby systems are typically located at the same location as the secondary DASD. Multiple locations containing contingency or standby systems may be used to increase availability and provide disaster restart options. Regional contingency systems are typically located in the same location as the secondary DASD, while out-of-region standby systems provide protection from geographic and infrastructure exposures that may negatively impact the primary and secondary sites. In an SRDF/S two-site configuration, a contingency system normally provides a hot backup to the production system and is typically located at the same location as the secondary DASD. The contingency system: ◆ Is in the same Sysplex as its production system partner ◆ Is IPLed, but runs no business workload In an SRDF/A two-site configuration, a contingency or standby system is a system that replaces production system capacity in the event of a loss of use of the primary site. It is located at an appropriate distance from the primary systems to minimize risks from geographic and infrastructure exposures which may negatively impact primary systems availability. The contingency system: ◆ May be used for expendable workloads which are displaced by business workload following the loss of a primary site. 94 GDDR Solution Design and Implementation Techniques
  • 95. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ May be cold (not powered up), or warm (powered up but not IPLed) systems reserved for business workload restart and testing of restart processes, but not in support of any meaningful day-to-day workload. A contingency or standby system in a SRDF/Star configuration normally provides a hot backup to a production system. A contingency system: ◆ Is in the same Sysplex as its production system partner ◆ Is IPLed, but runs no business workload ◆ May be an out-of-region location equipped with idle processor capacity reserved for support of business workload restarts Any production or contingency/standby system defined to EMC GDDR is known as an EMC GDDR managed system. Managed workloads EMC GDDR can trigger the stop and restart of production workloads on: ◆ z/OS systems ◆ Distributed systems External workloads External workloads run on mainframe systems which do not have their DASD in the managed Symmetrix units. EMC GDDR can coordinate stop and start of the workload on these "non-managed" mainframe systems with the workload stop and start actions for managed systems. Excluded systems Excluded systems apply to GDDR configurations with AutoSwap only. Symmetrix arrays can be shared by multiple systems and some of those systems may not require EMC GDDR protection. EMC GDDR can be configured to exclude certain z/OS systems from workload management - although these systems have their DASD in the managed Symmetrix arrays. Because of this, these systems must be running AutoSwap, or else they would fail during GDDR managed site swaps. HMC-Bypass option Applies at the LPAR-level or the Site level and prevents GDDR from performing ANY action that would require HMC access for the relevant site or LPAR. Goal of this option is to allow GDDR to be deployed at sites under a third-party Facilities Management contract, where the third-party has exclusive access to the HMC, and this HMC-access by individual customers using the provided facilities is not allowed. EMC GDDR fundamental concepts 95
  • 96. EMC Geographically Dispersed Disaster Restart (EMC GDDR) HMC-only systems EMC GDDR can be configured to restrict IPL and CBU (Capacity Backup Upgrade) actions for certain systems to the online interface. No other actions or automation are performed for these systems. EMC GDDR processes An EMC GDDR process or script is a dynamically determined sequence of function calls to achieve a predetermined result. Generally one function call corresponds to one action. An EMC GDDR process is executed by calling EMC GDDR provided routines, either from a batch job or as a result of specific messages being issued. There are two types of EMC GDDR processes: Planned process An EMC GDDR planned process is initiated through the EMC GDDR interface to perform a planned task. Unplanned process or Takeover process The EMC GDDR unplanned process or takeover process can only be initiated following an error that results in a possible takeover situation. Takeover processes are initiated as a result of certain messages being issued or specific events occurring. The messages or events that trigger an unplanned or takeover process can originate on any system, either a C-System or a production system. In response to a trigger event the current Master C-System, following operator confirmation of any trigger event, will invoke the necessary processes based on the current EMC GDDR complex and managed system configurations. Processes are invoked following operator confirmation of any of the following types of failure or loss: ◆ Sites ◆ DASD ◆ Systems ◆ Loss of SRDF link ◆ Loss of host channels EMC GDDR process restart In the event an EMC GDDR process does not complete properly the return codes from the function calls that make up an EMC GDDR process are saved in EMC GDDR global variables. For functions that issue EMC SRDF Host Component commands, the return code of the 96 GDDR Solution Design and Implementation Techniques
commands are also saved. If multiple commands are issued from one function, the return codes from each command are saved in EMC GDDR global variables. After the cause of the original failure has been identified and resolved, the EMC GDDR process can be rerun. EMC GDDR uses the saved return codes to establish the point of restart; that is, the point of the previous failure. This ensures that no modifications to the supplied EMC GDDR process jobs are required in order to rerun after a failure. If the underlying problem that caused a GDDR function to fail has been resolved in a way that makes the GDDR function call unnecessary, the GDDR interface provides an easy way to skip the failed function call. If necessary, restart can be forced at a particular step by manually setting the EMC GDDR system variables appropriately.
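The restart mechanism lends itself to a short illustration. The sketch below is hypothetical Python (the step names, return codes, and the dictionary standing in for GDDR global variables are all invented): steps whose saved return code is zero are bypassed on a rerun, the script resumes at the failed step, and a failed step can be explicitly skipped when the underlying problem has been resolved another way.

# Hypothetical sketch of script restart driven by saved return codes.
# The dictionary stands in for GDDR global variables; all names are invented.

saved_rc = {}    # step name -> last return code (the restart checkpoint)

def run_script(steps, skip=()):
    for name, func in steps:
        if saved_rc.get(name) == 0:
            continue                          # already completed on a previous run
        if name in skip:
            print(name, "skipped by operator request")
            saved_rc[name] = 0
            continue
        rc = func()
        saved_rc[name] = rc                   # persist the step's return code
        if rc != 0:
            print(name, "failed with RC =", rc, "- fix the problem and rerun")
            return False
    print("script complete")
    return True

attempts = {"SWAP_DASD": 0}

def suspend_srdf():
    return 0

def swap_dasd():
    attempts["SWAP_DASD"] += 1
    return 8 if attempts["SWAP_DASD"] == 1 else 0   # fails only on the first attempt

def restart_workload():
    return 0

steps = [("SUSPEND_SRDF", suspend_srdf),
         ("SWAP_DASD", swap_dasd),
         ("RESTART_WORKLOAD", restart_workload)]

run_script(steps)   # stops at SWAP_DASD (RC = 8)
run_script(steps)   # restarts at SWAP_DASD and then completes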
EMC GDDR components

The EMC GDDR software environment includes many components controlled by a parameter library. Parameters define the EMC GDDR managed configuration. Some parameters are maintained in external parmlib members, but most are maintained via the EMC GDDR user interface and loaded into EMC GDDR global variables by the parameter activation process. The user interface is an ISPF application available only on the EMC GDDR C-Systems. There are events in which EMC GDDR is interested, and message rules determine the actions to be taken in response to those events. Most GDDR procedures run as started tasks and include EMC foundation products such as ResourcePak Base and SRDF Host Component.

Parameters

Internal parameters

The GDDR Parameter Wizard is the control interface for managing GDDR parameters. It should only be used on the C-System which is intended to be the Master C-System for the managed configuration. During the activation process, the new parameters are loaded into EMC GDDR global variables on the Master C-System and propagated to the remaining C-Systems using GDDR peer-to-peer communication.
  • 99. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The basic process of updating EMC GDDR parameters consists of the following steps: ◆ Back up the existing parameters. The parameter management process uses an existing backup of parameter values as the starting point for subsequent updates. ◆ Select the backup as input. ◆ Modify parameters as required. The updates are captured in PDS members in a work dataset. ◆ Validate the parameter set. This is done via a batch job submitted from the user interface. ◆ Activate the parameter set. This is also a batch job submitted from the user interface. User interface The EMC GDDR user interface is an ISPF application. It is used only on the EMC GDDR C-Systems. System components The EMC GDDR SAMPLIB contains an exec called GDDREXC that can be customized to allocate the required EMC GDDR and libraries. Once customized, this exec can be integrated into a site's ISPF panels in accordance with site standards. EMC GDDR panels will display only the options the user is authorized to use. The GDDR ISPF user interface does not currently support operating in both screens when in split screen mode. EMC GDDR components 99
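Tying together the parameter-maintenance steps listed earlier on this page (back up, select the backup, modify, validate, activate), the following hypothetical sketch outlines the flow in Python; the function and parameter names are invented, and in the product these steps are ISPF panels and batch jobs rather than code.

# Hypothetical outline of the parameter update flow on the Master C-System.
# Function names are invented; in the product these are panels and batch jobs.
import copy

def update_parameters(current_params, changes, validate, activate):
    backup = copy.deepcopy(current_params)   # 1. back up the existing parameters
    work = copy.deepcopy(backup)             # 2. use the backup as the work copy
    work.update(changes)                     # 3. modify parameters as required
    problems = validate(work)                # 4. validation batch job
    if problems:
        raise ValueError("validation failed: " + "; ".join(problems))
    activate(work)                           # 5. activation loads the global variables
    return work                              #    and propagates them to the other C-Systems

def validate(params):
    return [] if params.get("PRIMARY_SITE") in ("DC1", "DC2") else ["invalid PRIMARY_SITE"]

def activate(params):
    for peer in ("C2", "C3"):
        print("propagating", len(params), "parameters to", peer)

params = {"PRIMARY_SITE": "DC1", "TOPOLOGY": "STAR"}
params = update_parameters(params, {"PRIMARY_SITE": "DC2"}, validate, activate)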
  • 100. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Primary Options Menu Invoking EMC GDDR produces the Primary Options Menu, as shown in Figure 30 on page 100. Figure 30 Primary Options Menu The Primary Options Menu displays information on the current state of EMC GDDR in the environment. It shows the name of the C-System and site, which C-System is currently Master, which site is the primary processing site, which site is the primary DASD site, whether EMC GDDR automation is currently enabled, and the name of the planned or unplanned script, if any, that is currently in process. From this screen, EMC GDDR automation can be toggled on and off using the commands GDDR ON and GDDR OFF. This menu also provides access to the following functions: 100 GDDR Solution Design and Implementation Techniques
  • 101. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Update Personal GDDR ISPF Profile allows each user to change GDDR ISPF profile variable values associated with their TSO ID. This includes items like datasets required to submit scripts, and job card information. Each user must enter their profile updates before performing any setup and maintenance functions. ◆ GDDR Setup and Maintenance — Provides options to manage EMC GDDR internal parameters, system variables and system options. ◆ View GDDR configuration — Displays EMC GDDR configuration details for sites, features, and C-Systems. ◆ Manage Site Roles — Allows manual changes to be made to site roles, including transferring the Master role to a different C-System, changing the AutoSwap owner, and moving the MSC control function. ◆ Perform pre-script checkup — Displays any active events, shows the status of the EMC GDDR internal command queue, and performs communication checks to validate that all production systems and C-Systems are active and all inter-systems communications are enabled. ◆ Run GDDR Scripts — Allows the user to run EMC GDDR background scripts. ◆ View GDDR Script Statistics — Displays timing information for recently-executed scripts. ◆ Perform GDDR Actions — Provides options to perform actions such as HMC LPAR and CBU actions, HMC discovery, and managing couple datasets. EMC GDDR components 101
  • 102. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option P: Profile - Update Personal GDDR ISPF Profile Choosing this option from the Primary Options Menu displays the screen shown in Figure 31 on page 102. Figure 31 Change GDDR ISPF Profile Variable Values The GDDR Subsystem name cannot be changed but all other values requested on the screen must be filled in to allow the user to perform EMC GDDR functions like parameter maintenance and script execution. 102 GDDR Solution Design and Implementation Techniques
  • 103. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option M: Maintenance - GDDR Setup and Maintenance Figure 32 on page 103 shows the EMC GDDR Setup and Maintenance Menu. Figure 32 Setup and Maintenance The GDDR ON and GDDR OFF commands can be used from this menu to enable and disable EMC GDDR automation. This menu also provides access to the following functions: ◆ Manage GDDR Parameters — Provides access to a series of screens that are used to maintain the EMC GDDR internal parameters described in 5.4.1.1 Internal parameters. These screens will be presented in detail later in this TechBook. ◆ Message, Debug, and Trace options — Provides the ability to set default debugging levels and override those values for specific programs. EMC GDDR components 103
  • 104. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Manage GDDR Internal Command Queue — Provides information on the command queue used by EMC GDDR to store SRDF commands that have not yet been executed. This screen can be used to alter EMC GDDR processing. EMC strongly recommends that no action be taken on this screen unless instructed to do so by EMC GDDR Solution Support. ◆ Perform HMC Discovery — Discovers the HMC objects at each site. This function can be used to validate the HMC information entered during EMC GDDR configuration as well as the EMC GDDR HMC interface installation and setup. ◆ Refresh GDDR Message Table — Refreshes the GDDRMSG table managed by GDDRMAIN ◆ Manage GDDR System Variables — Provides the ability to view and modify all system variables used by EMC GDDR to describe and manage the environment. Any updates done thru this panel remain local to the current C-system and are not subject to any validation. In general, use of this screen should be avoided without guidance from EMC GDDR Solution Support. 104 GDDR Solution Design and Implementation Techniques
  • 105. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option G: GDDR Config - View GDDR Configuration Selecting this option displays a screen similar to Figure 33 on page 105 with the information relevant to the EMC GDDR configuration in use. No changes can be made on this screen. Figure 33 View GDDR Configuration EMC GDDR components 105
  • 106. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option R: Roles - Manage Site Roles EMC GDDR script processing manages the location of the Master C-System, the AutoSwap Owner system, and the MSC Control site as part of normal operations. The Manage Site Roles panel is provided to address exception conditions such as preparing for a script restart. Figure 34 on page 106 shows the Manage Site Roles menu. Figure 34 Manage Site Roles The following choices are available: ◆ Transfer Master-C system function — Displays the sites in the EMC GDDR complex and indicates the current and recommended Master. The Master C-System function can be transferred to another site. ◆ Move AutoSwap owner — Displays the sites involved in an AutoSwap relationship and indicates the current and recommended owner. AutoSwap ownership can be transferred to another site. 106 GDDR Solution Design and Implementation Techniques
  • 107. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Move Primary MSC Server — Displays the sites eligible to assume the MSC control function and indicates the current and recommended owner. MSC control can be transferred to another site. Option C: Checkup - Perform Pre-Script Checkup When this option is selected, EMC GDDR performs a series of health checks and returns the results. A screen similar to Figure 35 on page 107 is presented. Figure 35 Perform Health Check In this example, the following information is provided: ◆ There is an SRA and an MSC event ◆ GDDR is running in Degraded Mode ◆ The GDDMPARM member was found to be consistent EMC GDDR components 107
◆ STAR-HA is not available (either it is not configured, or there is no secondary MSC server)
◆ The GDDR command queue is empty
◆ All C-Systems are operating normally from an HMC perspective, but none of them have GDDRMAIN fully operational (some components are inactive; a D (detail) line command is available to drill down on this situation)
◆ There was an HMC setup issue preventing system-status determination for the two managed systems, and GDDRMAIN is not active on the managed systems

Option S: Scripts - Run GDDR Scripts

This option can only be chosen on the Master C-System. It displays the Select Script to Run screen, which is a dynamic display. It lists the EMC GDDR scripts that are eligible to run under the current configuration, and also specifies configuration information and any scripts that are in progress. Depending on the locations of the primary site and the primary DASD, the panel is customized to display valid choices. Scripts that cannot be used at the current site do not appear. The Configuration field indicates the features of the configuration that EMC GDDR takes into account when determining which scripts are eligible to run and hence which are present in the list of scripts displayed on the panel. Figure 36 on page 109 shows a sample of the Select Script to Run screen.
Figure 36 Select Script to Run
  • 110. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option T: Timing - View GDDR Script Statistics Choosing this option displays a list of scripts that have run on this C-System, as shown in Figure 37 on page 110. Figure 37 View GDDR Script Statistics Selecting a script displays step start and end times. If a script were currently running, it would be displayed on the "Last planned:" or "Last unplanned:" line. 110 GDDR Solution Design and Implementation Techniques
  • 111. EMC Geographically Dispersed Disaster Restart (EMC GDDR) Option A: Actions - Perform GDDR Actions Figure 38 on page 111 is the sub-menu displayed when Option A is chosen. Figure 38 Actions Menu From this menu, the following actions are available: ◆ Perform HMC Discovery — Discovers the HMC objects at each site. This function can be used to validate the HMC information entered during EMC GDDR configuration as well as the EMC GDDR HMC interface installation and setup. This function is also available from the GDDR Setup and Maintenance menu. ◆ Perform HMC LPAR actions — Provides the ability to carry out actions against the EMC GDDR defined sites and systems including activating or deactivating an LPAR, performing a load EMC GDDR components 111
  • 112. EMC Geographically Dispersed Disaster Restart (EMC GDDR) clear, reset clear, stop or start for an LPAR, and querying an LPAR for its status. This option can only be used on the Master C-System. ◆ Perform HMC CBU actions — Presents a list of sites with processors enabled for CBU (Capacity Backup Upgrade) activation, if any. Action codes can be used to activate backup capacity (REAL mode), simulate activation of backup capacity (TEST mode), terminate actual backup capacity activation, or terminate simulated backup capacity activation. These actions can only be performed on the Master C-System. ◆ Manage Couple Datasets — Checks the couple dataset configurations on all systems. Currently used couple datasets are compared against those defined on the "Define Managed Couple Datasets" panel during EMC GDDR parameter definition. If discrepancies are found, SETXCF commands are issued to bring couple dataset usage in line with EMC GDDR parameter specifications. This function is provided to adjust couple datasets when it is not appropriate to run a site swap script, which will also align couple dataset usage with the defined parameters. Events and rules An EMC GDDR event is a state change in a component part of the environment that EMC GDDR is actively monitoring. Examples of EMC GDDR events include: ◆ CGT — ConGroup trip has occurred/state change ◆ CGD — ConGroup group is disabled/state change ◆ SRA — SRDF/A link is down ◆ MHB — Missing C-System heartbeat ◆ LNK — Link to a C-System down ◆ RDF — RDF link down ◆ LDR — Local Disaster ◆ RDR — Regional Disaster Events are stored in EMC GDDR global variables. An event can have a state of either TRUE or FALSE. If the event has a state of TRUE, it has occurred or is currently occurring. If the event has a state of FALSE, it is no longer occurring. An event that is TRUE is considered an exception. EMC GDDR events are used by the GDDR event monitor and GDDR processes to determine environment state. A change in state can then: 112 GDDR Solution Design and Implementation Techniques
◆ Trigger unplanned/takeover processes
◆ Prevent a planned process from running

EMC GDDR is supplied with message interception rules. These message rules have two primary functions:
◆ To detect events that EMC GDDR is interested in and set the appropriate EMC GDDR event TRUE or FALSE.
◆ To detect events that EMC GDDR processes have to wait for (WTOR), and reply as to the success or failure of the waited-for event. This determines whether an EMC GDDR process proceeds or terminates.

EMC GDDR uses intersystem communication to route message traffic between production systems and C-Systems. The arrival of a message at the target production system can be used to trigger an automation rule (for example, in IBM Tivoli NetView or BMC Control-M). Such rules can be used to start or shut down workloads on the appropriate systems.

Procedures

There are several JCL procedures used in an EMC GDDR environment, though most are not part of EMC GDDR itself. They include the tasks for EMC foundation products such as ResourcePak Base and SRDF Host Component. The names presented here are the vendor defaults and may not be the names used in any given environment. If a customer environment does not support the BCPii HMC interface, CA-OPS is required. Refer to the GDDR product guides for more information.

EMC foundation products

Depending on the EMC GDDR configuration in use, there will be two or three EMC foundation product tasks running. All environments will run EMCSCF and EMCRDF, and environments including SRDF/S will also run EMCCGRP:
◆ EMCSCF — ResourcePak Base or Symmetrix Control Facility
◆ EMCRDF — SRDF Host Component
◆ EMCCGRP — Consistency Group

For more information on the EMC foundation products, refer to Chapter 2 of this book or the EMC product documentation.
EMC GDDR procedures

These are the JCL procedures needed in an EMC GDDR environment:
◆ GDDRMAIN — Main EMC GDDR address space.
◆ GDDREVM — EMC GDDR Event Monitor. The event monitor is used to analyze event state changes in which EMC GDDR is interested. On detecting the occurrence of selected events, the event monitor determines what action to take and prompts operators with the appropriate choices via WTOR.
◆ GDDRHBM — EMC GDDR C-System heartbeat monitor. The heartbeat monitor aids the event monitor in determining the status of the EMC GDDR managed environment. The lack of a heartbeat from a particular C-System is used to determine the state of that C-System and the site.
◆ GDDRWORK — GDDR workers. Some pieces of work required for proper GDDR functioning run under separate GDDRWORK started tasks. This includes all GDDR HMC functions, GDDR command queue processing, and the parts of GDDR scripts shipped to a remote C-System.
◆ GDDRPROC — Used to run scripts. This is the only EMC GDDR procedure not used for a started task that is up and running at all times.

Started task locations

The started tasks that need to run on any given system in the EMC GDDR complex depend on whether it is a managed production system or a C-System, which site it is at, and which configuration is in use.

GDDRMAIN runs on all C-Systems in the GDDR complex. In three-site configurations, it optionally does not run at DC3 if the customer chooses to run DC3 as a "lights-out" data center. GDDRMAIN also runs on all fully managed production systems.

GDDREVM, GDDRHBM and GDDRWORK run on all C-Systems at all sites in the EMC GDDR complex. They are started automatically by GDDRMAIN. They are not required on the production systems.

EMCSCF runs on all C-Systems and all IPLed production systems at all sites. EMCRDF runs on all C-Systems.
EMCCGRP runs on C-Systems and on one or more production systems at the primary DASD site when SRDF/S is in use. It runs on all systems at both DC1 and DC2 in AutoSwap configurations.
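Because the required set of started tasks differs by system role and site, it can be useful to verify them after an IPL or a configuration change. The following REXX fragment is a minimal sketch of such a check; it assumes the SDSF REXX interface (ISFCALLS/ISFEXEC) is available and uses the default task names listed above, so adjust the expected list to the names and roles actually in use at your site.

/* REXX - sketch: check that expected GDDR/EMC started tasks are active */
expected = 'GDDRMAIN GDDREVM GDDRHBM GDDRWORK EMCSCF EMCRDF'  /* C-System defaults */

rc = isfcalls('ON')            /* enable the SDSF host command environment */
isfprefix = '*'                /* do not filter on job name prefix         */
isfowner  = '*'                /* do not filter on owner                   */
Address SDSF "ISFEXEC ST"      /* status display; job names land in JNAME. */

active = ''
do i = 1 to isfrows
  active = active jname.i
end
rc = isfcalls('OFF')

missing = ''
do w = 1 to words(expected)
  if wordpos(word(expected, w), active) = 0 then
    missing = missing word(expected, w)
end

if missing = '' then say 'All expected GDDR/EMC started tasks are active.'
else say 'Not active:' missing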
EMC GDDR supported scripts

Table 4 shows the mapping of script names to a short description of the script function, as well as the event type that causes the script to be generated.

Table 4 Script names and descriptions

Script name    Description                                          Event type
GDDRPA51       Reconfigure to Concurrent                            RECONFIGURE
GDDRPA52       Reconfigure to Cascaded                              RECONFIGURE
GDDRPA53       Reconfigure to concurrent SRDF with Site Move        RECONFIGURE
GDDRPA42       Swap production from <P> to <S> (Protected)          PLANNED
GDDRPA45       Swap production from <P> to <S> (FAST)               PLANNED
GDDRPA21       Swap DASD from <P> to <S> (Protected)                PLANNED
GDDRPA25       Swap DASD from <P> to <S> - FAST                     PLANNED
GDD2P17A       Abandon Site <P> (site swap)                         PLANNED
GDD2P18A       Restart production at <S> after site swap            PLANNED
GDD2P01A       Perform test IPL from BCVs at <tgt site>             PLANNED
GDDRPA27       Perform test IPL from R2s at DC3                     PLANNED
GDD2P03A       Perform test IPL from R2s at <S>                     PLANNED
GDDRUP31       Recover after unplanned swap                         UNPLANNED
GDD2U09A       Recover after unplanned swap or LDR                  UNPLANNED
GDD2U13A       Recover after loss of <P> (LDR)                      UNPLANNED
GDD2U10A       Recover after loss of <P> (LDR)                      UNPLANNED
GDD2U12A       Recover after loss of <P> (LDR)                      UNPLANNED
GDDRUP41       Recover after loss of <S>                            UNPLANNED
GDDRPA0A(1)    Resume replication after unplanned swap              UNPLANNED
GDDRPA0A(2)    Resume replication after loss of <P>                 UNPLANNED
GDDRPA0A(3)    Resume replication after loss of <S>                 UNPLANNED
GDD2PA0A(4)    Resume replication after unplanned swap              UNPLANNED
GDD2PA0A(5)    Resume replication after loss of <P>                 UNPLANNED
GDDRPA23       Resume ConGroup                                      RESUME
GDD2P02A       Resume after test IPL from BCVs at <tgt site>        RESUME
GDDRPA28       Resume SRDF/A after test IPL at DC3                  RESUME
GDDRPM29       Resume SRDF/A in MSC mode                            RESUME
GDDRPF29       Resume SRDF/A in STAR mode                           RESUME
GDD2P14A       Resume replication after link failure                RESUME
GDD2P16A       Resume after test IPL from R2s at <S>                RESUME
GDDRPAAB       Abandon Sites DC1 and DC2                            DC3 ONLY
GDDRPA05       Recover at DC3 after RDR at DC1 and DC2              DC3 ONLY
GDDRPA06       Restart Production LPARs at DC3 SRDFA to <tgt site>  DC3 ONLY
GDDRPA07       Recover at DC3 after LDR at DC1 SRDFA to DC2         DC3 ONLY

Where:
◆ PLANNED — Result from operator interaction with GDDR panels
◆ UNPLANNED — Result from unplanned events detected by the GDDR Event Monitor (EVM)
◆ RESUME — Resume replication after some type of service interruption (planned or unplanned)
◆ DC3 ONLY — These scripts only execute at DC3, for planned test events or for unplanned disaster event management
Table 5 shows the scripts mapped to the configuration types in which each script executes. The configuration columns of Table 5 cover the supported GDDR configurations, including Concurrent Star with AutoSwap, Concurrent Star, Cascaded Star with AutoSwap, Cascaded Star, Cascaded Star with EDP, two-site AutoSwap, SRDF/A with MSC, SRDF/S with AutoSwap, and SRDF/S.

Table 5 Script mapping (script-to-configuration applicability matrix)

GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)

Planned script operations

◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site:
• Stop the business workload at the primary DASD site
  • 120. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC2 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC2 — IPL contingency systems at site DC2 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC2 — IPL contingency systems at site DC2 using R2 devices: • Confirm that SRDF/S has been stopped normally via a ConGroup trip • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a ConGroup trip and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a local disaster: • Confirm that a ConGroup trip occurred • Confirm that SRDF links failed • Confirm that a local disaster (LDR) event occurred • Shut down applications at the primary site, if applicable • Split BCVs and condition R2s at secondary site for restart • Activate contingency systems 120 GDDR Solution Design and Implementation Techniques
  • 121. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart applications ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and SRDF/S restart commands • Reestablish BCVs at the secondary site Resumption operations ◆ Resume after test IPL from BCVs at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume after test IPL from R2s at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Restart SRDF/S to DC2 ◆ Resume replication after link failure — Restore the SRDF/S link after a loss of the link: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site GDDR for SRDF/S with AutoSwap (Two sites - DC1 and DC2) Planned script operations ◆ Swap production from DC1 to DC2 — Swap the DASD and the single-site workload from site DC1 to site DC2: • Stop the business workload at the primary DASD site • Swap the DASD to the secondary DASD site (AutoSwap followed by SRDF/S personality swap) EMC GDDR supported scripts 121
  • 122. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Resume SRDF/S • Restart the business workload ◆ Swap DASD from DC1 to DC2 — Swap the DASD only from site DC1 to site DC2. ◆ Perform test IPL from BCVs at DC2 — IPL contingency systems at site DC2 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC2 — IPL contingency systems at site DC2 using R2 devices: • Confirm that SRDF/S has been stopped normally via a Congroup trip • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after unplanned swap — Restart processing at the secondary site: • Stop business workload on the old primary site (if applicable) • Split BCVs at the new primary site • Move couple datasets to the new primary site (if applicable) • Trigger restart of business workload at the new primary site • Transfer Master C-System to the new secondary site ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a ConGroup trip and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a local disaster: • Confirm that a local disaster (LDR) event occurred • Condition R2s at the secondary site for restart • Shut down applications at the primary site, if applicable • Split BCVs at the secondary site • Activate contingency systems • Manage Couple Facility structures and datasets 122 GDDR Solution Design and Implementation Techniques
  • 123. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart applications ◆ Resume replication after unplanned swap — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Split BCVs at the secondary site, if applicable • Perform ConGroup cleanup, transfer AutoSwap ownership and restart SRDF/S • Manage Couple Facility structures and datasets • Reestablish BCVs at the secondary site Resumption operations ◆ Resume after test IPL from BCVs at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume after test IPL from R2s at DC2 — Reset clear contingency systems after a test at DC2: • Stop test business workload, if applicable • Reset clear test LPARs • Restart SRDF/S with AutoSwap to DC2 ◆ Resume replication after link failure — Restore the SRDF/S link after a loss of the link: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site GDDR for SRDF/A (Two sites - DC1 and DC3) Planned script operations ◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site: EMC GDDR supported scripts 123
  • 124. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Stop the business workload at the primary DASD site • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC3 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate and load test LPARs using BCV volumes Unplanned script operations ◆ Recover after loss of DC1 (RDR) — Restart the single site workload after the GDDR Event Monitor has detected an SRDF/A link down event and the GDDR Heartbeat Monitor has timed out. These events occurring concurrently are viewed by EMC GDDR as a regional disaster: • Confirm that an MSC drop occurred • Confirm that SRDF links failed • Confirm that a regional disaster (RDR) event occurred • Shut down applications at the primary site, if applicable • Split BCVs and condition R2s at secondary site for restart • Activate contingency systems • Restart applications ◆ Resume replication after loss of DC1 — Resume the SRDF/A link to the secondary site after a primary site disaster: • Confirm SRDF/A links are down • Split BCVs at the secondary site, if applicable 124 GDDR Solution Design and Implementation Techniques
• Issue MSC cleanup and SRDF/A restart commands
• Reestablish BCVs at the secondary site

Resumption operations

◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3:
• Stop test business workload, if applicable
• Reset clear test LPARs
• Reestablish the BCVs
◆ Resume SRDF/A after link loss — Restore the SRDF/A link after a loss of the link (planned or unplanned).

GDDR for SRDF/Star (Three sites)

Reconfiguration operations

◆ Reconfigure to concurrent SRDF — Reconfigure a Cascaded SRDF/Star environment to a Concurrent SRDF/Star environment. The result is that site DC1 is protected at site DC2 using SRDF/S, and DC1 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC2 and DC3
• Perform a differential CREATEPAIR from DC1 to DC3
• Reestablish the SRDF/Star environment (DC1 to DC3)
◆ Reconfigure to cascaded SRDF — Reconfigure a Concurrent SRDF/Star environment to a Cascaded SRDF/Star environment. The result is that site DC1 is protected at site DC2 using SRDF/S, and DC2 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC1 and DC3
• Perform a differential CREATEPAIR from DC2 to DC3
• Reestablish the SRDF/Star environment (DC2 to DC3)

Planned script operations

◆ Abandon Site DC1 (site swap) — Shut down the single site workload at the primary site in preparation for the restart of processing at the secondary site:
  • 126. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Stop the business workload at the primary DASD site • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR ◆ Restart production at DC2 after site swap — Restart the single site workload after the 'Abandon Site DC1 (site swap)' script has completed successfully: • Attempt reset clear of all systems at the primary DASD site • Activate CBU (if required) • Activate all needed LPARs, including CFs at the secondary DASD site • Create a consistency point at the secondary DASD site • Prepare the SRDF environment • IPL all needed production systems ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC3 — IPL contingency systems at site DC3 using R2 devices: • Confirm that SRDF/A has been stopped normally via an SRDF/A PENDDROP • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a local disaster (LDR) at the primary site. The following events occurring concurrently are considered a local disaster: Loss of SRDF/A, Loss of SRDF/S, or a ConGroup trip, or the primary site GDDR Heartbeat Monitor has timed out: Note: In the cascaded SRDF with ConGroup topology, the SRDF/A event is not considered for LDR. • Confirm that a local disaster (LDR) event occurred 126 GDDR Solution Design and Implementation Techniques
  • 127. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Deactivate systems at old primary site • Prepare the SRDF environment at secondary site • Create a consistent point on BCVs at DC3 • Perform a differential CREATEPAIR between the new primary site and DC3 • Start SRDF/A in MSC mode • Reestablish the SRDF/Star environment • Reestablish BCVs at DC3 • Activate production systems ◆ Recover after loss of DC2 — In environments which support Cascaded SRDF mode, restore data protection for the primary site should the secondary site suffer an outage resulting in loss of SRDF/S and SRDF/A. This script is submitted by the GDDR Event Monitor when a ConGroup trip is detected and the GDDR Heartbeat Monitor at the secondary site has timed out. These events occurring concurrently are viewed by EMC GDDR as a secondary site disaster. Following the restoration of the secondary site, the script 'Resume replication after loss of DC2' is used to resume SRDF/S: • Provide the option to stop primary site and secondary site workloads since all remote protection is lost • Split BCVs at DC3 • Remove the SRDF pairing between DC3 and the secondary DASD site • Perform a differential re-synchronization between the primary site and DC3 • Start SRDF/A in concurrent MSC mode • Provide the option to start primary site workloads • Reestablish BCVs at DC3 ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site EMC GDDR supported scripts 127
  • 128. EMC Geographically Dispersed Disaster Restart (EMC GDDR) ◆ Resume replication after loss of DC2 — In environments which support Cascaded SRDF mode, resume the SRDF/S link to the primary site after a secondary site disaster and restart SRDF/A in SRDF/Star mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC2' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the primary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the primary site Resumption operations ◆ Resume ConGroup — Resume consistency group protection following a ConGroup trip: • Resume the consistency group. • Reestablish the BCVs at the secondary DASD site, if applicable. ◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3: • Stop test business workload, if applicable • Reset clear test LPARs • Reestablish the BCVs ◆ Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A link to DC3 (either from DC1 or DC2 depending upon where the production workload is currently running) after a test on DC3: • Reset clear all systems IPL’d during the test of DC3 (at DC3) • Deactivate all LPARs previously activated for the test of DC3 (at DC3) • Restart SRDF/Star to DC3 ◆ Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A link to DC3 in MSC mode (from either DC1 or DC2 depending upon where the production workload is currently running) after a planned or unplanned swap. ◆ Resume SRDF/A (SRDF/Star) to DC3 — Restore the SRDF/A link to DC3 in SRDF/Star mode (from either DC1 or DC2) after a planned or unplanned stop of SRDF/A. 128 GDDR Solution Design and Implementation Techniques
  • 129. EMC Geographically Dispersed Disaster Restart (EMC GDDR) DC3-specific operations ◆ Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 as the new EMC GDDR Master C-System: • Stop the business applications • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR • Create a consistency point on the DC3 BCVs • Create a consistency point on the secondary DASD BCVs • Stop ConGroup on all remaining systems • Force the C-System at DC3 as the new EMC GDDR Master C-System ◆ Recover at DC3 after RDR at DC1 and DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2. There must be an SRDF/A connection to either DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems ◆ Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 (except C-System) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems EMC GDDR supported scripts 129
• Perform a differential CREATEPAIR between DC3 and DC1 or DC2
• Start SRDF/A in MSC mode
◆ Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event of a local disaster (LDR) that prevents the production workload from being run at DC1, and DC2 is a bunker site (storage only), restart the production LPARs at site DC3 and reestablish SRDF/A to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' script is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current:
• Activate all needed LPARs including CFs at DC3
• Activate CBU (if required)
• Create a consistency point at DC3
• Prepare SRDF environment
• IPL all needed production systems
• Perform a differential CREATEPAIR between DC3 and DC2
• Start SRDF/A in MSC mode

GDDR for SRDF/Star with AutoSwap (Three sites)

Reconfiguration operations

◆ Reconfigure to concurrent SRDF — Reconfigure a Cascaded SRDF/Star with AutoSwap environment to a Concurrent SRDF/Star with AutoSwap environment. The result is that site DC1 is protected at site DC2 using SRDF/S and AutoSwap, and DC1 is the source of the SRDF/A replication to DC3. The workload continues at DC1:
• Stop SRDF/A and delete the SRDF/A relationship between DC2 and DC3
• Perform a differential CREATEPAIR from DC1 to DC3
• Reestablish the SRDF/Star with AutoSwap environment (DC1 to DC3)
◆ Reconfigure to concurrent SRDF with site move — Reconfigure a Cascaded SRDF/Star with AutoSwap environment to Concurrent SRDF/Star with AutoSwap with a primary processing site move. The initial state is that the workload at DC1
  • 131. EMC Geographically Dispersed Disaster Restart (EMC GDDR) is protected at DC2 with SRDF/S. The result is that DC2 is protected at DC1 using SRDF/S and AutoSwap, and DC2 is the source of the SRDF/A replication to DC3. The workload will be running at DC2: • Stop the business workload at site DC1 • Swap the DASD to DC2 (AutoSwap followed by SRDF/S personality swap) • Reset clear all production systems managed by EMC GDDR • Resume SRDF/S (DC2 to DC1) • Restart the business workload • Reestablish the SRDF/Star with AutoSwap environment (DC2 to DC3) ◆ Reconfigure to cascaded SRDF — Reconfigure a Concurrent SRDF/Star with AutoSwap environment to a Cascaded SRDF/Star with AutoSwap environment. The result is that site DC1 is protected at site DC2 using SRDF/S and AutoSwap, and DC2 is the source of the SRDF/A replication to DC3. The workload continues at DC1: • Stop SRDF/A and delete the SRDF/A relationship between DC1 and DC3 • Perform a differential CREATEPAIR from DC2 to DC3 • Reestablish the SRDF/Star with AutoSwap environment (DC2 to DC3) Planned script operations ◆ Swap production from DC1 to DC2 — Swap the DASD and the single-site workload from the primary DASD site to the secondary DASD site. In environments with geographically dispersed Sysplexes supported by cross-site host-DASD channels, Couple Dataset and Coupling Facility Structure management is performed as a result of a site move, either planned or unplanned. These actions are controlled by the Realign Couple Datasets and Rebuild Couple Facility Structure parameters: • Stop the business workload at the primary DASD site • Swap the DASD to the secondary DASD site (AutoSwap followed by SRDF/S personality swap) • Resume SRDF/S EMC GDDR supported scripts 131
  • 132. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Restart the business workload • Reestablish the SRDF/Star with AutoSwap environment ◆ Swap DASD from DC1 to DC2 — Swap only the DASD from the primary DASD site to the secondary DASD site. ◆ Perform test IPL from BCVs at DC3 — IPL contingency systems at site DC3 using BCV devices: • Split BCVs; make them R/W • Activate test LPARs using BCV volumes • Start test business workload, if applicable ◆ Perform test IPL from R2s at DC3 — IPL contingency systems at site DC3 using R2 devices: • Confirm that SRDF/A has been stopped normally via an SRDF/A PENDDROP • Activate LPARs using R2 volumes • Start test business workload, if applicable Unplanned script operations ◆ Recover after unplanned swap — Restart processing at the secondary site after an unplanned swap has completed successfully. • Stop business workload on the old primary site (if applicable) • Trigger restart of business workload at the new primary site (if applicable) • Remove SRDF pairing between DC3 and the primary DASD site • Stop ConGroup on all systems • Perform a differential CREATEPAIR between the new primary DASD site and DC3 • Start SRDF/A in MSC mode ◆ Recover after loss of DC1 (LDR) — Restart the single site workload after the GDDR Event Monitor has detected a local disaster (LDR) at the primary site. The following events occurring concurrently are considered a local disaster: Loss of SRDF/A, Loss of SRDF/S, or a ConGroup trip, or the primary site GDDR Heartbeat Monitor has timed out: • Confirm that a local disaster (LDR) event occurred 132 GDDR Solution Design and Implementation Techniques
• Deactivate systems at old primary site
• Prepare the SRDF environment at secondary site
• Create a consistent point on BCVs at DC3
• Perform a differential CREATEPAIR between the new primary site and DC3
• Start SRDF/A in MSC mode
• Reestablish the SRDF/Star environment
• Reestablish BCVs at DC3
• Manage Couple Facility structures and datasets
• Activate production systems
◆ Recover after loss of DC2 — In environments which support Cascaded SRDF mode, restore data protection for the primary site should the secondary site suffer an outage resulting in loss of SRDF/S and SRDF/A. This script is submitted by the GDDR Event Monitor when a ConGroup trip is detected and the GDDR Heartbeat Monitor at the secondary site has timed out. These events occurring concurrently are viewed by EMC GDDR as a secondary site disaster. Following the restoration of the secondary site, the script 'Resume replication after loss of DC2' is used to resume SRDF/S:
• Provide the option to stop primary site and secondary site workloads since all remote protection is lost
• Split BCVs at DC3
• Remove the SRDF pairing between DC3 and the secondary DASD site
• Perform a differential re-synchronization between the primary site and DC3
• Start SRDF/A in concurrent MSC mode
• Provide the option to start primary site workloads
• Reestablish BCVs at DC3
◆ Resume replication after unplanned swap — Resume the SRDF/S link to the secondary DASD site after an unplanned swap (due to the loss of the primary DASD). The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after unplanned swap' script:
• Remove the SRDF pairing between the secondary DASD site and the primary DASD site
  • 134. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Clean ConGroup bits from secondary DASD • Resume SRDF/S (from current primary DASD site to the previous primary DASD site) • Put SRDF/A into SRDF/Star with AutoSwap mode ◆ Resume replication after loss of DC1 — Resume the SRDF/S link to the secondary site after a primary site disaster and restart SRDF/A in SRDF/Star with AutoSwap mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC1 (LDR)' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the secondary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the secondary site ◆ Resume replication after loss of DC2 — In environments which support Cascaded SRDF mode, resume the SRDF/S link to the primary site after a secondary site disaster and restart SRDF/A in SRDF/Star with AutoSwap mode. The script appears in the 'Select Script to Run' panel of the current Master C-System after completion of the 'Recover after loss of DC2' script: • Confirm SRDF/S links are down • Stop ConGroup on all systems • Split BCVs at the primary site, if applicable • Issue ConGroup cleanup and restart commands • Reestablish BCVs at the primary site Resumption operations ◆ Resume ConGroup — Resume consistency group protection following a ConGroup trip: • Resume the consistency group • Re-establish the BCVs at the secondary DASD site, if applicable ◆ Resume after test IPL from BCVs at DC3 — Reset clear contingency systems after a test at DC3: • Stop test business workload, if applicable • Reset clear test LPARs 134 GDDR Solution Design and Implementation Techniques
  • 135. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Reestablish the BCVs ◆ Resume SRDF/A after test IPL at DC3 — Restore the SRDF/A link to DC3 (either from DC1 or DC2 depending upon where the production workload is currently running) after a test on DC3: • Reset clear all systems IPLed during the test of DC3 (at DC3) • Deactivate all LPARs previously activated for the test of DC3 (at DC3) • Restart SRDF/Star with AutoSwap to DC3 ◆ Resume SRDF/A in MSC mode to DC3 — Restore the SRDF/A link to DC3 in MSC mode (from either DC1 or DC2 depending upon where the production workload is currently running) after a planned or unplanned swap. ◆ Resume SRDF/A (SRDF/Star AutoSwap) to DC3 — Restore the SRDF/A link to DC3 in SRDF/Star with AutoSwap mode (from either DC1 or DC2) after a planned or unplanned stop of SRDF/A. DC3-specific operations ◆ Abandon Sites DC1 and DC2 — In the event of a perceived threat to both DC1 and DC2, perform an orderly shutdown of DC1 and DC2 and force the C-System at DC3 as the new EMC GDDR Master C-System: • Stop the business applications • Wait for the stop of all business applications • Reset clear all production systems managed by EMC GDDR • Create a consistency point on the DC3 BCVs • Create a consistency point on the secondary DASD BCVs • Stop SRDF/Star with AutoSwap • Stop ConGroup on all remaining systems • Force the C-System at DC3 as the new EMC GDDR Master C-System ◆ Recover at DC3 after RDR at DC1 and DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2. There must be an SRDF/A connection to either DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 EMC GDDR supported scripts 135
  • 136. EMC Geographically Dispersed Disaster Restart (EMC GDDR) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems ◆ Restart production LPARs at DC3 SRDFA to DC1 or DC2 — In the event of a major failure that prevents the production workload from being run from either DC1 or DC2, restart the production LPARs at site DC3 and reestablish SRDF/A to site DC1 or DC2: • Attempt reset clear of all systems at DC1/DC2 (except C-System) • Activate all needed LPARs • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems • Perform a differential CREATEPAIR between DC3 and DC1 or DC2 • Start SRDF/A in MSC mode ◆ Recover at DC3 after LDR at DC1 SRDFA to DC2 — In the event of a local disaster (LDR) that prevents the production workload from being run at DC1, and DC2 is a bunker site (storage only), restart the production LPARs at site DC3 and reestablish SRDF/A to DC2. The distinction between this script and the 'Restart production LPARs at DC3 SRDFA to DC1 or DC2' scripts is that the 'Star C-Ahead' indicator is interrogated to enable the workload to restart with the DC3 data, or with the DC2 data if it is more current: • Activate all needed LPARs including CFs at DC3 • Activate CBU (if required) • Create a consistency point at DC3 • Prepare SRDF environment • IPL all needed production systems • Perform a differential CREATEPAIR between DC3 and DC2 • Start SRDF/A in MSC mode 136 GDDR Solution Design and Implementation Techniques
  • 137. EMC Geographically Dispersed Disaster Restart (EMC GDDR) The remainder of the TechBook will describe the lab environment, document the installation and configuration of GDDR, and document running scripts with more detail on the steps executed in the scripts. EMC GDDR supported scripts 137
  • 139. 5 GDDR Managed Systems Design Considerations This chapter contains the following: ◆ Overview ........................................................................................... 140 ◆ Ignore, Reset Clear, or IPL in place................................................ 142 ◆ Contingency systems and workload movement ......................... 143 ◆ HMC-Only systems (new in GDDR 3.1)....................................... 144 ◆ LPAR and CEC recovery ................................................................. 146 ◆ Automated Configuration Check - DASD ................................... 149 GDDR Managed Systems Design Considerations 139
Overview

One of the many decisions that must be made as part of a GDDR implementation is whether customers want GDDR to manage production systems, production LPARs and/or production CPCs. It is important to understand that systems management in GDDR is optional. Some customers choose to have GDDR automation extend into managing zSeries systems, LPARs and workloads; other customers want some systems managed by GDDR and others excluded; and finally there is a set of customers who do not want GDDR to do anything to their environment other than manage EMC infrastructure software such as SRDF, TimeFinder and AutoSwap. The GDDR systems management functions are flexible and can accommodate this wide-ranging set of customer operational requirements.

If a decision is made for GDDR to manage system recovery beyond just the storage layer, a further decision needs to be made whether to include systems management in the scope of the implementation project or defer it to a later date. Deferring systems management to a second phase does make the implementation project simpler and shorter.

When a decision is made to implement GDDR systems management, the following features/functions must be considered for each system to be managed by GDDR:
◆ Ignore, Reset Clear or IPL in place
◆ Contingency systems and workload movement
◆ HMC Only
◆ LPAR and CPC recovery

Production systems under GDDR control are known as P-Systems or sometimes "managed systems." These are the customer systems that run application workload. Commencing with GDDR 3.1, EMC began introducing more functionality into CEC, LPAR and system recovery options. Some customers use GDDR to manage their P-Systems and others choose to manage these systems themselves. GDDR has been architected to provide significant flexibility when it comes to P-System management.
For customers that have more complex environments, it helps to build a GDDR managed system recovery map similar to Table 6 on page 141. This documents the actions that GDDR is to take for event management on any of the managed systems.

Table 6 GDDR managed system recovery map

LPAR name          E01            E02            E03            E04     F01            F02            F03
Location           DC1            DC1            DC1            DC1     DC2            DC2            DC2
CPC                SYSTEM E, IBM 2096-S07-R04 #012345 (E01-E04)         SYSTEM E, IBM 2096-S07-R04 #6789A (F01-F03)
Workload type      Production     Production     Development    Test    Production     Production     Development
Sysplex name       Prod           Prod                                  Prod           Prod
System name        MVSA           MVSB           MVSC           MVSD    MVSE           MVSF           MVSG
Desired state      UP             UP             UP             UP      UP             UP             UP
Recover            Yes            Yes            Yes            No      Yes            Yes            Yes
Recovery type      Workload Move  Workload Move  Recovery LPAR          Reset Clear    IPL in place
Recovery location  F01            F02            F03
Activate CBU       Y              Y              N
Ignore, Reset Clear, or IPL in place

When a system is defined to GDDR systems management, a decision must be made regarding GDDR's treatment of a failure of the defined system. The options are:
◆ Ignore — The system is defined to GDDR, but this instructs GDDR to take no action if GDDR observes that this system has failed. Failure is typically "noticed" by GDDR when the EMC SCF product "de-registers" the system, usually because of a polling time-out.
◆ Reset Clear — If GDDR detects that the system has crashed (such as SCF de-registration), GDDR will access the HMC and RESET CLEAR the LPAR, but take no other action.
◆ IPL in place — If GDDR detects that the system has crashed, GDDR will access the HMC, RESET CLEAR the system and then IPL it in place.
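As a purely illustrative sketch (not a GDDR interface), a recovery map like Table 6 can also be kept in machine-readable form so it can be reviewed or cross-checked against run books. The system names, LPARs and actions below are the hypothetical values from Table 6.

/* REXX - illustrative only: encode the recovery map from Table 6 and  */
/* look up the intended GDDR action for a failed system. This is not a */
/* GDDR API; it is a documentation/cross-check aid.                    */
map.0 = 3
map.1 = 'MVSA E01 DC1 WORKLOAD-MOVE F01'   /* system lpar site action target */
map.2 = 'MVSB E02 DC1 WORKLOAD-MOVE F02'
map.3 = 'MVSD E04 DC1 IGNORE        -'     /* Recover = No in Table 6        */

arg failedsys .
if failedsys = '' then failedsys = 'MVSA'

action = 'NOT-DEFINED'
do i = 1 to map.0
  parse var map.i sys lpar site act target .
  if sys = failedsys then do
    action = act
    say 'System' sys 'in LPAR' lpar 'at' site': action =' act', target =' target
    leave
  end
end
if action = 'NOT-DEFINED' then say failedsys 'is not in the recovery map'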
Contingency systems and workload movement

Contingency systems are "hot standby" systems that are IPLed and active and (typically) are located in a different data center. Often, contingency systems are the "remote half" of a geographically separated, single-site-workload parallel Sysplex. In the event of a primary site failure (for example, system MVSA fails in LPAR E01 in the chart above), GDDR will trigger workload restart in system MVSE at site DC2 in LPAR F01. GDDR does this by sending message GDDX191I to the MVSE console. A message rule must be written and enabled on MVSE to intercept the GDDX191I message and fire a REXX rule for that system to carry out the actions necessary to start the production workload. This message-interception REXX rule would be written for whatever automation package runs on MVSE, as in the sketch below.
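The following is a minimal sketch of such a rule, assuming a NetView-style environment where the automation table drives a REXX exec when GDDX191I arrives and where the MVS command is available to issue system commands. The started-task names (CICSPRD, BATCHINI) are placeholders, and the way the message text is passed to the exec is site-specific; take the exact message layout from the GDDR product guides.

/* REXX - GDDX191I sketch: start the production workload on MVSE       */
/* Driven by the automation product when GDDX191I arrives; adapt the   */
/* command interface and workload names to your environment.           */
parse arg msgid msgtext          /* as passed by the automation table (text unused here) */

if msgid <> 'GDDX191I' then exit 0   /* only act on the expected message */

say 'GDDX191I received - starting production workload on this system'

workload = 'CICSPRD BATCHINI'        /* placeholder started tasks to bring up */

do w = 1 to words(workload)
  stc = word(workload, w)
  'MVS S' stc                        /* NetView: forward an MVS START command */
end

exit 0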
HMC-Only systems (new in GDDR 3.1)

For systems defined as "HMC-Only", GDDR can be used in place of native access to the HMC to perform system actions. Interaction with HMC-Only systems is obtained through the "LPAR Actions" panel, shown in Figure 39 on page 144, by choosing option "L".

Figure 39 GDDR Actions Menu

After choosing option "L", a screen similar to Figure 40 on page 145 is displayed. Systems defined as HMC-Only will appear. The "T" column in the display shows the GDDR system type; OS in the display indicates that this is an "HMC-Only" system with standard IPL parms. For a full description of the fields in this display, refer to the GDDR product guides.
Figure 40 GDDR Perform HMC LPAR Actions

All of the functions displayed on this screen are protected by SAF profiles. This offers finer-grained control than is available from direct HMC access. Customers choose to add HMC-Only systems so that they have tighter control over access to these powerful system commands.
LPAR and CEC recovery

GDDR can tolerate the loss of a C-System. In the event of a C-System failure, GDDR will report "degraded mode" errors periodically until the C-System rejoins the GDDR complex. In the case of the loss of the GDDR Master C-System, another GDDR C-System will take over the master function. Again, "degraded mode" will be reported until the missing C-System returns. LPAR Recovery can be specified for GDDR C-Systems, which may help minimize the amount of time GDDR operates in "degraded mode."

For production systems, the following options are available:
◆ Ignore
◆ IPL in place
◆ Reset Clear
◆ Start workload in contingency systems

The "desired state" indicator can be useful if a system needs to be down for maintenance, ensuring that GDDR will not kick into action. If SCF is stopped normally on that system, GDDR will not kick into action anyway, as SCF controls the de-registration process and GDDR will know that the system has been shut down in a controlled manner.

For managed systems, the location indicator is set either to "H", indicating the Home location for this system, or to "A", indicating Away, meaning that this system is operating in its recovery location. It is possible that the Home and Away locations are in the same data center, but commonly the Away location is a different data center.

For managed systems, the following should be considered:
◆ Multi-site workloads do not have CONTingency systems.
◆ A system can have LPAR-RECOVERY specified, indicating that the workload can be restarted in a different LPAR from where the system is currently running. The recovery LPAR can be in the same or a different data center.
◆ Recovery LPARs are defined to GDDR as REGULAR LPARs, but are defined a second time, on the system Recovery Attributes panel, as recovery for a protected system
◆ A protected system can have a CONTingency system as a Recovery LPAR.
◆ Sysplex systems with more than one LPAR can have Recovery LPARs defined, which could be in the primary or secondary data center.
◆ Systems protected with LPAR recovery are either Home or Away, but their SITE value does not change when they are "Away".
◆ Systems NOT protected by LPAR Recovery can (optionally) have a second regular LPAR defined for them, at a different site. These second LPARs could even be the home location of a system protected with LPAR Recovery.
◆ The difference between a Recovery LPAR definition and a second LPAR definition is subtle:
• The second LPAR definition is NOT considered a recovery option, and will never lead to a system running in that location being displaced.
• The SITE value for a system NOT protected with LPAR Recovery changes depending on whether it is running in its first or second LPAR.
◆ LPAR Recovery for C-Systems avoids degraded mode when a C-System is lost.
◆ GDDR LPAR Recovery supports repeated failures: an LPAR in its Away location is itself protected with LPAR Recovery in its Home location.

Planned CPC Swap

This is a new planned script, triggered from the "Perform HMC LPAR Actions" panel. GDDR 3.2 also introduced a few other changes to that panel:
◆ Action X (LOAD Recovery): for a system protected with LPAR recovery only
◆ Action E (show events): queries GDDR global variables for events related to a system
◆ Action T (Toggle Desired state)
◆ Site column: now indicates the site of the LPAR where the system currently runs
◆ T (Type of IPL parameters) column: S: STD, A: Alternate, B: BCV
◆ IPL parms shown as "*" (not new): IPL parms are only shown if their definition matches the current primary DASD site
Automated Configuration Check - DASD

Traditionally, GDDR parameter validation has been largely limited to checking syntax and internal consistency, with relatively few verifications against real-world data. GDDR 3.2 introduced the Automated Configuration Check - DASD function with the GDDRACCD utility. One of the more challenging parts of defining an environment to GDDR has always been ensuring the correct definition of large and complex SRDF configurations. GDDRACCD removes the uncertainty about the correctness of an existing GDDR SRDF device definition by checking the real configuration found in the GDDR-managed EMC Symmetrix storage.

GDDR variables influencing the expected configuration

GDDRACCD works by formulating a detailed expectation of the configuration based on a whole series of GDDR global variables. This expectation is then compared against what is found in the Symmetrix array. GDDRACCD runs in five phases:

Phase 1 — Validates the parameters for internal consistency and exits if errors are found; there is no point checking the array if the parameters are already known to be bad.
Phases 2 and 3 — Query the array and report deviations.
Phase 4 — Provides information allowing the user to verify what GDDRACCD based its decisions on.
Phase 5 — RDF.DEVICES parms allow comparison with the existing parms, but differences are to be expected:
◆ Real parms could be a subset of generated parms
◆ Ranges in real parms could be a split of generated ranges
◆ Gatekeepers will certainly be different
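The expectation-versus-actual pattern that GDDRACCD follows can be illustrated with a small sketch. The fragment below is not the GDDRACCD utility; it only shows the general idea of comparing an expected device list (as would be derived from GDDR global variables) with an actual list (as would be returned by a query of the array), and the device numbers are made up for the example.

/* REXX - illustrative only: compare an expected SRDF device list      */
/* against an actual one and report deviations, in the spirit of the   */
/* GDDRACCD expectation check. Device numbers are invented examples.   */
expected = '00E1 00E2 00E3 00E4 011A'     /* devices expected from parms   */
actual   = '00E1 00E2 00E4 011A 011B'     /* devices reported by the query */

missing = ''                              /* expected but not found        */
extra   = ''                              /* found but not expected        */

do i = 1 to words(expected)
  d = word(expected, i)
  if wordpos(d, actual) = 0 then missing = missing d
end
do i = 1 to words(actual)
  d = word(actual, i)
  if wordpos(d, expected) = 0 then extra = extra d
end

if missing = '' & extra = '' then
  say 'Configuration matches the expectation.'
else do
  if missing <> '' then say 'Expected but not found:' missing
  if extra   <> '' then say 'Found but not expected:' extra
end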
  • 151. 6 Storage Infrastructure Design Considerations This chapter contains the following: ◆ Storage configuration and layout .................................................. 152 ◆ Infrastructure recommendations ................................................... 153 ◆ EMC software levels ........................................................................ 158 ◆ Sysplex and coupling facility considerations............................... 160 Storage Infrastructure Design Considerations 151
  • 152. Storage Infrastructure Design Considerations Storage configuration and layout Symmetrix storage systems incorporate many standard features that ensure a higher level of recoverability and availability than other storage systems. They are configurable at the logical volume level so different protection schemes can be applied to different classes of data within the same Symmetrix storage system on the same physical device. Customers choose data protection options, such as the following, to match their data requirements: ◆ Mirroring (RAID 1) or RAID 10 ◆ RAID 6 (6+2) and RAID 6 (14+2) ◆ RAID 5 (3+1) and RAID 5 (7+1) ◆ Symmetrix Remote Data Facility (SRDF) ◆ TimeFinder ◆ Dynamic Sparing ◆ Global Sparing 152 GDDR Solution Design and Implementation Techniques
Infrastructure recommendations

There is benefit in standardizing configuration best practices across the installed base of GDDR worldwide. This makes the support function more efficient, in that support people will be able to provide guidance aided by common naming conventions and infrastructure layout. Also, the recommendations in this section are designed to minimize the overheads associated with large infrastructure deployments. A good example is the number of device ranges: having very many small device ranges increases the overheads associated with infrastructure manipulations - consuming more C-System memory, more worker tasks, more commands issued to the arrays, and so on. Engaging the GDDR Solution Support team very early in your design phase will ensure an appropriate trade-off between configuration flexibility and associated overheads.

Gatekeeper device list

A gatekeeper device is a reserved device used to direct commands to a particular array. During certain activities, such as data center swaps, the volume of commands issued to these devices can be quite high. GDDR does not require gatekeeper devices for itself, but some underlying technologies do require these devices. For performance, and to avoid potential conflicts, the gatekeeper devices for SRDF/S, SRDF/A, MSC/Star, and so on must be defined and must be separate devices. It is very important to understand the layout of the gatekeeper devices and which component uses which gatekeepers. It is recommended to create and maintain charts similar to the following for each array in the solution.

This example shows the SRDF gatekeeper devices and RDF groups for an SRDF/Star with AutoSwap configuration with one VMAX frame located at each site. The following list of SRDF gatekeeper devices and RDF groups used at each location describes their intended purpose for use in commands. The required Host Component (H.C.) command prefix is also provided to help facilitate command generation for each specific RDF group. This additional information can be helpful in a configuration where multiple SRDF Host Component tasks are present.
DC1 Complex:

GDDRPARM Gatekeeper = 803D-8075 (00E1-0119)
H.C. Prefix ## Customer-Specific-Name VMAX# 2455:
  SRDF/S - STAR: Group = 10
  R/W - AD: Group = 21
  SRDF/A - Concurrent STAR: Group = 20
MSC/STAR Gatekeeper = 8037 (00DB)
SRDFA/MSC Gatekeeper = 8076 (011A)
SCF Gatekeeper = 8038-803A (00DC-00DE)
CSC Gatekeeper = 803B-803C (00DF-00E0)

DC2 Complex:

GDDRPARM Gatekeeper = 903D-9075 (00E1-0119)
H.C. Prefix ## Customer-Specific-Name VMAX# 2457:
  SRDF/S - STAR: Group = 10
  STAR - Recovery: Group = 30
MSC/STAR Gatekeeper = 9037 (00DB)
SRDFA/MSC Gatekeeper = 9076 (011A)
SCF Gatekeeper = 9038-903A (00DC-00DE)
CSC Gatekeeper = 903B-903C (00DF-00E0)
  • 155. Storage Infrastructure Design Considerations DC3 Complex: GDDRPARM Gatekeeper=803D-8075 (00E1-0119) H.C.Prefix ## Customer-Specific-Name VMAX# 2465: SRDF/A - Concurrent STAR: Group = 20 R/W - AD: Group = 21 STAR - Recovery: Group = 30 SRDFA/MSC Gatekeeper = 8076 (011A) SCF Gatekeeper = 8038-803A (00DC-00DE) CSC Gatekeeper = 803B-803C (00DF-00E0) Started tasks names EMC recommends the following conventions are adopted for started task names. STCs in DC1 GDDR managed Production LPARs: ◆ EMCBASE • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ EMCCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ EMCSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ GDDRSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) Infrastructure recommendations 155
  • 156. Storage Infrastructure Design Considerations ◆ GDDRCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) ◆ GDDREVM • SYSISVP.GDDR.PARMLIB(GDDREVMP) ◆ GDDRHBM • SYS1.GDDR.PARMLIB(GDDRHBMP) ◆ GDDRWORK ◆ GDDRPARMS • SYSISVP.GDDR.PARMLIB STCs in DC2 GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF • SYSISVP.GDDR.PARMLIB(SCFINI00) ◆ GDDRSRDF • SYSISVP.GDDR.PARMLIB(EMCCMD00) ◆ GDDRCGRP • SYSISVP.GDDR.PARMLIB(CONGRP34) ◆ GDDRMAIN • SYSISVP.GDDR.PARMLIB(GDDMPARM) ◆ GDDREVM • SYSISVP.GDDR.PARMLIB(GDDREVMP) ◆ GDDRHBM • SYSISVP.GDDR.PARMLIB(GDDRHBMP) ◆ GDDRWORK ◆ GDDRPARMS • SYSISVP.GDDR.PARMLIB STCs in DC3 GDDR Control LPAR Customer-Specific-Name: ◆ GDDRSCF 156 GDDR Solution Design and Implementation Techniques
• SYSISVP.GDDR.PARMLIB(SCFINI00)
◆ GDDRSRDF
• SYSISVP.GDDR.PARMLIB(EMCCMD00)
◆ GDDRMAIN
• SYSISVP.GDDR.PARMLIB(GDDMPARM)
◆ GDDREVM
• SYSISVP.GDDR.PARMLIB(GDDREVMP)
◆ GDDRWORK
◆ GDDRHBM
• SYSISVP.GDDR.PARMLIB(GDDRHBMP)

Important dataset names

The following is a list of the dataset names used by the various LPARs for GDDR control. These are the names as supplied by EMC; regardless of the naming convention used, it is important to ensure that any support staff are familiar with it:
◆ SYSISVP.GDDR.*
◆ SYS1.GDDR.*
◆ SYS1.EMC.*
EMC software levels

Maintain a list of software levels, as shown in Table 7.

Table 7 Software levels (Customer-Specific-Name, updated 8/17/2010)

Mainframe Enablers      Version   Version - Fix Level
ResourcePak Base        7.x       7.x - SF7xxxx
Consistency Group       7.x       7.x - SC7xxxx
SRDF Host Component     7.x       7.x - SR7xxxx
TimeFinder Mirror       7.x       7.x - ST7xxxx
TimeFinder Snap         7.x       7.x - SN7xxxx
TimeFinder Utility      7.x       7.x - SU7xxxx
GDDR                    3.x       3.x - GD3xxxx

Microcode
VMAX base code          5874.230.183
VMAX E-Packs            RB/23 with 50853, 50854
DMX-3/DMX-4 base code   5773.155.107
E-Packs                 RB/53 with 49123, 49125

LPAR configuration specifics

These are guidelines/best practices for GDDR C-System LPAR configurations. Each GDDR LPAR will require:
◆ 2 GB storage
◆ 2 shared CPs
◆ A minimum of 25 Mod-9 DASD units (or the equivalent if different volume sizes are used)
GDDR LPAR Configuration Restrictions and Considerations:
◆ GDDR Control LPAR DASD volumes are not shared with any other LPAR
◆ No Coupling Facility or Specialty Engine requirements
◆ No DR requirements for GDDR Control LPARs
◆ Initial GDDR LPAR to be modeled from the CMCX LPAR
◆ Maintain recommended EMC Symmetrix microcode levels
◆ Initial security databases created from the customer-environment LPAR database
◆ LPAR access restricted to infrastructure support staff

GDDR - Required Automation and Software Elements:
◆ z/OS 1.x Operating System
◆ CA/ACF2 Security
◆ EMC SRDF Product Suite

Example customer Infrastructure Management Components:
◆ IOC Console Operations Support / Automation (Tivoli NetView, SA for z/OS)
◆ Storage (SMS, DFHSM DASD Space Mgmt, Tape Mgmt)
◆ Network (FTP, Firewall, TN3270, NDM)
◆ Systems Mgmt (Incident, Change (ECMS))
◆ Change Mgmt (Version and Change Control)
◆ SMF and Logrec Data Collection (Security, Billing, Performance)
◆ Output Archival / Retrieval / Spool Mgmt (product)
◆ Infrastructure Job Scheduling (Data Collection, Infrastructure Backups)

Cross System Communications (CSC) Requirements:

Update the SCFINIxx member in SYSISVP.GDDR.PARMLIB with the following settings on all LPARs (PROD and GDDR C-Systems):
◆ SCF.CSC.IDLEPOLL=5
◆ SCF.CSC.EXPIRECYCLE=20
◆ SCF.CSC.MITPERIOD=60
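A quick way to confirm that these CSC settings are present on each LPAR is to read the member and check the values. The fragment below is a minimal sketch, assuming the SCFINI00 member name and the SYSISVP.GDDR.PARMLIB dataset shown above, and TSO/E REXX (ALLOC/EXECIO) services; adjust the names to your own conventions.

/* REXX - sketch: verify the CSC settings in the SCF initialization member */
parmdsn    = "'SYSISVP.GDDR.PARMLIB(SCFINI00)'"   /* adjust to your naming   */
required.1 = 'SCF.CSC.IDLEPOLL=5'
required.2 = 'SCF.CSC.EXPIRECYCLE=20'
required.3 = 'SCF.CSC.MITPERIOD=60'
required.0 = 3

"ALLOC F(PARMIN) DA("parmdsn") SHR REUSE"
"EXECIO * DISKR PARMIN (STEM line. FINIS"     /* read the whole member    */
"FREE F(PARMIN)"

do r = 1 to required.0
  found = 0
  do i = 1 to line.0
    /* remove blanks so 'KEYWORD = value' and 'KEYWORD=value' both match */
    if pos(required.r, space(line.i, 0)) > 0 then found = 1
  end
  if found then say 'OK      :' required.r
  else          say 'MISSING :' required.r
end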
Sysplex and coupling facility considerations

GDDR operations, system management, and enablement of data consistency are compliant with z/OS Sysplex availability, design, and operations. GDDR extends Sysplex availability by addressing a wide range of planned and unplanned operations for Sysplex systems, DASD resources, Sysplex resources, central processors, processor capacities, and sites.

Sysplex systems require a single time source and synchronous mirroring between the source DASD and recovery DASD. Asynchronous mirroring does not support Sysplex operations; therefore, Sysplex linkage between sites connected by asynchronous mirroring is not a consideration, either in a two-site GDDR SRDF/A configuration or to the third site in SRDF/Star configurations.

Sysplex configuration is a key component of the business continuity capability of a proposed solution. The design of the Sysplex, its required connectivity, and the topology of shared Sysplex resources all play a role in the solution's redundancy, performance, and business continuity. Typically, solutions designed for higher availability and continuous availability have increasingly complex Sysplex considerations, whereas GDDR CONGROUP solutions designed for site switch and system restart require only modest Sysplex considerations between the sites.

GDDR's C-Systems do not have any Sysplex operational requirement. The C-Systems are standalone systems outside any of the managed Sysplexes. This isolates the C-Systems from Sysplex-wide issues, simplifies C-System implementation, and allows the C-Systems to manage multiple Sysplexes simultaneously. It also allows asynchronous mirroring solutions, and some synchronous mirroring solutions, to operate without any cross-site Sysplex connectivity and without degradation of Recovery Time Objectives.

When data is shared within a Sysplex, data consistency is presumed, and GDDR's architecture should align with Sysplex requirements for data consistency. In most instances, a Sysplex's mirrored DASD resources required for the restart of systems and workload should belong to a single CONGROUP. There is no limitation on the number of Sysplexes encompassed by a single CONGROUP and managed by GDDR.
Significant variations in the required Sysplex connectivity depend on the Sysplex topology, the models of the z/series hardware, the Sysplex timing mechanisms, and the desired level of business continuity. Careful planning of the cross-site linkages, supported distances, sufficient redundancy, and considerations for performance impact must be part of the early design and project planning.

Typically, when architecting a multisite Sysplex, DWDMs are employed to coalesce, aggregate, and manage the Sysplex connectivity as well as the cross-site DASD and mirroring connectivity. Solutions including AutoSwap are easily facilitated by cross-site DASD connectivity over DWDMs. Furthermore, many of the z/series linkage protocols cannot employ channel extension, requiring dark fiber or DWDM connectivity.

When the solution includes AutoSwap, regardless of the Sysplex topology, all DASD that is required to stay up:

◆ Must be mirrored with SRDF/S
◆ Must be available to the systems at DC1 and DC2
◆ Must have all R1 devices on one site and all R2 devices on the opposite site
◆ If it has a UCB on the R1 side, must also have a UCB on the R2 side

When employing Sysplex LOGR, the Sysplex LOGGER data sets should be mirrored with SRDF/S, and with SRDF/A in SRDF/Star configurations.

Single Site Sysplex

All active production systems reside at the primary site; the systems comprising the Sysplex do not span DC1 and DC2. This is usually representative of GDDR CONGROUP or GDDR SRDF/A solutions, where a complete Sysplex restart is required to restart systems and workload at the recovery site. No Sysplex connectivity is required between sites, Sysplex time is not an issue, and duplexing of structures between sites is not indicated; structures would be rebuilt when the systems are IPLed at the recovery site.

Restart of systems can be facilitated by the occasional mirroring of system resources, including the Sysplex couple data sets. The Sysplex couple data sets should not be mirrored using SRDF/S and should not be in the CONGROUP (consider mirroring with Adaptive Copy). Alternatively, with minimal cross-site DASD connectivity, the Sysplex couple data sets can be maintained at both sites with Sysplex couple data set duplexing. In the event of a CONGROUP trip, GDDR does not attempt to stop all systems to preserve the consistency of the target SRDF/S DASD and the contents of the target duplexed Coupling Facility structures. This technique has only limited value during the restart of systems.

Multi-Site Sysplex

This architecture is consistent with solutions requiring high or continuous levels of availability, where some subset of operations is expected to continue in the event of system, DASD, or site failures, or multiple failures. The z/series systems hosting z/OS images in the multisite Sysplex at both sites must share a single, consistent time source in a Coordinated Timing Network (CTN). The CTN topology, redundancy, and protocols depend upon the timer technology employed. When employing STP, placement of the PTS and BTS is an important consideration.

Base Sysplex requires cross-site CTC connectivity between all z/series CPCs, a single time source, and common Sysplex couple data sets. Parallel Sysplex employs Coupling Facilities to facilitate system-to-system communications. CFs can be Internal or External Coupling Facilities, and all systems require connectivity to the CFs; these Sysplex links should be redundant. CFs should be deployed in a manner consistent with the business continuity objectives, and if structures are duplexed, the duplexing links should also be redundant.

AutoSwap is usually included in the implementation to eliminate DASD as a single point of failure. If so, all DASD required for operations must be mirrored. The Sysplex couple data sets must not be mirrored with SRDF/S and must rely upon Sysplex couple data set duplexing. Cross-site DASD connectivity consistent with cross-site operations is typically a consideration.
7  Best Practices for Network Design

This chapter contains the following:

◆ Overview ........................................................................................... 164
◆ Understanding bandwidth requirements..................................... 165
◆ Synchronous network design considerations .............................. 168
◆ Asynchronous network design considerations ........................... 172
Overview

After the synchronous and/or asynchronous bandwidth assessment has been conducted, attention turns to the design and construction of the network. There are many different types of networks, but for this discussion we differentiate between networks designed to carry synchronous workloads and those designed to carry asynchronous workloads.

Delays in the provisioning, debugging, and load testing of network infrastructure are among the most common causes of delay for GDDR projects. Remember the constraint framework for the project: if your network provisioning is late, be prepared to adjust the implementation timeline around new and realistic dates.
Understanding bandwidth requirements

Before a network topology can be designed, the data change rate must be measured. This change rate is used to determine both synchronous and asynchronous network bandwidth requirements. EMC TS has toolsets and methods that are used to measure existing I/O flows and construct data replication requirements over time. Replication requirements vary not only throughout the 24-hour day (batch versus online) but also throughout the month and year, with most enterprises having at least one peak processing period. Therefore, the data sampling for modeling must span a critical time period, or at least be "factored" from the observation period based on the known difference between the observation period and the peak period.

Both SRDF/S and SRDF/A use block-level I/O for replication, so the collected data can be used for both modeling exercises. The actual bandwidth requirement will typically vary between the two types of networks (even for combined synchronous and asynchronous configurations). For HA networks, page volumes and temporary data sets are included in the replication set; for asynchronous networks these data types are typically excluded, since if the DR site copy is ever needed, the contents of the page volumes and temporary data set volumes are not required for system restart.

Note: Figure 41 on page 166 shows a measured workload with significant write I/O imbalance across the three Symmetrix arrays. Part of the deployment of the project will be to move workloads between the arrays to achieve a more balanced I/O profile.
Figure 41  Write distribution

The raw (uncompressed) bandwidth requirement is determined by multiplying the number of writes observed in a specific interval by the size of the writes. Earlier design tools used the Connect Time to determine the transmission interval and then calculate the blocksize. That method does not work for FICON-attached arrays, so an average block size is instead determined by dividing the total bytes transferred in the time interval by the number of writes observed (a sketch of this arithmetic follows Figure 42). The method also varies based on the data source, such as SMF records versus internal EMC Symmetrix STP data.

For synchronous workloads, performance is critical. Therefore this analysis includes a check for "hot volumes"; if any are detected, a list is produced of the volumes where the overhead of synchronous replication is likely to be problematic. Further analysis of these volumes is required to determine their contents and identify mitigation techniques, which may include:

◆ Re-blocking data sets
◆ Separating hot data sets
◆ Deploying SMS data striping
◆ Changing ACS routines
◆ Balancing workload across Symmetrix arrays

At the conclusion of the study period, a report is produced which includes a profile of the observed bandwidth requirement, as shown in Figure 42 on page 167.

Note: Many customers are accustomed to working with their local account teams for the sizing of infrastructure deployments. Arguably, though, your account team will not have experience designing for this type of complex solution. The GDDR Solution Support Team can help to direct EMC corporate resources to ensure that this vital piece of supporting infrastructure meets best practice standards.

Figure 42  Bandwidth requirement
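The average block size and raw bandwidth arithmetic described above can be sketched as follows. The interval record layout and sample values are illustrative; in practice the write counts and byte totals come from SMF records or Symmetrix STP data, and the peak factor reflects the known difference between the observation period and the peak period.

```python
# Illustrative sketch of the raw (uncompressed) bandwidth arithmetic described above.
# Each interval record carries the observed write count and total bytes written;
# the field names and sample values are illustrative only.

from dataclasses import dataclass

@dataclass
class Interval:
    seconds: int        # length of the measurement interval
    write_count: int    # writes observed in the interval
    bytes_written: int  # total bytes transferred by those writes

def raw_bandwidth_mbps(interval: Interval, peak_factor: float = 1.0) -> float:
    """Raw bandwidth requirement in megabits per second for one interval.

    peak_factor scales an off-peak observation up to the known peak period,
    as described in the text (use 1.0 if the sample already covers the peak).
    """
    avg_block = interval.bytes_written / interval.write_count       # average block size
    bytes_per_sec = interval.write_count * avg_block / interval.seconds
    return bytes_per_sec * 8 / 1_000_000 * peak_factor

if __name__ == "__main__":
    sample = Interval(seconds=900, write_count=2_700_000, bytes_written=2_700_000 * 12_000)
    print(f"{raw_bandwidth_mbps(sample, peak_factor=1.3):.0f} Mb/s")
```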
Synchronous network design considerations

All array-based synchronous data replication creates an overhead on write response times. For this reason it is absolutely essential to understand the impact of synchronous data replication before building out any infrastructure. EMC has sophisticated modeling tools which can be used to apply the overheads of SRDF/S synchronous mirroring at zero distance and then add latency to the model as the distance between the primary (R1) and secondary (R2) storage is increased. At some point in these modeling studies the maximum amount of separation between the R1 and R2 becomes known: the point at which the increased I/O response time begins to slow write throughput so much that the observed write activity to the volume can no longer be sustained. Post-implementation, SRDF/S overheads can be seen as "disconnect" time when looking at I/O response times.

Figure 43 on page 169 shows one part of a typical synchronous design study output: DASD volumes sorted in descending order of write activity. Looking more closely at volume CI9003, it can be seen to have a peak write activity of 300 writes per second. In effect, this volume could have an average response time of up to 3.3 milliseconds (1 second / 300 writes) and still sustain a throughput of 300 writes per second. A separate analysis (not shown) reveals the average response time to this volume to be 0.5 milliseconds. Subtracting 0.5 from 3.3 shows that the overheads of synchronous replication cannot add more than 3.3 - 0.5 = 2.8 milliseconds to this I/O before the application would experience performance degradation, because the throughput requirement of 300 writes per second cannot be maintained if the average response time exceeds 3.3 milliseconds.

Figure 43  DASD write activity by volser

Applying this logic to all volumes in the study group produces a chart similar to Figure 44 on page 170. The chart shows the existing throughput requirement (in blue) and the likely throughput requirement (in red) when the synchronous overheads at the customer-specific distance between the R1 and R2 devices are applied to the existing write I/O response time.
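The per-volume latency budget used in the CI9003 example can be expressed as a small calculation. The following sketch simply reproduces the arithmetic from the text; the input values are those of the worked example and would normally come from the study data.

```python
# Illustrative sketch of the per-volume latency budget logic used above for CI9003.
# peak_writes_per_sec and the current response time come from the study data;
# the values below reproduce the worked example in the text.

def sync_latency_budget_ms(peak_writes_per_sec: float, current_resp_ms: float) -> float:
    """Maximum added response time (ms) before peak write throughput can no longer be sustained."""
    ceiling_ms = 1000.0 / peak_writes_per_sec   # e.g. 300 writes/s -> 3.3 ms average ceiling
    return ceiling_ms - current_resp_ms

if __name__ == "__main__":
    budget = sync_latency_budget_ms(peak_writes_per_sec=300, current_resp_ms=0.5)
    print(f"SRDF/S may add up to {budget:.1f} ms to this volume's writes")  # ~2.8 ms
```

For volume CI9003 the sketch returns approximately 2.8 ms, matching the figure derived above.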
Figure 44  DASD volumes flagged for tuning

Tuning for synchronous replication can be done well in advance of the implementation of GDDR. The following techniques are typically used to address performance issues:

◆ Use sequential data striping.
◆ Change the blocksize of the data set. Many of the overheads associated with synchronous replication are fixed, such as the speed-of-light delay, and apply whether the block being transmitted is small or large. Therefore, large-block writes are more efficient in terms of write throughput over time in synchronous replication than smaller block sizes.
◆ Separate "hot" data sets — two or more write-I/O-intensive data sets on the same volser can be moved apart and separated.
◆ Change ACS routines — this may be needed to ensure that separated data sets remain separated.

Most customers deploy a "dark fiber" network to support synchronous replication. The Network Terminating Equipment (NTE) for these networks is usually a Dense Wavelength Division Multiplexer (DWDM). DWDMs are able to concentrate multiple wavelengths onto the single fiber(s) laid between data centers. DWDMs add very little latency, so the greatest impact in terms of write I/O performance is typically the speed-of-light transmission of data through the fiber, which is directly proportional to the distance of the fiber route.
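Because the speed-of-light delay is proportional to the fiber route distance, a first-order estimate of the distance penalty can be made before any modeling. The following sketch assumes roughly 5 microseconds per kilometer one way in fiber and a single round trip per synchronous write; the real SRDF/S protocol and the intervening equipment add further overhead, so treat this as a lower bound, not a substitute for the modeling studies described above.

```python
# Illustrative sketch: first-order speed-of-light estimate of the distance penalty
# on a synchronous write. Light in fiber travels roughly 5 microseconds per km one
# way; a single round trip per write is assumed, which understates the real
# SRDF/S protocol and equipment overheads.

ONE_WAY_US_PER_KM = 5.0

def propagation_penalty_ms(fiber_route_km: float, round_trips: int = 1) -> float:
    """Added milliseconds per write due to distance alone."""
    return fiber_route_km * ONE_WAY_US_PER_KM * 2 * round_trips / 1000.0

if __name__ == "__main__":
    for km in (10, 50, 100, 200):
        print(f"{km:>4} km route: ~{propagation_penalty_ms(km):.1f} ms per write")
```

Comparing the result against a volume's latency budget (for example, the 2.8 ms calculated for CI9003) gives an early indication of whether a proposed separation distance is plausible.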
Another advantage of the DWDM approach applies to customers looking to build HA infrastructures. In addition to the Fibre Channel packets generated by SRDF/S replication, the same fiber can carry the FICON signals necessary to support AutoSwap. In some instances customers build HA DASD infrastructures where the R1 and R2 devices are in the same data center (perhaps separated by a thermal firewall), in which case direct connection, or connection via cascading directors, is common.
Asynchronous network design considerations

Asynchronous replication is provided by SRDF/A. The interface to SRDF/A is EMC Multi-Session Consistency (MSC) software. MSC provides an API used by GDDR to control SRDF/A both when it is running in "MSC mode" and when it is running in "Star mode". The GDDR for SRDF/A configuration (two sites) controls SRDF/A replication using the MSC API, and the same API is used by GDDR to control GDDR for SRDF/Star solutions (three sites). For more specific information on these control interfaces, refer to "SRDF/Star with AutoSwap" on page 85, "GDDR for SRDF/S with ConGroup (Two sites - DC1 and DC2)" on page 119, and "GDDR for SRDF/A (Two sites - DC1 and DC3)" on page 123.

SRDF/A is designed as a cache-to-cache replication infrastructure. It typically supports cycle times down to three-second durations, assuming that the network infrastructure is designed to carry the workload. Since it is theoretically possible to lose up to two SRDF/A cycles in a DR event, the RPO is calculated by multiplying the SRDF/A cycle time by two. Thus, a 3-second cycle time is said to support an RPO of 6 seconds. It is important to understand that the RPO is an objective; at times it may not be possible for the replication infrastructure to maintain the objective time. Examples include program loops that generate large amounts of write I/O, unusually high write activity such as during database "reorgs", and a compromised network (for example, a failed switch).

EMC has instituted a number of measures to improve the resiliency of SRDF/A replication:

◆ Transmit Idle — Temporarily suspends SRDF/A data transmission to allow SRDF/A replication to survive temporary link problems such as link "bounces".
◆ Delta Set Extension (DSE) — Elongates the SRDF/A capture cycle during longer-duration temporary link problems, instead of terminating SRDF/A replication. DSE uses a disk pool to buffer the current capture cycle data until such time as it can be de-staged across the link.
◆ Write Pacing — Slows the arrival rate of write activity into the capture cycle by accepting the write to cache, disconnecting for a specified period of time (milliseconds), and then reconnecting to complete the I/O.
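The RPO arithmetic described earlier in this section (up to two cycles can be lost, so the objective is twice the cycle time) can be sketched as follows; the cycle times shown are illustrative values only.

```python
# Illustrative sketch of the RPO arithmetic described above: since up to two
# SRDF/A cycles can be lost in a DR event, the objective is twice the cycle time.

def srdfa_rpo_seconds(cycle_time_seconds: float) -> float:
    """Recovery Point Objective implied by an SRDF/A cycle time."""
    return 2 * cycle_time_seconds

if __name__ == "__main__":
    for cycle in (3, 15, 30, 60):
        print(f"{cycle:>3} s cycle time -> {srdfa_rpo_seconds(cycle):.0f} s RPO")
```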
None of these resiliency measures should be assumed to be a substitute for provisioning the network to the observed peak bandwidth requirement. To build one or two years of operational resiliency into the network, it should be provisioned at the peak observed rate plus an allowance for the annual rate of increase in observed throughput. This allowance can be calculated from the known growth in storage. Assuming, for example, a growth rate of 20% over the past 12 months, the allowance for capacity increase can be calculated as:

Bandwidth_growth = (Total_Write_IOs * I/O_Size * 20%) / Compression_Ratio

Many of the GDDR-supported configurations are very large. The designer must consider the scalability of the proposed network and also the ease of troubleshooting. For mainframe workloads, EMC has observed FCIP networks to be reliable, scalable, and relatively easy to troubleshoot and tune. Consider operational complexity when looking to tune or balance network workloads. Gig/E networks are becoming more popular, but they do introduce more complexity for workload balancing, and, if used, WAN optimizers can complicate the network architecture and impact the throughput, latency, and scalability of the network.
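The bandwidth-growth allowance formula above can be sketched as follows. The growth rate and compression ratio are inputs rather than fixed values; the 20% growth rate matches the example in the text, and the 2:1 compression ratio is an illustrative assumption that should be replaced with the ratio observed for the actual workload.

```python
# Illustrative sketch of the bandwidth-growth allowance formula above. The growth
# rate and compression ratio are inputs to the calculation; the defaults below are
# illustrative assumptions, not recommendations.

def bandwidth_growth_mbps(total_write_ios_per_sec: float,
                          avg_io_size_bytes: float,
                          annual_growth_rate: float = 0.20,
                          compression_ratio: float = 2.0) -> float:
    """Extra megabits per second to provision for one year of growth."""
    raw_bps = total_write_ios_per_sec * avg_io_size_bytes * 8
    return raw_bps * annual_growth_rate / compression_ratio / 1_000_000

if __name__ == "__main__":
    extra = bandwidth_growth_mbps(total_write_ios_per_sec=3000, avg_io_size_bytes=12_000)
    print(f"Provision roughly an extra {extra:.0f} Mb/s on top of the observed peak")
```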
  • 174. Best Practices for Network Design 60 61 BI BI 60 62 61 63 10Gbps ULH DC1 Bouh Router 62 64 63 65 BI BI 66 67 68 69 BI BI 68 6A 69 6B DC2 10Gbps ULH Bouh Router 6A 6C 6B 6D BI BI 6E SYM-002545 Figure 45 Recommended asynchronous network topology 174 GDDR Solution Design and Implementation Techniques
Glossary

This glossary contains terms related to disk storage subsystems. Many of these terms are used in this manual.

A

alternate track: A track designated to contain data in place of a defective primary track. See also "primary track."

actuator: A set of access arms and their attached read/write heads, which move as an independent component within a head and disk assembly (HDA).

adapter: Card that provides the physical interface between the director and disk devices (SCSI adapter), director and parallel channels (Bus & Tag adapter), director and serial channels (Serial adapter).

C

cache: Random access electronic storage used to retain frequently used data for faster access by the channel.

cache slot: Unit of cache equivalent to one track.

channel director: The component in the Symmetrix subsystem that interfaces between the host channels and data storage. It transfers data between the channel and cache.

controller ID: Controller identification number of the director the disks are channeled to for EREP usage. There is only one controller ID for Symmetrix.

CKD: Count Key Data, a data recording format employing self-defining record formats in which each record is represented by a count area that identifies the record and specifies its format, an optional key area that may be used to identify the data area contents, and a data area that contains the user data for the record. CKD can also refer to a set of channel commands that are accepted by a device that employs the CKD recording format.

D

DASD: Direct access storage device, a device that provides nonvolatile storage of computer data and random access to that data.

data availability: Access to any and all user data by the application.

delayed fast write: There is no room in cache for the data presented by the write operation.

destage: The asynchronous write of new or updated data from cache to disk device.

device: A uniquely addressable part of the Symmetrix subsystem that consists of a set of access arms, the associated disk surfaces, and the electronic circuitry required to locate, read, and write data. See also "volume."

device address: The hexadecimal value that uniquely defines a physical I/O device on a channel path in an MVS environment. See also "unit address."

device number: The value that logically identifies a disk device in a string.

diagnostics: System level tests or firmware designed to inspect, detect, and correct failing components. These tests are comprehensive and self-invoking.

director: The component in the Symmetrix subsystem that allows Symmetrix to transfer data between the host channels and disk devices. See also "channel director."

disk director: The component in the Symmetrix subsystem that interfaces between cache and the disk devices.

dual-initiator: A Symmetrix feature that automatically creates a backup data path to the disk devices serviced directly by a disk director, if that disk director or the disk management hardware for those devices fails.

dynamic sparing: A Symmetrix feature that automatically transfers data from a failing disk device to an available spare disk device without affecting data availability. This feature supports all non-mirrored devices in the Symmetrix subsystem.

E

ESCON: Enterprise Systems Connection, a set of IBM and vendor products that connect mainframe computers with each other and with attached storage, locally attached workstations, and other devices using optical fiber technology and dynamically modifiable switches called ESCON Directors. See also "ESCON director."

ESCON director: Device that provides a dynamic switching function and extended link path lengths (with XDF capability) when attaching an ESCON channel to a Symmetrix serial channel interface.

F

fast write: In Symmetrix, a write operation at cache speed that does not require immediate transfer of data to disk. The data is written directly to cache and is available for later destaging.

FBA: Fixed Block Architecture, disk device data storage format using fixed-size data blocks.

FRU: Field Replaceable Unit, a component that is replaced or added by service personnel as a single entity.

frame: Data packet format in an ESCON environment. See also "ESCON."

G

gatekeeper: A small logical volume on a Symmetrix storage subsystem used to pass commands from a host to the Symmetrix storage subsystem. Gatekeeper devices are configured on standard Symmetrix disks.

GB: Gigabyte, 10⁹ bytes.
H

head and disk assembly: A field replaceable unit in the Symmetrix subsystem containing the disk and actuator.

home address: The first field on a CKD track that identifies the track and defines its operational status. The home address is written after the index point on each track.

hyper-volume extension: The ability to define more than one logical volume on a single physical disk device, making use of its full formatted capacity. These logical volumes are user-selectable in size. The minimum volume size is one cylinder and the maximum size depends on the disk device capacity and the emulation mode selected.

I

ID: Identifier, a sequence of bits or characters that identifies a program, device, controller, or system.

IML: Initial microcode program loading.

index marker: Indicates the physical beginning and end of a track.

index point: The reference point on a disk surface that determines the start of a track.

INLINES: An EMC-provided host-based Cache Reporter utility for viewing short and long term cache statistics at the system console.

I/O device: An addressable input/output unit, such as a disk device.

K

K: Kilobyte, 1024 bytes.

L

least recently used algorithm (LRU): The algorithm used to identify and make available cache space by removing the least recently used data.

logical volume: A user-defined storage device. In the Model 5200, the user can define a physical disk device as one or two logical volumes.

long miss: Requested data is not in cache and is not in the process of being fetched.

longitudinal redundancy code (LRC): Exclusive OR (XOR) of the accumulated bytes in the data record.

M

MB: Megabyte, 10⁶ bytes.

mirrored pair: A logical volume with all data recorded twice, once on each of two different physical devices.

mirroring: The Symmetrix maintains two identical copies of a designated volume on separate disks. Each volume automatically updates during a write operation. If one disk device fails, Symmetrix automatically uses the other disk device.

P

physical ID: Physical identification number of the Symmetrix director for EREP usage. This value automatically increments by one for each director installed in Symmetrix. This number must be unique in the mainframe system. It should be an even number. This number is referred to as the SCU_ID.

primary track: The original track on which data is stored. See also "alternate track."

promotion: The process of moving data from a track on the disk device to a cache slot.

R

read hit: Data requested by the read operation is in cache.

read miss: Data requested by the read operation is not in cache.

record zero: The first record after the home address.

S

scrubbing: The process of reading, checking the error correction bits, and writing corrected data back to the source.
SCSI adapter: Card in the Symmetrix subsystem that provides the physical interface between the disk director and the disk devices.

short miss: Requested data is not in cache, but is in the process of being fetched.

SSID: For 3990 storage control emulations, this value identifies the physical components of a logical DASD subsystem. The SSID must be a unique number in the host system. It should be an even number and start on a zero boundary.

stage: The process of writing data from a disk device to cache.

storage control unit: The component in the Symmetrix subsystem that connects Symmetrix to the host channels. It performs channel commands and communicates with the disk directors and cache. See also "channel director."

string: A series of connected disk devices sharing the same disk director.

U

unit address: The hexadecimal value that uniquely defines a physical I/O device on a channel path in an MVS environment. See also "device address."

V

volume: A general term referring to a storage device. In the Symmetrix subsystem, a volume corresponds to a single disk device.

W

write hit: There is room in cache for the data presented by the write operation.

write miss: There is no room in cache for the data presented by the write operation.