SlideShare a Scribd company logo
IBM MQ High Availability
David Ware
IBM MQ Chief Architect
© 2019 IBM Corporation 1
© 2018 IBM Corporation
High availability
2
A system is said to be available if it is able
to perform its required function, such as
successfully process requests from
users.
A requirement, or a capability, of a
system to be operational for a greater
proportion of time than is common for
other, less important, systems.
Available Highly Available
Often, greater availability means greater complexity and cost
A sliding scale
0% 100%
© 2018 IBM Corporation
Measuring availability
3
Target
95%
99%
99.5%
99.9%
99.99%
99.999%
99.9999%
Impacts on availability
o Applying maintenance
o Likelihood of outages (meantime to
failure, and speed of recovery)
o Operational errors
Overall availability is the combined
availability of all components
o The platform
o The middleware
o The applications
Yearly outage
~ 18 days
< 4 days
< 2 days
< 9 hours
~ 1 hour
~ 5 minutes
~ 30 seconds
‘5 nines’
© 2018 IBM Corporation
Messaging system availability
4
Asynchronous messaging can improve application availability by providing a buffer but the
messaging system itself must be highly available to achieve that
Redundancy
Multiple active options
available for applications to
connect
Routing
Ability to route messages
around failures
Message availability
Critical messages are not
locked to a single runtime
and quickly available from
elsewhere
© 2018 IBM Corporation
Messaging system availability
5
Asynchronous messaging can improve application availability by providing a buffer but the
messaging system itself must be highly available to achieve that
Redundancy
Multiple active options
available for applications to
connect
Routing
Ability to route messages
around failures
Message availability
Critical messages are not
locked to a single runtime
and quickly available from
elsewhere
Required for continuous
availabilities Only required for certain message flows
© 2018 IBM Corporation
Messaging system availability – MQ
6
Asynchronous messaging can improve application availability by providing a buffer but the
messaging system itself must be highly available to achieve that
Redundancy
Multiple active options
available for applications to
connect
Routing
Ability to route messages
around failures
Message availability
Critical messages are not
locked to a single runtime
and quickly available from
elsewhere
Application client
connectivity
MQ Cluster routing MQ ‘HA’
© 2018 IBM Corporation
Application client connectivity
© 2018 IBM Corporation
Decouple the applications from queue managers
Applications locally bound to a queue manager will
limit the availability of the solution.
Running applications remote from the queue
managers, always connecting as MQ clients,
decouples the application and system runtimes,
enabling higher availability.
© 2018 IBM Corporation
Decouple the applications from queue managers
Step 1
Connect the application as a client
Benefits:
o Ability to support solutions where a queue manager
may fail-over between systems (more later).
o Separates application system requirements from the
queue manager’s, reducing maintenance conflicts and
therefore, availability.
o Restart times on either side can be reduced.
Should be relatively invisible to the application
o Don’t hardcode that connection configuration!
o Use client auto-reconnect to hide a queue manager
restart from the application
Qmgr
App
© 2018 IBM Corporation
Decouple the applications from queue managers
Step 2
Allow the application to connect to a set of queue managers
Benefits:
o Applications can continue to interact with MQ even
whilst a queue manager is failing over or unavailable
during maintenance
o With multiple applications connected, only a subset
will be impacted by a queue manager outage
How does your application find the queue manager?
o Network routing
o Connection name lists
o Client Channel Definition Tables
The application may need to re-evaluate how it exploits all of
MQ’s capabilities
o Message ordering may change if it is currently
expected across connections
o Applications may be reliant on transitory state:
o Dynamic queues and subscriptions
o Reply messages
o XA transaction recovery
Qmgr
App
Qmgr Qmgr
This might not work for all applications
© 2018 IBM Corporation
What do CCDTs enable?
These provide encapsulation and abstraction of
connection information for applications, hiding the
MQ architecture and configuration from the
application
They also enable security, high availability and
workload balancing of clients
Applications simply connect to an abstracted
“queue manager” name (which doesn’t need to be
the actual queue manager name – use an ‘*’)
CCDT defines which real queue managers the
application will connect to. Which could be a single
queue manager or a group of
Across a group, selection can be ordered or
randomised and weighted
CCDT
QMGRP1: QM1
QMGRP2: QM2
QMGRP2: QM3
Application 1
connect: *QMGRP1
Application 2
connect: *QMGRP2
Application 2
connect: *QMGRP2
QM1
QM2
QM3
© 2018 IBM Corporation
Creating the CCDTs
CCDTs can be used to represent connection details to multiple queue
managers
Prior to MQ 9.1.2, a CCDT is generated by using MQ tooling to define
CLNTCONN channels that identify the SVRCONN channels
Define multiple CLNTCONNs in a central place to generate the CCDT
It doesn’t have to be any of the queue managers owning the SVRCONNs
Pre-MQ V8: You needed a dedicated queue manager for this purpose
MQ V8+: Use runmqsc –n to remove the need for a queue manager
MQ 9.1.2 added the ability to build your own JSON format CCDT files.
This has also added the ability to define multiple channels of the same name on
different queue managers
A single CCDT for your MQ estate or one per application?
A single CCDT can be easier to create but updates can be expensive
Separate CCDTs make it easier to update when an application’s needs change
Create the QMgrs and define
their SVRCONNs
Centrally define all the
CLNTCONNs that represent
all the QMgrs
Take the CCDT and make
available to the connecting
applications
{
“channel”:[
{
“name”:”ABC”,
”queueManager”:”A”
},
{
“name”:”ABC”,
”queueManager”:”B”
},
]
}
© 2018 IBM Corporation
Accessing the CCDTs
CCDT files need to be accessible to the applications connecting to MQ
?
CCDT
HTTP
server
CCDT
QM1
QM2
QM3
Either accessible through the client’s filesystem
User needs to manage distribution of CCDT files themselves
Or remotely over HTTP or FTP
Available for JMS/XMS applications for a number of releases
Added for MQI applications in MQ V9 LTS
MQI App
MQCONN(QMGRP2)
CCDT
QMGR1
QMGR2
QMGR3
CCDT
QM1
QM2
QM3
QM2 QM3QM1 QM2 QM3QM1
MQI App
MQCONN(QMGRP2)
MQI App
MQCONN(QMGRP2)
MQI App
MQCONN(QMGRP2)
CCDT
QMGR1
QMGR2
QMGR3
CCDT
QMGR1
QMGR2
QMGR3
CCDT
QMGR1
QMGR2
QMGR3
MQI App
MQCONN(QMGRP2)
MQI App
MQCONN(QMGRP2)
MQI App
MQCONN(QMGRP2)
MQI App
MQCONN(QMGRP2)
© 2018 IBM Corporation
MQ Cluster routing
© 2018 IBM Corporation
Routing on availability with MQ Clusters
15
• MQ Clusters provide a way to route messages
based on availability
• In a cluster there can be multiple potential targets
for any message. This alone can improve the
availability of the solution, always providing an
option to process new messages.
• A queue manager in a cluster also has the ability to
route new and old messages based on the
availability of the channels, routing messages to
running queue managers.
• Clustering can also be used to route messages to
active consuming applications.
• Clustering is used by many customers who
operate critical services at scale
• Available on all supported MQ platforms
ServiceService
App 1App 1Client
Service
Queue Manager Queue Manager Queue Manager
Queue Manager
© 2018 IBM Corporation
MQ ‘HA’
(message availability)
© 2018 IBM Corporation
Message high availability
17
– Consider a single message
– Tied to a single runtime, on a single piece of
hardware
– Any failure locks it away until recovery
completes
The problem The objective
– Messages are not tied to a single anything
– In the event of a failure, there is a fast route
to access the message
© 2018 IBM Corporation
Message high availability
18
– Messages are highly available, through
replication
– Only one runtime is the leader and has access
to the messages at a time
– A failure results in a new leader taking over
– Any message is available from any runtime at
any time
– Coordinated access to each message
– A failed runtime does not prevent access to a
message by another runtime
Active / active messages Active / passive messages
© 2018 IBM Corporation
Message high availability
19
Active / active messages Active / passive messages
IBM MQ Distributed HA solutions
(but there’s nothing to stop you having multiple different
queue managers to share the load and the risk
More later)
IBM MQ for z/OS shared queues
© 2018 IBM Corporation
Coupling Facility
MQ for z/OS shared queues
20
–MQ queue-sharing groups (QSGs)
– Available with z/OS parallel sysplex
• A tightly coupled cluster of independent z/OS instances
– Multiple queue managers are members of a queue
sharing group (QSG)
– Shared queues are held in the Parallel SysPlex
Coupling Facility
• A highly optimised and resilient z/OS technology
– All queue managers in a QSG can access the same
shared queues and their messages
– Benefits:
ü Messages remain available even if a queue manager
fails
ü Pull workload balancing
ü Applications can connect to the group using a QSG
name
ü Removes affinity to a specific queue manager
Queue Sharing Group
QMgr QMgr
QMgr
App
App
App
© 2018 IBM Corporation
IBM MQ Distributed HA solutions
21
MQ managed
The resilient data and the automatic takeover is
provided by the MQ system
Externally managed
External mechanisms are relied on to protect
the data and provide automatic takeover
capabilities
System managed HA
QMgr QMgr
Multi-instance
queue managers
QMgr QMgr
MQ Appliance
QMgr
QMgr
Replicated data
queue managers
QMgr QMgr QMgr
new
© 2018 IBM Corporation
Externally managed HA
© 2018 IBM Corporation
System managed HA
23
The HA manager monitors the MQ system (e.g. a queue
manager in a VM or container), on detecting a failure it
will start a new system, remount storage and reroute
network traffic
– Relies on external, highly available, storage (e.g. SAN)
– A queue manager is unaware of the HA system
– Availability depends on speed to detect problems and
to restart all layers of the system required (e.g. VM
and queue manager)
Examples:
– HA Clusters
• Veritas Cluster Server, IBM PowerHA, Microsoft Cluster
Server
– Cloud platforms
• IBM Cloud, AWS EC2, Azure
– Containers
• Kubernetes, Docker Swarm
QMgr QMgr
HA manager
– Some systems can be relatively slow to restart
– Additional cost of infrastructure
– Multiple moving parts to configure and manage
active
active
passive
passive
© 2018 IBM Corporation
Multi-instance queue managers
24
QMgr
– Only as reliable as the network attached storage
– Matching the MQ requirements to filesystem
behaviour can be tricky
– No IP address takeover, use client configuration
instead
– All queue manager data is held on network attached
storage (e.g. NFS, IBM Spectrum Scale).
– Two systems are running, both have an instance of
the same queue manager, pointed to the same
storage. One is active, the other is in standby.
– A failure of the active instance is detected by the
standby through regularly attempting to take
filesystem locks.
– The queue manager with the locks is the active
instance.
– Faster takeover, less to restart
– Cheaper – less specialised software or administration
skills needed
– Wide platform coverage, Windows, Unix, Linux
QMgr
active
active
active
passive
© 2018 IBM Corporation
MQ managed HA
© 2018 IBM Corporation© 2018 IBM Corporation
o A pair of MQ Appliances are connected together and
configured as an HA group
o Queue managers created on one appliance can be
automatically replicated, along with all the MQ data, to the
other
o Appliances monitor each other
o Automatic failover, plus manual failover for
migration or maintenance
o Independent failover for queue managers so both
appliances can run workload (active / active load)
o Optional IP address associated with an HA queue
manager, automatically adopted by the active HA
appliance – single logical endpoint for client apps
o No persistent data loss on failure
o No external storage
o No additional skills required
IBM MQ Appliance
QMgrQMgr
QMgrQMgr
QMgr
active
active
QMgr
active
passive
© 2018 IBM Corporation© 2018 IBM Corporation
MQ HA Group
Node 2 Node 3Node 1
Synchronous data replication
27
Replicated Data Queue Managers
o Linux only, MQ Advanced HA solution with no
need for a shared file system or HA cluster
o MQ configures the underlying resources to make
setup and operations natural to an MQ user
o Three-way replication for quorum support
o Synchronous data replication for once and once
only transactional delivery of messages
o Active/passive queue managers with automatic
takeover
o Per queue manager control to support
active/active utilisation of nodes
o Per queue manager IP address to provide simple
application setup
o Supported on RHEL v7 x86-64 only
Monitoring
App
Network
New in V9.0.4 CD / V9.1 LTS MQ Advanced
for Linux
© 2018 IBM Corporation© 2018 IBM Corporation
MQ HA Group
Node 2 Node 3Node 1
Synchronous data replication
28
Replicated Data Queue Managers
Monitoring
App
Networko Linux only, MQ Advanced HA solution with no
need for a shared file system or HA cluster
o MQ configures the underlying resources to make
setup and operations natural to an MQ user
o Three-way replication for quorum support
o Synchronous data replication for once and once
only transactional delivery of messages
o Active/passive queue managers with automatic
takeover
o Per queue manager control to support
active/active utilisation of nodes
o Per queue manager IP address to provide simple
application setup
o Supported on RHEL v7 x86-64 only
New in V9.0.4 CD / V9.1 LTS MQ Advanced
for Linux
© 2018 IBM Corporation© 2018 IBM Corporation
MQ HA Group
Node 2 Node 3Node 1
29
Replicated Data Queue Managers
QM 1
App
NetworkRecommended deployment pattern:
o Spread the workload across multiple queue
managers and distribute them across all three
nodes
o Even better, more than one queue manager per
node for better failover distribution
o Use MQ Clusters for additional routing of
messages to work around problems
o Pair this with the new Uniform Cluster capability
http://ibm.biz/MQ-UniCluster
MQ licensing is aligned to maximise benefits
o One full IBM MQ Advanced license and two
High Availability Replica licenses (previously
named Idle Standby)
App App
© 2018 IBM Corporation© 2018 IBM Corporation
Node 2Node 1
data replication
36
External/MQ managed HA with RDQM App
Disaster recovery
9.0.5 CD MQ Advanced added the ability to build a
looser coupled pair of nodes for data replication with
manual failover
Data replication can be
Asynchronous for systems separated by a high
latency network
Synchronous for systems on a low latency
network
No automatic takeover means no need for a third
node to provide a quorum
Think 2018 / March 21, 2018 / © 2018 IBM Corporation
© 2019 IBM Corporation41
High Availability with Kubernetes
The RDQM solution does not apply to container
environments
High availability of the MQ data requires highly
available replicated storage
Container orchestrators such as Kubernetes handle
much of the monitoring and restart
responsibilities…
Node 1 Node 3Node 2
HA network
storage
Pod 1
StatefulSet, Replicas=1
Kubernetes
© 2019 IBM Corporation42
High Availability with Kubernetes
The RDQM solution does not apply to container
environments
High availability of the MQ data requires highly
available replicated storage
Container orchestrators such as Kubernetes handle
much of the monitoring and restart
responsibilities…
…but not all. StatefullSets such as MQ are not
automatically restarted following a Kubernetes
node failure
Node 3Node 2
HA network
storage
StatefulSet, Replicas=1
Kubernetes
© 2019 IBM Corporation43
High Availability with Kubernetes
The RDQM solution does not apply to container
environments
High availability of the MQ data requires highly
available replicated storage
Container orchestrators such as Kubernetes handle
much of the monitoring and restart
responsibilities…
…but not all. StatefullSets such as MQ are not
automatically restarted following a Kubernetes
node failure
The MQ container image and Certified Container
now supports a two-replica multi-instance queue
manager deployment pattern to handle Kubernetes
node failures
Node 1 Node 3Node 2
HA network
storage
Pod 1
multi-instance
active
Pod 2
multi-instance
standby
StatefulSet, Replicas=2
Kubernetes
IBM MQ 9.1.3 CD
© 2018 IBM Corporation
Cost of a restart
© 2018 IBM Corporation
Speed of failover
45
System managed
QMgr QMgr
Multi-instance
queue managers
QMgr QMgr
MQ Appliance
QMgr
QMgr
Replicated data
queue managers
QMgr QMgr QMgr
Starting a whole VM can be slow
Containers can b e much quicker
Recovering data from network attached
storage can be slow
Detect failure
Restart underlying system
Start queue manager
Recover messaging state
Reconnect applications
Reliant on storage configuration
Reconnecting applications
o The applications must also detect the failure
before attempting to reconnect
o Make sure the channel heartbeat interval is
set suitably low
Message state recovery
o This time is very dependent on the workload of
the queue manager
o High persistent traffic load and deep queues
can significantly increase the time needed
9.1.1/9.1.2 made
significant improvements
https://developer.ibm.com/messaging/2019/06/12/improved-switch-fail-over-times-in-mq-v9-1-2/
© 2018 IBM Corporation
What, where?
© 2018 IBM Corporation
Which HA fits where
47
Shared
queues
System
managed
Multi instance RDQM Appliance HA
z/OS ✓
Distributed
platforms
✓ ✓ ✓**
Containers ✓* ✓
MQ Appliance ✓
* This will depend heavily on the capabilities of the container management layer
** RHEL x86 only, bare metal and virtual machines
© 2018 IBM Corporation
Pulling it all together
© 2018 IBM Corporation© 2018 IBM Corporation
Building a highly available system
Decouple the applications from the
underlying MQ infrastructure
MQ
App
App
App
App
App
App
App
App
Service requestor or
event emitters
Service provider or
long running consumer
© 2018 IBM Corporation© 2018 IBM Corporation
High availability of the individual MQ runtimes
should be baked into the design
• Remove any single point of failure
App
App
App
App
App
App
App
App
Make each queue manager
highly available
Have multiple equivalent queue managers
Building a highly available system
© 2018 IBM Corporation© 2018 IBM Corporation
The applications should be designed and
configured to maximise the availability of the
MQ runtime
App
App
App
App
App
App
App
App
Use CCDTs and design
applications to be able to
connect to one of many
queue managers
Connect instances of
your service applications
to more than one queue
manager
Building a highly available system
MQ 9.1.2 introduced the
Uniform Cluster capability
to aid in building these
topologies
http://ibm.biz/MQ-UniCluster
Thank you
David Ware
Chief Architect, IBM MQ
www.linkedin.com/in/dware1
© 2019 IBM Corporation / IBM Confidential
52

More Related Content

PPTX
Building an Active-Active IBM MQ System
PDF
IBM MQ - High Availability and Disaster Recovery
PDF
IBM MQ and Kafka, what is the difference?
PPT
IBM Websphere MQ Basic
PPTX
IBM MQ Overview (IBM Message Queue)
PDF
Overview of Site Reliability Engineering (SRE) & best practices
PDF
System architecture for central banks
PPTX
CRISC Course Preview
Building an Active-Active IBM MQ System
IBM MQ - High Availability and Disaster Recovery
IBM MQ and Kafka, what is the difference?
IBM Websphere MQ Basic
IBM MQ Overview (IBM Message Queue)
Overview of Site Reliability Engineering (SRE) & best practices
System architecture for central banks
CRISC Course Preview

What's hot (20)

PDF
Fault tolerant and scalable ibm mq
PDF
IBM MQ Update, including 9.1.2 CD
PDF
IBM MQ - What's new in 9.2
PDF
Websphere MQ (MQSeries) fundamentals
PDF
IBM Think 2018: IBM MQ High Availability
PDF
IBM Integration Bus High Availability Overview
PDF
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
PPT
IBM MQ Online Tutorials
PPTX
REST APIs and MQ
PDF
IBM MQ - better application performance
PDF
WebSphere MQ tutorial
PPTX
What's new with MQ on z/OS 9.3 and 9.3.1
PPTX
New Tools and Interfaces for Managing IBM MQ
PPTX
Deploying and managing IBM MQ in the Cloud
PDF
Kafka with IBM Event Streams - Technical Presentation
PDF
MQ Guide France - IBM MQ and Containers
PDF
Open shift 4 infra deep dive
PPT
IBM WebSphere MQ Introduction
PPTX
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
PPTX
The RabbitMQ Message Broker
Fault tolerant and scalable ibm mq
IBM MQ Update, including 9.1.2 CD
IBM MQ - What's new in 9.2
Websphere MQ (MQSeries) fundamentals
IBM Think 2018: IBM MQ High Availability
IBM Integration Bus High Availability Overview
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ Online Tutorials
REST APIs and MQ
IBM MQ - better application performance
WebSphere MQ tutorial
What's new with MQ on z/OS 9.3 and 9.3.1
New Tools and Interfaces for Managing IBM MQ
Deploying and managing IBM MQ in the Cloud
Kafka with IBM Event Streams - Technical Presentation
MQ Guide France - IBM MQ and Containers
Open shift 4 infra deep dive
IBM WebSphere MQ Introduction
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
The RabbitMQ Message Broker
Ad

Similar to IBM MQ High Availability 2019 (20)

PPTX
The enterprise differentiator of mq on zos
PDF
Designing IBM MQ deployments for the cloud generation
PPTX
What's New In MQ 9.2 on z/OS
PDF
IBM MQ cloud architecture blueprint
PDF
Whats new in MQ V9.1
PDF
Connecting IBM MessageSight to the Enterprise
PDF
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
PDF
MQ Guide France - What's new in ibm mq 9.1.4
PPT
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
PPT
Running IBM MQ in the Cloud
PPTX
CTU 2017 - I168 IBM MQ in the cloud
PDF
IBM Managing Workload Scalability with MQ Clusters
PDF
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
PDF
IBM MQ V9 Overview
PPTX
2397 The MQ Appliance as a messaging in a box and MQ MFT hub solution
PDF
Building a resilient and scalable solution with IBM MQ on z/OS
PDF
20200113 - IBM Cloud Côte d'Azur - DeepDive Kubernetes
PDF
Realtime mobile&iot solutions using mqtt and message sight
PPTX
Multi-cloud deployment with IBM MQ
PPT
Hybrid messaging webcast: Using the best of both worlds to drive your busines...
The enterprise differentiator of mq on zos
Designing IBM MQ deployments for the cloud generation
What's New In MQ 9.2 on z/OS
IBM MQ cloud architecture blueprint
Whats new in MQ V9.1
Connecting IBM MessageSight to the Enterprise
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
MQ Guide France - What's new in ibm mq 9.1.4
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
Running IBM MQ in the Cloud
CTU 2017 - I168 IBM MQ in the cloud
IBM Managing Workload Scalability with MQ Clusters
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
IBM MQ V9 Overview
2397 The MQ Appliance as a messaging in a box and MQ MFT hub solution
Building a resilient and scalable solution with IBM MQ on z/OS
20200113 - IBM Cloud Côte d'Azur - DeepDive Kubernetes
Realtime mobile&iot solutions using mqtt and message sight
Multi-cloud deployment with IBM MQ
Hybrid messaging webcast: Using the best of both worlds to drive your busines...
Ad

More from David Ware (9)

PDF
IBM MQ What's new - Sept 2022
PDF
What's new in IBM MQ, March 2018
PDF
Whats new in IBM MQ; V9 LTS, V9.0.1 CD and V9.0.2 CD
PDF
InterConnect 2016: IBM MQ self-service and as-a-service
PDF
InterConnect 2016: What's new in IBM MQ
PDF
IBM MQ: Using Publish/Subscribe in an MQ Network
PDF
IBM MQ: An Introduction to Using and Developing with MQ Publish/Subscribe
PPT
IBM WebSphere MQ: Managing Workloads, Scaling and Availability with MQ Clusters
PPT
IBM WebSphere MQ: Using Publish/Subscribe in an MQ Network
IBM MQ What's new - Sept 2022
What's new in IBM MQ, March 2018
Whats new in IBM MQ; V9 LTS, V9.0.1 CD and V9.0.2 CD
InterConnect 2016: IBM MQ self-service and as-a-service
InterConnect 2016: What's new in IBM MQ
IBM MQ: Using Publish/Subscribe in an MQ Network
IBM MQ: An Introduction to Using and Developing with MQ Publish/Subscribe
IBM WebSphere MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM WebSphere MQ: Using Publish/Subscribe in an MQ Network

Recently uploaded (20)

PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PPTX
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
PPTX
Transform Your Business with a Software ERP System
PDF
PTS Company Brochure 2025 (1).pdf.......
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PDF
Nekopoi APK 2025 free lastest update
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PPTX
ai tools demonstartion for schools and inter college
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
CHAPTER 2 - PM Management and IT Context
PPTX
Online Work Permit System for Fast Permit Processing
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PPT
Introduction Database Management System for Course Database
Softaken Excel to vCard Converter Software.pdf
Design an Analysis of Algorithms I-SECS-1021-03
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
Transform Your Business with a Software ERP System
PTS Company Brochure 2025 (1).pdf.......
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Nekopoi APK 2025 free lastest update
VVF-Customer-Presentation2025-Ver1.9.pptx
ai tools demonstartion for schools and inter college
Operating system designcfffgfgggggggvggggggggg
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Upgrade and Innovation Strategies for SAP ERP Customers
How to Choose the Right IT Partner for Your Business in Malaysia
CHAPTER 2 - PM Management and IT Context
Online Work Permit System for Fast Permit Processing
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
Navsoft: AI-Powered Business Solutions & Custom Software Development
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
Introduction Database Management System for Course Database

IBM MQ High Availability 2019

  • 1. IBM MQ High Availability David Ware IBM MQ Chief Architect © 2019 IBM Corporation 1
  • 2. © 2018 IBM Corporation High availability 2 A system is said to be available if it is able to perform its required function, such as successfully process requests from users. A requirement, or a capability, of a system to be operational for a greater proportion of time than is common for other, less important, systems. Available Highly Available Often, greater availability means greater complexity and cost A sliding scale 0% 100%
  • 3. © 2018 IBM Corporation Measuring availability 3 Target 95% 99% 99.5% 99.9% 99.99% 99.999% 99.9999% Impacts on availability o Applying maintenance o Likelihood of outages (meantime to failure, and speed of recovery) o Operational errors Overall availability is the combined availability of all components o The platform o The middleware o The applications Yearly outage ~ 18 days < 4 days < 2 days < 9 hours ~ 1 hour ~ 5 minutes ~ 30 seconds ‘5 nines’
  • 4. © 2018 IBM Corporation Messaging system availability 4 Asynchronous messaging can improve application availability by providing a buffer but the messaging system itself must be highly available to achieve that Redundancy Multiple active options available for applications to connect Routing Ability to route messages around failures Message availability Critical messages are not locked to a single runtime and quickly available from elsewhere
  • 5. © 2018 IBM Corporation Messaging system availability 5 Asynchronous messaging can improve application availability by providing a buffer but the messaging system itself must be highly available to achieve that Redundancy Multiple active options available for applications to connect Routing Ability to route messages around failures Message availability Critical messages are not locked to a single runtime and quickly available from elsewhere Required for continuous availabilities Only required for certain message flows
  • 6. © 2018 IBM Corporation Messaging system availability – MQ 6 Asynchronous messaging can improve application availability by providing a buffer but the messaging system itself must be highly available to achieve that Redundancy Multiple active options available for applications to connect Routing Ability to route messages around failures Message availability Critical messages are not locked to a single runtime and quickly available from elsewhere Application client connectivity MQ Cluster routing MQ ‘HA’
  • 7. © 2018 IBM Corporation Application client connectivity
  • 8. © 2018 IBM Corporation Decouple the applications from queue managers Applications locally bound to a queue manager will limit the availability of the solution. Running applications remote from the queue managers, always connecting as MQ clients, decouples the application and system runtimes, enabling higher availability.
  • 9. © 2018 IBM Corporation Decouple the applications from queue managers Step 1 Connect the application as a client Benefits: o Ability to support solutions where a queue manager may fail-over between systems (more later). o Separates application system requirements from the queue manager’s, reducing maintenance conflicts and therefore, availability. o Restart times on either side can be reduced. Should be relatively invisible to the application o Don’t hardcode that connection configuration! o Use client auto-reconnect to hide a queue manager restart from the application Qmgr App
  • 10. © 2018 IBM Corporation Decouple the applications from queue managers Step 2 Allow the application to connect to a set of queue managers Benefits: o Applications can continue to interact with MQ even whilst a queue manager is failing over or unavailable during maintenance o With multiple applications connected, only a subset will be impacted by a queue manager outage How does your application find the queue manager? o Network routing o Connection name lists o Client Channel Definition Tables The application may need to re-evaluate how it exploits all of MQ’s capabilities o Message ordering may change if it is currently expected across connections o Applications may be reliant on transitory state: o Dynamic queues and subscriptions o Reply messages o XA transaction recovery Qmgr App Qmgr Qmgr This might not work for all applications
  • 11. © 2018 IBM Corporation What do CCDTs enable? These provide encapsulation and abstraction of connection information for applications, hiding the MQ architecture and configuration from the application They also enable security, high availability and workload balancing of clients Applications simply connect to an abstracted “queue manager” name (which doesn’t need to be the actual queue manager name – use an ‘*’) CCDT defines which real queue managers the application will connect to. Which could be a single queue manager or a group of Across a group, selection can be ordered or randomised and weighted CCDT QMGRP1: QM1 QMGRP2: QM2 QMGRP2: QM3 Application 1 connect: *QMGRP1 Application 2 connect: *QMGRP2 Application 2 connect: *QMGRP2 QM1 QM2 QM3
  • 12. © 2018 IBM Corporation Creating the CCDTs CCDTs can be used to represent connection details to multiple queue managers Prior to MQ 9.1.2, a CCDT is generated by using MQ tooling to define CLNTCONN channels that identify the SVRCONN channels Define multiple CLNTCONNs in a central place to generate the CCDT It doesn’t have to be any of the queue managers owning the SVRCONNs Pre-MQ V8: You needed a dedicated queue manager for this purpose MQ V8+: Use runmqsc –n to remove the need for a queue manager MQ 9.1.2 added the ability to build your own JSON format CCDT files. This has also added the ability to define multiple channels of the same name on different queue managers A single CCDT for your MQ estate or one per application? A single CCDT can be easier to create but updates can be expensive Separate CCDTs make it easier to update when an application’s needs change Create the QMgrs and define their SVRCONNs Centrally define all the CLNTCONNs that represent all the QMgrs Take the CCDT and make available to the connecting applications { “channel”:[ { “name”:”ABC”, ”queueManager”:”A” }, { “name”:”ABC”, ”queueManager”:”B” }, ] }
  • 13. © 2018 IBM Corporation Accessing the CCDTs CCDT files need to be accessible to the applications connecting to MQ ? CCDT HTTP server CCDT QM1 QM2 QM3 Either accessible through the client’s filesystem User needs to manage distribution of CCDT files themselves Or remotely over HTTP or FTP Available for JMS/XMS applications for a number of releases Added for MQI applications in MQ V9 LTS MQI App MQCONN(QMGRP2) CCDT QMGR1 QMGR2 QMGR3 CCDT QM1 QM2 QM3 QM2 QM3QM1 QM2 QM3QM1 MQI App MQCONN(QMGRP2) MQI App MQCONN(QMGRP2) MQI App MQCONN(QMGRP2) CCDT QMGR1 QMGR2 QMGR3 CCDT QMGR1 QMGR2 QMGR3 CCDT QMGR1 QMGR2 QMGR3 MQI App MQCONN(QMGRP2) MQI App MQCONN(QMGRP2) MQI App MQCONN(QMGRP2) MQI App MQCONN(QMGRP2)
  • 14. © 2018 IBM Corporation MQ Cluster routing
  • 15. © 2018 IBM Corporation Routing on availability with MQ Clusters 15 • MQ Clusters provide a way to route messages based on availability • In a cluster there can be multiple potential targets for any message. This alone can improve the availability of the solution, always providing an option to process new messages. • A queue manager in a cluster also has the ability to route new and old messages based on the availability of the channels, routing messages to running queue managers. • Clustering can also be used to route messages to active consuming applications. • Clustering is used by many customers who operate critical services at scale • Available on all supported MQ platforms ServiceService App 1App 1Client Service Queue Manager Queue Manager Queue Manager Queue Manager
  • 16. © 2018 IBM Corporation MQ ‘HA’ (message availability)
  • 17. © 2018 IBM Corporation Message high availability 17 – Consider a single message – Tied to a single runtime, on a single piece of hardware – Any failure locks it away until recovery completes The problem The objective – Messages are not tied to a single anything – In the event of a failure, there is a fast route to access the message
  • 18. © 2018 IBM Corporation Message high availability 18 – Messages are highly available, through replication – Only one runtime is the leader and has access to the messages at a time – A failure results in a new leader taking over – Any message is available from any runtime at any time – Coordinated access to each message – A failed runtime does not prevent access to a message by another runtime Active / active messages Active / passive messages
  • 19. © 2018 IBM Corporation Message high availability 19 Active / active messages Active / passive messages IBM MQ Distributed HA solutions (but there’s nothing to stop you having multiple different queue managers to share the load and the risk More later) IBM MQ for z/OS shared queues
  • 20. © 2018 IBM Corporation Coupling Facility MQ for z/OS shared queues 20 –MQ queue-sharing groups (QSGs) – Available with z/OS parallel sysplex • A tightly coupled cluster of independent z/OS instances – Multiple queue managers are members of a queue sharing group (QSG) – Shared queues are held in the Parallel SysPlex Coupling Facility • A highly optimised and resilient z/OS technology – All queue managers in a QSG can access the same shared queues and their messages – Benefits: ü Messages remain available even if a queue manager fails ü Pull workload balancing ü Applications can connect to the group using a QSG name ü Removes affinity to a specific queue manager Queue Sharing Group QMgr QMgr QMgr App App App
  • 21. © 2018 IBM Corporation IBM MQ Distributed HA solutions 21 MQ managed The resilient data and the automatic takeover is provided by the MQ system Externally managed External mechanisms are relied on to protect the data and provide automatic takeover capabilities System managed HA QMgr QMgr Multi-instance queue managers QMgr QMgr MQ Appliance QMgr QMgr Replicated data queue managers QMgr QMgr QMgr new
  • 22. © 2018 IBM Corporation Externally managed HA
  • 23. © 2018 IBM Corporation System managed HA 23 The HA manager monitors the MQ system (e.g. a queue manager in a VM or container), on detecting a failure it will start a new system, remount storage and reroute network traffic – Relies on external, highly available, storage (e.g. SAN) – A queue manager is unaware of the HA system – Availability depends on speed to detect problems and to restart all layers of the system required (e.g. VM and queue manager) Examples: – HA Clusters • Veritas Cluster Server, IBM PowerHA, Microsoft Cluster Server – Cloud platforms • IBM Cloud, AWS EC2, Azure – Containers • Kubernetes, Docker Swarm QMgr QMgr HA manager – Some systems can be relatively slow to restart – Additional cost of infrastructure – Multiple moving parts to configure and manage active active passive passive
  • 24. © 2018 IBM Corporation Multi-instance queue managers 24 QMgr – Only as reliable as the network attached storage – Matching the MQ requirements to filesystem behaviour can be tricky – No IP address takeover, use client configuration instead – All queue manager data is held on network attached storage (e.g. NFS, IBM Spectrum Scale). – Two systems are running, both have an instance of the same queue manager, pointed to the same storage. One is active, the other is in standby. – A failure of the active instance is detected by the standby through regularly attempting to take filesystem locks. – The queue manager with the locks is the active instance. – Faster takeover, less to restart – Cheaper – less specialised software or administration skills needed – Wide platform coverage, Windows, Unix, Linux QMgr active active active passive
  • 25. © 2018 IBM Corporation MQ managed HA
  • 26. © 2018 IBM Corporation© 2018 IBM Corporation o A pair of MQ Appliances are connected together and configured as an HA group o Queue managers created on one appliance can be automatically replicated, along with all the MQ data, to the other o Appliances monitor each other o Automatic failover, plus manual failover for migration or maintenance o Independent failover for queue managers so both appliances can run workload (active / active load) o Optional IP address associated with an HA queue manager, automatically adopted by the active HA appliance – single logical endpoint for client apps o No persistent data loss on failure o No external storage o No additional skills required IBM MQ Appliance QMgrQMgr QMgrQMgr QMgr active active QMgr active passive
  • 27. © 2018 IBM Corporation© 2018 IBM Corporation MQ HA Group Node 2 Node 3Node 1 Synchronous data replication 27 Replicated Data Queue Managers o Linux only, MQ Advanced HA solution with no need for a shared file system or HA cluster o MQ configures the underlying resources to make setup and operations natural to an MQ user o Three-way replication for quorum support o Synchronous data replication for once and once only transactional delivery of messages o Active/passive queue managers with automatic takeover o Per queue manager control to support active/active utilisation of nodes o Per queue manager IP address to provide simple application setup o Supported on RHEL v7 x86-64 only Monitoring App Network New in V9.0.4 CD / V9.1 LTS MQ Advanced for Linux
  • 28. © 2018 IBM Corporation© 2018 IBM Corporation MQ HA Group Node 2 Node 3Node 1 Synchronous data replication 28 Replicated Data Queue Managers Monitoring App Networko Linux only, MQ Advanced HA solution with no need for a shared file system or HA cluster o MQ configures the underlying resources to make setup and operations natural to an MQ user o Three-way replication for quorum support o Synchronous data replication for once and once only transactional delivery of messages o Active/passive queue managers with automatic takeover o Per queue manager control to support active/active utilisation of nodes o Per queue manager IP address to provide simple application setup o Supported on RHEL v7 x86-64 only New in V9.0.4 CD / V9.1 LTS MQ Advanced for Linux
  • 29. © 2018 IBM Corporation© 2018 IBM Corporation MQ HA Group Node 2 Node 3Node 1 29 Replicated Data Queue Managers QM 1 App NetworkRecommended deployment pattern: o Spread the workload across multiple queue managers and distribute them across all three nodes o Even better, more than one queue manager per node for better failover distribution o Use MQ Clusters for additional routing of messages to work around problems o Pair this with the new Uniform Cluster capability http://ibm.biz/MQ-UniCluster MQ licensing is aligned to maximise benefits o One full IBM MQ Advanced license and two High Availability Replica licenses (previously named Idle Standby) App App
  • 30. © 2018 IBM Corporation© 2018 IBM Corporation Node 2Node 1 data replication 36 External/MQ managed HA with RDQM App Disaster recovery 9.0.5 CD MQ Advanced added the ability to build a looser coupled pair of nodes for data replication with manual failover Data replication can be Asynchronous for systems separated by a high latency network Synchronous for systems on a low latency network No automatic takeover means no need for a third node to provide a quorum Think 2018 / March 21, 2018 / © 2018 IBM Corporation
  • 31. © 2019 IBM Corporation41 High Availability with Kubernetes The RDQM solution does not apply to container environments High availability of the MQ data requires highly available replicated storage Container orchestrators such as Kubernetes handle much of the monitoring and restart responsibilities… Node 1 Node 3Node 2 HA network storage Pod 1 StatefulSet, Replicas=1 Kubernetes
  • 32. © 2019 IBM Corporation42 High Availability with Kubernetes The RDQM solution does not apply to container environments High availability of the MQ data requires highly available replicated storage Container orchestrators such as Kubernetes handle much of the monitoring and restart responsibilities… …but not all. StatefullSets such as MQ are not automatically restarted following a Kubernetes node failure Node 3Node 2 HA network storage StatefulSet, Replicas=1 Kubernetes
  • 33. © 2019 IBM Corporation43 High Availability with Kubernetes The RDQM solution does not apply to container environments High availability of the MQ data requires highly available replicated storage Container orchestrators such as Kubernetes handle much of the monitoring and restart responsibilities… …but not all. StatefullSets such as MQ are not automatically restarted following a Kubernetes node failure The MQ container image and Certified Container now supports a two-replica multi-instance queue manager deployment pattern to handle Kubernetes node failures Node 1 Node 3Node 2 HA network storage Pod 1 multi-instance active Pod 2 multi-instance standby StatefulSet, Replicas=2 Kubernetes IBM MQ 9.1.3 CD
  • 34. © 2018 IBM Corporation Cost of a restart
  • 35. © 2018 IBM Corporation Speed of failover 45 System managed QMgr QMgr Multi-instance queue managers QMgr QMgr MQ Appliance QMgr QMgr Replicated data queue managers QMgr QMgr QMgr Starting a whole VM can be slow Containers can b e much quicker Recovering data from network attached storage can be slow Detect failure Restart underlying system Start queue manager Recover messaging state Reconnect applications Reliant on storage configuration Reconnecting applications o The applications must also detect the failure before attempting to reconnect o Make sure the channel heartbeat interval is set suitably low Message state recovery o This time is very dependent on the workload of the queue manager o High persistent traffic load and deep queues can significantly increase the time needed 9.1.1/9.1.2 made significant improvements https://developer.ibm.com/messaging/2019/06/12/improved-switch-fail-over-times-in-mq-v9-1-2/
  • 36. © 2018 IBM Corporation What, where?
  • 37. © 2018 IBM Corporation Which HA fits where 47 Shared queues System managed Multi instance RDQM Appliance HA z/OS ✓ Distributed platforms ✓ ✓ ✓** Containers ✓* ✓ MQ Appliance ✓ * This will depend heavily on the capabilities of the container management layer ** RHEL x86 only, bare metal and virtual machines
  • 38. © 2018 IBM Corporation Pulling it all together
  • 39. © 2018 IBM Corporation© 2018 IBM Corporation Building a highly available system Decouple the applications from the underlying MQ infrastructure MQ App App App App App App App App Service requestor or event emitters Service provider or long running consumer
  • 40. © 2018 IBM Corporation© 2018 IBM Corporation High availability of the individual MQ runtimes should be baked into the design • Remove any single point of failure App App App App App App App App Make each queue manager highly available Have multiple equivalent queue managers Building a highly available system
  • 41. © 2018 IBM Corporation© 2018 IBM Corporation The applications should be designed and configured to maximise the availability of the MQ runtime App App App App App App App App Use CCDTs and design applications to be able to connect to one of many queue managers Connect instances of your service applications to more than one queue manager Building a highly available system MQ 9.1.2 introduced the Uniform Cluster capability to aid in building these topologies http://ibm.biz/MQ-UniCluster
  • 42. Thank you David Ware Chief Architect, IBM MQ www.linkedin.com/in/dware1 © 2019 IBM Corporation / IBM Confidential 52