Going deep with MQ
Matt Leming: Architect, IBM MQ for z/OS
lemingma@uk.ibm.com
© 2020 IBM Corporation
What does going deep mean?
And when should you, and shouldn’t you do it?
A good queue is an empty queue
MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload
peaks and temporary application outages
A good queue is an empty queue, but application failures happen!
MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload
peaks and temporary application outages
Failure of a putting application has no real effect on MQ
Failure of a getting application will result in messages building up on its queues until the application restarts and begins processing again
A good queue is an empty queue, but application failures happen!
MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload
peaks and temporary application outages
Failure of a putting application has no real effect on MQ
Failure of a getting application will result in messages building up on its queues until the application restarts and begins processing again
If the getting application suffers an extended outage queues can completely fill up, resulting in the putting
application having to have a strategy for how to deal with a full queue
The aim of this presentation
This presentation aims to explore how deep a queue can go, and things you should bear in mind when that
happens
Are deep queues a valid use case for MQ?
MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload
peaks and temporary application outages
These outages might be minutes, hours or even days, but they are always temporary
We are not advocating using MQ as a database, i.e. keeping data on queues "for ever". MQ is not optimised for this use case: use a database instead!
Are deep queues the only answer?
No, but they are often the simplest from an application perspective. And, as we will see, really deep queues require some thought
Some alternatives include:
• Putting application detecting MQRC_Q_FULL and pausing putting more messages
• Putting application detecting MQRC_Q_FULL and putting new messages to a database, file, another queue
• Starting up a temporary getting application to move the messages off somewhere else
• Failing!
In addition to having to code these solutions, they often come with challenges, such as preserving message ordering
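The MQRC_Q_FULL alternatives above can be sketched in Python. This is a minimal illustration, with a stubbed `put` callable and a `QFullError` exception standing in for a real MQ client returning MQRC_Q_FULL; all names here are hypothetical:

```python
import time


class QFullError(Exception):
    """Stand-in for an MQPUT failing with MQRC_Q_FULL."""


def put_with_overflow(put, overflow, message, retries=3, pause=0.0):
    """Try the main queue; on repeated MQRC_Q_FULL divert to an overflow store.

    put/overflow are callables standing in for the real MQ put and the
    fallback (database, file, another queue). Returns where the message went.
    """
    for _ in range(retries):
        try:
            put(message)
            return "queue"
        except QFullError:
            time.sleep(pause)  # back off before retrying the put
    overflow(message)          # queue still full: divert instead of failing
    return "overflow"
```

Note that once messages start being diverted, getting them back in their original order is the application's problem, which is exactly the kind of challenge mentioned above.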
Life-cycle of a getting application outage
It is up to you to decide the maximum getting application outage duration that you want to tolerate
Some customers are now looking at multiple day outages with 10s of millions of messages a day building up
However that decision needs to be within the bounds of what is possible for MQ. There is a limit to the amount of data a queue manager can store
[Chart: queue depth against time — normal running, then a getting application outage during which depth builds to a peak; the getting application restarts and works through the backlog before normal running resumes. The duration of recovery from the outage follows the duration of the getting application outage]
Other important things
• When you decide what the queue size limits are, make sure you enforce them via configuration: MAXDEPTH,
MAXMSGL
• Make sure you monitor for a getting application outage, perhaps via DIS QSTATUS, or by service interval
events, so you can restart the getting application as soon as possible
• Monitor for the queue filling up, e.g. queue depth full, or high events, for similar reasons
• As the queue fills up you might start to see degraded putting application performance; it's worth understanding what this might look like so you can plan for it
Other important things
• When the getting application starts up it is going to need to catch up. How fast can it do this? E.g. with a 2 day outage and a getting application that can get messages twice as fast as they are put, it will take a total of 4 days from the start of the application outage to get back to normal
• Do you need to allow for a getting application outage occurring during this recovery window?
• Can the messages expire? And are you relying on that to keep the queue depth low?
• Be particularly aware of potential getting application inefficiencies. For example, getting using a message selector is relatively slow; this will be particularly noticeable on a deep queue
• Make sure you test the deep queue scenario, including recovery, to make sure everything works in the
timeframes you expect!
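The catch-up arithmetic in the first bullet can be sketched as a back-of-envelope model, ignoring message expiry and any further outages during the recovery window:

```python
def recovery_days(outage_days, get_to_put_ratio):
    """Days for the getting application to clear the backlog after restart.

    The backlog grows at the put rate during the outage; after restart it
    shrinks at (get rate - put rate). Rates are expressed as a ratio.
    """
    if get_to_put_ratio <= 1:
        raise ValueError("getter must be faster than putter to catch up")
    return outage_days / (get_to_put_ratio - 1)


# The slide's example: a 2 day outage with a getter twice as fast as the
# putter means 2 more days of catching up, so 4 days in total from the
# start of the outage back to normal running.
catch_up = recovery_days(2, 2)
total_days = 2 + catch_up
```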
So, how deep can I go?
                                      Approximate number of 1 KB messages
MQ on z/OS private queue              16.8 million
MQ on z/OS shared queue all in CF     613.6 million* **
MQ on z/OS shared queue on SMDS       1.4 billion* **
MQ on distributed                     68.4 billion*

We will see where these numbers come from later

* We haven't tested to the end of these limits
** Going this deep will mean you can't recover in the case of a structure failure. So non-persistent messages only!
Going deep with MQ on z/OS
Private queues
Private queues – buffer pools and page sets
Messages on private queues are stored in memory in buffer pools and might be moved into a page set on disk
Data in buffer pools and page sets is accessed in 4 KB pages. A single page might contain the queue spine, message metadata, message data for at most one message, or space usage information (a space map page)
A private queue is associated with a single buffer pool / page set pair so the size of a page set is the constraining
factor for private queues. A page set can be a maximum of 64 GB in size, which allows for ~16.8 million 1KB
messages. This is a rough calculation, ignoring space maps, spine pages, etc
To get a rough idea for the maximum number of messages of a given size that can be stored in a page set, round
up the size of the message to a multiple of 4 KB and divide 64 GB by that number
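That rough calculation can be sketched as follows, assuming, as the slide does, that every message occupies whole 4 KB pages and ignoring space map and spine pages:

```python
import math


def pageset_capacity(message_bytes, pageset_bytes=64 * 2**30):
    """Rough upper bound on messages per 64 GB page set.

    Each message is assumed to occupy whole 4 KB pages; space map and
    queue spine pages are ignored, per the slide's rough calculation.
    """
    pages_per_message = math.ceil(message_bytes / 4096)
    return pageset_bytes // (pages_per_message * 4096)


# A 1 KB message rounds up to one 4 KB page, giving ~16.8 million
# messages per 64 GB page set.
one_kb_capacity = pageset_capacity(1024)
```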
[Diagram: messages flow from a buffer pool to its page set]
Private queues – recommendations
If you want to allow a private queue to deal with an extensive getting application outage, consider putting it in its own page set and ideally its own buffer pool. This will allow for accurate sizing. You don't want two very deep queues on the same page set at the same time
There is obviously a limit to how much separation you can have here given that there can be at most 100 page
sets and buffer pools in a single z/OS queue manager
Indexing your queues
On z/OS you can specify the INDXTYPE attribute on local queues to tell the queue manager how to index the
queue: no index; by message ID; by correlation ID; by group ID
An index makes no difference if just getting the next available message off a queue, but makes a significant
difference otherwise, especially if the queue is deep
INDXTYPE only supports a single value, so choose it based on the most common approach for getting messages from the queue.
For private queues you can still use any approach regardless of the index, but it will be less efficient. The queue manager will tell you if you should consider indexing your queues
Indexing your queues
The index for private queues is maintained in the queue manager in 64 bit storage
Each message on an indexed queue uses 136 bytes to maintain the index, so 10 million messages use 1360 MB of 64-bit storage
So if you are going to have deep private queues which are indexed make sure you account for it in the
MEMLIMIT attribute of your *MSTR JCL
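The MEMLIMIT impact is straightforward arithmetic from the 136 bytes per message figure above:

```python
def index_storage_bytes(queue_depth, bytes_per_message=136):
    """64-bit storage used by a private queue index at a given depth.

    This is the amount to account for in MEMLIMIT, on top of the queue
    manager's other 64-bit storage needs.
    """
    return queue_depth * bytes_per_message


# 10 million indexed messages: 1 360 000 000 bytes, i.e. ~1360 MB
deep_queue_index = index_storage_bytes(10_000_000)
```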
Going deep with MQ on z/OS
Shared queues
Shared queues – storage
Shared queues are stored in a coupling facility (CF)
Messages may be entirely stored in the CF if the message size < 63KB, or a pointer to the message can be
stored in the CF and the remainder of the message offloaded to Db2 (not recommended, and not discussed
further) or in shared message data sets (SMDS)
The maximum supported CF structure size is 1TB. This is all real storage, so is relatively expensive
If SMDS is used each queue manager gets its own SMDS data set for the structure. The maximum size of a
single SMDS is 16 TB
Shared queues – small messages
Shared queues perform best when the message is held entirely in the CF, as that both minimises code path length and removes the need to interact with DASD
For messages which are < 63KB in size the best approach is therefore to keep them in the CF and only offload
them to SMDS when the queue starts getting deep during a getting application outage
This gives the best of both worlds: fast message access normally, but the ability to store lots of messages in the worst case
If you do decide to keep small messages in the CF and not to offload, the maximum number of 1KB messages
that can be stored in a single structure is approximately 613.6 million, based on the max CF size of 1TB
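The 613.6 million figure is consistent with roughly 1792 bytes of CF storage per 1 KB message. The split assumed below — one 256-byte list entry plus six 256-byte elements for the message data and headers — is an illustration, not an official breakdown:

```python
ENTRY_BYTES = 256    # assumed bytes per CF list entry
ELEMENT_BYTES = 256  # assumed bytes per CF list element


def cf_capacity(per_message_elements, structure_bytes=2**40):
    """Rough count of messages a 1 TB CF structure can hold entirely in CF."""
    per_message = ENTRY_BYTES + per_message_elements * ELEMENT_BYTES
    return structure_bytes // per_message


# A 1 KB message plus headers at ~6 elements gives ~613.6 million
# messages in a 1 TB structure, matching the table earlier.
small_msg_capacity = cf_capacity(per_message_elements=6)
```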
Shared queues – offload rules
Each MQ CFSTRUCT definition has three offload rules associated with it
Each rule specifies the maximum size of message that can be stored in the structure when the structure is over a given percentage full
The default rules assume that no messages < 63KB get offloaded until the structure is very full
These defaults are not likely to be good for a getting application outage where you want to maximise the number of messages you can store and minimise the amount of CF used. Instead you might want to start offloading all messages when the structure is, say, 10% full, as shown in the DEEPSTRUCT example below
DEFINE CFSTRUCT(SHALLOWSTRUCT)
  CFLEVEL(5) …
  OFFLOAD(SMDS)
  OFFLD1TH(70) OFFLD1SZ(32K)
  OFFLD2TH(80) OFFLD2SZ(4K)
  OFFLD3TH(90) OFFLD3SZ(0K)

DEFINE CFSTRUCT(DEEPSTRUCT)
  CFLEVEL(5) …
  OFFLOAD(SMDS)
  OFFLD1TH(10) OFFLD1SZ(0K)
  OFFLD2TH(10) OFFLD2SZ(0K)
  OFFLD3TH(10) OFFLD3SZ(0K)
Shared queues – offloaded messages
Each offloaded message requires a message pointer in the CF. The CF is used to maintain queueing semantics; the pointer allows the message to be located in SMDS
Offloaded messages require you to consider both the CF structure size and the space used in SMDS
A structure can contain at most 1.4 billion message pointers; you then need to consider how much space those messages will occupy in SMDS
A single SMDS data set can be up to 16TB in size. This can easily take 1.4 billion 1KB messages, but only 16
million 1MB messages
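These fully offloaded limits can be sketched using the 768-byte pointer cost shown in the diagram, ignoring SMDS block rounding:

```python
POINTER_BYTES = 768  # 1 entry (256 B) + 2 elements (2 x 256 B), per the slide


def offloaded_capacity(message_bytes,
                       structure_bytes=2**40,   # 1 TB CF structure
                       smds_bytes=16 * 2**40):  # 16 TB per SMDS data set
    """Messages storable when fully offloaded: limited by whichever runs
    out first, CF message pointers or SMDS space."""
    by_pointers = structure_bytes // POINTER_BYTES
    by_smds = smds_bytes // message_bytes
    return min(by_pointers, by_smds)


# 1 KB messages are pointer-limited (~1.4 billion);
# 1 MB messages are SMDS-limited (~16 million).
small = offloaded_capacity(1024)
large = offloaded_capacity(2**20)
```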
[Diagram: each offloaded message leaves a message pointer in the CF — 1 entry (256 bytes) + 2 elements (2 × 256 bytes) = 768 bytes in total]
Shared queues – backups
CF structures are in-memory structures. In the rare case that one fails, it needs to be rebuilt
With MQ this is done by periodically taking backups of the structure. If a structure failure occurs, the structure can be recovered from the backup plus the logs of the queue managers that have accessed the structure since the backup was taken
Deep queues have important implications for this process:
• Size and number of active and archive logs
• Backup time
• Backup frequency
• Recovery time
[Diagram: BACKUP CFSTRUCT(DEEPSTRUCT) writes the structure contents to the active and archive logs; RECOVER CFSTRUCT(DEEPSTRUCT) reads them back]
Shared queues – backups – archive and active logs
Backing up a structure involves writing the structure contents and the contents of the SMDS to a queue manager's active logs. Over time the active logs are then written to the archive logs
Therefore the backup might be only in the active logs, in a mixture of active and archive logs, or only in the archive logs
The contents of the active logs normally largely overlap the contents of the archive logs, so the limiting factor for a backup is the number and size of the archive logs. The biggest backup you can have is ~4096 GB. NB this is smaller than the maximum size of an SMDS!
[Diagram: the current active log (0) and previous active logs (-1, -2, -3) are archived in turn. While archiving is in progress the data exists only in the active log; once archiving completes it exists in both; eventually the oldest data (-3, -4) exists only in the archive. Up to 310 × 4 GB active logs; up to 1000 × 4 GB archive logs]
Shared queues – backups – archive and active logs
In order to recover a backup it needs to be accessible to the queue managers, so you need to make sure that the start of the backup remains in the available archive logs for that queue manager, otherwise you can't recover it!
You don't just need one backup: you need to be able to safely transition from one backup to the next, should anything fail while the backup is occurring. The "usable backup" cases in the diagram illustrate this
The conclusion of all this is that you really don't want a backup to be more than about a quarter of the available archive logs of a queue manager, i.e. < 1 TB at the most
For this extreme case you should also consider a separate queue manager just to perform the backups. This removes the risk of application messages pushing the backup out of the archive logs, and also means the backup process can't affect applications
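The "< 1 TB at the most" rule of thumb follows directly from the archive log numbers (up to 1000 × 4 GB archive logs per queue manager):

```python
ARCHIVE_LOGS = 1000   # maximum number of archive logs per queue manager
ARCHIVE_LOG_GB = 4    # maximum size of each archive log in GB


def max_safe_backup_gb(fraction=0.25):
    """Largest structure backup you would want, keeping it within a given
    fraction of one queue manager's total archive log space."""
    return ARCHIVE_LOGS * ARCHIVE_LOG_GB * fraction


# ~4000 GB of archive log space; a quarter of that gives the
# "< 1 TB at the most" guideline.
safe_backup = max_safe_backup_gb()
```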
[Diagram: with archive log 0 the newest available and archive log -999 the oldest, a backup that starts within the available archive logs is usable — RECOVER CFSTRUCT works, replaying the messages logged since the backup. A backup that has aged out of the available archive logs is unusable — RECOVER CFSTRUCT fails]
Shared queues – backups – time and frequency
Backing up a structure takes time. The best backup is one that contains minimal data, as that will be fast. IBM
recommends taking a backup of every structure at least every hour to minimise the amount of time recovery will
take
Care is needed here with deep queues!
Backing up a large structure will take a long time and, as discussed, a lot of log space
If your getting application outage can last several days, continuing with your normal backup strategy while the structure contains a very large number of messages might not be a good idea
Shared queues – backups – time and frequency
It is worth adjusting your backup strategy during extended getting application outages:
1) During normal running, back up every hour or so (whatever is normal for your site)
2) When the queue starts to get very deep, pause backups, or make them less frequent, until the getting application restarts and the queue depth has reduced
This approach minimises the repeated cost of taking a large backup, both in terms of CPU and log usage. However it does mean that any recovery will rely on reading a potentially large amount of log data across the queue managers in your group. Therefore you need to accurately size the active and archive logs across your group too, and consider how long recovery might take
A good time to pause is when the size of the backup starts to become several times the amount of data that would normally be written between backups
[Chart: queue depth against time — regular backups during normal running; backups are paused while the queue is deep during the getting application outage; regular backups resume once the getting application restarts and the depth falls]
Shared queues – backups – recovery
Frequent backups are recommended to minimise the amount of time it takes to recover from a structure failure
Recovery involves reading the backup from the logs and then scanning the logs of all queue managers that used the structure since the point of the backup. This means reading the active / archive logs backwards, which is typically slower than reading them forwards
With deep queues this could take a significant period of time. But it does require multiple failures all at the same
time (getting application outage for a significant period of time, and then subsequent CF structure failure)
It might be worth considering CF duplexing to reduce the chance of a structure failure, but bear in mind this will
result in increased CF CPU utilization, and more CF storage being required
[Chart: as above, but the structure fails while backups are paused, so recovery must replay all the log data written since the last backup]
Going deep with MQ on distributed
Queue files
Distributed takes a different approach to z/OS
Each queue gets its own in-memory set of buffers for temporarily staging messages. Separate buffers are used for persistent and non-persistent messages
These buffers can be tuned up to a maximum size of 100 MB; they default to 128 KB for non-persistent messages and 256 KB for persistent ones. These settings aren't as fully externalized as buffer pools on z/OS, but similar performance considerations apply
The buffers are asynchronously written to the queue file, and there is one queue file per queue
[Diagram: queues Q1, Q2 and Q3, each with its own non-persistent message buffers, persistent message buffers, and queue file]
Queue files
Queue file size is the ultimate upper limit for queue depth on distributed. The default maximum size is ~2 TB
From 9.2.0 adjusting this is simple, as shown in the example below. The maximum value is ~255 TB
Messages are stored in queue files in blocks. If the queue file is < 2 TB in size the block size is 512 bytes; above 2 TB the block size is 4 KB, which means a 1 KB message will use a whole block
The maximum number of 1 KB messages on a single queue is therefore ~68.4 billion
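A rough model of that limit, under the simplifying assumption that the whole file uses one block size chosen by its configured maximum:

```python
import math


def queue_file_capacity(message_bytes, max_file_bytes=255 * 2**40):
    """Rough maximum messages per queue file on distributed MQ.

    Assumes the block size is determined by the configured maximum file
    size (4 KB above 2 TB, else 512 bytes) and each message occupies
    whole blocks.
    """
    block = 4096 if max_file_bytes > 2 * 2**40 else 512
    blocks_per_message = math.ceil(message_bytes / block)
    return max_file_bytes // (blocks_per_message * block)


# 255 TB of 4 KB blocks: ~68.4 billion 1 KB messages
max_depth = queue_file_capacity(1024)
```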
DEFINE QL(NEWQUEUE) MAXFSIZE(500)        create a queue with a maximum file size of 500 MB
ALTER QL(EXISTINGQUEUE) MAXFSIZE(1000)   alter an existing queue to a maximum file size of 1000 MB

DIS QSTATUS(NEWQUEUE) CURFSIZE CURMAXFS  display current and maximum file size:
QUEUE(NEWQUEUE)
   CURFSIZE(39) CURMAXFS(500)            the queue is using 39 MB of its 500 MB
Logging
Distributed supports both linear and circular logging
Linear logging is of interest with deep queues as it allows you to periodically back up queue files onto the log, i.e. create a media image
As with shared queues, on distributed you need to think carefully about when you create a media image of a very large queue file, as it will consume a lot of log space which you will have to maintain
Queue managers can be configured to create media images automatically based on time, or amount of log usage
since the last image. If you use this you might want to switch it off for long getting application outages
[Chart: queue depth against time — regular media images during normal running; the last media image is taken as the getting application outage begins, and regular media images resume after the getting application restarts and the depth falls]
Recommended reading
For z/OS much of this information, along with some example values, is in the capacity planning and tuning guide (MP16). I strongly recommend reading it
http://ibm-messaging.github.io/mqperf/mp16.pdf
For distributed, take a look at
https://ibm-messaging.github.io/mqperf/MQ_Performance_Best_Practices_v1.0.1.pdf
Thank you
Matt Leming
Architect, IBM MQ for z/OS
lemingma@uk.ibm.com
© 2022 IBM Corporation

More Related Content

PDF
WebSphere MQ tutorial
PPTX
Building an Active-Active IBM MQ System
PDF
IBM Integration Bus High Availability Overview
PDF
IBM MQ Whats new - up to 9.3.4.pdf
PDF
IBM MQ - Comparing Distributed and z/OS platforms
PDF
IBM MQ High Availability 2019
PDF
Websphere MQ (MQSeries) fundamentals
PDF
Rabbitmq an amqp message broker
WebSphere MQ tutorial
Building an Active-Active IBM MQ System
IBM Integration Bus High Availability Overview
IBM MQ Whats new - up to 9.3.4.pdf
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ High Availability 2019
Websphere MQ (MQSeries) fundamentals
Rabbitmq an amqp message broker

What's hot (20)

PPTX
Tanzu Kubernetes Grid - Presentation.pptx
PDF
DevOps - Interview Question.pdf
PPT
IBM Websphere MQ Basic
PDF
PRISMACLOUD Cloud Security and Privacy by Design
PDF
IBM MQ: An Introduction to Using and Developing with MQ Publish/Subscribe
PDF
Fault tolerant and scalable ibm mq
PPTX
The RabbitMQ Message Broker
PPTX
IBM MQ Whats new - up to 9.3.4.pptx
PDF
Microservices
PDF
Hands On Introduction To Ansible Configuration Management With Ansible Comple...
PDF
Red Hat OpenShift Operators - Operators ABC
PPTX
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
PDF
Meetup #4: AWS ELB Deep dive & Best practices
PPTX
Communication in a Microservice Architecture
PPTX
Distribuciones Linux
PDF
CRI, OCI, and CRI-O
PDF
VLAN vs VXLAN
PDF
Microservices with Docker, Kubernetes, and Jenkins
PPTX
Prometheus (Prometheus London, 2016)
Tanzu Kubernetes Grid - Presentation.pptx
DevOps - Interview Question.pdf
IBM Websphere MQ Basic
PRISMACLOUD Cloud Security and Privacy by Design
IBM MQ: An Introduction to Using and Developing with MQ Publish/Subscribe
Fault tolerant and scalable ibm mq
The RabbitMQ Message Broker
IBM MQ Whats new - up to 9.3.4.pptx
Microservices
Hands On Introduction To Ansible Configuration Management With Ansible Comple...
Red Hat OpenShift Operators - Operators ABC
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
Meetup #4: AWS ELB Deep dive & Best practices
Communication in a Microservice Architecture
Distribuciones Linux
CRI, OCI, and CRI-O
VLAN vs VXLAN
Microservices with Docker, Kubernetes, and Jenkins
Prometheus (Prometheus London, 2016)
Ad

Similar to Going Deep with MQ (20)

PDF
Triage Presentation
ODP
The Art of Message Queues - TEKX
PDF
Postponed Optimized Report Recovery under Lt Based Cloud Memory
PDF
#4 Mulesoft Virtual Meetup Kolkata December 2020
PDF
Kafka Overview
PPTX
Apache Kafka - Messaging System Overview
PDF
Understanding Apache Kafka P99 Latency at Scale
PDF
Non-Kafkaesque Apache Kafka - Yottabyte 2018
PDF
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
PPTX
MuleSoft Meetup Singapore #8 March 2021
PPTX
Kafka RealTime Streaming
PDF
Streaming Analytics unit 2 notes for engineers
ODP
Apache ActiveMQ and Apache Camel
PPTX
Introduction to requirement of microservices
PDF
101 mistakes FINN.no has made with Kafka (Baksida meetup)
DOCX
RabbitMQ in Microservice Architecture.docx
PPT
Mq Lecture
PDF
Kafka Deep Dive
PPT
01-MessagePassingFundamentals.ppt
PDF
MongoDB Sharding
Triage Presentation
The Art of Message Queues - TEKX
Postponed Optimized Report Recovery under Lt Based Cloud Memory
#4 Mulesoft Virtual Meetup Kolkata December 2020
Kafka Overview
Apache Kafka - Messaging System Overview
Understanding Apache Kafka P99 Latency at Scale
Non-Kafkaesque Apache Kafka - Yottabyte 2018
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
MuleSoft Meetup Singapore #8 March 2021
Kafka RealTime Streaming
Streaming Analytics unit 2 notes for engineers
Apache ActiveMQ and Apache Camel
Introduction to requirement of microservices
101 mistakes FINN.no has made with Kafka (Baksida meetup)
RabbitMQ in Microservice Architecture.docx
Mq Lecture
Kafka Deep Dive
01-MessagePassingFundamentals.ppt
MongoDB Sharding
Ad

More from Matt Leming (16)

PDF
533-MigratingYourMQIApplicationsToJMS.pdf
PPTX
What's new with MQ on z/OS 9.3 and 9.3.1
PPTX
Connecting mq&amp;kafka
PPTX
What's New In MQ 9.2 on z/OS
PDF
Building a resilient and scalable solution with IBM MQ on z/OS
PDF
What's new in MQ 9.1.* on z/OS
PPTX
Where is my MQ message on z/OS?
PPTX
REST APIs and MQ
PDF
What's new in MQ 9.1 on z/OS
PPTX
The enterprise differentiator of mq on zos
PDF
Where is My Message
PPTX
New Tools and Interfaces for Managing IBM MQ
PDF
MQ Support for z/OS Connect
PDF
HHM-2833: Where is My Message?: Using IBM MQ Tools to Work Out What Applicati...
PDF
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
PDF
HHM-3481: IBM MQ for z/OS: Enhancing Application and Messaging Connectivity ...
533-MigratingYourMQIApplicationsToJMS.pdf
What's new with MQ on z/OS 9.3 and 9.3.1
Connecting mq&amp;kafka
What's New In MQ 9.2 on z/OS
Building a resilient and scalable solution with IBM MQ on z/OS
What's new in MQ 9.1.* on z/OS
Where is my MQ message on z/OS?
REST APIs and MQ
What's new in MQ 9.1 on z/OS
The enterprise differentiator of mq on zos
Where is My Message
New Tools and Interfaces for Managing IBM MQ
MQ Support for z/OS Connect
HHM-2833: Where is My Message?: Using IBM MQ Tools to Work Out What Applicati...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3481: IBM MQ for z/OS: Enhancing Application and Messaging Connectivity ...

Recently uploaded (20)

PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
AI in Product Development-omnex systems
PPTX
ai tools demonstartion for schools and inter college
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
Digital Strategies for Manufacturing Companies
PPTX
Online Work Permit System for Fast Permit Processing
PPTX
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PPTX
ManageIQ - Sprint 268 Review - Slide Deck
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PDF
System and Network Administration Chapter 2
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
Which alternative to Crystal Reports is best for small or large businesses.pdf
How to Migrate SBCGlobal Email to Yahoo Easily
Operating system designcfffgfgggggggvggggggggg
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
AI in Product Development-omnex systems
ai tools demonstartion for schools and inter college
Internet Downloader Manager (IDM) Crack 6.42 Build 41
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Navsoft: AI-Powered Business Solutions & Custom Software Development
Digital Strategies for Manufacturing Companies
Online Work Permit System for Fast Permit Processing
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
Softaken Excel to vCard Converter Software.pdf
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
ManageIQ - Sprint 268 Review - Slide Deck
Upgrade and Innovation Strategies for SAP ERP Customers
System and Network Administration Chapter 2
How to Choose the Right IT Partner for Your Business in Malaysia

Going Deep with MQ

  • 1. Going large deep with MQ Matt Leming: Architect, IBM MQ for z/OS lemingma@uk.ibm.com © 2020 IBM Corporation 1
  • 2. What does going deep mean? And when should you, and shouldn’t you do it?
  • 3. Application Application A good queue is an empty queue MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload peaks and temporary application outages
  • 4. Application Application A good queue is an empty queue, but application failures happen! MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload peaks and temporary application outages Failure of a putting application has no real effect on MQ Failure of a getting application will result in messages building up on its queues until the application restarts and beings processing again
  • 5. Application A good queue is an empty queue, but application failures happen! MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload peaks and temporary application outages Failure of a putting application has no real effect on MQ Failure of a getting application will result in messages building up on its queues until the application restarts and beings processing again If the getting application suffers an extended outage queues can completely fill up, resulting in the putting application having to have a strategy for how to deal with a full queue Application
  • 6. Application Application The aim of this presentation This presentation aims to explore how deep a queue can go, and things you should bear in mind when that happens
  • 7. Application Are deep queues a valid use case for MQ? MQ is designed to allow applications to asynchronously communicate, acting as a buffer to smooth workload peaks and temporary application outages These outages might be minutes, hours or even days, but they are always temporary We are not advocating using MQ as a database. I.e. keeping data on queues for “ever”. MQ is not optimised for this use case Use a database instead! Application
  • 8. Application Application Are deep queues the only answer? No, but they are often the simplest from an application perspective. But as we will see, really deep queues require some thinking about Some alternatives include: • Putting application detecting MQRC_Q_FULL and pausing putting more messages • Putting application detecting MQRC_Q_FULL and putting new messages to a database, file, another queue • Starting up a temporary getting application to move the messages off somewhere else • Failing! In addition to having to code these solutions, they often come with challenges such as keeping message ordering etc
  • 9. Life-cycle of a getting application outage It is up to you to decide the maximum getting application outage duration that you want to tolerate Some customers are now looking at multiple day outages with 10s of millions of messages a day building up However that decision needs to be in the bounds of possibility for MQ. There is a limit to the amount of data a queue manager can store Time Depth Getting application outage Getting application restarts Getting application processing backlog Normal running Normal running Peak queue depth Duration of recovery from outage Duration of getting application outage
  • 10. Other important things • When you decide what the queue size limits are, make sure you enforce them via configuration: MAXDEPTH, MAXMSGL • Make sure you monitor for a getting application outage, perhaps via DIS QSTATUS, or by service interval events, so you can restart the getting application as soon as possible • Monitor for the queue filling up, e.g. queue depth full, or high events, for similar reasons • As the queue fills up you might start to see degraded putting application performance, it’s worth understanding what this might be so you can plan for it Time Depth Getting application outage Getting application restarts Getting application processing backlog Normal running Normal running Peak queue depth Duration of recovery from outage Duration of getting application outage
  • 11. Other important things • When the getting application starts up it is going to need to catch up. How fast can it do this? E.g. with a 2 day outage and a getting application that can get twice as fast as data being put, it will take a total of 4 days from the start of the application outage to get back to normal • Do you need to allow for a getting application outage occurring during this recovery window? • Can the messages expire? And are you relying on that to keep the queue depth low? • Be particularly aware of potential getting application inefficiencies. For example getting using a message selector is relatively slow, this will be particularly noticeable on a deep queue • Make sure you test the deep queue scenario, including recovery, to make sure everything works in the timeframes you expect! Time Depth Getting application outage Getting application restarts Getting application processing backlog Normal running Normal running Duration of recovery from outage Duration of getting application outage Peak queue depth
  • 12. So, how deep can I go? Approximate number of 1 KB messages:
  - MQ on z/OS private queue: 16.8 million
  - MQ on z/OS shared queue, all in CF: 613.6 million* **
  - MQ on z/OS shared queue on SMDS: 1.4 billion* **
  - MQ on distributed: 68.4 billion*
  We will see where these numbers come from later.
  * We haven't tested to the end of these limits
  ** Going this deep will mean you can't recover in the case of a structure failure, so non-persistent messages only!
  • 13. Going deep with MQ on z/OS Private queues
  • 14. Private queues – buffer pools and page sets Messages on private queues are stored in memory in buffer pools and might be moved to a page set on disk. Data in buffer pools and page sets is accessed in 4 KB pages. A single page might contain the queue spine, message metadata, message data (for at most one message), or space usage information (a space map page). A private queue is associated with a single buffer pool / page set pair, so the size of a page set is the constraining factor for private queues. A page set can be a maximum of 64 GB in size, which allows for ~16.8 million 1 KB messages. This is a rough calculation, ignoring space maps, spine pages, etc. To get a rough idea of the maximum number of messages of a given size that can be stored in a page set, round up the size of the message to a multiple of 4 KB and divide 64 GB by that number. (Diagram: a buffer pool backed by a page set.)
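The rough calculation above can be sketched as follows. This is only an upper bound, ignoring space map and queue-spine pages exactly as the slide notes:

```python
def max_messages(page_set_bytes: int, msg_bytes: int, page: int = 4096) -> int:
    """Rough upper bound on messages per page set: each message occupies
    a whole number of 4 KB pages; space maps and spine pages are ignored."""
    pages_per_msg = -(-msg_bytes // page)  # ceiling division
    return page_set_bytes // (pages_per_msg * page)

# A 64 GB page set full of 1 KB messages: ~16.8 million
print(max_messages(64 * 1024**3, 1024))
```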
  • 15. Private queues – recommendations If you want to allow a private queue to deal with an extensive getting application outage, consider putting it in its own page set and ideally its own buffer pool. This will allow for accurate sizing: you don't want two very deep queues on the same page set at the same time. There is obviously a limit to how much separation you can have here, given that there can be at most 100 page sets and buffer pools in a single z/OS queue manager.
  • 16. Indexing your queues On z/OS you can specify the INDXTYPE attribute on local queues to tell the queue manager how to index the queue: no index, by message ID, by correlation ID, or by group ID. An index makes no difference if you are just getting the next available message off a queue, but makes a significant difference otherwise, especially if the queue is deep. INDXTYPE only supports a single value, so choose it based on the most common approach for getting messages from the queue. For private queues you can still use any approach regardless of the index, but it will be less efficient; the queue manager will tell you if you should consider indexing your queues. (Diagram: index entries keyed by message ID.)
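For example, a queue that is mostly read by message ID could be defined as below (DEEP.QUEUE is an illustrative name, not one from this presentation):

```
DEFINE QLOCAL(DEEP.QUEUE) +
       INDXTYPE(MSGID) +
       MAXDEPTH(999999999)
```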
  • 17. Indexing your queues The index for private queues is maintained in the queue manager in 64-bit storage. Each message on an indexed queue uses 136 bytes of index storage; 10 million messages therefore use 1360 MB of 64-bit storage. So if you are going to have deep private queues which are indexed, make sure you account for this in the MEMLIMIT attribute of your *MSTR JCL.
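The 64-bit storage needed for the index follows directly from the 136 bytes per message figure above (MB here means 10^6 bytes, matching the slide's arithmetic):

```python
def index_storage_mb(queue_depth: int, bytes_per_msg: int = 136) -> float:
    """Approximate 64-bit storage used by a private queue index,
    at 136 bytes per indexed message."""
    return queue_depth * bytes_per_msg / 1_000_000

# 10 million indexed messages: 1360 MB of 64-bit storage
print(index_storage_mb(10_000_000))
```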
  • 18. Going deep with MQ on z/OS Shared queues
  • 19. Shared queues – storage Shared queues are stored in a coupling facility (CF). Messages may be stored entirely in the CF if the message size is < 63 KB; alternatively, a pointer to the message can be stored in the CF and the message data offloaded to Db2 (not recommended, and not discussed further) or to shared message data sets (SMDS). The maximum supported CF structure size is 1 TB. This is all real storage, so is relatively expensive. If SMDS is used, each queue manager gets its own SMDS data set for the structure. The maximum size of a single SMDS is 16 TB. (Diagram: message pointers in the CF referencing offloaded message data.)
  • 20. Shared queues – small messages Shared queues perform best when the message is held entirely in the CF, as that both minimises code path length and removes the need to interact with DASD. For messages which are < 63 KB in size, the best approach is therefore to keep them in the CF and only offload them to SMDS when the queue starts getting deep during a getting application outage. This gives the best of both worlds: fast message access normally, but the ability to store lots of messages in the worst case. If you do decide to keep small messages in the CF and not to offload, the maximum number of 1 KB messages that can be stored in a single structure is approximately 613.6 million, based on the maximum CF structure size of 1 TB.
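The 613.6 million figure is consistent with each 1 KB message consuming one 256-byte CF list entry plus six 256-byte elements for the message data and headers. That per-message breakdown is an assumption on my part; the MP16 performance report linked later documents the exact CF accounting:

```python
CF_MAX_BYTES = 1024**4   # 1 TB maximum structure size
ENTRY = 256              # one list entry per message (assumed)
ELEMENTS = 6 * 256       # assumed elements for a 1 KB message plus headers

# Maximum 1 KB messages held entirely in a 1 TB structure: ~613.6 million
print(CF_MAX_BYTES // (ENTRY + ELEMENTS))
```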
  • 21. Shared queues – offload rules Each MQ CFSTRUCT definition has three offload rules associated with it. Each rule specifies the maximum size of message that can be stored in the structure when the structure is over a given percentage full. The default rules mean that no messages < 63 KB get offloaded until the structure is very full. These defaults are not likely to be good for a getting application outage, where you want to maximize the number of messages you can store and minimise the amount of CF used. Instead you might want to start offloading all messages when the structure is, say, 10% full, as in the DEEPSTRUCT example:
  DEFINE CFSTRUCT(SHALLOWSTRUCT) CFLEVEL(5) ... OFFLOAD(SMDS) OFFLD1TH(70) OFFLD1SZ(32K) OFFLD2TH(80) OFFLD2SZ(4K) OFFLD3TH(90) OFFLD3SZ(0K)
  DEFINE CFSTRUCT(DEEPSTRUCT) CFLEVEL(5) ... OFFLOAD(SMDS) OFFLD1TH(10) OFFLD1SZ(0K) OFFLD2TH(10) OFFLD2SZ(0K) OFFLD3TH(10) OFFLD3SZ(0K)
  • 22. Shared queues – offloaded messages Each offloaded message requires a message pointer in the CF: the CF is used to maintain queueing semantics, and the pointer allows the message to be located in SMDS. Offloaded messages require you to consider both the CF structure size and the space used in SMDS. At most a structure can contain 1.4 billion message pointers; you then need to consider how much space those messages will occupy on SMDS. A single SMDS data set can be up to 16 TB in size. This can easily take 1.4 billion 1 KB messages, but only 16 million 1 MB messages. (Diagram: a message pointer is 1 entry of 256 bytes plus 2 elements of 2 * 256 bytes, a total of 768 bytes.)
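With the 768 bytes per pointer described above, the structure and SMDS limits can be cross-checked:

```python
CF_MAX_BYTES = 1024**4         # 1 TB structure
POINTER_BYTES = 256 + 2 * 256  # 1 entry + 2 elements per offloaded message

# Maximum message pointers in one structure: ~1.4 billion
max_pointers = CF_MAX_BYTES // POINTER_BYTES
print(max_pointers)

SMDS_MAX_BYTES = 16 * 1024**4      # 16 TB per SMDS data set
print(SMDS_MAX_BYTES // 1024**2)   # ~16 million 1 MB messages fit in one SMDS
```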
  • 23. Shared queues – backups CF structures are in-memory structures. In the rare cases where they fail, they need to be rebuilt. With MQ this is done by periodically taking backups of the structure. If a structure failure occurs, the structure can be recovered from the backup plus the logs of the queue managers that have accessed the structure since the backup was taken. Deep queues have important implications for this process: • Size and number of active and archive logs • Backup time • Backup frequency • Recovery time (Diagram: BACKUP CFSTRUCT(DEEPSTRUCT) writes to the active and archive logs; RECOVER CFSTRUCT(DEEPSTRUCT) reads them back.)
  • 24. Shared queues – backups – archive and active logs Backing up a structure involves writing the structure contents and the contents of the SMDS to a queue manager's active logs. Over time the active logs then get written to the archive logs. Therefore the backup might be only in the active logs, in a mixture of active and archive logs, or only in the archive logs. The contents of the active logs normally largely overlap the contents of the archive logs, so the limiting factor for a backup is the number and size of the archive logs. The biggest backup you can have is ~4096 GB. NB this is smaller than the maximum size of an SMDS! (Diagram: active logs being archived over time; up to 310 * 4 GB active logs and up to 1000 * 4 GB archive logs.)
  • 25. Shared queues – backups – archive and active logs In order to recover from a backup it needs to be accessible to the queue managers, so you need to make sure that the start of the backup remains in the available archive logs for that queue manager, otherwise you can't recover from it! You don't just need one backup; you need to be able to safely transition from one backup to the next, should anything fail while the backup is occurring. The conclusion of all this is that you really don't want a backup to be more than about a quarter of the available archive logs of a queue manager, i.e. < 1 TB at the most. For this extreme case you should also consider a separate queue manager just to perform the backups. This removes the risk of application messages pushing the backup out of the archive logs, and also means the backup process can't affect applications. (Diagram: archive logs from newest to oldest, showing a usable backup that is fully within the available archive logs, for which RECOVER CFSTRUCT works, and an unusable backup that has partly aged out, for which RECOVER CFSTRUCT fails.)
  • 26. Shared queues – backups – time and frequency Backing up a structure takes time. The best backup is one that contains minimal data, as that will be fast. IBM recommends taking a backup of every structure at least every hour, to minimise the amount of time recovery will take. Care is needed here with deep queues! Backing up a large structure will take a long time and, as discussed, a lot of log space. If your getting application outage can last several days, continuing with your normal backup strategy while the structure contains a very large number of messages might not be a good idea.
  • 27. Shared queues – backups – time and frequency It is worth considering adjusting your backup strategy during extended getting application outages: 1) During normal running, back up every hour or so (whatever is normal for your site) 2) When the queue starts to get very deep, pause backups, or make them less frequent, until the getting application restarts and the queue depth has reduced. This approach minimises the repeated cost of taking a large backup, both in terms of CPU and log usage. However it does mean that any recovery will rely on reading a potentially large amount of log data across the queue managers in your group. Therefore you need to accurately size the active and archive logs across your group too, and consider how long recovery might take. A good time to pause is when the size of the backup becomes several times the amount of data that would normally be written between backups. (Diagram: regular backups paused during the getting application outage, resuming once the getting application restarts.)
  • 28. Shared queues – backups – recovery Frequent backups are recommended to minimise the amount of time it takes to recover from a backup. Recovery involves reading the logs containing the backup and scanning the logs of all queue managers that used the structure since the point of the backup. This results in reading the active / archive logs backwards, which is typically slower than reading them forwards. With deep queues this could take a significant period of time, but it does require multiple failures at the same time (a getting application outage for a significant period, followed by a CF structure failure). It might be worth considering CF duplexing to reduce the chance of a structure failure, but bear in mind this will result in increased CF CPU utilization and more CF storage being required.
  • 29. Going deep with MQ on distributed
  • 30. Queue files Distributed takes a different approach to z/OS. Each queue gets its own in-memory set of buffers for temporarily staging messages; a different set of buffers is used for persistent and non-persistent messages. These buffers can be tuned up to a maximum size of 100 MB; they default to 128 KB for non-persistent messages and 256 KB for persistent ones. These settings aren't as fully externalized as buffer pools on z/OS, but similar performance considerations apply. The buffers are asynchronously written to the queue file, and there is one queue file per queue. (Diagram: each queue has its own non-persistent and persistent message buffers, written asynchronously to its own queue file.)
  • 31. Queue files Queue file size is the ultimate upper limit for queue depth on distributed. The default maximum size is ~2 TB. From 9.2.0 adjusting this is simple, as shown below; the maximum value is ~255 TB. Messages are stored in queue files in blocks. If the queue file is < 2 TB in size the block size is 512 bytes; above 2 TB the block size is 4 KB, which means a 1 KB message will use a whole block. The maximum number of 1 KB messages on a single queue is therefore ~68.4 billion.
  DEFINE QL(NEWQUEUE) MAXFSIZE(500)        (create a queue with a maximum file size of 500 MB)
  ALTER QL(EXISTINGQUEUE) MAXFSIZE(1000)   (alter an existing queue to a maximum size of 1000 MB)
  DIS QSTATUS(NEWQUEUE) CURFSIZE CURMAXFS
  QUEUE(NEWQUEUE) CURFSIZE(39) CURMAXFS(500)   (the queue is using 39 MB of its 500 MB)
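The ~68.4 billion figure follows from the 4 KB block size that applies to queue files above 2 TB:

```python
MAX_QFILE_BYTES = 255 * 1024**4  # ~255 TB maximum queue file size
BLOCK = 4096                     # 4 KB blocks for queue files above 2 TB

# A 1 KB message still consumes a whole 4 KB block: ~68.4 billion messages
print(MAX_QFILE_BYTES // BLOCK)
```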
  • 32. Logging Distributed supports both linear and circular logging. Linear logging is of interest with deep queues as it allows you to periodically back up queue files onto the log, i.e. create a media image. As with shared queues, on distributed you need to think carefully about when you create a media image of a very large queue file, as it will consume a lot of log space which you will then have to maintain. Queue managers can be configured to create media images automatically, based on time or on the amount of log used since the last image. If you use this, you might want to switch it off for long getting application outages. (Diagram: regular media images paused during the getting application outage, resuming after the getting application restarts.)
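As a sketch of that pausing strategy using the standard IMGSCHED and IMGINTVL queue manager attributes (check the behaviour at the MQ level you are running; the interval value here is illustrative):

```
* Normal running: automatic media images roughly every hour
ALTER QMGR IMGSCHED(AUTO) IMGINTVL(60)

* During an extended getting application outage: pause automatic images
ALTER QMGR IMGSCHED(MANUAL)

* After the backlog has drained: switch automatic imaging back on
ALTER QMGR IMGSCHED(AUTO)
```

While in manual mode, a media image can still be taken on demand with the rcdmqimg control command once the queue depth has reduced.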
  • 34. Recommended reading For z/OS much of this information, along with some example values, is in the capacity and planning guide (MP16). I strongly recommend reading it: https://ibm-messaging.github.io/mqperf/mp16.pdf For distributed, take a look at https://ibm-messaging.github.io/mqperf/MQ_Performance_Best_Practices_v1.0.1.pdf
  • 35. Thank you Matt Leming Architect, IBM MQ for z/OS lemingma@uk.ibm.com © 2022 IBM Corporation

Editor's Notes

  • #2: Author – David Ware