SlideShare a Scribd company logo
0
Integration Technical Conference 2019
M08: Monitoring and
problem determination of
an MQ z/OS system
Matt Leming
lemingma@uk.ibm.com
1
Agenda – the MQ Toolbox
• What’s connected?
• Are messages flowing?
• Where are messages going?
• What are the apps doing?
• How can I look back in time?
NB: Capabilities differ in form between
Distributed and z/OS, this presentation is
z/OS focused
1
© 2019 IBM Corporation
What’s connected?
33
DISPLAY CONN
!MQ21 DISPLAY CONN(*) TYPE(HANDLE) WHERE(OBJNAME EQ WLMMDB.REQUEST)
CSQM201I !MQ21 CSQMDRTC DIS CONN DETAILS
CONN(577C425321295301)
EXTCONN(414D5143474154455741593120202020)
TYPE(HANDLE)
OBJNAME(WLMMDB.REQUEST)
OBJTYPE(QUEUE) READA(NO)
OPENOPTS(MQOO_OUTPUT,MQOO_FAIL_IF_QUIESCING)
QSGDISP(QMGR) ASTATE(NONE) HSTATE(INACTIVE)
OBJNAME(SENDINGAPP.REPLY)
OBJTYPE(QUEUE) READA(NO)
OPENOPTS(MQOO_INPUT_SHARED,MQOO_INQUIRE,MQOO_SAVE_ALL_CONTEXT,MQOO_FAIL_IF_QUIESCING)
QSGDISP(QMGR) ASTATE(ACTIVE) HSTATE(ACTIVE)
CSQM201I !MQ21 CSQMDRTC DIS CONN DETAILS
CONN(577C425321295301)
EXTCONN(414D5143474154455741593120202020)
TYPE(CONN) URDISP(QMGR) CONNOPTS(MQCNO_SHARED_BINDING)
UOWLOGDA( ) UOWLOGTI( )
UOWSTDA(2014-04-08) UOWSTTI(13.24.00) UOWSTATE(ACTIVE)
NID( )
EXTURID(XA_FORMATID[DSAW]
XA_GTRID[00000145414B8AB40000000104DF48FC0001020304050607080900010203040506070809]
XA_BQUAL[00000145414B8AB40000000104DF48FC0001020304050607080900010203040506070809000000010000000000000000
000000000001])
QMURID(00000000036A70C4)
URTYPE(XA) ASTATE(NONE)
USERID(MLEMING) APPLTAG(jms/GATEWAY1_CF)
ASID(00D5) APPLTYPE(CHINIT) APPLDESC(IBM MQ Channel Initiator)
CHANNEL(WAS.CLIENTS) CONNAME(127.0.0.1)
Use CONN to match TYPE(CONN) and
TYPE(HANDLE) records
TYPE(HANDLE) records let you find
applications by the objects they access.
See all open handles for an app in one place,
unlike DIS QSTATUS records
Long running UOW information.
XID can be tied up with app server transaction
timeout
Channel name + IP help identify client apps.
MQ V7.5 and later JMS clients can supply
an application name in the CF
!MQ21 DISPLAY CONN(577C425321295301) ALL
ASID for bindings based applications
44
DISPLAY CHSTATUS
!MQ21 DISPLAY CHSTATUS(*) ALL
CSQM201I !MQ21 CSQMDRTC DIS CHSTATUS DETAILS
CHANNEL(WAS.CLIENTS)
CHLDISP(PRIVATE) CONNAME(127.0.0.1) CURRENT
CHLTYPE(SVRCONN) STATUS(RUNNING) SUBSTATE(RECEIVE)
CURSHCNV(1) MAXSHCNV(1)
LSTMSGTI(15.26.59) LSTMSGDA(2014-04-08) MSGS(6)
BYTSSENT(2456) BYTSRCVD(2296)
CHSTATI(15.26.59) CHSTADA(2014-04-08)
BUFSSENT(13) BUFSRCVD(17)
MONCHL(OFF) STOPREQ(NO) KAINT(360) QMNAME(MQ21)
RAPPLTAG(jms/GATEWAY1_CF)
SECPROT(TLSV12) SSLCERTI(CN=ExampleCA,O=Example)
SSLCERTU(MQSX00) SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256)
SSLRKEYS(0) SSLKEYTI( ) SSLKEYDA( )
SSLPEER(SERIALNUMBER=53:43:FD:D6,CN=ExampleApp1,O=Example)
RPRODUCT(MQJM) RVERSION(09010200)
MCAUSER(MLEMING) LOCLADDR( )
MAXMSGL(4194304)
COMPHDR(NONE NONE) COMPMSG(NONE NONE)
COMPRATE(0,0) COMPTIME(0,0)
HBINT(5)
See SSLPEER information not in DIS CONN
Note that multiple CONN might share one SVRCONN channel instance
4
© 2019 IBM Corporation
Rerouting messages if no-
one is connected
6
Cluster WLM
• Cluster WLM algorithm doesn’t
check whether consuming
applications are connected to
cluster queues
– I.e. whether something is getting
messages from them
• You need to ensure that
applications are consuming
from all instances of a
clustered queue to prevent
messages building up
• However, there is an
alternative: AMQSCLM
6
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
App
server 2
IPPROCS(0) IPPROCS(>=1)
Round-robin message distribution
IPPROCS(0)
77
AMQSCLM
1) Normal operation
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
Round-robin message distribution
App
server 1
App
server 2
App
server N
. . .
IPPROCS(>=1) IPPROCS(>=1) IPPROCS(>=1)
AMQSCLM AMQSCLM AMQSCLM
7
88
AMQSCLM
2) Application failure
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
App
server 1
App
server 2
. . .
IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1)
Messages
start
building
up
AMQSCLM AMQSCLM AMQSCLM
App
server N
Round-robin message distribution
8
99
AMQSCLM
3) Failure Detection
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
App
server 1
App
server 2
. . .
IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1)
AMQSCLM
checks the
queue every
5 minutes
and detects
IPPROCS(0)
AMQSCLM AMQSCLM AMQSCLM
App
server N
Round-robin message distribution
1010
AMQSCLM
4) Redirect Messages
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
App
server 1
App
server 2
. . .
IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1)
AMQSCLM
reduces
queue priority
in cluster, to
stop new
messages
arriving
AMQSCLM AMQSCLM AMQSCLM
App
server N
Round-robin message distribution
1
1111
AMQSCLM
5) Un-maroon Msgs (optional)
Driver
qmgr 1
Driver
qmgr 2
Driver
qmgr N
. . .
Target
qmgr 1
Target
qmgr 2
Target
qmgr n
. . .
Sending appsSending appsSending apps
Cluster
WLM
App
server 1
App
server 2
. . .
IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1)
AMQSCLM
tells qmgr to
redistribute
‘stuck’
messages via
the cluster
AMQSCLM AMQSCLM AMQSCLM
App
server N
Round-robin message distribution
12
AMQSCLM summary
• The cluster queue monitoring sample program (AMQSCLM)
• Shipped with the distributed product as a sample
– Precompiled
– Source code
– Not shipped on z/OS, but distributed source code can be compiled and used on z/OS
• More information here:
http://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.dev.doc/q024
620_.htm
© 2019 IBM Corporation
Are messages flowing?
14
Real-time/online monitoring – queues
• Set detail level for queue manager. Override for individual queues
• Gives live view of application responsiveness
!MQ21 ALTER QMGR MONQ(MEDIUM)
!MQ21 ALTER QLOCAL(QUEUE1) MONQ(HIGH)
!MQ21 ALTER QLOCAL(QUEUE2) MONQ(OFF)
CSQM201I !MQ21 CSQMDRTC DIS QSTATUS DETAILS
QSTATUS(QUEUE1) TYPE(QUEUE)
OPPROCS(5) IPPROCS(3) CURDEPTH(16)
UNCOM(YES) MONQ(HIGH)
QTIME(10101414, 10101414)
MSGAGE(112)
LPUTTIME(17.12.16) LPUTDATE(2019-04-08)
LGETTIME(17.05.59) LGETDATE(2019-04-08)
QSGDISP(QMGR)
!MQ21 DIS QSTATUS(QUEUE1) ALL
Age in seconds of the oldest
message on the queue
Estimations of the time in microseconds that messages are waiting on the queue for
processing.
First value: Calculated from recent activity
Second value: Calculated from longer term activity
Timestamps of last PUT/GET
to check for recent activity
Without MONQ you only get the depth and how
many handles are open.
NB: HIGH/MEDIUM/LOW
doesn’t matter for queues
NB: Off by default, so
consider switching it on at
the queue manager level!
Bear in mind that CURDEPTH includes
committed and uncommitted messages. Look
at UNCOM!
NB: for shared queues
these values are only this
queue manager’s view,
apart from CURDEPTH &
MSGAGE
15
Real-time/online monitoring – channels
• Gives live view of channel throughput
!MQ21 ALTER QMGR MONCHL(MEDIUM) MONACLS(MEDIUM)
!MQ21 ALTER CHANNEL(CLUSTER1.QM1) CHLTYPE(CLUSRCVR) MONCHL(HIGH)
CSQM201I !MQ21 CSQMDRTC DIS CHSTATUS DETAILS
CHSTATUS(TO.GWY2) CHLDISP(PRIVATE) XMITQ(GATEWAY2) CONNAME(127.0.0.1(1522))
CURRENT CHLTYPE(SDR) STATUS(RUNNING) SUBSTATE(RECEIVE) INDOUBT(YES)
LSTSEQNO(11773) LSTLUWID(0107445310001B33)
CURMSGS(50) CURSEQNO(11823) CURLUWID(D5F6358CE52A358A)
LSTMSGTI(17.49.51) LSTMSGDA(2014-04-08)
MSGS(1580) BYTSSENT(1192330) BYTSRCVD(1748) BATCHES(52)
CHSTATI(17.49.03) CHSTADA(2019-04-08)
BUFSSENT(1616) BUFSRCVD(55)
LONGRTS(999999999) SHORTRTS(180)
MONCHL(MEDIUM)
XQTIME(545784,3929968)
NETTIME(137538,29555)
EXITTIME(0,0)
XBATCHSZ(20,17)
COMPTIME(0,0) COMPRATE(0,0)
STOPREQ(NO) KAINT(360)
QMNAME(MQ21) RQMNAME(QWY2)
SECPROT(NONE) SSLCERTI( ) SSLCERTU(MQSX00) SSLCIPH( )
SSLRKEYS(0) SSLKEYTI( ) SSLKEYDA( ) SSLPEER( )
RPRODUCT(MQMV) RVERSION(09010200)
STATCHL(OFF) LOCLADDR(127.0.0.1(53557))
BATCHSZ(50) MAXMSGL(4194304)
COMPHDR(NONE NONE) COMPMSG(NONE NONE) HBINT(5) NPMSPEED(FAST))
!MQ21 DIS CHSTATUS(TO.GWY2) ALL
Short/long term calculations of how full your
batches are getting, to help you tune
BATCHSZ/BATCHINT
Last time a message was sent over the
channel
Short/long term calculations of how long
messages are waiting on the XMITQ for
transmission
Short/long term view of the network time to
send a request and receive a response.
Calculated during batch completion
NB: HIGH/MEDIUM/LOW
doesn’t matter for channels
on z/OS
NB: Can be expensive for
large numbers of channels
due to impact on BP 0
NB: Off by default, so
consider switching it on at
the queue manager level!
16
Accounting and statistics overview
• Realtime information isn’t always sufficient. It is useful to be
able to spot trends over time, and see how the system is
currently deviating from the average
• Queue manager can periodically output monitoring data to
SMF
• SMF 115: Statistics
– High level queue manager wide information. E.g. memory
usage, log information, locking information, etc
– Most useful from perspective of this presentation is likely
to be message manager information which gives
information about counts of MQI operations
• SMF 116: Accounting
– More detailed…
– Per task information: log usage, CF usage, CPU time,
some MQI information
– Per queue information: gives information on MQI usage,
open/close time, more detailed log information
– Per channel information: similar to output from DIS
CHSTATUS
– Can generate a lot of data on busy queue managers
• All this information is in binary format. Assembler and C
mapping are provided
000000 01A40000 5E740035 61240100 223FD4E5 *....;.../.....MV*
000000 F4F1D4D8 F2F10000 F9F1F200 00000134 *41MQ21..912.....*
000000 00700001 00000054 00B00001 00000104 *................*
000000 00300001 00000000 00000000 00000000 *................*
000000 00000000 00000000 00000000 00000000 *................*
Example SMF 116 header
17
Accounting and statistics overview
• To collect:
– Enabled SMF to collect 115/116 data
– Enable the queue manager to generate it from start
up either via CSQ6SYSP
– Or via MQ’s START TRACE command
– Fine grained control available on per queue/channel
basis
• Then dump out records from SMF
• Tooling is available to work with records
– CSQ4SMFD: basic sample C program
– MP1B: a SupportPac, see later slides
– MQSMFCSV: simple formatter to convert SMF
records into csv. Can then be imported into
spreadsheets and databases
https://github.com/ibm-messaging/mq-smf-csv
– IBM OMEGAMON for messaging
– Others…
CSQY103I !MQ21 SMFACCT= YES (F0000000),
SMFSTAT=YES (F0000000), STATIME=1
!MQ21 START TRACE(ACCT) CLASS(*)
!MQ21 ALTER QMGR ACCTQ(ON)
!MQ21 ALTER QLOCAL(QUEUE1) ACCTQ(QMGR)
!MQ21 ALTER CHANNEL(TO.GWY2) STATCHL(HIGH)
CHLTYPE(SDR)
A gotcha: for chinit stats and accounting
you have to issue:
START TRACE(STAT) CLASS(4)
START TRACE(ACCTG) CLASS(4)
1818
MP1B
• Unsupported SupportPac: https://www-01.ibm.com/support/docview.wss?uid=swg24005907
• Kept reasonably up to date by MQ development team
• Uses:
– Summary of transactions, jobs and channels
– Summary of queues being used by application
– List of “high use” queues
– …
• Will detect potential issues (based on our past experience) and output them as a summary
• Can generate csv files for import into spreadsheets which allows for graphs of activity over time
to be generated
• Lots of options to customize the level of output
• Much more useful than CSQ4SMFD
• Used by lots of customers
1919
MP1B example
• QSUML report gives a summary of activity against local queues
• Useful for understanding your expected workload against a set of queues. You can then spot anomalies
Number of times message data
wasn’t in buffer pool
2020
MP1B Example
To dig further you can use the TASK report which digs further into accounting data
16305 MV41,MQ21,2019/04/12,15:19:05,VRM:913,
16305 MQ21 MOVER Jobname:MQ21CHIN Userid:MQSX00
16305 Channel SYSTEM.DEF.SVRCONN 9.174.27.35
16305 Start time Apr 12 15:18:39 2019 Started in a different time interval
16305 Interval Apr 12 15:18:39 2019 - Apr 12 15:18:41 2019 : 1.549716 seconds
16305 Other reqs : Total ET 0.000024 Seconds
16305 Other reqs : Total CPU 0.000024 Seconds
16305 Commit count 101
16305 Commit avg elapsed time 628 uS
16305 Backout count 1
16305 Backout avg elapsed time 15 uS
16305 Backout avg CPU time 15 uS
16305 MQCTL count 301
16305 MQCTL avg elapsed time 0 uS
16305 MQCTL avg CPU time 0 uS
16305 Total suspend time 0.062778 Seconds
16305 Open name AQUEUE
16305 Queue type:QLocal AQUEUE
16305 Page set ID 4 AQUEUE
16305 Buffer pool 4 AQUEUE
16305 Get count 100 AQUEUE
16305 Get avg elapsed time 42 uS AQUEUE
16305 Get avg CPU time 41 uS AQUEUE
16305 Get valid destructive 100 AQUEUE
16305 Put count 100 AQUEUE
16305 Put avg elapsed time 519 uS AQUEUE
16305 Put avg CPU time 75 uS AQUEUE
….
Lots of backouts is normally a bad sign
Start of per queue information
Discrepancy between these two is often a
sign of messages not being in buffer pool so
being got from pageset
Discrepancy between these two for persistent
messages is likely to be because of logging.
For non-persistent messages it could be
because the buffer pool is full
2121
MP1B Example
Then go look at channel accounting data too.
Jobname: MV41,MQ21,2019/04/12,16:44:14,VRM:913
SMF interval start local time 2019/04/12,16:43:24
SMF interval end local time 2019/04/12,16:44:14
SMF interval start GMT 2019/04/12,15:43:24
SMF interval end GMT 2019/04/12,15:44:14
SMF interval duration 49.332448 seconds
SYSTEM.DEF.SVRCONN 9.174.27.35 Connection name 9.174.27.35
SYSTEM.DEF.SVRCONN 9.174.27.35 Channel disp PRIVATE
SYSTEM.DEF.SVRCONN 9.174.27.35 Channel type SVRCONN
SYSTEM.DEF.SVRCONN 9.174.27.35 Channel status RUNNING
SYSTEM.DEF.SVRCONN 9.174.27.35 Remote qmgr/app LoadQueueLoopJMSWIthDifferen
SYSTEM.DEF.SVRCONN 9.174.27.35 Channel started date & time 2019/04/12,15:43:26
SYSTEM.DEF.SVRCONN 9.174.27.35 Channel status collect time 2019/04/12,15:44:14
SYSTEM.DEF.SVRCONN 9.174.27.35 Active for 47 seconds
SYSTEM.DEF.SVRCONN 9.174.27.35 Last MQI request time 2019/04/12,15:44:13
SYSTEM.DEF.SVRCONN 9.174.27.35 Dispatcher number 1
SYSTEM.DEF.SVRCONN 9.174.27.35 Number of MQI requests 20,853
SYSTEM.DEF.SVRCONN 9.174.27.35 Number of persistent messages 0
SYSTEM.DEF.SVRCONN 9.174.27.35 Buffers sent 20,855
SYSTEM.DEF.SVRCONN 9.174.27.35 Buffers received 21,259
SYSTEM.DEF.SVRCONN 9.174.27.35 Current shared connections 4
SYSTEM.DEF.SVRCONN 9.174.27.35 Message data 0 0 B
SYSTEM.DEF.SVRCONN 9.174.27.35 Persistent message data 0 0 B
SYSTEM.DEF.SVRCONN 9.174.27.35 Non persistent message data 0 0 B
SYSTEM.DEF.SVRCONN 9.174.27.35 Total bytes sent 7,988,504 7801 KB
SYSTEM.DEF.SVRCONN 9.174.27.35 Total bytes received 5,611,764 5480 KB
SYSTEM.DEF.SVRCONN 9.174.27.35 Bytes received/message 269 269 B
SYSTEM.DEF.SVRCONN 9.174.27.35 Bytes sent/message 383 383 B
….
2222
Performance considerations
• Capturing SMF data has a cost: CPU and storage of the records
• SMF 115 (queue manager stats): negligible -> always have it on
• SMF 116 (queue manager accounting class 1): 0.5% increase in transaction cost in our workloads
– High level API count, and CPU usage per thread
• SMF 116 (queue manager accounting, class 1 & 4): 3.3% increase in transaction cost in our workloads
– Per thread and queue usage information
• SMF 115 + 116 (chinit): 1-2% increase in transaction cost in our sample workloads
• This cost needs to be compared to the benefit of having the data
• Consider sizing your environment to assume that MQ SMF data of all types will be needed at some point,
or always running with it on
• We have had many instances of customers raising PMRs for performance problems and not having
enough headroom to enable accounting trace, which is how we diagnose performance problems!
• Plus when investigating performance issues it is always useful to have a baseline
• See capacity and planning guide: MP16 for more information
– https://ibm-messaging.github.io/mqperf/mp16.pdf
© 2019 IBM Corporation
Where are the messages
going?
24
MQ networks can be complex
• At each of the dots stuck / mis-sent messages are possible
– MQOPENs of the wrong queue / queue manager by apps
– Full queues
– Stopped channels
– Stopped apps
– Incorrectly configured QREMOTE/QALIAS routing objects
– Cluster membership problems
• The standard problem diagnosis approach
– Methodically checking channels/queues/DLQs at each point
• Is there anything to speed up this process?
App1
App2A
App2B
Requests
Replies
Replies
SDR/RCVR
SDR/RCVR
Cluster
25
Trace route
• MQ has the ability to inject trace route messages
– (Can be) hidden from applications
– Generate activity reports as they pass through, potentially accumulated in the message
• Tools are available to trace routes using these reports
– dspmqrte – command line tool supplied with Distributed. Can be run against a z/OS queue manager as a
client
– MS0P – cat 2 SupportPac extension to MQ Explorer
• Lets you see the path messages could have taken
– Test connectivity through the MQ network
– Test cluster workload balancing
• Can quickly jump you close to the problem
– The point your trace message veers off in the wrong direction
– The point the trail goes cold
• NB: there is also a related technology: activity recording which generates activity reports
from real messages
dspmqrte SDR/RCVR
SDR/RCVR
dspmqrte
dspmqrte
Cluster
© 2019 IBM Corporation
What are the apps doing?
27
What are the apps doing?
• Distributed have application activity trace which can give
information about all the MQI operations an application does
in the order that they do it
• Useful for:
1. Application audit trails
2. Message duplication
3. Detailed problem determination
4. Enforcing application coding standards
• Should be selectively enabled to reduce performance impact
• MQ for z/OS doesn’t have any equivalent to this, but accounting
stats can start to help with #1, and we will later see
a tool for #2
• There is an RFE for adding application activity trace to MQ for z/OS
– https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56321
© 2019 IBM Corporation
How can I look back in time?
29
What happened to my messages at 2am this morning?
• Enterprise monitoring solution
– DLQ alerts, queue depth alerts, channel status alerts
– Unresolved running units of work
– Historical MQ monitoring, accounting and stats data
• App logs from the time of the problem
– Exceptions, MQ error codes, timeouts
• MQ job logs for all queue managers that could have been involved
– Channel errors
– Authentication issues
• SYSLOG
• ??? – what else is there
30
What about the MQ recovery log?
• For persistent messages regardless of whether in a transaction or not
– MQ logs each operation performed
• Why can't we use this to
– Look back in time to 2am and see what happened?
– Recover the original payload if the app lost the message?
– See what happened inside long-running units of work?
– Provide a list of operations within the failed business transaction?
• MQ provides a utility to do so: CSQ1LOGP
– Allows you to look at both active and archive logs even if queue manager is running
– Can extract information on messages put / got in committed, rolled-back and in-
flight units of work
– Can also be used to get information on alterations to object definitions
– Can be combined with DFSORT to do things like give the total amount of persistent
message data sent to a queue
3131
Running CSQ1LOGP
• ACTIVE1 DD card: name of active log to search
• CSQCMT DD card: message data from committed units of work go here
CSQBACK, CSQBOTH, CSQINFLT and CSQOBJS are also
available
• PAGESET(4): only look at data in pageset 4
• EXTRACT(YES): extract message data to specified datasets
• RBASTART(): start from this RBA to the end of the log
RBAEND and LRSN variants are also available
32
Example output
Output is one line for each MQPUT, MQGET and each UR related operation
MQPUT example shown below. Payload is in bold
2019.108 7:31:42.730 0N m < w i 8XX** Ö MQSX00 N m l CHIN MQ21CHINBUR
MQ21AQUEUE y MQPUT CN < x AMD
CSQ MQ21 N m MQ21
MQSX00 MQ21CHIN1BEC3AE0 Ö 
LoadQueueLoopJMSWIthDifferen2019041807314273 This is a message
While this output format can be useful its likely that you will want to run tooling against it to further refine the data.
We provide sample C code (CSQ4LOGS) which will parse the output from CSQ1LOGP and then either:
– Display summary information about each unit of work (no message data)
– Replay the messages to the target queue, possibly to a different queue manager, where it can then be
browsed
– An obvious extension to this would be to dump just the message data. This is left as an action for the user…
33
Summary
• Lots of tools in your MQ toolbox!
• On-line status commands
– DISPLAY CONN
– DISPLAY QSTATUS
– DISPLAY CHSTATUS
• Cluster monitoring – AMQSCLM
• Off-line statistics and accounting
– View using MP1B
• Tracking
– Trace-route
• MQ recovery logs
– CSQ1LOGP
Integration Technical Conference 2019 34
Don’t forget to fill out the survey!
Selectyour session, selectsurvey, rate thesession andsubmit!
35
ThankYou
Matt Leming
lemingma@uk.ibm.com

More Related Content

PPTX
GDPS and System Complex
PDF
IBM MQ and Kafka, what is the difference?
PDF
IBM Integration Bus High Availability Overview
PPTX
REST APIs and MQ
PPT
IBM MQ Online Tutorials
PDF
Db2 and storage management (mullins)
PDF
IBM MQ High Availability 2019
PDF
MQ Security Overview
GDPS and System Complex
IBM MQ and Kafka, what is the difference?
IBM Integration Bus High Availability Overview
REST APIs and MQ
IBM MQ Online Tutorials
Db2 and storage management (mullins)
IBM MQ High Availability 2019
MQ Security Overview

What's hot (20)

PDF
IBM MQ - better application performance
PDF
Where is My Message
PDF
IBM MQ Channel Authentication
PDF
IBM WebSphere MQ V8 Security Features: Deep Dive
PDF
Websphere MQ (MQSeries) fundamentals
PDF
FlashCopy and DB2 for z/OS
PDF
TCP/IP Stack Configuration with Configuration Assistant for IBM z/OS CS
PDF
Avaya Aura System Manager
PPTX
Maria db 이중화구성_고민하기
PDF
Track 2 session 6 db2 utilities update and best practices v2
PDF
Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)
PDF
Fault tolerant and scalable ibm mq
PDF
IBM MQ Clustering (2017 version)
PDF
[2018] MySQL 이중화 진화기
PDF
IBM MQ Update, including 9.1.2 CD
PPT
PPTX
Why Do I Need an SBC
PDF
Developing rich SIP applications with SIPSIMPLE SDK
DOCX
Data power Performance Tuning
IBM MQ - better application performance
Where is My Message
IBM MQ Channel Authentication
IBM WebSphere MQ V8 Security Features: Deep Dive
Websphere MQ (MQSeries) fundamentals
FlashCopy and DB2 for z/OS
TCP/IP Stack Configuration with Configuration Assistant for IBM z/OS CS
Avaya Aura System Manager
Maria db 이중화구성_고민하기
Track 2 session 6 db2 utilities update and best practices v2
Db2 for z/OS and FlashCopy - Practical use cases (June 2019 Edition)
Fault tolerant and scalable ibm mq
IBM MQ Clustering (2017 version)
[2018] MySQL 이중화 진화기
IBM MQ Update, including 9.1.2 CD
Why Do I Need an SBC
Developing rich SIP applications with SIPSIMPLE SDK
Data power Performance Tuning
Ad

Similar to Where is my MQ message on z/OS? (20)

PDF
HHM-2833: Where is My Message?: Using IBM MQ Tools to Work Out What Applicati...
PDF
Monitoring and problem determination of your mq distributed systems
PDF
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
PDF
Network time protocol
PPTX
ECI OpenFlow 2.0 the Future of SDN
PDF
Network-Connected Development with ZeroMQ
 
PPTX
IBM MQ Whats new - up to 9.3.4.pptx
PDF
Matteo Murgida - Monet: a NodeJS enterprise system for IoT and Energy Managem...
PDF
AIST Super Green Cloud: lessons learned from the operation and the performanc...
PDF
Where is My Message?: Use MQ Tools to Work Out What Applications Have Done
PDF
9Tuts.Com New CCNA 200-120 New CCNA New Questions 2
PPTX
Timing attacks against web applications: Are they still practical?
PDF
Transport SDN & OpenDaylight Use Cases in Korea
PDF
Report PAPID 7
PDF
Отчет Audit report RAPID7
PDF
Switching and multicast schemes in asynchronous transfer mode networks
PDF
Whats new in IBM MQ; V9 LTS, V9.0.1 CD and V9.0.2 CD
PDF
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
PDF
CA Spectrum® Just Keeps Getting Better and Better
ODP
IBM WebSphere MQ for z/OS - The Inside Story
HHM-2833: Where is My Message?: Using IBM MQ Tools to Work Out What Applicati...
Monitoring and problem determination of your mq distributed systems
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
Network time protocol
ECI OpenFlow 2.0 the Future of SDN
Network-Connected Development with ZeroMQ
 
IBM MQ Whats new - up to 9.3.4.pptx
Matteo Murgida - Monet: a NodeJS enterprise system for IoT and Energy Managem...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
Where is My Message?: Use MQ Tools to Work Out What Applications Have Done
9Tuts.Com New CCNA 200-120 New CCNA New Questions 2
Timing attacks against web applications: Are they still practical?
Transport SDN & OpenDaylight Use Cases in Korea
Report PAPID 7
Отчет Audit report RAPID7
Switching and multicast schemes in asynchronous transfer mode networks
Whats new in IBM MQ; V9 LTS, V9.0.1 CD and V9.0.2 CD
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
CA Spectrum® Just Keeps Getting Better and Better
IBM WebSphere MQ for z/OS - The Inside Story
Ad

More from Matt Leming (13)

PDF
533-MigratingYourMQIApplicationsToJMS.pdf
PPTX
Going Deep with MQ
PPTX
What's new with MQ on z/OS 9.3 and 9.3.1
PPTX
Connecting mq&amp;kafka
PPTX
What's New In MQ 9.2 on z/OS
PDF
Building a resilient and scalable solution with IBM MQ on z/OS
PDF
What's new in MQ 9.1.* on z/OS
PDF
What's new in MQ 9.1 on z/OS
PPTX
The enterprise differentiator of mq on zos
PPTX
New Tools and Interfaces for Managing IBM MQ
PDF
MQ Support for z/OS Connect
PDF
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
PDF
HHM-3481: IBM MQ for z/OS: Enhancing Application and Messaging Connectivity ...
533-MigratingYourMQIApplicationsToJMS.pdf
Going Deep with MQ
What's new with MQ on z/OS 9.3 and 9.3.1
Connecting mq&amp;kafka
What's New In MQ 9.2 on z/OS
Building a resilient and scalable solution with IBM MQ on z/OS
What's new in MQ 9.1.* on z/OS
What's new in MQ 9.1 on z/OS
The enterprise differentiator of mq on zos
New Tools and Interfaces for Managing IBM MQ
MQ Support for z/OS Connect
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3481: IBM MQ for z/OS: Enhancing Application and Messaging Connectivity ...

Recently uploaded (20)

PDF
Wondershare Filmora 15 Crack With Activation Key [2025
PDF
top salesforce developer skills in 2025.pdf
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PDF
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
PDF
Understanding Forklifts - TECH EHS Solution
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
medical staffing services at VALiNTRY
PDF
Adobe Premiere Pro 2025 (v24.5.0.057) Crack free
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PDF
System and Network Administration Chapter 2
PPTX
Introduction to Artificial Intelligence
Wondershare Filmora 15 Crack With Activation Key [2025
top salesforce developer skills in 2025.pdf
How to Migrate SBCGlobal Email to Yahoo Easily
VVF-Customer-Presentation2025-Ver1.9.pptx
Navsoft: AI-Powered Business Solutions & Custom Software Development
How Creative Agencies Leverage Project Management Software.pdf
Softaken Excel to vCard Converter Software.pdf
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
Understanding Forklifts - TECH EHS Solution
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
Operating system designcfffgfgggggggvggggggggg
medical staffing services at VALiNTRY
Adobe Premiere Pro 2025 (v24.5.0.057) Crack free
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
System and Network Administration Chapter 2
Introduction to Artificial Intelligence

Where is my MQ message on z/OS?

  • 1. 0 Integration Technical Conference 2019 M08: Monitoring and problem determination of an MQ z/OS system Matt Leming lemingma@uk.ibm.com
  • 2. 1 Agenda – the MQ Toolbox • What’s connected? • Are messages flowing? • Where are messages going? • What are the apps doing? • How can I look back in time? NB: Capabilities differ in form between Distributed and z/OS, this presentation is z/OS focused 1
  • 3. © 2019 IBM Corporation What’s connected?
  • 4. 33 DISPLAY CONN !MQ21 DISPLAY CONN(*) TYPE(HANDLE) WHERE(OBJNAME EQ WLMMDB.REQUEST) CSQM201I !MQ21 CSQMDRTC DIS CONN DETAILS CONN(577C425321295301) EXTCONN(414D5143474154455741593120202020) TYPE(HANDLE) OBJNAME(WLMMDB.REQUEST) OBJTYPE(QUEUE) READA(NO) OPENOPTS(MQOO_OUTPUT,MQOO_FAIL_IF_QUIESCING) QSGDISP(QMGR) ASTATE(NONE) HSTATE(INACTIVE) OBJNAME(SENDINGAPP.REPLY) OBJTYPE(QUEUE) READA(NO) OPENOPTS(MQOO_INPUT_SHARED,MQOO_INQUIRE,MQOO_SAVE_ALL_CONTEXT,MQOO_FAIL_IF_QUIESCING) QSGDISP(QMGR) ASTATE(ACTIVE) HSTATE(ACTIVE) CSQM201I !MQ21 CSQMDRTC DIS CONN DETAILS CONN(577C425321295301) EXTCONN(414D5143474154455741593120202020) TYPE(CONN) URDISP(QMGR) CONNOPTS(MQCNO_SHARED_BINDING) UOWLOGDA( ) UOWLOGTI( ) UOWSTDA(2014-04-08) UOWSTTI(13.24.00) UOWSTATE(ACTIVE) NID( ) EXTURID(XA_FORMATID[DSAW] XA_GTRID[00000145414B8AB40000000104DF48FC0001020304050607080900010203040506070809] XA_BQUAL[00000145414B8AB40000000104DF48FC0001020304050607080900010203040506070809000000010000000000000000 000000000001]) QMURID(00000000036A70C4) URTYPE(XA) ASTATE(NONE) USERID(MLEMING) APPLTAG(jms/GATEWAY1_CF) ASID(00D5) APPLTYPE(CHINIT) APPLDESC(IBM MQ Channel Initiator) CHANNEL(WAS.CLIENTS) CONNAME(127.0.0.1) Use CONN to match TYPE(CONN) and TYPE(HANDLE) records TYPE(HANDLE) records let you find applications by the objects they access. See all open handles for an app in one place, unlike DIS QSTATUS records Long running UOW information. XID can be tied up with app server transaction timeout Channel name + IP help identify client apps. MQ V7.5 and later JMS clients can supply an application name in the CF !MQ21 DISPLAY CONN(577C425321295301) ALL ASID for bindings based applications
  • 5. 44 DISPLAY CHSTATUS !MQ21 DISPLAY CHSTATUS(*) ALL CSQM201I !MQ21 CSQMDRTC DIS CHSTATUS DETAILS CHANNEL(WAS.CLIENTS) CHLDISP(PRIVATE) CONNAME(127.0.0.1) CURRENT CHLTYPE(SVRCONN) STATUS(RUNNING) SUBSTATE(RECEIVE) CURSHCNV(1) MAXSHCNV(1) LSTMSGTI(15.26.59) LSTMSGDA(2014-04-08) MSGS(6) BYTSSENT(2456) BYTSRCVD(2296) CHSTATI(15.26.59) CHSTADA(2014-04-08) BUFSSENT(13) BUFSRCVD(17) MONCHL(OFF) STOPREQ(NO) KAINT(360) QMNAME(MQ21) RAPPLTAG(jms/GATEWAY1_CF) SECPROT(TLSV12) SSLCERTI(CN=ExampleCA,O=Example) SSLCERTU(MQSX00) SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256) SSLRKEYS(0) SSLKEYTI( ) SSLKEYDA( ) SSLPEER(SERIALNUMBER=53:43:FD:D6,CN=ExampleApp1,O=Example) RPRODUCT(MQJM) RVERSION(09010200) MCAUSER(MLEMING) LOCLADDR( ) MAXMSGL(4194304) COMPHDR(NONE NONE) COMPMSG(NONE NONE) COMPRATE(0,0) COMPTIME(0,0) HBINT(5) See SSLPEER information not in DIS CONN Note that multiple CONN might share one SVRCONN channel instance 4
  • 6. © 2019 IBM Corporation Rerouting messages if no- one is connected
  • 7. 6 Cluster WLM • Cluster WLM algorithm doesn’t check whether consuming applications are connected to cluster queues – I.e. whether something is getting messages from them • You need to ensure that applications are consuming from all instances of a clustered queue to prevent messages building up • However, there is an alternative: AMQSCLM 6 Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM App server 2 IPPROCS(0) IPPROCS(>=1) Round-robin message distribution IPPROCS(0)
  • 8. 77 AMQSCLM 1) Normal operation Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM Round-robin message distribution App server 1 App server 2 App server N . . . IPPROCS(>=1) IPPROCS(>=1) IPPROCS(>=1) AMQSCLM AMQSCLM AMQSCLM 7
  • 9. 88 AMQSCLM 2) Application failure Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM App server 1 App server 2 . . . IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1) Messages start building up AMQSCLM AMQSCLM AMQSCLM App server N Round-robin message distribution 8
  • 10. 99 AMQSCLM 3) Failure Detection Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM App server 1 App server 2 . . . IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1) AMQSCLM checks the queue every 5 minutes and detects IPPROCS(0) AMQSCLM AMQSCLM AMQSCLM App server N Round-robin message distribution
  • 11. 1010 AMQSCLM 4) Redirect Messages Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM App server 1 App server 2 . . . IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1) AMQSCLM reduces queue priority in cluster, to stop new messages arriving AMQSCLM AMQSCLM AMQSCLM App server N Round-robin message distribution 1
  • 12. 1111 AMQSCLM 5) Un-maroon Msgs (optional) Driver qmgr 1 Driver qmgr 2 Driver qmgr N . . . Target qmgr 1 Target qmgr 2 Target qmgr n . . . Sending appsSending appsSending apps Cluster WLM App server 1 App server 2 . . . IPPROCS(0) IPPROCS(>=1) IPPROCS(>=1) AMQSCLM tells qmgr to redistribute ‘stuck’ messages via the cluster AMQSCLM AMQSCLM AMQSCLM App server N Round-robin message distribution
  • 13. 12 AMQSCLM summary • The cluster queue monitoring sample program (AMQSCLM) • Shipped with the distributed product as a sample – Precompiled – Source code – Not shipped on z/OS, but distributed source code can be compiled and used on z/OS • More information here: http://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.dev.doc/q024 620_.htm
  • 14. © 2019 IBM Corporation Are messages flowing?
  • 15. 14 Real-time/online monitoring – queues • Set detail level for queue manager. Override for individual queues • Gives live view of application responsiveness !MQ21 ALTER QMGR MONQ(MEDIUM) !MQ21 ALTER QLOCAL(QUEUE1) MONQ(HIGH) !MQ21 ALTER QLOCAL(QUEUE2) MONQ(OFF) CSQM201I !MQ21 CSQMDRTC DIS QSTATUS DETAILS QSTATUS(QUEUE1) TYPE(QUEUE) OPPROCS(5) IPPROCS(3) CURDEPTH(16) UNCOM(YES) MONQ(HIGH) QTIME(10101414, 10101414) MSGAGE(112) LPUTTIME(17.12.16) LPUTDATE(2019-04-08) LGETTIME(17.05.59) LGETDATE(2019-04-08) QSGDISP(QMGR) !MQ21 DIS QSTATUS(QUEUE1) ALL Age in seconds of the oldest message on the queue Estimations of the time in microseconds that messages are waiting on the queue for processing. First value: Calculated from recent activity Second value: Calculated from longer term activity Timestamps of last PUT/GET to check for recent activity Without MONQ you only get the depth and how many handles are open. NB: HIGH/MEDIUM/LOW doesn’t matter for queues NB: Off by default, so consider switching it on at the queue manager level! Bear in mind that CURDEPTH includes committed and uncommitted messages. Look at UNCOM! NB: for shared queues these values are only this queue manager’s view, apart from CURDEPTH & MSGAGE
  • 16. 15 Real-time/online monitoring – channels • Gives live view of channel throughput !MQ21 ALTER QMGR MONCHL(MEDIUM) MONACLS(MEDIUM) !MQ21 ALTER CHANNEL(CLUSTER1.QM1) CHLTYPE(CLUSRCVR) MONCHL(HIGH) CSQM201I !MQ21 CSQMDRTC DIS CHSTATUS DETAILS CHSTATUS(TO.GWY2) CHLDISP(PRIVATE) XMITQ(GATEWAY2) CONNAME(127.0.0.1(1522)) CURRENT CHLTYPE(SDR) STATUS(RUNNING) SUBSTATE(RECEIVE) INDOUBT(YES) LSTSEQNO(11773) LSTLUWID(0107445310001B33) CURMSGS(50) CURSEQNO(11823) CURLUWID(D5F6358CE52A358A) LSTMSGTI(17.49.51) LSTMSGDA(2014-04-08) MSGS(1580) BYTSSENT(1192330) BYTSRCVD(1748) BATCHES(52) CHSTATI(17.49.03) CHSTADA(2019-04-08) BUFSSENT(1616) BUFSRCVD(55) LONGRTS(999999999) SHORTRTS(180) MONCHL(MEDIUM) XQTIME(545784,3929968) NETTIME(137538,29555) EXITTIME(0,0) XBATCHSZ(20,17) COMPTIME(0,0) COMPRATE(0,0) STOPREQ(NO) KAINT(360) QMNAME(MQ21) RQMNAME(QWY2) SECPROT(NONE) SSLCERTI( ) SSLCERTU(MQSX00) SSLCIPH( ) SSLRKEYS(0) SSLKEYTI( ) SSLKEYDA( ) SSLPEER( ) RPRODUCT(MQMV) RVERSION(09010200) STATCHL(OFF) LOCLADDR(127.0.0.1(53557)) BATCHSZ(50) MAXMSGL(4194304) COMPHDR(NONE NONE) COMPMSG(NONE NONE) HBINT(5) NPMSPEED(FAST)) !MQ21 DIS CHSTATUS(TO.GWY2) ALL Short/long term calculations of how full your batches are getting, to help you tune BATCHSZ/BATCHINT Last time a message was sent over the channel Short/long term calculations of how long messages are waiting on the XMITQ for transmission Short/long term view of the network time to send a request and receive a response. Calculated during batch completion NB: HIGH/MEDIUM/LOW doesn’t matter for channels on z/OS NB: Can be expensive for large numbers of channels due to impact on BP 0 NB: Off by default, so consider switching it on at the queue manager level!
  • 17. 16 Accounting and statistics overview • Realtime information isn’t always sufficient. It is useful to be able to spot trends over time, and see how the system is currently deviating from the average • Queue manager can periodically output monitoring data to SMF • SMF 115: Statistics – High level queue manager wide information. E.g. memory usage, log information, locking information, etc – Most useful from perspective of this presentation is likely to be message manager information which gives information about counts of MQI operations • SMF 116: Accounting – More detailed… – Per task information: log usage, CF usage, CPU time, some MQI information – Per queue information: gives information on MQI usage, open/close time, more detailed log information – Per channel information: similar to output from DIS CHSTATUS – Can generate a lot of data on busy queue managers • All this information is in binary format. Assembler and C mapping are provided 000000 01A40000 5E740035 61240100 223FD4E5 *....;.../.....MV* 000000 F4F1D4D8 F2F10000 F9F1F200 00000134 *41MQ21..912.....* 000000 00700001 00000054 00B00001 00000104 *................* 000000 00300001 00000000 00000000 00000000 *................* 000000 00000000 00000000 00000000 00000000 *................* Example SMF 116 header
  • 18. 17 Accounting and statistics overview • To collect: – Enabled SMF to collect 115/116 data – Enable the queue manager to generate it from start up either via CSQ6SYSP – Or via MQ’s START TRACE command – Fine grained control available on per queue/channel basis • Then dump out records from SMF • Tooling is available to work with records – CSQ4SMFD: basic sample C program – MP1B: a SupportPac, see later slides – MQSMFCSV: simple formatter to convert SMF records into csv. Can then be imported into spreadsheets and databases https://github.com/ibm-messaging/mq-smf-csv – IBM OMEGAMON for messaging – Others… CSQY103I !MQ21 SMFACCT= YES (F0000000), SMFSTAT=YES (F0000000), STATIME=1 !MQ21 START TRACE(ACCT) CLASS(*) !MQ21 ALTER QMGR ACCTQ(ON) !MQ21 ALTER QLOCAL(QUEUE1) ACCTQ(QMGR) !MQ21 ALTER CHANNEL(TO.GWY2) STATCHL(HIGH) CHLTYPE(SDR) A gotcha: for chinit stats and accounting you have to issue: START TRACE(STAT) CLASS(4) START TRACE(ACCTG) CLASS(4)
  • 19. 1818 MP1B • Unsupported SupportPac: https://www-01.ibm.com/support/docview.wss?uid=swg24005907 • Kept reasonably up to date by MQ development team • Uses: – Summary of transactions, jobs and channels – Summary of queues being used by application – List of “high use” queues – … • Will detect potential issues (based on our past experience) and output them as a summary • Can generate csv files for import into spreadsheets which allows for graphs of activity over time to be generated • Lots of options to customize the level of output • Much more useful than CSQ4SMFD • Used by lots of customers
  • 20. 1919 MP1B example • QSUML report gives a summary of activity against local queues • Useful for understanding your expected workload against a set of queues. You can then spot anomalies Number of times message data wasn’t in buffer pool
  • 21. 2020 MP1B Example To dig further you can use the TASK report which digs further into accounting data 16305 MV41,MQ21,2019/04/12,15:19:05,VRM:913, 16305 MQ21 MOVER Jobname:MQ21CHIN Userid:MQSX00 16305 Channel SYSTEM.DEF.SVRCONN 9.174.27.35 16305 Start time Apr 12 15:18:39 2019 Started in a different time interval 16305 Interval Apr 12 15:18:39 2019 - Apr 12 15:18:41 2019 : 1.549716 seconds 16305 Other reqs : Total ET 0.000024 Seconds 16305 Other reqs : Total CPU 0.000024 Seconds 16305 Commit count 101 16305 Commit avg elapsed time 628 uS 16305 Backout count 1 16305 Backout avg elapsed time 15 uS 16305 Backout avg CPU time 15 uS 16305 MQCTL count 301 16305 MQCTL avg elapsed time 0 uS 16305 MQCTL avg CPU time 0 uS 16305 Total suspend time 0.062778 Seconds 16305 Open name AQUEUE 16305 Queue type:QLocal AQUEUE 16305 Page set ID 4 AQUEUE 16305 Buffer pool 4 AQUEUE 16305 Get count 100 AQUEUE 16305 Get avg elapsed time 42 uS AQUEUE 16305 Get avg CPU time 41 uS AQUEUE 16305 Get valid destructive 100 AQUEUE 16305 Put count 100 AQUEUE 16305 Put avg elapsed time 519 uS AQUEUE 16305 Put avg CPU time 75 uS AQUEUE …. Lots of backouts is normally a bad sign Start of per queue information Discrepancy between these two is often a sign of messages not being in buffer pool so being got from pageset Discrepancy between these two for persistent messages is likely to be because of logging. For non-persistent messages it could be because the buffer pool is full
  • 22. 2121 MP1B Example Then go look at channel accounting data too. Jobname: MV41,MQ21,2019/04/12,16:44:14,VRM:913 SMF interval start local time 2019/04/12,16:43:24 SMF interval end local time 2019/04/12,16:44:14 SMF interval start GMT 2019/04/12,15:43:24 SMF interval end GMT 2019/04/12,15:44:14 SMF interval duration 49.332448 seconds SYSTEM.DEF.SVRCONN 9.174.27.35 Connection name 9.174.27.35 SYSTEM.DEF.SVRCONN 9.174.27.35 Channel disp PRIVATE SYSTEM.DEF.SVRCONN 9.174.27.35 Channel type SVRCONN SYSTEM.DEF.SVRCONN 9.174.27.35 Channel status RUNNING SYSTEM.DEF.SVRCONN 9.174.27.35 Remote qmgr/app LoadQueueLoopJMSWIthDifferen SYSTEM.DEF.SVRCONN 9.174.27.35 Channel started date & time 2019/04/12,15:43:26 SYSTEM.DEF.SVRCONN 9.174.27.35 Channel status collect time 2019/04/12,15:44:14 SYSTEM.DEF.SVRCONN 9.174.27.35 Active for 47 seconds SYSTEM.DEF.SVRCONN 9.174.27.35 Last MQI request time 2019/04/12,15:44:13 SYSTEM.DEF.SVRCONN 9.174.27.35 Dispatcher number 1 SYSTEM.DEF.SVRCONN 9.174.27.35 Number of MQI requests 20,853 SYSTEM.DEF.SVRCONN 9.174.27.35 Number of persistent messages 0 SYSTEM.DEF.SVRCONN 9.174.27.35 Buffers sent 20,855 SYSTEM.DEF.SVRCONN 9.174.27.35 Buffers received 21,259 SYSTEM.DEF.SVRCONN 9.174.27.35 Current shared connections 4 SYSTEM.DEF.SVRCONN 9.174.27.35 Message data 0 0 B SYSTEM.DEF.SVRCONN 9.174.27.35 Persistent message data 0 0 B SYSTEM.DEF.SVRCONN 9.174.27.35 Non persistent message data 0 0 B SYSTEM.DEF.SVRCONN 9.174.27.35 Total bytes sent 7,988,504 7801 KB SYSTEM.DEF.SVRCONN 9.174.27.35 Total bytes received 5,611,764 5480 KB SYSTEM.DEF.SVRCONN 9.174.27.35 Bytes received/message 269 269 B SYSTEM.DEF.SVRCONN 9.174.27.35 Bytes sent/message 383 383 B ….
  • 23. 2222 Performance considerations • Capturing SMF data has a cost: CPU and storage of the records • SMF 115 (queue manager stats): negligible -> always have it on • SMF 116 (queue manager accounting class 1): 0.5% increase in transaction cost in our workloads – High level API count, and CPU usage per thread • SMF 116 (queue manager accounting, class 1 & 4): 3.3% increase in transaction cost in our workloads – Per thread and queue usage information • SMF 115 + 116 (chinit): 1-2% increase in transaction cost in our sample workloads • This cost needs to be compared to the benefit of having the data • Consider sizing your environment to assume that MQ SMF data of all types will be needed at some point, or always running with it on • We have had many instances of customers raising PMRs for performance problems and not having enough headroom to enable accounting trace, which is how we diagnose performance problems! • Plus when investigating performance issues it is always useful to have a baseline • See capacity and planning guide: MP16 for more information – https://ibm-messaging.github.io/mqperf/mp16.pdf
  • 24. © 2019 IBM Corporation Where are the messages going?
  • 25. 24 MQ networks can be complex • At each of the dots stuck / mis-sent messages are possible – MQOPENs of the wrong queue / queue manager by apps – Full queues – Stopped channels – Stopped apps – Incorrectly configured QREMOTE/QALIAS routing objects – Cluster membership problems • The standard problem diagnosis approach – Methodically checking channels/queues/DLQs at each point • Is there anything to speed up this process? App1 App2A App2B Requests Replies Replies SDR/RCVR SDR/RCVR Cluster
  • 26. 25 Trace route • MQ has the ability to inject trace route messages – (Can be) hidden from applications – Generate activity reports as they pass through, potentially accumulated in the message • Tools are available to trace routes using these reports – dspmqrte – command line tool supplied with Distributed. Can be run against a z/OS queue manager as a client – MS0P – cat 2 SupportPac extension to MQ Explorer • Lets you see the path messages could have taken – Test connectivity through the MQ network – Test cluster workload balancing • Can quickly jump you close to the problem – The point your trace message veers off in the wrong direction – The point the trail goes cold • NB: there is also a related technology: activity recording which generates activity reports from real messages dspmqrte SDR/RCVR SDR/RCVR dspmqrte dspmqrte Cluster
  • 27. © 2019 IBM Corporation What are the apps doing?
  • 28. 27 What are the apps doing? • Distributed have application activity trace which can give information about all the MQI operations an application does in the order that they do it • Useful for: 1. Application audit trails 2. Message duplication 3. Detailed problem determination 4. Enforcing application coding standards • Should be selectively enabled to reduce performance impact • MQ for z/OS doesn’t have any equivalent to this, but accounting stats can start to help with #1, and we will later see a tool for #2 • There is an RFE for adding application activity trace to MQ for z/OS – https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56321
  • 29. © 2019 IBM Corporation How can I look back in time?
  • 30. 29 What happened to my messages at 2am this morning? • Enterprise monitoring solution – DLQ alerts, queue depth alerts, channel status alerts – Unresolved running units of work – Historical MQ monitoring, accounting and stats data • App logs from the time of the problem – Exceptions, MQ error codes, timeouts • MQ job logs for all queue managers that could have been involved – Channel errors – Authentication issues • SYSLOG • ??? – what else is there
  • 31. 30 What about the MQ recovery log? • For persistent messages regardless of whether in a transaction or not – MQ logs each operation performed • Why can't we use this to – Look back in time to 2am and see what happened? – Recover the original payload if the app lost the message? – See what happened inside long-running units of work? – Provide a list of operations within the failed business transaction? • MQ provides a utility to do so: CSQ1LOGP – Allows you to look at both active and archive logs even if queue manager is running – Can extract information on messages put / got in committed, rolled-back and in- flight units of work – Can also be used to get information on alterations to object definitions – Can be combined with DFSORT to do things like give the total amount of persistent message data sent to a queue
  • 32. 3131 Running CSQ1LOGP • ACTIVE1 DD card: name of active log to search • CSQCMT DD card: message data from committed units of work go here CSQBACK, CSQBOTH, CSQINFLT and CSQOBJS are also available • PAGESET(4): only look at data in pageset 4 • EXTRACT(YES): extract message data to specified datasets • RBASTART(): start from this RBA to the end of the log RBAEND and LRSN variants are also available
  • 33. 32 Example output Output is one line for each MQPUT, MQGET and each UR related operation MQPUT example shown below. Payload is in bold 2019.108 7:31:42.730 0N m < w i 8XX** Ö MQSX00 N m l CHIN MQ21CHINBUR MQ21AQUEUE y MQPUT CN < x AMD CSQ MQ21 N m MQ21 MQSX00 MQ21CHIN1BEC3AE0 Ö LoadQueueLoopJMSWIthDifferen2019041807314273 This is a message While this output format can be useful its likely that you will want to run tooling against it to further refine the data. We provide sample C code (CSQ4LOGS) which will parse the output from CSQ1LOGP and then either: – Display summary information about each unit of work (no message data) – Replay the messages to the target queue, possibly to a different queue manager, where it can then be browsed – An obvious extension to this would be to dump just the message data. This is left as an action for the user…
  • 34. 33 Summary • Lots of tools in your MQ toolbox! • On-line status commands – DISPLAY CONN – DISPLAY QSTATUS – DISPLAY CHSTATUS • Cluster monitoring – AMQSCLM • Off-line statistics and accounting – View using MP1B • Tracking – Trace-route • MQ recovery logs – CSQ1LOGP
  • 35. Integration Technical Conference 2019 34 Don’t forget to fill out the survey! Selectyour session, selectsurvey, rate thesession andsubmit!