The Magic of a Risk Matrix and a Reflection on Safety Performance

In this medium one can expect confusion in discussions of risk matrix applications, and more confusion when that information is related to so-called safety performance. One only needs to apply some thinking and the confusion can be minimized. The main objective associated with safety performance is the identification and the elimination or control of risks to levels acceptable to those exposed. Efforts that do not directly support this objective are moot. There are formal constructs within decision theory and the value of information that address what constitutes acceptable risk.

Risk matrix…

The risk matrix is used to illustrate identified risks. Initial, current, and residual risks can be displayed. Matrices are usually 4x4 or 5x5, with likelihood on the Y axis and severity on the X axis. Attempts have also been made to depict risk on a logarithmic scale. Note that a plotted risk is a point estimate.
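As a minimal sketch of how such a matrix maps a (likelihood, severity) point estimate to a rating, consider the following; the 5x5 category labels and the rating bands are assumptions for illustration, not drawn from any particular standard.

```python
# Minimal 5x5 risk matrix sketch. The category labels and the rating
# bands are illustrative assumptions, not taken from a standard.

LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "frequent"]      # Y axis
SEVERITY = ["negligible", "minor", "moderate", "major", "catastrophic"]  # X axis

def risk_cell(likelihood: str, severity: str) -> str:
    """Return the qualitative rating for one (likelihood, severity) point estimate."""
    score = (LIKELIHOOD.index(likelihood) + 1) * (SEVERITY.index(severity) + 1)
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

# Initial, current, and residual risk can be shown as three points in the same matrix.
print(risk_cell("likely", "major"))       # initial risk  -> high
print(risk_cell("possible", "moderate"))  # current risk  -> medium
print(risk_cell("unlikely", "minor"))     # residual risk -> low
```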

There are qualitative and quantitative ways of evaluating risk. Caution should be applied when quantitative risk assessment is attempted; the underlying concepts of probability and risk can be debated when Boolean and Bayesian analyses are applied.
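For instance, a quantitative treatment often reduces to multiplying a probability by a severity and updating that probability as evidence arrives. The sketch below shows a single Bayesian update with made-up numbers; choosing the prior and the evidence likelihoods is precisely where the debate lies.

```python
# Illustrative only: one Bayesian update of a hazard-occurrence probability.
# All numbers are assumed for the example; selecting them is where the debate lies.

prior = 0.01               # prior probability the hazardous condition is present
p_alarm_given_h = 0.9      # probability an inspection flags it when present
p_alarm_given_not = 0.05   # false-alarm probability when it is absent

# Bayes' rule: P(H | alarm) = P(alarm | H) * P(H) / P(alarm)
p_alarm = p_alarm_given_h * prior + p_alarm_given_not * (1 - prior)
posterior = p_alarm_given_h * prior / p_alarm

severity_cost = 1_000_000  # assumed loss if the adverse outcome occurs
print(f"posterior = {posterior:.3f}, expected loss = {posterior * severity_cost:,.0f}")
```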

There is a difference between a hazard and a risk, and some confusion arises when one thinks in terms of a single hazard. Consider that system risks are comprised of many hazards forming adverse progressions. Risk reflects an understanding of the progression and the likelihood and severity of a given system risk: the hazards lining up within the progression. It is not just the probability of a single hazard. The risk matrix is not a bad thing. Risks can be depicted in many ways: graphs, distributions, 3D illustrations, and with tie lines. Maybe the bad thing is that most don't understand how to use the matrix; it is only a way to display initial, current, and residual risks.
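To make the "hazards lining up" point concrete, the sketch below treats a system risk as an adverse progression of conditions that must all align, so its likelihood is a chain of conditional probabilities rather than the probability of any single hazard; the events and numbers are assumptions for illustration.

```python
# A system risk as an adverse progression: several hazards/conditions must line up.
# The events and conditional probabilities below are assumptions for illustration.

progression = [
    ("hazardous energy present",            0.10),
    ("control fails, given energy present", 0.05),
    ("person exposed, given control fails", 0.20),
    ("harm occurs, given exposure",         0.50),
]

likelihood = 1.0
for event, p in progression:
    likelihood *= p   # chain rule: multiply each conditional probability

print(f"likelihood of the full adverse progression: {likelihood:.6f}")
# -> 0.000500, far smaller than the probability of any single hazard in the chain
```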

Risk assessment…

Risk assessment is another story: perhaps 20% of practitioners know how to use a matrix, and of that 20% maybe 20% know how to appropriately conduct a risk assessment and apply continuous validation and verification of risk controls throughout the system and adverse-progression life cycles...

Risk assessment processes…

Define the system in detail and decompose it into parts, diagrams, flows, operations, tasks, procedures, links, technology, interfaces, and interactions.

Bound and scope the analysis; develop system abstractions.

Figure out how to conduct an inclusive system hazard, threat, and vulnerability analysis and risk assessment (many papers are accessible via LinkedIn).

Perturb the system; identify hazards, threats, and vulnerabilities; and develop adverse sequences (including actions, decisions, and inactions).

Define risk criteria and identify current risks given minimal controls.

Design controls to eliminate or control risks to acceptable levels within the adverse sequences (risks).

The outputs of the analyses are both engineering and administrative controls, as well as proactive performance indicators.

Monitor the system and continue to validate and verify all controls; update analyses and reports to address system dynamics, changes, modifications, and updates.

Continually rank risks against the risk criteria and allocate resources from higher to lower risk (a minimal ranking sketch follows this list). Hence a risk-based system.
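As an illustration of that last step, here is a minimal sketch of ranking current risks against an assumed acceptability criterion and allocating resources from high to low; the register entries, scores, and threshold are invented for the example.

```python
# Minimal sketch of risk-based ranking: order a register by score, work high to low.
# Register entries, scores, and the acceptability threshold are invented examples.

register = [
    {"risk": "stored-energy release during maintenance", "likelihood": 4, "severity": 5},
    {"risk": "trip hazard on access route",              "likelihood": 3, "severity": 2},
    {"risk": "contractor unfamiliar with permit system", "likelihood": 2, "severity": 4},
]

ACCEPTABLE = 6  # assumed risk criterion: scores at or below this only need monitoring

for entry in register:
    entry["score"] = entry["likelihood"] * entry["severity"]

for entry in sorted(register, key=lambda e: e["score"], reverse=True):
    action = "allocate resources / add controls" if entry["score"] > ACCEPTABLE else "monitor"
    print(f'{entry["score"]:>2}  {entry["risk"]}: {action}')
```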

Risk Metrics and the Value of Information[i]

Within the field of decision theory there are various methods and techniques that enable the evaluation of metrics along with the value those metrics provide. Metrics are the methods and techniques used to measure something, or the results obtained from that effort. A performance metric is one that characterizes an organization's risk management behavior and performance. Performance metrics measure an organization's activities and performance in support of the continuous validation and verification of risk controls, and they should support a range of stakeholder risk management needs.

Developing performance metrics usually follows a process of:

Establishing critical process/stakeholder risk management requirements

Identifying specific, quantifiable outputs of work

Establishing targets against which results can be scored

High-level key system metrics can be broken down into more specific subsystem metrics; for example, a metric describing a large process can be further decomposed into metrics for its lower-level processes.

Baseline performance metrics – Prior to analysis it is important to take a snapshot of how well an existing process performs before additional refinements (controls) are implemented to improve the risk management process. The analyst then compares post-implementation performance with the baseline to verify that the refinement works.
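A small sketch of that baseline comparison, using assumed before-and-after samples of a single metric (for example, days to close a corrective action):

```python
# Compare post-implementation performance against the baseline snapshot.
# The samples below are assumed values of one metric (days to close an action).

baseline = [14, 9, 21, 17, 12, 19]   # measured before the refinement (control)
post     = [8, 11, 7, 10, 9, 12]     # measured after implementation

baseline_avg = sum(baseline) / len(baseline)
post_avg = sum(post) / len(post)
improvement = (baseline_avg - post_avg) / baseline_avg * 100

print(f"baseline {baseline_avg:.1f} -> post {post_avg:.1f} ({improvement:.0f}% improvement)")
# In practice a hypothesis test or control chart would be used to verify the change
# is real rather than noise; this sketch only shows the baseline-vs-post comparison.
```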

Key performance indicators (KPI) – are a set of quantifiable measures that an entity uses to gauge or compare performance in terms of meeting objectives. Effective KPIs must:

Be well-defined and quantifiable.

Be thoroughly communicated throughout the organization.

Actually be crucial to achieving goals.

Be applicable to the Line of Business, department and stakeholders.

Effective KPIs[ii] - There are six factors that separate effective, value-creating KPIs from detrimental, value-diminishing KPIs. The right KPIs for the operation should follow these KPI best practices:

Aligned - Make sure the KPIs are aligned with the strategic goals and objectives of your organization.

Attainable - The KPIs you choose to measure should have data that can be easily obtained.

Acute - KPIs should keep everyone on the same page and moving in the same direction.

Accurate - The data flowing into the KPI should be reliable and accurate.

Actionable - Does the KPI give insight into the process that is actionable?

Alive – Operations are always growing and changing. KPIs should evolve as well.

Example KPIs

Key Performance Indicators are used to:

Allocate risk management resources

Control expenditures

Improve process cycle times

Increase stakeholder satisfaction

Key Performance Indicators can gauge process performance[iii]; a computation sketch for two of these indicators follows the list:

Average process overdue time

Percentage overdue

Average time to complete task

Total deviation from the planned schedule across all active projects

Average time between incident and its resolution

Number of outstanding actions

Percentage of correspondence replied to on time

Cycle time

Number of staff involved

Number of process errors

Number of human errors

Average time lag between identification of the compliance issue and resolution

Total deviation (in money) from the planned budgets of projects
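As referenced above, here is a sketch of how two of the listed indicators, average process overdue time and percentage overdue, might be computed from task records; the records and dates are invented for the example.

```python
# Compute two of the listed KPIs from task records: average overdue time and
# percentage overdue. The records and date values are invented for the example.
from datetime import date

tasks = [
    {"due": date(2024, 3, 1),  "closed": date(2024, 3, 10)},  # 9 days overdue
    {"due": date(2024, 3, 5),  "closed": date(2024, 3, 4)},   # on time
    {"due": date(2024, 3, 12), "closed": date(2024, 3, 20)},  # 8 days overdue
    {"due": date(2024, 3, 15), "closed": date(2024, 3, 15)},  # on time
]

overdue_days = [(t["closed"] - t["due"]).days for t in tasks if t["closed"] > t["due"]]

avg_overdue = sum(overdue_days) / len(overdue_days) if overdue_days else 0
pct_overdue = len(overdue_days) / len(tasks) * 100

print(f"average process overdue time: {avg_overdue:.1f} days")
print(f"percentage overdue: {pct_overdue:.0f}%")
```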

Objective Metrics Requirement Statements – In line with the KPIs, appropriate metrics requirement statements are developed, for example: Improve [primary metric] from [baseline average] to [target] by [date]. Metrics requirement statements must be SMART: Specific, Measurable, Aggressive (yet Achievable), Relevant, and Time-bound.
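A trivial sketch of filling that template programmatically; all placeholder values below are assumed.

```python
# Fill the objective-metric requirement template with assumed placeholder values.
template = "Improve {metric} from {baseline} to {target} by {date}."

statement = template.format(
    metric="average incident-to-resolution time",
    baseline="12 days",
    target="5 days",
    date="31 December 2025",
)
print(statement)
# -> Improve average incident-to-resolution time from 12 days to 5 days by 31 December 2025.
```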

Value of Information - The "value of information analysis" considers metrics that focus on high-value measures. The value of information (VoI) is a decision analytic method for quantifying the potential benefit of additional information in the face of uncertainty. VoI can be particularly useful in identifying desirable ways to improve the prospective outcomes for the chosen course of action. In other words, VoI provides guidance on how decision makers might invest in reducing that uncertainty before selecting a course of action. VoI is also defined as the increase in expected value that arises from making the best choice with the benefit of a piece of information compared to the best choice without the benefit of that same information.

VoI considerations:

With nonlinear utility functions, VoI is the amount that could be paid to obtain the information, whereby the decision with information would result in the same certain equivalent value as the decision without information and without incurring the cost of obtaining it.

VoI can be used to assess the value of any piece of information that helps to improve the estimate of one or more alternatives’ performance on one or more criteria.

In some cases, resolving uncertainty prior to making decisions has little or no actual value in a particular context, while in other cases, resolving uncertainty may be the primary enabler of value in a situation and not necessarily in a way that is intuitively obvious.

In calculating VoI, the information obtained can be assumed to be perfect (the results obtained correspond to the actual state of the world with certainty), which provides an upper bound on the potential gain; this is the expected value of perfect information (EVPI).

Alternatively, models may consider the expected value of sample information (EVSI), or of imperfect information more generally. In these cases, new information increases the decision maker’s knowledge of the state of the world, but the result is still uncertain. A worked EVPI sketch follows this list.
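As referenced above, here is a worked EVPI sketch for a two-action, two-state decision; the states, probabilities, and payoffs are assumptions chosen only to make the arithmetic visible. The calculation compares the best expected value chosen without information against the expected value of choosing with perfect knowledge of the state; the difference is the most that resolving the uncertainty could be worth.

```python
# Expected value of perfect information (EVPI) for a small assumed decision problem.
# Two states of the world with assumed probabilities, two candidate courses of action,
# and assumed payoffs (negative = cost) for each (action, state) pair.

p_state = {"control fails": 0.2, "control holds": 0.8}

payoff = {
    ("add extra barrier", "control fails"): -100,   # barrier cost, loss avoided
    ("add extra barrier", "control holds"): -100,   # barrier cost only
    ("accept as is",      "control fails"): -900,   # uncontrolled loss
    ("accept as is",      "control holds"):    0,
}

actions = ["add extra barrier", "accept as is"]

# Best expected value when the action must be chosen before knowing the state.
ev_without = max(
    sum(p_state[s] * payoff[(a, s)] for s in p_state) for a in actions
)

# Expected value if a clairvoyant revealed the state before each decision.
ev_with_perfect = sum(
    p_state[s] * max(payoff[(a, s)] for a in actions) for s in p_state
)

evpi = ev_with_perfect - ev_without
print(f"EV without info: {ev_without}, EV with perfect info: {ev_with_perfect}, EVPI: {evpi}")
```

Here the upper bound on what should be spent to learn the state before deciding is 80 units. EVSI would repeat the same comparison with an imperfect (sampled) signal in place of the clairvoyant, and it always falls between zero and EVPI.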

Resource Savings, Cost and Risk Avoidance…

Via improvement analyses, efficiencies will be gained in products, processes, procedures, tasks, and operations. Depending on the evaluation, the various methods and techniques discussed enable resource savings and cost reductions. There are categories of savings and cost avoidance to consider:

Hard savings – are the quantifiable savings that are the direct result of improvement analyses and associated controls. Hard savings are savings in materials, time expended, or overhead.

Soft savings – are intangible benefits produced as an output of improvement analyses, such as motivation, a reduction of undue stressors, and an enhanced organizational culture: a positive can-do mindset, collective thinking, team building, consensus development, improved communications, and involvement.

Potential savings – are by-products of improvement analyses that require subsequent action to be realized. Savings remain potential if the resources are not applied to another use.

Cost/Resource Avoidance – as an output of improvement analyses, waste and human error are eliminated or decreased, and products, processes, procedures, tasks, and operations are redesigned to be more efficient and effective. All of these efforts result in decreased resource expenditure, as well as a decrease in the associated risks.


[i] For further information on VoI see: file:///C:/Users/afs430ma/Downloads/fulltext_stamped.pdf

 

[ii] For information on effective KPIs refer to: https://www.klipfolio.com/resources/kpi-examples

 

[iii] For additional information on performance see: http://www.pnmsoft.com/resources/bpm-tutorial/key-performance-indicators/

 

Comments…

Mark F. Witcher, Ph.D.

Actively Retired - ReRA (Relational Risk Analysis) developer


“For those that understand risk management and risk assessment.” What if nobody understands what a risk is? The dominant definition and view that a risk is an event has essentially prevented people from understanding how to analyze and manage risks because risks are really a relationship between events. Risks are mostly a bad mechanism that produces a bad event not just a bad event. Until that misdefinition is corrected, risk analysis will continue to flounder searching for something that works.


The Risk Assessment Matrix (RAM) was never designed to calculate or quantify risk; it is a strategic ranking tool, not a data-starved calculator. Misusing it to quantify the likelihood of a consequence is a misapplication that breeds distortion. The RAM's operational integrity lies in comparing consequences across hazards with order-of-magnitude clarity. Using order of magnitude rather than numbers or probabilities, the RAM provides a strategic overview of where to place controls/barriers in relation to achieving tolerable risk: ranking the consequence of a hazard (or initiating event) against the consequences of other hazards (or initiating events), orders of magnitude higher or lower than another hazard's consequence. After all, we don't have enough empirical data to calculate the real probability/likelihood. It helps us mobilize limited resources toward what is consequential and non-negotiable. High-impact hazards get fortified, and their safeguards become mission-critical. Stop wielding the RAM as a blunt instrument. Use it with tactical focus.

Keith Miller

Technical Safety Consultant


Your paper has left me totally confused. You describe the hazard management process comprehensively and then just say figure the probability and use the matrix, but don't tell us how to determine the probability or what the risk matrix is for. However, you do quite rightly say that hazard and risk are different things, but you don't tell us how to get from one to the other. And nor has anyone else ever explained this to me either. My take is that your paper justifies why the matrices are completely useless. My experience is that people using risk matrices are doing so because they haven't applied critical thinking, and it's just an easy thing to put a cross in a box without any forethought. The graphic somehow impresses upon the lay person that they have done something clever, which they haven't. And because no one can ever verify the rankings there's no accountability for this nonsense.

Ahmad Aminu Ibrahim

Computer Engineer | Project Management


Thanks for sharing, Mike

