1
HTTP://WWW.MISSIONREADYSOFTWARE.COM
321-514-4659
An overview of the IEEE 1633 Recommended
Practices for Software Reliability
Ann Marie Neufelder
ann.neufelder@missionreadysoftware.com
• Introduction and motivation to produce this guidance
• Solutions provided by the guidance
• Quantitative and qualitative reliability measures for making a software release decision
2
IEEE 1633 Working Group members
• Robert Stoddard - SEI
• Lance Fiondella - UMass
• Peter Lakey - Consultant
• Robert Binder – retired
• Michael Siok – Lockheed Martin
• Ming Li - NRC
• Ying Shi - NASA
• Nematollah Bidokhti - thinkDFR
• Thierry Wandji – US Navy
• Michael Grottke - FAU
• Andy Long - OSD
• George Stark - IBM
• Allen Nikora - NASA
• Bakul Banerjee – retired IEEE
• Debra Greenhalgh Lubas – US Navy
• Mark Sims – US Army
• Rajesh Murthy - Consultant
• Willie Fitzpatrick – US Army
• Mark Ofori-kyei – General Dynamics
• Sonya Davis – General Dynamics
2
• Burdette Joyner – Northrop Grumman
• Marty Shooman – retired NASA
• Andrew Mack
• Loren Garroway – Northrop Grumman
• Kevin Mattos – US Navy
• Kevin Frye – US Navy
• Claire Jones – Boeing
• Robert Raygan – OSD
• Mary Ann DeCicco – General Dynamics
• Shane Smith – OSD
• Franklin Marotta – US Army
• David Bernreuther – US Army
• Martin Wayne – US Army
• Nathan Herbert – US Army
• Richard E Gibbs III – Boeing
• Harry White – Harmonic
• Jacob Axman – US Navy
• Ahlia T. Kitwana – Harris
• Yuan Wei
• Darwin Heiser – General Dynamics
• Brian McQuillan – General Dynamics
• Kishor Trivedi – Duke University
• Debra Haehn – Philips Healthcare
Chair: Ann Marie Neufelder, Mission Ready Software
Vice Chair: Lance Fiondella, UMass
Secretary: Rachel Neufelder, Mission Ready Software
IEEE Standards Association Chair: Louis Gullo, Northrop Grumman
Martha Wetherholt of NASA was Vice Chair until her passing in 2020.
She was instrumental in delivering the 2016 edition.
Introduction and
Motivation
Reliable
software
engineering…
• Process assessments such as the SEI CMMi
assessment have not provided value added for
improving reliability or safety of software
• 30 years of data shows [1]
• No improved software reliability beyond level 3
• Organizations with CMMi level 3+ can and do produce
failed software programs
• A good process is necessary but not sufficient
• An organization can have a great process and still have
• People who do not understand the product or industry
doing development and test
• Low level of rigor in testing
• Requirements that are traceable but poorly written
• Design that is traceable but poorly written
• Test procedures that are traceable but poorly written
and have low coverage
• Overlooked failure modes
• Too many unknown defects in the software
• Too many known open defects that aren’t assessed
appropriately
• Too many workarounds in the software that burden
the end users and/or cause loss of availability
3
Introduction and Motivation
Reliable software engineering…
4
• Has been an engineering discipline for > 50 years.
• Fundamental prerequisite for virtually all modern systems.
• Plenty of theory has been generated over the last several decades, but practical guidance on how to apply these models has lagged significantly.
• A diverse set of stakeholders requires pragmatic guidance and tools to apply software reliability models to assess real software or firmware projects during each stage of the software development lifecycle.
Fundamental roadblocks addressed by IEEE
1633
• Reliability engineers don’t understand software
• Software engineers don’t understand reliability
• Both may have challenges acquiring data
needed for the analyses
Solutions provided by IEEE 1633
• Actionable step-by-step procedures for assessing reliable software
• During any phase of software or firmware development
• With any software lifecycle model, for any industry or application type.
• Reliable software is demonstrated by both qualitative
and quantitative evidence
• Fact based decisions for releasing software based on
qualitative and quantitative aspects
• Quantitative measures are the demonstration that
software is reliable
• Qualitative measures provide confidence that
quantitative measures are accurate
• The easiest way to record fewer software defects is to test less or to test with less rigor
5
Qualitative
Level of rigor in testing including
• Test Like You Operate
• Fault injection testing
• Peak loading and endurance testing
• Boundary and zero value testing
• Code coverage
• Go-No go testing
• Requirements coverage
Failure mode identification against the
“Common Defect Enumeration” or
known set of software failure modes
and/or fault tree analysis
Defect root cause analysis
Quantitative
Estimated portion of system failures that will be due to software
Rate of defect discovery in testing
Fix rate versus open rate
Estimated residual defects and potential for pileup
Severity level/effect of unresolved defects
Defect density benchmark
Coverage metrics
[Process flow diagram, reconstructed as a list of activities, their clauses, and their outputs:]
• Plan the reliable software (Clause 5.1): list of software configuration items (CI), failure definition scoring criteria, assessment of key risks that can derail the software program
• Benchmark software reliability early (Clauses 5.3.2, 6.2): predicted reliability measures for each CI
• Include software in the system reliability model (Clause 5.3.4): portion of total failures due to SW
• Allocate reliability goals to software components (Clause 5.3.5): measurable goals for each SW CI
• Failure modes analysis (Clauses 5.2, 5.4.8): failure modes typically overlooked in testing
• Testing for reliable software (Clauses 5.4.1-5.4.3): test results, level of rigor
• Collect SW failure data (Clause 5.4.4): defect discoveries over usage time
• Evaluate reliability of software during testing and operation (Clauses 5.4.4-5.4.7, 6.3): test coverage, failure mode resolution, progress against reliability goals
• Make a fact-based release decision (Clause 5.5)
All IEEE 1633 clauses provide support for a fact-based release decision.
6
Current status of IEEE 1633
• Unanimously approved by the IEEE Standards Association in the first ballot on May 24, 2016. Released on January 18, 2017.
• The working group is currently making updates for:
• How reliable software tasks are executed in DevSecOps
• The Common Defect Enumeration, recently published on the Defense Acquisition University R&M Community of Practice website: https://www.dau.edu/cop/rm-engineering/_layouts/15/WopiFrame.aspx?sourcedoc=/cop/rm-engineering/DAU%20Sponsored%20Documents/Reliable%20Software%20SOW%20Appendix%20B%20-%20CDE.xlsx&action=default
• Level of rigor in testing
7
Quantitative
• No single quantitative measure should be used by itself as the release decision maker
• Ex: defects per source line of code or failures per hour should not be used without other metrics
• No one metric tells the whole story
• It's too easy for one metric to become a self-fulfilling prophecy.
8
Quantitative
Estimated portion of system failures that will be due to software
Rate of defect discovery in testing
Fix rate versus open rate
Estimated residual defects and potential for pileup
Severity level/effect of unresolved defects
Defect density benchmark
Coverage metrics
9
Estimated portion of system
failures due to software
• Past history method is very accurate if the past
history is recent and is calibrated for changes in
technology.
• As a rule, software grows 10-12% per year. So,
historical data should be calibrated to assume that
the software portion is growing 10-12% per year.
• There is virtually no chance that software will
decrease in size over time. Hence past history is a
useful lower bound.
• Real example: An engineering company produced a system in 2015. Of all of the deployed failures, 25% were due to software. In 2017 they were deploying a similar system.
• Since the historical data was 2 years old, the 25% is adjusted by 10-12% per year.
• So, the prediction is between 30.25% and 31.36% (see the calculation sketch below).
• When the equipment was deployed in 2019, the actual portion of failures due to software was 33%, much more accurate than the 5% estimated by subject matter experts.
9
Method: Past history. Description: Compute the relative portion of SW versus HW failures from a past similar system.
Method: R&D $. Description: Compute the relative portion of R&D $ dedicated to software development.
Method: Achievable failure rates. Description: Use prediction models to determine the failure rate for HW and SW. The predicted values for each determine their allocation.
Value added: Engineering has
no basis for assuming that
software doesn’t fail or that
the reliability = 1
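The calibration in the example above can be reproduced with a few lines of arithmetic. A minimal sketch in Python, assuming the 10-12% annual growth rule from this slide; the function name and example values are illustrative, not taken from IEEE 1633:

```python
# Minimal sketch (illustrative, not from IEEE 1633): calibrate a historical
# software share of system failures using the 10-12% per-year growth rule.
def calibrated_sw_share(historical_share, years_elapsed, annual_growth=(0.10, 0.12)):
    """Return a (low, high) range for the expected software share of failures."""
    low, high = annual_growth
    return (historical_share * (1 + low) ** years_elapsed,
            historical_share * (1 + high) ** years_elapsed)

# Example from the slide: 25% software share measured in 2015, similar system in 2017.
low, high = calibrated_sw_share(0.25, years_elapsed=2)
print(f"Predicted software share: {low:.2%} to {high:.2%}")  # 30.25% to 31.36%
```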
10
Rate of defect
discovery in
testing
• Software fault rate increases, peaks and then decreases prior to
maturity
• Maturity level at deployment separates the world class from the
distressed
• Increasing fault rate– the customers will see it as a failed product
in 100% of all cases
• Fault rate barely decreasing- customers will be unhappy with it
• Fault rate is steadily decreasing – customers won’t notice the SW
which is ultimate indicator of success
• With agile or incremental development there are multiple peaks
until the final burn down of defects
• We cover how to track fault rate during testing in the IEEE 1633
clause 5.4.4
10
Metric | World Class | Mediocre | Distressed
Fault rate trend | Steadily decreasing | Peaking or recently peaked | Increasing
Percentage of defects identified prior to deployment versus post deployment | >=75% | 40-74% | <=39%
[Chart: non-cumulative defects discovered versus usage/test time over the life of a version. Failed projects deploy prior to the peak, when <= 39% of the defects have been removed; mediocre projects deploy when 40% to 75% of the area under the curve has been covered; successful projects deploy at > 75% of the area under the curve.]
Value added: Ensures software
isn’t deployed half baked
Lessons learned from a real software-intensive program in which the defect discovery rate was not tracked
This is the defect rate from a distressed software-intensive program.
The organization released the software to operational deployment before the fault rate peaked.
That's because no one was trending the fault rate.
More than 800 software failures were discovered by the customer after deployment.
Upon deployment, the actual system reliability was 8% of the required reliability objective because of the software failures.
If SWRG models had been used prior to deployment, the service would not have accepted the software as-is, since the RAM goal had not been met.
[Chart: defects discovered over usage time. The software was released at a point where the fault rate was still increasing.]
Lessons learned from a real software-intensive program in which the defect discovery rate was tracked
This is the fault rate from a real software-intensive program
• The fault rate is clearly trending downwards
• By the end of the trend, approximately 80% of the defects had been discovered (IEEE 1633 shows how to calculate this; a minimal sketch of one such calculation follows the chart below)
• There was still work to be done with regard to defect removal, but the software is stable.
• The SWRG model provides confidence that the overall RAM objective can be met and shows the work required to meet it
[Chart: non-cumulative unique defects versus usage time (0 to 1800), titled "Defects discovered over usage time", showing a decreasing trend.]
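The "approximately 80% of defects discovered" figure comes from fitting a software reliability growth model to the defect discovery data. A minimal sketch, assuming a Goel-Okumoto NHPP model and hypothetical cumulative counts; IEEE 1633 clause 5.4.4 and its annexes describe several such models, so this particular model and data set are illustrative assumptions, not the program's actual numbers:

```python
# Minimal sketch (assumed model and data): fit the Goel-Okumoto NHPP model
# mu(t) = a * (1 - exp(-b*t)) to cumulative defect counts and estimate the
# fraction of total defects discovered so far.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    return a * (1.0 - np.exp(-b * t))

# Hypothetical cumulative unique defects at the end of each test period.
t = np.arange(1, 11, dtype=float)
cum_defects = np.array([8, 15, 21, 26, 30, 33, 35, 37, 38, 39], dtype=float)

(a_hat, b_hat), _ = curve_fit(goel_okumoto, t, cum_defects, p0=[50.0, 0.1])
print(f"Estimated total defects: {a_hat:.0f}")
print(f"Fraction discovered so far: {cum_defects[-1] / a_hat:.0%}")
```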
13
Fix rate
versus open
rate
13
If the discovered defects aren’t
removed - the defect discovery
trend won’t improve very far
beyond the peak
[Chart: defects discovered versus usage/test time over the life of a version. The initial increase is caused by blocking defects; the peak is reached when blocking defects are either fixed or avoidable; stability cannot be reached if defects are not fixed.]
Value added: A stable defect
discovery profile won’t happen
if the defects aren’t removed
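A minimal sketch of the fix rate versus open rate comparison described above, using hypothetical weekly counts (the numbers are invented for illustration); a backlog that keeps growing means the discovery curve cannot stabilize past the peak:

```python
# Minimal sketch (hypothetical numbers): compare defects opened versus fixed per
# test week and track the open backlog that blocks a stable discovery profile.
opened = [12, 15, 14, 10, 9, 7]   # defects opened each week (assumed)
fixed = [5, 8, 10, 11, 9, 8]      # defects fixed each week (assumed)

backlog = 0
for week, (o, f) in enumerate(zip(opened, fixed), start=1):
    backlog += o - f
    print(f"Week {week}: opened={o}, fixed={f}, open backlog={backlog}")
```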
14
Estimated residual
defects and potential
for pileup
• Releases are too far apart initially and too close together in the last 3 releases
• SRE predictions allowed for leveling of features before the code is even written
[Chart: total faults predicted (nominal case) from releases 1 to 5, predicted for each month.]
Value added: An early sprint or release might be stable, but at the planned cadence, eventually future sprints or releases won't be.
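A minimal sketch of the pileup effect, assuming hypothetical per-release defect predictions and a fixed repair capacity between releases; the real prediction models are described in IEEE 1633, and the numbers here are purely illustrative:

```python
# Minimal sketch (assumed inputs): when predicted incoming defects per release
# exceed the fix capacity between releases, residual defects pile up.
predicted_defects_per_release = [20, 18, 22, 25, 24]  # hypothetical predictions
fix_capacity_between_releases = 15                    # hypothetical fixes per cycle

residual = 0
for release, incoming in enumerate(predicted_defects_per_release, start=1):
    residual = max(0, residual + incoming - fix_capacity_between_releases)
    print(f"Release {release}: residual defects carried forward = {residual}")
```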
15
Severity/effect of
unresolved defects
15
It's possible to have a decreasing defect trend and defects that aren't piling up, but still have unreliable software because there are open defects that the users can experience
1. A single defect causes multiple failures
2. The same failure happens at every
installed site at the same time
3. A collection of defects “with
workarounds” collectively cause the
user to have unacceptable downtime
4. One “reset” could happen so often
that the system is unusable
5. The worst software failures often
happen when the software is
executing
6. Systematic software failures need to
be counted as a failure every time
they occur until the underlying failure
mode is proven to be removed or
mitigated
Value added: Software isn't like hardware. Software can have the above kinds of failures, which typically don't happen with hardware.
16
Defect density
benchmarking
statistics
• Tables like this are derived from actual field data [1][2]
• Organizations with the lowest deployed defect density were also late less often and by a smaller amount
• SRE for any given project can be benchmarked by answering a simple survey
(Columns 3-5 are the defect metrics; columns 6-7 are late deliveries, as per SW estimates.)
Cluster | Outcome | Avg defects per 1000 source lines of code | % defects removed prior to release | Fault rate | Prob (late) | Late by (% of schedule)
3% | World Class | .0269 | >75% | Steadily decreasing | 40 | 12
10% | Successful | .0644 | >75% | Steadily decreasing | 20 | 25
25% | Above average | .111 | 40-75% | Recently peaked or recently decreasing | 17 | 25
50% | Average | .239 | 40-75% | Recently peaked or recently decreasing | 34 | 37
75% | Below average | .647 | 40-75% | Recently peaked or recently decreasing | 85 | 125
90% | Impaired | 1.119 | <40% | Increasing or peaking | 67 | 67
97% | Distressed | 2.402 | <40% | Increasing or peaking | 83 | 75
16
Value added: The defect
profile can be predicted before
testing even starts. Sometimes
one bad development practice
can derail the program. You
don’t want to wait until testing
or operation to find that out.
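A minimal sketch of how a benchmarked defect density can be used before testing starts, assuming the densities in the table above and a hypothetical code size; the survey that determines which row applies is described in [1]:

```python
# Minimal sketch: predict deployed defects from a benchmarked defect density
# (defects per 1000 source lines of code, from the table above) and code size.
defects_per_ksloc = {
    "World Class": 0.0269, "Successful": 0.0644, "Above average": 0.111,
    "Average": 0.239, "Below average": 0.647, "Impaired": 1.119, "Distressed": 2.402,
}
size_ksloc = 350          # hypothetical size: 350,000 source lines of code
benchmark = "Average"     # hypothetical survey outcome

predicted_defects = defects_per_ksloc[benchmark] * size_ksloc
print(f"Predicted deployed defects ({benchmark}): {predicted_defects:.0f}")  # ~84
```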
Factors that have been mathematically proven to be related to software reliability [1][2]
USAF Rome Laboratories developed the first prediction model in 1987 [2]. It was based on these factors. A few more have been developed since then.
Facts don't lie. All predictive models agree that how the software is developed is a good predictor of its ultimate reliability.
(Slide callouts: static analysis tools measure the product characteristics; SEI CMMi and ASPICE assess the process characteristics; the technique characteristics are often overlooked.)
Type of factor | Number (% of characteristics in this category) | Examples of characteristics in this category
Product | 50 (10%) | Size, complexity, whether the design is object oriented, whether the requirements are consistent, code that is old and fragile, etc.
Product risks | 12 (2%) | Risks imposed by end users, government regulations, customers, product maturity, etc.
People | 38 (7%) | Turnover, geographical location, amount of noise in the work area, number of years of experience in the applicable industry, number of software people, ratio of software developers to testers, etc.
Process | 121 (23%) | Procedures, compliance, exit criteria, standards, etc.
Technique | 302 (58%) | The specific methods, approaches and tools that are used to develop the software. Example: using a SFMEA to help identify the exceptions that should be designed and coded.
17
Techniques that have been proven to affect software reliability and are often overlooked
• Decomposition: Code a little, test a little philosophy. Release development/test time < 18 months and preferably < 12 months. Each developer has a schedule that is granular to the day or week.
• Visualization with pictures and tables: A picture is worth 1000 words. Specifications with diagrams/pictures/tables are associated with fewer defects than text.
• Requirements focus: Developing requirements that aren't missing crucially important details.
• Testing focus/rigor: Explicitly testing the requirements, design, stresses, lines of code, and operational profile.
• Unit testing focus: Unit testing by every software engineer is mandatory and follows a defined template. Branch coverage tools and metrics.
• Defect reduction techniques: Software fault trees, software FMEA, etc.
• Design focus: Designing states, sequences, timing, logic, algorithms, and error handling before coding.
• Regular monitoring of the software engineers: Monitoring software progress daily or weekly, identifying risks early, etc.
• Planning ahead: Planning the scope, personnel, equipment, and risks before they become problematic; planning the timing of the tasks.
18
Coverage metrics
• These metrics put the defect profile observed in testing into perspective
• Ex: If only half of the requirements are covered, or half of the code has been executed in testing, then the reliability growth progress is optimistic by at least 50% (i.e., total defects are at least twice what have been observed so far; see the sketch below)
19
Software metric | Definition
Requirements traceability | Degree to which the requirements have been met by the architecture, code and test cases
Structural coverage | Degree to which the lines of code, paths, and data have been tested
Value added: The estimated fault trend and remaining defects are only as accurate as the amount of code and requirements that have been covered.
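A minimal sketch of the coverage adjustment described above; the at-least-inverse scaling is the slide's rule of thumb, and the function name is illustrative:

```python
# Minimal sketch: de-rate an observed defect total by test coverage, per the
# rule of thumb that 50% coverage means at least twice the observed defects.
def coverage_adjusted_total(observed_defects, coverage_fraction):
    """Lower-bound estimate of total defects given partial coverage (0 < c <= 1)."""
    if not 0.0 < coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return observed_defects / coverage_fraction

print(coverage_adjusted_total(40, 0.5))  # 80.0 -- at least twice what was observed
```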
Qualitative
• An organization can have excellent
quantitative measures but still
have unknown failures that cause
serious problems for end users
• This is because every
development activity is designed
for success.
• Engineering calls these failures "edge" cases
• But really these failure modes were detectable all along; the team just chose not to look for them or design for them.
20
Qualitative
Level of rigor in testing including
• Test Like You Operate
• Fault injection testing
• Peak loading and endurance
testing
• Boundary and zero value testing
• Code coverage
• Go-No go testing
• Requirements coverage
Failure mode identification against
the “Common Defect
Enumeration” or known set of
software failure modes and/or
fault tree analysis
Defect root cause analysis
Value added: Reliable software is
achievable only when software is
designed to avoid failure.
Develop a reliability test suite to maximize the level of rigor
Contrary to popular belief, testing only the requirements is rarely sufficient.
21
Type of test | Inputs | Description
Black box testing:
Operational profile testing | The Operational Profile (OP) | Also known as "Test Like You Operate"
Requirements based testing | The software requirements | Exercises the SUT to provide assurance that it satisfies its requirements as specified.
Model based testing | Test models, which may be derived from requirements and design documentation | Exercises state transitions, fault states, dead states, prohibited states
Stress case testing | Expected longest mission time and max concurrent users | Peak loading, endurance, zero value testing, boundary values, go/no-go testing
Timing and performance | Timing and scheduling diagrams, performance requirements, the effects of software on the system design | Exercises the SUT to evaluate compliance with requirements for real-time deadlines and resource utilization.
Failure modes (see next slide) | Software FMEA, defect root cause analysis, fault trees | Exercises the conditions that are associated with the identified failure modes. This is the only test that verifies that the software works properly in a degraded environment.
Value added:
These tests often
identify defects
that are the most
severe and most
expensive to fix
once deployed
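As a small illustration of the boundary and zero value testing called out under stress case testing above, here is a hedged sketch using pytest against a hypothetical function under test; the function and its valid range are invented for the example and are not from IEEE 1633:

```python
# Minimal sketch (hypothetical SUT): boundary and zero-value tests of the kind
# listed under "stress case testing" above, written with pytest.
import pytest

def scale_reading(raw: int) -> float:
    """Hypothetical function under test: map a 12-bit reading to 0.0-1.0."""
    if not 0 <= raw <= 4095:
        raise ValueError("raw reading out of range")
    return raw / 4095.0

@pytest.mark.parametrize("raw,expected", [(0, 0.0), (4095, 1.0)])
def test_zero_and_boundary_values(raw, expected):
    assert scale_reading(raw) == pytest.approx(expected)

@pytest.mark.parametrize("raw", [-1, 4096])
def test_out_of_range_is_rejected(raw):
    with pytest.raises(ValueError):
        scale_reading(raw)
```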
Analyze failure
modes and effects
• Problem: Engineers will
often consider failure
modes that are so obvious
that they are guaranteed
to be found in testing – or
they will consider failure
modes that aren’t fixable in
the software.
• There are over 400 failure modes/root causes that are relevant to virtually all software systems.
• The failure modes are
tagged to actual failures
from mission and safety
critical systems since 1962
22
Software
failure mode
Description
Faulty state
management
Inadvertent state transitions, dead states,
state transitions are incorrect, etc.
Faulty
sequencing
Operations execute in the wrong order
Faulty timing Operations start too early or too late or take
too long.The right event happens in the right
order but at the wrong time
Faulty data The data is the wrong units of measure,
scale, resolution, type, size, stale, corrupt,
missing.
Faulty
functionality
The system does the wrong thing or does the
right thing incorrectly
Faulty error
detection
The software fails to detect faults in the
hardware, communication, computations,
power, external components or devices,
computations, etc.
Faulty
processing
Software accuracy or memory degrades over
mission, software can’t handle peak loading
or maximum users
Value added: Virtually 100% [3] of the software failure modes that have cause problems in
operation are due to the above categories which are often overlooked.
History repeats
itself.
Root causes are
predictable but
only if someone
thinks about
them.
• Faulty error handling – Apollo 11 lunar landing, Ariane 5, Qantas flight 72, Solar Heliospheric Observatory spacecraft, Denver Airport, NASA Spirit Rover (too many files on drive not detected)
• Faulty data definition – Ariane 5 explosion 16/64 bit mismatch, Mars Climate Orbiter Metric/English mismatch, Mars Global Surveyor, 1985 SDIO mismatch, Titan IV wrong constant defined
• Faulty logic – AT&T Mid-Atlantic outage in 1991
• Faulty timing – Patriot missile system during a SCUD missile attack, 2003 Northeast blackout
• Race condition – Therac-25
• Peak load conditions – Affordable Health Care site launch, Iowa Primary
• Faulty usability
• Too easy for humans to make mistakes – AFATDS friendly fire, Panama City over-radiation
• Insufficient positive feedback of safety and mission critical commands
The above illustrates that history keeps repeating itself
because people assume root causes from other
industries/applications are somehow not applicable.
Lesson to be learned – the root causes are applicable to
any industry/product. It’s the hazards/effects that result
from the root causes that are unique.
23
Software failure modes effects analysis is highly effective, but only if the 17 mistakes below are avoided
Organizational mistakes
• None of the software FMEA analysts
have a background in software
• The analysis is not constructed by a cross
functional team
• Conducting the SFMEA too late (most of
these failure modes are too expensive to
fix once the code is written)
• Conducting the SFMEA without the
proper software deliverables such as the
SRS, SDD, IRS, etc.
• Failing to track the failure modes and/or
make any corrective actions to the
requirements, design, code, use case,
users manual as a result of the SFMEA
• Failing to tailor the software FMEA to the
highest risk areas and most relevant
failure modes
Faulty Assumptions
• Assumption that all
failures originate in a
single line of code or
specification
• Assumption that software
works
• Assumption that software
specifications are correct
and complete
• Assumption that all failure
modes will be found and
fixed in testing
• Assumption that all failure
modes are impossible or
negligible in severity
FMEA Execution mistakes
• Focusing on total failure of the software
- failing to consider small things that
lead to big things going wrong
• Black box versus functional approach –
analyze what the software does and not
what it is
• Ignoring the 6 dimensions that lead to
software failures - the system, the users
who use the system, the battlefield
environment, and the mission
• Conducting the SFMEA at too high
(system requirements) or too low (lines
of code) a level or architecture
• Mixing functional failure modes with process failure modes (i.e., faulty timing refers to the software design, not the software schedule)
• Incorrectly assigning a failure rate or
likelihood
Value added: The IEEE 1633 explains how to apply the FMEA so that the 17 common mistakes are minimized. The recommended practice can be applied with any industry FMEA standard as a framework.
FAULTY ASSUMPTION THAT ALL FAILURE MODES ORIGINATE IN A SINGLE LINE OF CODE
25
[Diagram: lines of code #1 through #6 analyzed one at a time, with failure modes such as "line of code fails to execute" and "line of code terminates".]
The analysts work through each line of code one at a time and analyze each statement against each CDE one at a time.
This is ineffective because:
1. Very few failures are due to a single line of code [3]
2. When a failure is due to a single line of code, it is usually due to mistakes like these:
• The line of code executes the wrong command (i.e., has a compilable typo)
• The line of code manipulates the wrong data
• The line of code isn't written properly but still compiles
3. Lines of code typically don't fail to execute unless there is a defect in another line of code
4. If a line of code terminates execution, it is often because there is missing fault handling or a faulty design
FAULTY ASSUMPTION THAT ALL FAILURE MODES ORIGINATE IN A SINGLE SPECIFICATION
26
[Diagram: SRS statements #1 through #6 analyzed one at a time against CDEs #1 through #6, one at a time.]
The analysts work through each SRS statement one at a time and analyze each statement against each failure mode one at a time.
This is ineffective because:
1. The majority of operational defects aren't caused by a single faulty statement [3]
2. Many of the common defect enumerations don't apply at the statement level; they apply to a collection of statements
3. The primary failure mode at an individual specification is magic numbers (i.e., timing or accuracy requirements)
INCOSE requirements analyzers are effective at identifying requirements statements that are ambiguous or untestable.
MANY FAILURES ARE DUE TO A COLLECTION OF SOFTWARE SPECIFICATIONS, MISSING SPECIFICATIONS, AND MULTIPLE LINES OF CODE
27
Analyze the collection of
software requirements against
the set of CDEs
1. Prune the CDEs to remove
things you don’t have in the
software
2. Analyze the specifications
and design as a whole
package against the relevant
CDEs
[Diagram: the full set of SRS statements (#1 through #6) analyzed as a whole against the set of CDEs (#1 through #6).]
Value added: The Common Defect Enumeration lays out the failure modes
that have caused the most failures in operation. The CDE can be used in any
FMEA framework or industry standard.
Include software in a system fault tree
• A “software” fault tree should be part of an overall system fault tree to ensure
that interactions with hardware are considered
• The fault tree can feed the software FMEA
• The hazards are tagged to the top level effects in the SFMEA
• If the SFMEA is effective it will cover every hazard
• The software FMEA can feed the FTA
• It may/will identify hazards that weren’t considered in the FTA, PHA, FHA
28
Value added: A fault tree can jump start the software FMEA to ensure that the most likely
failure modes tagged to the most serious hazards are considered.
29
Software defect root cause analysis
• The defect RCA can be employed all by itself or prior to a software
FMEA
• Defect RCA ensures that fault injection testing, design reviews, specification reviews, and code reviews focus on the most relevant root causes for the application under development
• Defect RCA has 3 viewpoints
• Defects by originating artifact
• Contrary to popular belief most defects found in operation are not
“coding” defects. They are specification and design defects that led to
coding defects.
• Defects by failure mode
• Faulty timing, sequencing, state management, error handling,
functionality, processing, logic, interfaces, etc.
• Defects by root cause
• See the Common Defect Enumeration [3]
29
Value added: Failure modes that have happened in the recent past are the most likely to
happen again. That’s because software engineers usually fix one instance of a defect but
don’t fix related systematic instances.
Software defect root cause analysis
The most common failure mode is directly related to the weakest link in the
development activity. Examples:
• If the software engineers fail to consider that software must detect failures in hardware there
will be more faulty error handling failure modes
• If the software engineers fail to do state diagraming prior to coding and the system is
stateful; state management defects are more likely
• If the software engineers fail to do timing design and timing is important for the application;
timing defects are more likely
30
[Bar chart: number of defects by failure mode.]
Software defect root cause analysis
Defects are introduced because of either bad requirements, bad design,
bad coding practices or bad change control.
• Requirements defect – The "whats" are incorrect, ambiguous or incomplete.
• Design defect – The "whats" are correct but the "hows" are not. Logic, state, timing, exception handling are all design related.
• Coding defect – The "whats" and "hows" are correct but the software engineer did not implement one or more lines of code properly.
31
[Bar chart: number of defects by originating artifact or activity (detailed design, requirements, maintenance change, external change, coding, installation package, usability).]
Software defect root cause analysis
Each of the failure modes has multiple root causes, as per the Common Defect Enumeration [3]
32
[Bar chart: number of defects by root cause.]
Summary
33
IEEE 1633 2016 puts forth recommended practices to apply qualitative software failure modes analyses and quantitative models
Improve the product and ensure software or firmware is delivered with the required reliability
IEEE 1633 2016 includes improved guidance over the 2008 edition
Offers increased value to this
audience
• Reliability engineers
• Software quality engineers
• Software and engineering
managers
• Acquisitions
• Regulatory
IEEE 1633 2023 will make the following improvements. It incorporates:
• Common Defect Enumeration
• Tailoring for DevSecOps
• Updated models
• Refined guidance
34
References
[1] The Cold Hard Truth About Reliable Software, Version 6i, A.M. Neufelder, 2019.
[2] Rome Laboratory TR-92-52, "Software Reliability Measurement and Test Integration Techniques", J. McCall, W. Randell, J. Dunham, L. Lauterback, 1992.
[3] The Common Defect Enumeration, A.M. Neufelder, Copyright Mission Ready Software, 2021. https://www.dau.edu/cop/rm-engineering/_layouts/15/WopiFrame.aspx?sourcedoc=/cop/rm-engineering/DAU%20Sponsored%20Documents/Reliable%20Software%20SOW%20Appendix%20B%20-%20CDE.xlsx&action=default
[4] Effective Application of Software Failure Modes Effects Analysis, A.M. Neufelder, published by Quanterion Solutions, Inc., 2014.
34
More Related Content

PDF
Five Common Mistakes made when Conducting a Software FMECA
PPT
Advanced topics in software engineering
PPTX
ISTQB - Software development life cycle
PPT
Function Oriented Design
PPTX
Introduction to Software Failure Modes Effects Analysis
PPTX
Software Engineering by Pankaj Jalote
PPTX
Waterfall Model
PPTX
Overview of software reliability engineering
Five Common Mistakes made when Conducting a Software FMECA
Advanced topics in software engineering
ISTQB - Software development life cycle
Function Oriented Design
Introduction to Software Failure Modes Effects Analysis
Software Engineering by Pankaj Jalote
Waterfall Model
Overview of software reliability engineering

What's hot (20)

PDF
Sdlc models
PPTX
SDLC Model (Waterfall,Iterative Waterfall,Spiral)
PPTX
Software Development Methodologies.pptx
PDF
Introduction to Robot Framework (external)
PPTX
Software test automation
PPTX
Sadcw 7e chapter04_recorded
PDF
Dimensions RM: Agile Requirements Management
PDF
Software development life cycle
PPTX
Software Development Life Cycle-SDLC
PDF
Software engineering a practitioners approach 8th edition pressman solutions ...
PPTX
Pair Programming in software engineering.pptx
PPS
Unit Testing
PDF
Career in Software Testing | Skills Required for Software Test Engineer | Edu...
PPT
Test Automation Framework Designs
PPTX
Manual Vs Automation Testing
PPTX
Fundamentals of Testing
PPTX
Software Engineering
PPT
ALM (Application Lifecycle Management)
PPTX
Software quality assurance activites
PPTX
software quality
Sdlc models
SDLC Model (Waterfall,Iterative Waterfall,Spiral)
Software Development Methodologies.pptx
Introduction to Robot Framework (external)
Software test automation
Sadcw 7e chapter04_recorded
Dimensions RM: Agile Requirements Management
Software development life cycle
Software Development Life Cycle-SDLC
Software engineering a practitioners approach 8th edition pressman solutions ...
Pair Programming in software engineering.pptx
Unit Testing
Career in Software Testing | Skills Required for Software Test Engineer | Edu...
Test Automation Framework Designs
Manual Vs Automation Testing
Fundamentals of Testing
Software Engineering
ALM (Application Lifecycle Management)
Software quality assurance activites
software quality
Ad

Similar to IEEE 1633 Recommended Practices for Reliable Software (20)

PDF
real simple reliable software
PDF
Revised IEEE 1633 Recommended Practices for Software Reliability
PDF
IEEE 1633 Recommended Practice on Software Reliability
PDF
A Combined Approach of Software Metrics and Software Fault Analysis to Estima...
PPTX
Software reliability engineering process
PDF
Volume 2-issue-6-1983-1986
PDF
Volume 2-issue-6-1983-1986
PPTX
Software engineering 23 software reliability
PPTX
1-SUMSEM2024-25_CSI3014_TH_VL2024250700241_2025-05-13_Reference-Material-I.pptx
PDF
Reliable software in a continuous integration/continuous deployment (CI/CD) e...
PDF
O0181397100
PDF
A Review On Software Reliability.
PPT
Software Engineering -Software Reliability.ppt
PPTX
Module IV (1).pptx for software emgineee
PDF
Software reliability engineering
PDF
A Survey of Software Reliability factor
PDF
Finally… reliable software!
PDF
Software Reliability and Safety.pdf
PDF
the-top-ten-things-that-have-been-proven-to-effect-software-reliability-1.pdf
PDF
The Top Ten things that have been proven to effect software reliability
real simple reliable software
Revised IEEE 1633 Recommended Practices for Software Reliability
IEEE 1633 Recommended Practice on Software Reliability
A Combined Approach of Software Metrics and Software Fault Analysis to Estima...
Software reliability engineering process
Volume 2-issue-6-1983-1986
Volume 2-issue-6-1983-1986
Software engineering 23 software reliability
1-SUMSEM2024-25_CSI3014_TH_VL2024250700241_2025-05-13_Reference-Material-I.pptx
Reliable software in a continuous integration/continuous deployment (CI/CD) e...
O0181397100
A Review On Software Reliability.
Software Engineering -Software Reliability.ppt
Module IV (1).pptx for software emgineee
Software reliability engineering
A Survey of Software Reliability factor
Finally… reliable software!
Software Reliability and Safety.pdf
the-top-ten-things-that-have-been-proven-to-effect-software-reliability-1.pdf
The Top Ten things that have been proven to effect software reliability
Ad

More from Ann Marie Neufelder (7)

PDF
Software Failure Modes Effects Analysis Overview
PPTX
An Introduction to Software Failure Modes Effects Analysis (SFMEA)
PPTX
The Top Ten things that have been proven to effect software reliability
PPTX
Top Ten things that have been proven to effect software reliability
PPTX
Predict Software Reliability Before the Code is Written
PPTX
Four things that are almost guaranteed to reduce the reliability of a softwa...
PPTX
Four things that are almost guaranteed to reduce the reliability of a softwa...
Software Failure Modes Effects Analysis Overview
An Introduction to Software Failure Modes Effects Analysis (SFMEA)
The Top Ten things that have been proven to effect software reliability
Top Ten things that have been proven to effect software reliability
Predict Software Reliability Before the Code is Written
Four things that are almost guaranteed to reduce the reliability of a softwa...
Four things that are almost guaranteed to reduce the reliability of a softwa...

Recently uploaded (20)

PPTX
Welding lecture in detail for understanding
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPT
Mechanical Engineering MATERIALS Selection
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
Sustainable Sites - Green Building Construction
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
web development for engineering and engineering
PDF
Well-logging-methods_new................
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
Construction Project Organization Group 2.pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
Welding lecture in detail for understanding
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
Mechanical Engineering MATERIALS Selection
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Sustainable Sites - Green Building Construction
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
UNIT 4 Total Quality Management .pptx
web development for engineering and engineering
Well-logging-methods_new................
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Construction Project Organization Group 2.pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Foundation to blockchain - A guide to Blockchain Tech

IEEE 1633 Recommended Practices for Reliable Software

  • 1. 1 HTTP://WWW.MISSIONREADYSOFTWARE.COM 321-514-4659 An overview of the IEEE 1633 Recommended Practices for Software Reliability Ann Marie Neufelder ann.neufelder@missionreadysoftware.com • Introduction and motivation to product this guidance • Solutions provided by the guidance • Quantitative and qualitative reliability measures for making a software release decision
  • 2. 2 IEEE 1633 Working Group members • Robert Stoddard - SEI • Lance Fiondella - UMass • Peter Lakey - Consultant • Robert Binder – retired • Michael Siok – Lockheed Martin • Ming Li - NRC • Ying Shi - NASA • Nematollah Bidokhti - thinkDFR • Thierry Wandji – US Navy • Michael Grottke - FAU • Andy Long - OSD • George Stark - IBM • Allen Nikora - NASA • Bakul Banerjee – retired IEEE • Debra Greenhalgh Lubas – US Navy • Mark Sims – US Army • Rajesh Murthy - Consultant • Willie Fitzpatrick – US Army • Mark Ofori-kyei – General Dynamics • Sonya Davis – General Dynamics 2 • Burdette Joyner – Northrup Grumman • Marty Shooman –retired NASA • Andrew Mack • Loren Garroway – Northrup Grumman • Kevin Mattos– US Navy • Kevin Frye - US Navy • Claire Jones - Boeing • Robert Raygan - OSD • Mary Ann DeCicco – General Dynamics • Shane Smith - OSD • Franklin Marotta – US Army • David Bernreuther – US Army • Martin Wayne – US Army • Nathan Herbert – US Army • Richard E Gibbs III - Boeing • Harry White - Harmonic • Jacob Axman – US Navy • Ahlia T. Kitwana - Harris • Yuan Wei • Darwin Heiser – General Dynamics • Brian McQuillan – General Dynamics • Kishor Trivedi – Duke University • Debra Haehn – Philips Healthcare Chair: Ann Marie Neufelder, Mission Ready Software Vice Chair: Lance Fiondella - UMass Secretary: Rachel Neufelder, Mission Ready Software IEEE Standards Association Chair: Louis Gullo, Northrop Grumman Martha Wetherholt of NASA was Vice Chair until her passing in 2020. She was instrumental in delivering the 2016 edition.
  • 3. Introduction and Motivation Reliable software engineering… • Process assessments such as the SEI CMMi assessment have not provided value added for improving reliability or safety of software • 30 years of data shows [1] • No improved software reliability beyond level 3 • Organizations with CMMi level 3+ can and do produce failed software programs • A good process is necessary but not sufficient • An organization can have a great process and still have • People who do not understand the product or industry doing development and test • Low level of rigor in testing • Requirements that are traceable but poorly written • Design that is traceable but poorly written • Test procedures that are traceable but poorly written and have low coverage • Overlooked failure modes • Too many unknown defects in the software • Too many known open defects that aren’t assessed appropriately • Too many workarounds in the software that burden the end users and/or cause loss of availability 3
  • 4. Introduction and Motivation Reliable software engineering… 4 Has been an engineering disciple for > 50 years. Fundamental prerequisite for virtually all modern systems Plenty of theory generated over last several decades, but…Practical guidance on how to apply these models has lagged significantly Diverse set of stakeholders requires pragmatic guidance and tools to apply software reliability models to assess real software or firmware projects during each stage of the software development lifecycle Fundamental roadblocks addressed by IEEE 1633 • Reliability engineers don’t understand software • Software engineers don’t understand reliability • Both may have challenges acquiring data needed for the analyses
  • 5. Solutions provided by IEEE 1633 Actionable step by step procedures for assessing reliable software During any phase of software or firmware development With any software lifecycle model for any industry or application type. • Reliable software is demonstrated by both qualitative and quantitative evidence • Fact based decisions for releasing software based on qualitative and quantitative aspects • Quantitative measures are the demonstration that software is reliable • Qualitative measures provide confidence that quantitative measures are accurate • The easiest way to record software fewer defects is to test less or test with less rigor 5 Qualitative Level of rigor in testing including • Test LikeYou Operate • Fault injection testing • Peak loading and endurance testing • Boundary and zero value testing • Code coverage • Go-No go testing • Requirements coverage Failure mode identification against the “Common Defect Enumeration” or known set of software failure modes and/or fault tree analysis Defect root cause analysis Qualitative Estimated portion of system failures that will be due to software Rate of defect discovery in testing Fix rate versus open rate Estimated residual defects and potential for pileup Severity level/effect of unresolved defects Defect density benchmark Coverage metrics
  • 6. Collect SW failure data Clause 5.4.4 Testing for reliable software Clause 5.4.1-5.4.3 Allocate reliability goals to software components Clause 5.3.5 Benchmark software reliability early Clauses 5.3.2, 6.2 Include software in the system reliability model Clause 5.3.4 6 Make a fact based release decision Clause 5.5 Failure modes analysis Clause 5.2, 5.4.8 Plan the reliable software Clause 5.1 Evaluate reliability of software during testing and operation Clauses 5.4.4-5.4.7, 6.3 List of software configuration items (CI), failure definition scoring criteria, assessment of key risks that can derail the software program Portion of total failures due to SW Predicted reliability measures for each CI Measurable goals for each SW CI Failure modes typically overlooked in testing Test results, Level of rigor Defect discoveries over usage time Test coverage, failure mode resolution, progress against reliability goals All IEEE 1633 clauses provide support for a fact-based release decision
  • 7. Current status of IEEE 1633 •Unanimously approved by IEEE Standards Association in first ballot of May 24, 2016. Released on January 18, 2017. •Working group is currently making updates for • How reliable software tasks are executed in DevSecOps • Common Defect Enumeration recently published on the Defense Acquisition University R&M Community of Practice website • https://www.dau.edu/cop/rm- engineering/_layouts/15/WopiFrame.aspx?sour cedoc=/cop/rm- engineering/DAU%20Sponsored%20Documen ts/Reliable%20Software%20SOW%20Appendi x%20B%20-%20CDE.xlsx&action=default • Level of rigor in testing 7
  • 8. Quantitative • Exactly “1” quantitative measure should not be used as a release decision maker • Ex: Defects per source line of code or failures per hour should not be used without other metrics • No one metric tells the whole story • It’s too easy for one metric to be a self- fulfilling prophecy. 8 Quantitative Estimated portion of system failures that will be due to software Rate of defect discovery in testing Fix rate versus open rate Estimated residual defects and potential for pileup Severity level/effect of unresolved defects Defect density benchmark Coverage metrics
  • 9. 9 Estimated portion of system failures due to software • Past history method is very accurate if the past history is recent and is calibrated for changes in technology. • As a rule, software grows 10-12% per year. So, historical data should be calibrated to assume that the software portion is growing 10-12% per year. • There is virtually no chance that software will decrease in size over time. Hence past history is a useful lower bound. • Real example: An engineering company produced a system in 2015. Of all of the deployed failures, 25% were due to software. In 2017 they were deploying a similar system. • Since historical data was 2 years old, 25% is adjusted by 10-12 % per year. • So, the prediction is between 30.25% and 31.36%. • When the equipment was deployed in 2019 - the actual portion of failures due to software was 33%. Much more accurate than the 5% estimated by subject matter experts. 9 Method Description Past history Compute relative portion of SW versus HW failures from a past similar system R&D $ Compute relative portion of R&D $ dedicated to software development Achievabl e failure rates Use prediction models to determine failure rate for HW, SW. The predicted values for each determine their allocation. Value added: Engineering has no basis for assuming that software doesn’t fail or that the reliability = 1
  • 10. 10 Rate of defect discovery in testing • Software fault rate increases, peaks and then decreases prior to maturity • Maturity level at deployment separates the world class from the distressed • Increasing fault rate– the customers will see it as a failed product in 100% of all cases • Fault rate barely decreasing- customers will be unhappy with it • Fault rate is steadily decreasing – customers won’t notice the SW which is ultimate indicator of success • With agile or incremental development there are multiple peaks until the final burn down of defects • We cover how to track fault rate during testing in the IEEE 1633 clause 5.4.4 10 Metric World Class Mediocre Distressed Fault rate trend Steadily decreasing Peaking or recently peaked Increasing Percentage of defects identified prior to deployment versus post deployment >=75% 40-74% <=39% 0 2 4 6 8 10 12 Non Cumulative defects discovered Usage/test time Defects discovered over life of version Failed projects deploy prior to peak when <= 39% of defects are removed Mediocre projects deploy between 40% and 75% of area under curve Successful projects deploy at > 75% of the area under this curve Value added: Ensures software isn’t deployed half baked
  • 11. Lessons learned from a real software intensive program in which defect discovery rate was not tracked This is the defect rate from a distressed software intensive program The organization released the software to operational deployment before the fault rate peaked. That’s because no one was trending the fault rate. More than 800 software failures were discovered by customer after deployment. Upon deployment, the actual system reliability was 8 % of the required reliability objective because of the software failures. If SWRG models had been used prior to deployment, the service would not have accepted the software as is since the RAM goal had not been met. 0 20 40 60 80 100 120 140 160 180 Defects discovered over usage time Software was released here with an increasing fault rate
  • 12. Lessons learned from a real software intensive program in which defect discovery rate was tracked This is the fault rate from a real software intensive program • The fault rate is clearly trending downwards • By the end of the trend, approximately 80% of defects had been discovered (The IEEE 1633 shows how to calculate this) • There was still work to be done with regards to defect removal but the software is stable. • The SWRG model provides confidence that the overall RAM objective can be met and the work required to meet it 0 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 1000 1200 1400 1600 1800 Non cumulative unique defects Usage time Defects discovered over usage time
  • 13. 13 Fix rate versus open rate 13 If the discovered defects aren’t removed - the defect discovery trend won’t improve very far beyond the peak 0 2 4 6 8 10 12 Defects discovered Usage/test time Defects discovered over life of version Increase is caused by blocking defects Peak is reached when blocking defects are either fixed or avoidable Stability cannot be reached if defects are not fixed Value added: A stable defect discovery profile won’t happen if the defects aren’t removed
  • 14. 14 Estimated residual defects and potential for pileup • Releases are too far apart initially and too close together in last 3 releases • SRE predictions allowed for leveling of features before the code is even written 0 5 10 15 20 25 Total faults predicted (nominal case) from releases 1 to 5 predicted for each month Value added: An early sprint or release might be stable; but at planned cadence eventually future sprints or releases won’t be
  • 15. 15 Severity/effect of unresolved defects 15 It’s possible to have a decreasing defect trend and defects that aren’t piling up but still have unreliable software because there are open defects that the user can’t experience 1. A single defect causes multiple failures 2. The same failure happens at every installed site at the same time 3. A collection of defects “with workarounds” collectively cause the user to have unacceptable downtime 4. One “reset” could happen so often that the system is unusable 5. The worst software failures often happen when the software is executing 6. Systematic software failures need to be counted as a failure every time they occur until the underlying failure mode is proven to be removed or mitigated Value added: Software isn’t like hardware. Software can have the following failures that typically don’t happen with hardware.
  • 16. 16 Defect density benchmarking statistics  Tables like this are derived from actual field data [1][2]  Organizations with lowest deployed defect density were also late less often and by a smaller amount  SRE for any given project can be benchmarked by answering a simple survey Cluster Outcome Defect metrics Late deliveries (as per SW estimates) Average defects per 1000 source lines of code % defects removed prior to release Fault rate Prob (late) How much project is late by as % of schedule 3% World Class .0269 >75% Steadily decreasing 40 12 10% Successful .0644 20 25 25% Above average .111 40-75% Recently peaked or recently decreasing 17 25 50% Average .239 34 37 75% Below average .647 85 125 90% Impaired 1.119 <40% Increasing or peaking 67 67 97% Distressed 2.402 83 75 16 Value added: The defect profile can be predicted before testing even starts. Sometimes one bad development practice can derail the program. You don’t want to wait until testing or operation to find that out.
  • 17. Factors that have been mathematically proven to be related to software reliability [1][2] USAF Rome Laboratories developed first prediction model in 1987[2]. It was based on these factors. A few more have been developed since then. Facts don’t lie. All predictive models agree that how the software is developed is a good predictor for its ultimate reliability. Static analysis tools measure these SEI CMMi and ASPICE assess this These are often overlooked Type of factor Number /% of characteristics in this category Examples of characteristics in this category Product 50 – (10%) Size, complexity, whether the design is object oriented, whether the requirements are consistent, code that is old and fragile, etc. Product risks 12 – (2%) Risks imposed by end users, government regulations, customers, product maturity, etc. People 38 – (7%) Turnover, geographical location, amount of noise in work area, number of years of experience in the applicable industry, number of software people, ratio of software developers to testers, etc. Process 121 – (23%) Procedures, compliance, exit criteria, standards, etc. Technique 302 – (58%) The specific methods, approaches and tools that are used to develop the software. Example: Using a SFMEA to help identify the exceptions that should be designed and coded. These are often overlooked 17
  • 18. Techniques that have been proven to effect software reliability that are often overlooked Category Examples Decomposition • Code a little, test a little philosophy. • Release development/test time < 18 months long and preferably <12 months. • Each developer has a schedule that is granular to day or week. Visualization with pictures and tables A picture is worth 1000 words. Specifications with diagrams/pictures/tables are associated with fewer defects than text. Requirements focus Developing requirements that aren’t missing crucially important details Testing focus/rigor Explicitly testing the requirements, design, stresses, lines of code, operational profile Unit testing focus Unit testing by every software engineer is mandatary and as per a defined template. Branch coverage tools and metrics. Defect reduction techniques Software fault trees, software FMEA, etc. Design focus Designing states, sequences, timing, logic, algorithms, error handling before coding Regular monitoring of the software engineers Monitoring software progress daily or weekly, identifying risks early, etc. Planning ahead Planning the scope, personnel, equipment, risks before they become problematic, planning the timing of the tasks 18
  • 19. Coverage metrics • These metrics put the defect profile observed in testing into perspective • Ex: If only half of the requirements are covered or half of the code has been executed in testing then the reliability growth progress is optimistic by at least 50%. (i.e. total defects are at least twice what have been observed so far) 19 Software Metrics Definition Requirements Traceability Degree to which the requirements have been met by the architecture, code and test cases Structural coverage Degree to which the lines of code, paths, and data have been tested Value added:The estimated fault trend and remaining defects are only as accurate as the amount of code and requirements that have been covered.
  • 20. Qualitative • An organization can have excellent quantitative measures but still have unknown failures that cause serious problems for end users • This is because every development activity is designed for success. • Engineering calls these failures “edge” cases • But really these failure modes were detectable all along. They just chose not to look for them or design to them. 20 Qualitative Level of rigor in testing including • Test Like You Operate • Fault injection testing • Peak loading and endurance testing • Boundary and zero value testing • Code coverage • Go-No go testing • Requirements coverage Failure mode identification against the “Common Defect Enumeration” or known set of software failure modes and/or fault tree analysis Defect root cause analysis Value added: Reliable software is achievable only when software is designed to avoid failure.
  • 21. Develop a reliability test suite to maximize the level of rigor Contrary to popular belief testing only the requirements is rarely sufficient. 21 Type of test Inputs Description Black box testing Operational profile testing The Operational Profile (OP) Also known as “Test LikeYou Operate” Requirements based testing The software requirements Exercises the SUT to provide assurance that it satisfies its requirements as specified. Model based testing Test models may be derived from requirements and design documentation. Exercises state transitions, fault states, dead states, prohibited states Stress case testing Expected longest mission time and max concurrent users Peak loading, endurance, zero value testing, boundary values, go no-go testing Timing and performance Timing and scheduling diagrams, performance requirements, the effects of software on the system design Exercises the SUT to evaluate compliance with requirements for real-time deadlines, resource utilization. Failure modes – See next slide Software FMEA, defect root cause analysis, fault trees Exercises the conditions that are associated with the identified failure modes. This is the only test that verifies that software works properly in a degraded environment. Value added: These tests often identify defects that are the most severe and most expensive to fix once deployed
• 22. Analyze failure modes and effects
  • Problem: Engineers will often consider failure modes that are so obvious that they are guaranteed to be found in testing, or they will consider failure modes that aren't fixable in the software.
  • There are over 400 failure modes/root causes that are relevant to essentially all software systems.
  • The failure modes are tagged to actual failures from mission- and safety-critical systems since 1962.
  Software failure mode – Description
  • Faulty state management – Inadvertent state transitions, dead states, incorrect state transitions, etc.
  • Faulty sequencing – Operations execute in the wrong order.
  • Faulty timing – Operations start too early or too late or take too long. The right event happens in the right order but at the wrong time.
  • Faulty data – The data is the wrong units of measure, scale, resolution, type, or size, or is stale, corrupt, or missing.
  • Faulty functionality – The system does the wrong thing or does the right thing incorrectly.
  • Faulty error detection – The software fails to detect faults in the hardware, communication, computations, power, external components or devices, etc.
  • Faulty processing – Software accuracy or memory degrades over the mission; software can't handle peak loading or the maximum number of users.
  Value added: Virtually 100% [3] of the software failure modes that have caused problems in operation are due to the above categories, which are often overlooked.
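The sketch below shows one way code can be written to defend against the faulty error detection and faulty data categories above: validate status, freshness, and range on every hardware read instead of assuming success. The sensor interface, limits, and names are hypothetical assumptions, not from the standard.

```python
import time

STALE_AFTER_SECONDS = 0.5               # hypothetical freshness requirement
ALTITUDE_RANGE_M = (-500.0, 50_000.0)   # hypothetical physical range


class SensorFault(Exception):
    """Raised when a hardware reading cannot be trusted."""


def read_altitude(sensor) -> float:
    """Return a validated altitude reading or raise SensorFault.

    `sensor.sample()` is assumed to return (value, status, timestamp),
    where timestamp comes from time.monotonic().
    """
    value, status, timestamp = sensor.sample()
    if status != "OK":
        raise SensorFault(f"sensor reported status {status!r}")     # detect hardware fault
    if time.monotonic() - timestamp > STALE_AFTER_SECONDS:
        raise SensorFault("reading is stale")                       # faulty data: stale
    low, high = ALTITUDE_RANGE_M
    if not (low <= value <= high):
        raise SensorFault(f"value {value} outside physical range")  # faulty data: range
    return value


if __name__ == "__main__":
    class _StubSensor:                      # stand-in for real hardware
        def sample(self):
            return 1200.0, "OK", time.monotonic()

    print(read_altitude(_StubSensor()))     # -> 1200.0
```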
• 23. History repeats itself. Root causes are predictable, but only if someone thinks about them.
  • Faulty error handling – Apollo 11 lunar landing, Ariane 5, Qantas flight 72, Solar and Heliospheric Observatory spacecraft, Denver Airport, NASA Spirit rover (too many files on the drive not detected)
  • Faulty data definition – Ariane 5 explosion (16/64-bit mismatch), Mars Climate Orbiter (metric/English mismatch), Mars Global Surveyor, 1985 SDIO mismatch, Titan IV wrong constant defined
  • Faulty logic – AT&T Mid-Atlantic outage in 1991
  • Faulty timing – Patriot missile system during a Scud missile attack, 2003 Northeast blackout
  • Race condition – Therac-25
  • Peak load conditions – Affordable Care Act website launch, 2020 Iowa caucus reporting app
  • Faulty usability – Too easy for humans to make mistakes (AFATDS friendly fire, Panama City over-radiation); insufficient positive feedback of safety- and mission-critical commands
  The above illustrates that history keeps repeating itself because people assume root causes from other industries/applications are somehow not applicable. Lesson to be learned: the root causes are applicable to any industry/product. It's the hazards/effects that result from the root causes that are unique.
• 24. Software failure modes effects analyses are highly effective, but only if the 17 mistakes below are avoided
  Organizational mistakes
  • None of the software FMEA analysts have a background in software
  • The analysis is not constructed by a cross-functional team
  • Conducting the SFMEA too late (most of these failure modes are too expensive to fix once the code is written)
  • Conducting the SFMEA without the proper software deliverables such as the SRS, SDD, IRS, etc.
  • Failing to track the failure modes and/or make any corrective actions to the requirements, design, code, use cases, or users manual as a result of the SFMEA
  • Failing to tailor the software FMEA to the highest-risk areas and most relevant failure modes
  Faulty assumptions
  • Assumption that all failures originate in a single line of code or specification
  • Assumption that the software works
  • Assumption that the software specifications are correct and complete
  • Assumption that all failure modes will be found and fixed in testing
  • Assumption that all failure modes are impossible or negligible in severity
  FMEA execution mistakes
  • Focusing on total failure of the software – failing to consider small things that lead to big things going wrong
  • Black box versus functional approach – analyze what the software does, not what it is
  • Ignoring the 6 dimensions that lead to software failures (e.g., the system, the users of the system, the battlefield environment, and the mission)
  • Conducting the SFMEA at too high (system requirements) or too low (lines of code) a level of the architecture
  • Mixing functional failure modes with process failure modes (i.e., faulty timing refers to the software design, not the software schedule)
  • Incorrectly assigning a failure rate or likelihood
  Value added: The IEEE 1633 explains how to apply the FMEA so that these 17 common mistakes are minimized. The recommended practice can be applied within any industry FMEA standard or framework.
• 25. FAULTY ASSUMPTION THAT ALL FAILURE MODES ORIGINATE IN A SINGLE LINE OF CODE
  [Diagram: individual lines of code (#1, #2, #3, …) mapped to "line of code fails to execute" and "line of code terminates"]
  The analysts work through each line of code one at a time and analyze each CDE against that statement one at a time. This is ineffective because:
  1. Very few failures are due to a single line of code [3].
  2. When a failure is due to a single line of code, it is usually due to mistakes like these:
     • The line of code executes the wrong command (i.e., has a compilable typo)
     • The line of code manipulates the wrong data
     • The line of code isn't written properly but still compiles
  3. Lines of code typically don't fail to execute unless there is a defect in another line of code.
  4. If a line of code terminates execution, it is often because of missing fault handling or a faulty design.
• 26. FAULTY ASSUMPTION THAT ALL FAILURE MODES ORIGINATE IN A SINGLE SPECIFICATION
  [Diagram: individual SRS statements (#1, #2, #3, …) mapped one-to-one to individual CDEs (#1, #2, #3, …)]
  The analysts work through each SRS statement one at a time and analyze each failure mode against that statement one at a time. This is ineffective because:
  1. The majority of operational defects aren't caused by a single faulty statement [3].
  2. Many of the common defect enumerations don't apply at the statement level – they apply to a collection of statements.
  3. The primary failure mode at the individual-specification level is magic numbers (i.e., timing or accuracy requirements).
  INCOSE requirements analyzers are effective at identifying requirements statements that are ambiguous or untestable.
• 27. MANY FAILURES ARE DUE TO A COLLECTION OF SOFTWARE SPECIFICATIONS, MISSING SPECIFICATIONS AND MULTIPLE LINES OF CODE
  Analyze the collection of software requirements against the set of CDEs:
  1. Prune the CDEs to remove things you don't have in the software.
  2. Analyze the specifications and design as a whole package against the relevant CDEs.
  [Diagram: the full set of SRS statements analyzed as a package against the full set of CDEs]
  Value added: The Common Defect Enumeration lays out the failure modes that have caused the most failures in operation. The CDE can be used in any FMEA framework or industry standard.
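A minimal sketch of the two steps above, assuming the CDE is available as a list of failure-mode categories with applicability preconditions; the category names and system characteristics shown here are hypothetical placeholders, not the actual CDE content.

```python
CDE_CATEGORIES = {  # hypothetical subset of categories and their preconditions
    "faulty state management": "system maintains modes/states",
    "faulty timing": "system has real-time deadlines",
    "faulty error detection": "system depends on hardware or external devices",
    "faulty data": "system exchanges or stores engineering data",
}

system_characteristics = {  # filled in by the cross-functional team for this system
    "system maintains modes/states": True,
    "system has real-time deadlines": True,
    "system depends on hardware or external devices": True,
    "system exchanges or stores engineering data": False,
}

# Step 1: prune CDE categories that cannot apply to this software.
relevant_cdes = [cde for cde, precondition in CDE_CATEGORIES.items()
                 if system_characteristics.get(precondition, False)]

# Step 2: review the specification and design package as a whole against each
# remaining category, recording where the requirement set fails to address it.
review_findings = {cde: "TBD - reviewed against full SRS/SDD package"
                   for cde in relevant_cdes}

for cde, finding in review_findings.items():
    print(f"{cde}: {finding}")
```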
• 28. Include software in a system fault tree
  • A "software" fault tree should be part of an overall system fault tree to ensure that interactions with hardware are considered.
  • The fault tree can feed the software FMEA:
    • The hazards are tagged to the top-level effects in the SFMEA.
    • If the SFMEA is effective, it will cover every hazard.
  • The software FMEA can feed the FTA:
    • It will often identify hazards that weren't considered in the FTA, PHA, or FHA.
  Value added: A fault tree can jump-start the software FMEA to ensure that the most likely failure modes tagged to the most serious hazards are considered.
• 29. Software defect root cause analysis
  • The defect RCA can be employed all by itself or prior to a software FMEA.
  • Defect RCA ensures that fault injection testing, design reviews, specification reviews, and code reviews focus on the most relevant root causes for the application under development.
  • Defect RCA has 3 viewpoints:
    • Defects by originating artifact – Contrary to popular belief, most defects found in operation are not "coding" defects. They are specification and design defects that led to coding defects.
    • Defects by failure mode – Faulty timing, sequencing, state management, error handling, functionality, processing, logic, interfaces, etc.
    • Defects by root cause – See the Common Defect Enumeration [2].
  Value added: Failure modes that have happened in the recent past are the most likely to happen again. That's because software engineers usually fix one instance of a defect but don't fix related systematic instances.
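One simple way to produce the three RCA viewpoints is to tally a defect log by each attribute, as in the sketch below; the defect records and field names are hypothetical, not from the standard.

```python
from collections import Counter

defect_log = [  # hypothetical records exported from a defect tracker
    {"artifact": "requirements", "failure_mode": "faulty error handling",
     "root_cause": "missing requirement to detect hardware faults"},
    {"artifact": "design", "failure_mode": "faulty state management",
     "root_cause": "no state diagram before coding"},
    {"artifact": "design", "failure_mode": "faulty timing",
     "root_cause": "no timing analysis of the peak-load path"},
    {"artifact": "coding", "failure_mode": "faulty data",
     "root_cause": "wrong unit of measure"},
]

# Tally the log from each of the three viewpoints.
for viewpoint in ("artifact", "failure_mode", "root_cause"):
    counts = Counter(record[viewpoint] for record in defect_log)
    print(f"\nDefects by {viewpoint}:")
    for category, count in counts.most_common():
        print(f"  {category}: {count}")
```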
• 30. Software defect root cause analysis
  The most common failure mode is directly related to the weakest link in the development activity. Examples:
  • If the software engineers fail to consider that the software must detect failures in hardware, there will be more faulty error handling failure modes.
  • If the software engineers fail to do state diagramming prior to coding and the system is stateful, state management defects are more likely.
  • If the software engineers fail to do timing design and timing is important for the application, timing defects are more likely.
  [Bar chart: Defects by failure mode – number of defects per failure mode category]
• 31. Software defect root cause analysis
  Defects are introduced because of bad requirements, bad design, bad coding practices, or bad change control.
  • Requirements defect – The "whats" are incorrect, ambiguous, or incomplete.
  • Design defect – The "whats" are correct but the "hows" are not. Logic, state, timing, and exception handling are all design related.
  • Coding defect – The "whats" and "hows" are correct but the software engineer did not implement one or more lines of code properly.
  [Bar chart: Defects by originating artifact or activity – detailed design, requirements, maintenance change, external change, coding, installation package, usability]
• 32. Software defect root cause analysis
  Each of the failure modes has multiple root causes as per the Common Defect Enumeration [2].
  [Bar chart: Defects by root cause]
• 33. Summary
  • IEEE 1633 2016 puts forth recommended practices for applying qualitative software failure mode analyses and quantitative models to improve the product and ensure that software or firmware is delivered with the required reliability.
  • IEEE 1633 2016 includes improved guidance over the 2008 edition and offers increased value to this audience: reliability engineers, software quality engineers, software and engineering managers, acquisitions, and regulatory.
  • IEEE 1633 2023 will make the following improvements: it incorporates the Common Defect Enumeration, tailoring for DevSecOps, updated models, and refined guidance.
• 34. References
  [1] The Cold Hard Truth About Reliable Software, Version 6i, A.M. Neufelder, 2019.
  [2] Rome Laboratory TR-92-52, "Software Reliability Measurement and Test Integration Techniques", J. McCall, W. Randell, J. Dunham, L. Lauterback, 1992.
  [3] The Common Defect Enumeration, A.M. Neufelder, Copyright Mission Ready Software, 2021. https://www.dau.edu/cop/rm-engineering/_layouts/15/WopiFrame.aspx?sourcedoc=/cop/rm-engineering/DAU%20Sponsored%20Documents/Reliable%20Software%20SOW%20Appendix%20B%20-%20CDE.xlsx&action=default
  [4] Effective Application of Software Failure Modes Effects Analysis, A.M. Neufelder, published by Quanterion Solutions, Inc., 2014.