Colin Walls
Self-Testing in Embedded
Systems
colin_walls@mentor.com
http://blogs.mentor.com/colinwalls
Restricted © 2017 Mentor Graphics Corporation
Agenda
Introduction
CPU Failure
Peripheral Failure
Memory Failure
Software Error Conditions
Failure Recovery and Reporting
Conclusions
Introduction
• Failure is almost inevitable
• Important to accept this fact
• Key issues:
  - How to reduce likelihood of failure
  - How to handle impending failure
  - How to recover from failure conditions
Failure and Testing
• What can fail in an embedded system?
  - CPU
  - Peripherals
  - Memory
  - Software
• Start-up testing
• Background testing
• Watchdog
CPU Failure
• Quite unlikely for just processor to fail
• Possibly no hope of recovery
• Failure most likely at power up
• Partial failure very rare
• Multicore designs offer options
Peripheral Failure
• Total failure: not responding to address
  - Trap handler
• Other failures/tests are device dependent
• Loop back testing is common
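
A loop back test can be sketched in C as below. This is only an illustration under assumptions: the `loopback_port_t` hooks, the result codes and the simulated device (`sim_device_t`, `sim_send`, `sim_receive`) are hypothetical names introduced here, not part of the presentation. A real port would wrap the device's own transmit and receive routines with loop back mode enabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver hooks (assumption): wrap the real device routines here. */
typedef struct {
    void (*send_byte)(void *ctx, uint8_t value);
    /* Returns true and stores the byte if one arrived before the timeout. */
    bool (*receive_byte)(void *ctx, uint8_t *value, uint32_t timeout_ticks);
    void *ctx;
} loopback_port_t;

typedef enum {
    LOOPBACK_OK = 0,
    LOOPBACK_TIMEOUT,   /* device did not respond */
    LOOPBACK_MISMATCH   /* device echoed the wrong data */
} loopback_result_t;

/* Send a few bit patterns and check that each one is echoed back unchanged. */
loopback_result_t loopback_test(const loopback_port_t *port, uint32_t timeout_ticks)
{
    static const uint8_t patterns[] = { 0x00, 0xff, 0xaa, 0x55 };

    for (size_t i = 0; i < sizeof patterns; i++) {
        uint8_t echoed = 0;
        port->send_byte(port->ctx, patterns[i]);
        if (!port->receive_byte(port->ctx, &echoed, timeout_ticks))
            return LOOPBACK_TIMEOUT;
        if (echoed != patterns[i])
            return LOOPBACK_MISMATCH;
    }
    return LOOPBACK_OK;
}

/* Simulated device for host-side experiments only (assumption, not real hardware). */
typedef struct {
    bool    responding;      /* false models a device that never answers */
    uint8_t stuck_low_mask;  /* bits that always read back as 0 */
    uint8_t latch;
    bool    has_data;
} sim_device_t;

void sim_send(void *ctx, uint8_t value)
{
    sim_device_t *dev = ctx;
    if (dev->responding) {
        dev->latch = (uint8_t)(value & (uint8_t)~dev->stuck_low_mask);
        dev->has_data = true;
    }
}

bool sim_receive(void *ctx, uint8_t *value, uint32_t timeout_ticks)
{
    sim_device_t *dev = ctx;
    (void)timeout_ticks;     /* the simulation answers immediately or never */
    if (!dev->has_data)
        return false;
    *value = dev->latch;
    dev->has_data = false;
    return true;
}
```

The timeout in `receive_byte` matters as much as the data comparison: without it, a dead device would hang the test instead of failing it.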
Memory Failure
• Systems have a lot of memory
  - Surprising that failure is not more common
• Failure most likely at power up
  - That is the time for comprehensive testing
• 3 failure modes:
  - Not responding
  - Stuck bits
  - Cross-talk
Memory Testing
• “Moving Ones” test
  - Looks for stuck bits and cross talk with fine grain resolution
  - Perform on start up
  - Code to only use registers
  - “Moving Zeros” is the same idea
• Pattern test
  - Also looks for stuck bits and cross talk
  - Perform in background task
• All tests can be optimized if the memory architecture is known
Moving Ones Test
set every bit of memory to 0
for each bit of memory
{
    verify that all bits are 0
    set the bit under test to 1
    verify that it is 1
    verify all other bits are 0
    set the bit under test to 0
}
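
The pseudocode above might translate to C roughly as follows. This is a minimal sketch, not the presenter's implementation: the function names and the return convention (bit index of the first failure, or -1 on a pass) are assumptions made for illustration. It runs on an ordinary buffer, whereas real start-up code would keep its own variables in registers so that it does not depend on the memory under test. The test is destructive and, like the pseudocode, its cost grows with the square of the memory size.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every byte is 0 except byte `special`, which must equal `expect`. */
static int region_matches(const volatile uint8_t *mem, size_t len,
                          size_t special, uint8_t expect)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t want = (i == special) ? expect : 0;
        if (mem[i] != want)
            return 0;
    }
    return 1;
}

/* Moving ones test over `len` bytes.
   Returns -1 on a pass, otherwise the index of the bit under test when a
   check first failed. All of the memory is left cleared to 0. */
long moving_ones_test(volatile uint8_t *mem, size_t len)
{
    for (size_t i = 0; i < len; i++)
        mem[i] = 0;                               /* set every bit to 0 */

    for (size_t bit = 0; bit < len * 8; bit++) {
        size_t  byte = bit / 8;
        uint8_t mask = (uint8_t)(1u << (bit % 8));

        if (!region_matches(mem, len, byte, 0))   /* all bits are 0 */
            return (long)bit;

        mem[byte] = mask;                         /* set the bit under test */

        /* the bit under test is 1 and all other bits are still 0 */
        if (!region_matches(mem, len, byte, mask))
            return (long)bit;

        mem[byte] = 0;                            /* clear the bit again */
    }
    return -1;
}
```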
Pattern Test
for each byte of memory
{
    turn off interrupts
    save memory byte contents
    for values 0x00, 0xff, 0xaa, 0x55
    {
        write value to byte under test
        verify value of byte
    }
    restore byte data
    turn on interrupts
}
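
One possible C rendering of the pattern test is shown below. It is a sketch under assumptions: `interrupts_off()` and `interrupts_on()` are hypothetical placeholders (empty here so the code builds on a host), and a real target would substitute its own interrupt masking calls. The function names and return values are also illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks (assumption): replace with the target's interrupt control. */
static void interrupts_off(void) { /* no-op placeholder */ }
static void interrupts_on(void)  { /* no-op placeholder */ }

/* Test a single byte without losing its contents. Returns 1 on pass, 0 on fail. */
int pattern_test_byte(volatile uint8_t *addr)
{
    static const uint8_t patterns[] = { 0x00, 0xff, 0xaa, 0x55 };
    int ok = 1;

    interrupts_off();                 /* nothing else may touch the byte now */
    uint8_t saved = *addr;            /* save memory byte contents */
    for (size_t i = 0; i < sizeof patterns; i++) {
        *addr = patterns[i];          /* write value to byte under test */
        if (*addr != patterns[i]) {   /* verify value of byte */
            ok = 0;
            break;
        }
    }
    *addr = saved;                    /* restore byte data */
    interrupts_on();
    return ok;
}

/* Walk a region one byte at a time.
   Returns -1 on a pass, otherwise the offset of the first failing byte. */
long pattern_test(volatile uint8_t *mem, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!pattern_test_byte(&mem[i]))
            return (long)i;
    }
    return -1;
}
```

Interrupts are masked per byte rather than for the whole region, so a background task can work through memory in small steps without adding much interrupt latency.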
Software Error Conditions
• Bugs can lead to unpredictable failure
• Defensive code can anticipate problems
• 2 key failure modes:
  - Data corruption
  - Code looping
Data Corruption
• Pointers!
• Null pointer
  - Trap handler
• Incorrect pointer
  - May point anywhere
  - Leads to random corruption
  - MMU may help
  - Special cases
    - Stack overflow/underflow
    - Array bound violations
Stack Overflow/Underflow
• Avoid by careful testing
  - Use memory access breakpoints during debug
  - Unexpected recursion depth is hard to predict
• Use guard words
  - Test periodically
    - Background task?
  - Choose odd value to avoid addresses
  - Use unlikely value
    - Not 0, 1, 0xffffffff
  - About a 1 in 4 billion (2^32) chance that a stray 32-bit write matches the guard word and goes undetected
  - Might use MMU
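
The guard word idea could be sketched in C as follows. The names (`stack_guard_init`, `stack_guard_check`, the result flags) are hypothetical, and the sketch assumes the stack region is available as an array of 32-bit words with one guard word reserved at each end. The value 0x16951695 is the one shown on the slides. Whether a damaged low or high guard means overflow or underflow depends on the direction in which the stack grows on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Guard value from the slides: odd, so it is not a word-aligned address,
   and unlikely to occur by chance. */
#define GUARD_WORD 0x16951695u

/* Result flags (illustrative names). */
#define STACK_GUARDS_OK      0u
#define STACK_LOW_GUARD_HIT  1u   /* guard at the lowest address was overwritten */
#define STACK_HIGH_GUARD_HIT 2u   /* guard at the highest address was overwritten */

/* Place one guard word at each end of the region. `words` must be at least 2;
   the usable stack is region[1] .. region[words - 2]. */
void stack_guard_init(volatile uint32_t *region, size_t words)
{
    region[0] = GUARD_WORD;
    region[words - 1] = GUARD_WORD;
}

/* Call periodically, for example from a background task.
   Returns STACK_GUARDS_OK or a combination of the HIT flags. */
unsigned stack_guard_check(const volatile uint32_t *region, size_t words)
{
    unsigned result = STACK_GUARDS_OK;

    if (region[0] != GUARD_WORD)
        result |= STACK_LOW_GUARD_HIT;
    if (region[words - 1] != GUARD_WORD)
        result |= STACK_HIGH_GUARD_HIT;
    return result;
}
```

A check like this only reports damage after the fact, which is why the slide also suggests an MMU where one is available.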
[Figure: stack memory diagram with guard words (0x16951695) placed adjacent to the stack contents]
Array Bound Violations
• No checking in C
  - Considered a runtime overhead
  - Could add in C++
    - Overload the [ ] operator
  - Pointers can get around bounding
• Using guard words makes sense
  - Most programming errors result in writing to memory immediately adjacent to the end of the array
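
In C, both ideas can be approximated by hand: a checked accessor in place of an overloaded operator, and a guard word placed directly after the array. The sketch below is illustrative only; `guarded_samples_t`, `SAMPLE_COUNT` and the function names are assumptions. The array length is chosen as a multiple of 4 so that no padding sits between the array and the guard; with other lengths, padding could absorb a small overrun unnoticed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_GUARD  0x16951695u
#define SAMPLE_COUNT 8u            /* multiple of 4: no padding before the guard */

/* An array followed immediately by a guard word (illustrative type). */
typedef struct {
    uint8_t  samples[SAMPLE_COUNT];
    uint32_t guard;
} guarded_samples_t;

void guarded_init(guarded_samples_t *buf)
{
    memset(buf->samples, 0, sizeof buf->samples);
    buf->guard = ARRAY_GUARD;
}

/* Bounds-checked store: returns 1 on success, 0 if the index is out of range. */
int guarded_store(guarded_samples_t *buf, size_t index, uint8_t value)
{
    if (index >= SAMPLE_COUNT)
        return 0;
    buf->samples[index] = value;
    return 1;
}

/* Returns 1 while the guard word is intact, 0 once something has written
   past the end of the array. */
int guarded_intact(const guarded_samples_t *buf)
{
    return buf->guard == ARRAY_GUARD;
}
```

The checked accessor catches indexing errors that go through it. The guard word catches the writes that bypass it, such as a pointer or a block copy that runs one element too far.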
Code Looping
• Infinite loops should not occur
• May be a programming error
• May be a device failing to respond
  - Software should have timeout
• Hardware watchdog helpful
• In multi-threaded environment, use a watchdog task
  - Event flag for each task
  - Watchdog sets flags to 1 and goes to sleep
  - Other tasks periodically set their flag to 0
  - When watchdog wakes, any 1 flags result in alarm
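
The flag protocol for a watchdog task might look like this in C. It is a sketch with assumed names (`watchdog_arm`, `task_checkin`, `watchdog_check`) and an assumed count of three tasks. The flags are kept in a plain variable for illustration; under a real RTOS the read-modify-write in `task_checkin` would need to be atomic, for example through the kernel's event flag service or a critical section.

```c
#include <stdint.h>

#define TASK_COUNT     3u
#define ALL_TASK_FLAGS ((1u << TASK_COUNT) - 1u)

/* One flag bit per monitored task. */
static volatile uint32_t task_flags;

/* Watchdog task: set every flag to 1, then go to sleep. */
void watchdog_arm(void)
{
    task_flags = ALL_TASK_FLAGS;
}

/* Monitored task: clear its own flag periodically to show it is alive.
   NOTE: not atomic as written (assumption: single context or protected call). */
void task_checkin(unsigned task_id)
{
    task_flags &= ~(1u << task_id);
}

/* Watchdog task, after waking: any bit still set marks a task that did not
   check in. Returns 0 when all tasks are healthy. */
uint32_t watchdog_check(void)
{
    return task_flags;
}
```

The watchdog's sleep period has to be longer than the slowest task's check-in interval, otherwise healthy tasks raise the alarm.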
Using a Watchdog Task
[Figure: Task 1, Task 2, Task 3 and the Watchdog Task, with one event flag per task (flags shown as 1 or 0)]
Failure Reporting and Recovery
• Action on finding a fault is very system dependent
• If there is a user interface, an alarm may be activated
• A deeply embedded system may have no option other than a reset
• Example: heart pacemaker
Sounding the Alarm
• Display
  - Text/graphics
• Sound
• Network
  - Send email
  - Web page
• LEDs
  - On/Off
  - Color
  - Flashing
Flashing LEDs
• Speed
  - Slow: “heartbeat”
  - Fast: “error”
• Duty cycle
  - Morse code?

LONG = 500
SHORT = 50
flash_delay = LONG
LED_state = 0
loop-forever
{
    flags = 0xff
    sleep(flash_delay)
    set_LED(LED_state)
    if LED_state = 0
        LED_state = 1
    else
        LED_state = 0
    if flags <> 0
        flash_delay = SHORT
}
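
The loop body above could be written in C along these lines. This is a sketch, not the presenter's code: the `led_flasher_t` type, the step function and the simulated LED (`sim_set_led`, `sim_led_level`) are assumptions for illustration, and the delays are taken to be milliseconds. The sleep and the shared flags are left to the caller so that the logic can be exercised without an RTOS.

```c
#include <stdint.h>

#define LONG_DELAY_MS  500u   /* slow flash: "heartbeat" */
#define SHORT_DELAY_MS 50u    /* fast flash: "error" */

typedef void (*set_led_fn)(int on);

typedef struct {
    uint32_t flash_delay_ms;
    int      led_state;
} led_flasher_t;

void led_flasher_init(led_flasher_t *f)
{
    f->flash_delay_ms = LONG_DELAY_MS;
    f->led_state = 0;
}

/* One pass of the loop body, run after the sleep has finished.
   `flags_after_sleep` holds the task flags that are still set.
   Returns the delay to use for the next sleep. */
uint32_t led_flasher_step(led_flasher_t *f, uint32_t flags_after_sleep,
                          set_led_fn set_led)
{
    set_led(f->led_state);            /* drive the LED */
    f->led_state = !f->led_state;     /* toggle for the next pass */

    if (flags_after_sleep != 0)       /* some task failed to clear its flag */
        f->flash_delay_ms = SHORT_DELAY_MS;

    return f->flash_delay_ms;
}

/* Simulated LED for host-side experiments (assumption, not real hardware). */
int sim_led_level;
void sim_set_led(int on) { sim_led_level = on; }
```

As in the pseudocode, the fast rate latches: once an error has been seen, the LED keeps flashing quickly even if later passes find all flags cleared.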
Reset
• Automatic reset is sometimes the only option
  - Deeply embedded systems with no UI
  - Maybe log for later reference
• User input
  - Reset button may be reassuring
  - Multi-key sequences not intuitive or reliable
Conclusions
• First rule is accept that failure is possible
• Consider all possible failure modes
• Add code to monitor system “health”
• Consider action on failure
  - Warn
  - Fix
Colin Walls
Thank you
colin_walls@mentor.com
http://blogs.mentor.com/colinwalls
