HKG18 TR12 - LAVA for LITE Platforms
and Tests
Bill Fletcher
Part 1
● LAVA overview and test job basics
● Getting started with the Lab Instance
● Anatomy of a test job
● Looking at LAVA results
● Writing tests
Part 2
● LAVA in ci.linaro.org overview
● Invoking LAVA via xmlrpc
● Metadata
● Job templates
Training Contents Summary
● Specifics
○ Material is specific to LITE
○ Emphasis is Zephyr targets rather than
Linux (i.e. monolithic images and no
shells)
● Out of scope
○ mcuboot*
○ Installing a local LAVA instance
○ Adding new devices
○ Adding new features
○ LAVA/Lab planning
*as far as I can tell mcuboot isn’t supported anywhere yet
● The Linaro Automated Validation Architecture
● An automation system for deploying executable images
onto physical and virtual hardware for running tests
● Very scalable
● More details at https://validation.linaro.org/static/docs/v2/
● LAVA Lab went live in July 2011 with 2(!) device types
● Features in the latest version:
○ YAML format job submissions
○ Live result reporting
○ A lot of support for scaled and/or distributed instances
LAVA Overview
Basic Elements of LAVA
● Web interface - UI based on the uWSGI application
server and the Django web framework. It also
provides XML-RPC access and the REST API.
● Database - PostgreSQL locally on the master
storing jobs and device details
● Scheduler - periodically this will scan the database
to check for queued test jobs and available test
devices, starting jobs on a Worker when the needed
resources become available.
● Lava-master daemon - This communicates with the
worker(s) using ZMQ.
● Lava-slave daemon - This receives control
messages from the master and sends logs and
results back to the master using ZMQ.
● Dispatcher - This manages all the operations on the
device under test, according to the job submission
and device parameters sent by the master.
● Device Under Test (DUT)
Dispatchers and Devices
● The picture on the left shows Hikey boards
in the Lab connected to one of the
Dispatchers
● The Dispatcher in this case provides:
○ USB ethernet - Networking
○ FTDI serial - console
○ USB OTG - interface for fastboot/flashing
○ Mode control (via OTG power or not)
○ Power control
● The Dispatcher needs to be able to:
○ Put the device in a known state
○ Deploy the test image to the device
○ Boot the device
○ Precisely monitor the execution of the test phase
○ Put the device back into a known state
LAVA Test Job Basics - a pipeline of Dispatcher
actions
deploy → boot → test
● Downloads files required by
the job to the dispatcher,
● to: parameter selects the
deployment strategy class
● Boot the device
● The device may be
powered up or reset to
provoke the boot.
● Every boot action must
specify a method: which is
used to determine how to
boot the deployed files on
the device.
● Individual action blocks can
be repeated conditionally or
unconditionally
● Groups of blocks (e.g. boot,
test) can also be repeated
● Other elements/modifiers
are: timeouts, protocols,
user notifications
● Execute the required tests
● Monitor the test execution
● Use naming and pattern
matching elements to parse
the specific test output
A Simplified Example Pipeline of Test Actions
1. deploy
1.1. strategy class
1.2. zephyr image url
2. boot
2.1. specify boot method (e.g. cmsis/pyocd)
3. test
3.1. monitor patterns
deploy → boot → test → test
(repeat 3x, retry on failure)
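Repeats like these can be written directly into the job definition. A minimal sketch of the "repeat 3x" pipeline above, assuming the LAVA v2 repeat block syntax (check the dispatcher-actions documentation for your instance; the image URL is hypothetical):

```yaml
actions:
- deploy:
    to: tmpfs
    images:
      zephyr:
        url: 'https://example.org/zephyr.bin'   # hypothetical image URL
- boot:
    method: cmsis-dap
- repeat:              # unconditionally repeat a group of blocks
    count: 3
    actions:
    - test:
        monitors:
        - name: 'kernel_common'
          start: (tc_start()|starting test)
          end: PROJECT EXECUTION
          pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
          fixupdict:
            PASS: pass
            FAIL: fail
```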
A more complex job pipeline ...
deploy → boot → test
Test Job Actions
● A test job reaches the LAVA Dispatcher as a pipeline of actions
● The action concept within a test job definition is tightly defined
○ there are 3 types of actions (deploy, boot, test)
○ actions don’t overlap (e.g. a test action doesn’t do any booting)
○ Repeating an action gives the same result (idempotency)
● The pipeline structure of each job is explicit - no implied actions or behaviour
● All pipeline steps are validated at submission, this includes checks on all urls
● Actions, and the elements that make them up, are documented here
https://validation.linaro.org/static/docs/v2/dispatcher-actions.html#dispatcher-actions
● Link to Sample Job Files
job definition actions - k64f
# Zephyr JOB definition for NXP K64F
device_type: 'frdm-k64f'
job_name: 'zephyr tutorial 001 - from ppl.l.o'
[global timeouts, priority, context blocks omitted]
actions:
- deploy:
timeout:
minutes: 3
to: tmpfs
images:
zephyr:
url: 'https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin'
- boot:
method: cmsis-dap
timeout:
minutes: 3
- test:
monitors:
- name: 'kernel_common'
start: (tc_start()|starting test)
end: PROJECT EXECUTION
pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
fixupdict:
PASS: pass
FAIL: fail
Getting Started with the Lab Instance
A quick tour of the web UI
https://validation.linaro.org/
“The LAVA Lab”
(Cambridge Lab - Production Instance)
Large installation with 7 workers
1.6M jobs
Reports 118 public devices
validation.linaro.org (LAVA Lab) - logging in
For access to the lab, mail a request to lava-lab-team@linaro.org
WebUI - Scheduler drop-down for general status
● Submit Job - can directly paste a
yaml job file here
● View All Jobs, or jobs in various
states
● All (Active) Devices
● Reports - Overall health check
statistics
● Workers - details of Dispatcher
instances
WebUI drop-down for job authentication tokens
All job submission requires authentication
● Create an authentication token:
https://validation.linaro.org/api/tokens/
● Display the token hash
Getting Support/Reporting Issues
LAVA Lab
Tech Lead: Dave Pigott
Support Portal
● "Problems" -> "Report a Problem".
Mention “LAVA Lab:" for correct
assignment
● Tickets should prominently feature
‘LITE’ in the subject and summary
● Generally please put as much info as
possible in the summary
● For requests such as VPN access, include
public keys with the request
LAVA Project
Tech Lead: Neil Williams
Support Info
Mailing list: lava-users@lists.linaro.org
Bugs.linaro.org (->LAVA Framework)
lava-tool
● the command-line tool for interacting with the various services that LAVA
offers using the underlying XML-RPC mechanism
● can also be installed on any machine running a Debian-based distribution,
without needing the rest of LAVA ( $ apt-get install lava-tool )
● allows a user to interact with any LAVA instance on which they have an
account
● primarily designed to assist users in manual tasks and uses keyring integration
● Basic useful lava-tool features:
$ lava-tool auth-add <user@lava-server>
$ lava-tool submit-job <user@lava-server> <job definition file>
Using lava-tool to submit a Lava Job
● This example uses a prebuilt image and job definition file
● Use a test image built for a lab-supported platform - in this case frdm-k64f - at
https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin
● Use the yaml job definition file here (complete version … and on next slide)
● Get an authentication token from the web UI and paste into lava-tool as prompted
$ lava-tool auth-add https://first.last@validation.linaro.org
● Submit the job
$ lava-tool submit-job https://first.last@validation.linaro.org zephyr_k64_job001.yaml
● lava-tool returns the job number if the submission is successful. You can follow the
results at https://validation.linaro.org/scheduler/myjobs, finding the job number.
# Zephyr JOB definition for NXP K64F
device_type: 'frdm-k64f'
job_name: 'zephyr tutorial hw test job submission 001 - from ppl.l.o'
timeouts:
job:
minutes: 6
action:
minutes: 2
priority: medium
visibility: public
actions:
- deploy:
timeout:
minutes: 3
to: tmpfs
images:
zephyr:
url: 'https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin'
- boot:
method: cmsis-dap
timeout:
minutes: 3
- test:
monitors:
- name: 'kernel_common'
start: (tc_start()|starting test)
end: PROJECT EXECUTION
pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
fixupdict:
PASS: pass
FAIL: fail
Anatomy of a test job definition
● Documentation
https://validation.linaro.org/static/docs/v2/explain_first_job.html
● Example job file k64f-kernel-common (previous slide)
● General details -
○ device_type - used by the Scheduler to match your job to a device
○ job_name - free text appearing in the list of jobs
○ Global timeouts - to detect and fail a hung job
● Context:
○ Used to set values for selected variables in the device configuration.
○ Most commonly, to tell the qemu template e.g. which architecture is
being tested
● Test Job actions: Deploy, Boot, Test
Deploy action
- deploy:
timeout:
minutes: 3
to: tmpfs
images:
zephyr:
image_arg: '-kernel {zephyr}'
url: 'https://...
Downloads files required by the job
to the dispatcher
detailed docs
timeout: self-explanatory - can use
seconds … to hours ...
to: specifies the deploy method
image_arg: only needed for jobs that
run on qemu Cortex M3
url: the location of the image
Many other deploy features not used here: OS
awareness, loading test overlays onto rootfs images
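As noted, image_arg only matters for qemu jobs; together with the context: block described earlier it shapes the qemu command line. A hypothetical sketch (the arch value and image URL are illustrative - check the qemu device-type template on your instance):

```yaml
context:
  arch: arm                # tells the qemu device-type template which architecture to emulate
actions:
- deploy:
    to: tmpfs
    images:
      zephyr:
        image_arg: '-kernel {zephyr}'          # substituted into the qemu command line
        url: 'https://example.org/zephyr.elf'   # hypothetical image URL
- boot:
    method: qemu
```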
Boot action
- boot:
method: cmsis-dap
timeout:
minutes: 3
Boot the device
Detailed docs
timeout: self-explanatory
method: specifies either the command to run on
the dispatcher or the interaction with the bootloader on
the target
Zephyr specific boot methods:
● cmsis_dap.py
● pyocd.py
● qemu
No Parameters?
● The individual board is not known at job
submission time, so the Scheduler has to
populate the relevant ports, power-reset
control I/O etc
● Command line parameters for e.g. pyocd
are populated from the device_type
template in the Scheduler
Test action
- test:
monitors:
- name: 'kernel_common'
start: (tc_start()|starting test)
end: PROJECT EXECUTION
pattern:
(?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
fixupdict:
PASS: pass
FAIL: fail
Execute the required tests
monitors: one-way DUT connection -
https://git.linaro.org/lava/lava-dispatcher.git/tree/lava_dispatcher/actions/test/monitor.py
name: appears in the results output
start: string used to detect when the
test action starts
end: string used to detect when the
test action is finished
pattern: supplies a parser that
converts each test output into results
fixupdict: as a default, LAVA
understands only
“pass”|”fail”|”skip”|”unknown”
Sample output to parse:
PASS - byteorder_test_memcpy_swap.
Looking at LAVA results
See what happens when we run the job …
● In the following slides:
● Results
● Job Details
● Timing
$ lava-tool submit-job returns the job number...
A link to the full trace is here:
https://validation.linaro.org/scheduler/job/1656241
Job Summary List
Results
Details
Results
Your Tests
LAVA’s checks
Job Details - start of Deploy action
Job Details - start of Boot action
Job Details - Test action parsing
- test:
monitors:
- name: 'kernel_common'
start: (tc_start()|starting test)
end: PROJECT EXECUTION
pattern:
(?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
fixupdict:
PASS: pass
FAIL: fail
(not matched)
Job Timing - for timeout tuning, not
benchmarking
A More Complex Zephyr Test Example
Output the Zephyr boot time values as the result of a test, and also that the boot
test succeeded (tests/benchmarks/boot_time)
tc_start() - Boot Time Measurement
Boot Result: Clock Frequency: 12 MHz
__start : 0 cycles, 0 us
_start->main(): 5030 cycles, 419 us
_start->task : 5461 cycles, 455 us
_start->idle : 8934 cycles, 744 us
Boot Time Measurement finished
===================================================================
PASS - main.
===================================================================
PROJECT EXECUTION SUCCESSFUL
Pipeline: cascade 2 test actions
The first test action matches _start->... and picks out the microsecond values
The second test action matches PASS and picks out the test case which is main
Example Solution:
The Job Definition
The Results
The Measurements
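The first of the two cascaded test actions needs a pattern that pulls the phase name and the microsecond value out of lines like `_start->main(): 5030 cycles, 419 us`. A quick offline sketch of such a regex (the pattern is illustrative, not the one from the linked job definition):

```python
import re

# Sample lines from the boot_time output shown above
log = """\
_start->main(): 5030 cycles, 419 us
_start->task : 5461 cycles, 455 us
_start->idle : 8934 cycles, 744 us
"""

# Illustrative pattern: phase name as test_case_id, microseconds as measurement
pattern = re.compile(
    r'_start->(?P<test_case_id>\w+)\(?\)?\s*:\s*\d+ cycles,\s*(?P<measurement>\d+) us')

for m in pattern.finditer(log):
    print(m.group('test_case_id'), m.group('measurement'))
# → main 419, task 455, idle 744
```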
Writing Tests
● pattern: expressions need to be compatible with pexpect/re (used by the
Dispatcher)
● monitor: is for devices without a unix-style* shell. It handles output only
● monitor: pattern matches can populate named Python regex groups for
test_case_id, result, measurement, units
● Obviously tests that need some interaction to boot and/or run can’t be
automated with LAVA
● The pattern: syntax has not been designed for complex detailed parsing of
test output logs. The expectation was that it would invoke (via a shell) and
parse the results of scripts/commands that would do most of the heavy lifting
in dealing with test suite output
*The Lava Test Shell is used for testing devices that have a unix style shell and a writeable FS.
Writing tests - coping strategies
● Most (non-Zephyr) LAVA users craft their test invocation scripts to fit existing
pattern: boilerplate
● Prototype pattern: re expressions in an offline python script before trying them
in LAVA
● Debug them further in LAVA test actions on an M3 qemu instance first (fast,
doesn’t tie up resources, unbreakable)
● The more carefully crafted a pattern: is, the more brittle it will likely be when
the Zephyr-side code changes
● Cascading multiple test action blocks can solve more complex parsing
problems
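Prototyping a pattern: offline takes only a few lines. A minimal sketch for the kernel_common pattern used earlier, run against the sample output from the slides (note the backslashes, which are easy to lose when pasting into YAML):

```python
import re

# The monitor pattern from the kernel_common job definition
pattern = re.compile(r'(?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).')

sample = 'PASS - byteorder_test_memcpy_swap.'
m = pattern.search(sample)
print(m.group('result'), m.group('test_case_id'))
# → PASS byteorder_test_memcpy_swap
```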
LAVA and CI
Overview
LAVA in ci.linaro.org
XMLRPC
Metadata
Job templates
Overview - industrializing LAVA
Health checks
Target requirements
Metadata
Health Checks & Gold Standard Images
● Health check
○ special type of test job
○ designed to validate a test device and the infrastructure around it
○ run periodically to check for equipment and/or infrastructure failures
○ needs to at least check that the device will boot and deploy a test image.
● Writing Health Checks
○ It has a job name describing the test as a health check
○ It has a minimal set of test definitions
○ It uses gold standard files
● Gold Standard
○ Gold standard has been defined in association with the QA team.
○ Provide a known baseline for test definition writers
○ (open point: are there gold standard images and jobs for LITE target boards?)
Sources of Target Board Success ...
● See https://validation.linaro.org/static/docs/v2/device-integration.html section
on Device Integration
A few LITE-relevant points:
● Serial
○ Persistent, stable
○ if over a shared OTG cable, other traffic does not disrupt trace
● Reset
○ Image data not retained
○ ‘old’ serial data not buffered/retained
● Predictable & repeatable
● No manual intervention
Metadata
● Linking a LAVA job and its result artifacts back to the code - not important for
ad hoc submission, but vital for CI
● Specific metadata: section within the jobfile
● Can be queried for a job via xmlrpc
● Example API call get_testjob_metadata (job_id)
● Call returns entries created by LAVA as well as submitted in the test job
definition
● Example
metadata:
build-url: $build_url
build-log: $build_url/consoleText
zephyr-gcc-variant: $gcc_variant
platform: $board_name
git-url: https://git.linaro.org/zephyrproject-org/zephyr.git
git-commit: $git_commit
LAVA in ci.linaro.org
[Diagram: idealised flow - Jenkins on ci.linaro.org passes a job file to the
LAVA instance at validation.linaro.org; the Test Farm deploys, boots and
tests; output and results(?) flow back to Jenkins]
● In practice, LAVA jobs are
submitted by the QA server, which
acts as a proxy, not by ci.linaro.org
[Diagram: actual flow - Jenkins (linaro-cp, submit-to-lava) hands off to the
QA Server (submit-for-qa), which submits to the LAVA instance at
validation.linaro.org; the Test Farm boots and tests; output, results and
metadata(?) flow back via the QA Server]
● In either case LAVA is invoked via
xmlrpc APIs
Invoking a LAVA job via xmlrpc
#!/usr/bin/python
# Python 2: xmlrpclib is xmlrpc.client in Python 3
import xmlrpclib
username = "bill.fletcher"
token = "<token string>"
hostname = "validation.linaro.org"
server = xmlrpclib.ServerProxy("https://%s:%s@%s/RPC2" % (username, token, hostname))
jobfile = open("zephyr_k64_job001.yaml")
jobtext = jobfile.read()
id = server.scheduler.submit_job(jobtext)
print server.scheduler.job_status(id)
The above is approximately equivalent to $ lava-tool submit-job ...
The API is documented here https://validation.linaro.org/api/help/
Creating the jobfile on the fly - templates
Uses class string.Template(template)
import os
import sys
from string import Template

# args: parsed command-line arguments (e.g. from argparse)
template_file_name = "lava-job-definitions/%s/template.yaml" % (args.device_type, )
test_template = None
if os.path.exists(template_file_name):
test_template_file = open(template_file_name, "r")
test_template = test_template_file.read()
test_template_file.close()
else:
sys.exit(1)
replace_dict = dict(
build_url=args.build_url,
test_url=args.test_url,
device_type=args.device_type,
board_name=args.board_name,
test_name=args.test_name,
git_commit=args.git_commit,
gcc_variant=args.gcc_variant
)
template = Template(test_template)
lava_job = template.substitute(replace_dict)
Job Templates - actions
actions:
- deploy:
timeout:
minutes: 3
to: tmpfs
images:
zephyr:
url: ' $test_url '
- boot:
method: pyocd
timeout:
minutes: 10
- test:
timeout:
minutes: 10
monitors:
- name: ' $test_name '
start: (tc_start()|starting .*test|BOOTING ZEPHYR OS)
end: PROJECT EXECUTION
pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+).
fixupdict:
PASS: pass
FAIL: fail
Consider also including pattern: in the
template, so that it tracks any changes in
the test
Job Templates - general, timeouts & metadata
# Zephyr JOB definition for frdm-kw41z
device_type: ' $device_type '
job_name: 'zephyr-upstream $test_name '
timeouts:
job:
minutes: 30
action:
minutes: 3
actions:
wait-usb-device:
seconds: 40
priority: medium
visibility: public
<actions>
metadata:
build-url: $build_url
build-log: $build_url /consoleText
zephyr-gcc-variant: $gcc_variant
platform: $board_name
git-url: https://git.linaro.org/zephyrproject-org/zephyr.git
git-commit: $git_commit
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org
More Related Content

PDF
LCA13: LAVA and CI Component Review
PDF
LCE13: LAVA Multi-Node Testing
PPTX
Demo
ODP
Introduction to LAVA Workload Scheduler
PDF
Chronon - A Back-In-Time-Debugger for Java
PDF
Uvm presentation dac2011_final
PPTX
Cpp unit
ODP
NovaProva, a new generation unit test framework for C programs
LCA13: LAVA and CI Component Review
LCE13: LAVA Multi-Node Testing
Demo
Introduction to LAVA Workload Scheduler
Chronon - A Back-In-Time-Debugger for Java
Uvm presentation dac2011_final
Cpp unit
NovaProva, a new generation unit test framework for C programs

What's hot (18)

PPTX
Java Hates Linux. Deal With It.
PDF
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools
PDF
CppUnit using introduction
PDF
LAS16-307: Benchmarking Schedutil in Android
PDF
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
PPT
HP Quick Test Professional
PDF
Top five reasons why every DV engineer will love the latest systemverilog 201...
PDF
DTrace Topics: Introduction
PPTX
"Introduction to JMeter" @ CPTM 3rd Session
PDF
Software Profiling: Java Performance, Profiling and Flamegraphs
PDF
Analyzing Java Applications Using Thermostat (Omair Majid)
PPT
SystemVerilog Assertions verification with SVAUnit - DVCon US 2016 Tutorial
PDF
Brief introduction to kselftest
PDF
Linux Internals - Part II
PPTX
UVM Driver sequencer handshaking
PDF
A New Tracer for Reverse Engineering - PacSec 2010
DOCX
Ecet 360 Enthusiastic Study / snaptutorial.com
Java Hates Linux. Deal With It.
TIP1 - Overview of C/C++ Debugging/Tracing/Profiling Tools
CppUnit using introduction
LAS16-307: Benchmarking Schedutil in Android
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
HP Quick Test Professional
Top five reasons why every DV engineer will love the latest systemverilog 201...
DTrace Topics: Introduction
"Introduction to JMeter" @ CPTM 3rd Session
Software Profiling: Java Performance, Profiling and Flamegraphs
Analyzing Java Applications Using Thermostat (Omair Majid)
SystemVerilog Assertions verification with SVAUnit - DVCon US 2016 Tutorial
Brief introduction to kselftest
Linux Internals - Part II
UVM Driver sequencer handshaking
A New Tracer for Reverse Engineering - PacSec 2010
Ecet 360 Enthusiastic Study / snaptutorial.com
Ad

Similar to HKG18-TR12 - LAVA for LITE Platforms and Tests (20)

PDF
LCE13: Test and Validation Summit: The future of testing at Linaro
PDF
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
PDF
PaaSTA: Autoscaling at Yelp
PDF
PaaSTA: Running applications at Yelp
PDF
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
PDF
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (II)
PDF
BKK16-312 Integrating and controlling embedded devices in LAVA
PPTX
Automation tools: making things go... (March 2019)
PDF
LISA15: systemd, the Next-Generation Linux System Manager
PDF
Load testing in Zonky with Gatling
PDF
Pytest - testing tips and useful plugins
PPTX
Hadoop cluster performance profiler
PPTX
Neutron upgrades
PDF
HKG15-204: OpenStack: 3rd party testing and performance benchmarking
PDF
Write unit test from scratch
PDF
JMeter-UCCSC.pdf
PDF
The State of the Veil Framework
PDF
Automation for developers
PDF
BKK16-210 Migrating to the new dispatcher
PPTX
Testing Django APIs
LCE13: Test and Validation Summit: The future of testing at Linaro
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
PaaSTA: Autoscaling at Yelp
PaaSTA: Running applications at Yelp
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (II)
BKK16-312 Integrating and controlling embedded devices in LAVA
Automation tools: making things go... (March 2019)
LISA15: systemd, the Next-Generation Linux System Manager
Load testing in Zonky with Gatling
Pytest - testing tips and useful plugins
Hadoop cluster performance profiler
Neutron upgrades
HKG15-204: OpenStack: 3rd party testing and performance benchmarking
Write unit test from scratch
JMeter-UCCSC.pdf
The State of the Veil Framework
Automation for developers
BKK16-210 Migrating to the new dispatcher
Testing Django APIs
Ad

More from Linaro (20)

PDF
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
PDF
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
PDF
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
PDF
Bud17 113: distribution ci using qemu and open qa
PDF
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
PDF
HPC network stack on ARM - Linaro HPC Workshop 2018
PDF
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
PDF
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
PDF
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
PDF
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
PDF
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
PDF
HKG18-100K1 - George Grey: Opening Keynote
PDF
HKG18-318 - OpenAMP Workshop
PDF
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
PDF
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
PDF
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
PDF
HKG18-TR08 - Upstreaming SVE in QEMU
PDF
HKG18-113- Secure Data Path work with i.MX8M
PPTX
HKG18-120 - Devicetree Schema Documentation and Validation
PPTX
HKG18-223 - Trusted FirmwareM: Trusted boot
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Bud17 113: distribution ci using qemu and open qa
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-100K1 - George Grey: Opening Keynote
HKG18-318 - OpenAMP Workshop
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-113- Secure Data Path work with i.MX8M
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-223 - Trusted FirmwareM: Trusted boot

Recently uploaded (20)

PDF
Getting started with AI Agents and Multi-Agent Systems
DOCX
search engine optimization ppt fir known well about this
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PPTX
Benefits of Physical activity for teenagers.pptx
PDF
Taming the Chaos: How to Turn Unstructured Data into Decisions
PPTX
2018-HIPAA-Renewal-Training for executives
PDF
Consumable AI The What, Why & How for Small Teams.pdf
PPTX
The various Industrial Revolutions .pptx
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PDF
STKI Israel Market Study 2025 version august
PPT
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...
PDF
A proposed approach for plagiarism detection in Myanmar Unicode text
PPT
Geologic Time for studying geology for geologist
PPTX
Microsoft Excel 365/2024 Beginner's training
PPTX
Configure Apache Mutual Authentication
PPTX
Custom Battery Pack Design Considerations for Performance and Safety
Getting started with AI Agents and Multi-Agent Systems
search engine optimization ppt fir known well about this
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Benefits of Physical activity for teenagers.pptx
Taming the Chaos: How to Turn Unstructured Data into Decisions
2018-HIPAA-Renewal-Training for executives
Consumable AI The What, Why & How for Small Teams.pdf
The various Industrial Revolutions .pptx
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
A comparative study of natural language inference in Swahili using monolingua...
Hindi spoken digit analysis for native and non-native speakers
Zenith AI: Advanced Artificial Intelligence
A contest of sentiment analysis: k-nearest neighbor versus neural network
STKI Israel Market Study 2025 version august
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...
A proposed approach for plagiarism detection in Myanmar Unicode text
Geologic Time for studying geology for geologist
Microsoft Excel 365/2024 Beginner's training
Configure Apache Mutual Authentication
Custom Battery Pack Design Considerations for Performance and Safety

HKG18-TR12 - LAVA for LITE Platforms and Tests

  • 1. HKG18 TR12 - LAVA for LITE Platforms and Tests Bill Fletcher
  • 2. Part1 ● LAVA overview and test job basics ● Getting started with the Lab Instance ● Anatomy of a test job ● Looking at LAVA results ● Writing tests Part2 ● LAVA in ci.linaro.org overview ● Invoking LAVA via xmlrpc ● Metadata ● Job templates Training Contents Summary ● Specifics ○ Material is specific to LITE ○ Emphasis is Zephyr targets rather than Linux (i.e. monolithic images and no shells) ● Out of scope ○ mcuboot* ○ Installing a local LAVA instance ○ Adding new devices ○ Adding new features ○ LAVA/Lab planning *as far as I can tell mcuboot isn’t supported anywhere yet
  • 3. ● The Linaro Automated Validation Architecture ● An automation system for deploying executable images onto physical and virtual hardware for running tests ● Very scalable ● More details at https://validation.linaro.org/static/docs/v2/ ● LAVA Lab went live in July 2011 with 2(!) device types ● Features in the latest version: ○ YAML format job submissions ○ Live result reporting ○ A lot of support for scaled and/or distributed instances LAVA Overview
  • 4. Basic Elements of LAVA ● Web interface - UI based on the uWSGI application server and the Django web framework. It also provides XML-RPC access and the REST API. ● Database - PostgreSQL locally on the master storing jobs and device details ● Scheduler - periodically this will scan the database to check for queued test jobs and available test devices, starting jobs on a Worker when the needed resources become available. ● Lava-master daemon - This communicates with the worker(s) using ZMQ. ● Lava-slave daemon - This receives control messages from the master and sends logs and results back to the master using ZMQ. ● Dispatcher - This manages all the operations on the device under test, according to the job submission and device parameters sent by the master. ● Device Under Test (DUT)
  • 5. Dispatchers and Devices ● The picture on the left shows Hikey boards in the Lab connected to one of the Dispatchers ● The Dispatcher in this case provides: ○ USB ethernet - Networking ○ FTDI serial - console ○ USB OTG - interface for fastboot/flashing ○ Mode control (via OTG power or not) ○ Power control ● The Dispatcher needs to be able to: ○ Put the device in a known state ○ Deploy the test image to the device ○ Boot the device ○ Exactly monitor the execution of the test phase ○ Put the device back into a known state
  • 6. LAVA Test Job Basics - a pipeline of Dispatcher actions deploy boot test ● Downloads files required by the job to the dispatcher, ● to: parameter selects the deployment strategy class ● Boot the device ● The device may be powered up or reset to provoke the boot. ● Every boot action must specify a method: which is used to determine how to boot the deployed files on the device. ● Individual action blocks can be repeated conditionally or unconditionally ● Groups of blocks (e.g. boot, test) can also be repeated ● Other elements/modifiers are: timeouts, protocols, user notifications ● Execute the required tests ● Monitor the test excution ● Use naming and pattern matching elements to parse the specific test output
  • 7. A Simplified Example Pipeline of Test Actions 1. deploy 1.1. strategy class 1.2. zephyr image url 2. boot 2.1. specify boot method (e.g. cmsis/pyocd) 3. test 3.1. monitor patterns deploy boot test test Repeat 3x Retry on failure A more complex job pipeline ... bootdeploy test
  • 8. Test Job Actions ● A test job reaches the LAVA Dispatcher as a pipeline of actions ● The action concept within a test job definition is tightly defined ○ there are 3 types of actions (deploy, boot, test) ○ actions don’t overlap (e.g. a test action doesn’t do any booting) ○ Repeating an action gives the same result (idempotency) ● The pipeline structure of each job is explicit - no implied actions or behaviour ● All pipeline steps are validated at submission, this includes checks on all urls ● Actions, and the elements that make them up, are documented here https://validation.linaro.org/static/docs/v2/dispatcher-actions.html#dispatch er-actions ● Link to Sample Job Files
  • 9. job definition actions - k64f # Zephyr JOB definition for NXP K64F device_type: 'frdm-k64f' job_name: 'zephyr tutorial 001 - from ppl.l.o' [global timeouts, priority, context blocks omitted] actions: - deploy: timeout: minutes: 3 to: tmpfs images: zephyr: url: 'https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin' - boot: method: cmsis-dap timeout: minutes: 3 - test: monitors: - name: 'kernel_common' start: (tc_start()|starting test) end: PROJECT EXECUTION pattern: (?P<result>(PASS|FAIL))s-s(?P<test_case_id>w+). fixupdict: PASS: pass FAIL: fail
  • 10. Getting Started with the Lab Instance A quick tour of the web UI https://validation.linaro.org/ “The LAVA Lab” (Cambridge Lab - Production Instance) Large installation with 7 workers 1.6M jobs Reports 118 public devices
  • 11. validation.linaro.org (LAVA Lab) - logging in For access to the lab, mail a request to lava-lab-team@linaro.org
  • 12. WebUI - Scheduler drop-down for general status ● Submit Job - can directly paste a yaml job file here ● View All Jobs, or jobs in various states ● All (Active) Devices ● Reports - Overall health check statistics ● Workers - details of Dispatcher instances
  • 13. WebUI drop-down for job authentication tokens All job submission requires authentication ● Create an authentication token: https://validation.linaro.org/api/tokens/ ● Display the token hash
• 14. Getting Support/Reporting Issues
LAVA Lab Tech Lead: Dave Pigott
Support Portal
● "Problems" -> "Report a Problem". Mention “LAVA Lab:” for correct assignment
● Tickets should prominently feature ‘LITE’ in the subject and summary
● Generally, please put as much info as possible in the summary
● For VPN requests, for example, please include public keys with the request
LAVA Project Tech Lead: Neil Williams
Support info: mailing list lava-users@lists.linaro.org; bugs.linaro.org (-> LAVA Framework)
  • 15. lava-tool ● the command-line tool for interacting with the various services that LAVA offers using the underlying XML-RPC mechanism ● can also be installed on any machine running a Debian-based distribution, without needing the rest of LAVA ( $ apt-get install lava-tool ) ● allows a user to interact with any LAVA instance on which they have an account ● primarily designed to assist users in manual tasks and uses keyring integration ● Basic useful lava-tool features: $ lava-tool auth-add <user@lava-server> $ lava-tool submit-job <user@lava-server> <job definition file>
• 16. Using lava-tool to submit a LAVA Job
● This example uses a prebuilt image and job definition file
● Use a test image built for a lab-supported platform - in this case frdm-k64f - at https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin
● Use the yaml job definition file here (complete version … and on next slide)
● Get an authentication token from the web UI and paste it into lava-tool as prompted
$ lava-tool auth-add https://first.last@validation.linaro.org
● Submit the job
$ lava-tool submit-job https://first.last@validation.linaro.org zephyr_k64_job001.yaml
● lava-tool returns the job number if the submission is successful. You can then follow the results for that job number at https://validation.linaro.org/scheduler/myjobs
• 17.
# Zephyr JOB definition for NXP K64F
device_type: 'frdm-k64f'
job_name: 'zephyr tutorial hw test job submission 001 - from ppl.l.o'
timeouts:
  job:
    minutes: 6
  action:
    minutes: 2
priority: medium
visibility: public
actions:
- deploy:
    timeout:
      minutes: 3
    to: tmpfs
    images:
      zephyr:
        url: 'https://people.linaro.org/~bill.fletcher/zephyr_frdm_k64f.bin'
- boot:
    method: cmsis-dap
    timeout:
      minutes: 3
- test:
    monitors:
    - name: 'kernel_common'
      start: (tc_start()|starting test)
      end: PROJECT EXECUTION
      pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+)\.
      fixupdict:
        PASS: pass
        FAIL: fail
  • 18. Anatomy of a test job definition ● Documentation https://validation.linaro.org/static/docs/v2/explain_first_jo b.html ● Example job file k64f-kernel-common (previous slide) ● General details - ○ device_type - used by the Scheduler to match your job to a device ○ job_name - free text appearing in the list of jobs ○ Global timeouts - to detect and fail a hung job ● Context: ○ Used to set values for selected variables in the device configuration. ○ Most commonly, to tell the qemu template e.g. which architecture is being tested ● Test Job actions: Deploy, Boot, Test
• 19. Deploy action
- deploy:
    timeout:
      minutes: 3
    to: tmpfs
    images:
      zephyr:
        image_arg: '-kernel {zephyr}'
        url: 'https://...
Downloads files required by the job to the dispatcher (detailed docs)
● timeout: self-explanatory - can use seconds … to hours ...
● to: specifies the deploy method
● image_arg: only needed for jobs that run on qemu Cortex M3
● url: the location of the image
● Many other deploy features not used here: OS awareness, loading test overlays onto rootfs images
• 20. Boot action
- boot:
    method: cmsis-dap
    timeout:
      minutes: 3
Boot the device (detailed docs)
● timeout: self-explanatory
● method: specifies either the command to run on the dispatcher or the interaction with the bootloader on the target
● Zephyr specific boot methods: cmsis_dap.py, pyocd.py, qemu
No parameters?
● The individual board is not known at job submission time, so the Scheduler has to populate the relevant ports, power-reset control I/O etc
● Command line parameters for e.g. pyocd are populated from the device_type template in the Scheduler
• 21. Test action
- test:
    monitors:
    - name: 'kernel_common'
      start: (tc_start()|starting test)
      end: PROJECT EXECUTION
      pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+)\.
      fixupdict:
        PASS: pass
        FAIL: fail
Execute the required tests
● monitors: one-way DUT connection - https://git.linaro.org/lava/lava-dispatcher.git/tree/lava_dispatcher/actions/test/monitor.py
● name: appears in the results output
● start: string used to detect when the test action starts
● end: string used to detect when the test action is finished
● pattern: supplies a parser that converts each test output line into results
● fixupdict: as a default, LAVA understands only “pass”|“fail”|“skip”|“unknown”
Sample output to parse: PASS - byteorder_test_memcpy_swap.
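As the deck advises later, a pattern: expression can be prototyped offline with Python's re module before going into a job definition. A minimal sketch against the sample output above; the raw-string form makes the \s/\w/\. escapes explicit (these backslashes tend to get lost when copying from slides):

```python
import re

# The monitor pattern from the test action, with escapes intact
pattern = re.compile(r"(?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+)\.")

# Default result mapping, mirroring the job's fixupdict
fixupdict = {"PASS": "pass", "FAIL": "fail"}

# Sample output line to parse (from the slide)
line = "PASS - byteorder_test_memcpy_swap."

m = pattern.search(line)
result = fixupdict[m.group("result")]       # mapped result, e.g. 'pass'
test_case_id = m.group("test_case_id")      # e.g. 'byteorder_test_memcpy_swap'
```

Iterating like this on the regex locally is much faster than a full LAVA submit/run cycle.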
• 22. Looking at LAVA results
See what happens when we run the job … In the following slides:
● Results
● Job Details
● Timing
$ lava-tool submit-job returns the job number... A link to the full trace is here: https://validation.linaro.org/scheduler/job/1656241
  • 25. Job Details - start of Deploy action
  • 26. Job Details - start of Boot action
• 27. Job Details - Test action parsing
- test:
    monitors:
    - name: 'kernel_common'
      start: (tc_start()|starting test)
      end: PROJECT EXECUTION
      pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+)\.
      fixupdict:
        PASS: pass
        FAIL: fail
(not matched)
  • 28. Job Timing - for timeout tuning, not benchmarking
• 29. A More Complex Zephyr Test Example
Output the Zephyr boot time values as the result of a test, and also that the boot test succeeded (tests/benchmarks/boot_time)
tc_start() - Boot Time Measurement
Boot Result: Clock Frequency: 12 MHz
__start : 0 cycles, 0 us
_start->main(): 5030 cycles, 419 us
_start->task : 5461 cycles, 455 us
_start->idle : 8934 cycles, 744 us
Boot Time Measurement finished
===================================================================
PASS - main.
===================================================================
PROJECT EXECUTION SUCCESSFUL
Example Solution: a pipeline cascading 2 test actions
● The first test action matches _start->... and picks out the microsecond values
● The second test action matches PASS and picks out the test case, which is main
(Slide links: The Job Definition, The Results, The Measurements)
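The deck does not show the pattern used by the first (measurement) test action, so as an illustration only, here is a hypothetical named-group regex of the kind the monitor supports (test_case_id and measurement are among the group names LAVA recognises) picking the microsecond values out of the log above:

```python
import re

# Hypothetical pattern for the _start->... lines; the exact expression
# used in the real job is not shown on the slide.
boot_pattern = re.compile(
    r"_start->(?P<test_case_id>\w+)(\(\))?\s*:\s*\d+ cycles, (?P<measurement>\d+) us"
)

# Boot-time log excerpt from the slide
log = """__start : 0 cycles, 0 us
_start->main(): 5030 cycles, 419 us
_start->task : 5461 cycles, 455 us
_start->idle : 8934 cycles, 744 us"""

# Collect each measurement keyed by its test case id
results = {m.group("test_case_id"): int(m.group("measurement"))
           for m in boot_pattern.finditer(log)}
```

In the real job, each match would be reported as a test case with a measurement in microseconds.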
  • 30. Writing Tests ● pattern: expressions need to be compatible with pexpect/re (used by the Dispatcher) ● monitor: is for devices without a unix-style* shell. It handles output only ● monitor: pattern matches can populate named Python regex groups for test_case_id, result, measurement, units ● Obviously tests that need some interaction to boot and/or run can’t be automated with LAVA ● The pattern: syntax has not been designed for complex detailed parsing of test output logs. The expectation was that it would invoke (via a shell) and parse the results of scripts/commands that would do most of the heavy lifting in dealing with test suite output *The Lava Test Shell is used for testing devices that have a unix style shell and a writeable FS.
  • 31. Writing tests - coping strategies ● Most (non-Zephyr) LAVA users craft their test invocation scripts to fit existing pattern: boilerplate ● Prototype pattern: re expressions in an offline python script before trying them in LAVA ● Debug them further in LAVA test actions on an M3 qemu instance first (fast, doesn’t tie up resources, unbreakable) ● The more carefully crafted a pattern: is, the more brittle it will likely be when the Zephyr-side code changes ● Cascading multiple test action blocks can solve more complex parsing problems
  • 32. LAVA and CI Overview LAVA in ci.linaro.org XMLRPC Metadata Job templates
• 33. Overview - industrializing LAVA
● Health checks
● Target requirements
● Metadata
  • 34. Health Checks & Gold Standard Images ● Health check ○ special type of test job ○ designed to validate a test device and the infrastructure around it ○ run periodically to check for equipment and/or infrastructure failures ○ needs to at least check that the device will boot and deploy a test image. ● Writing Health Checks ○ It has a job name describing the test as a health check ○ It has a minimal set of test definitions ○ It uses gold standard files ● Gold Standard ○ Gold standard has been defined in association with the QA team. ○ Provide a known baseline for test definition writers ○ (open point: are there gold standard images and jobs for LITE target boards?)
  • 35. Sources of Target Board Success ... ● See https://validation.linaro.org/static/docs/v2/device-integration.html section on Device Integration A few LITE-relevant points: ● Serial ○ Persistent, stable ○ if over a shared OTG cable, other traffic does not disrupt trace ● Reset ○ Image data not retained ○ ‘old’ serial data not buffered/retained ● Predictable & repeatable ● No manual intervention
• 36. Metadata
● Linking a LAVA job and its result artifacts back to the code - not important for ad hoc submission, but vital for CI
● Specific metadata: section within the jobfile
● Can be queried for a job via xmlrpc
● Example API call: get_testjob_metadata(job_id)
● The call returns entries created by LAVA as well as those submitted in the test job definition
● Example metadata:
  build-url: $build_url
  build-log: $build_url/consoleText
  zephyr-gcc-variant: $gcc_variant
  platform: $board_name
  git-url: https://git.linaro.org/zephyrproject-org/zephyr.git
  git-commit: $git_commit
• 37. LAVA in ci.linaro.org
Idealised flow: Jenkins (ci.linaro.org) submits a job file to the LAVA instance (validation.linaro.org), which runs Deploy / Boot / Test on the Test Farm and outputs Results?
● In practice, LAVA jobs are submitted by the QA server, which acts as a proxy, not by ci.linaro.org: Jenkins (ci.linaro.org) -> linaro-cp / submit-for-qa -> QA Server -> submit-to-lava -> LAVA instance (validation.linaro.org) -> Test Farm (Boot, Test, Output) -> Results
● In either case LAVA is invoked via xmlrpc APIs (metadata?)
• 38. Invoking a LAVA job via xmlrpc
#!/usr/bin/python
import xmlrpclib

username = "bill.fletcher"
token = "<token string>"
hostname = "validation.linaro.org"
server = xmlrpclib.ServerProxy("https://%s:%s@%s/RPC2" % (username, token, hostname))

jobfile = open("zephyr_k64_job001.yaml")
jobtext = jobfile.read()
id = server.scheduler.submit_job(jobtext)
print server.scheduler.job_status(id)

The above is approximately equivalent to $ lava-tool submit-job ...
The API is documented here: https://validation.linaro.org/api/help/
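The slide's script targets Python 2 (xmlrpclib). As a sketch only, a Python 3 equivalent using xmlrpc.client with placeholder credentials; note that constructing the proxy performs no network I/O, so the actual submit call is shown commented out:

```python
import xmlrpc.client  # Python 3 rename of Python 2's xmlrpclib

username = "first.last"   # placeholder - your LAVA username
token = "tokenstring"     # placeholder - your authentication token
hostname = "validation.linaro.org"

# Building the proxy only parses the URL; the HTTPS request is
# made when a method such as scheduler.submit_job is invoked.
server = xmlrpc.client.ServerProxy(
    "https://%s:%s@%s/RPC2" % (username, token, hostname))

# jobtext = open("zephyr_k64_job001.yaml").read()
# job_id = server.scheduler.submit_job(jobtext)  # would perform the RPC
```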
• 39. Creating the jobfile on the fly - templates
Uses class string.Template(template)

import os
import sys
from string import Template

template_file_name = "lava-job-definitions/%s/template.yaml" % (args.device_type, )
test_template = None
if os.path.exists(template_file_name):
    test_template_file = open(template_file_name, "r")
    test_template = test_template_file.read()
    test_template_file.close()
else:
    sys.exit(1)

replace_dict = dict(
    build_url=args.build_url,
    test_url=args.test_url,
    device_type=args.device_type,
    board_name=args.board_name,
    test_name=args.test_name,
    git_commit=args.git_commit,
    gcc_variant=args.gcc_variant,
)

template = Template(test_template)
lava_job = template.substitute(replace_dict)
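The substitution step can be tried standalone; a minimal sketch with a cut-down inline template (the placeholder names match the replace_dict on the slide, the values are examples):

```python
from string import Template

# Cut-down job template using the same $-placeholders as the slide
test_template = (
    "device_type: '$device_type'\n"
    "job_name: 'zephyr-upstream $test_name'\n"
)

# Example values standing in for the Jenkins-provided args
replace_dict = dict(device_type="frdm-k64f", test_name="kernel_common")

# Fill in the placeholders to produce the YAML fragment
lava_job = Template(test_template).substitute(replace_dict)
```

substitute() raises KeyError if a placeholder is missing from the dict, which catches template/dict drift early; safe_substitute() would silently leave it in place.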
• 40. Job Templates - actions
actions:
- deploy:
    timeout:
      minutes: 3
    to: tmpfs
    images:
      zephyr:
        url: '$test_url'
- boot:
    method: pyocd
    timeout:
      minutes: 10
- test:
    timeout:
      minutes: 10
    monitors:
    - name: '$test_name'
      start: (tc_start()|starting .*test|BOOTING ZEPHYR OS)
      end: PROJECT EXECUTION
      pattern: (?P<result>(PASS|FAIL))\s-\s(?P<test_case_id>\w+)\.
      fixupdict:
        PASS: pass
        FAIL: fail
Maybe consider also including pattern: in the template, so that it tracks any changes in the test
• 41. Job Templates - general, timeouts & metadata
# Zephyr JOB definition for frdm-kw41z
device_type: '$device_type'
job_name: 'zephyr-upstream $test_name'
timeouts:
  job:
    minutes: 30
  action:
    minutes: 3
  actions:
    wait-usb-device:
      seconds: 40
priority: medium
visibility: public
<actions>
metadata:
  build-url: $build_url
  build-log: $build_url/consoleText
  zephyr-gcc-variant: $gcc_variant
  platform: $board_name
  git-url: https://git.linaro.org/zephyrproject-org/zephyr.git
  git-commit: $git_commit
  • 42. Thank You #HKG18 HKG18 keynotes and videos on: connect.linaro.org For further information: www.linaro.org