GPUVerify - Implementation

Introduction, Translation, Reduction, Inference

GPUVerify
Section 4 - Verification Method

Thomas Wood

November 28, 2012


Introduction

Section 4 describes in detail the implementation of a verifier
for the semantics detailed in the previous sections.


Translation

A compiler from OpenCL/CUDA to intermediate Boogie, built
on Clang/LLVM (a compiler toolchain).


Specialised GPU Features

Although GPU languages and Boogie are both C-like,
they extend C in different ways.
In particular, GPU languages additionally support:
- Vector and image types
- Intrinsic functions supported by the hardware and
  compiler, e.g. advanced maths
Writing translations of these features for Boogie is
time-consuming.
(And apparently boring; the paper doesn't say any more on this.)


Boogie and Floats

Boogie doesn't support floating-point numbers directly.
These are often used in GPU kernels.
They are modelled using uninterpreted functions (functions
defined only by their signature).
We know something has been assigned, just not its value.
This over-approximation could lead to false positives, but only
one such case was discovered during evaluation.
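
One way to picture an uninterpreted function is as a black box that promises only to return the same result for the same arguments. The Python sketch below illustrates that idea; it is not GPUVerify's encoding, and the class `Uninterpreted` and the symbol `FADD32` are names made up for this example.

```python
import itertools

class Uninterpreted:
    """A function known only by name and arity: equal arguments give equal results."""

    _fresh = itertools.count()  # source of arbitrary, otherwise meaningless values

    def __init__(self, name, arity):
        self.name, self.arity = name, arity
        self._table = {}

    def __call__(self, *args):
        assert len(args) == self.arity
        if args not in self._table:
            # First time we see these arguments: pick a fresh opaque result.
            self._table[args] = f"{self.name}#{next(self._fresh)}"
        return self._table[args]

FADD32 = Uninterpreted("FADD32", 2)   # hypothetical stand-in for float addition
a = FADD32("x", "y")
b = FADD32("x", "y")
c = FADD32("y", "x")

same_args_agree = (a == b)       # functional consistency is all that is known
known_commutative = (a == c)     # nothing relates FADD32(x, y) to FADD32(y, x)
```

Because facts such as commutativity are not available to the verifier, a kernel whose safety depends on them can be rejected even though it is correct, which is the source of the false positive mentioned above.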


Pointer Handling

Boogie doesn't support pointers (because they get messy).
GPU kernels often do less messy things with pointers than
most C code.
So, let's assume that all pointers point within arrays, or are
null, and that anything else is an error.
(Variables can be modelled as single-element arrays.)
So, pointers can be modelled as a pair: (base, offset).


Pointer Semantics

The translation rules of the pointer model are straightforward:

  Source        Generated Boogie
  p = A;        p = int_ptr(A_base, 0);
  p = q;        p = q;
  foo(p);       foo(p);
  p = q + 1;    p = int_ptr(q.base, q.offset + 1);
  p[e] = d;     if (p.base == A_base)
                  A[p.offset + e] = d;
                else if (p.base == B_base)
                  B[p.offset + e] = d;
                else assert(false);
  x = p[e];     if (p.base == A_base)
                  x = A[p.offset + e];
                else if (p.base == B_base)
                  x = B[p.offset + e];
                else assert(false);
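
The rules above can be mimicked in a few lines of Python. This is a sketch of the (base, offset) model only, assuming a kernel with two arrays A and B; the helper names `add`, `store` and `load` are invented for the example and are not part of GPUVerify.

```python
from collections import namedtuple

Ptr = namedtuple("Ptr", ["base", "offset"])      # the (base, offset) pair

# Hypothetical kernel arrays A and B, keyed by their base identifiers.
arrays = {"A_base": [0] * 4, "B_base": [0] * 4}

def add(q, k):
    """p = q + k  becomes  p = int_ptr(q.base, q.offset + k)."""
    return Ptr(q.base, q.offset + k)

def store(p, e, d):
    """p[e] = d: dispatch on the base; any other base is an error."""
    if p.base == "A_base":
        arrays["A_base"][p.offset + e] = d
    elif p.base == "B_base":
        arrays["B_base"][p.offset + e] = d
    else:
        assert False, "pointer does not point into a known array"

def load(p, e):
    """x = p[e]: the same dispatch, for reads."""
    if p.base == "A_base":
        return arrays["A_base"][p.offset + e]
    elif p.base == "B_base":
        return arrays["B_base"][p.offset + e]
    assert False, "pointer does not point into a known array"

p = Ptr("A_base", 0)   # p = A;
q = add(p, 1)          # q = p + 1;
store(q, 2, 7)         # q[2] = 7;  writes A[3]
x = load(p, 3)         # x = p[3];  reads A[3]
```

A null or stray pointer has a base that matches no array, so it falls through to the final assertion, just as in the generated Boogie.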


But...

...if the program manipulates pointers in loops, the if...else if
clauses make determining the loop invariants hard.

One solution is to use points-to analysis (Steensgaard's
algorithm) to determine which arrays a pointer can possibly
point to, and eliminate the impossible branches:

  if (p.base == A_base)               if (p.base == A_base)
    A[p.offset + e] = d;                A[p.offset + e] = d;
  else if (p.base == B_base)     →    else assert(false);
    B[p.offset + e] = d;
  else assert(false);
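
A minimal sketch of a unification-based (Steensgaard-style) points-to analysis follows. It assumes the restricted setting of these slides, where pointers only ever point into arrays (no pointers to pointers), so it is much simpler than the general algorithm. The class and method names are hypothetical.

```python
class PointsTo:
    def __init__(self):
        self.parent = {}    # union-find forest over abstract locations
        self.pointee = {}   # pointer variable -> location it points to

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a

    def target(self, p):
        """Class of locations p may point to (a fresh placeholder if unseen)."""
        if p not in self.pointee:
            self.pointee[p] = ("loc", p)
        return self.find(self.pointee[p])

    def assign_array(self, p, array):      # p = A;
        self.union(self.target(p), array)

    def assign_pointer(self, p, q):        # p = q;  or  p = q + k;
        self.union(self.target(p), self.target(q))

    def may_point_to(self, p, arrays):
        t = self.target(p)
        return {a for a in arrays if self.find(a) == t}

pt = PointsTo()
pt.assign_array("p", "A")      # p = A;
pt.assign_array("q", "B")      # q = B;
pt.assign_pointer("r", "p")    # r = p + 1;
p_targets = pt.may_point_to("p", ["A", "B"])
r_targets = pt.may_point_to("r", ["A", "B"])
q_targets = pt.may_point_to("q", ["A", "B"])
```

Here `p` and `r` can only reach `A`, so the `B_base` branch of every access through them can be dropped. Unification is deliberately coarse: a later `p = q;` would merge the classes of `A` and `B` for all three pointers.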


Reduction of Race- and Divergence-Checking
to Sequential Program Verification

The basics have already been discussed in lectures:
- Accesses to shared memory are instrumented with logging
  procedures
- The program is transformed to model two arbitrary threads
- Checking procedures for races and barrier divergence are
  introduced
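
The log-and-check idea can be illustrated with a concrete simulation. The sketch below is not the symbolic Boogie encoding: it runs one thread while logging its accesses, then runs a second thread while checking against the log, and tries every ordered pair of threads in a small group in place of "two arbitrary threads". `AccessLog`, `has_race` and the two toy kernels are names invented for the example.

```python
import itertools

class RaceError(Exception):
    """Raised when the checked thread conflicts with a logged access."""

class AccessLog:
    def __init__(self):
        self.reads, self.writes = set(), set()

    # The `enabled` flag plays the role of the predicate P$1 / P$2.
    def log_read(self, enabled, offset):
        if enabled:
            self.reads.add(offset)

    def log_write(self, enabled, offset):
        if enabled:
            self.writes.add(offset)

    def check_read(self, enabled, offset):
        if enabled and offset in self.writes:
            raise RaceError(f"read/write race at offset {offset}")

    def check_write(self, enabled, offset):
        if enabled and (offset in self.reads or offset in self.writes):
            raise RaceError(f"write race at offset {offset}")

def kernel_shift(tid, read, write):
    read(tid + 1)    # x = A[tid + 1];
    write(tid)       # A[tid] = x;

def kernel_own(tid, read, write):
    read(tid)        # x = A[tid];
    write(tid)       # A[tid] = x;

def has_race(kernel, n_threads):
    for t1, t2 in itertools.permutations(range(n_threads), 2):
        log = AccessLog()
        kernel(t1, lambda o: log.log_read(True, o), lambda o: log.log_write(True, o))
        try:
            kernel(t2, lambda o: log.check_read(True, o), lambda o: log.check_write(True, o))
        except RaceError:
            return True
    return False

shift_races = has_race(kernel_shift, 4)
own_races = has_race(kernel_own, 4)
```

In `kernel_shift`, thread 0 reads `A[1]` while thread 1 writes it, so the check fails; in `kernel_own` every thread stays within its own element.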


An Open Question

At the end of the last lecture, we decided that:
P is correct ⇒ All terminating executions of K are free from
data races and barrier divergence.

But:
We might have P incorrect, but all terminating executions of K
free from data races and barrier divergence. Why?


Recall:

  Stmt         translate(Stmt, P)
  x = A[e];    LOG_READ_A(P$1, e$1);
               CHECK_READ_A(P$2, e$2);
               x$1 = P$1 ? * : x$1;
               x$2 = P$2 ? * : x$2;

Consider:

  if (A[0]) {
    A[tid + 1] = tid;
  } else {
    A[tid + 2] = tid;
  }

  Thread 0:          Thread 1:
  if (false) {       if (true) {
    ...                A[2] = 1;
  } else {           } else {
    A[2] = 0;          ...
  }                  }

Because we’ve havoced away the shared state!
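
The spurious race in the kernel above can be found by brute force. In this sketch each thread's read of `A[0]` is replaced by an independently chosen Boolean, which is what the `*` in the translation amounts to; the function names are invented for the example.

```python
import itertools

def written_offset(tid, a0):
    """Offset written by thread `tid` when its read of A[0] returns `a0`."""
    return tid + 1 if a0 else tid + 2

def adversarial_races(tid1, tid2):
    """Each thread's read of A[0] is havoced independently of the other's."""
    return [(v1, v2)
            for v1, v2 in itertools.product([False, True], repeat=2)
            if written_offset(tid1, v1) == written_offset(tid2, v2)]

spurious = adversarial_races(0, 1)
```

The only colliding combination is thread 0 seeing `false` and thread 1 seeing `true`, both writing `A[2]`. No real execution behaves this way, because both threads read the same `A[0]`.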


Adversarial Abstraction

The strategy we've seen in lectures for shared state is
adversarial abstraction: the shared state is thrown away
and havoced.
This over-approximation is fine for cases where the shared
state does not affect the control flow. Otherwise, it
gives false positives.


Equality Abstraction

Both threads keep a shadow copy of the shared state.
At a barrier, the shadow copies are set to be arbitrary, but
equal.
On leaving the barrier, all threads have a consistent view
of the shared state.

  Stmt         translate_a(Stmt, P)         translate_e(Stmt, P)
  x = A[e];    LOG_READ_A(P$1, e$1);        LOG_READ_A(P$1, e$1);
               CHECK_READ_A(P$2, e$2);      CHECK_READ_A(P$2, e$2);
               x$1 = P$1 ? * : x$1;         x$1 = P$1 ? A$1[e$1] : x$1;
               x$2 = P$2 ? * : x$2;         x$2 = P$2 ? A$2[e$2] : x$2;

  A[e] = x;    LOG_WRITE_A(P$1, e$1);       LOG_WRITE_A(P$1, e$1);
               CHECK_WRITE_A(P$2, e$2);     CHECK_WRITE_A(P$2, e$2);
                                            A$1[e$1] = P$1 ? x$1 : A$1[e$1];
                                            A$2[e$2] = P$2 ? x$2 : A$2[e$2];
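
As a rough illustration of why equal shadow copies help, the sketch below replays the `if (A[0])` kernel from the open-question slide with two shadow arrays that start out arbitrary but equal, as they would after a barrier. It is a concrete simulation under that assumption, not the Boogie encoding, and `run_pair` is a hypothetical helper.

```python
def run_pair(tid1, tid2, a0):
    """Run the two modelled threads after a barrier that set A$1[0] == A$2[0] == a0.

    Returns True if both threads write the same offset (a write-write race).
    """
    shadow1 = {0: a0}    # thread 1's shadow copy A$1 (only element 0 matters here)
    shadow2 = {0: a0}    # thread 2's shadow copy A$2: arbitrary, but equal to A$1
    writes = []
    for tid, shadow in ((tid1, shadow1), (tid2, shadow2)):
        # The branch condition now reads the thread's own shadow copy.
        offset = tid + 1 if shadow[0] else tid + 2
        shadow[offset] = tid             # A$k[e$k] = x$k
        writes.append(offset)
    return writes[0] == writes[1]

races = [(t1, t2, a0)
         for t1 in range(8) for t2 in range(8) if t1 != t2
         for a0 in (False, True)
         if run_pair(t1, t2, a0)]
```

Whatever value `A[0]` takes, both threads now take the same branch, so distinct threads always write distinct offsets and no race is reported.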


Limitations

Unfortunately, Equality Abstraction is far less efficient
than Adversarial Abstraction.
GPUVerify therefore uses Equality Abstraction only for the
arrays that require it, as determined by control
dependence analysis.

More complicated uses of the shared state, such as
A[B[lid]] = ..., cannot be verified.

This is because B[i] != B[j] cannot be established, as the
side-effecting actions of other (prior) threads are not
modelled.


Invariant Inference

To prove that code is free of races and barrier divergence,
the generated Boogie program must be verified.
Verification depends on finding pre- and postconditions for the
kernel, and loop invariants within it.
GPUVerify generates a heuristically selected set of candidate
invariants and uses the Houdini tool to remove invalid candidates
from that set until all that remain can be proven.
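
The Houdini loop can be sketched in Python. The real tool asks the verifier which candidates fail; this sketch swaps in a brute-force search over a small, finite state space as the oracle, and the loop, the candidates and the function `houdini` are all invented for illustration.

```python
def houdini(candidates, init_states, states, guard, step):
    """Return the names of the candidates that form an inductive invariant together."""
    # Drop candidates that do not even hold on loop entry.
    kept = {n: c for n, c in candidates.items() if all(c(s) for s in init_states)}
    changed = True
    while changed:
        changed = False
        for s in states:
            # Assume the conjunction of all surviving candidates before the body.
            if guard(s) and all(c(s) for c in kept.values()):
                t = step(s)
                for name in [n for n, c in kept.items() if not c(t)]:
                    del kept[name]       # refuted: not preserved by the loop body
                    changed = True
    return set(kept)

# Toy loop:  i = 0; while (i < 10) i += 2;
candidates = {
    "nonneg": lambda i: i >= 0,
    "even":   lambda i: i % 2 == 0,
    "bound":  lambda i: i <= 10,
    "zero":   lambda i: i == 0,
    "small":  lambda i: i <= 4,
}
invariants = houdini(candidates, init_states=[0], states=range(-5, 30),
                     guard=lambda i: i < 10, step=lambda i: i + 2)
```

`zero` and `small` are refuted and removed, while `bound` survives only because `even` is assumed alongside it; taken on its own it would be broken by the step from 9 to 11. This is why the candidates are checked as a conjunction.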


Memory Structure Heuristics

The invariant heuristics discussed in the paper target
common ways of structuring data in arrays.
For example, if A[lid + C] = ... occurs in a loop, then a
candidate invariant is
WR_EXISTS_A ⇒ WR_ELEM_A − C == lid.
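
To make the candidate concrete, the sketch below simulates one thread's loop and a write log that tracks at most one write, then tests the candidate after every iteration for every choice of tracked write. It assumes that WR_EXISTS_A means "a write to A has been logged" and WR_ELEM_A is the logged offset; `candidate_holds` and the two access patterns are hypothetical.

```python
def candidate_holds(offset_of, lid, C, iterations):
    """Check  WR_EXISTS_A => WR_ELEM_A - C == lid  as a loop invariant."""
    # The log tracks either no write (None) or the write of one chosen iteration.
    for tracked in [None] + list(range(iterations)):
        wr_exists, wr_elem = False, None
        for i in range(iterations):
            offset = offset_of(lid, i)        # A[offset] = ...
            if i == tracked:
                wr_exists, wr_elem = True, offset
            if wr_exists and wr_elem - C != lid:
                return False                  # candidate violated after iteration i
    return True

fixed_offset = candidate_holds(lambda lid, i: lid + 3, lid=5, C=3, iterations=4)
loop_varying = candidate_holds(lambda lid, i: lid + i, lid=5, C=3, iterations=4)
```

The candidate holds for the `A[lid + C]` pattern and fails when the offset varies with the loop counter, in which case Houdini would simply discard it.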
