INTRODUCTION     TRANSLATION          REDUCTION   INFERENCE




                          GPUVerify
                Section 4 - Verification Method


                          Thomas Wood


                       November 28, 2012




INTRODUCTION



      Section 4 describes in detail the implementation of a verifier
      for the semantics detailed in the previous sections.




TRANSLATION




      A compiler from OpenCL/CUDA to an intermediate Boogie form, built
      on Clang/LLVM (a compiler toolchain).




SPECIALISED GPU FEATURES


      Although GPU languages and Boogie are both C-like,
      they extend C in different ways.
      In particular, GPU languages additionally support:
            Vector and image types
            Intrinsic functions supported by the hardware and
            compiler, e.g. advanced maths
      Writing Boogie translations for these features is
      time-consuming.
      (And apparently boring; the paper doesn’t say any more on this.)




BOOGIE AND FLOATS



            Boogie doesn’t support floating-point numbers directly.
            These are often used in GPU kernels.
            Floating-point operations are modelled using uninterpreted
            functions (functions defined only by their signature).
              We know something has been assigned, just not its value.
            This over-approximation could lead to false positives, but
            only one such case was discovered during evaluation.
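A minimal sketch of what "uninterpreted" means, not GPUVerify's actual encoding: the only guarantee is that equal arguments give equal results. The class and the name `FADD` are hypothetical, chosen for illustration.

```python
import itertools


class UninterpretedFunction:
    """A function known only by name and arity.

    Equal arguments always give the same (opaque) result; nothing else
    about the result is known.
    """

    _fresh = itertools.count()  # shared counter so every symbol is distinct

    def __init__(self, name, arity):
        self.name = name
        self.arity = arity
        self._results = {}

    def __call__(self, *args):
        if len(args) != self.arity:
            raise TypeError(f"{self.name} expects {self.arity} arguments")
        if args not in self._results:
            # Invent a fresh opaque symbol for an argument tuple not seen before.
            self._results[args] = f"{self.name}!{next(self._fresh)}"
        return self._results[args]


# Hypothetical stand-in for floating-point addition.
fadd = UninterpretedFunction("FADD", 2)
```

Here `fadd("x", "y")` always returns the same symbol, but `fadd("y", "x")` returns a different one: the model does not know that addition is commutative, which is the kind of lost fact that can produce a false positive.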




POINTER HANDLING



            Boogie doesn’t support pointers (because they get messy).
            GPU kernels often do less messy things with pointers than
            most C code.
            So, let’s assume that all pointers point within arrays, or are
            null, and that anything else is an error.
                (Variables can be modelled as single-element arrays.)
            So, pointers can be modelled as a pair: (base, offset).




POINTER SEMANTICS

      Translation rules of the pointer model are straightforward:
       Source       Generated Boogie
       p = A;       p = int_ptr(A_base, 0);
       p = q;       p = q;
       foo(p);      foo(p);
       p = q + 1;   p = int_ptr(q.base, q.offset + 1);
       p[e] = d;    if (p.base == A_base)
                      A[p.offset + e] = d;
                    else if (p.base == B_base)
                      B[p.offset + e] = d;
                    else assert(false);
       x = p[e];    if (p.base == A_base)
                      x = A[p.offset + e];
                    else if (p.base == B_base)
                      x = B[p.offset + e];
                    else assert(false);
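The same rules can be played out in ordinary code. This is only an illustrative sketch of the (base, offset) model, assuming a kernel with exactly two arrays; `Pointer`, `ptr_add`, `store` and `load` are hypothetical helper names, not part of GPUVerify.

```python
from collections import namedtuple

# A pointer is a pair: which array it points into, and where.
Pointer = namedtuple("Pointer", ["base", "offset"])

# Hypothetical kernel state: two arrays identified by their base names.
arrays = {"A_base": [0] * 8, "B_base": [0] * 8}


def ptr_add(q, k):
    """p = q + k keeps the base and moves the offset."""
    return Pointer(q.base, q.offset + k)


def store(p, e, d):
    """p[e] = d: dispatch on the base, anything else is an error."""
    if p.base == "A_base":
        arrays["A_base"][p.offset + e] = d
    elif p.base == "B_base":
        arrays["B_base"][p.offset + e] = d
    else:
        raise AssertionError("pointer does not point into a known array")


def load(p, e):
    """x = p[e]: the same dispatch, for reads."""
    if p.base == "A_base":
        return arrays["A_base"][p.offset + e]
    elif p.base == "B_base":
        return arrays["B_base"][p.offset + e]
    raise AssertionError("pointer does not point into a known array")
```

For example, `q = ptr_add(Pointer("A_base", 0), 1)` followed by `store(q, 2, 42)` writes element 3 of A, and a pointer with an unknown base hits the `assert(false)` case.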




BUT...


      ...if the program manipulates pointers in loops, the if...else if
      clauses make determining the loop invariants hard.

      One solution is to use points-to analysis (Steensgaard’s
      algorithm) to determine which arrays a pointer can possibly
      point to, and eliminate the impossible branches.

      Before:
        if (p.base == A_base)
          A[p.offset + e] = d;
        else if (p.base == B_base)
          B[p.offset + e] = d;
        else assert(false);

      After (p can only point into A):
        if (p.base == A_base)
          A[p.offset + e] = d;
        else assert(false);
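A heavily simplified, Steensgaard-style sketch of the idea, assuming the only pointer statements are `p = A` (A an array) and `p = q`. It is a unification-based toy, not the analysis as implemented in the tool; `UnionFind` and `may_point_to` are hypothetical names.

```python
class UnionFind:
    """Disjoint sets with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def may_point_to(assignments, arrays):
    """Map each pointer to the arrays it may point into.

    `assignments` is a list of (lhs, rhs) pairs for statements `lhs = rhs`,
    where rhs is either an array name or another pointer. Every assignment
    unifies the pointee classes of both sides (flow-insensitive).
    """
    uf = UnionFind()
    pointers = set()
    for lhs, rhs in assignments:
        pointers.add(lhs)
        if rhs in arrays:
            target = rhs
        else:
            pointers.add(rhs)
            target = ("*", rhs)  # node for "whatever rhs points to"
        uf.union(("*", lhs), target)
    return {
        p: {a for a in arrays if uf.find(a) == uf.find(("*", p))}
        for p in pointers
    }
```

With `p = A; q = p; r = B`, both p and q can only point into A, so the `B_base` branch can be dropped for them. Adding `q = r` merges everything into one class, which shows the imprecision that unification trades for speed.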




REDUCTION OF RACE- AND DIVERGENCE-CHECKING
TO SEQUENTIAL PROGRAM VERIFICATION




      Basics have already been discussed in lectures:
            Accesses to shared memory are instrumented with logging
            procedures
            Program transformed to model two arbitrary threads
            Checking procedures for race and barrier divergence
            introduced
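As a rough illustration of the logging and checking procedures: the first thread logs its accesses and the second thread checks its own accesses against that log. This sketch runs two concrete threads and keeps every logged offset, whereas the verifier reasons about two arbitrary threads symbolically; `AccessLog` and `has_write_write_race` are hypothetical names.

```python
class AccessLog:
    """Offsets of one array accessed by the first thread."""

    def __init__(self):
        self.reads = set()
        self.writes = set()

    def log_read(self, enabled, offset):
        if enabled:
            self.reads.add(offset)

    def log_write(self, enabled, offset):
        if enabled:
            self.writes.add(offset)

    def check_read(self, enabled, offset):
        """True if the second thread's read conflicts with a logged write."""
        return enabled and offset in self.writes

    def check_write(self, enabled, offset):
        """True if the second thread's write conflicts with any logged access."""
        return enabled and (offset in self.writes or offset in self.reads)


def has_write_write_race(index_of, tid1, tid2):
    """Model `A[index_of(tid)] = ...` executed by two distinct threads."""
    log = AccessLog()
    log.log_write(True, index_of(tid1))
    return log.check_write(True, index_of(tid2))
```

`A[tid] = ...` is race-free for threads 0 and 1, while `A[0] = ...` is not.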




AN OPEN QUESTION



      At the end of the last lecture, we decided that:
      P is correct ⇒ All terminating executions of K are free from
      data races and barrier divergence.

      But the converse does not hold:
      P may be incorrect even though all terminating executions of K
      are free from data races and barrier divergence. Why?




   Recall:
      Stmt        translate(Stmt, P)
      x = A[e];   LOG_READ_A(P$1, e$1);
                  CHECK_READ_A(P$2, e$2);
                  x$1 = P$1 ? * : x$1;
                  x$2 = P$2 ? * : x$2;

   Consider:
      if (A[0]) {
        A[tid + 1] = tid;
      } else {
        A[tid + 2] = tid;
      }




                Thread 0:                     Thread 1:
                if (false) {                  if (true) {
                  ...                           A[2] = 1;
                } else {                      } else {
                  A[2] = 0;                     ...
                }                             }

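      Because we’ve havoced away the shared state! In a real execution both
      threads read the same A[0] and take the same branch, but after havocing
      each thread may see a different value, so both can appear to write A[2].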
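The spurious race can be reproduced by brute force. A small sketch, assuming the kernel above and letting each thread's read of A[0] take any boolean value independently; `written_index` and `adversarial_races` are hypothetical names.

```python
from itertools import product


def written_index(tid, a0):
    """Index written by thread `tid` when its read of A[0] returned `a0`."""
    return tid + 1 if a0 else tid + 2


def adversarial_races(tid1, tid2):
    """Havoc each thread's read of A[0] independently.

    Returns the (value seen by thread 1, value seen by thread 2) pairs for
    which both threads write the same element.
    """
    return [
        (v1, v2)
        for v1, v2 in product([False, True], repeat=2)
        if written_index(tid1, v1) == written_index(tid2, v2)
    ]
```

For threads 0 and 1 the only offending combination is thread 0 seeing false and thread 1 seeing true, which is exactly the pair of executions shown above and can never occur on real hardware.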




ADVERSARIAL ABSTRACTION




            The strategy we’ve seen in lectures for shared state is
            adversarial abstraction: the shared state is thrown away
            and havoced.
            This over-approximation is fine when the shared state
            does not affect control flow. Otherwise, it
            gives false positives.




EQUALITY ABSTRACTION
            Both threads keep a shadow copy of the shared state.
            At a barrier, the shadow copies are set to arbitrary but
            equal values.
            On leaving the barrier, all threads have a consistent view
            of the shared state.


       Stmt         translate_a(Stmt, P)        translate_e(Stmt, P)
       x = A[e];    LOG_READ_A(P$1, e$1);       LOG_READ_A(P$1, e$1);
                    CHECK_READ_A(P$2, e$2);     CHECK_READ_A(P$2, e$2);
                    x$1 = P$1 ? * : x$1;        x$1 = P$1 ? A$1[e$1] : x$1;
                    x$2 = P$2 ? * : x$2;        x$2 = P$2 ? A$2[e$2] : x$2;
       A[e] = x;    LOG_WRITE_A(P$1, e$1);      LOG_WRITE_A(P$1, e$1);
                    CHECK_WRITE_A(P$2, e$2);    CHECK_WRITE_A(P$2, e$2);
                                                A$1[e$1] = P$1 ? x$1 : A$1[e$1];
                                                A$2[e$2] = P$2 ? x$2 : A$2[e$2];
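A small sketch of why equal shadow copies help, assuming a kernel that branches on A[0] and writes A[tid + 1] or A[tid + 2]. The value of A[0] is still arbitrary, but both threads read the same one; `written_index` and `equality_races` are hypothetical names.

```python
def written_index(tid, a0):
    """Index written by thread `tid` when its read of A[0] returned `a0`."""
    return tid + 1 if a0 else tid + 2


def equality_races(tid1, tid2):
    """Try every arbitrary-but-equal value of A[0] in the two shadow copies.

    Returns the shared values for which both threads write the same element.
    """
    races = []
    for shared in (False, True):
        a_1 = {0: shared}  # shadow copy read by the first thread
        a_2 = {0: shared}  # shadow copy read by the second thread
        if written_index(tid1, a_1[0]) == written_index(tid2, a_2[0]):
            races.append(shared)
    return races
```

For two distinct threads the result is empty: whichever branch is taken, both threads take it, so they write different elements.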




LIMITATIONS


            Unfortunately, equality abstraction is far less efficient
            than adversarial abstraction.
            GPUVerify only uses equality abstraction for the arrays
            that require it; this is determined using control
            dependence analysis.

            More complicated uses of the shared state, such as
            A[B[lid]] = ..., cannot be verified.

            This is because B[i] != B[j] cannot be established, as the
            side-effecting actions of other (prior) threads are not
            modelled.
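To see why, suppose B is treated as arbitrary after a barrier. The sketch below, which assumes a tiny value range so the search is finite, looks for contents of B under which two different threads index the same element of A; `possible_collision` is a hypothetical helper.

```python
from itertools import product


def possible_collision(num_threads, value_range):
    """Search arbitrary contents of B for two threads with B[i] == B[j].

    Returns (contents of B, i, j) for the first collision found, or None.
    """
    for b in product(range(value_range), repeat=num_threads):
        for i in range(num_threads):
            for j in range(i + 1, num_threads):
                if b[i] == b[j]:
                    return b, i, j
    return None
```

Even if earlier code set B[lid] = lid, that fact is lost in the abstraction, so a collision such as B = (0, 0) is considered possible and a race on A[B[lid]] is reported.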




INVARIANT INFERENCE



      To prove that a kernel is free from races and barrier divergence,
      the generated Boogie program must be verified.
      Verification depends on finding pre- and postconditions for the
      kernel, and loop invariants within it.
      GPUVerify generates a heuristically selected set of candidate
      invariants and uses the Houdini tool to remove invalid candidates
      from that set until all that remain can be proven.
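The remove-until-stable loop can be sketched as follows. This is a toy, assuming a single loop over one integer variable and checking candidates by brute-force enumeration of a small state space, where the real tool asks a theorem prover; `houdini` and all names in the example are hypothetical.

```python
def houdini(candidates, initial_states, step, universe):
    """Drop candidates until the remaining set is inductive.

    candidates: dict mapping a name to a predicate over states.
    step: maps a state to the list of its successor states (one iteration).
    A candidate survives if it holds initially and is preserved by `step`
    from every state satisfying all current candidates.
    """
    current = dict(candidates)
    changed = True
    while changed:
        changed = False
        for name, pred in list(current.items()):
            holds_initially = all(pred(s) for s in initial_states)
            preserved = all(
                pred(t)
                for s in universe
                if all(p(s) for p in current.values())
                for t in step(s)
            )
            if not (holds_initially and preserved):
                del current[name]
                changed = True
    return set(current)


# Example loop: i = 0; while (i < 10) i += 2;
def loop_step(i):
    return [i + 2] if i < 10 else []


example_candidates = {
    "nonneg": lambda i: i >= 0,
    "even": lambda i: i % 2 == 0,
    "small": lambda i: i <= 4,  # wrong guess, should be removed
    "at_most_10": lambda i: i <= 10,  # only provable together with "even"
}

proven = houdini(example_candidates, [0], loop_step, range(-20, 40))
```

Here `proven` keeps `nonneg`, `even` and `at_most_10` and drops `small`. Note that `at_most_10` is only inductive because `even` is also in the set, which is why candidates are checked together rather than one at a time.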




MEMORY STRUCTURE HEURISTICS




      The invariant heuristics discussed in the paper target
      common patterns for structuring data in arrays.
      For example, if A[lid + C] = ... occurs in a loop, then a
      candidate invariant is
      WR_exists_A ⇒ WR_elem_A − C == lid
      (if a write to A has been logged, the logged offset minus C is lid).
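A quick way to get a feel for the candidate is to run one thread's loop and check it at every loop head. This sketch assumes the log always records the most recent write, which is simpler than the tool's logging; `candidate_holds` and `write_offset` are hypothetical names.

```python
def candidate_holds(write_offset, lid, c, iterations):
    """Check `wr_exists => wr_elem - c == lid` at each loop head and at exit.

    write_offset(lid, i) gives the index written to A in iteration i.
    """
    wr_exists, wr_elem = False, None
    for i in range(iterations):
        # Loop head: the candidate must hold before the body runs.
        if wr_exists and wr_elem - c != lid:
            return False
        # Loop body: A[write_offset(lid, i)] = ...; log the write.
        wr_exists, wr_elem = True, write_offset(lid, i)
    return (not wr_exists) or (wr_elem - c == lid)
```

With `write_offset = lambda lid, i: lid + 4` and `c = 4` the candidate holds for any number of iterations. With `lambda lid, i: lid + i` it fails, and Houdini would discard it.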
