SequenceL Auto-Parallelizing Toolset Intro slideshare

An Introduction to SequenceL
Auto-Parallelizing Programming Language and Toolset
www.texasmulticore.com
Brad Nemanich, PhD
Chief Technology Officer

Why is SequenceL Needed?
“The way the processor industry is going is
to add more and more cores, but nobody
knows how to program those things. I mean,
two, yeah; four, not really; eight, forget it.”
– Steve Jobs
© 2015 Texas Multicore Technologies, Inc. All Rights Reserved
This shift now affects every software company, large enterprise, and government agency that develops software.

Current (Manual) Approach to Multicore Programming
8 “Simple” Rules for Designing Threaded Applications
1. Be sure you identify truly independent computations.
2. Implement concurrency at the highest level possible.
3. Plan early for scalability to take advantage of increasing numbers of cores.
4. Make use of thread-safe libraries wherever possible.
5. Use the right threading model.
6. Never assume a particular order of execution.
7. Use thread-local storage whenever possible; associate locks with specific data, if needed.
8. Don’t be afraid to change the algorithm for a better chance of concurrency.
(0. Hire a team of “Parallel Ninjas”, PhD experts in computer architecture.)

“The significant problems we face cannot be solved using
the same level of thinking we used when we created them.”
– Albert Einstein

“Parallel Ninja” Approach Does Not Scale
• How do you:
─ find them?
─ afford them?
─ retain them?
─ support rapid innovation?
─ ensure accuracy and correctness?
─ keep them current on platform technologies?
─ do this for all your software?
Einstein was right:
there’s a much better way…

It’s Time to Change the Game (Again)
[Timeline figure: Wiring/Netlist → Machine Code (1949) → Assembly Language (1954) → HLL + Compiler: Fortran, COBOL, PL/I, Lisp, C, … (1957) → Object Oriented: Smalltalk, C++, Java, C# (1980) → 2004: Multicore → Functional, Auto-Parallelizing (2014)]

SequenceL is a Game Changer
• Faster Performance; Uses all cores, GPUs
• 10X Faster Time to Innovation/Market
• Get it Right the First Time
• Quickly Leverage New Computing Platforms
• Built Upon Open Industry Standards; Works with Existing Tools & Methodologies

Customer Example: Industrial Control Networking
(WirelessHART, IEC 62591, IEEE 802.15.4)
• New algorithm, developed for large, noisy industrial process control environments
─ Presented a white paper to IEEE
─ Won an award
• Asked TMT to implement it for comparison purposes
─ Finished in SequenceL in 3 weeks
   • 10X faster performance and right the first time
─ Java version finished by the inventors in 3 months
   • Had errors and was much slower; the SequenceL code was used to debug the Java
   • Another month getting the code correct
   • A 5th month improving performance, which still fell short
• Bottom line
─ SL was finished in 15% of the time
─ SL was correct the first time
─ SL outperformed the Java code 1.5x-3.0x on a 2-core AMD APU
─ Robust and fast code, fast time to market

Customer Example: Video Processing Using SequenceL
• Goal: 30 Hz, to keep up with the input video feed
• Best performance (8-core x86 platform)
─ 58 Hz: SequenceL
─ 21 Hz: Matlab (Interpreter)
─ 1.2 Hz: Matlab (Coder/C-out)
[Figure: input video feed (e.g., Apache helicopter gyro camera) and processed video (proprietary algorithms remove air turbulence, radiated heat, etc.)]


What is SequenceL?
SequenceL is a…
• High-Abstraction
• Functional
• Self-Parallelizing
…programming language and toolset,
…designed to work in concert with other popular programming languages and tools

High-Abstraction, High Performance
• Most common programming languages are imperative
─ A detailed sequence of commands for carrying out the computation; i.e., tell the computer both “what” to do and “how” to do it
─ Inherently sequential, written for classic von Neumann computers
─ e.g., C/C++, Java, C#, Python, Fortran
─ Some add explicit “directives” to manually enable low-level parallelism
• SequenceL is declarative & functional – higher abstraction
─ Describe the desired output in terms of the input, as functions; i.e., tell the computer only “what” to do, so there is no need to think about parallelism
─ Abstracts away complex multicore and many-core platforms
• The best analogy is the SQL database language
─ A programmer could write their own database procedures in low-level C
─ But those procedures would be error-prone and would not perform as well as Oracle or DB2

Drops Into Your Current Design Flow
• Designed to work in concert with other programming languages, legacy code, and libraries
• Additive: works with existing design flows, tools, and training
• Builds upon open industry standards

Drops Into Your Current Design Flow
• Adds a multicore “power tool” to the programmer’s toolbox
• Complete add-on solution
─ IDE plug-ins, debugger, interpreter, auto-parallelizing compiler, runtime environment
• Easy to modernize legacy applications
─ Parallel C++ output enables just a portion of an application to be refactored in SequenceL and linked in
─ Uses vector (SIMD) processor instructions
─ Automatic OpenCL generation avoids the need to learn and incorporate low-level CUDA or OpenCL code, and the associated scaffolding, to exploit systems with (GP)GPUs
─ Often faster to refactor portions of code in SequenceL than to find and fix bugs in old code

The Problem With Directive-Based Programming
Example: 3-body problem
//P1
a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
dv1 = a1*dt;
v1 = v1 + dv1;
dp1 = v1*dt;
//P2
a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
dv2 = a2*dt;
v2 = v2 + dv2;
dp2 = v2*dt;
//P3
a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
dv3 = a3*dt;
v3 = v3 + dv3;
dp3 = v3*dt;
Each body can be calculated at the same time, giving in theory a 3x speedup.
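Amdahl's law makes the "in theory" qualifier precise. Here p is the fraction of the work that can run in parallel and N is the number of cores. The 3x figure corresponds to p = 1: the three bodies' updates make up all of the work and take equal time. The p = 0.9 line is a hypothetical value, included only to show how quickly a serial fraction erodes the bound.

```latex
\begin{aligned}
S(N) &= \frac{1}{(1 - p) + p/N} \\
p = 1,\ N = 3 &: \quad S(3) = \frac{1}{0 + 1/3} = 3 \\
p = 0.9,\ N = 3 &: \quad S(3) = \frac{1}{0.1 + 0.3} = 2.5
\end{aligned}
```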

#pragma omp parallel
#pragma omp single nowait
{
#pragma omp task
{
a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
dv1 = a1*dt;
v1 = v1 + dv1;
dp1 = v1*dt;
}
#pragma omp task
{
a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
dv2 = a2*dt;
v2 = v2 + dv2;
dp2 = v2*dt;
}
#pragma omp task
{
a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
dv3 = a3*dt;
v3 = v3 + dv3;
dp3 = v3*dt;
}
#pragma omp taskwait
}
Using directive-based approaches like OpenMP, the burden is on the programmer to identify where the program can be safely parallelized. The programmer then has to add the correct pragmas.
But maybe you could parallelize other things…

#pragma omp parallel
#pragma omp single nowait
{
#pragma omp task
g1 = grav(P1, P2, m2);
#pragma omp task
g2 = grav(P1, P3, m3);
#pragma omp task
g3 = grav(P2, P1, m1);
#pragma omp task
g4 = grav(P2, P3, m3);
#pragma omp task
g5 = grav(P3, P2, m2);
#pragma omp task
g6 = grav(P3, P1, m1);
#pragma omp taskwait
}
a1 = g1 + g2;
dv1 = a1*dt;
v1 = v1 + dv1;
dp1 = v1*dt;
a2 = g3 + g4;
dv2 = a2*dt;
v2 = v2 + dv2;
dp2 = v2*dt;
a3 = g5 + g6;
dv3 = a3*dt;
v3 = v3 + dv3;
dp3 = v3*dt;
But now you have to start rearranging the code, moving further away from the original description of the algorithm.
Possible Race Conditions!
If the grav function modifies its inputs or calls non-thread-safe functions, there could be hard-to-detect race conditions, leading to incorrect results.

SequenceL: Self-Parallelizes, Race-Free, Readable
threeBody(P1, v1, m1, P2, v2, m2, P3, v3, m3, dt) :=
let
a1 := grav(P1, P2, m2) + grav(P1, P3, m3);
dv1 := a1*dt;
nv1 := v1 + dv1;
dp1 := nv1*dt;
a2 := grav(P2, P1, m1) + grav(P2, P3, m3);
dv2 := a2*dt;
nv2 := v2 + dv2;
dp2 := nv2*dt;
a3 := grav(P3, P2, m2) + grav(P3, P1, m1);
dv3 := a3*dt;
nv3 := v3 + dv3;
dp3 := nv3*dt;
in
[dp1, dp2, dp3];
With SequenceL, the programmer does not add any parallel constructs or pragmas. The program self-parallelizes where it is safe to do so (no race conditions).
Code clarity and intent remain, greatly improving correctness and quality.
Subsequent enhancements and innovations are rapid.
This ease of reading/writing is not by accident.

Ease of Reading/Writing SequenceL
• Matrix Multiply:
─ The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are given by:
$$(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$$

High-Abstraction, High Performance
[Figure: Matrix Multiply Acceleration; speedup (X, 0 to 70) vs. cores (C++ reference, 1, 2, 4, 8, 16, 32); reference = sequential C++]
• Parallel Matrix Multiply in SequenceL:

Sample SequenceL Performance Speedups
[Figure: speedup (times faster, 0 to 12) vs. number of processor cores (0 to 16) for Matrix Multiply, Game Of Life, 2D FFT, LU factorization, QuickSort, String Search, Barnes-Hut, n-Body, Matrix Inverse, Sparse Matrix, Compression, Adesk (DC), Adesk (LW), Matrix Multiply (blocking), Semblance, Speech filter, and “Perfect”]

To learn more:
Watch a short 3-part video tutorial at:
http://www.texasmulticoretechnologies.com/resources/videos/
Email sales@texasmulticore.com for a free 45-day trial.
www.texasmulticore.com
