Ateji PX: Java Parallel Programming Made Simple
© Ateji – All rights reserved.

Ateji – the Company
  • Specialized in parallelism and language technologies
  • Founded by Patrick Viry in 2005
  • Java extensions for optimization (OptimJ, 2008) and parallelism (Ateji PX, 2010)
  • January 2010: first round of investment
  • Ateji PX selected as a Disruptive Technology during SC10
  • Member of HiPEAC and OpenGPU

The Grand Challenge: Parallel Programming for All Application Developers
  [Figure labels: 2008 (4 cores), 2010 (100 cores), enterprise servers]

Why Java?
  Increasingly used for HPC because:
  • Most popular language today
  • Good runtime performance
  • Much better productivity and code quality
  • Faster time-to-market, fewer bugs, less maintenance
  • Much easier staffing
  Used in aerospace, bioinformatics, physics, finance, data mining, statistics, ...
  Details and references in our latest blog posting: ateji.blogspot.com

How to parallelize Java code?

Sequential code:

    for(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

With threads:

    final int nThreads = Runtime.getRuntime().availableProcessors();
    final int blockSize = I / nThreads;
    Thread[] threads = new Thread[nThreads];
    for(int n=0; n<nThreads; n++) {
        final int finalN = n;
        threads[n] = new Thread() {
            public void run() {
                // each thread handles one block of rows; the last one also takes the remainder
                final int beginIndex = finalN*blockSize;
                final int endIndex = (finalN == (nThreads-1)) ? I : (finalN+1)*blockSize;
                for(int i=beginIndex; i<endIndex; i++) {
                    for(int j=0; j<J; j++) {
                        for(int k=0; k<K; k++) {
                            C[i][j] += A[i][k] * B[k][j];
                        }
                    }
                }
            }
        };
        threads[n].start();
    }
    // wait for all threads to finish
    for(int n=0; n<nThreads; n++) {
        try {
            threads[n].join();
        } catch (InterruptedException e) {
            System.exit(-1);
        }
    }

With Ateji PX:

    for||(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

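
For comparison, the same row-wise split can be sketched in standard Java without any language extension. This is not Ateji PX and not taken from the slides: it swaps in `java.util.stream.IntStream` with `parallel()`, and the class name `ParallelMatMul` and the `multiply` method are hypothetical names chosen for illustration. Each row of the result is written by exactly one task, so no locking is needed.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Ateji PX.
class ParallelMatMul {
    // Multiplies a (n x m) by b (m x p), running the outer loop over rows in parallel.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        IntStream.range(0, n).parallel().forEach(i -> {
            // Row i of c is only touched by this task.
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < m; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

Only the outer loop is parallelized, mirroring the single `for||` in the Ateji PX version; the inner loops stay sequential.
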
It’s easy AND efficient: 12.5x speedup on 16 cores
  See the whitepaper on www.ateji.com/px

  Ateji PX:

    for||(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

“The problem with threads” [Technical Report, Edward A. Lee, EECS Berkeley]
  • Threads are a hardware-level concept, not a practical abstraction for programming
  • Threads do not compose
  • Code correctness requires intricate thinking and inspection of the whole program
  • Most multi-threaded programs have bugs ... and debuggers do not help
  Not an option for most application programmers!

Introducing Parallelism at the Language Level
  • Sequential composition operator: “;”
  • Parallel composition operator: “||”

  “Hello World!”

    [
      || System.out.println("Hello");
      || System.out.println("World");
    ]

  Runs the two branches in parallel and waits for both to terminate.
  Prints either “Hello” then “World”, or “World” then “Hello”.

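
As a rough analogy in plain Java, a parallel block behaves like starting one thread per branch and joining them all before continuing. The sketch below is an assumption about how one might emulate that behaviour by hand, not a description of what the Ateji PX compiler generates; `ParallelBlock` and `runAll` are hypothetical names.

```java
// Hypothetical illustration class; not part of Ateji PX.
class ParallelBlock {
    // Runs every branch in its own thread and returns only when all have terminated.
    static void runAll(Runnable... branches) {
        Thread[] threads = new Thread[branches.length];
        for (int i = 0; i < branches.length; i++) {
            threads[i] = new Thread(branches[i]);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two lines may appear in either order.
        runAll(
            () -> System.out.println("Hello"),
            () -> System.out.println("World"));
    }
}
```
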
Data Parallelism
  Same operation on all elements:

    [
      // quantified branches
      || (int i : N) array[i]++;
    ]

  Multiple dimensions and filters, e.g. update the upper left triangle of a matrix:

    [
      || (int i:N, int j:N, i+j<N) m[i][j]++;
    ]

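
The quantified branches above can be approximated in standard Java with a parallel index stream, with the filter `i+j<N` becoming a `filter` step. This is a sketch under stated assumptions: it uses `java.util.stream.IntStream` rather than Ateji PX, and `QuantifiedBranches`, `incrementAll`, and `incrementUpperLeft` are hypothetical names.

```java
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Ateji PX.
class QuantifiedBranches {
    // One logical branch per element: array[i]++ for every index i.
    static void incrementAll(int[] array) {
        IntStream.range(0, array.length).parallel().forEach(i -> array[i]++);
    }

    // Two dimensions plus a filter: m[i][j]++ for all i, j with i + j < n.
    // The pair (i, j) is encoded as a single index idx = i * n + j.
    static void incrementUpperLeft(int[][] m) {
        int n = m.length;
        IntStream.range(0, n * n).parallel()
            .filter(idx -> idx / n + idx % n < n)
            .forEach(idx -> m[idx / n][idx % n]++);
    }
}
```

Every element is written by at most one task, so the updates do not race with each other.
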
Task Parallelism

    int fib(int n) {
        if(n <= 1) return 1;
        int fib1, fib2;
        [
          || fib1 = fib(n-1);
          || fib2 = fib(n-2);
        ];
        return fib1 + fib2;
    }

  Note the recursion: || is compatible with all language constructs.

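
For readers without Ateji PX, the closest standard-library counterpart to this recursive pattern is probably the fork/join framework. The sketch below assumes that mapping; `ForkJoinFib` is a hypothetical class name, and the base case returns 1 for `n <= 1` exactly as on the slide.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration class; not part of Ateji PX.
class ForkJoinFib extends RecursiveTask<Integer> {
    private final int n;

    ForkJoinFib(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n <= 1) return 1;                 // same base case as the slide
        ForkJoinFib left = new ForkJoinFib(n - 1);
        ForkJoinFib right = new ForkJoinFib(n - 2);
        left.fork();                          // run fib(n-1) asynchronously
        int fib2 = right.compute();           // compute fib(n-2) in this thread
        int fib1 = left.join();               // wait for the forked branch
        return fib1 + fib2;
    }

    static int fib(int n) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinFib(n));
    }

    public static void main(String[] args) {
        System.out.println(fib(10));          // prints 89
    }
}
```

Spawning a task per call is far too fine-grained for real use; a practical version would fall back to sequential code below some threshold.
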
Speculative Parallelism
  Stop when the fastest algorithm succeeds:

    [
      || return algorithm1();
      || return algorithm2();
    ]

  • Stop sister branches, then return
  • Same behaviour for break, continue, throw
  • Non-local exit is very difficult to get right with threads

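
In standard Java, a comparable "first successful result wins" behaviour is available through `ExecutorService.invokeAny`, which returns the result of one task that completed successfully and cancels the rest. The sketch below assumes that analogy; `Speculative` and `firstResult` are hypothetical names, and the two sample algorithms are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; not part of Ateji PX.
class Speculative {
    // Runs all algorithms concurrently and returns the result of one that succeeds.
    static <T> T firstResult(List<Callable<T>> algorithms) {
        ExecutorService pool = Executors.newFixedThreadPool(algorithms.size());
        try {
            return pool.invokeAny(algorithms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("all algorithms failed", e);
        } finally {
            pool.shutdownNow();   // interrupt any branch that is still running
        }
    }

    public static void main(String[] args) {
        Callable<String> slow = () -> { Thread.sleep(2000); return "slow"; };
        Callable<String> fast = () -> "fast";
        System.out.println(firstResult(Arrays.asList(slow, fast)));
    }
}
```

Stopping the losing branch relies on interruption, so it only takes effect if that branch reaches an interruptible call or checks its interrupt flag.
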
Parallel reductions
  • Same behaviour for break, continue, throw

Message Passing
  • Is an essential aspect of parallelism
  • Must be part of the language
  • Send a message:    chan ! value
  • Receive a message: chan ? value
  • Typed channels:
      Chan<T>: synchronous (rendez-vous)
      AsyncChan<T>: asynchronous (buffered)
  • User-defined serialization (Java, XML, ASN.1, ...)
  • Can be mapped to I/O devices (files, sockets, MPI)

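
The two channel flavours have rough counterparts in `java.util.concurrent`: a `SynchronousQueue` hands each value over in a rendez-vous, while a bounded `ArrayBlockingQueue` buffers values. The sketch below assumes that correspondence; it does not use the Ateji PX `Chan` or `AsyncChan` classes, and `ChannelSketch` and `sumOver` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration class; not part of Ateji PX.
class ChannelSketch {
    // Sends the values over chan from a second thread and sums them as they are received.
    static int sumOver(BlockingQueue<Integer> chan, int... values) {
        Thread sender = new Thread(() -> {
            try {
                for (int v : values) {
                    chan.put(v);              // plays the role of: chan ! v
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        int sum = 0;
        try {
            for (int i = 0; i < values.length; i++) {
                sum += chan.take();           // plays the role of: chan ? value
            }
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Synchronous: each put blocks until the matching take.
        System.out.println(sumOver(new SynchronousQueue<Integer>(), 1, 2, 3));      // prints 6
        // Buffered: up to two values may be in flight before the sender blocks.
        System.out.println(sumOver(new ArrayBlockingQueue<Integer>(2), 1, 2, 3));   // prints 6
    }
}
```
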
Data Flow and Stream Parallelism: an adder
  [Diagram: channels in1 and in2 feed the adder, which writes to out]

    void adder(Chan<Integer> in1, Chan<Integer> in2, Chan<Integer> out) {
        for(;;) {
            int value1, value2;
            [ in1 ? value1; || in2 ? value2; ];
            out ! (value1 + value2);
        }
    }

Data Flow and Stream Parallelism: compose processes
  [Diagram: two sources write to c1 and c2, the adder reads both and writes to c3, the sink reads c3]

    [
      || source(c1);         // generates values on c1
      || source(c2);         // generates values on c2
      || adder(c1, c2, c3);
      || sink(c3);           // reads values from c3
    ]

  • Numeric values + sync = “data flow”
  • Strings or tuples + async = “stream programming”, e.g. the MapReduce algorithm

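
The composition of sources, adder, and sink can be imitated in plain Java with one thread per process and one `SynchronousQueue` per channel. The sketch below is a finite variant under stated assumptions: each source emits a fixed array instead of looping forever, the adder reads `c1` and then `c2` sequentially rather than in parallel as on the slide, and `AdderPipeline`, `Body`, `branch`, and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration class; not part of Ateji PX.
class AdderPipeline {
    // A branch body that may block on a channel operation.
    interface Body {
        void run() throws InterruptedException;
    }

    // Starts one branch in its own thread.
    static Thread branch(Body body) {
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Wires two sources, an adder, and a sink together and returns what the sink received.
    static List<Integer> run(int[] left, int[] right) {
        int n = Math.min(left.length, right.length);
        BlockingQueue<Integer> c1 = new SynchronousQueue<>();
        BlockingQueue<Integer> c2 = new SynchronousQueue<>();
        BlockingQueue<Integer> c3 = new SynchronousQueue<>();
        List<Integer> received = new ArrayList<>();   // written only by the sink branch

        Thread[] branches = {
            branch(() -> { for (int i = 0; i < n; i++) c1.put(left[i]); }),               // source(c1)
            branch(() -> { for (int i = 0; i < n; i++) c2.put(right[i]); }),              // source(c2)
            branch(() -> { for (int i = 0; i < n; i++) c3.put(c1.take() + c2.take()); }), // adder
            branch(() -> { for (int i = 0; i < n; i++) received.add(c3.take()); })        // sink(c3)
        };
        try {
            for (Thread t : branches) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        // prints [11, 22, 33]
        System.out.println(run(new int[]{1, 2, 3}, new int[]{10, 20, 30}));
    }
}
```
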
Expressing non-determinism
  • Note the parallel reads: [ in1 ? value1 || in2 ? value2 ]
  • Impossible to express in a sequential language
  • || is for performance, but also for expressivity
  • See also the select construct

Distributing branches
  Use indications:

    [
      || #Remote("192.168.20.1") source(c1);
      || #Remote("Amazon EC2") source(c2);
      || #Remote("GPU") adder(c1, c2, c3);
      || sink(c3);
    ]

  [Figure labels: Multicore Desktop/Server, Multicore CPU/GPU cluster]

Compiler handles the boring stuff
  • Passing parameters
  • Returning results
  • Throwing exceptions
  • Accessing non-final fields
  • Performing non-local exits
  • Stopping branches properly

Making it easy is also about tools: Eclipse integration

Ateji PX Summary
  • Parallelism at the language level is simple and intuitive, efficient, and compatible with source code and tools
  • Most patterns in a single language:
      data, task, recursive and speculative parallelism
      shared memory and distributed memory
      covers OpenMP, Cilk, MPI, Occam, Erlang, etc.
  • Most hardware architectures from a single language: manycore, grid, cloud, GPU

Roadmap as of February 2011
  • Ateji PX 1.1 (multicore version) available today; free evaluation version on www.ateji.com
  • GPU version coming soon (OpenGPU project)
  • Distributed version coming soon (grid / cluster / cloud)
  • Interactive correctness proofs
  • Integration of profiling tools

Call to Action
  • Free download on www.ateji.com/px
  • Read the whitepapers
  • Play with the online demo
  • Look at the samples library
  • Benchmark your || code
  • Contact: info@ateji.com
  • Blog: ateji.blogspot.com
