HPC in R

1. Taking R to the limit
Kutergin Alex, Perm State University, MiFIT, 16 October 2012
Kutergin A. High performance computing with R

2. Outline
1 General words about R
2 Motivation and scope
3 The basic ways of speeding up the R-code
4 The special way of speeding up the R-code: package pnmath
5 Problem of data splitting: package iterators
6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages)
7 Parallel computation with R: low-level parallelism (package: Rmpi)
8 Parallel computation with R: parallel execution of for-loops (package: foreach)
9 Parallel computation with R: parallel computation with a graphics processing unit (package: gputools)
10 Working with very large datasets: packages filehash and bigmemory
11 Final words, some useful references and contacts

25. General words about R
R software
R is free, powerful software for data analysis and statistical computing. R is a console application with its own programming language, running in interpreter mode. The lack of a sophisticated GUI has a number of advantages:
- there is no need to learn which algorithm is behind each button
- you can just learn the basic principles of R programming and effectively solve complex problems using the R programming language
Download R
R can be downloaded from the following link: http://cran.r-project.org/
Project page: www.r-project.org

36. General words about R
View of an R work session [screenshot not preserved]

37. General words about R: packages and information sources
There are two sources of happiness for an R programmer: a source of information and a source of packages

38. Motivation and scope
Motivation
- Computers are becoming more productive. The progress in computer hardware and software is amazing, and this computing power is now available even in a laptop
- The volume of data and the complexity of the problems associated with data processing are constantly growing
- The emergence of multi-core PCs and CUDA technology
Scope
- We are simple students, not powerful guys, so we don't have a supercomputer
- We have a Core i5, a Core i7 or another multi-core laptop or PC with support for CUDA technology
- We have some computational tasks and we want to solve them more effectively

53. The basic ways of speeding up the R-code
How to check the execution time of code?
First way to check the execution time:
    # system.time() returns the CPU (and other) times that expr used
    system.time(sum(runif(10000000)))
Second way to check the execution time:
    # proc.time() determines how much real and CPU time (in seconds)
    # the currently running R process has already taken
    start_time <- proc.time()
    sum(runif(10000000))
    end_time <- proc.time() - start_time

60. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Function's profile
Let us compare the work of the universal function lm() and the more specific function lm.fit()
    # Loading a dataset
    data(longley)
    # Recording the profile to the file lm.out
    Rprof("lm.out")
    # Running lm() 1000 times
    invisible(replicate(1000, lm(Employed ~ . - 1, data = longley)))
    # Switch off profiling
    Rprof(NULL)

61. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
    # Preparing data for lm.fit()
    longleydm <- data.matrix(data.frame(longley))
    # Recording the profile to the file lm.fit.out
    Rprof("lm.fit.out")
    # Running lm.fit() 1000 times
    invisible(replicate(1000, lm.fit(longleydm[, -7], longleydm[, 7])))
    # Switch off profiling
    Rprof(NULL)
    # Results of profiling
    summaryRprof("lm.out")$sampling.time
    [1] 3.12
    summaryRprof("lm.fit.out")$sampling.time
    [1] 0.18
    # What a difference!

62. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Package profr
This package allows you to visualize the results of profiling
    library("profr")
    plot(parse_rprof("lm.out"), main = "Profile of lm()")
    plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")
Package proftools
This package allows you to visualize the call graph of a function
    library("Rgraphviz"); library("proftools")
    lmfitprod <- readProfileData("lm.fit.out")
    plotProfileCallGraph(lmfitprod)

64. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Call graph [figure not preserved]

65. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Another example of profiling:
    its = 2500; dim = 1750
    X = matrix(rnorm(its * dim), its, dim)
    my.cross.prod <- function(X) {
      C = matrix(0, ncol(X), ncol(X))
      for (i in 1:nrow(X)) {
        C = C + X[i, ] %o% X[i, ]
      }
      return(C)
    }
    library(proftools)
    # Start profiling (the file name is the one read on the next slide)
    Rprof("matrix-mult.out")
    C = my.cross.prod(X)
    C1 = t(X) %*% X
    C2 = crossprod(X)
    Rprof(NULL)
    # all.equal() compares two objects at a time
    print(all.equal(C, C1)); print(all.equal(C1, C2))

66. The basic ways of speeding up the R-code
Analysis of the effectiveness of programs
Result:
    library(proftools)
    profile.data <- readProfileData("matrix-mult.out")
    flatProfile(profile.data)
                  total.pct total.time self.pct self.time
    my.cross.prod     87.31      88.36     0.04      0.04
    +                 49.84      50.44    49.84     50.44
    %o%               37.37      37.82     0.00      0.00
    outer             37.37      37.82    37.27     37.72
    %*%                7.75       7.84     7.75      7.84
    crossprod          4.86       4.92     4.86      4.92
    t                  0.16       0.16     0.06      0.06
    t.default          0.10       0.10     0.10      0.10
    matrix             0.06       0.06     0.06      0.06
    as.vector          0.02       0.02     0.02      0.02

67. The basic ways of speeding up the R-code
Vectorization of code
Note! Loops in R are slow! You can speed up your code by using operations on vectors and matrices. It's another style of programming, but you have to use it!
    # Simple example of vectorization:
    # component-wise addition of two vectors
    # Generating some random data
    # First vector
    a <- rnorm(n = 10000000)
    # Second vector
    b <- rnorm(n = 10000000)
    # Vector for the result
    x <- rep(0, length(a))

68. The basic ways of speeding up the R-code
Vectorization of code
So, what about the results?
    # Slow way
    time_1 <- system.time(
      for (i in 1:length(a)) {
        x[i] <- a[i] + b[i]
      }
    ); time_1[3]
    36.97
    # Fast way
    time_2 <- system.time(x <- a + b); time_2[3]
    0.04
    Acceleration <- time_1[3] / time_2[3]
    Acceleration
    924.25
    # That's hot!!!!
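To confirm that the loop and the vectorized form compute the same thing, here is a minimal check on a much smaller vector. The size 1e5 and the names x_loop and x_vec are illustrative and do not come from the slides.

```r
# Illustrative sizes and names (assumptions, not from the slides)
n <- 1e5
a <- rnorm(n)
b <- rnorm(n)

# Element-by-element loop
x_loop <- numeric(n)
for (i in seq_along(a)) {
  x_loop[i] <- a[i] + b[i]
}

# Vectorized form: one call, the loop runs in compiled code
x_vec <- a + b

# Both forms perform the same floating-point additions
same_result <- identical(x_loop, x_vec)
print(same_result)
```

Timings depend on the machine and the R version, so the sketch checks only that the results agree.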

69. The basic ways of speeding up the R-code
Using the magic of linear algebra
Using linear algebra operations
    # Scalar product
    # Slow way
    start <- proc.time()
    res <- 0
    for (i in 1:length(a)) {
      res <- res + a[i] * b[i]
    }
    end <- proc.time() - start; end[3]
    16.71
    # Fast
    system.time(a %*% b)[3]
    0.09
    # Even faster...
    system.time(sum(a * b))[3]
    0.08

70. The basic ways of speeding up the R-code
Using the magic of linear algebra
Using linear algebra operations
    # Matrix multiplication, slow version
    its <- 2500; dim <- 1750
    X <- matrix(rnorm(its * dim), its, dim)
    X_transp <- t(X)
    res <- array(NA, dim = c(1750, 1750))
    start <- proc.time()
    for (i in 1:nrow(X_transp)) {
      for (j in 1:ncol(X)) {
        res[i, j] <- sum(X_transp[i, ] * X[, j])
      }
    }
    end <- proc.time() - start; end[3]
    221.67

71. The basic ways of speeding up the R-code
Using the magic of linear algebra
BLAS
BLAS stands for Basic Linear Algebra Subprograms. It is not an R package but the linear algebra library that R's matrix operations call. It contains optimized algorithms for linear algebra operations, and a multi-threaded BLAS implementation can use all cores of a multi-core machine automatically.
    # Matrix multiplication, fast version
    # BLAS matrix multiplication
    system.time(X_transp %*% X)[3]
    7.77
    # Even faster...
    system.time(crossprod(X))[3]
    4.98

72. The basic ways of speeding up the R-code
Using built-in R functions
Package base
You can find the full list of built-in R functions in the documentation for this package
    # Let us define a function
    mySum <- function(N) {
      sumVal <- 0
      for (i in 1:N) {
        sumVal <- sumVal + i
      }
      return(sumVal)
    }
    system.time(mySum(1000000))[3]
    0.62
    system.time(sum(as.numeric(seq(1, 1000000))))[3]
    0.05

73. The basic ways of speeding up the R-code
Using built-in R functions
Why are built-in R functions faster?
The R programming language works in interpreter mode, which is always slower than running compiled code. When you call a built-in R function, you call optimized and compiled code. Built-in functions are also written in lower-level programming languages (such as C/C++ or Fortran), which gives greater access to the capabilities of the hardware
Note! You can select data from a vector, matrix, data.frame or array using a condition that applies to a row or column of the data object. It's fast and convenient
    # Extracting only the positive values from the first column of X
    its <- 2500; dim <- 1750
    X <- matrix(rnorm(its * dim), its, dim)
    X[X[, 1] > 0, 1]
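The same condition-based selection works on whole rows and on data frames. Below is a small sketch with made-up data; the objects m and df and their values are illustrative only.

```r
# Illustrative data (not from the slides)
m <- matrix(c(-1, 2, -3, 4, 5, 6, 7, 8), nrow = 4)

# Whole rows of m whose first-column value is positive
pos_rows <- m[m[, 1] > 0, , drop = FALSE]

# Only the positive values of the first column
pos_vals <- m[m[, 1] > 0, 1]

# The same idea for a data frame: keep rows that satisfy a condition
df <- data.frame(id = 1:4, value = c(0.5, -1.2, 3.1, -0.7))
df_pos <- df[df$value > 0, ]

print(pos_rows)
print(pos_vals)
print(df_pos)
```

The argument drop = FALSE keeps the result a matrix even when only one row matches.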

74. The special way of speeding up the R-code
Package pnmath
Another easy way to get a speed-up is to use the pnmath package in R. This package takes many of the standard math functions in R and replaces them with multi-threaded versions, using OpenMP. Some functions get more of a speed-up than others with pnmath.
    # Generating random data
    v1 <- runif(1000)
    v2 <- runif(100000000)
    # Execution time without pnmath
    system.time(qtukey(v1, 2, 3))
    system.time(exp(v2))
    system.time(sqrt(v2))
    # Execution time with pnmath
    library(pnmath)
    system.time(qtukey(v1, 2, 3))
    system.time(exp(v2))
    system.time(sqrt(v2))

75. Problem of data splitting
Our problem:
Before you start the calculation you need to split your data set according to the number of threads. Another reason is more effective data processing in loops
Package iterators
The iterators package provides tools for iterating over various R data structures. Iterators are available for vectors, lists, matrices, arrays, data frames and files. By following very simple conventions, new iterators can be written to support any type of data source, such as database queries or dynamically generated data
Download
You can download this useful package from CRAN (available for Windows!): http://cran.r-project.org/web/packages/iterators/index.html

79. Problem of data splitting: package iterators
Capabilities
icount(count)
This function returns an iterator that counts starting from one. count - the number of times that the iterator will fire. If not specified, it will count forever
nextElem()
This function returns the next value of a previously defined iterator. When the iterator has no more values, it calls stop with the message "StopIteration"
Example:
    library(iterators)
    # create an iterator that counts from 1 to 2
    it <- icount(2)
    nextElem(it)
    [1] 1
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

80. Problem of data splitting: package iterators
Capabilities
You can create iterators over the rows of your data structure using the iter() function:
    library(iterators)
    # Creating an iterator over the rows of a data set
    irState <- iter(state.x77, by = "row")
    nextElem(irState)
            Population Income Illiteracy  Life Murder  Area
    Alabama       3615   3624        2.1 69.05   15.1 50708
    nextElem(irState)
           Population Income Illiteracy  Life Murder   Area
    Alaska        365   6315        1.5 69.31   11.3 566432
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Arizona       2212   4530        1.8 70.55    7.8 113417

81. Problem of data splitting: package iterators
Capabilities
You can create iterators over the columns of your data structure using the iter() function:
    # Creating an iterator over the columns of a data set
    icState <- iter(state.x77, by = "col")
    nextElem(icState)
            Population
    Alabama       3615
    Alaska         365
    Arizona       2212
    nextElem(icState)
            Income
    Alabama   3624
    Alaska    6315
    Arizona   4530
    nextElem(icState)
            Illiteracy
    Alabama        2.1
    Alaska         1.5
    Arizona        1.8

82. Problem of data splitting: package iterators
Capabilities
You can also use the iter() function to create an iterator from a data object returned by some other function:
    library(iterators)
    # Define a function which generates random data
    GetDataStructure <- function(meanVal1, meanVal2, sdVal1, sdVal2) {
      a <- rnorm(4, mean = meanVal1, sd = sdVal1)
      b <- rnorm(4, mean = meanVal2, sd = sdVal2)
      data <- a %o% b
      return(data)
    }
    ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
    nextElem(ifun); nextElem(ifun)
             [,1]     [,2]     [,3]     [,4]
    [1,] 701.7055 939.6574 764.7724 799.6965
             [,1]     [,2]     [,3]     [,4]
    [1,] 647.6349 867.2512 705.8422 738.0752

83. Problem of data splitting: package iterators
Capabilities
idiv(n, ..., chunks, chunkSize)
This is a more interesting iterator. It provides the ability to divide a numeric value into pieces
n - the number to be divided
chunks - the number of pieces that n should be divided into. It is useful when you know the number of pieces that you want. If specified, chunkSize should not be specified
chunkSize - the maximum size of the pieces that n should be divided into. It is useful when you know the size of the pieces that you want. If specified, chunks should not be specified
Some thoughts...
However, the practical application of this iterator is unclear. Perhaps it can be used to index a vector or the rows/columns of an array
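One possible use, along the lines of the thought above, is to turn the piece sizes produced by idiv() into index ranges for slicing a vector. This is only a sketch: the helper chunk_ranges() and the vector v are hypothetical names introduced here, not part of the iterators package.

```r
library(iterators)

# Hypothetical helper: collect the piece sizes from idiv() and convert
# them into (first, last) index pairs
chunk_ranges <- function(n, chunks) {
  it <- idiv(n, chunks = chunks)
  sizes <- numeric(0)
  repeat {
    # nextElem() signals the end of the iterator with an error
    size <- tryCatch(nextElem(it), error = function(e) NULL)
    if (is.null(size)) break
    sizes <- c(sizes, size)
  }
  last <- cumsum(sizes)
  first <- last - sizes + 1
  data.frame(first = first, last = last)
}

# Split the indices 1..10 into 3 consecutive ranges
ranges <- chunk_ranges(10, chunks = 3)
print(ranges)

# Use the ranges to cut an illustrative vector into pieces
v <- seq(10, 100, by = 10)
pieces <- lapply(seq_len(nrow(ranges)), function(k) {
  v[ranges$first[k]:ranges$last[k]]
})
print(pieces)
```

Each piece could then be handed to a separate worker, which is the data-splitting problem this part of the talk is about.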

93. Problem of data splitting: package iterators
Capabilities
Example:
    library(iterators)
    # divide the value 10 into 3 pieces
    it <- idiv(10, chunks = 3)
    nextElem(it)
    [1] 4
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

94. Problem of data splitting: package iterators
Capabilities
Example:
    library(iterators)
    # divide the value 10 into pieces no larger than 3
    it <- idiv(10, chunkSize = 3)
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 2
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

95. Problem of data splitting: package iterators
Capabilities
iread.table(file, ..., verbose = FALSE)
This is a very important iterator. It returns an iterator object over the rows of the data frame stored in a file in table format
file - the name of the file to read data from
... - all additional arguments are passed on to the read.table function. See the documentation for read.table for more information
verbose - logical flag indicating whether or not to print the calls to read.table
Note! In this version of iread.table, both of the read.table arguments header and row.names must be specified. This is because the default values of these arguments depend on the contents of the beginning of the file. In order to make the subsequent calls to read.table work consistently, the user must specify those arguments explicitly

105. Problem of data splitting: package iterators
Capabilities
Example:
    library(iterators)
    # Generating random data
    its <- 2000000; dim <- 3
    data <- matrix(rnorm(its * dim), its, dim)
    # Writing them to the HDD
    DATA_PATH <- "E:/R_works/data.txt"
    # Size of this file - 123 Mb
    write.table(data, file = DATA_PATH, append = FALSE, sep = "\t", dec = ".")

106. Problem of data splitting: package iterators
Capabilities
    # Creating an iterator from this file
    ifile <- iread.table(DATA_PATH, header = TRUE, row.names = NULL, verbose = FALSE)
    nextElem(ifile)
      row.names        V1        V2       V3
    1         1 -1.042623 -1.386382 0.399798
    nextElem(ifile)
      row.names        V1        V2        V3
    1         2 0.8841238 -1.296501 0.1580505
    nextElem(ifile)
      row.names         V1         V2        V3
    1         3 -0.3195784 -0.6830442 0.3647958
    # It works very fast!!!!
    # remove the file
    file.remove(DATA_PATH)

107. Problem of data splitting: package iterators
Capabilities
isplit(x, f, drop = FALSE)
Another important type of iterator. It returns an iterator that divides the data in the vector x into the groups defined by f
x - vector or data frame of values to be split into groups
f - a factor or list of factors used to categorize x
drop - logical indicating whether levels that do not occur should be dropped
You can find more detailed information in the documentation
Note! This is very useful! For example, suppose you have a data vector and a vector containing the values of a factor corresponding to these data. The factor has pre-defined levels. You can then extract the data for each level of the factor in a loop without additional operations. In the loop's body you can also define conditions for each level of the factor and use them as conditions for if() control structures

117. Problem of data splitting: package iterators
Capabilities
    x <- rnorm(200)
    f <- factor(sample(1:10, length(x), replace = TRUE))
    it <- isplit(x, f)
    nextElem(it)
    $value
     [1]  0.14087878 -0.94439161  0.13593045
     [4] -0.25732860  0.09422130 -0.55166303
     [7] -0.18325419 -0.00871019  0.38344388
    [10] -1.05761926  1.16126462 -0.02280205
    [13] -0.67338941  1.68724264  0.92112983
    [16]  1.39782337 -0.51060989
    $key
    $key[[1]]
    [1] "1"
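The note about looping over the levels of a factor can be sketched as follows. The data and the name group_means are illustrative, and the end of the iterator is detected by catching the error that nextElem() raises.

```r
library(iterators)

# Illustrative data: six values in three groups
x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("a", "b", "a", "b", "a", "c"))

it <- isplit(x, f)
group_means <- numeric(0)
repeat {
  # Each element is a list with the group's values and its key
  grp <- tryCatch(nextElem(it), error = function(e) NULL)
  if (is.null(grp)) break
  level <- grp$key[[1]]
  group_means[level] <- mean(grp$value)
}
print(group_means)
```

Inside the loop, level can also be tested in an if() statement to treat some groups differently.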

118. Problem of data splitting: package iterators
Capabilities
Special types of iterators
There are also special types of iterators, such as irnorm(..., count) or irunif(..., count). These functions return an iterator that returns random numbers from various distributions. Each one is a wrapper around a standard R function
count - the number of times that the iterator will fire. If not specified, it will fire values forever
... - arguments to pass to the underlying rnorm function
Example:
    # create an iterator that returns two random numbers
    it <- irnorm(1, count = 2)
    nextElem(it); nextElem(it)
    [1] 0.1592311
    [1] -1.387449
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

119. Parallel computation with R: high-level parallelism
packages: parallel, snow
Scope
High-level parallelism means that you do not need to define how the threads communicate with each other, or which process is the master and which processes are the slaves. You only initialize a parallel environment and work inside it. All the details are on the shoulders of the package's methods
Package: snow
The package contains the basic functions that allow you to create different types of clusters on a multi-core machine
Package: parallel
This package builds on the packages multicore and snow and provides drop-in replacements for most of the functionality of those packages

123. Parallel computation with R: high-level parallelism
package: parallel
Description
The landscape of parallel computing has changed with the advent of shared-memory computers with multiple (and often many) CPU cores. Until the late 2000's parallel computing was mainly done on clusters of large numbers of single- or dual-CPU computers: nowadays even laptops have two or four cores, and servers with 8 or more cores are commonplace. It is such hardware that package parallel is designed to exploit. It can also be used with several computers running the same version of R connected by (reasonable-speed) ethernet: the computers need not be running the same OS
Scope
Parallelism can be done in computation at many different levels: this package is principally concerned with "coarse-grained parallelization"

126. Parallel computation with R: high-level parallelism
package: parallel
Computational model
This package handles running much larger chunks of computations in parallel. The crucial point is that these chunks of computation are unrelated and do not need to communicate in any way. It is often the case that the chunks take approximately the same length of time. The basic computational model is
(a) Start up M "worker" processes, and do any initialization needed on the workers
(b) Send any data required for each task to the workers
(c) Split the task into M roughly equally-sized chunks, and send the chunks (including the R code needed) to the workers
(d) Wait for all the workers to complete their tasks, and ask them for their results
(e) Repeat steps (b - d) for any further tasks
(f) Shut down the worker processes
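The steps (a) to (f) above can be sketched with functions from the parallel package. The worker count of 2, the variable offset and the toy task are arbitrary choices for illustration.

```r
library(parallel)

# (a) start M worker processes (M = 2 here, an arbitrary choice)
cl <- makeCluster(2)

# (b) send the data that every task needs to the workers
offset <- 100
clusterExport(cl, "offset", envir = environment())

# (c) + (d) split the input into chunks, run them on the workers
#           and collect the results
res <- parLapply(cl, 1:8, function(i) i^2 + offset)

# (e) further tasks could reuse the same cluster here

# (f) shut down the worker processes
stopCluster(cl)

print(unlist(res))
```

parLapply() does the chunking itself, so steps (c) and (d) collapse into a single call.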

127. Parallel computation with R: high-level parallelism
package: snow
Description
The package contains the basic functions that allow you to create different types of clusters on a multi-core machine, such as:
makeSOCKcluster(names, ..., options = defaultClusterOptions)
makeMPIcluster(count, ..., options = defaultClusterOptions)
It also contains specific functions for computing on snow clusters, such as:
clusterCall(cl, fun, ...) calls a function fun with identical arguments ... on each node in the cluster cl and returns a list of the results
clusterEvalQ(cl, expr) evaluates a literal expression on each cluster node. It is a cluster version of evalq
clusterApply(cl, x, fun, ...) calls fun on the first cluster node with arguments x[[1]] and ..., on the second node with arguments x[[2]] and ..., and so on
It makes no sense to go further into the syntax. You can find all the details in the documentation
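A short sketch of the three calls described above. To avoid an extra dependency it uses the functions of the same names from the parallel package rather than snow itself; the variable base_value and the toy functions are illustrative.

```r
library(parallel)

cl <- makeCluster(2)

# clusterCall: the same function with the same arguments on every node
sums <- clusterCall(cl, function(a, b) a + b, 1, 2)

# clusterEvalQ: evaluate a literal expression on every node
# (here it defines a variable in each worker's global environment)
invisible(clusterEvalQ(cl, base_value <- 10))

# clusterApply: element k of x is processed on node k
squares <- clusterApply(cl, 1:2, function(k) k^2 + base_value)

stopCluster(cl)

print(sums)
print(squares)
```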

143. Parallel computation with R: high-level parallelism
packages: doParallel, doSNOW
Package: doSNOW
The registerDoSNOW(cl) function is used to register the snow parallel backend with the foreach package, where
cl - the cluster object to use for parallel execution
Package: doParallel
The doParallel package provides a parallel backend for the foreach function using the parallel package. The backend is registered with registerDoParallel(cl, cores = NULL, ...), where
cl - a cluster object returned by makeCluster, or the number of nodes to be created in the cluster. If not specified, on Windows a three-worker cluster is created and used
cores - the number of cores to use for parallel execution
... - package options

153. Parallel computation with R: high-level parallelism
Example of a cluster based on the parallel package
    library(parallel)
    library(doParallel)
    # Detect how many cores we have
    CoresCount <- detectCores(); CoresCount
    [1] 4
    # Initializing the cluster
    cl <- makeCluster(CoresCount); cl
    socket cluster with 4 nodes on host 'localhost'
    # How many cores of our cluster we are going to use
    CoresCountForeUse <- CoresCount; CoresCountForeUse
    [1] 4
    # Register the parallel backend
    registerDoParallel(cl, cores = CoresCountForeUse)
    # Some expressions
    # Stop our cluster
    stopCluster(cl)

154. Parallel computation with R: high-level parallelism
Example of a cluster based on the snow package
    library(snow)
    library(doSNOW)
    # Make a socket cluster with four threads
    clSnow <- makeCluster(c("localhost", "localhost", "localhost", "localhost"), type = "SOCK")
    clSnow
    socket cluster with 4 nodes on host 'localhost'
    registerDoSNOW(clSnow)
    # Some expressions
    # Stop our cluster
    stopCluster(clSnow)

155. Parallel computation with R: low-level parallelism
Package: Rmpi
Description
Rmpi is the MPI interface for R. This package allows you to create R programs which run cooperatively in parallel across multiple machines, or multiple CPUs on one machine, to accomplish a goal more quickly than running a single program on one machine
So...
I have not worked with this package yet, so I can't say much about it. This work is in progress

158. Parallel computation with R: parallel execution of for-loops
Package: foreach
Motivation
In many practical cases it is impossible to avoid using a loop. Loops are slow, and it would be great to increase the speed of their execution
Description
The foreach package provides a new looping construct for executing R code repeatedly. The main reason for using the foreach package is that it supports parallel execution. The foreach package can be used with a variety of different parallel computing systems, including NetWorkSpaces and snow. In addition, foreach can be used with iterators, which allows the data to be specified in a very flexible way
Note! foreach structures work in parallel only inside an initialized parallel environment! You can use them in parallel only inside parallel or snow clusters

162. Parallel computation with R: parallel execution of for-loops
Operators used with a foreach object
Operator %do%
It is a binary operator that operates on a foreach object and an R expression. The expression is evaluated multiple times in an environment that is created by the foreach object, and that environment is modified for each evaluation as specified by the foreach object. %do% evaluates the expression sequentially. The results of evaluating the expression are returned as a list by default
Operator %dopar%
%dopar% is a parallel version of the %do% operator. It evaluates the expression in parallel
Operator %:%
The operator %:% is called the nesting operator. It is a binary operator used to merge two foreach objects into a single structure
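A minimal sketch of %do% and %:% on toy data follows. It runs sequentially, so no parallel backend is needed; replacing %do% with %dopar% would require a registered backend. The variable names are illustrative.

```r
library(foreach)

# %do%: sequential evaluation; the results come back as a list by default
squares <- foreach(i = 1:3) %do% i^2

# %:% merges two foreach objects into one nested loop:
# the inner results are concatenated with c(), the outer ones stacked with rbind()
products <- foreach(i = 1:2, .combine = "rbind") %:%
  foreach(j = 1:3, .combine = "c") %do% (i * j)

print(squares)
print(products)
```

Here products is a 2 x 3 matrix whose row i holds i * 1, i * 2 and i * 3.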

166. Parallel computation with R: parallel execution of for-loops
Main arguments of the foreach function
Note! This is important
.combine - a function that is used to process the task results as they are generated. It can be specified as a non-empty character string naming the function. Specifying "c" is useful for concatenating the results into a vector. The values "rbind" and "cbind" can combine vectors into a matrix. The values "+" and "*" can be used to process numeric data
.inorder - logical flag indicating whether the .combine function requires the task results to be combined in the same order in which they were submitted. If the order is not important, then setting .inorder to FALSE can give improved performance
.multicombine - logical flag indicating whether the .combine function can accept more than two arguments. If it can take more than two arguments, then setting .multicombine to TRUE could improve the performance
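The effect of .combine can be seen on a toy loop. The sketch below uses %do%, so it runs without a parallel backend, and the variable names are illustrative.

```r
library(foreach)

# Default: a list of results
as_list <- foreach(i = 1:4) %do% sqrt(i)

# "c": results concatenated into a vector
as_vector <- foreach(i = 1:4, .combine = "c") %do% sqrt(i)

# "+": results reduced to their sum
total <- foreach(i = 1:4, .combine = "+") %do% i

# "rbind": each result becomes a row of a matrix; rbind() accepts many
# arguments at once, so .multicombine = TRUE is allowed here
as_rows <- foreach(i = 1:4, .combine = "rbind", .multicombine = TRUE) %do% c(i, i^2)

print(as_vector)
print(total)
print(as_rows)
```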

175. Parallel computation with R: parallel execution of for-loops
Main arguments of the foreach function
Note! This is important
.errorhandling - specifies how a task evaluation error should be handled. If the value is "stop", then execution will be stopped if an error occurs. If the value is "remove", the result for that task will not be returned or passed to the .combine function. If it is "pass", then the error object generated by the task evaluation will be included with the rest of the results. It is assumed that the combine function will be able to deal with the error object
.packages - character vector of packages that the tasks depend on
.verbose - logical flag enabling verbose messages. This can be very useful for troubleshooting
Further immersion
As always, you can find all the detailed information in the documentation for this useful package
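A small sketch of the "remove" and "pass" settings of .errorhandling. The function risky() is a hypothetical task that fails for one input; the loop is run with %do% so that no backend is required.

```r
library(foreach)

# Hypothetical task: fails for i == 2, otherwise returns i * 10
risky <- function(i) {
  if (i == 2) stop("task 2 failed")
  i * 10
}

# "remove": the failed task is dropped from the results
removed <- foreach(i = 1:3, .errorhandling = "remove") %do% risky(i)

# "pass": the error object takes the place of the failed result
passed <- foreach(i = 1:3, .errorhandling = "pass") %do% risky(i)

print(length(removed))
print(class(passed[[2]]))
```

With the default value "stop", the same loop would abort at the second task.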

185. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # sequentially
    time_seq <- system.time(
      foreach(i = 1:100) %do% {
        sum(runif(10000000))
      })
    time_seq[3]
    31.06
    # in parallel
    time_par <- system.time(
      foreach(i = 1:100) %dopar% {
        sum(runif(10000000))
      })
    time_par[3]
    15.25
    # acceleration
    acceleration <- time_seq[3] / time_par[3]
    acceleration
    elapsed 2.036721

186. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # sequentially
    time_seq <- system.time(
      foreach(i = 1:100) %do% {
        sum(sin(runif(10000000)))
      })
    time_seq[3]
    87.46
    # in parallel
    time_par <- system.time(
      foreach(i = 1:100) %dopar% {
        sum(sin(runif(10000000)))
      })
    time_par[3]
    33.82
    # acceleration
    acceleration <- time_seq[3] / time_par[3]
    acceleration
    elapsed 2.586044

187. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # Default: results are returned as a list
    foreachResult <- foreach(i = 1:100) %dopar% {
      sum(runif(10000000))
    }
    class(foreachResult)
    [1] "list"
    nrow(foreachResult)
    NULL
    ncol(foreachResult)
    NULL
    length(foreachResult)
    [1] 100

188. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # Combine results as a matrix by columns
    foreachResult2 <- foreach(i = 1:100, .combine = "cbind") %dopar% {
      sum(runif(10000000))
    }
    class(foreachResult2)
    [1] "matrix"
    nrow(foreachResult2)
    [1] 1
    ncol(foreachResult2)
    [1] 100

189. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # Combine results as a matrix by rows
    foreachResult3 <- foreach(i = 1:100, .combine = "rbind") %dopar% {
      sum(runif(10000000))
    }
    class(foreachResult3)
    [1] "matrix"
    nrow(foreachResult3)
    [1] 100
    ncol(foreachResult3)
    [1] 1

190. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
    # parallel, .multicombine = FALSE, .inorder = TRUE
    time1 <- system.time(
      foreach(i = 1:100, .combine = "rbind", .multicombine = FALSE, .inorder = TRUE) %dopar% {
        sum(runif(10000000))
      })
    time1[3]
    elapsed 15.13
    # parallel, .multicombine = TRUE and .inorder = FALSE
    time2 <- system.time(
      foreach(i = 1:100, .combine = "rbind", .multicombine = TRUE, .inorder = FALSE) %dopar% {
        sum(runif(10000000))
      })
    time2[3]
    elapsed 15.02

191. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # parallel, list as a result
    time_list <- system.time(foreach(i = 1:100) %dopar% { sum(runif(10000000)) })
    time_list[3]
    elapsed
    15.24

    acceleration <- time1[3] / time2[3]
    acceleration
    elapsed
    1.007324

    accelerationList1 <- time_list[3] / time1[3]
    accelerationList1
    elapsed
    1.00727

    accelerationList2 <- time_list[3] / time2[3]
    accelerationList2
    elapsed
    1.014647

192. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Nested loops, both sequential
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %do% {
      foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
        sin(i) * cos(k)
      }
    }
    end <- proc.time() - start
    end
    1.76
    SomeResult
    [1] 0.6106603

193. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Nested loops, inner loop in parallel
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %do% {
      foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %dopar% {
        sin(i) * cos(k)
      }
    }
    end <- proc.time() - start
    end
    35.79
    SomeResult
    [1] 0.6106603

194. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
However, this construction (outer loop in parallel, inner loop sequential) does not work. It's sad...

    # Not run
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %dopar% {
      foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
        sin(i) * cos(k)
      }
    }
    end <- proc.time() - start
    end
    SomeResult
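The slide does not say which backend was used or which error appeared. One common cause of this kind of failure, assuming a cluster-style backend such as doParallel or doSNOW, is that the worker processes have not loaded foreach, so `%do%` is not found inside the outer loop body. Under that assumption, passing `.packages = "foreach"` to the outer loop is a possible workaround. A sketch, with doParallel and two workers as illustrative choices:

```r
library(foreach)
library(doParallel)  # assumption: cluster-style backend; the slides may have used another one

cl <- makeCluster(2)
registerDoParallel(cl)

# .packages = "foreach" asks each worker to load foreach before running the body,
# so the inner %do% loop can be evaluated on the worker
SomeResult <- foreach(i = 1:4, .combine = sum, .packages = "foreach") %dopar% {
  foreach(k = 1:1000, .combine = "c") %do% {
    sin(i) * cos(k)
  }
}

stopCluster(cl)
registerDoSEQ()
```

If the backend forks the R process instead (for example doMC on Linux), the workers inherit the loaded packages and this argument should not be needed.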

195. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
So, how do we execute four tasks (each with 10000000 iterations) in four threads in parallel?

    # Define a function
    # This function emulates our single 10000000-iteration task
    # inside the foreach loop
    # This is necessary because only the inner foreach loop
    # can be executed in parallel mode
    GetSomeData <- function(indexVal) {
      tmpData <- rep(NA, length.out = 10000000)
      for (j in 1:10000000) {
        tmpData[j] <- sin(indexVal) * cos(j)
      }
      return(tmpData)
    }
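The explicit loop in GetSomeData is deliberately slow so that each task is heavy enough to benefit from parallel execution. For comparison, the same values can be computed without a loop. A sketch in base R, where `GetSomeDataVec`, `GetSomeDataLoop` and the parameter `n` are hypothetical names introduced here for illustration:

```r
# Vectorized variant: cos() is applied to the whole index vector at once
GetSomeDataVec <- function(indexVal, n = 10000000) {
  sin(indexVal) * cos(seq_len(n))
}

# Loop variant with a configurable length, mirroring the slide's function
GetSomeDataLoop <- function(indexVal, n) {
  tmpData <- rep(NA_real_, length.out = n)
  for (j in seq_len(n)) {
    tmpData[j] <- sin(indexVal) * cos(j)
  }
  tmpData
}

# Compare both variants on a small input
loopValues <- GetSomeDataLoop(2, 1000)
vecValues <- GetSomeDataVec(2, 1000)
sameValues <- isTRUE(all.equal(loopValues, vecValues))
```

Vectorizing a task first and parallelizing only what remains slow is usually the cheaper optimization.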

196. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Four tasks, each has 10000000 iterations
    # sequentially
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum, .multicombine = TRUE, .inorder = FALSE) %do% {
      GetSomeData(i)
    }
    end <- proc.time() - start
    end
    120.49
    SomeResult
    [1] -0.645559

197. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Parallel execution
    # So, here we send 10000000 iterations to each thread
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum, .multicombine = TRUE, .inorder = FALSE) %dopar% {
      GetSomeData(i)
    }
    end <- proc.time() - start
    end
    60.76
    SomeResult
    [1] -0.645559

198. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Using the nesting operator %:%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %:%
      foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
        sin(i) * cos(k)
      }
    end2 <- proc.time() - start
    end2
    2.19
    SomeResult
    [1] 0.6106603

199. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Using the nesting operator %:% with parallel execution
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %:%
      foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %dopar% {
        sin(i) * cos(k)
      }
    end2 <- proc.time() - start
    end2
    35.44
    SomeResult
    [1] 0.6106603
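The nested loops compute the double sum of sin(i) * cos(k) over i = 1..4 and k = 1..1000. Each task is a single multiplication, so the cost of dispatching 4000 tiny tasks to workers plausibly dominates, which would explain why the `%dopar%` runs are slower than the `%do%` runs. A base R cross-check of the value, with `checkValue` as an illustrative name:

```r
# outer() builds the 4 x 1000 matrix of products sin(i) * cos(k);
# summing all its entries reproduces the nested foreach result
checkValue <- sum(outer(sin(1:4), cos(1:1000)))
```

The result should agree with the SomeResult value printed on the slides, up to rounding.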

200. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Using iterators and foreach together
    # Define some function
    simFun <- function(arg1, arg2) {
      tmp <- 2 * arg1 + 3 * arg2
      return(tmp)
    }

    # Generate some random data
    avec <- rnorm(1000, 22, 3)
    bvec <- rnorm(1000, 24, 5)

201. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Initializing iterators
    iavec <- iter(avec)
    ibvec <- iter(bvec)

    # sequentially
    start <- proc.time()
    seqSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
      foreach(j = ibvec, .combine = "c") %do% {
        simFun(i, j)
      }
    end <- proc.time() - start
    end
    4.90

202. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage

    # Initializing iterators
    iavec <- iter(avec)
    ibvec <- iter(bvec)

    # in parallel
    start <- proc.time()
    parSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
      foreach(j = ibvec, .combine = "c") %dopar% {
        simFun(i, j)
      }
    end <- proc.time() - start
    end
    13.57
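A small-scale cross-check of what this nested loop is meant to compute: one column per element of the outer vector and one row per element of the inner vector. The sketch below passes plain vectors (foreach wraps them in iterators internally), uses much smaller inputs than the slides, and compares against base R `outer()`. The sizes, the seed and the names `loopResult` and `vecResult` are illustrative:

```r
library(foreach)

simFun <- function(arg1, arg2) {
  2 * arg1 + 3 * arg2
}

set.seed(1)                 # illustrative seed for reproducibility
avec <- rnorm(5, 22, 3)     # outer loop values (columns)
bvec <- rnorm(4, 24, 5)     # inner loop values (rows)

loopResult <- foreach(i = avec, .combine = "cbind") %:%
  foreach(j = bvec, .combine = "c") %do% {
    simFun(i, j)
  }

# Vectorized equivalent: rows follow bvec, columns follow avec
vecResult <- outer(bvec, avec, function(b, a) simFun(a, b))
```

For a function as cheap as simFun, the vectorized form avoids the per-iteration overhead of foreach entirely.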

203. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
This example uses all the tricks

    # Generating grid
    x <- seq(-10, 10, by = 0.1)
    y <- seq(-10, 10, by = 0.1)

    start <- proc.time()
    z <- foreach(y = ivector(x, 4), .combine = cbind) %dopar% {
      y <- rep(y, each = length(x))
      del <- abs(1 + (x^2 + y^2)^0.7)
      r <- (x^2 + y^2) / 2
      matrix(10 * sin(r) / del, length(x))
    }
    end <- proc.time() - start
    end
    0.37

204. Parallel computation with R: parallel execution of for-loops
Examples of foreach usage
Result of this code

    # Plot the results as a perspective plot
    persp(x, x, z, ylab = 'y', theta = 30, phi = 30, expand = 0.5, col = "lightblue")
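For checking the parallel result, the same surface can be computed sequentially in base R. The sketch below assumes the chunked foreach code evaluates 10 * sin(r) / del at every pair of grid values; `surface` and `z_seq` are hypothetical names introduced here:

```r
x <- seq(-10, 10, by = 0.1)

# Value of the surface at the grid point (a, b)
surface <- function(a, b) {
  r <- (a^2 + b^2) / 2
  del <- abs(1 + (a^2 + b^2)^0.7)
  10 * sin(r) / del
}

# outer() evaluates surface() for every combination of grid values
z_seq <- outer(x, x, surface)
```

Passing `z_seq` to `persp()` in place of `z` should give the same picture, since the surface is symmetric in its two arguments.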

205. Parallel computation with R: parallel computation with graphical processing unit
Package: gputools
This package provides R interfaces to a handful of common statistical algorithms. These algorithms are implemented in parallel using a mixture of Nvidia's CUDA language, Nvidia's CUBLAS library, and EM Photonics' CULA libraries. On a computer equipped with an Nvidia GPU, some of these functions may be substantially more efficient than native R routines.
Note! Simply put, this package contains a set of specialized functions that can use the GPU for computing. The full list of functions, with descriptions, can be found in the documentation. However, this package is available only for Linux.

213. Parallel computation with R: parallel computation with graphical processing unit
A short example of gputools usage:

    # GPU. Here is an example:
    library(gputools)
    matA <- matrix(runif(3 * 2), 3, 2)
    matB <- matrix(runif(3 * 4), 3, 4)

    # Perform matrix cross-product with a GPU
    gpuCrossprod(matA, matB)

    # Distances between vectors with a GPU
    numVectors <- 5
    dimension <- 10
    Vectors <- matrix(runif(numVectors * dimension), numVectors, dimension)
    gpuDist(Vectors, "euclidean")
    gpuDist(Vectors, "maximum")
    gpuDist(Vectors, "manhattan")
    gpuDist(Vectors, "minkowski", 4)
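On a machine without an Nvidia GPU, the same computations can be run with the base R routines. This assumes, as the gputools documentation describes, that gpuCrossprod and gpuDist mirror `crossprod` and `dist`. A CPU-only sketch, useful as a baseline when timing the GPU versions; the seed and the result names are illustrative:

```r
set.seed(42)  # illustrative seed

matA <- matrix(runif(3 * 2), 3, 2)
matB <- matrix(runif(3 * 4), 3, 4)

# CPU cross-product: t(matA) %*% matB, a 2 x 4 matrix
cpuCross <- crossprod(matA, matB)

numVectors <- 5
dimension <- 10
Vectors <- matrix(runif(numVectors * dimension), numVectors, dimension)

# CPU distances between the rows of Vectors
dEuclid <- dist(Vectors, method = "euclidean")
dMax <- dist(Vectors, method = "maximum")
dManhattan <- dist(Vectors, method = "manhattan")
dMinkowski <- dist(Vectors, method = "minkowski", p = 4)
```

For matrices this small, transferring the data to the GPU is likely to cost more than the computation itself, so a GPU advantage should only be expected on much larger inputs.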

214. Working with very large datasets
Package bigmemory
Motivation
Multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment.
Description
The package bigmemory and its sister packages bridge this gap, implementing massive matrices and supporting their manipulation and exploration.
The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set.
The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster.

222. Working with very large datasets
Bigmemory usage examples

    # Here is an example that uses a very, very large matrix
    # This example illustrates how to work with a
    # big.matrix: no 2147483648 object size limitation.
    library(bigmemory)
    R <- 3e9  # 3 billion rows
    C <- 2    # 2 columns
    print("48 GB total size:")
    R * C * 8  # bytes: 48 GB total size
    x <- filebacked.big.matrix(R, C, type = 'double',
                               backingfile = 'huge-data.bin',
                               descriptorfile = 'huge-data.desc')
    # Generates huge-data.bin and huge-data.desc files.
    # Now we can use the huge-data.desc file in any R session.
    x[1, ] <- rnorm(C)
    x[nrow(x), ] <- runif(C)
    summary(x[1, ])
    summary(x[nrow(x), ])
    # Note: this example will leave a 48 GB file on your hard drive!
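The descriptor file is what lets another R session reach the same data. A scaled-down sketch, assuming bigmemory is installed: it uses 1000 rows instead of 3 billion, writes to a temporary directory, and re-attaches the matrix through its descriptor file with `attach.big.matrix`. The file names and the values written are illustrative:

```r
library(bigmemory)

tmp <- tempdir()  # illustrative location; the backing files are created here

# Small file-backed matrix: 1000 rows, 2 columns of doubles
x <- filebacked.big.matrix(1000, 2, type = "double",
                           backingfile = "small-data.bin",
                           backingpath = tmp,
                           descriptorfile = "small-data.desc")
x[1, ] <- c(1.5, 2.5)

# Attach the same data through the descriptor file,
# as a second R session would do
y <- attach.big.matrix(file.path(tmp, "small-data.desc"))
first_row <- y[1, ]
```

Both `x` and `y` refer to the same backing file, so values written through one are visible through the other.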

223. Working with very large datasets
Package filehash
Motivation
Working with large datasets in R can be cumbersome because of the need to keep objects in physical memory. While many might generally see that as a feature of the system, the need to keep whole objects in memory creates challenges for those who might want to work interactively with large datasets.
Here we take a simple definition of "large dataset" to be any dataset that cannot be loaded into R as a single R object because of memory limitations. For example, a very large data frame might be too large for all of the columns and rows to be loaded at once. In such a situation, one might load only a subset of the rows or columns, if that is possible.

229. Working with very large datasets
Package filehash
Description
The filehash package provides a full read-write implementation of a key-value database for R. The package does not depend on any external packages or software systems and is written entirely in R, making it readily usable on most platforms. The filehash package can be thought of as a specific implementation of the database concept, taking a slightly different approach to the problem.
Technical Note
Key-value databases are sometimes called hash tables. With filehash the values are stored in a file on the disk rather than in memory. When a user requests the values associated with a key, filehash finds the object on the disk, loads the value into R and returns it to the user. The package offers two formats for storing data on the disk: the values can be stored (1) concatenated together in a single file or (2) separately as a directory of files.

232. Working with very large datasets
Filehash usage examples

    # Connecting library
    library(filehash)

    # Creating hash-database on HDD
    DATA_PATH <- "E:/R_works/file_hash_data_strorage/db_test"
    DATA_PATH
    dbCreate(DATA_PATH)

    # Initializing link to our hash-database
    db <- dbInit(DATA_PATH)

    # Load matrix to our database
    # Dimensions
    its <- 3000000
    dim <- 10
    dbInsert(db, "our_big_matrix", matrix(rnorm(its * dim), its, dim))
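Once a value has been inserted, it can be listed, fetched and deleted by key. A scaled-down sketch, assuming filehash is installed; it uses a temporary path and a tiny matrix instead of the path and 3000000-row matrix above, and the key name is illustrative:

```r
library(filehash)

db_path <- tempfile("db_test")  # illustrative temporary location
dbCreate(db_path)
db <- dbInit(db_path)

# Store a small matrix under a key
dbInsert(db, "small_matrix", matrix(1:6, 2, 3))

keys <- dbList(db)                        # all keys in the database
has_key <- dbExists(db, "small_matrix")   # TRUE while the key is present
m <- dbFetch(db, "small_matrix")          # read the value back from disk

# Remove the key when the value is no longer needed
dbDelete(db, "small_matrix")
still_there <- dbExists(db, "small_matrix")
```

Only the fetched object occupies memory; everything else stays on disk until it is requested.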

233. Final words, some useful references and contacts
Some useful references
These are some useful links:
The book The Art of R Programming - http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
The book Econometrics in R - http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
R Installation and Administration - http://cran.r-project.org/doc/manuals/R-admin.html
A very interesting presentation about HPC in R - http://www.slideshare.net/bytemining/r-hpc
Aggregator of R blog posts - http://www.r-bloggers.com/
Page of the commercial R project - http://www.revolutionanalytics.com/
There are many other sites... If you have a problem, just ask Google: How to "your problem here" in R

248. Final words, some useful references and contacts
Final words and contacts
Well... this presentation is only the beginning of my work in this direction. It is only my first try. I will continue this work and will publish future versions of this presentation with new materials and examples as soon as I have more free time. As for the quality of this version of the presentation: it is my first experience with the LaTeX system, so don't judge me harshly. If you are interested in this topic or have some ideas, just write to me. I am open to discussion.
This is my contact list:
email: aleksey.v.kutergin@gmail.com
facebook page: facebook.com/aleksey.kutergin
vk page: vk.com/aleksey_v_kutergin
