Taking R to the limit
       Kutergin Alex

Perm State University, MiFIT

      16 October 2012



        Kutergin A.   High performance computing with R
Outline

    1   General words about R
    2   Motivation and scope
    3   Basic ways of speeding up R code
    4   A special way of speeding up R code: package pnmath
    5   The problem of data splitting: package iterators
    6   Parallel computation with R: high-level parallelism (packages:
        parallel, snow and additional packages)
    7   Parallel computation with R: low-level parallelism (package: Rmpi)
    8   Parallel computation with R: parallel execution of for-loops
        (package: foreach)
    9   Parallel computation with R: parallel computation on the graphics
        processing unit (package: gputools)
   10   Working with very large datasets: packages filehash and
        bigmemory
   11   Final words, some useful references and contacts

General words about R


  R software
  R is free, powerful software for data analysis and statistical computing.
  It is a console application with its own programming language, which runs
  in interpreter mode. The lack of a sophisticated GUI has a number of
  advantages:
      there is no need to learn which algorithm is behind each button
      you only need to learn the basic principles of R programming to
      solve complex problems effectively in the R language

  Download R
      R can be downloaded from the following link:
      http://cran.r-project.org/
      Project page: www.r-project.org



General words about R
View of R work session




General words about R
packages and information sources




    There are two sources of happiness for an R programmer
            Source of information                         Source of packages




Motivation and scope

  Motivation
      Computers are becoming more productive. Progress in computer
      hardware and software is amazing, and this computing power is now
      available even in a laptop
      The volume of data and the complexity of the problems associated
      with processing it are constantly growing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We are ordinary students without special resources, so we do not
      have a supercomputer
      We have a Core i5, a Core i7 or another multi-core laptop or PC
      with CUDA support
      We have some computational tasks and we want to solve them more
      efficiently


Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
Motivation and scope

  Motivation
      Computers become more productive. Progress in computer’s
      hardware and software is amazing. These computing power became
      available even in a laptop
      Constantly increasing growth of data’s volume and the complexity of
      problems associated with data processing
      The emergence of multi-core PCs and CUDA technology

  Scope
      We: simple students or not powerful guys. So we don’t have
      supercomputer
      We have Core i5 or Core i7 or another multi-core laptop or PC with
      support of CUDA technology
      We have some computational tasks and we want to solve them more
      effectively


                           Kutergin A.   High performance computing with R
The basic ways of speeding up the R-code
How to measure code execution time?

    First way to measure the execution time of code

    #system.time(expr) returns the CPU (and other) times that expr used
    system.time(sum(runif(10000000)))

    Second way to measure the execution time of code

    #proc.time() determines how much real and CPU time (in seconds) the
    #currently running R process has already taken
    proc.time()

    start_time <- proc.time()
    sum(runif(10000000))
    end_time <- proc.time() - start_time

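    The two timers can be combined into one small helper. This is only a
    sketch: the name time_it is hypothetical (it is not part of base R), and
    the vector length is reduced so that the example runs quickly.

```r
# Hypothetical helper (not part of base R): times an expression with
# proc.time() and returns the elapsed wall-clock seconds.
time_it <- function(expr) {
  start <- proc.time()
  force(expr)                          # the lazily passed expression runs here
  (proc.time() - start)[["elapsed"]]
}

# system.time() returns a "proc_time" object; its first three entries are
# user CPU, system CPU and elapsed (wall-clock) time in seconds.
t1 <- system.time(sum(runif(1e6)))
names(t1)[1:3]
t1[["elapsed"]]                        # same entry as t1[3], selected by name

t2 <- time_it(sum(runif(1e6)))
t2
```

    Selecting the entry by name avoids remembering that position 3 is the
    elapsed time.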
The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Function’s profile
    Let us compare the general-purpose function lm() with the more
    specialized function lm.fit()

    #Loading a dataset
    data(longley)
    #Recording the profile to the file lm.out
    Rprof("lm.out")
    #Running lm() 1000 times
    invisible(replicate(1000, lm(Employed ~ . - 1, data = longley)))
    #Switching off profiling
    Rprof(NULL)

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    #Preparing data for lm.fit()
    longleydm <- data.matrix(data.frame(longley))
    #Recording the profile to the file lm.fit.out
    Rprof("lm.fit.out")
    #Running lm.fit() 1000 times
    invisible(replicate(1000,
        lm.fit(longleydm[, -7], longleydm[, 7])))
    #Switching off profiling
    Rprof(NULL)

    #Results of profiling
    summaryRprof("lm.out")$sampling.time
    [1] 3.12
    summaryRprof("lm.fit.out")$sampling.time
    [1] 0.18
    #What a difference!

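    The comparison is only fair if both functions fit the same model. The
    sketch below checks that on the longley data and then times both calls
    with system.time() instead of the profiler. The repetition count of 200
    is an arbitrary choice, and the timings depend on the machine.

```r
# Fit the same model (no intercept) both ways on the built-in longley data.
data(longley)
fit_lm <- lm(Employed ~ . - 1, data = longley)

# lm.fit() takes a numeric design matrix and a response vector directly,
# skipping the formula and model-frame handling that lm() performs.
longleydm <- data.matrix(longley)
fit_raw <- lm.fit(longleydm[, -7], longleydm[, 7])

# Both calls should agree on the coefficients up to numerical tolerance.
same_coef <- all.equal(unname(coef(fit_lm)), unname(fit_raw$coefficients))
same_coef

# Rough elapsed-time comparison of the two calls.
system.time(replicate(200, lm(Employed ~ . - 1, data = longley)))[["elapsed"]]
system.time(replicate(200, lm.fit(longleydm[, -7], longleydm[, 7])))[["elapsed"]]
```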
The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Package profr
    This package allows you to visualize the results of profiling

    library("profr")
    plot(parse_rprof("lm.out"), main = "Profile of lm()")
    plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")

    Package proftools
    This package allows you to visualize the call graph of a function

    library("Rgraphviz"); library("proftools")
    lmfitprod <- readProfileData("lm.fit.out")
    plotProfileCallGraph(lmfitprod)

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Call graph
    [Figure: call graph]

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Another example of profiling:
    its <- 2500; dim <- 1750
    X <- matrix(rnorm(its * dim), its, dim)
    #Cross product computed as a sum of outer products of the rows
    my.cross.prod <- function(X)
    {
        C <- matrix(0, ncol(X), ncol(X))
        for (i in 1:nrow(X))
        {
            C <- C + X[i, ] %o% X[i, ]
        }
        return(C)
    }
    #Recording the profile to the file matrix-mult.out
    Rprof("matrix-mult.out")
    C <- my.cross.prod(X)
    C1 <- t(X) %*% X
    C2 <- crossprod(X)
    Rprof(NULL)
    #all.equal() compares two objects at a time
    print(isTRUE(all.equal(C, C1)) && isTRUE(all.equal(C1, C2)))

The basic ways of speeding up the R-code
Analysis of the effectiveness of programs

    Result:
    library(proftools)
    profile.data <- readProfileData("matrix-mult.out")
    flatProfile(profile.data)
                  total.pct total.time self.pct self.time
    my.cross.prod     87.31      88.36     0.04      0.04
    +                 49.84      50.44    49.84     50.44
    %o%               37.37      37.82     0.00      0.00
    outer             37.37      37.82    37.27     37.72
    %*%                7.75       7.84     7.75      7.84
    crossprod          4.86       4.92     4.86      4.92
    t                  0.16       0.16     0.06      0.06
    t.default          0.10       0.10     0.10      0.10
    matrix             0.06       0.06     0.06      0.06
    as.vector          0.02       0.02     0.02      0.02

The basic ways of speeding up the R-code
Vectorization of code

    Note!
    Loops in R are slow! You can speed up your code by operating on whole
    vectors and matrices. It is a different style of programming, but it is
    worth adopting!

    #Simple example of vectorization:
    #component-wise addition of two vectors
    #Generating some random data
    #First vector
    a <- rnorm(n = 10000000)
    #Second vector
    b <- rnorm(n = 10000000)
    #Vector for the result
    x <- rep(0, length(a))

The basic ways of speeding up the R-code
Vectorization of code

    So, what about results?
    #Slow way
    time_1 <- system.time(
        for (i in 1:length(a))
        {
            x[i] <- a[i] + b[i]
        }
    ); time_1[3]
    36.97
    #Fast way
    time_2 <- system.time(x <- a + b); time_2[3]
    0.04
    Acceleration <- time_1[3] / time_2[3]
    Acceleration
    924.25
    #That’s hot!!!!

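    A smaller, self-contained version of the same experiment is sketched
    below. It also checks that the loop and the vectorized call give the same
    result. The vector length and the seed are arbitrary choices made here,
    and the measured times will differ from the numbers on the slide.

```r
set.seed(1)                      # fixed seed so the run is reproducible
n <- 100000                      # smaller than in the slides, to keep it quick
a <- rnorm(n)
b <- rnorm(n)

# Loop version: pre-allocate the result, then fill it element by element.
x_loop <- numeric(n)
t_loop <- system.time(
  for (i in seq_len(n)) x_loop[i] <- a[i] + b[i]
)[["elapsed"]]

# Vectorized version: a single call, the loop runs in compiled code.
t_vec <- system.time(x_vec <- a + b)[["elapsed"]]

identical(x_loop, x_vec)         # both versions compute the same values
c(loop = t_loop, vectorized = t_vec)
```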
The basic ways of speeding up the R-code
Using magic of linear algebra

    Using linear algebra operations
    #Scalar product
    #Slow way
    start <- proc.time()
    res <- 0
    for (i in 1:length(a))
    {
        res <- res + a[i] * b[i]
    }
    end <- proc.time() - start; end[3]
    16.71
    #Fast
    system.time(a %*% b)[3]
    0.09
    #Even faster...
    system.time(sum(a * b))[3]
    0.08

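    The three ways of computing the scalar product return slightly different
    kinds of objects. The sketch below, on short vectors chosen here for
    illustration, shows the difference and checks that the values agree.

```r
set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000)

# Explicit loop, as in the slow version.
res <- 0
for (i in seq_along(a)) res <- res + a[i] * b[i]

dot_matrix <- a %*% b            # result is a 1 x 1 matrix
dot_sum    <- sum(a * b)         # result is a plain numeric scalar
dot_cross  <- crossprod(a, b)    # 1 x 1 matrix, equivalent to t(a) %*% b

dim(dot_matrix)
# Compare with a tolerance: the summation order may differ between methods.
all.equal(res, dot_sum)
all.equal(drop(dot_matrix), dot_sum)
all.equal(drop(dot_cross), dot_sum)
```

    drop() turns the 1 x 1 matrix into a scalar when a plain number is needed.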
The basic ways of speeding up the R-code
Using magic of linear algebra

    Using linear algebra operations
    #Matrix multiplication: slow version
    its <- 2500; dim <- 1750
    X <- matrix(rnorm(its * dim), its, dim)
    X_transp <- t(X)
    res <- array(NA, dim = c(1750, 1750))
    start <- proc.time()
    for (i in 1:nrow(X_transp))
    {
        for (j in 1:ncol(X))
        {
            res[i, j] <- sum(X_transp[i, ] * X[, j])
        }
    }
    end <- proc.time() - start; end[3]
    221.67

The basic ways of speeding up the R-code
Using magic of linear algebra

    BLAS library
    BLAS stands for Basic Linear Algebra Subprograms. It is a library, not
    an R package: R passes linear algebra operations to it. An optimized,
    multi-threaded BLAS implementation (for example OpenBLAS or Intel MKL)
    can use all cores of a multi-core machine automatically.

    #Matrix multiplication: fast version
    #BLAS matrix multiplication
    system.time(X_transp %*% X)[3]
    7.77
    #Even faster...
    system.time(crossprod(X))[3]
    4.98

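    The sketch below repeats the comparison on a small matrix (the sizes are
    chosen here only to keep the run short) and checks that the double loop,
    %*% and crossprod() agree. The last line prints the session details,
    which in recent R versions include the BLAS library in use.

```r
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)   # small, for a quick run

# Double loop over the entries of t(X) %*% X.
Xt <- t(X)
res <- matrix(NA_real_, ncol(X), ncol(X))
for (i in seq_len(nrow(Xt))) {
  for (j in seq_len(ncol(X))) {
    res[i, j] <- sum(Xt[i, ] * X[, j])
  }
}

C1 <- Xt %*% X          # general matrix product
C2 <- crossprod(X)      # computes t(X) %*% X without forming t(X) explicitly

all.equal(res, C1)
all.equal(C1, C2)
isSymmetric(C2)

# Session details; R >= 3.4.0 lists the BLAS and LAPACK libraries here.
sessionInfo()
```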
The basic ways of speeding up the R-code
Using built-in R-functions

    Package base
    You can find the full list of built-in R functions in the documentation
    of this package

    #Let us define a function
    mySum <- function(N)
    {
        sumVal <- 0
        for (i in 1:N) { sumVal <- sumVal + i }
        return(sumVal)
    }
    system.time(mySum(1000000))[3]
    0.62
    system.time(sum(as.numeric(seq(1, 1000000))))[3]
    0.05

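    Before comparing speed, it is worth confirming that the loop and the
    built-in function return the same value. In this sketch the closed-form
    formula N(N + 1)/2 is added as an independent check, and the value of N
    is chosen here for a quick run.

```r
mySum <- function(N) {
  sumVal <- 0
  for (i in seq_len(N)) sumVal <- sumVal + i
  sumVal
}

N <- 100000
loop_result    <- mySum(N)
builtin_result <- sum(as.numeric(seq_len(N)))   # as.numeric() keeps the sum in double precision
closed_form    <- N * (N + 1) / 2               # independent check of the result

c(loop = loop_result, builtin = builtin_result, formula = closed_form)
```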
The basic ways of speeding up the R-code
Using built-in R-functions

    Why are built-in R functions faster?
    R is an interpreted language, and interpreted code is usually slower
    than compiled code. When you call a built-in R function, you call
    optimized, compiled code. Built-in functions are also written in
    lower-level programming languages (such as C/C++ or FORTRAN), which
    gives greater access to the capabilities of the hardware

    Note!
    You can select data from a vector, matrix, data.frame or array using a
    condition applied to a row or column of the data object. It is fast and
    convenient
    #Extracting only positive values from the first column of X
    its <- 2500; dim <- 1750
    X <- matrix(rnorm(its * dim), its, dim)
    X[X[, 1] > 0, 1]

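    The selection works through a logical vector. The sketch below shows the
    intermediate steps on a small matrix (its size is chosen here for
    illustration) and compares the result with an explicit loop.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)

keep <- X[, 1] > 0                     # logical vector, one entry per row
positive_first <- X[keep, 1]           # positive values of the first column
rows_kept <- X[keep, , drop = FALSE]   # whole rows; drop = FALSE keeps a matrix

# The same selection written as an explicit loop, for comparison.
loop_result <- numeric(0)
for (i in seq_len(nrow(X))) {
  if (X[i, 1] > 0) loop_result <- c(loop_result, X[i, 1])
}
identical(positive_first, loop_result)
```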
The special way of speeding up the R-code
  Package pnmath
  Another easy way to get a speed-up is to use the pnmath package in R.
  This package takes many of the standard math functions in R and
  replaces them with multi-threaded versions, using OpenMP. Some
  functions get more of a speed-up than others with pnmath.

    #Generating random data
    v1 <- runif(1000)
    v2 <- runif(100000000)
    #Time of execution without pnmath
    system.time(qtukey(v1, 2, 3))
    system.time(exp(v2))
    system.time(sqrt(v2))
    #Time of execution with pnmath
    library(pnmath)
    system.time(qtukey(v1, 2, 3))
    system.time(exp(v2))
    system.time(sqrt(v2))

Problem of data splitting

  Our problem:
  Before you start the calculation you need to split your data set according
  to the number of threads. Splitting also makes data processing in loops
  more efficient

  Package iterators
  The iterators package provides tools for iterating over various R data
  structures. Iterators are available for vectors, lists, matrices, arrays, data
  frames and files. By following very simple conventions, new iterators can
  be written to support any type of data source, such as database queries
  or dynamically generated data

  Download
  You can download this useful package from CRAN (available for
  Windows!):
  http://cran.r-project.org/web/packages/iterators/index.html

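  Splitting a data set according to the number of workers can also be sketched
  without iterators, using splitIndices() from the parallel package that ships
  with R. The worker count of 4 is an assumption made for this example. On a
  real machine detectCores() reports the number of available cores.

```r
library(parallel)   # ships with R; provides detectCores() and splitIndices()

n_rows   <- nrow(state.x77)   # rows of a built-in data set
n_chunks <- 4                 # assumed worker count for this sketch

# splitIndices() divides 1:n_rows into n_chunks blocks of row indices.
idx <- splitIndices(n_rows, n_chunks)
lengths(idx)

# Each list element holds the rows that one worker would process.
chunks <- lapply(idx, function(i) state.x77[i, , drop = FALSE])

# Processing the chunks separately and combining gives the full-data result.
partial_sums <- vapply(chunks, function(m) sum(m[, "Population"]), numeric(1))
sum(partial_sums) == sum(state.x77[, "Population"])
```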
Problem of data splitting: package iterators
Capabilities

    icount(count)
    This function returns an iterator that counts starting from one. count is
    the number of times the iterator will fire. If not specified, it will
    count forever

    nextElem()
    This function returns the next value of a previously defined iterator.
    When the iterator has no more values, it calls stop() with the message
    "StopIteration"

    Example:
    library(iterators)
    #create an iterator that counts from 1 to 2
    it <- icount(2)
    nextElem(it)
    [1] 1
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

Problem of data splitting: package iterators
Capabilities

    You can create an iterator over the rows of your data structure using the
    iter() function:
    library(iterators)
    #Creating an iterator over the rows of a data set
    irState <- iter(state.x77, by = "row")
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Alabama       3615   3624        2.1 69.05   15.1  50708
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Alaska         365   6315        1.5 69.31   11.3 566432
    nextElem(irState)
            Population Income Illiteracy  Life Murder   Area
    Arizona       2212   4530        1.8 70.55    7.8 113417

Problem of data splitting: package iterators
Capabilities

    You can create an iterator over the columns of your data structure using
    the iter() function:
    #Creating an iterator over the columns of a data set
    #(only the first three rows of each column are shown)
    icState <- iter(state.x77, by = "col")
    nextElem(icState)
            Population
    Alabama       3615
    Alaska         365
    Arizona       2212
    nextElem(icState)
            Income
    Alabama   3624
    Alaska    6315
    Arizona   4530
    nextElem(icState)
            Illiteracy
    Alabama        2.1
    Alaska         1.5
    Arizona        1.8

Problem of data splitting: package iterators
Capabilities

    You can create an iterator with the iter() function from a data object
    returned by some other function:
    library(iterators)
    #Define a function which generates random data
    GetDataStructure <- function(meanVal1, meanVal2,
                                 sdVal1, sdVal2)
    {
        a <- rnorm(4, mean = meanVal1, sd = sdVal1)
        b <- rnorm(4, mean = meanVal2, sd = sdVal2)
        data <- a %o% b
        return(data)
    }
    ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
    nextElem(ifun); nextElem(ifun)
             [,1]     [,2]     [,3]     [,4]
    [1,] 701.7055 939.6574 764.7724 799.6965
             [,1]     [,2]     [,3]     [,4]
    [1,] 647.6349 867.2512 705.8422 738.0752

Problem of data splitting: package iterators
Capabilities

    idiv(n, ..., chunks, chunkSize)
    This is a more interesting iterator. It divides a numeric value into
    pieces
          n - the integer value to divide
          chunks - the number of pieces that n should be divided into. It
          is useful when you know the number of pieces that you want. If
          specified, chunkSize should not be
          chunkSize - the maximum size of the pieces that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, chunks should not be

    Some thoughts...
    However, the practical application of this iterator is unclear. Perhaps it
    can be used to index a vector or the rows/columns of an array

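    One possible use, following the thought above, is to turn the piece sizes
    returned by idiv() into index ranges over a vector. This is a sketch and
    not a recommended pattern: it assumes that the iterators package is
    installed, and the vector x and the number of pieces are made up for the
    example.

```r
library(iterators)   # CRAN package, assumed to be installed

x <- seq(10, 100, by = 10)            # 10 values to process in pieces

# Ask for 3 pieces; each nextElem() call returns the size of the next piece.
it <- idiv(length(x), chunks = 3)

first  <- 1
pieces <- list()
repeat {
  # Any error from nextElem(), including StopIteration, ends the loop here.
  size <- tryCatch(nextElem(it), error = function(e) NULL)
  if (is.null(size)) break
  pieces[[length(pieces) + 1]] <- x[first:(first + size - 1)]
  first <- first + size
}

lengths(pieces)                        # the piece sizes add up to length(x)
identical(unlist(pieces), x)
```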
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

                                 Kutergin A.   High performance computing with R
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

                                 Kutergin A.   High performance computing with R
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

                                 Kutergin A.   High performance computing with R
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

                                 Kutergin A.   High performance computing with R
Problem of data splitting: package iterators
Capabilities


    idiv(n, chunk, chunksize)
    This is more interesting iterator. It provides the ability to divide a
    numeric value into pieces
          n - number of times that iterator will fire. If not specified, it will
          count forever
          chunks - the number of pieces that n should be divided into. It
          useful when you know the number of pieces that you want. If
          specified, the chunkSize should not be
          chunkSize - the maximum size of the pieces, that n should be
          divided into. It is useful when you know the size of the pieces that
          you want. If specified, the chunk should not be

    Some thoughts...
    However, practical application of this iterator is unclear. Perhaps it can
    be used to index vector or rows/columns of arrays

                                 Kutergin A.   High performance computing with R
    Example:
    library(iterators)
    # divide the value 10 into 3 pieces
    it <- idiv(10, chunks = 3)
    nextElem(it)
    [1] 4
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

    Example:
    library(iterators)
    # divide the value 10 into pieces no larger than 3
    it <- idiv(10, chunkSize = 3)
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 3
    nextElem(it)
    [1] 2
    nextElem(it)
    [1] 2
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration
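
    One possible use, following the thought that idiv could index a vector:
    the chunk sizes it returns can be turned into index ranges. This is only a
    sketch. The data vector x and the helper next_or_null are illustrative
    names, not part of the iterators package.

```r
library(iterators)

# Illustrative data (not from the slides): 10, 20, ..., 100
x <- seq_len(10) * 10

# Iterator over chunk sizes: at most 3 elements per chunk
it <- idiv(length(x), chunkSize = 3)

# Hypothetical helper: return the next element, or NULL once the
# iterator signals StopIteration; any other error is re-raised
next_or_null <- function(it) {
  tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
}

start <- 1
chunk_sums <- numeric(0)
while (!is.null(size <- next_or_null(it))) {
  idx <- seq(start, length.out = size)      # indices of the current chunk
  chunk_sums <- c(chunk_sums, sum(x[idx]))  # process one chunk at a time
  start <- start + size
}
chunk_sums
```

    Each pass touches only one chunk of x, so the same loop body could be
    handed to a parallel worker instead of being run sequentially.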

    iread.table(file, ..., verbose = FALSE)
    This is a very important iterator. It returns an iterator over the rows
    of a data frame stored in a file in table format
          file - the name of the file to read data from
          ... - all additional arguments are passed on to the read.table
          function. See the documentation for read.table for more information
          verbose - logical flag indicating whether or not to print the calls to
          read.table

    Note!
    In this version of iread.table, both of the read.table arguments header
    and row.names must be specified. This is because the default values of
    these arguments depend on the contents of the beginning of the file. In
    order to make the subsequent calls to read.table work consistently, the
    user must specify those arguments explicitly

    Example:
    library(iterators)

    # Generate random data
    its <- 2000000; dim <- 3
    data <- matrix(rnorm(its * dim), its, dim)

    # Write the data to disk
    DATA_PATH <- "E:/R_works/data.txt"
    # Size of this file: 123 Mb
    write.table(data, file = DATA_PATH,
                append = FALSE, sep = "\t", dec = ".")

    # Create an iterator over this file
    ifile <- iread.table(DATA_PATH, header = TRUE,
                         row.names = NULL, verbose = FALSE)
    nextElem(ifile)
      row.names        V1        V2       V3
    1         1 -1.042623 -1.386382 0.399798
    nextElem(ifile)
      row.names        V1        V2        V3
    1         2 0.8841238 -1.296501 0.1580505
    nextElem(ifile)
      row.names         V1         V2        V3
    1         3 -0.3195784 -0.6830442 0.3647958

    # It works very fast!
    # Remove the file
    file.remove(DATA_PATH)
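
    To process a whole file rather than its first few rows, the iterator can
    be drained in a loop until it signals StopIteration. The sketch below
    assumes a small temporary file. The column names a and b and the helper
    next_or_null are illustrative only.

```r
library(iterators)

# Small illustrative table written to a temporary file
path <- tempfile(fileext = ".txt")
df <- data.frame(a = c(1.5, 2.5, 3.0), b = c(10, 20, 30))
write.table(df, file = path, sep = "\t", row.names = FALSE)

# header and row.names must both be given explicitly
ifile <- iread.table(path, header = TRUE, row.names = NULL, sep = "\t")

# Hypothetical helper: return the next element, or NULL once the
# iterator signals StopIteration; any other error is re-raised
next_or_null <- function(it) {
  tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
}

total <- 0
rows_read <- 0
while (!is.null(rec <- next_or_null(ifile))) {
  total <- total + rec$a        # rec is a one-row data frame
  rows_read <- rows_read + 1
}

unlink(path)                    # remove the temporary file
c(rows = rows_read, total = total)
```

    Only one row is held in memory at a time, which is the point of reading
    a large file through an iterator.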

    isplit(x, f, drop = FALSE)
    Another important type of iterator. It returns an iterator that divides
    the data in the vector x into the groups defined by f
          x - vector or data frame of values to be split into groups
          f - a factor or list of factors used to categorize x
          drop - logical indicating whether levels that do not occur should be
          dropped
    More detailed information can be found in the documentation

    Note!
    This is very useful! For example, suppose you have a data vector and a
    vector containing the values of a factor corresponding to these data. The
    factor has predefined levels. You can then extract the data for each level
    of the factor in a loop without additional operations. You can also define
    conditions for each level of the factor in the loop’s body and use them
    in if() control structures

    Example:
    x <- rnorm(200)
    f <- factor(sample(1:10, length(x),
                       replace = TRUE))
    it <- isplit(x, f)

    nextElem(it)
    $value
     [1]  0.14087878 -0.94439161  0.13593045
     [4] -0.25732860  0.09422130 -0.55166303
     [7] -0.18325419 -0.00871019  0.38344388
    [10] -1.05761926  1.16126462 -0.02280205
    [13] -0.67338941  1.68724264  0.92112983
    [16]  1.39782337 -0.51060989
    $key
    $key[[1]]
    [1] "1"
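
    The per-level loop described in the note can be sketched as follows. Each
    element returned by the iterator carries the data ($value) and the level
    ($key), so a statistic per group needs no extra subsetting. The seed, the
    balanced factor and the helper next_or_null are assumptions made for
    illustration.

```r
library(iterators)

set.seed(1)                                     # illustrative seed only
x <- rnorm(200)
f <- factor(rep(1:10, length.out = length(x)))  # every level occurs

it <- isplit(x, f)

# Hypothetical helper: return the next element, or NULL once the
# iterator signals StopIteration; any other error is re-raised
next_or_null <- function(it) {
  tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
}

group_means <- numeric(0)
while (!is.null(grp <- next_or_null(it))) {
  level <- grp$key[[1]]                  # level of f for this group
  group_means[level] <- mean(grp$value)  # statistic for this group only
}
group_means
```

    The result should agree with tapply(x, f, mean). The iterator form is
    useful when each group is to be sent to a different worker.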

    Special types of iterators
    There are also special types of iterators, such as irnorm(..., count) or
    irunif(..., count). These functions return an iterator that returns random
    numbers from various distributions. Each one is a wrapper around a standard
    R function
          count - number of times that the iterator will fire. If not specified, it
          will fire values forever
          ... - arguments to pass to the underlying rnorm function

    Example:
    # create an iterator that returns two random numbers
    it <- irnorm(1, count = 2)
    nextElem(it); nextElem(it)
    [1] 0.1592311
    [1] -1.387449
    try(nextElem(it)) # expect a StopIteration exception
    Error : StopIteration

Parallel computation with R: high-level parallelism
packages: parallel, snow




    Scope
    High-level parallelism means that you do not need to define the scheme of
    communication between processes: which process is the master and which
    processes are the slaves. You only initialize the parallel environment and
    work inside it. All the details are handled by the package’s functions

    Package: snow
    The package contains the basic functions that allow you to create different
    types of clusters on a multicore machine

    Package: parallel
    This package builds on the packages multicore and snow and provides
    drop-in replacements for most of the functionality of those packages
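
    A minimal sketch of this "initialize and work inside it" style with the
    parallel package. The worker count of 2 and the function square are
    arbitrary choices for illustration.

```r
library(parallel)

# Start a cluster of 2 worker processes on the local machine
# (the worker count is an arbitrary choice for illustration)
cl <- makeCluster(2)

# Hypothetical workload: square each input value
square <- function(v) v^2

# parLapply distributes the inputs over the workers and collects
# the results in the original order
res <- parLapply(cl, 1:8, square)

# Release the worker processes when done
stopCluster(cl)

unlist(res)
```

    No message passing is written by hand here: the master/worker
    communication is handled entirely inside makeCluster and parLapply.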

Parallel computation with R: high-level parallelism
package: parallel



    Description
    The landscape of parallel computing has changed with the advent of
    shared-memory computers with multiple (and often many) CPU cores.
    Until the late 2000s, parallel computing was mainly done on clusters of
    large numbers of single- or dual-CPU computers: nowadays even laptops
    have two or four cores, and servers with 8 or more cores are
    commonplace. It is such hardware that package parallel is designed to
    exploit. It can also be used with several computers running the same
    version of R connected by (reasonable-speed) ethernet: the computers
    need not be running the same OS

    Scope
    Parallelism can be done in computation at many different levels: this
    package is principally concerned with "coarse-grained parallelization"



Parallel computation with R: high-level parallelism
package: parallel


    Computational model
    This package handles running large chunks of computation in parallel.
    The crucial point is that these chunks of computation are unrelated and
    do not need to communicate in any way. It is often the case that the
    chunks take approximately the same length of time. The basic
    computational model is
    (a) Start up M "worker" processes, and do any initialization needed on
        the workers
    (b) Send any data required for each task to the workers
    (c) Split the task into M roughly equally-sized chunks, and send the
        chunks (including the R code needed) to the workers
    (d) Wait for all the workers to complete their tasks, and ask them for
        their results
    (e) Repeat steps (b)-(d) for any further tasks
    (f) Shut down the worker processes
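
    The steps of this computational model can be sketched with functions from
    the parallel package. This is a minimal illustration, not code from the
    original slides: the number of workers (M = 2), the variable names values
    and power, and the sum-of-powers task are all assumptions chosen for the
    example.

```r
library(parallel)

# (a) Start M worker processes (M = 2 is an arbitrary choice for this sketch)
M <- 2
cl <- makeCluster(M)

# (b) Send the data required for the task to every worker
power <- 2                                  # hypothetical shared parameter
clusterExport(cl, "power", envir = environment())

# (c) Split the task into M roughly equally-sized chunks of indices
values <- 1:20                              # hypothetical input data
chunks <- splitIndices(length(values), M)

# (c)-(d) Send each chunk (and the function) to a worker, then wait for results
partial <- clusterApply(
  cl, chunks,
  function(idx, values) sum(values[idx]^power),
  values = values
)

# Combine the partial results on the master
total <- sum(unlist(partial))
print(total)

# (e) Further tasks would repeat steps (b)-(d) on the same cluster

# (f) Shut down the worker processes
stopCluster(cl)
```

    Because the chunks are independent, the master only has to combine the
    partial results at the end. No communication between workers is needed.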

Parallel computation with R: high-level parallelism
package: snow


   Description
   The package contains the basic functions that allow you to create different
   types of clusters on a multicore machine, such as
        makeSOCKcluster(names, ..., options = defaultClusterOptions)
        makeMPIcluster(count, ..., options = defaultClusterOptions)
   It also contains functions for computing on snow clusters, such as
        clusterCall(cl, fun, ...) calls a function fun with identical arguments
        ... on each node in the cluster cl and returns a list of the results
        clusterEvalQ(cl, expr) evaluates a literal expression on each cluster
        node. It is a cluster version of evalq
        clusterApply(cl, x, fun, ...) calls fun on the first cluster node with
        arguments x[[1]] and ..., on the second node with arguments
        x[[2]] and ..., and so on
   We will not go further into the syntax here. All the details can be found
   in the package documentation
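
   A short sketch of the three cluster functions described above. To keep it
   self-contained it assumes the parallel package, which ships with R and
   provides functions with the same names as snow. With snow installed,
   library(snow) should work in the same way. The worker count and the
   variable scale_factor are hypothetical choices for illustration.

```r
library(parallel)  # assumption: parallel's snow-derived functions are used

cl <- makeCluster(2)   # two local workers; the count is arbitrary

# clusterCall: the same function with identical arguments on every node
same <- clusterCall(cl, function(a, b) a + b, 1, 2)

# clusterEvalQ: evaluate a literal expression on every node
# (here it defines a hypothetical variable in each worker's global environment)
clusterEvalQ(cl, scale_factor <- 3)

# clusterApply: element i of x is sent to a node, reusing nodes if x is longer
scaled <- clusterApply(cl, 1:4, function(v) v * scale_factor)

print(unlist(same))
print(unlist(scaled))

stopCluster(cl)
```

   Note that clusterCall returns one result per node, while clusterApply
   returns one result per element of x.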

Parallel computation with R: high-level parallelism
packages: doParallel, doSNOW




   Package: doSNOW
   The registerDoSNOW(cl) function registers the snow parallel backend
   with the foreach package, where
         cl - the cluster object to use for parallel execution

   Package: doParallel
   The registerDoParallel(cl, cores=NULL, ...) function registers a parallel
   backend for the foreach package using the parallel package, where
         cl - a cluster object returned by makeCluster, or the number of
         nodes to be created in the cluster. If not specified, on Windows a
         three-worker cluster is created and used
         cores - the number of cores to use for parallel execution
         ... - package options
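
   A minimal sketch of what registering a backend makes possible, assuming
   the doParallel and foreach packages are installed. The two-worker cluster
   and the squaring task are hypothetical choices for illustration. Once a
   backend is registered, the %dopar% operator of foreach sends the loop
   iterations to the workers.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)      # two workers, an arbitrary choice for this sketch
registerDoParallel(cl)    # register the cluster as the foreach backend

# Each iteration runs on a worker; .combine = c collects results into a vector
squares <- foreach(i = 1:5, .combine = c) %dopar% i^2
print(squares)

stopCluster(cl)
```

   With doSNOW the pattern should be the same, with registerDoSNOW(cl) in
   place of registerDoParallel(cl).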



Parallel computation with R: high-level parallelism
Example of cluster based on parallel package


    library(parallel)
    library(doParallel)
    # Detect how many cores we have
    CoresCount <- detectCores(); CoresCount
    [1] 4
    # Initialize the cluster
    cl <- makeCluster(CoresCount); cl
    socket cluster with 4 nodes on host ‘localhost’
    # How many cores of our cluster we are going to use
    CoresCountForeUse <- CoresCount; CoresCountForeUse
    [1] 4
    # Register the parallel backend
    registerDoParallel(cl, cores = CoresCountForeUse)
    # Some expressions
    # Stop our cluster
    stopCluster(cl)

Parallel computation with R: high-level parallelism
Example of cluster based on snow package



    library(snow)
    library(doSNOW)

    # Make a socket cluster with four worker processes
    clSnow <- makeCluster(c("localhost", "localhost", "localhost", "localhost"),
                          type = "SOCK")
    clSnow
    socket cluster with 4 nodes on host ‘localhost’

    registerDoSNOW(clSnow)
    # Some expressions

    # Stop our cluster
    stopCluster(clSnow)


Parallel computation with R: low-level parallelism
Package: Rmpi




   Description
   Rmpi is the MPI interface for R. This R package allows you to create R
   programs which run cooperatively in parallel across multiple machines,
   or multiple CPUs on one machine, to accomplish a goal more quickly than
   running a single program on one machine

   So...
   I have not worked with this package yet, thus I can’t say much about it.
   This work is in progress




Parallel computation with R: parallel execution of for-loops
Package: foreach


    Motivation
    In many practical cases it is impossible to avoid using a loop. Loops
    in R are slow, so it would be great to speed up their execution

    Description
    The foreach package provides a new looping construct for executing R code
    repeatedly. The main reason for using the foreach package is that it
    supports parallel execution. The foreach package can be used with a
    variety of different parallel computing systems, including NetWorkSpaces
    and snow. In addition, foreach can be used with iterators, which allow
    the data to be specified in a very flexible way

    Note!
    foreach constructs run in parallel only when a parallel backend has been
    registered! Use them in parallel only with a registered parallel or snow
    cluster
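
The note can be checked with a small sketch, assuming a two-worker cluster from the parallel package. Each task returns the process id of the R session that evaluated it, so it is visible where the loop body actually ran. registerDoSEQ() is the sequential backend shipped with foreach.

```r
library(parallel)
library(foreach)
library(doParallel)

# With a registered cluster the tasks run in the worker processes
cl <- makeCluster(2)   # assumption: two workers
registerDoParallel(cl)
worker_pids <- foreach(i = 1:4, .combine = "c") %dopar% { Sys.getpid() }
stopCluster(cl)

# With the sequential backend the same %dopar% loop runs in this session
registerDoSEQ()
local_pids <- foreach(i = 1:4, .combine = "c") %dopar% { Sys.getpid() }

ran_on_workers <- !any(worker_pids == Sys.getpid())
ran_locally <- all(local_pids == Sys.getpid())
```

If no backend is registered at all, %dopar% also falls back to sequential execution and foreach issues a warning about it.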

Parallel computation with R: parallel execution of for-loops
Operators used with foreach object


    Operator %do%
    It is a binary operator that operates on a foreach object and an R
    expression. The expression is evaluated multiple times in an environment
    that is created by the foreach object, and that environment is modified
    for each evaluation as specified by the foreach object. %do% evaluates
    the expression sequentially. The results of evaluating the expression
    are returned as a list by default

    Operator %dopar%
    %dopar% is the parallel version of the %do% operator. It evaluates the
    expression in parallel

    Operator %:%
    The operator %:% is called the nesting operator. It is a binary operator
    used to merge two foreach objects into a single structure
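
A short sketch of the three operators on toy data. The loop bodies and variable names are illustrative only; registerDoSEQ() is used here so that %dopar% runs without a cluster and without the missing-backend warning.

```r
library(foreach)

# %do%: sequential evaluation, results come back as a list by default
squares <- foreach(i = 1:3) %do% { i^2 }

# %dopar%: same syntax; with the sequential backend it runs in this session
registerDoSEQ()
squares_par <- foreach(i = 1:3) %dopar% { i^2 }

# %:% merges two foreach objects into one nested loop:
# the inner loop builds a row with "c", the outer loop stacks rows with "rbind"
products <- foreach(a = 1:2, .combine = "rbind") %:%
  foreach(b = 1:3, .combine = "c") %do% { a * b }
```

Here products is a 2 x 3 matrix whose rows are 1 2 3 and 2 4 6. Replacing the last %do% with %dopar% under a registered cluster would spread the combined a, b tasks over the workers.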

Parallel computation with R: parallel execution of for-loops
Main arguments of the foreach function



    Note! This is important
         .combine - function that is used to process the task results as they
         are generated. This can be specified as a non-empty character string
         naming the function. Specifying "c" is useful for concatenating the
         results into a vector. The values "rbind" and "cbind" can combine
         vectors into a matrix. The values "+" and "*" can be used to process
         numeric data
         .inorder - logical flag indicating whether the .combine function
         requires the task results to be combined in the same order that they
         were submitted. If the order is not important, then setting .inorder
         to FALSE can give improved performance
         .multicombine - logical flag indicating whether the .combine function
         can accept more than two arguments. If it can take more than two
         arguments, then setting .multicombine to TRUE could improve the
         performance
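
A minimal sketch of these arguments on small inputs. It uses %do% so that it runs without a cluster; sum_all is a hypothetical helper introduced only for this illustration.

```r
library(foreach)

# "c" concatenates the task results into a vector
as_vector <- foreach(i = 1:4, .combine = "c") %do% { i * 10 }

# "+" reduces numeric results to a single sum
total <- foreach(i = 1:4, .combine = "+") %do% { i * 10 }

# "rbind" stacks vector results as the rows of a matrix
as_rows <- foreach(i = 1:4, .combine = "rbind") %do% { c(i, i^2) }

# A combine function that accepts any number of arguments, so
# .multicombine = TRUE lets foreach pass many results in one call.
# A sum does not depend on the order of the results, so .inorder = FALSE is safe
sum_all <- function(...) sum(...)
total_multi <- foreach(i = 1:4, .combine = sum_all,
                       .multicombine = TRUE, .inorder = FALSE) %do% { i * 10 }
```

Only set .inorder = FALSE when the combine function gives the same answer for any ordering of the results, as a sum does; "c" and "rbind" usually need the default order.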


Parallel computation with R: parallel execution of for-loops
Main arguments of the foreach function



    Note! This is important
         .errorhandling - specifies how a task evaluation error should be
         handled. If the value is "stop", then execution will be stopped if an
         error occurs. If the value is "remove", the result for that task will
         not be returned, or passed to the .combine function. If it is "pass",
         then the error object generated by task evaluation will be included
         with the rest of the results. It is assumed that the combine function
         will be able to deal with the error object
         .packages - character vector of packages that the tasks depend on
         .verbose - logical flag enabling verbose messages. This can be very
         useful for troubleshooting

    Further immersion
    As always, you can find all the detailed information in the documentation
    for this useful package
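
A small sketch of the three .errorhandling values. The function risky is a hypothetical task that fails for one input; %do% is used so that no cluster is needed.

```r
library(foreach)

# Hypothetical task: fails for i == 2, otherwise returns a number
risky <- function(i) {
  if (i == 2) stop("task 2 failed")
  i * 10
}

# "remove": the failed task is dropped from the results
kept <- foreach(i = 1:3, .combine = "c", .errorhandling = "remove") %do% risky(i)

# "pass": the error object is returned in place of the failed result
passed <- foreach(i = 1:3, .errorhandling = "pass") %do% risky(i)
failed <- vapply(passed, inherits, logical(1), what = "error")

# "stop" (the default): the whole loop ends with an error
stopped <- tryCatch(
  foreach(i = 1:3, .errorhandling = "stop") %do% risky(i),
  error = function(e) "loop stopped"
)
```

With "pass" no .combine is given here, so the error object simply becomes one element of the result list; a custom .combine function would have to handle it explicitly.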

Parallel computation with R: parallel execution of for-loops
Examples of foreach usage



    #sequentially
    time_seq <- system.time(foreach(i = 1:100) %do%
                              {sum(runif(10000000))})
    time_seq[3]
    31.06
    #in parallel
    time_par <- system.time(foreach(i = 1:100) %dopar%
                              {sum(runif(10000000))})
    time_par[3]
    15.25
    #acceleration
    acceleration <- time_seq[3] / time_par[3]
    acceleration
    elapsed
    2.036721
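
The acceleration can also be related to the number of workers. The sketch below reuses the timings reported above and assumes, which the slide does not state, that the loop ran on the 4-node cluster from the earlier cluster example.

```r
# Timings reported on the slide, in seconds
time_seq <- 31.06
time_par <- 15.25
workers <- 4    # assumption: the 4-node cluster from the earlier example

# Speedup: how many times faster the parallel run is
speedup <- time_seq / time_par

# Parallel efficiency: the share of the ideal speedup (equal to `workers`)
efficiency <- speedup / workers
```

Under that assumption the efficiency is about 0.51, meaning roughly half of the ideal fourfold speedup was reached.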


Parallel computation with R: parallel execution of for-loops
Examples of foreach usage


    #sequentially
    time_seq <- system.time(foreach(i = 1:100) %do%
                              {sum(sin(runif(10000000)))})
    time_seq[3]
    87.46
    #in parallel
    time_par <- system.time(foreach(i = 1:100) %dopar%
                              {sum(sin(runif(10000000)))})
    time_par[3]
    33.82
    #acceleration
    acceleration <- time_seq[3] / time_par[3]
    acceleration
    elapsed
    2.586044

Parallel computation with R: parallel execution of for-loops
Examples of foreach usage




    #Without .combine the results are returned as a list
    foreachResult <- foreach(i = 1:100) %dopar%
                         {sum(runif(10000000))}
    class(foreachResult)
    [1] "list"
    nrow(foreachResult)
    NULL
    ncol(foreachResult)
    NULL
    length(foreachResult)
    [1] 100




Parallel computation with R: parallel execution of for-loops
Examples of foreach usage




    #Combine results as matrix by columns
    foreachResult2 <- foreach(i = 1:100, .combine =
          "cbind") %dopar% {sum(runif(10000000))}
    class(foreachResult2)
    [1] "matrix"
    nrow(foreachResult2)
    [1] 1
    ncol(foreachResult2)
    [1] 100




                             Kutergin A.   High performance computing with R
    # Combine results as a matrix by rows
    foreachResult3 <- foreach(i = 1:100, .combine = "rbind") %dopar%
        { sum(runif(10000000)) }
    class(foreachResult3)
    #> [1] "matrix"
    nrow(foreachResult3)
    #> [1] 100
    ncol(foreachResult3)
    #> [1] 1

    # parallel, .multicombine = FALSE, .inorder = TRUE
    time1 <- system.time(foreach(i = 1:100,
        .combine = "rbind", .multicombine = FALSE,
        .inorder = TRUE) %dopar%
        { sum(runif(10000000)) })
    time1[3]
    #> elapsed
    #>   15.13
    # parallel, .multicombine = TRUE and .inorder = FALSE
    time2 <- system.time(foreach(i = 1:100,
        .combine = "rbind", .multicombine = TRUE,
        .inorder = FALSE) %dopar%
        { sum(runif(10000000)) })
    time2[3]
    #> elapsed
    #>   15.02

    # parallel, list as a result
    time_list <- system.time(foreach(i = 1:100) %dopar%
        { sum(runif(10000000)) })
    time_list[3]
    #> elapsed
    #>   15.24
    acceleration <- time1[3] / time2[3]
    acceleration
    #>  elapsed
    #> 1.007324
    accelerationList1 <- time_list[3] / time1[3]
    accelerationList1
    #> elapsed
    #> 1.00727
    accelerationList2 <- time_list[3] / time2[3]
    accelerationList2
    #>  elapsed
    #> 1.014647

    # Nested loops: outer %do%, inner %do%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %do%
    {
        foreach(k = 1:1000, .combine = "c",
                .multicombine = TRUE, .inorder = FALSE) %do%
        {
            sin(i) * cos(k)
        }
    }
    end <- proc.time() - start
    end
    #> 1.76
    SomeResult
    #> [1] 0.6106603

    # Nested loops: outer %do%, inner %dopar%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %do%
    {
        foreach(k = 1:1000, .combine = "c",
                .multicombine = TRUE, .inorder = FALSE) %dopar%
        {
            sin(i) * cos(k)
        }
    }
    end <- proc.time() - start
    end
    #> 35.79
    SomeResult
    #> [1] 0.6106603

    However, this construction does not work. It’s sad...
    # Not run
    # Nested loops: outer %dopar%, inner %do%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %dopar%
    {
        foreach(k = 1:1000, .combine = "c",
                .multicombine = TRUE, .inorder = FALSE) %do%
        {
            sin(i) * cos(k)
        }
    }
    end <- proc.time() - start
    end
    SomeResult

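The error message for the failing construction is not shown. One common cause with socket-based backends (an assumption here, since the backend in use is not stated) is that the worker processes have not loaded the foreach package, so the inner foreach call cannot be found on the workers. A sketch of a possible workaround using the .packages argument of foreach, assuming the doParallel package is installed:

```r
library(foreach)
library(doParallel)  # assumption: the backend choice is illustrative

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# .packages = "foreach" asks each worker to load foreach before
# evaluating the body, so the inner loop is available there
SomeResult <- foreach(i = 1:4, .combine = sum,
                      .packages = "foreach") %dopar% {
    foreach(k = 1:1000, .combine = "c") %do% {
        sin(i) * cos(k)
    }
}

parallel::stopCluster(cl)
registerDoSEQ()

# Reference value computed without foreach
expected <- sum(outer(sin(1:4), cos(1:1000)))
```

If the failure has a different cause, this sketch will not address it; comparing against the reference value at least confirms that the nested loop computes the intended sum.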
    So, how can we execute four tasks (each with 10000000 iterations) in four
    threads in parallel?
    # Define a function
    # This function emulates our single 10000000-iteration task
    # inside the foreach loop
    # This is necessary because only the internal foreach loop
    # can be executed in parallel mode
    GetSomeData <- function(indexVal)
    {
        tmpData <- rep(NA, length.out = 10000000)
        for (j in 1:10000000)
        {
            tmpData[j] <- sin(indexVal) * cos(j)
        }
        return(tmpData)
    }

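The explicit loop in GetSomeData serves to emulate a heavy task. For comparison, the same values can be computed with vectorised arithmetic, which is usually far faster in R. A sketch with hypothetical helper names (GetSomeDataLoop, GetSomeDataVec) and a length parameter n that are not in the original:

```r
# Loop version, as on the slide but with a configurable length n
GetSomeDataLoop <- function(indexVal, n) {
    tmpData <- numeric(n)
    for (j in seq_len(n)) {
        tmpData[j] <- sin(indexVal) * cos(j)
    }
    tmpData
}

# Vectorised version (hypothetical helper): one call to cos() on 1..n
GetSomeDataVec <- function(indexVal, n) {
    sin(indexVal) * cos(seq_len(n))
}

# Both versions on a small input
loopRes <- GetSomeDataLoop(2, 1000)
vecRes  <- GetSomeDataVec(2, 1000)
```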
    # Four tasks, each has 10000000 iterations
    # sequentially
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum,
        .multicombine = TRUE, .inorder = FALSE) %do%
    {
        GetSomeData(i)
    }
    end <- proc.time() - start
    end
    #> 120.49
    SomeResult
    #> [1] -0.645559

    # Parallel execution
    # So, here we send 10000000 iterations to each thread
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum,
        .multicombine = TRUE, .inorder = FALSE) %dopar%
    {
        GetSomeData(i)
    }
    end <- proc.time() - start
    end
    #> 60.76
    SomeResult
    #> [1] -0.645559

    # Using the nesting operator %:% with %do%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %:%
        foreach(k = 1:1000, .combine = "c",
                .multicombine = TRUE, .inorder = FALSE) %do%
        {
            sin(i) * cos(k)
        }
    end2 <- proc.time() - start
    end2
    #> 2.19
    SomeResult
    #> [1] 0.6106603

    # Using the nesting operator %:% with %dopar%
    start <- proc.time()
    SomeResult <- foreach(i = 1:4, .combine = sum) %:%
        foreach(k = 1:1000, .combine = "c",
                .multicombine = TRUE, .inorder = FALSE) %dopar%
        {
            sin(i) * cos(k)
        }
    end2 <- proc.time() - start
    end2
    #> 35.44
    SomeResult
    #> [1] 0.6106603

    # Using iterators and foreach together
    # Define some function

    simFun <- function(arg1, arg2)
    {
        tmp <- 2 * arg1 + 3 * arg2
        return(tmp)
    }

    # Generate some random data
    avec <- rnorm(1000, 22, 3)
    bvec <- rnorm(1000, 24, 5)

    # Initializing iterators (sequential run)
    iavec <- iter(avec)
    ibvec <- iter(bvec)

    start <- proc.time()
    seqSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
        foreach(j = ibvec, .combine = "c") %do%
        {
            simFun(i, j)
        }
    end <- proc.time() - start
    end
    #> 4.90

    # Initializing iterators (parallel run)
    iavec <- iter(avec)
    ibvec <- iter(bvec)

    start <- proc.time()
    parSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
        foreach(j = ibvec, .combine = "c") %dopar%
        {
            simFun(i, j)
        }
    end <- proc.time() - start
    end
    #> 13.57

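The parallel run is slower here (13.57 against 4.90), most likely because each call to simFun is very cheap compared with the cost of dispatching a task to a worker. For such simple arithmetic the whole grid can also be computed without foreach. A sketch using base R's outer(), with small hypothetical input vectors in place of the 1000 random values:

```r
simFun <- function(arg1, arg2) {
    2 * arg1 + 3 * arg2
}

# Small illustrative inputs (hypothetical values)
avec <- c(1, 2, 3)
bvec <- c(10, 20)

# Rows correspond to bvec and columns to avec, the same layout as
# foreach(i = avec, .combine = "cbind") %:% foreach(j = bvec, .combine = "c")
gridResult <- outer(bvec, avec, function(b, a) simFun(a, b))
gridResult
```

outer() calls the function once on whole vectors, so it only applies when the function is itself vectorised, as simFun is.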
    This example uses all of the tricks
    # Generating grid
    x <- seq(-10, 10, by = 0.1)
    y <- seq(-10, 10, by = 0.1)

    start <- proc.time()
    z <- foreach(y = ivector(x, 4), .combine = cbind) %dopar%
    {
        y <- rep(y, each = length(x))
        del <- abs(1 + (x^2 + y^2)^0.7)
        r <- (x^2 + y^2) / 2
        matrix(10 * sin(r) / del, length(x))
    }
    end <- proc.time() - start
    end
    #> 0.37

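For reference, the same surface can be computed sequentially with outer(), which is handy for checking a chunked parallel result. The helper name surface and the variable z_ref are hypothetical:

```r
x <- seq(-10, 10, by = 0.1)

# Hypothetical helper: the expression from the loop body,
# written as a function of two coordinate vectors
surface <- function(xv, yv) {
    del <- abs(1 + (xv^2 + yv^2)^0.7)
    r <- (xv^2 + yv^2) / 2
    10 * sin(r) / del
}

# z_ref[i, j] = surface(x[i], x[j])
z_ref <- outer(x, x, surface)
dim(z_ref)
```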
    Result of this code
    # Plot the results as a perspective plot
    persp(x, x, z, ylab = 'y', theta = 30, phi = 30,
          expand = 0.5, col = "lightblue")

Parallel computation with R
Parallel computation with graphical processing unit

    Package: gputools
         This package provides R interfaces to a handful of common
         statistical algorithms. These algorithms are implemented in parallel
         using a mixture of Nvidia’s CUDA language, Nvidia’s CUBLAS
         library, and EM Photonics’ CULA libraries.
         On a computer equipped with an Nvidia GPU, some of these functions
         may be substantially more efficient than native R routines.

    Note!
    Simply put, this package contains a set of specialized functions that can
    use the GPU for computing. The full list of functions, with descriptions,
    can be found in the package documentation. However, this package is
    available only for Linux.

    A short example of gputools usage:
    # GPU example
    library(gputools)
    matA <- matrix(runif(3 * 2), 3, 2)
    matB <- matrix(runif(3 * 4), 3, 4)
    # Perform matrix cross-product with a GPU
    gpuCrossprod(matA, matB)
    numVectors <- 5
    dimension <- 10
    Vectors <- matrix(runif(numVectors * dimension),
                      numVectors, dimension)
    gpuDist(Vectors, "euclidean")
    gpuDist(Vectors, "maximum")
    gpuDist(Vectors, "manhattan")
    gpuDist(Vectors, "minkowski", 4)

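Because gputools needs an Nvidia GPU, it can help to know the base R counterparts of these calls, for example to validate GPU output on a small input. A sketch using only base R and the stats package; pairing gpuCrossprod with crossprod() and gpuDist with dist() is an assumption based on the function names, and the result variable names are illustrative:

```r
set.seed(1)
matA <- matrix(runif(3 * 2), 3, 2)
matB <- matrix(runif(3 * 4), 3, 4)

# CPU cross-product: t(matA) %*% matB, a 2 x 4 matrix
cpuCross <- crossprod(matA, matB)

numVectors <- 5
dimension <- 10
Vectors <- matrix(runif(numVectors * dimension), numVectors, dimension)

# CPU distances between the rows of Vectors
dEuclid <- dist(Vectors, method = "euclidean")
dMax    <- dist(Vectors, method = "maximum")
dManh   <- dist(Vectors, method = "manhattan")
dMink   <- dist(Vectors, method = "minkowski", p = 4)
```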
Working with very large datasets
Package bigmemory

   Motivation
   Multi-gigabyte data sets challenge and frustrate R users, even on
   well-equipped hardware. Use of C/C++ can provide efficiencies, but is
   cumbersome for interactive data analysis and lacks the flexibility and
   power of R’s rich statistical programming environment.

   Description
        The package bigmemory and sister packages bridge this gap,
        implementing massive matrices and supporting their manipulation
        and exploration.
        The data structures may be allocated to shared memory, allowing
        separate processes on the same computer to share access to a single
        copy of the data set.
        The data structures may also be file-backed, allowing users to easily
        manage and analyze data sets larger than available RAM and share
        them across nodes of a cluster.

Working with very large datasets
Bigmemory usage examples

   # Here is an example that uses a very, very large matrix
   # This example illustrates how to work with a
   # big.matrix: no 2147483648 object size limitation.
   library(bigmemory)
   R <- 3e9 # 3 billion rows
   C <- 2 # 2 columns
   print("48 GB total size:")
   R * C * 8 # 48 GB total size
   x <- filebacked.big.matrix(R, C,
         type = 'double', backingfile = 'huge-data.bin',
         descriptorfile = 'huge-data.desc')
   # Generates huge-data.bin and huge-data.desc files.
   # Now we can use the huge-data.desc file in any R session.
   x[1, ] <- rnorm(C)
   x[nrow(x), ] <- runif(C)
   summary(x[1, ])
   summary(x[nrow(x), ])
   # Note: This example will leave a 48 GB file on your hard drive!

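To try the same workflow without writing 48 GB, a much smaller file-backed matrix can be used. The sketch below assumes the bigmemory package is installed; the file names, the temporary directory and the matrix size are illustrative choices, not part of the original example. It also shows re-attaching the data through the descriptor file:

```r
library(bigmemory)

# Work in a temporary directory so nothing large is left behind
backDir <- tempdir()

# A small file-backed matrix: 1000 rows x 2 columns of doubles
x <- filebacked.big.matrix(1000, 2, type = "double",
                           backingfile = "small-data.bin",
                           backingpath = backDir,
                           descriptorfile = "small-data.desc")
x[1, ] <- c(1.5, 2.5)

# Re-attach the same data through the descriptor file, as another
# R session on the same machine could do
y <- attach.big.matrix(file.path(backDir, "small-data.desc"))
firstRow <- y[1, ]
```

Both x and y refer to the same backing file, so a value written through one is visible through the other.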
Working with very large datasets
Package filehash

   Motivation
         Working with large datasets in R can be cumbersome because of the
         need to keep objects in physical memory. While many might
         generally see that as a feature of the system, the need to keep whole
         objects in memory creates challenges to those who might want to
         work interactively with large datasets.
         Here we take a simple definition of "large dataset" to be any dataset
         that cannot be loaded into R as a single R object because of
         memory limitations. For example, a very large data frame might be
         too large for all of the columns and rows to be loaded at once. In
         such a situation, one might load only a subset of the rows or
         columns, if that is possible.

Working with very large datasets
Package filehash


   Description
   The filehash package provides a full read-write implementation of a
   key-value database for R. The package does not depend on any external
   packages or software systems and is written entirely in R, making it
   readily usable on most platforms. The filehash package can be thought of
   as a specific implementation of the database concept, taking a slightly
   different approach to the problem.

   Technical Note
   Key-value databases are sometimes called hash tables. With filehash the
   values are stored in a file on the disk rather than in memory. When a user
   requests the values associated with a key, filehash finds the object on the
   disk, loads the value into R and returns it to the user. The package offers
   two formats for storing data on the disk: the values can be stored (1)
   concatenated together in a single file or (2) separately as a directory of
   files.
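
   To make the technical note more concrete, here is a minimal base-R
   sketch of the idea behind the second storage format (one file per key).
   It is only an illustration and not how filehash is implemented: the
   helper names kv_create, kv_path, kv_insert, kv_fetch and kv_list are
   hypothetical, and keys are assumed to be valid file names.

```r
# Toy key-value store: every value lives in its own .rds file on disk.
# Illustration only; this is NOT filehash's implementation and all
# function names here are hypothetical.

kv_create <- function(dir) {
  # Create the directory that will hold one file per key
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  invisible(dir)
}

# Map a key to the file that stores its value (key must be a valid file name)
kv_path <- function(dir, key) file.path(dir, paste0(key, ".rds"))

kv_insert <- function(dir, key, value) {
  # Serialize the value to disk; nothing is kept in memory
  saveRDS(value, kv_path(dir, key))
  invisible(TRUE)
}

kv_fetch <- function(dir, key) {
  # Find the file for this key, load the value into R and return it
  path <- kv_path(dir, key)
  if (!file.exists(path)) stop("no such key: ", key)
  readRDS(path)
}

# List the keys currently stored in the directory
kv_list <- function(dir) sub("\\.rds$", "", list.files(dir, pattern = "\\.rds$"))

store <- kv_create(tempfile("kv_demo_"))
kv_insert(store, "small_matrix", matrix(1:6, nrow = 2))
m <- kv_fetch(store, "small_matrix")
dim(m)
```

   Only the value that is requested is read from disk, which is what lets
   the total size of the stored objects exceed the available memory.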

Working with very large datasets
Filehash usage examples




    # Load the package
    library(filehash)
    # Create a hash database on the hard disk
    DATA_PATH <- "E:/R_works/file_hash_data_strorage/db_test"
    DATA_PATH
    dbCreate(DATA_PATH)
    # Initialize a link to our hash database
    db <- dbInit(DATA_PATH)
    # Load a matrix into our database
    # Dimensions
    its = 3000000; dim = 10
    dbInsert(db, "our_big_matrix",
          matrix(rnorm(its * dim), its, dim))
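
    The slide stops after inserting a value. As a hedged sketch of the
    remaining steps, the block below checks for a key, fetches the value
    back and cleans up. It assumes the filehash package is installed; the
    temporary path and the key "small_matrix" are placeholders chosen for
    this example, and the matrix is kept small so that it runs quickly.

```r
library(filehash)

# Hypothetical location: a temporary file instead of a fixed path
db_path <- tempfile("db_test_")
dbCreate(db_path)
db <- dbInit(db_path)

# Store a small matrix under a key
dbInsert(db, "small_matrix", matrix(rnorm(20), nrow = 5, ncol = 4))

# Check that the key exists and list all keys in the database
dbExists(db, "small_matrix")
dbList(db)

# Read the value back from disk into memory
m <- dbFetch(db, "small_matrix")
dim(m)

# Remove the key, then delete the database file itself
dbDelete(db, "small_matrix")
dbUnlink(db)
```

    The object is only brought into memory by the dbFetch call, so a
    database can hold many large objects while R loads just the one that
    is needed.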




Final words, some useful references and contacts
Some useful references



    Here are some useful links:
         The book The Art of R Programming -
         http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
         The book Econometrics in R -
         http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
         R Installation and Administration -
         http://cran.r-project.org/doc/manuals/R-admin.html
         A very interesting presentation about HPC in R -
         http://www.slideshare.net/bytemining/r-hpc
         An aggregator of blog posts about R - http://www.r-bloggers.com/
         Page of a commercial R project (Revolution Analytics) -
         http://www.revolutionanalytics.com/
    There are many other sites. If you have a problem, just ask Google:
    "How to <formulation of your problem> in R".


Final words, some useful references and contacts
Final words and contacts




    Well... this presentation is only the beginning of my work in this
    direction; it is my first attempt. I will continue this work and will
    publish future versions of this presentation with new material and
    examples as soon as I have more free time. As for the quality of this
    version: it is my first experience with LaTeX, so please don’t judge me
    harshly. If you are interested in this topic or have some ideas, just
    write to me. I am open to discussion. Here are my contacts:
         email: aleksey.v.kutergin@gmail.com
         facebook page: facebook.com/aleksey.kutergin
         vk page: vk.com/aleksey_v_kutergin




  • 16. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 17. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 18. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 19. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 20. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 21. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 22. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 23. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 24. Outline 1 General words about R 2 Motivation and scope 3 The basic ways of speeding up the R-code 4 The special way of speeding up the R-code: package pnmath 5 Problem of data splitting: package iterator 6 Parallel computation with R: high-level parallelism (packages: parallel, snow and additional packages) 7 Parallel computation with R: low-level parallelism (package: Rmpi) 8 Parallel computation with R: parallel execution of for-loops (package: foreach) 9 Parallel computation with R: parallel computation with graphical processing unit (package: gputools) 10 Working with vary large datasets: package filehash and package bigmemory 11 Final words, some useful references and contacts Kutergin A. High performance computing with R
  • 25. General words about R
    R software
      R is free, powerful software for data analysis and statistical computing.
      R is a console application with its own programming language, which runs in interpreted mode.
      The lack of a sophisticated GUI has its advantages:
        there is no need to learn which algorithm is behind each button;
        you can learn the basic principles of R programming and then solve complex problems effectively in the R language.
    Download R
      R can be downloaded from http://cran.r-project.org/
      Project page: www.r-project.org
  • 36. General words about R
    View of an R work session [screenshot]
  • 37. General words about R: packages and information sources
    There are two sources of happiness for an R programmer:
      a source of information
      a source of packages
  • 38. Motivation and scope
    Motivation
      Computers keep getting more powerful. Progress in hardware and software is rapid, and this computing power is now available even in a laptop.
      Data volumes keep growing, and so does the complexity of data-processing problems.
      Multi-core PCs and CUDA technology have emerged.
    Scope
      We are ordinary students, not well-resourced users, so we do not have a supercomputer.
      We have a Core i5, a Core i7 or another multi-core laptop or PC with CUDA support.
      We have some computational tasks and want to solve them more efficiently.
  • 53. The basic ways of speeding up the R-code
    How to check the execution time of code?
    First way: system.time()
      # Returns the CPU (and other) times that the expression used
      system.time(sum(runif(10000000)))
    Second way: proc.time()
      # Reports how much real and CPU time (in seconds) the currently
      # running R process has already taken
      start_time <- proc.time()
      sum(runif(10000000))
      # Difference of the two snapshots: time spent in the code above
      end_time <- proc.time() - start_time
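The two timing approaches above can be wrapped into one small helper. This is only a sketch: `time_call` is a hypothetical name that does not appear in the slides, and the vector length is reduced to 1e6 so that it runs quickly.

```r
# Hypothetical helper (not from the slides): time a function call by
# differencing two proc.time() snapshots, as in the second approach.
time_call <- function(f, ...) {
  start  <- proc.time()
  value  <- f(...)
  timing <- proc.time() - start
  list(value = value, elapsed = timing[["elapsed"]])
}

res <- time_call(function(n) sum(runif(n)), 1e6)
res$elapsed  # wall-clock seconds; the number depends on the machine

# system.time() gives the same kind of measurement in a single call
st <- system.time(sum(runif(1e6)))
st[["elapsed"]]
```

Differencing `proc.time()` is handy when the timed region spans several statements. For a single expression, `system.time()` is shorter.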
  • 60. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
    Function profile
    Let us compare the general-purpose function lm() with the more specialized function lm.fit().
      # Load a dataset
      data(longley)
      # Record the profile to the file lm.out
      Rprof("lm.out")
      # Run lm() 1000 times
      invisible(replicate(1000, lm(Employed ~ . - 1, data = longley)))
      # Switch off profiling
      Rprof(NULL)
  • 61. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
      # Prepare data for lm.fit()
      longleydm <- data.matrix(data.frame(longley))
      # Record the profile to the file lm.fit.out
      Rprof("lm.fit.out")
      # Run lm.fit() 1000 times
      invisible(replicate(1000, lm.fit(longleydm[, -7], longleydm[, 7])))
      # Switch off profiling
      Rprof(NULL)
      # Results of profiling
      summaryRprof("lm.out")$sampling.time
      # [1] 3.12
      summaryRprof("lm.fit.out")$sampling.time
      # [1] 0.18
      # What a difference!
  • 62. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
    Package profr
    This package visualizes the results of profiling.
      library("profr")
      plot(parse_rprof("lm.out"), main = "Profile of lm()")
      plot(parse_rprof("lm.fit.out"), main = "Profile of lm.fit()")
    Package proftools
    This package visualizes the call graph of a function.
      library("Rgraphviz"); library("proftools")
      lmfitprod <- readProfileData("lm.fit.out")
      plotProfileCallGraph(lmfitprod)
  • 64. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
    Call graph [figure]
  • 65. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
    Another example of profiling:
        its = 2500; dim = 1750
        X = matrix(rnorm(its * dim), its, dim)
        my.cross.prod <- function(X) {
          C = matrix(0, ncol(X), ncol(X))
          for (i in 1:nrow(X)) {
            C = C + X[i, ] %o% X[i, ]
          }
          return(C)
        }
        library(proftools)
        # Start profiling; the file name matches the one read on the next slide
        Rprof("matrix-mult.out")
        C = my.cross.prod(X)
        C1 = t(X) %*% X
        C2 = crossprod(X)
        Rprof(NULL)
        # all.equal() compares two objects at a time
        print(all.equal(C, C1))
        print(all.equal(C1, C2))
  • 66. The basic ways of speeding up the R-code
    Analysis of the effectiveness of programs
    Result:
        library(proftools)
        profile.data <- readProfileData("matrix-mult.out")
        flatProfile(profile.data)
                      total.pct total.time self.pct self.time
        my.cross.prod     87.31      88.36     0.04      0.04
        +                 49.84      50.44    49.84     50.44
        %o%               37.37      37.82     0.00      0.00
        outer             37.37      37.82    37.27     37.72
        %*%                7.75       7.84     7.75      7.84
        crossprod          4.86       4.92     4.86      4.92
        t                  0.16       0.16     0.06      0.06
        t.default          0.10       0.10     0.10      0.10
        matrix             0.06       0.06     0.06      0.06
        as.vector          0.02       0.02     0.02      0.02
  • 67. The basic ways of speeding up the R-code
    Vectorization of code
    Note! Loops in R are slow! You can speed up your code by using operations on whole vectors and matrices. It is another style of programming, but you have to use it!
        # Simple example of vectorization:
        # component-wise addition of two vectors
        # Generate some random data
        # First vector
        a <- rnorm(n = 10000000)
        # Second vector
        b <- rnorm(n = 10000000)
        # Vector for the result
        x <- rep(0, length(a))
  • 68. The basic ways of speeding up the R-code
    Vectorization of code
    So, what about the results?
        # Slow way
        time_1 <- system.time(
          for (i in 1:length(a)) {
            x[i] <- a[i] + b[i]
          }
        ); time_1[3]
        36.97
        # Fast way
        time_2 <- system.time(x <- a + b); time_2[3]
        0.04
        Acceleration <- time_1[3] / time_2[3]
        Acceleration
        924.25
        # That's hot!
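    The loop and the vectorized expression should produce exactly the same vector; only the running time differs. A small sketch of that check, assuming a much shorter input than on the slide so that it runs quickly (the names x_loop and x_vec are illustrative):

```r
# Loop versus vectorized addition on a small input.
set.seed(1)
n <- 1e5
a <- rnorm(n)
b <- rnorm(n)

# Element-by-element loop with a preallocated result vector
x_loop <- numeric(n)
for (i in seq_along(a)) {
  x_loop[i] <- a[i] + b[i]
}

# Vectorized form: the whole addition happens in one call to compiled code
x_vec <- a + b

# Both approaches perform the same floating-point additions
print(identical(x_loop, x_vec))
```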
  • 69. The basic ways of speeding up the R-code
    Using magic of linear algebra
    Using linear algebra operations
        # Scalar product
        # Slow way
        start <- proc.time()
        res <- 0
        for (i in 1:length(a)) {
          res <- res + a[i] * b[i]
        }
        end <- proc.time() - start; end[3]
        16.71
        # Fast
        system.time(a %*% b)[3]
        0.09
        # Even faster...
        system.time(sum(a * b))[3]
        0.08
  • 70. The basic ways of speeding up the R-code
    Using magic of linear algebra
    Using linear algebra operations
        # Matrix multiplication, slow version
        its <- 2500; dim <- 1750
        X <- matrix(rnorm(its * dim), its, dim)
        X_transp <- t(X)
        res <- array(NA, dim = c(1750, 1750))
        start <- proc.time()
        for (i in 1:nrow(X_transp)) {
          for (j in 1:ncol(X)) {
            res[i, j] <- sum(X_transp[i, ] * X[, j])
          }
        }
        end <- proc.time() - start; end[3]
        221.67
  • 71. The basic ways of speeding up the R-code
    Using magic of linear algebra
    BLAS
    BLAS stands for Basic Linear Algebra Subprograms. It is a library of optimized routines for linear algebra operations that R calls for operations such as matrix multiplication. Optimized multithreaded BLAS implementations can use all cores of a multi-core machine automatically.
        # Matrix multiplication, fast version
        # BLAS matrix multiplication
        system.time(X_transp %*% X)[3]
        7.77
        # Even faster...
        system.time(crossprod(X))[3]
        4.98
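    The loop over outer products from the profiling example, the explicit product t(X) %*% X and crossprod(X) all compute the same matrix. A minimal sketch of that equivalence on a small random matrix, assuming only base R (the names C_loop, C_mult, C_cross and agree are illustrative):

```r
# Three ways to compute t(X) %*% X on a small matrix.
set.seed(42)
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Sum of outer products of the rows (the slow loop)
C_loop <- matrix(0, ncol(X), ncol(X))
for (i in seq_len(nrow(X))) {
  C_loop <- C_loop + X[i, ] %o% X[i, ]
}

# Explicit transpose followed by matrix multiplication
C_mult <- t(X) %*% X

# crossprod() computes the same product without forming t(X) first
C_cross <- crossprod(X)

# Compare pairwise, up to floating-point tolerance
agree <- isTRUE(all.equal(C_loop, C_mult)) &&
  isTRUE(all.equal(C_mult, C_cross))
print(agree)
```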
  • 72. The basic ways of speeding up the R-code
    Using built-in R functions
    Package base
    You can find the full list of built-in R functions in the documentation for this package.
        # Let us define a function
        mySum <- function(N) {
          sumVal <- 0
          for (i in 1:N) {
            sumVal <- sumVal + i
          }
          return(sumVal)
        }
        system.time(mySum(1000000))[3]
        0.62
        system.time(sum(as.numeric(seq(1, 1000000))))[3]
        0.05
  • 73. The basic ways of speeding up the R-code
    Using built-in R functions
    Why are built-in R functions faster?
    R is an interpreted language, and interpreted code is generally slower than compiled code. When you call a built-in R function, you call optimized, compiled code. Built-in functions are also written in lower-level programming languages (such as C/C++ or FORTRAN), which gives them more direct access to the capabilities of the hardware.
    Note! You can select data from a vector, matrix, data.frame or array using a condition that applies to a row or column of the data object. It is fast and convenient.
        # Extract only the positive values from the first column of X
        its <- 2500; dim <- 1750
        X <- matrix(rnorm(its * dim), its, dim)
        X[X[, 1] > 0, 1]
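    The same kind of condition can select whole rows instead of a single column. A short sketch on a small matrix, assuming only base R (the names pos_first and pos_rows are illustrative):

```r
# Logical indexing on a small matrix.
set.seed(7)
X <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)

# Values of the first column that are positive
pos_first <- X[X[, 1] > 0, 1]

# Whole rows whose first-column value is positive;
# drop = FALSE keeps the result a matrix even if only one row matches
pos_rows <- X[X[, 1] > 0, , drop = FALSE]

print(pos_first)
print(dim(pos_rows))
```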
  • 74. The special way of speeding up the R-code
    Package pnmath
    Another easy way to get a speed-up is to use the pnmath package in R. This package takes many of the standard math functions in R and replaces them with multi-threaded versions, using OpenMP. Some functions get more of a speed-up than others with pnmath.
        # Generate random data
        v1 <- runif(1000)
        v2 <- runif(100000000)
        # Time of execution without pnmath
        system.time(qtukey(v1, 2, 3))
        system.time(exp(v2))
        system.time(sqrt(v2))
        # Time of execution with pnmath
        library(pnmath)
        system.time(qtukey(v1, 2, 3))
        system.time(exp(v2))
        system.time(sqrt(v2))
  • 75. Problem of data splitting
    Our problem:
    Before you start the calculation, you need to split your data set according to the number of threads. Another reason is more effective data processing in loops.
    Package iterators
    The iterators package provides tools for iterating over various R data structures. Iterators are available for vectors, lists, matrices, arrays, data frames and files. By following very simple conventions, new iterators can be written to support any type of data source, such as database queries or dynamically generated data.
    Download
    You can download this useful package from CRAN (available for Windows!): http://cran.r-project.org/web/packages/iterators/index.html
  • 79. Problem of data splitting: package iterators
    Capabilities
    icount(count)
    This function returns an iterator that counts starting from one. count is the number of times that the iterator will fire. If not specified, it will count forever.
    nextElem()
    This function returns the next value of a previously defined iterator. When the iterator has no more values, it calls stop with the message "StopIteration".
    Example:
        library(iterators)
        # Create an iterator that counts from 1 to 2
        it <- icount(2)
        nextElem(it)
        [1] 1
        nextElem(it)
        [1] 2
        try(nextElem(it))  # expect a StopIteration exception
        Error : StopIteration
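    Because the end of iteration is signalled through stop(), a loop that consumes an iterator by hand has to catch that error. The sketch below is one possible way to do it, assuming the iterators package is installed; the variable total and the decision to re-raise any other error are illustrative choices, not part of the slides.

```r
library(iterators)

it <- icount(3)   # counts 1, 2, 3
total <- 0

repeat {
  # Return NULL on "StopIteration", re-raise any other error
  val <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(val)) break
  total <- total + val
}

print(total)
```

    In practice the foreach package handles this protocol internally, so explicit loops like this are mainly useful for understanding how iterators behave.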
  • 80. Problem of data splitting: package iterators
    Capabilities
    You can create iterators by rows of your data structure using the iter() function:
        library(iterators)
        # Create an iterator by rows of the data set
        irState <- iter(state.x77, by = "row")
        nextElem(irState)
                Population Income Illiteracy  Life Murder   Area
        Alabama       3615   3624        2.1 69.05   15.1  50708
        nextElem(irState)
                Population Income Illiteracy  Life Murder   Area
        Alaska         365   6315        1.5 69.31   11.3 566432
        nextElem(irState)
                Population Income Illiteracy  Life Murder   Area
        Arizona       2212   4530        1.8 70.55    7.8 113417
  • 81. Problem of data splitting: package iterators
    Capabilities
    You can create iterators by columns of your data structure using the iter() function:
        # Create an iterator by columns of the data set
        icState <- iter(state.x77, by = "col")
        nextElem(icState)
                Population
        Alabama       3615
        Alaska         365
        Arizona       2212
        nextElem(icState)
                Income
        Alabama   3624
        Alaska    6315
        Arizona   4530
        nextElem(icState)
                Illiteracy
        Alabama        2.1
        Alaska         1.5
        Arizona        1.8
  • 82. Problem of data splitting: package iterators
    Capabilities
    You can use the iter() function to create an iterator from a data object returned by some other function:
        library(iterators)
        # Define a function which generates random data
        GetDataStructure <- function(meanVal1, meanVal2, sdVal1, sdVal2) {
          a <- rnorm(4, mean = meanVal1, sd = sdVal1)
          b <- rnorm(4, mean = meanVal2, sd = sdVal2)
          data <- a %o% b
          return(data)
        }
        ifun <- iter(GetDataStructure(25, 27, 2.5, 3.5), by = "row")
        nextElem(ifun); nextElem(ifun)
                 [,1]     [,2]     [,3]     [,4]
        [1,] 701.7055 939.6574 764.7724 799.6965
                 [,1]     [,2]     [,3]     [,4]
        [1,] 647.6349 867.2512 705.8422 738.0752
  • 83. Problem of data splitting: package iterators
    Capabilities
    idiv(n, ..., chunks, chunkSize)
    This is a more interesting iterator. It provides the ability to divide a numeric value into pieces.
    n - the number to be divided
    chunks - the number of pieces that n should be divided into. It is useful when you know the number of pieces that you want. If specified, chunkSize should not be
    chunkSize - the maximum size of the pieces that n should be divided into. It is useful when you know the size of the pieces that you want. If specified, chunks should not be
    Some thoughts...
    However, the practical application of this iterator is unclear. Perhaps it can be used to index a vector or the rows/columns of arrays.
  • 93. Problem of data splitting: package iterators
    Capabilities
    Example:
        library(iterators)
        # Divide the value 10 into 3 pieces
        it <- idiv(10, chunks = 3)
        nextElem(it)
        [1] 4
        nextElem(it)
        [1] 3
        nextElem(it)
        [1] 3
        try(nextElem(it))  # expect a StopIteration exception
        Error : StopIteration
  • 94. Problem of data splitting: package iterators
    Capabilities
    Example:
        library(iterators)
        # Divide the value 10 into pieces no larger than 3
        it <- idiv(10, chunkSize = 3)
        nextElem(it)
        [1] 3
        nextElem(it)
        [1] 3
        nextElem(it)
        [1] 2
        nextElem(it)
        [1] 2
        try(nextElem(it))  # expect a StopIteration exception
        Error : StopIteration
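    One possible use of idiv(), in the spirit of indexing a vector, is to turn the piece sizes into consecutive index ranges. The following is only a sketch under that assumption, with the iterators package installed; the vector x and the names first, idx and chunk_sums are hypothetical.

```r
library(iterators)

x <- seq(10, 100, by = 10)          # ten values to process in pieces
it <- idiv(length(x), chunks = 3)   # yields the size of each piece

first <- 1
chunk_sums <- numeric(0)

repeat {
  # Return NULL on "StopIteration", re-raise any other error
  size <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(size)) break

  idx <- first:(first + size - 1)   # indices covered by this piece
  chunk_sums <- c(chunk_sums, sum(x[idx]))
  first <- first + size
}

print(chunk_sums)
```

    Each piece could equally be handed to a separate worker, which is the situation described at the start of this part: splitting the data according to the number of threads.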
  • 95. Problem of data splitting: package iterators
    Capabilities
    iread.table(file, ..., verbose = FALSE)
    This is a very important iterator. It returns an iterator object over the rows of the data frame stored in a file in table format.
    file - the name of the file to read data from
    ... - all additional arguments are passed on to the read.table function. See the documentation for read.table for more information
    verbose - logical flag indicating whether or not to print the calls to read.table
    Note! In this version of iread.table, both the read.table arguments header and row.names must be specified. This is because the default values of these arguments depend on the contents of the beginning of the file. In order to make the subsequent calls to read.table work consistently, the user must specify those arguments explicitly.
  • 105. Problem of data splitting: package iterators
    Capabilities
    Example:
        library(iterators)
        # Generate random data
        its <- 2000000; dim <- 3
        data <- matrix(rnorm(its * dim), its, dim)
        # Write them to the HDD
        DATA_PATH <- "E:/R_works/data.txt"
        # Size of this file - 123 Mb
        write.table(data, file = DATA_PATH, append = FALSE, sep = "\t", dec = ".")
  • 106. Problem of data splitting: package iterators
    Capabilities
        # Create an iterator from this file
        ifile <- iread.table(DATA_PATH, header = TRUE, row.names = NULL, verbose = FALSE)
        nextElem(ifile)
          row.names        V1        V2       V3
        1         1 -1.042623 -1.386382 0.399798
        nextElem(ifile)
          row.names        V1        V2        V3
        1         2 0.8841238 -1.296501 0.1580505
        nextElem(ifile)
          row.names         V1         V2        V3
        1         3 -0.3195784 -0.6830442 0.3647958
        # It works very fast!
        # Remove the file
        file.remove(DATA_PATH)
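    The idea behind reading a file piece by piece can also be sketched with base R alone. The code below is not the iread.table implementation; it is a swapped-in illustration that keeps a connection open and calls read.table() repeatedly with nrows. The temporary file, the chunk size of 4 and all variable names are assumptions made for this example.

```r
# Chunked reading with base R only (not the iread.table implementation).
path <- tempfile(fileext = ".txt")   # hypothetical temporary file
set.seed(3)
dat <- as.data.frame(matrix(rnorm(10 * 3), nrow = 10, ncol = 3))
write.table(dat, file = path, sep = "\t", row.names = FALSE)

con <- file(path, open = "r")

# Read the header line once; write.table() quotes the column names
header <- strsplit(readLines(con, n = 1), "\t")[[1]]
header <- gsub('"', "", header)

rows_read <- 0
repeat {
  # Each call continues from the current position of the open connection.
  # Any read error is treated as end of input here; the usual one is
  # "no lines available in input".
  chunk <- tryCatch(
    read.table(con, nrows = 4, sep = "\t", col.names = header),
    error = function(e) NULL
  )
  if (is.null(chunk)) break
  rows_read <- rows_read + nrow(chunk)
}

close(con)
removed <- file.remove(path)
print(rows_read)
```

    Only one chunk is held in memory at a time, which is what makes this style of reading attractive for files that are large compared with the available RAM.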
  • 107. Problem of data splitting: package iterators
    Capabilities
    isplit(x, f, drop = FALSE)
    Another important type of iterator. It returns an iterator that divides the data in the vector x into the groups defined by f.
    x - vector or data frame of values to be split into groups
    f - a factor or list of factors used to categorize x
    drop - logical indicating if levels that do not occur should be dropped
    More detailed information can be found in the documentation.
    Note! This is very useful! For example, suppose you have a data vector and a vector containing the values of a factor corresponding to these data, and the factor has pre-defined levels. You can then extract the data in a loop for each level of the factor without additional operations. You can also define conditions in the loop body for each level of the factor and use them in if() control structures.
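    A small sketch of the per-level loop described in the note, assuming the iterators package is installed. The data are made up for illustration, and the base-R split() call is used only as a reference to compare against; the names values, group, expected and group_means are hypothetical.

```r
library(iterators)

# Hypothetical measurements with a grouping factor
values <- c(5.1, 4.9, 6.3, 5.8, 7.0, 6.4)
group  <- factor(c("a", "a", "b", "b", "c", "c"))

it <- isplit(values, group)
expected <- split(values, group)   # base-R reference result

group_means <- numeric(0)
for (level in names(expected)) {
  piece <- nextElem(it)            # a list; the data are in piece$value
  # Each piece should match the corresponding group from split()
  stopifnot(all(piece$value == expected[[level]]))
  group_means[level] <- mean(piece$value)
}

print(group_means)
```

    Unlike split(), which builds the full list of groups at once, the iterator produces one group per call to nextElem(), so each group can be processed and discarded before the next one is requested.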
  • 117. Problem of data splitting: package iterators. Capabilities. Example:
      x <- rnorm(200)
      f <- factor(sample(1:10, length(x), replace = TRUE))
      it <- isplit(x, f)
      nextElem(it)
      $value
       [1]  0.14087878 -0.94439161  0.13593045
       [4] -0.25732860  0.09422130 -0.55166303
       [7] -0.18325419 -0.00871019  0.38344388
      [10] -1.05761926  1.16126462 -0.02280205
      [13] -0.67338941  1.68724264  0.92112983
      [16]  1.39782337 -0.51060989
      $key
      $key[[1]]
      [1] "1"
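The loop-over-levels idea described for isplit can be sketched as follows. This is a minimal illustration, assuming the iterators package is installed; the fixed seed, the variable group_means and the "even-numbered levels only" condition are hypothetical choices made for the example, not part of the original slides.

```r
library(iterators)

set.seed(1)                        # fixed seed so the sketch is repeatable
x <- rnorm(200)
f <- factor(sample(1:10, length(x), replace = TRUE))

it <- isplit(x, f)
group_means <- numeric(0)

repeat {
  # nextElem() signals an error with the message "StopIteration"
  # once every group has been returned
  chunk <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(chunk)) break

  level <- chunk$key[[1]]          # level of f for this group
  # Hypothetical per-level condition: summarise even-numbered levels only
  if (as.integer(level) %% 2 == 0) {
    group_means[level] <- mean(chunk$value)
  }
}
```

Each call to nextElem() yields the data for one level together with its key, so the if() test can depend on the level without any extra subsetting of x.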
  • 118. Problem of data splitting: package iterators. Capabilities. Special types of iterators: there are also special types of iterators, such as irnorm(..., count) and irunif(..., count). These functions return iterators that return random numbers from various distributions. Each one is a wrapper around a standard R function. Arguments: count - number of times that the iterator will fire; if not specified, it will fire values forever; ... - arguments to pass to the underlying function (rnorm for irnorm). Example:
      # create an iterator that returns two random numbers
      it <- irnorm(1, count = 2)
      nextElem(it); nextElem(it)
      [1] 0.1592311
      [1] -1.387449
      try(nextElem(it))  # expect a StopIteration exception
      Error : StopIteration
  • 119. Parallel computation with R: high-level parallelism. Packages: parallel, snow. Scope: high-level parallelism means that you do not need to define how the processes communicate, for example which process is the master and which processes are the slaves (workers). You only initialize the parallel environment and work inside it; the package's functions take care of all the details. Package snow: contains the basic functions that allow you to create different types of clusters on a multicore machine. Package parallel: builds on the packages multicore and snow and provides drop-in replacements for most of the functionality of those packages.
  • 123. Parallel computation with R: high-level parallelism. Package: parallel. Description: the landscape of parallel computing has changed with the advent of shared-memory computers with multiple (and often many) CPU cores. Until the late 2000s parallel computing was mainly done on clusters of large numbers of single- or dual-CPU computers; nowadays even laptops have two or four cores, and servers with 8 or more cores are commonplace. It is such hardware that package parallel is designed to exploit. It can also be used with several computers running the same version of R connected by (reasonable-speed) ethernet; the computers need not be running the same OS. Scope: parallelism can be done in computation at many different levels; this package is principally concerned with "coarse-grained parallelization".
  • 126. Parallel computation with R: high-level parallelism. Package: parallel. Computational model: this package handles running much larger chunks of computations in parallel. The crucial point is that these chunks of computation are unrelated and do not need to communicate in any way. It is often the case that the chunks take approximately the same length of time. The basic computational model is:
      (a) Start up M "worker" processes, and do any initialization needed on the workers
      (b) Send any data required for each task to the workers
      (c) Split the task into M roughly equally-sized chunks, and send the chunks (including the R code needed) to the workers
      (d) Wait for all the workers to complete their tasks, and ask them for their results
      (e) Repeat steps (b)-(d) for any further tasks
      (f) Shut down the worker processes
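Steps (a) to (f) can be sketched with functions from the parallel package. This is only an illustration under stated assumptions: the worker count M = 2, the shared value offset and the sum-of-squares task are hypothetical and chosen to keep the example small.

```r
library(parallel)

# (a) Start M worker processes and do any initialisation on them
M  <- 2
cl <- makeCluster(M)
clusterEvalQ(cl, library(stats))

# (b) Send the data required by the task to the workers
offset <- 10                        # hypothetical shared value
clusterExport(cl, "offset", envir = environment())

# (c) Split the task into M roughly equal chunks and send them out;
# (d) clusterApply() waits for all workers and returns their results
tasks  <- 1:8
chunks <- splitIndices(length(tasks), M)
partial <- clusterApply(cl, chunks, function(idx, tasks) {
  sum(tasks[idx]^2) + offset        # offset is found on the worker
}, tasks = tasks)

# (e) Further tasks would repeat steps (b)-(d) on the same cluster

# (f) Shut down the worker processes
stopCluster(cl)

total <- sum(unlist(partial))
```

The chunks never talk to each other; each worker only needs its own indices plus the exported offset, which is what makes the problem coarse-grained.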
  • 127. Parallel computation with R: high-level parallelism. Package: snow. Description: the package contains the basic functions that allow you to create different types of clusters on a multicore machine, such as makeSOCKcluster(names, ..., options = defaultClusterOptions) and makeMPIcluster(count, ..., options = defaultClusterOptions). It also contains specific functions for computing on snow clusters: clusterCall(cl, fun, ...) calls a function fun with identical arguments ... on each node in the cluster cl and returns a list of the results; clusterEvalQ(cl, expr) evaluates a literal expression on each cluster node and is a cluster version of evalq; clusterApply(cl, x, fun, ...) calls fun on the first cluster node with arguments x[[1]] and ..., on the second node with arguments x[[2]] and ..., and so on. There is no need to go into further syntax here; all details can be found in the documentation.
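A short sketch of the three cluster functions follows. As an assumption made for convenience, it uses the parallel package, which exports functions with the same names (clusterCall, clusterEvalQ, clusterApply), so it runs without installing snow; with snow the cluster would be created with makeSOCKcluster instead. The variable scale_factor is hypothetical.

```r
library(parallel)

cl <- makeCluster(2)                # socket cluster with two local workers

# clusterCall: the same function with identical arguments on every node
same <- clusterCall(cl, function(a, b) a + b, 1, 2)

# clusterEvalQ: evaluate a literal expression on every node
clusterEvalQ(cl, scale_factor <- 3)

# clusterApply: element i of x is processed on node i
scaled <- clusterApply(cl, c(10, 20), function(v) v * scale_factor)

stopCluster(cl)
```

Here clusterEvalQ sets up state on each worker, and the later clusterApply call relies on that state being present on whichever node handles each element.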
  • 143. Parallel computation with R: high-level parallelism. Packages: doParallel, doSNOW. Package doSNOW: the registerDoSNOW(cl) function is used to register the snow parallel backend with the foreach package, where cl is the cluster object to use for parallel execution. Package doParallel: provides a parallel backend for foreach using the parallel package; the backend is registered with registerDoParallel(cl, cores = NULL, ...). Arguments: cl - a cluster object returned by makeCluster, or the number of nodes to be created in the cluster; if not specified, on Windows a three-worker cluster is created and used; cores - the number of cores to use for parallel execution; ... - package options.
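Registering a backend and running a loop on it might look like the sketch below. It assumes the foreach and doParallel packages are installed; the two-worker cluster and the squaring loop are hypothetical examples. With doSNOW the pattern would be the same, using a snow cluster and registerDoSNOW(cl).

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)      # cluster object passed as cl
registerDoParallel(cl)              # register it as the foreach backend

# %dopar% sends the iterations to the registered backend;
# .combine = c collects the results into a vector
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

parallel::stopCluster(cl)
```

The loop body itself does not mention the cluster, so switching backends only changes the registration call.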
  • 144. Parallel computation with R: high-level parallelism packages: doParallel, doSNOW Package: doSNOW The registerDoSNOW(cl) function is used to register the SNOW parallel backend with the foreach package. Where cl - the cluster object to use for parallel execution Package: doParallel The registerDoParallel(cl, cores=NULL, ...) package provides a parallel backend for the foreach function using the parallel package. Where cl - a cluster object returned by makeCluster, or the number of cores to be created in the cluster. If not specified, on Windows a three worker cluster is created and used cores - the number of cores to use for parallel execution ... - package options Kutergin A. High performance computing with R
  • 145. Parallel computation with R: high-level parallelism packages: doParallel, doSNOW Package: doSNOW The registerDoSNOW(cl) function is used to register the SNOW parallel backend with the foreach package. Where cl - the cluster object to use for parallel execution Package: doParallel The registerDoParallel(cl, cores=NULL, ...) package provides a parallel backend for the foreach function using the parallel package. Where cl - a cluster object returned by makeCluster, or the number of cores to be created in the cluster. If not specified, on Windows a three worker cluster is created and used cores - the number of cores to use for parallel execution ... - package options Kutergin A. High performance computing with R
  • 153. Parallel computation with R: high-level parallelism Example of cluster based on parallel package
        library(parallel)
        library(doParallel)
        # Detect how many cores we have
        CoresCount <- detectCores(); CoresCount
        # [1] 4
        # Initialize the cluster
        cl <- makeCluster(CoresCount); cl
        # socket cluster with 4 nodes on host 'localhost'
        # How many cores of our cluster we are going to use
        CoresCountForeUse <- CoresCount; CoresCountForeUse
        # [1] 4
        # Register the parallel backend
        registerDoParallel(cl, cores = CoresCountForeUse)
        # Some expressions
        # Stop our cluster
        stopCluster(cl)
  • 154. Parallel computation with R: high-level parallelism Example of cluster based on snow package
        library(snow)
        library(doSNOW)
        # Make a socket cluster with four workers
        clSnow <- makeCluster(c("localhost", "localhost", "localhost", "localhost"), type = "SOCK")
        clSnow
        # socket cluster with 4 nodes on host 'localhost'
        # Register the snow backend
        registerDoSNOW(clSnow)
        # Some expressions
        # Stop our cluster
        stopCluster(clSnow)
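Once a backend is registered, it can be worth checking what foreach will actually use before running any loops. The following is a minimal sketch, assuming the doParallel package is installed; the two-worker cluster size is an assumption made to keep the check cheap, not something taken from the slides. It relies on the foreach helper functions getDoParRegistered(), getDoParWorkers() and getDoParName().

```r
library(parallel)
library(doParallel)  # also attaches foreach

# Assumption: two workers are enough for a quick check;
# the slides size the cluster with detectCores() instead
cl <- makeCluster(2)
registerDoParallel(cl)

registered <- getDoParRegistered()  # TRUE once a backend is registered
workers    <- getDoParWorkers()     # number of workers foreach will use
backend    <- getDoParName()        # name of the registered backend

stopCluster(cl)
```

If registered is FALSE, %dopar% falls back to sequential execution and issues a warning, so this check is a cheap way to catch a forgotten registration.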
  • 155. Parallel computation with R: low-level parallelism Package: Rmpi Description Rmpi is the MPI interface for R. This package allows you to create R programs that run cooperatively in parallel across multiple machines, or across multiple CPUs on one machine, to accomplish a goal more quickly than a single program running on one machine. So... I have not worked with this package yet, thus I can't say much about it. This work is in progress.
  • 158. Parallel computation with R: parallel execution of for-loops Package: foreach Motivation In many practical cases it is impossible to avoid using a loop. Loops are slow, and it would be great to speed up their execution. Description The foreach package provides a new looping construct for executing R code repeatedly. The main reason for using the foreach package is that it supports parallel execution. The foreach package can be used with a variety of parallel computing systems, including NetWorkSpaces and snow. In addition, foreach can be used with iterators, which allows the data to be specified in a very flexible way. Note! foreach loops run in parallel only after a parallel backend has been registered, for example on a parallel or snow cluster.
  • 162. Parallel computation with R: parallel execution of for-loops Operators used with foreach object Operator %do% A binary operator that operates on a foreach object and an R expression. The expression is evaluated multiple times in an environment that is created by the foreach object, and that environment is modified for each evaluation as specified by the foreach object. %do% evaluates the expression sequentially. The results of evaluating the expression are returned as a list by default. Operator %dopar% The parallel version of the %do% operator: it evaluates the expression in parallel. Operator %:% The nesting operator. It is a binary operator used to merge two foreach objects into a single structure.
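The operators can be illustrated with a small sequential sketch, assuming only that the foreach package is installed. It uses %do% so that no backend is needed; replacing %do% with %dopar% after registering a backend would run the same loops in parallel. The loop bodies are illustrative and not taken from the slides.

```r
library(foreach)

# %do%: sequential evaluation; the default result is a list
squares <- foreach(i = 1:3) %do% {
  i^2
}

# %:% merges two foreach objects into one nested loop:
# inner results are combined with c(), outer results with rbind()
grid <- foreach(a = 1:2, .combine = "rbind") %:%
  foreach(b = 1:3, .combine = "c") %do% {
    a * 10 + b
  }
```

Here squares is a list of three numbers and grid is a 2 x 3 matrix with one row per value of a.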
  • 166. Parallel computation with R: parallel execution of for-loops Main arguments of the foreach function Note! This is important. .combine - the function used to process the task results as they are generated. It can be specified as a non-empty character string naming the function. Specifying "c" is useful for concatenating the results into a vector. The values "rbind" and "cbind" combine vectors into a matrix. The values "+" and "*" can be used to process numeric data. .inorder - logical flag indicating whether the .combine function requires the task results to be combined in the same order in which they were submitted. If the order is not important, setting .inorder to FALSE can improve performance. .multicombine - logical flag indicating whether the .combine function can accept more than two arguments. If it can, setting .multicombine to TRUE could improve performance.
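A short sketch of these arguments, assuming only that the foreach package is installed. The loops run sequentially with %do% so the snippet needs no cluster, which also means .inorder and .multicombine have no visible performance effect here; the loop bodies are illustrative.

```r
library(foreach)

# "c" concatenates the task results into a vector
as_vector <- foreach(i = 1:4, .combine = "c") %do% {
  sqrt(i)
}

# "+" folds the results into a sum; the order of addition does not
# matter, so .inorder = FALSE is safe
total <- foreach(i = 1:4, .combine = "+", .inorder = FALSE) %do% {
  i^2
}

# rbind() accepts many arguments at once, so .multicombine = TRUE is allowed
as_matrix <- foreach(i = 1:4, .combine = "rbind", .multicombine = TRUE) %do% {
  c(i, i^2)
}
```

In this sketch total is 1 + 4 + 9 + 16 = 30 and as_matrix has one row per iteration.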
  • 175. Parallel computation with R: parallel execution of for-loops Main arguments of the foreach function Note! This is important. .errorhandling - specifies how a task evaluation error should be handled. If the value is "stop", execution is stopped when an error occurs. If the value is "remove", the result for that task is not returned and is not passed to the .combine function. If the value is "pass", the error object generated by the task evaluation is included with the rest of the results; it is assumed that the combine function is able to deal with the error object. .packages - character vector of packages that the tasks depend on. .verbose - logical flag enabling verbose messages. This can be very useful for troubleshooting. Further reading: as always, you can find detailed information in the documentation for this useful package.
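The difference between "remove" and "pass" can be seen in a minimal sequential sketch, assuming only that the foreach package is installed. The function risky() is hypothetical and exists only to make one task fail.

```r
library(foreach)

# Hypothetical task that fails for i == 2
risky <- function(i) {
  if (i == 2) stop("task failed")
  i * 10
}

# "remove": the failed task is dropped before .combine sees the results
kept <- foreach(i = 1:3, .combine = "c", .errorhandling = "remove") %do% {
  risky(i)
}

# "pass": the error object is returned in place of the failed result
passed <- foreach(i = 1:3, .errorhandling = "pass") %do% {
  risky(i)
}
is_error <- vapply(passed, inherits, logical(1), what = "error")
```

With "pass", downstream code has to separate error objects from ordinary results itself, as the is_error vector does here.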
  • 185. Parallel computation with R: parallel execution of for-loops Examples of foreach usage
        # Sequentially
        time_seq <- system.time(foreach(i = 1:100) %do% { sum(runif(10000000)) })
        time_seq[3]
        # 31.06
        # In parallel
        time_par <- system.time(foreach(i = 1:100) %dopar% { sum(runif(10000000)) })
        time_par[3]
        # 15.25
        # Acceleration
        acceleration <- time_seq[3] / time_par[3]
        acceleration
        #  elapsed
        # 2.036721
  • 186. Parallel computation with R: parallel execution of for-loops Examples of foreach usage
        # Sequentially
        time_seq <- system.time(foreach(i = 1:100) %do% { sum(sin(runif(10000000))) })
        time_seq[3]
        # 87.46
        # In parallel
        time_par <- system.time(foreach(i = 1:100) %dopar% { sum(sin(runif(10000000))) })
        time_par[3]
        # 33.82
        # Acceleration
        acceleration <- time_seq[3] / time_par[3]
        acceleration
        #  elapsed
        # 2.586044
  • 187. Parallel computation with R: parallel execution of for-loops Examples of foreach usage
        # Default (no .combine): results are returned as a list
        foreachResult <- foreach(i = 1:100) %dopar% { sum(runif(10000000)) }
        class(foreachResult)
        # [1] "list"
        nrow(foreachResult)
        # NULL
        ncol(foreachResult)
        # NULL
        length(foreachResult)
        # [1] 100
  • 188. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Combine results as matrix by columns
      foreachResult2 <- foreach(i = 1:100, .combine = "cbind") %dopar% { sum(runif(10000000)) }
      class(foreachResult2)   # [1] "matrix"
      nrow(foreachResult2)    # [1] 1
      ncol(foreachResult2)    # [1] 100
  • 189. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Combine results as matrix by rows
      foreachResult3 <- foreach(i = 1:100, .combine = "rbind") %dopar% { sum(runif(10000000)) }
      class(foreachResult3)   # [1] "matrix"
      nrow(foreachResult3)    # [1] 100
      ncol(foreachResult3)    # [1] 1
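Besides "cbind" and "rbind", .combine accepts other functions. The sketch below uses a trivial task and a sequential loop so that the shapes of the results are easy to check; `keep_max` is a hypothetical helper introduced only for illustration.

```r
library(foreach)

# Default: one list element per task
as_list <- foreach(i = 1:4) %do% i^2

# .combine = "c": results concatenated into a plain vector
as_vector <- foreach(i = 1:4, .combine = "c") %do% i^2

# .combine = "+": results reduced to a single running sum
as_sum <- foreach(i = 1:4, .combine = "+") %do% i^2

# A user-defined combine function: keep the larger of two values
keep_max <- function(a, b) max(a, b)
as_max <- foreach(i = 1:4, .combine = keep_max) %do% i^2
```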
  • 190. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # parallel, .multicombine = FALSE, .inorder = TRUE
      time1 <- system.time(foreach(i = 1:100, .combine = "rbind", .multicombine = FALSE, .inorder = TRUE) %dopar% { sum(runif(10000000)) })
      time1[3]   # elapsed 15.13
      # parallel, .multicombine = TRUE and .inorder = FALSE
      time2 <- system.time(foreach(i = 1:100, .combine = "rbind", .multicombine = TRUE, .inorder = FALSE) %dopar% { sum(runif(10000000)) })
      time2[3]   # elapsed 15.02
  • 191. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # parallel, list as a result
      time_list <- system.time(foreach(i = 1:100) %dopar% { sum(runif(10000000)) })
      time_list[3]        # elapsed 15.24
      acceleration <- time1[3] / time2[3]
      acceleration        # elapsed 1.007324
      accelerationList1 <- time_list[3] / time1[3]
      accelerationList1   # elapsed 1.00727
      accelerationList2 <- time_list[3] / time2[3]
      accelerationList2   # elapsed 1.014647
  • 192. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Both loops sequential
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum) %do% {
        foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
          sin(i) * cos(k)
        }
      }
      end <- proc.time() - start
      end          # 1.76
      SomeResult   # [1] 0.6106603
  • 193. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Outer loop sequential, inner loop parallel
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum) %do% {
        foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %dopar% {
          sin(i) * cos(k)
        }
      }
      end <- proc.time() - start
      end          # 35.79
      SomeResult   # [1] 0.6106603
  • 194. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      However, this construction does not work. It's sad...
      # Not run
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum) %dopar% {
        foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
          sin(i) * cos(k)
        }
      }
      end <- proc.time() - start
      end
      SomeResult
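The slides do not say why the construction above fails. One common cause on socket-cluster backends is that the workers have not attached the foreach package, so the inner foreach() and %do% cannot be found there. The sketch below assumes that this is the problem and that doParallel is the backend; it passes .packages = "foreach" to the outer loop. The worker count is an illustrative choice.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)          # two workers: an assumption for illustration
registerDoParallel(cl)

# Outer loop in parallel, inner loop sequential on each worker.
# .packages = "foreach" attaches foreach on the workers so that the inner
# foreach() and %do% are available there.
SomeResult <- foreach(i = 1:4, .combine = sum, .packages = "foreach") %dopar% {
  foreach(k = 1:1000, .combine = "c") %do% {
    sin(i) * cos(k)
  }
}

stopCluster(cl)
registerDoSEQ()               # return to the sequential backend

# Reference value computed without foreach
expected <- sum(outer(sin(1:4), cos(1:1000)))
```

If the failure on a given setup has a different cause, the .verbose = TRUE argument of foreach is a reasonable first diagnostic step.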
  • 195. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      So, how do we execute four tasks (each with 10000000 iterations) in four threads in parallel?
      # Define a function
      # This function emulates our single 10000000-iteration task
      # inside the foreach loop
      # This is necessary because only the internal foreach loop
      # can be executed in parallel mode
      GetSomeData <- function(indexVal) {
        tmpData <- rep(NA, length.out = 10000000)
        for (j in 1:10000000) {
          tmpData[j] <- sin(indexVal) * cos(j)
        }
        return(tmpData)
      }
  • 196. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Four tasks, each has 10000000 iterations
      # sequentially
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum, .multicombine = TRUE, .inorder = FALSE) %do% {
        GetSomeData(i)
      }
      end <- proc.time() - start
      end          # 120.49
      SomeResult   # [1] -0.645559
  • 197. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Parallel execution
      # So, here we send 10000000 iterations to each thread
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum, .multicombine = TRUE, .inorder = FALSE) %dopar% {
        GetSomeData(i)
      }
      end <- proc.time() - start
      end          # 60.76
      SomeResult   # [1] -0.645559
  • 198. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Using the nesting operator %:% (inner loop with %do%)
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum) %:%
        foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %do% {
          sin(i) * cos(k)
        }
      end2 <- proc.time() - start
      end2         # 2.19
      SomeResult   # [1] 0.6106603
  • 199. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Using the nesting operator %:% (inner loop with %dopar%)
      start <- proc.time()
      SomeResult <- foreach(i = 1:4, .combine = sum) %:%
        foreach(k = 1:1000, .combine = "c", .multicombine = TRUE, .inorder = FALSE) %dopar% {
          sin(i) * cos(k)
        }
      end2 <- proc.time() - start
      end2         # 35.44
      SomeResult   # [1] 0.6106603
  • 200. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Using iterators and foreach together
      # Define some function
      simFun <- function(arg1, arg2) {
        tmp <- 2 * arg1 + 3 * arg2
        return(tmp)
      }
      # Generate some random data
      avec <- rnorm(1000, 22, 3)
      bvec <- rnorm(1000, 24, 5)
  • 201. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Initializing iterators
      iavec <- iter(avec)
      ibvec <- iter(bvec)
      start <- proc.time()
      seqSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
        foreach(j = ibvec, .combine = "c") %do% {
          simFun(i, j)
        }
      end <- proc.time() - start
      end   # 4.90
  • 202. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      # Initializing iterators
      iavec <- iter(avec)
      ibvec <- iter(bvec)
      start <- proc.time()
      parSimulationresult <- foreach(i = iavec, .combine = "cbind") %:%
        foreach(j = ibvec, .combine = "c") %dopar% {
          simFun(i, j)
        }
      end <- proc.time() - start
      end   # 13.57
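To see what an iterator hands to foreach, the following sketch steps through one by hand and then runs the same nested loop on very small inputs. The input values are illustrative, and the inner iterator is created inline so that every outer iteration starts with a fresh one; an iterator object created once up front keeps its position between uses.

```r
library(foreach)
library(iterators)

simFun <- function(arg1, arg2) 2 * arg1 + 3 * arg2   # same formula as on the slide

avec <- c(1, 2, 3)        # small illustrative inputs instead of rnorm(1000, ...)
bvec <- c(10, 20)

# An iterator hands out one element per nextElem() call
it <- iter(avec)
first  <- nextElem(it)
second <- nextElem(it)

# Nested loop over two iterators: one column per element of avec,
# one row per element of bvec
res <- foreach(i = iter(avec), .combine = "cbind") %:%
  foreach(j = iter(bvec), .combine = "c") %do% {
    simFun(i, j)
  }

# The same table computed in one vectorised call, for comparison
ref <- outer(bvec, avec, function(b, a) simFun(a, b))
```

Each task here is a single multiplication and addition, so scheduling overhead dominates; that is a plausible reason why the %dopar% run on the slide (13.57) is slower than the %do% run (4.90).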
  • 203. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      This example uses all the tricks
      # Generating grid
      x <- seq(-10, 10, by = 0.1)
      y <- seq(-10, 10, by = 0.1)
      start <- proc.time()
      # ivector() comes from the itertools package; here it splits x into 4 chunks
      z <- foreach(y = ivector(x, chunks = 4), .combine = cbind) %dopar% {
        y <- rep(y, each = length(x))
        del <- abs(1 + (x^2 + y^2)^0.7)
        r <- (x^2 + y^2) / 2
        matrix(10 * sin(r) / del, length(x))
      }
      end <- proc.time() - start
      end   # 0.37
  • 204. Parallel computation with R: parallel execution of for-loops. Examples of foreach usage
      Result of this code
      # Plot the results as a perspective plot
      persp(x, x, z, ylab = 'y', theta = 30, phi = 30, expand = 0.5, col = "lightblue")
  • 205. Parallel computation with R: parallel computation with graphical processing unit. Package: gputools
      This package provides R interfaces to a handful of common statistical algorithms. These algorithms are implemented in parallel using a mixture of Nvidia's CUDA language, Nvidia's CUBLAS library, and EM Photonics' CULA libraries.
      On a computer equipped with an Nvidia GPU, some of these functions may be substantially more efficient than native R routines.
      Note! Simply put, this package contains a set of specialized functions that can use the GPU for computing. The full list of functions, with descriptions, can be found in the documentation. However, this package is available only for Linux.
  • 213. Parallel computation with R: parallel computation with graphical processing unit. A short example of gputools usage
      # GPU. Here is an example:
      library(gputools)
      matA <- matrix(runif(3 * 2), 3, 2)
      matB <- matrix(runif(3 * 4), 3, 4)
      # Perform matrix cross-product with a GPU
      gpuCrossprod(matA, matB)
      numVectors <- 5
      dimension <- 10
      Vectors <- matrix(runif(numVectors * dimension), numVectors, dimension)
      gpuDist(Vectors, "euclidean")
      gpuDist(Vectors, "maximum")
      gpuDist(Vectors, "manhattan")
      gpuDist(Vectors, "minkowski", 4)
  • 214. Working with very large datasets. Package bigmemory
      Motivation: multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment.
      Description: the package bigmemory and sister packages bridge this gap, implementing massive matrices and supporting their manipulation and exploration.
      The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set.
      The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster.
  • 222. Working with very large datasets. Bigmemory usage examples
      # Here is an example that uses a very, very large matrix
      # This example illustrates how to work with a
      # big.matrix: no 2147483648 object size limitation.
      library(bigmemory)
      R <- 3e9   # 3 billion rows
      C <- 2     # 2 columns
      print("48 GB total size:")
      R * C * 8  # 48 GB total size
      x <- filebacked.big.matrix(R, C, type = 'double',
                                 backingfile = 'huge-data.bin',
                                 descriptorfile = 'huge-data.desc')
      # Generates huge-data.bin and huge-data.desc files.
      # Now we can use the huge-data.desc file in any R session.
      x[1, ] <- rnorm(C)
      x[nrow(x), ] <- runif(C)
      summary(x[1, ])
      summary(x[nrow(x), ])
      # Note: this example will leave a 48 GB file on your hard drive!
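The same file-backed workflow can be tried at a harmless scale. The sketch below is a scaled-down variant under stated assumptions: the matrix has only 1000 rows, the file names are made up for the demo, and everything lives in a temporary directory. It also shows attaching the data through the descriptor file, which is the step a second R session would perform.

```r
library(bigmemory)

dir <- tempfile("bigmem_")      # scratch directory, an assumption for the demo
dir.create(dir)

# A small file-backed matrix: 1000 rows x 2 columns of doubles
x <- filebacked.big.matrix(1000, 2, type = "double",
                           backingfile = "small-data.bin",
                           backingpath = dir,
                           descriptorfile = "small-data.desc")
x[1, ] <- c(1.5, 2.5)
x[nrow(x), ] <- c(-1, -2)
flush(x)                        # push pending writes to the backing file

# Attach the same data through the descriptor file, as another session would
y <- attach.big.matrix(file.path(dir, "small-data.desc"))
first_row <- y[1, ]
last_row  <- y[nrow(y), ]
```

Only the rows that are indexed are pulled into ordinary R memory; the rest of the matrix stays in the backing file.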
  • 223. Working with very large datasets. Package filehash
      Motivation: working with large datasets in R can be cumbersome because of the need to keep objects in physical memory. While many might generally see that as a feature of the system, the need to keep whole objects in memory creates challenges for those who might want to work interactively with large datasets.
      Here we take a simple definition of "large dataset" to be any dataset that cannot be loaded into R as a single R object because of memory limitations. For example, a very large data frame might be too large for all of the columns and rows to be loaded at once. In such a situation, one might load only a subset of the rows or columns, if that is possible.
  • 229. Working with very large datasets. Package filehash
      Description: the filehash package provides a full read-write implementation of a key-value database for R. The package does not depend on any external packages or software systems and is written entirely in R, making it readily usable on most platforms. The filehash package can be thought of as a specific implementation of the database concept, taking a slightly different approach to the problem.
      Technical note: key-value databases are sometimes called hash tables. With filehash the values are stored in a file on the disk rather than in memory. When a user requests the values associated with a key, filehash finds the object on the disk, loads the value into R and returns it to the user. The package offers two formats for storing data on the disk: the values can be stored (1) concatenated together in a single file or (2) separately as a directory of files.
  • 232. Working with very large datasets. Filehash usage examples
      # Connecting library
      library(filehash)
      # Creating hash-database on HDD
      DATA_PATH <- "E:/R_works/file_hash_data_strorage/db_test"
      DATA_PATH
      dbCreate(DATA_PATH)
      # Initializing link to our hash-database
      db <- dbInit(DATA_PATH)
      # Load matrix to our database
      # Dimensions
      its = 3000000; dim = 10
      dbInsert(db, "our_big_matrix", matrix(rnorm(its * dim), its, dim))
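The example above only writes to the database. A minimal sketch of the rest of the key-value cycle (list, fetch, check, delete) follows; it assumes a temporary database path and tiny values so that it can be run anywhere, and the key names are made up for the demo.

```r
library(filehash)

db_path <- tempfile("db_test_")   # temporary location, an assumption for the demo
dbCreate(db_path)                 # default format: a single file on disk
db <- dbInit(db_path)

dbInsert(db, "small_matrix", matrix(1:6, nrow = 2))
dbInsert(db, "greeting", "hello")

keys <- dbList(db)                      # keys currently stored
m <- dbFetch(db, "small_matrix")        # value read back from disk
has_key <- dbExists(db, "greeting")

dbDelete(db, "greeting")                # remove one key
still_there <- dbExists(db, "greeting")
```

A value occupies R memory only while it is fetched, which is what makes this approach usable for objects that should not stay resident all the time.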
  • 233. Final words, some useful references and contacts. Some useful references
      These are some useful links:
      The book The Art of R Programming - http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
      The book Econometrics in R - http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
      R Installation and Administration - http://cran.r-project.org/doc/manuals/R-admin.html
      Very interesting presentation about HPC in R!!! - http://www.slideshare.net/bytemining/r-hpc
      Integrated storage of R-posts - http://www.r-bloggers.com/
      Page of the commercial R-project - http://www.revolutionanalytics.com/
      There are many other sites... If you have a problem, just ask Google: How to "here formulation of your problem" in R
  • 248. Final words, some useful references and contacts: final words and contacts
      This presentation is only the beginning of my work in this direction, and it is my first attempt. I will continue this work and publish future versions of the presentation with new material and examples as soon as I have more free time. As for the quality of this version: it is my first experience with LaTeX, so please don't judge me harshly. If you are interested in this topic or have some ideas, just write to me. I am open to discussion.
      My contacts:
      email: aleksey.v.kutergin@gmail.com
      facebook page: facebook.com/aleksey.kutergin
      vk page: vk.com/aleksey_v_kutergin