SUNDAY:


HPC databases workshop:

rasdaman:

   • adding arrays to SQL queries
   • array query operators
          • general array constructor
          • subset trim & slice
          • array nest/unnest
          • matrix multiplication
          • histograms
          • formal encoding (e.g. c, cpp, java arrays)
          • nested queries
   • storage mapping: variants
          • coordinate-free sequence
          • BLOBs
          • ROLAP
          • imaging multidimensional OLAP
   • tiled array storage
          • regular
          • directional
          • area of interest
   • In-Situ Databases
          • approach: reference external files
          • related: SciQL
   • adding tertiary storage
          • tapes
          • problem: spatial clustering
          • approach: super-tiles = all of the particular index nodes (reiner 2001 - paper)
   • Query processing
          • optimization 1: query rewriting
          • optimization 2: JIT compilation
                  • approach: cluster suitable ops
                  • compile & dynamically bind
                  • benefit: speed up complex, repeated operations
                  • variation: compile code for GPU
   • Intra operator parallelization
          • ...too fast
   • query processing in a federation
          • query splitting
          • work in progress
   • examples
          • human brain imaging
          • gene expression analysis (db queries, sexy as fuck) -> output jpeg, correlations,..
          • geo service standardization (OGC, SIC)
   • use cases/ e.g.:
          • sat imaging
          • 3d clients/vis.
   • history of array DBMSs
          • array as table
   • conclusion
          • awesome for science and so on..
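
To make the "tiled array storage" point concrete: with regular tiling, a query box only has to read the tiles it overlaps. The following is a minimal sketch of that lookup, assuming fixed-size tiles anchored at the origin. The function name and the example sizes are made up for illustration and are not taken from the talk or from rasdaman itself.

```python
from itertools import product

def tiles_for_box(lo, hi, tile_shape):
    """Return the indices of all regular tiles a query box touches.

    lo, hi and tile_shape are equal-length tuples, one entry per dimension;
    lo and hi are inclusive cell coordinates.
    """
    ranges = [
        range(l // t, h // t + 1)  # first..last tile index along this axis
        for l, h, t in zip(lo, hi, tile_shape)
    ]
    return list(product(*ranges))

# 2-D example with 100x100 tiles: x in 150..260, y in 0..99
print(tiles_for_box((150, 0), (260, 99), (100, 100)))  # [(1, 0), (2, 0)]
```

Directional and area-of-interest tiling would replace the fixed `tile_shape` with per-axis or per-region tile boundaries; that part is not sketched here.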

NEEEEEED SLIDES. so much enhanced SQL statement examples.
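
Until the slides turn up, here is a rough illustration of the difference between the two subsetting operators from the list above. It is plain Python on a nested list, not rasdaman's query syntax, and the helper names are invented for this sketch: a trim cuts out a sub-box and keeps the dimensionality, a slice fixes one coordinate and drops that dimension.

```python
def trim(img, rows, cols):
    """Trim: cut out a sub-box; the result is still 2-D."""
    (r0, r1), (c0, c1) = rows, cols  # inclusive bounds
    return [row[c0:c1 + 1] for row in img[r0:r1 + 1]]

def slice_row(img, r):
    """Slice: fix the row coordinate; the result is 1-D."""
    return list(img[r])

# 3x4 test image, cell value = 10 * row + column
img = [[10 * r + c for c in range(4)] for r in range(3)]

print(trim(img, (0, 1), (1, 2)))  # [[1, 2], [11, 12]]
print(slice_row(img, 2))          # [20, 21, 22, 23]
```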


Energy Efficient HPC:

VERY much information via slides and talk, graphs,..
extremely interesting. you should read the slides yourself, if you are interested:
http://eehpcwg.lbl.gov/documents


Data-aware networking workshop:

gridftp (fatih university - TR):

https://sites.google.com/a/lbl.gov/ndm2012/home/accepted-papers (first one)

    • intro: pipelining, parallelism, concurrency
    • pipelining:
           • useful for large number of small files
           • higher throughputs on small files (1MB)
           • nr. of files affects total throughput but not the optimal pipelining level
           • throughput increases as number of files increases,..
           • BDP = BW*RTT - optimal windowsize (pfo)
           • ....
    • parallelism:
           • when buffer size is too small compared to the BDP
           • advantageous with large files
    • concurrency:
           • advantages over parallelism:
                   • para. deteriorates the performance w. small files (pipelining)
                   • concurrency + pipelining has better perf. than cc+pp+p
                   • small RTT: quicker ascent to the peak throughput
                   • ...
    • rules of thumb:
           • always use pipelining
                   • set different levels
           • keep chunks as big as possible
           • use concurrency with pipelining w. small files and small # files
           • add parallelism to cc and pp with bigger files
           • use parallelism when # files is insufficient to feed BDP
    • recursive chunk size division
           • mean-based algo. to construct clusters of files with diff. optimal pipelining lvls.
           • calc. optimal pipelining level by dividing BDP by the mean file size of the chunk
    • results
           • awesome (slides needed, graphs and so on,..)
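
A small sketch of the arithmetic behind the notes above, under stated assumptions. The pipelining rule (level = BDP / mean file size, rounded up) is my reading of the notes, the stop rule for the recursive split (`min_chunk`, `depth`) is hypothetical, and the link numbers are made up. See the linked paper for the actual algorithm.

```python
import math

def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: BDP = BW * RTT."""
    return bandwidth_bps * rtt_s / 8

def pipelining_level(bdp, mean_file_size):
    """Assumed rule: keep enough files in flight to fill one BDP."""
    return max(1, math.ceil(bdp / mean_file_size))

def split_by_mean(sizes, min_chunk=2, depth=2):
    """Recursively split file sizes around their mean (hypothetical stop rule)."""
    if depth == 0 or len(sizes) < 2 * min_chunk:
        return [sizes]
    mean = sum(sizes) / len(sizes)
    small = [s for s in sizes if s <= mean]
    large = [s for s in sizes if s > mean]
    if len(small) < min_chunk or len(large) < min_chunk:
        return [sizes]
    return (split_by_mean(small, min_chunk, depth - 1)
            + split_by_mean(large, min_chunk, depth - 1))

MB = 1024 * 1024
bdp = bdp_bytes(10e9, 0.1)            # assumed link: 10 Gbps, 100 ms RTT
sizes = [1 * MB] * 4 + [500 * MB] * 3  # made-up mix of small and large files

for chunk in split_by_mean(sizes):
    mean = sum(chunk) / len(chunk)
    print(len(chunk), "files, mean", mean / MB, "MB ->",
          "pipelining level", pipelining_level(bdp, mean))
```

With these numbers the small-file chunk ends up with a high pipelining level and the large-file chunk with level 1, which matches the rule of thumb that pipelining mainly pays off for many small files.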

Sandhya Narayan, Hadoop acceleration in an OpenFlow-based cluster:

    • overview of SDN/openflow
          • use case: hadoop
    • hadoop overview
          • hadoop acceleration approaches (usual stuff)
          • overview mapreduce pipeline (ibid)
          • overview of hadoop network traffic (ibid)
    • floodlight as openflow controller
    • openflow switch: openvswitch and link (research link)
    • queues in openflow (for different bandwidths 50mbps, 200mbps,..)
    • improvement in latency due to BW queues
    • conclusion: SDN is awesome, but we don't use much of it now.
    • further work: QoS, dynamic hadoop flows

no news there.


Mehmet Balman, Streaming Exa Scale data over 100Gbps Networks:

    • lot-of-small files problem! - file centric tools (not high speed), latency still a problem
    • framework for memory-mapped network channel
           • blocks
           • memory caches are logically mapped between client and server
           • advantages:
                  • decoupling i/o and network ops (front/backend)
                  • not limited by file size characteristics
                  • moving climate files efficiently (gridftp, fopen,..)
    • SC11 100Gbps demo
           • CMIP3 data (35tb) over gpfs at NERSC
           • block size 4MB
           • each block's data section was aligned according to the system page size
           • 1gb cache
           • testbed overview:
                  • many tcp streams
                  • effects: crazy cpu usage
    • memznet's performance (buffer size 5mb)
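
The page-alignment detail from the SC11 demo can be sketched as follows. The 4 MB data section and the 1 GB cache come from the notes; the per-block header size and the overall block layout are hypothetical, since the notes do not describe the actual block format.

```python
import mmap

PAGE = mmap.PAGESIZE          # system page size, commonly 4096
BLOCK_DATA = 4 * 1024 * 1024  # 4 MB data section per block (from the notes)
CACHE = 1024 * 1024 * 1024    # 1 GB cache (from the notes)
HEADER = 64                   # hypothetical per-block header size in bytes

def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

# Pad the header so that the data section starts on a page boundary.
data_offset = align_up(HEADER, PAGE)
# Keep every block a whole number of pages so the next block stays aligned.
block_stride = align_up(data_offset + BLOCK_DATA, PAGE)
blocks_in_cache = CACHE // block_stride

print(data_offset, block_stride, blocks_in_cache)
```

Aligning the data section this way lets a block be handed to page-granular interfaces such as memory mapping without an extra copy, which is presumably why the demo did it.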

wtf?! no new information AT ALL.



MONDAY:


parallel storage workshop:

keynote (eric barton)

    • http://www.pdsw.org/keynote.shtml
    • http://www.pdsw.org/pdsw12/slides/keynote-FF-IO-Storage.pdf

poster sessions
  slides and papers available online: http://www.pdsw.org/index.shtml

slides (papers if no slides available at the time):
  1. http://www.pdsw.org/pdsw12/papers/he-pdsw12.pdf
  2. http://www.pdsw.org/pdsw12/slides/crume-slides-pdsw12.pdf
  3. http://www.pdsw.org/pdsw12/papers/grawinkle-pdsw12.pdf - no slides yet
  4. http://www.pdsw.org/pdsw12/papers/kim-pdsw12.pdf - no slides yet
  5. http://www.pdsw.org/pdsw12/slides/jwchoi_sc_SAN.pdf
  6. http://www.pdsw.org/pdsw12/slides/ren-tablefs_giga_pdsw.pdf
  7. http://www.pdsw.org/pdsw12/papers/goodell-pdsw12.pdf - no slides yet
  8. http://www.pdsw.org/pdsw12/slides/watkins-datamods-pdsw12.pdf
  9. http://www.pdsw.org/pdsw12/papers/carns-pdsw12.pdf - no slides (yet?)


HFT workshop:

http://www.cs.usfca.edu/~mfdixon/whpcf12/whpcf_12_program.html

2nd keynote - nvidia (john ashley) - how not to be roadkill

    • overview
    • background: EE, realtime data, big data, datamining, geospatial,..
    • drivers - power and heat
    • drivers - financial regulators
    • drivers - the world as we don't know it:
           • no arch. for everything, multi-arch
           • hadoop isn't the answer to everything
           • need to optimize cost and risk
           • need tools and techniques to implement across heterogeneous solutions
           • need metrics to identify tradeoffs
                   • example:
                           • hanweck - reduced capt. expen. 10x, oper. expen. 13x
                           • citadel - each gpu saves 180.6K USD / year
                           • JPMC - 80 percent oper. expen. savings through GPUs
    • drivers - information advantage
           • is knowledge power?
                   • profit = f(knowledge, cap., capability)
                   • low latency/hft teams know this,..
           • knowing what your competition does
           • are you in the red with respect to capability to price and risk deals,..
                   • analytical? better models?, faster?
                   • computationally? new technology -> time to market
           • JPMorgan runs GPUs for risk analysis
    • crossing the road w/o getting hit
           • technology
                   • no longer hw agnostic
                   • heterogeneous
                   • suitable
                   • data is the new bottleneck
    • skills
           • parallel thinking
                   • data awareness
                   • multi-paradigm, multi-programming
                   • experimentalism
                   • hft guys are into all of this and so on,...
           • parallel thinking
                   • chunking work
                           • distribution
                           • tiling
                           • cyclic reduction, parallel solvers, swarm optimization, monte carlo
                   • numerical issues
                   • awareness of discrete math issues, SP/DP
                   • numerical stability, async. algos, red/black coloring, multi-level grid solvers
           • data awareness
                   • not just hadoop
                   • efficient organization, delivery of data to compute is key
                   • dataflow programming is key
                   • hpc programmers already know this
                   • examples:
                           • structure of arrays vs array of structures, esp. as vector units get wider
                           • tiling algos. vs naive algos drastically improve performance
                   • some firms still believe that language optimized and hardware aware programming is wrong
           • experimentalism
                   • innovate
                   • avoid analysis paralysis
                   • define relevant metrics, collect them, and then act
    • STAC-A2: a benchmark focused on metrics and biz problem
           • can be used to compare a range of potential solutions that are innovative
           • allows free rein to parallel and data-sensitive computing
    • case study
           • CARMA: standalone arm + gpu micro server, it's a dev. kit, over narrow pci-e
                   • monte carlo based
                   • MPI
                   • carma rocks for hft
                   • speed
                   • low power consumption

Sc12 workshop-writeup
