An Active and Hybrid Storage System
     for Data-intensive Applications

   Ph.D Candidate: Zhiyang Ding
   Defense Committee Members:
   Dr. Xiao Qin
   Dr. Kai H. Chang
   Dr. David A. Umphress
   University Reader:
   Prof. Wei Wang,
   Chair of the Art Design Dept.
                    5/7/2012
Cluster Computing
      • Large-scale Data Processing is everywhere.




Motivation
         • Traditional Storage Nodes on the Cluster
         [Figure: cluster with a Client, Internet, Head Node, Network switch, Compute Nodes, and a Storage Node (or Storage Area Network)]
Motivation
         • What’s next?
         • More “Active”.


         [Figure: the same cluster with an active Storage Node. Labels: Client, Internet, Head Node, Network switch, Compute Nodes, Storage Node. Flows: Computation Offload, I/O Request, Raw Data, Pre-processed Data]
About the Active Storage

• McSD: A Smart Disk Model
• pp-mpiBlast: How to deploy Active Storage?
• HcDD: Hybrid Disk for Active Storage

[Figure: Storage Node]
McSD:
   A Multicore Active Storage Device

• I/O Wall Problem: CPU--I/O Gap
      – Limited I/O Bandwidth
      – CPU Waiting and Dissipating Power
• How to
      – Bridge CPU--I/O Gap
      – Reduce I/O Traffic


Why McSD?


• “Active”:
      – Leveraging the Processing Power of Storage Devices


• Benefits:
      – Offloading Data-intensive Computation
      – Reducing I/O Traffic
      – Pipeline Parallel Programming


Contributions


• Design a prototype of a multicore active storage

• Design a pre-assembled processing module

• Extend a shared-memory MapReduce system

• Emulate the whole system on a real testbed


Background: Active Disks

• Traditional Smart/Active Disks
      – On-board: Embedding a processor into the hard disk
      – Various Research Models
         • e.g., active disk, smart disk, IDISK, SmartSTOR, etc.

• However, “active disk” has not been adopted by hardware vendors

[Figure: four boxes labeled Improved attachment technologies, Cost of the System, I/O Bound Workloads, Reliability]
Background: Parallel Processing

• Multi-core Processors or Multi-processors
      – A 45% increase in transistors yields a 20% increase in processing power
• MapReduce: a Parallel Programming Model
      – MapReduce by Google
      – Hadoop, Mars, Phoenix, etc.
• Multicore and Shared-memory Parallel
  Processing

Design: System Overview

[Figure: design of an Active Storage, built from four components: Pipeline Parallel Processing, Communication Mechanism, Multicore and Shared-memory Parallel Processing, and Hybrid Storage Disks]
Design and Implementation

• Computation Mechanism
      – Pre-assembled Processing Model
      – smartFAM
• Extend the Shared-Memory MapReduce by
  Partitioning




Pre-assembled Processing Modules


• Pre-assembled Processing Modules
      – Meet the nature of embedded services
      – Reduce Complexity and Cost
      – Provide Services
           • E.g., multi-version antivirus service, pre-processing for
             data-intensive apps, de-duplication, etc.
• How to invoke services?


smartFAM

• smartFAM = Smart File Alteration Monitor
      – Invokes the pre-assembled processing modules or
        functions by monitoring changes to the system
        log file.
• Two Components:
      – inotify: a Linux kernel facility for monitoring file system events
      – a trigger daemon
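
A minimal sketch of the trigger-daemon idea, under stated assumptions: the log line format ("<module-name> <argument>"), the `MODULES` registry, and the function name `process_new_log_lines` are hypothetical and do not come from the slides. To stay portable it does not call inotify; the function would be invoked by whatever mechanism reports that the log file changed.

```python
import os
import tempfile

# Hypothetical registry of pre-assembled modules (name -> callable).
MODULES = {"wordcount": lambda text: len(text.split())}


def process_new_log_lines(log_path, offset, modules):
    """Invoke a module for every complete line appended after `offset`.

    Assumed line format (illustration only): "<module-name> <argument>".
    Returns the new offset and a list of (module-name, result) pairs.
    """
    results = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in iter(log.readline, b""):
            if not raw.endswith(b"\n"):
                break  # partial line: the writer has not finished it yet
            offset += len(raw)
            name, _, arg = raw.decode("utf-8").strip().partition(" ")
            handler = modules.get(name)
            if handler is not None:  # unknown module names are ignored
                results.append((name, handler(arg)))
    return offset, results


# Usage: a daemon would call this each time the log file changes.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "host.log")
    with open(log_path, "w") as log:
        log.write("wordcount to be or not to be\n")
    offset, results = process_new_log_lines(log_path, 0, MODULES)
    print(results)
```

Keeping the byte offset between calls means each log entry triggers its module once, even if the daemon is notified several times for the same write.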


Design and Implementation

[Figure: smartFAM workflow across an NFS share, in three numbered steps. Host node: Main Program (general functions, data-intensive function, merge results) and a smartFAM daemon with inotify. Active node: a smartFAM daemon with inotify, the pre-assembled modules, and the log files (module log and result data).]
Extend the Phoenix:
    A Shared-memory MapReduce Model

• Extend the Phoenix MapReduce Programming
  Model by partitioning and merging
      – New API: partition_input
      – New Functions:
           • partition (provided by the new API)
           • merge (developed by the user)


• Example:
      – wordcount [data-file][partition-size][]
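
The partition-then-merge idea can be sketched in a few lines. This is an illustration in Python, not the Phoenix C API: only the name `partition_input` appears on the slide, and its signature here, together with `word_count` and `merge`, are assumptions made for the example.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Yield pieces of about `partition_size` characters, cut at whitespace."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    start = 0
    while start < len(text):
        end = min(start + partition_size, len(text))
        # Move the cut forward to the next whitespace so no word is split.
        while end < len(text) and not text[end].isspace():
            end += 1
        yield text[start:end]
        start = end


def word_count(piece):
    """Stand-in for one MapReduce run over a single partition."""
    return Counter(piece.split())


def merge(partial_counts):
    """User-supplied merge step: add up the per-partition counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


text = "a bb ccc a bb a"
merged = merge(word_count(piece) for piece in partition_input(text, 4))
print(dict(merged))
```

Each partition is processed on its own, so only one piece needs to fit in memory at a time; the merge step restores the result that a single run over the whole input would give.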


Pipeline Processing




Evaluation Environment

• Testbed

• Benchmarks
      – Word Count
      – String Match
      – Matrix Multiplication

• Individual Node Performance
• System Performance
Individual Node Performance


                 Word Count (seconds)      String Match (seconds)
                 1 GB        1.25 GB       1 GB        1.25 GB
w/ Partition     40.60       50.91         17.76       20.61
w/o Partition    85.74       139.54        17.62       21.00
System Evaluation

Matrix-Multiplication and Word-Count (Speedups)

Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
500 MB             1.47x                2.15x                    0.99x
750 MB             1.45x                2.09x                    1.04x
1 GB               7.62x                2.14x                    6.07x
1.25 GB            19.01x               2.50x                    15.39x

$$Speedup = \frac{T_{ConsumptionOfControlSample}}{T_{ConsumptionOfMcSD}}$$
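
As a worked example of the speedup definition (control-sample time divided by McSD time), the sketch below applies it to the Word Count times from the Individual Node Performance slide. Treating the "w/o Partition" run as the control sample is an assumption made here for illustration; the slides do not state that pairing.

```python
def speedup(t_control, t_mcsd):
    """Speedup = time of the control sample / time of the McSD run."""
    return t_control / t_mcsd


# Word Count times in seconds, copied from the Individual Node Performance slide.
without_partition = {"1 GB": 85.74, "1.25 GB": 139.54}
with_partition = {"1 GB": 40.60, "1.25 GB": 50.91}

# Assumption: the run without partitioning plays the role of the control sample.
partition_speedups = {
    size: round(speedup(without_partition[size], with_partition[size]), 2)
    for size in without_partition
}
print(partition_speedups)
```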
Summary

• Offloading data-intensive computation to active
  storage can improve system performance

• McSD is a promising active storage model with
      – Pre-assembled processing modules
      – Parallel data processing
      – Better performance in our evaluation


About the Active Storage

• McSD: A Smart Disk Model
• pp-mpiBlast: How to deploy Active Storage?
• HcDD: Hybrid Disk for Active Storage

[Figure: Storage Node]
Applying Active Storage to a Cluster


• So far, we know the potential of Active
  Storage

• Challenge: How to coordinate active storage
  nodes with computing nodes?

• Propose a Pipeline-parallel Processing pattern

Contributions


• Propose a pipeline-parallel processing framework
  to “connect” an Active Storage node with
  computing nodes.
• Evaluate the framework using both an analytic
  model and a real implementation.
• Case Study: Extend an existing bioinformatics
  application based on the framework.

Background: Active Storage


[Figure: an Active Storage Node containing a Processor, Memory, and Mass Storage (SSD, SSD, Buff Disks), with Computation attached and the question “Bridge?”]
Background: Bioinformatics App

• BLAST*: Basic Local Alignment Search Tool
      – Comparing primary biological sequence
        information


• mpiBLAST** is a freely available, open-source,
  parallel implementation of NCBI BLAST.
      – Format raw data files
      – Run a parallel BLAST function
                            *http://blast.ncbi.nlm.nih.gov/
                            **http://www.mpiblast.org/
Pipeline-parallel Design


• Offload the raw-data formatting task to where
    the data is stored.
• Intra-application Pipeline-parallel Processing
    by “partition” and “merge”.
• pp-mpiBlast, a case study.
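
The two-stage pipeline described above can be sketched with two threads and a queue. This is not pp-mpiBlast code: `format_partition` and `search_partition` are hypothetical stand-ins for the formatting step on the active storage node and the search step on the computing nodes.

```python
import queue
import threading


def format_partition(partition):
    """Stand-in for the formatting step run on the active storage node."""
    return partition.upper()


def search_partition(formatted):
    """Stand-in for the search step run on the computing nodes."""
    return formatted.count("A")


def run_pipeline(partitions):
    """Format partition i+1 while partition i is being searched, then merge."""
    handoff = queue.Queue()
    sub_outputs = []
    done = object()  # sentinel that marks the end of the stream

    def active_storage_stage():
        for partition in partitions:
            handoff.put(format_partition(partition))
        handoff.put(done)

    def compute_stage():
        while True:
            item = handoff.get()
            if item is done:
                break
            sub_outputs.append(search_partition(item))

    stages = [
        threading.Thread(target=active_storage_stage),
        threading.Thread(target=compute_stage),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return sum(sub_outputs)  # merge step: combine the sub-outputs


print(run_pipeline(["aab", "cAa", "xyz"]))
```

The queue is the only coupling between the stages, so the slower stage sets the pace while the faster one waits on the queue.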


Pipelining Workflow

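[Figure: pipelining workflow. Active Storage Node: the raw input file is split into partitions 1 to n (Partition), and each partition is converted into an intermediate file 1 to n (FormatDB). Computing Nodes: each intermediate yields a sub-output 1 to n (mpiBlast), and the sub-outputs are combined into the output file (Merge). Stage order: Partition, FormatDB, mpiBlast, Merge; the diagram marks repeated stages with “(n-1) times”.]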
Analytic Model

• Three Critical Measures

$$T_{response} = T_{active} + T_{compute}$$

$$Throughput = \frac{1}{\max(T_{active},\, T_{compute})}$$

$$Speedup = \frac{T_{sequence}}{T_{pipelined}} = \frac{n \times (T_{active} + T_{compute})}{T_{active} + (n-1) \times \max(T_{active},\, T_{compute}) + T_{compute}} = \frac{n}{1 + \dfrac{n-1}{Throughput \times T_{response}}}$$
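
The analytic model can be checked numerically. The sketch below computes the speedup directly from the sequential and pipelined times, and again from the closed form obtained by dividing numerator and denominator by the response time; the function names and the sample timings are illustrative assumptions, not values from the evaluation.

```python
import math


def pipeline_speedup(t_active, t_compute, n):
    """Speedup of n pipelined partitions over n sequential ones."""
    t_sequence = n * (t_active + t_compute)
    t_pipelined = t_active + (n - 1) * max(t_active, t_compute) + t_compute
    return t_sequence / t_pipelined


def pipeline_speedup_closed_form(t_active, t_compute, n):
    """Same speedup, written in terms of response time and throughput."""
    t_response = t_active + t_compute
    throughput = 1 / max(t_active, t_compute)
    return n / (1 + (n - 1) / (throughput * t_response))


# Illustrative timings: 2 s on the active node, 3 s on the compute nodes.
direct = pipeline_speedup(2.0, 3.0, 4)
closed = pipeline_speedup_closed_form(2.0, 3.0, 4)
print(direct, closed, math.isclose(direct, closed))
```

When the two stages take equal time, the expression reduces to 2n / (n + 1), so the speedup approaches 2 as the number of partitions grows.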
Evaluation Environment

              Computing Nodes Configuration    Active Storage Configuration
CPU           Intel Xeon X3430                 Intel Core 2 Q9400
Memory        2 GB DDR3 (PC3-10600), both configurations
OS            Ubuntu 9.04 Jaunty Jackalope, 32-bit, both configurations
Kernel        2.6.28-15-generic, both configurations
Network       Gigabit LAN, both configurations

Testbed                  Role                  Nodes
“Pipeline-parallel”      Our testbed           12 Computing Nodes, 1 Active Storage Node
“12-node Cluster”        Comparison testbed    12 Computing Nodes, 1 Storage Node
“13-node Cluster”        Comparison testbed    13 Computing Nodes, 1 Storage Node
Pipeline-parallel Design




[Figure: Results compared with the 12-node system]

[Figure: Results compared with the 13-node system]
Speedups Trends: Partition Size




Summary


• We proposed a pipeline-parallel processing
    mechanism to apply an Active Storage Node.


• As a case study, we extended a classic
    bioinformatics application based on the
    pipeline-parallel style.

About the Active Storage

• McSD: A Smart Disk Model
• pp-mpiBlast: How to deploy Active Storage?
• HcDD: Hybrid Disk for Active Storage

[Figure: Storage Node]
What’s Hybrid?

A Hybrid Combination of a Gas Engine and an Electric Engine

[Figure labels: Power, Efficiency]
Hybrid Disk Drives

• A Hybrid Combination of Two Types of Storage
  Devices: HDD and SSD
      – HDD: Magnetic hard disk drive.
      – SSD: Solid state disk, built from NAND-based flash memory.


                                        What are their roles?




Motivation


• In a hybrid storage system, using SSDs as the
  buffer can boost the performance.
• However, SSDs suffer from reliability issues.

WordCount on Intel Core2 Duo E8400 (seconds)

                      Input Data Size
Storage    Buffer     500 MB    750 MB    1 GB      1.25 GB
HDD        HDD        21.51     38.30     505.25    1294.64
HDD        SSD        19.89     36.41     85.74     139.54
Limitations Related to SSDs

• Flash Memory:
      – Each block consists of 32, 64, or 128 pages.
      – Each page is typically 512, 2,048, or 4,096 bytes.
• “Erase-before-write” at block level.
• Lifespan is 10,000 Program/Erase cycles.
      – E.g., *an 80 GB MLC SSD lasts only 106 days
        at a write rate of 30 MB/s.
• Rethink their roles?
            *Based on the SSD lifespan calculator provided by Virident.com
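
For intuition, a back-of-the-envelope estimate can be set next to the cited figure. The sketch below assumes perfect wear leveling, a write amplification of 1, and decimal units (1 GB = 10^9 bytes); none of these assumptions comes from the slides, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def naive_lifespan_days(capacity_bytes, pe_cycles, write_rate_bytes_per_s):
    """Days until every block has used all its program/erase cycles.

    Assumes perfect wear leveling and no write amplification.
    """
    total_writable_bytes = capacity_bytes * pe_cycles
    return total_writable_bytes / write_rate_bytes_per_s / SECONDS_PER_DAY


# 80 GB drive, 10,000 P/E cycles, 30 MB/s sustained writes.
days = naive_lifespan_days(80e9, 10_000, 30e6)
print(round(days))
```

This idealized estimate comes out at roughly 309 days, about 2.9 times the 106 days cited from the calculator. The gap could come from factors such as write amplification, but the calculator's assumptions are not stated in the slides.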
Contributions


• Hybrid Combination of HDD and SSD disks

• De-duplication Service using HDDs as a Write Buffer

• Internal-parallel Processing in SSD

• Simulation of the Whole System For Evaluation



Hybrid Disk Configuration


[Figure: hybrid disk data paths. Labels: I/O Requests, Data of Write Requests, HDD, Dedicated Processor (De-duplication, Pre-processing), Deduplicated data, Read Requests, Pre-processed Data, SSD]
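
The slides do not spell out the de-duplication algorithm, so the sketch below shows a generic content-hash scheme rather than the HcDD design: each written block is fingerprinted with SHA-256 and stored once, and a list of fingerprints records the write order. The names `deduplicate` and `restore` are hypothetical.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once; return (store, recipe)."""
    store = {}   # fingerprint -> block content, kept a single time
    recipe = []  # fingerprints in the order the blocks were written
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original sequence of blocks from the recipe."""
    return [store[fingerprint] for fingerprint in recipe]


writes = [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]
store, recipe = deduplicate(writes)
print(len(writes), "blocks written,", len(store), "blocks stored")
```

In a hybrid drive of the kind outlined above, only the blocks in `store` would need to reach the SSD, which is how de-duplication in the write buffer can cut the number of flash writes.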
HcDD Architecture




Deduplication Design




Internal Parallel Processing

[Figure: requests held in the SDRAM cache are spread across eight lists, List #0 to List #7, each feeding one package, Package #0 to Package #7. Req 1 to Req 8 occupy lists #0 to #7 in order, Req 9 to Req 16 follow in the same order, then Req 17 to Req 24, and so on.]
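
The request layout in the diagram (Req 1 to Req 8 across lists #0 to #7, then Req 9 onward starting again at #0) matches a round-robin assignment. The sketch below reproduces that layout; the function name `dispatch` and the use of plain Python lists as queues are assumptions for illustration.

```python
NUM_LISTS = 8  # one list per package, as in the diagram


def dispatch(request_ids, num_lists=NUM_LISTS):
    """Assign requests to lists in round-robin order."""
    lists = [[] for _ in range(num_lists)]
    for position, request_id in enumerate(request_ids):
        lists[position % num_lists].append(request_id)
    return lists


lists = dispatch(range(1, 25))
for index, queued in enumerate(lists):
    print(f"List #{index}: {queued}")
```

Spreading consecutive requests over different lists lets the eight packages work on them at the same time instead of queuing behind one another.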
Evaluation




Internal Parallelism Evaluation:
               Single Node




Single Node: Dedup Ratio




System Performance Evaluation




System Performance Evaluation




Summary




Conclusion

• McSD: A Smart Disk Model
• pp-mpiBlast: How to deploy Active Storage?
• HcDD: Hybrid Disk for Active Storage

[Figure: Storage Node]
Future Work




Many Thanks!
           And Questions?





An Active and Hybrid Storage System for Data-intensive Applications

  • 1. An Active and Hybrid Storage System for Data-intensive Applications Ph.D Candidate: Zhiyang Ding Defense Committee Members: Dr. Xiao Qin Dr. Kai H. Chang Dr. David A. Umphress University Reader: Prof. Wei Wang, Chair of the Art Design Dept. 5/7/2012
  • 2. Cluster Computing • Large-scale Data Processing is everywhere. 5/7/2012 2
  • 3. Motivation • Traditional Storage Nodes on the Cluster Storage Node Head Node (or Storage Area Network) Internet Client Network switch Compute Nodes 5/7/2012 3
  • 4. Motivation • What’s the next? • More “Active”. Head Internet Node Client Network switch Storage Node Compute Nodes Computation Offload I/O Request Raw Data Pre-processed Data 5/7/2012 4
  • 5. About the Active Storage McSD: A Smart Disk Model pp-mpiBlast: How to deploy Active Storage? Storage Node HcDD: Hybrid Disk for Active Storage 5/7/2012 5
  • 6. McSD: A Multicore Active Storage Device • I/O Wall Problem: CPU--I/O Gap – Limited I/O Bandwidth – CPU Waiting and Dissipating the Power • How to – Bridge CPU--I/O Gap – Reduce I/O Traffic 5/7/2012 6
  • 7. Why McSD? • “Active”: – Leveraging the Processing Power of Storage Devices • Benefits: – Offloading Data-intensive Computation – Reducing I/O Traffic – Pipeline Parallel Programming 5/7/2012 7
  • 8. Contributions • Design a prototype of a multicore active storage • Design a pre-assembled processing module • Extend a shared-memory MapReduce system • Emulate the whole system on a real testbed 5/7/2012 8
  • 9. Background: Active Disks • Traditional Smart/Active Disks – On-board: Embedding a processor into the hard disk – Various Research Models • e.g. active disk, smart disk, IDISK, SmartSTOR, and etc. • However, “active disk” is not adopted by hardware vendors Improved attachment Cost of the System technologies I/O Bound Workloads Reliability 5/7/2012 9
  • 10. Background: Parallel Processing • Multi-core Processors or Multi-processors – 45% transistors increase 20% processing power • MapReduce: a Parallel Programming Model – MapReduce by Google – Hadoop, Mars, Phoenix, and etc. • Multicore and Shared-memory Parallel Processing 5/7/2012 10
  • 11. Design: System Overview Pipeline Parallel Processing Communication Mechanism Multicore and Shared-memory Parallel Processing Hybrid Storage Disks Design of an Active Storage 5/7/2012 11
  • 12. Design and Implementation • Computation Mechanism – Pre-assembled Processing Model – smartFAM • Extend the Shared-Memory MapReduce by Partitioning 5/7/2012 12
  • 13. Pre-assembled Processing Modules • Pre-assembled Processing Modules – Meet the nature of embedded services – Reduce Complexity and Cost – Provide Services • E.g. Multi-version antivirus service, Pre-process of data- intensive apps, De-duplication, and etc. • How to invoke services? 5/7/2012 13
  • 14. smartFAM • smartFAM = Smart File Alternation Monitor – Invokes the pre-assembled processing modules or functions by monitoring the changes of the system log file. • Two Components: – an inotify function: a Linux system function – a trigger daemon 5/7/2012 14
  • 15. Design and Implementation Active Node smartFAM Daemon Pre-assembled Modules inotify ... Host node 2 1 smartFAM Main Program Daemon Module Log Data- Log files General intensive & Result data functions function 3 inotify Merge Results NFS 5/7/2012 15
  • 16. Extend the Phoenix: A Shared-memory MapReduce Model • Extend the Phoenix MapReduce Programming Model by partitioning and merging – New API: partition_input – New Functions: • partition (provided by the new API) • merge (Develop by user) • Example: – wordcount [data-file][partition-size][] 5/7/2012 16
  • 18. Evaluation Environment • Testbed • Benchmarks – Word Count – String Match – Matrix Multiplication • Individual Node Performance • System Performance 5/7/2012 18
  • 19. Individual Node Performance Word Count (seconds) String Match (seconds) 1 GB 1.25 GB 1 GB 1.25 GB w/ Partition 40.60 50.91 17.76 20.61 w/o Partition 85.74 139.54 17.62 21.00 5/7/2012 19
  • 20. System Evaluation Matrix-Multiplication and Word-Count (Speedups) Input Data Size vs Single Machine vs Single-core Active vs McSD w/o Partition 500 MB 1.47 X 2.15 X 0.99 X 750 MB 1.45 X 2.09 X 1.04 X 1 GB 7.62 X 2.14 X 6.07 X 1.25 GB 19.01 X 2.50 X 15.39 X TConsumptionOfControlSample Speedup = TConsumptionOfMcSD 5/7/2012 20
  • 21. Summary • Active storage can improve system performance by offloading data-intensive computation • McSD is a promising active storage model with – Pre-assembled processing modules – Parallel data processing – Better performance in our evaluation
  • 22. About the Active Storage • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage
  • 23. Apply Active Storage to a Cluster • So far, we know the potential of Active Storage • Challenge: How to coordinate active storage nodes with computing nodes? • Propose a Pipeline-parallel Processing pattern
  • 24. Contributions • Propose a pipeline-parallel processing framework to “connect” an Active Storage node with computing nodes. • Evaluate the framework using both an analytic model and a real implementation. • Case Study: Extend an existing bioinformatics application based on the framework.
  • 25. Background: Active Storage [Diagram: processor, memory, and mass storage, with the link between them labeled “Bridge?”; an Active Storage Node containing computation, a buffer, SSDs, and disks]
  • 26. Background: Bioinformatics App • BLAST*: Basic Local Alignment Search Tool – Compares primary biological sequence information • mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST. – Format raw data files – Run a parallel BLAST function *http://blast.ncbi.nlm.nih.gov/ **http://www.mpiblast.org/
  • 27. Pipeline-parallel Design • Offload the raw-data formatting task to where the data is stored. • Intra-application pipeline-parallel processing by “partition” and “merge”. • pp-mpiBlast, a case study.
  • 28. Pipelining Workflow [Diagram: on the Active Storage Node, the raw input file is split into partitions 1 to n and FormatDB turns each partition into an intermediate file; on the Computing Nodes, mpiBlast turns intermediates 1 to n into sub-outputs 1 to n, and Merge combines them into the output file. Stages: Partition, FormatDB, mpiBlast, Merge; two of the stages are marked “(n-1) times”]
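The two-stage workflow on this slide can be sketched as a producer and a consumer that overlap in time. This is a simplified illustration, not the pp-mpiBlast code: threads and an in-process queue stand in for the active storage node, the computing nodes, and the network between them, and `format_stage` and `search_stage` are hypothetical placeholders for FormatDB and mpiBlast.

```python
import queue
import threading


def run_pipeline(partitions, format_stage, search_stage):
    """Run a two-stage pipeline and return the sub-outputs in input order."""
    handoff = queue.Queue()  # intermediates passed from stage 1 to stage 2
    done = object()          # sentinel marking the end of the stream
    sub_outputs = []

    def active_storage_node():
        # Stage 1: format each partition and hand it off immediately,
        # so stage 2 can start before all partitions are formatted.
        for part in partitions:
            handoff.put(format_stage(part))
        handoff.put(done)

    def computing_nodes():
        # Stage 2: process intermediates as they arrive.
        while True:
            item = handoff.get()
            if item is done:
                break
            sub_outputs.append(search_stage(item))

    producer = threading.Thread(target=active_storage_node)
    consumer = threading.Thread(target=computing_nodes)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sub_outputs


def merge(sub_outputs):
    """Concatenate the per-partition sub-outputs into one output list."""
    merged = []
    for sub in sub_outputs:
        merged.extend(sub)
    return merged
```

While the second stage works on partition i, the first stage is already formatting partition i+1, which is where the overlap in the pipelined design comes from.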
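  • 29. Analytic Model • Three Critical Measures: $T_{\text{response}} = T_{\text{active}} + T_{\text{compute}}$ • $\text{Throughput} = \dfrac{1}{\max(T_{\text{active}}, T_{\text{compute}})}$ • $\text{Speedup} = \dfrac{T_{\text{sequence}}}{T_{\text{pipelined}}} = \dfrac{n \times (T_{\text{active}} + T_{\text{compute}})}{T_{\text{active}} + (n-1) \times \max(T_{\text{active}}, T_{\text{compute}}) + T_{\text{compute}}} = \dfrac{n}{1 + \dfrac{n-1}{\text{Throughput} \times T_{\text{response}}}}$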
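As a quick check of the analytic model, the three measures can be computed directly. The sketch below assumes the reading of the slide in which response time is the sum of the two stage times, throughput is the reciprocal of the slower stage, and the pipelined run of n partitions costs one full pass plus (n-1) repetitions of the slower stage; the function names are chosen here for illustration.

```python
def response_time(t_active, t_compute):
    """Time for one partition to pass through both stages."""
    return t_active + t_compute


def throughput(t_active, t_compute):
    """Partitions completed per unit time once the pipeline is full."""
    return 1.0 / max(t_active, t_compute)


def speedup(n, t_active, t_compute):
    """Sequential time divided by pipelined time for n partitions."""
    t_sequence = n * (t_active + t_compute)
    t_pipelined = t_active + (n - 1) * max(t_active, t_compute) + t_compute
    return t_sequence / t_pipelined


def speedup_closed_form(n, t_active, t_compute):
    """Same speedup, expressed through throughput and response time."""
    tp = throughput(t_active, t_compute)
    tr = response_time(t_active, t_compute)
    return n / (1 + (n - 1) / (tp * tr))
```

With balanced stages (equal stage times) the speedup approaches 2 as n grows, the upper bound for a two-stage pipeline; with a single partition there is nothing to overlap and the speedup is 1.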
  • 30. Evaluation Environment • Node configuration (computing nodes / active storage node) – CPU: Intel Xeon X3430 / Intel Core 2 Q9400 – Memory: 2 GB DDR3 (PC3-10600) – OS: Ubuntu 9.04 Jaunty Jackalope, 32-bit version – Kernel: 2.6.28-15-generic – Network: Gigabit LAN • Our testbed – “Pipeline-parallel”: 12 Computing Nodes, 1 Active Storage Node • Opposite testbeds – “12-node Cluster”: 12 Computing Nodes, 1 Storage Node – “13-node Cluster”: 13 Computing Nodes, 1 Storage Node
  • 31. Pipeline-parallel Design [Figures: Results compared with the 12-node system; Results compared with the 13-node system]
  • 32. Speedup Trends: Partition Size
  • 33. Summary • We proposed a pipeline-parallel processing mechanism to apply an Active Storage Node. • As a case study, we extended a classic bioinformatics application based on the pipeline-parallel style.
  • 34. About the Active Storage • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage
  • 35. What’s Hybrid? • A hybrid combination of a gas-powered engine and an electric engine • Efficiency
  • 36. Hybrid Disk Drives • A hybrid combination of two types of storage devices: HDD and SSD – HDD: magnetic hard disk drive – SSD: solid state disk, built from NAND-based flash memory • What are their roles?
  • 37. Motivation • In a hybrid storage system, using SSDs as the buffer can boost performance. • However, SSDs suffer from reliability issues. • WordCount on Intel Core2 Duo E8400 (seconds), by input data size: – Storage HDD, buffer HDD: 500 MB 21.51, 750 MB 38.30, 1 GB 505.25, 1.25 GB 1294.64 – Storage HDD, buffer SSD: 500 MB 19.89, 750 MB 36.41, 1 GB 85.74, 1.25 GB 139.54
  • 38. Limitations Related to SSDs • Flash Memory: – Each block consists of 32, 64, or 128 pages. – Each page is typically 512, 2,048, or 4,096 bytes. • “Erase-before-write” at block level. • Lifespan is 10,000 Program/Erase cycles. – E.g.,* an 80 GB MLC SSD can last only 106 days if the write rate is 30 MB/s. • Rethink their roles? *Based on the SSD lifespan calculator provided by Virident.com
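To get a feel for the lifespan figure, a back-of-the-envelope bound can be computed from capacity, Program/Erase cycles, and write rate. This is an idealized sketch and not the vendor calculator cited on the slide: it assumes perfect wear leveling, decimal units (1 GB = 10^9 bytes), and a `write_amplification` parameter that is introduced here as an assumption and defaults to 1.0.

```python
SECONDS_PER_DAY = 86_400


def ideal_lifespan_days(capacity_bytes, pe_cycles, write_rate_bytes_per_s,
                        write_amplification=1.0):
    """Upper-bound lifespan in days under continuous writing.

    Assumes every block is worn evenly, so the drive can absorb
    capacity * pe_cycles bytes of physical writes in total.
    """
    total_writable = capacity_bytes * pe_cycles / write_amplification
    return total_writable / write_rate_bytes_per_s / SECONDS_PER_DAY


# The slide's example: 80 GB MLC SSD, 10,000 P/E cycles, 30 MB/s writes.
days = ideal_lifespan_days(80e9, 10_000, 30e6)
print(f"ideal upper bound: {days:.1f} days")
```

The idealized bound comes out at about 309 days, which is longer than the 106 days quoted on the slide. The cited calculator presumably accounts for overheads that this sketch ignores, but the slide does not state which ones.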
  • 39. Contributions • Hybrid Combination of HDD and SSD disks • De-duplication Service using HDDs as a Write Buffer • Internal-parallel Processing in SSD • Simulation of the Whole System For Evaluation
  • 40. Hybrid Disk Configuration [Diagram: I/O requests reach a dedicated processor; data of write requests goes to the HDD for de-duplication, and the deduplicated data moves on to the SSD; read requests and pre-processing are served from the SSD, which returns pre-processed data]
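The de-duplicating write buffer in this configuration can be illustrated with a content-fingerprint sketch. It is not the HcDD algorithm: in-memory dictionaries stand in for the HDD buffer and the SSD, SHA-256 fingerprinting is a common choice assumed here, and the class and method names are hypothetical.

```python
import hashlib


class DedupWriteBuffer:
    """Forward each distinct chunk of data only once."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> data (stands in for the SSD)
        self.address_map = {}   # logical address -> fingerprint
        self.writes_removed = 0

    def write(self, address, data):
        """Record a write; return True if the data had to be stored."""
        fingerprint = hashlib.sha256(data).hexdigest()
        self.address_map[address] = fingerprint
        if fingerprint in self.chunks:
            self.writes_removed += 1  # duplicate content: no new write
            return False
        self.chunks[fingerprint] = data
        return True

    def read(self, address):
        """Return the data last written to `address`."""
        return self.chunks[self.address_map[address]]
```

Every duplicate caught in the buffer is one write the flash never sees, so the benefit grows with the share of duplicate writes in the workload.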
  • 43. Internal Parallel Processing [Diagram: an SDRAM cache holds eight request lists, List #0 to List #7, with requests Req 1 to Req 24 spread across them; the lists feed eight packages, Package #0 to Package #7]
  • 45. Internal Parallelism Evaluation: Single Node
  • 46. Single Node: Dedup Ratio
  • 50. Conclusion • McSD: A Smart Disk Model • pp-mpiBlast: How to deploy Active Storage? • HcDD: Hybrid Disk for Active Storage
  • 52. Many Thanks! And Questions?

Editor's Notes

  • #3: Organization: 1. Motivation in Summary: Active Storage, Parallel Processing, Hybrid Storage 2. McSD 3. pp-mpiBlast 4. HcDD 5. Summary
  • #7: Aesop’s fable: The Tortoise and the Hare. There is a speed gap, and the fast runner waits for the slower one. Over the last several decades, CPU performance has increased rapidly, while I/O performance has improved relatively slowly. As a result, the gap between CPU performance and I/O bandwidth has continually grown. Especially for data-intensive computing workloads, I/O bottlenecks often cause low CPU utilization.
  • #28: BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • #29: Pipeline patterns can be further subdivided into inter- and intra-application pipeline processing. pp-mpiBlast uses intra-application pipeline processing: as the prefix ‘intra-’ suggests, one natively sequential transaction is partitioned into multiple parallel pipelined transactions. System performance is improved by fully exploiting this parallelism.
  • #30: The pipeline pattern not only improves performance by exploiting parallelism, but can also solve the out-of-core processing issue, in which the required amount of data is too large to fit in the ASN’s main memory. In pp-mpiBlast, the partition function is implemented within the mpiformatdb function running on the ASN. The merge function is a separate one running on the front node of the cluster.
  • #31: Response time, speedup, and throughput are three critical performance measures for the pipelined BLAST. Denoting T1 and T2 as the execution times associated with the first stage and second stage in the pipeline, we can calculate the response time Tresponse for processing each input data set as the sum of T1 and T2.
  • #40: One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random-access fashion, it can only be erased a "block" at a time. Erasing generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, it can be changed back to 1 only by erasing the entire block. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. Based on the SSD lifetime calculator provided by the Virident website [36], the lifetime of a 200 GB MLC-based SSD could be only 160 days if the write rate is 50 MB/s.
  • #49: The performance depends on the number of writes we removed. In a real-world implementation: (1) conservative comparison: no optimization, writes are treated as synchronous; (2) a log file system reduces the seek and rotational delays of the HDD; (3) asynchronous writes: from the user's perspective, the delay is not obvious (i.e., it can be omitted).