Mario Held mario.held@de.ibm.com
Martin Kammerer martin.kammerer@de.ibm.com
07/07/2010




 Linux on System z disk I/O performance




         visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html

                                                                                    © 2010 IBM Corporation




Agenda

 Linux
     –   file system
     –   special functions
     –   logical volumes
     –   monitoring
     –   High performance FICON
 Storage server
     – internal structure
     – recommended usage and configuration
 Host
     –   Linux I/O stack
     –   sample I/O attachment
     –   FICON/ECKD and FCP/SCSI I/O setups
     –   possible bottlenecks
     –   recommendations







Linux file system

 Use ext3 instead of reiserfs
 Tune your ext3 file system
     – Select the appropriate journaling mode (journal, ordered, writeback)
     – Consider turning off atime (see the mount sketch after this list)
     – Consider upcoming ext4
 Temporary files
     – Don't place them on journaling file systems
     – Consider a ram disk instead of a disk device
     – Consider tmpfs for temporary files
 If possible, use direct I/O to avoid double buffering of files in memory
 If possible, use async I/O to continue program execution while data is fetched
     – Important for read operations
     – Applications usually do not depend on write completions
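A minimal sketch of the noatime and tmpfs suggestions above; the device name /dev/dasdb1, the mount points and the tmpfs size are examples only and need to be adapted:

    # Remount an existing ext3 file system without access-time updates
    mount -o remount,noatime /data

    # Persistent variant in /etc/fstab (hypothetical entries)
    # /dev/dasdb1   /data      ext3    defaults,noatime        0 0
    # tmpfs         /srv/tmp   tmpfs   size=512m,mode=1777     0 0

    # Mount a tmpfs for temporary files right away
    mount -t tmpfs -o size=512m,mode=1777 tmpfs /srv/tmp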








I/O options

 Direct I/O (DIO)
     – Transfer the data directly from the application buffers to the device driver, avoids copying the
       data to the page cache
     – Advantages:
         • Saves page cache memory and avoids caching the same data twice
         • Allows larger buffer pools for databases
      – Disadvantages:
          • Make sure that no utility is working on the same data through the file system (page cache) --> danger of
            data corruption
         • The size of the I/O requests may be smaller
 Asynchronous I/O (AIO)
     – The application is not blocked for the time of the I/O operation
     – It resumes its processing and gets notified when the I/O is completed.
     – Advantage
          • the issuer of a read/write operation is no longer waiting until the request finishes.
          • reduces the number of I/O processes (saves memory and CPU)
 Recommendation is to use both (see the dd sketch after this list)
     – Database benchmark workloads improved throughput by 50%
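Applications normally request direct and asynchronous I/O through open() with O_DIRECT and the AIO interfaces (for example libaio). As a quick command-line illustration of the direct I/O effect, dd can bypass the page cache with its direct flags; the file name and sizes are examples only:

    # Write 1 GiB bypassing the page cache (O_DIRECT)
    dd if=/dev/zero of=/data/testfile bs=1M count=1024 oflag=direct

    # Read it back once with direct I/O and once buffered, and compare the timings
    dd if=/data/testfile of=/dev/null bs=1M iflag=direct
    dd if=/data/testfile of=/dev/null bs=1M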




I/O schedulers (1)

 Four different I/O schedulers are available
     – noop scheduler
       only request merging
     – deadline scheduler
       avoids read request starvation
     – anticipatory scheduler (as scheduler)
       designed for the usage with physical disks, not intended for storage subsystems
      – completely fair queuing scheduler (cfq scheduler)
       all users of a particular drive would be able to execute about the same number of I/O requests
       over a given time.
 The default in current distributions is deadline
      – Don't change the setting to 'as' on Linux on System z; the throughput would be impacted
        significantly








HowTo

 How to identify which I/O scheduler is used
      – Search (grep) the file /var/log/boot.msg for the phrase 'io scheduler'; the matching line names the
        scheduler in use, for example:
         • <4>Using deadline io scheduler
      – The active scheduler can also be checked at run time through sysfs (see the sketch after this list)
 How to select the scheduler
     – The I/O scheduler is selected with the boot parameter elevator in zipl.conf
         •   [ipl]
         •   target = /boot/zipl
         •   image = /boot/image
         •   ramdisk = /boot/initrd
          •   parameters = "maxcpus=8 dasd=5000 root=/dev/dasda1 elevator=deadline"
         • where elevator ::= as | deadline | cfq | noop
 For more details see /usr/src/linux/Documentation/kernel-parameters.txt.
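A minimal sketch of the sysfs interface for checking and changing the scheduler per block device at run time; the device name dasda is an example, and a change made this way is not persistent across reboots (unlike the zipl.conf parameter):

    # Show the available schedulers; the active one is shown in brackets
    cat /sys/block/dasda/queue/scheduler

    # Switch this device to the deadline scheduler
    echo deadline > /sys/block/dasda/queue/scheduler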








Logical volumes

 Linear logical volumes allow an easy extension of the file system
 Striped logical volumes (see the lvcreate sketch after this list)
     – Provide the capability to perform simultaneous I/O on different stripes
     – Allow load balancing
     – Are also extendable
 Don't use logical volumes for “/” or “/usr”
     – If the logical volume gets corrupted your system is gone
 Logical volumes require more CPU cycles than physical disks
     – Consumption increases with the number of physical disks used for the logical volume
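A minimal sketch of creating a striped logical volume; the volume group name, the DASD partitions (ideally located on different ranks), the stripe count and the stripe size are examples and must be adapted to the actual configuration:

    # Prepare four DASD partitions as physical volumes and build a volume group
    pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
    vgcreate vgdata /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

    # Striped logical volume: 4 stripes, 64 KiB stripe size, 100 GiB
    lvcreate -i 4 -I 64 -L 100G -n lvdata vgdata

    # Create a file system and mount it
    mkfs.ext3 /dev/vgdata/lvdata
    mount /dev/vgdata/lvdata /data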








Monitoring disk I/O
 Columns and what they show: rrqm/s, wrqm/s – merged requests; r/s, w/s – I/Os per second;
  rMB/s, wMB/s – throughput; avgrq-sz – request size; avgqu-sz – queue length;
  await, svctm – service times; %util – utilization

 Output from iostat -dmx 2 /dev/sda (one sample line per workload):

    Device:   rrqm/s    wrqm/s      r/s      w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
    sda         0.00  10800.50     6.00  1370.00    0.02  46.70    69.54     7.03   5.11   0.41  56.50   sequential write
    sda         0.00  19089.50    21.50   495.00    0.08  76.59   304.02     6.35  12.31   1.85  95.50   sequential write
    sda         0.00  15610.00   259.50  1805.50    1.01  68.14    68.58    26.07  12.23   0.48 100.00   random write
    sda      1239.50      0.00   538.00     0.00  115.58   0.00   439.99     4.96   9.23   1.77  95.00   sequential read
    sda       227.00      0.00  2452.00     0.00   94.26   0.00    78.73     1.76   0.72   0.32  78.50   random read





High Performance FICON

 Advantages:
     – Provides a simpler link protocol than FICON
     – Is capable of sending multiple channel commands to the control unit in a single entity
 Performance expectation:
     – Should improve performance of database OLTP workloads
 Prerequisites:
     –   z10 GA2 or newer
     –   FICON Express2 or newer
     –   DS8000 R4.1 or newer
     –   DS8000 High Performance FICON feature
     –   A Linux dasd driver that supports read/write track data and HPF
            • Included in SLES11 and in future Linux distributions, service packs or updates








High Performance FICON results (1)

 For sequential write, throughput improves with read/write track data by up to 1.2x

   [Chart: throughput for sequential write [MB/s] by number of processes (1, 2, 4, 8, 16, 32, 64);
    series: old dasd I/O, new read/write track data, new HPF (includes read/write track data)]




High Performance FICON results (2)

 For sequential read, throughput improves with read/write track data by up to 1.6x

   [Chart: throughput for sequential read [MB/s] by number of processes (1, 2, 4, 8, 16, 32, 64);
    series: old dasd I/O, new read/write track data, new HPF (includes read/write track data)]




High Performance FICON results (3)

 With small record sizes, as used for example by databases, HPF improves throughput by up to 2.2x
  compared to the old Linux channel programs
   [Chart: throughput for random read [MB/s] by number of processes (1, 2, 4, 8, 16, 32, 64);
    series: old dasd I/O, new read/write track data, new HPF (includes read/write track data)]




DS8000 Disk setup

 Don't treat a storage server as a black box, understand its structure
 Principles apply to other storage vendor products as well
 Several carefully selected disks used together instead of one single disk can more than triple the
  sequential read/write performance. Use the logical volume manager to combine the disks.
 Avoid using consecutive disk addresses in a storage server (e.g. the addresses 5100, 5101,
   5102, ... in an IBM Storage Server), because
     – they use the same rank
     – they use the same device adapter.
 If you ask for 16 disks and your system administrator gives you addresses 5100-510F
     – From a performance perspective this is close to the worst case








DS8000 Architecture
 The structure is complex
   – disks are connected via two internal FCP switches for higher bandwidth
 The DS8000 is still divided into two parts, named processor complex or just server
   – caches are organized per server
 One device adapter pair addresses 4 array sites
 One array site is built from 8 disks
   – disks are distributed over the front and rear storage enclosures
 One RAID array is defined using one array site
 One rank is built using one RAID array
 Ranks are assigned to an extent pool
 Extent pools are assigned to one of the servers
   – this also assigns the caches
 The rules are the same as for the ESS
   – one disk range resides in one extent pool








Rules for selecting disks

 This makes it fast
     – Use as many paths as possible (CHPID -> host adapter)
     – Spread the host adapters used across all host adapter bays
         • For ECKD switching of the paths is done automatically
         • FCP needs a fixed relation between disk and path
            – Use Linux multipathing for load balancing
     –   Select disks from as many ranks as possible!
     –   Switch the rank for each new disk in a logical volume
     –   Switch the ranks used between servers and device adapters
     –   Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible
 Goal is to get a balanced load on all paths and physical disks
 In addition, striped Linux logical volumes and/or storage pool striping may help to increase the
  overall throughput (see the device setup sketch after this list)
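As an illustration of these rules, the sketch below brings four DASDs online whose device numbers were deliberately chosen from different ranks and device adapters and prepares them for a striped logical volume; all device numbers and names are hypothetical:

    # Devices 0.0.5100, 0.0.5207, 0.0.5311 and 0.0.5415 are assumed to sit on
    # four different ranks behind different device adapters
    chccwdev -e 0.0.5100,0.0.5207,0.0.5311,0.0.5415

    # Low-level format and partition them (dasdfmt erases all data!)
    for dev in dasdb dasdc dasdd dasde; do
        dasdfmt -b 4096 -y -f /dev/$dev
        fdasd -a /dev/$dev
    done

    # The resulting partitions /dev/dasd[b-e]1 can then be combined into a
    # striped logical volume as shown in the earlier lvcreate sketch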








Linux kernel components involved in disk I/O
 Application program – issues reads and writes
 Virtual File System (VFS) – dispatches requests to the different devices and translates to sector addressing
 Logical Volume Manager (LVM) – defines the physical to logical device relation
 Multipath – sets the multipath policies
 Device mapper (dm) – holds the generic mapping of all block devices and performs 1:n mapping
  for logical volumes and/or multipath
 Block device layer
 Page cache – contains all file I/O data; direct I/O bypasses the page cache
 I/O scheduler – merges, orders and queues requests and starts the device drivers via (un)plug device
 Device drivers – handle the data transfer:
   – Direct Access Storage Device (dasd) driver
   – Small Computer System Interface (SCSI) driver
   – z Fiber Channel Protocol (zFCP) driver
   – Queued Direct I/O (qdio) driver




Disk I/O attachment and storage subsystem

 Each disk is a configured volume in an extent pool (here extent pool = rank)
 An extent pool can be configured for either ECKD dasds or SCSI LUNs
 Host bus adapters connect internally to both servers
   [Diagram: channel paths 1–4 (FICON Express) connect through a switch to host bus adapters 1–4;
    each host bus adapter reaches server 0 and server 1, whose device adapters serve ranks 1–8,
    alternating ECKD and SCSI]




I/O processing characteristics

 FICON/ECKD:
     –   1:1 mapping host subchannel:dasd
     –   Serialization of I/Os per subchannel
     –   I/O request queue in Linux
     –   Disk blocks are 4KB
     –   High availability by FICON path groups
     –   Load balancing by FICON path groups and Parallel Access Volumes
 FCP/SCSI:
     –   Several I/Os can be issued against a LUN immediately
     –   Queuing in the FICON Express card and/or in the storage server
     –   Additional I/O request queue in Linux
     –   Disk blocks are 512 bytes
     –   High availability by Linux multipathing, type failover or multibus
     –   Load balancing by Linux multipathing, type multibus








General layout for FICON/ECKD

 The dasd driver starts the I/O on a subchannel
 Each subchannel connects to all channel paths in the path group
 Each channel connects via a switch to a host bus adapter
 A host bus adapter connects to both servers
 Each server connects to its ranks
   [Diagram: Linux I/O stack (application program, VFS, LVM, multipath, dm, block device layer,
    page cache, I/O scheduler, dasd driver); subchannels a–d use chpids 1–4 and a switch to reach
    host bus adapters 1–4, which connect via server 0 and server 1 and their device adapters to the
    ECKD ranks 1, 3, 5 and 7]





FICON/ECKD dasd I/O to a single disk

 Assume that subchannel a corresponds to disk 2 in rank 1
 The full choice of host adapters can be used
 Only one I/O can be issued at a time through subchannel a
 All other I/Os need to be queued in the dasd driver and in the block device layer until the
  subchannel is no longer busy with the preceding I/O
   [Diagram: as on the previous slide; the single subchannel a in front of chpids 1–4 is marked as
    the bottleneck for the disk in rank 1]





FICON/ECKD dasd I/O to a single disk with PAV (SLES10 / RHEL5)

 VFS sees one device
 The device mapper sees the real device and all alias devices
 Parallel Access Volumes solve the I/O queuing in front of the subchannel
 Each alias device uses its own subchannel
 Additional processor cycles are spent to do the load balancing in the device mapper
 The next slowdown is the fact that only one disk is used in the storage server. This implies the
  use of only one rank, one device adapter and one server
   [Diagram: the device mapper and dasd driver drive subchannels a–d in parallel to the same disk;
    the single rank/device adapter is now marked as the bottleneck]





FICON/ECKD dasd I/O to a single disk with HyperPAV (SLES11)

 VFS sees one device
 The dasd driver sees the real device and all alias devices
 Load balancing with HyperPAV and static PAV is done in the dasd driver. The aliases only need to
  be added to Linux (see the sketch after this slide). The load balancing works better than on the
  device mapper layer
 Fewer additional processor cycles are needed than with Linux multipath
 The next slowdown is the fact that only one disk is used in the storage server. This implies the
  use of only one rank, one device adapter and one server
   [Diagram: the dasd driver drives subchannels a–d to the real device and its aliases; the single
    rank/device adapter remains marked as the bottleneck]
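A minimal sketch of making HyperPAV aliases available to Linux; the device bus IDs are hypothetical, and the base device and its aliases must already be defined in the storage server (and, under z/VM, be attached to the guest):

    # Set the base device and its alias devices online
    chccwdev -e 0.0.7000
    chccwdev -e 0.0.70f0,0.0.70f1,0.0.70f2

    # List the DASDs; alias devices appear with status 'alias' and are used
    # automatically by the dasd driver for I/O to the base device
    lsdasd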





FICON/ECKD dasd I/O to a linear or striped logical volume

 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and
  overcome limitations from a single rank, a single device adapter or a single server
 To ensure that I/O to one physical disk is not limited by one subchannel, PAV or HyperPAV should
  be used in combination with logical volumes
   [Diagram: the device mapper spreads the logical volume over disks in ranks 1, 3, 5 and 7,
    reached via subchannels a–d, chpids 1–4 and host bus adapters 1–4]





FICON/ECKD dasd I/O to a storage pool striped volume with
HyperPAV (SLES11)

 A storage pool striped volume only makes sense in combination with PAV or HyperPAV
 VFS sees one device
 The dasd driver sees the real device and all alias devices
 The storage pool striped volume spans several ranks of one server and overcomes the limitations
  of a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume to use both
  server sides
 To ensure that I/O to one dasd is not limited by one subchannel, PAV or HyperPAV should be used
   [Diagram: an extent pool on server 0 stripes the volume over ranks 1, 3 and 5; the device
    adapter is marked as the remaining bottleneck]





General layout for FCP/SCSI

 The SCSI driver finalizes the I/O requests
 The zFCP driver adds the FCP protocol to the requests (a LUN configuration sketch follows below)
 The qdio driver transfers the I/O to the channel
 A host bus adapter connects to both servers
 Each server connects to its ranks
   [Diagram: Linux I/O stack (application program, VFS, LVM, multipath, dm, block device layer,
    page cache, I/O scheduler, SCSI/zFCP/qdio drivers); chpids 1–4 connect through a switch to
    host bus adapters 1–4, which reach the SCSI ranks 2, 4, 6 and 8 via server 0 and server 1]
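A minimal sketch of how a SCSI LUN is typically configured through the zfcp driver on the distributions discussed here (SLES10/11, RHEL5); the FCP device bus ID, WWPN and LUN are hypothetical, and on newer kernels the remote ports are discovered automatically, so the port_add step may be unnecessary:

    # Set the FCP subchannel (the virtual HBA) online
    chccwdev -e 0.0.f100

    # Register the remote storage port and the LUN with the zfcp driver
    echo 0x500507630300c562 > /sys/bus/ccw/drivers/zfcp/0.0.f100/port_add
    echo 0x4010400000000000 > /sys/bus/ccw/drivers/zfcp/0.0.f100/0x500507630300c562/unit_add

    # The LUN then shows up as a SCSI disk, e.g. /dev/sda (check with lsscsi)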




FCP/SCSI LUN I/O to a single disk

 Assume that disk 3 in rank 8 is reachable via channel 2 and host bus adapter 2
 Up to 32 (default value) I/O requests can be sent out to disk 3 before the first completion is required
 The throughput will be limited by the rank and/or the device adapter
 There is no high availability provided for the connection between the host and the storage server
   [Diagram: a single path – chpid 2, switch, host bus adapter 2 – leads to the disk in rank 8;
    both the single path and the single rank are marked as bottlenecks]




FCP/SCSI LUN I/O to a single disk with multipathing

 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
  (a multipath.conf sketch follows after this slide)
 Administrative effort is required to define all paths to one disk
 Additional processor cycles are spent in the device mapper to map the I/O to the desired path
  for the disk
 Multibus requires more additional processor cycles than failover
   [Diagram: the device mapper reaches the same disk in rank 6 over chpids 1–4 and host bus
    adapters 1–4; the single rank remains marked as the bottleneck]
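A minimal sketch of the relevant part of /etc/multipath.conf; illustration only – suitable device-specific settings for a given storage server usually come with the distribution's multipath-tools package:

    # /etc/multipath.conf (excerpt)
    defaults {
        user_friendly_names yes
        # multibus spreads I/O over all paths; failover switches paths only on error
        path_grouping_policy multibus
    }

After changing the file, reload the configuration and inspect the resulting path groups, for example with:

    /etc/init.d/multipathd restart
    multipath -ll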




FCP/SCSI LUN I/O to a linear or striped logical volume

 VFS sees one device (logical volume)
 The device mapper sees the logical volume and the physical volumes
 Additional processor cycles are spent to map the I/Os to the physical volumes
 Striped logical volumes require more additional processor cycles than linear logical volumes
 With a striped logical volume the I/Os can be well balanced over the entire storage server and
  overcome limitations from a single rank, a single device adapter or a single server
 To ensure high availability the logical volume should be used in combination with multipathing
   [Diagram: the device mapper spreads the logical volume over SCSI disks in ranks 2, 4, 6 and 8,
    reached via chpids 1–4 and host bus adapters 1–4]




FCP/SCSI LUN I/O to a storage pool striped volume with
multipathing

 Storage pool striped volumes make no sense without high availability
 VFS sees one device
 The device mapper sees the multibus or failover alternatives to the same disk
 The storage pool striped volume spans several ranks of one server and overcomes the limitations
  of a single rank and/or a single device adapter
 Storage pool striped volumes can also be used as physical disks for a logical volume to make use
  of both server sides
   [Diagram: multipathing reaches the storage pool striped volume in an extent pool on server 1
    over chpids 1–4 and host bus adapters 1–4; the device adapter remains marked as the bottleneck]




Summary, recommendations and outlook

 FICON/ECKD
     –   Storage pool striped disks (no disk placement)
     –   HyperPAV (SLES11)
     –   Large volume (RHEL5, SLES11)
     –   High Performance FICON (SLES11)
 FCP/SCSI
     – Linux LV with striping (disk placement)
     – Multipath with failover







Trademarks



IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.

Other product and service names might be trademarks of IBM or other companies.






More Related Content

PDF
Reliability, Availability and Serviceability on Linux
PDF
Linux on System z – disk I/O performance
PDF
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
PPT
Unit 1 four part pocessor and memory
PDF
DB2 for z/OS - Starter's guide to memory monitoring and control
PDF
IBM Flex System p24L, p260 and p460 Compute Nodes
PPT
Lec10. Memory and storage
Reliability, Availability and Serviceability on Linux
Linux on System z – disk I/O performance
Enhanced Embedded Linux Board Support Package Field Upgrade – A Cost Effectiv...
Unit 1 four part pocessor and memory
DB2 for z/OS - Starter's guide to memory monitoring and control
IBM Flex System p24L, p260 and p460 Compute Nodes
Lec10. Memory and storage

What's hot (15)

PDF
03unixintro2
PPT
Ch14 system administration
PPTX
Cpu spec
PDF
DB2 Accounting Reporting
PDF
Computer Hardware & Software Lab Manual 3
PDF
06threadsimp
PPTX
What is Bootloader???
PDF
01intro
PDF
Intel Roadmap 2010
PPT
PPT
PDF
03 Hadoop
PPT
Ch18 system administration
PDF
Router internals
03unixintro2
Ch14 system administration
Cpu spec
DB2 Accounting Reporting
Computer Hardware & Software Lab Manual 3
06threadsimp
What is Bootloader???
01intro
Intel Roadmap 2010
03 Hadoop
Ch18 system administration
Router internals
Ad

Viewers also liked (20)

PPT
Impacto de las tic en el aula
PDF
Rubrica del proyecto tarea 5
PPT
PPT
Presentación curso itsm cap4
PPT
Presentación curso itsm cap8
PPT
Presentación curso itsm cap11
PPT
Presentación curso itsm cap10
PPTX
IBM FlashSystems A9000/R presentation
PDF
IBM Flash System® Family webinar 9 de noviembre de 2016
PPTX
Storage Spectrum and Cloud deck late 2016
PDF
XIV Storage deck final
PPTX
FlashSystem February 2017
PPTX
Storwize SVC presentation February 2017
PPT
Masters stretched svc-cluster-2012-04-13 v2
PPT
Ds8000 Practical Performance Analysis P04 20060718
PDF
Xiv overview
PDF
Xiv svc best practices - march 2013
PPT
IBM SAN Volume Controller Performance Analysis
PDF
IBM XIV Gen3 Storage System
PPT
Ibm flash system v9000 technical deep dive workshop
Impacto de las tic en el aula
Rubrica del proyecto tarea 5
Presentación curso itsm cap4
Presentación curso itsm cap8
Presentación curso itsm cap11
Presentación curso itsm cap10
IBM FlashSystems A9000/R presentation
IBM Flash System® Family webinar 9 de noviembre de 2016
Storage Spectrum and Cloud deck late 2016
XIV Storage deck final
FlashSystem February 2017
Storwize SVC presentation February 2017
Masters stretched svc-cluster-2012-04-13 v2
Ds8000 Practical Performance Analysis P04 20060718
Xiv overview
Xiv svc best practices - march 2013
IBM SAN Volume Controller Performance Analysis
IBM XIV Gen3 Storage System
Ibm flash system v9000 technical deep dive workshop
Ad

Similar to Linux on System z disk I/O performance (20)

PPTX
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
PDF
Filesystem Performance from a Database Perspective
PPTX
UKOUG, Lies, Damn Lies and I/O Statistics
PPTX
9_Storage_Devices.pptx
PPTX
19th Session.pptx
PDF
Database performance tuning for SSD based storage
PDF
SCSI over FCP for Linux on System z
PDF
SSD based storage tuning for databases
PPTX
9_Storage_Devices.pptx
PDF
PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem...
PDF
Linux on System z Update: Current & Future Linux on System z Technology
PPTX
Optimizing your Infrastrucure and Operating System for Hadoop
PDF
Measuring Firebird Disk I/O
ODP
Apache con 2013-hadoop
PPTX
IO Dubi Lebel
PDF
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
PPT
Design an I/O system
PDF
Introduction to SCSI over FCP for Linux on System z
PDF
Researching postgresql
PPT
SQL 2005 Disk IO Performance
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Filesystem Performance from a Database Perspective
UKOUG, Lies, Damn Lies and I/O Statistics
9_Storage_Devices.pptx
19th Session.pptx
Database performance tuning for SSD based storage
SCSI over FCP for Linux on System z
SSD based storage tuning for databases
9_Storage_Devices.pptx
PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem...
Linux on System z Update: Current & Future Linux on System z Technology
Optimizing your Infrastrucure and Operating System for Hadoop
Measuring Firebird Disk I/O
Apache con 2013-hadoop
IO Dubi Lebel
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Design an I/O system
Introduction to SCSI over FCP for Linux on System z
Researching postgresql
SQL 2005 Disk IO Performance

More from IBM India Smarter Computing (20)

PDF
Using the IBM XIV Storage System in OpenStack Cloud Environments
PDF
All-flash Needs End to End Storage Efficiency
PDF
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
PDF
IBM FlashSystem 840 Product Guide
PDF
IBM System x3250 M5
PDF
IBM NeXtScale nx360 M4
PDF
IBM System x3650 M4 HD
PDF
IBM System x3300 M4
PDF
IBM System x iDataPlex dx360 M4
PDF
IBM System x3500 M4
PDF
IBM System x3550 M4
PDF
IBM System x3650 M4
PDF
IBM System x3500 M3
PDF
IBM System x3400 M3
PDF
IBM System x3250 M3
PDF
IBM System x3200 M3
PDF
IBM PowerVC Introduction and Configuration
PDF
A Comparison of PowerVM and Vmware Virtualization Performance
PDF
IBM pureflex system and vmware vcloud enterprise suite reference architecture
PDF
X6: The sixth generation of EXA Technology
Using the IBM XIV Storage System in OpenStack Cloud Environments
All-flash Needs End to End Storage Efficiency
TSL03104USEN Exploring VMware vSphere Storage API for Array Integration on th...
IBM FlashSystem 840 Product Guide
IBM System x3250 M5
IBM NeXtScale nx360 M4
IBM System x3650 M4 HD
IBM System x3300 M4
IBM System x iDataPlex dx360 M4
IBM System x3500 M4
IBM System x3550 M4
IBM System x3650 M4
IBM System x3500 M3
IBM System x3400 M3
IBM System x3250 M3
IBM System x3200 M3
IBM PowerVC Introduction and Configuration
A Comparison of PowerVM and Vmware Virtualization Performance
IBM pureflex system and vmware vcloud enterprise suite reference architecture
X6: The sixth generation of EXA Technology

Linux on System z disk I/O performance

  • 6. HowTo
       How to identify which I/O scheduler is used
         – Searching (grep) in the file /var/log/boot.msg for the phrase 'io scheduler' will find a line like
             • <4>Using deadline io scheduler
           which names the scheduler in use
       How to select the scheduler
         – The I/O scheduler is selected with the boot parameter elevator in zipl.conf:
             [ipl]
             target = /boot/zipl
             image = /boot/image
             ramdisk = /boot/initrd
             parameters = "maxcpus=8 dasd=5000 root=/dev/dasda1 elevator=deadline"
           where elevator ::= as | deadline | cfq | noop
       For more details see /usr/src/linux/Documentation/kernel-parameters.txt.
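      On recent kernels the scheduler can also be checked and changed at run time through sysfs. A minimal sketch, assuming a dasd device named dasda (adapt the device name to your setup); a change made this way is not persistent across reboots:

          # show the available schedulers for /dev/dasda; the active one is shown in brackets
          cat /sys/block/dasda/queue/scheduler

          # switch this device to the deadline scheduler at run time (non-persistent)
          echo deadline > /sys/block/dasda/queue/scheduler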
  • 7. Logical volumes
       Linear logical volumes allow an easy extension of the file system
       Striped logical volumes
         – Provide the capability to perform simultaneous I/O on different stripes
         – Allow load balancing
         – Are also extendable
       Don't use logical volumes for “/” or “/usr”
         – If the logical volume gets corrupted your system is gone
       Logical volumes require more CPU cycles than physical disks
         – Consumption increases with the number of physical disks used for the logical volume
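      As an illustration only; the device names, stripe count and stripe size below are assumptions and must be adapted to the actual disk and rank layout. A striped logical volume could be set up roughly like this:

          # initialize four dasd partitions (ideally chosen from different ranks) as physical volumes
          pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

          # create a volume group and a logical volume striped across all four disks
          vgcreate vgdata /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
          lvcreate -i 4 -I 64 -L 20G -n lvdata vgdata    # 4 stripes, 64 KiB stripe size

          # create an ext3 file system on the new logical volume
          mkfs.ext3 /dev/vgdata/lvdata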
  • 8. Monitoring disk I/O
       Output from iostat -dmx 2 /dev/sda
       The columns show merged requests (rrqm/s, wrqm/s), I/Os per second (r/s, w/s), throughput (rMB/s, wMB/s),
       request size (avgrq-sz), queue length (avgqu-sz), service times (await, svctm) and utilization (%util)

       Device:  rrqm/s    wrqm/s     r/s      w/s    rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
       sda        0.00  10800.50    6.00  1370.00    0.02   46.70     69.54      7.03   5.11   0.41   56.50   sequential write
       sda        0.00  19089.50   21.50   495.00    0.08   76.59    304.02      6.35  12.31   1.85   95.50   sequential write
       sda        0.00  15610.00  259.50  1805.50    1.01   68.14     68.58     26.07  12.23   0.48  100.00   random write
       sda     1239.50      0.00  538.00     0.00  115.58    0.00    439.99      4.96   9.23   1.77   95.00   sequential read
       sda      227.00      0.00 2452.00     0.00   94.26    0.00     78.73      1.76   0.72   0.32   78.50   random read
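      As a reading aid (not part of the original measurements): avgrq-sz is reported in 512-byte sectors, so the average request size in KiB is avgrq-sz * 512 / 1024. A small sketch of deriving it on the fly; the column position ($8) matches the sysstat output format of that time and is an assumption for other versions:

          # print the average request size in KiB for each sample of /dev/sda
          iostat -dx 2 /dev/sda | awk '$1 == "sda" { printf "avg request size: %.1f KiB\n", $8 * 512 / 1024 }'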
  • 9. High Performance FICON
       Advantages:
         – Provides a simpler link protocol than FICON
         – Is capable of sending multiple channel commands to the control unit in a single entity
       Performance expectation:
         – Should improve performance of database OLTP workloads
       Prerequisites:
         – z10 GA2 or newer
         – FICON Express2 or newer
         – DS8000 R4.1 or newer
         – DS8000 High Performance FICON feature
         – A Linux dasd driver that supports read/write track data and HPF
             • Included in SLES11 and future Linux distributions, service packs or updates
  • 10. High Performance FICON results (1)
       For sequential write, throughput improves with read/write track data by up to 1.2x
       [Chart: throughput for sequential write in MB/s versus number of processes (1, 2, 4, 8, 16, 32, 64),
        comparing old dasd I/O, new read/write track data, and new HPF (includes read/write track data)]
  • 11. High Performance FICON results (2)
       For sequential read, throughput improves with read/write track data by up to 1.6x
       [Chart: throughput for sequential read in MB/s versus number of processes (1, 2, 4, 8, 16, 32, 64),
        comparing old dasd I/O, new read/write track data, and new HPF (includes read/write track data)]
  • 12. High Performance FICON results (3)
       With small record sizes, as used for example by databases, HPF improves throughput by up to 2.2x
       compared to the old Linux channel programs
       [Chart: throughput for random reader in MB/s versus number of processes (1, 2, 4, 8, 16, 32, 64),
        comparing old dasd I/O, new read/write track data, and new HPF (includes read/write track data)]
  • 13. DS8000 Disk setup
       Don't treat a storage server as a black box; understand its structure
       The principles apply to other storage vendors' products as well
       Several conveniently selected disks instead of one single disk can more than triple sequential
       read/write performance. Use the logical volume manager to set up the disks.
       Avoid using consecutive disk addresses in a storage server (e.g. the addresses 5100, 5101, 5102, ...
       in an IBM storage server), because
         – they use the same rank
         – they use the same device adapter
       If you ask for 16 disks and your system administrator gives you addresses 5100-510F
         – From a performance perspective this is close to the worst case
  • 14. DS8000 Architecture
       ● The structure is complex
         - disks are connected via two internal FCP switches for higher bandwidth
       ● The DS8000 is still divided into two parts, named processor complexes or just servers
         - caches are organized per server
       ● One device adapter pair addresses 4 array sites
       ● One array site is built from 8 disks
         - disks are distributed over the front and rear storage enclosures
         - disks of one array site are shown in the same color in the architecture chart
       ● One RAID array is defined using one array site
       ● One rank is built using one RAID array
       ● Ranks are assigned to an extent pool
       ● Extent pools are assigned to one of the servers
         - this also assigns the caches
       ● The rules are the same as for the ESS
         - one disk range resides in one extent pool
  • 15. Rules for selecting disks
       This makes it fast
         – Use as many paths as possible (CHPID -> host adapter)
         – Spread the host adapters used across all host adapter bays
             • For ECKD, switching of the paths is done automatically
             • FCP needs a fixed relation between disk and path; use Linux multipathing for load balancing
         – Select disks from as many ranks as possible!
         – Switch the rank for each new disk in a logical volume
         – Switch the ranks used between servers and device adapters
         – Avoid reusing the same resource (path, server, device adapter, and disk) as long as possible
       The goal is to get a balanced load on all paths and physical disks
       In addition, striped Linux logical volumes and/or storage pool striping may help to increase the
       overall throughput
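      To verify which channel paths and device numbers a given dasd actually uses, the s390-tools commands lsdasd and lscss can serve as a quick check. A hedged sketch; the device address 0.0.5100 is just an example address like the ones in the slide above:

          # list the dasd devices known to Linux with their device numbers
          lsdasd

          # show the subchannel information, including the CHPIDs in the path group, for one device
          lscss | grep 0.0.5100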
  • 16. Linux kernel components involved in disk I/O
       Application program: issues reads and writes
       Virtual File System (VFS): dispatches requests to the different devices and translates to sector addressing
       Logical Volume Manager (LVM): defines the physical to logical device relation
       Multipath: sets the multipath policies
       Device mapper (dm): holds the generic mapping of all block devices and performs 1:n mapping for
       logical volumes and/or multipath
       Block device layer with page cache: the page cache contains all file I/O data; direct I/O bypasses the page cache
       I/O scheduler: merges, orders and queues requests, starts the device drivers via (un)plug device
       Device drivers (data transfer handling):
         – Direct Access Storage Device driver (dasd)
         – Small Computer System Interface driver (SCSI)
         – z Fibre Channel Protocol driver (zFCP)
         – Queued Direct I/O driver (qdio)
  • 17. Disk I/O attachment and storage subsystem
       Each disk is a configured volume in an extent pool (here extent pool = rank)
       An extent pool can be configured for either ECKD dasds or SCSI LUNs
       Host bus adapters connect internally to both servers
       [Diagram: four channel paths on FICON Express cards connect via a switch to host bus adapters 1-4;
        each host bus adapter connects to both server 0 and server 1 of the storage server, which attach
        device adapters and ranks 1-8, alternating between ECKD and SCSI extent pools]
  • 18. I/O processing characteristics
       FICON/ECKD:
         – 1:1 mapping host subchannel:dasd
         – Serialization of I/Os per subchannel
         – I/O request queue in Linux
         – Disk blocks are 4KB
         – High availability by FICON path groups
         – Load balancing by FICON path groups and Parallel Access Volumes
       FCP/SCSI:
         – Several I/Os can be issued against a LUN immediately
         – Queuing in the FICON Express card and/or in the storage server
         – Additional I/O request queue in Linux
         – Disk blocks are 512 bytes
         – High availability by Linux multipathing, type failover or multibus
         – Load balancing by Linux multipathing, type multibus
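      The Linux-side queue sizes mentioned here can be inspected through sysfs. A minimal sketch, assuming a SCSI disk named sda (the device name is an assumption; the attributes are standard sysfs entries):

          # number of requests the Linux block layer may queue for this device
          cat /sys/block/sda/queue/nr_requests

          # number of commands that may be outstanding against this LUN (default 32)
          cat /sys/block/sda/device/queue_depth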
  • 19. General layout for FICON/ECKD
       The dasd driver starts the I/O on a subchannel
       Each subchannel connects to all channel paths in the path group
       Each channel connects via a switch to a host bus adapter
       A host bus adapter connects to both servers
       Each server connects to its ranks
       [Diagram: application program -> VFS -> LVM / multipath / device mapper -> block device layer with
        page cache and I/O scheduler -> dasd driver -> channel subsystem with subchannels a-d -> chpids 1-4
        -> switch -> host bus adapters 1-4 -> server 0 / server 1 -> device adapters and ranks]
  • 20. FICON/ECKD dasd I/O to a single disk
       Assume that subchannel a corresponds to disk 2 in rank 1
       The full choice of host adapters can be used
       Only one I/O can be issued at a time through subchannel a
       All other I/Os need to be queued in the dasd driver and in the block device layer until the subchannel
       is no longer busy with the preceding I/O
       [Diagram: same I/O stack as before; the single busy subchannel in front of the dasd driver is the bottleneck]
  • 21. FICON/ECKD dasd I/O to a single disk with PAV (SLES10 / RHEL5)
       VFS sees one device
       The device mapper sees the real device and all alias devices
       Parallel Access Volumes solve the I/O queuing in front of the subchannel
       Each alias device uses its own subchannel
       Additional processor cycles are spent to do the load balancing in the device mapper
       The next slowdown is the fact that only one disk is used in the storage server; this implies the use
       of only one rank, one device adapter and one server
       [Diagram: the device mapper distributes I/O over subchannels a-d; rank, device adapter and server
        remain single points of contention]
  • 22. FICON/ECKD dasd I/O to a single disk with HyperPAV (SLES11)
       VFS sees one device
       The dasd driver sees the real device and all alias devices
       Load balancing with HyperPAV and static PAV is done in the dasd driver; the aliases only need to be
       added to Linux. The load balancing works better than on the device mapper layer.
       Fewer additional processor cycles are needed than with Linux multipath
       The next slowdown is the fact that only one disk is used in the storage server; this implies the use
       of only one rank, one device adapter and one server
       [Diagram: the dasd driver distributes I/O over subchannels a-d; rank, device adapter and server
        remain single points of contention]
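      As a hedged illustration of how HyperPAV aliases typically appear on the Linux side (the device number 0.0.51ff is an assumption), base and alias devices can be listed and activated with the s390-tools commands:

          # list dasd base and alias devices; alias devices are flagged as such in the output
          lsdasd

          # set an alias device online so the dasd driver can use it for load balancing
          chccwdev -e 0.0.51ff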
  • 23. FICON/ECKD dasd I/O to a linear or striped logical volume
       VFS sees one device (the logical volume)
       The device mapper sees the logical volume and the physical volumes
       Additional processor cycles are spent to map the I/Os to the physical volumes
       Striped logical volumes require more additional processor cycles than linear logical volumes
       With a striped logical volume the I/Os can be well balanced over the entire storage server and
       overcome limitations from a single rank, a single device adapter or a single server
       To ensure that I/O to one physical disk is not limited by one subchannel, PAV or HyperPAV should be
       used in combination with logical volumes
       [Diagram: LVM and device mapper spread the I/O over several disks, ranks, device adapters and both servers]
  • 24. FICON/ECKD dasd I/O to a storage pool striped volume with HyperPAV (SLES11)
       A storage pool striped volume only makes sense in combination with PAV or HyperPAV
       VFS sees one device
       The dasd driver sees the real device and all alias devices
       The storage pool striped volume spans several ranks of one server and overcomes the limitations of a
       single rank and/or a single device adapter
       Storage pool striped volumes can also be used as physical disks for a logical volume to use both server sides
       To ensure that I/O to one dasd is not limited by one subchannel, PAV or HyperPAV should be used
       [Diagram: one dasd is striped by the storage server over several ranks of an extent pool on one server]
  • 25. General layout for FCP/SCSI
       The SCSI driver finalizes the I/O requests
       The zFCP driver adds the FCP protocol to the requests
       The qdio driver transfers the I/O to the channel
       A host bus adapter connects to both servers
       Each server connects to its ranks
       [Diagram: application program -> VFS -> LVM / multipath / device mapper -> block device layer with
        page cache and I/O scheduler -> SCSI, zFCP and qdio drivers -> chpids 1-4 -> switch -> host bus
        adapters 1-4 -> server 0 / server 1 -> device adapters and ranks]
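      To see how LUNs are attached through the zFCP/SCSI stack described above, the s390-tools command lszfcp gives a quick overview (exact output varies by version); a sketch:

          # list the FCP devices (host bus adapters) visible to Linux
          lszfcp -H

          # list the attached SCSI devices (LUNs) with their FCP device, remote WWPN and LUN
          lszfcp -D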
  • 26. FCP/SCSI LUN I/O to a single disk
       Assume that disk 3 in rank 8 is reachable via channel 2 and host bus adapter 2
       Up to 32 (the default value) I/O requests can be sent out to disk 3 before the first completion is required
       The throughput will be limited by the rank and/or the device adapter
       There is no high availability provided for the connection between the host and the storage server
       [Diagram: a single channel path and host bus adapter lead to one rank; path and rank are the potential bottlenecks]
  • 27. FCP/SCSI LUN I/O to a single disk with multipathing
       VFS sees one device
       The device mapper sees the multibus or failover alternatives to the same disk
       Administrative effort is required to define all paths to one disk
       Additional processor cycles are spent to do the mapping to the desired path for the disk in the device mapper
       Multibus requires more additional processor cycles than failover
       [Diagram: the device mapper distributes I/O over several channel paths and host bus adapters to the same LUN]
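      A hedged sketch of a multibus configuration; the option values are assumptions and should be aligned with the recommendations for the storage server in use:

          # /etc/multipath.conf (excerpt): spread I/O over all paths to a LUN
          defaults {
              path_grouping_policy  multibus
              failback              immediate
          }

          # show the resulting multipath devices and the state of their paths
          multipath -ll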
  • 28. FCP/SCSI LUN I/O to a linear or striped logical volume
       VFS sees one device (the logical volume)
       The device mapper sees the logical volume and the physical volumes
       Additional processor cycles are spent to map the I/Os to the physical volumes
       Striped logical volumes require more additional processor cycles than linear logical volumes
       With a striped logical volume the I/Os can be well balanced over the entire storage server and
       overcome limitations from a single rank, a single device adapter or a single server
       To ensure high availability the logical volume should be used in combination with multipathing
       [Diagram: LVM and device mapper spread the I/O over several LUNs, ranks, device adapters and both servers]
  • 29. FCP/SCSI LUN I/O to a storage pool striped volume with multipathing
       Storage pool striped volumes make no sense without high availability
       VFS sees one device
       The device mapper sees the multibus or failover alternatives to the same disk
       The storage pool striped volume spans several ranks of one server and overcomes the limitations of a
       single rank and/or a single device adapter
       Storage pool striped volumes can also be used as physical disks for a logical volume to make use of
       both server sides
       [Diagram: one LUN is striped by the storage server over several ranks of an extent pool on one server;
        multipathing provides the high availability on the host side]
  • 30. Summary, recommendations and outlook
       FICON/ECKD
         – Storage pool striped disks (no disk placement)
         – HyperPAV (SLES11)
         – Large volume (RHEL5, SLES11)
         – High Performance FICON (SLES11)
       FCP/SCSI
         – Linux LV with striping (disk placement)
         – Multipath with failover
  • 31. Trademarks
      IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp.,
      registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other
      companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at
      www.ibm.com/legal/copytrade.shtml.
      Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
      Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States,
      other countries, or both.
      © 2010 IBM Corporation