Exadata and OLTP
Enkitec Extreme Exadata Expo
     August 13-14, Dallas


       Frits Hoogland
Who am I?
 Frits Hoogland
 – Working with Oracle products since 1996
 – Working with VX Company since 2009
 Interests
 – Databases, Operating Systems, Application Servers
 – Web techniques, TCP/IP, network security
 – Technical security, performance
Twitter: @fritshoogland
 Blog: http://fritshoogland.wordpress.com
 Email: fhoogland@vxcompany.com
 Oracle ACE Director
 OakTable member
What is Exadata
    – Engineered system built specifically for the Oracle database.
    – Able to deliver a high number of read IOPS and huge
      bandwidth.
    – Has its own patch bundles.
    – Validated versions and patch levels across database,
      clusterware, O/S, storage and firmware.
    – Dedicated, private storage for databases.
    – ASM.
    – Recent hardware & recent CPUs.
    – No virtualisation.




3
Exadata versions

    – Oracle database 64 bit version >= 11

    – ASM 64 bit version >= 11
       - Exadata communication is a layer in the skgxp code

    – Linux OL5 x64
       - No UEK kernel used (except X2-8)




4
Exadata hardware

    – Intel Xeon server hardware

    – Infiniband 40Gb/s

    – Oracle cell (storage) server
       - Flash to mimic SAN cache
       - High performance disks or high capacity disks
          - 600GB 15k RPM / ~ 5ms latency
          - 2/3TB 7.2k RPM / ~ 8ms latency




5
Flash
    – Flash cards are in every storage server
    – Total of 384GB per storage server

    – Do not confuse the Exadata STORAGE server flash cache
      with the Oracle database flash cache

    – Flash can be configured as cache (flash cache
      and flash log), as a diskgroup, or both

    – When flash is used as a diskgroup, latency is ~1 ms
      - Much faster than disk
      - My guess was < 400µs
        - 1µs Infiniband
        - 200µs flash IO time
        - some time for the storage server
6
Flash
    – Flash is restricted to 4x96GB = 384GB per storage
      server.
      - Totals:
         - Quarter rack: 1,152GB, Half: 2,688GB, Full: 5,376GB

      - Net (ASM normal redundancy):
         - Quarter: 576GB, Half: 1,344GB, Full: 2,688GB


    – That is a very limited amount of storage.

    – But with flash as a diskgroup there's no cache for PIOs!




7
Exadata specific features
    – The secret sauce of Exadata: the storage server

      - Smart Scan

      - storage indexes

      - EHCC *

      - IO Resource Manager (IORM)




8
OLTP
     – What does OLTP look like (in general | simplistic)

     – Fetch small amounts of data
       - Invoice numbers, client IDs, product IDs
       - select single values or small ranges via an index

     – Create or update rows
       - Sold items on an invoice, payments, order status
       - insert or update values (see the SQL sketch below)




10
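
In SQL terms this boils down to short, index-driven statements. A minimal sketch against a hypothetical orders schema (table and column names are illustrative only, not from the talk):

  -- Read: fetch a single row or a small range via an index.
  SELECT order_status, total_amount
  FROM   orders
  WHERE  order_id = :order_id;

  -- Write: add or change a few rows, then commit.
  INSERT INTO order_lines (order_id, product_id, quantity)
  VALUES (:order_id, :product_id, :quantity);

  UPDATE orders
  SET    order_status = 'PAID'
  WHERE  order_id = :order_id;

  COMMIT;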
SLOB
     – A great way to mimic or measure OLTP performance is

                                 SLOB

     – Silly Little Oracle Benchmark

     – Author: Kevin Closson
     – http://oaktable.net/articles/slob-silly-little-oracle-benchmark




11
SLOB
 – It can do reading:

 FOR i IN 1..5000 LOOP
   -- random upper bound of a 256-row custid range
   v_r := dbms_random.value(257, 10000);
   SELECT COUNT(c2) INTO x
   FROM   cf1
   WHERE  custid > v_r - 256 AND custid < v_r;
 END LOOP;




12
SLOB
 – And writing:

 FOR i IN 1..500 LOOP
   v_r := dbms_random.value(257, 10000);
   -- overwrite columns c2 .. c20 with a 128-character literal
   UPDATE cf1
   SET c2 =
 'AAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAA
 BBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBA
 AAAAAAABBBBBBBB',
     ....up to column 20 (c20)....
   WHERE custid > v_r - 256 AND custid < v_r;
   COMMIT;
 END LOOP;




13
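
Pieced together, the reader loop becomes a self-contained anonymous block. A minimal sketch, assuming the SLOB table cf1 (columns custid and c2) exists and custid is indexed:

  DECLARE
    v_r NUMBER;
    x   NUMBER;
  BEGIN
    FOR i IN 1..5000 LOOP
      -- random upper bound of a 256-row custid range
      v_r := dbms_random.value(257, 10000);
      SELECT COUNT(c2) INTO x
      FROM   cf1
      WHERE  custid > v_r - 256 AND custid < v_r;
    END LOOP;
  END;
  /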
– Let's run SLOB with 0 writers and 1 reader on:

     – Single instance database, 10G SGA, 9G cache.

                               >> Cold cache <<


       - Exadata V2 / Oracle 11.2.0.2 / HP (high performance disks)
         - Half rack / 7 storage servers / 84 disks (15k rpm)

       - Exadata X2 / Oracle 11.2.0.2 / HC (high capacity disks)
         - Quarter rack / 3 storage servers / 36 disks (7.2k rpm)




14
(Slides 15-18: result charts of the cold-cache single-reader runs.)
1 reader results

     - V2 time: 5 sec - CPU: 84.3%

       - PIO:    10,768 (0.8%)   -- IO time 0.8 sec
       - LIO: 1,299,493

     - X2 time: 4 sec - CPU: 75.7%

       - PIO:    10,922 (0.8%)   -- IO time 0.9 sec
       - LIO: 1,300,726

     - ODA time: 4 sec - CPU: 55.2%     (20 disks 15k rpm)

       - PIO:    10,542 (0.8%)   -- IO time 2.2 sec
       - LIO: 1,297,502



19
1 reader conclusion

     - The time spent on PIO is 15% - 45%

     - The majority of time is spent on LIO/CPU

     - Because the main portion is CPU, the fastest CPU “wins”
       - Actually: fastest CPU, memory bus and memory.




20
LIO benchmark
     – Let's do a pre-warmed cache run

       - Pre-warmed means: no PIO, data already in the buffer cache

       - This means ONLY LIO speed is measured (see the sketch below)




21
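
One way to pre-warm, as a sketch: run the reader loop twice and only look at the second pass, by which point the working set sits in the buffer cache (dbms_utility.get_time returns centiseconds):

  DECLARE
    v_r NUMBER;
    x   NUMBER;
    t0  NUMBER;
  BEGIN
    FOR pass IN 1..2 LOOP          -- pass 1 warms the cache, pass 2 measures
      t0 := dbms_utility.get_time;
      FOR i IN 1..5000 LOOP
        v_r := dbms_random.value(257, 10000);
        SELECT COUNT(c2) INTO x
        FROM cf1 WHERE custid > v_r - 256 AND custid < v_r;
      END LOOP;
      dbms_output.put_line('pass ' || pass || ': ' ||
                           (dbms_utility.get_time - t0) / 100 || ' s');
    END LOOP;
  END;
  /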
LIO benchmark

     - ODA: 1 sec

     - X2:   2 sec

     - V2:   3 sec




22
Use 'dmidecode' to look at the system's components!
     – Reason: LIO essentially means
       - Reading memory
       - CPU processing
     – ODA
       - Intel Xeon X5675 @ 3.07GHz (2s12c24t = 2 sockets, 12 cores, 24 threads)
            - L1:384kB, L2:1.5MB, L3:12MB
       - Memory: Type DDR3, speed 1333 MHz
     – X2:
       - Intel Xeon X5670 @ 2.93GHz (2s12c24t)
            - L1:384kB, L2:1.5MB, L3:12MB
       - Memory: Type DDR3, speed 1333 MHz
     – V2
       - Intel Xeon E5540 @ 2.53GHz (2s8c16t)
            - L1:128kB, L2:1MB, L3:8MB
       - Memory: Type DDR3, speed 800 MHz
23
(Slides 24-26: LIO benchmark charts.)
LIO benchmark



     The core-count difference and slower memory show when the
     number of readers exceeds the core count.



           With the same memory speed, CPU speed matters less as
           concurrency increases.




27
(Slide 28: LIO benchmark chart.)
LIO benchmark


     Fewer cores and slower memory make LIO processing
     increasingly slower with more concurrency.




          For LIO processing, Exadata versus non-Exadata (ODA)
          does not matter.




29
– Conclusion:

       - LIO performance is impacted by:

         -   CPU speed
         -   Number of sockets and cores
         -   L1/2/3 cache sizes
         -   Memory speed

       - Exadata does not matter here!

         - When comparing entirely different systems, also consider:
              -   Oracle version
              -   O/S and version (scheduling)
              -   Hyper-Threading / CPU architecture
              -   NUMA (Exadata/ODA: no NUMA!)


30
– But how about physical IO?

       - Lower the buffer cache to 4M (as applied below)
         - sga_max_size to 1g
         - cpu_count to 1
         - db_cache_size to 1M (results in 4M)

       - SLOB run with 1 reader




31
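
One way to apply those settings, as a sketch: assuming an spfile, set the parameters from SQL*Plus and restart the instance (the values are the test setup from the slide, not a recommendation):

  ALTER SYSTEM SET sga_max_size  = 1g SCOPE = SPFILE;
  ALTER SYSTEM SET cpu_count     = 1  SCOPE = SPFILE;
  ALTER SYSTEM SET db_cache_size = 1m SCOPE = SPFILE;
  -- restart; granule rounding makes the effective cache ~4M
  SHUTDOWN IMMEDIATE
  STARTUP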
The V2 is the slowest with 106 seconds.


     The X2, at 76 seconds, is only a little slower than the fastest.



     Surprise! The ODA is the fastest here with 73 seconds.




32
                  Total time (s)   CPU time (s)   IO time (s)

      ODA               73               17             60

      X2                76               33             55

      V2               106               52             52

     – IO ODA:
       - 60 s / 1,264,355 IOs = 0.047 ms
     – IO X2:
       - 55 s / 1,265,602 IOs = 0.043 ms
     – IO V2:
       - 52 s / 1,240,941 IOs = 0.042 ms



33
– This is not random disk IO!

       - Average latency of a random IO on a 15k rpm disk ~ 5ms
       - Average latency of a random IO on a 7.2k rpm disk ~ 8ms



     – So this must come from a cache, or it is not random disk IO

       - Exadata has the flash cache.
       - On the ODA, the data probably sits very close together on disk.




34
                 Total time (s)   CPU time (s)       IO time (s)

     ODA               73               17                 60

     X2                76               33                 55

     V2               106               52                 52

      - Exadata IO takes (way) more CPU.

      - Roughly the same time is spent on doing IOs.




35
SLOB / 10 readers




36
Now the IO response time on the ODA is far higher than on
     Exadata (ODA total run time: 3,008 s).




      Both Exadatas perform alike: X2 581 s, V2 588 s.




37
                 Total time (s)   CPU time (s)   IO time (s)

      ODA            3,008              600          29,428

      X2               581              848           5,213

      V2               588            1,388           4,866

     – IO ODA:
       - 29,428 s / 13,879,603 IOs = 2.120 ms
     – IO X2:
       - 5,213 s / 14,045,812 IOs = 0.371 ms
     – IO V2:
       - 4,866 s / 14,170,303 IOs = 0.343 ms



38
SLOB / 20 readers




39
(Slide 40: 20-reader result chart.)
                 Total time (s)   CPU time (s)   IO time (s)

      ODA            4,503            1,377          88,756

      X2               721            2,069          13,010

      V2               747            3,373          12,405

     – IO ODA:
       - 88,756 s / 28,246,604 IOs = 3.142 ms
     – IO X2:
       - 13,010 s / 28,789,330 IOs = 0.452 ms
     – IO V2:
       - 12,405 s / 28,766,804 IOs = 0.431 ms



41
SLOB / 30 readers




42
ODA (20 x 15k rpm HDD) disk capacity is saturated, so
     response time increases with more readers.




     The flash cache is not saturated, so IO response time
      increases very little from 10 to 20 to 30 readers.




43
SLOB / up to 80 readers




44
ODA response time increases more or less linearly.




                            The V2 response time (with more flash cache!) starts
                            increasing at 70 readers. A bottleneck is showing up!
                            (7x384GB!!)


              X2 flash cache (3x384GB) is not saturated, so there is little
              increase in response time.




45
IOPS view instead of response time


                3x384GB flash cache and Infiniband can serve
                > 115,851 read IOPS!


                                    This V2 has more flash cache, so the
                                    decline in read IOPS is probably due to
                                    something else!



                               ODA maxed out at ~ 11,200 read IOPS




46
 - V2 top 5 timed events with 80 readers:

 Event                              Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
 ------------------------------  ------------  ---------  -------------  ---------  -----------
 cell single block physical read  102,345,354     56,614              1       47.1  User I/O
 latch: cache buffers lru chain    27,187,317     33,471              1       27.8  Other
 latch: cache buffers chains       14,736,819     19,594              1       16.3  Concurrency
 DB CPU                                            13,427                     11.2
 wait list latch free                 932,930         553             1         .5  Other

 (the two latch events combined: 44.1% of DB time)

           - X2 top 5 timed events with 80 readers:

 Event                              Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
 ------------------------------  ------------  ---------  -------------  ---------  -----------
 cell single block physical read  102,899,953     68,209              1       87.9  User I/O
 DB CPU                                             9,297                     12.0
 latch: cache buffers lru chain    10,917,303      1,585              0        2.0  Other
 latch: cache buffers chains        2,048,395        698              0         .9  Concurrency
 cell list of blocks physical read    368,795        522              1         .7  User I/O

 (the two latch events combined: 2.9% of DB time)



47
– On the V2, cache concurrency control throttles
        throughput
      – On the X2, this happens only very minimally
                  - V2: CPU: Intel Xeon E5540 @ 2.53GHz (2s8c16t)
                  - X2: CPU: Intel Xeon X5670 @ 2.93GHz (2s12c24t)

      – V2 (percentage of waits per wait-time bucket)
 Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
 -------------------------- -----  ----  ----  ----  ---- ----- -----  ----  ----
 latch: cache buffers chain 14.7M  44.1  37.0  16.3   2.6                .0    .0
 latch: cache buffers lru c 27.2M  37.2  42.6  20.0          .2    .0          .0

      – X2
 latch: cache buffers chain 2048.  91.8   7.5          .6    .0    .0
 latch: cache buffers lru c 10.9M  97.4   2.3          .2    .0




48
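
Outside of an AWR report, a similar wait-time breakdown can be pulled live from v$event_histogram. A sketch (the counts are cumulative since instance startup):

  SELECT event, wait_time_milli, wait_count
  FROM   v$event_histogram
  WHERE  event LIKE 'latch: cache buffers%'
  ORDER  BY event, wait_time_milli;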
                      1 LIO       80 LIO        1 PIO        80 PIO   (times in seconds)

     ODA                  1           11           73           9795

     X2 (HC disks)        2           11           76            976

     V2 (HP disks)        3           22          106           1518




52
              1 LIO       80 LIO        1 PIO        80 PIO

     ODA          1           11           73           9795

     X2           2           11           76            976

     V2           3           22          106           1518


           1 PIO w/o flash cache          80 PIO w/o flash cache

     ODA                           73                           9795

     X2                           167                              ?

     V2                           118                           5098


53
- For scalability, OLTP needs buffered IO (LIO)

     - The flash cache is EXTREMELY important for physical IO scalability

       - Never, ever, let flash be used for something else
       - Unless you can always keep all your small reads in cache


     - Flash mimics a SAN/NAS cache

       - So nothing groundbreaking here; it does what current, normal infra should
         do too...

     - The bandwidth needed to deliver the data to the database is
       provided by Infiniband
       - 1 Gb ethernet = 120MB/s, 4 Gb fibre channel = 400MB/s
       - Infiniband is generally available.
54
– How many IOPS can a single cell do?
       - According to
         https://blogs.oracle.com/mrbenchmark/entry/inside_the_sun_oracle_database
       - A single cell can do 75,000 IOPS from flash (8kB)
         - Personal calculation: 60,000 IOPS with 8kB


     – Flash cache
       - Caches mostly small reads & writes (8kB and less)
       - Large multiblock reads are not cached, unless the segment property
         'cell_flash_cache' is set to 'keep' (see the DDL sketch below).




55
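
Setting that segment property is a one-line DDL. A sketch, where big_table is a placeholder name:

  ALTER TABLE big_table STORAGE (CELL_FLASH_CACHE KEEP);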
– Is Exadata a good idea for OLTP?
       - From a strictly technical point of view, there is no benefit.

     – But...

     – Exadata gives you IORM
     – Exadata gives you reasonably up to date hardware
     – Exadata gives a system engineered for performance
     – Exadata gives you dedicated disks
     – Exadata gives a validated combination of database,
       clusterware, operating system, hardware, firmware.



56
– Exadata storage servers provide NO redundancy for
       data
       - That's a function of ASM

     – Exadata is configured with either

       - Normal redundancy (mirroring) or
       - High redundancy (triple mirroring)

     – to provide data redundancy.




57
– Reading has no problem with normal/high redundancy.

     – During writes, all two or three AUs need to be written.

     – This means that when you calculate write throughput, you
       need to double all physical writes if using normal
       redundancy (e.g. 5,000 database write IOPS become
       10,000 disk writes).




58
– But we got flash! Right?

     – Yes, you got flash. But it probably doesn't do what you
       think it does:




59
– This is on the half rack V2 HP:

     [oracle@dm01db01 stuff]$ dcli -l celladmin -g cell_group cellcli -e "list metriccurrent where name like 'FL_.*_FIRST'"
     dm01cel01: FL_DISK_FIRST    FLASHLOG    316,563 IO requests
     dm01cel01: FL_FLASH_FIRST   FLASHLOG      9,143 IO requests
     dm01cel02: FL_DISK_FIRST    FLASHLOG    305,891 IO requests
     dm01cel02: FL_FLASH_FIRST   FLASHLOG      7,435 IO requests
     dm01cel03: FL_DISK_FIRST    FLASHLOG    307,634 IO requests
     dm01cel03: FL_FLASH_FIRST   FLASHLOG     10,577 IO requests
     dm01cel04: FL_DISK_FIRST    FLASHLOG    299,547 IO requests
     dm01cel04: FL_FLASH_FIRST   FLASHLOG     10,381 IO requests
     dm01cel05: FL_DISK_FIRST    FLASHLOG    311,978 IO requests
     dm01cel05: FL_FLASH_FIRST   FLASHLOG     10,888 IO requests
     dm01cel06: FL_DISK_FIRST    FLASHLOG    315,084 IO requests
     dm01cel06: FL_FLASH_FIRST   FLASHLOG     10,022 IO requests
     dm01cel07: FL_DISK_FIRST    FLASHLOG    323,454 IO requests
     dm01cel07: FL_FLASH_FIRST   FLASHLOG      8,807 IO requests



60
– This is on the quarter rack X2 HC:
 [root@xxxxdb01 ~]# dcli -l root -g cell_group cellcli -e "list metriccurrent where name like 'FL_.*_FIRST'"
 xxxxcel01: FL_DISK_FIRST              FLASHLOG             68,475,141 IO requests
 xxxxcel01: FL_FLASH_FIRST             FLASHLOG             9,109,142 IO requests
 xxxxcel02: FL_DISK_FIRST              FLASHLOG             68,640,951 IO requests
 xxxxcel02: FL_FLASH_FIRST             FLASHLOG             9,229,226 IO requests
 xxxxcel03: FL_DISK_FIRST              FLASHLOG             68,388,238 IO requests
 xxxxcel03: FL_FLASH_FIRST             FLASHLOG             9,072,660 IO requests




61
– Please mind these are cumulative numbers!

     – The half rack is a POC machine, with no heavy usage
       between POCs.
     – The quarter rack has had some load, but definitely not
       heavy OLTP.

     – I can imagine flashlog can prevent long write times if disk
       IOs queue.
       - A normally configured database on Exadata has online redo in
         the DATA and the RECO diskgroup
       - Normal redundancy then means every log write must be done 4 times
         (two mirrored copies in each of the two diskgroups; see the query
         sketch below for observing the write waits)



62
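
Log writer and database writer wait times like those on the next slide can be approximated on any instance from v$system_event. A sketch (cumulative averages since startup, converted to ms):

  SELECT event,
         total_waits,
         time_waited_micro / GREATEST(total_waits, 1) / 1000 AS avg_ms
  FROM   v$system_event
  WHERE  event IN ('log file parallel write', 'log file sync',
                   'db file parallel write');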
– Log writer wait times:

       - V2 min: 16ms (1 writer), max: 41ms (20 writers)
       - X2 min: 39ms (10 writers), max: 110ms (40 writers)

     – Database writer wait time is significantly lower




63
– Log file write response time on Exadata is not in the
       same range as reads.

     – There's the flashlog feature, but it does not work as the
       whitepaper explains

     – Be careful with heavy writing on Exadata.
       - There's no Exadata-specific improvement for writes.




64
Thank you for attending!

     Questions and answers.




65
Thanks to

 • Klaas-Jan Jongsma
 • VX Company
 • Martin Bach
 • Kevin Closson




66
