Exadata and OLTP
Enkitec Extreme Exadata Expo
     August 13-14, Dallas


       Frits Hoogland
Who am I?
 Frits Hoogland
 – Working with Oracle products since 1996
 – Working with VX Company since 2009
 Interests
 – Databases, Operating Systems, Application Servers
 – Web techniques, TCP/IP, network security
 – Technical security, performance
Twitter: @fritshoogland
 Blog: http://fritshoogland.wordpress.com
 Email: fhoogland@vxcompany.com
 Oracle ACE Director
 OakTable member
What is Exadata
    – Engineered system built specifically for the Oracle database.
    – Able to deliver a high number of read IOPS and huge
      bandwidth.
    – Has its own patch bundles.
    – Validated versions and patch levels across database,
      clusterware, O/S, storage and firmware.
    – Dedicated, private storage for databases.
    – ASM.
    – Recent hardware & recent CPUs.
    – No virtualisation.




3
Exadata versions

    – Oracle database 64 bit version >= 11

    – ASM 64 bit version >= 11
       - Exadata communication is a layer in the skgxp code

    – Linux OL5 x64
       - No UEK kernel used (except X2-8)




4
Exadata hardware

    – Intel Xeon server hardware

    – Infiniband 40Gb/s

    – Oracle cell (storage) server
       - Flash to mimic SAN cache
       - High performance disks or high capacity disks
          - 600GB 15k RPM / ~ 5ms latency
          - 2/3TB 7.2k RPM / ~ 8ms latency




5
Flash
    – Flash cards are in every storage server
    – Total of 384GB per storage server

    – Do not confuse the Exadata STORAGE server flash cache
      with the Oracle database flash cache

    – Flash can be configured as cache (flash cache
      and flash log), as a diskgroup, or both

    – When flash is used as a diskgroup, latency is ~1 ms
      - Much faster than disk
      - My guess was < 400µs
        - 1µs Infiniband
        - 200µs flash IO time
        - some time for the storage server
6
Flash
    – Flash is restricted to 4x96GB = 384GB per storage
      server.
      - Totals:
         - Quarter rack: 1,152GB, Half: 2,688GB, Full: 5,376GB

      - Net (ASM normal redundancy):
         - Quarter: 576GB, Half: 1,344GB, Full: 2,688GB


    – That is a very limited amount of storage.

    – But with flash as a diskgroup there's no cache for PIOs!




7
Exadata specific features
    – The secret sauce of Exadata: the storage server

      - Smart Scan

      - storage indexes

      - EHCC *

      - IO Resource Manager (IORM)




8
OLTP
     – What does OLTP look like (in general | simplistic)

     – Fetch small amounts of data
       - Invoice numbers, client IDs, product IDs
       - select single values or small ranges via an index

     – Create or update rows
       - Sold items on an invoice, payments, order status
       - insert or update values (see the SQL sketch below)




10
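
In SQL terms this boils down to short, index-driven statements. A minimal sketch against a hypothetical orders schema (table and column names are illustrative only, not from the talk):

  -- Read: fetch a single row or a small range via an index.
  SELECT order_status, total_amount
  FROM   orders
  WHERE  order_id = :order_id;

  -- Write: add or change a few rows, then commit.
  INSERT INTO order_lines (order_id, product_id, quantity)
  VALUES (:order_id, :product_id, :quantity);

  UPDATE orders
  SET    order_status = 'PAID'
  WHERE  order_id = :order_id;

  COMMIT;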
SLOB
     – A great way to mimic or measure OLTP performance is

                                 SLOB

     – Silly Little Oracle Benchmark

     – Author: Kevin Closson
     – http://oaktable.net/articles/slob-silly-little-oracle-benchmark




11
SLOB
 – It can do reading:

 FOR i IN 1..5000 LOOP
   -- random upper bound of a 256-row custid range
   v_r := dbms_random.value(257, 10000);
   SELECT COUNT(c2) INTO x
   FROM   cf1
   WHERE  custid > v_r - 256 AND custid < v_r;
 END LOOP;




12
SLOB
 – And writing:

 FOR i IN 1..500 LOOP
   v_r := dbms_random.value(257, 10000);
   -- overwrite columns c2 .. c20 with a 128-character literal
   UPDATE cf1
   SET c2 =
 'AAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAA
 BBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBA
 AAAAAAABBBBBBBB',
     ....up to column 20 (c20)....
   WHERE custid > v_r - 256 AND custid < v_r;
   COMMIT;
 END LOOP;




13
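
Pieced together, the reader loop becomes a self-contained anonymous block. A minimal sketch, assuming the SLOB table cf1 (columns custid and c2) exists and custid is indexed:

  DECLARE
    v_r NUMBER;
    x   NUMBER;
  BEGIN
    FOR i IN 1..5000 LOOP
      -- random upper bound of a 256-row custid range
      v_r := dbms_random.value(257, 10000);
      SELECT COUNT(c2) INTO x
      FROM   cf1
      WHERE  custid > v_r - 256 AND custid < v_r;
    END LOOP;
  END;
  /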
– Let's run SLOB with 0 writers and 1 reader on:

     – Single instance database, 10G SGA, 9G cache.

                               >> Cold cache <<


       - Exadata V2 / Oracle 11.2.0.2 / HP (high performance disks)
         - Half rack / 7 storage servers / 84 disks (15k rpm)

       - Exadata X2 / Oracle 11.2.0.2 / HC (high capacity disks)
         - Quarter rack / 3 storage servers / 36 disks (7.2k rpm)




14
(Slides 15-18: result charts of the cold-cache single-reader runs.)
1 reader results

     - V2 time: 5 sec - CPU: 84.3%

       - PIO:    10,768 (0.8%)   -- IO time 0.8 sec
       - LIO: 1,299,493

     - X2 time: 4 sec - CPU: 75.7%

       - PIO:    10,922 (0.8%)   -- IO time 0.9 sec
       - LIO: 1,300,726

     - ODA time: 4 sec - CPU: 55.2%     (20 disks 15k rpm)

       - PIO:    10,542 (0.8%)   -- IO time 2.2 sec
       - LIO: 1,297,502



19
1 reader conclusion

     - The time spent on PIO is 15% - 45%

     - The majority of time is spent on LIO/CPU

     - Because the main portion is CPU, the fastest CPU “wins”
       - Actually: fastest CPU, memory bus and memory.




20
LIO benchmark
     – Let's do a pre-warmed cache run

       - Pre-warmed means: no PIO, data already in the buffer cache

       - This means ONLY LIO speed is measured (see the sketch below)




21
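
One way to pre-warm, as a sketch: run the reader loop twice and only look at the second pass, by which point the working set sits in the buffer cache (dbms_utility.get_time returns centiseconds):

  DECLARE
    v_r NUMBER;
    x   NUMBER;
    t0  NUMBER;
  BEGIN
    FOR pass IN 1..2 LOOP          -- pass 1 warms the cache, pass 2 measures
      t0 := dbms_utility.get_time;
      FOR i IN 1..5000 LOOP
        v_r := dbms_random.value(257, 10000);
        SELECT COUNT(c2) INTO x
        FROM cf1 WHERE custid > v_r - 256 AND custid < v_r;
      END LOOP;
      dbms_output.put_line('pass ' || pass || ': ' ||
                           (dbms_utility.get_time - t0) / 100 || ' s');
    END LOOP;
  END;
  /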
LIO benchmark

     - ODA: 1 sec

     - X2:   2 sec

     - V2:   3 sec




22
Use 'dmidecode' to look at the system's components!
     – Reason: LIO essentially means
       - Reading memory
       - CPU processing
     – ODA
       - Intel Xeon X5675 @ 3.07GHz (2s12c24t = 2 sockets, 12 cores, 24 threads)
            - L1:384kB, L2:1.5MB, L3:12MB
       - Memory: Type DDR3, speed 1333 MHz
     – X2:
       - Intel Xeon X5670 @ 2.93GHz (2s12c24t)
            - L1:384kB, L2:1.5MB, L3:12MB
       - Memory: Type DDR3, speed 1333 MHz
     – V2
       - Intel Xeon E5540 @ 2.53GHz (2s8c16t)
            - L1:128kB, L2:1MB, L3:8MB
       - Memory: Type DDR3, speed 800 MHz
23
(Slides 24-26: LIO benchmark charts.)
LIO benchmark



     The core-count difference and slower memory show when the
     number of readers exceeds the core count.



           With the same memory speed, CPU speed matters less as
           concurrency increases.




27
(Slide 28: LIO benchmark chart.)
LIO benchmark


     Fewer cores and slower memory make LIO processing
     increasingly slower with more concurrency.




          For LIO processing, Exadata versus non-Exadata (ODA)
          does not matter.




29
– Conclusion:

       - LIO performance is impacted by:

         -   CPU speed
         -   Number of sockets and cores
         -   L1/2/3 cache sizes
         -   Memory speed

       - Exadata does not matter here!

         - When comparing entirely different systems, also consider:
              -   Oracle version
              -   O/S and version (scheduling)
              -   Hyper-Threading / CPU architecture
              -   NUMA (Exadata/ODA: no NUMA!)


30
– But how about physical IO?

       - Lower the buffer cache to 4M (as applied below)
         - sga_max_size to 1g
         - cpu_count to 1
         - db_cache_size to 1M (results in 4M)

       - SLOB run with 1 reader




31
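
One way to apply those settings, as a sketch: assuming an spfile, set the parameters from SQL*Plus and restart the instance (the values are the test setup from the slide, not a recommendation):

  ALTER SYSTEM SET sga_max_size  = 1g SCOPE = SPFILE;
  ALTER SYSTEM SET cpu_count     = 1  SCOPE = SPFILE;
  ALTER SYSTEM SET db_cache_size = 1m SCOPE = SPFILE;
  -- restart; granule rounding makes the effective cache ~4M
  SHUTDOWN IMMEDIATE
  STARTUP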
The V2 is the slowest with 106 seconds.


     The X2, at 76 seconds, is only a little slower than the fastest.



     Surprise! The ODA is the fastest here with 73 seconds.




32
                  Total time (s)   CPU time (s)   IO time (s)

      ODA               73               17             60

      X2                76               33             55

      V2               106               52             52

     – IO ODA:
       - 60 s / 1,264,355 IOs = 0.047 ms
     – IO X2:
       - 55 s / 1,265,602 IOs = 0.043 ms
     – IO V2:
       - 52 s / 1,240,941 IOs = 0.042 ms



33
– This is not random disk IO!

       - Average latency of a random IO on a 15k rpm disk ~ 5ms
       - Average latency of a random IO on a 7.2k rpm disk ~ 8ms



     – So this must come from a cache, or it is not random disk IO

       - Exadata has the flash cache.
       - On the ODA, the data probably sits very close together on disk.




34
                 Total time (s)   CPU time (s)       IO time (s)

     ODA               73               17                 60

     X2                76               33                 55

     V2               106               52                 52

      - Exadata IO takes (way) more CPU.

      - Roughly the same time is spent on doing IOs.




35
SLOB / 10 readers




36
Now the IO response time on the ODA is far higher than on
     Exadata (ODA total run time: 3,008 s).




      Both Exadatas perform alike: X2 581 s, V2 588 s.




37
                 Total time (s)   CPU time (s)   IO time (s)

      ODA            3,008              600          29,428

      X2               581              848           5,213

      V2               588            1,388           4,866

     – IO ODA:
       - 29,428 s / 13,879,603 IOs = 2.120 ms
     – IO X2:
       - 5,213 s / 14,045,812 IOs = 0.371 ms
     – IO V2:
       - 4,866 s / 14,170,303 IOs = 0.343 ms



38
SLOB / 20 readers




39
(Slide 40: 20-reader result chart.)
                 Total time (s)   CPU time (s)   IO time (s)

      ODA            4,503            1,377          88,756

      X2               721            2,069          13,010

      V2               747            3,373          12,405

     – IO ODA:
       - 88,756 s / 28,246,604 IOs = 3.142 ms
     – IO X2:
       - 13,010 s / 28,789,330 IOs = 0.452 ms
     – IO V2:
       - 12,405 s / 28,766,804 IOs = 0.431 ms



41
SLOB / 30 readers




42
ODA (20 x 15k rpm HDD) disk capacity is saturated, so
     response time increases with more readers.




     The flash cache is not saturated, so IO response time
      increases very little from 10 to 20 to 30 readers.




43
SLOB / up to 80 readers




44
ODA response time increases more or less linearly.




                            The V2 response time (with more flash cache!) starts
                            increasing at 70 readers. A bottleneck is showing up!
                            (7x384GB!!)


              X2 flash cache (3x384GB) is not saturated, so there is little
              increase in response time.




45
IOPS view instead of response time


                3x384GB flash cache and Infiniband can serve
                > 115,851 read IOPS!


                                    This V2 has more flash cache, so the
                                    decline in read IOPS is probably due to
                                    something else!



                               ODA maxed out at ~ 11,200 read IOPS




46
 - V2 top 5 timed events with 80 readers:

 Event                              Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
 ------------------------------  ------------  ---------  -------------  ---------  -----------
 cell single block physical read  102,345,354     56,614              1       47.1  User I/O
 latch: cache buffers lru chain    27,187,317     33,471              1       27.8  Other
 latch: cache buffers chains       14,736,819     19,594              1       16.3  Concurrency
 DB CPU                                            13,427                     11.2
 wait list latch free                 932,930         553             1         .5  Other

 (the two latch events combined: 44.1% of DB time)

           - X2 top 5 timed events with 80 readers:

 Event                              Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
 ------------------------------  ------------  ---------  -------------  ---------  -----------
 cell single block physical read  102,899,953     68,209              1       87.9  User I/O
 DB CPU                                             9,297                     12.0
 latch: cache buffers lru chain    10,917,303      1,585              0        2.0  Other
 latch: cache buffers chains        2,048,395        698              0         .9  Concurrency
 cell list of blocks physical read    368,795        522              1         .7  User I/O

 (the two latch events combined: 2.9% of DB time)



47
– On the V2, cache concurrency control throttles
        throughput
      – On the X2, this happens only very minimally
                  - V2: CPU: Intel Xeon E5540 @ 2.53GHz (2s8c16t)
                  - X2: CPU: Intel Xeon X5670 @ 2.93GHz (2s12c24t)

      – V2 (percentage of waits per wait-time bucket)
 Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
 -------------------------- -----  ----  ----  ----  ---- ----- -----  ----  ----
 latch: cache buffers chain 14.7M  44.1  37.0  16.3   2.6                .0    .0
 latch: cache buffers lru c 27.2M  37.2  42.6  20.0          .2    .0          .0

      – X2
 latch: cache buffers chain 2048.  91.8   7.5          .6    .0    .0
 latch: cache buffers lru c 10.9M  97.4   2.3          .2    .0




48
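
Outside of an AWR report, a similar wait-time breakdown can be pulled live from v$event_histogram. A sketch (the counts are cumulative since instance startup):

  SELECT event, wait_time_milli, wait_count
  FROM   v$event_histogram
  WHERE  event LIKE 'latch: cache buffers%'
  ORDER  BY event, wait_time_milli;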
                      1 LIO       80 LIO        1 PIO        80 PIO   (times in seconds)

     ODA                  1           11           73           9795

     X2 (HC disks)        2           11           76            976

     V2 (HP disks)        3           22          106           1518




52
              1 LIO       80 LIO        1 PIO        80 PIO

     ODA          1           11           73           9795

     X2           2           11           76            976

     V2           3           22          106           1518


           1 PIO w/o flash cache          80 PIO w/o flash cache

     ODA                           73                           9795

     X2                           167                              ?

     V2                           118                           5098


53
- For scalability, OLTP needs buffered IO (LIO)

     - The flash cache is EXTREMELY important for physical IO scalability

       - Never, ever, let flash be used for something else
       - Unless you can always keep all your small reads in cache


     - Flash mimics a SAN/NAS cache

       - So nothing groundbreaking here; it does what current, normal infra should
         do too...

     - The bandwidth needed to deliver the data to the database is
       provided by Infiniband
       - 1 Gb ethernet = 120MB/s, 4 Gb fibre channel = 400MB/s
       - Infiniband is generally available.
54
– How many IOPS can a single cell do?
       - According to
         https://blogs.oracle.com/mrbenchmark/entry/inside_the_sun_oracle_database
       - A single cell can do 75,000 IOPS from flash (8kB)
         - Personal calculation: 60,000 IOPS with 8kB


     – Flash cache
       - Caches mostly small reads & writes (8kB and less)
       - Large multiblock reads are not cached, unless the segment property
         'cell_flash_cache' is set to 'keep' (see the DDL sketch below).




55
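
Setting that segment property is a one-line DDL. A sketch, where big_table is a placeholder name:

  ALTER TABLE big_table STORAGE (CELL_FLASH_CACHE KEEP);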
– Is Exadata a good idea for OLTP?
       - From a strictly technical point of view, there is no benefit.

     – But...

     – Exadata gives you IORM
     – Exadata gives you reasonably up to date hardware
     – Exadata gives a system engineered for performance
     – Exadata gives you dedicated disks
     – Exadata gives a validated combination of database,
       clusterware, operating system, hardware, firmware.



56
– Exadata storage servers provide NO redundancy for
       data
       - That's a function of ASM

     – Exadata is configured with either

       - Normal redundancy (mirroring) or
       - High redundancy (triple mirroring)

     – to provide data redundancy.




57
– Reading has no problem with normal/high redundancy.

     – During writes, all two or three AUs need to be written.

     – This means that when you calculate write throughput, you
       need to double all physical writes if using normal
       redundancy (e.g. 5,000 database write IOPS become
       10,000 disk writes).




58
– But we got flash! Right?

     – Yes, you got flash. But it probably doesn't do what you
       think it does:




59
– This is on the half rack V2 HP:

     [oracle@dm01db01 stuff]$ dcli -l celladmin -g cell_group cellcli -e "list metriccurrent where name like 'FL_.*_FIRST'"
     dm01cel01: FL_DISK_FIRST    FLASHLOG    316,563 IO requests
     dm01cel01: FL_FLASH_FIRST   FLASHLOG      9,143 IO requests
     dm01cel02: FL_DISK_FIRST    FLASHLOG    305,891 IO requests
     dm01cel02: FL_FLASH_FIRST   FLASHLOG      7,435 IO requests
     dm01cel03: FL_DISK_FIRST    FLASHLOG    307,634 IO requests
     dm01cel03: FL_FLASH_FIRST   FLASHLOG     10,577 IO requests
     dm01cel04: FL_DISK_FIRST    FLASHLOG    299,547 IO requests
     dm01cel04: FL_FLASH_FIRST   FLASHLOG     10,381 IO requests
     dm01cel05: FL_DISK_FIRST    FLASHLOG    311,978 IO requests
     dm01cel05: FL_FLASH_FIRST   FLASHLOG     10,888 IO requests
     dm01cel06: FL_DISK_FIRST    FLASHLOG    315,084 IO requests
     dm01cel06: FL_FLASH_FIRST   FLASHLOG     10,022 IO requests
     dm01cel07: FL_DISK_FIRST    FLASHLOG    323,454 IO requests
     dm01cel07: FL_FLASH_FIRST   FLASHLOG      8,807 IO requests



60
– This is on the quarter rack X2 HC:
 [root@xxxxdb01 ~]# dcli -l root -g cell_group cellcli -e "list metriccurrent where name like 'FL_.*_FIRST'"
 xxxxcel01: FL_DISK_FIRST              FLASHLOG             68,475,141 IO requests
 xxxxcel01: FL_FLASH_FIRST             FLASHLOG             9,109,142 IO requests
 xxxxcel02: FL_DISK_FIRST              FLASHLOG             68,640,951 IO requests
 xxxxcel02: FL_FLASH_FIRST             FLASHLOG             9,229,226 IO requests
 xxxxcel03: FL_DISK_FIRST              FLASHLOG             68,388,238 IO requests
 xxxxcel03: FL_FLASH_FIRST             FLASHLOG             9,072,660 IO requests




61
– Please mind these are cumulative numbers!

     – The half rack is a POC machine, with no heavy usage
       between POCs.
     – The quarter rack has had some load, but definitely not
       heavy OLTP.

     – I can imagine flashlog can prevent long write times if disk
       IOs queue.
       - A normally configured database on Exadata has online redo in
         the DATA and the RECO diskgroup
       - Normal redundancy then means every log write must be done 4 times
         (two mirrored copies in each of the two diskgroups; see the query
         sketch below for observing the write waits)



62
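
Log writer and database writer wait times like those on the next slide can be approximated on any instance from v$system_event. A sketch (cumulative averages since startup, converted to ms):

  SELECT event,
         total_waits,
         time_waited_micro / GREATEST(total_waits, 1) / 1000 AS avg_ms
  FROM   v$system_event
  WHERE  event IN ('log file parallel write', 'log file sync',
                   'db file parallel write');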
– Log writer wait times:

       - V2 min: 16ms (1 writer), max: 41ms (20 writers)
       - X2 min: 39ms (10 writers), max: 110ms (40 writers)

     – Database writer wait time is significantly lower




63
– Log file write response time on Exadata is not in the
       same range as reads.

     – There's the flashlog feature, but it does not work as the
       whitepaper explains

     – Be careful with heavy writing on Exadata.
       - There's no Exadata-specific improvement for writes.




64
Thank you for attending!

     Questions and answers.




65
Thanks to

 • Klaas-Jan Jongsma
 • VX Company
 • Martin Bach
 • Kevin Closson




66
