SIMD Instructions
outside and inside
Oracle 12c
Laurent Léturgez – 2015
ABOUT ME
´ Oracle Consultant since 2001
´ Former developer (C, Java, perl, PL/SQL)
´ Blogger since 2004
´ http://laurent.leturgez.free.fr (in French, discontinued)
´ http://laurent-leturgez.com
´ Twitter : @lleturgez
´ Paris Oracle Meetup Organizer: @ParisOracle
´ OCM 11g
Agenda
´ SIMD Instructions, outside Oracle 12c
´ What is a SIMD instruction ?
´ Will my application use SIMD ?
´ Raw Performance
´ SIMD Instructions, inside Oracle 12c
´ How SIMD instructions are used inside Oracle 12c
´ Tracing SIMD in Oracle 12c
Caveats
´ Most of the topics are from
´ My own research
´ My past life as a developer
´ Some of the topics are about internals, so:
´ Analysis and conclusion may be incomplete
´ Future versions of Oracle may change the features
´ Tests have been done with Oracle 12.1.0.2, Oracle
Enterprise Linux 7.1, VMWare Fusion 7 (And
VirtualBox)
Before we start …
´ Some fundamentals (from Dennis Yurichev’s book)
´ CPU register : […] The easiest way to understand a register is
to think of it as an untyped temporary variable. Imagine if
you were working with a high-level programming language and
could only use eight 32-bit (or 64-bit) variables. Yet a lot
can be done using just these!
´ Instruction : A primitive CPU command. The simplest
examples include: moving data between registers, working
with memory and arithmetic primitives. As a rule, each CPU
has its own instruction set architecture (ISA).
´ Assembly language : Mnemonic code and some extensions
like macros which are intended to make a programmer’s life
easier.
http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf
SIMD instructions … outside
Oracle 12c
´ SIMD stands for Single Instruction Multiple Data
´ Process multiple data
´ In one CPU instruction
´ Based on
´ Specific registers
´ Specific CPU instructions and sets of instructions
´ Not Oracle specific
´ CPU Architecture specific
´ Intel
´ IBM
´ Sparc
´ This presentation is mainly about Intel architecture
SIMD instructions … outside
Oracle 12c
´ What is a SIMD register ?
´ It’s a CPU register
´ Wider than traditional registers (RDI, RSI, R8, R9 etc.)
´ 128 up to 512 bits wide
´ Holds several data elements at once
SIMD instructions … outside
Oracle 12c
´ Scalar operation
´ an array of 4 integers {1,2,3,4}
´ add 1 to each value
[Figure: animation of the scalar case. Each value of {1,2,3,4} is loaded from RAM into Reg1, added to the constant 1 held in Reg2, and the result in Reg3 is saved back to RAM, one value at a time, giving {2,3,4,5}.]
LOAD ADD SAVE, once per value:
4 LOAD
4 ADD
4 SAVE
SIMD instructions … outside
Oracle 12c
´ SIMD operation
´ an array of 4 integers {1,2,3,4}
´ add 1 to each value
[Figure: animation of the SIMD case. The four values {1,2,3,4} are loaded from RAM into SIMD Reg1 at once, SIMD Reg2 holds {1,1,1,1}, a single add puts {2,3,4,5} into SIMD Reg3, and a single save writes the result back to RAM.]
LOAD ADD SAVE, once for the four values
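The LOAD / ADD / SAVE sequences of the two cases can be sketched in C as below. This is a minimal illustration, not the sample code of the presentation: the function names `add_one_scalar` and `add_one_simd` are made up for the example, the SIMD path uses SSE2 intrinsics (128-bit XMM registers) and falls back to the scalar loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */
#endif

/* Scalar version: one load, one add and one save per element. */
void add_one_scalar(int32_t *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        values[i] += 1;
}

/* SIMD version: four 32-bit integers per load, add and save.
 * To keep the sketch short, n must be a multiple of 4. */
void add_one_simd(int32_t *values, size_t n)
{
    assert(n % 4 == 0);
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi32(1);  /* {1,1,1,1} */
    for (size_t i = 0; i < n; i += 4) {
        /* LOAD: four values into one XMM register (unaligned load) */
        __m128i v = _mm_loadu_si128((const __m128i *)(values + i));
        /* ADD: four additions with a single instruction */
        v = _mm_add_epi32(v, ones);
        /* SAVE: four results written back at once */
        _mm_storeu_si128((__m128i *)(values + i), v);
    }
#else
    add_one_scalar(values, n);  /* portable fallback without SSE2 */
#endif
}
```

Calling either function on {1,2,3,4} leaves {2,3,4,5} in the array; only the number of instructions executed differs.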
SIMD instructions … outside
Oracle 12c
´ MMX: MultiMedia eXtensions (Pentium MMX, then Pentium II)
´ 64 bits registers
´ 8 registers (MM0 to MM7)
´ SSE: Streaming SIMD Extensions: (Pentium III)
´ 128 bits registers
´ 8 registers (XMM0 to XMM7)
´ Only four 32 bits single precision floating point numbers
´ SSE2 (Pentium IV), SSE3 (Pentium IV Prescott, Xeon Nocona), SSSE3
(Xeon 5100, Core 2), SSE4.1 (Penryn), SSE4.2 (Nehalem)
´ 128 bits registers
´ 16 registers (XMM0 to XMM15)
´ Extended data types (two 64-bit double precision floats, four 32-bit
integers, down to sixteen 8-bit bytes)
´ New instructions
SIMD instructions … outside
Oracle 12c
´ AVX: Advanced Vector eXtension (Sandy Bridge processors)
´ XMM registers are extended to 256 bits
´ 16 AVX registers named YMM0 to YMM15
´ Three operand instructions (non destructive) : A+B=C rather than
A=A+B
´ Some alignment requirements are relaxed
´ AVX2 (Introduced with Haswell processors)
´ 256 bits registers
´ New instructions (shifting, value broadcasting etc…)
´ AVX-512, sometimes called AVX3 (Skylake server processors, Xeon Phi Knights Landing)
´ 512 bits registers
´ 32 registers named ZMM0 to ZMM31
´ AVX-1024 … the future
´ 1024 bits registers
SIMD instructions … outside
Oracle 12c
´ SIMD instructions
´ Reduce number of CPU cycles and memory pressure
´ Process data in parallel without any contention
´ Need a programming method (vector programming) with some
constraints (data alignments etc.)
´ Size matters
´ Wider registers, more data loaded (but wider register files
increase CPU power consumption → Challenge)
´ One instruction processes the whole register (the number of cycles
depends on the instruction and on the CPU)
´ More registers
´ Use cases
´ Data Filtering
´ Graphics
´ Bioinformatics …
SIMD instructions … outside
Oracle 12c
´ Intel API (C/C++) : Intel Intrinsics Guide
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
´ Sample codes:
https://app.box.com/simdSampleC-2015
Will my application use SIMD registers
and instructions ?
´ It depends on :
´ Hardware
´ Consult the processor datasheets to see which instruction set
extensions are supported (if any)
´ http://ark.intel.com/#@Processors
´ Hypervisor
´ Some (old) hypervisors do not support modern extensions
´ VirtualBox versions <5.0 don’t support SSE4, AVX and AVX2
´ Hyper-V on W2008R2-SP1 needs patch for specific processors
to support AVX
´ It depends on the Operating System
´ AVX (256 bits) is supported from
´ Linux Kernel >= 2.6.30 (AVX needs XSAVE support in the kernel, xsave flag)
´ Redhat EL5 : 2.6.18 (older than 2.6.30)
´ Oracle EL5 w/UEK : 2.6.32
´ Solaris 10 upd 10 and Solaris 11
´ Windows 2008 R2 SP1
Will my application use SIMD registers
and instructions ?
´ It depends on the compiler
´ GCC
´ > 4.6 for AVX support
´ Use of specific switches (-msse2, -msse4.1, -msse4.2,
-mavx, -mavx2 …)
´ Intel C/C++ Compiler (ICC)
´ > 11.1 for AVX Support and > 13.0 for AVX2 support
´ Use of specific switches (-xsse4.2, -xavx, -xcore-avx2
…)
´ Beware of optimization switches (-O1, -O2, -O3): they influence
whether loops are auto-vectorized
´ To know more, disassemble (if you are allowed to :-) ) and look at
´ Registers
´ Assembler instructions
´ Based on a C program
´ Used CPU: Haswell microarchitecture (Core
i7-4960HQ), AVX/AVX2 enabled
´ 3 tests : No SIMD, SSE4, AVX
´ Input: one array containing 1 million values
´ Goal: add 1 to each value; the pass over the million
values is repeated 4k, 8k, 16k and 32k times
´ CPU Time(s) = f(#rows)
“Quick and Dirty” sample code available here:
https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
Raw performance
RAW Performance (CPU) for SIMD Instructions, CPU time in seconds

                        4096 M. rows   8192 M. rows   16384 M. rows   32768 M. rows
NO SIMD                     10.35          20.46          42.35           85.64
SSE4 (XMM Registers)         3.3            6.81          13.73           25.58
AVX (YMM Registers)          1.96           3.51           7.23           15.15
SIMD instructions … inside
Oracle 12c
´ In Memory Data Structure
´ In Memory Compression Unit :
IMCU
´ IMCU is the unit of column store
allocation
´ Target size is 1M rows
(controlled by _inmemory_imcu_target_rows)
´ One IMCU can contain more
than one column
´ Each column in one IMCU is a
column unit (CU)
SIMD instructions … inside
Oracle 12c
´ In memory column store storage indexes
´ For each column unit, min and max values are
maintained in a storage index
´ Storage Indexes provide CU pruning
´ Information about CU available in GV$IM_COL_CU
(Undocumented. See BugID 19361690)
[Figure: IMCU pruning]
SIMD instructions … inside
Oracle 12c
´ The way your data is sorted matters for best IMCU pruning
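The pruning idea, and why the sort order matters, can be sketched in C as below. This is only an illustration under stated assumptions: the `cu_minmax` structure and both function names are hypothetical and do not describe Oracle's internal storage index format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage index entry: min and max value of one column unit. */
typedef struct {
    int32_t min_value;
    int32_t max_value;
} cu_minmax;

/* For a predicate "col = key", a CU can be skipped when key is outside
 * its [min, max] range. */
int cu_can_be_pruned(const cu_minmax *cu, int32_t key)
{
    return key < cu->min_value || key > cu->max_value;
}

/* Count the CUs that still have to be scanned for "col = key". */
size_t count_cus_to_scan(const cu_minmax *cus, size_t n_cus, int32_t key)
{
    assert(cus != NULL || n_cus == 0);
    size_t to_scan = 0;
    for (size_t i = 0; i < n_cus; i++)
        if (!cu_can_be_pruned(&cus[i], key))
            to_scan++;
    return to_scan;
}
```

With sorted data the ranges do not overlap (for example [1,10], [11,20], [21,30]) and a lookup for 14 scans a single CU. With unsorted data every CU tends to cover the whole range (for example [1,30] three times) and nothing can be pruned.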
SIMD instructions … inside
Oracle 12c
´ SIMD extensions are used with In Memory storage
indexes for efficient filtering
1. IM Storage Indexes do IMCU pruning
2. SIMD instructions apply filter predicates efficiently
[Figure: after IMCU pruning, the Prod-id values (10, 10, 14, 14, 10) are filtered with SIMD]
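A SIMD equality filter over a small column of values can be sketched in C as below. This is an assumption-laden illustration, not Oracle's kdzk code: the function name `filter_eq_mask` is made up, the sketch uses SSE2 intrinsics on plain 32-bit integers, and it returns one bit per row (so it is limited to 32 rows).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Returns a bitmask where bit i is set when values[i] == key (n <= 32). */
uint32_t filter_eq_mask(const int32_t *values, size_t n, int32_t key)
{
    assert(n <= 32);
    uint32_t mask = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i keys = _mm_set1_epi32(key);  /* key copied in 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(values + i));
        /* Compare 4 values at once: matching lanes become all ones. */
        __m128i eq = _mm_cmpeq_epi32(v, keys);
        /* Collect the top bit of each lane into a 4-bit integer. */
        uint32_t m = (uint32_t)_mm_movemask_ps(_mm_castsi128_ps(eq));
        mask |= m << i;
    }
#endif
    /* Scalar loop for the tail (or for everything without SSE2). */
    for (; i < n; i++)
        if (values[i] == key)
            mask |= (uint32_t)1 << i;
    return mask;
}
```

For the Prod-id values of the slide (10, 10, 14, 14, 10), filtering on 14 sets bits 2 and 3 of the result, and filtering on 10 sets bits 0, 1 and 4.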
SIMD instructions … inside
Oracle 12c
´ Oracle 12c uses specific libraries for SIMD (and
compression)
´ Located in $ORACLE_HOME/lib
´ libshpksse4212.so for SSE4.2 extensions
Compiled with ICC v12 with the -xsse4.2 switch
´ libshpkavx12.so for AVX extensions
Compiled with ICC v12 with the -xavx switch
´ libshpkavx212.so for AVX2 extensions
Not fully implemented yet (only 8 functions implemented)
No ICC AVX2 switch used because ICC v12 doesn’t support AVX2
´ Thanks Tanel Pöder
SIMD instructions … inside
Oracle 12c
´ Oracle SIMD related functions
´ Located in kdzk kernel module (HPK)
´ Part of Advanced Compression library (ADVCMP)
´ Easily tracked with systemtap
SIMD instructions … inside
Oracle 12c
´ How does Oracle use SIMD extensions ?
It depends on several parameters
´ OS Level : /proc/cpuinfo
´ AVX and AVX2 support
´ SSE4 Support only
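The /proc/cpuinfo check can also be done from a program. The C sketch below is an illustration only and says nothing about how Oracle itself detects the extensions: the function names `has_cpu_flag` and `cpuinfo_has_flag` are hypothetical, and the flag names to look for (`sse4_2`, `avx`, `avx2`) are the ones Linux prints on the "flags" line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 when `flag` appears as a whole token in `flags_line`,
 * so that "avx" does not match "avx2". */
int has_cpu_flag(const char *flags_line, const char *flag)
{
    assert(flags_line != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return 0;
    const char *p = flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += 1;  /* keep searching after a partial match */
    }
    return 0;
}

/* Returns 1 if a "flags" line of /proc/cpuinfo lists `flag`,
 * 0 if not, -1 if the file cannot be opened (e.g. not on Linux). */
int cpuinfo_has_flag(const char *flag)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        if (strncmp(line, "flags", 5) == 0)
            found = has_cpu_flag(line, flag);
    fclose(f);
    return found;
}
```

On a host or VM that exposes AVX, `cpuinfo_has_flag("avx")` is expected to return 1; behind a hypervisor that hides the extension it returns 0 even if the physical CPU supports it.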
SIMD instructions … inside
Oracle 12c
´ Which library am I using ?
´ pmap
´ AVX support
´ SSE4 support
SIMD instructions … inside
Oracle 12c
´ Which compiler options have been used ?
´ Read “comment” section in ELF
´ Read the corresponding compiler documentation
[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so |
> egrep -i 'intel|gcc' | egrep 'xavx|mavx'
[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0 Build 20120731
…/…
-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
SIMD instructions … inside
Oracle 12c
´ How are SIMD registers used by Oracle ?
´ GDB
´ To get the call stack (backtrace)
´ To set breakpoints on interesting functions
´ To view register contents (traditional and SIMD)
´ “info registers” for traditional registers
´ “info all-registers” for all registers (SIMD registers included)
´ (gdb) print $ymmX.<format>
Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32,
v4_int64, or v2_int128
SIMD instructions … inside
Oracle 12c
[Screenshot: SIMD register contents displayed in GDB]
In red, register content has been modified
In blue, the second part of the SIMD registers (128 bits) is empty
SIMD instructions … inside
Oracle 12c
´ Oracle IM can use AVX or SSE4 extensions for SIMD
operations
´ When AVX is used
Oracle uses only 128 bits of the 256-bit wide registers
• AVX adds new register-state through the 256-bit wide
YMM register file
• Explicit operating system support is required to properly
save and restore AVX's expanded registers
between context switches
• Without this, only AVX 128-bit is supported
SIMD instructions … inside
Oracle 12c
´The culprit
´ Oracle 12.1.0.2 is supported from EL5 onwards
´ EL5 Redhat Kernel is 2.6.18 and this flag
(xsave) is supported from 2.6.30 kernels
´ For compatibility reasons, Oracle has to
compile its code on 2.6.18 kernels
SIMD instructions … inside
Oracle 12c
´Or maybe …
´ Oracle needs to use values packed below
32bits wide
Tracing SIMD in Oracle 12c
´ Oradebug has 2 components related to IM
Tracing SIMD in Oracle 12c
´ Interesting components to trace for SIMD
and/or IMCU Pruning are :
´ IM_optimizer
´Gives information about CBO calculation
related to IM
´ ADVCMP_DECOMP.*
´ADVCMP_DECOMP_HPK : SIMD functions
´ADVCMP_DECOMP_PCODE : Portable Code
Machine (usually comparison functions and
results)
Tracing SIMD in Oracle 12c
´ IM_optimizer
´ Information available in trace file
´ IMCU Pruning ratio
´ CU decompression costing (per IMCU)
´ Predicate evaluation costing (per row)
´ Statement has to be parsed to get results
Tracing SIMD in Oracle 12c
select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;
Tracing SIMD in Oracle 12c
´ This information is available in CBO trace file (10053 or
SQL_costing event)
Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Information is available in the trace file (for each IMCU
processed)
´ Used library and function
´ Number of rows and counting algorithm
´ Processing rate (comparison and decompression if relevant)
´ But nothing on the results of the processing :-(
Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Gives information about SIMD function usage and filtering
(after IMCU pruning)
´ Example: inmemory table with NO MEMCOMPRESS or DML
compression
Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Example: inmemory compressed table
´ SIMD instructions are used only in the kdzk_eq_dict functions
Tracing SIMD in Oracle 12c
´ My thoughts about compression/decompression
´ NO MEMCOMPRESS / COMPRESS FOR DML
´ kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit,
kdzk_le_dynp_32bit etc.)
´ FOR QUERY LOW / QUERY HIGH
´ Dictionary Encoding (LZW ?) : kdzk_*dict* functions (ex:
kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
´ Run Length Encoding: kdzk_burst_rle* functions (ex:
kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
´ Bit packing compression: kdzk*fixed* functions (ex:
kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
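To show why dictionary encoding suits this kind of filtering, here is a minimal C sketch of an equality predicate evaluated on dictionary codes. It is an assumption, not a description of the kdzk_*dict* functions: the names `dict_lookup` and `count_eq_dict` are hypothetical, codes are stored as plain 8-bit values (so the dictionary holds at most 256 sorted distinct values) and the scan is written as a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search in a sorted dictionary of distinct values.
 * Returns the code (index) of `key`, or -1 when it is absent. */
int dict_lookup(const int32_t *dict, size_t dict_size, int32_t key)
{
    size_t lo = 0, hi = dict_size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (dict[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < dict_size && dict[lo] == key) ? (int)lo : -1;
}

/* Count rows matching "col = key": the key is translated once into a
 * code, then the scan compares small codes instead of full values. */
size_t count_eq_dict(const uint8_t *codes, size_t n_rows,
                     const int32_t *dict, size_t dict_size, int32_t key)
{
    assert(dict_size <= 256);  /* codes must fit in 8 bits */
    int code = dict_lookup(dict, dict_size, key);
    if (code < 0)
        return 0;  /* value not in this column unit: no row can match */
    size_t matches = 0;
    for (size_t i = 0; i < n_rows; i++)
        if (codes[i] == (uint8_t)code)
            matches++;
    return matches;
}
```

Because the codes are narrow, many more of them fit in one SIMD register than full values would, which is presumably why the function names carry a bit width (7bit, 4bit and so on).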
Tracing SIMD in Oracle 12c
´ My thoughts about compression/decompression
´ FOR CAPACITY LOW
´ FOR QUERY LOW + additional proprietary compression (OZIP)
´ Functions: ozip_decode_dict*, kdzk_ozip_decode* (Ex:
kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
´ FOR CAPACITY HIGH
´ FOR QUERY HIGH + heavyweight compression algorithm
´ Compression/decompression method depends on:
´ Datatype
´ Column Compression Unit size
´ Column contents
leturgezl@gmail.com
http://laurent-leturgez.com
@lleturgez
