Transactional Storage for MySQL
        FAST. RELIABLE. PROVEN.



InnoDB Internals: InnoDB File Formats and Source Code Structure
   MySQL University, October 2009


                   Calvin Sun
               Principal Engineer
               Oracle Corporation
Today’s Topics
•   Goals of InnoDB
•   Key Functional Characteristics
•   InnoDB Design Considerations
•   InnoDB Architecture
•   InnoDB On Disk Format
•   Source Code Structure
•   Q&A
Goals of InnoDB


•   OLTP oriented
•   Performance, Reliability, Scalability
•   Data Protection
•   Portability
InnoDB Key Functional Characteristics
•   Full transaction support
•   Row-level locking
•   MVCC
•   Crash recovery
•   Efficient IO
Design Considerations
• Modeled on Gray & Reuter’s “Transaction
 Processing: Concepts and Techniques”
• Also emulated the Oracle architecture
• Added unique subsystems
  • Doublewrite
  • Insert buffering
  • Adaptive hash index
• Designed to evolve with changing
  hardware & requirements
InnoDB Architecture
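[Diagram: InnoDB layers, from top to bottom]
• Callers: MySQL Server (through the Handler API) or Applications (through the Embedded InnoDB API)
• Transaction
• Cursor / Row
• B-tree, alongside Mini-transaction and Lock
• Page
• Buffer
• File Space Manager
• IO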
InnoDB On Disk Format
•   InnoDB Database Files
•   InnoDB Tablespaces
•   InnoDB Pages / Extents
•   InnoDB Rows
•   InnoDB Indexes
•   InnoDB Logs
•   File Format Design Considerations
InnoDB Database Files
[Diagram: contents of the MySQL Data Directory]
• System tablespace (ibdata files): internal data dictionary, insert buffer, undo logs
• InnoDB tables: in the system tablespace, OR, with innodb_file_per_table, in .ibd files
• .frm files
InnoDB Tablespaces
• A tablespace consists of multiple files and/or
  raw disk partitions.
  file_name:file_size[:autoextend[:max:max_file_size]]
• A file/partition is a collection of segments.
• A segment consists of fixed-length pages.
• The page size is always 16KB in uncompressed
  tablespaces, and 1KB-16KB in compressed
  tablespaces (for both data and index).
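The data file specification above reads like a small grammar, so it can be illustrated with a parser for a single entry. This is a minimal sketch, not InnoDB's own parser: the names data_file_spec, parse_size and parse_data_file_spec are hypothetical, sizes are assumed to carry an optional K, M or G suffix, and file names containing a colon (such as Windows drive letters) are not handled.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one
   file_name:file_size[:autoextend[:max:max_file_size]] entry. */
typedef struct {
    char     name[256];
    uint64_t size_bytes;
    int      autoextend;      /* 1 if ":autoextend" was given */
    uint64_t max_size_bytes;  /* 0 if no ":max:" limit was given */
} data_file_spec;

/* Parse a size such as "10M"; on success store it in *out, advance *s
   past the size and return 0. Return -1 if no digits are found. */
static int parse_size(const char **s, uint64_t *out)
{
    char *end;
    unsigned long long n = strtoull(*s, &end, 10);
    if (end == *s) return -1;
    switch (*end) {
    case 'K': case 'k': n <<= 10; end++; break;
    case 'M': case 'm': n <<= 20; end++; break;
    case 'G': case 'g': n <<= 30; end++; break;
    default: break;           /* no suffix: taken as bytes in this sketch */
    }
    *out = n;
    *s = end;
    return 0;
}

/* Returns 0 on success, -1 on a malformed entry. */
int parse_data_file_spec(const char *spec, data_file_spec *out)
{
    const char *colon = strchr(spec, ':');
    const char *p;
    size_t len;

    if (colon == NULL) return -1;
    len = (size_t)(colon - spec);
    if (len == 0 || len >= sizeof out->name) return -1;
    memcpy(out->name, spec, len);
    out->name[len] = '\0';
    out->autoextend = 0;
    out->max_size_bytes = 0;

    p = colon + 1;
    if (parse_size(&p, &out->size_bytes) != 0) return -1;
    if (*p == '\0') return 0;                    /* file_name:file_size */

    if (strncmp(p, ":autoextend", 11) != 0) return -1;
    out->autoextend = 1;
    p += 11;
    if (*p == '\0') return 0;                    /* ...:autoextend */

    if (strncmp(p, ":max:", 5) != 0) return -1;
    p += 5;
    if (parse_size(&p, &out->max_size_bytes) != 0) return -1;
    return *p == '\0' ? 0 : -1;                  /* ...:max:max_file_size */
}
```

For instance, parsing "ibdata1:10M:autoextend:max:500M" with this sketch yields the name ibdata1, an initial size of 10 MB, the autoextend flag set, and a 500 MB cap.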
System Tablespace
•   Internal Data Dictionary
•   Undo
•   Insert Buffer
•   Doublewrite Buffer
•   MySQL Replication Info
InnoDB Tablespaces
[Diagram: storage hierarchy]
• Tablespace: holds segments (leaf node segment, non-leaf node segment, rollback segment)
• Segment: made up of extents
• Extent: made up of pages (an extent = 64 pages)
• Page: holds rows
• Row: Trx id, Roll pointer, Field pointers, Field 1, Field 2, ..., Field n
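The fixed page size and the "an extent = 64 pages" rule make address arithmetic simple. The sketch below assumes a 16KB uncompressed page and a tablespace held in a single file with page 0 at byte offset 0; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE        16384u  /* 16KB uncompressed page */
#define SKETCH_PAGES_PER_EXTENT 64u     /* an extent = 64 pages */

/* Size of one extent in bytes: 64 pages of 16KB each, i.e. 1MB. */
uint64_t extent_size_bytes(void)
{
    return (uint64_t)SKETCH_PAGE_SIZE * SKETCH_PAGES_PER_EXTENT;
}

/* Byte offset of a page, assuming pages are laid out back to back
   from the start of a single tablespace file. */
uint64_t page_byte_offset(uint32_t page_no)
{
    return (uint64_t)page_no * SKETCH_PAGE_SIZE;
}

/* Index of the extent that contains the given page. */
uint32_t page_extent_no(uint32_t page_no)
{
    return page_no / SKETCH_PAGES_PER_EXTENT;
}
```

Under these assumptions page 64 is the first page of extent 1 and starts exactly 1MB into the file.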
InnoDB Pages
InnoDB Page Types
Symbol                  Value   Notes
FIL_PAGE_INODE          3       File segment inode
FIL_PAGE_INDEX          17855   B-tree node
FIL_PAGE_TYPE_BLOB      10      Uncompressed BLOB page
FIL_PAGE_TYPE_ZBLOB     11      1st compressed BLOB page
FIL_PAGE_TYPE_ZBLOB2    12      Subsequent compressed BLOB page
FIL_PAGE_TYPE_SYS       6       System page
FIL_PAGE_TYPE_TRX_SYS   7       Transaction system page
others                          Insert buffer bitmap, insert buffer free list, file space header, extent descriptor page, newly allocated page
InnoDB Pages
A page consists of: a page header, a page
  trailer, and a page body (rows or other
  contents).
[Diagram: page layout, top to bottom: page header; rows; row offset array; page trailer]
Page Declares
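/* Stand-in typedefs so that the declarations below compile on their own. */
typedef unsigned long ulint;                /* unsigned machine word */
typedef struct { ulint high, low; } dulint; /* 64-bit value held as two words */
typedef ulint PAGE_TYPE;                    /* file page type code */

typedef struct                    /* a space address */
{
    ulint     pageno;             /* page number within the file */
    ulint     boffset;            /* byte offset within the page */
} fil_addr_t;

typedef struct                    /* conceptual layout, not the exact on-disk encoding */
{
    ulint      checksum;          /* checksum of the page (since 4.0.14) */
    ulint      page_offset;       /* page offset inside space */
    fil_addr_t previous;          /* offset or fil_addr_t */
    fil_addr_t next;              /* offset or fil_addr_t */
    dulint     page_lsn;          /* lsn of the end of the newest
                                     modification log record to the page */
    PAGE_TYPE  page_type;         /* file page type */
    dulint     file_flush_lsn;    /* the file has been flushed to disk
                                     at least up to this lsn */
    int        space_id;          /* space id of the page */
    char       data[];            /* will grow */
    /* The page trailer follows the data. C allows no members after a
       flexible array member, so the trailer is declared separately. */
} PAGE;

typedef struct                    /* page trailer, stored after the data */
{
    ulint      page_lsn_low;      /* the last 4 bytes of page_lsn */
    ulint      checksum;          /* page checksum, or checksum magic, or 0 */
} PAGE_TRAILER;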
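The declaration above is conceptual; on disk the header fields are packed at fixed byte offsets. The following sketch reads two of them from a raw page buffer. It assumes the field widths used by InnoDB's file header (4-byte checksum, page number, previous and next; 8-byte LSNs; 2-byte page type; 4-byte space id) and big-endian byte order. The OFF_* names and the helper functions are hypothetical; the page type values are the ones listed in the InnoDB Page Types table.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offsets, following the field order of the declaration. */
enum {
    OFF_CHECKSUM  = 0,   /* 4 bytes */
    OFF_PAGE_NO   = 4,   /* 4 bytes */
    OFF_PREV      = 8,   /* 4 bytes */
    OFF_NEXT      = 12,  /* 4 bytes */
    OFF_LSN       = 16,  /* 8 bytes */
    OFF_TYPE      = 24,  /* 2 bytes */
    OFF_FLUSH_LSN = 26,  /* 8 bytes */
    OFF_SPACE_ID  = 34   /* 4 bytes */
};

/* Page type values from the InnoDB Page Types table. */
enum {
    FIL_PAGE_INODE        = 3,
    FIL_PAGE_TYPE_SYS     = 6,
    FIL_PAGE_TYPE_TRX_SYS = 7,
    FIL_PAGE_TYPE_BLOB    = 10,
    FIL_PAGE_TYPE_ZBLOB   = 11,
    FIL_PAGE_TYPE_ZBLOB2  = 12,
    FIL_PAGE_INDEX        = 17855
};

/* Read an n-byte (n <= 4) big-endian unsigned integer. */
static uint32_t read_be(const unsigned char *p, size_t n)
{
    uint32_t v = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

uint16_t page_get_type(const unsigned char *page)
{
    return (uint16_t)read_be(page + OFF_TYPE, 2);
}

uint32_t page_get_space_id(const unsigned char *page)
{
    return read_be(page + OFF_SPACE_ID, 4);
}

/* Map a page type value to its symbol; unknown values fall under "others". */
const char *page_type_name(uint16_t type)
{
    switch (type) {
    case FIL_PAGE_INODE:        return "FIL_PAGE_INODE";
    case FIL_PAGE_TYPE_SYS:     return "FIL_PAGE_TYPE_SYS";
    case FIL_PAGE_TYPE_TRX_SYS: return "FIL_PAGE_TYPE_TRX_SYS";
    case FIL_PAGE_TYPE_BLOB:    return "FIL_PAGE_TYPE_BLOB";
    case FIL_PAGE_TYPE_ZBLOB:   return "FIL_PAGE_TYPE_ZBLOB";
    case FIL_PAGE_TYPE_ZBLOB2:  return "FIL_PAGE_TYPE_ZBLOB2";
    case FIL_PAGE_INDEX:        return "FIL_PAGE_INDEX";
    default:                    return "others";
    }
}
```

Reading through explicit offsets rather than casting the buffer to a struct avoids any dependence on compiler padding or host byte order.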
InnoDB Compressed Pages
[Diagram: compressed page layout, top to bottom: page header; compressed data; modification log; empty space; BLOB pointers; page directory; page trailer]
• InnoDB keeps a “modification log” in each page
• Updates & inserts of small records are written to the log
  w/o page reconstruction; deletes don’t even require
  uncompression
• Log also tells InnoDB if the page will compress to fit page size
• When log space runs out, InnoDB uncompresses the page,
  applies the changes and recompresses the page
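The modification log behaviour described on this slide can be pictured with a toy model. This is only a sketch under simplifying assumptions: the log has a fixed byte capacity, recompression always succeeds and leaves the log empty, and no real compression takes place. The type zip_page_t and the function zip_page_log_write are hypothetical names, not InnoDB APIs.

```c
#include <stddef.h>

/* Toy model of a compressed page: only the modification log is tracked. */
typedef struct {
    size_t   log_capacity;    /* bytes available for the modification log */
    size_t   log_used;        /* bytes of the log currently in use */
    unsigned recompressions;  /* how often the page had to be recompressed */
} zip_page_t;

/* Record a change of rec_len bytes.
   Returns 1 if the change fit into the modification log, or 0 if the log
   was full and the page was (notionally) uncompressed, updated and
   recompressed, which is modelled here as emptying the log. */
int zip_page_log_write(zip_page_t *pg, size_t rec_len)
{
    if (rec_len <= pg->log_capacity - pg->log_used) {
        pg->log_used += rec_len;   /* cheap path: append to the log */
        return 1;
    }
    pg->recompressions++;          /* expensive path: recompress */
    pg->log_used = 0;
    return 0;
}
```

The point of the design is visible even in the toy: most small changes take the cheap append path, and the cost of recompression is paid only when the log fills up.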
InnoDB Rows
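[Diagram: storage of long columns]
• COMPACT format: a 768-byte prefix of the column stays in the row; the rest goes to an overflow page
• DYNAMIC format: only a 20-byte pointer stays in the row; the column goes to an overflow page

Record layout: Record hdr | Trx ID | Roll ptr | Fld ptrs | overflow-page ptr .. | Field values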
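The difference between the two row formats comes down to how many bytes of a long column stay in the row. The sketch below assumes the column has already been chosen for off-page storage (the rule InnoDB uses to choose is not modelled) and that the COMPACT prefix is followed by the same 20-byte overflow pointer; the names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_PREFIX_BYTES       768u  /* in-row prefix in COMPACT format */
#define SKETCH_OVERFLOW_PTR_BYTES 20u   /* in-row pointer to the overflow page */

typedef enum { ROW_COMPACT, ROW_DYNAMIC } row_format_t;

typedef struct {
    size_t in_row;    /* bytes kept in the B-tree record */
    size_t off_page;  /* bytes stored on overflow pages */
} column_placement;

/* Split a long column of len bytes between the row and overflow pages. */
column_placement place_long_column(row_format_t fmt, size_t len)
{
    column_placement p;
    if (fmt == ROW_COMPACT) {
        if (len <= SKETCH_PREFIX_BYTES) {
            p.in_row = len;            /* short enough to stay in the row */
            p.off_page = 0;
        } else {
            p.in_row = SKETCH_PREFIX_BYTES + SKETCH_OVERFLOW_PTR_BYTES;
            p.off_page = len - SKETCH_PREFIX_BYTES;
        }
    } else {
        p.in_row = SKETCH_OVERFLOW_PTR_BYTES;  /* pointer only */
        p.off_page = len;
    }
    return p;
}
```

With many long columns per row, the smaller in-row footprint of DYNAMIC lets more rows fit on each B-tree page.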
InnoDB Indexes - Primary
• Data rows are stored in the B-tree leaf nodes of a clustered index
• B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added.
[Diagram: clustered (primary key) index. Root: PK values 001 - nnn. Internal nodes: 001 – 500, 500 – 800, 801 – nnn. Leaf nodes: 001 - 275, 276 – 500, 501 - 630, 631 - 768, 769 - 800, 801 - 949, 950 - xxx, xxx - nnn. Each leaf node holds key values (for example 501-630) + data for corresponding rows.]
InnoDB Indexes - Secondary
• Secondary index B-tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, used to access the clustering index to obtain the data
[Diagram: a secondary index B-tree over key values A to Z, whose leaf nodes contain PKs, above the clustered (primary key) index over PK values 001 - nnn, whose leaf nodes contain the data.]
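A lookup through a secondary index is therefore a two-step search: first the secondary index yields a primary key, then the clustered index yields the row. The sketch below uses sorted arrays and the standard bsearch function as stand-ins for the two B-trees; the types, the sample column names and lookup_by_name are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Clustered index leaf entry: primary key plus the full row data. */
typedef struct { int pk; const char *name; const char *city; } row_t;

/* Secondary index leaf entry: key value plus the primary key of its row. */
typedef struct { const char *name; int pk; } sec_entry_t;

static int cmp_sec(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const sec_entry_t *)elem)->name);
}

static int cmp_row(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int pk = ((const row_t *)elem)->pk;
    return (k > pk) - (k < pk);
}

/* Step 1: search the secondary index (sorted by name) for the PK.
   Step 2: search the clustered index (sorted by pk) for the row.
   Returns NULL if the name is not indexed. */
const row_t *lookup_by_name(const sec_entry_t *sec, size_t nsec,
                            const row_t *clustered, size_t nrows,
                            const char *name)
{
    const sec_entry_t *e = bsearch(name, sec, nsec, sizeof *sec, cmp_sec);
    if (e == NULL) {
        return NULL;
    }
    return bsearch(&e->pk, clustered, nrows, sizeof *clustered, cmp_row);
}
```

The second search is the cost of storing primary keys rather than physical row addresses in secondary indexes; in exchange, rows can move within the clustered index without secondary indexes being rewritten.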
InnoDB Logging

[Diagram: InnoDB logging data flow]
• Log Buffer: written by the log thread to the redo log files (Log File #1, Log File #2)
• Buffer Pool: written by the write thread to the ibdata files (data, rollback)
• Rollback segments
InnoDB Redo Log



[Diagram: the redo log, marking end of log, start of log, last checkpoint and min LSN]

Redo log structure:
        Space id | PageNo | OpCode | Data
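The record structure and the checkpoint markers can be tied together in a small sketch. It makes several assumptions that the slide does not state: an LSN is treated as a monotonically increasing byte position in the log, each record carries its own LSN, and a record needs replaying only if it is newer than the page_lsn stored in the page header. The struct is an in-memory illustration, not the on-disk encoding, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory picture of one redo record: Space id, PageNo, OpCode, Data. */
typedef struct {
    uint32_t             space_id;  /* tablespace the change applies to */
    uint32_t             page_no;   /* page within that tablespace */
    uint8_t              opcode;    /* kind of change */
    const unsigned char *data;      /* opcode-specific payload */
    size_t               data_len;
    uint64_t             lsn;       /* sketch-only: LSN of this record */
} redo_rec_t;

/* Amount of log between the last checkpoint and the end of the log,
   i.e. the part crash recovery would have to scan. */
uint64_t redo_bytes_since_checkpoint(uint64_t end_lsn, uint64_t checkpoint_lsn)
{
    return end_lsn - checkpoint_lsn;
}

/* Assumed replay rule: apply the record only if the page has not yet
   seen it, judged by comparing the record LSN with the page LSN. */
int redo_rec_needs_apply(const redo_rec_t *rec, uint64_t page_lsn)
{
    return rec->lsn > page_lsn;
}
```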
File Format Management
[Diagram: .ibd data files (file per table)]
• Builtin InnoDB format: “Antelope”
• New “Barracuda” format enables compression, ROW_FORMAT=DYNAMIC
  • Fast index creation, other features do not require Barracuda file format
• Builtin InnoDB can access “Antelope” databases, but not “Barracuda” databases
  • Check file format tag in system tablespace on startup
• Enable a file format with new dynamic parameter innodb_file_format
• Preserves ability to downgrade easily
InnoDB File Format Design Considerations
• Durability
  • Logging, doublewrite, checksum
• Performance
  • Insert buffering, table compression
• Efficiency
  • Dynamic row format, table compression
• Compatibility
  • File format management
Source Code Structure
• 31 subdirectories
• Relevant InnoDB source files on file
  formats
  • Tablespace: fsp0fsp {.c, .ic, .h}
  • Page: page0page, page0zip {.c, .ic, .h}
  • Log: log0log {.c, .ic, .h}
Source Code Subdirectories
•   buf       •   ibuf      •   que
•   data      •   include   •   read
•   db        •   lock      •   rem
•   dict      •   log       •   row
•   dyn       •   math      •   srv
•   eval      •   mem       •   sync
•   fil       •   mtr       •   thr
•   fsp       •   os        •   trx
•   fut       •   page      •   usr
•   ha        •   pars      •   ut
•   handler
Summary: Durability, Performance, Compatibility & Efficiency
• InnoDB is the leading transactional storage engine
  for MySQL
• InnoDB’s architecture is well-suited to modern online
  transactional applications, as well as embedded
  applications.
• InnoDB’s file format is designed for high durability,
  better performance, and ease of management
QUESTIONS
 ANSWERS
InnoDB Size Limits
•   Max # of tables: 4 G
•   Max size of a table: 32TB
•   Columns per table: 1000
•   Max row size: n*4 GB
    • 8 kB if stored on the same page
    • n*4 GB with n BLOBs
• Max key length: 3500
• Maximum tablespace size: 64 TB
• Max # of concurrent trxs: 1023
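The 64 TB tablespace limit can be checked with a line of arithmetic, assuming it follows from 4-byte page numbers combined with the 16KB page size; the function name is hypothetical.

```c
#include <stdint.h>

/* Largest tablespace addressable with 32-bit page numbers:
   2^32 pages times the page size in bytes. */
uint64_t max_tablespace_bytes(uint32_t page_size)
{
    return ((uint64_t)1 << 32) * page_size;
}
```

For a 16KB page this gives 2^32 * 2^14 = 2^46 bytes, which is 64 TB.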
