Recovery of lost or corrupted InnoDB tables
MySQL User Conference 2010, Santa Clara
Percona Inc., http://MySQLPerformanceBlog.com

Agenda
- InnoDB format overview
- Internal system tables SYS_INDEXES and SYS_TABLES
- InnoDB Primary and Secondary keys
- Typical failure scenarios
- InnoDB recovery tool

Three things are certain: death, taxes and lost data. Guess which has occurred?

1. InnoDB format overview
How MySQL stores data in InnoDB
- A tablespace (ibdata1)
  - System tablespace (data dictionary, undo, insert buffer, etc.)
  - PRIMARY indices (PK + data)
  - SECONDARY indices (SK + PK): if the key is (f1, f2), it is stored as (f1, f2, PK)
- File per table (.ibd)
  - PRIMARY index
  - SECONDARY indices
- InnoDB page size is 16k (uncompressed)
- Every index is identified by an index_id

How MySQL stores data in InnoDB
Page identifier: index_id

mysql> CREATE TABLE innodb_table_monitor(x int) engine=innodb;

Error log:
TABLE: name test/site_folders, id 0 119, columns 9, indexes 1, appr.rows 1
  COLUMNS: id: DATA_INT len 4 prec 0; name: type 12 len 765 prec 0; sites_count: DATA_INT len 4 prec 0;
           created_at: DATA_INT len 8 prec 0; updated_at: DATA_INT len 8 prec 0;
           DB_ROW_ID: DATA_SYS prtype 256 len 6 prec 0; DB_TRX_ID: DATA_SYS prtype 257 len 6 prec 0;
           DB_ROLL_PTR: DATA_SYS prtype 258 len 7 prec 0;
  INDEX: name PRIMARY, id 0 254, fields 1/7, type 3
   root page 271, appr.key vals 1, leaf pages 1, size pages 1
   FIELDS: id DB_TRX_ID DB_ROLL_PTR name sites_count created_at updated_at

InnoDB page format (from the start of the page to its end)
1. FIL HEADER
2. PAGE_HEADER
3. INFIMUM + SUPREMUM RECORDS
4. USER RECORDS
5. FREE SPACE
6. Page Directory
7. Fil Trailer

InnoDB page format: Fil Header

Name                     Size (bytes)  Remarks
FIL_PAGE_SPACE           4             ID of the space the page is in
FIL_PAGE_OFFSET          4             ordinal page number from start of space
FIL_PAGE_PREV            4             offset of previous page in key order
FIL_PAGE_NEXT            4             offset of next page in key order
FIL_PAGE_LSN             8             log serial number of page's latest log record
FIL_PAGE_TYPE            2             current defined types are: FIL_PAGE_INDEX, FIL_PAGE_UNDO_LOG, FIL_PAGE_INODE, FIL_PAGE_IBUF_FREE_LIST
FIL_PAGE_FILE_FLUSH_LSN  8             "the file has been flushed to disk at least up to this lsn" (log serial number), valid only on the first page of the file
FIL_PAGE_ARCH_LOG_NO     4             the latest archived log file number at the time that FIL_PAGE_FILE_FLUSH_LSN was written (in the log)

Table data is stored in pages of type FIL_PAGE_INDEX (0x45BF); FIL_PAGE_INODE (0x03) pages hold file segment inodes.

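As a rough illustration, the 38-byte Fil Header can be unpacked with Python's struct module. This is only a sketch that follows the field order and sizes from the table above, assuming big-endian integers; `parse_fil_header` and the sample values are hypothetical, and any reuse of these fields in other InnoDB versions is not covered.

```python
import struct

# Field order and sizes follow the Fil Header table; integers are big-endian.
FIL_HEADER = struct.Struct(">IIIIQHQI")  # 4+4+4+4+8+2+8+4 = 38 bytes
FIL_FIELDS = (
    "FIL_PAGE_SPACE", "FIL_PAGE_OFFSET", "FIL_PAGE_PREV", "FIL_PAGE_NEXT",
    "FIL_PAGE_LSN", "FIL_PAGE_TYPE", "FIL_PAGE_FILE_FLUSH_LSN",
    "FIL_PAGE_ARCH_LOG_NO",
)


def parse_fil_header(page: bytes) -> dict:
    """Decode the first 38 bytes of a page into a name -> value dict."""
    return dict(zip(FIL_FIELDS, FIL_HEADER.unpack_from(page, 0)))


# Hypothetical sample page: page number 271 in space 0, made-up LSN,
# 0xFFFFFFFF used here as an assumed "no neighbour" marker.
sample_page = FIL_HEADER.pack(0, 271, 0xFFFFFFFF, 0xFFFFFFFF, 1234, 0x45BF, 0, 0)
sample_page += bytes(16384 - FIL_HEADER.size)

fil_header = parse_fil_header(sample_page)
print(fil_header["FIL_PAGE_OFFSET"], hex(fil_header["FIL_PAGE_TYPE"]))
```

Unpacking by name keeps later checks readable, for example skipping every page whose FIL_PAGE_TYPE is not the one you are looking for.
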
InnoDB page format: Page Header

Name               Size (bytes)  Remarks
PAGE_N_DIR_SLOTS   2             number of directory slots in the Page Directory part; initial value = 2
PAGE_HEAP_TOP      2             record pointer to first record in heap
PAGE_N_HEAP        2             number of heap records; initial value = 2
PAGE_FREE          2             record pointer to first free record
PAGE_GARBAGE       2             "number of bytes in deleted records"
PAGE_LAST_INSERT   2             record pointer to the last inserted record
PAGE_DIRECTION     2             either PAGE_LEFT, PAGE_RIGHT, or PAGE_NO_DIRECTION
PAGE_N_DIRECTION   2             number of consecutive inserts in the same direction, e.g. "last 5 were all to the left"
PAGE_N_RECS        2             number of user records
PAGE_MAX_TRX_ID    8             the highest ID of a transaction which might have changed a record on the page (only set for secondary indexes)
PAGE_LEVEL         2             level within the index (0 for a leaf page)
PAGE_INDEX_ID      8             identifier of the index the page belongs to
PAGE_BTR_SEG_LEAF  10            "file segment header for the leaf pages in a B-tree" (this is irrelevant here)
PAGE_BTR_SEG_TOP   10            "file segment header for the non-leaf pages in a B-tree" (this is irrelevant here)

- PAGE_INDEX_ID is the index_id.
- The highest bit of PAGE_N_HEAP is the row format (1: COMPACT, 0: REDUNDANT).

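The Page Header can be read the same way. The sketch below decodes the fields up to PAGE_INDEX_ID, assuming the header starts right after a 38-byte Fil Header and that integers are big-endian; `parse_page_header` and the sample values are hypothetical. Splitting the 8-byte index id into two 32-bit halves to mimic the monitor's "0 254" notation is likewise an assumption.

```python
import struct

FIL_HEADER_SIZE = 38  # the Page Header starts right after the Fil Header
# Fields up to PAGE_INDEX_ID, in table order; the two 10-byte segment
# headers that follow are not decoded here.
PAGE_HEADER = struct.Struct(">HHHHHHHHHQHQ")  # 9*2 + 8 + 2 + 8 = 36 bytes
PAGE_FIELDS = (
    "PAGE_N_DIR_SLOTS", "PAGE_HEAP_TOP", "PAGE_N_HEAP", "PAGE_FREE",
    "PAGE_GARBAGE", "PAGE_LAST_INSERT", "PAGE_DIRECTION", "PAGE_N_DIRECTION",
    "PAGE_N_RECS", "PAGE_MAX_TRX_ID", "PAGE_LEVEL", "PAGE_INDEX_ID",
)


def parse_page_header(page: bytes) -> dict:
    """Decode the Page Header fields that matter for recovery."""
    values = PAGE_HEADER.unpack_from(page, FIL_HEADER_SIZE)
    return dict(zip(PAGE_FIELDS, values))


# Hypothetical sample: an empty leaf page of index 254.
raw = bytearray(16384)
PAGE_HEADER.pack_into(raw, FIL_HEADER_SIZE, 2, 120, 0x8002, 0, 0, 0, 0, 0, 0, 0, 0, 254)

page_header = parse_page_header(bytes(raw))
index_id = page_header["PAGE_INDEX_ID"]
print(index_id >> 32, index_id & 0xFFFFFFFF)  # high and low 32-bit halves
```
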
InnoDB page format (REDUNDANT): Extra bytes

Name             Size     Description
record_status    2 bits   _ORDINARY, _NODE_PTR, _INFIMUM, _SUPREMUM
deleted_flag     1 bit    1 if record is deleted
min_rec_flag     1 bit    1 if record is predefined minimum record
n_owned          4 bits   number of records owned by this record
heap_no          13 bits  record's order number in heap of index page
n_fields         10 bits  number of fields in this record, 1 to 1023
1byte_offs_flag  1 bit    1 if each Field Start Offsets is 1 byte long (this item is also called the "short" flag)
next 16 bits     16 bits  pointer to next record in page

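To make the bit layout concrete, here is a minimal sketch that decodes the 6 extra bytes of a REDUNDANT record. It assumes the fields are packed most-significant bit first in the order listed above (2 + 1 + 1 + 4 + 13 + 10 + 1 + 16 = 48 bits); `decode_redundant_extra` and the sample bytes are hypothetical.

```python
def decode_redundant_extra(extra: bytes) -> dict:
    """Split the 6 extra bytes of a REDUNDANT record into its bit fields."""
    if len(extra) != 6:
        raise ValueError("REDUNDANT records have exactly 6 extra bytes")
    bits = int.from_bytes(extra, "big")  # 48 bits, first field in the top bits
    return {
        "record_status": (bits >> 46) & 0x3,
        "deleted_flag": (bits >> 45) & 0x1,
        "min_rec_flag": (bits >> 44) & 0x1,
        "n_owned": (bits >> 40) & 0xF,
        "heap_no": (bits >> 27) & 0x1FFF,
        "n_fields": (bits >> 17) & 0x3FF,
        "1byte_offs_flag": (bits >> 16) & 0x1,
        "next": bits & 0xFFFF,
    }


# Hypothetical sample: a delete-marked record, heap_no 2, 5 fields,
# 1-byte offsets, next pointer 0x0074.
sample_bits = (1 << 45) | (2 << 27) | (5 << 17) | (1 << 16) | 0x0074
redundant_extra = decode_redundant_extra(sample_bits.to_bytes(6, "big"))
print(redundant_extra)
```
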
InnoDB page format (COMPACT): Extra bytes

Name                                       Size, bits  Description
record_status, deleted_flag, min_rec_flag  4           4 bits used to delete-mark a record, and mark a predefined minimum record in alphabetical order
n_owned                                    4           the number of records owned by this record (this term is explained in page0page.h)
heap_no                                    13          the order number of this record in the heap of the index page
record type                                3           000=conventional, 001=node pointer (inside B-tree), 010=infimum, 011=supremum, 1xx=reserved
next 16 bits                               16          a relative pointer to the next record in the page

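The COMPACT layout is one byte shorter. The following sketch decodes the 5 extra bytes, assuming the fields are packed most-significant bit first in the order listed above (4 + 4 + 13 + 3 + 16 = 40 bits), that the delete mark and the minimum-record mark are the two lowest of the four flag bits, and that the relative next pointer can be read as a signed 16-bit value. `decode_compact_extra` and the sample are hypothetical.

```python
RECORD_TYPES = {0: "conventional", 1: "node pointer", 2: "infimum", 3: "supremum"}


def decode_compact_extra(extra: bytes) -> dict:
    """Split the 5 extra bytes of a COMPACT record into its bit fields."""
    if len(extra) != 5:
        raise ValueError("COMPACT records have exactly 5 extra bytes")
    bits = int.from_bytes(extra, "big")  # 40 bits, first field in the top bits
    flags = (bits >> 36) & 0xF
    record_type = (bits >> 16) & 0x7
    return {
        "deleted_flag": (flags >> 1) & 0x1,  # assumed bit position
        "min_rec_flag": flags & 0x1,         # assumed bit position
        "n_owned": (bits >> 32) & 0xF,
        "heap_no": (bits >> 19) & 0x1FFF,
        "record_type": RECORD_TYPES.get(record_type, "reserved"),
        # Relative pointer: an offset from this record, so it may be negative.
        "next": int.from_bytes(extra[3:5], "big", signed=True),
    }


# Hypothetical sample: conventional record, heap_no 3, owns 1 record,
# next record 64 bytes earlier in the page.
sample_bits = (1 << 32) | (3 << 19) | (0 << 16) | ((-64) & 0xFFFF)
compact_extra = decode_compact_extra(sample_bits.to_bytes(5, "big"))
print(compact_extra)
```
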
How to check row format?
- Look at the highest bit of PAGE_N_HEAP in the page header
- 0 stands for REDUNDANT, 1 for COMPACT

dc -e "2o `hexdump -C pagefile | grep 00000020 | awk '{ print $12}'` p" | sed 's/./& /g' | awk '{ print $1}'

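The same check can be sketched in Python without relying on text processing of a hex dump. It assumes PAGE_N_HEAP sits at byte offset 42: a 38-byte Fil Header followed by two 2-byte Page Header fields. `row_format` and the two synthetic pages are hypothetical.

```python
import struct

PAGE_N_HEAP_OFFSET = 38 + 4  # Fil Header, then PAGE_N_DIR_SLOTS and PAGE_HEAP_TOP


def row_format(page: bytes) -> str:
    """Return the row format encoded in the highest bit of PAGE_N_HEAP."""
    (n_heap,) = struct.unpack_from(">H", page, PAGE_N_HEAP_OFFSET)
    return "COMPACT" if n_heap & 0x8000 else "REDUNDANT"


# Two synthetic pages that differ only in the top bit of PAGE_N_HEAP.
compact_page = bytearray(16384)
struct.pack_into(">H", compact_page, PAGE_N_HEAP_OFFSET, 0x8002)
redundant_page = bytearray(16384)
struct.pack_into(">H", redundant_page, PAGE_N_HEAP_OFFSET, 0x0002)

print(row_format(bytes(compact_page)), row_format(bytes(redundant_page)))
```

Masking with 0x8000 tests the flag bit directly, so the result does not depend on how many heap records the page holds.
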
Rows in an InnoDB page
- Rows in a single page form a linked list
- The first record is INFIMUM
- The last record is SUPREMUM
- Sorted by primary key

infimum [next] -> 100 data... [next] -> 101 data... [next] -> 102 data... [next] -> 103 data... [next] -> supremum [next = 0]

Records are saved in insert order

insert into t1 values(10, 'aaa');
insert into t1 values(30, 'ccc');
insert into t1 values(20, 'bbb');

JG....................N<E.......
................................
.............................2..
...infimum......supremum......6.
........)....2..aaa.............
...*....2..ccc.... ...........+.
...2..bbb.......................
................................

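The difference between physical order and key order can be shown with a toy model. This is not InnoDB code: `ToyPage` is a hypothetical class that keeps a heap in insert order and a separate chain of next pointers in key order, which is the relationship the dump above illustrates.

```python
from typing import Iterator, List, Optional, Tuple


class ToyPage:
    """Toy model: records live in a heap in insert order and are
    chained by next pointers in primary key order."""

    def __init__(self) -> None:
        self.heap: List[Tuple[int, str]] = []  # physical (insert) order
        self.next: List[Optional[int]] = []    # heap position of the next record
        self.first: Optional[int] = None       # what infimum points to

    def insert(self, key: int, value: str) -> None:
        pos = len(self.heap)
        self.heap.append((key, value))  # always appended, never moved
        self.next.append(None)
        prev, cur = None, self.first
        while cur is not None and self.heap[cur][0] < key:
            prev, cur = cur, self.next[cur]
        self.next[pos] = cur            # relink the chain around the new record
        if prev is None:
            self.first = pos
        else:
            self.next[prev] = pos

    def in_key_order(self) -> Iterator[Tuple[int, str]]:
        cur = self.first
        while cur is not None:          # None plays the role of supremum
            yield self.heap[cur]
            cur = self.next[cur]


toy = ToyPage()
for key, value in [(10, "aaa"), (30, "ccc"), (20, "bbb")]:
    toy.insert(key, value)

print(toy.heap)                  # insert order
print(list(toy.in_key_order()))  # key order
```
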
Row format

Name                 Size
Field Start Offsets  (F*1) or (F*2) bytes
Extra Bytes          6 bytes (5 bytes if COMPACT format)
Field Contents       depends on content

EXAMPLE:
CREATE TABLE `t1` (
  `ID` int(11) unsigned NOT NULL,
  `NAME` varchar(120),
  `N_FIELDS` int(10),
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

REDUNDANT
A row: (10, 'abcdef', 20)
Actually stored as: (10, TRX_ID, PTR_ID, 'abcdef', 20), with field sizes of 4, 6, 7, 6 and 4 bytes

Record layout:
- Field Offsets
- Extra 6 bytes: record_status, deleted_flag, min_rec_flag, n_owned, heap_no, n_fields, 1byte_offs_flag, next
- Fields: 0x00 00 00 0A | ... | ... | abcdef | 0x80 00 00 14

COMPACT
A row: (10, 'abcdef', 20)
Actually stored as: (10, TRX_ID, PTR_ID, 'abcdef', 20)

Record layout:
- Field Offsets: 6 (the length of 'abcdef')
- NULLS: a bit per NULL-able field
- Extra 5 bytes: ..., next
- Fields: 0x00 00 00 0A | ... | ... | abcdef | 0x80 00 00 14

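In both layouts the unsigned ID 10 appears as 0x00 00 00 0A while the signed N_FIELDS 20 appears as 0x80 00 00 14. A minimal sketch of the decoding, assuming integers are big-endian and signed values are stored with the sign bit flipped; `decode_uint` and `decode_int` are hypothetical helper names.

```python
def decode_uint(raw: bytes) -> int:
    """Unsigned integers: plain big-endian."""
    return int.from_bytes(raw, "big")


def decode_int(raw: bytes) -> int:
    """Signed integers: big-endian with the sign bit flipped, which is the
    same as the stored value minus 2**(bits - 1)."""
    return int.from_bytes(raw, "big") - (1 << (8 * len(raw) - 1))


print(decode_uint(bytes.fromhex("0000000A")))  # ID
print(decode_int(bytes.fromhex("80000014")))   # N_FIELDS
```

Flipping the sign bit makes a plain byte-by-byte comparison sort negative values before positive ones, which is presumably why the encoding is used.
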
Data types
- INT types (fixed-size)
- String types
  - VARCHAR(x): variable-size
  - CHAR(x): fixed-size, variable-size if UTF-8
- DECIMAL
  - Stored as strings before 5.0.3, variable in size
  - Binary format after 5.0.3, fixed-size

BLOB and other long fields
- Field length (the so-called offset) is one or two bytes long
- Page size is 16k
- If record size < (UNIV_PAGE_SIZE/2 - 200), i.e. 7992 bytes (just under 8k), the record is stored internally (in a PK page)
- Otherwise: 768 bytes internally, the rest in an external page

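The arithmetic behind the threshold, as a small sketch. It applies the rule exactly as stated on the slide and deliberately ignores any further conditions the server may apply; `max_inline_record` and `split_long_field` are hypothetical names.

```python
UNIV_PAGE_SIZE = 16 * 1024  # 16k pages
INLINE_PREFIX = 768         # bytes of a long field kept in the PK page


def max_inline_record(page_size: int = UNIV_PAGE_SIZE) -> int:
    """Records smaller than this are stored entirely in the PK page."""
    return page_size // 2 - 200


def split_long_field(record_size: int, field_size: int) -> tuple:
    """Return (bytes stored internally, bytes stored externally) for one
    long field, following the rule on the slide."""
    if record_size < max_inline_record():
        return field_size, 0
    inline = min(field_size, INLINE_PREFIX)
    return inline, field_size - inline


print(max_inline_record())
print(split_long_field(1000, 500))
print(split_long_field(20000, 19000))
```
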
2. Internal system tables SYS_INDEXES and SYS_TABLES
Why are SYS_* tables needed?
- Correspondence "table name" -> "index_id"
- Storage for other internal information

How MySQL stores data in InnoDB
SYS_TABLES and SYS_INDEXES: always REDUNDANT format!

SYS_INDEXES (index_id = 0-3):
CREATE TABLE `SYS_INDEXES` (
  `TABLE_ID` bigint(20) unsigned NOT NULL default '0',
  `ID` bigint(20) unsigned NOT NULL default '0',
  `NAME` varchar(120) default NULL,
  `N_FIELDS` int(10) unsigned default NULL,
  `TYPE` int(10) unsigned default NULL,
  `SPACE` int(10) unsigned default NULL,
  `PAGE_NO` int(10) unsigned default NULL,
  PRIMARY KEY (`TABLE_ID`,`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

NAME in SYS_INDEXES is PRIMARY, GEN_CLUST_INDEX or a unique index name.

SYS_TABLES (index_id = 0-1):
CREATE TABLE `SYS_TABLES` (
  `NAME` varchar(255) NOT NULL default '',
  `ID` bigint(20) unsigned NOT NULL default '0',
  `N_COLS` int(10) unsigned default NULL,
  `TYPE` int(10) unsigned default NULL,
  `MIX_ID` bigint(20) unsigned default NULL,
  `MIX_LEN` int(10) unsigned default NULL,
  `CLUSTER_NAME` varchar(255) default NULL,
  `SPACE` int(10) unsigned default NULL,
  PRIMARY KEY (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

How MySQL stores data in InnoDB
Example:

SYS_TABLES
NAME                 ID  …
"archive/msg_store"  40  8  1  0  0  NULL  0
"archive/msg_store"  40  8  1  0  0  NULL  0
"archive/msg_store"  40  8  1  0  0  NULL  0

SYS_INDEXES
TABLE_ID  ID      NAME        …
40        196389  "PRIMARY"   2  3  0  21031026
40        196390  "msg_hash"  1  0  0  21031028

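The lookup that these two tables make possible can be sketched as a simple join: table name to table ID in SYS_TABLES, then table ID to index IDs in SYS_INDEXES. The rows below are taken from the example; the in-memory representation and the function name `index_ids_for` are hypothetical.

```python
# Rows recovered from the two dictionary tables (values from the example).
SYS_TABLES = [
    {"NAME": "archive/msg_store", "ID": 40},
]
SYS_INDEXES = [
    {"TABLE_ID": 40, "ID": 196389, "NAME": "PRIMARY"},
    {"TABLE_ID": 40, "ID": 196390, "NAME": "msg_hash"},
]


def index_ids_for(table_name: str) -> dict:
    """Map each index name of a table to its index_id."""
    table_ids = {row["ID"] for row in SYS_TABLES if row["NAME"] == table_name}
    return {
        row["NAME"]: row["ID"]
        for row in SYS_INDEXES
        if row["TABLE_ID"] in table_ids
    }


print(index_ids_for("archive/msg_store"))
```

The index_id of the PRIMARY index is the one to look for among the pages of a tablespace, because that index holds the row data.
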
3. InnoDB Primary and Secondary keys
Primary key
The table:
CREATE TABLE `t1` (
  `ID` int(11),
  `NAME` varchar(120),
  `N_FIELDS` int(10),
  PRIMARY KEY (`ID`),
  KEY `NAME` (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Fields in the PK: ID, DB_TRX_ID, DB_ROLL_PTR, NAME, N_FIELDS

Secondary key
The table:
CREATE TABLE `t1` (
  `ID` int(11),
  `NAME` varchar(120),
  `N_FIELDS` int(10),
  PRIMARY KEY (`ID`),
  KEY `NAME` (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Fields in the SK: NAME, ID (ID is the primary key)

4. Typical failure scenarios
Deleted records
- DELETE FROM table WHERE id = 5;
- Forgotten WHERE clause
- Band-aid: stop/kill mysqld ASAP

How is a delete performed?
From "row/row0upd.c":
"... /* How is a delete performed? ... The delete is performed by setting the delete bit in the record and substituting the id of the deleting transaction for the original trx id, and substituting a new roll ptr for previous roll ptr. The old trx id and roll ptr are saved in the undo log record. Thus, no physical changes occur in the index tree structure at the time of the delete. Only when the undo log is purged, the index records will be physically deleted from the index trees. ..."

Dropped table/database
- DROP TABLE table;
- DROP DATABASE database;
- Often happens when restoring from an SQL dump
- Bad because the .FRM file goes away
- Especially painful with innodb_file_per_table
- Band-aid:
  - Stop/kill mysqld ASAP
  - Stop IO on the HDD, or mount it read-only, or take a raw image

Corrupted InnoDB tablespace
- Hardware failures
- OS or filesystem failures
- InnoDB bugs
- InnoDB tablespace corrupted by other processes
- Band-aid:
  - Stop mysqld
  - Take a copy of the InnoDB files

Wrong UPDATE statement
- UPDATE user SET Password = PASSWORD('qwerty') WHERE User='root';
- Again a forgotten WHERE clause
- Bad because changes are applied in the PRIMARY index immediately
- The old version goes to the UNDO segment
- Band-aid: stop/kill mysqld ASAP

5. InnoDB recovery tool
Recovery prerequisites
- Media
  - ibdata1
  - *.ibd
  - HDD image
- Table structure
  - SQL dump
  - *.FRM files

table_defs.h (generated by create_defs.pl)

    { /* int(11) unsigned */
        name: "ID",
        type: FT_UINT,
        fixed_length: 4,
        has_limits: TRUE,
        limits: {
            can_be_null: FALSE,
            uint_min_val: 0,
            uint_max_val: 4294967295ULL
        },
        can_be_null: FALSE
    },
    { /* varchar(120) */
        name: "NAME",
        type: FT_CHAR,
        min_length: 0,
        max_length: 120,
        has_limits: TRUE,
        limits: {
            can_be_null: TRUE,
            char_min_len: 0,
            char_max_len: 120,
            char_ascii_only: TRUE
        },
        can_be_null: TRUE
    },

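The limits in these definitions act as filters: a candidate record is accepted only if every field falls inside its limits. The sketch below mirrors the two definitions above in Python; it is not the tool's implementation. The class and function names are hypothetical, and reading char_ascii_only as "every byte is below 0x80" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UIntLimits:
    can_be_null: bool
    min_val: int
    max_val: int


@dataclass
class CharLimits:
    can_be_null: bool
    min_len: int
    max_len: int
    ascii_only: bool


def uint_ok(value: Optional[int], limits: UIntLimits) -> bool:
    if value is None:
        return limits.can_be_null
    return limits.min_val <= value <= limits.max_val


def char_ok(value: Optional[bytes], limits: CharLimits) -> bool:
    if value is None:
        return limits.can_be_null
    if not limits.min_len <= len(value) <= limits.max_len:
        return False
    # Assumption: "ascii only" means every byte is below 0x80.
    return not limits.ascii_only or all(b < 0x80 for b in value)


# The two fields defined above: ID int unsigned NOT NULL, NAME varchar(120).
ID_LIMITS = UIntLimits(can_be_null=False, min_val=0, max_val=4294967295)
NAME_LIMITS = CharLimits(can_be_null=True, min_len=0, max_len=120, ascii_only=True)


def row_ok(row_id: Optional[int], name: Optional[bytes]) -> bool:
    """Accept a candidate record only if every field passes its filter."""
    return uint_ok(row_id, ID_LIMITS) and char_ok(name, NAME_LIMITS)


print(row_ok(1, b"browse"), row_ok(5, b"\xff\xfe"))
```

Tighter limits reject more garbage: if the real IDs never exceed a few thousand, lowering the maximum accordingly would filter out many false positives.
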
How to get CREATE info from .frm files
1. CREATE TABLE t1 (id int) Engine=INNODB;
2. Replace t1.frm with the one whose schema you need
3. Run "show create table t1"

If mysqld crashes:
1. See the end of the file in bvi t1.frm: .ID.NAME.N_FIELDS..
2. *.FRM viewer (TODO)

InnoDB recovery tool
http://launchpad.net/percona-innodb-recovery-tool/
- Written at Percona
- Contributed by Percona and community
- Supported by Percona
- Consists of two major tools:
  - page_parser: splits an InnoDB tablespace into 16k pages
  - constraints_parser: scans a page and finds good records

InnoDB recovery tool: page_parser

server # ./page_parser -4 -f /var/lib/mysql/ibdata1
Opening file: /var/lib/mysql/ibdata1
Read data from fn=3...
Read page #0.. saving it to pages-1259793800/0-18219008/0-00000000.page
Read page #1.. saving it to pages-1259793800/0-0/1-00000001.page
Read page #2.. saving it to pages-1259793800/4294967295-65535/2-00000002.page
Read page #3.. saving it to pages-1259793800/0-0/3-00000003.page

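The idea behind this step can be sketched in a few lines: cut the tablespace into 16k pages and group them by the index id found in each page header. This is not the tool's code. It assumes PAGE_INDEX_ID sits at byte offset 66 (38-byte Fil Header plus 28 bytes of Page Header fields) and that the directory name is the two 32-bit halves of the index id, which is how the output above appears to be organised; `split_pages` is a hypothetical name, and using the sequence number for both numbers in the file name is an assumption.

```python
import struct
from typing import Iterator, Tuple

PAGE_SIZE = 16 * 1024
PAGE_INDEX_ID_OFFSET = 38 + 28  # Fil Header + Page Header fields before PAGE_INDEX_ID


def split_pages(tablespace: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (relative path, page bytes) for every complete page."""
    for start in range(0, len(tablespace) - PAGE_SIZE + 1, PAGE_SIZE):
        page = tablespace[start:start + PAGE_SIZE]
        high, low = struct.unpack_from(">II", page, PAGE_INDEX_ID_OFFSET)
        page_no = start // PAGE_SIZE
        yield f"{high}-{low}/{page_no}-{page_no:08d}.page", page


# Synthetic two-page tablespace with hypothetical index ids.
first = bytearray(PAGE_SIZE)
struct.pack_into(">II", first, PAGE_INDEX_ID_OFFSET, 0, 254)
second = bytearray(PAGE_SIZE)
struct.pack_into(">II", second, PAGE_INDEX_ID_OFFSET, 0xFFFFFFFF, 65535)

page_paths = [path for path, _ in split_pages(bytes(first + second))]
print(page_paths)
```

Grouping by index id means that all pages of one PRIMARY index end up in one directory, ready to be fed to the record scanner.
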
Page signature check

0{....0...4...4......=..E.......
........<..~...A.......|........
................................
...infimum......supremumf.....qT
M/T/196001834/
XXXXX XXXXXXXXXXX L X X X X X X X X X XXXXXX X X XX X X X

- INFIMUM and SUPREMUM records are in fixed positions
- Works with corrupted pages

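A minimal sketch of such a signature check: look for the strings "infimum" and "supremum" at their fixed offsets. The COMPACT offsets (0x63 and 0x70) match the dump above; the REDUNDANT offsets (101 and 116) are an assumption not shown in this deck. `page_signature` is a hypothetical name.

```python
from typing import Optional

# (infimum offset, supremum offset) per row format.
SIGNATURES = {
    "COMPACT": (0x63, 0x70),
    "REDUNDANT": (101, 116),  # assumed, not shown in the slides
}


def page_signature(page: bytes) -> Optional[str]:
    """Return the row format whose infimum/supremum markers are present,
    or None if the page does not look like an index page."""
    for row_format, (inf, sup) in SIGNATURES.items():
        if page[inf:inf + 7] == b"infimum" and page[sup:sup + 8] == b"supremum":
            return row_format
    return None


# Synthetic COMPACT page carrying only the two markers.
marked = bytearray(16384)
marked[0x63:0x63 + 7] = b"infimum"
marked[0x70:0x70 + 8] = b"supremum"

print(page_signature(bytes(marked)), page_signature(bytes(16384)))
```

Because the check reads only a few bytes at known offsets, it still works when the rest of the page is damaged.
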
InnoDB recovery tool: constraints_parser

server # ./constraints_parser -4 -f pages-1259793800/0-16/51-00000051.page

- Table structure is defined in "include/table_defs.h"
- See the HOWTO for details: http://code.google.com/p/innodb-tools/wiki/InnodbRecoveryHowto
- Filters inside table_defs.h are very important

Check InnoDB page before reading recs

# ./constraints_parser -5 -U -f pages/0-418/12665-00012665.page -V
Initializing table definitions...
Processing table: document_type_fieldsets_link
 - total fields: 5
 - nullable fields: 0
 - minimum header size: 5
 - minimum rec size: 25
 - maximum rec size: 25
Read data from fn=3...
Page id: 12665
Checking a page
Infimum offset: 0x63
Supremum offset: 0x70
Next record at offset: 0x9F (159)
Next record at offset: 0xB0 (176)
Next record at offset: 0x3D95 (15765)
…
Next record at offset: 0x70 (112)
Page is good

- Check whether the tool can follow all records by their addresses
- If so, look for a record exactly at the position where each record is
- Helps a lot for COMPACT format!

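The check shown in this output can be sketched as a walk along the next pointers of a COMPACT page, starting at infimum (0x63) and expecting to arrive at supremum (0x70). This is an illustration rather than the tool's code: it assumes each record keeps a signed, big-endian, relative next pointer in the 2 bytes just before the record, and `follow_records` is a hypothetical name.

```python
import struct
from typing import List, Optional

PAGE_SIZE = 16 * 1024
INFIMUM = 0x63
SUPREMUM = 0x70


def follow_records(page: bytes, max_records: int = PAGE_SIZE // 5) -> Optional[List[int]]:
    """Return the record offsets in list order, or None if the chain
    leaves the page, stalls, or never reaches supremum."""
    offsets: List[int] = []
    origin = INFIMUM
    for _ in range(max_records + 1):
        (relative,) = struct.unpack_from(">h", page, origin - 2)
        origin += relative
        if origin == SUPREMUM:
            return offsets  # the page is good
        if relative == 0 or not SUPREMUM < origin < PAGE_SIZE:
            return None     # pointer stalls or points outside the record area
        offsets.append(origin)
    return None             # too many hops: most likely a loop


# Synthetic page: infimum -> 0x9F -> 0xB0 -> supremum.
good = bytearray(PAGE_SIZE)
struct.pack_into(">h", good, INFIMUM - 2, 0x9F - INFIMUM)
struct.pack_into(">h", good, 0x9F - 2, 0xB0 - 0x9F)
struct.pack_into(">h", good, 0xB0 - 2, SUPREMUM - 0xB0)

print(follow_records(bytes(good)), follow_records(bytes(PAGE_SIZE)))
```

When the walk succeeds, the returned offsets are exactly the places where records start, so the scanner does not have to guess record boundaries.
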
Import result

t1	1	"browse"	10
t1	2	"dashboard"	20
t1	3	"addFolder"	18
t1	4	"editFolder"	15

mysql> LOAD DATA INFILE '/path/to/datafile'
       REPLACE INTO TABLE <table_name>
       FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
       LINES STARTING BY '<table_name>\t';

Questions?
Thank you for coming!

References
- http://www.mysqlperformanceblog.com/
- http://percona.com/
- http://www.slideshare.net/guest808c167/recovery-of-lost-or-corrupted-inno-db-tablesmysql-uc-2010

Applause :-)
