Everything MySQL

A great place to be!

Broken Index from InnoDB Hot Backup

A very interesting problem came up a while back when testing a rebuild and failover procedure. I had just run a rebuild of a slave server with InnoDB Hot Backup from the master. After the failover, one query on three tables in three different databases was not performing as it should. This was very odd to me given that the same table in all three databases was acting up. Below is the table structure and example query:

mysql> show create table plx_async_job\\\\G
*************************** 1. row ***************************
Table: plx_async_job
Create Table: CREATE TABLE `plx_async_job` (
`async_job_id` int(11) NOT NULL AUTO_INCREMENT,
`db_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
`user_id` int(11) NOT NULL DEFAULT '0',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`job_status` tinyint(4) NOT NULL DEFAULT '0',
`job_type` smallint(6) NOT NULL DEFAULT '0',
`run_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`obj_id` int(11) NOT NULL DEFAULT '0',
`job_data` blob NOT NULL,
PRIMARY KEY (`async_job_id`),
KEY `job_status` (`job_status`,`job_type`,`run_at`),
KEY `user_id` (`user_id`,`job_status`,`job_type`,`obj_id`)
) ENGINE=InnoDB

mysql> explain SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
| 1 | SIMPLE | plx_async_job | range | job_status | job_status | 11 | NULL | 333072 | Using where |
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

Keep in mind that there is an average 188K rows in the table on all three databases. I was a bit puzzled when I saw the explain output running an index search on almost double the rows. I have to say that I have never seen this behavior before and initially equated this to a bug in MySQL 5.4.0.

I started trouble shooting this issue by running the same query on all three databases on the old master and the new master. Both the explain output and the query execution time did not match up to what I was seeing on the new master. Here are the examples:

On the new master (the server that was rebuilt from InnoDB Hot Backup)

mysql> explain SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
| 1 | SIMPLE | plx_async_job | range | job_status | job_status | 11 | NULL | 333072 | Using where |
+----+-------------+---------------+-------+---------------+------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

mysql> SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
Empty set (4.57 sec)

On the old master (Where the backup was taken from)

mysql> explain SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | plx_async_job | range | job_status | job_status | 11 | NULL | 1 | Using where |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+

mysql> SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
Empty set (0.00 sec)

The information above pointed to InnoDB fragmentation; however, normal InnoDB fragmentation occurs over time based on usage, not from an InnoDB Hot Backup. I tested the fragmented theory by trying to defragment the table. There are three ways to do this. 1. mysqldump with –order-by-primary then import 2. Dump and reload the table with SELECT INTO OUTFILE and LOAD DATA INFILE 3. alter table table_name ENGINE=INNODB;

I chose option 2 cause I could not afford the time to fail back and did not want the table to lock for the duration of the table rebuild with option 3.

mysql> alter table plx_async_job rename to plx_async_job_save;

mysql> CREATE TABLE `plx_async_job` (
`async_job_id` int(11) NOT NULL AUTO_INCREMENT,
`db_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
`user_id` int(11) NOT NULL DEFAULT '0',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`job_status` tinyint(4) NOT NULL DEFAULT '0',
`job_type` smallint(6) NOT NULL DEFAULT '0',
`run_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`obj_id` int(11) NOT NULL DEFAULT '0',
`job_data` blob NOT NULL,
PRIMARY KEY (`async_job_id`),
KEY `job_status` (`job_status`,`job_type`,`run_at`),
KEY `user_id` (`user_id`,`job_status`,`job_type`,`obj_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

mysql> select async_job_id,db_id,user_id,created,modified,job_status,job_type,run_at,obj_id,job_data INTO OUTFILE '/var/tmp/user3.plx_async_job_save.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\\\\n'
FROM plx_async_job_save;

mysql> LOAD DATA INFILE '/var/tmp/user3.plx_async_job_save.txt' IGNORE INTO TABLE plx_async_job
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\\\\n';

The process took less that 1 minute to complete on each table. Both the explain output and execution time took were now showing the following:

mysql> explain SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | plx_async_job | range | job_status | job_status | 11 | NULL | 1 | Using where |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+-------------+

mysql> SELECT plx_async_job.async_job_id, plx_async_job.db_id, plx_async_job.user_id, plx_async_job.created, plx_async_job.modified, plx_async_job.job_status, plx_async_job.job_type, plx_async_job.run_at, plx_async_job.obj_id, plx_async_job.job_data FROM plx_async_job where job_status = 0 and job_type = 18 and run_at <= '2009-10-26 22:19:21' LIMIT 1000;
Empty set (0.00 sec)

Conclusions:
It’s very easy to trust what you have always used, tools like InnoDB Hot Backup, mysqldump and maybe even XtraBackup… That said, these tools in the right situation can cause annoying headaches so just remember to take a step back and remember the basics. In this case it was standard InnoDB fragmentation that was overlooked initially.

Views: 458

Comment

You need to be a member of Everything MySQL to add comments!

Join Everything MySQL

RSS

© 2024   Created by Chris.   Powered by

Badges  |  Report an Issue  |  Terms of Service