
Whenever the question of how to ALTER a large table comes up, the answer I always get is "DROP". Then I ask, "What if you wanted to preserve your data?" It is very common for DBAs to think only in terms of what is obviously available in MySQL, ALTER, rather than all of the tools that are really there. I have been placed in situations at many companies where ALTER was just not the way to go.

Given the amount of MySQL usage throughout the Web 2.0/IT industry, and the fact that data sets seem to keep growing, I have come up with a great solution for ALTERing large data sets. Please keep in mind that this is an old feature of MySQL; it also made for a great lightning talk at the Percona conference in April 2008.

-- THE TABLE
mysql> use test;
mysql> show table status like 't1'\G
*************************** 1. row ***************************
Name: t1
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 892130
Avg_row_length: 6544
Data_length: 5838471168
Max_data_length: 0
Index_length: 213909504
Data_free: 9437184
Auto_increment: NULL
Create_time: 2009-09-09 15:50:32
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.35 sec)

mysql> CREATE TABLE `t1` (
`id` varchar(32) NOT NULL,
`id0` varchar(32) DEFAULT NULL,
`id1` varchar(32) DEFAULT NULL,
`intcol1` int(32) DEFAULT NULL,
...
`intcol10` int(32) DEFAULT NULL,
`charcol1` varchar(128) DEFAULT NULL,
...
`charcol30` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id0` (`id0`),
UNIQUE KEY `id1` (`id1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.14 sec)

-- TABLE SIZE
mysql> select ((5838471168+213909504)/1024/1024/1024) as TBL_IN_GB;
+----------------+
| TBL_IN_GB      |
+----------------+
| 5.636718750000 |
+----------------+
1 row in set (0.00 sec)

-- TESTING ALTER
mysql> alter table t1 modify charcol1 varchar(255);
Query OK, 800059 rows affected (25 min 42.40 sec)
Records: 800059 Duplicates: 0 Warnings: 0

OBSERVATIONS:
Big surprise, right? The table space doubled (ALTER rebuilds the table as a full copy) and it took quite a bit of time!
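
To watch the space actually move, the size math from above can be re-run against information_schema (a quick sketch; the view exists in MySQL 5.0+, and output is omitted since it depends on your data):

SELECT (data_length + index_length)/1024/1024/1024 AS TBL_IN_GB
FROM information_schema.TABLES
WHERE table_schema = 'test'
  AND table_name = 't1';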

-- TESTING SELECT OUT AND LOAD IN
mysql> select * from test.t1 INTO OUTFILE '/var/lib/mysql/test.t1.out';
Query OK, 800059 rows affected (2 min 26.22 sec)

mysql> drop table test.t1;
Query OK, 0 rows affected (0.45 sec)

mysql> CREATE TABLE `t1` (
`id` varchar(32) NOT NULL,
`id0` varchar(32) DEFAULT NULL,
`id1` varchar(32) DEFAULT NULL,
`intcol1` int(32) DEFAULT NULL,
...
`intcol10` int(32) DEFAULT NULL,
`charcol1` varchar(255) DEFAULT NULL, -- FROM ALTER STATEMENT
...
`charcol30` varchar(255) DEFAULT NULL, -- NEW
PRIMARY KEY (`id`),
UNIQUE KEY `id0` (`id0`),
UNIQUE KEY `id1` (`id1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.14 sec)

mysql> LOAD DATA INFILE '/var/lib/mysql/test.t1.out' INTO TABLE test.t1;
Query OK, 800059 rows affected (12 min 43.13 sec)
Records: 800059 Deleted: 0 Skipped: 0 Warnings: 0
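
One variation worth trying on the load side (Sarah suggests it in the comments below; it is not benchmarked here, and it is only safe when the dumped data is already known to be clean): switch off unique and foreign key checks for the duration of the LOAD.

SET unique_checks = 0;      -- skip per-row unique validation during the bulk load
SET foreign_key_checks = 0; -- skip FK validation as well
LOAD DATA INFILE '/var/lib/mysql/test.t1.out' INTO TABLE test.t1;
SET unique_checks = 1;      -- turn both back on once the load completes
SET foreign_key_checks = 1;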

-- TABLE BREAKDOWN
30 VARCHAR COLUMNS
10 INT COLUMNS
1 PRIMARY KEY
2 UNIQUE KEYS

-- CONCLUSIONS
ALTER TABLE is slow: the full dump and reload took about 15m09s (2m26s out + 12m43s in), roughly 41% faster than the 25m42s ALTER!

-- APPLICATIONS
It can be said that ALTER TABLE is more efficient when dealing with small tables (< 1GB); however, for any large-scale data set or a large sharded cluster, the dump and reload wins every time!

-- EXAMPLES
1. A 20GB table per database, with 10 databases per server
-- 200GB of data on the server needs to be modified
-- Now multiply that by a server count of 20
--- That is 4TB worth of data (see the quick check below)
2. The examples can go on and on!
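
A quick sanity check on that arithmetic, in the same SELECT style as the size query earlier:

SELECT 20 * 10 AS GB_PER_SERVER,        -- 20GB x 10 databases = 200GB per server
       20 * 10 * 20 / 1024 AS TB_TOTAL; -- x 20 servers = ~3.9TB, call it 4TB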

-- PROCESS
1. If you have a slave, AND YOU SHOULD AT THIS POINT, run STOP SLAVE on it
2. Run the SELECT ... INTO OUTFILE locally, if you have the space, or over NFS (not very fast)
3. DROP or RENAME the table you are trying to ALTER
-- RENAME only if you have the space and you don't trust THE PROCESS!
4. CREATE the same table with the alterations, if you dropped it, or a new table with the alterations
5. IMPORT the file from step 2 with LOAD DATA INFILE
6. Either RENAME to swap the two tables, if you renamed, or run START SLAVE
7. Fail the writes over to the newly altered slave
8. Repeat steps 1-7 on the old master (a sketch of steps 1-6 follows below)
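
Here is that sketch of steps 1 through 6, shown with the RENAME variant (renaming the original out of the way up front, so no swap is needed at the end) and a cut-down two-column stand-in for the real table so the statements line up; names and paths are illustrative:

STOP SLAVE;                                  -- step 1: pause replication on the slave

SELECT * FROM test.t1
INTO OUTFILE '/var/lib/mysql/test.t1.out';   -- step 2: dump locally

RENAME TABLE test.t1 TO test.t1_old;         -- step 3: keep the original around

CREATE TABLE test.t1 (                       -- step 4: recreate with the alteration
  `id` varchar(32) NOT NULL,
  `charcol1` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

LOAD DATA INFILE '/var/lib/mysql/test.t1.out'
INTO TABLE test.t1;                          -- step 5: reload the data

DROP TABLE test.t1_old;                      -- step 6: clean up and resume
START SLAVE;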

-- CONSIDERATIONS
1. How long do you have for the outage?
2. How much downtime can you take?
3. Do you have a replica of the data?
4. Do you have the space, either locally or over NFS? (A query for sizing this follows below.)
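
For consideration 4, a rough way to size what you will need is to sum the table sizes per schema out of information_schema (a sketch; MySQL 5.0+):

SELECT table_schema,
       SUM(data_length + index_length)/1024/1024/1024 AS GB
FROM information_schema.TABLES
GROUP BY table_schema;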


-- COMMENTS


Comment by Chris on September 28, 2009 at 8:31pm
Sweet. We are looking forward to the results.
Comment by Seattlegaucho on September 28, 2009 at 4:08pm
Thank you both for your answers. The tables are big and the outage we can take is not that 'generous'. I'll post the results once I get them.
Comment by Sarah Sproehnle on September 27, 2009 at 7:19pm
Hey Seattlegaucho! Glad to see you posting.

I've not benchmarked this, but you could also consider SELECT ... INTO OUTFILE and LOAD DATA INFILE. It can be faster than a normal bulk insert. And remember that InnoDB is better at inserting in primary key order. I'd also disable unique key/FK checks if you can.
Comment by Chris on September 26, 2009 at 1:43pm
If the tables are not too large, maybe under 2GB, this will work. You also need to consider how much of an outage you can take. I still feel that a full dump out and reload would be faster with larger tables.
Comment by Seattlegaucho on September 25, 2009 at 2:09pm
I'm facing this situation converting a table from MyISAM to InnoDB, no structure changes. Since I have enough space for both tables on disk, I'm thinking that the following procedure might save some time:
1. Create the new table w/ ENGINE=InnoDB (t_innodb for this example), based on SHOW CREATE TABLE t_myisam\G
2. Do an INSERT INTO t_innodb SELECT ... FROM t_myisam.
3. DROP t_myisam and RENAME t_innodb to whatever the original table name was (rough SQL below).
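
Roughly, in SQL (using CREATE TABLE ... LIKE as a shortcut for editing the SHOW CREATE TABLE output; names as above):
CREATE TABLE t_innodb LIKE t_myisam;
ALTER TABLE t_innodb ENGINE=InnoDB;          -- step 1: same structure, new engine
INSERT INTO t_innodb SELECT * FROM t_myisam; -- step 2: copy the rows
DROP TABLE t_myisam;                         -- step 3: free the name...
RENAME TABLE t_innodb TO t_myisam;           -- ...and take it over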

Any comments?
Comment by Chris on September 10, 2009 at 8:46pm
I agree with Ronald on his point that dropping indexes would not help. Plus, if you dropped the index and then ran the dump out and load, I believe that would take the same amount of time as a normal ALTER.

Seattlegaucho,
To your point about the different mechanisms on the dump out and reload... this is another reason why I love this method so much. There are so many options and so many applications.
Comment by Ronald Bradford on September 10, 2009 at 7:13pm
@seattlegaucho Testing has shown that the creation without indexes does not improve performance. I don't recall the specific version, but the detection of an empty table and use of LOAD introduced a delayed creation of keys, as if you performed it manually.
Comment by Seattlegaucho on September 10, 2009 at 7:10pm
... and if you drop indexes before the data load and recreate them after it's finished, chances are it'll be even faster. Not to mention that you can do the same using mysqldump with the right options to dump a single table. Then edit the CREATE TABLE statement and reload. mysqldump will add the DROP INDEX and CREATE INDEX at both ends of the data load.

My $.02
G
