Everything MySQL

A great place to be!

So... in the shared hosting a DBA comes across many bad queries. For one reason or another a developer or admin or the guy who someone paid $10/hr writes SQL like this:

SELECT DISTINCT Resort.ResortID, Resort.Name, Resort.Address1, Resort.Address2, Resort.Address3, Resort.Address4
FROM Resort, ResortAttribute c2, ResortAttribute c3, ResortAttribute c4, ResortAttribute c5, ResortAttribute c6,
ResortAttribute c7, ResortAttribute c8, ResortAttribute c9, ResortAttribute c10, ResortAttribute c11, ResortAttribute c12,
ResortAttribute c13, ResortAttribute c14, ResortAttribute c15, ResortAttribute c16, ResortAttribute c17, ResortAttribute c18,
ResortAttribute c19, ResortAttribute c20, ResortAttribute c21
WHERE Resort.ResortID = c2.ResortID
AND Resort.ResortID = c3.ResortID
AND Resort.ResortID = c4.ResortID
AND Resort.ResortID = c5.ResortID
AND Resort.ResortID = c6.ResortID
AND Resort.ResortID = c7.ResortID
AND Resort.ResortID = c8.ResortID
AND Resort.ResortID = c9.ResortID
AND Resort.ResortID = c10.ResortID
AND Resort.ResortID = c11.ResortID
AND Resort.ResortID = c12.ResortID
AND Resort.ResortID = c13.ResortID
AND Resort.ResortID = c14.ResortID
AND Resort.ResortID = c15.ResortID
AND Resort.ResortID = c16.ResortID
AND Resort.ResortID = c17.ResortID
AND Resort.ResortID = c18.ResortID
AND Resort.ResortID = c19.ResortID
AND Resort.ResortID = c20.ResortID
AND Resort.ResortID = c21.ResortID
AND Resort.Name like '%%'
AND c2.AttributeID = 273
AND c3.AttributeID = 357
AND c9.AttributeID = 182
AND c10.AttributeID = 256
AND c11.AttributeID = 258
AND c12.AttributeID = 243
AND c13.AttributeID = 225
AND c14.AttributeID = 208
AND c15.AttributeID = 446
AND c16.AttributeID = 191
AND c17.AttributeID = 204
AND c18.AttributeID = 462
AND c19.AttributeID = 211
AND c20.AttributeID = 216
AND c21.AttributeID = 217
AND (c4.AttributeID = 173
OR c5.AttributeID = 181
OR c6.AttributeID = 184
OR c7.AttributeID = 215
OR c8.AttributeID = 234)
ORDER BY Resort.Name;

*************************** 1. row ***************************
Table: Resort
Create Table: CREATE TABLE `Resort` (
`ResortID` int(11) NOT NULL auto_increment,
`Name` varchar(255) NOT NULL default '',
`Address1` varchar(100) default NULL,
`Address2` varchar(100) default NULL,
`Address3` varchar(100) default NULL,
`Address4` varchar(100) default NULL,
`Phone1` varchar(30) default NULL,
`Phone2` varchar(30) default NULL,
`Phone3` varchar(30) default NULL,
`URL` varchar(255) default NULL,
`Description` text,
`NumberOfUnits` mediumint(11) default NULL,
`AvgPrice` float default NULL,
`RatingOverall` float default NULL,
`RatingGuestRoom` float default NULL,
`RatingService` float default NULL,
`RatingCleanliness` float default NULL,
`RatingDiningExperience` float default NULL,
`RatingRecommendation` float default NULL,
`RatingCommercial` float default NULL,
PRIMARY KEY (`ResortID`)
) TYPE=MyISAM
1 row in set (0.00 sec)

*************************** 1. row ***************************
Table: ResortAttribute
Create Table: CREATE TABLE `ResortAttribute` (
`ResortID` int(11) NOT NULL default '0',
`AttributeID` int(11) NOT NULL default '0',
PRIMARY KEY (`ResortID`,`AttributeID`)
) TYPE=MyISAM
1 row in set (0.00 sec)

Now, I love a good SQL puzzle but this is ridiculous! I sometimes give the SQL above as an interview question and you would not believe how many DBAs cannot solve it.

Here is some background:
1. This was a real query
2. The person who wrote it made it incredible inefficient
3. It is easy to solve if you just sit back and look at it
4. The query took forever to return a result

Ready for the solution? Not so fast! I need someone to comment on this before I reveal the "What where they thinking" and "How to solve it".

Views: 249

Reply to This

Replies to This Discussion

Chris,

I imagine you and I could go back and forth all night and day with bad SQL examples...I think we have in fact.

Let's digest this one:

Someone is searching for the resort that includes all of $x amenities and may include $z amenities. This is obviously a code generated query. The wildcards on resortname tell the story....I want a resort that accommodates: golf, a day spa, horseback riding, para-sailing and dirt-bikes. I would like a resort that accommodates all of that and maybe has: four-star dining, water-park, vegan menu.

Because this is a code generated query so the unqualified search on ResortName is justifiable (someone may put "disneyland" in the resort search string.
SELECT
ResortName, etc...
FROM
Resort r
, ResortAttribute ra
WHERE
ra.resortid = r.resortid
AND
ra.attributeid IN (lots of IDs)
AND r.resortname = "%%"
UNION
SELECT
ResortName, etc...
FROM
Resort r1
, ResortAttribute ra1
WHERE
ra1.resortid = r1.resortid
AND r1.resortname = "%%"
AND
(ra1.AttributeID = ?
OR ra1.AttributeID = ?
OR ra1.AttributeID = ?
OR ra1.AttributeID = ?
OR ra1.AttributeID = ?)
ORDER BY Resort.Name;

That query begs a question though: how does MySQL deal with multiple calls to the same file within"tr a single query? As given, these are MyISAM tables (3 file handles)...is each alias another 3 file handles opened in succession, or is it a single "transaction"?

Cheers,

Dov
I am not sure how MySQL deals with the multiple calls to the same file above but I am just guessing that each one is opened in succession.

Great observation on resortname and this was a script generated query! Here is how I solved it after speaking with the customer:

-- First I added the necessary index
alter table ResortAttribute add index (AttributeID);

-- The new query
SELECT DISTINCT Resort.ResortID, Resort.Name, Resort.Address1, Resort.Address2, Resort.Address3, Resort.Address4
FROM Resort
LEFT JOIN ResortAttribute c2 on (Resort.ResortID=c2.ResortID)
AND Resort.Name like '%%'
AND c2.AttributeID IN(273,357, 182, 256, 258, 243, 225, 208, 446, 191, 204, 462, 211, 216, 217, 173, 181, 184, 215, 234)
ORDER BY Resort.Name
LIMIT 0, 20;

Maybe Sarah can comment on how MySQL deals with multiple call to the same file...
The way MySQL opens tables (and therefore file descriptors in MyISAM's case) is by opening a handle to each table for each request for that table. So even though it's one query, the table would be "opened" twice. All handles are opened at the beginning of the query. For example:

-- Session 1:
SELECT * FROM t1 WHERE i = sleep(10)
UNION
SELECT * FROM t1 WHERE i = sleep(10);

-- Session 2: Immediately run show open tables even though the first query hasn't been going 10 seconds yet
SHOW OPEN TABLES;
+----------+-------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------+-------+--------+-------------+
| test | t1 | 2 | 0 |
+----------+-------+--------+-------------+

Note "In_use" is 2 throughout the entire query.

On a related note, it is often helpful to set the table_cache much larger than the number of tables in your system, since each table can be "opened" several times concurrently by different users or UNIONs, etc.

Does that answer your question?
Thanks a bunch Sarah! This is a great explanation!

Dov,
What did you think? Need anything else for this topic?
PS. As you know, MyISAM has a .MYD file and .MYI. Both files need to be opened the first time, but only the .MYD file needs to be opened for other instances of that table in a query. So in my above example, if I do:
flush tables;
show global status like 'open%files';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Open_files | 3 |
| Opened_files | 88 |
+---------------+-------+

3 open files (.MYI + .MYD*2)
Sarah,

Thanks, that makes sense. If then, the initial query is being run hundreds of times per minute the impact on IO would be remarkable in many cases (multiply that by thousands of tables and 'remarkable' become problematic).
The IO might not be bad (especially if these are covering indexes or at least index lookups), however the number of file descriptors that MySQL opens can be a lot. To avoid the constant opening/closing of the files, increase your table_cache (or table_open_cache in 5.1). And if MySQL gives errors about opening files, increase open_files_limit.
Hold on there, you've changed the return set criteria with that join. The last four AttributeID (I HATE camel-hump notation!) in the original query are "OR"s. Making it all one IN with a LEFT JOIN turns all of the "AND"s into "OR"s.

So if I query:

must have: day spa, childcare, golf course, batting cage
optional: horseback riding

I could get a resort that offers a day spa and horseback riding, but sans golf course...

As far as indexing, the SQL was bad enough that I didn't even look at that aspect. The ResortAttribute table in particular is going to stay so small, I don't know that I would bother for the milliseconds I save on lookups. I would recommend changing both fields to SMALLINT as well, I can't think of more than ~80 resort attributes....

Hopefully Sarah will chime in on the internals of file look up...
I believe that just making an extra OR IN statement would do it. What do you think?

SELECT DISTINCT Resort.ResortID, Resort.Name, Resort.Address1, Resort.Address2, Resort.Address3, Resort.Address4
FROM Resort
LEFT JOIN ResortAttribute c2 on (Resort.ResortID=c2.ResortID)
AND Resort.Name like '%%'
AND c2.AttributeID IN(273,357, 182, 256, 258, 243, 225, 208, 446, 191, 204, 462, 211, )
OR c2.AttributeID IN(216, 217, 173, 181, 184, 215, 234)
ORDER BY Resort.Name
LIMIT 0, 20;
Nope, still not the desired result set.

This has gotten interesting...I've populated (minimally) the tables on a test server of mine:

mysql> select * from Resort;
+----------+---------------+--------------+--------------+----------+----------+-----------+-----------+--------+----------------+-------------+---------------+----------+---------------+-----------------+---------------+-------------------+------------------------+----------------------+------------------+
| ResortID | Name | Address1 | Address2 | Address3 | Address4 | Phone1 | Phone2 | Phone3 | URL | Description | NumberOfUnits | AvgPrice | RatingOverall | RatingGuestRoom | RatingService | RatingCleanliness | RatingDiningExperience | RatingRecommendation | RatingCommercial |
+----------+---------------+--------------+--------------+----------+----------+-----------+-----------+--------+----------------+-------------+---------------+----------+---------------+-----------------+---------------+-------------------+------------------------+----------------------+------------------+
| 1 | disneyland | 1234 anaheim | 4567 anaheim | NULL | NULL | 123456789 | 123456789 | NULL | www.foobar.com | fun | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 2 | disneyworld | 1234 anaheim | 4567 anaheim | NULL | NULL | 123456789 | 123456789 | NULL | www.foobar.com | fun | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3 | hyatt regency | 1234 anaheim | 4567 anaheim | NULL | NULL | 123456789 | 123456789 | NULL | www.foobar.com | fun | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 4 | venetian | 1234 anaheim | 4567 anaheim | NULL | NULL | 123456789 | 123456789 | NULL | www.foobar.com | fun | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----------+---------------+--------------+--------------+----------+----------+-----------+-----------+--------+----------------+-------------+---------------+----------+---------------+-----------------+---------------+-------------------+------------------------+----------------------+------------------+
4 rows in set (0.00 sec)

mysql> select * from ResortAttribute;
+----------+-------------+
| ResortID | AttributeID |
+----------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 7 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 5 |
| 3 | 2 |
| 3 | 6 |
| 4 | 5 |
| 4 | 6 |
+----------+-------------+
12 rows in set (0.19 sec)


Here are the results so far:

Like the original query:

mysql> SELECT DISTINCT Resort.ResortID,
-> Resort.Name,
-> Resort.Address1,
-> Resort.Address2,
-> Resort.Address3,
-> Resort.Address4 FROM Resort,
-> ResortAttribute c2,
-> ResortAttribute c3,
-> ResortAttribute c4,
-> ResortAttribute c5,
-> ResortAttribute c6 WHERE Resort.ResortID = c2.ResortID AND
-> Resort.ResortID = c3.ResortID AND
-> Resort.ResortID = c4.ResortID AND
-> Resort.ResortID = c5.ResortID AND
-> Resort.ResortID = c6.ResortID AND
-> Resort.Name like '%%' AND
-> c2.attributeid = 1 AND
-> c3.attributeid = 2 AND
-> c4.attributeid = 3 AND
-> (c5.attributeid = 5 OR c6.attributeid = 7)
-> order by Resort.name;
+----------+-------------+--------------+--------------+----------+----------+
| ResortID | Name | Address1 | Address2 | Address3 | Address4 |
+----------+-------------+--------------+--------------+----------+----------+
| 1 | disneyland | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 2 | disneyworld | 1234 anaheim | 4567 anaheim | NULL | NULL |
+----------+-------------+--------------+--------------+----------+----------+
2 rows in set (0.08 sec)


That's the result we're looking for, both Disney resorts have AttributeID 1 AND 2 AND 3, and each has one of the optional AttributeIDs (5 OR 7).

My first attempt assumed that the IN statement is EXCLUSIVE, when in fact it is INCLUSIVE. In other words, an IN statement is just a bunch of OR statement values bound in parenthesis:

mysql> SELECT
-> DISTINCT(r.Name) as name,
-> r.Address1,
-> r.Address2,
-> r.Address3,
-> r.Address4
-> FROM
-> Resort r
-> , ResortAttribute ra
-> WHERE
-> ra.resortid = r.resortid
-> AND
-> ra.attributeid IN (1,2,3)
-> AND r.name like "%%"
-> ;
+---------------+--------------+--------------+----------+----------+
| name | Address1 | Address2 | Address3 | Address4 |
+---------------+--------------+--------------+----------+----------+
| disneyland | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 1234 anaheim | 4567 anaheim | NULL | NULL |
| hyatt regency | 1234 anaheim | 4567 anaheim | NULL | NULL |
+---------------+--------------+--------------+----------+----------+
3 rows in set (0.01 sec)


As you can see, the Hyatt is included even though it doesn't have attributes 1 AND 2 AND 3:

mysql> select * from ResortAttribute where ResortID = 3;
+----------+-------------+
| ResortID | AttributeID |
+----------+-------------+
| 3 | 2 |
| 3 | 6 |
+----------+-------------+
2 rows in set (0.00 sec)

Here is the result of your first correction to the SQL:

mysql> SELECT DISTINCT Resort.ResortID, Resort.Name, Resort.Address1, Resort.Address2, Resort.Address3, Resort.Address4
-> FROM Resort
-> LEFT JOIN ResortAttribute c2 on (Resort.ResortID=c2.ResortID)
-> AND Resort.Name like '%%'
-> AND c2.AttributeID IN(1,2,3,5,7)
-> ORDER BY Resort.Name
-> LIMIT 0, 20;
+----------+---------------+--------------+--------------+----------+----------+
| ResortID | Name | Address1 | Address2 | Address3 | Address4 |
+----------+---------------+--------------+--------------+----------+----------+
| 1 | disneyland | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 2 | disneyworld | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 3 | hyatt regency | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 4 | venetian | 1234 anaheim | 4567 anaheim | NULL | NULL |
+----------+---------------+--------------+--------------+----------+----------+
4 rows in set (0.01 sec)

It gives us all the results in this (admittedly small) data set because all conditions are met:

Same for the extra OR statement:

mysql> SELECT DISTINCT Resort.ResortID, Resort.Name, Resort.Address1, Resort.Address2, Resort.Address3, Resort.Address4
-> FROM Resort
-> LEFT JOIN ResortAttribute c2 on (Resort.ResortID=c2.ResortID)
-> AND Resort.Name like '%%'
-> AND c2.AttributeID IN(1,2,3)
-> OR c2.AttributeID IN(5,7)
-> ORDER BY Resort.Name
-> LIMIT 0, 20;
+----------+---------------+--------------+--------------+----------+----------+
| ResortID | Name | Address1 | Address2 | Address3 | Address4 |
+----------+---------------+--------------+--------------+----------+----------+
| 1 | disneyland | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 2 | disneyworld | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 3 | hyatt regency | 1234 anaheim | 4567 anaheim | NULL | NULL |
| 4 | venetian | 1234 anaheim | 4567 anaheim | NULL | NULL |
+----------+---------------+--------------+--------------+----------+----------+
4 rows in set (0.00 sec)

We can overcome part of this limitation with a HAVING clause:

mysql> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3)
-> GROUP BY r1.ResortID
-> HAVING count1 = 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 3 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 3 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
2 rows in set (0.03 sec)

That gives us exclusivity on the IN statement (the value of 'count1' is the number of AttributeIDs the IN clause contains, in this instance 3.). However, the OR values are still tricky...I tried the UNION again, but it still isn't exclusive:

mysql> SELECT
-> DISTINCT(r.Name)
-> ,count(r.ResortID) as count
-> , r.ResortID
-> , r.Address1
-> , r.Address2
-> , r.Address3
-> , r.Address4
-> FROM
-> Resort r
-> , ResortAttribute ra
-> WHERE
-> ra.ResortID = r.ResortID
-> AND
-> ra.attributeid IN (5,7)
-> AND r.Name like "%%"
-> GROUP BY r.ResortID
-> UNION ALL
-> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3)
-> GROUP BY r1.ResortID
-> HAVING count1 = 3;
+-------------+-------+----------+--------------+--------------+----------+----------+
| Name | count | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+-------+----------+--------------+--------------+----------+----------+
| disneyland | 1 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 1 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| venetian | 1 | 4 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyland | 3 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 3 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+-------+----------+--------------+--------------+----------+----------+
5 rows in set (0.03 sec)


So we're still behind the eight ball...but, this using the HAVING clause, we can add some neat functionality. Say you want to rank Resorts on how much of the criteria was met. We can add the OR conditional and change the having to an >=:

mysql> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3,5,7)
-> GROUP BY r1.ResortID
-> HAVING count1 >= 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 4 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 4 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
2 rows in set (0.00 sec)


In this example it returns the 'correct' data set, but that is a result of the limited amount of data in the example:

mysql> insert into ResortAttribute values(4,1),(4,2);
Query OK, 2 rows affected (0.40 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3,5,7)
-> GROUP BY r1.ResortID
-> HAVING count1 >= 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 4 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 4 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| venetian | 3 | 4 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
3 rows in set (0.18 sec)

mysql> select * from ResortAttribute where ResortID = 4;
+----------+-------------+
| ResortID | AttributeID |
+----------+-------------+
| 4 | 1 |
| 4 | 2 |
| 4 | 5 |
| 4 | 6 |
+----------+-------------+
4 rows in set (0.00 sec)


So now we have three Resorts, two of which are an 80% match (count1 / $total_number_of_attributes) and one which is a 60% match. Kind of neat, and a nice way to use the feature, but still not the data set that the original query returns:

mysql> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3,5,7)
-> GROUP BY r1.ResortID
-> HAVING count1 >= 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 4 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 4 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| venetian | 3 | 4 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
3 rows in set (0.45 sec)





The director of net-ops I work with was intrigued by this problem, meaning it's a good one...we will rarely spend too much time on something that isn't 'interesting' and we've deemed this one interesting. His suggestion was to bitmask the whole shebang:

ResortAttribute 1234567
_______________________
Resort 1|1110001
2|1110100
3|0100010
4|1100110


I'm not sure how to accomplish this without breaking my brain...

I'll figure it out. When I do, who do I send the invoice to? I don't work cheap you know! :P

Cheers,

Dov
Incidentally, I should have written the HAVING clause like this:

SELECT
DISTINCT(r1.Name)
, count(r1.ResortID) AS count1
, r1.ResortID
, r1.Address1
, r1.Address2
, r1.Address3
, r1.Address4
FROM
Resort r1
, ResortAttribute ra1
WHERE
ra1.ResortID = r1.ResortID
AND
r1.name like "%%"
AND
ra1.AttributeID IN (1,2,3,5,7)
GROUP BY r1.ResortID
HAVING count1 > 3;

Because the Resort MUST have all three AttributeIDs; 1, 2 and 3, AND it must have ONE of the optional attributes. But that doesn't solve the problem:

mysql> insert into ResortAttribute values(4,7);
Query OK, 1 row affected (0.00 sec)

ResortID 4 now has four of the attributes, two of three required and two of the optional:

mysql> select * from ResortAttribute where ResortID = 4;
+----------+-------------+
| ResortID | AttributeID |
+----------+-------------+
| 4 | 1 |
| 4 | 2 |
| 4 | 5 |
| 4 | 6 |
| 4 | 7 |
+----------+-------------+
5 rows in set (0.00 sec)

And as such is meets all the criteria:

mysql> SELECT
-> DISTINCT(r1.Name)
-> , count(r1.ResortID) AS count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3,5,7)
-> GROUP BY r1.ResortID
-> HAVING count1 > 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 4 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 4 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| venetian | 4 | 4 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
3 rows in set (0.01 sec)


One last thing, if you are going to index the AttributeID, you might as well make it a composite AttributeID, ResortID and let the optimizer choose where to use that or the PK depending on the situation...

Dov
Bingo!

mysql> SELECT
-> DISTINCT(r1.Name)
-> , COUNT(r1.ResortID) as count1
-> , r1.ResortID
-> , r1.Address1
-> , r1.Address2
-> , r1.Address3
-> , r1.Address4
-> FROM
-> Resort r1
-> , ResortAttribute ra1
-> WHERE
-> ra1.ResortID = r1.ResortID
-> AND
-> r1.name like "%%"
-> AND
-> ra1.AttributeID IN (1,2,3)
-> AND
-> r1.Name =
-> (SELECT
-> DISTINCT(r2.Name)
-> FROM
-> Resort r2
-> , ResortAttribute ra2
-> WHERE
-> ra2.AttributeID in (5,7)
-> AND
-> r2.ResortID = r1.ResortID
-> )
-> GROUP BY r1.ResortID
-> HAVING count1 = 3;
+-------------+--------+----------+--------------+--------------+----------+----------+
| Name | count1 | ResortID | Address1 | Address2 | Address3 | Address4 |
+-------------+--------+----------+--------------+--------------+----------+----------+
| disneyland | 3 | 1 | 1234 anaheim | 4567 anaheim | NULL | NULL |
| disneyworld | 3 | 2 | 1234 anaheim | 4567 anaheim | NULL | NULL |
+-------------+--------+----------+--------------+--------------+----------+----------+
2 rows in set (0.12 sec)

The key here was the DISTINCT value of the Resort.Name, that insures that the sub-select will only return a single value while still meeting all the criteria.

Chris,

Please try this in the 'real' environment with a variety of values and let me know if there are differing data sets from the original query to the one I've proposed, as this is a really small pool of data.

I'm still playing with this, I can't say it's correct yet. Here's the explain plans with and without the index though:

+----+--------------------+-------+--------+---------------+---------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | r1 | ALL | PRIMARY | NULL | NULL | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | ra1 | ref | PRIMARY | PRIMARY | 4 | dov.r1.ResortID | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | r2 | eq_ref | PRIMARY | PRIMARY | 4 | dov.r1.ResortID | 1 | Using temporary |
| 2 | DEPENDENT SUBQUERY | ra2 | index | NULL | PRIMARY | 8 | NULL | 15 | Using where; Using index |
+----+--------------------+-------+--------+---------------+---------+---------+-----------------+------+----------------------------------------------+
4 rows in set (0.00 sec)

mysql> alter table ResortAttribute add index attr_resort_Idx (`AttributeID`, `ResortID`);
Query OK, 15 rows affected (2.51 sec)
Records: 15 Duplicates: 0 Warnings: 0

mysql> explain SELECT DISTINCT(r1.Name) , COUNT(r1.ResortID) as count1 , r1.ResortID , r1.Address1 , r1.Address2 , r1.Address3 , r1.Address4 FROM Resort r1 , ResortAttribute ra1 WHERE ra1.ResortID = r1.ResortID AND r1.name like "%%" AND ra1.AttributeID IN (2,7) AND r1.Name = (SELECT DISTINCT(r2.Name) FROM Resort r2 , ResortAttribute ra2 WHERE ra2.AttributeID in (3,4) AND r2.ResortID = r1.ResortID ) GROUP BY r1.ResortID HAVING count1 = 2;
+----+--------------------+-------+--------+-------------------------+-----------------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+-------------------------+-----------------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | r1 | ALL | PRIMARY | NULL | NULL | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | ra1 | ref | PRIMARY,attr_resort_Idx | PRIMARY | 4 | dov.r1.ResortID | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | r2 | eq_ref | PRIMARY | PRIMARY | 4 | dov.r1.ResortID | 1 | Using temporary |
| 2 | DEPENDENT SUBQUERY | ra2 | range | attr_resort_Idx | attr_resort_Idx | 4 | NULL | 2 | Using where; Using index |
+----+--------------------+-------+--------+-------------------------+-----------------+---------+-----------------+------+----------------------------------------------+
4 rows in set (0.56 sec)


Cheers,

Dov

RSS

© 2017   Created by Chris.   Powered by

Badges  |  Report an Issue  |  Terms of Service