Everything MySQL

A great place to be!

This post describes how MySQL Cluster executes queries. First of all, Cluster is a storage engine. It doesn't actually execute queries because it doesn't speak SQL. That is why you use a MySQL server, which parses your queries and sends low-level storage engine API calls to the Cluster data nodes. The data nodes know how to retrieve or store data. Or you can talk to the data nodes directly using the NDB API(s).

MySQL Cluster has various means of executing queries. They boil down to:
  1. Primary key lookup
  2. Unique key lookup
  3. Ordered index scan (i.e., non-unique indexes that use T-trees)
  4. Full table scan

Let's say you have 4 data nodes in your cluster (NoOfReplicas=2). This means you have 2 node groups and each one has half the data. Cluster uses a hash on the primary key (unless you've controlled the partitioning using the 5.1 partitioning features). So for any table, half the rows are in one node group and half the rows are the in other node group.

Now for the 4 types of query execution. You can verify which type of execution is used with EXPLAIN. Here's how each ones works:
  1. Primary key lookup - the MySQL server can calculate the hash on the primary key and know which data node the relevant row resides in. Then the MySQL server contacts that data node and receives the row.
  2. Unique key lookup - the MySQL server cannot know which data node the row might be stored in. So it contacts a pseudo-random node. That data node has a hidden table that maps the unique key values to the primary key. Then the hash on the primary key reveals where the row resides.
  3. Ordered index scans are sent in parallel to all data nodes, where they search their local t-tree.
  4. Full table scans are send in parallel to all data nodes, where they scan their primary fragment.

Summary: primary key lookups are best. If you have more than 2 nodes, throughput goes up because all nodes are actively serving different fragments of data. Ordered index lookup and full table scans are done in parallel, so more nodes leads to better performance. I hope to post some benchmarks soon (just need to get my hands on the hardware..)

Views: 1647

Comment

You need to be a member of Everything MySQL to add comments!

Join Everything MySQL

RSS

© 2024   Created by Chris.   Powered by

Badges  |  Report an Issue  |  Terms of Service