
Hadoop for MySQL people

There's a lot of buzz lately about Hadoop. If you're completely new to Hadoop, I recommend the free videos from Cloudera (http://www.cloudera.com/resources/?type=Training). If you already have a rough idea of what it does and want to play around, getting started is easy!

First, download Cloudera's training VM, which has a small Hadoop cluster already installed and running:

http://www.cloudera.com/developers/downloads/virtual-machine/

Second, you need to put some data into Hadoop. Fortunately for database folks, there's a tool to import data into Hadoop from MySQL called "Sqoop". It's already installed on the VM and there are instructions for using Sqoop to import some MySQL tables into Hadoop (see Desktop/instructions/exercises/SqoopExercise.html inside the VM). FYI, it's not uncommon to "Sqoop" data into Hadoop, do analysis and transformations, and then use Sqoop to export the data back to MySQL.

Now you're ready to analyze your data using Hadoop's powerful MapReduce. The catch is that MapReduce requires writing code (Java, Python, PHP, etc.) and understanding its functional programming model. For an easier entry into Hadoop, try Hive. Hive is a data warehousing system for Hadoop. It offers a query language (HiveQL) that feels a lot like SQL. Examples:

$ hive

hive> SHOW TABLES;

hive> SELECT * FROM <table> LIMIT 10;

Hive supports most of the SQL constructs you are used to, for example JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, GROUP BY, ORDER BY, and aggregate functions. The best part is that Hive turns these queries into MapReduce jobs that run across the cluster, so it can scale to analyze petabytes of data!
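To get a feel for what Hive saves you from writing, here is a rough sketch of the MapReduce model in plain Python. It does not use any Hadoop API; it only imitates the map, shuffle, and reduce steps in memory. The sample rows, the column names (customer, amount), and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, standing in for a table imported with Sqoop.
rows = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
]


def map_phase(row):
    """Emit (key, value) pairs; here, one pair per row keyed by customer."""
    yield row["customer"], row["amount"]


def shuffle(pairs):
    """Group values by key, roughly what the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(input_rows):
    """Run map, shuffle, and reduce in sequence over an in-memory list of rows."""
    pairs = (pair for row in input_rows for pair in map_phase(row))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


totals = run_job(rows)
print(totals)
```

In HiveQL the same result would be a one-liner along the lines of SELECT customer, SUM(amount) FROM orders GROUP BY customer; (the orders table is hypothetical). On a real cluster the map and reduce steps run in parallel on many machines, which is where the scalability comes from.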
