Sunday, November 14, 2010

MongoDB for a MySQL user

Over the past few years NO SQL has become very popular, specially with a capability to boast about an illustrious reference list like twitter, facebook and foursquare.

Being a MySQL user I was curious to explore into this and see what it has and what it offers. Certainly there are plenty of NO SQL implementations, first wanted to check casandra but the first few lines of its documentation sounded like a no go zone, hence decided to try Mongo DB. Going through it I realized that there were alot of similarities. So listed them here for quick reference.


Criteria
MySQL
MongoDB
Type of database
Relational
Document
Installation
On Linux you need to go through a certain amount of steps like user creation, permission setting, creating the data directory
Pretty stright forward create the the data directory at the default location /data/db. It was pretty simpler than MySQL
Starting Up
starting up using safe_mysqld (techincally it invokes mysqld)
pretty straight forward run mongod for a basic start up
Queries
Using SQL
No SQL

Need to execute create statements
ex. When you wanted to create a table with columns x, y
you need to do the following.
1. Create database mine;
2. create table a(x int, name varchar(50))engine = MyISAM;
3. Then seperately do an insert
insert into a values(10, "hiii");
No create statements for DBs, collections, etc
Updated and created as and when needed(lazy loading), so to do the same we need to,
db.mine.insert({x: 10, y: "hiii"})
similarly you have functions for update, save, remove, find

if you want to select data; we can select * from a;
On MongoDB its,
db.mine.find()
Storage Engines
A lot of Storage Engines to choose from
Upto now a single storage engine implementation
Operations/ Transactions
When you fire a query to the MySQL server the client needs to wait till the MySQL server returns the result.
In contrast Mongo DB uses 'Fire and forget'. So technically speaking client is not aware whether the query executed successfully or not. It doesn’t return OK.
For system where you need safe operations immediately following the query getLastError command is issued and the exception handled.
For Requirements like analytics , status message updates, pics comments this fire and forget is perfect. Anyhow for Financial type of system we need to use the work around.

storage engine like INNODB supprts transactions and has ACID compliance
Not typically made for transactional systems
Indexing
Supports mutiple types of indexes. For optimzation EXPLAIN is used with the select statements
Very similar to MySQL indexing.
For optimzation similar to MySQL this has the EXPLAIN statement and another statement called as hint
Main difference is that you can define the order of indexing for composite indexes. Which gives more control to the query definition Eg. db.mine.ensureIndex({"x" : 1}) 1 indicates the direction
Geo Spatial Indexing
MyISAM storage engine supports geo spatial indexing.
Is created by passing "2d" instead of passing 1 or -1 to the ensureIndex  function.
Pretty handy stuff with  straight forward support for querying nerarest locations, find entries within a shape and find distances.
Compound geo spatial indexing also supported, this will be handy for finding nearest ATM etc. I think this is a must check for people developing location based systems
Back Up
MySQL back up is more complex, specific to the storage engines.
Snapshot based back ups on LVM2 volumes is similar to the fsync command in mongo
supports fsync command which is a warm back up with point of time support.
Supports mongodump - which does a hot back up but without point of time support
Replication


Mode of Replication
asynchorouns
asynchronous

Replication is based on Binary log - which has all the writes
Replication is based on OpLog, similar to the binary log but has idompotent operations.
Which basically means if the same statement is executed mutiple times it will not make the date inconsistant .
Anyhow the statements shoule be executed in order

set up is straight forward after having a master and slave in place
quiet similar - at start up itself the server can be specified as slave in the start up line,
./mongod --dbpath ~/data/slave --port 10001 --slave --source localhost:10000

supports complex replication set ups, like two way replication, circular replication, etc
does not support replication from the slave
Replica Set
MySQL cluster supports Replica sets- anyhow MySQL cluster replica sets has synchrous replication
Similarly for HA support MondoDB supports replica sets - which is a master slave cluster with automatic failover.
Similar to MySQL cluster in the replica sets. There are arbiter nodes which decide on the primary and secondary node election in case of failures
Sharding
MySQL doesn’t have inbuilt sharding support. Sharding is usually handled at application or ORM level for MySQL
sharding support is given - mongos is a process that will interface multiple mongods which will have the data distributed.
Very good feature for scalability


For those who want to try out mongodb check this book, http://oreilly.com/catalog/0636920001096

Next up I want to try Couch DB, which is another No SQL database and implemented on Erlang!