Thursday, February 16, 2012

Scaling using mongodb : HA and Automatic sharding (part 2)

We discussed creation of HA using replica sets in the earlier post part 1. In this post we will create a sharded cluster of replica sets consisting of multiple mongodb servers. If you would have noticed the parameters passed during starting mongodb on the earlier server 241 & 242, you could see that in addition to the db path, log path and replica set parameters, I have also added the parameter '--shardsvr'. This enables sharding on this instance of mongodb.

For creating a cluster of database servers using mongodb you need data nodes, config servers and router servers. Data nodes are used to store data. The config servers are used to store sharding information. It is used for the client to figure out where the data resides on the data nodes. The router servers are used by the client to communicate with the config servers and data nodes. The clients interface to the database through the router servers.

On each of the config servers, start mongodb with the parameter '--configsvr'.

230:/home/mongodb } ./bin/mongod --configsvr --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/config --directoryperdb --bind_ip 192.168.1.230 --port 10002
231:/home/mongodb } ./bin/mongod --configsvr --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/config --directoryperdb --bind_ip 192.168.1.231 --port 10002
And start routing service on the routing servers.

233:/home/mongodb } ./bin/mongos --port 10003 --configdb 192.168.1.230:10002,192.168.1.231:10002 --logappend --logpath ./data/route_log --fork --bind_ip 192.168.1.233 &
The same service can be started on routing server 234. To shard a database and add nodes to the shard, you need to connect to the routing server.

233:/home/mongodb } ./bin/mongo 192.168.1.233:10003
233:/home/mongodb } mongos> use admin
switched to db admin

To add the first replica set (set_a) to the cluster, run the following command.

233:/home/mongodb } mongos> db.runCommand( { addshard : "set_a/192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000", name : "shard1" } )
{ "shardAdded" : "shard1", "ok" : 1 }

Similarly replica sets of the other 2 shards can also be added to the mongodb cluster. Eventually you can fire a listShards command to see the shards added to the cluster.

233:/home/mongodb } mongos > db.runCommand( {listshards:1} )
{
        "shards" : [
                {
                        "_id" : "shard1",
                        "host" : "192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000"
                },
                {
                        "_id" : "shard2",
                        "host" : "192.168.1.244:10000,192.168.1.245:10000,192.168.1.246:10000"
                },
                {
                        "_id" : "shard3",
                        "host" : "192.168.1.247:10000,192.168.1.248:10000,192.168.1.249:10000"
                }           
        ],                                                                                                                                                        
        "ok" : 1                    
}              

To enable sharding on a database say "test" run the following commands.

233:/home/mongodb } mongos > db.runCommand( { enablesharding : "test" } )
{ "ok" : 1 }

To shard on a particular key. The test collection consists of the following fields id, name and email. Lets shard on id.

233:/home/mongodb } > db.runCommand( { shardcollection : "test.test", key : { id : 1 } } )
{ "collectionsharded" : "test.test", "ok" : 1 }

I used a php script to push more than 20,000 records to the sharded cluster. You can check the status of sharded cluster using the following commands.

233:/home/mongodb } mongos > db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        {  "_id" : "shard1",  "host" : "192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000" }
        {  "_id" : "shard2",  "host" : "192.168.1.244:10000,192.168.1.245:10000,192.168.1.246:10000" }
        {  "_id" : "shard3",  "host" : "192.168.1.247:10000,192.168.1.248:10000,192.168.1.249:10000" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : true,  "primary" : "shard1" }
                test.test chunks:
                                shard1        2
                                shard2        1
                        { "id" : { $minKey : 1 } } -->> { "id" : "1" } on : shard1 { "t" : 2000, "i" : 1 }
                        { "id" : "1" } -->> { "id" : "999" } on : shard2 { "t" : 1000, "i" : 3 }
                        { "id" : "999" } -->> { "id" : { $maxKey : 1 } } on : shard2 { "t" : 2000, "i" : 0 }

233:/home/mongodb } mongos > db.printCollectionStats()
---
test
{
        "sharded" : true,
        "flags" : 1,
        "ns" : "test.test",
        "count" : 19998,
        "numExtents" : 6,
        "size" : 1719236,
        "storageSize" : 2801664,
        "totalIndexSize" : 1537088,
        "indexSizes" : {
                "_id_" : 662256,
                "id_1" : 874832
        },
        "avgObjSize" : 85.97039703970397,
        "nindexes" : 2,
        "nchunks" : 3,
        "shards" : {
                "shard1" : {
                        "ns" : "test.test",
                        "count" : 19987,
                        "size" : 1718312,
                        "avgObjSize" : 85.97148146295092,
                        "storageSize" : 2793472,
                        "numExtents" : 5,
                        "nindexes" : 2,
                        "lastExtentSize" : 2097152,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 1520736,
                        "indexSizes" : {
                                "_id_" : 654080,
                                "id_1" : 866656
                        },
                        "ok" : 1
                },
                "shard2" : {
                        "ns" : "test.test",
                        "count" : 11,
                        "size" : 924,
                        "avgObjSize" : 84,
                        "storageSize" : 8192,
                        "numExtents" : 1,
                        "nindexes" : 2,
                        "lastExtentSize" : 8192,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 16352,
                        "indexSizes" : {
                                "_id_" : 8176,
                                "id_1" : 8176
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
---

To use multiple routing servers in a database connection string - for connecting to mongodb through a programming language for example php the following syntax can be used

mongodb://[username:password@]host1[:port1][,host2[:port2:],...]/db_name

This will connect to atleast one host from the list of hosts provided. If none of the hosts are available, an exception will be thrown.

No comments: