Sharding with MangoDB

As per my earlier post we need to scale out our data store when the application data requirements outgrow a single instance capacity. This is in most cases achieved through sharding (a.k.a. partition of data).

With traditional RDBMS systems the application data model has to be modeled such from the very start to enable equalized partitioning of data or it may require a considerable change if we want to do it as an afterthought to scale up the application for using multiple database instances where each instance keep a balanced subset of the big data for an application. So in a way we have to plan for scaling out an application in typical RDBMS world but then with MongoDB it can actually occur as an afterthought as it provides automatic balancing and fail over of the partioned data

MongoDB as on 1.6 onwards supports production ready automated sharding architecture where we can convert a single instance into a cluster of shard whereby MongoDB manages the distribution of load of data across the shards (redistributing if the data on one node goes out of proportion with the rest) and also provides fail over support with each shard data being replicated in a replica set and all this with very minimal or no code change. The application needs to connect to a mongos process (which sits in front of the sharded cluster) which is responsible for managing the reads and writes across these shards giving a similar impression as we are working with a single node database systems.
Read the rest of this entry »

Web Applications Scalability

Please do take care of the scalability concerns of your application or at least architect it such that it can be scalable later on.

If we ignore this fact at the very start concentrating more on the business aspects of the application (something which is misunderstood in Agile) we are doomed to fail even at the slightest hint of making the application scalable. Do not get me wrong, there are measures which can be taken a little later in the development cycle which can improve the scalability of the application but then they have their limitations. For example something like what to choose as data storage is something you need to decide upfront. Agreed having designed the application with a layer of data access objects abstracts the rest of the application from the data storage dependency but then why do the rework of rewriting the data access implementation. I am very sure we cannot regard such a change as trivial.

Have a strategy to allow the application to scale from the very start will definitely help and it is more a matter of choosing the right technologies and right architecture to accomplish it upfront.
Read the rest of this entry »

First Stop to Cassandra

So I start my search for a non relational database one more time to see if the latest buzz about NoSQL really worth it or is it just another time when we are looking at the options which eventually are going to die down for the ever dominant RDBMS.

I soon realized NoSQL encompasses much more then I would liked (just because I find myself a little lost with so many options) with so much already out there challenging the SQL databases. My first stop Cassandra.

Why Cassandra? – For me at least to start with the need for a non RDBMS has been quite straight forward, a data store which can provide me high availability with complete fault tolerance, highly scalable, very minimal downtime if any. This is mostly achieved through highly replicated and db clustered environment. Cassandra comes out as a natural choice to explore.
Read the rest of this entry »

Application Availability

Availability should be defined not just at the application level scope but also at each functionality level. It should be further quantified in terms of acceptable response time.

Availability of an application is increased by getting rid of single point of failure and this is done by converting this point into a pool of points so that if one goes down, the request is served by another point. Pooling the point is more commonly referred as horizontal scaling and building for it requires load balancing the requests hitting the application plus a provision for fail over support.

Load balancing and fail over support is done through deploying the application in a clustered environment. Clustering not only ensures high availability but also make the application more scalable. Clustering is the term which describes the was to provide redundant servers to ensure high availability.

Clustering is OK but what to cluster, anything where we are calling a distributed object, so need to cluster those components that can be deployed in a distributed way.

How to scale a J2EE application?

Some quite simple and basic points.

  • Get rid of single point of bottleneck : There can be any point across the whole architecture of a layered lava application which can act as a constraint from the application to scale . This point is know as the single point of bottleneck, Single point of bottleneck can be any service, resource or server that all the request shares and goes through, examples can be connection pools, singletons, databases, JMS queues, web services etc. First we need to identify that constraint and then identify the ways to remove that constraint. This should be done till we end up finding another constraint.

 

  • Do not use databases for everything : Do not persist high volume stuff which is of very little return like logging info, http session state (there are much better performing options for caching stuff like these) into the database. Put only those things which are persistent and transactional.

 

Read the rest of this entry »

It’s time for your database to perform

The number one reason for applications not performing the way they can is the database misuse. May be the following few simple steps can go a long way making our J2EE applications perform and scale well

1. Use databases for only that data which you need to persist and not for everything.

2. Model and design your database to third normal form in general but do the de-normalization of the database to cater to specific functional needs

3. Create indexes where necessary but lets not overdo it as it has its own impact while inserting stuff as database might take up lot of time updating them.

4. Learn about the specific datatypes our database offers and use them to maximum as some might perform better then the others
Read the rest of this entry »

JVM level Clustering Solution (Open Terracota)

What is it?

To make applications scalable and highly available we need to run java application on more then one JVM. This is known as JVM clustering. JVM-level clustering enables applications to be deployed on multiple JVMs, yet interact with each other as if they were running on the same JVM. Open Terracotta allows threads in a cluster of JVMs to interact with each other across JVM boundaries using the same build-in JVM facilities extended to have a cluster-wide meaning

What does it provide?

It provides transparent clustering and coordination services for the Java platform by allowing us to selectively share object graphs across the cluster (Heap level Replication), manage heaps much bigger than for a single JVM (Large Virtual Heap) and distributed wait/notify and synchronized capability (Cluster wide locking semantics). All this with no serialization.
Read the rest of this entry »

Follow

Get every new post delivered to your Inbox.