Cluster 101

There was a query about clustered hosting on Whirlpool. I wrote a reply and thought it might be useful to anyone starting out on investigating how to build a clustered services setup. The original thread is here.

There are two main ways (and you can mix and match components of each) to approach a hosted cluster situation, each with its own pros and cons. I’ll try to be short and to the point:

Scenario:
Clustered services delivered over a single system image platform. Shared infrastructure delivered via a direct connect mechanism (Fibre Channel using SAS disks runs quite nicely). No requirement to have frontend load balancers.

Single system image (SSI) scenarios present a single root and a single process tracking/control system. Consequently, your memory displays as the sum of all your servers’ memory, and each node simply contributes an ‘additional’ set of CPUs. At the network layer, your SSI delivers traffic to the node that requires it.
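A quick way to picture this: on a full SSI platform the everyday tools report cluster-wide totals rather than per-node figures. A minimal sketch (the commands are standard Linux; reading their output as cluster-wide totals is exactly the SSI behaviour described above):

    # On an SSI cluster, these report totals across all member nodes,
    # not just the box you happen to be logged in to:
    grep -c '^processor' /proc/cpuinfo   # CPU count: the sum of every node's CPUs
    free -m                              # memory: the sum of every node's RAM
    ps aux                               # one process table spanning the cluster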

Pros:
– It’s a single system image. That means one set of packages, one set of maintenance cycles, etc.
– Processes can be migrated automatically as required. That means your workload is balanced cleanly across your other nodes, with an even CPU/memory distribution.
– One installation, so it’s more approachable for someone just wanting to get something going quickly.
– It’s a single root (/) setup. Consequently, you have less chance of having to deal with network file system issues (anyone who’s used the mbox file format [god forbid! :-|] on NFS will know what I mean).

Cons:
– Direct-connected storage (although you can cut corners) can be prohibitively expensive.
– Higher chance of a human causing a disaster. Basically, if you simply run ‘shutdown’ you’re going to take down your entire cluster, so you have to be sure ‘root’ level operations are done with care (see the sketch after this list).
– Commercial software is more than likely going to have issues in this scenario. If you wrote the software you’re running, this is less of a factor to take into consideration.
– The network infrastructure is a bit of a hack. Layering a network layer over another network layer in the kernel isn’t a netops-friendly situation. 🙂
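One cheap guard against that last shutdown foot-gun is to put a confirmation wrapper ahead of the real binary in root’s PATH. A minimal sketch (the wrapper itself is my illustration, not something from the original reply; Debian’s molly-guard package does a similar job for ssh sessions):

    #!/bin/sh
    # Hypothetical /usr/local/sbin/shutdown wrapper: on an SSI cluster a
    # plain 'shutdown' halts every node, so force a deliberate confirmation.
    printf 'This will halt the WHOLE cluster. Type "cluster" to continue: '
    read answer
    [ "$answer" = "cluster" ] || { echo 'Aborted.'; exit 1; }
    exec /sbin/shutdown "$@"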

Scenario:
Segmented services, with shared storage infrastructure delivered over a common file-sharing mechanism (e.g. NFS). Frontend load balancers doing either L2, L3 Direct Routing or L3 NAT (depending on your budget and wants/needs).
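For the direct routing option, LVS via ipvsadm is the classic free way to do it. A minimal sketch, assuming a virtual IP of 192.0.2.10 and two real web servers at 192.0.2.21 and 192.0.2.22 (all addresses here are documentation examples, not from the thread):

    # On the director: create the virtual service (round-robin) and add the
    # real servers in direct-routing ("gatewaying") mode.
    ipvsadm -A -t 192.0.2.10:80 -s rr
    ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.21:80 -g
    ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.22:80 -g

    # On each real server: hold the VIP on loopback and suppress ARP for it,
    # so only the director answers ARP requests for the shared address.
    ip addr add 192.0.2.10/32 dev lo
    echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce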

Pros:
– Scale out as required
– Easier integration with commercial control panel software. Since your nodes have the potential to share all their IPs (using direct routing, at least), you can typically run up a failover control panel pair and then simply sync configuration to your web and mail ‘head’ nodes using something like cfengine (see the sketch after this list).
– Can virtualise some or all of the service components. In one scenario I’ve had multiple servers each with 10 or so virtual environments, but whether you can do that depends on your requirements.
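On the configuration sync point above: if cfengine feels like overkill, even a cron-driven push from the active control panel box will do. A minimal sketch (the host names and config path are mine, purely for illustration):

    # Hypothetical push from the active control panel box to the head nodes.
    for host in web1 web2 mail1; do
        rsync -az --delete /etc/panel-config/ "$host":/etc/panel-config/
    done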

Cons:
– Probably a tad more expensive in network equipment
– While you can balance requests effectively, it doesn’t really provide an ‘equal’ processor/memory share between your hardware.
– Your storage is going to be delivered via a failover pair of servers (see the sketch after this list). Depending on what your requirements are, this may not provide all the redundancy you’re really looking for.
– Quite a few machines to look after independently. That’s OK if it’s the only thing you do during your day, but sysadmins aim to be as lazy as possible, and inevitably there’s always another project you could be working on! 🙂
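On the storage pair mentioned above, the sketch below shows the moving parts. The paths, the 192.0.2.0/24 client subnet and the floating address called storage-vip are all assumed names for illustration, and failing that address over between the pair is left to something like heartbeat:

    # On the active storage server: export the shared trees.
    exportfs -o rw,sync 192.0.2.0/24:/export/home
    exportfs -o rw,sync 192.0.2.0/24:/export/mail

    # On each service node: mount via the floating address, so a failover
    # within the storage pair is transparent to the clients.
    mount -t nfs storage-vip:/export/home /home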

Ultimately it’s your call as to what is best suited to your situation. Personally, I’ve done both scenarios, and each is a perfect fit for different circumstances.

Good luck!

Stuart