"Clustering Solutions and Zero Downtime Hosting Pitfalls"
There are a number of benchmarks that we may use to evaluate hosting companies.
One of these is reliability. Like most things in this life, reliability in web
hosting is typically a function of how much we are willing to spend for it.
In essence, a "cost-effectiveness" equation needs to be determined
and solved.
Reliability can be measured in terms of percentage availability. Industry personnel
talk of system availability in terms of three nines (99.9 per cent), four nines
(99.99 per cent) or five nines (99.999 per cent).
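To put those figures in perspective, the short Python sketch below converts each availability level into the downtime it allows per year. The function name is illustrative, and the calculation assumes a 365-day year.

```python
# Convert an availability percentage into the downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {allowed_downtime_minutes(pct):.1f} minutes per year")
```

Three nines therefore allows a little under nine hours of downtime a year, while five nines allows only about five minutes.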
Traditionally, web hosting availability exceeding three nines was the purview of
extremely large companies with multiple layers of redundancy built into their
network and software systems. However, technology has now brought high-availability
theory and cost-effective reality into alignment.
High availability can be achieved by removing, as far as possible, any "single
points of failure" or, where this is not altogether possible, by minimizing
the time spent in a "failure" situation.
One of the ways in which small businesses and ISPs can reasonably avoid single
points of failure is by employing server farm clustering and load-balancing
solutions.
Webopedia defines server farm clustering as follows:
"A server farm is a group of networked servers that are housed in one location.
A server farm streamlines internal processes by distributing the workload between
the individual components of the farm and expedites computing processes by harnessing
the power of multiple servers.
"The farms rely on load-balancing software that accomplishes such tasks
as tracking demand for processing power from different machines, prioritizing
the tasks and scheduling and rescheduling them depending on priority and demand
that users put on the network. When one server in the farm fails, another can
step in as a backup."
It is important to note that web servers load-balanced in this manner typically
display one external IP address to the public Internet, while using internal
network IPs to communicate between the clustered servers and the load balancer.
Now this is indeed fantastic! Not only do you receive peak-demand scalability
for your web sites with web server clusters, but you also have the
built-in "high uptime availability" component which is so important.
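As a rough illustration of the idea, the Python sketch below hands requests to internal servers in turn and skips any server that has been marked as failed. The class name and the 10.0.0.x addresses are hypothetical, and real load balancers do a great deal more (health probing, session handling, weighting).

```python
class RoundRobinBalancer:
    """Hand out internal server addresses in turn, skipping failed ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cursor = 0

    def mark_down(self, backend):
        """Stop sending requests to a server that has failed."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Resume sending requests to a known server that has recovered."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy server, or raise if none are left."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical internal addresses behind one public IP.
farm = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
farm.mark_down("10.0.0.12")
print([farm.next_backend() for _ in range(4)])
```

With the second server marked down, requests simply alternate between the remaining two, which is the "another can step in" behaviour described in the definition above.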
However, this is only half of the picture. There are very important cautionary
notes to keep in mind.
Where web hosting is concerned, availability depends on two things:
1. Hardware reliability (RAID drives, server clustering, etc.) within the Data
Center;
2. High-bandwidth Internet connectivity to the Data Center / Network Operations
Center (NOC).
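Because a site is reachable only when both of these are working, their availabilities multiply. The Python sketch below shows the arithmetic; it assumes the two fail independently, and the 99.9 per cent figures are purely illustrative.

```python
def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be working."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Illustrative figures only: three-nines hardware behind three-nines connectivity.
hardware = 0.999
connectivity = 0.999
print(f"combined availability: {series_availability(hardware, connectivity):.4%}")
```

The combined figure is lower than either component on its own, which is why a well-clustered server farm is still only as good as the connectivity feeding it.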
Now, with all your well thought-out server clustering solutions, what would
be the result if (as recently occurred at a very high-profile web company)
a fire in the vicinity of the network caused the entire Data Center to shut down
power for hours? Or if a bandwidth provider to the NOC had router problems? All
your websites would be showing the dreaded "Page Cannot be Displayed"
page.
The ideal solution therefore would be to employ clustering solutions with servers
in entirely different Data Centers with different bandwidth providers. Redundant
Data Centers eliminate the NOC itself as a single point of failure. This
scenario becomes interesting at this point, because the difficulty of addressing
the potential problems now increases sharply.
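The appeal of redundant Data Centers can be shown with a little arithmetic. The Python sketch below computes the availability of several sites when any one of them is enough to serve visitors. It assumes the sites fail independently and that traffic switches over instantly, which is an idealization; the 99.8 per cent figure is illustrative.

```python
def parallel_availability(*sites: float) -> float:
    """Availability when at least one of several independent sites must be up."""
    all_down = 1.0
    for availability in sites:
        all_down *= 1 - availability  # probability that this site is down too
    return 1 - all_down


# Illustrative figure only: two Data Centers, each available 99.8% of the time.
print(f"two sites: {parallel_availability(0.998, 0.998):.4%}")
```

On paper, two modest sites together outperform a single excellent one. In practice the switch-over is not instant, and that is where the difficulty lies.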
We now have to deal with DNS caching, the concept of failover, and how static
and dynamic web applications respond to failure events. The terms failover and
load balancing are frequently used interchangeably; however, they are in fact
quite different.
Load balancing refers to sharing requests across the capacity of several servers,
so that no one server is overloaded and swamped with requests.
Failover, however, is the process that manually or automatically switches traffic
from a failed server or bandwidth provider to a standby server or network when
the primary system fails or is temporarily shut down for servicing.
As such, failover software is an important component of mission-critical systems
that rely on constant accessibility.
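The difference from load balancing is easiest to see in code. In the minimal Python sketch below, all traffic goes to the first server in a priority list that is up, and a standby is used only when everything ahead of it has failed. The function name, the server names and the status table are hypothetical.

```python
from typing import Callable, Optional, Sequence


def choose_active(servers_by_priority: Sequence[str],
                  is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-priority server that is up, or None if all are down."""
    for server in servers_by_priority:
        if is_up(server):
            return server
    return None


# Hypothetical status table standing in for a real monitoring check.
status = {"primary": False, "standby": True}
print(choose_active(["primary", "standby"], lambda name: status[name]))
```

Unlike a load balancer, nothing is shared here: the standby sits idle until the primary fails, and hands traffic back once the primary is reported up again.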
One of the inherent difficulties with failover for web hosting companies operating
on different networks is the limitation imposed by the DNS caching system.
As DNS records are passed on from the original DNS servers (i.e., ns1/ns2.your-domain.com),
they are cached, or stored, at several different ISPs along the way, which
is why it takes a while for a newly registered domain name to resolve to its
IP address.
Each DNS record has a TTL (time to live) setting assigned. By manipulating
this value, it is possible to alter how long that particular IP address/DNS
record combination is cached. If your site is on two different servers with two
different IP addresses, you could set the time to live to a value of, say,
two minutes.
The failover software would check server availability by "pinging"
the web server every few minutes to determine whether its IP address is
responding appropriately (perhaps by looking for a particular text string in
a web page).
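A check of this kind might look something like the Python sketch below, which fetches a page and looks for an expected text string. The function names, the example URL and the "STATUS OK" marker are all hypothetical; the `fetch` parameter exists only so that the check can be exercised without a live server.

```python
from urllib.error import URLError
from urllib.request import urlopen


def _http_fetch(url: str, timeout: float) -> str:
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def page_contains(url: str, expected_text: str, timeout: float = 5.0, fetch=None) -> bool:
    """Return True only if the page loads and contains the expected text."""
    fetch = fetch or _http_fetch
    try:
        body = fetch(url, timeout)
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or malformed URL: treat as down.
        return False
    return expected_text in body


# Stand-in for a real server, so the example runs without network access.
def fake_fetch(url, timeout):
    return "<html><body>STATUS OK</body></html>"


print(page_contains("http://server1.example.net/health.html", "STATUS OK", fetch=fake_fetch))
```

Looking for a known string is a stricter test than a bare ping, because a server can answer on the network while its web pages are failing.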
If a failure is detected, the software would pull the non-working web
server's IP address out of the list of IP addresses assigned to your web site's
domain name. If and when that web server comes back online, its IP address would
be restored to the list.
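In outline, that bookkeeping could be sketched in Python as follows. The function name and the addresses (taken from the ranges reserved for documentation) are hypothetical, and keeping the previous list when every server fails is an added assumption rather than something a particular product is known to do.

```python
from typing import Callable, List, Sequence


def update_published_ips(all_ips: Sequence[str],
                         currently_published: Sequence[str],
                         is_responding: Callable[[str], bool]) -> List[str]:
    """Return the IP addresses that should be listed for the domain name."""
    healthy = [ip for ip in all_ips if is_responding(ip)]
    if not healthy:
        # Assumption: publishing nothing would be worse than a possibly stale list.
        return list(currently_published)
    return healthy


# Hypothetical addresses and status table standing in for real checks.
status = {"192.0.2.10": False, "198.51.100.20": True}
servers = ["192.0.2.10", "198.51.100.20"]
print(update_published_ips(servers, servers, lambda ip: status[ip]))
```

Because the list is rebuilt from the full set of servers on every pass, a server that recovers is restored automatically the next time it answers its check.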
With a TTL setting of two minutes, theoretically, your web site should be down
for just two minutes while DNS information switches to the other web server.
The problem with this scenario is that, while some ISPs' caches might
honor such low figures, other ISPs may decide to ignore (to save
on bandwidth utilization) any TTLs below a certain value, say, 60 minutes.
So it is entirely possible that some of your visitors would see your web sites
while, for others, your site would be down for one hour or more, even though one
of your servers was operating perfectly.
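The Python sketch below works through that arithmetic. It treats an ISP's minimum cache time as a floor on the TTL and also adds the interval between availability checks, since a failure cannot be acted upon before it is detected; both the function name and the inclusion of the check interval are assumptions added for illustration.

```python
def worst_case_outage_seconds(check_interval: int,
                              record_ttl: int,
                              resolver_min_ttl: int = 0) -> int:
    """Longest time a visitor might keep being sent to a failed server."""
    # A cache that enforces a minimum ignores any TTL below that minimum.
    effective_ttl = max(record_ttl, resolver_min_ttl)
    # The failure may occur just after a check, then the stale record must expire.
    return check_interval + effective_ttl


two_minutes = 2 * 60
sixty_minutes = 60 * 60

honoring_isp = worst_case_outage_seconds(two_minutes, two_minutes)
ignoring_isp = worst_case_outage_seconds(two_minutes, two_minutes, sixty_minutes)
print(f"ISP honoring the TTL: {honoring_isp // 60} minutes")
print(f"ISP enforcing a 60-minute floor: {ignoring_isp // 60} minutes")
```

Visitors behind the two kinds of ISP therefore see very different outages from the same failure, and the site owner controls neither the floor nor which ISP a visitor uses.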
Static, non-interactive web sites are great candidates for server clustering,
but the wicket becomes a bit sticky for dynamically generated sites. Most database
application software, although it has some replication capabilities,
is not happy with multiple-server master/slave relationships and real-time
updating between servers. The issue can become very problematic if your site
requires frequent updates.
Then there is the problem of how to keep your web sites synchronized. Unix/Linux
servers typically include a synchronizing software tool called rsync, and you can
automate the synchronizing process by setting up a cron job to run it periodically.
DNS caching and synchronizing issues can be so problematic as to nullify
the advantages of server clustering. For example, a cron job to synchronize
your servers every few minutes might very well use up your server capacity.
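For illustration, a small Python wrapper that a cron job could call might look like the sketch below. The paths and host name are hypothetical, and the rsync options shown (archive mode, compression, deletion of removed files) are one reasonable choice rather than a recommendation for every site.

```python
import subprocess

# Hypothetical locations; substitute your own document root and second server.
SOURCE_DIR = "/var/www/"
DESTINATION = "backup@server2.example.net:/var/www/"


def build_rsync_command(source: str, destination: str) -> list:
    """Build an rsync argument list that mirrors source onto destination."""
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve permissions, times, links
        "-z",        # compress data in transit
        "--delete",  # remove destination files that no longer exist at the source
        source,
        destination,
    ]


def sync_once(source: str = SOURCE_DIR, destination: str = DESTINATION) -> None:
    """Run one synchronization pass; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, destination), check=True)


# A script run from cron would simply call sync_once().
print(" ".join(build_rsync_command(SOURCE_DIR, DESTINATION)))
```

How often cron runs the script is the trade-off described above: a short interval keeps the servers closer in step but spends more of their capacity on copying.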
Your customers will also have to contend with configuring their desktop email
client software with two email addresses for each email account, one on each web
server, e.g. info@server1.net and info@server2.net.
It is important to realize that DNS operates by default in a round-robin manner,
so that, if you have the same web site on two separate servers, it is very likely
that each server will get roughly 50 per cent of all the web traffic.
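The effect is easy to reproduce in a few lines of Python. The sketch below rotates a list of addresses on every lookup and assumes each visitor connects to the first address returned; the function name and the addresses are hypothetical, and real traffic is less tidy because of caching.

```python
from collections import Counter, deque


def simulate_round_robin(ips, lookups: int) -> Counter:
    """Count how many lookups each address receives under simple rotation."""
    rotation = deque(ips)
    hits = Counter()
    for _ in range(lookups):
        hits[rotation[0]] += 1  # assume the client uses the first address returned
        rotation.rotate(-1)     # the next lookup sees the list in the next order
    return hits


# Hypothetical addresses for the same site on two separate servers.
print(simulate_round_robin(["192.0.2.10", "198.51.100.20"], 1000))
```

Neither address is treated as a spare: as long as both are listed for the domain name, both receive visitors.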
Now, this is important for a number of reasons, but one of the principal reasons
to keep it in mind is that you will not be able to effectively keep a "back-up"
site (as some providers would have you believe) which will only be used when
the primary server goes down; for example, a site saying "We're sorry,
our main server is down, but you may contact us at: http://www.yourdomain2.com".
On a final note, hardware-based load balancing solutions tend to be quite expensive
and also introduce a potential single point of failure into the system: the
load balancer itself. There is a very prominent Data Center that began offering
load-balanced hosting solutions, where the load balancer itself failed on several
occasions, although the web servers were operating perfectly. The net effect
to the public, however, was that the sites were unavailable. Reasonably
cost-effective software-based solutions may be obtained as a service model or by
purchasing the software yourself. Zoneedit is an example of a service model,
and Simplefailover is an example of a software-based model which may be purchased
on a server license basis.
In conclusion, at this point in time, there are several limiting factors to
successfully implementing a "true" high availability multiple server
web hosting system. Depending on your clientele and the nature of their web
sites, such a system may still be a very viable alternative. For others, simply
setting up a server with high quality components, redundant RAID hard drives and
a good supply of server spare parts may be the best way to ensure high availability.
About the Author:
Godfrey Heron is the Website Manager of the Irieisle Multiple Domain Hosting
Services company. Sign up for your free trial, and host multiple domains and web
sites on one account: http://www.irieisle-online.com