Columbia College
Server Load Balancing: in Relation to Website Design and Hosting
Rian Booyer
Instructor Matthew Chrenka
CISS 493 ADE
5 December 2018
Watch my paper presentation on YouTube:
Abstract
Clustering is a concept I am very interested in, so I will approach the subject by breaking down certain sections and explaining them as I can. The first section is the introduction explaining why I was interested in the subject and will pose the question of whether clustering computers in my home environment will improve the abysmal performance of an old website I used to take care of. Section two deals with what clustering does and its purpose while section three deals with load balancing clusters. Section four deals with my own experiments here at home with virtualized cluster servers and the results of those experiments. Lastly section five deals with my conclusion on the results of my new understandings of cluster computing with the limited experiments I was able to run.
I found that even though my resources were limited that separating out the different tasks or jobs of a web server did in fact increase my ability to speed up the serving of the site over the long run and the paper includes some specific information on how I did this using what software platforms, server software, and some details on the experiment I performed to come to my conclusions.
1. Introduction
A while ago I hosted a website with a hosting provider and the provider tried to force me to upgrade from a shared hosting to a virtual server on their system due to the fact my website took too many resources from their main system. In my mind, a site that did not get more than fifty unique visitors a month didn’t need to be put on a forty dollar a month server especially since I did not make money off the site since its primary purpose was to provide a service to the community in which I live.
I moved it off their servers onto an old nasty laptop that didn’t have a screen, keyboard, or any of the usual niceties that systems come with these days. It was basically a motherboard wrapped in plastic with a network port and a SATA hard drive taped to the bottom. This system outperformed the hosting provider I was using at the time which surprised me but got me to thinking about how bad their cluster servers must be that an old laptop most people would toss away could outperform. Thus, I decided to try to understand the clustering of servers in relation to website hosting and if I put my web server into a group of clusters would it improve performance.
2. What is the purpose of clustering servers?
Servers are meant to perform specific purposes within a company, but they are expensive with some costing millions of dollars each. However, if you need the power of a mainframe computer but on a much smaller budget, clustering may be an option for your business. The choice is, do you upgrade your single server or mainframe to a more expensive platform, or do you add additional lower-cost servers and connect those servers together into a cluster. Having a cluster will allow you to increase the power of your company’s system using multiple servers to handle requests, add reliability by allowing the addition and removal of systems at will, and provide failover in case problems develop with individual servers. Failed or removed servers are logged by the cluster controller software (or load balancer proxy) as failed and allow the system to be marked as offline. While offline requests are not routed to the server; instead, the cluster control software skips this server and sends the requests to the next available server in line to fulfill the needed purpose of the cluster until it is repaired, or upgrades are completed, and the server is brought back online (Bryhni et al.) (Limoncelli). Clustering also allows you to split up the complicated primary purpose of a computer system into smaller tasks or jobs each handled by a different cluster set.
For example, back when farming was the primary occupation of the world. Farmers needed a way to transport crops to market. Many of these farmers used horse-drawn carts. In this example, horses represent the servers and the carts represent the load on those servers. The farmers would use the horses to pull the carts, however, what choice does the farmer have if the cart got too heavy for the horse. The first one which is the most obvious is to replace the horse each time the carts got too heavy with a stronger horse at great cost, the second option is to make more trips costing the farmer more money to get goods to market. However, what happens when you get a cart that even the strongest horse can’t pull. The solution they came up with was to keep the original horse and purchase a second horse and hitch it to the cart so that it could be pulled by two horses and thus the cart would make it to market. With the horses working together they could not only use the horses longer before replacement but they would also take longer to wear out. In another way, if a horse got sick you could just substitute with a healthy horse without much impact on the team.
2.1 Separating the purpose of a system into smaller tasks or jobs
The purpose of a system is what the system is meant to do, it is where all the tasks come together to make a whole. Many times, however, having a cluster of web servers doesn’t necessarily mean you will increase performance by leaps and bounds especially in the case of serving websites. If the web server is performing the hosting of the web requests for HTTP, HTTPS, DNS, database, and file services then the cluster won’t be running optimally (Paderin).
To increase the performance to needed levels you need to break up the primary purposes into tasks or jobs. For example, a web server running WordPress on Linux uses a web server software package, a database software package, and the file system to store files that can’t be dynamically generated by WordPress through the use of a database such as the image files, pdf files, and so on. To allow the system to run optimally you split these three segments or jobs into different clusters; One to handle HTTP, https requests using in my case the HTTP project by Apache, Second another cluster to handle the database requests such as a Galera Cluster for MariaDB which is a scalable version of the popular MySQL, and a file server cluster using some sort of networked storage cluster or cluster-based off a distributed file system such as the Lustre project (found at Lustre.org) (Paderin).
With the jobs split into three different clusters and handled by a load balancing software or hardware device performance can be increased while server hardware requirements are decreased as discussed earlier. All these tasks are controlled by a load balancing proxy to forward requests to the individual servers in the cluster.
3. Load Balancing Clusters
Load balancing is the process in which a control software will distribute the task of a cluster to individual servers inside that cluster using a load balancing algorithm. Depending on the algorithm used the purpose is to optimize the functionality of the cluster while equally distributing the workload between different server nodes. If the work isn’t distributed evenly to the different server nodes they may become over or under loaded and performance can suffer and bottlenecks can occur so picking the proper algorithm and technique for the tasks the cluster is to perform is very important (Narwal).
A load balancing proxy can be software, hardware, or virtualized and each has pros and cons for the specific use. Many Linux based projects that may be used in a cluster usually contain a load balancer as either a compile in option or module and can exist with the software itself or can be compiled into the kernel. Apache’s HTTP project offers the Mod_Proxy_Balancer module that comes with many flavors of Linux to allow simple cluster configuration and balancing using different algorithms (“Mod_Proxy_Balancer – Apache HTTP Server Version 2.4”). I used MariaDB that uses a built-in cluster control software called Galera and a cluster control called MaxScale (Karlsson), there is even a distributed file system that is compiled in called Lustre that I wasn’t able to experiment with but can be found at lustre.org that can create a series of servers with a mirrored file system and kept up to date very fast through the use of file metadata. These are great places to start if you are trying to find free options for starting a cluster in an organization, however, if you are planning on over three servers the paid versions may be needed to simplify the management and deployment of the clusters themselves. Another option would be dedicated load balanced hardware devices that can be brought online and configured quickly. If you are using a cloud service, there are also load balancer options that can be integrated into those as well.
One option I tried which offers several types is Kemp. Kemp offers many different types of load balancers that can be loaded onto a server starting at $4,000 plus the cost of a server, hardware-based load balancers also starting at $4,000 but includes the hardware, cloud-based load balancers starting at 0.29 an hour and can be deployed in Amazon Web Services (AWS) or Microsoft Azure, and Virtual load balancers that start at $3,000 per year. There is also a free tier for their load balancers that you load onto a hypervisor such as VMware’s ESXI that can be used however, there is a catch. To remain free the Kemp software must be allowed to “dial home” once every 30 days to retain the license and configuration. This is what I used in my experiments.
3.1 Load Balance Algorithms
There are several algorithms that are available on the market for load balancers to use including Round Robin, Randomized, Central Manager, Threshold, Central Queue Algorithm (CQA), Least Connection, and Weighted Least connection to name a few (Tamilarasi, et al.).
In Round Robbin, each new request is sent to a new server each time as if the servers were arranged in a circle. Round Robin works well if the job requests were handled quickly and efficiently and represented an equal amount of time for the work to be completed. However, this is often not the case and Tamilarasi mentions in his article if the workloads are equal the Round Robin Algorithm works well however if the workloads are unequal they create different weights of work for each server and result in some servers being over-loaded while others may be under-loaded. Some companies such as Kemp keep track of server loads and will skip overloaded servers or servers that are close to maximum load if another server is available with more capacity. This is often referred to as Weighted Round Robin or Randomized Algorithm (Tamilarasi, et al.)
Another one to mention is the Central Manager Algorithm that keeps track of the load on each server and sends jobs to servers based on the load of the server at that time. Modern servers and versions of this algorithm can exchange accurate measurements of the load on the server and the Central Manager Algorithm can determine if the job will fit within the load availability of each server, however, too much communication between the servers and the algorithm can create a bottleneck and hold up the job. Tamilarasi states that the best use of this algorithm is in a cluster that uses dissimilar server abilities such as mixing old and new servers together. The threshold algorithm is similar except it only communicates three states: Under loaded, Medium Loaded, and Over Loaded back to the algorithm (Tamilarasi, et al.).
Central Queue Algorithm uses a primary host and routes jobs using a circular FIFO queue. When a computer is ready to handle the job, it sends a request and is issued a job from the FIFO queue and if there are no jobs available it records the order in which the requests for jobs come in, so they are served jobs in the same manner of first in first out (Tamilarasi, et al.).
The last two I am going to mention are Least Connection and Weighted Least Connection. The two algorithms work by counting the number of connections or jobs each server has and assigns new jobs or requests to the server with the lowest amount of working jobs. With Weighted Least Connection the performance of each server is taken into account before assigning so that if a server has a low connection or job count but the utilization of the server is high the job is sent to the next lowest connection server. For example if a server is working forty jobs and another is working two jobs then under Least Connection the server working the two jobs gets the next job, however, if the server with two jobs is at 100% utilization and the server doing the forty jobs is utilized at 50% then the job is sent to the server with forty jobs (Tamilarasi, et al.).
There are so many variations and alternate algorithms it is again important to pick the right ones and to perform your load balancing to improve not only its ability to survive a failure but increase its performance as well.
3.2 Load Balancing Redundancy and Capacity
According to Thomas Limoncelli of ACM, a cluster should be designed with N+1 redundancy where N is the number of necessary servers and +1 is an extra server and represents the number of servers that can be down while still providing full services by the cluster. The reason behind this is in case one server fails then the cluster isn’t completely over-loaded which would cause a slowdown in the system. The redundancy is not limited to the +1 spare server the system could be designed with any number of servers to allow for more servers to be offline at the same time. For example, you could have and N+10 which would allow for 10 servers to be offline at the same time but still have the cluster working at full power (Limoncelli).
Problems arise in three ways according to Limoncelli. First, the team can’t agree on why the cluster exists. If some of the team think it’s supposed to add reliability to the whole system while others think it’s supposed to add more power. The real answer should be the purpose is to do both. Secondly, the team needs to understand the measurable capacity of the system and clearly define what level of redundancy N it is supposed to be. The cluster should be verified to be redundant and resilient by measurable metrics and benchmarks without verification the system may not be properly configured to handle the expected loads. The third problem is when the teams don’t monitor the system to verify what has been defined is working the way it should be. A server could be taken offline to verify the N+1 redundancy and if the system is nearly at capacity then the team should add capacity or extra redundancy when required (Limoncelli).
3.3 Storing files in a clustered environment
Nikos Mastorakis published an article by Hsien-Tsung Chang that touched on this subject very well. In it, Chang discussed (back in 2009) how storage is important in relation to the world wide web and by extension clustered computers that are now used within the world wide web. One major point Chang makes is to point out that traditional storage types for servers such as RAID cannot fulfill the storage requirements of ever increasing website numbers and needs for more storage that these sites require. He discusses how many file system types have been proposed and tested by many companies including the Lustre file system I mentioned previously and others such as the Google file system (Mastorakis).
If I understand Chang’s design correctly the storage servers maintain a list of ID’s in memory similar to Lustre’s metadata server that holds the information, access permissions, location of files and replication information in the servers main memory so that locating and accessing files on any of the servers in the cluster is faster and more efficient than if they were all seen as a single distributed file system. Storage space and file information (Buckets) are replicated to multiple servers and thus load balancing can occur because the servers keep a list of duplicates on which servers and can assign a custom access algorithm to assign what requests for that file are routed to which file on what server. It also balances files by allowing recording of server storage and CPU utilization to allow a decision of where to store certain files on what server. For example, a server that stores large files but has low CPU utilization due to the files not being accessed as much may need the files relocated to other servers, so it can serve files that are accessed more often along with the lower accessed files to optimize the cluster (Mastorakis).
4. My Own Experience and Experimentation
The primary concern when setting up a cluster is to verify that the servers are loaded with the proper software and data replication is setup and verified to be working properly. Since my website used WordPress and relies heavily on database read and writes it would have to have a load balancer specifically to forward and return those requests. The database server I used as I mentioned earlier was MariaDB setup in a Galera cluster which allows the servers inside the cluster to synchronize data based on the position in a log file. The configuration was simple enough once I had a guide for the setup.
The first step was to define the server clusters. The first cluster was the database cluster and creation weren’t very difficult with Galera. I named the primary server DBMaster and then I created two slave servers: DBslave1 and DBslave2. I named them this way instead of just DBServer1 to DBServer3 due to the possibility of needing a read/write split cluster where you have one server that handles the read and writes requests and all other servers only handle read-only requests. By doing this it is my hope that the cluster wouldn’t go out of sync. Luckily with the Galera software, it was easy to have three masters that didn’t require the read/write split configuration and kept each other in sync virtually instantaneously even running a benchmark of one million reads and one million writes to verify that the system did indeed operate under stress. On a much larger scale of dozens of servers with a large website a read/write split may be best practice so that the synchronization isn’t complicated, but this may require a load balancing proxy that can break down the requests and forward them to the correct servers based on the need for read or write.
The second server cluster was the web servers to provide the web content. Two web servers were setup WebHead1 and WebHead2, their names coming from a how-to on the internet, but I also figured that it works because they are the initial servers hit by a user browsing to the website. Each was configured to handle basic HTTP requests on port 80 while incorporating some performance enhancements such as compression and caching on each server. A fresh copy of WordPress was installed, and an old website was imported to the new servers with the database for WordPress pointing to the database load balancer. Once static files were uploaded to the WebHeads the site was up and running.
4.1 Hardware Used
There was a total of three physical servers: two servers with VMWare ESXI free hypervisor installed and one old crusty laptop that I used as a web server for almost two years. All the hardware was tested for functionality before use and the servers utilized single network cards connected to a 1Gb network. Internet access was provided by my local ISP at 100Mbps downstream and 4Mbps Upstream.
4.2 Virtualized Servers
The virtualized servers were loaded with a very minimal copy of CentOS 7.5 with custom configurations onto the VMWare ESXI Hypervisor at version 6.7.0. The more powerful physical server hosted the web servers and two of the database servers while a secondary server was created on a laptop I purchased last year with an external 2tb hard drive attached for storage. The secondary server handled the primary database server and a free virtualized Kemp load balancer very well considering it was running on USB 3.0 hard drive and a USB 2.0 16Gb Key drive that was running the actual ESXI Hypervisor itself.
4.3 Load Balancers Used
Initially, I tried to use free load balancing software as mentioned above but the configuration for the database servers ended up being too complicated and hard to understand. Someone could write several college courses on the configuration of MaxScale and other load balancing proxies for MariaDB and MySQL and you wouldn’t begin to scratch the surface. In contrast, Apaches’ Mod_Proxy_Balancer was actually very simple to implement for the creation of an HTTP web server cluster.
4.4 Problems found during initial installation
Initially, with the installation and configuration of the Galera cluster, I found that a line in the script “galera_new_cluster” needed to be changed from “return $exitcode” to “exit $exitcode” fixing a bug that prevented the cluster from being created.
I had planned on enabling a Distributed File System to handle synchronizing physical files between the web servers but was unable to get the project working. I had tried a few suggestions and eventually found the Lustre project that allowed a kernel module to be installed onto CentOS and a file system to be created for fairly fast synchronization between the webservers but unfortunately, I failed to complete this step in time. I proceded with testing the clusters by manually synchronizing the files as needed since I wasn’t making any changes to the site aside from the initial setup. I also allowed each WebHead to create its own cached copies of pages using Apache Mod_Cache plugin which requires Apache HTTP to assign specific names for specific files under a specified cache directory. I also made the assumption that each WebHeads individual Apache installation would name the files something that may differ between servers so instead of taking the risk I proceeded as is.
4.5 Initial Results
After everything was set up and working I used an external site; websitetest.org, to test the response time, first-byte time, and to verify the configuration of caching and that compression of all files was enabled properly.
The initial load times with the proxies existing on the old laptop server averaged around 10 seconds for full page load while time to first byte was almost 2 seconds. Time to first byte is primarily defined as how quickly a web server can respond to request for data and is literally the first byte received by the user’s browser. Being at almost 2 seconds is unheard of but I have seen it before when my site was sitting on a hosting provider that didn’t provide proper configuration of the database and Apache web server. After going back and verifying the configurations were correct the average was still high, so I turned my attention to the proxy server I had created to see if any optimization could be done there.
The old laptop I had used for a web server was unfortunately slow with a single processor running at about 1Ghz and 4GB memory, this, however, was not the bottleneck. The bottleneck ended up being the 100Mb network card that couldn’t handle the data transfers and requests between all the different servers. This led me to research into device and virtualized based load balancers and that’s when I found Kemp.
Another metric I used to benchmark the webservers was the Java-based Apache JMeter that has the ability to send requests to the proxy and measure how many requests are answered before failure. With the old laptop, the system could barely handle 50 requests before timeout failures started to show. Unfortunately, this was very disappointing.
4.6 Results with a hardware-based load balancer
The hardware-based load balancer was a lifesaver, it went in easy, was very well documented and came as an OVF file for the VMWare ESXI hypervisor. After installing the virtual machine files, it comes up and asks you for a password and then dials home for a license to be used free. The configuration was simple by just specifying the type of service (HTTP, database, etc.) and the servers and algorithm to be used. The only problem I ran into is with WordPress that requires one specific web and database server to run its administration console through. If you try to run the administration console using the clustered servers it would pop up with errors even though some source configurations were specified in Kemps configuration. I was, unfortunately, unable to solve this problem in time.
After the configuration was completed I re-ran the tests on websitetest.org and was surprised that the time to first byte dropped from 2 seconds down to approximately 0.250 seconds and total page load time was 3 seconds with the first view at 2.4 seconds on average. To add to my excitement the Apache JMeter results came back much higher as well with the ability to serve approximately 5000 requests per minute instead of the abysmal 50 requests that the old laptop proxy setup was providing.
During the testing, I found that the Round Robin Algorithm did truly provide the best performance increases for the HTTP cluster while a Weighted Least Connection seemed to improve database request speeds through WordPress. Luckily with Kemp, it’s simple to switch between the different algorithms with a few clicks of a mouse button.
4.7 Possible bottlenecks with experiment setup
The primary bottleneck with the setup of the ESXI servers could be several. The first being that the two servers only have access to one network card so splitting some servers to secondary or tertiary network connections to provide additional bandwidth wasn’t possible.
Secondly, each server utilized only one standard (non-SSD or SSHD) hard drive each and with the more powerful server having an advanced format drive (4k sectors) it was able to handle more servers. In my estimation though, the hard drive performance was probably maxed for file access a good deal of the time.
Third, the internet connection upstream speed was also a limiter with 4Mbps upstream I was probably limited to about 1Mb upload speeds which would slow down the website load times to users compared to a more business-like upload speed of 100Mbps for a small business connection in our area.
If I had enough time and resources I would have been able to increase the performance of the servers by adding more hard drives to allow each virtual server, its own dedicated hard drive as well as adding additional network cards to allow for more scalability experimenting.
5. Conclusion
The research I have done for this paper has been enlightening even though it has been limited by available resources and monies available for the experimentation on clustering for a web serving environment. I am a little disappointed I ran out of resources to be able to test distributed file systems and load balancing with them and feel that the paper has suffered because of it. I have enjoyed the experience though and have found that the clustering indeed did improve the performance, reliability, and scalability of the website I was using especially once I was able to put into play a proper load balancer between the clusters and the users that had been optimized specifically for that purpose. However, I am certain with enough time I could have made that old laptop work just as efficiently if I had more programming experience to build an interface for the configuration customizations necessary.
Works Cited
Bryhni, H. et al. “A Comparison Of Load Balancing Techniques For Scalable Web Servers”. IEEE Network, vol 14, no. 4, 2000, pp. 58-64. Institute Of Electrical And Electronics Engineers (IEEE), doi:10.1109/65.855480. Accessed 3 Nov 2018.
Karlsson, Anders. “Getting Started With Mariadb Galera And Mariadb Maxscale On Centos | Mariadb”. Mariadb, 2017, https://mariadb.com/resources/blog/getting-started-with-mariadb-galera-and-mariadb-maxscale-on-centos/. Accessed 4 Dec 2018.
LIMONCELLI, THOMAS A. “Are You Load Balancing Wrong?” Communications of the ACM, vol. 60, no. 2, Feb. 2017, pp. 55–57. EBSCOhost, doi:10.1145/3024926.
“Mod_Proxy_Balancer – Apache HTTP Server Version 2.4”. Httpd.Apache.Org, 2018, https://httpd.apache.org/docs/2.4/mod/mod_proxy_balancer.html. Accessed 4 Dec 2018.
Mastorakis, Nikos E. Recent Advances In Computers. WSEAS Press, 2009, pp. 351-356.
Narwal, Abhikriti1, abhikritiin@gmail.co., and Sunita1 Dhingra. “Analytical Review of Load Balancing Techniques in Cloud Computing.” International Journal of Advanced Research in Computer Science, vol. 9, no. 2, Mar. 2018, pp. 550–553. EBSCOhost, doi:10.26483/ijarcs.v9i2.5820.
Paderin, Maksim. ANALYSIS OF SERVER CLUSTERING ITS USES AND IMPLEMENTATION. 1st ed., South Eastern Finland University Of Applied Sciences, 2017, https://www.theseus.fi/bitstream/handle/10024/137384/Maksim_Paderin.pdf. Accessed 4 Dec 2018.
Tamilarasi, S. .., and K. .. Kungumaraj. “Dcwslbq: Dynamic Content Based Web Server Load Balancing Queue Algorithm for Heterogeneous Web Cluster.” International Journal of Advanced Research in Computer Science, vol. 8, no. 8, Sept. 2017, pp. 489–494. EBSCOhost, doi:10.26483/ijarcs.v8i8.4561.