Set Up Load Balancing Using the NGINX Web Server

Load balancing distributes incoming network traffic efficiently across multiple server instances. To load balance, we host the same application on multiple server instances, configured in different data centers or regions. This improves the application's availability, reliability, and scalability. It ensures that no single server instance is overloaded with traffic at any given time, and it is one of the critical components in achieving horizontal scalability in software architectures.

We can use NGINX as a very efficient HTTP load balancer to distribute client requests across multiple application servers. To implement the load balancer, we need at least two server instances. In the previous chapter, we discussed the enhancements to the cloud deployment architecture. There, the same Node.js application is hosted on multiple server instances, located in Availability Zones A and B of the us-west-2 region. We already have one application server ready in the us-west-2 region, Availability Zone A.

Let's create another application server, hosted in the us-west-2 region and Availability Zone B. Please refer to the following link for creating and configuring the new Lightsail instances.

Multiple application servers hosted in Availability Zone A and B

Now that the server instance is created, we also need to run the Node.js application on it. For that, please refer to the following link:

Both of our backend application servers are up and running, as seen in the image below. We can now proceed to implement load balancing using NGINX.

Multiple application servers up and running

The open-source version of NGINX supports the following load balancing mechanisms:

  • Round-Robin

    Traffic is distributed evenly across the group of servers, with server weights taken into consideration. This is the default method. It does not take the current load on each server into account.
  • Least Connections

    Traffic is sent to the server with the least number of active connections, again with server weights taken into consideration. To use this method, we specify the least_conn directive in the upstream block. Because it accounts for how busy each server currently is, it tends to spread load more evenly than round-robin when request durations vary.
  • IP Hash

    In this mechanism, traffic is sent to the servers based on the client IP address. Requests from the same IP address always go to the same server unless that server is unavailable. To use this method, we specify the ip_hash directive in the upstream block.
  • Generic Hash

    In this mechanism, traffic is sent to the servers based on a user-defined key, which can be a text string, a variable, or a combination of both. The optional consistent parameter of the hash directive enables ketama consistent-hash load balancing. To use this method, we specify the hash directive with a key, for example $request_uri, in the upstream block.

To implement load balancing with NGINX, we define an upstream block in an NGINX configuration file. The upstream module is used to define groups of servers to which client requests will be proxied. Let's create a new configuration file called upstream.conf. First, connect to the web server instance using SSH, then run:

   	cd /etc/nginx

   	touch upstream.conf && vi upstream.conf

If you get a permission denied error, run the following command:

   	sudo chown ec2-user:ec2-user -R /etc/nginx

Replace ec2-user with the current user, which you can find by running the whoami command.

For our scenario, the syntax of the upstream block looks like this:

	upstream UPSTREAM_NAME {
	    server PRIVATE_IP_APP_SERVER_A:3000;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}

UPSTREAM_NAME is the name of the upstream server group. We can give it any meaningful name, say backend, which we then reference in the proxy_pass directive inside a location block. Replace PRIVATE_IP_APP_SERVER_A with the private IP address of the application server hosted in Availability Zone A, and PRIVATE_IP_APP_SERVER_B with the private IP address of the application server hosted in Availability Zone B. The server directive defines the address and other parameters of a server. The address can be an IP address or a domain name, with an optional port, or a UNIX socket path prefixed with "unix:". 3000 is the port on which the Node.js application listens for client requests. If no port is specified in a server address, port 80 is used by default. Now the upstream.conf file looks like this:

	upstream backend {
	    server PRIVATE_IP_APP_SERVER_A:3000;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}
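
The server directive accepts several address forms, as described above. The following is only a sketch of those forms side by side. The group name example_pool, the documentation-range IP addresses, the domain name, and the socket path are all hypothetical and are not part of this chapter's setup.

```nginx
# Hypothetical upstream group illustrating the address forms;
# none of these addresses come from this chapter's servers.
upstream example_pool {
    server 192.0.2.10;                   # IP address, no port: port 80 is used
    server 192.0.2.11:3000;              # IP address with an explicit port
    server app.internal.example:3000;    # domain name with a port
    server unix:/var/run/app.sock;       # UNIX domain socket path
}
```

Note that a domain name used here has to be resolvable when NGINX loads the configuration, otherwise the configuration is rejected.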

We need to import this upstream.conf file into the main NGINX configuration file (nginx.conf) using the include directive, inside the http block. We place this include before the one that imports the site configuration files from the conf.d directory, since those files use this upstream. Now the nginx.conf file looks like this:

   	user nginx;
	worker_processes  auto;

	error_log   /var/log/nginx/error.log;
	error_log   /var/log/nginx/error_debug.log debug;
	error_log   /var/log/nginx/error_extreme.log emerg;
	error_log   /var/log/nginx/error_critical.log crit;

	pid        /var/run/;

	events {
	    worker_connections  65535;
	}

	http {
	    include       /etc/nginx/mime.types;
	    default_type  application/octet-stream;
	    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
	                      '$status $body_bytes_sent "$http_referer" '
	                      '"$http_user_agent" "$http_x_forwarded_for"';
	    access_log  /var/log/nginx/access.log  main;
	    sendfile        on;
	    keepalive_timeout  65;
	    include /etc/nginx/upstream.conf;
	    include /etc/nginx/conf.d/*.conf;
	}
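
When testing the load balancer, it can help to see on the NGINX side which application server handled each request. One option is to log the built-in $upstream_addr variable. The following is a minimal sketch, assuming a hypothetical log format name upstream_log and a hypothetical log file path. Both directives would go inside the http block of nginx.conf.

```nginx
# Hypothetical log format: records the address of the upstream server
# that handled each proxied request.
log_format  upstream_log  '$remote_addr "$request" $status upstream=$upstream_addr';

# Hypothetical log file, written in addition to the main access log.
access_log  /var/log/nginx/upstream_access.log  upstream_log;
```

For requests that are not proxied, such as static files served from the html directory, the upstream value is logged as a dash.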

Let's use the above upstream block named backend to proxy API requests to our Node.js applications. Until now, the default.conf file used the following location block to send API requests to the backend server, since at that time we had only a single application server:

	location /api/ {
	    proxy_pass      http://PRIVATE_IP_APP_SERVER_A:3000;
	}

Now that we have multiple server instances of the same application in different Availability Zones, we need to modify the proxy_pass directive as follows:

	location /api/ {
	    proxy_pass      http://backend;
	}

Now the default.conf file looks like this:

	server {
	    listen 80 default_server;
	    listen [::]:80 default_server;
	    server_name  localhost;

	    location / {
	        root   /usr/share/nginx/html;
	        index  index.html index.htm;
	    }

	    # Proxy API requests to the upstream group defined in upstream.conf
	    location /api/ {
	        proxy_pass      http://backend;
	    }

	    # Deny access to hidden files (names starting with a dot)
	    location ~ /\. {
	        deny all;
	    }

	    error_page   500 502 503 504  /50x.html;
	    location = /50x.html {
	        root   /usr/share/nginx/html;
	    }
	}

Reload the NGINX configuration using the following command:

   	sudo systemctl reload nginx

Now that the NGINX configuration is reloaded with load balancing in place, let's call the API endpoints to see the changes. Let's call the hotel API GET endpoint 3 times and observe how the traffic is distributed.

Postman get api requests to test nginx load balancing

We can see 2 requests going to the server on the left, say server-left, and 1 request going to the server on the right, say server-right. NGINX distributes the traffic across the available servers in turn. Let's call the hotel GET API endpoint once more to see the even distribution:

Postman get api requests to test nginx load balancing even

As we can see, server-left and server-right have now received an equal number of requests. Now, let's create a hotel using the POST API endpoint:

Postman post api request create hotel

From the above logs, we can see that the hotel is created and the request is handled by server-left. Let's get the list of hotels again.

Postman get api request with no data

Postman get api request with no data log

This time, our request goes to server-right. As you may remember, we store our application data in server memory, which is bound to that specific server instance. So if a request goes to the server instance where the hotel create API endpoint was executed, it returns the hotel data; otherwise it returns an empty array, since the hotel was never created on the other server.

Let's call the hotel GET API endpoint again. Since NGINX with the round-robin method distributes traffic sequentially, it is now server-left's turn to process the request. This time, it returns the list of hotels.

Postman get api request with data log

Postman get api request with data

This inconsistency is a major issue in any software architecture, and it prevents us from scaling horizontally. One way to work around it is to implement sticky sessions. With sticky sessions, the load balancer always directs a client's traffic to the same server for the entirety of the session. Sticky sessions can be achieved using the IP Hash and Generic Hash load balancing mechanisms. Let's try the IP Hash method using the ip_hash directive.

	upstream backend {
	    ip_hash;
	    server PRIVATE_IP_APP_SERVER_A:3000;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}

Let's try the Generic Hash method.

	upstream backend {
	    hash $request_uri consistent;
	    server PRIVATE_IP_APP_SERVER_A:3000;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}

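If you now make API requests, you will see that they consistently go to the same server. With ip_hash, all requests from the same client IP address reach the same server. With hash $request_uri, all requests for the same URI reach the same server, regardless of which client sends them.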
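
The hash key does not have to be the request URI. As a sketch of keying on the client session instead, the block below assumes a hypothetical cookie named sessionid that the application would have to set. Nothing in this chapter states that the Node.js application sets such a cookie, so treat the cookie name as an illustration only.

```nginx
upstream backend {
    # Hypothetical: hash on the value of a "sessionid" cookie so that
    # requests carrying the same cookie value reach the same server.
    hash $cookie_sessionid consistent;
    server PRIVATE_IP_APP_SERVER_A:3000;
    server PRIVATE_IP_APP_SERVER_B:3000;
}
```

Requests that arrive without the cookie all share the same empty key, so they would all land on one server until the cookie is set.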

From time to time, we need to deploy new changes to the production servers. There are several ways to do this. One is to deploy the changes to all server instances at once. This works, but for a short time the application is unavailable, which impacts customer requests. To avoid this, we can use a rolling deployment, in which we take one server offline, deploy the latest changes to it, and bring it back online, then repeat the same process with the other servers. This helps us achieve zero-downtime deployment. To do this, we can use the down parameter of the server directive inside the upstream block.

	upstream backend {
	    server PRIVATE_IP_APP_SERVER_A:3000 down;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}

When we specify the down parameter on a server, the load balancer treats that server as offline, redirecting all traffic to the other healthy servers, which in our case is the other application server.
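
The rolling deployment described above can be sketched as a sequence of edits to upstream.conf, each followed by a reload of NGINX. This is only an outline under the assumption of two servers, using the placeholder addresses from earlier. The later steps are shown as comments because only one version of the block is active at a time.

```nginx
# Step 1: take server A out of rotation, reload NGINX, then deploy to A.
upstream backend {
    server PRIVATE_IP_APP_SERVER_A:3000 down;
    server PRIVATE_IP_APP_SERVER_B:3000;
}

# Step 2 (replaces the block above): bring A back, take B out,
# reload NGINX, then deploy to B.
# upstream backend {
#     server PRIVATE_IP_APP_SERVER_A:3000;
#     server PRIVATE_IP_APP_SERVER_B:3000 down;
# }

# Step 3: remove the remaining "down" parameter and reload once more.
```

When ip_hash is in use, marking a server as down rather than deleting its line preserves the existing mapping of client IP addresses to the remaining servers.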

Let's assume we have two servers of varying capacity, say server 1 is twice as powerful as server 2. In this scenario, server 1 can handle more traffic than server 2, and we want it to receive a larger share. We can achieve this by using the weight parameter with different values.

	upstream backend {
	    server PRIVATE_IP_APP_SERVER_A:3000 weight=2;
	    server PRIVATE_IP_APP_SERVER_B:3000 weight=1;
	}

Here, out of every 3 requests, 2 go to the first server and 1 goes to the second server.

If we want the traffic to be sent to the server with the least number of active connections, we can implement it in the following way:

	upstream backend {
	    least_conn;
	    server PRIVATE_IP_APP_SERVER_A:3000;
	    server PRIVATE_IP_APP_SERVER_B:3000;
	}

The NGINX load balancer performs passive server health checks: it observes the outcome of real client requests and stops sending traffic to servers that recently failed. The server directive accepts two more parameters for this purpose, max_fails and fail_timeout.

max_fails sets the number of unsuccessful attempts to communicate with the server that must occur within the period set by fail_timeout for the server to be considered unhealthy. Once that threshold is reached, the server is marked as failed for the duration set by fail_timeout. By default, max_fails is 1 and fail_timeout is 10 seconds. These parameters can be changed in the following way:

	upstream backend {
	    server PRIVATE_IP_APP_SERVER_A:3000 max_fails=3 fail_timeout=20s;
	    server PRIVATE_IP_APP_SERVER_B:3000 max_fails=2;
	}


If an error occurs while communicating with one of the upstream servers, the request is passed to the next server, and so on, until all of the available servers have been tried.

You can see this by taking one of the application servers down. On the server instance where the application is no longer listening for requests, run the following command to track incoming requests:

	sudo tcpdump -i eth0 tcp port 3000

Testing health check implementation

From the image above, we can see that even though the Node.js application on server-right is not running, requests keep arriving at it until the number of failed attempts reaches max_fails, which is 3 here. After that, requests are no longer passed to that server, as it is marked unhealthy.
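
Which failures cause a request to be passed to the next server, as described above, is controlled by the proxy_next_upstream directive, whose default value is error timeout. The following sketch assumes we would also want 502, 503, and 504 responses from an application server to trigger a retry on the next one. That choice is an assumption for illustration and is not part of this chapter's configuration.

```nginx
location /api/ {
    proxy_pass           http://backend;
    # Try the next upstream server on connection errors, timeouts,
    # and these gateway-style responses (an assumed policy).
    proxy_next_upstream  error timeout http_502 http_503 http_504;
}
```

Attempts that end in one of the listed conditions are also the ones counted as unsuccessful for max_fails. By default, NGINX does not retry non-idempotent requests such as POST on another server once the request has been sent.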

In our next chapter, we will install and configure MySQL server for our travel application.
