#138 Linux  03.06.2007

Setting up a load balancing cluster with director failover based on Keepalived, LVS and Gentoo


This article describes the setup of a clustered HTTP and HTTPS service with two directors and two real servers (I use the words load balancer and director synonymously). On the directors LVS is used for TCP and UDP load balancing. Additionally the directors are used as firewalls. The underlying operating system is Gentoo.
Note that director failover is not enough to realise high availability. You need redundant cables, switches, uplinks etc. HA is beyond the scope of this article.

LVS is the abbreviation for the Linux Virtual Server project. LVS offers several ways to set up the routing within the load balancing cluster. In this example I will describe the easiest setup, which is NAT. Alternatively you can use direct routing or IP tunneling. Direct routing should be used in clusters which have a very large number of real servers (those machines which provide the actual service, e.g. webservers). With NAT the directors tend to become the bottleneck of the cluster as the number of real servers increases, because the traffic passes through the directors in both directions. With direct routing only the requests go through the directors; the answers bypass them and go directly to the client. IP tunneling is the third option and must be used if the cluster consists of physically separated real servers, e.g. if the real servers are located in different data centers. In the two latter cases the network configuration is much more complicated, and as far as I know most setups tend towards NAT.
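
The three forwarding methods correspond to three ipvsadm flags: -m (masquerading, i.e. NAT), -g (gatewaying, i.e. direct routing) and -i (ipip, i.e. tunneling). The following sketch only prints the ipvsadm command for a given method, it does not run it. The function names and the method labels nat, dr and tun are made up for this illustration.

```shell
#!/bin/sh
# Map an LVS forwarding method to the corresponding ipvsadm flag.
# The labels nat/dr/tun are assumptions chosen for this sketch.
lvs_forward_flag() {
    case "$1" in
        nat) echo "-m" ;;   # masquerading (NAT), used in this article
        dr)  echo "-g" ;;   # gatewaying (direct routing)
        tun) echo "-i" ;;   # IP-in-IP encapsulation (tunneling)
        *)   echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the command that adds a real server to a virtual service.
# $1 = VIP:port, $2 = RIP:port, $3 = forwarding method
add_real_server_cmd() {
    flag=$(lvs_forward_flag "$3") || return 1
    echo "ipvsadm -a -t $1 -r $2 $flag"
}

add_real_server_cmd 192.168.0.174:80 192.168.170.25:80 nat
```

In the setup described here Keepalived issues the equivalent of these commands itself, so this is only meant to show where the choice of the forwarding method ends up.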

For our cluster we need
* Health checks for the directors and the real servers
* Automatic failover of the directors
* Routing to the real servers with rr (round robin) or wrr (weighted round robin)

In short Keepalived is used for the health checks of the load balancers and real servers. At the same time Keepalived configures LVS.
LVS itself is used for NAT. This is achieved by the following operations on the directors:
* Activation of LVS in the Linux Kernel
* Installation and configuration of Keepalived
* Optional: installation of ipvsadm
* Installation and configuration of iptables

LVS cluster overview, IP addresses used in the cluster:

Load balancer 1 eth0: 
 192.168.0.174 VIP (virtual IP) is the main cluster IP which is used by the clients.
 192.168.0.204 external IP, used to login directly to this machine.
Load balancer 1  eth1: 
 192.168.170.179 internal dummy IP, Keepalived needs an IP bound to every NIC it uses.
 192.168.170.1 DIP (director IP) is the gateway for the real servers.

Load balancer 2 eth0:
 [ 192.168.0.174 ] VIP (virtual IP) is the main cluster IP which is used by the clients.
 192.168.0.176 external IP, used to login directly to this machine.
Load balancer 2 eth1: 
 [ 192.168.170.1 ] DIP (director IP) is the gateway for the real servers.
 192.168.170.180 internal dummy IP, Keepalived needs an IP bound to every NIC it uses.
 
Real server 1 eth0:
 192.168.170.25 RIP (real IP) accessible only from the master load balancer

Real server 2 eth0:
 192.168.170.35 RIP (real IP) accessible only from the master load balancer

The VIP and DIP are bound on the master load balancer only (they are shown in brackets for load balancer 2, which takes them over in case of a failover).


As mentioned before, Keepalived needs at least one IP already bound to each NIC it uses. If you start Keepalived without any IP bound to eth1, for example, you can see the following messages in the logs (/var/log/messages):

Oct 19 11:09:16 director01 Keepalived: Starting VRRP child process, pid=5576 
Oct 19 11:09:16 director01 Keepalived_vrrp: cant do IP_ADD_MEMBERSHIP errno=No such device (19) 
Oct 19 11:09:16 director01 Keepalived_vrrp: cant bind to device eth1. errno=9. (try to run it as root) 
Oct 19 11:09:19 director01 Keepalived_vrrp: VRRP_Instance(VI_INTERNAL) Transition to MASTER STATE



Activation of LVS in the Linux Kernel
The first step is to log in to the directors and activate LVS with TCP and UDP load balancing in the Kernel. To be flexible, all schedulers should be included as modules. The most important ones are round robin and weighted round robin.

This is the path to the Kernel LVS configuration:
Symbol: IP_VS_PROTO_TCP [=y] 
Prompt: TCP load balancing support 
Defined at net/ipv4/ipvs/Kconfig:66 
Depends on: NET && INET && NETFILTER && IP_VS 
Location: 
 -> Networking 
 -> Networking support (NET [=y]) 
 -> Networking options 
 -> TCP/IP networking (INET [=y]) 
 -> IP: Virtual Server Configuration 
 -> IP virtual server support (EXPERIMENTAL) (IP_VS [=m]) 

In order for LVS to work some Kernel parameters must be set. This can be done temporarily from the shell
echo 1 > /proc/sys/net/ipv4/ip_forward 
echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects 
echo 0 > /proc/sys/net/ipv4/conf/default/send_redirects 
echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects 

or permanently by editing /etc/sysctl.conf
net.ipv4.ip_forward = 1 
net.ipv4.conf.all.send_redirects = 0 
net.ipv4.conf.default.send_redirects = 0 
net.ipv4.conf.eth0.send_redirects = 0 
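
To make sure that none of the four settings got lost, the file can be checked with a few lines of shell. This is a minimal sketch; the function name check_lvs_sysctl is made up, and it only looks at the file, not at the values currently active in the running Kernel.

```shell
#!/bin/sh
# Report which of the settings required on the directors are missing from a
# sysctl.conf-style file. Function name and default path are assumptions
# made for this sketch.
check_lvs_sysctl() {
    file=${1:-/etc/sysctl.conf}
    missing=0
    for setting in net.ipv4.ip_forward=1 \
                   net.ipv4.conf.all.send_redirects=0 \
                   net.ipv4.conf.default.send_redirects=0 \
                   net.ipv4.conf.eth0.send_redirects=0; do
        name=${setting%=*}
        value=${setting#*=}
        # Escape the dots so that they only match literal dots.
        pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
        # Accept "name = value" with optional whitespace around "=".
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*${value}[[:space:]]*\$" "$file"; then
            echo "missing: $name = $value"
            missing=1
        fi
    done
    return $missing
}
```

Called as check_lvs_sysctl /etc/sysctl.conf it prints one line per missing setting and returns a non-zero exit status if anything is missing.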

Additionally we activate ip forwarding in /etc/conf.d/iptables:
ENABLE_FORWARDING_IPv4="yes"

To load all the LVS Kernel modules at boot time the file /etc/modules.autoload.d/kernel-2.6 needs these lines:
#lvs 
ip_vs 
ip_vs_rr 
ip_vs_wrr 
ip_vs_lc 
ip_vs_wlc 
ip_vs_lblc 
ip_vs_lblcr 
ip_vs_dh 
ip_vs_sh 
ip_vs_sed
ip_vs_nq 
ip_vs_ftp



Installation and configuration of Keepalived
Now that LVS is activated in the Kernel it's time to install and configure Keepalived.
emerge -va keepalived
vi /etc/keepalived/keepalived.conf

! Configuration file for Keepalived on DIRECTOR01
! First of all some global parameters
global_defs { 
  notification_email { admin@mydomain } 
  notification_email_from director01@mydomain
  smtp_server 192.168.0.15 
  smtp_connect_timeout 30 
  ! UID for this director
  router_id DIRECTOR01
} 

! This part associates two NICs to a virtual sync group: the external and the internal network interface. 
! It tells Keepalived to move both IPs (VIP 192.168.0.174 and DIP 192.168.170.1) to the other director in case of a failover.
vrrp_sync_group VG1 { 
  group { 
    ! eth1
    VI_INTERNAL
    ! eth0
    VI_EXTERNAL 
  }
  ! In case of a state change to master (recovery on the master, failover on the slave) 
  ! the following script sends an arp-reply and informs the router in front of our load balancers 
  ! about the new mac address for 192.168.0.174:
  ! #!/bin/bash 
  ! arping -q -c 20 -A -I eth0 192.168.0.174 
  ! exit 
  !
  ! "tcpdump -i eth0 arp" should give you something like
  ! 11:07:25.835839 arp reply 192.168.0.174 is-at 00:40:f4:ec:55:fd (oui Unknown)
  !
  notify_master /home/keepalived/scripts/keepalived_to_master.sh 
}

! For every NIC we create an instance.
! Both have the state MASTER because we are on the master director
! This state is determined by the priority! If you start an instance of Keepalived with the state MASTER but
! with a lower priority than the instance of the second director both will change their state.

! This instance uses eth0 and the VIP of the cluster.
vrrp_instance VI_EXTERNAL { 
  state MASTER 
  interface eth0
  virtual_router_id 41 
  priority 100 
  advert_int 5 
  authentication { 
    auth_type PASS
    auth_pass 1111 
  } 
  virtual_ipaddress {
    192.168.0.174 
  } 
} 


! This instance uses eth1 and the DIP for the real servers.
vrrp_instance VI_INTERNAL {
  state MASTER 
  interface eth1
  virtual_router_id 42
  priority 100 
  advert_int 5 
  authentication { 
    auth_type PASS 
    auth_pass 1111 
  }
  virtual_ipaddress { 
    192.168.170.1 
  } 
}

! Now we use the VIP from VI_EXTERNAL to define a virtual server which includes several real servers.
! A virtual server is defined through the VIP of the cluster and the port of the virtual service.
! We use NAT and the protocol TCP.
virtual_server 192.168.0.174 80 { 
  delay_loop 6 
  lb_algo rr 
  lb_kind NAT
  nat_mask 255.255.255.0 
  protocol TCP 
  virtualhost test.mydomain 

  ! First real server with a simple TCP health check 
  real_server 192.168.170.25 80 {
    weight 1 
    TCP_CHECK {
      connect_timeout 3 
      connect_port 80 
    }
  } 

  ! Second real server with an exact HTTP check. See genhash below.
  real_server 192.168.170.35 80 {
    weight 1 
    HTTP_GET { 
      url { 
        path /
        digest f53ded420ab7ac960bbf7d6f11115f39 
      } 
      connect_timeout 10 
      connect_port 80 
      nb_get_retry 3
      delay_before_retry 10 
    } 
  } 
} 


!If you want to make sure that a client connects to the same real server you can
!use a persistent connection. This virtual server illustrates it for HTTPS.
!For a maximum of one hour a client is routed to the same real server.
virtual_server 192.168.0.174 443 { 
  delay_loop 10 
  lb_algo rr 
  lb_kind NAT
  nat_mask 255.255.255.0 
  protocol TCP 
  persistence_timeout 3600 
  virtualhost test.mydomain

  real_server 192.168.170.35 443 { 
    weight 1
    SSL_GET { 
      url {
        path / 
        digest f53ded420ab7ac960bbf7d6f11115f39 
      } 
      connect_timeout 10 
      connect_port 443 
      nb_get_retry 3 
      delay_before_retry 10 
    }
  }
} 

It is not necessary to define a virtual server for VI_INTERNAL as we are doing NAT only on this IP.
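
The notify_master script referenced in the sync group is only shown as a comment above. Written out as a file it could look like the following sketch. The arping call is the one from the comment; the function name announce_vip and the DRY_RUN variable are additions of this sketch, so that the script can be tried without root privileges.

```shell
#!/bin/sh
# Sketch of /home/keepalived/scripts/keepalived_to_master.sh.
# DRY_RUN is a variable introduced for this sketch: when it is set to 1
# the command is only printed instead of being executed.
announce_vip() {
    # $1 = interface, $2 = virtual IP
    cmd="arping -q -c 20 -A -I $1 $2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Values from this article: the VIP 192.168.0.174 lives on eth0.
# Remove the DRY_RUN line to really send the ARP replies.
DRY_RUN=1
announce_vip eth0 192.168.0.174
```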


The same has to be done on the second director. The Keepalived configuration files on both directors differ only in three parameters: the state, the router_id and the priority. The priorities should differ by at least 50.
global_defs { 
  notification_email { admin@mydomain } 
  notification_email_from director01@mydomain
  smtp_server 192.168.0.15 
  smtp_connect_timeout 30 
  router_id DIRECTOR02
} 

vrrp_sync_group VG1 { 
  group { 
    ! eth1
    VI_INTERNAL
    ! eth0
    VI_EXTERNAL 
  }
  notify_master /home/keepalived/scripts/keepalived_to_master.sh 
}

vrrp_instance VI_EXTERNAL { 
  state BACKUP
  interface eth0
  virtual_router_id 41 
  priority 50 
  advert_int 5 
  authentication { 
    auth_type PASS
    auth_pass 1111 
  } 
  virtual_ipaddress {
    192.168.0.174 
  } 
} 


vrrp_instance VI_INTERNAL {
  state BACKUP
  interface eth1
  virtual_router_id 42
  priority 50 
  advert_int 5 
  authentication { 
    auth_type PASS 
    auth_pass 1111 
  }
  virtual_ipaddress { 
    192.168.170.1 
  } 
}

virtual_server 192.168.0.174 80 { 
  delay_loop 6 
  lb_algo rr 
  lb_kind NAT
  nat_mask 255.255.255.0 
  protocol TCP 
  virtualhost test.mydomain 

  real_server 192.168.170.25 80 {
    weight 1 
    TCP_CHECK {
      connect_timeout 3 
      connect_port 80 
    }
  } 

  real_server 192.168.170.35 80 {
    weight 1 
    HTTP_GET { 
      url { 
        path /
        digest f53ded420ab7ac960bbf7d6f11115f39 
      } 
      connect_timeout 10 
      connect_port 80 
      nb_get_retry 3
      delay_before_retry 10 
    } 
  } 
} 


virtual_server 192.168.0.174 443 { 
  delay_loop 10 
  lb_algo rr 
  lb_kind NAT
  nat_mask 255.255.255.0 
  protocol TCP 
  persistence_timeout 3600 
  virtualhost test.mydomain

  real_server 192.168.170.35 443 { 
    weight 1
    SSL_GET { 
      url {
        path / 
        digest f53ded420ab7ac960bbf7d6f11115f39 
      } 
      connect_timeout 10 
      connect_port 443 
      nb_get_retry 3 
      delay_before_retry 10 
    }
  }
} 
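
Because the two files differ only in state, router_id and priority, the configuration of the second director can also be derived from the first one instead of being maintained by hand. A minimal sketch, assuming the values used in this article (MASTER/BACKUP, 100/50, DIRECTOR01/DIRECTOR02); the function name make_backup_conf is made up.

```shell
#!/bin/sh
# Derive the configuration of the second director from the first one by
# rewriting the three parameters that differ. Reads the master
# configuration on stdin and writes the backup configuration to stdout.
# Only lines that start with the keyword are changed, so comment lines
# (starting with "!") are left alone.
make_backup_conf() {
    sed -e 's/^\([[:space:]]*\)state MASTER/\1state BACKUP/' \
        -e 's/^\([[:space:]]*\)priority 100/\1priority 50/' \
        -e 's/^\([[:space:]]*\)router_id DIRECTOR01/\1router_id DIRECTOR02/'
}

# Small demonstration with three lines taken from the configuration above.
make_backup_conf <<'EOF'
  router_id DIRECTOR01
  state MASTER
  priority 100
EOF
```

On director01 something like make_backup_conf < /etc/keepalived/keepalived.conf would then produce the file for director02.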


As soon as Keepalived is started on the master it creates the virtual IP addresses we have defined in keepalived.conf.
# ip addr list 
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue 
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 
   inet 127.0.0.1/8 brd 127.255.255.255 scope host lo 
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000 
   link/ether aa:00:3f:2e:85:d9 brd ff:ff:ff:ff:ff:ff 
   inet 192.168.0.204/24 brd 192.168.0.255 scope global eth0 
   inet 192.168.0.174/32 scope global eth0 
3: eth1: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000 
   link/ether aa:00:3f:2e:85:d0 brd ff:ff:ff:ff:ff:ff 
   inet 192.168.170.179/24 brd 192.168.170.255 scope global eth1 
   inet 192.168.170.1/32 scope global eth1 

On the slave it should look like this; the VIP and DIP are not bound because Keepalived runs in backup state.
# ip addr list 
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue 
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 
   inet 127.0.0.1/8 brd 127.255.255.255 scope host lo 
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000 
   link/ether aa:00:3f:2e:86:d9 brd ff:ff:ff:ff:ff:ff
   inet 192.168.0.176/24 brd 192.168.0.255 scope global eth0 
3: eth1: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether aa:00:3f:2e:87:d0 brd ff:ff:ff:ff:ff:ff 
   inet 192.168.170.180/24 brd 192.168.170.255 scope global eth1 

It is important that the instances of Keepalived on both directors are running at the same time. Otherwise failover will not work. Master and slave communicate with each other and decide who is master and who is slave. In case of an outage of the master the slave just changes its status to master. These services are the core of the cluster and must be monitored accordingly!
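
For monitoring it is handy to find out which director currently holds the VIP. The following sketch simply searches the output of "ip addr list" for the address. The function name holds_ip is made up, and the sample input is shortened from the output shown above.

```shell
#!/bin/sh
# Succeeds if the given IPv4 address appears in `ip addr list` output read
# from stdin. The function name is an assumption made for this sketch.
holds_ip() {
    # Match "inet <address>/" so that 192.168.0.17 does not match 192.168.0.174.
    grep -Fq "inet $1/"
}

# Demonstration with a shortened copy of the master's output.
if holds_ip 192.168.0.174 <<'EOF'
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
   inet 192.168.0.204/24 brd 192.168.0.255 scope global eth0
   inet 192.168.0.174/32 scope global eth0
EOF
then
    echo "VIP is bound: this director is master"
else
    echo "VIP is not bound: this director is backup"
fi
```

On a director the call would be: ip addr list | holds_ip 192.168.0.174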



Optional: installation of ipvsadm
Optionally the program ipvsadm can be installed. It is a frontend for LVS. With ipvsadm you get a nice overview of the status of the cluster, its connections and which real servers are currently used.

First we unmask ipvsadm in /etc/portage/package.keywords:
sys-cluster/ipvsadm ~amd64 

If ipvsadm is started without a running instance of Keepalived the output looks like this:
# ipvsadm 
IP Virtual Server version 1.2.1 (size=4096) 
Prot LocalAddress:Port Scheduler Flags 
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn 

No real server is part of the cluster. As soon as Keepalived is started the output changes. Two real servers are part of the cluster. In this example the health check for the server with HTTPS failed, thus it is removed from the cluster. When the HTTPS server is up and running it is added to the cluster automatically. In the same way each server for which the health check fails is removed automatically.
# ipvsadm 
IP Virtual Server version 1.2.1 (size=4096) 
Prot LocalAddress:Port Scheduler Flags 
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn 
TCP 192.168.0.174:https rr persistent 3600 
TCP 192.168.0.174:http rr 
  -> 192.168.170.35:http Masq 1 0 0 
  -> 192.168.170.25:http Masq 1 0 0 

Show all current connections of the cluster:
ipvsadm -lcn 
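
A virtual service without any real server, like the HTTPS service in the output above, is easy to overlook. The following sketch counts the real servers per virtual service in the ipvsadm output. It assumes the output format shown above (service lines start with TCP or UDP, real servers with "->"); the function name count_real_servers is made up.

```shell
#!/bin/sh
# Count the real servers behind each virtual service in `ipvsadm` output
# read from stdin, so that services without any real server stand out.
count_real_servers() {
    awk '
        $1 == "TCP" || $1 == "UDP" {      # a virtual service line
            if (svc != "") print svc, n
            svc = $2; n = 0
        }
        $1 == "->" && svc != "" { n++ }   # a real server of that service
        END { if (svc != "") print svc, n }
    '
}

# Demonstration with the output shown above.
count_real_servers <<'EOF'
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.174:https rr persistent 3600
TCP 192.168.0.174:http rr
  -> 192.168.170.35:http Masq 1 0 0
  -> 192.168.170.25:http Masq 1 0 0
EOF
```

On a director the call would be: ipvsadm | count_real_servers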

Instead of using the configuration of Keepalived you can create virtual services directly with ipvsadm. For example use the following commands to create a virtual service for SSH with round robin scheduling. Then add a real server to this service and use masquerading. This might be useful for testing where you don't want to touch the Keepalived configuration.
ipvsadm -A -t 192.168.0.174:22 -s rr
ipvsadm -a -t 192.168.0.174:22 -r 192.168.170.10:22 -m


Another important thing is to configure the real servers properly. Otherwise the routing will not work. All real servers must use the DIP as the only gateway. The routing table must look like this:
# route -n 
127.0.0.1	0.0.0.0		255.0.0.0	...	lo 
0.0.0.0 	192.168.170.1 	0.0.0.0 	... 	eth0 
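
Whether a real server really uses the DIP as its gateway can be checked with a small script. This sketch reads the output of "route -n" and prints the gateway of the default route; the function name default_gateway is made up.

```shell
#!/bin/sh
# Print the gateway of the default route from `route -n` output on stdin.
# The default route is the line whose destination is 0.0.0.0.
default_gateway() {
    awk '$1 == "0.0.0.0" { print $2; exit }'
}

# Demonstration with a routing table like the one above.
gw=$(default_gateway <<'EOF'
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.170.1   0.0.0.0         UG    0      0        0 eth0
EOF
)
if [ "$gw" = "192.168.170.1" ]; then
    echo "default gateway is the DIP"
else
    echo "default gateway is $gw, expected 192.168.170.1"
fi
```

On a real server the call would be: route -n | default_gateway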

This means that the real servers are not accessible from outside of the internal cluster network. But if SSH is running on them you can log into a specific real server by first logging into the master load balancer. A better way is to use a VPN, for example OpenVPN.

Now if you type test.mydomain (resolved to 192.168.0.174) into the browser you should see the webpage provided by the cluster.

All information needed about Keepalived is written to /var/log/messages. When Keepalived changes its state it can be seen here. If a real server is down or comes up again it can be seen here as well. Additionally a mail is sent to the admin each time a real server is added to or removed from the cluster.



Installation and configuration of iptables
The firewalls on the real servers usually don't need to be changed. But on the directors some additional iptables rules are necessary. Keepalived provides two mechanisms for VRRP authentication. The first one is PASS, which is used here. This method is stable but very unsafe. The second method is AH, which is more secure but does not work reliably. So some care must be taken to isolate the VRRP traffic of the directors from the rest of the world. This is done through the firewalls.
# DIRECTOR01
# LVS NAT 
$IPTABLES -A FORWARD -i eth1 -s 192.168.170.0/255.255.255.0 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -d 192.168.170.0/255.255.255.0 -j ACCEPT 
$IPTABLES -t nat -A POSTROUTING -o eth0 -j MASQUERADE 

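# Syncing between load balancers on UDP port 694 (note: this is the Heartbeat port; VRRP itself is IP protocol 112 and is covered by the multicast rule below)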
$IPTABLES -A INPUT -i eth0 -p udp --dport 694 -s 192.168.0.176 -j ACCEPT 

# Multicast must be allowed
$IPTABLES -I INPUT -i eth0 -s 192.168.0.176 -d 224.0.0.0/4 -j ACCEPT 

# SSH
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 22 -j ACCEPT 

# HTTP 
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 80 -j ACCEPT 
# HTTPS
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 443 -j ACCEPT 

# These rules are needed for a recovery of the master after a failover
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 22 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 80 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 443 -j ACCEPT 

# Reply of the mailserver
$IPTABLES -A INPUT -i eth0 -s 192.168.0.15 -d 192.168.0.204 -p tcp --sport 25 -j ACCEPT 

# DIRECTOR02
# LVS NAT
$IPTABLES -A FORWARD -i eth1 -s 192.168.170.0/255.255.255.0 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -d 192.168.170.0/255.255.255.0 -j ACCEPT 
$IPTABLES -t nat -A POSTROUTING -o eth0 -j MASQUERADE 

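# Syncing between load balancers on UDP port 694 (note: this is the Heartbeat port; VRRP itself is IP protocol 112 and is covered by the multicast rule below)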
$IPTABLES -A INPUT -i eth0 -p udp --dport 694 -s 192.168.0.204 -j ACCEPT 

# Multicast must be allowed
$IPTABLES -I INPUT -i eth0 -s 192.168.0.204 -d 224.0.0.0/4 -j ACCEPT 

# SSH 
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 22 -j ACCEPT 

# HTTP
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 80 -j ACCEPT 
# HTTPS
$IPTABLES -I INPUT -i eth0 -d 192.168.0.174 -p tcp --dport 443 -j ACCEPT 

# These rules are needed for a recovery of the master after a failover
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 22 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 80 -j ACCEPT 
$IPTABLES -A FORWARD -i eth0 -o eth0 -d 192.168.0.174 -p tcp --dport 443 -j ACCEPT 

# Reply of the mailserver
$IPTABLES -A INPUT -i eth0 -s 192.168.0.15 -d 192.168.0.176 -p tcp --sport 25 -j ACCEPT
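
The multicast rules above accept everything the other director sends to 224.0.0.0/4. VRRP advertisements are sent as IP protocol 112 to the multicast address 224.0.0.18, so the rule could be narrowed down to exactly that traffic. The following sketch only prints such rules in the style used above; it is not part of my tested setup, and the function name vrrp_rule is made up.

```shell
#!/bin/sh
# Print (not apply) an iptables rule that accepts only VRRP advertisements
# (IP protocol 112, multicast group 224.0.0.18) from the peer director.
# $1 = interface, $2 = address of the peer director on that interface
vrrp_rule() {
    printf '%s -A INPUT -i %s -p 112 -s %s -d 224.0.0.18 -j ACCEPT\n' \
        '$IPTABLES' "$1" "$2"
}

# Rules for DIRECTOR01: the peer is 192.168.0.176 on eth0 and
# 192.168.170.180 on eth1, one rule per VRRP instance.
vrrp_rule eth0 192.168.0.176
vrrp_rule eth1 192.168.170.180
```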

If something is not working check on the master director:
route -n
ip addr list 
ipvsadm 
tail /var/log/messages 
Iptables logs
/etc/init.d/keepalived --debug start 

On the real servers check these things:
netstat -tan
ip addr list
route -n

A symptom of communication problems between the directors is the following record in /var/log/messages:
Oct 31 12:57:50 director01 Keepalived_vrrp: VRRP_Instance(VI_EXTERNAL) Received lower prio advert, forcing new election 
In this case especially check all involved IP addresses.
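
To see at a glance how often this happens, the log can be searched for the two messages quoted in this article. A minimal sketch; the function name count_vrrp_events and the labels in its output are made up.

```shell
#!/bin/sh
# Count VRRP state changes and forced elections in a syslog stream read
# from stdin. The two search strings are the log messages quoted in this
# article; the output labels are chosen for this sketch.
count_vrrp_events() {
    awk '
        /Transition to MASTER STATE/ { master++ }
        /forcing new election/       { election++ }
        END { printf "to_master=%d elections=%d\n", master, election }
    '
}

# Demonstration with two log lines from this article.
count_vrrp_events <<'EOF'
Oct 19 11:09:19 director01 Keepalived_vrrp: VRRP_Instance(VI_INTERNAL) Transition to MASTER STATE
Oct 31 12:57:50 director01 Keepalived_vrrp: VRRP_Instance(VI_EXTERNAL) Received lower prio advert, forcing new election
EOF
```

On a director the call would be: count_vrrp_events < /var/log/messages. Numbers that keep growing hint at directors which cannot see each other's advertisements.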



For the real_server 192.168.170.35 (HTTP and HTTPS) we used an exact health check. This means the MD5 hash of the correct content of the page is compared to the MD5 hash generated during the health check. Such a check can be created with the help of the genhash program. First of all we make sure that the page is delivered correctly. Then we create the hash. The last parameter of the following command is the path; something like /index.html is possible as well.
# genhash -s 192.168.170.35 -p 80 -u / 
MD5SUM = f53ded420ab7ac960bbf7d6f11115f39 

The result is used within the Keepalived configuration:
HTTP_GET { 
  url { 
    path / 
    digest f53ded420ab7ac960bbf7d6f11115f39 
  } 
  connect_timeout 10 
  connect_port 80 
  nb_get_retry 3 
  delay_before_retry 10 
} 
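
If the content of the page changes, the digest in keepalived.conf no longer matches and the real server is removed from the cluster. So it can be useful to compare the configured digest with a fresh genhash run after each change of the page. A minimal sketch, assuming the genhash output format shown above ("MD5SUM = <hash>"); the function names are made up.

```shell
#!/bin/sh
# Extract the digest from genhash output ("MD5SUM = <hash>") read on stdin.
extract_digest() {
    awk '$1 == "MD5SUM" && $2 == "=" { print $3; exit }'
}

# Succeeds if the digest in genhash output (stdin) equals the digest
# configured in keepalived.conf ($1).
digest_matches() {
    [ "$(extract_digest)" = "$1" ]
}

# Demonstration with the genhash output shown above.
if echo "MD5SUM = f53ded420ab7ac960bbf7d6f11115f39" |
        digest_matches f53ded420ab7ac960bbf7d6f11115f39; then
    echo "digest is up to date"
else
    echo "digest differs, update keepalived.conf"
fi
```

On a director the call would be: genhash -s 192.168.170.35 -p 80 -u / | digest_matches f53ded420ab7ac960bbf7d6f11115f39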



I wanted to see how the directors behave in case of a network outage. So I removed some of the network cables from both directors.
Master:
* Removed both cables at the same time: director failover, cluster service ok.
* Removed external and internal cable separately: slave is permanently changing its state, cluster service down!
We would need something similar to STONITH here. I have not figured out yet how to do this. Maybe it can be realised with the notify_fault parameter. Another way is to make the failure of a single connection very unlikely, for example through link aggregation.
Slave:
* Removed both cables at the same time: no failover, cluster service ok.
* Removed external and internal cable separately: no failover, cluster service ok.

As you see here, the failover works only if the whole master is offline. If just a single network connection on the master breaks then the cluster service is offline! Director failover alone is still far away from real HA. High availability means that there is no single point of failure any more, which requires redundant cables, switches, uplinks etc.

For a first load test of the cluster you may put a machine in front of the load balancers and run ab2. The ab2 program is part of the Apache2 package. The following command creates 300 requests. A maximum of 30 requests is run in parallel.
ab2 -c 30 -n 300 http://test.mydomain/

As we have configured the same weight for all real servers ipvsadm should show an equal number of connections for every real server.
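
What to expect from ipvsadm for other weights can be estimated with a small simulation. The following sketch implements a simplified weighted round robin loop (with equal weights it behaves like plain round robin). It is my own illustration of the idea, the scheduler in the Kernel may differ in detail, and the function name wrr_simulate is made up.

```shell
#!/bin/sh
# Simulate weighted round robin scheduling.
# $1 = number of requests, remaining arguments = weights of the real servers.
# Prints "server<i> <count>" for every real server.
wrr_simulate() {
    requests=$1; shift
    echo "$@" | awk -v total="$requests" '
        function gcd(a, b,   t) { while (b) { t = b; b = a % b; a = t } return a }
        {
            n = NF; g = 0; max = 0
            for (k = 1; k <= n; k++) {
                w[k] = $k + 0
                g = gcd(g, w[k])
                if (w[k] > max) max = w[k]
            }
            if (max == 0) exit            # no server can take requests
            i = 0; cw = 0; r = 0
            while (r < total) {
                i = i % n + 1             # next server, wrapping around
                if (i == 1) {             # new round: lower the threshold
                    cw -= g
                    if (cw <= 0) cw = max
                }
                if (w[i] >= cw) { count[i]++; r++ }
            }
            for (k = 1; k <= n; k++) printf "server%d %d\n", k, count[k]
        }'
}

# 300 requests on two real servers with weight 1 each, as in this article.
wrr_simulate 300 1 1
```

With the weights 3 and 1 instead, the first server gets three out of every four requests.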