diff --git a/doc/guide/hooks-ha.xml b/doc/guide/hooks-ha.xml index 66ca7621ee..0b0a52df74 100644 --- a/doc/guide/hooks-ha.xml +++ b/doc/guide/hooks-ha.xml @@ -95,6 +95,41 @@ +
+ Clocks on Active Servers + Synchronized clocks are essential for the HA setup to operate + reliably. The servers share lease information via lease updates and + during synchronization of the databases. The lease information includes + the time when the lease was allocated and when it expires. Some + clock skew between the servers participating in the HA setup usually + exists. This is acceptable as long as the clock skew is relatively low + compared to the lease lifetimes. However, if the clock skew becomes too + high, the servers' differing notions of the lease expiration time + may cause the HA system to malfunction. For example, one server + may consider a valid lease to be expired. As a consequence, the lease + reclamation process may remove a name associated with this lease from + the DNS, even though the lease may later get renewed by a client. + + Each active server monitors the clock skew by comparing its current + time with the time returned by its partner in response to the heartbeat + command. This gives a good approximation of the clock skew, although it + doesn't take into account the time between the partner sending the + response and the server which sent the heartbeat command receiving + it. If the clock skew exceeds 30 seconds, a warning log + message is issued. The administrator may correct this problem by + synchronizing the clocks (e.g. using NTP). The servers should notice + the clock skew correction and stop issuing the warning. + + If the clock skew is not corrected and it exceeds 60 seconds, the + HA service on each of the servers is terminated, i.e. the state + machine enters the terminated state. The servers + will continue to respond to the DHCP clients (as in the load-balancing + or hot-standby mode), but will exchange neither lease updates nor + heartbeats, and their lease databases will diverge. In this case, the + administrator should synchronize the clocks and restart the servers. + +
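The two thresholds described in this section can be illustrated with a minimal sketch. It is not Kea code: the function name, the class name and the "ok"/"warn"/"terminate" labels are hypothetical, and only the 30 and 60 second limits and the rule that the terminated state persists until restart are taken from the text above. The sketch assumes "exceeds" means strictly greater than the limit.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from the text above: warn above 30 s, terminate above 60 s.
WARN_SKEW = timedelta(seconds=30)
TERMINATE_SKEW = timedelta(seconds=60)


def classify_clock_skew(local_time: datetime, partner_time: datetime) -> str:
    """Return a hypothetical label for the skew between two clocks."""
    # The sign of the difference does not matter, only its magnitude.
    skew = abs(local_time - partner_time)
    if skew > TERMINATE_SKEW:
        return "terminate"
    if skew > WARN_SKEW:
        return "warn"
    return "ok"


class HaServiceSketch:
    """Hypothetical model of an HA service with a sticky terminated state."""

    def __init__(self) -> None:
        self.state = "waiting"

    def on_heartbeat_response(self, local_time: datetime,
                              partner_time: datetime) -> str:
        # Once terminated, the service stays terminated until it is
        # restarted, even if the clocks have been corrected meanwhile.
        if self.state == "terminated":
            return "terminate"
        verdict = classify_clock_skew(local_time, partner_time)
        if verdict == "terminate":
            self.state = "terminated"
        return verdict
```

In this sketch, a partner clock that is 45 seconds off yields "warn", and one that is 90 seconds off moves the service to the terminated state, where it remains for all later heartbeats. Like the approximation described above, the sketch ignores the transit time of the heartbeat response.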
+
Server States The DHCP server operating within an HA setup runs a state machine @@ -167,6 +202,26 @@ answer from the partner and is not doing anything else while the leases synchronization takes place. + terminated - an active server + transitions to this state when the High Availability hooks library + can no longer provide reliable service and manual + intervention by the administrator is required to correct the problem. + It is envisaged that various issues with the HA setup may cause the + server to transition to this state in the future. As of the Kea 1.4.0 + release, the only issue causing the HA service to terminate is + unacceptably high clock skew between the active servers, i.e. when the + clocks on the respective servers are more than 60 seconds apart. + While in this state, the server will continue responding to the + DHCP clients based on the HA mode selected (load balancing or + hot standby), but the lease updates won't be exchanged and the + heartbeats won't be sent. Once a server has entered the + "terminated" state, it will remain in this state until it is + restarted. The administrator must correct the issue which caused + this situation prior to restarting the server (e.g. synchronize clocks). + Otherwise, the server will return to the "terminated" state as + soon as it finds that the clock skew is still too high. + + waiting - each started server instance enters this state. The backup server will transition directly from this state to the backup state. @@ -179,9 +234,16 @@ synchronize first. The secondary or standby server will remain in the waiting state until the primary synchronizes the database.. - + - Whether the server responds to the DHCP queries and which + + Currently, restarting the HA service while it is in the + terminated state requires restarting the + DHCP server or reloading its configuration. In the future, we will + provide a command to restart the HA service. 
+ + + Whether the server responds to the DHCP queries and which queries it responds to is a matter of the server's state, if no administrative action is performed to configure the server otherwise. The following table provides the default behavior for @@ -245,6 +307,12 @@ disabled none + + terminated + active server + enabled + same as in the load-balancing or hot-standby state + waiting any server