From 8a3e8ac701b8df76428c23bcc64d1bdbc4e8dedf Mon Sep 17 00:00:00 2001
From: Marcin Siodelski
Date: Thu, 7 Nov 2019 17:36:50 +0100
Subject: [PATCH] [#998,!588] Added section about ha-heartbeat

---
 doc/sphinx/arm/hooks-ha.rst | 87 +++++++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 3 deletions(-)

diff --git a/doc/sphinx/arm/hooks-ha.rst b/doc/sphinx/arm/hooks-ha.rst
index eda75d4fb9..92c16247ed 100644
--- a/doc/sphinx/arm/hooks-ha.rst
+++ b/doc/sphinx/arm/hooks-ha.rst
@@ -130,8 +130,6 @@ clocks and restart the servers.
 Server States
 ~~~~~~~~~~~~~
 
-.. _command-ha-heartbeat:
-
 A DHCP server operating within an HA setup runs a state machine, and the
 state of the server can be retrieved by its peers using the
 ``ha-heartbeat`` command sent over the RESTful API. If the partner
@@ -1216,5 +1214,88 @@ command structure is as simple as:
 ::
 
    {
-       "command": "ha-continue"
+       "command": "ha-continue",
+       "service": [ "dhcp4" ]
    }
+
+
+.. _command-ha-heartbeat:
+
+The ha-heartbeat Command
+------------------------
+
+The :ref:`ha-server-states` section describes how the ``ha-heartbeat`` command is used by
+the active HA servers to detect a failure of one of them. This command, however,
+can also be sent by the system administrator to one or both servers to check their
+state with regard to the HA relationship. This allows for hooking up a monitoring
+system to the HA-enabled servers to periodically check if they are operational
+or if any manual intervention is required. The ``ha-heartbeat`` command takes no
+arguments, e.g.:
+
+::
+
+   {
+       "command": "ha-heartbeat",
+       "service": [ "dhcp4" ]
+   }
+
+Upon successful communication with the server, a response similar to this should
+be returned:
+
+::
+
+   {
+       "result": 0,
+       "text": "HA peer status returned.",
+       "arguments":
+       {
+           "state": "partner-down",
+           "date-time": "Thu, 07 Nov 2019 08:49:37 GMT"
+       }
+   }
+
+The returned state value may be one of the values listed in :ref:`ha-server-states`.
+In the example above the ``partner-down`` state is returned, which indicates that
+the server which responded to the command assumes that its partner is offline
+and is therefore serving all DHCP requests sent to the servers. To confirm that
+the partner is indeed offline, the administrator should send the ``ha-heartbeat``
+command to the second server. If sending the command fails, e.g. because a TCP
+connection to the Control Agent cannot be established or the Control Agent
+reports issues with communication with the DHCP server, it is very likely that
+the server is not running.
+
+The typical response returned by one of the servers when both servers are
+operational is:
+
+::
+
+   {
+       "result": 0,
+       "text": "HA peer status returned.",
+       "arguments":
+       {
+           "state": "load-balancing",
+           "date-time": "Thu, 07 Nov 2019 08:49:37 GMT"
+       }
+   }
+
+In most cases it is desirable to send the ``ha-heartbeat`` command to both
+HA-enabled servers to verify the state of the entire HA setup. In particular,
+if the response returned by one of the servers indicates that the server is in the
+``load-balancing`` state, it merely means that this server is operating as if
+the partner is still functional. When the partner fails, it takes some
+time for the surviving server to detect it. The :ref:`ha-scope-transition`
+section describes the algorithm which the surviving server follows before
+it transitions to the ``partner-down`` state. If the ``ha-heartbeat`` command
+is sent during the time window between the failure of one of the servers and the
+transition of the surviving server to the ``partner-down`` state, the response
+from the surviving server does not reflect the failure. Sending the command
+to the failed server allows for detecting the failure.
+
+.. note::
+
+   Remember! Always send the ``ha-heartbeat`` command to both active HA servers
+   to check the state of the entire HA setup. Sending it to only one of the
+   servers may not reveal a problem that has just begun on the other server.
+
+
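The advice to query both servers can be sketched as a small monitoring helper. This is a minimal illustration, not part of the patch or of Kea itself: the function names (``build_heartbeat``, ``extract_state``, ``query_state``, ``summarize``) are hypothetical, the Control Agent is assumed to accept the command as a JSON body in an HTTP POST, and the reply is assumed to be either the object shown in the documentation or a list wrapping such an object.

```python
import json
import urllib.request


def build_heartbeat(service="dhcp4"):
    """Return the ha-heartbeat command in the form shown in the documentation."""
    return {"command": "ha-heartbeat", "service": [service]}


def extract_state(response):
    """Return the HA state from a decoded response, or None on any error.

    Assumption: the reply may arrive wrapped in a list, in which case the
    first element is used.
    """
    if isinstance(response, list):
        response = response[0] if response else {}
    if not isinstance(response, dict) or response.get("result") != 0:
        return None
    arguments = response.get("arguments") or {}
    return arguments.get("state")


def query_state(url, timeout=5.0):
    """POST ha-heartbeat to one Control Agent URL (hypothetical endpoint).

    Returns the reported state, or None if the server cannot be reached
    or the reply is not usable.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_heartbeat()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return extract_state(json.load(reply))
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts (URLError is a
        # subclass); ValueError covers malformed JSON.
        return None


def summarize(states):
    """Turn a mapping of server name -> state (or None) into a verdict."""
    silent = sorted(name for name, state in states.items() if state is None)
    if silent:
        return "no valid response from: " + ", ".join(silent)
    if "partner-down" in states.values():
        return "a server reports partner-down: investigate"
    return "all servers responded: " + ", ".join(
        f"{name}={state}" for name, state in sorted(states.items())
    )
```

A monitoring job could call ``query_state`` once per Control Agent URL and pass the collected results to ``summarize``. Treating a missing reply as the most urgent condition mirrors the point made in the text: a ``load-balancing`` answer from one server says nothing about a partner that has only just failed.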