mirror of
https://gitlab.isc.org/isc-projects/kea
synced 2025-08-22 18:08:16 +00:00
2371 lines
108 KiB
ReStructuredText
2371 lines
108 KiB
ReStructuredText
.. _hooks-high-availability:
|
|
|
|
``ha``: High Availability Outage Resilience for Kea Servers
|
|
===========================================================
|
|
|
|
This hook library can be loaded on a pair of DHCPv4 or DHCPv6 servers, to
|
|
increase the reliability of the DHCP service in the event of an outage on one
|
|
server. This library was previously only available to ISC's paid subscribers,
|
|
but is now part of the open source Kea, available to all users.
|
|
|
|
.. note::
|
|
|
|
This library can only be loaded by the ``kea-dhcp4`` or ``kea-dhcp6`` process.
|
|
|
|
High Availability (HA) of the DHCP service is provided by running multiple
|
|
cooperating server instances. If any of these instances becomes unavailable for
|
|
any reason (DHCP software crash, Control Agent software crash, power outage,
|
|
hardware failure), a surviving server instance can continue providing reliable
|
|
service to clients. Many DHCP server implementations include the "DHCP Failover"
|
|
protocol, whose most significant features are communication between the servers,
|
|
partner failure detection, and lease synchronization between the servers.
|
|
However, the DHCPv4 failover standardization process was never completed by the
|
|
IETF. The DHCPv6 failover standard (RFC 8156) was published, but it is complex
|
|
and difficult to use, has significant operational constraints, and is different
|
|
from its v4 counterpart. Although it may be useful to use a "standard" failover
|
|
protocol, most Kea users are simply interested in a working solution which
|
|
guarantees high availability of the DHCP service. Therefore, the Kea HA hook
|
|
library derives major concepts from the DHCP failover protocol but uses its own
|
|
solutions for communication and configuration. It offers its own state machine,
|
|
which greatly simplifies its implementation and generally fits better into Kea,
|
|
and it provides the same features in both DHCPv4 and DHCPv6. This document
|
|
intentionally uses the term "high availability" rather than "failover" to
|
|
emphasize that it is not the failover protocol implementation.
|
|
|
|
The following sections describe the configuration and operation of the Kea HA
|
|
hook library.
|
|
|
|
.. _ha-supported-configurations:
|
|
|
|
Supported Configurations
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The Kea HA hook library supports three configurations, also known as HA modes:
|
|
``load-balancing``, ``hot-standby``, and ``passive-backup``. In the
|
|
``load-balancing`` mode, two servers respond to DHCP requests. The
|
|
``load-balancing`` function is implemented as described in `RFC
|
|
3074 <https://tools.ietf.org/html/rfc3074>`__, with each server responding to
|
|
half the received DHCP queries. When one of the servers allocates a lease for a
|
|
client, it notifies the partner server over the control channel (via the RESTful
|
|
API), so the partner can save the lease information in its own database. If the
|
|
communication with the partner is unsuccessful, the DHCP query is dropped and
|
|
the response is not returned to the DHCP client. If the lease update is
|
|
successful, the response is returned to the DHCP client by the server which has
|
|
allocated the lease. By exchanging lease updates, both servers get a copy of all
|
|
leases allocated by the entire HA setup, and either server can be switched to
|
|
handle the entire DHCP traffic if its partner becomes unavailable.
|
|
|
|
In the ``load-balancing`` configuration, one of the servers must be designated
|
|
as ``primary`` and the other as ``secondary``. Functionally, there is no
|
|
difference between the two during normal operation. However, this distinction is
|
|
required when the two servers are started at (nearly) the same time and have to
|
|
synchronize their lease databases. The primary server synchronizes the database
|
|
first. The secondary server waits for the primary server to complete the lease
|
|
database synchronization before it starts the synchronization.
|
|
|
|
In the ``hot-standby`` configuration, one of the servers is designated as
|
|
``primary`` and the other as ``standby``. During normal operation, the primary
|
|
server is the only one that responds to DHCP requests. The standby server
|
|
receives lease updates from the primary over the control channel; however, it
|
|
does not respond to any DHCP queries as long as the primary is running or, more
|
|
accurately, until the standby considers the primary to be offline. If the
|
|
standby server detects the failure of the primary, it starts responding to all
|
|
DHCP queries.
|
|
|
|
.. note::
|
|
|
|
Operators often wonder whether to use ``load-balancing`` or ``hot-standby``
|
|
mode. The ``load-balancing`` mode has the benefit of splitting the DHCP load
|
|
between two instances, reducing the traffic processed by each of them.
|
|
However, it is not always clear to the operators that using the
|
|
``load-balancing`` mode requires manually splitting the address pools between
|
|
two Kea instances using client classification, to preclude both servers from
|
|
allocating the same address to different clients.
|
|
Such a split is not needed in the ``hot-standby`` mode. Thus, the benefit
|
|
of using ``hot-standby`` over ``load-balancing`` is that the former has a
|
|
simpler configuration. Conversely, ``load-balancing`` has higher performance
|
|
potential at the cost of more complex configuration.
|
|
See :ref:`ha-load-balancing-config` for details on how to split the pools
|
|
using client classification.
|
|
|
|
In the configurations described above, both the primary and secondary/standby
|
|
are referred to as ``active`` servers, because they receive lease updates and
|
|
can automatically react to the partner's failures by responding to the DHCP
|
|
queries which would normally be handled by the partner. The HA hook library
|
|
supports another server type/role: ``backup``. The use of a backup server is
|
|
optional, and can be implemented in both ``load-balancing`` and ``hot-standby``
|
|
setup, in addition to the active servers. There is no limit on the number of
|
|
backup servers in the HA setup; however, the presence of backup servers may
|
|
increase the latency of DHCP responses, because not only do active servers send
|
|
lease updates to each other, but also to the backup servers. The active servers
|
|
do not expect acknowledgments from the backup servers before responding to the
|
|
DHCP clients, so the overhead of sending lease updates to the backup servers is
|
|
minimized.
|
|
|
|
In the last supported configuration, ``passive-backup``, there is only one
|
|
active server and typically one or more backup servers. A ``passive-backup``
|
|
configuration with no backup servers is also accepted, but it is no different
|
|
than running a single server with no HA function at all.
|
|
|
|
The ``passive-backup`` configuration is used in situations when an administrator
|
|
wants to take advantage of the backup server(s) as an additional storage for
|
|
leases without running the full-blown failover setup. In this case, if the
|
|
primary server fails, the DHCP service is lost; it requires the administrator to
|
|
manually restart the primary to resume DHCP service. The administrator may also
|
|
configure one of the backup servers to provide DHCP service to the clients, as
|
|
these servers should have accurate or nearly accurate information about the
|
|
allocated leases. The major advantage of the ``passive-backup`` mode is that it
|
|
provides some redundancy of the lease information but with better performance of
|
|
the primary server responding to the DHCP queries.
|
|
The primary server does not have to wait for acknowledgments to the lease
|
|
updates from the backup servers before it sends a response to the DHCP client.
|
|
This reduces the response time compared to the ``load-balancing`` and
|
|
``hot-standby`` cases, in which the server responding to the DHCP query has to
|
|
wait for the acknowledgment from the other active server before it can respond
|
|
to the client.
|
|
|
|
.. note::
|
|
|
|
An interesting use case for a single active server running in the
|
|
``passive-backup`` mode is a notification service, in which software
|
|
pretending to be a backup server receives live notifications about allocated
|
|
and deleted leases from the primary server and can display them on a
|
|
monitoring screen, trigger alerts, etc.
|
|
|
|
Clocks on Active Servers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Synchronized clocks are essential for the HA setup to operate reliably.
|
|
The servers share lease information - via lease updates and during
|
|
synchronization of the databases - including the time when the lease was
|
|
allocated and when it expires. Some clock skew between the servers participating
|
|
in the HA setup usually exists; this is acceptable as long as the clock skew is
|
|
relatively low, compared to the lease lifetimes. However, if the clock skew
|
|
becomes too high, the different lease expiration times on different servers may
|
|
cause the HA system to malfunction. For example, one server may consider a lease
|
|
to be expired when it is actually still valid. The lease reclamation process may
|
|
remove a name associated with this lease from the DNS, causing problems when the
|
|
client later attempts to renew the lease.
|
|
|
|
Each active server monitors the clock skew by comparing its current time with
|
|
the time returned by its partner in response to the heartbeat command. This
|
|
gives a good approximation of the clock skew, although it does not take into
|
|
account the time between the partner sending the response and the receipt of
|
|
this response by the server which sent the heartbeat command. If the clock skew
|
|
exceeds 30 seconds, a warning log message is issued. The administrator may
|
|
correct this problem by synchronizing the clocks (e.g. using NTP); the servers
|
|
should notice the clock skew correction and stop issuing the warning.
|
|
|
|
If the clock skew is not corrected and exceeds 60 seconds, the HA service on
|
|
each of the servers is terminated, i.e. the state machine enters the
|
|
``terminated`` state. The servers will continue to respond to DHCP clients (as
|
|
in the ``load-balancing`` or ``hot-standby`` mode), but will exchange neither
|
|
lease updates nor heartbeats and their lease databases will diverge. In this
|
|
case, the administrator should synchronize the clocks and restart the servers.
|
|
|
|
.. note::
|
|
|
|
It is possible to restart the servers one at a time, in no particular order.
|
|
The clocks must be in sync before restarting the servers.
|
|
|
|
.. note::
|
|
|
|
The clock skew is only assessed between two active servers, and only the
|
|
active servers enter the ``terminated`` state if the skew is too high. The
|
|
clock skew between active and backup servers is not assessed, because active
|
|
servers do not exchange heartbeat messages with backup servers.
|
|
|
|
.. _ha-https-support:
|
|
|
|
HTTPS Support
|
|
~~~~~~~~~~~~~
|
|
|
|
Since Kea 1.9.7, the High Availability hook library supports HTTPS via TLS, as
|
|
described in :ref:`tls`.
|
|
|
|
The HTTPS configuration parameters are:
|
|
|
|
- ``trust-anchor`` - specifies the name of a file or directory where the
|
|
certification authority certificate of a Control Agent can be found.
|
|
|
|
- ``cert-file`` - specifies the name of the file containing the end-entity
|
|
certificate to use.
|
|
|
|
- ``key-file`` - specifies the private key of the end-entity certificate to use.
|
|
|
|
These parameters can be configured at the global and peer levels. When
|
|
configured at both levels the peer value is used, allowing common values to be
|
|
shared.
|
|
|
|
The three parameters must be either all not specified (HTTPS disabled) or all
|
|
specified (HTTPS enabled). Specification of the empty string is considered not
|
|
specified; this can be used, for instance, to disable HTTPS for a particular
|
|
peer when it is enabled at the global level.
|
|
|
|
As the High Availability hook library is an HTTPS client, there is no
|
|
``cert-required`` parameter in this hook configuration.
|
|
This parameter can be set in the Control Agent to require and verify a client
|
|
certificate in client-server communication. It does not affect communication
|
|
between HA peers at the client side; see below for information on the server
|
|
side.
|
|
|
|
Before Kea 2.1.7 using HTTPS in the HA setup required use of the Control Agent
|
|
on all peers. (See :ref:`tls` for Control Agent TLS configuration).
|
|
|
|
Since Kea 2.1.7 the HTTPS server side is supported:
|
|
|
|
- the peer entry for the server name is used for the TLS setting.
|
|
|
|
- the new ``require-client-certs`` parameter specifies whether client
|
|
certificates are required and verified, i.e. like ``cert-required``. It
|
|
defaults to ``true`` and is an HA config (vs. peer config) parameter.
|
|
|
|
Kea 2.1.7 added a new security feature with the ``restrict-commands`` HA config
|
|
parameter: when set to ``true``, commands which are not used by the hook are
|
|
rejected. The default is ``false``.
|
|
|
|
The following is an example of an HA server pair and Control Agent configuration
|
|
for ``hot-standby`` with TLS.
|
|
|
|
Server 1:
|
|
|
|
.. code-block:: json
|
|
|
|
{
|
|
"Dhcp4": {
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
"this-server-name": "server1",
|
|
"trust-anchor": "/usr/lib/kea/CA.pem",
|
|
"cert-file": "/usr/lib/kea/server1_cert.pem",
|
|
"key-file": "/usr/lib/kea/server1_key.pem",
|
|
"mode": "hot-standby",
|
|
"heartbeat-delay": 10000,
|
|
"max-response-delay": 60000,
|
|
"max-ack-delay": 5000,
|
|
"max-unacked-clients": 5,
|
|
"peers": [{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "standby",
|
|
"auto-failover": true
|
|
}]
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.250"
|
|
}]
|
|
}]
|
|
}
|
|
}
|
|
|
|
Server 2:
|
|
|
|
.. code-block:: json
|
|
|
|
{
|
|
"Dhcp4": {
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
"this-server-name": "server2",
|
|
"trust-anchor": "/usr/lib/kea/CA.pem",
|
|
"cert-file": "/usr/lib/kea/server2_cert.pem",
|
|
"key-file": "/usr/lib/kea/server2_key.pem",
|
|
"mode": "hot-standby",
|
|
"heartbeat-delay": 10000,
|
|
"max-response-delay": 60000,
|
|
"max-ack-delay": 5000,
|
|
"max-unacked-clients": 5,
|
|
"peers": [{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "standby",
|
|
"auto-failover": true
|
|
}]
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.250"
|
|
}]
|
|
}]
|
|
}
|
|
}
|
|
|
|
Control Agent on Server 1:
|
|
::
|
|
|
|
{
|
|
"Control-agent": {
|
|
"http-host": "192.168.56.33",
|
|
"http-port": 8000,
|
|
"control-sockets": {
|
|
"dhcp4": {
|
|
"socket-type": "unix",
|
|
"socket-name": "/var/run/kea/control_socket"
|
|
}
|
|
},
|
|
"trust-anchor": "/var/lib/kea/CA.pem",
|
|
"cert-file": "/var/lib/kea/server1_cert.pem",
|
|
"key-file": "/var/lib/kea/server1_key.pem",
|
|
"cert-required": true
|
|
}
|
|
}
|
|
|
|
Control Agent on Server 2:
|
|
::
|
|
|
|
{
|
|
"Control-agent": {
|
|
"http-host": "192.168.56.66",
|
|
"http-port": 8000,
|
|
"control-sockets": {
|
|
"dhcp4": {
|
|
"socket-type": "unix",
|
|
"socket-name": "/var/run/kea/control_socket"
|
|
}
|
|
},
|
|
"trust-anchor": "/var/lib/kea/CA.pem",
|
|
"cert-file": "/var/lib/kea/server2_cert.pem",
|
|
"key-file": "/var/lib/kea/server2_key.pem",
|
|
"cert-required": true
|
|
}
|
|
}
|
|
|
|
.. _ha-server-states:
|
|
|
|
Server States
|
|
~~~~~~~~~~~~~
|
|
|
|
A DHCP server operating within an HA setup runs a state machine, and the state
|
|
of the server can be retrieved by its peers using the ``ha-heartbeat`` command
|
|
sent over the RESTful API. If the partner server does not respond to the
|
|
``ha-heartbeat`` command within the specified amount of time, the communication
|
|
is considered interrupted and the server may, depending on the configuration,
|
|
use additional measures (described later in this document) to verify that the
|
|
partner is still operating. If it finds that the partner is not operating, the
|
|
server transitions to the ``partner-down`` state to handle all the DHCP traffic
|
|
directed to the system.
|
|
|
|
In this case, the surviving server continues to send the ``ha-heartbeat``
|
|
command to detect when the partner wakes up. At that time, the partner
|
|
synchronizes the lease database. When it is again ready to operate, the
|
|
surviving server returns to normal operation, i.e. the ``load-balancing`` or
|
|
``hot-standby`` state.
|
|
|
|
The following is the list of all possible server states:
|
|
|
|
- ``backup`` - normal operation of the backup server. In this state it receives
|
|
lease updates from the active server(s).
|
|
|
|
- ``communication-recovery`` - an active server running in ``load-balancing``
|
|
mode may transition to this state when it experiences communication issues
|
|
with a partner server over the control channel. This is an intermediate state
|
|
between the ``load-balancing`` and ``partner-down`` states. In this state the
|
|
server continues to respond to DHCP queries but does not send lease updates
|
|
to the partner; lease updates are queued and are sent when normal
|
|
communication is resumed. If communication does not resume within the time
|
|
specified, the primary server then transitions to the ``partner-down`` state.
|
|
The ``communication-recovery`` state was introduced to ensure reliable DHCP
|
|
service when both active servers remain operational but the communication
|
|
between them is interrupted for a prolonged period of time. Either server can
|
|
be configured to never enter this state by setting the
|
|
``delayed-updates-limit`` to 0 (please refer to
|
|
:ref:`ha-load-balancing-config`, later in this chapter, for details on this
|
|
parameter). Disabling entry into the ``communication-recovery`` state causes
|
|
the server to begin testing for the ``partner-down`` state as soon as the
|
|
server is unable to communicate with its partner.
|
|
|
|
.. note::
|
|
|
|
In Kea 1.9.4, with the introduction of ``delayed-updates-limit``, the default
|
|
server's behavior in ``load-balancing`` mode changed. When a server
|
|
experiences communication issues with its partner, it now enters the
|
|
``communication-recovery`` state and queues lease updates until communication
|
|
is resumed. Prior to Kea 1.9.4, a server that could not communicate with its
|
|
partner in ``load-balancing`` mode would immediately begin the transition to
|
|
the ``partner-down`` state.
|
|
|
|
- ``hot-standby`` - normal operation of the active server running in the
|
|
``hot-standby`` mode; both the primary and the standby server are in this
|
|
state during their normal operation. The primary server responds to DHCP
|
|
queries and sends lease updates to the standby server and to any backup
|
|
servers that are present.
|
|
|
|
- ``load-balancing`` - normal operation of the active server running in the
|
|
``load-balancing`` mode; both the primary and the secondary server are in
|
|
this state during their normal operation. Both servers respond to DHCP
|
|
queries and send lease updates to each other and to any backup servers that
|
|
are present.
|
|
|
|
- ``in-maintenance`` - an active server transitions to this state as a result
|
|
of being notified by its partner that the administrator requested maintenance
|
|
of the HA setup. The administrator requests the maintenance by sending the
|
|
``ha-maintenance-start`` command to the server which is supposed to take over
|
|
the responsibility for responding to the DHCP clients while the other server
|
|
is taken offline for maintenance. If the server is in the ``in-maintenance``
|
|
state it can be safely shut down. The partner transitions to the
|
|
``partner-down`` state immediately after discovering that the server in
|
|
maintenance has been shut down.
|
|
|
|
- ``partner-down`` - an active server transitions to this state after detecting
|
|
that its partner (another active server) is offline. The server does not
|
|
transition to this state if only a backup server is unavailable. In the
|
|
``partner-down`` state the active server responds to all DHCP queries,
|
|
including those queries which are normally handled by the server that is now
|
|
unavailable.
|
|
|
|
- ``partner-in-maintenance`` - an active server transitions to this state
|
|
after receiving a ``ha-maintenance-start`` command from the administrator.
|
|
The server in this state becomes responsible for responding to all DHCP
|
|
requests. The server sends a ``ha-maintenance-notify`` command to the partner,
|
|
which should enter the ``in-maintenance`` state. The server remaining in the
|
|
``partner-in-maintenance`` state keeps sending lease updates to the partner
|
|
until it finds that the partner has stopped responding to those lease updates,
|
|
heartbeats, or any other commands. In this case, the server in the
|
|
``partner-in-maintenance`` state transitions to the ``partner-down`` state
|
|
and keeps responding to the queries, but no longer sends lease updates.
|
|
|
|
- ``passive-backup`` - a primary server running in the ``passive-backup`` HA
|
|
mode transitions to this state immediately after it boots up. The primary
|
|
server in this state responds to all DHCP traffic and sends lease updates to
|
|
the backup servers it is connected to. By default, the primary server does
|
|
not wait for acknowledgments from the backup servers and responds to a DHCP
|
|
query right after sending lease updates to all backup servers. If any of the
|
|
lease updates fail, a backup server misses the lease update but the DHCP
|
|
client is still provisioned. This default configuration can be changed by
|
|
setting the ``wait-backup-ack`` configuration parameter to ``true``, in which
|
|
case the primary server always waits for the acknowledgements and drops the
|
|
DHCP query if sending any of the corresponding lease updates fails. This
|
|
improves lease database consistency between the primary and the secondary.
|
|
However, if a communication failure between the active server and any of the
|
|
backups occurs, it effectively causes the failure of the DHCP service from
|
|
the DHCP clients' perspective.
|
|
|
|
- ``ready`` - an active server transitions to this state after synchronizing
|
|
its lease database with an active partner. This state indicates to the
|
|
partner (which may be in the ``partner-down`` state) that it should return to
|
|
normal operation. If and when it does, the server in the ``ready`` state also
|
|
starts normal operation.
|
|
|
|
- ``syncing`` - an active server transitions to this state to fetch leases from
|
|
the active partner and update the local lease database. When in this state,
|
|
the server issues the ``dhcp-disable`` command to disable the DHCP service of
|
|
the partner from which the leases are fetched. The DHCP service is disabled
|
|
for a maximum time of 60 seconds, after which it is automatically re-enabled,
|
|
in case the syncing partner was unable to re-enable the service. If the
|
|
synchronization completes successfully, the synchronizing server issues the
|
|
``ha-sync-complete-notify`` command to notify the partner. In most states,
|
|
the partner re-enables its DHCP service to continue responding to the DHCP
|
|
queries. In the ``partner-down`` state, the partner first ensures that
|
|
communication between the servers is re-established before enabling the DHCP
|
|
service. The syncing operation is synchronous; the server waits for an answer
|
|
from the partner and does nothing else while the lease synchronization takes
|
|
place. A server that is configured not to synchronize the lease database with
|
|
its partner, i.e. when the ``sync-leases`` configuration parameter is set to
|
|
``false``, will never transition to this state. Instead, it transitions
|
|
directly from the ``waiting`` state to the ``ready`` state.
|
|
|
|
- ``terminated`` - an active server transitions to this state when the High
|
|
Availability hook library is unable to further provide reliable service and a
|
|
manual intervention of the administrator is required to correct the problem.
|
|
Various issues with the HA setup may cause the server to transition to this
|
|
state. While in this state, the server continues responding to DHCP clients
|
|
based on the HA mode selected (``load-balancing`` or ``hot-standby``), but
|
|
lease updates are not exchanged and heartbeats are not sent. Once a server
|
|
has entered the ``terminated`` state, it remains in this state until it is
|
|
restarted. The administrator must correct the issue which caused this
|
|
situation prior to restarting the server (e.g. synchronize the clocks);
|
|
otherwise, the server will return to the ``terminated`` state once it finds
|
|
that the issue persists.
|
|
|
|
- ``waiting`` - each started server instance enters this state. A backup server
|
|
transitions directly from this state to the ``backup`` state. An active
|
|
server sends a heartbeat to its partner to check its state; if the partner
|
|
appears to be unavailable, the server transitions to the ``partner-down``
|
|
state. If the partner is available, the server transitions to the ``syncing``
|
|
or ``ready`` state, depending on the setting of the ``sync-leases``
|
|
configuration parameter. If both servers appear to be in the ``waiting``
|
|
state (concurrent startup), the primary server transitions to the next state
|
|
first. The secondary or standby server remains in the ``waiting`` state until
|
|
the primary transitions to the ``ready`` state.
|
|
|
|
.. note::
|
|
|
|
Currently, restarting the HA service from the ``terminated`` state requires
|
|
restarting the DHCP server or reloading its configuration.
|
|
|
|
Whether the server responds to DHCP queries and which queries it responds to is
|
|
a matter of the server's state, if no administrative action is performed to
|
|
configure the server otherwise. The following table provides the default
|
|
behavior for various states.
|
|
|
|
The ``DHCP Service Scopes`` denote which group of received DHCP queries the
|
|
server responds to in the given state. The HA configuration must specify a
|
|
unique name for each server within the HA setup. This document uses the
|
|
following convention within the provided examples: "server1" for a primary
|
|
server, "server2" for the secondary or standby server, and "server3" for the
|
|
backup server. In real life any names can be used as long as they remain unique.
|
|
|
|
An in-depth explanation of the scopes can be found below.
|
|
|
|
.. table:: Default behavior of the server in various HA states
|
|
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| State | Server Type | DHCP Service | DHCP Service |
|
|
| | | | Scopes |
|
|
+========================+=================+=================+================+
|
|
| backup | backup server | disabled | none |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| communication-recovery | primary or | enabled | "HA_server1" |
|
|
| | secondary | | or |
|
|
| | (load-balancing | | "HA_server2" |
|
|
| | mode only) | | |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| hot-standby | primary or | enabled | "HA_server1" |
|
|
| | standby | | if primary, |
|
|
| | (hot-standby | | none otherwise |
|
|
| | mode) | | |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| load-balancing | primary or | enabled | "HA_server1" |
|
|
| | secondary | | or |
|
|
| | (load-balancing | | "HA_server2" |
|
|
| | mode) | | |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| in-maintenance | active server | disabled | none |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| partner-down | active server | enabled | all scopes |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| partner-in-maintenance | active server | enabled | all scopes |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| passive-backup | active server | enabled | all scopes |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| ready | active server | disabled | none |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| syncing | active server | disabled | none |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| terminated | active server | enabled | same as in the |
|
|
| | | | load-balancing |
|
|
| | | | or hot-standby |
|
|
| | | | state |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
| waiting | any server | disabled | none |
|
|
+------------------------+-----------------+-----------------+----------------+
|
|
|
|
In the ``load-balancing`` mode there are two scopes specified for the active
|
|
servers: "HA_server1" and "HA_server2". The DHCP queries load-balanced to
|
|
``server1`` belong to the "HA_server1" scope and the queries load-balanced to
|
|
``server2`` belong to the "HA_server2" scope. If either server is in the
|
|
``partner-down`` state, the active partner is responsible for serving both
|
|
scopes.
|
|
|
|
In the ``hot-standby`` mode, there is only one scope - "HA_server1" - because
|
|
only ``server1`` is responding to DHCP queries. If that server becomes
|
|
unavailable, ``server2`` becomes responsible for this scope.
|
|
|
|
The backup servers do not have their own scopes. In some cases they can be used
|
|
to respond to queries belonging to the scopes of the active servers. Also, a
|
|
backup server which is neither in the ``partner-down`` state nor in normal
|
|
operation serves no scopes.
|
|
|
|
The scope names can be used to associate pools, subnets, and networks with
|
|
certain servers, so that only these servers can allocate addresses or prefixes
|
|
from those pools, subnets, or networks. This is done via the client
|
|
classification mechanism (see :ref:`ha-load-balancing-advanced-config` for more
|
|
details).
|
|
|
|
.. _ha-scope-transition:
|
|
|
|
Scope Transition in a Partner-Down Case
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
When one of the servers finds that its partner is unavailable, it starts serving
|
|
clients from both its own scope and the scope of the unavailable partner. This
|
|
is straightforward for new clients, i.e. those sending DHCPDISCOVER (DHCPv4) or
|
|
Solicit (DHCPv6), because those requests are not sent to any particular server.
|
|
The available server responds to all such queries when it is in the
|
|
``partner-down`` state.
|
|
|
|
When a client renews a lease, it sends its DHCPREQUEST (DHCPv4) or Renew (DHCPv6)
|
|
message directly to the server which has allocated the lease being renewed. If
|
|
this server is no longer available, the client will get no response. In that
|
|
case, the client continues to use its lease and attempts to renew until the
|
|
rebind timer (T2) elapses. The client then enters the rebinding phase, in which
|
|
it sends a DHCPREQUEST (DHCPv4) or Rebind (DHCPv6) message to any available
|
|
server. The surviving server receives the rebinding request and typically
|
|
extends the lifetime of the lease. The client then continues to contact that new
|
|
server to renew its lease as appropriate.
|
|
|
|
If and when the other server once again becomes available, both active servers
|
|
will eventually transition to the ``load-balancing`` or ``hot-standby`` state,
|
|
in which they will again be responsible for their own scopes. Some clients
|
|
belonging to the scope of the restarted server will try to renew their leases
|
|
via the surviving server, but this server will no longer respond to them; the
|
|
client will eventually transition back to the correct server via the rebinding
|
|
mechanism.
|
|
|
|
.. _ha-load-balancing-config:
|
|
|
|
Load-Balancing Configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The following is the configuration snippet to enable high availability on the
|
|
primary server within the ``load-balancing`` configuration. The same
|
|
configuration should be applied on the secondary and backup servers, with the
|
|
only difference that ``this-server-name`` should be set to "server2" and
|
|
"server3" on those servers, respectively.
|
|
|
|
.. note::
|
|
|
|
Remember that ``load-balancing`` mode requires the address pools and
|
|
delegated prefix pools to be split between the active servers. During normal
|
|
operation, the servers use non-overlapping pools to avoid allocating the same
|
|
lease to different clients by both instances. A server only uses the pool
|
|
fragments owned by the partner when the partner is not running. See the notes
|
|
in :ref:`ha-supported-configurations` highlighting differences between the
|
|
``load-balancing`` and ``hot-standby`` modes. The semantics of pool
|
|
partitioning is explained further in this section.
|
|
The :ref:`ha-load-balancing-advanced-config` section provides advanced
|
|
pool-partitioning examples.
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
"this-server-name": "server1",
|
|
"mode": "load-balancing",
|
|
"heartbeat-delay": 10000,
|
|
"max-response-delay": 60000,
|
|
"max-ack-delay": 5000,
|
|
"max-unacked-clients": 5,
|
|
"max-rejected-lease-updates": 10,
|
|
"delayed-updates-limit": 100,
|
|
"peers": [{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "secondary",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server3",
|
|
"url": "http://192.168.56.99:8000/",
|
|
"role": "backup",
|
|
"basic-auth-user": "foo",
|
|
"basic-auth-password": "bar",
|
|
"auto-failover": false
|
|
}]
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.150",
|
|
"client-class": "HA_server1"
|
|
}, {
|
|
"pool": "192.0.3.200 - 192.0.3.250",
|
|
"client-class": "HA_server2"
|
|
}],
|
|
|
|
"option-data": [{
|
|
"name": "routers",
|
|
"data": "192.0.3.1"
|
|
}],
|
|
|
|
"relay": { "ip-address": "10.1.2.3" }
|
|
}]
|
|
}
|
|
|
|
Two hook libraries must be loaded to enable HA: ``libdhcp_lease_cmds.so`` and
|
|
``libdhcp_ha.so``. The latter implements the HA feature, while the former
|
|
enables control commands required by HA to fetch and manipulate leases on the
|
|
remote servers. In the example provided above, it is assumed that Kea libraries
|
|
are installed in the ``/usr/lib`` directory. If Kea is not installed in the
|
|
``/usr`` directory, the hook libraries' locations must be updated accordingly.
|
|
|
|
The HA configuration is specified within the scope of ``libdhcp_ha.so``.
|
|
Note that while the top-level parameter ``high-availability`` is a list, only a
|
|
single entry is currently supported.
|
|
|
|
The following are the global parameters which control the server's behavior with
|
|
respect to HA:
|
|
|
|
- ``this-server-name`` - is a unique identifier of the server within this HA
|
|
setup. It must match one of the servers specified within the ``peers`` list.
|
|
|
|
- ``mode`` - specifies an HA mode of operation. The currently supported modes
|
|
are ``load-balancing`` and ``hot-standby``.
|
|
|
|
- ``heartbeat-delay`` - specifies a duration in milliseconds between sending
|
|
the last heartbeat (or other command sent to the partner) and the next
|
|
heartbeat. Heartbeats are sent periodically to gather the status of the
|
|
partner and to verify whether the partner is still operating. The default
|
|
value of this parameter is 10000 ms.
|
|
|
|
- ``max-response-delay`` - specifies a duration in milliseconds since the last
|
|
successful communication with the partner, after which the server assumes
|
|
that communication with the partner is interrupted. This duration should be
|
|
greater than the ``heartbeat-delay``; typically it should be a multiple of
|
|
``heartbeat-delay``. When the server detects that communication is
|
|
interrupted, it may transition to the ``partner-down`` state (when
|
|
``max-unacked-clients`` is 0) or trigger the failure-detection procedure
|
|
using the values of the two parameters below. The default value of this
|
|
parameter is 60000 ms.
|
|
|
|
- ``max-ack-delay`` - is one of the parameters controlling partner
|
|
failure-detection. When communication with the partner is interrupted, the
|
|
server examines the values of the "secs" field (DHCPv4) or "elapsed time"
|
|
option (DHCPv6), which denote how long the DHCP client has been trying to
|
|
communicate with the DHCP server. This parameter specifies the maximum time
|
|
in milliseconds for the client to try to communicate with the DHCP server,
|
|
after which this server assumes that the client failed to communicate with
|
|
the DHCP server (is unacknowledged or "unacked"). The default value of this
|
|
parameter is 10000.
|
|
|
|
- ``max-unacked-clients`` - specifies how many "unacked" clients are allowed
|
|
(see ``max-ack-delay``) before this server assumes that the partner is
|
|
offline and transitions to the ``partner-down`` state. The special value of 0
|
|
is allowed for this parameter, which disables the failure-detection mechanism.
|
|
In this case, a server that cannot communicate with its partner over the
|
|
control channel assumes that the partner server is down and transitions to
|
|
the ``partner-down`` state immediately. The default value of this parameter
|
|
is 10.
|
|
|
|
- ``max-rejected-lease-updates`` - specifies how many lease updates for distinct
|
|
clients can fail, due to a conflict between the lease and the partner configuration
|
|
or state, before the server transitions to the ``terminated`` state. Conflict
|
|
can be a sign of a misconfiguration; usually, a small number of conflicted
|
|
leases are acceptable because they affect only a few devices. However, if
|
|
the conflicts occur for many devices (e.g., an entire subnet), the HA service
|
|
becomes unreliable and should be terminated, and the problem must be manually
|
|
corrected by an administrator. It is up to the administrator to select
|
|
the highest acceptable value of ``max-rejected-lease-updates``. The default
|
|
value is 10. The special value of 0 configures the server to never terminate
|
|
the HA service due to lease conflicts. If the value is 1, the server
|
|
transitions to the ``terminated`` state when the first conflict occurs.
|
|
This parameter does not pertain to conflicting lease updates sent to
|
|
the backup servers.
|
|
|
|
- ``delayed-updates-limit`` - specifies the maximum number of lease updates
|
|
which can be queued while the server is in the ``communication-recovery``
|
|
state. This parameter was introduced in Kea 1.9.4. The special value of 0
|
|
configures the server to never transition to the ``communication-recovery``
|
|
state and the server behaves as in earlier Kea versions, i.e. if the server
|
|
cannot reach its partner, it goes straight into the ``partner-down`` state.
|
|
The default value of this parameter is 100.
|
|
|
|
.. note::
|
|
|
|
The ``max-rejected-lease-updates`` parameter was introduced in Kea 2.3.1.
|
|
Previously, the server did not differentiate between a lease update
|
|
failure due to a non-functioning partner and a failure due to a conflict
|
|
(e.g., configuration issues).
|
|
As a result, the server could sometimes transition to the ``partner-down``
|
|
state even though the partner was operating normally, but only certain leases
|
|
had issues. Conflicts should no longer cause such a transition. However,
|
|
depending on the ``max-rejected-lease-updates`` setting, too many conflicts
|
|
can lead to termination of the High Availability service. In that case, both
|
|
servers continue to respond to DHCP queries but no longer send lease updates.
|
|
|
|
The values of ``max-ack-delay`` and ``max-unacked-clients`` must be selected
|
|
carefully, taking into account the specifics of the network in which the DHCP
|
|
servers are operating. The server in question may not respond to some DHCP
|
|
clients following administrative policy, or the server may drop malformed
|
|
queries from clients. Therefore, selecting too low a value for the
|
|
``max-unacked-clients`` parameter may result in a transition to the
|
|
``partner-down`` state even though the partner is still operating. On the other
|
|
hand, selecting too high a value may result in never transitioning to the
|
|
``partner-down`` state if the DHCP traffic in the network is very low (e.g. at
|
|
night), because the number of distinct clients trying to communicate with the
|
|
server could be lower than the ``max-unacked-clients`` setting.
|
|
|
|
In some cases it may be useful to disable the failure-detection mechanism
|
|
altogether, if the servers are located very close to each other and network
|
|
partitioning is unlikely, i.e. failure to respond to heartbeats is only possible
|
|
when the partner is offline. In such cases, set ``max-unacked-clients`` to 0.
|
|
|
|
The ``delayed-updates-limit`` parameter is used to enable or disable the
|
|
``communication-recovery`` procedure, and controls the server's behavior in the
|
|
``communication-recovery`` state. This parameter can only be used in the
|
|
``load-balancing`` mode.
|
|
|
|
If a server in the ``load-balancing`` state experiences communication issues
|
|
with its partner (a heartbeat or lease-update failure), the server transitions
|
|
to the ``communication-recovery`` state. In this state, the server keeps
|
|
responding to DHCP queries but does not send lease updates to the partner. The
|
|
lease updates are queued until communication is re-established, to ensure that
|
|
DHCP service remains available even in the event of the communication loss
|
|
between the partners. There may appear to be communication loss when either one
|
|
of the servers has terminated, or when both servers remain available but cannot
|
|
communicate with each other. In the former case, the surviving server will
|
|
follow the normal procedure and should eventually transition to the
|
|
``partner-down`` state. In the latter case, both servers should transition to
|
|
the ``communication-recovery`` state and should never transition to the
|
|
``partner-down`` state (if ``max-unacked-clients`` is set to a non-zero value),
|
|
because all DHCP queries are answered and neither server would see any unacked
|
|
DHCP queries.
|
|
|
|
Introduction of the ``communication-recovery`` procedure was motivated by issues
|
|
which may appear when two servers remain online but the communication between
|
|
them remains interrupted for a period of time. In earlier Kea versions, the
|
|
servers having communication issues used to drop DHCP packets before
|
|
transitioning to the ``partner-down`` state. In some cases they both
|
|
transitioned to the ``partner-down`` state, which could potentially result in
|
|
allocations of the same IP addresses or delegated prefixes to different clients
|
|
by both servers. By entering the intermediate ``communication-recovery`` state,
|
|
these problems are avoided.
|
|
|
|
If a server in the ``communication-recovery`` state re-establishes communication
|
|
with its partner, it tries to send the partner all of the outstanding lease
|
|
updates it has queued. This is done synchronously and may take a considerable
|
|
amount of time before the server transitions to the ``load-balancing`` state and
|
|
resumes normal operation.
|
|
The maximum number of lease updates which can be queued in the
|
|
``communication-recovery`` state is controlled by ``delayed-updates-limit``.
|
|
If the limit is exceeded, the server stops queuing lease updates and performs a
|
|
full database synchronization after re-establishing the connection with the
|
|
partner, instead of sending outstanding lease updates before transitioning to
|
|
the ``load-balancing`` state. Even if the limit is exceeded, the server in the
|
|
``communication-recovery`` state remains responsive to DHCP clients.
|
|
|
|
It may be preferable to set higher values of ``delayed-updates-limit`` when
|
|
there is a risk of prolonged communication interruption between the servers and
|
|
when the lease database is large, to avoid costly lease-database synchronization.
|
|
On the other hand, if the lease database is small, the time required to send
|
|
outstanding lease updates may be longer than the lease-database synchronization.
|
|
In such cases it may be better to use a lower value, e.g. 10. The default value
|
|
of 100 is a reasonable compromise and should work well in most deployments with
|
|
moderate traffic.
|
|
|
|
.. note::
|
|
|
|
This parameter is new and values for it that work well in some environments
|
|
may not work well in others. Feedback from users will help us build a better
|
|
working set of recommendations.
|
|
|
|
The ``peers`` parameter contains a list of servers within this HA setup.
|
|
This configuration must contain at least one primary and one secondary server.
|
|
It may also contain an unlimited number of backup servers. In this example,
|
|
there is one backup server which receives lease updates from the active servers.
|
|
|
|
Since Kea version 1.9.0, basic HTTP authentication is available
|
|
to protect the Kea control agent against local attackers.
|
|
|
|
These are the parameters specified for each of the peers within this
|
|
list:
|
|
|
|
- ``name`` - specifies a unique name for the server.
|
|
|
|
- ``url`` - specifies the URL to be used to contact this server over the
|
|
control channel. Other servers use this URL to send control commands to that
|
|
server.
|
|
|
|
- ``basic-auth-user`` - specifies the user ID for basic HTTP authentication. If
|
|
not specified or specified as an empty string, no authentication header is
|
|
added to HTTP transactions. It must not contain the colon (:) character.
|
|
|
|
- ``basic-auth-password`` - specifies the password for basic HTTP
|
|
authentication. This parameter is ignored when the user ID is not specified
|
|
or is empty. The password is optional; if not specified, an empty password is
|
|
used.
|
|
|
|
- ``basic-auth-password-file`` - is an alternative to ``basic-auth-password``:
|
|
instead of presenting the password in the configuration file it is specified
|
|
in the file indicated by this parameter.
|
|
|
|
- ``role`` - denotes the role of the server in the HA setup. The following
|
|
roles are supported in the ``load-balancing`` configuration: ``primary``,
|
|
``secondary``, and ``backup``. There must be exactly one primary and one
|
|
secondary server in the ``load-balancing`` setup.
|
|
|
|
- ``auto-failover`` - a boolean value which denotes whether a server detecting
|
|
a partner's failure should automatically start serving the partner's clients.
|
|
The default value of this parameter is ``true``.
|
|
|
|
In our example configuration above, both active servers can allocate leases from
|
|
the subnet "192.0.3.0/24". This subnet contains two address pools:
|
|
"192.0.3.100 - 192.0.3.150" and "192.0.3.200 - 192.0.3.250", which are
|
|
associated with HA server scopes using client classification. When ``server1``
|
|
processes a DHCP query, it uses the first pool for lease allocation. Conversely,
|
|
when ``server2`` processes a DHCP query it uses the second pool. If either of
|
|
the servers is in the ``partner-down`` state, the other can serve leases from
|
|
both pools; it selects the pool which is appropriate for the received query. In
|
|
other words, if the query would normally be processed by ``server2`` but this
|
|
server is not available, ``server1`` allocates the lease from the pool of
|
|
"192.0.3.200 - 192.0.3.250". The Kea control agent in front of ``server3``
|
|
requires basic HTTP authentication, and authorizes the user ID "foo" with the
|
|
password "bar".
|
|
|
|
.. note::
|
|
|
|
The ``url`` schema can be ``http`` or ``https``, but since Kea version 1.9.6
|
|
the ``https`` schema requires a TLS setup. The hostname part must be an IPv4
|
|
address or an IPv6 address between square brackets, e.g.
|
|
``http://[2001:db8::1]:8080/``. Names are not accepted.
|
|
|
|
.. _ha-load-balancing-advanced-config:
|
|
|
|
Load Balancing With Advanced Classification
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In the previous section, we provided an example of a ``load-balancing``
|
|
configuration with client classification limited to the "HA_server1" and
|
|
"HA_server2" classes, which are dynamically assigned to the received DHCP
|
|
queries. In many cases, HA is needed in deployments which already use some other
|
|
client classification.
|
|
|
|
Suppose there is a system which classifies devices into two groups: "phones" and
|
|
"laptops", based on some classification criteria specified in the Kea
|
|
configuration file. Both types of devices are allocated leases from different
|
|
address pools. Introducing HA in ``load-balancing`` mode results in a further
|
|
split of each of those pools, as each server allocates leases for some phones
|
|
and some laptops. This requires each of the existing pools to be split between
|
|
"HA_server1" and "HA_server2", so we end up with the following classes:
|
|
|
|
- "phones_server1"
|
|
- "laptops_server1"
|
|
- "phones_server2"
|
|
- "laptops_server2"
|
|
|
|
The corresponding server configuration, using advanced classification (and the
|
|
``member`` expression), is provided below. For brevity's sake, the HA hook
|
|
library configuration has been removed from this example.
|
|
|
|
.. code-block:: json
|
|
|
|
{
|
|
"Dhcp4": {
|
|
"client-classes": [{
|
|
"name": "phones",
|
|
"test": "substring(option[60].hex,0,6) == 'Aastra'"
|
|
}, {
|
|
"name": "laptops",
|
|
"test": "not member('phones')"
|
|
}, {
|
|
"name": "phones_server1",
|
|
"test": "member('phones') and member('HA_server1')"
|
|
}, {
|
|
"name": "phones_server2",
|
|
"test": "member('phones') and member('HA_server2')"
|
|
}, {
|
|
"name": "laptops_server1",
|
|
"test": "member('laptops') and member('HA_server1')"
|
|
}, {
|
|
"name": "laptops_server2",
|
|
"test": "member('laptops') and member('HA_server2')"
|
|
}],
|
|
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.125",
|
|
"client-class": "phones_server1"
|
|
}, {
|
|
"pool": "192.0.3.126 - 192.0.3.150",
|
|
"client-class": "laptops_server1"
|
|
}, {
|
|
"pool": "192.0.3.200 - 192.0.3.225",
|
|
"client-class": "phones_server2"
|
|
}, {
|
|
"pool": "192.0.3.226 - 192.0.3.250",
|
|
"client-class": "laptops_server2"
|
|
}],
|
|
|
|
"option-data": [{
|
|
"name": "routers",
|
|
"data": "192.0.3.1"
|
|
}],
|
|
|
|
"relay": { "ip-address": "10.1.2.3" }
|
|
}]
|
|
}
|
|
}
|
|
|
|
The configuration provided above splits the address range into four pools: two
|
|
pools dedicated to "HA_server1" and two to "HA_server2". Each server can assign
|
|
leases to both phones and laptops. Both groups of devices are assigned addresses
|
|
from different pools. The "HA_server1" and "HA_server2" classes are built-in
|
|
(see :ref:`classification-using-vendor`) and do not need to be declared.
|
|
They are assigned dynamically by the HA hook library as a result of the
|
|
``load-balancing`` algorithm. "phones_*" and "laptop_*" evaluate to ``true``
|
|
when the query belongs to a given combination of other classes, e.g. "HA_server1"
|
|
and "phones". The pool is selected accordingly as a result of such an evaluation.
|
|
|
|
Consult :ref:`classify` for details on how to use the ``member`` expression and
|
|
class dependencies.
|
|
|
|
.. _ha-hot-standby-config:
|
|
|
|
Hot-Standby Configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The following is an example configuration of the primary server in a
|
|
``hot-standby`` configuration:
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
"this-server-name": "server1",
|
|
"mode": "hot-standby",
|
|
"heartbeat-delay": 10000,
|
|
"max-response-delay": 60000,
|
|
"max-ack-delay": 5000,
|
|
"max-unacked-clients": 5,
|
|
"max-rejected-lease-updates": 10,
|
|
"peers": [{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "standby",
|
|
"auto-failover": true
|
|
}, {
|
|
"name": "server3",
|
|
"url": "http://192.168.56.99:8000/",
|
|
"basic-auth-user": "foo",
|
|
"basic-auth-password": "bar",
|
|
"role": "backup",
|
|
"auto-failover": false
|
|
}]
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.250",
|
|
"client-class": "HA_server1"
|
|
}],
|
|
|
|
"option-data": [{
|
|
"name": "routers",
|
|
"data": "192.0.3.1"
|
|
}],
|
|
|
|
"relay": { "ip-address": "10.1.2.3" }
|
|
}]
|
|
}
|
|
|
|
This configuration is very similar to the ``load-balancing`` configuration
|
|
described in :ref:`ha-load-balancing-config`, with a few notable differences.
|
|
|
|
The ``mode`` is now set to ``hot-standby``, in which only one server responds to
|
|
DHCP clients. If the primary server is online, it responds to all DHCP queries.
|
|
The ``standby`` server takes over all DHCP traffic only if it discovers that the
|
|
primary is unavailable.
|
|
|
|
In this mode, the non-primary active server is called ``standby`` and that is
|
|
its role.
|
|
|
|
Finally, because there is always only one server responding to DHCP queries,
|
|
there is only one scope - "HA_server1" - in use within pool definitions. In fact,
|
|
the ``client-class`` parameter could be removed from this configuration without
|
|
harm, because there can be no conflicts in lease allocations by different
|
|
servers as they do not allocate leases concurrently. The ``client-class``
|
|
remains in this example mostly for demonstration purposes, to highlight the
|
|
differences between the ``hot-standby`` and ``load-balancing`` modes of
|
|
operation.
|
|
|
|
.. _ha-passive-backup-config:
|
|
|
|
Passive-Backup Configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The following is an example configuration file for the primary server in a
|
|
``passive-backup`` configuration:
|
|
|
|
.. code-block:: json
|
|
|
|
{
|
|
"Dhcp4": {
|
|
"hooks-libraries": [{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
}, {
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [{
|
|
"this-server-name": "server1",
|
|
"mode": "passive-backup",
|
|
"wait-backup-ack": false,
|
|
"peers": [{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary"
|
|
}, {
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "backup"
|
|
}, {
|
|
"name": "server3",
|
|
"url": "http://192.168.56.99:8000/",
|
|
"basic-auth-user": "foo",
|
|
"basic-auth-password": "bar",
|
|
"role": "backup"
|
|
}]
|
|
}]
|
|
}
|
|
}],
|
|
|
|
"subnet4": [{
|
|
"subnet": "192.0.3.0/24",
|
|
"pools": [{
|
|
"pool": "192.0.3.100 - 192.0.3.250"
|
|
}],
|
|
|
|
"option-data": [{
|
|
"name": "routers",
|
|
"data": "192.0.3.1"
|
|
}],
|
|
|
|
"relay": { "ip-address": "10.1.2.3" }
|
|
}]
|
|
}
|
|
}
|
|
|
|
The configurations of three peers are included: one for the primary and two for
|
|
the backup servers.
|
|
|
|
Many of the parameters present in the ``load-balancing`` and ``hot-standby``
|
|
configuration examples are not relevant in the ``passive-backup`` mode, thus
|
|
they are not specified here. For example: ``heartbeat-delay``,
|
|
``max-unacked-clients``, ``max-rejected-lease-updates``, and others related to
|
|
the failover mechanism should not be specified in the ``passive-backup`` mode.
|
|
|
|
The ``wait-backup-ack`` is a boolean parameter not present in previous examples.
|
|
It defaults to ``false`` and must not be modified in the ``load-balancing`` and
|
|
``hot-standby`` modes. In the ``passive-backup`` mode this parameter can be set
|
|
to ``true``, which causes the primary server to expect acknowledgments to the
|
|
lease updates from the backup servers prior to responding to the DHCP client. It
|
|
ensures that the lease has propagated to all servers before the client is given
|
|
the lease, but it poses a risk of losing a DHCP service if there is a
|
|
communication problem with one of the backup servers. This setting also
|
|
increases the latency of the DHCP response, because of the time that the primary
|
|
spends waiting for the acknowledgements. We recommend that the
|
|
``wait-backup-ack`` setting be left at its default value (``false``) if the DHCP
|
|
service reliability is more important than consistency of the lease information
|
|
between the primary and the backups, and in all cases when the DHCP service
|
|
latency should be minimal.
|
|
|
|
.. note::
|
|
|
|
Currently, active servers place lease updates to be sent to peers onto
|
|
internal queues (one queue per peer/URL). In ``passive-backup`` mode, active
|
|
servers do not wait for lease updates to be acknowledged; thus during times
|
|
of heavy client traffic it is possible for the number of lease updates queued
|
|
for transmission to accumulate faster than they can be delivered. As client
|
|
traffic lessens the queues begin to empty. Since Kea 2.0.0, active servers
|
|
monitor the size of these queues and emit periodic warnings (see
|
|
HTTP_CLIENT_QUEUE_SIZE_GROWING in :ref:`kea-messages`) if they perceive a
|
|
queue as growing too quickly. The warnings cease once the queue size begins
|
|
to shrink. These messages are intended as a bellwether and seeing them
|
|
sporadically during times of heavy traffic load does not necessarily indicate
|
|
a problem. If, however, they occur continually during times of routine
|
|
traffic load, they likely indicate potential mismatches in server
|
|
capabilities and/or configuration; this should be investigated, as the size
|
|
of the queues may eventually impair an active server's ability to respond to
|
|
clients in a timely manner.
|
|
|
|
.. _ha-sharing-lease-info:
|
|
|
|
Lease Information Sharing
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
An HA-enabled server informs its active partner about allocated or renewed
|
|
leases by sending appropriate control commands, and the partner updates the
|
|
lease information in its own database. When the server starts up for the first
|
|
time or recovers after a failure, it synchronizes its lease database with its
|
|
partner. These two mechanisms guarantee consistency of the lease information
|
|
between the servers and allow the designation of one of the servers to handle
|
|
the entire DHCP traffic load if the other server becomes unavailable.
|
|
|
|
In some cases, though, it is desirable to disable lease updates and/or database
|
|
synchronization between the active servers, if the exchange of information about
|
|
the allocated leases is performed using some other mechanism. Kea supports
|
|
various database types that can be used to store leases, including MySQL and
|
|
PostgreSQL. Those databases include built-in solutions for data replication
|
|
which are often used by Kea administrators to provide redundancy.
|
|
|
|
The HA hook library supports such scenarios by disabling lease updates over the
|
|
control channel and/or lease-database synchronization, leaving the server to
|
|
rely on the database replication mechanism. This is controlled by the two
|
|
boolean parameters ``send-lease-updates`` and ``sync-leases``, whose values
|
|
default to ``true``:
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
"mode": "load-balancing",
|
|
"send-lease-updates": false,
|
|
"sync-leases": false,
|
|
"peers": [
|
|
{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary"
|
|
},
|
|
{
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "secondary"
|
|
}
|
|
]
|
|
} ]
|
|
}
|
|
}
|
|
],
|
|
...
|
|
}
|
|
|
|
In the most typical use case, both parameters are set to the same value, i.e.
|
|
both are ``false`` if database replication is in use, or both are ``true``
|
|
otherwise. Introducing two separate parameters to control lease updates and
|
|
lease-database synchronization is aimed at possible special use cases; for
|
|
example, when synchronization is performed by copying a lease file (therefore
|
|
``sync-leases`` is set to ``false``), but lease updates should be conducted as
|
|
usual (``send-lease-updates`` is set to ``true``). It should be noted that Kea
|
|
does not natively support such use cases, but users may develop their own
|
|
scripts and tools around Kea to provide such mechanisms. The HA hook library
|
|
configuration is designed to maximize flexibility of administration.
|
|
|
|
.. _ha-syncing-page-limit:
|
|
|
|
Controlling Lease-Page Size Limit
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
An HA-enabled server initiates synchronization of the lease database after
|
|
downtime or upon receiving the ``ha-sync`` command. The server uses commands
|
|
described in :ref:`command-lease4-get-page` and :ref:`command-lease6-get-page`
|
|
to fetch leases from its partner server (lease queries). The size of the results
|
|
page (the maximum number of leases to be returned in a single response to one of
|
|
these commands) can be controlled via configuration of the HA hook library.
|
|
Increasing the page size decreases the number of lease queries sent to the
|
|
partner server, but it causes the partner server to generate larger responses,
|
|
which lengthens transmission time as well as increases memory and CPU
|
|
utilization on both servers. Decreasing the page size helps to decrease resource
|
|
utilization, but requires more lease queries to be issued to fetch the entire
|
|
lease database.
|
|
|
|
The default value of the ``sync-page-limit`` command controlling the page size
|
|
is 10000. This means that the entire lease database can be fetched with a single
|
|
command if the size of the database is equal to or less than 10000 lines.
|
|
|
|
.. _ha-syncing-timeouts:
|
|
|
|
Timeouts
|
|
~~~~~~~~
|
|
|
|
In deployments with a large number of clients connected to the network,
|
|
lease-database synchronization after a server failure may be a time-consuming
|
|
operation. The synchronizing server must gather all leases from its partner,
|
|
which yields a large response over the RESTful interface. The server receives
|
|
leases using the paging mechanism described in :ref:`ha-syncing-page-limit`.
|
|
Before the page of leases is fetched, the synchronizing server sends a
|
|
``dhcp-disable`` command to disable the DHCP service on the partner server. If
|
|
the service is already disabled, this command resets the timeout for the DHCP
|
|
service being disabled, which by default is set to 60 seconds. If fetching a
|
|
single page of leases takes longer than the specified time, the partner server
|
|
assumes that the synchronizing server has died and resumes its DHCP service. The
|
|
connection of the synchronizing server with its partner is also protected by the
|
|
timeout. If the synchronization of a single page of leases takes longer than the
|
|
specified time, the synchronizing server terminates the connection and the
|
|
synchronization fails. Both timeout values are controlled by a single
|
|
configuration parameter, ``sync-timeout``. The following configuration snippet
|
|
demonstrates how to modify the timeout for automatic re-enabling of the DHCP
|
|
service on the partner server and how to increase the timeout for fetching a
|
|
single page of leases from 60 seconds to 90 seconds:
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
"mode": "load-balancing",
|
|
"sync-timeout": 90000,
|
|
"peers": [
|
|
{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary"
|
|
},
|
|
{
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "secondary"
|
|
}
|
|
]
|
|
} ]
|
|
}
|
|
}
|
|
],
|
|
...
|
|
}
|
|
|
|
It is important to note that extending this ``sync-timeout`` value may sometimes
|
|
be insufficient to prevent issues with timeouts during lease-database
|
|
synchronization. The control commands travel via the Control Agent, which also
|
|
monitors incoming (with a synchronizing server) and outgoing (with a DHCP server)
|
|
connections for timeouts. The DHCP server also monitors the connection from the
|
|
Control Agent for timeouts. Those timeouts cannot currently be modified via
|
|
configuration; extending these timeouts is only possible by modifying them in
|
|
the Kea code and recompiling the server. The relevant constants are located in
|
|
the Kea source at: ``src/lib/config/timeouts.h``.
|
|
|
|
.. _ha-pause-state-machine:
|
|
|
|
Pausing the HA State Machine
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The ``high-availability`` state machine includes many different states described
|
|
in detail in :ref:`ha-server-states`. The server enters each state when certain
|
|
conditions are met, most often taking into account the partner server's state.
|
|
In some states the server performs specific actions, e.g. synchronization of the
|
|
lease database in the ``syncing`` state, or responding to DHCP queries according
|
|
to the configured mode of operation in the ``load-balancing`` and ``hot-standby``
|
|
states.
|
|
|
|
By default, transitions between the states are performed automatically and the
|
|
server administrator has no direct control over when the transitions take place;
|
|
in most cases, the administrator does not need such control. In some situations,
|
|
however, the administrator may want to "pause" the HA state machine in a
|
|
selected state to perform some additional administrative actions before the
|
|
server transitions to the next state.
|
|
|
|
Consider a server failure which results in the loss of the entire lease database.
|
|
Typically, the server rebuilds its lease database when it enters the ``syncing``
|
|
state by querying the partner server for leases, but it is possible that the
|
|
partner was also experiencing a failure and lacks lease information. In this
|
|
case, it may be required to reconstruct lease databases on both servers from
|
|
some external source, e.g. a backup server. If the lease database is to be
|
|
reconstructed via the RESTful API, the servers should be started in the initial,
|
|
i.e. ``waiting``, state and remain in this state while leases are being added.
|
|
In particular, the servers should not attempt to synchronize their lease
|
|
databases nor start serving DHCP clients.
|
|
|
|
The HA hook library provides configuration parameters and a command to control
|
|
pausing and resuming the HA state machine. The following configuration causes
|
|
the HA state machine to pause in the ``waiting`` state after server startup.
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
"mode": "load-balancing",
|
|
"peers": [
|
|
{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary"
|
|
},
|
|
{
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "secondary"
|
|
}
|
|
],
|
|
"state-machine": {
|
|
"states": [
|
|
{
|
|
"state": "waiting",
|
|
"pause": "once"
|
|
}
|
|
]
|
|
}
|
|
} ]
|
|
}
|
|
}
|
|
],
|
|
...
|
|
}
|
|
|
|
The ``pause`` parameter value ``once`` denotes that the state machine should be
|
|
paused upon the first transition to the ``waiting`` state; later transitions to
|
|
this state will not cause the state machine to pause. Two other supported values
|
|
of the ``pause`` parameter are ``always`` and ``never``. The latter is the
|
|
default value for each state, which instructs the server never to pause the
|
|
state machine.
|
|
|
|
In order to "unpause" the state machine, the ``ha-continue`` command must be
|
|
sent to the paused server. This command does not take any arguments. See
|
|
:ref:`ha-control-commands` for details about commands specific to the HA hook
|
|
library.
|
|
|
|
It is possible to configure the state machine to pause in more than one state.
|
|
Consider the following configuration:
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
"mode": "load-balancing",
|
|
"peers": [
|
|
{
|
|
"name": "server1",
|
|
"url": "http://192.168.56.33:8000/",
|
|
"role": "primary"
|
|
},
|
|
{
|
|
"name": "server2",
|
|
"url": "http://192.168.56.66:8000/",
|
|
"role": "secondary"
|
|
}
|
|
],
|
|
"state-machine": {
|
|
"states": [
|
|
{
|
|
"state": "ready",
|
|
"pause": "always"
|
|
},
|
|
{
|
|
"state": "partner-down",
|
|
"pause": "once"
|
|
}
|
|
]
|
|
}
|
|
} ]
|
|
}
|
|
}
|
|
],
|
|
...
|
|
}
|
|
|
|
This configuration instructs the server to pause the state machine every time it
|
|
transitions to the ``ready`` state and upon the first transition to the
|
|
``partner-down`` state.
|
|
|
|
Refer to :ref:`ha-server-states` for a complete list of server states. The state
|
|
machine can be paused in any of the supported states; however, it is not
|
|
practical to pause in the ``backup`` or ``terminated`` states because the server
|
|
never transitions out of these states anyway.
|
|
|
|
.. note::
|
|
|
|
In the ``syncing`` state the server is paused before it makes an attempt to
|
|
synchronize the lease database with a partner. To pause the state machine
|
|
after lease-database synchronization, use the ``ready`` state instead.
|
|
|
|
.. note::
|
|
|
|
The state of the HA state machine depends on the state of the cooperating
|
|
server. Therefore, pausing the state machine of one server may affect the
|
|
operation of the partner server. For example: if the primary server is paused
|
|
in the ``waiting`` state, the partner server will also remain in the
|
|
``waiting`` state until the state machine of the primary server is resumed
|
|
and that server transitions to the ``ready`` state.
|
|
|
|
.. _ha-ctrl-agent-config:
|
|
|
|
Control Agent Configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The :ref:`kea-ctrl-agent` describes in detail the Kea daemon, which provides a
|
|
RESTful interface to control the Kea servers. The same functionality is used by
|
|
the High Availability hook library to establish communication between the HA
|
|
peers. Therefore, the HA library requires that the Control Agent (CA) be started
|
|
for each DHCP instance within the HA setup. If the Control Agent is not started,
|
|
the peers cannot communicate with a particular DHCP server (even if the DHCP
|
|
server itself is online) and may eventually consider this server to be offline.
|
|
|
|
The following is an example configuration for the CA running on the same
|
|
machine as the primary server. This configuration is valid for both the
|
|
``load-balancing`` and the ``hot-standby`` cases presented in previous sections.
|
|
|
|
::
|
|
|
|
{
|
|
"Control-agent": {
|
|
"http-host": "192.168.56.33",
|
|
|
|
// If enabling HA and multi-threading, the 8000 port is used by the HA
|
|
// hook library http listener. When using HA hook library with
|
|
// multi-threading to function, make sure the port used by dedicated
|
|
// listener is different (e.g. 8001) than the one used by CA. Note
|
|
// the commands should still be sent via CA. The dedicated listener
|
|
// is specifically for HA updates only.
|
|
"http-port": 8000,
|
|
|
|
"control-sockets": {
|
|
"dhcp4": {
|
|
"socket-type": "unix",
|
|
"socket-name": "/tmp/kea-dhcp4-ctrl.sock"
|
|
},
|
|
"dhcp6": {
|
|
"socket-type": "unix",
|
|
"socket-name": "/tmp/kea-dhcp6-ctrl.sock"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
Since Kea 1.9.0, basic HTTP authentication is supported.
|
|
|
|
.. _ha-mt-config:
|
|
|
|
Multi-Threaded Configuration (HA+MT)
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
HA peer communication consists of specialized API commands sent between HA peers.
|
|
Prior to Kea 1.9.7, each peer had to be paired with a local instance of
|
|
``kea-ctrl-agent`` in order to exchange commands. The agent received HA commands
|
|
via HTTP, communicated via Linux socket with the local peer to carry out the
|
|
command, and then sent the response back to the requesting peer via HTTP. To
|
|
send HA commands, each peer opened its own HTTP client connection to the URL of
|
|
each of its peers.
|
|
|
|
In Kea 1.9.7 and newer, it is possible to configure HA to use direct
|
|
multi-threaded communication between peers. We refer to this mode as HA+MT.
|
|
With HA+MT enabled, each peer runs its own dedicated, internal HTTP listener
|
|
(i.e. server) which receives and responds to commands directly, thus eliminating
|
|
the need for an agent to carry out HA protocol between peers. In addition, both
|
|
the listener and client components use multi-threading to support multiple,
|
|
concurrent connections between peers. By eliminating the agent and executing
|
|
multiple command exchanges in parallel, HA throughput between peers should
|
|
improve considerably in most situations.
|
|
|
|
The following parameters have been added to the HA configuration, to support
|
|
HA+MT operation:
|
|
|
|
- ``enable-multi-threading`` - enables or disables multi-threading HA peer
|
|
communication (HA+MT). Kea core multi-threading must be enabled for HA+MT to
|
|
operate. When ``false``, the server relies on ``kea-ctrl-agent`` for
|
|
communication with its peer, and uses single-threaded HTTP client processing.
|
|
The default is ``true``.
|
|
|
|
- ``http-dedicated-listener`` - enables or disables the creation of a dedicated,
|
|
internal HTTP listener through which the server receives HA messages from its
|
|
peers. The internal listener replaces the role of ``kea-ctrl-agent`` traffic,
|
|
allowing peers to send their HA commands directly to each other. The listener
|
|
listens on the peer's ``url``. When ``false``, the server
|
|
relies on ``kea-ctrl-agent``. This parameter has been provided largely for
|
|
flexibility and testing; running HA+MT without dedicated listeners enabled
|
|
will substantially limit HA throughput. The default is ``true``.
|
|
|
|
- ``http-listener-threads`` - indicates the maximum number of threads the
|
|
dedicated listener should use. A value of ``0`` instructs the server to use the
|
|
same number of threads that the Kea core is using for DHCP multi-threading.
|
|
The default is ``0``.
|
|
|
|
- ``http-client-threads`` - indicates the maximum number of threads that should
|
|
be used to send HA messages to its peers. A value of ``0`` instructs the server
|
|
to use the same number of threads that the Kea core is using for DHCP
|
|
multi-threading. The default is ``0``.
|
|
|
|
These parameters are grouped together under a map element, ``multi-threading``,
|
|
as illustrated below:
|
|
|
|
::
|
|
|
|
"Dhcp4": {
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
"multi-threading": {
|
|
"enable-multi-threading": true,
|
|
"http-dedicated-listener": true,
|
|
"http-listener-threads": 4,
|
|
"http-client-threads": 4
|
|
},
|
|
"peers": [
|
|
// This is the configuration of this server instance.
|
|
{
|
|
"name": "server1",
|
|
// This specifies the URL of our server instance.
|
|
// Since the HA+MT uses a direct connection, the
|
|
// DHCPv4 server open its own socket. Note that it
|
|
// must be different than the one used by the CA
|
|
// (typically 8000). In this example, 8001 is used.
|
|
"url": "http://192.0.2.1:8001/",
|
|
// This server is primary. The other one must be
|
|
// secondary.
|
|
"role": "primary"
|
|
},
|
|
// This is the configuration of our HA peer.
|
|
{
|
|
"name": "server2",
|
|
// This specifies the URL of our server instance.
|
|
// Since the HA+MT uses a direct connection, the
|
|
// DHCPv4 server open its own socket. Note that it
|
|
// must be different than the one used by the CA
|
|
// (typically 8000). In this example, 8001 is used.
|
|
"url": "http://192.0.2.2:8001/",
|
|
// The partner is a secondary. This server is a
|
|
// primary as specified in the previous "peers"
|
|
// entry and in "this-server-name" before that.
|
|
"role": "secondary"
|
|
},
|
|
...
|
|
],
|
|
...
|
|
},
|
|
...
|
|
]
|
|
}
|
|
},
|
|
...
|
|
],
|
|
...
|
|
}
|
|
|
|
|
|
In the example above, HA+MT is enabled with four threads for the listener and
|
|
four threads for the client.
|
|
|
|
.. note::
|
|
|
|
It is essential to configure the ports correctly. One common mistake is to
|
|
configure CA to listen on port 8000 and also configure dedicated listeners on
|
|
port 8000. In such a configuration, the communication will still work over CA,
|
|
but it will be slow and the DHCP server will fail to bind sockets.
|
|
Administrators should ensure that dedicated listeners use a different port
|
|
(8001 is a suggested alternative); if ports are misconfigured or the ports
|
|
dedicated to CA are used, the performance bottlenecks caused by the
|
|
single-threaded nature of CA and the sequential nature of the UNIX socket
|
|
that connects CA to DHCP servers will nullify any performance gains offered
|
|
by HA+MT.
|
|
|
|
.. _ha-parked-packet-limit:
|
|
|
|
Parked-Packet Limit
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
Kea servers contain a mechanism by which the response to a client packet may
|
|
be held, pending completion of hook library work. We refer to this as "parking"
|
|
the packet. The HA hook library makes use of this mechanism. When an HA server
|
|
needs to send a lease update to its peer(s) to notify it of the change to the
|
|
lease, it will "park" the client response until the peer acknowledges the lease
|
|
update. At that point, the server will "unpark" the response and send it to the
|
|
client. This applies to client queries which cause lease changes, such as
|
|
DHCPREQUEST for DHCPv4 and Request, Renew, and Rebind for DHCPv6. It does not
|
|
apply to DHPCDISCOVERs (v4) or Solicits (v6).
|
|
|
|
There is a global parameter, ``parked-packet-limit``, that may be used to limit
|
|
the number of responses that may be parked at any given time. This acts as a
|
|
form of congestion handling and protects the server from being swamped when the
|
|
volume of client queries is outpacing the server's ability to respond. Once the
|
|
limit is reached, the server emits a log and drops any new responses until
|
|
parking spaces are available.
|
|
|
|
In general, smaller values for the parking lot limit are likely to cause more
|
|
drops but with shorter response times. Larger values are likely to result in
|
|
fewer drops but with longer response times. Currently, the default value for
|
|
``parked-packet-limit`` is 256.
|
|
|
|
.. warning::
|
|
|
|
Using too small a value may result in an unnecessarily high drop rate, while
|
|
using too large a value may lead to response times that are simply too long
|
|
to be useful. A value of 0, while allowed, disables the limit altogether, but
|
|
this is highly discouraged as it may lead to Kea servers becoming
|
|
unresponsive to clients. Choosing the best value is very site-specific; we
|
|
recommend users initially leave it at the default value of 256 and observe
|
|
how the system behaves over time with varying load conditions.
|
|
|
|
::
|
|
|
|
"Dhcp6": {
|
|
// Limit the number of concurrently parked packets to 128.
|
|
"parked-packet-limit": 128,
|
|
"hooks-libraries": [
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so",
|
|
"parameters": { }
|
|
},
|
|
{
|
|
"library": "/usr/lib/kea/hooks/libdhcp_ha.so",
|
|
"parameters": {
|
|
"high-availability": [ {
|
|
"this-server-name": "server1",
|
|
...
|
|
} ]
|
|
}
|
|
},
|
|
...
|
|
],
|
|
...
|
|
}
|
|
|
|
.. note::
|
|
|
|
While ``parked-packet-limit`` is not specifically tied to HA, currently HA
|
|
is the only ISC hook that employs packet parking.
|
|
|
|
.. _ha-maintenance:
|
|
|
|
Controlled Shutdown and Maintenance of DHCP Servers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Having a pair of servers providing High Availability allows for controlled
|
|
shutdown and maintenance of those servers without disrupting the DHCP service.
|
|
For example, an administrator can perform an upgrade of one of the servers while
|
|
the other one continues to respond to DHCP queries. When the first server is
|
|
upgraded and back online, the upgrade can be performed for the second server.
|
|
|
|
A typical problem reported with early versions of the High Availability hook
|
|
library was that the administrator did not have direct control over the state of
|
|
the DHCP server. Shutting down one of the servers for maintenance did not
|
|
necessarily cause the other server to start responding to all DHCP queries,
|
|
because the failure-detection algorithm described in :ref:`ha-scope-transition`
|
|
requires that the partner not respond for a configured period of time and,
|
|
depending on the configuration, may also require that a number of DHCP requests
|
|
not be responded to for a specified period of time. The maintenance procedure,
|
|
however, requires that the administrator be able to instruct one of the servers
|
|
to instantly start serving all DHCP clients, and the other server to instantly
|
|
stop serving any DHCP clients, so it can be safely shut down.
|
|
|
|
The maintenance feature of the High Availability hook library addresses this
|
|
situation. The ``ha-maintenance-start`` command was introduced to allow the
|
|
administrator to put the pair of the active servers in a state in which one of
|
|
them is responding to all DHCP queries and the other one is awaiting shutdown.
|
|
|
|
Suppose that the HA setup includes two active servers, ``server1`` and
|
|
``server2``, and the latter needs to be shut down for maintenance.
|
|
The administrator can send the ``ha-maintenance-start`` command to ``server1``,
|
|
as this is the server which is going to handle the DHCP traffic while the other
|
|
one is offline. ``server1`` responds with an error if its state or the partner's
|
|
state does not allow for a maintenance shutdown: for example, if maintenance is
|
|
not supported for the backup server or if the server is in the ``terminated``
|
|
state. Also, an error is returned if the ``ha-maintenance-start`` request was
|
|
already sent to the other server.
|
|
|
|
Upon receiving the ``ha-maintenance-start`` command, ``server1`` sends the
|
|
``ha-maintenance-notify`` command to ``server2`` to put it in the
|
|
``in-maintenance`` state. If ``server2`` confirms, ``server1`` transitions to
|
|
the ``partner-in-maintenance`` state. This is similar to the ``partner-down``
|
|
state, except that in the ``partner-in-maintenance`` state ``server1`` continues
|
|
to send lease updates to ``server2`` until the administrator shuts down
|
|
``server2``. ``server1`` now responds to all DHCP queries.
|
|
|
|
The administrator can now safely shut down ``server2`` in the ``in-maintenance``
|
|
state and perform any necessary maintenance actions. While ``server2`` is
|
|
offline, ``server1`` will obviously not be able to communicate with its partner,
|
|
so it will immediately transition to the ``partner-down`` state; it will
|
|
continue to respond to all DHCP queries but will no longer send lease updates to
|
|
``server2``. Restarting ``server2`` after the maintenance will trigger normal
|
|
state negotiation, lease-database synchronization, and, ultimately, a transition
|
|
to the normal ``load-balancing`` or ``hot-standby`` state. Maintenance can then
|
|
be performed on ``server1``, after sending the ``ha-maintenance-start`` command
|
|
to ``server2``.
|
|
|
|
If the ``ha-maintenance-start`` command was sent to the server and the server
|
|
has transitioned to the ``partner-in-maintenance`` state, it is possible to
|
|
transition both it and its partner back to their previous states to resume the
|
|
normal operation of the HA pair. This is achieved by sending the
|
|
``ha-maintenance-cancel`` command to the server that is in the
|
|
``partner-in-maintenance`` state. However, if the server has already
|
|
transitioned to the ``partner-down`` state as a result of detecting that the
|
|
partner is offline, canceling the maintenance is no longer possible. In that
|
|
case, it is necessary to restart the other server and allow it to complete its
|
|
normal state negotiation process.
|
|
|
|
Upgrading From Older HA Versions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
To upgrade from an older HA hook library to the current version, the
|
|
administrator must shut down one of the servers and rely on the failover
|
|
mechanism to force the online server to transition to the ``partner-down`` state,
|
|
where it starts serving all DHCP clients. Once the hook library on the first
|
|
server is upgraded to a current version, the ``ha-maintenance-start`` command
|
|
can be used to upgrade the second server.
|
|
|
|
In such a case, shut down the server running the old version. Next, send the
|
|
``ha-maintenance-start`` command to the server that has been upgraded. This
|
|
server should immediately transition to the ``partner-down`` state as it cannot
|
|
communicate with its offline partner. In the ``partner-down`` state the first
|
|
(upgraded) server will respond to all DHCP requests, allowing the administrator
|
|
to perform the upgrade on the second server.
|
|
|
|
.. note::
|
|
|
|
Do not send the ``ha-maintenance-start`` command while the server running the
|
|
old hook library is still online. The server receiving this command will
|
|
return an error.
|
|
|
|
|
|
.. _ha-control-commands:
|
|
|
|
Control Commands for High Availability
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Even though the HA hook library is designed to automatically resolve issues with
|
|
DHCP service interruptions by redirecting the DHCP traffic to a surviving server
|
|
and synchronizing the lease database as needed, it may be useful for the
|
|
administrator to have more control over both servers' behavior. In particular,
|
|
it may be useful to be able to trigger lease-database synchronization on demand,
|
|
or to manually set the HA scopes that are being served.
|
|
|
|
The backup server can sometimes be used to handle DHCP traffic if both active
|
|
servers are down. The backup server does not perform the failover function
|
|
automatically; thus, in order to use the backup server to respond to DHCP
|
|
queries, the server administrator must enable this function manually.
|
|
|
|
The following sections describe commands supported by the HA hook library which
|
|
are available for the administrator.
|
|
|
|
.. _command-ha-sync:
|
|
|
|
The ``ha-sync`` Command
|
|
-----------------------
|
|
|
|
The ``ha-sync`` command instructs the server to synchronize its local lease
|
|
database with the selected peer. The server fetches all leases from the peer and
|
|
updates any locally stored leases which are older than those fetched. It also
|
|
creates new leases when any of those fetched do not exist in the local database.
|
|
All leases that are not returned by the peer but are in the local database are
|
|
preserved. The database synchronization is unidirectional; only the database on
|
|
the server to which the command has been sent is updated. To synchronize the
|
|
peer's database, a separate ``ha-sync`` command must be issued to that peer.
|
|
|
|
Database synchronization may be triggered for both active and backup server
|
|
types. The ``ha-sync`` command has the following structure (in a DHCPv4 example):
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-sync",
|
|
"service": [ "dhcp4 "],
|
|
"arguments": {
|
|
"server-name": "server2",
|
|
"max-period": 60
|
|
}
|
|
}
|
|
|
|
When the server receives this command it first disables the DHCP service of the
|
|
server from which it will be fetching leases, by sending the ``dhcp-disable``
|
|
command to that server. The ``max-period`` parameter specifies the maximum
|
|
duration (in seconds) for which the DHCP service should be disabled. If the DHCP
|
|
service is successfully disabled, the synchronizing server fetches leases from
|
|
the remote server by issuing one or more ``lease4-get-page`` commands. When the
|
|
lease-database synchronization is complete, the synchronizing server sends the
|
|
``dhcp-enable`` command to the peer to re-enable its DHCP service.
|
|
|
|
The ``max-period`` value should be sufficiently long to guarantee that it does
|
|
not elapse before the synchronization is completed. Otherwise, the DHCP server
|
|
will automatically enable its DHCP function while the synchronization is still
|
|
in progress. If the DHCP server subsequently allocates any leases during the
|
|
synchronization, those new (or updated) leases will not be fetched by the
|
|
synchronizing server, leading to database inconsistencies.
|
|
|
|
.. _command-ha-scopes:
|
|
|
|
The ``ha-scopes`` Command
|
|
-------------------------
|
|
|
|
This command allows an administrator to modify the HA scopes being served.
|
|
Consult :ref:`ha-load-balancing-config` and :ref:`ha-hot-standby-config` to
|
|
learn which scopes are available for the different HA modes of operation. The
|
|
``ha-scopes`` command has the following structure (in a DHCPv4 example):
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-scopes",
|
|
"service": [ "dhcp4" ],
|
|
"arguments": {
|
|
"scopes": [ "HA_server1", "HA_server2" ]
|
|
}
|
|
}
|
|
|
|
This command configures the server to handle traffic from both the "HA_server1"
|
|
and "HA_server2" scopes. To disable all scopes specify an empty list:
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-scopes",
|
|
"service": [ "dhcp4 "],
|
|
"arguments": {
|
|
"scopes": [ ]
|
|
}
|
|
}
|
|
|
|
.. _command-ha-continue:
|
|
|
|
The ``ha-continue`` Command
|
|
---------------------------
|
|
|
|
This command is used to resume the operation of the paused HA state machine, as
|
|
described in :ref:`ha-pause-state-machine`. It takes no arguments, so the
|
|
command structure is simply:
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-continue",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
.. _command-ha-heartbeat:
|
|
|
|
The ``ha-heartbeat`` Command
|
|
----------------------------
|
|
|
|
The :ref:`ha-server-states` section describes how the ``ha-heartbeat`` command
|
|
is used by a pair of active HA servers to detect one partner's failure. This
|
|
command, however, can also be sent by the system administrator to one or both
|
|
servers to check their HA state. This allows a monitoring system to be deployed
|
|
on the HA enabled servers to periodically check whether they are operational or
|
|
whether any manual intervention is required. The ``ha-heartbeat`` command takes
|
|
no arguments:
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-heartbeat",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
Upon successful communication with the server, a response similar to this should
|
|
be returned:
|
|
|
|
::
|
|
|
|
{
|
|
"result": 0,
|
|
"text": "HA peer status returned.",
|
|
"arguments":
|
|
{
|
|
"state": "partner-down",
|
|
"date-time": "Thu, 07 Nov 2019 08:49:37 GMT",
|
|
"scopes": [ "server1" ],
|
|
"unsent-update-count": 123
|
|
}
|
|
}
|
|
|
|
The returned ``state`` value should be one of the values listed in
|
|
:ref:`ha-server-states`. In the example above, the ``partner-down`` state is
|
|
returned, which indicates that the server which responded to the command
|
|
believes that its partner is offline; thus, it is serving all DHCP requests sent
|
|
to the servers. To ensure that the partner is indeed offline, the administrator
|
|
should send the ``ha-heartbeat`` command to the second server. If sending the
|
|
command fails, e.g. due to an inability to establish a TCP connection to the
|
|
Control Agent, or if the Control Agent reports issues with communication with
|
|
the DHCP server, it is very likely that the server is not running.
|
|
|
|
The ``date-time`` parameter conveys the server's notion of time.
|
|
|
|
The ``unsent-update-count`` value is a cumulative count of all unsent lease
|
|
updates since the server was booted; its value is set to 0 when the server is
|
|
started. It is never reset to 0 during the server's operation, even after the
|
|
partner synchronizes the database. It is incremented by the partner sending the
|
|
heartbeat response when it cannot send the lease update. For example, suppose
|
|
the failure is a result of a temporary communication interruption. In that case,
|
|
the partner receiving the ``partner-down`` heartbeat response tracks the value
|
|
changes and can determine, once communication is reestablished, whether there
|
|
are any new lease updates that it did not receive. If the values on both servers
|
|
do not match, it is an indication that the partner should synchronize its lease
|
|
database. A non-zero value itself is not an indication of any present issues
|
|
with lease updates, but a constantly incrementing value is.
|
|
|
|
The typical response returned by one server when both are
|
|
operational is:
|
|
|
|
::
|
|
|
|
{
|
|
"result": 0,
|
|
"text": "HA peer status returned.",
|
|
"arguments":
|
|
{
|
|
"state": "load-balancing",
|
|
"date-time": "Thu, 07 Nov 2019 08:49:37 GMT",
|
|
"scopes": [ "server1" ],
|
|
"unsent-update-count": 0
|
|
}
|
|
}
|
|
|
|
In most cases, the ``ha-heartbeat`` command should be sent to both HA-enabled
|
|
servers to verify the state of the entire HA setup. In particular, if one of the
|
|
servers indicates that it is in the ``load-balancing`` state, it means that this
|
|
server is operating as if its partner is functional. When a partner goes down,
|
|
it takes some time for the surviving server to realize it. The
|
|
:ref:`ha-scope-transition` section describes the algorithm which the surviving
|
|
server follows before it transitions to the ``partner-down`` state. If the
|
|
``ha-heartbeat`` command is sent during the time window between the failure of
|
|
one of the servers and the transition of the surviving server to the
|
|
``partner-down`` state, the response from the surviving server does not reflect
|
|
the failure. Resending the command detects the failure once the surviving server
|
|
has entered the ``partner-down`` state.
|
|
|
|
.. note:
|
|
|
|
Always send the ``ha-heartbeat`` command to both active HA servers to check
|
|
the state of the entire HA setup. Sending it to only one of the servers may
|
|
not reflect issues that just began with one of the servers.
|
|
|
|
.. _command-ha-status-get:
|
|
|
|
The ``status-get`` Command
|
|
--------------------------
|
|
|
|
``status-get`` is a general-purpose command supported by several Kea daemons,
|
|
not only the DHCP servers. However, when sent to a DHCP server with HA enabled,
|
|
it can be used to get insight into the details of the HA-specific server status.
|
|
Not only does the response contain the status information of the server
|
|
receiving this command, but also the information about its partner if it is
|
|
available.
|
|
|
|
The following is an example response to the ``status-get`` command, including
|
|
the HA status of two ``load-balancing`` servers:
|
|
|
|
.. code-block:: json
|
|
|
|
{
|
|
"result": 0,
|
|
"text": "",
|
|
"arguments": {
|
|
"pid": 1234,
|
|
"uptime": 3024,
|
|
"reload": 1111,
|
|
"high-availability": [
|
|
{
|
|
"ha-mode": "load-balancing",
|
|
"ha-servers": {
|
|
"local": {
|
|
"role": "primary",
|
|
"scopes": [ "server1" ],
|
|
"state": "load-balancing"
|
|
},
|
|
"remote": {
|
|
"age": 10,
|
|
"in-touch": true,
|
|
"role": "secondary",
|
|
"last-scopes": [ "server2" ],
|
|
"last-state": "load-balancing",
|
|
"communication-interrupted": true,
|
|
"connecting-clients": 2,
|
|
"unacked-clients": 1,
|
|
"unacked-clients-left": 2,
|
|
"analyzed-packets": 8
|
|
}
|
|
}
|
|
}
|
|
],
|
|
"multi-threading-enabled": true,
|
|
"thread-pool-size": 4,
|
|
"packet-queue-size": 64,
|
|
"packet-queue-statistics": [ 0.2, 0.1, 0.1 ],
|
|
"sockets": {
|
|
"status": "ready"
|
|
}
|
|
}
|
|
}
|
|
|
|
The ``high-availability`` argument is a list which currently comprises only one
|
|
element.
|
|
|
|
The ``ha-servers`` map contains two structures: ``local`` and ``remote``. The
|
|
former contains the status information of the server which received the command,
|
|
while the latter contains the status information known to the local server about
|
|
the partner. The ``role`` of the partner server is gathered from the local
|
|
configuration file, and thus should always be available. The remaining status
|
|
information, such as ``last-scopes`` and ``last-state``, is not available until
|
|
the local server communicates with the remote by successfully sending the
|
|
``ha-heartbeat`` command. If at least one such communication has taken place,
|
|
the returned value of the ``in-touch`` parameter is set to ``true``. By
|
|
examining this value, the command's sender can determine whether the information
|
|
about the remote server is reliable.
|
|
|
|
The ``last-scopes`` and ``last-state`` parameters contain information about the
|
|
HA scopes served by the partner and its state. This information is gathered
|
|
during the heartbeat command exchange, so it may not be accurate if a
|
|
communication problem occurs between the partners and this status information is
|
|
not refreshed. In such a case, it may be useful to send the ``status-get``
|
|
command to the partner server directly to check its current state. The ``age``
|
|
parameter specifies the age of the information from the partner, in seconds.
|
|
|
|
The ``communication-interrupted`` boolean value indicates whether the server
|
|
receiving the ``status-get`` command (the local server) has been unable to
|
|
communicate with the partner longer than the duration specified as
|
|
``max-response-delay``. In such a situation, the active servers are considered
|
|
to be in the ``communication-interrupted`` state. At this point, the local
|
|
server may start monitoring the DHCP traffic directed to the partner to see if
|
|
the partner is responding to this traffic. More about the failover procedure can
|
|
be found in :ref:`ha-load-balancing-config`.
|
|
|
|
The ``connecting-clients``, ``unacked-clients``, ``unacked-clients-left``, and
|
|
``analyzed-packets`` parameters were introduced along with the
|
|
``communication-interrupted`` parameter and they convey useful information about
|
|
the state of the DHCP traffic monitoring in the ``communication-interrupted``
|
|
state. Once the server leaves the ``communication-interrupted`` state, these
|
|
parameters are all reset to 0.
|
|
|
|
These parameters have the following meaning in the ``communication-interrupted``
|
|
state:
|
|
|
|
- ``connecting-clients`` - this is the number of different clients which have
|
|
attempted to get a lease from the remote server. These clients are
|
|
differentiated by their MAC address and client identifier (in DHCPv4) or DUID
|
|
(in DHCPv6). This number includes "unacked" clients (for which the "secs"
|
|
field or "elapsed time" value exceeded the ``max-response-delay``).
|
|
|
|
- ``unacked-clients`` - this is the number of different clients which have been
|
|
considered "unacked", i.e. the clients which have been trying to get the
|
|
lease longer than the value of the "secs" field, or for which the
|
|
"elapsed time" exceeded the ``max-response-delay`` setting.
|
|
|
|
- ``unacked-clients-left`` - this indicates the number of additional clients
|
|
which have to be considered "unacked" before the server enters the
|
|
``partner-down`` state. This value decreases when the ``unacked-clients``
|
|
value increases. The local server enters the ``partner-down`` state when this
|
|
value decreases to 0.
|
|
|
|
- ``analyzed-packets`` - this is the total number of packets directed to the
|
|
partner server and analyzed by the local server since entering the
|
|
communication interrupted state. It includes retransmissions from the same
|
|
clients.
|
|
|
|
Monitoring these values helps to predict when the local server will enter the
|
|
``partner-down`` state or to understand why the server has not yet entered this
|
|
state.
|
|
|
|
The ``ha-mode`` parameter returns the HA mode of operation selected using the
|
|
``mode`` parameter in the configuration file. It can hold one of the following
|
|
values: ``load-balancing``, ``hot-standby``, or ``passive-backup``.
|
|
|
|
The ``status-get`` response has the format described above only in the
|
|
``load-balancing`` and ``hot-standby`` modes. In the ``passive-backup`` mode the
|
|
``remote`` map is not included in the response because in this mode there is
|
|
only one active server (local). The response includes no information about the
|
|
status of the backup servers.
|
|
|
|
.. _command-ha-maintenance-start:
|
|
|
|
The ``ha-maintenance-start`` Command
|
|
------------------------------------
|
|
|
|
This command is used to initiate the transition of the server's partner into the
|
|
``in-maintenance`` state and the transition of the server receiving the command
|
|
into the ``partner-in-maintenance`` state. See the :ref:`ha-maintenance` section
|
|
for details.
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-maintenance-start",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
.. _command-ha-maintenance-cancel:
|
|
|
|
The ``ha-maintenance-cancel`` Command
|
|
-------------------------------------
|
|
|
|
This command is used to cancel the maintenance previously initiated using the
|
|
``ha-maintenance-start`` command. The server receiving this command will first
|
|
send ``ha-maintenance-notify``, with the ``cancel`` flag set to ``true``, to its
|
|
partner. Next, the server reverts from the ``partner-in-maintenance`` state to
|
|
its previous state. See the :ref:`ha-maintenance` section for details.
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-maintenance-cancel",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
.. _command-ha-maintenance-notify:
|
|
|
|
The ``ha-maintenance-notify`` Command
|
|
-------------------------------------
|
|
|
|
This command is sent by the server receiving the ``ha-maintenance-start`` or the
|
|
``ha-maintenance-cancel`` command to its partner, to cause the partner to
|
|
transition to the ``in-maintenance`` state or to revert from this state to a
|
|
previous state. See the :ref:`ha-maintenance` section for details.
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-maintenance-notify",
|
|
"service": [ "dhcp4" ],
|
|
"arguments": {
|
|
"cancel": false
|
|
}
|
|
}
|
|
|
|
.. warning::
|
|
|
|
The ``ha-maintenance-notify`` command is not meant to be used by system
|
|
administrators. It is used for internal communication between a pair of
|
|
HA-enabled DHCP servers. Direct use of this command is not supported and may
|
|
produce unintended consequences.
|
|
|
|
.. _command-ha-reset:
|
|
|
|
The ``ha-reset`` Command
|
|
------------------------
|
|
|
|
This command causes the server to reset its High Availability state machine by
|
|
transitioning it to the ``waiting`` state. A partner in the
|
|
``communication-recovery`` state may send this command to cause the server
|
|
to synchronize its lease database. Database synchronization is required when the
|
|
partner has failed to send all lease database updates after re-establishing
|
|
connection after a temporary connection failure. It is also required when the
|
|
``delayed-updates-limit`` is exceeded, when the server is in the
|
|
``communication-recovery`` state.
|
|
|
|
A server administrator may send this command to reset a misbehaving state
|
|
machine.
|
|
|
|
This command includes no arguments:
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-reset",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
And elicits the response:
|
|
|
|
::
|
|
|
|
{
|
|
"result": 0,
|
|
"text": "HA state machine reset."
|
|
}
|
|
|
|
If the server receiving this command is already in the ``waiting`` state, the
|
|
command has no effect.
|
|
|
|
.. _command-ha-sync-complete-notify:
|
|
|
|
The ``ha-sync-complete-notify`` Command
|
|
---------------------------------------
|
|
|
|
A server sends this command to its partner to signal that it has completed
|
|
lease-database synchronization. The partner may enable its DHCP service if it
|
|
can allocate new leases in its current state. The partner does not enable the
|
|
DHCP service in the ``partner-down`` state until it sends a successful heartbeat
|
|
test to its partner server. If the connection is still unavailable, the server
|
|
in the ``partner-down`` state enables its own DHCP service to continue
|
|
responding to clients.
|
|
|
|
This command includes no arguments:
|
|
|
|
::
|
|
|
|
{
|
|
"command": "ha-sync-complete-notify",
|
|
"service": [ "dhcp4" ]
|
|
}
|
|
|
|
And elicits the response:
|
|
|
|
::
|
|
|
|
{
|
|
"result": 0,
|
|
"text": "Server successfully notified about the synchronization completion."
|
|
}
|
|
|
|
.. warning::
|
|
|
|
The ``ha-sync-complete-notify`` command is not meant to be used by system
|
|
administrators. It is used for internal communication between a pair of
|
|
HA-enabled DHCP servers. Direct use of this command is not supported and may
|
|
produce unintended consequences.
|