mirror of https://gitlab.isc.org/isc-projects/kea (synced 2025-08-31 05:55:28 +00:00)
[5112] Several text corrections
@@ -9,14 +9,14 @@
@section parserIntro Parser background

Kea's data format of choice is JSON (https://tools.ietf.org/html/rfc7159), which
is used in configuration files, in the command channel and also when
communicating between DHCP servers and the DHCP-DDNS component. It is almost
certain it will be used as the data format for any new features.
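
For illustration, here is a minimal sketch of that format as it appears in a
configuration file (parameter names are taken from the grammar discussed later;
this is not a complete working configuration):

@code
{
    "Dhcp6": {
        "renew-timer": 900,
        "preferred-lifetime": 3000
    }
}
@endcode
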

Historically, Kea used @ref isc::data::Element::fromJSON and @ref
isc::data::Element::fromJSONFile methods to parse received data that is expected
to be in JSON syntax. This in-house parser was developed back in the early BIND10
days. Its two main advantages were that it didn't have any external dependencies
and that it was already available in the source tree when the Kea project
started. On the other hand, it was very difficult to modify (several attempts to
@@ -49,9 +49,9 @@ and here: http://kea.isc.org/wiki/SimpleParser.

To solve the issue of phase 1 mentioned earlier, a new parser has been developed
that is based on the flex and bison tools. The following text uses DHCPv6 as an
example, but the same principle applies to DHCPv4 and D2, and CA will likely
follow. The new parser consists of two core elements with a wrapper around them
(the following description is slightly oversimplified to convey the intent; a more
detailed description is available in the following sections):

-# Flex lexer (src/bin/dhcp6/dhcp6_lexer.ll) that is essentially a set of
regular expressions with C++ code that creates new tokens that represent whatever
@@ -87,20 +87,23 @@ is available in the following sections):
(a token with a value of 100), RCURLY_BRACKET, RCURLY_BRACKET, END

-# Parser context. As there is some information that needs to be passed between
parser and lexer, @ref isc::dhcp::Parser6Context is a convenience wrapper
around those two bundled together. It also works as a nice encapsulation,
hiding all the flex/bison details underneath.

@section parserBuild Building flex/bison code

The only input file used by flex is the .ll file. The only input file used by
bison is the .yy file. When making changes to the lexer or parser, only those
two files are edited. When processed, those two tools will generate a number of
.hh and .cc files. The major ones are named the same as their .ll and .yy
counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h), but
there are a number of additional files created: location.hh, position.hh and
stack.hh. Those are internal bison headers that are needed for compilation.

To avoid requiring every user to have flex and bison installed, we chose to
generate the files and add them to the Kea repository. To generate those files,
do the following:

@code
./configure --enable-generate-parser
@@ -120,7 +123,9 @@ generated may be different and cause unnecessarily large diffs, may cause
coverity/cpp-check issues to appear and disappear, and cause general unhappiness.
To avoid those problems, we will introduce a requirement to generate flex/bison
files on one dedicated machine. This machine will likely be docs. Currently Ops
is working on installing the necessary versions of flex/bison required, but
for the time being we can use the versions installed in Francis' home directory
(export PATH=/home/fdupont/bin:$PATH).

Note: the above applies only to the code being merged on master. It is probably
ok to generate the files on your development branch with whatever version you
@@ -145,10 +150,10 @@ documented, but the docs for it may be a bit cryptic. When developing new
parsers, it's best to start by copying whatever we have for DHCPv6 and tweak as
needed.

The second addition is flex conditions. They're defined with %%x and they define a
state of the lexer. A good example of a state may be a comment. Once the lexer
detects that a comment is beginning, it switches to a certain condition (by calling
BEGIN(COMMENT) for example) and the code then ignores whatever follows
(especially strings that look like valid tokens) until the comment is closed
(when it returns to the default condition by calling BEGIN(INITIAL)). This is
something that is not frequently used and the only use cases for it are the
@@ -157,7 +162,7 @@ aforementioned comments and file inclusions.
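
A minimal sketch of such a condition for comments (illustrative only, not
copied from dhcp6_lexer.ll):

@code
%x COMMENT
%%
"/*"             BEGIN(COMMENT);  /* comment starts: switch state */
<COMMENT>"*/"    BEGIN(INITIAL);  /* comment ends: back to default */
<COMMENT>.|\n    ;                /* ignore everything in between   */
@endcode
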

The next addition is parser contexts. Let's assume we have a parser that uses an
"ip-address" regexp that would return an IP_ADDRESS token. Whenever we want to
allow "ip-address", the grammar allows the IP_ADDRESS token to appear. When the
lexer is called, it will match the regexp, will generate the IP_ADDRESS token and
the parser will carry out its duty. This works fine as long as you have a very
specific grammar that defines everything. Sadly, that's not the case in DHCP as
we have hooks. Hook libraries can have parameters that are defined by third
@@ -193,7 +198,7 @@ in src/bin/dhcp6/dhcp6_parser.yy. Here's a simplified excerpt of it:
dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;

// This defines all parameters that may appear in the Dhcp6 object.
// It can either contain a global_param (defined below) or a
// global_params list, followed by a comma, followed by a global_param.
// Note this definition is recursive and can expand to a single
// instance of global_param or multiple instances separated by commas.
@@ -201,7 +206,7 @@ dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;
global_params: global_param
             | global_params COMMA global_param
             ;

// These are the parameters that are allowed in the top-level for
// Dhcp6.
global_param: preferred_lifetime
@@ -222,9 +227,9 @@ global_param: preferred_lifetime
            | server_id
            | dhcp4o6_port
            ;

renew_timer: RENEW_TIMER COLON INTEGER;

// Many other definitions follow.
@endcode
@@ -244,7 +249,7 @@ rule.

The "leaf" rules that don't contain any other rules must be defined by a
series of tokens. An example of such a rule is renew_timer above. It is defined
as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER.

Speaking of integers, it is worth noting that some tokens can have values. Those
values are defined using the %%token clause. For example, dhcp6_parser.yy has the
@@ -272,7 +277,7 @@ renew_timer with some extra code:

@code
renew_timer: RENEW_TIMER {
    cout << "renew-timer token detected, so far so good" << endl;
} COLON {
    cout << "colon detected!" << endl;
} INTEGER {
    uint32_t timer = $3;
@@ -298,11 +303,11 @@ ncr_protocol: NCR_PROTOCOL {
    ctx.enter(ctx.NCR_PROTOCOL); (1)
} COLON ncr_protocol_value {
    ctx.stack_.back()->set("ncr-protocol", $4); (3)
    ctx.leave(); (4)
};

ncr_protocol_value:
    UDP { $$ = ElementPtr(new StringElement("UDP", ctx.loc2pos(@1))); }
  | TCP { $$ = ElementPtr(new StringElement("TCP", ctx.loc2pos(@1))); } (2)
  ;
@endcode
@@ -358,8 +363,8 @@ The first line creates an instance of IntElement with a value of the token. The
second line adds it to the current map (current = the last on the stack). This
approach has a very nice property of being generic. This rule can be referenced
from global and subnet scope (and possibly other scopes as well) and the code
will add the IntElement object to whatever is last on the stack, be it global,
subnet or perhaps even something else (maybe one day we will allow preferred
lifetime to be defined on a per pool or per host basis?).
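
The two lines being described presumably look like this (a reconstructed
sketch, not a verbatim quote from dhcp6_parser.yy):

@code
preferred_lifetime: PREFERRED_LIFETIME COLON INTEGER {
    // Line 1: wrap the token's value in an IntElement.
    ElementPtr prf(new IntElement($3, ctx.loc2pos(@3)));
    // Line 2: add it to whatever map is currently last on the stack.
    ctx.stack_.back()->set("preferred-lifetime", prf);
};
@endcode
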
@section parserSubgrammar Parsing partial grammar
@@ -385,6 +390,9 @@ This trick is also implemented in the lexer. There's a flag called start_token_f
When initially set to true, it will cause the lexer to emit an artificial
token once, before parsing any input whatsoever.
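
The trick can be sketched as follows (simplified; in the real dhcp6_lexer.ll
the flag also selects which artificial token to emit, and the token name used
here is an assumption):

@code
%{
    if (start_token_flag) {
        start_token_flag = false;
        // Emitted exactly once, before any real input is scanned, so the
        // grammar can dispatch to the desired start rule.
        return isc::dhcp::Dhcp6Parser::make_TOPLEVEL_DHCP6(driver.loc_);
    }
%}
@endcode
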

This optional feature can be skipped altogether if you don't plan to parse parts
of the configuration.

@section parserBisonExtend Extending grammar

Adding new parameters to existing parsers is very easy once you get hold of the
@@ -402,7 +410,7 @@ Here's the complete set of necessary changes.

@code
SUBNET_4O6_INTERFACE_ID "4o6-interface-id"
@endcode

This defines a token called SUBNET_4O6_INTERFACE_ID that, when needed to
be printed, will be represented as "4o6-interface-id".

2. Tell the lexer how to recognize the new parameter:
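
Such a lexer rule might look like this (a sketch modeled on the existing rules;
the make_* helpers are generated by bison's C++ API and their exact form here
is an assumption):

@code
\"4o6-interface-id\" {
    switch (driver.ctx_) {
    case isc::dhcp::Parser4Context::SUBNET4:
        // In Subnet4 scope the string is our new keyword...
        return isc::dhcp::Dhcp4Parser::make_SUBNET_4O6_INTERFACE_ID(driver.loc_);
    default:
        // ...elsewhere it stays a plain string.
        return isc::dhcp::Dhcp4Parser::make_STRING("4o6-interface-id", driver.loc_);
    }
}
@endcode
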
@@ -439,7 +447,7 @@ Here's the complete set of necessary changes.
weird that happens to match our reserved keywords. Therefore we switch to the
no-keyword context. This tells the lexer to interpret everything as a string,
integer or float.

4. Finally, extend the existing subnet4_param that defines all allowed parameters
in Subnet4 scope to also cover our new parameter (the new line marked with *):

@code