Add support for MLDv1 and MLDv2. The behavior is not that different from
IGMP. Packets to all-hosts address and queries are always flooded,
reports go to routers, routers are added when a query is observed, and
all MLD packets go through slow path.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Ben Pfaff <blp@nicira.com>
[blp@nicira.com moved an assignment out of an 'if' statement]
Signed-off-by: Ben Pfaff <blp@nicira.com>
The current support for Geneve in OVS is exactly equivalent to VXLAN:
it is possible to set and match on the VNI but not on any options
contained in the header. This patch enables the use of options.
The goal for Geneve support is not to add support for any particular option
but to allow end users or controllers to specify what they would like to
match. That is, the full range of Geneve's capabilities should be exposed
without modifying the code (the one exception being options that require
per-packet computation in the fast path).
The main issue with supporting Geneve options is how to integrate the
fields into the existing OpenFlow pipeline. All existing operations
are referred to by their NXM/OXM field name - matches, action generation,
arithmetic operations (i.e. tranfer to a register). However, the Geneve
option space is exactly the same as the OXM space, so a direct mapping
is not feasible. Instead, we create a pool of 64 NXMs that are then
dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
these fields become first-class citizens in the OpenFlow pipeline.
An example of how to use Geneve options:
ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
ovs-ofctl add-flow br0 in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1
This will add a 4 bytes option (filled will all 1's) to all packets
coming from the LOCAL port and then send then out to port 1.
A limitation of this patch is that although the option table is specified
for a particular switch over OpenFlow, it is currently global to all
switches. This will be addressed in a future patch.
Based on work originally done by Madhu Challa. Ben Pfaff also significantly
improved the comments.
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We have a special flow_metadata structure to represent the parts
of a packet that aren't carried in the payload itself. This is
used in the case where we need to send the packet as a Packet In
to an OpenFlow controller. This is a subset of the more general
struct flow.
In practice, almost all operations we do on this structure involve
converting it to or from a match or have code that is the same as
a match. Serialization to NXM and back is done as a match. There
is special flow_metadata formatting code that is almost identical
to match formatting.
The uses for struct flow_metadata aren't performance critical
when it comes to memory, so we can save quite a bit of code by
just using a match structure directly instead. In addition, as
metadata increases and becomes more complex (Geneve options require
some special handling beyond just additional fields), using the
match structure means we only have to do this work in one place.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently dp-packet make use of ofpbuf for managing packet
buffers. That complicates ofpbuf, by making dp-packet
independent of ofpbuf both libraries can be optimized for
their own use case.
This avoids mapping operation between ofpbuf and dp_packet
in datapath upcalls.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Introduces two new NXMs to represent VXLAN-GBP [0] fields.
actions=load:0x10->NXM_NX_TUN_GBP_ID[],NORMAL
tun_gbp_id=0x10,actions=drop
This enables existing VXLAN tunnels to carry security label
information such as a SELinux context to other network peers.
The values are carried to/from the datapath using the attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS.
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy-00
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
A "conjunctive match" allows higher-level matches in the flow table, such
as set membership matches, without causing a cross-product explosion for
multidimensional matches. Please refer to the documentation that this
commit adds to ovs-ofctl(8) for a better explanation, including an example.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
So far the compressed flow data in struct miniflow has been in 32-bit
words with a 63-bit map, allowing for a maximum size of struct flow of
252 bytes. With the forthcoming Geneve options this is not sufficient
any more.
This patch solves the problem by changing the miniflow data to 64-bit
words, doubling the flow max size to 504 bytes. Since the word size
is doubled, there is some loss in compression efficiency. To counter
this some of the flow fields have been reordered to keep related
fields together (e.g., the source and destination IP addresses share
the same 64-bit word).
This change should speed up flow data processing on 64-bit CPUs, which
may help counterbalance the impact of making the struct flow bigger in
the future.
Classifier lookup stage boundaries are also changed to 64-bit
alignment, as the current algorithm depends on each miniflow word to
not be split between ranges. This has resulted in new padding (part
of the 'mpls_lse' field).
The 'dp_hash' field is also moved to packet metadata to eliminate
otherwise needed padding there. This allows the L4 to fit into one
64-bit word, and also makes matches on 'dp_hash' more efficient as
misses can be found already on stage 1.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Use MAP_FOR_EACH_INDEX to improve readability when there is no
apparent cost for it.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
minimask_equal() and minimask_has_extra() can take benefit from the
fact that minimasks have no zero data.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
miniflow_get__() can be used when it is known that the miniflow stores
a value at the given index.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This field allows a flow table to match on the output port currently in the
action set.
ONF-JIRA: EXT-233
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
ETH_ADDR_LEN is defined in lib/packets.h, valued 6.
Use this macro instead of magic number 6 to represent the length
of eth mac address.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Megaflow inserts and removals are simplified:
- No need for classifier internal mutex, as dpif-netdev already has a
'flow_mutex'.
- Number of memory allocations/frees can be halved.
- Lookup code path can rely on netdev_flow_key always having inline data.
This will also be easier to simplify further when moving to per-thread
megaflow classifiers in the future.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
It seemed awkward to have declarations outside the for loop.
This may also be a little faster because it avoids some calls to
count_1bits(). The idea for that change is due to Jarno Rajahalme
<jrajahalme@nicira.com>.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Add function flow_wildcards_init_for_packet() that can be used to set
sensible wildcards when megaflows are disabled. Before this, we set
all the mask bits to ones, which caused printing tunnel, mpls, and/or
transport port fields even for packets for which it makes no sense.
This has the side effect of generating different megaflow masks for
different packet types, so there will be more than one kind of mask in
the datapath classifier. This should not make practical difference,
as megaflows should not be disabled when performance is important.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The patch contains the necessary modifications to compile and also to run
under MSVC.
Added the files to the build system and also changed dpif_linux to be under
a more generic name dpif_windows.
Added a TODO under the windows part in case we want to implement another
counterpart for epoll functions.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use the "+-" syntax more uniformly when printing masked flags, and use
the syntax of delimited 1-flags also for formatting fully masked TCP
flags.
The "+-" syntax only deals with masked flags, but if there are many of
those, the printout becomes long and confusing. Typically there are
many flags only when flags are fully masked, but even then most of
them are zeros, so it makes sense to print the flags that are set
(ones) and omit the zero flags.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Using different types for the two bitfields did not work on MSVC, so
reverting back to "64-bit bool" :-)
Reported-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Miniflows can nowadays be dynamically allocated to different inline
sizes, as done by lib/classifier.c, but this had not been documented
at the struct miniflow definition.
Also, MINI_N_INLINE had a different value for 32-bit and 64-bit builds
due to a historical reason. Now we use 8 for both.
Finally, use change the storage type of 'values_inline' to uint8_t, as
uint64_t looks kind of wide for a boolean, even though we intend the
bit be carved out from the uint64_t where 'map' resides.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.
EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
When, during a classifier lookup, we narrow down to a single potential
rule, it is enough to match on ("unwildcard") one bit that differs
between the packet and the rule.
This is a special case of the more general algorithm, where it is
sufficient to match on enough bits that separates the packet from all
higher priority rules than the matched rule. For a miss that would be
all the rules. Implementing this is expensive for a more than a few
rules. This patch starts by doing this for a single rule when we
already have it, also reducing the lookup cost by finishing the lookup
earlier than before.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
C99 declarations within code are allowed now. Change the
FLOW_FOR_EACH_IN_MAP to use loop variable within the for statement.
This makes this macro more generally useful.
The loop variable name is suffixed with two underscores with the
intention that there would be a low likelihood of collision with any
of the macro parameters.
Also fix the return type of flow_get_next_in_map().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Change the classifier to allocate variable sized miniflows and
minimasks in cls_match and cls_subtable, respectively. Do not
duplicate the mask in cls_rule any more.
miniflow_clone and miniflow_move can now take variably sized miniflows
as source. The destination is assumed to be regularly sized miniflow.
Inlining miniflow and mask values reduces memory indirection and helps
reduce cache misses.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This allows use of miniflows that have all of their values inline.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We only need to iterate over the bits masked by the 'b' in
minimask_has_extra(), since for zeroes in 'b' there can be no 'extra'
wildcards in 'a', as 'b' has already wildcarded all the bits.
minimask_is_catchall() can be simplified by the invariant that mask's
map never has 1-bits for all-zero values.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This helps about 1% in TCP_CRR performance test. However, this also
helps by clearly showing the classifier_lookup() cost in perf reports
as one item.
This also cleans up the flow/match APIs from functionality only used
by the classifier, making is more straightforward to evolve them
later.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Add new macro MINIFLOW_MAP(FIELD) that returns the map covering the
given struct flow field.
Change the miniflow accessors to macros so that they can take the
field name directly.
Use these to add ipv6 support to miniflow_hash_5tuple().
Add ipv6 support to flow_hash_5tuple() as well so that these two
functions continue to return the same hash value for the corresponding
flows.
Also, simplify miniflow_get_metadata().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Apart from STP, EVB extension of LLDP as well as IEEE 802.1QBG use the
Nearest Customer Bridge (NCB) DMAC which has a value of 0180.c200.0000.
STP can be distinguished by Ethertype from these protocols.
Signed-off-by: Padmanabhan Krishnan <kprad1@yahoo.com>
[blp@nicira.com rewrote the details of the patch]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Padmanabhan Krishnan <kprad1@yahoo.com>
Use miniflow as a flow key in the userspace datapath classifier. The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Support struct miniflow as a key for datapath flow lookup.
The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow prioritites. This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these. These will be used in a
later patch.
Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies. Similarly, some inline
functions using struct flow are now in lib/flow.h. IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Since the dp_hash will often be a hash of the 5 tuple, it makes sense
to put it with the L4 header so it hits in the last classifier lookup
stage.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION. So, both misses will be put to the same
queue. The decision is based on the fact that a lot has changed
since the age when flow setup rate is most treasured and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thusly the current implementation will be put in testing and
investigation for better balancing solution will continue if
there is an issue.
Also note, the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from same source but with different
type/code to different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
We allow zero 'values' in a miniflow for it to have the same map
as the corresponding minimask. Minimasks themselves never have
zero data values, though. Document this and optimize the code
accordingly.
v2:
- Made miniflow_get_map_in_range() to return data offset instead of
a pointer via the last parameter.
- Simplified minimatch_hash_in_range() by removing pointer arithmetic.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch adds a new function flow_unildcard_tp_ports() which doesn't
unwildcard the upper half of tp_src and tp_dst with ICMP packets.
Unfortunately, this matters in future patches when we compare masks
carefully to determine if flows should be evicted from the datapath.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit bcd2633a5be(ofproto-dpif: Store relevant fields for wildcarding
in facet) implements flow revalidation by comparing the newly looked
up flow mask with that of the existing facet.
The non-packet fields, such as register masks, are always cleared
by xlate_actions in the masks stored within facets, but they are not
cleared in the newly looked up flow masks, causing otherwise valid
flows to be declared as invalid flows and be removed from the datapath.
This patch provides a fix. I was able to verify the fix on a system set
up by Ying Chen where the bug can be reproduced.
Bug #21680
Reported by: Ying Chen <yingchen@vmware.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This patch gets rid of the need for having explicit padding in struct
flow as new fields are being added. flow_wildcards_init_exact(), which
used to set bits in both compiler generated and explicit padding, is
removed. match_wc_init() is now used instead, which generates the mask
based on a given flow, setting bits only in fields which make sense.
Places where random bits were placed in struct flow have been changed to
only set random bits on fields that are significant in the given context.
This avoids setting padding bits.
- lib/flow:
- Properly initialize struct flow also in places we used to zero out
padding before.
- Add flow_random_hash_fields() used for testing.
- Remove flow_wildcards_init_exact() to avoid initializing
masks where compiler generated padding has bits set.
- lib/match.c match_wc_init(): Wildcard transport layer fields for later
fragments, remove match_init_exact(), which used
flow_wildcards_init_exact().
- tests/test-flows.c: use match_wc_init() instead of match_init_exact()
- tests/flowgen.pl: generate more accurate packets and flows when
fragmenting, mark unavailable fields as wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>