Miniflows can nowadays be dynamically allocated to different inline
sizes, as done by lib/classifier.c, but this had not been documented
at the struct miniflow definition.
Also, MINI_N_INLINE had a different value for 32-bit and 64-bit builds
due to a historical reason. Now we use 8 for both.
Finally, use change the storage type of 'values_inline' to uint8_t, as
uint64_t looks kind of wide for a boolean, even though we intend the
bit be carved out from the uint64_t where 'map' resides.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
This reverts commit 29d2aa3aa74ab97df0f00af5bddaa12485c1d39a.
Turns out the code was correct but the dynamic nature of miniflow
inline storage was poorly documented. A following patch fixes the
documentation.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
The branch is unused as size < sizeof dst->inline_values must
always be true for inlined values. Hitting the branch would lead
to corruption as inline_values is accessed out of bounds.
Remove branch and add assertion.
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.
EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Use generic names hash_add() and hash_finish() instead of mhash_*
equivalents. This makes future changes to hash implentations more
localized.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
flow_compose_l4() can cause 'b' to be reallocated, thus the network header
pointer needs to be refreshed afterward.
Found by valgrind in the IPv6 case. I updated the IPv4 case too just in
case, and for consistency.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This patch addresses two bugs related to ICMPv6(NDP) packets:
- In miniflow_extract() push the words in the correct order
- In parse_icmpv6() use sizeof struct, not size of struct *
match_wc_init() has been modified, to include the nd_target field
when the transport layer protocol is ICMPv6
A testcase has been added to detect the bugs.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This commit replace a while loop in miniflow_equal() with a call to
memcmp() for performace reasons.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Change the classifier to allocate variable sized miniflows and
minimasks in cls_match and cls_subtable, respectively. Do not
duplicate the mask in cls_rule any more.
miniflow_clone and miniflow_move can now take variably sized miniflows
as source. The destination is assumed to be regularly sized miniflow.
Inlining miniflow and mask values reduces memory indirection and helps
reduce cache misses.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This allows use of miniflows that have all of their values inline.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We only need to iterate over the bits masked by the 'b' in
minimask_has_extra(), since for zeroes in 'b' there can be no 'extra'
wildcards in 'a', as 'b' has already wildcarded all the bits.
minimask_is_catchall() can be simplified by the invariant that mask's
map never has 1-bits for all-zero values.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This helps about 1% in TCP_CRR performance test. However, this also
helps by clearly showing the classifier_lookup() cost in perf reports
as one item.
This also cleans up the flow/match APIs from functionality only used
by the classifier, making is more straightforward to evolve them
later.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Add new macro MINIFLOW_MAP(FIELD) that returns the map covering the
given struct flow field.
Change the miniflow accessors to macros so that they can take the
field name directly.
Use these to add ipv6 support to miniflow_hash_5tuple().
Add ipv6 support to flow_hash_5tuple() as well so that these two
functions continue to return the same hash value for the corresponding
flows.
Also, simplify miniflow_get_metadata().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Use miniflow as a flow key in the userspace datapath classifier. The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Support struct miniflow as a key for datapath flow lookup.
The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow prioritites. This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Upcoming patches add classifier lookups using miniflows, this is
heavily used for it.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these. These will be used in a
later patch.
Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies. Similarly, some inline
functions using struct flow are now in lib/flow.h. IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow. This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.
miniflow_extract() takes a miniflow as an input/output parameter. On
input the buffer for values to be extracted must be properly
initialized. On output the map contains ones for all the fields that
have been extracted.
Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.
Some explicit "inline" keywords are necessary for GCC to optimize this
properly. Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Reviewed-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Infrastructure to enable megaflow support for bond ports using
recirculation. This patch adds the following features:
* Generate RECIRC action when bond can benefit from recirculation.
* Populate post recirculation rules in a hidden table. Currently table 254.
* Uses post recirculation rules for bond rebalancing
* A recirculation implementation in dpif-netdev.
The goal of this patch is to be able to megaflow bond outputs and
thus greatly improve performance. However, this patch does not
actually improve the megaflow generation. It is left for a later commit.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Rename 'l2' to 'frame' and add new ofpbuf_set_frame() and ofpbuf_l2().
ofpbuf_set_frame() alse resets all the layer offsets. ofpbuf_l2()
returns NULL if the packet has no Ethernet header, as indicated either
by unset l3 offset or NULL frame pointer. Callers of ofpbuf_l2() are
supposed to check the return value, unless they can otherwise be sure
that the packet has a valid Ethernet header.
The recent commit 437d0d22 made some assumptions that were not valid
regarding the use of the 'l2' pointer in rconn module and by
compose_rarp(). This is now fixed as follows: rconn now relies on the
fact that once OpenFlow messages are given to rconn for transport, the
frame pointer is no longer needed to refer to the OpenFlow header; and
compose_rarp() now sets the frame pointer and offsets as expected.
In addition to storing network frames, ofpbufs are also used for
handling OpenFlow messages and action lists. lib/ofpbuf.h now has a
comment documenting the current usage conventions and invariants.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch shrinks the struct ofpbuf from 104 to 48 bytes on 64-bit
systems, or from 52 to 36 bytes on 32-bit systems (counting in the
'l7' removal from an earlier patch). This may help contribute to
cache efficiency, and will speed up initializing, copying and
manipulating ofpbufs. This is potentially important for the DPDK
datapath, but the rest of the code base may also see a little benefit.
Changes are:
- Remove 'l7' pointer (previous patch).
- Use offsets instead of layer pointers for l2_5, l3, and l4 using
'l2' as basis. Usually 'data' is the same as 'l2', but this is not
always the case (e.g., when parsing or constructing a packet), so it
can not be easily used as the offset basis. Also, packet parsing is
faster if we do not need to maintain the offsets each time we pull
data from the ofpbuf.
- Use uint32_t for 'allocated' and 'size', as 2^32 is enough even for
largest possible messages/packets.
- Use packed enum for 'source'.
- Rearrange to avoid unnecessary padding.
- Remove 'private_p', which was used only in two cases, both of which
had the invariant ('l2' == 'data'), so we can temporarily use 'l2'
as a private pointer.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Now that we don't need to parse TCP flags from the packet after
extraction, we usually do not need the 'l7' pointer any more. When
needed, ofpbuf_get_tcp|udp|sctp|icmp_payload() or ofpbuf_get_l4_size()
can be used instead.
Removal of 'l7' was requested by Pravin for the DPDK datapath work, as
it simplifies packet parsing a bit.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
First part of the hash was discarded as basis was used too late.
Also be explicit about the input type expected by mhash_add().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
threads read upcall.
This commit implements the API functions to allow multiple handler
threads read upcall.
Also, this commit removes the handling priority of DPIF_UC_MISS
over DPIF_UC_ACTION. So, both misses will be put to the same
queue. The decision is based on the fact that a lot has changed
since the age when flow setup rate is most treasured and starving
all actions in the presence of any flow misses doesn't seem like
a sound balancing solution.
Thusly the current implementation will be put in testing and
investigation for better balancing solution will continue if
there is an issue.
Also note, the introduction and use of flow_hash_5tuple() will
put missed ICMP packets from same source but with different
type/code to different handler queues. This may cause reordering
of these packets. For now, we do not count this as a problem.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We used to map ODPP_NONE to port number 0, which is wrong, as
ODPP_NONE is a valid value of the flow's in_port.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Change the flow_extract() API to accept struct pkt_metadata,
instead of individual metadata fields. It will make the API more
logical and easier to maintain when we need to expand metadata
down the road.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>¬
OpenFlow 1.3.3 spec (and earlier) specify that the default value for an
MPLS label should be copied from the outer header.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This commit makes the userspace support for MPLS more complete. Now
up to 3 labels are supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Co-authored-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
This is in preparation for pushing vlan tags
using the TPID provided by the kernel via auxdata.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We allow zero 'values' in a miniflow for it to have the same map
as the corresponding minimask. Minimasks themselves never have
zero data values, though. Document this and optimize the code
accordingly.
v2:
- Made miniflow_get_map_in_range() to return data offset instead of
a pointer via the last parameter.
- Simplified minimatch_hash_in_range() by removing pointer arithmetic.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This allows other libraries to use util.h that has already
defined NOT_REACHED.
Signed-off-by: Harold Lim <haroldl@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
This patch adds a new function flow_unildcard_tp_ports() which doesn't
unwildcard the upper half of tp_src and tp_dst with ICMP packets.
Unfortunately, this matters in future patches when we compare masks
carefully to determine if flows should be evicted from the datapath.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit bcd2633a5be(ofproto-dpif: Store relevant fields for wildcarding
in facet) implements flow revalidation by comparing the newly looked
up flow mask with that of the existing facet.
The non-packet fields, such as register masks, are always cleared
by xlate_actions in the masks stored within facets, but they are not
cleared in the newly looked up flow masks, causing otherwise valid
flows to be declared as invalid flows and be removed from the datapath.
This patch provides a fix. I was able to verify the fix on a system set
up by Ying Chen where the bug can be reproduced.
Bug #21680
Reported by: Ying Chen <yingchen@vmware.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Add a prefix tree (trie) structure for tracking the used address
space, enabling skipping classifier tables containing longer masks
than necessary for an address field value in a packet header being
classified. This enables less unwildcarding for datapath flows in
parts of the address space without host routes.
Trie lookup is interwoven to the staged lookup, so that a trie is
searched only when the configured trie field becomes relevant
for the lookup. The trie lookup results are retained so that each
trie is checked at most once for each classifier lookup.
This implementation tracks the number of rules at each address prefix
for the whole classifier. More aggressive table skipping would be
possible by maintaining lists of tables that have prefixes at the
lengths encountered on tree traversal, or by maintaining separate
tries for subsets of rules separated by metadata fields.
Prefix tracking is configured via OVSDB. A new column "prefixes" is
added to the database table "Flow_Table". "prefixes" is a set of
string values listing the field names for which prefix lookup should
be used.
As of now, the fields for which prefix lookup can be enabled are:
- tun_id, tun_src, tun_dst
- nw_src, nw_dst (or aliases ip_src and ip_dst)
- ipv6_src, ipv6_dst
There is a maximum number of fields that can be enabled for any one
flow table. Currently this limit is 3.
Examples:
ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
--id=@N1 create Flow_Table name=table0
ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
--id=@N1 create Flow_Table name=table1
ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
ovs-vsctl set Flow_Table table1 prefixes=[]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Add the missing code for generating IPv6 packets for testing purposes.
Also make flow_compose() set the l4 and l7 pointers more consistently
with flow_extract().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Allow TCP flags match specification with symbolic flag names. TCP
flags are optionally specified as a string of flag names, each
preceded by '+' when the flag must be one, or '-' when the flag must
be zero. Any flags not explicitly included are wildcarded. The
existing hex syntax is still allowed, and is used in flow dumps when
all the flags are matched.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Subtable lookup is performed in ranges defined for struct flow,
starting from metadata (registers, in_port, etc.), then L2 header, L3,
and finally L4 ports. Whenever it is found that there are no matches
in the current subtable, the rest of the subtable can be skipped. The
rationale of this logic is that as many fields as possible can remain
wildcarded.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
This avoids a conflict with NetBSD's strings.h/libc.
(http://netbsd.gw.com/cgi-bin/man-cgi?popcount++NetBSD-current)
The new name is suggested by Ben Pfaff.
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@gmail.com>
Having a single function that can do popcount() on any integer type is
easier for callers to get right. The implementation is probably slower
if the caller actually provides a 32-bit (or shorter) integer, but the
only existing callers always provide a full 64-bit integer so this seems
unimportant for now.
This also restores use, in practice, of the optimized implementation of
population count. (As the comment on popcount32() says, this version is
2x faster than __builtin_popcount().)
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Having a single function that can do raw_ctz() on any integer type is
easier for callers to get right, and there is no real downside in the
implementation.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
flow_extract() fills in ->l7 but flow_compose() wasn't doing it, which
confused bfd_process_packet() when invoked via the ofproto/trace appctl
command.
Signed-off-by: Ben Pfaff <blp@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>