This patch adds support for specifying a "helper" or ALG to assist
connection tracking for protocols that consist of multiple streams.
Initially, only support for FTP is included.
Below is an example set of flows to allow FTP control connections from
port 1->2 to establish active data connections in the reverse direction:
table=0,priority=1,action=drop
table=0,arp,action=normal
table=0,in_port=1,tcp,action=ct(alg=ftp,commit),2
table=0,in_port=2,tcp,ct_state=-trk,action=ct(table=1)
table=1,in_port=2,tcp,ct_state=+trk+est,action=1
table=1,in_port=2,tcp,ct_state=+trk+rel,action=ct(commit),1
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Packets are still sampled at ingress only, so the egress
tunnel and/or MPLS structures are only included when there is just 1 output
port. The actions are either provided by the datapath in the sample upcall
or looked up in the userspace cache. The former is preferred because it is
more reliable and does not present any new demands or constraints on the
userspace cache, however the code falls back on the userspace lookup so that
this solution can work with existing kernel datapath modules. If the lookup
fails it is not critical: the compiled user-action-cookie is still available
and provides the essential output port and output VLAN forwarding information
just as before.
The openvswitch actions can express almost any tunneling/mangling so the only
totally faithful representation would be to somehow encode the whole list of
flow actions in the sFlow output. However the standard sFlow tunnel structures
can express most common real-world scenarios, so in parsing the actions we
look for those and skip the encoding if we see anything unusual. For example,
a single set(tunnel()) or tnl_push() is interpreted, but if a second such
action is encountered then the egress tunnel reporting is suppressed.
The sFlow standard allows "best effort" encoding so that if a field is not
knowable or too onerous to look up then it can be left out. This is often
the case for the layer-4 source port or even the src ip address of a tunnel.
The assumption is that monitoring is enabled everywhere so a missing field
can typically be seen at ingress to the next switch in the path.
This patch also adds unit tests to check the sFlow encoding of set(tunnel()),
tnl_push() and push_mpls() actions.
The netlink attribute to request that actions be included in the upcall
from the datapath is inserted for sFlow sampling only. To make that option
be explicit would require further changes to the printing and parsing of
actions in lib/odp-util.c, and to scripts in the test suite.
Further enhancements to report on 802.1AD QinQ, 64-bit tunnel IDs, and NAT
transformations can follow in future patches that make only incremental
changes.
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
[blp@nicira.com made stylistic and semantic changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Use IPv6 internally for storing multicast addresses. IPv4 addresses are
translated to their IPv4-mapped equivalent.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Ben Pfaff <blp@nicira.com>
[blp@nicira.com added a "sparse" implementation of IN6_IS_ADDR_V4MAPPED.]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Sparse doesn't like several of the DPDK header files. This patch
works around it so we can get analysis when compiling DPDK.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Extend IPFIX exporter to export tunnel headers when both input and output
of the port.
Add three other_config options in IPFIX table: enable-input-sampling,
enable-output-sampling and enable-tunnel-sampling, to control whether
sampling tunnel info, on which direction (input or output).
Insert sampling action before output action and the output tunnel port
is sent to datapath in the sampling action.
Make datapath collect output tunnel info and send it back to userpace
in upcall message with a new additional optional attribute.
Add a tunnel ports map to make the tunnel port lookup faster in sampling
upcalls in IPFIX exporter. Make the IPFIX exporter generate IPFIX template
sets with enterprise elements for the tunnel info, save the tunnel info
in IPFIX cache entries, and send IPFIX DATA with tunnel info.
Add flowDirection element in IPFIX templates.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Commit 73c85181d (netdev-linux: Read packet auxdata to obtain vlan_tid)
added #include <linux/if_packet.h> to this file, to get the definition
of PACKET_AUXDATA and some other definitions, but on RHEL 6.1 this
provoked compiler errors:
In file included from /usr/include/linux/rtnetlink.h:5,
from lib/netdev-linux.c:34:
/usr/include/linux/netlink.h:34: error: expected specifier-qualifier-list
before 'sa_family_t'
Since the old #includes worked everywhere, and this file already defined
its own versions of most of the new macros that it needed, this commit just
reverts the old #includes and adds the one macro definition it didn't
already have.
(RHEL 6.1 isn't necessarily the only platform where this is a problem, but
it's the first one for which we noticed the problem.)
This switches the definition of sockaddr_ll used from the Linux one, which
uses __be16 for sll_protocol, to the glibc one, which uses plain "unsigned
short int". This makes sparse complain (rightly), so this commit also
adds a sparse-specific header that uses ovs_be16 to prevent the warning.
Signed-off-by: Ben Pfaff <blp@nicira.com>
If VLAN acceleration is used when the kernel receives a packet
then the outer-most VLAN tag will not be present in the packet
when it is received by netdev-linux. Rather, it will be present
in auxdata.
This patch uses recvmsg() instead of recv() to read auxdata for
each packet and if the vlan_tid is set then it is added to the packet.
Adding the vlan_tid makes use of headroom available
in the buffer parameter of rx_recv.
Signed-off-by: Simon Horman <horms@verge.net.au>
Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Include stddef.h in include/sparse/sys/socket.h to ensure
that NULL is defined and thus avoid the following sparse warning.
./include/sparse/sys/socket.h:74:15: error: undefined identifier 'NULL'
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
The Clang support for thread-safety annotations is much more effective
than "sparse" support. I found that I was unable to make the annotations
warning-free under sparse.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This commit adds annotations for thread safety check. And the
check can be conducted by using -Wthread-safety flag in clang.
Co-authored-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
With the latest version of sparse, the ATOMIC_VAR_INIT macro
generates the following warning. This patch suppresses it.
warning: Using plain integer as NULL pointer
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The only tricky part here is that I'm throwing in annotations to allow
"sparse" to report unbalanced locking.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Commit 796223f5 (netdev: Add new "struct netdev_rx" for capturing packets
from a netdev) refactored send and receive into separate netdevs. As a
result, send and receive now use different socket descriptors (except for tap
interfaces which are treated specially). An unintended side effect was that
all sent packets are looped back and received, which had previously been
avoided as the kernel specifically prevents this from happening on a single
socket descriptor.
To resolve the situation, a socket filter is added to the receive socket
so that it only accepts inbound packets.
Simon Horman co-discovered and initially reported this issue.
Signed-off-by: Murphy McCauley <murphy.mccauley@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Tested-by: Simon Horman <horms@verge.net.au>
Reviewed-by: Simon Horman <horms@verge.net.au>
These will be used in upcoming commits.
This commit also adds corresponding definitions to the "sparse" header,
so that sparse still works.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The changes allow the user to specify a separate dscp value for the
controller connection and the manager connection. The value will take
effect on resetting the connections. If no value is specified a default
value of 192 is chosen for each of the connections.
Feature #10074
Requested-by: Rajiv Ramanathan <rramanathan@nicira.com>
Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Open vSwitch userspace can set up flows at a high rate, but it is somewhat
"bursty" in opportunities to set up flows, by which I mean that OVS sets up
a batch of flows, then goes off and does some other work for a while, then
sets up another batch of flows, and so on. The result is that, if a large
number of packets that need flow setups come in all at once, then some of
them can overflow the relatively small kernel-to-user buffers.
This commit increases the kernel-to-user buffers from the default of
approximately 120 kB each to 1 MB each. In one somewhat synthetic test
case that I ran based on an "hping3" that generated a load of about 20,000
new flows per second (including both requests and replies), this reduced
the packets dropped at the kernel-to-user interface from about 30% to none.
I expect that it will similarly improve packet loss in workloads where
flow arrival is not easily predictable.
(This has little effect on workloads generated by "ovs-benchmark rate"
because that benchmark is effectively "self-clocking", that is, a new flow
is triggered only by a reply to a request made earlier, which means that
the number of buffered packets at any given has a known, constant upper
limit.)
Bug #10210.
Signed-off-by: Ben Pfaff <blp@nicira.com>
With this commit, the tree compiles clean with sparse commit 87f4a7fda3d
"Teach 'already_tokenized()' to use the stream name hash table" with patch
"evaluate: Allow sizeof(_Bool) to succeed" available at
http://permalink.gmane.org/gmane.comp.parsers.sparse/2461 applied, as long
as the "include/sparse" directory is included for use by sparse (only),
e.g.:
make CC="CHECK='sparse -I../include/sparse' cgcc"