We use OvsFindVportByPortIdAndNicIndex() to lookup the vport for a
packte received from the Hyper-V switch. If a packet was indeed received
from the virtual external NIC, we should flag it.
Validation:
1. Install and Uninstall the OVS EXT Driver (without enabling the OVS
extension on the Hyper-V switch).
2. Install and Uninstall the OVS EXT Driver (with enabling the OVS
extension on the Hyper-V switch). Hyper-V switch had a few ports.
3. Install and Uninstall the OVS EXT Driver (with enabling the OVS
extension on the Hyper-V switch). Added a few ports before
uninstalling.
4. Install the OVS EXT driver, and test the following functionality:
a) ping between 2 VMs on the same host
b) ping between 2 VMs on 2 Hyper-Vs - one physical and another
virtual backed by VLAN (patch port between br-pif and br-int).
c) ping between 2 VMs on 2 Hyper-Vs - one physical and another
virtual backed by VXLAN.
d) Successful uninstallation after these tests.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In this patch, we make the following updates to the vport add code:
1. Clarify the roles of the different hash tables, so it is easier to
read/write code in the future.
2. Update OvsNewVportCmdHandler() to support adding bridge-internal
ports.
3. Fixes in OvsNewVportCmdHandler() to support adding external port.
Earlier, we'd hit ASSERTs.
4. I could not figure out way to add a port of type
OVS_PORT_TYPE_INTERNAL with name "internal" to the confdb using
ovs-vsctl.exe. And, this is needed in order to add the Hyper-V
internal port from userspace. To workaround this problem, we treat a
port of type OVS_PORT_TYPE_NETDEV with name "internal" as a request
to add the Hyper-V internal port. This is a workaround. The end
result is that there's a discrepancy between the port type in the
datpaath v/s confdb, but this has not created any trouble in testing
so far. If this ends up becoming an issue, we can mark the Hyper-V
internal port to be of type OVS_PORT_TYPE_NETDEV. No harm.
5. Because of changes indicated in #1, we also update the vport dump
code to look at the correct hash table for ports added from
userspace.
6. Add a OvsGetTunnelVport() for convenience.
7. Update ASSERTs() while cleaning up the switch.
8. Nuke OvsGetExternalVport() and OvsGetExternalMtu().
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
We do a bunch of changes that did not make sense to split up into
smaller patches:
1. Add descriptive comments to the important functions to clarify
purpose.
2. s/OvsInitVportCommon/InitHvVportCommon - this function is common code
for every port that shows up on the Hyper-V switch.
3. Introduce a InitOvsVportCommon() that is common code for evrey port
that gets added from userspace. This is especially useful for ports
that are not present on the Hyper-V switch. ie. tunnel ports and
bridge-internal ports.
4. Fix OvsClearAllSwitchVports() to remove ports from both the lists:
the ones added from Hyper-V as well as the ones added from OVS
userspace.
5. Update OvsInitVxlanTunnel() to not call into InitHvVportCommon
(formerly OvsInitVportCommon()) since it is not a port on the Hyper-v
switch. In a later patch in the series, we'll call
InitOvsVportCommon() for a VXLAN port.
6. 'numNonHvVports' increments and decrements ONLY for ports that are
added from OVS userspace but not present on the Hyper-V switch.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In this patch, we provide explanation and the reasoning for
bridge-internal ports. The code to add such a port in come in later
patch in the series.
We also fix some formatting issues in PacketIO.c.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In this patch, we add some explanation about the usage of
'externalVport' in the switch context. Also, we rename 'externalVport'
to 'virtualExternalVport' in alignment with the explanation. Also, we
rename 'numVports' to 'numHvVports' since ports are added from 2 ends
now.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Without this patch, the kernel crashes when it tries to cleanup a port
at unload time when a port has been previously deleted from userspace.
Crash is in OvsRemoveAndDeleteVport() when we call into
RemoveEntryList().
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Tested-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Currently, OVSRCU_TYPE_INITIALIZER always initializes the RCU pointer
as NULL. There is no reason why the RCU pointer could not be
initialized with a non-NULL value, however, as statically allocated
memory is even more stable than required for RCU.
This patch changes the initializer to OVSRCU_INITIALIZER(VALUE), which
can take any pointer value as a parameter.
This allows rculist, which is introduced in a following patch, to
provide an initializer similar to the one in the normal list.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
tests/test-classifier.c used to include lib/classifier.c to gain
access to the internal data structures and some utility functions.
This was confusing, so this patch splits the relevant groups of
classifier internal definations to a new file
(lib/classifier-private.h), which is included by both lib/classifier.c
and tests/test-classifier.c. Other use of the new file is
discouraged.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Vlog functions assume a vlog module has been defined for the current
translation unit. Including lib/vlog.h from a header file makes the
vlog API visible even when no vlog module may not have been defined.
This patch removes the two cases in the tree where vlog.h was included
from a header file.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OpenFlow 1.1 - 1.4 specify that indirect groups should
opperate on the "one defined bucket in the group". OpenFlow
1.2 - 1.4 also state "This group only supports a single bucket."
This patch enforces the single bucket limitation for indirect groups
when decoding group mod messages. A test is also added to exercise
this change.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
ipv6_find_hdr() already fixed in newer upstram kernel by Ansis, we
can start using this API safely.
This patch also backports fix (ipv6: ipv6_find_hdr restore prev
functionality) to compat ipv6_find_hdr().
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
rcu_dereference_raw() api is cleaner way of accessing RCU pointer
when no locking is required.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This feature bit was overlooked when we added support for group stats.
Reported-by: Anup Khadka <khadka.py@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Commit c2d936a44f (ofp-actions: Centralize all OpenFlow action code for
maintainability.) rewrote OpenFlow action parsing but failed to check that
actions don't overflow their buffers. This commit fixes the problem and
adds negative tests so that this bug doesn't recur.
Reported-by: Tomer Pearl <Tomer.Pearl@Contextream.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
In this patch, we add support in dpif-netlink.c to receive packets on
Windows. Windows does not natively support epoll(). Even though there
are mechanisms/interfaces that provide functionality similar to epoll(),
we take a simple approach of using a pool of sockets.
Here are some details of the implementaion to aid review:
1. There's pool of sockets per upcall handler.
2. The pool of sockets is initialized while setting up the handler in
dpif_netlink_refresh_channels() primarily.
3. When sockets are to be allocated for a vport, we walk through the
pool of sockets for all handlers and pick one of the sockets in each of
the pool. Within a handler's pool, sockets are picked in a round-robin
fashion.
4. We currently support only 1 handler, since there are some kernel
changes needed for support more than 1 handler per vport.
5. The pool size is also set to 1 currently.
The restructions imposed by #4 and #5 can be removed in the future
without much code churn.
Validation:
1. With a hacked up kernel which figures out the netlink socket that is
designated to receive packets, we are cable to perform pings between 2
VMs on the same Hyper-V host.
2. Compiled the code in Linux as well.
3. Tested with pool size == 2 as well, though in this patch we set the
pool size = 1.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
In this patch, we add support in userspace for packet subscribe API
similar to the join/leave MC group API that is used for port events.
The kernel code has already been commited.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
On Windows platform, TCP_NODELAY can only be set when TCP is established.
(This is an observed behavior and not written in any MSDN documentation.)
The current code does not create any problems while running unit tests
(because connections get established immediately) but is reportedly
observed while connecting to a different machine.
commit 8b76839(Move setsockopt TCP_NODELAY to when TCP is connected.)
made changes to call setsockopt with TCP_NODELAY after TCP is connected
only in lib/stream-ssl.c. We need the same change for stream-tcp too and
this commit does that.
Currently, a failure of setting TCP_NODELAY results in reporting
the error and then closing the socket. This commit changes that
behavior such that an error is reported if setting TCP_NODELAY
fails, but the connection itself is not torn down.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Commit a8d819675f (Remove stream, vconn, and rconn functions to get
local/remote IPs/ports.) removed the code that used the local socket
address but neglected to remove the code to fetch that address. This
commit removes the latter code also.
Reported-by: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
We normally only add 1-bits to wc->masks for datapath flow matching
purposes, never removing them. In this case, the bits that get set to
zero will be set back to 1 later on in the function, so this does not fix
any actual bug, but the principle of only setting to 1, not to 0, seems
sound to me.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
We have not yet tested the wakup via pending IRP functionality on
Windows yet. Hence we use poll_immediate_wake().
Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
If we are trying to insert a flow while there's already a key with the
same flow, return success instead of failure. It can be argued that we
should probably return a transactional error EEXIST, but we'll handle
this in a subsequent commit. I've added a comment to address this later.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
ETH_ADDR_LEN is defined in lib/packets.h, valued 6.
Use this macro instead of magic number 6 to represent the length
of eth mac address.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
In nl_sock_recv__() on Windows, we realloc a new ofpbuf to copy received
data if the caller specified buffer is small. While we do so, we need
reset some of the other stack variables to point to the new ofpbuf.
Other fixes are around using 'error' rather than 'errno'.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
A couple of miscellaneous fixes in code that creates a packet for
userspace as well as when we copy the packet to memory specified by
userspace.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Ankur Sharma <ankursharma@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
The semantics are read operation are generally to return 0 bytes and
STATUS_SUCCESS when there are no events.
Also, added a fix to assign the PID to the synthetic OVS_MESSAGE formed
for the command validation.
Signed-off-by: Nithin Raju <nithin@vmware.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
When flow_get fails (in this case flow does not exist) simply log
the key part of the get and erase the rest of the flow because it
is invalid.
Verified the fix by doing ovs-ofctl del-flows when traffic is running.
2014-10-18T20:12:13.785Z|00011|dpif(revalidator20)|WARN|system@ovs-system: failed to flow_get (No such file or directory) dp_hash(0),recirc_id(0),skb_priority(0),in_port(2),skb_mark(0),eth(src=00:13:72:0b:52:fa,dst=00:14:72:0b:52:fa),eth_type(0x0800),ipv4(src=10.0.0.164,dst=11.0.0.164,proto=6,tos=0,ttl=4,frag=no),tcp(src=1651,dst=6095),tcp_flags(ack), packets:0, bytes:0, used:never
Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
An I/O request is queued in Kernel to be completed upon a packet mismatch.
This mechanism is similar to the port state notification.
Access to instance data should be under a lock (TBD)
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
There was not much difference between the two files after moving
all of the Windows socket HANDLE polling functionality to poll-loop.c.
So merge them together.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This patch updates the documentation to reflect that DPDK 1.7.1
is supported. Travis scripts have also been updated to reflect
this. DPDK phy and ring ports were validated against DPDK 1.7.1.
Reviewed-by: Mark D. Gray <mark.d.gray@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>