mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Joe Stringer	8e53fe8cf7	Add connection tracking mark support. This patch adds a new 32-bit metadata field to the connection tracking interface. When a mark is specified as part of the ct action and the connection is committed, the value is saved with the current connection. Subsequent ct lookups with the table specified will expose this metadata as the "ct_mark" field in the flow. For example, to allow new TCP connections from port 1->2 and only allow established connections from port 2->1, and to associate a mark with those connections: table=0,priority=1,action=drop table=0,arp,action=normal table=0,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2 table=0,in_port=2,ct_state=-trk,tcp,action=ct(table=1) table=1,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1 Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
Joe Stringer	07659514c3	Add support for connection tracking. This patch adds a new action and fields to OVS that allow connection tracking to be performed. This support works in conjunction with the Linux kernel support merged into the Linux-4.3 development cycle. Packets have two possible states with respect to connection tracking: Untracked packets have not previously passed through the connection tracker, while tracked packets have previously been through the connection tracker. For OpenFlow pipeline processing, untracked packets can become tracked, and they will remain tracked until the end of the pipeline. Tracked packets cannot become untracked. Connections can be unknown, uncommitted, or committed. Packets which are untracked have unknown connection state. To know the connection state, the packet must become tracked. Uncommitted connections have no connection state stored about them, so it is only possible for the connection tracker to identify whether they are a new connection or whether they are invalid. Committed connections have connection state stored beyond the lifetime of the packet, which allows later packets in the same connection to be identified as part of the same established connection, or related to an existing connection - for instance ICMP error responses. The new 'ct' action transitions the packet from "untracked" to "tracked" by sending this flow through the connection tracker. The following parameters are supported initially: - "commit": When commit is executed, the connection moves from uncommitted state to committed state. This signals that information about the connection should be stored beyond the lifetime of the packet within the pipeline. This allows future packets in the same connection to be recognized as part of the same "established" (est) connection, as well as identifying packets in the reply (rpl) direction, or packets related to an existing connection (rel). 
- "zone=[u16|NXM]": Perform connection tracking in the zone specified. Each zone is an independent connection tracking context. When the "commit" parameter is used, the connection will only be committed in the specified zone, and not in other zones. This is 0 by default. - "table=NUMBER": Fork pipeline processing in two. The original instance of the packet will continue processing the current actions list as an untracked packet. An additional instance of the packet will be sent to the connection tracker, which will be re-injected into the OpenFlow pipeline to resume processing in the specified table, with the ct_state and other ct match fields set. If the table is not specified, then the packet is submitted to the connection tracker, but the pipeline does not fork and the ct match fields are not populated. It is strongly recommended to specify a table later than the current table to prevent loops. When the "table" option is used, the packet that continues processing in the specified table will have the ct_state populated. The ct_state may have any of the following flags set: - Tracked (trk): Connection tracking has occurred. - Reply (rpl): The flow is in the reply direction. - Invalid (inv): The connection tracker couldn't identify the connection. - New (new): This is the beginning of a new connection. - Established (est): This is part of an already existing connection. - Related (rel): This connection is related to an existing connection. For more information, consult the ovs-ofctl(8) man pages. 
Below is a simple example flow table to allow outbound TCP traffic from port 1 and drop traffic from port 2 that was not initiated by port 1: table=0,priority=1,action=drop table=0,arp,action=normal table=0,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=9),2 table=0,in_port=2,tcp,ct_state=-trk,action=ct(zone=9,table=1) table=1,in_port=2,ct_state=+trk+est,tcp,action=1 table=1,in_port=2,ct_state=+trk+new,tcp,action=drop Based on original design by Justin Pettit, contributions from Thomas Graf and Daniele Di Proietto. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-10-13 15:34:15 -07:00
Jarno Rajahalme	449b813113	dpif-netdev: Exact match non-presence of vlans. The Netlink encoding of datapath flow keys cannot express wildcarding the presence of a VLAN tag. Instead, a missing VLAN tag is interpreted as exact match on the fact that there is no VLAN. This makes reading datapath flow dumps confusing, since for everything else, a missing key value means that the corresponding key was wildcarded. Unless we refactor a lot of code that translates between Netlink and struct flow representations, we have to do the same in the userspace datapath. This makes at least the flow install logs show that the vlan_tci field is matched to zero. However, the datapath flow dumps remain as they were before, as they are performed using the netlink format. Add a test to verify that a packet with a VLAN will not match a rule that may seem to wildcard the presence of the VLAN tag. Applying this test without the userspace datapath modification showed that the userspace datapath failed to create a new datapath flow for the VLAN packet before this patch. Reported-by: Tony van der Peet <tony.vanderpeet@gmail.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-09-18 17:47:37 -07:00
Daniele Di Proietto	f2f44f5da0	dpif-netdev: Check for PKT_RX_RSS_HASH flag. DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is set in 'ol_flags'. Otherwise the hash is garbage and doesn't relate to the packet. This fixes an issue with vhost, which, being a virtual NIC, doesn't compute the hash. Reported-by: Dongjun <dongj@dtdream.com> Suggested-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-09-11 17:43:39 +01:00
Pravin B Shelar	7f9b850474	tnl-ports: Add destination IP and MAC address to the match. Currently the tnl-port table wildcards the destination IP and MAC addresses for a given tunnel packet. That could result in accepting tunnel packets destined for other hosts. This patch adds support for matching on the IP and MAC addresses. IP address updates to the tnl-port table are piggybacked on ovs-router updates. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-09-08 16:24:35 -07:00
Alex Wang	e4e74c3a2b	dpif-netdev: Purge all ukeys when reconfigure pmd. When dpdk configuration changes, all pmd threads are recreated and rx queues of each port are reloaded. After this process, rx queue could be mapped to a different pmd thread other than the one before reconfiguration. However, this is totally transparent to ofproto layer modules. So, if the ofproto-dpif-upcall module still holds ukeys generated before pmd thread recreation, this old ukey will collide with the ukey for the new upcalls from same traffic flow, causing flow installation failure. To fix the bug, this commit adds a new call-back function in dpif layer for notifying upper layer the purging of datapath (e.g. pmd thread deletion in dpif-netdev). So, the ofproto-dpif-upcall module can react properly with deleting the ukeys and with collecting flows' last stats. Reported-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Alex Wang <ee07b291@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joestringer@nicira.com>	2015-09-02 05:57:59 +00:00
Jarno Rajahalme	5fcff47b0b	flow: Add struct flowmap. Struct miniflow is now sometimes used just as a map. Define a new struct flowmap for that purpose. The flowmap is defined as an array of maps, and it is automatically sized according to the size of struct flow, so it will be easier to maintain in the future. It would have been tempting to use the existing struct bitmap for this purpose. The main reason this is not feasible at the moment is that some flowmap algorithms are simpler when it can be assumed that no struct flow member requires more bits than can fit to a single map unit. The tunnel member already requires more than 32 bits, so the map unit needs to be 64 bits wide. Performance critical algorithms enumerate the flowmap array units explicitly, as it is easier for the compiler to optimize, compared to the normal iterator. Without this optimization a classifier lookup without wildcard masks would be about 25% slower. With this more general (and maintainable) algorithm the classifier lookups are about 5% slower, when the struct flow actually becomes big enough to require a second map. This negates the performance gained in the "Pre-compute stage masks" patch earlier in the series. Requested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-08-26 15:37:22 -07:00
Alex Wang	fbe0962b28	coverage: Add coverage_try_clear() for performance-critical threads. For performance-critical threads like pmd threads, we currently make them never call coverage_clear() to avoid contention over the global mutex 'coverage_mutex'. So, even though pmd thread still keeps updating their thread-local coverage count, the count is never attributed to the global total. But it is useful to have them available. This commit makes this happen by implementing a non-contending version of the clear function, coverage_try_clear(). The function will use the ovs_mutex_trylock() and return immediately if the mutex cannot be acquired. Since threads like pmd thread are always busy-looping, the lock will eventually be acquired. Requested-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-08-25 16:21:36 -07:00
Jesse Gross	6728d578f6	dpif-netdev: Translate Geneve options per-flow, not per-packet. The kernel implementation of Geneve options stores the TLV option data in the flow exactly as received, without any further parsing. This is then translated to known options for the purposes of matching on flow setup (which will then install a datapath flow in the form the kernel is expecting). The userspace implementation behaves a little bit differently - it looks up known options as each packet is received. The reason for this is there is a much tighter coupling between datapath and flow translation and the representation is generally expected to be the same. This works but it incurs work on a per-packet basis that could be done per-flow instead. This introduces a small translation step for Geneve packets between datapath and flow lookup for the userspace datapath in order to allow the same kind of processing that the kernel does. A side effect of this is that unknown options are now shown when flows are dumped via ovs-appctl dpif/dump-flows, similar to the kernel. There is a second benefit to this as well: for some operations it is preferable to keep the options exactly as they were received on the wire, which this enables. One example is that for packets that are executed from ofproto-dpif-upcall to the datapath, this avoids the translation of Geneve metadata. Since this conversion is potentially lossy (for unknown options), keeping everything in the same format removes the possibility of dropping options if the packet comes back up to userspace and the Geneve option translation table has changed. To help with these types of operations, most functions can understand both formats of data and seamlessly do the right thing. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-08-05 20:26:48 -07:00
Jesse Gross	9f861c9182	dpif-netdev: Don't use metaflow to operate on userspace datapath fields. If ofproto-dpif installs a flow into the userspace datapath that doesn't include a mask, we need to synthesize an exact match one. This is currently done using the metaflow infrastructure, iterating over each field and setting it to all ones. There is a conceptual mismatch here because metaflow is operating on OpenFlow fields, not datapath ones. Even though they are generally very similar, there are subtle differences, which is why it is necessary to fix up the input port mask. With Geneve options, the mapping is much more complicated and so the situation is worse. The first issue is that the metaflow to flow mapping can change over time, so we would need to do more revalidation to track this. In addition, an upcoming patch will completely disconnect the option format between ofproto-dpif and dpif-netdev, so the values written by metaflow don't make sense at all. When megaflows are turned off, ofproto-dpif internally generates masks using flow_wildcards_init_for_packet(). Since that's the same as what we want to do here, we can just use that instead of metaflow. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2015-08-05 20:25:40 -07:00
Ilya Maximets	2aca813cf4	dpif-netdev: fix race for queues between pmd threads. Currently pmd threads select queues in pmd_load_queues() according to get_n_pmd_threads_on_numa(). This behavior leads to a race between pmds, because dp_netdev_set_pmds_on_numa() starts them one by one and the current number of threads changes incrementally. As a result we may have the following situation with 2 pmd threads: * dp_netdev_set_pmds_on_numa() * pmd12 thread started. Currently only 1 pmd thread exists. dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1' dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2' * pmd14 thread started. 2 pmd threads exist. dpif_netdev|INFO|Created 2 pmd threads on numa node 0 dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2' We have: core 1 --> port 1, port 2 core 2 --> port 2 Fix this by starting pmd threads only after all of them have been configured. Cc: Daniele Di Proietto <diproiettod at vmware.com> Cc: Dyasly Sergey <s.dyasly at samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ilya Maximets <i.maximets at samsung.com> Signed-off-by: Ethan Jackson <ethan@nicira.com>	2015-08-04 12:36:45 -07:00
Jarno Rajahalme	361d808dd9	flow: Split miniflow's map. Use two maps in miniflow to allow for expansion of struct flow past 512 bytes. We now have one map for tunnel related fields, and another for the rest of the packet metadata and actual packet header fields. This split has the benefit that for non-tunneled packets the overhead should be minimal. Some miniflow utilities now exist in two variants, new ones operating over all the data, and the old ones operating only on a single 64-bit map at a time. The old ones require doubling of code but should execute faster, so those are used in the datapath and classifier's lookup path. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-17 15:18:43 -07:00
Jarno Rajahalme	8cd27cfd1e	dpif-netdev: Skip also xregs when building a mask. When no mask is passed in, dpif_netdev_mask_from_nlattrs() builds a mask from all possible fields whose prerequisites are met. OpenFlow metadata is not relevant for a datapath flow, so we skip registers and metadata. Do this for XREGs as well. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-17 15:18:43 -07:00
Jarno Rajahalme	09b0fa9c55	flow: Make compile with MSVC. MSVC does not like zero sized arrays in structs. Hence, remove the 'values' member from struct miniflow and add back the getters miniflow_values() and miniflow_get_values(). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-16 17:42:24 -07:00
Jarno Rajahalme	8fd4792403	flow: Always inline miniflows. Now that performance critical code already inlines miniflows and minimasks, we can simplify struct miniflow by always dynamically allocating miniflows and minimasks to the correct size. This changes the struct minimatch to always contain pointers to its miniflow and minimask. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-15 13:17:10 -07:00
Joe Stringer	2494ccd78f	odp-util: Share fields between odp and dpif_backer. Datapath support for some flow key fields is used inside ofproto-dpif as well as odp-util. Share these fields using the same structure. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>	2015-07-06 10:17:37 -07:00
Jesse Gross	35303d715b	tunnels: Don't initialize unnecessary packet metadata. The addition of Geneve options to packet metadata significantly expanded its size. It was reported that this can decrease performance for DPDK ports by up to 25% since we need to initialize the whole structure on each packet receive. It is not really necessary to zero out the entire structure because miniflow_extract() only copies the tunnel metadata when particular fields indicate that it is valid. Therefore, as long as we zero out these fields when the metadata is initialized and ensure that the rest of the structure is correctly set in the presence of a tunnel, we can avoid touching the tunnel fields on packet reception. Reported-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-07-01 15:24:04 -07:00
Mark Kavanagh	7dd671f08e	dpif-netdev: log port/core affinity. When using multiple PMDs and numerous ports, a performance gain may be achieved in some use cases by pinning a PMD/port to a particular (set of) core(s). This patch provides a summary of the switch's port/core affinities each time that the status of the switch's ports is modified. Based on this information, a user may determine what affinity modifications are required to optimise performance for their particular use case. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Wojciech Andralojc <wojciechx.andralojc@intel.com> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-06-25 11:21:38 -07:00
Jesse Gross	ec1f6f327e	odp-util: Pass down flow netlink attributes when translating masks. Sometimes we need to look at flow fields to understand how to parse an attribute. However, masks don't have this information - just the mask on the field. We already use the translated flow structure for this purpose but this isn't always enough since sometimes we actually need the raw netlink information. Fortunately, that is also readily available so this passes it down from the appropriate callers. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-25 11:08:58 -07:00
Jesse Gross	3ee6026aba	bitmap: Convert single bitmap functions to 64-bit. Currently the functions to set, clear, and iterate over bitmaps only operate over 32 bit values. If we convert them to handle 64 bit bitmaps, they can be used in more places. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-25 11:08:31 -07:00
Justin Pettit	2d34dbd9e1	Merge remote-tracking branch 'origin/master' into ovn4	2015-06-18 22:02:55 -07:00
Jesse Gross	5262eea1b8	odp-util: Convert flow serialization parameters to a struct. Serializing between userspace flows and netlink attributes currently requires several additional parameters besides the flows themselves. This will continue to grow in the future as well. This converts the function arguments to a parameters struct, which makes the code easier to read and allowing irrelevant arguments to be omitted. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com>	2015-06-18 16:42:48 -07:00
Ben Pfaff	8420c7ad4e	dummy: Introduce new --enable-dummy=system option. Until now there have been two variants for --enable-dummy: * --enable-dummy: This adds support for "dummy" dpif and netdev. * --enable-dummy=override: In addition, this replaces every existing dpif and netdev by the dummy type. The latter is useful for testing but it defeats the possibility of using the userspace native tunneling implementation (because all the tunnel netdevs get replaced by dummy netdevs). Thus, this commit adds a third variant: * --enable-dummy=system: This replaces the "system" dpif and netdev by dummies but leaves the others untouched. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>	2015-06-16 08:21:38 -07:00
Ben Pfaff	c4ea752900	dpif: Generalize test for dummy dpifs beyond the name. When --enable-dummy=system or --enable-dummy=override is in use, dpifs other than "dummy" are actually dummy dpifs, so use a more reliable test. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>	2015-06-16 08:21:28 -07:00
Daniele Di Proietto	72a5e2b8fc	dpif-netdev: Prefetch next packet before miniflow_extract(). It appears that miniflow_extract() in emc_processing() spends a lot of cycles waiting for the packet's data to be read. Prefetching the next packet's data while parsing removes this delay. For a single flow pipeline the throughput improves by ~10%. With a more realistic pipeline the change has a much smaller effect (~0.5% improvement) Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-06-15 15:03:50 -07:00
Joe Stringer	bdd7ecf5bf	types: Rename and move ovs_u128_equal(). This function doesn't need to be exported in the public OVS headers, and it had an inconsistent name compared to uuid_equals(). Rename and move. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-06-09 18:20:02 -07:00
Daniele Di Proietto	3bcc10c070	dpif-netdev: Fix non-pmd thread queue id. Non pmd threads have a core_id == UINT32_MAX, while queue ids used by netdevs range from 0 to the number of CPUs. Therefore core ids cannot be used directly to select a queue. This commit introduces a simple mapping to fix the problem: pmd threads continue using queues 0 to N (where N is the number of CPUs in the system), while non pmd threads use queue N+1. Fixes: d5c199ea7ff7 ("netdev-dpdk: Properly support non pmd threads.") Reported-by: 차은호 <eunho.cha@atto-research.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Mark D. Gray <mark.d.gray@intel.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-06-03 15:34:58 -07:00
Daniele Di Proietto	d5c199ea7f	netdev-dpdk: Properly support non pmd threads. We used to reserve DPDK lcore 0 for non pmd operations, making it difficult to use core 0 for packet processing. DPDK 2.0 properly supports non EAL threads with lcore LCORE_ID_ANY. Using non EAL threads for non pmd threads, we do not need to reserve any core for non pmd operations. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-05-22 11:28:19 -07:00
Daniele Di Proietto	bd5131ba76	ovs-numa: Change 'core_id' to unsigned. DPDK lcore_id is unsigned. We need to support big values like LCORE_ID_ANY (=UINT32_MAX). Therefore I am changing the type everywhere in OVS. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-05-22 11:28:19 -07:00
Daniele Di Proietto	048963aa85	dpif-netdev: Reset RSS hash when recirculating. Having the same RSS hash after recirculation can cause unnecessary collisions in the exact match cache. A simple solution is to rehash it with the recirculation depth if it is non-zero. Suggested-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-05-21 14:00:24 -07:00
Ethan Jackson	603f2ce04d	dpif-netdev: Clear flow batches before execute. When executing actions, it's possible a recirculation will occur causing dp_netdev_input() to be called multiple times. If the batch pointers embedded in dp_netdev_flow aren't cleared, it's possible packets after the recirculation will be reinserted into a batch associated with the original lookup. This could be very bad. This patch fixes the problem by zeroing out flow batch pointers before calling packet_batch_execute(). This probably has a slightly negative performance impact, though I haven't tried it. Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-05-21 13:49:46 -07:00
Ciara Loftus	fc82e877ef	dpif-netdev: Increase the number of EMC entries. Prior to this commit, the number of possible entries in the Exact Match Cache stood at 1024 per thread, equating to 0.18Mb. A typical server system will have 2.5Mb cache per core meaning a larger EMC will comfortably fit in. This patch increases the number of entries to 8192 per thread (1.4Mb) which in turn yields improved throughput when processing multiple flows of traffic. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-05-21 13:46:43 -07:00
Ethan Jackson	cd159f1a82	dpdk: Ditch MAX_PKT_BURST macro. The MAX_PKT_BURST and NETDEV_MAX_RX_BATCH macros had a confusing relationship. They basically purport to do the same thing, making it unclear which is the source of truth. Furthermore, while NETDEV_MAX_RX_BATCH was 256, MAX_PKT_BURST was 32, meaning we never process a batch larger than 32 packets further adding to the confusion. This patch resolves the issue by removing MAX_PKT_BURST completely, and shrinking the new NETDEV_MAX_BURST macro to only 32. This should have no change in the execution path except shrinking a couple of structs and memory allocations (can't hurt). Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-05-19 14:47:00 -07:00
Daniele Di Proietto	8aaa125dab	dpif-netdev: Share emc and fast path output batches. Until now the exact match cache processing was able to handle only four megaflows. The rest of the packets were passed to the megaflow classifier. The limit was arbitrarily set to four also because the algorithm used to group packets in output batches didn't perform well with a lot of megaflows. After changing the algorithm and after some performance testing it seems much better just to share the same output batches between the exact match cache and the megaflow classifier. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-05-18 15:14:02 -07:00
Daniele Di Proietto	11e5cf1f90	dpif-netdev: Store batch pointer in dp_netdev_flow. The userspace datapath 1. receives a batch of packets. 2. finds a 'netdev_flow' (megaflow) for each packet. 3. groups the packets in output batches based on the 'netdev_flow'. Until now the grouping (3) was done using a simple algorithm with a O(N^2) runtime, where N is the number of distinct megaflows of the packets in the incoming batch. This could quickly become a bottleneck, even with a small number of megaflows. With this commit the datapath simply stores in the 'netdev_flow' (the megaflow) a pointer to the output batch, if one has been created for the current input batch. The pointer will be cleared when the output batch is sent. In a simple phy2phy test with 128 megaflows the throughput is more than doubled. The reason that stopped us from doing this change was that the 'netdev_flow' memory was shared between multiple threads: this is no longer the case with the per-thread classifier. Also, this commit reorders struct dp_netdev_flow to group together the members used in the fastpath. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-05-18 15:14:02 -07:00
Daniele Di Proietto	efa2bcbb35	dpif-netdev: Store pkt_metadata structure in dp_netdev_port. Initializing a struct pkt_metadata for every packet can be surprisingly expensive. It's much faster to keep a copy for each port and copying it on each packet. Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2015-05-18 15:14:02 -07:00
Daniele Di Proietto	28e2fa027d	dpif-netdev: Batch packets when recirculating. Now that we have per packet metadata, there's no need to split packet batches when recirculating. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-20 12:56:29 -07:00
Daniele Di Proietto	2bc1bbd27d	dp-packet: Rename 'dp_hash' in 'rss_hash'. We already have the 'dp_hash' embedded in the metadata. This caused confusion in the code. With this commit it should be clear that 'rss_hash' is the packet hash used for internal purposes, while 'md.dp_hash' is part of the flow, computed during the execution of certain actions. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-20 12:49:41 -07:00
Daniele Di Proietto	11bfdaddf2	dpif-netdev: Cache time_msec() calls for each received batch. Calling time_msec() (which calls clock_gettime()) too often might be expensive. With this commit OVS makes only one call per received batch and caches the result. Suggested-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-20 12:49:41 -07:00
Daniele Di Proietto	9ff55ae284	dpif-netdev: Store actions data and size contiguously. As stated by the comment above the structure, the 'action' pointer does not change during the 'dp_netdev_actions' lifetime: we might as well embed the pointed memory into the structure. The commit also updates the description of dp_netdev_actions_create(). Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-20 12:49:41 -07:00
Ben Pfaff	17050610ec	dpif-netdev: Reject adding duplicate ports. Otherwise it is at least very confusing. Found during testing. An upcoming commit adds a test. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2015-04-16 08:13:10 -07:00
Daniele Di Proietto	6553d06bd1	dpif-netdev: Add dpif-netdev/pmd-stats-* appctl commands. These commands can be used to get packets and cycles counters on a pmd thread basis. They're useful to get a clearer picture about the performance of the userspace datapath. They export these pieces of information: - A (per-thread) view of the cache hit rates. Hits in the exact match cache are reported separately from hits in the masked classifier - A rough cycles count. This allows estimating the load of OVS and the polling overhead. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-14 12:31:30 -07:00
Daniele Di Proietto	c8973eb634	dpif-provider: Add class init function. This init function is called when the dpif class is registered. It will be used by following commits. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-14 12:30:11 -07:00
Daniele Di Proietto	55e3ca97d1	dpif-netdev: Add simple per pmd-thread cycles counters. The counters use x86 TSC if available (currently only with DPDK). They will be exposed by subsequent commits. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-14 12:28:50 -07:00
Daniele Di Proietto	abcf3ef4c3	dpif-netdev: Count exact match cache hits. We used to count exact match cache hits and masked classifier hits together. This commit splits the DP_STAT_HIT counter into two. This change will be used by future commits. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-09 15:00:52 -07:00
Daniele Di Proietto	eb94da30ae	dpif-netdev: Make datapath and flow stats atomic. A read operation from a non atomic shared value (without external locking) can return incorrect values. Using the atomic semantics prevents this from happening. However: * No memory barriers are used. We don't need that kind of consistency for statistics (we use relaxed operations). * The updates are not atomic, just the loads and stores. This is ok because there's a single writer. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-09 15:00:52 -07:00
Daniele Di Proietto	60fc3b7ba4	dpif-netdev: Group statistics updates in the slow path. Since statistics updates might require locking (in future commits) grouping them will reduce the locking overhead. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-09 15:00:52 -07:00
Daniele Di Proietto	97447f55a9	dpif-netdev: Remove support for DPIF_FP_ZERO_STATS flag. Since flow statistics are thread local and updated without any lock, it is not correct to do a memset from another thread. This commit simply removes the support for the flag. It is not needed by ofproto-dpif, it is only exposed by dpctl commands. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-04-02 17:55:17 -07:00
Daniele Di Proietto	7ad20cbd96	dpif-netdev: Account for and free lost packets. Packets for which an upcall has failed (lost packets) must be deleted. We also need to count them as MISS and LOST. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2015-03-30 13:17:41 -07:00
Pravin B Shelar	6fd6ed71cb	ofpbuf: Simplify ofpbuf API. ofpbuf was complicated due to its wide usage across all layers of OVS. Now that we have introduced an independent dp_packet, which can be used for datapath packets, we can simplify ofpbuf. This patch removes DPDK mbuf and access API of ofpbuf members. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2015-03-03 13:37:39 -08:00
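
The non-contending clear described in commit fbe0962b28 (coverage_try_clear()) can be illustrated with a small sketch. This is not the OVS code: it uses plain POSIX threads instead of the ovs_mutex wrappers, and the names total_mutex, global_total, local_count, counter_hit() and counter_try_flush() are hypothetical, chosen only for illustration. The idea shown is that a busy-looping thread folds its thread-local count into the global total only when the mutex happens to be free, and otherwise returns immediately and tries again on a later iteration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the global total and the mutex guarding it. */
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t global_total;

/* Per-thread count, updated on the fast path without any locking. */
static _Thread_local uint64_t local_count;

/* Fast path: record one event in the calling thread's local counter. */
static void
counter_hit(void)
{
    local_count++;
}

/* Folds the calling thread's local count into the global total, but only
 * if the mutex can be taken without blocking.  Returns true if the count
 * was flushed, false if the mutex was busy and nothing was changed. */
static bool
counter_try_flush(void)
{
    if (pthread_mutex_trylock(&total_mutex) != 0) {
        return false;           /* Contended: retry on a later iteration. */
    }
    global_total += local_count;
    local_count = 0;
    pthread_mutex_unlock(&total_mutex);
    return true;
}
```

A polling thread would call counter_try_flush() once per loop iteration. A failed attempt loses nothing, because the local count is kept until a later attempt succeeds.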


381 Commits