mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

Author	SHA1	Message	Date
Ethan Jackson	d7a291ac1c	timeval: Simplify poll interval logging. log_poll_interval() is a little bit too aggressive, and is therefore less useful than it could be. This patch removes the mean interval calculation, and simply logs if the poll loop took longer than 1 second instead. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-20 17:26:38 -07:00
Ethan Jackson	61a7b1e6fe	timeval: Block SIGALRM when sleeping. Commit 00a16895 (timeval: Don't require signals for time_alarm().) incorrectly disabled signals when CACHE_TIME was disabled. In fact, the reverse was correct. As a result of this bug, OVS would wake once every 100ms unnecessarily. It shouldn't have affected correctness otherwise. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-16 15:02:45 -07:00
Ethan Jackson	9d232a6d57	timeval: Add new "backtrace" appctl command. Often, it can be quite difficult to debug performance issues in Open vSwitch. Typically one needs to run something like gprof, but that requires rebuilding and installing on the affected system which is often problematic. This patch adds a light weight profiling solution which can be used in these situations. The ovs-appctl backtrace command prints out backtraces taken at 100 millisecond intervals over a 5 second period of time. It is currently only supported on systems which have the execinfo library and enable time caching. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-14 19:36:39 -07:00
Ethan Jackson	361906b1e2	config: Add explicit support for building on ESX. The ESX userspace looks quite a bit like linux, but has some key differences which need to be specially handled in the build. To distinguish between ESX and systems which use the linux datapath module, this patch adds two new macros "ESX" and "LINUX_DATAPATH". It uses these macros to disable building code on ESX which only applies to a true Linux environment. In addition, it adds a new route-table-stub implementation which is required for the build to complete successfully on ESX. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-09 15:23:47 -07:00
Ethan Jackson	a457574880	timeval: Recover from failed timer_create() calls. The timer_create() system call is not supported in ESX and returns an error when called. Aborting when this system call fails seems a bit extreme. So instead, this patch simply falls back to disabling the cached time optimization. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-05 11:33:11 -07:00
Ethan Jackson	00a1689587	timeval: Don't require signals for time_alarm(). Before this patch, time_alarm() used the SIGALRM handler to notify the poll loop that it should exit the program. Instead, this patch simply implements time_alarm() directly in the poll loop. This significantly simplifies the code, while removing a call to timer_create() which is not currently supported on ESX. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-10-05 11:17:00 -07:00
Leo Alterman	49635519ba	timeval: On Linux x86-64 systems refresh time whenever it is requested. 64-bit Linux appears to avoid syscalls for clock_gettime(), so we can get higher resolution timing and avoid having a timer firing off SIGALRM without introducing extra overhead. Signed-off-by: Leo Alterman <lalterman@nicira.com>	2012-08-09 15:06:38 -07:00
Raju Subramanian	e0edde6fee	Global replace of Nicira Networks. Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc. Feature #10593 Signed-off-by: Raju Subramanian <rsubramanian@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-05-02 17:08:02 -07:00
Ben Pfaff	a5f607bc89	coverage: Make ovs-appctl command more useful and less alarming. I've had a few complaints that ovs-vswitchd logs its coverage counters at WARN level, but this is mainly wrong: ovs-vswitchd only logs coverage counters at WARN level when the "coverage/log" command is used through ovs-appctl. This was even documented. The reason to log at such a high level was to make it fairly certain that these messages specifically requested by the admin would not be filtered out before making it to the log. But it's even better if the admin just gets the coverage counters as a reply to the ovs-appctl command. So that is what this commit does. This commit also improves the documentation of the ovs-appctl command. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-26 14:07:29 -07:00
Ben Pfaff	c563de0e38	timeval: Rate-limit logging rusage information. I'd always assumed that the exponentially weighted moving average code here was sufficient rate-limiting, but I actually encountered a pathological case some time ago that forced this rusage information to print once a second or so, which seems too often. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-26 14:06:27 -07:00
Ben Pfaff	f802352d41	timeval: Add "time/stop" unixctl command, for use in unit tests. Although we try to avoid it, some unit tests are necessarily timing-sensitive. The new "time/stop" command that this commit adds should help with that, by preventing time from advancing from the viewpoint of the OVS "timeval" functions except when "time/warp" explicitly advances the current time. This should allow the unit tests that need it to become reproducible regardless of the speed at which the tests run. This commit adds one use of "time/stop" to the unit test suite, in the one timing-sensitive test of which I am currently aware. Bug #9782. Reported-by: Tim Chen <tchen@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-02-28 17:14:22 -08:00
Ethan Jackson	bde9f75de1	unixctl: New JSON RPC back-end. The unixctl library had used the vde2 management protocol since the early days of Open vSwitch. As Open vSwitch has matured, several Python daemons have been added to the code base which would benefit from a unixctl implementation. Instead of implementing the old unixctl protocol in Python, this patch changes unixctl to use JSON RPC for which we already have an implementation in both Python and C. Future patches will need to implement a unixctl library in Python on top of JSON RPC. Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-02-21 00:02:20 -08:00
Ben Pfaff	4ae90ff9e2	timeval: New function time_boot_msec(), factored out of vlog. An upcoming commit has a new use for the time at which OVS started up, so this moves this functionality to a common location. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-02-01 14:15:12 -08:00
Ben Pfaff	6197af6e4b	timeval: Add ability to fast-forward time, for unit testing. Signed-off-by: Ben Pfaff <blp@nicira.com>	2011-12-19 14:53:54 -08:00
Ben Pfaff	cee03df4f5	poll-loop: Track the poll timeout as an absolute, not a relative time. This is a necessary prerequisite for allowing time to be "fast forwarded" in unit tests, to keep tests that depend on the passage of time from running in real time. Without this change, a code sequence like this: poll_timer_wait(1000); ...fast forward time 5 seconds... poll_block(); would still sleep for a second, because the poll_loop module would still have a relative timeout of 1000 ms. Signed-off-by: Ben Pfaff <blp@nicira.com>	2011-12-19 14:53:48 -08:00
Ben Pfaff	959ec62e32	poll-loop: Automatically log reason for wakeup when CPU usage spikes. For a long time, the poll-loop module has had the ability to log the reason for wakeups, which is valuable for debugging excessive use of CPU time. But I have to ask users to turn up the log level for the module, which wastes their time and mine. This commit improves the situation by automatically logging the reason for a wakeup whenever a process's estimated CPU usage rises above 50%. (ovs-vswitchd often uses less than 1% CPU; more than 5% CPU is uncommon.)	2011-05-24 12:23:42 -07:00
Ben Pfaff	279c9e0308	Log anything that could prevent a daemon from starting. If a daemon doesn't start, we need to know why. Being able to consistently consult the log to find out is helpful.	2011-04-04 10:58:55 -07:00
Ben Pfaff	14865427e6	timeval: Only log poll intervals longer than 50 ms. When poll interval-based logging was introduced a long time ago, we were actively interested in looking at almost every long poll interval. But these days, with OVS working rather well, with pretty good latency, most of the messages are red herrings that bother some administrators and provoke false reports. So this commit suppresses all but the most egregious long poll intervals that may in fact be worth looking at. NIC-366.	2011-03-30 11:31:16 -07:00
Ben Pfaff	d98e600755	vlog: Make client supply semicolon for VLOG_DEFINE_THIS_MODULE. It's kind of odd for VLOG_DEFINE_THIS_MODULE to supply its own semicolon, so this commit switches to the more common form.	2010-10-29 09:48:47 -07:00
Joe Perches	d295e8e97a	treewide: Remove trailing whitespace. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>	2010-08-30 13:23:08 -07:00
Ben Pfaff	5136ce492c	vlog: Introduce VLOG_DEFINE_THIS_MODULE for declaring vlog module in use. Adding a macro to define the vlog module in use adds a level of indirection, which makes it easier to change how the vlog module must be defined. A followup commit needs to do that, so getting these widespread changes out of the way first should make that commit easier to review.	2010-07-21 15:47:09 -07:00
Ben Pfaff	ff8bb7e76b	timeval: Make time_init() static and remove calls to it. Since the timeval module now initializes itself on-demand, there is no longer any need to initialize it explicitly, or to provide an interface to do so.	2010-07-21 15:47:09 -07:00
Ben Pfaff	7bc9188dda	timeval: Make timeval module initialize itself. It's more convenient if clients don't have to initialize modules explicitly.	2010-07-21 15:43:30 -07:00
Ben Pfaff	f54e56fc50	timeval: Integrate CLOCK_MONOTONIC detection into time_init(). I don't see a reason that set_up_monotonic() should be separate from time_init(). Doing all the time initialization in one place seems reasonable, so this commit makes that change.	2010-07-21 15:40:26 -07:00
Ben Pfaff	4cfffdd86b	timeval: Hide "memory leak" from Valgrind. In glibc, "timer_t" is a "void *" that appears to point into malloc()'d memory. By throwing it away entirely, we leak it, which makes valgrind complain. We really don't ever care to use the timer object again, but we can't destroy it without stopping the periodic timer. So make it static to avoid a warning from Valgrind.	2010-06-25 13:28:34 -07:00
Jesse Gross	c73814a3e6	timeval: Use monotonic time where appropriate. Most of the timekeeping needs of OVS are simply to measure intervals, which means that it is sensitive to changes in the clock. This commit replaces the existing clocks with monotonic timers. An additional set of wall clock timers are added and used in locations that need absolute time. Bug #1858	2010-06-08 18:01:25 -07:00
Ben Pfaff	ea8cd10d29	timeval: Always log true poll interval instead of rounding off. This code triggers when a trip through the process's main loop takes much longer than expected. The code for calculating the expected time rounds down to a maximum of 10000 ms to avoid overflow. But there is no reason that the correct time should not be displayed in the log message, and furthermore displaying the correct time may help tracking down the underlying issue, since it lets the administrator find out exactly when the trip through the main loop started. So this commit displays the exact time without rounding down.	2010-03-25 11:34:01 -07:00
Ben Pfaff	58fda1dab1	Merge "master" branch into "db".	2009-12-02 11:49:53 -08:00
Ben Pfaff	d6fbec6de0	Spell verb form of "set up" correctly throughout the tree.	2009-10-26 14:41:32 -07:00
Ben Pfaff	dc81071d9f	cfg: Fix implementation of timeout in attempting to lock the config file. Without removing SA_RESTART from the SIGALRM handler, the fcntl call will never return, even after the signal handler is invoked and returns. We haven't seen a problem in practice, at least not recently, but that's probably just luck combined with not holding the configuration file lock for very long.	2009-10-23 11:52:05 -07:00
Ben Pfaff	e7cfedd6de	Add test to ensure that time advances both normally and in a daemon. This commit adds a test to ensure that the fix that makes daemonize() reinitialize the periodic timer actually worked.	2009-10-23 11:52:05 -07:00
Ben Pfaff	03fbffbda4	Make sure that time advances in a daemon between calls to time_refresh(). Open vSwitch uses an interval timer signal to tell it that its cached idea of the current time has expired. However, this didn't work in a daemon detached from the foreground session (invoked with --detach) because a child created with fork() does not inherit the parent's interval timer and we did not re-set it after calling fork(). This commit fixes the problem by setting the interval timer back up after calling fork() from daemonize(). This fix is based on code inspection (which was then verified to be correct through testing). It may not fix any actual problems in practice, because time_refresh() is called every time through the poll loop, and the poll loop typically runs more quickly than the periodic timer fires (1 ms or so average in ovs-vswitchd, vs. 100 ms timer interval).	2009-10-15 10:43:36 -07:00
Justin Pettit	6bc995e46d	logging: Lower logging level for coverage messages due to slow poll loops. By default, many OVS processes keep track of their time through a poll loop. If a trip through the loop takes an unusually long time (measured as some distance from the mean), the process will log the coverage stats it has been keeping. It was doing this at level WARN. On Xen systems, syslog messages written at level INFO and higher are written to /var/log/messages synchronously. This would mean that there would be dire messages that it took a few dozen milliseconds to go through the loop, while it would take up to 6(!) seconds to write those messages. Meanwhile, the process would do no other processing, which could be quite serious in the case of a process such as ovs-vswitchd. This problem was somewhat masked because the time used by this logging was not used in the calculations for determining how long it was taking to get through the loop. This commit lowers the default log level for those coverage messages to INFO. On Xen systems, it raises the default level at which messages are written to syslog to WARN. Diagnosed and fixed with the help of Ian Campbell.	2009-09-11 10:49:27 -07:00
Justin Pettit	f5c6854a73	Provide ability to retrieve coverage information. Previously, there was no way to induce coverage information to be displayed; it would only print when the system noticed unusual delays between polling intervals. Now, production of coverage logs can be forced with "coverage/log" command in ovs-appctl. Coverage counters may be reset with "coverage/clear".	2009-07-10 15:09:41 -07:00
Ben Pfaff	a14bc59fb8	Update primary code license to Apache 2.0.	2009-06-15 15:11:30 -07:00
Ben Pfaff	064af42167	Import from old repository commit 61ef2b42a9c4ba8e1600f15bb0236765edc2ad45.	2009-07-08 13:19:16 -07:00

36 Commits